UltrafastSecp256k1: A High-Performance secp256k1 Execution Engine for Wallets, Privacy, and Scalable BCH Applications

shrec · April 27, 2026, 10:49pm

Hi everyone,

I would like to introduce UltrafastSecp256k1, an open-source secp256k1 cryptographic execution engine I have been developing.

The project started as a performance-focused secp256k1 implementation, but over time it evolved into something broader: a multi-backend cryptographic engine with CPU, GPU, audit, and application-level pipelines built into one repository.

The main idea is simple:

secp256k1 should not only be correct and secure; it should also be fast enough to make privacy and wallet features practical for everyday users.

What the project provides

UltrafastSecp256k1 includes:

CPU fast-path secp256k1 operations
CPU constant-time paths for secret-bearing operations
CUDA backend
OpenCL backend
Metal backend
C ABI
batch ECDSA / Schnorr verification
ECDH
BIP-340-related primitives
BIP-352-style scanning pipelines
FROST partial verification
MuSig2/FROST-related infrastructure
ZK/DLEQ/witness-oriented primitives
benchmark tooling
reproducible audit and assurance infrastructure

The goal is not merely to optimize one primitive in isolation. The goal is to provide a full execution layer that builders can use to assemble higher-level wallet, privacy, and indexing systems.

In short:

primitives are included, pipelines are ready, and verification is built in.

Why this matters for BCH

Bitcoin Cash has a strong focus on practical payments, low fees, and user-facing applications. Many privacy and wallet features become difficult not because the cryptography is impossible, but because the required scanning, indexing, or verification work becomes too expensive for normal devices.

This is especially true for privacy-oriented reusable payment schemes, stealth-address-like designs, and wallet-side scanning models.

If a wallet must scan many outputs or compute many elliptic-curve operations, the user experience can degrade quickly. On desktop this may be tolerable. On mobile, it can become painful.

My work focuses on reducing that friction.

Local-first, GPU-optional design

A key design principle is:

GPU acceleration should be an option, not a dependency.

The engine supports GPU acceleration where available, but I am also working on CPU-side scan optimizations so that normal devices can remain useful without relying on a remote accelerator.

The intended model is:

CPU mode for maximum portability and local-first wallet behavior
local GPU mode for desktops or laptops with OpenCL/CUDA/Metal support
remote accelerator mode only when a user or service explicitly wants fast catch-up or high-volume indexing
hybrid mode where recent blocks are scanned locally first, while large historical backlogs can be accelerated separately

This avoids turning privacy features into server dependencies.
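As a rough illustration of how the modes above could be combined, here is a minimal sketch of a scan planner. Everything in it is hypothetical: `ScanJob`, `plan_scan`, the backend labels, and the 144-block default window are illustrative names and values, not part of the UltrafastSecp256k1 API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanJob:
    first_block: int
    last_block: int
    backend: str  # hypothetical labels: "cpu", "local_gpu", "remote"


def plan_scan(last_scanned, tip, recent_window=144,
              has_local_gpu=False, allow_remote=False):
    """Split the unscanned range (last_scanned, tip] into scan jobs.

    Recent blocks always stay local. An older backlog goes to a local GPU
    if one exists, to a remote accelerator only if the user opted in,
    and otherwise falls back to the local CPU.
    """
    if tip <= last_scanned:
        return []  # nothing to catch up on
    first = last_scanned + 1
    recent_start = max(first, tip - recent_window + 1)
    local_backend = "local_gpu" if has_local_gpu else "cpu"
    jobs = [ScanJob(recent_start, tip, local_backend)]
    if recent_start > first:
        if has_local_gpu:
            backlog_backend = "local_gpu"
        elif allow_remote:
            backlog_backend = "remote"
        else:
            backlog_backend = "cpu"
        jobs.append(ScanJob(first, recent_start - 1, backlog_backend))
    return jobs
```

The recent job is listed first so that a wallet following this plan would show fresh payments before working through history. The remote backend is never selected unless `allow_remote` is set, which mirrors the "option, not a dependency" principle.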

CPU PreSer mode

One of the current optimization directions is what I call CPU PreSer.

The idea is to split a scan pipeline into two stages:

an expensive stage that computes and serializes shared-secret material
a cheaper online matching stage that uses the pre-serialized material

This allows wallets to cache, pipeline, or reuse work instead of treating every scan as a fully fresh computation.
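The two-stage split can be sketched roughly as follows. This is a toy model under stated assumptions, not the engine's actual implementation: the expensive elliptic-curve step is replaced by a placeholder built from repeated SHA-256, and all names (`expensive_shared_secret`, `preserialize`, `expected_tag`, `match`) and the 32-byte record layout are hypothetical.

```python
import hashlib

RECORD_SIZE = 32  # assumed fixed-width record: one 32-byte secret per transaction


def expensive_shared_secret(scan_key: bytes, tx_pubkey: bytes) -> bytes:
    # PLACEHOLDER for the costly EC operation (e.g. ECDH). This is NOT
    # elliptic-curve math; it only simulates "slow, deterministic" work.
    digest = scan_key + tx_pubkey
    for _ in range(1000):
        digest = hashlib.sha256(digest).digest()
    return digest


def preserialize(scan_key: bytes, tx_pubkeys) -> bytes:
    # Stage 1 (expensive): compute every shared secret once and pack the
    # results into a flat blob that can be cached or written to disk.
    return b"".join(expensive_shared_secret(scan_key, pk) for pk in tx_pubkeys)


def expected_tag(secret: bytes) -> bytes:
    # Hypothetical derivation of the value a matching output would carry.
    return hashlib.sha256(b"tag" + secret).digest()


def match(blob: bytes, outputs_per_tx) -> list:
    # Stage 2 (cheap): one hash and one set lookup per transaction,
    # reading the pre-serialized secrets instead of recomputing them.
    hits = []
    for i, outputs in enumerate(outputs_per_tx):
        secret = blob[i * RECORD_SIZE:(i + 1) * RECORD_SIZE]
        if expected_tag(secret) in outputs:
            hits.append(i)
    return hits
```

In this model the blob produced by `preserialize` is what gets cached or pipelined; `match` can be rerun against it without touching the expensive stage again.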

For normal wallet usage, the user usually does not need to scan months or years of history every time the app opens. Most of the time, the wallet only needs to catch up a small number of recent blocks.

This means a practical architecture can be:

local CPU scan for daily use
local GPU acceleration if available
optional accelerated catch-up only when the wallet has been offline for a long time

That model is much more compatible with user sovereignty than assuming every wallet needs a permanent remote scanning server.

Why I am posting here

I am interested in feedback from BCH developers on three questions:

Which BCH wallet or application use cases would benefit most from a faster secp256k1 execution engine?
Are there BCH-native privacy or reusable address designs where this kind of scan acceleration would be useful?
What assumptions or risks should be considered before integrating such an engine into BCH wallets or infrastructure?

I am not asking anyone to trust the project blindly. The repository is open, the benchmarks are reproducible, and the audit pipeline is designed to be extended by others.

The most useful feedback would be adversarial:

which assumption looks weak?
which benchmark is missing?
which integration mode is unrealistic?
which BCH use case should be tested first?

My goal is to make this useful infrastructure, not just a performance claim.

ShadowOfHarbringer · April 29, 2026, 1:32pm

I looked at your GitHub page and did not find any benchmarks comparing current node software (like BCHN, Knots, BCHD) to your implementation. Are they there? I also did a search and found nothing.
Some of the benchmarks only compare a single instance of Secp256k1, which executes in nanoseconds. But comparing a single execution is not really useful, because the result can be skewed for a hundred different reasons. For proper benchmarking and more reliable results, you should always compare multiple executions, like 1,000, 10,000, 100,000 or more.

shrec · April 29, 2026, 9:33am

There are many benchmarks, but they are for Bitcoin, not Bitcoin Cash. That is not a problem, though: the repo also contains all the benchmark tools, which can be built and launched very easily, so you can reproduce anything yourself without trusting me on anything.

ShadowOfHarbringer · April 29, 2026, 2:07pm

Oh well, since you’re posting on a Bitcoin Cash forum, it is understandable that people will expect specific benchmarks for BCH node software.

BTW did you know that, contrary to BTC, BCH has 6 different implementations of Node software (3 actively maintained AFAIK)?

That’s a lot of exciting benchmarks to do.

Especially Knuth is very performance-oriented.

shrec · April 29, 2026, 2:20pm

Thanks for the information. I will add the BCH stuff separately soon; I have already finished a BCH shim. Most of the time I was testing everything on BTC, but I’m open to any new ideas, benchmarks, and implementations if something is missing.

ShadowOfHarbringer · April 29, 2026, 6:31pm

Hey, I understand at least some of these words!

shrec · April 29, 2026, 6:36pm

In the coming days I will add a new benchmark suite for Bitcoin Cash to the repo and will also publish my results there, so if you are interested you will be able to reproduce everything on your side. If you have any technical questions or suggestions, I’m open to everything.

shrec · April 29, 2026, 7:44pm

This is from Knuth’s own benchmark tool (10×20,000 ops, median):

bitcoincashautist · April 29, 2026, 8:03pm

Impressive work! Did some other cryptocurrencies already start using your lib?

AFAIK it’s currently not a bottleneck anywhere, although it is nice to know there’s an option for a 3-4x speed-up. It would be nice if nodes could make use of GPUs to speed up validation, but this is really most relevant for pool nodes.

Due to the inherited Bitcoin implementation architecture, block validation delays block propagation (cs_main lock), so this would let us kick the can down the road and speed up relaying without having to re-architect the node.

Are there BCH-native privacy or reusable address designs where this kind of scan acceleration would be useful?

Yes, there’s “RPA” (reusable payment addresses) which needs some EC grinding.

What assumptions or risks should be considered before integrating such an engine into BCH wallets or infrastructure?

I guess generic concerns apply: correctness (especially if implemented in consensus), and security (side-channel attacks).

You may be interested in a related blockchain, a code-fork of BCH called Nexa. They actually use EC-mul as part of their PoW, and the idea was to incentivize creation of efficient hardware and libs for EC ops. I don’t know what implementation their miners use now, but maybe your lib would be competitive?

shrec · April 29, 2026, 8:07pm

These are CPU benchmarks; the GPU is much faster. At the moment, the Frigate Electrum server has integrated it for Bitcoin silent payments. With a 5060 Ti I am achieving a scan speed of 11 million tx per second, which makes silent payments practical. Most coins that use the secp256k1 curve can use my engine, and I am now writing shims for different coin nodes.

UF vs Bitcoin Core bench: +5…+41%
UF vs Knuth: 3.6-3.8× faster
None of these are my bench tools; all of these numbers come from the projects’ own original bench tools.

shrec · April 29, 2026, 8:11pm

As for correctness, you can look at my CAAS system; you will find all the answers there.

github.com

shrec/UltrafastSecp256k1/blob/main/docs/CAAS_PROTOCOL.md

# CAAS — Continuous Audit as a Service Protocol

> Version: 1.0 — 2026-04-21
> Authoritative spec for the CAAS pipeline used by `caas_runner.py` and
> `.github/workflows/caas.yml`. Pairs with `AUDIT_MANIFEST.md` P20.

## Origin & Motivation

CAAS started from two questions the author kept asking himself while
building UltrafastSecp256k1:

1. **"If I don't have $100k, does my project never get to see daylight?"**
   Traditional cryptographic audits cost between $40k and $250k per
   engagement (Trail of Bits, NCC Group, Cure53, Quantstamp). For most
   open-source crypto authors that price tag is prohibitive. The result
   is a two-tier ecosystem: well-funded code gets a PDF, everything else
   ships uninspected. That is not a security model — it is a budget
   filter. A real-world, useful, fast crypto engine should not be
   blocked from production use just because its author cannot write a
   six-figure check.

This file has been truncated.

bitcoincashautist · April 29, 2026, 8:20pm

I like the idea behind this, it really sounds promising!

Thanks for sharing your work here, it may come in handy one day. Note that attention is scarce and BCH people are spread thin working on other stuff, so don’t expect folks to dig deep into this now.

shrec · April 29, 2026, 8:29pm

Thanks for your time and effort. My goal is to share it with everyone who may use it and make things better.
Like you, others will see it, and someone will use it and see the results and benefits.

shrec · April 29, 2026, 8:49pm

Thanks, this is very useful context.

RPA is exactly the kind of BCH-native use case I was looking for. My main interest is not “replace something just for speed”, but finding places where faster secp256k1 execution can make privacy/reusable-address designs more practical.

For BCH, I’ll look at:

the RPA spec
the Electron Cash implementation PR
where the EC grinding/scanning cost appears
whether this can be benchmarked as a standalone workload first

I agree that correctness and side-channel behavior matter most if anything touches wallet or consensus paths. My preferred model is: start outside consensus, publish reproducible benchmarks/tests, and only then discuss integration points.

shrec · April 29, 2026, 8:55pm

More benchmarks, if you are interested in them:

ShadowOfHarbringer · April 30, 2026, 10:45am

I think counting a single execution each time and then making an average/median could be skewing the results.

Is it not possible to execute the code 100,000 times and count the final time instead?

Well, I could be wrong tho

shrec · April 30, 2026, 11:04am

These are not my benchmarks. When I integrate the shim, I use their benchmark tools, not mine; all of these measurements come from the Knuth benchmark harness.

bitcoincashautist · April 30, 2026, 10:49am

It’s typical to do it like you say (timestamp, execute many times, timestamp) and then report the average for 1 execution. If a report shows per-execution figures, it doesn’t mean that just 1 execution was measured.
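A minimal sketch of that timing pattern, assuming nothing about how the Knuth harness is actually written: each run takes one timestamp before and one after a whole batch of calls, divides by the batch size, and the median across runs is reported. The function name `bench` and its defaults (10 runs of 20,000 ops, matching the figures quoted earlier in the thread) are illustrative choices.

```python
import statistics
import time


def bench(fn, ops_per_run=20_000, runs=10):
    """Return the median per-operation time in nanoseconds.

    Each run times the whole batch with a single pair of timestamps, so
    timer overhead and one-off stalls are spread over ops_per_run calls.
    """
    per_op_ns = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        for _ in range(ops_per_run):
            fn()
        elapsed = time.perf_counter_ns() - start
        per_op_ns.append(elapsed / ops_per_run)
    # The median across runs damps the effect of one noisy run.
    return statistics.median(per_op_ns)
```

For example, `bench(lambda: sum(range(100)))` reports a per-call figure in nanoseconds even though 200,000 calls were actually timed.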

ShadowOfHarbringer · April 30, 2026, 10:50am

Yeah you could be right.

However, it is presented in a way that makes it look like that’s how it happened.

shrec · April 30, 2026, 10:55am

Never trust me: take anything from the repo and reproduce it yourself. Everything needed for tests and benchmarks is available in the repo, so you can test and reproduce anything yourself. Everything is documented and available; I’m not hiding anything.