UltrafastSecp256k1: A High-Performance secp256k1 Execution Engine for Wallets, Privacy, and Scalable BCH Applications

Hi everyone,

I would like to introduce UltrafastSecp256k1, an open-source secp256k1 cryptographic execution engine I have been developing.

The project started as a performance-focused secp256k1 implementation, but over time it evolved into something broader: a multi-backend cryptographic engine with CPU, GPU, audit, and application-level pipelines built into one repository.

The main idea is simple:

secp256k1 should not only be correct and secure; it should also be fast enough to make privacy and wallet features practical for everyday users.

What the project provides

UltrafastSecp256k1 includes:

  • CPU fast-path secp256k1 operations
  • CPU constant-time paths for secret-bearing operations
  • CUDA backend
  • OpenCL backend
  • Metal backend
  • C ABI
  • batch ECDSA / Schnorr verification
  • ECDH
  • BIP-340-related primitives
  • BIP-352-style scanning pipelines
  • FROST partial verification
  • MuSig2/FROST-related infrastructure
  • ZK/DLEQ/witness-oriented primitives
  • benchmark tooling
  • reproducible audit and assurance infrastructure

The goal is not merely to optimize one primitive in isolation. The goal is to provide a full execution layer that builders can use to assemble higher-level wallet, privacy, and indexing systems.

In short:

primitives are included, pipelines are ready, and verification is built in.

Why this matters for BCH

Bitcoin Cash has a strong focus on practical payments, low fees, and user-facing applications. Many privacy and wallet features become difficult not because the cryptography is impossible, but because the required scanning, indexing, or verification work becomes too expensive for normal devices.

This is especially true for privacy-oriented reusable payment schemes, stealth-address-like designs, and wallet-side scanning models.

If a wallet must scan many outputs or compute many elliptic-curve operations, the user experience can degrade quickly. On desktop this may be tolerable. On mobile, it can become painful.

My work focuses on reducing that friction.

Local-first, GPU-optional design

A key design principle is:

GPU acceleration should be an option, not a dependency.

The engine supports GPU acceleration where available, but I am also working on CPU-side scan optimizations so that normal devices can remain useful without relying on a remote accelerator.

The intended model is:

  • CPU mode for maximum portability and local-first wallet behavior
  • local GPU mode for desktops or laptops with OpenCL/CUDA/Metal support
  • remote accelerator mode only when a user or service explicitly wants fast catch-up or high-volume indexing
  • hybrid mode where recent blocks are scanned locally first, while large historical backlogs can be accelerated separately
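
As a rough illustration of the mode list above, the choice between backends could be expressed as a small planning function. This is only a sketch under assumptions: the names `ScanContext`, `plan_scan`, `LARGE_BACKLOG`, and `RECENT_WINDOW`, and the threshold values, are hypothetical and are not taken from the UltrafastSecp256k1 repository.

```python
from dataclasses import dataclass


@dataclass
class ScanContext:
    blocks_behind: int         # how many blocks the wallet still has to scan
    local_gpu_available: bool  # a local OpenCL/CUDA/Metal device was detected
    remote_opt_in: bool        # the user explicitly enabled a remote accelerator


# Hypothetical thresholds, chosen only for illustration.
LARGE_BACKLOG = 10_000  # above this, the backlog counts as "large"
RECENT_WINDOW = 144     # recent blocks that are always scanned locally first


def plan_scan(ctx: ScanContext) -> list[tuple[str, int]]:
    """Return scan steps as (backend, block_count), in execution order."""
    local = "local-gpu" if ctx.local_gpu_available else "cpu"
    # The remote accelerator is never used unless the user opted in.
    if ctx.blocks_behind <= LARGE_BACKLOG or not ctx.remote_opt_in:
        return [(local, ctx.blocks_behind)]
    # Hybrid mode: recent blocks locally first, historical backlog remotely.
    recent = min(RECENT_WINDOW, ctx.blocks_behind)
    return [(local, recent), ("remote", ctx.blocks_behind - recent)]
```

The point of the sketch is that the remote path is reachable only through an explicit opt-in, so the default behavior stays local-first.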

This avoids turning privacy features into server dependencies.

CPU PreSer mode

One of the current optimization directions is what I call CPU PreSer.

The idea is to split a scan pipeline into two stages:

  1. an expensive stage that computes and serializes shared-secret material
  2. a cheaper online matching stage that uses the pre-serialized material

This allows wallets to cache, pipeline, or reuse work instead of treating every scan as a fully fresh computation.
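
The two-stage split can be sketched structurally as follows. This is not the engine's API and not real cryptography: `expensive_shared_secret` and `expected_output_tag` are hypothetical stand-ins that use SHA-256 where a real scanner would perform elliptic-curve operations. Only the shape of the pipeline is the point: stage 1 runs once and its output is cached as bytes, stage 2 reuses it.

```python
import hashlib


def expensive_shared_secret(scan_key: bytes, tx_pubkey: bytes) -> bytes:
    # STAND-IN ONLY: a real scanner would do an EC multiplication (ECDH) here.
    return hashlib.sha256(b"ecdh-placeholder" + scan_key + tx_pubkey).digest()


def expected_output_tag(secret: bytes) -> bytes:
    # STAND-IN ONLY: represents deriving the expected output from the secret.
    return hashlib.sha256(b"output-placeholder" + secret).digest()


def preserialize(scan_key: bytes, tx_pubkeys: dict[str, bytes]) -> dict[str, bytes]:
    """Stage 1: compute shared-secret material once and keep it as bytes."""
    return {
        txid: expensive_shared_secret(scan_key, pubkey)
        for txid, pubkey in tx_pubkeys.items()
    }


def match(cache: dict[str, bytes], outputs: dict[str, set[bytes]]) -> list[str]:
    """Stage 2: cheap online matching against the pre-serialized material."""
    return [
        txid
        for txid, tags in outputs.items()
        if txid in cache and expected_output_tag(cache[txid]) in tags
    ]
```

Because the cache is plain bytes keyed by transaction id, it could be written to disk, produced by a different backend, or filled ahead of time, which is what makes the caching and pipelining described above possible.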

For normal wallet usage, the user usually does not need to scan months or years of history every time the app opens. Most of the time, the wallet only needs to catch up a small number of recent blocks.

This means a practical architecture can be:

  • local CPU scan for daily use
  • local GPU acceleration if available
  • optional accelerated catch-up only when the wallet has been offline for a long time

That model is much more compatible with user sovereignty than assuming every wallet needs a permanent remote scanning server.

Why I am posting here

I am interested in feedback from BCH developers on three questions:

  1. Which BCH wallet or application use cases would benefit most from a faster secp256k1 execution engine?
  2. Are there BCH-native privacy or reusable address designs where this kind of scan acceleration would be useful?
  3. What assumptions or risks should be considered before integrating such an engine into BCH wallets or infrastructure?

I am not asking anyone to trust the project blindly. The repository is open, the benchmarks are reproducible, and the audit pipeline is designed to be extended by others.

The most useful feedback would be adversarial:

  • which assumption looks weak?
  • which benchmark is missing?
  • which integration mode is unrealistic?
  • which BCH use case should be tested first?

My goal is to make this useful infrastructure, not just a performance claim.

7 Likes
  1. I looked at your GitHub page and did not find any benchmarks comparing current node software (like BCHN, Knuth, BCHD) to your implementation. Are they there? I also ran a search and found nothing.

  2. Some of the benchmarks only compare a single instance of Secp256k1, which executes in nanoseconds. But it is not really useful to compare only a single execution, because the result can be skewed for a hundred different reasons. For proper benchmarking and more reliable results you should always compare multiple executions, like 1,000, 10,000, 100,000 or more.

3 Likes

There are many benchmarks, but they are for Bitcoin, not Bitcoin Cash. That is not a problem, though: the repo also contains all the benchmark tools, which can be built and launched very easily, so you can reproduce anything yourself without trusting me on anything.

Oh well, since you’re posting on a Bitcoin Cash forum, it is understandable that people will expect specific benchmarks for BCH node software.

BTW did you know that, contrary to BTC, BCH has 6 different implementations of Node software (3 actively maintained AFAIK)?

That’s a lot of exciting benchmarks to do.

Especially Knuth is very performance-oriented.

3 Likes

Thanks for the information. I will add BCH stuff soon, separately. I have already finished the BCH shim; it is just that most of the time I was testing everything on BTC. I am open to any new ideas, benchmarks, and implementations, and to hearing if something is missing.

Hey, I understand at least some of these words! :wink:

In the coming days I will add a new benchmark suite for Bitcoin Cash to the repo and will also publish my results there, so if you are interested you will be able to reproduce everything on your side. If you have any technical questions or suggestions, I am open to everything.

[image: results from Knuth's own benchmark tool, 10×20,000 ops, median]

1 Like

Impressive work! Did some other cryptocurrencies already start using your lib?

AFAIK it’s currently not a bottleneck anywhere, although it is nice to know there’s an option for a 3-4x speed-up. It would be nice if nodes could make use of GPUs to speed up validation, but this is really most relevant for pool nodes.

Due to inherited Bitcoin implementation architecture, block validation delays block propagation (cs_main lock), so this would let us kick the can down the road and speed up relaying without having to re-architect the node :sweat_smile:

Are there BCH-native privacy or reusable address designs where this kind of scan acceleration would be useful?

Yes, there’s “RPA” (reusable payment addresses) which needs some EC grinding.

What assumptions or risks should be considered before integrating such an engine into BCH wallets or infrastructure?

I guess generic concerns apply: correctness (especially if implemented in consensus), and security (side-channel attacks).


You may be interested in a related blockchain, a code-fork of BCH called Nexa. They actually use EC-mul as part of their PoW, and the idea was to incentivize creation of efficient hardware and libs for EC ops. I don’t know what implementation their miners use now, but maybe your lib would be competitive?

1 Like

These are CPU benchmarks; the GPU is much faster. At the moment the Frigate Electrum server has integrated it for Bitcoin silent payments. With a 5060 Ti I am achieving a scan speed of 11 million tx per second, which makes silent payments practical. Most coins that work on the secp256k1 curve can use my engine, and I am now writing shims for different coin nodes.

  • UF vs Bitcoin Core bench: +5…+41%
  • UF vs Knuth: +3.6-3.8×
    None of these are my bench tools; all of these numbers come from the projects' own original bench tools.
1 Like

As for correctness, you can look at my CAAS system; you will find all the answers there.

2 Likes

I like the idea behind this, it really sounds promising!

Thanks for sharing your work here, it may come in handy one day. Note that attention is scarce and BCH people are spread thin working on other stuff, so don’t expect folks to dig deep into this now.

2 Likes

Thanks for your time and effort. My goal is to share this with everyone who may use it and make things better.
Like you, others will see it; someone will use it and see the results and benefits.

2 Likes

Thanks, this is very useful context.

RPA is exactly the kind of BCH-native use case I was looking for. My main interest is not “replace something just for speed”, but finding places where faster secp256k1 execution can make privacy/reusable-address designs more practical.

For BCH, I’ll look at:

  • the RPA spec
  • the Electron Cash implementation PR
  • where the EC grinding/scanning cost appears
  • whether this can be benchmarked as a standalone workload first

I agree that correctness and side-channel behavior matter most if anything touches wallet or consensus paths. My preferred model is: start outside consensus, publish reproducible benchmarks/tests, and only then discuss integration points.

1 Like

[image]
More benchmarks, if you are interested in them.

3 Likes

I think timing a single execution each time and then taking an average/median could be skewing the results.

Is it not possible to execute the code 100,000 times and count the final time instead?

Well, I could be wrong tho

These are not my benchmarks. I am integrating the shim there and using their benchmark tools, not mine; all of these measurements come from the Knuth harness, not from my own tools.

1 Like

It’s typical to do it like you say (timestamp, execute many times, timestamp) and then report the average for 1 execution. If report shows per-execution, it doesn’t mean it was just 1 execution that was measured.
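
For readers who want to see that pattern concretely, here is a minimal sketch of a batch-timed benchmark that reports a per-operation figure. It is not the Knuth harness or any tool from the repository; the function name `bench` and its defaults are assumptions, with the defaults chosen only to mirror the "10×20,000 ops, median" label mentioned earlier in the thread.

```python
import statistics
import time


def bench(fn, ops_per_batch: int = 20_000, batches: int = 10) -> float:
    """Return the median per-operation time in nanoseconds across batches."""
    per_op_ns = []
    for _ in range(batches):
        start = time.perf_counter_ns()   # timestamp before the batch
        for _ in range(ops_per_batch):
            fn()                         # execute the operation many times
        elapsed = time.perf_counter_ns() - start  # timestamp after the batch
        per_op_ns.append(elapsed / ops_per_batch)
    return statistics.median(per_op_ns)
```

Each reported number is therefore an average over a whole batch, and the median across batches reduces the influence of a batch disturbed by scheduling or cache effects.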

Yeah you could be right.

However it is presented in a way that makes it look like that’s how it happened :man_shrugging:

Never trust me: take anything from the repo and reproduce it yourself. Everything needed for tests and benchmarks is available in the repo, so you can test and reproduce anything yourself. Everything is documented and available; I am not hiding anything.

1 Like