AlbaDsl & albaVm: Haskell based DSL and VM for Bitcoin Cash 2025 contract programming

Very cool project @albaDsl, thanks for sharing! Have you thought about or experimented with getting more efficient stack scheduling in the compiled output?

2 Likes

Thanks! So far I have only thought about it to the extent of including part of the CashScript optimization rules:

https://github.com/albaDsl/alba-dsl/blob/main/src/Alba/Dsl/V1/Common/CashScriptOptimizerRules.hs

Since it is an opcode-level DSL, some of the responsibility also falls on the developer, e.g. using “roll” over “pick” in the right places. But I’m sure there is a lot more the tool can do.

Also, some contract developers may choose to sacrifice efficiency and byte size in order to get a one-to-one correspondence between the source code and the compiled output.

In the contract examples I have provided I haven’t focused on bytecode efficiency, just on quickly getting something that solves the problem.

1 Like

Makes sense! I’d be curious how you might apply Haskell to the next level of optimization, even just limited to optimizing sequences of contiguous stack juggling operations like this tool (simply patched for BCH – note this optimizes for opCount though, the old blocker before 2025 limits): https://github.com/bitjson/bch-wizard

(Edit: and then with type awareness, it seems likely that the DSL could identify the input/output patterns of the business logic between such sequences, allowing it to work one level higher at suggesting better orderings of business logic blocks. Not sure how far you can go with tracking specific output → input requirements of that business logic. Is there a path all the way to “represent blocks of constraints as a DAG and reschedule an optimal, provably-equivalent program”?)

With “efficient stack scheduling”, what do you have in mind beyond op count reduction? I’m thinking that shuffling code blocks to, for example, avoid deep stack references all boils down to reducing the opcode count (byte size).

Making the right data layout choices in byte arrays (records) would also have a large impact.

Yep! For most use cases, contract length reduction is the key metric – between two equivalent contracts (equivalent functionality, safety, audit quality, etc.), compiled length is what impacts real user experiences (TX fees).

Though if you want the compiler to always be able to minimize ultimate TX sizes, you also need an op-cost minimizing “mode” to switch into when the contract’s worst case(s) exceed its budget. E.g. there are cost-prioritized and length-prioritized emulations for various bitwise ops right now, and picking the right one matters a lot for ultimate contract length if a naive length-focused compilation exceeds the op-cost budget. (The compiler can also add a <variable_length_garbage> [...] OP_DROP area to give some spend-time op-cost flexibility if the budget is too close or can’t be proven safe in all cases.)

2 Likes

Loops (OP_BEGIN / OP_UNTIL) have now been added to the 2026 VM and DSL. A few other language constructs were also added, and some optimizations were made to the VM and the examples.
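
The construct works like Forth’s BEGIN ... UNTIL: the body between OP_BEGIN and OP_UNTIL repeats until the value popped by OP_UNTIL is true. A rough plain-Haskell model of that control flow (illustrative only, not albaDsl syntax):

-- Run the body repeatedly until it reports done; the flag corresponds to
-- the value popped and tested by OP_UNTIL.
untilLoop :: (s -> (Bool, s)) -> s -> s
untilLoop body st =
  let (done, st') = body st
   in if done then st' else untilLoop body st'

-- Example: sum the integers 1..5 with loop state (counter, accumulator).
sumTo5 :: Int
sumTo5 = snd (untilLoop step (0, 0))   -- evaluates to 15
  where
    step (i, acc) = let i' = i + 1 in (i' == 5, (i', acc + i'))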

3 Likes

Initial support for OP_DEFINE / OP_INVOKE has been added to the 2026 VM and DSL. Both functions (global namespace) and lambdas were added based on this construct. The OP_EVAL-based lambdas were removed but may be added back later (as experimental) for mutable, closure-like support.

3 Likes

Work continued on the albaDsl secp256k1 EC multiply demo, which also led to some improvements in albaVm efficiency. Steps were taken to optimize the EC code: VM2026 functions were introduced to compress the code, and the algorithm was changed to use Jacobian coordinates. The current code size is ~1300 bytes. Since loops are now used instead of recursion, execution fits within the current stack limits. On albaVm, multiplying G by n - 1 (where n is the curve order) now takes around 40 ms (20x that of a straightforward native Haskell implementation). The EC algorithm/implementation can be improved further to bring down the execution time and code size.
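
For those curious why Jacobian coordinates help: an affine point (x, y) is represented as (X, Y, Z) with x = X/Z^2 and y = Y/Z^3, which lets point addition and doubling avoid the expensive modular inversion until a single conversion back to affine at the end. A plain Haskell sketch of the standard doubling step (illustrative only, not the albaDsl code; identity and y = 0 edge cases omitted):

prime :: Integer
prime = 2 ^ 256 - 2 ^ 32 - 977   -- the secp256k1 field prime

-- Double a Jacobian point (X, Y, Z). The formulas use that secp256k1 has
-- curve parameter a = 0. All arithmetic is modulo 'prime'; no field
-- inversion is required.
ecDouble :: (Integer, Integer, Integer) -> (Integer, Integer, Integer)
ecDouble (x, y, z) = (x', y', z')
  where
    s  = 4 * x * y * y `mod` prime
    m  = 3 * x * x `mod` prime
    x' = (m * m - 2 * s) `mod` prime
    y' = (m * (s - x') - 8 * y ^ 4) `mod` prime
    z' = 2 * y * z `mod` prime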

3 Likes

The VM 2026 function support in albaDsl has been revamped somewhat. VM 2026 functions are now managed in the same way as macros and can thus be more easily grouped into Haskell modules and referenced without fuss. During compilation, albaDsl collects all functions into a function table at the beginning of the program (using OP_DEFINE). Slots are assigned to functions based on how frequently they are used. The function breakdown for the EC multiplication demo can be seen below. Program size is now around 900 bytes and execution time on albaVm is ~33 ms.

Module                                   Line  Function                  Ops   Slot  Sites
------------------------------------------------------------------------------------------
DslDemo.EllipticCurve.Field              44    primeModulus              1     0     57
DslDemo.EllipticCurve.Field              47    modulo                    20    1     9
DslDemo.EllipticCurve.Jacobian           37    ecMul                     6     2     7
DslDemo.EllipticCurve.JacobianAdd        18    ecDoubleJ                 147   3     2
DslDemo.EllipticCurve.Jacobian           40    ecMulJ                    70    4     1
DslDemo.EllipticCurve.Jacobian           73    toJacobian                58    5     1
DslDemo.EllipticCurve.Jacobian           89    fromJacobian              177   6     1
DslDemo.EllipticCurve.JacobianAdd        65    ecAddJ                    335   7     1
2 Likes

The EC multiply code is now 600 bytes. The function table can be seen below. Note that some functions are quite small but called frequently.

Module                                   Line  Function                  Slot  Ops   Sites
------------------------------------------------------------------------------------------
DslDemo.EllipticCurve.Field              27    feMul                     0     4     36   
DslDemo.EllipticCurve.Field              24    feSub                     1     20    9    
DslDemo.EllipticCurve.Field              33    feCube                    2     6     5    
DslDemo.EllipticCurve.JacobianPoint      44    makeIdentity              3     15    4    
DslDemo.EllipticCurve.JacobianPoint      61    getX                      4     5     4    
DslDemo.EllipticCurve.JacobianPoint      66    getY                      5     5     4    
DslDemo.EllipticCurve.JacobianPoint      71    getZ                      6     4     4    
DslDemo.EllipticCurve.Field              45    primeModulus              7     1     3    
DslDemo.EllipticCurve.JacobianPoint      25    makePoint                 8     18    3    
DslDemo.EllipticCurve.JacobianPoint      58    getTag                    9     4     3    
DslDemo.EllipticCurve.Field              39    feInv                     10    41    2    
DslDemo.EllipticCurve.JacobianAdd        19    ecDoubleJ                 11    75    2    
DslDemo.EllipticCurve.JacobianPoint      76    getField                  12    7     2    
DslDemo.EllipticCurve.Jacobian           38    ecMul                     13    48    1    
DslDemo.EllipticCurve.Jacobian           70    toJacobian                14    29    1    
DslDemo.EllipticCurve.Jacobian           81    fromJacobian              15    57    1    
DslDemo.EllipticCurve.JacobianAdd        40    ecAddJ                    16    183   1    
Functions total: 17
3 Likes

A feature was added to albaVm to export execution logs to HTML, and a feature was added to albaDsl for sharing information about the function table with albaVm. This way, the function definitions & calls in the HTML can be decorated/annotated. This can be used to illustrate what I think is the key feature of the Functions CHIP: functions provide a means of abstraction.

As an example, we will evaluate this albaDsl program, which uses a mini-language of functions (‘feMul’, ‘feAdd’, ‘feSub’, and ‘feCube’) to perform modular arithmetic on field elements (integers):

progAbstraction :: FN s (s > TBool)
progAbstraction =
  begin
    # int 3 # int 7 # feMul # int 1 # feAdd # int 12 # feSub # feCube
    # int 1000 # opNumEqual

The function definitions are not shown here. In this example, the integer arguments are much smaller than the modulus, so the results of the function applications are the same as those of the corresponding arithmetic operators: (3 * 7 + 1 - 12)^3 = 10^3 = 1000, and the program leaves true on the stack.
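
For reference, here is a plain Haskell sketch of what the mini-language computes (the actual albaDsl definitions are omitted; the modulus shown is the secp256k1 field prime, used here purely for illustration):

fieldPrime :: Integer
fieldPrime = 2 ^ 256 - 2 ^ 32 - 977

feMul, feAdd, feSub :: Integer -> Integer -> Integer
feMul a b = (a * b) `mod` fieldPrime
feAdd a b = (a + b) `mod` fieldPrime
feSub a b = (a - b) `mod` fieldPrime

feCube :: Integer -> Integer
feCube a = a `feMul` a `feMul` a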

Log #1 shows the execution when functions are not used, i.e. ‘feMul’ etc. are defined as macros. Execution log #2 shows the result when they are defined as functions and the logs have been annotated with the function names.

In log #1 it is hard to see the overall structure of the program and how it ties back to the source code. There is a lot of repeated code which one has to continually re-read while trying to grasp what is going on. With functions, the repetition is factored out (log #2). We can read the code at a higher abstraction level and not worry about the underlying details. If we are interested in the details, we can expand the function calls by clicking on the respective row. The logging annotation framework hides the OP_INVOKEs so that the functions can be read as higher-level opcodes (user-defined words in Forth lingo).

I think it is a strength of Bitcoin that the opcode language is readable and not just seen as a compilation target. The functions CHIP greatly improves readability by adding structure and removing repetition. And with functions there can be a one-to-one correspondence between source language constructs and opcode level constructs.

Note how functions in some cases are just a few opcodes long. It is important that the function mechanism is very lightweight for this to be feasible.

3 Likes

I implemented a simple windowed version of Elliptic Curve point multiplication. It improves performance and also shows how OP_DEFINE/OP_INVOKE can be used to implement dynamically computed lookup tables.

My previous changes to reduce the code size using functions came with a performance regression to ~38 ms. But the windowed version of EC multiply brings that down to 26 ms (a ~30% improvement). The code size for table setup and multiply comes to 723 bytes.

The following code shows how to use the windowed EC multiply:

test :: FN s (s > TPoint > TPoint)
test =
  begin
    # gTable # g # setupTable
    # gTable # nat 100_000_000_000 # ecMul
    # gTable # nat 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364140 # ecMul
  where
    gTable = nat 100

    g = <generator point>

In this case we calculate n*G where G is the generator point, but we could have used any other point on the curve. We first initialize a table for G. It starts at function slot 100 and stretches to slot 115. The table contains entries for n’*G where n’ goes from 0…15, and each entry is 100 bytes long. We then use the precomputed table in two point multiplications, first with a small scalar and then with a large scalar.

The algorithm works by first breaking down ‘n’ into a string of nibbles. To store these, a bytestring on the stack is used: each nibble is stored in a byte and the bytes are then OP_CATed together. I refer to these as vectors. The algorithm then OP_SPLITs off one nibble at a time from the vector and uses it as an index into the lookup table to get a point value that is then used in the rest of the computation.
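
In plain Haskell terms, the window method looks roughly like the sketch below (illustrative only, not the albaDsl implementation); 'dbl', 'add', and 'zero' stand for the group operations and identity, and 'table' plays the role of the lookup table in slots 100..115:

-- 4-bit windowed scalar multiplication, abstracted over the group operations.
-- 'table' holds the precomputed multiples 0*G .. 15*G.
ecMulWindowed :: (a -> a) -> (a -> a -> a) -> a -> [a] -> Integer -> a
ecMulWindowed dbl add zero table n = foldl step zero (toNibbles n)
  where
    step acc d =
      let acc16 = iterate dbl acc !! 4            -- acc := 16 * acc (4 doublings)
       in add acc16 (table !! fromIntegral d)     -- add the table entry for this nibble

    -- Big-endian 4-bit digits of the scalar.
    toNibbles 0 = [0]
    toNibbles k = go k []
      where
        go 0 ds = ds
        go m ds = go (m `div` 16) (m `mod` 16 : ds)

-- Sanity check in the additive group of integers (take G = 1, so d*G = d):
--   ecMulWindowed (* 2) (+) 0 [0 .. 15] 12345 == 12345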

The lookup table could have been stored in a vector too, but OP_INVOKE-based lookups are more efficient (especially for random access, as here). Vectors have the benefit that they don’t consume the limited slot resource, so this is a trade-off. A combination of vectors and OP_DEFINE/OP_INVOKE probably gives the best results.

3 Likes

The 2026 Bitwise CHIP has been implemented in albaDsl & albaVm. The VM passes the Libauth success vectors for bitwise.

7 Likes

AlbaVm 2025 & AlbaVm 2026 now pass all the Libauth 2025 & 2026 tests respectively. For the success vectors, the VM Limits cost is also verified to be correctly calculated. What is left to do is to verify the correct VM Limits and failure reasons for the failure vectors. Example test run:

Tests
  Libauth vectors 2025
    bch_2025_standard in standard-mode:        Running all 7855 tests
                                               OK (0.55s)
    bch_2025_standard in non-standard-mode:    Running all 7855 tests
                                               OK (0.52s)
    bch_2025_nonstandard in standard-mode:     Running all 7968 tests
                                               OK (0.54s)
    bch_2025_nonstandard in non-standard-mode: Running all 7968 tests
                                               OK (1.17s)
    bch_2025_invalid in standard-mode:         Running all 12260 tests
                                               OK (0.75s)
    bch_2025_invalid in non-standard-mode:     Running all 12260 tests
                                               OK (0.85s)
  Libauth vectors 2026
    bch_2026_standard in standard-mode:        Running all 14283 tests
                                               OK (1.33s)
    bch_2026_standard in non-standard-mode:    Running all 14283 tests
                                               OK (1.28s)
    bch_2026_nonstandard in standard-mode:     Running all 2289 tests
                                               OK (0.22s)
    bch_2026_nonstandard in non-standard-mode: Running all 2289 tests
                                               OK (0.53s)
    bch_2026_invalid in standard-mode:         Running all 13159 tests
                                               OK (0.97s)
    bch_2026_invalid in non-standard-mode:     Running all 13159 tests
                                               OK (0.97s)

All 12 tests passed (9.68s)
4 Likes

Hey, do you feel like implementing SHA-256 in Script? And making it so that it can continue hashing from an intermediate state? With the current hash opcodes we’re limited to 10 kB blobs. With stream hashing we could chain transactions together and hash unlimited blobs.

1 Like

Are there compelling use cases for stream hashing in contracts? It could be an interesting project, but I have some other things I’m working on at the moment. If you want to try AlbaDsl, you could have a go at it yourself too. Right now I’m working on a vector library; perhaps it would even be useful for this problem.

I’m dabbling in it now, I just thought to ask you since you’ve already got EC implemented.
It can be used to verify TXID preimages greater than 10 kB in size, although parsing them would still be challenging.

I got most of it done, albeit struggling with stack juggling (draft, fb_compress, still incomplete and the OP_PICK depths are off), but I think I have a good ballpark estimate of the compute cost: 1 round needs ~3 kB of “filler” bytes :sweat_smile:
So hashing 1 MB (the max. consensus TX size) would require ~15k * 3 kB of “filler” bytes, meaning it would cost ~0.5 BCH and require a chain of at least 45 x 1 MB TXs.
Very impractical, so I’m dropping the idea. If we want to be able to analyze our own raw TXs with contracts, maybe dedicated opcodes would be better.
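
Working it out (assuming 1 sat/byte fees and ~3 kB of filler per compress round):

rounds, fillerBytes, txChainLen :: Integer
rounds      = 1000000 `div` 64            -- 15625 compress rounds to hash 1 MB
fillerBytes = rounds * 3000               -- ~46.9 MB of "filler" in total
txChainLen  = fillerBytes `div` 1000000   -- ~46 max-size (1 MB) transactions

feeBch :: Double
feeBch = fromIntegral fillerBytes / 1e8   -- ~0.47 BCH at 1 sat/byte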

2 Likes

The fillers are to pay for operation costs, I assume. In the coming years perhaps the VM per-op cost can be reduced significantly. But stream hashing, EC scalar point multiplication, and other compute-intensive crypto operations may still be very costly in Script. It would be nice if we could introduce generic opcodes (such as vector instructions) to help make such algorithms more practical in Script and reduce the need for custom opcodes. I’m not sure if we can find such generic solutions for these particular algorithms, but it is worth investigating.

2 Likes

I worked out the missing details and completed my sha256_compress implementation (fb_compress in the template).

BitAuth IDE link

The test vector matches sha256(0x1234): state_32b is set up to be the SHA-256 IV, and block_64b is the message padded to a 64-byte block.
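
For concreteness, block_64b is just standard SHA-256 padding applied to the 2-byte message; sketched in Haskell (illustrative, not part of the template):

import qualified Data.ByteString as BS

-- The single 64-byte block for the message 0x1234:
-- message || 0x80 || zero padding || 64-bit big-endian bit length (16 bits).
paddedBlock :: BS.ByteString
paddedBlock =
  BS.concat
    [ BS.pack [0x12, 0x34]
    , BS.singleton 0x80
    , BS.replicate 53 0x00
    , BS.pack [0, 0, 0, 0, 0, 0, 0, 16]
    ]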

The implementation is 632 bytes. To be able to run 1 round it needs to buy extra budget using an additional “filler” of 2966 bytes, totaling 3598 bytes of locking script. Practically unusable.

@bitjson was already saying he wants to propose a CHIP reducing the base cost, and I suspect that would make this script executable without any filler, at least for 1 round of compress. If the cost dropped so low that 64 bytes could “buy” 1 round of compress, then it would become feasible to use this script to hash some 9.3 kB and “carry” the result to the next TX.

Anyway, it was a fun challenge to implement. I’ll re-evaluate once we start talking about base cost reduction; maybe it will be usable for some future VM benchmark.

PS: this is actually useful as-is for my idea of a header + coinbase oracle. Even though I can’t parse a >10 kB coinbase TX in Script, I can at least use sha256_compress to prove that it is >10 kB without having to reveal anything other than the last 64-byte block of the TX. This way, the oracle can be convinced to just skip that block.

3 Likes