CHIP 2024-12 OP_EVAL: Function Evaluation

bitcoincashautist · January 22, 2025, 10:30am

Emulated eval allows MAST-like constructions but DOESN’T allow compression, because each “call” needs its own input to replicate & execute the code, plus the overhead of introspection opcodes gluing everything together.

MathieuG · January 22, 2025, 11:21am

CashScript user-defined functions

For CashScript we created an issue for reusable functions in July 2023. The use case we had in mind at the time was an easy emulation of muldiv that wouldn’t overflow in the intermediate result.

Our toy-example (which doesn’t contain the real logic) was to allow a .cash library file like the following:

library Math {
  function muldiv(int x, int y, int z) returns (int) {
    return x * y / z;
  }
}

which users could then use in their contract like

import { muldiv } from "Math.cash";

contract Example() {
    function test(int x, int y, int z) {
        int result = muldiv(x, y, z);
        ...
    }
}
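To make the overflow concern concrete, here is a minimal Python sketch (not CashScript, and not the real library logic). It assumes VM integers are limited to a signed 64-bit range and that inputs are non-negative; `INT64_MAX`, `fits`, `naive_muldiv` and `wide_muldiv` are illustrative names only.

```python
INT64_MAX = 2**63 - 1  # assumed VM integer bound, for illustration only


def fits(n: int) -> bool:
    """True if n lies within the assumed VM integer range."""
    return -INT64_MAX <= n <= INT64_MAX


def naive_muldiv(x: int, y: int, z: int) -> int:
    """x * y / z where the intermediate product must itself fit in range."""
    product = x * y
    if not fits(product):
        raise OverflowError("intermediate product out of range")
    return product // z


def wide_muldiv(x: int, y: int, z: int) -> int:
    """x * y / z with a wide intermediate; only the result is range-checked."""
    result = (x * y) // z  # Python integers do not overflow
    if not fits(result):
        raise OverflowError("result out of range")
    return result
```

With `x = y = z = 2**40` the naive version fails on the intermediate product even though the final result fits, which is the gap a muldiv emulation would have to close.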

CashScript, a modular programming language

Having callable functions with return types adds a lot of extra scope to CashScript. We’d now need to figure out import functionality/syntax, function dependency graphs, possibly also dependency versioning etc.

With this functionality CashScript would become a ‘modular programming language’, which is in line with industry standard but definitely an important expansion from the single-file contracts we have today.

It would definitely be a challenge, but it would make CashScript a ‘proper’, fully featured programming language.

User-defined functions

To have complex zero-knowledge programs, we need user-defined functions. OP_EVAL can be a compiler detail to the extent that it optimizes the resulting code, but for developer experience it is essential that users can define reusable function APIs.

As Jason wrote in the ‘brainstorming eval’ thread, we’d expect developers to reuse functions in their code, which in turn use other functions, and so on. The smart contract code to implement zero-knowledge proof verification should be readable as if it was written in JS. This logic is not blockchain specific, it is just math/cryptography.

What happens with the result of this function, how it affects the desired transaction shape, would of course be using BCH-specific logic using transaction introspection.

Function APIs

Jason described the problems with OP_EXEC lambda/function evaluation behavior:

I definitely agree with points 1 and 3, but I’m not sure I understand/agree with 2.

Let’s discuss how I envision CashScript would use ‘function APIs’.
Instead of:

code param1…paramN N_Params M_Returns OP_EXEC => ret1…retN

we would use the model

param1…paramN code OP_EVAL => ret1…retN

Instead of modifying the existing deep stack items, which might be at variable depths each time the reusable function is called, we just stack juggle the params to the top before executing the function and then leave the function results at the top of the stack. We wouldn’t modify deep stack elements.

This is the coding paradigm that CashScript reusable functions would have, similar to functional programming, functions have inputs and outputs and don’t have any side-effects of mutating/consuming the existing stack besides the function parameters.

If we did not provide the params up front, the alternative would be to stack juggle them inside the ‘eval’ reusable function logic, where there is no guarantee where in the stack things are. Functions would not really be reusable at all.

Stack protection doesn’t add any security, but I also don’t see CashScript usage modifying the original stack from inside a function, just consuming params on top of the stack & adding back results to the stack
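A toy model of the `param1…paramN code OP_EVAL => ret1…retN` convention, as a minimal Python sketch. It is not the BCH VM: programs are Python sequences, strings are opcodes, anything else is pushed as data, and only the handful of opcodes needed here are modeled. The `MULDIV` body is a made-up example function.

```python
def run(program, stack):
    """Execute a toy program: strings are opcodes, anything else is pushed."""
    for op in program:
        if op == "OP_MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "OP_DIV":
            b, a = stack.pop(), stack.pop()
            stack.append(a // b)
        elif op == "OP_SWAP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "OP_ROT":
            stack.append(stack.pop(-3))  # move the third item to the top
        elif op == "OP_EVAL":
            code = stack.pop()  # the function body sits on top of its params
            run(code, stack)    # it runs on the same stack, no isolation
        else:
            stack.append(op)
    return stack


# Hypothetical reusable function: consumes x, y, z and leaves x * y / z.
MULDIV = ("OP_ROT", "OP_ROT", "OP_MUL", "OP_SWAP", "OP_DIV")

# 99 is an unrelated deeper stack item; 6, 7, 3 are the params.
result = run([99, 6, 7, 3, MULDIV, "OP_EVAL"], [])
```

The function only touches the three params above the deeper item, so the same body works regardless of how much unrelated data sits below it.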

Counting arguments and clean_stack rule

In CashScript we also don’t have to check the number of unlocking arguments provided by a spender: the ‘clean stack’ rule makes sure that the program would not work if there are more or fewer arguments provided than the program expects. Just as we don’t have to count the number of unlocking arguments, we also don’t have to count the number of (user) arguments provided to a reusable function.

I do believe that without any clean_stack rule, these considerations would be different.

Jonas · January 22, 2025, 12:55pm

MathieuG:

we would use the model
param1…paramN code OP_EVAL => ret1…retN
Instead of modifying the existing deep stack items, which might be at variable depths each time the reusable function is called, we just stack juggle the params to the top before executing the function and then leave the function results at the top of the stack. We wouldn’t modify deep stack elements.

This is the coding paradigm that CashScript reusable functions would have, similar to functional programming, functions have inputs and outputs and don’t have any side-effects of mutating/consuming the existing stack besides the function parameters.

If we would not provide the params up front, the alternative is to try to stack juggle them inside the ‘eval’ reusable function logic, where there is no guarantee where in the stack things are. functions would not really be re-usable at all.

Stack protection doesn’t add any security, but I also don’t see CashScript usage modifying the original stack from inside a function, just consuming params on top of the stack & adding back results to the stack

That makes sense in the generic case. Compiling code to use OP_EVAL with deep stack elements would be an optimization. Consider the following code where foo(int) is only called in these two places.

int a = 10;
[...]
if(x == 1) {
  foo(a);
  [...]
} else if (x == 2) {
  foo(a);
  [...]
} else {
  [...]
}

Since the stack element a would be in the same location for both invocations of foo() the bytecode executed by OP_EVAL could very well be compiled to pick it from deep in the stack.

Jonas · January 22, 2025, 4:33pm

One thing I can’t put my finger on is the OP_ACTIVEBYTECODE behavior inside code executed within OP_EVAL. The old specification of it is:

Push the full bytecode currently being evaluated to the stack. For Pay-to-Script-Hash (P2SH) evaluations, this is the redeem bytecode of the Unspent Transaction Output (UTXO) being spent; for all other evaluations, this is the locking bytecode of the UTXO being spent.

What is the reason for changing this? If we kept that behavior and wanted a recursive OP_EVAL, the bytecode to be evaluated could be OP_DUP’d at the top of the stack before calling OP_EVAL.

It just feels like an asymmetry that the full stack is available but not the full active bytecode (unless it is pushed to the stack before evaluation).

bitjson · January 22, 2025, 9:38pm

Thanks for weighing in @MathieuG!

Purpose of OP_EVAL and loops: compression

I’ll add to the CHIP, thanks!

Exactly.

All of the value in both loops and OP_EVAL can be summarized as “compression”. Our VM can technically compute anything already, but saving KBs or MBs from transaction sizes makes many more use cases practical.

Clarifying review of OP_EXEC

MathieuG:

Jason described the problems with OP_EXEC lambda/function evaluation behavior:

bitjson:

1. The resulting transactions require more bytes than OP_EVAL.

2. Defining a per-contract, bytecode-based “OP_EXEC API” is complex, error prone, and incompatible with existing contract systems. E.g. for an existing multisig wallet, each multisig signer has to be taught to understand and sign for the new OP_EXEC-based input type rather than simply signing their known input type in new transactions.

3. Stack isolation adds no security value. When you compare an end-to-end example OP_EXEC-based delegation scheme vs. constructions that we already have today (sibling inputs, introspection, CashTokens, etc.), OP_EXEC systems have more potential security pitfalls with no bandwidth savings.

I definitely agree with point 1 and 3, but I’m not sure I understand/agree with 2

let’s discuss how I envision CashScript would use ‘function APIs’ […]

All good, #2 isn’t talking about internal “function APIs” but rather a hypothetical kind of secondary, external contract API. (A fuzzy idea that people are reasoning-by-analogy from OP_EXEC, not realizing that CashTokens are an already superior alternative available today.)

Deep stack juggling (usually) considered harmful

Perfect, this is an optimal compilation result for Forth-like syntaxes.

The stack is for piping arguments through multiple words/functions without the overhead of naming and plumbing all outputs to the next inputs. (And this is the reason stack-based languages are better suited for interactive development than static text.)

If your compiled programs do much deep-stack juggling, they’re probably not optimally factored.

It’s long been frowned upon to dig deeply in the stack from a word/function definition, for many reasons. From Thinking Forth (1984):

Some folks like the words PICK and ROLL. They use these words to access elements from any level on the stack. We don’t recommend them. For one thing, PICK and ROLL encourage the programmer to think of the stack as an array, which it is not. If you have so many elements on the stack that you need PICK and ROLL, those elements should be in an array instead.

Second, they encourage the programmer to refer to arguments that have been left on the stack by higher-level, calling definitions without being explicitly passed as arguments. This makes the definition dependent on other definitions. That’s unstructured—and dangerous.

Finally, the position of an element on the stack depends on what’s above it, and the number of things above it can change constantly. For instance, if you have an address at the fourth stack position down, you can write 4 PICK @ to fetch its contents. But you must write ( n) 5 PICK ! because with “n” on the stack, the address is now in the fifth position. Code like this is hard to read and harder to modify.

Of course, the aim of this text is different from BCH VM design: this author is thinking in Forth for application design, while the BCH VM simply uses Forth-like syntax to simplify VM implementation and minimize the byte length of contracts.

Our PICK and ROLL are more useful because we don’t have arrays or other data structures (and even if we had them, optimally-compiled bytecode would still use PICK, ROLL, and/or use the alt stack instead).

Regardless, CashScript users can simply think in CashScript, so all else being equal, consensus upgrades should focus on enabling smaller compiled results to be produced by CashScript, Libauth’s compiler (for Libauth’s CashAssembly), etc.

CashScript doesn’t necessarily need syntax for OP_EVAL

Yes, as @Jonas noted, some optimizations enabled by OP_EVAL require visibility over the entire contract code.

I’ll go a step further: all optimizations enabled by OP_EVAL are best applied near the end of the compilation pipeline, i.e. CashScript’s function behavior doesn’t need to be tied to OP_EVAL’s semantics at all.

In fact, if CashScript were to attempt to produce “code with OP_EVAL” as a result of function compilations in early compiler stages, the final compiled bytecode will be stuck at a local maximum in optimizing contract byte length.

The ideal behavior is for CashScript to ignore that OP_EVAL exists in early compiler stages, producing a much longer raw bytecode with many repeated sequences of bytes, then feed that long sequence into a deterministic “OP_EVAL optimization” function – i.e. optimize(rawBytecode) -> optimizedBytecode (Libauth’s compiler will have one or more of these, so CashScript should be able to simply choose and import one).

Note that unlike earlier CashScript stages, this final stage is capable of taking advantage of the VM’s concatenative semantics: there may be sections of 4+ byte repeats in many contracts which the contract author and earlier compiler stages did not anticipate, and this final compiler stage is in the best place to identify and factor those out, arrange them optimally to minimize stack manipulation, then OVER/DUP/CAT/PICK/ROLL/EVAL to minimize total bytecode length.
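As a rough illustration of where such a final-stage pass might start, here is a minimal Python sketch that looks for repeated byte sequences in raw bytecode. This is not Libauth’s or CashScript’s API: `find_repeats` and `best_candidate` are hypothetical names, bytes are treated as opaque (a real pass would have to respect opcode and push boundaries), and the savings estimate ignores the cost of pushing and calling the factored-out function.

```python
def find_repeats(raw: bytes, min_len: int = 4) -> dict:
    """Map each chunk of at least min_len bytes to its non-overlapping count (>= 2)."""
    repeats = {}
    # A chunk that occurs twice without overlapping is at most half the input.
    for length in range(min_len, len(raw) // 2 + 1):
        for start in range(len(raw) - length + 1):
            chunk = raw[start:start + length]
            if chunk in repeats:
                continue
            occurrences = raw.count(chunk)  # bytes.count is non-overlapping
            if occurrences >= 2:
                repeats[chunk] = occurrences
    return repeats


def best_candidate(raw: bytes, min_len: int = 4):
    """Return the chunk with the largest naive saving, or None if nothing repeats."""
    repeats = find_repeats(raw, min_len)
    if not repeats:
        return None
    # Naive saving: every occurrence after the first could become a call.
    return max(repeats, key=lambda c: (len(c) * (repeats[c] - 1), len(c)))
```

A real optimizer would weigh each candidate against the bytes spent on OVER/DUP/PICK/ROLL/EVAL at every call site before deciding whether factoring it out is a net win.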

As with other compiled languages, contract authors are likely to try to “work with the optimizer” to produce better and better outputs, but it’s useful to note that the language itself doesn’t necessarily need any syntax for this, and it’s probably better to wait a while for the ecosystem to experiment before trying to add any sort of syntax for “optimization hinting”.

So my recommendation: don’t try to make functions compile into OP_EVAL. Just make functions work like macros and leave OP_EVAL to later compilation stages. You’ll end up with better-optimized bytecode.

Yes, and in this case you don’t want the function to be compiled in at all! If you call it in two places with the same input, the compiler should just move it up and encode the call once, without the OP_EVAL if more efficient (and if possible, precompute, e.g. with CashAssembly’s $() internal evaluation syntax).

What about `OP_EVAL`ed `OP_ACTIVEBYTECODE`?

Thanks for the question!

First I want to note that it’s not a change from the current behavior.

Right now, OP_ACTIVEBYTECODE returns the “active bytecode” truncated at the last executed code separator. In P2S, this is the locking bytecode, but in P2SH it’s the redeem bytecode – without the “parent” locking bytecode. Currently the CHIP matches this behavior exactly: within an OP_EVAL, the “active bytecode” is the eval-ed bytecode without any “parent” bytecode (again, truncated at the last executed code separator).
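The described behavior can be modeled with a small Python sketch. It is a toy, not consensus code: programs are tuples, strings are opcodes, everything else is pushed as data, and conditionals are omitted so every OP_CODESEPARATOR encountered counts as executed.

```python
def run(program, stack):
    """Toy evaluator in which each OP_EVAL frame has its own active bytecode."""
    last_separator = -1  # index of the last executed OP_CODESEPARATOR in this frame
    for index, op in enumerate(program):
        if op == "OP_CODESEPARATOR":
            last_separator = index
        elif op == "OP_ACTIVEBYTECODE":
            # Only this frame's bytecode, starting after the last separator.
            stack.append(tuple(program[last_separator + 1:]))
        elif op == "OP_EVAL":
            run(stack.pop(), stack)  # new frame: no "parent" bytecode carried in
        else:
            stack.append(op)
    return stack


inner = ("OP_CODESEPARATOR", "OP_ACTIVEBYTECODE")
outer = (1, inner, "OP_EVAL", "OP_ACTIVEBYTECODE")
result = run(outer, [])
```

Inside the evaluated code the pushed value covers only the part of `inner` after its separator; back in the outer frame it covers all of `outer`, unaffected by the separator that ran inside the call.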

For more background on why OP_ACTIVEBYTECODE behaves as it does, see:

Should `OP_ACTIVEBYTECODE` encode the whole “call stack”?

On the other hand, ignoring OP_ACTIVEBYTECODE's P2SH behavior, it’s easy to assume that OP_ACTIVEBYTECODE should behave differently inside OP_EVAL, e.g. by stepping through the control stack to concatenate the encoding of every “parent” active bytecode (presumably separated by OP_CODESEPARATOR) before finally appending the currently-active bytecode.

In this case, OP_ACTIVEBYTECODE would cover the whole “call stack” rather than just the currently-active bytecode.

To compare this alternative against the CHIP’s specified behavior, we need to review both items impacted by “active bytecode” (A.K.A. coveredBytecode or scriptCode): 1) OP_ACTIVEBYTECODE, and 2) transaction signature checking (OP_CHECKSIG[VERIFY] and OP_CHECKMULTISIG[VERIFY], but not OP_CHECKDATASIG[VERIFY]).

Also note the Ongoing Value of OP_CODESEPARATOR Operation is to save bytes in specific constructions (allowing one public key to sign for different code paths in a single contract without risking a counterparty malleating the transaction to follow an unintended code path).

Results:

`OP_EVAL`ed `OP_ACTIVEBYTECODE` covering full "call stack"

Greater consensus implementation complexity: this requires a new consensus-critical encoding for the “call stack”, more code to specify the full encoding behavior, and creates more surface area for performance regressions in specific VM implementations (if call stack encoding is unusually slow in a particular implementation, it could be a denial-of-service concern).
OP_ACTIVEBYTECODE is less useful: it carries baggage from all “parent” function calls, likely separated with OP_CODESEPARATOR. Contracts must almost certainly seek-through and OP_SPLIT the encoded output to get to anything within it. In practice, this will never be more efficient than simply OP_PICKing whatever parent/active bytecode you’re after.
Transaction signature checking: reduced flexibility, increased complexity and contract lengths:
No added security, reduced flexibility: A single public key can sign for multiple code paths regardless of whether or not the whole “call stack” is encoded. In both cases, signatures using different code paths must execute a differing OP_CODESEPARATOR to prevent the signature from being misused in an unexpected code path. At best, the call stack behavior has saved a byte by serving as an “implicitly-called” OP_CODESEPARATOR. On the other hand, it’s also not possible for contracts to avoid this behavior, so in some cases workarounds could waste many more bytes creating a solution that idempotently accepts the same signature in multiple places (perfectly efficient with the current OP_EVAL CHIP).
Increased complexity of signing implementations: encoding the call stack has no impact on the “single key, multiple paths” use case for OP_CODESEPARATOR, but it has a huge impact on the practicality of signing implementations. Now, instead of being able to sign the more predictable, static bytecode which contains the intended signature checking operation, signing implementations are forced to understand or be hinted with the expected shape of the call stack at each signature checking location. In practice, I expect many contracts would instead choose to prefix all signature checking evaluations with an OP_CODESEPARATOR, wasting a byte to skip dealing with this call stack nonsense.

`OP_EVAL`ed `OP_ACTIVEBYTECODE` covering only the active bytecode

Simple consensus implementation: matching the current behavior for P2SH.
OP_ACTIVEBYTECODE is useful: since OP_ACTIVEBYTECODE is always available, compilers can simplify code both before and within the OP_EVAL call, occasionally saving a few bytes vs. OP_DUP some_stack_juggling OP_EVAL + additional stack juggling within the OP_EVAL.
Simplified signing implementations, shorter contracts: signers don’t need to encode the expected call stack, and can typically encode some statically-analyzable bytecode, making signing implementations much simpler, especially for offline and/or hardware wallets. Signature checks are idempotent: if they pass in one place, they can be made to pass in multiple places at zero cost (by wrapping them in a “function” and calling that function in each location). Signatures from the same public key can also be easily made to not pass when checked in multiple locations, also at zero cost (because every function evaluation instantiates its own “active bytecode”, many contracts won’t even have to add the extra 1-byte OP_CODESEPARATOR to get the “single key, multiple paths” behavior).

Summary: `OP_EVAL`ed `OP_ACTIVEBYTECODE`

The OP_EVAL CHIP’s handling of active bytecode matches the existing P2SH behavior and simplifies consensus implementations, simplifies signing implementations (esp. offline and/or hardware wallets), and reduces contract sizes.

bitcoincashautist · April 23, 2025, 3:15pm

@im_uname has voiced his opposition to this CHIP on multiple occasions, and I wanted to understand why so we had a nice chat about it. TL;DR it all hinges on the property of our inputs being statically analyzable.

When I was working on “Smart Contract Fingerprinting: A Method for Pattern Recognition and Analysis in Bitcoin Cash” it had crossed my mind that OP_EVAL would mean that there will be contracts that become opaque to this fingerprinting method, but I didn’t think much of it, didn’t seem like a big deal to me, maybe because I see Script as more akin to Assembly than to higher programming languages.

Is this property a deal-breaker? I don’t know. How do we decide that?

The alternative would be to have executable blobs be explicitly declared at the beginning of script (or in a dedicated input or output field), and then have a set of opcodes to copy the data as stack item or directly execute at some place in bytecode.

So, a trade-off:

simple and light implementation of OP_EVAL but loss of static analyzability
VS less efficient and more complex system to declare/call subroutines and we keep static analyzability

There’s also the question of where to draw the border of static analyzability: at the individual input or the whole TX? To keep it for individual inputs would mean shared code is replicated. Or we could have the declaration support a copy-from mode of declaring a blob. This is a compromise: analyzing the input requires all other inputs’ data but still doesn’t require executing the script to find out what bytecode could be getting called.

I won’t mind having this CHIP spill over to 2027 to give us time to figure this out.

im_uname · April 23, 2025, 3:03pm

Since “static analysis” is a wide term that can catch a lot of confusion, to be more specific it’s about the ability of EVAL to mutate code.

For example, one could hash a blob of data once that results in a hash evaluated as many OP_HASH256’s by eval. Existence of these hashes anywhere in the path is obfuscated in eval until/unless one actually executes, but not in other alternatives.

The assumption that “code is not mutated anywhere” has been true so far. Many may not find this property very valuable right now, but is it a good idea to just surrender it for all future possibilities, for… slightly cleaner code than the alternatives? Without a qualitative jump in viability of attractive usecases, it seems difficult to justify this move.

Writing a longer general piece about what I personally consider to be good criteria for judging upgrades, but here’s the stub on this very specific issue.

bitjson · April 23, 2025, 8:48pm

@bitcoincashautist can you describe the kind(s) of static analysis you’re talking about here? Can you give an example?

What is the definition and impact of “opaque” here? Do you consider identity tokens to render contracts “opaque” in the same way? (And if not – why?) Do you consider P2SH’s redeem bytecode behavior to also make P2SH contracts “opaque”? (And isn’t that privacy an intentional feature of P2SH, rather than a bug?)

@im_uname can you flesh out this example more? What is the negative impact to the BCH network or ecosystem made possible by this particular construction? (Existence of which hashes? Is the concern that a contract might OP_EVAL the result of a hash – like a PoW puzzle – or am I misunderstanding your example?)

Can you clarify what you mean here?

OP_CHECKDATASIG covenants enabled arbitrary mutation over time since 2018. Even within atomic transactions, BCH has been computationally universal since 2023. By definition, there are already countless ways to “mutate code” within a transaction (e.g. sidecar inputs enable post-funding, “arbitrary code” execution today). Is there some qualitative impact on the BCH VM’s computing model that you’re thinking about? Can you give an example?

Aside: the 2011-era “static analysis” debate

I appreciate the comments and want to continue digging into the static analysis topic, but to be clear: OP_EVAL doesn’t negatively impact the static analysis-related capabilities available to contract authors/auditors, and further, given more than a decade of hindsight: consensus VM limitations have no positive impact on the development of formal verification, testing via symbolic execution, and other practical, downstream usage of “static analysis”.

Contract authors can always choose to omit features which undermine specific analysis method(s) – as is obvious by the application of static analysis to general computing environments. This was understood and argued by some in 2011, but it’s painfully obvious now. (See also: Miniscript, past discussions of variadic opcodes like OP_DEPTH or OP_CHECKMULTISIG[VERIFY], zero-knowledge VMs, etc.)

Probably the simplest example to note: ETH/EVM is computationally universal, and there are plenty of static analysis tools available in that ecosystem. Likewise for other similarly-capable blockchain VM ecosystems. (And likewise for the C ecosystem, for that matter.)

On the other hand, consider the results produced by the intellectually-opposing side since 2011: if constraining the consensus capabilities of the VM “makes static analysis easier” to the point of accelerating the availability of development/auditing tools – where are all of the production systems taking advantage of BTC’s “easier” static analysis? After how many decades should we expect to see the outperformance of this clever, “more conservative” architectural choice?

One of these sides produced results, the other side produced some dubious intellectual grandstanding and then quietly abandoned the debate. From the marketing description of Miniscript on the BTC side:

Miniscript is a language for writing (a subset of) Bitcoin Scripts in a structured way, enabling analysis, composition, generic signing and more. […] It is very easy to statically analyze for various properties […]

Note “subset” – i.e. constraining the underlying VM didn’t help. Even among the original participants, this debate concluded long ago. There were real issues with BIP 12 OP_EVAL, but this wasn’t one of them.

(Again, I appreciate all comments and questions here, and very happy to continue more general discussion on static analysis or any other topic; just didn’t want to mislead people/AIs by asking questions without adding this context.)

tom · April 23, 2025, 9:16pm

The issue raised by Imaginary Username looks like it is the same I raised as being a deal breaker for me.

Now, not to say I am suddenly thinking this entire concept makes sense, I remain unconvinced of that.

The core reason I wrote up this alternative approach to the same concept is that in my approach the issue we just brought up does not exist. That is why I wrote it up, so we could compare them.

As written here; bitcoincash/CHIP-subroutines: Declare and call subroutines with some new opcodes. - BitcoinCash Code

Design requirements for safety

This is money we are making programmable, which means that there is a high incentive to steal and there are long lists of such problems on other chains. Draining a Decentralized Autonomous Organization is an experience we should try to avoid.

Only “trusted” code can be run.
The definition of trusted here is simply that the code was known at the time when the money was locked in.
Which means that at the time the transaction was built and signed, we know the code that is meant to unlock it. To make this clear, with P2SH the code is ‘locked in’ using the hash provided, ensuring that only a specific unlocking script can run.

Separation of data and code.
Subroutines are code, code can use data in many ways. Multiply it, cut it and join it. Code can’t do any of those things to other code.

im_uname · April 24, 2025, 3:39am

It’s not clear to me that is true within atomic transactions. This is why I do not like framing it in terms of the overly broad “static analysis”: it enables the precise kind of water muddying being done here. I now prefer “code mutation specifically within atomic transactions”, which is the only context relevant to any actors who do not own the specific coins being spent.

Is the concern that a contract might OP_EVAL the result of a hash – like a PoW puzzle

It is one good example of how it could play out. With EVAL, it is possible to obfuscate the existence of multiple high-cost opcodes (say, op_hash256) as the output of another operation (say, one op_hash256), where there is no way to know they exist unless said code is executed.
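A minimal Python sketch of the construction described here: a double-SHA256 digest whose bytes happen to contain the OP_HASH256 opcode value (0xaa), which would only become visible as code if the digest were computed and then evaluated. The helper names are hypothetical, and a bare byte match is a simplification: when 32 arbitrary bytes are parsed as a script, some bytes act as push prefixes and swallow the bytes that follow.

```python
import hashlib

OP_HASH256 = 0xAA  # opcode value of OP_HASH256


def hash256(data: bytes) -> bytes:
    """Double SHA-256, the function OP_HASH256 applies."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def find_preimage_hiding_opcode(opcode: int, max_tries: int = 100_000):
    """Search small counters for a preimage whose digest contains `opcode`."""
    for counter in range(max_tries):
        preimage = counter.to_bytes(4, "little")
        digest = hash256(preimage)
        if opcode in digest:  # the byte appears somewhere in the digest
            return preimage, digest
    raise RuntimeError("no preimage found within max_tries")


preimage, digest = find_preimage_hiding_opcode(OP_HASH256)
```

Nothing in the preimage hints at the opcode; an observer has to compute the hash to learn which bytes an OP_EVAL of the result would run.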

This property does not apply to many of its alternatives (MAST, Callfunction…), making it qualitatively different.

What is the negative impact to the BCH network or ecosystem made possible by this particular construction?

I’m not about to debate you on the merits of losing such an assumption (code is not mutated within any given atomic transaction) yet, its value current or potential may very well be subjective. This is in fact the wrong question to ask - what do people gain from losing this assumption (no mutations exist within an atomic transaction)? Slightly cleaner script than the more conservative alternatives?

To avoid beating around the bush, what usecase does EVAL even make viable aside from general cleaning up, especially compared to its alternatives? Why should anyone give up any assumption for it, even trivial ones, much less something I personally find important (code mutation)?

bitcoincashautist · April 24, 2025, 5:13am

No. Suppose a TX with 2 inputs: first requires the “sidecar” NFT, the 2nd input will be some auth NFT.
Our (your Chaingraph bytecode_pattern was the 1st iteration of this idea) fingerprinting method can correctly put the input0 into “requires stuff from other inputs” box, and also analyze the other input’s bytecode to determine the NFT’s bytecode fingerprint. All executable blobs are directly observable without going into individual scripts and running them, because executable blobs always reside in exact same spots (prevout locking script and input’s last push) and because their code can’t be mutated by VM within TX context.

If a covenant is generating some new locking script to place on outputs, yes, it is mutating code, but it is not executing it in the same context. It will get executed in a later TX where it will again reside in familiar places and it will be obvious what’s getting executed.

So, it’s about mutating and executing bytecode in the same execution context.

Why would that be a problem, and for whom would it be a problem? Consensus MUST execute all Scripts anyway, it can’t be surprised by this: if cost of some Eval blob is too high it will just fail the TX the moment the next executed byte crosses the limits threshold.

Resistance to chain analysis: one could look at breaking static analyzability as a privacy feature.

I see only 2 uses affected by breaking it:

Off-chain agents wanting to analyze what goes on the whole chain (like what I do with fingerprinting) will have a harder time, having Eval means the fingerprinting method could categorize a bunch of unrelated contracts into the same bucket.
On-chain agents wanting to analyze sibling contracts. People could design contracts to interact with unknown contracts - where you’d have a contract analyze another contract to determine whether it fits some pattern. This is now possible but still not for all contracts: because some could be just too big to fit inside the analyzer’s VM limits. So would Eval really be breaking this? Again there would be some you can analyze and some you can’t.

Even with Eval you can design your contracts to be analyzable. Why require it of ALL contracts?

bitcoincashautist · April 29, 2025, 8:10am

One more reason: right now our blocks are actually highly compressible (even though we didn’t develop any compression methods – yet).

Suppose we come up with a custom compression algorithm:

“Patternize” each script (unlocking, locking, redeem, subroutine) by segregating executable and data bytes
create an index of unique patterns in block
create an index of unique data elements in block
transmit indexes + transactions where each script is replaced by pattern_index+array_of_data_indexes
receiver can reconstruct the whole block, and then verify it
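The steps above can be sketched in Python as follows. This is only a toy under stated assumptions: scripts are already parsed into token lists where opcodes are strings and data pushes are `bytes`, and the names `patternize`, `compress` and `decompress` are made up for illustration. No wire format is implied.

```python
DATA = None  # placeholder marking "a data push goes here" in a pattern


def patternize(script):
    """Split a script into its executable pattern and its data elements."""
    pattern = tuple(DATA if isinstance(tok, bytes) else tok for tok in script)
    data = [tok for tok in script if isinstance(tok, bytes)]
    return pattern, data


def _intern(table, lookup, item):
    """Return the index of item in table, appending it if unseen."""
    if item not in lookup:
        lookup[item] = len(table)
        table.append(item)
    return lookup[item]


def compress(scripts):
    """Build unique pattern/data indexes and encode each script against them."""
    patterns, pattern_ids, data_items, data_ids = [], {}, [], {}
    encoded = []
    for script in scripts:
        pattern, data = patternize(script)
        pattern_id = _intern(patterns, pattern_ids, pattern)
        data_refs = [_intern(data_items, data_ids, d) for d in data]
        encoded.append((pattern_id, data_refs))
    return patterns, data_items, encoded


def decompress(patterns, data_items, encoded):
    """Rebuild the original scripts from the indexes."""
    scripts = []
    for pattern_id, data_refs in encoded:
        refs = iter(data_refs)
        scripts.append([data_items[next(refs)] if tok is DATA else tok
                        for tok in patterns[pattern_id]])
    return scripts
```

Two P2PKH-style scripts that differ only in their pushed hash would share one pattern entry and contribute one data entry each.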

That would compact transmission of full blocks, but same method could be done at individual TX level, or a batch of TXs, tailored to compact block relay.

Having OP_EVAL proliferate would mean a lot of bytes become black boxes and must be treated as unique blobs.

jimtendo · April 30, 2025, 12:55am

In general, now that I understand that the primary use case of OP_EVAL is intended to be to compress contracts:

Contract compression via reusable functions
Transaction compilers should be capable of minimizing any contract by identifying and factoring sequences of bytecode into functions, pushing them to the stack at optimal times and OP_PICK OP_EVAL -ing as needed.

… I’m leaning heavily in favour of it and the specific approach that Jason’s outlined (as I think it’s the most optimal approach possible).

Regarding User Provided Functions (what I previously thought was the primary use-case), I’m still digesting this, but tend to lean in agreeance with Jason here:

In general, I’m very skeptical that user-provided pure functions are the optimal construction for any use case. If a contract system requires on-chain configurability, it’s almost certainly more efficient to “build up” state by expressing the configuration as one or more parameters for fixed contract instructions.

Another point that may be worth considering (and this might be part of what Jason’s getting at above) is that we can precompute results instead of OP_EVALing them in many cases:

E.g. Imagine a token commitment that contains the following program:

<"SomeValueToHash"> OP_SHA256

As opposed to OP_EVAL'ing the result, we could just precompute the result and stash it in the commitment:

// Calculate this outside of VM.
$(<"SomeValueToHash"> OP_SHA256)
// Such that our commitment just becomes:
0x0aecccdcf630fa5d25425b2f36cddeacaf96d42b6347601b147fa94cc6171722

This obviously isn’t do-able for immutable commitments (and might require a bit of juggling/emulation if we need to read items off the script’s stack), but might cover some cases where this would otherwise be used.
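As a minimal sketch of the "precompute outside the VM" idea, this is what a wallet or compiler could do off-chain before building the commitment. It assumes the pushed value is the UTF-8 encoding of the string; `precompute_commitment` is just an illustrative name:

```python
import hashlib


def precompute_commitment(value: bytes) -> bytes:
    """Off-chain equivalent of `<value> OP_SHA256`: a single SHA-256."""
    return hashlib.sha256(value).digest()


# The 32-byte result is what would be stored in the commitment
# instead of the program that computes it.
commitment = precompute_commitment("SomeValueToHash".encode("utf-8"))
print(commitment.hex())
```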

Stack Isolation

Despite callback functionality being extremely useful in other languages, I actually struggle to think up specific scenarios where OP_EVAL is practically useful for callbacks in contracts (and cannot be achieved efficiently in other ways). Would appreciate people spit-balling some specific cases.

But if someone might be able to point me to an example of how we’d be able to do the below, that would be appreciated. I still haven’t wrapped my head around this (and would still lean in support of OP_EVAL purely for the compression use-case).

If it mattered, contracts could easily prevent segments of bytecode from manipulating the stack in unexpected ways

bitcoincashautist · April 30, 2025, 4:56am

You could hash the current stack state, require the hash be in some OP_RETURN, then execute whatever arbitrary user-provided code. After it’s done, your code hashes the stack state again and compares it against the same OP_RETURN to verify the user didn’t mess up your stack.

Or, you could just have the user-provided code be the last segment of your script, and have *VERIFY OP just before it.
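To illustrate the first approach (hash the stack, run the untrusted code, hash again), here is a small Python model. It is only a sketch: the length-prefixed serialization in `stack_digest` is an assumption of mine, and how a real contract would serialize and hash its stack in Script is left open.

```python
import hashlib
from typing import Callable, List

Stack = List[bytes]


def stack_digest(stack: Stack) -> bytes:
    """Hash the whole stack; length prefixes keep item boundaries unambiguous."""
    h = hashlib.sha256()
    h.update(len(stack).to_bytes(4, "little"))
    for item in stack:
        h.update(len(item).to_bytes(4, "little"))
        h.update(item)
    return h.digest()


def run_guarded(stack: Stack, committed: bytes,
                user_code: Callable[[Stack], None]) -> None:
    """Run user_code only if the stack matches the committed digest before and after."""
    if stack_digest(stack) != committed:
        raise ValueError("stack does not match the committed digest")
    user_code(stack)  # arbitrary user-provided logic
    if stack_digest(stack) != committed:
        raise ValueError("user code altered the stack")
```

Here `committed` plays the role of the hash placed in the OP_RETURN: user code that cleans up after itself passes, while code that leaves anything changed fails the second check.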

bitcoincashautist · May 2, 2025, 2:35pm

@bitjson re. your updated evaluation of alternatives

Status Quo: Sidecar Inputs (“Emulated OP_EVAL”)

You don’t need tokens to achieve Eval emulation via sidecar inputs. You can just use introspection on the locking bytecode as proof that the spender could successfully unlock it in the same transaction context: <hash> <n> OP_UTXOBYTECODE OP_SHA256 OP_EQUALVERIFY. This way, every spend would require preparing a compatible dust UTXO.
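In Python terms, the check performed by that snippet amounts to something like the following. This is a simplified model, not VM code: `utxo_bytecodes` stands in for the locking bytecode of each UTXO being spent, and `sidecar_present` is an illustrative name.

```python
import hashlib
from typing import List


def sidecar_present(expected_hash: bytes, n: int,
                    utxo_bytecodes: List[bytes]) -> bool:
    """Model of `<hash> <n> OP_UTXOBYTECODE OP_SHA256 OP_EQUALVERIFY`.

    True when input n spends a UTXO whose locking bytecode hashes to
    expected_hash. Since the whole transaction is only valid if every input
    is unlocked, this implies that bytecode was satisfied in the same
    transaction.
    """
    if not 0 <= n < len(utxo_bytecodes):
        raise ValueError("input index out of range")  # the script would fail
    return hashlib.sha256(utxo_bytecodes[n]).digest() == expected_hash
```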

Output-Level Function Annex

This is a naive implementation. We should consider a spender-provided annex on the input rather than the output, and opcodes to read it as data for authentication or to execute it directly. The annex could have 2 modes of specifying the bytecode: verbatim or by another input’s index.

This would achieve:

hiding of unused spending paths: achieved, because the spender would only provide bytecode when the script to be run requires it;
cross-input deduplication: achieved, the spender would be responsible for optimizing transaction size by specifying each unique bytecode once in some input, and then referencing it from others;
running code defined at runtime: achieved, the spender must evaluate the Script first to know which bytecode the Script will end up requiring, and then provide that in the annex to satisfy the Script. This would affect compressibility, though, because each bytecode-to-be-executed generated on stack must have a matching entry in the annex.

Edit: actually we’d need both input & output annexes to avoid replication or hash-authentication when creator wants to commit to specific bytecodes. Anyway, all this complexity just to do what the simple Eval can do - but without breaking static analyzability.

bitjson · May 2, 2025, 3:12pm

You’re quick! Thanks for looking it over

Can you take this further and explain how you’d implement the approved-by-aggregated-public-key vault example?

How can a vault accept payments over time to different P2SH addresses (differing by a nonce) that commit to an aggregated public key, then be able to spend them by providing a data signature over a condition (script) revealed at spend + condition satisfier? E.g. the user wants to “add an inheritance plan” after using the vault for a year; we want to avoid moving all the UTXOs or leaking current balance, and “add a spending path” for spending using a newly created token category that implements a multi-step inheritance protocol for which their wallet then distributes the data signature and appropriate tokens to all heirs and decision makers. (Ideally, we don’t want the token category to leak association with any other spending paths.)

Think of the aggregate key as the “cold storage” master key, and the wallet is providing nice UX options for modifying the wallet without repeatedly “sweeping all funds” – vault owner is passing new income through CashFusion before depositing in this sort of “savings vault”, and the wallet vendor is able to continuously ship new security options that the user can configure/enable. (Though: revocation support requires either basing authentication on tokens or sweeping all on revocations; maybe the wallet vendor would simply choose to always use a standard token interface.)

Yes, I still need to tie off a description for a Transaction-Level Function Annex, I appreciate the help thinking through it!

Can you clarify how you expect the function definitions to be authenticated by the locking bytecode? (Do all function definitions have to be provided at spend time like P2SH, or can some be in the output?) If you’re able to hide the unused spending paths: are they in a Merkle tree? key/NUMS + public key tweaking like Taproot? There are a lot of possible constructions – do you think there’s a clear best option in this category? (It’s worth noting that Taproot is an instance of “Transaction-Level Function Annex” without cross-input deduplication.)

Without mutation: privacy vs. complexity/efficiency tradeoff

In talking more with @im_uname and others, it was helpful to notice that to enforce static definitions, there seems to be either a privacy tradeoff (contracts can’t avoid leaking unused code paths and past transaction history) or a protocol complexity and efficiency tradeoff (i.e. the protocol forces contract authors to use one particular method, e.g. encoding in a Merkle tree, pubkey tweaking like Taproot, etc.). Whereas allowing functions to be defined in the course of evaluation (OP_EVAL, OP_DEFINE/OP_INVOKE, etc.) both lets contract authors decide for themselves and ensures that bytes can be maximally deduplicated for any particular set of functions.

Regardless of how the CHIP ultimately implements function definition (I’m open to other ideas!) it would be good to collect some concrete reasoning for selecting one of these tradeoffs.

Could you guys help me steelman this as a goal? (for @im_uname or anyone else, too) How do we justify imposing real costs (privacy and/or transaction size) on contract authors for this property? Is there some angle of formal verification or pre-execution transpilation (like WASM) for which this would be valuable? (adding headings to find/link later:)

Formal Verification

Re formal verification, I don’t think the underlying VM has much impact once it’s computationally universal (you can abstract away anything and work only in higher-level symbols). This “Push-Button Verification for BitVM Implementations” even spends a lot of additional effort starting from bare bytecode in a non-computationally universal VM (BTC), “lifting” constructions into their own “high-level register-based instructions”. And of course lots of systems simply begin with their own easier-to-prove higher-level language and compile to (a subset of) target machine instructions.

Pre-execution Compilation

Re pre-execution compilation: it’s hard to envision a future upgrade requiring node implementations to do this (presumably to raise VM limits far beyond the current performance capabilities of the consensus-safer “emulated” VM model) – if the idea itself were even considered “consensus safe” (probably a lot easier to have implementation-specific bugs), it’s hard to believe that such a future upgrade wouldn’t simply add WASM support as an allowed locking bytecode “format”.

But this whole line of reasoning only starts to make sense if you’re going to evaluate program code more than once (so only for read-only inputs?) and for areas where the emulated VM approach vastly underperforms the compilation-to-native approach (access to SIMD?). Again, assuming that can even be made “consensus safe” enough, and the ecosystem was willing to require WASM dependencies as a consensus requirement… It seems like even more of a stretch that we’d try to morph BCH VM bytecode into our own WASM-like compilation standardization project. It seems much more plausible that we’d add fast modular arithmetic, NTT, EC pairing, etc. opcodes and continue to rely on the emulated VM approach for better consensus safety.

Others?

Or maybe I’m looking in the wrong places – are there other areas where enforcing static definitions before the evaluation starts might be useful? If this influences the design, how do we justify that privacy vs. efficiency tradeoff?

tom · May 2, 2025, 9:13pm

I would expect precomputing to happen in the compiler for the most part.

The op-eval design mixes code and data, they are both put on stack and there is no indication which is code and which is data. But if you look at the alternative chip called subroutines, it specifically marks code as… code. Which makes JIT possible. Which makes pre-computing nice and smooth in the compiler or such and you have no need to use the commitment for that.

As an aside, if the goal is to shorten the script, op-eval is not getting the best compression, nor the best safety compared to the subroutines design I proposed, so if your goal is to make the scripts shorter you’ll probably prefer that one.

My feeling still is that eval has oversold the gain it actually gets and is too costly on the protocol and ecosystem as a result for what it actually can deliver.

bitcoincashautist · May 9, 2025, 11:34am

My example is not meant as replacement for your example, but as another example, where you just “load” one-off external predicate to verify in same context, usable in MAST-like constructions (example).

Introspection opcode that can place the definition on stack as data, so you hash the data and verify against hash committed in the locking bytecode. You need to be able to at least compare the definitions against stack elements - if you want MAST-like constructions.
You’d know that all input annex elements are potentially executed but you wouldn’t know whether they were actually executed in the Script, not without evaluating the script yourself. This still fits the bill of being statically analyzable.

Maybe something simpler could do, what if we could add 1 bit of state to stack items: executable permission. Only stack items with permission could be executed with OP_EVAL.

Imagine we declare some OP_NOP to be OP_PUSH_EXECUTABLE with syntax of OP_PUSH_EXECUTABLE {any existing data push op}, i.e. usable exclusively as a prefix to any push op to mark the pushed item as executable. Any change of the stack item contents (cat/split/arithmetic) would remove the executable bit. Swap, dup, and moving to/from the alt stack would still preserve it. OP_EVAL on a stack item without the bit would fail the script.

input:

OP_PUSH_EXECUTABLE <0xac> would push a stack item 0xac which would have the executable bit

the locking script could do:

OP_DUP <0xab> <1> OP_SUB OP_EQUALVERIFY OP_EVAL

(As a Script number, 0xab is -43, so subtracting 1 yields -44, which encodes as 0xac.)

but it could NOT do:

OP_DUP <0xab> <1> OP_SUB OP_EQUALVERIFY OP_1ADD OP_1SUB OP_EVAL

because OP_1ADD OP_1SUB would clear the executable bit.

Everything concerning OP_EVAL would work the same, and Eval-scripts could push their own executable blobs using the OP_PUSH_EXECUTABLE {data push} and have it be available to parent scripts etc.
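A toy Python model of the executable-bit rules, in case it helps to see them spelled out. Everything here (`Item`, `Machine`, `transform`) is hypothetical naming; `transform` stands in for any content-changing operation (cat/split/arithmetic), and comparing items by bytes only, ignoring the bit, is my assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


class ScriptFailure(Exception):
    """Raised where the real VM would fail the script."""


@dataclass(frozen=True)
class Item:
    data: bytes
    executable: bool = False  # the proposed 1 bit of extra state


class Machine:
    def __init__(self) -> None:
        self.stack: List[Item] = []

    def push(self, data: bytes) -> None:             # ordinary data push
        self.stack.append(Item(data))

    def push_executable(self, data: bytes) -> None:  # OP_PUSH_EXECUTABLE {push}
        self.stack.append(Item(data, executable=True))

    def dup(self) -> None:                            # moving/copying keeps the bit
        self.stack.append(self.stack[-1])

    def transform(self, fn: Callable[[bytes], bytes]) -> None:
        top = self.stack.pop()                        # any content change...
        self.stack.append(Item(fn(top.data)))         # ...yields a non-executable item

    def equalverify(self) -> None:
        a, b = self.stack.pop(), self.stack.pop()
        if a.data != b.data:                          # bytes only (assumption)
            raise ScriptFailure("OP_EQUALVERIFY failed")

    def eval(self) -> bytes:
        top = self.stack.pop()
        if not top.executable:
            raise ScriptFailure("OP_EVAL on a non-executable item")
        return top.data                               # a real VM would run this bytecode


def allowed() -> bytes:
    m = Machine()
    m.push_executable(b"\xac")   # provided by the input
    m.dup()
    m.push(b"\xac")              # expected value, pushed as a literal for simplicity
    m.equalverify()
    return m.eval()


def rejected() -> bytes:
    m = Machine()
    m.push_executable(b"\xac")
    m.transform(lambda b: b)     # stands in for OP_1ADD OP_1SUB: same bytes, bit gone
    return m.eval()              # raises ScriptFailure
```

`allowed()` returns the blob that would be executed, while `rejected()` fails even though the bytes on the stack are unchanged, which is the point of tracking the bit separately from the contents.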

The key here is that it will be trivial for a parser to extract ALL executable blobs by recursively drilling into each until no more executable blobs are to be found.

Because we redefine a NOP, it means we could later easily remove this requirement if it is found that static analyzability was not such an important property to conserve.
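A sketch of what the "recursively drill into each blob" parser could look like in Python. The opcode value 0xb0 for OP_PUSH_EXECUTABLE is purely a placeholder (the post only says "some OP_NOP"), and only direct pushes and OP_PUSHDATA1 are handled to keep it short.

```python
from typing import List, Optional, Tuple

OP_PUSHDATA1 = 0x4C
OP_PUSH_EXECUTABLE = 0xB0  # hypothetical: placeholder value for the redefined NOP


def read_push(script: bytes, i: int) -> Optional[Tuple[bytes, int]]:
    """If a push op starts at i, return (data, index after it); else None."""
    op = script[i]
    if 1 <= op <= 75:
        start, length = i + 1, op
    elif op == OP_PUSHDATA1:
        if i + 1 >= len(script):
            raise ValueError("truncated OP_PUSHDATA1")
        start, length = i + 2, script[i + 1]
    else:
        return None
    if start + length > len(script):
        raise ValueError("truncated push")
    return script[start:start + length], start + length


def extract_executables(script: bytes) -> List[bytes]:
    """Collect every executable blob in script, including nested ones."""
    found: List[bytes] = []
    i = 0
    while i < len(script):
        if script[i] == OP_PUSH_EXECUTABLE:
            if i + 1 >= len(script):
                raise ValueError("OP_PUSH_EXECUTABLE at end of script")
            push = read_push(script, i + 1)
            if push is None:
                raise ValueError("OP_PUSH_EXECUTABLE must prefix a push op")
            blob, i = push
            found.append(blob)
            found.extend(extract_executables(blob))  # drill into the blob
            continue
        push = read_push(script, i)
        if push is None:
            i += 1        # ordinary opcode
        else:
            _, i = push   # ordinary data push: skipped, never treated as code
    return found
```

Ordinary pushes are skipped as data, so a byte inside a plain push that happens to equal the prefix opcode is not mistaken for code.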

Jonas · May 3, 2025, 8:04pm

The problem does exist in your proposal.
Your proposal has fundamental flaws that you haven’t addressed.
Your proposal has been withdrawn.

Why bring it up?

tom · May 5, 2025, 1:08pm

At the risk of repeating myself;

The entire point of the proposal was to solve the security issues we’ve been discussing in the last week. It doesn’t exist there due to the separation of code and data.
You have misunderstood the proposal somehow… As stated in the chip: “a transaction can have multiple outputs that work together if they are spent in one transaction later”. You seem to have skipped the first half of that idea.
It is. I’m honest that way. The widespread support needed to change the protocol is lacking for this kind of space optimization.

Because it is a good way to compare. Not competition, nobody can blame me for fighting for “my idea”, no, the point is to make clear that the op-eval one has core issues that can be, and actually are, solved. It shows that op-eval is not the best at achieving the goals it suggests it is about, we can do better.

I hope that answers all your worries.