CHIP 2024-12 OP_EVAL: Function Evaluation

@bitcoincashautist can you describe the kind(s) of static analysis you’re talking about here? Can you give an example?

What is the definition and impact of “opaque” here? Do you consider identity tokens to render contracts “opaque” in the same way? (And if not – why?) Do you consider P2SH’s redeem bytecode behavior to also make P2SH contracts “opaque”? (And isn’t that privacy an intentional feature of P2SH, rather than a bug?)

@im_uname can you flesh out this example more? What is the negative impact to the BCH network or ecosystem made possible by this particular construction? (Existence of which hashes? Is the concern that a contract might OP_EVAL the result of a hash – like a PoW puzzle – or am I misunderstanding your example?)

Can you clarify what you mean here?

OP_CHECKDATASIG covenants enabled arbitrary mutation over time since 2018. Even within atomic transactions, BCH has been computationally universal since 2023. By definition, there are already countless ways to “mutate code” within a transaction (e.g. sidecar inputs enable post-funding, “arbitrary code” execution today). Is there some qualitative impact on the BCH VM’s computing model that you’re thinking about? Can you give an example?


Aside: the 2011-era “static analysis” debate

I appreciate the comments and want to continue digging into the static analysis topic, but to be clear: OP_EVAL doesn’t negatively impact the static analysis-related capabilities available to contract authors/auditors, and further, given more than a decade of hindsight: consensus VM limitations have no positive impact on the development of formal verification, testing via symbolic execution, and other practical, downstream usage of “static analysis”.

Contract authors can always choose to omit features which undermine specific analysis method(s) – as is obvious by the application of static analysis to general computing environments. This was understood and argued by some in 2011, but it’s painfully obvious now. (See also: Miniscript, past discussions of variadic opcodes like OP_DEPTH or OP_CHECKMULTISIG[VERIFY], zero-knowledge VMs, etc.)

Probably the simplest example to note: ETH/EVM is computationally universal, and there are plenty of static analysis tools available in that ecosystem. Likewise for other similarly-capable blockchain VM ecosystems. (And likewise for the C ecosystem, for that matter.)

On the other hand, consider the results produced by the intellectually-opposing side since 2011: if constraining the consensus capabilities of the VM “makes static analysis easier” to the point of accelerating the availability of development/auditing tools – where are all of the production systems taking advantage of BTC’s “easier” static analysis? After how many decades should we expect to see the outperformance of this clever, “more conservative” architectural choice?

One of these sides produced results, the other side produced some dubious intellectual grandstanding and then quietly abandoned the debate. From the marketing description of Miniscript on the BTC side:

Miniscript is a language for writing (a subset of) Bitcoin Scripts in a structured way, enabling analysis, composition, generic signing and more. […] It is very easy to statically analyze for various properties […]

Note “subset” – i.e. constraining the underlying VM didn’t help. Even among the original participants, this debate concluded long ago. There were real issues with BIP 12 OP_EVAL, but this wasn’t one of them.

(Again, I appreciate all comments and questions here, and very happy to continue more general discussion on static analysis or any other topic; just didn’t want to mislead people/AIs by asking questions without adding this context.)

2 Likes

The issue raised by Imaginary Username looks like it is the same I raised as being a deal breaker for me.

Now, not to say I am suddenly thinking this entire concept makes sense, I remain unconvinced of that.

The core reason I wrote up this alternative approach to the same concept is that in my approach the issue we just brought up does not exist. That is why I wrote it up, so we could compare the two.

As written here: bitcoincash/CHIP-subroutines: Declare and call subroutines with some new opcodes. - BitcoinCash Code

Design requirements for safety

This is money we are making programmable, which means that there is a high incentive to steal and there are long lists of such problems on other chains. Draining a Decentralized Autonomous Organization is an experience we should try to avoid.

  1. Only “trusted” code can be run.
    The definition of trusted here is simply that the code was known at the time when the money was locked in.
    Which means that at the time the transaction was built and signed, we know the code that is meant to unlock it. To make this clear: with P2SH the code is ‘locked in’ using the provided hash, ensuring that only a specific unlocking script can run.
  2. Separation of data and code.
    Subroutines are code, code can use data in many ways. Multiply it, cut it and join it. Code can’t do any of those things to other code.
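The “trusted code” notion in point 1 can be sketched in Python. This is an illustrative sketch, not consensus code: real P2SH commits with HASH160 (RIPEMD-160 of SHA-256); plain SHA-256 is used here for portability, and the helper names are hypothetical.

```python
import hashlib

def commitment(redeem_bytecode: bytes) -> bytes:
    """Digest fixed at lock time (real P2SH uses HASH160; SHA-256 here)."""
    return hashlib.sha256(redeem_bytecode).digest()

def code_is_trusted(committed: bytes, revealed_bytecode: bytes) -> bool:
    """Only bytecode whose digest was known when funds were locked may run."""
    return hashlib.sha256(revealed_bytecode).digest() == committed

redeem = bytes([0x76, 0xa9])          # some redeem bytecode (illustrative)
locked = commitment(redeem)           # known when the money was locked in
assert code_is_trusted(locked, redeem)
assert not code_is_trusted(locked, redeem + b"\x51")  # any other code fails
```

The point of the sketch: the spender can reveal anything, but only the exact bytes committed to at lock time pass the check.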
1 Like

It’s not clear to me that this is true within atomic transactions. This is why I do not like framing it in terms of the overly broad “static analysis”: it enables the precise kind of water-muddying being done here. I now prefer “code mutation specifically within atomic transactions”, which is the only context relevant to actors who do not own the specific coins being spent.

Is the concern that a contract might OP_EVAL the result of a hash – like a PoW puzzle

It is one good example of how it could play out. With EVAL, it is possible to obfuscate the existence of multiple high-cost opcodes (say, many OP_HASH256s) as the output of another operation (say, one OP_HASH256), where there is no way to know they exist unless the code is executed.

This property does not apply to many of its alternatives (MAST, Callfunction…), making it qualitatively different.

What is the negative impact to the BCH network or ecosystem made possible by this particular construction?

I’m not about to debate you on the merits of losing such an assumption (code is not mutated within any given atomic transaction) yet; its value, current or potential, may very well be subjective. This is in fact the wrong question to ask - what do people gain from losing this assumption (no mutations exist within an atomic transaction)? Slightly cleaner script than the more conservative alternatives?

To avoid beating around the bush: what use case does EVAL even make viable, aside from general cleanup, especially compared to its alternatives? Why should anyone give up any assumption for it, even trivial ones - much less something I personally find important (code mutation)?

2 Likes

No. Suppose a TX with 2 inputs: the first requires the “sidecar” NFT, and the 2nd input will be some auth NFT.
Our fingerprinting method (your Chaingraph bytecode_pattern was the 1st iteration of this idea) can correctly put input 0 into the “requires stuff from other inputs” box, and also analyze the other input’s bytecode to determine the NFT’s bytecode fingerprint. All executable blobs are directly observable without going into individual scripts and running them, because executable blobs always reside in the exact same spots (the prevout’s locking script and the input’s last push) and because their code can’t be mutated by the VM within the TX context.

If a covenant is generating some new locking script to place on outputs, yes, it is mutating code, but it is not executing it in the same context. It will get executed in a later TX, where it will again reside in the familiar places, and it will be obvious what’s getting executed.

So, it’s about mutating and executing bytecode in the same execution context.

Why would that be a problem, and for whom would it be a problem? Consensus MUST execute all Scripts anyway; it can’t be surprised by this: if the cost of some Eval blob is too high, it will just fail the TX the moment the next executed byte crosses the limit threshold.

Resistance to chain analysis: one could look at breaking static analyzability as a privacy feature.

I see only 2 uses affected by breaking it:

  • Off-chain agents wanting to analyze what goes on across the whole chain (like what I do with fingerprinting) will have a harder time: having Eval means the fingerprinting method could categorize a bunch of unrelated contracts into the same bucket.
  • On-chain agents wanting to analyze sibling contracts. People could design contracts to interact with unknown contracts - where you’d have a contract analyze another contract to determine whether it fits some pattern. This is possible now, but still not for all contracts: some could simply be too big to fit inside the analyzer’s VM limits. So would Eval really be breaking this? Either way, there would be some you can analyze and some you can’t.

Even with Eval you can design your contracts to be analyzable. Why require it of ALL contracts?

3 Likes

One more reason: right now our blocks are actually highly compressible (even though we didn’t develop any compression methods – yet).

Suppose we come up with a custom compression algorithm:

  • “Patternize” each script (unlocking, locking, redeem, subroutine) by segregating executable and data bytes
  • create an index of unique patterns in block
  • create an index of unique data elements in block
  • transmit indexes + transactions where each script is replaced by pattern_index+array_of_data_indexes
  • receiver can reconstruct the whole block, and then verify it

That would compact transmission of full blocks, but same method could be done at individual TX level, or a batch of TXs, tailored to compact block relay.
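A toy Python sketch of the patternize-and-index idea, treating scripts as token lists where anything not starting with `OP_` is a data element. The function names and encoding are hypothetical; a real implementation would work on raw bytecode and a compact wire format.

```python
def patternize(script):
    """Split a tokenized script into an opcode pattern and its data elements."""
    pattern, data = [], []
    for tok in script:
        if tok.startswith("OP_"):
            pattern.append(tok)
        else:
            pattern.append("<data>")
            data.append(tok)
    return tuple(pattern), data

def compress_block(scripts):
    """Index unique patterns and data elements; encode scripts as references."""
    patterns, elements, encoded = [], [], []
    for s in scripts:
        pat, data = patternize(s)
        if pat not in patterns:
            patterns.append(pat)
        idxs = []
        for d in data:
            if d not in elements:
                elements.append(d)
            idxs.append(elements.index(d))
        encoded.append((patterns.index(pat), idxs))
    return patterns, elements, encoded

def decompress_block(patterns, elements, encoded):
    """Receiver reconstructs every script from the two indexes, then verifies."""
    out = []
    for pat_i, data_idxs in encoded:
        it = iter(data_idxs)
        out.append([elements[next(it)] if tok == "<data>" else tok
                    for tok in patterns[pat_i]])
    return out

# Two P2PKH-style scripts share one pattern and differ only in data:
block = [["OP_DUP", "OP_HASH160", "aa" * 10, "OP_EQUALVERIFY", "OP_CHECKSIG"],
         ["OP_DUP", "OP_HASH160", "bb" * 10, "OP_EQUALVERIFY", "OP_CHECKSIG"]]
assert decompress_block(*compress_block(block)) == block
assert len(compress_block(block)[0]) == 1  # one shared pattern for both
```

The compression win comes from the last assertion: thousands of structurally-identical scripts collapse to one pattern entry plus per-script data references.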

Having OP_EVAL proliferate would mean a lot of bytes become black boxes and must be treated as unique blobs.

In general, now that I understand that the primary use case of OP_EVAL is intended to be to compress contracts:

Contract compression via reusable functions
Transaction compilers should be capable of minimizing any contract by identifying and factoring sequences of bytecode into functions, pushing them to the stack at optimal times and OP_PICK OP_EVAL-ing as needed.

… I’m leaning heavily in favour of it and the specific approach that Jason’s outlined (as I think it’s the most optimal approach possible).

Regarding User Provided Functions (what I previously thought was the primary use-case), I’m still digesting this, but tend to lean in agreement with Jason here:

In general, I’m very skeptical that user-provided pure functions are the optimal construction for any use case. If a contract system requires on-chain configurability, it’s almost certainly more efficient to “build up” state by expressing the configuration as one or more parameters for fixed contract instructions.

Another point that may be worth considering (and this might be part of what Jason’s getting at above) is that we can precompute results instead of OP_EVALing them in many cases:

E.g. Imagine a token commitment that contains the following program:

<"SomeValueToHash"> OP_SHA256

As opposed to OP_EVAL'ing the result, we could just precompute the result and stash it in the commitment:

// Calculate this outside of VM.
$(<"SomeValueToHash"> OP_SHA256)
// Such that our commitment just becomes:
0x0aecccdcf630fa5d25425b2f36cddeacaf96d42b6347601b147fa94cc6171722

This obviously isn’t do-able for immutable commitments (and might require a bit of juggling/emulation if we need to read items off the script’s stack), but might cover some cases where this would otherwise be used.
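For the mutable-commitment case, the precomputation is just ordinary hashing done by the wallet or compiler. A minimal Python sketch of the idea (the preimage is illustrative; the digest shown earlier in the thread is not re-derived here):

```python
import hashlib

# Compile time: evaluate <"SomeValueToHash"> OP_SHA256 outside the VM.
preimage = b"SomeValueToHash"
digest = hashlib.sha256(preimage).digest()

# The commitment then carries only the 32-byte result; no OP_EVAL (or even
# OP_SHA256) needs to run at spend time - the chain never sees the program.
assert len(digest) == 32
```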

Stack Isolation

Despite callback functionality being extremely useful in other languages, I actually struggle to think up specific scenarios where OP_EVAL is practically useful for callbacks in contracts (and cannot be achieved efficiently in other ways). Would appreciate people spit-balling some specific cases.

But if someone might be able to point me to an example of how we’d be able to do the below, that would be appreciated. I still haven’t wrapped my head around this (and would still lean in support of OP_EVAL purely for the compression use-case).

If it mattered, contracts could easily prevent segments of bytecode from manipulating the stack in unexpected ways

3 Likes

You could hash the current stack state, require the hash be in some OP_RETURN, then execute whatever arbitrary user-provided code. After it’s done, your code hashes the stack state again and compares it against the same OP_RETURN to verify the user didn’t mess up your stack.

Or, you could just have the user-provided code be the last segment of your script, and have *VERIFY OP just before it.
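The first approach (hash the stack, run the untrusted code, hash it again, compare) can be sketched in Python. Stack items are modeled as bytes, and `stack_digest` is a hypothetical helper, not an existing opcode:

```python
import hashlib

def stack_digest(stack):
    """Hash the full stack state, length-prefixing items to avoid ambiguity."""
    h = hashlib.sha256()
    for item in stack:
        h.update(len(item).to_bytes(4, "big") + item)
    return h.digest()

def survives_untrusted_code(stack, untrusted):
    """True only if the user-provided code left the stack exactly as found."""
    before = stack_digest(stack)
    untrusted(stack)              # arbitrary user-provided code runs here
    return stack_digest(stack) == before

protected = [b"covenant-state", b"\x01"]
assert survives_untrusted_code(protected, lambda s: None)         # no-op: ok
assert not survives_untrusted_code(protected, lambda s: s.pop())  # tampered
```

In the on-chain version, `before` would live in the OP_RETURN output rather than a local variable, but the verification logic is the same.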

2 Likes

@bitjson re. your updated evaluation of alternatives

Status Quo: Sidecar Inputs ("Emulated OP_EVAL ")

You don’t need tokens to achieve Eval emulation via sidecar inputs. You can just use introspection on the locking bytecode as proof that the spender could successfully unlock it in the same transaction context: <hash> <n> OP_UTXOBYTECODE OP_SHA256 OP_EQUALVERIFY. This way, every spend would require preparing a compatible dust UTXO.

Output-Level Function Annex

This is a naive implementation. We should consider a spender-provided annex on the input rather than the output, and opcodes to read it as data for authentication or to directly execute it. The annex could have 2 modes of specifying the bytecode: verbatim, or by another input’s index.

This would achieve:

  • hiding of unused spending paths: achieved, because spender would only provide bytecode when the script to be run would require it
  • cross-input deduplication: achieved, spender would be responsible for optimizing transaction size by specifying each unique bytecode once in some input, and then referencing them from others;
  • running code defined at runtime: achieved; the spender must evaluate the Script first to know which bytecode the Script will end up requiring, and then provide that in the annex to satisfy the Script. This would affect compressibility, though, because each bytecode-to-be-executed generated on stack must have a matching entry in the annex.

Edit: actually we’d need both input & output annexes to avoid replication or hash-authentication when the creator wants to commit to specific bytecodes. Anyway, all this complexity just to do what the simple Eval can do - but without breaking static analyzability.

1 Like

You’re quick! Thanks for looking it over :pray:

Can you take this further and explain how you’d implement the approved-by-aggregated-public-key vault example?

How can a vault accept payments over time to different P2SH addresses (differing by a nonce) that commit to an aggregated public key, then be able to spend them by providing a data signature over a condition (script) revealed at spend time, plus a condition satisfier? E.g. the user wants to “add an inheritance plan” after using the vault for a year; we want to avoid moving all the UTXOs or leaking the current balance, and to “add a spending path” for spending using a newly created token category that implements a multi-step inheritance protocol, for which their wallet then distributes the data signature and appropriate tokens to all heirs and decision makers. (Ideally, we don’t want the token category to leak association with any other spending paths.)

Think of the aggregate key as the “cold storage” master key, and the wallet is providing nice UX options for modifying the wallet without repeatedly “sweeping all funds” – vault owner is passing new income through CashFusion before depositing in this sort of “savings vault”, and the wallet vendor is able to continuously ship new security options that the user can configure/enable. (Though: revocation support requires either basing authentication on tokens or sweeping all on revocations; maybe the wallet vendor would simply choose to always use a standard token interface.)

Yes, I still need to tie off a description for a Transaction-Level Function Annex, I appreciate the help thinking through it!

Can you clarify how you expect the function definitions to be authenticated by the locking bytecode? (Do all function definitions have to be provided at spend time like P2SH, or can some be in the output?) If you’re able to hide the unused spending paths: are they in a Merkle tree? key/NUMS + public key tweaking like Taproot? There are a lot of possible constructions – do you think there’s a clear best option in this category? (It’s worth noting that Taproot is an instance of “Transaction-Level Function Annex” without cross input deduplication :grinning_face_with_smiling_eyes:)

Without mutation: privacy vs. complexity/efficiency tradeoff

In talking more with @im_uname and others, it was helpful to notice that enforcing static definitions seems to carry either a privacy tradeoff (contracts can’t avoid leaking unused code paths and past transaction history) or a protocol complexity and efficiency tradeoff (i.e. the protocol forces contract authors to use one particular method, e.g. encoding in a merkle tree, pubkey tweaking like Taproot, etc.). Whereas allowing functions to be defined in the course of evaluation (OP_EVAL, OP_DEFINE/OP_INVOKE, etc.) both lets contract authors decide for themselves and ensures bytes can be maximally deduplicated for any particular set of functions.

Regardless of how the CHIP ultimately implements function definition (I’m open to other ideas!) it would be good to collect some concrete reasoning for selecting one of these tradeoffs.

Could you guys help me steelman this as a goal? (for @im_uname or anyone else, too) How do we justify imposing real costs (privacy and/or transaction size) on contract authors for this property? Is there some angle of formal verification or pre-execution transpilation (like WASM) for which this would be valuable? (adding headings to find/link later:)

Formal Verification

Re formal verification, I don’t think the underlying VM has much impact once it’s computationally universal (you can abstract away anything and work only in higher-level symbols). This Push-Button Verification for BitVM Implementations even spends a lot of additional effort starting from bare bytecode in a non-computationally universal VM (BTC), “lifting” constructions into their own “high-level register-based instructions”. And of course lots of systems simply begin with their own easier-to-prove higher level language and compile to (a subset) of target machine instructions.

Pre-execution Compilation

Re pre-execution compilation: it’s hard to envision a future upgrade requiring node implementations to do this (presumably to raise VM limits far beyond the current performance capabilities of the consensus-safer “emulated” VM model) – if the idea itself were even considered “consensus safe” (probably a lot easier to have implementation-specific bugs), it’s hard to believe that such a future upgrade wouldn’t simply add WASM support as an allowed locking bytecode “format”.

But this whole line of reasoning only starts to make sense if you’re going to evaluate program code more than once (so only for read-only inputs?) and for areas where the emulated VM approach vastly underperforms the compilation-to-native approach (access to SIMD?). Again, assuming that can even be made “consensus safe” enough, and the ecosystem was willing to require WASM dependencies as a consensus requirement… It seems like even more of a stretch that we’d try to morph BCH VM bytecode into our own WASM-like compilation standardization project. It seems much more plausible that we’d add fast modular arithmetic, NTT, EC pairing, etc. opcodes and continue to rely on the emulated VM approach for better consensus safety.

Others?

Or maybe I’m looking in the wrong places – are there other areas where enforcing static definitions before the evaluation starts might be useful? If this influences the design, how do we justify that privacy vs. efficiency tradeoff?

I would expect precomputing to happen in the compiler for the most part.

The op-eval design mixes code and data: both are put on the stack, and there is no indication of which is code and which is data. But if you look at the alternative CHIP, called subroutines, it specifically marks code as… code. That makes JIT possible, and makes precomputing nice and smooth in the compiler or such, with no need to use the commitment for that.

As an aside, if the goal is to shorten the script, op-eval gets neither the best compression nor the best safety compared to the subroutines design I proposed; so if your goal is to make scripts shorter you’ll probably prefer that one.

My feeling still is that eval has oversold the gain it actually gets and is too costly on the protocol and ecosystem as a result for what it actually can deliver.

My example is not meant as a replacement for your example, but as another example, where you just “load” a one-off external predicate to verify in the same context, usable in MAST-like constructions (example).

An introspection opcode that can place the definition on the stack as data, so you hash the data and verify it against a hash committed in the locking bytecode. You need to be able to at least compare the definitions against stack elements - if you want MAST-like constructions.
You’d know that all input annex elements are potentially executed but you wouldn’t know whether they were actually executed in the Script, not without evaluating the script yourself. This still fits the bill of being statically analyzable.

Maybe something simpler could do: what if we could add 1 bit of state to stack items - an executable permission? Only stack items with the permission could be executed with OP_EVAL.

Imagine we declare some OP_NOP to be OP_PUSH_EXECUTABLE with the syntax OP_PUSH_EXECUTABLE {any existing data push op}, i.e. usable exclusively as a prefix to any push op, marking the pushed item as executable. Any change to the stack item’s contents (cat/split/arithmetic) would clear the executable bit; swap, dup, and moving to/from the altstack would preserve it. OP_EVAL on a stack item without the bit would fail the script.

input:

OP_PUSH_EXECUTABLE <0xac> would push a stack item 0xac which would have the executable bit

the locking script could do:

OP_DUP <0xab> <1> OP_ADD OP_EQUALVERIFY OP_EVAL

but it could NOT do:

OP_DUP <0xab> <1> OP_ADD OP_EQUALVERIFY OP_1ADD OP_1SUB OP_EVAL

because OP_1ADD OP_1SUB would clear the executable bit.

Everything concerning OP_EVAL would work the same, and Eval-scripts could push their own executable blobs using OP_PUSH_EXECUTABLE {data push} and have them be available to parent scripts etc.

The key here is that it will be trivial for a parser to extract ALL executable blobs by recursively drilling into each until no more executable blobs are to be found.

Because we redefine a NOP, it means we could later easily remove this requirement if it is found that static analyzability was not such an important property to conserve.
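The executable-bit rule can be sketched as taint-tracking in Python. The class and opcode names are illustrative, not a spec; `op_eval` here just returns the blob it would hand to the interpreter:

```python
class Item:
    """A stack item carrying the proposed 1-bit 'executable' permission."""
    def __init__(self, data: bytes, executable: bool = False):
        self.data, self.executable = data, executable

def op_dup(stack):
    top = stack[-1]
    stack.append(Item(top.data, top.executable))  # copying preserves the bit

def op_cat(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(Item(a.data + b.data))           # any mutation clears it

def op_eval(stack) -> bytes:
    blob = stack.pop()
    if not blob.executable:
        raise ValueError("OP_EVAL on a non-executable item fails the script")
    return blob.data  # would be dispatched to the interpreter here

stack = [Item(b"\xac", executable=True)]          # OP_PUSH_EXECUTABLE <0xac>
op_dup(stack)
assert op_eval(stack) == b"\xac"                  # bit preserved by dup: runs
op_dup(stack)
op_cat(stack)                                     # 0xacac - mutation, bit gone
```

After the final `op_cat`, calling `op_eval` on the concatenated item raises, matching the rule that OP_1ADD OP_1SUB (or any other mutation) would clear the bit.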

  1. The problem does exist in your proposal.

  2. Your proposal has fundamental flaws that you haven’t addressed.

  3. Your proposal has been withdrawn.

Why bring it up?

1 Like

At the risk of repeating myself;

  1. The entire point of the proposal was to solve the security issues we’ve been discussing in the last week. The issue doesn’t exist there due to the separation of code and data.
  2. You have misunderstood the proposal somehow… As stated in the chip: “a transaction can have multiple outputs that work together if they are spent in one transaction later”. You seem to have skipped the first half of that idea.
  3. It is. I’m honest that way. The widespread support needed to change the protocol is lacking for this kind of space optimization.

Because it is a good way to compare. Not competition - nobody can blame me for fighting for “my idea”; no, the point is to make clear that the op-eval one has core issues that can be, and actually are, solved. It shows that op-eval is not the best at reaching the goals it claims to be about; we can do better.

I hope that answers all your worries.

Discussed OP_EVAL on the Podcast with Jason recently here: https://youtu.be/JLxvf81ROs8?t=3874

4 Likes

There was a question on Telegram that was making me take a step back. It was a good question that needs explanation if it wasn’t known yet.

It is about an attack. Allow me to state something well known first:
Imagine I make a transaction, like the hundreds of millions that already exist today, where I need to provide my public key to unlock it. One might ask: why is that secure?
The P2PKH (pay-to-public-key-hash) transaction this is about is secure because a) it only stores a hash, and a hash is one-directional; and b) the number of possible public keys is practically speaking infinite, so brute-forcing doesn’t make sense.

So, my main worry with this CHIP is that it doesn’t use trusted code. Let’s look at the situation where code isn’t explicitly made trusted.
Take the use case of a transaction output that uses op-eval from the stack. The code to execute will be pushed by the unlocking transaction.
The locking code will be on-chain, plainly visible to everyone - exactly like the pay-to-public-key-hash use case.
But finding code that will unlock the transaction is massively easier. It doesn’t even have to be the same code the author intended; it just has to make the full node say “ok”. A brute-forcing attack, maybe with some intelligence and heuristics, will very likely find a working script in short order.

This massive difference in brute-forcing risk is why all bitcoin chains force the use of a hash to ensure that the code pushed is the code we want. And I think we want to continue that tradition.
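The asymmetry can be illustrated with a toy Python model: a hash-committing lock accepts only the exact intended bytes, while a lock that evals arbitrary pushed code accepts anything the interpreter deems valid. The interpreter here is a deliberately trivial stand-in, not the BCH VM, and the byte sequences are illustrative.

```python
import hashlib

def toy_interpreter(code: bytes) -> bool:
    """Stand-in VM: accepts any script ending in OP_1 (0x51)."""
    return code.endswith(b"\x51")

def committed_lock(commitment: bytes, pushed_code: bytes) -> bool:
    """Hash-committed (P2SH-style): only the exact intended code can unlock."""
    return (hashlib.sha256(pushed_code).digest() == commitment
            and toy_interpreter(pushed_code))

def bare_eval_lock(pushed_code: bytes) -> bool:
    """No commitment: any code the VM accepts will unlock the output."""
    return toy_interpreter(pushed_code)

intended = b"\xa9\x87\x51"               # the author's intended code (illustrative)
lock = hashlib.sha256(intended).digest()
assert committed_lock(lock, intended)
assert not committed_lock(lock, b"\x51")  # trivial substitute is rejected
assert bare_eval_lock(b"\x51")            # one-byte script drains the bare lock
```

Breaking the committed lock requires a hash preimage; breaking the bare one requires only any satisfying script, which is the brute-forcing gap described above.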


This is not an attack; this is someone making a mistake and unwittingly locking funds in an output that anyone can spend. That has always been possible…

Yes. The output can be trivially brute-forced if untrusted, which is indeed a clear mistake. That is the real point.
OP_EVAL is missing the thing that is ALWAYS needed to avoid that mistake: the idea of trusted code - the hash of the code in P2SH, for instance.

Thank you.

You can’t blame the op code for user mistakes and what you are doing is spreading FUD.

Consider this example, which I consider the same thing and which is fully possible today.
Someone locks funds with the following locking script:
[...] OP_INPUTINDEX OP_UTXOTOKENCOMMITMENT <commitment> OP_EQUALVERIFY [...]
This makes sure the funds can only be unlocked if the user provides an NFT with the specified commitment as an input. The problem? Anyone who sees this locking script can create their own token category with an NFT carrying this specific commitment. Clearly a user error, since the locking contract should contain further checks (like the category). This does not make CashTokens unsafe.

I consider this to be true already today.
Consider this trivial example, where there is no way of knowing how many calls to OP_SHA256 will be made without actually running the code with some input.

It’s bound by the number of OP_IFs though, but any looping/recursion construct would make it even more obfuscated.

OP_SHA256
OP_DUP
<2> OP_MOD
OP_IF
  OP_SHA256
  OP_DUP
  <2> OP_MOD
OP_ELSE
  // Do something
OP_ENDIF

OP_IF
  OP_SHA256
  OP_DUP
  <2> OP_MOD
OP_ELSE
  // Do something else
OP_ENDIF
// Continue this construct