CHIP 2024-12 OP_EVAL: Function Evaluation

You could hash the current stack state, require that hash to appear in some OP_RETURN output, then execute whatever arbitrary code the user provides. After it’s done, your code hashes the stack state again and compares it against the same OP_RETURN to verify the user didn’t mess up your stack.
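A minimal sketch of that idea, hedged: it assumes an OP_EVAL that pops and executes the top stack item, reduces the “stack state” to a single <state> item, and assumes the commitment sits in output 0 as OP_RETURN <32-byte hash> (all illustrative choices, not the CHIP’s semantics):

// Stack at this point (bottom -> top): <state> <user_code>
OP_OVER OP_SHA256           // hash the protected <state> item
<0> OP_OUTPUTBYTECODE       // output 0's bytecode: OP_RETURN <32-byte hash>
<2> OP_SPLIT OP_NIP         // drop the 2-byte OP_RETURN + push prefix
OP_EQUALVERIFY              // <state>'s hash must match the commitment
OP_EVAL                     // run the user-provided code
OP_SHA256                   // hash whatever is now on top of the stack
<0> OP_OUTPUTBYTECODE <2> OP_SPLIT OP_NIP
OP_EQUAL                    // still matches -> <state> survived untouched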

Or, you could just have the user-provided code be the last segment of your script, and have a *VERIFY opcode just before it.

2 Likes

@bitjson re. your updated evaluation of alternatives

Status Quo: Sidecar Inputs ("Emulated OP_EVAL")

You don’t need tokens to achieve Eval emulation via sidecar inputs. You can just use introspection on the locking bytecode as proof that the spender could successfully unlock it in the same transaction context: <hash> <n> OP_UTXOBYTECODE OP_SHA256 OP_EQUALVERIFY. This way, every spend would require preparing a compatible dust UTXO.
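Spelled out (a hedged sketch; <n> and the hash placeholder are illustrative):

// The "function" is the locking bytecode of a dust UTXO spent as
// input <n> of the same transaction.
<sha256_of_function_bytecode>   // committed when this output was created
<n>                             // index of the sidecar input
OP_UTXOBYTECODE                 // that input's UTXO locking bytecode
OP_SHA256 OP_EQUALVERIFY        // prove the expected "function" is present
// Since input <n> must itself be unlocked for the transaction to be valid,
// the function body is guaranteed to evaluate successfully in this context.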

Output-Level Function Annex

This is a naive implementation. We should consider a spender-provided annex on the input rather than the output, and opcodes to read it as data for authentication or to execute it directly. The annex could have 2 modes of specifying the bytecode: verbatim, or by another input’s index.

This would achieve:

  • hiding of unused spending paths: achieved, because the spender would only provide bytecode when the script to be run requires it;
  • cross-input deduplication: achieved, because the spender would be responsible for optimizing transaction size by specifying each unique bytecode once in some input and then referencing it from the others;
  • running code defined at runtime: achieved, because the spender must evaluate the Script first to know which bytecode the Script will end up requiring, and then provide that in the annex to satisfy it. This would affect compressibility, though, because each bytecode-to-be-executed generated on the stack must have a matching entry in the annex.

Edit: actually, we’d need both input and output annexes to avoid replication or hash-authentication when the creator wants to commit to specific bytecodes. Anyway, all this complexity just to do what the simple Eval can do - but without breaking static analyzability.

1 Like

You’re quick! Thanks for looking it over :pray:

Can you take this further and explain how you’d implement the approved-by-aggregated-public-key vault example?

How can a vault accept payments over time to different P2SH addresses (differing by a nonce) that commit to an aggregated public key, then be able to spend them by providing a data signature over a condition (script) revealed at spend time, plus a condition satisfier? E.g. the user wants to “add an inheritance plan” after using the vault for a year; we want to avoid moving all the UTXOs or leaking the current balance, and to “add a spending path” for spending via a newly created token category that implements a multi-step inheritance protocol, for which their wallet then distributes the data signature and appropriate tokens to all heirs and decision makers. (Ideally, we don’t want the token category to leak association with any other spending paths.)

Think of the aggregate key as the “cold storage” master key, with the wallet providing nice UX options for modifying the vault without repeatedly “sweeping all funds” - the vault owner passes new income through CashFusion before depositing into this sort of “savings vault”, and the wallet vendor is able to continuously ship new security options that the user can configure/enable. (Though: revocation support requires either basing authentication on tokens or sweeping everything on revocation; maybe the wallet vendor would simply choose to always use a standard token interface.)

Yes, I still need to tie off a description for a Transaction-Level Function Annex - I appreciate the help thinking through it!

Can you clarify how you expect the function definitions to be authenticated by the locking bytecode? (Do all function definitions have to be provided at spend time like P2SH, or can some be in the output?) If you’re able to hide the unused spending paths: are they in a Merkle tree? key/NUMS + public key tweaking like Taproot? There are a lot of possible constructions – do you think there’s a clear best option in this category? (It’s worth noting that Taproot is an instance of “Transaction-Level Function Annex” without cross input deduplication :grinning_face_with_smiling_eyes:)

Without mutation: privacy vs. complexity/efficiency tradeoff

In talking more with @im_uname and others, it was helpful to notice that enforcing static definitions seems to require either a privacy tradeoff (contracts can’t avoid leaking unused code paths and past transaction history) or a protocol complexity and efficiency tradeoff (i.e. the protocol forces contract authors into one particular method, e.g. encoding in a Merkle tree, pubkey tweaking like Taproot, etc.). Whereas allowing functions to be defined in the course of evaluation (OP_EVAL, OP_DEFINE/OP_INVOKE, etc.) both lets contract authors decide for themselves and ensures bytes can be maximally deduplicated for any particular set of functions.

Regardless of how the CHIP ultimately implements function definition (I’m open to other ideas!), it would be good to collect some concrete reasoning for selecting one of these tradeoffs.

Could you guys help me steelman this as a goal? (for @im_uname or anyone else, too) How do we justify imposing real costs (privacy and/or transaction size) on contract authors for this property? Is there some angle of formal verification or pre-execution transpilation (like WASM) for which this would be valuable? (adding headings to find/link later:)

Formal Verification

Re formal verification: I don’t think the underlying VM has much impact once it’s computationally universal (you can abstract away anything and work only in higher-level symbols). This Push-Button Verification for BitVM Implementations paper even spends a lot of additional effort starting from bare bytecode in a non-computationally-universal VM (BTC), “lifting” constructions into their own “high-level register-based instructions”. And of course lots of systems simply begin with their own easier-to-prove higher-level language and compile to (a subset of) target machine instructions.

Pre-execution Compilation

Re pre-execution compilation: it’s hard to envision a future upgrade requiring node implementations to do this (presumably to raise VM limits far beyond the current performance capabilities of the consensus-safer “emulated” VM model). Even if the idea itself were considered “consensus safe” (it’s probably a lot easier to end up with implementation-specific bugs), it’s hard to believe that such a future upgrade wouldn’t simply add WASM support as an allowed locking bytecode “format”.

But this whole line of reasoning only starts to make sense if you’re going to evaluate program code more than once (so only for read-only inputs?) and in areas where the emulated VM approach vastly underperforms the compilation-to-native approach (access to SIMD?). Again, assuming that could even be made “consensus safe” enough, and that the ecosystem were willing to require WASM dependencies as a consensus requirement… It seems like even more of a stretch that we’d try to morph BCH VM bytecode into our own WASM-like compilation standardization project. It seems much more plausible that we’d add fast modular arithmetic, NTT, EC pairing, etc. opcodes and continue to rely on the emulated VM approach for better consensus safety.

Others?

Or maybe I’m looking in the wrong places – are there other areas where enforcing static definitions before the evaluation starts might be useful? If this influences the design, how do we justify that privacy vs. efficiency tradeoff?

I would expect precomputing to happen in the compiler for the most part.

The op-eval design mixes code and data: both are put on the stack, and there is no indication of which is code and which is data. But if you look at the alternative CHIP, called Subroutines, it specifically marks code as… code. Which makes JIT possible. Which makes pre-computing nice and smooth in the compiler or the like, and you have no need to use the commitment for that.

As an aside, if the goal is to shorten the script, op-eval gets neither the best compression nor the best safety compared to the Subroutines design I proposed - so if your goal is to make scripts shorter, you’ll probably prefer that one.

My feeling still is that eval has oversold the gain it actually gets, and as a result is too costly to the protocol and ecosystem for what it can actually deliver.

My example is not meant as a replacement for your example, but as another example, where you just “load” a one-off external predicate to verify in the same context - usable in MAST-like constructions (example).

An introspection opcode could place the definition on the stack as data, so you hash the data and verify it against a hash committed in the locking bytecode. You need to at least be able to compare the definitions against stack elements if you want MAST-like constructions.
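A hedged 2-leaf sketch of that, assuming an OP_EVAL that pops and executes the top stack item (a real construction would also handle left/right ordering in the tree):

// Spender provides (bottom -> top): <leaf_script> <sibling_hash>
OP_OVER OP_SHA256       // stack: leaf, sibling, sha(leaf)
OP_SWAP OP_CAT          // stack: leaf, sha(leaf) || sibling
OP_SHA256               // stack: leaf, candidate root
<merkle_root>           // root committed in the locking bytecode
OP_EQUALVERIFY          // leaf must belong to the committed tree
OP_EVAL                 // run the authenticated leaf script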
You’d know that all input annex elements are potentially executed but you wouldn’t know whether they were actually executed in the Script, not without evaluating the script yourself. This still fits the bill of being statically analyzable.

Maybe something simpler could do it: what if we added 1 bit of state to stack items - an executable permission? Only stack items with the permission could be executed with OP_EVAL.

Imagine we declare some OP_NOP to be OP_PUSH_EXECUTABLE, with the syntax OP_PUSH_EXECUTABLE {any existing data push op}, i.e. usable exclusively as a prefix to a push op, marking the pushed item as executable. Any change to the stack item’s contents (cat/split/arithmetic) would remove the executable bit; swap, dup, and moving to/from the altstack would preserve it. OP_EVAL on a stack item without the bit would fail the script.

input:

OP_PUSH_EXECUTABLE <0xac> would push a stack item 0xac which would have the executable bit

the locking script could do:

OP_DUP <0xab> <1> OP_ADD OP_EQUALVERIFY OP_EVAL

but it could NOT do:

OP_DUP <0xab> <1> OP_ADD OP_EQUALVERIFY OP_1ADD OP_1SUB OP_EVAL

because OP_1ADD OP_1SUB would clear the executable bit.

Everything concerning OP_EVAL would work the same, and Eval-scripts could push their own executable blobs using the OP_PUSH_EXECUTABLE {data push} and have it be available to parent scripts etc.
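For instance (a hypothetical sketch; 0x51 is the OP_1 byte):

// Unlocking bytecode: an executable blob that itself pushes an executable blob.
OP_PUSH_EXECUTABLE <OP_PUSH_EXECUTABLE <0x51>>
// Locking bytecode:
OP_EVAL    // runs the outer blob, leaving executable <0x51> on the stack
OP_EVAL    // runs the inner blob (OP_1), leaving a truthy 1 -> success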

The key here is that it will be trivial for a parser to extract ALL executable blobs by recursively drilling into each until no more executable blobs are to be found.

Because we’d be redefining a NOP, we could later easily remove this requirement if static analyzability is found not to be such an important property to conserve.

  1. The problem does exist in your proposal.

  2. Your proposal has fundamental flaws that you haven’t addressed.

  3. Your proposal has been withdrawn.

Why bring it up?

1 Like

At the risk of repeating myself:

  1. The entire point of the proposal was to solve the security issues we’ve been discussing in the last week. The problem doesn’t exist there, due to the separation of code and data.
  2. You have misunderstood the proposal somehow… As stated in the chip: “a transaction can have multiple outputs that work together if they are spent in one transaction later”. You seem to have skipped the first half of that idea.
  3. It is. I’m honest that way. The widespread support needed to change the protocol is lacking for this kind of space optimization.

Because it is a good way to compare. It’s not a competition - nobody can blame me for fighting for “my idea”. No, the point is to make clear that the op-eval one has core issues that can be, and actually are, solved. It shows that op-eval is not the best at reaching the goals it suggests it is about; we can do better.

I hope that answers all your worries.

Discussed OP_EVAL on the Podcast with Jason recently here: https://youtu.be/JLxvf81ROs8?t=3874

5 Likes

There was a question on Telegram that made me take a step back. It was a good question that deserves an explanation in case it isn’t already widely known.

It is about an attack. Allow me to state something well known first:
Imagine I make a transaction, like the hundreds of millions that already exist today, where I need to provide my public key to unlock it. One might ask: why is that secure?
The P2PKH (pay-to-public-key-hash) output this is about is secure because a) it only stores a hash, and a hash is one-directional; and b) the number of possible public keys is, practically speaking, infinite, so brute-forcing doesn’t make sense.

So, my main worry with this CHIP is that it doesn’t use trusted code. Let’s look at the situation where code isn’t explicitly made trusted:
the use case of a transaction output that uses op-eval from the stack. The code to execute will be pushed by the unlocking transaction.
The locking code will be on-chain, plainly visible to everyone - exactly like the pay-to-public-key-hash use case.
But finding code that will unlock the transaction is massively easier. It doesn’t even have to be the same code the author intended; it just has to make the full node say “ok”. A brute-forcing attack, maybe with some intelligence and heuristics, will very likely find a working script in short order.

This massive difference in brute-forcing risk is why all bitcoin chains force the use of a hash to ensure that the code pushed is the code we want. And I think we want to continue that tradition.
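In script terms, that tradition looks like this (a hedged sketch, assuming an OP_EVAL that pops and executes the top stack item):

// P2SH-style authentication of spender-provided code:
OP_DUP OP_SHA256        // hash the code pushed by the unlocking data
<expected_code_hash>    // committed when the output was created
OP_EQUALVERIFY          // any other code fails here
OP_EVAL                 // only now is the authenticated code executed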

This is not an attack; this is someone making a mistake and unwittingly locking funds in an output that anyone can spend. That has always been possible…

2 Likes

Yes. The output can be trivially brute-forced if untrusted, which is indeed a clear mistake. That is the real point.
Op-eval is missing the thing that is ALWAYS needed to avoid that mistake: the idea of trusted code - the hash of the code in P2SH, for instance.

Thank you.

You can’t blame the opcode for user mistakes, and what you are doing is spreading FUD.

Consider this example, which I consider to be the same thing and which is fully possible today.
Someone locks funds with the following locking script:
[...] OP_INPUTINDEX OP_UTXOTOKENCOMMITMENT <commitment> OP_EQUALVERIFY [...]
This makes sure that the funds can only be unlocked if the user provides an NFT with the specified commitment as an input. The problem? Anyone who sees this locking script can create their own token category with an NFT carrying this specific commitment. Clearly a user error, since the locking contract should contain further checks (like the category). This does not make CashTokens unsafe.
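A hedged sketch of the fix just described, pinning the category as well (<expected_category> is illustrative):

[...]
OP_INPUTINDEX OP_UTXOTOKENCATEGORY <expected_category> OP_EQUALVERIFY
OP_INPUTINDEX OP_UTXOTOKENCOMMITMENT <commitment> OP_EQUALVERIFY
[...]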

3 Likes

I consider this to be true already today.
Consider this trivial example, where there is no way of knowing how many calls to OP_SHA256 will be made without actually running the code on some input.

It’s bounded by the number of OP_IFs, though; any looping/recursion construct would make it even more obfuscated.

OP_SHA256          // hash the item on top of the stack
OP_DUP             // keep the hash for the next round
<2> OP_MOD         // its low bit picks the branch (illustrative; a 32-byte
                   // hash may need truncating to be used as a number)
OP_IF
  OP_SHA256        // hash again
  OP_DUP
  <2> OP_MOD       // leave a branch bit for the next OP_IF
OP_ELSE
  // Do something (this branch must also leave a branch bit)
OP_ENDIF

OP_IF
  OP_SHA256
  OP_DUP
  <2> OP_MOD
OP_ELSE
  // Do something else (this branch must also leave a branch bit)
OP_ENDIF
// Continue this construct

Clarification:

If OP_EVAL is available, then any high-cost operation (like OP_SHA256) is reachable, since it is part of the VM that executes the code. Whether it’s actually executed, and how many times, could be impossible to tell by static analysis. But that is also true for (certain) scripts today!

5 Likes

Some chat about OP_EVAL on the BLISS technical panel.

I said @emergent_reasons was quite concerned about the static analysis disruption part of it, but he later clarified that he was more concerned about something to do with attacks. I’m hoping he can explain that more because I’m obviously not across it just yet.

2 Likes

I’ve read through the proposals. FWIW I have no interest in complicated stack protection mechanisms. The simplicity of eval is powerful and elegant. As Jeremy mentioned though, I am super concerned, through the vehicle of unexpected stack interaction (a different class from sidecar interaction AFAIU), about the increased attack surface created by “non-local” code that the compiler/reviewer cannot fully reason about at the time of UTXO creation.

Has anyone made a proposal that is exactly just op_eval, but where the blob to execute is required to be defined locally in the input and immutable (if executed)? Yes, it would reduce absolute byte efficiency, but honestly I DGAFFFFF about byte efficiency vs. unknown-unknown attack surface (0-days on massive protocols arising from unexpected edge-case stack interactions, even with the best of intentions). From another angle: until some massive and concrete upside appears beyond input-local code compression, I don’t see a need for open-ended op_eval when input-defined op_eval accomplishes great byte-efficiency gains for complex scripts through compression.
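To illustrate what input-defined compression still buys (a hedged sketch, assuming an OP_EVAL that pops and executes the top stack item):

// Unlocking data pushes (bottom -> top): <arg2> <arg1> <fn>
// where <fn> is a verify-style function that consumes one argument.
OP_DUP OP_TOALTSTACK    // stash a copy of <fn> for the second call
OP_EVAL                 // first call: <fn> checks and consumes <arg1>
OP_FROMALTSTACK
OP_EVAL                 // second call: <fn> checks and consumes <arg2>
// ... rest of the contract; <fn>'s bytes appear only once in the transaction.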

Maybe there is some globally hashed function that everyone would like to use and we can audit once but… really? That sounds like something that won’t actually happen in practice and you just lose a few bytes if you do it input-defined anyway - those function definitions and audits can just as well happen at a high level library/compiler level rather than at the op_code level.

TLDR: can I haz OP_EVAL except:

  1. input-local definition
  2. anything executed can’t have been mutated (could be accomplished with clean DX through local definition of custom op_code macros?)
  3. dgaf about stack protection - that’s a losing battle IMO at the op_code level

Is that a bad idea? Does it eliminate some important and concrete use case? Does it not work well with P2S?

Then (or in the meantime) if we really do identify concrete, powerful, huge benefits for raw OP_EVAL, then there’s nothing stopping the additional unlock step. However going backwards will be practically impossible. One way streets and all that.

Note that I’m also aware that through dynamic input lock script construction, the contract author may still not be able to fully reason about the logic at creation, but those cases will require significant effort to create. I’m happy to have a barrier to dangerous things that only gets crossed if there is proof positive of value on the other side that would then encourage the raw OP_EVAL.

2 Likes

Yes.

1 Like

Thanks for this. Just a follow-up question: In the current state, would this mean we could only do this safely for a single User Provided Function (e.g. OP_OUTPUTTOKENCOMMITMENT OP_EVAL) due to not being able to effectively safeguard what it leaves on the stack?

I may’ve missed the implication of the OP_*VERIFY, but my other thought is that this pattern could possibly work for multiple functions if we used OP_DEPTH (which pushes the current number of stack items onto the stack), as then we can verify that the OP_EVAL’d data returns the correct/expected number of Stack Items. E.g.

// ... Prior code that leaves us with a blank stack

// Evaluate token program at Output 0
// (no clean-up needed afterwards: OP_EVAL pops the program itself)
<0> OP_OUTPUTTOKENCOMMITMENT OP_EVAL

// Ensure that we're left with the expected number of stack items.
// In this example: 1 (whatever the first program returned)
OP_DEPTH <1> OP_EQUALVERIFY

// Evaluate token program at Output 1
<1> OP_OUTPUTTOKENCOMMITMENT OP_EVAL

// Ensure that we're left with the expected number of stack items.
// Now 2: the depth check above only consumed its own operands,
// so each program's result is still on the stack.
OP_DEPTH <2> OP_EQUALVERIFY

// ... etc

I think the important thing then becomes that the Stack maintains a blank (or known) state before any of the user-provided functions are executed?

Apologies if I’m way off the mark and this is already achievable.

EDIT: Fixed mistakes in example.
EDIT: Didn’t realize there’s an OP_DEPTH which does what my hypothetical OP_STACKSIZE was trying to do.

If I understand correctly, this is half of the picture.

  1. I really don’t care about this “static analysis” thing. I mean it’s kinda nice, but I really DGAF about it, unless some major player is already using it for some specific purpose - then we just find a way to help them do whatever that is in a new way. Additionally, static analysis requires construction of abstractions and I can’t imagine it ends up being cheaper to do than just executing a script.
  2. It’s only immutability right? That’s good, half the picture.
  3. I think the missing part is the input-defined aspect. That’s crucial to reducing the attack surface of unknown unknown stack interactions (0-days) that are possible even when cashscript is working perfectly, code is written well.

I.e. this isn’t about bugs or incorrect compiler behavior, but about avoiding giving easy access to a whole new large and very hard to reason about attack surface. Think about the incredible maturity of C++ and other language toolchains, yet we still have 0-days coming up on an ongoing basis. It’s not because of bad programming or bad compilers. It’s because of very motivated people finding tiny dark corners that lead to profit or chaos.

This limited eval can easily be superseded later by raw eval if a really convincing case is made in the future.

1 Like

That’s crucial to reducing the attack surface of unknown unknown stack interactions (0-days) that are possible even when cashscript is working perfectly, code is written well (i.e. this isn’t about bugs or incorrect compiler behavior).

Are you able to try to think up an example of this? Assuming no bugs in the CashScript compiler (and that an eval function is not exposed), I can’t see a scenario where this could happen.

My concern here would be that the alternative of only allowing execution of code that’s local to the input’s Unlocking Bytecode would probably introduce implementation complexity that actually heightens the risk of a 0-day.

2 Likes

I can’t. It’s kinda the nature of 0-days - 10 people can look at the code and not notice some double edge case interaction. Maybe a security researcher could?

  • There are assumptions about the state of the stack during input execution
  • There are assumptions about how “remote” code executes, leading to assumptions about the state of the stack after input execution
  • There are assumptions about the remaining code to be executed after completion of eval
  • I’m sure there are other details

Taken alone, they can be reasoned about. For example with input-defined blobs, the compiler can reason strongly about all the possibilities (this IS an appropriate place for static analysis - at compile time, not at tx validation time). But with “remote” code coming from outside, the compiler has to step into a higher level of abstraction unless you restrict it with authorized code hashes or something like that - but the unauthorized version is still there and will haunt us. Why invite that into our house?

Like… would you ever write a program that accepts and Evals a code/data blob from an API endpoint and then literally injects it into your program execution path? Even if you can write whatever validation code you want about that blob before you run it (which is harder for us with limited space)? I can’t imagine wanting to do that. And to make sure there is clarity here - I’m not talking about data/code separation per-se, although it turns into that in this particular example.

Regarding the concern about implementation complexity - I really don’t see it. It’s not complex at all. Especially because, at least in what I’m suggesting, there’s no attempt at quixotic (IMO) stack protection.