Brainstorming OP_EVAL

With the VM limits CHIP in place, I think we could now implement OP_EVAL safely. Whatever it executes would count against the same budget as the parent Script, so OP_EVAL couldn’t be used to amplify execution cost, and it couldn’t make Script Turing-complete because execution would inevitably hit the total limit set by the VM limits CHIP.

Recall Gavin Andresen’s BIP-0012 definition:

OP_EVAL will re-define the existing OP_NOP1 opcode, and will function as follows:

  • When executed during transaction verification, pops the item from the top of the stack, deserializes it, and executes the resulting script.
  • If there is no item on the top of the stack or the item is not a valid script then transaction validation fails.
  • If there are any OP_CODESEPARATORs in the deserialized script then transaction validation fails.
  • If there are any OP_EVALs in the deserialized script they are also executed, but recursion is limited to a depth of 2.
  • Transaction verification must fail if interpreting OP_EVAL as a no-op would cause the verification to fail.

Example: the script 0x00027551b0 would result in a value of 1 on the stack and pass. How so? Decompiled, that is <0> <0x7551> OP_EVAL, and 0x7551 decompiles to OP_DROP OP_1, so execution would go:

  • push 0 on stack
  • push 0x7551 on stack
  • pop 0x7551 from stack
  • execute 0x7551 with starting stack state: {0}
  • execution ends with stack state: {1}
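
The trace above can be reproduced with a toy interpreter (a hypothetical Python sketch, not a consensus implementation; only the opcodes in this example are handled):

```python
# Toy model of BIP-12 OP_EVAL, just enough to trace 0x00027551b0.
# Opcode byte values match the real Script encoding; everything else is a sketch.

OP_0, OP_1, OP_DROP, OP_EVAL = 0x00, 0x51, 0x75, 0xB0

def run(script: bytes, stack: list, depth: int = 0) -> list:
    if depth > 2:                        # BIP-12 limits recursion to depth 2
        raise ValueError("OP_EVAL recursion too deep")
    i = 0
    while i < len(script):
        op = script[i]; i += 1
        if 1 <= op <= 75:                # direct push of `op` bytes
            stack.append(script[i:i + op]); i += op
        elif op == OP_0:
            stack.append(b"")            # empty item encodes 0
        elif op == OP_1:
            stack.append(b"\x01")
        elif op == OP_DROP:
            stack.pop()
        elif op == OP_EVAL:
            run(stack.pop(), stack, depth + 1)   # execute the popped bytecode
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    return stack

print(run(bytes.fromhex("00027551b0"), []))  # [b'\x01']: truthy top item, passes
```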

Later, Bitcoin Unlimited proposed OP_EXEC (now active and in use on their Nexa network), which gave more thought to interaction with the stack:

  • OP_EXEC executes a subscript that is presented as data in a script. This subscript is executed in an isolated stack environment – it can neither read nor modify elements on the main or alt stacks. Any illegal operation fails validation of the entire script ( T.o1 ).
  • A zero length stack element is a valid script (that does nothing) ( T.o2 ).
  • As with any scripts, any pops of an empty subscript stack fail, which fails the entire script ( T.o3 ).

Note that a space-efficient implementation does not need to recursively start another script machine instance with new stack objects. It can execute the subscript on the existing main and altstack, with the addition of a barrier to ensure that the subscript does not access (read, write, or pop) more than the provided N params.

I think we could implement this idea in a simpler way: have OP_EVAL “freeze” the bottom N stack elements, so it would be called as

<bytecode_to_execute> <N_freeze> OP_EVAL.

If we apply it to the above example, then 0x0002755151b0 would fail while 0x0002755100b0 would pass, because decompiled it would be:

  • <0> <0x7551> <1> OP_EVAL would fail because the inner OP_DROP (0x75) would try to access the bottom stack item without permission, and
  • <0> <0x7551> <0> OP_EVAL would pass because it would have permission and would succeed at dropping the 0 and replacing it with a 1.
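
The two cases above can be checked with a minimal sketch of the freeze barrier (hypothetical Python; only the two opcodes in the example are supported, and all names are illustrative):

```python
# Sketch of <bytecode> <N_freeze> OP_EVAL: the subscript runs on the shared
# stack, but the bottom n_freeze items are behind a barrier it may not touch.

OP_1, OP_DROP = 0x51, 0x75

def run_frozen(subscript: bytes, stack: list, n_freeze: int) -> list:
    for op in subscript:
        if op == OP_DROP:
            if len(stack) <= n_freeze:           # would pop a frozen item
                raise ValueError("freeze barrier violated")
            stack.pop()
        elif op == OP_1:
            stack.append(b"\x01")
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    return stack

# <0> <0x7551> <0> OP_EVAL: nothing frozen, OP_DROP may remove the 0
print(run_frozen(bytes.fromhex("7551"), [b""], 0))   # [b'\x01']

# <0> <0x7551> <1> OP_EVAL: the 0 is frozen, so the inner OP_DROP fails
try:
    run_frozen(bytes.fromhex("7551"), [b""], 1)
except ValueError as e:
    print(e)                                         # freeze barrier violated
```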

Thanks for sharing these!

I made some comments on the BIP 12 approach here:

Re NEXA’s OP_EXEC: it seems like trying to get both 1) an efficient lambda/function evaluation behavior and 2) a “safe” user-provided code sandboxing system out of the same opcode is going to result in a sub-optimal answer for both.

IMO, the most important use case of OP_EVAL is as a function-calling pattern. I expect nearly every sufficiently complex contract to use OP_EVAL at least once (to minimize transaction sizes), and covenants with non-trivial math (esp. crypto) are going to be calling OP_EVAL tens, hundreds, or thousands of times (though never deeper than the 100 depth limit). E.g. applying a crypto group operation requires a 3-byte call to the previously-pushed subroutine: OP_13 OP_PICK OP_EVAL (which can then perform dozens of efficient/cheap operations which don’t need to be duplicated in the encoded transaction).
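
To make the size argument concrete, here is a rough back-of-the-envelope comparison; the subroutine length and call count are assumed for illustration, and only the 3-byte call sequence (OP_13 OP_PICK OP_EVAL) comes from the discussion above:

```python
# Rough transaction-size comparison: inlining a subroutine at every use
# vs. pushing it once and invoking it with a 3-byte OP_13 OP_PICK OP_EVAL.

SUB_LEN = 40        # assumed length of a crypto subroutine, in bytes
PUSH_OVERHEAD = 1   # a direct push of 40 bytes costs one length byte
CALLS = 100         # assumed number of invocations

inlined   = SUB_LEN * CALLS                        # duplicate the code per call
with_eval = SUB_LEN + PUSH_OVERHEAD + 3 * CALLS    # push once, 3 bytes per call

print(inlined, with_eval)  # 4000 341
```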

On the other hand, the sandboxing behavior for OP_EXEC requires another stack item, so we get 1) added protocol complexity and 2) an increase from ~3 to ~5 bytes per-invocation. Also interesting that “pops of an empty subscript stack fail, which fails the entire script” – so the sandboxing behavior is only safe for passing patterns? (Can’t be relied upon for failing patterns, since a somehow-preserved failing pattern could permanently freeze a covenant that is unable to complete the OP_EXEC.)

Can you help me understand some use cases you’re thinking about for this stack freezing behavior? It’s notable here that NEXA doesn’t quite have CashTokens – can the use cases you’re thinking about be solved more efficiently by giving the other party a token, and simply checking for the presence of that token in whatever inputs/outputs needed?

Instead of giving the other party write-access to your contract + a carefully sandboxed evaluation context + internal API, just let them write/control their own real contract(s) – marked by a token you can cheaply identify (~4 bytes)?

(No rush, going to be offline a bit, but if it’s helpful I’ll try to write more here on Monday.)


OP_EXEC better matches the “function-calling pattern” because normally functions have function signatures.

OP_EVAL is more like macro expansion: the callee can mess with the whole program, and there’s nothing the caller can do about it.

Tom summarized it well:

and this is basically the same question.

What abstraction “level” is the aim, and are we internally consistent.
Op_eval is like assembly.
op_exec is like C
OP_mast / op_merkle are much more close to the “rest” of the engine with ScriptVerify and other super specific opcodes.


They do have native tokens with the same functionality as our fungibles, and they could emulate our NFT+commitments with their group authorities & subgroups. They also have what you’d call an “edit script” capability, so they don’t have to spend script code to prevent users from breaking out of a covenant.

I didn’t have a particular use case in mind, just asking for a rationale. This could be a good rationale: OP_EVAL is lighter but harder to contain; however, if containment is needed, you can let users run their unchecked code on sidecar tokens and thus emulate OP_EXEC. If we expect common uses of eval/exec to be running pre-authenticated Scripts (e.g. OP_DUP OP_SHA256 <hash_of_function> OP_EQUALVERIFY OP_EVAL), then eval is the optimal one. If you want contracts that let spenders run their own callbacks or something, then OP_EXEC is more suitable.
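
The pre-authenticated pattern (OP_DUP OP_SHA256 <hash_of_function> OP_EQUALVERIFY OP_EVAL) can be modeled like this (hypothetical Python; `checked_eval` and `evaluate` are illustrative names, not VM APIs):

```python
import hashlib

# Model of OP_DUP OP_SHA256 <hash_of_function> OP_EQUALVERIFY OP_EVAL:
# the contract hard-codes expected_hash, the spender supplies the code,
# and OP_EVAL only ever runs code whose hash matched.

def checked_eval(supplied_code: bytes, expected_hash: bytes, evaluate) -> None:
    # OP_DUP OP_SHA256 ... OP_EQUALVERIFY: authenticate the supplied code
    if hashlib.sha256(supplied_code).digest() != expected_hash:
        raise ValueError("OP_EQUALVERIFY failed")   # script aborts
    evaluate(supplied_code)                          # OP_EVAL on trusted code

code = bytes.fromhex("7551")            # OP_DROP OP_1, as in the earlier example
h = hashlib.sha256(code).digest()       # committed in the locking script
checked_eval(code, h, lambda c: print("evaluating", c.hex()))
```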

In any case, either will unlock fundamental functionality not really available now (even though some of it can be emulated with side-cars).

I fed Claude Sonnet all 3 specs + this topic + Telegram discussion and asked him to summarize, and he did a great job (thanks Claude!):

Based on all the provided information, here are the key differences between OP_EVAL and OP_EXEC:

  1. Stack Isolation
  • OP_EVAL: Full access to caller’s stack; can modify any stack elements
  • OP_EXEC: Isolated stack environment with explicit parameter passing
  • This is perhaps the most fundamental architectural difference
  2. Function Signatures
  • OP_EXEC: Explicit function-like interface with N_Params and M_Returns
  • OP_EVAL: No formal parameter interface, more like macro expansion
  • Makes OP_EXEC more structured but more verbose
  3. Security Model
  • OP_EXEC: Strong boundaries between caller and called code
  • OP_EVAL: Relies on pre-authentication of code (hash verification)
  • Different approaches to the safety vs. flexibility trade-off
  4. Implementation Complexity
  • OP_EVAL: Simpler implementation, more direct
  • OP_EXEC: More complex due to isolation mechanisms
  • But both can be implemented efficiently
  5. Byte Efficiency
  • OP_EVAL: More compact (∼3 bytes per call)
  • OP_EXEC: More verbose (∼5 bytes per call)
  • Due to additional parameter count overhead
  6. Use Case Optimization
  • OP_EVAL: Better for pre-authenticated, trusted code reuse
  • OP_EXEC: Better for untrusted code execution scenarios
  • Different sweet spots for different requirements
  7. Abstraction Level
  • OP_EVAL: Lower level, assembly-like
  • OP_EXEC: Higher level, more like traditional functions
  • Reflects different philosophical approaches

The core distinction comes down to OP_EXEC providing stronger guarantees through isolation at the cost of complexity, while OP_EVAL offers simpler, more efficient execution but requires trust or external verification mechanisms.

This shows a fundamental trade-off between safety and simplicity in the design space of code execution primitives.


Thanks @bitcoincashautist for the excellent summary!

Yeah, basically with OP_EVAL you are more efficient but limit use cases to 100% code you control (or can verify à la P2SH hash checks).

With OP_EXEC you open the door for executing “untrusted” code.

If I had to pick 1, OP_EXEC seems like the more generic solution. And OP_EVAL seems like the “efficiency optimization” for internal code only.

Jason: Do you ever envision contracts needing to execute “untrusted” subroutines? If so – we go with OP_EXEC. If not, we go with OP_EVAL.

Or we go with both and let developers decide.

I think in an ideal world we do both. So long as we are here messing with the guts of the VM, might as well do both.


I should also note: one can do “clever” (hacky?) special-case things like OP_1NEGATE OP_EXEC → basically OP_EVAL. I.e., if the VM sees -1 right before it, it knows the caller declined to use the call-stack semantics, and it behaves like the 100%-as-proposed OP_EVAL. So in that scheme it becomes:


there’s a precedent in OP_IFDUP


Another approach is to break this up into 2 things:

  • declare an indexed subroutine, specifying its call signature “ahead of time”. Name the op-code maybe OP_DECLARESUB.

  • call said subroutine efficiently by index. Name it maybe OP_CALLSUB

  • OP_DECLARESUB is like OP_EXEC in that you declare a piece of code and a call signature. Unlike OP_EXEC these get “saved” internally into some table the VM is tracking (for reference later).

  • OP_DECLARESUB is not like OP_EXEC in that it doesn’t jump into the code at all immediately. For that you need OP_CALLSUB.

  • OP_CALLSUB just references indexes into this table, and the VM “knows” what code you are talking about and what the call signature is. So you don’t need to specify it at each call-site; you just need to provide the args (if any). That’s it. This way you get the most efficient byte encoding if you have multiple calls of the same code.
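
A hypothetical model of the table mechanics (all names and encodings are illustrative; the sketch only shows the declare-then-call flow):

```python
# OP_DECLARESUB saves (code, signature) into a VM-tracked table without
# executing it; OP_CALLSUB later references an entry by index, so the code
# and call signature are encoded only once per script.

class SubroutineTable:
    def __init__(self):
        self.subs = []                       # table the VM would track

    def declare(self, code: bytes, n_params: int) -> int:
        """OP_DECLARESUB: record the code and its signature, return its index."""
        self.subs.append((code, n_params))
        return len(self.subs) - 1

    def call(self, index: int, args: list, evaluate):
        """OP_CALLSUB: look up by index, check the signature, then execute."""
        code, n_params = self.subs[index]
        if len(args) != n_params:
            raise ValueError("call signature mismatch")
        return evaluate(code, args)

vm = SubroutineTable()
idx = vm.declare(bytes.fromhex("7551"), 1)   # declare once...
vm.call(idx, [b""], lambda code, args: print("exec", code.hex(), args))  # ...call by index
```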

Advantages:

  • No need to constantly re-molest the stack for every OP_EVAL/OP_EXEC each time. You just call into the indexed subroutine, providing only the args (if any). In terms of efficiency at the call-site, for many calls this is the most efficient, requiring only one extra item (the index) per call.
  • No more stack molestation/pushing/popping/etc to setup the stack item for the code to be eval’d!
  • For often called code, most efficient possible.

Disadvantages:

  • Less efficient for one-offs, OP_EVAL-style executions… since you need to both declare the subroutine and then call it, increasing byte count.
  • More stateful VM since we introduce this abstraction of “known subroutine table” which now needs to be tracked…

Observation(s):

  • Philosophically, the above goes in the direction of CISC (vs RISC), adding higher abstraction to the VM rather than the bare minimum. Technically one could implement this (less safely and less compactly) with just OP_EVAL in the VM itself. This is definitely CISC, not RISC. Not sure if this is an advantage or a disadvantage; just an observation.

If you have a single transaction with multiple output scripts, can we imagine a case where you’d want to copy/run the subroutine from another output?
