Brainstorming OP_EVAL

With the VM limits CHIP in place I think we could now implement OP_EVAL safely. Whatever it executes would count against the same budget as the parent Script, so OP_EVAL couldn’t be used to amplify execution cost, and it couldn’t make Script Turing-complete because execution would inevitably hit the total limit set by the VM limits CHIP.

Recall Gavin Andresen’s BIP-0012 definition:

OP_EVAL will re-define the existing OP_NOP1 opcode, and will function as follows:

  • When executed during transaction verification, pops the item from the top of the stack, deserializes it, and executes the resulting script.
  • If there is no item on the top of the stack or the item is not a valid script then transaction validation fails.
  • If there are any OP_CODESEPARATORs in the deserialized script then transaction validation fails.
  • If there are any OP_EVALs in the deserialized script they are also executed, but recursion is limited to a depth of 2.
  • Transaction verification must fail if interpreting OP_EVAL as a no-op would cause the verification to fail.

For example, the script 0x00027551b0 would leave a value of 1 on the stack and pass. How so? Decompiled, that is <0> <0x7551> OP_EVAL, and 0x7551 decompiles to OP_DROP OP_1, so execution would go:

  • push 0 on stack
  • push 0x7551 on stack
  • pop 0x7551 from stack
  • execute 0x7551 with starting stack state: {0}
  • execution ends with stack state: {1}
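
The steps above can be traced with a toy interpreter. This is only a minimal sketch covering the four opcodes used in the example, not the BIP-12 reference implementation. The `run` helper, the `MAX_DEPTH` constant, and the way depth is counted are assumptions made for illustration.

```python
# Toy interpreter for the handful of opcodes used in the example.
# Illustrative only: names and depth accounting are assumptions.
OP_0, OP_1, OP_DROP, OP_EVAL = 0x00, 0x51, 0x75, 0xB0  # OP_EVAL redefines OP_NOP1
MAX_DEPTH = 2  # assumed reading of "recursion is limited to a depth of 2"


def run(script: bytes, stack: list, depth: int = 0) -> list:
    """Execute `script` against `stack` in place and return the stack."""
    i = 0
    while i < len(script):
        op = script[i]
        i += 1
        if op == OP_0:
            stack.append(b"")                  # OP_0 pushes an empty item
        elif 0x01 <= op <= 0x4B:               # direct push of the next `op` bytes
            if i + op > len(script):
                raise ValueError("truncated push")
            stack.append(script[i:i + op])
            i += op
        elif op == OP_1:
            stack.append(b"\x01")
        elif op == OP_DROP:
            if not stack:
                raise ValueError("OP_DROP on empty stack")
            stack.pop()
        elif op == OP_EVAL:
            if not stack:
                raise ValueError("OP_EVAL on empty stack")
            if depth >= MAX_DEPTH:
                raise ValueError("OP_EVAL recursion too deep")
            # The evaluated code shares the caller's stack.
            run(stack.pop(), stack, depth + 1)
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    return stack


final_stack = run(bytes.fromhex("00027551b0"), [])
print(final_stack)  # a single item encoding 1
```

Note that the inner script runs directly on the caller's stack, which is why it can drop the 0 pushed before it.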

Later, Bitcoin Unlimited proposed OP_EXEC (now active and in use on their Nexa network), which gave more thought to the interaction with the stack:

  • OP_EXEC executes a subscript that is presented as data in a script. This subscript is executed in an isolated stack environment – it can neither read nor modify elements on the main or alt stacks. Any illegal operation fails validation of the entire script ( T.o1 ).
  • A zero length stack element is a valid script (that does nothing) ( T.o2 ).
  • As with any scripts, any pops of an empty subscript stack fail, which fails the entire script ( T.o3 ).

Note that a space-efficient implementation does not need to recursively start another script machine instance with new stack objects. It can execute the subscript on the existing main and altstack, with the addition of a barrier to ensure that the subscript does not access (read, write, or pop) more than the provided N params.

I think we could implement this idea but in a simpler way. Have OP_EVAL “freeze” some N bottom stack elements, so it would be called as

<bytecode_to_execute> <N_freeze> OP_EVAL.

If we apply it to the above example, then 0x0002755151b0 would fail while 0x0002755100b0 would pass, because decompiled it would be:

  • <0> <0x7551> <1> OP_EVAL and there the inner OP_DROP (0x75) would fail because it would try to access bottom stack item without permission, and
  • <0> <0x7551> <0> OP_EVAL would pass because it would have the permission and would succeed at dropping the 0 and replacing it with a 1.
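
A rough sketch of how the freezing rule could behave, under my own assumptions: the `floor` parameter, the `pop` helper, and the rule that N_freeze larger than the remaining stack fails are all hypothetical, since the idea above doesn't pin those details down.

```python
# Toy model of <bytecode_to_execute> <N_freeze> OP_EVAL.
# Hypothetical semantics: the N bottom stack items become inaccessible
# to the evaluated code.
OP_0, OP_1, OP_DROP, OP_EVAL = 0x00, 0x51, 0x75, 0xB0


def pop(stack: list, floor: int) -> bytes:
    """Pop the top item unless it is frozen (at or below `floor`) or missing."""
    if len(stack) <= floor:
        raise ValueError("access to frozen or missing stack item")
    return stack.pop()


def run(script: bytes, stack: list, floor: int = 0) -> list:
    i = 0
    while i < len(script):
        op = script[i]
        i += 1
        if op == OP_0:
            stack.append(b"")
        elif op == OP_1:
            stack.append(b"\x01")
        elif 0x01 <= op <= 0x4B:               # direct push of the next `op` bytes
            stack.append(script[i:i + op])
            i += op
        elif op == OP_DROP:
            pop(stack, floor)
        elif op == OP_EVAL:
            n_freeze = int.from_bytes(pop(stack, floor), "little")
            code = pop(stack, floor)
            if n_freeze > len(stack):          # assumption: cannot freeze missing items
                raise ValueError("N_freeze exceeds stack depth")
            # A nested call can only tighten the barrier, never loosen it.
            run(code, stack, max(floor, n_freeze))
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    return stack


def fails(script_hex: str) -> bool:
    try:
        run(bytes.fromhex(script_hex), [])
        return False
    except ValueError:
        return True


print(fails("0002755151b0"))  # N_freeze = 1: inner OP_DROP hits the frozen item
print(fails("0002755100b0"))  # N_freeze = 0: inner OP_DROP is allowed
```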

Thanks for sharing these!

I made some comments on the BIP 12 approach here:

Re NEXA’s OP_EXEC: it seems like trying to get both 1) an efficient lambda/function evaluation behavior and 2) a “safe” user-provided code sandboxing system out of the same opcode is going to result in a sub-optimal answer for both.

IMO, the most important use case of OP_EVAL is as a function-calling pattern. I expect nearly every sufficiently complex contract to use OP_EVAL at least once (to minimize transaction sizes), and covenants with non-trivial math (esp. crypto) are going to be calling OP_EVAL tens, hundreds, or thousands of times (though never deeper than the 100 depth limit). E.g. applying a crypto group operation requires a 3-byte call to the previously-pushed subroutine: OP_13 OP_PICK OP_EVAL (which can then perform dozens of efficient/cheap operations which don’t need to be duplicated in the encoded transaction).

On the other hand, the sandboxing behavior for OP_EXEC requires another stack item, so we get 1) added protocol complexity and 2) an increase from ~3 to ~5 bytes per-invocation. Also interesting that “pops of an empty subscript stack fail, which fails the entire script” – so the sandboxing behavior is only safe for passing patterns? (Can’t be relied upon for failing patterns, since a somehow-preserved failing pattern could permanently freeze a covenant that is unable to complete the OP_EXEC.)

Can you help me understand some use cases you’re thinking about for this stack freezing behavior? It’s notable here that NEXA doesn’t quite have CashTokens – can the use cases you’re thinking about be solved more efficiently by giving the other party a token, and simply checking for the presence of that token in whatever inputs/outputs needed?

Instead of giving the other party write-access to your contract + a carefully sandboxed evaluation context + internal API, just let them write/control their own real contract(s) – marked by a token you can cheaply identify (~4 bytes)?

(No rush, going to be offline a bit, but if it’s helpful I’ll try to write more here on Monday.)


OP_EXEC better matches the “function-calling pattern” because normally functions have function signatures.

OP_EVAL is more like macro expansion, the callee can mess with the whole program and nothing the caller can do about it.

Tom summarized it well:

and this is basically the same question.

What abstraction “level” is the aim, and are we internally consistent.
Op_eval is like assembly.
op_exec is like C
OP_mast / op_merkle are much more close to the “rest” of the engine with ScriptVerify and other super specific opcodes.


They do have native tokens, have same functionality of our fungibles, and could emulate our NFT+commitments with their group authorities & subgroups. They also have what you’d call an “edit script” capability so they don’t have to spend script code to prevent users from breaking out of a covenant.

I didn’t have particular use-case in mind, just asking for a rationale. This could be a good rationale: OP_EVAL is lighter but harder to contain, however if containment is needed you can let users run their unchecked code on sidecar tokens thus emulate OP_EXEC. If we expect common uses of eval/exec to be running pre-authenticated Scripts (e.g. OP_DUP OP_SHA256 <hash_of_function> OP_EQUALVERIFY OP_EVAL), then eval is the optimal one. If you want to have contracts that let spenders run their own callbacks or something, then OP_EXEC is more suitable.

In any case, either will unlock fundamental functionality not really available now (even though some of it can be emulated with side-cars).

I fed Claude Sonnet all 3 specs + this topic + Telegram discussion and asked him to summarize, and he did a great job (thanks Claude!):

Based on all the provided information, here are the key differences between OP_EVAL and OP_EXEC:

  1. Stack Isolation
  • OP_EVAL: Full access to caller’s stack; can modify any stack elements
  • OP_EXEC: Isolated stack environment with explicit parameter passing
  • This is perhaps the most fundamental architectural difference
  2. Function Signatures
  • OP_EXEC: Explicit function-like interface with N_Params and M_Returns
  • OP_EVAL: No formal parameter interface, more like macro expansion
  • Makes OP_EXEC more structured but more verbose
  3. Security Model
  • OP_EXEC: Strong boundaries between caller and called code
  • OP_EVAL: Relies on pre-authentication of code (hash verification)
  • Different approaches to safety vs flexibility trade-off
  4. Implementation Complexity
  • OP_EVAL: Simpler implementation, more direct
  • OP_EXEC: More complex due to isolation mechanisms
  • But both can be implemented efficiently
  5. Byte Efficiency
  • OP_EVAL: More compact (∼3 bytes per call)
  • OP_EXEC: More verbose (∼5 bytes per call)
  • Due to additional parameter count overhead
  6. Use Case Optimization
  • OP_EVAL: Better for pre-authenticated, trusted code reuse
  • OP_EXEC: Better for untrusted code execution scenarios
  • Different sweet spots for different requirements
  7. Abstraction Level
  • OP_EVAL: Lower level, assembly-like
  • OP_EXEC: Higher level, more like traditional functions
  • Reflects different philosophical approaches

The core distinction comes down to OP_EXEC providing stronger guarantees through isolation at the cost of complexity, while OP_EVAL offers simpler, more efficient execution but requires trust or external verification mechanisms.

This shows a fundamental trade-off between safety and simplicity in the design space of code execution primitives.


Thanks @bitcoincashautist for the excellent summary.

Yeah basically with OP_EVAL you are being more efficient but are limiting use-cases to 100% code you control (or can verify a-la p2sh hash checks).

With OP_EXEC you open the door for executing “untrusted” code.

If I had to pick 1, OP_EXEC seems like the more generic solution. And OP_EVAL seems like the “efficiency optimization” for internal code only.

Jason: Do you ever envision contracts needing to execute “untrusted” subroutines? If so – we go with OP_EXEC. If not, we go with OP_EVAL.

Or we go with both and let developers decide.

I think in an ideal world we do both. So long as we are here messing with the guts of the VM, might as well do both.


I should also note: One can do “clever” (hacky?) special case things like OP_1NEGATE OP_EXEC → becomes basically OP_EVAL. i.e. if you see -1 right before it, you know caller declined to use the call stack semantics and it behaves like 100% proposed OP_EVAL. So in that scheme it becomes:


there’s a precedent in OP_IFDUP


Another approach is to break this up into 2 things:

  • declare an indexed subroutine, specifying its call signature “ahead of time”. Name the op-code maybe OP_DECLARESUB.

  • call said subroutine efficiently by index. Name it maybe OP_CALLSUB

  • OP_DECLARESUB is like OP_EXEC in that you declare a piece of code and a call signature. Unlike OP_EXEC these get “saved” internally into some table the VM is tracking (for reference later).

  • OP_DECLARESUB is not like OP_EXEC in that it doesn’t jump into the code at all immediately. For that you need OP_CALLSUB.

  • OP_CALLSUB just references indexes into this table, and the VM “knows” what code you are talking about and what the call signature is. So you don’t need to specify it at each call-site. You just need to provide the args (if any). That’s it. This way you get the most efficient byte encoding if you have multiple calls of the same code.
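
To make the declare/call split concrete, here is a minimal sketch of such a table. Everything in it is hypothetical: the `ToyVM` and `Sub` names, using a Python callable as a stand-in for subroutine bytecode, and a call signature that consists only of a parameter count.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Sub:
    body: Callable[[List[int]], List[int]]  # stand-in for subroutine bytecode
    n_params: int                           # declared once, not at each call-site


@dataclass
class ToyVM:
    stack: List[int] = field(default_factory=list)
    subs: List[Sub] = field(default_factory=list)  # the "known subroutine table"

    def declare_sub(self, body, n_params: int) -> int:
        """Hypothetical OP_DECLARESUB: store the code, do not run it."""
        self.subs.append(Sub(body, n_params))
        return len(self.subs) - 1

    def call_sub(self, index: int) -> None:
        """Hypothetical OP_CALLSUB: run a declared subroutine by index."""
        sub = self.subs[index]
        if len(self.stack) < sub.n_params:
            raise ValueError("not enough arguments on stack")
        split = len(self.stack) - sub.n_params
        args = self.stack[split:]           # only the declared params are visible
        del self.stack[split:]
        self.stack.extend(sub.body(args))


vm = ToyVM(stack=[7, 3, 4])
add = vm.declare_sub(lambda args: [args[0] + args[1]], n_params=2)
vm.call_sub(add)   # consumes 3 and 4, leaves the bottom 7 untouched
vm.call_sub(add)   # consumes 7 and 7
print(vm.stack)
```

The call-site only carries the index, which is where the byte savings for repeated calls would come from.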

Advantages:

  • No need to constantly re-molest the stack for every OP_EVAL/OP_EXEC each time. You just call into indexed subroutine. Only 1 (or more) args for OP_CALLSUB are needed. So in terms of efficiency at the call-site for many calls, this is the most efficient requiring only 1 arg (the index) for each call.
  • No more stack molestation/pushing/popping/etc to setup the stack item for the code to be eval’d!
  • For often called code, most efficient possible.

Disadvantages:

  • Less efficient for one-offs, OP_EVAL style executions… since you need to both declare the subroutine and then call it, increasing byte count.
  • More stateful VM since we introduce this abstraction of “known subroutine table” which now needs to be tracked…

Observation(s):

  • Philosophically the above goes in the direction of CISC (vs RISC), adding higher abstraction to the VM rather than bare minimum. Technically one could implement this (less safely and less compactly) with just OP_EVAL in just the VM itself. This is definitely CISC. Not RISC. Not sure if this is advantage or disadvantage just an observation.

if you have a single transaction with multiple output scripts, can we imagine a case where you want to copy / run the subroutine from another output?


I finally wrote a longer review of OP_EXEC vs OP_EVAL here: CHIP 2024-12 OP_EVAL: Function Evaluation - #14 by bitjson

To summarize: an optimized version of OP_EXEC still wastes several bytes per unique function definition, and stack isolation offers no security value, even for contrived use cases requiring evaluation of arbitrary “untrusted” code (see “User-provided pure functions”).

@bitcoincashautist would you mind fleshing out this example a bit more? Is it adequately covered by “Post-funding assigned instructions” here?

We actually don’t need OP_EVAL or OP_EXEC to open that door – contracts (and especially covenants) are precisely this sort of untrusted environment already. If an attacker can fool your contract by giving you 5 bytes where you expected 4, the last byte may very well be interpreted as an instruction (or part of the next data element) when encoded in the next locking bytecode.

I’ll continue to be interested in hearing any ideas about how a more “built-in” stack isolation feature might be useful, but for now, I expect OP_EVAL is still superior for “untrusted” code too.

Stack isolation is just 1) easy already and/or 2) not relevant for security. From “User-provided pure functions” here:

Fundamentally, the stack usage of an evaluation is simply not relevant to contract security. If the function uses too many items or produces too few, the rest of the contract will fail and the attempted transaction will be invalid (and if the contract can be frozen by a failing function, we’re doomed with or without an isolated stack). If the stack contains something we don’t want the pure function to modify, we need only rearrange our contract to push it later (or validate it later, if the data comes from unlocking bytecode).

OP_EVAL vs. word definition

Yes! I definitely need to include a section comparing the OP_EVAL CHIP with “proper” Forth-like word definition (also called OP_DEFINE/OP_UNDEFINE/OP_INVOKE in old Libauth branches).

As you pointed out, a full “word definition” upgrade proposal is quite a bit more involved: how and where we track definitions, any necessary limits for those new data structures, whether or not a word can be undefined (we only have OP_0 to OP_16 + maybe OP_1NEGATE!), what makes a valid identifier (only numbers? any single byte? multi-byte?), should we include Forth OP_FETCH/OP_STORE corollaries for data (some discussion of OP_EVAL vs. a TX-wide “data annex” here), and probably many more details.

Fortunately, we have a great argument for avoiding this bikeshed altogether: we can easily prove that OP_EVAL is the optimal construction for single-use evaluations (as you mentioned). Even if a “word definition” upgrade were hammered out and activated in the future, OP_EVAL would still be the most efficient option for many constructions. (This coincidentally was the same argument that made P2SH32 a strong proposal vs. OP_EVAL – even with OP_EVAL, P2SH32 remains the more byte-efficient construction for those use cases.)

As BCH VM bytecode is a concatenative language, a perfectly-optimizing compiler is quite likely to produce single-use evaluations from common pieces of different words/functions, even if the contract author didn’t deliberately design something requiring OP_EVAL (e.g. MAST-like constructions).

So:

  • OP_EVAL is feature-equivalent to word definition (each enables all the same use cases)
  • OP_EVAL typically requires 1 extra byte per invocation, but sometimes saves 1 byte vs. OP_INVOKE.
    • OP_EVAL is 3 bytes (<index> OP_PICK OP_EVAL) for many calls, but some will be optimized to only 1 byte (just OP_EVAL)
    • OP_INVOKE is always 2 bytes (<identifier> OP_INVOKE).
  • OP_EVAL always saves 1 byte per definition by avoiding the OP_DEFINE.
  • OP_EVAL remains optimal for some uses even if a future upgrade added word definition (as a 1-byte optimization for some function calls).
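
The byte accounting in this list can be worked through with a small calculation. This is just a sketch using the per-call and per-definition figures quoted above; the function names are made up, and pushing the function body itself is left out because it costs the same in both schemes.

```python
def eval_bytes(picked_calls: int, direct_calls: int) -> int:
    """Call overhead with OP_EVAL.

    picked_calls: calls encoded as <index> OP_PICK OP_EVAL (3 bytes each)
    direct_calls: calls where the code is already on top (1 byte each)
    """
    return 3 * picked_calls + 1 * direct_calls


def invoke_bytes(calls: int, definitions: int) -> int:
    """Call overhead with word definition.

    Each call is <identifier> OP_INVOKE (2 bytes); each definition
    adds 1 byte for the OP_DEFINE itself.
    """
    return 2 * calls + 1 * definitions


# Single-use evaluation: OP_EVAL is cheaper.
print(eval_bytes(picked_calls=0, direct_calls=1), invoke_bytes(calls=1, definitions=1))

# One function called five times via OP_PICK: word definition is cheaper.
print(eval_bytes(picked_calls=5, direct_calls=0), invoke_bytes(calls=5, definitions=1))
```

With these figures, the break-even for a single function called only via OP_PICK is one call (3 bytes either way); beyond that, word definition saves 1 byte per additional call.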

I’ll get this cleaned up and in the CHIP, thanks!


Yes, that’s it, and we can achieve that with sibling NFTs. Great work analyzing both! With all above, I’m in favor of OP_EVAL, it’s simpler and will be a better fit for us.


This is an interesting claim: an attack on the current consensus rules sounds very dangerous, and your claim that this is a problem is worth looking into.

Notice, btw, that even if it is true that this is a problem, that problem existing in NO WAY means it is OK to make more problems.
So, Calin’s question still needs a proper answer.

Back to your claim of this security issue in current consensus rules you say exists. Can you expand on that?

As far as I know we enabled the ‘push only’ rule in Bitcoin Cash ages ago to avoid exactly this kind of issues.

it’s not a security issue in consensus rules, but security issue is from contract author’s PoV - if you’re not careful with how you write your covenant then users could trick your contract system to “escape” with the funds locked in it

there’s no consensus security issue here - it executes each opcode as it should - whatever security breach is breach from the contract, not from consensus - and that’s on the contract author to worry about

As far as I know we enabled the ‘push only’ rule in Bitcoin Cash ages ago to avoid exactly this kind of issues.

it’s similar in that unauthenticated OP_EVAL could pop all the stack that came before it and mess up any contract code coming after it. That’s why contract writers should be careful how they use it - if running untrusted code then OP_EVAL must come last, and the last opcode before it should be some -VERIFY opcode.

I got a bit confused about why Jason dislikes op_exec so much. And then I looked up the design that NEXA made for that opcode and I understand. That design is retarded.

The benefit of op-exec was always to avoid stack pollution. And you can do that just fine in another way. My specific reason for calling it retarded is that every single calling location seems to require repeating the function definition. But that belongs in the function definition. And the problem with the nexa design is, they don’t have a function definition.

So, here is an idea;

Transaction:
output 0 is my subroutines output.

  • push 1 is maybe an op-return or a new op-subroutines.
  • push 2…- is your first subroutine. It has one op_n indicating the amount of arguments it has followed with the code to execute.

output 1…- are a normal output. They can call the subroutines defined in an earlier output.

So, lets define what kind of op-exec you need to make that work;

op-exec1: this opcode can execute a script pushed in this same output. It takes a single stack item indicating the index of the script. The first pushed script by this output is index 0.

op-exec2: this opcode can execute a script pushed from a different output on this same transaction. But only on lower numbered output indexes than this output. It takes one stack item indicating the output-index, a second stack item to indicate the push index.

op_sub_push: This is essentially a specialized version of op_push_data1 and maybe also a push_data2. It defines that the pushed data is a subroutine. A single byte is expected as the first byte indicating the number of arguments expected.

Stack;
when any exec is called the number of expected arguments is read from the target (earlier pushed) subroutine and the stack is prepared to show ONLY that number of items. To the subroutine the stack is empty after popping those items.
The subroutine is expected to leave one or more items on the stack, but this is optional. We might want to define that the VM pushes a single int indicating the number of stack items that the subroutine left behind for security. Not sure if that is wanted.

EDIT: add op-mast idea:

in my above description the op_sub_push* are used to push in the locking script. This raises the question if we want to be able to push to the subroutine stack in the unlocking (input) script as well. And how that could be done securely. Simplest way to do that is to have the VM allow subroutine pushes in the unlocking script, but one that is unreachable by the actual executed code until it is actively verified and then accepted by the locking script.

So, imagine the following: your unlocking script has some old fashioned pushes and 3 op_sub_push instructions.
The unlocking script gets executed first and those pushes go to the stack while the sub-push instances go to some temp-sub-stack.

The locking script gets executed and it wants access to those entries on the temp-sub-stack, it could use something like op-mast to verify the scripts to migrate the scripts from the temp sub-stack to the real sub-stack, making them available for calling.

We’d have to add a rule that the temp sub-stack has to be empty at script end, just like the stack has to be empty or the evaluation fails.

end brainstorm new exec opcode.


With this design, a single call will basically be 2 bytes only, while providing stack protection. Unless you call the script from a different output, then it’s 3 bytes.

Would that be a better balance to keep the security without wasting bytes?

No it doesn’t, you prepare the stack before the call, you can just provide the code once as input data and then use OP_N OP_PICK <i> <j> OP_EXEC to call the code as many times as you need.

It works the same with OP_EVAL, suppose you place some code in input data once, you could just execute it whenever you want with OP_N OP_PICK OP_EVAL.

How the code ends on stack and whether it will be authenticated is at contract author’s discretion: he could hardcode it in the contract, he could push it as input data, he could read it from another in/out with introspection opcodes, he could use Script to construct it from multiple pieces obtained from any of those places…

That’s what both OP_EVAL and OP_EXEC allow, they’re agnostic to how the code to be executed got on stack.

And here you have the whole implementation of mast: CHIP 2024-12 OP_EVAL: Function Evaluation - #11 by bitcoincashautist


So, your answer that you don’t need to repeat the function definition is that you can copy it instead? I don’t think that is a relevant distinction. You still need to repeat it. It still costs you quite some bytes to do so.

Having a function definition that is known VM wide also makes calling something recursively much easier without having to do weird things with the stack to somehow manage your subroutine.

It costs you only compute but not storage, because you use up some of the operation cost budget to duplicate the stack item, but you don’t increase the script size.

Advantage of OP_EVAL is simplicity and byte efficiency, and most uses we can imagine as Script authors will be evaling pre-authenticated code.

OP_EVAL will help split big contracts to smaller pieces so non-used pieces don’t need to be replicated on each contract use, only their hash(es).

Imagine a contract where there’s some main IF branch and 500 bytes of code behind each. Without OP_EVAL, you’d have to replicate the whole 1000 bytes on each contract invocation even if one IF branch is only used rarely. With OP_EVAL, the contract could be rewritten as a 76-byte locking script (2x32 hashes + 12 bytes of contract logic), and then the spender would provide the 500-byte blob as unlocking data, only for the branch he will use, so each contract use would take only about 576 bytes (76 + 500) rather than 1000.

Unlocking script:

<code to be executed> // must be one of the 2 pre-approved blobs (hash to either hash0 or hash1)

Locking script:

OP_DUP OP_DUP
OP_SHA256 <hash0> OP_EQUAL OP_SWAP
OP_SHA256 <hash1> OP_EQUAL
OP_BOOLOR OP_VERIFY OP_EVAL
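
The check performed by this locking script, and the size arithmetic behind it, can be sketched as follows. The `verify_then_eval` helper and the placeholder branch contents are made up for illustration, and the size figures ignore the few bytes of push overhead for the 500-byte blob.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def verify_then_eval(code: bytes, hash0: bytes, hash1: bytes) -> bytes:
    """Mirror of the locking script's check: return `code` if it may be evaluated."""
    is_branch0 = sha256(code) == hash0      # OP_SHA256 <hash0> OP_EQUAL
    is_branch1 = sha256(code) == hash1      # OP_SHA256 <hash1> OP_EQUAL
    if not (is_branch0 or is_branch1):      # OP_BOOLOR OP_VERIFY
        raise ValueError("code is not one of the pre-approved blobs")
    return code                             # what OP_EVAL would then execute


# Placeholder 500-byte branches (contents are made up for the sketch).
branch0 = b"\x61" * 500
branch1 = b"\x51" * 500
hash0, hash1 = sha256(branch0), sha256(branch1)

locking_size = 2 * 32 + 12                  # two hashes + 12 bytes of logic
spend_with_eval = locking_size + len(branch0)
spend_without_eval = len(branch0) + len(branch1)
print(locking_size, spend_with_eval, spend_without_eval)  # 76 576 1000
```

Any blob that hashes to neither hash0 nor hash1 is rejected before OP_EVAL is reached, so only the two pre-approved branches can ever run.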

If some piece is to be run only once like in this example, then you don’t save anything with the register approach.
Having a dedicated VM-wide register to hold declared subroutines would make it all more complex, as pointed out by Calin above:


OP_EVAL is the simplest and most straight-forward way to get the benefits we’re most interested in.


Now compare it to the op-exec1/exec2 I detailed in comment 9.

None of your arguments make sense anymore with that simple improvement. Good thing is that this basically means we agree that nexa’s op-exec sucks. But let’s not have that as the benchmark. We can absolutely do better.