Brainstorming OP_EVAL

With the VM limits CHIP in place, I think we could now implement OP_EVAL safely. Whatever it executes would count against the same budget as the parent Script, so OP_EVAL couldn’t be used to amplify execution cost, and it couldn’t make Script Turing-complete because it would inevitably hit the total limit set by the VM limits CHIP.

Recall Gavin Andresen’s BIP-0012 definition:

OP_EVAL will re-define the existing OP_NOP1 opcode, and will function as follows:

  • When executed during transaction verification, pops the item from the top of the stack, deserializes it, and executes the resulting script.
  • If there is no item on the top of the stack or the item is not a valid script then transaction validation fails.
  • If there are any OP_CODESEPARATORs in the deserialized script then transaction validation fails.
  • If there are any OP_EVALs in the deserialized script they are also executed, but recursion is limited to a depth of 2.
  • Transaction verification must fail if interpreting OP_EVAL as a no-op would cause the verification to fail.

For example, the script 0x00027551b0 would result in a value of 1 on the stack and pass. How so? Decompiled, that is <0> <0x7551> OP_EVAL, and 0x7551 decompiles to OP_DROP OP_1, so execution would go:

  • push 0 on stack
  • push 0x7551 on stack
  • pop 0x7551 from stack
  • execute 0x7551 with starting stack state: {0}
  • execution ends with stack state: {1}
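The trace above can be reproduced with a toy interpreter (a sketch of the BIP-12 semantics as I read them, not a real Script VM; only the handful of opcodes used in the example are implemented):

```python
# Toy interpreter sketching BIP-12 OP_EVAL semantics. Opcode values match
# Bitcoin Script: 0x00 OP_0, 0x51 OP_1, 0x75 OP_DROP, 0xb0 OP_EVAL
# (the re-defined OP_NOP1). Opcodes 0x01-0x4b push that many raw bytes.
def run(script: bytes, stack: list) -> list:
    i = 0
    while i < len(script):
        op = script[i]
        i += 1
        if op == 0x00:                      # OP_0: push numeric 0
            stack.append(0)
        elif 0x01 <= op <= 0x4b:            # direct push of `op` raw bytes
            stack.append(script[i:i + op])
            i += op
        elif op == 0x51:                    # OP_1: push numeric 1
            stack.append(1)
        elif op == 0x75:                    # OP_DROP: pop the top item
            stack.pop()
        elif op == 0xb0:                    # OP_EVAL: pop subscript,
            run(stack.pop(), stack)         # execute it on the same stack
        else:
            raise ValueError(f"unhandled opcode {op:#04x}")
    return stack

print(run(bytes.fromhex("00027551b0"), []))  # execution ends with stack: [1]
```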

Later, Bitcoin Unlimited proposed OP_EXEC (now active and in use on their Nexa network), which gave more thought to interaction with the stack:

  • OP_EXEC executes a subscript that is presented as data in a script. This subscript is executed in an isolated stack environment – it can neither read nor modify elements on the main or alt stacks. Any illegal operation fails validation of the entire script ( T.o1 ).
  • A zero length stack element is a valid script (that does nothing) ( T.o2 ).
  • As with any scripts, any pops of an empty subscript stack fail, which fails the entire script ( T.o3 ).

Note that a space-efficient implementation does not need to recursively start another script machine instance with new stack objects. It can execute the subscript on the existing main and altstack, with the addition of a barrier to ensure that the subscript does not access (read, write, or pop) more than the provided N params.

I think we could implement this idea in a simpler way: have OP_EVAL “freeze” the bottom N stack elements, so it would be called as

<bytecode_to_execute> <N_freeze> OP_EVAL.

If we apply it to the above example, then 0x0002755151b0 would fail while 0x0002755100b0 would pass, because decompiled they would be:

  • <0> <0x7551> <1> OP_EVAL would fail, because the inner OP_DROP (0x75) would try to access the frozen bottom stack item without permission, and
  • <0> <0x7551> <0> OP_EVAL would pass, because it would have the permission and would succeed at dropping the 0 and replacing it with a 1.
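To make the freeze semantics concrete, here is a toy sketch (my assumption of the proposed behavior, not a spec): OP_EVAL pops <N_freeze>, then the subscript, and the subscript fails if it tries to pop below the bottom N_freeze items:

```python
# Toy sketch of <bytecode_to_execute> <N_freeze> OP_EVAL: the subscript
# runs on the shared stack but may not pop below the freeze barrier.
def run(script: bytes, stack: list, barrier: int = 0) -> list:
    i = 0
    while i < len(script):
        op = script[i]
        i += 1
        if op == 0x00:                      # OP_0
            stack.append(0)
        elif 0x01 <= op <= 0x4b:            # direct push of `op` raw bytes
            stack.append(script[i:i + op])
            i += op
        elif op == 0x51:                    # OP_1
            stack.append(1)
        elif op == 0x75:                    # OP_DROP: must not cross barrier
            if len(stack) <= barrier:
                raise RuntimeError("tried to access a frozen stack item")
            stack.pop()
        elif op == 0xb0:                    # <subscript> <N_freeze> OP_EVAL
            n_freeze = stack.pop()
            subscript = stack.pop()
            run(subscript, stack, barrier=n_freeze)
        else:
            raise ValueError(f"unhandled opcode {op:#04x}")
    return stack

print(run(bytes.fromhex("0002755100b0"), []))   # freeze 0: passes -> [1]
try:
    run(bytes.fromhex("0002755151b0"), [])      # freeze 1: OP_DROP fails
except RuntimeError as e:
    print(e)
```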

Thanks for sharing these!

I made some comments on the BIP 12 approach here:

Re NEXA’s OP_EXEC: it seems like trying to get both 1) an efficient lambda/function evaluation behavior and 2) a “safe” user-provided code sandboxing system out of the same opcode is going to result in a sub-optimal answer for both.

IMO, the most important use case of OP_EVAL is as a function-calling pattern. I expect nearly every sufficiently complex contract to use OP_EVAL at least once (to minimize transaction sizes), and covenants with non-trivial math (esp. crypto) are going to be calling OP_EVAL tens, hundreds, or thousands of times (though never deeper than the 100 depth limit). E.g. applying a crypto group operation requires a 3-byte call to the previously-pushed subroutine: OP_13 OP_PICK OP_EVAL (which can then perform dozens of efficient/cheap operations which don’t need to be duplicated in the encoded transaction).

On the other hand, the sandboxing behavior for OP_EXEC requires another stack item, so we get 1) added protocol complexity and 2) an increase from ~3 to ~5 bytes per-invocation. Also interesting that “pops of an empty subscript stack fail, which fails the entire script” – so the sandboxing behavior is only safe for passing patterns? (Can’t be relied upon for failing patterns, since a somehow-preserved failing pattern could permanently freeze a covenant that is unable to complete the OP_EXEC.)

Can you help me understand some use cases you’re thinking about for this stack freezing behavior? It’s notable here that NEXA doesn’t quite have CashTokens – can the use cases you’re thinking about be solved more efficiently by giving the other party a token, and simply checking for the presence of that token in whatever inputs/outputs needed?

Instead of giving the other party write-access to your contract + a carefully sandboxed evaluation context + internal API, just let them write/control their own real contract(s) – marked by a token you can cheaply identify (~4 bytes)?

(No rush, going to be offline a bit, but if it’s helpful I’ll try to write more here on Monday.)


OP_EXEC better matches the “function-calling pattern” because normally functions have function signatures.

OP_EVAL is more like macro expansion: the callee can mess with the whole program, and there’s nothing the caller can do about it.

Tom summarized it well:

and this is basically the same question.

What abstraction “level” is the aim, and are we internally consistent.
Op_eval is like assembly.
op_exec is like C
OP_mast / op_merkle are much more close to the “rest” of the engine with ScriptVerify and other super specific opcodes.


They do have native tokens with the same functionality as our fungibles, and could emulate our NFT+commitments with their group authorities & subgroups. They also have what you’d call an “edit script” capability, so they don’t have to spend script code to prevent users from breaking out of a covenant.

I didn’t have a particular use-case in mind, just asking for a rationale. This could be a good rationale: OP_EVAL is lighter but harder to contain, however if containment is needed you can let users run their unchecked code on sidecar tokens, thus emulating OP_EXEC. If we expect common uses of eval/exec to be running pre-authenticated Scripts (e.g. OP_DUP OP_SHA256 <hash_of_function> OP_EQUALVERIFY OP_EVAL), then eval is the optimal one. If you want contracts that let spenders run their own callbacks or something, then OP_EXEC is more suitable.

In any case, either will unlock fundamental functionality not really available now (even though some of it can be emulated with side-cars).

I fed Claude Sonnet all 3 specs + this topic + Telegram discussion and asked him to summarize, and he did a great job (thanks Claude!):

Based on all the provided information, here are the key differences between OP_EVAL and OP_EXEC:

  1. Stack Isolation
  • OP_EVAL: Full access to caller’s stack; can modify any stack elements
  • OP_EXEC: Isolated stack environment with explicit parameter passing
  • This is perhaps the most fundamental architectural difference
  2. Function Signatures
  • OP_EXEC: Explicit function-like interface with N_Params and M_Returns
  • OP_EVAL: No formal parameter interface, more like macro expansion
  • Makes OP_EXEC more structured but more verbose
  3. Security Model
  • OP_EXEC: Strong boundaries between caller and called code
  • OP_EVAL: Relies on pre-authentication of code (hash verification)
  • Different approaches to safety vs flexibility trade-off
  4. Implementation Complexity
  • OP_EVAL: Simpler implementation, more direct
  • OP_EXEC: More complex due to isolation mechanisms
  • But both can be implemented efficiently
  5. Byte Efficiency
  • OP_EVAL: More compact (∼3 bytes per call)
  • OP_EXEC: More verbose (∼5 bytes per call)
  • Due to additional parameter count overhead
  6. Use Case Optimization
  • OP_EVAL: Better for pre-authenticated, trusted code reuse
  • OP_EXEC: Better for untrusted code execution scenarios
  • Different sweet spots for different requirements
  7. Abstraction Level
  • OP_EVAL: Lower level, assembly-like
  • OP_EXEC: Higher level, more like traditional functions
  • Reflects different philosophical approaches

The core distinction comes down to OP_EXEC providing stronger guarantees through isolation at the cost of complexity, while OP_EVAL offers simpler, more efficient execution but requires trust or external verification mechanisms.

This shows a fundamental trade-off between safety and simplicity in the design space of code execution primitives.


Thanks @bitcoincashautist for the excellent summary!

Yeah basically with OP_EVAL you are being more efficient but are limiting use-cases to 100% code you control (or can verify a-la p2sh hash checks).

With OP_EXEC you open the door for executing “untrusted” code.

If I had to pick 1, OP_EXEC seems like the more generic solution. And OP_EVAL seems like the “efficiency optimization” for internal code only.

Jason: Do you ever envision contracts needing to execute “untrusted” subroutines? If so – we go with OP_EXEC. If not, we go with OP_EVAL.

Or we go with both and let developers decide.

I think in an ideal world we do both. So long as we are here messing with the guts of the VM, might as well do both.


I should also note: one can do “clever” (hacky?) special-case things like OP_1NEGATE OP_EXEC → it becomes basically OP_EVAL. i.e. if you see -1 right before it, you know the caller declined to use the call stack semantics and it behaves 100% like the proposed OP_EVAL. So in that scheme it becomes:


there’s a precedent in OP_IFDUP


Another approach is to break this up into 2 things:

  • declare an indexed subroutine, specifying its call signature “ahead of time”. Name the op-code maybe OP_DECLARESUB.

  • call said subroutine efficiently by index. Name it maybe OP_CALLSUB

  • OP_DECLARESUB is like OP_EXEC in that you declare a piece of code and a call signature. Unlike OP_EXEC these get “saved” internally into some table the VM is tracking (for reference later).

  • OP_DECLARESUB is not like OP_EXEC in that it doesn’t jump into the code at all immediately. For that you need OP_CALLSUB.

  • OP_CALLSUB just references an index into this table, and the VM “knows” what code you are talking about and what the call signature is. So you don’t need to specify it at each call-site. You just need to provide the args (if any). That’s it. This way you get the most efficient byte encoding if you have multiple calls of the same code.

Advantages:

  • No need to constantly re-molest the stack for every OP_EVAL/OP_EXEC each time. You just call into indexed subroutine. Only 1 (or more) args for OP_CALLSUB are needed. So in terms of efficiency at the call-site for many calls, this is the most efficient requiring only 1 arg (the index) for each call.
  • No more stack molestation/pushing/popping/etc to setup the stack item for the code to be eval’d!
  • For often called code, most efficient possible.

Disadvantages:

  • Less efficient for one-offs, OP_EVAL style executions… since you need to both declare the subroutine and then call it, increasing byte count.
  • More stateful VM since we introduce this abstraction of “known subroutine table” which now needs to be tracked…

Observation(s):

  • Philosophically the above goes in the direction of CISC (vs RISC), adding higher abstraction to the VM rather than the bare minimum. Technically one could implement this (less safely and less compactly) with just OP_EVAL in Script itself. This is definitely CISC. Not RISC. Not sure if this is an advantage or disadvantage, just an observation.

if you have a single transaction with multiple output scripts, can we imagine a case where you want to copy / run the subroutine from another output?


I finally wrote a longer review of OP_EXEC vs OP_EVAL here: CHIP 2024-12 OP_EVAL: Function Evaluation - #14 by bitjson

To summarize: an optimized version of OP_EXEC still wastes several bytes per unique function definition, and stack isolation offers no security value, even for contrived use cases requiring evaluation of arbitrary “untrusted” code (see “User-provided pure functions”).

@bitcoincashautist would you mind fleshing out this example a bit more? Is it adequately covered by “Post-funding assigned instructions” here?

We actually don’t need OP_EVAL or OP_EXEC to open that door – contracts (and especially covenants) are precisely this sort of untrusted environment already. If an attacker can fool your contract by giving you 5 bytes where you expected 4, the last byte may very well be interpreted as an instruction (or part of the next data element) when encoded in the next locking bytecode.

I’ll continue to be interested in hearing any ideas about how a more “built-in” stack isolation feature might be useful, but for now, I expect OP_EVAL is still superior for “untrusted” code too.

Stack isolation is just 1) easy already and/or 2) not relevant for security. From “User-provided pure functions” here:

Fundamentally, the stack usage of an evaluation is simply not relevant to contract security. If the function uses too many items or produces too few, the rest of the contract will fail and the attempted transaction will be invalid (and if the contract can be frozen by a failing function, we’re doomed with or without an isolated stack). If the stack contains something we don’t want the pure function to modify, we need only rearrange our contract to push it later (or validate it later, if the data comes from unlocking bytecode).

OP_EVAL vs. word definition

Yes! I definitely need to include a section comparing the OP_EVAL CHIP with “proper” Forth-like word definition (also called OP_DEFINE/OP_UNDEFINE/OP_INVOKE in old Libauth branches).

As you pointed out, a full “word definition” upgrade proposal is quite a bit more involved: how and where we track definitions, any necessary limits for those new data structures, whether or not a word can be undefined (we only have OP_0 to OP_16 + maybe OP_1NEGATE!), what makes a valid identifier (only numbers? any single byte? multi-byte?), should we include Forth OP_FETCH/OP_STORE corollaries for data (some discussion of OP_EVAL vs. a TX-wide “data annex” here), and probably many more details.

Fortunately, we have a great argument for avoiding this bikeshed altogether: we can easily prove that OP_EVAL is the optimal construction for single-use evaluations (as you mentioned). Even if a “word definition” upgrade were hammered out and activated in the future, OP_EVAL would still be the most efficient option for many constructions. (This coincidentally was the same argument that made P2SH32 a strong proposal vs. OP_EVAL – even with OP_EVAL, P2SH32 remains the more byte-efficient construction for those use cases.)

As BCH VM bytecode is a concatenative language, a perfectly-optimizing compiler is quite likely to produce single-use evaluations from common pieces of different words/functions, even if the contract author didn’t deliberately design something requiring OP_EVAL (e.g. MAST-like constructions).

So:

  • OP_EVAL is feature-equivalent to word definition (each enables all the same use cases)
  • OP_EVAL typically requires 1 extra byte per invocation, but sometimes saves 1 byte vs. OP_INVOKE.
    • OP_EVAL is 3 bytes (<index> OP_PICK OP_EVAL) for many calls, but some will be optimized to only 1 byte (just OP_EVAL)
    • OP_INVOKE is always 2 bytes (<identifier> OP_INVOKE).
  • OP_EVAL always saves 1 byte per definition by avoiding the OP_DEFINE.
  • OP_EVAL remains optimal for some uses even if a future upgrade added word definition (as a 1-byte optimization for some function calls).

I’ll get this cleaned up and in the CHIP, thanks!


Yes, that’s it, and we can achieve that with sibling NFTs. Great work analyzing both! With all above, I’m in favor of OP_EVAL, it’s simpler and will be a better fit for us.


This is an interesting claim: an attack on the current consensus rules that sounds very dangerous, and your claim that this is a problem is worth looking into.

Notice, btw, that even if this is a real problem, its existence in NO WAY means it is OK to make more problems.
So, Calin’s question still actually needs a proper answer.

Back to your claim of this security issue in current consensus rules you say exists. Can you expand on that?

As far as I know we enabled the ‘push only’ rule in Bitcoin Cash ages ago to avoid exactly this kind of issues.

it’s not a security issue in consensus rules, but a security issue from the contract author’s PoV - if you’re not careful with how you write your covenant, then users could trick your contract system to “escape” with the funds locked in it

there’s no consensus security issue here - it executes each opcode as it should - whatever security breach is breach from the contract, not from consensus - and that’s on the contract author to worry about

As far as I know we enabled the ‘push only’ rule in Bitcoin Cash ages ago to avoid exactly this kind of issues.

it’s similar in that an unauthenticated OP_EVAL could pop all the stack that came before it and mess up any contract code coming after it. That’s why contract authors should be careful how they use it - if running untrusted code, then OP_EVAL must come last, and the last opcode before it should be some -VERIFY opcode.

I got a bit confused about why Jason dislikes op_exec so much. And then I looked up the design that NEXA made for that opcode and I understand. That design is retarded.

The benefit of op-exec was always to avoid stack pollution. And you can do that just fine in another way. My specific reason for calling it retarded is that every single calling location seems to require repeating the function definition. But that belongs in the function definition. And the problem with the nexa design is, they don’t have a function definition.

So, here is an idea;

Transaction:
output 0 is my subroutines output.

  • push 1 is maybe an op-return or a new op-subroutines.
  • pushes 2 and onward are your subroutines. Each has one op_n indicating the number of arguments it takes, followed by the code to execute.

outputs 1 and onward are normal outputs. They can call the subroutines defined in an earlier output.

So, lets define what kind of op-exec you need to make that work;

op-exec1: this opcode can execute a script pushed in this same output. It takes a single stack item indicating the index of the script. The first script pushed by this output is index 0.

op-exec2: this opcode can execute a script pushed from a different output in this same transaction, but only from lower-numbered output indexes than this output. It takes one stack item indicating the output index and a second stack item indicating the push index.

op_sub_push: this is essentially a specialized version of op_push_data1 (and maybe also a push_data2). It declares that the pushed data is a subroutine. A single byte is expected as the first byte, indicating the number of arguments expected.

Stack;
when any exec is called, the number of expected arguments is read from the target (earlier-pushed) subroutine, and the stack is prepared to show ONLY that number of items. To the subroutine, the stack is empty after popping those items.
The subroutine is expected to leave one or more items on the stack, but this is optional. We might want to define that the VM pushes a single int indicating the number of stack items the subroutine left behind, for security. Not sure if that is wanted.
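The call semantics described above can be sketched like this (the `call_sub` helper and its names are mine, purely illustrative): the VM pops the declared number of arguments, runs the subroutine on a fresh isolated stack, and whatever it leaves behind flows back to the caller:

```python
# Hypothetical sketch of the described OP_CALLSUB-style semantics: the
# subroutine sees ONLY its declared arguments; the rest of the caller's
# stack is invisible to it. Subroutine bodies are Python callables here,
# standing in for earlier-pushed Script bytecode.
def call_sub(subroutines: list, index: int, caller_stack: list) -> None:
    n_args, body = subroutines[index]       # declared via op_sub_push
    if len(caller_stack) < n_args:
        raise RuntimeError("not enough arguments on the stack")
    # move exactly n_args items onto a fresh, isolated stack
    sub_stack = caller_stack[-n_args:] if n_args else []
    del caller_stack[len(caller_stack) - n_args:]
    body(sub_stack)                         # run the subroutine's code
    caller_stack.extend(sub_stack)          # returned items flow back

# example: a 2-argument "add" subroutine, declared once, called by index
subs = [(2, lambda s: s.append(s.pop() + s.pop()))]
stack = ["secret", 3, 4]                    # "secret" stays invisible
call_sub(subs, 0, stack)
print(stack)                                # -> ['secret', 7]
```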

EDIT: add op-mast idea:

in my above description the op_sub_push* are used to push in the locking script. This raises the question of whether we want to be able to push to the subroutine stack in the unlocking (input) script as well, and how that could be done securely. The simplest way is to have the VM allow subroutine pushes in the unlocking script, but ones that are unreachable by the actually executed code until actively verified and then accepted by the locking script.

So, imagine the following: your unlocking script has some old fashioned pushes and 3 op_sub_push instructions.
The unlocking script gets executed first and those pushes go to the stack while the sub-push instances go to some temp-sub-stack.

The locking script gets executed, and if it wants access to those entries on the temp sub-stack, it could use something like op-mast to verify the scripts and migrate them from the temp sub-stack to the real sub-stack, making them available for calling.

We’d have to add a rule that the temp sub-stack has to be empty at script end, just like the stack has to be empty or the evaluation fails.

end brainstorm new exec opcode.


With this design, a single call will basically be 2 bytes only, while providing stack protection. Unless you call the script from a different output, then it’s 3 bytes.

Would that be a better balance to keep the security without wasting bytes?

No it doesn’t: you prepare the stack before the call. You can just provide the code once as input data and then use OP_N OP_PICK <i> <j> OP_EXEC to call the code as many times as you need.

It works the same with OP_EVAL, suppose you place some code in input data once, you could just execute it whenever you want with OP_N OP_PICK OP_EVAL.

How the code ends on stack and whether it will be authenticated is at contract author’s discretion: he could hardcode it in the contract, he could push it as input data, he could read it from another in/out with introspection opcodes, he could use Script to construct it from multiple pieces obtained from any of those places…

That’s what both OP_EVAL and OP_EXEC allow, they’re agnostic to how the code to be executed got on stack.
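The reuse pattern can be sketched with toy stack ops (not real VM code; the code blob is a Python callable standing in for Script bytecode): the code sits on the stack once, and each call copies it with OP_PICK before OP_EVAL, so the bytes are never repeated:

```python
# Toy sketch of the <n> OP_PICK OP_EVAL reuse pattern: push the code once,
# then copy-and-eval it as many times as needed without repeating the bytes.
def op_pick(stack: list, n: int) -> None:
    stack.append(stack[-1 - n])             # copy item n back from the top

def op_eval(stack: list) -> None:
    code = stack.pop()                      # pop the code item...
    code(stack)                             # ...and execute it on the stack

double = lambda s: s.append(s.pop() * 2)    # the reusable "subroutine"

stack = [double, 5]
op_pick(stack, 1); op_eval(stack)           # first call:  5 -> 10
op_pick(stack, 1); op_eval(stack)           # second call: 10 -> 20
print(stack[-1])                            # -> 20
```

Each extra call costs only the OP_PICK/OP_EVAL bytes plus operation-cost budget, not extra script size.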

And here you have the whole implementation of mast: CHIP 2024-12 OP_EVAL: Function Evaluation - #11 by bitcoincashautist


So, your answer that you don’t need to repeat the function definition is that you can copy it instead? I don’t think that is a relevant distinction. You still need to repeat it. It still costs you quite some bytes to do so.

Having a function definition that is known VM wide also makes calling something recursively much easier without having to do weird things with the stack to somehow manage your subroutine.

It costs you only compute but not storage, because you use up some of the operation cost budget to duplicate the stack item, but you don’t increase the script size.

Advantage of OP_EVAL is simplicity and byte efficiency, and most uses we can imagine as Script authors will be evaling pre-authenticated code.

OP_EVAL will help split big contracts into smaller pieces, so unused pieces don’t need to be replicated on each contract use, only their hash(es).

Imagine a contract with some main IF branch and 500 bytes of code behind each side. Without OP_EVAL, you’d have to replicate the whole 1000 bytes on each contract invocation even if one IF branch is only rarely used. With OP_EVAL, the contract could be rewritten as a 76-byte locking script (2x32 hashes + 12 bytes of contract logic), and then the spender would provide the 500-byte blob as unlocking data, only for the branch he will use - so each contract use would take only 570 bytes rather than 1000.

Unlocking script:

<code to be executed> // must be one of the 2 pre-approved blobs (hash to either hash0 or hash1)

Locking script:

OP_DUP OP_DUP
OP_SHA256 <hash0> OP_EQUAL OP_SWAP
OP_SHA256 <hash1> OP_EQUAL
OP_BOOLOR OP_VERIFY OP_EVAL
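In Python pseudocode, the hash-gating logic of that locking script boils down to the following sketch (the OP_EVAL step itself is elided; the blob bytes are just an example):

```python
import hashlib

# Sketch of the locking script's logic: accept the spender-provided blob
# only if it hashes to one of the two pre-committed values, then eval it.
def spend(code: bytes, hash0: bytes, hash1: bytes) -> bool:
    h = hashlib.sha256(code).digest()
    if h != hash0 and h != hash1:           # OP_BOOLOR OP_VERIFY fails
        return False
    # ... OP_EVAL would execute `code` here ...
    return True

branch_a = bytes.fromhex("7551")            # example blob: OP_DROP OP_1
h0 = hashlib.sha256(branch_a).digest()      # hash committed in the contract
h1 = hashlib.sha256(b"\x51").digest()       # hash of the other branch
print(spend(branch_a, h0, h1))              # -> True  (pre-approved blob)
print(spend(b"\x00", h0, h1))               # -> False (unapproved code)
```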

If some piece is to be run only once like in this example, then you don’t save anything with the register approach.
Having a dedicated VM-wide register to hold declared subroutines would make it all more complex, as pointed out by Calin above:


OP_EVAL is the simplest and most straight-forward way to get the benefits we’re most interested in.


Now compare it to the op-exec1/exec2 I detailed in comment 9.

None of your arguments make sense anymore with that simple improvement. Good thing is that this basically means we agree that nexa’s op-exec sucks. But let’s not have that as the benchmark. We can absolutely do better.