CHIP 2024-12 OP_EVAL: Function Evaluation

You can’t blame the opcode for user mistakes, and what you are doing is spreading FUD.

Consider this example, which I consider to be the same kind of thing, and which is fully possible today.
Someone locks funds with the following locking script:
[...] OP_INPUTINDEX OP_UTXOTOKENCOMMITMENT <commitment> OP_EQUALVERIFY [...]
This makes sure that the funds can only be unlocked if the user provides an NFT with the specified commitment as an input. The problem? Anyone who sees this locking script can create their own token category with an NFT bearing this specific commitment. Clearly a user error, since the locking contract should contain further checks (like the category). This does not make CashTokens unsafe.

3 Likes

I consider this to be true already today.
Consider this trivial example, where there is no way of knowing how many calls to OP_SHA256 will be made without actually running the code with some input.

It’s bounded by the number of OP_IFs, though; any looping/recursion construct would make it even more obfuscated.

OP_SHA256
OP_DUP
<2> OP_MOD
OP_IF
  OP_SHA256
  OP_DUP
  <2> OP_MOD
OP_ELSE
  // Do something
OP_ENDIF

OP_IF
  OP_SHA256
  OP_DUP
  <2> OP_MOD
OP_ELSE
  // Do something else
OP_ENDIF
// Continue this construct
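To make the point concrete, here is a toy Python model of the script above (my own simplification for illustration; the branch semantics and the `count_sha256_calls` name are not real VM code): the number of SHA256 invocations depends on the data being hashed, so it can only be discovered by actually running the script.

```python
import hashlib

def count_sha256_calls(data: bytes, if_blocks: int = 2) -> int:
    """Toy model of the script above: one unconditional OP_SHA256, then up
    to `if_blocks` conditional rounds gated on the parity of the previous
    hash (the <2> OP_MOD / OP_IF pattern). Branch details are simplified."""
    item = hashlib.sha256(data).digest()  # the initial OP_SHA256
    calls = 1
    for _ in range(if_blocks):
        # <2> OP_MOD: look at the hash's parity; OP_IF hashes again on truthy
        if item[0] % 2 == 1:
            item = hashlib.sha256(item).digest()
            calls += 1
        else:
            break  # OP_ELSE branch: "do something" without hashing
    return calls

# Different inputs take different paths, so the call count is data-dependent:
counts = {count_sha256_calls(bytes([i])) for i in range(64)}
```

Running this over many inputs yields several distinct call counts, which is exactly why a static analyzer can only compute an upper bound here (1 + the number of OP_IF blocks), never the actual count.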

Clarification:

If OP_EVAL is available, then any high-cost operation (like OP_SHA256) is reachable, since it is part of the VM that executes the code. Whether it’s executed or not, and how many times, could be impossible to tell by static analysis. But that is also true for (certain) scripts today!

5 Likes

Some chat about OP_EVAL on the BLISS technical panel.

I said @emergent_reasons was quite concerned about the static analysis disruption part of it, but he later clarified that he was more concerned about something to do with attacks. I’m hoping he can explain that more because I’m obviously not across it just yet.

2 Likes

I’ve read through the proposals. FWIW I have no interest in complicated stack protection mechanisms. The simplicity of eval is powerful and elegant. As Jeremy mentioned though, I am super concerned about the increased attack surface created by “non-local” code that the compiler/reviewer cannot fully reason about at the time of UTXO creation, via the vehicle of unexpected stack interaction (a different class from sidecar interaction AFAIU).

Has anyone made a proposal that is exactly just op_eval, but where the blob to execute is required to be locally defined in the input and immutable (if executed)? Yes it would reduce absolute byte efficiency, but honestly I DGAFFFFF about byte efficiency vs. unknown unknown attack surface (0-days on massive protocols arising from unexpected edge-case stack interactions, even with the best of intentions). From another angle, until some massive and concrete upside appears beyond input-local code compression, I don’t see a need for open-ended op_eval when input-defined op_eval accomplishes great byte-efficiency gains for complex scripts through compression.

Maybe there is some globally hashed function that everyone would like to use and we can audit once but… really? That sounds like something that won’t actually happen in practice and you just lose a few bytes if you do it input-defined anyway - those function definitions and audits can just as well happen at a high level library/compiler level rather than at the op_code level.

TLDR: can I haz OP_EVAL except:

  1. input-local definition
  2. anything executed can’t have been mutated (could be accomplished with clean DX through local definition of custom op_code macros?)
  3. dgaf about stack protection - that’s a losing battle IMO at the op_code level

Is that a bad idea? Does it eliminate some important and concrete use case? Does it not work well with P2S?

Then (or in the meantime) if we really do identify concrete, powerful, huge benefits for raw OP_EVAL, then there’s nothing stopping the additional unlock step. However going backwards will be practically impossible. One way streets and all that.

Note that I’m also aware that through dynamic input lock script construction, the contract author may still not be able to fully reason about the logic at creation, but those cases will require significant effort to create. I’m happy to have a barrier to dangerous things that only gets crossed if there is proof positive of value on the other side that would then encourage the raw OP_EVAL.

2 Likes

Yes.

1 Like

Thanks for this. Just a follow-up question: in the current state, would this mean we could only do this safely for a single user-provided function (e.g. OP_OUTPUTTOKENCOMMITMENT OP_EVAL), due to not being able to effectively safeguard what it leaves on the stack?

I may’ve missed the implication of the OP_*VERIFY, but my other thought is that this pattern could possibly work for multiple functions if we used OP_DEPTH (which pushes the current number of stack items onto the stack), as we could then verify that the OP_EVAL’d code leaves the correct/expected number of stack items. E.g.

// ... Prior code that leaves us with a blank stack

// Evaluate token program at Output 0 
<0> OP_OUTPUTTOKENCOMMITMENT OP_EVAL

// Drop token program from Output 0 from stack.
// OP_NIP // EDIT: Not necessary because OP_EVAL pops the program

// Ensure that we're left with the expected number of stack items.
// In this example: 1
OP_DEPTH <1> OP_EQUALVERIFY

// Evaluate token program at Output 1 
<1> OP_OUTPUTTOKENCOMMITMENT OP_EVAL

// Drop token program from Output 1 from stack.
// OP_NIP // EDIT: Not necessary because OP_EVAL pops the program

// Ensure that we're left with the expected number of stack items.
// In this example: 2 (one item left by each evaluated program)
OP_DEPTH <2> OP_EQUALVERIFY

// ... etc

I think the importance would then become that the Stack maintains a blank (or known) state before any of the user-provided functions are executed?

Apologies if I’m way off the mark and this is already achievable.

EDIT: Fixed mistakes in example.
EDIT: Didn’t realize there’s an OP_DEPTH which does what my hypothetical OP_STACKSIZE was trying to do.

If I understand correctly, this is half of the picture.

  1. I really don’t care about this “static analysis” thing. I mean it’s kinda nice, but I really DGAF about it, unless some major player is already using it for some specific purpose - then we just find a way to help them do whatever that is in a new way. Additionally, static analysis requires construction of abstractions and I can’t imagine it ends up being cheaper to do than just executing a script.
  2. It’s only immutability right? That’s good, half the picture.
  3. I think the missing part is the input-defined aspect. That’s crucial to reducing the attack surface of unknown unknown stack interactions (0-days) that are possible even when cashscript is working perfectly, code is written well.

I.e. this isn’t about bugs or incorrect compiler behavior, but about avoiding giving easy access to a whole new large and very hard to reason about attack surface. Think about the incredible maturity of C++ and other language toolchains, yet we still have 0-days coming up on an ongoing basis. It’s not because of bad programming or bad compilers. It’s because of very motivated people finding tiny dark corners that lead to profit or chaos.

This limited eval can easily be superseded later by raw eval if a really convincing case is made in the future.

1 Like

That’s crucial to reducing the attack surface of unknown unknown stack interactions (0-days) that are possible even when cashscript is working perfectly, code is written well (i.e. this isn’t about bugs or incorrect compiler behavior).

Are you able to try to think up an example of this? Assuming no bugs in the CashScript compiler (and that an eval function is not exposed), I can’t see a scenario where this could happen.

My concern here would be that the alternative of only allowing execution of code that’s local to the input’s Unlocking Bytecode would probably introduce implementation complexity that actually heightens the risk of a 0-day.

2 Likes

I can’t. It’s kinda the nature of 0-days - 10 people can look at the code and not notice some double edge case interaction. Maybe a security researcher could?

  • There are assumptions about the state of the stack during input execution
  • There are assumptions about how “remote” code executes, leading to assumptions about the state of the stack after input execution
  • There are assumptions about the remaining code to be executed after completion of eval
  • I’m sure there are other details

Taken alone, they can be reasoned about. For example with input-defined blobs, the compiler can reason strongly about all the possibilities (this IS an appropriate place for static analysis - at compile time, not at tx validation time). But with “remote” code coming from outside, the compiler has to step into a higher level of abstraction unless you restrict it with authorized code hashes or something like that - but the unauthorized version is still there and will haunt us. Why invite that into our house?

Like… would you ever write a program that accepts and evals a code/data blob from an API endpoint and then literally injects it into your program’s execution path? Even if you can write whatever validation code you want about that blob before you run it (which is harder for us with limited space)? I can’t imagine wanting to do that. And to make sure there is clarity here - I’m not talking about data/code separation per se, although it turns into that in this particular example.

Regarding concern about implementation complexity - I really don’t see it. It’s not complex at all. Especially because, at least in what I’m suggesting, there’s no attempt at quixotic (IMO) stack protection.

To clarify a point - I’m talking about contract-based 0-days, not something that breaks out of the VM.

1 Like

Based on discussion elsewhere, apparently I need to clarify that my concerns are about usages of EVAL outside the window of what most people are probably considering acceptable. Because regardless of how it’s intended to be used, it works the way it works, including all the things that are soft-considered “unacceptable”. It will surely be used in those ways.

A quote from discussion:

It might be worth articulating the point of view that you’re concerned about “OP_EVAL being used unsafely for callbacks/user-provided-functions/remote-code” in contracts […] to shave some time for others just so that it doesn’t get muddled.

:pray:

That’s why I’m proposing to sacrifice a bit of simplicity and byte-efficiency at the very-well-controlled VM level in order to reduce unexpected outcomes in the wild west of contracts managing billions of dollars.

Additionally, nothing about this would block eventually a raw OP_EVAL if it becomes apparent that is in fact super valuable to allow those remote code / whatever use cases.

1 Like

It’s the full picture. We just add 1 bit of state to stack items. They get it only if explicitly pushed via a data-push op. The bit is preserved/inherited only by stack opcodes (dup, swap, roll, etc.) and cleared by all others (cat, split, etc.).

Putting something on stack via introspection DOES NOT get executable bit.

So we preserve input locality, but you can still push something via input’s data push, then dup it and have it verified against something on another input or output, and then execute the item still having the bit from the input-local push.

In the future we may want to revise this: we could have introspection opcodes also set the bit, but still prohibit mutations (modification loses the bit). This would make it TX-local (you could place some eval script in an op_return and have all your inputs call it without replicating it).
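The rule set can be sketched in a few lines of Python (my own illustrative model, not any implementation; all names here are hypothetical): data pushes set the bit, pure stack ops preserve it, transformations and introspection results don’t carry it, and eval requires it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    """Stack item carrying the proposed 1-bit 'executable' flag."""
    data: bytes
    executable: bool = False

def push_data(blob: bytes) -> Item:
    # Explicit data pushes in the input's bytecode set the bit.
    return Item(blob, executable=True)

def introspect(blob: bytes) -> Item:
    # Introspection results (e.g. OP_UTXOTOKENCOMMITMENT) do NOT get the bit.
    return Item(blob, executable=False)

def op_dup(item: Item) -> Item:
    # Pure stack ops (dup/swap/roll/...) preserve the bit.
    return Item(item.data, item.executable)

def op_cat(a: Item, b: Item) -> Item:
    # Any mutation (cat/split/...) clears the bit.
    return Item(a.data + b.data, executable=False)

def op_eval(item: Item) -> None:
    # OP_EVAL fails the script unless the item carries the bit.
    if not item.executable:
        raise ValueError("OP_EVAL: item not executable")
    # ... evaluate item.data as bytecode here ...
```

This is why the cross-input verification pattern still works: dup the pushed item, verify the copy against data on another input or output, then eval the original, which kept its bit from the input-local push.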

2 Likes

Yes. Basically:

  1. Do your thing with main program, commit part of stack that must not be mutated by user-provided code.
  2. Let user-provided code run and do its thing.
  3. Verify stack state matches the state committed in 1., continue with the main program.

Example: the code in 3 needs to operate on the results of 1 and 2, but we must prevent 2 from modifying the result of 1. What do we do?

// 1. run some code that ends with 1 stack item
{some code}
// Commit the result to an opreturn (creator of tx must set it so it matches the result)
OP_DUP <0> OP_OUTPUTBYTECODE <2> OP_SPLIT OP_NIP OP_EQUALVERIFY

// 2. Evaluate some user-provided code
<0> OP_UTXOTOKENCOMMITMENT OP_EVAL

// 3. Verify it added just 1 stack item...
OP_DEPTH <2> OP_EQUALVERIFY
//  ...and didn't mess with result of 1.
OP_SWAP OP_OUTPUTBYTECODE <2> OP_SPLIT OP_NIP OP_EQUALVERIFY
{some more code that does something with results of 1. and 2.}
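The three-step pattern can be simulated in Python (a toy model; the OP_RETURN commitment is modeled as a hash captured before the user code runs, and the hypothetical `user_fn` stands in for the evaluated user-provided code):

```python
import hashlib

def run_with_untrusted(main_result: bytes, user_fn) -> tuple[bytes, bytes]:
    stack = [main_result]
    # Step 1: commit the main program's result before user code runs.
    commitment = hashlib.sha256(main_result).digest()
    # Step 2: the untrusted user-provided code gets free rein over the stack.
    user_fn(stack)
    # Step 3a: verify depth (the OP_DEPTH <2> OP_EQUALVERIFY check above)...
    if len(stack) != 2:
        raise ValueError("user code left the wrong number of stack items")
    # Step 3b: ...and verify the committed item survived untouched (the
    # OP_SWAP / OP_OUTPUTBYTECODE / OP_EQUALVERIFY check above).
    if hashlib.sha256(stack[0]).digest() != commitment:
        raise ValueError("user code mutated the committed result")
    return stack[0], stack[1]
```

A well-behaved `user_fn` that appends exactly one item passes; one that adds extra items or rewrites the committed result fails validation, mirroring how the script would fail the transaction.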
1 Like

I think I get it now. Smart! Quite simple also. And if we ever add op_eval, it can use exactly the same code path except ignore the bit. Right?

2 Likes

It’s all just op_eval, with/without the executable bit tracking and requirement. So, yes it can later use the same code path except ignore the bit.

To clarify, this is meant for having op_eval but with the extra rule that it can eval only stack items that have the executable bit. It would fail the script if it tried to eval a stack item without the bit.

Later, if we wanted to allow cross-input eval or mutable eval scripts, we could extend the bit to be set for results of introspection, too, or remove the executable bit tracking altogether and not require it by op_eval.

1 Like

I was asked in another thread to comment on OP_DEFINE / OP_INVOKE vs OP_EVAL so will respond here.

In general I think the evaluation feature is an immensely valuable addition to the bytecode language. It brings functions and recursion which are essential building blocks. Together with loops it completes the key features of a simple but expressive language. And these features are fully constrained by the VM sandbox and VM limits. After adding OP_EVAL to albaVm it became simple to implement functions such as exponentiation, merge sort, and basic elliptic curve multiplication.

I prefer OP_EVAL (possibly combined with OP_PUSH_EXECUTABLE) over OP_DEFINE / OP_INVOKE. There is a deeper, more fundamental difference between these two solutions for function evaluation, even though they look similar: OP_EVAL fits into our current stack-based model of computation, whereas OP_DEFINE / OP_INVOKE extends it by introducing global state. (Jason also brings up this topic in his CHIP.) The global state is assign-once, but even so it changes things around a bit. A couple of examples:

1)

With OP_DEFINE / OP_INVOKE, expressions involving function definitions are no longer self-contained. You can paste the following expression, involving the pow function (recursively implemented using OP_EVAL), anywhere into your own code where an integer is expected and it should work (e.g. in bitauth IDE 2026):

... (your code)
<2>
<16>
<0x3178009c635177777767785297009c63527952795296527976627695777777675279537953795194537976629577777768687662>
OP_EVAL
... (your code)

Had it been implemented using OP_DEFINE / OP_INVOKE, the above evaluation would fail if pow used a function slot that was already occupied. This makes interactive bitauth/REPL use more complicated and is, as far as I can see, a divergence from what we have today.

This also has implications, for example, when sharing libraries of compiled functions between tools: we would need a linker to patch up function definitions from separately compiled modules so that their function-slot usage does not overlap, instead of just bringing them in as is.

2)

OP_DEFINE / OP_INVOKE allows the function table to be used as a global assign-once array. A value can be assigned to a slot in the table by assembling a function that returns the value and OP_DEFINEing it to that slot. Two expressions at separate places in a program may have an agreement to pass data via function slot x. This way an expression is no longer only a function of its arguments on the stack, but also has access to global state calculated somewhere else in the program.
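As an illustration of that point, here is a toy Python model (purely illustrative, not VM code): a slot holding a constant function behaves exactly like a write-once global variable shared between otherwise unrelated expressions.

```python
# Assign-once function table, modeling OP_DEFINE / OP_INVOKE slots.
table: dict[int, bytes] = {}

def op_define(slot: int, body: bytes) -> None:
    if slot in table:
        raise ValueError("function slot already defined")  # assign-once
    table[slot] = body

def op_invoke(slot: int) -> bytes:
    # "Invoking" a constant function just returns the stored value, so the
    # slot acts as global state rather than a function of stack arguments.
    return table[slot]

# Expression A, somewhere in the program, stores a computed value...
op_define(7, b"\x2a")
# ...and expression B, with nothing on the stack, can still read it:
value = op_invoke(7)
```

This is the sense in which an expression is no longer only a function of its stack arguments: two expressions merely need an out-of-band agreement on slot number 7.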


My sense is that we should continue to keep the bytecode language purely stack based and not also introduce global state. If we want to explicitly call out all “lambda creation sites” then I prefer the eval-bit suggestion by @bitcoincashautist/ @im_uname ( https://github.com/bitjson/bch-functions/issues/2). Although, currently my overall preference is to just have OP_EVAL on its own.

3 Likes

This is a great argument, thanks for joining and bringing it up! It also complicates cross-input code sharing: when some other input defines functions inside functions and you need the whole thing, you can’t just slice it out and execute it from the running input’s context, because slots could clash.

Which version? The “only push opcodes set the bit” version has the problem in that you can’t reuse code from other inputs.

We’d need an OP_PUSH_EXECUTABLE as you suggested. It could actually work the same as OP_DEFINE but instead adding to the table it just sets the bit on the top stack item.

Or we need a 3rd stack for executable blobs, where OP_DEFINE pushes new definitions, <n> OP_INVOKE executes the n-deep item on the executable stack, and <m> OP_UNDEFINE clears the m top items. The stack wouldn’t have to be left empty when the main script finishes; the purpose of UNDEFINE is to allow inner scripts to clean up so they don’t mess up their callers’ depth references.
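A rough Python sketch of that third-stack idea (the semantics here are my own guess at the shape of the proposal, purely illustrative): definitions live on their own stack, INVOKE addresses them by depth from the top, and UNDEFINE lets an inner script remove its own definitions so the caller’s depth references stay valid.

```python
# Separate stack holding executable blobs.
defs: list[bytes] = []

def op_define(body: bytes) -> None:
    defs.append(body)

def op_invoke(n: int) -> bytes:
    # Execute (here: just fetch) the item n-deep on the definition stack.
    return defs[-1 - n]

def op_undefine(m: int) -> None:
    # Drop the top m definitions, restoring the caller's view of the stack.
    del defs[len(defs) - m:]

op_define(b"outer")
op_define(b"inner")          # an inner script adds its own definition...
assert op_invoke(0) == b"inner"
assert op_invoke(1) == b"outer"
op_undefine(1)               # ...and removes it before returning,
assert op_invoke(0) == b"outer"  # so the caller's depth-0 reference holds
```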

Thank you for reviewing!

I appreciate this and agree with the general principle. :pray: I want to note that this specific topic – function/“word” definition – is a spot where our model actually diverges from Forth dialects (and most concatenative and stack-based languages). Functions can of course be handled exclusively via stack scheduling and stack juggling (or just duplicated, as we do today), but the VM state in question (the “wordlist”) is generally considered a core element of stack-based models.

Fun, thanks for sharing! Have you compared the bytecode length and opCost of the OP_BEGIN/OP_UNTIL equivalent? (And/or vs. more efficient pow algorithms?) I’ve found loops are usually more efficient for the internal implementations, with functions primarily for high-level factoring e.g. CHIP 2024-12 OP_EVAL: Function Evaluation - #46 by bitjson

I expect many compilers will optimize index assignment, as it’s very easy (even via statically-applied transformations) to save bytes by assigning OP_0 through OP_16 to the most-commonly-encoded functions (not necessarily the same as most-commonly-invoked, and inlined functions don’t need an assignment at all). In this case though – assuming optimization isn’t important, no inlining, and/or a loop implementation isn’t preferred – the lambda could accept index(es) to use or even pass defined function identifiers, in addition to (lambda) function bodies (like OP_EVAL). (Aside: you might find bch-wizard useful.)

Certainly possible, the mutable version of that is described in Rationale: Immutability of Function Bodies. It’s quite inefficient vs. optimal stack scheduling though. Today’s equivalent is essentially what CashScript does already: naive deep picking from a stack area that the contract treats as a set of global registers.

2 Likes

Withdrawing OP_EVAL

Hi everyone,

On this BCH podcast we discussed why I think function support is important for Bitcoin Cash contracts, and I shared some context on the range of implementation options. In my continuing due diligence, I’ve concluded that the two-opcode “proper functions/word definition” (OP_DEFINE/OP_INVOKE) approach is a better technical choice than OP_EVAL.

I was initially more wary of the two-opcode approach: it’s slightly less byte-efficient in the simplest case (+3 bytes) and theoretically less byte-efficient for complex contracts, assuming a sufficiently advanced compiler (optimal stack scheduling).

However, experimentation has shifted my perspective since December:

  • OP_EVAL-based functions require exceptional integration effort in compilers and tooling, entailing considerable additional risk of compilation bugs. OP_DEFINE/OP_INVOKE offers equivalent capabilities at greater safety and minimal implementation cost.

  • The theoretical optimizations made possible by OP_EVAL are tiny and fundamentally temporary: they could reduce “glue code” between fixed business logic in some contracts, but a future upgrade enabling better deduplication of reused contract bytecode (e.g. read-only inputs) could fully eliminate those bytes from practical transaction sizes and blockchain storage growth.

    Therefore, even with extremely low time preference, it would be a very low-impact use of time/resources to implement and verify the most aggressive, theoretical OP_EVAL-based optimizations (function juggling + stack-scheduling integration; in fact, this requires some novel development relative to the published literature) for the minimal additional savings (max 3 bytes per non-inlined function after compilation) vs. a naive “deep pick” approach (e.g. what CashScript currently does).

    That means OP_DEFINE/OP_INVOKE would likely remain both safer and more efficient in actual practice for at least the next year or two (other than the single-OP_EVAL case), with OP_EVAL only becoming temporarily more efficient in relatively rare cases, and only if 1) a novel, aggressive optimizer gets built and verified despite the meager return on investment, and 2) contract authors are willing to trade some safety and external auditability for those meager savings.

    (For reference, basic stack scheduling in tools like CashScript could save ~2 bytes per contract data element in nearly every contract. That optimization is a prerequisite to the OP_EVAL one – and far more commonly applicable than OP_EVAL’s max 3-byte optimization per non-inlineable function – but to my knowledge even that prerequisite remains unimplemented in any BCH-targeting compiler.)

  • At a much higher level: language-level function definition makes compilation or ports from other languages (EVM, WASM, JS, etc.) far safer and lower cost. Any deficiency in Bitcoin Cash’s function support (e.g. requiring stack scheduling, not allowing stack inputs, etc.) necessarily creates quirks and unexpected edge cases, often resulting in less-safe workarounds and harder-to-audit artifacts.

Summary

  • OP_EVAL is less safe and generally less efficient than OP_DEFINE/OP_INVOKE.

  • OP_EVAL’s minimal theoretical savings (3 bytes per defined, non-inlined function) require novel research and riskier compilation/audit tooling, and those specific savings may ultimately have a zero-byte impact on transaction sizes.

  • Proper function definition – OP_DEFINE/OP_INVOKE – simplifies compilation/ports from other languages (EVM, WASM, JS, etc.), improving the availability and safety of development tooling.


Based on this research, I’m withdrawing my advocacy for OP_EVAL and modifying the proposal to split it into two operations: OP_DEFINE and OP_INVOKE.

Most of the CHIP remains unchanged, but to minimize confusion, I’ve bumped the version to v2.0.0 and renamed the CHIP: CHIP-2025-05 Functions: Function Definition and Invocation.

Previous links continue to work, but I’ve also updated the repo to be titled bch-functions:

5 Likes

I started a topic with the updated title:

2 Likes