CHIP-2025-08 Functions (Takes 2 & 3)

In discussions on X and on Telegram, it became apparent that some people have reservations about the “code that writes code” aspect of the current CHIP-2025-05 Functions: Function Definition and Invocation Operations proposal.

I am thus amending Jason’s proposal and submitting it as two mutually exclusive proposals:

  • CHIP-2025-08 Functions Take 2 - This proposal introduces a new flag, fCanDefineFunctions, which is initially true and allows OP_DEFINE to work. It is set to false as soon as any opcode other than a data push is encountered. Thus “code that writes code” is prevented by only allowing OP_DEFINE to work before any stack-item manipulation can occur (see the sketch after this list).
  • CHIP-2025-08.3 Functions Take 3 - This proposal, originally an idea from @bitcoincashautist, introduces an “executable” bit on stack items that accomplishes essentially the same thing by tracking stack-item provenance: only directly-pushed items (or their copies) can become function code for OP_DEFINE.
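
To make the two rules concrete, here is a minimal Python model of both, as I understand them. All names here (f_can_define_functions, the opcode set, the script encoding) are illustrative stand-ins of my own, not taken from the CHIP texts:

```python
# Minimal Python model of the two gating rules (illustrative only;
# names, opcode set, and script encoding are my own, not the CHIPs').

def run_take2(script):
    """Take 2: a VM-wide flag permits OP_DEFINE only while the script
    has done nothing but push data items."""
    f_can_define_functions = True           # true initially
    functions, stack = {}, []
    for op, arg in script:
        if op == "OP_PUSH":
            stack.append(arg)
        elif op == "OP_DEFINE":
            if not f_can_define_functions:
                raise ValueError("OP_DEFINE after a non-push opcode")
            body, index = stack.pop(), stack.pop()
            functions[index] = body
        else:
            f_can_define_functions = False  # any other opcode clears it, permanently
            # ... execute op as usual ...
    return functions

def run_take3(script):
    """Take 3: each stack item carries an 'executable' bit, set only on
    directly-pushed items; copies inherit it, computed items lose it."""
    functions, stack = {}, []               # stack holds (value, executable) pairs
    for op, arg in script:
        if op == "OP_PUSH":
            stack.append((arg, True))       # direct push: executable
        elif op == "OP_DUP":
            stack.append(stack[-1])         # a copy keeps the bit
        elif op == "OP_CAT":
            b, a = stack.pop()[0], stack.pop()[0]
            stack.append((a + b, False))    # computed item: not executable
        elif op == "OP_DEFINE":
            body, executable = stack.pop()
            index, _ = stack.pop()
            if not executable:
                raise ValueError("OP_DEFINE on a computed (non-pushed) item")
            functions[index] = body
    return functions

# This script fails under both rules, but for different reasons: under
# Take 2 because OP_CAT precedes OP_DEFINE, under Take 3 because the
# concatenated blob lost its executable bit.
script = [("OP_PUSH", 0), ("OP_PUSH", b"\x51"), ("OP_PUSH", b"\x52"),
          ("OP_CAT", None), ("OP_DEFINE", None)]
```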

The difference between the two proposals is that the “Take 3” one from @bitcoincashautist is a bit less restrictive than the “Take 2” one that I came up with. In “Take 3”, it’s possible for OP_INVOKE'd functions to also do OP_DEFINE and create new functions (which may or may not be what we want).

I actually prefer the original proposal from Jason Dreyzehner.

If we can’t do that, then I prefer my proposal, only because, implementation-wise on BCHN, @bitcoincashautist's “Take 3” proposal would be a bit more invasive to the current code (only ever-so-slightly).

Let me know what you guys think!

6 Likes

After comparing the specs, I prefer option “Take 2” because

  • it seems easier to implement
  • it is more straightforward (a basic flag that denies some things)
  • it is less complicated

My stance is that we should do 2) unless 3) turns out to be more performant for whatever reason. Then 3).

Still, I am not that good with opcodes, so I am open to somebody proving me wrong in a logical way.

Also, at this point in time, 2) is more restrictive (as claimed by Calin), so it is the more “responsible” pick right now.

As everybody knows I prefer “responsible” and “stable” to “move fast and break things”.

1 Like

Just wanted to add this comment here:

With Take 3 you can use conditionals as you define functions, so you can have alternate code paths that assign, say, function index 3 to code blob A if some variable is true, or to code blob B if it is false.

The rest of the script then calls function index 3, unaware of which code blob the preamble decided to place there.

This has power; it’s a kind of polymorphism.

With Take 2 you can achieve the same polymorphism, but through indirection via a “lookup table” maintained by the script itself on the stack (less efficient per call); see the sketch below.
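
To illustrate, here is a hypothetical Python model of the two dispatch patterns (the real thing would of course be written in opcodes; the blob contents and indices are made up):

```python
# Hypothetical model of the two polymorphism patterns (blob contents
# and indices are made up; a real script would use opcodes).

CODE_BLOB_A = b"...branch A bytecode..."
CODE_BLOB_B = b"...branch B bytecode..."

def take3_style(flag):
    """Take 3: the preamble conditionally defines index 3; the rest of
    the script just invokes index 3, unaware which blob was chosen."""
    functions = {}
    # roughly: OP_IF <A> OP_DEFINE 3 OP_ELSE <B> OP_DEFINE 3 OP_ENDIF
    functions[3] = CODE_BLOB_A if flag else CODE_BLOB_B
    return functions[3]                    # later: OP_INVOKE 3

def take2_style(flag):
    """Take 2: both blobs are defined up front (before any stack
    manipulation); the script keeps a dispatch index on the stack and
    indirects through it on every call -- less efficient per call."""
    functions = {1: CODE_BLOB_A, 2: CODE_BLOB_B}
    stack = [1 if flag else 2]             # script-maintained "lookup table"
    return functions[stack[-1]]            # later: pick index, then OP_INVOKE
```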

So… idk if this is important or relevant; this is just an observation.

1 Like

That’s great; however, wouldn’t all this extra effort and complexity just go to waste if in a year or two we decide to just let EVAL roam free, Texas cowboy-style, and remove the limitations?

While I think this addition makes sense (I really don’t like the idea of self-modifying code; that’s so 1970s and it should stay there),
I need to point out that it feels like a band-aid.

An older suggestion, having an op-define work only when a hash matches the code, makes any dirty-bits idea unnecessary. So, in case people missed it: an opcode in the spirit of P2SH. Let’s call it op-define-verify for clarity. Op-define-verify takes three stack items: the id, the hash, and the actual code. The hash is the hash of the code block being defined.

This means that if your unlocking script simply starts with a list of op-define-verify calls, your input holds an unlocking script with the actual code,
and that code is verified to match what was meant to execute at lock-in time. As a result, the whole ‘executable bit’ check becomes irrelevant. (A sketch of the semantics follows.)
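
A minimal Python sketch of what such an opcode could do, assuming double-SHA256 as the hash function (the actual hash, stack order, and naming would be up to a real spec):

```python
import hashlib

def hash256(data: bytes) -> bytes:
    """Double SHA-256, as used elsewhere in Bitcoin Cash."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def op_define_verify(stack, functions):
    """Sketch of the proposed opcode: pop code, expected hash, and id;
    define the function only if the code hashes to the expected value."""
    code = stack.pop()
    expected_hash = stack.pop()
    function_id = stack.pop()
    if hash256(code) != expected_hash:
        raise ValueError("op-define-verify: code does not match hash")
    functions[function_id] = code

# Usage: the locking script commits to the hash; the unlocking script
# supplies the code, which is verified byte-for-byte at define time.
functions = {}
code = b"\x51"                       # some code blob (OP_1 here)
stack = [0, hash256(code), code]     # id, hash, code
op_define_verify(stack, functions)
assert functions[0] == code
```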

This still opens a lot of the opportunities people want from functions. For instance, you can take code from a different output, safely, since it will only work when it is a byte-for-byte copy of what you intended at lock-in time, verified by hash.

The polymorphism idea is still possible in this design too: your op-if / else branches simply wrap op-define-verify statements.

So, the bool does indeed solve problems with the current CHIP. But to me it feels less like a solution and more like something that exposes a deeper design issue (it’s more like JMP, but with external code, and that’s not OK).

Can you modify the CHIP, or post your own example of the “fix” so that what you are saying is clearer, here or on GitHub/GitLab?

I am not that good with opcodes, but putting your solution next to the other solutions for comparison might help me understand what kind of mechanism you are talking about.

Maybe I need to see a fuller description, but from what is described here, that would not prevent code-that-writes-code, because nothing stops a script from just hashing some arbitrary blob and passing it to OP_DEFINE_VERIFY. (A sketch of the bypass follows.)
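
Continuing the Python sketch above, a script that computes the hash of an arbitrary blob on the fly always satisfies the check:

```python
# Continuing the op_define_verify sketch above: nothing stops a script
# from computing the hash of an arbitrary blob itself, so the
# verification always passes and the blob becomes "code" anyway.
untrusted_blob = b"\xde\xad\xbe\xef"       # e.g. sliced out of another input
stack = [1, hash256(untrusted_blob), untrusted_blob]   # hash computed on the fly
op_define_verify(stack, functions)          # passes; code-that-writes-code survives
```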

Perhaps the specification for this scheme would also need to enforce some two-phase operation mode on the VM (one phase where all you can do is define, another where you execute normally but cannot define)…

How is it responsible to get less of the benefits when we are perfectly capable of getting more without having to sacrifice anything? Take 3 is just a little more work for Calin :slight_smile:

As Calin pointed out, Script can compute the hash on the fly, so it can still be made to accept unknown user code.

Also, if you use define on a blob pushed by the locking script, which you would do just to structure/optimize your script, then the hash verification would be a waste of bytes: the code is already immutable, because it is wholly defined in the locking script.

Why don’t we need a hash in old “bare” pay-to-public-key scripts? Because nobody can change the key: it’s defined in the locking script itself.

With P2PKH we have to verify the key against the hash because it is provided later, by the spender.

Yes, the defines would indeed happen in the input, which today is push-only.
Ideally op-define-verify would not “take” the code from the stack; instead it would behave like a push itself, saving script bytes.
So the ‘push only’ rule would be expanded to “push or define only”.

In that case, from my (responsible) point of view, “less” means “more”.

I believe that a more restrictive environment initially means fewer things can go wrong.

Why add more work right now if we are likely to remove the restrictions completely in the future (unless it turns out it IS dangerous, in which case adding more restrictions now IS “safer”)?

I mean, think on your own words.

That’s my point exactly, that’s why it is responsible.

We get “less” now, in case the “more” turns out dangerous.

It also means fewer things can go right. :slight_smile:

1 Like

Naturally, you can do anything you want when you put it all in the locking script. Nobody disagrees, AFAICT. That’s not what this topic was about, right? Nobody cares if you do fractal code expansion of your own pushed code from your own locking script. Foot, meet shotgun. Go ahead. At least it will never be someone else’s shotgun. That’s important. You can shoot your own foot; I can’t shoot yours.

The point is about getting the runnable code from elsewhere, because that code is untrusted.
The P2SH example is the main known one: the code is supplied only at the time of unlocking, and to know it is the exact byte-for-byte code we meant, we use a hash.
An op-define-verify duplicates that P2SH behavior, thereby addressing the entire point of this article: that code should not be mixed with data.

Sure, if you think this (3) is the way to go, just go ahead.

I am not that good with opcodes anyway, I really have no position to oppose.

Just trying to save you, right now, from doing too much work that might be discarded later.

I honestly think (1), the original, is the way to go. sigh

1 Like

Well, nobody has produced use cases/benchmarks/tests that would break (1).

If someone did, that would surely be useful.

I’d love to see ANY usage of any version of the idea, as published scripts, to see what people are actually doing with this.
All this talk is very academic without any actual usage in real code…

For those who didn’t follow along at the beginning of the year, here is the result of some of my research: an alternative we could talk about (not many did, however) that has some design requirements.


This is money we are making programmable, which means there is a high incentive to steal, and there are long lists of such problems on other chains. Draining a Decentralized Autonomous Organization is an experience we should try to avoid.

  1. Only “trusted” code can be run.
    The definition of trusted here is simply that the code was known at the time the money was locked in: at the time the transaction was built and signed, we know the code that is meant to unlock it. To make this clear: with P2SH, the code is ‘locked in’ using the provided hash, ensuring that only a specific unlocking script can run.
  2. Separation of data and code.
    Subroutines are code; code can use data in many ways: multiply it, cut it, join it. Code can’t do any of those things to other code.

One clear and easy-to-understand example of the dangers here: a script author may assume that creating two outputs will result in those two being spent in one specific transaction. Given the way the UTXO model works, this isn’t a given at all, but it is one of the more common misunderstandings of how things work.
As such, a script may try to read data from another input and use it as code, believing that to be safe.

If that code ends up on-chain, anyone can brute-force a transaction that supplies just the right data to convince the script (now on-chain and immutable) that it can be spent, and take the money that was locked in that transaction.

This is easy to write and to exploit: just use introspection to get a specific locking script from a numbered input, cut it, then op-define it and run it. A sketch of the pattern follows.
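
Here is a rough Python model of that pattern; the helper name and indices are hypothetical, with the corresponding opcodes noted in comments:

```python
# Rough model of the dangerous pattern described above (helper name and
# indices are hypothetical; a real script would use the opcodes noted).

def utxo_locking_bytecode(tx, input_index):
    """Stand-in for the OP_UTXOBYTECODE introspection opcode: the
    locking bytecode of the output being spent by input N."""
    return tx["inputs"][input_index]["locking_bytecode"]

def vulnerable_script(tx, functions):
    other_code = utxo_locking_bytecode(tx, 1)   # <1> OP_UTXOBYTECODE
    piece = other_code[4:20]                    # OP_SPLIT: cut a piece out
    functions[0] = piece                        # OP_DEFINE: treat it as code
    return functions[0]                         # OP_INVOKE: whatever bytes
                                                # landed here get executed

# The author assumes input 1 is one of "their" outputs, but the UTXO
# model gives no such guarantee: an attacker can craft a spending
# transaction whose input 1 carries whatever bytes make this pass.
```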

This is the main “thing” people have worried about with regard to mixing introspection, the stack, and functions, and it is what I understand the ‘executable bit’ is meant to solve. But, again, I don’t think it solves it very nicely; it just hides the real problem.

See: Quantumroot: Quantum-Secure Vaults for Bitcoin Cash

He makes heavy use of the proposal in (1), and it’s not clear it would work as elegantly with proposals (2) or (3)…