Proposing subroutines

The FORTH programming language from 1970 included subroutines, which it called ‘words’. The closest equivalent in Bitcoin Cash of a FORTH ‘word’ is an opcode.

From Wikipedia (subroutine):

In computer programming, a subroutine is a callable unit of software logic that has a well-defined interface and behavior and can be invoked multiple times.
Callable units provide a powerful programming tool. The primary purpose is to allow for the decomposition of a large and/or complicated problem into chunks that have relatively low cognitive load …
Judicious application can reduce the cost of developing and maintaining software, while increasing its quality and reliability.

This idea may be useful to add to Bitcoin Cash’s scripting language too. It enables open source development of new subroutines as debugged, reusable components: something that could ship with a series of test cases and everything else that keeps quality high.

What follows is a new proposal for subroutines in Bitcoin Cash that gives the best savings by making calls the cheapest of all the alternatives discussed. (A single call would be more expensive, but you wouldn’t write one: the compiler will inline that usage, reducing the overhead to zero.)

Additionally, this proposal isolates code from data, disallowing operations like OP_CAT from working on code. With a history of millions of demonstrable exploits caused by mixing code and data, this sounds like a sane thing to do and is certain to help people wary of this idea come on board.
The best part is that the approach chosen to separate the code also makes evaluation of the script cheaper than in alternative proposals. Faster, smaller, and safer; what more do you want?

How we may use this:

In the future scenario that I think is the most powerful and likely,
we get a system much like NPM for people to publish small reusable subroutines. The effect is that people writing contracts will essentially get access to new, highly specialized opcodes, directly in their code editor.

The compiler can include the used subroutines at the beginning of the generated script, which directly saves bytes if they are called multiple times and, more importantly, avoids bugs by using reusable components. Because, face it, literally nobody writes more complex software without libraries. In Bitcoin Cash we today have a library in the form of opcodes (hashes, signature checks and all the others). The opcodes provided in Bitcoin Cash are so specific because they take the role of a crypto library. Extending that with subroutines is natural and internally consistent.
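To make the byte-saving argument concrete, here is a back-of-envelope sketch in Python. The definition overhead and per-call size are illustrative assumptions, not numbers from the CHIP:

```python
# Illustrative arithmetic only -- the overhead constants below are
# assumptions, not values from the subroutine CHIP.

def inlined_size(body: int, calls: int) -> int:
    # Every call site repeats the full subroutine body.
    return body * calls

def subroutine_size(body: int, calls: int,
                    define_overhead: int = 2, call_size: int = 2) -> int:
    # One up-front definition (body + define overhead) plus a small
    # <index> OP_CALL sequence at each call site.
    return body + define_overhead + call_size * calls

# A 40-byte routine called 5 times:
print(inlined_size(40, 5))      # 200 bytes inlined
print(subroutine_size(40, 5))   # 52 bytes with a subroutine table

# A single call is slightly larger than inlining, which is why a
# compiler would simply inline that case:
print(inlined_size(40, 1))      # 40
print(subroutine_size(40, 1))   # 44
```

The break-even point depends on the real encoding overhead, but the shape of the trade-off is the same: savings grow with every additional call.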

Cost to ecosystem:

Adding this to the script interpreter is really quite straightforward. A node implementation like BCHN could implement these requirements without much added complexity. Storing the scripts in their own list is actually cheaper than needing to copy them onto the stack on every call, which is obvious when you remember that the VM-limits CHIP counts stack usage for a reason: stack usage is expensive.

A CashScript compiler, the CashConnect proposals and the related template store would all want to add support, but doing so is basically in line with their current way of doing things. So I don’t expect this proposal to add much cost or complexity.

Future: a P2SH-like presentation of code at unlocking time.

As the alternative proposals all allow untrusted code to be used to spend a coin, it is relevant to discuss this possibility.

The definition of trusted here is simply that the code was known at the time the money was locked in. Which means that at the time the transaction was built and signed, we knew the exact code that is meant to unlock it.

Providing code only at unlocking time (outside of something P2SH-like) means you could end up with untrusted code. This isn’t so much code you don’t trust, but rather code that has probably been written very specifically to unlock your coin that is already mined. A real threat if well-known code was used in the locking script. (On a security gradient from air-gapped code, to closed source, to obfuscated, to the extreme of open source and public: blockchain is that last, most extreme end and thus needs the most protection.)

To avoid any untrusted code being used to steal money, the CHIP goes into a bit more detail, but it would be very useful to have an OP_MAST-style proposal that uses subroutines and adds validation by hashing the code and then verifying the hash against the hash provided in the locking script. Again, extremely similar to P2SH.

Plenty of detail in the CHIP, including a comparison to other alternatives.

This framing is wrong because it implies contract designers would set the whole locking script to something like just <untrusted code> OP_EVAL. Why would anyone lock coins in such a UTXO? It’s functionally equivalent to just locking them with OP_1, because anyone could post OP_1 as the untrusted code.

What the alternatives would allow - and that is a feature, not a bug - is for the main program (trusted) to create a slot where users could add/modify their own authentication module (untrusted) when given access to the main program.

Or, contracts would authenticate against a hardcoded hash (or set of hashes), thus keeping the contract trusted even if provided later from an untrusted source (spender).

This proposal here is a solution looking for a problem. If the objective was to make it more efficient than eval (and even provide stack protection), then Calin’s alternative proposal was perfect:

Here, you could push any stack item (whether it came from hard-coded data in the locking script, was fetched by introspection, or came from unlocking code) into this new subroutine table - OP_DECLARESUB would simply pop the top stack item (optionally with additional arguments to specify the “call signature”) and move it to the subroutine table - similar to what OP_TOALTSTACK does.

But then, the other opcode, <index> OP_CALLSUB, would simply run the referenced code (without popping it) - thus achieving the greatest possible efficiency when multiple calls of the same code are needed.
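A minimal sketch of those declare/call semantics, modelled in Python. OP_DECLARESUB and OP_CALLSUB are the opcode names from this discussion; the callable standing in for script bytes and the `run` helper are stand-ins, not real VM code:

```python
# Toy model of a subroutine table: declare pops the top stack item into
# the table (like OP_TOALTSTACK moves an item to the alt stack); call
# runs a table entry by index WITHOUT popping it, so repeated calls
# never re-copy the bytecode onto the stack.

def op_declaresub(stack: list, subs: list) -> None:
    subs.append(stack.pop())

def op_callsub(index: int, subs: list, stack: list, run) -> None:
    run(subs[index], stack)

def run(code, stack):
    # Stand-in for script evaluation: here "code" is a Python callable.
    code(stack)

stack = [lambda s: s.append(s.pop() + s.pop())]  # an "add" routine
subs: list = []
op_declaresub(stack, subs)       # move the routine into the table

stack.extend([1, 2])
op_callsub(0, subs, stack, run)  # first call: 1 + 2 -> 3
stack.append(4)
op_callsub(0, subs, stack, run)  # second call reuses the same entry
print(stack)                     # [7]
```

The point of the model is the second call: it references the table entry by index instead of duplicating the code on the stack first.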

Unlike this one, Calin’s proposal doesn’t unnecessarily tie contract designer hands.


I just want to point out that while OP_EVAL seems like it does lots of stack copying for repeated calls – it would basically be OP_OVER OP_EVAL or OP_DUP OP_EVAL – that stack copying/pushing is “costed” by the new VMLimits framework at 1% of execution cost. So pushing 1 byte costs 1, but executing 1 byte worth of code costs 100.

This means that execution of pushed code is dominated largely by … execution and not by stack pushes. Again, they are as low as ~1% of the cost.
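As a quick sanity check on that claim, here is the arithmetic in Python. The cost constants are taken from the 1:100 ratio stated in this comment, not from the spec text itself:

```python
# Back-of-envelope check of the 1:100 claim: push 1 byte = 1 cost unit,
# execute 1 byte = 100 cost units (constants from the comment above).

PUSH_COST_PER_BYTE = 1
EXEC_COST_PER_BYTE = 100

def eval_cost(code_len: int, calls: int):
    # Each OP_DUP/OP_OVER before OP_EVAL copies the code onto the stack.
    push_cost = code_len * PUSH_COST_PER_BYTE * calls
    exec_cost = code_len * EXEC_COST_PER_BYTE * calls
    return push_cost, exec_cost

push, execu = eval_cost(code_len=50, calls=10)
print(push, execu)            # 500 50000
print(push / (push + execu))  # ~0.0099, i.e. about 1% of total cost
```

Under these constants the duplication overhead stays at roughly 1% regardless of code size or call count, since both terms scale the same way.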


Thank you for your constructive comment, but it largely underestimates the cost (in contract byte count) of managing the stack for your executable code; real-world usage will need all those little OP_DROPs, DUPs, PICKs and the like. The cost is greater than you seem to think when you mix your normal processing with managing your code on the same stack.
Write some non-trivial contracts that use multiple subroutines and you’ll notice it starts to add up.
Which is why the subroutine CHIP is the one that actually reaches the goal of compressing to the smallest number of bytes.

But you end with a single-line statement that is curious; I’d like to hear more about your thinking. How do a contract designer’s hands get tied by the subroutine proposal?

Now, if you think the contract designer should be able to XOR his code stack item, please do explain why you’d want that and how that fits in the ideas of CashScript.

In my tests the contract designer, especially when not writing assembly by hand, will find there is no tying of hands in this proposal.

Edit: I found this other post from you:

Doing ‘op-split’ on code is… nasty to be honest. I mean, if you can’t do anything else I understand it works. But it is fragile.

Instead, your one script can push 2 (or 10) subroutines, making them callable by a single integer and avoiding any tracking of state, or of changes in one script requiring changes in another.
Using the ‘subroutines’ approach will make writing compiler support orders of magnitude easier.

Because you force the designer to introduce the subroutine code in the locking script, as opposed to Eval/Exec/Calin’s, where it’s taken from the stack, meaning it can come from any place in the TX:

  • Its own locking script
  • Its own unlocking script
  • Another input or output’s unlocking/locking script (through introspection)
  • Constructed by code from pieces of the above sources

OK, the exec2 variant then allows loading from another input’s subroutine table, but it’s still too rigid when we could just execute the stack item and leave it to the contract designer to think about how to produce that stack item.

Calin’s achieves the same goal without the unnecessary restriction on where the code must come from, and his could be used to implement compiler optimizations similar to how eval could (depending on spec details); efficiency would depend on whether there are more one-off calls or more repeated calls of the same bytecode.

He should, because why should we prejudge the allowed ways to obtain the script to be eval’d? Script is akin to assembly - you should be free to design your contract however you want; there’s no risk to the protocol in allowing this. The only risk is to money that gets locked in these contracts, but again - that’s on users/designers; why should the low-level protocol need to hold their hand? Leave that to CashScript or some other higher-level language (but even those won’t protect you from making a money-losing mistake, you really have to know what you’re doing).

Assembly has JMP. If you write the executable by hand you risk making a broken or unsafe program, but if the compiler takes care of using JMP only where it makes sense (to produce an efficient executable that matches the code) then the risk is removed.

I can have a contract that requires executing a dynamically created contract even now; here’s a contract example (link to load it in Bitauth IDE) that demonstrates both ways - the plain OP_EVAL way, and the sideloaded, emulated OP_EVAL way.

Locking Script:

// immutable part of code to be eval'd, set by the contract designer
<0x5151935387> // <OP_1 OP_1 OP_ADD OP_3 OP_EQUAL>

// cat the "unsafe" spender-provided code after the safe code
OP_SWAP OP_CAT

// below we will execute the obtained bytecode using both ways

OP_DUP

// eval, executes it in this input's context
OP_EVAL OP_VERIFY

// sideloaded (eval emulation), requires executing it in another
// input's context, but still within the shared context (this transaction)

// 1. generate p2sh locking script for the generated bytecode to be eval'd
OP_HASH256 <0xaa20> OP_SWAP OP_CAT <0x87> OP_CAT
// 2. verify that the target sibling input has executed and passed the
// bytecode generated by the above cat operation
OP_INPUTINDEX OP_1ADD
OP_UTXOBYTECODE
OP_EQUAL

Unlocking script:

// "unsafe" spender-provided code to be eval'd
<0x7551> // OP_DROP OP_1


How to spend from this? Before spending the “main”, create a dust UTXO with redeem script that will match the generated bytecode by the main, and then include it in the spending TX:

inputs                                 | outputs
  0: main                              |   any
  1: execute generated bytecode script |

This achieves running “unsafe” code - but without the benefit of not having it replicated as transaction bytes. Here we must provide the fully expanded code as another input’s redeem script, and the main watches for it and verifies that it was executed with success. Having N slots for spender-provided code would require having N dust inputs as side-cars that side-load the code to be evaluated.

The central thesis of your proposal is “we can’t allow people to execute code that’s not known ahead of locking up money”. Guess what - you can do that already, as demonstrated above. So what? Should we now deprecate introspection opcodes because people can write “dangerous” contracts like the above?

Your proposal is solving the wrong problem and so the solution is unnecessarily limited and more complex than the simple OP_EVAL - which just cuts straight to the chase and keeps VM design simple.

I doubt people would do XORs, but they might do splits/cats, and they won’t do it by hand. You’ll have nice human-readable CashScript - and the compiler could scan for repeating blocks of executable bytes and optimize them by replacing them with sequences of pushes of common elements, then combining and eval-ing them.

Also, multi-input contract designs could be optimized by daisy-chaining: use introspection to get the redeem script of the input “above” onto the stack, split out some variable part and replace it with this input’s variable part, cat the common code, execute… this could eliminate a lot of code replication across inputs.
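A hedged sketch of that daisy-chaining idea, using plain byte strings in place of introspection opcodes: take the neighbor input's script, split off its variable prefix, and substitute this input's variable part. The byte values, offsets, and helper names here are illustrative assumptions, not bytecode from any real contract:

```python
# Model scripts as bytes: <variable part> + <common code>. Daisy-chaining
# means each input rebuilds its own script from a neighbor's, reusing the
# common tail instead of replicating it in every input.

COMMON_CODE = b"\x51\x51\x93"  # stand-in for the shared tail of every script

def build_script(variable_part: bytes) -> bytes:
    return variable_part + COMMON_CODE

def derive_from_neighbor(neighbor_script: bytes,
                         my_variable_part: bytes) -> bytes:
    # OP_SPLIT-style: cut off the neighbor's variable prefix, keep the
    # common code, and cat our own variable part in front of it.
    split_at = len(neighbor_script) - len(COMMON_CODE)
    common = neighbor_script[split_at:]
    return my_variable_part + common

neighbor = build_script(b"\x02\xaa\xbb")
mine = derive_from_neighbor(neighbor, b"\x02\xcc\xdd")
print(mine == build_script(b"\x02\xcc\xdd"))  # True
```

In an actual contract the neighbor's script would come from an introspection opcode like OP_UTXOBYTECODE rather than a Python variable, and the split offset would have to be known or computed by the contract.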

The nasty part is having to parse through whole input script to extract individual data pushes. Having something like OP_PARSE would be nice to have. With that, we could extract data push from anywhere in the TX and once on stack - do with it what we want.

Even without it, if we had proper Eval, then programs needing to parse input pushes could just declare a parsing subroutine and call it when needed.

Calin’s proposal achieves this, too, that’s not the differentiating selling point of your variant here.