Proposing subroutines

The FORTH programming language from 1970 included subroutines, which it called ‘words’. The closest equivalent in Bitcoin Cash of a FORTH ‘word’ is an opcode.

From Wikipedia (subroutine):

In computer programming, a subroutine is a callable unit of software logic that has a well-defined interface and behavior and can be invoked multiple times.
Callable units provide a powerful programming tool. The primary purpose is to allow for the decomposition of a large and/or complicated problem into chunks that have relatively low cognitive load …
Judicious application can reduce the cost of developing and maintaining software, while increasing its quality and reliability.

This idea may be useful to add to Bitcoin Cash’s scripting language too. It enables open source development of new subroutines as debugged, reusable components: something that could ship with a series of test cases and everything else that keeps quality high.

What follows is a new proposal for subroutines in Bitcoin Cash that gives the best savings by making calls the cheapest of all the alternatives discussed. (A single call would be more expensive, but you wouldn’t write one: the compiler will inline that usage, reducing the overhead to zero.)

Additionally, this proposal isolates code from data, disallowing operations like OP_CAT from working on code. With a history of millions of demonstrable exploits caused by mixing code and data, this sounds like a sane thing to do and is certain to help people wary of this idea come on board.
The best part is that the approach chosen to separate the code also makes evaluation of the script cheaper than in alternative proposals. Faster, smaller, and safer; what more do you want?

How we may use this:

In the future scenario that I think is the most powerful and likely,
we get a system much like NPM for people to publish small reusable subroutines. The effect is that people writing contracts will essentially get access to new, highly specialized opcodes, directly in their code editor.

The compiler can include the used subroutines at the beginning of the generated script, which directly saves bytes if they are called multiple times and, more importantly, avoids bugs by using reusable components. Because, face it, literally nobody writes more complex software without libraries. In Bitcoin Cash we today have a library in the form of opcodes (hashes, signature checks and all the others). The opcodes provided in Bitcoin Cash are so specific because they take the role of a crypto library. Extending that with subroutines is natural and internally consistent.
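To make the byte-saving argument concrete, here is a back-of-envelope sketch in Python. The definition overhead and per-call size are illustrative assumptions, not numbers from the CHIP:

```python
# Illustrative arithmetic only -- the overhead constants below are
# assumptions, not values from the subroutine CHIP.

def inlined_size(body: int, calls: int) -> int:
    # Every call site repeats the full subroutine body.
    return body * calls

def subroutine_size(body: int, calls: int,
                    define_overhead: int = 2, call_size: int = 2) -> int:
    # One up-front definition (body + define overhead) plus a small
    # <index> OP_CALL sequence at each call site.
    return body + define_overhead + call_size * calls

# A 40-byte routine called 5 times:
print(inlined_size(40, 5))      # 200 bytes inlined
print(subroutine_size(40, 5))   # 52 bytes with a subroutine table

# A single call is slightly larger than inlining, which is why a
# compiler would simply inline that case:
print(inlined_size(40, 1))      # 40
print(subroutine_size(40, 1))   # 44
```

The break-even point depends on the real encoding overhead, but the shape of the trade-off is the same: savings grow with every additional call.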

Cost to ecosystem:

Adding this to the script interpreter is really quite straightforward. A node implementation like BCHN could implement these requirements without much added complexity. Storing the scripts in their own list is actually cheaper than needing to copy them onto the stack on every call, which is obvious when you remember that the VM-limits CHIP counts stack usage for a reason: stack usage is expensive.

A CashScript compiler, the CashConnect proposals and the related template store would all want to add support, but doing so is basically in line with their current way of doing things. So I don’t expect this proposal to add much cost or complexity.

Future: a P2SH-like presentation of code at unlocking time.

As the alternative proposals all allow untrusted code to be used to spend a coin, it is relevant to discuss this possibility.

The definition of trusted here is simply that the code was known at the time the money was locked in. Which means that at the time the transaction was built and signed, we knew the exact code that is meant to unlock it.

Providing code only at unlocking time (outside of something P2SH-like) means you could end up with untrusted code. This isn’t so much code you don’t trust, but rather code that has probably been written very specifically to unlock your coin that is already mined. A real threat if well-known code was used in the locking script. (On a security gradient from air-gapped code, to closed source, to obfuscated, to the extreme of open source and public: blockchain is that last, most extreme end and thus needs the most protection.)

To avoid any untrusted code being used to steal money, the CHIP goes into a bit more detail, but it would be very useful to have an OP_MAST-style proposal that uses subroutines and adds validation by hashing the code and then verifying the hash against the hash provided in the locking script. Again, extremely similar to P2SH.

Plenty of detail in the CHIP, including a comparison to other alternatives.

This framing is wrong because it implies contract designers would set the whole locking script to something like just <untrusted code> OP_EVAL. Why would anyone lock coins in such a UTXO? It’s functionally equivalent to just locking them with OP_1, because anyone could post OP_1 as the untrusted code.

What the alternatives would allow - and that is a feature, not a bug - is for the main program (trusted) to create a slot where users could add/modify their own authentication module (untrusted) when given access to the main program.

Or, contracts would authenticate against a hardcoded hash (or set of hashes), thus keeping the contract trusted even if provided later from an untrusted source (spender).

This proposal here is a solution looking for a problem. If the objective was to make it more efficient than eval (and even provide stack protection), then Calin’s alternative proposal was perfect:

Here, you could push any stack item (whether it came from hard-coded data in the locking script, was fetched by introspection, or came from unlocking code) into this new subroutine table - OP_DECLARESUB would simply pop the top stack item (optionally with additional arguments to specify the “call signature”) and move it to the subroutine table - similar to what OP_TOALTSTACK does.

But then, the other opcode, <index> OP_CALLSUB, would simply run the referenced code (without popping it) - thus achieving the greatest possible efficiency when multiple calls of the same code are needed.
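A minimal sketch of those declare/call semantics, modelled in Python. OP_DECLARESUB and OP_CALLSUB are the opcode names from this discussion; the callable standing in for script bytes and the `run` helper are stand-ins, not real VM code:

```python
# Toy model of a subroutine table: declare pops the top stack item into
# the table (like OP_TOALTSTACK moves an item to the alt stack); call
# runs a table entry by index WITHOUT popping it, so repeated calls
# never re-copy the bytecode onto the stack.

def op_declaresub(stack: list, subs: list) -> None:
    subs.append(stack.pop())

def op_callsub(index: int, subs: list, stack: list, run) -> None:
    run(subs[index], stack)

def run(code, stack):
    # Stand-in for script evaluation: here "code" is a Python callable.
    code(stack)

stack = [lambda s: s.append(s.pop() + s.pop())]  # an "add" routine
subs: list = []
op_declaresub(stack, subs)       # move the routine into the table

stack.extend([1, 2])
op_callsub(0, subs, stack, run)  # first call: 1 + 2 -> 3
stack.append(4)
op_callsub(0, subs, stack, run)  # second call reuses the same entry
print(stack)                     # [7]
```

The point of the model is the second call: it references the table entry by index instead of duplicating the code on the stack first.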

Unlike this one, Calin’s proposal doesn’t unnecessarily tie contract designer hands.


I just want to point out that while OP_EVAL seems like it does lots of stack copying for repeated calls – it would basically be OP_OVER OP_EVAL or OP_DUP OP_EVAL – that stack copying/pushing is “costed” by the new VMLimits framework at 1% of execution cost. So pushing 1 byte costs 1, but executing 1 byte worth of code costs 100.

This means that execution of pushed code is dominated largely by … execution and not by stack pushes. Again, they are as low as ~1% of the cost.
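As a quick sanity check on that claim, here is the arithmetic in Python. The cost constants are taken from the 1:100 ratio stated in this comment, not from the spec text itself:

```python
# Back-of-envelope check of the 1:100 claim: push 1 byte = 1 cost unit,
# execute 1 byte = 100 cost units (constants from the comment above).

PUSH_COST_PER_BYTE = 1
EXEC_COST_PER_BYTE = 100

def eval_cost(code_len: int, calls: int):
    # Each OP_DUP/OP_OVER before OP_EVAL copies the code onto the stack.
    push_cost = code_len * PUSH_COST_PER_BYTE * calls
    exec_cost = code_len * EXEC_COST_PER_BYTE * calls
    return push_cost, exec_cost

push, execu = eval_cost(code_len=50, calls=10)
print(push, execu)            # 500 50000
print(push / (push + execu))  # ~0.0099, i.e. about 1% of total cost
```

Under these constants the duplication overhead stays at roughly 1% regardless of code size or call count, since both terms scale the same way.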


Thank you for your constructive comment, but it largely underestimates the cost (in contract byte count) of managing the stack for your executable code; real-world usage will need all those little OP_DROPs, DUPs, PICKs and the like. The cost is greater than you seem to think when you mix your normal processing with managing your code on the same stack.
Write some non-trivial contracts that use multiple subroutines and you’ll notice it starts to add up.
Which is why the subroutine CHIP is the one that actually reaches the goal of compressing to the smallest number of bytes.

But you end with a single-line statement that is curious; I’d like to hear more about your thinking. How do a contract designer’s hands get tied by the subroutine proposal?

Now, if you think the contract designer should be able to XOR his code stack item, please do explain why you’d want that and how that fits in the ideas of CashScript.

In my tests the contract designer, especially when not writing assembly by hand, will find there is no tying of hands in this proposal.

Edit: I found this other post from you:

Doing ‘op-split’ on code is… nasty to be honest. I mean, if you can’t do anything else I understand it works. But it is fragile.

Instead, your one script can push 2 (or 10) subroutines, making them callable by a single integer and avoiding any tracking of state, or of changes in one script requiring changes in another.
Using the ‘subroutines’ approach will make writing compiler support orders of magnitude easier.

Because you force the designer to introduce the subroutine code in the locking script, as opposed to Eval/Exec/Calin’s, where it’s taken from the stack, meaning it can come from any place in the TX:

  • Its own locking script
  • Its own unlocking script
  • Another input or output’s unlocking/locking script (through introspection)
  • Constructed by code from pieces of the above sources

OK, the exec2 variant then allows loading from another input’s subroutine table, but it’s still too rigid when we could just execute the stack item and leave it to the contract designer to think about how to produce that stack item.

Calin’s achieves the same goal without the unnecessary restriction on where the code must come from, and his could be used to implement compiler optimizations similar to how eval could (depending on spec details); efficiency would depend on whether there are more one-off calls or more repeated calls of the same bytecode.

He should, because why should we prejudge the allowed ways to obtain the script to be eval’d? Script is akin to assembly - you should be free to design your contract however you want; there’s no risk to the protocol in allowing this. The only risk is to money that gets locked in these contracts, but again - that’s on users/designers; why should the low-level protocol need to hold their hand? Leave that to CashScript or some other higher-level language (but even those won’t protect you from making a money-losing mistake, you really have to know what you’re doing).

Assembly has JMP. If you write the executable by hand you risk making a broken or unsafe program, but if the compiler takes care of using JMP only where it makes sense (to produce an efficient executable that matches the code) then the risk is removed.

I can have a contract that requires executing a dynamically created contract even now; here’s a contract example (link to load it in Bitauth IDE) that demonstrates both ways - the plain OP_EVAL way, and the sideloaded, emulated OP_EVAL way.

Locking Script:

// immutable part of code to be eval'd, set by the contract designer
<0x5151935387> // <OP_1 OP_1 OP_ADD OP_3 OP_EQUAL>

// cat the "unsafe" spender-provided code after the safe code
OP_SWAP OP_CAT

// below we will execute the obtained bytecode using both ways

OP_DUP

// eval, executes it in this input's context
OP_EVAL OP_VERIFY

// sideloaded (eval emulation), requires executing it in another
// input's context, but still within the shared context (this transaction)

// 1. generate p2sh locking script for the generated bytecode to be eval'd
OP_HASH256 <0xaa20> OP_SWAP OP_CAT <0x87> OP_CAT
// 2. verify that the target sibling input has executed and passed the
// bytecode generated by the above cat operation
OP_INPUTINDEX OP_1ADD
OP_UTXOBYTECODE
OP_EQUAL

Unlocking script:

// "unsafe" spender-provided code to be eval'd
<0x7551> // OP_DROP OP_1


How to spend from this? Before spending the “main”, create a dust UTXO with redeem script that will match the generated bytecode by the main, and then include it in the spending TX:

inputs                                 | outputs
  0: main                              |   any
  1: execute generated bytecode script |

This achieves running “unsafe” code - but without the benefit of not having it replicated as transaction bytes. Here we must provide the fully expanded code as another input’s redeem script, and the main watches for it and verifies that it was executed with success. Having N slots for spender-provided code would require having N dust inputs as side-cars that side-load the code to be evaluated.

The central thesis of your proposal is “we can’t allow people to execute code that’s not known ahead of locking up money”. Guess what - you can do that already, as demonstrated above. So what? Should we now deprecate introspection opcodes because people can write “dangerous” contracts like the above?

Your proposal is solving the wrong problem and so the solution is unnecessarily limited and more complex than the simple OP_EVAL - which just cuts straight to the chase and keeps VM design simple.

I doubt people would do XORs, but they might do splits/cats, and they won’t do it by hand. You’ll have nice human-readable CashScript - and the compiler could scan for repeating blocks of executable bytes and optimize them by replacing them with sequences of pushes of common elements, then combining and eval-ing them.

Also, multi-input contract designs could be optimized by daisy-chaining: use introspection to get the redeem script of the input “above” onto the stack, split out some variable part and replace it with this input’s variable part, cat the common code, execute… this could eliminate a lot of code replication across inputs.
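A hedged sketch of that daisy-chaining idea, using plain byte strings in place of introspection opcodes: take the neighbor input's script, split off its variable prefix, and substitute this input's variable part. The byte values, offsets, and helper names here are illustrative assumptions, not bytecode from any real contract:

```python
# Model scripts as bytes: <variable part> + <common code>. Daisy-chaining
# means each input rebuilds its own script from a neighbor's, reusing the
# common tail instead of replicating it in every input.

COMMON_CODE = b"\x51\x51\x93"  # stand-in for the shared tail of every script

def build_script(variable_part: bytes) -> bytes:
    return variable_part + COMMON_CODE

def derive_from_neighbor(neighbor_script: bytes,
                         my_variable_part: bytes) -> bytes:
    # OP_SPLIT-style: cut off the neighbor's variable prefix, keep the
    # common code, and cat our own variable part in front of it.
    split_at = len(neighbor_script) - len(COMMON_CODE)
    common = neighbor_script[split_at:]
    return my_variable_part + common

neighbor = build_script(b"\x02\xaa\xbb")
mine = derive_from_neighbor(neighbor, b"\x02\xcc\xdd")
print(mine == build_script(b"\x02\xcc\xdd"))  # True
```

In an actual contract the neighbor's script would come from an introspection opcode like OP_UTXOBYTECODE rather than a Python variable, and the split offset would have to be known or computed by the contract.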

The nasty part is having to parse through whole input script to extract individual data pushes. Having something like OP_PARSE would be nice to have. With that, we could extract data push from anywhere in the TX and once on stack - do with it what we want.

Even without it, if we had proper Eval, then programs needing to parse input pushes could just declare a parsing subroutine and call it when needed.

Calin’s proposal achieves this, too, that’s not the differentiating selling point of your variant here.