Bitcoin Script: multi byte opcodes

This has come up various times, so I figured I’d write down my thoughts on the topic.

The design of Bitcoin Script as shipped by Satoshi encodes “opcodes” as single bytes. This concept has no future-proofing: with only one byte of opcode space, it is likely we will eventually run out of opcodes.

My thinking is that we can use the concepts and ideas behind variable-width encoding (see Wikipedia: Variable-width encoding).

why?

The reason we want to do this is that once we run out of opcodes, we can no longer expand the platform in a simple manner.

The introspection work that is going on gives us a strong indication that there are a lot of great ideas which require opcodes. But it feels bad when one great idea uses 20 opcodes when we only have about 60 left.
To avoid this being used as a realistic and valid complaint against an otherwise great idea, it makes sense to look at why there is a shortage in the first place, and solve the shortage.

We all know supply and demand. So let’s increase the supply of opcode space, and then we can decide on opcodes based on merit rather than on how much opcode space they use.

how

  1. First, realize that opcodes are already stored in memory as 32-bit integers (or even wider on modern machines). We only serialize them into transactions as single bytes to save space.
  2. An opcode whose first byte is 0xC0 (binary 1100 0000) or higher occupies 2 bytes in the transaction. The first two bits being 1 is our signal to read a second byte.
  3. For 2-byte opcodes we reuse the last 6 bits of the first byte (the zeros in the example above) and drop the first two bits, as they only signal that a second byte follows.
  4. A 2-byte opcode then effectively has 6 + 8 bits: the 6 remaining bits of the first byte plus all 8 bits of the second byte. This gives an opcode space of 14 bits (16,384 opcodes).
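
To make the scheme concrete, here is a small decoding sketch (function and variable names are mine; the proposal doesn’t specify whether the 14-bit values share a numeric space with the 1-byte opcodes, so this just extracts the bits):

```python
def decode_opcode(script: bytes, pos: int):
    """Decode one opcode at script[pos]; return (opcode, next_pos).

    First bytes below 0xC0 are ordinary 1-byte opcodes. A first
    byte of 0xC0 or higher (top two bits set) signals a second
    byte: the two signal bits are dropped and the remaining 6 bits
    are combined with the second byte, for 14 bits of opcode space.
    """
    first = script[pos]
    if first < 0xC0:
        return first, pos + 1
    second = script[pos + 1]
    return ((first & 0x3F) << 8) | second, pos + 2
```

For example, `0x51` (OP_1) still decodes to `0x51`, while the pair `0xC1 0x02` decodes to the 14-bit value `0x0102`, and `0xFF 0xFF` decodes to the maximum value `0x3FFF` (16383).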

From my research, no opcode currently uses 0xC0 (192) or higher, which would make this suggestion a clean drop-in. This is bound to change at the next protocol upgrade, though.

The UTF-8 design uses only 1 bit for signaling multi-byte sequences, but since we already have opcodes above 127, that ship has sailed. We can use 2 bits and still get a nice balance where 6 bits of the first byte remain useful. A bad outcome would be one where the entire first byte is lost to signalling.

4 Likes

Instead of adding 20 opcodes, you could also add one single opcode that takes an integer 1-20 as an extra input argument.

If we do this for every proposal introducing a large number of new operations (which are rare, native introspection is an exceptional proposal in this regard), then 60 remaining opcodes is probably plenty.

1 Like

I think it’s really valuable to have some research on what expansion might look like, but I’d be really uncertain about implementing anything before we’re even close to the current 255 limit (comment from the TxInt Telegram group):

I originally thought this parameterized strategy was the best: op-pushstate, but I changed my mind for a few reasons:

  1. variadic parameters – the original proposal used a lot of derived state (I now consider that just wrong :sweat_smile:), so I had missed that many of these state elements really should accept a number selecting an input or output (I had missed the idea entirely until @tobiassan sent his PR: `OP_PUSHSTATE` —> multi-byte `OP_TX*` opcodes. by EyeOfPython · Pull Request #1 · bitjson/op-pushstate · GitHub). If some opcodes need to accept a different number of parameters, we need to separate them into at least 2 different opcodes, to avoid opcodes accepting different numbers of parameters based on a value (which makes static analysis, provers, and certain compilers far more complicated).

  2. YAGNI You aren't gonna need it - Wikipedia – there’s only so much the BCH VM could reasonably do without changing its operating model. 10 opcodes is plenty for adding/replacing broken future crypto algorithms, and 10 more is probably plenty for additional control flow structures (loops? switches?) and math operations (exp, log, maybe rounding?). If we’re already adding all the possible state elements with 13 opcodes, there’s a good chance we’ll never need 255 opcodes.

Consider: even if we assume we’ll eventually have thousands of opcodes – why would we start using double-width opcodes now, when we’re still well below 255? Maybe we think these introspection opcodes will be ultra-rare, and opcodes 256-300 will be more common? (If so, where are those opcodes? Wouldn’t we have wanted them already if they’re so useful?) Doubling our cost now would be classic premature optimization: we’d force VM implementations to support multi-byte opcodes, increase script sizes, complicate script complexity calculation, etc., and we’d get nothing for it but a vague idea that we’re prepared for a future that may never happen. YAGNI – if we ever get to 250 opcodes, we can always use the last 5 for expansions. (And trying to expand before then doesn’t win us anything either – we’re already past 127, so we can’t do the UTF-8 trick.)

Related: some background on how the 13 opcodes in the draft TxInt spec were derived from the VM:

Fortunately, it’s easy to prove correctness for introspection opcodes – the VM is already defined, so we have nothing to design. There’s a precise list of raw state currently available to the VM, and in the CHIP opcodes 189 to 201 are exactly those state elements. I’m also pretty confident the parameters are uncontroversial too: for each numerically-indexed state element, its opcode just accepts the number to select. (But if anyone has another idea for how that should work, we should talk about it.) That’s why the CHIP includes e.g. an OP_INPUTSEQUENCENUMBER – I’m not sure if there’s any realistic use case, but it’s in the list for correctness/completeness. (We can choose not to add it if we want to save an opcode. But we should think of that the same way as if OP_1 through OP_16 were included, but OP_7 was excluded because we decided no one is likely to use the number 7 in real contracts :laughing:)

Any other possible introspected state would necessarily be computed from these basic state elements (e.g. the aggregated/hashed ones below). In those cases, I’m currently thinking it’s a better idea to not complicate things with aggregated/hashed/templated operations and instead let contracts compute aggregated state using the existing VM operations (e.g. OP_CAT, transformations, math, etc). If there’s an important use case which can’t be done that way, I’d say it’s a deficiency in the rest of the VM, like a lack of some sort of safe OP_LOOP.

1 Like

This answers the question of why it makes sense to do this before we hit 250 opcodes. My scheme allows reusing 6 bits from the first byte, plus 8 from the next byte (which is how I got the 16K opcode max).

If you wait until there are no opcodes left, you lose that benefit, and you lose bits fast, which means future opcode scarcity becomes a much harder problem to solve, with a much uglier solution.

The objection that one extra byte for some opcodes is too expensive sounds strange to me – aren’t most such scripts quite long anyway? One extra byte doesn’t seem like the kind of percentage change I’d expect you to worry about…

2 Likes

Is 16K the best target? Even x86 is well below 2000 right now. How about reserving 0xf0 through 0xff to leave space for 4000 two-byte opcodes?
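
The arithmetic behind the different cut-offs can be sketched as follows (assuming, as in the original proposal, that every first byte at or above the cut-off signals a second byte and that the remaining first-byte values are all reused):

```python
def two_byte_space(cutoff: int) -> int:
    # Usable two-byte opcodes: each first byte from `cutoff`
    # through 0xFF can be combined with any second byte.
    return (256 - cutoff) * 256

assert two_byte_space(0xC0) == 16384  # the 16K proposal above
assert two_byte_space(0xE0) == 8192   # an 8K middle ground
assert two_byte_space(0xF0) == 4096   # the 0xf0-0xff reservation
```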

I still expect we’ll never need more than 255. We’re not building a typical ISA – for example, a huge number of x86 instructions are really just slightly more performant ways of doing something already possible with existing instructions.

That sort of thing doesn’t apply to us: programmers on our ISA might have an adversarial relationship with the “computer”, and even if well-intentioned contract authors want to speed up validation performance (e.g. to get reduced network fees), the cost of merely measuring which operations they use is far greater than any micro-optimizations they could implement for non-crypto operations. And crypto operations are practically the only meaningful burden on validation speed; for all other operations, the 1-satoshi-per-instruction heuristic is basically ideal.

E.g. we’d need to be dealing with >10 MB programs for the tiny differences between instructions like OP_SWAP, OP_TUCK, OP_NIP, etc. to even be measurable. If people want autonomous contracts of 10 MB complexity, they’ll be far more efficiently implemented as two-way-bridged sidechains than directly in the BCH VM. (This is the strategic difference between BCH and ETH – BCH doesn’t aim to have all the world’s decentralized applications running on the same VM. We trade ETH’s simpler mental model for better scalability by only doing the money-movement parts.)

So what future opcodes could make sense in the BCH model? (Just solidifying thoughts here:)

Control Flow – estimate: 10

  • bounded loops – some operation where looped instructions still count toward the opcode limit, but where e.g. merkle tree validation contracts don’t have to be filled with 8 duplicated copies of the validation instructions. This would reduce contract byte size without affecting VM opcode limits. In CashTokens, OP_FROMALTSTACK OP_IF OP_SWAP OP_ENDIF OP_CAT OP_HASH160 is repeated for each layer of merkle tree depth.

  • switches/matches – some construct for switching based on a value would save a lot of wasted OP_ELSE OP_OVER <value> OP_EQUAL OP_IF manipulation. You can see an “emulated” switch at line 232 of the CashToken Covenant.
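
To illustrate the bounded-loop point: here is a rough Python model (not VM bytecode) of the merkle step that contracts currently have to duplicate once per tree layer; a bounded loop would let the VM run the body `depth` times instead. SHA-256 stands in for OP_HASH160 here for portability.

```python
import hashlib

def merkle_step(node: bytes, sibling: bytes, sibling_is_left: bool) -> bytes:
    # One layer of the repeated contract fragment
    # OP_FROMALTSTACK OP_IF OP_SWAP OP_ENDIF OP_CAT OP_HASH160:
    # optionally swap the operands, then concatenate and hash.
    # (The real contract hashes with OP_HASH160; SHA-256 stands in.)
    pair = sibling + node if sibling_is_left else node + sibling
    return hashlib.sha256(pair).digest()

def merkle_root(leaf: bytes, proof) -> bytes:
    # Today a contract must unroll this loop to its maximum depth;
    # a bounded-loop opcode would let it express the body once.
    node = hashlib.sha256(leaf).digest()
    for sibling_is_left, sibling in proof:
        node = merkle_step(node, sibling, sibling_is_left)
    return node
```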

Other control flow structures? I’ll shoot high and estimate 10 new opcodes.

Total: 10

Introspection – estimate: 13

Unless I’ve missed some VM state, there are exactly 13 “raw” VM state components (opcodes 189 through 201 in the draft spec).

Any other introspection opcodes would be producing computed results from these raw components, and I’d argue we should at least wait to implement any “VM computed” (aggregated, hashed, etc.) operations at least until they are extremely common in the wild. E.g. I’d prefer we implement “bounded loops” from above rather than add an aggregated OP_TOTALOUTPUTSVALUE.

I think it will be hard to justify non-raw introspection opcodes, so I’m just going to assume the base 13 here.

Total: 23

Formatting – estimate: 5

I’d define this category for operations like OP_BIN2NUM and OP_NUM2BIN. On the horizon, if we don’t implement some strategy like PMv3’s use of Script Number in the transaction format, we’ll need an OP_NUM2VARINT and OP_VARINT2NUM. (I would prefer that we change the transaction “packaging” rather than add complexity to the VM.)

Also, depending on how we solve for bigger integers, some people are advocating for a new larger number type in the VM. (I think we should just allow larger Script Numbers. We’re already stuck with Script Numbers, and fortunately they are signed and variable-width, so they can support whatever we need.) If we somehow ended up adding a new number type though, that would require an OP_NUM2WHATEVER and OP_WHATEVER2NUM.
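
For reference, a minimal sketch of why Script Numbers are “signed and variable-width” (this follows the standard CScriptNum rules: little-endian magnitude with the sign carried in the high bit of the last byte; helper names are mine):

```python
def encode_scriptnum(n: int) -> bytes:
    # Zero is the empty byte string.
    if n == 0:
        return b''
    neg = n < 0
    mag = abs(n)
    out = bytearray()
    while mag:                      # little-endian magnitude bytes
        out.append(mag & 0xFF)
        mag >>= 8
    if out[-1] & 0x80:              # high bit taken: add a sign byte
        out.append(0x80 if neg else 0x00)
    elif neg:                       # otherwise set the sign bit in place
        out[-1] |= 0x80
    return bytes(out)

def decode_scriptnum(b: bytes) -> int:
    if not b:
        return 0
    n = int.from_bytes(b, 'little')
    if b[-1] & 0x80:                # sign bit set: negative
        n &= ~(0x80 << (8 * (len(b) - 1)))
        return -n
    return n
```

`128` encodes as `0x80 0x00` (a padding byte keeps the sign bit clear) and `-1` as `0x81` – the width grows only as needed, which is why the format can carry larger integers without a new number type.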

I think both of those would be bad ideas, but I’m probably also missing some useful new possibilities like OP_REVERSEBYTES, so maybe it’s safest to estimate another ~5 opcodes for future formatting operations.

Total: 27

New Crypto – estimate by 2040: 8

We should note that Bitcoin Cash has been around for 12 years already, and hasn’t yet added any completely new crypto algorithms, and certainly none as opcodes. I think a very high estimate for this is up to 4 opcodes per 10 years. (And that’s only if we ultimately wanted to support things like incremental hashing with separate INIT, UPDATE, DIGEST operations. I’d personally prefer that hashing be non-incremental.)

Total by 2040: 35

Math – estimate: 10

This is the category I’m least sure about.

If we added support for very large integers, it would be possible to implement some crypto systems with standard math operations (assuming we figured out how to limit abuse – that gets very messy). I’m not sure what other operations this might require, but I also don’t expect we’ll want to do this, since it blurs the lines between the “expensive operations” (which right now are only the crypto operations) and “cheap operations” (all the other current non-crypto operations). We’d have to reevaluate a lot about DOS limit and fee estimation.

When on-chain IPO/ICOs start happening (selling shares for covenants and covenant-based sidechains), we’ll probably have use for exponentiation and logarithms. (Though a lot of those applications can probably make do with bounded loops and OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_MOD, etc.) If we add native operations, that could be 2 to maybe 6 opcodes?

Looking at the Math operations available in JavaScript, we can reasonably emulate the rounding-related methods (round, ceil, floor), but I’ll add 1 to the max estimate anyway.

Then all that’s left is trigonometric functions – anyone have an idea of how those might be useful in the BCH VM? (Are there any market maker or security-pricing algorithms that require trig functions?) I’ll add 3 because :man_shrugging:.

Total: 45


Have I missed anything? Is there some category of computation that could apply to transaction validation which we haven’t explored yet?

If not, I think we’re still well-within the 66 remaining undefined opcodes (OP_UNKNOWN189 through OP_UNKNOWN255), and that’s without recycling any previously defined opcodes.

Also, we’ll always be able to use OP_NOP opcodes as two-byte expansion opcodes, e.g. OP_NOP4 through OP_NOP10 would give us 255 more operations each, so 1,785 two-byte opcodes without even needing to reserve any range outside the OP_NOPs. (We could fit x86. :joy:)
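
The arithmetic, with OP_NOP values from the classic opcode table (OP_NOP4 = 0xb3 through OP_NOP10 = 0xb9; 255 rather than 256 per prefix, presumably reserving one value for the bare NOP behavior):

```python
unused_nops = list(range(0xB3, 0xBA))  # OP_NOP4 .. OP_NOP10
assert len(unused_nops) == 7
# Each NOP used as a two-byte prefix yields a full second byte
# of extension operations:
assert len(unused_nops) * 255 == 1785
```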

In the spirit of YAGNI, I’m fairly convinced we would regret jumping ahead to two-byte opcodes in the near future.

2 Likes

I’ll throw in my initial thoughts but want to make clear that I’m waiting until I’ve had a chance to play around with the introspection ops (at the very least) and get some good real-world use cases before I determine my ultimate preference.

I’ll write this in the context of the introspection ops, primarily because that’s the practical use case we’re dealing with right now. My perspective is that what we have here is a metadata push op. We’re identifying some piece of metadata about the current tx (and possibly the parent tx, for other opcode ideas under consideration), and that’s it.

As a language and CPU ISA designer, dedicating an entire opcode entry to each individual piece of metadata we might ever want to push strikes me as crazy thinking. That’s my initial visceral reaction – it could be wrong – but it’s quite a shocking concept at first glance.

Even if you were going to do this, you’d want those opcodes to follow an identical pattern, with consecutive values for the selection bits of the metadata you’re requesting – but that presumes there will never be any additions to the addressable metadata once designed. This is akin to opcodes referencing a set of registers, for example. But it’s actually quite likely (I’d almost guarantee it) that some day we’ll want to add to this metadata, and by then the natural opcode pattern for it will be occupied by something else. So our opcode assignments become really messy, and the interpreter more complicated, for no good reason.

I also don’t think this justifies moving to multi-byte opcodes – YET. But I think it’s actually quite reasonable to anticipate this coming one day, and I would recommend we reserve opcodes with values of 0xC0 and higher for when that eventuality arrives. This will make implementing the interpreter quite simple and clean when the time comes. So, in general, I think Tom’s proposal is prudent, and important to at least reserve asap.

So now that I’ve attacked both of the present ideas, I guess I’m obligated to produce one of my own to get attacked. Mine is actually an expansion of BitBlockifTrue’s comment earlier. There are two options that I think provide a cleaner design and better fit the concept of a proper stack-machine ISA. There should be a single metadata push opcode. It should use the current item at the top of stack (tos) to identify which metadata to push; that tos data gets there via a prior PUSH instruction. I anticipate one of two ways to represent this item identifier:

  1. An index value 0…n, unique for each piece of metadata, or
  2. A bit pattern, with each bit position assigned – from least significant bit (lsb) to most significant bit (msb) – to a single piece of metadata.

Option 2 lets you push multiple pieces of metadata onto the stack in a single instruction quite efficiently. I would push in order of msb to lsb, so the tos would hold the metadata of the lowest set bit. My preference between the two would be based on real-world use cases demonstrating whether the typical pattern is to use only one piece of metadata at a time (Option 1) or frequently several at a time – perhaps a few that always seem to come together (Option 2).
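
A toy model of the two selector options (the field names, their ordering, and the function names are all hypothetical, purely for illustration):

```python
# Hypothetical ordered list of introspectable metadata fields;
# bit/index 0 corresponds to 'version', and so on.
FIELDS = ('version', 'locktime', 'input_value', 'output_value')

def push_metadata_by_index(stack, tx):
    # Option 1: pop an index 0..n and push that single field.
    idx = stack.pop()
    stack.append(tx[FIELDS[idx]])

def push_metadata_by_bitfield(stack, tx):
    # Option 2: pop a bit pattern and push one field per set bit,
    # scanning msb-to-lsb so the lowest set bit's field ends on top.
    bits = stack.pop()
    for i in reversed(range(len(FIELDS))):
        if bits & (1 << i):
            stack.append(tx[FIELDS[i]])
```

With Option 2, a selector of `0b1001` pushes `output_value` then `version`, leaving `version` (the lsb-most field) on top of the stack in a single instruction.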

So – I guess this really ends up being an endorsement of the proposal for multi-byte opcodes, but without a practical use case that necessitates them yet, and with an example of how to delay that necessity for quite some time by being properly frugal with the opcodes remaining.

1 Like

I enjoyed your great points and forward-looking statements.

I’m left with the nagging feeling that I would personally really like it if Bitcoin Script, and transactions as they exist today, stay valid and remain possible to parse and validate for a very, very long time. I’m thinking 100 years here.

Now, this means that code written today still needs to be valid code in 50 years. But more importantly, new use cases that honestly none of us can imagine today will end up being added in the future. New ways to lock data based on concepts we have not invented yet.

So, I may be an optimist here, but I think it is not very responsible to state too firmly that 250 opcodes are likely enough. Here is why: you can build for the future, and only a design that takes the future into account will actually be future-proof. A self-fulfilling prophecy.

At some point we should accept that multi-byte opcodes are going to be part of BCH; the alternative is accepting that BCH won’t become the world money we are hoping it to be. And that is why I wrote this idea down: so we can decide WHEN we will move from 1-byte opcodes to 2-byte opcodes.

2 Likes

For example, what happens when zero knowledge proofs become a fundamental aspect of cryptoledgers? How many opcodes is that? I envision several capabilities such as this in this decade alone.

1 Like

I definitely agree. That was my initial reaction too, and that’s why the original proposal used only one opcode (the proposal is basically your 1st option, and the template as a bitfield PR is like your 2nd option). I wrote a bit here about what changed my mind:

And here’s why I abandoned the “templated” idea: (TL;DR: there’s only one hypothetical 1-byte optimization, and only for the relatively-rare case of UTXO reconstruction during parent transaction inspections.)

Is there a reason you would start at 0xC0 over 0xf0?

Even x86 is well below 2000 instructions right now (and as mentioned above, our model doesn’t benefit from the duplicated-but-faster paradigm that drives most of x86’s instruction set growth). 0xf0 through 0xff would leave space for >4000 two-byte opcodes. I could definitely get behind reserving that range (and if the time comes, our grandchildren could still decide to repurpose the OP_NOPs for two-byte extension ranges).

We share the same goal :grinning_face_with_smiling_eyes:

There is another potential future: in 2080, we might have ~235 opcodes, but a random ~40 of them will be two-bytes. BCH would still be the world’s money (because this obscure detail of the system will have little effect on BCH’s product-market fit), but when our great grandchildren learn about “premature optimization”, BCH’s two-byte opcodes may be the go-to example. :sweat_smile:

In the case of zero knowledge proofs (ZKPs) specifically, those will almost certainly not be a set of new opcodes – for ZKPs to be integrated that way, all other operations would also need to be compatible with the ZKP paradigm, including e.g. flow control + OP_DEPTH. That may be possible with a lot of big breakthroughs + complex hacks, but it’s far more likely that a smaller, more well-proven ZKP system will be integrated in a self-contained way (where existing opcodes don’t complicate or break otherwise well-proven security properties).

So in practice, this will likely look more like a single OP_CHECKZKP or a small set of setup/check operations. Like a new sort of OP_CHECKDATASIG rather than a new grab-bag of small OP_ZKPADD, OP_ZKPSUB, etc. operations.

When thinking about new crypto algorithms, it’s probably most useful to think of the VM as exposing them over a VM-friendly API. Contracts don’t manually use large integer math operations to implement signature checking, they use OP_CHECKSIG. Again, it’s possible that future crypto systems will require different, more complex API interfaces, but I’ve tried to account for that by estimating high – 4 per 10 years. (And remember, in the past 10 years, we added ~2: OP_CHECKDATASIG and OP_CHECKDATASIGVERIFY.)


Just to clarify: it would require absurd overconfidence for me to claim that “no one will ever need more than X opcodes” for a relatively new VM, especially when we’re aiming at a 100+ year lifetime.

The easy answer here is “we probably can’t know how many opcodes we’ll need, let’s just plan for a ton.” But I don’t think that’s correct either: the paradigm of imperative programming in stack-based languages/VMs is over 50 years old now, and it’s been explored by a lot of intellectual giants. We can have reasonable confidence in its ultimate scope.

It’s very possible that we will figure out another “new frontier” in the current BCH VM (like with transaction introspection) which would benefit from a new category of operations. (It would be silly to assume we’re at the end of history/development). But realistically, we should try to actually estimate: what is the likelihood that we’ll find another large, new category of imperative operations, useful for defining spending constraints, which fits within the existing VM model? (E.g. doesn’t require an alternative VM model like ZKP.) Looking over the world of programming languages and VMs, I’d say less than 50%?

Now how likely are we to need more than 4000 new opcodes (exhausting the 0xf0 to 0xff range)? I’d say far below 1%. (And remember, if that happens, we can always use 3-byte opcodes. Even just considering the system cost over time: an extra byte in 2080 will be much cheaper than an extra byte in 2022.)

Again, we’re only looking for new operations which fit within the current VM model. In the next 100 years, we may need to add additional VMs to support e.g. homomorphic encryption or some quantum paradigm (beyond my current expertise), but those systems would also have their own mutually exclusive opcodes or basic primitives.

Based on those probabilities, jumping to two-byte opcodes in our imperative, stack-based VM to leave 16,000 two-byte opcodes would waste resources immediately, with a >99% chance of providing no benefit over only reserving space for 4000 opcodes.

I didn’t expect anyone to care this much about the specific cut-off.

Saying today that we cut off at 0xC0 (aka 1100 0000) means we still have a good number of opcodes left to assign to 1-byte code-points. And we can already say today: “hey, this opcode is only really used in large-data situations, let’s make it 2-byte now”.

But, sure, the cut-off could be at 1110 0000 (0xE0) as well, since 8K opcodes is likely enough too. That gives you 32 extra 1-byte opcodes.

I honestly didn’t expect you to push back so hard against giving BCH the opportunity to grow, at a cost of 1 byte for certain opcodes that likely occur no more than once per transaction.

You never actually explained what you think the cost of going multi-byte is. From what I can see, the cost is close to zero: all implementations already think in more than 8 bits, and the complexity of reading such code is extremely low.
So, am I missing something other than you being convinced we’ll never need it – what is the cost? Trying to follow the next quote: what regret are you referring to?

Your reaction looks a little extreme for a single extra byte on a small percentage of all transactions.

2 Likes

Ah, sorry for the length! That was not emotionally charged, there’s just a lot of depth to the subject. :sweat_smile:

Which opcode(s) are you thinking about here?

I’d be ok with this since it solves the immediate issue – the native introspection opcodes would be single byte and wouldn’t require other VM changes.

(Though I would hope future developers wouldn’t blindly accept that cut-off. I don’t think 0xe0 is a wise place to start two-byte opcodes either.)

You’re definitely right, it’s not hard to implement. It just makes parsing VM bytecode a little more complex. In the Libauth VM (a FP-inspired/stateless implementation) (used by Bitauth IDE), parsing happens before evaluation, so this would add a BCH-specific, special case in that currently-universal code. (I guess the change doesn’t look as nasty in codebases which don’t support other/historical chains?)

It would also create two classes of BCH opcodes – every time we’d consider adding a new opcode, we’d have to estimate how common/valuable it will be, then choose whether or not to use one of the remaining single-byte codepoints.

Excessive caution tends to look like intelligence to uninformed observers (see: BTC), so it would always be an uphill battle to push for new first-class, single-byte opcodes.

Since two-byte opcodes are almost certainly never going to be needed (and remember, we’ll always be able to get nearly 1,800 two-byte opcodes for free out of the unused OP_NOPs), I would prefer we avoid deploying multi-byte opcodes until we’ve at least filled through 0xef (50 more opcodes).

TL;DR: we probably won’t need two-byte opcodes before year 2100, and if they’re needed then, I think our great-grandchildren will decide to add them using the OP_NOP range.

1 Like

I feel like I’m eating happy cookies while you are eating pickles.
A single extra byte for some opcodes that are rare (read: used once in 100 transactions), and you call that a class difference. *shakes head*

I’m sorry, I don’t really accept your strong resistance. I maintain that the cost is near zero and for opcodes that expect big pushes in transactions using them, this should be considered in order to be more future proof.

But the main point of this post was to offer one more alternative to consider (or at least to normalize the conversation) with regards to the approach to native introspection. I think we have some great options there, and multi-byte opcodes aren’t something I foresee being needed in 2022.

In the meantime, it’s great to have this described online and to have a technical start of how it could be done, for “future generations” :slight_smile:

3 Likes

I had read your earlier writing on your reasons. I didn’t find them compelling, however. I’ll try to explain why. Note that none of my arguments were based on saving bytes – they were about keeping the ISA clean and logical in terms of being a stack-machine VM. The term “templated” implies a denial of that reality. In stack machines it is perfectly natural and correct for operations to take inputs from the stack to control their behavior; it’s about as clean and safe a mechanism for code-flow control as you can define. We are a stack-machine VM, so why not play to its strengths – one of which is a super simple ISA that’s amazingly easy to implement and test compared to register-based ISAs? The good news is that it’s quite efficient in terms of code size as well, so there’s no downside in that regard.

You also did not address my concern about how, while the initial metadata-push opcodes might be sequenced together (thus making the interpreter and any correctness-checking algorithms simpler to implement), future additions of metadata that we might want to push will be randomly assigned somewhere else, complicating the above as well as just looking damn sloppy.

Which you double down with:

ZKPs were just one obvious example, but I can’t quite decide whether your claim is brave or lacking in imagination. :wink: I share Tom’s wonder at your insistence on these points. Tom’s proposal is to reserve some space (and I agree it doesn’t have to be that much, though I could easily make arguments for designs that would benefit from it) for future expansion that we know we cannot anticipate. Crypto is now in the 1977 era of the PC revolution. It’s the wild west. We’re on freaking v1 of the tx protocol structure. That’s not going to be the case by the end of this decade. The things we will be addressing in 5 years will shock and amaze each and every one of us. Here’s an opportunity to be conservative and protect our ability to expand, with pretty much zero downside. In my experience you never pass up such an opportunity.

1 Like