CHIP 2021-02 Add Native Introspection Opcodes

Jonathan_Silverblood · March 28, 2021, 8:41pm

Opening this up for public discussion of the Native Introspection proposal: adding one or more opcodes to the Bitcoin Cash scripting language that push transaction information onto the stack for use inside scripts.

The current state of the CHIP is still very much a draft, but there has been real work done to build and implement variations of native introspection and I expect the CHIP to be updated as the discussion moves forward.

emergent_reasons · February 23, 2021, 12:02pm

Statement from General Protocols.

(fixed naming)

emergent_reasons · February 23, 2021, 3:31am

Actually on this point, could you please specify the standards-type name for this CHIP? I think the one I used is good but whatever it is I think it is valuable for CHIPs to converge on some kind of standard that is useful, memorable and does not beg for centralized control.

Jonathan_Silverblood · February 23, 2021, 11:22am

After a discussion about naming I’ve renamed the proposal and filename, new link is: CHIP-2021-02-Add-Native-Introspection-Opcodes.md · master · GeneralProtocols / Research / CHIPs · GitLab

Jonathan_Silverblood · February 23, 2021, 11:22am

You can change to the new name now, should be more consistent with the other CHIP names.

proteusguy · March 8, 2021, 7:41am

Curious - several of these operations are effectively meta-data PUSH ops. Shouldn’t they be allowed in unlocking scripts? That would mean an adjustment in transaction validation per HF-20181115 presumably because these opcodes don’t have values of 0x60 or less.

proteusguy · March 8, 2021, 9:07am

What would be some use cases where we want to use OP_TXINPUTCOUNT, OP_TXOUTPUTCOUNT, or OP_TXVERSION?

Jonathan_Silverblood · March 8, 2021, 8:44am

I can’t think of a good reason right now as to why they wouldn’t be allowed there, but adding a requirement to change those rules might negatively impact the work to reach consensus. If you have a clear use case I would encourage you to share it, and if there’s significant demand I might consider addressing this as part of the same proposal.

Jonathan_Silverblood · March 8, 2021, 8:48am

For transaction input count, I am currently enforcing single-input transactions in the anyhedge setup, by verifying that the hash of all inputs matches the hash of the single input.

This opcode would let me clean up that code to remove unnecessary hashing and have the actual check enforced be what I want to do - enforce that input count equals 1 - rather than coming to that conclusion by other means.
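To make the difference concrete, here is a minimal Python sketch of the two checks. It is not the actual anyhedge contract code: the function names are hypothetical, and the outpoint serialization (32-byte transaction id plus 4-byte little-endian index, double SHA-256) is assumed to follow the BIP143-style hash of all input outpoints.

```python
import hashlib
import struct

def hash256(data: bytes) -> bytes:
    """Double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def serialize_outpoint(txid: bytes, index: int) -> bytes:
    """32-byte transaction id followed by a 4-byte little-endian output index."""
    return txid + struct.pack("<I", index)

def hash_prevouts(outpoints) -> bytes:
    """Hash of all input outpoints, concatenated in input order."""
    return hash256(b"".join(serialize_outpoint(t, i) for t, i in outpoints))

def single_input_via_hash(outpoints, own_outpoint) -> bool:
    """Current workaround: the hash of all inputs must equal the hash of this one input."""
    return hash_prevouts(outpoints) == hash256(serialize_outpoint(*own_outpoint))

def single_input_via_count(outpoints) -> bool:
    """What an input-count opcode would allow: check the count directly."""
    return len(outpoints) == 1

# Two made-up outpoints for illustration.
a = (bytes(32), 0)
b = (bytes([1]) * 32, 1)
```

Both checks accept `[a]` and reject `[a, b]`, but only the second one states the intent (input count equals 1) without any hashing.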

I can imagine similar things for output and other numbers, but I don’t have any examples.

For the transaction versions, I don’t know what that use case would be today.

It is worth mentioning that the list at the moment is exhaustive and that the plan is to survey users of introspection on their needs and not include things that no one wants.

proteusguy · March 8, 2021, 9:05am

Thanks, makes sense. And nice trick!

bitcoincashautist · March 11, 2021, 12:35pm

It looks like a neat feature to have!

Maybe you already explained it elsewhere but I’ll go ahead and ask – what was the rationale of going with multiple opcodes instead of just one like in Jason’s proposal? His looks cleaner and future proof.

Jason’s OP_PUSHSTATE 0x07 would push the output index and is functionally equivalent to your OP_OUTPOINTINDEX.

So we save a byte, is that it? The cost is having to take a lot of opcode numbers and being limited in number of types of introspection. Sure, Jason’s 256 is still limited but it is an order of magnitude bigger number so would give us more headroom if we later think about some other neat introspection, possibly even derived, like a calculated value based on some TX data?

As you already demonstrated with your anyhedge example, we’d save a lot of space by having the feature in either form, so is saving that one extra byte really worth it?

With Jason’s you could enable multi-push too: say, a template 0xFF for multi-push, which has to be followed by the number of templates and then the actual templates, with nesting another 0xFF forbidden. So you could do:
OP_PUSHSTATE 0xFF <count> <template1> <template2> ... <templateN>
e.g. OP_PUSHSTATE 0xFF 0x02 0x06 0x07, which would push the output hash and index to the stack.
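The multi-push idea can be sketched in Python roughly as follows. This is only an illustration of the suggestion in this post, not anything from the CHIP or from Jason's proposal: `op_pushstate`, the `state` table, and its placeholder values are hypothetical, and only the template numbers 0x06, 0x07 and 0xFF come from the example above.

```python
MULTIPUSH = 0xFF  # hypothetical multi-push template from this post

def op_pushstate(operands: bytes, state: dict, stack: list) -> int:
    """Run one OP_PUSHSTATE and return how many operand bytes it consumed."""
    if not operands:
        raise ValueError("missing template byte")
    template = operands[0]
    if template != MULTIPUSH:
        # Plain form: a single template byte pushes a single state element.
        stack.append(state[template])
        return 1
    if len(operands) < 2:
        raise ValueError("missing template count")
    count = operands[1]
    templates = operands[2:2 + count]
    if len(templates) != count:
        raise ValueError("truncated template list")
    if MULTIPUSH in templates:
        raise ValueError("nested multi-push is forbidden")
    for t in templates:
        stack.append(state[t])
    return 2 + count

# Placeholder state: 0x06 = outpoint hash, 0x07 = outpoint index.
state = {0x06: bytes(32), 0x07: (3).to_bytes(4, "little")}
stack = []
consumed = op_pushstate(bytes([0xFF, 0x02, 0x06, 0x07]), state, stack)
```

Note that the number of operand bytes consumed depends on the `<count>` value, which is the kind of value-dependent behaviour discussed later in the thread.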

Hell, you could even implement a simple calculator with these sub-opcodes to get some derived values. Dunno, maybe this is taboo, because one could get carried away and use a single opcode to enable a whole other programming language in the bytecode following it…

Jonathan_Silverblood · March 12, 2021, 6:02am

The reason for using multiple opcodes is to have simpler and more clear scripts and opcode implementations. Originally Jason wanted to use a templated version, because he thought there would be clear needs to push multiple things at the same time, but after he implemented and started writing contracts using it he realized that he had misjudged the needs and thus chose to go for a cleaner / simpler system.

Personally, I am OK with both approaches, but if we are to have a templated approach I think it would be best to first push a string to the stack, where each byte represents one data point, then push the opcode, and it will, for each byte of the string, push one stack element with the desired content.

I see some value in the templated version, for example you can push the same data to the stack more than one time, and you can make the template ordered to optimize execution.
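A minimal Python sketch of this stack-driven variant, under the same caveats: `op_pushstate_from_stack` and the `state` table are hypothetical names used only for illustration, and the selector bytes 0x06 and 0x07 are borrowed from the earlier example in this thread.

```python
def op_pushstate_from_stack(stack: list, state: dict) -> None:
    """Pop a template string, then push one state element per template byte."""
    template = stack.pop()
    for selector in template:
        stack.append(state[selector])

# Placeholder state: 0x06 = outpoint hash, 0x07 = outpoint index.
state = {0x06: bytes(32), 0x07: (3).to_bytes(4, "little")}

# The template is ordinary stack data, so it can repeat a selector
# and choose whatever order suits the rest of the script.
stack = [bytes([0x07, 0x06, 0x07])]
op_pushstate_from_stack(stack, state)
```

After the call the stack holds the index, the hash, and the index again, in that order.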

I talked with several other people about this, and there are some that prefer templated, and some that prefer multiple opcodes. Given the people involved in building and making this happen, the sentiment is currently leaning towards multiple opcodes and I’m OK with that.

I further suggested that, since we are already looking to use a lot of opcodes, why not use one more opcode and add the templated version as well? It will cost more (one more thing to implement, and one additional opcode used) but it will make contracts that use multiple data points smaller.

bitcoincashautist · March 12, 2021, 8:02am

Thanks for writing it down, it’s clearer now!

This makes more sense yeah, it’s more dynamic as you could then construct that string from inside the VM.

tom · March 16, 2021, 12:01pm

On this topic, I’d like to suggest the VM-wide multiple bytes approach.

If there is support for this we can run it as a separate CHIP, so that the NI CHIP can move forward independently of its success or failure.

Bitcoin Script: multi byte opcodes

bitjson · March 18, 2021, 4:43pm

Just to add to this – my current thoughts on why we should go with the multiple opcode approach:

I originally thought this parameterized strategy was the best: op-pushstate, but I changed my mind for a few reasons:

variadic parameters – that original proposal was using a lot of derived state (I consider it just wrong now) so I had missed that many of these state elements really should accept a number with which to select an input or output (I had just missed the idea entirely until @tobiassan sent his PR: `OP_PUSHSTATE` -> multi-byte `OP_TX*` opcodes. by EyeOfPython · Pull Request #1 · bitjson/op-pushstate · GitHub). If some opcodes need to accept a different number of parameters, we need to separate them into at least 2 different opcodes to avoid having the opcodes accept different numbers of parameters based on value (which makes static analysis, provers, and certain compilers far more complicated).

YAGNI You aren't gonna need it - Wikipedia – there’s only so much the BCH VM could reasonably do without changing its operating model. 10 opcodes is plenty for adding/replacing broken future crypto algorithms, and 10 more is probably plenty for additional control flow structures (loops? switches?) and math operations (exp, log, maybe rounding?). If we’re already adding all the possible state elements with 13 opcodes, there’s a good chance we’ll never need 255 opcodes.

Consider: even if we assume we’ll eventually have thousands of opcodes – why would we start using double-width opcodes now, when we’re still well below 255? Maybe we think these introspection opcodes will be ultra-rare, and opcodes 256-300 will be more common? (If so, where are those opcodes? Wouldn’t we have wanted them already if they’re so useful?) Doubling our cost now would be classic premature optimization: we’d force VM implementations to support multi-byte opcodes, increase script sizes, complicate script complexity calculation, etc. and we’d get nothing for it but a vague idea that we’re prepared for a future that may never happen. YAGNI – if we ever get to 250 opcodes, we can always use the last 5 for expansions. (And trying to expand before then also doesn’t win us anything – we’re already past 127, so we can’t do the UTF8 trick.)

Related: some background on how the 13 opcodes in the draft TxInt spec were derived from the VM:

Fortunately, it’s easy to prove correctness for introspection opcodes – the VM is already defined, so we have nothing to design. There’s a precise list of raw state currently available to the VM, and in the CHIP opcodes 189 to 201 are exactly those state elements. I’m also pretty confident the parameters are uncontroversial too: for each numerically-indexed state element, its opcode just accepts the number to select. (But if anyone has another idea for how that should work, we should talk about it.) That’s why the CHIP includes e.g. an OP_INPUTSEQUENCENUMBER – I’m not sure if there’s any realistic use case, but it’s in the list for correctness/completeness. (We can choose not to add it if we want to save an opcode. But we should think of that the same way as if OP_1 through OP_16 were included, but OP_7 was excluded because we decided no one is likely to use the number 7 in real contracts.)
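As a rough Python sketch of what "accepts the number to select" could look like: an indexed opcode pops exactly one item (the index) and pushes the selected element, while a transaction-wide opcode pops nothing. The function names, the `TxInput` type, the use of plain Python integers for stack items, and the out-of-range failure are all assumptions made for illustration, not behaviour taken from the CHIP.

```python
from dataclasses import dataclass

@dataclass
class TxInput:
    sequence_number: int

def op_input_sequence_number(stack: list, inputs: list) -> None:
    """Indexed element: pop an input index, push that input's sequence number."""
    index = stack.pop()
    if not 0 <= index < len(inputs):
        raise IndexError("input index out of range")  # assumed failure mode
    stack.append(inputs[index].sequence_number)

def op_tx_input_count(stack: list, inputs: list) -> None:
    """Transaction-wide element: takes no parameters, pushes the input count."""
    stack.append(len(inputs))

inputs = [TxInput(0xFFFFFFFF), TxInput(5)]
stack = [1]                               # index of the input to inspect
op_input_sequence_number(stack, inputs)   # stack is now [5]
op_tx_input_count(stack, inputs)          # stack is now [5, 2]
```

Each function has a fixed number of parameters, which is the property the separate-opcode design keeps for static analysis.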

Any other possible introspected state would necessarily be computed from these basic state elements (e.g. the aggregated/hashed ones below). In those cases, I’m currently thinking it’s a better idea to not complicate things with aggregated/hashed/templated operations and instead let contracts compute aggregated state using the existing VM operations (e.g. OP_CAT, transformations, math, etc). If there’s an important use case which can’t be done that way, I’d say it’s a deficiency in the rest of the VM, like a lack of some sort of safe OP_LOOP.

bitjson · March 17, 2021, 5:15am

And because I haven’t written it anywhere else yet: the reason I abandoned the “templated” idea from OP_PUSHSTATE is that I haven’t been able to come up with any examples where concatenating state elements is more efficient than pushing elements one at a time and checking them individually.

At the time, I was very used to the idea that you’d need to concatenate lots of state to re-create the signing serialization, then use the OP_CHECKSIG + OP_CHECKDATASIG hack. But introspection opcodes provide trusted data directly from the VM – I don’t think you’ll ever need to reconstruct the signing serialization of the current transaction, since you can check whatever state you want via opcodes. (And further, an introspection opcode with “template” support would barely even help with e.g. reconstructing parent transactions, since only the UTXO data is available to the VM. Everything else has to be provided via unlocking bytecode.)

Griffith · March 31, 2021, 7:59am

@Jonathan_Silverblood
Please consider adding a “Discussions” section somewhere in the CHIP that contains links to all of the places where discussions about this CHIP are taking place, in order to make it easier for those who have a link to the CHIP to find the right place for discussions.

Please also consider adding the following BIP-style header to the top of the CHIP (and correct any information I got wrong):

```
Title: Add Native Introspection Opcodes
Created: 2021-02-21
Last Edited: 2021-03-22
Owner: Jason Dreyzehner, Jonathan Silverblood
Type: Technical
Layer: Consensus
Status: Draft
```

Jonathan_Silverblood · March 31, 2021, 8:21am

In the first section of the CHIP, there is: DISCUSSION: Bitcoin Cash Research, Telegram with links. I can add more if there are other places with significant and focused discussion, but generally I think it’s a good thing that discussion doesn’t happen in a lot of spread-out places.

The section I referred to also includes the current status of the chip, who the owners are etc - is there any particular item that you feel is missing from it?

Griffith · March 31, 2021, 8:35am

I did not realise those were clickable links. My mistake.

In terms of information, dates for creation and last edit. More specifically I am requesting the inclusion of the header itself to align the format with other CHIPs.
Examples:
Multiple_OP_RETURN_for_Bitcoin_Cash
Group_Tokenization_for_Bitcoin_Cash
Allow Smaller Transactions

andrewstone · March 31, 2021, 5:28pm

The purpose of concatenation (and optional hashing of the concatenated result) is to use it along with CHECKDATASIG to allow for the signing of arbitrary subsets of the transaction. CHECKDATASIG + introspection opcodes is more fundamental than CHECKSIG (it can fully replace it), and does so in a more powerful way.

The sighash flags byte is a limited attempt to sign arbitrary subsets of the transaction. But it fails because it is so limited. For example, SIGHASH_SINGLE can’t be used to enforce the existence of some important output and a change output.
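The idea of signing an arbitrary subset can be sketched in Python as below. This is only an illustration under loud assumptions: outputs are modelled as opaque byte strings, `subset_message` is a hypothetical helper, and HMAC-SHA256 is used purely as a stand-in for the real signature that CHECKDATASIG would verify.

```python
import hashlib
import hmac

def subset_message(outputs, indices) -> bytes:
    """Concatenate the selected outputs (length-prefixed) and hash the result."""
    parts = b"".join(
        len(outputs[i]).to_bytes(4, "little") + outputs[i] for i in indices
    )
    return hashlib.sha256(parts).digest()

def sign(key: bytes, message: bytes) -> bytes:
    """Stand-in for a real signature; NOT what CHECKDATASIG actually verifies."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign(key, message), signature)

key = b"demo key"
outputs = [b"important output", b"change output", b"unrelated output"]

# Commit to outputs 0 and 1 together, which SIGHASH_SINGLE cannot express.
signature = sign(key, subset_message(outputs, [0, 1]))

other_changed = [outputs[0], outputs[1], b"anything else"]
change_changed = [outputs[0], b"redirected change", outputs[2]]
```

Verifying against `other_changed` still succeeds because output 2 was never signed, while verifying against `change_changed` fails because a committed output was altered.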