CHIP 2021-02 Add Native Introspection Opcodes

For transaction input count, I am currently enforcing single-input transactions in the AnyHedge setup by verifying that the hash of all inputs matches the hash of the single input.

This opcode would let me clean up that code, remove the unnecessary hashing, and enforce the check I actually want (input count equals 1) directly, rather than arriving at that conclusion by other means.
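As a rough illustration of the difference between the two checks described above, here is a minimal Python sketch. It assumes the "hash of all inputs" is a double SHA-256 over the concatenated outpoints (in the style of the `hashPrevouts` field of the signing serialization); the actual AnyHedge contract code is not shown in this thread, and all function names are made up for the example.

```python
import hashlib
from typing import Sequence


def sha256d(data: bytes) -> bytes:
    """Double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def serialize_outpoint(txid: bytes, index: int) -> bytes:
    """36-byte outpoint: 32-byte transaction hash plus 4-byte little-endian index."""
    return txid + index.to_bytes(4, "little")


def single_input_via_hash(all_outpoints: Sequence[bytes], own_outpoint: bytes) -> bool:
    """Indirect check (assumed form): the hash over all outpoints must equal
    the hash over this input's outpoint alone."""
    return sha256d(b"".join(all_outpoints)) == sha256d(own_outpoint)


def single_input_via_count(all_outpoints: Sequence[bytes]) -> bool:
    """Direct check that an input-count introspection opcode would allow."""
    return len(all_outpoints) == 1


own = serialize_outpoint(bytes(32), 0)
other = serialize_outpoint(b"\x01" * 32, 1)
```

Both functions accept `[own]` and reject `[own, other]`, but only the second states the intended rule directly and needs no hashing.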

I can imagine similar things for output and other numbers, but I don’t have any examples.

For the transaction versions, I don’t know what that use case would be today.

It is worth mentioning that the list at the moment is exhaustive and that the plan is to survey users of introspection on their needs and not include things that no one wants.

Thanks, makes sense. And nice trick! :slight_smile:

It looks like a neat feature to have!

Maybe you already explained it elsewhere but I’ll go ahead and ask – what was the rationale of going with multiple opcodes instead of just one like in Jason’s proposal? His looks cleaner and future proof.

Jason’s OP_PUSHSTATE 0x07 would push the output index and is functionally equivalent to your OP_OUTPOINTINDEX.

So we save a byte, is that it? The cost is having to take a lot of opcode numbers and being limited in number of types of introspection. Sure, Jason’s 256 is still limited but it is an order of magnitude bigger number so would give us more headroom if we later think about some other neat introspection, possibly even derived, like a calculated value based on some TX data?

As you already demonstrated with your anyhedge example, we’d save a lot of space by having the feature in either form, so is saving that one extra byte really worth it?

With Jason’s you could enable multi-push too, say a template 0xFF for multipush which has to be followed by number of templates and then the actual templates, and nesting another 0xFF forbidden, so you could do:
OP_PUSHSTATE 0xFF <count> <template1> <template2> ... <templateN> e.g.
OP_PUSHSTATE 0xFF 0x02 0x06 0x07 Which would push output hash and index to the stack.
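To make the proposed multi-push encoding concrete, here is a small Python sketch of how the bytes following OP_PUSHSTATE could be expanded. Only the ids 0xFF, 0x06 and 0x07 come from the post above; the `STATE` table with placeholder values, the function names, and the error handling are assumptions for illustration.

```python
MULTIPUSH = 0xFF

# Placeholder state values; only the ids 0x06 and 0x07 appear in the post.
STATE = {0x06: b"<hash>", 0x07: b"<index>"}


def expand_pushstate(operand: bytes) -> list:
    """Return the template ids encoded after OP_PUSHSTATE."""
    if not operand:
        raise ValueError("missing template")
    if operand[0] != MULTIPUSH:
        return [operand[0]]  # plain single-template form
    if len(operand) < 2:
        raise ValueError("missing count")
    count = operand[1]
    templates = list(operand[2:2 + count])
    if len(templates) != count:
        raise ValueError("truncated template list")
    if MULTIPUSH in templates:
        raise ValueError("nested multi-push is forbidden")
    return templates


def push_state(operand: bytes, state: dict) -> list:
    """Return the stack items pushed for the given operand bytes."""
    return [state[t] for t in expand_pushstate(operand)]
```

With this sketch, `push_state(bytes([0xFF, 0x02, 0x06, 0x07]), STATE)` yields the two placeholder items in template order.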

Hell, you could even implement a simple calculator with these sub-opcodes to get some derived values. Dunno, maybe this is taboo, because one could get carried away and use a single opcode to enable a whole other programming language in the bytecode following it…

The reason for using multiple opcodes is to have simpler and more clear scripts and opcode implementations. Originally Jason wanted to use a templated version, because he thought there would be clear needs to push multiple things at the same time, but after he implemented and started writing contracts using it he realized that he had misjudged the needs and thus chose to go for a cleaner / simpler system.

Personally, I am OK with both approaches, but if we are to have a templated approach I think it would be best to first push a string to the stack, where each byte represents one data point, and then push the opcode, which will, for each byte of the string, push one stack element with the desired content.

I see some value in the templated version, for example you can push the same data to the stack more than one time, and you can make the template ordered to optimize execution.
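A minimal sketch of that stack-based variant, in Python: the template string is taken from the stack rather than from the bytes after the opcode. The selector ids, the placeholder state values, and the function name are hypothetical.

```python
def op_pushstate_from_stack(stack: list, state: dict) -> None:
    """Pop a template string and push one state element per template byte."""
    template = stack.pop()
    for selector in template:  # iterating over bytes yields ints
        stack.append(state[selector])


# Hypothetical selectors and placeholder values.
state = {0x06: b"<hash>", 0x07: b"<index>"}

# The template repeats 0x07 and chooses its own ordering.
stack = [bytes([0x07, 0x06, 0x07])]
op_pushstate_from_stack(stack, state)
```

Because the template is ordinary stack data, the same element can be requested more than once and in whatever order suits the rest of the script.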

I talked with several other people about this, and there are some that prefer templated, and some that prefer multiple opcodes. Given the people involved in building and making this happen, the sentiment is currently leaning towards multiple opcodes and I’m OK with that.

I further suggested, that since we already are looking to use a lot of opcodes, why not use 1 opcode more, and add the templated version as well - it will cost more (one more thing to implement, and one additional opcode used) but it will make contracts that use multiple data points smaller.

Thanks for writing it down, it’s clearer now!

This makes more sense yeah, it’s more dynamic as you could then construct that string from inside the VM.

On this topic, I’d like to suggest the VM-wide multiple bytes approach.

If there is support for this, we can run it as a separate CHIP so that the native introspection CHIP can move forward independently of its success or failure.

Bitcoin Script: multi byte opcodes

Just to add to this – my current thoughts on why we should go with the multiple opcode approach:

I originally thought this parameterized strategy was the best: op-pushstate, but I changed my mind for a few reasons:

  1. variadic parameters – that original proposal was using a lot of derived state (I consider it just wrong now :sweat_smile:), so I had missed that many of these state elements really should accept a number with which to select an input or output (I had missed the idea entirely until @tobiassan sent his PR: `OP_PUSHSTATE` —> multi-byte `OP_TX*` opcodes. by EyeOfPython · Pull Request #1 · bitjson/op-pushstate · GitHub). If some opcodes need to accept a different number of parameters, we need to separate them into at least 2 different opcodes to avoid having one opcode accept different numbers of parameters based on a value (which makes static analysis, provers, and certain compilers far more complicated).

  2. YAGNI (“You aren't gonna need it”, Wikipedia) – there’s only so much the BCH VM could reasonably do without changing its operating model. 10 opcodes is plenty for adding/replacing broken future crypto algorithms, and 10 more is probably plenty for additional control flow structures (loops? switches?) and math operations (exp, log, maybe rounding?). If we’re already adding all the possible state elements with 13 opcodes, there’s a good chance we’ll never need 255 opcodes.

Consider: even if we assume we’ll eventually have thousands of opcodes – why would we start using double-width opcodes now, when we’re still well below 255? Maybe we think these introspection opcodes will be ultra-rare, and opcodes 256-300 will be more common? (If so, where are those opcodes? Wouldn’t we have wanted them already if they’re so useful?) Doubling our cost now would be classic premature optimization: we’d force VM implementations to support multi-byte opcodes, increase script sizes, complicate script complexity calculations, etc. and we’d get nothing for it but a vague idea that we’re prepared for a future that may never happen. YAGNI – if we ever get to 250 opcodes, we can always use the last 5 for expansions. (And trying to expand before then also doesn’t win us anything – we’re already past 127, so we can’t do the UTF8 trick.)
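To illustrate the static-analysis point from reason 1 above: when each opcode has a fixed number of parameters, a tool can track stack depth with a simple per-opcode table, without knowing any runtime values. The sketch below assumes such a table; `OP_TXINPUTCOUNT` is a hypothetical name for an input-count opcode, and the listed arities are illustrative.

```python
# (items popped, items pushed) per opcode.
ARITY = {
    "OP_1": (0, 1),
    "OP_NUMEQUAL": (2, 1),
    "OP_INPUTSEQUENCENUMBER": (1, 1),  # takes an input index
    "OP_TXINPUTCOUNT": (0, 1),         # hypothetical name, takes no parameter
}


def final_stack_depth(script: list, initial_depth: int = 0) -> int:
    """Statically compute the stack depth after running the script."""
    depth = initial_depth
    for op in script:
        pops, pushes = ARITY[op]
        if depth < pops:
            raise ValueError(f"{op} needs {pops} item(s), stack has {depth}")
        depth += pushes - pops
    return depth
```

A single parameterized opcode whose parameter count depends on its template value could not be described by one table entry; the analyzer would first have to resolve the template value.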

Related: some background on how the 13 opcodes in the draft TxInt spec were derived from the VM:

Fortunately, it’s easy to prove correctness for introspection opcodes – the VM is already defined, so we have nothing to design. There’s a precise list of raw state currently available to the VM, and in the CHIP opcodes 189 to 201 are exactly those state elements. I’m also pretty confident the parameters are uncontroversial too: for each numerically-indexed state element, its opcode just accepts the number to select. (But if anyone has another idea for how that should work, we should talk about it.) That’s why the CHIP includes e.g. an OP_INPUTSEQUENCENUMBER – I’m not sure if there’s any realistic use case, but it’s in the list for correctness/completeness. (We can choose not to add it if we want to save an opcode. But we should think of that the same way as if OP_1 through OP_16 were included, but OP_7 was excluded because we decided no one is likely to use the number 7 in real contracts :laughing:)
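As a sketch of what "its opcode just accepts the number to select" could look like at evaluation time, here is a toy Python model of OP_INPUTSEQUENCENUMBER. The `TxInput` type and the out-of-range behaviour are assumptions; the thread does not specify how an invalid index is handled.

```python
from dataclasses import dataclass


@dataclass
class TxInput:
    sequence_number: int


def op_inputsequencenumber(stack: list, inputs: list) -> None:
    """Pop an input index and push that input's sequence number."""
    index = stack.pop()
    if not 0 <= index < len(inputs):
        # Assumption: an out-of-range index makes evaluation fail.
        raise IndexError("input index out of range")
    stack.append(inputs[index].sequence_number)


inputs = [TxInput(0xFFFFFFFF), TxInput(0xFFFFFFFE)]
stack = [1]  # select the second input
op_inputsequencenumber(stack, inputs)
```

After the call, the index has been replaced on the stack by the selected input's sequence number.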

Any other possible introspected state would necessarily be computed from these basic state elements (e.g. the aggregated/hashed ones below). In those cases, I’m currently thinking it’s a better idea to not complicate things with aggregated/hashed/templated operations and instead let contracts compute aggregated state using the existing VM operations (e.g. OP_CAT, transformations, math, etc). If there’s an important use case which can’t be done that way, I’d say it’s a deficiency in the rest of the VM, like a lack of some sort of safe OP_LOOP.

And because I haven’t written it anywhere else yet: the reason I abandoned the “templated” idea from OP_PUSHSTATE is that I haven’t been able to come up with any examples where concatenating state elements is more efficient than pushing elements one at a time and checking them individually.

At the time, I was very used to the idea that you’d need to concatenate lots of state to re-create the signing serialization, then use the OP_CHECKSIG + OP_CHECKDATASIG hack. But introspection opcodes provide trusted data directly from the VM – I don’t think you’ll ever need to reconstruct the signing serialization of the current transaction, since you can check whatever state you want via opcodes. (And further, an introspection opcode with “template” support would barely even help with e.g. reconstructing parent transactions, since only the UTXO data is available to the VM. Everything else has to be provided via unlocking bytecode.)

@Jonathan_Silverblood
Please consider adding a “Discussions” section somewhere in the CHIP that links to all of the places where discussions about this CHIP are taking place, to make it easier for those who have a link to the CHIP to find the right place for discussion.

Please also consider adding the following BIP-style header to the top of the CHIP (and correct any information I got wrong):

```
Title: Add Native Introspection Opcodes
Created: 2021-02-21
Last Edited: 2021-03-22
Owner: Jason Dreyzehner, Jonathan Silverblood
Type: Technical
Layer: Consensus
Status: Draft
```

In the first section of the CHIP, there is: DISCUSSION: Bitcoin Cash Research, Telegram, with links. I can add more if there are other places with significant and focused discussion, but generally I think it’s a good thing that discussion doesn’t happen in a lot of spread-out places.

The section I referred to also includes the current status of the CHIP, who the owners are, etc. Is there any particular item that you feel is missing from it?

I did not realise those were clickable links. My mistake.

In terms of information, dates for creation and last edit. More specifically I am requesting the inclusion of the header itself to align the format with other CHIPs.
Examples:
Multiple_OP_RETURN_for_Bitcoin_Cash
Group_Tokenization_for_Bitcoin_Cash
Allow Smaller Transactions

The purpose of concatenation (and optional hashing of the concatenated result) is to use it along with CHECKDATASIG to allow for the signing of arbitrary subsets of the transaction. CHECKDATASIG + introspection opcodes are more fundamental than CHECKSIG: they can fully replace it, and do so in a more powerful way.

The sighash flags byte is a limited attempt to sign arbitrary subsets of the transaction. But it fails because it is so limited. For example, SIGHASH_SINGLE can’t be used to enforce the existence of some important output and a change output.
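A rough Python sketch of the idea: a contract concatenates only the outputs it cares about and checks a data signature over that message. Signature verification itself is left out; the sketch only builds the digest that would be signed. The output serialization, the single SHA-256 (which mirrors my understanding of how OP_CHECKDATASIG hashes its message), and the function names are all assumptions.

```python
import hashlib
from typing import List, Sequence, Tuple

Output = Tuple[int, bytes]  # (value in satoshis, locking bytecode)


def serialize_output(value_sats: int, locking_bytecode: bytes) -> bytes:
    """8-byte little-endian value, then length-prefixed locking bytecode."""
    if len(locking_bytecode) >= 0xFD:
        raise ValueError("sketch only handles single-byte length prefixes")
    length = bytes([len(locking_bytecode)])
    return value_sats.to_bytes(8, "little") + length + locking_bytecode


def subset_digest(outputs: List[Output], selected: Sequence[int]) -> bytes:
    """Hash only the selected outputs; this is the message a signer commits to."""
    message = b"".join(serialize_output(*outputs[i]) for i in selected)
    return hashlib.sha256(message).digest()


outputs = [(1000, b"\x51"), (2000, b"\x52"), (500, b"\x53")]
committed = subset_digest(outputs, [0, 2])  # important output plus change output
```

Changing output 1 leaves `committed` unchanged, while changing output 0 or 2 alters it, which is the kind of two-output commitment that SIGHASH_SINGLE cannot express.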

I am working on the implementation of this spec for BCHN and Knuth.
It would be helpful if interested parties could provide test cases with real examples of the use of these new opcodes.
Thanks!

Hi. I don’t have anything at the moment, but Jason might. I made an issue for it for now: Introspection: test cases (#21) · Issues · GeneralProtocols / Research / CHIPs · GitLab

I will bring it up on the telegram chat and see if there’s anything available.

Hey all, I just pushed up an MR with a new revision for the introspection CHIP:

This revision clarifies several technical details and adds an extensive Rationale section documenting design decisions made in the spec.

And here’s the MR. Review and feedback appreciated!

Minor suggestion: please add versioning and a changelog to the document; it makes it easier for readers to follow the changes.

Another suggestion: I would split the spec so it doesn’t depend on either PMv3 or Bigger Script Integers

Thanks for the review @fpelliccioni! (For others following this thread, there was more discussion on Telegram: Telegram: Contact @transactionintrospection)

These suggestions are both incorporated in v1.1 :+1:

@bitjson
I have a small issue with this spec and am requesting the following edit.

The technical section states that the opcodes used are 0xc0 (192) through 0xcf (207); however, 0xce depends on PMv3, and 0xcf is never mentioned anywhere beyond being included in that range.

Can you please add a table for reserved op codes under the Unary Operations table to clearly mark 0xce and 0xcf as reserved? They should also be named OP_RESERVED3 and OP_RESERVED4 or some other name of your choosing that clearly indicates they are reserved for this spec.

I feel that the spec is a little unclear that these are reserved op codes without this.

(Sorry about the late edit request; I did not notice this until the late stages of implementing the spec.)

@Griffith thanks for the comment! @Jonathan_Silverblood opened an issue with an initial MR. I made some comments here:

Please let us know what you think.