CHIP 2021-02 Unforgeable Groups for Bitcoin Cash

bitcoincashautist · February 23, 2021, 7:13pm

Hi All,

I understand the points made by Tom elsewhere that a CHIP is not a goal in itself. What drove me to partake in activities leading to this was that all the conversations I saw about Group Tokens seemed to be wasting energy because it was not clear what the conversation is about. I wanted to break what I saw as deadlock. Now we have a more concrete subject which I believe addresses many of the historical concerns about group tokens, because it has evolved since the time you may have last looked at it! I hope that everyone’s familiarity with the base concept has already been increased. I see this CHIP as a starting point, so we can better focus future discussions and get better ROI on everyone’s energy.

With this in mind, I present to you the Group Tokenization For Bitcoin Cash CHIP.

I want to thank @andrewstone for creating Group Tokens and accepting my help in bringing it forward, and everyone else for having patience with me even though I came in banging. This shows me the maturity of this community. You didn’t sneer me off, instead you constructively criticized me and offered me a way to do better, and for that I am thankful. This is the way.

I believe all of us could benefit from this development, and that the authors of other proposals could, at this early stage in the upgrade cycle, start thinking about how Group Tokens could enrich their proposals.

With this synergistic effect Bitcoin Cash could deliver a more powerful set of features than any single team could alone!

Best regards,
Anonymous Contributor A60AB5450353F40E

bitcoincashautist · February 26, 2021, 6:05pm

Quick update: I started working on a doc which I intend to attach to the CHIP. The idea is to explore some use-cases of Group Tokens which would be enabled by implementing the spec. I decided to start with a stablecoin example, seeing how it’s one of the more popular uses of tokens, and to start with examples using only genesis, baton, mint, and melt to demonstrate the power and versatility of the group token approach. It’s hosted here:

The idea is to answer the “How to do X with group tokens?” kind of questions. This is just to let anyone interested know of its existence and I’ll simply update it in the repo as I think of new examples and copy examples from Andrew’s functional description document.

tom · February 26, 2021, 11:00pm

I just have the wish that the part you labeled “technical description” actually described the tech. It now is a bit like a history of ideas, realizing how it didn’t work, and then how direction was changed.

bitcoincashautist · March 2, 2021, 10:47am

I imagined that section as an introduction of the concept to someone who first hears of group tokenization. My logic is that the CHIP should have an easy flow:

problem → concept → spec → usage examples

The technical description could be more succinct and more detailed, but then I feel it’d be redundant with the spec. It introduces the main concepts: dual-currency token outputs, authority outputs, token genesis, and also addresses incentives and scaling concerns right at the start. The spec is there to show exactly what authorities are proposed, how many bits they need, exactly how they affect transaction validation, etc.

Simply adding a currency field was never proposed AFAIK, but I start it off like that just to demonstrate why it’s necessary to have dual-currency outputs. Having independent other-currency outputs would require touching BCH consensus logic to make some BCH disappear to create a token, and later reappear upon token destruction. That would be a major change. With group tokens, we can avoid that and simply add a few rules without touching existing ones.

PS an update: The examples now show how to do stablecoins and atomic swaps (even multi-party, multi-currency in the same TX). Coin Join can be done too, that will be next.

tom · March 2, 2021, 3:07pm

It did? I missed that.

Could it try to introduce it without the history side-steps and simply just a nice graph of the concepts? Actual concepts, not some verbose description based on examples that most of us don’t actually know.

Let me be more specific:

When thinking about multiple currencies what naturally comes to mind is to generalize the protocol to support a currency field in the transaction format

This sentence raises more questions than it answers. Let me try to figure out what you may mean with it…
Satoshi designed the transaction format to have a ‘value’ field, in satoshis. When you say “currency” field, you may be referring to this amount. Or maybe to some token-ID, mistakenly using the term currency for a token. Or maybe it’s something else entirely.

So, let’s assume you meant token-ID, as that is how group seems to be built.

I’d expect a technical introduction to read like this:

Bitcoin Cash transactions are designed to move money, always in the form of its native token BCH, in the unit “Satoshis”. From now on simply called BCH.
This proposal intends to allow the movement of other types of tokens via the same means as BCH tokens: some inputs unlock a specific token and some outputs move that same token.
The BCH inputs and outputs can be combined with any number of token inputs and outputs in the same transaction. The only rule is that the amounts in, counted for each individual token, have to be exactly the same as the amounts sent out. Only for BCH are fees subtracted.

More text is needed to explain the management actions, but there isn’t enough info in your version for me to understand how they work, so I’d just frustrate you by making a suggestion.

This approach avoids introducing lots of specific terminology, is exact about details and avoids long drama-like statements about who decided what and why (we don’t care in this section!).

(ps. I don’t actually know if the rules for fees, as I wrote them in my example, are correct. Please correct me if they’re wrong).

bitcoincashautist · March 2, 2021, 4:14pm

Thanks for the constructive feedback! I’ll rework the section having that in mind.

It’s correct, only the BCH fees get subtracted. If no management prevouts are in the inputs then the token quantity balancing check is the default i.e. ==. If at least one MINT is present then it’s <=, if at least one MELT is present then it’s >=, and if at least one of each is present then it’s entirely skipped. For completeness, with BCH it’s only ever the >= check where the “extra” goes to fees instead of being destroyed.
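A minimal JavaScript sketch of the balancing rules just described. The function names and argument shapes are hypothetical, and the comparison direction (inputs compared against outputs) is an assumption inferred from the BCH case, where inputs >= outputs and the surplus becomes the fee.

```javascript
// Hypothetical helper: does one token group balance within a transaction?
// inputSum / outputSum are BigInt totals for a single group;
// hasMint / hasMelt say whether any input carries that authority.
function tokenGroupBalances(inputSum, outputSum, hasMint, hasMelt) {
  if (hasMint && hasMelt) return true;        // both present: check is skipped
  if (hasMint) return inputSum <= outputSum;  // MINT: outputs may exceed inputs
  if (hasMelt) return inputSum >= outputSum;  // MELT: inputs may exceed outputs
  return inputSum === outputSum;              // default: exact balance
}

// BCH itself only ever uses the >= check; the surplus goes to fees.
function bchBalances(inputSats, outputSats) {
  return inputSats >= outputSats;
}

console.log(tokenGroupBalances(100n, 100n, false, false)); // plain transfer
console.log(tokenGroupBalances(0n, 500n, true, false));    // minting new tokens
console.log(tokenGroupBalances(500n, 0n, false, false));   // destroying without MELT
```

The point of the sketch is that authorities only ever relax the default equality check for the transaction they appear in.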

tom · March 2, 2021, 5:04pm

I created a merge request, please check that.

tom · March 2, 2021, 6:45pm

@andrewstone

Some feedback on the spec (which I hope will be incorporated in the CHIP soon).

QuantityOrFlags

The encoding is not standard. Please just reuse var-int as used in BCH already.

Problem:
The “Flags” part (authority) is encoded in bit 64.

This looks like a sub-optimal solution, especially since you only have a handful of bits to actually encode, which causes a massive waste.

What about using the lowest bit, instead of the highest, as the switch between the two options?
Put the value (either a quantity or a flags option) into a uint64_t, bit-shift it 1 to the left, and encode that as a var-int for placement in the Script.

Result: your “Bits” always use 1 byte instead of 8. Your quantity also uses the minimum number of bits, with no loss of resolution.
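The low-bit proposal can be sketched roughly as follows. This is an illustration only: the function names are hypothetical, the polarity (low bit 0 for a quantity, low bit 1 for flags) is an assumption, and the var-int shown is the usual Bitcoin CompactSize layout. Note that after the shift the original value has to fit in 63 bits for the result to fit in 8 bytes.

```javascript
// Tag a value with a low-bit switch: even = quantity, odd = authority flags.
function tagValue(value, isFlags) {
  return (BigInt(value) << 1n) | (isFlags ? 1n : 0n);
}

function untagValue(tagged) {
  return { isFlags: (tagged & 1n) === 1n, value: tagged >> 1n };
}

// Bitcoin-style var-int (CompactSize): 1, 3, 5 or 9 bytes, little-endian payload.
function encodeVarInt(n) {
  const le = (v, bytes) =>
    Array.from({ length: bytes }, (_, i) => Number((v >> BigInt(8 * i)) & 0xffn));
  if (n < 0xfdn) return [Number(n)];
  if (n <= 0xffffn) return [0xfd, ...le(n, 2)];
  if (n <= 0xffffffffn) return [0xfe, ...le(n, 4)];
  return [0xff, ...le(n, 8)];
}

const flagsExample = tagValue(3, true);        // a small flags value
const quantityExample = tagValue(1000, false); // a token quantity
console.log(encodeVarInt(flagsExample).length, "byte(s) for the flags");
console.log(encodeVarInt(quantityExample).length, "byte(s) for the quantity");
console.log(untagValue(quantityExample));
```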

What about a 160 bit (ripe) based hash instead for the groupID?

If this hash is good enough for our payment addresses, its good enough for group. And it saves 12 bytes.

Group flag bits are part of the hash.

Problem:
You can’t change them directly, users need to ‘grind’ data to create the right group-flags.

This sounds weird. What about a simpler solution where the group-data is hashed into a RIPEMD-160 hash (20 bytes) and the groupId is 21 bytes?
The extra byte is a var-int encoded integer with the flag-bits. Future extensibility is easy because it’s a var-int (see link above).

confusing note:

Note: If OP_GROUP appears as a proper script prefix, both these rules and the script machine rules apply. If OP_GROUP is used in any other place within the script, only the script machine rules are applied. This means that its valid to have an OP_GROUP in some other location within a script whose arguments have any length. Even though this makes specification more complex, it makes validation simpler.

This doesn’t parse for me. I don’t understand.

Grammar:

“This formulation means that a transaction can contain only create one new group.”

IsStandard Changes

Why not force OP_GROUP to be at the start of the output script? This makes template detection much easier.

This then leads you to just drop the OP_GROUP (and the 2 pushes) from the start and then continue as normal with minimal impact. Specifically, any future “standard” scripts need not care or know about Group.

Subgroup ID

The spec says that the subgroup ID can be any length. This means people can use subgroup IDs as the next OP_RETURN-style random-data container.
Please set a sane maximum size for subgroups. I’m thinking 10 (additional) bytes should be enough.

bitcoincashautist · March 3, 2021, 11:16am

Thanks, you make good points, can’t wait to see what Andrew thinks. In the meantime I’d like to clarify two points.

They aren’t meant to be changed – ever. Those are used to record permanent features of a group which are set in stone at genesis. Two GroupIDs sharing the same bits except settings bits (computationally impossible btw) would then be entirely different tokens! This way, the recipient can immediately know what kind of token he’s dealing with. I’ll try to make it more clear in the next iteration of technical description.

There’s a kind of symmetry with group settings and group authorities: group settings are used to globally constrain the group, and authorities are used to locally (single TX scope) unconstrain it.

The spec already enables that, but without force. If OP_GROUP is at the start it’s treated as a group token so wallet templates can work with that assumption. If it’s anywhere else then it’s NOT a group token. It’s not forbidden but is simply ignored and changes nothing for the Script since it neither executes nor changes the stack so it will be nothing but useless data to which we can apply your other point of enabling a random data container.

tom · March 3, 2021, 1:24pm

I guess I chose my words poorly. I didn’t expect them to be changed. I suggest being able to explicitly set them at creation. The nonce solution looks weird.

Right, and it’s trivial to force it, so let’s do that.

andrewstone · March 3, 2021, 10:20pm

QuantityOrFlags

The encoding is not standard. Please just reuse var-int as used in BCH already.

The reason VarInt is not used is that this number is a pushed CScript entity, so it already has a size. Putting a VarInt in a script stack item specifies the size twice, so it is inefficient. VarInts also don’t appear in scripts. They are also complicated compared to what I specified, which is basically just interpreting the number in standard little-endian format. This is very similar to what CScriptNum does, except that CScriptNum has additional “minimal encoding” constraints. These minimal encoding constraints are not needed in Group prefixes because they are not malleable (do not appear in input scripts).
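For comparison, here is a rough JavaScript sketch of the little-endian interpretation described above. The helper names are hypothetical, and treating the top bit of a 64-bit value as the authority marker is an assumption based on the "bit 64" remark earlier in the thread; the spec remains the authority on the exact layout and allowed sizes.

```javascript
// Decode a pushed stack item as an unsigned little-endian integer.
// The push opcode already tells us how many bytes there are.
function decodeLE(bytes) {
  let n = 0n;
  for (let i = bytes.length - 1; i >= 0; i--) {
    n = (n << 8n) | BigInt(bytes[i]);
  }
  return n;
}

// Assumption: the highest bit of a 64-bit value marks an authority.
const AUTHORITY_BIT = 1n << 63n;

function interpretQuantityOrFlags(bytes) {
  const n = decodeLE(bytes);
  if ((n & AUTHORITY_BIT) !== 0n) {
    return { kind: "authority", flags: n ^ AUTHORITY_BIT }; // clear the marker bit
  }
  return { kind: "quantity", amount: n };
}

console.log(interpretQuantityOrFlags([0xe8, 0x03]));             // 2-byte quantity
console.log(interpretQuantityOrFlags([0, 0, 0, 0, 0, 0, 0, 0x80])); // 8-byte authority
```

In this reading a small quantity needs only as many bytes as its value, while an authority always occupies the full 8 bytes, which is the trade-off being debated.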

WRT using the lowest bit rather than the highest: Yes, it would use less room for the authorities. But use of authorities is rare compared to normal tx. And I think doing that would be a lot more confusing and likely buggy for normal use. Finally, there is a future use for some of those empty bits. In particular, they might define a maximum quantity of mint or melt that this authority can do. Doing this would be more rules though, so I think we should wait for a need to show up.

But I am willing to change this if the majority of the dev community wants it the more efficient, low bit way.

What about a 160 bit (ripe) based hash instead for the groupID?

It is generally agreed that P2SH is right on the edge with 160 bits, due to the fact that for certain operations among multiple parties its security is actually 2^80 via Wagner’s birthday attack. A similar attack is possible against groups, where a group creator searches for a GrpId collision at difficulty 2^80 and then commits only one of the 2 transactions. This group could have a guaranteed limited supply via burning authorities. However, subsequently committing the 2nd transaction would recreate the authorities that the creator burned.
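The arithmetic behind the 2^80 figure is the generic birthday bound: finding any collision in an n-bit hash takes roughly 2^(n/2) work. A tiny sketch (the helper name and the intermediate sizes are only illustrative, and this ignores any bits reserved for flags):

```javascript
// Generic birthday bound: an n-bit hash offers about n/2 bits of collision resistance.
function collisionBits(hashBytes) {
  return (hashBytes * 8) / 2;
}

for (const bytes of [20, 24, 28, 32]) {
  console.log(`${bytes}-byte id: about 2^${collisionBits(bytes)} work to find a collision`);
}
```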

GroupIds will be highly compressible because they will be repeated a lot, especially when you consider that tokens will likely have exponential popularity (few tokens will constitute most of the transactions).

Group flag bits are part of the hash.

The group flag bits define the properties of the group which are not changeable. That would not be “fair” to holders. Also, it’s more efficient to do it this way.

confusing note

hmm yes I’m trying to say something obvious very pedantically. What I’m trying to say is that OP_GROUP only has group semantics if it comes first. But it can appear anywhere in the script and is just treated as a no-op that drops its 2 arguments. This makes implementation simple.
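As a rough illustration of "only has group semantics if it comes first", the sketch below models an already-parsed script as an array where a `Uint8Array` is a data push and a string is an opcode name. That representation and the function name are hypothetical, not how any node stores scripts.

```javascript
// Split a parsed output script into an optional group prefix and the rest.
// Only the exact shape <push> <push> OP_GROUP at position 0 counts as a prefix.
function splitGroupPrefix(script) {
  const [groupId, quantityOrFlags, op] = script;
  const isPrefix =
    groupId instanceof Uint8Array &&
    quantityOrFlags instanceof Uint8Array &&
    op === "OP_GROUP";
  return isPrefix
    ? { group: { groupId, quantityOrFlags }, rest: script.slice(3) }
    : { group: null, rest: script }; // any later OP_GROUP carries no group semantics
}

const grouped = [new Uint8Array(32), new Uint8Array([0xe8, 0x03]), "OP_GROUP",
                 "OP_DUP", "OP_HASH160", new Uint8Array(20), "OP_EQUALVERIFY", "OP_CHECKSIG"];
const notGrouped = ["OP_DUP", new Uint8Array(32), new Uint8Array([1]), "OP_GROUP"];

console.log(splitGroupPrefix(grouped).group !== null);    // prefix recognised
console.log(splitGroupPrefix(notGrouped).group !== null); // OP_GROUP elsewhere is ignored
```

A wallet template matcher could then run on `rest` without knowing anything about groups.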

IsStandard Changes

What you are saying is the right way to implement it. OP_GROUP appearing anywhere except as the prefix is non-standard. You can’t quite drop it because it should be stored in the UTXO, etc… but basically you can skip executing it.

There was pressure years ago for the Group prefix to be interpretable as part of a normal script, since it’s in the “CScript”. Group could even be implemented as a soft fork by reusing one of the NO-OP instructions.

If no one cares about this anymore, then the most extensible way to do it is to make CScript be divided into an “attribute prefix” and then the actual script (if there is one). We could use a single leading opcode to identify such a “CAttributeScript”. There are other interesting attributes. For example, if there’s an opcode that pushes attribute N onto the stack, then scripts that enforce constraints on other scripts could be much simpler to write, since all the variable parts could be factored out into attributes. And really, when you think about it, P2SH is better conceived of as an attribute: the hash of the script that will be supplied later. It’s not a viable script, which is why bitcoind needs to handle it specially.

But again I’d like to hear overwhelming support from the community for a change like that. Making OP_GROUP an executable opcode that just pops its args from the stack is the minimum-change path.

Subgroup ID

Note that the subgroup id is limited by the maximum stack size. I don’t have strong feelings about this or about what this number should be. But at a minimum, encoding another cryptographic hash makes sense since that hash can act as lookup-and-proof of some larger amount of data.

At the same time, I have always felt that we can solve data-on-blockchain problems by encouraging miners to charge more for data transactions (esp. now that double spend proofs exist), which is why I specced it this way.

So I’m happy with whatever BCHN as the majority hash power or the dev community at large chooses, so long as it’s >= 32 bytes.

tom · March 4, 2021, 9:48am

The problem with that is that it mixes layers. The code that interprets the group stuff doesn’t care much about Script. Script is really just the carrier wave. To the Group checking code there are simply two byte-arrays that it gets and needs to interpret.

From that point of view, the input being 2 bytearrays, I think using an existing encoding is the way to go.

I doubt that it would be more buggy. All it adds to “normal usage” is a times by two or divide by two.
Conversely, your option makes that same “normal usage” test for a bit.

Again, the point of view of which software deals with this data is relevant. A JavaScript version is definitely not harder, possibly even easier with the checking bit at the lowest.

  // in2: the decoded second stack item (lowest bit 0 = quantity, 1 = flags)
  let amount = in2 / 2;
  if (Math.trunc(amount) == amount) {
     console.log("Token amount being transferred " + amount);
  } else {
     console.log("This is a control transaction!");
  }

ps. extensibility is not an issue. It’s a var-int. If you need more bits, it scales with no backwards compatibility issues.

The whole process on creation of tokens is still unclear to me, I’ll have to read more.

The suggestion was not to remove the bits from the groupId. Naturally the bits are part of the group Id, just not part of the hash (still not clear what is gained by it being a hash, btw).

Ok, I don’t want to go with your much more expensive idea you followed up with. But can we simply agree to the idea that OP_GROUP has to be first, otherwise the transaction is invalid (consensus). The same way that a single byte is appended for the sighash. That approach cleans up your spec too.

Do you have any explanation of that somewhere?
The whole subgroups part isn’t really explained in the spec or the chip yet. (or I missed it, apologies!)

Ok, let’s set the (sub)groupID to 32 + 32 bytes max then. Sounds expensive for the general case, and I will keep pushing to try to optimize that.
But the max is relevant, the avoidance of another general data-store is important and has to be taken into account.

bitcoincashautist · March 4, 2021, 10:44am

You’re right – it’s not, I plan to add it in the rewrite. For now, an explanation can be found in Andrew’s doc here, link is directly to the relevant section.

Maybe this will help. We have a global rule that sums must balance, and authorities are a way to locally (single TX scope) break that rule. That’s the fundamental principle for all further utility – constrain globally, relax locally. Another rule is that an authority can’t be created from nothing, it must be created from an authority to create authority. So how do we get authorities into the system? Genesis is the only place where that rule can be locally broken, where the first authority can be created, from nothing. And for it to be safe, we must be able to prove that genesis can happen only 1 time for any given groupID, and we do this by hashing so it’s computationally impossible to generate 2 genesis transactions resulting in the same groupID. If you could generate a “double” through birthday attack, you could publish one, and hold onto the other until the right time comes to abuse it.

andrewstone · March 4, 2021, 11:36pm

Tomz, while I am not against your suggestions I think that I’m going to wait for BCHN to weigh in. I like some of my decisions slightly better but do not have strong feelings. As the hash power leader, let’s be honest: what BCHN wants for any of these small things, they’ll get because I won’t block deployment of this for something small. For example, I pretty much expect a subgroup max length to happen like you’ve suggested. But what I don’t want to do is go off and change the code and then have to turn around and change it back.

To continue our discussion:

The problem with that is that it mixes layers. The code that interprets the group stuff doesn’t care much about Script. Script is really just the carrier wave. To the Group checking code there are simply two byte-arrays that it gets and needs to interpret.

I actually think that you are mixing layers, because VarInts are part of TX serialization and this is accessed after that. The format I specified is also:

1. a much simpler format,
2. very common in non-bitcoin software,
3. supported by standard library routines for loading/storing integers in most languages,
4. free of redundancy: putting a VarInt inside a pushed stack item redundantly encodes the integer length and wastes space.

But can we simply agree to the idea that OP_GROUP has to be first, otherwise the transaction is invalid (consensus). The same way that a single byte is appended for the sighash. That approach cleans up your spec too.

If we did that, script verification would require another pass, searching for any other OP_GROUP instructions (because parts of scripts may not be executed).

I’m ok with defining OP_GROUP as failing the transaction when executed, rather than popping its args like it currently does. And then requiring that the prefix " OP_GROUP" be chopped from the beginning of any script. However, I think you will find that doing it this way makes more changes because you have to specially handle OP_GROUP in more places. (Again though, I want to wait for BCHN’s opinion).

The suggestion was not to remove the bits from the groupId. Naturally the bits are part of the group Id, just not part of the hash (still not clear what is gained by it being a hash, btw).

Making it a hash ensures that the group id is unique upon creation. Post creation, the fact that it is a hash is mostly irrelevant. The one way it’s relevant is that it commits to a human readable contract (or a hash pointer to one). So the token creator can’t change the TOS underneath the holders.

Making the bits part of the hash means that they essentially cost nothing WRT blockchain storage space. We need 32 bytes of group ID to block Wagner’s birthday attack, but by requiring group authors to do a bit of work to search for a hash, we can use those bits for another purpose, while still requiring the same work to be done for any attempt to find a group collision.
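The "bit of work to search for a hash" can be pictured as a small nonce search. The sketch below is only an illustration under loud assumptions: it uses SHA-256 over opaque genesis data plus an 8-byte nonce, and pretends the flag bits are the last byte of the resulting id. The real preimage layout, hash function usage, and flag positions are defined by the spec, not by this example.

```javascript
const { createHash } = require("crypto");

// Hypothetical: search for a nonce so that the id's last byte equals the wanted flags.
function grindGroupId(genesisData, wantedFlags) {
  for (let nonce = 0; ; nonce++) {
    const nonceBytes = Buffer.alloc(8);
    nonceBytes.writeBigUInt64LE(BigInt(nonce));
    const id = createHash("sha256").update(genesisData).update(nonceBytes).digest();
    if (id[31] === wantedFlags) return { nonce, groupId: id };
  }
}

const found = grindGroupId(Buffer.from("example genesis data"), 0x01);
console.log("nonce:", found.nonce, "groupId:", found.groupId.toString("hex"));
```

With one byte of flags this takes a few hundred hashes on average, so the cost to an honest creator is negligible, while the flags still occupy no extra space beyond the 32-byte id.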

Also, from a practical perspective, the group id fits in a 256 bit integer, which is very nice. BTW, it’s also nice (but not necessary because we can use a different prefix) that the size is greater than the 20 byte bitcoin cash address. This is an easy way to tell them apart.

tom · March 5, 2021, 1:51pm

Thank you for continuing the discussion.

Right, I guess I didn’t make clear what I meant because I had a suggested design in my head that doesn’t match the example one. So let me give a high level overview of this so we can clearly see the “layers” appear.

First of all, the Group code needs to access the data as stored in the script. But the important part is that those are stored in the output that is spent. So, not inside of the transaction we are checking, but we need to get the data from each of the inputs that are spent.

During validation this data is already accessed. To avoid a double lookup, I would gravitate towards a solution where the validation process stores the 2 stack items on some transaction-meta-data object.
After the scripts are all validated and Ok, the group information can then be validated based on the data stored with the transaction itself (on the meta obj), avoiding any extra costs for lookups of previous output-scripts.
This means that the group-checking code (which operates on a per-transaction basis) just finds 2 byte-arrays for each input that actually is a group input.
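A rough sketch of that layering, with entirely hypothetical object shapes: each input is assumed to already hold the previous output it spends (fetched once for script validation), and `validateScript` stands in for whatever the node's script check is.

```javascript
// Collect group data while scripts are validated, so the later group check
// needs no second lookup of the previous outputs.
function collectGroupMeta(tx, validateScript) {
  const meta = { groupInputs: [] };
  for (const input of tx.inputs) {
    if (!validateScript(input.prevOutput, input)) {
      throw new Error("script validation failed");
    }
    const group = input.prevOutput.group; // null for plain BCH outputs
    if (group) meta.groupInputs.push(group); // { groupId, quantityOrFlags } byte arrays
  }
  return meta;
}

const exampleGroup = { groupId: new Uint8Array(32), quantityOrFlags: new Uint8Array([5]) };
const exampleTx = {
  inputs: [
    { prevOutput: { group: exampleGroup } },
    { prevOutput: { group: null } }, // ordinary BCH input
  ],
};
const exampleMeta = collectGroupMeta(exampleTx, () => true);
console.log(exampleMeta.groupInputs.length, "group input(s) recorded");
```

The per-transaction group check would then work purely from `meta.groupInputs` and the transaction's own outputs.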

I guess 1 & 2 are personal; my main criterion is to not introduce a new format. All transaction parsing code already knows the varint format that Satoshi invented. Sure, something else can be simpler, but it’s still more code to do nearly the same thing and more chances for mistakes (which covers 3).
Point 4 is false. No space is wasted because Bitcoin Script reserves opcodes 1 through 75 (0x01 to 0x4b) for direct pushes.

There is no such need. There are various simple solutions to avoid doing two passes. The simplest (kinda hacky) is to add a bool in the Interpreter.cpp call eval() which detects multiple OP_GROUPs during normal validation. A simple (clean) solution is based on the general architecture I described above, which detects the same during the validation that already loops over the script.

To be fair, we don’t need 32 bytes. The next step after a 20-byte hash (160 bits, or 80 in case of the birthday attack) is not necessarily 32 bytes (256 bits, or 128). There is a lot in between 160 and 256.

That is a UX question, and while I understand the goal, you may not have included the cost in your calculation. The current cash-address algorithms won’t work for non-20-bytes bytearrays. Something new has to be designed for that.

bitcoincashautist · March 6, 2021, 9:34am

I’m trying to understand this issue of multiple OP_GROUP instances. Is this picture correct?

[Image: opgroup]

The blockchain would not even see those other instances until someone posted a redeem script, in which case there’s nothing special about OP_GROUP: it could be processed as if it was 2x OP_DROP or just ignored because it’d be cheaper, right? People can use the wide palette of Script to write whatever gibberish they want in the redeem script, so why go out of our way to give one opcode a special treatment there?

This code would recognize only 1 “prefix”; it wouldn’t allow multiple OP_GROUPs to appear in the pubkey/output script, as those would return TX_NONSTANDARD, right? So other OP_GROUPs could appear only in the signature/redeem script, and they can’t cause much trouble there; they even allow compressing a 2x pop operation.

If my understanding is correct, then Group Tokenization could be made to work by allowing this in the pubkey/output script: OP_PUSHDATA2 0x20 <arbitrary 32 bytes> OP_PUSHDATA2 0x08 <arbitrary 8 bytes> OP_DROP OP_DROP OP_HASH160 OP_DATA_X <redeem Script hash> OP_EQUAL and then the consensus code would set rules on that data and fail the tx if it doesn’t fit into group semantics. This would avoid a new opcode but it would be uglier.

PS isn’t VarInt sort of “double wrapping” the data? First you tell the push opcode how many bytes to expect, and then inside those bytes you again use a byte to tell the user how many bytes to expect. If we stored group data info directly into TX format, then varint would make sense. But since the push OP already provides a data size feature, why again introduce the size one more time into the data being pushed?

PPS

What do you mean by this, something new designed for what purpose? Tokens can be sent to any existing address, we wouldn’t need a new format. We will need a standard (user/wallet layer) to present tokenIDs in human-readable format, link them to a ticker. There is word on that in the doc, Appendix C, but it is outside the scope of this CHIP. Maybe it’s for another CHIP to recommend a standard way to track token metadata across wallets.

tom · March 6, 2021, 10:17am

To avoid the long discussions from becoming too long, I’ll open issues on the CHIP repo with the change requests.

Here is the first.

tom · March 6, 2021, 10:21am

Your question:

This question is very confusing because Andrew wrote this:

If nobody addresses a group-id, then why would Andrew write this? There is a confusion somewhere. Maybe you can spot the origins.

bitcoincashautist · March 9, 2021, 9:18am

Good idea, we can keep everything more tidy that way and anyone interested on the status of certain issues can visit the relevant threads. I’ll make a list here of all that were mentioned in the discussion above, thanks for creating those issues!

Rewriting technical part - done and closed. ~~Todo: write about advanced features (subgroups, hold BCH, covenant)~~ Done.

Include specification in the CHIP body - done and closed

Treatment of multiple OP_GROUP opcodes - discussion ongoing

Using VarInt for data values - discussion ongoing

Encoding of the quantity | flags variable - discussion ongoing

Limit the size of subgroup - discussion ongoing

tom · March 6, 2021, 4:46pm

Yeah, let me know when the issues are fixed.