CHIP 2021-01 PMv3: Version 3 Transaction Format

Responded to @andrewstone here: Native Group Tokenization - #3 by bitjson

1 Like

Another summary of this proposal derived from telegram discussions:

Bitcoin cash covenants already support introspection/validation of outputs, enforcing rules for future transactions.

PMv3 allows us to do validation looking backwards, inspecting elements of a covenant’s parent transaction and enforcing rules for parent transactions.

Support for both forward and backward validation unlocks a huge number of contract use cases: a lot of complex applications happening on “global state” chains like Ethereum can be implemented more scalably on BCH, with no other changes to the VM. (Details: Hashed Witnesses: Build Decentralized Applications on Bitcoin Cash)

Personally I prefer small incremental steps. Much easier to roll out and debug. As such it is important to notice that indeed they strenghthen each other, but they don’t depend on each other. They can be in two individual upgrades if that is the way it turns out.

Personally I prefer small incremental steps.

Strongly agree. Minimal changes, one at a time. :+1: :+1:

They can be in two individual upgrades

Sorry, I wasn’t clear – there’s only one upgrade we still need. The forward validation part is already working on mainnet. (E.g. CashChannels)

This PMv3 proposal focuses only on the minimum upgrade to support backward validation: “compression” of past proofs + script integer compatibility. With these two elements, we can succinctly introspect parent inputs in the same way we can currently introspect future outputs.

The other “features” of PMv3 (reduced TX size and an upgrade path to fractional satoshis) are just helpful byproducts of these two minimum changes.

So I strongly agree. There’s a chance people will want to pack in other changes too, so I tried to describe here why I think we should keep PMv3 minimal: CashTokens & PMv3: fixed-size inductive proofs, transaction integer compatibility - #2 by bitjson

After some more discussion on Telegram, I think it might be helpful for me to copy/paste some of the “most interesting” parts of the CashToken Demo here for start-to-end viewing (and mobile readers).

If you’re on a modern desktop browser, you can also jump right into the CashTokens template on Bitauth IDE:

CashTokens template screenshot

CashTokens Template Summary

This template provides scripts for each stage of a simplified "depository"
CashToken Corporation, where users may deposit 10,000 satoshis in exchange 
for a CashToken. This implementation supports exactly 4 homogeneous tokens. 
Tokens are tracked in a 2 level Merkle tree:
     rt
    /  \
   b0   b1
  / \   / \
 a0 a1 a2 a3

The covenant begins with no depositors (all leaves start as the hash of 0) and a
dust balance. A user deposits 10,000 satoshis, receiving a CashToken (recorded
at a0) secured by a single key. Then the CashToken is "notarized" to prove it is
a descendant of the mint transaction. Once notarized, the owner transfers the
CashToken to a 2-of-3 multisig for stronger security. From there, the owner
transfers the token to a different owner (now single sig). Finally, the new owner
redeems the CashToken with the corporation, withdrawing their 10,000 satoshi
deposit, and replacing a0 in the tree with the hash of 0 (making the leaf available
to other depositors).

There are 5 transaction types demonstrated:
- tx0: Incorporation TX - the "corporation" is established with a dust balance
- tx1: Token Mint TX - a CashToken is created
- tx2: Notarization TX - the CashToken is proven to be valid
- tx3: Token Transfer TX - the CashToken is transferred independently
- tx4: Token Redeem TX - the CashToken is redeemed with the "corporation"

Transaction Graph/Timeline:

tx0 --- tx1 -------------------------------------- tx4
           \--- tx2 --- tx3 --- tx3 --- ...tx3 ---/

Transaction Contents:

tx0: Incorporation TX - 
  Input 0 - any funding output (dust amount + tx fee)
  ---
  Output 0 - Corporation Covenant (dust amount)
  Output 1 - change output

tx1: Token Mint TX -
  Input 0 - tx0 output 0
  Input 1 - any funding output (10,000 satoshi deposit + tx fee)
  ---
  Output 0 - Corporation Covenant
  Output 1 - CashToken Covenant (Token ID is tx0 hash)
  Output 2 - change output

[Token circulates separately from the corporation covenant]

  tx2: Notarization TX
    Input 0 - tx1 output 1
    Input 1 - any funding output (tx fee)
    ---
    Output 0 - CashToken covenant
    Output 1 - change output

  tx3: Token Transfer TX
    Input 0 - tx2 output 0
    Input 1 - any funding output (tx fee)
    ---
    Output 0 - CashToken covenant
    Output 1 - a change output

[Token is redeemed with the corporation covenant]

tx4: Token Redeem TX
  Input 0 - tx1 output 0
  Input 1 - tx3 output 0
  Input 2 - any funding output (for fees)
  ---
  Output 0 - Corporation covenant
  Output 1 - a change output (+10,000 satoshi withdrawal)

---

Additional Notes

Indestructible Dust – Once created, this CashToken corporation
implementation cannot be destroyed. The covenant could be modified
to allow the final dust balance to be collected from the covenant if all
shares are unallocated, but the additional bytecode required over the
life of the covenant is likely to be more costly than the dust amount "saved"
when the covenant is abandoned. It's worth noting that any bitcoin user
can create an unlimited number of dust outputs at any time, so dust from
covenants is irrelevant from the perspective of the network.

Token Mint (Unlocks the Corporation Covenant)

/**
 * To mint a token we must prove:
 * - TX output 0 is the next covenant state where:
 *   - the new tree replaces an empty leaf (<0>) with the covenant outpoint TX hash
 *     - requires: new tree root, all opposite nodes, proof path
 *   - the new covenant UTXO has a value of 10,000 satoshis greater than before
 * - TX output 1 is the expected token pattern
 *   - requires: m-of-n and public keys to use for the token
 * 
 *     rt
 *    /  \
 *   b0   b1
 *  / \   / \
 * a0 a1 a2 a3
 * 
 * This example verifies that a0 is `0`, then replaces
 * it with the outpoint TX hash.
 */

<root_hash_after_mint> // rt
<sibling_tier_2_hash> // b1
<sibling_leaf_hash> // a1

<1> // tier 2 is left side (requires swap)
<1> // leaf is left side (requires swap)

<1> // enter branch: Mint Token

Corporation Covenant

<empty_root_hash> OP_TOALTSTACK 

OP_IF
/**
 * branch: Mint Token
 */

/**
 * we need 2 copies of the "validation path": 
 * 1. confirm the leaf was empty
 * 2. confirm the new root only changes that leaf
 */
OP_2DUP
OP_FROMALTSTACK OP_ROT OP_ROT // current_root_hash
OP_TOALTSTACK OP_TOALTSTACK

// First we confirm the leaf being updated was empty:
<4> OP_PICK // sibling_tier_2_hash
<4> OP_PICK // sibling_leaf_hash
<0> OP_HASH160 // constant: hash160 of <0>

OP_FROMALTSTACK // check if leaf requires swap
OP_IF OP_SWAP OP_ENDIF
OP_CAT OP_HASH160

OP_FROMALTSTACK // check if tier 2 requires swap
OP_IF OP_SWAP OP_ENDIF
OP_CAT OP_HASH160 // should be current root

OP_EQUALVERIFY // verified that replaced leaf was <0>

// Now we confirm that the new leaf uses outpoint TX hash

OP_TOALTSTACK OP_TOALTSTACK

OP_OUTPOINTTXHASH
OP_HASH160

OP_FROMALTSTACK // check if leaf requires swap
OP_IF OP_SWAP OP_ENDIF
OP_CAT OP_HASH160

OP_FROMALTSTACK // check if tier 2 requires swap
OP_IF OP_SWAP OP_ENDIF
OP_CAT OP_HASH160 // verified new root

OP_DUP OP_TOALTSTACK // save for later use
OP_EQUALVERIFY // new root is correct


// require that output 0 is 10_000 satoshis larger than last value
OP_UTXOVALUE
<10_000> OP_ADD
<0> OP_OUTPUTVALUE
OP_EQUALVERIFY // covenant has been paid properly

// require that the correct covenant script is used
<OP_PUSHBYTES_20>
OP_FROMALTSTACK // root_hash_after_mint
OP_UTXOBYTECODE // get this covenant's bytecode
/**
 * OP_SPLIT is often unsafe for user input, but this input
 * comes from the VM/previous contract.
 */
<21> OP_SPLIT OP_NIP // trim off previous root hash push
OP_CAT OP_CAT
OP_HASH160
<OP_EQUAL> <OP_HASH160> <OP_PUSHBYTES_20> OP_2SWAP
OP_CAT OP_CAT OP_CAT
// require that output 0 is the new covenant bytecode
<0> OP_OUTPUTBYTECODE
OP_EQUALVERIFY // TX pays to the updated covenant

// require that output 1 is a cashtoken UTXO
<1> OP_OUTPUTBYTECODE
<0xaabbccddee> // TODO: push expected cashtoken bytecode
/**
 * We get the Token ID from the last covenant TX ID. This: 
 * 1) prevents tokens from being created to impersonate existing tokens
 * 2) saves bytecode space (vs. pushing an ID)
 * 3) avoid bugs in wallet implementations (validating at contract-level)
 */
// TODO: use <0> OP_OUTPOINTTXHASH to get the Token ID, concat into CashToken covenant
OP_EQUAL
OP_ELSE
/**
 * branch: Redeem Token
 * TODO: leaving out of first draft, see notes in unlocking script
 */
OP_ENDIF

Notarization (Unlocks Token Covenant)

/**
 * This is the initial "Notarization" spend. It can only 
 * be successful if the parent (mint) transaction includes 
 * the outpoint TX hash referenced by the CashToken
 * covenant (in its 0th output). 
 * 
 * To prove that this notarization step has been
 * completed successfully, future spends need only 
 * prove that the covenant has previously been
 * successfully spent.
 * 
 * This implementation does not offer a way to transfer
 * the token to a new set of holders during the notarization
 * step. Rather, the notarization transaction must lock funds
 * in a covenant with the same signing requirements as those
 * specified in the mint transaction. This is a feature:
 * notarization can be safely performed by anyone, making
 * CashTokens slightly easier to implement in some types of
 * wallets (and reducing the size of the covenant by saving
 * on validation bytecode).
 */

<example_tx1> // TODO: switch to `<parent_mint_tx>`, fix scenario generation error (using `example_tx1` for `bytecode.parent_mint_tx`)

<0> <0>

Token Transfer (Also Unlocks Token Covenant)

/**
 * This script is use to transfer a notarized CashToken between
 * wallets. This implementation supports multisig wallets of
 * up to 3 keys.
 * 
 * Counterintuitively, the token transfer branch of this contract
 * does not place any limitations on the outputs to which a token
 * is spent. Instead, only its parent and grandparent transactions
 * are validated to ensure the tokens lineage. However, because
 * this covenant is impossible to spend without the proper
 * "proof-of-lineage", if any transfer does not properly continue
 * the covenant, the token is burned.
 * 
 * This means it is always possible to "burn" the token in one final
 * transaction – in this authentication template, the token should
 * be burned/redeemed back into the parent corporation covenant
 * (which itself then validates that the token's lineage was
 * unbroken). With this strategy, the parent covenant can validate
 * the authenticity of tokens.
 */

[...]

Token Covenant

/**
 * This script is the key to the inductive proof. It's 
 * designed to be unspendable unless:
 * 1. Its parent was the mint transaction funded by the transaction
 *    hash claimed (requires full parent transaction), or
 * 2. It was previously spent successfully – the parent's 0th input's
 *    UTXO used this covenant, i.e. the parent's 0th-input-parent's
 *    0th output uses this covenant. (Valiated by checking the full
 *    parent and grandparent transactions.)
 * 
 * With these limitations, if a user can move the CashToken,
 * we know the CashToken has the lineage it claims.
 */

/**
 * By "baking-in" the code branch selection, we prevent the
 * CashToken from being unecessarily notarized multiple times.
 */
<is_notarized>

// mint parent hash: the outpoint TX hash in the 0th output of the mint transaction
<tx0_hash> 

OP_TOALTSTACK // tx0_hash
OP_TOALTSTACK // is_notarized

<1> // threshold (m)

// push n public keys (up to 3)
// TODO: convert to variable
<first_holder_key.public_key>

<1> // count (n)

/**
 * Though this is only checked by the transfer branch,
 * it drops all ownership-related items from the stack
 * more efficiently than can be done with manual dropping
 * operations.
 * 
 * Note, notarizations could be restricted to token owners by
 * replacing this with OP_CHECKMULTISIGVERIFY.
 */
OP_CHECKMULTISIG

// in both branches, verify the full parent transaction is provided
OP_DUP
OP_HASH256
OP_OUTPOINTTXHASH

OP_DROP OP_DROP // debug: switch comment to skip check
// OP_EQUALVERIFY // verfied provided transaction is parent

OP_FROMALTSTACK // is_notarized
OP_NOTIF
    /**
     * Notarization branch:
     * Prove that this transaction's parent is the claimed token mint transaction,
     * i.e. its 0th input spends from the claimed token ID (outpoint TX hash).
     * 
     * Note, this branch can be executed by any interested observer (doesn't 
     * require access to any private keys), so it must be carefully validated
     * to avoid griefing.
     */

    OP_DROP // drop failed multisig check

    <4> OP_SPLIT OP_NIP // remove and discard tx version
    <1> OP_SPLIT OP_SWAP // get first byte of tx input count
    /**
     * Between 0 and 127, Script Numbers and VarInts are compatible.
     * 
     * 127 (`0xfc`) is the largest integer which can be represented
     * by the Script Number format in a single byte (0x80 begins 
     * the range negative 0 through 127).
     */
    <2> OP_EQUALVERIFY // require exactly 2 inputs (covenant + fee funding)

    <32> OP_SPLIT OP_DROP // get 0th outpoint tx hash, drop everything else
    OP_FROMALTSTACK // tx0_hash
    OP_EQUALVERIFY // Token ID verified: parent transaction spends from claimed outpoint tx hash

    /**
     * Verify the transaction's 0th output is re-assigned
     * the updated covenant.
     */
    OP_UTXOVALUE
    <0> OP_OUTPUTVALUE
    OP_EQUALVERIFY // require output value to be the same
    <OP_HASH160 OP_PUSHBYTES_20>
    <OP_1> OP_UTXOBYTECODE
    <1> OP_SPLIT OP_NIP OP_CAT // remove is_notarized of OP_0, replace with OP_1
    OP_HASH160 <OP_EQUAL>
    OP_CAT OP_CAT // expected P2SH bytecode
    <0> OP_OUTPUTBYTECODE
    OP_EQUAL // 0th output bytecode is correct
    /**
     * TODO: further optimization: after notarization, can we prune the
     * notarization branch?
     * Requires other changes to validation in both convenants.
     */

OP_ELSE
    /**
     * Transfer branch:
     * Prove that the outpoint tx spends from this same covenant. Then
     * prove that this transaction is signed by the required private key(s).
     * 
     * Note: to save space, this branch doesn't validate the locking bytecode
     * to which the transaction pays, making it possible for wallets to burn
     * the CashToken (intentionally or due to a bug). This is unlikely to
     * be a problem in practice because CashTokens can only be moved by
     * wallets which support the CashToken covenant template (no
     * backwards-compatibility with wallet which might unintentionally
     * burn tokens).
     * 
     * By not forward-validating outputs, we elliminate the need for each
     * CashToken covenant to be capable of correctly "identifying" its 
     * parent covenant during redeem transactions. (Instead, we verify the
     * lineage of the token by checking its parent and grandparent.)
     */

    OP_VERIFY // Signature validation must have been successful

    // Owner has authorized transfer, now prove lineage:

    // get parent tx's outpoint tx hash, then verify we have the grandparent
    <4> OP_SPLIT OP_NIP // remove and discard tx version (no validation)
    <1> OP_SPLIT OP_SWAP // get first byte of tx input count
    <0x02> OP_EQUALVERIFY // require exactly 2 inputs
    <32> OP_SPLIT // get parent outpoint tx hash
    <4> OP_SPLIT OP_DROP // get parent outpoint index, drop everything else
    OP_BIN2NUM
    <0> OP_EQUALVERIFY // must be grandparent's 0th output (grandparent may be 1th for mint)
    OP_TOALTSTACK // parent outpoint tx hash

    // validate and concat grandparent back together, confirming it used this covenant
    <
      0x02000000 // always require version 2
      0x02 // always require exactly 2 inputs
    >

    OP_SWAP // top is grandparent input 0 outpoint (hash + index)
    OP_SIZE <36> OP_EQUALVERIFY // require grandparent input 0 outpoint to be 36 bytes
    OP_CAT

    OP_SWAP // top is grandparent input 0 bytecode
    OP_SIZE 
    OP_DUP
    <4> OP_ROLL OP_EQUALVERIFY // provided bytecode is expected size
    OP_NUM2VARINT // serialize length
    OP_SWAP
    OP_CAT
    OP_CAT // grandparent TX up to input 1

    
    OP_SWAP // top is grandparent input 1 outpoint (hash + index)
    OP_SIZE <36> OP_EQUALVERIFY // require grandparent input 1 outpoint to be 36 bytes
    OP_CAT

    OP_SWAP // top is grandparent input 1 bytecode
    OP_SIZE
    OP_DUP
    <4> OP_ROLL OP_EQUALVERIFY // provided bytecode is expected size
    OP_NUM2VARINT // serialize length
    OP_SWAP
    OP_CAT
    OP_CAT // grandparent TX through inputs
    
    // start building grandparent outputs

    OP_SWAP // top is grandparent is_notarized
    <0> OP_EQUAL
    OP_IF
      /**
       * Grandparent is not notarized (mint transaction), the 1th output is the covenant.
       */
      <3> OP_CAT // require exatly 3 outputs in mint transaction
      OP_SWAP // top is 0th output satoshi value
      OP_SIZE <8> OP_EQUALVERIFY // satoshi value must be 8 bytes
      OP_CAT
      <$(<23> OP_NUM2VARINT)> // length of corporate covenant bytecode (P2SH)
      OP_CAT
      OP_SWAP
      OP_SIZE <23> OP_EQUALVERIFY // corporate covenant bytecode must be P2SH length
      OP_CAT
      // end of 0th output
    OP_ELSE
      /**
       * Grandparent is notarized, the 0th output is the covenant.
       */
      <0x02> OP_CAT // require exactly 2 outputs in transfer transactions
    OP_ENDIF

    OP_SWAP // top is grandparent's cashtoken covenant value
    OP_DUP
    /**
     * To support heterogenous tokens, this covenant prevents the
     * satoshi value of the token from changing during transfers.
     * 
     * However, we can't simply verify the "next" satoshi value is the
     * same as the "current" value - eventually, the token will be
     * redeemed with the corporation covenant, and the covenant's 
     * expected balance at that time can't necesssarily be predicted.
     * 
     * Instead, we only verify backwards by comparing the token output
     * values of the grandparent and current transaction. With this
     * limitation, we can know the token's value can't be modified
     * until the redeem transaction, where the parent covenant can
     * read the value before it is destroyed.
     */
    OP_UTXOVALUE <8> OP_NUM2BIN
    OP_EQUALVERIFY // disallow token value from being modified
    OP_CAT

    < <23> OP_HASH160 > OP_CAT // length of cashtoken covenant bytecode (P2SH)

    // begin transforming parent bytecode into grandparent bytecode for validation

    <OP_0> // is_notarized for mint transactions
    OP_UTXOBYTECODE // get parent bytecode
    <1> OP_SPLIT OP_NIP OP_CAT // replace is_notarized with OP_0
    
    <36> OP_SPLIT // preserve token ID (OP_0 + OP_PUSHBYTES_32 + 32 bytes + OP_TOALTSTACK + OP_TOALTSTACK)
    
    <1> OP_SPLIT OP_NIP // remove parent threshold pushing opcode (<<m>>)
    <3> OP_ROLL // add grandparent_threshold pushing opcode
    OP_SWAP // top is parent bytecode after removed m push

    <4> OP_ROLL // get parent public key count opcode

    // TODO: PRs welcome – are there more efficient ways to implement this "case" statement?
    // check for parent n of 1, 2, or 3 (saving the opcode)
    OP_TUCK 
    <<1>> OP_EQUAL OP_IF
      <34> OP_SPLIT
      OP_ELSE
      OP_OVER <<2>> OP_EQUAL OP_IF
        <$(<34> <34> OP_ADD)> OP_SPLIT
      OP_ELSE
        OP_OVER <<3>> OP_EQUAL OP_IF
          <$(<34> <34> <34> OP_ADD OP_ADD)> OP_SPLIT
        OP_ELSE
          OP_RETURN // fail, parent must match a planned n
        OP_ENDIF
      OP_ENDIF
    OP_ENDIF
    /**
     * TODO: SECURITY: do we need to validate these bytes? (E.g. in the
     * corporation covenant, they must be validated.) Can a malicious
     * grandparent transaction use different opcodes in these bytes to
     * defraud future token holders?
     */
    OP_NIP // drop the removed parent public key pushes
    OP_NIP // drop the parent public key count opcode
    <1> OP_SPLIT OP_NIP // drop parent n opcode from bytecode

    <4> OP_ROLL // pick grandparent public key count opcode
    // TODO: PRs welcome – there are definitely more efficient ways to implement this one
    OP_DUP
    <<1>> OP_EQUAL OP_IF
      <OP_PUSHBYTES_33> <6> OP_ROLL OP_SIZE <33> OP_EQUALVERIFY
      OP_CAT
      OP_ELSE
      OP_DUP <<2>> OP_EQUAL OP_IF
        <OP_PUSHBYTES_33> <6> OP_ROLL OP_SIZE <33> OP_EQUALVERIFY
        OP_CAT
        <OP_PUSHBYTES_33> <7> OP_ROLL OP_SIZE <33> OP_EQUALVERIFY 
        OP_CAT OP_CAT
      OP_ELSE
        OP_DUP <<3>> OP_EQUAL OP_IF
          <OP_PUSHBYTES_33> <6> OP_ROLL OP_SIZE <33> OP_EQUALVERIFY
          OP_CAT
          <OP_PUSHBYTES_33> <7> OP_ROLL OP_SIZE <33> OP_EQUALVERIFY
          OP_CAT OP_CAT
          <OP_PUSHBYTES_33> <7> OP_ROLL OP_SIZE <33> OP_EQUALVERIFY
          OP_CAT OP_CAT
        OP_ELSE
          OP_RETURN // fail, grandparent must match a planned n
        OP_ENDIF
      OP_ENDIF
    OP_ENDIF
    OP_SWAP OP_CAT // concat grandparent n opcode after public key pushes

    OP_SWAP
    OP_CAT OP_CAT OP_CAT // reconstructed grandparent redeem bytecode
    OP_HASH160 // get redeem script hash
    OP_CAT
    <OP_EQUAL> OP_CAT

    OP_SWAP // top is remaining TX serialization

    // no need to verify remaining grandparent outputs

    OP_CAT // full grandparent transaction
    OP_HASH256 // grandparent transaction hash
    OP_FROMALTSTACK // outpoint tx hash from parent

    OP_DROP OP_DROP <1> // debug: switch comment to skip check
    // OP_EQUAL // verify grandparent is parent outpoint tx

    // (don't bother dropping the mint parent hash left on altstack)

OP_ENDIF

Again, these are just the most important parts.

As you can see, I’ve made comments documenting many design decisions and noting various security and usability concerns. You can read a more thorough intro on the CashToken Demo Gist.

If you’re on a supported browser, you can also jump right into the CashTokens template on Bitauth IDE to experiment with the full evaluation traces.


PMv3’s “hashed witnesses” allow scripts like the above Token Covenant to “compress” their parent transaction’s witness data (by hashing it), avoiding the uncontrolled growth in transaction sizes which currently prevents covenants from introspecting parent transactions. The second change proposed by PMv3 allows scripts to parse the integer format used in these parent transactions by making them compatible with the existing Script Number format. (Which also happens to save ~5% in transaction sizes.)

1 Like

Hashed Witnesses vs. MalFix, SIGHASH_NOINPUT, SegWit, etc.

The hashed witnesses half of this proposal could also be solved by any transaction malleability solution which removes witness data from the parent transaction’s outpoint transaction hash.

If there was any demand for a full transaction malleability solution, we might want to consider implementing that instead of hashed witnesses.

I personally don’t see the need for an additional “malleability fix”, so I prefer the hashed witnesses solution:

  • hashed witnesses avoid creating two different TXID formats (e.g. TXID and WTXID) – this keeps the system as simple as possible, ensuring the competing ID formats never confuse user experiences, and ensuring consistency between e.g. the TXID in SPV proofs and the TXID referenced in contracts.
  • hashed witnesses avoid the need for an additional TXID index – nodes don’t need to maintain an index to match the two TXID formats (with and without witness data) even at the P2P layer.
  • hashed witnesses don’t affect block merkle root construction or any system components other than v3 transactions (depending on the solution – malleability fixes can be more disruptive)

Unless someone can point out advantages in implementing a malleability solution instead, I prefer that this problem be solved with the hashed witnesses solution. I started a thread to collect discussion if anyone is interested:

Hashed Witnesses vs. TXID Merkle Trees

Just to document another possible alternative to hashed witnesses: a new transaction version could also use a new merkle tree construction for its TXID.

Instead of hashing the full, serialized transaction, each major component of the transaction could be hashed, allowing child transactions to only introspect e.g. a particular output during a proof. (So in a sense, this would be hashed witnesses + hashed everything else, in a tree.)

While this may save a few bytes for contracts which introspect a parent transaction, it would require significantly more hashing per transaction. And even for a relatively small type of tree, e.g. hash256([version][input merkle tree][output merkle tree][locktime]), the small additional savings would probably not be worth it. (Proofs would still require 32 bytes * node count; a minimal 330 byte proof could maybe be reduced to ~128 bytes, but at the cost of requiring possibly 10x as much hashing when computing TXIDs.)

I think hashed witnesses are a much better choice, only costing one additional hash for any inputs which use them (and users pay for the extra bytes, discouraging unnecessary use).

For users just seeing this thread: I’m hoping to get other commonly asked questions documented in an FAQ soon. I’m happy to answer any questions here or in the new CashToken Devs Telegram group.

Group is essentially having a new kind of cash bill with a 546 sats BCH anti-fraud strip attached to each bill (UTXO) which can later be reclaimed if the token becomes useless, even without melt authority - by consolidating all your tokens into one UTXO and reclaiming the anti-fraud strips from all other inputs. Since I realized this simple truth, I’m having trouble sleeping, I feel the need to write, to make noise, to let everyone know. Writing this had me up all night. Maybe you will cringe at me “selling” it like that, but I just want everyone to reach the “aha!” moment with this. We all need to want this. Maybe you will feel annoyed that I’m making noise. Well, that’s a small price to pay for what we could be getting. The missed opportunity cost of not having it due to ABC has already been huge. We could have owned USDT market.

It’s really a small protocol change. The consensus rules are simple and conceptually easy. Using those rules in a meaningful way from a wallet dev. perspective could be seen as large but they don’t have to support everything. Basic token transactions are as simple as any BCH transaction, it only needs to have the OP_GROUP quantity balance in addition to the BCH balance. Token mint/melt/authorize is as simple as including an extra UTXO in the inputs, which affects the balancing rule. Authority is just another balance to check. Token genesis is as simple as creating a single UTXO.

Building blocks are simple, that’s what Group is. What you build with them may or may not need to be complicated, but the blocks themselves aren’t complicated, they couldn’t be any simpler.

They aren’t, and that’s the great news! Since every Group UTXO is also a BCH UTXO you can do with those whatever you want from any smart contract platform we may roll out. I just disagree on the priorities when it comes to deployment. Ok, I’m a nobody, but please hear me out.

Group should come first, as soon as possible, and as a minimum viable (but future proof) product: mint/melt/baton/maybe subgroup because people also need time to work on wallet software once it’s out. We have time, but we have to work together. Have group first, and think about how you could use them from whatever smart contract platform you want ie work with the assumption that you will have Group tokens to use. That way we can get the synergy!

Everybody already has smart contracts that look like cash, nobody has actual hard cash that smart contracts can use, yet.

Cash is not just one currency.

Having them available before any smart contract platform will mean that anyone wanting tokens will prefer it for simple token usage, for which there’s lots of demand as we have seen. And later when you roll out smart contracts, you should be operate on those tokens exactly the same like you operate on BCH. Only exotic tokens should require being implemented by the contract. For a simple token, miner doesn’t need you to tell them that 1+1=2, they ought to know that already, and check for that in C++, the same place where they check the BCH “contract”.

Everyone is thinking in smart contracts, so they can’t see the simple truth: smart contracts need smartly designed electronic cash system. That’s what Group is.

@bitcoincashautist would you mind moving this comment to the Group Tokens topic? It would be valuable to keep that discussion in one place.

I don’t need further convincing: token support is valuable. :+1: I think that adding built-in support to the protocol could eventually make certain constructions slightly more efficient.

My concern remains that committing to a large protocol expansion like Group Tokens is a one-way change, and getting it correct in “one shot” is much less likely than if we allow the ecosystem to first experiment with tokens in “userland” before we lock ourselves to one particular token standard. I wrote more detail in that topic – if you’re interested, I would appreciate answers there.

For this topic:

Have you gotten to review CashTokens & PMv3 yet? Please don’t let the script stuff scare you, there’s no complex math: the “parent introspection” is just checking that two hashes are equal, and the “inductive proof” is just checking that two scripts are equal.

If contracts can efficiently inspect the contents of their parent transactions, wallet developers can design all kinds of new contracts without waiting for “permission” or BCH protocol changes – including token implementations like CashTokens.

2 Likes

Not only that, it would also require all parties that now can just hash a bytearray to introspect and understand the transaction.

Thanks for getting back on this and explaining a bit more about this on the dev-chat yesterday.

As we talked about here I’ll write my thoughts on the topic of the var ints.

Here the arguments for changing the transaction formats parts which encode numbers:

  • The current transaction format has 3 different ways to encode integers. You name 4 as you include the one inside of the script as well.
    The idea to replace all with one is attractive from a purely architectural point of view.
  • Bitcoin Script has severe problems with the var-width encoding when doing covenants.
  • Variable-width encoding gains in tx size are a couple of percent.
  • Variable width means that the ‘amount’ field can grow beyond the 64-bit limit it has today.
    Caveat is that the 21 million limit we have today only takes 6½ bytes. We could multiply it with 8196 without needing more bytes.

So, really the main reason you suggested this is because Bitcoin Script doesn’t understand one of the 3 used integer formats. The var-int. And your suggestion is for the rest of the transaction to use the integer encoding that the scritping VM already is familiar with.

So, I have a counter suggestion. Two actually.

Use a more common encoding scheme.

My thinking here is that interpreting transactions, at least at the basic level, is going to be something that a LOT of software libraries will do. On the other hand the amount of places that implement a VM will be much lower. Probably only the full nodes and some specialized software.

Based on this, I vastly prefer to solve your basic problem of Bitcoin Script not understanding the format by introducing two opcodes. One for each of the conversion directions and picking a format that is actually much more standardized. See Variable-width encoding - Wikipedia. Think UTF-8 as an example of this.

Now, if you are going to break 100% of all transaction parsing code-bases, I think it makes a lot of sense to include a little upgrade that makes future transaction format changes much easier. This is based on reseach I did some years ago in the project called “Flextrans”.

The basic idea is that each field becomes a “name / format / value” type. This allows inserting new data in the transaction at a later version without forcing everyone to AGAIN rewrite the entire transaction parsing. Just like adding a new html tag to the standard doesn’t require you to update the old parsers.

This thinking got me to suggestion two:

Use the current size-formats.

So, we can go fancy and require everyone to rewrite the format, while also requiring two new opcodes to do the conversion between tx-numbers and script-vm-numbers.
But, hang on. If you need those opcodes anyway, then why would we change the numbering format?

I liked (and still do) the FlexTrans approach. But the gains are small and the change intrusive.

I’d vastly prefer to not have a different numbering format and simply add two opcodes for the conversion.

2 Likes

In my deep dive into the protocol the disparity between the screams for restricting transaction sizes or growth of data and then how so many single digit numbers are expressed across multiple bytes has always confused me a bit. :slight_smile:

Your example of UTF-8’s encoding model is a good one. The Smalltalk community had something called State Replication Protocol (SRP) which was a way of streaming objects and entire networks of objects across different dialects of SmallTalk and even completely disparate hardware environments.

Similar to UTF-8, the high bit of a byte (octet as they called it - the single primitive in SRP) determines whether or not the value you’re dealing with continues on to the next octet or finishes there. Ultimately you get 7 bits of data for every octet. Any value under 128 is represented in a single byte. There’s also no limit to how large a number you can represent and no endian considerations to confuse.

Were I to make a new VM for a cryptoledger to execute on, I’d make SRP the native representation of all data. It naturally compresses the size of most data by eliminating wasteful fixed sizes above 8 bits. A type like this in BCH could be really helpful in eliminating wasteful padded bytes.

Thank you @tom for the detailed review! And thanks @proteusguy for digging into the spec too! I want to continue with some more research on the encoding options before I make changes to the spec and/or respond to specific points. (I’m going to be mostly offline from March 5-15, but I’m hoping to focus on this again at the end of March. Also posted in CashTokens Devs.)

For anyone interested in a high-level summary of this topic, I recently spoke with @GeorgeDonnelly about hashed witnesses, PMv3, CashTokens, prediction markets, synthetic assets, decentralized exchanges, BCH scaling considerations, comparisons with ETH, and more:

2 Likes

I think PMv3 is a great proposal. It’s minimal, powerful and backwards compatible. :clap:

It did take me several days to wrap my head around it, and I thought it might to useful to explain where I got confused along the way …

To start with it’s the first time I’ve seen the term witness used in context of Bitcoin Cash. We already have the terms input script, scriptSig, redeemScript and unlocking script, so I wasn’t sure why the new term. Then to make matters worse I read “hashed witness” to mean “the hash of the witness” as opposed to “the full witness which is referenced elsewhere by its hash”. I spent ages trying to understand why you’d optionally append a hash at the end of the transaction :man_facepalming:. This led me further astray as I progressed through the doc: “If the parent transaction chose to provide a hash”… my interpretation: oh that must be the optional hash at the end of the transaction.

The term Hashed Unlocking Script would still have confused me here. Perhaps Detached Input Script, Detached Unlocking Bytecode or just Detached Script / Detached Bytecode would have been clearer. Then you could replace “Optional Hashing of Witness Data” with “Optional Detachment of Input Scripts”.

In the rationale section I found it useful to think of this change as enabling provenance (restrictions on ancestors) as opposed to covenants (restriction on descendants). A simple example of provenance would be a proof that “my parent has the same redeem script as me”. This can be translated to “my grandparent pays the same P2SH address as my parent”. This can be proved as follows:

  1. include both the parent and the grandparent transaction in the input script (in front of the redeem script)
  2. in the redeem script, inspect them and verify they have the matching output scripts
  3. in the redeem script, verify the embedded parent by comparing its hash to an outpoint hash in the current transaction
  4. in the redeem script, verify the embedded grandparent by comparing its hash to an outpoint in the parent
  5. To avoid infinite growth, embed truncated transactions which are sufficient to inspect their output scripts (in step 2) and calculate their hashes (in step 3 and 4) .

Side note: I use the term redeem script because locking script is ambiguous. It could refer to the redeem script (as I think it does in Bitauth IDE) or it could refer to the scriptPubkey. I might be a bit confused here and would be happy to be corrected. Also while on terminology I would like to suggest that there is no such thing as parent introspection, only parent inspection.

In the scheme above, the wallet which constructs the transaction is responsible for obtaining the raw parent and grandparent transactions so that they can be embedded. Nothing in the virtual machine allows inspecting the parent or grandparent transactions. But that’s not what I thought at first. Why? The citation of CashTokens as an example in the Medium article is an example of “Proof by Induction” but is not referenced in that section, it’s referenced under “Fixed Size Inductive Proofs” so I was looking to find some of the answers about PMv3 there. What I found was new opcodes, so I thought that these were parent inspection opcodes, but no mention of opcodes in the PMv3 spec :thinking:. Eventually I realised they were introspection opcodes related to TxInt and were not part of PMv3, and that fixed size inductive proofs don’t need parent inspection opcodes (although they would certainly be one solution).

Basically I was completely confused due to lack of knowledge on the subject matter. Once I cleared up all of the above, the penny dropped and I had the eureka moment. All down to my lack background, and I’m only brain-dumping my confusion here in the hopes it will help others at my level get there faster. Because wow, it’s awesome! Well done on thinking this up.

2 Likes

I can’t see how Hashed Witnesses affect signing serialization, so I’m wondering if a transaction could be malleated by moving a witnesses from its usual location in the input to hashed witnesses section?

1 Like

Thanks for the detailed walkthrough @rnbrady, this is really helpful!

I’ll try to revise the spec to make those areas clearer. And that step-by-step summary probably needs to be included too.

Also agree the terminology needs work – “detached” could be a good option, I’ll explore that when I working on incorporating everything else. (And thanks for the catch on “parent introspection” → “parent inspection”, I can’t unsee it now. :man_facepalming:)

Ah, you’re right! As specified, there’s a third-party malleability vector. Not sure if I have a solid answer yet. It’s just one bit of data we need to commit to for each signature (either true or false), so it’s a good candidate for a sighash/type flag. That will definitely need to be in the spec, thanks.

1 Like

Just an update for those following this topic: I’m continuing to work on a large decentralized BCH application as a more full-featured example. It’s a “two-way-bridge” covenant for a prediction market with support for millions of shareholders, an on-chain continuous IPO, and a quarterly BCH withdrawal process (which uses a Hivemind-like voting/commitment system among shareholders).

If you’re curious, you can get a sense for the plan in the video above and in the BCH Prediction Markets working group on Telegram.

I think a large, real-world example will be valuable for evaluating PMv3 and several of the other VM-related proposals. There are several dependencies I’ll need to iron out, so I’ll also release that tooling over the next few months as I’m working on the larger goal.

3 Likes

You know, I think another discussion made this finally “click” with me just now!

While pondering this, I realized that CashTokens is exactly where you’d arrive at if you want to solve those problems I describe! Let me see if I got it right:

  1. You implement all CashToken interfaces in the actual contract, hash it, and it becomes the genesis. Nobody knows it’s a CashToken genesis because they only see the P2SH hash in the UTXO at this point and it could be any contract. Only those with whom the author shared the contract via a side-channel could know.
  2. You then make another TX spending it, where the full contract is written to the blockchain in the input script and revealed to the world.
  3. Through a covenant, you enforce the outputs to carry that same token contract, and it can be hash-compressed in the following TXes.
  4. The new P2SH hash hashes the previous hash+token contract, that’s why it’s always different but that only proves the input script == output script, right?
  5. The HashedWitness is then needed to prove that input script == previous TX output script
  6. By induction, any CashToken output can then prove its lineage back to genesis
  7. Anyone can verify that he’s got a token from just the receiving TX and the contract. I’m unsure where he obtains the contract, from the genesis TX? In all TXes that follow it’s compressed into a hash yeah?

Script stuff still scares me but the nice thing is: I don’t need to understand it to understand how CashTokens work!

One detail, where do you store the token state? It must live outside the part that’s enforced to be the same in this covenant chain. So I guess it’s somewhere in the input script and that part doesn’t need to satisfy the in==out but must satisfy CashToken semantics which are verified with the fixed part, something like that yeah?

Hey everyone, just wanted to share an update on PMv3:

I’ve spent a lot of time testing these ideas over the past few months, and I stumbled upon a modification to the PMv3 draft that both 1) fixes a malleability vector and 2) opens a path for signature aggregation.

Background

In the first draft of PMv3, I had originally discounted 3rd-party malleability as a significant problem. In the Bitcoin Cash world, we have both covenants and reasonably secure zero-confirmation transactions: malleability is at most an inconvenience. (We’re not trying to migrate most user activity onto chains of signature-less transactions.) However, when @rnbrady identified the arbitrary detaching/re-attaching above, I realized malleability needs to be directly addressed for detached proofs (formerly “Hashed Witness”), since 3rd parties could actually disrupt important activity (beyond just fiddling with the TXID before confirmation).

In working on solutions, I spent a lot of time thinking about optimizing contracts, covenants, and transactions in general. I realized there are several closely related problems that an ideal solution should cover:

  • Malleability makes contracts less efficient and harder to validate – most non-trivial contracts must carefully validate all unlocking bytecode data to prevent vulnerabilities introduced by malleation, and this validation both bloats the contract and makes it harder to review for security. (For example, most covenants which use OP_SPLIT are vulnerable to a sort of “padding” attack which is not intuitive to first-time contract authors.)

  • The primary blocker to deduplication in transactions is unlocking bytecode malleability – because unlocking bytecode typically contains signatures (and signatures can’t sign themselves), unlocking bytecode is excluded from transaction signing serialization (“sighash”) algorithms. This is also the reason why unlocking bytecode must contain only push operations – the result of any non-push-including unlocking bytecode is a “viable malleation” for that unlocking bytecode. But if unlocking bytecode is signed, transaction introspection operations offer an opportunity to further reduce transaction sizes via deduplication. In a sense, if non-push operations could be used in unlocking bytecode, transactions would effectively have safe, efficient, zero-cost decompression via introspection opcodes.

  • Signature aggregation could save >20% of network bandwidth/storage – we know signature aggregation could save a lot of space and possibly improve transaction validation performance, but there’s no good place to put signatures which are shared between inputs. While we don’t want to bloat PMv3 with signature aggregation, a good v3 transaction format should not ignore this problem.

There are several possible solutions for each of these, but there is one particular solution I think is very elegant – it’s now specified in the CHIP:

TL;DR:

  • “Hashed Witnesses” have been renamed to Detached Proofs.

  • Detached Signatures are now separated from Detached Proofs, and they both comprehensively solve third-party malleability and enable signature aggregation (some immediately, some via a future sighash upgrade like the one proposed by Chris Pacia).

  • The PMv3 specification is now a proper CHIP (CasH Improvement Proposal): CHIP-2021-01-PMv3, and includes a lot more supporting detail outside of just the technical specification.

I’ve incorporated the answers to many of the questions from this thread, so thank you again to everyone who has contributed here.

I’d appreciate any feedback, reviews, or questions you have, either in this thread or in the GitHub issues. Thanks!

2 Likes