CHIP 2021-05 Targeted Virtual Machine Limits

This proposal replaces several poorly-targeted virtual machine (VM) limits with alternatives which protect against the same malicious cases while significantly increasing the power of the Bitcoin Cash contract system:

  • the 520-byte stack element size limit is raised to 10,000 bytes.
  • The 201 operation limit is removed.
  • A new cumulative hashing limit is introduced, limiting contracts to 660 digest iterations per evaluation.
  • A new stack memory usage limit is introduced, limiting contracts to 130,000 bytes of data on the stack.

This proposal intentionally avoids modifying other properties of the current VM:

  • The limits on maximum standard input bytecode length (A.K.A. MAX_TX_IN_SCRIPT_SIG_SIZE ā€“ 1,650 bytes), maximum VM bytecode length (A.K.A. MAX_SCRIPT_SIZE ā€“ 10,000 bytes), maximum standard transaction byte-length (A.K.A. MAX_STANDARD_TX_SIZE ā€“ 100,000 bytes) and consensus-maximum transaction byte-length (A.K.A. MAX_TX_SIZE ā€“ 1,000,000 bytes) are not modified.
  • The cost and incentives around blockchain ā€œdata storageā€ are not measurably affected.
  • The worst-case transaction validation processing and memory requirements of the VM are not measurably affected.

Comments, feedback, and reviews are appreciated, either here or on GitHub issues. Thanks!

7 Likes

Previous discussion on this topic beginning Feb 8, 2021:

Submitted this CHIP to the list at bch.info:

Great work! Having worked on the Group CHIP and studying the costs of proposed genesis setup that requires hashing operations I found myself wondering why arenā€™t we counting hash operations and ended up reading this CHIP a few times to give me an idea of the costs involved. :slight_smile:

This proposal is well thought out and allows much more utility without significantly increasing node operating costs, and it would be great if it could make it into 2023 upgrade window.

Btw, I also discovered this analysis of digest and signature verification costs, just for reference: Benchmarking Hash and Signature Algorithms | by Michael Zochowski | Logos Network | Medium

1 Like

The VM limits have come up again in recent discussion now that the VM is much more powerful after the CashTokens upgrade. Both myself, Pat from mainnet-js & @bitcoincashautist have ran into those limits.

For now there has not really been an effort to document these case, but going forward it would be good if we did. Copying from the Jedex topic:

It would also be relevant that we discuss the complexity of splitting contract code into multiple utxos compared to just raising the limit, for example with regards to Jedex. Is 8 different contracts to optimal design or just the necessary design?

4 Likes

Thanks for bringing this back up!

For the most part, itā€™s actually the optimal design too: the size of each transaction at each particular interaction is minimized if only the necessary code makes it into that transaction.

So for example the Redeem Order transaction only includes the Payout Covenant, a ~110 byte contract that essentially hard codes all the information needed to accept order receipts and payout what users are owed after a settlement period. Importantly, it can exclude all the hundreds of bytes of code in the DEX Covenant, Market Maker Covenant, Thread Covenant, etc. as each of those need only be included in the specific types of transactions requiring them.

For the v1 demo of Jedex, increased VM limits would likely only offer some minor efficiency improvements in the Market Maker Covenant, depending on whatā€™s ultimately implemented there (e.g. if limit orders were also implemented). As it stands, that covenant would probably have to be broken into a few coupled-UTXOs in which each UTXO performs part of the computation and stores its result in a commitment which is then read by the next UTXO, etc. Depending on the complexity ultimately required there, that might save a few hundred bytes per settlement period. (But note, the market maker covenant is already excluded from most end-user interactions, it only gets pulled in to perform the ā€œjoint executionā€ every e.g. 2 hours.)


I think a particularly important use case for increased VM limits is to enable introspection of slightly larger parent transactions. This becomes really useful for contract constructions with escrows/bounties for unusual misbehavior like Treeless ZCEs.

The only real downside of Treeless ZCE right now (but not a showstopper) is that transactions are limited to 3 inputs. With the 10,000 byte stack limit in this proposal, treeless ZCE transactions could easily fit 20+ inputs. (Well-designed wallets should consolidate with things like CashFusion anyways, so just getting that limit to ~10 or so should handle even the most extreme edge cases.)

Likewise, there are tons of other ZCE-like constructions that rely on pre-commitment to not attempting double spends. A lot of those constructions are made possible and/or much more efficient with a little more headroom on stack element size and operation count.


Overall, better-targeted VM limits arenā€™t personally a top priority of mine, but I still think getting this right is long-term important. Iā€™d love to see it happen in 2024 or 2025.

My biggest uncertainty with the current proposal is the stack memory usage limit ā€“ itā€™s relatively expensive and inelegant to measure/enforce, so Iā€™d especially love to see other interested developers research and review that part.

4 Likes

If we had stream hash function opcode(s), then you could split the TX across multiple inputs, but then parsing it would become complicated.

Iā€™d like to put something on the brainstorming table. Itā€™s so cumbersome to slice up the pushed TX to extract the fields, you have to implement a TX parser from Script. Why not have an opcode to ā€œloadā€ a TX from stack, and then use introspection opcodes on it, something like:

<push raw tx> OP_1 OP_TXFROMSTACK OP_INPUTBYTECODE

The OP_FROMSTACK would be like a prefix to introspection code-points. If used, then an introspection opcode MUST follow (except UTXO codes which should fail eval). In the example, the VM would parse the stack item as a TX and extract bytecode of input with index 1.

1 Like

Another benefit of focusing on this issue before other VM-related upgrade proposals: weā€™ll get much better real-world usage information to guide to future improvements like bounded loops or any changes related to higher-precision math emulation.

Loops are essentially a transaction size optimization: anything that can be done with loops can also be done by automatically unrolling the loop in the transaction compiler (already done today by some BCH VM compilers).

Better targeted VM limits would allow the development ecosystem to experiment more with various constructions and help to validate specific proposed APIs for ā€œnativeā€ bounded loops. (Whereas right now, developers can already achieve the same behavior as using loops, but only by breaking computations up across multiple inputs such that each remains within todayā€™s poorly-targeted limits.)

Similarly, with better limits, weā€™ll also be able to see if/how developers actually use higher-precision math emulation in the wild, giving us much better data with which to judge the actual impact of any proposed optimizations.

2 Likes

Probably best to break any further discussion on this out into a new thread, but Iā€™ll just add a few thoughts. I wrote a little about this sort of ā€œSerialized Transaction Introspectionā€ here too.

  • After CashTokens, I think the use case for parent TX introspection is very, very limited: essentially just ā€œpunishment transactionsā€ in anti-double-spending schemes that regulate the spending of P2PKH contracts (e.g. Treeless ZCE). Other kinds of contracts can allow participants to commit to a more-efficient-to-enforce message structure, so the same sort of punishment transactions donā€™t require inspection of parent transactions. (The downside is that these schemes require setup, whereas treeless ZCE requires only some confirmed P2PKH outputs without any further setup.)
  • Iā€™m likewise very uncertain about streaming hash function opcodes ā€“ I donā€™t know of any use cases beyond working around absurdly-small stack item size limits, and theyā€™re a pretty inefficient solution at that (more assigned VM opcodes, longer contract bytecode when used, more cycles during execution, messier to optimize with caching/memoization).
  • Currently, the VM is agnostic to the actual structure of encoded transactions; an operation parsing serialized transactions would necessarily mix a lot of complexity back down into underlying VM implementations. Depending on design, it could also prevent future improvements to transaction format, as we canā€™t rule out that some P2SH contracts might be broken by any observable change to the ā€œVM APIā€.
  • I donā€™t think this would actually reduce the number of bytes used, even for these very unusual parent transaction inspection: <part> <part> OP_DUP [analyze] OP_CAT has ~equivalent overhead as <raw_tx> OP_DUP <some_item_number> <some_item_type> OP_TXFROMSTACK [analyze]. I think developers would get the same value from a user-space library that simply makes working with the transaction preimage a little more comprehensible. (Though again, I really doubt this will be common outside the treeless ZCE contract itself.)
1 Like

Yeah I think this philosophy makes the most sense. Main motivation for these new opcodes is to squeeze the functionality into the current per-input limits so people can keep their sanity when designing contracts :slight_smile: Also, with current limits, splitting a contract to multiple inputs results in lots of overhead since they need to require/verify each otherā€™s execution - which spends a lot of redeem script size & opcode count budget.

But if limits were increased, then functionality could be achieved by unrolling the operations, so the only motivation for specialized opcodes would be to reduce TX sizes, which is an OK goal but would be premature optimization given the current context.

With this in mind, I believe this limits CHIP should be top priority for the next VM upgrade.

4 Likes

Interestingly enough, that overhead is much smaller than you might expect. E.g. for Jedexā€™s process tick transaction, the ā€œmarket makerā€ can be split across any number of inputs ā€“ each input has a mutable token with a 1-byte commitment indicating itā€™s index in the sequence of contracts. Each must verify:

  • That itā€™s at the expected input index (e.g. OP_INPUTINDEX <4> OP_EQUALVERIFY) and has the expected commitment at that index (e.g. <0x01> OP_EQUALVERIFY)
  • That itā€™s covenant is reproduced (e.g. <4> OP_UTXOBYTECODE <4> OP_OUTPUTBYTECODE OP_EQUALVERIFY)
  • That the UTXO token category is equal to the input token category (e.g. <4> OP_UTXOBYTECODE <4> OP_OUTPUTBYTECODE OP_EQUALVERIFY)
  • And that itā€™s portion of the computation is fulfilled, (e.g. <3> OP_INPUTBYTECODE <8> OP_SPLIT <8> OP_SPLIT [...] to split off the result of the last computation and do something with it.)

So thereā€™s significant overhead in terms of transaction size due to other components (~60 bytes per output, ~40 bytes per input), but from the perspective of the contract code, itā€™s only ~12 operations/~20 bytes of overhead in each ā€œcode splitā€ (of 201 operations and 1650 bytes, respectively).

Of course, Iā€™d prefer bring that overhead to 0 (in most cases) by targeting VM limits to be less wasteful. Just wanted to offer more background here and note that this is partially a tooling problem; our tools need to get better at combining multiple contracts into atomic transactions for other reasons, regardless of VM limits, as there are lots of cases which require this coupled-contracts primitive.

3 Likes

Just want to highlight what I think is next here:

Iā€™d love to see node implementation developers experiment with how the VM can most simply and efficiently limit stack memory usage.

The current proposal is essentially: sum_all_stack_items + stack_depth <= 130,000. That limit is rounded up from a max of 125,320 + 242 in what we believe is the most memory-expensive script currently possible.

The issue here is that we need to check memory usage after every operation in which memory usage might have increased.

Instead of the simple, incrementing counter that is currently used (++maxOperationsPerEvaluation < 201), we have to accumulate the sizes of all stack memory items, then add the length of the stack.

The calculation is not terribly expensive, and it can be optimized in many ways ā€“ e.g. many operations can skip the check, stack memory usage at >6 depth can be cached across operations, etc., but the current formulation still strikes me as less elegant than past improvements to the VM.

Itā€™s possible that this is the best we can do, but Iā€™d like to see some other people review the issue. Some questions:

  • Stack depth: Do we actually need to limit stack depth? MAX_SCRIPT_SIZE is 10,000 bytes (and for standardness, MAX_TX_IN_SCRIPT_SIG_SIZE is only 1,650 bytes), so with OP_3DUP, weā€™re looking at a maximum depth of ~30k for transactions first heard in blocks (and only ~5k for transactions first heard over the network). Can we save cycles and complexity by just skipping that? (Are we ok with the protocol requiring stack + OP_ROLL implementations to be reasonably optimized?) And if not, do we need to multiply depth by some number >1 to represent the realistic overhead of a stack item?
  • Memory usage: Is there a more elegant way to limit stack memory usage? Rather than checking total usage after every operation in which it may have increased, is there a more efficient, less direct strategy?

If other developers with an interest in low-level optimization want to take a look at this, Iā€™d love to hear your thoughts!

3 Likes

CC: maybe @cculianu, @tom, @fpelliccioni, @groot-verde, @Andrew-128, @markblundeberg? If you have thoughts on how to efficiently measure stack memory usage, comments are much appreciated!

3 Likes

There are only a limited number of places that the stack and alt-stack are changed. For those Iā€™d simply add a couple of lines that subtract or add the item to be put on the stack.
The stack-depth itself is already being tracked, so you can ignore that.
The actual data on the stack can simply be counted with an extra integer which gets changed on every stack-change with the size of the thing being popped or pushed.

I donā€™t believe that to be an issue. A little abstraction will make that readable (and maintainable) and the actual CPU load is not the issue. IO is very much the problem for scaling, not doing this extra house-keeping.

I want to tie this effort into another long standing one.

Measuring how much a script takes to executes should in my opinion have an effect on the cost/reward balance of a transaction. There is only a very crude ā€˜rewardā€™ currenty which is the fee, but to benefit an open market that measurement should improve.

So, what Iā€™m thinking is that a transaction spending a UTXO which is very expensive to spend should have its ā€˜costā€™ increased.

Things like Coin-Days-Destroyed and indeed fees paid could be used to stand on the other side of the ledger and supply the miner reward for this specific transaction.

This means that the black/white proposal for VM limits becomes a little more gray-scale where the limits are still hard limits, but getting closer to them will increase the cost of usage and hopefully avoid overloaded full nodes.

2 Likes

I definitely am a very enthusiastic supporter of this idea and any practical initiative to move such a thing forward. Ideal designs would involve ā€œcheap-to-measureā€ approaches when evaluating scripts to decide what the required fee / fee discount would be.


As for this proposal itself. I have no clear idea of what the impacts would be on perf. ā€“ the only way to know is to construct the most pathological txn imaginable with these limits in place ā€“ and perhaps a whole 32MB block of such txns, and measure the execution/validation speed of that versus status quo.

Anybody want to volunteer to construct such a pool of 32MB of txns?

If you hand me a big stash of pathological txns that abuse these limits to the extreme, I can modify a test version of BCHN to use these limits and we can benchmark how bad/not bad it all would be.

4 Likes

I wanted to add some thoughts about OP_RETURN limits and was about to post under the old thread, but I think itā€™s better to post here.

Redeem script reveal before first spend, or applications like https://memo.cash. Btw, the other day I had a nice chat about scaling with Memoā€™s founder Jason Chavannes, and I learned heā€™d like a bigger OP_RETURN, too, and said heā€™d support a CHIP for it so I got reminded of this CHIP.

That has the inconvenience of having to create a dust UTXO ahead of every data push. Also, if you use the data by contracts then you have the added complication of having to prevent others from stealing the input, e.g. if it a plain commit-reveal with no authentication.

Hereā€™s something new to consider about the where: bulk of the data would ideally be pushed before the 1st non-opreturn output. Why? Because this allows keeping more compact proofs of TXā€™s outputs: hash the serialization until start of outputs and store the sha256 & length of hashed data instead of the raw TX part. Then later, when you want to prove that outputs are part of the TX, you can do the sha256 length-extension with the outputs & locktime, and apply one more sha256 to get the TXID.

How about we allow a larger limit only if OP_RETURN or an uninterrupted sequence of them is placed before the 1st non-OP_RETURN output, while those placed in any place after the 1st non-OP_RETURN output would have tighter limits?

Ok, the above was quoting the old thread, now some fresh thoughs about stuff from this one.

You could do a sha256 length-extension from within ScriptVM, and have smaller proofs of some TXs outputs :slight_smile: Donā€™t know whether itā€™d be useful, just pointing it out for info.

  1. Contract-based refund transactions, too. Contract could inspect the parentā€™s input and pay refund back to the originating pubkey.
  2. Non-interactive tokenization of AnyHedge contracts (as opposed to interactive: needing to make tokenizer contract be part of funding TX)
  3. Combined with something like OP_BLOCKHASHMATURED, weā€™d get an easy way to ā€œintrospectā€ any historical TX - but the spender would have to provide it.

Btw, did anyone wonder why code-point 0x50 was skipped (left undefined), and it belongs into data push group? What if we made the opcode special - 0x50 to be: OP_PUSHRAWTX <raw TX>. It would be valid only if the raw TX would be of valid TX format. Then, user could slice it up to extract what he needs, knowing that the format is OK and he can skip doing some checks on the blob.

Letā€™s get moving with this CHIP! Many builders (@MathieuG & @mainnet_pat, and I imagine a few more) have already expressed their hopes about this CHIP moving forward. Letā€™s define actions that need to be taken if it is going to be ready for 2025 activation.

Would the best course of action be to just implement the limits and then do some benchmarks - and if the timings are all below the current worst case then we could safely implement it?

IMO we should aim higher, make these test cases be part of future-proofing tests with 256 MB.

2 Likes

The thing about hashing is ā€¦ really the amount of data you hash is more costly than the number of hashing operations. Consider hashing 80 bytes 660 times versus hashing 10kb 660 timesā€¦

I have a feeling that the stack item 10kb thing combined with hash abuse can be a problemā€¦ maybe?

EDIT: What size objects do you ā€œtypicallyā€ anticipate hashing? 32-byte things? 64-byte things? Perhaps an additional constraint would be: you get up to 660 hash ops, or maybe ~20kb of data you can hash, cumulatively, per script evaluation. Whichever limit is hit first. Thatā€™s one way to ensure the amount of data hashed doesnā€™t get large?

2 Likes

Thereā€™s some quantization due to padding to blocks, I think hashing 1 byte costs the same as hashing 32 bytes, but hashing 33-64 bytes costs 2x the hashing of 1-32 bytes, and so on. So if we limit total data then someone could be more abusive by doing a big number of smaller messages.

Correction on hash block size (source):

With most algorithms (including MD4, MD5, RIPEMD-160, SHA-0, SHA-1, and SHA-256), the string is padded until its length is congruent to 56 bytes (mod 64). Or, to put it another way, itā€™s padded until the length is 8 bytes less than a full (64-byte) block (the 8 bytes being size of the encoded length field). There are two hashes implemented in hash_extender that donā€™t use these values: SHA-512 uses a 128-byte blocksize and reserves 16 bytes for the length field, and WHIRLPOOL uses a 64-byte blocksize and reserves 32 bytes for the length field.

Sounds good, yeah.

1 Like

:+1:

Note: I chose numbers out my buttocks. Ideally one benches this on say BCHN and gets an idea of what the tradeoffs areā€¦ I just raised this issue as a potential thing to keep in mindā€¦

EDIT:

Thereā€™s some quantization due to padding to blocks, I think hashing 1 byte costs the same as hashing 32 bytes, but hashing 33-64 bytes costs 2x the hashing of 1-32 bytes, and so on. So if we limit total data then someone could be more abusive by doing a big number of smaller messages.

Yes. Good point. Perhaps hashing 1 byte should be ā€œcostedā€ as if it were 32-bytes. That is, each chunk of bytes you hash is rounded up to the nearest 32-bytes in terms of costingā€¦ or something.

Something like thatā€¦

2 Likes