CHIP 2021-05 Targeted Virtual Machine Limits

Tom raises some good points and good arguments. I do agree we are still living with the psyops baggage from BTC — where block size was the one limited resource and where txns auction for block space in a fee market.

I also fondly remember the time when coin days was a thing and when you could move Bitcoin for 0 sats.

All of this is very good discussion.
So when reading this, I at first thought maybe the argument being made was for not using a density-based limit at all, but just a static limit.

But then it seems the argument is more about how we calculate density — is it the density of the block on hand? Or is it the density of the block plus all the past blocks and transactions it “invokes”?

When framed like that… I am partial to the block in hand, since that’s easier to reason about, versus the more difficult property of worrying about a txn and all the past txns it invokes or a block and all the past blocks it invokes…

1 Like

No, the 100x case is per TX, with a flat limit per TX.

With a density-based limit, the two blocks would take the same time in the worst case no matter which kind of TXs they're packed with. With a flat limit, larger TXs would be relatively cheaper to validate (or smaller ones relatively more expensive, depending on your PoV).

That’s the flat vs density-based consideration.

You raised another question: given a density-based system, why not have the prevout script contribute to the TX budget?

Because that would allow maybe even 1000x differences when comparing one 10MB block vs some other 10MB block, even if both were packed with TXs made of the same opcodes — because the prevouts could load much more data into the validation context and have the same opcodes operate on bigger data sizes, etc.

Consider a simple example:

locking script: <data push> OP_DUP OP_DUP OP_CAT OP_CAT OP_HASH256 OP_DROP
unlocking script: OP_1

and now you get a 10MB block full of these little TXs of 65 bytes each. Can you estimate how long it will take you to validate it? You can’t - because you don’t know what prevouts it will load.

Each prevout's locking script could be hashing 10k bytes, so your 10MB block could end up having to hash ~1.5GB!
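A quick back-of-envelope in code, using only the figures from this post (65-byte TXs, ~10,000 bytes hashed per prevout) — a sketch, not node code:

#include <cstdint>
#include <cstdio>

int main() {
    const uint64_t blockSize = 10'000'000;     // 10MB block, as in the example
    const uint64_t txSize = 65;                // tiny TX with the OP_1 unlocking script
    const uint64_t hashedPerPrevout = 10'000;  // each prevout's locking script hashes ~10k bytes

    const uint64_t txCount = blockSize / txSize;              // ~153,846 TXs
    const uint64_t totalHashed = txCount * hashedPerPrevout;  // ~1.54 GB to hash

    std::printf("TXs: %llu, bytes hashed: %llu\n",
                (unsigned long long)txCount, (unsigned long long)totalHashed);
}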

Yeah, so with the "density of current block only" approach it's easier to turn a single knob up and down — the block size knob — and constrain worst-case time in an obvious way.

I mean, even with UTXO-based calculations for limits there is a theoretical maximum that any one input can cost you in terms of CPU — so block size is still a very, very inexact measure of the worst case.

But if you were some hyper-vigilant miner that wanted to constrain block delays — you'd just have to write software that also measures a txn's execution budget with numbers such as "UTXO size" thrown into the mix.

Not a huge deal — it really is six of one, half a dozen of the other… in a sense.

But the devil's in the details. If we are going for density-based limits, and we can get great results from just looking at the current txn's size or the current input's size, and we can turn a single knob — block size — to constrain block validation and propagation cost well, rather than two knobs — block size plus execution time — then maybe the single-knob design is preferred?

Idk.

I do get the offensiveness of “punishing” the UTXO script in a way… it does feel unfair.

But I'm OK with that, since I like the single knob to bind them — block size.

1 Like

The problem is that there was never a consensus limit on the size of TX outputs (at the moment of their creation), so anyone with access to some hashrate could have been creating whatever DoS "bombs", ready to be executed by a TX that would spend them.

Well, we could keep the density-based approach and create some allowance for “bare” contracts, by clamping their contribution to budget, like:

if (prevout_size <= 200) { budget += prevout_size; } else { budget += 200; }

This way you have no risk of such bombs, because you know that no single UTXO can blow up the budget for the active TX.
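A minimal sketch of how that clamp might slot into a per-input budget (hypothetical: it assumes the clamped prevout length is simply added on top of the (41 + unlocking_bytecode_length) * 800 formula quoted later in this thread):

#include <algorithm>
#include <cstdint>

// Hypothetical sketch only. The CHIP's per-input budget is
// (41 + unlocking_bytecode_length) * 800; the idea above adds the prevout's
// locking bytecode length, clamped to 200 bytes, so no single UTXO can blow
// up the budget of the transaction spending it.
uint64_t inputBudget(uint64_t unlockingBytecodeLen, uint64_t prevoutBytecodeLen) {
    const uint64_t kClamp = 200;          // "or whatever safe number we'd pick"
    const uint64_t kPerByteBudget = 800;
    const uint64_t prevoutContribution = std::min(prevoutBytecodeLen, kClamp);
    return (41 + unlockingBytecodeLen + prevoutContribution) * kPerByteBudget;
}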

I know @tom hopes to see more experimentation with “bare” scripts, and this approach would allow it in a cautious way. I’d be in favor of relaxing standardness rules to match the 200 (or whatever safe number we’d pick) so people can actually experiment with it.

1 Like

Yeah, our fundamental problem is we are poor and can't afford another metric. Byte size is the one metric we have.

We are sort of pushing it here to also have it serve as a proxy for worst case complexity.

Really the correct solution is a gas system — ha ha.

But since we are poor and size seems to be all we have — I think looking at the size of the block on hand is the best heuristic for complexity we can muster with our primitivity.

There's too much variance if one were to use UTXO script size to contribute to the execution budget… given that current block size is our one rate limiter…

Or we can create a real gas system …

Idk that’s how I see it.

1 Like

Hi all, just a reminder that this discussion is getting a bit off topic. The Limits CHIP has been carefully constrained to avoid touching economic incentives, and this discussion is veering that way.

Remember, adding the UTXO length to the Density Control length would have essentially no impact on any contract that is currently or will be standard following the Limits CHIP: it would give P2SH20 contracts an additional 18,400 in pushed bytes/operation cost budget (23*800=18,400) and P2SH32 contracts an additional 28,000 (35*800=28,000). In practice, there's little difference between this and ~1.5x-ing the limits.
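For reference, those figures fall straight out of the locking-script lengths (a small check of the arithmetic, assuming the standard P2SH20/P2SH32 templates):

#include <cstdio>

int main() {
    // P2SH20 locking bytecode: OP_HASH160 <push 20 bytes> OP_EQUAL = 1 + 1 + 20 + 1 = 23 bytes
    // P2SH32 locking bytecode: OP_HASH256 <push 32 bytes> OP_EQUAL = 1 + 1 + 32 + 1 = 35 bytes
    const int perByteBudget = 800;
    std::printf("P2SH20 extra budget: %d\n", 23 * perByteBudget);  // 18,400
    std::printf("P2SH32 extra budget: %d\n", 35 * perByteBudget);  // 28,000
}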

We’ve been extremely careful to re-target all limits precisely at their existing practical levels to avoid any impact to how BCH works today; adding UTXO length to Density Control length would be a surprising and potentially-dangerous change even if it came in a CHIP for 2026 – it certainly doesn’t belong in this one (beyond the already-existing rationale for not adding it in 2025).

To reiterate, adding UTXO length to Density Control length is a one-way change that can always be applied in a future upgrade. If anyone feels strongly that the VM should offer even higher limits based on UTXO length, please propose an independent CHIP.

If anyone wants to write more about the more general topic of relaxing output standardness, please do that here:

4 Likes

FWIW, I agree with that.

2 Likes

So I've been checking my facts a bit longer, and I think there is a lot of FUD flying around. A fear of execution times causing problems on the network is naturally a good reason to take action, and I definitely support this CHIP's approach to limiting that.

But the question is, limit it to what?

To start, basic operations are cheap. With the limit of a single stack entry being 10KB, you can check the time it takes an actual full node to do something like an XOR. I wrote some code and checked the execution time on my laptop, and it is fast: it takes 50 nanoseconds to copy 10KB into another 10KB buffer using XOR.
50ns means you can do that 20 million times in a single second. On a laptop.
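For anyone who wants to reproduce the ballpark, here is a minimal stand-alone sketch in the spirit of that measurement (not the actual node code; results will vary with machine and compiler flags):

#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    constexpr std::size_t kSize = 10'000;   // one 10KB stack-item-sized buffer
    constexpr int kIters = 100'000;
    std::vector<uint8_t> a(kSize, 0xAA), b(kSize, 0x55);

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < kIters; ++i) {
        for (std::size_t j = 0; j < kSize; ++j) b[j] ^= a[j];  // XOR one buffer into the other
    }
    const auto end = std::chrono::steady_clock::now();

    unsigned checksum = 0;   // keep the result live so the loop isn't optimized away
    for (uint8_t v : b) checksum += v;

    const double ns = std::chrono::duration<double, std::nano>(end - start).count();
    std::printf("%.1f ns per 10KB XOR (checksum %u)\n", ns / kIters, checksum);
}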

More realistically, BCHN has a great benchmarking app that runs a lot of scripts and gives the run-time of each.
A single script takes on average about 40 microseconds to run, meaning a single CPU core can process 25 thousand inputs per second.
Again, on a laptop. A Pi will be slower, a desktop or even a server will be much faster.

I'm hoping someone can get me "disaster big, horribly expensive" scripts to compare, but so far the scripts are not standing out as expensive. The overhead of the interpreter is going to be a large chunk of that 40 microseconds, and adding more opcodes that are "expensive" won't increase the actual time spent much. I've seen the slowest script take about 100 microseconds instead of the median of 40.

Practically all historical exploits have been about not-smart implementations. For instance, the quadratic hashing issue was about hashing the same transaction hundreds of times. Solution: cache the hashed value (Core did that many years ago). Another was a UTXO implementation issue where spending one output would load all outputs of the transaction into memory, causing memory exhaustion. That one is older than Bitcoin Cash, and was fixed just as long ago.

The CHIP is quite successful in measuring expensive stuff. I have no doubt it is the sane thing to do in order to protect the VM and the full node. Set a limit and avoid issues.

But I need to stress that the ACTUAL capabilities are vastly bigger than people may have in their heads. A 100x increase in processing-time still isn’t going to cause any issues for the network or the nodes.

So, keep it sane.

2 Likes

More numbers, something that is needed for this CHIP:

When we remove limits totally, the cost (wall-time) of doing 16 really big multiplications is 29 microseconds.

If I create a maximum-size 10KB script filled with the biggest possible multiplications and run that, the total run time (again, needing to turn off limits) is 95 milliseconds. (Notice the micro vs milli here.)

So the takeaway here is that without actual limits, things are not really all that expensive — much cheaper than expected. Or, in other words, what the hardware (and software) is actually safely capable of is massively more than we need.

Which brings me again to my question: what limits should actually be used for actual operation?

The first script (fewer than 20 opcodes, but big multiplications) uses 22,430,325 (22 million) op-code cost, which is equivalent to requiring a push of 29KB in the current state of the CHIP.
To remind us, this script took 29 microseconds to actually run. It would absolutely not cause any scaling issues if that were made possible on mainnet. I've not checked the limit required for the 10KB script; it would have to be quite enormous to allow that one to run.


Jason picked the current limits based on what is needed by script devs. In my opinion those limits are so laughably low that I have no problem with them — I mean, if that is the max that honest scripts use, then set those limits.

I do want to renew my objection to tying the scriptsig size to the limits, though. It now looks to be completely unneeded, and it links two layers that have no need to be linked.

3 Likes

As I understand the design choices, it’s less based on what is needed per se, and more on what is there already, which is a conservative choice that I support in the spirit of “do one thing at a time”. IMO easy to update or remove as appropriate going forward.

I understand that the bigint is to some degree “two things at a time”, but :man_shrugging:

1 Like

Yeah, the CHIP does indeed mention that the baseline is P2PKH hashing.

And while I fully appreciate the concept of doing no harm and doing one step at a time, this adds a link between the transaction size and the VM processing — which, as I explained above, is 'weird'. So it doesn't actually fully live up to the do-no-harm goal. Removing that scriptsig-size link in the future requires a hardfork, so let's not add it in the first place, is my thinking.

So when we look at the actual cost of p2pkh (the baseline) and the cost of math and other opcodes, being conservative is not needed at all.

For reference, the massively expensive 97ms script would require 40GB of inputs to be allowed to compute. THAT is just :exploding_head:. :laughing:

1 Like

How big is the Script? If you managed to generate 29kB worth of data with 20-ish bytes of Script and have the TX validate in 0.00003s (==30μs) then one could stuff a 10MB block with those TXs and have the block take… 5 minutes to verify?

OP_1 0x028813 OP_NUM2BIN OP_REVERSEBYTES generates a 5,000-byte stack item (all zero bytes except the highest, which is 0x01) while spending only 6 bytes, and you can then abuse the stack item (OP_DUP and keep hashing it, or using OP_MUL on it, or some mix, or w/e) to your heart's content (until you fill up the max script size).

Can you try benchmarking this:

StackT stack = {};
// Prefix: OP_1 <5000> OP_NUM2BIN OP_REVERSEBYTES OP_DUP leaves two copies
// of a 5,000-byte item on the stack.
CScript script = CScript()
                << OP_1
                << opcodetype(0x02)   // these three raw bytes form a minimal push of 5000 (0x1388, little-endian)
                << opcodetype(0x88)
                << opcodetype(0x13)
                << OP_NUM2BIN
                << OP_REVERSEBYTES
                << OP_DUP;
// Body: each iteration squares the big item and discards the result.
for (size_t i = 0; i < 3330; ++i) {
    script = script
                << OP_2DUP
                << OP_MUL
                << OP_DROP;
}
script = script << OP_EQUAL;

You can adjust the 3330 to w/e you want; 3330 yields a 9,998-byte script.

With density-based limits, the 10kB input generated this way would get rejected after executing the 20th OP_MUL or so, and the small variant (say i < 1) would get rejected on the 1st OP_MUL because it'd be too dense for even one, so filling the block with either variant could not exceed our target validation cost.

I'm afraid you completely missed the point of my post. Nobody is generating data, no block is validated on arrival, and nobody has been suggesting any removal of limits. Removal of limits is needed to understand the relationship between cost and score, but obviously that is in a test environment — not meant to be taken into production.

Everyone is in agreement we need density based limits. Read my posts again, honestly. You’re not making sense.

Hi all! Just a status update:

We’re up to 4 developers publicly testing the C++ implementation now, thanks @cculianu, @bitcoincashautist, and @tom for all of the review and implementation performance testing work you’re doing!

Reviews so far seem to range from, “these limits are conservative enough,” to, “these limits could easily be >100x higher,” – which is great news.

It also looks like one or two additional node implementations will have draft patches by October 1. I think we should coordinate a cross-implementation test upgrade of chipnet soon – how about October 15th at 12 UTC? (Note: a live testnet can only meaningfully verify ~1/6 of the behaviors and worst-case performance exercised by our test vectors and benchmarks, but sanity-checking activation across implementations is a good idea + testnets are fun.) I'll mine/maintain the test fork and a public block explorer until after Nov 15.

I just published a cleaned up and trimmed down set of ~36K test vectors and benchmarks, the previous set(s) had grown too large and were getting unwieldy (many GBs of tests) – this set is just over 500MB and compresses down to less than 15MB, so it can be committed or submodule-ed directly into node implementation repos without much bloat. (And diffs for future changes will also be much easier for humans to review and Git to compress.) The test set now includes more ancillary data too:

  • *.[non]standard_limits.json includes the expected maximum and final operation cost of each test,
  • *.[non]standard_results.json provides more detailed error explanations (or true if expected to succeed)
  • *.[non]standard_stats.csv provides lots of VM metrics for easier statistical analysis in e.g. spreadsheet software:
    • Test ID, Description, Transaction Length, UTXOs Length, UTXO Count, Tested Input Index, Density Control Length, Maximum Operation Cost, Operation Cost, Maximum SigChecks, SigChecks, Maximum Hash Digest Iterations, Hash Digest Iterations, Evaluated Instructions, Stack Pushed Bytes, Arithmetic Cost

Next I’ll be working on merging and resolving the open PRs/issues on the CHIP repos:

  • Committing test vectors and benchmarks directly to the CHIP repos (now that they’ve been trimmed down),
  • A risk assessment that reviews and summarizes each testing/benchmarking methodology and results, and
  • Some language clarifications requested by reviews so far.

After that, I’ll cut spec-finalized versions of the CHIPs and start collecting formal stakeholder statements on September 23. Next week:

  • I’ll host a written AMA about the CHIPs on Reddit, Telegram, and/or 𝕏 on Wednesday, September 25;
  • I’ll be joining @BitcoinCashPodcast at 20 UTC, Thursday, September 26; and
  • I’ll be joining General Protocols’ Space on 𝕏 at 16 UTC, Friday September 27.

Thanks everyone!

7 Likes

Bitcoin Verde has announced a flipstarter for their v3.0 release, which includes bringing the node implementation back into full consensus (the May 2024 upgrade), a >5x performance leap, and support for the 2025 CHIPs!

We seek to immediately complete our technical review of the CHIP-2024-07-BigInt and CHIP-2021-05-vm-limits CHIPs, and implement those CHIPs as a technical proof of concept (to include integrating the CHIPs’ test-vectors) in order to facilitate the timely and responsible assurance of node cross-compatibility of the BCH '25 upgrade. We consider the goals outlined in these CHIPs to be a positive incremental betterment of the BCH protocol, and look forward to supporting their inclusion in the next upgrade. [emphasis added]

The flipstarter is here:

4 Likes

As this seems to not be taken seriously, let me re-type this out and maybe we can fix this before the CHIP is frozen.

Edit: this is not a new issue. I notified this thread about it quite some time ago and had long voice conversations with Jason about it. I did listen to all his arguments and spent time verifying the numbers he assumed. In the end I stated very clearly that this is an issue that should be solved and suggested various solutions. Communication died at that point. So this leaves me no choice other than to air the dirty laundry here.

The system, as Jason designed it, has a misalignment of incentives.

A 10KB transaction input is allotted an execution budget of just below 8 million opCost.
Here is an example that actually stays within this limit (7,474,730 is used):

std::vector<uint8_t> bigItem;
for (int i = 0; i < 9296; ++i) {
    bigItem.push_back(10);
}
CScript script;
script << bigItem;
for (int i = 0; i < 200; ++i) {
    script << OP_DUP << OP_9 << OP_DIV;
}

Since a 10KB (total) script is the maximum, this is about the most opCost a script can gather.
The above executes in 1ms on my laptop. Use OP_MUL instead and it runs in 500 microseconds.
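For reference, roughly how that budget works out under the (41 + unlocking_bytecode_length) * 800 formula discussed later in this thread — my arithmetic, assuming the script above is the unlocking bytecode and that the 9,296-byte push uses a 3-byte OP_PUSHDATA2 prefix:

#include <cstdio>

int main() {
    // Script size: 3-byte push prefix + 9,296 pushed bytes + 200 * (OP_DUP OP_9 OP_DIV)
    const long scriptBytes = 3 + 9296 + 200 * 3;   // 9,899 bytes
    const long budget = (41 + scriptBytes) * 800;  // 7,952,000 -- "just below 8 million"
    std::printf("script: %ld bytes, budget: %ld, cost used: 7474730\n", scriptBytes, budget);
}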

Now, if I were to move this script to live in the transaction output, I could no longer run it.
To run it, I would be forced to add big pushes to the spending transaction to allow the script to run.

There are two misalignments of incentives here;

  1. Scripts have to be stored in the scriptSig (i.e. the spending transaction) and not in the transaction output (the locking transaction). The total number of bytes on the blockchain goes up as a result; the overhead is between 25 and 40 bytes.
  2. To get more execution time, bogus bytes may be added in order to increase the opCost budget.

In Bitcoin Cash the fees are intentionally kept low; as a result, the cost of pushing more bytes in order to get a bigger CPU allowance is similarly very low. It is inevitable that people will start pushing bogus information onto the blockchain just to get more CPU time.
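To put a number on that incentive, here is a rough sketch of how much extra budget a single do-nothing push would buy under the CHIP's per-byte figure (my arithmetic, not text from the CHIP):

#include <cstdio>

int main() {
    // A bogus 520-byte push costs a 3-byte OP_PUSHDATA2 prefix plus the 520
    // pushed bytes: 523 extra unlocking-bytecode bytes, a few hundred satoshis
    // at ~1 sat/byte, yet it buys 523 * 800 of additional opCost budget.
    const int paddingBytes = 3 + 520;
    const int perByteBudget = 800;
    std::printf("extra opCost budget: %d\n", paddingBytes * perByteBudget);  // 418,400
}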

Notice that wasting blockchain space is, long term, much more expensive than CPU time. This is the case because historical transactions do not get their scripts validated again.

Example two

Consider usage of OP_UTXOBYTECODE. This copies data from a UTXO other than the one we are spending. The result is an input script that has nearly no bytes, so the script will end up severely restricted in what it can do.
You literally can't do a single OP_DIV from the above example in this case (limit=37600, cost=46881).

Given that pushes are the only thing allowed in the spending transaction, this has the potential to forever lock funds on the chain — until the rules are relaxed.

Understanding ‘cost’.

The CHIP confusingly talks about opCost and limits as if they are the same thing. There is a term called "density control length", which may be the source of the confusion since that is the actual basis for the limits — but it is not named as such.

In simple English;

opCost is calculated based on actual work done. Each opcode has its own formula for its op-cost, so running the program adds to the tallied op-cost every step of the way.
This is a concept used by various different chains, nobody has a problem with this.

When the cost exceeds the limit, the script fails. Simple as that.

Understanding that the cost is a separate concept from the limit, we can agree that the cost calculation is great — don't change that. The limits, however, cause the problems described in this post.

So this leaves the question of where the limit comes from.

The CHIP explains that a transaction has a maximum size which is then spread over all its inputs, and thus the total amount of CPU time spent on a single transaction is bounded.
Using the example above, a 100KB transaction can have 10 inputs of 10KB each (because 10KB is the per-script maximum). This puts the maximum amount of CPU time to run the scripts of this transaction, in my example, at 10 ms (which is to say, very fast).

As such this seems like a sane solution at first. Workarounds to pack more inputs don’t work since the tx-size is still limited…
Yet the solution does have the bad side-effects described above. I mentioned this before, and unfortunately my repeatedly pointing it out has yet to make it into the CHIP. I think the CHIP rules state that problems should be mentioned clearly in the CHIP, whether the author agrees with them or not.

Future opcodes

Today we had some discussions about concepts like MAST. We contemplated that OP_EVAL may be better since it allows interaction with things like OP_UTXOBYTECODE — a cool idea, since it means your script can stay very small, because the actual locking script from a CashToken can be copied and reused from another input.

The downside is that the limits will be so low that this is practically not going to work. Unless we add a bogus push to work around the limits coming from the unlocking script.

Conclusion

The CHIP as it stands today has economic issues. Users are incentivized to pollute the blockchain in order to ‘buy’ more CPU cycles (in the form of opCost).

This should not be activated like it is today. If this can’t be fixed, we should move this CHIP to May 2026.

The op-cost of those two opcodes should probably reflect the fact that one is twice as slow as the other.

It is taken seriously, there is a whole section in the rationale about it. It’s just that others aren’t reaching the same conclusions as you. It is a defensible design choice, not a problem to be fixed.

And there's a good reason for that: if you could, then you could also prepare 24,389 UTXOs with such a script, and then spend them all together in a single 1 MB TX, which would take 24 seconds to validate.

If you were to propose a flat limit, it would have to be set based on that edge case (empty input script spending a max. pain locking script). That is how Jason got to 41 in the (41 + unlocking_bytecode_length) * 800 formula - an empty input’s overhead is exactly 41 bytes.
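A small sketch of that edge case, combining the 41-byte minimum input size with the ~1 ms / 10kB-script figure from the post above (rough arithmetic, ignoring TX header overhead):

#include <cstdio>

int main() {
    // Minimum serialized input: 32 (txid) + 4 (vout) + 1 (script length) + 4 (sequence) = 41 bytes,
    // hence the 41 in (41 + unlocking_bytecode_length) * 800.
    const int minInputBytes = 32 + 4 + 1 + 4;
    const int emptyInputBudget = (minInputBytes + 0) * 800;   // 32,800 for an empty input script

    // Flat-limit worst case: a 1MB TX stuffed with empty inputs, each spending
    // a prevout whose locking script takes ~1 ms to run.
    const int maxTxBytes = 1'000'000;
    const int inputs = maxTxBytes / minInputBytes;            // ~24,390 inputs
    const double seconds = inputs * 0.001;                    // ~24 s of validation

    std::printf("empty-input budget: %d, inputs: %d, ~%.0f s\n", emptyInputBudget, inputs, seconds);
}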

An alternative is to have a flat limit per input, and it would have to be 41*800 or somewhere close to that value, else we'd introduce a worse pathological case than the current ones. With such a limit, nobody would be allowed to execute the above 10kB example script all at once in any context (as a locking script, or as a P2SH redeem script), even when it would be diluted by enough input data that such a TX wouldn't be usable as a DoS vector anymore. So we'd be limiting harmless uses just so we can catch the pathological ones, too. The density-based approach allows us to limit pathological cases while still allowing more of the harmless uses.

The total amount of prunable bytes goes up — why would that be a problem long-term? Whether blockchain size goes up for this or that reason makes no difference: if the chain is used, then the size will go up. The ultimate rate limit for that is the block size limit, not incentives about which kinds of bytes fill the space. Below that hard boundary, miners are free to implement custom policies for whatever filler bytes (which are highly compressible), if those were seen as a problem.

Yes, they may. Why is that a problem?

Consider the alternative: say I really need to add two 10k numbers, but all I have is the flat 41*800 budget. What do I do then? I split the calculation into a bunch of the biggest additions that can fit the budget and carry the calculation over a bunch of TXs, and those TXs will add up to more total data on the chain than if I had just done it in one go in a single input with some filler bytes. The filler bytes save me (and the network) from the overhead of the additional script bytes required to verify that the execution was carried over correctly from the previous TX (which means having redundant data in intermediate verification steps).

Also, filler bytes would be highly compressible for storage. The carry-over redundancy of splitting it across multiple TXs would not be so compressible.

Ordinary people can’t push more than 1,650 bytes in a single input because we’re keeping the relay limit the same. Also, if filler bytes are found to be a problem, then miners are free to implement their own policies to further restrict inclusion of inputs that are using padding, or just have a custom fee policy for these filler bytes and price them differently.
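For scale, the most budget a standard (relayable) input can give itself under that formula, assuming the 1,650-byte scriptSig relay limit stays as stated:

#include <cstdio>

int main() {
    // Largest standard unlocking bytecode is 1,650 bytes, so the per-input
    // budget ceiling for relayed transactions is (41 + 1650) * 800.
    const int maxStandardScriptSig = 1650;
    std::printf("max standard per-input budget: %d\n", (41 + maxStandardScriptSig) * 800);  // 1,352,800
}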

It copies the bytecode but doesn't execute it; it just puts it on the stack of the input that executed the opcode as part of its own script. Normally this is used in a pair with OP_OUTPUTBYTECODE: you verify the two match, and then verify some other things like the BCH amount etc. Scripts that do this will usually be a few tens of bytes or more, which will be more than enough budget for the usual applications.

Yes, there is this side-effect; it is a trade-off of having a simpler system rather than implementing a more complex gas-like system. The CHIP does specifically mention padding in that context:

Finally, this proposal could also increase the operation cost limit proportionally to the per-byte mining fee paid or based on the declaration of “virtual” bytes, e.g. for fee rates of 2 satoshis-per-real-byte, the VM could allow a per-byte budget of 2000 (a 2.5 multiple, incentivizing contracts to pay the higher fee rate rather than simply padding the unlocking bytecode length). However, any allowed increase in per-real-byte operation cost also equivalently changes the variability of per-byte worst-case transaction validation time; such flexibility would need to be conservatively capped and/or interact with the block size limit to ensure predictability of transaction and block validation requirements. Additionally, future research based on real-world usage may support simpler alternatives, e.g. a one-time increase in the per-byte operation cost budget.

I did have more words about it in my proposed edit — maybe the CHIP should give this side-effect more words? I still disagree that it is a problem to be solved; the system is good enough as proposed.

Disagree. Your conclusion was the reason for my thumb-down (which I wanted to undo on 2nd thought but forum won’t let me).

2 Likes

Interesting to hear about your benchmarking Tom, where OP_DIV takes twice as long as OP_MUL.

I think @bitcoincashautist did a good job above re-stating the reason why input script length is used for the formula.

Strictly More Permissive

When considering your example 2 with OP_UTXOBYTECODE, it's worth emphasizing the following parts of the rationale:

this proposal ensures that all currently-standard transactions remain valid, while avoiding any significant increase in worst case validation performance vs. current limits

By design, this proposal reduces the overall worst-case computation costs of node operation, while significantly extending the universe of relatively inexpensive-to-validate contract constructions available to Bitcoin Cash contract developers.

So contract authors will only be able to use this opcode more permissively than is currently the case. Only abusive constructions intentionally trying to drain CPU cycles would get meaningfully limited.

Any construction that we can imagine currently where you split the utxobytecode, replace some variables of the contract and then hash it into a new locking bytecode script (this process has been called ‘simulated state’) will still be possible in the same way, but now freed from the 201 opcode limit & 520 bytesize limit.

Push Operations Included

When I was going over the 'OpCost' table I was slightly concerned that OP_CHECKSIG and OP_CHECKDATASIGVERIFY have a cost of > 26,000 and OP_CHECKMULTISIGVERIFY even a cost of > k * 26,000. However, all honest contract usage involves pushing arguments to the stack (and not just spawning random sigchecks): signatures are already 65 bytes and the public key is another 33 bytes, so this results in an ~80k budget for each sig+key.
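The arithmetic behind that ~80k figure, assuming one push-length byte in front of each element:

#include <cstdio>

int main() {
    // A Schnorr signature push (1 + 65 bytes) plus a public key push (1 + 33 bytes)
    // adds 100 bytes of unlocking bytecode, i.e. 100 * 800 of budget -- well above
    // the ~26,000 cost of the OP_CHECKSIG it feeds.
    const int sigPush = 1 + 65;
    const int keyPush = 1 + 33;
    std::printf("budget per sig+key: %d\n", (sigPush + keyPush) * 800);  // 80,000
}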

Implications for Contract Authors

The CHIP is designed to prevent abusive contracts, and from reviewing the proposal I don't expect ANY of the CashScript contract authors to run into these new accounting limits. Our aim should be to not increase the learning curve and barrier to entry of contract development: I do not want contract authors to have to worry about (or learn) opcode accounting costs, and I don't believe they will have to.

To me, the only redeeming quality of the 201-opcode limit & 520-byte size limit was that they were easy to understand. I expect the OpCost accounting to be invisible to normal contract developers, and the main limitation that contract authors would still have to be aware of after these VM-limits changes to be the limit on standard input bytecode length:

Maximum standard input bytecode length (A.K.A. MAX_TX_IN_SCRIPT_SIG_SIZE – 1,650 bytes),

1 Like

The first case seems like a theoretical situation that I don’t think has meaningful impact, but the second case looks more sensible to me.

I agree that there could be some scheme where you intentionally have very small inputs, that might get restricted by this VM-limit change.

If that is the case, they have the option of doing the now unnecessary push, but I think a better outcome can be had: Raising the base limit.

However, doing that would raise the worst-case validation cost, and it cannot be done as part of the VM-Limits CHIP, since that CHIP explicitly has a design goal of not raising the current worst-case validation cost, even if doing so would have other benefits.

I think the use case you are imagining is interesting and I’d love to see it well supported, but my preferred solution is to activate VM-Limits as they are, then raise the base limit slightly with a separate CHIP.

EDIT: Actually, I think such schemes would require non-standard transactions, and so would need to be miner-assisted. This further reduces the “risk” of these things being negatively impacted and puts it at the point where I think it’s better for someone to build a proof of concept, then propose an updated limit later on, using the proof of concept as motivation for why the limit should be changed.

2 Likes