CHIP 2021-05 Targeted Virtual Machine Limits

That link doesn’t make sense.

Why grant more CPU ‘rights’ to a transaction that is bigger?

If the answer isn’t about fees paid (as you implied) then that makes it even more weird.

Why not give a static limit to every single input script (regardless of size, color or race) that it has to stay within? Make sure that limit protects all nodes from DOS by picking something low enough.

Worried you picked something too low? Increase that limit every halving… (that idea stolen from BCA :wink: )

That’s the current system, and it forces people to work around the limits. Like, if I have a 1kB TX that loads some oracle data and does something with it, I could hit this static limit and then maybe I’d work around it by making 2x 1kB TXs in order to carry out the full operation.

With static limits, this 1kB TX (or 2x 1kB TXs) will be orders of magnitude cheaper than some 1kB CashFusion TX packed with sigops. Why can’t my 1kB oracle TX have the same CPU budget as P2PKH CashFusion 1kB TX? Why should I have to create more bandwidth load of CPU-cheap TXs when it could be packed more densely into 1 TX?

That’s how we get to a density-based limit. I thought the CHIP needed a rationale for it, so there’s this PR open: https://github.com/bitjson/bch-vm-limits/pull/19

Density-based Operational Cost Limit

The objective of this upgrade is to allow smart contract transactions to do more, without any negative impact on network scalability.
With the proposed approach of limiting operational cost density, we can guarantee that the processing cost of a block packed with smart contract transactions can’t exceed the cost of a block packed full of typical payment transactions (pay-to-public-key-hash transactions, abbreviated P2PKH).
Those kinds of transactions make up more than 99% of Bitcoin Cash network traffic and are thus a natural baseline for scalability considerations.

The trade-off of limiting density (rather than total cost) is that users may intentionally inflate input size (e.g. by adding <filler> OP_DROP) in order to “buy” more total operational budget for the input’s script, in effect turning the input’s bytes into a form of “gas”.
Transaction inputs having such filler bytes still wouldn’t negatively impact scalability, although they would appear wasteful.
These filler bytes would pay transaction fees just like any other bytes, and we don’t expect users to make these kinds of transactions unless they have good economic reasons, so this is not seen as a problem.
With the density-based approach we get maximum flexibility and functionality, so this is seen as an acceptable trade-off.
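
To make the idea concrete, here is a minimal sketch of a per-input budget calculation. This is illustrative only, not the CHIP's exact formula: the ~800 cost units per budget byte matches the 23*800 / 35*800 figures quoted later in this thread, while the per-input base allowance is an assumed constant.

// Illustrative sketch only, not the CHIP's exact formula.
#include <cstdint>

static constexpr uint64_t COST_PER_BUDGET_BYTE = 800;  // matches the 23*800 figure quoted later in the thread
static constexpr uint64_t BASE_INPUT_ALLOWANCE = 41;   // assumed per-input overhead constant

// Each input's operation-cost budget scales with the size of its unlocking script,
// so a block packed with "expensive" inputs can never cost more per byte than a
// block packed with typical P2PKH spends.
uint64_t InputOperationBudget(uint64_t unlockingBytecodeLength) {
    return (BASE_INPUT_ALLOWANCE + unlockingBytecodeLength) * COST_PER_BUDGET_BYTE;
}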

We could consider taking this approach further: having a shared budget per transaction, rather than per input.
This would amplify the effect of the density-based approach: users could then add filler inputs or outputs to create more budget for some other input inside the same transaction.
This would allow even more functionality and flexibility for users, but it has other trade-offs.
Please see Rationale: Use of Input Length-Based Densities below for further consideration.

What are the alternatives to density-based operational cost?

If we simply limited each input’s total operation cost, we’d still achieve the objective of not negatively impacting network scalability, but at the expense of flexibility and functionality: a big input would have the same operational cost budget as a small input, meaning it could not do as much with its own bytes, even when those bytes are not intentional filler.
To be useful, bigger inputs normally have to operate on more data, so we can expect them to typically require more operations than smaller inputs.
If we limited total operations, contract authors would have to work around the limitation by creating chains of inputs or transactions to carry out the operations rather than packing them all into one input - which means more overhead, relatively higher processing cost for the network, and more complicated contract design for application developers.
This is pretty much the status quo, which we are hoping to improve on.

Another alternative is to introduce some kind of gas system, where transactions could declare how much processing budget they want to buy, e.g. declare some additional “virtual” bytes without actually having to encode them.
Then, transaction fees could be negotiated based on raw + virtual bytes, rather than just raw bytes.
This system would introduce additional complexity for not much benefit other than saving some network bandwidth in those exotic cases.
Bandwidth savings could alternatively be achieved at another layer, by compressing TX data, especially because filler bytes can be highly compressible (e.g. a data push of 1000 zero bytes).
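
For intuition, here is a quick sketch (hypothetical transport-layer compression using zlib, not something the CHIP specifies) showing how well such a filler push compresses:

#include <zlib.h>
#include <cstdio>
#include <vector>

int main() {
    // A script-level data push of 1000 zero bytes: OP_PUSHDATA2 0xE8 0x03 followed by 1000 x 0x00.
    std::vector<unsigned char> push(3 + 1000, 0x00);
    push[0] = 0x4d; // OP_PUSHDATA2
    push[1] = 0xe8; // length 1000, little-endian
    push[2] = 0x03;

    std::vector<unsigned char> out(compressBound(push.size()));
    uLongf outLen = out.size();
    if (compress(out.data(), &outLen, push.data(), push.size()) == Z_OK) {
        std::printf("raw: %zu bytes, compressed: %lu bytes\n", push.size(), (unsigned long)outLen);
    }
    return 0;
}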

There is nothing structurally wrong with the current system. The limits are too low, they were too conservative, so increase the limits.

Your entire argument about splitting things over multiple transactions can be solved by increasing the limits. Solving the problem.

Sooo, now it is again about fees?

The basic premise to me is this;

  • Limits today are too low, stopping innovation.
  • Limits today are not an issue on full nodes. AT ALL.
  • We can massively increase limits without hurting full nodes.
  • 99% of the transactions will never even use 1% of the limits. They are just payments. And that is Ok.
  • A heavy transaction packed with sigops, which stays within the limits, will then by definition not hurt anyone. That is why the limits were picked, right?

The quoted part of the CHIP again talks about fees, so it is clear that the solution is based on the outdated idea from Core that block-space is paid for by fees. This is false and we should really really internalize that this idea is meant to destroy Bitcoin. (I mean, look at BTC).

Miners are paid in the native token, which has value based on utility. Utility is thus what pays miners. Fees play just a tiny role in that whole picture. Miners would just mine empty blocks on BCH if they didn’t agree with this basic premise.

A transaction that increases the value of the underlying coin is thus implicitly paying the miner by implicitly increasing the value of the sats they earn. Just like many many transactions make high fees, many many peer to peer payments increase the value of the coin. They multiply and make miners happy.

Blockspace is still limited; if you have a bunch of transactions that add filler opcodes or dummy pushes in order to be allowed to use more CPU, that blockspace is taken away from actual economic transactions. You have decreased the value for miners. The extra fees paid for those dummy bytes don’t make up for the loss of utility of the coin. Lower utility (people wait longer for their simple payments and may even move to competing chains) means miners are less happy.

To unwind: the basic premise of fees paying for blockspace in the eyes of the miners dismisses the idea of a coin having value based on utility, which is currently not really an issue since gambling is the main reason for coin value today (more so on BTC than on BCH). But long term this is a real problem, when blocks get full and people want to use this system for their daily payments.

Blockspace is paid for in most part by increase in utility. Which in reality means peer to peer payments. As evident by the hashpower backing btc vs bch.

To unwind, we get back to this CHIP.
The link between fees paid, bigger transactions and rights of usage of the CPU is a bad one. It will propagate the fees-pay-for-blockspace idea in a way that perverts the basic incentives. Much like on BTC, but we probably wouldn’t see it for years.
Next to that, the basic premise of using transaction size in any way is irrelevant to obtaining the goals of the CHIP. Nobody gives a shit about cpu being fairly divided between ‘complex’ scripts and simple scripts like p2pkh. All we care about is to keep the system safe. I would argue that static limits do that without the downsides.

Yes, and the density-based approach will keep it as safe as it is, while still allowing each TX byte to do more CPU ops.

With density-based limits, a block full of 1,000 x 10kB TXs (10MB) will take as long to validate as a block full of 100,000 x 100B TXs (10MB of a different composition), even if both were composed of worst-case TXs.

With a flat limit per TX, the latter could take 100x more time to validate.
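
A quick back-of-the-envelope, assuming a hypothetical flat per-TX budget $B$ that every TX can exhaust in the worst case:

$$\frac{100{,}000 \cdot B}{1{,}000 \cdot B} = 100\times$$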

Not per TX, per UTXO. That is what I’ve consistently been writing :wink:

One unit as handed to the VM. Which is the combination of 1 input script and 1 output script.

Tom raises some good points and some good arguments. I do agree we are still living with the psyops baggage from BTC — where block size was the one limited resource and where txns auction in a fee market for block space

I also fondly remember the time when coin days was a thing and when you could move Bitcoin for 0 sats.

All of this is very good discussion.
So when reading this I at first thought maybe some arguments were being made for not using a density-based limit at all, but just a static limit.

But then it seems the argument is more about how we calculate density — is it the density of the block on hand? Or is it the density of the block plus all the past blocks and transactions it “invokes”?

When framed like that… I am partial to the block in hand, since that’s easier to reason about, versus the more difficult property of worrying about a txn and all the past txns it invokes or a block and all the past blocks it invokes…

1 Like

No, the 100x case is with a flat limit per TX.

With a density-based limit the 2 blocks would take the same time in the worst case, no matter which kind of TXs they’re packed with. With a flat limit, larger TXs would be relatively cheaper to validate (or smaller ones relatively more expensive, depending on your PoV).

That’s the flat vs density-based consideration.

You raised another question: given a density-based system, why not have the prevout script contribute to the TX budget?

Because that could allow maybe even 1000x differences when comparing one 10MB block vs. some other 10MB block, even if both are packed with TXs made of the same opcodes - because the prevouts could load much more data into the validation context and have the same opcodes operate on bigger data sizes, etc.

Consider a simple example:

locking script: <data push> OP_DUP OP_DUP OP_CAT OP_CAT OP_HASH256 OP_DROP
unlocking script: OP_1

and now you get a 10MB block full of these little TXs of 65 bytes each. Can you estimate how long it will take you to validate it? You can’t - because you don’t know what prevouts it will load.

each prevout’s locking script could be hashing 10k bytes, so your 10MB block could end up having to hash 1.5GB!
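
Rough numbers for that worst case (assuming ~65-byte spending TXs and ~10kB hashed per prevout, as above):

$$\frac{10\,\text{MB}}{65\,\text{B/TX}} \approx 154{,}000 \text{ inputs}, \qquad 154{,}000 \times 10\,\text{kB} \approx 1.5\,\text{GB hashed}$$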

Yeah so with the “density of current block only” approach it’s easier to turn a single knob up and down — block size knob — and constrain worst case time in an obvious way.

I mean even with utxo based calculations for limits there is a theoretical maximum any 1 input can cost you in terms of CPU — so block size is still a very very inexact measure of worst case.

But if you were some hyper-vigilant miner that wanted to constrain block delays — you’d just have to write software that also measures a txn’s execution budget with numbers such as “utxo size” thrown into the mix.

Not a huge deal, it really is six of one, half a dozen of the other … in a sense …

But the devil’s in the details. If we are going for density-based limits, and we can get great results from just looking at the current txn’s size or the current input’s size, and we can turn a single knob — block size — to constrain block validation and propagation cost, rather than two knobs — block size plus execution time — maybe the single-knob design is preferred?

Idk.

I do get the offensiveness of “punishing” the UTXO script in a way… it does feel unfair.

But I’m OK with that since I like the single knob to bind them — block size.

1 Like

The problem is that there was never a consensus limit on the size of TX outputs (at the moment of their creation), so anyone with access to some hashpower could be creating whatever DoS “bombs”, ready to be executed by a TX that would spend them.

Well, we could keep the density-based approach and create some allowance for “bare” contracts, by clamping their contribution to budget, like:

if (prevout_size <= 200) { budget += prevout_size; } else { budget += 200; }

This way you have no risk of such bombs, because you know that no single UTXO can blow up the budget for the active TX.

I know @tom hopes to see more experimentation with “bare” scripts, and this approach would allow it in a cautious way. I’d be in favor of relaxing standardness rules to match the 200 (or whatever safe number we’d pick) so people can actually experiment with it.

1 Like

Yeah our fundamental problem is we are poor and can’t afford another metric. Byte Size is the 1 metric we have.

We are sort of pushing it here to also have it serve as a proxy for worst case complexity.

Really the correct solution is a gas system — ha ha.

But since we are poor and size seems to be all we have — I think looking at the size of the block on hand is the best heuristic for complexity we can muster with our primitivity.

There’s too much variance if one were to use utxo script size … to contribute to execution budget … given that current block size is our 1 rate limiter …

Or we can create a real gas system …

Idk that’s how I see it.

1 Like

Hi all, just a reminder that this discussion is getting a bit off topic. The Limits CHIP has been carefully constrained to avoid touching economic incentives, and this discussion is veering that way.

Remember, adding the UTXO length to the Density Control length would have essentially no impact on any contract that is currently or will be standard following the Limits CHIP: it would give P2SH20 contracts an additional 18400 pushed bytes/Op. Cost (23*800=18400) and P2SH32 contracts an additional 28000 (35*800=28000). In practice, there’s little difference between this and ~1.5x-ing the limits.

We’ve been extremely careful to re-target all limits precisely at their existing practical levels to avoid any impact to how BCH works today; adding UTXO length to Density Control length would be a surprising and potentially-dangerous change even if it came in a CHIP for 2026 – it certainly doesn’t belong in this one (beyond the already-existing rationale for not adding it in 2025).

To reiterate, adding UTXO length to Density Control length is a one-way change that can always be applied in a future upgrade. If anyone feels strongly that the VM should offer even higher limits based on UTXO length, please propose an independent CHIP.

If anyone wants to write more about the more general topic of relaxing output standardness, please do that here:

4 Likes

FWIW, I agree with that.

2 Likes

So I’ve been checking my facts a bit longer and I think there is a lot of FUD flying around. A fear of execution times causing problems on the network is naturally a good reason to take action. And I definitely support this CHIP’s approach to limiting that.

But the question is, limit it to what?

To start, basic operations are cheap. With the limit of a single stack entry being 10KB, you can check the time it takes an actual full node to do something like an XOR. I wrote some code and checked the execution time on my laptop and it is fast. It takes 50 nanoseconds to copy 10KB into another 10KB buffer using XOR.
50ns means you can do that 20 million times in a single second. On a laptop.
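
For reference, a minimal micro-benchmark sketch along these lines (not the actual code behind the numbers above; results will vary by machine and compiler flags):

#include <chrono>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    constexpr size_t kSize = 10 * 1000;   // 10KB, the max stack item size
    constexpr int kIterations = 1000000;
    std::vector<uint8_t> a(kSize, 0xAA), b(kSize, 0x55), out(kSize);
    uint64_t checksum = 0;                // keep the result observable so the loop isn't optimized away

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < kIterations; ++i) {
        for (size_t j = 0; j < kSize; ++j) out[j] = a[j] ^ b[j];  // OP_XOR-like work on two 10KB items
        checksum += out[0];
    }
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                  std::chrono::steady_clock::now() - start).count();
    std::printf("avg %.1f ns per 10KB XOR (checksum %llu)\n",
                double(ns) / kIterations, (unsigned long long)checksum);
    return 0;
}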

More realistically, there is a great benchmarking app that BCHN has and it runs a lot of scripts and gives the run-time of each.
A single script takes on average about 40 microseconds to run. Meaning a single CPU core can process 25 thousand inputs per second.
Again, on a laptop. A Pi will be slower, a desktop or even a server will be much faster.

I’m hoping someone can get me “disastrously big, horribly expensive” scripts to compare, but so far the scripts are not standing out as expensive. The overhead of the interpreter is going to be a large chunk of that 40 microseconds, and adding more “expensive” opcodes won’t increase the actual time spent much. I’ve seen the slowest script take about 100 microseconds instead of the median of 40.

Practically all historical exploits have been about not-so-smart implementations. For instance, the quadratic hashing problem was about hashing the same transaction hundreds of times. Solution: cache the hashed value (Core did that many years ago). Another was a UTXO implementation issue where spending 1 output would load all outputs of the transaction into memory, causing memory exhaustion. That one, too, is older than Bitcoin Cash and was fixed just as long ago.

The CHIP is quite successful in measuring expensive stuff. I have no doubt it is the sane thing to do in order to protect the VM and the full node. Set a limit and avoid issues.

But I need to stress that the ACTUAL capabilities are vastly bigger than people may have in their heads. A 100x increase in processing time still isn’t going to cause any issues for the network or the nodes.

So, keep it sane.

2 Likes

More numbers, something that is needed for this CHIP:

When we remove limits totally, the cost (wall-time) of doing 16 really big multiplications is 29 microseconds.

If I create a maximum-size script (roughly 10k opcodes) filled with the biggest possible multiplications and run that, the total run time (again with limits turned off) is 95 milliseconds. (Notice the micro vs. milli here.)

So the takeaway here is that without actual limits, things are not really all that expensive. Much cheaper than expected. Or, in other words, the actual safe limit of what the hardware (and software) is capable of is massively more than we need.

Which brings me back to my question: what limits should actually be used in actual operation?

The first script (fewer than 20 opcodes, but big multiplications) uses 22,430,325 (22 million) op-code cost, which is equivalent to requiring a push of 29KB in the current state of the CHIP.
To remind us, this script took 29 microseconds to actually run. It would absolutely not cause any scaling issues if that were made possible on mainnet. I haven’t checked the limits required for the 10KB script; they would be quite enormous to allow that one to run.
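
As a sanity check against the roughly 800 cost units per budget byte quoted elsewhere in this thread (the 23*800 figure):

$$\frac{22{,}430{,}325}{800} \approx 28{,}000 \text{ bytes,}$$

which is in the same ballpark as the 29KB figure above.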


Jason picked the current limits based on what is needed by script devs. In my opinion those limits are so laughably low that I have no problem with them; I mean, if that is the max that honest scripts use, then set those limits.

I do want to renew my objection to tying the scriptsig size to the limits, though. It now looks to be completely unneeded and it links two layers where there is no need to link them.

2 Likes

As I understand the design choices, it’s less based on what is needed per se, and more on what is there already, which is a conservative choice that I support in the spirit of “do one thing at a time”. IMO easy to update or remove as appropriate going forward.

I understand that the bigint is to some degree “two things at a time”, but :man_shrugging:

1 Like

Yeah, the CHIP indeed mentions that the baseline is P2PKH hashing.

And while I fully appreciate the concept of doing no harm and taking one step at a time, it adds a link between transaction size and VM processing, which I explained above is ‘weird’. So it doesn’t actually fully achieve the do-no-harm goal. Removing that scriptsig-size link in the future requires a hardfork, so let’s not add it in the first place, is my thinking.

So when we look at the actual cost of p2pkh (the baseline) and the cost of math and other opcodes, being conservative is not needed at all.

For reference, the 97ms massively expensive script would require 40GB of inputs to be allowed to compute. THAT is just :exploding_head:. :laughing:

1 Like

How big is the Script? If you managed to generate 29kB worth of data with 20-ish bytes of Script and have the TX validate in 0.00003s (==30μs) then one could stuff a 10MB block with those TXs and have the block take… 5 minutes to verify?

OP_1 0x028813 OP_NUM2BIN OP_REVERSEBYTES generates you a 5000-byte number (all 0s except for the top byte) while spending only 6 bytes, and you can then abuse the stack item (OP_DUP and keep hashing it, or use OP_MUL on it, or some mix, or w/e) to your heart’s content (until you fill up the max. script size).

Can you try benchmarking this:

StackT stack = {};
// 7-byte prelude: push 1, expand it to a 5000-byte stack item with OP_NUM2BIN
// (the raw bytes 0x02 0x88 0x13 are a push of 5000, little-endian), move the
// non-zero byte to the top with OP_REVERSEBYTES, then OP_DUP so the final
// OP_EQUAL has something to compare against.
CScript script = CScript()
                << OP_1
                << opcodetype(0x02)
                << opcodetype(0x88)
                << opcodetype(0x13)
                << OP_NUM2BIN
                << OP_REVERSEBYTES
                << OP_DUP;
// Each iteration appends 3 bytes (OP_2DUP OP_MUL OP_DROP): multiply two copies
// of the huge number, then drop the product so the stack stays the same size.
for (size_t i = 0; i < 3330; ++i) {
    script = script
                << OP_2DUP
                << OP_MUL
                << OP_DROP;
}
script = script << OP_EQUAL;

you can adjust the 3330 to w/e you want, the 3330 would yield a 9998-byte script

With density-based limits, the so-generated 10kB input would get rejected after executing the 20th OP_MUL or so, and the small one (say i < 1) would get rejected on the 1st OP_MUL because it’d be too dense for even one, so filling the block with either variant could not exceed our target validation cost.

I’m afraid you completely missed the point of my post. Nobody is generating data, no block is validated on arrival, and nobody has been suggesting any removal of limits. Removal of limits is needed to understand the relationship of cost and score, but obviously that is in a test environment. Not meant to be taken into production.

Everyone is in agreement we need density based limits. Read my posts again, honestly. You’re not making sense.

Hi all! Just a status update:

We’re up to 4 developers publicly testing the C++ implementation now, thanks @cculianu, @bitcoincashautist, and @tom for all of the review and implementation performance testing work you’re doing!

Reviews so far seem to range from, “these limits are conservative enough,” to, “these limits could easily be >100x higher,” – which is great news.

It also looks like one or two additional node implementations will have draft patches by October 1. I think we should coordinate a cross-implementation test upgrade of chipnet soon. How about October 15th at 12 UTC? (Note: a live testnet can only meaningfully verify ~1/6 of the behaviors and worst-case performance exercised by our test vectors and benchmarks, but sanity-checking activation across implementations is a good idea + testnets are fun.) I’ll mine/maintain the test fork and a public block explorer until after Nov 15.

I just published a cleaned up and trimmed down set of ~36K test vectors and benchmarks, the previous set(s) had grown too large and were getting unwieldy (many GBs of tests) – this set is just over 500MB and compresses down to less than 15MB, so it can be committed or submodule-ed directly into node implementation repos without much bloat. (And diffs for future changes will also be much easier for humans to review and Git to compress.) The test set now includes more ancillary data too:

  • *.[non]standard_limits.json includes the expected maximum and final operation cost of each test,
  • *.[non]standard_results.json provides more detailed error explanations (or true if expected to succeed)
  • *.[non]standard_stats.csv provides lots of VM metrics for easier statistical analysis in e.g. spreadsheet software:
    • Test ID, Description, Transaction Length, UTXOs Length, UTXO Count, Tested Input Index, Density Control Length, Maximum Operation Cost, Operation Cost, Maximum SigChecks, SigChecks, Maximum Hash Digest Iterations, Hash Digest Iterations, Evaluated Instructions, Stack Pushed Bytes, Arithmetic Cost

Next I’ll be working on merging and resolving the open PRs/issues on the CHIP repos:

  • Committing test vectors and benchmarks directly to the CHIP repos (now that they’ve been trimmed down),
  • A risk assessment that reviews and summarizes each testing/benchmarking methodology and results, and
  • Some language clarifications requested by reviews so far.

After that, I’ll cut spec-finalized versions of the CHIPs and start collecting formal stakeholder statements on September 23. Next week:

  • I’ll host a written AMA about the CHIPs on Reddit, Telegram, and/or 𝕏 on Wednesday, September 25;
  • I’ll be joining @BitcoinCashPodcast at 20 UTC, Thursday, September 26; and
  • I’ll be joining General Protocols’ Space on 𝕏 at 16 UTC, Friday September 27.

Thanks everyone!

5 Likes

Bitcoin Verde has announced a flipstarter for their v3.0 release, which includes bringing the node implementation back into full consensus (the May 2024 upgrade), a >5x performance leap, and support for the 2025 CHIPs!

We seek to immediately complete our technical review of the CHIP-2024-07-BigInt and CHIP-2021-05-vm-limits CHIPs, and implement those CHIPs as a technical proof of concept (to include integrating the CHIPs’ test-vectors) in order to facilitate the timely and responsible assurance of node cross-compatibility of the BCH '25 upgrade. We consider the goals outlined in these CHIPs to be a positive incremental betterment of the BCH protocol, and look forward to supporting their inclusion in the next upgrade. [emphasis added]

The flipstarter is here:

3 Likes