CHIP 2021-05 Targeted Virtual Machine Limits

I think I get your argument. Due to the “invisible” nature of UTXOs – they aren’t part of the transaction itself but must be resurrected from the “dead” (from the UTXO db) to be evaluated – you can do some crazy things, like have an undead army of scripts that are expensive to validate but attached to a simple-enough-looking txn.

Basically you can “prepare” some pretty expensive-to-validate UTXOs and have 1 transaction that looks innocent enough… and have it totally choke out the system.

Mempool and blockspace are limited – but UTXO DB space is boundless!

What’s more – it’s difficult to sift through what “looks” expensive to validate and what doesn’t when all you have in a TXN is a pointer to some old script living in a DB table…

I think I get why it’s a Pandora’s box to credit the UTXO size to the input’s budget… it’s just hard for me to put numbers behind it, but I intuitively get the argument, FWIW.

2 Likes

Stated another way: Imagine a txn that references 10,000 prevouts that are all super huge 10kb UTXOs. You end up in this weird territory where the script VM is now required to allow execution of 100MB+ worth of script data, and is required to allow it to pass if it’s all “correct” and returns “true”. All for a paltry ~500kb in txn space in the present.
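A rough back-of-the-envelope, just to put the amplification in code (the ~50 bytes per input is an assumption for a minimal input; the rest comes from the example above):

```cpp
#include <cstdint>
#include <iostream>

int main() {
    const int64_t numInputs = 10000;          // prevouts referenced by the hypothetical txn
    const int64_t prevoutScriptSize = 10000;  // ~10kb locking script sitting in the UTXO db
    const int64_t bytesPerInput = 50;         // assumed: outpoint + tiny unlocking script + sequence

    const int64_t txnSize = numInputs * bytesPerInput;               // what the txn costs in block space
    const int64_t scriptToEvaluate = numInputs * prevoutScriptSize;  // what the VM is asked to execute

    std::cout << "txn size:           ~" << txnSize / 1000 << " kb\n";             // ~500 kb
    std::cout << "script to evaluate: ~" << scriptToEvaluate / 1000000 << " MB\n"; // ~100 MB
    std::cout << "amplification:      ~" << scriptToEvaluate / txnSize << "x\n";   // ~200x
}
```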

Seems a bit out of whack even on the face of it.

Really the Achilles heel of UTXO locking scripts is that they can all sit dead in a DB, be resurrected to join an undead army in a single txn, and overwhelm the cost of evaluation. By “crediting” the txn and allocating budget for all these UTXOs you basically just allow that to happen… which seems to be the opposite of where we want to push the design… ?

1 Like

I’ll have to share the brutal truth: if you are saying that this scheme doesn’t protect nodes in the use case of 2, then there is a very strong misalignment somewhere.

You using pushes only has side-effects that make an (apparently) wobbly system even less stable. Side effects like this are huge red flags in a running system. That’s really not OK to introduce. The economic basics of Bitcoin Cash are balanced out carefully, and while increasing the VM limits has effectively zero effect on the economic properties, you won’t find any objections.
But this changes the balance. Makes one type of transaction “cheaper” than others. It is not as bad as the segwit discount, but it is the same concept.

I’d rather not introduce changes that have known side-effects at all. If you think this can be fixed in the future, then fix it now.
We can always bump this to next year if that is what it takes…

This sounds like the limits are not actually doing their job then. The limits were supposed to be about protecting the VM. Lower the limits if that is needed to protect the VM. I mean, this is the basic concept of what you guys have been testing and evaluating.
Please do that based on the input AND output size. Make the max script size less, if you need to.
Don’t insert perverse incentives to allow limits to be higher in some cases. That’s not a good idea and is guaranteed to come back to bite you. BTC has some pretty bad examples of that. Let’s not do that here.

At this point I want to ask WHY there is a connection between the transaction size (and presumably fees paid) and the max / budget that a script can use.

Why not simply make that static? Maybe increase it every halving (to keep up with CPU / bandwidth cost).
That would be much simpler and avoid all this discussion, no?

To repost this, the system we have today is not going to be used in the future. None of the properties are hard-coded / consensus. The miners that start to innovate and become most profitable will change them. Things like relay-fee, things like max free transactions in a block – they are all mining-software settings that can be changed.

Do NOT bind the limits to the assumption that fees paid allow one to run larger scripts on (not that miner’s) CPUs. That cross-connects things that should not be connected.

The proposed VM limits system is agnostic of fees paid. If a miner will accept a 0-fee TX then you get free execution - but still only up to the CPU density limit. It’s just that we limit CPU density per byte of TX in order to protect ALL nodes from DoS. This way if you see a 1MB TX you know before even executing it or looking up its prevouts that it can’t take longer than X seconds to validate - no matter the UTXOs it references, because the budget will be known before even looking up prevout UTXOs and is decided solely by the 1MB TX’s contents.
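A minimal sketch of what I mean, assuming the budget scales linearly with the spending TX’s own input bytes (the +41 base allowance and the 800/byte constant are my reading of the current draft and could be off):

```cpp
#include <cstdint>

// Sketch: per-input operation-cost budget in a density-based scheme.
// It depends only on bytes that are inside the spending TX itself, so it
// can be computed before any prevout UTXO is even looked up.
// (The +41 base allowance and 800/byte constant are assumed here.)
int64_t inputOpCostBudget(int64_t unlockingBytecodeLength) {
    const int64_t kBaseAllowance = 41;
    const int64_t kCostPerByte = 800;
    return (kBaseAllowance + unlockingBytecodeLength) * kCostPerByte;
}
```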

Having prevouts contribute to budget would break that.

UTXOs are not executed in TX that creates them, they’re just dumb data at that point.

When spending TX references them it has to “load” the data and then execute it. And if 1 byte can load 10,000 bytes and execute the 10,000 bytes in a min. size TX - then you can produce poison TXs that inflate CPU density orders of magnitude beyond what is typical.

1 Like

That link doesn’t make sense.

Why grant more CPU ‘rights’ to a transaction that is bigger?

If the answer isn’t about fees paid (as you implied) then that makes it even more weird.

Why not give a static limit to every single input-script (regardless of size, color or race) which it has to stay inside. Make sure that that limit protects all nodes from DOS by picking something low enough.

Worried you picked something too low? Increase that limit every halving… (that idea stolen from BCA :wink: )

That’s the current system, and it forces people to work around the limits. Like, if I have a 1kB TX that loads some oracle data and does something with it, I could hit this static limit and then maybe I’d work around it by making 2x 1kB TXs in order to carry out the full operation.

With static limits, this 1kB TX (or 2x 1kB TXs) will be orders of magnitude cheaper than some 1kB CashFusion TX packed with sigops. Why can’t my 1kB oracle TX have the same CPU budget as a 1kB P2PKH CashFusion TX? Why should I have to create more bandwidth load with CPU-cheap TXs when it could all be packed more densely into 1 TX?

That’s how we get to density-based limit, I thought the CHIP needed a rationale for it so there’s this PR open: https://github.com/bitjson/bch-vm-limits/pull/19

Density-based Operational Cost Limit

The objective of this upgrade is to allow smart contract transactions to do more, and without any negative impact to network scalability.
With the proposed approach of limiting operational cost density, we can guarantee that the processing cost of a block packed with smart contract transactions can’t exceed the cost of a block packed full of typical payment transactions (pay-to-public-key-hash transactions, abbreviated P2PKH).
Those kinds of transactions make up more than 99% of Bitcoin Cash network traffic and are thus a natural baseline for scalability considerations.

The trade-off of limiting density (rather than total cost) is that input size may be intentionally inflated (e.g. adding <filler> OP_DROP) by users in order to “buy” more total operational budget for the input’s script, in effect turning the input’s bytes into a form of “gas”.
Transaction inputs having such filler bytes still wouldn’t negatively impact scalability, although they would appear wasteful.
These filler bytes would have to pay transaction fees just like any other transaction bytes, and we don’t expect users to make these kinds of transactions unless they have good economic reasons, so this is not seen as a problem.
With the density-based approach, we can have maximum flexibility and functionality so this is seen as an acceptable trade-off.
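(Not part of the CHIP text - just a rough illustration of the “buying budget” effect, assuming budget scales at roughly 800 op-cost units per unlocking-script byte:)

```cpp
#include <cstdint>
#include <iostream>

int main() {
    const int64_t kCostPerByte = 800;  // assumed density constant
    // A 520-byte filler push costs OP_PUSHDATA2 (1) + length (2) + data (520),
    // plus 1 byte for the OP_DROP that discards it.
    const int64_t fillerBytes = 1 + 2 + 520 + 1;
    std::cout << "filler bytes added:  " << fillerBytes << "\n";
    std::cout << "extra budget bought: " << fillerBytes * kCostPerByte << " op-cost units\n";
}
```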

We could consider taking this approach further: having a shared budget per transaction, rather than per input.
This would exacerbate the effect of the density-based approach: users could then add filler inputs or outputs to create more budget for some other input inside the same transaction.
This would allow even more functionality and flexibility for users, but it has other trade-offs.
Please see Rationale: Use of Input Length-Based Densities below for further consideration.

What are the alternatives to density-based operational cost?

If we simply limited each input’s total operation cost, we’d still achieve the objective of not negatively impacting network scalability, but at the expense of flexibility and functionality: a big input would have the same operational cost budget as a small input, meaning it could not do as much with its own bytes, even when those bytes are not intentional filler.
To be useful, bigger inputs normally have to operate on more data, so we can expect them to typically require more operations than smaller inputs.
If we limited total operations, contract authors would have to work around the limitation by creating chains of inputs or transactions in order to carry out the operations rather than packing them all into one input - and that would result in more overhead and be relatively more expensive for the network to process, while also complicating contract design for application developers.
This is pretty much the status quo, which we are hoping to improve on.

Another alternative is to introduce some kind of gas system, where transactions could declare how much processing budget they want to buy, e.g. declare some additional “virtual” bytes without actually having to encode them.
Then, transaction fees could be negotiated based on raw + virtual bytes, rather than just raw bytes.
This system would introduce additional complexity for not much benefit other than saving some network bandwidth in those exotic cases.
Savings in bandwidth could be alternatively achieved on another layer: by compressing TX data, especially because filler bytes can be highly compressible (e.g. data push of 1000 0-bytes).
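(Also not CHIP text - just a quick check of the “highly compressible” claim, using zlib:)

```cpp
#include <zlib.h>

#include <iostream>
#include <vector>

int main() {
    // A filler data push of 1000 zero bytes, as in the example above.
    std::vector<unsigned char> filler(1000, 0x00);

    uLongf compressedSize = compressBound(filler.size());
    std::vector<unsigned char> compressed(compressedSize);
    compress(compressed.data(), &compressedSize, filler.data(), filler.size());

    std::cout << filler.size() << " bytes -> " << compressedSize << " bytes compressed\n";
}
```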

There is nothing structurally wrong with the current system. The limits are too low, they were too conservative, so increase the limits.

Your entire argument of splitting things over multiple transactions can be solved by increasing the limits. Solving the problem.

Sooo, now it is again about fees?

The basic premise to me is this;

  • Limits today are too low, stopping innovation.
  • Limits today are not an issue on full nodes. AT ALL.
  • We can massively increase limits without hurting full nodes.
  • 99% of the transactions will never even use 1% of the limits. They are just payments. And that is Ok.
  • A heavy transaction packed with sigops, which stays within limits, will then by definition not hurt anyone. That is why the limits were picked, right?

The quoted part of the CHIP again talks about fees, so it is clear that the solution is based on the outdated idea from Core that block-space is paid for by fees. This is false and we should really really internalize that this idea is meant to destroy Bitcoin. (I mean, look at BTC).

Miners are paid in the native token, which has value based on Utility. Utility is thus what pays miners. Fees just play a tiny role in that whole. Miners can just mine empty blocks on bch if they didn’t agree with this basic premise.

A transaction that increases the value of the underlying coin is thus implicitly paying the miner by implicitly increasing the value of the sats they earn. Just like many many transactions make high fees, many many peer to peer payments increase the value of the coin. They multiply and make miners happy.

Blockspace is still limited; if you have a bunch of transactions that add filler opcodes or dummy pushes in order to be allowed to use more CPU, that blockspace is taken away from actual economic transactions. You decreased the value for miners. The fees being increased for those dummy bytes don’t make up for the loss of utility of the coin. Lower utility (people wait longer for their simple payments and may even move to competing chains) means miners are less happy.

To unwind, the basic premise of fees paying for blockspace in the eyes of the miners dismisses the idea of a coin having value based on utility, which is currently not really an issue since gambling is the main reason for coin value today (more so on BTC than on BCH). But long term this is a real problem when blocks get full and people want to use this system for their daily payments.

Blockspace is paid for in most part by an increase in utility, which in reality means peer-to-peer payments. As evidenced by the hashpower backing BTC vs BCH.

To unwind, we get back to this CHIP.
The link between fees paid, bigger transactions and rights of usage on the CPU is a bad one. It will propagate the fees-pay-for-blockspace idea in a way that perverts the basic incentives. Much like on BTC, but we probably wouldn’t see it for years.
Next to that, the basic premise of using transaction size in any way is irrelevant to obtaining the goals of the CHIP. Nobody gives a shit about CPU being fairly divided between ‘complex’ scripts and simple scripts like P2PKH. All we care about is keeping the system safe. I would argue that static limits do that without the downsides.

Yes, and density-based approach will keep it as safe as it is, while still allowing each TX byte to do more CPU ops.

With density-based limits: a block full of 1,000 x 10kB TXs (10MB) will take as long to validate as a block full of 100,000 x 100B TXs (10MB of a different composition), even if both were composed of worst-case TXs.

With flat limit per TX, the latter could take 100x more time to validate.

Not per TX, per UTXO. That is what I’ve consistently been writing :wink:

One unit as handed to the VM. Which is the combination of 1 input script and 1 output script.

Tom raises some good points and some good arguments. I do agree we are still living with the psyops baggage from BTC — where block size was the one limited resource and where txns auction in a fee market for block space

I also fondly remember the time when coin days was a thing and when you could move Bitcoin for 0 sats.

All of this is very good discussion.
So when reading this, at first I thought the arguments being made were for not using a density-based limit at all but just a static limit.

But then it seems the argument is more about how we calculate density — is it the density of the block on hand? Or is it the density of the block plus all the past blocks and transactions it “invokes”?

When framed like that… I am partial to the block in hand, since that’s easier to reason about, versus the more difficult property of worrying about a txn and all the past txns it invokes or a block and all the past blocks it invokes…

1 Like

No, the 100x case is per TX, with a flat limit per TX.

With density-based limit the 2 blocks would take the same time in worst case no matter which kind of TXs they’re packed with. With flat, larger TXs would be relatively cheaper to validate (or smaller ones relatively more expensive, depending on your PoV).

That’s the flat vs density-based consideration.

You raised another question, given a density-based system why not have prevout Script contribute to TX budget?

Because that would allow maybe even 1000x differences when comparing one 10MB block vs some other 10MB block, even if packed with TXs made of the same opcodes - because the prevouts could load much more data into the validation context and have the same opcodes operate on bigger data sizes etc.

Consider a simple example:

locking script: <data push> OP_DUP OP_DUP OP_CAT OP_CAT OP_HASH256 OP_DROP
unlocking script: OP_1

and now you get a 10MB block full of these little TXs of 65 bytes each. Can you estimate how long it will take you to validate it? You can’t - because you don’t know what prevouts it will load.

Each prevout’s locking script could be hashing 10k bytes, so your 10MB block could end up having to hash 1.5GB!
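Putting rough numbers on that (the 10k bytes hashed per prevout is the assumption from above):

```cpp
#include <cstdint>
#include <iostream>

int main() {
    const int64_t blockSize = 10000000;      // 10MB block
    const int64_t txSize = 65;               // minimal spending TX from the example
    const int64_t hashedPerPrevout = 10000;  // assumed bytes hashed by each prevout's locking script

    const int64_t txCount = blockSize / txSize;              // ~153,000 TXs
    const int64_t totalHashed = txCount * hashedPerPrevout;  // ~1.5GB to hash

    std::cout << "TXs in block: ~" << txCount << "\n";
    std::cout << "bytes hashed: ~" << totalHashed << " (~1.5GB)\n";
}
```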

Yeah so with the “density of current block only” approach it’s easier to turn a single knob up and down — block size knob — and constrain worst case time in an obvious way.

I mean even with utxo based calculations for limits there is a theoretical maximum any 1 input can cost you in terms of CPU — so block size is still a very very inexact measure of worst case.

But if you were some hyper vigilant miner that wanted to constrain block delays — you’d just have to write software that also measures a txns execution budget with numbers such as “utxo size” thrown into the mix.

Not a huge deal it really is 6 of 1 half dozen of the other … in a sense …

But the devil’s in the details. If we are going for density-based limits, and we can get great results from just looking at the current txn’s size or the current input’s size, and we can turn a single knob (block size) to constrain block validation and propagation cost rather than two knobs (block size plus execution time), maybe the single-knob design is preferred?

Idk.

I do get the offensiveness of “punishing” the UTXO script in a way… it does feel unfair.

But I’m OK with that, since I like the single knob to bind them: block size.

1 Like

The problem is that there was never a consensus limit on the size of TX outputs (at the moment of their creation), so anyone with access to some hashpower could be creating whatever DoS “bombs”, ready to be executed by a TX that would spend them.

Well, we could keep the density-based approach and create some allowance for “bare” contracts, by clamping their contribution to budget, like:

if (prevout_size <= 200) { budget += prevout_size; } else { budget += 200; }

This way you have no risk of such bombs, because you know that no single UTXO can blow up the budget for the active TX.

I know @tom hopes to see more experimentation with “bare” scripts, and this approach would allow it in a cautious way. I’d be in favor of relaxing standardness rules to match the 200 (or whatever safe number we’d pick) so people can actually experiment with it.
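To make that concrete, here’s how the clamp could slot into a per-input budget calculation (a sketch only; the 41 and 800 constants are assumed from the CHIP draft, and the 200 is just the proposed clamp):

```cpp
#include <algorithm>
#include <cstdint>

// Sketch: each prevout's locking script can add at most kBareAllowance bytes
// to the input's budget-bearing length, so no single UTXO can blow up the
// validation budget of the TX that spends it.
int64_t inputOpCostBudget(int64_t unlockingBytecodeLength, int64_t prevoutScriptLength) {
    const int64_t kBaseAllowance = 41;   // assumed per-input base
    const int64_t kBareAllowance = 200;  // proposed clamp on prevout contribution
    const int64_t kCostPerByte = 800;    // assumed budget per byte

    const int64_t prevoutCredit = std::min(prevoutScriptLength, kBareAllowance);
    return (kBaseAllowance + unlockingBytecodeLength + prevoutCredit) * kCostPerByte;
}
```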

1 Like

Yeah our fundamental problem is we are poor and can’t afford another metric. Byte Size is the 1 metric we have.

We are sort of pushing it here to also have it serve as a proxy for worst case complexity.

Really the correct solution is a gas system — ha ha.

But since we are poor and size seems to be all we have — I think looking at the size of the block on hand is the best heuristic for complexity we can muster with our primitivity.

There’s too much variance if one were to use UTXO script size to contribute to execution budget… given that current block size is our one rate limiter…

Or we can create a real gas system …

Idk that’s how I see it.

1 Like

Hi all, just a reminder that this discussion is getting a bit off topic. The Limits CHIP has been carefully constrained to avoid touching economic incentives, and this discussion is veering that way.

Remember, adding the UTXO length to the Density Control length would have essentially no impact on any contract that is currently or will be standard following the Limits CHIP: it would give P2SH20 contracts an additional 18400 pushed bytes/Op. Cost (23*800=18400) and P2SH32 contracts an additional 28000 (35*800=28000). In practice, there’s little difference between this and ~1.5x-ing the limits.

We’ve been extremely careful to re-target all limits precisely at their existing practical levels to avoid any impact to how BCH works today; adding UTXO length to Density Control length would be a surprising and potentially-dangerous change even if it came in a CHIP for 2026 – it certainly doesn’t belong in this one (beyond the already-existing rationale for not adding it in 2025).

To reiterate, adding UTXO length to Density Control length is a one-way change that can always be applied in a future upgrade. If anyone feels strongly that the VM should offer even higher limits based on UTXO length, please propose an independent CHIP.

If anyone wants to write more about the more general topic of relaxing output standardness, please do that here:

4 Likes

FWIW, I agree with that.

2 Likes

So I’ve been checking my facts a bit longer and I think there is a lot of FUD flying around. A fear of execution times causing problems on the network is naturally a good reason to take action. And I definitely support this CHIP’s approach to limiting that.

But the question is, limit it to what?

To start, basic operations are cheap. With the limit of a single stack entry being 10KB, you can check the time it takes an actual full node to do something like an XOR. I wrote some code and checked the execution time on my laptop and it is fast. It takes 50 nanoseconds to copy 10KB into another 10KB buffer using XOR.
50ns means you can do that 20 million times in a single second. On a laptop.
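For anyone who wants to reproduce this, a rough micro-benchmark along the lines of what I ran (a sketch, not the node’s actual code; results will vary by CPU and compiler flags):

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    std::vector<uint8_t> src(10000, 0xAA);  // 10KB source buffer
    std::vector<uint8_t> dst(10000, 0x55);  // 10KB destination buffer
    const int iterations = 1000000;

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        for (size_t j = 0; j < src.size(); ++j) {
            dst[j] ^= src[j];  // XOR-copy 10KB into the other buffer
        }
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;

    const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
    std::cout << "avg per 10KB XOR pass: " << ns / iterations << " ns\n";
    std::cout << "checksum (keeps the loop from being optimized away): " << int(dst[0]) << "\n";
}
```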

More realistically, there is a great benchmarking app that BCHN has and it runs a lot of scripts and gives the run-time of each.
A single script takes on average about 40 microseconds to run. Meaning a single CPU core can process 25 thousand inputs per second.
Again, on a laptop. A Pi will be slower, a desktop or even a server will be much faster.

I’m hoping someone can get me “disastrously big, horribly expensive” scripts to compare, but so far the scripts are not standing out as expensive. The overhead of the interpreter is going to be a large chunk of that 40 microseconds, and adding more opcodes that are “expensive” won’t increase the actual time spent much. I’ve seen the slowest script take about 100 microseconds instead of the median of 40.

Practically all historical exploits have been about not-smart implementations. For instance, the quadratic hashing issue was about hashing the same transaction hundreds of times. Solution: cache the hashed value (Core did that many years ago). Another was a UTXO implementation issue where spending 1 output would load all outputs of the transaction into memory, causing memory exhaustion. This too is older than Bitcoin Cash, and was fixed just as long ago.

The CHIP is quite successful in measuring expensive stuff. I have no doubt it is the sane thing to do in order to protect the VM and the full node. Set a limit and avoid issues.

But I need to stress that the ACTUAL capabilities are vastly bigger than people may have in their heads. A 100x increase in processing-time still isn’t going to cause any issues for the network or the nodes.

So, keep it sane.

2 Likes

More numbers, something that is needed for this CHIP:

When we remove limits totally, the cost (wall-time) of doing 16 really big multiplications is 29 microseconds.

If I create a max script of 10k opcodes filled with the biggest possible multiplications and run that, the total run time (again, needing to turn off limits) is 95 milliseconds. (Notice the micro vs milli here.)

So the takeaway here is that without actual limits, things are not really all that expensive. Much cheaper than expected. Or, in other words, the actual safe limits of what the hardware (and software) is capable of are massively more than we need.

Which brings me again to my question: what limits should actually be used in actual operation?

The first script (less than 20 opcodes, but big multiplications) uses 22430325 (22 million) op-code cost. Which is equivalent to requiring a push of 29KB in the current state of the CHIP.
To remind us, this script took 29 microseconds to actually run. It would absolutely not cause any scaling issues if that were made possible on mainnet. I’ve not checked the limits required for the 10k-opcode script; they would have to be quite enormous to allow that one to run.


Jason picked the current limits based on what is needed by script devs. In my opinion those limits are so laughably low that I have no problem with them. I mean, if that is the max that honest scripts use, then set those limits.

I do want to renew my objection to tying the scriptsig size to the limits, though. It now looks to be completely unneeded and links two layers where there is no need to link two layers.

2 Likes

As I understand the design choices, it’s less based on what is needed per se, and more on what is there already, which is a conservative choice that I support in the spirit of “do one thing at a time”. IMO easy to update or remove as appropriate going forward.

I understand that the bigint is to some degree “two things at a time”, but :man_shrugging:

1 Like