CHIP 2021-05 Targeted Virtual Machine Limits

The “Hashing Limit” section has a handful of unnamed, bare constants that I didn’t find an explanation for:

  • 7: ((41 + unlocking_bytecode_length) * 7) / 2
  • 1, 8, 64: 1 + ((message_length + 8) / 64) + (is_double ? 1 : 0); (the first 1+, not the conditional)

Each of them deserves a name and/or explanation. In contrast, e.g. the numbers 41, 800 have explanations.

I can guess at or infer the source of them based on other parts of the spec, but I’m not totally confident about my inference. It should be in the spec even if it’s explained here.

2 Likes

No, in the sense that you don’t need full execution: you don’t have to run through to the end of the Script and only then be able to tell whether it fails or not. You can stop and mark it invalid the moment the budget is exceeded; you only need to execute up until that point. That’s important for DoS scenarios: you never exceed your budget even if someone tries to give you a TX that’s excessively over the budget.

Hmm @bitjson did specify it like that in his operation cost table, e.g. for OP_MUL (a b -> c) it is specified as 100 + (2 * c.length) + (a.length * b.length), but I wonder why? Can’t we just do (a.length * b.length)*3 ?

You only need the size of the script to know the execution budget. Then you start evaluating, and you only need to execute up until the point where the result can be determined; if still under budget, you continue to the next one, etc. You linearly execute the Script and keep updating the operation cost, and the moment it exceeds the budget you can fail it, without needing to execute the rest.
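In rough C, that flow looks something like this (just a sketch: the struct, the per-op cost stand-in, and the constants are illustrative placeholders for the CHIP’s actual operation cost table):

#include <stdbool.h>
#include <stddef.h>

/* Illustrative constants only: 800 is the standard per-byte budget and 100
   the base instruction cost mentioned elsewhere in this thread; the exact
   accounting lives in the CHIP's operation cost table. */
#define PER_BYTE_BUDGET       800
#define BASE_INSTRUCTION_COST 100

typedef struct { size_t pushed_bytes; } Op;  /* simplified: only what costing needs */

/* Simplified stand-in for the spec's per-operation cost table. */
static long operation_cost(const Op *op) {
    return BASE_INSTRUCTION_COST + (long)op->pushed_bytes;
}

/* The budget is known statically from the unlocking bytecode length; execution
   aborts the moment the cumulative cost exceeds it - no need to run to the end. */
bool evaluate_within_budget(const Op *ops, size_t op_count,
                            size_t unlocking_bytecode_length) {
    long budget = (long)(41 + unlocking_bytecode_length) * PER_BYTE_BUDGET;
    long cost = 0;
    for (size_t i = 0; i < op_count; ++i) {
        cost += operation_cost(&ops[i]);
        if (cost > budget) return false;  /* fail fast: budget exceeded */
    }
    return true;  /* still subject to the usual script-success checks */
}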

Right. That’s what I was thinking it would be good to avoid, but I think I see that the whole system is made on this basis where it’s by nature a static + runtime accounting process in many aspects. Makes sense and avoids needing to make bad (either limitingly tight or dangerously loose) assumptions. It’s probably obvious to those steeped in validation process, but not to someone who isn’t.

1 Like

Hopefully I can help clarify some things with my chiming in –

  1. There are two concepts here to keep in mind – the cost of execution … and the execution budget (limit).
  2. The cost of execution of a particular script is not known ahead of time… until you execute it. I would say that is an accurate statement, yes, Emergent.
  3. The existing sigCheck mechanism which Mark Lundeberg added in 2018/2019 has the same property – before that there was a heuristic “sigOps” mechanism that just scanned the script for 0xba or whatever and used that to determine if the script would pass/fail ahead of time. This was problematic… because unexecuted branches would add to the cost needlessly.
    • sigCheck limit also has the property that you don’t know the actual cost until you execute
    • vmLimits now also has this property as well – given that sigChecks exists and we are ok with it, this isn’t a huge departure from status quo. It’s just another dynamic cost you only “know” once you execute.
  4. Now… the execution limit is definitely known ahead of time. The execution budget, as it were, is based on “input byte length”. The concept of input bytes is identical to what (old) documentation refers to as scriptSig. It’s just the bytes in the transaction “scriptSig”… whatever they may be. Usually they are just a bunch of push ops – push of pubkey and push of sig (for p2pkh). The total byte count of that blob is what’s used to determine the execution budget – not the output size!
  5. So… output size doesn’t come into the picture at all here … it can be added — but as I understand it Jason felt it wasn’t needed – it’s sufficient to just go with scriptSig and use that byte size to determine the execution budget. The current proposal then has the side-effect of incentivizing p2sh over some crazy non-standard “bare” contract…

So the scenario we have is:

  • the execution budget (limit) is determined statically by just looking at the size of scriptSig
  • then, you execute the script (which you need to do anyway) to determine if among all the thousand other things it does, it also respects that budget. If it doesn’t, kill it and fail. If it does, great… pass (assuming it returns true).

I hope that helps?

5 Likes

Thanks! After your and BCA’s responses, I’ve filed two issues that I think would be good to resolve. I’d probably be able to write a PR for the static considerations if Jason wants that, but I probably wouldn’t do justice to the other one.

2 Likes

GP has posted this issue. We think the two main issues listed must be handled in some form (it’s not optional) before our endorsement can be considered for 2025 activation. The 3rd issue listed is more minor, but we still recommend some reasonable handling of it.

2 Likes

Great, thanks to General Protocols for the review! I’ll work on incorporating the feedback on both CHIPs.

Re stakeholder statements: I plan to start asking for final statements beginning in October and after all the active implementations have completed their development and review work. (Stakeholders are free to send PRs with statements before then, but I’ll have to clear any statements if feedback from one of the implementations requires meaningful CHIP language changes. Easier to post those elsewhere for now.)

5 Likes

Thanks for the questions, and thanks @bitcoincashautist and @cculianu for the detailed answers, just going to add a few more comments:

Yeah, there’s a naming-clarity issue here. The CHIP says “approximate spending input byte length” when referring to 41 + unlockingBytecode.length (A.K.A. scriptSig length), but it would be good for us to (in the spec) just define a proper term.

Background (for others reading along): all evaluations take place inside transaction inputs. The new density limits are based on the length of those transaction inputs, see the very long Rationale: Use of Input Length-Based Densities.

Anyways, the precise number is more intelligent than “encoded input length” for a few reasons (Rationale: Selection of Input Length Formula), but things would be clearer if we committed to a formal term for it. In Libauth, it’s now Density Control Length, which I think is descriptive and hard to confuse with anything else. I’ll try to revise the CHIP language to use that as formal terminology. Thanks @emergent_reasons!

3 Likes

Thanks, we should make those more explicit.

The 7 is just part of a possible formula for the 3.5 density (7 / 2 = 3.5); without the * 7, the same formula gives the 0.5 density (1 / 2):

For standard transactions, the maximum density is approximately 0.5 hash digest iterations per spending input byte; for block validation, the maximum density is approximately 3.5 hash digest iterations per spending input byte. See Rationale: Selection of Hashing Limit and Rationale: Use of Input-Length Based Densities.

Given the spending input’s unlocking bytecode length (A.K.A. scriptSig), hash digest iteration limits may be calculated with the following C functions:

int max_standard_iterations (int unlocking_bytecode_length) {
    return (41 + unlocking_bytecode_length) / 2;
}
int max_consensus_iterations (int unlocking_bytecode_length) {
    return ((41 + unlocking_bytecode_length) * 7) / 2;
}
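For example, a typical P2PKH input has an unlocking bytecode of roughly 106 bytes (a ~72-byte signature push plus a 34-byte public key push), so these formulas give floor((41 + 106) / 2) = 73 hash digest iterations for standard validation and floor((41 + 106) * 7 / 2) = 514 for block validation. (Rough numbers for illustration only; actual signature lengths vary by a byte or two.)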

And the hash digest iteration formula is (I believe) the most efficient way to calculate what goes on inside all of our hash functions:

Note on 64-Byte Message Block Size

Each VM-supported hashing algorithm – RIPEMD-160, SHA-1, and SHA-256 – uses a Merkle–Damgård construction with a block size of 512 bits (64 bytes), so the number of message blocks/digest iterations required for every message size is equal among all VM-supported hashing functions.

I just double checked it with a few different LLMs, but would appreciate others confirming if that formula is ideal. (Of course, it’s technically possible for implementations to add a counter rather than use this formula, but it’s somewhat less likely they’d want to modify their hashing implementation to add an internal counter.)

We could probably also clarify that note to be more explicit about the 8 and + 1 accounting for padding. Maybe:

Explanation of Digest Iteration Formula

Each VM-supported hashing algorithm – RIPEMD-160, SHA-1, and SHA-256 – uses a Merkle–Damgård construction with a 512-bit (64-byte) message block length, so the number of message blocks/digest iterations required for every message length is equal among all VM-supported hashing functions. The specified formula correctly accounts for padding: hashed messages are padded with a 1 bit, followed by enough 0 bits and a 64-bit (8-byte) message length to produce a padded message with a length that is a multiple of 512 bits. Note that even small messages require at least one hash digest iteration, and an additional hash digest iteration is required for each additional 512-bit message block in the padded message.
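In code form, the same formula with each constant labeled (illustrative function name, not CHIP text):

#include <stdbool.h>

/* Hash digest iterations required to hash a message of message_length bytes;
   all VM-supported algorithms use 64-byte message blocks. */
int hash_digest_iterations (int message_length, bool is_double) {
    return 1                           /* every message needs at least one iteration: the final
                                          block holds the padding 1 bit and the length field */
         + ((message_length + 8) / 64) /* +8 reserves room for the 64-bit (8-byte) length field;
                                          /64 adds one iteration per additional 64-byte block */
         + (is_double ? 1 : 0);        /* OP_HASH160/OP_HASH256 hash the 20/32-byte first digest
                                          again: exactly one additional iteration */
}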

How’s that explanation?

2 Likes

Ok better! I’m still a bit unclear on the minimum part. It sounds like it would require a conditional (or min equivalent) to represent what is in the paragraph? I must still be missing something.

A construct you might use is:

# A: <explanation>
# B: <explanation>
# C: <explanation>
asdf = A + (B * C)

This pretty much removes all room for doubt.

Re:

And:

(Note @bitcoincashautist and I talked out of band, but will summarize here:)

All operations which create or modify stack items (not just math) must be evaluated to “know their total cost”, but this is precisely because our limits already operate one step ahead of actual execution. Changing any particular operation’s sub-linear cost formula(s) to an operand length-based heuristic would be both technically incorrect (in that total cost would no longer match the bytes manipulated) and would not improve worst-case performance. On the arithmetic ops specifically, the rationale is described in Arithmetic Operation Cost and Rationale: Inclusion of Numeric Encoding in Operation Costs.
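For instance, the OP_MUL cost quoted earlier in the thread can be written as (a sketch; the lengths are the encoded byte lengths of the VM numbers, and my reading is that the 2 * c.length term covers the pushed result bytes plus the numeric-encoding component from that rationale):

/* OP_MUL (a b -> c), per the operation cost table quoted above. */
static int op_mul_cost (int a_length, int b_length, int c_length) {
    return 100                     /* base instruction cost                  */
         + (2 * c_length)          /* result bytes pushed + numeric encoding */
         + (a_length * b_length);  /* the multiplication work itself         */
}

Replacing it with a pure operand-length heuristic like 3 * a_length * b_length would, as noted above, stop tracking the bytes actually written without improving the worst case.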

Re cost determination requiring contract execution:

As @cculianu mentioned, the 2020 SigChecks upgrade was careful to preserve some optimizations discussed here. We’ve done the same thing (Rationale: Continued Availability of Deferred Signature Validation) and taken additional steps to limit hashing, as it cannot be deferred.

With these limits, VM implementations can always abort before expensive operations, and the specific design also allows us to avoid special per-operation logic or heuristics for all sub-linear and linear-time operations:

This proposal limits the density of both memory usage and computation by limiting bytes pushed to the stack to approximately 700 per spending input byte (the per-byte budget of 800 minus the base instruction cost of 100).

Because stack-pushed bytes become inputs to other operations, limiting the overall density of pushed bytes is the most comprehensive method of limiting all sub-linear and linear-time operations (bitwise operations, VM number decoding, OP_DUP, OP_EQUAL, OP_REVERSEBYTES, etc.); this approach increases safety by ensuring that any source of linear time complexity in any operation (e.g. due to a defect in a particular VM implementation) cannot create practically-exploitable performance issues (providing defense in depth) and reduces protocol complexity by avoiding special accounting for all but the most expensive operations (arithmetic, hashing, and signature checking).

Additionally, this proposal’s density-based limit caps the maximum memory and memory bandwidth requirements of validating a large stream of transactions, regardless of the number of parallel validations being performed¹.

Alternatively, this proposal could limit memory usage by continuously tracking total stack usage and enforcing some maximum limit. In addition to increasing implementation complexity (e.g. performant implementations would require a usage-tracking stack), this approach would 1) only implicitly limit memory bandwidth usage and 2) require additional limitations on linear-time operations.

  1. While the 10KB stack item length limit and 1000 item stack depth limit effectively limit maximum memory to 10MB plus overhead per validating thread, without additional limits on writing, memory bandwidth can become a bottleneck. This is particularly critical if future upgrades allow for increased contract or stack item lengths, bounded loops, word/function definition, and/or other features that reduce transaction sizes by increasing their operation density.
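To put rough numbers on the 700 figure (my own back-of-the-envelope, not CHIP text): each spending input byte buys a budget of 800; with no loops, an evaluation executes at most roughly one instruction per input byte, and each instruction costs at least the base 100, so at most about 800 - 100 = 700 of each byte’s budget can go to pushed stack bytes. For a 100-byte input, that’s a total budget of ~80,000, of which ~10,000 goes to base costs and ~70,000 remains available for pushes.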

I also highlighted an example above where our approach here is measurably safer than any previously proposed for bitcoin (BCH) or bitcoin-like contract systems: (:rocket:)

1 Like

Right. That’s what I was thinking it would be good to avoid, but I think I see that the whole system is made on this basis where it’s by nature a static + runtime accounting process in many aspects. Makes sense and avoids needing to make bad (either limitingly tight or dangerously loose) assumptions. It’s probably obvious to those steeped in validation process, but not to someone who isn’t.

You know – I was thinking about some optimizations to the VM and one optimization is to “lazy convert” the ScriptNums to vector data – that is, only convert the c result of a + b = c to a vector when it’s used “as a vector”, otherwise keep it as a ScriptNum. Such an optimization is called a lazy evaluation optimization and in some usage patterns it can pay off. Say that c was then to be input into another operation later “as a” script num… you save on not having to serialize it to vector only to unserialize from vector back to script num.

Anyway – what I am getting at is you may have a point @emergent_reasons – basically there is some cost to having to calculate the length… (our num encoding in memory for the stack nums differs from a bigint lib like libgmp… so you pay the cost of encoding to stack each time).

I suppose there is no real other way to cost this though… so, it is what it is. Serialization to vector is fast enough but it’s faster to not have to do that… especially for ephemeral temporary values.
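For what it’s worth, the encoded length itself can at least be computed without materializing the vector. A minimal sketch, assuming CScriptNum-style little-endian sign-magnitude encoding and values that fit in 64 bits (larger ranges would need the same idea generalized):

#include <stdint.h>
#include <stddef.h>

/* Byte length of the minimally-encoded VM number for v, without building
   the byte vector. */
static size_t script_num_encoded_length (int64_t v) {
    if (v == 0) return 0;                          /* zero encodes as the empty vector */
    uint64_t mag = (v < 0) ? ~(uint64_t)v + 1      /* |v| without signed overflow */
                           : (uint64_t)v;
    size_t len = 0;
    for (uint64_t m = mag; m != 0; m >>= 8) ++len; /* magnitude bytes */
    if ((mag >> (8 * (len - 1))) & 0x80) ++len;    /* extra byte keeps the high (sign) bit free */
    return len;
}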

Oh well.

2 Likes

I’d like to talk a bit about this one…

There are a couple of assumptions that Core pushed upon us that are actually really not logical and they seem to have been used to come to the conclusions you made.

In short, I think the approach should be to base the budget on the combined output and input scripts.

The most important assumption that should be challenged is the concept that fees pay for transactions and pay for UTXO ‘storage’. The Satoshi client didn’t solely use that; it was one of various ways, and Core has over time removed and changed things to make fees the only way to differentiate transactions.
A lot of things went wrong there, from the removal of coin-days being used, to the most harmful change: that transaction priorities are no longer a concept decided by miners in a free market (min-relay fee).

If we internalize this, then the idea that a big output-script (aka locking script, aka scriptPubKey) does not have any influence on the budget of execution fails the sniff test. Someone already paid for that, after all.

Look at this from the practical point of view:

imagine a setup where the full node remembers the actually consumed processing budget. The miner then uses that as an input for transaction priority. Think of it this way: when a transaction uses a high amount of CPU (obviously not more than the limit), its priority drops, and that may mean it takes 10 blocks to get mined.
The obvious way to counter this is to have a high coin-days-destroyed or, as a last resort, to pay a larger fee. (All of this is permissionless innovation that will very likely happen when blocks get full.)

Bottom line here is that the fear stated in the rationale of “volatility of worst-case transaction validation performance” is solved in a neat way by simply charging for actual usage (let’s not call it a gas fee, though). Again, in a permissionless way that spreads the load, because that is the most profitable thing to do.

As an aside, the attack has always been based solely on a crappy UTXO implementation; it was never a real problem of CPU consumption. That is because transactions are already validated well before the block header comes in. So worries about that should not remove normal incentives. End aside.

To get back to the point, whatever is mined in the output script has been paid for in some way or other. Ignoring that already-paid-for data in the VM limits CHIP creates wrong incentives. As Calin above wrote:

Regardless of anyone’s opinion of p2sh or whatever, it should be obvious that VM limits should not have such side-effects. Side effects are bad. mkay?

Edit: I think the simple way to do this is to add the output-script of the UTXO and the input-script that unlocks it together in total size and use that size as the input.
This script is ALREADY going to be combined into one and passed as one to the VM. As such it is the natural approach that actually accounts for the VM usage.

Thanks for the comments @tom! That alternative is discussed in Rationale: Use of Input Length-Based Densities. I agree that the operation cost limit could be further increased based on UTXO size, but there are some serious economic and worst-case performance implications that would need to be worked out before it could be done safely.

The net effect of such a change would be:

  1. A very minimal (constant) increase in the budget available to all P2SH contracts (with slight advantage to P2SH32 over P2SH20), and
  2. A many-orders-of-magnitude increase in the worst-case budget available to currently-nonstandard contracts.

Fortunately, we can reduce the scope of this proposal by basing the budget only on input size for now. If there’s demand for it (considering how excessive these retargeted limits will already be for most usage), future upgrades could always raise limits by increasing budgets based on UTXO size or other TX overhead (like other same-TX outputs), by discounting specific operations or usage patterns (e.g. Karatsuba-optimizable multiplication), etc. For now, this CHIP 1) takes the most conservative possible choice for each of these areas and 2) assumes the worst-case implementation of every system (e.g. even a zero-day bug making some implementation’s OP_SWAP inadvertently O(n) cannot be exploited).

That rationale section with the relevant section highlighted:

Use of Input Length-Based Densities

This proposal limits both hashing and operation cost to maximum densities based on the approximate byte length of the input under evaluation (see Rationale: Selection of Input Length Formula).

Alternatively, this proposal could measure densities relative to the byte length of the full containing transaction, sharing a budget across all of the transaction’s inputs. Such a transaction-based approach would provide contracts with the most generous possible computation limits given the transaction fees paid, allowing computationally-expensive inputs to also claim the computing resources purchased via the bytes of transaction overhead, outputs, and of other inputs. Additionally, if a future upgrade were to relax output standardness, transaction-based budgets would also offer non-P2SH (Pay to Script Hash) contracts a greater portion of the computing resources purchased via the transaction’s other bytes, particularly for contracts which rely mainly on introspection for validation and include little or no unlocking bytecode (e.g. the “Merge Threads” script in Jedex).

However, this proposal’s input-based approach is superior in that it: 1) allows contract authors to reliably predict a contract’s available limits regardless of the size and other contents of the spending transaction, 2) ensures that contract evaluations which do not exceed VM limits can be composed together in multi-input transactions without further regard for VM limits, 3) preserves the simplicity of parallelizing input validation without cross-thread communication, 4) more conservatively limits the worst-case validation performance of maliciously-invalid transactions (by failing the malicious transaction earlier), and 5) could be safely expanded into a transaction-based approach by a future upgrade if necessary.

Another alternative to offer greater limits to non-P2SH contracts would be to base densities on the byte length of both locking and unlocking bytecode, i.e. including the Unspent Transaction Output (UTXO) in the input’s budget. However, this alternative approach would increase the volatility of worst-case transaction validation performance: the price of the UTXO’s past bandwidth is paid by the previous transaction’s mining fee, while the price of the UTXO’s storage is paid only by the time-value of associated dust; if compute resources aren’t strictly tied to costs in the present (like the current transaction’s mining fee), the instantaneous computation requirements of transaction validation are not bound (e.g. slow-to-validate UTXOs may be stored up and evaluated in a smaller number of attack transactions). Instead, this proposal bases computation limits only on costs paid in the present, with the 41-byte minimum input length providing a reasonable minimum computation budget.

Finally, this proposal could also increase the operation cost limit proportionally to the per-byte mining fee paid, e.g. for fee rates of 2 satoshis-per-byte, the VM could allow a per-byte budget of 2000 (a 2.5 multiple, incentivizing contracts to pay the higher fee rate rather than simply padding the unlocking bytecode length). However, any allowed increase in per-byte operation cost also equivalently changes the variability of per-byte worst-case transaction validation time; such flexibility would need to be conservatively capped and/or interact with the block size limit to ensure predictability of transaction and block validation requirements. Additionally, future research based on real-world usage may support simpler alternatives, e.g. a one-time increase in the per-byte operation cost budget.

As this proposal attempts to avoid any impact to worst-case validation time – and future upgrades can safely deploy increases in operation cost limits – solutions for increasing operation cost limits are considered out of this proposal’s scope.

1 Like

I think I get your argument. Due to the “invisible” nature of UTXOs … where they aren’t part of the transaction itself but must be resurrected from the “dead” (from the UTXO db) to be evaluated – you can do some crazy things like have an undead army of scripts that are expensive to validate but attached to a simple-enough-looking txn.

Basically you can “prepare” some pretty expensive-to-validate UTXOs and have this one transaction that looks innocent enough… and you can have it totally choke out the system.

Mempool and blockspace is limited – but UTXO DB space is boundless!

What’s more – it’s difficult to sift through what “looks” expensive to validate and what doesn’t when all you have in a TXN is a pointer to some old script living in a DB table…

I think I get why it’s a pandora’s box to credit the UTXO size to the input’s budget… it’s just hard for me to put numbers behind it but I intuitively get the argument, FWIW.

3 Likes

Stated another way: Imagine a txn that references 10,000 prevouts that are all super huge 10kb UTXOs. You end up in this weird territory where the script VM is now required to allow execution of 100MB+ worth of script data, and is required to allow it to pass if it’s all “correct” and returns “true”. All for a paltry ~500kb in txn space in the present.

Seems a bit out of whack even on the face of it.

Really the achilles heel of UTXO locking scripts is they can be all dead sitting in a DB and resurrected to join an undead army in a single txn and they can overwhelm the costs of evaluation. By “crediting” the txn and allocating budget for all these UTXOs you basically just allow that to happen… which seems to be the opposite of where we want to push the design… ?

1 Like

I’ll have to share the brutal truth: if you are saying that this scheme doesn’t protect nodes in the use case of 2, then there is a very strong misalignment somewhere.

Basing this on the pushes (the scriptSig) only has side-effects that make an (apparently) wobbly system even less stable. Side effects like this are huge red flags in a running system. That’s really not OK to introduce. The economic basics of Bitcoin Cash are balanced out carefully, and as long as increasing the VM limits has effectively zero effect on the economic properties, you won’t find any objections.
But this changes the balance. It makes one type of transaction “cheaper” than others. It is not as bad as the segwit discount, but it is the same concept.

I’d rather not introduce changes that have known side-effects at all. If you think this can be fixed in the future, then fix it now.
We can always bump this to next year if that is what it takes…

This sounds like the limits are not actually doing their job, then. The limits were supposed to be about protecting the VM. Lower the limits if that is needed to protect the VM. I mean, this is the basic concept of what you guys have been testing and evaluating.
Please do that based on the input AND output size. Make the max script size smaller, if you need to.
Don’t insert perverse incentives to allow limits to be higher in some cases. That’s not a good idea and is guaranteed to come back to bite you. BTC has some pretty bad examples of that. Let’s not do that here.

At this point I want to ask WHY there is a connection between the transaction size (and presumably fees paid) and the max / budget that a script can use.

Why not simply make that static? Maybe increase it every halving (to keep up with CPU / bandwidth cost).
That would be much simpler and avoid all this discussion, no?

To repost this: the system we have today is not going to be used in the future. None of the properties are hard-coded / consensus. The miners that start to innovate and become most profitable will change them. Things like relay fee, things like the max number of free transactions in a block. They are all mining-software policies that can be changed.

Do NOT bind the limits to the assumption that fees paid entitle one to run larger scripts on other people’s (not that miner’s) CPUs. That cross-connects things that should not be connected.

The proposed VM limits system is agnostic of fees paid. If a miner will accept a 0-fee TX then you get free execution - but still only up to the CPU density limit. It’s just that we limit CPU density per byte of TX in order to protect ALL nodes from DoS. This way, if you see a 1MB TX, you know before even executing it or looking up its prevouts that it can’t take longer than X seconds to validate - no matter the UTXOs it references, because the budget is known before even looking up prevout UTXOs and is decided solely by the 1MB TX’s contents.

Having prevouts contribute to budget would break that.

UTXOs are not executed in the TX that creates them; they’re just dumb data at that point.

When spending TX references them it has to “load” the data and then execute it. And if 1 byte can load 10,000 bytes and execute the 10,000 bytes in a min. size TX - then you can produce poison TXs that inflate CPU density orders of magnitude beyond what is typical.
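To put rough numbers on it (my arithmetic, reusing the 41-byte minimum input and the 10 kB UTXO example from earlier in the thread): a minimal input with empty unlocking bytecode gets a budget based on just ~41 bytes, whether the UTXO it loads is 25 bytes or 10,000 bytes. If the prevout’s length were added to the budget base, that same input’s budget would grow by a factor of roughly (41 + 10,000) / 41 ≈ 245, and an attacker controls how many such inflated UTXOs sit in the UTXO set waiting to be spent in one small TX.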

1 Like

That link doesn’t make sense.

Why grant more CPU ‘rights’ to a transaction that is bigger?

If the answer isn’t about fees paid (as you implied) then that makes it even more weird.

Why not give a static limit to every single input-script (regardless of size, color or race) which it has to stay within? Make sure that limit protects all nodes from DoS by picking something low enough.

Worried you picked something too low? Increase that limit every halving… (that idea stolen from BCA :wink: )