CHIP 2021-05 Targeted Virtual Machine Limits

@cculianu has gotten acceptance from the other BCHN members to start work on implementing the ‘Targeted VM Limits’ CHIP in a branch of BCHN and do testing of those limits. :smiley:

This seems like a great way to move the proposal forward: it lets us investigate and resolve the open questions about the stack memory usage limit - how expensive it is to measure and enforce compared to alternative solutions - and the questions around the need for a stack depth limit.

3 Likes

Copy-pasting my own message from Telegram, as it seems relevant:

Re: “Does this make inscriptions (data storage) easier, that sounds bad”:

The data storage question is pretty vague on BCH, since putting things on the chain is already quite cheap regardless of method. IMHO we mainly have to address three things:

  1. JPEGs are not somehow made cheaper than other data (see also BTC).
  2. JPEGs don’t somehow gain easy entrance into the UTXO set.
  3. JPEGs don’t somehow gain an excessively easy route, like a limitless OP_RETURN. This one in particular is ill-defined and open to debate.

Otherwise there’s nothing preventing people from just dumping a bunch of 200-byte OP_RETURNs on the chain today and assembling them into a JPEG off-chain. ^ Just to demonstrate that any such proposal shouldn’t make things worse than the status quo.

4 Likes

Smart contract developers run into the opcode limit every so often. For example, @dagurval ran into the limit implementing his ‘Stable asset token - incentivized floating peg’ contract, then optimized it, then later ran into the limit again and tweeted “The BCH opcode limit is too damn low!”

I also ran into the opcode limit when designing the improved sha-gate contract.

Sometimes it is possible to split the contract in two with relatively low overhead; other times, like when trying to emulate math, the added complexity simply becomes prohibitive in practice.

Agreed! After spending time thinking through ‘Composite Arithmetic Opcodes’ and ‘The need for additional Math opcodes’, it became clear that it is best to allow this functionality to be emulated first, before it is added natively.

CashScript library functionality, similar to that of Solidity, is currently not very useful because of the 201 opcode count limit. With the newly proposed limit of 10,000 bytes, contract authors could emulate math/opcode functionality and easily abstract it into separate libraries.

6 Likes

I see one major problem here: How do we name the new upgrade? :wink:

Targeted Virtual Machine Limits makes “TVML”. Hm, no good, need to work on this.

TaVLiMa

MiLViTa

How about ViLMa T.? As in Vilma Tlinstone?


:joy:

1 Like

Fantastic! I started cleaning up the CHIP a bit in this PR: Propose for 2025 upgrade by bitjson · Pull Request #6 · bitjson/bch-vm-limits · GitHub

3 Likes

Just an update here: I plan to push this proposal for November 2024 lock-in and May 2025 deployment. The latest revision is available at the repo:

I’ll continue my research on the open issues:

I’d appreciate any feedback here on those issues and on the latest draft proposal. You can also open new issues directly at the repo.

I plan to finalize those items and produce a complete set of VMB test vectors prior to the May 2024 upgrade so that node/VM implementation developers will have plenty of time to review before November.

6 Likes

This is definitely needed, indeed.

I’ve been blunt with Jason over the last few days, and will be again. I hope he will understand.

The bottom line with limits for the VM is that they are, in practice, limits for the C++ implementation, as that is realistically the only place where we’ll run those validations with massive throughput expectations.

For stack limits, for instance, the current design uses a vector of byte arrays. Two, actually, since we have an alt stack too. By far the riskiest part of massive stack usage will then be the calls to malloc (which std::vector does internally). Especially if there is an expectation of doing tens of thousands of script validations every second, malloc is a bad and unstable choice.

The way to arrive at good stack limits, then, is to start a small project (likely less than a man-month of work): a single class that does a single memory allocation at full-node startup and represents the stack for one specific script invocation. @cculianu will likely understand the idea without much explanation.

If that approach is taken, I expect that the script limits for stack can trivially go up to some 25MB without any issues to the full node. Even running 100 threads in parallel means only 5GB static allocation for this.
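A minimal sketch of that idea, assuming a class along these lines (all names are illustrative, not from any BCHN branch):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// One contiguous buffer per validation thread, allocated once at node
// startup and reused for every script evaluation, so no heap allocation
// happens while opcodes execute.
class PreallocatedStack {
public:
    PreallocatedStack(std::size_t capacityBytes, std::size_t maxDepth)
        : buffer(capacityBytes) {
        elements.reserve(maxDepth);   // avoid reallocation during validation
    }

    // Copy an element into the arena; fail if the byte budget is exceeded.
    void push(const uint8_t* data, std::size_t len) {
        if (used + len > buffer.size())
            throw std::runtime_error("stack memory limit exceeded");
        std::copy(data, data + len, buffer.data() + used);
        elements.push_back({used, len});
        used += len;
    }

    // Pop in LIFO order; the arena simply shrinks back to the previous offset.
    void pop() {
        if (elements.empty())
            throw std::runtime_error("pop from empty stack");
        used = elements.back().offset;
        elements.pop_back();
    }

    std::size_t memoryUsed() const { return used; }
    std::size_t depth() const { return elements.size(); }

private:
    struct Element { std::size_t offset; std::size_t len; };
    std::vector<uint8_t> buffer;    // the single up-front allocation
    std::vector<Element> elements;  // views into the arena
    std::size_t used = 0;
};
```

With a layout like this, pushing and popping costs a couple of pointer bumps rather than a heap allocation per stack item, which is what makes much larger stack limits plausible.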

So, to repeat, the bottom line is that we need an actual implementation where full, parallel validation of scripts is possible in order to find the limits that are possible today and indeed for a long time in the future (decades).

Nobody doubts that the current limits should be raised, but my vote for what the new limits should be goes to a technical analysis based on actual written code.

2 Likes

“Targeted”, not increased VM limits

I want to note again that this CHIP does not increase the primary VM limits:

  • The effective processing limit before/after this CHIP is 1000 hash digest iterations.
  • The effective memory usage limit before/after this CHIP is approximately 130KB.

These existing limits are already well in excess of what we need today, and they may be sufficient for decades to come.

I recommend against any increase in these limits for 2025. If developers have use cases for higher limits, those can serve as motivation for future CHIPs.

What does “Targeted” mean?

The existing VM is very naive about limits: attackers can easily abuse the current limits, but honest users are stifled by them. This CHIP remedies the imbalance between attackers and honest users by counting VM processing and memory usage in a harder-to-abuse way.

There’s much more detail in the Summary and Motivation sections. You can also see exactly how we calculate the existing limits in the Technical Specification.

Because this CHIP would change how those primary limits are enforced, all implementations would need to review their performance on the Benchmarks, especially those for OP_IF and OP_ROLL, and if necessary implement O(1) algorithms where relevant, e.g. the O(1) Execution Stack mentioned in the CHIP. While these should generally be orders of magnitude less computationally expensive than the hashing limit, they may become bottlenecks (and thus DoS vectors) if naively implemented.
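For reference, the usual O(1) approach to conditional execution tracks only the nesting depth and the position of the first false branch, instead of scanning a full vector of flags on every opcode. A rough sketch of that technique (illustrative only, not the CHIP’s normative text):

```cpp
#include <cstdint>
#include <stdexcept>

// Constant-time tracking of OP_IF / OP_ELSE / OP_ENDIF nesting.
// "Are we currently executing?" becomes a single comparison instead of
// a scan over one flag per nesting level.
class ConditionStack {
public:
    bool allTrue() const { return firstFalse == NO_FALSE; }
    bool empty() const { return depth == 0; }

    void pushBack(bool value) {              // OP_IF / OP_NOTIF
        if (firstFalse == NO_FALSE && !value)
            firstFalse = depth;              // remember the first false branch
        ++depth;
    }

    void popBack() {                         // OP_ENDIF
        if (depth == 0) throw std::runtime_error("unbalanced OP_ENDIF");
        --depth;
        if (firstFalse == depth) firstFalse = NO_FALSE;
    }

    void toggleTop() {                       // OP_ELSE
        if (depth == 0) throw std::runtime_error("unbalanced OP_ELSE");
        if (firstFalse == NO_FALSE)
            firstFalse = depth - 1;          // top branch was true, now false
        else if (firstFalse == depth - 1)
            firstFalse = NO_FALSE;           // top branch was the first false, now true
        // otherwise an outer branch is already false; nothing observable changes
    }

private:
    static constexpr uint32_t NO_FALSE = UINT32_MAX;
    uint32_t depth = 0;
    uint32_t firstFalse = NO_FALSE;
};
```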

6 Likes

This got a little complicated now.

For increased clarity, if I understand this right, there are 3 main limits in the script VM:

  • 1: Processing limit: 1000 hash digest iterations
  • 2: Memory usage limit: 130KB
  • 3: Total Opcode script length limit: TOO DAMN LOW!

So we will be only increasing the 3rd limit. Please correct me if wrong.

A CHIP like this has two sides. You (and definitely various others in this space!) have a good grasp of one side: the limits that get in your way. You identified some in your CHIP.

The second part here is the limits that make sense for a high-velocity implementation.

As I wrote above, the 130KB limit on script execution is pathetically low. In practice that can be massively increased.

What is also a bit weird in your CHIP is the counting of a select few opcodes, and giving them all the same weight. A SHA-1 and a SHA-256 operation are not the same thing. My personal opinion (without having written the code, mind you) about the proposed 1000 hashing-units limit is that it makes little sense to single out hashing functions.
It is pretty easy to calculate a weight for every operator while running, and enforce a maximum total weight. That instantly solves your for-loop conundrum, and indeed the quadratic issue, in one go with a clean and predictable design.
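A hedged sketch of that per-operation weighting idea; the weights, names, and error handling here are made up for illustration and would need to come from benchmarks of real node code:

```cpp
#include <cstdint>

// Illustrative only: accumulate a per-opcode cost while executing and
// fail the script once a total budget is exhausted.
struct OpCost {
    uint64_t budget;      // maximum total weight for one script evaluation
    uint64_t spent = 0;

    // Charge 'weight' units (possibly scaled by operand size);
    // returns false once the budget is exceeded.
    bool charge(uint64_t weight) {
        spent += weight;
        return spent <= budget;
    }
};

// Hypothetical use inside an interpreter switch:
//
//   OpCost cost{ /* budget = */ 100000 };
//   ...
//   case OP_SHA256:
//       if (!cost.charge(SHA256_BASE + SHA256_PER_BYTE * data.size()))
//           return setError(SCRIPT_ERR_OP_COST);   // hypothetical error code
```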

The 1000 hashing limit, for instance, and the 130KB limit too, is you setting the limits by usage: by what you expect a contract to need. This places the limits squarely on that first side; you have the expertise in the NEEDS of the VM.

I respectfully disagree. The limits are there to protect the full node implementations. Your point that you think users won’t need more is valid and taken. But it reads to me very much like “nobody will ever need more than 640KB”. (For the young ones, that’s a reference, not literal.)

I think the decision on actual limits should be based on actually running code in a native implementation (native meaning compiled to assembly).

If it turns out that it is essentially free to increase the memory usage of the stack from 130KB to 25MB, we can have a discussion about why it would somehow be a problem to do so, since you advise against it (why?). I don’t expect the machines that run the chain to get lower specs over the next decade, after all.

2 Likes

He’s saying “for 2025”, and I’d say at this level of research that’s the conservative approach: the need is satisfied, and we can be sure the change won’t bottleneck anything, since the proposed adjustments are well below the theoretically possible limits.

Later, when more research & development in node software is done, we could further increase them, why not?

5 Likes

Assuming a 1GB block in the future is filled only with such maximum-sized VM smart contracts, how much memory and CPU time would it roughly take, theoretically, to validate such a block?

While it’s cool to give more power to developers, we should be careful not to damage anything either now or in the future.

1 Like

That depends on the number of parallel validation threads the machine has; it does NOT depend on the number of transactions (or the block size) itself.
If it’s a Pi (typically 4 validation threads), then with my example 25MB stack limit it takes about 100MB, which the BCH full node would allocate once for its entire lifetime.

You wouldn’t even notice, in other words. (A Pi 5 comes with 4 or 8GB.)

1 Like

The problem is over-complicating stuff: an open standard that has too many little limits and features. I call it an open standard a little optimistically, but if we want more implementations in the decades to come, we have to treat it as one.

For instance, the quadratic hashing issue caused a consensus change in Bitcoin Cash several years ago based on very similar discussions that did not include actual hardware testing, just models and speculation, even though years earlier the issue had already been shown to be a non-issue. I want to make sure we don’t make the same type of mistake again.

The idea I showed above, of assigning a cost to each opcode as it runs, would replace a lot of different concepts:

Big chunks of:

And indeed the proposed 1000 hash-units from this CHIP draft.

So, to reiterate my point, requirements are great input to this discussion. I fully trust Jason to be capable and motivated to find the lower limits he feels we should be able to process in a Bitcoin Cash full node.
Now let’s check what the actual, tested-in-running-software bottlenecks are before we introduce new consensus-level limits, no?

Edit: sigop limits were introduced by Core a very long time ago, similarly without actually doing the research, which is why we replaced them in BCH at the first opportunity we got; they were useless at protecting the thing they were supposed to protect.
The problem I’m trying to fight here is not a new one. :slight_smile:

Great CHIP!

Got me wondering if there could be any historical UTXOs secured by hashes that would become vulnerable to a length-extension or identical-prefix collision attack as a result of the element size increase.

The answer is no, that’s not how hashes get used in Bitcoin, but I was interested to discover Peter Todd’s SHA1 collision bounty, where the winner extracted a 520-byte collision from a 422kB collision (two PDFs) to satisfy the element size limit.

5 Likes

When thinking about the effects of contract authors having targeted VM limits, it came up that they could serve as a natural intermediate step for seeing whether MAST would be a big optimization long term.

If, with targeted VM limits, we see contract authors write smart contract code in a single script instead of utilizing ‘side car outputs’, then the smart contract bytecode will have a lot of unused functions taking up space. This might be a reasonable tradeoff on Bitcoin Cash, where transaction fees are low and developer time (DX) is an important limiting factor.

MAST optimizes this case where contracts have a bunch of unused functions in the contract code, compared to the case where there are separate ‘side car outputs’ with the logic for unused scripts.

We would then have a better, more realistic estimate/calculation of the space-savings that MAST would offer :grinning_face_with_smiling_eyes:

2 Likes

Stack memory usage limit should count size of both the primary and “alt” stack


Note that there are two stacks in the VM – the primary stack and the “alt” stack (accessed by e.g. OP_FROMALTSTACK and OP_TOALTSTACK).

The current stack depth limit of 1000 applies to the depth of both stacks summed together.

Proposal: You should update the spec to specify that the 130,000-byte limit applies to the altstack and the primary stack summed together.

In this way, this mirrors current logic for the stack depth limit.
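For illustration, a combined check could look something like this minimal sketch (the function name and the plain std::vector stack representation are assumptions, not BCHN code; whether per-element overhead should also be counted is a separate spec question):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using StackElement = std::vector<uint8_t>;
using Stack = std::vector<StackElement>;

// Sum the byte size of every element on both the primary and the alt
// stack, and compare the total against the proposed 130,000-byte limit.
bool withinStackMemoryLimit(const Stack& stack, const Stack& altstack,
                            std::size_t limitBytes = 130000) {
    std::size_t total = 0;
    for (const auto& elem : stack)    total += elem.size();
    for (const auto& elem : altstack) total += elem.size();
    return total <= limitBytes;
}
```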

3 Likes

Specification should specify at what point during execution the stack memory usage limit is enforced


I would suggest the specification state that the 130,000-byte limit is enforced after the currently-executed opcode completes.

Why specify this? Because it makes it very clear and future-proofs the specification.

Note: the current stack depth limit of 1000 is only enforced after the execution of the current opcode completes.

Further rationale: We may introduce future opcodes that are complex and that temporarily exceed limits, only to drop back below the limit once the opcode completes. As such, the limit should be specified to apply at a specific point, and it makes sense to mirror the current behaviour of the stack depth limit (which is only applied after the current opcode completes execution).
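In interpreter terms, that places the check at the bottom of the main loop, after each opcode has fully executed. A rough sketch of the enforcement point only (readOpcode, executeOpcode, stackMemoryUsage and the limit constants are hypothetical placeholders, not any node’s actual API):

```cpp
// Sketch: only the placement of the limit checks matters here.
bool evalScript(const Script& script, Stack& stack, Stack& altstack) {
    std::size_t pc = 0;
    while (pc < script.size()) {
        Opcode op = readOpcode(script, pc);      // advances pc

        // The opcode may transiently exceed the limits while executing...
        if (!executeOpcode(op, stack, altstack))
            return false;

        // ...because the limits are only enforced here, once it completes.
        if (stack.size() + altstack.size() > MAX_STACK_DEPTH)
            return false;
        if (stackMemoryUsage(stack, altstack) > MAX_STACK_MEMORY_BYTES)
            return false;
    }
    return true;
}
```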

4 Likes

In the spec, what is the definition of an “evaluation” for the purposes of hitting the hash ops limit? For example, a p2sh:

(1) evaluates the locking script, hashing the redeemscript via e.g. OP_HASH160 or whatever.
(2) then it does another evaluation using the redeemscript (which itself may contain further hashing op codes).

When (2) begins evaluating redeemscript, is the hash count reset to 0 or does it continue where (1) left off for the counts?

EDIT: So far in my test implementation, I am having it not reset to 0 as it proceeds to the redeemscript, thus the hashOps count is 1 when the redeemScript begins execution in the p2sh case.

EDIT2: For the hash limit: OP_CHECKSIG and friends don’t count towards the hash ops limit, right?
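For what it’s worth, the “carries over” interpretation can be expressed by passing one counter through both evaluations. A minimal sketch (hypothetical names, not the CHIP’s normative wording):

```cpp
#include <cstdint>

// One counter shared by the locking-script evaluation and the subsequent
// P2SH redeem-script evaluation, so the OP_HASH160 performed in step (1)
// already counts against the budget when step (2) starts.
struct HashIterationCounter {
    uint64_t limit;
    uint64_t used = 0;

    // Returns false once the budget is exhausted.
    bool consume(uint64_t iterations) {
        used += iterations;
        return used <= limit;
    }
};
```

Both evaluations would take a reference to the same HashIterationCounter, so in the P2SH case the count is already non-zero when the redeem script begins executing, matching the behaviour described in the first EDIT above.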

3 Likes

I’m late to the party, but I wanted to share: I brought myself up to date with the CHIP, and notably the 130KB limit lacks a rationale.

I understand that it preserves the current effective maximum, but it is not stated why we’d want to keep that maximum.

Thank you in advance, and sorry if this has already been addressed; the thread is way too long to parse in one go!

2 Likes