CHIP 2021-05 Targeted Virtual Machine Limits

tom · November 22, 2023, 10:24pm

From an architectural point of view, I would advice to think about this in the vain of there being two mempools.
First the one you know, inside of the full node. Its task is to make the network function (detect double spends etc) and for normies as well as miners this mempool is important to make sure that block transfer is cheap. Which means, it should have all the transactions that might be in the next block so we can avoid fetching them afterwards.

The second mempool would be one explicitly for mining software.
Ideally it can be connected to a full node and slave to is so it is simply a subset from the full node’s mempool in order to avoid any and all validation. Keep it simple, and all that.

But adding new transactions to this mempool can be done much more leniently and take more CPU time (probably multi-threaded) in order to give weights and scores to transactions. Include the highest scoring ones and bumb less happy ones.

So, I imagine this second mempool to be snapshotted for a block template regularly and various algorithms can be used to make the most profitable block within the parameters given. Because we’ll inevitably end up with more transactions than will fit in blocks and selection then should be done smarter than just on fees.

bitjson · November 23, 2023, 5:06am

Thank you all for digging into this! I’m mostly AFK until December, so I’m sorry I’ll have to come back to thread later. Just wanted to link to recent related discussion on limiting memory usage:

https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2023-October/022082.html

The patch bundles the limiting of total stack item memory usage into the stack depth limit.

Not sure if that ends up being simpler in most implementations than this CHIP’s current approach, but I wonder if that strategy opens up more opportunities for skipping the validation in common cases (so implementations can check memory usage less often).

Anyways, this question is still the main item that I personally think needs more research for this CHIP to be ready:

tom · November 23, 2023, 12:22pm

Flowee the Hub is probably the only one that has a validator which is compltely multi-threading. My thinking there has always been that we simply allocate the number of threads that the hardware supports (including hyperthreading) and queue jobs for the first thread to take.

As a result there will be at most N virtual machines (stack etc) in memory. Where N equals the number of virtual cores. Typically less as those VMs are cheap to start (though that may be an optimization in future).

My thumb-in-air feeling is that if we use 1MB per VM for things like stack, that is not even going to register. I’d be OK with 10MB, to stretching that to 25MB per VM.

Because in my 64cores machine, which can validate 64 transactions at a time, that would mean we allocate 1.6GB of memory max for the validation part. Which in 2024 is perfectly acceptable IMO.

MathieuG · February 1, 2024, 11:37pm

@cculianu has gotten acceptance from the other BCHN members to start work on implementing the ‘Targeted VM Limits’ CHIP in a branch of BCHN and do testing of those limits.

This seems a great way to move the proposal forward, as a way to investigate and resolve the open questions about the stack memory usage limit - how expensive it is to measure and enforce compared to alternative solutions - and questions around the need for a limit stack depth.

im_uname · February 1, 2024, 11:58pm

Copy pasting my own message from telegram, seems relevant:

Re: “Does this make inscriptions (data storage) easier, that sounds bad”:

The data storage question is pretty vague on BCH since putting things on the chain is quite cheap already regardless of method. imho we mainly have to address three things:

Jpegs are not somehow made cheaper than other things (see also btc)
Jpegs don’t somehow gain easy entrance into the UTXO
Jpegs don’t somehow gain an excessively easy route, like a limitless opreturn. This is particularly very ill defined and open to debate.

Otherwise there’s nothing preventing people from just dumping a bunch of 200 byte opreturns on the chain today and assemble them into a jpeg offchain. ^ just to demonstrate that any such proposal doesn’t make it worse than status quo.

MathieuG · February 2, 2024, 12:10am

Smart contract developers run into the opcode limit every so often. For example @dagurval ran into the limit implementing his ‘Stable asset token - incentivized floating peg’ contract, then optimized it, then later ran into the limit again and tweeted “The BCH opcode limit is too damn low!”

I also ran into the opcode limit when designing the improved sha-gate contract.

Sometimes it is possible to split the contract into 2 with relatively low overhead, other times like when trying to emulate math, the added complexity simply becomes practically prohibitive to use.

Agreed! After spending time thinking through ‘Composite Arithmetic Opcodes’ and ‘The need for additional Math opcodes’ it became clear that best to allow this functionality to be emulated first, before it is added natively.

Cashscript library functionality, similar to that of Solidity, is not very useful currently because of the 201 opcode count limit. With the newly proposed limit of 10,000 bytes it would allow contract authors to emulate math/opcode functionality and easily abstract this into separate libraries.

ShadowOfHarbringer · February 2, 2024, 7:03pm

I see one major problem here: How do we name the new upgrade?

Targeted Virtual Machine Limits makes “TVML”. Hm, no good, need to work on this.

TaVLiMa

MiLViTa

How about ViLMa T.? As in Vilma Tlinstone?

bitjson · February 2, 2024, 10:47pm

Fantastic! I started cleaning up the CHIP a bit in this PR: Propose for 2025 upgrade by bitjson · Pull Request #6 · bitjson/bch-vm-limits · GitHub

bitjson · February 13, 2024, 10:28pm

Just an update here: I plan to push this proposal for November 2024 lock in, May 2025 deployment. The latest revision is available at the repo:

I’ll continue my research on the open issues:

I’d appreciate any feedback here on those issues and on the latest draft proposal. You can also open new issues directly at the repo.

I plan to finalize those items and produce a complete set of VMB test vectors prior to the May 2024 upgrade so that node/VM implementation developers will have plenty of time to review before November.

tom · February 14, 2024, 2:02pm

This is definitely needed, indeed.

I’ve been blunt with Jason in the last days, will do so again. Hope he will understand.

The bottom line with limits for the VM is that they are in practice limits for the C++ implementation as that is the only place where this is an actual realistic place we’ll run those validations with massive throughput expectations.

For stack-limits, for instance, the current design uses a vector of byte-arrays. Two actually, since we have an alt-stack too. By far the most risky part of massive usage of the stack will then be the calls to malloc (which std::vector does internally). Especially if there is an expectation of doing tens of thousands of script validations every second, malloc is a bad choice and unstable.

To gather good stack-limits, then, is to start a small project (would likely be less than a man-month of work) which is a single class that does a single mem-allocation at full-node startup which represents the stack for this single specific script invocation. @cculianu will likely understand the idea without much explanation.

If that approach is taken, I expect that the script limits for stack can trivially go up to some 25MB without any issues to the full node. Even running 100 threads in parallel means only 5GB static allocation for this.

So, to repeat, the bottom line is that we need an actual implementation where full, parallel validation of scripts is possible in order to find the limits that are possible today and indeed for a long time in the future (decades).

Nobody doubts that the current limits should be raised, but my vote for what the new limits should be goes to a technical analysis based on actual written code.

bitjson · February 16, 2024, 5:25am

“Targeted”, not increased VM limits

I want to note again that this CHIP does not increase the primary VM limits:

The effective processing limit before/after this CHIP is 1000 hash digest iterations.
The effective memory usage limit before/after this CHIP is approximately 130KB.

These existing limits are already well in excess of what we need today, and they may be sufficient for decades to come.

I recommend against any increase in these limits for 2025. If developers have use cases for higher limits, those can serve as motivation for future CHIPs.

What does “Targeted” mean?

The existing VM is very naive about limits: attackers can easily abuse the current limits, but honest users are stifled by them. This CHIP remedies the imbalance between attackers and honest users by counting VM processing and memory usage in a harder-to-abuse way.

There’s much more detail in the Summary and Motivation sections. You can also see exactly how we calculate the existing limits in the Technical Specification.

Because this CHIP would change how those primary limits are enforced, all implementations would need to review their performance on the Benchmarks, especially those for OP_IF and OP_ROLL, and if necessary, implement O(1) algorithms where relevant, e.g. the O(1) Execution Stack mentioned in the CHIP. While these should generally be orders of magnitude less computationally expensive than the hashing limit, they may be bottlenecks (and thus DOS vectors) if naively implemented.

ShadowOfHarbringer · February 16, 2024, 10:33am

This got a little complicated now.

For increased clarity, if I understand this right, there are 3 main limits in the script VM:

1: Processing limit: 1000 hash digest iterations
2: Memory usage limit: 130KB
3: Total Opcode script length limit: TOO DAMN LOW!

So we will be only increasing the 3rd limit. Please correct me if wrong.

tom · February 16, 2024, 11:07am

A CHIP like this has two sides. You (and definitely various others in this space!), have a good grasp on the one side. The limits that get in your way. You identified some in your CHIP.

The second part here is the limits that make sense for a high-velocity implementation.

As I wrote above, the 130KB limit on script execution is pathetically low. In practice that can be massively increased.

What also is a bit weird in your CHIP is the counting of a select few opcodes. And having them have the same weight. A sha1 and a sha256 operation are not the same thing. My personal opinion (without writing the code, mind you) about the proposed 1000 hashing-units limit is that it makes little sense to single-out hashing functions.
It is pretty easy to calculate a weight for every operator while running, and give a max weight there. Instantly solving your for loop conundrum and indeed the quadratic issue in one go with a clean and predictable design.

The 1000 hashing limit, for instance, and the 130kb too is you setting the limits by usage. What you expect a contract to need. This places the limit squarely in the first side, you have the expertise in the NEEDs of the VM.

I respectfully disagree. The limits are there to protect the full node implementations. Your point of you thinking users won’t need more is valid and taken. But I read it very much like the people never needing more than 640kb. (for the young ones, that’s a reference, not literal)

I think the way to actually decide on actual limits should be based on actually running code in a native implementation (native means compiled to assembly).

If it turns out that it is essentially free to increase the memory usage of the stack from 130KB to 25MB, we can have a discussion why it would be somehow a problem to do so as you advice against it (why did you?). I don’t expect the machines that run the chain to get lower specs over the next decade, afterall.

bitcoincashautist · February 16, 2024, 12:19pm

He’s saying “for 2025”, and I’d say at this level of research that’s the conservative approach: the need is satisfied, and we can be sure the change won’t bottle-neck anything since the proposed adjustments are well below the theoretically possible limits.

Later, when more research & development in node software is done, we could further increase them, why not?

ShadowOfHarbringer · February 16, 2024, 12:25pm

Assuming a 1GB block in the future is only filled with such maximum-sized VM smart contracts, how much memory and CPU time would it roughly theoretically take to validate such block?

While it’s cool to give more power to developers, we should be careful not to damage anything either now or in the future.

tom · February 16, 2024, 12:50pm

Depends on the number of parallel validation threads the machine has. it does NOT depend on the number of transactions (or the blocksize) itself.
If its a Pi, then it takes 100MB in my example 25MB stack limit. It would make the bch full node allocate that once in the entire operation of the full node.

You wouldn’t even notice, in other words. (Pi5 comes with 4 or 8GB)

tom · February 16, 2024, 1:29pm

The problem is over-complicating stuff, an open standard that has too many little limits and features. I call it an open standard a little bit optimistically, but if we want more implementations in the decades to come we have to see it as such.

For instance the quadratic hashing issue caused a consensus change in BitcoinCash various years ago based on very similar discusions that did not include actual real hardware testing. Just models and speculation. Even though various years before that the actual issue was already shown to be a non-issue. I want to make sure we don’t make the same type of mistake again.

The idea I showed above about a cost to each opcode that is being run would replace a lot of different concepts:

github.com/bitjson/bch-vm-limits

Replace `SigChecks` with a limitation on `OP_CODESEPARATOR`?

opened 07:58PM - 06 Feb 24 UTC

bitjson

Unless I'm missing something, SigChecks are only protective in [preventing abuse… of `OP_CODESEPARATOR`](https://gist.github.com/markblundeberg/c2c88d25d5f34213830e48d459cbfb44). With consensus enforcement of the `NULLFAIL` rule ([since Nov. 2017](https://upgradespecs.bitcoincashnode.org/nov-13-hardfork-spec/)), any signature check that triggers validation will be cacheable, and therefore every new signature check requires at least 65 bytes of signature data. The signing serialization components that make up the preimage – `hashPrevouts`, `hashUtxos`, `hashSequence`, `hashOutputs` – are also cacheable across input evaluations within the same transaction, so only the `coveredBytecode` is a meaningful determiner of relevant preimage size. So: only `OP_CODESEPARATOR` can create excessive resource usage by requiring validators to create new preimages for each signature in the evaluation. This is a potentially useful feature in smart contracts, as it allows a single public key to commit to the location within a contract at which different signatures can be included. (I probably wouldn't have included `OP_CODESEPARATOR` as a feature of the VM if we were starting from scratch, but I can't deny that because it already exists, it's worth preserving – it remains the most efficient way of accomplishing that commitment behavior, and would remain valuable in covenant and non-interactive contract use cases even if we e.g. added MAST to BCH.) Following [this discussion](https://gitlab.com/bitcoin-cash-node/bitcoin-cash-node/-/issues/109#note_1758699049), @zander suggests just limiting `OP_CODESEPARATOR` directly, which I'm starting to think is the best approach. I need to do a bit more review to identify reasonable limits. Simply carrying over equivalent limits from `SigChecks` is probably best, with a minor increase to remedy the 2020 issue mentioned in https://github.com/bitjson/bch-vm-limits/issues/8.

Big chunks of:

And indeed the proposed 1000 hash-units from this chip draft.

So, to reiterate my point, requirements are great input to this discussion. I fully trust Jason to be capable and motivated to find the lower limits he feels we should be able to process in a BitcoinCash full node.
Now lets check what the actual, tested-in-running-software bottlenecks are before we introduce new consensus level limits, no?

Edit; sigop-limits were introduced by Core a very long time ago, similarly without actually doing the research which is why we replaced them in bch the first moment we got as they were useless to actually protect the thing they were supposed to protect.
Not a new thing, the problem I’m trying to fight here.

rnbrady · March 25, 2024, 11:25am

Great CHIP!

Got me wondering if there could be any historical UTXOs secured by hashes that would become vulnerable to a length-extension or identical-prefix collision attack as a result of the element size increase.

The answer is no, that’s not how hashes get used in Bitcoin, but I was interested to discover Peter Todd’s SHA1 collision bounty where the winner extracted a 520byte collision from a 422kB collision (two PDFs) to satisfy the element size limit.

MathieuG · April 7, 2024, 3:00pm

When thinking about the effects of contract authors having targeted VM Limits, it came up that targeted VM Limits could serve as a natural intermediate step to see whether MAST would be a big optimization long term.

If with targeted VM limits we see contract authors write smart contract code in a single script instead of utilizing ‘side car outputs’ then the smart contract bytecode will have a lot of unused functions taking up space. This is might a reasonable tradeoff on Bitcoin Cash where transaction fees are low, and developer time (DX) is an important limiting factor.

MAST optimizes this case where contracts have a bunch of unused functions in the contract code, compared to the case where there are separate ‘side car outputs’ with the logic for unused scripts.

We would then have a better, more realistic estimate/calculation of the space-savings that MAST would offer

cculianu · April 10, 2024, 11:20am

Stack memory usage limit should count size of both the primary and “alt” stack

Note that there are two stacks in the VM – the primary stack and the “alt” stack (accessed by e.g. OP_FROMALTSTACK and OP_TOALTSTACK).

The current stack depth limit of 1000 applies to the depth of both stacks summed together.

Proposal: You should update the spec to specify that the 130,000 byte limit should apply to both altstack and the primary stack summed together.

In this way, this mirrors current logic for the stack depth limit.