CHIP 2021-03 Bigger Script Integers

markblundeberg · August 16, 2020, 11:13pm

So, one interesting plan would be as follows:

Update to 1+63 bit math and OP_MUL. Change all overflow handling to reject overflow on output. This way, implementations can use fast 64-bit checked math that is native to current CPUs.
In future, expand to larger numbers (1+255 or 1+256 bit?) and regulate big-number arithmetic operations by counting them as partial SigChecks. We would probably want to rename and rescale the sigchecks. I’m not sure what calibration values would be desired, but I expect any reasonable values would be far from causing any reasonable smart contracts from running into a sigchecks limit.

emergent_reasons · August 17, 2020, 2:25am

If that does not reduce the chances of a future better upgrade, that would be nice. If 128 is effectively as easy as 64 as a first step, then that would be much better for contracts like AnyHedge.

Regarding the cost, rather than overloading sigchecks, do you think it might be a good time to consider a script accounting mechanism that would create the ground work for more rational fee setting?

markblundeberg · August 17, 2020, 4:01am

As far as I can tell, checked 64 bit is in a class of its own as low hanging fruit and we don’t need to worry about DoS workarounds (Actually I’m not 100% sure about that! Integer division, even CPU-native division, can be quite slow. Needs analysis.). Of course it should be coded in a way that makes it possible to expand in future.

It seems that going beyond 64 is in another category – 80, 128, 129, 256, 257, 512, whatever. (I don’t understand why 128 is chosen btw, that seems to be focussed on some particular applications? 1+256 seems like the next natural step given how bitcoin is all about 256-bit nonnegative numbers. E.g. maybe someone wants to sha256 something and take a modulo for some clever idea.) Anyway once we figure out the DoS workarounds these are all fair game.

emergent_reasons · August 17, 2020, 5:49am

The reason of 128 is that Tobias seemed convinced that it is the boundary for doing things efficiently and without needing boost / additional dependencies in other languages. Also he thought it was important for fractional sats which I personally don’t put weight on for a near-term update.

andrewstone · August 20, 2020, 4:32pm

I think that we need to focus on the medium to long term when deciding to change the scripting language. And also an evolutionary approach is slower. Some 3rd world countries have better cell coverage than places in the USA because they didn’t bother to put down all the copper first – they leap-frogged the technology.

If we define a large bignum, then any interpreter can optimize cases that fit into a machine word. A large bignum prepares BCH script for the future, and provides key niche advantages today. Specifically, it should be very clear to developers that many of the tasks of BCH script will consist of cryptographic operations so there will be a need. And for proof, you can look at ETH: there are a variety of “bignum-out-of-256int” libraries that implement bignums there. We can “leapfrog” ETH by providing native, high performance bignums.

Worry about bignum support for various languages is putting too much focus on detail verses the broad picture of creating the most useful scripting language. All major languages have ways to import C libraries. Sure doing so may not be super convenient – but it is not correct to assume that the C++ impls get an “unfair advantage”. Arguably, C++, being an older and more complex language, assesses a continuous penalty on C++ full nodes. If the C++ implementations gain a small advantage here, how can we assess this advantage verses disadvantage?

I have created a bignum implementation over libgmp (gnu multiprecision library, LGPL licensed - that is the: LIBRARY GPL, not the standard GPL – so not viral and therefore compatible with MIT and commercial use). A quick search showed me that libgmp already contains bindings for all the major languages we are interested in.

See www.nextchain.cash for detailed specifications.

bitjson · January 13, 2021, 10:43pm

Do any fleshed-out proposals exist for this smaller upgrade to 64-bit numbers?

bitjson · January 13, 2021, 10:47pm

Has anyone written any full explorations of whether or not upgrades to the number range should keep the current overflow behavior (overflows are accurate, but further math operations will fail) vs. a new behavior to fail the script immediately on overflows?

Scripts can be badly-designed to rely on either mechanic such that an upgrade would “break” their contract. (Of course, I’d prefer these possibly-non-existent scripts be broken than for BCH to take on permanent technical debt to avoid breaking them, I just think it’s important that we publicly note the decision.)

emergent_reasons · January 14, 2021, 3:01am

No one has done it. I also think this is a relatively low-hanging fruit of investigation. Find all transactions since some reasonable point in the past, say the CDS upgrade, and do a regression test on actual rules vs. drop-in 64 bit numbers. If some outcomes are different due to dependence on overflow behavior, it will give insight into actual risk. If no outcomes are different then .

Just as a guess, using the script testing system that @rosco developed using Libauth seems reasonable if Libauth could provide an engine with a signed 64-bit drop-in replacement. Hmm… but then data sizes may have knock-on effects. What do you think of the possibility?

bitjson · January 19, 2021, 11:10pm

Cool, I might try to do that on-chain analysis in the next few months. Of course, whether or not they already exist, if several implementations start working on a solid proposal, I expect someone will broadcast transactions which are broken by it before deployment (griefing). So maybe we should also clarify in the spec that “contracts designed to break after script number range upgrades will require migration”.

if Libauth could provide an engine with a signed 64-bit drop-in replacement […]

Sure, that should be fairly easy. When instantiating a VM instance, you can modify the instruction set to replace each arithmetic operation with a version that supports signed 64-bit inputs and results, e.g. transaction introspection alternative VM.

bitjson · March 23, 2021, 6:39pm

Until this past week, I was somewhat indifferent to whatever solution was eventually decided here, but I thought about it a lot more while writing these two posts:

I’m fairly convinced now that we want to focus on ~64 bit integers (if we can be clever and do 63 bit for a performance boost, that would be fine too).

As far as I can tell, math on larger numbers is only valuable for crypto applications, which we don’t really want to be done using the opcode system:

Right now crypto operations are at least 1000s of times more expensive than all other non-crypto operations (assuming we fix/optimize some edge cases with particular implementations of OP_ROLL and OP_IF/OP_ELSE). As such, it isn’t even worth measuring the performance of non-crypto operations when considering fees or DOS limits: non-crypto operations are totally fungible.

If we start supporting e.g. arbitrary length math operations, it becomes necessary to somehow measure or estimate their performance at runtime (based on the numbers actually used) for both fee estimation and DOS limits. The 1 sat/byte fee + opcode limit is no longer adequate (and static analysis is made even harder).

These costs may be worth it if there were important long-term use cases for arbitrary length math, but as far as I can tell, the only potential application is manually-implemented crypto systems. On the topic (from above post):

When thinking about new crypto algorithms, it’s probably most useful to think of the VM as exposing them over a VM-friendly API. Contracts don’t manually use large integer math operations to implement signature checking, they use OP_CHECKSIG. Again, it’s possible that future crypto systems will require different, more complex API interfaces, but I’ve tried to account for that by estimating high – 4 per 10 years. (And remember, in the past 10 years, we added ~2: OP_CHECKDATASIG and OP_CHECKDATASIGVERIFY.)

Built-in crypto opcodes will always be superior to equivalent, opcode-implemented crypto algorithms because they are both easier to optimize and use more concise VM bytecode (one opcode vs. an algorithm written using several opcodes). So rather than making numbers and the existing math operations support arbitrary sizes, it seems more likely that we’ll provide an “opcode interface” to load/manipulate/validate messages and keys using new crypto systems.

So I think the value proposition of >64 bit math boils down to: will we have a “crypto diaspora” in which users of Bitcoin Cash want to simultaneously switch to a vast number of new crypto systems which can’t or won’t be implemented as built-in opcodes?
I think this is unlikely. New crypto systems typically undergo extensive review before widespread industry use – it seems unlikely that e.g. a significant number of Bitcoin Cash users/businesses will want to start using a new crypto system unless it has been reviewed (and is academically and technically popular) enough to be implemented as one or more built-in opcodes.

There is the possibility that Bitcoin Cash development slows or stops, such that even a large plurality of users can’t “push through” the addition of new crypto opcodes. I think this is unlikely too: if Bitcoin Cash is to continue to survive, it will have to maintain some crypto-agility – at least to remedy future breaks in foundational cryptographic primitives (e.g. developments in quantum computing).

So with that in mind, I’m fairly convinced that:

we should implement ~64 bit math, and
there is little, long-term, practical use for >64 bit math.

I still need to consider the largest practical sizes for simple arithmetic on future, fractional satoshi values, but I suspect even that is not likely to be worth jumping from 64 bit to 128 bit math – very few covenants/applications will be dealing with large enough sums to hit those limits, and those applications are likely capable of using either rounded numbers (e.g. do the initial math using 1,000s of satoshis, then multiply to get back to fractional satoshis) or emulated precision (like the long division hack).

So: does anyone know of any hypothetical use case for that sort of non-crypto arithmetic beyond 64 bit numbers?

emergent_reasons · March 24, 2021, 1:34am

This is a fantastic write-up. At the very least, I agree about no arbitrary-precision.

I do have one issue although not a use-case per se. As I understand it, ETH uses 256 bit numbers and a de facto fixed-point standard has emerged there. I can imagine there may be significant value in piggy-backing on infrastructure and standards established there, but it is an abstract understanding. Do you already have insight into potential value there such as optimized numerical libraries, validated math procedures, etc?

Griffith · March 24, 2021, 2:04pm

what about cryptographic operations as most of them require numbers larger than 64 bits?

bitjson · April 30, 2021, 3:16pm

After that analysis on optimizing crypto math, I now consider the current Bitcoin Cash VM strategy to be strictly superior to any systems which attempt to mix simple arithmetic operations and cryptographic math operations. I don’t see any use case where mixing the two might be useful, and it’s a major performance bottleneck.

In a sense (very oversimplified): to mix simple arithmetic and crypto, we’d need “gas” too, just like ETH.

By keeping them separate, the Bitcoin Cash VM is plenty efficient enough for maliciously invalid transactions to be handled by the existing peer-banning mechanism – no need to add a wasteful pay-per-operation layer.

Mixing simple arithmetic and crypto in the VM sounds plausible, but if you look closely, it doesn’t really work.

Another way to think about this: crypto operations in BCH don’t really operate on numbers, they operate on strictly formatted objects like signatures and public keys.

A “signature” is theoretically a “point on a curve”, but in practice, it is represented in the VM as e.g. a strict DER encoded (BIP66) array of bytes followed by the bitcoin-specific “sighash” byte, which encodes the VM-specific signing serialization algorithm to be used during validation: 0x30 [total-length] 0x02 [R-length] [R] 0x02 [S-length] [S] [sighash byte]. Bitcoin Cash Schnorr signatures use the encoding [R] [S] [sighash byte]. Public keys can be in either compressed or uncompressed form (these days, they’re almost all compressed). Uncompressed public keys encode both the X and Y coordinates, compressed public keys only encode the X coordinate and whether Y is odd (0x03) or even (0x02).

These are not numbers; to perform (crypto) math on them, the VM math operations would need to be modified to parse and handle these particular data structures. But we wouldn’t want to do that for the performance reasons mentioned above – that gives us the same disadvantages as Ethereum math.

Instead, new crypto operations in the Bitcoin Cash VM are best implemented in ways which are compatible with the existing crypto API: separate arithmetic and crypto operations.

Note, this doesn’t prevent us from representing crypto-sized numbers anywhere in the VM. Crypto math operations can still – for example – return a 32-byte X value, and that value could in turn be consumed by other crypto math operations. But the arithmetic operations should read it as any other 32-byte binary string and return an error.

emergent_reasons · May 1, 2021, 3:39pm

Thanks for your reply! I think I agree about cryptographic operations. However I want to clarify that I was talking about the potential opportunity cost of re-inventing mathematical proofs, libraries, synthetic operations (like fixed point libraries) that are all based on 256 bit numbers. What is your opinion about the size of that opportunity cost? If it is huge, then maybe using 256-bit is a good idea just for that (not to enable cryptographic operations). If it is small or doesn’t exist, then it’s a complete non-issue.

bitjson · May 2, 2021, 11:15pm

Ah! Well I might be overlooking something, but I think the opportunity cost goes the other way (to the disadvantage of ETH) – the rest of the world mostly uses 64-bit math such that ETH had to develop most of their 256-bit math in the various implementations. So we’d probably have an easier time with 64-bit math.

I’m maybe not familiar enough with the ETH ecosystem to know of any contract development tooling that would be useful, but I also haven’t identified any use cases where the math I need is challenging to write in BCH VM bytecode. (The primary use case I’m working on is an on-chain token sale with increasing prices per lot and some modulation based on outstanding supply. Only really needs OP_ADD and/or OP_MUL.)

emergent_reasons · May 3, 2021, 12:16am

Great answer. I hadn’t thought of it from that angle. Thank you.

bitjson · June 9, 2021, 10:55pm

Hi all, I just opened an MR to update the CHIP for this with a full spec for the 64-bit + OP_MUL proposal. I’d appreciate any feedback either here or on the MR:

(Direct link to contents:)

BigBlockIfTrue · June 10, 2021, 12:00pm

The spec is excessively complex because the normative definition of overflows refers to X86-64 GNU C Integer Overflow Builtins, instead of simply referring to mathematics. I would recommend to rewrite the definition of overflows using simple mathematics. For example, use this definition of 64-bit math:

The valid number range is from -9223372036854775807 to +9223372036854775807 inclusive. Both inputs and outputs must lie within this range. Otherwise, an overflow error occurs, causing script execution to fail immediately.

Implementations are then free to decide how to compute whether the result lies within this range and if so, what the result is. Any usage of X86-64 GNU C Integer Overflow Builtins in combination with blacklisting -9223372036854775808 then becomes an optional implementation consideration that can be mentioned in a separate section.

freetrader · June 10, 2021, 3:42pm

@bitjson at this point I only have one comment, and that is on the test vectors which at the moment cover only valid and invalid representations of integers, but should, I think, be extended to cover the upgraded operations.
i.e. testing of operations with results that would’ve previously just exceeded the 32 bit floor/ceiling and testing the new floor and ceiling with overflow / underflow cases from various operators.

EDIT: nevermind the above, I was looking at the script vectors section. Now I see you already have a separate Test Cases section which is still in progress.

Also, reading back in this thread to the good idea from @emergent_reasons about a regression test on past transactions … I suppose that could be done by just revalidating past blocks going back suitably far with the upgraded integer logic. Really curious to see if there would be any surprises there.

bitjson · June 21, 2023, 5:33pm

To follow up on this comment:

Some recent discussion here:

The need for additional Math opcodes

That remains the case – 64-bit numbers are an excellent, conservative choice for our “base numbers” (though we were careful to ensure future upgrades can expand further), and I still think that range supports most practical use cases.

It’s important to distinguish between numbers used for representation of real values (like satoshis, shares, votes, tickets, etc.) and numbers used in intermediate calculations by contracts. Bumping up our base number range to 128 or 256 bits wouldn’t help with those intermediate values; we’d just need to e.g. emulate 512-bit fixed precision multiplication to fully support the new larger number values (or we’re back to this same problem of needing to sanitize inputs by limiting them to some value lower than the maximum range of the base numbers).

You can see this in practice on Ethereum: 2^256 numbers haven’t eliminated the need for big integer libraries – instead, contracts need to emulate even higher precision to avoid sanitizing inputs of the larger base numbers. E.g. In practice, essentially no tokens actually need 18 decimal places in the real world (just the network fees often cost over half of those decimals places), but the contract ecosystem is forced to pay an ongoing cost for that unused “base” precision.

Anyways, I think the real solution here is to figure out the most byte-efficient “macro” to emulate higher-precision computations in BCH VM bytecode, then optimize that macro to be relatively byte-efficient without fully offloading it to the VM implementation. This keeps bytecode length tied to computation costs, incentivizing contract authors to conserve both (without requiring the overhead of some arbitrary operation-counting or gas system). Any optimizations here also continue to pay dividends even if future VM numbers expand to 128-bit or larger numbers, as it seems that higher-precision math is always useful for some intermediate calculations.

(N.B.: My perspective on distinguishing arithmetic from crypto math is also informed by an expectation that sidechains are significantly more efficient than automata for a wide variety of use cases.)