CHIP 2021-03 Bigger Script Integers

Having larger numbers in script will simplify scripts[1] involving arbitrary satoshi amounts and other calculations. Perhaps even more importantly, it will make scripts safer in cases where they are currently forced to rely on additional artificial calculation techniques[2]. It may also increase the number of viable use cases, for example where artificial calculation techniques cannot cover all needs or cannot fit within script size limits.

Making op_mul available will have similar benefits by allowing multiplication instead of workarounds involving division or artificial calculation techniques.

The combination of larger integers and op_mul can be seen as a set that completes the basic calculation capability of Bitcoin Cash script. If the discussion concludes that they should be handled separately, I will remove op_mul from this topic.

I think this topic deserves discussion and evidence around:

  • Benefits and risks of different sizes (64, 128, 256, 512, …, bignum, other?) from a scripting safety and complexity perspective.
  • Benefits and risks of different sizes from an implementation perspective.
  • Desired behavior with respect to overflow.
  • Desired behavior of op_mul.
  • Should everything be done in one step, or should it be more than one to get to the end result?

[1] Two specific examples of use cases that would benefit from increased simplicity and safety:

  • Mecenas and similar scripts created by @Licho (predicting username here) and others are limited to about 20 BCH in the best case (the 31-bit positive integer limit corresponds to roughly 21.47 BCH in satoshis).
  • The majority of the complexity in the AnyHedge contract is truncation math used to work around the 31-bit positive integer limitation while retaining precision. An alternative 60-bit math technique created by @Tobias_Ruck would handle precision better, but the vast majority of the complexity would still be a workaround for the 31-bit positive integer limitation.

[2] Artificial calculation techniques

7 Likes

Relevant here is the proposal I made for 128-bit integers:

6 Likes

Thanks for starting this topic, and thanks Tobias for the excellent work writing a spec and running benchmarks. I am surprised that the 128-bit math is that slow. This seems to give an argument for only going to 64-bit at first, since C++ implementations have access to e.g. __builtin_smulll_overflow, which on x86-64 compiles down to just two instructions – a multiply and a branch. The idea of limiting to P2SH is very interesting, though I feel like a generalized ‘cost-accounting’ approach (possibly merging sigchecks & other CPU-consuming metrics) would serve better in the long run.
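
For readers who haven't used those builtins: here is a minimal sketch, not taken from any node implementation (the ScriptMul64/ScriptAdd64 names are made up), of what overflow-checked 64-bit arithmetic looks like with them:

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helpers showing overflow-checked 64-bit script math using
// the GCC/Clang builtins; on x86-64 each call typically compiles to the
// arithmetic instruction plus a branch on the overflow flag.
std::optional<int64_t> ScriptMul64(int64_t a, int64_t b) {
    long long result;
    if (__builtin_smulll_overflow(a, b, &result)) {
        return std::nullopt;  // product does not fit in 64 bits -> fail the script
    }
    return static_cast<int64_t>(result);
}

std::optional<int64_t> ScriptAdd64(int64_t a, int64_t b) {
    long long result;
    if (__builtin_saddll_overflow(a, b, &result)) {
        return std::nullopt;  // sum does not fit in 64 bits -> fail the script
    }
    return static_cast<int64_t>(result);
}
```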

2 Likes

I think most people don’t realize that today’s Script operators are hardcoded to use 4-byte math (i.e. 32 bit), which is really quite small. I was only told this a couple of days ago, as I never really paid attention to Script before.

For instance, 64 bits are used to encode the ‘amount’ fields (in satoshis).

This little detail was missing from this thread so far, and in my opinion it is a good problem statement that makes it easy to convince almost all devs.
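
To make the mismatch concrete, here is a rough illustration (the constants are mine, assuming the usual 4-byte operand limit and the 21-million-coin cap):

```cpp
#include <cstdint>

// Script math operands are limited to 4 bytes, i.e. a signed 32-bit range:
constexpr int64_t MAX_SCRIPT_INT = 2147483647;  // 2^31 - 1 satoshis ≈ 21.47 BCH

// Output amounts, however, are serialized as 64-bit integers:
constexpr int64_t MAX_MONEY = 21'000'000LL * 100'000'000LL;  // ≈ 2.1e15 satoshis

static_assert(MAX_MONEY > MAX_SCRIPT_INT,
              "script math cannot even represent the total coin supply");
```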

1 Like

Thanks for focusing on this specific issue. Just for reference, it’s even slightly worse than it sounds for general satoshi calculations since it’s 32 bit signed. We are writing up some details about specific use cases that are broken or enabled by larger integers and multiplication.

So, one interesting plan would be as follows:

  • Update to 1+63 bit math and OP_MUL. Change all overflow handling to reject overflow on output. This way, implementations can use fast 64-bit checked math that is native to current CPUs.
  • In future, expand to larger numbers (1+255 or 1+256 bit?) and regulate big-number arithmetic operations by counting them as partial SigChecks. We would probably want to rename and rescale the sigchecks. I’m not sure what calibration values would be desired, but I expect any reasonable values would be far from causing any reasonable smart contract to run into a sigchecks limit (see the sketch after this list).
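
Purely as a thought experiment, the second bullet could look something like the sketch below; the names and calibration values are invented and only meant to show the shape of the accounting, not a concrete proposal:

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical "count big-number arithmetic as a fraction of a sigcheck"
// accumulator; names and constants are invented for illustration.
struct OpCostAccumulator {
    size_t budget;         // e.g. derived from the transaction's size
    size_t consumed = 0;

    // A signature check costs COST_SIGCHECK; a bignum multiply on operands
    // of a_bytes/b_bytes is charged proportionally to a_bytes * b_bytes,
    // scaled so ordinary contract math stays a small fraction of a sigcheck.
    static constexpr size_t COST_SIGCHECK = 1000;

    void AddSigCheck() { Add(COST_SIGCHECK); }
    void AddBigMul(size_t a_bytes, size_t b_bytes) { Add(a_bytes * b_bytes / 8); }

    void Add(size_t cost) {
        consumed += cost;
        if (consumed > budget) throw std::runtime_error("operation cost limit exceeded");
    }
};
```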
1 Like

As long as that does not reduce the chances of a better future upgrade, that would be nice. If 128 is effectively as easy as 64 as a first step, then that would be much better for contracts like AnyHedge.

Regarding the cost, rather than overloading sigchecks, do you think it might be a good time to consider a script accounting mechanism that would create the groundwork for more rational fee setting?

As far as I can tell, checked 64-bit is in a class of its own as low-hanging fruit, and we don’t need to worry about DoS workarounds (actually, I’m not 100% sure about that! Integer division, even CPU-native division, can be quite slow. Needs analysis). Of course, it should be coded in a way that makes it possible to expand in the future.
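
On the division point, the problem cases are at least easy to enumerate; a sketch (not from any implementation) of what a checked 64-bit division would have to reject:

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical checked 64-bit division for script math.
std::optional<int64_t> ScriptDiv64(int64_t a, int64_t b) {
    if (b == 0) return std::nullopt;  // division by zero
    // INT64_MIN / -1 overflows (the result, 2^63, is not representable)
    // and traps on x86-64, so it must be rejected explicitly.
    if (a == std::numeric_limits<int64_t>::min() && b == -1) return std::nullopt;
    return a / b;
}
```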

It seems that going beyond 64 is in another category – 80, 128, 129, 256, 257, 512, whatever. (I don’t understand why 128 was chosen, by the way; that seems to be focused on some particular applications? 1+256 seems like the next natural step given how Bitcoin is all about 256-bit nonnegative numbers, e.g. maybe someone wants to sha256 something and take a modulo for some clever idea.) Anyway, once we figure out the DoS workarounds, these are all fair game.

The reason for 128 is that Tobias seemed convinced that it is the boundary for doing things efficiently and without needing Boost or additional dependencies in other languages. He also thought it was important for fractional sats, which I personally don’t put weight on for a near-term update.

I think that we need to focus on the medium to long term when deciding to change the scripting language. An evolutionary approach is also slower. Some third-world countries have better cell coverage than places in the USA because they didn’t bother to put down all the copper first – they leapfrogged the technology.

If we define a large bignum, then any interpreter can optimize the cases that fit into a machine word. A large bignum prepares BCH script for the future and provides key niche advantages today. Specifically, it should be very clear to developers that many of the tasks of BCH script will consist of cryptographic operations, so there will be a need. For proof, you can look at ETH: there are a variety of “bignum-out-of-256int” libraries that implement bignums there. We can “leapfrog” ETH by providing native, high-performance bignums.

Worrying about bignum support for various languages puts too much focus on detail versus the broad picture of creating the most useful scripting language. All major languages have ways to import C libraries. Sure, doing so may not be super convenient – but it is not correct to assume that the C++ implementations get an “unfair advantage”. Arguably, C++, being an older and more complex language, assesses a continuous penalty on C++ full nodes. If the C++ implementations gain a small advantage here, how do we weigh that advantage against the disadvantage?

I have created a bignum implementation over libgmp (the GNU Multiple Precision arithmetic library, LGPL licensed – that is the Library GPL, not the standard GPL – so not viral and therefore compatible with MIT and commercial use). A quick search showed me that libgmp already has bindings for all the major languages we are interested in.
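
For anyone who hasn't worked with it, here is a small, self-contained example of arbitrary-precision arithmetic over libgmp using its C++ class interface (gmpxx); it only shows the dependency in action and is not the author's actual implementation:

```cpp
// Build with: g++ example.cpp -lgmpxx -lgmp
#include <gmpxx.h>
#include <iostream>

int main() {
    // A 256-bit-scale value; mpz_class grows as needed.
    mpz_class a("115792089237316195423570985008687907852837564279074904382605163141518161494337");
    mpz_class b = 123456789;

    mpz_class product = a * b;    // exact, arbitrary-precision multiply
    mpz_class remainder = a % b;  // modular reduction is built in

    std::cout << product << "\n" << remainder << "\n";
    return 0;
}
```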

See www.nextchain.cash for detailed specifications.

3 Likes

Do any fleshed-out proposals exist for this smaller upgrade to 64-bit numbers?

Has anyone written any full explorations of whether or not upgrades to the number range should keep the current overflow behavior (overflows are accurate, but further math operations will fail) vs. a new behavior to fail the script immediately on overflows?

Scripts can be badly designed to rely on either mechanic such that an upgrade would “break” their contract. (Of course, I’d prefer these possibly-non-existent scripts be broken than for BCH to take on permanent technical debt to avoid breaking them; I just think it’s important that we publicly note the decision.)
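
For reference, the difference between the two behaviors can be sketched like this (a simplified stand-in for the real script-number handling, not code from any node):

```cpp
#include <cstdint>
#include <stdexcept>

constexpr int64_t MAX_OPERAND = 2147483647;  // current 4-byte operand limit

int64_t CheckOperand(int64_t v) {
    if (v > MAX_OPERAND || v < -MAX_OPERAND) throw std::runtime_error("operand out of range");
    return v;
}

// Current behavior: inputs are range-checked, but the result of an
// operation may exceed the range. The script only fails later, if that
// oversized result is fed back into another math operation.
int64_t AddCurrent(int64_t a, int64_t b) {
    return CheckOperand(a) + CheckOperand(b);  // result may exceed MAX_OPERAND
}

// Alternative under discussion: fail immediately when the *result*
// overflows, so a script can never observe an out-of-range value.
int64_t AddFailFast(int64_t a, int64_t b) {
    int64_t r = CheckOperand(a) + CheckOperand(b);
    if (r > MAX_OPERAND || r < -MAX_OPERAND) throw std::runtime_error("overflow");
    return r;
}
```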

3 Likes

No one has done it. I also think this is a relatively low-hanging fruit of investigation. Find all transactions since some reasonable point in the past, say the CDS upgrade, and do a regression test on actual rules vs. drop-in 64 bit numbers. If some outcomes are different due to dependence on overflow behavior, it will give insight into actual risk. If no outcomes are different then :man_shrugging:.

Just as a guess, using the script testing system that @rosco developed using Libauth seems reasonable if Libauth could provide an engine with a signed 64-bit drop-in replacement. Hmm… but then data sizes may have knock-on effects. What do you think of the possibility?

1 Like

Cool, I might try to do that on-chain analysis in the next few months. Of course, whether or not they already exist, if several implementations start working on a solid proposal, I expect someone will broadcast transactions which are broken by it before deployment (griefing). So maybe we should also clarify in the spec that “contracts designed to break after script number range upgrades will require migration”.

if Libauth could provide an engine with a signed 64-bit drop-in replacement […]

Sure, that should be fairly easy. When instantiating a VM instance, you can modify the instruction set to replace each arithmetic operation with a version that supports signed 64-bit inputs and results, e.g. transaction introspection alternative VM.

2 Likes

Until this past week, I was somewhat indifferent to whatever solution was eventually decided here, but I thought about it a lot more while writing these two posts:

I’m fairly convinced now that we want to focus on ~64 bit integers (if we can be clever and do 63 bit for a performance boost, that would be fine too).

As far as I can tell, math on larger numbers is only valuable for crypto applications, which we don’t really want done using the opcode system:

Right now crypto operations are at least thousands of times more expensive than all other non-crypto operations (assuming we fix/optimize some edge cases with particular implementations of OP_ROLL and OP_IF/OP_ELSE). As such, it isn’t even worth measuring the performance of non-crypto operations when considering fees or DOS limits: non-crypto operations are totally fungible.

If we start supporting e.g. arbitrary length math operations, it becomes necessary to somehow measure or estimate their performance at runtime (based on the numbers actually used) for both fee estimation and DOS limits. The 1 sat/byte fee + opcode limit is no longer adequate (and static analysis is made even harder).

These costs might be worth it if there were important long-term use cases for arbitrary-length math, but as far as I can tell, the only potential application is manually-implemented crypto systems. On that topic (from the post above):

When thinking about new crypto algorithms, it’s probably most useful to think of the VM as exposing them over a VM-friendly API. Contracts don’t manually use large integer math operations to implement signature checking, they use OP_CHECKSIG. Again, it’s possible that future crypto systems will require different, more complex API interfaces, but I’ve tried to account for that by estimating high – 4 per 10 years. (And remember, in the past 10 years, we added ~2: OP_CHECKDATASIG and OP_CHECKDATASIGVERIFY.)

Built-in crypto opcodes will always be superior to equivalent, opcode-implemented crypto algorithms because they are both easier to optimize and use more concise VM bytecode (one opcode vs. an algorithm written using several opcodes). So rather than making numbers and the existing math operations support arbitrary sizes, it seems more likely that we’ll provide an “opcode interface” to load/manipulate/validate messages and keys using new crypto systems.

So I think the value proposition of >64 bit math boils down to: will we have a “crypto diaspora” in which users of Bitcoin Cash want to simultaneously switch to a vast number of new crypto systems which can’t or won’t be implemented as built-in opcodes?
I think this is unlikely. New crypto systems typically undergo extensive review before widespread industry use – it seems unlikely that e.g. a significant number of Bitcoin Cash users/businesses will want to start using a new crypto system unless it has been reviewed (and is academically and technically popular) enough to be implemented as one or more built-in opcodes.

There is the possibility that Bitcoin Cash development slows or stops, such that even a large plurality of users can’t “push through” the addition of new crypto opcodes. I think this is unlikely too: if Bitcoin Cash is to continue to survive, it will have to maintain some crypto-agility – at least to remedy future breaks in foundational cryptographic primitives (e.g. developments in quantum computing).

So with that in mind, I’m fairly convinced that:

  1. we should implement ~64 bit math, and
  2. there is little, long-term, practical use for >64 bit math.

I still need to consider the largest practical sizes for simple arithmetic on future, fractional satoshi values, but I suspect even that is not likely to be worth jumping from 64 bit to 128 bit math – very few covenants/applications will be dealing with large enough sums to hit those limits, and those applications are likely capable of using either rounded numbers (e.g. do the initial math using 1,000s of satoshis, then multiply to get back to fractional satoshis) or emulated precision (like the long division hack).
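
A quick back-of-the-envelope check of the fractional-satoshi concern (my own arithmetic, not from the post above): even at 1/1000-satoshi precision, the entire coin supply still fits comfortably in a signed 64-bit value.

```cpp
#include <cstdint>
#include <limits>

constexpr int64_t TOTAL_SUPPLY_SATS   = 21'000'000LL * 100'000'000LL;  // ≈ 2.1e15
constexpr int64_t TOTAL_SUPPLY_MILLIS = TOTAL_SUPPLY_SATS * 1000;      // ≈ 2.1e18

static_assert(TOTAL_SUPPLY_MILLIS < std::numeric_limits<int64_t>::max(),
              "the whole supply at milli-satoshi precision fits in 63 bits");
```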

So: does anyone know of any hypothetical use case for that sort of non-crypto arithmetic beyond 64 bit numbers?

3 Likes

This is a fantastic write-up. At the very least, I agree about not having arbitrary precision.

I do have one issue, although not a use case per se. As I understand it, ETH uses 256-bit numbers and a de facto fixed-point standard has emerged there. I can imagine there may be significant value in piggybacking on infrastructure and standards established there, but it is an abstract understanding. Do you already have insight into potential value there, such as optimized numerical libraries, validated math procedures, etc.?

1 Like

What about cryptographic operations, as most of them require numbers larger than 64 bits?

1 Like

After that analysis on optimizing crypto math, I now consider the current Bitcoin Cash VM strategy to be strictly superior to any systems which attempt to mix simple arithmetic operations and cryptographic math operations. I don’t see any use case where mixing the two might be useful, and it’s a major performance bottleneck.

In a sense (very oversimplified): to mix simple arithmetic and crypto, we’d need “gas” too, just like ETH.

By keeping them separate, the Bitcoin Cash VM is plenty efficient enough for maliciously invalid transactions to be handled by the existing peer-banning mechanism – no need to add a wasteful pay-per-operation layer.

Mixing simple arithmetic and crypto in the VM sounds plausible, but if you look closely, it doesn’t really work.

Another way to think about this: crypto operations in BCH don’t really operate on numbers, they operate on strictly formatted objects like signatures and public keys.

A “signature” is theoretically a “point on a curve”, but in practice, it is represented in the VM as e.g. a strict DER-encoded (BIP66) array of bytes followed by the Bitcoin-specific “sighash” byte, which encodes the VM-specific signing serialization algorithm to be used during validation: 0x30 [total-length] 0x02 [R-length] [R] 0x02 [S-length] [S] [sighash byte]. Bitcoin Cash Schnorr signatures use the encoding [R] [S] [sighash byte]. Public keys can be in either compressed or uncompressed form (these days, they’re almost all compressed). Uncompressed public keys encode both the X and Y coordinates; compressed public keys encode only the X coordinate and whether Y is odd (0x03) or even (0x02).
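
To illustrate that a signature is parsed as a structured byte array rather than as a number, here is a simplified structural check of the DER + sighash layout described above (real consensus code, e.g. BIP66 strict encoding, enforces many more rules):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Checks only the coarse layout: 0x30 [total-length] 0x02 [R-length] [R]
// 0x02 [S-length] [S] [sighash byte]. Illustrative, not consensus code.
bool LooksLikeDerSigWithHashType(const std::vector<uint8_t>& sig) {
    if (sig.size() < 9) return false;            // too short to hold the structure
    if (sig[0] != 0x30) return false;            // compound-type tag
    if (sig[1] != sig.size() - 3) return false;  // total length excludes tag, length, sighash byte
    if (sig[2] != 0x02) return false;            // R is an integer
    size_t r_len = sig[3];
    if (5 + r_len >= sig.size()) return false;   // R must leave room for the S header
    if (sig[4 + r_len] != 0x02) return false;    // S is an integer
    size_t s_len = sig[5 + r_len];
    // Everything must add up to leave exactly one trailing sighash byte.
    return 6 + r_len + s_len + 1 == sig.size();
}
```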

These are not numbers; to perform (crypto) math on them, the VM math operations would need to be modified to parse and handle these particular data structures. But we wouldn’t want to do that for the performance reasons mentioned above – that gives us the same disadvantages as Ethereum math.

Instead, new crypto operations in the Bitcoin Cash VM are best implemented in ways which are compatible with the existing crypto API: separate arithmetic and crypto operations.

Note, this doesn’t prevent us from representing crypto-sized numbers anywhere in the VM. Crypto math operations can still – for example – return a 32-byte X value, and that value could in turn be consumed by other crypto math operations. But the arithmetic operations should read it as any other 32-byte binary string and return an error.

3 Likes

Thanks for your reply! I think I agree about cryptographic operations. However, I want to clarify that I was talking about the potential opportunity cost of re-inventing mathematical proofs, libraries, and synthetic operations (like fixed-point libraries) that are all based on 256-bit numbers. What is your opinion about the size of that opportunity cost? If it is huge, then maybe using 256-bit is a good idea just for that (not to enable cryptographic operations). If it is small or doesn’t exist, then it’s a complete non-issue.

1 Like

Ah! Well, I might be overlooking something, but I think the opportunity cost goes the other way (to the disadvantage of ETH) – the rest of the world mostly uses 64-bit math, such that ETH had to develop most of its 256-bit math in the various implementations. So we’d probably have an easier time with 64-bit math.

I’m maybe not familiar enough with the ETH ecosystem to know of any contract development tooling that would be useful, but I also haven’t identified any use cases where the math I need is challenging to write in BCH VM bytecode. (The primary use case I’m working on is an on-chain token sale with increasing prices per lot and some modulation based on outstanding supply. Only really needs OP_ADD and/or OP_MUL.)

1 Like