The need for additional Math opcodes

bitcoincashautist · June 22, 2023, 11:00am

I think fixed-point arithmetic is the direction we should look into: Fixed-point arithmetic - Wikipedia

IMO it’s a natural extension of our basic integer arithmetic, and we’d have consistent granularity of results. The exp and log ops could be popping 4 stack items (2 fractions of 2x 64-bit ints) and pushing 2 stack items (a fraction of 2x 64-bit ints).

bitcoincashautist · June 27, 2023, 8:30am

Let’s consider the OP_MULDIV primitive:

Word	Value	Hex	Input	Output	Description
OP_MULDIV	???	0x??	a b c	out	a is first multiplied by b then divided by c. The intermediate product CAN be in the INT128 range, while the final result MUST be in the INT64 range.

We can implement INT32 version as a macro op_muldiv: OP_ROT OP_ROT OP_MUL OP_SWAP OP_DIV but then we’re limited to INT32 precision for the result. Still, we can use the macro just to demonstrate utility. Consider this example:

Multiply some x with 0.999998^10 at max precision:

<1000000000> <0xffffff7f> OP_DUP
<999998> <1000000>
OP_2DUP OP_2ROT OP_2ROT op_muldiv OP_2SWAP
OP_2DUP OP_2ROT OP_2ROT op_muldiv OP_2SWAP
OP_2DUP OP_2ROT OP_2ROT op_muldiv OP_2SWAP
OP_2DUP OP_2ROT OP_2ROT op_muldiv OP_2SWAP
OP_2DUP OP_2ROT OP_2ROT op_muldiv OP_2SWAP
OP_2DUP OP_2ROT OP_2ROT op_muldiv OP_2SWAP
OP_2DUP OP_2ROT OP_2ROT op_muldiv OP_2SWAP
OP_2DUP OP_2ROT OP_2ROT op_muldiv OP_2SWAP
OP_2DUP OP_2ROT OP_2ROT op_muldiv OP_2SWAP
op_muldiv
OP_SWAP op_muldiv

Result: 999979999

The above example essentially unrolls an OP_POW by using the naive method, which could be further optimized by the square & multiply algorithm:

<1000000000> <0xffffff7f> OP_DUP
<999998> <1000000> op_muldiv
OP_DUP OP_2 OP_PICK op_muldiv OP_DUP
OP_DUP OP_3 OP_PICK op_muldiv
OP_DUP OP_3 OP_PICK op_muldiv
OP_2 OP_PICK op_muldiv
OP_SWAP op_muldiv

With bigger VM limits, one could use this as a building block to implement other functions using Taylor series.

I found that Ethereum folks are considering it too:

and they already have two similar opcodes that may be useful for other purposes: OP_ADDMOD, OP_MULMOD

emergent_reasons · June 27, 2023, 11:26am

Just a historical note about this. It was discussed heavily. 128 would obviously be more useful, but it comes with heavy costs, especially at the time when these were some of the first CHIPs, and we had less resources / eyes / developers than today. 64-bit has standards and standard implementations everywhere that require mostly just a little boundary drawing to fit the VM. 128 is a different beast that would require custom implementations and probably new standards. More risk.

I would have loved to have 128, but it wasn’t worth the cost at the time.

If 128 ever really became desired, it’s a relatively straightforward upgrade path from 64 to 128 from a behavior perspective. The only difference would be that some contracts that failed due to overflow would now not fail. As far as I know, that would only matter for an app that specifically uses overflow as a logic mechanism.

MathieuG · July 2, 2023, 4:46pm

Now there are already 2 CashTokens AMMs! Cauldron Swap & Fex.Cash!
Cauldron Swap has published its whitepaper already, FexCash is in preparation to do so soon. Interestingly Cauldron Swap is made using raw BCH script while FexCash is written in CashScript.

For these AMMs using x * y = k the discussed op_muldiv opcode would help with addressing overflows of the 64-bit multiplication. Rosco Kalis explained the old version of AnyHedge using 32-bit integers had to do the same complex math workaround AMMs now have to do to address overflows.

Discussing the future of AMMs on BCH @im_uname wrote

xy=k is one of those ideas that sound so dumb when you hear it, but turns out to be so convenient people are willing to use it to the tune of billions despite being inefficient as all hell

Eventually, new math opcodes or the emulation thereof will be important for enabling experimentation with more advanced and efficient AMM functions.

@bitcoincashautist shows above using op_muldiv it would be reasonable to emulate op_pow. and from there on any AMM function could be approximated with Tayler functions. I assume being able to emulate op_pow means op_exp & op_log could also be emulated more cheaply?

In conclusion, the new op_muldiv opcode suggested above seems like an important primitive which could see great use right away in simplifying Constant Product Market Maker DEXes, and in emulating missing Math opcodes

MathieuG · July 3, 2023, 9:07am

Taking a closer look at the EVM Arithmetic Instructions it is interesting that they only have exp (for exponentiation with any base) from the ones discussed at the top and use math libraries for things like root & log.

Searching for an explanation of the op_addmod & op_mulmod use cases I found this medium post

There are also opcodes that help the compiler in detecting overflows and underflows in the code, some of the common ones are ADDMOD, MULMOD and SMOD

It is notable the op_muldiv EIP discussion linked by @bitcoincashautist above is initiated by Uniswap developers who say

In Uniswap V3, we make heavy use of a FullMath#mulDiv , which allows you to multiply two numbers x and y then divide by a third number d , when x * y > type(uint256).max but x * y / d <= type(uint256).max

Which is the point I was making above for AMM DEXes. The rational for the EIP on ETH continues with

This function is extremely useful to smart contract developers working with fixed point math. Hari (@_hrkrshnn ) suggested that we write an EIP to add a muldiv opcode.

This is a first important consideration, to enable fixed point math the working of the op_muldiv opcode should be different than above. Reading the EIP discussion further:

Some initial discussions we had when writing the EIP:
– Support both “checked” (abort on overflow) and “unchecked” (do not abort) case
– Whether to return multiple stack items or not
– Support accessing the hi bits (after division) or not
The current design wanted to avoid multiple opcodes and returning multiple stack items, hence we arrived at having the special case with z=0 .

Having a special case like this when dividing by zero can be inconsistent and even dangerous for contract design, so apart from the number of stack items returned the denominator 0 case should be discussed.

Lastly, rereading the Market Making Algorithm of Jedex it says

Notably, because Jedex aggregates liquidity for every market tick, slippage for any particular order can be significantly minimized by other orders within the same tick (in a sense, pool sizes grow dynamically based on intra-tick demand); in this context, even the relatively-simple Constant Product function offers better liquidity than much more complex pricing functions used within other decentralized exchange designs.

It would be interesting to hear from @bitjson whether Jedex would benefit from a muldiv opcode, or how it works around the multiplication overflow issue.

bitcoincashautist · July 4, 2023, 8:09am

I’ve been looking for a way to emulate full precision a * b / c by using modulo math. It is possible to do by emulating a wider integer by storing it as 2x uint64:

Math in Solidity (Part 3: Percents and Proportions) | by Mikhail Vladimirov | Coinmonks | Medium
Mathemagic: Full Multiplication. How to do multiplication with full 512… | by Remco Bloemen | wicketh | Medium
Mathemagic: 512-bit Division. In this post I will introduce five… | by Remco Bloemen | wicketh | Medium

Problem for our ScriptVM is that we don’t have uint64, so can’t cleanly split an uint128 to 2x uint64. Maybe we could implement it with uint63 & emulated uint126 as the uint63 can fit into a ScriptNumber, and uint126 can fit into 2 ScriptNumbers.

Division part is the big problem, it needs a bounded loop - worst case it would need 63 iterations. With higher limits CHIP the loop could be unrolled, and implementing it with Script would be quite expensive, lots of bytecode just to perform a single muldiv - an operation we can expect will be common.

If int32 precision is ok, then contracts can use the relatively small macro (OP_ROT OP_ROT OP_MUL OP_SWAP OP_DIV).
If more precision is needed, contracts must compromise on accuracy and input range. Latest example of this is fex cash MULDIV implementation.

I think this demonstrates why OP_MULDIV would be very valuable.

It is cheap and easy to implement the opcode, I already implemented the muldiv function in C for another purpose (multiplying a value with a small factor to calculate blocksize limit algo), the OP_MULDIV would simply be doing this:

uint64_t muldiv(uint64_t x, uint64_t y, uint64_t z) {
    uint128_t res;
    res = ((uint128_t) x * y) / z;
    assert(res < (uint128_t) UINT64_MAX);
    return (uint64_t) res;
}

For VM purposes, it would be operating on signed ints, and should fail on result overflow or on z=0, to be consistent with OP_MUL and OP_DIV.

MathieuG · July 4, 2023, 10:07am

Thanks @bitcoincashautist for explaining the problem with emulating full precision for a * b / c above!

In the just released Fex.Cash whitepaper they explain their math workaround using modulo math

   a * (x/y) = (floor(a/y) * y + (a%y)) * (x/y)
             = (floor(a/y) * y * (x/y) + (a%y) * (x/y)
             = (floor(a/y) * x + (a%y) * x / y

They made a separate file to show the functions for their math workarounds in MULDIV.h, there they have the helper functions FloorMul,FloorDiv, CeilMul & CeilDiv for their modulo math. This could all be simplified & with full precision with an op_muldiv opcode.

With their CashScript contract FexCash demonstrated the need for a CashScript library system. This development has already been noted and investigation into the necessary changes has started with the GitHub issue “Add support for libraries/macros”.

bitcoincashautist · July 4, 2023, 1:46pm

I’m looking at our code-point list, and we have OP_2MUL and OP_2DIV that are DISABLED. So, we could propose to replace OP_2MUL with OP_MULDIV and OP_2DIV with OP_MULMOD:

Word	Value	Hex	Input	Output	Description
OP_MULDIV	141	0x8d	a b c	out	a is first multiplied by b then divided by c and the quotient of the division is returned. The intermediate product CAN be in the INT128 range, while the quotient MUST be in the INT64 range. Sign is preserved and c MUST be a non-zero value.
OP_MULMOD	142	0x8e	a b c	out	a is first multiplied by b then divided by c and the modulus of the division is returned. The intermediate product CAN be in the INT128 range, while the modulus MUST be in the INT64 range. Sign is preserved and c MUST be a non-zero value.

Above we demonstrated the utility of OP_MULDIV, and utility of OP_MULMOD is in emulating even higher precision operations than int64.

emergent_reasons · July 5, 2023, 3:42am

I know this would be useful. I could have used it myself. My main concern is whether it is a good general approach that solves a class of important issues, or if this is right solution for only a small set of problems. In other words, any CHIP would need to explore adjacent solutions to the problems proposed and convincingly show that muldivmod (or whatever the proposal becomes) is the right choice among meaningfully investigated alternatives.

tl121 · July 5, 2023, 4:26am

There are basically two different approaches. One is to mimic existing techniques, e.g. what EVM does, with the main goal is to facilitate migration of existing contracts onto BCH. The other is to provide math primitives grounded in existing hardware architecture and suited for general purpose. This might include support of uint64 data types, which allows for higher precision much better than (signed) int64. uint64 is supported on amd64 and AArch64 architectures.

Evaluation ideally involves sample contracts from a variety of applications and then benchmarking them. Metrics include ease of coding or compiler code generation to get applications and small script size for network performance. These metrics are secondary to the critical requirement for consensus, which includes a thorough specification of the opcodes including all edge cases, a reference implementation, and complete set of test cases.

bitcoincashautist · July 5, 2023, 7:51am

@MathieuG is drafting a CHIP, he asked me about a title and I proposed “Composite Math Opcodes”. They’re orthogonal to native VM precision, because without them - whatever the precision - the operations a * b / c and a * b % c must limit a and b to each be below the native precision.

We could say the main motivation is to bring the 2 composite operation to the same level of precision as basic arithmetic opcodes, and it would pay dividends forever - like if we ever increase VM to int128 then people could easily work with fractions of int128/int128 instead of having to lose precision or use expensive workarounds. In the meantime, they would allow cheaper emulation of even higher precision.

Actually we kinda independently re-discovered this, and then I found that Ethereum folks are looking in the same direction and already solved some problems we’re just discovering now, and then I collected useful referenes from their ecosystem.
I first “discovered” muldiv while working on the blocksize limit CHIP, where I needed to multiply an uint64 with a small decimal number factor. I wanted to store the factor as precisely as possible so ended up with:
next_blocksize_limit = (uint64) previous_blocksize_limit * (uint64) factor_numerator / ((uint64) 1 << 56)

Then, talking with experienced C/C++ devs, I learned that CPU architectures generally support double-width integer type - precisely so they can capture overflowing ops and do something with the result. So, if CPU architecture is supporting int64 ops, then it will support int128 type, too, and it would be used for an efficient implementation of the 2 proposed opcodes.

That would be a much more extensive change to our VM, and would need a new set of opcodes, else all existing contracts would be broken. I’d actually like to have uint64, and I was pondering some kind of “int mode” instruction to VM to switch back and forth when interpreting stack items and doing arithmetic opcodes. It would be a big change, but I don’t think there’s a need - more like nice to have.

Anyway, it is orthogonal to these composite arithmetic opcodes - they’re useful regardless of whether operands and the final result is int64 or uint64.

MathieuG · July 5, 2023, 6:21pm

Published the a CHIP “composite arithmetic opcodes” which proposed to add op_muldiv & op_mulmod to the BCH VM!

bitjson · July 14, 2023, 7:54pm

Nice work @MathieuG and @bitcoincashautist, thanks for diving into the higher-precision math issue!

I’m a bit uncertain about overflow handling for composite arithmetic opcodes, e.g. for OP_MULDIV, you can be certain that the operation won’t overflow if the MUL input is larger than the DIV input, but if that doesn’t hold, you’re back to range checking inputs and/or multiple code paths. Still very useful, but less universal than I’d generally expect BCH VM opcodes to be.

On applicability to Jedex’s market making algorithm:

Jedex uses the “joint execution” strategy to make the liquidity of a simple Constant Product Market Maker (CPMM) algorithm go farther, and Jedex’s algorithm has already been factored to reduce the need for higher-precision math to a single location (fee = satoshis_per_token * 12 / 4096).

So in Jedex’s case, I think an OP_MULDIV would make the arithmetic of the MM contract a bit more efficient, but I can’t say confidently that the change wouldn’t just be optimizing the BCH VM specifically for these CPMM algorithms. E.g. take a look at SoK: Decentralized Exchanges (DEX) with Automated Market Maker (AMM) Protocols (page 37) for a good overview of currently-popular AMM algorithms:

I’m a bit out of my depth on these arithmetic precision questions, but unless I’m missing something, OP_MULDIV would not be sufficient for many of the higher-order multiplication operations required here. E.g. I think Balancer would require exponentiation (see equation 38) and Curve would require greater multiplication precision (of more than 2 VM numbers) - as well as square roots - prior to dividing to get quotients and remainders (see equation 46).

So even with OP_MULDIV, non-CPMM use cases would still need to solve the more general case of creating efficient macros for higher-precision fixed point arithmetic where intermediate results are represented using multiple VM numbers. (And we might need to figure out these macros whether the VM has 63-bit numbers, 127-bit, 255-bit, etc. as increased VM precision just increases the required precision of intermediate calculations.)

Here’s a scratch pad I started to experiment with higher-precision math emulation. If anyone has the bandwidth to experiment with this stuff, I think having some examples to review would really help to discuss/evaluate new math operations. E.g. we might find that more generally-useful primitives would be opcodes for chunking numbers into words of a certain bit-length and merging them back together + SIMD-like arithmetic operations, arithmetic operations that operate on binary sub-sections of stack items, or maybe just operations that make range checking easier + a VM number precision increase. I’m just very uncertain about the long term direction here in general, and I’d love to see someone with a deeper interest in this field review the topic more systematically.

Another approach to this topic that seems a little safer is to avoid prescribing any particular solution in 2024 or 2025, and instead retarget the VM limits and see what sort of patterns the ecosystem comes up with in the wild:

bitcoincashautist · July 23, 2023, 6:44am

This is consistent with existing OP_ADD, OP_SUB and OP_MUL opcodes.

You can implement int127 = int64 * int64, int64 = int127 / int64 & int64 = int127 % int64 macros by using the OP_MULDIV and OP_MULMOD and verify the result directly, no need for range check.

Thing is, OP_{MUL, ADD}{DIV, MOD} lets you use full precision of the VM to implement higher precision math whereas without it, you’re forced to split the higher numbers to int32 words instead of int64 words. So, it’s a general tool to make use of native node code which is anyway computing 128-bit numbers when doing 64-bit arithmetic.
So, while obvious use-case is for CPMM, it’s usefulness is general - in building up whatever higher precision math using macros - and if our VM numbers are int63 and if we had the 4 opcodes then your macros would be using 2x int64s to build up to int127 instead of 4x int32s to build up to int125 as you’d have to do now.

I’d definitely love that, I had the problem of splitting and chunking when I was doing uint256 math. You can’t simply use OP_SPLIT and OP_CAT due to the way VM numbers are encoded, so this is definitely an interesting research direction. Even with those, it’s annoying we only have signed type so you’d need 5 words for a full uint256.

How about this idea: add 1 bit of VM state and turn some OP_NOP into a OP_INTMODE to toggle all arithmetic opcodes, BIN2NUM, and NUM2BIN from int64 to uint64 and back.

emergent_reasons · July 28, 2023, 7:47am

My review on the current state 2023-07-28

If you want, you can add AnyHedge as an example of a contract constrained by this. What we had to do with 32 bit integers was absolutely nuts, but even with 64 bit, it constrains the size of contracts. For example for the USD oracle, 64 bit limits the contracts to about $1m which is too small for general finance.
"Easy overflow check": The context of this section isn’t clear. I think a quick example in a code fence might help?
"when a calculations would overflow": Deos this refer to the current VM or the updated VM? What is the delta between the behavior of this between the two?
"Higher Precision Math": In this section, it mentions addition, subtraction, division, but not multiplication. The reason might be obvious, but maybe not? Can you add an explicit note about multiplication also? Better to over-explain in a CHIP than to possibly under-explain.
"Modifications to VM limits have the potential to increase worst-case transaction validation costs":
But this CHIP does not change VM limits? Maybe the first part of the sentence should change?
"A simple arithmetic sequence such as multiplying-dividing or multiplying-modulo should have no impact on worst-case transaction validation costs and the DOS-attack surface.": This needs to be backed up with specific analysis such as O notation or similar, even if it is not a formal analysis.
"Because this proposal only affects internals of the VM, standard wallets, block explorers, and other services will not require software modifications for these changes.": This is probably still true, but as wallets get more sophisticated, it will stop being true. I think it’s safer to say that common wallets that do not deal with smart contracts will not require changes. Wallets that handle specific smart contracts not using this change, e.g. escrow contracts, also will not require changes. Wallets that handle general smart contracts will need to update their code or dependencies to be able to understand the code. Then a list of examples of each.
"The intermediate product CAN be in the INT128 range, while the quotient MUST be in the INT64 range.": Please describe more specifically in VM terms. This can also be read that the quotient will be truncated/constrained even if the actual result is not. I.e. according to this description, a valid result of 2^63 * 2^63 / 1 would be 2^63 (because the quotient MUST be in the INT64 range). I assume you want to describe that script execution ends with failure if MUST is violated, but that needs to be stated explicitly.
"OP_MULMOD": The CHIP needs to describe all edge cases clearly. Is there an edge case where b is maxint?
Missing (serious) alternative: OP_MULDIVMOD and OP_ADDDIVMOD. These would be a better design choice if it is expected for users to consume both values.

The specific code and examples are excellent additions that make it more real.

My biggest concern with the current setup is that it doesn’t explicitly describe all boundary conditions and ambiguous cases such as what happens with maxint, max negativeint, direction of truncation when the result is negative, sign of modulo when the result is negative, … probably others.

bitcoincashautist · July 28, 2023, 8:33am

Something like this?

<MAXINT> <MAXINT> <MAXINT> OP_MULDIV returns 1
<MAXINT-1> <MAXINT> <MAXINT> OP_MULDIV returns 0
<-MAXINT> <MAXINT> <MAXINT> OP_MULDIV returns -1
<-MAXINT+1> <MAXINT> <MAXINT> OP_MULDIV returns -0 (0x80) or +0 (0x00) hmm, need to specify
<MAXINT> <MAXINT> <MAXINT-1> OP_MULDIV fails (result overflow)
<MAXINT+1> <MAXINT-1> <MAXINT> OP_MULDIV fails (operand overflow)

MathieuG · July 28, 2023, 10:02am

Thanks for the review and comments! I’ll try to respond to all the main points

Yeah, but checking if the multiplication would overflow in the contract is very easy with a muldiv opcode by doing muldiv(a,b,maxint) whereas currently this would require emulating int127 multiplication as worked out in emulatedOpcodes.cash of the composite arithmetic opcodes CHIP, so even the multiple codepaths option would be reasonably simple.

Thanks, this is interesting! Surely Cauldron & FexCash could work around the overflow issue in the same way but I assume this loses precision as satoshis_per_token can only be an integer and otherwise the price of an asset in CPMMs is represented as X/Y where the result can be a decimal number.

The CPMM algorithms are indeed the inspiration for the proposal, but enabling int127 math in this remarkably easy way is much more general than just CPMMs. Also, multiplication overflows are more general than CPMMs, @emergent_reasons explained the BCH amount for anyhedge contract is limited by multiplication overflows. It seems to me like many more financial applications would run into this.

The combination of muldiv & mulmod allows for emulating higher precision multiplication, theadddiv & addmod combination allows for emulating higher precision addition and subtraction in a very simple way, this is shown in the int127Math example library of the composite arithmetic opcodes CHIP. Even higher precision multiplication, addition & subtraction can be emulated in the same way. Exponentiation, square roots as well as higher order division would indeed still require their own, separate solutions.

bitcoincashautist:

bitjson:

So even with OP_MULDIV, non-CPMM use cases would still need to solve the more general case of creating efficient macros for higher-precision fixed point arithmetic where intermediate results are represented using multiple VM numbers. (And we might need to figure out these macros whether the VM has 63-bit numbers, 127-bit, 255-bit, etc. as increased VM precision just increases the required precision of intermediate calculations.)

Thing is, OP_{MUL, ADD}{DIV, MOD} lets you use full precision of the VM to implement higher precision math whereas without it, you’re forced to split the higher numbers to int32 words instead of int64 words. So, it’s a general tool to make use of native node code which is anyway computing 128-bit numbers when doing 64-bit arithmetic.
So, while obvious use-case is for CPMM, it’s usefulness is general - in building up whatever higher precision math using macros - and if our VM numbers are int63 and if we had the 4 opcodes then your macros would be using 2x int64s to build up to int127 instead of 4x int32s to build up to int125 as you’d have to do now.

Fully agree with what @bitcoincashautist wrote here.

I added this under evaluated alternatives to the composite arithmetic opcodes CHIP and worked out what emulating the muldiv , mulmod , adddiv & addmod opcodes in a cashscript library would look like!

Higher script limits

With a proposal such as CHIP 2021-05 Targeted Virtual Machine Limits, it becomes practically feasable to emulate muldiv , mulmod , adddiv & addmod functionality.
This is demonstrated in the emulatedOpcodes.cash example CashScript library, which only uses the current VM capabilities.
As can be seen from just the file length, it would be impractical to use such functions in smart contracts due to the 201 opcode & 520 bytesize limit.
By lifting or re-engeneering these limits, the same functionality proposed in the CHIP can be achieved at the cost of larger smart contracts.
With good library support in Cashscript the complexity of using such emulated functions can be hidden from developers.
The Int127Math.cash library could work by importing emulatedOpcodes.cash instead of utilizing native muldiv , mulmod , adddiv & addmod functions.

emergent_reasons · August 22, 2023, 12:09am

Something like that but you would want to show convincingly that it covers literally every possible edge case. There must be proofs and similar specs in existing integer math specifications out there. It’s a very old field with a huge amount of work done already.

MathieuG · October 24, 2023, 12:13pm

Revising this topic out of necessity, for financial applications like interest it’s a real drawback that here is no exponentiation. As discussed earlier, emulation is not practically viable with the current VM limits.

Emulation with just higher VM limits & the same opcodes would still be very impractical. As @bitcoincashautist wrote, it would be possible to emulate easily with bounded loops & bit shift ops.

My preferred solution is still that a simple exponentiation opcode can be added without the need for emulation.

bitcoincashautist · October 24, 2023, 1:28pm

Yeah I think I lean this way too, and IMO these opcodes should work with fractions of native VM integers, so they’d pop pairs of stack items (a/b)^(c/d) and return 2 stack items (e/f)