The need for additional Math opcodes

I would not underestimate the importance of a precise specification for any new opcodes: they must be specified down to the last bit for all possible inputs, and, once that is done, it must be clear exactly how any implementation can be thoroughly validated under all possible edge conditions.

There is a long history of low-quality math library calculations, starting with the differences between real numbers, rational numbers, and integers, and continuing through coding errors in library routines, compiler errors, and even processor errors. Remember the Pentium FDIV bug?

2 Likes

Speed is not an issue. The issue is making sure we don't have a repeat of block 74638.

1 Like

An area of research that belongs in this discussion: what’s the most efficient way to emulate higher-precision math in the BCH VM?

Right now the BCH VM is in a fantastic position – math operations are very flexible, but not so costly as to require per-operation accounting.

I think any future math-related upgrades should maintain this performance characteristic. It would be a shame if we had to complicate the VM by e.g. counting up the number of executed exponentiation operations, or worse, implement some form of “gas”.

One possible solution is to have new operations simply error on overflow; the ~64-bit VM integer size then naturally keeps the required computation proportional to bytecode length. Contract authors are thus properly incentivized to simplify and reduce the need for higher-precision math, as higher-precision calculations require additional bytes to emulate. E.g. for Jedex's market making algorithm to support tokens with total supply greater than 2^55, we can either 1) emulate ~150-bit unsigned multiplication or 2) use a "Higher-Denomination Token Equivalent (HDTE)" within the market maker.
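To make the error-on-overflow behavior concrete, here is a minimal C sketch (an illustration of mine, not part of any proposal) of a checked 64-bit multiplication built on the GCC/Clang overflow builtin; a VM operation with these semantics would fail the script instead of wrapping:

#include <stdbool.h>
#include <stdint.h>

// Returns false on overflow; a VM operation with these semantics would
// fail script evaluation rather than wrap around.
bool checked_mul_i64(int64_t a, int64_t b, int64_t *out) {
    return !__builtin_mul_overflow(a, b, out);
}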

For use cases where simply performing the higher-precision math is most advantageous, it would be great if some researcher(s) could also figure out:

  1. The most byte-efficient way to perform those operations, and
  2. What further capabilities would be useful for reducing the bytecode length of emulating much higher precision operations while maintaining the correlation between bytecode length and computation cost (e.g. various bit shifts, SIMD, bounded loops, etc.)

With CashTokens now, it's already possible to implement essentially unlimited-precision math in contracts, even without changes to the VM limits: you can perform part of the computation in each contract execution, handing the result of each section of the program to the next <201-opcode contract. You really only start hitting limits at the 100KB transaction standardness limit (and consensus allows up to 1MB), and even that can be worked around using multiple transactions. (And with better-targeted VM limits, developers wouldn't even need these workarounds.)

I started experimenting with building a general macro “library” for higher-precision math into Bitauth IDE last year as part of the Jedex demo, but I haven’t made it very far. I’ll bet other developers could make much faster work of the problem!

So for anyone interested in this topic:

  • I’d love to see some research and user-space libraries that enable >128-bit precision multiplication and division;
  • In the context of new proposals, I’d love to see examples using new proposed math opcodes in higher-precision calculations too, e.g. an example of >128-bit exponentiation using a proposed OP_EXP.

I expect this research is required to evaluate the long term impacts of various implementation details in any new, math-related upgrade proposals: API, performance considerations, overflow behavior, etc.

6 Likes

Still, it's probably better to check it right now than to do a hard-fork removal of such new opcodes 6 years from now because it turned out they are somehow too slow for mining & relaying 5GB blocks…

We run a risk of the BCH protocol getting frozen up later in its lifetime, so it would be wiser to double-check and make sure before we go there.

I implemented some uint256 arithmetic using ScriptVM int64, with the objective of creating a native Bitcoin-compatible header chainwork oracle.
I got as far as calculating a single block's chainwork from the block's compressed target (script unlock_verify_chainwork).
The next step would be to instantiate a CashTokens covenant chain that starts from some header, adds up the chainwork, and keeps the height & cumulative chainwork in the NFT's commitment.
Then, anyone could obtain a copy of the state as an immutable NFT, for use in contracts that would want the information.

What I noticed is this: it takes a lot of bytecode to cast back and forth between uint32 and ScriptNum. I think some opcode for this would be beneficial for anyone trying to do higher-precision math using lower-precision VM primitives.

2 Likes

Someone in BSV has made a library implementing log, exp, root and pow. I'm not sure how representative it is for BCH (given that OP_MUL & 64-bit integers were added). In their article they write:

The exponential function is smallest with only 8kb. The logarithm comes in with 17kb while root and pow require 25kb each as they are using exp and log.

Without new Math primitives, developers who want to use AMM equations involving exp and log will have to use similar workarounds. If these numbers are representative, it would mean at the very least 125 UTXOs to use pow (not counting the overhead of extra code required to split across so many UTXOs) without any changes to the VM limits. This seems practically prohibitive; the Targeted Virtual Machine Limits CHIP would be required to realistically allow for this kind of experimentation.

Erroring on overflow is how I thought it would work. If there is a large need for such a "general macro library for higher-precision math", then it seems our choice of 64-bit integers was quite restrictive after all. Re-reading the Bigger Script Integers CHIP from ~2 years ago, it seems a real need for higher-precision math was totally unanticipated.

If this becomes a common use, it might make sense to revisit this idea by @markblundeberg

3 Likes

I think one nice general-purpose improvement would be OP_MULDIV, so you could multiply any 64-bit number by a fraction made of any two int64s, and it would preserve full 64-bit precision for the result (or fail on overflow).

It would save contract authors from having to test for overflow when doing something like <3> OP_MUL <1000> OP_DIV: they could just multiply by any fraction less than 1 and be sure it won't overflow and that they'll get a precise result.
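A tiny worked illustration (mine, assuming a compiler with __int128 support): scaling a large value by the fraction 3/1000. The naive order overflows the 64-bit intermediate, while a muldiv-style 128-bit intermediate gives the exact result:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int64_t x = 4000000000000000000LL;                     // ~4.0e18, near INT64_MAX
    // int64_t naive = x * 3 / 1000;                       // x * 3 overflows int64_t
    int64_t exact = (int64_t)(((__int128) x * 3) / 1000);  // 128-bit intermediate
    printf("%lld\n", (long long) exact);                   // prints 12000000000000000
    return 0;
}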

2 Likes

Awesome, glad to hear you’re looking into it! :rocket:

The only way to end up with a uint32 “number” within the VM is to pull it in from a source not intended for the BCH VM, right? So this would really only be applicable to the specific case of contracts parsing block headers or signed data blobs containing uint32 numbers from outside the system?

Unless I’m misunderstanding, these “uint32 numbers” are essentially strings regardless of whether or not conversion utilities exist – they still don’t work with any other operations or have any native use within the VM.

Blessing a particular uint32 format with VM opcodes would be similar to e.g. adding "utf8 number" utilities to cast numbers to/from utf8 strings. In fact, the only other number formats that are already inherently "supported" by the VM are the r and s values encoded in signatures and the x and (sometimes) y values encoded in public keys. But here again, there's little point in us considering them anything but strings from the perspective of the VM.

Interesting, thanks for sharing! I expect those numbers are massively inflated by the lack of some bounded looping mechanism – even very simple algorithms that could be represented in <20 bytes require ~8x the operations simply to re-apply the same operations across each byte, or even ~64x for operations at the bit level. You also have to pay the cost of all IF/ELSE branches within each iteration, even if most are unused. Bytecode length ends up being pretty disconnected from computation costs.

I should have clarified that bullet point more. The next paragraph added more context:

That remains the case – 64-bit numbers are an excellent, conservative choice for our “base numbers” (though we were careful to ensure future upgrades can expand further), and I still think that range supports most practical use cases.

It’s important to distinguish between numbers used for representation of real values (like satoshis, shares, votes, tickets, etc.) and numbers used in intermediate calculations by contracts. Bumping up our base number range to 128 or 256 bits wouldn’t help with those intermediate values; we’d just need to e.g. emulate 512-bit fixed precision multiplication to fully support the new larger number values (or we’re back to this same problem of needing to sanitize inputs by limiting them to some value lower than the maximum range of the base numbers).

You can see this in practice on Ethereum: 2^256 numbers haven't eliminated the need for big integer libraries – instead, contracts need to emulate even higher precision to avoid sanitizing inputs of the larger base numbers. E.g. in practice, essentially no tokens actually need 18 decimal places in the real world (network fees alone often cost over half of those decimal places), but the contract ecosystem is forced to pay an ongoing cost for that unused "base" precision.

Anyways, I think the real solution here is to figure out the most byte-efficient “macro” to emulate higher-precision computations in BCH VM bytecode, then optimize that macro to be relatively byte-efficient without fully offloading it to the VM implementation. This keeps bytecode length tied to computation costs, incentivizing contract authors to conserve both (without requiring the overhead of some arbitrary operation-counting or gas system). Any optimizations here also continue to pay dividends even if future VM numbers expand to 128-bit or larger numbers, as it seems that higher-precision math is always useful for some intermediate calculations.

(N.B.: My perspective on distinguishing arithmetic from crypto math is also informed by an expectation that sidechains are significantly more efficient than automata for a wide variety of use cases.)

Awesome! I’d love to see more detail on this idea; this seems like exactly the kind of optimization I’m hoping we can find to simplify practical implementations of higher-precision math libraries, both for existing operations and for proposed operations like exp and log.

4 Likes

Context is my uint256 implementation. I push the uint256 as a 32-byte stack item and split it into 8x 32-bit uints. Then I have to do operations on them to compute the big number, yeah? But because script nums are signed, I must append 0x00 to each of them, then do the math, then strip any 0x00, and finally concatenate them all to get the uint256 result.
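For readers unfamiliar with the encoding, here is a rough C sketch (my own, hypothetical helper name) of why the 0x00 padding is needed: Script numbers are little-endian and signed, with the sign carried in the top bit of the last byte, so any limb with its high bit set needs an extra zero byte to stay non-negative:

#include <stddef.h>
#include <stdint.h>

// Encode an unsigned 32-bit limb as a minimally-encoded, non-negative
// Script number (up to 5 bytes). Returns the encoded length.
size_t uint32_to_scriptnum(uint32_t limb, uint8_t out[5]) {
    size_t len = 0;
    while (limb != 0) {               // little-endian, minimal length
        out[len++] = limb & 0xff;
        limb >>= 8;
    }
    if (len > 0 && (out[len - 1] & 0x80)) {
        out[len++] = 0x00;            // append padding so the value stays positive
    }
    return len;
}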

Thanks this is the context I was missing :pray:

3 Likes

I think fixed-point arithmetic is the direction we should look into: Fixed-point arithmetic - Wikipedia

IMO it's a natural extension of our basic integer arithmetic, and we'd get consistent granularity of results. The exp and log ops could pop 4 stack items (2 fractions, each a pair of 64-bit ints) and push 2 stack items (a fraction as a pair of 64-bit ints).

Let’s consider the OP_MULDIV primitive:

Word: OP_MULDIV
Value: ??? (hex 0x??)
Input: a b c
Output: out
Description: a is first multiplied by b, then divided by c. The intermediate product CAN be in the INT128 range, while the final result MUST be in the INT64 range.

We can implement an INT32 version as a macro op_muldiv: OP_ROT OP_ROT OP_MUL OP_SWAP OP_DIV, but then we're limited to INT32 precision for the result. Still, we can use the macro just to demonstrate the utility. Consider this example:

Multiply some x by 0.999998^10 at maximum precision:

<1000000000> <0xffffff7f> OP_DUP
<999998> <1000000>
OP_2DUP OP_2ROT OP_2ROT op_muldiv OP_2SWAP
OP_2DUP OP_2ROT OP_2ROT op_muldiv OP_2SWAP
OP_2DUP OP_2ROT OP_2ROT op_muldiv OP_2SWAP
OP_2DUP OP_2ROT OP_2ROT op_muldiv OP_2SWAP
OP_2DUP OP_2ROT OP_2ROT op_muldiv OP_2SWAP
OP_2DUP OP_2ROT OP_2ROT op_muldiv OP_2SWAP
OP_2DUP OP_2ROT OP_2ROT op_muldiv OP_2SWAP
OP_2DUP OP_2ROT OP_2ROT op_muldiv OP_2SWAP
OP_2DUP OP_2ROT OP_2ROT op_muldiv OP_2SWAP
op_muldiv
OP_SWAP op_muldiv

Result: 999979999

The above example essentially unrolls an OP_POW by using the naive method, which could be further optimized by the square & multiply algorithm:

<1000000000> <0xffffff7f> OP_DUP
<999998> <1000000> op_muldiv
OP_DUP OP_2 OP_PICK op_muldiv OP_DUP
OP_DUP OP_3 OP_PICK op_muldiv
OP_DUP OP_3 OP_PICK op_muldiv
OP_2 OP_PICK op_muldiv
OP_SWAP op_muldiv

With bigger VM limits, one could use this as a building block to implement other functions using Taylor series.
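As a rough illustration of the Taylor-series idea (in C rather than Script, with a 1e9 fixed-point scale and names of my own choosing), exp(x) for a fractional x can be built from nothing but repeated muldiv calls:

#include <stdint.h>

#define SCALE 1000000000LL   // fixed-point scale: 1.0 is represented as 1e9

// a * b / c with a 128-bit intermediate, mirroring the proposed op_muldiv
static int64_t muldiv_i64(int64_t a, int64_t b, int64_t c) {
    return (int64_t)(((__int128) a * b) / c);
}

// exp(x) for a fixed-point fraction 0 <= x <= SCALE, via the Taylor series
// sum of x^k / k!; twelve terms are more than enough at this scale.
int64_t fixed_exp(int64_t x) {
    int64_t sum = SCALE;                      // k = 0 term contributes 1.0
    int64_t term = SCALE;
    for (int64_t k = 1; k <= 12; k++) {
        term = muldiv_i64(term, x, SCALE);    // term *= x (fixed-point)
        term /= k;                            // accumulate the k! divisor
        sum += term;
    }
    return sum;
}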

I found that Ethereum folks are considering it too:

and they already have two similar opcodes that may be useful for other purposes: OP_ADDMOD, OP_MULMOD

3 Likes

Just a historical note about this. It was discussed heavily. 128 would obviously be more useful, but it comes with heavy costs, especially at the time when these were some of the first CHIPs and we had fewer resources / eyes / developers than today. 64-bit has standards and standard implementations everywhere that require mostly just a little boundary drawing to fit the VM. 128 is a different beast that would require custom implementations and probably new standards. More risk.

I would have loved to have 128, but it wasn’t worth the cost at the time.

If 128 ever really becomes desired, the upgrade path from 64 to 128 is relatively straightforward from a behavior perspective. The only difference would be that some contracts that previously failed due to overflow would no longer fail. As far as I know, that would only matter for an app that specifically uses overflow as a logic mechanism.

2 Likes

Now there are already 2 CashTokens AMMs: Cauldron Swap & Fex.Cash!
Cauldron Swap has already published its whitepaper, and FexCash is preparing to do so soon. Interestingly, Cauldron Swap is written in raw BCH script while FexCash is written in CashScript.

For these AMMs using x * y = k, the discussed op_muldiv opcode would help address overflows of the 64-bit multiplication. Rosco Kalis explained that the old version of AnyHedge, which used 32-bit integers, had to do the same complex math workarounds that AMMs now have to do to address overflows.
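For context, here is a minimal sketch (my own notation, fees ignored) of the constant-product payout calculation where the muldiv pattern shows up: the product pool_out * amount_in can exceed 64 bits even though the final output always fits:

#include <stdint.h>

// Constant product x * y = k: adding amount_in to one side pays out
// pool_out * amount_in / (pool_in + amount_in) from the other side.
int64_t swap_output(int64_t pool_in, int64_t pool_out, int64_t amount_in) {
    __int128 product = (__int128) pool_out * amount_in;    // may exceed 64 bits
    return (int64_t)(product / (pool_in + amount_in));     // result fits in 64 bits
}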

Discussing the future of AMMs on BCH, @im_uname wrote:

xy=k is one of those ideas that sound so dumb when you hear it, but turns out to be so convenient people are willing to use it to the tune of billions despite being inefficient as all hell

Eventually, new math opcodes or the emulation thereof will be important for enabling experimentation with more advanced and efficient AMM functions.

@bitcoincashautist shows above that using op_muldiv it would be reasonable to emulate op_pow, and from there any AMM function could be approximated with Taylor series. I assume being able to emulate op_pow means op_exp & op_log could also be emulated more cheaply?

In conclusion, the new op_muldiv opcode suggested above seems like an important primitive which could see great use right away, both in simplifying Constant Product Market Maker DEXes and in emulating missing Math opcodes.

1 Like

Taking a closer look at the EVM Arithmetic Instructions, it is interesting that, of the operations discussed at the top, they only have exp (exponentiation with any base) and use math libraries for things like root & log.

Searching for an explanation of the op_addmod & op_mulmod use cases, I found this Medium post:

There are also opcodes that help the compiler in detecting overflows and underflows in the code, some of the common ones are ADDMOD, MULMOD and SMOD

It is notable that the op_muldiv EIP discussion linked by @bitcoincashautist above was initiated by Uniswap developers, who say:

In Uniswap V3, we make heavy use of a FullMath#mulDiv, which allows you to multiply two numbers x and y then divide by a third number d, when x * y > type(uint256).max but x * y / d <= type(uint256).max

Which is the point I was making above for AMM DEXes. The rationale for the EIP on ETH continues with:

This function is extremely useful to smart contract developers working with fixed point math. Hari (@_hrkrshnn ) suggested that we write an EIP to add a muldiv opcode.

This is a first important consideration: to enable fixed-point math, the workings of the op_muldiv opcode would have to differ from the version described above. Reading the EIP discussion further:

Some initial discussions we had when writing the EIP:
– Support both “checked” (abort on overflow) and “unchecked” (do not abort) case
– Whether to return multiple stack items or not
– Support accessing the hi bits (after division) or not
The current design wanted to avoid multiple opcodes and returning multiple stack items, hence we arrived at having the special case with z=0 .

Having a special case like this for division by zero can be inconsistent and even dangerous for contract design, so, apart from the number of stack items returned, the denominator-zero case should be discussed.

Lastly, rereading the Market Making Algorithm of Jedex, it says:

Notably, because Jedex aggregates liquidity for every market tick, slippage for any particular order can be significantly minimized by other orders within the same tick (in a sense, pool sizes grow dynamically based on intra-tick demand); in this context, even the relatively-simple Constant Product function offers better liquidity than much more complex pricing functions used within other decentralized exchange designs.

It would be interesting to hear from @bitjson whether Jedex would benefit from a muldiv opcode, or how it works around the multiplication overflow issue.

1 Like

I've been looking for a way to emulate full-precision a * b / c by using modulo math. It is possible by emulating a wider integer, storing it as 2x uint64:
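To illustrate the 2x-limb idea in C (my own sketch, using 32-bit half-words so that no single multiplication exceeds 64 bits):

#include <stdint.h>

// Full 64x64-bit product stored as two 64-bit limbs (hi, lo),
// built entirely from 32-bit half-word multiplications.
void mul_64x64_to_128(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    uint64_t a_lo = a & 0xffffffff, a_hi = a >> 32;
    uint64_t b_lo = b & 0xffffffff, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   // contributes to bits   0..63
    uint64_t p1 = a_lo * b_hi;   // contributes to bits  32..95
    uint64_t p2 = a_hi * b_lo;   // contributes to bits  32..95
    uint64_t p3 = a_hi * b_hi;   // contributes to bits  64..127

    uint64_t mid = (p0 >> 32) + (p1 & 0xffffffff) + (p2 & 0xffffffff);
    *lo = (p0 & 0xffffffff) | (mid << 32);
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}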

The problem for our ScriptVM is that we don't have uint64, so we can't cleanly split a uint128 into 2x uint64. Maybe we could implement it with uint63 & an emulated uint126, since a uint63 can fit into a ScriptNumber and a uint126 can fit into 2 ScriptNumbers.

The division part is the big problem: it needs a bounded loop, and in the worst case it would need 63 iterations. With the higher-limits CHIP the loop could be unrolled, but implementing it in Script would be quite expensive: lots of bytecode just to perform a single muldiv, an operation we can expect to be common.

If int32 precision is OK, then contracts can use the relatively small macro (OP_ROT OP_ROT OP_MUL OP_SWAP OP_DIV).
If more precision is needed, contracts must compromise on accuracy and input range. The latest example of this is the Fex.Cash MULDIV implementation.

I think this demonstrates why OP_MULDIV would be very valuable.

It is cheap and easy to implement the opcode. I already implemented the muldiv function in C for another purpose (multiplying a value by a small factor in the blocksize limit algorithm); OP_MULDIV would simply be doing this:

#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 uint128_t;           // GCC/Clang 128-bit extension

uint64_t muldiv(uint64_t x, uint64_t y, uint64_t z) {
    assert(z != 0);                            // division by zero is invalid
    uint128_t res = ((uint128_t) x * y) / z;   // full-width intermediate product
    assert(res <= (uint128_t) UINT64_MAX);     // fail if the quotient overflows 64 bits
    return (uint64_t) res;
}

For VM purposes, it would be operating on signed ints, and should fail on result overflow or on z=0, to be consistent with OP_MUL and OP_DIV.

1 Like

Thanks @bitcoincashautist for explaining the problem with emulating full precision for a * b / c above!

In the just-released Fex.Cash whitepaper, they explain their math workaround using modulo math:

   a * (x/y) = (floor(a/y) * y + (a%y)) * (x/y)
             = floor(a/y) * y * (x/y) + (a%y) * (x/y)
             = floor(a/y) * x + (a%y) * (x/y)

They made a separate file, MULDIV.h, to show the functions for their math workarounds; there they have the helper functions FloorMul, FloorDiv, CeilMul & CeilDiv for their modulo math. This could all be simplified, at full precision, with an op_muldiv opcode.
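A minimal C rendering of the identity above (the function name is mine, not FexCash's): it is exact under integer floor division, and the only extra requirement is that (a % y) * x fits in 64 bits, rather than the full a * x:

#include <assert.h>
#include <stdint.h>

// a * x / y == (a / y) * x + (a % y) * x / y  (floor division throughout)
uint64_t muldiv_by_parts(uint64_t a, uint64_t x, uint64_t y) {
    assert(y != 0);
    return (a / y) * x + (a % y) * x / y;
}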

With their CashScript contract, FexCash demonstrated the need for a CashScript library system. This has already been noted, and investigation into the necessary changes has started with the GitHub issue "Add support for libraries/macros".

1 Like

I’m looking at our code-point list, and we have OP_2MUL and OP_2DIV that are DISABLED. So, we could propose to replace OP_2MUL with OP_MULDIV and OP_2DIV with OP_MULMOD:

Word: OP_MULDIV
Value: 141 (hex 0x8d)
Input: a b c
Output: out
Description: a is first multiplied by b, then divided by c, and the quotient of the division is returned. The intermediate product CAN be in the INT128 range, while the quotient MUST be in the INT64 range. Sign is preserved and c MUST be a non-zero value.

Word: OP_MULMOD
Value: 142 (hex 0x8e)
Input: a b c
Output: out
Description: a is first multiplied by b, then divided by c, and the modulus of the division is returned. The intermediate product CAN be in the INT128 range, while the modulus MUST be in the INT64 range. Sign is preserved and c MUST be a non-zero value.

Above we demonstrated the utility of OP_MULDIV; the utility of OP_MULMOD is in emulating operations at even higher precision than int64.
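Reference semantics for the second opcode, as I understand the proposal (a sketch, not a spec): together the pair decomposes the wide product into quotient and remainder, i.e. a*b == muldiv(a,b,c)*c + mulmod(a,b,c), which is exactly the building block needed for emulating wider-than-64-bit arithmetic:

#include <assert.h>
#include <stdint.h>

// Remainder of the 128-bit product a*b with respect to c; it always fits in
// 64 bits because it is strictly smaller than c.
uint64_t mulmod_u64(uint64_t a, uint64_t b, uint64_t c) {
    assert(c != 0);
    return (uint64_t)(((unsigned __int128) a * b) % c);
}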

1 Like

I know this would be useful. I could have used it myself. My main concern is whether it is a good general approach that solves a class of important issues, or whether it is the right solution for only a small set of problems. In other words, any CHIP would need to explore adjacent solutions to the problems proposed and convincingly show that muldivmod (or whatever the proposal becomes) is the right choice among meaningfully investigated alternatives.

2 Likes

There are basically two different approaches. One is to mimic existing techniques, e.g. what the EVM does, with the main goal of facilitating migration of existing contracts onto BCH. The other is to provide math primitives grounded in existing hardware architecture and suited for general-purpose use. This might include support for a uint64 data type, which allows for higher precision than (signed) int64. uint64 is supported on amd64 and AArch64 architectures.

Evaluation ideally involves sample contracts from a variety of applications and then benchmarking them. Metrics include ease of coding or of compiler code generation, and small script size for network performance. These metrics are secondary to the critical requirements for consensus: a thorough specification of the opcodes including all edge cases, a reference implementation, and a complete set of test cases.

1 Like