I couldn‘t sleep well the last two days so I wrote down my ideas on how a VM for Mitra could look like: https://docs.google.com/spreadsheets/d/10PVUGoOB-otUXvFLbDr3z2R6bPzjVhFqIhdlJlO-Fak/edit?usp=sharing.
The goal is to make a hyper efficient VM, so we can keep Bitcoin Cash as scalable as possible, while allowing as complex smart contracts as possible. The VM has a couple of weird constraints, which are hard to uphold:
- It should be very fast to interpret.
- It should be possible to compile the code to efficient machine code, but not take too long for the compilation.
- Ideally, it should be possible to at some point build ASICs which can run these opcodes. This doesn‘t seem out of the question given ASICs are absolutely standard for miners for years now.
- It should be relatively easy to implement.
For this, I did some research into WebAssembly, i.e. among other things analyze existing wasm files (one for Mono (C# interpreter), one for d3force and once for some animation framework), but found a few issues:
- It‘s not very compact, e.g. the most common operation (local.get) always takes at least two bytes.
- The most common operation by far is local.get, which means most programs spend the majority of instructions on putting values back on the stack. This is solved by using a belt, as used by the Mill CPU architecture.
- It has two integer types, i32 and i64, which is not sufficient once we integrate fractional satoshis. I added i8, i16 and i128.
- Each integer operation encodes the type it uses. This bloats the opcode space, and we can encode more ops if we have just one opcode for add, mul, etc.
- It has many ops for floating-point numbers, which we can ignore and use the opcode space for other stuff.
- The stack can grow unbounded, which means we cannot pre-allocate the required memory before executing.
WebAssembly instructions are already variable-sized. I made a instruction set which is also variable-sized but utilized the opcode space, I think, more efficiently. Rosco quoted Donald Knuth (premature optimization is the root of all evil) and I agree, so we should’t spend too much time on crafting the right instruction set and rather get to playing around with the thing as early as possible.
I also added a few other things, which are inspired by the Mill CPU architecture (see video 1, video 2, video 3 and video 4 for an intro to that architecture):
- A belt, instead of a stack, which limits the required memory to a fixed value (currently, 256 bytes) and allows for a small opcode size.
- NaNs, which allow for, let‘s say, interesting error handling. I haven‘t explored this idea too throroughly yet though,
I‘ve also added slices, so no allocations at all are required when evaluating a VM. Slices point to a region of memory, and can be handed around in the VM like any other value.
If I cannot sleep again at a different date, I‘ll implement an interpreter for this in Python or so, so we can experiment with this. Maybe we‘ll get a compiler for that, too.
What do you guys think?