Mitra VM: Thoughts on the architecture + instruction set

I couldn‘t sleep well the last two days so I wrote down my ideas on how a VM for Mitra could look like: https://docs.google.com/spreadsheets/d/10PVUGoOB-otUXvFLbDr3z2R6bPzjVhFqIhdlJlO-Fak/edit?usp=sharing.

The goal is to make a hyper efficient VM, so we can keep Bitcoin Cash as scalable as possible, while allowing as complex smart contracts as possible. The VM has a couple of weird constraints, which are hard to uphold:

  • It should be very fast to interpret.
  • It should be possible to compile the code to efficient machine code, but not take too long for the compilation.
  • Ideally, it should be possible to at some point build ASICs which can run these opcodes. This doesn‘t seem out of the question given ASICs are absolutely standard for miners for years now.
  • It should be relatively easy to implement.

For this, I did some research into WebAssembly, i.e. among other things analyze existing wasm files (one for Mono (C# interpreter), one for d3force and once for some animation framework), but found a few issues:

  • It‘s not very compact, e.g. the most common operation (local.get) always takes at least two bytes.
  • The most common operation by far is local.get, which means most programs spend the majority of instructions on putting values back on the stack. This is solved by using a belt, as used by the Mill CPU architecture.
  • It has two integer types, i32 and i64, which is not sufficient once we integrate fractional satoshis. I added i8, i16 and i128.
  • Each integer operation encodes the type it uses. This bloats the opcode space, and we can encode more ops if we have just one opcode for add, mul, etc.
  • It has many ops for floating-point numbers, which we can ignore and use the opcode space for other stuff.
  • The stack can grow unbounded, which means we cannot pre-allocate the required memory before executing.

WebAssembly instructions are already variable-sized. I made a instruction set which is also variable-sized but utilized the opcode space, I think, more efficiently. Rosco quoted Donald Knuth (premature optimization is the root of all evil) and I agree, so we should’t spend too much time on crafting the right instruction set and rather get to playing around with the thing as early as possible.

I also added a few other things, which are inspired by the Mill CPU architecture (see video 1, video 2, video 3 and video 4 for an intro to that architecture):

  • A belt, instead of a stack, which limits the required memory to a fixed value (currently, 256 bytes) and allows for a small opcode size.
  • NaNs, which allow for, let‘s say, interesting error handling. I haven‘t explored this idea too throroughly yet though,

I‘ve also added slices, so no allocations at all are required when evaluating a VM. Slices point to a region of memory, and can be handed around in the VM like any other value.

If I cannot sleep again at a different date, I‘ll implement an interpreter for this in Python or so, so we can experiment with this. Maybe we‘ll get a compiler for that, too.

What do you guys think?

5 Likes

The belt concept introduces potential undefined behavior, is more complex to model than a stack, and has less functionality (given it can’t pop a value). Moreover, it seems the value you’re wanting from it can be gotten simply by implementing a rotating stack ( tos = stack[pos mod stack_limit] ) which many embedded forth engines have done in the past.

Being stack based keeps the language and the compiler extremely simple. Your belt is basically a register file with complications and invites a lot of complexities for mapping local variables which, although not necessarily super complex (see lisp & scheme for example), are still an order of magnitude or two more complex than simple stack based operations - especially when you consider the possibility of multiple stacks as a classic forth engine provides.

I understand the attraction to webasm but it’s primary focus of executing Javascript as fast as possible has made some architectural decisions that make it a poor general VM which is partially evidenced by the criticisms you’ve already noted in your own suggestions.

Agree completely that a VM’s ISA should be able to be implemented as an FPGA/ASIC. This is imminently doable and desirable. The key to such things, however, is being absolutely brutal in elimination of all complexity not absolutely required. A classic stack machine is really an ideal model if done right. The BTC & ETH VMs were clearly designed by people who hadn’t a lot of experience in such things before. I’d recommend reading Phil Koopman’s “Stack Computers - the new wave” which shows what’s possible with even primitive technology.