Several months later, this topic floated to the top and I noticed this comment; it's even been acted on.
To at least comment on this, I’d like to say that for the UTXO storage in Flowee I considered the var-int and rejected it after testing.
As background, Flowee implements its own database (actual low level code writing files and all that) for the UTXO itself. There are indeed lots of values in there which make little sense to store in the full 8-byte setup.
Instead, Flowee uses a little-used Bitcoin Core (Satoshi-era, really) way of compressing integers: a simple variable-byte-size system. If you have ever looked at UTF-8, it will feel familiar, as it's quite similar.
Where the ‘var-int’ we use in the transaction encoding jumps from 1 to 3, 5 or 9 bytes, this scales smoothly from 1 to 9 bytes. It also doesn’t waste a full byte to indicate which size is used. Instead, it is most easily understood as a 7-bit encoding: each byte carries 7 bits of the value, and the remaining bit says whether another byte follows.
To make a concrete example: a block height of 700000 (0x0AAE60) gets encoded in 3 bytes instead of 5.
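To make the 3-versus-5 comparison tangible, here is a minimal sketch of such a 7-bit encoding, modelled on the VARINT scheme in Bitcoin Core's serialization code as I understand it. The function names (`encodeVarInt`, `decodeVarInt`, `compactSizeLength`) are made up for this post and are not Flowee's or Bitcoin Core's actual API; Flowee's real implementation may differ in details such as the maximum length.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper names, for illustration only.

// Encode n as 7-bit groups, most significant group first. Every byte except
// the last has its high bit set to signal "more bytes follow". Subtracting 1
// before each further group removes redundant encodings, so every value has
// exactly one representation.
std::vector<uint8_t> encodeVarInt(uint64_t n)
{
    uint8_t tmp[10]; // enough room for any 64-bit value
    int len = 0;
    while (true) {
        tmp[len] = static_cast<uint8_t>((n & 0x7F) | (len ? 0x80 : 0x00));
        if (n <= 0x7F)
            break;
        n = (n >> 7) - 1;
        ++len;
    }
    std::vector<uint8_t> out;
    for (int i = len; i >= 0; --i) // groups were collected least significant first
        out.push_back(tmp[i]);
    return out;
}

// Decode one value starting at data[pos] and advance pos past it.
// Throws when the data ends mid-value or the value does not fit in 64 bits.
uint64_t decodeVarInt(const std::vector<uint8_t> &data, std::size_t &pos)
{
    const uint64_t max = std::numeric_limits<uint64_t>::max();
    uint64_t n = 0;
    while (true) {
        if (pos >= data.size())
            throw std::runtime_error("varint: ran past end of data");
        const uint8_t byte = data[pos++];
        if (n > (max >> 7))
            throw std::runtime_error("varint: value too large");
        n = (n << 7) | (byte & 0x7F);
        if ((byte & 0x80) == 0)
            return n; // high bit clear: this was the last byte
        if (n == max)
            throw std::runtime_error("varint: value too large");
        ++n; // undo the "- 1" applied by the encoder
    }
}

// Size in bytes of the transaction-style var-int (CompactSize) for comparison.
std::size_t compactSizeLength(uint64_t n)
{
    if (n < 0xFD) return 1;
    if (n <= 0xFFFF) return 3;
    if (n <= 0xFFFFFFFFULL) return 5;
    return 9;
}
```

With this sketch, `encodeVarInt(700000)` produces the three bytes A9 DB 60, while `compactSizeLength(700000)` is 5. Feeding `decodeVarInt` a buffer that ends on a byte with the high bit set throws rather than returning a value.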
Code may speak louder than words; Java, C++.
A big bonus here is that you encode a weak form of checksum in your dataset. Start reading at the wrong position and most of the time you’ll get an exception thrown. Data corruption is likewise detected much faster.
Anyway, just my 2 cents based on a couple of years working on storing the UTXO dataset.
Cc: @joshmg @groot-verde