I like data structures and thinking about data-lookup techniques. Over the last decade I’ve progressed from simply using databases to actually writing various types of them; it is an odd hobby of mine. The UTXO one is part of Flowee today.
A database that is meant to be used and to grow over time works best when it is designed with the specific use case in mind. For instance, the UTXO database stores all not-yet-spent money on a chain; its use case is thus to store unspent coins.
It grows with the number of users. Individuals will have a relatively stable number of actual coins (outputs) because the database can delete old ones that are no longer used. Spending one coin and creating a new one means there is no change in database size.
So the number of UTXO entries grows at the same rate as the number of people using the coin.
This is super useful information for making sure your database stays scalable and for predicting how many users it will be capable of supporting.
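The "spend one, create one" behaviour can be sketched with a toy model. This is only an illustration under simple assumptions (every user holds exactly one coin and every spend creates exactly one new output); the names `utxo`, `add_coin` and `spend` are made up for this example and are not Flowee’s actual API.

```python
# Toy UTXO set: maps an outpoint (txid, output index) to an amount.
# All names are hypothetical, chosen only for this illustration.
utxo = {}

def add_coin(txid, index, amount):
    utxo[(txid, index)] = amount

def spend(outpoint, new_txid):
    """Spend one coin and create one new coin of the same value."""
    amount = utxo.pop(outpoint)      # the spent entry is deleted
    add_coin(new_txid, 0, amount)    # exactly one new entry replaces it

# 1,000 users, one coin each.
for user in range(1000):
    add_coin(f"genesis-{user}", 0, 50)

size_before = len(utxo)

# Every user moves their coin ten times.
for round_number in range(10):
    for outpoint in list(utxo):      # snapshot, since we mutate the dict
        spend(outpoint, f"{outpoint[0]}/r{round_number}")

size_after = len(utxo)
print(size_before, size_after)       # the two sizes are equal
```

Ten thousand transactions later the set holds as many entries as it started with; only adding users (more `add_coin` calls) makes it bigger.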
The UTXO-database example is linear. If the number of users goes from 10 million to 100 million, the amount of data stored and processed by the database will also grow roughly 10x.
Linear scaling is OK
Linear databases grow with the number of users. Note that the number of users will likely grow faster and faster over time. This is normal; it is how adoption works. Moore’s law is what takes care of that accelerated growth of users.
Non-linear databases
A good example of this is an address-lookup database.
The main difference between the UTXO database and an address database is the deletion of old records. In the UTXO database we remove items that are used up. In an address database we don’t. All addresses since the beginning of the chain will forever be searchable.
We are happy that privacy is a focus point on our chain: people use HD wallets and we advise using a new address for every single receiving transaction. This is great for privacy. Not so great for the growth of an address indexer.
Now we have two growth dimensions. More users create more addresses, but since we don’t delete old entries we also have per-month usage as a growth dimension. We went from linear growth to exponential growth. From only caring about how many users we add to also caring how many addresses each user creates every month.
Remember the traditional exponential growth explanation from the Indian story about the brahmin Sissa ibn Dahir, who reportedly invented chess.
Sissa invents chess for an Indian king for educational purposes. In gratitude, the King asks Sissa how he wants to be rewarded. Sissa wishes to receive an amount of grain which is the sum of one grain on the first square of the chess board, and which is then doubled on every following square.
It looks cheap. First is one grain. Then 2, 4, 8, 16, 32, 64 grains. Etc.
But the total number of grains can be shown to be 2⁶⁴ - 1 or 18,446,744,073,709,551,615 (eighteen quintillion, four hundred forty-six quadrillion, seven hundred forty-four trillion, seventy-three billion, seven hundred nine million, five hundred fifty-one thousand, six hundred and fifteen, over 1.4 trillion metric tons), which is over 2,000 times the annual world production of wheat. (Wikipedia)
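The arithmetic is easy to check yourself. A minimal sketch in Python, whose integers have arbitrary precision, so nothing overflows:

```python
# One grain on the first square, doubled on each of the following 63 squares.
squares = [2 ** n for n in range(64)]
total = sum(squares)

print(squares[:7])           # [1, 2, 4, 8, 16, 32, 64]
print(total == 2 ** 64 - 1)  # True
print(f"{total:,}")          # 18,446,744,073,709,551,615
```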
That gives a small introduction to exponential growth. Its numbers go up much faster than most people realize. The point is: any system that has to support such growth will ultimately reach its limits and fail.
So, in a world with total stagnation and no new users, the UTXO database will similarly not grow. But in this non-growth world the address database will continue to grow. Now back to the real world, where we want massive growth of users: there your address database will grow exponentially in size and usage.
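The difference between the two databases can be put into a rough model. This is a sketch, not a measurement: the parameters `coins_per_user` and `new_addresses_per_user` are invented numbers, and the doubling of users every six months is just an assumed adoption curve.

```python
def utxo_entries(users, coins_per_user=3):
    """UTXO set: depends only on the current number of users."""
    return users * coins_per_user

def address_index_entries(users_per_month, new_addresses_per_user=5):
    """Address index: every address ever created stays in the index."""
    return sum(users * new_addresses_per_user for users in users_per_month)

# Scenario A: total stagnation, 1,000 users for 24 months.
flat = [1000] * 24
# Scenario B: assumed adoption curve, users double every 6 months.
growing = [1000 * 2 ** (month // 6) for month in range(24)]

for name, users_per_month in (("stagnation", flat), ("growth", growing)):
    print(name,
          "UTXO entries:", utxo_entries(users_per_month[-1]),
          "address entries:", address_index_entries(users_per_month))
```

With no new users the UTXO count is the same in month 1 and month 24, while the address index doubles between month 12 and month 24. Once the user count itself climbs, the address index has to absorb both the new users and the entire history of everyone who came before.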
Why does that matter?
In the short term we don’t have that big a user base and it doesn’t matter much.
What is important is to realize the underlying scaling issue and to avoid building infrastructure that depends on such indexers. Those low-level decisions will cause centralization over time, when the only people who can run such databases are the ones investing lots of money in hardware.
The blockchain itself is a data-store, not a database. It, too, will have exponential growth. But storage is cheap and the blockchain being a data store means it won’t actually be used for lookup or inserts.
Indeed, blockchain won’t work without databases like the UTXO one. It would thus be wise to only use and depend on databases that have linear growth.
Why is this an unpopular opinion?
Because in Bitcoin Cash the vast majority of projects / wallets depend on such databases.