CHIP 2021-07 UTXO Fastsync

Only the node that’s performing IBD from a snapshot must validate it, by adding the whole UTXO set to an (initially empty) set commitment, and doing that takes O(N), where N is the number of UTXOs in the snapshot. This is much faster than parsing the whole blockchain from genesis to construct the UTXO set at the same height. Instead of doing that, you’d get a snapshot for height N from any untrusted source, do the O(N) computation, and compare the result with the value written in the associated coinbase message. Then you only need the new blocks on top of that and never have to download the history (unless you want/need it).

Nodes that are already in sync would just perform incremental updates to the last verified commitment. The size of the set compressed into the hash doesn’t matter - additions and deletions are O(1).

If you have a commitment to a set of N elements and you have to update it to a set of M elements, you only need the N-element commitment and a “diff” between states N and M: some k elements that were added and j elements that were removed. The update takes O(j+k) operations, regardless of the sizes of N and M.
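To make the asymptotics concrete, here’s a minimal sketch of the algebra. It uses plain modular addition of hashes as a toy stand-in for the actual EC multiset operations (the real scheme sums elliptic-curve points), but the properties relied on - order independence, O(1) add/remove, O(j+k) diff updates - are the same.

```python
import hashlib

# Toy stand-in for the EC multiset commitment: each UTXO hashes to a number and
# the set commitment is the sum of those numbers modulo a large constant.
# The real scheme sums elliptic-curve points instead, but the algebra is the same:
# commutative, order-independent, and invertible.
MOD = 2**256 - 189  # arbitrary large modulus for this sketch

def item(utxo: bytes) -> int:
    return int.from_bytes(hashlib.sha256(utxo).digest(), "big") % MOD

def add(commitment: int, utxo: bytes) -> int:     # O(1) per addition
    return (commitment + item(utxo)) % MOD

def remove(commitment: int, utxo: bytes) -> int:  # O(1) per removal
    return (commitment - item(utxo)) % MOD

def apply_diff(commitment: int, added, removed) -> int:
    # Updating an N-element commitment to the M-element state needs only the
    # k added and j removed items: O(j + k), independent of N and M.
    for utxo in added:
        commitment = add(commitment, utxo)
    for utxo in removed:
        commitment = remove(commitment, utxo)
    return commitment

# Full snapshot validation is O(N): fold every UTXO into the empty commitment.
snapshot = [b"utxo-a", b"utxo-b", b"utxo-c"]
full = 0
for u in snapshot:
    full = add(full, u)

# Same result regardless of insertion order.
assert full == apply_diff(0, added=[b"utxo-c", b"utxo-a", b"utxo-b"], removed=[])
```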

Everything you wrote is true; nothing in it invalidates the problem I pointed out or the suggested improvements.

The problem I stated still exists: today it requires a download of roughly 10 GB of UTXO data, which then needs to be validated as a whole; if one byte is off, you have no way to detect which byte in that entire set is incorrect.
If we move to a 100 GB UTXO set this is still true, and an even bigger issue.

The idea I wrote solves that and additionally adds a lot of opportunity for more optimizations.

Nitpicking: this is not entirely true. If we assume that one day miners will include such commitments, they too will have to build it the hard way at least once.

1 Like

I will need to read up on the details far more closely, but intuitively this makes sense: the most frequent updates and changes happen to the most recent UTXOs, so some kind of bucketing system that prioritises ease of updates there may be worthwhile, even if it means trading off some performance on the other chunks.

I can imagine a world where your pruned node comes online in chunks, starting with the most critical 10% of “UTXOs from the last month”, which are also the ones most likely to be respent, and then filling in each of the following 10% chunks, which change progressively less often.

1 Like

Not at consensus level, no, but that doesn’t prevent the P2P layer from partitioning the dataset and having a way to verify the integrity of each partition, using commitments to partitions (Josh calls them “buckets”). What’s nice is that you can add/subtract a partition’s commitment to/from the top commitment without having the data that produced it. So even if your full snapshot is 100 GB and doesn’t match, but you find that a 1 GB chunk is at fault, you only need to download an alternative chunk’s commitment, EC-subtract the faulty one, then EC-add the supposedly good one, test the result against the commitment in the coinbase, and if it matches, proceed to download the 1 GB. Once finished, just compute the commitment of that 1 GB and verify it against the partition’s commitment and you’re good.
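A rough sketch of that repair flow, reusing the same toy modular-addition stand-in for the EC point arithmetic; the function names are illustrative, not from the CHIP:

```python
import hashlib

MOD = 2**256 - 189  # toy modulus standing in for EC point arithmetic

def bucket_commitment(utxos) -> int:
    # Commitment to one partition ("bucket") of the snapshot.
    return sum(int.from_bytes(hashlib.sha256(u).digest(), "big") for u in utxos) % MOD

def repaired_top(top: int, faulty: int, replacement: int) -> int:
    # "EC-subtract" the faulty bucket's commitment and "EC-add" the candidate
    # replacement -- no access to the underlying 100 GB of data is needed.
    return (top - faulty + replacement) % MOD

def worth_downloading(coinbase_commitment: int, top: int, faulty: int, replacement: int) -> bool:
    # Only if the repaired top matches the coinbase commitment do we spend
    # bandwidth fetching the 1 GB bucket; the downloaded data is then verified
    # against `replacement` itself.
    return repaired_top(top, faulty, replacement) == coinbase_commitment
```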

Sure, but the first snapshot can be built up before the activation date. This problem is really implementation-specific, although it has to be handled in a performant manner for a smooth activation.

Consider this method:

  1. Get the upgraded software 6 months ahead of the time the first commitment is to be embedded.
  2. Restart the node to run it; it quickly resumes as if nothing much had changed.
  3. The new version spawns an async process that parses the stored blockchain linearly from disk, adding/removing items to/from the commitment, until it eventually catches up to the current height and you have your commitment for some height N. It’s similar to calculating a running average, really: new_commitment = old_commitment + item for an addition, or new_commitment = old_commitment - item for a removal, where + and - are EC ops. You don’t need to store the items; you just have to be sure you don’t skip any as you process the chain (see the sketch below).
  4. When activation time comes, you just have to add/remove whatever diff happened between N and the first snapshot’s height N+x.
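A sketch of step 3 under assumed interfaces: `read_blocks` and the per-transaction accessors are hypothetical, and the commitment arithmetic is again the toy modular stand-in for the EC ops.

```python
import hashlib

MOD = 2**256 - 189  # toy modulus standing in for EC point arithmetic

def h(utxo_entry: bytes) -> int:
    return int.from_bytes(hashlib.sha256(utxo_entry).digest(), "big") % MOD

def build_commitment(read_blocks, stop_height: int) -> int:
    # Linear background pass over the locally stored chain. `read_blocks` and the
    # per-transaction accessors are hypothetical; the point is that only the running
    # commitment value is kept -- no extra UTXO copy is stored -- and that no
    # created or spent output may be skipped.
    commitment = 0
    for block in read_blocks(0, stop_height):
        for tx in block.transactions:
            for spent in tx.spent_utxo_entries:       # removal: commitment "-=" item
                commitment = (commitment - h(spent)) % MOD
            for created in tx.created_utxo_entries:   # addition: commitment "+=" item
                commitment = (commitment + h(created)) % MOD
    return commitment
```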

A minimal node implementation doesn’t need to generate, serve, or download UTXO snapshots.
It can just stream-process the UTXOs it adds/removes so it’s ready to verify the coinbase commitments.

The service that generates and serves UTXO snapshots can be independent, outside the node’s critical path!

1 Like

I’ve just been skimming this topic (and more importantly haven’t read the proposal closely), so please correct me if I’m wrong.

but you find that a 1 GB chunk is at fault, you only need to download an alternative chunk’s commitment, EC-subtract the faulty one, then EC-add the supposedly good one, test the result against the commitment in the coinbase, and if it matches, proceed to download the 1 GB.

The problem is that you can’t verify whether one specific bucket is “at fault” until you have verified all of them together. Sure, you can verify it against a P2P-level merkle tree of buckets on a non-consensus layer, but there is no way of knowing whether that merkle root matches the consensus-level EC multiset commitment.

1 Like

Ah ok, but you don’t need to download the actual data in order to know which buckets you need, so you’d keep asking fast-sync servers for sub-commitments (how big is this, 33 bytes x number of buckets?) until you get a combination which, when added up, matches the top commitment embedded in a coinbase message. THEN you ask for the partitions, and you can verify each against its sub-commitment.

1 Like

After reading the doc and some discussions on TG I understand your point.

One thing I would note is the fixed number of buckets in the Snapshot Metadata message (128). I would prefer to add a bucket count field that specifies the number of buckets for this snapshot. The value 128 could very well be marked as the recommended value, and client implementations would be free to reject any commitment whose bucket count differs from 128. This way there is freedom to roll out new recommended values in the future without needing a new utxocmts P2P message version.

On a related note, there is one fixed layer of SubBuckets in this message, and those could theoretically be further divided into SubSubBuckets, SubSubSubBuckets, and so on. Instead of the sub-buckets field in the Snapshot Bucket Metadata being of the type Snapshot SubBucket Metadata, it could be recursive, with a Snapshot Bucket Metadata type along with a SubBucket Depth field describing how deep the buckets go.
Note: the cost of inserting a transaction into such a structure is O(L) where L is the height of the tree.
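To illustrate the suggested shape (this is not the CHIP’s actual wire format, just a sketch of the proposed fields):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SnapshotBucketMetadata:
    commitment: bytes                  # 33-byte EC point committing to this (sub)bucket
    byte_count: int                    # size of the bucket's serialized UTXO data
    sub_buckets: List["SnapshotBucketMetadata"] = field(default_factory=list)

@dataclass
class SnapshotMetadata:
    block_hash: bytes
    bucket_count: int                  # proposed field; 128 stays the recommended value
    sub_bucket_depth: int              # how many recursive layers of buckets follow
    buckets: List[SnapshotBucketMetadata] = field(default_factory=list)

# Inserting or removing one UTXO touches one bucket commitment per layer,
# hence the O(L) cost noted above, where L is the depth of the bucket tree.
```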

2 Likes

How can a client verify this before the entire UTXO set is downloaded?
Let’s say a (bad-behaving) node moves one transaction from the first bucket to the last and then refuses to serve the last bucket.
A client would have to connect to a new (well-behaving) node and fetch the metadata, but the pubkey for each bucket would differ from what was fetched from the bad node, and thus the client would need to download the entire set again.

This still seems unclear to people.

  • UTXOs, as restored by the CHIP, contain enough to allow a full node to sync; they allow a full node to build on top of the restored UTXO set.

  • Wallets don’t use UTXOs, wallets use transactions.
    A UTXO set is not useful to them. No wallet exists that will take a UTXO as a historical ‘state’.
    More to the point, there is no way to prove to a thin wallet that a historical transaction is actually valid.

So for wallets, such a full node, restored from fastsync, is only useful for transactions received after the sync point. It cannot have any history at all for its owned addresses from before the sync point.

This is the current state of the tech; as is always the case with software, this can be changed, because software can be changed.

But before that is the case, I will object to any consensus changes because I consider this feature to not be finished yet.

I do agree with what you are saying above. UTXO fast-sync (with or without UTXO commitments) is indeed only useful for full nodes to be able to fully validate the blockchain from the point of synchronization. Any wallet can thus only fetch history from that point forward.

I think it is important to talk about what the benefits of such functionality might be.
One use-case from the top of my head:
Let’s say that there is a CashToken launched called DummyCoin with genesis at block X. DummyCoin gains a lot of popularity with an entire eco-system around it with independently developed DummyCoin mobile wallets, DummyCoin merchant checkout services and so on. Each user/service/organization that wishes to launch a fully validating node exclusively for DummyCoin would be uninterested in whatever happened before block X and wouldn’t need to spend the time and storage of processing decades of redundant data.
For them, the ability to quickly launch a node from, or slightly before, block X has real value, since there will be more fully validating nodes that can serve DummyCoin SPV wallets with history.

This is not mutually exclusive with nodes that keep history going back to genesis.

1 Like

I know this has been a somewhat controversial subject in the past and many people don’t agree with me, but…
I think synchronizing wallet history from genesis will eventually be a premium service. Without UTXO commitments I’m afraid that getting recent history will also come at a (monetary) cost.
There simply aren’t incentives for keeping public nodes open for everyone as the cost rises.

2 Likes

I described one of the benefits during a brainstorming session on Telegram: trustlessly bootstrapping historyless SPV-serving nodes:

  1. download block headers, verify them
  2. get SPV proof for latest coinbase TX that has utxo commitment in it, verify that
  3. get UTXO snapshot, verify that against the commitment
  4. ask SPV serving nodes for proof for each UTXO in the set
  5. sync up normally from there

This way you not only get a 100% verified UTXO snapshot, you also gather SPV proofs for each UTXO, so you can be sure you have all the UTXOs at height N and all the proofs for them.
This pruned node can serve UTXO snapshot to other nodes, too.

Imagine some hypothetical future where blocks are 32 MB, history is 10 TB, and the UTXO set is 30 GB. At a 10,000-block snapshot interval, fast-sync would require a download of at most 30 GB + 32 MB * 10,000 == 350 GB. Not much compared to a full download of 10 TB. At 4 MB/s, fast-sync would take roughly a day, while full IBD would take roughly a month.
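A quick sanity check of those numbers:

```python
# Sizes in bytes, speed in bytes per second.
GB, MB, TB = 10**9, 10**6, 10**12

fast_sync = 30 * GB + 32 * MB * 10_000   # snapshot + up to 10,000 blocks
full_ibd = 10 * TB
speed = 4 * MB

print(fast_sync / GB)                    # 350.0 (GB)
print(fast_sync / speed / 86_400)        # ~1.0 (days)
print(full_ibd / speed / 86_400)         # ~28.9 (days)
```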

Then, if you really need it, you can fill in the spent history at your convenience.

We should separate the fast-sync and commitment CHIPs, or rather, we should make a standalone commitments CHIP to hash things out, since only the fast-sync CHIP exists right now.
The commitments are the minimum consensus change needed to enable fast-sync to be trustless (they’re possible now but you need a trusted commitment, which Verde already implemented, and there’s been some work on BCHN too).
Fast-sync is the tech stack that makes use of the commitments and captures the benefits of having consensus-validated commitments: trustless bootstrapping from a snapshot.

@joshmg put it well in recent hangouts on X: “So that’s the difference between fast-syncing and UTXO commitments, UTXO commitments are actually making it a part of the rules that make a block valid, and fast-syncing is the benefit you get from having a UTXO commitment.”

1 Like

This ignores the point I made.

The fastsync is about UTXOs, wallets (and SPV) is about transactions.

You can’t just replace one with the other and pretend you can get SPV proofs, which are meant for transactions, for UTXOs.

If you think you can, please build it. Prove me wrong.

Until then, I don’t think fastsync in consensus makes much sense. Too high a risk of changes needed (by hardfork) to make the normal usecase work.

Yes, I was inaccurate, sorry, let’s try again:

Bootstrapping historyless SPV-serving nodes:

  1. Download block headers, verify them.
  2. Get SPV proof for latest coinbase TX that has the UTXO commitment of interest in it, verify that against the headers from (1.).
  3. Get UTXO snapshot, verify that against the commitment in (2.).
  4. From the UTXO snapshot, generate a deduplicated list of TXIDs that still have at least one unspent output. Ask SPV-serving nodes for each transaction and an SPV proof for each (see the sketch after this list).
  5. Sync up normally from there.
  6. The node can now serve historyless SPV wallets and also help bootstrap other such nodes by providing them with the UTXO snapshot & SPV proofs for TXs that still have unspent outputs.
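Here’s a minimal sketch of step 4’s deduplication; the snapshot entry layout is illustrative:

```python
from typing import Iterable, List, NamedTuple

class UtxoEntry(NamedTuple):
    txid: bytes              # transaction that created this output
    output_index: int
    height: int
    serialized_output: bytes

def txids_to_fetch(snapshot: Iterable[UtxoEntry]) -> List[bytes]:
    # Many UTXOs can share one parent transaction, so each TXID is listed once;
    # the node then requests that transaction plus its merkle (SPV) proof from
    # history-serving peers and checks the proof against the verified headers.
    seen = set()
    ordered = []
    for entry in snapshot:
        if entry.txid not in seen:
            seen.add(entry.txid)
            ordered.append(entry.txid)
    return ordered
```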

I agree.

This is in line with Satoshi’s “Nodes will be server farms”.

Of course, the corpos and operators running these farms will not be willing to pay the cost of maintaining the entire history back to genesis for no compensation.

But I also think that it is important that some independent/hobbyist/academic/government-subsidised nodes remain, so it can be verified at any time that somebody is not cheating and supplying clients with fake history for some nefarious/for-profit reason that we cannot imagine today.

Here’s some empirical evidence of there always being some copies of history: people seed Torrents, r/DataHoarder exists.

If Bitcoin Cash grows, having an archival node will come with big bragging rights.

Yes, also this is not going to happen in the next 20-30 years, because technology is still following Moore’s Law.

Hard drive space and Internet connection speed are rising geometrically, much faster than adoption of BCH can probably proceed (that is, assuming no Black Swan-type scenario happens that causes billions of people to suddenly want to use crypto).

1 Like

I think the crux of the problem is in the details of this step:

A client that just synced the UTXO set could in theory connect to every known Electrum server and issue blockchain.transaction.get and blockchain.transaction.get_merkle for each UTXO to achieve it. (No, I don’t think this is a good solution.)
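For illustration, a hedged sketch of that first option over the Electrum protocol (newline-delimited JSON-RPC over TCP); host, port, and error handling are placeholder-level, and as noted, doing this per UTXO doesn’t scale.

```python
import json
import socket

def electrum_call(host: str, port: int, method: str, params: list):
    # Electrum protocol: newline-delimited JSON-RPC over a plain TCP socket
    # (TLS, batching, and error handling omitted for brevity).
    with socket.create_connection((host, port)) as sock:
        sock.sendall((json.dumps({"id": 0, "method": method, "params": params}) + "\n").encode())
        response = b""
        while not response.endswith(b"\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            response += chunk
    return json.loads(response)["result"]

def fetch_tx_with_proof(host: str, port: int, txid_hex: str, height: int):
    raw_tx = electrum_call(host, port, "blockchain.transaction.get", [txid_hex])
    proof = electrum_call(host, port, "blockchain.transaction.get_merkle", [txid_hex, height])
    # The returned merkle branch still has to be checked against the block header
    # at `height`, which the client already verified from the header chain.
    return raw_tx, proof
```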

Or, since the block height of each UTXO is known, set the appropriate bloom filter and request a merkleblock for the specific block from nodes with full history. Rinse and repeat for each UTXO.

Or the node gets the data from a side-channel not specified in this spec.

One key point is that those transactions and proofs are not contained within the snapshot, and we are still dependent on someone being around and serving them in some way.

To summarize:
For a node to be able to fully validate everything (i.e. to be a “full node”) from the point of the UTXO snapshot, nothing more is needed. This is the problem solved by this CHIP, and a miner-validated commitment in a future CHIP could make this fully trustless.

For a node to be able to serve history (transactions and proofs) from the point of the UTXO snapshot, additional data needs to be fetched and validated. To me this seems like a separate CHIP, if we would like a standardized way of doing it.

1 Like

So,
you just turned the small UTXO set download into “download all full transactions that still have at least 1 unspent output in them”.

Maybe it would be good to re-do the math (how big the download is) and the fastsync CHIP based on this new approach.

I suspect it changes a LOT.

When someone implements that, so we can be certain it actually works (not just theoretically), I’d love to see some discussion on how useful or viable it is for fastsync.

That’s only if you want to run your node as a historyless SPV server.

If you just need a node wallet, you don’t need the SPV stuff, since you’ll have obtained the UTXOs from the snapshot and you’ll continue to maintain the set as part of normal node activities.