CHIP 2021-07 UTXO Fastsync

I’ve just been skimming this topic (and more importantly haven’t read the proposal closely), so please correct me if I’m wrong.

but if you find that a 1 GB chunk is at fault, you only need to download some alternative chunk’s commitment, EC-subtract the faulty one, then EC-add the supposedly good one, test the result against the commitment in the coinbase, and if it matches, proceed to download the 1 GB.

The problem is that you can’t verify whether one specific bucket is “at fault” until you have verified all of them together. Sure, you can verify it against a P2P-level Merkle tree of buckets on a non-consensus layer, but there is no way of knowing whether that Merkle root matches the consensus-level EC multiset commitment.

Ah, OK, but you don’t need to download the actual data in order to know which buckets you need, so you’d need to keep asking fast-sync servers for sub-commitments (how big is this, 33 bytes × the number of buckets?) until you get a combination which, when added up, matches the top commitment embedded in a coinbase message. THEN you ask for the partitions, and you can verify each against a sub-commitment.
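To make that check concrete, here is a rough Python sketch (my own illustration, not the CHIP’s normative algorithm): treat each sub-commitment as a 33-byte compressed secp256k1 point, EC-add them all together, and compare the sum against the top commitment from the coinbase. The all-zero encoding for an empty bucket and the exact serialization are assumptions for the example.

```python
# Illustrative only: combine per-bucket EC multiset sub-commitments by point
# addition and compare with the top commitment. Curve math is written out in
# pure Python; a real node would use a proper secp256k1 library.

P = 2**256 - 2**32 - 977          # secp256k1 field prime
B = 7                             # curve: y^2 = x^3 + 7

def _lift_x(x: int, odd: bool) -> tuple:
    """Decompress an x coordinate into a curve point (x, y)."""
    y = pow((x * x * x + B) % P, (P + 1) // 4, P)
    if (y * y - (x * x * x + B)) % P != 0:
        raise ValueError("x is not on the curve")
    if (y & 1) != odd:
        y = P - y
    return (x, y)

def decode_commitment(blob: bytes):
    """Parse a 33-byte compressed point; all-zero bytes are assumed to mean 'empty bucket'."""
    if blob == b"\x00" * 33:
        return None                            # point at infinity / empty multiset
    return _lift_x(int.from_bytes(blob[1:], "big"), blob[0] == 0x03)

def ec_add(p1, p2):
    """Group addition on secp256k1 (None is the point at infinity)."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1) * pow(2 * y1, -1, P) % P   # tangent slope (doubling)
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P      # chord slope
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def buckets_match_top(sub_commitments: list, top: bytes) -> bool:
    """EC-add all 33-byte bucket commitments and compare with the coinbase commitment."""
    acc = None
    for blob in sub_commitments:
        acc = ec_add(acc, decode_commitment(blob))
    return acc == decode_commitment(top)
```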

After reading the doc and some discussions on TG I understand your point.

One thing I would note is the fixed number of buckets in the Snapshot Metadata message (128). I would prefer to add a bucket count field that specifies the number of buckets for this snapshot. The value 128 could very well be marked as the recommended value, and client implementations would be free to reject any commitment whose bucket count differs from 128. This way there is freedom to roll out new recommended values in the future without needing a new utxocmts P2P message version.
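For illustration only, a parsed Snapshot Metadata record with such a field might look like the sketch below. The field names, widths and the validation rule are my own, not the CHIP’s wire format.

```python
# Hypothetical shape of a Snapshot Metadata record with an explicit bucket_count
# field instead of a hard-coded 128. Not the CHIP's actual serialization.
from dataclasses import dataclass

RECOMMENDED_BUCKET_COUNT = 128

@dataclass
class SnapshotMetadata:
    block_hash: bytes                 # hash of the block the snapshot is taken at (32 bytes)
    bucket_count: int                 # proposed new field; 128 stays the recommended value
    bucket_commitments: list          # one 33-byte EC multiset commitment per bucket

    def validate(self) -> None:
        if len(self.bucket_commitments) != self.bucket_count:
            raise ValueError("bucket_count does not match the number of commitments")
        # A conservative client may still reject anything but the recommended value:
        if self.bucket_count != RECOMMENDED_BUCKET_COUNT:
            raise ValueError("unsupported bucket count")
```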

On a related note, there is one fixed layer of SubBuckets in this message, and those could theoretically be further divided into SubSubBuckets, SubSubSubBuckets and so on. Instead of the sub-buckets field in the Snapshot Bucket Metadata being of the type Snapshot SubBucket Metadata, it could be recursive, using the Snapshot Bucket Metadata type itself along with a SubBucket Depth field describing how deep the buckets go.
Note: the cost of inserting a transaction into such a structure is O(L), where L is the height of the tree.
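A toy sketch of that recursion is below, using a simple XOR-of-hashes stand-in for the EC multiset so the O(L) update path is easy to see. The bucketing rule and fanout are made up for the example.

```python
# Recursive bucket layout: every bucket either holds leaf items or sub-buckets,
# to an arbitrary depth. The commitment here is a toy stand-in (XOR of item
# hashes); a real implementation would use the EC multiset commitment instead.
import hashlib

class Bucket:
    def __init__(self, depth: int, fanout: int = 128):
        self.commitment = 0                   # toy additive commitment
        self.fanout = fanout
        self.children = (
            [Bucket(depth - 1, fanout) for _ in range(fanout)] if depth > 0 else None
        )

    def insert(self, item: bytes) -> None:
        """Walk one path from root to leaf, updating L+1 commitments (O(L))."""
        digest = hashlib.sha256(item).digest()
        self.commitment ^= int.from_bytes(digest, "big")
        if self.children is not None:
            index = digest[0] % self.fanout   # assumed bucketing rule for the example
            self.children[index].insert(item)

# e.g. Bucket(depth=1) gives a root with one layer of 128 sub-buckets.
```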

How can a client verify this before the entire UTXO set is downloaded?
Let’s say a (bad-behaving) node moves one transaction from the first bucket to the last and then refuses to serve the last bucket.
A client would have to connect to a new (well-behaving) node and fetch the metadata, but the pubkey for each bucket would differ from what was fetched from the bad node, and so the entire set would need to be downloaded again.

This still seems unclear to people.

  • UTXOs, as restored by the CHIP, contain enough to allow a full node to sync. This allows a full node to build on top of the restored UTXO set.

  • Wallets don’t use UTXOs, wallets use transactions.
    A UTXO set is not useful to them. No wallet exists that will take a UTXO as a historical ‘state’.
    More to the point, there is no way to prove to a thin wallet that the historical transaction is actually valid.

So for wallets, such a full node, restored from fastsync, is only useful for transactions received after the sync point. It cannot have any history at all for its owned addresses before the sync point.

This is the current state of the tech; as is always the case with software, this can change, because software can be changed.

But before that is the case, I will object to any consensus changes because I consider this feature to not be finished yet.

I do agree with what you are saying above. UTXO fast-sync (with or without UTXO commitments) is indeed only useful for full nodes to be able to fully validate the blockchain from the point of synchronization. Any wallet can thus only fetch history from that point forward.

I think it is important to talk about what the benefits of such functionality might be.
One use-case off the top of my head:
Let’s say that there is a CashToken launched called DummyCoin, with genesis at block X. DummyCoin gains a lot of popularity, with an entire ecosystem around it of independently developed DummyCoin mobile wallets, DummyCoin merchant checkout services and so on. Each user/service/organization that wishes to launch a fully validating node exclusively for DummyCoin would be uninterested in whatever happened before block X and wouldn’t need to spend the time and storage on processing decades of redundant data.
For those, the ability to quickly launch a node from, or slightly before, block X gains real value, since there will be more fully validating nodes that can serve all DummyCoin SPV wallets with history.

This is not mutually exclusive with nodes whose history goes back to genesis.

I know this has been a somewhat controversial subject in the past and many people don’t agree with me, but…
I think synchronizing wallet history from genesis will eventually be a premium service. Without UTXO commitments I’m afraid that getting recent history will also come at a (monetary) cost.
There simply aren’t incentives for keeping public nodes open to everyone as the cost rises.

I described one of the benefits during a brainstorm session on Telegram: trustlessly bootstrapping historyless SPV-serving nodes:

  1. download block headers, verify them
  2. get SPV proof for latest coinbase TX that has utxo commitment in it, verify that
  3. get UTXO snapshot, verify that against the commitment
  4. ask SPV serving nodes for proof for each UTXO in the set
  5. sync up normally from there

With this you not only get a 100% verified UTXO snapshot, you also gather SPV proofs for each UTXO in it. With this you can be sure you have all the UTXOs at height N and all the proofs for them.
This pruned node can serve the UTXO snapshot to other nodes, too.

Imagine some hypothetical future where blocks are 32 MB, history is 10 TB, and the UTXO set is 30 GB. At a 10k-block commitment interval, fast-sync would require a download of 30 GB + 32 MB × 10,000 = 350 GB. Not so much compared to a full download of 10 TB. At 4 MB/s, fast-sync would take roughly a day, and full IBD would take roughly a month.
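Quick sanity check of those numbers (all sizes and the 4 MB/s rate are the hypothetical figures above):

```python
# Back-of-the-envelope check of the fast-sync vs. full-IBD download times.
utxo_set   = 30 * 1024**3            # 30 GB snapshot
blocks     = 10_000 * 32 * 1024**2   # up to 10k post-snapshot blocks of 32 MB
full_chain = 10 * 1024**4            # 10 TB of history
rate       = 4 * 1024**2             # 4 MB/s download rate

fastsync_days = (utxo_set + blocks) / rate / 86_400   # ~1 day
full_ibd_days = full_chain / rate / 86_400            # ~30 days
print(f"fast-sync ≈ {fastsync_days:.1f} days, full IBD ≈ {full_ibd_days:.1f} days")
```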

Then, if you really need it, you can fill in the spent history at your convenience.

We should separate the fast-sync and commitment CHIPs, or should I say, we should make a standalone commitments CHIP to hash things out, since only the fast-sync CHIP exists right now.
The commitments are the minimum consensus change needed to make fast-sync trustless (fast-sync is possible now, but you need a trusted commitment; Verde already implemented it, and there’s been some work on BCHN too).
Fast-sync is the tech stack that makes use of the commitments and captures the benefit of having consensus-validated commitments: trustless bootstrapping from a snapshot.

@joshmg put it well in recent hangouts on X: “So that’s the difference between fast-syncing and UTXO commitments, UTXO commitments are actually making it a part of the rules that make a block valid, and fast-syncing is the benefit you get from having a UTXO commitment.”

This ignores the point I made.

Fastsync is about UTXOs; wallets (and SPV) are about transactions.

You can’t just replace one with the other and pretend you can get SPV proofs, which are meant for transactions, for UTXOs.

If you think you can, please build it. Prove me wrong.

Until then, I don’t think fastsync in consensus makes much sense. There is too high a risk that changes (by hardfork) will be needed to make the normal use case work.

Yes, I was inaccurate, sorry, let’s try again:

Bootstrapping historyless SPV-serving nodes:

  1. Download block headers, verify them.
  2. Get SPV proof for latest coinbase TX that has the UTXO commitment of interest in it, verify that against the headers from (1.).
  3. Get UTXO snapshot, verify that against the commitment in (2.).
  4. From the UTXO snapshot, generate a non-duplicate list of TXIDs that have at least one UTXO in them. Ask SPV-serving nodes for each such transaction and an SPV proof for each.
  5. Sync up normally from there.
  6. The node can now serve historyless SPV wallets and also help bootstrap other such nodes by providing them with the UTXO snapshot & SPV proofs for TXs that still have UTXOs in them.
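In rough code form the whole procedure looks like this. Every helper here is a hypothetical placeholder (stubbed out) for real P2P / Electrum / crypto plumbing; the point is only the order of the checks, not the implementation.

```python
# Structural sketch of the bootstrap procedure above; not an existing API.
from typing import Any

def _todo(*_args: Any) -> Any:
    raise NotImplementedError("placeholder for real P2P / Electrum / crypto code")

# Hypothetical helpers, stubbed:
get_headers = verify_header_chain = get_coinbase_with_proof = _todo
verify_merkle_proof = extract_utxo_commitment = get_utxo_snapshot = _todo
multiset_hash = get_tx_with_proof = store_tx = sync_blocks_from = _todo

def bootstrap_historyless_spv_server(peers, electrum_peers):
    headers = get_headers(peers)                          # 1. headers-first sync
    assert verify_header_chain(headers)

    coinbase, cb_proof = get_coinbase_with_proof(peers)   # 2. coinbase carrying the commitment
    assert verify_merkle_proof(coinbase, cb_proof, headers)
    commitment = extract_utxo_commitment(coinbase)

    snapshot = get_utxo_snapshot(peers)                   # 3. the snapshot itself
    assert multiset_hash(snapshot) == commitment

    txids = {utxo.txid for utxo in snapshot}              # 4. unique parent TXIDs
    for txid in txids:
        tx, tx_proof = get_tx_with_proof(electrum_peers, txid)
        assert verify_merkle_proof(tx, tx_proof, headers)
        store_tx(tx, tx_proof)

    sync_blocks_from(headers[-1])                         # 5./6. normal operation; the node can
                                                          # now serve historyless SPV wallets
```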

I agree.

This is in line with Satoshi’s “Nodes will be server farms”.

Of course, the corpos and operators running these farms will not be willing to pay the costs of maintaining the whole tree of history back to genesis for no compensation.

But I also think that it is important that some independent/hobbyist/academic/government-subsidized nodes remain, so it can be verified at any time that somebody is not cheating and supplying clients with fake history for some nefarious or for-profit reason that we cannot imagine today.

Here’s some empirical evidence that there will always be some copies of history: people seed torrents, and r/DataHoarder exists.

If Bitcoin Cash grows, having an archival node will come with big bragging rights.

Yes, also this is not going to happen in the next 20-30 years, because technology is still following Moore’s Law.

Hard drive space and Internet connection speed are rising geometrically, much faster than adoption of BCH can probably proceed (that is, assuming no Black-Swan-type scenario happens that causes billions of people to suddenly want to use crypto).

I think the crux of the problem is the details of this step:

A client that just synced the UTXO set could in theory connect to every known Electrum server and issue blockchain.transaction.get and blockchain.transaction.get_merkle for each UTXO to achieve it. (No, I don’t think this is a good solution.)

Or, since the block height of each UTXO is known, set the appropriate bloom filter and request a merkleblock for that specific block from nodes with full history. Rinse and repeat for each UTXO.

Or the node gets the data from a side-channel not specified in this spec.
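For the first (Electrum) option above, a single query could look roughly like this. blockchain.transaction.get and blockchain.transaction.get_merkle are existing Electrum-protocol methods, while the host/port, the plain-TCP connection (real servers usually want TLS), and the lack of batching and error handling are simplifications for the sketch.

```python
# Rough sketch: fetch one UTXO's parent transaction and its merkle branch from
# an Electrum server via newline-delimited JSON-RPC. Illustration only.
import json, socket

def electrum_call(sock, method, params, request_id=0):
    sock.sendall(json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    ).encode() + b"\n")
    buf = b""
    while not buf.endswith(b"\n"):
        buf += sock.recv(4096)
    return json.loads(buf)["result"]

def fetch_tx_and_proof(host, port, txid, height):
    with socket.create_connection((host, port)) as sock:
        raw_tx = electrum_call(sock, "blockchain.transaction.get", [txid], 1)
        proof = electrum_call(sock, "blockchain.transaction.get_merkle", [txid, height], 2)
        return raw_tx, proof   # proof carries the merkle branch and the tx position in the block
```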

One key point is that those transactions and proofs are not contained within the snapshot, and we are still dependent on someone being around and serving them in some way.

To summarize:
For a node to be able to fully validate everything (i.e. be a “full node”) from the point of the UTXO snapshot, nothing more is needed. This is the problem solved by this CHIP, and a miner-validated commitment in a future CHIP could make this fully trustless.

For a node to be able to serve history (transactions and proofs) from the point of the UTXO snapshot, additional data needs to be fetched and validated. To me this seems like a separate CHIP, if we would like a standardized way of doing it.

So,
you just turned the small UTXO set download into “download all full transactions that still have at least 1 unspent output in them”.

Maybe it would be good to re-do the math (how big the download is) and revise the fastsync CHIP based on this new approach.

I suspect it changes a LOT.

When someone implements that, so we can be certain it actually works (not just theoretically), I’d love to see some discussion on how useful or viable it is for fastsync.

That’s only if you want to run your node as a historyless SPV server.

If you just need a node wallet, you don’t need the SPV stuff, since you’ll have obtained the UTXOs from the snapshot and you’ll continue to maintain the set as part of normal node activities.

You are not wrong, but for that use case I suggest you just run an SPV wallet instead. It’s massively less download for the same gain.
It’s also permissionless, trustless, and actually works on-chain today.

and fully dependent on altruistic nodes that need to sync all history from genesis and be ready to support such wallets’ queries.

yes,

so let’s aim to make fastsync actually capable of doing so. Because if we don’t, then the vast majority of people, who are indeed going to use SPV wallets, can no longer use Bitcoin Cash as more and more fastsync nodes start popping up.

As I have pointed out for some time, there is a logical error in your steps for getting from the basic sync, as the CHIP now states it, to actually being able to serve wallets. If you don’t believe me, then prove me wrong by building your suggestion as some proof of concept.

Arguing here makes no sense. This forum software is telling me I’ve commented enough on this thread. And, you know, it’s right. I’m not going to keep nicely trying to save you from the mistakes I have been seeing (for 18 months now) and you are not seeing.

Build it; it saves us a lot of talk.

BLISS Presentation: Committing to UTXOs with Calin Culianu
