CHIP 2021-07 UTXO Fastsync

you misread. That was not claimed by anyone.

That was directly claimed by you. Maybe you miswrote.

Let me remind you, you said:

So we have freetrader, a person that doesn’t want to talk on anything but text, backing up BCA, who also refuses video or voice chat. Text only.
And they both support a project that is not supported by any of the actually known full node developers, this project is explicitly stated to introduce the one thing that killed BTC.

In the topic about UTXO commitments / EDIT: UTXO fastsync.

If this is not what you meant, maybe you should edit your post because it is highly unclear.

Tom seems to be saying that BTC introducing adaptive block sizes like in this proposal (which I support, for now) is what killed BTC.

Ignoring that

  1. BTC didn’t introduce adaptive block sizes

  2. BTC hasn’t been “killed” or even died naturally yet

  3. Full node developers such as BU’s devs - who are also BCH developers - are evidently supportive of adaptive blocksizes, so much so that they just implemented virtually this exact proposal in NEXA (compare Design and operation of the adaptive blocksize feature - credit to BCA for noticing that it’s basically identical)

  4. Tom does not supply any links to statements from other full node developers to back his claim that they do not generally support what’s being proposed here. Maybe it’s because it’s just too early for them to comment, this is a research work in progress after all… But I did not see any other full node developers backing Tom’s “do nothing and leave it to the market” CHIP either.

  5. It isn’t clear which known full node developers do not support this proposal. This thread is there for them to speak up at any time.

2 Likes

Uh, are we all going offtopic now?

This thread is about UTXO Fastsync, not “Asymmetric Moving Maxblocksize Based On Median”.

This is why I pointed out to Tom that his response does not make sense.

This may be confusing, but since you speak specifically about “this thread”, I conclude you are asking people supporting UTXO fastsync to speak up? I think most people support it in principle and are waiting for the details.

Here is my voice of support: Supporting UTXO Commitments

move it here => CHIP 2023-01 Ensuring safe permissionless block growth

There is no point.

Your CHIP also does not make sense, the same way Javier Gonzales’ BMP did not make sense.

This topic has been beaten to death like 10 times already. It will not work, because miners do not make decisions in this ecosystem; they are not active players. They will not be setting any parameters based on network conditions at all, unless some serious catastrophe happens.

What we are doing here is already very offtopic, so I will not explain it more here.

Back on-topic,

Upon re-reading im_uname’s original proposal, I realized I had re-discovered a problem he had already covered:

A: This proposal adds additional requirements for any fast-sync schemes: in additional to hash of the UTXO, blocksizes need to be committed and verified as well. Fast-syncing clients should download blocksize commitments and use them to determine consensus maximum blocksize limit for new blocks.

even though he didn’t specify where and how often to commit. I brainstormed it with Josh & Andrew on Telegram; in conclusion, quoting myself:

so really the blocksizes commitment could be an entirely separate thing from the utxo commitment, but it would maybe be more practical if they coincided and one could find them in the same block

Also, on Slack, @dagurval told me that Bitcoin Unlimited’s new blockchain has addressed this by including block sizes in their block headers, which is not an option for BCH.
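
To make that conclusion concrete, here is a minimal sketch of what a combined record could look like if the two commitments coincided in the same block. All field names are hypothetical, not from any spec:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FastSyncCommitment:
    """Hypothetical combined commitment record, produced once per snapshot block.

    A fast-syncing client would use `utxo_multiset_digest` to verify the downloaded
    UTXO snapshot, and `recent_block_sizes` to reconstruct whatever consensus
    maximum-blocksize algorithm depends on historical block sizes.
    """
    snapshot_height: int
    utxo_multiset_digest: bytes       # 33-byte EC multiset commitment of the UTXO set
    recent_block_sizes: List[int]     # block sizes in the blocksize algorithm's lookback window
```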

1 Like

This thread received some flags. I invite you to keep focused on the topic in discussion and avoid anything that may look like a personal attack or directed misinterpretation (intentional or not).
For the sake of BCH, let’s focus on what unites us and try to smooth things over with respect.
Thanks.

5 Likes

Do you think it would make sense to split the UTXO buckets using sorting based on locking script instead?

I’m not very sure of the possible implementation, but I imagine something where the UTXO set would be sorted by locking script and then split such that all 128 buckets are of uniform size, or close to it.

Here, the snapshot SubBucket Metadata may then look like:

| Field Name | Byte Count | Format | Description |
|---|---|---|---|
| public key | 33 | public key, big endian | The secp256k1 point of the EC multiset for this UTXO snapshot sub-bucket. |
| start lockingScriptByteCount | 1-4 | compact variable length integer | The number of bytes in the starting** locking script of the SubBucket (as a compact variable length integer). |
| start lockingScript | ? | big endian | The starting** locking script of the SubBucket (“scriptPubKey”, “pk_script”). |
| byte count | 8 | integer, little endian | The number of bytes within this UTXO snapshot sub-bucket. The sum of these must equal the byte count of the snapshot bucket. |

** The ending lockingScript of a SubBucket is the starting lockingScript of the next.
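
A rough Python sketch of the intended split (the pairing of locking script and serialized UTXO, and the function name, are mine, not from any spec): sort the set by locking script, then cut it into roughly uniform-size sub-buckets and record the starting locking script of each.

```python
def split_into_subbuckets(utxos, subbucket_count=128):
    """Sort serialized UTXOs by locking script and cut them into roughly
    uniform-size sub-buckets.

    `utxos` is an iterable of (locking_script, serialized_utxo) byte pairs;
    the pairing and serialization are placeholders, not from any spec.
    Returns a list of (start_locking_script, byte_count, members) tuples.
    """
    ordered = sorted(utxos, key=lambda u: u[0])
    total_bytes = sum(len(serialized) for _, serialized in ordered)
    target = max(1, total_bytes // subbucket_count)

    subbuckets, current, current_bytes = [], [], 0
    for locking_script, serialized in ordered:
        current.append((locking_script, serialized))
        current_bytes += len(serialized)
        # Close the sub-bucket once it reaches the target size,
        # except for the last one, which absorbs the remainder.
        if current_bytes >= target and len(subbuckets) < subbucket_count - 1:
            subbuckets.append((current[0][0], current_bytes, current))
            current, current_bytes = [], 0
    if current:
        subbuckets.append((current[0][0], current_bytes, current))
    return subbuckets
```

A node following this ordering deterministically would produce the same sub-bucket boundaries from the same UTXO set, which is what would let a wallet cross-check buckets obtained from different nodes.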

Advantages:

  • This can foster a new kind of light wallet that can easily query its UTXOs from nodes. The wallet can query extra SubBuckets, or SubBuckets from different nodes, to circumvent nodes omitting UTXOs by not following the ordering. By checking the ordering in the SubBucket and its EC multiset, the wallet can determine whether the node follows the UTXO ordering.
  • The wallet may maintain the privacy of the user by downloading extra SubBuckets to obfuscate the intended locking script query. These extra SubBuckets can also be used to check the UTXO ordering of the node.
  • Such wallets won’t have to build a history of transactions over the blockchain or query the entire UTXO set. This gets even safer if the EC multiset is committed to the block header.
  • Since such a UTXO snapshot is generated once per block, a node does not need to generate additional filters for each wallet subscribing to a locking script (or maybe my understanding of SPV is at fault).

Disadvantages:

  • Will add the overhead of splitting the UTXO set algorithmically. However, clever schemes of passing UTXOs from one SubBucket to another can leverage the O(n) operation of generating the EC multisets.
  • Nodes may not follow the UTXO ordering closely, which would lead to omission of UTXOs.
  • Increases the size of P2P UTXO Commitments (“utxocmts”) message.
  • The UTXO set may be flooded with UTXOs having the same locking script to make the ordering and SubBucket formation difficult.
1 Like

I don’t know what the current state of these UTXO commitment ideas is, but Tadge Dryja of Lightning Network fame did a whole talk about block size increases and a “proof of inclusion” UTXO system to scale on-chain on BTC. Interesting idea to observe.

1 Like

There are some pretty big problems that need solving, and nobody is working on them.

  1. The basic concept of compressing the entire UTXO set to a single hash is stalled; the approaches tested have proven not to be good enough, and the resource usage would be too great.
  2. Starting a full node from a commitment is useless with the current state of pruning. That node would not be able to share historical blocks, it would not be able to serve SPV clients (or act as a Fulcrum backend), and it would not be much use for anything other than a private wallet or mining node.

Should all this get solved and the system work well, I would still suggest having selected miners run it permissionlessly for maybe a year before the commitment is included in the block header (somehow) in some mandatory fashion.

I’ve been out of the loop for a little while, but getting back into it, so forgive me if the answer to this question is somewhere above and I just missed it. That being said, where did your assertion come from? I don’t think I saw anything about the EC multiset being too expensive. Did BCHN do a test and that was their conclusion, or am I misunderstanding?

Fast synced nodes are pruned nodes, so I think what you’re saying is that pruned nodes are worthless, so therefore fast-sync is worthless. Is that a correct interpretation of your opinion? If so, I think it’s not quite a fair conclusion; Bitcoin Verde can run its explorer with a pruned node (which obviously omits pruned data but is useful for address balance-checking and transaction review/inspection). Additionally, pruned nodes can serve most requests needed on the electrum API; the only downside is you won’t be able to get a list of your historical spent outputs. However, I think that’s perfectly acceptable in the scheme of long-term scaling. Let me know if you’re thinking of things I am not.

2 Likes

Hi Josh,

things have not changed since this post.

What you write is possible with such pruned nodes; I acknowledged that in the post you replied to, just in different words.

For a permissionless idea, it’s fine. It works for private nodes: you can trust it and point your wallet to it, and most of the time it will work. Just don’t import a seed, sync a BCMR, or do anything else which requires historical transactions.

The point is that pruning as defined in the whitepaper is about removing only data that is no longer relevant because it has been spent. This is not yet implemented in any software. The pruning we actually have is less subtle: it deletes entire blocks at a time, which includes unspent transaction outputs.

Pruning can be fixed to delete only the spent outputs; this, together with Merkle proofs, means you can still do SPV trustlessly, because the node can provide historical transactions AND their Merkle proofs as long as said transactions are unspent.
Today’s pruned node, in contrast, is incapable of generating a Merkle proof for an unspent transaction, because it doesn’t have the Merkle tree for that block at all; the entire block is removed.
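
A minimal sketch of the distinction, under the assumption that Merkle branches are computed before the block data is dropped (all types here are hypothetical, purely to illustrate whitepaper-style pruning):

```python
from typing import List, Set, Tuple


def prune_whitepaper_style(
    header: bytes,
    transactions: List[Tuple[bytes, bytes, List[bytes]]],  # (txid, raw tx, Merkle branch)
    unspent_txids: Set[bytes],                              # txids with at least one unspent output
) -> Tuple[bytes, List[Tuple[bytes, bytes, List[bytes]]]]:
    """Keep only transactions that still have unspent outputs, together with their
    Merkle branches, so the node can keep serving SPV proofs for them.
    Block-level pruning as implemented today would instead drop everything below.
    """
    retained = [(txid, tx, branch) for txid, tx, branch in transactions if txid in unspent_txids]
    return header, retained
```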

So, sure, you can spin up a full node that is pruned, but people can’t get the proofs they need from it; they have to trust your node. And that misses the point for the wider network.

So, please don’t take this as criticism of Verde. The fact that this is on this website led me to assume that this is about the interoperability of multiple nodes, and perhaps about moving this to the consensus layer. That is the basis of my comments.

The bottom line is: if most nodes being spun up start using this commitment method, it will stop the wallets of actual users from being able to do their thing.

I’ve been spending more time looking, but I can’t find anyone reporting otherwise. I can’t find a single report of CPU time or data-processing time. I can’t find any results stating it is “good enough”.

If you have actual information stating otherwise, like the number of milliseconds used to process a block of N transactions (or outputs), that would help us determine that it is in fact good enough.

In my world, if nobody publishes their success, there is no success and further research is needed. Apologies if that is naive; I just don’t have time to implement the algo myself to test if it’s good enough. I assume that if others reach success they will say so, loud and proud…

1 Like

Because there is none, yet. The “Implementations costs and risks” and “Ongoing costs and risks” sections are still marked “TODO”, which is why this hasn’t moved forward for a while. Someone™ has to do the necessary work, and it hasn’t happened yet.

Agreed. However, I think we can also make a preliminary analysis from first principles. Based on that, the expectation is that it will be proven performant once performance tests do get done, else this would all be a waste of time. Let’s do the preliminary analysis here step by step.

First we have to make one thing clear, because it impacts everything downstream. This proposal is NOT using Merkle tree commitments, and it is NOT using Utreexo like what ZeroSync for BTC would. This proposal is using EC Multiset as the building block - which has the feature we want: O(1) insertion and removal.

The trade-off is that it can’t be used to verify that a single element belongs to the set. It can only be used to verify that the whole set matches the commitment - and that’s OK, because it is the only thing that we want from it: be able to verify the whole UTXO state at some height N, obtained from some untrusted source.

Let’s say you have commitment N and UTXO database state at height N, and you want to generate a commitment for UTXO database state at height N+X, what do?

To get the database state, you do what you do anyway - delete and insert UTXOs from / into the database as you process each TX in a block, right? That’s the O(1) operation nodes already do.
At the same time, you can delete and insert from / into the commitment, which would be just another O(1) operation performed for each UTXO spent / created by a TX, and one that doesn’t require disk I/O (unlike actual UTXO db ops). Updating the commitment to add or remove a single UTXO is an operation that fundamentally requires a single hash operation and a single EC addition or subtraction - much cheaper than a sigop, which requires an EC mul.
This wouldn’t change the O(N) scaling law for block validation - it would only slightly increase the cost per transaction.
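
To make the O(1) add/remove property concrete, here is a minimal Python sketch of a multiset hash over secp256k1. It only illustrates the mechanism: the hash-to-curve scheme and element serialization here are placeholders and do not match the ECMH specification used by existing implementations, and the affine arithmetic is far from optimized.

```python
import hashlib

# secp256k1 parameters
P = 2**256 - 2**32 - 977           # field prime
B = 7                              # curve equation: y^2 = x^3 + 7
INFINITY = None                    # identity element of the group


def _hash_to_point(data: bytes):
    """Map arbitrary bytes to a curve point by try-and-increment.

    NOTE: placeholder scheme for illustration only; the real ECMH spec
    defines its own hash-to-curve and element serialization.
    """
    counter = 0
    while True:
        h = hashlib.sha256(data + counter.to_bytes(4, "little")).digest()
        x = int.from_bytes(h, "big") % P
        y_sq = (pow(x, 3, P) + B) % P
        y = pow(y_sq, (P + 1) // 4, P)       # square root works because P % 4 == 3
        if (y * y) % P == y_sq:
            return (x, y)
        counter += 1


def _add(p1, p2):
    """Affine point addition on secp256k1 (slow, for illustration only)."""
    if p1 is INFINITY:
        return p2
    if p2 is INFINITY:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:      # P + (-P) = identity
        return INFINITY
    if p1 == p2:                             # point doubling
        lam = (3 * x1 * x1) * pow(2 * y1, -1, P) % P
    else:                                    # general addition
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def _neg(p):
    if p is INFINITY:
        return INFINITY
    x, y = p
    return (x, (P - y) % P)


class MultisetHash:
    """Commitment to an unordered set: O(1) insert and remove per element."""

    def __init__(self):
        self.point = INFINITY                # empty-set commitment

    def insert(self, element: bytes):
        self.point = _add(self.point, _hash_to_point(element))

    def remove(self, element: bytes):
        self.point = _add(self.point, _neg(_hash_to_point(element)))

    def digest(self) -> bytes:
        if self.point is INFINITY:
            return b"\x00" * 33              # sketch convention for the empty set
        x, y = self.point
        return bytes([2 + (y & 1)]) + x.to_bytes(32, "big")   # 33-byte compressed point
```

Updating the commitment for a block is then one `insert` per created output and one `remove` per spent output, independent of the total UTXO set size.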

I found something related that could give us an idea of what to expect:

On the other hand, we demonstrate a highly-efficient software implementation of ECMH, which our thorough empirical evaluation shows to be capable of processing over 3 million set elements per second on a 4 GHz Intel Haswell machine at the 128-bit security level—many times faster than previous practical methods.

Maitin-Shepard, Jeremy, et al. “Elliptic Curve Multiset Hash” (2016)

2 Likes

Thank you for confirming my point; it looks like the problem may be smaller than I thought, but still not exactly solved. So my short update wasn’t wrong, but more detail is good. Although it’s not a short update anymore :wink:

On the multiset approach.

This approach has a bit of a problem in that it has no intermediate state. What I mean by that is that you may end up modifying this 33-byte array until the end of time, and the cost of validating it will continue to rise as the UTXO set itself grows in size.

Most systems have something like a checksum they apply to a much smaller section of the data; fail fast is a first-principles design approach.

This reminds me of the design I applied in the UTXO database that is part of Flowee the Hub. Instead of having one huge database, it is split into a bunch of databases, based on the very real statistic that UTXO spending tapers off as the output gets older. If you were to plot UTXO lifetimes, you’d see that most outputs are spent after a very short lifetime, and the spend rate drops off steadily as the outputs age.

The effect of this is that if you were to design a database that holds only the first N years (or first M million UTXO entries), it will see fewer and fewer updates as time moves on.

In Flowee the Hub you’ll find it creates some 20 databases as it syncs the chain from zero, with fewer and fewer changes to each database as new ones are added.

The same could be done with multisets: you could have a new multiset every 6 months, with a multitude of benefits. Not least of which is that it makes downloads much less risky, as each of those sets can now be a separate download.
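
A sketch of what that could look like, building on the MultisetHash illustration above (the six-month epoch length in blocks is an assumption):

```python
BLOCKS_PER_EPOCH = 26_000   # roughly six months of blocks (assumption)


class EpochedMultisets:
    """One multiset per creation-time epoch; old epochs see fewer and fewer updates."""

    def __init__(self):
        self.epochs = {}                     # epoch index -> MultisetHash

    def _epoch(self, creation_height: int) -> int:
        return creation_height // BLOCKS_PER_EPOCH

    def insert(self, utxo_bytes: bytes, creation_height: int):
        self.epochs.setdefault(self._epoch(creation_height), MultisetHash()).insert(utxo_bytes)

    def remove(self, utxo_bytes: bytes, creation_height: int):
        # A spend always touches the epoch the output was created in,
        # so recent epochs absorb nearly all of the churn.
        # Assumes the output was inserted into that epoch earlier.
        self.epochs[self._epoch(creation_height)].remove(utxo_bytes)
```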

Only the node that’s performing IBD from a snapshot must validate it by adding the whole UTXO set to an empty commitment, and doing that will take O(N), where N is the number of UTXOs in the snapshot. This is much faster than parsing the whole blockchain from 0 to construct the UTXO set at the same height. Instead of doing that, you’d get a snapshot for the target height from whatever untrusted source, do the O(N) verification, and compare the result with the value written in the associated coinbase message. Then, you only need the new blocks on top of that and never have to download the history (unless you want/need it).

Nodes that are in sync would just be performing incremental updates to the last verified commitment. The size of the set compressed into the hash doesn’t matter: additions and deletions are O(1).

If you have a commitment to a set of N elements, and you have to update it to a set of M elements, you only need the N commitment and a “diff” between states N and M: some k elements that were added and j elements that were removed, and the update will take O(j+k) operations, regardless of N and M sizes.
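
As a usage sketch of those two modes, again building on the MultisetHash illustration above (the UTXO serialization is left abstract):

```python
def verify_snapshot(snapshot_utxos, committed_digest: bytes) -> bool:
    """One-time O(N) check a fast-syncing node performs on a downloaded snapshot.

    `snapshot_utxos` is an iterable of serialized UTXOs (bytes); the exact
    serialization is whatever the commitment scheme specifies.
    """
    ms = MultisetHash()                        # from the sketch earlier in the thread
    for utxo_bytes in snapshot_utxos:
        ms.insert(utxo_bytes)
    return ms.digest() == committed_digest     # compare against the coinbase commitment


def apply_block(ms: MultisetHash, spent_utxos, created_utxos) -> None:
    """O(j+k) incremental update a synced node performs for each new block."""
    for utxo_bytes in spent_utxos:
        ms.remove(utxo_bytes)
    for utxo_bytes in created_utxos:
        ms.insert(utxo_bytes)
```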

All you write is true, but nothing you wrote invalidates the problem I pointed out or the suggested improvements.

The problem I stated still exists: today it requires a download of roughly 10 GB of UTXO data, which then needs to be validated as a whole; if one byte is off, you have no way to detect which byte in that entire set is incorrect.
If we move to a 100 GB UTXO set, this is still true and an even bigger issue.

The idea I wrote solves that and additionally adds a lot of opportunity for more optimizations.

Nitpicking, this is not true. If we assume that one day miners will include such commitments, they too will build it the hard way at least once.

1 Like

I will need to read up on the details far more heavily, but intuitively I think the idea makes sense: the most frequent updates and changes happen on the most recent UTXOs, so some kind of bucketing system that prioritises ease of updates there may be worthwhile, even if it means trading off some performance on the other chunks.

I can imagine a world where your pruned node comes online in chunks, starting with the most critical 10% of “UTXOs from the last month”, which are also the ones most likely to be respent, and then filling in each of the following 10% chunks, which change progressively less often.

1 Like