CHIP 2021-07 UTXO Fastsync

Several months later this topic floated to the top, and I noticed this comment; it has even been acted on.

To at least comment on this: for the UTXO storage in Flowee I considered the var-int and rejected it after testing.

As background, Flowee implements its own database for the UTXO set (actual low-level code writing files and all that). There are indeed lots of values in there which make little sense to store in the full 8-byte form.

Instead, what Flowee uses is a little-used Bitcoin Core (Satoshi-era, really) way of compressing integers: a simple variable-byte-size system. If you have ever looked at UTF-8, you will recognize it, as it is quite similar.
Where the ‘var-int’ we use in transaction encoding jumps from 1 to 3, 5, or 9 bytes, this scheme scales smoothly one byte at a time. It also doesn’t waste a full byte to indicate which width is used. It is easiest understood as a 7-bit encoding: each byte carries 7 bits of payload, and the high bit says whether another byte follows.

To make a concrete example: a block height of 700,000 (0x0AAE60) gets encoded in 3 bytes instead of 5.

Code may speak louder than words; Java, C++.
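
For readers who don’t want to chase the links, here is a minimal standalone sketch of the scheme as I understand it from the Core-lineage sources (BCHN’s serialize.h calls the pair WriteVarInt/ReadVarInt; the names below mirror that, but this is an illustration, not the actual node code):

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// 7-bit variable-width encoding: every byte carries 7 payload bits and the
// high bit is set on all bytes except the last. The "- 1" on each shift
// removes redundant encodings, so every byte string decodes to exactly
// one value.
std::vector<uint8_t> WriteVarInt(uint64_t n)
{
    uint8_t tmp[10]; // a full 64-bit value needs at most 10 bytes
    int len = 0;
    while (true) {
        tmp[len] = (n & 0x7F) | (len ? 0x80 : 0x00);
        if (n <= 0x7F)
            break;
        n = (n >> 7) - 1;
        len++;
    }
    // Groups were produced least-significant first; emit them in reverse.
    std::vector<uint8_t> out;
    for (int i = len; i >= 0; --i)
        out.push_back(tmp[i]);
    return out;
}

uint64_t ReadVarInt(const std::vector<uint8_t>& data, size_t& pos)
{
    uint64_t n = 0;
    while (true) {
        if (pos >= data.size())
            throw std::runtime_error("truncated VarInt");
        uint8_t b = data[pos++];
        n = (n << 7) | (b & 0x7F);
        if (b & 0x80)
            n++;   // undo the "- 1" offset and continue with the next byte
        else
            return n;
    }
}
```

For the block-height example: WriteVarInt(700000) produces the three bytes 0xA9 0xDB 0x60, and a ReadVarInt started at the wrong offset will usually either hit a spurious terminator or run off the end of the field, which is the weak-checksum effect described below.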

A big bonus here is that you get a weak form of checksum built into your dataset. Start reading at a wrong position and you’ll hit an exception most of the time. Data corruption is likewise much faster to detect.

Anyway, just my 2 cents, based on a couple of years of working on storing the UTXO dataset. :wink:

Cc: @joshmg @groot-verde


Oh yeah, this is what is referred to in the Core or BCHN sources as “VarInt”. It’s certainly more compact than the existing CompactSize scheme here. I’m willing to change to it if @joshmg is; whatever people want. It definitely saves space…


It’s not set in stone yet, so whatever we think will give the best compatibility and storage benefits is fine with me. IIRC, we did run the numbers on space savings for CompactVariableLengthInteger (“varint”) when storing amounts, and for the current distribution of UTXOs it was more efficient than a flat int (though perhaps Flowee’s format is even better). If you get to it first, run some numbers on the current UTXO set with your format and let me know how it compares to the varint size; that should be justification enough, in my opinion. Otherwise, I can look into it either during or after the holiday and report back.

Ok, that’s a good idea. I’ll modify it locally here, produce two serialized UTXO sets for the same block height, compare, and report back!

By the way, if we switch to what the BCHN/Core codebase calls VarInt, we can “variable-int”-ize more things, since it has more leeway in how it encodes. For example, we can also pack blockheight | is_coinbase into one value, along with the other ints we opted not to CompactSize-ize.
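
To illustrate the packing: Core-lineage Coin serialization folds the coinbase flag into the low bit of the height (nHeight * 2 + fCoinBase) and VarInt-encodes the result. A hedged sketch, with helper names of my own invention:

```cpp
#include <cstdint>

// Fold the coinbase flag into the low bit of the block height, in the
// style of Core's Coin serialization; the packed value would then be
// written with a VarInt encoder like the one sketched earlier.
uint64_t PackHeightCoinbase(uint32_t height, bool is_coinbase)
{
    return (static_cast<uint64_t>(height) << 1) | (is_coinbase ? 1u : 0u);
}

void UnpackHeightCoinbase(uint64_t code, uint32_t& height, bool& is_coinbase)
{
    is_coinbase = (code & 1) != 0;
    height = static_cast<uint32_t>(code >> 1);
}
```

For height 700,000 the packed value is 1,400,000 or 1,400,001, which still fits in 3 VarInt bytes, so the flag effectively rides along for free.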

Ok, so I exported the set to disk, once with CompactSize and once with VarInt (I am calling what Tom proposed VarInt here since that’s what it’s called in the BCHN sources already).

Block 770,000
Num UTXOs exported: 57,925,776

  • Using CompactSize (status quo), size was: 4907723232 bytes (4.907 GB)
  • Using VarInt (what Tom proposes), size was: 4787066061 bytes (4.787 GB)

This is a savings of about 120MB for a 58M-entry UTXO set, or about 2 bytes per entry. Note that if we want to get even more stingy, there are ways to encode locking scripts in a more compact form and save maybe another 3-4 bytes per UTXO as well …

UPDATE: I tried the “more stingy” method on the same UTXO set as above (using CTxOutCompression from the BCHN sources, which serializes the locking script in a more compact form; a sketch of the idea follows the numbers below).

  • Using CTxOutCompression, size was: 4557067376 bytes (4.557 GB)

This saves 350MB over the original proposal, or 230MB over just using VarInt. That’s about 6 bytes per UTXO over the original proposal, or 4 bytes per UTXO over just VarInt.
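
For the curious, the gist of that compression as I read BCHN’s compressor.h (CScriptCompressor): common locking-script templates are replaced by a one-byte tag plus their hash or key. This is a simplified restatement for illustration, not the actual class:

```cpp
#include <cstdint>
#include <vector>

using Script = std::vector<uint8_t>;

// Serialized size of a locking script under the template scheme used by
// Core/BCHN's CScriptCompressor (simplified; the real class also handles
// decompression).
size_t CompressedScriptSize(const Script& s)
{
    // P2PKH: OP_DUP OP_HASH160 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG
    if (s.size() == 25 && s[0] == 0x76 && s[1] == 0xA9 && s[2] == 20 &&
        s[23] == 0x88 && s[24] == 0xAC)
        return 21;  // tag byte + 20-byte pubkey hash (saves 4 bytes)
    // P2SH: OP_HASH160 <20 bytes> OP_EQUAL
    if (s.size() == 23 && s[0] == 0xA9 && s[1] == 20 && s[22] == 0x87)
        return 21;  // tag byte + 20-byte script hash (saves 2 bytes)
    // P2PK with a compressed key: <33-byte pubkey> OP_CHECKSIG
    if (s.size() == 35 && s[0] == 33 && (s[1] == 0x02 || s[1] == 0x03) &&
        s[34] == 0xAC)
        return 33;  // the tag byte doubles as the key's 0x02/0x03 prefix
    // P2PK with an uncompressed key: <65-byte pubkey> OP_CHECKSIG
    if (s.size() == 67 && s[0] == 65 && s[1] == 0x04 && s[66] == 0xAC)
        return 33;  // only x is stored; y is recomputed when decompressing
    // Everything else is stored raw, behind a length prefix shifted past
    // the special tags (one byte for scripts up to about 121 bytes).
    return 1 + s.size();
}
```

The full CTxOutCompression also compresses the amount itself (CompressAmount exploits the trailing zeros of typical round-number outputs), which may account for part of the savings measured above.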


The interaction between UTXO commitments and a proposed excessive-blocksize adjustment algorithm has been bugging me, because the internal state of the algorithm would also be part of “current state”. Committing the algo’s state, however, would make the algorithm itself a new consensus rule, and any tweaking of the algorithm’s parameters would require messing with the commitments code.

Thankfully there’s a way around that: commit only the actual mined block sizes. Clients can then fetch the whole block-sizes dataset (about 9MB now) when they do IBD, validate it against the commitment, and compute the algo’s current state from it. This way, if need be, we could revert to a flat limit, or tweak or replace the algo, without carrying any baggage from the old algo.

Could the specification be extended to add the block size data?


I support this idea of extending the committed data to include prior block sizes.


Cool! I think it could be implemented as just one hash of the list of block sizes (which could be varints serialized in order): just one more EC point to add to the multiset (subtract the old one, add the new one on each block). Optimization: keep the hash stream open and just append each block, dumping the function’s output instead of rehashing the whole list every block; cache the internal states of the hash function for the last 100 blocks so reorgs are easy.
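
A sketch of that optimization, assuming an incremental hash whose midstate can be snapshotted. Here I use OpenSSL’s EVP API, whose contexts can be duplicated with EVP_MD_CTX_copy_ex; the class name and the fixed 8-byte size serialization are my own simplifications:

```cpp
#include <openssl/evp.h>
#include <cstdint>
#include <deque>
#include <stdexcept>
#include <vector>

class BlockSizeHasher {
    EVP_MD_CTX* ctx_;                    // running hash over all sizes so far
    std::deque<EVP_MD_CTX*> snapshots_;  // midstates after each recent block

public:
    BlockSizeHasher() : ctx_(EVP_MD_CTX_new()) {
        if (!ctx_ || !EVP_DigestInit_ex(ctx_, EVP_sha256(), nullptr))
            throw std::runtime_error("hash init failed");
    }
    ~BlockSizeHasher() {
        EVP_MD_CTX_free(ctx_);
        for (auto* s : snapshots_) EVP_MD_CTX_free(s);
    }

    // Append one block size (fixed 8 little-endian bytes here; the post
    // suggests varints) and snapshot the midstate for reorg support.
    void AddBlock(uint64_t block_size) {
        uint8_t buf[8];
        for (int i = 0; i < 8; ++i) buf[i] = (block_size >> (8 * i)) & 0xFF;
        EVP_DigestUpdate(ctx_, buf, sizeof(buf));
        EVP_MD_CTX* snap = EVP_MD_CTX_new();
        EVP_MD_CTX_copy_ex(snap, ctx_);
        snapshots_.push_back(snap);
        if (snapshots_.size() > 100) {   // keep a reorg depth of 100
            EVP_MD_CTX_free(snapshots_.front());
            snapshots_.pop_front();
        }
    }

    // Undo the last `depth` blocks by restoring an older midstate.
    void Rewind(size_t depth) {
        if (depth == 0 || depth >= snapshots_.size())
            throw std::runtime_error("reorg deeper than cached states");
        for (size_t i = 0; i < depth; ++i) {
            EVP_MD_CTX_free(snapshots_.back());
            snapshots_.pop_back();
        }
        EVP_MD_CTX_copy_ex(ctx_, snapshots_.back());
    }

    // Current commitment: finalize a *copy* so the stream stays open.
    std::vector<uint8_t> Commitment() const {
        EVP_MD_CTX* tmp = EVP_MD_CTX_new();
        EVP_MD_CTX_copy_ex(tmp, ctx_);
        std::vector<uint8_t> out(32);
        unsigned int len = 0;
        EVP_DigestFinal_ex(tmp, out.data(), &len);
        EVP_MD_CTX_free(tmp);
        return out;
    }
};
```

Each block then costs one short hash update plus one multiset point swap, instead of rehashing the whole ~9MB list.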

So we have freetrader, a person who doesn’t want to talk on anything but text, backing up BCA, who also refuses video or voice chat. Text only.
And they both support a project that is not supported by any of the actually known full node developers, a project explicitly stated to introduce the one thing that killed BTC.

Let’s keep the blocksize properties to the free market, please. Then you don’t need to track the historical size of blocks. Simpler is better.

Did you forget the part where EB is still free to be set by other nodes, according to your own scenario? If the market is free to adjust the flat EB, then it is free to automate it, too. Even if BCHN shipped with the algo, other nodes could set a flat EB above what’s getting mined.


This is off-topic on the Fastsync topic. Please don’t rehash previously answered questions on unrelated topics.

So is your comment about text-only communication, which is completely out of place.

It’s not that it’s unsupported; it’s research, and if we can demonstrate it is good, then maybe it will get more support? Not from you, obviously, because you have your fantasy CHIP where everyone magically decides to adjust a consensus-sensitive parameter at their own leisure and everything somehow works out… and I don’t expect you’ll let go of it, so let’s continue talking in circles, please.

This is just pure FUD. The objective of the algo is exactly to minimize the risk of ever again getting deadlocked on a flat value.


Dude, what’s wrong with you? I’d propose a video chat to figure out why you are not understanding the dynamic of it, but this kind of toxic statement is really not OK. Being biased toward your own idea is fine, but calling a competing idea a “fantasy CHIP” is just not OK.

Not OK, dude.

What if it is you who is not understanding?

You’re being toxic in FUDding this as “the one thing that killed BTC” and in bringing my privacy preferences into the discussion.

You’re being biased toward your idea that everyone will magically, somehow communicate when and how to move EB. It’s the same as Javier Gonzales fantasizing about miners’ BMP chat: it got 0.1% of hashrate to participate. The market decided alright; it decided not to participate. “Fantasy CHIP” is accurate, like it or not.


This argument is complete nonsense.

The reality is that when you want to remain at least semi-anonymous on today’s Internet, you need to remain text-only.

It is way too easy to make a mistake on video, and it is too hard to scramble your voice properly while also completely nullifying the background noise. That includes noise from, for example, AC-powered devices that can give you away: the electrical hum of such a device depends strictly on your location (every large AC transformer produces a slightly different hum), which allows law enforcement to find people even when they have made no other mistakes.

What exactly were you trying to achieve by posting nonsense here?


This argument also seems to be nonsense.

Can you explain precisely how UTXO commitments can kill BCH the same way they killed BTC?

I don’t remember BTC ever implementing anything even remotely resembling UTXO commitments.


You misread. That was not claimed by anyone.

That was directly claimed by you. Maybe you miswrote.

Let me remind you, you said:

So we have freetrader, a person who doesn’t want to talk on anything but text, backing up BCA, who also refuses video or voice chat. Text only.
And they both support a project that is not supported by any of the actually known full node developers, a project explicitly stated to introduce the one thing that killed BTC.

In the topic about UTXO commitments / EDIT: UTXO fastsync.

If this is not what you meant, maybe you should edit your post because it is highly unclear.

Tom seems to be saying that BTC introducing adaptive block sizes like those in this proposal (which I support, for now) is what killed BTC.

Ignoring that:

  1. BTC didn’t introduce adaptive block sizes

  2. BTC hasn’t been “killed” or even died naturally yet

  3. Full node developers such as BU’s devs, who are also BCH developers, are evidently supportive of adaptive blocksizes, so much so that they just implemented virtually this exact proposal in NEXA (compare “Design and operation of the adaptive blocksize feature”; credit to BCA for noticing that it’s basically identical)

  4. Tom does not supply any links to statements from other full node developers to back his claim that they do not generally support what’s being proposed here. Maybe that’s because it’s just too early for them to comment; this is research work in progress, after all… But I did not see any other full node developers backing Tom’s “do nothing and leave it to the market” CHIP either.

  5. It isn’t clear which known full node developers do not support this proposal. This thread is open for them to speak up at any time.


Uh, are we all going off-topic now?

This thread is about UTXO Fastsync, not “Asymmetric Moving Maxblocksize Based On Median”.

This is why I pointed out to Tom that his response does not make sense.