Auto reorg scalenet as intelligently as possible

freetrader · November 28, 2021, 9:44am

Opening a new topic that relates to this thread (and the fact that scalenet has grown to something like 200GB).

In this thread I would like to explore a different approach:

Automatic re-organization of scalenet at regular intervals

Scalenet right now consists of a fixed common genesis block (height 0) with a checkpoint at a “last permanent block” which is currently at height 9,999, making block 10,000 the first impermanent block on that network.

There have been some deliberations on shifting that last permanent block out to e.g. 10,006 to simplify code for certain DAA algorithms.

For the purposes of this thread, we will simply talk about block L as the last permanent block, regardless of what height it’s at, and call the next block L+1, and so forth.

The proposal here in brief is to establish a way to re-org to a new block L+1 at some pre-determined interval.

1. We form a consensus on a ‘starting date’ S, expressed in seconds past the epoch, from when this scalenet auto-reorg mechanism should take over from the current infinitely extending scalenet.
2. We pick an interval D along the UNIX epoch timeline at which we want nodes to deep-reorg their scalenet chains automatically. Here I propose roughly every D=2,592,000 seconds (30 days).
3. If the scalenet tip’s MTP (median time past) exceeds S but is less than S+D and the block L+1 is still the “old” L+1 (i.e. a never re-orged scalenet [1]), then we need to perform the First Automatic Scalenet Reorg (FASR), which is described below in a separate section, and skip step 4.
4. (We only get here if FASR has been done at least once, so the block L+1 contains a commitment of the MTP at which the previous re-org was done.) If the scalenet tip’s MTP exceeds the MTP time in the current block L+1 (committed either in the coinbase message string or via an extra OP_RETURN output of the coinbase of block L+1) by more than D, then we are due for an auto-reorg and we perform the Yet Another Scalenet Reorg (YASR) procedure.
5. For each block incoming on scalenet, we evaluate from step 3 again. If neither FASR (3) nor YASR (4) is performed, carry on processing as normal. Entering FASR/YASR will perform a re-org down to block L and append a newly constructed block L+1 which should be identical for all nodes. Processing then continues normally after return from FASR/YASR with block L+2 mined as normal by some miner on the network.
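The per-block decision in the steps above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `reorg_action` and its parameters are hypothetical, and `commitment` stands for the timestamp read out of block L+1 (`None` meaning the "old", never re-orged L+1). The case where the MTP is already past S+D on a never re-orged chain is not specified in the proposal, so the sketch simply does nothing there.

```python
from typing import Optional

D = 2_592_000  # proposed interval: 30 days, in seconds


def reorg_action(tip_mtp: int, start_s: int, commitment: Optional[int],
                 interval: int = D) -> str:
    """Return "FASR", "YASR" or "NONE" for the current chain tip.

    tip_mtp    -- median time past at the current scalenet tip
    start_s    -- the agreed starting date S (UNIX seconds)
    commitment -- timestamp committed in block L+1, or None if block L+1
                  is still the "old" block without a commitment
    """
    if commitment is None:
        # Step 3: never re-orged chain, FASR is due once the MTP passes S
        # (and, as written in the proposal, is still below S+D).
        if start_s < tip_mtp < start_s + interval:
            return "FASR"
        return "NONE"
    # Step 4: at least one re-org has happened; compare against the
    # timestamp committed in the current block L+1.
    if tip_mtp - commitment > interval:
        return "YASR"
    return "NONE"
```

A node would call something like this for every incoming block and only enter the re-org path when the result is not "NONE".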

FASR:

Use S as the commitment value for the new block L+1 to be mined
Also use S for the timestamp of block L+1

YASR:

Use the MTP at current chaintip as the commitment value for the new block L+1 to be mined
Use MTP+D as the timestamp for the block L+1

Common procedure for FASR and YASR:

Construct the common new block L+1.
In each case, the new block L+1 will carry a commitment to a UNIX epoch timestamp.

In the normal (YASR) case, this block will carry a commitment timestamp T equal to the MTP value at the chain tip prior to auto-reorg.

In the FASR case this commitment T will be equal to the value S.

This commitment (to a UNIX timestamp) can be encoded in the coinbase string as a null-terminated string of decimal digits (which have to be convertible to a number 0 < C <= 2^64-1 to be valid). This would have the benefit of being easy to read in block explorers. An alternative would be to encode it in an additional OP_RETURN output of the coinbase transaction.
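A minimal sketch of the null-terminated decimal encoding, for illustration only. The helper names are hypothetical, and the sketch assumes it is handed just the message portion of the coinbase string (it ignores the block-height push that normally precedes other coinbase data). Whether leading zeros should be rejected is not stated in the proposal, so they are accepted here.

```python
from typing import Optional

MAX_COMMITMENT = 2**64 - 1  # upper bound stated in the proposal


def encode_commitment(timestamp: int) -> bytes:
    """Encode a UNIX timestamp as ASCII decimal digits plus a NUL byte."""
    if not 0 < timestamp <= MAX_COMMITMENT:
        raise ValueError("commitment must satisfy 0 < C <= 2^64-1")
    return str(timestamp).encode("ascii") + b"\x00"


def decode_commitment(message: bytes) -> Optional[int]:
    """Return the committed timestamp, or None if no valid one is found."""
    end = message.find(b"\x00")
    if end <= 0:
        return None  # no terminator, or empty digit string
    digits = message[:end]
    if not digits.isdigit():  # bytes.isdigit() accepts ASCII 0-9 only
        return None
    value = int(digits)
    if not 0 < value <= MAX_COMMITMENT:
        return None
    return value
```

A block L+1 for which `decode_commitment` returns `None` would be treated as the "old" L+1 under the rule in footnote [1].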

For the coinbase transaction:

tx version set to 1
locktime set to 0
single output whose locking script is a simple fixed OP_TRUE; the input’s sequence number is set to 0xFFFFFFFF (4294967295)

For the block L+1:

block version set to 0x20000000 (536870912)
block timestamp set as computed per FASR or YASR (see sections above)
difficulty target of block L+1 is computed and set “as normal” (same expected value on all nodes).
Nonce to be computed with all nodes mining from the same starting nonce value: 0

Given these parameters are all deterministic and fixed, all nodes should be able to mine the same block L+1 with identical proof of work.
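To illustrate why fixed header fields plus a fixed starting nonce give every node the same block, here is a rough sketch of grinding an 80-byte header from nonce 0. The function names are hypothetical; hashes are passed in internal byte order, and what should happen if the 32-bit nonce space is exhausted is not covered by the proposal, so the sketch just raises. Pure Python is slow at real difficulty-1 targets, so treat this as a demonstration of determinism rather than a usable miner.

```python
import hashlib
import struct


def bits_to_target(bits: int) -> int:
    """Expand the compact nBits field into the full 256-bit target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def mine_header(version: int, prev_hash: bytes, merkle_root: bytes,
                timestamp: int, bits: int, start_nonce: int = 0):
    """Search nonces upward from start_nonce; return (nonce, block hash hex)."""
    target = bits_to_target(bits)
    # First 76 bytes of the header are fixed; only the nonce varies.
    prefix = (struct.pack("<I", version) + prev_hash + merkle_root
              + struct.pack("<II", timestamp, bits))
    for nonce in range(start_nonce, 2**32):
        header = prefix + struct.pack("<I", nonce)
        digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
        # The hash is compared to the target as a little-endian integer.
        if int.from_bytes(digest, "little") <= target:
            return nonce, digest[::-1].hex()
    raise RuntimeError("nonce space exhausted; behaviour here is unspecified")
```

Because every input is fixed and the search order is fixed, two nodes running this with the same arguments find the same first valid nonce and therefore the same block hash.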

They must do so before resuming their regular processing on scalenet.

They must NOT announce this newly created block L+1 to other peers via inventory message, but instead validate it themselves and advance the chain tip to wait for the next block, L+2, to be mined normally by whoever finds it. Thus life goes on until the next YASR.

Nodes starting up who need to do IBD from below block L or who have sat out on a reorg cycle:

might need to be invalidated manually (or just wipe their scalenet datadir so that they IBD afresh)
should query and receive block L+1 normally and enter the processing loop to check for FASR / YASR
might need to receive the expected commitment value of block L+1 manually to ensure they end up on the right chain, in case there are peers serving them chains that are not compliant to this scheme

[1] this can be confirmed via checkpoint set up prior to FASR if we wanted to be quite sure, or by deeming any block L+1 without a suitable timestamp commitment as “old”.

freetrader · November 28, 2021, 1:00am

This was hacked together quickly, and might contain one or more dumb ideas.

Please criticize constructively

cculianu · November 28, 2021, 2:47am

Hmm… so the scheme where you have to embed consensus data in the coinbase is… problematic for mining software I think.

One would need to modify existing mining software to “know” about BCH’s funky scalenet rules.
Not impossible but it is a burden to do that. Mining software doesn’t just take the node’s template coinbase txn at face value. Most mining software throws it away completely.

Stated another way: some existing mining software just authors its own coinbase txn basically (such as p2pool and ckpool/asicseer pool). I feel that can create headaches if we add the requirement that the mining software must now be scalenet-aware.

Can we take a step back again and … can I ask Why do we need this commitment?

Is it so that a node just coming online (perhaps after having been turned off for a while, which is now on a stale chain) can decide it should potentially follow a shorter/different chain? Or – I’m having trouble understanding the motivation behind some of the proposed design decisions here.

Basically: Why do we need the coinbase commitment in the first place?

If it’s to solve the problem of a node knowing whether to switch chains or not – can’t we just do that in a simpler way with the timestamp in the headers?

We already have a “commitment” of sorts in headers – the timestamp in the 80-byte header! I am thinking we can be clever here and just leverage that commitment here, for this, thus keeping things as simple as we can.

Like say make the epoch length D=2,592,000 seconds. You only follow the chain that starts AFTER the most recent epoch as you calculate from your “trusted” local system time. All other chains are invalid if they are too old. This also potentially can have the odd property that after a node mines the last block that pushed the epoch to be too long past the deadline, nodes would just auto-invalidate themselves down back to height L. So the last guy mining the last block in the last epoch won’t see his block “live” for very long…

Stated another way: You kind of already know what the most recent epoch is – you can take the current system time modulo D=2,592,000 seconds and subtract that remainder from the current time to get the start of the current epoch. You are expecting the valid chain that starts at height L+1 to have a timestamp no earlier than this time you calculated. You automatically reject any chain that does not follow this predicate. Including the (previously) valid chain you are on now!

So… just leveraging the header timestamp in some way like that – Wouldn’t that solve all the problems? Or what problem wouldn’t it solve?
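A minimal sketch of that header-timestamp rule, assuming epochs are aligned to multiples of D on the UNIX timeline and that the node's system clock is trusted. The function names are hypothetical and only illustrate the predicate.

```python
D = 2_592_000  # epoch length: 30 days, in seconds


def current_epoch_start(now: int, interval: int = D) -> int:
    """Start of the epoch containing `now` (now rounded down to a multiple of D)."""
    return now - (now % interval)


def chain_is_current(l1_timestamp: int, now: int, interval: int = D) -> bool:
    """True if block L+1's header timestamp falls in the current epoch or later.

    A chain whose block L+1 is older than the current epoch start would be
    rejected, including the chain the node is currently following.
    """
    return l1_timestamp >= current_epoch_start(now, interval)
```

In a node, `now` would come from the local clock (for example `int(time.time())`), which is exactly the dependence on system time that this simpler scheme accepts.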

mtrycz · November 28, 2021, 10:47am

I have slept on this and have a simple scheme that just might work.

Once we’ve settled on a period for reorgs, say 30 days, use these two rules:

Up to block L, all blocks are valid according to the usual validation rules
Starting from block L+1, in addition to the usual validation rules, the date of this block needs to be inside the current time period.

This means that blocks from the last time period become invalid. Even if the latest blocks could be in the valid period, they do build upon invalid blocks.

Things I’m not worried about:

That attackers will take advantage of the simplicity of the rules to take over the chain; it has no economic value
That the process is not fully automatic; people will want to nuke their sizeable datadirs anyway.

What are the downsides?

freetrader · November 28, 2021, 11:12am

On consideration, I would agree that for such testnets it is acceptable to introduce a requirement that a node is properly time synchronized in order to end on the correct chain.

So I think this much simpler scheme could work - I like where this is heading.

This is equivalent to what @cculianu proposed, from what I understood…

We could take the nTime of block L to be the starting time of these 30-day (or whatever is decided) epochs.

freetrader · November 28, 2021, 11:34am

Good point about these commitments complicating life for mining software. That’s certainly a downside.

Block L (height=9,999) has difficulty 1, so I thought in fact any simple node can and should be able to easily CPU-mine block L+1, and it would happen pretty much instantly.

I went with commitments to eliminate dependence on system time, any node would be able to just use the block data to judge whether the block L+1 is valid.

But now I am almost convinced we can safely relax this - for a test network like this. If we can avoid the commitment complexity, that would be great!

I think what you propose above, and what @mtrycz proposed in his reply, are functionally equivalent.

I’ll think about the possible downsides / problems some more, but it may take lots of time since my mind is too unfocused right now

mtrycz · November 28, 2021, 11:34am

Oh yeah, I missed Mr Culianu’s comment. Yeah, I think he stated the same thing more eloquently.

In virtue of no economic value, I’d go for the simplest approach possible.

freetrader · November 28, 2021, 11:36am

In virtue of no economic value, I’d go for the simplest approach possible.

Simplicity here is also what I’m after, but I think there MAY BE economic value for others to disrupt BCH’s scaling activities, so I will spend some time (not today, but probably in the next few days) thinking about possible attacks on the simplified scheme.

mtrycz · November 28, 2021, 11:38am

Joke’s on them. The more they disrupt us, the more resilient we become. (jk jk)

freetrader · November 28, 2021, 5:07pm

Ah yes, after thinking about it during a walk, I think the main reason I wanted to lock down the exact L+1 block, down to the expected hash, is to prevent someone nasty from fragmenting the chain whenever it reorgs.

But … not sure the incentives are there. But if it turns out to happen (and I can imagine someone writing a script to do it because the re-orgs would happen like clockwork) then … dynamic checkpoints could help with nailing down a valid scalenet L+1, but they are not a feature available yet, and there are those who find them unappealing, and it’s quite possibly overkill.

But, since L+1 is so easy to mine, I would think fragmentation might even occur naturally. Maybe it’s not a concern because incentives are low, and if someone wants to mess with scalenet or other low-difficulty testnets, they can do so already or starting from L+2 even if L+1 is completely deterministic.

mtrycz · November 28, 2021, 7:18pm

Yeah, the incentives to attack at reset are the same as incentive to attack at any random moment.

The worst outcome would at most be a minor nuisance, so there’s that, I think.

tom · December 16, 2021, 10:52am

It would not solve the problem that a full node with a 200GB data dir keeps adding more data without ever throwing away the old data.

Effectively people will want to re-sync from zero every now and then to avoid an ever growing full-node data dir. Mtrycz recognizes this too in his post.

Personally, any “consensus” rule that is unique to the scalenet or testnet is something to be avoided as that makes the main code more complex (see pow.cpp).

The simplest solution I think is basically to reuse the already existing invalidateblock concept which marks a known block (longest chain) off-limits.

What this effectively means is that people will simply trash their datadir and start a new client every x months, with a list of known outdated block-ids that are provided in the config or on the command line.

The advantage of this is that it’s the simplest and most stupid method there is; it is completely without new algorithms or consensus rules. It also has the advantage that when the main public nodes running this chain do it, new clients starting up will probably not even see the old chain.

The only code that may be nice to add is to read from config a list of blocks to invalidate to avoid following the wrong chain. Installers that install the client as a system-service may benefit by shipping such a list of deprecated-blockids in a config file.
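As a rough illustration of reading such a list, here is a sketch that parses block ids out of a config file. The `invalidateblock=` key and the file layout are purely hypothetical (no existing node option is implied); a node would then feed each id to its existing block-invalidation logic.

```python
import re
from typing import List

# A block id is a 256-bit hash written as 64 hex characters.
HASH_RE = re.compile(r"^[0-9a-fA-F]{64}$")


def parse_invalidated_blocks(config_text: str,
                             key: str = "invalidateblock") -> List[str]:
    """Collect block ids from lines of the form `key=<64 hex chars>`.

    `#` starts a comment; unrelated keys are ignored. The key name is a
    hypothetical example, not an existing node option.
    """
    hashes: List[str] = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if name != key:
            continue
        if not HASH_RE.match(value):
            raise ValueError(f"not a valid block id: {value!r}")
        hashes.append(value.lower())
    return hashes
```

Shipping such a file with an installer would let a freshly started client skip the outdated chain without any new consensus rule.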

freetrader · December 16, 2021, 12:49pm

The “automatic” requirement of this proposal is supposed to eliminate the coordination burden on such re-organizations.

There is already precedent for having specific consensus rules for certain testnets; indeed, testnets exist mostly to test changes to consensus. Adding a simple rule (which I believe @mtrycz’s proposal is) to the scaling testnet does not strike me as overly burdensome. Support for ScaleNet is entirely optional, and the principle of chain reorganization is one that must already be accommodated by software.

The issue of sub-optimal storage backends is a separate one, but indeed a good point to draw attention to.

tom · December 16, 2021, 1:54pm

And the opening statement of that being an unattainable goal, due to the user needing to flush its datadir anyway, dismissed that. I just didn’t want to be too blunt, but it seems my more fluffy language made the point not get across. Sorry about the confusion, hope this clarifies that.

There is already precedent for having specific consensus rules for certain testnets

Yes, I pointed one out in the post you replied to. And indeed this is why we have that experience and we know that it is a bad idea. This is why I wrote that adding more is something to avoid. The complexity of the addition isn’t really all that relevant in this regard, so the original point stands.

As an aside, I would really not like my test node to suddenly do a many months reorg without operator interaction. This is not just about a full node, this is for all the software running on top as well. Most of that would likely also not like to code in support for such a massive reorg. It would defeat the purpose of the scalenet.

freetrader · December 16, 2021, 2:18pm

Not an unattainable goal, simply an implementation issue - I wouldn’t even be making assumptions that all nodes require such manual clearing at this time. It’s something that can be improved at level of each node without requiring any further coordination.

As an aside, I would really not like my test node to suddenly do a many months reorg without operator interaction. This is not just about a full node, this is for all the software running on top as well. Most of that would likely also not like to code in support for such a massive reorg. It would defeat the purpose of the scalenet.

That’s your prerogative; nothing any other node project does is going to deprive you of it.

tom · December 16, 2021, 2:54pm

Ehm, what part of the operator needing to flush its datadir manually makes you think it is an implementation detail?

This is not about any specific implementation…

You seem to misunderstand my English. Please re-read the point made as a node-operator, not as a ‘project’ (whatever that is). Node operators do not want massive reorgs. People testing deployments of block-explorers or any other software want that even less.

Very confused why you want to automate something like this, what the problem is you are trying to solve that can’t be solved like I suggested…

Edit:

the topic is to reorg as intelligently as possible.

Naturally, some smart thing can be done (it’s only code) that takes you half a year to code. Code that re-downloads formerly pruned blocks to ‘undo’ up to 20000 blocks, then some code to delete blocks from the datadir while not pruning and probably other massive projects like using some dates. Repeat this for any software actually using the full node.

Alternatively, the intelligent thing to do is once every year people delete a dir and restart the node. Done.

freetrader · December 16, 2021, 3:00pm

Everything about it.
Better node software would not keep block data that’s no longer needed around forever.
To the extent that this is not implemented already in nodes, it’s simply an area that can be improved (outside of all consensus).

I don’t think your English is the problem.

Whether a re-org of scalenet happens manually or not, it’s still a deep re-org that is proposed here. If a node operator doesn’t want to partake in a re-org because it would interrupt some test they are doing, then they could simply continue mining the existing chain to continue their testing as long as they want to.

Most node operators want the least possible manual intervention, esp. not having to coordinate manually on a re-org to reach consensus at irregular intervals.

An automatic re-org scheme has significant benefits of reducing maintenance for Scalenet operators.