Intro
I’ve been thinking about the basic question of who is responsible for transaction propagation. The main choice is between a) the originating wallet, or b) the Bitcoin full nodes (aka the network).
What problem are we trying to solve?
Transactions that fall outside of the standard norms, through no fault of the user, may get stuck, severely damaging the user experience.
For instance, when a mempool is full, transactions may get evicted. Transactions that fall outside of network-wide rules, for instance ones with a low fee or a long chain of unconfirmed parents, may not get accepted into some mempools.
Wallets that create such transactions may never see them mined, for reasons they can’t determine, and if some nodes did accept the transaction into their mempool the wallet can’t double spend it either. This leaves the inputs locked until the network ejects the transaction based on a timeout.
The two network-wide rules we have (the 1-sat/byte fee minimum and the unconfirmed-chain limit) are centrally planned as a result of this issue: if everyone follows these rules, the UX damage is minimized and mostly avoided.
This does raise the question of fragility, and whether there may be a healthier way to solve this.
Can we improve the predictability of transaction propagation in a decentralized network? And related: can we make BCH more anti-fragile by avoiding dependence on homogeneity of settings?
The issues (scenarios)
1. A user creates a transaction that pays less than 1 sat/byte. He sends it and some nodes accept it, but it never gets mined. Now what? Trying to double spend it with a higher fee will get accepted at some nodes but not at others (the ones that saw the first transaction). Waiting a day until the first transaction has expired may or may not work. The user is left in a very uncertain situation, waiting for a confirmation.
2. A user creates a transaction that exceeds the 50-transaction unconfirmed-chain limit. A very similar scenario to the first one: knowing what to do is difficult.
3. A user sends a transaction to a merchant directly (not via the network) using BIP70, then goes down the street and spends a child of that transaction. The second merchant rejects it because it doesn’t know the first (parent) transaction. Now what?
4. Some miners decide that they want to earn more and start to select transactions based on fee. A low-fee transaction doesn’t get mined for an hour and the user decides he is willing to pay more fees. Now what?
Past solutions (attempts)
- Satoshi’s 0.1 code recognized this issue too, and it is the reason transactions have a ‘sequence’ field, allowing the creator to replace his transaction with one of a higher sequence number. The problem with this and similar approaches, like RBF, is that they destroy the zero-conf use case.
- Expunge from mempool after a timeout.
  This is actually implemented by all popular nodes, but it only solves the problem sideways: you can wait out the timeout (currently 72 hours by default) and then assume your transaction is unknown to all.
- Re-broadcast mempools / mempool sync.
  This is done by some nodes and it mostly nullifies the previous solution. It helps some transactions that didn’t get accepted due to time-limited rules (chain length) but hurts transactions that got rejected due to policy rules.
- Wallets re-broadcast transactions regularly.
  Some wallets do this (the original Satoshi one does). There is a real benefit for a wallet that receives a transaction to re-broadcast the incoming tx regularly, in order to ensure it will be mined and can not be double spent. (A minimal sketch of such a loop follows this list.)
- Child-pays-for-parent.
  This addresses scenario 4 specifically. It is hardly used on BCH because fees are not a deciding factor for inclusion. Yet. The downside of this solution is a quadratic scaling issue on the side of the network, which is the entire source of the chain limit to begin with.
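As mentioned in the re-broadcast item above, here is a minimal sketch of such a wallet-side re-broadcast loop. The `wallet` and `node` objects and their methods are hypothetical stand-ins for whatever RPC or P2P interface a real wallet uses, not an existing API.

```python
import time

def rebroadcast_pending(wallet, node, interval_seconds=600):
    """Periodically re-offer every transaction the wallet cares about until
    each one has confirmed. `wallet` and `node` are hypothetical interfaces."""
    while True:
        for tx in wallet.unconfirmed_transactions():
            # Re-sending a tx a node already has is effectively a no-op;
            # re-sending one it evicted puts it back in contention for a block.
            node.send_raw_transaction(tx.raw_hex)
        time.sleep(interval_seconds)
```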
Pulling it together
Requiring common settings between nodes makes the network fragile. UX can be severely impacted when malicious nodes start appearing that have different policy settings. Keeping all nodes the same also implies that innovation is stifled, due to the requirement to upgrade all nodes at the same time. The unconfirmed-chain limit is a great example of both these issues.
These centrally planned values also work against open-market innovation: we will eventually have to remove the 1-sat/byte fee minimum and allow a fee market to appear. The longer we wait, the higher the cost of doing so becomes.
All of these items lead to the same conclusion: we need to find a way for the network to handle out-of-the-ordinary transactions, because centrally planned settings are too fragile and unmaintainable.
We need to allow users to make mistakes, to create a transaction that the network doesn’t accept or evicts over time, while the user stays able to recover from this in a standard way.
My suggested solution
Ideally the party that is motivated to get the transaction mined (the receiver) solves the problem of a transaction being ejected or rejected by simply offering it to the network again.
The fact that the mempool keeps a transaction for 72 hours makes this very difficult, though. You can not offer a transaction again to a node that still has it: it will refuse to broadcast it to its peers and, worse, you can’t double spend the transaction either, as this same node will reject that one too.
If we decide that the best party to take responsibility for re-broadcasting is the wallet, this implies that we can drop the timeout from 72 hours to just a couple of blocks. That would solve a lot of problems for all those outlier transactions, at practically no cost to “normal” transactions that 99.99% of the time get mined in the next block.
This would also clarify the situation with regards to services sending transactions to an open peer, much like noise.cash or member.cash do. The mistaken assumption today is that those services can send-and-forget, and then they complain about the network when the network forgot their transaction. This doesn’t lead to a lot of complaints today because the mempool isn’t full very often, but this core design decision matters if services are to keep working when the network gets more popular.
I expect people building services that create transactions, but who don’t want any responsibility to store and re-broadcast them, will have a problem with this approach. It’s potentially more work for them, after all. In response I’d say that near 100% of their transactions will get mined just fine. If they don’t care too much about the “near” in there, then go for it. If they do, then this is a clear solution for them.
To be clear: this suggestion mostly accepts what the network already does today, and is brutally honest about it. Nodes do not re-broadcast transactions today; if a miner didn’t receive the transaction the first time it was sent, nobody but the wallet will re-submit it.
This is vastly different from what most people believe will happen with the current network.
Our scenarios revisited. Or, what does the world look like when the receiver’s wallet has the responsibility to re-broadcast?
Scenarios 1 and 2 are now solved with the same solution. The node forgets the transaction after a couple of blocks; the wallet that wants it mined re-broadcasts it when it detects it not getting included, or decides to create a double-spending replacement that fixes the low fee. The decision is made at the only place that can decide what the best solution is.
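A hedged sketch of that receiver-side decision follows; the `wallet` and `node` interfaces, the attribute names and the thresholds are all illustrative assumptions, not a concrete proposal.

```python
def recover_unmined(wallet, node, tx, blocks_waited, max_wait_blocks=3):
    """Scenario 1 and 2 recovery: re-offer the same tx, or replace it with one
    that pays a higher fee. All interfaces and numbers here are hypothetical."""
    if blocks_waited < max_wait_blocks:
        return  # still within normal expectations, nothing to do yet
    if tx.fee_rate >= 1.0:  # sat/byte; the fee was not the problem
        # The tx probably just fell out of mempools: simply offer it again.
        node.send_raw_transaction(tx.raw_hex)
    else:
        # The fee was too low: spend the same inputs again, paying more.
        replacement = wallet.build_transaction(
            inputs=tx.inputs, outputs=tx.outputs, fee_rate=2.0)
        node.send_raw_transaction(replacement.raw_hex)
```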
Scenario 3 is useful for looking at the fun problems you can hit when the network stops being perfectly predictable (when stuff doesn’t work 100% of the time, which happens in real life). I included it because it makes the point that a merchant will be obligated to broadcast not just the transaction paying him, but also any as-yet-unconfirmed parents. (To be clear, this is true today, but software ignores this use case.)
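A sketch of what merchant-side software in scenario 3 could do: submit any still-unconfirmed parents before the payment itself. The `node` and `tx` interfaces are, again, hypothetical.

```python
def broadcast_with_parents(node, tx):
    """Broadcast a transaction preceded by any of its unconfirmed parents,
    recursing for deeper unconfirmed chains. Hypothetical interfaces; the
    parent transactions are assumed to have been handed over via BIP70."""
    for parent in tx.unconfirmed_parents():
        if not node.in_mempool(parent.txid):
            broadcast_with_parents(node, parent)
    node.send_raw_transaction(tx.raw_hex)
```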
Scenario 4: in a design where the network of mempools exists just to support mining the next couple of blocks, having the creator pay more fees is as simple as replacing the transaction with a new one that pays more. The entire concept of child-pays-for-parent stays irrelevant to BCH!
Conclusion
As a wallet or user creates a transaction, they send it to the network for mining. Today the “ownership” of this transaction after it has been successfully sent to the network is disputed. Services assume it is “fire and forget”; the network assumes it can just toss a transaction without recourse.
Bitcoin started with a best-effort design to make transactions minable, which works for the vast majority of them. But it creates a list of issues that become harder and harder to fix. The CPFP idea is a prime example and the fact that we need to keep the network rules similar means we can’t simply fix this.
In this post I laid out an alternative idea, taking ten steps back and turning right instead of left. We can be clear and decisive about the ownership of a transaction after it has been successfully sent to the network. We can say that the economic actor that benefits from that transaction being mined is the one that has ownership, which means that they need to take action if the transaction is not getting mined.
This different view of things actually has no effect on almost all transactions, as they get mined in the next (non-empty) block. This is about all those transactions we don’t really know what to do with: stuff that doesn’t propagate, stuff that doesn’t get mined for whatever reason.
Moving ownership is done with two technical changes: 1) nodes stop trying to preserve the transaction for a long time. 2) wallets/PoS/services take some action when a transaction they care about is not getting mined.
The positive fall-out of such a simple policy change is that we can drop child-pays-for-parent. Users and wallets now know how to recover from a non-mined transaction. Nodes can stop trying to save a mempool or sync mempools for the benefit of the users.
It additionally becomes much easier to change policy rules (in combination with DSProofs) without requiring the entire network to do it at the same time.
As a postscript: it would be very useful to improve upon the payment protocol implementations. Merchants should be able to reject a transaction whose fee is too low or whose unconfirmed chain is too long, forcing the sender to create a better tx.
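For illustration, a merchant-side acceptance check along those lines; the thresholds mirror today’s network-wide defaults and the function is a hypothetical sketch, not part of any payment-protocol spec.

```python
MIN_FEE_RATE = 1.0          # sat/byte, today's network-wide minimum
MAX_UNCONFIRMED_CHAIN = 50  # today's unconfirmed-chain limit

def accept_payment(fee_rate, unconfirmed_ancestors):
    """Return (accepted, reason) for a transaction offered to a merchant
    over the payment protocol. Purely illustrative."""
    if fee_rate < MIN_FEE_RATE:
        return False, "fee too low, please resend with a higher fee"
    if unconfirmed_ancestors > MAX_UNCONFIRMED_CHAIN:
        return False, "unconfirmed chain too long, wait for a confirmation"
    return True, "accepted"
```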
Also worth mentioning: I simply refer to the suggested timeout for mempool eviction as “a couple of blocks”. This is kept vague because a bit more intelligence is likely needed, which will take some actual experimentation. For instance, you could skip counting an empty block, or write very complex code to calculate the median time it takes a tx to get confirmed. The exact solution for this is open, and innovation at the edges will iterate towards the ideal situation.
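As one possible starting point for that experimentation, here is a sketch of block-count-based eviction that skips empty blocks. The data structure, the threshold and the hook point are assumptions, not a proposal for any specific node implementation.

```python
EVICT_AFTER_BLOCKS = 3  # "a couple of blocks"; the right value is an open question

def evict_stale(mempool_ages, block_tx_count):
    """Called once per connected block, after transactions mined in that block
    have already been removed from the mempool. `mempool_ages` maps
    txid -> number of non-empty blocks the tx has been waiting.
    Returns the txids that were evicted."""
    if block_tx_count <= 1:
        return []  # only the coinbase: don't count empty blocks against waiters
    evicted = []
    for txid in list(mempool_ages):
        mempool_ages[txid] += 1
        if mempool_ages[txid] >= EVICT_AFTER_BLOCKS:
            evicted.append(txid)
            del mempool_ages[txid]
    return evicted
```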