Network-Layer Privacy on Bitcoin Cash

Additional possibility to add better-than-Tor resistance against timing analysis by powerful network adversaries (who can e.g. see the timings and sizes of packets going between most nodes): a DTX (“delayed TX”) message type which behaves mostly like PTX but is also allowed to introduce meaningful delay.

We want PTX messages to broadcast very quickly to preserve a fast user experience (we don’t want light wallets to avoid PTX broadcasting due to latency). However, for clients that don’t mind latency, DTX messages would be very resistant even to origin de-anonymization by global dragnet-level attackers. (And the existence of DTX broadcasts would also introduce uncertainty into timing analysis of PTX and TX messages.)

DTX messages should look approximately as if broadcast via a mix network: attempt to batch DTX broadcasts with other messages using fixed length chunks, allow a highly variable per-relay delay (maybe between 0 and 2 seconds – matching ~99% global TX diffusion time), much longer fallback timeouts, etc. The terminating node then rebroadcasts the DTX via PTX.
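A minimal sketch of two of the ideas above: a variable per-relay delay and fixed-length chunking. Everything here is illustrative. `CHUNK_SIZE`, `sample_relay_delay`, `to_fixed_chunks`, and `from_fixed_chunks` are hypothetical names, and the uniform delay distribution and length-prefixed zero padding are assumptions rather than part of any DTX specification.

```python
import secrets
import struct

CHUNK_SIZE = 512          # hypothetical fixed chunk length in bytes
MAX_DELAY_SECONDS = 2.0   # upper bound floated in the post above


def sample_relay_delay(max_delay: float = MAX_DELAY_SECONDS) -> float:
    """Pick a per-relay delay uniformly in [0, max_delay) seconds."""
    # secrets.randbelow gives an unpredictable integer; scale it to seconds.
    resolution = 1_000_000
    return max_delay * secrets.randbelow(resolution) / resolution


def to_fixed_chunks(payload: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Length-prefix the payload, zero-pad it, and split into equal chunks."""
    framed = struct.pack("<I", len(payload)) + payload
    padding = (-len(framed)) % chunk_size
    framed += b"\x00" * padding
    return [framed[i:i + chunk_size] for i in range(0, len(framed), chunk_size)]


def from_fixed_chunks(chunks: list[bytes]) -> bytes:
    """Reassemble the chunks and strip the padding using the length prefix."""
    framed = b"".join(chunks)
    (length,) = struct.unpack("<I", framed[:4])
    return framed[4:4 + length]
```

With fixed-length chunks, an observer of the encrypted link sees only a count of equal-sized packets, not the exact message length; the cost is the wasted padding bytes.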

As with PTX, intermediate DTX relayers can see the payload, but this 1) improves efficiency and makes abuse easy to detect, and 2) reveals less to an active attacker than mixnets for 1-to-1 messages (and, like Tor, even incentivizes would-be attackers to run nodes and subsidize our network bandwidth).

TLDR: give end users (or their software) the option between:

  • near-instant (TX),
  • fast + private (PTX), and
  • slow + anonymous (DTX).

Even if most wallets don’t use them, the simple existence of PTX and especially DTX broadcasts would significantly improve real-world privacy for all TXs.

Re Clover, it seems single-hop proxying to one randomly selected outbound peer can be done without a protocol change, and would reduce deanonymisation precision to D = |S|/|R|.

How does multi-hop with PTX improve on single-hop to a random outbound peer?

I’m missing something obvious here, just not sure what. :thinking:

Yes, PTX and P2P connectivity would improve the privacy of light wallets that support CashFusion but ultimately leak privacy info in earlier or later transactions.

Can you link to prior discussion of Clover? Also, which chain are you referring to?

Interesting, thanks for sharing! Have you all considered @markblundeberg’s progressive knockout filters? (I know this might be a lot of work in a different direction, so this is not necessarily a recommendation, I’m just curious if there’s any discussion on it that you can link.) I think a protocol using PKFs on “sub-block” portions of block merkle trees could be really efficient, even as block sizes get very large.

A SharedWorker seems reasonable, though it seems just as easy and performant to build the node as a website with an iframe postMessage interface. Compatibility there extends back decades, and you could also expose APIs for cross-domain access.

I agree that this is very frustrating. It’s hard right now to envision PWAs being fully competitive with native frameworks on the various mobile platforms.

A web-stack accessible P2P network would still improve the status quo by making direct P2P usage a bit more accessible and maintainable, even for Expo, React Native, etc. And of course, for use cases where PWAs are already sufficient (despite the nerfing), it’d be great to have easy P2P network access!

Hi Jason, I believe the overall work is amazing and you have my full support. However, latency is a real issue here, especially for DTX. It would be better if there were a way to improve PTX, imo.

I might be misunderstanding the question, but right now, if you receive an unsolicited TX message from a random inbound peer, you know with almost 100% certainty that they are the origin (otherwise you’d have gotten it via INV).

Likewise, network analysis companies maintain lots of well-connected peers, and can generally pinpoint TX origin with high certainty by monitoring INV messages (even if they’re not selected for the random initial broadcast). Nodes even have a helpful GETADDR message + ADDR response to help attackers construct a clear network graph.

With PTX messages, you both introduce uncertainty at the first hop and mark the transaction for a brief phase of continued proxying by honest nodes.

I believe this is what you are talking about?

print(network.subscribed_addresses)

I agree, we probably want to select PTX Clover parameters to aim for fast propagation and “ok privacy” rather than PTX messages being slow enough to risk wallet vendors resorting to TX or INV broadcast for user experience reasons. (Though note that payment protocol(s) still enable instant feedback at point-of-sale by skipping the P2P network round-trip, even if the wallet itself always broadcasts via PTX.)

I think acknowledging the difference in use cases – “everyday privacy” (PTX) vs. “privacy is critical” (DTX) – by offering both options to wallets can maximize the overall usefulness of BCH.

Note that the above DTX description is very half-baked/underspecified, I just wanted to clarify that a one-size-fits-all solution probably isn’t optimal. Overall usage (and therefore overall anonymity set) could be improved by offering two distinct options. Maybe to simplify node implementation, DTX messages would still perform the exact same protocol, just with Clover parameters selected to maximize privacy vs. latency. (Users needing global-dragnet resistance should still broadcast over Tor.)

Super excited to see where this leads – awesome stuff!

https://x.com/bitjson/status/1899735806337601903

If one wanted encrypted connections, why not just use TLS? Hasn’t this problem been solved aeons ago?

The P2P network doesn’t necessarily “need” authentication, only encryption, and BIP324 makes some good arguments: bips/bip-0324.mediawiki at master · bitcoin/bips · GitHub.

But I agree, that was my conclusion too: we should use existing protocols with good language/library support rather than BIP324 or making our own.

E.g. libp2p WebTransport would use TLS, and libp2p standardizes other messy bits – browsers require the hash of self-signed certificates in advance of the connection, so libp2p “multiaddresses” + a security handshake lets browser clients connect. See: libp2p Connectivity

I’m not suggesting unsolicited TX messages. I’m suggesting the originating node X picks a random outbound peer Y and sends an INV message, waits for GETDATA, then sends TX.

If Y is honest, how does the adversary A know that the transaction originated with X and not with Y? Surely the first-timestamp estimator will select Y.
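For readers unfamiliar with the term, a first-timestamp estimator simply blames whichever peer the adversary heard the transaction from first. A minimal sketch, where the `Observation` type, the function name, and the peer labels are hypothetical and no real node software is being quoted:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    peer: str          # label of the peer that announced the transaction
    timestamp: float   # when the adversary's node saw the announcement


def first_timestamp_estimate(observations: list[Observation]) -> str:
    """Guess the origin as the peer whose announcement arrived earliest."""
    if not observations:
        raise ValueError("no observations to estimate from")
    return min(observations, key=lambda o: o.timestamp).peer
```

In the scenario above, if X announces only to an honest Y and Y then announces to everyone, the adversary's earliest observation comes from Y, so this estimator names Y.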

To what extent does this rely on originating node X broadcasting new transactions to all inbound and outbound peers?

Are there any strategies that can reduce deanonymisation precision without a protocol change?

There is a need to correct this widespread “misunderstanding” of CashFusion.

The fact is that a wallet that has used fusion properly will have severed its history completely. Any money held in that wallet is then no longer linked to any past transactions. And this is permanent. Barring bugs, there is no way that a user can mistakenly undo this.

This is a very important fact. After applying this simple privacy feature, you clean your coins. History is dropped, permanently. User error doesn’t have risks to undo this.

So, Jason claiming that this is good for “leaks in privacy in earlier (or later) transactions” is at best misunderstanding the tech. But with Jason doing this repeatedly it likely means he is aware and just spreading this misinformation to entice people to think his ideas are needed to solve those nonexistent problems.

CashFusion being applied properly is by far the best privacy tool we have. And it is available today.
It is not really much in demand in wallets and such because most people don’t really care.

This is indeed how it works in real life. In practically all of the actual software doing this.

If Jason is doing it differently, he’s doing it wrong.

I found the answer in the Dandelion paper, which calls it diffusion-by-proxy:

A natural strategy for breaking symmetry about the source is to ask someone else to spread the message. That is, for every transaction, the source node chooses a peer uniformly at random from the pool of all nodes. It transmits the message to that node, who then broadcasts the message.

Diffusion-by-proxy might seem like it should have low precision because the graph is so dynamic, but that intuition turns out to be false.

See the paper for more details on why. My observation is alluded to in section 4.4:

Intuitively, this statement holds because each node delivers its own message to the adversary with probability p.

In the Dandelion paper, p refers to the ratio of colluding nodes to the total number of nodes in the network. The Clover paper calls this |S|/|R| (or |A|/|R| in figures 2 and 3, perhaps a typo?) and uses p for broadcast probability.

In my initial analysis I was thinking only about detection probability (recall), not detection precision. Recall has a floor at p, precision has a floor at p² (where p is as defined in the Dandelion paper).
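To make the two floors concrete, here is a small worked calculation. It takes the bounds exactly as stated above (recall at least p, precision at least p squared); the sample values of p and the function names are arbitrary choices for illustration.

```python
def recall_floor(p: float) -> float:
    """Lower bound on adversary recall: the fraction of colluding nodes."""
    return p


def precision_floor(p: float) -> float:
    """Lower bound on adversary precision as stated above: p squared."""
    return p * p


# Arbitrary example fractions of colluding nodes.
for p in (0.01, 0.1, 0.2):
    print(f"p={p}: recall floor={recall_floor(p):.4f}, "
          f"precision floor={precision_floor(p):.4f}")
```

The gap widens as p shrinks: for a small adversary the precision floor is far below the recall floor, which is why a protocol can still improve precision even when recall is already at its floor.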

It’s a pity that the Clover paper doesn’t use the same terminology and graphs. It seems to be focussed on precision rather than recall, perhaps because Dandelion already achieves optimal recall but not optimal precision.

Anyway Clover does seem simpler than Dandelion and provides significant improvement over Diffusion so it could be a great addition to BCH.

Lol, I read that line multiple times and didn’t notice this previously; thanks for highlighting it. What actually attracted me to Clover is that it is very simple and has better precision, which the paper focuses on very heavily.

Thanks for digging into this @rnbrady! Since you answered your own questions, I’ll just add some comments:

An anonymity set of 2 is pretty dismal, and that’s in the best case (assuming the user didn’t send it straight to one of the attacker’s nodes) – for low-resource attackers – before other attacks/analysis.

E.g. if the transaction spends a CashFusion output, and you’re broadcasting using the same node or Fulcrum server you were using before the CashFusion – you’ve probably helped attackers cross off a lot of valid partitions (again, before other attacks). If many CashFusion inputs/outputs are compromised in this same way, the whole CashFusion is probably compromised.

Related: when thinking about network privacy, it’s easy to assume the “outbound” network looks like the left below, but in reality it looks like the right (or worse – less than 7 “honest” connections):

[Figure: two lattice graphs. Left: 8 outbound connections. Right: 8 outbound connections + 90% supernode.]

The attacker(s) are actually surrounding you, and they’re connected to almost every node by at least one connection, often outbound (attackers most certainly maintain good uptime and low latency).

On a related note, I want to put out there that I think node implementations should not change their transaction broadcast behavior. Broadcasting to all peers is important for reliability – if e.g. BCHN changed the default to broadcast to fewer peers, consider that attackers could simply black-hole all new transactions (they probably already do), and now the user experience is impacted + the attacker continues getting new timing-based chances to learn about you and the network graph at each re-broadcast attempt.

A better solution is to soil transaction broadcast as a heuristic. Like widespread PayJoin deployment would soil the common-input ownership heuristic, if we keep initial broadcasts as they are + introduce PTX (and maybe DTX) messages, we retain the faster transaction user experience while reducing or eliminating the value of initial broadcast origin tracing.

This is another item that Clover “gets right” but isn’t explicitly mentioned in the paper. (Maybe because Dandelion++ does it too, and the authors consider it obvious.)

Unless I’m mistaken, BCHN is not. That means services using BCHN – Fulcrum servers, most Chaingraph instances, etc. – are also initially broadcasting via TX flood to all peers.

Edit:

My initial post wasn’t correct: transactions are immediately broadcast to all peers, but only in the next INV for that peer, not as an unsolicited TX. So no, BCHN is not picking just one peer to broadcast new transactions (and should not change that behavior for the reasons described above), but it’s also not immediately blasting TX messages to everyone as I initially said.

(IIRC, my Chaingraph instance regularly reports receipt of unsolicited TX broadcasts from BCHN, but I’m still trying to track down why – it’s possible that’s because it’s mining chipnet and testnet4 blocks? Or possibly just a logging bug; I’ll need to review.)

Hopefully the attack on CashFusion I described above illustrates how incorrect this assumption is.

Network-layer privacy leaks probably de-anonymize most cryptocurrency transactions – even those of “privacy coins”.

I think CashFusion is probably about as private as Monero right now (esp. given Monero’s recently published MAP Decoder attack): ok for casual privacy, but very susceptible to targeted attacks and powerful network adversaries (global dragnet).

I suspect that network-layer attacks have always been the worst privacy problems. E.g. I read between these lines that Zcash probably doesn’t have any intentional backdoors (unless this was intentional), but that experienced privacy experts recognize the fundamentally hard problem of getting strong privacy out of real-time systems.

To illustrate, here’s the conclusion of a recent paper studying attacks on Dandelion, Dandelion++, and Lightning Network: https://arxiv.org/pdf/2201.11860

Our evaluation of the said schemes reveals some serious concerns. For instance, our analysis of real Lightning Network snapshots reveals that, by colluding with a few (e.g., 1%) influential nodes, an adversary can fully deanonymize about 50% of total transactions in the network. Similarly, Dandelion and Dandelion++ provide a low median entropy of 3 and 5 bit, respectively, when 20% of the nodes are adversarial.

Those are some very worrying numbers, and they imply that “on-chain” privacy systems like CashFusion, Zero-Knowledge Proof (ZKP) covenants, etc. can only offer casual privacy without significant additional operational security.
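To put those entropy figures in perspective: an entropy of H bits equals the entropy of a uniform choice among 2^H candidates, so median entropies of 3 and 5 bits correspond to uniform anonymity sets of just 8 and 32 nodes. A minimal sketch of that conversion (the function names are hypothetical, and the paper may compute entropy over a non-uniform distribution):

```python
import math


def shannon_entropy_bits(probabilities: list[float]) -> float:
    """Shannon entropy, in bits, of a distribution over candidate origins."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def equivalent_uniform_set_size(entropy_bits: float) -> float:
    """Size of a uniform anonymity set with the same entropy."""
    return 2 ** entropy_bits
```

For example, `equivalent_uniform_set_size(3)` is 8 and `equivalent_uniform_set_size(5)` is 32, out of a network with thousands of nodes.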

After more thought (here), I think that should remain out of scope for Bitcoin Cash’s P2P network, and DTX messages should just re-use the PTX behavior with much longer parameters: plenty of other networks (Tor, I2P, etc.) are already working on the general problem of real-time network privacy, and we shouldn’t pretend that our real-time anonymity set is competitive. (As far as I know, no cryptocurrency network is even close.) Our time is better spent making it easier for users to use specific Bitcoin Cash software over Tor, I2P, etc.

First of all, thank you for actually replying. It is rare to actually see you engage in discussion with people that disagree with you.

The correct way is to send an INV; the docs from the original client are here:

The point is to avoid sending duplicate data to every single one of the (sometimes hundreds of) peers. This has been the case for over a decade.
So, your rationale for this feature is missing. What you say is a problem is simply not how the tech actually works.

You haven’t actually shown any attack on cash fusion. I read a couple of FUD statements that make assumptions about fulcrum servers being compromised, but nothing related to CashFusion.

You continue your sales-pitch language spreading fear by describing the security of cash fusions as an “assumption”. It is factually not an assumption: apart from being in operation for many years now, we have had actual researchers try to find holes and they gave it a clean bill of health.
So, second count against your story trying to undermine what we have today.

Seriously, if you think it is so bad, go and prove it by de-anonymizing cash fusion transactions. I’ll wait.

Good try, continuing to push the FUD buttons for people that don’t know enough to understand what you are doing.
The fact is, Cash Fusion is based on extremely simple math, stuff that has been proven for centuries. It is simply massive numbers of combinations, large number theory. Nothing fancy. Nothing that can be broken by a smarter mathematician.

So, again, if you are so much advocating that CashFusion is bad, broken, in need of your massive changes to the Bitcoin Cash protocol, then go and prove your point by following the money on a series of fusions. (EC defaults to 3 in a row)

Fun fact: Roger Ver had a professional try it. Someone that works for a chain-analysis company. And they failed the test. There was nothing useful found.

CashFusion works and is available today. With multiple rounds, it is perfectly good enough for you and everyone else.

On top of that, the lack of demand for cash fusion (or other such mechanisms) in clients shows that the general audience is not demanding more privacy.

Any suggestions for people to do work or changes should reflect that reality.

You know that Jason isn’t suggesting changing how consensus works. Remember that CashFusion works at the transaction level by mixing, while privacy at the network layer works at the node level, which is different. You get the benefit of more privacy by running your own node this way. Imo, Jason’s proposal increases the benefits for node operators and wallet providers without extra cost or complicated privacy protocols. For example, PTX would give nodes extra privacy just by enabling it, without any extra steps! This is not FUD; it’s the necessary evolution of privacy from a different perspective or attack vector.

Thanks for the detailed response @bitjson.

I would have thought 9 rather than 2 when the proxy is honest, but point taken.

“Nodes should not change their broadcast behaviour, but they should proxy first” does not add up. That is changing the broadcast behaviour. Perhaps you mean that once a TX transitions from proxying phase to diffusing phase, it should still be diffused to all peers. I agree and am not suggesting otherwise, but you’ve still introduced a reliability loss in the proxying phase. That is the cost of the added privacy.

Ok, you are viewing the proxying phase as completely separate from transaction broadcast. I am viewing the proxying phase as a part of, and a change to, transaction broadcast. If you view them as separate then yes, you have “soiled transaction broadcast as a heuristic” but you have also introduced transaction proxying as a heuristic. And this will become the primary heuristic used by a well-connected adversary, as they will see the PTX before the INV for most transactions.

BCHN sending a PTX to one peer is surely changing “the default to broadcast to fewer peers” and the adversary will be first hop with probability p over all transactions and nodes. Clover accepts this, hence just times out and resorts to diffusion. If you’ve been caught red handed by a neighbour, going back inside and changing your disguise won’t help you.

That’s why recall has a floor of p.

The goal then is to minimise the adversary’s precision, which represents their ability to link transactions to a node of interest X even when they are not a peer of X or not the first hop. The floor for this is p², which is a much lower number.

I’m now wondering what would happen if you didn’t bother with PTX and just modified the node behaviour with INV messages. Every incoming transaction gets either proxied or diffused/flooded based on a coin toss and if proxying then the Clover relay pattern is used. Some transactions will go from proxy to diffusion and back to proxy, others might get stuck in a loop, and in both cases the timeouts will cause all nodes to diffuse eventually.
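A rough sketch of that coin-toss idea, to make the moving parts explicit. This is a toy model under loud assumptions: Clover's actual relay pattern is not modelled (the next hop is just a uniformly random peer), `max_hops` stands in for the timeouts, and the function and parameter names are hypothetical.

```python
import random


def proxy_path_before_diffusion(
    peers: dict[str, list[str]],
    origin: str,
    diffuse_probability: float,
    max_hops: int,
    rng: random.Random,
) -> list[str]:
    """Return the proxy path a transaction takes before some node floods it.

    Each node tosses a biased coin: diffuse (flood to all peers) or proxy to
    one peer. Clover's relay rule is NOT modelled; the next hop is a
    uniformly random peer. `max_hops` stands in for the timeout that
    eventually forces diffusion. The last entry is the diffusing node.
    """
    path = [origin]
    current = origin
    for _ in range(max_hops):
        if rng.random() < diffuse_probability:
            break  # this node diffuses; the proxying phase ends here
        current = rng.choice(peers[current])
        path.append(current)
    return path
```

Running this many times over a realistic peer graph, with some nodes marked adversarial, would give a first estimate of how often the adversary sits on the proxy path; loops simply show up as repeated node names in the returned list.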

Privacy beyond encryption

Even if we had widespread encryption use, it’s important to understand that it would still be possible for powerful network-layer attackers (local admin, ISPs, governments, etc.) to fingerprint the current P2P protocol – and even de-anonymize transaction broadcast origins – based on packet sizes and timings.

Handshake fingerprinting

BIP324 leaves room in the handshake protocol for “garbage” to allow later upgrades that introduce traffic shaping measures and confuse fingerprinting (make the connection look like some other, non-censored protocol). However, the BIP stops short of actually implementing such protections.

Libp2p would probably give us more of a head start than BIP324: much of our handshake(s) would already be confused with other libp2p-based protocols, and in practice e.g. the WebTransport protocol might already be hard to differentiate from other web traffic. (The underlying QUIC protocol is well designed to resist traffic analysis, with encrypted headers and more efficient loss recovery vs. TCP if the WebTransport isn’t downgraded to HTTP/2 due to UDP being blocked.)

While I don’t think fingerprint resistance is very high priority, if a low-cost change (simple implementation + minimal impact on latency and bandwidth) might reduce the effectiveness of attempting to block all BCH connections, that would probably be worth exploring.

For example, the final step of a hypothetical libp2p WebTransport + BCH handshake – the VERSION + VERACK back and forths – has very predictable timings and message lengths. Introducing random padding, slight delays, and/or otherwise muddying that signature could reduce an attacker’s certainty of identifying the connection as a BCH P2P connection.
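As a sketch of what "random padding" could look like for a single handshake message: the framing below (random bytes plus a one-byte length trailer, applied before encryption) is purely an assumption for illustration, and `pad_message` / `unpad_message` are hypothetical names, not part of BIP324, libp2p, or any BCH node.

```python
import secrets


def pad_message(message: bytes, max_padding: int = 64) -> bytes:
    """Append a random amount of random padding, with a 1-byte length trailer.

    The trailer records how many padding bytes were added so the receiver can
    strip them after decryption. max_padding must fit in one byte.
    """
    if not 0 <= max_padding <= 255:
        raise ValueError("max_padding must be between 0 and 255")
    pad_len = secrets.randbelow(max_padding + 1)
    return message + secrets.token_bytes(pad_len) + bytes([pad_len])


def unpad_message(padded: bytes) -> bytes:
    """Remove the padding added by pad_message."""
    pad_len = padded[-1]
    return padded[: len(padded) - 1 - pad_len]
```

This only blurs message lengths; the predictable timing of the back-and-forth would need separate treatment, such as small random delays.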

Inspecting transaction propagation despite encryption

A far more critical weakness for practical privacy against powerful network-layer adversaries (local admin, ISPs, governments, etc.) is the predictability of timing and message lengths involved in transaction propagation.

Fundamentally, all BCH transactions ultimately become public. Even if the attacker cannot break the encryption between honest BCH peers, if they can record the timings and sizes of packets traveling between connections, they can ultimately unwind the transaction propagation path (even back through a Clover, Dandelion++, etc. broadcast) to the node which first had the transaction. Encryption is practically useless in this scenario. (IMO, this is a weakness of BIP324 without other improvements.)

Padding-only Defenses Add Delay in Tor includes great background on this sort of attack against Tor. It notes (emphasis added):

[Website Fingerprinting] attacks use information about the timing, sequence, and volume of packets sent between a client and the Tor network to detect whether a client is downloading a targeted, or “monitored,” website.

These attacks have been found to be highly effective against Tor, identifying targeted websites with accuracy as high as 99% [6] under some conditions.

And a snippet I think we should keep in mind when reviewing solutions (below):

[…] it may be desirable to consider allowing cell delays, since adding padding already incurs latency, and cell delays may in fact reduce the resource stress caused by [Website Fingerprinting] defenses.

The worst offenders: INV and GETDATA

Encrypted transaction inspection is made much easier by the predictability of INV and GETDATA messages. Their distinct timing and length patterns allow the attacker to glean 1) that they’re looking at INV/GETDATA and not PTX or TX, 2) the number of inventory items being advertised, and 3) the number of transactions being requested in response.

All this information significantly reduces the attacker’s uncertainty, even for attacks against Clover, Dandelion++, etc.:

  • INV and GETDATA messages can be more definitively filtered from timing data, clarifying the remaining picture of mostly TX and/or PTX message propagation,
  • INV messages which do not trigger GETDATA messages are easily distinguished from PTX messages,
  • The remaining TX timing data can even be more clearly unwound and labeled given knowledge of how many items were requested in each interaction – each time window is a sudoku puzzle that can be solved using the now-public transaction sizes from that time window.
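The "sudoku" in the last bullet is essentially a subset-sum search. A minimal brute-force sketch, assuming the attacker knows the total payload bytes seen on a link in a time window and the (later public) sizes of transactions from that window; the function name, parameters, and example sizes are all hypothetical:

```python
from itertools import combinations
from typing import Optional


def candidate_tx_sets(
    observed_bytes: int,
    public_tx_sizes: dict[str, int],
    requested_count: Optional[int] = None,
    per_message_overhead: int = 0,
) -> list[tuple[str, ...]]:
    """List every set of transactions whose sizes add up to the observed total.

    If requested_count is known (e.g. inferred from a GETDATA length), only
    sets of exactly that many transactions are considered. Brute force, so
    only suitable for small windows.
    """
    txids = sorted(public_tx_sizes)
    counts = [requested_count] if requested_count else range(1, len(txids) + 1)
    matches = []
    for count in counts:
        for combo in combinations(txids, count):
            total = sum(public_tx_sizes[t] + per_message_overhead for t in combo)
            if total == observed_bytes:
                matches.append(combo)
    return matches
```

With sizes `{"tx_a": 226, "tx_b": 373, "tx_c": 599}` and 599 observed bytes, two explanations fit (`tx_c` alone, or `tx_a` plus `tx_b`); knowing that two items were requested leaves only one, which is how INV/GETDATA leakage sharpens the attacker's picture.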

Solutions

As mentioned above, users needing the best possible privacy should broadcast using Tor (but not necessarily listen exclusively via Tor), where their broadcast will also be obscured by Tor’s much larger real-time anonymity set than (I think) any cryptocurrency can currently offer.

As for improving the BCH P2P protocol’s resistance to timing analysis – I haven’t done enough review to be confident about any particular approach.

Right now, I’m thinking we should try to avoid:

  • Padding messages to fixed sizes – this would trade significant additional bandwidth costs for beyond-casual privacy against powerful network-layer adversaries. It’s not necessarily in every BCH user’s interest to make that trade. Instead, let BCH specialize at being efficient money, and users who need that additional privacy should use BCH through a network that specializes in privacy (Tor, I2P, etc.)
  • Decoy traffic – same efficiency for privacy tradeoff: we get the best of both worlds by being as efficient as possible and letting users choose to use a general privacy network if they need it.
  • Meaningful delays in the typical user experience – as mentioned above, I think we should keep the common path free of artificial delays, i.e. PTX messages should be as fast as possible, offering casual privacy without much impact to user-perceived speed. Ultimately, this increases the anonymity set size by making it less annoying for users/wallets who don’t really care to leave it enabled.

Admittedly, this would leave our toolbox quite empty.

One option to possibly strengthen Clover against this sort of analysis (at the cost of broadcast latency) would be to instead advertise “PTX transactions” as a new inventory type, having nodes fetch and re-advertise them in the same way as TX (the 3-step INV + GETDATA + TX pattern is obviously different than 1-way PTX broadcasts). I don’t know if the Clover authors considered that and decided against it (I’ve reached out to ask), but it’s probably worth testing how badly it impacts latency. If we could do it while keeping typical PTX latency under maybe ~10s, it’s probably worthwhile. Otherwise, maybe we could explore making “DTX transactions” work via INV broadcast (so PTX is fast + casual privacy, while DTX is the dragnet-resistant option).

I’d be very interested in others’ thoughts on INV and GETDATA lengths/timings.

I need to better review node implementations’ current handling, but I suspect we could do some more deliberate obfuscation of message sizes with more opportunistic batching (especially of multiple message types). Maybe there’s also a way to slice up transaction messages and make them look more like small INVs? (But maybe just PTX and/or DTX if that would slow down TX propagation?)
