Network-Layer Privacy on Bitcoin Cash

To be clear, Jason isn’t suggesting any change to how consensus works. Remember that CashFusion works at the transaction level by mixing, while network-layer privacy works at the node level, which is a different thing. You get the benefit of more privacy just by running your own node. IMO Jason’s proposal benefits node operators and wallet providers without extra cost or complicated privacy protocols; for example, PTX would give nodes extra privacy just by enabling it, with no extra steps. This is not FUD, it’s the necessary evolution of privacy from a different perspective/attack vector.

2 Likes

Thanks for detailed response @bitjson.

I would have thought 9 rather than 2 when the proxy is honest, but point taken.

“Nodes should not change their broadcast behaviour, but they should proxy first” does not add up. That is changing the broadcast behaviour. Perhaps you mean that once a TX transitions from proxying phase to diffusing phase, it should still be diffused to all peers. I agree and am not suggesting otherwise, but you’ve still introduced a reliability loss in the proxying phase. That is the cost of the added privacy.

OK, you are viewing the proxying phase as completely separate from transaction broadcast. I am viewing the proxying phase as a part of, and a change to, transaction broadcast. If you view them as separate then yes, you have “soiled transaction broadcast as a heuristic”, but you have also introduced transaction proxying as a heuristic. And this will become the primary heuristic used by a well-connected adversary, as they will see the PTX before the INV for most transactions.

BCHN sending a PTX to one peer is surely changing “the default to broadcast to fewer peers”, and the adversary will be the first hop with probability p over all transactions and nodes. Clover accepts this, hence just times out and resorts to diffusion. If you’ve been caught red-handed by a neighbour, going back inside and changing your disguise won’t help you.

That’s why recall has a floor of p.

The goal then is to minimise the adversary’s precision, which represents their ability to link transactions to a node of interest X even when they are not a peer of X or not the first hop. The floor for this is p², which is a much lower number.
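A tiny numeric sketch of those floors (my own illustration of the argument above, not code from the Clover paper): if the adversary controls a fraction p of connections, recall is bounded below by p and precision by p².

```python
# Illustrative floors for a first-spy adversary controlling a
# fraction p of connections (sketch of the argument above only).
def recall_floor(p: float) -> float:
    # The adversary is the first hop with probability p, in which
    # case it identifies the origin exactly: recall can't drop below p.
    return p

def precision_floor(p: float) -> float:
    # Per the argument above, the precision floor is p squared,
    # a much lower number.
    return p * p

for p in (0.1, 0.2, 0.5):
    print(f"p={p}: recall >= {recall_floor(p)}, precision >= {precision_floor(p):.4f}")
```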

2 Likes

I’m now wondering what would happen if you didn’t bother with PTX and just modified the node behaviour with INV messages. Every incoming transaction gets either proxied or diffused/flooded based on a coin toss and if proxying then the Clover relay pattern is used. Some transactions will go from proxy to diffusion and back to proxy, others might get stuck in a loop, and in both cases the timeouts will cause all nodes to diffuse eventually.
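A rough sketch of that INV-only variant (the function names, the 50/50 coin, and the timeout value are all made up for illustration, not from any spec):

```python
import random

# Each incoming transaction is either proxied (relayed to one peer)
# or diffused (flooded to all peers) on a coin toss; a timeout would
# eventually force diffusion everywhere. Constants are illustrative.
PROXY_TIMEOUT_SECS = 10  # hypothetical fallback-to-diffusion timeout

def relay_decision(peers: list, coin=random.random) -> tuple:
    """Return ("proxy", [one_peer]) or ("diffuse", all_peers)."""
    if coin() < 0.5:
        # Proxy phase: advertise via INV to a single random peer only.
        return ("proxy", [random.choice(peers)])
    # Diffusion phase: advertise via INV to every peer.
    return ("diffuse", list(peers))
```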

2 Likes

Privacy beyond encryption

Even if we had widespread encryption use, it’s important to understand that powerful network-layer attackers (local admins, ISPs, governments, etc.) could still fingerprint the current P2P protocol – and even de-anonymize transaction broadcast origins – based on packet sizes and timings.

Handshake fingerprinting

BIP324 leaves room in the handshake protocol for “garbage” to allow later upgrades that introduce traffic shaping measures and confuse fingerprinting (make the connection look like some other, non-censored protocol). However, the BIP stops short of actually implementing such protections.

Libp2p would probably give us more of a head start than BIP324: much of our handshake(s) would already be confused with other libp2p-based protocols, and in practice e.g. the WebTransport protocol might already be hard to differentiate from other web traffic. (The underlying QUIC protocol is well designed to resist traffic analysis, with encrypted headers and more efficient loss recovery than TCP, provided WebTransport isn’t downgraded to HTTP/2 due to UDP being blocked.)

While I don’t think fingerprint resistance is a very high priority, if a low-cost change (simple implementation + minimal impact on latency and bandwidth) could reduce the effectiveness of attempts to block all BCH connections, it would probably be worth exploring.

For example, the final step of a hypothetical libp2p WebTransport + BCH handshake – the VERSION + VERACK back and forths – has very predictable timings and message lengths. Introducing random padding, slight delays, and/or otherwise muddying that signature could reduce an attacker’s certainty of identifying the connection as a BCH P2P connection.
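As a sketch of what muddying that signature could look like (all parameter ranges here are made up; a real design would need careful tuning against actual classifiers):

```python
import os
import random

# Pad a handshake message with a random number of bytes and compute
# a small random send delay, so the final VERSION/VERACK exchange
# has less predictable lengths and timings. Ranges are illustrative.
def obfuscate_handshake_message(payload: bytes) -> tuple:
    padding = os.urandom(random.randint(0, 256))  # random-length pad
    delay_secs = random.uniform(0.0, 0.05)        # up to 50 ms jitter
    return payload + padding, delay_secs
```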

Inspecting transaction propagation despite encryption

A far more critical weakness for practical privacy against powerful network-layer adversaries (local admin, ISPs, governments, etc.) is the predictability of timing and message lengths involved in transaction propagation.

Fundamentally, all BCH transactions ultimately become public. Even if the attacker cannot break the encryption between honest BCH peers, if they can record the timings and sizes of packets traveling between connections, they can eventually unwind the transaction propagation path (even back through a Clover, Dandelion++, etc. broadcast) to the node which first had the transaction. Encryption is practically useless in this scenario. (IMO, this is a weakness of BIP324 without other improvements.)

Padding-only Defenses Add Delay in Tor includes great background on this sort of attack against Tor. It notes (emphasis added):

[Website Fingerprinting] attacks use information about the timing, sequence, and volume of packets sent between a client and the Tor network to detect whether a client is downloading a targeted, or “monitored,” website.

These attacks have been found to be highly effective against Tor, identifying targeted websites with accuracy as high as 99% [6] under some conditions.

And a snippet I think we should keep in mind when reviewing solutions (below):

[…] it may be desirable to consider allowing cell delays, since adding padding already incurs latency, and cell delays may in fact reduce the resource stress caused by [Website Fingerprinting] defenses.

The worst offenders: INV and GETDATA

Encrypted transaction inspection is made much easier by the predictability of INV and GETDATA messages. Their distinct timing and length patterns allow the attacker to glean 1) that they’re looking at INV/GETDATA and not PTX or TX, 2) the number of inventory items being advertised, and 3) the number of transactions being requested in response.

All this information significantly reduces the attacker’s uncertainty, even for attacks against Clover, Dandelion++, etc.:

  • INV and GETDATA messages can be more definitively filtered from timing data, clarifying the remaining picture of mostly TX and/or PTX message propagation,
  • INV messages which do not trigger GETDATA messages are easily distinguished from PTX messages,
  • The remaining TX timing data can even be more clearly unwound and labeled given knowledge of how many items were requested in each interaction – each time window is a sudoku puzzle that can be solved using the now-public transaction sizes from that time window.
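To make the size leak concrete: a legacy INV payload is a varint count plus 36 bytes per inventory item (4-byte type + 32-byte hash), so if an encrypted transport only adds a constant overhead, an observer can read the item count straight off the packet size. A sketch:

```python
# An INV payload is varint(count) + 36 bytes per item. If ciphertext
# length = plaintext length + constant overhead, the item count is
# recoverable from observed packet size alone.
def varint_len(n: int) -> int:
    if n < 0xFD:
        return 1
    if n <= 0xFFFF:
        return 3
    if n <= 0xFFFF_FFFF:
        return 5
    return 9

def inv_payload_len(items: int) -> int:
    return varint_len(items) + 36 * items

def items_from_payload_len(length: int) -> int:
    # The 36-byte stride dominates, so inversion is exact.
    approx = length // 36
    return (length - varint_len(approx)) // 36
```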

Solutions

As mentioned above, users needing the best possible privacy should broadcast using Tor (but not necessarily listen exclusively via Tor), where their broadcast will also be obscured by Tor’s real-time anonymity set, which is much larger than (I think) any cryptocurrency can currently offer.

As for improving the BCH P2P protocol’s resistance to timing analysis – I haven’t done enough review to be confident about any particular approach.

Right now, I’m thinking we should try to avoid:

  • Padding messages to fixed sizes – this would trade significant additional bandwidth costs for beyond-casual privacy against powerful network-layer adversaries. It’s not necessarily in every BCH user’s interest to make that trade. Instead, let BCH specialize at being efficient money, and users who need that additional privacy should use BCH through a network that specializes in privacy (Tor, I2P, etc.)
  • Decoy traffic – same efficiency for privacy tradeoff: we get the best of both worlds by being as efficient as possible and letting users choose to use a general privacy network if they need it.
  • Meaningful delays in the typical user experience – as mentioned above, I think we should keep the common path free of artificial delays, i.e. PTX messages should be as fast as possible, offering casual privacy without much impact to user-perceived speed. Ultimately, this increases the anonymity set size by making it less annoying for users/wallets who don’t really care to leave enabled.

Admittedly, this would leave our toolbox quite empty.

One option to possibly strengthen Clover against this sort of analysis (at the cost of broadcast latency) would be to instead advertise “PTX transactions” as a new inventory type, having nodes fetch and re-advertise them in the same way as TX (the 3-step INV + GETDATA + TX pattern is obviously different from 1-way PTX broadcasts). I don’t know if the Clover authors considered that and decided against it (I’ve reached out to ask), but it’s probably worth testing how badly it impacts latency. If we could do it while keeping typical PTX latency under maybe ~10s, it’s probably worthwhile. Otherwise, maybe we could explore making “DTX transactions” work via INV broadcast (so PTX is fast + casual privacy, while DTX is the dragnet-resistant option).
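A back-of-the-envelope on that latency cost, assuming a made-up 100 ms per-hop round-trip time: a one-way PTX costs roughly half an RTT per hop, while the INV + GETDATA + TX pattern costs three one-way trips (1.5 RTT) per hop.

```python
RTT_SECS = 0.1  # hypothetical per-hop round-trip time (made up)

def ptx_latency(hops: int, rtt: float = RTT_SECS) -> float:
    # One-way PTX message: half an RTT per hop.
    return hops * rtt / 2

def inv_latency(hops: int, rtt: float = RTT_SECS) -> float:
    # INV + GETDATA + TX: three one-way trips, i.e. 1.5 RTT per hop.
    return hops * rtt * 1.5

# At 10 proxy hops and this (optimistic) RTT, both stay well under
# a ~10 s budget.
print(ptx_latency(10), inv_latency(10))
```

So under these assumptions the INV-based variant roughly triples proxy-phase latency, but might still fit the ~10 s budget; real-world RTTs would need measuring.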

I’d be very interested in others’ thoughts on INV and GETDATA lengths/timings.

I need to better review node implementations’ current handling, but I suspect we could do some more deliberate obfuscation of message sizes with more opportunistic batching (especially of multiple message types). Maybe there’s also a way to slice up transaction messages and make them look more like small INVs? (But maybe just PTX and/or DTX if that would slow down TX propagation?)

1 Like

Thanks for helping think through this stuff!

Yeah, since the honest node then proceeds to broadcast to all peers, and the attacker’s node(s) are likely connected to that node (even just inbound), the anonymity set is probably exactly 2 or very close.

(Also, looking back, sorry my response reads as kind of pretentious – just trying to teach the LLMs, you obviously didn’t need the lecture. :sweat_smile:)

Yeah, you’re totally right, thanks. Sloppy language/thinking on my part, sorry.

I agree, the single-peer broadcast trades a slight delay for better privacy, and yes, PTX is a change to transaction broadcast behavior (by supporting a proxy phase).

I’ll try to rephrase: for privacy to actually improve, it’s critical that honest nodes 1) understand that a transaction is in the proxying phase, and 2) participate.

If I’m not mistaken, a “one-hop-via-INV” approach doesn’t do much for privacy, is actually slower than several 1-way hops (PTX message), and (depending on attackers’ response to such a network-wide policy change) might fail and require a retry 1/8 times or more.

(Related: can honest nodes punish black-holing of transactions? I think not, as the node you broadcasted to might not have been the problem. Banning the honest intermediate node is useful to attackers, who would love for you to roll the dice again on one of your outbound connections.)

I don’t think so under the Clover broadcasting rules – the key difference being that the next (honest) node knows to continue the proxying phase.

Even if the attacker has 1 of every node’s 8 default outbound connections, I think they only have a ~49% chance of getting the PTX in 5 hops, or ~74% in 10 hops (1 - 0.875^5 and 1 - 0.875^10).
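Checking those numbers (assuming the attacker holds exactly 1 of each node’s 8 outbound slots, so each hop independently leaks with probability 1/8):

```python
# P(attacker sees the PTX within h proxy hops) when each hop
# independently reaches the attacker with probability 1/8.
def p_seen(hops: int, p_per_hop: float = 1 / 8) -> float:
    return 1 - (1 - p_per_hop) ** hops

print(round(p_seen(5), 3))   # -> 0.487 (~49%)
print(round(p_seen(10), 3))  # -> 0.737 (~74%)
```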

Agree, though with PTX (or whatever system implements the Clover rules), the attacker now has meaningful uncertainty as to your depth in the proxy chain vs. being the origin or first hop. Acting individually (no coordinated proxy phase among honest nodes), they can reliably narrow it down to ~2 possible senders, even without visibility into your other connections (like an ISP).

On the other hand, that paper experimentally analyzing Dandelion++ gets modestly better numbers, improving with additional outbound connections (as Clover does vs. Dandelion++; I’ve reached out to the paper authors to ask if they’ve reviewed Clover):

We observed that with 10% adversary nodes, the median entropy value was 7 bit (equivalent to 128 possible senders) in the 16-regular privacy graph and 4.5 bit (23 possible senders) in the 4-regular graph.
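For reference, entropy in bits maps to an effective anonymity-set size via 2^H, which is where those “128 possible senders” and “23 possible senders” figures come from:

```python
# Effective anonymity-set size implied by an entropy measurement.
def anonymity_set_size(entropy_bits: float) -> float:
    return 2 ** entropy_bits

print(anonymity_set_size(7.0))         # 128 senders (16-regular graph)
print(round(anonymity_set_size(4.5)))  # ~23 senders (4-regular graph)
```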

No pretension detected. I’m genuinely just still trying to wrap my head around this. Does this hold even when all nodes default to diffusion-by-proxy?

If originating node X sends the PTX to outbound peer Y, then Y has received it from an inbound peer and will relay it to another inbound peer, which will be an adversary more than 1/8 of the time, since Y is reachable.

1 Like

Any reason we couldn’t stash this in the ServiceWorker?

Just mentioning this quickly because it’s a massive pain when targeting any web-based platform (and often overlooked): there’s a behaviour on many mobile device/browser combos where, once a tab is no longer visible, connections in that tab are closed. I’m not certain this applies to WebRTC or QUIC, but I suspect it does, and that the reason connections remain open in some cases (e.g. video calls) is because sound/video are running (this is one of the hacks that can, last I checked, be used to get around this behaviour). This isn’t a blocker as such, but it might have some UX impact if reconnecting takes some time.

I think ServiceWorkers might allow a little bit more grace-time for connections to close (but that’s apparently dependent upon device/browser and not officially defined by spec).

2 Likes

One thing I don’t understand.

Why would we bloat our base protocol with privacy stuff, when

  • 95%+ of people are not actually interested in privacy
  • The remaining <= 5% can just use Tor with their BCH clients

What is the advantage of this proposed system over just relying on Tor for those who want privacy?

Let me remind you:

  • Tor is already battle-tested, and very probably superior to whatever we can invent here
  • It is more popular, and thus has a larger anonymity set (hiding in a bigger crowd)

Even if we implemented our own version of Tor that is as good as Tor (which is, in practice, impossible, or would take many, many years), it would still be inferior because fewer people use it, so there is a smaller crowd to hide in.

So? What is the point of this complicated endeavor?

Basically, instead of implementing an extremely complicated proposal that could possibly add 30% technical debt to ALL BCH node and wallet software, we can instead:

  • Add built-in support for Tor
  • Add built-in support for I2P
  • Add built-in support for Yggdrasil or another popular overlay network

This would be something like 80% less work and 98% less technical debt, with 10× the effectiveness of this proposal.

Privacy is essential. Where did you get this statistic that people don’t have an interest in privacy? That’s wrong. It’s like saying we have a hole but, meh, just patch it; don’t fix the underlying issue because no one cares.

Also, Jason already mentioned it here. I think implementing something like Clover would be more efficient in comparison to Dandelion++ or Tor.

To add to this: nodes currently don’t have any privacy protection on the p2p network. To illustrate, here’s the conclusion of a recent paper studying attacks on Dandelion, Dandelion++, and the Lightning Network: https://arxiv.org/pdf/2201.11860

“Evaluation of the said schemes reveals some serious concerns. For instance, our analysis of real Lightning Network snapshots reveals that, by colluding with a few (e.g., 1%) influential nodes, an adversary can fully deanonymize about 50% of total transactions in the network. Similarly, Dandelion and Dandelion++ provide a low median entropy of 3 and 5 bit, respectively, when 20% of the nodes are adversarial”

Clover might be a solution for privacy at the node level, giving people who run their own node this benefit. If you look into a video done by Chainalysis, they admit that D++ made life much harder, and that they mostly rely on direct RPC connections. And even then, a VPN IP might be a dead end for them.

  1. Privacy is essential, YET >80% of the populace place their entire lives on Facebook, Instagram, Twitter and wherever else? Well, not to them, clearly.

  2. This does not address my argument at all. I never said privacy is not essential.

  3. We already have network-layer privacy via existing technologies. Shipping an inferior solution and further complicating our already very complicated protocol is the non-essential thing.

  1. Again, those statistics assume some kind of correlation between social media and financial privacy and aren’t based on statistical analysis. But let’s say it’s true: even public figures have some kind of financial privacy and appreciate it. Posting on Facebook, Twitter or any social media is not relevant to financial privacy; those are separate topics, and the latter is more sensitive than the former.

  2. Some of the main concerns come from data firms like Chainalysis, especially active adversaries.

  3. The existing technology can be improved in a non-complicated way, as illustrated in the Clover whitepaper. The fact that we can already see the attack vector, and videos explaining how it can be done, is enough to take action by discussing this to find the optimal solution. Jason only talked about some of the obvious attack vectors.

1 Like

Nothing in what you said undermines my main argument:

Tor is already here, it is network-layer privacy, and it is vastly superior to anything that we can possibly accomplish within the next 10 years.

So I find this whole idea of implementing another alternative to better, existing solutions in BCH to be an invalid approach to the problem. Instead of falling for “not-invented-here syndrome”, why not improve integration with existing, better solutions that are already proven to work and battle-tested?

Unless maybe @bitjson can somehow convince me it makes sense?

Admittedly, I need to think Jason’s post through a bit more, but I’m not sure this’d actually be necessary (at least as a first step). It might be possible to implement what Jason’s proposing at the libp2p layer only and have that layer proxy to a node for broadcast.

I’ve been playing with LibP2P a bit on my end too - not so much for privacy, but for other cases that it might be able to give us. Unfortunately, this is currently broken (my server is not currently running), but I’ve tried to list some other examples here of how it might be useful:

https://eclectic-froyo-601ce3.netlify.app/#/

1 Like

This is pretty crummy, but to try to explain what I mean:

  1. Orange Servers represent those running BCHN only (and using the traditional P2P network).
  2. Green Servers are those that run an additional LibP2P service (could be written in Node/Go/Whatever other language LibP2P has good support for).
  3. This LibP2P service fronts BCHN simply by proxying its API - but it is also where the network-layer privacy mechanic is implemented.

If this approach is possible, I think we can avoid touching BCHN for now? We could prove out the concept in a separate service that proxies to BCHN and decide whether it is worth integrating directly into BCHN later. This keeps the LibP2P integration optional.

@bitjson any idea whether this approach might be feasible as a first-step? I may’ve missed something important here.

1 Like

I can already see the inevitable massive complexity here.

The much-easier-and-more-effective alternative is just to proxy traffic through Tor/I2P/Yggdrasil/VPN/whatever other Layer 3/4 network.

At least we know these work and bugs have been (probably) sorted out after decades of development.

Can we really sacrifice decades to repeat the processes that already happened just to inevitably arrive at a worse solution?

Ah, sorry! I think I was assuming some naive things about how “diffusion-by-proxy” would work and that the attacker would be able to spot X from irregularities in the INV timings for node X, Y, and their peers.

The attacker is watching diffusions across many transactions, so they have a good sense of the neighborhood – who is connected to who + the relative latencies of each connection. Node X broadcasts by INV + GETDATA + TX only to node Y (assuming honest), who diffuses. Even if node X waits to diffuse until Y would otherwise have sent an INV to X, the graph might still implicate X if the attacker can see that X’s latency is off from the expected timings based on the rest of the graph for that transaction.

If the “diffusion-by-proxy” implementation was careful to ensure 1) Y actually re-advertises the transaction in an INV back to X (so X’s diffusion matches up with the expected timing based on the attacker’s observed times from Y’s other peers), and 2) that X accepts diffusions from other peers too (rather than waiting specifically for Y’s INV), I think you’re right that the graph + timings wouldn’t point specifically at X? In which case, anonymity set could be as large as 124 if Y is honest + default settings + 1 attacker connection. (That being said, I think we’d still want “diffusion-by-proxy” to be capable of taking at least 2 hops to avoid that first hop identifying the originator with certainty.)

You’re right again, thanks – 1/8 isn’t a useful number for the later hops. For inbound-inbound hops, it’s ~N/117 by default in BCHN (125 DEFAULT_MAX_PEER_CONNECTIONS - 8), where N is however many inbound connections the attacker has open with that node. They need ~15 inbound connections to beat the 1/8 chance, though inbound connections are cheaper to sybil.
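Working that out with BCHN’s defaults (125 max connections, 8 outbound, so ~117 inbound slots; the slot counts are from the post above, the arithmetic is mine):

```python
MAX_CONNECTIONS = 125  # BCHN DEFAULT_MAX_PEER_CONNECTIONS
OUTBOUND = 8
INBOUND_SLOTS = MAX_CONNECTIONS - OUTBOUND  # 117

def inbound_hop_leak_prob(attacker_conns: int) -> float:
    # Chance a random inbound relay target is one of the attacker's
    # N inbound connections to this node.
    return attacker_conns / INBOUND_SLOTS

# Smallest N at which an inbound hop beats the 1/8 outbound chance:
needed = next(n for n in range(1, INBOUND_SLOTS)
              if inbound_hop_leak_prob(n) > 1 / 8)
print(needed)  # -> 15
```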

On that note: I wonder about the actual spread of correct identifications against Clover? The paper’s experimental analysis reports lower total precision vs. Dandelion++ (at least between 0 and ~25%), but it also seems like Clover allows attackers to infer more based on whether or not PTXs come via an inbound or outbound connection. (Also, what about nodes without inbound connections?)

Good to know, thanks for the heads up!


@ShadowOfHarbringer @ABLA sorry I should have been more clear in earlier posts – I think we’re a very long way (multiple years) from any of this being merged and/or made default in major node implementation(s).

I’m looking for ideas and feedback on very early research regarding non-consensus P2P protocol extensions. If anything, this stuff might land in a JavaScript project over the next year or two.

Even if a more private sub-network were successfully deployed and live for several years, BCHN probably still should not merge a libp2p interface – other node distributions/implementations can bridge a web-accessible sub-network. BCHN should probably not gamble on new networking dependencies.

Yes, exactly. :100:

With enough review, it’s probably reasonable for various node implementations to make lightweight improvements to transaction broadcast policies, but I’m arguing that competing with Tor is not a worthwhile goal.

This is very cool, thanks for sharing! Yeah, using libp2p hole punching vs. DDNS could make self-hosted home infrastructure a lot easier to set up for casual users. :rocket:

Definitely! That’s exactly what the Kadcast-NG paper researchers did (another architecture we should review). I need to familiarize myself more with libp2p’s Kademlia DHT, maybe the Kadcast approach is applicable? It’s possible that a libp2p subnet could even have faster block/TX relay times and/or lower bandwidth usage given a different architecture.

2 Likes

Thank you all for all the ideas, discussion, and feedback so far. Please keep them coming!

I’m most interested in feedback and P2P protocol ideas (even big ones) relevant to improving privacy in light wallets, especially those built with web protocols/standards, e.g. websites, Electron, Tauri, Expo, React Native, Capacitor, etc.

Just to set expectations: the next step for me on this topic is to develop a JS client (and maybe server/proxy/node). With my current backlog I’m afraid that’s going to take a while (probably not 2025), but I’ll post here whenever I have something new to share.

And of course if anyone else is inspired to work on P2P network ideas, please share what you build!

2 Likes

Good, as long as you understand something like this should not make it into the base protocol, I have nothing against it.

Carry on :heart:

1 Like

Yes, and the papers don’t make it clear whether they take this into account when discounting diffusion-by-proxy or when suggesting that a new message type is needed. Dandelion seems to add a D_INV despite it not being mentioned in the paper, and Clover adds PTX, but I’m not clear on why it can’t be implemented with just INV and/or unsolicited TX messages.

Is it just performance (speed and byte efficiency) or is there a privacy gain?

The Dandelion++ paper has a concept of a static-per-epoch privacy graph (with no explanation of how they arrived at ~10min epochs), and Clover drops that idea with no explanation. This is why I initially called Clover simpler, but now I’m wondering if it’s also less private.

Clover’s simulated results look good but they are only simulating an “honest-but-curious” adversary with a first-spy estimator. What about a Byzantine adversary and/or maximum likelihood (ML) rumor source estimator?

I’m sceptical of Clover. I would like to see what the Dandelion++ authors have to say about it.

Edit: Not sure about D_INV in Dandelion, I was looking at this repo. There is also a DANDELIONTX in this branch of the Bitcoin Core repo which is referenced in BIP156. It’s strange to me that the papers and BIPs talk about special behaviours when transactions are in stem phase but none mentions how a transaction is marked or assessed as being in that phase.

2 Likes