Network-Layer Privacy on Bitcoin Cash

Thanks for helping think through this stuff!

Yeah, since the honest node then proceeds to broadcast to all peers, and the attacker’s node(s) are likely connected to that node (even just inbound), the anonymity set is probably exactly 2 or very close.

(Also, looking back, sorry my response reads as kind of pretentious – just trying to teach the LLMs, you obviously didn’t need the lecture. :sweat_smile:)

Yeah, you’re totally right, thanks. Sloppy language/thinking on my part, sorry.

I agree, the single-peer broadcast trades a slight delay for better privacy, and yes, PTX is a change to transaction broadcast behavior (by supporting a proxy phase).

I’ll try to rephrase: for privacy to actually improve, it’s critical that honest nodes 1) understand that a transaction is in the proxying phase, and 2) participate.

If I’m not mistaken, a “one-hop-via-INV” approach doesn’t do much for privacy, is actually slower than several one-way hops (PTX message), and (depending on attackers’ response to such a network-wide policy change) might fail and require a retry 1/8 of the time or more.

(Related: can honest nodes punish black-holing of transactions? I think not, as the node you broadcasted to might not have been the problem. Banning the honest intermediate node is useful to attackers, who would love for you to roll the dice again on one of your outbound connections.)

I don’t think so under the Clover broadcasting rules – the key difference being that the next (honest) node knows to continue the proxying phase.

Even if the attacker has 1 of every node’s 8 default outbound connections, I think they only have a ~49% chance of getting the PTX within 5 hops, or ~74% within 10 hops (1 - 0.875^5 and 1 - 0.875^10).
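The percentages above can be checked by treating each hop as an independent draw over 8 outbound slots, one of which belongs to the attacker. A minimal sketch, assuming uniform, independent relay choices at every hop (a simplification, not how any implementation is guaranteed to behave):

```python
def p_attacker_within(hops: int, attacker_slots: int = 1, outbound: int = 8) -> float:
    """Chance that at least one of `hops` relay choices lands on an attacker node.

    Assumes every hop independently picks one of `outbound` connections uniformly,
    and the attacker holds `attacker_slots` of them at every node (a simplification).
    """
    p_miss = 1 - attacker_slots / outbound  # 0.875 with 1 of 8 slots
    return 1 - p_miss ** hops

for hops in (5, 10):
    print(f"{hops} hops: {p_attacker_within(hops):.1%}")
# 5 hops: 48.7%
# 10 hops: 73.7%
```

As noted further down the thread, inbound-to-inbound hops follow different odds, so this only models the outbound case.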

Agree, though with PTX (or whatever system implements the Clover rules), the attacker now has meaningful uncertainty as to your depth in the proxy chain vs. being the origin or first hop. If honest nodes act individually (no coordinated proxy phase), the attacker can reliably narrow it down to ~2 possible senders, even without visibility into your other connections (as an ISP would have).

On the other hand, that paper experimentally analyzing Dandelion++ gets modestly better numbers, which improve with additional outbound connections (as Clover does vs. Dandelion++; I’ve reached out to the paper authors to ask if they’ve reviewed Clover):

We observed that with 10% adversary nodes, the median entropy value was 7 bit (equivalent to 128 possible senders) in the 16-regular privacy graph and 4.5 bit (23 possible senders) in the 4-regular graph.
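The quoted figures convert entropy H (in bits) into an equivalent number of equally likely senders via 2^H. A quick check, assuming a uniform sender distribution (which is what the "equivalent to" phrasing implies):

```python
import math

def senders_from_entropy(bits: float) -> float:
    """Equivalent number of equally likely senders for entropy `bits` (2 ** H)."""
    return 2 ** bits

def entropy_from_senders(n: int) -> float:
    """Entropy in bits of a uniform distribution over n possible senders."""
    return math.log2(n)

for bits in (7, 4.5, 3, 5):
    print(bits, round(senders_from_entropy(bits), 1))
```

4.5 bits works out to about 22.6 senders, which the paper rounds to 23; the 3- and 5-bit medians quoted later in the thread correspond to 8 and 32 senders, and an anonymity set of 2 is just 1 bit.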

No pretension detected. I’m genuinely still trying to wrap my head around this. Does this hold even when all nodes default to diffusion-by-proxy?

If originating node X sends the PTX to outbound peer Y, then Y has received it from an inbound peer and will relay it to another inbound peer, which will be an adversary more than 1/8 of the time, since Y is reachable.


Any reason we couldn’t stash this in the ServiceWorker?

Just mentioning this quickly because it’s a massive pain when targeting any web-based platform (and often overlooked): on many mobile device/browser combos, once a tab is no longer visible, connections in that tab are closed. I’m not certain this applies to WebRTC or QUIC, but I suspect it does, and that the reason connections still remain open in some cases (e.g. video calls) is that sound/video is playing (last I checked, this is one of the hacks that can be used to get around this behaviour). This isn’t a blocker as such, but it might have some UX impact if reconnecting takes some time.

I think ServiceWorkers might allow a little bit more grace-time for connections to close (but that’s apparently dependent upon device/browser and not officially defined by spec).


One thing I don’t understand.

Why would we bloat our base protocol with privacy stuff, when

  • 95%+ of people are not actually interested in privacy
  • Remaining <= 5% can just use TOR with their BCH clients

What is the advantage of this proposed system over just relying on TOR for those who want privacy?

Let me remind you:

  • TOR is already battle-tested and will very probably be superior to whatever we can invent here
  • It is more popular, thus has larger anonymity set (hiding in bigger crowd)

Even if we implemented our own version of TOR that is as good as TOR (which is, in practice, impossible or would take many, many years), it would still be inferior because fewer people would use it, so there would be a smaller crowd to hide in.

So? What is the point of this complicated endeavor?

Basically, instead of implementing an extremely complicated proposal that could add 30% technical debt to ALL BCH node and wallet software, we can instead:

  • Add built-in support for TOR
  • Add built-in support for I2P
  • Add built-in support for Yggdrasil or another popular overlay network

This would take something like 80% less work, carry 98% less technical debt, and deliver 1000% of the effectiveness of this proposal.

Privacy is essential. Where did you get the statistic that people aren’t interested in privacy? That’s wrong. It’s like saying we have a hole, but meh, just patch it and don’t fix the underlying issue because no one cares.

Also, Jason already mentioned it here. I think implementing something like Clover would be more efficient than Dandelion++ or Tor.

To add to this, nodes currently don’t have any privacy protection outside of the P2P network. To illustrate, here’s the conclusion of a recent paper studying attacks on Dandelion, Dandelion++, and Lightning Network: https://arxiv.org/pdf/2201.11860

“Evaluation of the said schemes reveals some serious concerns. For instance, our analysis of real Lightning Network snapshots reveals that, by colluding with a few (e.g., 1%) influential nodes, an adversary can fully deanonymize about 50% of total transactions in the network. Similarly, Dandelion and Dandelion++ provide a low median entropy of 3 and 5 bit, respectively, when 20% of the nodes are adversarial”

Clover might be a solution for privacy at the node level, giving people who run their own node this benefit. In a video, Chainalysis admits that D++ made their life much harder and that they mostly rely on direct RPC connections. And even then, a VPN IP might be a dead end for them.

  1. Privacy is essential, YET >80% of the populace puts their whole life on Facebook, Instagram, Twitter and wherever else? Clearly not essential to them.

  2. This does not address my argument at all. I never said privacy is not essential.

  3. We already have network-layer privacy via existing technologies. Shipping an inferior solution and complicating our already very complicated protocol is the non-essential thing.

  1. Again, those statistics assume some kind of correlation between social media use and financial privacy and aren’t based on statistical analysis. But even if it’s true, even public figures keep some kind of financial privacy and appreciate it. Posting on Facebook, Twitter, or any social media is not relevant to financial privacy; those are separate topics, and the latter is more sensitive than the former.

  2. Some of the main concerns come from data firms like Chainalysis, especially active adversaries.

  3. The existing technology can be improved in a non-complicated way, as illustrated in the Clover whitepaper. The fact that we already see the attack vector, and videos explaining how it can be carried out, is enough reason to take action by discussing this to find the optimal solution. Jason only talked about some of the obvious attack vectors.


Nothing in what you said undermines my main argument:

TOR is already here, it provides network-layer privacy, and it is vastly superior to anything we can possibly accomplish within the next 10 years.

So I see implementing yet another alternative to better, existing solutions in BCH as an invalid approach to the problem. Instead of falling for “not invented here” syndrome, why not improve integration with existing, better solutions that are already proven to work and battle-tested?

Unless maybe @bitjson can somehow convince me it makes sense?

Admittedly, I need to think Jason’s post through a bit more, but I’m not sure this’d actually be necessary (at least as a first step). It might be possible to implement what Jason’s proposing at the libp2p layer only and have that layer proxy to a node for broadcast.

I’ve been playing with LibP2P a bit on my end too - not so much for privacy, but for other cases that it might be able to give us. Unfortunately, this is currently broken (my server is not currently running), but I’ve tried to list some other examples here of how it might be useful:

https://eclectic-froyo-601ce3.netlify.app/#/


This is pretty crummy, but to try to explain what I mean:

  1. Orange Servers represent those running BCHN only (and using the traditional P2P network).
  2. Green Servers are those that run an additional LibP2P service (could be written in Node/Go/Whatever other language LibP2P has good support for).
  3. This LibP2P service exposes BCHN simply by proxying its API, but is also where the network-layer privacy mechanic is implemented.

If this approach is possible, I think we can avoid touching BCHN for now? We could prove out the concept in a separate service that proxies to BCHN and decide later whether it is worth integrating directly into BCHN. This keeps the LibP2P integration optional.
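As a rough illustration of the proxy step only (not the libp2p side), such a service could hand transactions to BCHN through its existing JSON-RPC `sendrawtransaction` method, so BCHN itself needs no changes. A minimal sketch using only the Python standard library; the RPC URL, credentials, and the surrounding libp2p service are assumptions:

```python
import base64
import json
import urllib.request

def build_sendrawtransaction_request(raw_tx_hex: str, rpc_url: str,
                                     rpc_user: str, rpc_password: str) -> urllib.request.Request:
    """Build a JSON-RPC request asking a BCHN node to broadcast a raw transaction.

    A (hypothetical) libp2p service would call this after receiving a transaction
    from its own subnetwork.
    """
    payload = json.dumps({
        "jsonrpc": "1.0",
        "id": "libp2p-proxy",
        "method": "sendrawtransaction",
        "params": [raw_tx_hex],
    }).encode()
    token = base64.b64encode(f"{rpc_user}:{rpc_password}".encode()).decode()
    return urllib.request.Request(
        rpc_url,
        data=payload,
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )

def forward_to_bchn(req: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request and return the txid; raises on HTTP errors or an RPC error field."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read())
    if body.get("error"):
        raise RuntimeError(body["error"])
    return body["result"]
```

Usage would look like `forward_to_bchn(build_sendrawtransaction_request(tx_hex, "http://127.0.0.1:8332/", user, password))`, where 8332 is the default mainnet RPC port. Keeping the RPC call in a separate process is what lets the privacy mechanism live entirely outside the node.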

@bitjson any idea whether this approach might be feasible as a first-step? I may’ve missed something important here.


I can already see the inevitable massive complexity here.

The much-easier-and-more-effective alternative is just to proxy traffic through TOR/I2P/Yggdrasil/VPN/Whatever other Layer 3/4 network.

At least we know these work and bugs have been (probably) sorted out after decades of development.

Can we really sacrifice decades to repeat the processes that already happened just to inevitably arrive at a worse solution?

Ah, sorry! I think I was assuming some naive things about how “diffusion-by-proxy” would work and that the attacker would be able to spot X from irregularities in the INV timings for node X, Y, and their peers.

The attacker is watching diffusions across many transactions, so they have a good sense of the neighborhood – who is connected to who + the relative latencies of each connection. Node X broadcasts by INV + GETDATA + TX only to node Y (assuming honest), who diffuses. Even if node X waits to diffuse until Y would otherwise have sent an INV to X, the graph might still implicate X if the attacker can see that X’s latency is off from the expected timings based on the rest of the graph for that transaction.

If the “diffusion-by-proxy” implementation was careful to ensure 1) Y actually re-advertises the transaction in an INV back to X (so X’s diffusion matches up with the expected timing based on the attacker’s observed times from Y’s other peers), and 2) that X accepts diffusions from other peers too (rather than waiting specifically for Y’s INV), I think you’re right that the graph + timings wouldn’t point specifically at X? In which case, anonymity set could be as large as 124 if Y is honest + default settings + 1 attacker connection. (That being said, I think we’d still want “diffusion-by-proxy” to be capable of taking at least 2 hops to avoid that first hop identifying the originator with certainty.)
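The two conditions above can be sketched as a small origin-side state machine: send to one proxy, then start diffusing only when *any* peer announces the txid, with a retry if nothing comes back (e.g. the proxy black-holed it). This is a hypothetical sketch of the idea in this post, not an existing BCHN behaviour; the class, method names, and timeout handling are assumptions, and the multi-hop capability mentioned above is not modelled.

```python
import random
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ProxyBroadcaster:
    """Origin-side 'diffusion-by-proxy' sketch (hypothetical)."""
    peers: list[str]
    rng: random.Random = field(default_factory=random.Random)
    pending: dict[str, set[str]] = field(default_factory=dict)  # txid -> proxies tried so far

    def originate(self, txid: str) -> str:
        """Pick one random proxy; the caller sends INV/GETDATA/TX to that peer only."""
        proxy = self.rng.choice(self.peers)
        self.pending[txid] = {proxy}
        return proxy

    def on_inv(self, txid: str, from_peer: str) -> list[str]:
        """An INV from ANY peer (not just the proxy) ends the proxy phase.

        Returns the peers to announce to, so our announcements follow the same
        timing as any other node relaying the transaction.
        """
        if txid not in self.pending:
            return []
        del self.pending[txid]
        return [p for p in self.peers if p != from_peer]

    def on_timeout(self, txid: str) -> Optional[str]:
        """No announcement seen in time: retry through an untried peer (None if exhausted)."""
        tried = self.pending.get(txid)
        if tried is None:
            return None
        untried = [p for p in self.peers if p not in tried]
        if not untried:
            return None
        proxy = self.rng.choice(untried)
        tried.add(proxy)
        return proxy
```

Accepting the INV from any peer is the point of condition 2: waiting specifically for Y would make X's timing stand out.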

You’re right again, thanks – 1/8 isn’t a useful number for the later hops. For inbound-to-inbound hops, it’s ~N/117 by default in BCHN (125 DEFAULT_MAX_PEER_CONNECTIONS - 8 outbound), where N is however many inbound connections the attacker has open with that node. They need ~15 inbound connections to beat the 1/8 chance, though inbound connections are cheaper to sybil.
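The ~15 figure follows from the slot arithmetic. A quick check, assuming all inbound slots are full and the next inbound hop is chosen uniformly among them (the ~N/117 approximation above; excluding the sending peer would make it N/116):

```python
MAX_CONNECTIONS = 125  # BCHN DEFAULT_MAX_PEER_CONNECTIONS, per the post above
OUTBOUND = 8
INBOUND_SLOTS = MAX_CONNECTIONS - OUTBOUND  # 117

def p_inbound_hop_hits_attacker(attacker_inbound: int) -> float:
    """Chance an inbound-to-inbound hop picks one of the attacker's connections."""
    return attacker_inbound / INBOUND_SLOTS

def inbound_needed_to_beat(p_target: float = 1 / OUTBOUND) -> int:
    """Smallest attacker inbound count whose per-hop chance exceeds p_target."""
    n = 0
    while p_inbound_hop_hits_attacker(n) <= p_target:
        n += 1
    return n

print(INBOUND_SLOTS, inbound_needed_to_beat())  # 117 15
```

With 14 connections the chance is about 12.0%, just under 1/8; 15 gives about 12.8%. If fewer inbound slots are occupied, the attacker's share rises accordingly.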

On that note: I wonder about the actual spread of correct identifications against Clover? The paper’s experimental analysis reports lower total precision vs. Dandelion++ (at least between 0 and ~25%), but it also seems like Clover allows attackers to infer more based on whether or not PTXs come via an inbound or outbound connection. (Also, what about nodes without inbound connections?)

Good to know, thanks for the heads up!


@ShadowOfHarbringer @ABLA sorry I should have been more clear in earlier posts – I think we’re a very long way (multiple years) from any of this being merged and/or made default in major node implementation(s).

I’m looking for ideas and feedback on very early research regarding non-consensus P2P protocol extensions. If anything, this stuff might land in a JavaScript project over the next year or two.

Even if a more private sub-network were successfully deployed and live for several years, BCHN probably still should not merge a libp2p interface – other node distributions/implementations can bridge a web accessible sub-network. BCHN should probably not gamble on new networking dependencies.

Yes, exactly. :100:

With enough review, it’s probably reasonable for various node implementations to make lightweight improvements to transaction broadcast policies, but I’m arguing that competing with Tor is not a worthwhile goal.

This is very cool, thanks for sharing! Yeah, using libp2p hole punching vs. DDNS could make self-hosted home infrastructure a lot easier to set up for casual users. :rocket:

Definitely! That’s exactly what the Kadcast-NG paper researchers did (another architecture we should review). I need to familiarize myself more with libp2p’s Kademlia DHT, maybe the Kadcast approach is applicable? It’s possible that a libp2p subnet could even have faster block/TX relay times and/or lower bandwidth usage given a different architecture.
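For reference, the core of the Kadcast idea can be sketched on top of Kademlia's XOR metric: a node forwards to one peer in each bucket below its assigned height, and each recipient becomes responsible for that bucket's subtree. This is a simplified sketch, assuming every node knows every other node (real Kademlia routing tables are partial k-buckets) and omitting Kadcast's redundancy and forward error correction:

```python
import random

def bucket_index(a: int, b: int) -> int:
    """Kademlia bucket of peer b as seen from node a: the highest differing ID bit."""
    return (a ^ b).bit_length() - 1

def kadcast_like_broadcast(origin: int, nodes: list[int], id_bits: int,
                           rng: random.Random) -> dict[int, int]:
    """Simplified Kadcast-style broadcast; returns how often each node received the message."""
    received = {n: 0 for n in nodes}
    received[origin] = 1
    queue = [(origin, id_bits)]  # (node, height it is responsible for)
    while queue:
        node, height = queue.pop()
        for i in range(height):
            bucket = [p for p in nodes if p != node and bucket_index(node, p) == i]
            if bucket:
                peer = rng.choice(bucket)
                received[peer] += 1
                queue.append((peer, i))  # peer now covers its own buckets below i
    return received
```

Under these assumptions every node receives the message exactly once, because each recipient's lower buckets partition exactly the subtree it was handed.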


Thank you all for all the ideas, discussion, and feedback so far. Please keep them coming!

I’m most interested in feedback and P2P protocol ideas (even big ones) relevant to improving privacy in light wallets, especially those built with web protocols/standards, e.g. websites, Electron, Tauri, Expo, React Native, Capacitor, etc.

Just to set expectations: the next step for me on this topic is to develop a JS client (and maybe server/proxy/node). With my current backlog I’m afraid that’s going to take a while (probably not 2025), but I’ll post here whenever I have something new to share.

And of course if anyone else is inspired to work on P2P network ideas, please share what you build!


Good, as long as you understand something like this should not make it into the base protocol, I have nothing against it.

Carry on :heart:


Yes and the papers don’t make clear whether they are taking this into account when discounting diffusion-by-proxy or when suggesting that a new message type is needed. Dandelion seems to add a D_INV despite it not being mentioned in the paper and Clover adds PTX, but I’m not clear why it can’t be implemented with just INV and/or unsolicited TX messages.

Is it just performance (speed and byte efficiency) or is there a privacy gain?

The Dandelion++ paper has a concept of a static-per-epoch privacy graph (with no explanation of how they arrived at ~10 min epochs), and Clover drops that idea with no explanation. This is why I initially called Clover simpler, but now I’m wondering if it’s also less private.
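To make "static-per-epoch" concrete: within an epoch, a node keeps a fixed choice of stem relays and a fixed mapping from each inbound peer to one of them, and reshuffles when the epoch changes. A hypothetical sketch of that behaviour; the two-relay choice follows our reading of Dandelion++/BIP156, while the 600-second epoch, the seeding scheme, and the function names are assumptions (and `random.Random` is not a cryptographic RNG):

```python
import hashlib
import random

EPOCH_SECONDS = 600  # ~10 minutes, the epoch length mentioned above (assumption)

def epoch_of(unix_time: float) -> int:
    """Index of the epoch containing unix_time."""
    return int(unix_time // EPOCH_SECONDS)

def stem_routes(node_secret: bytes, epoch: int, outbound: list[str],
                inbound: list[str], n_relays: int = 2) -> tuple[str, dict[str, str]]:
    """Per-epoch stem routing sketch.

    Picks n_relays outbound peers as stem relays and maps every inbound peer to one
    of them. Seeding from (secret, epoch) keeps the mapping fixed for the epoch.
    Returns (relay for the node's own transactions, {inbound peer: relay}).
    """
    if not outbound:
        raise ValueError("need at least one outbound peer")
    seed = hashlib.sha256(node_secret + epoch.to_bytes(8, "big")).digest()
    rng = random.Random(seed)
    relays = rng.sample(sorted(outbound), k=min(n_relays, len(outbound)))
    routes = {peer: rng.choice(relays) for peer in sorted(inbound)}
    return relays[0], routes
```

Keeping routes fixed per epoch is what stops an observer from learning more by watching many transactions from the same peer take different paths; Clover's omission of this is what the question above is about.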

Clover’s simulated results look good but they are only simulating an “honest-but-curious” adversary with a first-spy estimator. What about a Byzantine adversary and/or maximum likelihood (ML) rumor source estimator?
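For contrast with a maximum likelihood estimator, the first-spy estimator is very simple: the adversary guesses that the source is whichever peer delivered the transaction to one of its nodes first. A minimal sketch; the observation format is an assumption:

```python
def first_spy_estimate(observations: list[tuple[str, float]]) -> str:
    """First-spy estimator: the earliest peer to relay the transaction to a spy is the guess.

    observations: (peer that relayed to a spy node, time the spy received it) pairs.
    """
    if not observations:
        raise ValueError("no observations")
    peer, _ = min(observations, key=lambda obs: obs[1])
    return peer
```

An ML estimator would instead score every candidate source by how likely the full set of observed timings is given the (estimated) graph and relay policy, which is why results against first-spy alone may overstate privacy.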

I’m sceptical of Clover. I would like to see what the Dandelion++ authors have to say about it.

Edit: Not sure about D_INV in Dandelion, I was looking at this repo. There is also a DANDELIONTX in this branch of the Bitcoin Core repo which is referenced in BIP156. It’s strange to me that the papers and BIPs talk about special behaviours when transactions are in stem phase but none mentions how a transaction is marked or assessed as being in that phase.


I’m not sure, and agreed – important details aren’t explicitly specified in the Clover paper, and I’m having trouble tracking down code to clarify what the paper actually tested.

Also, I’m not sure if any of the broadcast policy proposals (Clover, Dandelion, or Dandelion++) meaningfully improve deniability for nodes with few or no inbound peers (likely very common, especially for light wallets attempting to connect directly to the P2P network without hole punching solutions).

I reached out to researchers involved with several of the papers last week; I’ll post anything new I learn, or if any have comments they want to make public. In general, it seems like there’s much less interest today in this entire line of broadcast-policy-only privacy research compared to mixnet strategies, due to the clearer threat of timing/metadata analysis (given results like the above-mentioned 2019 study – “accuracy as high as 99%” against Tor).


I’m now leaning heavily toward focusing my own long-term efforts here on a web-accessible subnetwork, possibly even using a different network topology (aiming for efficiency and ease of integration, but only casual privacy) – maybe GossipSub, Waku, Kadcast, or Perigee + a well-supported path for anonymous transaction submission. Maybe some subnetwork nodes could advertise transaction submission endpoints that don’t require a connection handshake and make it easy to send each transaction out via one-time-use Tor circuit or other general purpose privacy protocols (maybe even standardize around onion services and have clients expect Tor’s reactive DoS protection too).
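One low-effort way to get "one-time-use Tor circuit per transaction" with a stock Tor daemon is stream isolation by SOCKS credentials: Tor's `IsolateSOCKSAuth` (on by default for `SocksPort`) keeps streams with different SOCKS username/password pairs on separate circuits. A minimal sketch, assuming a local Tor daemon on its default port 9050 and a client that understands `socks5h://` proxy URLs (e.g. curl, or requests with PySocks); the submission endpoint itself is hypothetical:

```python
import secrets

TOR_SOCKS = "127.0.0.1:9050"  # assumption: local Tor daemon on its default SocksPort

def one_time_tor_proxy(socks_addr: str = TOR_SOCKS) -> str:
    """Return a SOCKS5 proxy URL with fresh random credentials.

    With IsolateSOCKSAuth enabled, a new username/password pair asks Tor to use a
    circuit not shared with earlier submissions. 'socks5h' makes the proxy resolve
    hostnames, avoiding local DNS lookups.
    """
    user = secrets.token_hex(8)
    password = secrets.token_hex(8)
    return f"socks5h://{user}:{password}@{socks_addr}"
```

A wallet would call this once per transaction and use the result as the proxy for that single submission request.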

I’ve also long been interested in getting cross-vendor wallet support for SMP to enable multi-vendor, multi-device wallet setups; it’s possible that light wallet privacy improvements could be built on that too (e.g. PayJoin for SMP contacts, multi-hop broadcasts via contacts, etc.)


Companies like Apple build in privacy tools that 99% of people don’t use. Advanced Data Protection, Contact Key Verification, and Lockdown Mode all come to mind. Lockdown Mode is actually specifically built for a handful of people (in the hundreds), but is optional for every user. 99% of users never even hear of these powerful tools. But they exist. Why?
But let’s step away from privacy, let’s go to accessibility. Why do companies build accessibility tools for the >1% of the population that actually would use them? Answer that question for me, then explain why it should be different for BCH.

Simple. Some companies want their products to be desired by both the 1% and the 99%.

People who want privacy are also valuable customers who buy products.

Generally it shouldn’t, but BCH is a decentralized ecosystem with multiple moving parts (especially multiple node software implementations). So:

  1. Implementing such a complicated mechanism for BCH and making it mandatory would basically take 10-20 years, including sorting out all the bugs.

  2. It is highly probable that what we arrive at will still be inferior to TOR/I2P/VPN/Yggdrasil, because these technologies have decades of development behind them and their bugs have probably been sorted out.

  3. Other alternatives have massively bigger anonymity pools (crowd to hide in), so even if we make our tech as good as TOR/I2P/Yggdrasil, it will be still inferior because the crowd will be too small.

Unless you implement it but make the mechanism optional, which means only 1% will use it. But with only 1% using it, the anonymity pool (crowd size) will be too small, which creates actual danger for anybody using the tech, because they will be uncovered and snooped on by TLA agencies.

So in this scenario, using this tech will not provide privacy, only a false sense of security, which is super dangerous.

Basically, this tech not only will not achieve anything, it is actually dangerous to do, kind of.

I see this as a purely academic-level entertainment, nothing great can be achieved in this field, just use alternatives which are already battle-tested and proven.

CashFusion only works on BCH itself. Not CashTokens.

@bitjson, would these privacy features being discussed also apply to CashTokens to some extent? I’m not keenly familiar with the details of all being discussed, but if it does, that strikes me as another potentially massive use case.