Assessing the scaling performance of several categories of BCH network software

Progress update:

I’ve got blocks going from the test case/block emitter to the BCHN node (and technically to a Verde node as well, but that was just to make debugging easier for us). I also have code to generate test case blocks/chains, with an ASIC running against it. The next step is having the test case generator create something meaningful and then having Fulcrum consume it via BCHN.

The decision we landed on was to fork BCH mainnet after block 144 as the start of each test scenario, followed by 100+ blocks to create spendable coinbase UTXOs, followed by the N+ test scenario blocks. We were originally going to use testnet4 as the base, but I forgot that testnet4 uses the special difficulty rules; using those rules is not helpful for creating the test blocks (since it’s a private/ephemeral blockchain) and could technically cause a different codepath to be executed within nodes/wallets/etc.

We’re hoping to have a report ready by the end of the week.

Using mainnet is interesting, but please check whether the wallets you want to test use checkpoints, as Flowee Pay does. If they do, then you’d be better off using testnet.

Weekly update: We currently have multiple specially designed ~256MB blocks created and transmitted from the block/scenario emitter to a BCHN node (we also tested node-to-node propagation). These blocks consist of UTXO fan-out and UTXO fan-in scenarios, as well as steady-state blocks. Currently BCHN is processing about 5s per MB for these large (and completely unseen, i.e. worst-case scenario) blocks with default settings running on a modern laptop (Bitcoin Verde is processing about 10s per MB). We’ll publish the data in the report we’re preparing for next week. This report will focus on the Fulcrum/EC wallet endpoint, but will include the performance of the BCHN node(s) as well. We’ll also include data from the block/scenario emitter to account for any lag introduced by the test scenario itself (we anticipate any such lag to be negligible, based on what we’ve seen in testing so far).

Creating the test scenarios is taking a bit longer than we had planned/hoped, and we’re about a week behind our desired progress. We’re expecting to publish our first formal report by the end of next week.

Some do, some don’t. The EC wallet does use checkpoints, but they’re trivial to change. The good news is that if we need to test a wallet whose checkpoints aren’t easy to change, we have the facilities to recreate the test blocks forked from a different point (e.g. testnet or whatever).

Any updates on this? I’m really eager to read it 🙂

We’ve compiled our first report! We’ve learned a lot during this process and we hope to have better (and faster) reports coming in the future. Please review it and give us your thoughts on our findings, and let us know if you see any flaws in our methods that we can improve upon next time.

Our raw data may be found here: BCHN Research - Google Drive

Thanks a lot for the numbers!

While BCHN performance seemed within expectations, Fulcrum looks like it’s struggling with steady-state and fan-ins. I wonder if that has anything to do with how Fulcrum handles its DB…

Also: in “Steady-state 2”, which consists of blocks in the hundreds-of-kB range, Fulcrum was taking excessive time to process those as well, which seems particularly odd, as we know Fulcrum can handle blocks of that size in the wild. The numbers are perhaps worth double-checking?

Reading the report I (happily) assume the numbers above are inverted. It should be 5 MB per second, right?

Were Fulcrum and BCHN communicating via ZeroMQ?

I’m not sure how Fulcrum communicates with BCHN, to be honest. I think it polls for new data via RPC, although I’m not sure. This is definitely a question for @cculianu.

Calin is looking into it; in fact, we gave him (and the project, located in the Google Drive directory) instructions for how to replicate the build/results. His theory is that there’s a bug in Fulcrum related to message sizes getting too large. That makes sense to me, although it’s weird that we didn’t see the same behavior when running the 90p tests.

My original theory was that BCHN was holding the global lock and starving Fulcrum of RPC responses, but that too is a flawed theory, because after BCHN finished processing its blocks it still took Fulcrum hours to complete.

The only thing I know for sure is that we were consistently reproducing the behavior with the 0p tests. We ran it at least 4 times.

I’m pretty sure there is something dumb happening in Fulcrum’s HTTP-RPC client causing a slowdown/bottleneck here. I saw something similar happen with ScaleNet in some cases. I will investigate and fix it. I don’t think this problem is fundamental to Fulcrum’s design (but even if it were, any bottlenecks can be addressed and fixed).

@Jonas To answer your question, Fulcrum doesn’t use ZMQ for anything more than brief notifications (such as to wake up and download a new block when it is available). It uses bitcoind’s HTTP-RPC server to download blocks, etc.
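
For anyone curious what that looks like at the protocol level, it’s ordinary JSON-RPC over HTTP. Here’s a rough illustration (Python purely for brevity; the URL, credentials, and the particular getblock call are example values, not a description of Fulcrum’s internals, which are C++):

```python
# Minimal sketch of fetching a raw block over bitcoind's HTTP JSON-RPC interface.
# URL, port, and credentials are placeholders; adjust them to match your bitcoind.conf.
import base64
import json
import urllib.request

RPC_URL = "http://127.0.0.1:8332/"
RPC_AUTH = base64.b64encode(b"rpcuser:rpcpassword").decode()

def rpc_call(method, params):
    payload = json.dumps({"jsonrpc": "1.0", "id": "demo", "method": method, "params": params})
    req = urllib.request.Request(
        RPC_URL,
        data=payload.encode(),
        headers={"Content-Type": "text/plain", "Authorization": "Basic " + RPC_AUTH},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]

tip_hash = rpc_call("getbestblockhash", [])
raw_hex = rpc_call("getblock", [tip_hash, 0])  # verbosity 0 -> raw block as hex
print("tip block is", len(raw_hex) // 2, "bytes")
```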

Thank you for working on these assessments. One note, though, on SLPDB and BitDB: I think those have proven to be abandonware/unmaintained, with really bad performance and very serious issues. I would suggest replacing them with:

There is a comment in Fulcrum’s docs about passing a ZMQ endpoint config parameter to the (BCHN) node to speed things up, but it seems optional and I haven’t tried to measure the difference yet.

Thank you very much Josh & Verde team.

I’ve got the basic test built and running based only on the data you supplied (plus a download of the latest Fulcrum binary release).

As I was running on Debian 11, I ran into a couple of minor issues. As I resolved them, I’ll put some notes here in case others face similar bumps:

  1. Stock Debian Gradle seems too old, so definitely download a recent Gradle package from Gradle’s site; otherwise it will fail to parse the build.gradle and error on archiveVersion and archiveBaseName (possibly others too, but I only got that far before deciding it must be due to an inadequate Gradle on my box).

  2. The run script tries to call gradlew, so one needs to run gradle wrapper in the bch-scaling base folder in order to generate that wrapper there.

Further, there are some points that are still unclear to me, but let me note how I proceeded:

  1. Fulcrum’s docs say it requires indexing on the node side to be enabled, I think? (I still need to verify whether it works without it, but I added txindex=1 to my bitcoind.conf; see the sketch below.)
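
For reference, the lines I’m talking about would look roughly like this in bitcoind.conf (the ZMQ line is the optional speed-up mentioned above; the endpoint here is just an example value I chose):

```
# bitcoind.conf (example values)
txindex=1                                # Fulcrum's docs say the node's tx index must be enabled
zmqpubhashblock=tcp://127.0.0.1:28332    # optional: new-block notifications over ZMQ (the speed-up mentioned above)
```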

@freetrader: Hey, thanks for trying to get it running yourself! Debian is my go-to OS, so I’m pretty familiar with the problems you can encounter, which is good. The gradle wrapper was committed to the repo, so you should have been able to run ./gradlew makeJar copyDependencies and have it “just work” (the wrapper should solve all of the problems you encountered with Debian, since it will download the version of Gradle it needs).

Ideally, though, the intent was to build by running the ./scripts/make.sh script (from the project directory), since it will take care of structuring the out directory for you. Did either of these steps not work? I ran this just now on a Debian 11 VM with OpenJDK 11.0.15 and it worked without anything special, so hopefully that’s the same for you.
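
For reference, both build paths boil down to the following, run from the project directory:

```
./gradlew makeJar copyDependencies   # uses the committed gradle wrapper
./scripts/make.sh                    # or this, which also lays out the out directory for you
```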

This week we implemented a change to the test block emitter to enable the transfer of blocks and transactions via the P2P protocol. We’ve also re-run the tests via the P2P protocol (instead of via RPC). We have the raw data uploaded but haven’t finished compiling the results; I expect we’ll have this done before the end of the day on Monday. These tests were run with the node configured with txindex=1 and ZMQ enabled; it will be interesting to see whether this has any significant performance effect on the node and Fulcrum.

Additionally, we’ve started adding new blocks to the test framework to model cash transactions: 2 inputs -> 2 outputs. These blocks are appended to the current framework and should be made available this coming week.

The one benefit (from a testing perspective) of using RPC was that it was easier to measure when BCHN finished, since the RPC call blocked until the block was accepted. We can still measure how long BCHN took, we just have to do it slightly differently; that’s not a problem, but it did take us longer than we expected.
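
For the curious, here is a rough sketch of one way to pull per-block timing out of the node’s log afterwards. It assumes BCHN writes the usual UpdateTip lines with the default ISO-8601 timestamps; it’s an illustration of the approach, not necessarily the exact script we’ll end up using:

```python
# Extract per-block acceptance times from a BCHN debug.log by reading UpdateTip
# lines (one is written each time a new block becomes the chain tip).
# Assumes the default ISO-8601 log timestamps; adjust the parsing if yours differ.
import re
from datetime import datetime

UPDATE_TIP = re.compile(r"^(\S+) UpdateTip: new best=(\S+) height=(\d+)")

def tip_times(log_path="debug.log"):
    times = {}
    with open(log_path) as f:
        for line in f:
            m = UPDATE_TIP.match(line)
            if m:
                ts, _block_hash, height = m.groups()
                times[int(height)] = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return times

times = tip_times()
for height in sorted(times):
    prev = times.get(height - 1)
    if prev:
        # Time since the previous tip: a rough proxy for receive+process time
        # when the emitter feeds blocks back-to-back.
        print(height, (times[height] - prev).total_seconds(), "s since previous tip")
```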

A preliminary look at the results is …interesting. It looks like it takes twice as much time for BCHN to process a block compared to last time. I suspect this has little to do with P2P and more to do with txindex being enabled. I’m going to run the tests again tonight with P2P and txindex disabled so we can better compare apples to apples.

I tried to run the suite to reproduce the results, and these are my personal findings:

  1. Java is my first language, so I have taken a (medium-depth) look at the code. It is solid and clean, even if a little verbose, and easily extensible after an hour of getting comfortable. Still, Python would probably be more palatable to more people.
  2. The reproduction instructions are good as far as they go, but incomplete. I’m happy building my own software, but there was guesswork involved in following the instructions.
  3. I could not reproduce the Fulcrum processing and don’t know where to get the logs; I can confirm there is a noticeable slowdown after the fan-out phase.
  4. The results need to be composed by hand. Four different .csv files are generated and need to be assembled manually to get the results, which is lengthy and error-prone. A Python script would do wonders here (see the sketch after this list).
  5. Bug: the bchn_start_csv.sh script gives me the start times for blocks 245-260, instead of 245-266.
  6. Biggest concern: the test sample size is too small. The variance is all over the place with so few samples. A bigger sample size would easily offset the DB flushes, or the flushes could be removed from the timing.
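
To make point 4 concrete, here is the sketch referenced above of the kind of merge script I mean. The CSV filenames and the “height” join column are placeholders, since I don’t have the exact headers in front of me:

```python
# Sketch: merge the per-run CSVs into one table keyed by block height.
# Filenames and the "height" column name are hypothetical placeholders.
import csv

FILES = ["bchn_start.csv", "bchn_end.csv", "fulcrum_start.csv", "fulcrum_end.csv"]

merged = {}
for path in FILES:
    prefix = path.rsplit(".", 1)[0]
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            key = row["height"]  # join on block height
            merged.setdefault(key, {"height": key}).update(
                {f"{prefix}_{k}": v for k, v in row.items() if k != "height"}
            )

# Write one combined CSV with the height column first, the rest alphabetical.
fields = sorted({k for row in merged.values() for k in row}, key=lambda k: (k != "height", k))
with open("merged.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for key in sorted(merged, key=int):
        writer.writerow(merged[key])
```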

We’ve compiled the reruns of the existing test framework to explore the hypothesis that the RPC and P2P code paths behave differently (and, more specifically, that P2P would perform better). In short, it looks like the results are within run-to-run variance (in particular, the large spikes, likely caused by DB flushing, inflate the averages). The formatted results for each finding are below:

RPC Results: RPC Results - Google Sheets

P2P Results (No Tx Indexing): P2P Results - No TxIndex - Google Sheets

P2P Results (Tx Indexing):
