Assessing the scaling performance of several categories of BCH network software

Announcement:

Bitcoin Verde to begin assessing the scaling performance of several categories of BCH network software.

Summary:

The Bitcoin Verde and Bitcoin Cash Node development teams have created a plan to test several categories of software related to Bitcoin Cash in order to assess each piece of software’s ability to handle increased block sizes.

With funding from BCHN, Bitcoin Verde will produce a standardized report for each piece of software tested, in the context of processing blocks up to 256MB in size. Once complete, each report will be published to bitcoincashresearch.org to inform the general community of our findings, as well as to aid in further development discussions.

Evaluation Plan:

Starting this month, the Bitcoin Verde team will begin testing various pieces of software used to support the Bitcoin Cash network. These tests will be used to help our community assess the network’s ability to handle an increase to the maximum block size from 32MB to 256MB. During this assessment, the Bitcoin Verde team will be responsible for creating a prioritized list of relevant software to be tested and producing standardized reports of our findings.

In carrying out these software evaluations, the Bitcoin Verde team will produce and publish detailed reports containing the findings of each assessment. Although individual evaluations may result in a wide range of metrics, each report will provide the information needed to determine whether the software tested is ready for scaling.

In order to begin our evaluations, Bitcoin Verde will first create and set up a scalenet and choose or create repeatable tests of the software. The scalenet will be custom and will not be published to either the live or the test networks; it is a contrived intranet designed to test various edge cases of network performance. The scalenet is static and does not grow organically; the blocks used to test the network and software will be publicly available. This design allows for repeatable testing, ensuring that an “apples-to-apples” test is always available when changes or updates are made. Once set up, evaluations will be conducted over the course of several weeks, with the intention of producing and publishing one detailed report per week.

Although a number of software categories have been specifically chosen for testing in preparation for these evaluations, not all choices are set in stone, and they may change over the course of our evaluations. Individual software evaluations will likely require different metrics in order to objectively evaluate their performance. The Bitcoin Verde team will provide relevant performance-related metrics for each piece of tested software within our reports.

Every endpoint tested will have its own performance metrics (e.g. mining pool software will not be measured with the same metrics as wallet software), but generally, metrics under consideration will include:

  • Initial synchronization
  • Connectivity/latency during heavy network load
  • Latency after reorganization
  • Latency after disconnection (network catchup)
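
As a rough illustration, the metrics listed above could be captured in a per-run record like the one below. This is only a sketch: the `RunMetrics` type and all of its field names are hypothetical and are not taken from the Bitcoin Verde reports.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class RunMetrics:
    """One test run of one piece of software (hypothetical layout)."""
    software: str                                        # e.g. "Fulcrum"
    scenario: str                                        # e.g. "fan-out"
    initial_sync_s: Optional[float] = None               # initial synchronization
    latency_under_load_s: Optional[float] = None         # latency during heavy load
    latency_after_reorg_s: Optional[float] = None        # latency after reorganization
    catchup_after_disconnect_s: Optional[float] = None   # network catchup

    def recorded(self) -> dict:
        """Return only the fields that were actually measured for this run."""
        return {k: v for k, v in asdict(self).items() if v is not None}


run = RunMetrics(software="Fulcrum", scenario="fan-out", initial_sync_s=12.5)
print(run.recorded())
```

Leaving unmeasured fields as `None` reflects the point made above that not every metric applies to every category of software.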

In addition to conducting and reporting individual software evaluations, the Bitcoin Verde team will simultaneously document each test performed in detail. Documentation of tests performed will be included in the weekly published reports. It is our intention to document each test in such detail that any mildly technical person reading the report is able to reproduce the results from publicly available components, proprietary software excluded.

Evaluated software that is proprietary in nature, or whose results otherwise cannot be publicly reproduced, will be noted with BCHN.

Several categories of software have been chosen for evaluation of their “performance at scale”. Evaluations of software performance may include, but are not limited to:

Mining pools

  • Open source
  • Closed source (where possible)

Common wallet backends

  • Fulcrum
  • Blockbook

Indexers

  • bitDB
  • slpDB

Software Evaluations

In order to easily bootstrap a chain for testing purposes, as well as to allow space for early BIP/hard-fork activations, testnet4 has been identified as having an appropriate base configuration that could be leveraged to avoid any initial hiccups.

Since testnet4 uses simple, round numbers for the activation heights, and the blocks are small, copying the first 5000 blocks allows the scaling tester to start from a clean, repeatable state that should avoid any initial problem with generating test transactions. This also minimizes the time spent configuring chain parameters in BCHN, while making room for any modifications necessary to complete the tests.

Following this initial set of bootstrapping blocks, the scaling tester will contain code to generate additional blocks that create the funds to be used within tests. All following hard-forks will also be handled within these generated blocks. This combined set of 5000 testnet4 blocks and pre-mined setup blocks will then serve as a base chain for any tests to follow.

Tests will then be additional sets of transactions and blocks (along with their associated relay times) that can be replayed onto a test node (or set of nodes) that make up the private test net defined by these chain parameters. After a test is run, additional tests (or re-tests) will require resetting the node to the base chain height (5000 + X setup blocks) and then broadcasting the transactions and blocks from the test as defined.
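
The reset-and-replay cycle described above can be sketched as follows. This is an illustration only: the `Event` type, its field names, and both helper functions are hypothetical, and the height calculation simply follows the "5000 + X setup blocks" wording of the post.

```python
from dataclasses import dataclass
from typing import List

BOOTSTRAP_BLOCKS = 5000  # testnet4 blocks copied as the starting point (from the post)


@dataclass(frozen=True)
class Event:
    """A transaction or block to replay, with its relay time (hypothetical type)."""
    relay_time_s: float  # offset from the start of the test, in seconds
    kind: str            # "tx" or "block"
    payload_id: str      # identifier of the stored transaction or block


def base_chain_height(setup_blocks: int) -> int:
    """Height the node is reset to before each test or re-test."""
    return BOOTSTRAP_BLOCKS + setup_blocks


def replay_order(events: List[Event]) -> List[Event]:
    """Order events by relay time so that every run broadcasts them identically."""
    return sorted(events, key=lambda e: e.relay_time_s)


events = [Event(2.0, "block", "b1"), Event(0.5, "tx", "t1"), Event(1.0, "tx", "t2")]
print(base_chain_height(150))                       # reset target for X = 150
print([e.payload_id for e in replay_order(events)])
```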

Testing Architecture

The current test architecture involves a test scenario generator, which will predetermine a series of blocks of various sizes and complexity. These blocks will build off of the base chain described above.

Static test scenarios are then emitted to a node that has been segregated from the rest of the network (and its checkpoints disabled, as applicable) to ensure uptake of the static scenario. The software under test will be connected to the node during the period where the test data is provided to it, and various performance metrics will be gathered for that test scenario. Once the test is complete, the node is reset and a new test scenario can be run from scratch.

During evaluation, multiple tests of the same scenario may be performed in order to ensure replicability as well as to measure any deviations in performance.
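
One simple way to quantify deviations across repeated runs of the same scenario is the mean and sample standard deviation of the measured durations. The helper below is a minimal sketch; its name, its return format, and the sample numbers are illustrative assumptions, not measured results or part of the published tooling.

```python
import statistics
from typing import Dict, List


def summarize_runs(durations_s: List[float]) -> Dict[str, float]:
    """Summarize repeated runs of one scenario (durations in seconds)."""
    mean = statistics.mean(durations_s)
    # The sample standard deviation needs at least two runs.
    stdev = statistics.stdev(durations_s) if len(durations_s) > 1 else 0.0
    return {
        "runs": len(durations_s),
        "mean_s": mean,
        "stdev_s": stdev,
        # Spread relative to the mean, as a percentage.
        "spread_pct": 100.0 * stdev / mean if mean else 0.0,
    }


# Made-up durations, purely to show the call.
print(summarize_runs([10.0, 12.0, 14.0]))
```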

Status

The Bitcoin Verde team is currently setting up the scalenet needed to conduct our evaluations.

To begin our evaluations, the Bitcoin Verde team intends to first assess the common wallet backend, Fulcrum. The current step-by-step guide for evaluation is as follows:

  1. Research existing Testnet generation / testing tools
  2. Generate first iteration of private Testnet
  3. Ensure BCHN can accept private Testnet (in isolation)
  4. Attempt to connect Fulcrum to BCHN Testnet
  5. Identify high-value Testnet scenarios
  6. Rerun Fulcrum tests with the multiple Testnet scenarios
  7. Determine (and implement, if necessary) automated capture of evaluation metrics.
  8. Compile and publish results

Following the success of this evaluation, the Bitcoin Verde team will move on to connecting other Bitcoin Cash tools to the test net, and repeating the process.

Final Word

We are extremely excited to begin our evaluations and are appreciative of the BCHN team and their assistance in setting this project up. This evaluation effort is set to span several months (~5); please be sure to check back as additional evaluations are completed.

“These tests will be used to help our community assess the network’s ability to handle an increase to the maximum block size from 32MB to 256MB.”

With the average block size being so low, I would hope the community would focus more on adoption and refrain from making any changes that would risk damaging confidence in the currency.

That said, it sounds like a good test and I’m excited to see the results.

Fantastic!

Do you plan to verify performance metrics for node implementations too? I assume most node implementations have benchmarks already, but it would be nice to have some resource that verifies/reports the latest numbers for each implementation.

Also, if you’re interested, I’d love to help however I can with testing Chaingraph (an indexer). There are also some results here from informal testing in October 2021.

Great question. Although we don’t have any specific node performance tests coming up, our assumption is that through evaluating different tools and categories of software we will inherently be collecting and verifying a lot of those metrics anyway.

And thanks, I don’t think we are opposed to that offer! Our first thought is whether it is being used in production by any services at the moment, as we’re trying to prioritize popular services. Also, it should be mentioned: all of the test cases that we’re creating are going to be open source, so if we don’t end up having time, you can always run it once the tests are published.

Cool stuff, such an evaluation is very useful to have done.
Would love to know where everyone is on this :) I did some in the past, but only on Core and on Flowee the Hub. Those stats are getting stale, though (Jan 2021). I might redo them. See: Scaling progress report.

Great place to start. Just one nitpick: there is exactly one wallet that depends on the protocol it speaks, so calling it a “common wallet backend” may cause confusion.

And the reason for my writing here today, a question:

Would you make the blockchain (the data, not the code) you are going to test against available for download somewhere, so we can do this scientifically and other parties (well, mostly other hardware) can try to reproduce your results?

Thank you!

P.S. If you want any support from Flowee, please just ask. For instance here, or on Telegram: Contact @Flowee_org.

Nitpicking on your nitpick: Is it “exactly one wallet” though? The Electrum protocol is an open protocol for SPV wallets and electrum-cash - npm is another piece of software, not related to Electron Cash at all, that uses the same backend.

We absolutely are! In fact, as soon as we publish our first report, the data we used to stress test it will be published alongside it.

Progress update:

I’ve got blocks going from the test case/block emitter to the BCHN node (and technically a Verde node too, but that was to make debugging easier for us). I also have code to generate test case blocks/chains, with an ASIC ran/running against it. The next step is having the test case generator create something meaningful and then having Fulcrum consume it via BCHN.

The decision we landed on was creating a fork of BCH mainnet after block 144 to be the start of each test scenario, then 100+ blocks for making spendable coinbase UTXOs, then N+ test scenario blocks. We were originally going to use testnet4 as the base, but I forgot that testnet4 uses the special difficulty rules, and using those rules is not helpful for creating the test blocks (since it’s a private/ephemeral blockchain) and could technically cause a different code path to be executed within nodes/wallets/etc.
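
The resulting chain layout can be sketched as below. This is an illustration, not the generator's actual code: the function name and return format are hypothetical, and it assumes that "after block 144" means block 144 is the last block shared with mainnet. The 100-block figure matches the standard coinbase maturity rule, under which a coinbase output can only be spent once it has 100 confirmations.

```python
from typing import Dict, Tuple

FORK_HEIGHT = 144        # last block shared with BCH mainnet (assumed reading of the post)
COINBASE_MATURITY = 100  # confirmations required before a coinbase output is spendable


def scenario_layout(maturity_blocks: int, scenario_blocks: int) -> Dict[str, Tuple[int, int]]:
    """Return inclusive height ranges for each segment of the test chain."""
    if maturity_blocks < COINBASE_MATURITY:
        raise ValueError("need at least 100 blocks before any coinbase is spendable")
    first_setup = FORK_HEIGHT + 1
    last_setup = FORK_HEIGHT + maturity_blocks
    return {
        "mainnet": (0, FORK_HEIGHT),
        "setup": (first_setup, last_setup),
        "scenario": (last_setup + 1, last_setup + scenario_blocks),
    }


print(scenario_layout(maturity_blocks=100, scenario_blocks=10))
```

With exactly 100 setup blocks, only the coinbase of the first setup block is mature when the first scenario block is mined, which is presumably why the post says "100+".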

We’re hoping to have a report ready by the end of the week.

Using mainnet is interesting, but please check if the wallets you want to test use checkpoints, which Flowee Pay is doing. If they are, then you’d be better off using testnet.

Weekly update: We currently have multiple specially designed ~256MB blocks created and transmitted from the block/scenario emitter to a BCHN node (we also tested node-to-node propagation). These blocks consist of UTXO fan-out and UTXO fan-in scenarios, as well as steady-state blocks. Currently BCHN is taking about 5s per MB to process these large (and completely unseen, i.e. worst-case scenario) blocks with default settings running on a modern laptop (Bitcoin Verde is taking about 10s per MB). We’ll publish the data in the report we’re preparing for next week. This report will be focused on the Fulcrum/EC wallet endpoint, but will include the performance of the BCHN node(s) as well. We’ll also include the data of the block/scenario emitter to account for any lag introduced by the test scenario itself (we anticipate any lag here to be negligible, from what we’ve seen in current testing).
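
To put the quoted rates in per-block terms, the arithmetic can be worked through as below. The sketch takes the figures exactly as written (seconds per MB) and, purely for comparison, also shows what the same number would mean if read as MB per second; the function names are illustrative.

```python
BLOCK_SIZE_MB = 256  # size of the test blocks described above


def time_at_seconds_per_mb(size_mb: float, seconds_per_mb: float) -> float:
    """Total processing time when the rate is expressed in seconds per MB."""
    return size_mb * seconds_per_mb


def time_at_mb_per_second(size_mb: float, mb_per_second: float) -> float:
    """Total processing time when the rate is expressed in MB per second."""
    return size_mb / mb_per_second


bchn_s = time_at_seconds_per_mb(BLOCK_SIZE_MB, 5)    # 1280 s, roughly 21 minutes
verde_s = time_at_seconds_per_mb(BLOCK_SIZE_MB, 10)  # 2560 s, roughly 43 minutes
alt_s = time_at_mb_per_second(BLOCK_SIZE_MB, 5)      # 51.2 s under the reciprocal reading

print(bchn_s, verde_s, alt_s)
```

The two readings differ by a factor of 25 for a 5:1 rate, so the unit matters a great deal when interpreting these numbers.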

Creating the test scenarios is taking a bit longer than we had planned/hoped, and we’re about a week behind our desired progress. We’re expecting to publish our first formal report by the end of next week.

Some do, some don’t. The EC wallet does use checkpoints, but they’re trivial to change. The good news is that if we need to test a wallet where its checkpoints aren’t easy to change then we have the facilities to recreate the test blocks forked from a different point (i.e. testnet or whatever).

Any updates on this? I’m really eager to read it :)

We’ve compiled our first report! We’ve learned a lot during this process and we hope to have better (and faster) reports coming in the future. Please review and give us your thoughts on our findings and if you see any flaws in our methods that we can improve upon next time.

Our raw data may be found here: BCHN Research - Google Drive

Thanks a lot for the numbers!

While BCHN performance seemed within expectations, Fulcrum looks like it’s struggling with steady-state and fan-ins. I wonder if that has anything to do with how Fulcrum handles its DB…

Also: in “Steady-state 2”, which consists of blocks of hundreds of kB, Fulcrum was taking excessive time to process those as well - which seems particularly odd, as we know Fulcrum can handle those in the wild. The numbers are perhaps worth double-checking?

Reading the report I (happily) assume the numbers above are inverted. It should be 5 MB per second, right?

Was Fulcrum/BCHN communicating via ZeroMQ?

I’m not sure how Fulcrum communicates with BCHN, to be honest. I think it polls for new stuff via RPC, although I’m not sure. This is definitely a question for @cculianu.

Calin is looking into it; in fact we gave him (and the project, located in the Google Drive directory) instructions for how to replicate the build/results. His theory is that there is a bug in Fulcrum related to message sizes getting too large. That makes sense to me, although it’s weird we didn’t see the same behavior when running the 90p tests.

My original theory was that BCHN was holding the global lock and starving Fulcrum of RPC responses, but again, that too is a flawed theory, because after BCHN finished processing its blocks it still took hours for Fulcrum to complete.

The only thing I know for sure is that we were consistently reproducing the behavior with the 0p tests. We ran it at least 4 times.

I’m pretty sure there is something dumb happening in Fulcrum’s HTTP-RPC client causing a slowdown/bottleneck here. I saw something similar happen with ScaleNet in some cases. I will investigate and fix. I don’t think this problem is fundamental to Fulcrum’s design. (But even if it were, any bottlenecks can be addressed and fixed.)

@Jonas To answer your question, Fulcrum doesn’t use ZMQ for anything more than brief notifications (such as to wake up and download a new block when it is available). It uses bitcoind’s HTTP-RPC server to download blocks, etc.
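
For readers unfamiliar with that interface, the sketch below shows the general shape of the JSON-RPC requests a client would send to bitcoind’s HTTP-RPC server to fetch a block. It is not Fulcrum’s actual code: the helper names, credentials, and request IDs are made up for illustration, and only the request bodies are built here (nothing is sent over the network). `getblockhash` and `getblock` are standard bitcoind RPC methods.

```python
import base64
import json
from typing import Any, Dict, List


def rpc_request(method: str, params: List[Any], req_id: int = 0) -> bytes:
    """Build the JSON body of a single bitcoind-style JSON-RPC call."""
    body = {"jsonrpc": "1.0", "id": req_id, "method": method, "params": params}
    return json.dumps(body).encode("utf-8")


def basic_auth_headers(user: str, password: str) -> Dict[str, str]:
    """HTTP headers for Basic authentication against the RPC server."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}", "Content-Type": "application/json"}


# Step 1: ask for the hash of the block at a given height.
print(rpc_request("getblockhash", [145], req_id=1))

# Step 2: fetch that block by hash; verbosity 0 asks for the raw hex-encoded
# block. The hash below is a placeholder, not a real block hash.
print(rpc_request("getblock", ["<block hash from step 1>", 0], req_id=2))
```

Because the whole serialized block comes back inside a single HTTP response, response sizes grow with block size.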
