I calculated some confidence intervals and did some statistical power analysis using the data on the Google Sheet and an R script I wrote here.

My conclusion: each phase that you want to measure should be run for 100 blocks or more. I know that means running each phase for more than half a day, but if you want reliable results then you have to increase the sample size substantially above what you have now.

I estimated confidence intervals using a nonparametric percentile bootstrap. Bootstrapped confidence intervals work well in cases of low sample size and data that is not normally distributed, like in our case here. I chose to display the 90% confidence interval since that seemed appropriate for our purposes.
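As a sketch of the percentile bootstrap procedure (this is not the actual R script, and the timing data below are made up for illustration), the method resamples the observed per-block times with replacement, computes the mean of each resample, and takes percentiles of the resulting distribution:

```python
import random
import statistics

def bootstrap_ci(data, n_boot=10_000, level=0.90, seed=1):
    """Nonparametric percentile bootstrap CI for the mean.

    Resample `data` with replacement n_boot times, compute the mean
    of each resample, and take the (alpha, 1 - alpha) percentiles of
    the sorted bootstrap means as the interval endpoints.
    """
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(data, k=len(data)))
        for _ in range(n_boot)
    )
    alpha = (1 - level) / 2
    lo = means[int(alpha * n_boot)]
    hi = means[int((1 - alpha) * n_boot) - 1]
    return lo, hi

# Made-up per-block processing times in seconds (illustrative only):
times = [31, 45, 52, 60, 75, 98, 106, 40, 55, 62]
lo, hi = bootstrap_ci(times)
print(f"90% CI for the mean: [{lo:.1f}, {hi:.1f}]")
```

Because it only resamples the observed data, this approach makes no normality assumption, which is why it is suited to small, skewed samples like ours.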

The units of the confidence intervals are seconds to process each block. The “Transactions per second” figures cannot be used directly, since there is no measurement of how long each individual transaction verification takes, and therefore no way to estimate its variability. I only had enough data to measure the fan-out and steady state 1 phases: fan-in had only two observations, which is too few, and steady state 2 was missing data in the sheet. The Fulcrum data sheet labels its units as “msec”, but from the discussion above it appears they are actually seconds.

Here are the confidence intervals:

| Processing Type | Block Type | Lower 90% C.I. | Upper 90% C.I. |
|---|---|---|---|
| bchn.0p | fan-out | 31 | 106 |
| bchn.0p | steady state 1 | 42 | 62 |
| bchn.90p | fan-out | 28 | 135 |
| bchn.90p | steady state 1 | 12 | 14 |
| fulcrum.0p | fan-out | 1785 | 2169 |
| fulcrum.0p | steady state 1 | NA | NA |
| fulcrum.90p | fan-out | 1579 | 1805 |
| fulcrum.90p | steady state 1 | 574 | 698 |

The largest confidence intervals are for the fan-out phases for BCHN (both 90p and 0p). They are very large and therefore need to be shrunk by increasing the sample size.

Through statistical power analysis we can get a sense of how many observations are needed to shrink confidence intervals to a certain size. To standardize and make the numbers comparable across different block processing procedures, we can express the width of these confidence intervals as a percentage of the mean of the quantity being measured.
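To sketch how a target width translates into a required sample size (this is a normal-approximation shortcut with illustrative numbers, not the exact calculation my R script performs), note that the width of a CI for the mean is roughly 2·z·sd/√n, so solving for n gives:

```python
import math

def n_for_ci_width(mean, sd, frac, level=0.90):
    """Smallest n so the CI's full width is below `frac` of the mean.

    Uses the normal approximation: width = 2 * z * sd / sqrt(n),
    so n must exceed (2 * z * sd / (frac * mean)) ** 2.
    """
    # Two-sided z critical values for common confidence levels
    z = {0.90: 1.645, 0.95: 1.960}[level]
    return math.ceil((2 * z * sd / (frac * mean)) ** 2)

# Illustrative numbers only (not taken from the sheet):
# mean processing time 60 s per block, sd 35 s
for frac in (0.10, 0.25, 0.50):
    print(f"width < {frac:.0%} of mean needs n = {n_for_ci_width(60, 35, frac)}")
```

The required n grows with the square of the inverse target width, which is why halving the interval roughly quadruples the number of blocks needed.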

Below is the estimated sample size to achieve a target width of confidence interval. I chose 10%, 25%, and 50% of the mean for comparison:

| Processing Type | Block Type | N for C.I. width < 10% of mean | < 25% | < 50% |
|---|---|---|---|---|
| bchn.0p | fan-out | 1447 | 234 | 60 |
| bchn.0p | steady state 1 | 93 | 17 | 6 |
| bchn.90p | fan-out | 2036 | 328 | 84 |
| bchn.90p | steady state 1 | 18 | 5 | 3 |
| fulcrum.0p | fan-out | 45 | 9 | 4 |
| fulcrum.0p | steady state 1 | NA | NA | NA |
| fulcrum.90p | fan-out | 22 | 6 | 3 |
| fulcrum.90p | steady state 1 | 25 | 6 | 3 |

The results show that we ought to be able to shrink the confidence interval to less than 50% of the mean for all block processing procedures if we use 100 blocks for each phase.

Let me know if I have misunderstood anything about the data.