Next Steps
We now have two implementations (of both Limits and BigInt) – BCHN (C++) and Libauth (JS/TS) – with extremely consistent results and relative performance differences in exactly the places we would expect based on complexity analysis (in each case, the spec takes the most conservative approach to assigning operation costs).
We learned a bit more by bouncing the spec between JS and C++. I’m fairly confident now that further changes to the spec will be minor, so implementation work in other node implementations won’t have to go through the same churn. This is also an opportunity to verify empirically that we’ve gotten everything right.
To “preregister” my planned approach to evaluating whether we got these CHIPs right: as we get performance benchmark results from each implementation, I’ll rank-order all benchmarks by their worst performance-to-opCost ratio, i.e. the ratio of the worst-performing implementation on each benchmark. The worst case should be within the same order of magnitude as worst-case 1-of-3 bare multisig (BMS), and ideally no benchmark should exceed 200% of the 1-of-3 BMS performance-to-opCost ratio.
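For clarity, here’s a minimal TypeScript sketch of that ranking. The result shape and field names are illustrative assumptions, not the actual output format of any implementation’s benchmark harness:

```typescript
// Hypothetical benchmark result shape; field names are illustrative only.
interface BenchmarkResult {
  benchmarkId: string;
  implementation: string;
  evaluationTimeMs: number; // measured wall-clock evaluation time
  opCost: number; // operation cost assigned by the spec's cost model
}

/**
 * For each benchmark, take the worst (highest) time-to-opCost ratio across
 * implementations, then rank benchmarks from worst to best relative to the
 * worst-case 1-of-3 bare multisig (BMS) baseline ratio.
 */
const rankByWorstRatio = (
  results: BenchmarkResult[],
  baselineRatio: number // worst-case 1-of-3 BMS time-to-opCost ratio
) => {
  const worstPerBenchmark = new Map<string, number>();
  for (const r of results) {
    const ratio = r.evaluationTimeMs / r.opCost;
    const current = worstPerBenchmark.get(r.benchmarkId) ?? 0;
    if (ratio > current) worstPerBenchmark.set(r.benchmarkId, ratio);
  }
  return [...worstPerBenchmark.entries()]
    .map(([benchmarkId, ratio]) => ({
      benchmarkId,
      ratio,
      // Should stay within the same order of magnitude; ideally <= 2 (200%).
      relativeToBaseline: ratio / baselineRatio,
    }))
    .sort((a, b) => b.ratio - a.ratio);
};
```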
Given the results from Libauth (shared last month) and Calin’s BCHN results from last week, it seems this will hold empirically, but I’ll wait to get complete results (and any more spec feedback) from all other implementations before claiming we’re done.
So: We need help getting patches and then benchmark results for all other node implementations. I’m reaching out to all the teams today, and I’m happy to hop on calls to help teams ramp up quickly or answer any questions. This is also the perfect time to get wider feedback on whether any spec details can be further simplified to make implementation easier in other languages (JS and C++ gave us a pretty wide view already, but another round of implementation feedback should give us more confidence that we’re not leaving any more protocol simplifications on the table).
In the meantime, I’ll revise all the benchmarks again for this latest iteration (hopefully one last time) and make some pretty graphs that I can keep updated as we get results from other implementations. I’m also hoping to knock out the stretch goal of expanding our new test suite to incorporate the old script_tests.json such that it can be fully deprecated.
When we’re done here, we should have both an expanded set of functional test vectors and an extremely comprehensive set of benchmarking test vectors that can be used to reveal any performance deficiencies in any VM implementation.