Why Eliminating 48% of Memory Writes Only Gave Us 4% Speedup

Lessons from building a bit-parallel SIMD simulator for a RISC-V CPU

The Setup

We’re building a bit-parallel simulation engine for Verilator, the open-source Verilog simulator. The idea: instead of evaluating a circuit signal-by-signal, group thousands of independent 1-bit operations (AND, OR, XOR, NOT, MUX) into SIMD batches. One 64-bit word operation computes 64 independent logic…
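To make the bit-parallel idea concrete, here is a minimal sketch, not the engine's actual code. It assumes each `uint64_t` packs 64 unrelated 1-bit signals, one per bit position. The names `Lanes`, `and64`, `or64`, `xor64`, `not64`, and `mux64` are hypothetical and are not part of Verilator's API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type: one word holds 64 independent 1-bit signals,
// one signal per bit position ("lane").
using Lanes = std::uint64_t;

// Each function evaluates the same 1-bit gate in all 64 lanes at once.
constexpr Lanes and64(Lanes a, Lanes b) { return a & b; }
constexpr Lanes or64(Lanes a, Lanes b) { return a | b; }
constexpr Lanes xor64(Lanes a, Lanes b) { return a ^ b; }
constexpr Lanes not64(Lanes a) { return ~a; }

// Per-lane 2:1 MUX: where a sel bit is 1 take the bit from a, otherwise from b.
constexpr Lanes mux64(Lanes sel, Lanes a, Lanes b) {
    return (sel & a) | (~sel & b);
}

// Compile-time check: the upper 32 lanes select a, the lower 32 select b.
static_assert(mux64(0xFFFFFFFF00000000ULL,
                    0xF0F0F0F0F0F0F0F0ULL,
                    0xFF00FF00FF00FF00ULL) == 0xF0F0F0F0FF00FF00ULL,
              "mux64 picks per lane");
```

A single call such as `and64(a, b)` stands in for 64 separate 1-bit AND gates. Wider SIMD registers extend the same pattern to more lanes per instruction.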