CPU scaling benchmark

workers: 4 + 1 main
iters total: 500M (100000000/stream)
elapsed: 1340.92 ms
total CPU used: 5226.58 ms
speedup: 3.9× vs serial
efficiency: 78% of 5× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)         0       7.9         7.92    1172.23  1164.31             0
1            2.118      2.13        14.51     889.81    875.3          0.12
2            1.613      3.77        15.92     914.63   898.71          0.19
3            2.592      6.38        17.23    1016.47   999.24          0.21
4            1.501       7.9        48.75    1337.77  1289.02        165.65

[timeline chart: one lane per stream (main, w1, w2, w3, w4); legend: fork+handshake, CPU work, parent reap wait]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1), where the +1 counts the main process as a stream. 100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.
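
As a cross-check, the two formulas can be applied to the per-stream "work ms" values from the table. This is a minimal sketch, not the benchmark's own code: the names work_ms, elapsed_ms, and scaling_metrics are hypothetical and chosen only for illustration, and the numbers are copied by hand from the results.

```python
# Per-stream CPU work times in ms, copied from the "work ms" column.
# The dictionary and function names here are illustrative assumptions.
work_ms = {
    "main": 1164.31,
    "w1": 875.30,
    "w2": 898.71,
    "w3": 999.24,
    "w4": 1289.02,
}
elapsed_ms = 1340.92  # wall-clock time for the whole run
workers = 4           # forked workers; the main process is the fifth stream


def scaling_metrics(work_ms, elapsed_ms, workers):
    """Return (total CPU ms, speedup, efficiency) using the formulas in the text."""
    total_cpu_ms = sum(work_ms.values())
    speedup = total_cpu_ms / elapsed_ms      # sum(stream CPU time) / elapsed
    efficiency = speedup / (workers + 1)     # ideal is one full core per stream
    return total_cpu_ms, speedup, efficiency


total_cpu_ms, speedup, efficiency = scaling_metrics(work_ms, elapsed_ms, workers)
print(f"total CPU used: {total_cpu_ms:.2f} ms")  # 5226.58 ms
print(f"speedup: {speedup:.1f}x")                # 3.9x
print(f"efficiency: {efficiency:.0%}")           # 78%
```

The results match the reported summary, which confirms that "total CPU used" is the sum of the "work ms" column and that the 5× ideal counts the main process alongside the four workers.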