CPU scaling benchmark

workers: 10 + 1 main
iters total: 100M (9090909/stream)
elapsed: 274.46 ms
total CPU used: 2164.52 ms
speedup: 7.89× vs serial
efficiency: 71.7% of 11× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)  0         21.3      21.31        241.4      220.09   0
1         2.066     2.08      24.7         190.67     165.97   0.14
2         1.739     3.84      26.44        222.15     195.71   0.21
3         2.32      6.2       30.63        252.39     221.76   13.62
4         1.917     8.13      56.25        247.78     191.53   6.51
5         3.496     11.63     52.03        270.81     218.78   31.57
6         1.605     13.25     66.26        264.74     198.48   26.77
7         1.571     14.84     66.27        249.9      183.63   9.45
8         3.251     18.11     56.26        264.71     208.45   24.99
9         1.608     19.73     44.04        199.84     155.8    0.22
10        1.544     21.29     66.29        270.61     204.32   32.98

[Timeline chart: one row per stream (main, w1 to w10). Legend: fork+handshake, CPU work, parent reap wait]

what this measures
Each stream runs a tight integer LCG loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers+1). 100% efficiency means perfect linear scaling; any shortfall comes from serial fork setup, the reap tail, and SMT/core contention.
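
As a rough cross-check, the sketch below recomputes the summary figures from the per-stream "work ms" column using the two formulas above. It also includes a minimal stand-in for the per-stream kernel. The LCG constants (Knuth's MMIX multiplier and increment) and the names lcg_stream and scaling_metrics are assumptions made for illustration; the benchmark's real constants and code are not shown here.

```python
# Per-stream "work ms" values copied from the table (main first, then w1..w10).
WORK_MS = [220.09, 165.97, 195.71, 221.76, 191.53, 218.78,
           198.48, 183.63, 208.45, 155.8, 204.32]
ELAPSED_MS = 274.46  # wall-clock elapsed from the summary

MASK64 = (1 << 64) - 1
# Assumed constants (Knuth's MMIX LCG); the benchmark's actual values are unknown.
LCG_A = 6364136223846793005
LCG_C = 1442695040888963407


def lcg_stream(seed, iters):
    """Hypothetical stand-in for one stream: state is a single 64-bit integer."""
    x = seed & MASK64
    for _ in range(iters):
        x = (x * LCG_A + LCG_C) & MASK64  # no memory traffic beyond x itself
    return x


def scaling_metrics(work_ms, elapsed_ms):
    """Return (total CPU ms, speedup, efficiency) for a list of stream times."""
    total_cpu = sum(work_ms)
    speedup = total_cpu / elapsed_ms        # sum(stream CPU time) / elapsed
    efficiency = speedup / len(work_ms)     # len(work_ms) == workers + 1
    return total_cpu, speedup, efficiency


if __name__ == "__main__":
    total, speedup, eff = scaling_metrics(WORK_MS, ELAPSED_MS)
    print(f"total CPU {total:.2f} ms, speedup {speedup:.2f}x, efficiency {eff:.1%}")
```

With the table's values this gives a total of 2164.52 ms, a speedup of about 7.89, and an efficiency of about 71.7%, matching the summary.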