CPU scaling benchmark

workers: 6 + 1 main
iters total: 500M (71428571/stream)
elapsed: 1137.38 ms
total CPU used: 7072.94 ms
speedup: 6.22× vs serial
efficiency: 88.9% of 7× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)         0        12        12.01      957.7   945.69             0
1            1.962      1.98        18.42     875.15   856.73          0.12
2            1.451      3.45        15.31    1096.45  1081.14        141.86
3            1.501      4.96        21.18    1127.14  1105.96        169.57
4            4.077      9.05        31.38    1109.77  1078.39         153.6
5            1.505     10.57        36.46     943.66    907.2          0.17
6            1.407     11.99        36.76    1134.59  1097.83        177.02

[Timeline chart, one row per stream: main, w1, w2, w3, w4, w5, w6. Legend: fork+handshake, CPU work, parent reap wait.]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed, which treats the summed CPU time as the time a serial run would take. Efficiency = speedup / (workers + 1). 100% efficiency means perfect linear scaling; anything below 100% reflects the cost of serial fork setup, the reap tail, and SMT/core contention.
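
As a cross-check, the two formulas can be applied to the per-stream "work ms" values and the elapsed time reported for this run. The sketch below is illustrative only: it is not the benchmark's own code, and the function name `scaling_metrics` is a hypothetical helper introduced here.

```python
# Per-stream CPU time ("work ms" column) and wall-clock elapsed time,
# copied from the run reported above. Stream 0 is the main process.
work_ms = [945.69, 856.73, 1081.14, 1105.96, 1078.39, 907.20, 1097.83]
elapsed_ms = 1137.38


def scaling_metrics(work_ms, elapsed_ms):
    """Hypothetical helper: apply the speedup and efficiency definitions.

    speedup    = sum(stream CPU time) / wall-clock elapsed
    efficiency = speedup / number of streams (workers + 1)
    """
    total_cpu_ms = sum(work_ms)
    speedup = total_cpu_ms / elapsed_ms
    efficiency = speedup / len(work_ms)
    return total_cpu_ms, speedup, efficiency


total_cpu_ms, speedup, efficiency = scaling_metrics(work_ms, elapsed_ms)
print(f"total CPU used: {total_cpu_ms:.2f} ms")
print(f"speedup:        {speedup:.2f}x")
print(f"efficiency:     {efficiency:.3f} of ideal")
```

The summed work times reproduce the reported total CPU figure, and the ratio rounds to the reported 6.22×. The last digit of the efficiency depends on rounding order: dividing the rounded 6.22 by 7 gives 88.9%, while the unrounded ratio gives about 88.8%.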