CPU scaling benchmark

workers: 6 + 1 main
iters total: 500M (71428571 per stream)
elapsed: 1158.75 ms
total CPU used: 7035.96 ms
speedup: 6.07× vs serial
efficiency: 86.7% of 7× ideal
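The headline figures follow from four raw inputs. Below is a minimal arithmetic check in Python. It assumes the 500M iterations are split by integer division over all seven streams. Note that 7 × 71428571 = 499,999,997, so 3 iterations are unaccounted for, and the page does not say where they go. Variable names are illustrative only.

```python
# Raw inputs as reported on the page.
TOTAL_ITERS = 500_000_000
STREAMS = 6 + 1          # 6 forked workers plus the main process
ELAPSED_MS = 1158.75     # wall-clock elapsed
CPU_USED_MS = 7035.96    # total CPU used (sum over streams)

iters_per_stream = TOTAL_ITERS // STREAMS   # assumed: even integer split
speedup = CPU_USED_MS / ELAPSED_MS          # sum(stream CPU time) / elapsed
efficiency = speedup / STREAMS              # fraction of the 7x ideal

print(iters_per_stream)       # 71428571
print(f"{speedup:.2f}x")      # 6.07x
print(f"{efficiency:.1%}")    # 86.7%
```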
stream    spawn (ms)  spawned at (ms)  work start (ms)  work end (ms)  work (ms)  reap wait (ms)
0 (main)  0           10.13            10.14            1037.15        1027.01    0
1         2.131       2.14             19.86            996.01         976.15     0.11
2         1.656       3.82             28.74            1155.71        1126.97    118.93
3         1.534       5.37             22.05            733.05         711.00     0.17
4         1.557       6.94             38.59            1015.45        976.86     0.19
5         1.591       8.55             38.56            1153.17        1114.61    116.12
6         1.559       10.12            34.61            1137.97        1103.36    100.94
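The per-stream table is internally consistent with the summary. Each "work" value is work end minus work start, and the seven work values sum to the 7035.96 ms reported as total CPU used. The short check below uses rows transcribed by hand from the table. The names rows, last_end and tail are illustrative. It also computes the gap between main finishing its own stream and the last worker finishing. That gap is close to stream 2's 118.93 ms reap wait, which is consistent with (though not proof of) the parent idling on the slowest stream.

```python
# (stream, work_start_ms, work_end_ms, work_ms) copied from the table above.
rows = [
    ("0 (main)", 10.14, 1037.15, 1027.01),
    ("1", 19.86, 996.01, 976.15),
    ("2", 28.74, 1155.71, 1126.97),
    ("3", 22.05, 733.05, 711.00),
    ("4", 38.59, 1015.45, 976.86),
    ("5", 38.56, 1153.17, 1114.61),
    ("6", 34.61, 1137.97, 1103.36),
]

# "work (ms)" should equal work end minus work start, up to 2-decimal rounding.
for name, start, end, work in rows:
    assert abs((end - start) - work) < 0.01, name

total_cpu = sum(work for _, _, _, work in rows)   # should match "total CPU used"
last_end = max(end for _, _, end, _ in rows)      # slowest stream's finish time
tail = last_end - rows[0][2]                      # time after main's own work ends

print(f"{total_cpu:.2f} ms")   # 7035.96 ms
print(f"{last_end:.2f} ms")    # 1155.71 ms
print(f"{tail:.2f} ms")        # 118.56 ms
```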
[Timeline chart: one bar per stream (main, w1 to w6), split into fork+handshake, CPU work, and parent reap wait segments]
what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1), where the +1 is the main process, which also runs a stream. 100% efficiency means perfect linear scaling; anything below 100% reflects serial fork setup, the reap tail (the parent waiting on the slowest streams), and SMT/core contention.
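A minimal sketch of the workload and the measurement, in Python. Several parts are assumptions, not the benchmark's actual code:

- The LCG constants are the 32-bit pair a = 1664525, c = 1013904223 from Numerical Recipes.
- The function names lcg_loop and run are hypothetical.
- It uses os.fork plus os.wait4 (POSIX only) to collect each child's CPU time.
- The fork+handshake step is reduced to a bare fork.

Python's interpreter overhead also means the LCG state is not actually register-resident. Only the structure of the benchmark carries over, not its absolute timings.

```python
import os
import time

# Assumed LCG constants (Numerical Recipes 32-bit variant); the page does not name its own.
A, C, MASK = 1664525, 1013904223, 0xFFFFFFFF


def lcg_loop(n: int, x: int = 1) -> int:
    """Hypothetical kernel: n LCG steps on one integer, no shared data."""
    for _ in range(n):
        x = (A * x + C) & MASK
    return x


def run(workers: int, iters_per_stream: int) -> tuple[float, float]:
    """Hypothetical driver: fork workers, run one stream in main, reap, return (speedup, efficiency)."""
    t0 = time.perf_counter()            # wall clock covers fork setup, work, and reap
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:                    # child: run its stream, then exit immediately
            lcg_loop(iters_per_stream)
            os._exit(0)
        pids.append(pid)

    c0 = time.process_time()            # CPU time of main's own stream only
    lcg_loop(iters_per_stream)
    cpu = time.process_time() - c0

    for pid in pids:                    # reap each child and add its user + system CPU time
        _, _, ru = os.wait4(pid, 0)
        cpu += ru.ru_utime + ru.ru_stime
    elapsed = time.perf_counter() - t0

    speedup = cpu / elapsed             # sum(stream CPU time) / wall-clock elapsed
    efficiency = speedup / (workers + 1)
    return speedup, efficiency
```

Usage: `speedup, efficiency = run(workers=6, iters_per_stream=1_000_000)`. This sketch reads kernel-accounted CPU time from wait4's rusage. The table above appears to report each stream's work as the interval between its work start and work end timestamps. On an uncontended core the two should be close, but under SMT or core contention the rusage figure can come out lower than the wall interval.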