CPU scaling benchmark

workers: 6 + 1 main
iters total: 500M (71428571/stream)
elapsed: 1150.46 ms
total CPU used: 7099.69 ms
speedup: 6.17× vs serial
efficiency: 88.1% of 7× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)  0         14.2      14.21        1089.98    1075.77  0
1         1.962     1.98      21.18        992.81     971.63   0.1
2         1.709     3.71      18.54        942.05     923.51   0.17
3         2.364     6.09      20.94        1129.36    1108.42  39.51
4         1.924     8.04      42.64        1147.58    1104.94  57.74
5         4.645     12.7      47.08        885.4      838.32   0.19
6         1.467     14.18     46.31        1123.41    1077.1   33.58
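
The table columns appear to be related by work ms = work end@ - work start@, and the per-stream work ms values appear to sum to the "total CPU used" figure. A minimal Python sketch that cross-checks this against the rows above (the `rows` mapping, `check_rows`, and the tolerance are illustrative names and choices, not part of the benchmark itself):

```python
# (work start@, work end@, work ms) per stream, copied from the table above.
rows = {
    "main": (14.21, 1089.98, 1075.77),
    "w1": (21.18, 992.81, 971.63),
    "w2": (18.54, 942.05, 923.51),
    "w3": (20.94, 1129.36, 1108.42),
    "w4": (42.64, 1147.58, 1104.94),
    "w5": (47.08, 885.4, 838.32),
    "w6": (46.31, 1123.41, 1077.1),
}


def check_rows(rows, tol=0.011):
    """Return the streams whose work ms differs from (end - start) by more than tol.

    tol is an assumed allowance for the two-decimal rounding in the table.
    """
    return [
        name
        for name, (start, end, work) in rows.items()
        if abs((end - start) - work) > tol
    ]


mismatches = check_rows(rows)
total_cpu_ms = sum(work for _, _, work in rows.values())

print("inconsistent rows:", mismatches)
print(f"sum of work ms: {total_cpu_ms:.2f}")
```

An empty list of inconsistent rows means every row satisfies the relation within the rounding allowance.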

[Timeline chart: one row per stream (main, w1 to w6). Legend: fork+handshake, CPU work, parent reap wait.]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers+1), where the +1 counts the main stream. 100% efficiency means perfect linear scaling; anything below 100% reflects the cost of serial fork setup, the reap tail, and SMT/core contention.
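
As a worked example of the two formulas, the following Python sketch recomputes speedup and efficiency from the per-stream work ms values and the elapsed time reported above. The function name `scaling_metrics` is hypothetical, and treating "stream CPU time" as the work ms column is an assumption based on the reported totals.

```python
def scaling_metrics(work_ms, elapsed_ms):
    """Compute (total CPU ms, speedup, efficiency) for a set of streams.

    work_ms    -- per-stream CPU work times, one entry per stream (main included)
    elapsed_ms -- wall-clock time for the whole run
    """
    total_cpu_ms = sum(work_ms)
    speedup = total_cpu_ms / elapsed_ms      # sum(stream CPU time) / elapsed
    efficiency = speedup / len(work_ms)      # speedup / (workers + 1)
    return total_cpu_ms, speedup, efficiency


# work ms for main and w1..w6, taken from the table in this report.
work_ms = [1075.77, 971.63, 923.51, 1108.42, 1104.94, 838.32, 1077.1]

total, speedup, eff = scaling_metrics(work_ms, elapsed_ms=1150.46)
print(f"total CPU {total:.2f} ms, speedup {speedup:.2f}x, efficiency {eff:.3f}")
```

Because the inputs here are the rounded values shown in the table, the last digit of the result can differ slightly from the reported figures.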