CPU scaling benchmark

workers: 9 + 1 main
iters total: 500M (50000000/stream)
elapsed: 1149.1 ms
total CPU used: 10249.18 ms
speedup: 8.92× vs serial
efficiency: 89.2% of 10× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)  0         16.44     16.45        987.84     971.39   0
1         1.878     1.89      19.7         1030.52    1010.82  42.79
2         1.455     3.37      26.19        1130.62    1104.43  145.08
3         1.447     4.84      38.7         1045.02    1006.32  57.3
4         1.423     6.28      34.56        1083.31    1048.75  99.8
5         1.413     7.71      43.09        1146.28    1103.19  158.54
6         1.421     9.15      46.23        1121.25    1075.02  135.49
7         4.311     13.47     56.22        874.43     818.21   0.11
8         1.475     14.96     37.72        1084.62    1046.9   99.9
9         1.46      16.44     64.12        1128.27    1064.15  140.54

[Timeline chart: one row per stream (main, w1 to w9). Legend: fork+handshake, CPU work, parent reap wait.]

what this measures
Each stream runs a tight integer LCG loop: the working set is one CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers+1), where the +1 counts the main stream. 100% efficiency means perfect linear scaling; anything less is the cost of serial fork setup, the reap tail, and SMT/core contention.
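
The two formulas can be checked against the numbers reported above. The following is a minimal sketch, not the benchmark's own code: the function name `scaling` and the variable names are hypothetical, and the inputs are copied by hand from the "work ms" column and the elapsed figure.

```python
# Per-stream work times in ms, copied from the "work ms" column
# (stream 0 is main, streams 1 to 9 are the forked workers).
work_ms = [971.39, 1010.82, 1104.43, 1006.32, 1048.75,
           1103.19, 1075.02, 818.21, 1046.90, 1064.15]
elapsed_ms = 1149.1  # wall-clock elapsed, from the summary
workers = 9          # forked workers; main runs one more stream


def scaling(work_ms, elapsed_ms, workers):
    """Return (total CPU ms, speedup, efficiency) using the formulas above."""
    total_cpu_ms = sum(work_ms)
    speedup = total_cpu_ms / elapsed_ms    # sum(stream CPU time) / elapsed
    efficiency = speedup / (workers + 1)   # +1 counts the main stream
    return total_cpu_ms, speedup, efficiency


total_cpu_ms, speedup, efficiency = scaling(work_ms, elapsed_ms, workers)
print(f"total CPU used: {total_cpu_ms:.2f} ms")  # 10249.18 ms
print(f"speedup: {speedup:.2f}x")                # 8.92x
print(f"efficiency: {efficiency:.1%}")           # 89.2%
```

The results match the reported summary, which suggests the "total CPU used" figure is simply the sum of the per-stream "work ms" values.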