CPU scaling benchmark

workers: 9 + 1 main
iters total: 500M (50,000,000/stream)
elapsed: 1154.69 ms
total CPU used: 10155.52 ms
speedup: 8.8× vs serial
efficiency: 88% of 10× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)  0         14.55     14.57        1130.75    1116.18  0
1         2.005     2.02      23.37        956.22     932.85   0.12
2         1.557     3.6       18.81        1123.68    1104.87  0.18
3         1.558     5.18      35.36        890.15     854.79   0.19
4         1.524     6.72      45.22        936.86     891.64   0.2
5         1.496     8.23      35.54        1124.43    1088.89  0.21
6         1.456     9.71      58.74        1108.47    1049.73  0.22
7         1.425     11.14     45.27        1046.95    1001.68  0.24
8         1.836     13        63.25        1151.69    1088.44  21.09
9         1.521     14.54     46.4         1072.85    1026.45  0.25

[Timeline chart: one bar per stream (main, w1 to w9). Legend: fork+handshake, CPU work, parent reap wait]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1). 100% efficiency means perfect linear scaling; anything less is the cost of serial fork setup, the reap tail, and SMT/core contention.
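
As a rough illustration of the two formulas, the sketch below recomputes speedup and efficiency from the "work ms" column and the elapsed time reported above. It also includes a minimal LCG kernel of the kind described. The function names, the 32-bit state width, and the multiplier/increment (the widely used Numerical Recipes pair) are assumptions for illustration only; the page does not say which constants the benchmark uses, and the real kernel presumably runs as native code in forked processes rather than in Python.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit state; the benchmark's width is not stated


def lcg_stream(seed: int, iters: int) -> int:
    """Advance a 32-bit LCG `iters` times and return the final state."""
    x = seed & MASK32
    for _ in range(iters):
        # Assumed constants (Numerical Recipes); not taken from the benchmark.
        x = (1664525 * x + 1013904223) & MASK32
    return x


def scaling_metrics(stream_cpu_ms, elapsed_ms):
    """Return (speedup, efficiency) from per-stream CPU times and wall-clock time."""
    speedup = sum(stream_cpu_ms) / elapsed_ms
    # Number of streams = workers + 1 (the main stream also does work).
    efficiency = speedup / len(stream_cpu_ms)
    return speedup, efficiency


# "work ms" column from the table above, main stream first.
WORK_MS = [1116.18, 932.85, 1104.87, 854.79, 891.64,
           1088.89, 1049.73, 1001.68, 1088.44, 1026.45]
ELAPSED_MS = 1154.69

if __name__ == "__main__":
    speedup, efficiency = scaling_metrics(WORK_MS, ELAPSED_MS)
    print(f"speedup {speedup:.1f}x, efficiency {efficiency:.0%}")
```

The per-stream work times sum to 10155.52 ms, matching the "total CPU used" figure, so dividing by the 1154.69 ms elapsed time gives roughly 8.8× and, over 10 streams, roughly 88%.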