CPU scaling benchmark

workers: 9 + 1 main
iters total: 500M (50000000/stream)
elapsed: 1177.8 ms
total CPU used: 10016.51 ms
speedup: 8.5× vs serial
efficiency: 85% of 10× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)         0     36.09         36.1    1044.44  1008.34             0
1            2.044      2.06        15.59     936.86   921.27          0.16
2            1.476      3.56        21.98    1133.46  1111.48         89.16
3            2.099      5.67         16.7     957.17   940.47          0.22
4            2.072      7.75        53.29     831.85   778.56          0.23
5            3.539      11.3        49.19     1070.2  1021.01          28.8
6            1.443     12.76           47    1147.05  1100.05        102.75
7            1.409     14.18        45.57    1107.23  1061.66         65.83
8           10.426     24.62        75.57     1057.6   982.03         16.06
9           11.446     36.08        82.91    1174.55  1091.64        130.28

[timeline chart: one bar per stream (main, w1 to w9); legend: fork+handshake, CPU work, parent reap wait]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed; here that is 10016.51 ms / 1177.8 ms ≈ 8.5×. Efficiency = speedup / (workers + 1); here 8.5 / 10 = 85%. 100% efficiency means perfect linear scaling; anything below 100% reflects the cost of serial fork setup, the reap tail, and SMT/core contention.
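
As a rough check of the arithmetic above, the Python sketch below recomputes total CPU time, speedup, and efficiency from the per-stream "work ms" column. The function names are hypothetical, and lcg_stream is only an illustrative stand-in for the per-stream loop: its multiplier and increment are the widely used Numerical Recipes constants, an assumption, since the benchmark's own constants are not given here.

```python
# Per-stream "work ms" values copied from the table above (streams 0 to 9).
WORK_MS = [1008.34, 921.27, 1111.48, 940.47, 778.56,
           1021.01, 1100.05, 1061.66, 982.03, 1091.64]
ELAPSED_MS = 1177.8  # wall-clock "elapsed" from the summary


def lcg_stream(seed: int, iters: int) -> int:
    """Hypothetical stand-in for one stream's loop: a 32-bit LCG.

    The constants are assumed for illustration only; the state is a
    single integer, so nothing is shared between streams.
    """
    x = seed & 0xFFFFFFFF
    for _ in range(iters):
        x = (1664525 * x + 1013904223) & 0xFFFFFFFF
    return x


def scaling_metrics(work_ms, elapsed_ms):
    """Return (total CPU ms, speedup, efficiency) as defined above."""
    total_cpu = sum(work_ms)              # sum of stream CPU time
    speedup = total_cpu / elapsed_ms      # vs. running the streams serially
    efficiency = speedup / len(work_ms)   # ideal speedup = number of streams
    return total_cpu, speedup, efficiency


total_cpu, speedup, efficiency = scaling_metrics(WORK_MS, ELAPSED_MS)
print(f"total CPU used: {total_cpu:.2f} ms")
print(f"speedup: {speedup:.1f}x, efficiency: {efficiency:.0%}")
```

Summing the "work ms" column reproduces the reported total CPU figure, which suggests that total CPU used is the sum of per-stream work time rather than a separately measured quantity.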