CPU scaling benchmark

workers: 10 + 1 main
iters total: 500M (45454545/stream)
elapsed: 1161.5 ms
total CPU used: 11372.4 ms
speedup: 9.79× vs serial
efficiency: 89% of 11× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)  0         22.46     22.47        1112.95    1090.48  0
1         2.097     2.11      14.74        877.23     862.49   0.12
2         1.607     3.74      48.33        1158.6     1110.27  45.76
3         1.563     5.32      44.54        1118.5     1073.96  8.29
4         1.632     6.97      59.91        984.88     924.97   0.18
5         1.535     8.52      48.02        1118.13    1070.11  8.25
6         1.664     10.2      33.33        1133.05    1099.72  22.56
7         1.503     11.71     49.9         1126.48    1076.58  25.54
8         4.226     15.95     69.36        1140.72    1071.36  27.91
9         1.736     17.7      45.59        1023.76    978.17   0.19
10        4.737     22.45     63.11        1077.4     1014.29  0.21
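
The derived figures in the table can be cross-checked from its own timestamps: work ms should equal work end@ minus work start@, and the per-stream work times should add up to the reported total CPU used. A minimal sketch of that check follows; the names ROWS, mismatched_rows and total_cpu_ms are illustrative and not part of the benchmark, and the values are copied from the table.

```python
# (stream, work_start_ms, work_end_ms, work_ms), copied from the table above.
ROWS = [
    ("0 (main)", 22.47, 1112.95, 1090.48),
    ("1", 14.74, 877.23, 862.49),
    ("2", 48.33, 1158.60, 1110.27),
    ("3", 44.54, 1118.50, 1073.96),
    ("4", 59.91, 984.88, 924.97),
    ("5", 48.02, 1118.13, 1070.11),
    ("6", 33.33, 1133.05, 1099.72),
    ("7", 49.90, 1126.48, 1076.58),
    ("8", 69.36, 1140.72, 1071.36),
    ("9", 45.59, 1023.76, 978.17),
    ("10", 63.11, 1077.40, 1014.29),
]


def mismatched_rows(rows, tol_ms=0.011):
    """Return the streams whose work ms differs from (work end - work start)."""
    return [
        stream
        for stream, start, end, work in rows
        if abs((end - start) - work) > tol_ms
    ]


def total_cpu_ms(rows):
    """Sum the per-stream work times (the 'total CPU used' figure)."""
    return sum(work for _, _, _, work in rows)


if __name__ == "__main__":
    print("rows with inconsistent work ms:", mismatched_rows(ROWS))
    print("total CPU ms:", round(total_cpu_ms(ROWS), 1))
```

With the values above, every row is self-consistent and the sum agrees with the 11372.4 ms shown in the summary.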
[timeline chart: one bar per stream (main, w1 to w10); legend: fork+handshake, CPU work, parent reap wait]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1). 100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.
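
As a rough illustration of the two ideas above, the sketch below shows an LCG recurrence whose only state is one integer, and the speedup/efficiency arithmetic. Everything named here is an assumption: the report does not give the benchmark's multiplier, increment or word size, so the constants are Knuth's MMIX LCG parameters with a 64-bit mask, and lcg_stream and scaling_metrics are hypothetical helpers. Python integers are not machine registers, so this shows the recurrence only, not the benchmark's performance.

```python
MASK64 = (1 << 64) - 1

# Assumed constants (Knuth's MMIX LCG); the benchmark's own values are not stated.
LCG_A = 6364136223846793005
LCG_C = 1442695040888963407


def lcg_stream(seed, iters):
    """Run x = (a*x + c) mod 2**64 for `iters` steps; the state is a single integer."""
    x = seed & MASK64
    for _ in range(iters):
        x = (LCG_A * x + LCG_C) & MASK64
    return x


def scaling_metrics(total_cpu_ms, elapsed_ms, streams):
    """Speedup = total stream CPU time / wall-clock; efficiency = speedup / streams."""
    speedup = total_cpu_ms / elapsed_ms
    efficiency = speedup / streams
    return speedup, efficiency


if __name__ == "__main__":
    # Figures from the summary: 11372.4 ms CPU, 1161.5 ms elapsed, 10 workers + main.
    speedup, efficiency = scaling_metrics(11372.4, 1161.5, 10 + 1)
    print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Plugging in the summary figures reproduces the reported 9.79× speedup and 89% efficiency.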