CPU scaling benchmark

workers: 12 + 1 main
iters total: 100M (7692307/stream)
elapsed: 285.02 ms
total CPU used: 2302.53 ms
speedup: 8.08× vs serial
efficiency: 62.2% of 13× ideal

per-stream timings (all values in ms; @ columns are timestamps)

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)         0     54.63        54.63      249.8   195.17             0
1            4.397      4.41         15.7     144.26   128.56          0.29
2            1.743      6.17         20.3     182.21   161.91          0.37
3             1.55      7.74         29.3     257.57   228.27          7.85
4            5.154     12.91        65.84     280.38   214.54         32.19
5            1.461     14.39        77.42     279.58   202.16         32.13
6            6.272     20.68        45.84     169.71   123.87           0.4
7            1.715     22.41        87.44     223.39   135.95          0.41
8            1.333     23.76        57.65     248.72   191.07          0.42
9            1.364     25.14        64.72     268.39   203.67         18.77
10           4.808     29.98        83.83     281.16   197.33         35.11
11           1.414     31.42        82.67     211.05   128.38          6.06
12          23.183     54.62        90.16     281.81   191.65         35.13

[timeline chart: one row per stream (main, w1 to w12); legend: fork+handshake, CPU work, parent reap wait]

what this measures
Each stream runs a tight integer LCG loop: the working set is one CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed; for this run that is 2302.53 ms / 285.02 ms ≈ 8.08×. Efficiency = speedup / (workers+1). 100% efficiency means perfect linear scaling; anything lower is the cost of serial fork setup, the reap tail, and SMT/core contention.
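
The two formulas above can be checked against the per-stream numbers with a short script. This is a minimal sketch, not the benchmark's own code: the names `lcg_stream` and `scaling_metrics` are hypothetical, and the LCG constants are the commonly used Numerical Recipes parameters, chosen only as an assumption because the benchmark's actual constants are not stated. The "work ms" values and the elapsed time are copied from the results above.

```python
# Per-stream "work ms" values copied from the results (stream 0 is main).
WORK_MS = [195.17, 128.56, 161.91, 228.27, 214.54, 202.16, 123.87,
           135.95, 191.07, 203.67, 197.33, 128.38, 191.65]
ELAPSED_MS = 285.02  # wall-clock elapsed, from the summary


def lcg_stream(seed: int, iters: int) -> int:
    """Illustrative stand-in for one stream's workload.

    Assumption: Numerical Recipes LCG constants, modulus 2**32.
    The benchmark's real constants are not given in the report.
    """
    x = seed
    for _ in range(iters):
        # Single integer state, no memory traffic beyond the loop itself.
        x = (1664525 * x + 1013904223) & 0xFFFFFFFF
    return x


def scaling_metrics(work_ms, elapsed_ms):
    """Return (total CPU ms, speedup, efficiency) for one run.

    speedup    = sum of per-stream CPU time / wall-clock elapsed
    efficiency = speedup / number of streams (workers + 1 main)
    """
    total_cpu = sum(work_ms)
    speedup = total_cpu / elapsed_ms
    efficiency = speedup / len(work_ms)
    return total_cpu, speedup, efficiency


if __name__ == "__main__":
    total_cpu, speedup, efficiency = scaling_metrics(WORK_MS, ELAPSED_MS)
    print(f"total CPU used: {total_cpu:.2f} ms")
    print(f"speedup: {speedup:.2f}x vs serial")
    print(f"efficiency: {efficiency:.1%} of {len(WORK_MS)}x ideal")
```

Run on the table's values, the per-stream work times sum to the reported total CPU figure, the speedup comes out at roughly 8.08, and the efficiency lands at about 62%; the last digit can differ from the dashboard depending on where rounding is applied.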