CPU scaling benchmark

workers: 10 + 1 main
iters total: 100M (9090909/stream)
elapsed: 278.17 ms
total CPU used: 2103.75 ms
speedup: 7.56× vs serial
efficiency: 68.7% of 11× ideal

stream     spawn ms   spawned@   work start@   work end@   work ms   reap wait ms
0 (main)   0          28.16      28.17         277.77      249.6     0
1          2.102      2.12       20.14         270.09      249.95    0.14
2          1.809      3.95       18.53         242.89      224.36    0.2
3          1.714      5.68       27.46         200.3       172.84    0.22
4          2.849      8.55       26.01         156.73      130.72    0.25
5          2.149      10.72      48.73         234.68      185.95    0.26
6          2.09       12.82      49.72         196.98      147.26    0.28
7          3.682      16.52      66.52         270.64      204.12    0.3
8          1.564      18.1       45.63         216.09      170.46    0.31
9          8.364      26.47      86.5          269.49      182.99    0.33
10         1.665      28.15      69.99         255.49      185.5     0.35

[Timeline chart: one row per stream (main, w1 to w10); legend: fork+handshake, CPU work, parent reap wait]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1). 100% efficiency means perfect linear scaling; anything below 100% reflects the cost of serial fork setup, the reap tail, and SMT/core contention.
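
As a rough check on the two formulas, the sketch below recomputes the summary figures from the per-stream "work ms" column and the reported elapsed time. The function name `scaling_metrics` is hypothetical and not part of the benchmark itself. Because the inputs are the rounded values shown in the table, the last printed digit may differ slightly from the reported summary.

```python
# Per-stream "work ms" values copied from the table (stream 0 is main).
work_ms = [249.6, 249.95, 224.36, 172.84, 130.72, 185.95,
           147.26, 204.12, 170.46, 182.99, 185.5]
elapsed_ms = 278.17  # wall-clock elapsed, from the summary


def scaling_metrics(work_ms, elapsed_ms):
    """Return (total CPU ms, speedup, efficiency) for one benchmark run.

    Hypothetical helper: it only restates the formulas in the text.
    """
    total_cpu_ms = sum(work_ms)            # sum(stream CPU time)
    speedup = total_cpu_ms / elapsed_ms    # vs. running every stream serially
    efficiency = speedup / len(work_ms)    # len(work_ms) == workers + 1
    return total_cpu_ms, speedup, efficiency


total_cpu_ms, speedup, efficiency = scaling_metrics(work_ms, elapsed_ms)
print(f"total CPU used: {total_cpu_ms:.2f} ms")
print(f"speedup: {speedup:.2f}x vs serial")
print(f"efficiency: {efficiency:.1%} of {len(work_ms)}x ideal")
```

The sum of the "work ms" column reproduces the reported total CPU figure, which suggests that "total CPU used" counts only the CPU-work phase of each stream and excludes fork+handshake and reap wait.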