CPU scaling benchmark

workers: 9 + 1 main
iters total: 100M (10000000/stream)
elapsed: 274.88 ms
total CPU used: 1893.12 ms
speedup: 6.89× vs serial
efficiency: 68.9% of 10× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)         0     41.39         41.4     260.17   218.77             0
1            2.345      2.36        15.88     169.88      154          0.15
2            1.944      4.33        17.81     245.45   227.64          0.21
3            1.939      6.29        19.05     200.44   181.39          0.22
4            1.916      8.22        49.44     253.33   203.89          0.24
5            9.372      17.6         64.9     272.37   207.47          12.3
6            2.054     19.71        48.67      268.9   220.23          8.85
7            1.735     21.46        58.68     206.74   148.06          0.29
8           14.992     36.46        64.65      220.2   155.55          0.31
9            4.899     41.38        81.86     257.98   176.12          0.32

[Timeline chart: one bar per stream (main, w1 to w9), segmented into fork+handshake, CPU work, and parent reap wait]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed; here 1893.12 ms / 274.88 ms = 6.89×. Efficiency = speedup / (workers+1); here 6.89 / 10 = 68.9%. 100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.
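
As a rough check, the two formulas above can be applied directly to the per-stream "work ms" column. The sketch below is not the benchmark's own code: the function name `scaling_metrics` and the variable names are hypothetical, and the only inputs are the figures reported on this page.

```python
# Per-stream "work ms" values copied from the results table
# (main first, then workers 1..9).
work_ms = [218.77, 154.0, 227.64, 181.39, 203.89,
           207.47, 220.23, 148.06, 155.55, 176.12]
elapsed_ms = 274.88  # wall-clock elapsed, as reported


def scaling_metrics(work_ms, elapsed_ms):
    """Return (total CPU ms, speedup, efficiency) per the formulas in the text.

    Hypothetical helper for illustration; not part of the benchmark.
    """
    total_cpu = sum(work_ms)              # sum(stream CPU time)
    speedup = total_cpu / elapsed_ms      # vs. running the streams serially
    efficiency = speedup / len(work_ms)   # len(work_ms) == workers + 1
    return total_cpu, speedup, efficiency


total_cpu, speedup, efficiency = scaling_metrics(work_ms, elapsed_ms)
print(f"total CPU {total_cpu:.2f} ms, "
      f"speedup {speedup:.2f}x, "
      f"efficiency {efficiency:.1%}")
```

Run on the figures above, this reproduces the reported 1893.12 ms total CPU, 6.89× speedup, and 68.9% efficiency.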