CPU scaling benchmark

workers: 9 + 1 main
iters total: 100M (10,000,000 per stream)
elapsed: 282.02 ms
total CPU used: 1943.07 ms
speedup: 6.89× vs serial
efficiency: 68.9% of 10× ideal

stream     spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)   0         39.59     39.6         270.29     230.69   0
1          2.199     2.22      15.39        187.47     172.08   0.13
2          1.761     4         16.84        229.89     213.05   0.19
3          1.783     5.8       17.94        197.07     179.13   0.21
4          1.822     7.64      48.3         259.22     210.92   0.23
5          8.485     16.15     46.07        266.92     220.85   0.25
6          1.889     18.06     46.09        207.46     161.37   0.31
7          1.559     19.63     48.66        210.8      162.14   0.33
8          8.285     27.94     76.04        277.67     201.63   9.23
9          11.621    39.58     85.3         276.51     191.21   6.35

[Timeline chart: one row per stream (main, w1 to w9). Legend: fork+handshake, CPU work, parent reap wait.]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1). For this run, 1943.07 ms / 282.02 ms = 6.89×, and 6.89 / 10 = 68.9%. 100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.
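
The arithmetic behind the summary figures can be reproduced from the per-stream "work ms" column. The following is a minimal sketch, not the benchmark's own code: the function names `scaling_metrics`, `lcg_step`, and `lcg_stream` are hypothetical, and the LCG constants are the widely used Numerical Recipes 32-bit parameters, assumed here because the page does not say which constants the benchmark uses.

```python
# Per-stream "work ms" values copied from the table (stream 0 is main).
work_ms = [230.69, 172.08, 213.05, 179.13, 210.92,
           220.85, 161.37, 162.14, 201.63, 191.21]
elapsed_ms = 282.02  # wall-clock elapsed, from the summary


def scaling_metrics(work_ms, elapsed_ms):
    """Return (total CPU ms, speedup, efficiency) as defined in the text.

    speedup    = sum(stream CPU time) / wall-clock elapsed
    efficiency = speedup / (workers + 1), i.e. divided by the stream count
    """
    total_cpu_ms = sum(work_ms)
    speedup = total_cpu_ms / elapsed_ms
    efficiency = speedup / len(work_ms)
    return total_cpu_ms, speedup, efficiency


MASK32 = 0xFFFFFFFF  # keep the state in 32 bits, like a single register


def lcg_step(x):
    """One LCG step. Constants are an assumption (Numerical Recipes)."""
    return (1664525 * x + 1013904223) & MASK32


def lcg_stream(seed, iters):
    """Illustrative stand-in for one stream's loop: pure integer work,
    no memory traffic beyond the single state variable."""
    x = seed
    for _ in range(iters):
        x = lcg_step(x)
    return x


total_cpu_ms, speedup, efficiency = scaling_metrics(work_ms, elapsed_ms)
print(f"total CPU used: {total_cpu_ms:.2f} ms")   # 1943.07 ms
print(f"speedup:        {speedup:.2f}x")          # 6.89x
print(f"efficiency:     {efficiency * 100:.1f}%")  # 68.9%
```

Dividing by `len(work_ms)` rather than a hard-coded 10 keeps the efficiency definition tied to the number of streams actually measured, which matches "workers + 1" when main also runs a stream.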