CPU scaling benchmark
| metric | value | note |
|---|---|---|
| workers | 12 | +1 main |
| iters total | 100M | 7692307/stream |
| elapsed | 285.02 ms | |
| total CPU used | 2302.53 ms | |
| speedup | 8.08× | vs serial |
| efficiency | 62.2% | of 13× ideal |
| stream | spawn ms | spawned@ | work start@ | work end@ | work ms | reap wait ms |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 54.63 | 54.63 | 249.8 | 195.17 | 0 |
| 1 | 4.397 | 4.41 | 15.7 | 144.26 | 128.56 | 0.29 |
| 2 | 1.743 | 6.17 | 20.3 | 182.21 | 161.91 | 0.37 |
| 3 | 1.55 | 7.74 | 29.3 | 257.57 | 228.27 | 7.85 |
| 4 | 5.154 | 12.91 | 65.84 | 280.38 | 214.54 | 32.19 |
| 5 | 1.461 | 14.39 | 77.42 | 279.58 | 202.16 | 32.13 |
| 6 | 6.272 | 20.68 | 45.84 | 169.71 | 123.87 | 0.4 |
| 7 | 1.715 | 22.41 | 87.44 | 223.39 | 135.95 | 0.41 |
| 8 | 1.333 | 23.76 | 57.65 | 248.72 | 191.07 | 0.42 |
| 9 | 1.364 | 25.14 | 64.72 | 268.39 | 203.67 | 18.77 |
| 10 | 4.808 | 29.98 | 83.83 | 281.16 | 197.33 | 35.11 |
| 11 | 1.414 | 31.42 | 82.67 | 211.05 | 128.38 | 6.06 |
| 12 | 23.183 | 54.62 | 90.16 | 281.81 | 191.65 | 35.13 |
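The columns of the table can be cross-checked against each other. The sketch below is an illustration in Python, not part of the benchmark; it assumes that "work ms" is "work end@" minus "work start@" and that the per-stream work times add up to the reported total CPU figure. The names `ROWS`, `check_rows`, and `total_work_ms` are hypothetical helpers introduced here.

```python
# Rows copied from the table above: (stream, work_start_ms, work_end_ms, work_ms).
ROWS = [
    ("0 (main)", 54.63, 249.8, 195.17),
    ("1", 15.7, 144.26, 128.56),
    ("2", 20.3, 182.21, 161.91),
    ("3", 29.3, 257.57, 228.27),
    ("4", 65.84, 280.38, 214.54),
    ("5", 77.42, 279.58, 202.16),
    ("6", 45.84, 169.71, 123.87),
    ("7", 87.44, 223.39, 135.95),
    ("8", 57.65, 248.72, 191.07),
    ("9", 64.72, 268.39, 203.67),
    ("10", 83.83, 281.16, 197.33),
    ("11", 82.67, 211.05, 128.38),
    ("12", 90.16, 281.81, 191.65),
]


def check_rows(rows, tol=0.01):
    """Return the streams whose work ms differs from (end - start) by more than tol."""
    return [s for s, start, end, work in rows if abs((end - start) - work) > tol]


def total_work_ms(rows):
    """Sum the per-stream work times, rounded to the table's two decimals."""
    return round(sum(work for _, _, _, work in rows), 2)


if __name__ == "__main__":
    print("mismatched rows:", check_rows(ROWS))
    print("total work ms:", total_work_ms(ROWS))
```

Under those assumptions every row is internally consistent, and the summed work time matches the "total CPU used" figure of 2302.53 ms.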
[Chart: one row per stream (main, w1 to w12); legend: fork+handshake, CPU work, parent reap wait]
what this measures
Each stream runs a tight integer LCG loop: the working set is one CPU register, with no memory access
and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers+1).
100% efficiency means perfect linear scaling; anything below 100% reflects the cost of serial fork setup,
the reap tail, and SMT/core contention.
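The two formulas above, together with the kind of loop each stream runs, can be sketched as follows. This is a Python illustration under stated assumptions, not the benchmark's own code: the LCG multiplier and increment are the widely published Numerical Recipes 32-bit constants (the report does not say which constants it uses), and `lcg_stream` and `scaling_metrics` are hypothetical names. A Python loop also does not stay in a single register the way the benchmark's loop is described as doing; only the arithmetic is the same.

```python
MASK32 = 0xFFFFFFFF  # keep the state in 32 bits, i.e. arithmetic mod 2**32


def lcg_stream(seed, iters, a=1664525, c=1013904223):
    """Run `iters` steps of x = (a*x + c) mod 2**32 and return the final state.

    The constants are an assumption (Numerical Recipes), chosen for illustration.
    """
    x = seed & MASK32
    for _ in range(iters):
        x = (a * x + c) & MASK32
    return x


def scaling_metrics(stream_cpu_ms, elapsed_ms):
    """Return (speedup, efficiency) from per-stream CPU times and wall-clock time.

    speedup    = sum(stream CPU time) / wall-clock elapsed
    efficiency = speedup / number of streams (workers + 1 main)
    """
    speedup = sum(stream_cpu_ms) / elapsed_ms
    efficiency = speedup / len(stream_cpu_ms)
    return speedup, efficiency


if __name__ == "__main__":
    # Four streams that each burn 100 ms of CPU inside a 100 ms wall-clock window.
    print(scaling_metrics([100.0, 100.0, 100.0, 100.0], 100.0))
```

Fed the figures reported here (2302.53 ms of CPU across 13 streams, 285.02 ms elapsed), `scaling_metrics` gives a speedup of about 8.08 and an efficiency of roughly 62%.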