CPU scaling benchmark
workers: 12 + 1 main
iters total: 100M (7692307/stream)
elapsed: 290.89 ms
total CPU used: 2545.12 ms
speedup: 8.75× vs serial
efficiency: 67.3% of 13× ideal

| stream | spawn (ms) | spawned @ (ms) | work start @ (ms) | work end @ (ms) | work (ms) | reap wait (ms) |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 35.4 | 35.41 | 249.9 | 214.49 | 0 |
| 1 | 2.101 | 2.12 | 19.88 | 230.49 | 210.61 | 0.16 |
| 2 | 1.657 | 3.8 | 19.85 | 272.95 | 253.1 | 25.66 |
| 3 | 2.833 | 6.65 | 31.74 | 251.4 | 219.66 | 15.5 |
| 4 | 2.013 | 8.68 | 50.21 | 264.5 | 214.29 | 15.51 |
| 5 | 1.574 | 10.27 | 55.25 | 236.64 | 181.39 | 15.37 |
| 6 | 1.586 | 11.87 | 55.32 | 253.55 | 198.23 | 17.54 |
| 7 | 1.702 | 13.6 | 66.6 | 257.09 | 190.49 | 17.57 |
| 8 | 7.692 | 21.31 | 65.02 | 272.35 | 207.33 | 25.3 |
| 9 | 2.04 | 23.36 | 75.26 | 281.34 | 206.08 | 31.6 |
| 10 | 1.725 | 25.11 | 95.26 | 288.01 | 192.75 | 38.22 |
| 11 | 2.963 | 28.08 | 75.24 | 203.29 | 128.05 | 15.42 |
| 12 | 7.285 | 35.39 | 65.27 | 193.92 | 128.65 | 15.46 |
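
As a consistency check on the table, the Python sketch below recomputes each stream's work time as work end minus work start and sums the work column. The rows are transcribed by hand from the table above; the names `ROWS` and `total_cpu_ms` are illustrative and not part of the benchmark itself.

```python
# (stream, work_start_ms, work_end_ms, work_ms), copied from the table above.
ROWS = [
    (0, 35.41, 249.90, 214.49),
    (1, 19.88, 230.49, 210.61),
    (2, 19.85, 272.95, 253.10),
    (3, 31.74, 251.40, 219.66),
    (4, 50.21, 264.50, 214.29),
    (5, 55.25, 236.64, 181.39),
    (6, 55.32, 253.55, 198.23),
    (7, 66.60, 257.09, 190.49),
    (8, 65.02, 272.35, 207.33),
    (9, 75.26, 281.34, 206.08),
    (10, 95.26, 288.01, 192.75),
    (11, 75.24, 203.29, 128.05),
    (12, 65.27, 193.92, 128.65),
]

# Every "work ms" entry should equal end - start, up to rounding in the report.
mismatches = [s for s, start, end, work in ROWS if abs((end - start) - work) > 0.005]

# Summing the work column should reproduce the "total CPU used" figure.
total_cpu_ms = round(sum(work for _, _, _, work in ROWS), 2)

# The stream that finishes last bounds the wall-clock time from below.
last_stream, last_end_ms = max(((s, end) for s, _, end, _ in ROWS), key=lambda t: t[1])

print(mismatches)                 # []
print(total_cpu_ms)               # 2545.12
print(last_stream, last_end_ms)   # 10 288.01
```

The last stream finishes at 288.01 ms, a few milliseconds before the reported 290.89 ms elapsed, which is consistent with a short reap tail after the final worker exits.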
[Timeline chart: one row per stream (main, w1 to w12). Legend: fork+handshake, CPU work, parent reap wait.]
what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one
CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock
elapsed. Efficiency = speedup / (workers+1). 100% efficiency means perfect linear scaling; anything
below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.
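
The definitions above can be sketched in Python as follows. This is a minimal illustration, not the benchmark's own code: the LCG constants are the common Numerical Recipes values and are an assumption (the report does not state which constants it uses), and the function names `lcg_stream`, `speedup`, and `efficiency` are hypothetical. The timing inputs are the figures reported at the top of this page.

```python
# Assumed LCG parameters (Numerical Recipes); the benchmark's real constants are not given.
A = 1664525
C = 1013904223
MASK = 0xFFFFFFFF  # keep the state within 32 bits


def lcg_stream(seed: int, iters: int) -> int:
    """Run a tight integer LCG loop: one state variable, no memory traffic, no sharing."""
    x = seed
    for _ in range(iters):
        x = (A * x + C) & MASK
    return x


def speedup(total_cpu_ms: float, elapsed_ms: float) -> float:
    """Speedup = sum(stream CPU time) / wall-clock elapsed."""
    return total_cpu_ms / elapsed_ms


def efficiency(speedup_x: float, workers: int) -> float:
    """Efficiency = speedup / (workers + 1); the +1 counts the main stream."""
    return speedup_x / (workers + 1)


WORKERS = 12
TOTAL_ITERS = 100_000_000

# Integer division across 13 streams leaves a remainder of 9 iterations.
iters_per_stream = TOTAL_ITERS // (WORKERS + 1)

s = speedup(2545.12, 290.89)
e = efficiency(s, WORKERS)

print(iters_per_stream)  # 7692307
print(f"{s:.2f}x")       # 8.75x
print(f"{e:.1%}")        # 67.3%
```

Note that the per-stream iteration count is the floor of 100M / 13, so the streams together execute 99,999,991 iterations rather than exactly 100M.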