CPU scaling benchmark
| metric | value | note |
|---|---|---|
| workers | 8 | +1 main |
| iters total | 100M | 11111111/stream |
| elapsed | 271.51 ms | |
| total CPU used | 1810.88 ms | |
| speedup | 6.67× | vs serial |
| efficiency | 74.1% | of 9× ideal |

| stream | spawn ms | spawned @ ms | work start @ ms | work end @ ms | work ms | reap wait ms |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 28.61 | 28.61 | 266.61 | 238 | 0 |
| 1 | 2.22 | 2.24 | 16.89 | 235.19 | 218.3 | 0.16 |
| 2 | 1.954 | 4.21 | 34.29 | 226.88 | 192.59 | 0.23 |
| 3 | 1.818 | 6.05 | 24.64 | 203.38 | 178.74 | 0.24 |
| 4 | 1.806 | 7.88 | 36.87 | 260.44 | 223.57 | 0.26 |
| 5 | 2.306 | 10.2 | 42.21 | 246.6 | 204.39 | 0.27 |
| 6 | 2.05 | 12.27 | 42.56 | 237.13 | 194.57 | 0.29 |
| 7 | 1.823 | 14.11 | 70.4 | 231.51 | 161.11 | 0.33 |
| 8 | 14.465 | 28.59 | 68.81 | 268.42 | 199.61 | 1.92 |
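
As a cross-check, the summary figures can be re-derived from the work ms column of the table. The sketch below is only arithmetic on the reported numbers; the variable names are illustrative and not taken from the benchmark itself.

```python
# Per-stream "work ms" values copied from the table (stream 0 is main).
work_ms = [238, 218.3, 192.59, 178.74, 223.57, 204.39, 194.57, 161.11, 199.61]
elapsed_ms = 271.51  # wall-clock elapsed, from the summary

total_cpu_ms = sum(work_ms)             # total CPU used across all streams
speedup = total_cpu_ms / elapsed_ms     # sum(stream CPU time) / elapsed
efficiency = speedup / len(work_ms)     # speedup / (workers + 1)

print(f"total CPU used: {total_cpu_ms:.2f} ms")
print(f"speedup: {speedup:.2f}x")
print(f"efficiency: {efficiency:.1%}")
```

The three printed values should agree with the summary: 1810.88 ms, 6.67×, and 74.1%.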
[Timeline chart: one row per stream (main, w1 to w8). Legend: fork+handshake, CPU work, parent reap wait.]
what this measures
Each stream runs a tight integer LCG loop: the working set is a single CPU register, with no memory access
and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed; here 1810.88 ms / 271.51 ms = 6.67×.
Efficiency = speedup / (workers+1); here 6.67 / 9 = 74.1%. 100% efficiency means perfect linear scaling;
anything less is the cost of serial fork setup, the reap tail, and SMT/core contention.
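
For readers unfamiliar with the term, the kind of LCG loop each stream runs can be sketched as below. This is a minimal illustration, not the benchmark's code: the multiplier and increment are an assumption (the widely published 64-bit constants from Knuth's MMIX generator), and the function name `lcg_stream` is hypothetical. Python also will not keep the state in a single register, so the sketch shows the recurrence and the per-stream iteration split only, not the performance characteristics.

```python
MASK64 = (1 << 64) - 1

# Assumed constants (Knuth's MMIX LCG); the benchmark's own values are not stated.
A = 6364136223846793005
C = 1442695040888963407

# 100M total iterations split across 8 workers + main.
STREAMS = 9
ITERS_PER_STREAM = 100_000_000 // STREAMS  # 11111111, as in the summary


def lcg_stream(seed: int, iters: int) -> int:
    """Advance a 64-bit LCG `iters` times and return the final state.

    The only live state is `x`: no arrays, no shared data.
    """
    x = seed & MASK64
    for _ in range(iters):
        x = (A * x + C) & MASK64
    return x


# Small demo run; the full ITERS_PER_STREAM would be slow in pure Python.
final_state = lcg_stream(seed=1, iters=1000)
```

Returning the final state gives the loop an observable result, which in a compiled implementation keeps the optimizer from deleting the work.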