CPU scaling benchmark

| metric | value | note |
|---|---|---|
| workers | 6 + 1 main | |
| iters total | 100M | 14285714/stream |
| elapsed | 266.13 ms | |
| total CPU used | 1279.98 ms | |
| speedup | 4.81× | vs serial |
| efficiency | 68.7% | of 7× ideal |

| stream | spawn (ms) | spawned at (ms) | work start at (ms) | work end at (ms) | work (ms) | reap wait (ms) |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 12.42 | 12.43 | 140.26 | 127.83 | 0 |
| 1 | 2.321 | 2.34 | 15.91 | 250.14 | 234.23 | 110.02 |
| 2 | 2.015 | 4.38 | 26.07 | 233.48 | 207.41 | 96.23 |
| 3 | 1.882 | 6.28 | 25.74 | 151.63 | 125.89 | 14.26 |
| 4 | 2.346 | 8.65 | 55.11 | 234.32 | 179.21 | 97.66 |
| 5 | 1.816 | 10.51 | 36.55 | 263.16 | 226.61 | 123.02 |
| 6 | 1.877 | 12.4 | 68.51 | 247.31 | 178.8 | 107.19 |

[Timeline figure: rows main, w1 to w6; legend: fork+handshake, CPU work, parent reap wait]

what this measures

Each stream runs a tight integer LCG (linear congruential generator) loop. The working set is one
CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock
elapsed. Efficiency = speedup / (workers + 1), where the +1 is the main process running its own stream.
100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup,
the reap tail, and SMT/core contention.
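
As a cross-check, the two definitions above can be applied to the numbers in this report. The following is a minimal sketch, not the benchmark's own code: the function name `scaling_metrics` and the variable names are hypothetical, and the inputs are copied by hand from the "work ms" column and the elapsed figure.

```python
# Per-stream CPU work times in ms, copied from the "work ms" column
# (stream 0 is the main process, streams 1-6 are the forked workers).
work_ms = [127.83, 234.23, 207.41, 125.89, 179.21, 226.61, 178.80]
elapsed_ms = 266.13  # wall-clock elapsed, from the summary
workers = 6          # forked workers; main runs the 7th stream


def scaling_metrics(work_ms, elapsed_ms, workers):
    """Hypothetical helper: return (total_cpu_ms, speedup, efficiency)."""
    total_cpu_ms = sum(work_ms)            # sum(stream CPU time)
    speedup = total_cpu_ms / elapsed_ms    # CPU time delivered per unit of wall time
    efficiency = speedup / (workers + 1)   # fraction of the 7x ideal
    return total_cpu_ms, speedup, efficiency


total_cpu_ms, speedup, efficiency = scaling_metrics(work_ms, elapsed_ms, workers)
print(f"total CPU used: {total_cpu_ms:.2f} ms")  # 1279.98 ms
print(f"speedup:        {speedup:.2f}x")         # 4.81x
print(f"efficiency:     {efficiency:.1%}")       # 68.7%
```

The results match the reported totals, which confirms that "total CPU used" is the sum of the per-stream work times and that the ideal is 7×, not 6×.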