CPU scaling benchmark

| metric | value | note |
|---|---|---|
| workers | 9 + 1 main | |
| iters total | 500M | 50000000/stream |
| elapsed | 1154.69 ms | |
| total CPU used | 10155.52 ms | |
| speedup | 8.8× | vs serial |
| efficiency | 88% | of 10× ideal |

| stream | spawn (ms) | spawned @ (ms) | work start @ (ms) | work end @ (ms) | work (ms) | reap wait (ms) |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 14.55 | 14.57 | 1130.75 | 1116.18 | 0 |
| 1 | 2.005 | 2.02 | 23.37 | 956.22 | 932.85 | 0.12 |
| 2 | 1.557 | 3.6 | 18.81 | 1123.68 | 1104.87 | 0.18 |
| 3 | 1.558 | 5.18 | 35.36 | 890.15 | 854.79 | 0.19 |
| 4 | 1.524 | 6.72 | 45.22 | 936.86 | 891.64 | 0.2 |
| 5 | 1.496 | 8.23 | 35.54 | 1124.43 | 1088.89 | 0.21 |
| 6 | 1.456 | 9.71 | 58.74 | 1108.47 | 1049.73 | 0.22 |
| 7 | 1.425 | 11.14 | 45.27 | 1046.95 | 1001.68 | 0.24 |
| 8 | 1.836 | 13 | 63.25 | 1151.69 | 1088.44 | 21.09 |
| 9 | 1.521 | 14.54 | 46.4 | 1072.85 | 1026.45 | 0.25 |
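
The summary figures can be recomputed from the table. The following is a minimal sketch, with the work start and work end values copied from the rows above; the variable names are illustrative only and are not taken from the benchmark. It reproduces the 10155.52 ms total, the 8.8× speedup, and the 88% efficiency, and it shows that stream 8 finishes last, about 3 ms before the reported elapsed time.

```python
# (stream, work start @ ms, work end @ ms), copied from the table above
rows = [
    (0, 14.57, 1130.75),
    (1, 23.37, 956.22),
    (2, 18.81, 1123.68),
    (3, 35.36, 890.15),
    (4, 45.22, 936.86),
    (5, 35.54, 1124.43),
    (6, 58.74, 1108.47),
    (7, 45.27, 1046.95),
    (8, 63.25, 1151.69),
    (9, 46.40, 1072.85),
]
ELAPSED_MS = 1154.69  # wall-clock elapsed, from the summary

# Per-stream work time is work end minus work start.
work_ms = {stream: end - start for stream, start, end in rows}

total_cpu_ms = sum(work_ms.values())   # "total CPU used"
speedup = total_cpu_ms / ELAPSED_MS    # sum(stream time) / wall clock
efficiency = speedup / len(rows)       # divide by workers + 1 = 10

# The stream that ends last bounds the wall-clock time from below.
last_stream, _, last_end = max(rows, key=lambda r: r[2])

print(f"total CPU {total_cpu_ms:.2f} ms")
print(f"speedup {speedup:.1f}x, efficiency {efficiency:.0%}")
print(f"last to finish: stream {last_stream} at {last_end} ms")
```
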
[Timeline chart: one bar per stream (main, w1 to w9), segmented into fork+handshake, CPU work, and parent reap wait.]
what this measures
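Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a
single CPU register, with no memory access and no shared data.
Speedup = sum(stream CPU time) / wall-clock elapsed, where each stream's CPU time is its work ms;
here 10155.52 ms / 1154.69 ms ≈ 8.8. Efficiency = speedup / (workers+1); here 8.8 / 10 = 88%.
100% efficiency means perfect linear scaling; anything less reflects the cost of serial fork setup,
the reap tail, and SMT/core contention.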
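
The shape of such a benchmark can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's own code: its language, LCG constants, and handshake protocol are not given here, so `lcg_loop`, `run_streams`, and the constants (the multiplier and increment of Knuth's MMIX generator) are hypothetical stand-ins, and the handshake step is omitted. It needs a POSIX system for `os.fork`. In CPython the loop is interpreted rather than register-resident, so only the fork, work, and reap structure carries over, not the timings.

```python
import os
import time

MASK64 = (1 << 64) - 1
# Assumed constants (Knuth's MMIX LCG); the benchmark's actual values are not given.
A = 6364136223846793005
C = 1442695040888963407


def lcg_loop(seed: int, iters: int) -> int:
    """Advance a 64-bit LCG `iters` times and return the final state."""
    x = seed & MASK64
    for _ in range(iters):
        x = (A * x + C) & MASK64
    return x


def run_streams(workers: int, iters: int) -> float:
    """Fork `workers` children, run one stream in each plus one in the parent,
    reap every child, and return the wall-clock elapsed time in milliseconds."""
    t0 = time.perf_counter()
    pids = []
    for w in range(1, workers + 1):   # forks happen one after another (serial setup)
        pid = os.fork()
        if pid == 0:                  # child: run its stream, then exit immediately
            lcg_loop(w, iters)
            os._exit(0)
        pids.append(pid)
    lcg_loop(0, iters)                # the parent runs stream 0 itself
    for pid in pids:                  # reap; a child still running makes the parent wait
        os.waitpid(pid, 0)
    return (time.perf_counter() - t0) * 1000.0
```

Calling `run_streams(9, n)` for a chosen `n` mirrors the 9 + 1 layout above. The serial fork loop and the final `waitpid` calls correspond to the fork setup and reap tail that keep efficiency below 100%.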