CPU scaling benchmark
workers: 9 + 1 main
iters total: 500M (50000000/stream)
elapsed: 1177.28 ms
total CPU used: 10274.23 ms
speedup: 8.73× vs serial
efficiency: 87.3% of 10× ideal

| stream | spawn ms | spawned@ | work start@ | work end@ | work ms | reap wait ms |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 24.83 | 24.84 | 1155.04 | 1130.2 | 0 |
| 1 | 2.36 | 2.38 | 20.21 | 1008.56 | 988.35 | 0.14 |
| 2 | 1.735 | 4.14 | 21.09 | 963.75 | 942.66 | 0.22 |
| 3 | 1.772 | 5.93 | 52.49 | 1017.39 | 964.9 | 0.23 |
| 4 | 3.232 | 9.18 | 52.43 | 1158.96 | 1106.53 | 4.03 |
| 5 | 3.022 | 12.21 | 59.59 | 1060.6 | 1001.01 | 0.24 |
| 6 | 1.813 | 14.04 | 74.78 | 1097.11 | 1022.33 | 0.26 |
| 7 | 1.698 | 15.76 | 59.66 | 1051.7 | 992.04 | 0.27 |
| 8 | 6.91 | 22.68 | 58.05 | 1090.46 | 1032.41 | 0.29 |
| 9 | 2.118 | 24.81 | 79.66 | 1173.46 | 1093.8 | 18.56 |

[Timeline chart: one row per stream (main, w1 to w9); segments show fork+handshake, CPU work, and parent reap wait.]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one CPU register,
with no memory access and no shared data. Speedup = sum(stream CPU time, i.e. the work ms column) / wall-clock elapsed.
Efficiency = speedup / (workers + 1). 100% efficiency means perfect linear scaling; anything below 100% is the cost
of serial fork setup, the reap tail, and SMT/core contention.
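
The speedup and efficiency definitions above can be checked against the reported numbers. The following is a minimal sketch, not the benchmark's own code: it copies the work ms column and the elapsed time from this report, and the names `work_ms`, `elapsed_ms`, and `scaling_metrics` are hypothetical, chosen for illustration.

```python
# Per-stream "work ms" values copied from the table above (index 0 is main).
work_ms = [1130.2, 988.35, 942.66, 964.9, 1106.53,
           1001.01, 1022.33, 992.04, 1032.41, 1093.8]

# Wall-clock elapsed time from the summary, in milliseconds.
elapsed_ms = 1177.28


def scaling_metrics(work_ms, elapsed_ms):
    """Return (total CPU ms, speedup, efficiency) for one benchmark run.

    speedup    = sum of per-stream work time / wall-clock elapsed
    efficiency = speedup / number of streams (workers + 1 for main)
    """
    total_cpu_ms = sum(work_ms)
    speedup = total_cpu_ms / elapsed_ms
    efficiency = speedup / len(work_ms)
    return total_cpu_ms, speedup, efficiency


total_cpu_ms, speedup, efficiency = scaling_metrics(work_ms, elapsed_ms)
print(f"total CPU used: {total_cpu_ms:.2f} ms")  # 10274.23 ms
print(f"speedup: {speedup:.2f}x")                # 8.73x
print(f"efficiency: {efficiency:.1%}")           # 87.3%
```

The sum of the work ms column reproduces the reported total CPU used, which suggests "stream CPU time" here is each stream's work interval (work end@ minus work start@) rather than a separately measured CPU counter.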