CPU scaling benchmark
| metric | value | note |
|---|---|---|
| workers | 12 + 1 main | |
| iters total | 500M | 38461538/stream |
| elapsed | 1166.51 ms | |
| total CPU used | 13678.85 ms | |
| speedup | 11.73× | vs serial |
| efficiency | 90.2% | of 13× ideal |

| stream | spawn ms | spawned @ ms | work start @ ms | work end @ ms | work ms | reap wait ms |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 35.21 | 35.21 | 1021.36 | 986.15 | 0 |
| 1 | 1.843 | 1.85 | 35.42 | 1132.94 | 1097.52 | 114.27 |
| 2 | 1.568 | 3.44 | 26.71 | 1136.89 | 1110.18 | 115.68 |
| 3 | 1.532 | 4.98 | 28.36 | 1118.69 | 1090.33 | 100.33 |
| 4 | 1.538 | 6.53 | 29.35 | 1091.11 | 1061.76 | 77.21 |
| 5 | 1.575 | 8.12 | 35.32 | 1140.85 | 1105.53 | 120.03 |
| 6 | 1.659 | 9.79 | 41.19 | 1163.33 | 1122.14 | 142.14 |
| 7 | 1.574 | 11.37 | 64.74 | 1043.32 | 978.58 | 38.7 |
| 8 | 1.567 | 12.95 | 62.33 | 1057.31 | 994.98 | 38.76 |
| 9 | 3.652 | 16.61 | 49.47 | 1107.55 | 1058.08 | 86.37 |
| 10 | 1.453 | 18.07 | 82.41 | 1145.14 | 1062.73 | 123.93 |
| 11 | 15.127 | 33.21 | 66.12 | 1062.17 | 996.05 | 51.09 |
| 12 | 1.974 | 35.2 | 82.39 | 1097.21 | 1014.82 | 80.95 |
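
As a cross-check, the summary figures can be recomputed from the "work ms" column of the table. The sketch below assumes the definitions this page gives (speedup is the summed stream CPU time over wall-clock elapsed, efficiency is speedup over the number of streams). The variable names are illustrative and are not taken from the benchmark's source.

```python
# Per-stream "work ms" values copied from the table above (stream 0 is main).
work_ms = [986.15, 1097.52, 1110.18, 1090.33, 1061.76, 1105.53, 1122.14,
           978.58, 994.98, 1058.08, 1062.73, 996.05, 1014.82]
elapsed_ms = 1166.51  # wall-clock elapsed, from the summary

streams = len(work_ms)               # 12 workers + 1 main
total_cpu_ms = sum(work_ms)          # "total CPU used"
speedup = total_cpu_ms / elapsed_ms  # sum(stream CPU time) / elapsed
efficiency = speedup / streams       # speedup / (workers + 1)

print(f"streams: {streams}")
print(f"total CPU: {total_cpu_ms:.2f} ms")
print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.1%}")
```

The column sums to the reported 13678.85 ms, which reproduces the 11.73× speedup and 90.2% efficiency shown in the summary.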
[Timeline chart: one row per stream (main, w1 to w12). Legend: fork+handshake, CPU work, parent reap wait.]
what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one CPU register, with no memory access
and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers+1).
100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup,
the reap tail, and SMT/core contention.
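
For illustration, here is a minimal sketch of what one stream's kernel and its CPU-time measurement might look like. It is not the benchmark's actual code. The multiplier and increment are assumed (Knuth's MMIX LCG constants), the function names `lcg_stream`, `split_iters`, and `timed_stream` are hypothetical, and in Python the state naturally does not live in a single CPU register as it would in a compiled loop.

```python
import time

MASK64 = (1 << 64) - 1
# Assumed constants (Knuth's MMIX LCG); the benchmark's real values are not stated.
A = 6364136223846793005
C = 1442695040888963407


def lcg_stream(seed: int, iters: int) -> int:
    """Advance a 64-bit LCG `iters` times; the only state is the integer x."""
    x = seed & MASK64
    for _ in range(iters):
        x = (A * x + C) & MASK64
    return x


def split_iters(total_iters: int, streams: int) -> int:
    """Iterations per stream when the total is divided evenly (rounded down)."""
    return total_iters // streams


def timed_stream(seed: int, iters: int) -> tuple[int, float]:
    """Run one stream and return (final state, CPU time in ms for this process)."""
    start = time.process_time()
    result = lcg_stream(seed, iters)
    cpu_ms = (time.process_time() - start) * 1000.0
    return result, cpu_ms


if __name__ == "__main__":
    # 500M iterations over 13 streams gives the per-stream count in the summary.
    print(split_iters(500_000_000, 13))
    # A much smaller count keeps this demo quick in an interpreter.
    state, cpu_ms = timed_stream(seed=1, iters=100_000)
    print(f"final state {state}, CPU time {cpu_ms:.2f} ms")
```

Summing the per-stream `cpu_ms` values and dividing by wall-clock elapsed gives the speedup as defined above.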