CPU scaling benchmark
workers: 16 + 1 main
iters total: 500M (29411764/stream)
elapsed: 1214.45 ms
total CPU used: 18095.68 ms
speedup: 14.9× vs serial
efficiency: 87.6% of 17× ideal
| stream | spawn ms | spawned@ | work start@ | work end@ | work ms | reap wait ms |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 76.85 | 76.86 | 1139.03 | 1062.17 | 0 |
| 1 | 1.906 | 1.92 | 17.33 | 1091.15 | 1073.82 | 0.19 |
| 2 | 1.553 | 3.5 | 22.22 | 1089.82 | 1067.6 | 0.24 |
| 3 | 1.527 | 5.05 | 19.5 | 1117.52 | 1098.02 | 0.26 |
| 4 | 2.759 | 7.82 | 29.66 | 1168.59 | 1138.93 | 39.44 |
| 5 | 2.664 | 10.5 | 42.14 | 1130.84 | 1088.7 | 0.27 |
| 6 | 8.421 | 18.94 | 45.41 | 1048.76 | 1003.35 | 39.33 |
| 7 | 1.764 | 20.73 | 60.93 | 1154.22 | 1093.29 | 39.45 |
| 8 | 11.781 | 32.53 | 82.16 | 1202.17 | 1120.01 | 63.23 |
| 9 | 2.028 | 34.58 | 92.13 | 1128.9 | 1036.77 | 39.35 |
| 10 | 1.675 | 36.27 | 100.24 | 1186.45 | 1086.21 | 47.54 |
| 11 | 10.406 | 46.7 | 96.14 | 1193.04 | 1096.9 | 58.68 |
| 12 | 2.22 | 48.93 | 97.45 | 1113.95 | 1016.5 | 39.37 |
| 13 | 1.761 | 50.71 | 112.19 | 1151.63 | 1039.44 | 39.46 |
| 14 | 21.73 | 72.47 | 152.12 | 1211.39 | 1059.27 | 72.47 |
| 15 | 2.295 | 74.79 | 165.83 | 1198.45 | 1032.62 | 59.57 |
| 16 | 2.021 | 76.83 | 142.9 | 1124.98 | 982.08 | 39.38 |
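The timestamp columns can be cross-checked against the work ms column. The sketch below assumes, without confirmation from the source, that work ms = work end@ − work start@ and that the @ columns are milliseconds since the run began. The tuples are copied from the table; the names `ROWS`, `TOLERANCE_MS`, `mismatches`, `first_start`, and `last_end` are illustrative only.

```python
# (stream, work_start_ms, work_end_ms, work_ms), copied from the table.
ROWS = [
    (0, 76.86, 1139.03, 1062.17),
    (1, 17.33, 1091.15, 1073.82),
    (2, 22.22, 1089.82, 1067.6),
    (3, 19.5, 1117.52, 1098.02),
    (4, 29.66, 1168.59, 1138.93),
    (5, 42.14, 1130.84, 1088.7),
    (6, 45.41, 1048.76, 1003.35),
    (7, 60.93, 1154.22, 1093.29),
    (8, 82.16, 1202.17, 1120.01),
    (9, 92.13, 1128.9, 1036.77),
    (10, 100.24, 1186.45, 1086.21),
    (11, 96.14, 1193.04, 1096.9),
    (12, 97.45, 1113.95, 1016.5),
    (13, 112.19, 1151.63, 1039.44),
    (14, 152.12, 1211.39, 1059.27),
    (15, 165.83, 1198.45, 1032.62),
    (16, 142.9, 1124.98, 982.08),
]

TOLERANCE_MS = 0.005  # table values are rounded to two decimals

# Assumption: work ms is the difference of the two timestamps.
mismatches = [
    stream
    for stream, start, end, work in ROWS
    if abs((end - start) - work) > TOLERANCE_MS
]

# Streams that bound the work phase on either side.
first_start = min(ROWS, key=lambda row: row[1])
last_end = max(ROWS, key=lambda row: row[2])

print("rows where end - start != work ms:", mismatches)
print(f"first stream to start work: {first_start[0]} at {first_start[1]} ms")
print(f"last stream to finish work: {last_end[0]} at {last_end[2]} ms")
```

Under that assumption every row is consistent, and the last work end (stream 14 at 1211.39 ms) falls just under the reported 1214.45 ms elapsed.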
[Timeline chart: one row per stream (main, w1 to w16); segments show fork+handshake, CPU work, and parent reap wait.]
what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data.
Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1).
100% efficiency means perfect linear scaling; anything less is the cost of serial fork setup,
the reap tail, and SMT/core contention.
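As a worked check, the two formulas can be applied to the figures reported in this run. This is a minimal sketch, assuming that "stream CPU time" corresponds to the work ms column of the table (its sum matches the reported total CPU used). The function and constant names are illustrative and not taken from the benchmark's source.

```python
# Per-stream work ms, copied from the table (stream 0 is main).
WORK_MS = [
    1062.17, 1073.82, 1067.6, 1098.02, 1138.93, 1088.7,
    1003.35, 1093.29, 1120.01, 1036.77, 1086.21, 1096.9,
    1016.5, 1039.44, 1059.27, 1032.62, 982.08,
]
ELAPSED_MS = 1214.45  # wall-clock elapsed, as reported
WORKERS = 16          # forked workers; main adds one more stream


def speedup(stream_cpu_ms, elapsed_ms):
    """Speedup = sum(stream CPU time) / wall-clock elapsed."""
    return sum(stream_cpu_ms) / elapsed_ms


def efficiency(speedup_value, workers):
    """Efficiency = speedup / (workers + 1)."""
    return speedup_value / (workers + 1)


total_cpu_ms = sum(WORK_MS)
s = speedup(WORK_MS, ELAPSED_MS)
e = efficiency(s, WORKERS)

print(f"total CPU used: {total_cpu_ms:.2f} ms")  # 18095.68 ms
print(f"speedup: {s:.1f}x")                      # 14.9x
print(f"efficiency: {e:.1%}")                    # 87.6%
```

The results reproduce the headline figures: 18095.68 ms of CPU time over 1214.45 ms of wall-clock time gives about 14.9×, which is 87.6% of the 17× ideal.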