CPU scaling benchmark

| metric | value |
|---|---|
| workers | 16 + 1 main |
| iters total | 500M (29411764/stream) |
| elapsed | 1181.27 ms |
| total CPU used | 17941.67 ms |
| speedup | 15.19× vs serial |
| efficiency | 89.4% of 17× ideal |

| stream | spawn ms | spawned @ ms | work start @ ms | work end @ ms | work ms | reap wait ms |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 67.42 | 67.44 | 1021.57 | 954.13 | 0 |
| 1 | 2.049 | 2.07 | 15.92 | 1107.59 | 1091.67 | 112.16 |
| 2 | 1.495 | 3.58 | 18.17 | 1067.5 | 1049.33 | 48.51 |
| 3 | 1.504 | 5.11 | 24.3 | 1107.05 | 1082.75 | 112.18 |
| 4 | 2.142 | 7.27 | 31.9 | 1138.12 | 1106.22 | 119.53 |
| 5 | 4.712 | 11.99 | 55.55 | 1153.13 | 1097.58 | 140.43 |
| 6 | 1.579 | 13.59 | 47.54 | 1135.86 | 1088.32 | 128.29 |
| 7 | 5.338 | 18.94 | 72.72 | 1162.79 | 1090.07 | 142.37 |
| 8 | 1.646 | 20.61 | 55.04 | 1164.31 | 1109.27 | 143.78 |
| 9 | 4.209 | 24.83 | 49.92 | 1107.25 | 1057.33 | 119.5 |
| 10 | 1.593 | 26.44 | 69.89 | 1149 | 1079.11 | 140.45 |
| 11 | 21.15 | 47.61 | 89.53 | 1169.19 | 1079.66 | 147.73 |
| 12 | 2.144 | 49.77 | 101.92 | 1099.84 | 997.92 | 79.72 |
| 13 | 1.527 | 51.31 | 121.89 | 1093.25 | 971.36 | 112.11 |
| 14 | 12.594 | 63.92 | 121.91 | 1144.88 | 1022.97 | 142.33 |
| 15 | 1.958 | 65.89 | 141.57 | 1178.36 | 1036.79 | 156.95 |
| 16 | 1.507 | 67.41 | 131.88 | 1159.07 | 1027.19 | 142.35 |
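
The columns in the table are related by simple arithmetic: each stream's work ms is its work end minus its work start, and the work ms values add up to the reported total CPU used. A minimal Python sketch of that cross-check, with the values copied by hand from the rows above (the variable names are illustrative, not taken from the benchmark):

```python
# (work_start_ms, work_end_ms, work_ms) per stream, copied from the table.
# Row 0 is the main stream; rows 1..16 are the forked workers.
rows = [
    (67.44, 1021.57, 954.13),
    (15.92, 1107.59, 1091.67),
    (18.17, 1067.50, 1049.33),
    (24.30, 1107.05, 1082.75),
    (31.90, 1138.12, 1106.22),
    (55.55, 1153.13, 1097.58),
    (47.54, 1135.86, 1088.32),
    (72.72, 1162.79, 1090.07),
    (55.04, 1164.31, 1109.27),
    (49.92, 1107.25, 1057.33),
    (69.89, 1149.00, 1079.11),
    (89.53, 1169.19, 1079.66),
    (101.92, 1099.84, 997.92),
    (121.89, 1093.25, 971.36),
    (121.91, 1144.88, 1022.97),
    (141.57, 1178.36, 1036.79),
    (131.88, 1159.07, 1027.19),
]

# Largest disagreement between (end - start) and the reported work ms.
max_row_error = max(abs((end - start) - work) for start, end, work in rows)

# Sum of per-stream work, to compare with "total CPU used" in the summary.
total_cpu_ms = sum(work for _, _, work in rows)

print(f"max row error: {max_row_error:.4f} ms")
print(f"total CPU:     {total_cpu_ms:.2f} ms")
```

The per-row differences agree with the work ms column to within floating-point noise, and the sum matches the 17941.67 ms shown as total CPU used.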

[Per-stream chart, rows main and w1 to w16; legend: fork+handshake, CPU work, parent reap wait]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data.
Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1), where the +1 counts the main stream.
100% efficiency means perfect linear scaling; anything below 100% reflects the cost of serial fork setup,
the reap tail, and SMT/core contention.
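
The two formulas above can be applied directly to the summary figures. The following is a small sketch in Python; the function names are hypothetical, and the inputs are the rounded values displayed on this page rather than the benchmark's internal measurements.

```python
def speedup(total_cpu_ms: float, elapsed_ms: float) -> float:
    """Speedup = sum(stream CPU time) / wall-clock elapsed."""
    return total_cpu_ms / elapsed_ms


def efficiency(speedup_x: float, workers: int) -> float:
    """Efficiency = speedup / (workers + 1); the +1 is the main stream."""
    return speedup_x / (workers + 1)


# Figures as displayed in the summary: 17941.67 ms CPU, 1181.27 ms elapsed,
# 16 forked workers plus the main stream.
s = speedup(17941.67, 1181.27)
e = efficiency(s, 16)

print(f"speedup {s:.2f}x, efficiency {e:.1%}")
```

Recomputed from the displayed values this gives roughly 15.19× and 89.3%. The page reports 89.4%, a small difference that could come from rounding at an intermediate step (15.19 / 17 rounds to 89.4%).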