CPU scaling benchmark

| metric | value |
|---|---|
| workers | 16 + 1 main |
| iters total | 500M (29411764/stream) |
| elapsed | 1187.66 ms |
| total CPU used | 17430.9 ms |
| speedup | 14.68× vs serial |
| efficiency | 86.4% of 17× ideal |

| stream | spawn ms | spawned @ ms | work start @ ms | work end @ ms | work ms | reap wait ms |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 84.05 | 84.06 | 1076.03 | 991.97 | 0 |
| 1 | 2.052 | 2.07 | 14.74 | 919.1 | 904.36 | 0.16 |
| 2 | 1.543 | 3.63 | 17.07 | 1119.07 | 1102 | 47.7 |
| 3 | 2.05 | 5.72 | 20.06 | 1113.47 | 1093.41 | 37.52 |
| 4 | 2.01 | 7.75 | 61.54 | 1062.21 | 1000.67 | 0.21 |
| 5 | 5.822 | 13.59 | 68.48 | 1183.9 | 1115.42 | 108.02 |
| 6 | 2.155 | 15.76 | 45.4 | 1164 | 1118.6 | 88.33 |
| 7 | 2.017 | 17.8 | 63.62 | 1143.05 | 1079.43 | 72.47 |
| 8 | 5.684 | 23.5 | 74.8 | 1142.13 | 1067.33 | 66.22 |
| 9 | 1.627 | 25.14 | 48.47 | 1161.59 | 1113.12 | 92.18 |
| 10 | 1.504 | 26.67 | 79.95 | 1103.75 | 1023.8 | 35.79 |
| 11 | 4.3 | 30.99 | 78.34 | 1154.51 | 1076.17 | 81.16 |
| 12 | 18.821 | 49.83 | 112.43 | 992.89 | 880.46 | 22.14 |
| 13 | 2.139 | 51.99 | 110.69 | 1085.28 | 974.59 | 22.21 |
| 14 | 26.459 | 78.46 | 109.76 | 1096.64 | 986.88 | 22.22 |
| 15 | 3.572 | 82.05 | 158.22 | 1184.52 | 1026.3 | 110.88 |
| 16 | 1.974 | 84.03 | 158.22 | 1034.61 | 876.39 | 22.16 |
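
The work ms column can be cross-checked against the two timestamp columns and against the headline total. The sketch below assumes that work ms is work end minus work start, and that total CPU used is the sum of work ms over all 17 streams; the page states neither, although both hold for the values shown. The names `ROWS`, `max_row_error` and `total_work_ms` are illustrative only.

```python
# (work start, work end, work ms) copied from the table, stream 0 (main) first.
ROWS = [
    (84.06, 1076.03, 991.97),
    (14.74, 919.1, 904.36),
    (17.07, 1119.07, 1102),
    (20.06, 1113.47, 1093.41),
    (61.54, 1062.21, 1000.67),
    (68.48, 1183.9, 1115.42),
    (45.4, 1164, 1118.6),
    (63.62, 1143.05, 1079.43),
    (74.8, 1142.13, 1067.33),
    (48.47, 1161.59, 1113.12),
    (79.95, 1103.75, 1023.8),
    (78.34, 1154.51, 1076.17),
    (112.43, 992.89, 880.46),
    (110.69, 1085.28, 974.59),
    (109.76, 1096.64, 986.88),
    (158.22, 1184.52, 1026.3),
    (158.22, 1034.61, 876.39),
]

REPORTED_TOTAL_CPU_MS = 17430.9  # "total CPU used" from the summary


def max_row_error(rows):
    """Largest |(work end - work start) - work ms| over all rows, in ms."""
    return max(abs((end - start) - work) for start, end, work in rows)


def total_work_ms(rows):
    """Sum of the work ms column."""
    return sum(work for _, _, work in rows)


if __name__ == "__main__":
    print(f"max per-row error: {max_row_error(ROWS):.4f} ms")
    print(f"sum of work ms:    {total_work_ms(ROWS):.2f} ms")
    print(f"reported total:    {REPORTED_TOTAL_CPU_MS:.2f} ms")
```

Because the work ms values are wall-clock intervals between two timestamps, the "CPU used" total equals true CPU time only if each stream stayed on a core for its whole work phase.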

[Timeline chart, not reproduced: one row per stream (main, w1 to w16), each split into fork+handshake, CPU work, and parent reap wait.]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a
single CPU register, with no memory access and no shared data.
Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers+1).
100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup,
the reap tail, and SMT/core contention.
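
As a rough illustration of the kernel and the two formulas, here is a minimal Python sketch. The LCG constants are the widely published Numerical Recipes ones and are an assumption, since the page does not give the benchmark's own; the function names are hypothetical. Python integers are heap objects, so this shows the recurrence only and does not reproduce the register-only working set. It also assumes "500M" means exactly 500,000,000 iterations.

```python
# Assumed LCG parameters (Numerical Recipes); not taken from the benchmark.
A = 1664525
C = 1013904223
MASK = 0xFFFFFFFF  # reduces modulo 2**32

TOTAL_ITERS = 500_000_000  # assumption: "500M" is exactly this
STREAMS = 17               # 16 workers + main


def lcg_stream(seed: int, iters: int) -> int:
    """Advance x -> (A*x + C) mod 2**32 `iters` times and return the final state."""
    x = seed & MASK
    for _ in range(iters):
        x = (A * x + C) & MASK
    return x


def speedup(sum_stream_cpu_ms: float, elapsed_ms: float) -> float:
    """Speedup = sum(stream CPU time) / wall-clock elapsed."""
    return sum_stream_cpu_ms / elapsed_ms


def efficiency(speedup_x: float, workers: int) -> float:
    """Efficiency = speedup / (workers + 1); the +1 counts the main stream."""
    return speedup_x / (workers + 1)


if __name__ == "__main__":
    print("iters per stream:", TOTAL_ITERS // STREAMS)
    print("state after 1000 steps from seed 1:", lcg_stream(1, 1000))
    s = speedup(17430.9, 1187.66)
    print(f"speedup {s:.2f}x, efficiency {efficiency(s, 16):.1%}")
```

Recomputed from the displayed figures, efficiency comes out near 86.3%, a tenth of a point under the reported 86.4%. Dividing the rounded speedup 14.68 by 17 gives 86.35%, which rounds to the reported value and may be how the page arrived at it.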