CPU scaling benchmark

| metric | value | note |
|---|---|---|
| workers | 16 + 1 main | |
| iters total | 500M | 29411764/stream |
| elapsed | 1199.94 ms | |
| total CPU used | 16411.19 ms | |
| speedup | 13.68× | vs serial |
| efficiency | 80.5% | of 17× ideal |

| stream | spawn ms | spawned@ (ms) | work start@ (ms) | work end@ (ms) | work ms | reap wait ms |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 120.92 | 120.92 | 1196.07 | 1075.15 | 0 |
| 1 | 2.361 | 2.38 | 18.26 | 1029.23 | 1010.97 | 0.18 |
| 2 | 1.684 | 4.08 | 32.29 | 892.2 | 859.91 | 0.25 |
| 3 | 1.793 | 5.89 | 29.68 | 1050.51 | 1020.83 | 0.27 |
| 4 | 5.282 | 11.18 | 57.9 | 1178.41 | 1120.51 | 0.28 |
| 5 | 1.734 | 12.93 | 43.98 | 1002.58 | 958.6 | 0.29 |
| 6 | 1.76 | 14.7 | 72.02 | 1188 | 1115.98 | 0.31 |
| 7 | 4.861 | 19.58 | 65.16 | 1065.85 | 1000.69 | 0.33 |
| 8 | 1.79 | 21.38 | 57.55 | 952.65 | 895.1 | 0.34 |
| 9 | 7.777 | 29.17 | 95.19 | 1135.67 | 1040.48 | 0.35 |
| 10 | 2.355 | 31.54 | 83.79 | 605.67 | 521.88 | 0.36 |
| 11 | 1.801 | 33.36 | 85.1 | 948.35 | 863.25 | 0.37 |
| 12 | 32.384 | 65.77 | 136.47 | 1196.95 | 1060.48 | 0.99 |
| 13 | 2.611 | 68.4 | 155.09 | 1168.38 | 1013.29 | 0.38 |
| 14 | 1.902 | 70.31 | 166.75 | 1160.82 | 994.07 | 0.4 |
| 15 | 35.047 | 105.38 | 166.66 | 1122.27 | 955.61 | 0.42 |
| 16 | 15.518 | 120.91 | 225.07 | 1129.46 | 904.39 | 0.44 |

[Figure: per-stream chart with one row each for main and w1 to w16. Legend: fork+handshake, CPU work, parent reap wait.]

what this measures
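Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one CPU register,
with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed.
Efficiency = speedup / (workers + 1), where the +1 counts the main stream. For this run,
16411.19 ms / 1199.94 ms = 13.68×, and 13.68 / 17 = 80.5%. 100% efficiency means perfect linear scaling;
anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.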
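
The definitions above can be checked against the table with a short Python sketch. It is illustrative only: the benchmark's own source is not shown here, so `lcg_stream`, its 32-bit modulus, and its multiplier and increment (the widely published Numerical Recipes values) are assumptions standing in for the real loop, and the names `WORK_MS`, `speedup`, and `efficiency` are hypothetical. The per-stream times are copied from the "work ms" column.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit state; the real benchmark's width is not stated


def lcg_stream(seed: int, iters: int) -> int:
    """Advance an LCG `iters` times; the whole working set is the single value x."""
    x = seed & MASK32
    for _ in range(iters):
        # Assumed constants (Numerical Recipes), not taken from the benchmark.
        x = (1664525 * x + 1013904223) & MASK32
    return x


# "work ms" per stream, copied from the results table (index 0 is main).
WORK_MS = [
    1075.15, 1010.97, 859.91, 1020.83, 1120.51, 958.6, 1115.98, 1000.69, 895.1,
    1040.48, 521.88, 863.25, 1060.48, 1013.29, 994.07, 955.61, 904.39,
]
ELAPSED_MS = 1199.94  # wall-clock elapsed, from the summary


def speedup(work_ms: list[float], elapsed_ms: float) -> float:
    """Speedup = sum(stream CPU time) / wall-clock elapsed."""
    return sum(work_ms) / elapsed_ms


def efficiency(work_ms: list[float], elapsed_ms: float) -> float:
    """Efficiency = speedup / number of streams (workers + 1 for main)."""
    return speedup(work_ms, elapsed_ms) / len(work_ms)


total_cpu_ms = sum(WORK_MS)
s = speedup(WORK_MS, ELAPSED_MS)
e = efficiency(WORK_MS, ELAPSED_MS)

print(f"total CPU:  {total_cpu_ms:.2f} ms")
print(f"speedup:    {s:.2f}x")
print(f"efficiency: {e:.1%}")
```

Run as written, this reproduces the summary figures: 16411.19 ms total CPU, 13.68x speedup, and 80.5% efficiency. The sum of the "work ms" column is therefore what the summary reports as total CPU used.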