CPU scaling benchmark
| metric | value |
|---|---|
| workers | 16 + 1 main |
| iters total | 500M (29411764/stream) |
| elapsed | 1207.95 ms |
| total CPU used | 17957.43 ms |
| speedup | 14.87× vs serial |
| efficiency | 87.5% of 17× ideal |

| stream | spawn ms | spawned @ ms | work start @ ms | work end @ ms | work ms | reap wait ms |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 89.95 | 89.96 | 1130.07 | 1040.11 | 0 |
| 1 | 1.885 | 1.9 | 15.07 | 952.81 | 937.74 | 0.14 |
| 2 | 1.507 | 3.42 | 18.26 | 1121.77 | 1103.51 | 0.19 |
| 3 | 1.584 | 5.02 | 26.72 | 1124.53 | 1097.81 | 0.21 |
| 4 | 1.498 | 6.53 | 33.22 | 1105.38 | 1072.16 | 0.22 |
| 5 | 1.48 | 8.03 | 51.88 | 1109.95 | 1058.07 | 0.23 |
| 6 | 5.835 | 13.87 | 51 | 1181.26 | 1130.26 | 54.46 |
| 7 | 1.409 | 15.3 | 43.28 | 1180.65 | 1137.37 | 54.5 |
| 8 | 9.799 | 25.11 | 68.59 | 1150.29 | 1081.7 | 28.18 |
| 9 | 2.314 | 27.44 | 57.88 | 1046.03 | 988.15 | 0.24 |
| 10 | 8.759 | 36.21 | 74.19 | 1164.8 | 1090.61 | 34.85 |
| 11 | 9.432 | 45.67 | 98.68 | 1170.35 | 1071.67 | 40.38 |
| 12 | 1.94 | 47.62 | 107.1 | 1094.66 | 987.56 | 0.26 |
| 13 | 12.458 | 60.1 | 117.69 | 1147.64 | 1029.95 | 17.7 |
| 14 | 1.947 | 62.06 | 118.64 | 1136.68 | 1018.04 | 8.45 |
| 15 | 1.464 | 63.58 | 116.44 | 1182.87 | 1066.43 | 54.52 |
| 16 | 26.351 | 89.94 | 158.65 | 1204.94 | 1046.29 | 75.02 |
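
The table is internally consistent, and a short script can confirm it. The sketch below is a cross-check written for this report, not part of the benchmark itself. It transcribes the work start, work end, and work ms values from the rows above, then checks that each work ms equals end minus start and that the work ms column sums to the reported total CPU figure. The variable names are illustrative.

```python
# (work start, work end, work ms) per stream, transcribed from the table.
# Index 0 is the main stream; 1..16 are the forked workers.
rows = [
    (89.96, 1130.07, 1040.11),
    (15.07, 952.81, 937.74),
    (18.26, 1121.77, 1103.51),
    (26.72, 1124.53, 1097.81),
    (33.22, 1105.38, 1072.16),
    (51.88, 1109.95, 1058.07),
    (51.0, 1181.26, 1130.26),
    (43.28, 1180.65, 1137.37),
    (68.59, 1150.29, 1081.70),
    (57.88, 1046.03, 988.15),
    (74.19, 1164.80, 1090.61),
    (98.68, 1170.35, 1071.67),
    (107.10, 1094.66, 987.56),
    (117.69, 1147.64, 1029.95),
    (118.64, 1136.68, 1018.04),
    (116.44, 1182.87, 1066.43),
    (158.65, 1204.94, 1046.29),
]

# Largest disagreement between (end - start) and the reported work ms.
max_row_error = max(abs((end - start) - work) for start, end, work in rows)

# Sum of the work ms column, to compare with "total CPU used".
total_work_ms = sum(work for _, _, work in rows)

# Stream whose work finishes last; it bounds the wall-clock elapsed time.
last_stream = max(range(len(rows)), key=lambda i: rows[i][1])

print(f"max row error: {max_row_error:.3f} ms")
print(f"sum of work ms: {total_work_ms:.2f} ms")
print(f"last stream to finish: {last_stream} at {rows[last_stream][1]} ms")
```

The last stream to finish is worker 16, which is also the last one spawned, so its late start shows up directly in the elapsed time.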
[Figure: per-stream timeline, one row each for main and w1 to w16. Legend: fork+handshake, CPU work, parent reap wait.]
what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data.
Speedup = sum(stream CPU time) / wall-clock elapsed; here 17957.43 ms / 1207.95 ms ≈ 14.87×.
Efficiency = speedup / (workers + 1), where the +1 is the main stream; here 14.87 / 17 ≈ 87.5%.
100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.
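
A minimal sketch of the kernel and the two formulas above, under stated assumptions: the benchmark's actual LCG constants and source are not shown here, so the multiplier and increment below are the widely used Numerical Recipes 32-bit values, chosen only for illustration. The function names are hypothetical too. The figures fed into the formulas are the ones reported above.

```python
MASK32 = 0xFFFFFFFF  # keep the state to 32 bits, like a single register


def lcg_stream(seed: int, iters: int) -> int:
    """Advance a 32-bit LCG `iters` times and return the final state.

    Assumed constants (not taken from the benchmark): a = 1664525,
    c = 1013904223, modulus 2**32.
    """
    x = seed & MASK32
    for _ in range(iters):
        x = (1664525 * x + 1013904223) & MASK32
    return x


def speedup(total_cpu_ms: float, elapsed_ms: float) -> float:
    """Sum of per-stream CPU time divided by wall-clock elapsed time."""
    return total_cpu_ms / elapsed_ms


def efficiency(speedup_value: float, streams: int) -> float:
    """Speedup as a fraction of the ideal, one unit per stream."""
    return speedup_value / streams


streams = 16 + 1                            # 16 workers plus the main stream
iters_per_stream = 500_000_000 // streams   # equal split, remainder dropped

s = speedup(17957.43, 1207.95)
e = efficiency(s, streams)

print(f"iters per stream: {iters_per_stream}")
print(f"speedup: {s:.2f}x")
print(f"efficiency: {e:.1%}")
```

Computed from the unrounded speedup, the efficiency can differ from the headline figure in the last displayed digit. The headline value is consistent with dividing the rounded 14.87 by 17.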