CPU scaling benchmark

| metric | value | note |
|---|---|---|
| workers | 16 | +1 main |
| iters total | 500M | 29411764/stream |
| elapsed | 1209.99 ms | |
| total CPU used | 16671.39 ms | |
| speedup | 13.78× | vs serial |
| efficiency | 81.1% | of 17× ideal |

| stream | spawn (ms) | spawned @ (ms) | work start @ (ms) | work end @ (ms) | work (ms) | reap wait (ms) |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 116.8 | 116.81 | 1132.91 | 1016.1 | 0 |
| 1 | 2.114 | 2.13 | 20.55 | 1164.65 | 1144.1 | 31.85 |
| 2 | 1.59 | 3.74 | 18.04 | 1071.46 | 1053.42 | 0.16 |
| 3 | 1.578 | 5.34 | 24.59 | 1160.93 | 1136.34 | 28.14 |
| 4 | 5.487 | 10.85 | 36.7 | 1178.37 | 1141.67 | 49.62 |
| 5 | 1.707 | 12.58 | 53.65 | 1199.64 | 1145.99 | 66.84 |
| 6 | 1.472 | 14.07 | 45.45 | 1143.33 | 1097.88 | 13.75 |
| 7 | 1.468 | 15.56 | 49.87 | 1075.51 | 1025.64 | 0.23 |
| 8 | 8.94 | 24.52 | 68.93 | 748.21 | 679.28 | 0.24 |
| 9 | 9.66 | 34.21 | 91.37 | 1206.07 | 1114.7 | 73.28 |
| 10 | 9.263 | 43.49 | 99.11 | 1206.46 | 1107.35 | 76.25 |
| 11 | 2.015 | 45.52 | 96.49 | 1177.18 | 1080.69 | 50.95 |
| 12 | 12.118 | 57.66 | 134.68 | 994.47 | 859.79 | 0.25 |
| 13 | 2.187 | 59.87 | 99.35 | 873.67 | 774.32 | 0.27 |
| 14 | 28.281 | 88.17 | 131.17 | 1105.58 | 974.41 | 0.29 |
| 15 | 19.059 | 107.25 | 140.12 | 562.99 | 422.87 | 0.32 |
| 16 | 9.503 | 116.77 | 166.67 | 1063.51 | 896.84 | 0.33 |

[Timeline chart: one row per stream (main, w1 to w16); segments: fork+handshake, CPU work, parent reap wait]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single
CPU register, with no memory access and no shared data.
Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1), where the +1
counts the main stream. 100% efficiency means perfect linear scaling; anything below 100% is the cost of
serial fork setup, the reap tail, and SMT/core contention.
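
As a rough check, the two formulas can be applied to the per-stream "work ms" column of the table. This is a minimal sketch, not the benchmark's own code: the function name `scaling` and the variable names are illustrative assumptions, and the inputs are the rounded values shown in the report.

```python
# Per-stream "work ms" values copied from the table (index 0 is the main stream).
work_ms = [
    1016.1, 1144.1, 1053.42, 1136.34, 1141.67, 1145.99, 1097.88, 1025.64,
    679.28, 1114.7, 1107.35, 1080.69, 859.79, 774.32, 974.41, 422.87, 896.84,
]
elapsed_ms = 1209.99  # wall-clock elapsed, from the summary
workers = 16          # forked workers; the main stream adds one more


def scaling(work_ms, elapsed_ms, workers):
    """Hypothetical helper: apply the speedup and efficiency formulas."""
    total_cpu_ms = sum(work_ms)               # sum(stream CPU time)
    speedup = total_cpu_ms / elapsed_ms       # vs. wall-clock elapsed
    efficiency = speedup / (workers + 1)      # fraction of the 17x ideal
    return total_cpu_ms, speedup, efficiency


total_cpu_ms, speedup, efficiency = scaling(work_ms, elapsed_ms, workers)
print(f"total CPU {total_cpu_ms:.2f} ms, "
      f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Because the inputs are already rounded to two decimals, the last digit of a recomputed figure can differ slightly from the one shown in the summary.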