CPU scaling benchmark

| metric | value |
|---|---|
| workers | 10 + 1 main |
| iterations (total) | 500M (45454545 per stream) |
| elapsed (wall clock) | 1161.5 ms |
| total CPU used | 11372.4 ms |
| speedup vs serial | 9.79× |
| efficiency | 89% of 11× ideal |

| stream | spawn (ms) | spawned @ (ms) | work start @ (ms) | work end @ (ms) | work (ms) | reap wait (ms) |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 22.46 | 22.47 | 1112.95 | 1090.48 | 0 |
| 1 | 2.097 | 2.11 | 14.74 | 877.23 | 862.49 | 0.12 |
| 2 | 1.607 | 3.74 | 48.33 | 1158.6 | 1110.27 | 45.76 |
| 3 | 1.563 | 5.32 | 44.54 | 1118.5 | 1073.96 | 8.29 |
| 4 | 1.632 | 6.97 | 59.91 | 984.88 | 924.97 | 0.18 |
| 5 | 1.535 | 8.52 | 48.02 | 1118.13 | 1070.11 | 8.25 |
| 6 | 1.664 | 10.2 | 33.33 | 1133.05 | 1099.72 | 22.56 |
| 7 | 1.503 | 11.71 | 49.9 | 1126.48 | 1076.58 | 25.54 |
| 8 | 4.226 | 15.95 | 69.36 | 1140.72 | 1071.36 | 27.91 |
| 9 | 1.736 | 17.7 | 45.59 | 1023.76 | 978.17 | 0.19 |
| 10 | 4.737 | 22.45 | 63.11 | 1077.4 | 1014.29 | 0.21 |
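
The summary figures can be recomputed from this table: the reported total CPU used (11372.4 ms) is exactly the sum of the work ms column, and speedup and efficiency follow from the definitions under "what this measures". A minimal sketch; the function name `scaling_metrics` is illustrative, not part of the benchmark:

```python
# Per-stream "work ms" values copied from the table (stream 0 is main).
WORK_MS = [1090.48, 862.49, 1110.27, 1073.96, 924.97, 1070.11,
           1099.72, 1076.58, 1071.36, 978.17, 1014.29]
ELAPSED_MS = 1161.5  # wall-clock elapsed for the whole run


def scaling_metrics(work_ms, elapsed_ms):
    """Return (total CPU ms, speedup, efficiency) for a list of per-stream work times."""
    total_cpu = sum(work_ms)             # serial-equivalent time: all streams' work back to back
    speedup = total_cpu / elapsed_ms     # sum(stream CPU time) / wall-clock elapsed
    efficiency = speedup / len(work_ms)  # ideal speedup is the stream count (workers + 1)
    return total_cpu, speedup, efficiency


total, speedup, eff = scaling_metrics(WORK_MS, ELAPSED_MS)
print(f"total CPU {total:.1f} ms, speedup {speedup:.2f}x, efficiency {eff:.0%}")
# prints: total CPU 11372.4 ms, speedup 9.79x, efficiency 89%
```
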

[Figure: per-stream timeline, one row each for main and w1 to w10; segments: fork+handshake, CPU work, parent reap wait]
what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single
CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock
elapsed, i.e. the serial run time is estimated as the total CPU time of all streams. Efficiency =
speedup / (workers + 1). 100% efficiency means perfect linear scaling; anything below 100% is the
cost of serial fork setup, the reap tail, and SMT/core contention.
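
The fork / work / reap structure behind these numbers can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not the benchmark's own code: the LCG constants (Knuth's MMIX parameters), the helper names `lcg_loop` and `run`, and the use of `os.wait4` resource usage to collect each child's CPU time are all illustrative choices. Python big-int arithmetic is far from "one CPU register", so absolute timings will not match the table; only the shape of the measurement and the metric definitions carry over. Requires a Unix-like OS for `os.fork`.

```python
import os
import time

MASK64 = (1 << 64) - 1
# Hypothetical constants (Knuth's MMIX LCG); the benchmark does not state its own.
A, C = 6364136223846793005, 1442695040888963407


def lcg_loop(iters, seed=1):
    """Integer LCG loop: only a local accumulator, no shared data."""
    x = seed
    for _ in range(iters):
        x = (A * x + C) & MASK64  # x <- (A*x + C) mod 2**64
    return x


def run(workers, iters_total):
    """Fork `workers` children; main is the extra stream. Returns (per_stream, speedup, efficiency)."""
    per_stream = iters_total // (workers + 1)  # e.g. 500M // 11 == 45454545
    t0 = time.perf_counter()
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:  # child: do its share, never return into the parent's loop
            code = 1
            try:
                lcg_loop(per_stream)
                code = 0
            finally:
                os._exit(code)
        pids.append(pid)

    c0 = time.process_time()  # CPU time of this (main) process only
    lcg_loop(per_stream)
    cpu_s = time.process_time() - c0

    for pid in pids:  # reap each child and add its user+system CPU time
        _, status, ru = os.wait4(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            raise RuntimeError(f"worker {pid} failed")
        cpu_s += ru.ru_utime + ru.ru_stime
    elapsed_s = time.perf_counter() - t0  # includes serial fork setup and the reap tail

    speedup = cpu_s / elapsed_s
    return per_stream, speedup, speedup / (workers + 1)


if __name__ == "__main__":
    per_stream, speedup, eff = run(workers=3, iters_total=400_000)
    print(f"{per_stream}/stream, speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

Because `elapsed_s` spans everything from the first fork to the last reap, the serial spawn loop and any slow child both pull efficiency below 100%, which is the same accounting the table's fork+handshake and reap wait columns expose.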