CPU scaling benchmark

workers: 16 + 1 main
iters total: 500M (29411764/stream)
elapsed: 1182.63 ms
total CPU used: 17425.66 ms
speedup: 14.73× vs serial
efficiency: 86.6% of 17× ideal

| stream | spawn ms | spawned @ ms | work start @ ms | work end @ ms | work ms | reap wait ms |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 63.91 | 63.92 | 1117.29 | 1053.37 | 0 |
| 1 | 2.09 | 2.1 | 15.86 | 1005.61 | 989.75 | 0.15 |
| 2 | 1.426 | 3.55 | 22.77 | 1158.69 | 1135.92 | 41.52 |
| 3 | 1.541 | 5.11 | 43.4 | 1116.62 | 1073.22 | 0.21 |
| 4 | 1.476 | 6.61 | 57.02 | 1177.08 | 1120.06 | 59.91 |
| 5 | 1.533 | 8.16 | 38.77 | 1125.87 | 1087.1 | 23.1 |
| 6 | 1.695 | 9.87 | 90.42 | 1010.36 | 919.94 | 22.96 |
| 7 | 1.501 | 11.39 | 65.22 | 1160.46 | 1095.24 | 46.73 |
| 8 | 1.929 | 13.34 | 93.71 | 1145.36 | 1051.65 | 28.16 |
| 9 | 1.612 | 14.98 | 68.84 | 1146.45 | 1077.61 | 31.03 |
| 10 | 1.534 | 16.53 | 50.67 | 1040.62 | 989.95 | 22.98 |
| 11 | 11.565 | 28.11 | 79.46 | 975.79 | 896.33 | 23 |
| 12 | 2.031 | 30.16 | 98.79 | 1179.57 | 1080.78 | 62.78 |
| 13 | 6.911 | 37.09 | 92.12 | 996.14 | 904.02 | 23.01 |
| 14 | 1.582 | 38.69 | 102.95 | 1113.09 | 1010.14 | 23.04 |
| 15 | 1.549 | 40.25 | 85.17 | 1067.11 | 981.94 | 23.05 |
| 16 | 23.641 | 63.91 | 120.81 | 1079.45 | 958.64 | 23.07 |
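
As a rough cross-check, the headline figures can be recomputed from the table. This sketch assumes that speedup is the sum of the per-stream work times divided by the elapsed wall-clock time, and that efficiency is speedup divided by the 17 streams. The variable names are illustrative and are not taken from the benchmark's own code.

```python
# Values copied from the "work ms" column (stream 0 is the main process).
work_ms = [
    1053.37, 989.75, 1135.92, 1073.22, 1120.06, 1087.10, 919.94, 1095.24,
    1051.65, 1077.61, 989.95, 896.33, 1080.78, 904.02, 1010.14, 981.94, 958.64,
]
elapsed_ms = 1182.63  # wall-clock time reported in the summary

total_cpu_ms = sum(work_ms)          # "total CPU used"
speedup = total_cpu_ms / elapsed_ms  # "speedup vs serial"
efficiency = speedup / len(work_ms)  # 16 workers + 1 main = 17 streams

print(f"total CPU used: {total_cpu_ms:.2f} ms")
print(f"speedup: {speedup:.2f}x")
print(f"efficiency: {efficiency:.1%}")
```

With these inputs the sum comes to 17425.66 ms and the speedup to 14.73×. The efficiency lands near 86.7%, within a last-digit rounding difference of the 86.6% reported in the summary.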

[Timeline chart: one row per stream (main, w1 to w16), with segments for fork+handshake, CPU work, and parent reap wait.]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one CPU register,
with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers+1).
100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup,
the reap tail, and SMT/core contention.
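
The measurement described above could be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual code: the function names `lcg_loop` and `run_streams`, the pipe used to report each worker's time, and the LCG constants (Knuth's MMIX multiplier and increment) are all assumptions. An interpreted Python loop also does not keep its state in a single register, so absolute timings will not resemble the figures reported here. The sketch needs a POSIX system, since it relies on `os.fork`.

```python
import os
import time

MASK = (1 << 64) - 1  # keep the state at 64 bits, as a C uint64_t would


def lcg_loop(seed: int, iters: int) -> int:
    """Tight integer LCG loop; the only live state is x."""
    x = seed
    for _ in range(iters):
        # Assumed constants (Knuth's MMIX LCG), chosen only for illustration.
        x = (x * 6364136223846793005 + 1442695040888963407) & MASK
    return x


def run_streams(workers: int, iters_per_stream: int):
    """Fork `workers` children, run the loop in every stream (children plus
    the main process), and return (elapsed_s, per_stream_work_s)."""
    t0 = time.perf_counter()
    children = []
    for i in range(1, workers + 1):  # serial fork setup
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: do the work, report its duration, exit
            try:
                os.close(r)
                start = time.perf_counter()
                lcg_loop(i, iters_per_stream)
                work_s = time.perf_counter() - start
                os.write(w, repr(work_s).encode())
            finally:
                os._exit(0)
        os.close(w)
        children.append((pid, r))

    # The main process runs one stream too (stream 0).
    start = time.perf_counter()
    lcg_loop(0, iters_per_stream)
    times = [time.perf_counter() - start]

    for pid, r in children:  # reap tail: wait for each worker in turn
        data = os.read(r, 64)
        os.close(r)
        os.waitpid(pid, 0)
        times.append(float(data.decode()))

    return time.perf_counter() - t0, times


if __name__ == "__main__":
    elapsed, times = run_streams(workers=4, iters_per_stream=200_000)
    speedup = sum(times) / elapsed
    print(f"speedup {speedup:.2f}x, efficiency {speedup / len(times):.1%}")
```

Each stream's time here is the wall-clock duration of its work phase, which matches how the table's work ms column equals work end minus work start. Forking the workers one after another and reaping them in order is what produces the serial setup and reap-tail costs mentioned above.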