CPU scaling benchmark

| metric | value | note |
|---|---|---|
| workers | 16 + 1 main | |
| iters total | 500M | 29411764/stream |
| elapsed | 1183.83 ms | |
| total CPU used | 17667.13 ms | |
| speedup | 14.92× | vs serial |
| efficiency | 87.8% | of 17× ideal |

| stream | spawn ms | spawned @ ms | work start @ ms | work end @ ms | work ms | reap wait ms |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 49.56 | 49.57 | 1085.69 | 1036.12 | 0 |
| 1 | 2.127 | 2.14 | 32.86 | 1170.36 | 1137.5 | 84.81 |
| 2 | 1.625 | 3.79 | 47.7 | 1125.14 | 1077.44 | 39.55 |
| 3 | 1.878 | 5.68 | 22.91 | 1067.51 | 1044.6 | 0.14 |
| 4 | 1.654 | 7.36 | 30.17 | 1034.68 | 1004.51 | 0.19 |
| 5 | 1.63 | 9.01 | 65.61 | 1152.59 | 1086.98 | 72.09 |
| 6 | 1.699 | 10.72 | 48.74 | 1046.73 | 997.99 | 0.21 |
| 7 | 1.657 | 12.4 | 51.95 | 1068.28 | 1016.33 | 0.22 |
| 8 | 1.67 | 14.09 | 54.64 | 1103.08 | 1048.44 | 17.48 |
| 9 | 1.676 | 15.79 | 78.54 | 1133.65 | 1055.11 | 53.2 |
| 10 | 1.605 | 17.41 | 59.02 | 1043.63 | 984.61 | 0.23 |
| 11 | 8.543 | 25.97 | 68.31 | 1089.05 | 1020.74 | 3.47 |
| 12 | 2.01 | 28 | 75.84 | 1079.31 | 1003.47 | 0.24 |
| 13 | 1.706 | 29.73 | 78.06 | 1095.56 | 1017.5 | 10 |
| 14 | 15.721 | 45.47 | 98.57 | 1158.26 | 1059.69 | 75.34 |
| 15 | 2.294 | 47.78 | 116.74 | 1181.05 | 1064.31 | 95.48 |
| 16 | 1.753 | 49.55 | 151.64 | 1163.43 | 1011.79 | 77.84 |

[Timeline chart: one row per stream (main, w1 to w16); legend: fork+handshake, CPU work, parent reap wait]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data.
Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1), where the +1 counts the main stream.
100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.
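
The two formulas can be checked against the figures in this report. The sketch below is only an illustration, not the benchmark's own code. It assumes that "stream CPU time" corresponds to the "work ms" column of the table, which seems plausible because that column sums to the reported total CPU figure. The variable names are made up for this example.

```python
# Per-stream "work ms" values copied from the table (stream 0 is main).
# Assumption: these are the per-stream CPU times the speedup formula sums.
work_ms = [
    1036.12, 1137.5, 1077.44, 1044.6, 1004.51, 1086.98, 997.99, 1016.33,
    1048.44, 1055.11, 984.61, 1020.74, 1003.47, 1017.5, 1059.69, 1064.31,
    1011.79,
]
elapsed_ms = 1183.83  # wall-clock elapsed, from the summary
workers = 16          # forked workers; main adds one more stream

total_cpu_ms = sum(work_ms)
speedup = total_cpu_ms / elapsed_ms   # sum(stream CPU time) / elapsed
efficiency = speedup / (workers + 1)  # fraction of the 17x ideal

print(f"total CPU used: {total_cpu_ms:.2f} ms")  # 17667.13 ms
print(f"speedup:        {speedup:.2f}x")         # 14.92x
print(f"efficiency:     {efficiency:.1%}")       # 87.8%
```

The results match the summary figures, so the reported speedup appears to be computed from summed per-stream work time rather than from a separately timed serial run.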