CPU scaling benchmark
| metric | value |
|---|---|
| workers | 16 + 1 main (17 streams) |
| iterations total | 500M (29,411,764 per stream) |
| elapsed | 1179.75 ms |
| total CPU used | 17300.86 ms |
| speedup | 14.66× vs serial |
| efficiency | 86.2% of the 17× ideal |
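The headline figures follow from one another with simple arithmetic. A minimal Python sketch that re-derives them from the measured inputs (iteration count, elapsed wall time, total CPU time); the constant names are illustrative, not taken from the benchmark's code:

```python
# Measured inputs copied from the summary figures.
WORKERS = 16
STREAMS = WORKERS + 1          # 16 forked workers plus the main process
TOTAL_ITERS = 500_000_000
ELAPSED_MS = 1179.75           # wall-clock time for the whole run
TOTAL_CPU_MS = 17300.86        # summed per-stream work time

iters_per_stream = TOTAL_ITERS // STREAMS   # integer split of the loop count
speedup = TOTAL_CPU_MS / ELAPSED_MS         # CPU-ms delivered per wall-clock ms
efficiency = speedup / STREAMS              # fraction of the 17x ideal

print(f"{iters_per_stream} iters/stream")   # 29411764 iters/stream
print(f"speedup    {speedup:.2f}x")         # speedup    14.66x
print(f"efficiency {efficiency:.1%}")       # efficiency 86.3%
```

Computed from the unrounded speedup (about 14.665×), efficiency comes out near 86.26%; the 86.2% shown in the summary matches dividing the already-rounded 14.66× by 17.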
| stream | spawn time (ms) | spawned at (ms) | work start (ms) | work end (ms) | work time (ms) | reap wait (ms) |
|---|---:|---:|---:|---:|---:|---:|
| 0 (main) | 0 | 80.72 | 80.72 | 1134.03 | 1053.31 | 0 |
| 1 | 2.002 | 2.01 | 14.66 | 1100.27 | 1085.61 | 0.15 |
| 2 | 1.357 | 3.38 | 26.51 | 1049.32 | 1022.81 | 0.21 |
| 3 | 1.423 | 4.82 | 20.17 | 983.28 | 963.11 | 0.22 |
| 4 | 1.495 | 6.34 | 29.23 | 1132.09 | 1102.86 | 0.23 |
| 5 | 4.073 | 10.43 | 40.25 | 1080.84 | 1040.59 | 6.26 |
| 6 | 1.387 | 11.83 | 54.88 | 863.11 | 808.23 | 6.29 |
| 7 | 8.49 | 20.33 | 61.43 | 1085.43 | 1024 | 6.31 |
| 8 | 1.594 | 21.94 | 64.48 | 1000.88 | 936.4 | 6.32 |
| 9 | 7.147 | 29.1 | 67.45 | 1171.36 | 1103.91 | 37.5 |
| 10 | 1.511 | 30.62 | 75.52 | 1151.8 | 1076.28 | 17.86 |
| 11 | 16.03 | 46.67 | 87.71 | 1176.59 | 1088.88 | 44.36 |
| 12 | 1.741 | 48.42 | 78.78 | 990.55 | 911.77 | 6.33 |
| 13 | 1.451 | 49.89 | 134.54 | 1175.62 | 1041.08 | 42.07 |
| 14 | 22.922 | 72.82 | 126.5 | 1166.67 | 1040.17 | 37.42 |
| 15 | 1.551 | 74.4 | 143.09 | 1139.99 | 996.9 | 6.36 |
| 16 | 6.294 | 80.7 | 133.52 | 1138.47 | 1004.95 | 8.97 |
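The table is internally consistent: each work time equals work end minus work start, and the summary's total CPU figure is the sum of the work-time column. A short Python sketch that checks this against the rows copied from the table (the tuple layout is an assumption made for illustration):

```python
# (stream, spawn_ms, spawned_at, work_start, work_end, work_ms, reap_wait_ms)
ROWS = [
    (0, 0.0, 80.72, 80.72, 1134.03, 1053.31, 0.0),
    (1, 2.002, 2.01, 14.66, 1100.27, 1085.61, 0.15),
    (2, 1.357, 3.38, 26.51, 1049.32, 1022.81, 0.21),
    (3, 1.423, 4.82, 20.17, 983.28, 963.11, 0.22),
    (4, 1.495, 6.34, 29.23, 1132.09, 1102.86, 0.23),
    (5, 4.073, 10.43, 40.25, 1080.84, 1040.59, 6.26),
    (6, 1.387, 11.83, 54.88, 863.11, 808.23, 6.29),
    (7, 8.49, 20.33, 61.43, 1085.43, 1024.0, 6.31),
    (8, 1.594, 21.94, 64.48, 1000.88, 936.4, 6.32),
    (9, 7.147, 29.1, 67.45, 1171.36, 1103.91, 37.5),
    (10, 1.511, 30.62, 75.52, 1151.8, 1076.28, 17.86),
    (11, 16.03, 46.67, 87.71, 1176.59, 1088.88, 44.36),
    (12, 1.741, 48.42, 78.78, 990.55, 911.77, 6.33),
    (13, 1.451, 49.89, 134.54, 1175.62, 1041.08, 42.07),
    (14, 22.922, 72.82, 126.5, 1166.67, 1040.17, 37.42),
    (15, 1.551, 74.4, 143.09, 1139.99, 996.9, 6.36),
    (16, 6.294, 80.7, 133.52, 1138.47, 1004.95, 8.97),
]
ELAPSED_MS = 1179.75

# Largest disagreement between (work end - work start) and the work column.
max_residual = max(abs((r[4] - r[3]) - r[5]) for r in ROWS)

total_cpu_ms = sum(r[5] for r in ROWS)     # matches "total CPU used"
serial_spawn_ms = sum(r[1] for r in ROWS)  # forks are issued one after another
last_end = max(r[4] for r in ROWS)         # the slowest stream bounds wall time

print(f"total CPU     {total_cpu_ms:.2f} ms")     # total CPU     17300.86 ms
print(f"serial spawn  {serial_spawn_ms:.2f} ms")  # serial spawn  80.47 ms
print(f"last work end {last_end:.2f} ms, tail {ELAPSED_MS - last_end:.2f} ms")
```

The spawn times add up to about 80.47 ms, close to the 80.72 ms at which the main stream starts its own share, which is consistent with main forking all 16 workers before doing any work itself.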
[Figure: per-stream timeline for main and w1 to w16, showing the fork+handshake, CPU work, and parent reap wait phases]
what this measures
Each stream runs a tight integer LCG loop: its working set is a single CPU register, with no memory access
and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed, where the CPU total is the sum of
the per-stream work times. Efficiency = speedup / (workers + 1). 100% efficiency means perfect linear
scaling; anything below 100% reflects the cost of serial fork setup, the reap tail, and SMT/core contention.
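A minimal Python sketch of this measurement, Unix only since it relies on `os.fork` and `os.wait4`. Several parts are assumptions rather than details of the benchmark: the LCG constants (Knuth's MMIX multiplier and increment, modulo 2^64), the iteration counts, and taking CPU time from each process's rusage instead of timing a per-stream work window; it also leaves out the handshake. Python ints live on the heap, so this does not reproduce the one-register working set; it only shows the structure: serial forks, the parent running its own share, reaping, then summed CPU divided by wall time.

```python
import os
import time

# Assumed LCG constants (Knuth's MMIX values, modulo 2**64); the benchmark
# does not say which constants it uses.
A = 6364136223846793005
C = 1442695040888963407
MASK = (1 << 64) - 1


def lcg_loop(iters: int, seed: int = 1) -> int:
    """Tight integer LCG loop: the only state is x, nothing is shared."""
    x = seed
    for _ in range(iters):
        x = (A * x + C) & MASK
    return x


def run(workers: int, total_iters: int) -> dict:
    """Fork `workers` children, run one share in the parent, then reap all."""
    streams = workers + 1
    per_stream = total_iters // streams
    t0 = time.perf_counter()

    pids = []
    for _ in range(workers):          # forks are issued serially
        pid = os.fork()
        if pid == 0:                  # child: run its share, exit without cleanup
            lcg_loop(per_stream)
            os._exit(0)
        pids.append(pid)

    cpu0 = time.process_time()        # parent's own share (the "main" stream)
    lcg_loop(per_stream)
    cpu_total = time.process_time() - cpu0

    for pid in pids:                  # reap each child and add its CPU time
        _, _status, ru = os.wait4(pid, 0)
        cpu_total += ru.ru_utime + ru.ru_stime

    elapsed = time.perf_counter() - t0
    speedup = cpu_total / elapsed
    return {
        "streams": streams,
        "iters_per_stream": per_stream,
        "elapsed_s": elapsed,
        "cpu_s": cpu_total,
        "speedup": speedup,
        "efficiency": speedup / streams,
    }


if __name__ == "__main__":
    # Far fewer iterations than 500M: a Python loop is much slower than native code.
    n_workers = max(1, (os.cpu_count() or 2) - 1)
    r = run(workers=n_workers, total_iters=4_000_000)
    print(f"{r['streams']} streams, speedup {r['speedup']:.2f}x, "
          f"efficiency {r['efficiency']:.1%}")
```

If the machine has fewer hardware threads than streams, the streams time-share cores, so summed CPU per wall-clock second cannot exceed the core count and efficiency falls well below 100% regardless of fork cost.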