CPU scaling benchmark
workers: 8 + 1 main
iters total: 500M (55555555/stream)
elapsed: 1157.15 ms
total CPU used: 9114.52 ms
speedup: 7.88× vs serial
efficiency: 87.6% of 9× ideal

| stream | spawn (ms) | spawned at (ms) | work start (ms) | work end (ms) | work (ms) | reap wait (ms) |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 13.93 | 13.96 | 1028.5 | 1014.54 | 0 |
| 1 | 1.996 | 2.01 | 16.68 | 1000.35 | 983.67 | 0.14 |
| 2 | 1.933 | 3.97 | 55.31 | 1035.52 | 980.21 | 8.18 |
| 3 | 1.681 | 5.67 | 34.25 | 1016.02 | 981.77 | 0.2 |
| 4 | 1.749 | 7.44 | 44.94 | 1154.29 | 1109.35 | 125.89 |
| 5 | 1.622 | 9.09 | 50.47 | 1150.05 | 1099.58 | 121.69 |
| 6 | 1.537 | 10.64 | 62.28 | 908.81 | 846.53 | 8.14 |
| 7 | 1.618 | 12.28 | 72.26 | 1129.07 | 1056.81 | 100.71 |
| 8 | 1.607 | 13.91 | 58.98 | 1101.04 | 1042.06 | 72.65 |

[Timeline chart: one row per stream (main, w1 to w8); segments: fork+handshake, CPU work, parent reap wait]

what this measures
Each stream runs a tight integer LCG loop: the working set is one CPU register, with no memory access
and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed, which takes the summed CPU time
as the serial baseline. Efficiency = speedup / (workers+1). 100% efficiency means perfect linear scaling;
anything less is the cost of serial fork setup, the reap tail, and SMT/core contention.
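
As a rough check of the two formulas, the sketch below recomputes the summary figures from the per-stream work times in the table. The function name `scaling_metrics` and the variable names are illustrative, not taken from the benchmark's source; the numbers are copied from the table and the elapsed figure above.

```python
# Per-stream work times in ms, copied from the table (stream 0 is main).
work_ms = [1014.54, 983.67, 980.21, 981.77, 1109.35,
           1099.58, 846.53, 1056.81, 1042.06]
elapsed_ms = 1157.15  # wall-clock elapsed, from the summary


def scaling_metrics(work_ms, elapsed_ms):
    """Return (total CPU ms, speedup, efficiency) as defined in the note.

    Hypothetical helper: speedup = sum(stream CPU time) / elapsed,
    efficiency = speedup / number of streams (workers + 1).
    """
    total_cpu_ms = sum(work_ms)
    speedup = total_cpu_ms / elapsed_ms
    efficiency = speedup / len(work_ms)
    return total_cpu_ms, speedup, efficiency


total_cpu_ms, speedup, efficiency = scaling_metrics(work_ms, elapsed_ms)
print(f"total CPU used: {total_cpu_ms:.2f} ms")
print(f"speedup: {speedup:.2f}x")
print(f"efficiency: {efficiency:.1%} of {len(work_ms)}x ideal")
```

The total and speedup agree with the reported 9114.52 ms and 7.88×. The unrounded efficiency comes out near 87.5%, slightly under the reported 87.6%; that gap would be consistent with the page dividing the already-rounded 7.88 by 9, though this is a guess about how the page rounds.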