CPU scaling benchmark
workers: 8 + 1 main
iters total: 500M (55555555/stream)
elapsed: 1180.27 ms
total CPU used: 8835.87 ms
speedup: 7.49× vs serial
efficiency: 83.2% of 9× ideal

| stream | spawn (ms) | spawned at (ms) | work start at (ms) | work end at (ms) | work (ms) | reap wait (ms) |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 18.8 | 18.81 | 1179.97 | 1161.16 | 0 |
| 1 | 2.094 | 2.11 | 15.38 | 1038.14 | 1022.76 | 0.1 |
| 2 | 1.436 | 3.57 | 15.67 | 1107.21 | 1091.54 | 0.16 |
| 3 | 2.036 | 5.62 | 16.87 | 991.37 | 974.5 | 0.18 |
| 4 | 2.463 | 8.1 | 43.02 | 1043.37 | 1000.35 | 0.2 |
| 5 | 6.087 | 14.2 | 51.55 | 1159.11 | 1107.56 | 0.22 |
| 6 | 1.535 | 15.75 | 41.56 | 913.11 | 871.55 | 0.23 |
| 7 | 1.502 | 17.27 | 61.51 | 1066.02 | 1004.51 | 0.24 |
| 8 | 1.498 | 18.78 | 56.22 | 658.16 | 601.94 | 0.26 |

[Timeline chart: one row per stream (main, w1 to w8); segments show fork+handshake, CPU work, and parent reap wait.]

What this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a
single CPU register, with no memory access and no shared data.
Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1), where
the +1 is the main process, which runs a stream of its own.
100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup,
the reap tail, and SMT/core contention.
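
The two formulas above can be checked against the reported figures. The following is a minimal sketch, not the benchmark's own code: the function name `scaling_metrics` and the constant names are hypothetical, and the inputs are the per-stream work times and the elapsed time copied from this run.

```python
# Per-stream work times in ms, copied from the table (index 0 is the main process).
WORK_MS = [1161.16, 1022.76, 1091.54, 974.50, 1000.35,
           1107.56, 871.55, 1004.51, 601.94]
ELAPSED_MS = 1180.27  # wall-clock elapsed time reported for this run
WORKERS = 8           # forked workers; the main process runs a ninth stream


def scaling_metrics(work_ms, elapsed_ms, workers):
    """Return (total CPU ms, speedup, efficiency) using the formulas in the text.

    Hypothetical helper for illustration only.
    """
    total_cpu_ms = sum(work_ms)              # sum(stream CPU time)
    speedup = total_cpu_ms / elapsed_ms      # versus running the streams serially
    efficiency = speedup / (workers + 1)     # fraction of the ideal (workers + 1)x
    return total_cpu_ms, speedup, efficiency


if __name__ == "__main__":
    total, speedup, eff = scaling_metrics(WORK_MS, ELAPSED_MS, WORKERS)
    print(f"total CPU used: {total:.2f} ms")
    print(f"speedup: {speedup:.2f}x")
    print(f"efficiency: {eff:.1%} of {WORKERS + 1}x ideal")
```

With these inputs the sum of the work column reproduces the reported total CPU time, and the derived speedup and efficiency round to the reported 7.49× and 83.2%.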