CPU scaling benchmark
| metric | value | note |
|---|---|---|
| workers | 6 + 1 main | |
| iters total | 500M | 71428571/stream |
| elapsed | 1196.93 ms | |
| total CPU used | 7313.38 ms | |
| speedup | 6.11× | vs serial |
| efficiency | 87.3% | of 7× ideal |

| stream | spawn ms | spawned @ ms | work start @ ms | work end @ ms | work ms | reap wait ms |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 10.01 | 10.02 | 1166.45 | 1156.43 | 0 |
| 1 | 1.906 | 1.92 | 25.68 | 1035.33 | 1009.65 | 0.15 |
| 2 | 1.483 | 3.43 | 17.49 | 856.23 | 838.74 | 0.21 |
| 3 | 1.464 | 4.91 | 41.65 | 1194.16 | 1152.51 | 27.83 |
| 4 | 2.045 | 6.98 | 32.6 | 1105.42 | 1072.82 | 0.22 |
| 5 | 1.493 | 8.49 | 43.03 | 1144.06 | 1101.03 | 0.25 |
| 6 | 1.482 | 9.99 | 26.56 | 1008.76 | 982.2 | 0.27 |
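
The summary figures can be re-derived from the table as a sanity check. The sketch below is only an illustration: it copies the work start and work end columns by hand, treats each stream's work ms as its CPU time, and uses variable names (`streams`, `elapsed_ms`) that are made up here rather than taken from the benchmark's code.

```python
# (work start, work end) in ms, copied from the per-stream table.
streams = {
    "0 (main)": (10.02, 1166.45),
    "1": (25.68, 1035.33),
    "2": (17.49, 856.23),
    "3": (41.65, 1194.16),
    "4": (32.60, 1105.42),
    "5": (43.03, 1144.06),
    "6": (26.56, 1008.76),
}
elapsed_ms = 1196.93  # wall-clock elapsed from the summary

# work ms = work end - work start, per stream
work_ms = {name: end - start for name, (start, end) in streams.items()}

total_cpu_ms = sum(work_ms.values())    # matches "total CPU used"
speedup = total_cpu_ms / elapsed_ms     # sum(stream CPU time) / elapsed
efficiency = speedup / len(streams)     # speedup / (workers + 1)

print(f"total CPU {total_cpu_ms:.2f} ms")
print(f"speedup {speedup:.2f}x")
print(f"efficiency {efficiency:.1%}")
```

Stream 3 finishes last (work end at 1194.16 ms), so it sets the elapsed time and accounts for the one non-trivial reap wait in the table.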
[Timeline chart: one row per stream (main, w1 to w6). Legend: fork+handshake, CPU work, parent reap wait.]
what this measures
Each stream runs a tight integer LCG loop: the working set is one CPU register, with no memory access
and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers+1).
100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup,
the reap tail, and SMT/core contention.
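
For readers unfamiliar with the workload, here is a minimal sketch of an integer LCG loop and of the per-stream iteration split. The multiplier, increment, and 32-bit modulus are assumptions (the widely published Numerical Recipes constants); the benchmark's actual constants are not given here, and `lcg_stream` is a hypothetical name. Python will not keep the state in a register the way a compiled loop does, so this only illustrates the recurrence, not the measured performance.

```python
# Assumed constants (Numerical Recipes LCG); the benchmark's real
# multiplier, increment, and modulus are not stated in the report.
A = 1664525
C = 1013904223
MASK = 0xFFFFFFFF  # keep the state within 32 bits


def lcg_stream(seed: int, iters: int) -> int:
    """Run `iters` steps of state = (A * state + C) mod 2**32."""
    state = seed & MASK
    for _ in range(iters):
        state = (A * state + C) & MASK
    return state


# Split the total iteration count evenly across 6 workers + main.
TOTAL_ITERS = 500_000_000
STREAMS = 7
per_stream = TOTAL_ITERS // STREAMS

print(per_stream)                   # 71428571
print(lcg_stream(seed=0, iters=2))  # 1196435762
```

Because each stream carries only its own `state`, the streams share nothing, which is what makes the loop a clean probe of core scaling rather than of memory or synchronization.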