CPU scaling benchmark

| metric | value | note |
|---|---|---|
| workers | 8 + 1 main | |
| iters total | 500M | 55555555/stream |
| elapsed | 1159.36 ms | |
| total CPU used | 9282.04 ms | |
| speedup | 8.01× | vs serial |
| efficiency | 89% | of 9× ideal |

| stream | spawn ms | spawned@ | work start@ | work end@ | work ms | reap wait ms |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 35.44 | 35.45 | 1094.58 | 1059.13 | 0 |
| 1 | 1.745 | 1.76 | 15.11 | 1110.35 | 1095.24 | 15.91 |
| 2 | 1.401 | 3.18 | 16.42 | 1143.95 | 1127.53 | 49.48 |
| 3 | 1.424 | 4.62 | 23.23 | 833.46 | 810.23 | 0.13 |
| 4 | 1.56 | 6.2 | 47.55 | 1122.5 | 1074.95 | 28.06 |
| 5 | 8.788 | 15.01 | 60.28 | 1156.66 | 1096.38 | 62.2 |
| 6 | 1.772 | 16.79 | 51.79 | 1098 | 1046.21 | 6.01 |
| 7 | 1.555 | 18.36 | 48.58 | 1082.42 | 1033.84 | 0.21 |
| 8 | 17.046 | 35.42 | 62.06 | 1000.59 | 938.53 | 0.25 |

[Timeline chart: one row per stream (main, w1 to w8); segments show fork+handshake, CPU work, and parent reap wait]

what this measures
Each stream runs a tight integer LCG loop: the working set is a single CPU register, with no memory access
and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers+1).
For this run that is 9282.04 ms / 1159.36 ms ≈ 8.01× and 8.01 / 9 ≈ 89%.
100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup,
the reap tail, and SMT/core contention.
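
The two formulas above can be checked against the numbers on this page. The sketch below is only an illustration, not the benchmark's own code: the function name `scaling_metrics` and the constant names are hypothetical, and the inputs are copied by hand from the "work ms" column and the elapsed figure.

```python
# Per-stream CPU work times in ms, copied from the "work ms" column
# (stream 0 is main, streams 1..8 are the forked workers).
WORK_MS = [1059.13, 1095.24, 1127.53, 810.23, 1074.95,
           1096.38, 1046.21, 1033.84, 938.53]
ELAPSED_MS = 1159.36  # wall-clock elapsed, from the summary


def scaling_metrics(work_ms, elapsed_ms):
    """Hypothetical helper: return (total CPU ms, speedup, efficiency)."""
    total_cpu = sum(work_ms)              # sum(stream CPU time)
    speedup = total_cpu / elapsed_ms      # speedup vs. running the streams serially
    efficiency = speedup / len(work_ms)   # len(work_ms) == workers + 1
    return total_cpu, speedup, efficiency


total_cpu, speedup, efficiency = scaling_metrics(WORK_MS, ELAPSED_MS)
print(f"total CPU {total_cpu:.2f} ms, "
      f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With the values from the table this reproduces the reported 9282.04 ms total CPU, 8.01× speedup, and 89% efficiency.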