CPU scaling benchmark
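
| metric | value |
|---|---|
| workers | 8 + 1 main (9 streams) |
| total iterations | 500M (55,555,555 per stream) |
| elapsed (wall clock) | 1199.59 ms |
| total CPU used | 9283.89 ms |
| speedup vs serial | 7.74× |
| efficiency | 86% of the 9× ideal |
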
| stream | spawn (ms) | spawned at (ms) | work start at (ms) | work end at (ms) | work (ms) | reap wait (ms) |
|---|---|---|---|---|---|---|
| 0 (main) | 0 | 20.32 | 20.34 | 1069.87 | 1049.53 | 0 |
| 1 | 4.278 | 4.3 | 16.97 | 1063.55 | 1046.58 | 0.21 |
| 2 | 1.38 | 5.69 | 59.52 | 1084.57 | 1025.05 | 14.85 |
| 3 | 1.191 | 6.9 | 33.4 | 1077.69 | 1044.29 | 7.91 |
| 4 | 1.123 | 8.03 | 59.47 | 857.84 | 798.37 | 0.28 |
| 5 | 1.209 | 9.26 | 42.34 | 1197.14 | 1154.8 | 127.42 |
| 6 | 1.602 | 10.88 | 39.49 | 1082.6 | 1043.11 | 14.43 |
| 7 | 1.197 | 12.1 | 47.92 | 1165.13 | 1117.21 | 95.42 |
| 8 | 8.198 | 20.31 | 69.43 | 1074.38 | 1004.95 | 4.6 |
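As a cross-check, the summary numbers can be recomputed from the table's work start and work end columns. This minimal Python sketch assumes that "total CPU used" is the sum of each stream's work duration (end minus start), which the table's values bear out. The names `WORK_SPANS` and `summarize` are illustrative only.

```python
# Per-stream (work start, work end) in ms, copied from the table above.
WORK_SPANS = {
    "main": (20.34, 1069.87),
    "w1": (16.97, 1063.55),
    "w2": (59.52, 1084.57),
    "w3": (33.4, 1077.69),
    "w4": (59.47, 857.84),
    "w5": (42.34, 1197.14),
    "w6": (39.49, 1082.6),
    "w7": (47.92, 1165.13),
    "w8": (69.43, 1074.38),
}
ELAPSED_MS = 1199.59  # wall-clock elapsed from the summary


def summarize(spans, elapsed_ms):
    """Recompute total work time, speedup and efficiency from per-stream spans."""
    work_ms = {name: end - start for name, (start, end) in spans.items()}
    total_ms = sum(work_ms.values())
    speedup = total_ms / elapsed_ms
    efficiency = speedup / len(spans)  # ideal speedup = number of streams (workers + 1)
    last = max(spans, key=lambda name: spans[name][1])  # stream with the latest work end
    return total_ms, speedup, efficiency, last


total_ms, speedup, efficiency, last = summarize(WORK_SPANS, ELAPSED_MS)
print(f"total {total_ms:.2f} ms, speedup {speedup:.2f}x, efficiency {efficiency:.0%}, last: {last}")
# prints: total 9283.89 ms, speedup 7.74x, efficiency 86%, last: w5
```

The stream with the latest work end (w5) bounds the wall-clock time, which is also why its reap wait is the largest in the table.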
[Figure: per-stream timeline (main, w1 to w8) showing the fork+handshake, CPU work, and parent reap wait phases]
what this measures
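Each stream runs a tight integer LCG loop. Its working set is a single CPU register: no memory access and no shared data.
Speedup = sum(stream CPU time) / wall-clock elapsed, where stream CPU time is each stream's work duration in the table; here 9283.89 / 1199.59 ≈ 7.74. Efficiency = speedup / (workers + 1) = 7.74 / 9 ≈ 86%.
An efficiency of 100% means perfect linear scaling. Anything below 100% reflects serial fork setup, the reap tail (the parent waiting
on the slowest stream, w5 in this run), and SMT/core contention.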
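The measurement described above can be sketched with `os.fork` and `os.waitpid` (POSIX only). This is a hypothetical Python reconstruction, not the benchmark's own code: the LCG constants, the function names, and the use of a pipe to return each child's work time are all assumptions, and the fork handshake shown in the timeline is not modeled. Pure Python is far slower than a register-only native loop, so the iteration counts here are tiny and the resulting numbers will not match the table.

```python
import os
import time

# Assumed constants (the common 32-bit "Numerical Recipes" LCG); the
# benchmark's actual constants are not given in the source.
A, C, MASK = 1664525, 1013904223, 0xFFFFFFFF


def lcg_loop(iters, seed=1):
    """Tight integer LCG loop: a single integer of state, no shared data."""
    x = seed
    for _ in range(iters):
        x = (A * x + C) & MASK
    return x


def timed_loop(iters):
    """Run one stream's loop and return its work time in seconds."""
    start = time.perf_counter()
    lcg_loop(iters)
    return time.perf_counter() - start


def run(workers, iters_per_stream):
    """Fork `workers` children, run the main stream, reap; return (elapsed, work times)."""
    t0 = time.perf_counter()
    children = []  # (pid, read end of that child's result pipe)
    for _ in range(workers):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: work, report its duration, exit without cleanup
            try:
                os.close(r)
                os.write(w, repr(timed_loop(iters_per_stream)).encode())
                os.close(w)
            finally:
                os._exit(0)
        os.close(w)  # parent keeps only the read end
        children.append((pid, r))
    work = [timed_loop(iters_per_stream)]  # main process is stream 0, started after all forks
    for pid, r in children:
        data = b""
        while chunk := os.read(r, 64):  # read until the child closes its end
            data += chunk
        os.close(r)
        os.waitpid(pid, 0)  # reap; the slowest child sets the tail
        work.append(float(data.decode()))
    return time.perf_counter() - t0, work


if __name__ == "__main__":
    workers = 3
    elapsed, work = run(workers, 200_000)
    speedup = sum(work) / elapsed
    print(f"speedup {speedup:.2f}x, efficiency {speedup / (workers + 1):.0%}")
```

Speedup and efficiency are computed as defined above: summed per-stream work time over wall-clock elapsed, then divided by workers + 1.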