CPU scaling benchmark

workers: 8 + 1 main
iters total: 500M (55555555/stream)
elapsed: 1153.82 ms
total CPU used: 8987.61 ms
speedup: 7.79× vs serial
efficiency: 86.6% of 9× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)         0     15.75        15.76    1129.46   1113.7             0
1            1.814      1.83        13.96    1092.78  1078.82           0.1
2            1.396      3.24        17.25    1063.74  1046.49          0.16
3            1.474      4.73        20.95    1141.84  1120.89         12.49
4            1.556      6.31           51    1081.43  1030.43          0.18
5            5.046     11.37        47.18     986.21   939.03           0.2
6            1.437     12.83        47.17    1072.81  1025.64          0.21
7             1.43     14.28        37.11     565.95   528.84          0.22
8            1.431     15.73        47.18    1150.95  1103.77          21.6

[timeline chart: one row per stream (main, w1 to w8); legend: fork+handshake, CPU work, parent reap wait]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1), where the +1 counts the main stream. 100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.
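
As a cross-check, the two formulas can be applied to the figures reported on this page. This is only a sketch of the arithmetic, not the benchmark's own code: the names WORK_MS, ELAPSED_MS, WORKERS, speedup and efficiency are made up for illustration, and the per-stream values are copied from the "work ms" column.

```python
# Illustrative recomputation of the summary metrics from the per-stream table.
# All names here are hypothetical; only the numbers come from the page.

# "work ms" column, streams 0 (main) through 8.
WORK_MS = [1113.7, 1078.82, 1046.49, 1120.89, 1030.43,
           939.03, 1025.64, 528.84, 1103.77]
ELAPSED_MS = 1153.82  # wall-clock elapsed, from the summary
WORKERS = 8           # forked workers; the main stream adds one more


def speedup(work_ms, elapsed_ms):
    """Speedup = sum(stream CPU time) / wall-clock elapsed."""
    return sum(work_ms) / elapsed_ms


def efficiency(work_ms, elapsed_ms, workers):
    """Efficiency = speedup / (workers + 1)."""
    return speedup(work_ms, elapsed_ms) / (workers + 1)


total_cpu = sum(WORK_MS)
s = speedup(WORK_MS, ELAPSED_MS)
e = efficiency(WORK_MS, ELAPSED_MS, WORKERS)
print(f"total CPU {total_cpu:.2f} ms, speedup {s:.2f}x, efficiency {e:.1%}")
```

The summed work times reproduce the reported total CPU figure, and the speedup rounds to the reported 7.79×. The last digit of the efficiency may differ from the 86.6% shown above, depending on whether the speedup is rounded before dividing by 9.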