CPU scaling benchmark

workers: 10 + 1 main
iters total: 500M (45454545/stream)
elapsed: 1153.66 ms
total CPU used: 10800.63 ms
speedup: 9.36× vs serial
efficiency: 85.1% of 11× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)         0     44.28        44.29    1139.62  1095.33             0
1            2.035      2.05        18.16    1099.07  1080.91           0.2
2            1.627      3.71        31.23    1149.64  1118.41         10.13
3             1.72      5.44        20.74     647.44    626.7          0.27
4            3.945      9.42        32.04    1083.81  1051.77          0.29
5            2.686     12.25        51.95     1091.4  1039.45           0.3
6            5.422      17.7        61.21      972.8   911.59          0.31
7            1.844     19.57        61.29     823.94   762.65          0.32
8            3.256     22.85         68.6    1054.96   986.36          0.33
9            1.624     24.49        62.89    1150.34  1087.45         12.96
10          19.766     44.27        101.2    1141.21  1040.01           1.7

[timeline chart: one bar per stream (main, w1 to w10); segments: fork+handshake, CPU work, parent reap wait]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1). 100% efficiency means perfect linear scaling; anything lower reflects the cost of serial fork setup, the reap tail, and SMT/core contention.
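
As a cross-check, the two formulas can be applied to the figures reported in this run. The sketch below is not the benchmark's own code; the function name `scaling_metrics` and the variable names are illustrative assumptions, and the inputs are copied from the "work ms" column and the "elapsed" figure above.

```python
# Per-stream CPU work times in ms, copied from the "work ms" column
# (stream 0 is main, streams 1 to 10 are the workers).
work_ms = [1095.33, 1080.91, 1118.41, 626.7, 1051.77, 1039.45,
           911.59, 762.65, 986.36, 1087.45, 1040.01]
elapsed_ms = 1153.66   # wall-clock elapsed, from the summary
workers = 10           # forked workers, not counting main


def scaling_metrics(work_ms, elapsed_ms, workers):
    """Hypothetical helper: apply the speedup and efficiency formulas."""
    total_cpu_ms = sum(work_ms)             # sum(stream CPU time)
    speedup = total_cpu_ms / elapsed_ms     # vs. running everything serially
    efficiency = speedup / (workers + 1)    # ideal is one full core per stream
    return total_cpu_ms, speedup, efficiency


total, speedup, eff = scaling_metrics(work_ms, elapsed_ms, workers)
print(f"total CPU {total:.2f} ms, speedup {speedup:.2f}x, efficiency {eff:.1%}")
```

With these inputs the summed work time comes to 10800.63 ms, which reproduces the reported 9.36× speedup and 85.1% efficiency.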