CPU scaling benchmark

workers:         16 + 1 main
iters total:     500M (29411764/stream)
elapsed:         1184.16 ms
total CPU used:  17096.92 ms
speedup:         14.44× vs serial
efficiency:      84.9% of 17× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)         0    101.72       101.73    1175.26  1073.53             0
1            2.222      2.24        15.87    1013.77    997.9          0.17
2            1.736         4        17.22    1126.81  1109.59          0.22
3             1.72      5.73        21.49     1001.9   980.41          0.24
4            4.876     10.63        52.97    1093.17   1040.2          0.26
5            2.123     12.77        54.73    1164.83   1110.1          0.27
6            5.686     18.46        68.72    1163.04  1094.32          0.28
7            2.088     20.57        62.65    1154.78  1092.13          0.29
8            1.736     22.33        47.24     903.04    855.8          0.31
9            1.598     23.97         56.4     978.16   921.76          0.32
10          21.132     45.12         91.5    1145.59  1054.09          0.33
11           2.337     47.47        99.89    1036.42   936.53          0.35
12           1.782     49.27        99.07    1147.22  1048.15          0.36
13          25.971     75.25       112.33     930.68   818.35          0.37
14           2.435      77.7       142.23    1137.05   994.82          0.38
15           1.942     79.65          175    1109.98   934.98          0.39
16          22.011    101.68       146.86    1181.12  1034.26          5.97
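
The summary figures can be re-derived from the per-stream table. The sketch below is only a cross-check, not the benchmark's own code: it copies the "work ms" column and the reported elapsed time by hand, and the variable names are illustrative.

```python
# "work ms" column from the table, streams 0 (main) through 16.
work_ms = [
    1073.53, 997.9, 1109.59, 980.41, 1040.2, 1110.1, 1094.32, 1092.13,
    855.8, 921.76, 1054.09, 936.53, 1048.15, 818.35, 994.82, 934.98,
    1034.26,
]
elapsed_ms = 1184.16          # reported wall-clock time

streams = len(work_ms)        # 16 workers + 1 main = 17
total_cpu_ms = sum(work_ms)   # matches "total CPU used"
speedup = total_cpu_ms / elapsed_ms
efficiency = speedup / streams

print(f"total CPU used: {total_cpu_ms:.2f} ms")  # 17096.92 ms
print(f"speedup: {speedup:.2f}x")                # 14.44x
print(f"efficiency: {efficiency:.1%}")           # 84.9%
```

The per-stream work times add up to exactly the reported total, so the speedup and efficiency lines follow directly from the table.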

[Timeline chart: one row per stream (main, w1 to w16). Legend: fork+handshake, CPU work, parent reap wait.]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed; here 17096.92 ms / 1184.16 ms = 14.44×. Efficiency = speedup / (workers + 1); here 14.44 / 17 = 84.9%. 100% efficiency means perfect linear scaling; anything below that is the cost of serial fork setup, the reap tail, and SMT/core contention.
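
As a rough illustration of the per-stream kernel, here is a minimal LCG loop together with the iteration split. The multiplier and increment are the common Numerical Recipes constants, chosen as an assumption because the page does not say which LCG is used. The names `lcg_stream`, `LCG_A` and `LCG_C` are hypothetical. Python will not keep the state in a register, so this shows the recurrence only, not the benchmark's performance.

```python
MASK32 = 0xFFFFFFFF
LCG_A = 1664525        # assumed multiplier (not given in the source)
LCG_C = 1013904223     # assumed increment (not given in the source)


def lcg_stream(seed: int, iters: int) -> int:
    """Advance a 32-bit LCG `iters` times and return the final state."""
    x = seed & MASK32
    for _ in range(iters):
        # One multiply, one add, one mask: the only state is x.
        x = (LCG_A * x + LCG_C) & MASK32
    return x


TOTAL_ITERS = 500_000_000
STREAMS = 17                                # 16 workers + 1 main
iters_per_stream = TOTAL_ITERS // STREAMS   # 29411764, as reported

# Small run only; the full per-stream count would be slow in Python.
print(lcg_stream(seed=1, iters=1000))
print(iters_per_stream)
```

Because each stream carries nothing but its own `x`, streams never touch shared data, which is why any loss of efficiency can be attributed to setup, teardown and core contention rather than to the memory system.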