CPU scaling benchmark

workers: 9 +1 main
iters total: 500M (50000000/stream)
elapsed: 1181.47 ms
total CPU used: 10035.16 ms
speedup: 8.49× vs serial
efficiency: 84.9% of 10× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)         0      35.9        36.19    1144.41  1108.22             0
1            2.181      2.19        14.93     934.65   919.72          0.15
2            1.538      3.75        15.86     934.63   918.77          0.24
3            2.904      6.67        17.04     986.97   969.93          0.25
4             2.33      9.01        48.11    1171.63  1123.52         28.23
5            7.852     16.87        49.49    1169.72  1120.23         25.43
6            2.173     19.06        48.48     956.26   907.78          0.27
7            5.324      24.4        60.38     988.32   927.94          0.28
8            1.598     26.01        48.16     994.11   945.95          0.29
9            9.873      35.9        85.58    1178.68   1093.1         34.36
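
The headline figures can be re-derived from the "work ms" column above, taking speedup as the sum of per-stream CPU time divided by wall-clock elapsed, and efficiency as speedup divided by the number of streams. The following is a small check of that arithmetic, not the benchmark's own code; the variable names are illustrative.

```python
# "work ms" per stream, copied from the table (stream 0 is main).
work_ms = [1108.22, 919.72, 918.77, 969.93, 1123.52,
           1120.23, 907.78, 927.94, 945.95, 1093.10]
elapsed_ms = 1181.47      # wall-clock elapsed, from the summary figures
streams = len(work_ms)    # 9 workers + 1 main = 10

total_cpu_ms = sum(work_ms)
speedup = total_cpu_ms / elapsed_ms
efficiency = speedup / streams

print(f"total CPU used: {total_cpu_ms:.2f} ms")   # 10035.16 ms
print(f"speedup:        {speedup:.2f}x")          # 8.49x
print(f"efficiency:     {efficiency:.1%}")        # 84.9%
```

The recomputed values agree with the reported 10035.16 ms, 8.49× and 84.9%.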
[timeline chart: one row per stream (main, w1 to w9); legend: fork+handshake, CPU work, parent reap wait]
what this measures
Each stream runs a tight integer LCG loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers+1). 100% efficiency means perfect linear scaling; anything below 100% reflects the cost of serial fork setup, the reap tail, and SMT/core contention.
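
For readers unfamiliar with the workload, the shape of one stream's loop might look roughly like the sketch below. It is an illustration only: the multiplier and increment are the well-known Numerical Recipes constants, chosen here as an assumption because the benchmark's own constants are not stated, and `lcg_stream` and `timed_stream` are hypothetical names. In Python the state does not live in a CPU register, so this shows the structure of the loop rather than its performance.

```python
import time

MASK32 = 0xFFFFFFFF  # keep the state within 32 bits, like a fixed-width register


def lcg_stream(seed: int, iters: int) -> int:
    """Advance a 32-bit LCG `iters` times and return the final state."""
    x = seed & MASK32
    for _ in range(iters):
        # Assumed constants (Numerical Recipes); not taken from the benchmark.
        x = (1664525 * x + 1013904223) & MASK32
    return x


def timed_stream(seed: int, iters: int) -> tuple[int, float]:
    """Run one stream and return (final state, CPU time consumed in ms)."""
    start = time.process_time()
    state = lcg_stream(seed, iters)
    cpu_ms = (time.process_time() - start) * 1000.0
    return state, cpu_ms


if __name__ == "__main__":
    state, cpu_ms = timed_stream(seed=1, iters=1_000_000)
    print(f"final state {state}, CPU {cpu_ms:.2f} ms")
```

Each iteration depends only on the previous value of `x`, so the loop touches no shared data and streams cannot interfere with one another through memory. That is why any loss of efficiency can be attributed to setup, teardown, and core contention rather than to the workload itself.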