CPU scaling benchmark

workers: 10 + 1 main
iters total: 500M (45454545 per stream)
elapsed: 1153.65 ms
total CPU used: 11553.94 ms
speedup: 10.02× vs serial
efficiency: 91.1% of 11× ideal
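The headline figures can be reproduced from the definitions given under "what this measures" (speedup = summed stream CPU time / elapsed, efficiency = speedup / (workers + 1)). The minimal Python sketch below uses only the numbers shown above. It assumes the 500M iterations are split evenly by integer division; the dashboard does not say where the 5 leftover iterations go. Computed from the unrounded speedup (about 10.015), efficiency comes to about 91.05%; the displayed 91.1% matches the rounded 10.02 / 11.

```python
# Headline figures copied from the summary above.
TOTAL_ITERS = 500_000_000
STREAMS = 10 + 1          # 10 forked workers plus the main process
ELAPSED_MS = 1153.65      # wall-clock elapsed
TOTAL_CPU_MS = 11553.94   # summed per-stream CPU time

# Assumption: even integer split; where the 5 leftover iterations go is not stated.
iters_per_stream = TOTAL_ITERS // STREAMS
speedup = TOTAL_CPU_MS / ELAPSED_MS    # average number of streams busy at once
efficiency = speedup / STREAMS         # fraction of the 11x ideal

print(f"{iters_per_stream}/stream, speedup {speedup:.2f}x, efficiency {efficiency:.2%}")
# prints: 45454545/stream, speedup 10.02x, efficiency 91.05%
```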
All values in ms; columns marked @ are timestamps measured from benchmark start.

stream     spawn   spawned@   work start@   work end@      work   reap wait
0 (main)       0      32.05         32.06     1134.22   1102.16           0
1           1.92       1.93         14.43      996.39    981.96        0.12
2           1.48       3.43         20.22     1131.97   1111.75        0.17
3          2.188       5.64          16.8     1091.88   1075.08        0.84
4          1.764       7.42         48.85     1067.56   1018.71        0.86
5          5.692      13.12         51.29     1145.05   1093.76       10.95
6          1.523      14.66         50.63     1113.81   1063.18        0.88
7          6.435      21.11         50.47     1027.77     977.3        0.89
8           1.53      22.65         70.45     1075.63   1005.18         0.9
9          1.395      24.06         72.23     1104.58   1032.35        0.92
10         7.971      32.05         58.23     1150.74   1092.51       16.64
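The table also shows where the missing efficiency goes. The minimal Python sketch below copies the rows above. It checks that work = work end@ minus work start@ for every stream and that the work column sums to the 11553.94 ms total. It then splits the remaining stream-time (11 × elapsed minus CPU work) into time before each stream started working and time after it finished. Because the @ columns start at zero, summing work start@ gives the pre-work share directly. Reading the pre-work share as serial fork setup and handshake, and the post-work share as the reap tail plus uneven finish times, is an interpretation. The dashboard does not state this.

```python
# (stream, work start@ ms, work end@ ms, work ms) copied from the table above.
ROWS = [
    ("main", 32.06, 1134.22, 1102.16),
    ("w1", 14.43, 996.39, 981.96),
    ("w2", 20.22, 1131.97, 1111.75),
    ("w3", 16.8, 1091.88, 1075.08),
    ("w4", 48.85, 1067.56, 1018.71),
    ("w5", 51.29, 1145.05, 1093.76),
    ("w6", 50.63, 1113.81, 1063.18),
    ("w7", 50.47, 1027.77, 977.3),
    ("w8", 70.45, 1075.63, 1005.18),
    ("w9", 72.23, 1104.58, 1032.35),
    ("w10", 58.23, 1150.74, 1092.51),
]
ELAPSED_MS = 1153.65

# Each row should satisfy work = work end@ - work start@ (to display rounding).
mismatches = [name for name, start, end, work in ROWS if abs((end - start) - work) > 0.01]

cpu_ms = sum(work for _, _, _, work in ROWS)                    # summed work intervals
pre_work_ms = sum(start for _, start, _, _ in ROWS)             # stream-time before work began
post_work_ms = sum(ELAPSED_MS - end for _, _, end, _ in ROWS)   # stream-time after work ended
capacity_ms = ELAPSED_MS * len(ROWS)                            # 11 streams x wall clock

print(f"mismatched rows: {mismatches}")
print(f"CPU work {cpu_ms:.2f} + pre-work {pre_work_ms:.2f} + post-work {post_work_ms:.2f}"
      f" = {capacity_ms:.2f} ms")
# prints: mismatched rows: []
# prints: CPU work 11553.94 + pre-work 485.66 + post-work 650.55 = 12690.15 ms
```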
[timeline figure: one bar each for main and w1 to w10, segmented into fork+handshake, CPU work, and parent reap wait]
what this measures
Each stream runs a tight integer LCG loop whose working set is a single CPU register: no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed, which treats the summed CPU time as the time a serial run would take. Efficiency = speedup / (workers + 1). 100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.
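The loop and the fork/reap timeline can be sketched in Python as below. This is a hypothetical harness, not the benchmark's own code. The LCG constants (Knuth's MMIX multiplier and increment), the pipe each child uses to report its timestamps (standing in for the handshake), and the names `lcg_loop` and `run_benchmark` are all assumptions. Like the table, it treats each stream's wall-clock work interval as its CPU time. It is POSIX only because it relies on `os.fork`. Python integers are heap objects, so this loop does not keep its state in a register. It illustrates the measurement, not the raw loop speed.

```python
import os
import time

MASK64 = (1 << 64) - 1
# Assumed constants (Knuth's MMIX LCG); the benchmark's actual constants are not stated.
A = 6364136223846793005
C = 1442695040888963407


def lcg_loop(iters: int, seed: int = 1) -> int:
    """Tight integer LCG loop; the only state is x, wrapped to 64 bits by the mask."""
    x = seed
    for _ in range(iters):
        x = (x * A + C) & MASK64
    return x


def run_benchmark(workers: int, total_iters: int) -> dict:
    """Hypothetical harness: fork `workers` children, let main work too, then reap."""
    streams = workers + 1
    per_stream = total_iters // streams
    t0 = time.monotonic()

    def ms(t: float) -> float:
        return (t - t0) * 1000.0  # ms since benchmark start, like the @ columns

    children = []  # (pid, read end of the child's report pipe)
    for _ in range(workers):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: work, report timestamps, exit without running parent code
            code = 1
            try:
                os.close(r)
                start = time.monotonic()
                lcg_loop(per_stream)
                end = time.monotonic()
                os.write(w, f"{ms(start)} {ms(end)}".encode())
                code = 0
            finally:
                os._exit(code)
        os.close(w)
        children.append((pid, r))

    # The main process is stream 0 and starts working only after all forks.
    main_start = time.monotonic()
    lcg_loop(per_stream)
    rows = [("main", ms(main_start), ms(time.monotonic()), 0.0)]

    for i, (pid, r) in enumerate(children, start=1):
        before = time.monotonic()
        os.waitpid(pid, 0)  # reap wait: how long the parent blocks on this child
        reap_ms = (time.monotonic() - before) * 1000.0
        data = b""
        while chunk := os.read(r, 256):
            data += chunk
        os.close(r)
        start_ms, end_ms = map(float, data.split())
        rows.append((f"w{i}", start_ms, end_ms, reap_ms))

    elapsed = ms(time.monotonic())
    cpu = sum(end - start for _, start, end, _ in rows)  # wall-clock work as CPU proxy
    speedup = cpu / elapsed
    return {"rows": rows, "elapsed_ms": elapsed, "cpu_ms": cpu,
            "speedup": speedup, "efficiency": speedup / streams}
```

For example, `run_benchmark(workers=10, total_iters=5_000_000)` mirrors the 10 + 1 layout at a much smaller iteration count. Its `rows` carry the work start@, work end@, and reap wait columns, and `speedup` and `efficiency` follow the definitions above.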