CPU scaling benchmark

workers: 10 + 1 main
iters total: 500M (45454545/stream)
elapsed: 1165.5 ms
total CPU used: 11811.1 ms
speedup: 10.13× vs serial
efficiency: 92.1% of 11× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)  0         47.83     47.85        1152.34    1104.49  0
1         2.681     2.7       33.22        1093.05    1059.83  0.15
2         2.301     5.03      50.49        1141.91    1091.42  0.21
3         2.131     7.19      27.28        1146.03    1118.75  0.22
4         3.431     10.64     30.33        1147.57    1117.24  0.24
5         2.159     12.82     80.44        1158.74    1078.3   6.48
6         2.036     14.89     47.35        1123.89    1076.54  0.9
7         25.66     40.56     80.46        1132.91    1052.45  0.92
8         2.712     43.31     90.48        1067.89    977.41   0.93
9         2.346     45.67     78.03        1162.92    1084.89  10.67
10        2.127     47.82     105.05       1154.83    1049.78  2.63
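
As a quick consistency check on the table, the sketch below recomputes each stream's work ms as work end@ minus work start@ and sums the column, which should match the reported total CPU used. The values are copied from the rows above; the names `ROWS` and `check_rows` are illustrative and not part of the benchmark.

```python
# (stream, work start@, work end@, work ms), copied from the table.
ROWS = [
    ("main", 47.85, 1152.34, 1104.49),
    ("w1", 33.22, 1093.05, 1059.83),
    ("w2", 50.49, 1141.91, 1091.42),
    ("w3", 27.28, 1146.03, 1118.75),
    ("w4", 30.33, 1147.57, 1117.24),
    ("w5", 80.44, 1158.74, 1078.30),
    ("w6", 47.35, 1123.89, 1076.54),
    ("w7", 80.46, 1132.91, 1052.45),
    ("w8", 90.48, 1067.89, 977.41),
    ("w9", 78.03, 1162.92, 1084.89),
    ("w10", 105.05, 1154.83, 1049.78),
]


def check_rows(rows, tol=0.005):
    """Return the streams whose work ms differs from end - start by more than tol."""
    return [name for name, start, end, work in rows
            if abs((end - start) - work) > tol]


mismatches = check_rows(ROWS)
total_cpu_ms = sum(work for _, _, _, work in ROWS)

print("mismatched rows:", mismatches)
print(f"total CPU used: {total_cpu_ms:.1f} ms")
```

With the figures above, no row is flagged and the sum comes to 11811.1 ms, the same value shown in the summary.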

[Timeline chart: one row per stream (main, w1 to w10). Legend: fork+handshake, CPU work, parent reap wait.]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1), where the +1 counts the main stream. 100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.
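
A minimal sketch of the kernel and the two formulas, under stated assumptions: the multiplier and increment are Knuth's MMIX constants, chosen only for illustration because the benchmark's own constants are not given here, and the function names `lcg_stream` and `scaling_metrics` are hypothetical. The metric inputs are the elapsed and total CPU figures reported at the top.

```python
MASK64 = (1 << 64) - 1  # emulate a single 64-bit register

# Assumed constants (Knuth's MMIX LCG); the benchmark's own are not stated.
A = 6364136223846793005
C = 1442695040888963407


def lcg_stream(seed: int, iters: int) -> int:
    """Tight integer loop whose only state is the running value x."""
    x = seed & MASK64
    for _ in range(iters):
        x = (x * A + C) & MASK64
    return x


def scaling_metrics(total_cpu_ms: float, elapsed_ms: float, streams: int):
    """Speedup = total CPU time / wall-clock; efficiency = speedup / streams."""
    speedup = total_cpu_ms / elapsed_ms
    efficiency = speedup / streams
    return speedup, efficiency


# 10 workers + 1 main = 11 streams.
speedup, efficiency = scaling_metrics(11811.1, 1165.5, streams=11)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
# prints "speedup 10.13x, efficiency 92.1%"
```

Because the loop touches no memory and shares nothing between streams, any shortfall from 11× in this model comes from scheduling and core contention rather than from cache or lock effects.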