CPU scaling benchmark

workers:         10 + 1 main
iters total:     500M (45454545/stream)
elapsed:         1186.65 ms
total CPU used:  11455.05 ms
speedup:         9.65× vs serial
efficiency:      87.7% of 11× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)  0         18.72     18.74        1020.68    1001.94  0
1         2.113     2.12      29.63        1177.15    1147.52  156.63
2         1.629     3.78      17.06        1145.37    1128.31  129.79
3         1.565     5.36      42.11        1114.45    1072.34  93.93
4         1.526     6.9       38.17        1155.35    1117.18  134.78
5         1.554     8.48      43.27        1062.72    1019.45  56.91
6         1.512     10.01     43.03        1137.06    1094.03  119.3
7         1.52      11.54     43.22        848.91     805.69   0.14
8         1.651     13.21     52.17        1004.09    951.92   0.2
9         1.496     14.73     56.21        1183.64    1127.43  163.09
10        3.969     18.71     59.99        1049.23    989.24   28.67
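
The derived values in the table can be cross-checked with a short script. This is a sketch only: it assumes the "@" columns are millisecond offsets from the start of the run, that "work ms" is work end minus work start, and that "total CPU used" is the sum of the per-stream "work ms" values. The names `ROWS` and `check_rows` are illustrative and not part of the benchmark.

```python
# (stream, work_start_ms, work_end_ms, work_ms) copied from the table above.
ROWS = [
    ("main", 18.74, 1020.68, 1001.94),
    ("w1", 29.63, 1177.15, 1147.52),
    ("w2", 17.06, 1145.37, 1128.31),
    ("w3", 42.11, 1114.45, 1072.34),
    ("w4", 38.17, 1155.35, 1117.18),
    ("w5", 43.27, 1062.72, 1019.45),
    ("w6", 43.03, 1137.06, 1094.03),
    ("w7", 43.22, 848.91, 805.69),
    ("w8", 52.17, 1004.09, 951.92),
    ("w9", 56.21, 1183.64, 1127.43),
    ("w10", 59.99, 1049.23, 989.24),
]


def check_rows(rows, tol=0.005):
    """Return the streams whose work ms differs from (end - start) by more than tol."""
    return [
        name
        for name, start, end, work in rows
        if abs((end - start) - work) > tol
    ]


# Assumption: total CPU used is the plain sum of the work ms column.
total_cpu_ms = sum(work for _, _, _, work in ROWS)
mismatches = check_rows(ROWS)
print(f"total CPU: {total_cpu_ms:.2f} ms, mismatched rows: {mismatches}")
```

Under these assumptions every row is consistent and the sum reproduces the reported 11455.05 ms.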

[timeline chart: one lane per stream (main, w1 to w10); legend: fork+handshake, CPU work, parent reap wait]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers+1). 100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail (the parent waiting for the slowest streams to finish), and SMT/core contention.
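
As a rough illustration of the definitions above, the sketch below shows what one stream's loop might look like and how the two headline figures follow from the reported totals. The benchmark's real LCG constants and implementation language are not given; the multiplier and increment used here are Knuth's MMIX constants, chosen only for illustration, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # keep the state within 64 bits, like a single register


def lcg_stream(seed: int, iters: int) -> int:
    """Advance a 64-bit LCG `iters` times; the state is the entire working set."""
    # Assumption: MMIX constants, not necessarily the ones the benchmark uses.
    x = seed & MASK64
    for _ in range(iters):
        x = (x * 6364136223846793005 + 1442695040888963407) & MASK64
    return x


def speedup(total_cpu_ms: float, elapsed_ms: float) -> float:
    """Speedup = sum(stream CPU time) / wall-clock elapsed."""
    return total_cpu_ms / elapsed_ms


def efficiency(speedup_x: float, workers: int) -> float:
    """Efficiency = speedup / (workers + 1); the +1 counts the main stream."""
    return speedup_x / (workers + 1)


# Figures reported above: 11455.05 ms total CPU, 1186.65 ms elapsed, 10 workers.
s = speedup(11455.05, 1186.65)
e = efficiency(s, 10)
print(f"speedup {s:.3f}x, efficiency {e:.4f}")
```

With the reported totals, the speedup comes out at about 9.65 and the efficiency at just under 0.88, which agrees with the figures shown above to within rounding of the last digit.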