CPU scaling benchmark

workers:         6 + 1 main
iters total:     500M (71428571 per stream)
elapsed:         1148.23 ms
total CPU used:  7336.44 ms
speedup:         6.39× vs serial
efficiency:      91.3% of 7× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)  0         15.22     15.23        1132.13    1116.9   0
1         2.041     2.06      19.1         1014.08    994.98   0.11
2         1.6       3.68      18.99        1145.29    1126.3   13.28
3         2.124     5.81      27.4         1025.19    997.79   0.17
4         1.935     7.76      32.11        1139.99    1107.88  7.95
5         1.46      9.24      32.8         995.31     962.51   0.19
6         5.958     15.21     39.93        1070.01    1030.08  0.22

[Timeline chart: one row per stream (main, w1 to w6); segments: fork+handshake, CPU work, parent reap wait]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed; in this run, 7336.44 ms / 1148.23 ms = 6.39×. Efficiency = speedup / (workers + 1); here, 6.39 / 7 = 91.3%. 100% efficiency means perfect linear scaling; anything below 100% reflects the cost of serial fork setup, the reap tail, and SMT/core contention.
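
The two formulas above can be checked against the reported numbers with a short Python sketch. The per-stream "work ms" values and the elapsed time are copied from the table and summary. The function names (`lcg_stream`, `scaling_metrics`) and the LCG constants are assumptions made for illustration, since the benchmark's actual loop and constants are not shown here.

```python
# Per-stream "work ms" values copied from the table (stream 0 is main).
WORK_MS = [1116.9, 994.98, 1126.3, 997.79, 1107.88, 962.51, 1030.08]
ELAPSED_MS = 1148.23  # wall-clock elapsed, from the summary
WORKERS = 6           # forked workers; main runs a seventh stream

# Assumption: a 64-bit LCG using Knuth's MMIX constants. The benchmark's
# real multiplier and increment are not given in the source.
MASK64 = (1 << 64) - 1
LCG_A = 6364136223846793005
LCG_C = 1442695040888963407


def lcg_stream(seed, iters):
    """Run `iters` LCG steps; the only state is the single value x."""
    x = seed & MASK64
    for _ in range(iters):
        x = (x * LCG_A + LCG_C) & MASK64
    return x


def scaling_metrics(work_ms, elapsed_ms, workers):
    """Return (total CPU ms, speedup, efficiency) as defined in the text."""
    total_cpu = sum(work_ms)              # sum(stream CPU time)
    speedup = total_cpu / elapsed_ms      # vs. running all streams serially
    efficiency = speedup / (workers + 1)  # fraction of ideal linear scaling
    return total_cpu, speedup, efficiency


if __name__ == "__main__":
    total_cpu, speedup, efficiency = scaling_metrics(WORK_MS, ELAPSED_MS, WORKERS)
    print(f"total CPU used: {total_cpu:.2f} ms")
    print(f"speedup:        {speedup:.2f}x")
    print(f"efficiency:     {efficiency * 100:.1f}%")
```

Summing the "work ms" column reproduces the reported total CPU figure, so in this run the table's per-stream work time appears to be the CPU time that feeds the speedup formula.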