CPU scaling benchmark

workers:         6 + 1 main
iters total:     500M (71428571/stream)
elapsed:         1149.59 ms
total CPU used:  7120.86 ms
speedup:         6.19× vs serial
efficiency:      88.4% of 7× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)  0         11.31     11.32        960.99     949.67   0
1         2.121     2.14      15.37        1132.98    1117.61  172.14
2         1.644     3.8       18.29        873.07     854.78   0.13
3         2.507     6.32      27           999.16     972.16   43.64
4         1.848     8.18      40.56        1080.23    1039.67  122.08
5         1.505     9.71      43.21        1146.78    1103.57  185.91
6         1.581     11.3      34.46        1117.86    1083.4   157.04

[timeline chart, rows: main, w1, w2, w3, w4, w5, w6; legend: fork+handshake, CPU work, parent reap wait]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1), where the +1 is the main process, which runs a stream of its own. 100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.
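
As a cross-check, the summary figures can be recomputed from the per-stream "work ms" values using the two formulas above. This is a minimal sketch, not the benchmark's own code: the function name scaling_metrics and the variable names are made up for illustration, and the inputs are copied by hand from the results on this page.

```python
# Per-stream "work ms" values copied from the results table (stream 0 is main).
work_ms = [949.67, 1117.61, 854.78, 972.16, 1039.67, 1103.57, 1083.40]
elapsed_ms = 1149.59  # wall-clock elapsed, as reported
workers = 6           # forked workers; main runs one more stream


def scaling_metrics(work_ms, elapsed_ms, workers):
    """Return (total CPU ms, speedup, efficiency) per the definitions above.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    total_cpu_ms = sum(work_ms)              # sum(stream CPU time)
    speedup = total_cpu_ms / elapsed_ms      # vs. running the streams serially
    efficiency = speedup / (workers + 1)     # fraction of ideal linear scaling
    return total_cpu_ms, speedup, efficiency


total_cpu_ms, speedup, efficiency = scaling_metrics(work_ms, elapsed_ms, workers)
print(f"total CPU used: {total_cpu_ms:.2f} ms")
print(f"speedup:        {speedup:.2f}x")
print(f"efficiency:     {efficiency:.1%} of {workers + 1}x ideal")
```

Computed from these rounded inputs, the efficiency can differ from the reported 88.4% in the last digit; the page presumably works from unrounded timings or truncates rather than rounds.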