CPU scaling benchmark

workers: 6 + 1 main
iters total: 500M (71428571/stream)
elapsed: 1160.07 ms
total CPU used: 7099.38 ms
speedup: 6.12× vs serial
efficiency: 87.4% of 7× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)  0         9.77      9.78         1135.65    1125.87  0
1         2.078     2.09      19.97        764.85     744.88   0.12
2         1.565     3.67      28.64        1025.18    996.54   0.18
3         1.511     5.19      28.38        1031.14    1002.76  0.19
4         1.501     6.71      35.69        1156.96    1121.27  21.43
5         1.51      8.24      30.78        1059.27    1028.49  0.21
6         1.512     9.76      40.67        1120.24    1079.57  0.23

[Timeline chart: one row per stream (main, w1 to w6). Legend: fork+handshake, CPU work, parent reap wait.]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers+1), where the +1 counts the main process as a stream. 100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.
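
As a cross-check, the two formulas can be applied to the reported figures. The following is a minimal Python sketch, not the benchmark's own code: the function name `scaling_metrics` and the variable names are illustrative assumptions, and the inputs are the per-stream "work ms" values and the elapsed time reported in this run.

```python
# Per-stream CPU work time in ms, copied from the "work ms" column
# (stream 0 is the main process, streams 1-6 are the forked workers).
work_ms = [1125.87, 744.88, 996.54, 1002.76, 1121.27, 1028.49, 1079.57]
elapsed_ms = 1160.07  # wall-clock elapsed, from the summary
workers = 6           # forked workers; main is the extra (+1) stream


def scaling_metrics(work_ms, elapsed_ms, workers):
    """Return (total CPU ms, speedup, efficiency) per the definitions above.

    Hypothetical helper for illustration only.
    """
    total_cpu_ms = sum(work_ms)            # sum(stream CPU time)
    speedup = total_cpu_ms / elapsed_ms    # vs. running all streams serially
    efficiency = speedup / (workers + 1)   # fraction of ideal linear scaling
    return total_cpu_ms, speedup, efficiency


total_cpu_ms, speedup, efficiency = scaling_metrics(work_ms, elapsed_ms, workers)
print(f"total CPU used: {total_cpu_ms:.2f} ms")
print(f"speedup:        {speedup:.2f}x")
print(f"efficiency:     {efficiency:.1%}")
```

With these inputs the sketch reproduces the summary figures: 7099.38 ms total CPU, a 6.12× speedup, and 87.4% efficiency.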