CPU scaling benchmark

workers:         6 + 1 main
iters total:     500M (71428571/stream)
elapsed:         1170.27 ms
total CPU used:  7053.75 ms
speedup:         6.03× vs serial
efficiency:      86.1% of 7× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)         0     17.25        17.26    1128.57  1111.31             0
1            2.055      2.07        14.64    1157.15  1142.51         28.72
2            1.598      3.69        17.26    1025.67  1008.41          0.11
3            2.962      6.67         17.3      743.4    726.1          0.17
4            4.734     11.42        37.29     984.45   947.16          0.18
5            2.065      13.5        44.53    1041.51   996.98           0.2
6            3.717     17.24        46.27    1167.55  1121.28         39.09
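
The derived columns in the table can be cross-checked with a few lines of Python. This is only a sketch, not the benchmark's own code: the tuples are copied from the table, and the names `ROWS` and `check_rows` are made up for illustration. It assumes that work ms is work end@ minus work start@ and that total CPU used is the sum of the work ms column, which the reported figures appear to bear out.

```python
# Rows copied from the table: (stream, work_start_ms, work_end_ms, work_ms).
ROWS = [
    ("0 (main)", 17.26, 1128.57, 1111.31),
    ("1", 14.64, 1157.15, 1142.51),
    ("2", 17.26, 1025.67, 1008.41),
    ("3", 17.30, 743.40, 726.10),
    ("4", 37.29, 984.45, 947.16),
    ("5", 44.53, 1041.51, 996.98),
    ("6", 46.27, 1167.55, 1121.28),
]


def check_rows(rows, tol=0.01):
    """Check work_ms == end - start for every row; return the summed work ms."""
    total = 0.0
    for stream, start, end, work in rows:
        if abs((end - start) - work) > tol:
            raise ValueError(f"stream {stream}: work ms does not match end - start")
        total += work
    return total


total_cpu_ms = check_rows(ROWS)
print(f"total CPU used: {total_cpu_ms:.2f} ms")
```

If the assumption holds, the printed total should agree with the 7053.75 ms reported in the summary.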

[Timeline chart: one row per stream (main, w1 to w6). Legend: fork+handshake, CPU work, parent reap wait.]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1). For this run, 7053.75 ms / 1170.27 ms = 6.03×, and 6.03 / 7 = 86.1%. 100% efficiency means perfect linear scaling; anything below 100% reflects the cost of serial fork setup, the reap tail, and SMT/core contention.
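
The two formulas, and the kind of loop each stream runs, can be sketched in Python. This is an illustration under stated assumptions rather than the benchmark's implementation: `lcg_stream` and `scaling_metrics` are hypothetical names, the source does not say which LCG constants or word size it uses (the Numerical Recipes constants and a 32-bit state are assumed here), and a Python loop will not stay in a single register the way a compiled one can. The per-stream times are taken from the work ms column of the table.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit state; the source does not give the width


def lcg_stream(seed, iters, a=1664525, c=1013904223):
    """Run `iters` steps of x -> (a*x + c) mod 2**32 and return the final state.

    The multiplier and increment are the Numerical Recipes constants,
    picked only for illustration.
    """
    x = seed & MASK32
    for _ in range(iters):
        x = (a * x + c) & MASK32
    return x


def scaling_metrics(stream_cpu_ms, elapsed_ms):
    """Return (speedup, efficiency) using the definitions in the text."""
    speedup = sum(stream_cpu_ms) / elapsed_ms
    # One entry per stream, so len() equals workers + 1 (the main stream).
    efficiency = speedup / len(stream_cpu_ms)
    return speedup, efficiency


# Per-stream CPU time (work ms) for main and workers 1 to 6.
work_ms = [1111.31, 1142.51, 1008.41, 726.10, 947.16, 996.98, 1121.28]
speedup, efficiency = scaling_metrics(work_ms, elapsed_ms=1170.27)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
```

With the table's numbers this yields the reported 6.03× speedup and 86.1% efficiency. Because speedup is computed from summed CPU time rather than from a separately timed serial run, it is an estimate of the gain over serial execution.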