CPU scaling benchmark

workers: 4 (+1 main)
iters total: 500M (100,000,000 per stream)
elapsed: 1156.82 ms
total CPU used: 5385.98 ms
speedup: 4.66× vs serial
efficiency: 93.2% of 5× ideal
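The speedup and efficiency figures follow directly from the elapsed and total-CPU numbers. Here is a short Python check of that arithmetic; the variable names are illustrative. Dividing the unrounded speedup by 5 gives about 93.1%. The 93.2% above equals the rounded 4.66 divided by 5.

```python
# Headline numbers from the stat cards above
elapsed_ms = 1156.82      # wall-clock time for the whole run
total_cpu_ms = 5385.98    # sum of the per-stream work times
streams = 4 + 1           # 4 forked workers plus the main process

speedup = total_cpu_ms / elapsed_ms   # estimated serial time / measured wall time
efficiency = speedup / streams        # fraction of the 5x ideal

print(f"speedup    {speedup:.2f}x")     # speedup    4.66x
print(f"efficiency {efficiency:.1%}")   # efficiency 93.1%
```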
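stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)  0         6.43      6.44         1040.12    1033.68  0
1         1.945     1.96      18.02        1127.66    1109.64  87.66
2         1.434     3.42      15.29        1084.39    1069.10  44.41
3         1.497     4.93      17.47        1077.66    1060.19  40.22
4         1.467     6.42      40.38        1153.75    1113.37  113.79

Columns marked @ are timestamps in ms from the start of the benchmark. work ms = work end@ - work start@. reap wait ms is the time the parent spent waiting to reap that worker.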
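The table also shows where the gap between ideal and measured scaling goes. The small Python sketch below copies its rows from the table and assumes that the @ columns are ms from benchmark start. It checks the work ms column, re-derives the total CPU figure, and splits the lost capacity (5 × elapsed minus total CPU) across the streams.

```python
# Rows copied from the table above: (stream, work start@, work end@, work ms)
ROWS = [
    ("main", 6.44, 1040.12, 1033.68),
    ("w1", 18.02, 1127.66, 1109.64),
    ("w2", 15.29, 1084.39, 1069.10),
    ("w3", 17.47, 1077.66, 1060.19),
    ("w4", 40.38, 1153.75, 1113.37),
]
ELAPSED_MS = 1156.82

# work ms is exactly work end@ - work start@
for name, start, end, work in ROWS:
    assert abs((end - start) - work) < 1e-6, name

total_cpu_ms = sum(work for *_, work in ROWS)
print(f"total CPU {total_cpu_ms:.2f} ms")  # total CPU 5385.98 ms

# Wall-clock time each stream spent outside its loop: the lag before
# work start@ plus the tail after work end@ (assumes @ is ms from start).
idle = {name: ELAPSED_MS - work for name, *_, work in ROWS}
for name, ms in idle.items():
    print(f"{name:>4} idle {ms:7.2f} ms")
print(f"lost capacity {sum(idle.values()):.2f} ms")  # lost capacity 398.12 ms
```

The main process shows the most idle time. It finishes its own stream first (at 1040.12 ms) and then waits about 114 ms for w4, which started its work last (at 40.38 ms) and finished last.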
[Timeline chart: one bar per stream (main, w1, w2, w3, w4), split into fork+handshake, CPU work, and parent reap wait.]
what this measures
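Each stream runs a tight integer LCG (linear congruential generator) loop. Its working set is a single CPU register, with no memory access and no shared data, so the streams do not compete for cache or memory bandwidth. Speedup = sum(stream CPU time) / wall-clock elapsed, which takes the total CPU time as the runtime a single serial stream would need. Efficiency = speedup / (workers + 1), because the main process runs a stream as well. 100% efficiency means perfect linear scaling. Anything below 100% is the cost of the serial fork setup, the reap tail (waiting for the slowest stream), and SMT/core contention.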
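The sketch below shows the structure described above in Python: fork the workers, run one stream in the parent too, reap the workers, then compute speedup and efficiency. It assumes a POSIX system (`os.fork`) and uses the 32-bit Numerical Recipes LCG constants as a stand-in, because the benchmark's own language, constants, handshake and timing method are not stated. The start handshake is omitted. In pure Python the loop state is a heap object rather than a register and the loop runs far slower, so the usage uses a much smaller iteration count. The sketch illustrates the fork/run/reap/measure flow, not the benchmark's actual numbers.

```python
import os
import struct
import time

# Stand-in LCG constants (the 32-bit Numerical Recipes pair); the
# benchmark's own constants are not given.
A, C, MASK = 1664525, 1013904223, 0xFFFFFFFF


def lcg_stream(iters: int, seed: int = 1) -> int:
    """Tight integer LCG loop; the only state is x."""
    x = seed
    for _ in range(iters):
        x = (A * x + C) & MASK
    return x


def timed_stream(iters: int) -> float:
    """Run one stream and return its work time in ms (work end - work start)."""
    start = time.perf_counter()
    lcg_stream(iters)
    return (time.perf_counter() - start) * 1000.0


def run(workers: int, iters: int) -> tuple[float, list[float]]:
    """Fork `workers` children, run one stream in the parent, then reap.

    Returns (elapsed_ms, work_ms per stream with the main process first)."""
    t0 = time.perf_counter()
    children = []
    for _ in range(workers):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                      # child: run its stream, report, exit
            code = 1
            try:
                os.close(r)
                os.write(w, struct.pack("d", timed_stream(iters)))
                code = 0
            finally:
                os._exit(code)            # never fall back into parent code
        os.close(w)                       # parent keeps only the read end
        children.append((pid, r))

    work = [timed_stream(iters)]          # the main process is stream 0
    for pid, r in children:               # reap workers in spawn order
        with os.fdopen(r, "rb") as f:
            work.append(struct.unpack("d", f.read(8))[0])
        os.waitpid(pid, 0)
    elapsed_ms = (time.perf_counter() - t0) * 1000.0
    return elapsed_ms, work


if __name__ == "__main__":
    workers = 4
    elapsed, work = run(workers, iters=1_000_000)
    speedup = sum(work) / elapsed
    print(f"elapsed {elapsed:.2f} ms, speedup {speedup:.2f}x, "
          f"efficiency {speedup / (workers + 1):.1%}")
```

The sketch uses separate processes rather than threads, so CPython's GIL cannot serialize the streams. Each child sends its work time back over a pipe, and the parent then reaps it with `waitpid`, mirroring the reap step in the table.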