CPU scaling benchmark

workers: 4 + 1 main
iters total: 500M (100,000,000 per stream)
elapsed: 1188.82 ms
total CPU used: 5359.89 ms
speedup: 4.51× vs serial
efficiency: 90.2% of 5× ideal
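The two headline ratios follow directly from the elapsed and total CPU figures above. Here is a minimal Python sketch of that arithmetic. The function name `scaling_metrics` is hypothetical and not part of the benchmark.

```python
def scaling_metrics(total_cpu_ms: float, elapsed_ms: float, streams: int) -> tuple[float, float]:
    """Return (speedup, efficiency) using this benchmark's definitions."""
    speedup = total_cpu_ms / elapsed_ms  # summed per-stream time over wall-clock time
    efficiency = speedup / streams       # fraction of the ideal N-times speedup
    return speedup, efficiency


# 4 workers + 1 main stream = 5 streams
speedup, efficiency = scaling_metrics(5359.89, 1188.82, streams=4 + 1)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
# prints: speedup 4.51x, efficiency 90.2%
```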
stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)  0         9.13      9.14         1044.37    1035.23  0
1         2.094     2.11      15.77        1028.88    1013.11  0.1
2         1.597     3.72      16.97        1107.24    1090.27  63
3         1.635     5.37      25.7         1185.71    1160.01  141.49
4         3.738     9.12      34.15        1095.42    1061.27  51.19

Columns ending in @ are timestamps in ms from the start of the run.
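Each row's work ms is its work end@ minus its work start@. The five work ms values sum to the 5359.89 ms reported as total CPU used. The short Python check below uses rows copied from the table, and its variable names are illustrative. It also computes how far apart the streams finish. That gap is roughly the window in which some cores sit idle while the parent waits to reap the last stream.

```python
# (stream, work start@, work end@) in ms, copied from the table above
rows = [
    ("0 (main)", 9.14, 1044.37),
    ("1", 15.77, 1028.88),
    ("2", 16.97, 1107.24),
    ("3", 25.70, 1185.71),
    ("4", 34.15, 1095.42),
]

# Per-stream work time is the interval between its start and end stamps.
work_ms = {name: end - start for name, start, end in rows}
total_cpu_ms = sum(work_ms.values())

# Spread between the first and last stream to finish.
ends = [end for _, _, end in rows]
tail_ms = max(ends) - min(ends)

for name, ms in work_ms.items():
    print(f"{name:>8}: {ms:8.2f} ms")
print(f"total: {total_cpu_ms:.2f} ms")  # should match "total CPU used"
print(f"end-time spread: {tail_ms:.2f} ms")
```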
[Figure: per-stream timeline for main and w1 to w4, with segments for fork+handshake, CPU work, and parent reap wait]
what this measures
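Each stream runs a tight integer LCG (linear congruential generator) loop. The working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed, where stream CPU time is the work ms column, which sums to the total CPU used figure. Efficiency = speedup / (workers + 1). 100% efficiency means perfect linear scaling. Anything below 100% reflects the serial fork setup, the reap tail (the parent waiting on the slowest stream), and SMT/core contention.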
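The sketch below shows the same structure in Python. It assumes a POSIX system with `os.fork`. Several details are assumptions rather than details taken from this benchmark: the LCG constants (Knuth's MMIX multiplier and increment), the function names `lcg_loop` and `run_benchmark`, and the use of `os.wait4` resource usage for per-child CPU time.

The sketch omits the fork handshake and the per-stream timestamps. It also reads measured CPU time, whereas the work ms column above is a wall-clock interval per stream. CPython's interpreter overhead means the loop is far from register-only. The sketch therefore illustrates the fork, work, and reap shape and the metric definitions, not comparable numbers.

```python
import os
import time

MASK64 = (1 << 64) - 1


def lcg_loop(iters: int, seed: int = 1) -> int:
    """Integer LCG: x = (a*x + c) mod 2**64, all state kept in the single int x.

    Constants are Knuth's MMIX LCG parameters (an assumption; the benchmark
    does not state which LCG it uses).
    """
    x = seed
    for _ in range(iters):
        x = (x * 6364136223846793005 + 1442695040888963407) & MASK64
    return x


def run_benchmark(workers: int, iters_per_stream: int) -> dict:
    t0 = time.perf_counter()
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: run one stream, then exit at once without parent cleanup.
            lcg_loop(iters_per_stream)
            os._exit(0)
        pids.append(pid)

    # The main process runs the "+1" stream while the workers run theirs.
    cpu0 = time.process_time()
    lcg_loop(iters_per_stream)
    stream_cpu_s = [time.process_time() - cpu0]

    # Reap each child; wait4 also returns that child's resource usage.
    for pid in pids:
        _, _, usage = os.wait4(pid, 0)
        stream_cpu_s.append(usage.ru_utime + usage.ru_stime)

    elapsed_s = time.perf_counter() - t0
    speedup = sum(stream_cpu_s) / elapsed_s
    return {
        "elapsed_ms": elapsed_s * 1000.0,
        "total_cpu_ms": sum(stream_cpu_s) * 1000.0,
        "speedup": speedup,
        "efficiency": speedup / (workers + 1),
    }


if __name__ == "__main__":
    result = run_benchmark(workers=4, iters_per_stream=1_000_000)
    print({k: round(v, 2) for k, v in result.items()})
```

The children exit with `os._exit` so that they never run the parent's buffered-output flushing or `atexit` handlers a second time.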