CPU scaling benchmark

workers:         8 + 1 main
iters total:     500M (55555555/stream)
elapsed:         1180.27 ms
total CPU used:  8835.87 ms
speedup:         7.49× vs serial
efficiency:      83.2% of 9× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)  0         18.8      18.81        1179.97    1161.16  0
1         2.094     2.11      15.38        1038.14    1022.76  0.1
2         1.436     3.57      15.67        1107.21    1091.54  0.16
3         2.036     5.62      16.87        991.37     974.5    0.18
4         2.463     8.1       43.02        1043.37    1000.35  0.2
5         6.087     14.2      51.55        1159.11    1107.56  0.22
6         1.535     15.75     41.56        913.11     871.55   0.23
7         1.502     17.27     61.51        1066.02    1004.51  0.24
8         1.498     18.78     56.22        658.16     601.94   0.26
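
The table's columns are related by simple arithmetic: each work ms value is work end@ minus work start@, and the work ms values add up to the total CPU used figure. The following is a minimal sketch of that cross-check, with the rows transcribed by hand from the table. The names `ROWS`, `check_rows`, `total_work_ms`, and `last_finisher` are made up for illustration and are not part of the benchmark.

```python
# Rows transcribed from the table above:
# (stream, work_start_ms, work_end_ms, work_ms). Labels w1..w8 are streams 1..8.
ROWS = [
    ("main", 18.81, 1179.97, 1161.16),
    ("w1", 15.38, 1038.14, 1022.76),
    ("w2", 15.67, 1107.21, 1091.54),
    ("w3", 16.87, 991.37, 974.50),
    ("w4", 43.02, 1043.37, 1000.35),
    ("w5", 51.55, 1159.11, 1107.56),
    ("w6", 41.56, 913.11, 871.55),
    ("w7", 61.51, 1066.02, 1004.51),
    ("w8", 56.22, 658.16, 601.94),
]


def check_rows(rows, tol=0.011):
    """Return the streams whose work_ms differs from end - start by more than tol."""
    return [
        name
        for name, start, end, work in rows
        if abs((end - start) - work) > tol
    ]


# Sum of per-stream work time; should match the "total CPU used" figure.
total_work_ms = round(sum(work for _, _, _, work in ROWS), 2)

# The stream with the latest work end@ determines when the run can finish.
last_finisher = max(ROWS, key=lambda row: row[2])[0]

print(check_rows(ROWS), total_work_ms, last_finisher)
```

With the values above, every row is consistent, the sum is 8835.87 ms, and the main stream finishes last at 1179.97 ms, just under the 1180.27 ms elapsed time.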
[Chart: one row per stream (main, w1 through w8); legend: fork+handshake, CPU work, parent reap wait]
What this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1), where the +1 counts the main stream. 100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.
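
The two formulas can be applied directly to the summary figures reported for this run. This is a small sketch, not the benchmark's own code; the function name `scaling_metrics` and its parameters are assumptions introduced here for illustration.

```python
def scaling_metrics(total_cpu_ms, elapsed_ms, workers):
    """Compute (speedup, efficiency) as defined in the text.

    speedup    = sum of per-stream CPU time / wall-clock elapsed
    efficiency = speedup / (workers + 1), the +1 being the main stream
    """
    streams = workers + 1
    speedup = total_cpu_ms / elapsed_ms
    efficiency = speedup / streams
    return speedup, efficiency


# Figures taken from the summary: 8835.87 ms total CPU, 1180.27 ms elapsed, 8 workers.
speedup, efficiency = scaling_metrics(8835.87, 1180.27, workers=8)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
# prints: speedup 7.49x, efficiency 83.2%
```

Read the other way, at 100% efficiency the same 8835.87 ms of CPU work spread evenly over 9 streams would finish in about 981.76 ms rather than 1180.27 ms.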