CPU scaling benchmark

workers: 9 + 1 main
iters total: 500M (50000000/stream)
elapsed: 1145.49 ms
total CPU used: 10265.96 ms
speedup: 8.96× vs serial
efficiency: 89.6% of 10× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)  0         24.44     24.45        1049.58    1025.13  0
1         2.096     2.11      21.25        1129       1107.75  79.53
2         1.577     3.72      38.47        1063.28    1024.81  13.82
3         1.687     5.43      26.51        713.6      687.09   0.17
4         1.628     7.09      32.39        1095.15    1062.76  50.61
5         1.652     8.76      34.04        1134.6     1100.56  85.17
6         1.82      10.6      46.58        1142.68    1096.1   93.22
7         1.689     12.32     61.02        1089.24    1028.22  45.04
8         9.736     22.08     54.11        1135.3     1081.19  87.8
9         2.31      24.41     64.11        1116.46    1052.35  66.99

[timeline chart: one row per stream (main, w1 to w9); legend: fork+handshake, CPU work, parent reap wait]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers+1). 100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention. For this run: 10265.96 ms / 1145.49 ms = 8.96×, and 8.96 / 10 = 89.6%.
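
As a rough cross-check of the two formulas, the sketch below recomputes the summary figures from the per-stream "work ms" column. It is written in Python for readability and is not the benchmark's own code. The names `scaling_metrics` and `lcg_stream` are hypothetical, and the LCG constants are the widely used Numerical Recipes values, chosen only for illustration since the page does not state which constants the benchmark uses.

```python
# Per-stream "work ms" values copied from the results table (main first, then streams 1..9).
work_ms = [1025.13, 1107.75, 1024.81, 687.09, 1062.76,
           1100.56, 1096.10, 1028.22, 1081.19, 1052.35]
elapsed_ms = 1145.49  # wall-clock time for the whole run
workers = 9           # forked workers; main runs one stream as well


def lcg_stream(seed: int, iters: int) -> int:
    """Illustrative tight integer loop with a single state variable.

    Assumption: 32-bit LCG with Numerical Recipes constants; the real
    benchmark's constants and word size are not given in the source.
    """
    x = seed
    for _ in range(iters):
        x = (x * 1664525 + 1013904223) & 0xFFFFFFFF
    return x


def scaling_metrics(work_ms, elapsed_ms, workers):
    """Return (total CPU ms, speedup, efficiency) using the formulas in the text."""
    total_cpu = sum(work_ms)               # sum(stream CPU time)
    speedup = total_cpu / elapsed_ms       # vs. running every stream serially
    efficiency = speedup / (workers + 1)   # fraction of ideal linear scaling
    return total_cpu, speedup, efficiency


total_cpu, speedup, efficiency = scaling_metrics(work_ms, elapsed_ms, workers)
print(f"total CPU used: {total_cpu:.2f} ms")
print(f"speedup: {speedup:.2f}x")
print(f"efficiency: {efficiency:.1%}")
```

The per-stream work times sum to the reported total CPU figure, so the speedup and efficiency follow directly from the table and the elapsed time.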