CPU scaling benchmark

workers:         9 + 1 main
iters total:     500M (50000000/stream)
elapsed:         1158.18 ms
total CPU used:  9711.45 ms
speedup:         8.39× vs serial
efficiency:      83.9% of 10× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)         0     24.53         24.6    1076.63  1052.03             0
1            1.977      1.99        26.01    1146.47  1120.46         72.35
2            1.614      3.63        18.41    1040.39  1021.98          0.13
3            1.634      5.28        20.25     613.35    593.1          0.19
4            1.555      6.85        28.45     829.52   801.07          0.21
5            4.377     11.24        59.43    1085.43     1026         11.58
6            2.055     13.31        65.63    1143.35  1077.72         66.88
7            1.548     14.87        65.66    1010.59   944.93          0.22
8            7.825     22.71        57.23    1155.41  1098.18          78.9
9            1.801     24.52        57.26    1033.24   975.98          0.23
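
The columns above are related by simple arithmetic, which the following Python sketch checks: each work ms value should equal work end@ minus work start@, and the work ms column should sum to the reported total CPU used. The figures are copied from the table; the variable names are illustrative and not part of the benchmark itself.

```python
# (stream, work_start_ms, work_end_ms, work_ms), copied from the table above.
rows = [
    (0, 24.6, 1076.63, 1052.03),
    (1, 26.01, 1146.47, 1120.46),
    (2, 18.41, 1040.39, 1021.98),
    (3, 20.25, 613.35, 593.1),
    (4, 28.45, 829.52, 801.07),
    (5, 59.43, 1085.43, 1026.0),
    (6, 65.63, 1143.35, 1077.72),
    (7, 65.66, 1010.59, 944.93),
    (8, 57.23, 1155.41, 1098.18),
    (9, 57.26, 1033.24, 975.98),
]

# Each work_ms should match (end - start) up to the display rounding.
for stream, start, end, work in rows:
    assert abs((end - start) - work) < 0.011, f"stream {stream} is inconsistent"

# Summing the per-stream work times reproduces "total CPU used".
total_cpu_ms = round(sum(work for *_, work in rows), 2)
print(total_cpu_ms)  # 9711.45
```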
[Timeline chart: one bar per stream (main, w1 to w9), segmented into fork+handshake, CPU work, and parent reap wait.]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1). 100% efficiency means perfect linear scaling; anything below 100% reflects the cost of serial fork setup, the reap tail, and SMT/core contention.
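
As a rough illustration of the two ideas above, the Python sketch below shows an LCG step whose entire state is one 64-bit integer, plus the speedup and efficiency formulas applied to the reported totals. The multiplier and increment are Knuth's MMIX constants, picked here as an assumption; the benchmark's actual constants, language, and function names are not given, so `lcg_stream`, `speedup`, and `efficiency` are hypothetical names.

```python
MASK64 = (1 << 64) - 1  # keep the state within 64 bits, like a CPU register

# Assumed constants (Knuth's MMIX LCG); the benchmark's own are not stated.
LCG_A = 6364136223846793005
LCG_C = 1442695040888963407


def lcg_stream(seed: int, iters: int) -> int:
    """Advance a 64-bit LCG `iters` times; the only state is `x`."""
    x = seed & MASK64
    for _ in range(iters):
        x = (x * LCG_A + LCG_C) & MASK64
    return x


def speedup(stream_cpu_ms, elapsed_ms: float) -> float:
    """Speedup = sum(stream CPU time) / wall-clock elapsed."""
    return sum(stream_cpu_ms) / elapsed_ms


def efficiency(stream_cpu_ms, elapsed_ms: float, workers: int) -> float:
    """Efficiency = speedup / (workers + 1); the +1 counts the main stream."""
    return speedup(stream_cpu_ms, elapsed_ms) / (workers + 1)


# Reported totals: 9711.45 ms of CPU over 1158.18 ms elapsed, 9 workers + main.
s = speedup([9711.45], 1158.18)
e = efficiency([9711.45], 1158.18, workers=9)
print(f"{s:.2f}x")        # 8.39x
print(f"{e * 100:.1f}%")  # 83.9%
```

A pure-Python loop is far slower per iteration than a compiled one, so this only illustrates the shape of the work, not its timing.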