CPU scaling benchmark

workers: 10 + 1 main
iters total: 500M (45,454,545 per stream)
elapsed: 1224.43 ms
total CPU used: 11787.45 ms
speedup: 9.63× vs serial
efficiency: 87.5% of 11× ideal
Per-stream timings (all values in ms; spawned@, work start@ and work end@ are offsets from the start of the run):

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)         0     30.49         30.5    1126.85  1096.35             0
1            1.871      1.89        13.68    1040.27  1026.59          0.14
2            1.498      3.41        15.67    1164.18  1148.51         37.45
3            2.588      6.01        20.92    1150.62   1129.7         28.62
4            2.041      8.07         26.1    1053.54  1027.44           0.2
5            4.705      12.8        41.54    1150.25  1108.71         28.56
6            5.527     18.34        56.95    1122.11  1065.16          0.22
7            1.504     19.86        66.87    1179.58  1112.71         52.86
8            7.658     27.53        55.55    1061.66  1006.11          0.24
9            1.524     29.08        58.95     972.77   913.82          0.25
10           1.382     30.48        69.33    1221.68  1152.35         94.97
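The summary figures can be re-derived from the table. A minimal Python check, with the values copied from the table rows (the function name summarize is hypothetical, not from the page): work ms should equal work end@ minus work start@, total CPU used is the sum of the work ms column, speedup divides that sum by the 1224.43 ms elapsed, and efficiency divides speedup by the 11 streams.

```python
# Per-stream rows copied from the table: (stream, work start@, work end@, work ms), all in ms.
ROWS = [
    ("main", 30.5, 1126.85, 1096.35),
    ("w1", 13.68, 1040.27, 1026.59),
    ("w2", 15.67, 1164.18, 1148.51),
    ("w3", 20.92, 1150.62, 1129.7),
    ("w4", 26.1, 1053.54, 1027.44),
    ("w5", 41.54, 1150.25, 1108.71),
    ("w6", 56.95, 1122.11, 1065.16),
    ("w7", 66.87, 1179.58, 1112.71),
    ("w8", 55.55, 1061.66, 1006.11),
    ("w9", 58.95, 972.77, 913.82),
    ("w10", 69.33, 1221.68, 1152.35),
]
ELAPSED_MS = 1224.43


def summarize(rows, elapsed_ms):
    """Return (total CPU ms, speedup, efficiency) using the page's definitions."""
    for name, start, end, work in rows:
        # work ms should equal work end@ - work start@, up to the table's 0.01 ms rounding
        if abs((end - start) - work) > 0.01:
            raise ValueError(f"{name}: work ms does not match end - start")
    total_cpu = sum(work for _, _, _, work in rows)
    speedup = total_cpu / elapsed_ms
    efficiency = speedup / len(rows)  # len(rows) == workers + 1 == 11
    return total_cpu, speedup, efficiency


total_cpu, speedup, efficiency = summarize(ROWS, ELAPSED_MS)
print(f"total CPU {total_cpu:.2f} ms, speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
# prints: total CPU 11787.45 ms, speedup 9.63x, efficiency 87.5%
```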
Timeline chart: one row per stream (main, w1 to w10); segments show fork+handshake, CPU work, and parent reap wait.
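The timeline's segments suggest where the 12.5% shortfall from ideal goes. A rough core-time accounting in Python, assuming (the page does not say this) that each of the 11 streams holds one core for the whole 1224.43 ms window: time before a stream's work start@ is charged to startup (spawning and handshake), and time after its work end@ is charged to the tail (waiting on slower streams, then reap).

```python
# Rough core-time accounting, ASSUMING each of the 11 streams owns one core for the
# whole elapsed window. (work start@, work end@) in ms, main first, then w1..w10.
ELAPSED_MS = 1224.43
SPANS = [
    (30.5, 1126.85), (13.68, 1040.27), (15.67, 1164.18), (20.92, 1150.62),
    (26.1, 1053.54), (41.54, 1150.25), (56.95, 1122.11), (66.87, 1179.58),
    (55.55, 1061.66), (58.95, 972.77), (69.33, 1221.68),
]

budget = len(SPANS) * ELAPSED_MS                  # core-ms available to all streams
startup = sum(start for start, _ in SPANS)        # before work begins: spawn + handshake
tail = sum(ELAPSED_MS - end for _, end in SPANS)  # after work ends: slower streams + reap
work = sum(end - start for start, end in SPANS)   # inside the LCG loop

print(f"startup {startup / budget:.1%}, tail {tail / budget:.1%}, work {work / budget:.1%}")
# prints: startup 3.4%, tail 9.1%, work 87.5%
```

On these numbers most of the shortfall sits in the tail (about 9.1% of core-time) rather than in startup (about 3.4%). The tail also absorbs differences in per-stream work time, such as 913.82 ms for w9 against 1152.35 ms for w10, which is where SMT/core contention would show up, so it is not pure reap overhead.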
what this measures
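Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed, where each stream's CPU time is its work ms and the sum is the total CPU used; it estimates how many times faster the run is than executing all streams one after another. Efficiency = speedup / (workers + 1), here speedup divided by 11. 100% efficiency means perfect linear scaling; anything below 100% is lost to serial fork setup, the reap tail (the parent waiting for the slowest streams to finish), and SMT/core contention.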
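A minimal sketch of how a fork-based benchmark of this shape could be structured, assuming a POSIX system where os.fork is available. The names lcg_loop and run_benchmark, the LCG constants (Knuth's MMIX multiplier and increment), and the pipe used to report each child's work time are all assumptions; the page does not show its implementation. In CPython the loop state lives in heap objects rather than a register, so this reproduces the bookkeeping (fork, per-stream work time, reap, speedup, efficiency), not the register-only working set.

```python
import os
import struct
import time

MASK64 = (1 << 64) - 1


def lcg_loop(iters: int, seed: int = 1) -> int:
    """Tight integer LCG loop (assumed constants: Knuth's MMIX multiplier/increment)."""
    x = seed
    for _ in range(iters):
        x = (x * 6364136223846793005 + 1442695040888963407) & MASK64
    return x


def run_benchmark(workers: int, iters_total: int) -> dict:
    """Hypothetical harness: fork `workers` children, main runs one stream too."""
    streams = workers + 1
    per_stream = iters_total // streams  # e.g. 500M // 11 == 45454545
    t0 = time.perf_counter()
    children = []
    for _ in range(workers):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: run its share, report work time, never return
            try:
                os.close(r)
                start = time.perf_counter()
                lcg_loop(per_stream)
                os.write(w, struct.pack("d", time.perf_counter() - start))
            finally:
                os._exit(0)
        os.close(w)  # parent keeps only the read end
        children.append((pid, r))

    start = time.perf_counter()  # main stream starts after all workers are spawned
    lcg_loop(per_stream)
    work_times = [time.perf_counter() - start]

    for pid, r in children:  # reap in spawn order, collecting each work time
        data = os.read(r, 8)
        os.close(r)
        os.waitpid(pid, 0)
        work_times.append(struct.unpack("d", data)[0])

    elapsed = time.perf_counter() - t0
    cpu = sum(work_times)
    speedup = cpu / elapsed
    return {"elapsed_s": elapsed, "cpu_s": cpu,
            "speedup": speedup, "efficiency": speedup / streams}
```

For example, run_benchmark(10, 500_000) uses the same 10 workers + 1 main layout as the run above at a much smaller iteration count. Because every stream's work time is measured inside the elapsed window, efficiency computed this way cannot exceed 100%.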