CPU scaling benchmark

workers:         10 + 1 main
iters total:     500M (45454545/stream)
elapsed:         1141.69 ms
total CPU used:  10955.86 ms
speedup:         9.6× vs serial
efficiency:      87.3% of 11× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)  0         28.12     28.12        1134.07    1105.95  0
1         1.561     1.57      17.94        1134.75    1116.81  4.61
2         1.369     2.96      18.78        1133.51    1114.73  0.15
3         1.448     4.42      33.26        971.86     938.6    4.49
4         1.409     5.85      30.32        1038.44    1008.12  4.51
5         1.48      7.34      38.22        1010.92    972.7    4.53
6         1.441     8.8       47.88        773.43     725.55   4.54
7         1.471     10.29     54.22        1107.67    1053.45  4.55
8         1.93      12.24     50.06        1139.17    1089.11  5.2
9         1.445     13.7      47.41        1021.08    973.67   4.56
10        14.395    28.11     55.29        912.46     857.17   4.59

[Timeline chart: one row per stream (main, w1 to w10). Legend: fork+handshake, CPU work, parent reap wait.]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1). 100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.
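
As a rough check of the two formulas, the sketch below recomputes speedup and efficiency from the per-stream "work ms" column and the elapsed time reported above. It assumes that "stream CPU time" corresponds to the "work ms" column; the function name `scaling_metrics` is a hypothetical helper and not part of the benchmark itself.

```python
# Per-stream "work ms" values copied from the table (stream 0 is main).
# Assumption: these are the "stream CPU time" terms in the speedup formula.
work_ms = [1105.95, 1116.81, 1114.73, 938.6, 1008.12, 972.7,
           725.55, 1053.45, 1089.11, 973.67, 857.17]
elapsed_ms = 1141.69  # wall-clock elapsed, as reported


def scaling_metrics(work_ms, elapsed_ms):
    """Return (total CPU ms, speedup, efficiency) for one benchmark run."""
    total_cpu_ms = sum(work_ms)
    speedup = total_cpu_ms / elapsed_ms
    # Ideal speedup equals the number of streams (workers + 1 main).
    efficiency = speedup / len(work_ms)
    return total_cpu_ms, speedup, efficiency


total_cpu_ms, speedup, efficiency = scaling_metrics(work_ms, elapsed_ms)
print(f"total CPU used: {total_cpu_ms:.2f} ms")
print(f"speedup:        {speedup:.1f}x")
print(f"efficiency:     {efficiency:.1%}")
```

The summed work times reproduce the reported total CPU figure, and the speedup rounds to the reported 9.6×. The last digit of the efficiency can differ slightly from the reported 87.3% depending on whether the speedup is rounded before dividing by 11.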