CPU scaling benchmark

workers: 10 + 1 main
iters total: 500M (45454545/stream)
elapsed: 1156.48 ms
total CPU used: 11361.23 ms
speedup: 9.82× vs serial
efficiency: 89.3% of 11× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)  0         27.3      27.32        968.51     941.19   0
1         2.065     2.12      28.19        1124.3     1096.11  158.26
2         1.534     3.68      21.42        1145.43    1124.01  176.98
3         1.599     5.3       23.45        928.4      904.95   0.45
4         1.609     6.92      35.18        1153.55    1118.37  185.11
5         1.522     8.46      41.12        1060.63    1019.51  92.2
6         1.582     10.06     45.24        975.75     930.51   16.54
7         1.778     11.85     75.13        1102.35    1027.22  133.92
8         3.477     15.35     61.57        1127.49    1065.92  161.05
9         1.542     16.92     60.15        1136.91    1076.76  168.48
10        10.358    27.29     55.15        1111.83    1056.68  148.4

[Timeline chart: one row per stream (main, w1 to w10). Legend: fork+handshake, CPU work, parent reap wait.]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1), where the +1 counts the main process, which also runs a stream. 100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.
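
As a check on the arithmetic, the two formulas can be applied to the numbers reported in this run. The sketch below is illustrative only and is not the benchmark's own code: the function name `scaling` and the variable names are assumptions, while the values are copied from the "work ms" column and the elapsed figure in the summary.

```python
# Per-stream "work ms" values from the table: main first, then w1..w10.
work_ms = [941.19, 1096.11, 1124.01, 904.95, 1118.37, 1019.51,
           930.51, 1027.22, 1065.92, 1076.76, 1056.68]
elapsed_ms = 1156.48  # wall-clock elapsed, from the summary


def scaling(work_ms, elapsed_ms):
    """Return (total CPU ms, speedup, efficiency) for one run.

    Hypothetical helper: applies the formulas from the text,
    counting every stream (workers plus main) in the ideal.
    """
    total_cpu = sum(work_ms)               # sum(stream CPU time)
    speedup = total_cpu / elapsed_ms       # vs. running the streams serially
    efficiency = speedup / len(work_ms)    # len(work_ms) == workers + 1
    return total_cpu, speedup, efficiency


total_cpu, speedup, efficiency = scaling(work_ms, elapsed_ms)
print(f"total CPU {total_cpu:.2f} ms, "
      f"speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
```

With these inputs the results agree with the reported summary: 11361.23 ms total CPU, 9.82× speedup, and 89.3% efficiency against the 11× ideal.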