CPU scaling benchmark

workers: 9 + 1 main
iters total: 500M (50,000,000/stream)
elapsed: 1162.5 ms
total CPU used: 10440.66 ms
speedup: 8.98× vs serial
efficiency: 89.8% of 10× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)         0     27.13        27.14    1132.13  1104.99             0
1            1.735      1.75        18.23    1045.02  1026.79          0.13
2            1.381      3.15        15.37     1051.3  1035.93          0.19
3            1.455      4.62        17.67    1040.16  1022.49          0.21
4            4.326      8.96         41.1    1068.34  1027.24          0.22
5            1.415     10.38        47.28    1139.65  1092.37          7.66
6            4.486     14.88        50.91    1038.32   987.41          0.23
7            1.521     16.42        40.94     1050.1  1009.16          0.24
8            1.436     17.86        60.85    1159.55   1098.7         27.56
9            9.246     27.12        64.34    1099.92  1035.58          0.25

[Timeline chart: one row per stream (main, w1 to w9); legend: fork+handshake, CPU work, parent reap wait]

what this measures
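Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed, where the summed CPU time stands in for the serial run time. Efficiency = speedup / (workers + 1), counting the main stream alongside the workers. 100% efficiency means perfect linear scaling; anything below 100% reflects the cost of serial fork setup, the reap tail, and SMT/core contention.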
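
As a quick check of the definitions above, the headline figures can be recomputed from the per-stream "work ms" column. This is a minimal sketch in Python, not the benchmark's own code; the function name `scaling_metrics` and its variable names are illustrative assumptions, and the input values are copied from the table.

```python
# Per-stream "work ms" values copied from the results table (main first, then w1..w9).
work_ms = [1104.99, 1026.79, 1035.93, 1022.49, 1027.24,
           1092.37, 987.41, 1009.16, 1098.70, 1035.58]
elapsed_ms = 1162.5   # wall-clock elapsed, from the summary
workers = 9           # forked workers; main runs one more stream


def scaling_metrics(work_ms, elapsed_ms, workers):
    """Hypothetical helper: apply the speedup and efficiency formulas."""
    total_cpu_ms = sum(work_ms)              # sum(stream CPU time)
    speedup = total_cpu_ms / elapsed_ms      # vs. running the streams serially
    efficiency = speedup / (workers + 1)     # ideal speedup is workers + 1
    return total_cpu_ms, speedup, efficiency


total_cpu_ms, speedup, efficiency = scaling_metrics(work_ms, elapsed_ms, workers)
print(f"total CPU used: {total_cpu_ms:.2f} ms")
print(f"speedup:        {speedup:.2f}x")
print(f"efficiency:     {efficiency:.1%}")
```

With these inputs the recomputed values agree with the reported 10440.66 ms, 8.98× and 89.8%, which confirms that "total CPU used" is the sum of the "work ms" column.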