CPU scaling benchmark

workers: 8 + 1 main
iters total: 500M (55555555/stream)
elapsed: 1159.36 ms
total CPU used: 9282.04 ms
speedup: 8.01× vs serial
efficiency: 89% of 9× ideal
stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)         0     35.44        35.45    1094.58  1059.13             0
1            1.745      1.76        15.11    1110.35  1095.24         15.91
2            1.401      3.18        16.42    1143.95  1127.53         49.48
3            1.424      4.62        23.23     833.46   810.23          0.13
4             1.56       6.2        47.55     1122.5  1074.95         28.06
5            8.788     15.01        60.28    1156.66  1096.38          62.2
6            1.772     16.79        51.79       1098  1046.21          6.01
7            1.555     18.36        48.58    1082.42  1033.84          0.21
8           17.046     35.42        62.06    1000.59   938.53          0.25
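As a sanity check on the table above, the sketch below recomputes each stream's work ms as work end@ minus work start@ and sums the results. That relationship is inferred from the numbers, not stated by the benchmark, and the name `rows` is illustrative only.

```python
# (work start@, work end@, work ms) per stream, copied from the table above.
# Assumption: all three values are milliseconds on the same clock.
rows = {
    "0 (main)": (35.45, 1094.58, 1059.13),
    "1": (15.11, 1110.35, 1095.24),
    "2": (16.42, 1143.95, 1127.53),
    "3": (23.23, 833.46, 810.23),
    "4": (47.55, 1122.5, 1074.95),
    "5": (60.28, 1156.66, 1096.38),
    "6": (51.79, 1098.0, 1046.21),
    "7": (48.58, 1082.42, 1033.84),
    "8": (62.06, 1000.59, 938.53),
}

total_cpu_ms = 0.0
for stream, (start, end, work_ms) in rows.items():
    # Each reported work ms should equal end - start, up to rounding.
    assert abs((end - start) - work_ms) < 0.005, stream
    total_cpu_ms += work_ms

print(f"total CPU used: {total_cpu_ms:.2f} ms")  # prints "total CPU used: 9282.04 ms"
```

Under that assumption, the per-stream work times add up to the reported total CPU used, 9282.04 ms.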
[Timeline chart: one row per stream (main, w1 to w8); legend: fork+handshake, CPU work, parent reap wait]
what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers+1), since the main process also runs a stream. 100% efficiency means perfect linear scaling; anything below 100% reflects the cost of serial fork setup, the reap tail, and SMT/core contention.
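The two formulas can be checked against the reported summary figures. This is a minimal sketch; the function names `speedup` and `efficiency` are illustrative and not part of the benchmark itself.

```python
def speedup(total_cpu_ms: float, elapsed_ms: float) -> float:
    """Sum of per-stream CPU time divided by wall-clock elapsed time."""
    return total_cpu_ms / elapsed_ms


def efficiency(speedup_x: float, workers: int) -> float:
    """Speedup relative to the ideal of workers + 1 streams (main also works)."""
    return speedup_x / (workers + 1)


# Figures from the summary: 9282.04 ms total CPU, 1159.36 ms elapsed, 8 workers.
s = speedup(9282.04, 1159.36)
e = efficiency(s, 8)
print(f"speedup {s:.2f}x, efficiency {e:.0%}")  # prints "speedup 8.01x, efficiency 89%"
```

The result matches the reported 8.01× speedup and 89% of the 9× ideal.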