CPU scaling benchmark

workers: 8 + 1 main
iters total: 500M (55555555/stream)
elapsed: 1199.59 ms
total CPU used: 9283.89 ms
speedup: 7.74× vs serial
efficiency: 86% of 9× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)         0     20.32        20.34    1069.87  1049.53             0
1            4.278       4.3        16.97    1063.55  1046.58          0.21
2             1.38      5.69        59.52    1084.57  1025.05         14.85
3            1.191       6.9         33.4    1077.69  1044.29          7.91
4            1.123      8.03        59.47     857.84   798.37          0.28
5            1.209      9.26        42.34    1197.14   1154.8        127.42
6            1.602     10.88        39.49     1082.6  1043.11         14.43
7            1.197      12.1        47.92    1165.13  1117.21         95.42
8            8.198     20.31        69.43    1074.38  1004.95           4.6

[Timeline chart: one row per stream (main, w1 to w8). Legend: fork+handshake, CPU work, parent reap wait.]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers+1). 100% efficiency means perfect linear scaling; anything below 100% reflects the cost of serial fork setup, the reap tail, and SMT/core contention.
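
As a quick check of the two formulas, the sketch below recomputes the summary figures from the per-stream "work ms" column and the elapsed time reported above. The variable names are illustrative only and are not taken from the benchmark's own source; the numbers are copied from this run.

```python
# Per-stream "work ms" values copied from the table (main first, then streams 1..8).
work_ms = [1049.53, 1046.58, 1025.05, 1044.29, 798.37,
           1154.80, 1043.11, 1117.21, 1004.95]
elapsed_ms = 1199.59   # wall-clock elapsed for the whole run
workers = 8            # forked workers; the main process is the 9th stream

# Speedup = sum(stream CPU time) / wall-clock elapsed
total_cpu_ms = sum(work_ms)
speedup = total_cpu_ms / elapsed_ms

# Efficiency = speedup / (workers + 1)
efficiency = speedup / (workers + 1)

print(f"total CPU used: {total_cpu_ms:.2f} ms")
print(f"speedup: {speedup:.2f}x")
print(f"efficiency: {efficiency:.0%}")
```

The per-stream times sum to the reported 9283.89 ms, which gives the 7.74× speedup and 86% efficiency shown in the summary.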