CPU scaling benchmark

workers: 8 + 1 main
iters total: 500M (55555555/stream)
elapsed: 1163.89 ms
total CPU used: 9196.8 ms
speedup: 7.9× vs serial
efficiency: 87.8% of 9× ideal

stream     spawn ms   spawned@   work start@   work end@   work ms   reap wait ms
0 (main)   0          19.21      19.22         1163.57     1144.35   0
1          1.985      2          20.16         868.56      848.4     0.11
2          1.667      3.69       31.55         1066.49     1034.94   0.17
3          1.605      5.31       17.67         1111.32     1093.65   0.2
4          1.611      6.94       40.28         1137.5      1097.22   0.21
5          1.48       8.43       51.15         1136.11     1084.96   0.23
6          1.505      9.95       45.67         1094.76     1049.09   0.25
7          1.537      11.51      42.62         964.08      921.46    0.26
8          7.675      19.2       57            979.73      922.73    0.28

[Timeline chart: one bar per stream (main, w1 to w8); segments: fork+handshake, CPU work, parent reap wait]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is one CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1). 100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.
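
As a rough check of the two formulas, the sketch below recomputes the headline numbers from the per-stream figures. It assumes that the "work ms" column is the per-stream CPU time being summed (the page does not say so explicitly, although the column does add up to the reported total), and the function name `scaling_metrics` is made up for illustration.

```python
# Per-stream "work ms" values copied from the table (main first, then streams 1..8).
# Assumption: this column is the per-stream CPU time used in the speedup formula.
work_ms = [1144.35, 848.4, 1034.94, 1093.65, 1097.22,
           1084.96, 1049.09, 921.46, 922.73]
elapsed_ms = 1163.89  # wall-clock elapsed, from the summary


def scaling_metrics(work_ms, elapsed_ms):
    """Return (total CPU ms, speedup, efficiency) for a list of stream times."""
    total_cpu_ms = sum(work_ms)
    speedup = total_cpu_ms / elapsed_ms      # sum(stream CPU time) / wall clock
    efficiency = speedup / len(work_ms)      # len(work_ms) == workers + 1
    return total_cpu_ms, speedup, efficiency


total_cpu_ms, speedup, efficiency = scaling_metrics(work_ms, elapsed_ms)
print(f"total CPU used: {total_cpu_ms:.1f} ms")
print(f"speedup: {speedup:.1f}x")
print(f"efficiency: {efficiency:.1%}")
```

With these inputs the script reproduces the summary values: 9196.8 ms of total CPU, a 7.9× speedup, and 87.8% efficiency against the 9× ideal.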