CPU scaling benchmark

workers: 8 + 1 main
iters total: 500M (55555555/stream)
elapsed: 1148.38 ms
total CPU used: 9491.67 ms
speedup: 8.27× vs serial
efficiency: 91.9% of 9× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)  0         20.67     20.68        968.68     948      0
1         1.935     1.95      19.08        1113.36    1094.28  149.66
2         1.463     3.43      16.11        1145.42    1129.31  176.88
3         1.537     4.98      27.78        1117.03    1089.25  149.74
4         1.44      6.44      38.64        1054.59    1015.95  86.1
5         1.754     8.21      42.61        1090.39    1047.78  121.84
6         1.443     9.66      44.93        1069.36    1024.43  111.61
7         6.599     16.28     44.98        1127.62    1082.64  159.06
8         4.359     20.65     67.97        1128       1060.03  161.9
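
The table's derived columns can be cross-checked with a few lines of Python. This is a minimal sketch, assuming that "work ms" is simply "work end@" minus "work start@" and that "total CPU used" is the sum of the per-stream work times; the variable names are illustrative and not taken from the benchmark itself.

```python
# Rows copied from the table: (stream, work start@, work end@), all in ms.
rows = [
    ("0 (main)", 20.68, 968.68),
    ("1", 19.08, 1113.36),
    ("2", 16.11, 1145.42),
    ("3", 27.78, 1117.03),
    ("4", 38.64, 1054.59),
    ("5", 42.61, 1090.39),
    ("6", 44.93, 1069.36),
    ("7", 44.98, 1127.62),
    ("8", 67.97, 1128.00),
]

# Assumption: work ms = work end@ - work start@ for each stream.
work_ms = {stream: end - start for stream, start, end in rows}

# Summing the per-stream work times should reproduce "total CPU used".
total_cpu_ms = sum(work_ms.values())

for stream, ms in work_ms.items():
    print(f"{stream:>8}  {ms:8.2f} ms")
print(f"total CPU used: {total_cpu_ms:.2f} ms")  # 9491.67 ms
```

Under these assumptions every recomputed value agrees with the "work ms" column, and the sum matches the reported 9491.67 ms.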

[Timeline chart: one row per stream (main, w1 to w8). Legend: fork+handshake, CPU work, parent reap wait.]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1), where the +1 counts the main stream. 100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.
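
As a worked example, the two formulas can be applied to the figures reported at the top of the page. This is a sketch only: `scaling_metrics` is a hypothetical helper name, and the inputs are the rounded values as displayed, not the benchmark's internal timings.

```python
def scaling_metrics(total_cpu_ms, elapsed_ms, workers):
    """Return (speedup, efficiency) using the definitions in the text."""
    streams = workers + 1  # the main process also runs a stream
    speedup = total_cpu_ms / elapsed_ms
    efficiency = speedup / streams
    return speedup, efficiency


# Displayed values: total CPU used, elapsed, and worker count.
speedup, efficiency = scaling_metrics(9491.67, 1148.38, workers=8)

print(f"speedup    {speedup:.2f}x")           # 8.27x
print(f"efficiency {efficiency * 100:.1f}%")  # 91.8%
```

The speedup matches the reported 8.27×. The efficiency comes out at 91.8% from the unrounded ratio, whereas dividing the rounded 8.27 by 9 gives the reported 91.9%; the last digit presumably depends on where rounding is applied or on unrounded timings that the page does not show.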