CPU scaling benchmark

workers: 8 + 1 main
iters total: 500M (55555555 per stream)
elapsed: 1168.77 ms
total CPU used: 9577.05 ms
speedup: 8.19× vs serial
efficiency: 91% of 9× ideal
Per-stream timings (all values in ms; the @ columns are timestamps since the start of the run):

stream    spawn   spawned@  work start@  work end@  work     reap wait
0 (main)  0       15.54     15.55        1121.37    1105.82  0
1         2.098   2.12      17.78        1071.03    1053.25  0.13
2         1.476   3.61      22.32        1146.55    1124.23  27.82
3         1.546   5.17      33.29        1157.44    1124.15  36.17
4         1.510   6.70      35.54        1144.34    1108.80  27.86
5         1.539   8.26      36.06        1106.99    1070.93  0.19
6         1.525   9.80      38.55        1102.37    1063.82  0.27
7         4.065   13.87     60.07        1165.96    1105.89  44.72
8         1.628   15.52     43.69        863.85     820.16   0.29
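The summary figures can be recomputed from the table. Each stream's work time is its work end@ minus its work start@. The sum of those times is the "total CPU used" figure, and dividing that sum by the elapsed time gives the speedup. Below is a minimal Python sketch with the values copied from the rows above; the helper name `summarize` is hypothetical.

```python
# (stream, work start @ ms, work end @ ms), copied from the per-stream table
ROWS = [
    ("0 (main)", 15.55, 1121.37),
    ("1", 17.78, 1071.03),
    ("2", 22.32, 1146.55),
    ("3", 33.29, 1157.44),
    ("4", 35.54, 1144.34),
    ("5", 36.06, 1106.99),
    ("6", 38.55, 1102.37),
    ("7", 60.07, 1165.96),
    ("8", 43.69, 863.85),
]
ELAPSED_MS = 1168.77


def summarize(rows, elapsed_ms):
    """Return (total work ms, speedup, efficiency) for the given streams."""
    total = sum(end - start for _, start, end in rows)
    speedup = total / elapsed_ms
    efficiency = speedup / len(rows)  # len(rows) == workers + 1
    return total, speedup, efficiency


total, speedup, efficiency = summarize(ROWS, ELAPSED_MS)
print(f"total {total:.2f} ms, speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
# prints: total 9577.05 ms, speedup 8.19x, efficiency 91%
```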
[Figure: per-stream timeline for main and w1 to w8, showing fork+handshake, CPU work, and parent reap wait]
what this measures
Each stream runs a tight integer LCG loop. The working set fits in one CPU register, and there is no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed time. Efficiency = speedup / (workers + 1), because the main process also runs a stream. An efficiency of 100% means perfect linear scaling. Anything below 100% reflects the cost of serial fork setup, the reap tail, and SMT/core contention.
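For reference, here is a minimal Python sketch of a harness with this shape. It makes several assumptions:

- It needs a POSIX system, because it uses `os.fork`.
- It uses a 64-bit LCG with Knuth's MMIX constants. The benchmark's actual constants, handshake, and timing method are not stated.
- `lcg_loop` and `run_benchmark` are hypothetical names.

Each forked child runs one stream and reports its work time over a pipe. The parent runs the main stream, then reaps the children. The sketch omits the fork handshake. A pure-Python loop is also far slower per iteration than a register-resident native loop, so keep iteration counts small.

```python
import os
import struct
import time

MASK64 = (1 << 64) - 1
# Assumed LCG constants (Knuth's MMIX); the benchmark's own constants are not given.
A, C = 6364136223846793005, 1442695040888963407


def lcg_loop(iters, seed=1):
    """Tight integer LCG loop: the only state is x; no arrays, no shared data."""
    x = seed
    for _ in range(iters):
        x = (A * x + C) & MASK64
    return x


def run_benchmark(workers, total_iters):
    """Fork `workers` children, run one stream in the parent too, then reap.

    Returns (elapsed_ms, speedup, efficiency) using the definitions in the text.
    """
    per_stream = total_iters // (workers + 1)
    t0 = time.perf_counter()
    children = []  # (pid, read end of that child's result pipe)
    for _ in range(workers):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: run one stream, report its work time, exit
            try:
                os.close(r)
                start = time.perf_counter()
                lcg_loop(per_stream)
                work_ms = (time.perf_counter() - start) * 1e3
                os.write(w, struct.pack("d", work_ms))
            finally:
                os._exit(0)  # never fall back into the parent's code path
        os.close(w)
        children.append((pid, r))

    start = time.perf_counter()  # the parent's own stream ("main")
    lcg_loop(per_stream)
    work_ms = [(time.perf_counter() - start) * 1e3]

    for pid, r in children:  # reap: read each child's result, then wait on it
        data = os.read(r, 8)  # blocks until that child has written its 8 bytes
        os.close(r)
        os.waitpid(pid, 0)
        work_ms.append(struct.unpack("d", data)[0])

    elapsed_ms = (time.perf_counter() - t0) * 1e3
    speedup = sum(work_ms) / elapsed_ms
    return elapsed_ms, speedup, speedup / (workers + 1)


if __name__ == "__main__":
    n = max(1, (os.cpu_count() or 2) - 1)
    elapsed, speedup, eff = run_benchmark(workers=n, total_iters=3_000_000)
    print(f"{n}+1 streams: {elapsed:.2f} ms, speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

Reaping in fork order keeps the sketch simple. Calling `os.wait()` instead would collect whichever child finishes first.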