CPU scaling benchmark

workers:         8 + 1 main
iters total:     500M (55555555/stream)
elapsed:         1214.73 ms
total CPU used:  9120.43 ms
speedup:         7.51× vs serial
efficiency:      83.4% of 9× ideal

stream    spawn ms  spawned@  work start@  work end@  work ms  reap wait ms
0 (main)  0         23.56     23.58        1124.63    1101.05  0
1         1.917     1.95      19.08        1115.81    1096.73  0.13
2         1.475     3.45      26.51        1053.7     1027.19  0.19
3         1.48      4.95      24.07        1048.04    1023.97  0.2
4         3.569     8.54      38.4         895.14     856.74   0.21
5         1.639     10.2      43.09        1026.25    983.16   0.22
6         5.424     15.64     62.51        1083.21    1020.7   0.23
7         1.801     17.47     71.62        1211.89    1140.27  87.4
8         6.051     23.54     55.14        925.76     870.62   0.24

[Timeline chart: one row per stream (main, w1 to w8). Legend: fork+handshake, CPU work, parent reap wait.]

what this measures
Each stream runs a tight integer LCG (linear congruential generator) loop: the working set is a single CPU register, with no memory access and no shared data. Speedup = sum(stream CPU time) / wall-clock elapsed. Efficiency = speedup / (workers + 1). 100% efficiency means perfect linear scaling; anything below 100% is the cost of serial fork setup, the reap tail, and SMT/core contention.
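
As a cross-check, the summary figures can be recomputed from the per-stream numbers. The sketch below is not part of the benchmark itself; it assumes that "stream CPU time" corresponds to the "work ms" column of the table above, and the variable names are illustrative only.

```python
# Per-stream "work ms" values copied from the results table
# (stream 0 is main, streams 1-8 are the forked workers).
work_ms = [1101.05, 1096.73, 1027.19, 1023.97, 856.74,
           983.16, 1020.70, 1140.27, 870.62]

elapsed_ms = 1214.73     # wall-clock elapsed, from the summary
streams = len(work_ms)   # workers + 1 = 9

# Speedup = sum(stream CPU time) / wall-clock elapsed
total_cpu_ms = sum(work_ms)
speedup = total_cpu_ms / elapsed_ms

# Efficiency = speedup / (workers + 1)
efficiency = speedup / streams

print(f"total CPU used: {total_cpu_ms:.2f} ms")
print(f"speedup:        {speedup:.2f}x")
print(f"efficiency:     {efficiency:.1%}")
```

Under that assumption the script reproduces the reported values: 9120.43 ms total CPU, a 7.51× speedup, and 83.4% efficiency.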