Just a small clarification. Why is the USI subtable row sent to a different AMP? It could instead be kept on the same AMP as the base row, as is done for a NUSI, which could improve performance. (Storing it on the same AMP has some overhead, but right now the subtable row is stored on another AMP, which is also overhead.)
You're right, the access from the USI row to the base row would be faster if both were on the same AMP.
But that is only step #2. As a first step you would need to broadcast to all AMPs in the system, because the exact AMP holding the index row is unknown, and that is much more overhead.
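
To make the trade-off concrete, here is a toy Python sketch (not Teradata code). It assumes a made-up 8-AMP system and uses CRC32 as a stand-in for the real row-hash function; the names `amp_for`, `lookup_hashed` and `lookup_local` are only for illustration. It counts how many AMPs a single-value lookup has to involve under each placement.

```python
import zlib

NUM_AMPS = 8  # hypothetical system size, chosen only for illustration


def amp_for(value):
    """Toy stand-in for the row hash: map a value to an AMP number."""
    return zlib.crc32(str(value).encode()) % NUM_AMPS


# Base rows as (primary index value, indexed column value).
rows = [(pi, f"user{pi}@example.com") for pi in range(100)]

base = [dict() for _ in range(NUM_AMPS)]          # base table, distributed by PI hash
hashed_index = [dict() for _ in range(NUM_AMPS)]  # USI-style: index row placed by hash of index value
local_index = [dict() for _ in range(NUM_AMPS)]   # NUSI-style: index row kept on the base row's AMP

for pi, email in rows:
    home = amp_for(pi)
    base[home][pi] = email
    hashed_index[amp_for(email)][email] = pi
    local_index[home][email] = pi


def lookup_hashed(email):
    """Hash the index value to find the index row, then go to the base row's AMP."""
    idx_amp = amp_for(email)
    pi = hashed_index[idx_amp][email]
    base_amp = amp_for(pi)
    amps_involved = len({idx_amp, base_amp})  # 1 or 2, never more
    return base[base_amp][pi], amps_involved


def lookup_local(email):
    """Index row is AMP-local, so its AMP is unknown: every AMP must be probed."""
    found = None
    amps_involved = 0
    for amp in range(NUM_AMPS):  # the "broadcast" step
        amps_involved += 1
        pi = local_index[amp].get(email)
        if pi is not None:
            found = base[amp][pi]  # base row is on the same AMP, no second hop
    return found, amps_involved
```

In this model the hashed placement never involves more than two AMPs regardless of system size, while the AMP-local placement always involves all of them, which is the overhead described above.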