Because after PR 86041, the optimizer no longer load-merges at the LLVM IR level, which might be part of the perf loss. (I'll run perf and see if this makes a difference.)
Also I added a codegen test so this hopefully won't regress in future -- it passes on stable and with my change here, but not on the 2021-11-09 nightly.