Rollup merge of #115291 - cjgillot:dest-prop-save, r=JakobDegen

Save liveness results for DestinationPropagation

`DestinationPropagation` needs to verify that merge candidates do not conflict with each other. This is done by verifying that a local is not live when its counterpart is written to.

To get the liveness information, the pass runs `MaybeLiveLocals` dataflow analysis repeatedly, once for each propagation round. This is quite costly, and the main driver for the perf impact on `ucd` and `diesel`. (See https://github.com/rust-lang/rust/pull/115105#issuecomment-1689205908)

In order to mitigate this cost, this PR proposes to save the result of the analysis into a `SparseIntervalMatrix`, and mirror merges of locals into that matrix: `liveness(destination) := liveness(destination) union liveness(source)`.

<details>
<summary>Proof</summary>

We denote by `'` all the quantities of the transformed program. Let $\varphi$ be a mapping of locals, which maps `source` to `destination`, and is identity otherwise. The exact liveness set after a statement is $out'(statement)$, and the proposed liveness set is $\varphi(out(statement))$.

Consider a statement. Suppose that the output state verifies $out' \subset phi(out)$. We want to prove that $in' \subset \varphi(in)$ where $in = (out - kill) \cup gen$, and conclude by induction.

We have 2 cases: either that statement is kept with locals renumbered by $\varphi$, or it is a tautological assignment and it removed.

1. If the statement is kept: the gen-set and the kill-set of $statement' = \varphi(statement)$ are $gen' = \varphi(gen)$ and $kill' = \varphi(kill)$ exactly.
From soundness requirement 3, $\varphi(in)$ is disjoint from $\varphi(kill)$.
This implies that $\varphi(out - kill)$ is disjoint from $\varphi(kill)$, and so $\varphi(out - kill) = \varphi(out) - \varphi(kill)$. Then $\varphi(in) = (\varphi(out) - \varphi(kill)) \cup \varphi(gen) = (\varphi(out) - kill') \cup gen'$.
We can conclude that $out' \subset \varphi(out) \implies in' \subset \varphi(in)$.

2. If the statement is removed. As $\varphi(statement)$ is a tautological assignment, we know that $\varphi(gen) = \varphi(kill) = \\{ destination \\}$, while $gen' = kill' = \emptyset$. So $\varphi(in) = \varphi(out) \cup \\{ destination \\}$. Then $in' = out' \subset out \subset \varphi(in)$.

By recursion, we can conclude by that $in' \subset \varphi(in)$ everywhere.
</details>

This approximate liveness results is only suboptimal if there are locals that fully disappear from the CFG due to an assignment cycle. These cases are quite unlikely, so we do not bother with them.

This change allows to reduce the perf impact of DestinationPropagation by half on diesel and ucd (https://github.com/rust-lang/rust/pull/115105#issuecomment-1694701904).

cc ````@JakobDegen````
This commit is contained in:
Matthias Krüger 2024-01-17 20:21:19 +01:00 committed by GitHub
commit 3ae7ab698f
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
10 changed files with 226 additions and 158 deletions

View file

@ -34,6 +34,7 @@ mod errors;
mod framework;
pub mod impls;
pub mod move_paths;
pub mod points;
pub mod rustc_peek;
pub mod storage;
pub mod un_derefer;

View file

@ -0,0 +1,156 @@
use crate::framework::{visit_results, ResultsVisitable, ResultsVisitor};
use rustc_index::bit_set::ChunkedBitSet;
use rustc_index::interval::SparseIntervalMatrix;
use rustc_index::Idx;
use rustc_index::IndexVec;
use rustc_middle::mir::{self, BasicBlock, Body, Location};
/// Maps between a `Location` and a `PointIndex` (and vice versa).
pub struct DenseLocationMap {
/// For each basic block, how many points are contained within?
statements_before_block: IndexVec<BasicBlock, usize>,
/// Map backward from each point to the basic block that it
/// belongs to.
basic_blocks: IndexVec<PointIndex, BasicBlock>,
num_points: usize,
}
impl DenseLocationMap {
#[inline]
pub fn new(body: &Body<'_>) -> Self {
let mut num_points = 0;
let statements_before_block: IndexVec<BasicBlock, usize> = body
.basic_blocks
.iter()
.map(|block_data| {
let v = num_points;
num_points += block_data.statements.len() + 1;
v
})
.collect();
let mut basic_blocks = IndexVec::with_capacity(num_points);
for (bb, bb_data) in body.basic_blocks.iter_enumerated() {
basic_blocks.extend((0..=bb_data.statements.len()).map(|_| bb));
}
Self { statements_before_block, basic_blocks, num_points }
}
/// Total number of point indices
#[inline]
pub fn num_points(&self) -> usize {
self.num_points
}
/// Converts a `Location` into a `PointIndex`. O(1).
#[inline]
pub fn point_from_location(&self, location: Location) -> PointIndex {
let Location { block, statement_index } = location;
let start_index = self.statements_before_block[block];
PointIndex::new(start_index + statement_index)
}
/// Returns the `PointIndex` for the first statement in the given `BasicBlock`. O(1).
#[inline]
pub fn entry_point(&self, block: BasicBlock) -> PointIndex {
let start_index = self.statements_before_block[block];
PointIndex::new(start_index)
}
/// Return the PointIndex for the block start of this index.
#[inline]
pub fn to_block_start(&self, index: PointIndex) -> PointIndex {
PointIndex::new(self.statements_before_block[self.basic_blocks[index]])
}
/// Converts a `PointIndex` back to a location. O(1).
#[inline]
pub fn to_location(&self, index: PointIndex) -> Location {
assert!(index.index() < self.num_points);
let block = self.basic_blocks[index];
let start_index = self.statements_before_block[block];
let statement_index = index.index() - start_index;
Location { block, statement_index }
}
/// Sometimes we get point-indices back from bitsets that may be
/// out of range (because they round up to the nearest 2^N number
/// of bits). Use this function to filter such points out if you
/// like.
#[inline]
pub fn point_in_range(&self, index: PointIndex) -> bool {
index.index() < self.num_points
}
}
rustc_index::newtype_index! {
/// A single integer representing a `Location` in the MIR control-flow
/// graph. Constructed efficiently from `DenseLocationMap`.
#[orderable]
#[debug_format = "PointIndex({})"]
pub struct PointIndex {}
}
/// Add points depending on the result of the given dataflow analysis.
pub fn save_as_intervals<'tcx, N, R>(
elements: &DenseLocationMap,
body: &mir::Body<'tcx>,
mut results: R,
) -> SparseIntervalMatrix<N, PointIndex>
where
N: Idx,
R: ResultsVisitable<'tcx, FlowState = ChunkedBitSet<N>>,
{
let values = SparseIntervalMatrix::new(elements.num_points());
let mut visitor = Visitor { elements, values };
visit_results(
body,
body.basic_blocks.reverse_postorder().iter().copied(),
&mut results,
&mut visitor,
);
visitor.values
}
struct Visitor<'a, N: Idx> {
elements: &'a DenseLocationMap,
values: SparseIntervalMatrix<N, PointIndex>,
}
impl<'mir, 'tcx, R, N> ResultsVisitor<'mir, 'tcx, R> for Visitor<'_, N>
where
N: Idx,
{
type FlowState = ChunkedBitSet<N>;
fn visit_statement_after_primary_effect(
&mut self,
_results: &mut R,
state: &Self::FlowState,
_statement: &'mir mir::Statement<'tcx>,
location: Location,
) {
let point = self.elements.point_from_location(location);
// Use internal iterator manually as it is much more efficient.
state.iter().for_each(|node| {
self.values.insert(node, point);
});
}
fn visit_terminator_after_primary_effect(
&mut self,
_results: &mut R,
state: &Self::FlowState,
_terminator: &'mir mir::Terminator<'tcx>,
location: Location,
) {
let point = self.elements.point_from_location(location);
// Use internal iterator manually as it is much more efficient.
state.iter().for_each(|node| {
self.values.insert(node, point);
});
}
}