Rollup merge of #83329 - camelid:debuginfo-doc-cleanup, r=davidtwco
Cleanup LLVM debuginfo module docs
- Move debuginfo docs from `doc.rs` module to `doc.md` file
- Cleanup LLVM debuginfo module docs
85f16fb4bc
4 changed files with 182 additions and 181 deletions
180
compiler/rustc_codegen_llvm/src/debuginfo/doc.md
Normal file
|
@ -0,0 +1,180 @@
|
|||
# Debug Info Module
|
||||
|
||||
This module generates debug symbols. We use LLVM's
|
||||
[source level debugging](https://llvm.org/docs/SourceLevelDebugging.html)
|
||||
features for generating the debug information. The general principle is
|
||||
this:
|
||||
|
||||
Given the right metadata in the LLVM IR, the LLVM code generator is able to
|
||||
create DWARF debug symbols for the given code. The
|
||||
[metadata](https://llvm.org/docs/LangRef.html#metadata-type) is structured
|
||||
much like DWARF *debugging information entries* (DIE), representing type
|
||||
information such as datatype layout, function signatures, block layout,
|
||||
variable location and scope information, etc. It is the purpose of this
|
||||
module to generate correct metadata and insert it into the LLVM IR.
|
||||
|
||||
As the exact format of metadata trees may change between different LLVM
|
||||
versions, we now use LLVM
|
||||
[DIBuilder](https://llvm.org/docs/doxygen/html/classllvm_1_1DIBuilder.html)
|
||||
to create metadata where possible. This will hopefully ease the adaptation of
|
||||
this module to future LLVM versions.
|
||||
|
||||
The public API of the module is a set of functions that will insert the
|
||||
correct metadata into the LLVM IR when called with the right parameters.
|
||||
The module is thus driven from an outside client with functions like
|
||||
`debuginfo::create_local_var_metadata(bx: block, local: &ast::local)`.
|
||||
|
||||
Internally the module will try to reuse already created metadata by
|
||||
utilizing a cache. The way to get a shared metadata node when needed is
|
||||
thus to just call the corresponding function in this module:
|
||||
|
||||
    let file_metadata = file_metadata(cx, file);
|
||||
|
||||
The function will take care of probing the cache for an existing node for
|
||||
that exact file path.
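
As a rough illustration of the cache probing described above, the following self-contained sketch memoizes nodes by file path. `DebugContext`, `FileMetadata`, and the `&str` path parameter are hypothetical stand-ins chosen for this example; they are not the module's real types or signature.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a handle to an LLVM metadata node.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FileMetadata(pub usize);

/// Hypothetical stand-in for the per-crate debug context that owns the cache.
#[derive(Default)]
pub struct DebugContext {
    created_files: HashMap<String, FileMetadata>,
    nodes_created: usize,
}

/// Returns the node for `path`, creating it only on the first request.
pub fn file_metadata(cx: &mut DebugContext, path: &str) -> FileMetadata {
    if let Some(&node) = cx.created_files.get(path) {
        return node; // Cache hit: reuse the shared node.
    }
    // Cache miss: create a fresh node and remember it under this exact path.
    let node = FileMetadata(cx.nodes_created);
    cx.nodes_created += 1;
    cx.created_files.insert(path.to_owned(), node);
    node
}
```

Callers never consult the cache themselves; asking twice for the same path simply yields the same node.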
|
||||
|
||||
All private state used by the module is stored within either the
|
||||
`CrateDebugContext` struct (owned by the `CodegenCx`) or the
|
||||
`FunctionDebugContext` (owned by the `FunctionCx`).
|
||||
|
||||
This file consists of three conceptual sections:
|
||||
1. The public interface of the module
|
||||
2. Module-internal metadata creation functions
|
||||
3. Minor utility functions
|
||||
|
||||
|
||||
## Recursive Types
|
||||
|
||||
Some kinds of types, such as structs and enums, can be recursive. That means
|
||||
that the type definition of some type X refers to some other type which in
|
||||
turn (transitively) refers to X. This introduces cycles into the type
|
||||
referral graph. A naive algorithm doing an on-demand, depth-first traversal
|
||||
of this graph when describing types can get trapped in an endless loop
|
||||
when it reaches such a cycle.
|
||||
|
||||
For example, the following simple type for a singly-linked list...
|
||||
|
||||
```
|
||||
struct List {
    value: i32,
    tail: Option<Box<List>>,
}
|
||||
```
|
||||
|
||||
will generate the following callstack with a naive DFS algorithm:
|
||||
|
||||
```
|
||||
describe(t = List)
  describe(t = i32)
  describe(t = Option<Box<List>>)
    describe(t = Box<List>)
      describe(t = List) // at the beginning again...
      ...
|
||||
```
|
||||
|
||||
To break cycles like these, we use "forward declarations". That is, when
|
||||
the algorithm encounters a possibly recursive type (any struct or enum), it
|
||||
immediately creates a type description node and inserts it into the cache
|
||||
*before* describing the members of the type. This type description is just
|
||||
a stub (as type members are not described and added to it yet) but it
|
||||
allows the algorithm to already refer to the type. After the stub is
|
||||
inserted into the cache, the algorithm continues as before. If it now
|
||||
encounters a recursive reference, it will hit the cache and will not try to
|
||||
describe the type anew.
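
The forward-declaration idea can be sketched on a toy type graph. This is a minimal sketch under stated assumptions: `TypeGraph`, `Description`, `TypeCache`, and this `describe` function are hypothetical names invented for the example and do not mirror the module's actual data structures.

```rust
use std::collections::HashMap;

/// Toy type graph: each type name maps to the names of its member types.
pub type TypeGraph = HashMap<&'static str, Vec<&'static str>>;

/// Toy description node. It is a stub until `complete` is set.
#[derive(Debug, Default)]
pub struct Description {
    pub members: Vec<usize>,
    pub complete: bool,
}

/// Hypothetical cache from type name to node index.
#[derive(Default)]
pub struct TypeCache {
    pub ids: HashMap<&'static str, usize>,
    pub nodes: Vec<Description>,
}

pub fn describe(graph: &TypeGraph, cache: &mut TypeCache, name: &'static str) -> usize {
    if let Some(&id) = cache.ids.get(name) {
        // Cache hit. The node may still be a stub, but it can be referred to.
        return id;
    }
    // Register the stub *before* visiting members so cycles terminate.
    let id = cache.nodes.len();
    cache.nodes.push(Description::default());
    cache.ids.insert(name, id);

    // Types without an entry in the graph are treated as having no members.
    let member_names = graph.get(name).cloned().unwrap_or_default();
    let mut members = Vec::new();
    for member in member_names {
        members.push(describe(graph, cache, member));
    }

    // Fill in the stub now that all members have been described.
    cache.nodes[id].members = members;
    cache.nodes[id].complete = true;
    id
}
```

With the `List` example from above encoded as a graph, the recursive reference from `Box<List>` back to `List` hits the stub instead of recursing again.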
|
||||
|
||||
This behavior is encapsulated in the `RecursiveTypeDescription` enum,
|
||||
which represents a kind of continuation, storing all state needed to
|
||||
continue traversal at the type members after the type has been registered
|
||||
with the cache. (This implementation approach might be a tad
over-engineered and may change in the future.)
|
||||
|
||||
|
||||
## Source Locations and Line Information
|
||||
|
||||
In addition to data type descriptions, the debugging information must also
allow mapping machine code locations back to source code locations in order
to be useful. This functionality is also handled in this module. The
following functions control source mappings:
|
||||
|
||||
+ `set_source_location()`
|
||||
+ `clear_source_location()`
|
||||
+ `start_emitting_source_locations()`
|
||||
|
||||
`set_source_location()` sets the current source location. All IR
|
||||
instructions created after a call to this function will be linked to the
|
||||
given source location, until another location is specified with
|
||||
`set_source_location()` or the source location is cleared with
|
||||
`clear_source_location()`. In the latter case, subsequent IR instructions
|
||||
will not be linked to any source location. As you can see, this is a
|
||||
stateful API (mimicking the one in LLVM), so be careful with source
|
||||
locations set by previous calls. It's probably best to not rely on any
|
||||
specific state being present at a given point in code.
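
To make the stateful behavior concrete, here is a small model of such an API. It is only a sketch: `ToyBuilder`, `SourceLoc`, `Instruction`, and `emit` are hypothetical names for this example, and the real functions operate on the codegen builder rather than on a vector.

```rust
/// Hypothetical source position; not the compiler's real span type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SourceLoc {
    pub line: u32,
    pub col: u32,
}

/// Toy instruction that records the location it was linked to, if any.
#[derive(Debug, PartialEq, Eq)]
pub struct Instruction {
    pub op: &'static str,
    pub loc: Option<SourceLoc>,
}

/// Toy builder holding the "current" source location as hidden state.
#[derive(Default)]
pub struct ToyBuilder {
    current: Option<SourceLoc>,
    pub emitted: Vec<Instruction>,
}

impl ToyBuilder {
    pub fn set_source_location(&mut self, loc: SourceLoc) {
        self.current = Some(loc);
    }

    pub fn clear_source_location(&mut self) {
        self.current = None;
    }

    /// Every emitted instruction picks up whatever location is current.
    pub fn emit(&mut self, op: &'static str) {
        self.emitted.push(Instruction { op, loc: self.current });
    }
}
```

Because the location persists across calls to `emit`, an instruction silently inherits whatever an earlier caller left behind, which is why relying on prior state is discouraged.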
|
||||
|
||||
One topic that deserves some extra attention is *function prologues*. At
|
||||
the beginning of a function's machine code there are typically a few
|
||||
instructions for loading argument values into allocas and checking if
|
||||
there's enough stack space for the function to execute. This *prologue* is
|
||||
not visible in the source code and LLVM puts a special PROLOGUE END marker
|
||||
into the line table at the first non-prologue instruction of the function.
|
||||
In order to find out where the prologue ends, LLVM looks for the first
|
||||
instruction in the function body that is linked to a source location. So,
|
||||
when generating prologue instructions we have to make sure that we don't
|
||||
emit source location information until the 'real' function body begins. For
|
||||
this reason, source location emission is disabled by default for any new
|
||||
function being codegened and is only activated after a call to the third
|
||||
function from the list above, `start_emitting_source_locations()`. This
|
||||
function should be called right before regularly starting to codegen the
|
||||
top-level block of the given function.
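
The gating described above can be modeled in a few lines. This is a simplified sketch, not the module's implementation: `FunctionDebugState`, `emit_at`, and `prologue_end` are hypothetical names, and locations are reduced to bare line numbers.

```rust
/// Hypothetical per-function debug state.
#[derive(Default)]
pub struct FunctionDebugState {
    /// Source location emission is off by default for a new function.
    emitting: bool,
    /// Line attached to each emitted instruction, or `None` if unlinked.
    pub attached: Vec<Option<u32>>,
}

impl FunctionDebugState {
    pub fn start_emitting_source_locations(&mut self) {
        self.emitting = true;
    }

    /// Emits one instruction that originates from source line `line`.
    /// The line is dropped while emission is still disabled.
    pub fn emit_at(&mut self, line: u32) {
        let loc = if self.emitting { Some(line) } else { None };
        self.attached.push(loc);
    }

    /// Index of the first instruction that carries a location, which is where
    /// the rule described above would place the end of the prologue.
    pub fn prologue_end(&self) -> Option<usize> {
        self.attached.iter().position(|loc| loc.is_some())
    }
}
```

Instructions emitted before `start_emitting_source_locations()` stay unlinked, so the first linked instruction is the first one of the real function body.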
|
||||
|
||||
There is one exception to the above rule: `llvm.dbg.declare` instructions
|
||||
must be linked to the source location of the variable being declared. For
|
||||
function parameters these `llvm.dbg.declare` instructions typically occur
|
||||
in the middle of the prologue; however, they are ignored by LLVM's prologue
|
||||
detection. The `create_argument_metadata()` and related functions take care
|
||||
of linking the `llvm.dbg.declare` instructions to the correct source
|
||||
locations even while source location emission is still disabled, so there
|
||||
is no need to do anything special with source location handling here.
|
||||
|
||||
## Unique Type Identification
|
||||
|
||||
In order for link-time optimization to work properly, LLVM needs a unique
|
||||
type identifier that tells it across compilation units which types are the
|
||||
same as others. This type identifier is created by
|
||||
`TypeMap::get_unique_type_id_of_type()` using the following algorithm:
|
||||
|
||||
1. Primitive types have their name as ID
|
||||
|
||||
2. Structs, enums and traits have a multipart identifier
|
||||
|
||||
   1. The first part is the SVH (strict version hash) of the crate they
      were originally defined in

   2. The second part is the `ast::NodeId` of the definition in their
      original crate

   3. The final part is a concatenation of the type IDs of their concrete
      type arguments if they are generic types.
|
||||
|
||||
3. Tuple-, pointer-, and function types are structurally identified, which
|
||||
means that they are equivalent if their component types are equivalent
|
||||
(e.g., `(i32, i32)` is the same regardless of which crate it is used in).
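
The three rules above can be sketched on a toy type representation. Everything here is an assumption made for illustration: `ToyType`, its fields, and the textual ID format are invented for this example and are not what `TypeMap::get_unique_type_id_of_type()` actually produces.

```rust
/// Toy type representation; the compiler's real types are far richer.
pub enum ToyType {
    Primitive(&'static str),
    /// Struct, enum or trait: defining crate hash, definition id, type arguments.
    Nominal { crate_hash: u64, def_id: u32, args: Vec<ToyType> },
    Tuple(Vec<ToyType>),
    Pointer(Box<ToyType>),
}

/// Builds an identifier following the three rules, in a made-up text format.
pub fn unique_type_id(ty: &ToyType) -> String {
    match ty {
        // Rule 1: primitives are identified by name.
        ToyType::Primitive(name) => (*name).to_string(),
        // Rule 2: crate hash, definition id, then the IDs of the type arguments.
        ToyType::Nominal { crate_hash, def_id, args } => {
            let args: Vec<String> = args.iter().map(unique_type_id).collect();
            format!("{{{:x}/{}<{}>}}", crate_hash, def_id, args.join(","))
        }
        // Rule 3: structural identity, built only from the component IDs.
        ToyType::Tuple(elems) => {
            let elems: Vec<String> = elems.iter().map(unique_type_id).collect();
            format!("({})", elems.join(","))
        }
        ToyType::Pointer(inner) => format!("*{}", unique_type_id(inner)),
    }
}
```

Two separately constructed `(i32, i32)` tuples yield the same ID because nothing crate-specific enters the structural case.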
|
||||
|
||||
This algorithm also provides a stable ID for types that are defined in one
|
||||
crate but instantiated from metadata within another crate. We just have to
|
||||
take care to always map crate and `NodeId`s back to the original crate
|
||||
context.
|
||||
|
||||
As a side effect, these unique type IDs also help to solve a problem arising
|
||||
from lifetime parameters. Since lifetime parameters are completely omitted
|
||||
in debuginfo, more than one `Ty` instance may map to the same debuginfo
|
||||
type metadata, that is, some struct `Struct<'a>` may have N instantiations
|
||||
with different concrete substitutions for `'a`, and thus there will be N
|
||||
`Ty` instances for the type `Struct<'a>` even though it is not generic
|
||||
otherwise. Unfortunately this means that we cannot use `ty::type_id()` as
|
||||
a cheap identifier for type metadata -- we have done this in the past, but it
|
||||
led to unnecessary metadata duplication in the best case and LLVM
|
||||
assertions in the worst. However, the unique type ID as described above
|
||||
*can* be used as an identifier. Since it is comparatively expensive to
|
||||
construct, though, `ty::type_id()` is still used additionally as an
|
||||
optimization for cases where the exact same type has been seen before
|
||||
(which is most of the time).
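
The two-level lookup described above might look roughly like this. It is a sketch under loud assumptions: `TyInstance`, `TwoLevelCache`, and `metadata_for` are hypothetical, and the "expensive" unique ID is faked by copying a lifetime-erased name.

```rust
use std::collections::HashMap;

/// Toy `Ty` instance. Different lifetime substitutions of the same struct are
/// modelled as different `instance` numbers sharing one `erased_name`.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct TyInstance {
    pub instance: u32,
    pub erased_name: &'static str,
}

#[derive(Default)]
pub struct TwoLevelCache {
    by_instance: HashMap<TyInstance, usize>, // cheap key, checked first
    by_unique_id: HashMap<String, usize>,    // expensive key, authoritative
    pub unique_ids_built: usize,
    pub nodes_created: usize,
}

impl TwoLevelCache {
    pub fn metadata_for(&mut self, ty: TyInstance) -> usize {
        // Fast path: this exact instance has been seen before.
        if let Some(&node) = self.by_instance.get(&ty) {
            return node;
        }
        // Slow path: build the unique ID (a stand-in for the costly step).
        self.unique_ids_built += 1;
        let unique_id = ty.erased_name.to_string();
        let node = match self.by_unique_id.get(&unique_id).copied() {
            Some(node) => node,
            None => {
                let node = self.nodes_created;
                self.nodes_created += 1;
                self.by_unique_id.insert(unique_id, node);
                node
            }
        };
        // Remember the instance so the next lookup takes the fast path.
        self.by_instance.insert(ty, node);
        node
    }
}
```

Two instances that differ only in their lifetime substitution end up sharing one metadata node, while a repeated lookup of the same instance skips the unique ID construction entirely.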
|
|
@ -1,179 +0,0 @@
|
|||
//! # Debug Info Module
|
||||
//!
|
||||
//! This module serves the purpose of generating debug symbols. We use LLVM's
|
||||
//! [source level debugging](https://llvm.org/docs/SourceLevelDebugging.html)
|
||||
//! features for generating the debug information. The general principle is
|
||||
//! this:
|
||||
//!
|
||||
//! Given the right metadata in the LLVM IR, the LLVM code generator is able to
|
||||
//! create DWARF debug symbols for the given code. The
|
||||
//! [metadata](https://llvm.org/docs/LangRef.html#metadata-type) is structured
|
||||
//! much like DWARF *debugging information entries* (DIE), representing type
|
||||
//! information such as datatype layout, function signatures, block layout,
|
||||
//! variable location and scope information, etc. It is the purpose of this
|
||||
//! module to generate correct metadata and insert it into the LLVM IR.
|
||||
//!
|
||||
//! As the exact format of metadata trees may change between different LLVM
|
||||
//! versions, we now use LLVM
|
||||
//! [DIBuilder](https://llvm.org/docs/doxygen/html/classllvm_1_1DIBuilder.html)
|
||||
//! to create metadata where possible. This will hopefully ease the adaption of
|
||||
//! this module to future LLVM versions.
|
||||
//!
|
||||
//! The public API of the module is a set of functions that will insert the
|
||||
//! correct metadata into the LLVM IR when called with the right parameters.
|
||||
//! The module is thus driven from an outside client with functions like
|
||||
//! `debuginfo::create_local_var_metadata(bx: block, local: &ast::local)`.
|
||||
//!
|
||||
//! Internally the module will try to reuse already created metadata by
|
||||
//! utilizing a cache. The way to get a shared metadata node when needed is
|
||||
//! thus to just call the corresponding function in this module:
|
||||
//!
|
||||
//! let file_metadata = file_metadata(cx, file);
|
||||
//!
|
||||
//! The function will take care of probing the cache for an existing node for
|
||||
//! that exact file path.
|
||||
//!
|
||||
//! All private state used by the module is stored within either the
|
||||
//! CrateDebugContext struct (owned by the CodegenCx) or the
|
||||
//! FunctionDebugContext (owned by the FunctionCx).
|
||||
//!
|
||||
//! This file consists of three conceptual sections:
|
||||
//! 1. The public interface of the module
|
||||
//! 2. Module-internal metadata creation functions
|
||||
//! 3. Minor utility functions
|
||||
//!
|
||||
//!
|
||||
//! ## Recursive Types
|
||||
//!
|
||||
//! Some kinds of types, such as structs and enums can be recursive. That means
|
||||
//! that the type definition of some type X refers to some other type which in
|
||||
//! turn (transitively) refers to X. This introduces cycles into the type
|
||||
//! referral graph. A naive algorithm doing an on-demand, depth-first traversal
|
||||
//! of this graph when describing types, can get trapped in an endless loop
|
||||
//! when it reaches such a cycle.
|
||||
//!
|
||||
//! For example, the following simple type for a singly-linked list...
|
||||
//!
|
||||
//! ```
|
||||
//! struct List {
|
||||
//! value: i32,
|
||||
//! tail: Option<Box<List>>,
|
||||
//! }
|
||||
//! ```
|
||||
//!
|
||||
//! will generate the following callstack with a naive DFS algorithm:
|
||||
//!
|
||||
//! ```
|
||||
//! describe(t = List)
|
||||
//! describe(t = i32)
|
||||
//! describe(t = Option<Box<List>>)
|
||||
//! describe(t = Box<List>)
|
||||
//! describe(t = List) // at the beginning again...
|
||||
//! ...
|
||||
//! ```
|
||||
//!
|
||||
//! To break cycles like these, we use "forward declarations". That is, when
|
||||
//! the algorithm encounters a possibly recursive type (any struct or enum), it
|
||||
//! immediately creates a type description node and inserts it into the cache
|
||||
//! *before* describing the members of the type. This type description is just
|
||||
//! a stub (as type members are not described and added to it yet) but it
|
||||
//! allows the algorithm to already refer to the type. After the stub is
|
||||
//! inserted into the cache, the algorithm continues as before. If it now
|
||||
//! encounters a recursive reference, it will hit the cache and does not try to
|
||||
//! describe the type anew.
|
||||
//!
|
||||
//! This behavior is encapsulated in the 'RecursiveTypeDescription' enum,
|
||||
//! which represents a kind of continuation, storing all state needed to
|
||||
//! continue traversal at the type members after the type has been registered
|
||||
//! with the cache. (This implementation approach might be a tad over-
|
||||
//! engineered and may change in the future)
|
||||
//!
|
||||
//!
|
||||
//! ## Source Locations and Line Information
|
||||
//!
|
||||
//! In addition to data type descriptions the debugging information must also
|
||||
//! allow to map machine code locations back to source code locations in order
|
||||
//! to be useful. This functionality is also handled in this module. The
|
||||
//! following functions allow to control source mappings:
|
||||
//!
|
||||
//! + set_source_location()
|
||||
//! + clear_source_location()
|
||||
//! + start_emitting_source_locations()
|
||||
//!
|
||||
//! `set_source_location()` allows to set the current source location. All IR
|
||||
//! instructions created after a call to this function will be linked to the
|
||||
//! given source location, until another location is specified with
|
||||
//! `set_source_location()` or the source location is cleared with
|
||||
//! `clear_source_location()`. In the later case, subsequent IR instruction
|
||||
//! will not be linked to any source location. As you can see, this is a
|
||||
//! stateful API (mimicking the one in LLVM), so be careful with source
|
||||
//! locations set by previous calls. It's probably best to not rely on any
|
||||
//! specific state being present at a given point in code.
|
||||
//!
|
||||
//! One topic that deserves some extra attention is *function prologues*. At
|
||||
//! the beginning of a function's machine code there are typically a few
|
||||
//! instructions for loading argument values into allocas and checking if
|
||||
//! there's enough stack space for the function to execute. This *prologue* is
|
||||
//! not visible in the source code and LLVM puts a special PROLOGUE END marker
|
||||
//! into the line table at the first non-prologue instruction of the function.
|
||||
//! In order to find out where the prologue ends, LLVM looks for the first
|
||||
//! instruction in the function body that is linked to a source location. So,
|
||||
//! when generating prologue instructions we have to make sure that we don't
|
||||
//! emit source location information until the 'real' function body begins. For
|
||||
//! this reason, source location emission is disabled by default for any new
|
||||
//! function being codegened and is only activated after a call to the third
|
||||
//! function from the list above, `start_emitting_source_locations()`. This
|
||||
//! function should be called right before regularly starting to codegen the
|
||||
//! top-level block of the given function.
|
||||
//!
|
||||
//! There is one exception to the above rule: `llvm.dbg.declare` instruction
|
||||
//! must be linked to the source location of the variable being declared. For
|
||||
//! function parameters these `llvm.dbg.declare` instructions typically occur
|
||||
//! in the middle of the prologue, however, they are ignored by LLVM's prologue
|
||||
//! detection. The `create_argument_metadata()` and related functions take care
|
||||
//! of linking the `llvm.dbg.declare` instructions to the correct source
|
||||
//! locations even while source location emission is still disabled, so there
|
||||
//! is no need to do anything special with source location handling here.
|
||||
//!
|
||||
//! ## Unique Type Identification
|
||||
//!
|
||||
//! In order for link-time optimization to work properly, LLVM needs a unique
|
||||
//! type identifier that tells it across compilation units which types are the
|
||||
//! same as others. This type identifier is created by
|
||||
//! `TypeMap::get_unique_type_id_of_type()` using the following algorithm:
|
||||
//!
|
||||
//! (1) Primitive types have their name as ID
|
||||
//! (2) Structs, enums and traits have a multipart identifier
|
||||
//!
|
||||
//! (1) The first part is the SVH (strict version hash) of the crate they
|
||||
//! were originally defined in
|
||||
//!
|
||||
//! (2) The second part is the ast::NodeId of the definition in their
|
||||
//! original crate
|
||||
//!
|
||||
//! (3) The final part is a concatenation of the type IDs of their concrete
|
||||
//! type arguments if they are generic types.
|
||||
//!
|
||||
//! (3) Tuple-, pointer and function types are structurally identified, which
|
||||
//! means that they are equivalent if their component types are equivalent
|
||||
//! (i.e., (i32, i32) is the same regardless in which crate it is used).
|
||||
//!
|
||||
//! This algorithm also provides a stable ID for types that are defined in one
|
||||
//! crate but instantiated from metadata within another crate. We just have to
|
||||
//! take care to always map crate and `NodeId`s back to the original crate
|
||||
//! context.
|
||||
//!
|
||||
//! As a side-effect these unique type IDs also help to solve a problem arising
|
||||
//! from lifetime parameters. Since lifetime parameters are completely omitted
|
||||
//! in debuginfo, more than one `Ty` instance may map to the same debuginfo
|
||||
//! type metadata, that is, some struct `Struct<'a>` may have N instantiations
|
||||
//! with different concrete substitutions for `'a`, and thus there will be N
|
||||
//! `Ty` instances for the type `Struct<'a>` even though it is not generic
|
||||
//! otherwise. Unfortunately this means that we cannot use `ty::type_id()` as
|
||||
//! cheap identifier for type metadata -- we have done this in the past, but it
|
||||
//! led to unnecessary metadata duplication in the best case and LLVM
|
||||
//! assertions in the worst. However, the unique type ID as described above
|
||||
//! *can* be used as identifier. Since it is comparatively expensive to
|
||||
//! construct, though, `ty::type_id()` is still used additionally as an
|
||||
//! optimization for cases where the exact same type has been seen before
|
||||
//! (which is most of the time).
|
|
@ -1,5 +1,4 @@
|
|||
// See doc.rs for documentation.
|
||||
mod doc;
|
||||
#![doc = include_str!("doc.md")]
|
||||
|
||||
use rustc_codegen_ssa::mir::debuginfo::VariableKind::*;
|
||||
|
||||
|
|
|
@ -8,6 +8,7 @@
|
|||
#![feature(bool_to_option)]
|
||||
#![feature(const_cstr_unchecked)]
|
||||
#![feature(crate_visibility_modifier)]
|
||||
#![feature(extended_key_value_attributes)]
|
||||
#![feature(extern_types)]
|
||||
#![feature(in_band_lifetimes)]
|
||||
#![feature(nll)]
|
||||
|
|