rustdoc: use a trie for name-based search
Preview and profiler results
----------------------------

Here's some quick profiling done in Firefox on the Rust compiler docs:

- Before: https://share.firefox.dev/3UPm3M8
- After: https://share.firefox.dev/40LXvYb

Here are the results from the Node.js profiler:

- https://notriddle.com/rustdoc-html-demo-15/trie-perf/index.html

Here's a copy that you can use to try it out. Compare it with [the nightly]. Try typing `typecheckercontext` one character at a time, slowly.

- https://notriddle.com/rustdoc-html-demo-15/compiler-doc-trie/index.html

[the nightly]: https://doc.rust-lang.org/nightly/nightly-rustc/

The fuzzy match algo is based on [Fast String Correction with Levenshtein-Automata] and the corresponding implementation code in [moman] and [Lucene]; the bit-packing representation comes from Lucene, but the actual matcher is based more on `fsc.py`.

As suggested in the paper, a trie is used to represent the FSA dictionary. The same trie is used for prefix matching. Substring matching is done with a side table of three-character[^1] windows that point into the trie.

[Fast String Correction with Levenshtein-Automata]: https://github.com/tpn/pdfs/blob/master/Fast%20String%20Correction%20with%20Levenshtein-Automata%20(2002)%20(10.1.1.16.652).pdf
[Lucene]: https://fossies.org/linux/lucene/lucene/core/src/java/org/apache/lucene/util/automaton/Lev1TParametricDescription.java
[moman]: https://gitlab.com/notriddle/moman-rustdoc

User-visible changes
--------------------

I don't expect anybody to notice anything, but it does cause two changes:

- Substring matches, in the middle of a name, only apply if there are three or more characters in the search query.
- The Levenshtein distance limit now maxes out at two. In the old version, the limit was w/3, so you could get looser matches for queries with 9 or more characters[^1] in them.

[^1]: Technically, UTF-16 code units.
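To make the data layout concrete, here is a minimal sketch of one way a trie can serve both prefix matching and substring matching through a side table of three-code-unit windows. It is not the code from this commit: `TrieNode`, `NameIndex`, `oldLimit`, and `newLimit` are hypothetical names, the fuzzy (Levenshtein automaton) matcher is left out, and `newLimit` assumes the new rule simply clamps the old w/3 limit at two.

```javascript
// Hypothetical sketch; names and structure are assumptions, not rustdoc's code.
class TrieNode {
    constructor() {
        this.children = new Map(); // UTF-16 code unit -> TrieNode
        this.names = [];           // names that end exactly at this node
    }
}

class NameIndex {
    constructor() {
        this.root = new TrieNode();
        // Side table: three-code-unit window -> Set of trie nodes reached
        // right after reading that window somewhere inside a name.
        this.windows = new Map();
    }

    insert(name) {
        let node = this.root;
        // Index by position so we step through UTF-16 code units.
        for (let i = 0; i < name.length; i += 1) {
            let child = node.children.get(name[i]);
            if (child === undefined) {
                child = new TrieNode();
                node.children.set(name[i], child);
            }
            node = child;
            if (i >= 2) {
                const win = name.slice(i - 2, i + 1);
                if (!this.windows.has(win)) {
                    this.windows.set(win, new Set());
                }
                this.windows.get(win).add(node);
            }
        }
        node.names.push(name);
    }

    // Follow `text` downward from `node`; undefined if the path is missing.
    static walk(node, text) {
        for (let i = 0; i < text.length && node !== undefined; i += 1) {
            node = node.children.get(text[i]);
        }
        return node;
    }

    // Add every name stored at or below `node` to the Set `out`.
    static collect(node, out) {
        for (const name of node.names) {
            out.add(name);
        }
        for (const child of node.children.values()) {
            NameIndex.collect(child, out);
        }
    }

    prefixSearch(query) {
        const out = new Set();
        const node = NameIndex.walk(this.root, query);
        if (node !== undefined) {
            NameIndex.collect(node, out);
        }
        return [...out].sort();
    }

    substringSearch(query) {
        const out = new Set();
        if (query.length < 3) {
            return []; // substring matches need at least three code units
        }
        const starts = this.windows.get(query.slice(0, 3)) || [];
        for (const start of starts) {
            const node = NameIndex.walk(start, query.slice(3));
            if (node !== undefined) {
                NameIndex.collect(node, out);
            }
        }
        return [...out].sort();
    }
}

// Old limit as described above: w/3, rounded down (rounding is an assumption).
function oldLimit(w) {
    return Math.floor(w / 3);
}

// Assumed new rule: the same limit, capped at two.
function newLimit(w) {
    return Math.min(2, oldLimit(w));
}
```

With `is_nan`, `is_none`, `into_iter`, and `iter` inserted, `prefixSearch("is_n")` finds the first two names, `substringSearch("o_it")` finds only `into_iter`, and `substringSearch("te")` returns nothing because the query is shorter than one window. For an 18-unit query such as `typecheckercontext`, `oldLimit` gives 6 and `newLimit` gives 2.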
parent 242f20dc1e
commit 86da4be47f
6 changed files with 743 additions and 130 deletions
@@ -5,6 +5,5 @@ const EXPECTED = {
    'others': [
        { 'path': 'std::f32', 'name': 'is_nan' },
        { 'path': 'std::f64', 'name': 'is_nan' },
        { 'path': 'std::option::Option', 'name': 'is_none' },
    ],
};
@@ -3,16 +3,8 @@ const FILTER_CRATE = "std";
 const EXPECTED = [
     {
         query: 'vec::intoiterator',
-        others: [
-            // trait std::iter::IntoIterator is not the first result
-            { 'path': 'std::vec', 'name': 'IntoIter' },
-            { 'path': 'std::vec::Vec', 'name': 'into_iter' },
-            { 'path': 'std::vec::Drain', 'name': 'into_iter' },
-            { 'path': 'std::vec::IntoIter', 'name': 'into_iter' },
-            { 'path': 'std::vec::ExtractIf', 'name': 'into_iter' },
-            { 'path': 'std::vec::Splice', 'name': 'into_iter' },
-            { 'path': 'std::collections::vec_deque::VecDeque', 'name': 'into_iter' },
-        ],
+        // trait std::iter::IntoIterator is not the first result
+        others: [],
     },
     {
         query: 'vec::iter',