Stable hashing: add comments and tests concerning platform-independence

SipHasher128 implements short_write in an endian-independent way, yet
its write_xxx Hasher trait methods undo this endian-independence by byte
swapping the integer inputs on big-endian hardware. StableHasher then
adds endian-independence back by also byte-swapping on big-endian
hardware prior to invoking SipHasher128.

This double swap may have the appearance of being a no-op, but is in
fact by design. In particular, we really do want SipHasher128 to be
platform-dependent, in order to be consistent with the libstd SipHasher.
Try to clarify this intent. Also, add and update a couple of unit tests.
This commit is contained in:
Tyson Nottingham 2020-09-28 17:34:27 -07:00
parent fc2daaae61
commit d061fee177
4 changed files with 142 additions and 18 deletions

View file

@ -125,15 +125,28 @@ impl SipHasher128 {
// A specialized write function for values with size <= 8.
//
// The hashing of multi-byte integers depends on endianness. E.g.:
// The input must be zero-extended to 64-bits by the caller. This extension
// isn't hashed, but the implementation requires it for correctness.
//
// This function, given the same integer size and value, has the same effect
// on both little- and big-endian hardware. It operates on values without
// depending on their sequence in memory, so is independent of endianness.
//
// However, we want SipHasher128 to be platform-dependent, in order to be
// consistent with the platform-dependent SipHasher in libstd. In other
// words, we want:
//
// - little-endian: `write_u32(0xDDCCBBAA)` == `write([0xAA, 0xBB, 0xCC, 0xDD])`
// - big-endian: `write_u32(0xDDCCBBAA)` == `write([0xDD, 0xCC, 0xBB, 0xAA])`
//
// This function does the right thing for little-endian hardware. On
// big-endian hardware `x` must be byte-swapped first to give the right
// behaviour. After any byte-swapping, the input must be zero-extended to
// 64-bits. The caller is responsible for the byte-swapping and
// zero-extension.
// Therefore, in order to produce endian-dependent results, SipHasher128's
// `write_xxx` Hasher trait methods byte-swap `x` prior to zero-extending.
//
// If clients of SipHasher128 itself want platform-independent results, they
// *also* must byte-swap integer inputs before invoking the `write_xxx`
// methods on big-endian hardware (that is, two byte-swaps must occur--one
// in the client, and one in SipHasher128). Additionally, they must extend
// `usize` and `isize` types to 64 bits on 32-bit systems.
#[inline]
fn short_write<T>(&mut self, _x: T, x: u64) {
let size = mem::size_of::<T>();