Map Keys and Lifetimes


UPDATE: Less than 30 minutes after posting this, Andrew Werner came along with a really nice answer. It's now included as the bottom of the post so you can walk through a few ways to almost solve this problem before hitting an actual answer.


I ran into a weird complexity cliff with Rust's HashMap and lifetimes the other day. It's enough of a head scratcher that I thought I'd write it up and see if anyone on the broader internet knew what was going on.

The rest of this post assumes you're comfortable enough with Rust to see some moderately complicated lifetimes and understand what's going on. You don't need a perfect mental model of the borrow checker.


HashMap and BTreeMap are pretty darn good and I use them a lot. Suppose you're me and you're making some HTTP requests and you want to track some request stats by host and port. It's pretty easy and boring to do that with a HashMap.

use std::collections::HashMap;

fn main() {
    let bender = Host { hostname: "bender".to_string(), port: 666 };

    let mut host_stats = HashMap::new();
    host_stats.insert(
        bender.clone(),
        Stats { sent: 30, errors: 30 },
    );
    assert_eq!(
        host_stats.get(&bender),
        Some(&Stats { sent: 30, errors: 30}),
    );
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Host {
    hostname: String,
    port: u16,
}

#[derive(Debug, PartialEq)]
struct Stats {
    sent: usize,
    errors: usize,
}

Now suppose you're me and you're trying to wrap stats up with some other state or just want to hide it behind a reasonable interface because we live in a society. I'm not trying to write HashMap<(String, u16), Stats> everywhere, and I also don't want people to have to see Host to use this code.

struct HostStats {
    stats: HashMap<Host, Stats>,
}

impl HostStats {
    fn get(&self, hostname: &str, port: u16) -> Option<&Stats> {
        self.stats.get(&Host{
            hostname: hostname.to_string(),
            port,
        })
    }

    fn record(&mut self, hostname: String, port: u16, sent: usize, errors: usize) {
        let host = Host {
            hostname,
            port,
        };
        self.stats.insert(host, Stats { sent, errors });
    }
}

Now, suppose you're me and you're suddenly annoyed that we can't look up host with just a reference in get - we have to copy the entire string (hostname.to_string()) to build an owned Host and then immediately take a reference to it to look up a key.

My first thought here was to define a key-reference type that can be compared to Host but doesn't own a hostname. HashMap had other ideas. Take a look at the signature of get; it allows you to do a lookup with any key type Q that can be borrowed as if it was a K.

pub fn get<Q>(&self, key: &Q) -> Option<&V>
where
    K: Borrow<Q> + Ord,
    Q: Hash + Eq + ?Sized;

If you try to do what I tried first, you end up being unable to write a useful implementation of Borrow. There's no way to return a &HostRef here without referencing a temporary value AND there is no way to express the appropriate lifetime constraint without conflicting with the signature of the Borrow trait.

struct HostRef<'a> {
    hostname: &'a str,
    port: u16,
}


impl std::borrow::Borrow<HostRef> for Host {
    fn borrow(&self) -> &HostRef {
        todo!("good luck")
    }
}

The fundamental problem with that approach is trying to express that hostname can either be borrowed or owned, and it shouldn't make a difference. If only there was a way to express that.


Having a Cow almost works, man.

When inserting and get a key, things compile. We can use Cow<'static, str> to express that we're only going to keep owned keys in the HashMap and keep explicit lifetime annotations from becoming viral.

Howerever, there are some lifetime shenanigans necessary in the signature of get - we seem to have have to tell the borrow checker that the borrowed hostname will live at least as long as &self and the output value. Intuitively that restriction doesn't make much sense to me - get in touch if it makes sense to you.

use std::borrow::Cow;
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Host<'a> {
    /// this is now a Cow<'a, str> instead of a String.
    hostname: Cow<'a, str>,
    port: u16,
}

#[derive(Debug, PartialEq)]
struct Stats {
    sent: usize,
    errors: usize,
}

struct HostStats {
    /// store keys as Host<'static> so we don't have to let lifetime
    /// annotations go viral.
    stats: HashMap<Host<'static>, Stats>,
}

impl HostStats {
    /// get now needs some lifetimes - we have to tell the compiler that 'b
    /// outlives 'a, otherwise it gets a bit confused about our key types.
    fn get<'a, 'b: 'a>(&'a self, hostname: &'b str, port: u16) -> Option<&'a Stats> {
        let host = Host {
            hostname: Cow::from(hostname),
            port,
        };

        self.stats.get(&host)
    }

    /// when we save data, we're dealing with owned hostnames, so inferred
    /// lifetimes work here.
    fn record(&mut self, hostname: String, port: u16, sent: usize, errors: usize) {
        let host = Host {
            hostname: Cow::from(hostname),
            port,
        };

        self.stats.insert(host, Stats { sent, errors });
    }
}

When writing the code that inspired this post, I ended up needing something like try_record - update stats if they were present, but do nothing if that host was absent.

That extra level of complexity seems to confuse the borrow checker badly enough that the code doesn't compile any more. I'm not sure exactly why, but trying to define that method makes rustc so mad that it gives you error[E0521]: borrowed data escapes outside of method.

impl HostStats {
    fn try_record(&mut self, hostname: &str, port: u16, sent: usize, errors: usize) {
        let host = Host {
            hostname: Cow::from(hostname),
            port,
        };

        if let Some(stats) = self.stats.get_mut(&host) {
            stats.sent = sent;
            stats.errors = errors;
        }
    }
}

If you understand this one, please get in touch. I'm extremely curious! The code is on the playground if you'd like to mess with it yourself.


I ended up working through this problem by working around it; the compound key lookup didn't end up being important to my problem. If it had been, I would have taken Solly's excellent suggestion to use the hashbrown crate directly.

hashbrown gives you two ways out of this - get is defined in terms of an Equivalent trait instead of Borrow and, if you really need it, the raw entry API is right there.


Less than 30 minutes after posting this, I got some answers!

Andrew Werner came through with an extremely nice way to define key types. The Borrow trait requires you to be able to return a reference to the borrowed type, so Andrew split the Host into an inner HostKey and an outer Host so that the Host can be borrowed as a HostKey.

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct HostKey<'a> {
    hostname: Cow<'a, str>,
    port: u16,
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Host<'a>(HostKey<'a>);

impl<'a, 'me> Borrow<HostKey<'a>> for Host<'me>
where
    'me: 'a,
{
    fn borrow(&self) -> &HostKey<'a> {
        return &self.0;
    }
}

The lifetime bounds here are important: you're only allowed to borrow a Host as a HostKey if the lifetime on the HostKey is shorter than the lifetime of the Host. I'm not sure if you'd ever want any other bounds here, but we have to express them explicitly to rustc.

A kind commenter on Reddit also pointed out that the weird lifetime error I hit in try_record is probably a known issue with get and get_mut.