Do you know the feeling when you start a project and you think it will be easy, but then you realize that it is not? This is the story of the implementation of ssh2-config, a Rust library to parse SSH config files.
Why did I write this? Because basically most of termscp users wanted to have a way to use their SSH config files with it. And obviously the only way to achieve that was to write a library that would parse SSH config files.
I F*ck*d up
This is an example of a SSH configuration file:
User veesoHost 192.168.1.*compression yesUser fooHost 192.168.1.1User rootPort 2222IdentityFile ~/.ssh/id_rsa
So what's obvious is that in SSH configuration we can specify specific or pattern rules for hosts.
So for example we've got the first rule User veeso
which is general for all hosts and in case we're connecting to 192.168.1.1
both host blocks rules will be taken.
But how are conflicts resolved?
If your mind is sane enough, you would think that the most specific rule wins in conflicts.
So for instance if we connect to 192.168.1.1
you would expect the User
to be root
, because
So you would expect this kind of hierarchy: * < 192.168.1.* < 192.168.1.1

No, you're wrong.
The SSH configuration doesn't give a f*ck about hierarchy.
In the manual page from OpenBSD for SSH-config we can read:
Unless noted otherwise, for each parameter, the first obtained value will be used. The configuration files contain sections separated by Host specifications, and that section is only applied for hosts that match one of the patterns given in the specification. The matched host name is usually the one given on the command line (see the CanonicalizeHostname option for exceptions).
Since the first obtained value for each parameter is used, more host-specific declarations should be given near the beginning of the file, and general defaults at the end.
And it just took 4 major versions of my parser to figure that out.
Should have I read the entire documentation before starting the implementation? Yes. Did I? No. I thought I could just read the first few lines and then implement the parser. But no, I had to read the entire documentation to understand how it works.
Why didn't I read this part? Because everybody would think that the most specific rule wins in conflicts. And I thought that it was obvious. But no, it's not. It's not obvious at all. And I had to learn it the hard way.
So how do we setup this?
First of all what we could think the configuration with a structure like this:
Config -> HostMatch -> Parameters
so let's dive into the code, bottom-up
Parameters
We'll need to define a structure that will hold all the parameters we can find in the SSH configuration file and what parameters we want to support.
/// Describes the ssh configuration./// Configuration is describes in this document: <http://man.openbsd.org/OpenBSD-current/man5/ssh_config.5>/// Only arguments supported by libssh2 are implemented#[derive(Debug, Clone, PartialEq, Eq)]pub struct HostParams {pub bind_address: Option<String>,pub bind_interface: Option<String>,pub ca_signature_algorithms: Algorithms,pub certificate_file: Option<PathBuf>,pub ciphers: Algorithms,pub compression: Option<bool>,pub connection_attempts: Option<usize>,pub connect_timeout: Option<Duration>,pub host_key_algorithms: Algorithms,pub host_name: Option<String>,pub identity_file: Option<Vec<PathBuf>>,pub ignore_unknown: Option<Vec<String>>,pub kex_algorithms: Algorithms,pub mac: Algorithms,pub port: Option<u16>,pub pubkey_accepted_algorithms: Algorithms,pub pubkey_authentication: Option<bool>,pub remote_forward: Option<u16>,pub server_alive_interval: Option<Duration>,pub tcp_keep_alive: Option<bool>,pub user: Option<String>,pub ignored_fields: HashMap<String, Vec<String>>,pub unsupported_fields: HashMap<String, Vec<String>>,}
Host
The host structure will hold two things:
- the rule to match against that host
- the parameters to be used for that host
/// Describes the rules to be used for a certain host#[derive(Debug, Clone, PartialEq, Eq)]pub struct Host {/// List of hosts for which params are valid. String is string pattern, bool is whether condition is negatedpub pattern: Vec<HostClause>,pub params: HostParams,}/// Describes a single clause to match host#[derive(Debug, Clone, PartialEq, Eq)]pub struct HostClause {pub pattern: String,pub negated: bool,}
The host clause will hold the pattern to match against the host and a boolean to indicate whether the condition is negated or not. The pattern is a string that can be a hostname, an IP address or a wildcard pattern.
and finally we can implement the intersects
method to check if the host matches the pattern:
impl Host {/// Returns whether `host` argument intersects the host clausespub fn intersects(&self, host: &str) -> bool {let mut has_matched = false;for entry in self.pattern.iter() {let matches = entry.intersects(host);// If the entry is negated and it matches we can stop searchingif matches && entry.negated {return false;}has_matched |= matches;}has_matched}}impl HostClause {/// Returns whether `host` argument intersects the clausepub fn intersects(&self, host: &str) -> bool {WildMatch::new(self.pattern.as_str()).matches(host)}}
SSH Config
Eventually, SSH config will be our top level structure.
The idea is to have a Vec<Host>
with all the possible matches for hosts; hosts will be identified by their matching pattern.
/// Describes the ssh configuration./// Configuration is described in this document: <http://man.openbsd.org/OpenBSD-current/man5/ssh_config.5>#[derive(Debug, Clone, PartialEq, Eq, Default)]pub struct SshConfig {/// Rulesets for hosts./// Default config will be stored with key `*`hosts: Vec<Host>,}
And we'll mainly need two methods:
-
one to parse the configuration from file:
pub fn parse(mut self, reader: &mut impl BufRead) -> SshParserResult<Self> {parser::SshConfigParser::parse(&mut self, reader).map(|_| self)} -
one to query parameters for a certain host:
pub fn query<S: AsRef<str>>(&self, pattern: S) -> HostParams {let mut params = HostParams::new(&self.default_algorithms);// iter keys, overwrite if None top-downfor host in self.hosts.iter() {if host.intersects(pattern.as_ref()) {debug!("Merging params for host: {:?} into params {params:?}",host.pattern);params.overwrite_if_none(&host.params);trace!("Params after merge: {params:?}");}}// return calculated paramsparams}The query function will start with a default
HostParams
and after that we'll iterate over all the hosts, check if the provided pattern intersects with the host pattern for that host, and if so it will overwrite the current parameters unset values with the host parameters. We MUST NOT overwrite the current parameters if they are already set, because the first obtained value will be used.And another very important thing: we need to store hosts in a top-down order, because the first obtained value will be used. So if we have a more specific host match, it should be at the top of the list. This basically means that Host in the Vector must be ordered as they are in the file.
Now we can start finally implementing the parser itself, which is called SshConfigParser
and is located in the parser
module.
Implementing the parser
First, initialize our parse
function:
pub fn parse(config: &mut SshConfig,reader: &mut impl BufRead,) -> SshParserResult<()> {// ...}
We'll have to read the file line by line and parse each line.
Any line that contains a valid parameter, must refer to the current Host parameters.
So the first thing we need to do is to create the first Host, which since is not part of any host block, will have as pattern match *
, which means, anything.
config.hosts.push(Host::new(vec![HostClause::new(String::from("*"), false)],HostParams::new(),));// get current host pointerlet mut current_host = config.hosts.last_mut().unwrap();
and now we can start iterating line by line:
let mut lines = reader.lines();// iter linesloop {let line = match lines.next() {None => break,Some(Err(err)) => return Err(SshParserError::Io(err)),Some(Ok(line)) => Self::strip_comments(line.trim()),};if line.is_empty() {continue;}// tokenizelet (field, args) = match Self::tokenize_line(&line) {Ok((field, args)) => (field, args),Err(SshParserError::UnknownField(field, args))if rules.intersects(ParseRule::ALLOW_UNKNOWN_FIELDS)|| current_host.params.ignored(&field) =>{current_host.params.ignored_fields.insert(field, args);continue;}Err(SshParserError::UnknownField(field, args)) => {return Err(SshParserError::UnknownField(field, args));}Err(err) => return Err(err),};// If field is block, init a new blockif field == Field::Host {// Pass `ignore_unknown` from global overrides down into the tokenizer.let mut params = HostParams::new(&config.default_algorithms);params.ignore_unknown = config.hosts[0].params.ignore_unknown.clone();let pattern = Self::parse_host(args)?;trace!("Adding new host: {pattern:?}",);// Add a new hostconfig.hosts.push(Host::new(pattern, params));// Update current host pointercurrent_host = config.hosts.last_mut().unwrap();} else {// Update fieldmatch Self::update_host(field,args,current_host,rules,&config.default_algorithms,) {Ok(()) => Ok(()),// If we're allowing unsupported fields to be parsed, add them to the mapErr(SshParserError::UnsupportedField(field, args))if rules.intersects(ParseRule::ALLOW_UNSUPPORTED_FIELDS) =>{current_host.params.unsupported_fields.insert(field, args);Ok(())}// Eat the error here to not break the API with this change// Also it'd be weird to error on correct ssh_config's just because they're// not supported by this libraryErr(SshParserError::UnsupportedField(_, _)) => Ok(()),e => e,}?;}}// finally return OkOk(())
Tokenizing lines
A further complexity with tokenizing lines in the ssh configuration, is that we MUST support all of these syntax for defining the parameters:
Field value
Field=value
Field = value
Field "hello world"
- Quotes are used to escape spacesField="hello world"
So fields can be either be separated by =
or by spaces, or either.
And not to mention that the field can have any indentation before it.
/// Tokenize line if possible. Returns [`Field`] name and args as a [`Vec`] of [`String`].////// All of these lines are valid for tokenization////// ```txt/// IgnoreUnknown=Pippo,Pluto/// ConnectTimeout = 15/// Ciphers "Pepperoni Pizza,Margherita Pizza,Hawaiian Pizza"/// Macs="Pasta Carbonara,Pasta con tonno"/// ```////// So lines have syntax `field args...`, `field=args...`, `field "args"`, `field="args"`fn tokenize_line(line: &str) -> SshParserResult<(Field, Vec<String>)> {// check what comes first, space or =?let trimmed_line = line.trim();// first token is the field, and it may be separated either by a space or by '='let (field, other_tokens) = if trimmed_line.find('=').unwrap_or(usize::MAX)< trimmed_line.find(char::is_whitespace).unwrap_or(usize::MAX){trimmed_line.split_once('=').ok_or(SshParserError::MissingArgument)?} else {trimmed_line.split_once(char::is_whitespace).ok_or(SshParserError::MissingArgument)?};trace!("tokenized line '{line}' - field '{field}' with args '{other_tokens}'",);// other tokens should trim = and whitespacelet other_tokens = other_tokens.trim().trim_start_matches('=').trim();trace!("other tokens trimmed: '{other_tokens}'",);// if args is quoted, don't split itlet args = if other_tokens.starts_with('"') && other_tokens.ends_with('"') {trace!("quoted args: '{other_tokens}'",);vec![other_tokens[1..other_tokens.len() - 1].to_string()]} else {trace!("splitting args (non-quoted): '{other_tokens}'",);// split by whitespacelet tokens = other_tokens.split_whitespace();tokens.map(|x| x.trim().to_string()).filter(|x| !x.is_empty()).collect()};match Field::from_str(field) {Ok(field) => Ok((field, args)),Err(_) => Err(SshParserError::UnknownField(field.to_string(), args)),}}
A lot of fun, right?
After that we'll need to implement the update_host
function, which will update the current host parameters with the new parameters.
fn update_host(field: Field,args: Vec<String>,host: &mut Host,rules: ParseRule,default_algos: &DefaultAlgorithms,) -> SshParserResult<()> {trace!("parsing field {field:?} with args {args:?}",);let params = &mut host.params;match field {Field::BindAddress => {let value = Self::parse_string(args)?;trace!("bind_address: {value}",);params.bind_address = Some(value);}// ...}}
And we'll have a parser which converts the args Vec<String>
into the correct type for each parameter.
For example, for the BindAddress
field, we can have a simple string:
/// Parse string argumentfn parse_string(args: Vec<String>) -> SshParserResult<String> {if let Some(s) = args.into_iter().next() {Ok(s)} else {Err(SshParserError::MissingArgument)}}
And anything else for every other type, for example booleans have to be parsed as yes -> true
and no -> false
, etc.
And with that we can parse the SSH configuration file and get the parameters for a certain host.
If it wasn't for the fact that there is the Include
directive.
The Include directive
The Include
directive is a special directive that allows you to include other SSH configuration files in your main configuration file.
Basically it accepts a path
to a file or a glob pattern, and it will include all the files that match the pattern.
How are the parameters applied? Luckily or not, it behaves like the parameters contained in the included file are placed where the Include
directive is.
So in the update_host function, when we have include:
Field::Include => {Self::include_files(args, host, rules)?;}
At this point let's see how to do this. Basically we need to open each file and read them to a new SshConfig
object, and then merge the parameters into the current host.
/// include a file by parsing it and updating host rules by merging the read config to the current one for the hostfn include_files(args: Vec<String>,host: &mut Host,rules: ParseRule,) -> SshParserResult<()> {let path_match = Self::parse_string(args)?;trace!("include files: {path_match}",);let files = glob(&path_match)?;for file in files {let file = file?;trace!("including file: {}", file.display());let mut reader = BufReader::new(File::open(file)?);let mut sub_config = SshConfig::default();Self::parse(&mut sub_config, &mut reader, rules)?;// merge sub-config into hostfor pattern in &host.pattern {if pattern.negated {trace!("excluding sub-config for pattern: {pattern:?}",);continue;}trace!("merging sub-config for pattern: {pattern:?}",);let params = sub_config.query(&pattern.pattern);host.params.overwrite_if_none(¶ms);}}Ok(())}
Uncovered topics
There are some things that I didn't cover in this article, like:
- The
Algorithms
struct, which is a wrapper around aVec<String>
and provides some helper methods to parse the algorithms from the SSH configuration file. Why? Because algorithms in SSH config are a comma separated list of algorithms and can specify with a prefix whether the algorithms should replace, head, appended or exclude to/from default algorithms. - Default algorithms: if you want to deal with algorithms on ssh config, you need to know what the default algorithms are. And guess what? There's not any rust library which interacts with openssh which exposes them. So how did I do this? I wrote a PARSER of a C header file which parses the latest defines from the openssh repository. This is a bit hacky, but it works. And I think it's the only way to do this.
Conclusion
In this article I tried to cover the main points of the implementation of the SSH config parser in Rust. I hope you enjoyed it and learned something new. I also hope that you will not have to go through the same pain I went through to implement this parser.