SSH At Scale With OpenSSH Certificates - Final points
This is the third post in a series on using OpenSSH Certificates to secure access to large numbers of similar devices. The first post can be found here, and the second can be found here.
The practical example left the means of scaling to the imagination, but one thing is not obvious without looking at the source code of ssh-keygen itself:
A certificate may not have more than 255 principals
To use certificates at scale, which here means on the order of 2^16 devices or more, one needs to split the target queue into chunks of no more than 255 devices. Once the queue is split, it is rather simple to fetch one certificate per chunk, granting access to the devices in that chunk, and to map each certificate to its corresponding target list. Simple set and dictionary objects handle this well in Python, for example, and can be used to feed a worker pool. The AsyncSSH library in Python supports certificates exceptionally well, and would make a good basis on which to build both an SSH CA and client tooling capable of highly concurrent and secure access to a very large number of similar targets.
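The chunking and mapping described above can be sketched with nothing but the standard library. This is a minimal illustration, not the tooling from the earlier posts: `fetch_certificate` and `run_job` are hypothetical stand-ins for a request to the CA and for the actual SSH session, and the device names are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

MAX_PRINCIPALS = 255  # per-certificate limit stated above


def chunk_targets(targets, size=MAX_PRINCIPALS):
    """Yield frozensets of at most `size` unique targets."""
    it = iter(sorted(set(targets)))  # de-duplicate; sort for stable chunks
    while chunk := frozenset(islice(it, size)):
        yield chunk


def fetch_certificate(principals):
    """Hypothetical stand-in for asking the SSH CA for a certificate."""
    return f"cert-for-{len(principals)}-principals"


def run_job(cert, target):
    """Hypothetical stand-in for connecting to `target` with `cert`."""
    return target, cert


def run_all(targets, workers=32):
    # One certificate per chunk, keyed by the set of targets it covers.
    certs = {chunk: fetch_certificate(chunk) for chunk in chunk_targets(targets)}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_job, cert, target)
            for chunk, cert in certs.items()
            for target in chunk
        ]
        # Map each target to the certificate that was used for it.
        return dict(future.result() for future in futures)
```

With a limit of 255 principals, 2^16 targets work out to 258 certificates: 257 full chunks cover 65,535 devices and one more chunk covers the last device.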
It is worth noting that when using certificates and eliminating password usage, an emergency hatch is necessary. My example used a single CA key, and the loss (or compromise) of that key would be Very Bad News™. One should not depend on a single point of failure, so consider a rotation scheme in which your devices know up front about a set of possible CA keys. Perhaps, if you are in an embedded environment, your base image contains one common CA that will always be available in an emergency but is hopefully never needed in production. Such a key should ideally be stored in an air-gapped HSM with strict, audited access policies governing its use. Another set of CA keys may be defined during device provisioning, and could correspond to keys held by a certificate service reachable over some network connection.
Finally, when using certificates in the ways discussed in this series, client keys can be ephemeral. The certificate authority grants the powers needed to access the systems that trust it to any public key it signs. If this is combined securely with an external auth provider trusted by the CA, then any client tooling created can use per-job key material that is itself never exported to disk. Keeping validity periods to a minimum greatly reduces the potential for abuse and narrows the window of opportunity for attacks.
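To make the short validity period concrete, here is a small sketch that only builds the ssh-keygen command line a CA host might run to sign a per-job public key. The helper name, file paths, key ID, and the five-minute default are assumptions for illustration. The `-s`, `-I`, `-n`, and `-V` options are standard ssh-keygen options, and the one-minute backdate is there to tolerate small clock differences between hosts.

```python
MAX_PRINCIPALS = 255  # per-certificate limit discussed earlier in this post


def signing_command(ca_key, public_key, key_id, principals, minutes=5):
    """Build an ssh-keygen argv that signs `public_key` for a short window.

    Only the public half of the per-job key is passed to the CA; the
    private half stays with the client that generated it.
    """
    principals = sorted(set(principals))
    if not 0 < len(principals) <= MAX_PRINCIPALS:
        raise ValueError("need between 1 and 255 principals")
    if minutes < 1:
        raise ValueError("validity must be at least one minute")
    # Valid from one minute ago until `minutes` minutes from now.
    validity = f"-1m:+{minutes}m"
    return [
        "ssh-keygen",
        "-s", ca_key,                # CA private key used for signing
        "-I", key_id,                # identity recorded in the certificate
        "-n", ",".join(principals),  # comma-separated principal list
        "-V", validity,              # validity interval
        public_key,                  # public key file to sign
    ]
```

The resulting list can be handed to `subprocess.run` on the host that holds the CA key. A CA built on AsyncSSH would do the equivalent in-process instead of shelling out.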