For some time now, I’ve been fascinated by the way that networks interconnect with one another. What is happening when I type google.com
into my browser? How does my ISP send my request to Google? What does that interconnection look like?
I’ve come to learn that the answer is BGP: the Border Gateway Protocol. BGP is a routing protocol in which two networks (known as “autonomous systems”) exchange information about how to get to their slice of the internet. We can visualize the boundaries of autonomous systems with a traceroute
. As I write this article from a coffee shop, I start a trace route to Google:
$ traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 64 hops max, 52 byte packets
1 nyclnyrga1r.wifi.rr.com (24.29.97.185) 15.426 ms 14.597 ms 17.935 ms
2 agg210.nyclnyrg01r.nyc.rr.com (24.29.97.176) 22.046 ms 22.827 ms 15.998 ms
3 bu-ether29.nwrknjmd67w-bcr00.tbone.rr.com (107.14.19.24) 21.914 ms 21.526 ms 11.968 ms
4 66.109.5.138 (66.109.5.138) 21.063 ms
bu-ether12.nycmny837aw-bcr00.tbone.rr.com (66.109.6.27) 21.389 ms
66.109.5.138 (66.109.5.138) 22.444 ms
5 0.ae1.pr0.nyc20.tbone.rr.com (66.109.6.163) 14.863 ms 60.652 ms 20.229 ms
6 ix-ae-6-0.tcore1.n75-new-york.as6453.net (66.110.96.53) 19.951 ms
ix-ae-10-0.tcore1.n75-new-york.as6453.net (66.110.96.13) 18.836 ms
ix-ae-6-0.tcore1.n75-new-york.as6453.net (66.110.96.53) 14.319 ms
7 72.14.195.232 (72.14.195.232) 18.199 ms 15.604 ms 17.883 ms
8 * * *
9 dns.google (8.8.8.8) 20.173 ms 22.622 ms 16.466 ms
Inspecting the traceroute output, we can see the first seven hops all include the rr.com
domain. The “rr” domain is a reference to Road Runner, an ISP which was acquired by Time Warner some years ago. Great, so we know that my coffee shop’s ISP is Time Warner and my request starts by flowing through a series of their routers until eventually it gets to ix-ae-6-0.tcore1.n75-new-york.as6453.net
. This is the first autonomous system boundary on our way to Google. After some research, we see that AS6453 is Tata Communications, a “tier 1” ISP. This seems to suggest that Time Warner doesn’t have a direct route to Google, instead, they pass along my request to their ISP that does know how to get to Google. Tata then takes my request and routes it through a few of their own hops until eventually we enter the next autonomous system boundary and my request lands on a Google DNS server.
At each of these autonomous system boundaries, the routers establish a BGP session with one another and exchange routes that they know how to get to.
BGP is a naive protocol that blindly trusts the routes being passed between autonomous systems. What happens if an AS accidentally shares a route that doesn’t belong to it? Could a bad actor maliciously originate a route and intercept that traffic? The answer is yes, and has been the cause of numerous outages across the internet like this one.
One solution to this is RPKI, in which the network operators can cryptographically sign which autonomous systems are allowed to originate their routes (also known as prefixes). Regional internet registries (RIRs) then host these signed certificates, which are known as Route Origin Authorizations.
This is great, but requires that autonomous systems only accept RPKI valid routes, which requires changes to their BGP-facing routers.
This article details how to drop invalid RPKI routes using BIRD, a software based router on Linux.
It’s worth noting that RPKI validation is not done on the router itself, but instead on a validator. The valid routes are then sent to the router using a protocol called “RPKI to RTR”. The reason the validation doesn’t take place on the router is largely because this cryptographic verification can be expensive, and most routers aren’t well equipped to do it.
There are many different options out there for RPKI to RTR implementations. They range from Cloudflare’s gortr (written in Go), NLnet Labs’ routinator (written in Rust and also does the validation), and RTRLib.
I’ve chose to run gortr, mostly because it can run inside a simple Docker container and because Cloudflare caches the aggregated prefix data sources at its edges, which means it should be relatively quick.
I started by setting up gortr on a Docker host with the following docker-compose.yml
file:
version: '3'
services:
gortr:
image: cloudflare/gortr
container_name: gortr
restart: unless-stopped
ports:
- 8282:8282
And then started it with docker-compose up -d
. Tailing the logs we can see it starts up successfully docker-compose logs -f
:
gortr | time="2019-11-11T16:21:23Z" level=info msg="New update (114706 uniques, 114706 total prefixes). 0 bytes. Updating sha256 hash -> a294dbee9bae169a5d007fa15b084d122ecbb202f2039e4862d75b4a783c9ca7"
gortr | time="2019-11-11T16:21:24Z" level=info msg="Updated added, new serial 1"
Great, now gortr has a list of RPKI valid prefixes, we just have to get them over to BIRD. As of version 2.0.7, BIRD supports two ways of transporting this data: unencrypted TCP and SSH. For today’s post let’s use TCP.
Despite not using SSH as the transport, we need to ensure that BIRD is compiled with support for libssh (source. The libssh package varies by distribution, but if you’re on a Debian-based host you can use libssh-dev
. Start by installing some of BIRD’s dependencies, downloading, and compiling it:
apt install build-essential flex bison libncurses5-dev libreadline-dev libssh-dev
cd /tmp
wget ftp://bird.network.cz/pub/bird/bird-2.0.7.tar.gz
tar -xzf bird-2.0.7.tar.gz
cd bird-2.0.7
./configure --enable-libssh
make
make install
Once you have BIRD up and running, we need to configure it to point to our RPKI-to-Router server. This can be done with the following configuration:
roa4 table r4;
roa6 table r6;
protocol rpki {
roa4 { table r4; };
roa6 { table r6; };
remote “docker.fqdn.com” port 8282;
retry keep 90;
refresh keep 900;
expire keep 172800;
}
function is_v4_rpki_invalid () {
return roa_check(r4, net, bgp_path.last_nonaggregated) = ROA_INVALID;
}
function is_v6_rpki_invalid () {
return roa_check(r6, net, bgp_path.last_nonaggregated) = ROA_INVALID;
}
(Be sure to swap out docker.fqdn.com
for the hostname or IP address of your RPKI-to-Router server).
This configuration does a few things:
import
s to check if a route is invalidOnce you’ve saved your configuration, open a BIRD console with birdc
, reload the configuration, and check to see if it can communicate with the server:
bird> configure soft
Reading configuration from /etc/bird.conf
Reconfigured
bird> show protocols rpki1
Name Proto Table State Since Info
rpki1 RPKI --- up 2019-11-11 16:22:42 Established
Now we have two new functions that we can use to check if prefixes are invalid: is_v4_rpki_invalid
and is_v6_rpki_invalid
. You can use these functions in a peering configuration like this:
protocol bgp AS6939v4 {
description "Hurricane Electric";
local 193.189.82.40 as 397143;
neighbor 193.189.82.134 as 6939;
hold time 90;
keepalive time 30;
graceful restart;
ipv4 {
next hop self;
import keep filtered;
import filter {
if is_v4_rpki_invalid() then reject;
accept;
}
};
}
Because we have import keep filtered;
, we can check to see if the routes were filtered out:
bird> show route filtered count
5 of 5 routes for 5 networks in table master4
1 of 1 routes for 1 networks in table master6
0 of 0 routes for 0 networks in table r4
0 of 0 routes for 0 networks in table r6
Total: 6 of 6 routes for 6 networks in 4 tables
🤔 interesting, I would have expected more than 5 routes filtered from that session. After more research, it looks like BIRD does not automatically reject routes that were imported before the RPKI session came up. To resolve this, we need to reload in all
:
Let’s look at those filtered routes again:
bird> show route filtered count
4612 of 4612 routes for 4612 networks in table master4
0 of 0 routes for 0 networks in table master6
0 of 0 routes for 0 networks in table r4
0 of 0 routes for 0 networks in table r6
Total: 4612 of 4612 routes for 4612 networks in 4 tables
That’s better. And terrifying. It looks like we’re filtering 4,612 invalid routes from this session. This reload in all
quirk is something that BIRD has not solved yet, but is expected to be fixed in a subsequent release.
You can verify that your autonomous system isn’t accepting invalid RPKI routes by browsing to https://www.ripe.net/s/rpki-test. If all is well, you’ll see the following:
🎉
— © 2021. All rights reserved. —