Visualizing the incidence of IPv4 addresses in Farsight's DNSDB
Introduction
Recently, a customer asked “..how large is the DNSDB population of IP addresses with “current” DNS records?” I had some rough back of the napkin estimates, but nothing concrete. The article that follows is my take on finding a more calculated answer to that question.
To approach this problem we first need to define some elements of the question. First, we will define “current” by stating observations captured in the selected set of DNSDB-export mtbl file(s) are current for that set. We can run this analysis on an Hour/Day/Month/Year of data. For this study, I choose to run three passes, one with a daily file (2017-FEB-03), one with a monthly file (2017-JAN) and finally with a yearly file (2015). (At the time of this writing the 2016 yearly file was in the process of being generated so I chose our 2015 rollup). We will define DNS activity as the appearance of an A record with a valid IP address within the subject file (of course as we are looking at A records, we are only looking at IPv4 address space).
The finished product is shown below (read on to learn what it all means and
how it was built):
(A higher-resolution version of this image is available here)
The Tools
To facilitate the process of collecting the data I created a quick set of IPv4 mapping tools in golang to manipulate a one-bit per address bit-map. This tool set consists of the following three tools:
- setup – creates an empty bit-map file,
- addd – the “add daemon” is a daemon that listens for connections on a local TCP port and sets the bit for any newline separated IP addresses submitted to the socket. Finally the
- read – tool takes a filename of a bit-map file and outputs an ordered stream of new-line-separated dotted-quad IPv4 addresses.
These tools provide a simple pipeline model for handling a large list of IPv4 addresses with Unix tools like sed, awk, grep, nmsgtool etc.
The Process
Let’s work through a simple example. First initialize an empty data-file:
$ ./setup data.dat
Now start the daemon by specifying the data file and TCP port number to listen on:
$ ./addd data.dat 4444
Next we will populate the data file by transmitting a newline separated list of dotted-quad IPv4 addresses to the TCP socket. We generate the data using the dnstable_dump command to output an ordered stream of rdata records from the subject mtbl file then filtering the output to contain only “A” records with properly formatted IPv4 addresses:
$ dnstable_dump –d dns.20170203.D.mtbl | grep –e “ IN A “| awk{‘print $4’} | uniq | nc localhost 4444
The IP addresses are passed over the network socket by way of the netcat utility. In this case, localhost:4444
is our
instance of the addd tool. The bits mapped to the IP address of the A records
in the data is then set to one. At this point, we can perform some quick
analytics and get a feel for the kinds of numbers that we are looking at. For
this I use the read tool to generate a list of IP addresses where the bits are
set to one in the bitmap file and pipe the output to typical command-line
tools. Read can also be called with the argument zero which will emit a list
of IPv4 addresses where the bit is set to zero.
Let’s start with a simple line/IP count:
$ ./read data.dat one | wc –l ...
Sticking this output from this command into a table we see the following:
|File |Total |# In Use |% In Use |# Unused |% Unused |
|—–|
|2017-FEB-03
|4,294,967,296
|12,904,017
|%0.30
|4,282,063,279
|%99.69
|
|2017-JAN
|4,294,967,296
|37,266,823
|%0.86
|4,257,700,473
|%99.13
|
|2015
|4,294,967,296
|69,044,394
|%1.60
|4,225,922,902
|%98.39
|
We observe approximately 1% of the total IPv4 address space in DNS on a monthly basis and about 1/3 of a percent on a daily basis. The size of the daily in-use count tracks to about 12 Million addresses per day over the week of data that I checked.
Next, we can compare the daily files:
$ ./read 20170203.dat one > ips.20170203.txt $ ./read 20170204.dat one > ips.20170204.txt $ sort ips.20170203.txt ips.20170204.txt | uniq -d | wc –l 10720088
Comparing the day files for 2017-FEB-03 and 2017-FEB-04 we see an overlap of 10,720,088 IPs that were seen in both day files. This leaves us with about 2.2 million addresses worth of churn in one day.
A note about the size of IPv4 address space
Beware that the 4,294,967,296 number of addresses is based on the entire 32-bit size of the IPv4 address space. Not all of this space is routed or even open for allocation. Carve-outs such as RFC-1918, multicast space and “Class E” AKA “Future use” address space reduce the overall usable space of IPv4, however the DNS A record ignores these limitations and is represented as a 32-bit number. There are applications that make use of A records as a 32-bit number so some addresses found in this study are not functional IP addresses. For this exercise, we are willing to live with this.
Generating the IPv4 Hilbert Curve Heatmap
After loading the data into the data.dat file, we can use the read utility to extract it and feed it into ipv4-heatmap. Ipv4-heatmap takes as input a stream of newline separated dotted-quad IPv4 addresses and plots them onto a Hilbert curve heatmap (the same Hilbert curve made popular for IP mapping by xkcd. For this example we’ll use the default mode which generates an image where each pixel in the body of the graph represents a single /24 netblock. The color of the pixel indicates how many of the individual addresses within that /24 were found in the data-source.
The command is as follows:
$ ./read data.dat one | ipv4-heatmap \ -u "/24 per pixel" \ -h -d -i -P rdbu\ -f ~/ipv4-heatmap/extra/LiberationSans-Regular.ttf \ -t "IPs found within DNSDB rdata (month)" \ -a ../slash_eight.annotations \ -o ./images/map_day.png
The command above generates the image found at the top of this post. Hilbert
curves are a powerful projection used to render IPv4 address space. The output
is rather compact and the way adjacent IP space is plotted, CIDR blocks are
represented as rectangles. It takes a bit of getting used to at first, the way
numbers meander around the image. Once you get the hang of however, it becomes
intuitive. The excerpt below shows a zoom in on 134.0.0.0/8
the inner squares
each represent /16s. Each pixel represents a /24. Black pixels had zero IP
addresses in that /24. Blue to white blocks are < 50% in use and white to red
blocks are > 50% with red being 100% represented.

Ben April is the Research Director for Farsight Security, Inc