New DNSDB -V summarize option: Sometimes "Less" Is "More"
1. Introduction
As part of a recent update to DNSDB, dnsdbq now offers a “-V summarize” verb (this is an implementation of the “estimation of result size” feature mentioned in an earlier blog article). Since we covered the new feature using DNSDB Scout in that initial article, we will focus here only on dnsdbq, Farsight’s DNSDB command line client written in C.
To make use of the new dnsdbq -V summarize feature, begin by ensuring that you’re running the latest available version of dnsdbq. The manual page for the new verb describes the -V option as:
-V verb
The verb to perform, i.e. the type of query, either "lookup" or
"summarize". The default is the "lookup" verb. As an option, you
can specify the "summarize" verb, which gives you an estimate of
result size. At-a-glance, it provides information on when a given
domain name, IP address or other DNS asset was first-seen and last-
seen by the global sensor network, as well as the total observation
count.
As noted in the manual page, when you specify -V summarize in dnsdbq you will get JUST:
- The first- and last-seen times across the returned results
- The summed counts tallied across the returned results
NO detail records will be provided for the returned results.
2. Example A: www.mit.edu/A/mit.edu
It may help to consider an example. Let’s ask for RRname results for www.mit.edu/A/mit.edu, and limit that query to three results:
$ dnsdbq -r www.mit.edu/A/mit.edu -l 3
;; record times: 2010-06-24 06:02:21 .. 2013-04-01 16:52:02
;; count: 5215640; bailiwick: mit.edu.
www.mit.edu.  A  18.9.22.169

;; record times: 2013-01-22 21:10:33 .. 2013-01-23 00:11:57
;; count: 452; bailiwick: mit.edu.
www.mit.edu.  A  18.9.22.169
www.mit.edu.  A  141.101.116.213
www.mit.edu.  A  141.101.117.213

;; record times: 2013-01-22 17:51:20 .. 2013-01-22 17:53:43
;; count: 9; bailiwick: mit.edu.
www.mit.edu.  A  128.103.63.138
Now let’s run that same query, this time including the -V summarize option:
$ dnsdbq -r www.mit.edu/A/mit.edu -l 3 -V summarize
;; record times: 2010-06-24 06:02:21 .. 2013-04-01 16:52:02
;; count: 5216101; num_results: 3
Note that this output corresponds to our “full” results:
- Looking at just the time first seen for the three records, the earliest of those (2010-06-24 06:02:21) is shown in the summary output.
- Looking at just the time last seen for the three records, the latest of those (2013-04-01 16:52:02) is shown in the summary output.
- And looking at the counts, if we sum up 5215640, 452, and 9, we get 5216101, the count shown in the summary output.
We did NOT get “imputed” information for “all potential results” that DNSDB may know for that query, just the three we asked for.
The dnsdbq summarize verb works “just like a regular query,” EXCEPT:
- The first-seen time is the earliest first-seen time seen in ANY of the results that would normally be displayed,
- The last-seen time is the latest last-seen time seen in ANY of the results that would normally be displayed,
- The displayed count is the sum of the individual counts that were in the results that would normally be displayed, and
- You aren’t shown the individual detail records.
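The rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not dnsdbq's actual implementation: the `summarize` function and the dictionary keys are names we made up for this sketch, and the data is simply the three detail records from Example A typed in by hand.

```python
from datetime import datetime

# The three detail records from Example A, copied from the dnsdbq output above.
# The key names ("first", "last", "count") are illustrative, not a dnsdbq format.
RECORDS = [
    {"first": "2010-06-24 06:02:21", "last": "2013-04-01 16:52:02", "count": 5215640},
    {"first": "2013-01-22 21:10:33", "last": "2013-01-23 00:11:57", "count": 452},
    {"first": "2013-01-22 17:51:20", "last": "2013-01-22 17:53:43", "count": 9},
]

FMT = "%Y-%m-%d %H:%M:%S"


def summarize(records):
    """Collapse detail records into earliest first-seen, latest last-seen, summed count."""
    firsts = [datetime.strptime(r["first"], FMT) for r in records]
    lasts = [datetime.strptime(r["last"], FMT) for r in records]
    return {
        "first": min(firsts).strftime(FMT),
        "last": max(lasts).strftime(FMT),
        "count": sum(r["count"] for r in records),
        "num_results": len(records),
    }


summary = summarize(RECORDS)
print(summary)
```

The printed values match the summarize output shown in Example A: the same first-seen and last-seen times, a count of 5216101, and num_results of 3.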
3. Example B: *.uber.com
Let’s consider another example, a dnsdbq summarize query for *.uber.com returning up to a million results. We’ll begin by “manually” summing up the counts for up to a million results with jq and a tiny one-line awk script:
$ dnsdbq -r \*.uber.com -l 1000000 -j | jq -r '.count' | awk '{s+=$1}END{print s}'
2771990113
Now let’s see what we get from the actual dnsdbq summarize verb:
$ dnsdbq -r \*.uber.com -l 1000000 -V summarize
;; record times: 2010-06-24 10:38:39 .. 2019-08-29 21:33:59
;; zone times: 2010-04-24 16:12:21 .. 2018-03-22 16:02:25
;; count: 2771990200; num_results: 1000000
The results for this example are interesting for a couple of reasons:
- The summarize results include TWO sets of times: one for data observed by Farsight sensors, and one for results derived from zone file data. (Our example from Section 2 didn’t include any zone file data, so its summary had no zone file timestamps.)
- This summarize output has a very large count (2,771,990,200), representing the sum of the count values seen in the million results returned for our query.
When you see a number that large, it can be tempting to assume that summarize MUST somehow be looking at ALL the results that DNSDB knows about for *.uber.com (rather than just the first million results), but that would be wrong. The huge value of 2,771,990,200 is JUST the sum of the counts for the first million results, very close to the result we got when we summed up a million counts “manually” with jq and awk (2,771,990,113). The difference of 87 between the two counts is due to values updating in the brief interval between the two measurements.
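For readers who would rather do the “manual” tally in one script, here is a rough Python sketch that sums the counts and also keeps the sensor and zone file time ranges apart. It assumes each line of dnsdbq -j output is one JSON object carrying a `count` field plus either `time_first`/`time_last` or `zone_time_first`/`zone_time_last` as epoch seconds. Those field names are our assumption about the JSON format, and the three sample records are invented for illustration rather than real DNSDB data.

```python
import json


def tally(json_lines):
    """Sum counts and track sensor and zone-file time ranges separately."""
    total = 0
    num_results = 0
    sensor = [None, None]  # [earliest time_first, latest time_last]
    zone = [None, None]    # [earliest zone_time_first, latest zone_time_last]
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        total += rec["count"]
        num_results += 1
        for span, first_key, last_key in (
            (sensor, "time_first", "time_last"),
            (zone, "zone_time_first", "zone_time_last"),
        ):
            if first_key in rec and last_key in rec:
                span[0] = rec[first_key] if span[0] is None else min(span[0], rec[first_key])
                span[1] = rec[last_key] if span[1] is None else max(span[1], rec[last_key])
    return {
        "count": total,
        "num_results": num_results,
        "record_times": tuple(sensor),
        "zone_times": tuple(zone),
    }


# Invented sample records (NOT real DNSDB data): two sensor-derived, one zone-file-derived.
SAMPLE = [
    '{"count": 100, "time_first": 1300000000, "time_last": 1500000000}',
    '{"count": 20, "zone_time_first": 1280000000, "zone_time_last": 1520000000}',
    '{"count": 3, "time_first": 1290000000, "time_last": 1400000000}',
]

result = tally(SAMPLE)
print(result)
```

In practice, the same function could be fed from standard input (for example, `tally(sys.stdin)` after importing `sys`) with the dnsdbq -j output piped in, in place of the jq and awk stages.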
4. Quota Considerations
A dnsdbq -V summarize query “counts the same” as a regular query in terms of your quota usage.
A common question, as you might expect, is “So if doing a dnsdbq -V summarize query counts the same as doing a regular dnsdbq query, why not just do a regular query?” The answer is that the summarize verb is a nice option when you ONLY care about things like aggregate counts and first/last seen times, because it avoids having to retrieve all the detail records only to end up “throwing them away.”
5. “Why do you show num_results in dnsdbq -V summarize output?”
dnsdbq includes num_results in its output because it provides important context for the summary output.
For example, if you’ve asked for 500,000 results but we only know about 400,000 results, we want to ensure you know that we weren’t able to give you a summary for the full 500,000 you requested.
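If you are scripting around dnsdbq, that check can be automated. The sketch below parses the “count” line from the text-format summary shown in this article and compares num_results with the limit you asked for. The function names are ours, the regular expression assumes the output keeps exactly the layout shown in our examples, and the second sample line (with its count) is invented to mirror the 500,000 versus 400,000 scenario above.

```python
import re


def parse_summary(text):
    """Extract (count, num_results) from dnsdbq summarize text output."""
    match = re.search(r";; count: (\d+); num_results: (\d+)", text)
    if match is None:
        raise ValueError("no summary count line found")
    return int(match.group(1)), int(match.group(2))


def fewer_than_requested(text, requested_limit):
    """True when the summary covers fewer results than the limit asked for."""
    _, num_results = parse_summary(text)
    return num_results < requested_limit


# Summary output copied from Example B of this article.
EXAMPLE_B = """;; record times: 2010-06-24 10:38:39 .. 2019-08-29 21:33:59
;; zone times: 2010-04-24 16:12:21 .. 2018-03-22 16:02:25
;; count: 2771990200; num_results: 1000000"""

# Invented line for the "asked for 500,000, only 400,000 known" scenario.
HYPOTHETICAL = ";; count: 123456; num_results: 400000"

print(parse_summary(EXAMPLE_B))
print(fewer_than_requested(EXAMPLE_B, 1000000))
print(fewer_than_requested(HYPOTHETICAL, 500000))
```

Note that when num_results equals the limit you requested, as in Example B, you cannot tell from the summary alone whether more results exist beyond that limit.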
6. What You DON’T and CAN’T Get From Summarize
When you use the dnsdbq -V summarize option, dnsdbq returns its summary based on the results you would otherwise have seen had you not specified the summarize verb. The summarize verb does NOT somehow magically review ALL the results that DNSDB potentially knows about a given query (as if the limit value didn’t matter).
To make this concrete, let’s pretend that DNSDB knows about 25 million unique combinations of (RRname, RRtype, Bailiwick, Rdata, and zone-file vs observed-by-a-sensor). Let’s also assume you use dnsdbq -V summarize and ask for the maximum number of results you can get from dnsdbq in a single query (e.g., one million results).
The first-seen, last-seen and count values that will be reported through dnsdbq -V summarize will be based on the one million displayable results you would otherwise have been shown in detail, NOT the full set of 25 million results.
This means that you do NOT know, and CANNOT know, how many total unique results for your query may still “lurk” undisclosed in the passive DNS database, nor what the sum of the counts for all those results might be. The summarize verb will just report on what you could otherwise have gotten in normal detail-record form.
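A scaled-down toy model may make the point easier to see. In the Python sketch below, 25 invented records stand in for the 25 million, and a limit of 5 stands in for the one-million cap. Everything here (the record values, the `summarize_first` helper, and the assumption that the limit simply keeps the first records in the list) is hypothetical and only meant to show how the limit shapes the summary.

```python
def summarize_first(records, limit):
    """Summarize only the records the limit lets through, ignoring the rest."""
    shown = records[:limit]
    return {
        "first": min(r["first"] for r in shown),
        "last": max(r["last"] for r in shown),
        "count": sum(r["count"] for r in shown),
        "num_results": len(shown),
    }


# 25 invented records; times are arbitrary integers, not real timestamps.
ALL_RECORDS = [
    {"first": 5000 - 10 * i, "last": 6000 + 10 * i, "count": i + 1}
    for i in range(25)
]

limited = summarize_first(ALL_RECORDS, 5)    # what a limited summary reports
everything = summarize_first(ALL_RECORDS, 25)  # what is never visible with the limit

print(limited)
print(everything)
```

The limited summary reports a count of 15 across 5 results, while the full set would have produced a count of 325 across 25 results with an earlier first-seen and a later last-seen value. Nothing in the limited summary reveals that gap.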
7. Acknowledgement
The author would like to thank his colleague David Waitzman for his helpful comments on this article, and for all his work in adding new features in DNSDB API. Any errors remaining in this article are the responsibility of the author.
8. Conclusion
We hope that this introduction to the dnsdbq summarize verb has been helpful and instructive for you.
The Farsight Security Sales Team can be reached at [email protected].
Joe St Sauver Ph.D. is a Distinguished Scientist with Farsight Security®, Inc.