Farsight Flexible Search Technical Reference Guide
Introduction
This is the technical reference guide for the Flexible Search extensions (known as the Flex API) to DNSDB APIv2. APIv2 is implemented in Farsight Security DNSDB 2.0, which provides our Standard Search capabilities. Flexible Search adds ways to search DNSDB by regular expressions and globs (aka wildcarding).
Please see the Introducing DNSDB 2.0 page for general information.
Audience
This document is intended for those wanting a deeper understanding of the data in the Flex API and programmers who want to write applications that can interact with the Flex API extensions to the RESTful DNSDB APIv2 using JSON and HTTP.
Properties of Flexible Search through its Flex API
Flexible Search has much more powerful searching capabilities than Standard Search, but the results will not be as “full” as the results from Standard Search DNSDB.
It provides a flavor of “regular expressions” known as “extended regular expressions,” as used by the Unix egrep command. It supports most egrep-style regular expressions. The particulars are known as “Farsight Compatible Regular Expressions” (FCRE) and are documented at FCRE Reference Guide.
It provides wildcard search features, known in the Unix world as globs, that are more comprehensive than those provided in Standard Search. Note, though, that globs are not as powerful as regular expression searches. Globs are documented at Glob Reference Guide.
If you use Flexible Search to search on the LHS (Left Hand Side) of DNS records, aka RRnames, you’ll get back the matching RRnames and their RRTypes. If you use it to search on the RHS, you’ll get back matching RData and their RRTypes. Flexible Search provides only certain RRnames and RData information; that information may be sufficient as-is for your needs. If you need full details, you can pivot from Flexible Search to the main Standard Search DNSDB APIv2.
For example, if you search in RRnames for RRType ‘A’, you will get back matching RRnames that have ‘A’ RRType records. If you also want the IP addresses associated with ‘A’ records, then you’ll need to make follow-up queries (i.e. pivot) in DNSDB Standard Search (that can be as simple as clicking on a link in DNSDB Scout).
This means:
- Sometimes you’ll run DNSDB Flexible Search and the results will be sufficient as-is.
- Other times, Standard Search can find the results you need, just as before, nothing more needed.
- On other occasions, you’ll run DNSDB Flexible Search and then pivot from those results to DNSDB Standard Search for more details.
- Its search expressions are case insensitive.
- Only certain RRTypes have searchable RData in Flexible Search; see details below. The RData of RRTypes ‘A’ and ‘AAAA’ is not searchable by Flexible Search. IP addresses, IP address ranges, or CIDR netblocks may be present in other RRTypes, such as ‘TXT’ records, but Flexible Search can only search them by treating them as literal text. Use Standard Search to search RRTypes ‘A’ and ‘AAAA’ with awareness of IP ranges and CIDR.
- It returns JSONL (newline delimited JSON) data, similar to what Standard Search returns, but the data are not in the IETF Passive DNS – Common Output Format (COF).
Note: Because of the high volume of records and their low value in terms of useful information, IPv4 and IPv6 PTR records (for example, \.ip6\.arpa$ and \.[0-9]{1,3}\.in-addr.arpa$) are filtered out of the flex data set. This implies that reverse lookups via in-addr.arpa records will return no records, even if those records are found in the DNSDB database.
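As a rough illustration of that note, the following Python sketch checks whether a name matches the two example patterns quoted above. The function name `looks_like_filtered_ptr` is hypothetical, Python's `re` module stands in for the service's own matcher, and stripping the trailing root dot before matching is an assumption; the real filter may use different or additional patterns.

```python
import re

# Example patterns quoted in the note above. The actual filter used by the
# service may differ; these are only the documented examples.
PTR_PATTERNS = [
    re.compile(r"\.ip6\.arpa$", re.IGNORECASE),
    re.compile(r"\.[0-9]{1,3}\.in-addr.arpa$", re.IGNORECASE),
]


def looks_like_filtered_ptr(name: str) -> bool:
    """Return True if name matches one of the example reverse-DNS patterns."""
    # Assumption: the patterns are applied to the name without its trailing dot.
    bare = name.rstrip(".")
    return any(pattern.search(bare) for pattern in PTR_PATTERNS)


if __name__ == "__main__":
    for candidate in ("4.3.2.1.in-addr.arpa.", "b.a.9.8.ip6.arpa.", "www.fsi.io."):
        print(candidate, looks_like_filtered_ptr(candidate))
```

A check like this can help explain why a reverse-lookup query returns nothing from the Flex API while Standard Search still has the records.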
Data representation and indexing
Trailing dot
DNS is a distributed database whereby all names are “below” the root. The root name is “.”
(a single dot). DNS domain names all implicitly end in that trailing dot, even though most
programs do not show it. The standard behavior for many DNS-smart libraries and programs
is to add the trailing dot if it’s missing. Therefore, the Standard Search DNSDB
automagically adds the trailing dot to DNS names. To illustrate this, here are some
examples of programs automagically adding the trailing dot as needed, dig and dnsdbq:
$ dig +short api.dnsdb.info CNAME
dnsdb.info.
$ dig +short api.dnsdb.info. CNAME
dnsdb.info.
$ dnsdbq -r api.dnsdb.info -t CNAME | fgrep -v ';'
api.dnsdb.info. CNAME dnsdb.info.
$ dnsdbq -r api.dnsdb.info. -t CNAME | fgrep -v ';'
api.dnsdb.info. CNAME dnsdb.info.
Similarly, RData values will usually either end in a trailing dot or a double quote.
Here we show a TXT record from dig and then from dnsdbq. From the dnsdbq output, you can see that the double quotes are really part of the data (note we split the line for readability):
$ dig +short fsi.io TXT
"v=spf1 mx a a:support.farsightsecurity.com a:exch.fsi.io a:prod-mail-relay-1.iad1.fsi.io ~all"
$ dnsdbq -r fsi.io -t TXT -j -A -10000 | jq .
{
"count": 340,
"time_first": 1596815861,
"time_last": 1600347647,
"rrname": "fsi.io.",
"rrtype": "TXT",
"bailiwick": "fsi.io.",
"rdata": [
"\"v=spf1 mx a a:support.farsightsecurity.com a:exch.fsi.io a:prod-mail-relay-1.iad1.fsi.io ~all\""
]
}
The Flex API does not automagically adjust your queries to add the trailing dot to RRnames queries, nor does the API add the trailing period or double quote to RData queries. Therefore, your queries will need to match those characters. This will be illustrated below.
Filtering and Representation of RRnames and RData
To maintain a high signal-to-noise ratio and deliver as practical a tool as possible, we curate what Flexible Search indexes. This means that we intentionally do not index some of the junk that gets injected into the Domain Name System:
For RRnames:
- All RRnames data ends with a trailing dot. Therefore, for RRnames searches, the regular expression or glob must end with an expression that matches the trailing dot.
- Names that contain characters other than alphanumeric characters, dash, underscore, and period will be discarded and not indexed.
- Names more than 81 characters long will be discarded and not indexed.
- Names are converted to text with wdns_domain_to_str(). See the WDNS Library, a low-level C library for dealing with wire-format DNS packets.
For RData:
- All well-formed RData we currently index in the DNS dataset ends in a dot or a double quote. A regular expression or glob usually should end with an expression that matches the trailing dot or a double quote.
- Values more than 256 characters long will be discarded and not indexed.
- SOA record values will be truncated to just mname, space, rname.
- Values are converted to text with wdns_rdata_to_str(), part of the WDNS Library, with that custom representation for SOA data.
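The filtering rules above can be sketched as two small predicates. This is only an approximation under stated assumptions: the function names are hypothetical, "alphanumeric" is taken to mean ASCII letters and digits, and the length limits are applied to the presentation-format string including any trailing dot or quotes. The service's own accounting may differ.

```python
import re

# Characters the rules above allow in an indexed RRname:
# alphanumerics, dash, underscore, and period (ASCII is an assumption).
_ALLOWED_RRNAME = re.compile(r"[A-Za-z0-9._-]+")

MAX_RRNAME_LEN = 81   # limit stated above for RRnames
MAX_RDATA_LEN = 256   # limit stated above for RData values


def rrname_would_be_indexed(name: str) -> bool:
    """Approximate the RRname rules: allowed characters and maximum length."""
    if len(name) > MAX_RRNAME_LEN:
        return False
    return _ALLOWED_RRNAME.fullmatch(name) is not None


def rdata_would_be_indexed(value: str) -> bool:
    """Approximate the RData rule: values over the length limit are dropped."""
    return len(value) <= MAX_RDATA_LEN


if __name__ == "__main__":
    print(rrname_would_be_indexed("_dmarc.fsi.io."))     # allowed characters
    print(rrname_would_be_indexed("bad name.example."))  # contains a space
    print(rdata_would_be_indexed("x" * 257))             # too long
```

If a name you expect to find is missing from Flex results, a check like this is a quick way to see whether the curation rules could be the reason.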
RRTypes supported
RRtypes can be entered in either the standardized alphanumeric “symbolic” format (A, AAAA, CNAME, MX, NS, SOA, TXT, etc.) or in TYPE# format (where # is the decimal value of the RRType).
If an undefined RRType is specified (i.e. not in the DNS RFCs), then an error is returned with message “Error: RRTYPE has an unsupported value”.
The RRtype ANY is modified somewhat from its usual meaning. A search for RRtype ANY will match any Flexible Search-indexed RRtype. ANY is the default if RRTYPE is not specified.
DNSSEC records are not searchable nor returned with Flexible Search. Therefore, the following DNSSEC-related RRtypes are not permitted:
- DLV
- DNSKEY
- DS
- NSEC
- NSEC3
- NSEC3PARAM
- RRSIG
- ANY-DNSSEC (our own synthetic RRtype)
If one of those DNSSEC RRTypes is specified, then an error is returned with message “Error: RRTYPE has an unsupported DNSSEC value”.
Only the following RRTypes (plus their TYPE# equivalents) are indexed for RData queries:
- CNAME
- HINFO
- MX
- NAPTR
- NS
- PTR
- RP
- SOA
- SPF
- SRV
- TXT
If another RRType is provided for an RData query, then an error is returned with message “Error: rdata searches are not allowed for the specified RRType”.
In particular, A and AAAA records do not have their RData indexed. This means that if you’re searching for IPs, IP ranges, or CIDRs you should probably be using Standard DNSDB (any IPs that do get indexed in Flexible Search just get treated as text).
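A client can pre-check an RRType against these rules before sending a query. The sketch below is a hedged illustration, not the service's validation logic: the function names are hypothetical, the TYPE# table lists only the IANA numbers for the types named in this section, ANY is assumed to be acceptable for RData searches because it is the documented default, and the "undefined RRType" error is not modeled.

```python
from typing import Optional

DNSSEC_RRTYPES = {
    "DLV", "DNSKEY", "DS", "NSEC", "NSEC3", "NSEC3PARAM", "RRSIG", "ANY-DNSSEC",
}
RDATA_INDEXED_RRTYPES = {
    "CNAME", "HINFO", "MX", "NAPTR", "NS", "PTR", "RP", "SOA", "SPF", "SRV", "TXT",
}
# IANA type numbers for the RRTypes named in this section only.
TYPE_NUMBERS = {
    2: "NS", 5: "CNAME", 6: "SOA", 12: "PTR", 13: "HINFO", 15: "MX",
    16: "TXT", 17: "RP", 33: "SRV", 35: "NAPTR", 99: "SPF",
    43: "DS", 46: "RRSIG", 47: "NSEC", 48: "DNSKEY", 50: "NSEC3",
    51: "NSEC3PARAM", 32769: "DLV",
}


def normalize_rrtype(rrtype: str) -> str:
    """Upper-case the type and map TYPE# to a mnemonic when it is known."""
    name = rrtype.upper()
    if name.startswith("TYPE") and name[4:].isdecimal():
        return TYPE_NUMBERS.get(int(name[4:]), name)
    return name


def rrtype_problem(rrtype: str, rdata_search: bool) -> Optional[str]:
    """Return the documented error message, or None if no rule is violated."""
    name = normalize_rrtype(rrtype)
    if name in DNSSEC_RRTYPES:
        return "Error: RRTYPE has an unsupported DNSSEC value"
    # Assumption: ANY is accepted for RData searches (it is the default).
    if rdata_search and name != "ANY" and name not in RDATA_INDEXED_RRTYPES:
        return "Error: rdata searches are not allowed for the specified RRType"
    return None


if __name__ == "__main__":
    print(rrtype_problem("A", rdata_search=True))
    print(rrtype_problem("TYPE46", rdata_search=False))
    print(rrtype_problem("txt", rdata_search=True))
```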
SOA records
An example full SOA record’s rdata value is "fsi.io. hostmaster.fsi.io. 2014100635 7200 3600 604800 3600". In the Flex API, SOA record values will be truncated to just mname, space, rname. The other portions of SOA records (serial number, zone timers, and minimum/error TTL) are not indexed nor available from the Flex API, but you can pivot to Standard Search for full values.
Due to the truncation of SOA records, they may no longer appear unique, and using time-fencing to match SOA records may not work as expected. For example, if you use time-fencing to select only “old” SOA records, you might find they don’t exist in the Flex API but do exist in the Standard Search API. The Flex API will have that SOA record, but across a much longer period of time.
Here’s a simplified example: dnsdbq -r fsi.io -t SOA -l 0 will return tens of thousands of rows, with an earliest time_first of 2014-10-09 20:24:59 and a latest time_last around the past hour. dnsdbflex will return just one record encompassing that entire period.
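To see why many Standard Search rows collapse into one Flex record, here is a minimal Python sketch of the truncation described above. The helper name `truncate_soa_rdata` is hypothetical and the code only mimics the documented "mname, space, rname" form; it is not the indexer's implementation.

```python
def truncate_soa_rdata(rdata: str) -> str:
    """Keep only 'mname rname' from a presentation-format SOA value."""
    fields = rdata.split()
    if len(fields) < 2:
        raise ValueError("SOA rdata needs at least mname and rname")
    return f"{fields[0]} {fields[1]}"


# Two SOA values for fsi.io that differ only in their serial numbers.
standard_rows = [
    "fsi.io. hostmaster.fsi.io. 2014100635 7200 3600 604800 3600",
    "fsi.io. hostmaster.fsi.io. 2020090902 7200 3600 604800 3600",
]

# After truncation both rows become the same string, so they are no longer
# distinguishable by serial number or by the time range of any single row.
distinct_flex_values = {truncate_soa_rdata(row) for row in standard_rows}

if __name__ == "__main__":
    print(distinct_flex_values)
```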
Here I show, with added line wrapping for readability:
- the first record from Standard Search
$ dnsdbq -r fsi.io -t SOA -l 1 -j
{"count":77,"time_first":1412886299,"time_last":1412900460,"rrname":"fsi.io.",
"rrtype":"SOA","bailiwick":"fsi.io.",
"rdata":["fsi.io. hostmaster.fsi.io. 2014100635 7200 3600 604800 3600"]}
- the last record from Standard Search
$ dnsdbq -r fsi.io -t SOA -l 0 -j | tail -1
{"count":1,"time_first":1599750021,"time_last":1599750025,"rrname":"fsi.io.",
"rrtype":"SOA","bailiwick":"fsi.io.",
"rdata":["fsi.io. hostmaster.fsi.io. 2020090902 7200 3600 604800 3600"]}
- a specific Flex search in RRnames
$ dnsdbflex --glob 'fsi.io.' -j -t SOA -s rrnames
{"rrname":"fsi.io.","rrtype":"SOA"}
- a Flex search with a wildcard in RData
$ dnsdbflex --glob '*fsi.io*' -j -t SOA -s rdata
{"rdata":"fsi.io. hostmaster.fsi.io.","rrtype":"SOA",
"raw_rdata":"0366736902696F000A686F73746D61737465720366736902696F00"}
- using the raw_rdata value from that second Flex search to pivot back into Standard Search and getting the first result
$ dnsdbq -N 0366736902696F000A686F73746D61737465720366736902696F00 -j -l 1
{"count":6,"time_first":1413228451,"time_last":1413228451,
"rrname":"local-data.fsi.io.","rrtype":"SOA",
"rdata":["fsi.io. hostmaster.fsi.io. 2014050101 7200 3600 604800 3600"]}
- the previous Flex search with batch file format output (instead of json format)
$ dnsdbflex --glob '*fsi.io*' -t SOA -s rdata -F
rdata/raw/0366736902696F000A686F73746D61737465720366736902696F00/SOA
# rdata/name/fsi.io. hostmaster.fsi.io./SOA
- pivoting using a batch file from that Flex search directly into dnsdbq
$ dnsdbflex --glob '*fsi.io*' -t SOA -s rdata -F | dnsdbq -f -j -l 1
{"count":6,"time_first":1413228451,"time_last":1413228451,
"rrname":"local-data.fsi.io.","rrtype":"SOA",
"rdata":["fsi.io. hostmaster.fsi.io. 2014050101 7200 3600 604800 3600"]}
--
- and using the raw_rdata value from that second Flex search to pivot back into Standard Search and using a summarize query to get the total number of results available under the raw_rdata value. We see there are 28828 RData rows available and 36640686 is the sum of their counts.
$ dnsdbq -V summarize -N 0366736902696F000A686F73746D61737465720366736902696F00 -j -l 0
{"count":36640686,"num_results":28828,"time_first":1412885673,"time_last":1599765048}
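The raw_rdata value used in these pivots is a hex string of wire-format data. For RData that consists only of domain names, such as the truncated SOA value above, it can be decoded with a short routine. The sketch below is illustrative: `decode_wire_names` is a hypothetical helper, it assumes uncompressed names with ASCII labels, and it does not apply to types such as TXT whose wire format is different.

```python
from typing import List


def decode_wire_names(raw_hex: str) -> List[str]:
    """Decode a hex string holding one or more uncompressed wire-format names."""
    data = bytes.fromhex(raw_hex)
    names: List[str] = []
    labels: List[str] = []
    pos = 0
    while pos < len(data):
        length = data[pos]
        pos += 1
        if length == 0:
            # A zero-length label ends the current name (the root).
            names.append(".".join(labels) + ".")
            labels = []
            continue
        if length > 63:
            raise ValueError("compressed or invalid label length")
        label = data[pos:pos + length]
        if len(label) != length:
            raise ValueError("truncated label")
        labels.append(label.decode("ascii"))
        pos += length
    if labels:
        raise ValueError("last name is not terminated")
    return names


if __name__ == "__main__":
    raw = "0366736902696F000A686F73746D61737465720366736902696F00"
    print(" ".join(decode_wire_names(raw)))
```

Joining the decoded names with a space reproduces the rdata string that the Flex API returned alongside this raw_rdata value.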
regex search
This is a regular expression search. The particulars are known as “Farsight Compatible Regular Expressions” (FCRE). For more information:
- An introduction is at What is a Regular Expression?
- The FCRE Reference Guide
Recall that dot (aka period) is a “match any one character” regular expression operator. To match exactly a dot, use \., as you must backslash escape the dot.
The regular expressions are not implicitly anchored front and back. In domain names with standard DNSDB, a search for “fsi.io” contains two DNS labels “fsi” and “io”. With the Flex API, “fsi.io” would actually match “fsiaio”, “fsi.io”, “fsi9io”, but because it’s not anchored, it also matches “www.fsi.io” and “www.fsi-io.yourbank.com”. Use ^ to anchor at the front and $ at the end.
The Flex API does not automagically insert the trailing period. If you right-anchor a regular expression that is searching for RRnames, you need to add a character that will match the trailing period. Either a backslash-escaped dot or an unescaped dot will do, though any regular expression operator that can match a dot is valid. For RData like TXT records, the trailing character is a double quote, so an unescaped dot (which matches either character) is probably safest.
To get the same behavior as Standard Search DNSDB for “fsi.io”, you need to anchor the regular expression, backslash escape the first dot, and add a trailing dot; thus use ^fsi\.io\.$.
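The anchoring and escaping behavior described above can be tried locally. This sketch uses Python's `re` module as a stand-in for FCRE; the two dialects differ in details, but the basic operators used here (dot, backslash escape, ^ and $) behave the same way. The candidate names and the `matches` helper are made up for illustration.

```python
import re

# Hypothetical RRnames, each stored with its trailing dot as in the Flex data.
names = [
    "fsi.io.",
    "fsiaio.",
    "www.fsi.io.",
    "www.fsi-io.yourbank.com.",
    "example.com.",
]


def matches(pattern: str, candidates: list) -> list:
    """Return the candidates matched by an unanchored, case-insensitive search."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [name for name in candidates if rx.search(name)]


# Unescaped dot and no anchors: matches far more than the literal name.
unanchored = matches(r"fsi.io", names)

# Anchored, escaped, and with the trailing dot: matches only fsi.io.
anchored = matches(r"^fsi\.io\.$", names)

# Anchored but without the trailing dot: matches nothing.
missing_dot = matches(r"^fsi\.io$", names)

if __name__ == "__main__":
    print(unanchored)
    print(anchored)
    print(missing_dot)
```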
glob search
This is a glob (aka wildcarding) search. For more information:
- An introduction is at What is Globbing?
- The Glob Reference Guide
Globs, unlike regular expressions, are implicitly front and back anchored. To search for any name containing “coke”, the glob *coke* works.
To get the same behavior as Standard Search DNSDB in searching for “fsi.io”, you need to add a trailing dot: the glob fsi.io. works.
Our dnsdbflex tool will forbid an RRnames glob query without something that matches the trailing dot:
$ dnsdbflex --glob fsi.io -s rrnames
Error: a glob search argument for rrnames should end either in a period or certain glob
special characters (*, ?, or ]).
You may not get results from your search.
You can force it to issue the query, but you won’t get results:
$ dnsdbflex --glob fsi.io --force
Warning: a glob search argument for rrnames should end either in a period or certain glob
special characters (*, ?, or ]).
You may not get results from your search.
Query status: NOERROR (no results found for query.)
There are some rare RRTypes whose RData values do not end in dot or double quote, so --force can be useful for those.
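The implicit anchoring of globs, and the trailing-character check that dnsdbflex reports above, can be approximated in Python. This is a sketch under assumptions: `fnmatchcase` from the standard library stands in for the service's glob matcher (the Glob Reference Guide is the authority on supported syntax), and both helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def glob_matches(pattern: str, name: str) -> bool:
    """Whole-string, case-insensitive glob match (fnmatch as a stand-in)."""
    return fnmatchcase(name.lower(), pattern.lower())


def rrnames_glob_can_match(pattern: str) -> bool:
    """Mimic the documented check: end in a period or in *, ?, or ]."""
    return pattern.endswith((".", "*", "?", "]"))


if __name__ == "__main__":
    # Implicit anchoring: 'coke' alone must match the entire name.
    print(glob_matches("coke", "www.coke.com."))     # no wildcard, no match
    print(glob_matches("*coke*", "www.coke.com."))   # wildcards on both sides

    # The trailing dot is part of the stored name.
    print(glob_matches("fsi.io", "fsi.io."))
    print(glob_matches("fsi.io.", "fsi.io."))

    # The pre-flight check that dnsdbflex applies to rrnames globs.
    print(rrnames_glob_can_match("fsi.io"))
```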
Comparison of data returned from Standard Search and Flexible Search
For Standard Search RRset searches, the following data are returned in the output:
- rrname
- rrtype
- rdata
- bailiwick
- count
- time_first, time_last, zone_time_first, zone_time_last (as applicable)
For Flexible Search RRnames searches, the equivalent of Standard Search RRset searches, the following data are returned in terse mode (which is the only currently supported mode):
- rrname
- rrtype
For Standard Search RData searches, the following data are returned:
- rrname
- rrtype
- rdata
- count
- time_first, time_last, zone_time_first, zone_time_last (as applicable)
For Flexible Search RData searches, the following data are returned in terse mode (which is the only currently supported mode):
- rdata
- rrtype
- raw_rdata
A description of each of the data fields returned by Flexible and Standard Search:
Key | Description |
---|---|
rrname | The owner name of the RRset in DNS presentation format. For Flexible Search, see the Filtering and Representation of RRnames and RData section for details. |
rrtype | The resource record type of the RRset, either using the standard DNS type mnemonic, or an RFC 3597 generic type, i.e. the string TYPE immediately followed by the decimal RRtype number. |
rdata | An array of one or more RData values. For Standard Search, the RData values are converted to the standard presentation format based on the rrtype value. If the encoder lacks a type-specific presentation format for the RRset’s rrtype, then the RFC 3597 generic RData encoding will be used. For Flexible Search, see the Filtering and Representation of RRnames and RData section for details. |
raw_rdata | The record data value as pairs of hex digits specifying a raw octet string. This value is used for pivoting from Flexible Search into Standard Search to get more details on RData. |
A description of each of the data fields only returned in Standard Search; you might need to pivot from Flexible Search to Standard Search to get these:
Key | Description |
---|---|
bailiwick | Closest enclosing zone delegated to a nameserver which served the RRset. |
count | The number of times the RRset was observed via passive DNS replication. |
time_first, time_last | UNIX epoch timestamps with second granularity indicating the first and last times the RRset was observed via passive DNS replication. Will not be present if the RRset was only observed via zone file import. |
zone_time_first, zone_time_last | UNIX epoch timestamps with second granularity indicating the first and last times the RRset was observed via zone file import. Will not be present if the RRset was only observed via passive DNS replication. |
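Because raw_rdata exists for pivoting, a client can turn a Flex RData result into a line in the batch format that `dnsdbflex -F` prints in the SOA examples earlier (rdata/raw/HEX/RRTYPE). The sketch below is illustrative only: `rdata_pivot_line` is a hypothetical helper and it handles just the RData form shown in this document.

```python
import json


def rdata_pivot_line(flex_json_line: str) -> str:
    """Build an 'rdata/raw/HEX/RRTYPE' batch line from one Flex RData record."""
    record = json.loads(flex_json_line)
    return f"rdata/raw/{record['raw_rdata']}/{record['rrtype']}"


if __name__ == "__main__":
    # One line of Flex RData output, copied from the examples in this guide.
    sample = (
        '{"rdata":"fsi.io. hostmaster.fsi.io.","rrtype":"SOA",'
        '"raw_rdata":"0366736902696F000A686F73746D61737465720366736902696F00"}'
    )
    print(rdata_pivot_line(sample))
```

Lines built this way could be written to a file and fed to a Standard Search batch query to retrieve the count, bailiwick, and timestamp fields that the Flex API does not return.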
Comparison of dnsdbq and dnsdbflex JSON output
Here is what the JSON data output from dnsdbq and dnsdbflex looks like for RRset (in dnsdbq), RRnames (in dnsdbflex), and RData queries. Line breaks were added for readability, but the actual results were on one line.
RRset (RRnames) comparison
$ dnsdbq -r 'fsi.io' -j -l1
{"count":10392,"time_first":1381265499,"time_last":1428418529,
"rrname":"fsi.io.","rrtype":"A","bailiwick":"fsi.io.","rdata":["66.160.140.76"]}
(See above for why we need to add the trailing period on fsi.io.)
$ dnsdbflex --glob 'fsi.io.' -j -l1 -s rrnames
{"rrname":"fsi.io.","rrtype":"A"}
RData comparison
$ dnsdbq -n 'fsi.io' -j -l1
{"count":6,"time_first":1413228451,"time_last":1413228451,
"rrname":"local-data.fsi.io.","rrtype":"SOA",
"rdata":["fsi.io. hostmaster.fsi.io. 2014050101 7200 3600 604800 3600"]}
(See above for why we need to make the query wildcarded, as fsi.io. does not appear by itself in any RData values.)
$ dnsdbflex --g 'fsi.io*' -j -l1 -s rdata
{"rdata":"fsi.io. hostmaster.fsi.io.","rrtype":"SOA",
"raw_rdata":"0366736902696F000A686F73746D61737465720366736902696F00"}
Name encoding
All IDN DNS names must be encoded in Punycode. In particular, using non-basic ASCII characters may result in a 500 Internal Server Error. You cannot search for an IDN name as a wildcard in its native character set; you can only search in the Punycode representation.
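One way to obtain the Punycode form before building a query is Python's built-in `idna` codec, sketched below. Note the assumptions: `to_punycode` is a hypothetical helper, and the standard-library codec implements the older IDNA 2003 rules, so some names may encode differently (or be rejected) compared with an IDNA 2008 implementation.

```python
def to_punycode(name: str) -> str:
    """Encode each label of an IDN to its ASCII (xn--) form."""
    # The stdlib 'idna' codec follows IDNA 2003; plain ASCII passes through.
    return name.encode("idna").decode("ascii")


if __name__ == "__main__":
    # A made-up example name containing U+00FC (u with diaeresis).
    idn = "b\u00fccher.example."
    print(to_punycode(idn))
    print(to_punycode("fsi.io."))
```

The encoded string, not the native-script name, is what should appear in a glob or regular expression sent to Flexible Search.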
Quoting
If you’re using our dnsdbflex command line tools and working at the shell prompt to access DNSDB 2.0 Flexible Search, single quote marks around your query string will protect it from unwanted shell interaction. Otherwise, you’ll need to backslash escape any special characters including the backslash.
When working with dnsdbflex, DNSDB Scout, raw URLs, etc., in regular expressions you will often have to backslash escape the dot that separates DNS name labels.
Some examples of both of those:
$ dnsdbflex --r fsi.io
searches for: any string, letters ‘f’, ‘s’, ‘i’, any character, ‘i’, and ‘o’, and any string.
$ dnsdbflex --r fsi\.io
identical to the previous search, because the shell removes the unquoted backslash.
$ dnsdbflex --r fsi\\.io
searches for: any string, letters ‘f’, ‘s’, ‘i’, ‘.’, ‘i’, and ‘o’, and any string.
$ dnsdbflex --r 'fsi\.io'
searches for: any string, letters ‘f’, ‘s’, ‘i’, ‘.’, ‘i’, and ‘o’, and any string.
$ dnsdbflex --r '^fsi\.io$'
searches for: letters ‘f’, ‘s’, ‘i’, ‘.’, ‘i’, and ‘o’ (anchored left and right). This will fail because the trailing dot is not matched.
$ dnsdbflex --r '^fsi\.io.$'
searches for: letters ‘f’, ‘s’, ‘i’, ‘.’, ‘i’, ‘o’, and any one character, which matches the trailing ‘.’ (anchored left and right).
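The effect of shell quoting on these command lines can be checked without running dnsdbflex. The sketch below uses Python's `shlex` module, which models POSIX shell word splitting closely enough for these cases (it is not a full shell). The helper name `shell_sees` is hypothetical.

```python
import shlex


def shell_sees(command_line: str) -> str:
    """Return the last argument a POSIX-style shell would pass to the program."""
    return shlex.split(command_line)[-1]


if __name__ == "__main__":
    # An unquoted backslash is consumed by the shell, so the regex loses it.
    print(shell_sees(r"dnsdbflex --r fsi\.io"))
    # A doubled backslash survives as a single backslash.
    print(shell_sees(r"dnsdbflex --r fsi\\.io"))
    # Single quotes pass the pattern through untouched.
    print(shell_sees(r"dnsdbflex --r 'fsi\.io'"))
    # shlex.quote produces a safely quoted argument for a given pattern.
    print(shlex.quote(r"^fsi\.io\.$"))
```

When a query is assembled by a script rather than typed by hand, quoting the pattern programmatically avoids the backslash-counting shown in the examples above.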
Error messages
Flex Search will report on various errors in the regex or glob search parameters. These errors are detected after the query has started processing, i.e. after an HTTP “200 OK” was returned. DNSDB Scout and dnsdbflex will report these errors to you. If necessary, the error message will include the character index (starting at 0) where the error was noticed (like a compiler will do).
While this is not the API document, here is one detailed example for
reference. The regular expression has a tail anchor followed by a
head anchor.
$ curl -k -H "Accept: application/x-ndjson" -H "X-API-Key: $DNSDB_API_KEY" \
'https://api-dev2.dnsdb.info/dnsdb/v2/regex/rrnames/$^?limit=2'
...
< HTTP/1.1 200 OK
...
{"cond":"begin"}
{"cond":"failed","msg":"Regex syntax error: Invalid characters after '$' at 1"}
The following are the general error messages, which will be followed by more specific
details, as just illustrated.
Message |
---|
Bad exclude expression: |
Gateway timeout: |
Glob syntax error: |
Glob syntax unsupported: |
Regex syntax error: |
Regex syntax unsupported: |
Search string unsupported: |
Examples of some of those errors as reported by dnsdbflex, whereby it prints Query failed: and then specific details:
$ dnsdbflex --regex '[[:foo:]].com' -d
Query failed: Regex syntax error: Invalid character class name at 0
$ dnsdbflex --regex 'www|{0,100}'
Query failed: Regex syntax error: Invalid '{' at 4
$ dnsdbflex --regex 'missing['
Query failed: Regex syntax error: Unmatched [ at 7
$ dnsdbflex --glob '[missing.'
Query failed: Glob syntax error: Unmatched [ at 0
$ dnsdbflex --regex '*.com'
Query failed: Regex syntax error: Invalid '*' at 0
# In the previous line, the user probably meant to search for anything ending in .com,
# which as a regular expression, should be:
$ dnsdbflex --regex '.*\.com\.$' -l 2
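Because these failures arrive inside an HTTP 200 response, a client that talks to the API directly has to inspect the JSONL stream for a record whose cond is "failed". The following Python sketch shows one way to do that, using the response body from the curl example above. The helper names are hypothetical, and only the "begin" and "failed" values and the general message prefixes listed in this section are taken from the source.

```python
import json
from typing import Optional

# General error message prefixes listed in the table above.
GENERAL_ERRORS = [
    "Bad exclude expression:",
    "Gateway timeout:",
    "Glob syntax error:",
    "Glob syntax unsupported:",
    "Regex syntax error:",
    "Regex syntax unsupported:",
    "Search string unsupported:",
]


def find_failure(jsonl_text: str) -> Optional[str]:
    """Return the msg of the first 'failed' record in a JSONL body, if any."""
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("cond") == "failed":
            return record.get("msg", "")
    return None


def error_category(msg: str) -> Optional[str]:
    """Map a failure message to its general category, without the colon."""
    for prefix in GENERAL_ERRORS:
        if msg.startswith(prefix):
            return prefix.rstrip(":")
    return None


# Response body copied from the curl example in this section.
SAMPLE_BODY = (
    '{"cond":"begin"}\n'
    '{"cond":"failed","msg":"Regex syntax error: Invalid characters after \'$\' at 1"}\n'
)

if __name__ == "__main__":
    message = find_failure(SAMPLE_BODY)
    print(message)
    print(error_category(message or ""))
```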
Unlikely error messages
The following error messages will not be returned in practice, as either the error is detected at a higher-level and a “400 Bad Request” status is returned, or it indicates an internal bug. Please contact Technical Support if you see them.
Unlikely Message |
---|
Bad return what parameter: |
Bad rrtype: |
Bad search method: |
Bad search what parameter: |
Illegal limit: |
Internal error |
Example client
- dnsdbflex, a full-featured example API client written in C. You can also read the manual page.
References
- What’s A Regular Expression?
- What is Globbing?
- Printable cheat-sheet for regex and glob
- For more information about IDNs, Homographs, and Punycode, see Farsight Security Global Internationalized Domain Name Homograph Report
- For more information about DNSDB Time Fencing please see Joe St Sauver’s Farsight’s DNSDB Time Fencing: A Post-Attack “Time Machine”
- Latest draft of IETF Passive DNS – Common Output Format (COF)