Creating a DNSDB-Flexible-Search-To-DNSDB-Standard-Search Pipeline With 0mq

I. Introduction

In part one of this series, “Using 0mq to Plumb a Simple Intermediate Layer For a DNSDB Client/Server Application”, we showed how we could use the 0mq messaging library to create an intermediate “shim” layer between a simple client application and DNSDB API. PLEASE READ THAT ARTICLE BEFORE READING THIS ONE!

In this part, we’re going to build on that basic Python3 application by:

Building a DNSDB Flexible Search enrichment pipeline
Creating a global domain “kill file” that we can use to do filtering of our DNSDB output prior to delivery.
Displaying our output as something easier-to-read than just a blob of raw JSON Lines with times expressed in Un*x ticks.

II. Reviewing DNSDB Flexible Search Enrichment

Farsight’s new DNSDB Flexible Search capability allows users to search DNSDB for keywords or for regular expressions. Matching hits for RRname searches get returned as terse results, just domain names with their associated resource record type.

To get FULL DETAILS for each Flexible Search RRname hit (including first-seen, last-seen, counts, and associated “right hand side” Rdata), users need to “chase” those hits via followup queries that get made to DNSDB Standard Search. That process can be done:

Routinely (e.g., by piping the output from one command to another, as is the case for dnsdbflex going into dnsdbq 2.3.0), or
Selectively (as when clicking on hits found in DNSDB Scout Website).

Let’s demonstrate that process for both cases. Assume we want to find up to 100 domain names that have been seen in the last 30 days and that contain the string whitman. (Naturally, there are likely far more than 100 domain names of that sort, but we don’t want to potentially exhaust your entire query quota with just this example!)

Beginning with dnsdbflex:

$ dnsdbflex --regex 'whitman' -A 30d -l 100
{"rrname":"penrose.whitman.edu.cdn.gap.ae.","rrtype":"A"}
{"rrname":"whitman.med.gap.ae.","rrtype":"A"}
{"rrname":"whitmansgc.mib.gap.ae.","rrtype":"A"}
[...]
{"rrname":"whitmancollege.nitflix.ca.","rrtype":"CNAME"}
{"rrname":"whitman.ca.","rrtype":"NS"}

Assume some of these results pique our interest. If so, we can rerun our query, piping the results from dnsdbflex into dnsdbq in order to get full details about the hits.

$ dnsdbflex --regex 'whitman' -A 30d -l 100 -F | dnsdbq -f -m -A 30d -l 100
;; record times: 2020-10-04 06:11:38 .. 2020-10-04 06:11:38 (1s)
;; count: 2; bailiwick: gap.ae.
penrose.whitman.edu.cdn.gap.ae.  A  162.13.201.232

;; record times: 2020-10-04 05:28:53 .. 2020-10-04 05:28:53 (1s)
;; count: 2; bailiwick: gap.ae.
myweb.whitman.gap.ae.  A  162.13.201.232

;; record times: 2020-09-30 01:09:27 .. 2020-09-30 01:09:27 (1s)
;; count: 2; bailiwick: askdyson.at.
www.whitman.askdyson.at.  CNAME  askdyson.at.
[etc]

Using the “dash capital F” option to dnsdbflex ensures that results are written in a format that dnsdbq can easily digest if dnsdbq is run with its “dash little f, dash little em” options.

Some notes:

If you see an error when trying to use those dnsdbq options, you’re likely NOT running dnsdbq v2.3.0 or later. Upgrade!
When running dnsdbq 2.3.0 or later, be sure you ask for DNSDB API V2 capabilities by including:
```
DNSDBQ_SYSTEM="dnsdb2"
```
in your ~/.dnsdb-query.conf file.
Also note that dnsdbq queries, when run with the -m option, will be run in up to 10 parallel streams and hence may be “interleaved” or appear to produce “out of order” output.

Trying roughly the same query, but this time using DNSDB Scout Website, we see:

Figure 1. DNSDB Scout Flexible Search Query For Whitman

The eagle-eyed among you may notice that while we’d asked for a limit of 100 results, Scout actually reports 114 returned — that’s not Farsight being mathematically sloppy, it’s because while DNSDB returned at most 100 unique FQDNs, some of those FQDNs may have more than one RRtype, with the net result being that at times you may get “bonus results” beyond what you actually expected.
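The arithmetic behind those “bonus results” is easy to reproduce. The following is a minimal sketch using a few made-up Flexible Search rows (the names are hypothetical, not real DNSDB output): the limit applies to unique FQDNs, while the display shows one row per FQDN/RRtype pair.

```python
import json

# Hypothetical Flexible Search rows: one JSON object per line.
sample_rows = """\
{"rrname":"whitman.example.","rrtype":"A"}
{"rrname":"whitman.example.","rrtype":"NS"}
{"rrname":"www.whitman.example.","rrtype":"CNAME"}
"""

def count_rows_and_names(jsonl_text):
    """Return (number of rows, number of unique RRnames)."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    unique_names = {row["rrname"] for row in rows}
    return len(rows), len(unique_names)

rows, names = count_rows_and_names(sample_rows)
print(f"{rows} rows returned for {names} unique FQDNs")
```

Here two unique names yield three rows, which is the same effect that turns a limit of 100 into 114 displayed results.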

Clicking on the domain name highlighted in the above results triggers a followup Standard Search for that name, allowing us to “drill down” and view full details for that hit.

Figure 2. DNSDB Scout Flexible Search Results

If need be, we can tweak the dnsdbq time fencing to match the limits we’d specified for dnsdbflex:

Figure 3. DNSDB Standard Search for a Specific Hit Of Interest, Tweaked To Roughly The Same Time Fence

Figure 4. DNSDB Standard Search Results Showing Full Details For A Selected RRname

We could repeat that process for other hits that may catch our eye (visiting the “Recent Queries” tab may facilitate that).

III. Implementing A Flexible Search Enrichment Pipeline In Our Intermediate Layer

So now you’ve seen the normal process of going from a Flexible Search query to full details, either for all Flexible Search hits, or for just select hits of particular interest. Let’s try implementing a similar Flexible Search enrichment pipeline for our sample Intermediate Layer architecture. Conceptually, the process we have in mind might be described via the following four step sequence:

Figure 5. Enrichment Process: Intermediate Layer Acting As A Broker Between DNSDB Flexible Search and DNSDB API Standard Search

While there are actually four steps to this process, to the user the process will appear far simpler: they’ll make a Flexible Search, and then a bit later they’ll get back full (DNSDB-Enhanced) results. The rest of the “magic” happens courtesy of the “hidden hand” of our Intermediate Layer application’s code working on their behalf.

So what work must the intermediate layer do in this case? Well, it must accept and dispatch the user’s flexible search query and then accept the results from that query. However, instead of simply displaying those results, the intermediate layer will format them for follow-on DNSDB Standard Search queries, make that set of queries, and then consolidate the results for delivery to the client application.

Because this is just a proof of concept (and we want to keep the length of the code appropriate for this blog) we’ll initially impose some static constraints:

DNSDB Flexible Search Phase:

We’re only going to make RRname (“Left Hand Side”) Flexible Searches
We’ll time fence those results to names that were active during the last month
By default we’ll limit the number of Flexible Search results to no more than 100 FQDNs

DNSDB Standard Search Phase:

We’re only going to make RRname (“Left Hand Side”) Standard Searches
We’ll time fence those results to names that were active during the last month
By default we’ll limit the number of results returned for each FQDN queried in DNSDB Standard Search to 100 or less

We can make those constraints tweakable as a subsequent refinement to the initial quick-and-dirty proof-of-concept.
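As one possible shape for that refinement, the limits could be gathered into a small settings object that reads optional overrides from the environment. This is only a sketch: the class name `PipelineLimits`, the function `limits_from_env`, and the `IL_*` environment variable names are invented for illustration and are not part of the original code.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class PipelineLimits:
    """Hypothetical container for the proof-of-concept constraints."""
    flex_results: int = 100
    flex_days: int = 30
    standard_results: int = 100
    standard_days: int = 30

    @staticmethod
    def relative_seconds(days):
        """Convert a time fence in days to a negative offset in seconds."""
        return days * 86400 * -1

def limits_from_env(environ=os.environ):
    """Build limits, letting hypothetical IL_* variables override the defaults."""
    defaults = PipelineLimits()
    return PipelineLimits(
        flex_results=int(environ.get("IL_FLEX_RESULTS", defaults.flex_results)),
        flex_days=int(environ.get("IL_FLEX_DAYS", defaults.flex_days)),
        standard_results=int(environ.get("IL_STD_RESULTS", defaults.standard_results)),
        standard_days=int(environ.get("IL_STD_DAYS", defaults.standard_days)),
    )

# Override only the Flexible Search time fence; everything else stays at 100/30.
limits = limits_from_env({"IL_FLEX_DAYS": "7"})
print(limits.flex_results, limits.flex_days,
      PipelineLimits.relative_seconds(limits.flex_days))
```

Passing a plain dictionary instead of the real environment keeps the helper easy to exercise without touching your shell settings.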

IV. Implementing The Intermediate Layer Enricher: The Intermediate Layer Server Code (“il-server.py”)

We’re going to now talk about the Python3 code for the Intermediate Layer Server (this code is also available without narrative text as Appendix I).

a) We begin (like most Python3 programs) with a shebang line and the libraries we want to import:

#!/usr/local/bin/python3
from pathlib import Path
from os import path
from io import BytesIO
from time import strftime, gmtime
from signal import signal, SIGINT
import sys
import json
import zmq
import pycurl

b) Next, we set up a handler to cleanly exit if the user hits a ctrl-C. (We can run without this routine, but if we do that, we’ll get a nasty-looking stack dump whenever we interrupt the execution of the server. That looks scary to some people, so we’ll do this instead):

def handler(signal_received, frame):
    """Handle the user hitting ctrl-C"""
    print('CTRL-C detected: exiting')
    sys.exit(0)

c) Next, we’ve got a routine that picks up the contents of a kill file, converting those exclusion rules into a single regex exclusion expression of the sort that DNSDB Flexible Search requires:

def build_filter():
    """Read the kill file (if one exists) and construct a regex exclusion"""
    filterfilepath = str(Path.home()) + "/killfile.txt"
    if path.isfile(filterfilepath):
        with open(filterfilepath) as stream:
            mykillfilepattern = stream.read().rstrip()
        mykillfilepattern = "(" + mykillfilepattern + ")"
        mykillfilepattern = mykillfilepattern.replace("\n", "|")
        mykillfilepattern = mykillfilepattern.replace(" ", "")
        return mykillfilepattern
    else:
        return ""

In a nutshell, you’ll be able to create a file called killfile.txt in your home directory. In it, you can add one or more lines representing patterns you want to exclude. For example, your killfile.txt might contain:

\.(1688|163|58|5858|a2z|aaa|ask|com|co|de|ft|g2a|gap|gs|hbo|hwj|icq|jd|kw|mi)\.com\.$
\.(my|olx|qq|slb|ui|uk|us|vk|vip|web|yhd)\.com\.$
\.sandbox\.aol\.com\.$
\.teredo\.health\.ge\.com\.$

What you put into your killfile.txt is up to you. Anything that matches any of those patterns will be silently excluded from what you receive as results from the Intermediate Layer application’s Flexible Search. Note: the max size of the resulting kill file cannot exceed 4096 octets!
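To see what the server will actually send, it can help to build the combined pattern by hand and try it against a few names locally. The sketch below mirrors the join-with-`|` logic of build_filter() on an in-memory list, applies the 4096-octet limit to the combined pattern (our reading of the note above), and uses Python’s `re` module as a stand-in matcher. That last part is an assumption: Python’s regex dialect is not guaranteed to behave identically to the one DNSDB Flexible Search uses, so treat local matches as a sanity check only. The helper names are hypothetical, and unlike build_filter() this version skips blank lines.

```python
import re

MAX_PATTERN_OCTETS = 4096  # limit stated in the text above

def combine_kill_lines(lines):
    """Join kill file lines into one alternation, skipping blank lines."""
    cleaned = [line.strip().replace(" ", "") for line in lines if line.strip()]
    if not cleaned:
        return ""
    return "(" + "|".join(cleaned) + ")"

def check_pattern(pattern):
    """Return the pattern size in octets, or raise ValueError if too long."""
    size = len(pattern.encode("utf-8"))
    if size > MAX_PATTERN_OCTETS:
        raise ValueError(f"kill pattern is {size} octets, limit is {MAX_PATTERN_OCTETS}")
    return size

kill_lines = [
    r"\.sandbox\.aol\.com\.$",
    r"\.teredo\.health\.ge\.com\.$",
]
pattern = combine_kill_lines(kill_lines)
check_pattern(pattern)

for name in ["foo.sandbox.aol.com.", "whitman.edu."]:
    verdict = "excluded" if re.search(pattern, name) else "kept"
    print(name, verdict)
```

Skipping blank lines matters: an empty alternative such as `(a||b)` matches every name, which would silently exclude all results.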

d) Then we get to the “meat” of the intermediate layer, the code that reaches out and executes the queries, either Flexible Search or Standard Search, depending on what’s needed.

def make_query(mytype, fqdn):
    """Make the actual DNSDB Flexible and Standard Searches"""

    # constants (tweak to taste)
    flex_results_to_return = 100
    flex_timefence_in_days = 30
    flex_timefence_in_seconds = flex_timefence_in_days * 86400 * -1

    standard_search_results_to_return = 100
    standard_timefence_in_days = 30
    standard_timefence_in_seconds = standard_timefence_in_days * 86400 * -1

    # get the DNSDB API key
    filepath = str(Path.home()) + "/.dnsdb-apikey.txt"
    with open(filepath) as stream:
        myapikey = stream.read().rstrip()

    # get the exclusion pattern (we reload each time to ensure
    # any changes get immediately applied)
    kill_pattern = build_filter()

    if mytype == 1:
        # flexible search
        url = "https://api.dnsdb.info/dnsdb/v2/regex/rrnames/" + \
            str(fqdn) + "?limit=" + str(flex_results_to_return) + \
            "&time_last_after=" + str(flex_timefence_in_seconds)
        if kill_pattern:  # only add the parameter when a kill file exists
            url = url + "&exclude=" + kill_pattern
    elif mytype == 2:
        # standard search
        url = "https://api.dnsdb.info/dnsdb/v2/lookup/rrset/name/" + \
            str(fqdn) + "?limit=" + str(standard_search_results_to_return) + \
            "&time_last_after=" + str(standard_timefence_in_seconds)

    requestHeader = []
    requestHeader.append('X-API-Key: ' + myapikey)
    requestHeader.append('Accept: application/jsonl')

    buffer = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.HTTPHEADER, requestHeader)
    c.setopt(pycurl.WRITEDATA, buffer)
    c.perform()
    rc = c.getinfo(c.RESPONSE_CODE)
    body = buffer.getvalue()
    content = body.decode('iso-8859-1')

    if rc == 200:
        return content
    else:
        return str(rc)  # a string, so main() can call isdigit() on it
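One thing the function above does not do is percent-encode what it places into the URL. Regular expressions and kill file patterns routinely contain characters such as `|` and `\`, and a stray `&` or `#` would change how the URL is parsed. A hedged sketch of one way to handle that with the standard library follows; `build_flex_url` is a hypothetical helper, and exactly which characters the API requires to be escaped is an assumption we have not verified.

```python
from urllib.parse import quote, urlencode

FLEX_BASE = "https://api.dnsdb.info/dnsdb/v2/regex/rrnames/"

def build_flex_url(pattern, limit=100, days=30, exclude=""):
    """Hypothetical helper: assemble a Flexible Search URL with escaping."""
    params = {
        "limit": limit,
        # negative value = relative time fence, as in make_query()
        "time_last_after": days * 86400 * -1,
    }
    if exclude:
        params["exclude"] = exclude
    # safe="" also escapes "/" so a pattern cannot add path segments
    return FLEX_BASE + quote(pattern, safe="") + "?" + urlencode(params)

print(build_flex_url("whitman", exclude=r"\.gap\.ae\.$"))
```

The query-string values go through urlencode(), so the backslashes and the trailing `$` of the exclusion pattern arrive as `%5C` and `%24` rather than as raw characters.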

e) The results we get from DNSDB Flexible Search are in JSON Lines format. We use json.loads to parse those records, and then extract the bits we need by referring to the appropriate element in that hierarchical object. We could retain the specific RRtype we discovered with Flexible Search, but we elect to omit the RRtype, thereby collecting ALL (non-DNSSEC) RRtypes associated with that hit using just a single query.

def print_bits(myrecord):
    """extract and format RRname and RRtype value from Flexible Search query"""
    myrecord_json_format = json.loads(myrecord)
    extract_bit = myrecord_json_format['obj']['rrname']

    # if we want to restrict by RRtype
    # extract_bit_2 = myrecord_json_format['obj']['rrtype']
    # results = extract_bit + "/" + extract_bit_2

    # we normally don't want to restrict by RRtype (one query can get all RRtypes)
    results = extract_bit
    return results

f) As we run our DNSDB Standard Searches, we want to reformat that output so it is easier to read rather than just spitting out raw JSON Lines output. The process for doing this is similar to what we did for our Flexible Search results in e), but we extract a greater number of fields and formally format at least some of them to make it easier to align columns in most of our output.

Some may wonder why we have to deal with both time_last and zone_time_last in this section. The answer is that records that come from ICANN’s Zone File Access Program (or its replacement, CZDS), will have a zone_time_first and zone_time_last rather than a time_first and time_last value. One or the other will be present; we just need to deal with the different naming. The other item that may be cryptic if you’re not routinely formatting output in Python3 is the format string’s format, for example:

    extract_bit = str('{0:<30}'.format(extract_bit))

That nomenclature is just saying, “Left justify this string in a 30-character-wide field.” (If the name that’s found is larger than that, the field will automatically be expanded for that entry.)
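A quick, self-contained illustration of those format specifications, using names borrowed from the sample output in this article and an arbitrary count:

```python
# Left-justify in a 30-character field (pads on the right with spaces)
name = '{0:<30}'.format("bigwalleye.ca.")

# Names longer than 30 characters are not truncated; the field simply grows
long_name = '{0:<30}'.format("www.walleyewallets.hookedmagazine.ca.")

# Right-justify an integer in 12 characters, with thousands separators
count = '{0:>12,d}'.format(1234567)

print(repr(name), len(name))
print(repr(long_name), len(long_name))
print(repr(count), len(count))
```

Printing with repr() makes the padding visible, which is handy when columns refuse to line up.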

def print_detailed_bits(myrecord):
    """format full Standard Search record before returning to client for display"""
    myformat = '%Y-%m-%d %H:%M:%S'
    myrecord_json_format = json.loads(myrecord)
    extract_bit = myrecord_json_format['obj']['rrname']
    extract_bit = str('{0:<30}'.format(extract_bit))

    extract_bit_2 = myrecord_json_format['obj']['rrtype']
    extract_bit_2 = str('{0:<8}'.format(extract_bit_2))

    temp_bit_3 = myrecord_json_format['obj']['rdata']
    extract_bit_3 = json.dumps(temp_bit_3)

    try:
        extract_tl = myrecord_json_format['obj']['time_last']
    except KeyError:
        extract_tl = myrecord_json_format['obj']['zone_time_last']

    tl_datetime = gmtime(extract_tl)
    enddatetime = strftime(myformat, tl_datetime)

    try:
        extract_tf = myrecord_json_format['obj']['time_first']
    except KeyError:
        extract_tf = myrecord_json_format['obj']['zone_time_first']

    tf_datetime = gmtime(extract_tf)
    startdatetime = strftime(myformat, tf_datetime)

    extract_count = myrecord_json_format['obj']['count']
    formatted_count = str('{0:>12,d}'.format(extract_count))
    results = extract_bit + " " + extract_bit_2 + " \"" + enddatetime + \
        "\" \"" + startdatetime + "\" " + formatted_count + \
        " " + extract_bit_3
    return results
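The two try/except blocks above can also be expressed as a small lookup helper. The sketch below is one hedged alternative, not the article’s code: `pick_times` and `to_utc` are hypothetical names, and the sample record is fabricated for illustration.

```python
from time import gmtime, strftime

def pick_times(obj):
    """Return (first, last) epoch seconds from whichever field pair is present."""
    for first_key, last_key in (("time_first", "time_last"),
                                ("zone_time_first", "zone_time_last")):
        if first_key in obj and last_key in obj:
            return obj[first_key], obj[last_key]
    raise KeyError("record has neither time_* nor zone_time_* fields")

def to_utc(epoch_seconds):
    """Format epoch seconds as a UTC timestamp, as print_detailed_bits() does."""
    return strftime('%Y-%m-%d %H:%M:%S', gmtime(epoch_seconds))

# Made-up record in the zone file style (zone_time_* rather than time_*)
record = {"rrname": "example.com.", "zone_time_first": 0, "zone_time_last": 86400}
first, last = pick_times(record)
print(to_utc(first), "..", to_utc(last))
```

Raising an explicit KeyError when neither pair is present keeps a malformed record from being formatted with a misleading timestamp.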

g) DNSDB 2.0 added SAF records.

SAF records allow for improved monitoring of output completion, allowing detection of truncated results.

For a simplified example of this sort, we’re simply going to strip them from our results list.

If you’re not used to working with lists in Python3, note that the [0]’th element of a list is the first element of that list, while the [-1]’th element of a list is the last element of the list. “Pop”ing an element from the list effectively deletes it.

def remove_saf_entries(mylist):
    """Strip the streaming format bookend records"""
    if mylist[0] == '{"cond":"begin"}':
        mylist.pop(0)
    if ((mylist[-1] == '{"cond":"succeeded"}') or \
        (mylist[-1] == '{"cond":"limited","msg":"Result limit reached"}')):
        mylist.pop()
    return mylist
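Comparing whole lines against fixed strings works for the cases shown, but it keeps any framing record whose text differs even slightly. A somewhat more tolerant sketch is to parse each line and separate records by key: lines carrying an `obj` are data, lines carrying a `cond` are framing. The function name `split_saf` and the rule “anything other than succeeded may be incomplete” are our assumptions for illustration, and the sample data is made up.

```python
import json

def split_saf(lines):
    """Separate data objects from framing records.

    Returns (objects, final_cond), where final_cond is the last "cond"
    value seen, or None if no framing record was present.
    """
    objects = []
    final_cond = None
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        if "obj" in record:
            objects.append(record["obj"])
        if "cond" in record:
            final_cond = record["cond"]
    return objects, final_cond

sample = [
    '{"cond":"begin"}',
    '{"obj":{"rrname":"whitman.example.","rrtype":"A"}}',
    '{"cond":"limited","msg":"Result limit reached"}',
]
objects, final_cond = split_saf(sample)
print(len(objects), "object(s); final condition:", final_cond)
if final_cond != "succeeded":
    print("results may be incomplete")
```

Keeping the final condition around, rather than discarding it, is what would let the Intermediate Layer tell the client when a result set was cut short.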

h) Now we get to the main routine. This routine does a lot including:

Binding the 0mq socket the clients will connect to
Running the Flexible Search query the user requested
Cleaning up the results of that search (stripping the RRtype, stripping the SAF records, deduplicating the now-raw RRnames, and sorting the unique RRnames)
The main() routine then iterates over that list of unique results, requesting full details
Those results then have the SAF records stripped, and get formatted for display
The accumulated results then get returned via 0mq to the client

def main():
    context = zmq.Context()
    socket = context.socket(zmq.REP)
    socket.bind("tcp://127.0.0.1:5556")

    while True:
        fqdn = socket.recv()
        fqdn2 = fqdn.decode("utf-8")

        # make first query (Flexible Search)
        print("Finding Flexible Search Hits...")
        flex_search_results = make_query(1, fqdn2)

        # flexible search may not have been successful; if there was a problem,
        # we'll get a numeric status code, otherwise we'll get JSON Lines output
        # mybigstring2 is the consolidated reply; obs2 counts the results
        mybigstring2 = ""
        obs2 = 0
        if str(flex_search_results).isdigit():
            # a REP socket allows one reply per request, so store the error
            # text and let the single send at the bottom deliver it
            mybigstring2 = "Error making flexible search query! Return code = " \
                + str(flex_search_results)
        else:

            # turn the results from the Flexible Search into a list
            sList = list(line for line in flex_search_results.strip().split("\n"))
            # remove the SAF bookend entries
            remove_saf_entries(sList)

            # uniquify the items
            formattedList = []
            for items in sList:
                formattedLine = print_bits(items)
                formattedList.append(formattedLine) if formattedLine \
                    not in formattedList else formattedLine
            sList = sorted(formattedList)
            print (str(len(sList)) + " entries found")

            # iterate over the remaining names we've found (we'll be enhancing
            # them via additional Standard Search queries)
            for items in sList:
                print ("Retrieving details for " + items)
                content = make_query(2, items)

                # we now have Standard Search output... let's make it into a list
                sList2 = list(line2 for line2 in content.strip().split("\n"))
                remove_saf_entries(sList2)

                for finalitems in sList2:
                    # format the results for eventual display
                    finalresults = print_detailed_bits(finalitems)

                    obs2=obs2+1
                    if obs2 == 1:
                        mybigstring2=finalresults
                    else:
                        mybigstring2=mybigstring2+"\n"+finalresults

                mybigstring2=mybigstring2+"\n"

        print ("Returning " + str(obs2) + \
               " total records to the client\nRun completed\n")
        socket.send_string(mybigstring2)
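The counter-and-concatenate bookkeeping in main() can also be written with lists and str.join(). The sketch below is an alternative formulation, not the article’s code; `assemble_reply` is a hypothetical helper that takes one list of formatted lines per Standard Search query and, for queries that return at least one record, produces the same layout: records on consecutive lines, with a blank line after each query’s output.

```python
def assemble_reply(groups):
    """Join per-query groups of lines, leaving a blank line after each group."""
    blocks = ["\n".join(group) + "\n" for group in groups if group]
    return "\n".join(blocks)

# Made-up, already-formatted lines standing in for print_detailed_bits() output
groups = [
    ["a.example. A", "a.example. NS"],
    ["b.example. CNAME"],
]
reply = assemble_reply(groups)
total = sum(len(group) for group in groups)
print(reply)
print("Returning " + str(total) + " total records to the client")
```

Building the reply from lists avoids the special case for the first record and makes the record count a simple sum.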

i) All that’s left is launcher code that instantiates the signal handler and invokes main():

if __name__ == "__main__":
    # Tell Python to run the handler() function when SIGINT is received
    signal(SIGINT, handler)

    main()

All in all, this represents just 210 lines of comfortably spaced and commented Python3 code. Granted, it doesn’t include extensive error checking or defensive hardening, but it’s a runnable proof of concept implementation.

V. Implementing The Intermediate Layer Enricher: What About The Client Code? (“il-client.py”)

It’s a renamed copy of the same rudimentary client that we showed in the first part of this blog series:

#!/usr/local/bin/python3
import sys
import zmq

context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect("tcp://127.0.0.1:5556")

myarg = sys.argv[1]

socket.send_string(myarg)
result = socket.recv_string()
print (result)

VI. Sample Run

Perhaps surprisingly, you can actually launch the client BEFORE the server’s running — because we’re using 0mq, the client will patiently wait for the server it needs to become available.

Let’s test this. Maybe we’re thinking a little about some post-pandemic fishing. Let’s see what domain names mention walleyes:

$ ./il-client.py 'walleye'

With our query now pending, we’ll then go to a 2nd window and launch our intermediate layer server. It will immediately see our pending query and begin processing it:

$ ./il-server.py
Finding Flexible Search Hits...

100 entries found
Retrieving details for bigwalleye.ca.
Retrieving details for blog.walleye.ca.
Retrieving details for cpanel.ontariowalleye.ca.
[...]
Retrieving details for www.walleyemafia.ca.
Retrieving details for www.walleyewallets.hookedmagazine.ca.
Returning 186 total records to the client
Run completed

When our server finishes working on our run, if we go back to our first window, we’ll find our results will be waiting:

$ ./il-client.py 'walleye'
bigwalleye.ca.                 A        "2020-10-19 10:42:14" "2019-06-10 10:03:54"          162 ["35.186.238.101"]
bigwalleye.ca.                 NS       "2020-10-19 10:42:14" "2018-12-10 11:03:26"          316 ["ns01.cashparking.com.", "ns02.cashparking.com."]
bigwalleye.ca.                 NS       "2020-10-19 10:42:14" "2018-12-10 11:03:26"          285 ["ns01.cashparking.com.", "ns02.cashparking.com."]
[...]
www.walleyelocating.ca.        CNAME    "2020-10-21 17:34:23" "2019-01-12 21:17:02"           27 ["walleyelocating.ca."]

www.walleyemafia.ca.           CNAME    "2020-10-13 21:18:29" "2017-10-28 17:18:00"          533 ["walleyemafia.ca."]

www.walleyewallets.hookedmagazine.ca. A  "2020-10-08 03:10:19" "2020-10-05 18:52:33"           6 ["69.90.161.105"]

Some notes about that:

If a query returns multiple records, they’ll be shown together
There will be a blank line after each DNSDB Standard Search query’s output
The results are a bit wide for this format (they’re fine in a wider-than-normal Mac OS Terminal window), but you can at least get a sense of what the results look like.

Since this is open source, you can obviously customize the Python3 output:

If there are fields you don’t care about, you can easily omit those and make the output more compact
You could break each line of output into multiple lines per record, or
You might abandon any pretense of caring about columnar output, and just append one field after the next, etc.
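As an illustration of the multiple-lines-per-record option, here is a hedged sketch of a formatter that spreads each record over several labeled lines. The function name `format_multiline` and the sample record are hypothetical; the field names are the ones print_detailed_bits() already reads, and the sketch assumes rdata is a list of strings.

```python
import json
from time import gmtime, strftime

def format_multiline(jsonl_record):
    """Render one Standard Search JSON line as a short labeled block."""
    obj = json.loads(jsonl_record)["obj"]
    fmt = '%Y-%m-%d %H:%M:%S'
    # use the zone_time_* names when the time_* names are absent
    first = obj["time_first"] if "time_first" in obj else obj["zone_time_first"]
    last = obj["time_last"] if "time_last" in obj else obj["zone_time_last"]
    lines = [
        obj["rrname"] + " " + obj["rrtype"],
        "  first seen: " + strftime(fmt, gmtime(first)),
        "  last seen:  " + strftime(fmt, gmtime(last)),
        "  count:      " + '{0:,d}'.format(obj["count"]),
    ]
    # one line per Rdata value
    lines.extend("  rdata:      " + value for value in obj["rdata"])
    return "\n".join(lines)

# Fabricated record with simple timestamps
sample = ('{"obj":{"count":162,"time_first":0,"time_last":86400,'
          '"rrname":"example.ca.","rrtype":"A","rdata":["192.0.2.1"]}}')
print(format_multiline(sample))
```

This trades vertical space for width, which may suit a standard 80-column terminal better than the single-line layout.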

If we decide we’d rather go after sunfish instead of walleyes, we can just submit another query; the server’s still running (unless we go back to its window and kill it with a control-C, or shut our system down/let it sleep or hibernate).

VII. Conclusion

We hope you’ve enjoyed seeing how you can create a DNSDB result enrichment Intermediate Layer application using 0mq. We think you’ll really enjoy working with it if you give it a try.

Appendix I. il-server.py

$ cat il-server.py
#!/usr/local/bin/python3
from pathlib import Path
from os import path
from io import BytesIO
from time import strftime, gmtime
from signal import signal, SIGINT
import sys
import json
import zmq
import pycurl

def handler(signal_received, frame):
    """Handle the user hitting ctrl-C"""
    print('CTRL-C detected: exiting')
    sys.exit(0)

def build_filter():
    """Read the kill file (if one exists) and construct a regex exclusion"""
    filterfilepath = str(Path.home()) + "/killfile.txt"
    if path.isfile(filterfilepath):
        with open(filterfilepath) as stream:
            mykillfilepattern = stream.read().rstrip()
        mykillfilepattern = "(" + mykillfilepattern + ")"
        mykillfilepattern = mykillfilepattern.replace("\n", "|")
        mykillfilepattern = mykillfilepattern.replace(" ", "")
        return mykillfilepattern
    else:
        return ""

def make_query(mytype, fqdn):
    """Make the actual DNSDB Flexible and Standard Searches"""

    # constants (tweak to taste)
    flex_results_to_return = 100
    flex_timefence_in_days = 30
    flex_timefence_in_seconds = flex_timefence_in_days * 86400 * -1

    standard_search_results_to_return = 100
    standard_timefence_in_days = 30
    standard_timefence_in_seconds = standard_timefence_in_days * 86400 * -1

    # get the DNSDB API key
    filepath = str(Path.home()) + "/.dnsdb-apikey.txt"
    with open(filepath) as stream:
        myapikey = stream.read().rstrip()

    # get the exclusion pattern (we reload each time to ensure
    # any changes get immediately applied)
    kill_pattern = build_filter()

    if mytype == 1:
        # flexible search
        url = "https://api.dnsdb.info/dnsdb/v2/regex/rrnames/" + \
            str(fqdn) + "?limit=" + str(flex_results_to_return) + \
            "&time_last_after=" + str(flex_timefence_in_seconds)
        if kill_pattern:  # only add the parameter when a kill file exists
            url = url + "&exclude=" + kill_pattern
    elif mytype == 2:
        # standard search
        url = "https://api.dnsdb.info/dnsdb/v2/lookup/rrset/name/" + \
            str(fqdn) + "?limit=" + str(standard_search_results_to_return) + \
            "&time_last_after=" + str(standard_timefence_in_seconds)

    requestHeader = []
    requestHeader.append('X-API-Key: ' + myapikey)
    requestHeader.append('Accept: application/jsonl')

    buffer = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.HTTPHEADER, requestHeader)
    c.setopt(pycurl.WRITEDATA, buffer)
    c.perform()
    rc = c.getinfo(c.RESPONSE_CODE)
    body = buffer.getvalue()
    content = body.decode('iso-8859-1')

    if rc == 200:
        return content
    else:
        return str(rc)  # a string, so main() can call isdigit() on it

def print_bits(myrecord):
    """extract and format RRname and RRtype value from Flexible Search query"""
    myrecord_json_format = json.loads(myrecord)
    extract_bit = myrecord_json_format['obj']['rrname']

    # if we want to restrict by RRtype
    # extract_bit_2 = myrecord_json_format['obj']['rrtype']
    # results = extract_bit + "/" + extract_bit_2

    # we normally don't want to restrict by RRtype (one query can get all RRtypes)
    results = extract_bit
    return results

def print_detailed_bits(myrecord):
    """format full Standard Search record before returning to client for display"""
    myformat = '%Y-%m-%d %H:%M:%S'
    myrecord_json_format = json.loads(myrecord)
    extract_bit = myrecord_json_format['obj']['rrname']
    extract_bit = str('{0:<30}'.format(extract_bit))

    extract_bit_2 = myrecord_json_format['obj']['rrtype']
    extract_bit_2 = str('{0:<8}'.format(extract_bit_2))

    temp_bit_3 = myrecord_json_format['obj']['rdata']
    extract_bit_3 = json.dumps(temp_bit_3)

    try:
        extract_tl = myrecord_json_format['obj']['time_last']
    except KeyError:
        extract_tl = myrecord_json_format['obj']['zone_time_last']

    tl_datetime = gmtime(extract_tl)
    enddatetime = strftime(myformat, tl_datetime)

    try:
        extract_tf = myrecord_json_format['obj']['time_first']
    except KeyError:
        extract_tf = myrecord_json_format['obj']['zone_time_first']

    tf_datetime = gmtime(extract_tf)
    startdatetime = strftime(myformat, tf_datetime)

    extract_count = myrecord_json_format['obj']['count']
    formatted_count = str('{0:>12,d}'.format(extract_count))
    results = extract_bit + " " + extract_bit_2 + " \"" + enddatetime + \
        "\" \"" + startdatetime + "\" " + formatted_count + \
        " " + extract_bit_3
    return results

def remove_saf_entries(mylist):
    """Strip the streaming format bookend records"""
    if mylist[0] == '{"cond":"begin"}':
        mylist.pop(0)
    if ((mylist[-1] == '{"cond":"succeeded"}') or \
        (mylist[-1] == '{"cond":"limited","msg":"Result limit reached"}')):
        mylist.pop()
    return mylist

def main():
    context = zmq.Context()
    socket = context.socket(zmq.REP)
    socket.bind("tcp://127.0.0.1:5556")

    while True:
        fqdn = socket.recv()
        fqdn2 = fqdn.decode("utf-8")

        # make first query (Flexible Search)
        print("Finding Flexible Search Hits...")
        flex_search_results = make_query(1, fqdn2)

        # flexible search may not have been successful; if there was a problem,
        # we'll get a numeric status code, otherwise we'll get JSON Lines output
        # mybigstring2 is the consolidated reply; obs2 counts the results
        mybigstring2 = ""
        obs2 = 0
        if str(flex_search_results).isdigit():
            # a REP socket allows one reply per request, so store the error
            # text and let the single send at the bottom deliver it
            mybigstring2 = "Error making flexible search query! Return code = " \
                + str(flex_search_results)
        else:

            # turn the results from the Flexible Search into a list
            sList = list(line for line in flex_search_results.strip().split("\n"))
            # remove the SAF bookend entries
            remove_saf_entries(sList)

            # uniquify the items
            formattedList = []
            for items in sList:
                formattedLine = print_bits(items)
                formattedList.append(formattedLine) if formattedLine \
                    not in formattedList else formattedLine
            sList = sorted(formattedList)
            print (str(len(sList)) + " entries found")

            # iterate over the remaining names we've found (we'll be enhancing
            # them via additional Standard Search queries)
            for items in sList:
                print ("Retrieving details for " + items)
                content = make_query(2, items)

                # we now have Standard Search output... let's make it into a list
                sList2 = list(line2 for line2 in content.strip().split("\n"))
                remove_saf_entries(sList2)

                for finalitems in sList2:
                    # format the results for eventual display
                    finalresults = print_detailed_bits(finalitems)

                    obs2=obs2+1
                    if obs2 == 1:
                        mybigstring2=finalresults
                    else:
                        mybigstring2=mybigstring2+"\n"+finalresults

                mybigstring2=mybigstring2+"\n"

        print ("Returning " + str(obs2) + \
               " total records to the client\nRun completed\n")
        socket.send_string(mybigstring2)

if __name__ == "__main__":
    # Tell Python to run the handler() function when SIGINT is received
    signal(SIGINT, handler)

    main()

Joe St Sauver is a Distinguished Scientist and Director of Research with Farsight Security, Inc.
