Using The SIE Batch API to Find Matching Names in Newly Observed Domains (NOD)
1. Introduction
In previous articles, we showed you how to use SIE Batch to pull data for select Security Information Exchange channels, both via an interactive point-and-click web page and via the SIE Batch API.
In this article, we’re going to show how you can use sie_get_rb (described here) as a building block for a little bash script to find domains of interest in Channel 212, our Newly Observed Domains (NOD) channel. We call this little example sie_batch_match.
What Newly Observed Domains might you be interested in?
A brand owner might want to watch NOD for their brand names or trademarks to see if a third party is using them as part of an unauthorized “knock-off” site
IT or security professionals focused on phishing attacks might want to watch for the name of a bank they’re protecting
Those fighting Covid-19 scams, misinformation, and profiteering might want to watch for (and review) domains with names such as “covid”, “corona”, “pandemic”, “ventilator”, “n95” or “sanitizer”
Political campaigns might want to watch for domains related to contests, candidates, or issues/topics they’re supporting (or opposing).
Ultimately, what you want to watch for is really up to you.
2. The Little Bash Scripts
In a nutshell, our scripts will:
Pull a one-minute batch of data from the Newly Observed Domains Channel using sie_get_rb
Extract just the RRname field from those records using jq (the RRname is the “left hand side” of DNS resource records)
Use grep to search the names for domains that contain one of a number of substrings of interest
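The matching step can be tried on its own, without SIE access. The following is a minimal sketch, assuming the names have already been extracted one per line (as jq -r '.message.rrname' does in the script). The sample names, the match_names helper, and the temporary pattern file are hypothetical stand-ins, not real NOD data.

```shell
#!/bin/bash
# Sketch of the matching stage only; the names below are made-up examples.

# Read names on stdin and print those containing any pattern listed in file "$1".
match_names() {
  grep --ignore-case --file "$1"
}

# Build a throwaway pattern file, one substring per line.
patterns=$(mktemp)
printf '%s\n' covid corona > "$patterns"

# Feed three sample names through the matcher; two of them should match.
printf '%s\n' 'covid-example.test.' 'unrelated.test.' 'MyCoronaShop.test.' |
  match_names "$patterns"

rm -f "$patterns"
```

One thing to watch for: an empty line in the pattern file acts as an empty pattern, which matches every name, so it's worth keeping that file free of blank lines.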
Other program design choices:
Because this is intended to be a relatively stable/persistent watcher tool, we’ll simply use a text editor to put the patterns of interest into a file called ./strings-to-match.txt
We’ll archive the files we download into a directory called ./processed-jsonl-files/ Eventually we’ll need to do housekeeping on those files (compressing or deleting them, etc.), but since this is just a proof of concept, we won’t go into those details today.
We’ll send the matches we find to stdout (we can always redirect stdout to a file or whatever should we need to do so)
We’ll also routinely confirm that the needed programs and files are all available when we run the script.
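For readers who do want a starting point for the housekeeping, here is one possible sketch, assuming the archive directory name used by the script and that gzip is installed. The compress_old_jsonl function name and the 7-day threshold are our own illustrative choices, not part of the original tool.

```shell
#!/bin/bash
# Sketch: compress archived .jsonl files older than a given number of days.
# The function name and the 7-day default are assumptions for illustration.
compress_old_jsonl() {
  local dir="$1" days="${2:-7}"
  # -mtime +N selects files last modified more than N days ago;
  # gzip replaces each selected file with a .gz copy.
  find "$dir" -type f -name '*.jsonl' -mtime +"$days" -exec gzip {} +
}

# Only act if the archive directory exists.
if [ -d ./processed-jsonl-files ]; then
  compress_old_jsonl ./processed-jsonl-files 7
fi
```

Something like this could be run daily from cron. Deleting rather than compressing would be a matter of swapping the gzip action for a removal, but that is a retention decision each site needs to make for itself.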
The script itself is short:
$ cat sie_batch_match.bash
#!/bin/bash
# make sure we have sie_get_rb installed
# https://github.com/farsightsec/blog-code/tree/master/sie_get_clients/sie_get_ruby
command -v sie_get_rb >/dev/null 2>&1 || \
{ echo >&2 "We use sie_get_rb but it's not installed. Correct and rerun."; exit 1; }
# make sure we have jq installed
# https://stedolan.github.io/jq/
command -v jq >/dev/null 2>&1 || \
{ echo >&2 "We use jq but it's not installed. Install and rerun."; exit 1; }
# make sure we have a file of strings to match
if [ ! -f ./strings-to-match.txt ]; then
  echo >&2 "Need ./strings-to-match.txt Create that file then rerun"
  exit 1
fi
# make sure we have the directory we need to process the jsonl-format files
if [ ! -d ./.process-sie-get-jsonl-files ]; then
  mkdir -p ./.process-sie-get-jsonl-files
fi
# make sure we have the directory we'll use to save the processed jsonl files
if [ ! -d ./processed-jsonl-files ]; then
  mkdir -p ./processed-jsonl-files
fi
# grab data for a minute for a jsonl channel (in this case ch212)
sie_get_rb 212 now 1
# move the resulting data to the file for processing
mv -f sie-*.jsonl ./.process-sie-get-jsonl-files
# find and display matches
jq -r '.message.rrname' ./.process-sie-get-jsonl-files/sie-*\.jsonl | \
grep --ignore-case --no-filename --color --file ./strings-to-match.txt
mv -f ./.process-sie-get-jsonl-files/* ./processed-jsonl-files/.
Having created that script, you could then run it from cron once a minute, or you could invoke it interactively with a second little runit bash script such as:
$ cat runit.bash
#!/bin/bash
while true
do
  echo -n "TIME STAMP: "
  date -u
  ./sie_batch_match.bash
  sleep 60
done
Ensure both those files are executable:
$ chmod a+rx sie_batch_match.bash
$ chmod a+rx runit.bash
3. Sample Run
So as a test, we ran with keywords that looked like:
$ cat strings-to-match.txt
cdc
covid
corona
pandemic
We saw output that looked like the following (since these are new domains of unknown provenance, we’ve replaced one dot in each of these domain names with [dot] for display here):
$ ./runit.bash
[snip]
TIME STAMP: Fri Mar 27 15:33:40 UTC 2020
covidsupplychain[dot]org.
covidfinance[dot]org.
covid19collaborators[dot]org.
covidcontinuity[dot]org.
covid19initiatives[dot]org.
TIME STAMP: Fri Mar 27 15:34:43 UTC 2020
coronavoucher[dot]org.
covidit[dot]org.
ccovidactnow[dot]org.
TIME STAMP: Fri Mar 27 15:35:45 UTC 2020
covidmarketing[dot]org.
anticovid19[dot]nl.
ridofcovid19[dot]com.
coronalearning[dot]xyz.
covidhr[dot]org.
top-corona[dot]com.
[snip]
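If you plan to paste matches into a report or ticket, a similar [dot] substitution can be applied automatically. This is just a sketch of one way to do it with sed. The defang function name is our own, and it always replaces the first dot in each line, which may not be the dot you would pick by hand.

```shell
#!/bin/bash
# Sketch: replace the first dot in each name with "[dot]" so the name
# is less likely to be auto-linked or clicked by accident.
defang() {
  sed 's/\./[dot]/'
}

# Example with a made-up name.
printf '%s\n' 'example.test.' | defang
```

In practice you would append the function to the end of the matching pipeline, after grep.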
Are those domains good? Are those domains bad? That’s not something Farsight evaluates — after all, we might not all see a given domain the same way. We only tell you that these are objectively new domains we’ve just seen for the first time on one of our sensors.
After that, it’s up to you (or the domain reputation vendor of your choice) to carefully dig into the “goodness” or “badness” of that name (should you desire to do so).
4. Other Enhancements/Changes?
a) Different Cadence?
Currently we’re pulling a new batch of data every minute. In many cases, a more relaxed retrieval schedule might be fine (e.g., perhaps pull an hour’s worth of data every 3600 seconds.)
To do so, you’d change the duration in the sie_get_rb call in sie_batch_match.bash and update the sleep duration in the runit.bash script (note that the sie_get_rb duration is in minutes, while the sleep duration is in seconds).
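One way to keep the two durations from drifting apart is to derive both from a single setting. The sketch below assumes a hypothetical BATCH_MINUTES variable and a small minutes_to_seconds helper, neither of which exists in the original scripts.

```shell
#!/bin/bash
# Sketch: derive both durations from one setting.
# BATCH_MINUTES is a hypothetical variable name; it defaults to 60 (one hour).
BATCH_MINUTES="${BATCH_MINUTES:-60}"

# Convert a whole number of minutes to seconds.
minutes_to_seconds() {
  echo $(( $1 * 60 ))
}

echo "sie_get_rb duration (minutes): $BATCH_MINUTES"
echo "sleep duration (seconds): $(minutes_to_seconds "$BATCH_MINUTES")"
```

With something like this in place, the call in sie_batch_match.bash would become sie_get_rb 212 now "$BATCH_MINUTES", and the loop in runit.bash would use sleep "$(minutes_to_seconds "$BATCH_MINUTES")".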
b) Different Matcher?
Currently we use grep to do a very straightforward match against the SIE Batch files we download, but obviously you could easily modify the script to use a different matcher of your choice. For example, you could replace grep with agrep (“approximate GREP for fast fuzzy string searching”), see https://github.com/Wikinaut/agrep
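A different matcher doesn't have to be a different program. As an illustration (this is not agrep, just a stricter use of grep), short keywords such as “cdc” can be required to appear as a whole dot- or hyphen-delimited token, which cuts down on incidental substring hits. The match_token function and the sample names are hypothetical, and the sketch assumes the keyword is plain letters and digits with no regular expression metacharacters.

```shell
#!/bin/bash
# Sketch: match a keyword only when it forms a whole dot- or hyphen-delimited
# token. Assumes the keyword contains only letters and digits.
match_token() {
  local keyword="$1"
  # The keyword must be preceded by start-of-line, a dot, or a hyphen,
  # and followed by a dot, a hyphen, or end-of-line.
  grep --ignore-case --extended-regexp "(^|[.-])${keyword}([.-]|\$)"
}

# Made-up names: the middle one contains "cdc" only as an incidental substring.
printf '%s\n' 'cdc-example.test.' 'abcdcorp.test.' 'www.cdc.test.' | match_token cdc
```

The trade-off is that this will miss names where the keyword is run together with other text, so whether it's an improvement depends on what you're watching for.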
c) Reporting More Than Just The RRname?
Or you might want to report more than just the RRnames. Perhaps you also want to output the record type and Rdata for each record?
That’s an easy thing to change in the script’s jq command. Replace:
jq -r '.message.rrname' ./.process-sie-get-jsonl-files/sie-*\.jsonl
with
jq -r '"\(.message.rrname) \(.message.rrtype) \(.message.rdata)"' ./.process-sie-get-jsonl-files/sie-*\.jsonl
and then when you run you’ll see output that looks like (once again sanitized for display here, since this is a brand new domain):
[snip]
TIME STAMP: Fri Mar 27 17:12:32 UTC 2020
coronatestkit[dot]es. NS ["docks19.rzone[dot]de.","shades01.rzone[dot]de."]
[snip]
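If the bracketed Rdata array is awkward for downstream tools, jq can flatten it. The sketch below is one possibility: it prints tab-separated columns with the Rdata values joined by commas. The sample record is made up and shows only the three fields used in this article, and it assumes Rdata is always an array of strings, as in the output above. The format_records name is our own.

```shell
#!/bin/bash
# Hypothetical sample record; real records carry more fields than shown here.
sample='{"message":{"rrname":"example.test.","rrtype":"NS","rdata":["ns1.example.test.","ns2.example.test."]}}'

# Print rrname, rrtype, and comma-joined rdata as tab-separated columns.
# Assumes .message.rdata is an array of strings.
format_records() {
  jq -r '.message | [.rrname, .rrtype, (.rdata | join(","))] | @tsv'
}

# Run the demo only when jq is available.
if command -v jq >/dev/null 2>&1; then
  printf '%s\n' "$sample" | format_records
fi
```

Tab-separated output like this is easy to pass to cut, sort, or awk, and grep can still be run over it as before.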
d) Watching A Different Channel?
You might want to use sie_batch_match to watch an SIE Batch channel other than Channel 212.
You can obviously modify the sample script to do so, but if you do, please be alert to the fact that some of the channels available via SIE are in NMSG format (rather than JSON Lines format).
To convert an NMSG format file to JSON Lines format, you’d use nmsgtool like so:
$ nmsgtool -r sie-ch204-{2020-02-13@20:54:00}-{2020-02-13@20:55:00}.nmsg -J -
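When there are several NMSG files to convert, the same command can be wrapped in a loop. This is a rough sketch, assuming nmsgtool is installed and using only the -r and -J options from the command shown here. The jsonl_name and convert_nmsg_files function names are hypothetical.

```shell
#!/bin/bash
# Sketch: convert every NMSG file in a directory to JSON Lines.
# The function names are hypothetical helpers for illustration.

# Derive the output file name: foo.nmsg -> foo.jsonl
jsonl_name() {
  printf '%s\n' "${1%.nmsg}.jsonl"
}

# Convert each *.nmsg file in directory "$1", writing a .jsonl file beside it.
convert_nmsg_files() {
  local dir="$1" f
  for f in "$dir"/*.nmsg; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    nmsgtool -r "$f" -J - > "$(jsonl_name "$f")"
  done
}
```

You would call it as convert_nmsg_files . (or with whatever directory holds the downloaded files) before running the jq and grep steps.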
Q. “But I don’t have nmsgtool…”
A. If you’re using Debian Linux, you can install nmsgtool as a package, see https://www.farsightsecurity.com/technical/SIE-user-guide/sie-debian/
Source code is also available for those who prefer to build from source, or for use on systems other than Debian Linux. See https://github.com/farsightsec/
Build instructions for nmsgtool for the Mac in particular can be found in Appendix I of our recent whitepaper
5. Conclusion
We hope you’ve found this an intriguing little example of how you might practically use the SIE Batch API.
To arrange access to SIE, please contact Farsight Security Sales at [email protected]. Be sure to mention that you want to try SIE Batch.
Joe St Sauver Ph.D. is a Distinguished Scientist with Farsight Security®, Inc.