Carrying Arbitrary Data Payloads (Such as Images) In NMSG
Introduction
If you work with SIE (the Security Information Exchange), you’ll have encountered NMSG-format files. NMSG is the format Farsight Security (now part of DomainTools) uses to distribute DNS cache miss traffic and other cybersecurity data over SIE’s jumbo frame-enabled Ethernet switches.
NMSG payloads normally contain small observations, and in fact, we’re often able to pack multiple small observations into a single 8K NMSG payload for transmission efficiency. That said, NMSG is designed well enough to also handle larger-than-8K payloads, fragmenting the data for transmission and then reassembling the fragments after they’ve been received.
Given that model, we were curious – would NMSG be able to satisfactorily encode arbitrary files (such as images), even VERY large images? We decided to experiment and find out.
NMSG Vendor/Message Types
The classic introduction to NMSG is a five-part series of articles by Mike Schiffman that appeared in the Farsight Blog in 2015:
- “Farsight’s Network Message, Volume 1: Introduction to NMSG”
- “Farsight’s Network Message, Volume 2: Introduction to nmsgtool”
- “Network Message, Volume 3: Headers and Encoding”
- “Farsight’s Network Message, Volume 4: The C Programming API”
- “Farsight’s Network Message, Volume 5: The Python Programming API”
If you’ve not worked with NMSG before – or even if you have – those articles are still well worth reviewing.
The first question we faced was how to format or encode our traffic for transfer. The first article from the above series mentions:
A core tenet behind NMSG is data agnosticism. Some of the data Farsight consumes, ships and stores isn’t best represented in its native format as frames, packets, datagrams, segments, or other data primitives. To this end, NMSG was designed to be ignorant of the data it ferries. NMSG offloads the details of encoding to external message modules and in fact can work with opaque containers.
That’s true – BUT much of the data SIE carries in NMSG form IS carefully mapped, field-by-field, to specific formats.
For example, consider the definition for the “SIE/dnsdedupe” format. That format mentions many field names familiar to those who routinely work with DNSDB: rrname, rrtype, rdata, bailiwick, count, time first seen, time last seen, and so on. See the following screenshot:
That DNS-focused structure is perfect for the DNS data that feeds DNSDB, but wouldn’t be appropriate for generic image blobs. Fortunately, we do have a predefined “generic” data format that should work great for image blobs: that’s the “base/encode” format, as shown in the following screenshot:
Thus, at its simplest, we could base64-encode our image into ASCII text, and then pack that text into an NMSG binary file using the “base/encode” format. That should get our image across, but the resulting blob would be completely “context-less.” Is our eventually-decoded blob of bytes a PNG? A JPEG? A TIFF? We’d have no clue! Having a filename would be a big improvement and provide some very helpful “hints.”
We might also want other “metadata” about the file, such as the width and height of the image in pixels, the size of the image in bytes, one or more checksums, the date and time the image was collected, and a source URI. We could “get by” without all that metadata, but since we’re designing this experiment from scratch, let’s do our best to anticipate what we’ll eventually want or need later. So, what should we do? Well, our base64-encoded image, plus the metadata we discussed, could all be handled by using a JSON object.
Fortunately, JSON is another one of the EncodeType options supported by the “base/encode” format.
Creating a Sample NMSG File
If we were going to be doing this as part of a routine production workflow, we’d build the bits and pieces needed using a program or script, but for the purposes of this article and to better illustrate what we’re doing, we’re going to use a manual approach.
We begin with a test image we want to transfer via an NMSG file. Let’s assume the image we want to encode is a small orange dot, saved as a PNG image called orange-dot.png:
Let’s extract some of the meta-information we’ll need to populate our chosen metadata fields using Un*x command line tools.
We’ll get the image size in pixels using the ImageMagick identify command:
$ identify -format '%wx%h\n' orange-dot.png
30x30
To get the size of the file in bytes, we’ll use the standard Un*x wc command with the -c (byte count) option:
$ wc -c < orange-dot.png
3403
MD5 is one of the most commonly used checksums (even though it’s no longer considered cryptographically strong). SHA-256 is a stronger checksum, though perhaps not as widely used. We computed both with the GNU Coreutils md5sum and sha256sum commands:
$ md5sum orange-dot.png | awk '{print $1}'
41d0a0714c71debba5dd520a0911682d
$ sha256sum orange-dot.png | awk '{print $1}'
cf546e0ba5fb5118e72fa7bd415db6761e1aee4c5a6f3a184b7e3ab99c029ea3
We like ISO 8601 format for our dates/times, so we’ll ask ls to report the file’s modification time in an ISO 8601-compatible format:
$ ls -l --time-style="+%Y-%m-%dT%H:%M:%S%:z" orange-dot.png | awk '{print $6}'
2022-10-19T13:22:02-07:00
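If you’d rather not run each of those commands by hand, they can be gathered into one small shell function. What follows is only a sketch: the function name file_meta and its key=value output are our own invention for illustration, and it assumes GNU Coreutils (in particular, date -r to read a file’s modification time). Image dimensions are left out, since they depend on ImageMagick being installed.

```shell
# file_meta FILE
# Hypothetical helper (not part of the original workflow): print the size,
# checksums, and modification time of FILE as key=value lines.
file_meta() {
    local f="$1"
    # wc reads stdin so it prints only the count; tr strips any padding
    printf 'size=%s\n'      "$(wc -c < "$f" | tr -d ' ')"
    printf 'md5sum=%s\n'    "$(md5sum "$f" | awk '{print $1}')"
    printf 'sha256sum=%s\n' "$(sha256sum "$f" | awk '{print $1}')"
    # GNU date: -r uses FILE's modification time; %:z prints a +hh:mm offset
    printf 'datetime=%s\n'  "$(date -r "$f" '+%Y-%m-%dT%H:%M:%S%:z')"
}
```

Run as file_meta orange-dot.png, it should report the same size, checksums, and timestamp we collected one command at a time above.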
We assembled all that metadata into a “header” file that’s actually a “partial” JSON file:
$ cat mymetadata.txt
{
"filename":"orange-dot.png",
"dimensions":"30x30",
"size":"3403",
"md5sum":"41d0a0714c71debba5dd520a0911682d",
"sha256sum":"cf546e0ba5fb5118e72fa7bd415db6761e1aee4c5a6f3a184b7e3ab99c029ea3",
"datetime":"2022-10-19T13:22:02-07:00",
"uri":"file:///Users/joe/orange-dot.png",
"image":"
Note the opening curly brace at the top, and the unterminated double quote on the last line. We’re now ready to base64-encode our image. We’ll use the base64 command from GNU Coreutils to make the encoded version of our file, specifying the -w 0 option (which means “generate base64-encoded output without line wrapping”):
$ base64 -w 0 < orange-dot.png > image.base64
We also need to create a closing “suffix” file to “tack on” and “wrap up” that JSON object (this file will stay the same for all our runs):
$ cat wrapup.txt
"
}
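We’ll sandwich the metadata header, the base64-encoded image, and the suffix file together, and then base64-encode the whole thing into one payload. (The paste command’s -s option, combined with a '\0' delimiter, joins the lines of each file with nothing in between.)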
$ paste -s -d'\0' mymetadata.txt image.base64 wrapup.txt | base64 -w 0 > mypayload.base64
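Since the same steps get repeated for every image, they can be wrapped up in a function. This is a sketch rather than a polished tool: make_payload is a hypothetical name of our own, and it assumes the header and suffix files have already been prepared as described above.

```shell
# make_payload HEADER IMAGE SUFFIX
# Hypothetical wrapper: base64-encode IMAGE, sandwich it between HEADER and
# SUFFIX, and base64-encode the result. The payload is written to stdout.
make_payload() {
    local header="$1" image="$2" suffix="$3" tmp
    tmp="$(mktemp)" || return 1
    base64 -w 0 < "$image" > "$tmp"
    # -s joins the lines of each file; the '\0' delimiter means "join with nothing"
    paste -s -d'\0' "$header" "$tmp" "$suffix" | base64 -w 0
    rm -f "$tmp"
}
```

For our orange dot, that would be make_payload mymetadata.txt orange-dot.png wrapup.txt > mypayload.base64. One detail worth knowing: paste -s emits one output line per input file, so the decoded payload has a line break on either side of the image data. Those line breaks have to be stripped again before the payload can be parsed as JSON.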
We’ve now prepped the payload we want to transfer. All we’ve got left to do is to (a) encapsulate that payload into another JSON object, and (b) convert that file to a binary NMSG file.
As when we built our payload file, we’ll need a header file, a body, and a trailer file. The header and trailer files look like:
$ cat n-hd.txt
{"vname":"base",
"mname":"encode",
"message":{
"type": "JSON",
"payload":"
$ cat n-tr.txt
"
}
}
Those files will stay the same for all runs. We’ll sandwich our JSON payload between that header and trailer file by running:
$ cat n-hd.txt mypayload.base64 n-tr.txt | tr -d '\n' | jq -c . | nmsgtool -z -j - -w mysample.nmsg
The final mysample.nmsg file is 3755 bytes long, so in the case of this small file, we’ve suffered a ~10% increase in file size by going to NMSG binary file format.
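The two fixed header and trailer files can also be folded into a single function that writes the whole one-line JSON envelope itself. Treat this as an illustrative alternative, not the method used above: make_envelope is a made-up name, and unlike the jq step it does no JSON validation, so it relies on the payload file containing nothing but clean base64.

```shell
# make_envelope PAYLOAD_FILE
# Hypothetical alternative to the n-hd.txt / n-tr.txt files: wrap an already
# base64-encoded payload in a single-line base/encode JSON object on stdout.
make_envelope() {
    printf '{"vname":"base","mname":"encode","message":{"type":"JSON","payload":"'
    tr -d '\n' < "$1"      # drop any line breaks so the object stays on one line
    printf '"}}\n'
}
```

The output can be piped to nmsgtool with the same options as before: make_envelope mypayload.base64 | nmsgtool -z -j - -w mysample.nmsg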
Confirming That We Can Successfully Re-Extract Our Original Image
Now let’s confirm that we can get our data back out of that NMSG file. We’ll begin by converting the NMSG binary file back to JSON format:
$ nmsgtool -r mysample.nmsg -J test-output.jsonl
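Even in JSON format, our payload’s still base64 encoded, so let’s extract and “de-base64” it. The decoded text still contains the line breaks that paste left between our three input files, so we strip them with tr to end up with valid single-line JSON: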
$ jq -r '.message.payload' test-output.jsonl > test-payload.base64
$ base64 -w 0 -d test-payload.base64 | tr -d '\n' > test-payload.jsonl
Now we can extract metadata from the payload, if we want to. For example, perhaps we want to see the original filename:
$ jq -r '.filename' < test-payload.jsonl
orange-dot.png
Finally, let’s extract the image and verify that it survived “intact” by checking checksums:
$ jq -r '.image' < test-payload.jsonl | base64 -d > check-output.png
$ sha256sum check-output.png | awk '{print $1}'
cf546e0ba5fb5118e72fa7bd415db6761e1aee4c5a6f3a184b7e3ab99c029ea3
The observed checksum matches the original file checksum. This means that we’ve successfully created an NMSG of an image file, and then re-extracted a verbatim copy of our original image from it! Happy times!
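Rather than comparing checksums by eye, we can let the shell do it, using the SHA-256 value we stored in the metadata. The sketch below is illustrative only: both function names are our own, the second one needs jq, and it expects the cleaned-up payload JSON (test-payload.jsonl in the run above) as its input.

```shell
# Hypothetical helpers; the names are ours, not part of the original workflow.

# verify_sha256 FILE EXPECTED_HEX
# Succeeds only if the SHA-256 checksum of FILE equals EXPECTED_HEX.
verify_sha256() {
    local actual
    actual="$(sha256sum "$1" | awk '{print $1}')"
    [ "$actual" = "$2" ]
}

# extract_and_verify PAYLOAD_JSON OUTFILE
# Decode the "image" field into OUTFILE, then check it against the
# "sha256sum" field carried in the same JSON object (requires jq).
extract_and_verify() {
    jq -r '.image' < "$1" | base64 -d > "$2" || return 1
    verify_sha256 "$2" "$(jq -r '.sha256sum' < "$1")"
}
```

With those defined, extract_and_verify test-payload.jsonl check-output.png && echo intact would write the image and print “intact” only if the checksums agree.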
Trying A Larger Image
After demonstrating our ability to encode a small (30×30 pixel) orange dot into a binary NMSG file, let’s try a more ambitious image. For this test, let’s try encapsulating Georges Seurat’s “Seascape at Port-en-Bessin, Normandy” (1888), a pointillist image provided courtesy of the National Gallery of Art, Washington DC:
The metadata we assembled for that image looks like:
$ cat mymetadata2.txt
{
"filename":"seascape_at_port-en-bessin,_normandy_1972.9.21.jpg",
"dimensions":"4096x3298",
"size":"7270790",
"md5sum":"54e30f397e7f7e24a35bc0f4d06f6b41",
"sha256sum":"5d8015a89d23dfe2715cb0ac0f242d3769a8c31b4a69fcb721cec333afbc98d1",
"datetime":"2022-10-19T20:43:42-07:00",
"uri":"https://www.nga.gov/collection/art-object-page.53139.html",
"image":"
We then encoded and packaged the image exactly as we did for the orange dot:
$ base64 -w 0 < seascape_at_port-en-bessin,_normandy_1972.9.21.jpg > image2.base64
$ paste -s -d'\0' mymetadata2.txt image2.base64 wrapup.txt | base64 -w 0 > mypayload2.base64
$ cat n-hd.txt mypayload2.base64 n-tr.txt | tr -d '\n' | jq -c . | nmsgtool -z -j - -w mysample2.nmsg
The resulting mysample2.nmsg file is only ~1% larger (at 7,339,262 octets) than the original image file — minimal overhead involved in using NMSG format! We’ll confirm that we can re-extract the original image from our NMSG file:
$ nmsgtool -r mysample2.nmsg -J test-output-2.jsonl
$ jq -r '.message.payload' test-output-2.jsonl > test-payload-2.base64
$ base64 -w 0 -d test-payload-2.base64 | tr -d '\n' > test-payload-2.jsonl
$ jq -r '.filename' < test-payload-2.jsonl
seascape_at_port-en-bessin,_normandy_1972.9.21.jpg
$ jq -r '.image' < test-payload-2.jsonl | base64 -d > check-output-2.jpg
$ sha256sum check-output-2.jpg | awk '{print $1}'
5d8015a89d23dfe2715cb0ac0f242d3769a8c31b4a69fcb721cec333afbc98d1
As we’d hoped, the larger file worked just as our smaller test did – check-output-2.jpg is exactly the same as our original image.
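It’s worth pausing on why the overhead is so small. Base64 turns every 3 bytes into 4 characters, so on its own it would inflate an image by roughly a third. The quick calculation below (b64len is just an illustrative helper name) gives the expected encoded size for a file of a given length. That our finished NMSG files came out only ~10% and ~1% larger than the originals is, we assume, thanks to the zlib compression requested with nmsgtool’s -z option, which squeezes most of that base64 redundancy back out.

```shell
# b64len N
# Hypothetical helper: number of characters in the unwrapped base64 encoding
# of N bytes. Every 3 input bytes (rounded up) become 4 output characters.
b64len() {
    echo $(( ( ($1 + 2) / 3 ) * 4 ))
}
```

For our two test images, b64len 3403 gives 4540 and b64len 7270790 gives 9694388, both about 33% larger than the original files.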
Demonstrating Actual Transmission and Reception of the NMSG File
There’s one final step we should go through, and that’s demonstrating actual transmission and reception of the NMSG file.
We’ll do so over a UDP loopback connection to an arbitrarily-selected loopback address and port (127.50.0.1/7777).
In one window, we’ll create a receiving nmsgtool process:
$ nmsgtool -l 127.50.0.1/7777 -w received2.nmsg -z --unbuffered
In a second window on the same system, we’ll create a sending nmsgtool process, setting the MTU to 1280 bytes to avoid any MTU issues:
$ nmsgtool -r mysample2.nmsg -s 127.50.0.1/7777 -m 1280 --unbuffered
The file transfers virtually instantaneously (as we’d expect).
After closing the receiving process by hitting CTRL-C in the receiving window, we see:
$ ls -l mysample2.nmsg received2.nmsg | awk '{print $9 " " $5}'
mysample2.nmsg 7339262
received2.nmsg 7339262
Excellent! As a rough first check, the file’s at least the same size!
Now let’s extract the payload and verify that the payload checksums agree:
$ nmsgtool -r mysample2.nmsg -J mysample2.json
$ jq -r '.message.payload' mysample2.json > mysample2.payload
$ nmsgtool -r received2.nmsg -J received2.json
$ jq -r '.message.payload' received2.json > received2.payload
$ ls -l mysample2.payload received2.payload | awk '{print $9 " " $5}'
mysample2.payload 12926317
received2.payload 12926317
$ sha256sum mysample2.payload received2.payload
e0e81b9851c3dc502f67a8fabcc45870de932ca8fa38fc759b21f3b3700155a8 mysample2.payload
e0e81b9851c3dc502f67a8fabcc45870de932ca8fa38fc759b21f3b3700155a8 received2.payload
Bingo! Everything looks like it transferred just great, even with a small MTU.
Conclusion
You’ve now learned how you can encapsulate arbitrary content (and associated metadata) in NMSG-format binary files. This demonstrates that arbitrary content – even large image files – can be NMSG encoded and then successfully transferred, even if small MTUs require fragmentation of a large object.