Network Message, Volume 3: Headers and Encoding
Abstract
This article is the third in a multi-part blog series intended to introduce and acquaint the user with Farsight Security’s NMSG suite. This article explores some of the low-level implementation details of the NMSG protocol including header composition and data encoding.
Before reading this article, it is recommended that you read Farsight’s Network Message, Volume 1: Introduction to NMSG and Farsight’s Network Message, Volume 2: Introduction to nmsgtool. This article covers NMSG (protocol) version 2 and nmsg (C library) version 0.9.1.
The NMSG header
NMSG units begin with a small 10 octet header as depicted below:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      'N'      |      'M'      |      'S'      |      'G'      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Flags     |    Version    |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Length (cont)         |          Payload(s)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ . . . . . . . . . . . . . . . .
The NMSG header always starts with the four-octet magic value: N, M, S, G.
The Flags octet is next. Depending on whether the payload(s) are compressed and/or fragmented, it can have one, both, or neither of the following flags set:
NMSG_FLAG_ZLIB: Payload(s) is/are compressed.
NMSG_FLAG_FRAGMENT: Payload is a fragment.
The Version octet should be 2. The final header field, Length, is an unsigned four-octet integer in network byte order that holds the length in octets of the payload(s).
NMSG payload(s) are encoded using Google Protocol Buffers. They are introduced in the next section.
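The header layout described above can be sketched in a few lines of Python. This is an illustration only, not code from the nmsg library: the function name parse_nmsg_header and the returned dictionary are hypothetical, and the numeric bit values given to NMSG_FLAG_ZLIB and NMSG_FLAG_FRAGMENT are assumptions that should be checked against the nmsg headers.

```python
import struct

NMSG_MAGIC = b"NMSG"
NMSG_HDRLEN = 10      # 4 (magic) + 1 (flags) + 1 (version) + 4 (length)
NMSG_VERSION = 2

# Assumed bit values for the two flags; verify against the nmsg headers.
NMSG_FLAG_ZLIB = 0x01
NMSG_FLAG_FRAGMENT = 0x02


def parse_nmsg_header(buf):
    """Decode the 10-octet NMSG header found at the start of buf."""
    if len(buf) < NMSG_HDRLEN:
        raise ValueError("need at least 10 octets for an NMSG header")
    # ">" selects network byte order with no padding:
    # 4s = magic, B = flags, B = version, I = unsigned 32-bit length
    magic, flags, version, length = struct.unpack(">4sBBI", buf[:NMSG_HDRLEN])
    if magic != NMSG_MAGIC:
        raise ValueError("bad magic value")
    if version != NMSG_VERSION:
        raise ValueError("unsupported NMSG version %d" % version)
    return {
        "compressed": bool(flags & NMSG_FLAG_ZLIB),
        "fragment": bool(flags & NMSG_FLAG_FRAGMENT),
        "version": version,
        "length": length,
    }
```

A caller would read 10 octets, pass them to parse_nmsg_header, and then read the number of payload octets given by the returned length.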
Google Protocol Buffers
Google Protocol Buffers (sometimes referred to as “protobufs”) are an efficient, language- and platform-neutral way to serialize arbitrary structured data. Protobufs are comparable to XML but smaller and faster. This makes them an ideal solution for encoding the variably typed data that flows through our Security Information Exchange (SIE).
To use protobufs in a program (or library code such as nmsg), the programmer first needs to define what the source data looks like. Again using XML as the model, protobufs are similar to an XML schema. This definition is written using a simple specification language and saved to a text file with a .proto extension.
Once defined, this file is compiled using one of the protobuf compilers. This produces header and source files containing the API to serialize the data.
The nmsg library is written in C so it uses the protobuf-c compiler to generate the API code for its protobuf serialization code.
If you want to learn more, Google maintains great documentation. The following protobuf-heavy sections will make more sense if you are familiar with the .proto specification language.
NMSG Protobuf Data
After the header, the first protobuf-encoded message will be either of type Nmsg (which carries one or more NmsgPayload messages) or of type NmsgFragment (which carries a single fragment of a larger NMSG). Both are discussed below.
The .proto definition for Nmsg is shown below:
message Nmsg {
    repeated NmsgPayload    payloads = 1;
    repeated uint32         payload_crcs = 2;
    optional uint32         sequence = 3;
    optional uint64         sequence_id = 4;
}
payloads: The actual NMSG payloads; the .proto for these is explained below.
payload_crcs: A CRC used for error detection.
sequence: The optional sequence number.
sequence_id: The optional sequence number space. This is a randomized 64-bit number identifying the sequence number space that the sequence parameter exists in. The sequence_id is used by NMSG consumers to uniquely identify sequence number “flows”.
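To make the Nmsg layout concrete, here is a rough sketch of how the four top-level fields could be pulled out of a serialized Nmsg message by hand, using only the standard protobuf wire format (a varint key holding the field number and wire type, followed by the value). In practice the generated protobuf API does this for you. The function names read_varint and walk_nmsg are hypothetical, and the sketch assumes payload_crcs arrives unpacked, one varint per entry.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:          # high bit clear marks the last byte
            return value, pos
        shift += 7


def walk_nmsg(buf):
    """Split a serialized Nmsg message into its four top-level fields."""
    out = {"payloads": [], "payload_crcs": [], "sequence": None, "sequence_id": None}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 0x07
        if wire_type == 0:           # varint (uint32 / uint64 fields)
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:         # length-delimited (embedded message)
            size, pos = read_varint(buf, pos)
            if pos + size > len(buf):
                raise ValueError("truncated length-delimited field")
            value = buf[pos:pos + size]
            pos += size
        else:
            raise ValueError("unexpected wire type %d" % wire_type)
        if field == 1:
            out["payloads"].append(value)      # still-serialized NmsgPayload
        elif field == 2:
            out["payload_crcs"].append(value)  # assumes unpacked encoding
        elif field == 3:
            out["sequence"] = value
        elif field == 4:
            out["sequence_id"] = value
        # unknown field numbers are skipped
    return out
```

Each entry in payloads is left as raw bytes; it would be decoded the same way against the NmsgPayload definition.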
If the NMSG_FLAG_FRAGMENT flag is set in the NMSG header, then the data part is an NmsgFragment protobuf message, as shown below:
message NmsgFragment {
    required uint32         id = 1;
    required uint32         current = 2;
    required uint32         last = 3;
    required bytes          fragment = 4;
    optional uint32         crc = 5;
}
id: Fragment ID used by all fragments in this group (chosen at random by the NMSG library).
current: The current fragment in the list.
last: The last fragment in the list.
fragment: The actual fragment bytes.
crc: The CRC of the reassembled NMSG.
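One way a consumer might use the NmsgFragment fields is sketched below. This is not the nmsg library's reassembly code: the Reassembler class is hypothetical, and it assumes that current is a zero-based index and that last is the index of the final fragment, so a group holds last + 1 pieces. Verifying crc is left out because the CRC variant is not specified here.

```python
class Reassembler:
    """Collect NmsgFragment pieces by id and join them once the group is complete."""

    def __init__(self):
        self._groups = {}   # fragment id -> {current index: fragment bytes}

    def add(self, frag_id, current, last, fragment):
        """Store one fragment; return the joined bytes when complete, else None."""
        if current > last:
            raise ValueError("current exceeds last")
        parts = self._groups.setdefault(frag_id, {})
        parts[current] = fragment
        # Assumption: indexes run from 0 through last inclusive.
        if not all(i in parts for i in range(last + 1)):
            return None
        del self._groups[frag_id]
        return b"".join(parts[i] for i in range(last + 1))
```

Fragments may arrive out of order, so the pieces are keyed by index and only concatenated once every index from 0 through last has been seen. A production consumer would also want to expire groups that never complete.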
The NmsgPayload messages contain payload data and are defined as follows:
message NmsgPayload {
    required uint32         vid = 1;
    required uint32         msgtype = 2;
    required int64          time_sec = 3;
    required fixed32        time_nsec = 4;
    optional bytes          payload = 5;
    optional uint32         source = 7;
    optional uint32         operator = 8;
    optional uint32         group = 9;
}
vid: The Farsight-assigned NMSG vendor ID. These values have a printable name (base, SIE, etc.) and corresponding codes, which are used here.
msgtype: A vendor-specific message type code that signals the serialization type used to encode the payload. Like vid, msgtype has a printable name (dns, encode, ipconn, etc.) and corresponding codes. They are defined in more detail below. The vid together with the msgtype can be used to determine the type of data contained in the payload.
time_sec: Seconds timestamp of when the payload was generated.
time_nsec: Nanoseconds timestamp of when the payload was generated.
payload: The actual NMSG payload data.
source: Optional user-defined unsigned 32-bit value, used to uniquely identify an organization submitting data to SIE.
operator: Optional unsigned 32-bit value, used to further differentiate the sender of the data. The value is an integer on the wire and on disk, but is intended to be translated into a symbolic string for presentation by a lookup against the nmsg.opalias file.
group: Optional user-defined unsigned 32-bit value, used for fine-grained winnowing. The value is an integer on the wire and on disk, but is intended to be translated into a symbolic string for presentation by a lookup against the nmsg.gralias file.
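As a small illustration of how a consumer might interpret these fields, the sketch below combines time_sec and time_nsec into a single timestamp and turns a (vid, msgtype) pair into a printable name. Both helper functions are hypothetical, and the numeric codes in EXAMPLE_NAMES are placeholders chosen for illustration; the real code-to-name mapping ships with nmsg.

```python
from datetime import datetime, timedelta, timezone


def payload_time(time_sec, time_nsec):
    """Combine time_sec and time_nsec into a UTC datetime.

    datetime only stores microseconds, so the nanosecond part is truncated.
    """
    base = datetime.fromtimestamp(time_sec, tz=timezone.utc)
    return base + timedelta(microseconds=time_nsec // 1000)


def payload_type(vid, msgtype, names):
    """Return the printable name for (vid, msgtype), or "vid:msgtype" if unknown."""
    return names.get((vid, msgtype), "%d:%d" % (vid, msgtype))


# Placeholder codes for illustration only; consult nmsg for the real values.
EXAMPLE_NAMES = {
    (1, 7): "base:dns",
    (1, 5): "base:ipconn",
}
```

Keeping the lookup table as a parameter means the same function works with whatever vendor and message type codes the installed modules actually define.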
Base Message Modules
Accompanying nmsg are the vendor base encoding modules. These provide protobuf serialization for a handful of common use cases. Currently included are the following modules:
dns: For encoding DNS RRs, RRsets, and question RRs.
dnsqr: For capturing DNS query/response state. This message type is used by Farsight’s Passive DNS sensors.
email: For describing email message metadata relating to unsolicited email messages (colloquially referred to as “spam”).
encode: For encapsulating data in other generic formats for transport across SIE. Supported formats are text, JSON, YAML, MsgPack, and XML.
http: For representing hits to HTTP sinkholes.
ipconn: For describing an IP connection, a five-tuple that includes the transport layer protocol.
linkpair: For representing links between web pages.
logline: For representing a single line from a log file (e.g., syslog).
ncap: For representing legacy NCAP data.
packet: For representing an IPv4 or IPv6 packet.
pkt: A legacy encoder for representing packet data, deprecated in favor of packet.
xml: For representing XML data.
SIE Message Modules
Farsight maintains a separate package, sie-nmsg, that contains a group of message module plug-ins specifically designed for Farsight’s SIE. These plug-ins are:
delay: A legacy encoder used to generate a reduction of SIE Channel 202 containing transaction latencies.
dnsdedupe: For encoding de-duplicated and de-duplicated/verified Passive DNS traffic.
newdomain: For encoding Newly Observed Domains (NOD) traffic.
qr: A legacy encoder intended for use with an early version of the DNSDB lookup server.
reputation: For encoding Distributed Reputation Whiteboard data, an experimental service developed by Farsight to facilitate the real-time sharing of reputation data without a priori knowledge of data types.
Coming up
The next article in the NMSG series will introduce the libnmsg C programming API.
Mike Schiffman is a Protocol Legerdemainist for Farsight Security, Inc.
Read the next part in this series: Farsight’s Network Message, Volume 4: The C Programming API