Farsight TXT Record

Network Message, Volume 3: Headers and Encoding

Written by: Mike Schiffman
Published on: Feb 11, 2015

Abstract

This article is the third in a multi-part blog series intended to introduce and acquaint the user with Farsight Security's NMSG suite. This article explores some of the low-level implementation details of the NMSG protocol, including header composition and data encoding.

Before reading this article, it is recommended that you read Farsight's Network Message, Volume 1: Introduction to NMSG and Farsight's Network Message, Volume 2: Introduction to nmsgtool. This article covers NMSG (protocol) version 2 and nmsg (C library) version 0.9.1.

The NMSG header

NMSG units begin with a small 10-octet header, as depicted below:

     0                   1                   2                   3  
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |      'N'      |      'M'      |      'S'      |      'G'      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |     Flags     |    Version    |            Length             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         Length (cont)         |           Payload(s)
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ . . . . . . . . . . . . . . . .

The NMSG header always starts with the four-octet magic value: N M S G. The Flags octet is next. Depending on whether the payload is a fragment and/or compressed, it carries one, both, or neither of the following flags:

  • NMSG_FLAG_ZLIB: Payload(s) is/are compressed.
  • NMSG_FLAG_FRAGMENT: Payload is a fragment.

The Version octet should be 2. The final header field, Length, is an unsigned four-octet integer in network byte order that holds the length in octets of the payload(s).
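
As a rough illustration of the layout above, the following Python sketch unpacks the 10-octet header with the standard struct module. The numeric flag values (0x01 and 0x02) and the helper name parse_header are assumptions made for this example and are not given in this article; check them against the nmsg headers before relying on them.

```python
import struct

# Assumed bit values for the two flags; verify against the nmsg headers.
NMSG_FLAG_ZLIB = 0x01
NMSG_FLAG_FRAGMENT = 0x02

# "!" selects network byte order with no padding:
# 4-octet magic, 1-octet flags, 1-octet version, 4-octet length.
HEADER = struct.Struct("!4sBBI")


def parse_header(data):
    """Parse the fixed NMSG header from the start of data (hypothetical helper)."""
    if len(data) < HEADER.size:
        raise ValueError("need at least %d octets" % HEADER.size)
    magic, flags, version, length = HEADER.unpack_from(data)
    if magic != b"NMSG":
        raise ValueError("bad magic value")
    if version != 2:
        raise ValueError("unsupported NMSG version: %d" % version)
    return {
        "compressed": bool(flags & NMSG_FLAG_ZLIB),
        "fragment": bool(flags & NMSG_FLAG_FRAGMENT),
        "version": version,
        "length": length,
    }


# A hand-built header: compressed, not a fragment, 300 octets of payload.
sample = b"NMSG" + bytes([NMSG_FLAG_ZLIB, 2]) + (300).to_bytes(4, "big")
header = parse_header(sample)
```

The Length value tells the reader how many octets to consume after the header before looking for the next NMSG unit.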

NMSG payload(s) are encoded using Google Protocol Buffers, which are introduced in the next section.

Google Protocol Buffers

Google Protocol Buffers (sometimes referred to as “protobufs”) are an efficient, language- and platform-neutral way to serialize arbitrary structured data. Protobufs are comparable to XML but smaller, faster, and more efficient. This makes them an ideal solution for encoding the variably typed data that flows through our Security Information Exchange (SIE).

To use protobufs in a program (or library code such as nmsg), the programmer first needs to define what the source data looks like. Continuing the XML analogy, this definition plays a role similar to an XML schema. It is written using a simple specification language and saved to a text file with a .proto extension. Once defined, this file is compiled using one of the protobuf compilers. This produces header and source files containing the API to serialize the data.

The nmsg library is written in C, so it uses the protobuf-c compiler to generate the API code for its protobuf serialization.

If you want to learn more, Google maintains great documentation. The following protobuf-heavy sections will make more sense if you are familiar with the .proto specification language.

NMSG Protobuf Data

After the header, the first protobuf-encoded message will be either of type Nmsg (which carries one or more NmsgPayload messages) or of type NmsgFragment (which carries one fragment of a larger NMSG unit). Both are discussed below.

The .proto definition for Nmsg is shown below:

   message Nmsg
   {
       repeated NmsgPayload    payloads     = 1;
       repeated uint32         payload_crcs = 2;
       optional uint32         sequence     = 3;
       optional uint64         sequence_id  = 4;
   }

  • payloads: The actual NMSG payloads; the .proto for these is explained below.
  • payload_crcs: CRC values used for error detection.
  • sequence: The optional sequence number.
  • sequence_id: The optional sequence number space. This is a randomized 64-bit number identifying the sequence number space that the ‘sequence’ parameter exists in. The sequence_id is used by NMSG consumers to uniquely identify sequence number “flows”.
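
To make the encoding concrete, here is a minimal Python sketch that walks the top-level fields of a serialized Nmsg message by hand, using the standard protobuf wire format (a varint key holding the field number and wire type, followed by the value). It is only an illustration: the function names are hypothetical, real code would use the generated protobuf API, and the sketch assumes the repeated payload_crcs field is not packed, as the definition above suggests.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last octet
            return result, pos
        shift += 7


def walk_nmsg(buf):
    """Collect the top-level Nmsg fields from serialized bytes (hypothetical helper)."""
    msg = {"payloads": [], "payload_crcs": [], "sequence": None, "sequence_id": None}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint (uint32, uint64)
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (embedded message, bytes)
            size, pos = read_varint(buf, pos)
            value = buf[pos:pos + size]
            pos += size
        else:
            raise ValueError("wire type %d not expected in Nmsg" % wire_type)
        if field == 1:
            msg["payloads"].append(value)  # still-serialized NmsgPayload
        elif field == 2:
            msg["payload_crcs"].append(value)
        elif field == 3:
            msg["sequence"] = value
        elif field == 4:
            msg["sequence_id"] = value
    return msg


# Hand-built message: one 2-octet payload, sequence = 7, sequence_id = 300.
encoded = b"\x0a\x02\x08\x01" + b"\x18\x07" + b"\x20\xac\x02"
decoded = walk_nmsg(encoded)
```

Each entry in payloads is itself a serialized NmsgPayload message and could be walked the same way.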

If the NMSG_FLAG_FRAGMENT flag is set in the NMSG header, then the data part is an NmsgFragment protobuf message, as shown below:

   message NmsgFragment
   {
       required uint32         id           = 1;
       required uint32         current      = 2;
       required uint32         last         = 3;
       required bytes          fragment     = 4;
       optional uint32         crc          = 5;
   }

  • id: Fragment ID used by all fragments in this group (chosen at random by the NMSG library).
  • current: The current fragment in the list.
  • last: The last fragment in the list.
  • fragment: The actual fragment bytes.
  • crc: The CRC of reassembled NMSG.
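
The fields above are enough to sketch how a consumer might put fragments back together. The Python example below is a simplified illustration, not the nmsg library's implementation: the Reassembler class is hypothetical, it assumes that current counts from zero up to last, and it skips the optional crc check because the CRC variant is not specified in this article.

```python
class Reassembler:
    """Groups fragments by id and joins them once every piece has arrived."""

    def __init__(self):
        self._groups = {}  # fragment id -> {current index: fragment bytes}

    def add(self, frag_id, current, last, fragment):
        """Store one fragment; return the joined bytes when complete, else None."""
        parts = self._groups.setdefault(frag_id, {})
        parts[current] = fragment
        # Assumption: indexes run 0..last, so last + 1 pieces are needed.
        if not all(i in parts for i in range(last + 1)):
            return None
        del self._groups[frag_id]
        return b"".join(parts[i] for i in range(last + 1))


# Fragments may arrive out of order; only the final add() yields data.
r = Reassembler()
first = r.add(9, 1, 2, b"bb")
second = r.add(9, 0, 2, b"aa")
whole = r.add(9, 2, 2, b"cc")
```

A production consumer would also want to expire incomplete groups after a timeout so that lost fragments do not accumulate in memory.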

The NmsgPayload messages contain payload data and are defined as follows:

   message NmsgPayload
   {
       required uint32         vid          = 1;
       required uint32         msgtype      = 2;
       required int64          time_sec     = 3;
       required fixed32        time_nsec    = 4;
       optional bytes          payload      = 5;
       optional uint32         source       = 7;
       optional uint32         operator     = 8;
       optional uint32         group        = 9;
   }

  • vid: The Farsight-assigned NMSG vendor ID. Each vendor has a printable name (base, SIE, etc.) and a corresponding numeric code, which is the value used here.
  • msgtype: A vendor-specific message type code that signals the serialization type used to encode the payload. Like vid, msgtype has a printable name (dns, encode, ipconn, etc.) and a corresponding numeric code. The message types are described in more detail below. The vid together with the msgtype can be used to determine the type of data contained in the payload.
  • time_sec: Seconds timestamp of when payload was generated.
  • time_nsec: Nanoseconds timestamp of when payload was generated.
  • payload: The actual NMSG payload data.
  • source: Optional user-defined unsigned 32-bit value, used to uniquely identify an organization submitting data to SIE.
  • operator: Optional unsigned 32-bit value, used to further differentiate the sender of the data. The value is an integer on the wire and on disk, but is intended to be translated into a symbolic string for presentation by a lookup against the nmsg.opalias file.
  • group: Optional user-defined unsigned 32-bit value, used for fine-grained winnowing. The value is an integer on the wire and on disk, but is intended to be translated into a symbolic string for presentation by a lookup against the nmsg.gralias file.
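
Because the payload timestamp is split across two fields, a consumer has to recombine them for display. The short Python sketch below shows one way to do that. It assumes time_sec counts seconds since the Unix epoch (the article does not say so explicitly), and payload_timestamp is a hypothetical helper name.

```python
from datetime import datetime, timezone


def payload_timestamp(time_sec, time_nsec):
    """Render time_sec/time_nsec as a UTC string with nanosecond precision."""
    if not 0 <= time_nsec < 1_000_000_000:
        raise ValueError("time_nsec must be in [0, 1e9)")
    # datetime only resolves microseconds, so format the nanoseconds by hand.
    whole = datetime.fromtimestamp(time_sec, tz=timezone.utc)
    return whole.strftime("%Y-%m-%dT%H:%M:%S") + ".%09dZ" % time_nsec


stamp = payload_timestamp(0, 5)
```

Keeping the nanoseconds as a separate integer avoids the precision loss that would come from folding both fields into a floating-point number.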

Base Message Modules

Accompanying nmsg are the vendor base encoding modules. These provide protobuf serialization for a handful of common use cases. Currently included are the following modules:

  • dns: For encoding DNS RRs, RRsets, and question RRs.
  • dnsqr: For capturing DNS query/response state. This message type is used by Farsight's Passive DNS sensors.
  • email: For describing email message metadata relating to unsolicited email messages (colloquially referred to as “spam”).
  • encode: For encapsulating data in other generic formats for transport across SIE. Supported are text, JSON, YAML, MsgPack, and XML.
  • http: For representing hits to HTTP sinkholes.
  • ipconn: For describing an IP connection, a five-tuple that includes the transport layer protocol.
  • linkpair: For representing links between web pages.
  • logline: For representing a single line from a log file (e.g., syslog).
  • ncap: For representing legacy NCAP data.
  • packet: For representing an IPv4 or IPv6 packet.
  • pkt: A legacy encoder for representing packet data, deprecated in favor of packet.
  • xml: For representing XML data.

SIE Message Modules

Farsight maintains a separate package, sie-nmsg, that contains a group of message module plug-ins specifically designed for Farsight's SIE. These plug-ins are:

  • delay: A legacy encoder used to generate a reduction of SIE Channel 202 containing transaction latencies.
  • dnsdedupe: For encoding de-duplicated and de-duplicated/verified Passive DNS traffic.
  • newdomain: For encoding Newly Observed Domains (NOD) traffic.
  • qr: A legacy encoder intended for use with an early version of the DNSDB lookup server.
  • reputation: For encoding Distributed Reputation Whiteboard data, an experimental service developed by Farsight to facilitate the real-time sharing of reputation data without a priori knowledge of data types.

Coming up

The next article in the NMSG series will introduce the libnmsg C programming API.

Mike Schiffman is a Protocol Legerdemainist for Farsight Security, Inc.

Read the next part in this series: Farsight's Network Message, Volume 4: The C Programming API