Introduction

Data passes across the Farsight Security, Inc. Security Information Exchange every second of every day. While a significant percentage of it is related to Passive DNS replication, we occasionally participate in sinkhole administration efforts for botnet cleanup operations. We forward that information in real time to participating organizations that in turn fan that data out to ISPs and Internet security companies to help identify and remediate affected clients.

As part of curating that feed, and with the permission of the sinkhole operator, we also normally archive that data. Doing so has paid off for approved and vetted researchers as we can make that data available for subsequent retrospective analysis.

Some examples follow.

“Post-Mortem of a Zombie: Conficker Cleanup After Six Years”

The Conficker sinkhole is the longest effort we’ve facilitated. Since the Conficker botnet was sinkholed, infected Windows clients have continued trying to “phone home” for updates for almost seven years. Today the Conficker botnet may be about a sixth of its peak size, but it’s still present on up to one million clients.

We’ve kept a Conficker data archive online in our data centers for download by researchers and remediators alike, but it wasn’t until recently that a team from Delft University of Technology (TU Delft, NL) completed a study of long-term remediation efforts using that data. Their work was accepted and presented at USENIX Security 2015 earlier this month (see link above).

“DNS Changer Remediation Study”

When we took part in the takedown of the DNS Changer botnet, we made periodic archives available for download by different remediation teams, and we maintained the data for as long as we could. A group from Georgia Tech (US) was able to take the data and use it to compare the notification methods used during remediation efforts and determine which were most effective.

“Developing Security Reputation Metrics for Hosting Providers”

More recently, DNS researchers at TU Delft (NL) wanted to look back, on a daily basis, at Passive DNS DNSDB data, something we don’t normally keep (it gets rolled up into monthly databases). By special arrangement, we kept daily DNSDB database dumps long enough for them to use correlations among DNS resources to develop security metrics for ISPs and hosting providers. Their work was also published earlier this month at USENIX Security (CSET ’15).

Lessons Learned

Along the way we’ve learned many lessons about long-term storage that might be helpful for other projects:

  • ZFS is amazing technology for long-term archiving. If you dump data to tape or a disk and take it offline, you risk not being able to retrieve it later because you don’t know whether it went bad while offline. One needs to regularly verify the data (a process ZFS calls “scrubbing”) so that drive or disk-sector failures are found in time to keep the data available. The checksum operations performed when data is written have helped us discover drive issues before the drives failed, or even before SMART technology could predict a failure. We’ve preserved data even through flaky SAS controllers and multiple-disk failures – sometimes two disks in the same RAID set. We now use RAID-Z3 where we can.
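    As a rough sketch of the workflow above (the pool name “archive” and the FreeBSD-style device names are hypothetical examples, not our actual configuration), the relevant ZFS commands look like:

    ```
    # Create a triple-parity pool (RAID-Z3) from eight disks in one chassis.
    # Pool and device names are examples; adjust for your system.
    zpool create archive raidz3 da0 da1 da2 da3 da4 da5 da6 da7

    # Walk every block and verify its checksum ("scrubbing"); run this
    # regularly (e.g. from a monthly cron job) so latent sector errors are
    # found while they can still be repaired from parity.
    zpool scrub archive

    # Report only pools that have errors or degraded devices.
    zpool status -x
    ```

    With triple parity, a RAID-Z3 group survives three simultaneous disk failures, which is what makes the multiple-disk-failure scenarios described above recoverable.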

  • Sometimes storage systems can become too large. We once spread a ZFS file system across several disk chassis, and when a chassis or cable started having issues, it caused problems. If a file system is limited to a single chassis, it is possible to move the data or chassis around between servers. As the number of disks within a RAID group increases, so does the risk of a multiple-disk failure causing problems with availability.

  • Utilize a date-driven hierarchy for directories (YYYY/MM/DD). This is a tried-and-true method for storing all manner of data, and when used for sinkhole data it makes it easier to migrate storage sets between systems or storage chassis. A client can access multiple file servers via NFS, and the directory structure provides a clear boundary for NFS mount points. If users download data regularly, splitting the data into multiple directories also helps programs like rsync limit how many files need to be processed during synchronization, reducing transfer overhead.
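    A minimal sketch of such a layout (the archive root and log filename here are hypothetical examples):

    ```shell
    #!/bin/sh
    # Sketch: file an incoming sinkhole log under a YYYY/MM/DD hierarchy.
    ARCHIVE=$(mktemp -d)/sinkhole       # stand-in for e.g. /archive/sinkhole
    DATE_DIR=$(date -u +%Y/%m/%d)       # e.g. 2015/08/27

    mkdir -p "$ARCHIVE/$DATE_DIR"
    echo "sinkhole record" > "$ARCHIVE/$DATE_DIR/conn.log"

    # A client can then sync one day at a time, keeping rsync's file list
    # small, along the lines of:
    #   rsync -a archive-host:/archive/sinkhole/2015/08/27/ ./2015/08/27/
    ```

    Because each day is a self-contained directory, a day (or a month, or a year) is also a natural unit to move between chassis or expose as an NFS mount point.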

  • Where possible, log data twice: once on the sinkhole data collection system and once where the data is aggregated and permanently archived. The sinkhole copy can be automatically purged after a few days, but keeping it around for that long provides the ability to recover from a temporary logging or storage problem in the long-term archive. Monitoring data availability as if you were a client will also help you catch issues in time to recover using the limited storage on the sinkhole server.
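    The purge side of this arrangement can be sketched as follows (assuming a GNU userland for `touch -d`; the spool path and the 3-day window are hypothetical examples):

    ```shell
    #!/bin/sh
    # Sketch: the sinkhole collector keeps only a few days of local buffer;
    # the permanent copy lives in the long-term archive.
    SPOOL=$(mktemp -d)

    # Simulate one stale log (4 days old) and one fresh log.
    touch -d "4 days ago" "$SPOOL/old.log"
    touch "$SPOOL/new.log"

    # Purge files older than 3 days from the collector's local spool.
    find "$SPOOL" -type f -mtime +3 -delete
    ```

    Run from cron, a one-line `find ... -mtime +N -delete` like this keeps the collector's disk usage bounded while preserving the short recovery window described above.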

  • Be clear about your data retention policies. Without a shared understanding among all parties at the start of a collection effort, some may assume we can collect data indefinitely and never ask us to stop. Several times we’ve had to look for more storage resources for Conficker data, and organizations like ICANN, Microsoft, and PacketForensics have stepped up to make disks or chassis available (thank you!).

  • Be cognizant of operational costs. As disk drives increase in density over time, the burden of storing data goes down, but one still needs to budget for continuous operation of the storage. It’s more sustainable to keep logs available online for a fixed period, with a well-defined deletion policy, so that researchers who want or need longer-term access to the data know they must maintain their own local copy. We can also make arrangements with organizations like DHS PREDICT for sinkhole data or DNS-OARC for DNS data.

Conclusion

Farsight Security is proud of our past and current public-private collaboration with law enforcement, researchers, and Internet security organizations that put DNS and sinkhole data to good use. We look forward to making resources available for similar future efforts. If you’re a researcher who needs data, or an operator who wants to make sinkhole data available to researchers and/or industry in real time, we encourage you to contact us.

Eric Ziegast is a Senior Distributed Systems Engineer for Farsight Security, Inc.