Tools To Quickly Extract Indicators of Compromise
What are Indicators of Compromise (IoCs)?
Back in 2009, when an outbreak of the H1N1 influenza strain (known as the swine flu) was deemed a global pandemic, Mike Cloppert published a series on threat intelligence and the cyber kill chain. In that series, Mike classified three types of indicators: atomic, computed, and behavioral. A year or so later, Mandiant used the term “Indicators of Compromise” in their M-Trends report, and days later, Matt Frazier of Mandiant published the blog post Combat the APT by Sharing Indicators of Compromise.
So what are these illustrious IoCs? According to TechTarget, they are “pieces of forensic data, such as data found in system log entries or files, that identify potentially malicious activity on a system or network.” Although the term IoC was coined over a decade ago, the SANS 2021 Cyber Threat Intelligence (CTI) Survey calls out “specific IoCs to plug into IT and security infrastructure to block or find attacks” as one of the top answers from respondents when asked about information most useful to CTI operations. One challenge is pulling relevant indicators out of external reports, such as vendor research reports (typically PDFs) or blogs (plain text/HTML), so you and your team can take action, but more on that later.
Most Common IoCs
There’s a big world of indicators beyond the ones that seem to get blogged/tweeted/reported all the time, and it’s worth pursuing them even if it’s more work. Most industry reports share indicators in the form of IP addresses, domain names, file hashes, and sometimes other hashes (such as those of SSL certificates). To give a broader set of examples, I’ve included the critical IoCs outlined in an article published in Dark Reading by Ericka Chickowski:
- Unusual Outbound Network Traffic
- Anomalies in Privileged User Account Activity
- Geographical Irregularities
- Log-In Red Flags
- Increases in Database Read Volume
- HTML Response Sizes
- Large Numbers of Requests for the Same File
- Mismatched Port-Application Traffic
- Suspicious Registry or System File Changes
- Unusual DNS Requests
- Unexpected Patching of Systems
- Mobile Device Profile Changes
- Bundles of Data in the Wrong Place
- Web Traffic with Unhuman Behavior
- Signs of DDoS Activity
The Marriage of Internal and External Indicators
IoCs, although valuable, are not the be-all and end-all of CTI. Forrester’s Intelligence Cycle collection stage and Katie Nickels’ description of The Cycle of Cyber Threat Intelligence highlight the importance of first looking inward (much like my pup, Peanut).
This means asking yourself: What data do you need to answer your intelligence questions? How much of that exists internally? How much do you need to acquire externally? What requirements should you consider when acquiring external intel? Katie highlighted this in our RSA 2020 Human Element Breaking Badness mini-series just before the clock struck midnight, though rather than transforming into pumpkins, we morphed into antisocial turtles for 12 months. In this discussion, she references the Collection Management Framework, an extremely helpful tool for building these collection requirements. The bottom line is that one does not simply spin up an effective CTI strategy overnight.
It is built on a foundation of understanding your requirements, collection processes, stakeholders, etc. I prefer to sum this up with an easter egg from the famous meeting between Ron Livingston and the “Bobs” in Office Space: One must plan to plan.
Credit: Office Space, Twentieth Century Fox
Once you have built out the items listed above, you can realize the true value of IoCs through enrichment: decorating your raw internal data and internal threat intelligence with external indicators from industry reports and research, ISACs, and so forth.
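As a rough illustration of enrichment, the sketch below tags internal log records with any external indicators they contain. The field names (`dest_ip`, `domain`, `file_hash`), the sample indicators, and the source labels are all hypothetical; a real pipeline would pull these from your own log schema and intel feeds.

```python
# Hypothetical external indicators, keyed by value, with the source they came from.
EXTERNAL_IOCS = {
    "203.0.113.5": "vendor-report-a",
    "bad.example.com": "isac-feed",
    "d41d8cd98f00b204e9800998ecf8427e": "vendor-report-b",
}

# Hypothetical field names; adjust these to match your own log schema.
INDICATOR_FIELDS = ("dest_ip", "domain", "file_hash")


def enrich(records, external_iocs, fields=INDICATOR_FIELDS):
    """Return copies of the records, annotated with matching external indicators."""
    enriched = []
    for record in records:
        # Collect every field value that also appears in the external indicator set.
        matches = sorted(
            {record[f] for f in fields if record.get(f) in external_iocs}
        )
        sources = sorted({external_iocs[m] for m in matches})
        enriched.append({**record, "ioc_matches": matches, "ioc_sources": sources})
    return enriched


# Hypothetical internal log records.
SAMPLE_LOGS = [
    {"host": "ws-01", "dest_ip": "203.0.113.5", "domain": "bad.example.com"},
    {"host": "ws-02", "dest_ip": "198.51.100.20", "domain": "intranet.example.net"},
]

for row in enrich(SAMPLE_LOGS, EXTERNAL_IOCS):
    print(row["host"], row["ioc_matches"], row["ioc_sources"])
```

Keeping the source alongside each match makes it easier to trace a hit back to the report it came from.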
Tools to Extract External IoCs
All right, so by this point this blog has described IoCs and reminded us all not to use them in a vacuum. Now, on to the reason you’re really here. Extracting IoCs from different formats is about as fun as a root canal. My colleague Taylor Wilkes-Pierce recently joked that vendors might as well be sharing IoCs via skywriting. Point being, it can be tough to pull out helpful indicators to enrich your internal threat intel. You need to extract these indicators from reports so that you can determine, in a scalable and automated way, whether your environment is encountering any of the reported indicators.
Now skywriting might be a little extreme, but oftentimes you’ll see IoCs embedded in PDFs or in HTML/plain text. So in hopes of making your work a little easier, I pulled together a list of tools that are already out on the Information Superhighway to help you effectively extract IoCs and more.
ioc-parser
Description: IoC Parser is a tool to extract indicators of compromise from security reports in PDF format.
Author: Palo Alto Networks
Language: Python
Why this tool?: This is a simple Python script that lets you specify the input file format as well as the output format (options include csv/json/yara/autofocus).
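To show why a choice of output format matters downstream, here is a minimal sketch that serializes the same indicators as CSV and as line-delimited JSON using only the standard library. The record layout (`type`, `value`, `source`) is an assumption made for illustration and is not the actual output schema of ioc-parser.

```python
import csv
import io
import json

# Hypothetical extracted indicators; the field names are illustrative only.
INDICATORS = [
    {"type": "ipv4", "value": "203.0.113.5", "source": "report.pdf"},
    {"type": "domain", "value": "bad.example.com", "source": "report.pdf"},
]

FIELDS = ["type", "value", "source"]


def to_csv(indicators):
    """Render indicators as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(FIELDS)
    for ind in indicators:
        writer.writerow([ind[f] for f in FIELDS])
    return buf.getvalue()


def to_json_lines(indicators):
    """Render indicators as one JSON object per line."""
    return "\n".join(json.dumps(ind, sort_keys=True) for ind in indicators)


print(to_csv(INDICATORS))
print(to_json_lines(INDICATORS))
```

CSV tends to suit spreadsheets and quick blocklist imports, while one JSON object per line is convenient for log pipelines that ingest records one at a time.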
APTnotes
Description: APTnotes is a repository of publicly-available papers and blogs (sorted by year) related to malicious campaigns/activity/software that have been associated with vendor-defined APT (Advanced Persistent Threat) groups and/or tool-sets.
Authors: Kiran Bandla, Santiago Castro
Language: Python
Why this tool?: There is a fantastic pre-existing library of APT reports in PDF form. This tool pairs quite well with ioc-parser, iocextract, and IOCextractor.
EXTRACTOR: Extracting Attack Behavior from Threat Reports
Description: EXTRACTOR makes no strong assumptions about the text and is capable of extracting attack behaviors as provenance graphs from unstructured text. Our evaluation results show that EXTRACTOR can extract concise provenance graphs from CTI reports and show that these graphs can successfully be used by cyber-analytics tools in threat-hunting.
Author: Kiavash Satvat
Language: N/A
Why this tool?: EXTRACTOR is capable of extracting attack behaviors as provenance graphs from unstructured text.
iocextract
Description: This library extracts URLs, IP addresses, MD5/SHA hashes, email addresses, and YARA rules from text corpora. It includes some encoded and “defanged” IoCs in the output, and optionally decodes/refangs them.
Author: InQuest
Language: Python
Why this tool?: iocextract is great at pulling out URLs, and specifically demonstrates this value with pulling defanged URLs out of tweets. You can also quickly “refang” URLs.
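For readers unfamiliar with defanging, the sketch below shows the idea with plain regular expressions: find URLs written as `hxxp://` with bracketed dots, and optionally turn them back into their original form. This is a standard-library illustration under simplified assumptions, not iocextract's implementation or API, and it covers only a few common defang styles.

```python
import re

# Matches normal or defanged URLs such as hxxp://evil[.]example[.]com/path
DEFANGED_URL = re.compile(r"\bh(?:xx|tt)ps?(?:\[:\]|:)//\S+", re.IGNORECASE)


def refang(value):
    """Undo a few common defang styles: [.], (.), [dot], [:], and hxxp."""
    value = re.sub(r"\[\.\]|\(\.\)|\[dot\]", ".", value, flags=re.IGNORECASE)
    value = re.sub(r"\[:\]", ":", value)
    value = re.sub(r"^hxxp(s?)://", r"http\1://", value, flags=re.IGNORECASE)
    return value


def defang(value):
    """Make a URL non-clickable by rewriting the scheme and bracketing dots."""
    value = re.sub(r"^http(s?)://", r"hxxp\1://", value, flags=re.IGNORECASE)
    return value.replace(".", "[.]")


def extract_urls(text, do_refang=False):
    """Yield URLs found in text, stripping trailing sentence punctuation."""
    for match in DEFANGED_URL.finditer(text):
        url = match.group(0).rstrip(".,;)")
        yield refang(url) if do_refang else url


# Hypothetical tweet-style text.
TWEET = (
    "New C2 at hxxp://malware[.]example[.]net/gate.php, "
    "also seen: hxxps://cdn[.]example[.]org/x."
)

print(list(extract_urls(TWEET)))
print(list(extract_urls(TWEET, do_refang=True)))
```

Refanged values are live URLs again, so it is usually safer to refang only at the point where a tool needs the real value.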
IOCextractor
Description: IoC (Indicator of Compromise) Extractor: a program to help extract IoCs from text files. The general goal is to speed up the process of parsing structured data (IoCs) from unstructured or semi-structured data (like case reports or security bulletins).
Authors: Bryan Worrell, Stephen Brannon, William Gibb
Language: Python
Why this tool?: This program helps extract indicators of compromise from a plain text file. It currently identifies MD5 hashes, IPv4 addresses, domains, URLs, and email addresses.
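The indicator types listed above can each be approximated with a regular expression. The following is a minimal sketch of that approach, not the code used by IOCextractor; the patterns are deliberately simple assumptions and will produce false positives on real reports (for example, file names such as `report.doc` look like domains).

```python
import re

# Simplified patterns, one per indicator type. All groups are non-capturing
# so that findall() returns the full matched string.
PATTERNS = {
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "domain": re.compile(
        r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b", re.IGNORECASE
    ),
}


def extract_iocs(text):
    """Return a dict mapping indicator type to a sorted list of unique matches."""
    results = {}
    for kind, pattern in PATTERNS.items():
        # Strip trailing sentence punctuation that the URL pattern may swallow.
        found = {match.rstrip(".,;:)") for match in pattern.findall(text)}
        results[kind] = sorted(found)
    return results


# Hypothetical report excerpt.
REPORT = (
    "The dropper (MD5 d41d8cd98f00b204e9800998ecf8427e) beacons to 203.0.113.5 "
    "and fetches http://bad.example.com/stage2. Contact: abuse@example.org"
)

for kind, values in extract_iocs(REPORT).items():
    print(kind, values)
```

Note that the domain pattern also reports domains that appear inside URLs and email addresses, which may or may not be what you want when deduplicating.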
So I Extracted IoCs, Now What?
I’d love to leave you with a final thought: the creator of the infamous Pyramid of Pain, David Bianco, has always been explicit about his intent for this concept.
The entire point of detecting indicators is to respond to them, and once you can respond to them quickly enough, you have denied the adversary the use of those indicators when they are attacking you. Not all indicators are created equal, though, and some of them are far more valuable than others. – David Bianco
Credit: David Bianco, Pyramid of Pain
So, while you are pulling out indicators with help from the tools above, be sure to reference David’s work. Check where exactly those IoCs fall on the pyramid, and make sure they are actually helping you detect the adversary’s activity.
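One lightweight way to keep the pyramid in view is to tag each extracted indicator with its level and sort accordingly. The sketch below uses the six levels and pain labels from David's Pyramid of Pain; the short type keys (`hash`, `ip`, and so on) and the sample indicators are hypothetical.

```python
# Pyramid of Pain levels, from the base (1) to the apex (6), with the pain
# each one causes the adversary when you deny it to them.
PYRAMID_OF_PAIN = {
    "hash": (1, "Trivial"),
    "ip": (2, "Easy"),
    "domain": (3, "Simple"),
    "artifact": (4, "Annoying"),  # network/host artifacts
    "tool": (5, "Challenging"),
    "ttp": (6, "Tough"),
}


def rank_by_pain(indicators):
    """Sort indicators so the ones that hurt the adversary most come first."""
    return sorted(
        indicators,
        key=lambda ind: PYRAMID_OF_PAIN[ind["type"]][0],
        reverse=True,
    )


# Hypothetical extracted indicators.
EXTRACTED = [
    {"type": "hash", "value": "d41d8cd98f00b204e9800998ecf8427e"},
    {"type": "domain", "value": "bad.example.com"},
    {"type": "ip", "value": "203.0.113.5"},
]

for ind in rank_by_pain(EXTRACTED):
    level, pain = PYRAMID_OF_PAIN[ind["type"]]
    print(level, pain, ind["value"])
```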
Additional Resources
- Extracting Indicators of Compromise (IoCs) From Malware Using Basic Static Analysis
- How To: Extract Network Indicators of Compromise (IoCs) from Maldoc Macros — Part 1
- Automatic Extraction of Indicators of Compromise for Web Applications
- TIMiner: Automatically Extracting and Analyzing Categorized Cyber Threat Intelligence from Social Data
- Collecting Indicators of Compromise from Unstructured Text of Cybersecurity Articles Using Neural-Based Sequence Labelling