The 4 Step Guide to Exploring Attacker Infrastructure with Web Assets
What do I Mean by Web Assets?
In general, when I refer to web assets, I mean files that are loaded into the main HTML of a site via HTML tags. Examples include:
- JavaScript files (via script tags)
- CSS files (via link tags)
- Images (via img tags)
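To make the list above concrete, here is a minimal sketch of pulling these assets out of a page with Python's standard-library HTML parser. The class and function names (AssetCollector, extract_assets) are my own for illustration, and the sketch assumes stylesheets are referenced from link tags with rel="stylesheet". It ignores inline script and style content.

```python
from html.parser import HTMLParser


class AssetCollector(HTMLParser):
    """Collect externally loaded assets in the order they appear."""

    def __init__(self):
        super().__init__()
        self.assets = []  # list of (kind, url) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # rel may hold several space-separated tokens, or be missing entirely
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if tag == "script" and attrs.get("src"):
            self.assets.append(("js", attrs["src"]))
        elif tag == "link" and attrs.get("href") and "stylesheet" in rel_tokens:
            self.assets.append(("css", attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.assets.append(("img", attrs["src"]))


def extract_assets(html):
    """Return the (kind, url) pairs for scripts, stylesheets, and images."""
    parser = AssetCollector()
    parser.feed(html)
    parser.close()
    return parser.assets
```

Feeding it the HTML of a page returns the assets in document order, which keeps the load order available for later comparison.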
Programmers are lazy… threat actors are no exception
- Reusing CSS and JS files is easier than writing new ones from scratch
- The set of third-party files loaded into an HTML document, and the order in which they are loaded, is highly variable, and therefore a good potential fingerprint
- Adding JavaScript and CSS to stock files, or inline in the HTML, is an easy way for “lazy” programmers to get script code into pages, and this code is generally reused across their infrastructure rather than written fresh for each site
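The fingerprint idea in the second bullet can be sketched as a hash over the ordered list of asset URLs. This is only one possible approach, not a description of any particular tool. The function name asset_fingerprint is hypothetical, and dropping query strings (to ignore cache-busting parameters) is my own assumption.

```python
import hashlib


def asset_fingerprint(asset_urls):
    """Hash an ordered list of asset URLs into a single hex digest.

    Order is preserved on purpose: two pages that load the same files
    in a different order get different fingerprints.
    """
    normalized = []
    for url in asset_urls:
        # Assumption: query strings are cache-busters, so drop them.
        normalized.append(url.strip().split("?", 1)[0])
    joined = "\n".join(normalized)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Two pages built from the same template should produce the same digest, so grouping crawled pages by this value is a cheap first pass before any closer inspection.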
What Does a Concrete Example Look Like?
Step 1: Choose a Target
I wanted to create a simple example of searching for connected infrastructure, but first, I needed a malicious domain. Thinking like a SOC analyst, I thought it might be nice to look at some web properties that had known phishing components. To facilitate this, I went over to AlienVault’s Open Threat Exchange (OTX) and found an interesting site there.
When I opened this site in a web browser, I noticed that its text centers around earning bitcoins, implying that you can earn them by shortening Facebook links. This site seems to have a number of malicious things going on, including running a number of scripts loaded from .ru domains.
Right now, I’m not trying to analyze what the site does or what its author’s intentions may be. My goal is to verify that it’s the type of malicious site that might get sent to users on our network. To confirm this, let’s look at the domain in Iris:
Iris highlights that this domain has a risk score of 100, citing that it has a high proximity to other malicious domains, and shows evidence of malware and phishing. This is a great indication that I am onto something here! With any luck, I should be able to see strong infrastructure connections using web assets in addition to the things Iris already mentions.
Step 2: Pick an Asset to Search For
When looking at the screenshot for this site above, I had the developer tools open in my browser, which allowed me to see the HTML code as well as the assets that are loaded. For this type of exercise, it is important to pick HTML code that is interesting and unique enough that it might have been created by the attacker. With this in mind, it is critical to avoid commonly used JavaScript and CSS libraries.
Looking at the loaded assets, I noticed one or two that look like they may be pretty common. One example is “includes/ajax/jquery.js.” jQuery is a well-known JavaScript library that many sites use, so it doesn’t make a very good element for us to pivot on. Based on some of my personal experience with web development, I’d wager that “aurblue” is not a common library (and a quick Google search will back me up). Thus, I chose to search for the path “templates/aurblue/components.css.”
An important note here: “components.css” by itself is not going to be a good pivot term. It is such a generic and descriptive name that it will be used across many sites. In this case, it’s the file path that has the unique naming, and the filename that (we hypothesize) has the shared code.
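One rough way to apply this reasoning in code is to look at the directory segments of an asset path and flag the ones that are not generic. The sketch below is an illustration under my own assumptions: GENERIC_SEGMENTS is a hypothetical starter set, and a real one would need to be built from observed data.

```python
from pathlib import PurePosixPath

# Hypothetical set of directory names too common to pivot on.
GENERIC_SEGMENTS = {
    "templates", "includes", "ajax", "js", "css",
    "assets", "static", "img", "images", "lib",
}


def distinctive_segments(path):
    """Return the directory segments of an asset path that are not generic.

    The filename itself is ignored, since names like components.css
    are too common to be useful on their own.
    """
    directories = PurePosixPath(path.lstrip("/")).parts[:-1]
    return [d for d in directories if d.lower() not in GENERIC_SEGMENTS]


print(distinctive_segments("templates/aurblue/components.css"))
print(distinctive_segments("includes/ajax/jquery.js"))
```

With this starter set, the first path yields the single segment "aurblue" and the second yields nothing, which matches the choice of pivot term made above.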
Step 3: Search for Connections
Google probably isn’t the best tool for something like this, but it’s the tool everyone has at hand, so let’s see what I can find by searching on our pivot term.
It is pretty apparent that a number of website metadata tracking sites come up among our first results on Google. This is valuable, as these sites track what types of components are loaded by sites, and so can give me the types of answers (infrastructure correlation) that I’m looking for. If I pick one of these other sites at random, say bandirun[.]com, I can take a look and see if I can confirm my hypothesis that I can explore a threat actor’s infrastructure in this way.
Loading bandirun[.]com up in Iris, I see right away that it has a malicious profile. Interestingly, from Iris’ perspective this time, its malware score is low but its proximity score is high. This is along the lines of what we want to see: it confirms that this site is connected to other known bad sites.
If I look at the Iris domain row for each of 1ink[.]cc and bandirun[.]com, I don’t see any pieces of data that directly relate these two sites. However, Iris is a great infrastructure exploration tool on its own, so let’s do some pivoting and see if I can find something related. If I expand on bandirun[.]com’s IP address, I will get a list of hosts that share that IP.
Now if I scroll through the new list of sites Iris has generated for me, I pretty quickly see that globalmaritimetraining[.]net, which is on the same IP address as bandirun[.]com, shares the same DNS/SOA email address (markabi.twins@gmail[.]com) as 1ink[.]cc.
Bandirun[.]com, expanded on 104.168.58[.]149:
1ink[.]cc:
So the two domains I investigated, though not directly related, have a related attribute (DNS/SOA email address) through a third domain (surfaced by Iris). This shows us that even with a very straightforward approach, I can begin to associate malicious actors’ infrastructure via web assets.
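The chain of pivots above can be expressed as a small graph search, where two domains are neighbors if they share an attribute value. This is a sketch of the idea only, not how Iris works. The records below contain just the attributes mentioned in this article (kept defanged), and find_link is a hypothetical helper name.

```python
from collections import deque

# Only the attributes stated in the text; real records would hold many more.
RECORDS = {
    "1ink[.]cc": {"soa_email": "markabi.twins@gmail[.]com"},
    "bandirun[.]com": {"ip": "104.168.58[.]149"},
    "globalmaritimetraining[.]net": {
        "ip": "104.168.58[.]149",
        "soa_email": "markabi.twins@gmail[.]com",
    },
}


def find_link(records, start, goal):
    """Breadth-first search for a chain of shared attributes between domains."""
    # Index each (attribute, value) pair to the domains that carry it.
    index = {}
    for domain, attrs in records.items():
        for item in attrs.items():
            index.setdefault(item, set()).add(domain)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        domain = path[-1]
        if domain == goal:
            return path
        for item in records.get(domain, {}).items():
            for neighbor in sorted(index[item]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(path + [item, neighbor])
    return None  # no chain of shared attributes found


for step in find_link(RECORDS, "bandirun[.]com", "1ink[.]cc"):
    print(step)
```

The returned path alternates between domains and the shared attribute that connects them, so it reads as the same IP pivot followed by the same SOA email pivot described above.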
Step 4: Expand on this technique
Now that I’ve demonstrated that such a technique could be viable, let’s talk about how to expand on and generalize it. By their nature, web assets are reused a lot. It’s therefore likely that if I did this without refining my technique (and without the intuition of someone who’s done some web development), I would have a high rate of false positives. In this case, a false positive would be a connection between two domains that is not meaningful. To avoid this, it’s critical to develop allowlists of commonly used JavaScript and CSS asset files and frameworks. As our allowlist grows, the likelihood that infrastructure connections found using this technique are legitimate or interesting should grow as well.
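As a minimal sketch of the allowlist idea, the function below compares the asset paths of two sites and keeps only the shared ones whose filename is not on the allowlist. The ALLOWLIST contents and the two asset lists in the usage lines are hypothetical, and matching on filename alone is a simplification I chose for brevity.

```python
from pathlib import PurePosixPath

# Hypothetical starter allowlist of very common library filenames.
ALLOWLIST = {
    "jquery.js", "jquery.min.js",
    "bootstrap.min.css", "bootstrap.min.js",
}


def shared_suspicious_assets(assets_a, assets_b, allowlist=ALLOWLIST):
    """Return asset paths present on both sites, minus allowlisted libraries."""

    def keep(path):
        return PurePosixPath(path).name.lower() not in allowlist

    kept_a = {p for p in assets_a if keep(p)}
    kept_b = {p for p in assets_b if keep(p)}
    return sorted(kept_a & kept_b)


# Hypothetical asset lists for two sites under comparison.
site_a = ["includes/ajax/jquery.js", "templates/aurblue/components.css"]
site_b = ["includes/ajax/jquery.js", "templates/aurblue/components.css", "js/app.js"]
print(shared_suspicious_assets(site_a, site_b))
```

Here the shared jQuery file is discarded as a likely false positive, leaving only the template stylesheet as a connection worth investigating.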
To make this technique more sophisticated, I could begin to look at individual code blocks within the files themselves, as well as comments, coding style, and other indicators.
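One simple way to compare code blocks, sketched here under my own assumptions, is to split each file on blank lines, collapse whitespace, hash each block, and measure how many hashes two files have in common. Treating blank-line-separated chunks as "blocks" is a crude stand-in for real parsing, and both function names are hypothetical.

```python
import hashlib
import re


def block_hashes(source):
    """Hash each blank-line-separated block after collapsing whitespace."""
    hashes = set()
    for block in re.split(r"\n\s*\n", source):
        normalized = re.sub(r"\s+", " ", block).strip()
        if normalized:
            hashes.add(hashlib.sha256(normalized.encode("utf-8")).hexdigest())
    return hashes


def block_overlap(source_a, source_b):
    """Jaccard similarity between the block hashes of two files (0.0 to 1.0)."""
    a, b = block_hashes(source_a), block_hashes(source_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Because whitespace is collapsed before hashing, a block that was merely re-indented or minified onto one line still matches, while any change to the code itself produces a different hash.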
There is a lot of value in techniques that have been leveraged by the malware analysis community for some time. This domain has somewhat different constraints than malware analysis, as web assets will always need to be readable by the browser. In a future article, I will explore JavaScript obfuscation and deobfuscation techniques and how these might fit into this type of pivoting/identification technique.
Inevitably, attackers will adapt their techniques to evade detection via this type of exploration. However, their resources to do so are limited. In essence, I am using my own exploration and asymmetry advantages against threat actors.
For an in-depth discussion on fingerprinting threat actors with similar techniques, watch this webinar I co-presented with Rebekah Brown to learn:
- How the threat intelligence space is evolving
- Practical steps your team can take to get ahead of threat actors
- Real-world examples of enumerating attacker infrastructure using web assets and other information scraped from HTML