What's a UUID?
I. Introduction
When working with DNSDB or Security Information Exchange (SIE) DNS-related channels, you may occasionally see domain name labels with a very distinctive “dash-separated pattern” such as the following:
8a1c7f6a-ac5e-4898-af1e-2654d0fa8e45.probe.performance.dropbox.com.
c34b98c1-02d5-4020-a6e0-c89af9a9b56e.sync.upravel.com.
0fe9da84-55ab-48fc-847e-4da8807419ee.mitdmp.whiteboxdigital.ru.
bec89e42-fe87-44a0-b53d-55b8bc7b7a7a.notifications.api.brightspace.com.
9565a982-6467-4d43-94ac-a5094ad877cc.us.u.fastly-insights.com.
f5489548-9f97-4a48-b22b-2f03aec465aa.edge1.pingone.com.
eaf4f4b1-65fa-5480-5013-05ab140f8498.z1.dca0.com.
That is, the bolded portion (the leftmost label) of each of those names follows the pattern:
- Eight hexadecimal digits followed by a dash
- Four hexadecimal digits followed by a dash (repeated two additional times)
- Twelve hexadecimal digits
These names are almost certainly “Universally Unique Identifiers,” or “UUIDs”.
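The pattern just described can be written as a regular expression. The following is only a minimal sketch: the names UUID_PATTERN and find_uuid are ours, chosen for illustration, and the sketch checks the shape of the string only (it says nothing about whether the value is a valid RFC4122 UUID).

```python
import re

# 8-4-4-4-12 hexadecimal digits, separated by dashes
UUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)

def find_uuid(name):
    """Return the first UUID-shaped substring of a domain name, or None."""
    match = UUID_PATTERN.search(name.lower())
    return match.group(0) if match else None

print(find_uuid("f5489548-9f97-4a48-b22b-2f03aec465aa.edge1.pingone.com."))
print(find_uuid("www.example.com."))
```

The name is lowercased before matching because domain names are case-insensitive, so a UUID-pattern label could in principle arrive in mixed case.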
There are RFC4122 and non-RFC4122 UUIDs. The four main types of RFC4122 UUIDs are:
Type 1 UUIDs (a value derived from the host’s hardware address and the current time)
Type 3 UUIDs (an MD5-hashed version of a namespace identifier and a corresponding namestring (such as a domain name or URL or ISO OID or X.500 DN))
Type 4 UUIDs (essentially a pseudo-random value), and
Type 5 UUIDs (a SHA1-hashed version of a namespace identifier and a corresponding namestring (such as a domain name or URL or OID or X.500 DN)).
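For RFC4122 UUIDs, the type is visible in the text of the UUID itself: it is the first hexadecimal digit of the third dash-separated group. As a rough sketch (version_digit is a helper name of our own), we can read that digit directly and compare it with what Python's uuid module reports. The sketch also shows that the digit is only meaningful for the RFC4122 variant: the last of the sample names above has a "5" in that position, but its variant bits mark it as a non-RFC4122 UUID, so the module reports no version at all.

```python
import uuid

def version_digit(text):
    """Return the first hex digit of the third dash-separated group."""
    return text.split("-")[2][0]

sample = "8a1c7f6a-ac5e-4898-af1e-2654d0fa8e45"
print(version_digit(sample))        # raw digit taken from the string
print(uuid.UUID(sample).version)    # version as reported by the uuid module

# UUID-shaped, but the variant bits do not indicate RFC 4122
odd = uuid.UUID("eaf4f4b1-65fa-5480-5013-05ab140f8498")
print(odd.variant, odd.version)
```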
II. RFC4122 Type 4 UUIDs
Let’s experiment a little with some UUIDs using Python3’s uuid library.
We’ll begin by starting an interactive Python3 shell, and importing the Python uuid library:
$ python3
>>> import uuid
We can then try creating some RFC4122-format UUIDs. For example, if you ever need a unique random identifier (perhaps to use as a transaction identifier or as part of a temporary filename), RFC4122 Type 4 UUIDs may be just the ticket. Each Type 4 UUID you create will, for all practical purposes, be unique:
>>> uuid.uuid4()
UUID('2ab6bdda-fd0b-4d6c-b886-b7c388cca8c7')
>>> uuid.uuid4()
UUID('64847737-f0b4-401c-932b-452af79e3264')
>>> uuid.uuid4()
UUID('b06c22b6-2124-4702-a450-714f42898422')
As we will later see in actual traffic, RFC4122 Type 4 UUIDs tend to be the most commonly seen.
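A Type 4 UUID carries 122 random bits (the remaining 6 bits of the 128 are fixed by the version and variant fields), so a collision is possible in principle but overwhelmingly unlikely. As an informal illustration rather than a proof, we can generate a batch and confirm that every value is distinct and is reported as Type 4. The batch size of 100,000 is an arbitrary choice of ours.

```python
import uuid

# Generate a batch of Type 4 UUIDs (the batch size is arbitrary)
batch = [uuid.uuid4() for _ in range(100_000)]

# True when no value was repeated within the batch
print(len(set(batch)) == len(batch))

# The set of versions seen; we expect only version 4
print({u.version for u in batch})
```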
III. RFC4122 Type 1 UUIDs
Let’s now look at a type of UUID that’s a little more complex, the RFC4122 Type 1 UUID. We’ll get a Type 1 UUID and assign it to a variable called myuuid (bolding added in the following output by me):
>>> myuuid=uuid.uuid1()
>>> myuuid
UUID('c8e85954-bdc8-11eb-a376-a0369f710741')
If we request another Type 1 UUID, we can see that the last (bolded) field of the Type 1 UUID doesn’t change:
>>> myuuid=uuid.uuid1()
>>> myuuid
UUID('f8b26182-bdca-11eb-9022-a0369f710741')
If we compare the last field of that UUID to the system’s hardware Ethernet (“MAC”) address, retrieved using
$ ifconfig -a
or
$ ip a
we can see the last part of the Type 1 UUID is set to the default interface’s hardware address (albeit without the colon formatting that’s normally part of a hardware Ethernet address):
link/ether a0:36:9f:71:07:41
If we wanted to, we could programmatically verify that the Type 1 UUID is an RFC4122 Type 1 UUID (and get the host’s Ethernet MAC address) by saying:
>>> myuuid.variant
'specified in RFC 4122'
>>> myuuid.version
1
>>> hex(myuuid.fields[5])
'0xa0369f710741'
The above confirms that the (last part of a) Type 1 UUID has the potential to act as a persistent host identifier (although beware of the impact of things like Ethernet adapter replacement).
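Besides the node field, a Type 1 UUID also encodes when it was created: its timestamp counts 100-nanosecond intervals since 15 October 1582. The sketch below (describe_uuid1 is a helper name of our own) recovers both the creation time and a colon-formatted node value from the first Type 1 UUID shown above. One caveat: the node field is not guaranteed to be a real hardware address, since implementations may substitute a random value when no address is available.

```python
import datetime
import uuid

# Type 1 timestamps count 100-nanosecond intervals since 15 October 1582 (UTC)
GREGORIAN_EPOCH = datetime.datetime(1582, 10, 15, tzinfo=datetime.timezone.utc)

def describe_uuid1(u):
    """Return (creation time, colon-formatted node) for a Type 1 UUID."""
    if u.version != 1:
        raise ValueError("not a Type 1 UUID")
    # u.time is in 100 ns units; integer-divide by 10 to get microseconds
    created = GREGORIAN_EPOCH + datetime.timedelta(microseconds=u.time // 10)
    # u.node is a 48-bit integer; format it one byte at a time, high byte first
    node = ":".join(f"{(u.node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
    return created, node

created, node = describe_uuid1(uuid.UUID("c8e85954-bdc8-11eb-a376-a0369f710741"))
print(created.isoformat(), node)
```

This is worth keeping in mind when Type 1 UUIDs turn up in DNS traffic: the name may disclose both roughly when it was generated and, potentially, which host generated it.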
IV. RFC4122 Type 3 and RFC4122 Type 5 UUIDs
Now let’s consider the RFC4122 Type 3 (MD5) and Type 5 (SHA-1) UUIDs. These are hashed values produced from a namespace identifier and a corresponding namestring (such as a domain name, URL, ISO OID, or X.500 DN). The two hashing functions (MD5 vs SHA1) yield different values, but the principle’s the same.
Note that neither MD5 nor SHA-1 is considered particularly cryptographically robust against well-funded opponents these days, but both are pragmatically adequate for the purposes for which most UUIDs get used.
Let’s create a Type 5 (SHA-1) UUID as an example. Part of making a Type 5 UUID is specifying a namespace (the namespace declares the type of namestring being processed). In the notation of the Python UUID library we’re using, those namespaces are defined as:
uuid.NAMESPACE_DNS     Domain name
uuid.NAMESPACE_URL     Web URL
uuid.NAMESPACE_OID     ISO OID
uuid.NAMESPACE_X500    X500 Distinguished Name (DN)
For example, to build a Type 5 UUID (SHA-1) for the domain name www.farsightsecurity.com, we’d say:
>>> uuid.uuid5(uuid.NAMESPACE_DNS, 'www.farsightsecurity.com')
UUID('c8b275e4-2990-5eef-af17-a96026a19f71')
If we re-run that Type 5 UUID call for www.farsightsecurity.com, note that we get the same UUID result:
>>> uuid.uuid5(uuid.NAMESPACE_DNS, 'www.farsightsecurity.com')
UUID('c8b275e4-2990-5eef-af17-a96026a19f71')
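The result is repeatable because nothing random or time-dependent goes into it. As a sketch of the construction (manual_uuid5 is our own illustrative name, not part of the uuid library): the 16 bytes of the namespace UUID are concatenated with the namestring, the result is hashed with SHA-1, the first 16 bytes of the digest are kept, and the version and variant bits are then overwritten.

```python
import hashlib
import uuid

def manual_uuid5(namespace, name):
    """Build a Type 5 UUID by hand from a namespace UUID and a namestring."""
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    raw = bytearray(digest[:16])          # keep only the first 128 bits
    raw[6] = (raw[6] & 0x0F) | 0x50       # set the version field to 5
    raw[8] = (raw[8] & 0x3F) | 0x80       # set the RFC 4122 variant bits
    return uuid.UUID(bytes=bytes(raw))

by_hand = manual_uuid5(uuid.NAMESPACE_DNS, "www.farsightsecurity.com")
by_library = uuid.uuid5(uuid.NAMESPACE_DNS, "www.farsightsecurity.com")
print(by_hand == by_library)
```

A Type 3 UUID is built the same way, with MD5 in place of SHA-1 and a version field of 3.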
A natural question when thinking about a Type 5 UUID is, “Can I ‘decode’ a UUID in order to extract the domain name that was used to construct that UUID?” The answer is that no, you can’t, at least not algorithmically.
However, data-driven approaches should also be considered. Remember that a Type 5 UUID computation yields the same result every time it is run with the same input. Thus, conceptually, you could build a “dictionary” mapping billions of known namestrings to their corresponding Type 5 UUIDs, and then use that table of “precomputed” Type 5 UUIDs to look up the corresponding “encoded” namestring.
For that reason, please be careful when it comes to assuming that Type 5 UUIDs are “strictly one way” or are “absolutely non-reversible” — that may not always be true, particularly if the set of potential namestrings is known and finite.
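To make the dictionary idea concrete, here is a toy sketch. The candidate list is a tiny, hypothetical stand-in for the billions of known namestrings mentioned above, and reverse_uuid5 is a helper name of our own.

```python
import uuid

# Hypothetical candidate namestrings; a real table would be vastly larger
candidate_names = [
    "www.farsightsecurity.com",
    "www.example.com",
    "mail.example.net",
]

# Precompute the Type 5 UUID for every candidate, keyed by the UUID
lookup = {uuid.uuid5(uuid.NAMESPACE_DNS, name): name for name in candidate_names}

def reverse_uuid5(text):
    """Return the candidate name that produced this UUID, or None if unknown."""
    return lookup.get(uuid.UUID(text))

observed = str(uuid.uuid5(uuid.NAMESPACE_DNS, "www.farsightsecurity.com"))
print(reverse_uuid5(observed))
print(reverse_uuid5(str(uuid.uuid4())))   # not in the table, so None
```

The lookup only succeeds for names that were in the candidate list, and only when the namespace guess (here, uuid.NAMESPACE_DNS) is also correct.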
V. UUIDs in Security Information Exchange (SIE) Channels
Farsight SIE channels are used to share near-real time security traffic, including DNS traffic. For example, SIE Channel 204 contains “Data from Farsight’s global sensor array that has been deduplicated, filtered and verified.” (A list of available channels can be found here.)
We pulled ten million Channel 204 records from a leased SIE blade server using nmsgtool, saving just the RRnames that have the appropriate pattern.
This took less than ten minutes, and yielded 12,351 records (about 0.124% of the total observations we pulled):
$ time nmsgtool -C ch204 -c 10000000 -J - | jq -r '.message.rrname' | \
  egrep '[0-9a-f]{8}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{12}' > \
  uuid-pattern-rrnames.txt

real 9m21.825s
user 6m7.292s
sys 0m37.948s
$ wc -l uuid-pattern-rrnames.txt
12351 uuid-pattern-rrnames.txt
We then visually confirmed that those names were of the expected UUID-containing pattern:
$ more uuid-pattern-rrnames.txt
01ef8657-5088-4e2d-a90c-f135796b4abc-pdata-v4.unique.k.fastly-insights.com.
aa6a8b17-18e2-4cec-a6d3-b9526ef30673-pdata-v4.unique.k.fastly-insights.com.
99de0c1d-c56c-4942-ac6e-3d924687a038-pdata-v4.unique.k.fastly-insights.com.
[...]
210a718b-10f8-425d-9918-d399a97e6a67-pdata-v4.unique.k.fastly-insights.com.
Top effective 2nd-level domains that were seen in 50 or more RRnames looked like:
$ 2nd-level-dom < uuid-pattern-rrnames.txt | sort | uniq -c | sort -nr
   6141 fastly-insights.com
   1088 bugfender.net
    401 permutive.app
    348 aadrm.com
    303 wix-code.com
    272 verkada-lan.com
    231 mts.ru
    168 seondnsresolve.com
    158 elliq.co
    153 filesusr.com
    139 msappproxy.net
    135 azure.com
    124 mysimplestore.com
     99 godaddy.com
     91 lever.co
     88 rlets.com
     85 thirdptop.com
     85 cloudflareresolve.com
     79 pipedrive.email
     60 petzila.com
     53 brightspace.com
[etc]
There were at least a few comment-worthy things to note:
- 451 domain names from remotewd.com had UUID-format names. Given that, you might wonder why remotewd.com does not appear in the list of top effective 2nd-level domains shown above. To understand that behavior, note that remotewd.com appears in the PSL (Public Suffix List) and thus is an effective top-level domain in its own right. This means that those 451 remotewd.com domains are all tallied separately rather than being aggregated into a single effective 2nd-level domain large enough to get listed in the top table.
- Some UUIDs had been “customized” or “encapsulated” by having text prepended or appended to the UUID. For example, we saw unusual-looking domains that looked like the following:
dropthishost-c9ca947d-3c30-48db-89a2-33f6a36a67a0.biz.
dropthishost-09a9fadf-de97-41e7-8dea-fae21b069c4a.biz.
dropthishost-01c4781b-3816-4081-b1c9-3b7e9df16d9a.biz.
dropthishost-01c4781b-3816-4081-b1c9-3b7e9df16d9a.biz.
dropthishost-426903b8-79aa-4e10-b80e-0fc520637ac7.biz.
dropthishost-a24631d1-6cdc-4eb9-a099-76c4a3fb0d38.biz.
dropthishost-21a416d3-dd34-4c4b-a2cd-fdc484cc21b6.biz.
dropthishost-e57a0268-a3d3-449d-b424-98ed9cb07d29.biz.
dropthishost-e57a0268-a3d3-449d-b424-98ed9cb07d29.biz.
dropthishost-9ccc43ef-27f0-4717-9767-3dcf5453a8cd.biz.
dropthishost-81b53cb7-6640-4ea9-ba71-d28f39a069cc.biz.
dropthishost-dbf7973f-9dc1-46ab-bc10-3b3a29695906.biz.
dropthishost-a9f2037f-e43b-4640-8993-db18303e93a0.biz.
dropthishost-a08e8b8d-2efa-4ee4-baa0-76fc107c370a.biz.
[etc]
All of the domains of this sort that we checked in DNSDB appear to point at 216.194.64.193.
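Returning briefly to the remotewd.com observation: the effect of a Public Suffix List entry on the tally can be illustrated with a toy example. This is only a sketch under loud assumptions. PUBLIC_SUFFIXES is a tiny hand-picked excerpt rather than the real list, the wildcard and exception rules of the real PSL are ignored, effective_2nd_level is our own stand-in (not the 2nd-level-dom tool used above), and the two remotewd.com hostnames are made up for illustration.

```python
import collections

# Hypothetical, hand-picked excerpt; the real Public Suffix List is far larger
PUBLIC_SUFFIXES = {"com", "net", "biz", "remotewd.com"}

def effective_2nd_level(name):
    """Return the longest matching public suffix plus one label to its left."""
    labels = name.rstrip(".").lower().split(".")
    # Start at 1 so there is always a label left of the suffix; trying the
    # longest candidate suffix first means "remotewd.com" wins over "com"
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:])
    return None

names = [
    "01ef8657-5088-4e2d-a90c-f135796b4abc-pdata-v4.unique.k.fastly-insights.com.",
    "aa6a8b17-18e2-4cec-a6d3-b9526ef30673-pdata-v4.unique.k.fastly-insights.com.",
    "host-a.remotewd.com.",   # made-up name
    "host-b.remotewd.com.",   # made-up name
]
tally = collections.Counter(effective_2nd_level(n) for n in names)
for domain, count in tally.most_common():
    print(count, domain)
```

Both fastly-insights.com names collapse into one entry with a count of two, while each remotewd.com name is counted on its own, which is why none of them reach the threshold for the top table.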
What do we see if we dig into the UUIDs themselves? Are they all of a particular type, for example?
We’ll extract just the apparent UUIDs from the file by saying:
$ grep -oP '([0-9a-f]{8}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{12})' \
  uuid-pattern-rrnames.txt > just-uuids.txt
We’ll then check the status of each of those UUIDs with a little Python3 script:
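#!/usr/bin/python3
# checkit.py: print the variant and version of the UUID given as the first argument
import sys
import uuid

# Parse the UUID string (raises ValueError if it is not a well-formed UUID)
myuuid = uuid.UUID(sys.argv[1])
# Shorten the variant string "specified in RFC 4122" to just "4122"
myvariant = myuuid.variant.replace("specified in RFC 4122", "4122")
# The version is None when the variant is not RFC 4122
myversion = myuuid.version
print(myuuid, ",", myvariant, ",", myversion)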
For example:
$ ./checkit.py 3ee99252-8700-414d-a1ef-e3d4e77a63f3
3ee99252-8700-414d-a1ef-e3d4e77a63f3 , 4122 , 4
Considering all 12,351 UUIDs in our sample, 12,292 (99.5% of our sample) were RFC4122 Type 4 “random” UUIDs (the remaining 0.5% consisted of sundry other UUID types). Clearly “random” UUIDs dominate the UUID usage we observed.
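A breakdown like this can also be produced in one pass instead of running a script once per UUID. The following is a sketch of one way to do that, not the exact method used for the figures above; tally_uuids is a name of our own, and the three sample values are UUIDs that appear earlier in this article.

```python
import collections
import uuid

def tally_uuids(lines):
    """Count UUIDs by (variant, version); expects one UUID string per line."""
    counts = collections.Counter()
    for line in lines:
        text = line.strip()
        if not text:
            continue                      # skip blank lines
        parsed = uuid.UUID(text)
        counts[(parsed.variant, parsed.version)] += 1
    return counts

sample = [
    "3ee99252-8700-414d-a1ef-e3d4e77a63f3",
    "c8e85954-bdc8-11eb-a376-a0369f710741",
    "eaf4f4b1-65fa-5480-5013-05ab140f8498",
]
counts = tally_uuids(sample)
for (variant, version), count in counts.most_common():
    print(count, variant, version)
```

To run it over the full sample, an open file can be passed in place of the list, for example by opening just-uuids.txt and calling tally_uuids on the file object.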
VI. Conclusion
We hope you’ve found this little introduction to UUIDs to be of some interest.
We encourage you to explore all the interesting and unusual phenomena waiting to be discovered in Farsight’s Security Information Exchange and DNSDB!
Joe St Sauver, Ph.D. is a Distinguished Scientist and Director of Research for Farsight Security, Inc.