Data Points And Analysis
In DFIR and "threat intel" analysis, individual data points are very often dismissed out of hand because they are thought to be easily mutable. We see it all the time, don't we? We find a data point and, instead of simply adding it to our picture of the incident, we ask, "Hey, what about this...?". Very often, we hear in response, "...hackers change that all the time...", so we drop it.
Why do we do this? Why do we not include "easily mutable" artifacts in our analysis?
A common example of this is the PE compile time, a time stamp value added to an executable file during the compilation process. I'm not an expert in compilers or linkers, but this time stamp value is understood throughout the DFIR community to be easily mutable; that is, it doesn't "cost" an adversary much to change this value. Many of us have seen cases where the PE compile time value, when converted, indicates that the file was compiled in 1980, or possibly even on a date in the future. This value is thought to be easily mutable precisely because many of us have either seen it changed, or have actually changed it ourselves. A consequence is that when someone brings up, "...hey, this time stamp value says...", the value itself is often dismissed out of hand.
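To make this concrete: the PE compile time lives in the COFF file header as a 32-bit Unix epoch value. A minimal Python sketch (my own, not taken from any particular tool) that pulls it out with nothing but the standard library might look like this:

```python
import datetime
import struct

def pe_compile_time(path):
    """Read the TimeDateStamp from a PE file's COFF header."""
    with open(path, "rb") as f:
        data = f.read(4096)  # the headers live at the front of the file
    # The 32-bit value at offset 0x3C (e_lfanew) points to the "PE\0\0" signature.
    pe_off = struct.unpack_from("<I", data, 0x3C)[0]
    if data[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    # The COFF header follows the signature: Machine (2 bytes),
    # NumberOfSections (2 bytes), then TimeDateStamp (4 bytes).
    ts = struct.unpack_from("<I", data, pe_off + 8)[0]
    return datetime.datetime.utcfromtimestamp(ts)
```

A converted value from 1980, the epoch itself, or a date in the future is exactly the kind of anomaly worth recording in the corpus rather than discarding.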
However, there may be considerable value in including these values in our corpus of viable, relevant data points. What I mean is, just because a value is understood to be easily mutable, what if it wasn't changed? Why do we exclude these values from our analysis simply because they could have been changed, without ever checking to see whether they actually were?
Consider the FireEye blog post from Nov 2018 regarding the APT29/Cozy Bear phishing campaign; table 1 of the article illustrates an "operational timeline", which is a great idea. The fourth row in the table illustrates the time at which the LNK file is thought to have been weaponized; this is a time stamp stored in a shell item, as an MS-DOS date/time value. The specific value is the last modification time of the "system32" folder, and if you know enough about the format of LNK files, it's not hard at all to modify this time value, so it could be considered "easily mutable". For example, open the file in binary mode, go to the location/offset within the file, and overwrite the 16-bit date and time values with 0's. Boom. You don't even have to mess with issues of endianness; just write 0's and be done with it.
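For reference, the MS-DOS date/time format mentioned above packs the date and the time into two 16-bit little-endian values, with two-second resolution on the seconds field. A small decoder (my own sketch of the standard FAT timestamp layout):

```python
import datetime

def decode_dos_datetime(dos_date, dos_time):
    """Decode a 16-bit DOS date and 16-bit DOS time into a datetime."""
    # Date: bits 15-9 = years since 1980, bits 8-5 = month, bits 4-0 = day
    year = 1980 + (dos_date >> 9)
    month = (dos_date >> 5) & 0x0F
    day = dos_date & 0x1F
    # Time: bits 15-11 = hours, bits 10-5 = minutes, bits 4-0 = seconds / 2
    hour = dos_time >> 11
    minute = (dos_time >> 5) & 0x3F
    second = (dos_time & 0x1F) * 2
    return datetime.datetime(year, month, day, hour, minute, second)
```

With only a handful of bits per field, flipping a few of them changes the decoded date entirely, which is exactly why the value is so cheap to forge.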
However, in this case, the FireEye folks included the value in their corpus, and found that it had significant value.
Something else you'll hear very often is, "...yeah, we see that all the time...". Okay, great...so why dismiss it? Sure, you see it all the time, but in what context? When you say that you "see it all the time", does that mean you're seeing the same data points across disparate campaigns?
Let's consider Windows shortcut/LNK files again. Let's say we retrieve the machine ID from the LNK file metadata, and we see "user-pc" again, and again, and again. We also see the same node ID (or "MAC address") and the same volume serial number across different campaigns. Are these campaigns all related to the same threat actor group, or different adversaries? Either way, this would tell us something, wouldn't it?
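Tracking this across campaigns doesn't require anything sophisticated; a sketch of the idea (the field names here are my own assumptions about how the extracted LNK metadata might be stored):

```python
from collections import defaultdict

def cluster_by_lnk_metadata(samples):
    """Group campaigns whose LNK files share machine ID, MAC, and volume serial."""
    clusters = defaultdict(set)
    for s in samples:
        key = (s["machine_id"], s["mac"], s["volume_serial"])
        clusters[key].add(s["campaign"])
    return clusters

samples = [
    {"machine_id": "user-pc", "mac": "00:0c:29:aa:bb:cc",
     "volume_serial": "C4B2-A1D0", "campaign": "campaign-1"},
    {"machine_id": "user-pc", "mac": "00:0c:29:aa:bb:cc",
     "volume_serial": "C4B2-A1D0", "campaign": "campaign-2"},
]
# Two "different" campaigns collapsing onto one key is exactly the
# kind of overlap worth surfacing to the threat intel analysts.
```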
The same can be said for other file and document metadata, including that found in phishing campaign lure documents, particularly the OLE format documents. You see the same metadata across different campaigns? Great. Are the campaigns attributed to the same actors?
What about the embedded macros? Are they obfuscated? I've seen macros with no obfuscation at all, and I've seen macros with four or five levels of obfuscation, each level being completely different (e.g., base64 encoding, character encoding, differences in string concatenation, etc.).
All of these can be useful pieces of information to build out the threat intel picture. Threat intel analysts need to know what's available, so that they can ask for it if it's not present, and then utilize it and track it. DFIR analysts need to understand that there's more to an investigation than answering the immediate IR questions; a small amount of additional work can yield significant dividends down the road, particularly when shared with analysts from other disciplines.