Analysis
A while back, I posted about doing analysis, and that post didn't seem to get much traction at all. What was I trying for? To start a conversation about how we _do_ analysis. When we make statements to a client or to another analyst, on what are we basing those findings? Somewhere between the raw data and our findings is where we _do_ analysis; I know what that looks like for me, and I've shared it (in this blog, in my books, etc.), and what I've wanted to do for some time is go beyond the passivity of sitting in a classroom, and start a conversation where analysts engage and discuss analysis.
I have to wonder...is this even possible? Will analysts talk about what they do? For me, I'm more than happy to. But will this spark a conversation?
I thought I'd try a different tack this time around. In a recent blog post, I mentioned that two Prefetch parsers had recently been released. While it's interesting to see these tools being made available, I have to ask...how are analysts using these tools? How are analysts using them to conduct analysis, and to achieve the results that they're sharing with their clients?
Don't get me wrong...I think having tools is a wonderful idea. We all have our favorite tools that we tend to gravitate toward or reach for under different circumstances. Whether it's commercial or free/open source tools, it doesn't really matter. Whether you're using a dongle or a Linux distro...it doesn't matter. What does matter is, how are you using it, and how are you interpreting the data?
Someone told me recently, "...I know you have an issue with EnCase...", and to be honest, that's simply not the case. I don't have an issue with EnCase at all, nor with FTK. I do have an issue with how those tools are used by analysts, and the issue extends to any other tool that is run auto-magically and expected to spit out true results with little to no analysis.
What do the tools really do for us? Well, basically, most tools parse data of some sort, and display it. It's then up to us, as analysts, to analyze that data...to interpret it in context, whether that context comes from other data within the same source, or from data brought in from external sources.
RegRipper is a great example. The idea behind RegRipper (as well as the other tools I've written) is to parse and display data for analysis...that's it. RegRipper started as a bunch of scripts I had sitting around...every time I'd work on a system and have to dig through the Registry to find something, I'd write a script to do the actual work for me. In some cases, a script was simply to follow a key path (or several key paths) that I didn't want to have to memorize. In other cases, I'd write a script to handle ROT-13 decoding or binary parsing; I figured, rather than having to do all of that again, I'd write a script to automate it.
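To make the ROT-13 piece concrete, here's a minimal Perl sketch of the sort of decoding one of those scripts would automate; the encoded value name below is a made-up example in the style of a UserAssist entry:

#!/usr/bin/perl
# Minimal sketch: decode a ROT-13-encoded Registry value name, of the
# sort seen in UserAssist entries. The sample string is illustrative only.
use strict;
use warnings;

my $name = 'HRZR_EHACNGU:P:\Jvaqbjf\abgrcnq.rkr';
(my $decoded = $name) =~ tr/A-Za-z/N-ZA-Mn-za-m/;
print $decoded, "\n";    # prints UEME_RUNPATH:C:\Windows\notepad.exe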
For a while, that's all RegRipper did...parse and display data. If you had keywords you wanted to "pivot" on, you could do so with just about any text editor, but that's still a lot of data. So then I started adding "alerts"; I'd have the script (or tool) do some basic searching to look for things that were known to be "bad", in particular, file paths in specific locations. For example, an .exe file in the root of the user profile, or in the root of the Recycle Bin, is a very bad thing, so I wanted those to pop out and be put right in front of the analyst. I found...and still find...this to be incredibly useful functionality, but to date, I've received very little feedback on it.
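As a rough illustration of the alerting idea, here's a minimal Perl sketch; the patterns and sample paths are hypothetical stand-ins, not RegRipper's actual alert logic:

#!/usr/bin/perl
# Minimal sketch: flag executable paths in locations where a legitimate
# .exe shouldn't normally live. Patterns and paths are illustrative only.
use strict;
use warnings;

my @paths = ('C:\Users\jdoe\svchost.exe',       # root of a user profile
             'C:\$Recycle.Bin\a.exe',           # root of the Recycle Bin
             'C:\Windows\system32\icacls.exe'); # legit location, no alert

foreach my $path (@paths) {
    if ($path =~ m/^c:\\users\\[^\\]+\\[^\\]+\.exe$/i ||
        $path =~ m/^c:\\\$recycle\.bin\\[^\\]+\.exe$/i) {
        print "ALERT: suspicious path: $path\n";
    }
}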
Here's an example of what I'm talking about with respect to analysis...I ran across this forensics challenge walk-through recently, and just for sh*ts and grins, I downloaded the Registry hive files (NTUSER.DAT, Software, System). I ran the appcompatcache.pl RegRipper plugin against the System hive, and found the following "interesting" entries within the AppCompatCache value:
C:\dllhot.exe Tue Apr 3 18:08:50 2012 Z Executed
C:\Windows\TEMP\a.exe Tue Apr 3 23:54:46 2012 Z Executed
c:\windows\system32\dllhost\svchost.exe Tue Apr 3 22:40:25 2012 Z Executed
C:\windows\system32\hydrakatz.exe Wed Apr 4 01:00:45 2012 Z Executed
C:\Windows\system32\icacls.exe Tue Jul 14 01:14:21 2009 Z Executed
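(For reference, output like the above comes from pointing the plugin at an exported hive file; exact file names and paths will vary:)

rip.pl -r System -p appcompatcache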
Now, the question is, what does each of those entries mean? Does it mean that the .exe file was "executed" on the date and time listed?
No, that's not what the entries mean at all. Check out Mandiant's white paper on the subject. You can verify what the white paper says by creating a timeline from the Shim Cache data and file system metadata (just the $MFT will suffice); if the files that were executed were not deleted from the system, you'll see that the time stamp included in the Shim Cache data is, in fact, the last modification time from the file system metadata (specifically, the $STANDARD_INFORMATION attribute).
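Here's a minimal Perl sketch of that check; the shim cache time below is the a.exe entry from the output above, while the $MFT value is a hypothetical parsed result (in practice, both hashes would be populated by your parsers):

#!/usr/bin/perl
# Minimal sketch: compare a shim cache entry's time stamp against the
# file's $STANDARD_INFORMATION last-modified time from the $MFT. The
# $MFT value below is assumed for illustration.
use strict;
use warnings;
use POSIX qw(strftime);

# path => epoch time from the AppCompatCache entry
my %shim = ('C:\Windows\TEMP\a.exe' => 1333497286);  # Tue Apr 3 23:54:46 2012 Z

# path => $STANDARD_INFORMATION last-mod time parsed from the $MFT
my %mft  = ('C:\Windows\TEMP\a.exe' => 1333497286);

foreach my $path (keys %shim) {
    next unless exists $mft{$path};
    my $verdict = ($shim{$path} == $mft{$path}) ? "matches" : "does NOT match";
    printf "%s: shim cache time (%s) %s the \$SI last-mod time\n", $path,
        strftime("%a %b %e %H:%M:%S %Y Z", gmtime($shim{$path})), $verdict;
}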
I use this as an example simply because it's something that I see a great deal of; in fact, I recently experienced a "tale of two analysts", where I reviewed work that had previously been conducted by two separate analysts. The first analyst did not parse the Shim Cache data at all, and the second parsed it, but assumed that the data meant the .exe files of interest had been executed at the time displayed alongside each entry.
Again, this is just an example, and not meant to focus the spotlight on anyone. I've talked with a number of analysts, and in just about every conversation, they've either known someone who's made the same mistake misinterpreting the Shim Cache data, or they've admitted to misinterpreting it themselves. I get it; no one's perfect, and we all make mistakes. I chose this one as an example, because it's perhaps one of the most misinterpreted data sources. A lot of analysts who have attended (or conducted) expensive training courses have made this mistake.
Pointing out mistakes isn't the point I'm trying to make...it's that we, as a community, need to engage in a community-wide conversation about analysis. What resources do we have available now, and what do we need? We can't all attend training courses, and when we do, what happens most often is that we learn something cool, and then don't see it again for 6 months or a year, and we forget the nuances of that particular analysis. Dedicated resources are great, but they (forums, emails, documents) need to be searched. What about just-in-time resources, like asking a question? Would that help?