Timeline Analysis
The DoD Cybercrime Conference is approaching, and I've been doing some thinking about my topic, Timeline Analysis. I'll be presenting on Wed morning, starting at 8:30am...I remember Cory Altheide saying at one point that all tech conferences should start no sooner than 1pm and run no later than 3:30pm, or something like that. Cool idea.
So, anyway...I've been thinking about some of the things that I put into pretty much all of my timeline analysis presentations. When it comes to creating timelines, IMHO there are essentially two "camps", or approaches. One is what I call the "kitchen sink" approach, which is basically, "Give me everything and let me do the analysis." The other is what I call the "layered" or "overlay" approach, in which the analyst is familiar with the system being analyzed and adds successive "layers" to the timeline. When I had a chance to chat with Chad Tilbury at PFIC 2011, he recommended a hybrid of the two approaches...get everything, and then view the data a layer at a time, using something he referred to as a "zoom" capability. This is something I think is completely within reach...but I digress.
One of the things I've heard folks say about using the "everything" or "kitchen sink" approach is that they'd rather have everything available so that they can look at it all when they're conducting analysis, because that's how we find new things. I completely agree with the "finding new things" part. After all, one of the core, foundational ideas behind creating timelines is that they can provide a great deal of context for the events we're seeing, and generally speaking, the more data we have, the more context is likely to be available. By itself, a file modification can be pretty meaningless...but if you are able to see other events going on "nearby", you'll begin to see what led up to and occurred immediately following the file modification. For example, you may see that the user launched IE, began browsing the web, requested a specific page, Java was launched, a file was created, and the file in question was modified...all of which provides a great deal of context.
That leads me to this question...if you're running a tool that someone else designed and put together, and you're just pushing a button or launching a command, how do you know that the tool got everything? How do you know that what you're looking at in the output of the tool is, in fact, everything?
The reason I prefer the layered approach is that it's predicated on (a) fully understanding the goals of your examination, and (b) understanding the system that you're analyzing. Because you understand your goals, you know what it is you're trying to achieve. And because you understand the system you're analyzing...Windows XP, Windows 7, etc...you also understand how various aspects of the operating system interact and are interconnected. As such, you're able to identify where there may be additional data, and either request or create your own tools for extracting the data that you need. Yes, this approach is more manually intensive than an automated approach, but it does have its positive points. For one, you'll know exactly what should be in the timeline, because you added it.
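To make that a bit more concrete, here's a minimal sketch of what the "layered" approach can look like in code. The five-field event layout (time, source, host, user, description), the file names, and the parser are all assumptions for illustration...this isn't any particular tool's format, just the general idea of adding one layer at a time.

```python
# Minimal sketch of the "layered" approach: each layer is added
# deliberately, so the analyst knows exactly what the timeline contains.
# The five-field event layout (time|source|host|user|description), the
# file names, and the parser below are assumptions for this example.

from datetime import datetime, timezone

events = []  # each event: (epoch_time, source, host, user, description)

def add_layer(new_events, label):
    """Add a named layer of events and note how many were added."""
    events.extend(new_events)
    print(f"[+] layer '{label}': {len(new_events)} events")

def fs_layer(path):
    """Hypothetical parser for file system metadata that has already been
    extracted to pipe-delimited 'epoch|macb|filepath' lines."""
    layer = []
    with open(path) as f:
        for line in f:
            epoch, macb, filepath = line.rstrip("\n").split("|", 2)
            layer.append((int(epoch), "FILE", "HOSTNAME", "-", f"{macb} {filepath}"))
    return layer

def write_timeline(outfile):
    """Sort all layers together and write one pipe-delimited timeline."""
    with open(outfile, "w") as f:
        for epoch, source, host, user, desc in sorted(events):
            ts = datetime.fromtimestamp(epoch, tz=timezone.utc)
            f.write(f"{ts.isoformat()}|{source}|{host}|{user}|{desc}\n")

if __name__ == "__main__":
    add_layer(fs_layer("fs_meta.txt"), "file system metadata")
    # ...add further layers (Event Logs, Registry data, browser history)
    # one at a time, as the goals of the exam dictate...
    write_timeline("timeline.txt")
```

The point isn't the code itself; it's that every layer that ends up in the output is one the analyst put there on purpose.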
Alternatively, when talking to analysts about collecting data, the sense I get most often is that the general approach is to "GET ALL THE THINGS!!" and then begin digging through the volumes of data to perform "analysis". I had a case a while back that involved SQL injection, and I created a timeline using only the file system metadata and the SQL injection statements from the web server logs; adding everything else available (including user profile data) would simply have made the timeline too cumbersome and too confusing to analyze effectively. I understood the goals of my exam (i.e., determine what the bad guy did and/or was able to access), and I understood the system (in this case, how SQL injection works, particularly when the database and web server are on the same system).
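As an illustration of that kind of targeted collection, the sketch below pulls only the suspected SQL injection requests out of a W3C-format web server log and emits them as simple timeline events. The keyword list, the log file name, and the output layout are assumptions for the example, not what was actually used on that case.

```python
# Rough sketch: extract only suspected SQL injection requests from a
# W3C-format web server log and emit them as timeline events. The keyword
# list, log file name, and output layout are assumptions for illustration.

import re

# Strings commonly seen in SQL injection attempts; tune for the case at hand.
SQLI_PATTERN = re.compile(r"(xp_cmdshell|cast\(|declare\s|union\s+select|exec\()", re.I)

def sqli_events(logfile):
    fields = []
    with open(logfile, errors="replace") as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("#Fields:"):
                # W3C logs name their own columns; use that instead of guessing.
                fields = line.split()[1:]
                continue
            if line.startswith("#") or not fields:
                continue
            values = dict(zip(fields, line.split()))
            query = values.get("cs-uri-query", "")
            if SQLI_PATTERN.search(query):
                stamp = f"{values.get('date', '')} {values.get('time', '')}"
                yield stamp, values.get("c-ip", "-"), values.get("cs-uri-stem", ""), query

if __name__ == "__main__":
    for stamp, src_ip, page, query in sqli_events("ex120101.log"):
        print(f"{stamp}|WEBLOG|{src_ip}|{page}?{query}")
```

Output like that can then be added to the timeline alongside the file system metadata, and nothing else.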
Now, some folks are going to say, "hey, but what if you missed something?" To that I say...well, how would you know? Or, what if you had the data available because you grabbed everything, and because you had no real knowledge of how the system acted, you had no idea that the event(s) you were looking at were important?
Something else to consider is this...what does it tell us when artifacts that we expect to see are not present? Or...
The absence of an artifact where you would expect to find one is itself an artifact.
Sound familiar? An example of this would be creating a timeline from an image acquired from a Windows system, and not seeing any indication of Prefetch file metadata in the timeline. A closer look might reveal that there are no files ending in .pf in the timeline. So...what does that tell you? I'll leave that one to the reader...
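If you want to ask that question of a timeline quickly, a check as simple as the sketch below will do; the timeline file name and the assumption that file paths appear somewhere in each line are illustrative only.

```python
# Quick check for an expected artifact: are there any Prefetch (.pf)
# entries in the timeline at all? The default file name and the
# assumption that paths appear in each line are illustrative only.

import re
import sys

def has_prefetch_entries(timeline_file):
    # Look for anything ending in .pf, e.g. CMD.EXE-087B4001.pf
    pf = re.compile(r"\.pf\b", re.I)
    with open(timeline_file, errors="replace") as f:
        return any(pf.search(line) for line in f)

if __name__ == "__main__":
    tl = sys.argv[1] if len(sys.argv) > 1 else "timeline.txt"
    if has_prefetch_entries(tl):
        print("Prefetch entries present in the timeline.")
    else:
        # Absence is itself a data point...ask why there are no .pf files.
        print("No .pf entries found -- ask why.")
```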
My point is that while there are (as I see it) two approaches to creating timelines, I'm not saying that one is better than the other...I'm not advocating one approach over another. I know from experience that there are a lot of analysts who are not comfortable operating in the command line (the "dark place"), and as such, might not create a timeline to begin with, and in particular not one that is pretty command-line-intensive. I also know that there are a good number of folks who use log2timeline pretty regularly, but don't necessarily understand the complete set of data that it collects, or how it goes about doing so.
What I am saying is that, from my perspective, each approach has its own strengths and weaknesses, and it's up to the analyst how they want to approach creating timelines. You may not want to use a manually intensive approach (which you can easily automate using batch files, a la Corey Harrell's approach), but if you end up using a substantive framework, how do you know you're getting everything?
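On the "easily automate" point: the same idea can be wrapped in a few lines of script, so that "manually intensive" doesn't mean retyping commands for every exam. Everything in the sketch below...the tool names, the arguments, the output file...is a placeholder; substitute whatever parsers you actually use for a given exam.

```python
# Sketch of automating a "manually-intensive" approach: a small wrapper
# that runs whatever parsers you've chosen for this exam and appends
# their output to a single events file. Every command below is a
# placeholder -- none of these tool names are real.

import subprocess

# (description, command) pairs -- hypothetical tool names and arguments.
LAYERS = [
    ("file system metadata", ["parse_fs.exe", "image.dd"]),
    ("web server logs", ["parse_weblogs.exe", "logs\\"]),
]

with open("events.txt", "a") as out:
    for label, cmd in LAYERS:
        print(f"[+] adding layer: {label}")
        # Each parser is expected to write pipe-delimited events to stdout.
        subprocess.run(cmd, stdout=out, check=True)
```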