More Thoughts on Timeline Analysis

I had a conversation with Cory recently, and during the conversation, he mentioned that if I were going to present at a conference and talk about timeline analysis, I should present something novel. I struggled with that one...I don't see a lot of folks talking about using timeline analysis, and that may have to do with the fact that constructing and analyzing a timeline is a very manual process at this point, and that's likely too high an obstacle for many folks, even with the tools I've provided, or with other tools, such as log2timeline.

Something Cory mentioned really caught my attention, as well. He suggested that various data sources might provide the analyst with a relative level of confidence as to the data itself, and what's being shown. For example, when parsing the MFT (via analyzeMFT or Mark Menz's MFTRipper), the analyst might have more confidence in the temporal values from the $FILE_NAME attribute than from the $STANDARD_INFORMATION attribute, as tools that modify file MAC times modify the temporal values in the latter attribute. See Episode 84 from the CommandLine KungFu blog for a good example that illustrates what I'm talking about...
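
To see what that difference looks like in practice, here's a minimal sketch that assumes you've already parsed the MFT (via analyzeMFT, MFTRipper, or something similar) into per-record dictionaries; the field names are mine, purely for illustration, and don't reflect either tool's actual output. The idea is simply to flag records where a $STANDARD_INFORMATION time falls before the corresponding $FILE_NAME time, one common indication that the $STANDARD_INFORMATION values have been manipulated:

```python
def flag_timestomp_candidates(records):
    """Flag MFT entries where a $STANDARD_INFORMATION timestamp predates the
    corresponding $FILE_NAME timestamp -- one common sign that a tool was used
    to modify the file's MAC times.  'records' is a list of dicts with
    illustrative keys: filename, si_created, si_modified, fn_created, fn_modified."""
    suspects = []
    for rec in records:
        for field in ("created", "modified"):
            si = rec["si_" + field]   # from $STANDARD_INFORMATION
            fn = rec["fn_" + field]   # from $FILE_NAME
            if si < fn:
                suspects.append((rec["filename"], field, si, fn))
    return suspects
```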

This is an interesting concept, and something that I really wanted to noodle over and expand on. One of the reasons I look to the Registry for so much valuable data is...well...because it's there, but also because I have yet to find a public API that allows you to arbitrarily alter Registry key LastWrite times. Sure, if you want to change a LastWrite time, simply add and then delete a value from a key...but I haven't found an API that will let me backdate a LastWrite time on a live system. But LastWrite times aren't the full story...there are a number of keys whose value data contains timestamps.
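
As an example of pulling those LastWrite times out for use in a timeline, here's a short sketch using Willi Ballenthin's third-party python-registry module (my choice purely for illustration) to dump the LastWrite time of a key and its immediate subkeys from an exported hive file:

```python
from Registry import Registry   # third-party python-registry module

def dump_lastwrite(hive_path, key_path):
    """Print the LastWrite time of a key and each of its immediate subkeys."""
    reg = Registry.Registry(hive_path)
    key = reg.open(key_path)
    print("%s  %s" % (key.timestamp(), key.path()))
    for subkey in key.subkeys():
        print("%s  %s" % (subkey.timestamp(), subkey.path()))

# For example, the Run key from an exported NTUSER.DAT hive
dump_lastwrite("NTUSER.DAT", "Software\\Microsoft\\Windows\\CurrentVersion\\Run")
```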

Particularly for Windows systems, there are a number of sources of timestamped data that can be added to a timeline...metadata from shortcut files, Prefetch files, documents, etc. There are also Event Log records, and entries from other logs (mrt.log, AV logs, etc.).
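
The trick is getting all of those sources into one normalized form. One simple approach is a pipe-delimited line holding the time, source, host, user, and a short description; the sketch below shows that idea, with event values that are made up purely for illustration:

```python
def tln(epoch, source, host="", user="", desc=""):
    """Build one pipe-delimited timeline line: time|source|host|user|description.
    'epoch' is a Unix time value, normalized to UTC."""
    return "%d|%s|%s|%s|%s" % (int(epoch), source, host, user, desc)

# Illustrative events only -- the values are made up
events = [
    tln(1268074800, "PREF", "XPBOX", "", "app.exe last run [5 times]"),
    tln(1268074805, "FILE", "XPBOX", "", "Prefetch file APP.EXE-12345678.pf last modified"),
    tln(1268074790, "EVT",  "XPBOX", "jdoe", "Security/528;Logon type 10"),
]

# Sort the combined events into a single timeline
for line in sorted(events, key=lambda e: int(e.split("|")[0])):
    print(line)
```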

So, while individual sources of timeline data may provide the analyst with varying levels of relative confidence as to the veracity and validity of the data, populating a timeline with multiple sources of data can serve to raise the analyst's level of relative confidence.

Let's look at some examples of how this sort of thinking can be applied. I did PCI breach investigations for several years, and one of the things I saw pretty quickly was that locating "valid" credit card numbers within an image gave a lot of false positives, even with three different checks (i.e., overall length, BIN, and Luhn check). However, as we added additional checks for track data, our confidence that we had found a valid credit card number increased. Richard talks about something similar in his Attribution post...by using 20 characteristics, your relative confidence of accurate attribution is increased over using, say, 5 characteristics. Another example is malware detection...running 3 AV scanners provides an analyst with a higher level of relative confidence than running just one, just as following a comprehensive process that includes other checks and tools provides an even higher level of relative confidence.
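
For reference, those three checks are cheap to stack. Here's a minimal sketch; the BIN prefixes are a tiny illustrative subset, not a real BIN table:

```python
def luhn_ok(number):
    """Luhn (mod 10) check on a string of digits."""
    checksum = 0
    for i, d in enumerate(int(c) for c in reversed(number)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

def looks_like_ccn(candidate, known_bins=("4", "51", "52", "53", "54", "55")):
    """Three quick filters: overall length, issuer (BIN) prefix, Luhn check."""
    return (13 <= len(candidate) <= 16
            and candidate.startswith(known_bins)
            and luhn_ok(candidate))

print(looks_like_ccn("4111111111111111"))   # test number, passes all three checks
```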

Another aspect of timeline analysis that isn't readily apparent is that as we add more sources, we also add context to the data. For example, say we have a Prefetch file from an XP or Vista system, so we have the metadata from that Prefetch file. If we add the file system metadata, we have when the file was first created on the system, and the last modification time of the file should be very close to the last-run timestamp we extract from the Prefetch file metadata. We may also have other artifacts from the file system metadata, such as other files created or modified as a result of the application being run. Now, Prefetch files and file system metadata apply to the system, but not to a specific user...so we may get a great deal of context if we find an indication of which user launched the application, as well as when they took that action. We may also get additional context from an Event Log record that shows, perhaps, a logon with event ID 528, type 10, indicating a login via RDP. But wait...we know that the user to whom that account is assigned was in the office that day...
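
A quick way to express that kind of corroboration in code is to compare the last-run time carved from the Prefetch file's metadata against the .pf file's last-modified time from the file system, and note when the two independent sources agree. The function and values below are purely illustrative:

```python
from datetime import datetime, timedelta

def corroborates(pf_last_run, fs_last_modified, tolerance=timedelta(seconds=10)):
    """Return True when the last-run time from the Prefetch metadata falls
    within 'tolerance' of the .pf file's last-modified time.  Agreement between
    two independent sources raises confidence; a large gap deserves a closer look."""
    return abs(pf_last_run - fs_last_modified) <= tolerance

# Illustrative values only
pf_time = datetime(2010, 3, 8, 14, 22, 31)
fs_time = datetime(2010, 3, 8, 14, 22, 33)
print(corroborates(pf_time, fs_time))   # True -- the sources agree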

See how using multiple data sources builds our "story" and adds context to our data? Further, the more data we have that shows the same or similar artifacts, the greater relative confidence we have in the data itself. This is, of course, in addition to the relative level of confidence that we have in the various individual sources. I'm not a mathy guy, so I'm not really sure how to represent this in a way that's not purely arbitrary, but to me, this is really a compelling reason for creating timelines for analysis.
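
That said, one simple (and admittedly arbitrary) way to think about it is to treat each source's confidence as an independent probability, so that the chance of every source being wrong at the same time shrinks as sources are added. The numbers below are subjective and purely for illustration:

```python
def combined_confidence(confidences):
    """Combine per-source confidence values (0.0 - 1.0), treating each source
    as independent; the remaining 'doubt' is the chance all sources are wrong."""
    doubt = 1.0
    for c in confidences:
        doubt *= (1.0 - c)
    return 1.0 - doubt

# Prefetch metadata alone vs. Prefetch + file system + Event Log
print(combined_confidence([0.7]))             # 0.70
print(combined_confidence([0.7, 0.6, 0.5]))   # 0.94
```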

What say you?