Timeline Creation and Analysis
I haven't really talked about timelines in a while, in part because I've been creating and analyzing them as part of every engagement I've worked. I do this because in most...well, in all cases...the analysis I need to do involves something that happened at a certain time. Sometimes it's a matter of determining what the event or events were, other times it's a matter of determining when the event(s) happened. The fact is that the analysis involves something that happened at some point in time...and that's the perfect time to create a timeline.
With respect to creating timelines, I'm not the only one using them. Chris posted recently on using the TSK tool fls to create a bodyfile from a live system, and Rob posted on creating timelines from Windows Volume Shadow Copies. Using Volume Shadow Copies to create timelines is a great way to get a view into the state of the system at some point in the past...something that can be extremely valuable in an investigation.
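If you're starting from an fls bodyfile, getting it into a timeline is mostly a matter of parsing delimited text. Here's a minimal sketch (not any particular released tool) that assumes the TSK 3.x bodyfile layout (MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime) and writes events out in the five-field TLN format (time|source|host|user|description) I'll mention again below; the host name is something you'd supply yourself:

# bodyfile_to_tln.py - minimal sketch; assumes TSK 3.x bodyfile fields:
# MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime
import sys

def bodyfile_to_tln(path, host="HOSTNAME"):
    labels = ("A", "M", "C", "B")              # accessed, modified, changed, born
    with open(path, "r") as bodyfile:
        for line in bodyfile:
            fields = line.rstrip("\n").split("|")
            if len(fields) < 11:
                continue
            name = fields[1]
            times = fields[7:11]               # atime, mtime, ctime, crtime
            for label, value in zip(labels, times):
                if value not in ("", "0"):
                    # TLN: time|source|host|user|description
                    print("%s|FILE|%s||%s %s" % (value, host, label, name))

if __name__ == "__main__":
    bodyfile_to_tln(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else "HOSTNAME")

Grouping identical timestamps into a single MACB-style entry is a nice refinement, but even this simple pass gets the file system metadata into the same normalized format as everything else you're going to add.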
These are great places to start, but consider all that you could do if you took advantage of other data on the system. In order to get a more granular view into what happened when on a system, timelines should incorporate other data sources. Incorporating Event Log (.evt, .evtx) records may show you who was logged on to the system, and how (i.e., locally, via RDP, etc.). Now, auditing isn't always enabled, or enabled enough to provide indications of what you're looking for, but many times there's some information there that may be helpful.
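As an example of what that can look like, here's a rough sketch that pulls successful logon events out of a Vista/Win7 Security.evtx file using the python-evtx library (an assumption on my part; the XP/2003 .evt format takes a different parser, and event ID 4624 is the Vista-and-later equivalent of 528). Treat the string matching as illustrative rather than robust:

# evtx_logons_to_tln.py - rough sketch; assumes the python-evtx library is installed
import sys
from datetime import datetime
from Evtx.Evtx import Evtx

def logons_to_tln(path, host="HOSTNAME"):
    epoch_start = datetime(1970, 1, 1)
    with Evtx(path) as log:
        for record in log.records():
            xml = record.xml()
            # crude check for event ID 4624 (successful logon); a real parser
            # would walk the XML to pull out the logon type and account name
            if ">4624<" in xml:
                # timestamp() is treated here as a naive UTC datetime
                seconds = int((record.timestamp() - epoch_start).total_seconds())
                print("%d|EVTX|%s||Security/4624 - successful logon" % (seconds, host))

if __name__ == "__main__":
    logons_to_tln(sys.argv[1])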
Including user web browser activity in a timeline has been extremely useful in tracking down things like browser drive-bys, etc. For example, by including web browsing activity, you may see the site that the user visited just prior to a DLL being created on the system and a BHO being added to the Registry. Also, don't forget to check the user's Bookmarks or Favorites...there are timestamps in those files, as well.
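For example, if the user was running Firefox, the visit history lives in an SQLite database and is easy to pull into the same format. This sketch assumes the usual moz_places/moz_historyvisits schema, where visit_date is stored as microseconds since the Unix epoch (IE's index.dat takes a different parser entirely), and the user name is something you'd fill in yourself:

# firefox_history_to_tln.py - minimal sketch against a copy of places.sqlite
import sqlite3
import sys

def firefox_history_to_tln(places_db, host="HOSTNAME", user="kanye"):
    conn = sqlite3.connect(places_db)
    query = """SELECT v.visit_date, p.url
               FROM moz_historyvisits v JOIN moz_places p ON v.place_id = p.id
               ORDER BY v.visit_date"""
    for visit_date, url in conn.execute(query):
        # visit_date is microseconds since the Unix epoch; TLN wants seconds
        print("%d|FIREFOX|%s|%s|URL visited: %s" % (visit_date // 1000000, host, user, url))
    conn.close()

if __name__ == "__main__":
    firefox_history_to_tln(sys.argv[1])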
When I was working at IBM and conducting data breach investigations, many times we'd see SQL Injection being used in some manner. Parsing all of the web server logs for the necessary data required an iterative approach (i.e., search for SQL injection, collect IP addresses, re-run searches for the IP addresses, etc.), but adding those log entries to the timeline can provide a great deal of context to your analysis. Say, for example, that the MS SQL Server database is on the same system as the IIS web server...any commands run via SQLi would leave artifacts on that system, just as creating/modifying files would. If the database is on another system entirely, and you're using the five-field TLN format, you can easily correlate data from both systems in the same timeline (taking clock skew into account, of course). This works equally well for exams involving ASP or PHP web shells, as you can see where the command was sent (in the web server logs), as well as the artifacts (within the file system, other artifacts), all right there together.
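As a rough illustration, here's what pulling suspicious requests out of an IIS W3C-format log into TLN might look like; the field positions are read from the #Fields: header rather than hard-coded, and the indicator list is obviously just a starting point, not a complete SQLi signature set:

# iis_sqli_to_tln.py - rough sketch; assumes W3C extended log format with a #Fields: line
import calendar
import sys
import time

INDICATORS = ("xp_cmdshell", "declare", "cast(", "varchar", "exec(")   # starting point only

def iis_log_to_tln(log_path, host="WEBSERVER"):
    fields = []
    with open(log_path, "r") as log:
        for line in log:
            line = line.rstrip("\r\n")
            if line.startswith("#Fields:"):
                fields = line.split()[1:]
                continue
            if line.startswith("#") or not fields:
                continue
            values = dict(zip(fields, line.split()))
            uri = values.get("cs-uri-stem", "") + " " + values.get("cs-uri-query", "")
            if any(ind in uri.lower() for ind in INDICATORS):
                # W3C-format IIS logs record date/time in UTC
                ts = time.strptime(values["date"] + " " + values["time"], "%Y-%m-%d %H:%M:%S")
                epoch = calendar.timegm(ts)
                print("%d|IIS|%s||%s %s from %s" % (epoch, host,
                      values.get("cs-method", ""), uri, values.get("c-ip", "")))

if __name__ == "__main__":
    iis_log_to_tln(sys.argv[1])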
Consider all of the other sources of data on a system, such as other application (i.e., AV, etc.) logs. And don't get me started on the Registry...well, okay, there you go. There're also Task Scheduler Logs, Prefetch files, as well as metadata from other files (PDF, Office documents, etc.) that can be added to a timeline as necessary. Depending on the system, and what you're looking for, there can be quite a lot of data.
But what does this work get me? Well, a couple of things, actually. For one, there's context. Say you start with the file system metadata in your timeline, and you kind of have a date that you're interested in, when you think the incident may have happened. So, you add the contents of the Event Logs, and you see that the user "Kanye" logged in...event ID 528, type 10. Hey, wait a sec...since when does "Kanye" log in via RDP? Why would he, if he's in the office, sitting at his desk? So then we add the user Registry hive information, and we see "cmd\1" in the RunMRU key (the most recent entry), and shortly thereafter we notice that "Kany3" logged in via RDP. We can get the user information from the SAM hive, as well as any additional information from the newly-created user profile. So as we add data, we begin to also add context with respect to activity we're seeing on the system.
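Pulling that kind of data out of the hives doesn't take much, either. As an illustration, this sketch uses the python-registry module (an assumption on my part...any hive parser will do) to dump the RunMRU values and the key's LastWrite time from a user's NTUSER.DAT:

# runmru.py - minimal sketch; assumes the python-registry module is installed
import sys
from Registry import Registry

def dump_runmru(ntuser_path):
    reg = Registry.Registry(ntuser_path)
    key = reg.open("Software\\Microsoft\\Windows\\CurrentVersion\\Explorer\\RunMRU")
    print("RunMRU LastWrite time: %s" % key.timestamp())
    for value in key.values():
        if value.name() != "MRUList":          # MRUList just holds the ordering
            print("  %s -> %s" % (value.name(), value.value()))

if __name__ == "__main__":
    dump_runmru(sys.argv[1])

The LastWrite time of the key tells you when the most recent entry (per the MRUList value) was added, which is exactly the kind of data point that drops neatly into a timeline.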
We can also use the timeline to provide an increasing or higher level of overall confidence in the data itself. Let's say that we start with the file system metadata...well, we know that this may not be entirely accurate, as file system MAC times can be easily manipulated. These times, as presented by most tools, are usually derived from the $STANDARD_INFORMATION attribute within the MFT. However, what if I add the creation date of the file from the $FILE_NAME attribute, or simply compare that value to the creation date from the $STANDARD_INFORMATION attribute? Okay, maybe now I've raised my relative confidence level with respect to the data. Now I add other sources of data, and rather than just seeing a file creation or modification event, I see other activity (within close temporal proximity) that leads up to that event.
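The comparison itself is trivial once you have both sets of timestamps out of the MFT (via whatever MFT parser you prefer). A hypothetical check might look like the following, since time-stomping utilities that only touch the $STANDARD_INFORMATION attribute tend to leave an $SI creation date that's earlier than the one in $FILE_NAME:

# si_vs_fn.py - hypothetical check; the parsed timestamps come from your MFT parser of choice
from datetime import datetime

def flag_possible_timestomp(si_created, fn_created):
    """Flag entries where the $STANDARD_INFORMATION creation time predates $FILE_NAME."""
    return si_created < fn_created

# example values, purely illustrative
si = datetime(2004, 8, 4, 12, 0, 0)     # what most tools (and the attacker) show you
fn = datetime(2009, 6, 15, 3, 22, 10)   # when the file actually appeared on this volume
if flag_possible_timestomp(si, fn):
    print("$SI creation predates $FN creation - possible timestamp manipulation")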
Let's say that I start off with a timeline based just on file system metadata (Windows XP), and I see a file creation event for a Prefetch file. The Prefetch file is for an application accessed through the Windows shell, so I might want to see if the Event Log contained any login information so I could determine which user was logged in, as well as when they'd logged in; however, I find out that auditing of login events is not enabled. Okay, I check the ProfileList key in the Software hive against the user profile directories, and I find out that all users who've logged into the system are local users...so I can go into the SAM hive and get things like Last Login dates. I then parse the UserAssist key for each user, and I find that just prior to the creation of the Prefetch file I'm interested in, the user "Kanye" navigated through the Start button to launch that application. Now, the file system time may be easily changed, but I now have less mutable data (i.e., a timestamp embedded in a binary Registry value) that corroborates the file system time, which increases my relative level of confidence with respect to the data.
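Getting that timestamp out of the UserAssist data isn't magic, either; on XP, each value name is ROT13-encoded and the 16 bytes of data hold a run count and a FILETIME. A minimal sketch (XP-format values only...the GUIDs, offsets, and encoding differ on Windows 7) might look like this, with the sample value being purely illustrative:

# userassist_decode.py - minimal sketch for XP-format UserAssist values (16 bytes of data)
import codecs
import struct
from datetime import datetime, timedelta

def decode_userassist(value_name, value_data):
    name = codecs.decode(value_name, "rot_13")          # value names are ROT13-encoded
    session, count, filetime = struct.unpack("<IIQ", value_data)
    count -= 5                                          # XP run counts start at 5
    # FILETIME: 100-nanosecond intervals since Jan 1, 1601 (UTC)
    last_run = datetime(1601, 1, 1) + timedelta(microseconds=filetime // 10)
    return name, count, last_run

# purely illustrative value; real data comes from the UserAssist\{GUID}\Count key
name, count, last_run = decode_userassist(
    "HRZR_EHACNGU:P:\\JVAQBJF\\flfgrz32\\pzq.rkr",
    struct.pack("<IIQ", 1, 6, 128920000000000000))
print("%s run %d time(s), last at %s UTC" % (name, count, last_run))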
Now, jump ahead a couple of days in time...other things had gone on on the system prior to acquisition, and this time, I'm interested in the creation AND modification times of this Prefetch file. It's been a couple of days, and what I find at this point is that the UserAssist information tells me that the application referred to by the Prefetch file has actually been run several times between the creation and modification dates; now, my UserAssist information corresponds to the modification time of the file. So, now I add metadata from the Prefetch file, and I have data that supports the modification time (the last time the application was run, the timestamp for which is embedded in the Prefetch file, would correspond to when the Prefetch file was last modified), as well as the number of times the user launched the application.
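That Prefetch metadata is just as accessible; on XP/2003 (Prefetch format version 17), the last run time is a FILETIME at offset 0x78 and the run count is a DWORD at offset 0x90. A quick sketch, with those offsets treated as XP-specific assumptions:

# prefetch_meta.py - quick sketch for XP/2003 (.pf version 17) Prefetch files
import struct
import sys
from datetime import datetime, timedelta

def parse_prefetch(pf_path):
    with open(pf_path, "rb") as pf:
        data = pf.read()
    if data[4:8] != b"SCCA":
        raise ValueError("not a Prefetch file")
    version = struct.unpack_from("<I", data, 0)[0]       # 17 = XP/2003
    last_run_ft = struct.unpack_from("<Q", data, 0x78)[0]
    run_count = struct.unpack_from("<I", data, 0x90)[0]
    last_run = datetime(1601, 1, 1) + timedelta(microseconds=last_run_ft // 10)
    return version, last_run, run_count

if __name__ == "__main__":
    version, last_run, run_count = parse_prefetch(sys.argv[1])
    print("version %d, last run %s UTC, run count %d" % (version, last_run, run_count))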
Now, if the application of interest is something like MSWord, I might also be interested in things such as any documents recently accessed, particularly via common dialogs. The point is that most analysts understand that file system metadata may be easily modified, or perhaps misinterpreted; by adding additional information to the timeline, I not only add context, but by adding data sources that are less likely to be modified (timestamps embedded in files as metadata, Registry key LastWrite times, etc.), I can raise my relative level of confidence in the data itself.
One final point...incident responders increasingly face larger and larger data sets, requiring some sort of triage to identify, reduce, or simply prioritize the scope of an engagement. As such, having access to extra eyes and hands...quickly...can be extremely valuable. So consider this...which is faster? Imaging 50 systems and sitting down and going through them, or collecting specific data (file system metadata and selected files) and providing it to someone else to analyze, while on-site response activities continue? The off-site analyst gets the data, processes it and begins analysis, narrowing the scope...now we're down from 50 systems to 10...and most importantly, we're already starting to get answers.
Let's say that I have a system with a 120GB system partition, of which 50GB is used. Which is faster to collect...the overall image, or file system metadata? Which is smaller? Which can be more easily provided to someone off-site? Let's say that the file created when collecting file system metadata is 11MB. Okay...add Registry data, Event Logs, and maybe some specific files, and I'm up to...what...13MB. This is even smaller if I zip it...let's say 9MB. So now, while the next 120GB system is being acquired, I'm providing the data to an off-site analyst, and she's able to follow a process for creating a timeline, and begin analyzing the data. Uploading a 9MB file is much faster than shipping a 120GB image via FedEx.
As a responder, I've had customers in California call me after happy hour on the East Coast, and the first flight out will be sometime in the next 12 hrs. It's usually 4 1/2 hrs to the San Jose area, but 6 hrs to LA or Seattle, WA. Then, depending on where the customer is actually located, it may be another 2 hrs for me to get to the gate, get a rental car, and arrive on-site. However, if there are trained first responders on staff, I can begin analyzing data (and requesting additional data) within, say, 2 hours of the initial call.
So another way cool thing is that this can also be used in data breach cases. How's that? Well, if you're shipping compressed file system metadata to someone (and you've encrypted it), you're not shipping file contents...so you're not exposing sensitive data. Providing the necessary information may not answer the question, but it can definitely narrow down the answer and help to identify and reduce the overall scope of an incident.