Even More Thoughts on Timelines
I realize that I've talked a bit about timeline creation and analysis, and I know that others (Chris and Rob) have covered this subject as well.
I also realize that I may have a somewhat different way of going about creating timelines and conducting analysis. I don't think that any of the approaches I've seen so far is wrong; I just think that we approach things a bit differently.
For example, I am not a fan of adding everything to a timeline, at least not as an initial step. In my experience, there's a lot of noise in just the file system metadata...many times, by the time I get notified of an incident, an admin has already logged into the system, installed and run some tools (including anti-virus), and maybe even deleted files and accounts. When interviewed about the incident and asked the specific question, "what actions have you performed on the system?", that admin most often says "nothing"...only because they perform these actions every day and consider them trivial. On top of that, there's all the other stuff...automatic updates to Windows or to any of the installed applications, etc....that adds a great deal of material that needs to be culled through in order to find what you're looking for. Adding a lot of raw data to the timeline right up front may mean that you're adding a lot of noise into that timeline, and not a lot of signal.
Goals
Generally, the first step I take is to look at the goals of my exam, and try to figure out which data sources are likely to provide the most relevant data. Sometimes it's not a matter of just automatically parsing file system metadata out of an image; in a recent exam involving "recurring software failures", my initial approach was to parse just the Application Event Log to see if I could determine the date range and relative frequency of such events, in order to get an idea of when those recurring failures would have occurred. In another exam, I needed to determine the first occurrence within the Application Event Log of a particular event ID generated by the installed AV software; to do so, I created a "nano-timeline" by parsing just the Application Event Log and running the output through the find command to get the event records I was interested in. This provided me with a frame of reference for most of the rest of my follow-on analysis.
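As a rough illustration of that kind of "nano-timeline" (a Python sketch, rather than the Perl tools named above), the following assumes the Application Event Log has already been parsed into a text file with one pipe-delimited record per line; the file name, field layout, and target event ID are all hypothetical and would need to match your own parser's output.

# nano_timeline.py - sketch: filter parsed Application Event Log records
# down to a single event ID and print them in time order.
# Assumes appevent_parsed.txt holds one record per line in the form:
#   <unix_epoch>|<source>|<event_id>|<message>   (hypothetical layout)
from datetime import datetime, timezone

TARGET_ID = "51"    # hypothetical event ID generated by the AV product

records = []
with open("appevent_parsed.txt") as f:
    for line in f:
        parts = line.rstrip("\n").split("|", 3)
        if len(parts) < 4:
            continue
        epoch, source, event_id, message = parts
        if event_id == TARGET_ID:
            records.append((int(epoch), source, message))

# Oldest record first; the first line gives the earliest occurrence of the
# event ID, which frames the rest of the analysis.
for epoch, source, message in sorted(records):
    ts = datetime.fromtimestamp(epoch, tz=timezone.utc)
    print(ts.strftime("%Y-%m-%d %H:%M:%S UTC"), source, message)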
Selecting Data Sources
Again, I often select data sources to include in a timeline by closely examining the goals of my analysis. However, I am also aware that in many instances, the initial indicator of an incident is often only the latest indicator, albeit the first to be recognized as such. That being the case, when I start to create a timeline for analysis, I generally start off by creating a file of events from the file system metadata and the available Event Log records, and by running the .evt files through evtrpt.pl to get an idea of what I should expect to see from the Event Logs. I also run the auditpol.pl RegRipper plugin against the Security hive in order to see (a) what events are being logged, and (b) when the contents of that key were last modified. If this date is pertinent to the exam, I'll be sure to include an appropriate event in my events file.
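If that LastWrite time does turn out to be pertinent, adding it as an event is straightforward. The sketch below (again Python, and mine rather than part of auditpol.pl) assumes you already have the key's LastWrite time as a 64-bit FILETIME value and appends a five-field, TLN-style line (time|source|host|user|description) to the events file; the FILETIME value, host name, and file names are placeholders.

# Append a TLN-style event for the audit policy key's LastWrite time.
def filetime_to_unix(ft):
    # FILETIME counts 100-nanosecond intervals since 1601-01-01;
    # the Unix epoch starts 11,644,473,600 seconds later.
    return ft // 10**7 - 11644473600

last_write_ft = 0x01CAC8F1A2B3C4D5   # hypothetical LastWrite time from the Security hive

with open("events.txt", "a") as out:
    out.write("%d|REG|SERVER01|-|Security hive - audit policy key last written\n"
              % filetime_to_unix(last_write_ft))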
Once my initial timeline has been established and analysis begins, I can go back to my image and select the appropriate data sources to add to the timeline. For example, during investigations involving SQL injection, the most useful data source is very likely going to be the web server logs...in many cases, these may provide an almost "doskey /history"-like timeline of commands. However, adding all of the web server logs to the timeline means that I'm going to end up inundating my timeline with a lot of normal and expected activity...if any of the requested pages contain images, then there will be a lot of additional, albeit uninteresting, information in the timeline. As such, I would narrow down the web log entries to those of interest, beginning with an iterative analysis of the web logs, and add the resulting events to my timeline.
That's what I ended up doing, and like I said, the results were almost like running "doskey /history" on a command prompt, except I also had the time at which the commands were entered as well as the IP address from which they originated. Having them in the timeline let me line up the incoming commands with their resulting artifacts quite nicely.
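As a rough sketch of that iterative narrowing (not the actual tooling used on that engagement), the following reads IIS W3C-format logs, uses the #Fields header to locate the columns, and keeps only requests whose query string contains a few common SQL injection markers. The log location and keyword list are assumptions you would tune for the case at hand.

import glob
import re

# Hypothetical starting point for the keyword list; refine it iteratively
# as you learn what the attacker's requests actually look like.
SQLI_MARKERS = re.compile(r"xp_cmdshell|exec\b|cast\(|declare\b|varchar\(", re.I)

for logfile in sorted(glob.glob("weblogs/ex*.log")):   # assumed log location
    columns = []
    with open(logfile, errors="replace") as f:
        for line in f:
            if line.startswith("#Fields:"):
                columns = line.split()[1:]      # column names from the W3C header
                continue
            if line.startswith("#") or not columns:
                continue
            values = dict(zip(columns, line.split()))
            query = values.get("cs-uri-query", "-")
            if SQLI_MARKERS.search(query):
                # date, time, client IP, page, and query string for the timeline
                print(values.get("date", "?"), values.get("time", "?"),
                      values.get("c-ip", "?"), values.get("cs-uri-stem", "?"), query)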
Registry Data
The same holds true with Registry data. Yes, the Registry can be a veritable gold mine of information (and intelligence) that is relevant to your examination, but like a gold mine, that data must be dug out and refined. While there is a great deal of interesting and relevant data in the system hives (SAM, Security, Software, and System), the real wealth of data will often come from the user hives, particularly for examinations that involve some sort of user activity.
Chris and Rob advocate using regtime.pl to add Registry data to the timeline, and that's fine. However, it's not something I do. IMHO, adding Registry data to a timeline by listing each key by its LastWrite time is way too much noise and not nearly enough signal. Again, that's just my opinion, and doesn't mean that either one of us is doing anything wrong. Using tools like RegRipper, MiTec's Registry File Viewer, and regslack, I'm able to go into a hive file and get the data I'm interested in. For examinations involving user activity, I may be most interested in the contents of the UserAssist\Count keys (log2timeline extracts this data, as well), but the really valuable information from these keys isn't the key LastWrite times; rather, it's the time stamps embedded in the binary data of the values beneath them.
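To illustrate what "embedded in the binary data" means, here's a short sketch; it assumes the XP-era layout in which each UserAssist\Count value holds 16 bytes (a 4-byte session ID, a 4-byte run count, and an 8-byte FILETIME of the last run), with the value name itself ROT13-encoded. The sample bytes and value name are made up for the example.

import codecs
import struct
from datetime import datetime, timezone

def parse_userassist_value(data):
    # Assumed XP-era layout: 4-byte session ID, 4-byte run count,
    # 8-byte FILETIME recording when the program was last run.
    if len(data) < 16:
        return None
    session, count, filetime = struct.unpack("<IIQ", data[:16])
    if filetime == 0:
        return None
    last_run = datetime.fromtimestamp(filetime / 10**7 - 11644473600, tz=timezone.utc)
    return count, last_run

# Value names under UserAssist\Count are ROT13-encoded.
name = codecs.decode("HRZR_EHACNGU:P:\\Jvaqbjf\\flfgrz32\\pnyp.rkr", "rot_13")
raw = struct.pack("<IIQ", 1, 11, 0x01CAC8F1A2B3C4D5)   # made-up value data
print(name, parse_userassist_value(raw))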
If you're parsing any of the MRU keys, these too can vary with respect to where the really valuable data resides. In the RecentDocs key, for example, the values (as well as those within the RecentDocs subkeys) are maintained in an MRU list; therefore, the LastWrite time of the RecentDocs key is of limited value in and of itself. The LastWrite time of the RecentDocs key has context when you determine what action caused the key to be modified; was a new subkey created, or was another file opened, or was a previously-opened file opened again, modifying the MRUListEx value?
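For what it's worth, the MRUListEx value itself is just a sequence of little-endian 4-byte indices terminated by 0xFFFFFFFF, with the first index identifying the most recently used entry; a sketch like the following (with made-up data) shows how that ordering can be recovered so the key's LastWrite time can be tied to the most recent item.

import struct

def mrulistex_order(data):
    # MRUListEx: little-endian 4-byte indices, most recent first,
    # terminated by 0xFFFFFFFF.
    order = []
    for (idx,) in struct.iter_unpack("<I", data):
        if idx == 0xFFFFFFFF:
            break
        order.append(idx)
    return order

sample = struct.pack("<5I", 3, 0, 2, 1, 0xFFFFFFFF)   # made-up MRUListEx data
print(mrulistex_order(sample))    # -> [3, 0, 2, 1]; entry 3 was opened most recently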
Files opened via Adobe Reader are maintained in an MRU list as well, but in a vastly different format from what's used by Microsoft; the most recently opened document is recorded in the AVGeneral\cRecentFiles\c1 subkey, with the name of the actual file embedded in a binary value named tDIText. On top of that, when a new file is opened, it takes over the c1 subkey, so all of the other subkeys that refer to opened documents shift down (what was c1 becomes c2, c2 becomes c3, etc.), and their LastWrite times are all updated to the time that the most recent file was opened.
Browser Stuff
Speaking of user activity, let's not forget browser cache and history files, as well as Favorites and Bookmarks lists. It's very telling to see a timeline based on file system metadata with a considerable number of files being created in browser cache directories, and then to add data from the browser history files and get that contextual information regarding where those new files came from. If a system was compromised by a browser drive-by, you may be able to discover the web site that served as the initial infection vector.
Identifying Other Data Sources
There may be times when you want to identify other data sources that may be of use to your examination. I do tend to check the contents of the Task Scheduler service log file (SchedLgU.txt) for indications of scheduled tasks being run, particularly during intrusion exams; however, this log file can also be of use if you're trying to determine whether the system was up and running during a specific timeframe. This may be much more pertinent to laptops and desktop systems than to servers, which may not be rebooted often, but if you look in the file, you'll see messages such as:
"Task Scheduler Service"
Started at 3/21/2010 6:18:19 AM
"Task Scheduler Service"
Exited at 3/21/2010 9:10:32 PM
In this case, the system was booted around 6:18am on 21 March, and shut down around 9:10pm that same day. These times are recorded in local system time, and may need to be adjusted based on the timezone, but they do provide information regarding when the system was operating, particularly when combined with Registry data regarding the start type of the service, and can be valuable in the absence of, or when combined with, Event Log data.
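Pulling those operating windows out of the log is simple to script. The sketch below assumes a copy of SchedLgU.txt exported from the image (the file is often Unicode/UTF-16 on disk; adjust the encoding if your copy differs) and simply lists the "Started at" and "Exited at" messages in time order.

import re
from datetime import datetime

PATTERN = re.compile(r"(Started|Exited) at (\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)")

# SchedLgU.txt is frequently Unicode (UTF-16) on disk.
with open("SchedLgU.txt", encoding="utf-16", errors="replace") as f:
    text = f.read()

events = []
for kind, stamp in PATTERN.findall(text):
    events.append((datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"), kind))

# "Started" marks the service (and usually the system) coming up; "Exited"
# marks a clean shutdown. Times are local system time, not UTC.
for when, kind in sorted(events):
    print(when.strftime("%Y-%m-%d %H:%M:%S"), kind)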
There may be other data sources available, but not all of them may be of use, depending upon the goals of your examination. For example, some AV applications record scan and detection information in the Event Log as well as in their own log files. If the log file provides no indication that any malware had been identified, is it of value to include all scans (with "no threats found") in the timeline, or would it suffice to note that fact in your case notes and the report?
Summary
Adding all available data sources to a timeline can quickly make that timeline very unwieldy and difficult to manage and analyze. Through education, training, and practice, analysts can begin to understand what data and data sources would be of primary interest to an examination. Again, this all goes back to your examination goals...understand those, and the rest comes together quite nicely.
Finally, I'm not saying that it's wrong to incorporate all available data sources into a timeline...not at all. Personally, I like having the flexibility to create mini-, micro-, or nano-timelines that show specific details, and then being able to go back and view what I learned in the context of the overall timeline.