BinMode: Understanding Data Structures
As most analysts are aware, the tools we use provide a layer of abstraction over the data with which we're engaged. What we see can often depend upon the tool that we're using. For example, if a tool is written by a developer and the intended user is an administrator, then while the tool may be useful to a DFIR analyst, it may not provide all of the information that is truly useful to that analyst, based on the goals of their examination, the data that's actually presented by the tool, etc.
This is why understanding the data structures that we're working with can often be very beneficial.
By understanding what is actually available in the data structures, we can:
1. Make better use of the information that is available.
2. Locate deleted data in unallocated space (or other unstructured data)
A good recent example of this is the discussion of Java *.idx files, and the resulting parsers that have been created. Understanding the actual data structures that make up the headers and subsequent sections of these files lets us understand what we're looking at. For example, for a successful download, the header contains a field that tells us the size of the content. Most legitimate downloads also include this information in the server response, but malicious downloads of Java content don't always include this information. As such, we not only have a good way of determining what may be a suspicious download, but we also have a pivot point we can use...we can use the content size to look for files of that size that were created on the system.
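The size pivot can be sketched in a few lines of Python. This is only an illustration: it assumes you've already pulled the content size out of the *.idx header with one of the available parsers, and the function name and arguments are made up for the example.

```python
import os

def find_files_by_size(root, size):
    """Return a sorted list of paths under root whose size in bytes equals size."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) == size:
                    matches.append(path)
            except OSError:
                # Unreadable or vanished entries are skipped, not treated as fatal.
                continue
    return sorted(matches)
```

Run against a mounted, read-only image, the resulting list still has to be narrowed down, for instance by comparing file creation times to the download time recorded in the *.idx file.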
Another example of this is the IE history file format (thanks to Joachim for all the work he's done in documenting the format). A lot of analysts run various tools to parse out the user's IE web browser history, but how many understand what's actually in the structure? I'm not saying that you should have memorized it and parse everything by hand, but rather that you should know enough about it to at least be curious when something is missing. For example, according to Joachim's documentation, each "URL" record can contain multiple time stamps, including when the page requested was last modified, last sync'd, and when it expires. I say "can" because in some cases, these may be set to 0. Further, according to the documentation, there's a flag setting that we can use to determine if the HTTP request was a GET or POST request.
How else can this be helpful? Mandiant's recently released APT intel report provides a great deal of useful information, including references to "wininet" on page 31. The WinInet API is what produces the artifacts most commonly associated with IE. As such, if your organization uses Firefox or Chrome rather than IE and a user profile still shows IE history, or if you see that the "Default User", "NetworkService", or "LocalService" profiles begin developing quite large IE histories, this may be an indicator of something other than the user's browser making use of the WinInet API, such as malware.
I've used the documented format for Windows XP and 2003 Event Log records to not only parse the Event Log files on a system, but also locate and recover deleted event records from unallocated space. In fact, I had an instance where the intruder had cleared the Security Event Log after gaining access to the system, but I was able to recover 334 deleted event records from unallocated space, including the record that showed when they'd initially logged into the system.
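To give a feel for what that kind of recovery involves, here's a minimal sketch in Python. It relies on the documented EVENTLOGRECORD layout: a 56-byte fixed header that starts with the record length followed by the "LfLe" magic number, and a copy of the length repeated as the last four bytes of the record. The function name and the choice of returned fields are just for illustration; a real carver would also read the data in chunks and parse the strings, SID, and data sections.

```python
import struct
from datetime import datetime, timezone

MAGIC = b"LfLe"
# Length, magic, record number, time generated, time written, event ID,
# event type, number of strings, category, reserved flags, closing record
# number, string offset, SID length, SID offset, data length, data offset.
HEADER = struct.Struct("<I4sIIIIHHHHIIIIII")  # 56 bytes

def carve_evt_records(buf):
    """Yield (offset, record_number, time_generated, event_id) for each
    plausible XP/2003 event record found in a bytes buffer."""
    pos = buf.find(MAGIC)
    while pos != -1:
        start = pos - 4  # the length field sits just before the magic number
        if start >= 0 and start + HEADER.size <= len(buf):
            fields = HEADER.unpack_from(buf, start)
            length = fields[0]
            end = start + length
            # Sanity checks: room for the header plus the trailing length,
            # and the trailing copy must match. This also skips the 0x30-byte
            # file header, which carries the same magic number.
            if length >= HEADER.size + 4 and end <= len(buf):
                (tail,) = struct.unpack_from("<I", buf, end - 4)
                if tail == length:
                    generated = datetime.fromtimestamp(fields[3], tz=timezone.utc)
                    yield start, fields[2], generated, fields[5] & 0xFFFF
        pos = buf.find(MAGIC, pos + 1)
```

Matching the leading and trailing length values is what keeps false positives down; the magic number alone turns up in plenty of places that aren't valid records.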
Addendum, 20 Feb: I've mentioned DOSDate format time stamps in this blog before, as well as the data structures in which they're used (shell items in shellbag artifacts, ComDlg32 Registry subkey values, LNK files, Jump Lists, etc.). They're very pervasive across all Windows platforms, and more so on Windows 7 systems. This MS link provides some information regarding how these time stamps are constructed, as well as how they can play havoc with timeline analysis if you're not familiar with them.
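For anyone who hasn't unpacked one of these by hand, here's a short Python sketch of the decoding. The bit layout is the documented MS-DOS one: the date word holds the day in bits 0-4, the month in bits 5-8, and the year as an offset from 1980 in bits 9-15; the time word holds seconds divided by two in bits 0-4, minutes in bits 5-10, and hours in bits 11-15. The function name is made up for the example, and returning None for out-of-range values is simply a choice made here.

```python
from datetime import datetime

def dosdate_to_datetime(dos_date, dos_time):
    """Convert 16-bit DOS date and time words to a naive datetime.

    Returns None if the packed values do not form a valid date or time.
    """
    day = dos_date & 0x1F
    month = (dos_date >> 5) & 0x0F
    year = ((dos_date >> 9) & 0x7F) + 1980
    second = (dos_time & 0x1F) * 2  # stored with two-second granularity
    minute = (dos_time >> 5) & 0x3F
    hour = (dos_time >> 11) & 0x1F
    try:
        return datetime(year, month, day, hour, minute, second)
    except ValueError:
        return None
```

Two things stand out for timeline work: the seconds value is always even, and nothing in the structure records a time zone, so the result is left naive and has to be interpreted based on what's known about the artifact it came from.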
This doesn't apply just to Windows systems. A great example of this is Mari's recent blog post, Finding and Reverse Engineering Deleted SMS Messages. Mari provides a complete walk-thru of the data source being examined, going so far as to not only identify the data structure, but to also demonstrate how she did this, as well as to show how someone could go about identifying deleted SMS messages.
What's really interesting is that back in the day, there used to be more of a focus on understanding data structures. For example, some DF training programs would require candidates to parse partition tables and compute NTFS MFT data runs.
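As an illustration of the sort of exercise involved, here's a minimal Python sketch for decoding an NTFS data run list. It assumes the commonly documented encoding: each run starts with a header byte whose low nibble gives the size in bytes of the run length and whose high nibble gives the size in bytes of the offset, the offset is a signed little-endian value relative to the starting cluster of the previous run, a run with no offset bytes is sparse, and a 0x00 byte ends the list. The function name is made up for the example.

```python
def decode_data_runs(raw):
    """Decode an NTFS run list into (starting_lcn, cluster_count) tuples.

    starting_lcn is None for sparse runs, which have no clusters on disk.
    """
    runs = []
    pos = 0
    lcn = 0  # offsets are relative to the previous non-sparse run
    while pos < len(raw) and raw[pos] != 0x00:
        header = raw[pos]
        len_size = header & 0x0F
        off_size = header >> 4
        pos += 1
        if len_size == 0 or pos + len_size + off_size > len(raw):
            raise ValueError("truncated or malformed run list")
        count = int.from_bytes(raw[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            runs.append((None, count))
        else:
            delta = int.from_bytes(raw[pos:pos + off_size], "little", signed=True)
            lcn += delta
            runs.append((lcn, count))
        pos += off_size
    return runs
```

For example, the run list 21 18 34 56 00 decodes to a single run of 0x18 clusters starting at cluster 0x5634. Working a few of these out by hand first makes it much easier to spot when a tool's output doesn't add up.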
Training
Interested in Windows DFIR training? Windows Forensic Analysis, 11-12 Mar; Timeline Analysis, 9-10 Apr. Pricing and Calendar. Send email here to register. Each course includes access to tools and techniques that you won't find anywhere else, as well as a demonstration of the use of the Forensic Scanner.
On 10-12 June 2013, a Windows Forensic Analysis and Registry Analysis combo course will be hosted at the Santa Cruz PD training facility.
Course descriptions and other info on the courses are available here. Pricing for the combo course is $749 per seat, and will be listed on the ASI training page shortly.