Defining "Forensic Value"
Who defines "forensic value" when it comes to data? How does a string of 1s and 0s become "valuable", "evidence" or "intelligence"?
These are questions I've been asking myself lately. I've recently seen purveyors of forensic analysis applications announce that a particular capability has been added (or is in the process of being added) to their application/framework, without conveying the value of the data being presented, or how it would be useful to a practitioner. Sure, it's great that you've added that functionality, or that you will be doing so in the very near future, but what is the value of the data that the capability provides, and how can it be used? Do your users recognize the value of the data that you're providing? If not, do you have a way of educating your users?
I was also thinking about these questions during my presentation at OSDFC...I was talking about extending RegRipper into more of a forensic scanner, and found myself looking out across a sea of blank stares. In fact, at one point I asked the audience if what I was referring to made sense, and the only person to react was Cory. ;-) As a practitioner, I believe that there is significant value in preserving and sharing the collective knowledge and experience of a group of practitioners. I believe that being able to quickly determine the existence (or absence) of a number of artifacts and removing that "low hanging fruit" (i.e., things we've seen before) is and will be extremely valuable. Based on the reaction of the attendees, it appears that Cory and I may be the only ones who see the value of something like this. Does something like this have value?
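To make the "forensic scanner" idea a bit more concrete, here is a minimal sketch (in Python; the two plugin checks are hypothetical examples I've made up for illustration, not actual RegRipper plugins) of the model I was describing: each plugin encodes one artifact check that some practitioner has seen before, and the scanner sweeps all of them against a target such as a mounted image:

import os

def check_zbot_file(root):
    # Hypothetical plugin: flag a file path associated with known malware.
    path = os.path.join(root, "WINDOWS", "system32", "sdra64.exe")
    return "possible Zeus/Zbot artifact: " + path if os.path.exists(path) else None

def check_sched_log(root):
    # Hypothetical plugin: note the presence of the XP scheduled task log.
    path = os.path.join(root, "WINDOWS", "Tasks", "SchedLgU.txt")
    return "scheduled task log present: " + path if os.path.exists(path) else None

PLUGINS = [check_zbot_file, check_sched_log]

def scan(root):
    # Run every plugin and collect the hits...the "low hanging fruit".
    hits = []
    for plugin in PLUGINS:
        result = plugin(root)
        if result:
            hits.append(result)
    return hits

print(scan("F:\\"))    # illustrative mount point for an acquired image

The point isn't the two checks themselves; it's that each plugin preserves one piece of practitioner knowledge, so the whole team benefits from what one analyst has seen before.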
Also, at the conference, there were a number of academics and researchers in attendance (and speaking), along with a number of practitioners. Speaking to some of the practitioners between sessions and after the conference, there was a common desire to have more practical information available, and possibly even separate tracks for practitioners and developers/academics. There seemed to be a common feeling that while developing applications to parse data and run on multiple cores was definitely a good thing, this solved only a limited number of issues and did not address the issues that are on the plates of most practitioners right now. It would be safe to say that many of the practitioners (those I spoke with) didn't see the value in some of the presentations.
One example of this is bulk_extractor (previous version described here), which Simson L. Garfinkel discussed during the conference. This is a tool (Windows EXE/DLLs available) that can be run against an image file; by default, it extracts a number of items, including credit card numbers and CCN track 2 data, and reports the offset within the image file where each item was found. Something like this may seem valuable to those performing PCI forensic exams, but one of the items required for such exams is the name of the file in which the credit card number/track data were located. As such, a tool like bulk_extractor might have the most value during a PCI forensic exam when run against the pagefile and unallocated space extracted from the image. Even so, passing three checks (the Luhn formula, length, and BIN) only tells you that you may have found a CCN...we found that there are a lot of MS DLLs with embedded GUIDs that appear to be Visa CCNs, even passing all three checks. In this case, there is some value in what Simson discussed, although perhaps not at its face value.
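For reference, here is a minimal sketch (in Python; the BIN prefixes are illustrative only, not a complete list of the card brands covered by PCI) of the three checks described above:

def luhn_check(digits):
    # Luhn formula: double every second digit from the right, subtract
    # 9 from any product over 9, and sum; a valid number's total is
    # evenly divisible by 10.
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def looks_like_ccn(candidate):
    digits = "".join(c for c in candidate if c.isdigit())
    # Check 1: length...most CCNs run 13 to 19 digits.
    if not (13 <= len(digits) <= 19):
        return False
    # Check 2: BIN prefix (illustrative only: Visa begins with 4,
    # MasterCard with 51-55).
    if not (digits.startswith("4") or "51" <= digits[:2] <= "55"):
        return False
    # Check 3: the Luhn formula.
    return luhn_check(digits)

Even a candidate that passes all three checks may be a false positive...as noted above, GUIDs embedded in Microsoft DLLs can and do pass.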
As a side note, another thing you might want to do before running the tool is to contact Simson and determine which CCNs the tool searches for, to ensure that all of the CCNs covered by PCI are addressed. When I was doing this work, we had an issue with a commercial tool that wasn't covering all the bases, so to speak...so we rolled our own solution.
Recently, I began looking at Windows 7 Jump Lists, and quickly found some very good information about the structure of both the automatic and custom "destinations" files. One thing I could not find, however, was information regarding the structure of the DestList stream located in the automatic destinations file; to me, this seemed to be of particular value, as the numbered streams follow the MS-SHLLINK file format and contain MAC time stamps for the target file, but nothing about what led to the creation of the stream in the first place. Looking at the contents of the DestList stream in a hex editor, and noticing a number of familiar data structures (FILETIME, etc.), it occurred to me that the DestList stream might act like a most recently used (MRU) or most frequently used (MFU) list. More research is needed, but at this point, I think I may have figured out some of the basic elements of the DestList structure; so far, my parsing code is consistent across multiple DestList streams, including streams from multiple systems. As a practitioner, I can see the value in parsing the Jump List numbered streams, and I believe that there may be more value in the contents of the DestList stream, which is why I pursued examining this structure. But again...who determines the value of something like this? The question, then, is this: is there any value to this information, or is it just an academic exercise? Simply because I, as a practitioner, look at some data and believe that it is valuable, is that then a universal assignment, or is it solely my own province?
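To show what I mean by "familiar data structures", here is a minimal sketch (in Python) of decoding a FILETIME value from raw stream data; the file name and offset below are purely illustrative, since the DestList structure itself isn't documented:

import struct
from datetime import datetime, timedelta

def filetime_to_datetime(raw8):
    # FILETIME: a 64-bit little-endian count of 100-nanosecond
    # intervals since January 1, 1601 (UTC).
    (ft,) = struct.unpack("<Q", raw8)
    return datetime(1601, 1, 1) + timedelta(microseconds=ft // 10)

# Illustrative use: read 8 bytes at a candidate offset within a dumped
# DestList stream and interpret them as a FILETIME.
with open("destlist.bin", "rb") as f:    # hypothetical dump of the stream
    data = f.read()
offset = 0x20                            # illustrative offset, not a documented field
print(filetime_to_datetime(data[offset:offset + 8]))

Spotting runs of bytes that decode to plausible, recent FILETIME values is exactly how structures like this get reverse engineered in the first place.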
Who decides the forensic value of data? Clearly, during an examination the analyst would determine the relative value of data, perhaps based on the goals of the analysis. But when not involved in an examination, who decides the potential or relative value of data?