Metadata
I've blogged about metadata before (here, and here), but it's been a while, and this is a subject worth revisiting every so often. Metadata has long been an issue for users, and a valuable resource for investigators and forensic analysts. There are a number of file types (images, documents) that allow for embedded metadata...this doesn't mean that it's always populated, but I think you'd be surprised how much information is, in fact, leaked via embedded metadata. MS Office documents, PDFs, and JPG images are all known to be capable of carrying a range of embedded metadata.
One example of embedded metadata coming back to bite someone, which I've referenced in my books, is the Blair issue discussed by the ComputerBytesMan. This particular issue dates back to 2003, and it's clear that an older version of MS Word was used at the time. That version of MS Word used the OLE "structured storage" format; more recent versions of Office documents no longer use this format, but it is still used in Jump Lists, Sticky Notes, and IE session restore files.
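Since the older OLE format and the newer Office format call for different parsing tools, it can help to check which one you're looking at before picking a tool. Here's a minimal sketch in Python that looks at the first few bytes of a file. The function names are mine, purely for illustration; the signatures are the well-known OLE compound file header and the ZIP local file header (newer Office documents are ZIP archives). A ZIP signature alone doesn't prove a file is an Office document, so treat the result as a hint.

```python
# Signature of an OLE "structured storage" (compound file) header.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Signature of a ZIP local file header; newer Office documents are ZIP archives.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_container(header: bytes) -> str:
    """Classify a file by its leading bytes: 'ole', 'zip', or 'unknown'."""
    if header.startswith(OLE_MAGIC):
        return "ole"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_container(fh.read(8))
```

Calling sniff_file() on an old .doc file or a Jump List file should give "ole", while a .docx file should give "zip".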
Metadata has also brought down others. In the spring of 2012, metadata embedded in an image taken with a smartphone was used to track down the hacker "w0rmer".
One of the best tools I've found for collecting metadata from a wide range of file types (images, documents) is Phil Harvey's EXIFTool. This is a command line tool (available for Windows, Mac OS X, and Linux), which means it's easy to script; you can write simple batch files to extract metadata from all files in a folder, or all files of a particular type (JPG, DOC/DOCX, etc.) in a directory structure. If you prefer GUI tools, check out the EXIFToolGUI...simply remove the "(-k)" from the EXIFTool file name and put the GUI application in the same directory, and you're ready to go.
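If you'd rather script this in something other than a batch file, here's a rough Python sketch of the same idea. It assumes that exiftool is on the PATH, and it uses the tool's -json, -r (recurse into subdirectories), and -ext (limit to a file extension) options. The function names and the default list of extensions are my own choices for illustration.

```python
import json
import shutil
import subprocess


def build_exiftool_cmd(root, extensions=("jpg", "doc", "docx"), exe="exiftool"):
    """Build an exiftool command line that recurses through root."""
    cmd = [exe, "-json", "-r"]
    for ext in extensions:
        # One -ext option per file type we want processed.
        cmd += ["-ext", ext]
    cmd.append(str(root))
    return cmd


def collect_metadata(root, extensions=("jpg", "doc", "docx")):
    """Run exiftool and return a list of dicts, one per file processed."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool was not found on the PATH")
    result = subprocess.run(
        build_exiftool_cmd(root, extensions),
        capture_output=True,
        text=True,
    )
    # exiftool may print nothing at all if no matching files were found.
    return json.loads(result.stdout or "[]")
```

Each record in the returned list is keyed by tag name, with the file path under "SourceFile", so it's easy to filter for things like GPS tags or author names.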
For more recent versions of MS Office documents, you might consider using read_open_xml_win.pl.
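To give a sense of what's involved with the newer format, here's a minimal Python sketch. This is not how read_open_xml_win.pl works internally (I'm not describing that script here); it's just an illustration that relies on the fact that these documents are ZIP archives, with the core properties (creator, last modified by, and so on) normally stored in docProps/core.xml. The function name is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_core_properties(source):
    """Return the core properties of an Office Open XML file as a dict.

    source may be a file path or a file-like object. An empty dict is
    returned if the archive has no docProps/core.xml part.
    """
    props = {}
    with zipfile.ZipFile(source) as zf:
        if "docProps/core.xml" not in zf.namelist():
            return props
        root = ET.fromstring(zf.read("docProps/core.xml"))
    for child in root:
        # Tags look like "{namespace}creator"; keep only the local name.
        name = child.tag.split("}", 1)[-1]
        props[name] = (child.text or "").strip()
    return props
```

Running read_core_properties() against a .docx file should return names such as "creator" and "lastModifiedBy" where the document has them populated.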
Removing embedded metadata can be pretty easy without employing any special tools. For example, you can remove embedded metadata from JPG images (the format used on digital cameras and smartphones) by using MS Paint to convert the image to TIFF format, then back to JPG.
Metadata can be a very valuable resource for investigators. Computer systems may include a number of images or documents from which metadata can be extracted. When examining systems, analysts should be sure to look for smartphone backup files, as images found in these backups may have considerable intelligence value.
Finding images or documents to check for embedded metadata is easy. Start with your own hard drive or file server. Alternatively, you can run Google searches (e.g., "site:domain.com filetype:doc") and find a great many documents available online.
Speaking of metadata, one file type that contains some interesting metadata is XP/2003 .job files. About three years ago, I wrote a script to parse these files, and I was recently asked to provide a copy of it. I don't usually do that, as most often I don't hear back as to how well the script ran, if at all...but I decided to make an exception and provide the script this time. It turns out that the script had an issue, and Corey Harrell was nice enough to provide a couple of .job files for testing. When I wrote the script, I hadn't had access to any .job files for tasks that had never been run, and the script was failing because I hadn't dealt with the case where the time fields were all zero. Thanks to Corey, I was able to quickly get that fixed and provide a working copy.
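To illustrate the kind of check that was missing, here's a small Python sketch. This is not my original script. It assumes the time value is stored as a Windows SYSTEMTIME structure (eight little-endian 16-bit values: year, month, day of week, day, hour, minute, second, milliseconds), and it leaves locating those 16 bytes within the .job file up to the caller. The function name is hypothetical.

```python
import struct
from datetime import datetime


def parse_systemtime(raw: bytes):
    """Convert a 16-byte SYSTEMTIME into a datetime.

    Returns None when every field is zero, which is the case that would
    otherwise fail: year 0, month 0, day 0 is not a valid date.
    """
    if len(raw) != 16:
        raise ValueError("a SYSTEMTIME structure is exactly 16 bytes")
    fields = struct.unpack("<8H", raw)
    if not any(fields):
        # All zeros: no time was ever recorded (e.g., a task never run).
        return None
    year, month, _day_of_week, day, hour, minute, second, millis = fields
    return datetime(year, month, day, hour, minute, second, millis * 1000)
```

With the all-zero case handled up front, a task that has never been run can be reported as having no time recorded rather than causing the parser to fail.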