Document Metadata

Okay, yeah, so I've been blogging a lot over the past couple of months about extracting document metadata as part of gathering threat intelligence.


This handler diary provided analysis of "malspam pushing Emotet", and this follow up post illustrated how to conduct static analysis of the document itself.  I have used several of the tools mentioned, but had not yet heard of "vipermonkey", and open-source VBA emulator.  Used in conjunction with oledump.py, you can really get a lot of traction with respect to static analysis of the malicious document.

While the second handler diary post focuses on analysis of the malicious macro, what neither post does is illustrate the document metadata. Below is the output of wmd.pl, run against a sample downloaded from VT:

C:\Perl>wmd.pl d:\cases\maldoc\maldoc
--------------------
Statistics
--------------------
File    = d:\cases\maldoc\maldoc
Size    = 215040 bytes
Magic   = 0xa5ec (Word 8.0)
Version = 193
LangID  = Russian

Document has picture(s).

Document was created on Windows.

Magic Created : MS Word 97
Magic Revised : MS Word 97

--------------------
Summary Information
--------------------
Title        : sdf
Subject      : df
Authress     : admin
LastAuth     : admin
RevNum       : 2
AppName      : Microsoft Office Word
Created      : 26.07.2017, 11:51:00
Last Saved   : 26.07.2017, 11:51:00
Last Printed :

--------------------
Document Summary Information
--------------------
Organization : home

...and from oledmp.pl:

C:\Perl>oledmp.pl -f d:\cases\maldoc\maldoc -l
Root Entry  Date: 26.07.2017, 11:51:59  CLSID: 00020906-0000-0000-C000-000000000046
    1 F..   55949                      \Data
    2 F..    7359                      \1Table
    3 F..    4148                      \WordDocument
    4 F.T    4096                      \ SummaryInformation
    5 F.T    4096                      \ DocumentSummaryInformation
    6 D..       0 26.07.2017, 11:51:59 \Macros
    7 D..       0 26.07.2017, 11:51:59 \Macros\VBA
    8 FM.   88908                      \Macros\VBA\ThisDocument
    9 F..     532                      \Macros\VBA\__SRP_2
   10 F..     156                      \Macros\VBA\__SRP_3
   11 FM.    8137                      \Macros\VBA\zjUb2S
   12 FM.    8877                      \Macros\VBA\cvDTF
   13 FM.    4906                      \Macros\VBA\FX9UL
   14 F..   15451                      \Macros\VBA\_VBA_PROJECT
   15 F..     739                      \Macros\VBA\dir
   16 F..    1976                      \Macros\VBA\__SRP_0
   17 F..     198                      \Macros\VBA\__SRP_1
   18 F..      98                      \Macros\PROJECTwm
   19 F..     476                      \Macros\PROJECT
   20 F.T     114                      \ CompObj

We can see that the dates displayed by both tools line up, and we can use oledmp.pl to further list the contents (raw, or hex) of the various streams.  

So, how can any of this be of value, and why does any of this matter?  Well, at BlackHat last week, Allison Wikoff spent a great deal of her time being interviewed about some really fantastic research that she'd conducted on the "Mia Ash" persona (here is the original SecureWorks posting of the results of her research).

From the Wired article:
Eventually, Ash sent the staffer an email with a Microsoft Excel attachment for a photography survey. She asked him to open it on his office network, telling him that it would work best there. After a month of trust-building conversation, he did as he was told. The attachment promptly launched a malicious macro...

So, this really illustrates the dedication of these threat actors...they establish a persona, including social media "pocket litter", and spend time developing a relationship with their target.  As a very small part of her research, Allison took a look at the metadata embedded within the Excel spreadsheet, and found that the user information referred to "Mia Ash".  This further illustrated the depths to which the threat actors would go in order to make the persona appear authentic; not only did they populate multiple social media sites and create a "history" for the persona, but they also ensured that the metadata in the documents sent to intended victims included the 'right' contents to support the persona.  That's right, it's exactly the way it sounds...the metadata embedded in the spreadsheet specifically referred to "Mia Ash" as the authorized user of the MS Office products.

I know what you're going to say..."yeah, but that stuff can be changed/modified...".  Yes, it can...but the point is, how often is that actually done?  Look at the above listed output from wmd.pl...does it look as if any effort was put into modifying the metadata that populated the Word97 file?

Something I've said about Windows systems and DFIR work is that as the versions of Windows have been developed, the amount of information that is automatically recorded as malware or an adversary interacts with the endpoint environment has increased significantly.  In many cases, this seems to be overlooked when it comes to developing threat intelligence for some reason; in spam and phishing campaigns, a lot of the different artifacts are examined...the contents of the email (headers, body, etc.), attachment macros, second-stage downloads, etc.  But what is often missed is document metadata embedded in the attachment; Word docs, Excel spreadsheets, and even LNK shortcut files can all be rich in valuable information.  One such example is looking at time stamps...when an email was sent, when a document was created, when a binary was compiled, etc., and lining all of those up to illustrate just how organized and planned out an attack appears to be.