OLE...OLE, OLE, OLE!
Okay, if you've never seen The Replacements then the title of this post won't be nearly as funny to you as it is to me...but that's okay.
I recently posted an update blog that included a brief discussion of a tool I was working on, and why. In short, and due in part to a recently publicized change in tactics, I wanted to dust off some old code I'd written and see what information or intel I could collect.
The tactic I'm referring to involves the use of malware delivered via '.pub' files. I wasn't all that interested in this tactic until I found out that .pub (MS Publisher) files are OLE format files.
The code I'm referring to is wmd.pl, something I wrote a while back (according to the header information, the code is just about 10 yrs old!) specifically to parse documents created using older versions of MS Word, i.e., those that used the OLE format.
OLE
The Object Linking and Embedding (OLE) file format is pretty well documented at the MS site, so I won't spend a lot of time discussing the details here. However, I will say that MS has referred to the file format as a "file system within a file", and that's exactly what it is. If you look at the format, there's actually a 'sector allocation table', and it's laid out very much like the FAT file system. Also, at some levels of the 'file system' structure, there are time stamps. Now, the exact details of when and how these time stamps are created and/or modified (or if they are, at all) aren't exactly clear, but they can serve as an indicator, something we can combine with other artifacts and context to get a better idea of their validity and value.
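To make the "file system within a file" idea a bit more concrete, here's a minimal sketch (not the tool discussed in this post) that reads the fixed 512-byte header of an OLE compound file. The field offsets follow my reading of the published compound file specification; the function name and the dictionary keys are my own, chosen just for illustration.

```python
import struct

# Signature found in the first 8 bytes of every OLE compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def parse_ole_header(data: bytes) -> dict:
    """Parse the fixed-size header of an OLE compound file.

    Hypothetical helper for illustration; offsets are taken from the
    published compound file specification.
    """
    if len(data) < 512:
        raise ValueError("need at least 512 bytes of header")
    if data[:8] != OLE_MAGIC:
        raise ValueError("not an OLE compound file")

    # Offset 24: minor version, major version, byte order,
    # sector shift, mini sector shift (five 16-bit values).
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", data, 24
    )
    # Offset 40: nine 32-bit values describing where the allocation
    # tables and the directory live.
    (num_dir_sectors, num_fat_sectors, first_dir_sector, _transaction,
     mini_cutoff, first_minifat_sector, num_minifat_sectors,
     first_difat_sector, num_difat_sectors) = struct.unpack_from("<9I", data, 40)

    return {
        "major_version": major,
        "minor_version": minor,
        "byte_order": byte_order,
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
        "num_fat_sectors": num_fat_sectors,
        "first_dir_sector": first_dir_sector,
        "mini_stream_cutoff": mini_cutoff,
        "first_minifat_sector": first_minifat_sector,
        "num_minifat_sectors": num_minifat_sectors,
        "first_difat_sector": first_difat_sector,
        "num_difat_sectors": num_difat_sectors,
    }
```

You'd call it with the first 512 bytes of a file, e.g. `parse_ole_header(open("sample.pub", "rb").read(512))`, where `sample.pub` is just a placeholder name. Much like the boot sector of a FAT volume, the header tells you the sector size and where the allocation table and directory start.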
For most of us who have been in the IR business for a while, when we hear "OLE", we think of the Blair document, and in particular, the file format used for pre-2007 versions of MS Office documents. Further, many of us thought that with the release of Office 2007, the file format was going to disappear, and at most, we'd maybe have to dust off some tools or analysis techniques at some point in the future. Wow, talk about a surprise! Not only did the file format not disappear, but as of Windows 7, we started to see it being used in more and more of the artifacts we were seeing on the system. Take a look at the OLE Compound File page on the ForensicWiki for a list of the files on Windows systems that utilize the OLE file format (e.g., StickyNotes, automatic Jump Lists, etc.). So, rather than "going away", the file format has become more pervasive over time.
This is pretty fascinating, particularly if you have a detailed understanding of the file structure format. In most cases when you're looking at these files on a Windows system, the contents of the files will be what you're most interested in; for example, with automatic Jump Lists, we may be most interested in the DestList stream. However, when an OLE compound file is created off of the system, perhaps through the use of an application, we (as analysts) would be very interested in learning all we can about the file itself.
Tools
So, the idea behind the tool I was working on was to pull apart one component of the overall attack to see if there were any correlations to the same component with respect to other attacks. I'm not going to suggest it's the same thing (because it's not) but the idea I was working from is similar to pulling a device apart and breaking down its components in order to identify the builder, or at the very least to learn a little bit more that could be applied to an overall threat intel picture.
Here's what we're looking at...in this case, the .pub files are arriving as email attachments, so you have a sender email address, contents of the email header and body, attachment name, etc. All of this helps us build a picture of the threat. Is the content of the email body pretty generic, or is it specifically written to elicit the desired response (opening the attachment) from the user to whom it was sent? Is it targeted? Is it spam or spear-phishing/whaling?
Then we have what occurs after the user opens the attachment; in some cases, we see that files are downloaded and native commands (e.g., bitsadmin.exe) are executed on the system. Some folks have already been researching those areas or aspects of the overall attacks, and started pulling together things such as sites and files accessed by bitsadmin.exe, etc.
Knowing a bit about the file format of the attachment, I thought I'd take an approach similar to what Kevin talked about in his Continuing Evolution of Samas Ransomware blog post. In particular, why not see if I could develop some information that could be mapped to other aspects of the attacks? Folks were already using Didier's oledump.py to extract information about the .pub files, as well as extract the embedded macros, but I wanted to take a bit of a closer look at the file structure itself. As such, I used open sources to collect a number of .pub files that were known to be malicious and to contain embedded macros, and began to run the tool I'd written (oledmp.pl) across the various files, looking not only for commonalities, but for differences, as well. Here are some of the things I found:
All of the files had different time stamps; within each file, all of the "directory" streams had the same time stamp. For example, from one file:
Root Entry Date: 30.06.2016, 22:03:16
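For anyone who wants to check time stamps like this by hand, here's a rough sketch of how they can be read out of a raw 128-byte directory entry. Per the published compound file specification, each entry carries a creation time and a modified time as 64-bit FILETIME values (100-nanosecond ticks since 1 Jan 1601). The function names are mine, and treating the values as UTC is an assumption; this is not how oledmp.pl does it.

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals from this date.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int):
    """Convert a 64-bit FILETIME to a datetime; 0 means 'not set'."""
    if filetime == 0:
        return None
    # 10 ticks of 100 ns make one microsecond.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def directory_entry_times(entry: bytes):
    """Return (name, created, modified) from a 128-byte directory entry.

    Hypothetical helper; offsets follow the published specification:
    name at 0 (UTF-16LE), name length at 64, creation time at 100,
    modified time at 108.
    """
    if len(entry) != 128:
        raise ValueError("a directory entry is exactly 128 bytes")
    name_len = struct.unpack_from("<H", entry, 64)[0]
    # The stored length includes the 2-byte terminating null.
    name = entry[: max(name_len - 2, 0)].decode("utf-16-le")
    created, modified = struct.unpack_from("<QQ", entry, 100)
    return name, filetime_to_datetime(created), filetime_to_datetime(modified)


def format_timestamp(dt) -> str:
    """Format a datetime in the DD.MM.YYYY, HH:MM:SS style shown above."""
    return dt.strftime("%d.%m.%Y, %H:%M:%S") if dt else "(not set)"
```

A value of zero is common for either field, so it's worth treating "not set" as its own case rather than printing a date in 1601.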
All of the "directory" streams below the Root Entry had the same time stamp, as illustrated in the following image (different file from the one with 30 June time stamps):
Some of the files had a populated "Authress:" entry in the SummaryInformation section. However, with the exception of those files, the SummaryInformation and DocumentSummaryInformation streams were blank.
All of the files had Trash sections (again, see the document structure specification) that were blank.
For example, in the image to the left, we see the tool listing the Trash sections and their sizes; for each file examined, the File Space section was all zeros, and the System Space section was all "0xFFFF". Without knowing more about how these sections are managed, it's difficult to determine specifically if this is a result of the file being created by whichever application was used (sort of a 'default' configuration), or if this is the result of an intentional action.
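A quick way to make that kind of "blank" check repeatable is to classify each extracted section by its fill pattern. This is only a sketch under the assumption that you already have the raw bytes of a section in hand (from whatever tool you use to carve them out); the function name and labels are mine.

```python
def describe_fill(buf: bytes) -> str:
    """Describe whether a buffer is empty, uniformly filled, or mixed.

    Hypothetical helper for illustration only.
    """
    if not buf:
        return "empty"
    if buf.count(0x00) == len(buf):
        return "all 0x00"
    if buf.count(0xFF) == len(buf):
        return "all 0xFF"
    return "mixed"
```

Anything that comes back as "mixed" is worth a closer look, since it would stand out from every file in this sample set.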
Many (albeit not all) files contained a second stream with an embedded macro. In all cases within the sample set, the stream was named "Module1", and contained an empty function. However, in each case, that empty function had a different name.
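If you want to compare those function names across samples, one rough approach is to pull the procedure names out of the macro source text. The sketch below assumes the macro has already been extracted and decompressed to plain text (for example, with a tool such as oledump.py); the regular expression is a simplification of VBA syntax, and the names are my own.

```python
import re

# Simplified pattern for VBA procedure declarations, e.g.
# "Sub Foo()", "Private Function Bar(x)". Not a full VBA parser.
PROC_RE = re.compile(
    r"^\s*(?:(?:Public|Private|Friend)\s+)?(?:Static\s+)?"
    r"(?:Sub|Function)\s+([A-Za-z]\w*)",
    re.IGNORECASE | re.MULTILINE,
)


def procedure_names(vba_source: str) -> list:
    """Return the declared Sub/Function names in order of appearance."""
    return PROC_RE.findall(vba_source)


def names_by_sample(sources: dict) -> dict:
    """Map each sample label to the procedure names found in its macro text."""
    return {label: procedure_names(text) for label, text in sources.items()}
```

Lines such as `End Sub` or `Exit Function` don't match, because the pattern requires the keyword to open the declaration and be followed by a name.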
Some of the streams were identical across all of the files in the sample set. For example, the \Quill\QuillSub\ \x01CompObj stream for all of the files appears as you see in the image below.
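One way to find streams like this without eyeballing hex dumps is to hash every stream in every sample and look for paths where the hash never changes. Here's a minimal sketch, assuming the streams have already been extracted into a dictionary of path to bytes for each sample; the function name and data layout are my own, not part of any existing tool.

```python
import hashlib
from collections import defaultdict


def compare_streams(samples: dict):
    """Split stream paths into those identical in every sample and the rest.

    `samples` maps a sample label to a dict of {stream_path: stream_bytes}.
    Returns (identical_paths, differing_paths), both sorted.
    """
    digests_by_path = defaultdict(dict)
    for label, streams in samples.items():
        for path, data in streams.items():
            digests_by_path[path][label] = hashlib.sha256(data).hexdigest()

    identical, differing = [], []
    for path, by_sample in sorted(digests_by_path.items()):
        in_every_sample = len(by_sample) == len(samples)
        one_digest = len(set(by_sample.values())) == 1
        if in_every_sample and one_digest:
            identical.append(path)
        else:
            # Either the content differs or a sample lacks the stream.
            differing.append(path)
    return identical, differing
```

Streams that land in the "identical" list across a large sample set are candidates for tying files back to a common builder or template, and the "differing" list points at where the per-file changes are being made.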
All in all, for me, this was some pretty fascinating work. I'm sure that there may be even more information to collect with a larger sample set. In addition, there's more research to be done...for example, how do these files compare to legitimate, non-malicious Publisher files? What tools can be used to create these files?