Automated Binary Analysis Framework using Viper

Overview
In the past, I've had access to some pretty cool, privately hosted, modular, and highly advanced binary analysis frameworks, including static triage frameworks at Mandiant and commercial offerings such as ReversingLabs, as well as many expensive commercial sandboxing solutions, such as FireEye, Cyphort, and JoeSandbox. I thought that hosting such a cluster of static and dynamic analysis tools could really speed up both my CTF team and my private research. So I decided to host a collection of open-source, automated static and dynamic binary analysis tools, which I will be covering today. These tools can be used for malware analysis, CTF binary analysis, or simply to aid in reverse engineering binary files. At least at first, this will be a private, invite-only framework which we will use to instrument multiple tools, automate analysis tasks, and practice reverse engineering. It includes a myriad of static analysis, dynamic analysis, and reporting tools, which we will cover in depth in this article. My end goal is a useful set of internal tools and a great learning experience in both reverse engineering and building a micro-service architecture for automated analysis.

I went with three main platforms, each hosted on a separate instance. For my first pass at this binary analysis cluster I chose Viper, Cuckoo, and MISP. Viper is the main interface and the database for storing the binary files. It allows us to automatically run other Python modules on import, which drives the core of all of our automated analysis. Not only is it the main workhorse for our integrations, it also drives a lot of the modular static analysis operations, as those tools and libraries are conveniently located on the same machine. I am using Cuckoo to drive my dynamic sandbox analysis, which lets us run binaries and gives us rich virtual machine introspection, network traffic captures, and even memory dumps. Finally, we have MISP as a user interface and for integrating threat intelligence data programmatically. MISP can subscribe to feeds and enrich our other tools, such as Viper and Cuckoo, allowing us to incorporate threat intelligence feeds in a manageable way.

Server Setup
I used a new Ubuntu 16 image for each machine and built them on EC2 in AWS. I also set up domain names for them, and those names are what appear in the various config files, so I can turn the instances on and off and only have to update the DNS entries. For developer ergonomics, I added tmux and zsh with some custom configs. One of the big ideas here is that people can use these servers collaboratively in tmux sessions, so that one person can reverse engineer with a tool like radare2 while another person learns by watching the tmux session.
Extra tools and confs installed: tmux, zsh + oh-my-zsh, vim + vim-plug, and fzf
Viper
Viper is an amazing binary analysis platform and framework written in Python. It's extensible, which makes it simple to integrate with and easy to write new modules for. Further, we can automate Python workers to launch operations on new files we import, and Viper will save the analysis for later review. Some things we get stock with Viper include: generic hashing, fuzzy hashing (pydeep), metadata (exiftool), extracting special strings and addresses (IPv4/v6, domain names), detecting known shellcode patterns, sending samples to Cuckoo Sandbox, launching disassemblers (IDA Pro or radare2), searching on Malwr/Anubis/VirusTotal, XOR searches, YARA scans, detecting common packers (PEiD), imphashing, digital signature analysis, and compile times, to name a few fun ones. It also includes analysis options for tons of non-executable file formats, such as HTML operations, extracting embedded scripts, Java applet analysis, Flash object analysis, image analysis, PDF analysis, tons of OLE analysis, and many more binary format functions. It can also parse many Java formats, email MIME messages, and some antivirus files. I also added the ClamAV service integration, along with VirusTotal and Malwr integrations for some extra threat intel. However, Malwr's APIs seemed buggy at the time of writing, which meant I had to set up my own private sandboxing solution.
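To make a few of those stock features concrete, here is a minimal standalone sketch of generic hashing, IPv4 extraction, and a single-byte XOR search using only the Python standard library. This is not Viper's own code; the function names are my own placeholders, and Viper's modules are considerably more thorough.

```python
import hashlib
import re

# Dotted-quad candidates; each octet is range-checked in extract_ipv4().
IPV4_RE = re.compile(
    rb"(?<![\d.])(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(?![\d.])"
)


def file_hashes(data):
    """Return the usual triage hashes for a blob of bytes."""
    return {
        name: hashlib.new(name, data).hexdigest()
        for name in ("md5", "sha1", "sha256")
    }


def extract_ipv4(data):
    """Return unique, valid-looking IPv4 strings in order of appearance."""
    found = []
    for match in IPV4_RE.finditer(data):
        octets = [int(group) for group in match.groups()]
        if all(octet <= 255 for octet in octets):
            addr = ".".join(str(octet) for octet in octets)
            if addr not in found:
                found.append(addr)
    return found


def xor_search(data, needle):
    """Return (key, offset) pairs where `needle` appears XORed with one byte.

    Key 0 is skipped because it would just be a plaintext search.
    """
    hits = []
    for key in range(1, 256):
        encoded = bytes(b ^ key for b in needle)
        offset = data.find(encoded)
        if offset != -1:
            hits.append((key, offset))
    return hits
```

Calling `xor_search(data, b"http://")` on a sample is a cheap way to spot URLs hidden behind a one-byte XOR key, which is roughly the idea behind an XOR search.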
viper_modules.png
One of the parts I was most excited about was using the Viper framework to drive YARA scanning and auto-tag my files based on custom YARA rules I've been working on. These YARA rules detect special properties of the binaries, such as type, language, crypto primitives, vulnerable functions, and more. I've expanded on the idea of using YARA rules for binary analysis here. Lastly, be sure to check out the YouTube presentation by Paul Melson, at the end of that post I linked, to see Viper's YARA scanning in action.
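The auto-tagging idea can be sketched without YARA at all. The snippet below is a deliberately naive stand-in that uses plain byte-pattern matching instead of compiled YARA rules; the rule table, the tag names, and `auto_tags` are all hypothetical and only illustrate how a match turns into a tag.

```python
from typing import NamedTuple, Tuple


class Rule(NamedTuple):
    tag: str                     # tag to attach when the rule matches
    patterns: Tuple[bytes, ...]  # any one of these bytes patterns is enough
    at_start: bool = False       # True: pattern must sit at offset 0


# Hypothetical rules. Real YARA rules can express far richer conditions.
RULES = (
    Rule("type:elf", (b"\x7fELF",), at_start=True),
    Rule("type:pe", (b"MZ",), at_start=True),
    # First eight bytes of the AES S-box, a common crypto constant.
    Rule("crypto:aes-sbox", (bytes.fromhex("637c777bf26b6fc5"),)),
    # Naive substring match on libc functions that are easy to misuse.
    Rule("vuln:unsafe-libc", (b"strcpy", b"strcat", b"sprintf")),
)


def auto_tags(data, rules=RULES):
    """Return the tags of every rule that matches `data`, in rule order."""
    tags = []
    for rule in rules:
        if rule.at_start:
            hit = any(data.startswith(p) for p in rule.patterns)
        else:
            hit = any(p in data for p in rule.patterns)
        if hit:
            tags.append(rule.tag)
    return tags
```

Keeping the tag in the rule itself means adding a new property to detect is just one more entry in the table, which is the same ergonomics I like about tagging from YARA rule names or metadata.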
viper_automations.png
I think the real power of Viper comes from automating many of these modules, as they are already built in and can be configured to run on import. In the image above you can see many of these modules being driven on import, as well as the YARA scanning adding tags and analysis notes to the binaries automatically. Further, I found a lot of custom scripts in the community, and I was even able to write some simple modules of my own as a proof of concept. My future plans are to integrate more of the tools listed below into Viper modules, which is super easy with any Python library. I've also set up many of the threat intel integrations, like VirusTotal, Malwr (although this seems to be down at the moment), ShadowServer, XForce, and MISP; we will go over those in their own sections below. These were as simple as configuring them in the conf file and making sure Viper had direct firewall access to the other services. Below you can see my sample modules in action, but be sure to check out the code to see how easy these really were.
xforce.png
I will primarily be running the Viper server using its API functionality. This way I can use local scripts to submit binaries to the platform and eventually integrate tools (like a chatbot or collectors) with the Viper platform to drive its automation. We can even use custom scripts with the API to drive any of our automations remotely. The result is a private collection of binaries that are automatically classified, tagged, and searchable, and that we can interact with in a remote and manageable way.

Lastly, I have some other tools installed on this box for collaborative reversing, most notably Radare2 (for disassembly) and Manticore (for symbolic execution). The idea here is to write scripts using the Manticore API or r2pipe to integrate these deeper static analysis functions into Viper over time. I added Kaitai Struct as well, to have a reference for the grammar and protocols of the various binaries we may encounter. Knowing the expected grammar of a binary structure lets us see whether our samples differ from the norm. I've also added the Pure Funky Magic scripts, which are awesome, modular, Pythonic data transformation libraries.
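To illustrate the "does this sample differ from the norm" idea, here is a small hand-written check of the first fields of an ELF header using the standard `struct` module. Kaitai Struct would generate a parser like this from a declarative spec; this sketch is only a stand-in for that, and the function names and the list of checks are my own assumptions.

```python
import struct


def parse_elf_header(data):
    """Parse the identification bytes and first fields of an ELF header."""
    if len(data) < 24:
        raise ValueError("too short for an ELF header")
    magic, ei_class, ei_data, ei_version = struct.unpack_from("<4sBBB", data, 0)
    if magic != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA: 1 = little endian, 2 = big endian. Anything else is reported
    # by anomalies(); little endian is assumed here so parsing can continue.
    endian = ">" if ei_data == 2 else "<"
    e_type, e_machine, e_version = struct.unpack_from(endian + "HHI", data, 16)
    return {
        "class": ei_class,
        "data": ei_data,
        "ident_version": ei_version,
        "type": e_type,
        "machine": e_machine,
        "version": e_version,
    }


def anomalies(header):
    """List header values that fall outside what the ELF spec defines."""
    issues = []
    if header["class"] not in (1, 2):        # ELFCLASS32, ELFCLASS64
        issues.append("unknown EI_CLASS")
    if header["data"] not in (1, 2):         # ELFDATA2LSB, ELFDATA2MSB
        issues.append("unknown EI_DATA")
    if header["ident_version"] != 1 or header["version"] != 1:
        issues.append("unexpected ELF version")
    if header["type"] not in (1, 2, 3, 4):   # REL, EXEC, DYN, CORE
        issues.append("unusual e_type")
    return issues
```

An empty list from `anomalies()` only means these few fields look ordinary. Malformed headers are a common anti-analysis trick, so a non-empty list is a decent hint that a sample deserves a closer look.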

Cuckoo
This is our main cluster for dynamically sandboxing binaries. I was really impressed with the flexibility of Cuckoo and the amount of customization it allows. I found it far more customizable than any of the proprietary tools I've used in the past. Following these install guides was extremely helpful. The individual analysis VM setup was especially important, because you have to ensure that the appropriate vulnerable software is installed, that the agent launches on boot, and that the virtual machines have the proper network configuration. Once my VMs were all configured to perfection, I exported them as OVAs and shipped them up to the cloud. In the future, I want to test out setting up Cuckoo images using vmcloak. Our current setup is a nested Cuckoo infra built on AWS, which limits us to 32-bit images (see the video at the bottom of the post for more details!). This setup involved many lessons learned, and I plan on rebuilding this infrastructure on ESXi in the near future, using something like Cuckoo Modified instead. That said, if you ever find yourself using Cuckoo with VirtualBox, the following commands are pretty useful:

Helpful VirtualBox commands
vboxmanage import "your_ova_name"
vboxmanage list vms
vboxmanage list runningvms
vboxheadless --startvm "your_vm_name"
vboxmanage snapshot "your_vm_name" take "vm_snapshot_name" --pause
vboxmanage controlvm "your_vm_name" poweroff
vboxmanage snapshot "your_vm_name" restorecurrent
vboxmanage snapshot "your_vm_name" list
vboxmanage snapshot "your_vm_name" restore "vm_snapshot_name"
vboxmanage unregistervm "your_vm_name" --delete
vboxmanage guestproperty enumerate "your_vm_name"
vboxmanage modifyvm "your_vm_name" --hostonlyadapter1 "vm_network"
vboxmanage modifyvm "your_vm_name" --nic1 hostonly
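If you find yourself manually resetting a VM to a clean snapshot over and over, a few of the commands above can be chained from Python. This is only a sketch under my own assumptions: `reset_cycle` and `run_steps` are hypothetical helper names, and I use `vboxmanage startvm ... --type headless` rather than `vboxheadless` because it returns once the VM has started instead of staying in the foreground.

```python
import subprocess


def reset_cycle(vm_name, snapshot):
    """Return (argv, must_succeed) steps that roll a VM back to a snapshot."""
    return [
        # Powering off fails if the VM is already stopped, so tolerate errors.
        (["vboxmanage", "controlvm", vm_name, "poweroff"], False),
        (["vboxmanage", "snapshot", vm_name, "restore", snapshot], True),
        (["vboxmanage", "startvm", vm_name, "--type", "headless"], True),
    ]


def run_steps(steps, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for argv, must_succeed in steps:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(argv, check=must_succeed)


if __name__ == "__main__":
    # Dry run by default: only prints what would be executed.
    run_steps(reset_cycle("your_vm_name", "vm_snapshot_name"))
```

Cuckoo handles snapshot restores on its own during analysis, so this is only meant for the manual poking you do while building and debugging the guest images.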

I am running Cuckoo using its API functionality to integrate it with Viper. It's important to note that the API has no authentication by default, so we are using a private network and appropriate firewall rules for a simple yet secure setup. You can also write your own Cuckoo analysis scripts and integrate them with your MISP instance for threat intel enrichment.
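As a rough sketch of what driving the Cuckoo API from a local script can look like, the snippet below builds a multipart upload for the REST API's `tasks/create/file` endpoint using only the standard library. The host name and port in the usage note are placeholders for my private setup, the helper names are my own, and the `task_id` field in the response is what I expect from Cuckoo 2.x, so treat those details as assumptions and check them against your version.

```python
import json
import os
import urllib.request
import uuid


def build_submit_request(api_base, filename, data):
    """Build (but do not send) a multipart POST for tasks/create/file."""
    boundary = uuid.uuid4().hex
    # Note: filename is embedded as-is, so it should not contain quotes.
    body = b"".join([
        b"--" + boundary.encode() + b"\r\n",
        b'Content-Disposition: form-data; name="file"; filename="'
        + filename.encode() + b'"\r\n',
        b"Content-Type: application/octet-stream\r\n\r\n",
        data,
        b"\r\n--" + boundary.encode() + b"--\r\n",
    ])
    request = urllib.request.Request(
        api_base.rstrip("/") + "/tasks/create/file", data=body, method="POST"
    )
    request.add_header(
        "Content-Type", "multipart/form-data; boundary=" + boundary
    )
    return request


def submit(api_base, path):
    """Upload a sample and return the task id reported by the API."""
    with open(path, "rb") as handle:
        request = build_submit_request(
            api_base, os.path.basename(path), handle.read()
        )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["task_id"]  # assumed response field
```

Usage would be something like `submit("http://cuckoo.internal:8090", "sample.bin")`, where `cuckoo.internal` is a made-up name for the sandbox host on the private network. Since there is no authentication here, this only makes sense behind the firewall rules mentioned above.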
cuckoo.png
MISP
We will be using MISP for information sharing and data normalization. It is set up, configured, and integrated with Viper such that we can easily find new samples or share our results. The real benefit here is subscribing to other feeds to get collaborative threat intelligence and applying it to our tools. For more on how to use MISP and Viper together, check out these posts. This is a great, inexpensive way to manage private threat intel, public feeds, and our own analysis reports.
viper_misp.png
Other Interesting Projects
While searching for tools to build this open-source analysis platform, I came across a number of other binary analysis frameworks that looked interesting. I ultimately went with the solutions above because they are all written in Python and easy to integrate with one another. That said, the following may work for your needs, so be sure to check out some of these other projects. Mastiff is an older binary analysis framework, which looks promising in terms of functionality but appears abandoned in terms of active development. FAME is a cool malware analysis framework; however, it looked geared purely towards malware analysis and thus lacks a lot of the general analysis features I value for CTFs and reverse engineering. StoQ looked very promising as a binary analysis framework for enterprise scale, with distributed worker queues and collectors. StoQ overlaps with Viper in many ways, includes many pay-service modules, looks well maintained, and appears to be high quality. LaikaBOSS is a recursive and modular file analysis framework. It looks very promising, but it also looks complicated and fairly weak on documentation. CRITS is a threat information sharing platform similar to MISP; however, I chose to go with MISP for its existing integrations with the other frameworks I am using.

Finally, I like the following presentation, which goes over some of the things you may want to automatically extract from your binaries, either statically or dynamically: