Simplify - Generic Android Deobfuscator


Simplify virtually executes an app to understand its behavior and then tries to optimize the code so that it behaves identically but is easier for a human to understand. Each optimization type is simple and generic, so it doesn't matter what specific type of obfuscation is used.

Before and After
The code on the left is a decompilation of an obfuscated app, and the code on the right has been deobfuscated.



Overview
There are three parts to the project: smalivm, simplify, and the demo app.
  1. smalivm: Provides a virtual machine sandbox for executing Dalvik methods. After executing a method, it returns a graph containing all possible register and class values for every execution path. It works even if some values are unknown, such as file and network I/O. For example, any if or switch conditional with an unknown value results in both branches being taken.
  2. simplify: Analyzes the execution graphs from smalivm and applies optimizations such as constant propagation, dead code removal, unreflection, and some peephole optimizations. These are fairly simple, but when applied together repeatedly, they'll decrypt strings, remove reflection, and greatly simplify code. It does not rename methods and classes.
  3. demoapp: Contains simple, heavily commented examples for using smalivm in your own project. If you're building something that needs to execute Dalvik code, check it out.
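The "both branches are taken" behavior described for smalivm can be pictured with a small toy. This is only a sketch in Python, not smalivm's API: the `UNKNOWN` marker and the `explore` function are hypothetical names introduced for illustration.

```python
# Toy illustration only; smalivm's real execution graph is not modeled here.
UNKNOWN = object()  # stands in for a value that cannot be known, e.g. network input


def explore(condition, then_value, else_value):
    """Return every possible outcome of `then_value if condition else else_value`."""
    if condition is UNKNOWN:
        # The condition cannot be resolved, so both branches are followed.
        return [then_value, else_value]
    # The condition is known, so only one branch is followed.
    return [then_value] if condition else [else_value]


known = explore(True, "A", "B")      # a single execution path
forked = explore(UNKNOWN, "A", "B")  # two execution paths
print(known, forked)
```

A real executor would keep a full register state per path; the toy keeps only the final value to show how one unknown condition multiplies the paths.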

Usage
usage: java -jar simplify.jar <input> [options]
deobfuscates a dalvik executable
 -et,--exclude-types <pattern>   Exclude classes and methods which include REGEX, eg: "com/android", applied after include-types
 -h,--help                       Display this message
 -ie,--ignore-errors             Ignore errors while executing and optimizing methods. This may lead to unexpected behavior.
    --include-support            Attempt to execute and optimize classes in Android support library packages, default: false
 -it,--include-types <pattern>   Limit execution to classes and methods which include REGEX, eg: ";->targetMethod\("
    --max-address-visits <N>     Give up executing a method after visiting the same address N times, limits loops, default: 10000
    --max-call-depth <N>         Do not call methods after reaching a call depth of N, limits recursion and long method chains, default: 50
    --max-execution-time <N>     Give up executing a method after N seconds, default: 300
    --max-method-visits <N>      Give up executing a method after executing N instructions in that method, default: 1000000
    --max-passes <N>             Do not run optimizers on a method more than N times, default: 100
 -o,--output <file>              Output simplified input to FILE
    --output-api-level <LEVEL>   Set output DEX API compatibility to LEVEL, default: 15
 -q,--quiet                      Be quiet
    --remove-weak                Remove code even if there are weak side effects, default: true
 -v,--verbose <LEVEL>            Set verbosity to LEVEL, default: 0

Building
Building requires the Java Development Kit 8 (JDK) to be installed.
Because this project contains submodules for Android frameworks, either clone with --recursive:
git clone --recursive https://github.com/CalebFenton/simplify.git
Or update submodules at any time with:
git submodule update --init --recursive
Then, to build a single jar which contains all dependencies:
./gradlew fatjar
The Simplify jar will be in simplify/build/libs/. You can test that it's working by simplifying the provided obfuscated example app. Here's how you'd run it (you may need to change simplify.jar):
java -jar simplify/build/libs/simplify.jar -it 'org/cf/obfuscated' -et 'MainActivity' simplify/obfuscated-app.apk
To understand what's getting deobfuscated, check out Obfuscated App's README.

Troubleshooting
If Simplify fails, try these recommendations, in order:
  1. Only target a few methods or classes by using the -it option.
  2. If the failure is because of maximum visits exceeded, try using higher --max-address-visits, --max-call-depth, and --max-method-visits.
  3. Try with -v or -v 2 and report the issue with the logs and a hash of the DEX or APK.
  4. Try again, but do not break eye contact. Simplify can sense fear.
If building on Windows, and building fails with an error similar to:
Could not find tools.jar. Please check that C:\Program Files\Java\jre1.8.0_151 contains a valid JDK installation.
This means Gradle is unable to find a proper JDK path. Make sure the JDK is installed, set the JAVA_HOME environment variable to your JDK path, and make sure to close and re-open the command prompt you use to build.

Contributing
Don't be shy. I think virtual execution and deobfuscation are fascinating problems. Anyone who's interested is automatically cool and contributions are welcome, even if it's just to fix a typo. Feel free to ask questions in the issues and submit pull requests.

Reporting Issues
Please include a link to the APK or DEX and the full command you're using. This makes it much easier to reproduce (and thus fix) your issue.
If you can't share the sample, please include the file hash (SHA1, SHA256, etc).
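If no hashing tool is at hand, a few lines of Python can produce the SHA256 of a sample. This is a minimal sketch using only the standard library; the function name `file_sha256` and the example path are made up for illustration.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large APK is never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example call (the path is hypothetical):
# print(file_sha256("sample.apk"))
```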

Optimization Strategies

Constant Propagation
If an op places a value of a type which can be turned into a constant such as a string, number, or boolean, this optimization will replace that op with the constant. For example:
const-string v0, "VGVsbCBtZSBvZiB5b3VyIGhvbWV3b3JsZCwgVXN1bC4="
invoke-static {v0}, Lmy/string/Decryptor;->decrypt(Ljava/lang/String;)Ljava/lang/String;
# Decrypts to: "Tell me of your homeworld, Usul."
move-result v0
In this example, an encrypted string is decrypted and placed into v0. Since strings are "constantizable", the move-result v0 can be replaced with a const-string:
const-string v0, "VGVsbCBtZSBvZiB5b3VyIGhvbWV3b3JsZCwgVXN1bC4="
invoke-static {v0}, Lmy/string/Decryptor;->decrypt(Ljava/lang/String;)Ljava/lang/String;
const-string v0, "Tell me of your homeworld, Usul."
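The rewrite above can be imitated in a few lines of Python. This is a sketch under stated assumptions, not Simplify's implementation: instructions are modeled as tuples, and the `decrypt` function is a hypothetical stand-in that treats the example string as plain Base64 (which it happens to be).

```python
import base64


def decrypt(text):
    # Hypothetical stand-in for Lmy/string/Decryptor;->decrypt: plain Base64.
    return base64.b64decode(text).decode("utf-8")


def propagate_constant(ops):
    """Replace each move-result with a const-string holding the computed value."""
    registers = {}  # register name -> known string value
    result = None   # value returned by the most recent call
    out = []
    for op in ops:
        kind = op[0]
        if kind == "const-string":
            registers[op[1]] = op[2]
        elif kind == "invoke-static":
            # "Virtually execute" the call on the known argument.
            result = decrypt(registers[op[1]])
        elif kind == "move-result":
            registers[op[1]] = result
            op = ("const-string", op[1], result)
        out.append(op)
    return out


before = [
    ("const-string", "v0", "VGVsbCBtZSBvZiB5b3VyIGhvbWV3b3JsZCwgVXN1bC4="),
    ("invoke-static", "v0"),
    ("move-result", "v0"),
]
after = propagate_constant(before)
for op in after:
    print(op)
```

Only the move-result changes; the now-useless call and first assignment are left in place, as in the example above.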

Dead Code Removal
Code is dead if removing it cannot possibly alter the behavior of the app. The most obvious case is if the code is unreachable, e.g. if (false) { // dead }. If code is reachable, it may be considered dead if it doesn't affect any state outside of the method, i.e. it has no side effect. For example, code may not affect the return value for the method, alter any class variables, or perform any IO. This is difficult to determine in static analysis. Luckily, smalivm doesn't have to be clever. It just stupidly executes everything it can and assumes there are side effects if it can't be sure. Consider the example from Constant Propagation:
const-string v0, "VGVsbCBtZSBvZiB5b3VyIGhvbWV3b3JsZCwgVXN1bC4="
invoke-static {v0}, Lmy/string/Decryptor;->decrypt(Ljava/lang/String;)Ljava/lang/String;
const-string v0, "Tell me of your homeworld, Usul."
In this code, the invoke-static no longer affects the return value of the method, and let's assume it doesn't do anything weird like write bytes to the file system or a network socket, so it has no side effects. It can simply be removed.
const-string v0, "VGVsbCBtZSBvZiB5b3VyIGhvbWV3b3JsZCwgVXN1bC4="
const-string v0, "Tell me of your homeworld, Usul."
Finally, the first const-string assigns a value to a register, but that value is never used, i.e. the assignment is dead. It can also be removed.
const-string v0, "Tell me of your homeworld, Usul."
Huzzah!
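The two removals above can be sketched as a single backward pass over straight-line code. This is a toy in Python, not how Simplify is written: instructions are tuples, the trailing `return-object` is added so that v0 has a reader, and the set of side-effect-free methods is simply handed in as an assumption.

```python
DECRYPT = "Lmy/string/Decryptor;->decrypt(Ljava/lang/String;)Ljava/lang/String;"


def remove_dead(ops, pure_methods):
    """Drop unused pure calls and assignments that are overwritten before being read."""
    live = set()  # registers that are read later in the method
    kept = []
    for op in reversed(ops):
        kind = op[0]
        if kind == "return-object":
            live.add(op[1])
            kept.append(op)
        elif kind == "const-string":
            if op[1] in live:
                live.discard(op[1])  # this assignment satisfies the later read
                kept.append(op)
            # otherwise the assignment is dead and is dropped
        elif kind == "invoke-static":
            if op[2] in pure_methods:
                continue  # result is unused and there are no side effects
            live.add(op[1])  # the call stays, so its argument is still read
            kept.append(op)
        else:
            raise ValueError("unsupported op in this toy: " + kind)
    kept.reverse()
    return kept


ops = [
    ("const-string", "v0", "VGVsbCBtZSBvZiB5b3VyIGhvbWV3b3JsZCwgVXN1bC4="),
    ("invoke-static", "v0", DECRYPT),
    ("const-string", "v0", "Tell me of your homeworld, Usul."),
    ("return-object", "v0"),
]
simplified = remove_dead(ops, pure_methods={DECRYPT})
cautious = remove_dead(ops, pure_methods=set())
print(simplified)
```

When the call is not known to be free of side effects (the `cautious` run), nothing is removed, which mirrors the "assume side effects if unsure" rule.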

Unreflection
One major challenge with static analysis of Java is reflection. It's just not possible to know what the arguments are for reflection methods without doing careful data flow analysis. There are smart, clever ways of doing this, but smalivm does it by just executing the code. When it finds a reflected method invocation such as:
invoke-virtual {v0, v1, v2}, Ljava/lang/reflect/Method;->invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
It can know the values of v0, v1, and v2. If it's sure what the values are, it can replace the call to Method.invoke() with an actual non-reflected method invocation. The same applies for reflected field and class lookups.
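As a rough picture of that replacement, the toy below rewrites a reflected call only when the register holding the Method object has a known value. It is a Python sketch with made-up instruction tuples and a hypothetical `unreflect` function; for brevity it checks only the method register, whereas the text above requires all three values to be known.

```python
UNKNOWN = object()  # marker for a register whose value could not be determined


def unreflect(op, registers):
    """Rewrite ("invoke-reflected", method_reg, target_reg, args_reg) when possible."""
    if op[0] != "invoke-reflected":
        return op
    _, method_reg, target_reg, args_reg = op
    method = registers.get(method_reg, UNKNOWN)
    if method is UNKNOWN:
        return op  # not sure which method is called, so leave the reflection alone
    # The target method is known: emit a direct invocation instead.
    return ("invoke-virtual", method, target_reg, args_reg)


op = ("invoke-reflected", "v0", "v1", "v2")
rewritten = unreflect(op, {"v0": "Ljava/lang/String;->length()I"})
untouched = unreflect(op, {})
print(rewritten)
```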

Peephole
For everything that doesn't fit cleanly into a particular category, there are peephole optimizations. This includes removing useless check-cast ops, replacing Ljava/lang/String;-><init> calls with const-string, and so on.
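One of these, dropping a useless check-cast, is easy to sketch. The Python toy below assumes the type of each register is already known from execution and is passed in as a dictionary; the tuple format and the `drop_useless_casts` name are illustrative only.

```python
def drop_useless_casts(ops, known_types):
    """Remove check-cast ops whose register already has exactly the target type."""
    out = []
    for op in ops:
        # op = ("check-cast", register, type) is useless if the type already matches.
        if op[0] == "check-cast" and known_types.get(op[1]) == op[2]:
            continue
        out.append(op)
    return out


ops = [
    ("const-string", "v0", "hello"),
    ("check-cast", "v0", "Ljava/lang/String;"),
    ("check-cast", "v1", "Ljava/util/List;"),
]
cleaned = drop_useless_casts(ops, {"v0": "Ljava/lang/String;"})
print(cleaned)
```

The cast on v1 is kept because nothing is known about that register.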

Deobfuscation Example

Before Optimization
.method public static test1()I
    .locals 2

    new-instance v0, Ljava/lang/Integer;
    const/4 v1, 0x1
    invoke-direct {v0, v1}, Ljava/lang/Integer;-><init>(I)V

    invoke-virtual {v0}, Ljava/lang/Integer;->intValue()I
    move-result v0

    return v0
.end method
All this does is v0 = 1.

After Constant Propagation
.method public static test1()I
    .locals 2

    new-instance v0, Ljava/lang/Integer;
    const/4 v1, 0x1
    invoke-direct {v0, v1}, Ljava/lang/Integer;-><init>(I)V

    invoke-virtual {v0}, Ljava/lang/Integer;->intValue()I
    const/4 v0, 0x1

    return v0
.end method
The move-result v0 is replaced with const/4 v0, 0x1. This is because there is only one possible return value for intValue()I and the return type can be made a constant. The arguments v0 and v1 are unambiguous and do not change. That is to say, there's a consensus of values for every possible execution path at intValue()I. Other types of values that can be turned into constants:
  • numbers - const/4, const/16, etc.
  • strings - const-string
  • classes - const-class
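The "consensus" condition can be stated compactly: a register may become a constant only if every execution path reaching that point agrees on one known value. A minimal Python sketch, with a hypothetical `UNKNOWN` marker and `consensus` helper:

```python
UNKNOWN = object()  # marker for a value that could not be determined on some path


def consensus(values_per_path):
    """Return (True, value) if all paths agree on one known value, else (False, None)."""
    values = list(values_per_path)
    if not values or any(value is UNKNOWN for value in values):
        return (False, None)
    first = values[0]
    if all(value == first for value in values[1:]):
        return (True, first)
    return (False, None)


agreed = consensus([1, 1, 1])     # every path yields 1, so const/4 is safe
split = consensus([1, 2])         # paths disagree, so no constant
unsure = consensus([1, UNKNOWN])  # one path is unknown, so no constant
print(agreed, split, unsure)
```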

After Dead Code Removal
.method public static test1()I
    .locals 2

    const/4 v0, 0x1

    return v0
.end method
Because the code above const/4 v0, 0x1 does not affect state outside of the method (no side-effects), it can be removed without changing behavior. If there was a method call that wrote something to the file system or network, it couldn't be removed because it affects state outside the method. Or if test1()I took a mutable argument, such as a LinkedList, any instructions that accessed it couldn't be considered dead.
Other examples of dead code:
  • unreferenced assignments - assigning registers and not using them
  • unreached / unreachable instructions - if (false) { dead_code(); }

Further Reading