Simplify - Generic Android Deobfuscator
Simplify virtually executes an app to understand its behavior and then tries to optimize the code so that it behaves identically but is easier for a human to understand. Each optimization type is simple and generic, so it doesn't matter which specific type of obfuscation is used.
Before and After
The code on the left is a decompilation of an obfuscated app, and the code on the right has been deobfuscated.
There are three parts to the project: smalivm, simplify, and the demo app.
- smalivm: Provides a virtual machine sandbox for executing Dalvik methods. After executing a method, it returns a graph containing all possible register and class values for every execution path. It works even if some values are unknown, such as file and network I/O. For example, any if or switch conditional with an unknown value results in every branch being taken.
- simplify: Analyzes the execution graphs from smalivm and applies optimizations such as constant propagation, dead code removal, unreflection, and some peephole optimizations. These are fairly simple, but when applied together repeatedly, they'll decrypt strings, remove reflection, and greatly simplify code. It does not rename methods and classes.
- demoapp: Contains simple, heavily commented examples for using smalivm in your own project. If you're building something that needs to execute Dalvik code, check it out.
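The branch-forking behavior described for smalivm can be illustrated with a deliberately tiny worklist explorer. This is a minimal sketch, not smalivm's API: the Insn class, its pseudo-opcodes and explore() are hypothetical, a null register stands for an unknown value, and unlike smalivm it keeps register values fixed instead of tracking them along each path.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

final class ForkingSketch {
    /** Hypothetical toy instruction: "if-eqz" tests a register, "return" stops, anything else falls through. */
    static final class Insn {
        final String op;
        final int reg;    // register tested by if-eqz
        final int target; // branch target address for if-eqz

        Insn(String op, int reg, int target) {
            this.op = op;
            this.reg = reg;
            this.target = target;
        }

        static Insn of(String op) {
            return new Insn(op, -1, -1);
        }
    }

    /**
     * Returns every address reached from address 0. A null entry in regs means
     * "unknown" (e.g. a value read from a file), so a branch on it forks both ways.
     */
    static Set<Integer> explore(Insn[] code, Integer[] regs) {
        Set<Integer> visited = new TreeSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(0);
        while (!work.isEmpty()) {
            int pc = work.pop();
            if (pc >= code.length || !visited.add(pc)) {
                continue; // fell off the end, or this address was already explored
            }
            Insn insn = code[pc];
            if (insn.op.equals("return")) {
                continue;
            }
            if (insn.op.equals("if-eqz")) {
                Integer value = regs[insn.reg];
                if (value == null || value == 0) {
                    work.push(insn.target); // branch taken
                }
                if (value == null || value != 0) {
                    work.push(pc + 1); // branch not taken
                }
            } else {
                work.push(pc + 1);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Insn[] code = {
            new Insn("if-eqz", 0, 3), // 0: if v0 == 0 goto 3
            Insn.of("nop"),           // 1
            Insn.of("return"),        // 2
            Insn.of("nop"),           // 3
            Insn.of("return"),        // 4
        };
        System.out.println(explore(code, new Integer[] {null})); // unknown v0: [0, 1, 2, 3, 4]
        System.out.println(explore(code, new Integer[] {0}));    // known v0 == 0: [0, 3, 4]
    }
}
```

With v0 unknown, both the fall-through path (addresses 1 and 2) and the branch target (3 and 4) are explored; once v0 is known, only one side is.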
Usage
usage: java -jar simplify.jar <input> [options]
deobfuscates a dalvik executable
 -et,--exclude-types <pattern>   Exclude classes and methods which include REGEX, eg: "com/android", applied after include-types
 -h,--help                       Display this message
 -ie,--ignore-errors             Ignore errors while executing and optimizing methods. This may lead to unexpected behavior.
    --include-support            Attempt to execute and optimize classes in Android support library packages, default: false
 -it,--include-types <pattern>   Limit execution to classes and methods which include REGEX, eg: ";->targetMethod\("
    --max-address-visits <N>     Give up executing a method after visiting the same address N times, limits loops, default: 10000
    --max-call-depth <N>         Do not call methods after reaching a call depth of N, limits recursion and long method chains, default: 50
    --max-execution-time <N>     Give up executing a method after N seconds, default: 300
    --max-method-visits <N>      Give up executing a method after executing N instructions in that method, default: 1000000
    --max-passes <N>             Do not run optimizers on a method more than N times, default: 100
 -o,--output <file>              Output simplified input to FILE
    --output-api-level <LEVEL>   Set output DEX API compatibility to LEVEL, default: 15
 -q,--quiet                      Be quiet
    --remove-weak                Remove code even if there are weak side effects, default: true
 -v,--verbose <LEVEL>            Set verbosity to LEVEL, default: 0
Building
Building requires the Java Development Kit 8 (JDK) to be installed.
Because this project contains submodules for Android frameworks, either clone with --recursive:
git clone --recursive https://github.com/CalebFenton/simplify.git
Or update submodules at any time with:
git submodule update --init --recursive
Then, to build a single jar which contains all dependencies:
./gradlew fatjar
The Simplify jar will be in simplify/build/libs/. You can test that it's working by simplifying the provided obfuscated example app. Here's how you'd run it (you may need to change simplify.jar to match the file name in that directory):
java -jar simplify/build/libs/simplify.jar -it 'org/cf/obfuscated' -et 'MainActivity' simplify/obfuscated-app.apk
Troubleshooting
If Simplify fails, try these recommendations, in order:
- Only target a few methods or classes by using the -it option.
- If failure is because the maximum number of visits was exceeded, try using higher --max-address-visits, --max-call-depth, and --max-method-visits.
- Try with -v or -v 2 and report the issue with the logs and a hash of the DEX or APK.
- Try again, but do not break eye contact. Simplify can sense fear.
If building on Windows fails with an error similar to:
Could not find tools.jar. Please check that C:\Program Files\Java\jre1.8.0_151 contains a valid JDK installation.
This means Gradle is unable to find a proper JDK path. Make sure the JDK is installed, set the JAVA_HOME environment variable to your JDK path, and make sure to close and re-open the command prompt you use to build.
Contributing
Don't be shy. I think virtual execution and deobfuscation are fascinating problems. Anyone who's interested is automatically cool, and contributions are welcome, even if it's just to fix a typo. Feel free to ask questions in the issues and submit pull requests.
Reporting Issues
Please include a link to the APK or DEX and the full command you're using. This makes it much easier to reproduce (and therefore fix) your issue.
If you can't share the sample, please include the file hash (SHA1, SHA256, etc).
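If you want to produce those hashes from Java rather than with sha1sum or sha256sum, a short sketch using the standard java.security.MessageDigest API is enough. The class and method names are only illustrative; the digests are printed as lowercase hex, the same format those command-line tools use.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class SampleHash {
    /** Creates a digest such as "SHA-1" or "SHA-256"; every Java platform must support both. */
    static MessageDigest newDigest(String algorithm) {
        try {
            return MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
    }

    /** Lowercase hex encoding of the digest bytes. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    static String hash(byte[] data, String algorithm) {
        return hex(newDigest(algorithm).digest(data));
    }

    /** Streams the file so a large APK is not loaded into memory at once. */
    static String hashFile(Path file, String algorithm) throws IOException {
        MessageDigest digest = newDigest(algorithm);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return hex(digest.digest());
    }

    public static void main(String[] args) throws IOException {
        Path sample = Paths.get(args[0]); // the APK or DEX you are reporting
        System.out.println("SHA1:   " + hashFile(sample, "SHA-1"));
        System.out.println("SHA256: " + hashFile(sample, "SHA-256"));
    }
}
```

With JDK 11 or newer, java SampleHash.java path/to/sample.apk runs it directly from source.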
Optimization Strategies
Constant Propagation
If an op places a value of a type which can be turned into a constant, such as a string, number, or boolean, this optimization will replace that op with the constant. For example:
const-string v0, "VGVsbCBtZSBvZiB5b3VyIGhvbWV3b3JsZCwgVXN1bC4="
invoke-static {v0}, Lmy/string/Decryptor;->decrypt(Ljava/lang/String;)Ljava/lang/String;
# Decrypts to: "Tell me of your homeworld, Usul."
move-result-object v0
Executing this code in smalivm reveals the value placed in v0. Since strings are "constantizable", the move-result-object v0 can be replaced with a const-string:
const-string v0, "VGVsbCBtZSBvZiB5b3VyIGhvbWV3b3JsZCwgVXN1bC4="
invoke-static {v0}, Lmy/string/Decryptor;->decrypt(Ljava/lang/String;)Ljava/lang/String;
const-string v0, "Tell me of your homeworld, Usul."
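The replacement step can be sketched in Java. The sample ciphertext in this example happens to be plain Base64, so the sketch uses java.util.Base64 as a stand-in for the hypothetical Lmy/string/Decryptor;->decrypt method. The helper names and the reduced smali escaping (only backslash, quote and newline) are assumptions for illustration, not Simplify's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;

final class ConstantPropagationSketch {
    /** Stand-in for the hypothetical Decryptor.decrypt: the example ciphertext is just Base64. */
    static String decrypt(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    /** Quotes a string as a smali literal (only backslash, quote and newline are escaped here). */
    static String smaliLiteral(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    /**
     * Returns an instruction that can replace the move-result for a value known on
     * every path, or empty if this sketch cannot turn the value's type into a constant.
     */
    static Optional<String> constantize(String register, Object knownValue) {
        if (knownValue instanceof String) {
            return Optional.of("const-string " + register + ", " + smaliLiteral((String) knownValue));
        }
        if (knownValue instanceof Boolean) {
            return Optional.of("const/4 " + register + ", " + (((Boolean) knownValue) ? "0x1" : "0x0"));
        }
        return Optional.empty(); // numbers, classes, etc. are left out of this sketch
    }

    public static void main(String[] args) {
        // "Execute" the invoke-static with its known argument, then constantize the result.
        Object result = decrypt("VGVsbCBtZSBvZiB5b3VyIGhvbWV3b3JsZCwgVXN1bC4=");
        System.out.println(constantize("v0", result).orElse("(keep move-result)"));
        // prints: const-string v0, "Tell me of your homeworld, Usul."
    }
}
```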
Dead Code Removal
Code is dead if removing it cannot possibly alter the behavior of the app. The most obvious case is if the code is unreachable, e.g. if (false) { /* dead */ }. If code is reachable, it may still be considered dead if it doesn't affect any state outside of the method, i.e. it has no side effects. For example, such code must not affect the method's return value, change any class variables, or perform any IO. This is difficult to determine with static analysis. Luckily, smalivm doesn't have to be clever. It just stupidly executes everything it can and assumes there are side effects if it can't be sure. Consider the example from Constant Propagation:
const-string v0, "VGVsbCBtZSBvZiB5b3VyIGhvbWV3b3JsZCwgVXN1bC4="
invoke-static {v0}, Lmy/string/Decryptor;->decrypt(Ljava/lang/String;)Ljava/lang/String;
const-string v0, "Tell me of your homeworld, Usul."
In this code, the invoke-static no longer affects the return value of the method. Let's assume it doesn't do anything weird like write bytes to the file system or a network socket, so it has no side effects. It can simply be removed.
const-string v0, "VGVsbCBtZSBvZiB5b3VyIGhvbWV3b3JsZCwgVXN1bC4="
const-string v0, "Tell me of your homeworld, Usul."
Finally, the first const-string assigns a value to a register, but that value is never used, i.e. the assignment is dead. It can also be removed.
const-string v0, "Tell me of your homeworld, Usul."
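The two removals above amount to a backward liveness pass. Here is a minimal Java sketch over straight-line code, assuming each instruction is annotated with the register it writes, the registers it reads, and whether it is observable (a side effect or a return). The Insn class and removeDead() are hypothetical; a trailing return-object v0 is added so the surviving value has a reader, and the invoke-static is marked side-effect free, as assumed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DeadCodeSketch {
    /** Hypothetical straight-line instruction with the facts a dead-code pass needs. */
    static final class Insn {
        final String text;
        final String def;         // register written, or null
        final List<String> uses;  // registers read
        final boolean observable; // side effect or return: always kept

        Insn(String text, String def, List<String> uses, boolean observable) {
            this.text = text;
            this.def = def;
            this.uses = uses;
            this.observable = observable;
        }
    }

    /** One backward liveness pass; this sketch does not handle branches. */
    static List<Insn> removeDead(List<Insn> code) {
        Set<String> live = new HashSet<>();
        List<Insn> kept = new ArrayList<>();
        for (int i = code.size() - 1; i >= 0; i--) {
            Insn insn = code.get(i);
            boolean resultUsed = insn.def != null && live.contains(insn.def);
            if (!insn.observable && !resultUsed) {
                continue; // dead: nothing reads its result and it has no side effects
            }
            if (insn.def != null) {
                live.remove(insn.def); // this write satisfies the later reads
            }
            live.addAll(insn.uses);
            kept.add(insn);
        }
        Collections.reverse(kept);
        return kept;
    }

    public static void main(String[] args) {
        List<Insn> code = Arrays.asList(
            new Insn("const-string v0, \"VGVsbCBtZSBvZiB5b3VyIGhvbWV3b3JsZCwgVXN1bC4=\"", "v0", Collections.<String>emptyList(), false),
            new Insn("invoke-static {v0}, Lmy/string/Decryptor;->decrypt(Ljava/lang/String;)Ljava/lang/String;", null, Collections.singletonList("v0"), false),
            new Insn("const-string v0, \"Tell me of your homeworld, Usul.\"", "v0", Collections.<String>emptyList(), false),
            new Insn("return-object v0", null, Collections.singletonList("v0"), true));
        for (Insn insn : removeDead(code)) {
            System.out.println(insn.text);
        }
    }
}
```

A single backward pass drops both the invoke-static and the first const-string, because once the call is gone nothing reads the first value of v0. Real methods have branches, so a full implementation would compute liveness over the control-flow graph.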
Unreflection
One major challenge with static analysis of Java is reflection. It's just not possible to know what the arguments to reflection methods are without doing careful data flow analysis. There are smart, clever ways of doing this, but smalivm does it by just executing the code. When it finds a reflected method invocation such as:
invoke-virtual {v0, v1, v2}, Ljava/lang/reflect/Method;->invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
it can know the values of v0 (the Method object), v1 (the receiver), and v2 (the argument array). If it's sure what the values are, it can replace the call to Method.invoke() with an actual non-reflected method invocation. The same applies for reflected field and class lookups.
Peephole
For everything that doesn't fit cleanly into a particular category, there are peephole optimizations. This includes removing useless check-cast ops, replacing Ljava/lang/String;-><init> calls with const-string, and so on.
Deobfuscation Example
Before Optimization
.method public static test1()I
    .locals 2
    new-instance v0, Ljava/lang/Integer;
    const/4 v1, 0x1
    invoke-direct {v0, v1}, Ljava/lang/Integer;-><init>(I)V
    invoke-virtual {v0}, Ljava/lang/Integer;->intValue()I
    move-result v0
    return v0
.end method
All this method really does is return 1, i.e. v0 = 1.
After Constant Propagation
.method public static test1()I
    .locals 2
    new-instance v0, Ljava/lang/Integer;
    const/4 v1, 0x1
    invoke-direct {v0, v1}, Ljava/lang/Integer;-><init>(I)V
    invoke-virtual {v0}, Ljava/lang/Integer;->intValue()I
    const/4 v0, 0x1
    return v0
.end method
The move-result v0 is replaced with const/4 v0, 0x1. This is because there is only one possible return value for intValue()I, and the return type can be made a constant. The arguments v0 and v1 are unambiguous and do not change. That is to say, there's a consensus of values for every possible execution path at intValue()I. Other types of values that can be turned into constants:
- numbers - const/4, const/16, etc.
- strings - const-string
- classes - const-class
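Two checks from this step can be sketched in Java: whether every execution path agrees on a value (the consensus above), and which numeric const opcode can hold it. The helper names are hypothetical; the literal ranges follow the Dalvik const/4, const/16, const/high16 and const formats, and register-width limits (const/4 can only address v0 to v15) are not checked here.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

final class ConstSketch {
    /** The single value seen on every execution path, or empty if paths disagree or none were recorded. */
    static <T> Optional<T> consensus(Collection<T> valuesAcrossPaths) {
        Set<T> distinct = new HashSet<>(valuesAcrossPaths);
        if (distinct.size() != 1) {
            return Optional.empty();
        }
        return Optional.ofNullable(distinct.iterator().next());
    }

    /** Formats a literal the way smali writes it, e.g. 0x1 or -0x1. */
    static String hexLiteral(int value) {
        long v = value; // widen first so negating Integer.MIN_VALUE cannot overflow
        return v < 0 ? "-0x" + Long.toHexString(-v) : "0x" + Long.toHexString(v);
    }

    /** Picks the narrowest const opcode whose literal field can hold the value. */
    static String constFor(String register, int value) {
        String op;
        if (value >= -8 && value <= 7) {
            op = "const/4";      // 4-bit signed literal
        } else if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            op = "const/16";     // 16-bit signed literal
        } else if ((value & 0xFFFF) == 0) {
            op = "const/high16"; // only the high 16 bits are set
        } else {
            op = "const";        // full 32-bit literal
        }
        return op + " " + register + ", " + hexLiteral(value);
    }

    public static void main(String[] args) {
        // intValue()I produced 1 on every path, so move-result v0 can become a constant.
        Optional<Integer> agreed = consensus(Arrays.asList(1, 1, 1));
        System.out.println(agreed.map(v -> constFor("v0", v)).orElse("(keep move-result)"));
        // prints: const/4 v0, 0x1
    }
}
```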
After Dead Code Removal
.method public static test1()I
    .locals 2
    const/4 v0, 0x1
    return v0
.end method
Because the code above const/4 v0, 0x1 does not affect state outside of the method (no side effects), it can be removed without changing behavior. If there were a method call that wrote something to the file system or network, it couldn't be removed because it affects state outside the method. Or, if test1()I took a mutable argument, such as a LinkedList, any instructions that accessed it couldn't be considered dead.
Other examples of dead code:
- unreferenced assignments - assigning registers and not using them
- unreached / unreachable instructions - if (false) { dead_code(); }
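The LinkedList point can be seen in plain Java. In this hypothetical pair of methods, both return 1 like test1()I, but only the list created inside localOnly() is invisible to callers, so its add() is dead code; the add() in mutatesArgument() changes caller-visible state and has to stay.

```java
import java.util.LinkedList;

final class SideEffectSketch {
    /** Touches only a list it created itself, so the add() cannot be observed by any caller. */
    static int localOnly() {
        LinkedList<Integer> scratch = new LinkedList<>();
        scratch.add(42); // dead: removing it (and scratch) does not change behavior
        return 1;
    }

    /** Mutates a caller-supplied list, so the add() is a side effect. */
    static int mutatesArgument(LinkedList<Integer> list) {
        list.add(42); // not dead: the caller sees the new element
        return 1;
    }

    public static void main(String[] args) {
        LinkedList<Integer> callerList = new LinkedList<>();
        localOnly();
        mutatesArgument(callerList);
        System.out.println(callerList); // prints [42]: the mutation escaped the method
    }
}
```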
Further Reading
- Dalvik Virtual Execution with SmaliVM
- Guillot, Yoann, and Alexandre Gazet. "Automatic Binary Deobfuscation." Journal in Computer Virology 6.3 (2010): 261-76
- Unicorn - The ultimate CPU emulator
- Yadegari, Babak, and Saumya Debray. "Symbolic Execution of Obfuscated Code."
- Success stories: