Unzip Gzip

Analysts new to intrusion analysis will find that web traffic is increasingly compressed as sites grow more complex and carry more multimedia content. If you use wget to pull down a page during an investigation, or a tool such as Spondulas, you may end up with a file of mostly "garbage" like this one (intentionally shortened):

HTTP/1.1 403 Forbidden
Server: cloudflare-nginx
Date: Tue, 27 April 2014 18:44:27 GMT
Content-Type: text/html; charset=UTF-8
Set-Cookie: __cfduid=dda0ed9839b8fbbaadbee565b711a05951400006667162; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.yadayada.com; HttpOnly
Cache-Control: max-age=10
Expires: Tue, 13 May 2014 18:44:37 GMT
CF-RAY: 12a101e5c86a09ac-ORD
Content-Encoding: gzip
Transfer-Encoding: chunked
Connection: keep-alive

4e5
^_<8b>^H^@^@^@^@^@^@^CWmo6^P_j@^B^T^Gk<9b>&<96><86>"M|Z<8a>0hl1H<95>]<8e>H<93>6<80><8b>u{{x_O/>;#<85>+e6<9a><<98>^S^Yy)#^S@d<91>^R^A<89><96><80><88>H^Vi^D<8a>=BG^_Ab<89>[^V^GvBx^X
=PG^O<80>Z^V+vE<8d>Bi^_^@<96><8f>^SNB<8a>s<96>^D/Ru\<81>^S<96>kESCs]<92>E
...
...
...
0

Notice that the Content-Encoding header tells us gzip compression is in use.
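
The same check can be scripted when there are many saved responses to triage. The following is a minimal sketch, not a definitive tool: it assumes the file holds one complete HTTP response, and the function names content_encoding and has_gzip_magic are hypothetical helpers made up for this illustration. The second helper relies on the fact that gzip data always begins with the bytes 0x1f 0x8b, which vi displays as ^_<8b> in the dump above.

```python
def content_encoding(raw: bytes) -> str:
    """Return the Content-Encoding value of a saved HTTP response, or ''."""
    # Headers end at the first blank line; accept CRLF or bare LF endings.
    head = raw.replace(b"\r\n", b"\n").split(b"\n\n", 1)[0]
    for line in head.split(b"\n")[1:]:  # skip the status line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"content-encoding":
            return value.strip().decode("ascii", "replace").lower()
    return ""


def has_gzip_magic(data: bytes) -> bool:
    """True if the data starts with the gzip magic bytes 0x1f 0x8b."""
    return data[:2] == b"\x1f\x8b"
```

Reading the file in binary mode (open(path, "rb")) matters here, since the body is not valid text.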

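The Moloch packet capture program has a built-in gzip decompressor, but if you don't have a tool that does this automatically, it is easy to do by hand. Open the file in vi or another text editor and remove the HTTP headers, blank lines, and anything else above the block of compressed data (the block that starts with the first caret). Because this response also uses Transfer-Encoding: chunked, the hexadecimal chunk-size lines (4e5 above) and the terminating 0 are not part of the gzip data and should be removed as well. Save the file with a .gz extension, or rename it afterwards. Then run gzip with the -d (decompress) parameter on the file:

gzip -d filename.gz

Here filename.gz stands for whatever name you gave the file. The resulting file should now be decompressed and readable, and it will no longer have the .gz extension.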
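
The manual steps can also be scripted. The sketch below is one possible approach using only Python's standard gzip module, under a few assumptions: the file holds a single complete HTTP response, the headers are matched with a simple substring test rather than a full parser, and the function names (split_headers, dechunk, extract_body) are hypothetical names chosen for this example.

```python
import gzip


def split_headers(raw: bytes) -> tuple[bytes, bytes]:
    """Split a saved response into (headers, body) at the first blank line."""
    hits = [(raw.find(sep), len(sep)) for sep in (b"\r\n\r\n", b"\n\n")]
    hits = [(pos, n) for pos, n in hits if pos != -1]
    if not hits:
        raise ValueError("no blank line separating headers from body")
    pos, n = min(hits)  # the earliest separator ends the headers
    return raw[:pos], raw[pos + n:]


def dechunk(body: bytes) -> bytes:
    """Undo Transfer-Encoding: chunked by dropping the chunk-size lines."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.find(b"\n", pos)
        if eol == -1:
            raise ValueError("missing chunk-size line")
        # The size is hexadecimal; ignore any ';extension' after it.
        field = body[pos:eol].split(b";", 1)[0].strip()
        size = int(field.decode("ascii"), 16)
        if size == 0:  # a zero-length chunk marks the end of the body
            return bytes(out)
        start = eol + 1
        out += body[start:start + size]
        pos = start + size
        # Skip the line ending that follows each chunk's data.
        if body[pos:pos + 2] == b"\r\n":
            pos += 2
        elif body[pos:pos + 1] == b"\n":
            pos += 1


def extract_body(raw: bytes) -> bytes:
    """Return the decoded body of a saved HTTP response."""
    headers, body = split_headers(raw)
    lowered = headers.lower()
    if b"transfer-encoding: chunked" in lowered:
        body = dechunk(body)
    if b"content-encoding: gzip" in lowered:
        body = gzip.decompress(body)
    return body
```

To use it, read the saved response in binary mode, pass the bytes to extract_body, and write the result to a new file. A capture that was truncated mid-transfer will likely make gzip.decompress raise an error instead of returning partial content.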