What Is a Zip Bomb
A zip bomb, also known as a decompression bomb (or the 'Zip of Death' for the overly dramatic ones), is a malicious archive file designed to crash or render useless the program or system reading it. It is often employed to disable antivirus software, in order to create an opening for more traditional viruses. Rather than hijacking the normal operation of the program, a zip bomb allows the program to work as intended, but the archive is carefully crafted so that unpacking it (e.g. by a virus scanner in order to scan for viruses) requires inordinate amounts of time, disk space or memory.
The classic zip bomb is a tiny zip file; most are measured in kilobytes. However, when this file is unzipped, its contents are more than the system can handle (usually petabytes, where 1 petabyte is 1,000 terabytes; some go up to exabytes). Yes, we're talking about stuffing exabytes of data into kilobytes. In my view, this ingenious little trick is the product of "pure hacker mentality". It's nothing like phishing or session hijacking or anything else that has given "hackers" a bad name. It's a simple, creative solution, an exploited loophole which truly shows: "Where there's a will, there's a way". To understand how it works, we have to take a little detour to see how data compression (WinZip, WinRAR etc.) works.
Compression software and tools make use of what are called "lossless compression algorithms". As the name suggests, these algorithms strive to compress files without any loss of information. Clearly, when we compress a file, we want to get it back in exactly the same shape after decompressing. These algorithms usually exploit statistical redundancy to represent the sender's data more concisely without error. In English now: we know that the computer only understands 0's and 1's, so every single program or any data stored on your computer is actually just a series of 1's and 0's (binary form). Let's take an example that's not entirely correct but will help you understand the principle. Say we've got a file which, after being converted to binary, looks like "1110000101". Remember the statistical redundancy mentioned earlier? Try to spot it in this string (1110000101). Statistical redundancy basically means that the same thing is repeated over and over again. In this string we see three 1's followed by four 0's. Now take a look at this string: "3140101". What just happened here is compression: "111" became "31" and "0000" became "40", while the last three single digits were left as they were. We can simply write a program that encodes and decodes files this way (software like WinZip uses a far more sophisticated form of what we did above). If the program finds repeating patterns, like a lot of 1's together, it may simply replace all those 1's with a count. Another example: suppose we find "111111111" somewhere in a program. That's nine 1's in a row. What if we replace it with "91"? While decoding, whenever the program encounters a digit other than 1 or 0 (in our case 9), it writes the digit that follows it (in our case 1) that many times, effectively reversing the process. So "91" gets converted back to "111111111". That's lossless compression, and this particular trick is known as run-length encoding (RLE).
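Here's a minimal Python sketch of the toy scheme above, so you can play with it. It assumes the input is a string of '0' and '1' characters, and it adds one rule of its own: runs longer than nine are split (ten 1's become "911"), because keeping every count to a single digit is what keeps this toy format unambiguous. The function names are made up for illustration.

```python
def rle_encode(bits: str) -> str:
    """Toy run-length encoder for a string of '0'/'1' characters.

    A run of 2..9 identical bits becomes "<count><bit>"; a single bit is written
    as-is. Runs longer than 9 are split so every count fits in one digit.
    """
    out = []
    i = 0
    while i < len(bits):
        bit = bits[i]
        run = 1
        while i + run < len(bits) and bits[i + run] == bit:
            run += 1
        i += run
        while run > 0:
            chunk = min(run, 9)
            out.append(bit if chunk == 1 else f"{chunk}{bit}")
            run -= chunk
    return "".join(out)


def rle_decode(encoded: str) -> str:
    """Reverse rle_encode: a digit 2..9 means 'repeat the next bit that many times'."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch in "01":
            out.append(ch)           # a lone bit is copied as-is
            i += 1
        else:
            out.append(encoded[i + 1] * int(ch))
            i += 2
    return "".join(out)


print(rle_encode("1110000101"))  # 3140101
print(rle_encode("111111111"))   # 91
print(rle_decode("3140101"))     # 1110000101
```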
What about the previous string (3140101)? On decompressing it, we get back 1110000101, the original string. Like I said, this example is not entirely accurate. Remember that the computer only understands binary. Everything you'll ever do on a computer will, at some point, have been converted to binary form; the computer only converts to something other than binary (like English) for us dumb humans. We compressed "111111111" by writing "91", but the "9" in "91" will also have to be converted into 1's and 0's. So our little program glosses over how the counts themselves get stored. Widely used programs like WinZip, WinRAR and PowerISO use various algorithms for different cases.
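To see why storing the "9" matters, here's a sketch of the same idea done entirely in binary. The format is hypothetical: each run becomes one symbol bit followed by a 4-bit count, so runs of up to 15 fit in one record. Nine 1's shrink from 9 bits to 5, but "1110000101", with its short runs, balloons from 10 bits to 25, which is a big part of why real tools pick different algorithms for different data.

```python
def rle_to_bits(bits: str, count_width: int = 4) -> str:
    """Encode each run as one symbol bit plus a `count_width`-bit binary count.

    Hypothetical toy format: counts go from 1 to 2**count_width - 1, and longer
    runs are split into several records.
    """
    max_run = 2 ** count_width - 1
    out = []
    i = 0
    while i < len(bits):
        bit = bits[i]
        run = 1
        while i + run < len(bits) and bits[i + run] == bit and run < max_run:
            run += 1
        out.append(bit + format(run, f"0{count_width}b"))
        i += run
    return "".join(out)


def bits_to_rle(encoded: str, count_width: int = 4) -> str:
    """Reverse rle_to_bits: read fixed-size records and expand each run."""
    step = 1 + count_width
    return "".join(
        encoded[i] * int(encoded[i + 1 : i + step], 2)
        for i in range(0, len(encoded), step)
    )


print(rle_to_bits("111111111"))        # 11001 (9 bits -> 5 bits)
print(len(rle_to_bits("1110000101")))  # 25 (10 bits -> 25 bits)
```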
Lossless compression is possible because most real-world data has statistical redundancy. Lossless compression schemes are reversible so that the original data can be reconstructed.
However, lossless data compression algorithms will always fail to compress some files. Indeed, any compression algorithm will necessarily fail to compress data containing no discernible patterns. Attempts to compress data that has already been compressed may actually result in an expansion, as will attempts to compress all but the most trivially encrypted data. This is why, if you've ever tried zipping or RARing a file, you may have noticed that in some cases it works great while in others it doesn't even reduce the file size by 5%. (WinRAR and WinZip can be considered the same for (almost) all practical purposes. Their names differ more than their compression abilities. Feel free to use either.)
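You can watch this happen with Python's built-in zlib module, which implements Deflate, the same method standard zip files use. This sketch compresses a million zero bytes, a million random bytes from os.urandom, and then the already-compressed random data a second time. Exact sizes vary slightly from run to run, so no outputs are shown.

```python
import os
import zlib

zeros = bytes(1_000_000)        # highly redundant: a million zero bytes
noise = os.urandom(1_000_000)   # no discernible pattern: a million random bytes

for label, data in (("zeros", zeros), ("random", noise)):
    packed = zlib.compress(data, 9)
    print(f"{label:>6}: {len(data):,} -> {len(packed):,} bytes")

# Compressing already-compressed output again gains nothing; it usually grows a little.
once = zlib.compress(noise, 9)
twice = zlib.compress(once, 9)
print(f"random, compressed twice: {len(once):,} -> {len(twice):,} bytes")
```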
Now, back to zip bombs. Before taking a deeper look, let's get the basic idea cleared up. Create a new text file and type '0' 1,000 times. Save it; the file size should be around 1 kilobyte. Open it up, press CTRL+A and CTRL+C to copy the whole thing, CTRL+End to jump to the end, and then CTRL+V ten times to paste it ten times. Our file is now around 11 KB and made entirely of 0's. Repeat the whole routine a few more times; each round multiplies the size by 11. Faster than you'd expect, the file size will climb into megabytes and then gigabytes. In most cases, Notepad (or any text editor) will begin to lag, since it has a ridiculous number of 0's open in the window. When that happens, that's your cue to slow down, since different operating systems and programs can behave unexpectedly when dealing with such large files. In practice, keep it under a few gigabytes and you should be fine.
(Even this may be too much for some systems. I recommend pausing at about 100 MB and then slowly increasing the size. If the lag lasts longer than around 15 seconds, you've reached the limit.) So, we have a 5 GB text file (on an awesome computer) containing nothing but 0's. A little perspective: that's over five-freaking-billion zeros that the innocent little Notepad obediently handled in a few seconds. So the next time you get annoyed at your browser lagging a little, try taking a notebook and writing down 5 GB worth of text. It's only fair.
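And we're back. What do we do now with that ridiculously large text file? Compress it and watch your seriously underappreciated computer do magic. In the same directory, you'll now see the pointlessly large text file and, alongside it, an archive of just a few megabytes. The Deflate method used by standard zip files can't do much better than about 1000:1, so a 5 GB file of zeros ends up at roughly 5 MB; other formats, such as 7z, can squeeze it further. That's like stuffing a thousand balls into the volume of one.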
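If you'd rather not torture Notepad, here's a sketch of the same experiment in Python using the standard zipfile module. It streams 100 MiB of the character '0' (the size at which the text above suggests pausing) into an in-memory zip archive, so the big "file" never exists on disk or in memory all at once. The 100 MiB total and the entry name zeros.txt are arbitrary choices; expect a ratio of roughly 1000:1.

```python
import io
import zipfile

CHUNK = b"0" * (1 << 20)   # 1 MiB of the character '0', like the Notepad file
CHUNKS = 100               # 100 MiB in total; beyond 2 GiB, zf.open() also needs force_zip64=True

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    # Stream the data into the archive so the whole "file" never sits in memory.
    with zf.open("zeros.txt", "w") as entry:
        for _ in range(CHUNKS):
            entry.write(CHUNK)

original = len(CHUNK) * CHUNKS
archived = len(buffer.getvalue())
print(f"{original:,} bytes of zeros -> {archived:,} byte zip, about {original // archived}:1")
```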
Now, for a deeper look, let's check out the most famous zip bomb, 42.zip. It is a zip file consisting of 42 kilobytes of compressed data, containing five layers of nested zip files in sets of 16, with each bottom-layer archive containing a 4.3 gigabyte (4 294 967 295 bytes; ~ 3.99 GiB) file, for a total of 4.5 petabytes (4 503 599 626 321 920 bytes; ~ 3.99 PiB) of uncompressed data. This file is still available for download on various websites across the Internet. Many antivirus scanners perform only a few layers of recursion on archives, to help prevent attacks that would cause a buffer overflow, an out-of-memory condition, or an unacceptable amount of program execution time. Zip bombs often (if not always) rely on repetition of identical files to achieve their extreme compression ratios. Dynamic programming methods can be employed to limit traversal of such files, so that only one file is followed recursively at each level, effectively converting their exponential growth to linear.
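Here's a rough sketch of that dynamic-programming idea (memoization) in Python. It doesn't read real zip files; instead it uses a hypothetical in-memory model of 42.zip's structure, where each archive is a list of children and identical children share one key (a real scanner might use a hash of each member's bytes as that key). Thanks to functools.lru_cache, each distinct archive is expanded once, so the whole 4.5 petabyte tree is sized after visiting just 7 nodes. Only the bottom files' bytes are counted, not the zip files in between.

```python
from functools import lru_cache

# Hypothetical model of 42.zip (names are made up): an int is a plain file of that
# many bytes, a list is an archive holding those children. The 16 children at each
# level are identical, so they are all represented by the same key.
MODEL_42 = {
    "top.zip": ["layer1.zip"] * 16,
    "layer1.zip": ["layer2.zip"] * 16,
    "layer2.zip": ["layer3.zip"] * 16,
    "layer3.zip": ["layer4.zip"] * 16,
    "layer4.zip": ["layer5.zip"] * 16,
    "layer5.zip": ["data.bin"],        # each bottom-layer archive holds one big file
    "data.bin": 4_294_967_295,
}


@lru_cache(maxsize=None)
def expanded_size(key: str) -> int:
    """Bytes of plain-file data reached by fully unpacking `key`.

    The cache means each distinct key is expanded only once; repeated children are
    answered from the cache, turning exponential traversal into linear work.
    """
    node = MODEL_42[key]
    if isinstance(node, int):
        return node
    return sum(expanded_size(child) for child in node)


print(expanded_size("top.zip"))            # 4503599626321920
print(expanded_size.cache_info().misses)   # 7 distinct nodes actually expanded
```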
(There's a small website dedicated solely to 42.zip at http://www.unforgettable.dk/ where you can even download a ready-made zip bomb. The password for the zip file is '42'. The password is there to shield users whose ancient antivirus software is set to automatically scan all downloads.)
Now, to avoid giving the wrong impression, a myth needs to be busted. "Zip bomb" is not a very accurate name for this malicious file. If you extract a zip bomb, it won't do anything to your computer; it'll just create 16 smaller zip bombs. If you decompress one of those, it'll yield 16 more. As such, they don't "explode" when someone opens them; they're used by malware authors to knock out antivirus software so the malware can work without needing to watch its back. Here's what happens: a malicious program may plant a zip bomb somewhere near itself as bait for the AV software, then wait, "hiding" behind the zip bomb, until the antivirus comes around for a routine scan. When the antivirus reaches the bomb, it tries to open it, all in its limited memory. 1 file becomes 16, which becomes 256, and so on until the memory is full. In reality, though, the whole computer rarely runs out of memory, because each process can only use so much of it (its address space, plus any limits set by the operating system or the program itself). Once a process hits that limit, its memory requests start failing and it usually crashes, which spares the rest of the computer from an OOM (out-of-memory) event. When this happens to an antivirus program as it's trying to dig into the file for malware, the software simply crashes and exits, leaving the rest of the computer unharmed. The malware detects this and uses the opportunity to do whatever it wants, without having to worry about AV software lurking around the corner. Additionally, the nested archives make it much harder for programs like virus scanners (the main target of these "bombs") to be smart and refuse to unpack archives that are "too large": until the last level, the total amount of data is not "that much", and you don't "see" how large the files at the lowest level are until you have reached that level, by which time it is, of course, too late.
However, most antivirus software today recognizes a zip bomb when it sees one and skips over it, alerting the user that the computer might be infected with malware. Scanners usually go only two or three levels deep before flagging the file.
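Here's a rough idea of how a scanner might cap the damage, sketched in Python with the standard zipfile module. The function inspect_archive, the SuspectedZipBomb exception and the limits (three nested levels, 1 GiB in total) are all hypothetical. It relies on ZipInfo.file_size, the uncompressed size each archive claims for its members in its headers, and checks that claim before decompressing anything. A hand-crafted archive can lie in its headers, so a real scanner would also cap the bytes it actually decompresses.

```python
import io
import zipfile


class SuspectedZipBomb(Exception):
    """Raised when an archive breaks the (hypothetical) scan limits below."""


def inspect_archive(data: bytes, max_depth: int = 3, budget: int = 1 << 30, depth: int = 0) -> int:
    """Return the total declared uncompressed size of `data`, including nested .zip
    members, or raise SuspectedZipBomb if nesting depth or total size is too large."""
    if depth > max_depth:
        raise SuspectedZipBomb(f"nested more than {max_depth} levels deep")
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            # Check the size the header *claims* before decompressing anything.
            total += info.file_size
            if total > budget:
                raise SuspectedZipBomb(f"declared size exceeds {budget} bytes")
            if info.filename.lower().endswith(".zip"):
                # Only the (small) nested archive itself is decompressed here; its
                # contents are checked against whatever budget is left.
                total += inspect_archive(zf.read(info), max_depth, budget - total, depth + 1)
    return total
```

Raising an exception instead of returning a flag means the scan stops the moment either limit is crossed, before any deeper level is touched.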
Further, you wouldn't notice disk space being used, because zip bombs only decompress in an antivirus program's memory, not to disk. Most manual archive-opening programs don't even have a recursive opening mode for this very reason. You also wouldn't notice much extra work by the CPU, because zip bombs work so fast that they can knock out an inadequately protected antivirus program in seconds, while using only a fraction of the computer's total memory.
42.zip is just one example; there are many more like it, and you can create your own. A similar file is an XML-based decompression bomb called "billion laughs" (or XML bomb). It defines a chain of nested entities, each expanding to ten copies of the one before, so a document of under a kilobyte expands to about 3 GB of text and crashes a web browser (or any other program) by making its XML parser run out of memory. (Again, most browsers and XML parsers today detect such recursive expansion and simply refuse to parse the booby-trapped XML.)
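The arithmetic behind "billion laughs" is easy to check without ever feeding the document to an XML parser. This Python sketch builds the commonly cited layout (an entity lol holding the text "lol", then lol1 through lol9, each made of ten references to the previous entity) and computes the fully expanded length with memoized recursion instead of expanding anything.

```python
import re
from functools import lru_cache

# Classic "billion laughs" entity chain: lol9 -> 10 x lol8 -> ... -> 10 x lol -> "lol".
ENTITIES = {"lol": "lol"}
for n in range(1, 10):
    previous = "lol" if n == 1 else f"lol{n - 1}"
    ENTITIES[f"lol{n}"] = f"&{previous};" * 10

# The whole document as text; it is only measured here, never parsed.
DOCUMENT = (
    '<?xml version="1.0"?>\n<!DOCTYPE lolz [\n'
    + "\n".join(f'  <!ENTITY {name} "{value}">' for name, value in ENTITIES.items())
    + "\n]>\n<lolz>&lol9;</lolz>\n"
)

ENTITY_REF = re.compile(r"&(\w+);")


@lru_cache(maxsize=None)
def expanded_length(name: str) -> int:
    """Characters produced by fully expanding entity `name`, computed, not expanded."""
    text = ENTITIES[name]
    literal_chars = len(ENTITY_REF.sub("", text))
    return literal_chars + sum(expanded_length(ref) for ref in ENTITY_REF.findall(text))


print(len(DOCUMENT))             # under 1 KB of XML...
print(expanded_length("lol9"))   # 3000000000 characters once expanded
```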
There's even a torrent for one of the largest (and smallest) zip bombs on the internet, although it seems all the seeders have long gone. It's a 5.61 kilobyte zip file that expands to 4 zettabytes, which puts it among the most extreme zip bombs around. Here's the KickAss Torrents link: http://kickass.to/zip-bomb-insanely-huge-zip-archive-4zb-t2105770.html (As a challenge, you can try replicating it. The file structure is explained at the link: 8 layers, 32 archives in each layer, each archive containing a 4 GB file.)
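Where does "4 zettabytes" come from? Here's the arithmetic in a few lines of Python, assuming (as with 42.zip) that each of the 8 layers multiplies the number of archives by 32 and that each bottom file is exactly 4 GiB (2^32 bytes); the exact file sizes inside the real archive may differ slightly. Under those assumptions the total is exactly 4 ZiB, or about 4.7 zettabytes in decimal units.

```python
LAYERS = 8
FANOUT = 32                    # archives per layer
LEAF_BYTES = 4 * 2**30         # assumed: each "4 GB" file is exactly 4 GiB
ARCHIVE_BYTES = 5.61 * 1024    # the 5.61 KB download, taking 1 KB = 1024 bytes

leaf_files = FANOUT ** LAYERS          # 32**8 = 2**40 copies of the bottom file
total_bytes = leaf_files * LEAF_BYTES  # 2**40 * 2**32 = 2**72 bytes

print(f"{leaf_files:,} bottom files")
print(f"{total_bytes:,} bytes = {total_bytes / 2**70:g} ZiB, about {total_bytes / 1e21:.2f} ZB")
print(f"compression ratio of roughly {total_bytes / ARCHIVE_BYTES:.1e} to 1")
```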
Let's walk through the process once again. Make a 4 GB text file full of 0's. Zip it, and call the result zip1. Create, say, 10 copies of this zip file, so we have 10 zip1's. Now zip all ten zip1's together and call the result zip2. We're at the second level now, and we can simply continue the process for as long as we like: with every level, the amount of data the archive unpacks to multiplies, while the archive itself stays tiny in comparison. A common question is: how can we create a zip file that opens up to 4 zettabytes without having 4 zettabytes of space on our computers? Actually, we never need more than that first 4 GB text file, and only briefly. We zipped it into zip1, and then we can simply delete the original text file, as it is no longer required. All we need is that first tiny zip file: it is from this zip file that we create copies, zip them up, create more copies, zip again, and so on.
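Here's the same recipe as a scaled-down Python sketch using the standard zipfile module, so you can watch the numbers without filling your disk. The helper names make_nested_zip and unpacked_size, the 1 MiB starting file, the 10 copies per level and the 3 levels are all arbitrary, hypothetical choices for illustration. Everything stays in memory and, just as described above, only the latest archive is kept once the next level is built.

```python
import io
import zipfile


def make_nested_zip(leaf_bytes: int, copies: int, levels: int) -> bytes:
    """Scaled-down version of the walkthrough: zip1 holds `leaf_bytes` zeros, and
    every later level zips `copies` copies of the previous level's archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("zeros.txt", b"0" * leaf_bytes)
    archive = buffer.getvalue()  # zip1; the original "text file" is no longer needed

    for level in range(2, levels + 1):
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
            for i in range(copies):
                zf.writestr(f"zip{level - 1}_{i}.zip", archive)
        archive = buffer.getvalue()  # only the newest level is kept
    return archive


def unpacked_size(archive: bytes) -> int:
    """Bytes of non-zip files reached by recursively unpacking every level."""
    total = 0
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.filename.endswith(".zip"):
                total += unpacked_size(zf.read(info))
            else:
                total += info.file_size
    return total


bomb = make_nested_zip(leaf_bytes=1 << 20, copies=10, levels=3)
print(f"archive: {len(bomb):,} bytes, unpacks to {unpacked_size(bomb):,} bytes")
```

Each extra level multiplies the unpacked size by the number of copies (10 here), while the archive grows far more slowly, because the small inner archives themselves compress well.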
And that ends the story of the zip bomb. Zip bombs belong to the same family of resource-exhaustion attacks as the fork bomb we made using batch files. Yet again, the term DoS (denial of service, the non-distributed cousin of DDoS) pops up here: zip bombs are basically denial-of-service attacks on antivirus software. Limited memory is a 'flaw' that has remained in all computers since their inception, and hackers always find a way to exploit it. When the old methods stop working, new ones soon pop up to take their place: DoS attacks, zip bombs, fork bombs, XML bombs, PDF bombs, buffer overflows and whatnot. This shows how crucial 'memory management' really is in programming. And so we live another day, ready to combat the next problem.