The printf() family of functions (printf(), fprintf(), sprintf(), etc.) is surprisingly powerful and, if misused, can expose a program to a class of vulnerabilities exploited by format string attacks.
These attacks can be severe: with a well-crafted format string, an attacker can write an arbitrary value to an arbitrary memory location. That can let the attacker hijack execution or escalate privileges.
In this post, I'm going to go over the basics of how this vulnerability works and how it can be corrected.
In the interest of simplicity and clarity, I have undone some of the protections that modern operating systems and compilers use to mitigate attacks like this. That is not to say those protections always stop such attacks; disabling them just keeps the example predictable. Since the point here is to show that we can write a value of our choosing to a location of our choosing, these simplifications are fine.
Here is our simple vulnerable program:
This program prints the first command line argument and also the return value of the foo() function. Our goal is to modify the return value of foo(). The two simplifications I've made here are printing the memory location of the ret variable and setting the stack to a consistent location in memory. This just gives us a known memory location for demonstrating the concept.
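Since the listing itself isn't shown here, the following is a minimal sketch of a program with the behavior just described, not the original code. Only foo() and ret come from the text; the parameter name input and the exact output wording are assumptions. The entire bug is the single printf(input) call.

```c
#include <stdio.h>

/* Hypothetical sketch of the vulnerable program described above. */
int foo(const char *input) {
    int ret = 0;                            /* the value the attack targets */

    /* Simplification from the text: reveal where ret lives in memory. */
    printf("ret is at %p\n", (void *)&ret);

    /* BUG: user-controlled data is used as the format string. */
    printf(input);
    printf("\n");

    return ret;                             /* 0 unless something overwrites it */
}
```

A caller would pass the first command line argument through, for example `printf("foo() returned %d\n", foo(argv[1]));` after checking that `argc >= 2`. GCC and Clang can flag the `printf(input)` line with `-Wformat-security`.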
Normal execution of this program:
Now we know that our goal is to write a non-zero value into 0xBF7FFF68. What happens if we use a more interesting input string?
What's going on here? Our input string is being used by printf() as its format string. Because it contains conversion specifiers (the %x's), printf() happily fetches and prints one argument per specifier, even though we never passed any; it simply reads whatever values sit where those arguments would be, which here means successive words on the stack. Had this string been a literal format in our source code, the compiler would have warned that the format expects more arguments than were supplied.
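printf() has no way of knowing how many arguments the caller actually passed; the format string alone decides how many it fetches. As a sketch of that rule, the hypothetical helper below counts the arguments a format string will make printf() consume. It is deliberately simplified: it treats every conversion as consuming exactly one argument and ignores `*` widths and positional `%N$` forms.

```c
/* Count how many variadic arguments a printf-style format will fetch.
 * Simplified: one argument per conversion, "%%" consumes none,
 * '*' widths/precisions and %N$ positional forms are not handled. */
int count_consumed_args(const char *fmt) {
    int count = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;                    /* look at the character after '%' */
        if (*p == '\0')
            break;              /* lone trailing '%': nothing consumed */
        if (*p == '%')
            continue;           /* "%%" prints a literal '%' */
        count++;                /* any other conversion reads one argument */
    }
    return count;
}
```

For an input such as `"aaaa%x.%x.%x"`, printf() would read three arguments whether or not any were passed.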
There's something else interesting about this output: the 13th item printed is "61616161". This is where printf() reaches the buffer holding our format string and treats the "aaaa" at its start as an argument; each "a" is the byte 0x61, so %x prints it as "61616161". One of the conversion specifiers printf() supports is %n. Instead of printing anything, %n writes the number of bytes this call to printf() has printed so far into the int pointed to by the next argument. Since we've seen that, given enough conversion specifiers, printf() eventually uses our own input buffer as an argument, we can put our target address in the buffer for %n to use:
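The behavior of %n is easiest to see in a well-defined call where we supply a valid `int *` ourselves. The helper name below is hypothetical; it prints a prefix into a scratch buffer and returns what %n stored. Note that some C runtimes, notably Microsoft's, disable %n by default.

```c
#include <stdio.h>

/* Return the count %n stores after `prefix` has been printed.
 * The destination is an int we own, so this call is well-defined. */
int chars_before_n(const char *prefix) {
    char sink[256];          /* scratch output; prefixes here are short */
    int written = -1;
    /* %s prints prefix as plain data; %n then records the bytes output so far. */
    snprintf(sink, sizeof sink, "%s%n", prefix, &written);
    return written;
}
```

`chars_before_n("hello")` stores 5. Passing `"aaaa%x"` stores 6, because through `%s` those characters are data, not conversions.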
Now, instead of the hex representation of "aaaa", printf() prints the hex representation of the address of the ret variable. Let's look at what happens when we change the last %x to %n:
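For the 13th argument to equal 0xBF7FFF68, the address's bytes must be placed in the buffer least significant byte first. The sketch below assumes a little-endian 32-bit x86 target, which the 0xBF7FFF68 stack address suggests; the function names are hypothetical. It also confirms that "aaaa" read back as a word is 0x61616161.

```c
#include <stdint.h>

/* Store a 32-bit value as four little-endian bytes (least significant first). */
void encode_le32(uint32_t value, unsigned char out[4]) {
    out[0] = (unsigned char)(value & 0xFF);
    out[1] = (unsigned char)((value >> 8) & 0xFF);
    out[2] = (unsigned char)((value >> 16) & 0xFF);
    out[3] = (unsigned char)((value >> 24) & 0xFF);
}

/* Read four bytes back as a little-endian 32-bit value, the way a
 * %x on a little-endian 32-bit target would interpret them. */
uint32_t decode_le32(const unsigned char in[4]) {
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

Encoding 0xBF7FFF68 gives the bytes 0x68, 0xFF, 0x7F, 0xBF, in that order.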
We've written 40 into ret! That is simply the number of bytes printf() had output by the time it reached %n. Since %n writes the number of bytes printed, let's tweak our format string to write a different value.
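Using direct parameter access (the %N$ syntax, which selects the Nth argument directly instead of stepping through every argument before it), we've written a small value into ret. Because the 4-byte target address sits at the start of the format string, printf() prints it before reaching %n, so those 4 bytes are already included in the count that %n writes. By tweaking our attack string to use a slightly different address, we can write 1 into ret: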
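Direct parameter access is a POSIX extension to printf() (supported by glibc, for example). A minimal well-defined sketch, with a hypothetical function name: the format picks its arguments out of order and uses `%3$n` to store the running count. POSIX requires every argument from 1 up to the highest one used to be referenced, so all three are.

```c
#include <stdio.h>

/* Pick arguments by position with %N$, then store the count via %3$n. */
int positional_demo(char *out, size_t size) {
    int written = -1;
    /* Argument 2 first, then argument 1, then %n into argument 3. */
    snprintf(out, size, "%2$s%1$s%3$n", "world", "hello ", &written);
    return written;
}
```

With a buffer of at least 12 bytes, `out` holds "hello world" and the function returns 11.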
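We can also write much larger values without running out of space in our buffer, because a field width (for example, %1000x) makes printf() output many bytes from just a few characters of format string: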
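The arithmetic is: width = target value minus bytes already printed. The sketch below checks it with a well-defined call. "AAAA" stands in for the four address bytes, and `%*x` pads a dummy value to the requested width (equivalent to writing the number directly, as in %996x). Both function names are hypothetical.

```c
#include <stdio.h>

/* Width needed so %n reports `target` when `already` bytes precede the
 * padded conversion. A conversion prints at least one byte, so target
 * must exceed already; otherwise return -1. */
int padding_for(int target, int already) {
    return (target > already) ? target - already : -1;
}

/* Print a 4-byte prefix, then a dummy value padded to `width` bytes,
 * and let %n record the total into an int we own. */
int count_after_prefix_and_padding(int width) {
    static char sink[8192];  /* large enough for the widths used here */
    int written = -1;
    snprintf(sink, sizeof sink, "AAAA%*x%n", width, 0u, &written);
    return written;
}
```

To make %n write 1000 after the 4 address bytes, the width is `padding_for(1000, 4)`, which is 996. The format string stays only a few characters longer no matter how large the target is.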
The fix for this vulnerability is simple: user input should never be passed to printf() as the format string. Instead, the format string should always be a string literal, with the input passed to it as an argument. Here's our simple program with the vulnerability fixed:
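Here is a minimal sketch of the fix, applied to a foo() shaped like the one described earlier. The body is an assumption, not the original listing. The input now travels as an argument to the constant format "%s\n", so printf() never interprets it.

```c
#include <stdio.h>

/* Hypothetical fixed version: the format string is a literal. */
int foo(const char *input) {
    int ret = 0;

    printf("ret is at %p\n", (void *)&ret);

    /* FIX: input is data, so any %x or %n inside it is printed verbatim. */
    printf("%s\n", input);

    return ret;              /* stays 0 regardless of input */
}
```

When no formatting is needed at all, `fputs(input, stdout)` or `puts(input)` avoids a format string entirely.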
When we try our attack string against the fixed version of the code, the exploit fails:
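One way to see why the attack fails is to render an attack-style string through a constant "%s" format and check that it comes out unchanged. The helper name render_as_data is hypothetical.

```c
#include <stdio.h>

/* Copy `input` into `out` strictly as data: the format is the literal
 * "%s", so conversion specifiers inside input are never interpreted. */
int render_as_data(char *out, size_t size, const char *input) {
    return snprintf(out, size, "%s", input);  /* bytes that input needs */
}
```

Rendering `"AAAA%x.%n"` produces those same 9 characters, with no arguments read and nothing written through %n.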
In this post, we've seen how a format string attack can be used to write to a specific, arbitrary location in memory; printf() can do far more than print output. We've also seen how easy it is to correct this programming error. So, let's go out there and write secure code.