Monday, 12 September 2011

What is a format string?

The following is a small excerpt from an article I have written for Hackin9. It is part 1 of 2 and comes out on the 22nd of Sept 2011.
 
Not all security people are programmers, so we need to start by defining what a format string is. A format string is a string containing special conversion specifiers that define how a variable number of arguments will be converted and displayed when the data is written to an output such as stdout.

Format strings are best known from the C family of languages, but they are also used by Perl, PHP, and many web scripting languages to determine how values will be displayed. In the C programming language, variables must be declared with a specific data type. These include integer values (int), character values (char), and many other types. In C and C++, format strings are used primarily by the printf()[1] family of functions.

An example of a format string arises if we wish to display the price of an item for sale from a catalogue. If we wish to return that value as a floating-point number between $0 and $999.99, with a minimum width of three characters and always two digits after the decimal point, we could do this by using the format string "%3.2f".

I pick on this book a lot, but Teach Yourself C in 21 Days by SAMS has so many good examples of how not to code that I cannot go past it. The authors in particular ignore both buffer overflow attacks and format string vulnerabilities. Figure 1 shows the book's table of the common conversion specifiers. I recommend this book to all aspiring security professionals: it provides excellent training material for bug hunters and reverse engineers to uncover and practice exploiting.

In C code, format string vulnerabilities are devilishly simple to overlook. An example is displayed in the code snippet listed below. For the most part, the code will function correctly as long as we do not input unexpected data.
     1: strcpy(buff, argv[1]); /* Previously defined array "char buff[64]" */
     2: printf("\nHere we have typed our format identifier: %s\n", buff);
     3: printf("Oops, we forgot to add a format identifier here");
     4: printf(buff);

Code Segment 1: Oops… we left a simple bug
 
In Code Segment 1, the vulnerability occurs at line 4. Ideally, we should have placed a conversion specifier in line 4 just as we see in line 2. Line 4 could be better written as:
     printf("%s", buff);
 
Forgetting those few simple characters makes all the difference.

[1] printf stands for "print formatted"; the printf() family of functions is commonly used and taught within the C family of programming languages.
