Tuesday, 16 August 2011

Problems with teaching secure coding

I am getting ready to teach introductory C programming soon. From this, I have an old gripe that many who promote secure coding have. Why must we teach bad code practices?

There are many examples of this in Text book and other industry guides. The one I will pick on here is:
Teach Yourself C in 21 days by SAMS.

This book has a few errors and is in need of an update since MsDOS is no longer in common use, but it is still being sold and even used in a few classrooms.

The first real exercise starts with an error on line 36, but this is not my issue, typos happen.

The issue is not simply typos by a long way. It comes to what we are starting off by teaching. Instead of starting with good practices, they are starting by installing poor and insecure coding onto those who will be our future programmers.

It is through bad text-books that we have bad code and clueless developers!

In the book, right from the start they make liberal use of printf(). This starts with Exercise 1.2 on page 22 of this book:

1: #include
3: int radius, area;
5: int main()
6: {
7: printf ( “Enter radius (i.e. 10): ” );
8: scanf( “%d”, &radius );
9: area = (int) (3.14159 * radius * radius);
10: printf ( “\n\nArea = %d\n”, area );
11: return 0;
12: }
Well it works, so what is the problem you ask?

We see below a strange result from this. Here, we have a negative value returned from a large input value.

I have also read that “The only difference between sprintf() and printf() is that sprintf() writes data into a character array, while printf() writes data to stdout, the standard output device.”

Do we really need to teach this? Can we not start by teaching developers what the security concerns are from day 1?

Next, I have seen sprintf () offered as a means to fix problems with printf().

The Syntax of sprintf () is:
int sprintf (char *string, const char *format [,item [,item]...]);
We can see that printf() is really just the following:
    char buf[20];
sprintf(buf, "%d", num); puts(buf);
This code block is functionally the same as the following:
printf( "%d", num)
Now, let us put this into our original program as some other texts have stated is a more secure method.

1: #include
2: int radius, area;
3: int main(void)
4: {
5: char buf[20]; /* Set a buffer max length*/
6: printf ( "Enter radius (i.e. 10): " );
7: scanf( "%d", &radius );
8: area = (int) (3.14159 * radius * radius);
9: /* the old printf line is to be replaced...
10: printf ( "\n\nArea = %d\n", area );
11: */
12: sprintf(buf, "\n\nArea = %d\n", area);
13: puts(buf);

14: return 0;
15: }
Here, line 10 from the segment above has been expanded into lines 5, 12 and 13.

I did not fix the input line, but I wanted just to highlight a single problem (and there are a couple even in this small 12 line code segment).

Again, if the input is out of bounds (and we have not checked this in any way, our program suffers from a buffer overflow condition. It is vulnerable to at least a DOS if not an actual exploit.

We see this code failing in the figure above. So, the solution can be worse than the original.

The first function, printf() writes to SDOUT. The one we just used sprintf() writes to a buffer first. In both of these cases, we are not teaching good practices.

What then?
We have another function. snprintf() will write at most size-1 of the characters printed into the output string.

With snprintf(), if the return value is greater than or equal to the size argument, the string was too short and some of the printed characters were discarded.

int snprintf(char *str, size_t size, const char *format, ...);

The real difference is that snprintf() doesn't suffer from the same buffer overflow problems as we had in the functions printf() and even sprintf().

snprintf() is a length limited version of sprintf().

Using this, our code now becomes (in Visual Studio we would use sprintf_s() instead… What can I say, Microsoft):
1: #include
2: int radius, area;
3: int main(void)
4: {
5: char buf[20]; /* Set a buffer max length*/
6: printf ( "Enter radius (i.e. 10): " );
7: scanf( "%d", &radius );
8: area = (int) (3.14159 * radius * radius);
9: /* the old printf line is to be replaced...
10: printf ( "\n\nArea = %d\n", area ); */
11: int buffer = 32000 /* Set a max buffer size limit*/
12: snprintf(buf, buffer, "\n\nArea = %d\n", area);
13: puts(buf);
14: return 0;
15: }
Here we have added a buffer size. We are still far from complete and I will fix this up in coming posts and get us out of the mess the text books are putting us into completely.

The special format string "%n" causes printf() to write to memory (printf() is the only format string that behaves quite this way, but is not the only function we should avoid). Malicious attacks using this flaw can lead to the ability to write somewhere in your process's address space.

In summary
  1. Never ever use sprintf(); only use snprintf().
  2. Never call any XXXprintf() functions with a single argument.
Rather than using:
printf("hello world");
write puts("hello world");
It may be true that there is little you can do to attack printf("hello world");, but the fact remains that this is the practice we start to get developers into.

They then are used to this and when the need to do some XXXprintf() function using a string input, they will do something such as
The contents of “some_string” are untrusted. One day, an attacker will find a way to exploit this.
We should start teaching people good practice from day 1!

Basically, STOP using strcpy(), strcat(), sprintf(), index(), or strchr() on buffers that might not contain a trailing NUL byte, or that could ever contain embedded NUL bytes (binary data).  
Instead… Use:
  • strncpy() or memcpy() or memmove() in place of strcpy()
  • memchr() in the place of index() or strchr()
  • snprintf() and never think of using sprintf()
  • fgets() and never ever consider using gets()
  • Just forget that bzero() and bcopy() ever existed
The C FAQ:
http://c-faq.com/ or http://www.faqs.org/faqs/C-faq/faq/

For further information on the sprintf() problem and snprintf() solution see Section 12.21 of the FAQ.

To be continued ...

1 comment:

Anonymous said...

You can't in good conscience teach new programmers to use strncpy(). It's almost as bad as strcpy():

1. It doesn't guarantee NUL termination.
2. It pads dst with NULs if src is shorter than n. (Inefficient.)
3. dst and src must not overlap.

You might as well teach them to use memmove instead of strncpy(), though there's still a lot of juggling you have to do around the NUL termination.

You could teach them to use strlcpy but it isn't available everywhere.

Best of all would be to teach the use of a widely available modern string library that handles all of this nonsense. (Glib? http://www.gtk.org/api/2.6/glib/glib-String-Utility-Functions.html#g-strlcpy)