Chinaunix首页 | 论坛 | 博客
  • 博客访问: 154052
  • 博文数量: 32
  • 博客积分: 2053
  • 博客等级: 大尉
  • 技术积分: 382
  • 用 户 组: 普通用户
  • 注册时间: 2010-04-09 12:45
文章分类

全部博文(32)

文章存档

2011年(12)

2010年(20)

分类: LINUX

2010-12-29 12:15:24

About GCC printf optimization

Peter Seiderer

November 30, 2005

Abstract:

As you may have noticed, your GCC compiled source code is not always calling the functions you wrote in the source code. This can easily observed with printf in the hello world example. Starting from this some more GCC optimizations will be examined and analyzed in detail.

For a start we examine what the GCC compiler [] will do with the hello world example when different optimization level are used.

Figure 1: example1.c
\begin{figure}\begin{listinginput}{1}{example1.c}
\end{listinginput}
\end{figure}
Take a look at the nmoutput when compiled without optimization.
> gcc-3.3.6 -O0 example1.c
> nm --undefined-only a.out
w __gmon_start__
w _Jv_RegisterClasses
U __libc_start_main@@GLIBC_2.0
U printf@@GLIBC_2.0
As expected the dynamically linked executable a.out will call the function printf. Now we compile the same source code with optimization on.
> gcc-3.3.6 -O1 example1.c
> nm --undefined-only a.out
w __gmon_start__
w _Jv_RegisterClasses
U __libc_start_main@@GLIBC_2.0
U puts@@GLIBC_2.0
The nm output indicates that puts is called instead of the original printf from the source code. To verify this we take a look at the intermediate assembler listing. First with optimization off.
> gcc-3.3.6 -O0 -S example1.c
Here the resulting example1.s file:
   1 	.file	"example1.c"
2 .section .rodata
3 .LC0:
4 .string "hello world\n"
5 .text
6 .globl main
7 .type main, @function
8 main:
9 pushl %ebp
10 movl %esp, %ebp
11 subl $8, %esp
12 andl $-16, %esp
13 movl $0, %eax
14 subl %eax, %esp
15 movl $.LC0, (%esp)
16 call printf
17 movl $0, %eax
18 leave
19 ret
20 .size main, .-main
21 .ident "GCC: (GNU) 3.3.6"
Second with optimization on.
> gcc-3.3.6 -O1 -S example1.c
And the resulting example1.s file:
   1 	.file	"example1.c"
2 .section .rodata
3 .LC0:
4 .string "hello world"
5 .text
6 .globl main
7 .type main, @function
8 main:
9 pushl %ebp
10 movl %esp, %ebp
11 subl $8, %esp
12 andl $-16, %esp
13 movl $.LC0, (%esp)
14 call puts
15 movl $0, %eax
16 movl %ebp, %esp
17 popl %ebp
18 ret
19 .size main, .-main
20 .ident "GCC: (GNU) 3.3.6"
The string constant in line 4 changed from "hello world$\backslash$n" to "hello world". The call printf on line 16 changed to call puts on line 14.

Where is the optimization implemented in the GCC source code:
  • GCC-3.3.6: gcc/c-common.c line 1324
  • GCC-3.4.4: gcc/builtins.c line 4651
  • GCC-4.0.2: gcc/builtins.c:4654.
Step by step guide along the source code lines (for GCC-3.4.4). In the following examples we will omit the first two lines
         w __gmon_start__
w _Jv_RegisterClasses
of the nm output.

No optimization if the return code of printf is evaluated [line 4662-4664].

Figure 2: example2.c
\begin{figure}\begin{listinginput}{1}{example2.c}
\end{listinginput}
\end{figure}

> gcc-3.4.4 -O1 example2.c
> nm --undefined-only a.out
U __libc_start_main@@GLIBC_2.0
U printf@@GLIBC_2.0

The format string must be a literal string constant [line 4666-4677].

Figure 3: example3.c
\begin{figure}\begin{listinginput}{1}{example3.c}
\end{listinginput}
\end{figure}

> gcc-3.4.4 -O1 example3.c
> nm --undefined-only a.out
U __libc_start_main@@GLIBC_2.0
U printf@@GLIBC_2.0
Note that the GCC-4.0.2 is smart enough to optimize this example if an optimization level greater or even $-O2$ is given.
> gcc-4.0.2 -Wall -O2 example3.c
nm --undefined-only a.out
U __libc_start_main@@GLIBC_2.0
U puts@@GLIBC_2.0

A $printf$ call with the format string $''\%s\backslash n''$ [line 4679-4687] is converted to a $puts()$ call.
  printf("%s\n", "hello world"); // converted to puts("hello world");

Figure 4: example4.c
\begin{figure}\begin{listinginput}{1}{example4.c}
\end{listinginput}
\end{figure}
> gcc-3.4.4 -O1 example4.c
> nm --undefined-only a.out
U __libc_start_main@@GLIBC_2.0
U puts@@GLIBC_2.0

A printf call with the format string $''\%c''$ [line 4688-4696] is converted to a $putchar()$ call.
  printf("%c", 'A'); // converted to putchar('A');

Figure 5: example5.c
\begin{figure}\begin{listinginput}{1}{example5.c}
\end{listinginput}
\end{figure}
> gcc-3.4.4 -O1 example5.c
> nm --undefined-only a.out
U __libc_start_main@@GLIBC_2.0
U putchar@@GLIBC_2.0

No optimization if one or more $'\%'$ are detected in the format string [line 4699-4701].

Figure 6: example6.c
\begin{figure}\begin{listinginput}{1}{example6.c}
\end{listinginput}
\end{figure}
> gcc-3.4.4 -O1 example6.c
> nm --undefined-only a.out
U __libc_start_main@@GLIBC_2.0
U printf@@GLIBC_2.0

Omit printf call if format string is empty [line 4706-4708].
  printf(""); // converted to empty statement

Figure 7: example7.c
\begin{figure}\begin{listinginput}{1}{example7.c}
\end{listinginput}
\end{figure}
> gcc-3.4.4 -O1 example7.c
> nm --undefined-only a.out
U __libc_start_main@@GLIBC_2.0

A printf call with a format string of length one [line 4709-4718] is converted to $putchar()$ call.
  printf("A"); // converted to putchar('A');

Figure 8: example8.c
\begin{figure}\begin{listinginput}{1}{example8.c}
\end{listinginput}
\end{figure}
> gcc-3.4.4 -O1 example8.c
> nm --undefined-only a.out
U __libc_start_main@@GLIBC_2.0
U putchar@@GLIBC_2.0

No optimization if the string is not ending with $'\backslash n'$ [line 4721-4739].

Figure 9: example9.c
\begin{figure}\begin{listinginput}{1}{example9.c}
\end{listinginput}
\end{figure}
> gcc-3.4.4 -O1 example9.c
> nm --undefined-only a.out
U __libc_start_main@@GLIBC_2.0
U printf@@GLIBC_2.0

A printf call with a simple format string ending with $'\backslash n'$ [line 4721-4739] is converted to a $puts()$ call.
  printf("hello world\n"); // converted to puts("hello world");

Figure 10: example10.c
\begin{figure}\begin{listinginput}{1}{example10.c}
\end{listinginput}
\end{figure}
> gcc-3.4.4 -O1 example10.c
> nm --undefined-only a.out
U __libc_start_main@@GLIBC_2.0
U puts@@GLIBC_2.0

Example taken from []. The glibc [] printf implementation has the nice feature to print '(null)' in case of the format string $'\%s'$ and a pointer to NULL is given as argument.

Figure 11: bug_15685.c
\begin{figure}\begin{listinginput}{1}{bug_15685.c}
\end{listinginput}
\end{figure}
> gcc-3.3.6 -Wall -O0 bug_15685.c
./a.out
(null)
> gcc-3.3.6 -Wall -O1 bug_15685.c
./a.out
Segmentation fault
As stated in the bug report by Andrew Pinski this is not a bug, because both behaviors are allowed by the standard. If the standard states undefined behavior the compiler and/or library is free in what to implement. You should not depend on the one or the other behavior in a portable program.

From the glibc manual: ``If you accidentally pass a null pointer as the argument for a %s conversion, the GNU library prints it as (null). We think this is more useful than crashing. But it's not good practice to pass a null argument intentionally.'' [].

In figure  note that the GCC-4.0.2 is doing this optimization even if -O0 (no optimization) is given on the command line.
Figure 12: Results for example1.c
\begin{figure}\par
\begin{center}
\begin{tabular}{\vert c\vert c\vert c\vert c\v...
...0.2 & puts & puts & puts & puts \\ \hline
\end{tabular}\end{center}
\end{figure}

To get the original $printf$ call with GCC-4.0.2 you must provide $-fno-builtin$ or $-fno-builtin-printf$ as command line argument.

> gcc-4.0.2 -fno-builtin -O0 example1.c
> nm --undefined-only a.out
w __gmon_start__
w _Jv_RegisterClasses
U __libc_start_main@@GLIBC_2.0
U printf@@GLIBC_2.0
> gcc-4.0.2 -fno-builtin-printf -O0 example1.c
> nm --undefined-only a.out
w __gmon_start__
w _Jv_RegisterClasses
U __libc_start_main@@GLIBC_2.0
U printf@@GLIBC_2.0

Some simple measurement for the hello world example.
Figure 13: loop_printf.c
\begin{figure}\begin{listinginput}{1}{loop_printf.c}
\end{listinginput}
\end{figure}
> gcc-3.3.6 -Wall -O0 loop_printf.c
> time ./a.out > /dev/null
real 0m2.197s
user 0m2.186s
sys 0m0.010s
> gcc-3.3.6 -Wall -O1 loop_printf.c
> time ./a.out > /dev/null
real 0m1.007s
user 0m0.997s
sys 0m0.009s
In figure  we see that the $puts$ loop consumes roughly half the time of the $printf$ loop.

Figure 14: Times for loop_printf.c
\begin{figure}\begin{center}
\begin{tabular}{\vert c\vert c\vert c\vert c\vert}\...
...s & sys 0m0.011s & sys 0m0.010s \\ \hline
\end{tabular}\end{center}
\end{figure}

Document the remaining optimizations mentioned in $gcc/builtins.c$. Explore advanced optimization like suggested in [].


%20Output%20Conversions.


4645 /* Expand a call to printf or printf_unlocked with argument list ARGLIST.
4646 Return 0 if a normal call should be emitted rather than transforming
4647 the function inline. If convenient, the result should be placed in
4648 TARGET with mode MODE. UNLOCKED indicates this is a printf_unlocked
4649 call. */
4650 static rtx
4651 expand_builtin_printf (tree arglist, rtx target, enum machine_mode mode,
4652 bool unlocked)
4653 {
4654 tree fn_putchar = unlocked
4655 ? implicit_built_in_decls[BUILT_IN_PUTCHAR_UNLOCKED]
4656 : implicit_built_in_decls[BUILT_IN_PUTCHAR];
4657 tree fn_puts = unlocked ? implicit_built_in_decls[BUILT_IN_PUTS_UNLOCKED]
4658 : implicit_built_in_decls[BUILT_IN_PUTS];
4659 const char *fmt_str;
4660 tree fn, fmt, arg;
4661
4662 /* If the return value is used, don't do the transformation. */
4663 if (target != const0_rtx)
4664 return 0;
4665
4666 /* Verify the required arguments in the original call. */
4667 if (! arglist)
4668 return 0;
4669 fmt = TREE_VALUE (arglist);
4670 if (TREE_CODE (TREE_TYPE (fmt)) != POINTER_TYPE)
4671 return 0;
4672 arglist = TREE_CHAIN (arglist);
4673
4674 /* Check whether the format is a literal string constant. */
4675 fmt_str = c_getstr (fmt);
4676 if (fmt_str == NULL)
4677 return 0;
4678
4679 /* If the format specifier was "%s\n", call __builtin_puts(arg). */
4680 if (strcmp (fmt_str, "%s\n") == 0)
4681 {
4682 if (! arglist
4683 || TREE_CODE (TREE_TYPE (TREE_VALUE (arglist))) != POINTER_TYPE
4684 || TREE_CHAIN (arglist))
4685 return 0;
4686 fn = fn_puts;
4687 }
4688 /* If the format specifier was "%c", call __builtin_putchar(arg). */
4689 else if (strcmp (fmt_str, "%c") == 0)
4690 {
4691 if (! arglist
4692 || TREE_CODE (TREE_TYPE (TREE_VALUE (arglist))) != INTEGER_TYPE
4693 || TREE_CHAIN (arglist))
4694 return 0;
4695 fn = fn_putchar;
4696 }
4697 else
4698 {
4699 /* We can't handle anything else with % args or %% ... yet. */
4700 if (strchr (fmt_str, '%'))
4701 return 0;
4702
4703 if (arglist)
4704 return 0;
4705
4706 /* If the format specifier was "", printf does nothing. */
4707 if (fmt_str[0] == '\0')
4708 return const0_rtx;
4709 /* If the format specifier has length of 1, call putchar. */
4710 if (fmt_str[1] == '\0')
4711 {
4712 /* Given printf("c"), (where c is any one character,)
4713 convert "c"[0] to an int and pass that to the replacement
4714 function. */
4715 arg = build_int_2 (fmt_str[0], 0);
4716 arglist = build_tree_list (NULL_TREE, arg);
4717 fn = fn_putchar;
4718 }
4719 else
4720 {
4721 /* If the format specifier was "string\n", call puts("string"). */
4722 size_t len = strlen (fmt_str);
4723 if (fmt_str[len - 1] == '\n')
4724 {
4725 /* Create a NUL-terminated string that's one char shorter
4726 than the original, stripping off the trailing '\n'. */
4727 char *newstr = (char *) alloca (len);
4728 memcpy (newstr, fmt_str, len - 1);
4729 newstr[len - 1] = 0;
4730
4731 arg = build_string_literal (len, newstr);
4732 arglist = build_tree_list (NULL_TREE, arg);
4733 fn = fn_puts;
4734 }
4735 else
4736 /* We'd like to arrange to call fputs(string,stdout) here,
4737 but we need stdout and don't have a way to get it yet. */
4738 return 0;
4739 }
4740 }
4741
4742 if (!fn)
4743 return 0;
4744 return expand_expr (build_function_call_expr (fn, arglist),
4745 target, mode, EXPAND_NORMAL);
4746 }



Footnotes

nm - list symbols from object files
On AMD Athlon(TM) XP 1800+ system


Peter Seiderer 2005-11-30
阅读(1932) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~