Category: LINUX

2017-01-19 09:59:34

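While studying a makefile used to build a driver, I noticed an optimization option, -Os, on the GCC command line.
I went through the GCC documentation and found -O0, -O1, -O2 and -O3, but no -Os.
A Google search finally turned up an article that explains what -Os does:

It turns out that -Os is roughly "-O2.5": it enables every -O2 optimization except those that increase code size.
The detailed explanation is as follows: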
Level 2.5 (-Os)

The special optimization level (-Os or size) enables all -O2 optimizations that do not increase code size; it puts the emphasis on size over speed. This includes all second-level optimizations, except for the alignment optimizations. The alignment optimizations skip space to align functions, loops, jumps and labels to an address that is a multiple of a power of two, in an architecture-dependent manner. Skipping to these boundaries can increase performance as well as the size of the resulting code and data spaces; therefore, these particular optimizations are disabled. The size optimization level is enabled as:

gcc -Os -o test test.c

In gcc 3.2.2, reorder-blocks is enabled at -Os, but in gcc 3.3.2 reorder-blocks is disabled.
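One way to confirm which mode a file was actually built in is to look at GCC's predefined macros `__OPTIMIZE__` and `__OPTIMIZE_SIZE__`. The following is a minimal sketch; the helper name `optimization_mode` is made up for this illustration and is not part of GCC.

```c
/* Returns a label describing how this translation unit was compiled.
 * GCC defines __OPTIMIZE__ whenever optimization is enabled and
 * additionally defines __OPTIMIZE_SIZE__ when optimizing for size (-Os).
 * optimization_mode() is a hypothetical helper name used for this sketch. */
const char *optimization_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "size";   /* built with -Os */
#elif defined(__OPTIMIZE__)
    return "speed";  /* built with some other optimizing -O level */
#else
    return "none";   /* built without optimization */
#endif
}
```

Compiling the same file once with `gcc -O2 -c` and once with `gcc -Os -c`, then running `size` on the two object files, is a simple way to compare the resulting code sizes.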

==============================
Addendum: I later found a description of -Os in the official GCC documentation after all.

The text is as follows:



3.10 Options That Control Optimization


These options control various sorts of optimizations.

Without any optimization option, the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results. Statements are independent: if you stop the program with a breakpoint between statements, you can then assign a new value to any variable or change the program counter to any other statement in the function and get exactly the results you would expect from the source code.

Turning on optimization flags makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program.

The compiler performs optimization based on the knowledge it has of the program. Using the -funit-at-a-time flag will allow the compiler to consider information gained from later functions in the file when compiling a function. Compiling multiple files at once to a single output file (and using -funit-at-a-time) will allow the compiler to use information gained from all of the files when compiling each of them.

Not all optimizations are controlled directly by a flag. Only optimizations that have a flag are listed.

-O -O1 Optimize. Optimizing compilation takes somewhat more time, and a lot more memory for a large function.

With -O, the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time.

-O turns on the following optimization flags:

          -fdefer-pop 
          -fmerge-constants 
          -fthread-jumps 
          -floop-optimize 
          -fif-conversion 
          -fif-conversion2 
          -fdelayed-branch 
          -fguess-branch-probability 
          -fcprop-registers
     

-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging.

-O2 Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. The compiler does not perform loop unrolling or function inlining when you specify -O2. As compared to -O, this option increases both compilation time and the performance of the generated code.

-O2 turns on all optimization flags specified by -O. It also turns on the following optimization flags:

          -fforce-mem 
          -foptimize-sibling-calls 
          -fstrength-reduce 
          -fcse-follow-jumps  -fcse-skip-blocks 
          -frerun-cse-after-loop  -frerun-loop-opt 
          -fgcse  -fgcse-lm  -fgcse-sm  -fgcse-las 
          -fdelete-null-pointer-checks 
          -fexpensive-optimizations 
          -fregmove 
          -fschedule-insns  -fschedule-insns2 
          -fsched-interblock  -fsched-spec 
          -fcaller-saves 
          -fpeephole2 
          -freorder-blocks  -freorder-functions 
          -fstrict-aliasing 
          -funit-at-a-time 
          -falign-functions  -falign-jumps 
          -falign-loops  -falign-labels 
          -fcrossjumping
     

Please note the warning under -fgcse about invoking -O2 on programs that use computed gotos.

-O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -fweb, -frename-registers and -funswitch-loops options.
-O0 Do not optimize. This is the default.
-Os Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size.

-Os disables the following optimization flags:

          -falign-functions  -falign-jumps  -falign-loops 
          -falign-labels  -freorder-blocks  -fprefetch-loop-arrays
     

If you use multiple -O options, with or without level numbers, the last such option is the one that is effective.

Options of the form -fflag specify machine-independent flags. Most flags have both positive and negative forms; the negative form of -ffoo would be -fno-foo. In the table below, only one of the forms is listed: the one you typically will use. You can figure out the other form by either removing `no-' or adding it.
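The naming rule for positive and negative forms is purely textual, so it can be sketched as a small string helper. `negate_fflag` is a hypothetical function written for this illustration, not a GCC interface, and it does not check whether a given flag really has both forms.

```c
#include <stdio.h>
#include <string.h>

/* Writes the opposite form of a -f option into out (capacity n bytes).
 * "-ffoo" becomes "-fno-foo"; "-fno-foo" becomes "-ffoo".
 * Returns 0 on success, -1 if flag is not a -f option or out is too small.
 * negate_fflag() is a hypothetical helper, not part of GCC. */
int negate_fflag(const char *flag, char *out, size_t n)
{
    int written;

    if (strncmp(flag, "-f", 2) != 0 || flag[2] == '\0')
        return -1;
    if (strncmp(flag, "-fno-", 5) == 0)
        written = snprintf(out, n, "-f%s", flag + 5);     /* remove "no-" */
    else
        written = snprintf(out, n, "-fno-%s", flag + 2);  /* add "no-" */
    return (written < 0 || (size_t)written >= n) ? -1 : 0;
}
```

For example, passing "-fgcse" yields "-fno-gcse", and passing "-fno-inline" yields "-finline".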

The following options control specific optimizations. They are either activated by -O options or are related to ones that are. You can use the following flags in the rare cases when “fine-tuning” of optimizations to be performed is desired.

-fno-default-inline Do not make member functions inline by default merely because they are defined inside the class scope (C++ only). Otherwise, when you specify -O, member functions defined inside class scope are compiled inline by default; i.e., you don't need to add `inline' in front of the member function name.
-fno-defer-pop Always pop the arguments to each function call as soon as that function returns. For machines which must pop arguments after a function call, the compiler normally lets arguments accumulate on the stack for several function calls and pops them all at once.

Disabled at levels -O, -O2, -O3, -Os. 

-fforce-mem Force memory operands to be copied into registers before doing arithmetic on them. This produces better code by making all memory references potential common subexpressions. When they are not common subexpressions, instruction combination should eliminate the separate register-load.

Enabled at levels -O2, -O3, -Os. 

-fforce-addr Force memory address constants to be copied into registers before doing arithmetic on them. This may produce better code just as -fforce-mem may.
-fomit-frame-pointer Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines.

On some machines, such as the VAX, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn't exist. The machine-description macro FRAME_POINTER_REQUIRED controls whether a target machine supports this flag.

Enabled at levels -O, -O2, -O3, -Os. 

-foptimize-sibling-calls Optimize sibling and tail recursive calls.

Enabled at levels -O2, -O3, -Os. 
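As an illustration of the kind of code this option targets, here is a small tail-recursive function. The function names are invented for this sketch; whether the compiler actually turns the call into a jump depends on the target and optimization level.

```c
/* Tail-recursive sum of 1..n. The recursive call is the last action of
 * the function, so a compiler performing sibling/tail call optimization
 * may replace it with a jump that reuses the current stack frame. */
static unsigned long sum_to_acc(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    return sum_to_acc(n - 1, acc + n);  /* call in tail position */
}

/* Public entry point: starts the accumulator at zero. */
unsigned long sum_to(unsigned long n)
{
    return sum_to_acc(n, 0);
}
```

Without the optimization each call consumes a stack frame, so very large n could exhaust the stack in an unoptimized build.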

-fno-inline Don't pay attention to the inline keyword. Normally this option is used to keep the compiler from expanding any functions inline. Note that if you are not optimizing, no functions can be expanded inline.
-finline-functions Integrate all simple functions into their callers. The compiler heuristically decides which functions are simple enough to be worth integrating in this way.

If all calls to a given function are integrated, and the function is declared static, then the function is normally not output as assembler code in its own right.

Enabled at level -O3. 
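A small sketch of the situation described above: a static helper whose only call site is a candidate for integration. The names `square` and `sum_of_squares` are made up for this example.

```c
/* A small static helper: a typical candidate for -finline-functions. */
static int square(int x)
{
    return x * x;
}

/* The only caller of square(). If the compiler integrates the call here,
 * it normally has no reason to emit a separate copy of square(). */
int sum_of_squares(const int *v, int n)
{
    int i, total = 0;

    for (i = 0; i < n; i++)
        total += square(v[i]);
    return total;
}
```

Listing the symbols of the resulting object file with `nm` at different optimization levels is one way to see whether a standalone `square` was emitted.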

-finline-limit=n By default, GCC limits the size of functions that can be inlined. This flag allows the control of this limit for functions that are explicitly marked as inline (i.e., marked with the inline keyword or defined within the class definition in c++). n is the size of functions that can be inlined in number of pseudo instructions (not counting parameter handling). The default value of n is 600. Increasing this value can result in more inlined code at the cost of compilation time and memory consumption. Decreasing usually makes the compilation faster and less code will be inlined (which presumably means slower programs). This option is particularly useful for programs that use inlining heavily such as those based on recursive templates with C++.

Inlining is actually controlled by a number of parameters, which may be specified individually by using --param name=value. The -finline-limit=n option sets some of these parameters as follows:

max-inline-insns-single is set to n/2. 
max-inline-insns-auto is set to n/2. 
min-inline-insns is set to 130 or n/4, whichever is smaller. 
max-inline-insns-rtl is set to n.

See below for a documentation of the individual parameters controlling inlining.

Note: pseudo instruction represents, in this particular context, an abstract measurement of function's size. In no way, it represents a count of assembly instructions and as such its exact meaning might change from one release to an another.
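The mapping from -finline-limit=n to the four parameters listed above is simple arithmetic. The sketch below just restates that list in code for the GCC version quoted here; the struct and function names are invented for the illustration and do not exist in GCC.

```c
/* Parameter values derived from -finline-limit=n, following the list in
 * the quoted documentation. Names are hypothetical, for illustration only. */
struct inline_params {
    int max_inline_insns_single;
    int max_inline_insns_auto;
    int min_inline_insns;
    int max_inline_insns_rtl;
};

struct inline_params params_for_inline_limit(int n)
{
    struct inline_params p;

    p.max_inline_insns_single = n / 2;
    p.max_inline_insns_auto = n / 2;
    p.min_inline_insns = (n / 4 < 130) ? n / 4 : 130;  /* smaller of 130 and n/4 */
    p.max_inline_insns_rtl = n;
    return p;
}
```

With the default n = 600 this gives 300, 300, 130 and 600 respectively.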

-fkeep-inline-functions Even if all calls to a given function are integrated, and the function is declared static, nevertheless output a separate run-time callable version of the function. This switch does not affect extern inline functions.
-fkeep-static-consts Emit variables declared static const when optimization isn't turned on, even if the variables aren't referenced.

GCC enables this option by default. If you want to force the compiler to check if the variable was referenced, regardless of whether or not optimization is turned on, use the -fno-keep-static-consts option.

-fmerge-constants Attempt to merge identical constants (string constants and floating point constants) across compilation units.

This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.

Enabled at levels -O, -O2, -O3, -Os. 

-fmerge-all-constants Attempt to merge identical constants and identical variables.

This option implies -fmerge-constants. In addition to -fmerge-constants this considers e.g. even constant initialized arrays or initialized constant variables with integral or floating point types. Languages like C or C++ require each non-automatic variable to have distinct location, so using this option will result in non-conforming behavior.

-fnew-ra Use a graph coloring register allocator. Currently this option is meant only for testing. Users should not specify this option, since it is not yet ready for production use.
-fno-branch-count-reg Do not use “decrement and branch” instructions on a count register, but instead generate a sequence of instructions that decrement a register, compare it against zero, then branch based upon the result. This option is only meaningful on architectures that support such instructions, which include x86, PowerPC, IA-64 and S/390.

The default is -fbranch-count-reg, enabled when -fstrength-reduce is enabled.

-fno-function-cse Do not put function addresses in registers; make each instruction that calls a constant function contain the function's address explicitly.

This option results in less efficient code, but some strange hacks that alter the assembler output may be confused by the optimizations performed when this option is not used.

The default is -ffunction-cse 

-fno-zero-initialized-in-bss If the target supports a BSS section, GCC by default puts variables that are initialized to zero into BSS. This can save space in the resulting code.

This option turns off this behavior because some programs explicitly rely on variables going to the data section. E.g., so that the resulting executable can find the beginning of that section and/or make assumptions based on that.

The default is -fzero-initialized-in-bss. 

-fstrength-reduce Perform the optimizations of loop strength reduction and elimination of iteration variables.

Enabled at levels -O2, -O3, -Os. 
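To make "loop strength reduction" concrete, the sketch below shows a loop as one might write it and a hand-transformed version in which the per-iteration multiplication is replaced by an addition. This only illustrates the idea; it is not a claim about the exact code GCC generates, and the function names are invented.

```c
/* Straightforward form: one multiplication per iteration. */
void fill_multiples(int *out, int n, int stride)
{
    int i;

    for (i = 0; i < n; i++)
        out[i] = i * stride;
}

/* Hand strength-reduced form: the multiplication i * stride is replaced
 * by a running value that grows by stride on each iteration. */
void fill_multiples_reduced(int *out, int n, int stride)
{
    int i, value = 0;

    for (i = 0; i < n; i++) {
        out[i] = value;
        value += stride;
    }
}
```

Both functions fill the array with the same values; the second simply trades a multiply for an add inside the loop.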

-fthread-jumps Perform optimizations where we check to see if a jump branches to a location where another comparison subsumed by the first is found. If so, the first branch is redirected to either the destination of the second branch or a point immediately following it, depending on whether the condition is known to be true or false.

Enabled at levels -O, -O2, -O3, -Os. 

-fcse-follow-jumps In common subexpression elimination, scan through jump instructions when the target of the jump is not reached by any other path. For example, when CSE encounters an if statement with an else clause, CSE will follow the jump when the condition tested is false.

Enabled at levels -O2, -O3, -Os. 

-fcse-skip-blocks This is similar to -fcse-follow-jumps, but causes CSE to follow jumps which conditionally skip over blocks. When CSE encounters a simple if statement with no else clause, -fcse-skip-blocks causes CSE to follow the jump around the body of the if.

Enabled at levels -O2, -O3, -Os. 

-frerun-cse-after-loop Re-run common subexpression elimination after loop optimizations has been performed.

Enabled at levels -O2, -O3, -Os. 

-frerun-loop-opt Run the loop optimizer twice.

Enabled at levels -O2, -O3, -Os. 

-fgcse Perform a global common subexpression elimination pass. This pass also performs global constant and copy propagation.

Note: When compiling a program using computed gotos, a GCC extension, you may get better runtime performance if you disable the global common subexpression elimination pass by adding -fno-gcse to the command line.

Enabled at levels -O2, -O3, -Os. 
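For readers who have not met computed gotos, here is a minimal sketch of the GCC "labels as values" extension in the form it is typically used: a tiny interpreter dispatch loop. The opcodes and function names are invented for this example, and the code requires GCC or a compatible compiler.

```c
/* Hypothetical three-instruction bytecode, for illustration only. */
enum { OP_INC, OP_DEC, OP_HALT };

/* Runs a bytecode program using computed gotos: each handler jumps
 * directly to the handler of the next opcode via a table of label
 * addresses (&&label and goto *ptr are GCC extensions). */
int run(const unsigned char *code)
{
    static void *dispatch[] = { &&do_inc, &&do_dec, &&do_halt };
    int acc = 0;

    goto *dispatch[*code++];
do_inc:
    acc++;
    goto *dispatch[*code++];
do_dec:
    acc--;
    goto *dispatch[*code++];
do_halt:
    return acc;
}

/* Runs a fixed sample program: increment twice, decrement once, halt. */
int run_demo(void)
{
    static const unsigned char program[] = { OP_INC, OP_INC, OP_DEC, OP_HALT };

    return run(program);
}
```

Code shaped like this is the case the note above refers to; whether -fno-gcse actually helps would have to be measured for the program in question.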

-fgcse-lm When -fgcse-lm is enabled, global common subexpression elimination will attempt to move loads which are only killed by stores into themselves. This allows a loop containing a load/store sequence to be changed to a load outside the loop, and a copy/store within the loop.

Enabled by default when gcse is enabled. 

-fgcse-sm When -fgcse-sm is enabled, a store motion pass is run after global common subexpression elimination. This pass will attempt to move stores out of loops. When used in conjunction with -fgcse-lm, loops containing a load/store sequence can be changed to a load before the loop and a store after the loop.

Enabled by default when gcse is enabled. 
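The combined effect of load motion and store motion can be pictured with a hand-written before/after pair. This is only a sketch of the idea, with invented function names; the two versions are equivalent only under the assumption that `total` does not point into the array `v`.

```c
/* As written: *total is loaded and stored on every iteration. */
void accumulate(int *total, const int *v, int n)
{
    int i;

    for (i = 0; i < n; i++)
        *total += v[i];
}

/* Roughly the shape load motion plus store motion aim for:
 * one load before the loop and one store after it. */
void accumulate_hoisted(int *total, const int *v, int n)
{
    int i, t = *total;   /* load before the loop */

    for (i = 0; i < n; i++)
        t += v[i];
    *total = t;          /* store after the loop */
}
```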

-fgcse-las When -fgcse-las is enabled, the global common subexpression elimination pass eliminates redundant loads that come after stores to the same memory location (both partial and full redundancies).

Enabled by default when gcse is enabled. 

-floop-optimize Perform loop optimizations: move constant expressions out of loops, simplify exit test conditions and optionally do strength-reduction and loop unrolling as well.

Enabled at levels -O, -O2, -O3, -Os. 

-fcrossjumping Perform cross-jumping transformation. This transformation unifies equivalent code and saves code size. The resulting code may or may not perform better than without cross-jumping.

Enabled at levels -O, -O2, -O3, -Os. 

-fif-conversion Attempt to transform conditional jumps into branch-less equivalents. This includes use of conditional moves, min, max, set flags and abs instructions, and some tricks doable by standard arithmetics. The use of conditional execution on chips where it is available is controlled by if-conversion2.

Enabled at levels -O, -O2, -O3, -Os. 
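As an example of the "tricks doable by standard arithmetics" mentioned above, here is a maximum function written with a branch and an equivalent branch-less version built from a mask. This is a hand-written sketch with invented names, assuming two's complement integers; it is not meant to show what GCC emits for any particular target.

```c
/* Version with a conditional jump. */
int max_branch(int a, int b)
{
    if (a < b)
        return b;
    return a;
}

/* Branch-less version: mask is all ones when a < b and zero otherwise,
 * so the expression selects b or a without a jump. */
int max_branchless(int a, int b)
{
    int mask = -(a < b);

    return a ^ ((a ^ b) & mask);
}
```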

-fif-conversion2 Use conditional execution (where available) to transform conditional jumps into branch-less equivalents.

Enabled at levels -O, -O2, -O3, -Os. 

-fdelete-null-pointer-checks Use global dataflow analysis to identify and eliminate useless checks for null pointers. The compiler assumes that dereferencing a null pointer would have halted the program. If a pointer is checked after it has already been dereferenced, it cannot be null.

In some environments, this assumption is not true, and programs can safely dereference null pointers. Use -fno-delete-null-pointer-checks to disable this optimization for programs which depend on that behavior.

Enabled at levels -O2, -O3, -Os. 
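The pattern this option acts on looks like the first function below, where the pointer is dereferenced before it is checked. The second function shows the usual ordering that keeps the check meaningful. Both function names are invented for this sketch.

```c
#include <stddef.h>

/* Check after dereference: once *p has been read, the compiler may
 * assume p is not null and remove the test below as useless. */
int value_check_late(const int *p)
{
    int value = *p;

    if (p == NULL)
        return -1;
    return value;
}

/* Check before dereference: the test cannot be removed on these grounds. */
int value_check_first(const int *p)
{
    if (p == NULL)
        return -1;
    return *p;
}
```

Only `value_check_first` is safe to call with a null pointer; calling `value_check_late` with NULL is undefined behavior in standard C regardless of this option.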

-fexpensive-optimizations Perform a number of minor optimizations that are relatively expensive.

Enabled at levels -O2, -O3, -Os. 

-foptimize-register-move -fregmove Attempt to reassign register numbers in move instructions and as operands of other simple instructions in order to maximize the amount of register tying. This is especially helpful on machines with two-operand instructions.

Note -fregmove and -foptimize-register-move are the same optimization.

Enabled at levels -O2, -O3, -Os. 

-fdelayed-branch If supported for the target machine, attempt to reorder instructions to exploit instruction slots available after delayed branch instructions.

Enabled at levels -O, -O2, -O3, -Os. 

-fschedule-insns If supported for the target machine, attempt to reorder instructions to eliminate execution stalls due to required data being unavailable. This helps machines that have slow floating point or memory load instructions by allowing other instructions to be issued until the result of the load or floating point instruction is required.

Enabled at levels -O2, -O3, -Os. 

-fschedule-insns2 Similar to -fschedule-insns, but requests an additional pass of instruction scheduling after register allocation has been done. This is especially useful on machines with a relatively small number of registers and where memory load instructions take more than one cycle.

Enabled at levels -O2, -O3, -Os. 

-fno-sched-interblock Don't schedule instructions across basic blocks. This is normally enabled by default when scheduling before register allocation, i.e. with -fschedule-insns or at -O2 or higher.
-fno-sched-spec Don't allow speculative motion of non-load instructions. This is normally enabled by default when scheduling before register allocation, i.e. with -fschedule-insns or at -O2 or higher.
-fsched-spec-load Allow speculative motion of some load instructions. This only makes sense when scheduling before register allocation, i.e. with -fschedule-insns or at -O2 or higher.
-fsched-spec-load-dangerous Allow speculative motion of more load instructions. This only makes sense when scheduling before register allocation, i.e. with -fschedule-insns or at -O2 or higher.
-fsched-stalled-insns=n Define how many insns (if any) can be moved prematurely from the queue of stalled insns into the ready list, during the second scheduling pass.
-fsched-stalled-insns-dep=n Define how many insn groups (cycles) will be examined for a dependency on a stalled insn that is candidate for premature removal from the queue of stalled insns. Has an effect only during the second scheduling pass, and only if -fsched-stalled-insns is used and its value is not zero.
-fsched2-use-superblocks When scheduling after register allocation, do use superblock scheduling algorithm. Superblock scheduling allows motion across basic block boundaries resulting in faster schedules. This option is experimental, as not all machine descriptions used by GCC model the CPU closely enough to avoid unreliable results from the algorithm.

This only makes sense when scheduling after register allocation, i.e. with -fschedule-insns2 or at -O2 or higher.

-fsched2-use-traces Use -fsched2-use-superblocks algorithm when scheduling after register allocation and additionally perform code duplication in order to increase the size of superblocks using tracer pass. See -ftracer for details on trace formation.

This mode should produce faster but significantly longer programs. Also without -fbranch-probabilities the traces constructed may not match the reality and hurt the performance. This only makes sense when scheduling after register allocation, i.e. with -fschedule-insns2 or at -O2 or higher.

-fcaller-saves Enable values to be allocated in registers that will be clobbered by function calls, by emitting extra instructions to save and restore the registers around such calls. Such allocation is done only when it seems to result in better code than would otherwise be produced.

This option is always enabled by default on certain machines, usually those which have no call-preserved registers to use instead.

Enabled at levels -O2, -O3, -Os. 

-fmove-all-movables Forces all invariant computations in loops to be moved outside the loop.
-freduce-all-givs Forces all general-induction variables in loops to be strength-reduced.

Note: When compiling programs written in Fortran, -fmove-all-movables and -freduce-all-givs are enabled by default when you use the optimizer.

These options may generate better or worse code; results are highly dependent on the structure of loops within the source code.

These two options are intended to be removed someday, once they have helped determine the efficacy of various approaches to improving loop optimizations.

Please contact , and describe how use of these options affects the performance of your production code. Examples of code that runs slower when these options are enabled are very valuable.

-fno-peephole -fno-peephole2 Disable any machine-specific peephole optimizations. The difference between -fno-peephole and -fno-peephole2 is in how they are implemented in the compiler; some targets use one, some use the other, a few use both.

-fpeephole is enabled by default. -fpeephole2 is enabled at levels -O2, -O3, -Os.

-fno-guess-branch-probability Do not guess branch probabilities using a randomized model.

Sometimes GCC will opt to use a randomized model to guess branch probabilities, when none are available from either profiling feedback (-fprofile-arcs) or `__builtin_expect'. This means that different runs of the compiler on the same program may produce different object code.

In a hard real-time system, people don't want different runs of the compiler to produce code that has different behavior; minimizing non-determinism is of paramount import. This switch allows users to reduce non-determinism, possibly at the expense of inferior optimization.

The default is -fguess-branch-probability at levels -O, -O2, -O3, -Os.
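When the programmer already knows which way a branch usually goes, `__builtin_expect` supplies that information so the compiler does not have to guess. A minimal sketch follows; `likely`, `unlikely` and `checked_div` are conventional or invented names for this example, and only `__builtin_expect` itself is the GCC built-in.

```c
/* likely()/unlikely() are conventional wrapper names, not GCC built-ins.
 * On compilers without __builtin_expect they fall back to the plain
 * condition. */
#if defined(__GNUC__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Divides num by den, storing the quotient in *out.
 * Returns 0 on success and -1 for a zero divisor; the error path is
 * hinted as rare. */
int checked_div(int num, int den, int *out)
{
    if (unlikely(den == 0))
        return -1;
    *out = num / den;
    return 0;
}
```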

-freorder-blocks Reorder basic blocks in the compiled function in order to reduce number of taken branches and improve code locality.

Enabled at levels -O2, -O3. 

-freorder-functions Reorder functions in the object file in order to improve code locality. This is implemented by using special subsections .text.hot for most frequently executed functions and .text.unlikely for unlikely executed functions. Reordering is done by the linker so object file format must support named sections and linker must place them in a reasonable way.

Also profile feedback must be available to make this option effective. See -fprofile-arcs for details.

Enabled at levels -O2, -O3, -Os. 

-fstrict-aliasing Allows the compiler to assume the strictest aliasing rules applicable to the language being compiled. For C (and C++), this activates optimizations based on the type of expressions. In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same. For example, an unsigned int can alias an int, but not a void* or a double. A character type may alias any other type.

Pay special attention to code like this:

          union a_union {
            int i;
            double d;
          };
          
          int f() {
            union a_union t;
            t.d = 3.0;
            return t.i;
          }
     

The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above will work as expected. However, this code might not:

          int f() {
            union a_union t;
            int* ip;
            t.d = 3.0;
            ip = &t.i;
            return *ip;
          }
     

Every language that wishes to perform language-specific alias analysis should define a function that computes, given an tree node, an alias set for the node. Nodes in different alias sets are not allowed to alias. For an example, see the C front-end function c_get_alias_set.

Enabled at levels -O2, -O3, -Os. 
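Besides going through a union, another widely used way to reinterpret the bytes of an object without running into the aliasing rules is to copy them with `memcpy`. This technique is not part of the quoted documentation; the sketch below uses an invented function name and assumes a 32-bit IEEE 754 `float`.

```c
#include <stdint.h>
#include <string.h>

/* Returns the bit pattern of a float as a 32-bit integer.
 * Copying the bytes with memcpy avoids accessing the float through a
 * pointer of an incompatible type. Assumes sizeof(float) == 4. */
uint32_t float_bits(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On targets where `float` is an IEEE 754 single, `float_bits(1.0f)` yields 0x3f800000.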

-falign-functions -falign-functions=n Align the start of functions to the next power-of-two greater than n, skipping up to n bytes. For instance, -falign-functions=32 aligns functions to the next 32-byte boundary, but -falign-functions=24 would align to the next 32-byte boundary only if this can be done by skipping 23 bytes or less.

-fno-align-functions and -falign-functions=1 are equivalent and mean that functions will not be aligned.

Some assemblers only support this flag when n is a power of two; in that case, it is rounded up.

If n is not specified or is zero, use a machine-dependent default.

Enabled at levels -O2, -O3. 
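The rule in the -falign-functions=32 versus -falign-functions=24 example can be modelled with a few lines of arithmetic. This is only a sketch of the documented behavior as read from the text above (boundary is the smallest power of two not less than n, and at most n - 1 bytes may be skipped); the function names are invented, it does not model the machine-dependent default for n = 0, and it is not GCC's implementation.

```c
/* Smallest power of two that is >= n. */
static unsigned long pow2_at_least(unsigned long n)
{
    unsigned long p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Start address a function currently at addr would get under
 * -falign-functions=n: move up to the next power-of-two boundary,
 * but only if that skips at most n - 1 bytes; otherwise stay put. */
unsigned long aligned_start(unsigned long addr, unsigned long n)
{
    unsigned long boundary = pow2_at_least(n);
    unsigned long skip = (boundary - addr % boundary) % boundary;

    return (skip <= n - 1) ? addr + skip : addr;
}
```

With this model, an address of 40 moves to 64 for n = 32, stays at 40 for n = 24 (24 bytes would have to be skipped), and an address of 42 moves to 64 for n = 24 (only 22 bytes are skipped).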

-falign-labels -falign-labels=n Align all branch targets to a power-of-two boundary, skipping up to n bytes like -falign-functions. This option can easily make code slower, because it must insert dummy operations for when the branch target is reached in the usual flow of the code.

-fno-align-labels and -falign-labels=1 are equivalent and mean that labels will not be aligned.

If -falign-loops or -falign-jumps are applicable and are greater than this value, then their values are used instead.

If n is not specified or is zero, use a machine-dependent default which is very likely to be `1', meaning no alignment.

Enabled at levels -O2, -O3. 

-falign-loops -falign-loops=n Align loops to a power-of-two boundary, skipping up to n bytes like -falign-functions. The hope is that the loop will be executed many times, which will make up for any execution of the dummy operations.

-fno-align-loops and -falign-loops=1 are equivalent and mean that loops will not be aligned.

If n is not specified or is zero, use a machine-dependent default.

Enabled at levels -O2, -O3. 

-falign-jumps -falign-jumps=n Align branch targets to a power-of-two boundary, for branch targets where the targets can only be reached by jumping, skipping up to n bytes like -falign-functions. In this case, no dummy operations need be executed.

-fno-align-jumps and -falign-jumps=1 are equivalent and mean that jump targets will not be aligned.

If n is not specified or is zero, use a machine-dependent default.

Enabled at levels -O2, -O3. 

-frename-registers Attempt to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization will most benefit processors with lots of registers. It can, however, make debugging impossible, since variables will no longer stay in a “home register”.
-fweb Constructs webs as commonly used for register allocation purposes and assign each web individual pseudo register. This allows the register allocation pass to operate on pseudos directly, but also strengthens several other optimization passes, such as CSE, loop optimizer and trivial dead code remover. It can, however, make debugging impossible, since variables will no longer stay in a “home register”.

Enabled at levels -O3. 

-fno-cprop-registers After register allocation and post-register allocation instruction splitting, we perform a copy-propagation pass to try to reduce scheduling dependencies and occasionally eliminate the copy.

Disabled at levels -O, -O2, -O3, -Os. 

-fprofile-generate Enable options usually used for instrumenting application to produce profile useful for later recompilation with profile feedback based optimization. You must use -fprofile-generate both when compiling and when linking your program.

The following options are enabled: -fprofile-arcs, -fprofile-values, -fvpt. 

-fprofile-use Enable profile feedback directed optimizations, and optimizations generally profitable only with profile feedback available.

The following options are enabled: -fbranch-probabilities, -fvpt, -funroll-loops, -fpeel-loops, -ftracer.

The following options control compiler behavior regarding floating point arithmetic. These options trade off between speed and correctness. All must be specifically enabled.

-ffloat-store Do not store floating point variables in registers, and inhibit other options that might change whether a floating point value is taken from a register or memory.

This option prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have. Similarly for the x86 architecture. For most programs, the excess precision does only good, but a few programs rely on the precise definition of IEEE floating point. Use -ffloat-store for such programs, after modifying them to store all pertinent intermediate computations into variables.

-ffast-math Sets -fno-math-errno, -funsafe-math-optimizations, -fno-trapping-math, -ffinite-math-only, -fno-rounding-math and -fno-signaling-nans.

This option causes the preprocessor macro __FAST_MATH__ to be defined.

This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.

-fno-math-errno Do not set ERRNO after calling math functions that are executed with a single instruction, e.g., sqrt. A program that relies on IEEE exceptions for math error handling may want to use this flag for speed while maintaining IEEE arithmetic compatibility.

This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.

The default is -fmath-errno. 

-funsafe-math-optimizations Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE or ANSI standards. When used at link-time, it may include libraries or startup files that change the default FPU control word or other similar optimizations.

This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.

The default is -fno-unsafe-math-optimizations. 

-ffinite-math-only Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs.

This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications.

The default is -fno-finite-math-only. 
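A typical piece of code that depends on exact IEEE behavior is the classic self-comparison test for NaN. The sketch below shows it together with a check of the `__FAST_MATH__` macro; all function names are invented for this illustration, and the remark about folding describes what the option permits rather than what a given GCC version will do.

```c
/* Classic NaN test: under IEEE rules NaN is the only value that
 * compares unequal to itself. With -ffinite-math-only the compiler is
 * allowed to assume x is never NaN, so this may be folded to 0. */
int is_nan_by_compare(double x)
{
    return x != x;
}

/* Produces a NaN at run time by dividing zero by zero; volatile keeps
 * the division from being evaluated at compile time. */
double make_nan(void)
{
    volatile double zero = 0.0;

    return zero / zero;
}

/* Reports whether this file was compiled with -ffast-math, using the
 * __FAST_MATH__ macro mentioned in the documentation. */
int built_with_fast_math(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}
```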

-fno-trapping-math Compile code assuming that floating-point operations cannot generate user-visible traps. These traps include division by zero, overflow, underflow, inexact result and invalid operation. This option implies -fno-signaling-nans. Setting this option may allow faster code if one relies on “non-stop” IEEE arithmetic, for example.

This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.

The default is -ftrapping-math. 

-frounding-math Disable transformations and optimizations that assume default floating point rounding behavior. This is round-to-zero for all floating point to integer conversions, and round-to-nearest for all other arithmetic truncations. This option should be specified for programs that change the FP rounding mode dynamically, or that may be executed with a non-default rounding mode. This option disables constant folding of floating point expressions at compile-time (which may be affected by rounding mode) and arithmetic transformations that are unsafe in the presence of sign-dependent rounding modes.

The default is -fno-rounding-math.

This option is experimental and does not currently guarantee todisable all GCC optimizations that are affected by rounding mode. Future versions of GCC may provide finer control of this settingusing C99's FENV_ACCESS pragma. This command line optionwill be used to specify the default state for FENV_ACCESS. 

-fsignaling-nans Compile code assuming that IEEE signaling NaNs may generate user-visible traps during floating-point operations. Setting this option disables optimizations that may change the number of exceptions visible with signaling NaNs. This option implies -ftrapping-math.

This option causes the preprocessor macro __SUPPORT_SNAN__ to be defined.

The default is -fno-signaling-nans.

This option is experimental and does not currently guarantee to disable all GCC optimizations that affect signaling NaN behavior.

-fsingle-precision-constant Treat floating point constants as single precision constants instead of implicitly converting them to double precision constants.
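
To make the effect concrete, the C sketch below compares an unsuffixed constant with an explicitly single precision one. It assumes default flags, where an unsuffixed constant has type double. The function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: unsuffixed constant, double by default. With
 * -fsingle-precision-constant it would be treated like 0.1f instead. */
double tenth_as_written(void)
{
    return 0.1;
}

/* Hypothetical helper: explicitly single precision, widened on return. */
double tenth_single(void)
{
    return 0.1f;
}

/* With default flags the two values differ, because 0.1f keeps fewer
 * significant bits of the decimal value 0.1 than the double constant. */
void demo_constant_precision(void)
{
    assert(tenth_as_written() != tenth_single());
}
```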

The following options control optimizations that may improve performance, but are not enabled by any -O options. This section includes experimental options that may produce broken code.

-fbranch-probabilities After running a program compiled with -fprofile-arcs, you can compile it a second time using -fbranch-probabilities, to improve optimizations based on the number of times each branch was taken. When the program compiled with -fprofile-arcs exits it saves arc execution counts to a file called sourcename.gcda for each source file. The information in this data file is very dependent on the structure of the generated code, so you must use the same source code and the same optimization options for both compilations.

With -fbranch-probabilities, GCC puts a `REG_BR_PROB' note on each `JUMP_INSN' and `CALL_INSN'. These can be used to improve optimization. Currently, they are only used in one place: in reorg.c, instead of guessing which path a branch is most likely to take, the `REG_BR_PROB' values are used to exactly determine which path is taken more often.
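
The two-pass workflow described above can be sketched as follows. The file name, the function names and the workload are hypothetical; the functions are assumed to be called from the program's main. The commands in the comment only use the two options named in the text plus -O2.

```c
/*
 * Hypothetical file profile_demo.c. A possible two-pass build:
 *
 *   gcc -O2 -fprofile-arcs profile_demo.c -o profile_demo
 *   ./profile_demo          (training run; arc counts are saved on exit)
 *   gcc -O2 -fbranch-probabilities profile_demo.c -o profile_demo
 *
 * Both compilations use the same source and the same optimization options.
 */
#include <assert.h>
#include <stddef.h>

/* Hypothetical workload: the branch below is taken for very few inputs. */
size_t count_above(const int *values, size_t n, int threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (values[i] > threshold)   /* rarely true for the training data */
            count++;
    }
    return count;
}

/* Training run: 1000 values in 0..99, of which only the 99s exceed 98. */
size_t demo_workload(void)
{
    int values[1000];
    for (size_t i = 0; i < 1000; i++)
        values[i] = (int)(i % 100);
    size_t hits = count_above(values, 1000, 98);
    assert(hits == 10);
    return hits;
}
```

The recorded counts would show the compiler that the comparison in count_above is almost always false for this workload.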

-fprofile-values If combined with -fprofile-arcs, it adds code so that some data about values of expressions in the program is gathered.

With -fbranch-probabilities, it reads back the data gathered from profiling values of expressions and adds `REG_VALUE_PROFILE' notes to instructions for their later usage in optimizations.

-fvpt If combined with -fprofile-arcs, it instructs the compiler to add code to gather information about values of expressions.

With -fbranch-probabilities, it reads back the data gathered and actually performs the optimizations based on them. Currently the optimizations include specialization of division operations using knowledge about the value of the denominator.
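
The idea of specializing a division for a frequently seen denominator can be written out by hand. The C sketch below only illustrates the kind of transformation the text describes; it is not GCC's actual output, and the assumption that the profile showed a divisor of 16 is invented for the example.

```c
#include <assert.h>

/* Generic version: the divisor is only known at run time (d must be nonzero). */
unsigned divide_generic(unsigned n, unsigned d)
{
    return n / d;
}

/*
 * Hand-specialized version, assuming (hypothetically) that a value profile
 * showed d == 16 on almost every call. The common case becomes a shift;
 * every other divisor falls back to the real division.
 */
unsigned divide_specialized(unsigned n, unsigned d)
{
    if (d == 16u)
        return n >> 4;
    return n / d;
}

/* Both versions agree on the common and the uncommon divisor. */
void demo_division_specialization(void)
{
    assert(divide_specialized(1000u, 16u) == divide_generic(1000u, 16u));
    assert(divide_specialized(1000u, 7u) == divide_generic(1000u, 7u));
}
```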

-fnew-ra Use a graph coloring register allocator. Currently this option is meant for testing, so we are interested to hear about miscompilations with -fnew-ra.
-ftracer Perform tail duplication to enlarge superblock size. This transformation simplifies the control flow of the function allowing other optimizations to do a better job.
-funit-at-a-time Parse the whole compilation unit before starting to produce code. This allows some extra optimizations to take place but consumes more memory.
-funroll-loops Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop. -funroll-loops implies -frerun-cse-after-loop. It also turns on complete loop peeling (i.e. complete removal of loops with small constant number of iterations). This option makes code larger, and may or may not make it run faster.
-funroll-all-loops Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly. -funroll-all-loops implies the same options as -funroll-loops.
-fpeel-loops Peel loops for which there is enough information (from profile feedback) that they do not roll much. It also turns on complete loop peeling (i.e. complete removal of loops with small constant number of iterations).
-funswitch-loops Move branches with loop invariant conditions out of the loop, with duplicates of the loop on both branches (modified according to result of the condition).
-fold-unroll-loops Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop, using the old loop unroller whose loop recognition is based on notes from frontend. -fold-unroll-loops implies both -fstrength-reduce and -frerun-cse-after-loop. This option makes code larger, and may or may not make it run faster.
-fold-unroll-all-loops Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This is done using the old loop unroller whose loop recognition is based on notes from frontend. This usually makes programs run more slowly. -fold-unroll-all-loops implies the same options as -fold-unroll-loops.
-fprefetch-loop-arrays If supported by the target machine, generate instructions to prefetch memory to improve the performance of loops that access large arrays.

Disabled at level -Os. 
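
The loop options listed above are easier to picture when the transformation is written out by hand. The C sketch below shows a loop unrolled by a factor of four (the idea behind -funroll-loops) and a loop with an invariant branch hoisted out (the idea behind -funswitch-loops). It is an illustration only, with hypothetical function names; the code GCC actually generates may look quite different.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline loop: one addition and one loop test per element. */
long sum_plain(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Unrolled by hand, factor 4: fewer loop tests, more code. */
long sum_unrolled4(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        total += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)              /* remainder iterations */
        total += a[i];
    return total;
}

/* Loop with an invariant condition that is tested on every iteration. */
void scale_or_copy(int *dst, const int *src, size_t n, int do_scale)
{
    for (size_t i = 0; i < n; i++) {
        if (do_scale)
            dst[i] = src[i] * 2;
        else
            dst[i] = src[i];
    }
}

/* Unswitched by hand: the test is hoisted and the loop is duplicated. */
void scale_or_copy_unswitched(int *dst, const int *src, size_t n, int do_scale)
{
    if (do_scale) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * 2;
    } else {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }
}

/* The transformed versions compute the same results as the originals. */
void demo_loop_transforms(void)
{
    int src[7] = {1, 2, 3, 4, 5, 6, 7};
    int a[7], b[7];
    assert(sum_plain(src, 7) == sum_unrolled4(src, 7));
    scale_or_copy(a, src, 7, 1);
    scale_or_copy_unswitched(b, src, 7, 1);
    for (size_t i = 0; i < 7; i++)
        assert(a[i] == b[i]);
}
```

Both transformed versions duplicate code, which matches the remark that these options make code larger.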

-ffunction-sections -fdata-sections Place each function or data item into its own section in the output file if the target supports arbitrary sections. The name of the function or the name of the data item determines the section's name in the output file.

Use these options on systems where the linker can perform optimizations to improve locality of reference in the instruction space. Most systems using the ELF object format and SPARC processors running Solaris 2 have linkers with such optimizations. AIX may have these optimizations in the future.

Only use these options when there are significant benefits from doing so. When you specify these options, the assembler and linker will create larger object and executable files and will also be slower. You will not be able to use gprof on all systems if you specify this option and you may have problems with debugging if you specify both this option and -g.
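
A small C file makes the per-item sections easier to inspect. The sketch below is hypothetical: the file name, the function names and the build commands are assumptions, and the --gc-sections linker flag belongs to GNU ld and is not mentioned in the text above.

```c
/*
 * Hypothetical file sections_demo.c. One possible way to look at the result,
 * assuming GNU binutils (main.o stands for the rest of the program):
 *
 *   gcc -Os -ffunction-sections -fdata-sections -c sections_demo.c
 *   objdump -h sections_demo.o     (look for one section per function or data item)
 *   gcc -Wl,--gc-sections sections_demo.o main.o -o demo
 */
#include <assert.h>

/* Data item that would be placed in its own data section. */
int lookup_table[4] = {10, 20, 30, 40};

/* Referenced by the rest of the program. */
int used_helper(int i)
{
    assert(i >= 0 && i < 4);    /* index must stay inside lookup_table */
    return lookup_table[i];
}

/* Never referenced here; with one section per function, a linker that
 * discards unused sections may be able to drop it. */
int unused_helper(int i)
{
    return i * 1000;
}
```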

-fbranch-target-load-optimize Perform branch target register load optimization before prologue / epilogue threading. The use of target registers can typically be exposed only during reload, thus hoisting loads out of loops and doing inter-block scheduling needs a separate optimization pass.
-fbranch-target-load-optimize2 Perform branch target register load optimization after prologue / epilogue threading.
--param name=value In some places, GCC uses various constants to control the amount of optimization that is done. For example, GCC will not inline functions that contain more than a certain number of instructions. You can control some of these constants on the command-line using the --param option.

The names of specific parameters, and the meaning of the values, are tied to the internals of the compiler, and are subject to change without notice in future releases.

In each case, the value is an integer. The allowable choices for name are given in the following table:

max-crossjump-edges The maximum number of incoming edges to consider for crossjumping. The algorithm used by -fcrossjumping is O(N^2) in the number of edges incoming to each block. Increasing values mean more aggressive optimization, making the compile time increase with probably small improvement in executable size.
max-delay-slot-insn-search The maximum number of instructions to consider when looking for an instruction to fill a delay slot. If more than this arbitrary number of instructions is searched, the time savings from filling the delay slot will be minimal so stop searching. Increasing values mean more aggressive optimization, making the compile time increase with probably small improvement in executable run time.
max-delay-slot-live-search When trying to fill delay slots, the maximum number of instructions to consider when searching for a block with valid live register information. Increasing this arbitrarily chosen value means more aggressive optimization, increasing the compile time. This parameter should be removed when the delay slot code is rewritten to maintain the control-flow graph.
max-gcse-memory The approximate maximum amount of memory that will be allocated in order to perform the global common subexpression elimination optimization. If more memory than specified is required, the optimization will not be done.
max-gcse-passes The maximum number of passes of GCSE to run. 
max-pending-list-length The maximum number of pending dependencies scheduling will allow before flushing the current state and starting over. Large functions with few branches or calls can create excessively large lists which needlessly consume memory and resources.
max-inline-insns-single Several parameters control the tree inliner used in gcc. This number sets the maximum number of instructions (counted in GCC's internal representation) in a single function that the tree inliner will consider for inlining. This only affects functions declared inline and methods implemented in a class declaration (C++). The default value is 500.
max-inline-insns-auto When you use -finline-functions (included in -O3), a lot of functions that would otherwise not be considered for inlining by the compiler will be investigated. To those functions, a different (more restrictive) limit compared to functions declared inline can be applied. The default value is 100.
large-function-insns The limit specifying really large functions. For functions greater than this limit inlining is constrained by --param large-function-growth. This parameter is useful primarily to avoid extreme compilation time caused by non-linear algorithms used by the backend. This parameter is ignored when -funit-at-a-time is not used. The default value is 3000.
large-function-growth Specifies maximal growth of large function caused by inlining in percent. This parameter is ignored when -funit-at-a-time is not used. The default value is 200.
inline-unit-growth Specifies maximal overall growth of the compilation unit caused by inlining. This parameter is ignored when -funit-at-a-time is not used. The default value is 150.
max-inline-insns-rtl For languages that use the RTL inliner (this happens at a later stage than tree inlining), you can set the maximum allowable size (counted in RTL instructions) for the RTL inliner with this parameter. The default value is 600.
max-unrolled-insns The maximum number of instructions that a loop should have if that loop is unrolled, and if the loop is unrolled, it determines how many times the loop code is unrolled.
max-average-unrolled-insns The maximum number of instructions biased by probabilities of their execution that a loop should have if that loop is unrolled, and if the loop is unrolled, it determines how many times the loop code is unrolled.
max-unroll-times The maximum number of unrollings of a single loop.
max-peeled-insns The maximum number of instructions that a loop should have if that loop is peeled, and if the loop is peeled, it determines how many times the loop code is peeled.
max-peel-times The maximum number of peelings of a single loop. 
max-completely-peeled-insns The maximum number of insns of a completely peeled loop. 
max-completely-peel-times The maximum number of iterations of a loop to be suitable for complete peeling. 
max-unswitch-insns The maximum number of insns of an unswitched loop. 
max-unswitch-level The maximum number of branches unswitched in a single loop. 
hot-bb-count-fraction Select the fraction of the maximal count of repetitions of a basic block in the program that a given basic block needs to have to be considered hot.
hot-bb-frequency-fraction Select the fraction of the maximal frequency of executions of a basic block in the function that a given basic block needs to have to be considered hot.
tracer-dynamic-coverage tracer-dynamic-coverage-feedback This value is used to limit superblock formation once the given percentage of executed instructions is covered. This limits unnecessary code size expansion.

The tracer-dynamic-coverage-feedback is used only when profile feedback is available. The real profiles (as opposed to statically estimated ones) are much less balanced allowing the threshold to be a larger value.

tracer-max-code-growth Stop tail duplication once code growth has reached the given percentage. This is a rather hokey argument, as most of the duplicates will be eliminated later in cross jumping, so it may be set to much higher values than is the desired code growth.
tracer-min-branch-ratio Stop reverse growth when the reverse probability of best edge is less than this threshold (in percent).
tracer-min-branch-ratio tracer-min-branch-ratio-feedback Stop forward growth if the best edge has a probability lower than this threshold.

Similarly to tracer-dynamic-coverage two values are present, one for compilation for profile feedback and one for compilation without. The value for compilation with profile feedback needs to be more conservative (higher) in order to make tracer effective.

max-cse-path-length Maximum number of basic blocks on path that cse considers. 
max-last-value-rtl The maximum size measured as number of RTLs that can be recorded in an expression in combiner for a pseudo register as last known value of that register. The default is 10000.
ggc-min-expand GCC uses a garbage collector to manage its own memory allocation. This parameter specifies the minimum percentage by which the garbage collector's heap should be allowed to expand between collections. Tuning this may improve compilation speed; it has no effect on code generation.

The default is 30% + 70% * (RAM/1GB) with an upper bound of 100% when RAM >= 1GB. If getrlimit is available, the notion of "RAM" is the smallest of actual RAM, RLIMIT_RSS, RLIMIT_DATA and RLIMIT_AS. If GCC is not able to calculate RAM on a particular platform, the lower bound of 30% is used. Setting this parameter and ggc-min-heapsize to zero causes a full collection to occur at every opportunity. This is extremely slow, but can be useful for debugging.

ggc-min-heapsize Minimum size of the garbage collector's heap before it begins bothering to collect garbage. The first collection occurs after the heap expands by ggc-min-expand% beyond ggc-min-heapsize. Again, tuning this may improve compilation speed, and has no effect on code generation.

The default is RAM/8, with a lower bound of 4096 (four megabytes) and an upper bound of 131072 (128 megabytes). If getrlimit is available, the notion of "RAM" is the smallest of actual RAM, RLIMIT_RSS, RLIMIT_DATA and RLIMIT_AS. If GCC is not able to calculate RAM on a particular platform, the lower bound is used. Setting this parameter very large effectively disables garbage collection. Setting this parameter and ggc-min-expand to zero causes a full collection to occur at every opportunity.

max-reload-search-insns The maximum number of instructions reload should look backward for an equivalent register. Increasing values mean more aggressive optimization, making the compile time increase with probably slightly better performance. The default value is 100.
max-cselib-memory-location The maximum number of memory locations cselib should take into account. Increasing values mean more aggressive optimization, making the compile time increase with probably slightly better performance. The default value is 500.
reorder-blocks-duplicate reorder-blocks-duplicate-feedback Used by basic block reordering pass to decide whether to use unconditional branch or duplicate the code on its destination. Code is duplicated when its estimated size is smaller than this value multiplied by the estimated size of unconditional jump in the hot spots of the program.

The reorder-blocks-duplicate-feedback is used only when profile feedback is available and may be set to higher values than reorder-blocks-duplicate since information about the hot spots is more accurate.
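
The default values given above for ggc-min-expand and ggc-min-heapsize are simple arithmetic, which can be written down as a short C sketch. It follows only the formulas quoted in the text and is not GCC's own source. It assumes that 1GB means 2^30 bytes and that the heap size is counted in kilobytes, and it leaves out the getrlimit adjustment. The function names are hypothetical.

```c
#include <assert.h>

/* Sketch of the documented default for ggc-min-expand, in percent:
 * 30% + 70% * (RAM/1GB), capped at 100% when RAM >= 1GB. */
double ggc_min_expand_default(double ram_bytes)
{
    const double one_gb = 1024.0 * 1024.0 * 1024.0;   /* assumption: 1GB = 2^30 bytes */
    double fraction = ram_bytes / one_gb;
    if (fraction > 1.0)
        fraction = 1.0;
    return 30.0 + 70.0 * fraction;
}

/* Sketch of the documented default for ggc-min-heapsize, in kilobytes:
 * RAM/8, clamped to the range 4096..131072. */
unsigned long ggc_min_heapsize_default(unsigned long ram_kbytes)
{
    unsigned long size = ram_kbytes / 8;
    if (size < 4096UL)
        size = 4096UL;          /* lower bound: four megabytes */
    if (size > 131072UL)
        size = 131072UL;        /* upper bound: 128 megabytes */
    return size;
}

/* Worked values for a machine with 512MB and one with 4GB of RAM. */
void demo_ggc_defaults(void)
{
    assert(ggc_min_expand_default(512.0 * 1024.0 * 1024.0) == 65.0);
    assert(ggc_min_expand_default(4.0 * 1024.0 * 1024.0 * 1024.0) == 100.0);
    assert(ggc_min_heapsize_default(512UL * 1024UL) == 65536UL);
    assert(ggc_min_heapsize_default(4UL * 1024UL * 1024UL) == 131072UL);
}
```

To override the defaults explicitly, the values can be passed on the command line in the name=value form described earlier, for example --param ggc-min-expand=65 --param ggc-min-heapsize=65536.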

