Category: Linux
2017-01-19 09:59:34
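While studying a makefile used to build a driver, I noticed the optimization option -Os on the GCC command line.
I searched the GCC documentation and found -O0, -O1, -O2 and -O3, but no -Os.
A Google search finally turned up an article that explains what -Os does:
It turns out that -Os is roughly "-O2.5": it enables all of the -O2 optimizations that do not increase code size, favoring size over speed.
The detailed explanation is as follows: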
Level 2.5 (-Os)
The special optimization level (-Os or size) enables all -O2 optimizations that do not increase code size; it puts the emphasis on size over speed. This includes all second-level optimizations, except for the alignment optimizations. The alignment optimizations skip space to align functions, loops, jumps and labels to an address that is a multiple of a power of two, in an architecture-dependent manner. Skipping to these boundaries can increase performance as well as the size of the resulting code and data spaces; therefore, these particular optimizations are disabled. The size optimization level is enabled as:
gcc -Os -o test test.c
In gcc 3.2.2, reorder-blocks is enabled at -Os, but in gcc 3.3.2 reorder-blocks is disabled.
==============================
Addendum: I later found a description of -Os in the official GCC documentation after all.
The relevant text is as follows:
These options control various sorts of optimizations.
Without any optimization option, the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results. Statements are independent: if you stop the program with a breakpoint between statements, you can then assign a new value to any variable or change the program counter to any other statement in the function and get exactly the results you would expect from the source code.
Turning on optimization flags makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program.
The compiler performs optimization based on the knowledge it has of the program. Using the -funit-at-a-time flag will allow the compiler to consider information gained from later functions in the file when compiling a function. Compiling multiple files at once to a single output file (and using -funit-at-a-time) will allow the compiler to use information gained from all of the files when compiling each of them.
Not all optimizations are controlled directly by a flag. Only optimizations that have a flag are listed.
-O -O1 Optimize. Optimizing compilation takes somewhat more time, and a lot more memory for a large function. With -O, the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time.
-O turns on the following optimization flags:
-fdefer-pop -fmerge-constants -fthread-jumps -floop-optimize -fif-conversion -fif-conversion2 -fdelayed-branch -fguess-branch-probability -fcprop-registers
-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging.
-O2 Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. The compiler does not perform loop unrolling or function inlining when you specify -O2. As compared to -O, this option increases both compilation time and the performance of the generated code. -O2 turns on all optimization flags specified by -O. It also turns on the following optimization flags:
-fforce-mem -foptimize-sibling-calls -fstrength-reduce -fcse-follow-jumps -fcse-skip-blocks -frerun-cse-after-loop -frerun-loop-opt -fgcse -fgcse-lm -fgcse-sm -fgcse-las -fdelete-null-pointer-checks -fexpensive-optimizations -fregmove -fschedule-insns -fschedule-insns2 -fsched-interblock -fsched-spec -fcaller-saves -fpeephole2 -freorder-blocks -freorder-functions -fstrict-aliasing -funit-at-a-time -falign-functions -falign-jumps -falign-loops -falign-labels -fcrossjumping
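Please note the warning under -fgcse about invoking -O2 on programs that use computed gotos.
-O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -fweb, -frename-registers and -funswitch-loops options.
-Os Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size.
-Os disables the following optimization flags: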
-falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks -fprefetch-loop-arrays
If you use multiple -O options, with or without level numbers, the last such option is the one that is effective.
Options of the form -fflag specify machine-independent flags. Most flags have both positive and negative forms; the negative form of -ffoo would be -fno-foo. In the table below, only one of the forms is listed, namely the one you typically will use. You can figure out the other form by either removing `no-' or adding it.
The following options control specific optimizations. They are either activated by -O options or are related to ones that are. You can use the following flags in the rare cases when “fine-tuning” of optimizations to be performed is desired.
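-fno-default-inline Do not make member functions inline by default merely because they are defined inside the class scope (C++ only). Otherwise, when you specify -O, member functions defined inside class scope are compiled inline by default; i.e., you don't need to add `inline' in front of the member function name.
-fno-defer-pop Always pop the arguments to each function call as soon as that function returns. For machines which must pop arguments after a function call, the compiler normally lets arguments accumulate on the stack for several function calls and pops them all at once. Disabled at levels -O, -O2, -O3, -Os.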
-fforce-mem Force memory operands to be copied into registers before doing arithmetic on them. This produces better code by making all memory references potential common subexpressions. When they are not common subexpressions, instruction combination should eliminate the separate register-load. Enabled at levels -O2, -O3, -Os.
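-fforce-addr Force memory address constants to be copied into registers before doing arithmetic on them. This may produce better code just as -fforce-mem may.
-fomit-frame-pointer Don't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines. On some machines, such as the VAX, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn't exist. The machine-description macro FRAME_POINTER_REQUIRED controls whether a target machine supports this flag.
Enabled at levels -O, -O2, -O3, -Os.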
-foptimize-sibling-calls Optimize sibling and tail recursive calls. Enabled at levels -O2, -O3, -Os.
-fno-inline Don't pay attention to the inline keyword. Normally this option is used to keep the compiler from expanding any functions inline. Note that if you are not optimizing, no functions can be expanded inline.
-finline-functions Integrate all simple functions into their callers. The compiler heuristically decides which functions are simple enough to be worth integrating in this way. If all calls to a given function are integrated, and the function is declared static, then the function is normally not output as assembler code in its own right.
Enabled at level -O3.
-finline-limit=n By default, GCC limits the size of functions that can be inlined. This flag allows the control of this limit for functions that are explicitly marked as inline (i.e., marked with the inline keyword or defined within the class definition in C++). n is the size of functions that can be inlined in number of pseudo instructions (not counting parameter handling). The default value of n is 600. Increasing this value can result in more inlined code at the cost of compilation time and memory consumption. Decreasing usually makes the compilation faster and less code will be inlined (which presumably means slower programs). This option is particularly useful for programs that use inlining heavily such as those based on recursive templates with C++. Inlining is actually controlled by a number of parameters, which may be specified individually by using --param name=value. The -finline-limit=n option sets some of these parameters as follows:
max-inline-insns-single is set to n/2. See below for a documentation of the individual parameters controlling inlining.
Note: pseudo instruction represents, in this particular context, an abstract measurement of function's size. In no way does it represent a count of assembly instructions and as such its exact meaning might change from one release to another.
-fkeep-inline-functions Even if all calls to a given function are integrated, and the function is declared static, nevertheless output a separate run-time callable version of the function. This switch does not affect extern inline functions.
-fkeep-static-consts Emit variables declared static const when optimization isn't turned on, even if the variables aren't referenced. GCC enables this option by default. If you want to force the compiler to check if the variable was referenced, regardless of whether or not optimization is turned on, use the -fno-keep-static-consts option.
-fmerge-constants Attempt to merge identical constants (string constants and floating point constants) across compilation units. This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.
Enabled at levels -O, -O2, -O3, -Os.
-fmerge-all-constants Attempt to merge identical constants and identical variables. This option implies -fmerge-constants. In addition to -fmerge-constants this considers e.g. even constant initialized arrays or initialized constant variables with integral or floating point types. Languages like C or C++ require each non-automatic variable to have distinct location, so using this option will result in non-conforming behavior.
-fnew-ra Use a graph coloring register allocator. Currently this option is meant only for testing. Users should not specify this option, since it is not yet ready for production use.
-fno-branch-count-reg Do not use “decrement and branch” instructions on a count register, but instead generate a sequence of instructions that decrement a register, compare it against zero, then branch based upon the result. The default is -fbranch-count-reg, enabled when -fstrength-reduce is enabled.
-fno-function-cse Do not put function addresses in registers; make each instruction that calls a constant function contain the function's address explicitly. This option results in less efficient code, but some strange hacks that alter the assembler output may be confused by the optimizations performed when this option is not used.
The default is -ffunction-cse
-fno-zero-initialized-in-bss If the target supports a BSS section, GCC by default puts variables that are initialized to zero into BSS. This can save space in the resulting code. This option turns off this behavior because some programs explicitly rely on variables going to the data section. E.g., so that the resulting executable can find the beginning of that section and/or make assumptions based on that.
The default is -fzero-initialized-in-bss.
-fstrength-reduce Perform the optimizations of loop strength reduction and elimination of iteration variables. Enabled at levels -O2, -O3, -Os.
-fthread-jumps Perform optimizations where we check to see if a jump branches to a location where another comparison subsumed by the first is found. If so, the first branch is redirected to either the destination of the second branch or a point immediately following it, depending on whether the condition is known to be true or false. Enabled at levels -O, -O2, -O3, -Os.
-fcse-follow-jumps In common subexpression elimination, scan through jump instructions when the target of the jump is not reached by any other path. For example, when CSE encounters an if statement with an else clause, CSE will follow the jump when the condition tested is false. Enabled at levels -O2, -O3, -Os.
-fcse-skip-blocks This is similar to -fcse-follow-jumps, but causes CSE to follow jumps which conditionally skip over blocks. When CSE encounters a simple if statement with no else clause, -fcse-skip-blocks causes CSE to follow the jump around the body of the if. Enabled at levels -O2, -O3, -Os.
-frerun-cse-after-loop Re-run common subexpression elimination after loop optimizations have been performed. Enabled at levels -O2, -O3, -Os.
-frerun-loop-opt Run the loop optimizer twice. Enabled at levels -O2, -O3, -Os.
-fgcse Perform a global common subexpression elimination pass. This pass also performs global constant and copy propagation. Note: When compiling a program using computed gotos, a GCC extension, you may get better runtime performance if you disable the global common subexpression elimination pass by adding -fno-gcse to the command line.
Enabled at levels -O2, -O3, -Os.
-fgcse-lm When -fgcse-lm is enabled, global common subexpression elimination will attempt to move loads which are only killed by stores into themselves. This allows a loop containing a load/store sequence to be changed to a load outside the loop, and a copy/store within the loop. Enabled by default when gcse is enabled.
-fgcse-sm When -fgcse-sm is enabled, a store motion pass is run after global common subexpression elimination. This pass will attempt to move stores out of loops. When used in conjunction with -fgcse-lm, loops containing a load/store sequence can be changed to a load before the loop and a store after the loop. Enabled by default when gcse is enabled.
-fgcse-las When -fgcse-las is enabled, the global common subexpression elimination pass eliminates redundant loads that come after stores to the same memory location (both partial and full redundancies). Enabled by default when gcse is enabled.
-floop-optimize Perform loop optimizations: move constant expressions out of loops, simplify exit test conditions and optionally do strength-reduction and loop unrolling as well. Enabled at levels -O, -O2, -O3, -Os.
-fcrossjumping Perform cross-jumping transformation. This transformation unifies equivalent code and saves code size. The resulting code may or may not perform better than without cross-jumping. Enabled at levels -O, -O2, -O3, -Os.
-fif-conversion Attempt to transform conditional jumps into branch-less equivalents. This includes use of conditional moves, min, max, set flags and abs instructions, and some tricks doable by standard arithmetics. The use of conditional execution on chips where it is available is controlled by if-conversion2. Enabled at levels -O, -O2, -O3, -Os.
-fif-conversion2 Use conditional execution (where available) to transform conditional jumps into branch-less equivalents. Enabled at levels -O, -O2, -O3, -Os.
-fdelete-null-pointer-checks Use global dataflow analysis to identify and eliminate useless checks for null pointers. The compiler assumes that dereferencing a null pointer would have halted the program. If a pointer is checked after it has already been dereferenced, it cannot be null. In some environments, this assumption is not true, and programs can safely dereference null pointers. Use -fno-delete-null-pointer-checks to disable this optimization for programs which depend on that behavior.
Enabled at levels -O2, -O3, -Os.
-fexpensive-optimizations Perform a number of minor optimizations that are relatively expensive. Enabled at levels -O2, -O3, -Os.
-foptimize-register-move -fregmove Attempt to reassign register numbers in move instructions and as operands of other simple instructions in order to maximize the amount of register tying. This is especially helpful on machines with two-operand instructions. Note -fregmove and -foptimize-register-move are the same optimization.
Enabled at levels -O2, -O3, -Os.
-fdelayed-branch If supported for the target machine, attempt to reorder instructions to exploit instruction slots available after delayed branch instructions. Enabled at levels -O, -O2, -O3, -Os.
-fschedule-insns If supported for the target machine, attempt to reorder instructions to eliminate execution stalls due to required data being unavailable. This helps machines that have slow floating point or memory load instructions by allowing other instructions to be issued until the result of the load or floating point instruction is required. Enabled at levels -O2, -O3, -Os.
-fschedule-insns2 Similar to -fschedule-insns, but requests an additional pass of instruction scheduling after register allocation has been done. This is especially useful on machines with a relatively small number of registers and where memory load instructions take more than one cycle. Enabled at levels -O2, -O3, -Os.
-fno-sched-interblock Don't schedule instructions across basic blocks. This is normally enabled by default when scheduling before register allocation, i.e. with -fschedule-insns or at -O2 or higher. This only makes sense when scheduling after register allocation, i.e. with -fschedule-insns2 or at -O2 or higher.
-fsched2-use-traces Use -fsched2-use-superblocks algorithm when scheduling after register allocation and additionally perform code duplication in order to increase the size of superblocks using tracer pass. See -ftracer for details on trace formation. This mode should produce faster but significantly longer programs. Also without -fbranch-probabilities the traces constructed may not match the reality and hurt the performance. This only makes sense when scheduling after register allocation, i.e. with -fschedule-insns2 or at -O2 or higher.
-fcaller-saves Enable values to be allocated in registers that will be clobbered by function calls, by emitting extra instructions to save and restore the registers around such calls. Such allocation is done only when it seems to result in better code than would otherwise be produced. This option is always enabled by default on certain machines, usually those which have no call-preserved registers to use instead.
Enabled at levels -O2, -O3, -Os.
-fmove-all-movables Forces all invariant computations in loops to be moved outside the loop. Note: When compiling programs written in Fortran, -fmove-all-movables and -freduce-all-givs are enabled by default when you use the optimizer.
These options may generate better or worse code; results are highly dependent on the structure of loops within the source code.
These two options are intended to be removed someday, once they have helped determine the efficacy of various approaches to improving loop optimizations.
Please contact , and describe how use of these options affects the performance of your production code. Examples of code that runs slower when these options are enabled are very valuable.
-fno-peephole -fno-peephole2 Disable any machine-specific peephole optimizations. The difference between -fno-peephole and -fno-peephole2 is in how they are implemented in the compiler; some targets use one, some use the other, a few use both. -fpeephole is enabled by default. -fpeephole2 enabled at levels -O2, -O3, -Os.
-fno-guess-branch-probability Do not guess branch probabilities using a randomized model. Sometimes GCC will opt to use a randomized model to guess branch probabilities, when none are available from either profiling feedback (-fprofile-arcs) or `__builtin_expect'. This means that different runs of the compiler on the same program may produce different object code.
In a hard real-time system, people don't want different runs of the compiler to produce code that has different behavior; minimizing non-determinism is of paramount import. This switch allows users to reduce non-determinism, possibly at the expense of inferior optimization.
The default is -fguess-branch-probability at levels -O, -O2, -O3, -Os.
-freorder-blocks Reorder basic blocks in the compiled function in order to reduce number of taken branches and improve code locality. Enabled at levels -O2, -O3.
-freorder-functions Reorder functions in the object file in order to improve code locality. This is implemented by using special subsections .text.hot for most frequently executed functions and .text.unlikely for unlikely executed functions. Reordering is done by the linker so object file format must support named sections and linker must place them in a reasonable way. Also profile feedback must be available to make this option effective. See -fprofile-arcs for details.
Enabled at levels -O2, -O3, -Os.
-fstrict-aliasing Allows the compiler to assume the strictest aliasing rules applicable to the language being compiled. For C (and C++), this activates optimizations based on the type of expressions. In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same. For example, an unsigned int can alias an int, but not a void* or a double. A character type may alias any other type. Pay special attention to code like this:
    union a_union {
      int i;
      double d;
    };

    int f() {
      union a_union t;
      t.d = 3.0;
      return t.i;
    }
The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above will work as expected. However, this code might not:
    int f() {
      union a_union t;
      int* ip;
      t.d = 3.0;
      ip = &t.i;
      return *ip;
    }
Every language that wishes to perform language-specific alias analysis should define a function that computes, given a tree node, an alias set for the node. Nodes in different alias sets are not allowed to alias. For an example, see the C front-end function c_get_alias_set.
Enabled at levels -O2, -O3, -Os.
-falign-functions -falign-functions=n Align the start of functions to the next power-of-two greater than n, skipping up to n bytes. For instance, -falign-functions=32 aligns functions to the next 32-byte boundary, but -falign-functions=24 would align to the next 32-byte boundary only if this can be done by skipping 23 bytes or less. -fno-align-functions and -falign-functions=1 are equivalent and mean that functions will not be aligned.
Some assemblers only support this flag when n is a power of two; in that case, it is rounded up.
If n is not specified or is zero, use a machine-dependent default.
Enabled at levels -O2, -O3.
-falign-labels -falign-labels=n Align all branch targets to a power-of-two boundary, skipping up to n bytes like -falign-functions. This option can easily make code slower, because it must insert dummy operations for when the branch target is reached in the usual flow of the code. -fno-align-labels and -falign-labels=1 are equivalent and mean that labels will not be aligned.
If -falign-loops or -falign-jumps are applicable and are greater than this value, then their values are used instead.
If n is not specified or is zero, use a machine-dependent default which is very likely to be `1', meaning no alignment.
Enabled at levels -O2, -O3.
-falign-loops -falign-loops=n Align loops to a power-of-two boundary, skipping up to n bytes like -falign-functions. The hope is that the loop will be executed many times, which will make up for any execution of the dummy operations. -fno-align-loops and -falign-loops=1 are equivalent and mean that loops will not be aligned.
If n is not specified or is zero, use a machine-dependent default.
Enabled at levels -O2, -O3.
-falign-jumps -falign-jumps=n Align branch targets to a power-of-two boundary, for branch targets where the targets can only be reached by jumping, skipping up to n bytes like -falign-functions. In this case, no dummy operations need be executed. -fno-align-jumps and -falign-jumps=1 are equivalent and mean that loops will not be aligned.
If n is not specified or is zero, use a machine-dependent default.
Enabled at levels -O2, -O3.
-frename-registers Attempt to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization will most benefit processors with lots of registers. It can, however, make debugging impossible, since variables will no longer stay in a “home register”. Enabled at levels -O3.
-fno-cprop-registers After register allocation and post-register allocation instruction splitting, we perform a copy-propagation pass to try to reduce scheduling dependencies and occasionally eliminate the copy. Disabled at levels -O, -O2, -O3, -Os.
-fprofile-generate Enable options usually used for instrumenting application to produce profile useful for later recompilation with profile feedback based optimization. You must use -fprofile-generate both when compiling and when linking your program. The following options are enabled: -fprofile-arcs, -fprofile-values, -fvpt.
-fprofile-use Enable profile feedback directed optimizations, and optimizations generally profitable only with profile feedback available. The following options are enabled: -fbranch-probabilities, -fvpt, -funroll-loops, -fpeel-loops, -ftracer.
The following options control compiler behavior regarding floating point arithmetic. These options trade off between speed and correctness. All must be specifically enabled.
-ffloat-store Do not store floating point variables in registers, and inhibit other options that might change whether a floating point value is taken from a register or memory. This option prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have. Similarly for the x86 architecture. For most programs, the excess precision does only good, but a few programs rely on the precise definition of IEEE floating point. Use -ffloat-store for such programs, after modifying them to store all pertinent intermediate computations into variables.
-ffast-math Sets -fno-math-errno and -funsafe-math-optimizations, among others. This option causes the preprocessor macro __FAST_MATH__ to be defined.
This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.
-fno-math-errno Do not set ERRNO after calling math functions that are executed with a single instruction, e.g., sqrt. A program that relies on IEEE exceptions for math error handling may want to use this flag for speed while maintaining IEEE arithmetic compatibility. This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.
The default is -fmath-errno.
-funsafe-math-optimizations Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE or ANSI standards. When used at link-time, it may include libraries or startup files that change the default FPU control word or other similar optimizations. This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.
The default is -fno-unsafe-math-optimizations.
-ffinite-math-only Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs. This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications.
The default is -fno-finite-math-only.
-fno-trapping-math Compile code assuming that floating-point operations cannot generate user-visible traps. These traps include division by zero, overflow, underflow, inexact result and invalid operation. This option implies -fno-signaling-nans. Setting this option may allow faster code if one relies on “non-stop” IEEE arithmetic, for example. This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.
The default is -ftrapping-math.
-frounding-math Disable transformations and optimizations that assume default floating point rounding behavior. This is round-to-zero for all floating point to integer conversions, and round-to-nearest for all other arithmetic truncations. This option should be specified for programs that change the FP rounding mode dynamically, or that may be executed with a non-default rounding mode. This option disables constant folding of floating point expressions at compile-time (which may be affected by rounding mode) and arithmetic transformations that are unsafe in the presence of sign-dependent rounding modes. The default is -fno-rounding-math.
This option is experimental and does not currently guarantee to disable all GCC optimizations that are affected by rounding mode. Future versions of GCC may provide finer control of this setting using C99's FENV_ACCESS pragma. This command line option will be used to specify the default state for FENV_ACCESS.
-fsignaling-nans Compile code assuming that IEEE signaling NaNs may generate user-visible traps during floating-point operations. Setting this option disables optimizations that may change the number of exceptions visible with signaling NaNs. This option implies -ftrapping-math. This option causes the preprocessor macro __SUPPORT_SNAN__ to be defined.
The default is -fno-signaling-nans.
This option is experimental and does not currently guarantee to disable all GCC optimizations that affect signaling NaN behavior.
-fsingle-precision-constant Treat floating point constant as single precision constant instead of implicitly converting it to double precision constant.
The following options control optimizations that may improve performance, but are not enabled by any -O options. This section includes experimental options that may produce broken code.
-fbranch-probabilities After running a program compiled with -fprofile-arcs, you can compile it a second time using -fbranch-probabilities, to improve optimizations based on the number of times each branch was taken. When the program compiled with -fprofile-arcs exits it saves arc execution counts to a file called sourcename.gcda for each source file. The information in this data file is very dependent on the structure of the generated code, so you must use the same source code and the same optimization options for both compilations. With -fbranch-probabilities, GCC puts a `REG_BR_PROB' note on each `JUMP_INSN' and `CALL_INSN'. These can be used to improve optimization. Currently, they are only used in one place: in reorg.c, instead of guessing which path a branch is mostly to take, the `REG_BR_PROB' values are used to exactly determine which path is taken more often.
-fprofile-values If combined with -fprofile-arcs, it adds code so that some data about values of expressions in the program is gathered. With -fbranch-probabilities, it reads back the data gathered from profiling values of expressions and adds `REG_VALUE_PROFILE' notes to instructions for their later usage in optimizations.
-fvpt If combined with -fprofile-arcs, it instructs the compiler to add code to gather information about values of expressions. With -fbranch-probabilities, it reads back the data gathered and actually performs the optimizations based on them. Currently the optimizations include specialization of division operation using the knowledge about the value of the denominator.
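-fnew-ra Use a graph coloring register allocator. Currently this option is meant for testing, so we are interested to hear about miscompilations with -fnew-ra.
-fprefetch-loop-arrays If supported by the target machine, generate instructions to prefetch memory to improve the performance of loops that access large arrays. Disabled at level -Os.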
-ffunction-sections -fdata-sections Place each function or data item into its own section in the output file if the target supports arbitrary sections. The name of the function or the name of the data item determines the section's name in the output file. Use these options on systems where the linker can perform optimizations to improve locality of reference in the instruction space. Most systems using the ELF object format and SPARC processors running Solaris 2 have linkers with such optimizations. AIX may have these optimizations in the future.
Only use these options when there are significant benefits from doing so. When you specify these options, the assembler and linker will create larger object and executable files and will also be slower. You will not be able to use gprof on all systems if you specify this option and you may have problems with debugging if you specify both this option and -g.
-fbranch-target-load-optimize Perform branch target register load optimization before prologue / epilogue threading. The use of target registers can typically be exposed only during reload, thus hoisting loads out of loops and doing inter-block scheduling needs a separate optimization pass.
--param name=value In some places, GCC uses various constants to control the amount of optimization that is done. The names of specific parameters, and the meaning of the values, are tied to the internals of the compiler, and are subject to change without notice in future releases.
In each case, the value is an integer. The allowable choices for name are given in the following table:
max-crossjump-edges The maximum number of incoming edges to consider for crossjumping. The algorithm used by -fcrossjumping is O(N^2) in the number of edges incoming to each block. Increasing values mean more aggressive optimization, making the compile time increase with probably small improvement in executable size.
The tracer-dynamic-coverage-feedback is used only when profile feedback is available. The real profiles (as opposed to statically estimated ones) are much less balanced allowing the threshold to be larger value.
tracer-max-code-growth Stop tail duplication once code growth has reached given percentage. This is rather hokey argument, as most of the duplicates will be eliminated later in cross jumping, so it may be set to much higher values than is the desired code growth.
Similarly to tracer-dynamic-coverage two values are present, one for compilation for profile feedback and one for compilation without. The value for compilation with profile feedback needs to be more conservative (higher) in order to make tracer effective.
max-cse-path-length Maximum number of basic blocks on path that cse considers.
ggc-min-expand Minimum percentage by which the garbage collector's heap is allowed to expand between collections. The default is 30% + 70% * (RAM/1GB) with an upper bound of 100% when RAM >= 1GB. If getrlimit is available, the notion of "RAM" is the smallest of actual RAM, RLIMIT_RSS, RLIMIT_DATA and RLIMIT_AS. If GCC is not able to calculate RAM on a particular platform, the lower bound of 30% is used. Setting this parameter and ggc-min-heapsize to zero causes a full collection to occur at every opportunity. This is extremely slow, but can be useful for debugging.
ggc-min-heapsize Minimum size of the garbage collector's heap before it begins bothering to collect garbage. The first collection occurs after the heap expands by ggc-min-expand% beyond ggc-min-heapsize. Again, tuning this may improve compilation speed, and has no effect on code generation. The default is RAM/8, with a lower bound of 4096 (four megabytes) and an upper bound of 131072 (128 megabytes). If getrlimit is available, the notion of "RAM" is the smallest of actual RAM, RLIMIT_RSS, RLIMIT_DATA and RLIMIT_AS. If GCC is not able to calculate RAM on a particular platform, the lower bound is used. Setting this parameter very large effectively disables garbage collection. Setting this parameter and ggc-min-expand to zero causes a full collection to occur at every opportunity.
max-reload-search-insns The maximum number of instruction reload should look backward for equivalent register. Increasing values mean more aggressive optimization, making the compile time increase with probably slightly better performance. The default value is 100.
The reorder-block-duplicate-feedback is used only when profile feedback is available and may be set to higher values than reorder-block-duplicate since information about the hot spots is more accurate.