Category: BSD

2013-06-30 00:50:30

FreeBSD Tuning and Optimization

performance modifications for 1gig and 10gig networks



The default install of FreeBSD 9.1 is quite fast and will work well the majority of the time. If you installed FreeBSD without any modifications you will not be disappointed. But what if you wanted to get the most out of your install? In this post we offer some ideas to tune, tweak and optimize FreeBSD's network stack to get the most out of the operating system. Further down the page we show proof of the gained performance and lower latency, as well as links to the graphing tools we used so you can do the same.




FreeBSD is fast, but hardware is important

If you want to achieve optimized network throughput you need to use good hardware. Cheap hardware will cause nothing but misery: high latency, low throughput and poor driver support, not to mention general flakiness under load. A case in point is the built-in network port on many motherboards. The chipset may negotiate at one(1) gigabit, but it will not perform well under stress. Take some time, set up a good machine, and you will be richly rewarded with what Monty Python would call "huge...tracts of land."

We are using two hardware setups, and these network modifications can be applied to both. The first is an example of a 1 gigabit machine for home or office use. The second is a rack mounted server for 10 gig and trunked 40 gigabit high speed networks. Both of these hardware configurations are actively used in production in clusters we support. We list the hardware here, but please also take a look at our speed tests for many more details.

## Home or Office server (almost completely silent)

Processor    : AMD Athlon II X4 610e Propus 2.4GHz 45watt
CPU Cooler   : Zalman 9500A-LED 92mm 2 Ball CPU Cooler (fan off)
Motherboard  : Asus M4A89GTD Pro/USB3 AM3 AMD 890GX
Memory       : Kingston 4GB DDR3 KVR1333D3N9K2/4G
Hard Drive   : 256GB 2.5-inch SSD 840 Pro Series with ZFS root
               Western Digital Black WD4001FAEX 4TB 7200RPM SATA3/SATA 6.0 GB/s
Power Supply : Antec Green 380 watts EA-380D 
Case         : Antec LanBoy Air (completely fan-less)

Network Card : Intel PRO/1000 GT PCI PWLA8391GT PCI (two cards for OpenBSD)
                  -OR-
               Intel I350-T2 Server Adapter (PCIe x4 for FreeBSD)

NOTE: FreeBSD can use the Intel I350-T2 with the igb(4) driver. This card is
incredibly fast and stable. OpenBSD does not support the Intel I350-T2 so using
the older Intel PRO/1000 is necessary.


## Rack mounted server

Processor    : Intel Xeon L5630 Westmere 2.13GHz 12MB L3 Cache LGA 1366 40 Watt Quad-Core
Motherboard  : Supermicro X8ST3-F
Chassis      : SuperChassis 825TQ-R700LPV 2U rackmount (Revision K)
Memory       : KVR1333D3E9SK2/4G 4GB 1333MHz DDR3 ECC CL9 DIMM (Kit of 2) w/ Thermal Sensor
Hard Drive   : 256GB 2.5-inch SSD 840 Pro Series with ZFS root
Network Card : Myricom Myri-10G "Gen2" 10G-PCIE2-8B2-2S (PCI Express x8)
 Transceiver : Myricom Myri-10G SFP+ 10GBase-SR optical fiber transceiver (850nm wavelength)



The /boot/loader.conf

The /boot/loader.conf is where we set up the specifics for our network cards and some hash table sizes. We have commented each of the options in the file. Directives which are commented out are not used and are only included for reference. You should be able to copy and paste the following text into your loader.conf if you wish.

### Calomel.org  /boot/loader.conf

# ZFS root boot config. We use ZFS on all of our drives due to its speed and superior data integrity
# options. We commented out these directives in case you are using UFS.
# zfs_load="YES"
# vfs.root.mountfrom="zfs:zroot"

#if_mxge_load="YES"                # load the Myri10GE kernel module on boot

#if_carp_load="YES"                # load the PF CARP module

# accf accept filters are used so a daemon will not have to context switch
# several times before performing the initial parsing of the client request.
# Filtering will decrease server load by reducing the amount of CPU time needed
# to handle incoming requests, and it does not add latency to client requests.
# As soon as the full request arrives in the filter from the client, the
# request is immediately passed to the daemon. See "man accf_http" or similar
# for more information. We can not stress enough how good these filters are. For
# example the accf_http filter will keep partial query DoS attacks away from
# the web server daemon (expiring connections in net.inet.tcp.msl milliseconds)
# so you can continue to serve properly formatted requests. The accf suite has
# been available since FreeBSD 4.0 and is production stable code.
accf_data_load="YES"               # Wait for data accept filter (apache)
accf_http_load="YES"               # buffer incoming connections until complete HTTP requests arrive (nginx apache)
                                   # for nginx also add, "listen 127.0.0.1:80 accept_filter=httpready;"
accf_dns_load="YES"                # Wait for full DNS request accept filter (unbound or bind)

aio_load="YES"                     # Async IO system calls
autoboot_delay="3"                 # reduce boot menu delay from 10 to 3 seconds
#cc_htcp_load="YES"                # H-TCP Congestion Control

#coretemp_load="YES"               # intel cpu thermal sensors
#amdtemp_load="YES"                # amd K8, K10, K11 cpu thermal sensors

# the following hw.igb.* are only for the Intel i350 nic (igb0 and igb1)
hw.igb.enable_aim="1"              # enable Intel's Adaptive Interrupt Moderation to reduce load
hw.igb.max_interrupt_rate="32000"  # maximum number of interrupts per second generated by single igb(4)
hw.igb.num_queues="0"              # network queues equal to the number of supported queues on the hardware NIC
                                   # (Intel i350 = 8) Set to zero(0) for auto tune and the driver will create
                                   # as many queues as CPU cores up to a max of 8.
#hw.igb.enable_msix="1"            # enable MSI-X interrupts for PCI-E devices
hw.igb.txd="2048"                  # number of transmit descriptors allocated by the driver (2048 limit)
hw.igb.rxd="2048"                  # number of receive descriptors allocated by the driver (2048 limit)
hw.igb.rx_process_limit="1000"     # maximum number of received packets to process at a time. The default of 100 is
                                   # too low for most firewalls. (-1 means unlimited)

kern.ipc.nmbclusters="32768"       # increase the number of network mbufs the system is willing to allocate.
                                   # Each cluster represents approximately 2K of memory, so a value of 32768
                                   # represents 64M of kernel memory reserved for network buffers. 

#hw.intr_storm_threshold="10000"   # maximum number of interrupts per second on any interrupt level
                                   # (vmstat -i for total rate). If you still see Interrupt Storm detected messages,
                                   # increase the limit to a higher number and look for the culprit.

net.inet.tcp.syncache.hashsize="1024"   # Size of the syncache hash table, must be a power of 2 (default 512)
net.inet.tcp.syncache.bucketlimit="100" # Limit the number of entries permitted in each bucket of the hash table. (default 30)

net.inet.tcp.tcbhashsize="32000"   # size of hash used to find socket for incoming packet

#net.isr.bindthreads="0"           # do not bind network threads to a CPU core if you notice network processing is
                                   # using 100% of a single CPU core. Setting "0" will spread the network processing
                                   # load over multiple cpus, but at a slight performance cost and increased latency
                                   # caused by cache invalidation. The default of "1" is faster for most systems
                                   # and will lead to lower latency response times.

#net.isr.defaultqlimit="256"       # qlimit for igmp, arp, ether and ip6 queues only (netstat -Q)
#net.isr.dispatch="direct"         # interrupt handling via multiple CPUs
#net.isr.maxqlimit="10240"         # limit for per-workstream queues (check "netstat -Q"; if Qdrops is greater than 0 increase this directive)
#net.isr.maxthreads="3"            # Max number of threads for NIC IRQ balancing; use 3 on a 4 core box, leaving at
                                   # least one core for system or service processing. Again, if you notice one cpu
                                   # being overloaded due to network processing this directive will spread out the
                                   # load at the cost of cpu affinity unbinding. The default of "1" is faster.

# NOTE regarding "net.isr.*" : Processor affinity can effectively reduce cache
# problems but it does not curb the persistent load-balancing problem.[1]
# Processor affinity becomes more complicated in systems with non-uniform
# architectures. A system with two dual-core hyper-threaded CPUs presents a
# challenge to a scheduling algorithm. There is complete affinity between two
# virtual CPUs implemented on the same core via hyper-threading, partial
# affinity between two cores on the same physical chip (as the cores share
# some, but not all, cache), and no affinity between separate physical chips.
# It is possible that net.isr.bindthreads="0" and net.isr.maxthreads="3" can
# cause more slowdown if your system is not cpu loaded already. We highly
# recommend getting a more efficient network card instead of setting the
# "net.isr.*" options. Look at the Intel i350-T2 for gigabit or the Myricom
# 10G-PCIE2-8C2-2S for 10gig. These cards will reduce the machine's nic
# cpu processing to around 3% or lower, with latencies of less than 1ms.

net.link.ifqmaxlen="10240"         # Increase interface send queue length

# SIFTR (Statistical Information For TCP Research) is a kernel module which
# logs a range of statistics on active TCP connections to a log file in comma
# separated format. Only useful for researching tcp flows as it does add some
# processing load to the system.
# 
#siftr_load="YES"

#
## EOF ##
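
After a reboot you can verify the modules loaded and the tunables took effect. Here is a quick verification sketch using standard FreeBSD tools; the exact hw.igb.* names depend on your NIC and driver version:

# kldstat | grep -E 'accf|aio'
# kenv | grep -E 'hw.igb|net.inet.tcp'              (loader.conf tunables land in the kernel environment)
# sysctl kern.ipc.nmbclusters net.inet.tcp.tcbhashsize
# netstat -m                                        (mbuf and cluster usage)
# vmstat -i                                         (per-device interrupt rates)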



The /etc/sysctl.conf

The /etc/sysctl.conf is the primary optimization file. Everything from congestion control to buffer changes can be found here. Again, each option we changed is fully commented and may also reference a research study for more information. Directives which are commented out are not used and are only included for reference. This is a large file, so take some time to look through each option and understand why we made the change from the default.

### Calomel.org  /etc/sysctl.conf

## NOTES:
# 
# low latency is important so we highly recommend that you disable hyper
# threading on Intel CPUs as it has an unpredictable effect on latency and
# causes a lot of problems with CPU affinity.
#
# These settings are specifically tuned for a low latency FIOS (300/65) and
# gigabit LAN connections. If you have 10gig or 40gig you will notice these
# settings work quite well and can even allow the machine to saturate the
# network.
#

# set to at least 16MB for 10GE hosts. Default "2097152" is fine for 1Gb. If
# you wish to increase your TCP window size to 65535 and window scale to 9 then
# set this directive to 16m.
kern.ipc.maxsockbuf=16777216

# set auto tuning maximum to at least 16MB for 10GE hosts. The default of
# "2097152" is fine for 1Gb.
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
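
# A rough sizing check (a worked example, not an extra setting): the socket
# buffer should cover the bandwidth-delay product (BDP) of the path.
#   10 Gbit/s * 10 ms RTT = (10*10^9 / 8) bytes/s * 0.010 s = 12.5 MB
#    1 Gbit/s * 16 ms RTT = ( 1*10^9 / 8) bytes/s * 0.016 s =  2.0 MB
# So 16777216 (16 MB) comfortably covers a 10GE path with around 10 ms of
# latency, while the 2 MB default is already enough for most gigabit paths.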

# use the H-TCP congestion control algorithm, which is more aggressive about pushing
# up to max bandwidth (total BDP) and favors hosts with lower TTL / VARTTL than
# the default "newreno". Understand that "newreno" works really well in most
# conditions and enabling HTCP may only gain you a few percentage points of
# throughput. We suggest testing both. 
# make sure to add 'cc_htcp_load="YES"' to /boot/loader.conf then check
# available congestion control options with "sysctl net.inet.tcp.cc.available"
net.inet.tcp.cc.algorithm=htcp

# IP forwarding allows packets to traverse between interfaces and is used for
# firewalls, bridges and routers. When fast IP forwarding is also enabled, IP packets
# are forwarded directly to the appropriate network interface with direct
# processing to completion, which greatly improves the throughput. All packets
# for local IP addresses, non-unicast, or with IP options are handled by the
# normal IP input processing path. All features of the normal (slow) IP
# forwarding path are supported by fastforwarding including firewall (through
# pfil(9) hooks) checking, except ipsec tunnel brokering. The IP fast
# forwarding path does not generate ICMP redirect or source quench messages
# though. Compared to normal IP forwarding, fastforwarding can give a speedup
# of 40 to 60% in packet forwarding performance. This is great for interactive
# connections like online games or VOIP where low latency is critical.
net.inet.ip.forwarding=1
net.inet.ip.fastforwarding=1
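
# Once forwarding is enabled you can confirm packets are taking the fast path
# with the standard IP counters (a verification sketch; the counter wording
# varies slightly between releases):
#   netstat -s -p ip | grep -i forward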

# NOTE: Large Receive Offload (LRO) is enabled by default on some network
# cards. LRO provides a significant receive (rx) performance improvement, but
# it can interfere with packet-forwarding workloads. If you plan to forward
# TCP traffic as a router or bridge you should test LRO on the NIC and disable
# it if necessary; otherwise, carefully evaluate the environment and enable it
# when possible. Use "ifconfig -m" to check the nic's "options" for currently
# active directives and "capabilities" for supported directives. Use
# "ifconfig igb0 -lro" to disable LRO if "LRO" is seen in "options", or
# "ifconfig igb0 lro" to enable it if you find LRO works well with your setup.

# the host cache is used to cache connection details and metrics (like TTL and
# VARTTL) to improve the future performance of connections to hosts we have
# seen before. View the host cache stats using "sysctl net.inet.tcp.hostcache.list".
# We increase the expire time for clients who connect hourly or so to our RSS feed.
net.inet.tcp.hostcache.expire=5400

# maximum segment size (MSS) specifies the largest amount of data in a single TCP segment
# 1460 for an IPv4 1500 MTU network (MTU - 20 IPv4 header - 20 TCP header)
# 1440 for an IPv6 1500 MTU network (MTU - 40 IPv6 header - 20 TCP header)
# 8960 for an IPv4 9000 MTU network (MTU - 20 IPv4 header - 20 TCP header) and switch ports set at 9216
# 8940 for an IPv6 9000 MTU network (MTU - 40 IPv6 header - 20 TCP header) and switch ports set at 9216
# For most networks 1460 is optimal, but you may want to be cautious and use
# 1440. This smaller MSS allows an extra 20 bytes of space for clients on a
# DSL line which may use PPPoE. These connections carry extra header data in
# the packet and if there is not enough space the data must be fragmented over
# additional, partially filled packets. Fragments cause extra processing which
# wastes time getting the data out to the remote machines.
#
# We chose 1440: a 1500 MTU IPv4 network minus 60 bytes, which includes the 20 byte safety buffer.
net.inet.tcp.mssdflt=1440

# Do not create a socket or compressed tcpw (TIME_WAIT) entry for TCP connections
# restricted to the local machine; basically, connections made internally to the
# FreeBSD box itself. An example would be a web server and a database server running
# on the same machine. If the web server queries the local database server then no
# TIME_WAIT state is kept for that connection. If you do not have a lot of
# internal communication between programs this directive will not make much
# difference.
net.inet.tcp.nolocaltimewait=1

# TCP Slow Start is the congestion control mechanism which controls the growth
# of the sending rate, and FlightSize is the amount of unacknowledged "data"
# that can be on the wire at any one time. Google recommends an IW of at least
# 10, but we recommend testing higher values. An MSS of 1440 bytes * 10 initial
# congestion window = 14.4KB data burst. Though it is aggressive, we found
# values of 128, 64 and 32 work quite well for "most" internet clients. Seem
# ridiculously high? Many web browsers open around six(6) connections per page,
# per domain. If the congestion window per connection is 10 then the combined
# CWND is 60 for each domain name; well over the 10 recommended by Google.
# A slowstart_flightsize of 64 is great for SPDY enabled SSL servers. If using
# a larger flightsize make sure net.inet.tcp.sendspace and
# net.inet.tcp.recvspace are larger than 64KB or at least as large as your
# slowstart_flightsize multiplied by mssdflt.
net.inet.tcp.local_slowstart_flightsize=10
net.inet.tcp.slowstart_flightsize=10
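
# A quick sanity check with the values used here (a worked example, not an
# extra setting): flightsize * mssdflt = 10 * 1440 = 14,400 bytes per
# connection, and even an aggressive flightsize of 64 gives 64 * 1440 =
# 92,160 bytes, which still fits inside the recvspace and sendspace buffers
# set further down in this file.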

# Make sure RFC 1323 TCP window scaling and timestamps are enabled; the
# slowstart flightsize and large buffer values in this file depend on window scaling.
net.inet.tcp.rfc1323=1

# Make sure rfc3390 is DISABLED so the slowstart flightsize values are used.
net.inet.tcp.rfc3390=0

# size of the TCP transmit and receive buffer. If you are running the machine
# as a dedicated web server and do not accept large uploads you may want to
# decrease net.inet.tcp.recvspace to 8192 to resist DDoS attacks from using up
# all your RAM for false connections. In nginx make sure to set "listen 80
# default rcvbuf=8k;" as well. Generally, we suggest setting both send and
# receive space to larger than 65535 if you have a good amount of RAM; 8 gig or
# more. A client wishing to send data at high rates may need to set its own
# receive buffer to something larger than 64k bytes before the connection
# opens to ensure the server properly negotiates WSCALE. If running a web
# server and you have a lot of spare ram then set the send space to the total
# size in bytes of a standard user request. For example, if a user requests
# your home page and it has 2 pictures, css and index.html equaling 212
# kilobytes then set the sendspace to something like 262144 (256K). This will
# let the web server dump the entire requested page set into the network buffer
# getting more data on the wire faster and freeing web server resources.  By
# increasing the sendspace to a value larger than the whole page requested we
# saved 200ms on the web server client response time. Every millisecond counts.
net.inet.tcp.sendspace=262144 # default 65536
net.inet.tcp.recvspace=131072 # default 32768

# Increase auto-tuning TCP step size of the TCP transmit and receive buffers.
# The buffer starts at "net.inet.tcp.sendspace" and "net.inet.tcp.recvspace"
# and increases by this increment as needed.
# We will increase the
# recvbuf_inc since we can receive data at 1gig/sec. We only send 256K web page
# data sets at a time and the net.inet.tcp.sendspace is already big enough.
#net.inet.tcp.sendbuf_inc=16384  #  8192 default
net.inet.tcp.recvbuf_inc=524288  # 16384 default

# somaxconn is the buffer or backlog queue depth for accepting new TCP
# connections. This is NOT the total amount of connections the server can
# receive. Let's say your web server can accept 1000 connections/sec and your
# clients are temporarily bursting in at 1500 connections per second. You may want
# to set somaxconn to 1500 to give a 1500 deep connection buffer so the extra
# 500 clients do not get denied service. Also, a large listen queue will do a
# better job of avoiding Denial of Service (DoS) attacks, _IF_ your application
# can handle the TCP load and at the cost of a bit more RAM. 
kern.ipc.somaxconn=1024 # 128 default
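
# FreeBSD's netstat can show the listen queues in use so you can tell whether
# the backlog is deep enough (a verification sketch, not an extra setting):
#   netstat -Lan
#   sysctl kern.ipc.somaxconn
# A qlen that sits at or near its limit means the backlog, or the rate at
# which the application calls accept(), needs to grow.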

# Reduce the amount of SYN/ACKs we will _retransmit_ to an unresponsive initial
# connection. On the initial connection our server will always send a SYN/ACK
# in response to the clients initial SYN. Limiting retransmitted SYN/ACKS
# reduces local cache size and a "SYN flood" DoS attack's collateral damage by
# not sending SYN/ACKs back to spoofed ips. If we do continue to send SYN/ACKs
# to spoofed IPs they may send RST back to us and an "amplification" attack
# would begin against our host.
# ~jlemon/papers/syncache.pdf
# 
# If the client does not get our original SYN/ACK we will retransmit one more
# SYN/ACK, but no more. It is up to the client to ask again for our service if
# they can not connect in three(3) seconds. An additional retransmit
# corresponds to 1 (original) + 2 (retransmit) = 3 seconds, and the odds are
# that if a connection cannot be established by then, the user has given up. 
net.inet.tcp.syncache.rexmtlimit=1

# Syncookies have a certain number of advantages and disadvantages. Syncookies
# are useful if you are being DoS attacked as this method helps filter the
# proper clients from the attack machines. But, since the TCP options from the
# initial SYN are not saved in syncookies, the tcp options are not applied to
# the connection, precluding use of features like window scale, timestamps, or
# exact MSS sizing. As the returning ACK establishes the connection, it may be
# possible for an attacker to ACK flood a machine in an attempt to create a
# connection. Another problem is that an attacker who overflows the syncache to
# the point of receiving a valid SYN cookie can also include a data payload. The
# attacker can then send data to a FreeBSD network daemon, even using a spoofed
# source IP address, and have FreeBSD process that data, which is not something
# the attacker could do without SYN cookies. Even though syncookies can be
# helpful, we are going to disable them at this time.
net.inet.tcp.syncookies=0
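
# The standard TCP statistics include syncache, syncookie and listen queue
# counters, which are handy for watching behaviour during a SYN flood (a
# verification sketch, not an extra setting):
#   netstat -s -p tcp | grep -i -E 'syncache|cookie|listen'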

# General Security and DoS mitigation.
dev.igb.0.fc=0                        # disable flow control for intel nics
net.inet.ip.check_interface=1         # verify packet arrives on correct interface
net.inet.ip.portrange.randomized=1    # randomize outgoing upper ports
net.inet.ip.process_options=0         # IP options in the incoming packets will be ignored
net.inet.ip.random_id=1               # assign a random IP_ID to each packet leaving the system
net.inet.ip.redirect=0                # do not send IP redirects
net.inet.ip.accept_sourceroute=0      # drop source routed packets since they can not be trusted
net.inet.ip.sourceroute=0             # if source routed packets are accepted the route data is ignored
#net.inet.ip.stealth=1                # do not reduce the TTL by one(1) when a packet goes through the firewall
net.inet.icmp.bmcastecho=0            # do not respond to ICMP packets sent to IP broadcast addresses
net.inet.icmp.maskfake=0              # do not fake reply to ICMP Address Mask Request packets
net.inet.icmp.maskrepl=0              # replies are not sent for ICMP address mask requests
net.inet.icmp.log_redirect=0          # do not log redirected ICMP packet attempts
net.inet.icmp.drop_redirect=1         # no redirected ICMP packets
#net.inet.icmp.icmplim=50             # 50 ICMP packets per second. a reasonable number for a small office.
net.inet.tcp.delayed_ack=1            # delay acks so they can be combined into other packets to increase bandwidth
net.inet.tcp.drop_synfin=1            # SYN/FIN packets get dropped on initial connection
net.inet.tcp.ecn.enable=1             # explicit congestion notification (ecn) warning: some ISP routers abuse it
net.inet.tcp.fast_finwait2_recycle=1  # recycle FIN/WAIT states quickly (helps against DoS, but may cause false RST)
net.inet.tcp.icmp_may_rst=0           # icmp may not send RST to avoid spoofed icmp/udp floods
net.inet.tcp.maxtcptw=15000           # max number of tcp time_wait states for closing connections
net.inet.tcp.msl=5000                 # 5 second maximum segment life (helps a bit against DoS)
net.inet.tcp.path_mtu_discovery=0     # disable MTU discovery since most ICMP packets are dropped by others
#net.inet.tcp.sack.enable=0           # sack disabled? http://www.ibm.com/developerworks/linux/library/l-tcp-sack/index.html
net.inet.udp.blackhole=1              # drop udp packets destined for closed sockets
net.inet.tcp.blackhole=2              # drop tcp packets destined for closed ports
#net.route.netisr_maxqlen=4096        # route queue length defaults 4096 (rtsock using "netstat -Q")
security.bsd.see_other_uids=0         # unprivileged users may only see their own processes, not those of other uids

# decrease the scheduler maximum time slice for lower latency program calls.
# by default we use stathz/10 which equals twelve(12). also, decrease the
# scheduler maximum time for interactive programs as this is a dedicated server
# (default 30). Do NOT use these two settings if this machine is a desktop with
# graphical X as mouse and window performance will suffer. 
# kern.sched.interact=5
# kern.sched.slice=3

# security settings for jailed environments. it is generally a good idea to
# separately jail any service which is accessible by an external client, like
# your web or mail server. This is especially true for public facing services.
# take a look at ezjail.
security.jail.allow_raw_sockets=1
security.jail.enforce_statfs=2
security.jail.set_hostname_allowed=0
security.jail.socket_unixiproute_only=1
security.jail.sysvipc_allowed=0
security.jail.chflags_allowed=0

# Spoofed packet attacks may be used to overload the kernel route cache. A
# spoofed packet attack using a random source IP will cause the kernel to
# generate a temporary cached route in the route table. Setting rtexpire and
# rtminexpire to two(2) seconds should be sufficient to protect the route table
# from attack.
# http://www.freebsd.org/doc/en/books/handbook/securing-freebsd.html
net.inet.ip.rtexpire=60      # 3600 secs
net.inet.ip.rtminexpire=2    # 10 secs
net.inet.ip.rtmaxcache=1024  # 128 entries

######### OFF BELOW HERE #########
#
# Other options not used, but included for future reference. We found the
# following directives did not increase the speed or efficiency of our firewall
# over the defaults set by the developers. 

# ZFS - Set TXG write limit to a lower threshold. This helps "level out" the
# throughput rate (see "zpool iostat").  A value of 256MB works well for
# systems with 4 GB of RAM, while 1 GB works well for us w/ 8 GB on disks which
# have 64 MB cache.
#vfs.zfs.write_limit_override=1073741824

# Time before a delayed ACK is sent (default 100ms). By default, the ack is
# delayed 100 ms or sent every other packet in order to improve its chances of
# being added to a return data packet. This method can cut the number of tiny
# packets flowing across the network in half and is efficient. Setting
# delayed_ack to zero(0) will produce twice as many small packets on the
# network without much benefit. Setting delacktime higher than 100 seems to
# slow down downloads as ACKs are queued too long.
#net.inet.tcp.delayed_ack=1   # 1 default
#net.inet.tcp.delacktime=100  # 100 default

# maximum incoming and outgoing ip4 network queue sizes. If, and only if,
# "sysctl net.inet.ip.intr_queue_drops" is greater
# than zero, increase these values until queue_drops stays at zero(0).
#net.inet.ip.intr_queue_maxlen=4096
#net.route.netisr_maxqlen=4096

# increase buffers for communicating across localhost. If you run many high
# bandwidth services on lo0 like a local DB server or many jails on lo0 then
# these might help. 
#net.local.stream.sendspace=163840    # lo0 mtu 16384 x 10
#net.local.stream.recvspace=163840    # lo0 mtu 16384 x 10

# UFS hard drive read ahead equivalent to 4 MiB at 32KiB block size. Easily
# increases read speeds from 60 MB/sec to 80 MB/sec on a single spinning hard
# drive.  OCZ Vertex 4 SSD drives went from 420 MB/sec to 432 MB/sec (SATA 6).
# use bonnie to performance test file system I/O
#vfs.read_max=128

# global limit for number of sockets in the system. If kern.ipc.numopensockets
# plus net.inet.tcp.maxtcptw is close to kern.ipc.maxsockets then increase this
# value
#kern.ipc.maxsockets = 25600

# spread tcp timer callout load evenly across cpus. We did not see any speed
# benefit from enabling per cpu timers. The default is off(0)
#net.inet.tcp.per_cpu_timers = 0

# disable harvesting entropy for /dev/random from the following devices.
# Truthfully, disabling entropy harvesting does _not_ save much CPU or
# interrupt time. We noticed setting the following to zero(off) increased
# bandwidth by 0.5% or 2Mb/sec on a 1Gb link. We prefer the entropy and leave
# these on(1).
#kern.random.sys.harvest.interrupt = 1
#kern.random.sys.harvest.ethernet = 1
#kern.random.sys.harvest.point_to_point = 1

# Increase maxdgram length for jumbo frames (9000 mtu) OSPF routing. Safe for
# 1500 mtu too.
#net.inet.raw.maxdgram=9216
#net.inet.raw.recvspace=9216 

# IPv6 Security
# For more info see http://www.fosslc.org/drupal/content/security-implications-ipv6
# Disable Node info replies
# To see this vulnerability in action run `ping6 -a sglAac ::1` or `ping6 -w ::1` on unprotected node
#net.inet6.icmp6.nodeinfo=0
# Turn on IPv6 privacy extensions
# For more info see proposal 
#net.inet6.ip6.use_tempaddr=1
#net.inet6.ip6.prefer_tempaddr=1
# Disable ICMP redirect
#net.inet6.icmp6.rediraccept=0
# Disable acceptance of RA and auto linklocal generation if you don't use them
##net.inet6.ip6.accept_rtadv=0
##net.inet6.ip6.auto_linklocal=0

#
### EOF ###
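
The settings in /etc/sysctl.conf are applied automatically at boot. Any individual directive can also be set immediately from the command line and then spot-checked, for example (net.inet.tcp.cc.algorithm=htcp only works once cc_htcp_load="YES" from /boot/loader.conf is active):

# sysctl net.inet.tcp.cc.algorithm=htcp
# sysctl net.inet.tcp.mssdflt kern.ipc.somaxconn net.inet.tcp.sendspace net.inet.tcp.recvspace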



OPTIONAL: Rebuilding the Kernel

Rebuilding the FreeBSD kernel is completely optional and will not affect the speed test we show. We simply wanted to include the modifications we made to the kernel for completeness.

#############################################

## Rebuild the kernel with the following
# cd /usr/src/sys/amd64/conf
# cp GENERIC CALOMEL
# vi CALOMEL   (then add the lines below for /usr/src/sys/amd64/conf/CALOMEL)
# cd /usr/src/ && make buildkernel KERNCONF=CALOMEL && make installkernel KERNCONF=CALOMEL && make clean

ident           CALOMEL

# Enable Pf and ALTQ with HFSC
device pf
device pflog
device pfsync
options         ALTQ
options         ALTQ_HFSC
options         ALTQ_NOPCC

# NOTE about ALTQ: We only built the kernel with ALTQ to stop the 'No ALTQ'
# warning messages. We do not use ALTQ in Pf because it is very inefficient.
# You will lose as much as 10% of your network throughput by simply enabling
# ALTQ in Pf.

# allow forwarding packets without touching the TTL to
# help to make this machine more transparent
options         IPSTEALTH

# Eliminate datacopy on socket read-write (nginx)
options         ZERO_COPY_SOCKETS

# A framework for very efficient packet I/O from userspace, capable of line
# rate at 10G (FreeBSD 10+)
#device netmap

# Change kernel messages color
options         SC_KERNEL_CONS_ATTR=(FG_YELLOW|BG_BLACK)
options         SC_HISTORY_SIZE=8192

# Increase maximum size of Raw I/O and sendfile(2) readahead
options MAXPHYS=(1024*1024)
options DFLTPHYS=(1024*1024)

#############################################
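
Once the new kernel is installed and the box rebooted, uname will confirm which kernel configuration is running:

# uname -i     (prints the kernel ident, CALOMEL in this example)
# uname -v     (full version string including the config name and build time)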




Do these optimizations really make a difference for a web server?

Let's take a look at the web server performance for our server, calomel.org, before and after modifications. Keep in mind the graphs are the result of the exact same hardware, the same network, the same files, and access is 100% public requests. We are only graphing successful requests (code 200), not code 301 redirections or errors.

With Nginx you can set up the log lines to tell you how long it took to fulfill the client request and complete the data transfer to the client. We have an example of how to set up this log format on our Nginx Secure Web Server page. Using the log data we can graph the nginx response times with our calomel_http_log_distribution_performance.pl script.
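
As a rough sketch (assuming a stock nginx; $request_time is nginx's measure of the time spent serving the request, in seconds with millisecond resolution), the log format added to the http block could look something like this:

log_format timed '$remote_addr [$time_local] "$request" $status $body_bytes_sent $request_time';
access_log /var/log/nginx/access.log timed;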

FreeBSD 9 before optimization: In the following graph we are displaying the time in 100 millisecond increments against the number of completed object transfers for the last ten thousand (10,000) log lines. The log was collected BEFORE our speed optimizations using a default FreeBSD 9 install. Keep in mind an "object" is an html, jpg, css or any other file. File sizes range from 24 kilobytes up to 350 kilobytes. On the left hand side is the tall vertical line at 0 seconds going up to 7142 objects, with a few smaller lines on the bottom left. The tall line at 0 seconds tells us that for the last 10,000 objects served, the web server was able to send and complete the network transfer in zero(0) seconds (i.e. less than 100 milliseconds) 71.42% of the time (7142/10000*100=71.42). Keep in mind calomel.org is an SSL enabled site and this time also includes the https negotiation phase.

calomel@freebsd9:  ./calomel_http_log_distribution_performance.pl

   .:.  Calomel Webserver Distribution Statistics

         Log lines: 10000, Search string(s):   
       __________________________________________________________________
  7142 |.................................................................
       |.................................................................
  6157 |_________________________________________________________________
       |.................................................................
  5172 |.................................................................
       |_________________________________________________________________
  4679 |.................................................................
       |.................................................................
  3694 |_________________________________________________________________
       |.................................................................
  2709 |.................................................................
       |_________________________________________________________________
  2216 |.................................................................
       |.................................................................
  1231 |_________________________________________________________________
       |||...............................................................
   246 ||||..............................................................
       |||||||||||||||||||||||||||||_|||_|_||_|_|_|_|_|__|_______________
 Time: |^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|
       0   0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  4.6  5.0  5.5  6.0



FreeBSD 9 after optimization: The following graph shows the results after we applied the optimizations found at the beginning of this page. Compared to the previous graph we can see our FreeBSD based Nginx server is able to serve 95.31% of the objects in less than 100ms (9531/10000*100=95.31), compared to just 71.42%. FreeBSD and Nginx are fast and we just made them faster.

calomel@freebsd9:  /storage/tools/web_server_distribution_performance.pl 

   .:.  Calomel Webserver Distribution Statistics

         Log lines: 10000, Search string(s):   
       __________________________________________________________________
  9531 |.................................................................
       |.................................................................
  8216 |_________________________________________________________________
       |.................................................................
  6902 |.................................................................
       |_________________________________________________________________
  6244 |.................................................................
       |.................................................................
  4930 |_________________________________________________________________
       |.................................................................
  3615 |.................................................................
       |_________________________________________________________________
  2958 |.................................................................
       |.................................................................
  1643 |_________________________________________________________________
       |.................................................................
   328 |.................................................................
       ||||||||||||__||||_|_|_|_|________________|_______________________
 Time: |^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|
       0   0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  4.6  5.0  5.5  6.0



OpenBSD 5.x for comparison: Here is the run on our log from when we were running OpenBSD. OpenBSD is very secure and has the latest Pf, but it is not considered the fastest of operating systems. Notice the graph shows 1042 requests completed in the 0.1 second bucket (i.e. less than 200 milliseconds), which equals 10.42% out of a total of 10000. With the FreeBSD optimizations enabled we are able to complete transactions more than 18 times faster ((9531/.1)/(1042/0.2)=18.29). We really like OpenBSD, but we have to admit it is not suited for a high speed, low latency web server.

calomel@openbsd5.x:  ./calomel_http_log_distribution_performance.pl

   .:.  Calomel Webserver Distribution Performance Statistics

         Log lines: 10000, Search string(s):
       __________________________________________________________________
  1042 .|................................................................
       .|................................................................
   898 _|________________________________________________________________
       .||...............................................................
   754 .|||..............................................................
       _|||______________________________________________________________
   682 .||||.............................................................
       .||||.............................................................
   539 _||||_____________________________________________________________
       .||||..||.........................................................
   395 .||||||||.........................................................
       _|||||||||________________________________________________________
   251 .||||||||||.......................................................
       .|||||||||||......................................................
   179 _||||||||||||_____________________________________________________
       .||||||||||||.....................................................
    35 ||||||||||||||||||||....||........................................
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
 Time: |^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|^^^^|
       0   0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  4.6  5.0  5.5  6.0



