Six Things First-Time Squid Administrators Should Know
by Duane Wessels, author of Squid: The Definitive Guide
02/12/2004
New users often struggle with the same frustrating set of Squid
idiosyncrasies. In this article, I'll detail six things you should know
about using Squid from the get-go. Even if you're an experienced Squid
administrator, you might want to look at these tips and give your
configuration file a sanity check, especially the one about preventing
spam.
1. File Descriptor Limits
File descriptor limits are a common problem for new Squid users. This
happens because some operating systems have relatively low per-process
and system-wide limits. In some cases, you must take steps to tune your
system before compiling Squid.
A file descriptor is simply a number that represents an open file or
socket. Every time a process opens a new file or socket, it allocates a
new file descriptor. These descriptors are reused after the file or
socket is closed. Most Unix systems place a limit on the number of
simultaneously open file descriptors. There are both per-process and
per-system limits.
How many file descriptors does Squid need? The answer depends on how
many users you have, the size of your cache, and which particular
features you have enabled. Here are some of the things that consume
file descriptors in Squid:
Client-side TCP connections
Server-side TCP connections
Writing cachable responses to disk
Reading cache hits from disk
Log files
Communication with external helper processes, such as redirectors and authenticators
Idle (persistent) HTTP connections
Even when Squid is not doing anything, it has some number of file
descriptors open for log files and helpers. In most cases, this is
between 10 and 25, so it's probably not a big deal. If you have a lot
of external helpers, that number goes up. However, the file descriptor
count really goes up once Squid starts serving requests. In the worst
case, each concurrent request requires three file descriptors: the
client-side connection, a server-side connection for cache misses, and
a disk file for reading hits or writing misses.
A Squid cache with just a few users might be able to get by with a file
descriptor limit of 256. For a moderately busy Squid, 1024 is a better
limit. Very busy caches should use 4096 or more. One thing to keep in
mind is that file descriptor usage often surges above the normal level
for brief amounts of time. This can happen during short, temporary
network outages or other interruptions in service.
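As a rough sanity check, you can estimate your own requirement from the worst case described above. This sketch assumes a hypothetical busy-hour peak of 300 concurrent requests; substitute a number that matches your traffic:

```shell
# Back-of-envelope descriptor estimate. Worst case from the text:
# three descriptors per concurrent request (client socket, server
# socket, disk file), plus roughly 25 for logs and helpers.
# peak_requests=300 is a hypothetical value, not a measurement.
peak_requests=300
echo "estimated descriptors: $(( peak_requests * 3 + 25 ))"
```

For 300 concurrent requests that works out to 925 descriptors, which is why 1024 is a sensible floor for a moderately busy cache.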
There are a number of ways to determine the file descriptor limit on
your system. One is to use the built-in shell commands limit or ulimit.
For Bourne shell users:
root# ulimit -n
1024
For C shell users:
root# limit desc
descriptors 1024
If you already have Squid compiled and installed, you can just look at the cache.log file for a line like this:
2003/12/12 11:10:54| With 1024 file descriptors available
If Squid detects a file descriptor shortage while it is running, you'll see a warning like this in cache.log:
WARNING! Your cache is running out of file descriptors
If you see the warning, or know in advance that you'll need more file
descriptors, you should increase the limits. The technique for
increasing the file descriptor limit varies between operating systems.
For Linux Users
Linux users need to edit one of the system include files and twiddle
one of the system parameters via the /proc interface. First, edit
/usr/include/bits/types.h and change the value for __FD_SETSIZE. Then,
give the kernel a new limit with this command:
root# echo 1024 > /proc/sys/fs/file-max
Finally, before compiling or running Squid, execute this shell command to set the process limit equal to the kernel limit:
root# ulimit -Hn 1024
After you have set the limit in this manner, you'll need to
reconfigure, recompile, and reinstall Squid. Also note that these two
commands do not permanently set the limit. They must be executed each
time your system boots. You'll want to add them to your system startup
scripts.
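For example, on a Linux system that reads /etc/rc.d/rc.local at boot (the location is distribution-dependent; adjust for yours), you might append the two commands from above:

```shell
# /etc/rc.d/rc.local fragment (hypothetical path; varies by distro).
# Raise the kernel-wide and per-process descriptor limits at each boot.
echo 1024 > /proc/sys/fs/file-max
ulimit -Hn 1024
```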
For NetBSD/OpenBSD/FreeBSD Users
On BSD-based systems, you'll need to compile a new kernel. The kernel
configuration file lives in a directory such as /usr/src/sys/i386/conf
or /usr/src/sys/arch/i386/conf. There you'll find a file, possibly
named GENERIC, to which you should add a line like this:
options MAXFILES=8192
For OpenBSD, use option instead of options. Reboot your system after
you've finished configuring, compiling, and installing your new kernel.
Then, reconfigure, recompile, and reinstall Squid.
For Solaris Users
Add this line to your /etc/system file:
set rlim_fd_max = 1024
Then, reboot the system, reconfigure, recompile, and reinstall Squid.
For further information on file descriptor limits, see Chapter 3,
"Compiling and Installing", of Squid: The Definitive Guide or section
11.4 of the Squid FAQ.
2. File and Directory Permissions
Directory permissions are another problem that first-time users often
encounter. One of the reasons for this difficulty is that, in the
interest of security, Squid refuses to run as root. Furthermore, if you
do start Squid as root, it switches to a default user ("nobody")
that has no special privileges. If you don't want to use the "nobody"
userid, you can set your own with the cache_effective_user directive in
the configuration file.
Certain files and directories must be writable by the Squid userid.
These include the log files, usually found in
/usr/local/squid/var/logs, and the cache directories,
/usr/local/squid/var/cache by default.
As an example, let's assume that you're using the "nobody" userid for
Squid. After running make install, you can use this command to set the
permissions for the log files and cache:
root# chown -R nobody /usr/local/squid/var/logs
root# chown -R nobody /usr/local/squid/var/cache
Then, you can proceed to initialize the cache directories with this command:
root# /usr/local/squid/sbin/squid -z
Helper processes are another source of potential permission problems.
Squid spawns the helper processes as the unprivileged user (that is, as
"nobody").
This usually means that the helper program must have read and execute
permissions for everyone (for example, -rwxr-xr-x). Furthermore, any
configuration or password files that the helper needs must have
appropriate read permissions as well.
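You can demonstrate the required mode bits with a throwaway script standing in for a real helper (which would live somewhere like /usr/local/squid/libexec/):

```shell
# Create a stand-in helper and give it the -rwxr-xr-x permissions
# that an unprivileged Squid user needs to read and execute it.
helper=/tmp/demo_helper
printf '#!/bin/sh\necho OK\n' > "$helper"
chmod 755 "$helper"    # -rwxr-xr-x: readable and executable by everyone
"$helper"              # any user can now run it
```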
Note that Unix also requires correct permissions on parent directories
leading to a file. For example, if /usr/local/squid is owned by root
with drwxr-x--- permissions, the user nobody will not be able to access
any of the directories underneath it. /usr/local/squid should be
"drwxr-xr-x" instead.
You may want to debug file or directory permission problems from a
shell window. If Squid runs as nobody, then start a shell process as
user nobody:
root# su - nobody
(You may have to temporarily change "nobody"'s home directory and shell
program for this to work.) Then, try to read, write, or execute the
files that are giving you trouble. For example:
nobody$ cd /usr
nobody$ cd local
nobody$ cd squid
nobody$ cd var
nobody$ cd logs
nobody$ touch cache.log
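If you cannot easily switch to the nobody user, the same walk can be scripted. This sketch runs the checks against a throwaway /tmp tree (a stand-in for the real install prefix): each parent directory needs the x (search) bit, and the final directory needs the w bit.

```shell
# Recreate the directory walk above against a disposable /tmp tree.
# /tmp/squid-demo stands in for /usr/local/squid; the tests mirror
# what Unix checks at each step: x (search) on parents, w on target.
dir=/tmp/squid-demo/var/logs
mkdir -p "$dir"
p=""
for comp in $(echo "$dir" | tr '/' ' '); do
    p="$p/$comp"
    test -x "$p" && echo "searchable: $p"
done
test -w "$dir" && echo "writable: $dir"
```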
3. Controlling Squid's Memory Usage
Squid tends to be a bit of a memory hog. It uses memory for many
different things, some of which are easier to control than others.
Memory usage is important because if the Squid process size exceeds
your system's RAM capacity, some chunks of the process must be
temporarily swapped to disk. Swapping can also happen if you have other
memory-hungry applications running on the same system. Swapping causes
Squid's performance to degrade very quickly.
An easy way to monitor Squid's memory usage is with standard system
tools such as top and ps. You can also ask Squid itself how much memory
it is using, through either the cache manager or SNMP interfaces. If
the process size becomes too large, you'll want to take steps to reduce
it. A good rule of thumb is to not let Squid's process size exceed 60%
to 80% of your RAM capacity.
One of the most important uses for memory is the main cache index. This
is a hash table that contains a small amount of metadata for each
object in the cache. Unfortunately, all of these "small" data
structures add up to a lot when Squid contains millions of objects. The
only way to control the size of the in-memory index is to change
Squid's disk cache size (with the cache_dir directive). Thus, if you
have plenty of disk space, but are short on RAM, you may have to leave
the disk space underutilized.
Squid's in-memory cache can also use significant amounts of RAM. This
is where Squid stores incoming and recently retrieved objects. Its size
is controlled by setting the cache_mem directive. Note that the
cache_mem directive only affects the size of the memory cache, not
Squid's entire memory footprint.
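For example, to cap the memory cache at 32 MB (an illustrative value; the right number depends on how much RAM you can spare after the cache index and I/O buffers):

```
# squid.conf: size of the in-memory object cache only --
# this does not limit Squid's total process size.
cache_mem 32 MB
```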
Squid also uses some memory for various I/O buffers. For example, each
time a client makes an HTTP request to Squid, a number of memory
buffers are allocated and then later freed. Squid uses similar buffers
when forwarding requests to origin servers, and when reading and
writing disk files. Depending on the amount and type of traffic coming
to Squid, these I/O buffers may require a lot of memory. There's not
much you can do to control memory usage for these purposes. However,
you can try changing the TCP receive buffer size with the
tcp_recv_bufsize directive.
If you have a large number of clients accessing Squid, you may find
that the "client DB" consumes more memory than you would like. It keeps
a small number of counters for each client IP address that sends
requests to Squid. You can reduce Squid's memory usage a little by
disabling this feature. Simply put client_db off in squid.conf.
Another thing that can help is to simply restart Squid periodically,
say, once per week. Over time, something may happen (such as a network
outage) that causes Squid to temporarily allocate a large amount of
memory. Even though Squid may not be using that memory, it may still be
attached to the Squid process. Restarting Squid allows your operating
system to truly free up the memory for other uses.
You can use Squid's high_memory_warning directive to warn you when its
memory size exceeds a certain limit. For example, add a line like this
to squid.conf:
high_memory_warning 400 MB
Then, if the process grows beyond that value, Squid writes warnings to cache.log (and to syslog, if configured).
4. Rotating the Log Files
Squid writes to various log and journal files as it runs. These files
will continually increase in size unless you take steps to "rotate"
them. Rotation refers to the process of closing a log file, renaming
it, and opening a new log file. It's similar to the way that most
systems deal with their syslog files, such as /var/log/messages.
If you don't rotate the log files, they may eventually consume all free
space on that partition. Some operating systems, such as Linux, cannot
support files larger than 2GB. When a log file reaches that size, you'll
get a "File too large" error message, and Squid will complain and restart.
To avoid such problems, create a cron job that periodically rotates the log files. It can be as simple as this:
0 0 * * * /usr/local/squid/sbin/squid -k rotate
In most cases, daily log file rotation is the most appropriate. A not-so-busy cache can get by with weekly or monthly rotation.
Squid appends numeric suffixes to rotated log files. Each time you run
squid -k rotate, each file's numeric suffix is incremented by one.
Thus, cache.log.0 becomes cache.log.1, cache.log.1 becomes cache.log.2,
and so on. The logfile_rotate directive specifies the maximum number of
old files to keep around.
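For example, to keep ten generations of each log (ten is an illustrative value; pick one that fits your disk space and retention needs):

```
# squid.conf: keep cache.log.0 through cache.log.9, access.log.0
# through access.log.9, and so on; older files are discarded.
logfile_rotate 10
```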
Logfile rotation affects more than just the log files in
/usr/local/squid/var/logs. It also generates new swap.state files for
each cache directory. However, Squid does not keep old copies of the
swap.state files. It simply writes a new file from the in-memory index
and forgets about the old one.
5. Understanding Squid's Access Control Syntax
Squid has an extensive, but somewhat confusing, set of access controls.
The most important thing to understand is the difference between ACL
types, elements, and rules, and how they work together to allow or deny
access.
Squid has about 20 different ACL types. These refer to certain aspects
of an HTTP request or response, such as the client's IP address (the
src type), the origin server's hostname (the dstdomain type), and the
HTTP request method (the method type).
An ACL element consists of three components: a type, a name, and one or
more type-specific values. Here are some simple examples:
acl Foo src 1.2.3.4
acl Bar dstdomain
acl Baz method GET
The above ACL element named Foo would match a request that comes from
the IP address 1.2.3.4. The ACL named Bar matches a URL.
The Baz ACL matches an HTTP GET request. Note that we are not allowing
or denying anything yet.
For most of the ACL types, an element can have multiple values, like this:
acl Argle src 1.1.1.8 1.1.1.28 1.1.1.88
acl Bargle dstdomain
acl Fraggle method PUT POST
A multi-valued ACL matches a request when any one of the values is a
match. They use OR logic. The Argle ACL matches a request from 1.1.1.8,
from 1.1.1.28, or from 1.1.1.88. The Bargle ACL matches requests to
NBC, ABC, or CBS web sites. The Fraggle ACL matches a request with the
methods PUT or POST.
Now that you're an expert in ACL elements, it's time to graduate to ACL
rules. These are where you say that a request is allowed or denied.
Access list rules refer to ACL elements by their names and contain
either the allow or deny keyword. Here are some simple examples:
http_access allow Foo
http_access deny Bar
http_access allow Baz
It is important to understand that access list rules are checked in
order and that the decision is made when a match is found. Given the
above list, let's see what happens when a user from 1.2.3.4 makes a GET
request for . Squid encounters the allow Foo rule first. Our
request matches the Foo ACL, because the source address is 1.2.3.4, and
the request is allowed to proceed. The remaining rules are not checked.
How about a PUT request for from 5.5.5.5? The request does
not match the first rule. It does match the second rule, however. This
access list rule says that the request must be denied, so the user
receives an error message from Squid.
How about a GET request for from 5.5.5.5? The request
does not match the first rule (allow Foo). It does not match the second
rule, either, because is different than .
However, it does match the third rule, because the request method is
GET.
Of course, these simple ACL rules are not very interesting. The real
power comes from Squid's ability to combine multiple elements on a
single rule. When a rule contains multiple elements, each element must
be a match in order to trigger the rule. In other words, Squid uses AND
logic for access list rules. Consider this example:
http_access allow Foo Bar
http_access deny Foo
The first rule says that a request from 1.2.3.4 AND for
will be allowed. However, the second rule says that any other request
from 1.2.3.4 will be denied. These two lines restrict the user at
1.2.3.4 to visiting only the site. Here's an even more
complex example:
http_access deny Argle Bargle Fraggle
http_access allow Argle Bargle
http_access deny Argle
These three lines allow the Argle clients (1.1.1.8, 1.1.1.28, and 1.1.1.88)
to access the Bargle servers (, , and
), but not with PUT or POST methods. Furthermore, the Argle
clients are not allowed to access any other servers.
One of the common mistakes often made by new users is to write a rule
that can never be true. It is easy to do if you forget that Squid uses
AND logic on rules and OR logic on elements. Here is a configuration
that can never be true:
acl A src 1.1.1.1
acl B src 2.2.2.2
http_access allow A B
The reason is that a request cannot be from both 1.1.1.1 AND 2.2.2.2 at
the same time. Most likely, it should be written like this:
acl A src 1.1.1.1 2.2.2.2
http_access allow A
Then, requests from either 1.1.1.1 or 2.2.2.2 are allowed.
Access control rules can become long and complicated. When adding a new
rule, how do you know where it should go? You should put more-specific
rules before less-specific ones. Remember that the rules are checked in
order. When adding a rule, go through the current rules in your head
and see where the new one fits. For example, let's say that you want to
deny requests to a certain site, but allow all others. It should look
like this:
acl XXX dstdomain
acl All src 0/0
http_access deny XXX
http_access allow All
Now, what if you need to make an exception for one user, so that she can visit that site? The new ACL element is:
acl Admin src 3.3.3.3
and the new rule should be:
http_access allow Admin XXX
but where does it go? Since this rule is more specific than the deny XXX rule, it should go first:
http_access allow Admin XXX
http_access deny XXX
http_access allow All
If we place the new rule after deny XXX, it will never even get
checked. The first rule will always match the request and she will not
be able to visit the site.
When you first install Squid, the access control rules will deny every
request. To get things working, you'll need to add an ACL element and a
rule for your local network. The easiest way is to write a source IP
address ACL element for your subnet(s). For example:
acl MyNetwork src 192.168.0.0/24
Then, search through squid.conf for this line:
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
After that line, add an http_access line with an allow rule:
http_access allow MyNetwork
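Taken together, the relevant fragment of squid.conf looks like this (the final deny relies on the "all" ACL that the default squid.conf already defines):

```
# Minimal working access control: allow the local subnet, deny the rest.
acl MyNetwork src 192.168.0.0/24
http_access allow MyNetwork
http_access deny all
```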
Once you get this simple configuration working, feel free to move on to
some of the more advanced ACL features, such as username-based proxy
authentication.
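As a preview, a username-based setup might look something like this sketch. The ncsa_auth helper ships with Squid, but the paths shown here are assumptions based on the default install prefix; adjust them for your system:

```
# squid.conf sketch: basic proxy authentication (paths are examples)
auth_param basic program /usr/local/squid/libexec/ncsa_auth /usr/local/squid/etc/passwd
acl AuthUsers proxy_auth REQUIRED
http_access allow AuthUsers
```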
6. How to Not Be a Spam Relay
Unless you've been living under a rock, you're aware of the spam
problem on the Internet. Spam senders used to take advantage of open
email relays. These days, a lot of spam comes from open proxies. An
open proxy is one that allows outsiders to make requests through it. If
others on the Internet receive spam email from your proxy, your IP
address will be placed on one or more of the various blackhole lists.
This will adversely affect your ability to communicate with other
Internet sites.
Use the following access control rules to make sure this never happens
to you. First, always deny all requests that don't come from your local
network. Define an ACL element for your subnet:
acl MyNetwork src 10.0.0.0/16
Then, place a deny rule near the top of your http_access rules that matches requests from anywhere else:
http_access deny !MyNetwork
http_access ...
http_access ...
While that may stop outsiders, it may not be good enough. It won't stop
insiders who intentionally, or unintentionally, try to forward spam
through Squid. To add even more security, you should make sure that
Squid never connects to another server's SMTP port:
acl SMTP_port port 25
http_access deny SMTP_port
In fact, there are many well-known TCP ports, in addition to SMTP, to
which Squid should never connect. The default squid.conf includes some
rules to address this. There, you'll see a Safe_ports ACL element that
defines good ports. A deny !Safe_ports rule ensures that Squid does not
connect to any of the bad ports, including SMTP.
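The stock rules look something like this sketch (abridged; check your own squid.conf for the complete port list that ships with your version):

```
# Default-style squid.conf rules: enumerate known-good destination
# ports, then deny everything else -- including port 25 (SMTP).
acl Safe_ports port 80           # http
acl Safe_ports port 21           # ftp
acl Safe_ports port 443 563      # https, snews
acl Safe_ports port 1025-65535   # unregistered ports
http_access deny !Safe_ports
```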
Duane Wessels discovered Unix and the Internet as an undergraduate student studying physics at Washington State University.