Category: Linux
2009-03-03 16:09:14
This is a project to build filtering capabilities, comparable to those of a standalone filtering proxy, into Squid. It consists of a filtering framework and a set of filter modules. The currently available filters are described below.
Usually, a filtering proxy runs standalone and does nothing but filtering. Users have to configure this proxy in their browsers, and if they also use a caching proxy, chain it after the filter. In situations where the user runs Squid anyway (mostly for caching across different browsers or for a small LAN), it is convenient to build this capability into Squid.
You need the Squid sources, everything for compiling them, GNU patch, autoconf 2.50 and automake 1.6.
gzip -cd squid-3.0stable9-filter-0.2.patch.gz | patch -p1
sh bootstrap.sh
sh configure (options...) --enable-filters
--with-morefilters="/path/to/file.cc /path/to/other.cc.."
filter_module name [ arguments... ] [ * {allow|deny} acls... ]
This tells Squid to define a filter of the given type. Filter modules take arguments as documented for the individual modules. Arguments are separated by whitespace, with the same quoting mechanisms as used elsewhere in squid.conf. A filter type can be specified in more than one filter_module line; in that case several filter instances with different parameters are created. See below on chaining filters.
Each filter line can optionally take an ACL list. This must start with an asterisk (surrounded by whitespace), followed by either the keyword allow or deny, followed by one or more ACLs defined before the filter line.
A filter with no ACL specification is applied to every request. A filter with an ACL specification is applied to each request that the ACL denies. In other words, a request allowed by the ACL bypasses the filter.
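The ACL rule above can be sketched in Python. This is only an illustration of the stated semantics, not code from the patch: `filter_applies` and its parameters are hypothetical names, and the sketch assumes that a request matching the listed ACLs receives the action named by the keyword while a non-matching request receives the opposite one, as in ordinary Squid access lists.

```python
from typing import Optional


def filter_applies(keyword: Optional[str], acls_match: bool) -> bool:
    """Return True when the filter should process the request.

    keyword    -- "allow", "deny", or None if the filter line has no ACL part
    acls_match -- whether the request matches the ACLs on the filter line
    """
    if keyword is None:
        return True  # no ACL specification: every request is filtered
    if keyword not in ("allow", "deny"):
        raise ValueError(f"unknown keyword: {keyword!r}")
    # Assumption: a matching request gets the stated action,
    # a non-matching request gets the opposite one.
    allowed = acls_match if keyword == "allow" else not acls_match
    # The filter is applied to requests that the ACL denies.
    return not allowed
```

Under these assumptions, `* allow trusted_sites` lets requests matching `trusted_sites` through unfiltered, while `* deny risky_sites` filters only requests matching `risky_sites` (both ACL names are made up for the example).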
There is a new option for the http_port directive: the flag nofilter specifies that requests arriving on this port will not be filtered. Effectively, this runs a filtering and a non-filtering proxy at once, on different ports.
A pattern file contains regular expressions (grep -E syntax), one pattern per line, against which the URI is matched. Blank lines and lines starting with a number sign (#) are ignored in the usual fashion. Whenever a pattern file is changed, it is reloaded automatically at the next request; no reconfigure is needed. A pattern is marked as case-insensitive by prepending a dash. (To place a real dash at the start of a pattern, use a class, like [-].) Patterns may not contain literal TABs; use \t instead.
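As a rough illustration of the pattern file rules above, the following Python sketch parses a simple list and matches a URI against it. It is not the PatFile implementation: the function names are hypothetical, Python's `re` syntax only approximates grep -E syntax, and automatic reloading is left out.

```python
import re
from typing import List, Pattern


def parse_simple_list(text: str) -> List[Pattern[str]]:
    """Compile the patterns of a simple pattern list."""
    patterns = []
    for line in text.splitlines():
        # Blank lines and lines starting with a number sign are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        flags = 0
        if line.startswith("-"):
            # A leading dash marks the pattern as case-insensitive.
            flags = re.IGNORECASE
            line = line[1:]
        patterns.append(re.compile(line, flags))
    return patterns


def uri_matches(patterns: List[Pattern[str]], uri: str) -> bool:
    """Return True if any pattern matches somewhere in the URI."""
    return any(p.search(uri) for p in patterns)
```

A pattern such as `[-]ads/` keeps its literal leading dash because the dash sits inside a class, so it is not taken as the case-insensitivity marker.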
There are two types of pattern files: simple lists and replacement lists.
A replacement list maps patterns to replacements in a sed s///-like fashion. This type of pattern file is used by the redirection filter. Each line in the file consists of two elements separated by (at least) one TAB character. The first is a pattern, the second a replacement. The replacement may contain \1, \2... \9 references to parenthesized subpatterns; \0 means the whole match and \* means the complete original URI. The replacement may also contain \_0, \_1..., \_* references, which copy the same subpatterns in modified base64 encoding (see below).
A special replacement can be given as a shortcut for patterns which have no explicit replacement. This default is specified as replacement for the pattern consisting of a single exclamation mark, which should be the first line in the file. Negative match does not work in a replacement list.
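The replacement list format can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the patch's code: all function names are hypothetical, Python `re` stands in for grep -E syntax, the `\_N` base64 references are omitted, and the sketch only returns the expanded replacement string without deciding how the redirect filter uses it.

```python
import re

_REF = re.compile(r"\\([0-9*])")  # \0..\9 and \*


def expand(template, match, uri):
    """Expand \\0..\\9 and \\* references in a replacement template."""
    def sub(ref):
        key = ref.group(1)
        if key == "*":
            return uri  # the complete original URI
        index = int(key)
        if index > match.re.groups:
            return ""  # reference to a subpattern that does not exist
        return match.group(index) or ""
    return _REF.sub(sub, template)


def parse_replacement_list(text):
    """Return (default_replacement, [(regex, replacement_or_None), ...])."""
    default, rules = None, []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        pattern, _, replacement = line.partition("\t")
        replacement = replacement.lstrip("\t")  # "at least one" TAB
        if pattern == "!":
            default = replacement  # default for patterns without replacement
            continue
        flags = 0
        if pattern.startswith("-"):
            flags, pattern = re.IGNORECASE, pattern[1:]
        rules.append((re.compile(pattern, flags), replacement or None))
    return default, rules


def replacement_for(uri, default, rules):
    """Return the expanded replacement of the first matching rule, or None."""
    for regex, replacement in rules:
        match = regex.search(uri)
        if match:
            template = replacement if replacement is not None else default
            return None if template is None else expand(template, match, uri)
    return None
```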
The modified base64 encoding is base64 with the characters + / = (plus, slash, equals) replaced by - _ . (dash, underscore, dot) respectively. This yields a URL-safe encoding of request URIs or parts thereof (which may be useful for script-based postprocessing of redirect results).
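A script that postprocesses redirect results could handle the modified base64 encoding roughly as follows. This Python sketch follows the character substitution described above; the function names are hypothetical and the patch itself may implement the encoding differently.

```python
import base64

# + / = (plus, slash, equals) become - _ . (dash, underscore, dot)
_TO_MODIFIED = str.maketrans("+/=", "-_.")
_FROM_MODIFIED = str.maketrans("-_.", "+/=")


def encode_modified_base64(data: bytes) -> str:
    """Encode bytes as base64 and substitute the URL-unsafe characters."""
    return base64.b64encode(data).decode("ascii").translate(_TO_MODIFIED)


def decode_modified_base64(text: str) -> bytes:
    """Undo the substitution and decode the standard base64 result."""
    return base64.b64decode(text.translate(_FROM_MODIFIED))
```

For example, the two bytes `b"\xfb\xff"` encode to `+/8=` in standard base64 and to `-_8.` in the modified form.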
A request_header_replace clause must be set up to filter out the Accept-Encoding and Accept-Ranges request headers:
request_header_replace Accept-Encoding identity
request_header_replace Accept-Ranges none
See below for the exact reason.
Currently there are the following filters:
script: removes JavaScript (SCRIPT tags, on... handlers and browser-specific ways of inserting JavaScript into tag attributes) from HTML pages. (To also block JavaScript files, use an ACL against the "application/x-javascript" file type.)
activex: disables ActiveX OBJECT tags in HTML pages. The tags are preserved; only the classid parameter is replaced by a dummy, so the page will still be processed correctly (as if by a non-ActiveX browser). This filter takes a pattern file as an optional argument. This file contains a list of CLSIDs which are allowed through.
Each content filter specifies the MIME content type(s) to which it applies (like image/gif for the gifanim module) and ignores all other types.
Content filters can be chained. When more than one filter applies to a given MIME content type, every filter operates on the results of its predecessor.
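The chaining rule can be pictured with a minimal Python sketch. It is a conceptual model only, assuming each filter is a function from bytes to bytes registered for one MIME type; the names and the two toy filters in the usage note are hypothetical and do not reflect the patch's C++ classes.

```python
from typing import Callable, List, Tuple

# (MIME content type, function transforming the body)
ContentFilter = Tuple[str, Callable[[bytes], bytes]]


def apply_chain(filters: List[ContentFilter], content_type: str, body: bytes) -> bytes:
    """Run every filter registered for content_type, in configuration order."""
    for mime, func in filters:
        if mime == content_type:
            # Each filter operates on the result of its predecessor.
            body = func(body)
    return body
```

With two text/html filters configured, say one that strips a marker and one that appends a comment, the second sees the output of the first, and a filter registered for image/gif never touches the HTML body.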
Filters can be bypassed for a single request by appending .X.nofilter to the host name in the URL, where X is replaced by Squid's visible host name. Example: with a Squid called squid.cache, the suffix to append is .squid.cache.nofilter.
The NOFILTER tag as part of the hostname in the URL implies that correctly written relative links, including images, linked scripts etc. on the same server, will also be unfiltered. Apply the necessary caution.
The reason for including Squid's host name is to prevent web servers from adding the NOFILTER tag to their junk banner links themselves. This works best when visible_hostname, unique_hostname and the canonical (DNS) host name of the proxy are all different and not too closely related, because the origin server sees the latter two but not the former.
Since ".nofilter" is not a valid top level domain, it can't clash with real host names.
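The NOFILTER host name rule described above can be illustrated with a short Python helper. The helper name and the example host www.example.com are hypothetical; the sketch simply appends the visible host name and the nofilter tag to the URL's host, and it drops any user information embedded in the URL.

```python
from urllib.parse import urlsplit, urlunsplit


def add_nofilter(url: str, visible_hostname: str) -> str:
    """Return url with .<visible_hostname>.nofilter appended to its host name."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no host name: {url!r}")
    netloc = f"{parts.hostname}.{visible_hostname}.nofilter"
    if parts.port is not None:
        netloc += f":{parts.port}"  # keep an explicit port
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Assuming a Squid whose visible host name is squid.cache, `add_nofilter("http://www.example.com/index.html", "squid.cache")` returns `http://www.example.com.squid.cache.nofilter/index.html`.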
Another possible way to bypass filters is to use a non-filtering port, as described above. Requests arriving on that port will always bypass all filters.
A class diagram (created with ArgoUML) for the filter classes is here: http://sites.inka.de/bigred/devel/filter-patch.zargo.
PatFile provides the pattern file facility described above. It is included in the Squid core and described in PatFile.h.
The following debug sections and levels (see the debug_options directive) are used:
Section 92 | Filter framework
Section 93 | Filter modules
Section 94 | Library modules (PatFile etc.)

Level 1 | Error messages
Level 3 | "Filter caught something" messages
Level 4 | Initialization/finalization messages
Level 5 | Initialization/finalization trace
Level 8 | Minor trace
Level 9 | Full trace (big!)
Content filters can only process unencoded data. (The script filter applied to a file with compression encoding can silently deliver corrupted files, but mostly this is caught by the HTML parser not accepting null characters.)
For this reason, the Accept-Encoding headers should always be filtered out with an appropriate header_replace clause. With Accept-Encoding: identity, the origin server is forced to always send unencoded data. Another header_replace clause, which sets the Accept-Ranges header to none, causes the client to never try Range requests, which obviously are unfilterable too.
The cache always stores unfiltered objects. Content filtering happens in the data path from cache or memory to the client. The filter object is expected to copy the data into a new buffer, so it can do anything with it, including insertions and deletions.
The only exception to the rule that filtering happens only in the path to the client is the class of filters that alter the request. This applies to the redirect module.
In a cache hierarchy, a filtering cache should only be placed at the bottom, i.e. where only clients directly access it. If another cache sits between the filter and client, that one will cache filtered pages and break the NOFILTER feature.
The load_module directive has been replaced by filter_module, with slightly different syntax.
The nofilter_port directive has been replaced by the nofilter option in http_port.
acl allow_activex url_regex "/usr/local/squid/etc/allowlist_activex"
filter_module activex * allow allow_activex
The "" around the path tell the ACL to read its patterns from a file. The syntax of this file should be compatible with the old allow lists. However, you have to reconfigure when this file is changed.
header_access clauses (use Cookie and Set-Cookie with ACLs for allow lists).
rep_mime_type ACLs.
The web page has one of the oldest and best known web filters as well as a very comprehensive covering most issues from "What is this all about?" to a list of filtering software (by now most of them are either for Windows or for pay or both, which indicates there is a real demand for filtering).
The latest release is filter 0.2 for Squid 3.0.STABLE9. Download at http://sites.inka.de/bigred/devel/squid-3.0stable9-filter-0.2.patch.gz.
For use and distribution of this package, the same terms and conditions as for the Squid package itself (i.e. the GNU General Public License) apply. Note, however, that using a version or installation setup which has the NOFILTER feature removed or restricted in any way is in gross contradiction to the author's intentions, and people who do so should feel guilty of abuse.