This is an in-depth article about how HTTP caching works in general and how it works with Qt.
What is HTTP caching?
When a browser loads a Web page, the different resources (HTML pages,
images, CSS scripts etc.) are stored locally, so that next time the
resource is retrieved, it can possibly be served from the local store
instead of loading it from the network again. This has several benefits:
- speed up: Loading resources from the cache is a lot faster than loading them from the network.
- offline usage: Pages can be displayed without being connected to the network.
- reducing load: Loading resources from a cache or a proxy reduces the load on the originating server.
This article is mostly about finding out when a resource can be
loaded from cache, and when it has to be loaded from the network.
How does caching work with the HTTP protocol?
The usual flow with HTTP caching goes like this: When the client
(usually a browser) requests a resource via HTTP GET for the first
time, it usually does not send any caching information with it. The
server responds with an HTTP 200 OK message and the data, and adds
some headers to control caching on the client side, namely:
expiration information:
When the server responds to a client request, it also states
whether the resource may be cached to disk and, if so, for how long
the client may serve it from the cache on subsequent loads. In other
words, it tells the client when the resource expires in the cache and
has to be loaded from the network again. HTTP headers used by servers
and proxies for sending expiration information are (list is not complete):
- Expires: The server tells the client the date on which the resource expires. Example: “Expires: Fri, 29 Apr 2011 09:22:59 GMT”
- Cache-Control: max-age: The server tells
the client the maximum age of a resource, i.e. how old the resource can
get while still being considered fresh. Example: The server tells the
client that the resource can be cached for one hour (= 3600 seconds): “Cache-Control: max-age=3600”.
- Cache-Control: s-maxage: Same as the
max-age case, but it applies only to shared caches (e.g. caching proxies),
where it overrides max-age; private caches (e.g. browser caches)
ignore it. Example: The server tells intermediate proxies
that the resource can be cached for one hour (= 3600 seconds): “Cache-Control: s-maxage=3600”
- Cache-Control: must-revalidate: The server
tells the client that a stale resource must not be served from the cache
without checking with the server first; other expiration information
alone does not guarantee this. For instance, a client is allowed
to serve a stale (over-aged) resource from the cache (see
QNetworkRequest::PreferCache below), so specifying “Cache-Control: max-age=0”
would not be enough in that case. Adding “must-revalidate” makes
sure the client always revalidates a stale resource with the originating
server itself (and not only with intermediate proxies). Facebook and
Twitter, for example, use this for
their front page (but usually not for elements referenced from their
front page).
- Age: Denotes the age of a resource. This
header specifies the time in seconds since the resource was
generated by the originating server. At first glance this seems
redundant, because a reply from the server should always implicitly have
an age of zero. However, the reply often does not come from the
originating server directly, but from an intermediate proxy. In that case, the “Age” header denotes the number of seconds since the proxy fetched the resource from the originating server. The “Age” value needs to be taken into account when evaluating the “max-age” directive.
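How the expiration headers above combine into a single "is this still fresh?" decision can be sketched in plain C++ as follows. This is a simplified illustration, not Qt's implementation: the struct and function names are hypothetical, all times are assumed to be already parsed into seconds since the Unix epoch, and the use of the “Date” header as the reference point for “Expires” is an assumption taken from the HTTP specification rather than from this article.

```cpp
#include <optional>

// Hypothetical container for the expiration-related values of one cached
// response; all points in time are seconds since the Unix epoch.
struct CachedResponse {
    std::optional<long long> maxAgeSeconds; // from "Cache-Control: max-age"
    std::optional<long long> expiresAt;     // from "Expires", already parsed
    long long dateAt = 0;                   // from "Date" (assumed to be present)
    long long ageSeconds = 0;               // from "Age" (0 if the header is absent)
    long long receivedAt = 0;               // local time when the reply arrived
};

// Total time the response may be served from the cache.
// max-age takes precedence over Expires when both are present.
std::optional<long long> freshnessLifetime(const CachedResponse &r)
{
    if (r.maxAgeSeconds)
        return *r.maxAgeSeconds;
    if (r.expiresAt)
        return *r.expiresAt - r.dateAt;
    return std::nullopt; // no explicit expiration information
}

// Simplified current age: the age reported by proxies via "Age" plus the
// time the response has spent in the local cache.
long long currentAge(const CachedResponse &r, long long now)
{
    return r.ageSeconds + (now - r.receivedAt);
}

// A response is fresh while its current age is below its freshness lifetime.
bool isFresh(const CachedResponse &r, long long now)
{
    const std::optional<long long> lifetime = freshnessLifetime(r);
    return lifetime && currentAge(r, now) < *lifetime;
}
```

With “max-age=3600” and “Age: 600”, for example, this sketch treats the resource as fresh for only another 3000 seconds after it was received, which is why the “Age” header matters.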
modification information:
When the client has a resource in its local cache, it can ask the
server to send the resource only if it has changed. This always involves
a round trip to the server, but might save data if the server tells the
client that the resource has not changed since the client last fetched
it. In that case, the server sends an HTTP message with an empty
body instead of sending the data body as well. HTTP headers used for
sending modification information are (list is not complete):
From the server:
- Last-Modified: The server tells the client the date on which the resource was last modified.
- ETag: The server sends a version identifier
of the transmitted resource. This can be thought of as a hash of
the data body, which changes whenever the resource changes.
From the client:
- If-Modified-Since: The client tells the server to only send the data if it has been modified since the given date, i.e. if the Last-Modified
header has changed. If it has not been modified, the server sends an
HTTP 304 Not Modified message, containing only HTTP headers, but no
body. If it has been modified, the server sends an HTTP 200 OK message
containing the body.
- If-None-Match: The client tells the server
to only send data if it has a new version identifier, i.e. if the ETag
header has changed. If it has not changed, the server sends an HTTP
304 Not Modified message, containing only HTTP headers, but no body. If
it has changed, the server sends an HTTP 200 OK message containing
the body.
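The server side of such a conditional request can be sketched in plain C++ as follows. The type and function names are hypothetical, and the sketch makes simplifying assumptions: ETags are compared as opaque strings, a single ETag is sent in “If-None-Match”, dates are already parsed into seconds since the Unix epoch, and the ETag comparison decides the outcome whenever the client sent one.

```cpp
#include <optional>
#include <string>

// Validators a client may send with a conditional GET (both optional).
struct ConditionalRequest {
    std::optional<std::string> ifNoneMatch;   // value of "If-None-Match"
    std::optional<long long> ifModifiedSince; // "If-Modified-Since", epoch seconds
};

// Hypothetical server-side view of the current state of the resource.
struct Resource {
    std::string etag;       // sent to clients as "ETag"
    long long lastModified; // sent to clients as "Last-Modified", epoch seconds
};

// Returns the HTTP status code of the reply: 304 (headers only) if the
// client's copy is still current, 200 (headers and body) otherwise.
int statusFor(const ConditionalRequest &req, const Resource &res)
{
    // Assumption of this sketch: the ETag comparison wins if present.
    if (req.ifNoneMatch)
        return *req.ifNoneMatch == res.etag ? 304 : 200;
    if (req.ifModifiedSince)
        return res.lastModified <= *req.ifModifiedSince ? 304 : 200;
    return 200; // unconditional request: always send the body
}
```

A client that cached the resource together with its ETag would send that value back in “If-None-Match” and reuse its cached body whenever the reply is 304.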
It is interesting to note that the headers involving absolute dates (“Expires”, “Last-Modified”, “If-Modified-Since”)
were already present in HTTP 1.0; the newer HTTP 1.1 standard adds
means that do not involve dates, namely time spans measured against the client’s
clock (“max-age”, “s-maxage”) and versioning information (“ETag”, “If-None-Match”).
This is because date handling only works accurately if the
server and client clocks are synchronized. ETags and relative
time data are more robust because they do not assume synchronized
clocks. That said, all of the headers presented above are still in
widespread use.
How does caching work with Qt?
By default, no disk cache is used when retrieving resources over HTTP with the QNetworkAccessManager class. In order to enable a cache, you need to either instantiate the QNetworkDiskCache class or write your own class deriving from QAbstractNetworkCache, and then set it on your QNetworkAccessManager instance by calling QNetworkAccessManager::setCache().
In that case, Qt will load resources from the cache if the resource is
still fresh, and load from the network if not; if possible it adds
modification information, as described above.
In order to fine-tune how Qt loads resources from the network, you can set specific attributes in your QNetworkRequest by calling QNetworkRequest::setAttribute(), with QNetworkRequest::CacheLoadControlAttribute being one of:
- AlwaysNetwork: Always load from the server and force intermediate caches to reload by setting “Cache-Control: no-cache” and “Pragma: no-cache”.
- PreferNetwork (default): If the resource can be found in the cache and the age of the cached resource is less than the maximum age (used headers: “Age”, “Cache-Control: max-age”) or the resource has not expired (used header: “Expires”),
then it is loaded from the cache. If the resource has expired or has
exceeded its maximum age, it is loaded from the server, if possible with
modification information (used headers: “If-Modified-Since” if “Last-Modified” was given, and “If-None-Match” if “ETag” was given).
- PreferCache: If the resource can be found in the
cache and has not expired, then load from cache. The contrast to
PreferNetwork here is that even stale resources, i.e. resources that have exceeded their
maximum age, will be loaded from cache. If the resource has expired
(determined via the “Expires” header) or cannot be found in the cache, this setting behaves like PreferNetwork.
- AlwaysCache: Serve the data from the cache if
available, never use the network; this can be seen as an offline mode.
If the resource is not in the cache, an error is reported.
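The four settings can be summarized as a small decision function. The following plain C++ sketch is modeled on the descriptions in the list above, not on Qt's source code: the enum only mirrors the names of QNetworkRequest::CacheLoadControl, and the CacheEntry struct, the Source enum and chooseSource() are hypothetical.

```cpp
#include <optional>

// Local mirror of the four settings; the real enum is
// QNetworkRequest::CacheLoadControl.
enum class LoadControl { AlwaysNetwork, PreferNetwork, PreferCache, AlwaysCache };

enum class Source { Cache, Network, Error };

// Hypothetical summary of what is known about a cached resource.
struct CacheEntry {
    bool exceededMaxAge; // age is no longer below "max-age" (stale)
    bool expired;        // the date given in "Expires" has passed
};

// entry is empty when the resource is not in the cache at all.
Source chooseSource(LoadControl control, const std::optional<CacheEntry> &entry)
{
    switch (control) {
    case LoadControl::AlwaysNetwork:
        return Source::Network;
    case LoadControl::PreferNetwork:
        // Only a resource that is neither stale nor expired comes from the cache.
        return (entry && !entry->exceededMaxAge && !entry->expired)
                   ? Source::Cache : Source::Network;
    case LoadControl::PreferCache:
        // Stale resources are still served; expired ones go to the network.
        return (entry && !entry->expired) ? Source::Cache : Source::Network;
    case LoadControl::AlwaysCache:
        return entry ? Source::Cache : Source::Error;
    }
    return Source::Error; // not reached for valid enum values
}
```

A result of Source::Network does not rule out a conditional request: as described for PreferNetwork, modification information is added where possible.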
If you want to fine-tune the caching behaviour even more, you can add headers (e.g. “Cache-Control: max-age” or “Cache-Control: max-stale”) yourself via QNetworkRequest::setRawHeader().
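Putting the Qt pieces of this section together, a minimal sketch might look like the following. It uses QNetworkDiskCache, QNetworkAccessManager::setCache(), QNetworkRequest::setAttribute() and QNetworkRequest::setRawHeader() from the Qt Network module, which is needed to build it. The function names, the cacheDir parameter and the max-stale value of 600 seconds are made up for illustration.

```cpp
#include <QByteArray>
#include <QNetworkAccessManager>
#include <QNetworkDiskCache>
#include <QNetworkReply>
#include <QNetworkRequest>
#include <QObject>
#include <QString>
#include <QUrl>

// Creates a manager whose replies are cached on disk.
// cacheDir is a hypothetical writable directory chosen by the application.
QNetworkAccessManager *createCachingManager(const QString &cacheDir, QObject *parent)
{
    QNetworkAccessManager *manager = new QNetworkAccessManager(parent);
    QNetworkDiskCache *diskCache = new QNetworkDiskCache(manager);
    diskCache->setCacheDirectory(cacheDir);
    manager->setCache(diskCache); // the manager takes ownership of the cache
    return manager;
}

// Requests url, preferring a cached copy even if it is stale.
QNetworkReply *fetchPreferringCache(QNetworkAccessManager *manager, const QUrl &url)
{
    QNetworkRequest request(url);
    request.setAttribute(QNetworkRequest::CacheLoadControlAttribute,
                         QNetworkRequest::PreferCache);
    // Optional extra header; the value is an arbitrary example.
    request.setRawHeader(QByteArray("Cache-Control"), QByteArray("max-stale=600"));
    return manager->get(request);
}
```

Once the reply has finished, reply->attribute(QNetworkRequest::SourceIsFromCacheAttribute).toBool() indicates whether the data came from the cache.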
Areas for improvement in Qt
- If a resource does not have an expiration date (no “Expires” header) and the age of the page cannot be determined (no “max-age” or “Age” header), the client can implement heuristics to determine whether a page is fresh. In particular, if a resource has a “Last-Modified”
header, a fraction (the HTTP RFC mentions 10%) of the time elapsed since then
can be used as the period during which the resource is assumed to be fresh. For example, if a
resource has a “Last-Modified” header set 10 days in the past, the resource can be assumed to be fresh for 1 day.
- The age calculation needs to be reworked.
- Loading from the cache can be made faster.
- The
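The freshness heuristic mentioned in the list above (10% of the time since “Last-Modified”) could be sketched in plain C++ like this. The function names are hypothetical, times are assumed to be seconds since the Unix epoch, and responseDate stands for the time at which the response was received; this illustrates the idea and is not code from Qt.

```cpp
// Heuristic freshness lifetime in seconds: a percentage of the time that
// passed between the last modification and the response date.
long long heuristicLifetime(long long lastModified, long long responseDate,
                            int percent = 10)
{
    const long long sinceModified = responseDate - lastModified;
    if (sinceModified <= 0)
        return 0; // modification date not in the past: never treat as fresh
    return sinceModified * percent / 100;
}

// The resource counts as fresh while the time since the response is
// shorter than the heuristic lifetime.
bool isHeuristicallyFresh(long long lastModified, long long responseDate,
                          long long now)
{
    return now - responseDate < heuristicLifetime(lastModified, responseDate);
}
```

For a resource last modified 10 days (864000 seconds) before the response, heuristicLifetime() yields 86400 seconds, i.e. the one day from the example above.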
As always, feel free to vote and comment on the tasks above.