Category: System Operations
2013-08-04 11:02:56
Here's an interesting problem I recently solved that dealt with Squid, Apache, and that elusive Vary response header ...
1 - The Vary Basics
Popular caching proxies, like Squid, usually generate a hash of the request from a number of inputs including the URI and the contents of the Vary response header. When a caching proxy receives a request for a resource, it gathers these inputs, generates a hash, then checks its cache to see if it already has a resource sitting on disk, or in memory, that matches the computed hash. This is how Squid, and other caching proxies, fundamentally know if they have a cache HIT or MISS (i.e., can Squid return the content it has cached or does it need to go back to the destination server).
With that in mind, you can probably see how the Vary header is quite important when a caching proxy is looking for a cache HIT or MISS. The Vary header is a way for the web-server to tell any intermediaries (caching proxies) which request headers they must take into account when deciding whether a stored response can be reused for a new request. Sample Vary headers include:
Vary: Accept-Encoding
Vary: Accept-Encoding,User-Agent
Vary: X-Some-Custom-Header,Host
Vary: *
According to the HTTP spec, "the Vary field value indicates the set of request-header fields that fully determines, while the response is fresh, whether a cache is permitted to use the response to reply to a subsequent request without revalidation." Yep, that's pretty important (I discovered this the hard way).
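To make the sample headers above a little more concrete, here's a minimal Python sketch of how a cache might read a Vary value. The function name parse_vary is my own invention for illustration, not anything from Squid:

```python
def parse_vary(value):
    """Return the set of lower-cased request-header names listed in a Vary value.

    Returns None for "Vary: *", which signals that a stored response
    cannot be reused without going back to the origin server.
    """
    names = {part.strip().lower() for part in value.split(",") if part.strip()}
    if "*" in names:
        return None
    return names


# The sample headers from the list above.
for sample in ("Accept-Encoding",
               "Accept-Encoding,User-Agent",
               "X-Some-Custom-Header,Host",
               "*"):
    print(sample, "->", parse_vary(sample))
```

Header names are case-insensitive, which is why the sketch lower-cases them before comparing.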
2 - The Caching Problem
I configured Squid to act as a round-robin load balancer and caching proxy, sitting in front of about four Apache web-servers. Each Apache web-server was running a copy of my web-application, which I intended to have Squid cache where possible. Certain requests were for large JSON objects, and I explicitly configured Squid to cache requests ending in .json for 24 hours.
I opened a web-browser and visited a URL I expected to be cached (should have already been in the cache from a previous request, notice the HIT) ...
GET /path/big.json HTTP/1.1
Host: app.kolich.local
User-Agent: Firefox
HTTP/1.0 200 OK
Date: Fri, 24 Sep 2010 23:09:32 GMT
Content-Type: application/json;charset=UTF-8
Content-Language: en-US
Vary: Accept-Encoding,User-Agent
Age: 1235
X-Cache: HIT from cache.kolich.local
X-Cache-Lookup: HIT from cache.kolich.local:80
Content-Length: 25090
Connection: close
Ok, looks good! I opened a 2nd web-browser on a different machine (hint: with a different User-Agent) and tried again. This time, notice the X-Cache: MISS ...
GET /path/big.json HTTP/1.1
Host: app.kolich.local
User-Agent: Chrome
HTTP/1.0 200 OK
Date: Fri, 24 Sep 2010 23:11:45 GMT
Content-Type: application/json;charset=UTF-8
Content-Language: en-US
Vary: Accept-Encoding,User-Agent
Age: 4
X-Cache: MISS from cache.kolich.local
X-Cache-Lookup: MISS from cache.kolich.local:80
Content-Length: 25090
Connection: close
Wow, look at that. I requested exactly the same resource, just from a different browser, and I saw a cache MISS. This is obviously not what I want: I need the same cached resource to be served up from the cache regardless of who's making the request. If left alone, Squid caches one response per User-Agent, not one per resource.
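If you'd rather not eyeball the headers each time, a small helper can pull the HIT or MISS out of a response. This is only a sketch: cache_status is a hypothetical name of mine, and it assumes the X-Cache header has the "HIT from host" shape shown in the two responses above.

```python
def cache_status(raw_headers):
    """Return "HIT" or "MISS" from the X-Cache header, or None if it is absent."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        # Match X-Cache only, not X-Cache-Lookup.
        if sep and name.strip().lower() == "x-cache":
            parts = value.split()
            return parts[0].upper() if parts else None
    return None


firefox_response = """HTTP/1.0 200 OK
Vary: Accept-Encoding,User-Agent
Age: 1235
X-Cache: HIT from cache.kolich.local
X-Cache-Lookup: HIT from cache.kolich.local:80"""

chrome_response = """HTTP/1.0 200 OK
Vary: Accept-Encoding,User-Agent
Age: 4
X-Cache: MISS from cache.kolich.local
X-Cache-Lookup: MISS from cache.kolich.local:80"""

print(cache_status(firefox_response))  # HIT
print(cache_status(chrome_response))   # MISS
```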
3 - Solution: Check Your Vary Headers
Remember how I said the contents of the Vary header are important for caching proxies?
In both requests above, note the User-Agent request headers and the contents of the Vary response headers. Although each request was for exactly the same resource, Squid determined that they were very different as far as its cache was concerned. How did this happen? Well, take a peek at a Vary response header:
Vary: Accept-Encoding,User-Agent
This tells Squid that the request URI, the Accept-Encoding request header, and the User-Agent request header should be included in a hash when determining if an object is available in its cache, or not. Obviously, any reasonable hash of (URI, Accept-Encoding, "Firefox") should not match the hash of (URI, Accept-Encoding, "Chrome"). Hence why Squid seemed to think the request was for different objects!
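Here's a rough Python sketch of that idea. To be clear, this is not Squid's actual implementation: the cache_key function and the choice of SHA-256 are assumptions of mine, picked only to show how the User-Agent value ends up changing the key.

```python
import hashlib


def cache_key(uri, vary, request_headers):
    """Hash the URI plus the value of every request header named in Vary."""
    headers = {name.lower(): value for name, value in request_headers.items()}
    parts = [uri]
    for name in sorted(n.strip().lower() for n in vary.split(",") if n.strip()):
        # A header the client did not send contributes an empty value.
        parts.append(name + "=" + headers.get(name, ""))
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


uri = "/path/big.json"
firefox = {"Host": "app.kolich.local", "User-Agent": "Firefox"}
chrome = {"Host": "app.kolich.local", "User-Agent": "Chrome"}

# With User-Agent in Vary, the two browsers get different keys.
with_ua = "Accept-Encoding,User-Agent"
print(cache_key(uri, with_ua, firefox) == cache_key(uri, with_ua, chrome))  # False

# Without it, both browsers map to the same cached object.
without_ua = "Accept-Encoding"
print(cache_key(uri, without_ua, firefox) == cache_key(uri, without_ua, chrome))  # True
```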
To fix this, I located the source of the annoying "User-Agent" addition to my Vary response header, which happened to come from mod_deflate. The recommended mod_deflate configuration appends "User-Agent" to the Vary response header on every response not flagged dont-vary, that is, on everything except the image types it exempts from compression. I don't really see why this is necessary, but the Apache folks seemed to think this was important. Here are the relevant lines from the Apache suggested mod_deflate configuration:
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png|ico)$ no-gzip dont-vary
Header append Vary User-Agent env=!dont-vary
In any event, I removed the 2nd line above and restarted Apache, and Squid began caching beautifully regardless of which client issued the request. Essentially, I told Squid to stop caring about the User-Agent by removing "User-Agent" from my Vary response header, and problem solved!
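If you want to check your own Vary headers for the same kind of trouble, something like the sketch below might help. The HIGH_CARDINALITY set is purely my assumption about which request headers tend to differ for nearly every client; adjust it to taste.

```python
# Assumption: request headers whose values differ for almost every client,
# so varying on them splits the cache into many near-duplicate entries.
HIGH_CARDINALITY = {"user-agent", "cookie", "referer"}


def fragmenting_fields(vary):
    """Return the Vary entries likely to defeat a shared cache."""
    names = [part.strip().lower() for part in vary.split(",") if part.strip()]
    return [name for name in names if name == "*" or name in HIGH_CARDINALITY]


print(fragmenting_fields("Accept-Encoding,User-Agent"))  # ['user-agent']
print(fragmenting_fields("Accept-Encoding"))             # []
```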
Below is a P3P configuration problem I ran into recently.
Here's what we observed happening.
1. The CN player loads on video.brand.com.
2. The CN player makes an ad request to xx.v.xxx.com.
3. xx.v.xxx.com responds without a P3P policy in the header.
4. In the reply, xx.v.xxx.com also responds with a bunch of cookies.
5. The CN player gets the response ok and displays the ad ok.
6. But all of the cookies sent are ignored, because by default IE will ignore cookies set on a domain that's not the same as the current site if there is no P3P policy in the header.
7. The user experience relies on these ignored cookies, so we get a pre-roll every time, which is not what we expected.
This has actually been IE's behavior since IE6, but we only support IE8+.
Would it be possible to send back some P3P policy in the header, for any responses (including beacons) that set cookies?
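As a rough illustration of what that request amounts to on the server side, here's a minimal Python WSGI sketch that adds a P3P header to any response that sets a cookie. Everything named here is hypothetical: the add_p3p wrapper, the ad_app stand-in, the seen_preroll cookie, and the shortened compact policy value, which on a real site has to reflect the actual privacy practices.

```python
# Hypothetical compact policy; a real value must match the site's practices.
P3P_VALUE = 'CP="NOI DSP COR"'


def add_p3p(app, policy=P3P_VALUE):
    """Wrap a WSGI app so every response that sets a cookie also carries P3P."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            names = {name.lower() for name, _ in headers}
            if "set-cookie" in names and "p3p" not in names:
                headers = list(headers) + [("P3P", policy)]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start)
    return wrapped


def ad_app(environ, start_response):
    """Stand-in for the ad server: answers with a cookie and no P3P header."""
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Set-Cookie", "seen_preroll=1; Path=/")])
    return [b"ad"]


# Call the wrapped app directly and capture the headers it would send.
captured = {}


def fake_start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = headers


body = add_p3p(ad_app)({}, fake_start_response)
print(captured["headers"])
```

Responses that set no cookies pass through untouched, so beacons that only read cookies are not affected by the wrapper.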
Below is some material I found via Google:
The site b.com has a page like this:
The source code of this page is as follows:
......
<iframe src=""></iframe>
......
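This source code uses an iframe to embed a page from the site a.net. In this setup the so-called first-party site is b.com, and the third-party site is a.net.
The embedded a.net page has a simple job: it writes a long-lived cookie.
When we visit the b.com page, the problem appears: some cookies are blocked.
We write one more page to read the cookie back.
We first visit the b.com page and then the cookie-reading page, and we find that there is no cookie.
A very simple solution is to modify the page that sets the cookie, adding the following line of code to it: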
Response.Headers.Add("P3P", "CP=\"CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR\"");
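P3P (Platform for Privacy Preferences) is a privacy-protection recommendation published by the W3C (World Wide Web Consortium). Microsoft Internet Explorer 6 (IE6) was the first browser to support this privacy standard. P3P-capable browsers come with a set of default options for you to choose from, or you can customize your settings by answering questions (for example, which data you are willing to share and which types of cookies you are willing to accept). As you browse the Web, the software determines whether your privacy preferences match the website's data-collection practices.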