
Category: System Operations

2013-08-04 11:02:56

   HTTP/1.1 headers have many aspects worth studying and paying attention to. Here I want to talk about Vary. Anyone doing CDN work probably knows it, and I have notes on it in another post. Below is some material from more authoritative sources.

 I never paid much attention to the Vary HTTP response header.  In fact, I've been fortunate enough to avoid it for this long and never really had to care much about it.  Well, it turns out that when you're configuring a high-performance caching proxy, understanding the Vary header and what it means for your reverse-proxy caching policies is absolutely crucial.

Here's an interesting problem I recently solved that dealt with Squid, Apache, and that elusive Vary response header ...

1 - The Vary Basics

Popular caching proxies, like Squid, usually generate a hash of the request from a number of inputs, including the URI and the contents of the Vary response header.  When a caching proxy receives a request for a resource, it gathers these inputs, generates a hash, then checks its cache to see if it already has a resource sitting on disk, or in memory, that matches the computed hash.  This is how Squid, and other caching proxies, fundamentally know whether they have a cache HIT or MISS (i.e., whether Squid can return the content it has cached or needs to revalidate the request against the destination server).
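The hashing described above can be sketched in Python.  This is a simplified model for illustration only; Squid's real cache-key format is internal and differs in detail:

```python
import hashlib

def cache_key(uri, request_headers, vary_header):
    """Build a cache key from the URI plus the request headers named in Vary.

    Hypothetical helper: real proxies use their own key formats,
    but the principle is the same.
    """
    parts = [uri]
    for name in vary_header.split(","):
        name = name.strip()
        # A header absent from the request still contributes (as empty).
        parts.append("%s=%s" % (name, request_headers.get(name, "")))
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()

# Same URI and Accept-Encoding, but different User-Agent values,
# produce different keys when Vary includes User-Agent.
firefox = cache_key("/path/big.json",
                    {"Accept-Encoding": "gzip", "User-Agent": "Firefox"},
                    "Accept-Encoding,User-Agent")
chrome = cache_key("/path/big.json",
                   {"Accept-Encoding": "gzip", "User-Agent": "Chrome"},
                   "Accept-Encoding,User-Agent")
# firefox != chrome, so the proxy sees two distinct cache objects
```

Notice that if "User-Agent" were dropped from the Vary value, the two keys would collapse into one, which is exactly the fix described later in this post.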

With that in mind, you can probably see how the Vary header is quite important when a caching proxy is checking for a cache HIT or MISS.  The Vary header is a way for the web-server to tell any intermediaries (caching proxies) what they should use, if necessary, to figure out whether the requested resource is fresh or stale.  Sample Vary headers include:

Vary: Accept-Encoding

Vary: Accept-Encoding,User-Agent

Vary: X-Some-Custom-Header,Host

Vary: *

According to the HTTP spec, "the Vary field value indicates the set of request-header fields that fully determines, while the response is fresh, whether a cache is permitted to use the response to reply to a subsequent request without revalidation."  Yep, that's pretty important (I discovered this the hard way).

2 - The Caching Problem

I configured Squid to act as a round-robin load balancer and caching proxy, sitting in front of four Apache web-servers.  Each Apache web-server was running a copy of my web-application, which I intended to have Squid cache where possible.  Certain requests were for large JSON objects, and I explicitly configured Squid to cache requests ending in .json for 24 hours.
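The post doesn't show the Squid configuration, but a setup like the one described might use a refresh_pattern rule along these lines (a hedged sketch; the actual directives and options in the original deployment are unknown):

```
# Hypothetical squid.conf fragment: cache URLs ending in .json for
# 24 hours (1440 minutes), even when the origin sends no explicit
# Expires/Cache-Control headers.
refresh_pattern -i \.json$ 1440 100% 1440 override-expire
```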

I opened a web-browser and visited a URL I expected to be cached (it should already have been in the cache from a previous request; notice the HIT) ...

GET /path/big.json HTTP/1.1

Host: app.kolich.local

User-Agent: Firefox

HTTP/1.0 200 OK

Date: Fri, 24 Sep 2010 23:09:32 GMT

Content-Type: application/json;charset=UTF-8

Content-Language: en-US

Vary: Accept-Encoding,User-Agent

Age: 1235

X-Cache: HIT from cache.kolich.local

X-Cache-Lookup: HIT from cache.kolich.local:80

Content-Length: 25090

Connection: close

Ok, looks good!  I opened a 2nd web-browser on a different machine (hint: with a different User-Agent) and tried again.  This time, notice the X-Cache: MISS ...

GET /path/big.json HTTP/1.1

Host: app.kolich.local

User-Agent: Chrome

HTTP/1.0 200 OK

Date: Fri, 24 Sep 2010 23:11:45 GMT

Content-Type: application/json;charset=UTF-8

Content-Language: en-US

Vary: Accept-Encoding,User-Agent

Age: 4

X-Cache: MISS from cache.kolich.local

X-Cache-Lookup: MISS from cache.kolich.local:80

Content-Length: 25090

Connection: close

Wow, look at that.  I requested exactly the same resource, just from a different browser, and saw a cache MISS.  This is obviously not what I want: I need the same cached resource served up regardless of who's making the request.  Left alone, Squid would cache one copy of the response per User-Agent, rather than one per resource.

3 - Solution: Check Your Vary Headers

Remember how I said the contents of the Vary header are important for caching proxies?

In both requests above, note the User-Agent request headers and the contents of the Vary response headers.  Although each request was for exactly the same resource, Squid determined that they were very different as far as its cache was concerned.  How did this happen?  Well, take a peek at a Vary response header:

Vary: Accept-Encoding,User-Agent

This tells Squid that the request URI, the Accept-Encoding request header, and the User-Agent request header should all be included in the hash when determining whether an object is available in its cache.  Obviously, any reasonable hash of (URI, Accept-Encoding, "Firefox") will not match the hash of (URI, Accept-Encoding, "Chrome").  That is why Squid concluded the requests were for different objects!

To fix this, I located the source of the annoying "User-Agent" addition to my Vary response header, which happened to come from Apache's recommended mod_deflate configuration.  That configuration appends "User-Agent" to the Vary response header on any response that is not compressed by mod_deflate.  I don't really see why this is necessary, but the Apache folks seemed to think it was important.  Here are the relevant lines from Apache's suggested mod_deflate configuration:

SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png|ico)$ no-gzip dont-vary

Header append Vary User-Agent env=!dont-vary

In any event, I removed the second line above and restarted Apache, and Squid began caching beautifully regardless of which client issued the request.  Essentially, I told Squid to stop caring about the User-Agent by removing "User-Agent" from my Vary response header.  Problem solved!
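For reference, the surviving stanza after the fix would presumably look like this (a sketch assuming the standard recommended mod_deflate configuration, minus the removed line):

```
# Don't compress already-compressed image formats, and don't let
# them influence the Vary header either.
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png|ico)$ no-gzip dont-vary
# The 'Header append Vary User-Agent env=!dont-vary' line is gone.
# mod_deflate still adds "Vary: Accept-Encoding" on compressed
# responses by itself, which Squid handles correctly.
```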


Below is a P3P-related problem I ran into recently.

Here's what we observed happening:

1. The CN? player loads on video.brand.com.

2. The CN? player makes an ad request to xx.v.xxx.com.

3. xx.v.xxx.com responds without a P3P policy in the header.

4. In the reply, xx.v.xxx.com also sets a number of cookies.

5. The CN? player receives the response fine and displays the ad fine.

6. But all of the cookies sent are ignored, because by default IE ignores cookies set by a domain different from the current site when there is no P3P policy in the header.

7. The user experience relies on these ignored cookies, so we get a pre-roll every time, which is not what we expected.

This has actually been IE's behavior since IE6, but we only support IE8+.

Would it be possible to send back some P3P policy in the header, for any responses (including beacons) that set cookies?
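On the server side, that fix could be sketched as middleware that tags every cookie-setting response with a P3P policy.  This is a hypothetical WSGI example, not the ad server's actual stack, and the compact policy string is the one suggested later in this post:

```python
# Hypothetical WSGI middleware: attach a compact P3P policy to every
# response that sets a cookie, so IE will accept third-party cookies.
P3P_POLICY = ('CP="CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM '
              'STA PRE COM NAV OTC NOI DSP COR"')

def add_p3p(app):
    """Wrap a WSGI app; add a P3P header whenever Set-Cookie is present."""
    def middleware(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            if any(name.lower() == "set-cookie" for name, _ in headers):
                headers = headers + [("P3P", P3P_POLICY)]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start)
    return middleware
```

Responses without cookies pass through untouched, so the extra header is only sent where IE actually needs it.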


Below are some things I found via Google:

Site b.com has a page like this:

The page's source looks like this:

......
<iframe src=""></iframe>
......

This source uses an iframe to embed a page from the a.net site.  Here, the so-called first-party site is b.com, and the third-party site is a.net.

The embedded a.net page does something very simple: it writes a long-lived cookie.

When we visit the b.com page, the problem appears: some cookies are blocked.

We wrote another page to read the cookie back.

First we visit the embedding page, then the cookie-reading page, and we find that there is no cookie.

A very simple fix is to modify the cookie-setting file, adding the following line of code:

Response.Headers.Add("P3P", "CP=\"CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR\"");

P3P (Platform for Privacy Preferences) is a privacy-protection recommendation published by the W3C (World Wide Web Consortium).  Microsoft Internet Explorer 6 (IE6) was the first browser to support this privacy standard.  A P3P-capable browser offers some default options to choose from, or you can customize your settings by answering questions (for example, which data you are willing to share and which kinds of cookies you will accept).  As you browse the web, the browser checks whether your privacy preferences match each website's data-collection practices.
