Traffic Server Web Cache替换算法分析(二)-hanwei

hanwei_1049

首页　| 　博文目录　| 　关于我

hanwei_1049

博客访问： 1531776
博文数量： 228
博客积分： 1698
博客等级：上尉
技术积分： 3241
用户组：普通用户
注册时间： 2008-12-24 21:49

个人简介

Linux

文章分类

全部博文（228）

进程间通信（4）
LVS（0）
OpenStack（1）
HTTPS（5）
LUA（1）
版本控制（9）
个人计划（2）
Nginx（27）
MySQL（0）
Trouble Sho（2）
HaProxy（1）
进程调度（0）
ATS（31）
CDN（15）
Redis（0）
TCP/IP协议栈（7）
文件系统/存储（3）
内存管理（11）
系统/脚本（4）
编程相关（8）
攻防研究（13）
体系结构（9）
数据结构（0）
内核相关（14）
安全相关（1）
网络相关（43）
未分配的博文（17）

文章存档

2017年（1）

2016年（43）

2015年（102）

2014年（44）

2013年（5）

2012年（30）

2011年（3）

我的朋友

相关博文

Traffic Server Web Cache替换算法分析(二)

分类： LINUX

2015-07-07 14:47:22

来源：http://blog.chinaunix.net/uid-23242010-id-147989.html

本章主要描述TrafficServer中ram cache设计方案。TrafficServer在内存中维护了一个ram cache，用来保存用户频繁访问的热点文件或数据，以避免磁盘查找慢的问题。TS在刚开源的时候，ram cache的设计只是一个简单的LRU算法，后来，John Plevya针对各个web cache替换算法的优缺点，提出了CLFUS算法，并根据该算法设计与实现了一个新的ram cache，新的ram cache同时支持内容压缩与解压缩[1][2]。

数据结构(这里将无关的内容都删去了)：

//data代表的就是用户请求访问的内容，宏LINK生成指向下一个节点的指针，比如this->hash_link.next指向的就
//是下一个RamCacheCLFUSEntry，TS提供了一套接口通过LINK将RamCacheCLFUSEntry组成一个linked list
struct RamCacheCLFUSEntry {
LINK(RamCacheCLFUSEntry, lru_link);
LINK(RamCacheCLFUSEntry, hash_link);
Ptr data;
};

//这个类提供了从ram cache中读取或者写入用户请求的url的内容，其中get负责读，put负责写
struct RamCacheCLFUS : public RamCache {
int get(INK_MD5 *key, Ptr *ret_data, uint32_t auxkey1 = 0, uint32_t auxkey2 = 0);
int put(INK_MD5 *key, IOBufferData *data, uint32_t len, bool copy = false, uint32_t auxkey1 = 0, uint32_t auxkey2 = 0);
// private
DList(RamCacheCLFUSEntry, hash_link) *bucket;
Que(RamCacheCLFUSEntry, lru_link) lru[2];
uint16_t *seen;
void resize_hashtable(); //bucket是可以动态扩展大小的
};

CLFUS算法涉及的有以下几个数据结构：
(1)基于开放链表法实现的散列表RamCacheCLFUS::bucket。
对于用户请求的url页面，CLFUS首先会对其进行hash映射到bucket中某个linked list，然后在该linked list顺序查找。

(2)Cached List，对应于RamCacheCLFUS::lru[0]。
cached list保存TS认为属于热点的url页面，当用户请求的url属于cached list中时，CLFUS从cached list中直接取出返回给用户。

(3)History List，对应于RamCacheCLFUS::lru[1]。
history list保存用户请求的不在bucket中的url页面，history list中的页面可以根据替换规则替换掉cached list中某个页面。
cached list与history list逻辑上都属于lru list。bucket中的元素有一部分在cached list中，有一部分在history list中。CLFUS通过bucket找到RamCacheCLFUSEntry对象，再根据对象中的bit位来确定该对象是在cached list还是history list中，避免了直接从cached list或者history list中顺序查找时间复杂度高的问题。

(4)Seen Hash，对应于RamCacheCLFUS::seen。在文章(一)中提到，如果一个用户请求的url是第一次访问时候，LRU/2以及2Q算法都不会直接将请求的内容放入cache中。这里Seen Hash起的是相似的作用，如果用户请求的页面只被请求过一次，则CLFUS将该页面的标识保存在seen hash中，在该页面第二次被访问时，通过seen hash可以判断出该页面是第一次还是第二次被访问。

RamCacheCLFUS::get伪代码算法分析：

if X is in cached list then
move X to the tail of cached list, and return the data in X
else if X is in history list then
move X to the tail of history list
"cache miss"
else
"cache miss"
end if

RamCacheCLFUS::put伪代码算法分析：

if X is in cached list then
move X to the tail of cached list, and update its data
else if X is in history list then
if cached list has room to place X then
insert X to the tail of cached list, and update its data
else
create list V
do
pop one page Y from cached list
//simulate the aging algorithm, for avoiding cache pollution
pop one page Z from history list
if HIT_VALUE(Z) is not greater than 1 then
delete Z
else
let the HIT_VALUE(Z) with 1, and reinserted Z to the tail of history list
end if
//end

if CACHE_VALUE(X) is greater than CACHE_VALUE(Y) then
push it to V
else
insert X to the tail of history list and update its data, return
end if
util cached list has enough room for placing X
end do
for(Z in V)
if cached list has room for both Z and X, then
reinsert Z to the tail of cached list
insert X to the tail of cached list, and update its data
end if
end for
else // X is neither in history list nor in cached list
//judge X is or not first accessed by seen hash
if X is first accessed and history list has no room for it, then
save the record of X in seen hash
else
insert X to the tail of history list
end if
end if
end if
CACHE_VALUE = hits/(size + overhead) //similar to GDFS algorithm

参考文献：
[1] /> [2] />

阅读(1749) | 评论(0) | 转发(0) |

上一篇：Traffic Server Web Cache替换算法分析(一)

下一篇：Traffic Server 源码分析——存储机制分析

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6