Varnish

WebCache

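WebCache, or web caching, is a technique for temporarily storing (caching) web documents such as HTML pages, images, and other static resources. (This is not absolute: dynamic pages can be cached too, but once stored locally they are served as static files.) It reduces bandwidth use and the load on backend servers. A WebCache is usually also a reverse proxy: it answers user requests from its cache, and when it has no local copy it proxies the request to a backend host. (These are my own study notes. I only started with varnish yesterday, so please bear with any mistakes.)
WebCaches come in forward and reverse flavors. Forward WebCaches are less common, so this article focuses on reverse WebCaches.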

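Why WebCache exists

Programs exhibit locality of reference, which comes in two forms: temporal locality and spatial locality.
(1) Temporal locality: data that was accessed recently is likely to be accessed again soon. In any given period most users request only hot data (data that is accessed frequently). For example, when a news site publishes a major story, that story is requested over and over.
(2) Spatial locality: data located near recently accessed data is likely to be accessed next, for example the images and stylesheets referenced by a page that was just requested.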

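WebCache freshness checking

Data changes over time, so cached content has to be checked for freshness.
Expiration date:
A site's content can change before the lifetime configured for the cached copy is up. This happens a lot on e-commerce sites, so the cache needs a way to judge freshness. There are two mechanisms:
HTTP/1.0: Expires
Example: Expires: Sat, 20 May 2017 07:49:55 GMT. Until that absolute time is reached, the cache server does not go back to the backend server. The drawback is that the clocks of the cache and the origin server may disagree.
HTTP/1.1: Cache-Control: max-age
Example: Cache-Control: max-age=600. This was introduced to fix the weakness of the HTTP/1.0 approach: it uses a relative time, in seconds, to bound how long a cached copy may be used.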
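
To make the two expiry rules concrete, here is a minimal sketch in Python. The function name is_fresh and the way the age is computed (now minus the time the copy was stored, ignoring any Age header) are my own simplifications, not how varnish does it internally.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_fresh(headers, stored_at, now):
    """Return True if a response cached at `stored_at` is still fresh at `now`."""
    # max-age (HTTP/1.1) takes precedence over Expires (HTTP/1.0).
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            age = (now - stored_at).total_seconds()
            return age < int(value)
    expires = headers.get("Expires")
    if expires:
        # Absolute time: only meaningful if cache and origin clocks agree.
        return now < parsedate_to_datetime(expires)
    # No explicit freshness information: treated as stale in this sketch.
    return False


stored = datetime(2017, 5, 20, 7, 0, 0, tzinfo=timezone.utc)
print(is_fresh({"Cache-Control": "max-age=600"}, stored, stored + timedelta(minutes=5)))
print(is_fresh({"Expires": "Sat, 20 May 2017 07:49:55 GMT"}, stored, stored + timedelta(hours=1)))
```

Because max-age is relative to when the copy was stored, the result does not depend on the origin server's clock, which is exactly the problem with Expires.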
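Cache validation mechanism:
If the original content has not changed, the origin responds with headers only (no body) and status 304 (Not Modified).
If the original content has changed, the origin responds normally with status 200.
If the original content no longer exists, the origin responds with 404, and the cache object should be removed from the cache.
Conditional request headers:
If-Modified-Since: validates by timestamp. If the resource on the backend server has not been modified since the given time, the cached copy continues to be used; otherwise the backend returns the new content.
If-None-Match: validates by ETag. The cache sends the ETag of its copy. If it still matches the backend's current ETag, the data is unchanged and the cached copy continues to be used; if it does not match, the backend responds with the new data.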
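
The validation rules above can be sketched from the origin server's point of view. This is only an illustration in Python: the Resource class and the revalidate function are names I made up, and timestamps are plain epoch seconds rather than parsed HTTP dates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    body: bytes
    etag: str
    last_modified: float  # seconds since the epoch


def revalidate(resource: Optional[Resource], if_none_match=None, if_modified_since=None):
    """Return (status, body) as an origin server would for a conditional GET."""
    if resource is None:
        # Content is gone: the cache should drop its copy.
        return 404, b""
    if if_none_match is not None:
        # The ETag check takes precedence when both headers are present.
        if if_none_match == resource.etag:
            return 304, b""
        return 200, resource.body
    if if_modified_since is not None and resource.last_modified <= if_modified_since:
        return 304, b""
    return 200, resource.body


page = Resource(body=b"<html>v2</html>", etag='"v2"', last_modified=1000.0)
print(revalidate(page, if_none_match='"v2"'))
print(revalidate(page, if_none_match='"v1"'))
```

A 304 carries no body, so a successful validation costs one round trip but almost no bandwidth.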

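WebCache cache-control mechanism

Cache-Control   = "Cache-Control" ":" 1#cache-directive
cache-directive = cache-request-directive
    | cache-response-directive
cache-request-directive = // cache directives in request messages
    "no-cache" // do not answer from the cache without first revalidating with the web server
    | "no-store" // do not store the request or its response (including in backups); it may contain sensitive user information
    | "max-age" "=" delta-seconds // only accept objects whose Age is below max-age and that have not expired
    | "max-stale" [ "=" delta-seconds ] // expired objects are acceptable, as long as they expired less than max-stale seconds ago
    | "min-fresh" "=" delta-seconds // accept cached objects whose freshness lifetime exceeds their current Age plus min-fresh
    | "only-if-cached" // the client wants a copy only if the cache already holds one
    | cache-extension
cache-response-directive = // cache directives in response messages
    "public" // the cached content may be used to answer any user
    | "private" [ "=" <"> 1#field-name <"> ] // the cached content may only be used to answer the user who originally requested it
    | "no-cache" [ "=" <"> 1#field-name <"> ] // may be cached, but must be validated with the web server before it is returned to a client
    | "no-store" // must not be stored on the cache server; it may contain sensitive user information
    | "no-transform" // intermediaries must not modify the content
    | "max-age" "=" delta-seconds // freshness lifetime of the object in this response
    | "s-maxage" "=" delta-seconds // freshness lifetime for shared caches; overrides max-age there
    | cache-extension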
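
To show how a shared cache might read these directives, here is a small Python sketch. The function names are hypothetical, and the parser is deliberately naive: it splits on commas, so a quoted field-name list that itself contains commas would be split incorrectly, and any "private" is treated as not storable.

```python
def parse_cache_control(value):
    """Parse a Cache-Control header value into {directive: argument or True}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; arguments may be quoted.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else True
    return directives


def storable_by_shared_cache(directives):
    """A shared cache must not store no-store or private responses (simplified)."""
    return "no-store" not in directives and "private" not in directives


def shared_cache_lifetime(directives):
    """s-maxage wins over max-age for a shared cache; None if neither is given."""
    for key in ("s-maxage", "max-age"):
        value = directives.get(key)
        if isinstance(value, str) and value.isdigit():
            return int(value)
    return None


d = parse_cache_control("public, max-age=600, s-maxage=60")
print(d, storable_by_shared_cache(d), shared_cache_lifetime(d))
```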

Common WebCache software

[Figure: common WebCache software]

Varnish architecture

[Figure: Varnish architecture]

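(1) Management process
CLI interface: the command-line interface. The web interface is currently a paid feature and telnet transmits in plain text, so in practice only the CLI interface is used.
The management process compiles VCL and applies new configurations, monitors varnish, initializes varnish, and provides the CLI.

(2) child/cache
The child/cache process runs several kinds of threads:
Acceptor: accepts new connection requests;
Worker: handles and responds to user requests;
Expiry: removes expired cache objects from the cache

(3) log
shared memory log: log data is kept in shared memory, typically about 90MB, split into two parts. The first part holds counters and the second holds data about client requests.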

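Cache storage backends supported by Varnish

malloc[,size] stores the cache in memory
VARNISH_STORAGE="malloc,64M"
file[,path[,size[,granularity]]] stores the cache in a file
VARNISH_STORAGE="file,${VARNISH_STORAGE_FILE},${VARNISH_STORAGE_SIZE}"
persistent,path,size with the first two, the cache is lost on restart; persistent keeps the cache across restarts, but it is still under development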

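Varnish state engine

The caching policy configured in VCL takes effect in the state engines.
[Figure: Varnish state engine flow]
The figure above shows the rules that govern caching in varnish. Every request first enters vcl_recv and is then dispatched to one of the other state engines; different VCL rules send it to different state engines, so VCL rules let us control how user requests are handled. Some common scenarios follow.
Cache miss
[Figure: request flow on a cache miss]
Cache hit
[Figure: request flow on a cache hit]
Piping directly to the backend server
[Figure: request flow through vcl_pipe]
Handing the request to the backend server via vcl_pass
[Figure: request flow through vcl_pass]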

Varnish installation and configuration

Environment

varnish_server    varnish3.0    172.18.4.70    CentOS7
backend_server    httpd+php     172.18.4.71    CentOS7
Official varnish yum repository: https://repo.varnish-cache.org/

Installation

bash> yum install varnish gcc -y

Configure the startup environment

bash> vim /etc/sysconfig/varnish

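# maximum number of open files
NFILES=131072
# amount of memory that may be locked
MEMLOCK=82000
# whether to load the VCL configuration file when the varnish service is reloaded
RELOAD_VCL=1
# path of the VCL configuration file read by default
VARNISH_VCL_CONF=/etc/varnish/default.vcl
# port varnish listens on
VARNISH_LISTEN_PORT=80
# address the varnish management interface listens on
VARNISH_ADMIN_LISTEN_ADDRESS=127.0.0.1
# port the varnish management interface listens on
VARNISH_ADMIN_LISTEN_PORT=6082
# varnish secret file
VARNISH_SECRET_FILE=/etc/varnish/secret
# minimum number of threads, i.e. how many threads are started when varnish starts
VARNISH_MIN_THREADS=50
# maximum number of threads; the total (thread pools x max threads) should generally stay below 5000
VARNISH_MAX_THREADS=1000
# thread timeout
VARNISH_THREAD_TIMEOUT=120
# varnish cache file; varnish stores the cache as a single file
VARNISH_STORAGE_FILE=/var/lib/varnish/varnish_storage.bin
# varnish storage size
VARNISH_STORAGE_SIZE=64M
# varnish storage method: in memory (malloc)
VARNISH_STORAGE="malloc,${VARNISH_STORAGE_SIZE}"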

Start the service

bash> systemctl start varnish

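VCL configuration

Common variables

1. Available in every state engine

now, .host, .port

2. Available while processing the client request

client.ip, server.hostname, server.ip, server.port
req.request            request method
req.url                requested URL
req.proto              HTTP protocol version
req.backend            backend host that will serve this request
req.backend.healthy    health state of that backend host
req.http.HEADER        the named header of the request message
req.can_gzip           whether the client accepts gzip-compressed responses
req.restarts           number of times this request has been restarted

3. Available before varnish sends a request to the backend host

bereq.request            request method
bereq.url                request URL
bereq.proto              request protocol
bereq.http.HEADER        request header
bereq.connect_timeout    how long to wait for a connection to the backend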

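4. Available after the backend's response has reached varnish and before it is placed in the cache

beresp.do_stream       whether to stream the response
beresp.do_gzip         whether to compress the object before storing it in the cache
beresp.do_gunzip       whether to decompress the object before storing it in the cache
beresp.http.HEADER     response header
beresp.proto           protocol
beresp.status          response status code
beresp.response        reason phrase of the response
beresp.ttl             remaining time to live of the object, in seconds
beresp.backend.name    name of the backend the response came from
beresp.backend.ip      IP address of the backend host
beresp.backend.port    port of the backend host
beresp.storage

5. Available once the object has been stored in the cache

obj.proto          protocol
obj.status         status code
obj.response       reason phrase of the response
obj.ttl            remaining time to live
obj.hits           number of cache hits on this object
obj.http.HEADER    HTTP header

6. Available when deciding how to compute the hash key for a request

req.hash           hash key used to look the request up in the cache

7. Available while preparing the response for the client

resp.proto          protocol
resp.status         status code
resp.response       reason phrase of the response
resp.http.HEADER    HTTP header

State engines in which each variable is available

[Figure: variable availability by state engine]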

Common examples

To make the examples easier to follow, first a quick introduction to varnish's command-line tool, varnishadm.
At the shell prompt, simply run varnishadm:

varnishadm

bash> varnishadm
200       
-----------------------------
Varnish Cache CLI 1.0
-----------------------------
Linux,2.6.32-573.el6.x86_64,x86_64,-sfile,-smalloc,-hcritbit
varnish-3.0.6 revision 1899836

Type 'help' for command list.
Type 'quit' to close CLI session.

varnish>
This brings up the prompt shown above; use the help command to list the available commands.
varnish> help
200
help [command]
ping [timestamp]
auth response
quit
banner
status
start
stop
vcl.load <configname> <filename>
vcl.inline <configname> <quoted_VCLstring>
vcl.use <configname>
vcl.discard <configname>
vcl.list
vcl.show <configname>
param.show [-l] [<param>]
param.set <param> <value>
panic.show
panic.clear
storage.list
backend.list
backend.set_health matcher state
ban.url <regexp>
ban <field> <operator> <arg> [&& <field> <oper> <arg>]...
ban.list

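Commonly used subcommands

vcl.list    lists the loaded configurations and their status
vcl.load    compiles and loads a new configuration
vcl.show    shows the contents of a configuration
ping        tests whether varnish is responding
vcl.use     switches to a loaded configuration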

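Example 1

Add an HTTP header so the client can tell whether the response came from the cache

varnish_server

bash> vim /etc/varnish/default.vcl

backend default {
    .host = "172.18.4.71";
    .port = "80";
}

Add the VCL statements

bash> vim /etc/varnish/default.vcl

sub vcl_deliver {
    # obj.hits is greater than 0 once the object has been served from the cache
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
sub vcl_hit {
    return (deliver);
}

Reload the configuration with the varnishadm command-line tool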

varnish> vcl.list
200
active 0 boot
varnish> vcl.load test1 /etc/varnish/default.vcl
200
VCL compiled.
varnish> vcl.use test1
200
varnish> vcl.list
200
available 0 boot
active 0 test1

Access and test
[Figure: response headers showing the X-Cache header]

Example 2

Set an HTTP header so the backend host learns the real client IP address

varnish_server

bash> vim /etc/varnish/default.vcl

sub vcl_recv {
    # only on the first pass, so a restarted request is not appended twice
    if (req.restarts == 0) {
        if (req.http.x-forwarded-for) {
            set req.http.X-Forwarded-For =
                req.http.X-Forwarded-For + ", " + client.ip;
        } else {
            set req.http.X-Forwarded-For = client.ip;
        }
    }
}
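
The header handling in this vcl_recv is simple enough to restate in a few lines of Python. This is only a sketch with made-up function names; reading the leftmost entry as the "real" client assumes every proxy in front of the backend is trusted, since a client can send its own X-Forwarded-For.

```python
def forwarded_for(existing, client_ip):
    """Mirror the VCL: append client_ip to an existing header, or start a new one."""
    if existing:
        return existing + ", " + client_ip
    return client_ip


def original_client(header_value):
    """Leftmost entry: the address recorded by the first proxy in the chain."""
    return header_value.split(",")[0].strip()


header = forwarded_for(None, "172.18.250.172")
print(header)
print(original_client(forwarded_for("10.0.0.1", "172.18.250.172")))
```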

Reload the configuration

varnish> vcl.load test2 /etc/varnish/default.vcl
200
VCL compiled.

varnish> vcl.use test2
200

varnish> vcl.list
200
available 0 boot
active 0 test2

Modify the backend web server's configuration file

bash> vim /etc/httpd/conf/httpd.conf
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

bash> systemctl reload httpd

Access the site and check the httpd log

172.18.250.172 - - [23/May/2016:22:40:07 +0800] "GET / HTTP/1.1" 200 24 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36"

The logged address is the real client, not the varnish server (172.18.4.70).

Example 3

Remove a single object from the cache

varnish_server

bash> vim /etc/varnish/default.vcl

# addresses allowed to send PURGE requests
acl purgers {
    "127.0.0.1";
    "172.18.0.0"/16;
}

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purgers) {
            error 405 "Method not allowed";
        }
        return (lookup);
    }
}
sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged";
    }
}
sub vcl_miss {
    if (req.request == "PURGE") {
        purge;
        error 404 "Not in cache";
    }
}
sub vcl_pass {
    if (req.request == "PURGE") {
        error 502 "PURGE on a passed object";
    }
}
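
The decision this VCL makes for a PURGE request can be traced in a short Python sketch. The function purge_status is a name I made up, and it only models the ACL check and the hit/miss outcome, not the vcl_pass case.

```python
import ipaddress

# Same networks as the purgers ACL above.
PURGERS = [
    ipaddress.ip_network("127.0.0.1/32"),
    ipaddress.ip_network("172.18.0.0/16"),
]


def purge_status(method, client_ip, in_cache):
    """Return the status the VCL would answer with, or None for non-PURGE requests."""
    if method != "PURGE":
        return None
    address = ipaddress.ip_address(client_ip)
    if not any(address in network for network in PURGERS):
        return 405  # client is not in the ACL
    return 200 if in_cache else 404


print(purge_status("PURGE", "172.18.4.70", in_cache=True))
print(purge_status("PURGE", "10.0.0.1", in_cache=True))
```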

Reload the configuration

varnish> vcl.load test3 /etc/varnish/default.vcl
200
VCL compiled.
varnish> vcl.use test3
200

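Access and test

To purge an object, the client simply sends an HTTP request for its URL using the PURGE method, for example:

# curl -I -X PURGE http://varniship/path/to/someurl

[Figure: response to the PURGE request]

If the default vcl_recv logic should stay in effect as well, use the following vcl_recv instead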

sub vcl_recv {
    if (req.restarts == 0) {
        if (req.http.x-forwarded-for) {
            set req.http.X-Forwarded-For =
                req.http.X-Forwarded-For + ", " + client.ip;
        } else {
            set req.http.X-Forwarded-For = client.ip;
        }
    }
    if (req.request == "PURGE" ) {
        if (!client.ip ~ purgers) {
            error 405 "Method not allowed.";
        }
    }
    if (req.request != "GET" &&
        req.request != "HEAD" &&
        req.request != "PUT" &&
        req.request != "POST" &&
        req.request != "TRACE" &&
        req.request != "OPTIONS" &&
        req.request != "DELETE" &&
        req.request != "PURGE" ) {
        /* Non-RFC2616 or CONNECT which is weird. */
        return (pipe);
    }
    if (req.request != "GET" && req.request != "HEAD" && req.request != "PURGE") {
        /* We only deal with GET and HEAD by default */
        return (pass);
    }
    if (req.http.Authorization || req.http.Cookie) {
        /* Not cacheable by default */
        return (pass);
    }
    return (lookup);
}

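Example 4

Restrict which source addresses may access a resource

Modify the configuration file

bash> vim /etc/varnish/default.vcl

acl admingroup {
    "127.0.0.1";
    "172.18.0.0"/16;
}
sub vcl_recv {
    # only addresses in admingroup may request URLs containing "login"
    if (req.url ~ "login") {
        if (!client.ip ~ admingroup) {
            error 404 "no permission access";
        }
        return (lookup);
    }
}

Reload the configuration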

varnish> vcl.load test4 /etc/varnish/default.vcl
200
VCL compiled.
varnish> vcl.use test4
200

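[Figure: result of accessing the login URL]

To demonstrate the effect, I modify the configuration file so that my own machine is denied access

bash> vim /etc/varnish/default.vcl

acl admingroup {
    "127.0.0.1";
    # "172.18.0.0"/16;    // this line is commented out
}

Reload the configuration and test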

varnish> vcl.load test5 /etc/varnish/default.vcl
200
VCL compiled.

varnish> vcl.use test5
200

[Figure: result of accessing the login URL after the ACL change]

Example 5

Configuring varnish with multiple backends

bash> vim /etc/varnish/default.vcl

backend web1 {
    .host = "172.18.4.71";
    .port = "80";
}

director webservers random {
    .retries = 5;
    {
        .backend = web1;
        .weight = 2;
    }
    {
        # a backend can also be defined inline
        .backend = {
            .host = "172.18.4.72";
            .port = "80";
        }
        .weight = 3;
    }
}
sub vcl_recv {
    set req.backend = webservers;
    return (lookup);
}

With multiple backends, two scheduling algorithms are commonly used: random and round-robin
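
To get a feel for what the weights 2 and 3 mean, the two algorithms can be simulated in Python. This is a sketch of the idea only, not of varnish's implementation; the function names are made up and the retry and health-check behaviour of the director is left out.

```python
import itertools
import random
from collections import Counter

# (host, weight) pairs mirroring the director above
BACKENDS = [("172.18.4.71", 2), ("172.18.4.72", 3)]


def pick_random(backends, rng):
    """Weighted random choice: a backend is picked in proportion to its weight."""
    hosts = [host for host, _ in backends]
    weights = [weight for _, weight in backends]
    return rng.choices(hosts, weights=weights, k=1)[0]


def round_robin(backends):
    """Round-robin ignores weights and simply cycles through the backends."""
    return itertools.cycle(host for host, _ in backends)


def observed_share(backends, draws, seed=0):
    """Fraction of requests each backend receives over `draws` random picks."""
    rng = random.Random(seed)
    counts = Counter(pick_random(backends, rng) for _ in range(draws))
    return {host: counts[host] / draws for host, _ in backends}


print(observed_share(BACKENDS, 10000))
print(list(itertools.islice(round_robin(BACKENDS), 4)))
```

With weights 2 and 3, the first backend should receive roughly 2/5 of the requests over a large number of picks.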

Reload the configuration

varnish> vcl.load test6 /etc/varnish/default.vcl
200
VCL compiled.

varnish> vcl.use test6
200

Access and test

varnish> backend.list
200
Backend name Refs Admin Probe
default(172.18.4.71,,80) 4 probe Healthy (no probe)
web1(172.18.4.71,,80) 1 probe Healthy (no probe)
webservers[1](172.18.4.72,,80) 1 probe Healthy (no probe)

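Because responses are served from the cache on this machine, the effect of the load balancing is hard to observe, so no test results are shown here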

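Example 6
varnish health checks

Modify the configuration file

bash> vim /etc/varnish/default.vcl

# backend names must match the ones referenced in vcl_recv
backend web1 {
    .host = "server1.example.com";
    .probe = {
        .url = "/";         # URL requested by each probe
        .interval = 5s;     # time between probes
        .timeout = 1s;      # how long to wait for a probe response
        .window = 5;        # number of most recent probes considered
        .threshold = 3;     # probes in the window that must succeed to count as healthy
    }
}
backend web2 {
    .host = "server2.example.com";
    .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
    }
}
sub vcl_recv {
    if (req.url ~ "\.php$") {
        set req.backend = web1;
    } else {
        set req.backend = web2;
    }
    return (pass);
}

Reload the configuration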

varnish> vcl.load test8 default.vcl
200
VCL compiled.

varnish> vcl.use test8
200

Check the backend status

varnish> backend.list
200
Backend name Refs Admin Probe
web1(172.18.4.71,,80) 1 probe Healthy 8/8
web2(172.18.4.72,,80) 1 probe Healthy 8/8

Stop httpd on one of the backends and check again

bash> systemctl stop httpd

varnish> backend.list
200
Backend name Refs Admin Probe
web1(172.18.4.71,,80) 1 probe Sick 1/8
web2(172.18.4.72,,80) 1 probe Healthy 8/8

varnish> backend.list
200
Backend name Refs Admin Probe
web1(172.18.4.71,,80) 1 probe Sick 0/8
web2(172.18.4.72,,80) 1 probe Healthy 8/8
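
The window and threshold settings of a probe can be understood as a sliding window over recent probe results. Here is a minimal Python sketch of that idea; the ProbeWindow class is my own illustration, and details such as how varnish treats a backend before the window has filled up are not modelled.

```python
from collections import deque


class ProbeWindow:
    """Keep the last `window` probe results; healthy if at least `threshold` succeeded."""

    def __init__(self, window=5, threshold=3):
        self.results = deque(maxlen=window)  # old results fall off automatically
        self.threshold = threshold

    def record(self, ok):
        self.results.append(bool(ok))

    @property
    def good(self):
        return sum(self.results)

    @property
    def healthy(self):
        return self.good >= self.threshold


probe = ProbeWindow(window=5, threshold=3)
for ok in [True, True, True, True, True, False, False, False]:
    probe.record(ok)
    print(f"{probe.good}/{probe.results.maxlen}", "Healthy" if probe.healthy else "Sick")
```

This is why a stopped backend does not turn Sick on the first failed probe: enough failures have to accumulate in the window first, and the good count then keeps dropping as older successes fall out.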

Production example

backend shopweb {
    .host = "172.18.4.1";
    .port = "80";
}

acl purge {
    "localhost";
    "127.0.0.1";
    "10.1.0.0"/16;
    "192.168.0.0"/16;
}

sub vcl_hash {
    hash_data(req.url);
    return (hash);
}

sub vcl_recv {
    set req.backend = shopweb;
    # set req.grace = 4h;
    if (req.request == "PURGE") {
        if (!client.ip ~ purge) {
            error 405 "Not allowed.";
        }
        return(lookup);
    }
    if (req.request == "REPURGE") {
        if (!client.ip ~ purge) {
            error 405 "Not allowed.";
        }
        ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url);
        error 200 "Ban OK";
    }
    if (req.restarts == 0) {
        if (req.http.x-forwarded-for) {
            set req.http.X-Forwarded-For = req.http.X-Forwarded-For + ", " + client.ip;
        }
        else {
            set req.http.X-Forwarded-For = client.ip;
        }
    }
    if (req.request != "GET" &&
        req.request != "HEAD" &&
        req.request != "PUT" &&
        req.request != "POST" &&
        req.request != "TRACE" &&
        req.request != "OPTIONS" &&
        req.request != "DELETE") {
        /* Non-RFC2616 or CONNECT which is weird. */
        return (pipe);
    }
    if (req.request != "GET" && req.request != "HEAD") {
        /* We only deal with GET and HEAD by default */
        return (pass);
    }
    if (req.http.Authorization) {
        /* Not cacheable by default */
        return (pass);
    }


    if ( req.url == "/Heartbeat.html" ) {
        return (pipe);
    }
    if ( req.url == "/" ) {
        return (pipe);
    }
    if ( req.url == "/index.jsp" ) {
        return (pipe);
    }

    if (req.http.Cookie ~ "dper=") {
        return (pass);
    }
    if (req.http.Cookie ~ "sqltrace=") {
        return (pass);
    }
    if (req.http.Cookie ~ "errortrace=") {
        return (pass);
    }
    # if ( req.request == "GET" && req.url ~ "req.url ~ "^/shop/[0-9]+$" ) {
    if ( req.url ~ "^/shop/[0-9]+$" || req.url ~ "^/shop/[0-9]?.*" ) {
        return (lookup);
    }

    if ( req.url ~ "^/shop/(\d{1,})/editmember" || req.url ~ "^/shop/(\d{1,})/map" || req.url ~ "^/shop/(\d+)/dish-([^/]+)" ) {
        return (lookup);
    }

    return (pass);
    # return (lookup);
}

sub vcl_pipe {
    return (pipe);
}

sub vcl_pass {
    return (pass);
}

sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged.";
    }
    return (deliver);
}

sub vcl_miss {
    if (req.request == "PURGE") {
        error 404 "Not in cache.";
    }
    # if (object needs ESI processing) {
    #     unset bereq.http.accept-encoding;
    # }
    return (fetch);
}


sub vcl_fetch {
    set beresp.ttl = 3600s;
    set beresp.http.expires = beresp.ttl;
    #set beresp.grace = 4h;
    # if (object needs ESI processing) {
    #     set beresp.do_esi = true;
    #     set beresp.do_gzip = true;
    # }

    if ( req.url ~ "^/shop/[0-9]+$" || req.url ~ "^/shop/[0-9]?.*" ) {
        set beresp.ttl = 4h;
    }

    if ( req.url ~ "^/shop/(\d{1,})/editmember" || req.url ~ "^/shop/(\d{1,})/map" || req.url ~ "^/shop/(\d+)/dish-([^/]+)" ) {
        set beresp.ttl = 24h;
    }

    if (beresp.status != 200){
        return (hit_for_pass);
    }
    return (deliver);
}

sub vcl_deliver {
    if (obj.hits > 0){
        set resp.http.X-Cache = "HIT";
    }
    else {
        set resp.http.X-Cache = "MISS";
    }
    set resp.http.X-Powered-By = "Cache on " + server.ip;
    set resp.http.X-Age = resp.http.Age;
    return (deliver);
}

sub vcl_error {
    set obj.http.Content-Type = "text/html; charset=utf-8";
    set obj.http.Retry-After = "5";
    synthetic {""} + obj.status + " " + obj.response + {""};
    return (deliver);
}

sub vcl_init {
    return (ok);
}

sub vcl_fini {
    return (ok);
}
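
The URL rules in vcl_recv and vcl_fetch above are easier to reason about when they are tried out on a few sample URLs outside varnish. The following Python sketch copies the regular expressions from the VCL and assumes that Python's re module treats these particular patterns the same way as the regex engine in varnish. The function names are made up, and is_looked_up only models the URL rules at the end of vcl_recv, not the method, cookie, or pipe checks before them.

```python
import re

# Patterns copied from the VCL above.
SHOP_PATTERNS = [r"^/shop/[0-9]+$", r"^/shop/[0-9]?.*"]
DETAIL_PATTERNS = [
    r"^/shop/(\d{1,})/editmember",
    r"^/shop/(\d{1,})/map",
    r"^/shop/(\d+)/dish-([^/]+)",
]


def matches_any(url, patterns):
    return any(re.search(pattern, url) for pattern in patterns)


def is_looked_up(url):
    """URL rules at the end of vcl_recv: lookup if any pattern matches, else pass."""
    return matches_any(url, SHOP_PATTERNS) or matches_any(url, DETAIL_PATTERNS)


def ttl_seconds(url):
    """TTL assignment order in vcl_fetch: default, then shop pages, then detail pages."""
    ttl = 3600
    if matches_any(url, SHOP_PATTERNS):
        ttl = 4 * 3600
    if matches_any(url, DETAIL_PATTERNS):
        ttl = 24 * 3600  # assigned last, so it overrides the 4h value
    return ttl


for url in ["/shop/123", "/shop/123/map", "/shop/abc", "/about"]:
    print(url, is_looked_up(url), ttl_seconds(url))
```

One thing this makes visible: the second shop pattern, "^/shop/[0-9]?.*", matches every URL that starts with /shop/, so the stricter "^/shop/[0-9]+$" next to it has no additional effect.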