MySQL InnoDB undo & redo: a first look - skybin090804 - ChinaUnix blog

Sky_欧彬 (skybin090804.blog.chinaunix.net)


Category: Mysql/postgreSQL

2011-01-05 12:08:50

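The speed gap at each level, between CPU and memory and between memory and disk, keeps us looking for ways to go faster. For web pages, for example, page caches such as squid, varnish and nginx cache speed up page delivery, and data caches such as memcache speed up access at the application layer. How, then, does a database reduce scattered disk reads and writes and speed up data access? Oracle kept optimizing this from the i releases through the g releases (moving from rollback segments to rollback tablespaces along the way), making ever better use of its redo and undo logs. But what does the picture look like for InnoDB, the transactional storage engine in MySQL?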
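When a user request changes data, InnoDB works on the data and index pages in an in-memory buffer pool. If every change to data or indexes had to be written to disk immediately, the number of I/O requests would rise sharply, and because each update lands at a random location the disk head would have to seek constantly, which is slow. Keeping the data in memory for a while also speeds up reads to some extent. So after processing a request (a transaction) InnoDB only appends a log record, and a separate thread later writes the accumulated changes to disk in batches, which makes disk writes far more efficient. (^-^ This may sound a bit like Oracle. I used to work on Oracle, so even simple MySQL topics make me think of the Oracle architecture; personally I think all databases work on much the same principles.)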
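Since InnoDB uses a memory buffer to improve response time, memory and disk will naturally get out of sync. The general term is dirty data; MySQL calls it a dirty page. It comes about like this: when a transaction needs to modify a row, InnoDB reads the page containing that row from disk into the buffer pool, and the row is then modified in the in-memory page. The page in the buffer pool now differs from the copy on disk, and the in-memory copy is called a dirty page. A dirty page waits to be flushed to disk. That is what MySQL means by a dirty page (I think the term fits any caching layer equally well).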
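The dirty page idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `BufferPool`, its methods and the `disk` dict are hypothetical names invented for illustration and have nothing to do with InnoDB's real data structures.

```python
class BufferPool:
    """Toy model: pages live on 'disk' (a dict) and are cached in memory."""

    def __init__(self, disk):
        self.disk = disk      # page_id -> contents; stands in for the data files
        self.pages = {}       # page_id -> contents cached in memory
        self.dirty = set()    # page_ids whose cached copy differs from disk

    def read(self, page_id):
        # Load the page from disk on first access, then serve it from memory.
        if page_id not in self.pages:
            self.pages[page_id] = self.disk[page_id]
        return self.pages[page_id]

    def write(self, page_id, contents):
        # Modify only the in-memory copy; the page is now dirty.
        self.read(page_id)
        self.pages[page_id] = contents
        self.dirty.add(page_id)

    def flush(self):
        # Write every dirty page back to disk and mark it clean.
        for page_id in sorted(self.dirty):
            self.disk[page_id] = self.pages[page_id]
        self.dirty.clear()


disk = {1: "row=a", 2: "row=b"}
pool = BufferPool(disk)
pool.write(1, "row=A")
print(pool.dirty, disk[1])   # {1} row=a  (disk still holds the old value)
pool.flush()
print(pool.dirty, disk[1])   # set() row=A
```

Between `write` and `flush` the only up-to-date copy of page 1 is in memory, which is exactly the window in which a power failure loses data.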
Since dirty pages live in the buffer pool, are the modifications in them lost if the system suddenly loses power? Yes: the contents of the buffer pool are not persistent.
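A system failure can leave the database inconsistent for two reasons:
1. Updates made by unfinished transactions may already have been written to the database.
2. Updates made by committed transactions may still be in the buffer, not yet written to the database.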
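Let us first look at the general recovery method:
(1) Scan the log file forward (from start to end) and find the transactions that committed before the failure (those with both a begin transaction record and a commit record); put their identifiers in the redo queue. At the same time, find the transactions that were unfinished when the failure occurred (a begin transaction record but no commit); put their identifiers in the undo queue.
(2) Undo every transaction in the undo queue: scan the log file backward and apply the inverse of each update made by those transactions, that is, write the "before" value from the log record into the database.
(3) Redo every transaction in the redo queue: scan the log forward and re-apply each logged operation of those transactions, that is, write the "after" value from the log record into the database.
These three steps hold for any database system.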
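The three recovery steps above can be sketched as follows. This is a minimal illustration, not InnoDB's actual recovery code: the log record layout (`("begin", tx)`, `("update", tx, key, old, new)`, `("commit", tx)`) and the function name `recover` are assumptions made for the example.

```python
def recover(log, db):
    """Apply the generic three-step recovery to `db` (a dict) in place."""
    begun, committed = set(), set()

    # Step 1: forward scan to sort transactions into redo and undo queues.
    for rec in log:
        if rec[0] == "begin":
            begun.add(rec[1])
        elif rec[0] == "commit":
            committed.add(rec[1])
    undo_queue = begun - committed
    redo_queue = committed

    # Step 2: backward scan, restoring the "before" value of every
    # update made by an unfinished transaction.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] in undo_queue:
            _, _, key, old, _new = rec
            db[key] = old

    # Step 3: forward scan, re-applying the "after" value of every
    # update made by a committed transaction.
    for rec in log:
        if rec[0] == "update" and rec[1] in redo_queue:
            _, _, key, _old, new = rec
            db[key] = new
    return db


log = [
    ("begin", "T1"), ("update", "T1", "x", 1, 2), ("commit", "T1"),
    ("begin", "T2"), ("update", "T2", "y", 10, 20),   # T2 never committed
]
# State on disk at the crash: T1's change was not yet flushed, T2's was.
db = {"x": 1, "y": 20}
print(recover(log, db))   # {'x': 2, 'y': 10}
```

The example covers both causes of inconsistency: the committed change to `x` that never reached disk is redone, and the uncommitted change to `y` that did reach disk is undone.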
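To guard against losing the contents of the buffer pool, MySQL likewise maintains two logs during normal operation, redo and undo, which record the relevant information. At every commit, the transaction's changes are immediately recorded in the redo log. So even if the dirty pages in the buffer pool are lost in a power failure, InnoDB can still recover the data at startup from the records in the redo log and the undo log.
The concrete procedure is: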

The redo log cannot be allowed to grow without limit either, so when are dirty pages flushed to disk?
   1. The redo log, which is organized as a ring buffer, is full. To free up some space we will write
      out dirty pages in redo log order so that we can advance the trailing pointer of the redo log ring
      buffer and make some room.

      This situation is a log wait and will be registered in the status counter Innodb_log_waits.

   2. InnoDB requires a free page from the InnoDB buffer pool but cannot find one. Usually we can free a page in
      the buffer pool by giving up a page that is not marked dirty. When a page is not marked dirty its contents
      can be reloaded from disk at any time and so we can safely give it up in memory. But when the buffer pool
      holds only dirty pages this is impossible and we actually have to flush dirty pages to disk before we can
      free them up for other uses.

      This situation is called Innodb_buffer_pool_wait_free and will be registered in a status counter of the
      same name. InnoDB tries to avoid this situation: whenever more than innodb_max_dirty_pages_pct percent of
      the pages are marked dirty, a checkpoint is forced and dirty pages are written. The threshold
      innodb_max_dirty_pages_pct can be set as a startup parameter.

   3. InnoDB feels idle and will write out batches of 64 pages each to disk once a second.

      This is normal and will not be specifically registered (but will of course bump Innodb_pages_written like everything else).
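The first case, a full ring-shaped redo log forcing dirty pages out in redo log order, can be sketched as below. The class `RingRedoLog`, its fields, and the use of one LSN per record are all simplifying assumptions for illustration; InnoDB's real log is byte-addressed and considerably more involved.

```python
from collections import OrderedDict


class RingRedoLog:
    """Toy ring-shaped redo log with room for `capacity` records."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.head_lsn = 0          # LSN the next record will receive
        self.checkpoint_lsn = 0    # records below this LSN may be overwritten
        # page_id -> LSN of its oldest unflushed change, in redo log order
        self.dirty_pages = OrderedDict()
        self.flushed = []          # page_ids written to disk, for inspection

    def used(self):
        return self.head_lsn - self.checkpoint_lsn

    def log_change(self, page_id):
        # When the ring is full, flush the oldest dirty pages to make room.
        while self.used() >= self.capacity:
            self._flush_oldest()
        # A page keeps the LSN (and queue position) of its first change.
        self.dirty_pages.setdefault(page_id, self.head_lsn)
        self.head_lsn += 1

    def _flush_oldest(self):
        page_id, _ = self.dirty_pages.popitem(last=False)
        self.flushed.append(page_id)
        # The trailing pointer advances to the oldest change still unflushed.
        if self.dirty_pages:
            self.checkpoint_lsn = next(iter(self.dirty_pages.values()))
        else:
            self.checkpoint_lsn = self.head_lsn


redo = RingRedoLog(capacity=3)
for page in ["A", "B", "A", "C"]:
    redo.log_change(page)
print(redo.flushed, redo.checkpoint_lsn, redo.head_lsn)   # ['A'] 1 4
```

The fourth change finds the ring full, so page A (holder of the oldest log record) is flushed first and the trailing pointer moves from LSN 0 to LSN 1, the oldest change of page B.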

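The two main parameters involved in the three cases above are innodb_flush_log_at_trx_commit and innodb_max_dirty_pages_pct;
the status counters Innodb_log_waits and Innodb_buffer_pool_wait_free show how often the first two cases occur.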
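As a rough monitoring aid, the counters can be combined with the dirty page ratio. The sketch below assumes the status values have already been fetched (for example with SHOW GLOBAL STATUS) into a plain dict; the function name `dirty_page_report` and the returned keys are hypothetical. The status names themselves are real MySQL counters; note that the log wait counter is spelled `Innodb_log_waits`.

```python
def dirty_page_report(status, max_dirty_pages_pct=90):
    """Summarize flush pressure from a dict of InnoDB status counters."""
    total = int(status["Innodb_buffer_pool_pages_total"])
    dirty = int(status["Innodb_buffer_pool_pages_dirty"])
    pct = 100.0 * dirty / total if total else 0.0
    return {
        "dirty_pct": pct,
        # True when the dirty ratio exceeds innodb_max_dirty_pages_pct
        "over_threshold": pct > max_dirty_pages_pct,
        # Case 1: waits caused by the log
        "log_waits": int(status.get("Innodb_log_waits", 0)),
        # Case 2: waits for a free buffer pool page
        "wait_free": int(status.get("Innodb_buffer_pool_wait_free", 0)),
    }


sample = {
    "Innodb_buffer_pool_pages_total": "1000",
    "Innodb_buffer_pool_pages_dirty": "950",
    "Innodb_log_waits": "3",
}
print(dirty_page_report(sample))
```

Values arrive as strings here because that is how most client libraries return SHOW STATUS rows; the sample numbers are made up.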

innodb_flush_log_at_trx_commit
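The default value of 1 means that at every transaction commit, and for every statement outside a transaction, the log must be flushed to disk, which is expensive, especially if you do not have a battery backed up cache. Setting it to 2 is fine for many applications, especially those converted from MyISAM tables; it means the log is not flushed to disk at commit but only written to the operating system cache. The log is still flushed to disk every second, so you will normally not lose more than 1-2 seconds of updates. Setting it to 0 is a bit faster still but less safe: you can lose transactions even if only the MySQL server crashes. With a value of 2 you can lose data only if the whole operating system goes down.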
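The trade-off between the three settings can be captured in a small toy model. It only encodes the description above (where a freshly committed transaction's log record sits until the next once-per-second flush); the function names and the layer labels are hypothetical.

```python
def where_commit_lands(setting):
    """Layer holding a just-committed transaction's log record."""
    return {0: "log_buffer", 1: "disk", 2: "os_cache"}[setting]


def survives(setting, crash):
    """True if a just-committed transaction survives `crash`
    ("mysqld" or "os") occurring before the next periodic flush."""
    lost_layers = {
        "mysqld": {"log_buffer"},            # process memory is gone
        "os": {"log_buffer", "os_cache"},    # OS cache is gone too
    }[crash]
    return where_commit_lands(setting) not in lost_layers


for setting in (0, 1, 2):
    print(setting, survives(setting, "mysqld"), survives(setting, "os"))
```

Only setting 1 survives both kinds of crash; setting 2 survives a MySQL crash but not an operating system crash; setting 0 survives neither.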

innodb_max_dirty_pages_pct
This is an integer in the range from 0 to 100. The default value is 90.
The main thread in InnoDB tries to write pages from the buffer pool so that the
percentage of dirty (not yet written) pages will not exceed this value.

Figure: how the above maps onto MySQL

PS: this figure also answers the question often asked on the web, "Does MySQL have undo and redo logs?"
