对于一些公司,通常通过mysql blob来存储文件。对于Mysql 55以后,文件格式采用了Barracuda的新格式,这种格式有个明显的功能就是可以压缩每一行,根据key_block_size来选择需要压缩的大小,对于Blob的大文件而言,无疑为mysql使用上有多了一个选择!
来看一个压缩的案例:
mysql> create table comptest(b mediumblob);
Query OK, 0 rows affected (0.05 sec)
mysql>insert into comptest values(repeat('a',1000000));
Query OK, 1 row affected (0.02 sec)
mysql> show table status like "comptest" \G
*************************** 1. row ***************************
Name: comptest
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 1
Avg_row_length: 1589248
Data_length: 1589248
Max_data_length: 0
Index_length: 0
Data_free: 0
Auto_increment: NULL
Create_time: 2014-01-15 22:24:31
Update_time: NULL
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)
mysql> alter table comptest row_format=compressed;
Query OK, 1 row affected (0.03 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> show table status like "comptest" \G
*************************** 1. row ***************************
Name: comptest
Engine: InnoDB
Version: 10
Row_format: Compressed
Rows: 1
Avg_row_length: 16384
Data_length: 16384
Max_data_length: 0
Index_length: 0
Data_free: 0
Auto_increment: NULL
Create_time: 2014-01-15 22:26:44
Update_time: NULL
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options: row_format=COMPRESSED
Comment:
1 row in set (0.00 sec)
可以看到在使用mysql5.5的barracuda文件格式的compress之后,发现既然达到100倍左右的压缩率。
在Antelope文件格式下的,存在compact和redundant行格式,Innodb会尝试将全部的行内容存入Innodb page.而且,每个页至少存两行数据,这就限制了每个行不得超过8000byte。当然,新的文件格式,也就是Barracuda格式的
row格式,允许使用压缩的方式来存储。在这种存储格式下,Innodb在row page上存储所有的blob内容,如果blob不能完全存储在page上,innodb将只存储一个20 byte的指针指向真正存储blog内容的地址。这样,你就可
以在一个row page上存储更多的行。在Barracuda存储格式下,blob可以有前缀索引,不在需要blob前缀部分存在row page上(如5.5以前的Antelope格式,对于超过8000byte的blog,只存储前768byte,接着一个20byte的指
针指向剩下的blog的地址,前缀索引部分必须是前768byte字符)。所以,你可以在blob字段上建立前缀索引,而这些blog可以存放在row page以外分配的空间。
So BLOB storage was not very efficient in REDUNDANT (MySQL 4.1 and below) and COMPACT (MySQL 5.0 and above) format and the fix comes with Innodb Plugin in “Barracuda” format and ROW_FORMAT=DYNAMIC. In this format Innodb stores either whole blob on the row page or only 20 bytes BLOB pointer giving preference to smaller columns to be stored on the page, which is reasonable as you can store more of them. BLOBs can have prefix index but this no more requires column prefix to be stored on the page – you can build prefix indexes on blobs which are often stored outside the page.
为什么不用文件系统存储,而mysql只用存储Link呢?
1.在文件过大的情况下,需要监控innode的使用情况,并根据需要调整innode。
2.在文件夹过多的情况下,需要根据文件的类别将文件放在多个子文件夹下,这样你就要确保你的应用程序合理的管理你的文件夹,比如创建,删除空文件夹等。
3.在文件夹中文件数量过多的情况下,ls操作也会带来一定的影响。
所以有时候使用db来存储文件,使用Innodb来存储文件的有效方式如下:
A) 建立一个table,只有两个字段,一个是自增主键id,一个是blob的值。
B)所有关于这个blob值的元数据信息,存放在另一个单独的表里面,通过id主键来和第一个表关联。
C)当你有复杂查询的时候,只需和元数据信息表进行关联,再通过元数据信息,就能得到blob表中单行信息。
当然,你在设计上述表的时候,需要注意一下几点:
1).对于在Innodb 中通过blob存储的单个大文件,根据主键来查询该blob。如果文件过多过大,导致磁盘是瓶颈,通过行压缩将代价转换到CPU上从而缓解磁盘的压力。
2).一个设计良好的元数据信息表,需要存储blob的所有相关信息。包括是否使用前50个字段作为前缀索引等。
3).对于一个表中有多个blob字段的表,可以考虑通过将这些blob存放到一个列中,通过应用程序来对这个blob进行拆分,减少磁盘碎片。
I agree with you in general, but there are applications where storing tons of files in a filesystem is bad.
I ran into problems having many thousands (millions?) of files, where I ran out of inodes in the filesystem I was using. Besides disk size, I did not want to have to worry about keeping track of inodes nor extra options to mkfs to increase inode counts whenever deploying a new server or increasing disk or logical volume size.
If you split things into multiple subdirectories then you also have to ensure your application(s) maintain those directories — create new, delete old when empty, etc.
Also doing an ‘ls’ on a huge directory will tie things up on your server.
The more efficient thing to do is:
A) create 1 table that has ONLY an id sequence column set as primary index, and 2nd column for the blob. This table could conceivably be in a different database if you so choose.
B) All other metadata about this blob must be held in a separate table, and the id column used to associate rows (1-to-1 relationship).
C) Do your hefty queries (with joins, complex where clauses, etc) ONLY against the metadata table, and when you actually need the data from the blob query just that single row from your blob table.
What you achieve:
1) A single large file in the filesystem for that 1 InnoDB table that holds the blob. Queries against this table will use the primary index, and will only pull from the actual table what you need.
If disk space usage is important, use row compression and pay the CPU price on your DB server, or have your application compress/decompress upon INSERT/SELECT to pay that price on your app. server.
2) A well-designed meta-data table will contain everything you need to know about that blob, and you could design things to even contain a field that is, say, the first 50 bytes of your blob — if you need that for indexing or searching.
If you think you need multiple blob columns, then as Peter suggested, consider combining them into 1 blob and on the application side splitting up the blobs. That might not so easy either, so maybe use multiple “blob-only” tables that have just a few blob columns each — less than 10 he said was problematic.
参考 :http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/
阅读(2969) | 评论(0) | 转发(0) |