Opinions differ on whether to use CHAR or VARCHAR for a MySQL column. How should you choose in practice? High Performance MySQL, 3rd Edition covers this in detail:
VARCHAR
VARCHAR stores variable-length character strings and is the most common string data type. It can require less storage space than fixed-length types, because it uses
only as much space as it needs (i.e., less space is used to store shorter values). The exception is a MyISAM table created with ROW_FORMAT=FIXED, which uses a fixed
amount of space on disk for each row and can thus waste space.
VARCHAR uses 1 or 2 extra bytes to record the value’s length: 1 byte if the column’s
maximum length is 255 bytes or less, and 2 bytes if it’s more. Assuming the latin1 character set, a VARCHAR(10) will use up to 11 bytes of storage space. A
VARCHAR(1000) can use up to 1002 bytes, because it needs 2 bytes to store length information.
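In other words, VARCHAR uses one or two bytes to record a value's actual length. One byte is enough if the column's maximum length is 255 bytes or less (n <= 255 in a single-byte character set such as latin1); two bytes are needed if it is more.
So when the actual values are short, declare the smallest n that fits them, to avoid wasting space.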
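The length-prefix rule above can be checked with a little arithmetic. This is a minimal Python sketch, assuming a fixed maximum number of bytes per character (1 for latin1). The function name `varchar_max_bytes` is made up for illustration and is not part of MySQL.

```python
def varchar_max_bytes(n_chars, max_bytes_per_char=1):
    """Upper bound on the storage of one VARCHAR(n_chars) value."""
    data_bytes = n_chars * max_bytes_per_char
    # One length byte if the column's maximum length is 255 bytes or less,
    # two length bytes if it is more.
    length_bytes = 1 if data_bytes <= 255 else 2
    return data_bytes + length_bytes


print(varchar_max_bytes(10))    # 11
print(varchar_max_bytes(1000))  # 1002
```

Because the rule counts bytes rather than characters, a column declared with fewer than 256 characters can still need the 2-byte prefix in a multi-byte character set: `varchar_max_bytes(100, max_bytes_per_char=3)` gives 302 in this model.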
VARCHAR helps performance because it saves space. However, because the rows are variable-length, they can grow when you update them, which can cause extra work.
If a row grows and no longer fits in its original location, the behavior is storage engine–dependent. For example, MyISAM may fragment the row, and InnoDB
may need to split the page to fit the row into it. Other storage engines may never update data in-place at all.
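Because VARCHAR is variable-length, updating the column can make the row grow, which causes extra work. For example, MyISAM may fragment the row, and InnoDB may need to split the page to make room for it.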
It’s usually worth using VARCHAR when the maximum column length is much larger than the average length; when updates to the field are rare, so fragmentation is not
a problem; and when you’re using a complex character set such as UTF-8, where each character uses a variable number of bytes of storage.
In version 5.0 and newer, MySQL preserves trailing spaces when you store and retrieve values. In versions 4.1 and older, MySQL strips trailing spaces.
It’s trickier with InnoDB, which can store long VARCHAR values as BLOBs. We discuss this later.
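When the maximum length is much larger than the average length and the column is rarely updated, fragmentation is either unlikely or its performance cost is negligible, so VARCHAR is a good choice. In MySQL 4.1 and older,
trailing spaces were stripped when values were returned; from 5.0 on, they are preserved. InnoDB can store very long VARCHAR values as BLOBs.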
CHAR
CHAR is fixed-length: MySQL always allocates enough space for the specified number of characters. When storing a CHAR value, MySQL removes any trailing spaces.
(This was also true of VARCHAR in MySQL 4.1 and older versions—CHAR and VARCHAR were logically identical and differed only in storage format.) Values are padded
with spaces as needed for comparisons.
CHAR is useful if you want to store very short strings, or if all the values are nearly
the same length. For example, CHAR is a good choice for MD5 values for user passwords, which are always the same length. CHAR is also better than VARCHAR for data
that’s changed frequently, because a fixed-length row is not prone to fragmentation. For very short columns, CHAR is also more efficient than VARCHAR; a CHAR(1)
designed to hold only Y and N values will use only one byte in a single-byte character set, but a VARCHAR(1) would use two bytes because of the length byte.
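In MySQL 4.1 and older, CHAR and VARCHAR handled trailing spaces the same way: both stripped them, and the two types differed only in storage format. From 5.0 on, CHAR still removes
trailing spaces, while VARCHAR preserves them. Keep two points in mind when choosing between VARCHAR and CHAR: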
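[1] CHAR suits very short strings, or strings that are all nearly the same length. For example, CHAR is a good fit for storing MD5 password hashes, because MD5 output always has the same length.
[2] For columns that change frequently, CHAR performs better, because a fixed length does not cause fragmentation.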
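The storage and trailing-space differences described above can be sketched in a few lines of Python. This is a simplified model, assuming a single-byte character set; all function names are hypothetical and only mimic the behaviour the book describes for MySQL 5.0 and newer.

```python
import hashlib


def char_storage_bytes(n_chars, bytes_per_char=1):
    # CHAR(n) always reserves room for n characters.
    return n_chars * bytes_per_char


def varchar_storage_bytes(value, n_chars, bytes_per_char=1):
    # VARCHAR stores only the actual value plus a 1- or 2-byte length prefix.
    length_bytes = 1 if n_chars * bytes_per_char <= 255 else 2
    return len(value) * bytes_per_char + length_bytes


def char_roundtrip(value):
    # CHAR removes trailing spaces.
    return value.rstrip(" ")


def varchar_roundtrip(value):
    # VARCHAR (5.0 and newer) preserves trailing spaces.
    return value


digest = hashlib.md5(b"example").hexdigest()  # always 32 hex characters
print(char_storage_bytes(32), varchar_storage_bytes(digest, 32))  # 32 33
print(char_storage_bytes(1), varchar_storage_bytes("Y", 1))       # 1 2
print(repr(char_roundtrip("abc  ")), repr(varchar_roundtrip("abc  ")))
```

For values that always fill the declared length, such as a 32-character MD5 hex digest or a single Y/N flag, the VARCHAR length byte is pure overhead in this model.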
The sibling types for CHAR and VARCHAR are BINARY and VARBINARY, which store binary strings. Binary strings are very similar to conventional strings, but they store bytes
instead of characters. Padding is also different: MySQL pads BINARY values with \0 (the zero byte) instead of spaces and doesn’t strip the pad value on retrieval.
These types are useful when you need to store binary data and want MySQL to compare the values as bytes instead of characters. The advantage of byte-wise comparisons is
more than just a matter of case insensitivity. MySQL literally compares BINARY strings one byte at a time, according to the numeric value of each byte. As a result, binary
comparisons can be much simpler than character comparisons, so they are faster.
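BINARY and VARBINARY are the binary counterparts of CHAR and VARCHAR. They store bytes rather than characters. They differ from character strings as follows:
BINARY vs CHAR
[1] Values are stored as bytes, with no notion of character set or collation.
[2] Padding differs: when a value is shorter than the defined length, BINARY pads it with \0 (CHAR pads with spaces).
[3] Retrieved values can differ: the trailing \0 padding is not stripped (CHAR strips trailing spaces).
[4] Byte-wise comparisons are more efficient with BINARY, because it ignores character set and collation and simply compares the values one byte at a time.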
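The padding and comparison differences listed above can be illustrated with Python's `bytes` type, which also compares byte by byte according to numeric value. This is only a sketch: the helper names are hypothetical, and `casefold()` stands in for a case-insensitive collation rather than reproducing MySQL's actual collation rules.

```python
def binary_store(value: bytes, n: int) -> bytes:
    # BINARY(n) pads short values with zero bytes, and the padding
    # is not stripped on retrieval.
    if len(value) > n:
        raise ValueError("value longer than declared length")
    return value.ljust(n, b"\0")


def bytewise_equal(a: bytes, b: bytes) -> bool:
    # Binary comparison: every byte must match exactly.
    return a == b


def case_insensitive_equal(a: str, b: str) -> bool:
    # Stand-in for comparing character strings under a
    # case-insensitive collation.
    return a.casefold() == b.casefold()


print(binary_store(b"ab", 4))             # b'ab\x00\x00'
print(bytewise_equal(b"a", b"A"))         # False
print(case_insensitive_equal("a", "A"))   # True
print(b"A" < b"a")                        # True, because 65 < 97
```

The byte-wise check needs no knowledge of character sets or collations, which is why the book describes binary comparisons as simpler and faster.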
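Projects use VARCHAR all the time. Given VARCHAR(N) and VARCHAR(M), with M > N and M <= 255, which one is more efficient? The book's sidebar "Generosity Can Be Unwise" answers this: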
Storing the value 'hello' requires the same amount of space in a VARCHAR(5) and a VARCHAR(200) column. Is there any advantage to using the shorter column?
As it turns out, there is a big advantage. The larger column can use much more memory, because MySQL often allocates fixed-size chunks of memory to hold values internally.
This is especially bad for sorting or operations that use in-memory temporary tables. The same thing happens with filesorts that use on-disk temporary tables.
The best strategy is to allocate only as much space as you really need.
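VARCHAR(N) is more efficient than VARCHAR(M): the longer the declared length, the more memory MySQL allocates internally in fixed-size chunks. This matters most for sorting and for operations that use in-memory temporary tables, where the extra memory needed
may force the data to spill to disk temporarily. So declare only the length you actually need: the shorter the declared VARCHAR length, the lower the memory overhead.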
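To get a feel for the scale, here is a deliberately simplified Python model. It assumes the worst case, where an internal buffer reserves the full declared width for every row. Actual MySQL behaviour depends on the version, the storage engine, and the operation, so treat the numbers as an illustration only. The function name is hypothetical.

```python
def fixed_width_buffer_bytes(rows, declared_chars, max_bytes_per_char=1):
    # Worst-case model: each row reserves the full declared width,
    # regardless of how short the stored value really is.
    return rows * declared_chars * max_bytes_per_char


rows = 1_000_000
print(fixed_width_buffer_bytes(rows, 5))    # 5000000
print(fixed_width_buffer_bytes(rows, 200))  # 200000000
```

Under this assumption, storing 'hello' in a VARCHAR(200) instead of a VARCHAR(5) costs 40 times as much buffer space for the same data.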