Chinaunix首页 | 论坛 | 博客
  • 博客访问: 2038338
  • 博文数量: 414
  • 博客积分: 10312
  • 博客等级: 上将
  • 技术积分: 4921
  • 用 户 组: 普通用户
  • 注册时间: 2007-10-31 01:49
文章分类

全部博文(414)

文章存档

2011年(1)

2010年(29)

2009年(82)

2008年(301)

2007年(1)

分类: Mysql/postgreSQL

2008-07-10 14:01:37

   Frequently when using SQL you want data from more than one table at a time. You do this by "joining" the tables in various ways.

   For this section, we will be using two tables in our examples, called a and b. Here is what they look like:

a b
user_iduser_name
1jdoe
2bsmith
3mkanat
user_idemail
1jdoe@example.com
2bsmith@example.com
2bsmith@e2.com

    Note that in the below examples we use SELECT * for simplicity, but SELECT *should never be used in Bugzilla code..

CROSS JOIN

   The least common (but also least understood) type of JOIN is the "cross join," also called the "cross-product join."

    A cross join means, "return all possible combinations of rows in these two tables."

    In SQL, there are two ways to do a cross join of tables a and b:

SELECT * FROM a, b or SELECT * FROM a CROSS JOIN b

    The two pieces of code above are identical(等同的). The first one is called an "implicit(绝对的)" join (because the database just "figures out" that we mean CROSS JOIN, from the comma), and the second is called an "explicit(详述的,直言的)" join (because we specified exactly what we want).

    If we run the SQL above with our a and b tables, the result looks like:

a.user_ida.user_nameb.user_idb.email
1jdoe1jdoe@example.com
1jdoe2bsmith@example.com
1jdoe2bsmith@e2.com
2bsmith1jdoe@example.com
2bsmith2bsmith@example.com
2bsmith2bsmith@e2.com
3mkanat1jdoe@example.com
3mkanat2bsmith@example.com
3mkanat2bsmith@e2.com

    In general, you want to avoid cross joins unless you are certain they are exactly what you need.

INNER JOIN

   An INNER JOIN is one where you try to actually join two tables together based on a column that they have in common. This is the most common type of join.

  For our a and b tables, we have two ways of writing an INNER JOIN:

SELECT * FROM a INNER JOIN b ON a.user_id = b.user_id or SELECT * FROM a, b WHERE a.user_id = b.user_id

   Just like with the cross join, the first version is an "explcit" join, and the second one is an "implicit" join.

The inner join above looks like this:

user_iduser_nameemail
1jdoejdoe@example.com
2bsmithbsmith@example.com
2bsmithbsmith@e2.com

   Note that "mkanat" (with user_id 3) is left out, because he doesn't have an entry in both tables.

LEFT JOIN

   A LEFT JOIN is like an INNER JOIN, but it includes records that have an entry in a but no corresponding entry in b. That is, where "mkanat" was left out in our INNER JOIN above, he would be included in our LEFT JOIN.

   A LEFT JOIN is normally only done one way:

SELECT * FROM a LEFT JOIN b ON a.user_id = b.user_id

   The result of that code looks like this:

user_iduser_nameemail
1jdoejdoe@example.com
2bsmithbsmith@example.com
2bsmithbsmith@e2.com
3mkanatNULL

   There was no "email" for "mkanat," so the database returns NULL there.

   It's called a "left" join because we include entries from the table on the left (table a, note that it's on the left side of the words "LEFT JOIN") that don't have a match in the table on the right side.

   Warning: If we add a WHERE email = ? to the above SQL statement, the row containing mkanat will never be returned. This is because NULL is never equal to anything.

Indexes

Simple Description of Indexes

Indexes speed up SELECT statements.

Normally, if you have a table of 10,000 rows, and you want row number 8000, the database has to go through all 10,000 rows to find your row. That's called a "table scan," and it's usually slow.

With an index, you ask it for "row 8000" and it gives it to you instantly.

More Details

OK, so in reality you wouldn't be asking for "row 8000," you'd be doing something like WHERE bugs.bug_id = 8000 in your SQL.

In order for the database to do that quickly, it needs an index on the bugs table, on the bug_id column.

When To Add An Index

Generally, you should add an index any time you plan to use a column in a WHERE clause, or in the "ON" part of a JOIN statement.

Adding too many indexes to a table will slow down INSERT and UPDATE statements, so don't add indexes you don't need.

Multi-Column Indexes

Many databases (MySQL in particular) will only use one index per table per query.

That is, imagine that we do: SELECT * FROM bugs WHERE priority = 'P1' OR op_sys = 'Windows'. We have two indexes on the bugs table, one for "priority" and one for "op_sys." Only one of them will be used. The one that's used is up to the database, it will try to pick the "best" one to use, and then do a table scan for the other one.

If we want that to be fast, we can create a single index that contains the values for both priority and op_sys. That is called a multi-column index.

Multi-column indexes are always in a certain order. For example, either "op_sys, priority" or "priority, op_sys." For your purposes, those two indexes are totally identical, with one exception: The first column of a multi-column index can be used like it's a single-column index. So, for example, having an index on both "priority, op_sys" and "priority" would be redundant. Don't do it.

Write Queries With Indexes In Mind

When writing a query, give a brief thought to how it will use indexes. Don't try to solve performance problems before you know they exist (remember our rules from the top of the Developer's Guide), but do give a brief thought to it, particularly the fact that only one index will be used per table, per query, by common databases.

Cross-Database Query Compatibility

Different databases support different things, and different databases use different syntax for similar features. Bugzilla->dbh provides many functions that will generate correct SQL for the database that Bugzilla is using. These functions all have names that start with sql_.

Here are the MySQL constructs that must be replaced with Bugzilla::DB sql_ functions:

  • LOCK TABLES / UNLOCK TABLES (use bz_lock_tables and bz_unlock_tables instead)
  • BEGIN/START [TRANSACTION] / COMMIT [TRANSACTION]
  • REGEXP / NOT REGEXP
  • LIMIT
  • TO_DAYS
  • DATE_FORMAT
  • INTERVAL
  • CONCAT
  • INSTR / POSITION
  • GROUP BY
  • MATCH or anything having to do with "fulltext" indexes.

Also, if you absolutely need to do a case-insensitive string comparison, use sql_istrcmp instead of = or other operators.

The following SQL constructs should never be used. Replace them with something else instead:

  • ENCRYPT() - use Perl's encrypt function instead.
  • bit arithmetic (AND, OR and NOT on numbers)
  • REPLACE INTO - instead use a check if the row is there, and then either an INSERT or an UPDATE.
  • double quotes (") for string literals - instead use single quotes or $dbh->quote().
  • IFNULL (use COALESCE)
  • Columns of MySQL type ENUM or TIMESTAMP.
  • Any boolean comparison that is inside the fields of a SELECT statement. For example SELECT (column IS NOT NULL) FROM b will not work. (Use CASE WHEN column IS NOT NULL THEN 1 ELSE 0 END instead.)
  • WHERE statements without an operator. For example, SELECT column FROM b WHERE a will not work. That needs to have: WHERE a = 1.
  • IF - use CASE WHEN instead.
  • INSERT statements with multiple groups of parentheses after VALUES. Instead, just call INSERT multiple times.

The following query features are not currently available in all databases that Bugzilla supports, and so should be avoided:

  • Transactions (START TRANSACTION or COMMIT)
  • Subselects (SELECT * FROM (SELECT * FROM b))

A Note About INNER JOIN

In general, implicit INNER JOINs (FROM a, b WHERE a.user_id = b.user_id) are faster than explicit INNER JOINs (FROM a INNER JOIN b ON a.user_id = b.user_id). This is because explicit INNER JOINs bypass the query optimizer on some databases and specify that the database must use that table order.

However, you can't mix explicit and implicit joins. It works on older MySQL versions (before 5.0), but that's it. So, if you have any LEFT JOIN statements in your SQL, you cannot have any "comma" joins.

In general, the performance penalty for explicit INNER JOINs should only happen on older databases (PostgreSQL before 7.4, for example), so don't worry about it too much. When in doubt, do an explicit join.

Style

  • Bugzilla uses a capitalisation convention for SQL. SQL keywords such as SELECT, WHERE and AND should be all upper case. Even function names like NOW() should be upper-case. Database, table and field names should be all lower case, for example:

    SELECT fieldname1 FROM tablename WHERE fieldname2 = 2 AND fieldname3 = 'HELLO' AND time = NOW()

  • Indentation: For short SQL statements, you can indent them in whichever way you like. However, for long SQL statements, the recommended formatting is like this:
    SELECT DISTINCT groups.id, groups.name, groups.description
    FROM groups
    CROSS JOIN user_group_map
    CROSS JOIN group_group_map AS ggm
    WHERE user_group_map.user_id = ?
    AND ((user_group_map.isbless = 1
    AND groups.id = user_group_map.group_id)
    OR (groups.id = ggm.grantor_id
    AND ggm.grant_type = ?
    AND ggm.member_id IN(1, 2, 3))))
    ORDER BY groups.id

    Note how all the major keywords are right-aligned. This makes it very easy to scan through the statement. Also, look at how the WHERE clause is aligned--in such a way as to show the logical groupings, but without a lot of extra spacing.

    Also, because it was such a complex query, we put the table name at the beginning of each column name. This makes it much easier to see where data is coming from, even if it's not always necessary. (For simple queries that query only one table, don't do that. It's obvious where the data is coming from.)

阅读(1467) | 评论(0) | 转发(0) |
0

上一篇:Checkbox 讲解

下一篇:SQL JOIN

给主人留下些什么吧!~~