Chinaunix首页 | 论坛 | 博客
  • 博客访问: 249712
  • 博文数量: 27
  • 博客积分: 2010
  • 博客等级: 大尉
  • 技术积分: 1675
  • 用 户 组: 普通用户
  • 注册时间: 2005-05-19 20:43
文章分类
文章存档

2008年(27)

我的朋友

分类: Mysql/postgreSQL

2008-06-16 23:07:39

(Open Source Analytics Evangelist) Posted 3/30/2006

In the next post, we'll start doing some hands on. But before that we need to select a Data Warehouse Platform. Don't worry, I am not going to nudge you towards some commercial "Data Warehouse Solution" here! As promised, it will be open source and free.

Since a Data Warehouse is essentially a database, we could use any standard database for the purpose. But since we aren't making just a toy to impress people with, let's stop for a moment and look at what we need. So lets go over that...

Unlike an OLTP system that needs to write very fast, we would be better off with a database that can read lots of data faster because that is what we would be doing most of the time. (See previous post Database vs Data Warehouse).

We would also like the system to be production grade, robust, 24x7, and common enough so that finding expertise is not impossible. Do note that 24x7 availability may not be as critical for a DW as it would be for the OLTP servicing your bank's ATM or rerouting your flights.

Demonstrated ability to handle huge (really HUGE) amounts of data is a must as well, and it should allow high-speed bulk loading in batch jobs. And it would be great if there was a way to split very huge tables into smaller ones (like Oracle partitioning).

And then you would need transaction support. Right? Wrong.

Huh? Well, look at it this way. If your OLTP system did not have transactions, we'd be dead. But a DW is not a transaction oriented system. It would be more efficient to use a non-transactional database as you do away with the overheads required for enforcing transactions. This is a debatable point, but a non-transactional database would be a lot faster than a transactional database in simple read write tasks.

MySQL, configured right, makes an obvious and affordable choice for Data Warehousing.

MySQL's non-transactional MyISAM db engine is one of the fastest around when it comes to querying large amounts of data. You don't get transaction support, and that's alright because you don't need transactions (and the huge overheads they bring). And its bulk insert features (LOAD DATA INFILE
阅读(564) | 评论(2) | 转发(0) |
给主人留下些什么吧!~~

chinaunix网友2008-06-22 13:54:19

Hi yue, thx for ur comment. According to my understanding, "merge" mentioned in this article is refered to an approach like Oracle's "partioning" rather than an engine. I haven't yet practiced this functionality with MyISAM tables. Once I did it, I shall post here. Thank you.

yueliangdao06082008-06-21 22:28:53

This article points the engine named merge is the best solution to handle DW. Did you have ever tested this engine? Show me some experiences,thanks.