Category: System Operations and Maintenance

2010-06-26 23:00:47


There's a lot of buzz lately about Hadoop. If you're completely new to Hadoop, I recommend the free videos from Cloudera. If you have a vague idea and want to play around, it's easy!

First, download Cloudera's training VM, which has a small Hadoop cluster already installed and running.

Second, you need to put some data into Hadoop. Fortunately for database folks, there's a tool called "Sqoop" for importing data from MySQL into Hadoop. It's already installed on the VM, and there are instructions for using Sqoop to import some MySQL tables into Hadoop (see Desktop/instructions/exercises/SqoopExercise.html inside the VM). FYI, it's not uncommon to "Sqoop" data into Hadoop, do analysis and transformations, and then use Sqoop to export the data back to MySQL.

Now you're ready to analyze your data using Hadoop's powerful MapReduce. Except that MapReduce requires coding (Java, Python, PHP, etc.) and an understanding of the MapReduce programming model, which borrows from functional programming. For an easier entry into Hadoop, try Hive. Hive is a data warehousing system for Hadoop. It offers a language (HiveQL) that feels just like SQL. Examples:

$ hive
hive> SHOW TABLES;
hive> SELECT * FROM <table_name> LIMIT 10;

Hive supports most of the SQL queries you are used to, for example JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, GROUP BY, ORDER BY, aggregate functions, etc. The best part is that Hive can scale to analyze petabytes of data!
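
If you're curious what the MapReduce model actually looks like, here's a rough sketch in plain Python. It doesn't touch Hadoop at all; it just walks through the map, shuffle, and reduce steps in memory to mimic what a query like `SELECT customer, COUNT(*) FROM orders GROUP BY customer` has to do. The `orders` rows, the `customer` column, and all the function names are made up for illustration, and this is not how Hive itself is implemented.

```python
from collections import defaultdict


def map_phase(rows):
    """Map: emit one (key, 1) pair per input row."""
    for row in rows:
        yield row["customer"], 1


def shuffle_phase(pairs):
    """Shuffle: collect every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into one count."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_customer(rows):
    """Chain the three phases, like a tiny single-machine MapReduce job."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    # Hypothetical sample rows standing in for a table imported with Sqoop.
    orders = [
        {"customer": "alice", "amount": 30},
        {"customer": "bob", "amount": 12},
        {"customer": "alice", "amount": 7},
    ]
    print(count_by_customer(orders))
```

On a real cluster the map and reduce steps run in parallel across many machines and the framework handles the shuffle for you. With Hive, you write the one-line query and skip all of this.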