Chinaunix首页 | 论坛 | 博客
  • 博客访问: 5136391
  • 博文数量: 921
  • 博客积分: 16037
  • 博客等级: 上将
  • 技术积分: 8469
  • 用 户 组: 普通用户
  • 注册时间: 2006-04-05 02:08
文章分类

全部博文(921)

文章存档

2020年(1)

2019年(3)

2018年(3)

2017年(6)

2016年(47)

2015年(72)

2014年(25)

2013年(72)

2012年(125)

2011年(182)

2010年(42)

2009年(14)

2008年(85)

2007年(89)

2006年(155)

分类: 数据库开发技术

2011-03-04 00:42:22

Short summary of an adventurous journey in the NOSQL world with .

The legs of our journey

As every other journey, the ours too is made of legs, so let’s write them down before going on:

  1. Redis? What is it?
  2. Available datatypes
  3. Where’s my tables?
  4. A simple use case
  5. Back home
Leg 1: Redis? What is it?

To make a long story short, Redis is a sort of key-value database on steroids. Why steroids? Simply because Redis is extremely fast (all data are loaded in memory and saved on disk, if desired) and feature rich (it supports various datatypes along with many complex operations on them).

An important peculiarity of Redis is the fact that it’s not a database in the common meaning of the word; it is a database in the sense that it stores and keeps data for you, but it doesn’t provide any SQL dialect to play with them (as relational databases do). Don’t be afraid, Redis is not a store-and-lose database, it just doesn’t provide our beloved SQL and all its amenities, but it has a robust and tested that lets you interact with it.

In Redis you’ll not deal with tables, selects, joins, views; there’re no integer or varchar fields and so on. You have to play with a very low level dataset structure and with more primitive datatypes.

Leg 2: Available datatypes

Let’s give a more deep look at how this odd database work. As seen in the previous leg, Redis is based on a key-value store paradigm and we’ll focus for the moment on its "key" part.

Keys are simply strings (officially not binary safe. UPDATE: .) such as "username" or "password". You can’t use space in keys, but alphanumeric values such as ".", ":", "_" (and so on) work fine, so keys like "user_name", "user:123:age", "user:123:username" are perfectly legal ones.

We must put a lot of attention to keys because, unlike field names in the RDBMS world, they are fully engaged in the game. As we’ll see in the following sub-legs, Redis doesn’t knows about tables, so simple tasks such as "SELECT username from users WHERE user_id=123;" have to be approached in a different way. As an example, to obtain in Redis the same result of the previous SQL query, we’ll have to get the value of the key "user:123:username", so, as you can see, keys could bring "information" (such as user ids) with them and play a very different role than the field names of an SQL table.

Now that things about keys should be more clear, let’s take a tour of the available datatypes.

Strings

are the most basic available datatype; they are normal, binary-safe strings of maximum dimension of 1 Gb.

You can set a key to a string value using the SET command and get it using GET. If you need to store a numeric value, such as counter, you can safely save it on a string and use INCR or DECR to increment or decrement it.

Lists

are collections of Redis strings sorted by insertion order. You can think of a list as a chain, you can add a new link on the left extremum (the head) or on the right (the tail); you can also add a link in the center, but you have to broke another link before.

You can add elements to a list using the LPUSH and RPUSH commands (L for left and R for right), get an element (removing it) with LPOP and RPOP and get a range of elements (without removing any) with LRANGE. You can also add an element on a given point with LSET, but, as for the chain example, this operation is much more slower than a simple push.

Hashes

At the time of writing, are the newest datatype, , it will be released shortly in the coming release. Hashes are a relatively long waited feature that helps storing data in a more coherent and neat way. Hashes implements a sort of key-value paradigm inside a key, so, for example an hash "user" may cointain some different fields to wich correspond different values, just like hashes do in programming languages such as ruby, javascript, etc…

Sets

are just like the homonymous mathematical structure, collection of distinct objects; in this particular case the objects are Redis strings. As you could argue, sets are different from lists for two main reasons: elements are unordered and they are all distinct, you can’t have two equal elements.

You can add an element to a set with the command SADD, remove an element with SREM, get (and remove) a random element with SPOP and perform the set operations of , and with SUNION, SINTER and SDIFF respectively.

Ordered sets

are quite like normal sets, with the exception that every set element has an associated weight used to take this element in order with others.

You can work with sorted sets as with normal ones, what changes is just the names of the commands: ZADD to add an element, ZREM to remove it and so on. Two different commands are ZINCR and ZSCORE, the former is used to increment the score of an element and the latter to get its score.

Leg 3: Where are my tables?

So, working with Redis data is really different from what we’re used to with SQL tables, you haven’t a language to query the server, you just have to manipulate the keys stored in the database. Commands are datatype-sensitive, you can’t use a set command on a list, otherwise you’ll get an error. Commands could be issued via the redis-cli or using one of for your preferred programming language. In the following, we’ll be agnostic about the way you’ll query Redis, we’ll just write the commands and you’ll issue them as you prefer.

Let’s think about a simple SQL table where we could save users for some application:

id username password name surname
1 user1 pass1 Bob Smith
2 user2 pass2 Mario Rossi
Storing the data

We want now to save the same data in Redis, so we have to design our database to fit this situation. Maybe, it’s better to think about the problem from an application-centric point of view; in the SQL version, our application would get users’ data issuing a SELECT statement having as clause the users’ id; in other words, we need a way to distinguish between different data entries. Additionally, we need to get user’s data with the sole knowledge of an unique identifier. This situation could be solved solved if we use the user id as a part of the redis key, so the records of the previous table would translate as

Key Value
user:1:username user1
user:1:password pass1
user:1:name Bob
user:1:surname Smith
user:2:username user2
user:2:password pass2
user:2:name Mario
user:2:surname Rossi

Well, given the id of an user, we could now get all its data reading the keys user:id:username, user:id:password, user:id:name and user:id:surname.

Loggin’ in users

Our data are suitable for a login process, so we have to provide a way to gets the user’s id, given the username; in other words, we have to establish a relation between usernames and user ids. This can be done if whe add another redis key to our data design: user:username:id.

Key Value
user:user1:id 1
user:user2:id 2

So, if Mario Rossi will try to login to our application, we can get his id knowing its username and so we’ll get all its data.

Primary keys?

Another problem is the uniqueness of our ids. In the SQL world we say, for example, "id int primary key auto_increment", now we have to implement a similar behavior to ensure that we have a different id for each user. As in the previous case, Redis has a solution: we can simply create another key "user:next_id" and use it as a counter that we manipulate via the INCR command whenever we add a new user.

SELECT * FROM users;

The next problem is to get the user list. We could think that what we’ve already done is enough to get this list: we could get the current value "user:next_id" and get users’ data from 0 to this value in one or more steps. Well, now let’s think about an user that has cancelled it’s subscription (our next problem), when we’ll traverse all the integers from 0 to user:next_id we’ll find its old id that now should not has any data.

Althought this is not a serious problem, we wouldn’t waste our time trying to get the data of users that don’t exist anymore, so we’ll create a new key "user:list" of list (or set) type in wich we’ll add all new user ids, removing them when needed. We prefer using a list because this give us the possibility to do a sort of "pagination" (maybe you’re thinking about "LIMIT"?) using the LRANGE command.

Deleting users

The last problem is a matter of "data integrity"; what would happen when we delete a user? We sould remove anything that references to it. In other words, we should delete all the keys user:id:*, user:username:id and the id in "user:list".

Leg 4: A simple use case

As an example of what we learned in this journey, let’s try to design an application that will work as a virtual library that let us group our books by topics. This example will be slightly more complex than the user’s table, but we’ll teach us how to handle relations with Redis.

In our application, we should be able to collect our books, storing informations such as title, author(s), topic(s), pages, price, ISBN and a description. Obviously, we could have a book with more than one author and that cover more than one topic at the same time (for example, a book that is about programming ad ruby). Additionally, one author could have written more than one book and a topic could be dealed with in more than one book. From what we’ve written it arises clearly that there’re a many-to-many relation between authors and books and between topics and books.

The SQL case

First of all, let’s try to design the SQL tables (with some data) that model this situation, in order to see how works the translation in the Redis idiom:

Books

id title pages price isbn description
1 Programming Ruby: The Pragmatic Programmers’ Guide 829 $26 0974514055 The bible of the ruby programming language
2 Erlang Programming 496 $42 0596518188 An interesting introduction to erlang

Authors

id name surname
1 Dave Thomas
2 Chad Fowler
3 Andy Hunt
4 Francesco Cesarini
5 Simon Thompson

Topics

id name description
1 programming Books about programming
2 ruby Books about ruby
3 erlang Books about erlang

Books-Authors

book_id author_id
1 1
1 2
1 3
2 4
2 5

Books-Topics

book_id topic_id
1 1
1 2
2 1
2 3
The Redis case

It should be already clear that the trhee tables Books, Authors and Topics are not a problem; we’ve already learned how to save this data in Redis in the previous leg (see the next image). The problem arises when considering the link tables Books-Authors and Books-Topics that implement the two many-to-many relations. How to implement them? Let’s think about topics only, the problem for authors is merely a twin (with a different name) of the former and we’ll use the same strategy.

For every book we have to know the topics that belongs to it and, in the same way, for every topic we have to know every book that deals with it. In other words, for each book we need a collection of the related topic ids and for each topic a collection of the related book ids. This is a problem where sets fit perfectly! We’re going to create two sets, book:id:topic and topic:id:books, in the former we’ll save the topic ids that match the book and in the latter we’ll save the ids of the books that match this topic. For example, referring to the data shown in the previous sub-leg, the book "Programming Erlang" (book id 2) will have a key book:2:topics of set type, with elements (1, 3); while the topic programming will have a set topic:1:books with elements (1, 2).

As said early, the same works for authors, so we could say that a SQL’s many-to-many relation translates in Redis as two sets. This is really useful, because it give us the ability to obtain an important feature for free: we could easily find a book that covers more than one topic doing a set intersection between all topic:id:books sets related to the topics we are interested on. So, for example, the intersection between topic:1:books (books about programming) and topic:2:books (books about ruby) will return the set (1), i.e. only the book "Programming Ruby" with id 1.

A special care should be used whenever we remove something from our database. As shown in the previous figure, for example, topics have a reference to books and books have a reference to topics, so, what to remove? Well, surely we should remove every book:id:* keys, but before doing this we have to traverse all topics to remove the book id from the set topic:id:books. Additionally, we should remove the book id from the book:list list. If we want to remove a topic, we have to procede in a very similar way: before removing each topic:id:* key we have to traverse all books referenced by the topic we want to delete and remove the topic id from the set book:id:topics. The same apply for authors.

We like fun and programming, so to lear by doing consider taking a look to . It is implemented in ruby, using ezmobius’ redis-rb. Both the code and the look’n’feel are very rough, so feel free to make a pull request if you have improvements.

Leg 5: Back home

Our journey is reaching its end, we’re quite back home and so now it’s time to pack our luggages and buy some souvenirs.

In the luggage we have to put all we’ve learned about Redis: datatypes, commands, oddities and so on.

The souvenirs are the three patterns we’ve learned:

  • handling the unique id problem with a string key and the INCR command;
  • handling a user login via an user:username:id type key;
  • handling many-to-many relations with sets.

Our journey is completed and now we’re definitely at home; we just hope to don’t suffer of post-holidays stress, but in any case we’ve the right cure: having fun coding free software!

References

Domenico Delle Side (domenico.delleside AT alcacoop.it), AlcaCoop (info AT alcacoop.it)

文章原地址:

阅读(1275) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~