Chinaunix首页 | 论坛 | 博客
  • 博客访问: 1287827
  • 博文数量: 297
  • 博客积分: 285
  • 博客等级: 二等列兵
  • 技术积分: 2996
  • 用 户 组: 普通用户
  • 注册时间: 2010-03-06 22:04

Linuxer, ex IBMer. GNU













分类: Mysql/postgreSQL

2016-12-29 10:12:47


For me PostgreSQL is the most amazing (open source) database around. Even though, there is much interest in stripped down NoSQL databases like key-value stores or “data structure servers”, PostgreSQL continues to innovate at the SQL frontier.

In this post, I’ll show a few of the newer, less known, features of PostgreSQL - far beyond standard SQL.


hstore is a key-value store for simple data types. Using hstore, we’re able to create a key-value store within columns of a table.

To enable the hstore extension, run create extension hstore' in the PostgreSQL prompt. After that the hstore data type is available for our table definitions.


  1. CREATE EXTENSION IF NOT EXISTS hstore SCHEMA my_extensions;
  2. set search_path TO "$user",public,my_extensions

Let’s create a simple table with a hstore column:


  1. create table hstoretest ( id serial primary key, data hstore )

To insert a few rows, we use a special syntax:


  1. create table hstoretest ( id serial primary key, data hstore )

Query the table as usual:


  1. postgres=# TABLE hstoretest;
  2.  id | data
  3. ----+-------------------------------
  4.   1 | "key1"=>"123", "key2"=>"test"
  5. (1 row)

The hstore extension provides a lot of operators and functions to work with hstore columns, for example, selecting all key2 values:


  1. postgres=# select data->'key1' as key2 from hstoretest;
  2.  key2
  3. ------
  4.  123
  5. (1 row)

Some more examples can be found here.


A JSON data type was introduced in release 9.2. Currently this is nothing more than a validating data type, thus it checks if the string we put into that column is a valid JSON object.

Let’s create a new table to play around with this type:


  1. create table jsontest ( id serial primary key, data json )

Now let’s insert an invalid row:


  1. postgres=# insert into jsontest(data) VALUES('{"title":wrong}');
  2. ERROR: invalid input syntax for type json
  3. LINE 1: insert into jsontest(data) VALUES('{"title":wrong}');
  4.                                           ^
  5. DETAIL: Token "wrong" is invalid.

And now with the correct JSON syntax:


  1. insert into jsontest(data) VALUES('{"title":"wrong"}')
There isn’t really much more to the JSON data type besides the ability to return rows of non-JSON tables as JSON:


  1. postgres=# select row_to_json(hstoretest) from hstoretest;
  2.                    row_to_json
  3. -------------------------------------------------
  4.  {"id":1,"data":{"key1": "123", "key2": "test"}}
  5. (1 row)
Nice if you’re used to work with JSON object (in Web applications for example).


Working directly with JSON and JavaScript has been all the rage in many of the NoSQL databases. Using the PLv8 extension, we can use JavaScript (executed by Google’s awesome V8 engine) directly in PostgreSQL. Together with the JSON data type, this offers amazing new possibilities.

Currently PLv8 isn’t included in the standard distribution of PostgreSQL (9.2), but installing it isn’t very hard, the only dependencies are postgresql and the v8 engine. Some distributions already have v8 in their repositories (Archlinux).

Compiling and installing the extension is straight forward as soon as the dependencies are in place:

make && sudo make install 
on Debian ,do this:


  1. apt-get install postgresql-9.6-plv8
No we can enable the plv8 extension within our database (as we did with hstore):


  1. postgres=# create EXTENSION IF NOT EXISTS plv8 SCHEMA my_extensions;
  2. ERROR: extension "plv8" must be installed in schema "pg_catalog"
  3. postgres=# create EXTENSION IF NOT EXISTS plv8

create or replace function jmember (j json, key text )
AS $function$
  var ej = JSON.parse(j);
  if (typeof ej != 'object')
        return NULL;
  return JSON.stringify(ej[key]);

The jmember function allows us to parse and read the JSON string and returns the member identified by key:

select jmember(data, 'title') from jsontest;

(1 row) 

Andrew also shows how to build an index to speed up access times in his post.



  1. postgres=# select jmember(data, 'title') from jsontest;
  2. ERROR: SyntaxError: Unexpected token o
  3. DETAIL: undefined() LINE 0: [object Object]

k-Nearest Neighbors

In PostgreSQL 9.1, a nearest neighbor indexing was introduced. This allows us to perform orderings etc. by a distance metric.

For example, I’ve downloaded the ispell spelling dictionaries, and loaded them into a table words like this:



  1. CREATE EXTENSION IF NOT EXISTS pg_trgm schema my_extensions
  2. create table words(word varchar(255) primary key);
  3. copy words from '/usr/share/dict/american-english'

This inserts roughly 50000 words into the table.

Since we’re working with text data, let’s introduce another extension pg_trgm, which builds tri-grams of strings (triples of three characters). Using theses tri-grams, we can compute a distance metric. Enable the extension like this:


  1. CREATE EXTENSION IF NOT EXISTS pg_trgm schema my_extensions

A tri-gram of hello would look like this:


  1. select show_trgm('hello');
  2.             show_trgm
  3. ---------------------------------
  4.  {" h"," he",ell,hel,llo,"lo "}
  5. (1 row)

The distance metric is very simple, the more of these tri-grams match, the closer two strings are.

To take advantage of the nearest neighbor index, we have to build it:

create index word_trgm_idx on words using gist (word gist_trgm_ops); 

Using the index we can query our table for a word, and return a list of most similar terms as well:

select word, word <-> 'hello' as distance from words order by word <-> 'hello' asc limit 10;
  word  | distance
 hello  |        0
 hellos |    0.375
 hell   | 0.428571
 hells  |      0.5
 heller | 0.555556
 hell's | 0.555556
 help   |    0.625
 helm   |    0.625
 held   |    0.625
 helps  | 0.666667
(10 rows) 

The <-> operator comes from the pg_trgm extension, of course we could use simpler distances like numerical difference or geometric distance, but working with textual data is often perceived as particularly difficult (not so with PostgreSQL).



  1. postgres=# select word, word <-> 'hello' as distance from words order by word <-> 'hello' asc limit 10;
  2.   word | distance
  3. ---------+----------
  4.  hello   | 0
  5.  hello's | 0.25
  6.  hellos  | 0.375
  7.  Hell    | 0.428571
  8.  hell    | 0.428571
  9.  Hells   | 0.5
  10.  Hell's  | 0.555556
  11.  hell's  | 0.555556
  12.  Heller  | 0.555556
  13.  Othello |      0.6

阅读(776) | 评论(0) | 转发(0) |