Chinaunix首页 | 论坛 | 博客
  • 博客访问: 1287827
  • 博文数量: 297
  • 博客积分: 285
  • 博客等级: 二等列兵
  • 技术积分: 2996
  • 用 户 组: 普通用户
  • 注册时间: 2010-03-06 22:04
个人简介

Linuxer, ex IBMer. GNU https://hmchzb19.github.io/

文章分类

全部博文(297)

文章存档

2020年(11)

2019年(15)

2018年(43)

2017年(79)

2016年(79)

2015年(58)

2014年(1)

2013年(8)

2012年(3)

分类: Mysql/postgreSQL

2016-12-29 10:12:47

原帖地址:https://blog.interlinked.org/tutorials/postgresql.html

For me PostgreSQL is the most amazing (open source) database around. Even though, there is much interest in stripped down NoSQL databases like key-value stores or “data structure servers”, PostgreSQL continues to innovate at the SQL frontier.

In this post, I’ll show a few of the newer, less known, features of PostgreSQL - far beyond standard SQL.

hstore

hstore is a key-value store for simple data types. Using hstore, we’re able to create a key-value store within columns of a table.

To enable the hstore extension, run create extension hstore' in the PostgreSQL prompt. After that the hstore data type is available for our table definitions.

点击(此处)折叠或打开

  1. CREATE EXTENSION IF NOT EXISTS hstore SCHEMA my_extensions;
  2. set search_path TO "$user",public,my_extensions


Let’s create a simple table with a hstore column:

点击(此处)折叠或打开

  1. create table hstoretest ( id serial primary key, data hstore )


To insert a few rows, we use a special syntax:

点击(此处)折叠或打开

  1. create table hstoretest ( id serial primary key, data hstore )


Query the table as usual:

点击(此处)折叠或打开

  1. postgres=# TABLE hstoretest;
  2.  id | data
  3. ----+-------------------------------
  4.   1 | "key1"=>"123", "key2"=>"test"
  5. (1 row)


The hstore extension provides a lot of operators and functions to work with hstore columns, for example, selecting all key2 values:

点击(此处)折叠或打开

  1. postgres=# select data->'key1' as key2 from hstoretest;
  2.  key2
  3. ------
  4.  123
  5. (1 row)


Some more examples can be found here.

JSON

A JSON data type was introduced in release 9.2. Currently this is nothing more than a validating data type, thus it checks if the string we put into that column is a valid JSON object.

Let’s create a new table to play around with this type:

点击(此处)折叠或打开

  1. create table jsontest ( id serial primary key, data json )


Now let’s insert an invalid row:

点击(此处)折叠或打开

  1. postgres=# insert into jsontest(data) VALUES('{"title":wrong}');
  2. ERROR: invalid input syntax for type json
  3. LINE 1: insert into jsontest(data) VALUES('{"title":wrong}');
  4.                                           ^
  5. DETAIL: Token "wrong" is invalid.


And now with the correct JSON syntax:

点击(此处)折叠或打开

  1. insert into jsontest(data) VALUES('{"title":"wrong"}')
There isn’t really much more to the JSON data type besides the ability to return rows of non-JSON tables as JSON:

点击(此处)折叠或打开

  1. postgres=# select row_to_json(hstoretest) from hstoretest;
  2.                    row_to_json
  3. -------------------------------------------------
  4.  {"id":1,"data":{"key1": "123", "key2": "test"}}
  5. (1 row)
Nice if you’re used to work with JSON object (in Web applications for example).

PLv8

Working directly with JSON and JavaScript has been all the rage in many of the NoSQL databases. Using the PLv8 extension, we can use JavaScript (executed by Google’s awesome V8 engine) directly in PostgreSQL. Together with the JSON data type, this offers amazing new possibilities.

Currently PLv8 isn’t included in the standard distribution of PostgreSQL (9.2), but installing it isn’t very hard, the only dependencies are postgresql and the v8 engine. Some distributions already have v8 in their repositories (Archlinux).


Compiling and installing the extension is straight forward as soon as the dependencies are in place:

make && sudo make install 
on Debian ,do this:

点击(此处)折叠或打开

  1. apt-get install postgresql-9.6-plv8
No we can enable the plv8 extension within our database (as we did with hstore):

点击(此处)折叠或打开

  1. postgres=# create EXTENSION IF NOT EXISTS plv8 SCHEMA my_extensions;
  2. ERROR: extension "plv8" must be installed in schema "pg_catalog"
  3. postgres=# create EXTENSION IF NOT EXISTS plv8

create or replace function jmember (j json, key text )
 RETURNS text
 LANGUAGE plv8
 IMMUTABLE
AS $function$
  var ej = JSON.parse(j);
  if (typeof ej != 'object')
        return NULL;
  return JSON.stringify(ej[key]);
$function$; 

The jmember function allows us to parse and read the JSON string and returns the member identified by key:

select jmember(data, 'title') from jsontest;

     jmember
-----------------
 "right"
(1 row) 

Andrew also shows how to build an index to speed up access times in his post.

这个函数我没有成功执行过,报错如下:

点击(此处)折叠或打开

  1. postgres=# select jmember(data, 'title') from jsontest;
  2. ERROR: SyntaxError: Unexpected token o
  3. DETAIL: undefined() LINE 0: [object Object]

k-Nearest Neighbors

In PostgreSQL 9.1, a nearest neighbor indexing was introduced. This allows us to perform orderings etc. by a distance metric.

For example, I’ve downloaded the ispell spelling dictionaries, and loaded them into a table words like this:

注:我使用了linux自带的字典文件


点击(此处)折叠或打开

  1. CREATE EXTENSION IF NOT EXISTS pg_trgm schema my_extensions
  2. create table words(word varchar(255) primary key);
  3. copy words from '/usr/share/dict/american-english'

This inserts roughly 50000 words into the table.

Since we’re working with text data, let’s introduce another extension pg_trgm, which builds tri-grams of strings (triples of three characters). Using theses tri-grams, we can compute a distance metric. Enable the extension like this:

点击(此处)折叠或打开

  1. CREATE EXTENSION IF NOT EXISTS pg_trgm schema my_extensions

A tri-gram of hello would look like this:

点击(此处)折叠或打开

  1. select show_trgm('hello');
  2.             show_trgm
  3. ---------------------------------
  4.  {" h"," he",ell,hel,llo,"lo "}
  5. (1 row)


The distance metric is very simple, the more of these tri-grams match, the closer two strings are.

To take advantage of the nearest neighbor index, we have to build it:

create index word_trgm_idx on words using gist (word gist_trgm_ops); 

Using the index we can query our table for a word, and return a list of most similar terms as well:

select word, word <-> 'hello' as distance from words order by word <-> 'hello' asc limit 10;
  word  | distance
--------+----------
 hello  |        0
 hellos |    0.375
 hell   | 0.428571
 hells  |      0.5
 heller | 0.555556
 hell's | 0.555556
 help   |    0.625
 helm   |    0.625
 held   |    0.625
 helps  | 0.666667
(10 rows) 

The <-> operator comes from the pg_trgm extension, of course we could use simpler distances like numerical difference or geometric distance, but working with textual data is often perceived as particularly difficult (not so with PostgreSQL).

我的实际效果:

点击(此处)折叠或打开

  1. postgres=# select word, word <-> 'hello' as distance from words order by word <-> 'hello' asc limit 10;
  2.   word | distance
  3. ---------+----------
  4.  hello   | 0
  5.  hello's | 0.25
  6.  hellos  | 0.375
  7.  Hell    | 0.428571
  8.  hell    | 0.428571
  9.  Hells   | 0.5
  10.  Hell's  | 0.555556
  11.  hell's  | 0.555556
  12.  Heller  | 0.555556
  13.  Othello |      0.6
使用的字典文件不同,结果不同。


阅读(776) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~