Postgres

The Postgres-XC 1.0 beta release overview

 

When I heard about this project a year ago, I was really excited about it. Many cluster-wide projects based on Postgres were developed very slowly, based on older (i.e. Postgres-R http://www.postgres-r.org/) or proprietary (i.e. Greenplum) versions. The features that this project hoped to achieve were ambitious, as we’ll detail in this post. And best of all - this project is based on the 9.1 Postgresql version, which is really up-to-date (at the moment of writing this post, this is the last stable version).

If you are interested in a serious project for scaling horizontally your PostgreSQL architecture, you may visit the official website at http://postgres-xc.sourceforge.net/ and take a look. 

For those who are interested, there will be a tutorial at PgCon this year.  As a brief overview, I will try to give you a broad idea for those who want to get involved in the project.

 

What Postgres-XC can do:

  • Support multi-master architecture. Data nodes can contain part or all of the data of a relationship. 
  • Transparent view to application from any master. The application only needs to interact with the cluster through coordinators.
  • Distribute/Replicate per relation (replication, round robin (by default if any unique column is specified), by hash (by default if a unique is specified), by modulo or a set to a group or node)
  • Parallel transaction execution among cluster nodes.

 

What Postgres-XC cannot do:

  • Support triggers (may be supported in future releases).
  • Distribute a table with more than one parent. 

 

Before you start:

  You need to install the most recent versions of Flex and Bison. That’s important because in the last tarball, ‘./configure’ won’t raise error if they are missing, and the error will be prompted once you execute ‘make’. You will need readline-dev and zlib-dev (not mandatory but strongly recommended).

According to the documentation, Postgres-XC should be compatible with Linux platforms based upon Intel x86_64 CPU. 

The documentation needs to be improved, so we advise you to try the steps directly and read the help prompted by the commands. For example, the initdb command in the documentation is incomplete, “--nodename” is mandatory in this version. This project is new and has only a few contributors to date, but we hope its community keeps growing. Most importantly, it is great that a beta release was launched earlier than we expected.

 

Elements of architecture

 

  • GTM (Global Transaction Manager)

+ Realize that I say only GTM, not GTMs. Only one GTM can be the manager. For redundancy, you have GTM-Standby and to improve performance and failover GTM-Proxies.

+ The GTM serializes all the transaction processing and can limit the whole scalability if you implement it primitively. This should not be used in slow/wide networks and is recommended to involve the fewest number of switches between GTM and coordinators. The proxies reduce the iteration with the GTM and improve the performance. 

+ Uses MVCC technology. That means that it will still use the same control for the concurrency as Postgres.

+ This is the first thing you will need to configure. If you set up everything in the same machine, you will need to create a separate folder for the configuration and files.

 

  • Coordinator/s

+ Interface for applications (like a Postgres backend). 

+ It has its own pooler (yes, and I think this is great, avoiding more complexity in big environments).

+ Doesn’t store any data. The queries are executed in the datanodes, but...

+ … it has its own data folder (for global catalogs).

  • Datanode/s

+ Stores the data.

+ It receives the petition with a GXID (Global Transaction ID) and Global Snapshot to allow requests from several coordinators.

Both Datanodes and Coordinator use their own data directory, so keep this in mind this if you are setting both up on the same machine. 

Configuring several GTM-Proxies will improve the scalability, shrinking the I/O in the GTM. Plus, you can configure the GTM-Standby to avoid a SPOF of the general manager. It not only provides the GXID, it also receives the node registration (you can trace your log or check the file inside the gtm folder called register.node, it’s binary but is readable) and most importantly, it holds the snapshot of the current status of all the transactions.

Coordinators can point to all the datanodes and can point to the same datanode (as Oracle RAC, but we’re not sure if all the features included in that solution will be available for Postgres-XC). Coordinators connect to the proxies (if you have already configured them) or the main GTM.

Hope you enjoyed the read. We are preparing more cool stuff about this amazing solution, keep in touch!

New versions of PgPool released - 3.1.3 & 3.0.7


This essential tool for Postgres architectures is continually improving, and is now available in its new releases. Both are bugfix versions.
 
For those unfamiliar with the tool, it is a middleware with functionality as a load balancer, pooler*  and/or replication system for PostgreSQL databases. The 3.1.x versions are compatible with Postgres 9.x, whose streaming replication feature was pushed to Pgpool developers to take advantage of it.   This allows the tool to balance queries without using the pgpool-replication technique.
 
In the 3.1.3/3.0.7 fixes we have:

  • Allow multi statement transactions in master/slave mode: Transactions with BEGIN, since 3.1 were sent to the slaves/standby servers as well. This brings non desirable effects when the transaction contains DELETE/INSERT/UPDATE, due to the fact that standbys cannot execute writable SQL. (3.1.3 fix)
  • Important fixes for failover detection and execution. (3.1.3 fix)
  • Added m4 files to avoid problems when compiling in older operating systems.
  • Fixed hangup on PREPARE errors.
  • Fixed memory leak in reset queries.


If you are running 3.1.x against Postgres 9 databases, we strongly recommend you upgrade PgPool due to the fixing in the multi statement feature.

For more information http://www.pgpool.net/mediawiki/index.php/Main_Page

* If you need only a connection pooler for Postgres, I prefer PgBouncer http://wiki.postgresql.org/wiki/PgBouncer. It is lightweight and more specific and simply works, without as much configuration.

Exploring a new feature of 9.2: Index-only scans

We, like other Postgres DBAs worldwide, have been waiting for the 9.2 release for some time, specifically for the index-only scan feature, which will help reduce I/O by preventing unnecessary access to heap data if you only need data from the index.

Besides 9.2 is still in development, it is possible to download a version for testing at http://www.postgresql.org/download/snapshots/ . It's important to note that it doesn’t add new behaviours, but improves the way that indexes are used.

How can we test this feature? We created a ‘big’ table starting with 10 million+ records with random values.

  • Extract a few elements, using a where clause.
  • Use aggregations.
  • Use partitioning plus index only scans.



When is this feature most useful?:

  • When you select only the columns that are specified in the index definition,  including those which are at the condition part of the query.
  • If a vacuum was executed previously. Thus happens because the scan can skip the “heap fetch” if the TID references a heap [table] page on which all tuples are known visible to everybody (src/backend/executor/nodeIndexonlyscan.c).


So, the main table is:


CREATE TABLE lot_of_values AS SELECT i, clock_timestamp() t1, random() r1, random() r2, random() r3, clock_timestamp() + (round(random()*1000)::text || ' days')::interval d1 from generate_series(1,10000000) i(i);
ALTER TABLE lot_of_values ADD PRIMARY KEY(i);

CREATE INDEX CONCURRENTLY ON lot_of_values (d1);



Something interesting: due to some improvements in write performance, we realized that 9.2 demonstrated better timing compared with 9.1.3 (~200k in 9.1, ~170k ms on 9.2). The index creation was slightly better on 9.2.

The table will contain data like this:

stuff=# \x
Expanded display is on.
stuff=# select * from lot_of_values limit 1;
-[ RECORD 1 ]---------------------
i  | 1
t1 | 2012-04-18 08:37:14.426624+00
r1 | 0.571268450468779
r2 | 0.222371176816523
r3 | 0.72282966086641
d1 | 2012-08-17 08:37:14.426713+00


Ok, let’s start with some examples. As we previously explained, we need to specify columns that are only in the index. You can’t use columns from 2 different indexes. The next example is a clear fail:


stuff=# explain select i,  d1 from lot_of_values where round(r1*100) < 10;
             QUERY PLAN                                
---------------------------------------------
Seq Scan on lot_of_values  (cost=0.00..263496.00 rows=3333333 width=12)
  Filter: (round((r1 * 100::double precision)) < 10::double precision)
(2 rows)

stuff=# set enable_seqscan=off;
SET
stuff=# explain select i,  d1 from lot_of_values where round(r1*100) < 10;
                                      QUERY PLAN                                       
---------------------------------------------
Seq Scan on lot_of_values  (cost=10000000000.00..10000263496.00 rows=3333333 width=12)
  Filter: (round((r1 * 100::double precision)) < 10::double precision)
(2 rows)



We don’t have indexes at r1 and it isn’t part of the index!

The next example is another fail, using a column that is defined in another index or directly not defined in any index:


stuff=# explain select i,  d1 from lot_of_values where i between 12345 and 23456;
                                         QUERY PLAN                                           
---------------------------------------------
Index Scan using lot_of_values_pkey on lot_of_values  (cost=0.00..450.83 rows=11590 width=12)
  Index Cond: ((i >= 12345) AND (i <= 23456))
(2 rows)



The next example is the correct case:


stuff=# explain select i from lot_of_values where i between 12345 and 23456;
                                           QUERY PLAN                                             
---------------------------------------------
Index Only Scan using lot_of_values_pkey on lot_of_values  (cost=0.00..450.83 rows=11590 width=4)
  Index Cond: ((i >= 12345) AND (i <= 23456))
(2 rows)



Also, we can try with a non-pk index:


stuff=# explain select min(d1), max(d1) from lot_of_values ;
                                          QUERY PLAN                                                             
---------------------------------------------
Result  (cost=6.93..6.94 rows=1 width=0)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.00..3.46 rows=1 width=8)
          ->  Index Only Scan using lot_of_values_d1_idx on lot_of_values  (cost=0.00..34634365.96 rows=10000000 width=8)
                Index Cond: (d1 IS NOT NULL)
  InitPlan 2 (returns $1)
    ->  Limit  (cost=0.00..3.46 rows=1 width=8)
          ->  Index Only Scan Backward using lot_of_values_d1_idx on lot_of_values  (cost=0.00..34634365.96 rows=10000000 width=8)
                Index Cond: (d1 IS NOT NULL)
(9 rows) stuff=# explain select min(i), max(i), avg(i) from lot_of_values where i between 1234 and 2345;
                                             QUERY PLAN                                               
---------------------------------------------
Aggregate  (cost=66.90..66.91 rows=1 width=4)
  ->  Index Only Scan using lot_of_values_pkey on lot_of_values  (cost=0.00..58.21 rows=1159 width=4)
        Index Cond: ((i >= 1234) AND (i <= 2345))
(3 rows)


The aggregation cases are special.  Index-only scans are not useful for count(*) without condition, because the index scan needs to check the visibility of the tuple, which makes it expensive. So, if you need to count the entire table, a sequential scan must be perfomed.

Just for testing purposes, we’ll try to “turn off” the seqscan node, to force an Index-only scan:


stuff=# explain (analyze true, costs true, buffers true, timing true, verbose true) select count(i) from lot_of_values;
               QUERY PLAN                                                       
---------------------------------------------
Aggregate  (cost=213496.00..213496.01 rows=1 width=4) (actual time=57865.943..57865.946 rows=1 loops=1)
  Output: count(i)
  Buffers: shared hit=2380 read=86116
  ->  Seq Scan on public.lot_of_values  (cost=0.00..188496.00 rows=10000000 width=4) (actual time=0.667..30219.806 rows=10000000 loops=1)
        Output: i, t1, r1, r2, r3, d1
        Buffers: shared hit=2380 read=86116
Total runtime: 57866.166 ms
(7 rows) stuff=# set enable_seqscan=off;
SET
stuff=# explain (analyze true, costs true, buffers true, timing true, verbose true) select count(i) from lot_of_values;
                                                          QUERY PLAN                                    
---------------------------------------------
Aggregate  (cost=351292.03..351292.04 rows=1 width=4) (actual time=64094.544..64094.547 rows=1 loops=1)
  Output: count(i)
  Buffers: shared read=110380
  ->  Index Only Scan using lot_of_values_pkey on public.lot_of_values  (cost=0.00..326292.03 rows=10000000 width=4) (actual time=38.773..35825.761 rows=10000000 loops=1)
        Output: i
        Heap Fetches: 10000000
        Buffers: shared read=110380
Total runtime: 64094.777 ms
(8 rows)



After a Vacuum, the plan changed and the cost drops to 262793.04.

For partitioning, as we expected, this works as well (in this example, we’ll use another table called ‘persons’ with ‘dni’ as PK column):


coches=# explain (analyze true, costs true, buffers true, timing true, verbose true)  select dni from persons where dni between 2100111 and 2110222;
Result  (cost=0.00..168.62 rows=22 width=8) (actual time=61.468..61.468 rows=0 loops=1)
  Output: persons.dni
  Buffers: shared hit=43 read=1
  ->  Append  (cost=0.00..168.62 rows=22 width=8) (actual time=61.442..61.442 rows=0 loops=1)
        Buffers: shared hit=43 read=1
        ->  Seq Scan onpersons  (cost=0.00..0.00 rows=1 width=8) (actual time=0.156..0.156 rows=0 loops=1)
              Output:persona.dni
              Filter: (((persona.dni)::bigint >= 2100111) AND ((persona.dni)::bigint <= 2110222))
        ->  Index Only Scan using persons_200_pkey on persons_200 persons  (cost=0.00..8.38 rows=1 width=8) (actual time=0.405..0.405 rows=0 loops=1)
              Output:persona.dni
              Index Cond: ((persons.dni >= 2100111) AND (persons.dni <= 2110222))
              Heap Fetches: 0
              Buffers: shared hit=5

…. LOT OF PARTITIONS ….
Total runtime: 11.045 ms
(114 rows)


Conclusion: this feature adds one of the most exciting performance improvements in Postgres. We see improvements at as much as 30% so far, and look forward to seeing how this scales as 9.2 becomes production-ready.


Considerations about text searchs in big fields and planner costs

Imagine the following scenario: you have a table with a small set of records with columns containing  tsvector and text data type. But, text fields has almost 20 megabytes of text or more.

So, what happens? Postgres planner checks the amount of reltuples (tuples) in the relation to calculate the cost of data extraction. If the amount of tuples reached by the query is higher than the 20% of the total result set or the amount of blocks in the relation is too small, then the planner will choose sequential scan, otherwise random access will be the chosen one.

But, in that case we face a problem. The amount of rows and blocks in the relation is not the “total” content. That happens because Postgres use a technique called TOAST (The Oversized  Attribute Storage Technique) which store all the data that is over 200 bytes, apart from the block. So, in the relation you will have only the firsts bytes of your column data. You can modify some options about how the columns get toasted (the level of compression), but the default behaviour is compress and store apart. This methodology is pretty cool, because if you have i.e. 500MB of data in only one field, your relation will contain all that data in the same file and in the most usual cases that is not convenient.

So, meanwhile you have small number of rows, the amount of data to read from disk is quite big in relation of the data stored in the relfilenode (the data file). But for FTS, we need to dig into all the content and for make the searchs faster, we need GIN or GIST indexes. This type of indexes, stores the data inside the index (differently from B-Tree indexes), making possible the search using the index.

So, you want to access by index to your data, but the planner estimates that the amount of rows is so small that is better to access by sequential scan instead of use index. So, the only way to force the read of index is *SET enable_seqscan TO off;* before execute your query. This variable is per session, so you might want to add it inside the transaction and set again to “on” for other queries.

But, by the way, once your table is populated with enough amount of data, you wouldn’t need anymore this execution, but if your table is quite static, could be a little fix. Besides this, it doesn’t represents a problem of performance, but this kind of things could show you how the planner works.

Let’s see an example:

CREATE TABLE documents_en (
   author_id char(2),
   text_id serial,
   author varchar(45) NOT NULL,
   title varchar(127) NOT NULL,
   content text NOT NULL,
   cont_ftsed tsvector  ,
   PRIMARY KEY (author_id, text_id)
);

CREATE FUNCTION ftsed_upd_en() RETURNS trigger AS $$
DECLARE
BEGIN
NEW.cont_ftsed:=(setweight(to_tsvector('english', coalesce(NEW.title,   '')), 'A') ||
    setweight(to_tsvector('english', coalesce(NEW.author,  '')), 'B') ||
    setweight(to_tsvector('english', coalesce(NEW.content, '')), 'C'));
RETURN NEW;
END
$$
LANGUAGE plpgsql;

CREATE TRIGGER tgr_ins_ftsed_en BEFORE INSERT ON documents_en
FOR EACH ROW EXECUTE PROCEDURE ftsed_upd_en();

CREATE TRIGGER tgr_upd_ftsed_en BEFORE UPDATE OF author,title,content
ON documents_en
FOR EACH ROW EXECUTE PROCEDURE ftsed_upd_en();

CREATE INDEX documents_en ON nspc_es USING gin(
   (setweight(to_tsvector('english', coalesce(title,   '')), 'A') ||
    setweight(to_tsvector('english', coalesce(author,  '')), 'B') ||
    setweight(to_tsvector('english', coalesce(content, '')), 'C'))
);

CREATE INDEX ix_ftsed ON documents_en USING GIST(cont_ftsed);


Before continuing, we need to clarify 3 things: text_id is related with author_id, but just for testing we use it as a serial, to avoid collisions in the primary key; we use triggers to automatically set up the tsvector column; the last thing is the indexes, I created 2 indexes to show the examples. IMHO I don’t recommend functional indexes, for performance purposes, but is still an option.

To fill up the table, I’ve made a script to catch up all the READMEs in the server and put them into the DB.


#!/bin/bash
FILENAME="$@"
TMPBFF=temp_file
PSQL=/opt/pg92dev/bin/psql
WORDS=words

for _FILE_ in $FILENAME
do
 egrep  -oi '[a-z]+' $_FILE_ | egrep -i '[a-z]+{4}+' > $WORDS
 AUTHOR=$(cat $WORDS | sort -u | head -n2 | awk '{printf("%s ", $0) }')
 AUTHOR_ID=${AUTHOR:0:2}
 TITLE=$(cat ${WORDS} | sort | uniq -c | sort -nr | head -n3 | awk '{printf("%s ", $2)}')
 BODY=$(cat $WORDS | sort -u | awk '{printf("%s ", $0) }')

 $PSQL -Upostgres fts -c "INSERT INTO documents_en (author_id, author, title, content) VALUES ('${AUTHOR_ID}', '${AUTHOR}', '${TITLE}','${BODY}' );"
done


I know, is a bit tricky, but useful for this test. to execute it, you’ll need something like:

locate README.txt | xargs ./script

First let’s get some stats from the table:

fts=# select count(*) from documents_en ;
count : 22
fts=# select pg_size_pretty(pg_relation_size('documents_en'));
pg_size_pretty :  24 kB (size of the table)

fts=# select relpages from pg_class where relname = 'documents_en';
relpages :3 (8kb each)

Now, a simple query with a EXPLAIN ANALYZE:

fts=# EXPLAIN ANALYZE  SELECT author, title
   FROM documents_en WHERE cont_ftsed @@ to_tsquery('english','editor&find');
                                             QUERY PLAN                                               
-------------------------------------------------------------------------------------------------------
Seq Scan on documents_en  (cost=0.00..
3.27 rows=1 width=34) (actual time=0.228..0.464 rows=2 loops=1)
  Filter: (cont_ftsed @@ '''editor'' & ''find'''::tsquery)
  Rows Removed by Filter: 20
Total runtime:
0.501 ms
(4 rows)


Oh! It’s using sequential scan (as we expect ), but now let’s add a trick:

fts=# SET enable_seqscan TO off;
SET
fts=# explain ANALYZE  SELECT author, title
   FROM documents_en WHERE cont_ftsed @@ to_tsquery('english','editor&find');
                                                       QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
Index Scan using ix_ftsed_en on documents_en  (cost=0.00..
8.27 rows=1 width=34) (actual time=0.127..0.191 rows=2 loops=1)
  Index Cond: (cont_ftsed @@ '''editor'' & ''find'''::tsquery)
  Rows Removed by Index Recheck: 1
Total runtime:
0.280 ms
(4 rows)

What happens? The cost is higher but is faster? The fast answer is YES, is possible. Cost is based on estimations in the amount of blocks and type of access. Random access use to has 4:1 more cost in relation with the sequential accesses. In this particular case, the table is small so the planner estimates that is cheaper to read it without indexes. BUT that’s not the real problem, the real one is that the data is already TOASTED outside the “data file” and this data is not considered by the planner in this case.

Another question is: can we improve the access to this data? The answer is YES, if you are not complaining about the storage. What you can do is:

fts=# alter table documents_en alter column cont_ftsed set storage external;
ALTER TABLE

That will uncompress the data of the column cont_ftsed and force to store it outside the table (this is not really at this way, but for be didactic we will explain it like this).

fts=# vacuum  full analyze documents_en;
VACUUM
fts=# explain ANALYZE  SELECT author, title
   FROM documents_en WHERE cont_ftsed @@ to_tsquery('english','editor&find');
                                                       QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
Index Scan using ix_ftsed_en on documents_en  (cost=0.00..8.27 rows=1 width=34) (actual time=0.071..0.127 rows=2 loops=1)
  Index Cond: (cont_ftsed @@ '''editor'' & ''find'''::tsquery)
  Rows Removed by Index Recheck: 1
Total runtime:
0.178 ms
(4 rows)

fts=# SET enable_seqscan TO on;
SET
fts=# explain ANALYZE  SELECT author, title
   FROM documents_en WHERE cont_ftsed @@ to_tsquery('english','editor&find');
                                             QUERY PLAN                                               
-------------------------------------------------------------------------------------------------------
Seq Scan on documents_en  (cost=0.00..3.27 rows=1 width=34) (actual time=0.067..0.292 rows=2 loops=1)
  Filter: (cont_ftsed @@ '''editor'' & ''find'''::tsquery)
  Rows Removed by Filter: 20
Total runtime:
0.329 ms
(4 rows)

In this case we use VACUUM because the SET STORAGE option doesn’t make any changes to the previous stored data, and its effects are after. So with vacuum, we are forcing to apply the changes in all the table.

Ok, what happen if the table gets filled with more data? Automatically the planner will start to use the index:

fts=# select pg_size_pretty(pg_relation_size('documents_en'));
pg_size_pretty
----------------
1744 kB
(1 row)
fts=# SELECT count(*)  FROM documents_en WHERE cont_ftsed @@ to_tsquery('english','editor&find');
count
-------
   45
(1 row)


fts=# explain ANALYZE  SELECT author, title
   FROM documents_en WHERE cont_ftsed @@ to_tsquery('english','editor&find');
                                                     QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on documents_en  (cost=4.38..
47.02 rows=13 width=31) (actual time=1.091..5.404 rows=45 loops=1)
  Recheck Cond: (cont_ftsed @@ '''editor'' & ''find'''::tsquery)
  Rows Removed by Index Recheck: 80
  ->  Bitmap Index Scan on ix_ftsed_en  (cost=0.00..4.37 rows=13 width=0) (actual time=1.003..1.003 rows=125 loops=1)
        Index Cond: (cont_ftsed @@ '''editor'' & ''find'''::tsquery)
Total runtime:
5.489 ms
(6 rows)

fts=#
SET enable_bitmapscan TO off;
SET
fts=# explain ANALYZE  SELECT author, title
   FROM documents_en WHERE cont_ftsed @@ to_tsquery('english','editor&find');
                                                         QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
Index Scan using ix_ftsed_en on documents_en  (cost=0.00..
56.50 rows=13 width=31) (actual time=0.307..5.286 rows=45 loops=1)
  Index Cond: (cont_ftsed @@ '''editor'' & ''find'''::tsquery)
  Rows Removed by Index Recheck: 80
Total runtime:
5.390 ms
(4 rows)

We don’t see a huge difference in terms of time between each other, but we still facing the question about the relation between cost/real-performance.

But, don’t forget the functional index! I think you will like the following example:

fts=# analyze;
ANALYZE
fts=# explain analyze select author_id, text_id from documents_en

where (((setweight(to_tsvector('english'::regconfig, COALESCE(title, ''::character varying)::text), 'A'::"char") || setweight(to_tsvector('english'::regconfig, COALESCE(author, ''::character varying)::text), 'B'::"char")) || setweight(to_tsvector('english'::regconfig, COALESCE(content, ''::text)), 'C'::"char"))) @@ to_tsquery('english','editor&find');
                                                                                                                                                                             QUERY PLAN                                                                                                                                                                               
------------------------------------------------------------------------------------------------------------------------------------------------------------ Seq Scan on documents_en  
(cost=0.00..292.20 rows=17 width=7) (actual time=2.960..3878.627 rows=45 loops=1)
  Filter: (((setweight(to_tsvector('english'::regconfig, (COALESCE(title, ''::character varying))::text), 'A'::"char") || setweight(to_tsvector('english'::regconfig, (COALESCE(author
, ''::character varying))::text), 'B'::"char")) || setweight(to_tsvector('english'::regconfig, COALESCE(content, ''::text)), 'C'::"char")) @@ '''editor'' & ''find'''::tsquery)
  Rows Removed by Filter: 2238
Total runtime:
3878.714 ms
(4 rows)

fts=#
SET enable_seqscan TO off;
SET
fts=# explain analyze select author_id, text_id from documents_en

where (((setweight(to_tsvector('english'::regconfig, COALESCE(title, ''::character varying)::text), 'A'::"char") || setweight(to_tsvector('english'::regconfig, COALESCE(author, ''::character varying)::text), 'B'::"char")) || setweight(to_tsvector('english'::regconfig, COALESCE(content, ''::text)), 'C'::"char"))) @@ to_tsquery('english','editor&find');
                                                                                                                                                                                  QUERY PLAN                                                                                                                                                                                 
------------------------------------------------------------------------------------------------------------------------------------------------------------ Bitmap Heap Scan on documents_en  
(cost=272.15..326.46 rows=17 width=7) (actual time=0.937..1.026 rows=45 loops=1)
  Recheck Cond: (((setweight(to_tsvector('english'::regconfig, (COALESCE(title, ''::character varying))::text), 'A'::"char") || setweight(to_tsvector('english'::regconfig, (COALESCE(
author, ''::character varying))::text), 'B'::"char")) || setweight(to_tsvector('english'::regconfig, COALESCE(content, ''::text)), 'C'::"char")) @@ '''editor'' & ''find'''::tsquery)
  ->  Bitmap Index Scan on textsearch_en  (cost=0.00..272.15 rows=17 width=0) (actual time=0.916..0.916 rows=45 loops=1)
        Index Cond: (((setweight(to_tsvector('english'::regconfig, (COALESCE(title, ''::character varying))::text), 'A'::"char") || setweight(to_tsvector('english'::regconfig, (COALE
SCE(author, ''::character varying))::text), 'B'::"char")) || setweight(to_tsvector('english'::regconfig, COALESCE(content, ''::text)), 'C'::"char")) @@ '''editor'' & ''find'''::tsquer
y)
Total runtime:
1.109 ms
(5 rows)

Again! But hits time it is more stronger the difference in terms of time. Is not so pretty, ugh? So, It seems that you will need to be careful if you use searchs in big columns, that are toasted. Maybe you should check in you production database if you need to add some tweaks to get the maximum performance to your Postgres instance.

The version used on this post is PostgreSQL 9.2devel on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.6.2 20111027 (Red Hat 4.6.2-1), 64-bit. Besides this, I made tests in 9.1 and still see the same thing’.

Happy hacking!





PostgreSQL 9.1 Presentation Slides available

Our resident Postgres expert Emanuel spoke at the Libre Software World Conference last week about PostgreSQL 9.1, and as the Community Liasion for PalominoDB for a few more days, I am posting the slides so that folks can download them.  They are in PPT format:

 

http://dev.palominodb.com/docs/LSWC_POstgresql_91_English.ppt

Enjoy, and remember that all of our presentation slides and videos are available at any time from our Community -> Presentations page.  

 

New England Database Summit

The New England Database Summit is an all day conference-style event where participants from the research community and industry in the New England area can come together to present ideas and discuss their research and experiences working with on data-related problems.  It is an academic conference with applications to real life, and includes any type of database.

The 5th annual NEDB will be held in Cambridge, MA MIT (in 32-123) on Friday, February 3, 2012.  Anyone who would like is welcome to present a poster (registration required), or submit a short paper for review.  We plan to accept 8--10 papers for presentation (15 minutes) at the meeting.   All posters will be accepted.

For more details, and to register and / or upload a paper, see:

http://db.csail.mit.edu/nedbday12/

Indexing text columns with GIST or GIN to optimize LIKE, ILIKE using pg_trgm in Postgres 9.1 (Part 1)

 
“The pg_trgm module provides functions and operators for determining the similarity of ASCII alphanumeric text based on trigram matching, as well as index operator classes that support fast searching for similar strings.”
 
This is the introduction to the official documentation of the extension at [1]. Note, that I used the EXTENSION terminology instead of CONTRIB, as in PostgreSQL 9.1.  Now we’ll use the CREATE EXTENSION command to include this module in the database. This new methodology allows us to manage modules installations/uninstallations with only a few commands.

The idea of this post is to show you how KNN GIST and pg_trgm could be used together to obtain interesting results. First, let’s start with some basic elements of pg_trgm.

Installation

Installing the module is easy. If you are installing through the use of source code, you must compile the module and once you get access to the database execute:

CREATE EXTENSION pg_trgm;

That’s it! Installation complete!

What is a Trigram and how use them?


A trigram is a group of three consecutive characters in a string that can be used to detect the similarity of two words (for example) or the ‘distance’ between them.

When we talk about ‘distance’, 0 means in the same place and 1 is very far. When we talk about similarity, 1 is equal and 0 is totally different. In other words, distance is 1 minus the similarity value. These concepts about distance and similarity, are necessary to start without confusion.

What’s is a trigram? A trigram is a group of three consecutive characters from a string, used to know the similarity between two strings by counting the trigrams they share.

A trigram looks like this:

palominodb=# select show_trgm('PalominoDB CO');
                              show_trgm
-----------------------------------------------------------------------
{"  c","  p"," co"," pa",alo,"co ","db ",ino,lom,min,nod,odb,omi,pal}
(1 row)

In the pg_trgm extension we have functions and operators. show_limit and set_limit are functions used to set up and show the similarity threshold for the % operator. This operator takes the form “string1 % string2” and returns a boolean type (“t” if the similarity is greater than the similarity threshold, otherwise it returns “f”).

palominodb=# select show_limit();
show_limit
       0.4

palominodb=# select set_limit(0.3), show_limit();
set_limit | show_limit
         0.3 |    0.3

In the following example we’ll see the use of each one. Operator % will return true if the similarity of the strings is greater than similarity threshold returned by show_limit function. In this example, both string are equal, in consequence, the operation will return true:

palominodb=# select similarity('Palomino','Palomino')  AS Similarity,
                                  'Palomino'<->'Palomino'              AS distance,
                                  'Palomino' % 'Palomino'              AS SimilarOrNot;
-[ RECORD 1 ]+--
similarity          | 1
distance     | 0
similarornot     | t

Index Support and usage


Now let’s discuss combining GIST or GIN and pg_trgm. pg_trgm includes an operator class to support searches using similarity, like, or ilike operators. GIN and GIST have several differences. If you don’t know which to choose, just remember a few rules: GIN searches quicker than GIST but is slower to update; if you have a write-intensive table use GIST. GIN is better for static data. Please be aware, however, that they don’t support exact matching with the equals operator!  You can do an exact match using like/ilike with no wildcards.  If you want to use the equals operator(=), you must create a standard BTREE index on the pertinent column.

In the following examples, we’ll show a table only with a GIST index. As you can see, if you want to match the exact value with equal operator, it will scan the whole table:

palominodb=# EXPLAIN ANALYZE  SELECT id, texto FROM texto_busqueda WHERE texto = 'Palomino';
                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Seq Scan on texto_busqueda  (cost=0.00..90.15 rows=1 width=136) (actual time=16.835..16.846 rows=1 loops=1)
  Filter: (texto = 'Palomino'::text)
Total runtime: 17.094 ms
(3 rows)

But, if we use LIKE operator, index scan will be activated:

palominodb=# EXPLAIN ANALYZE  SELECT id, texto FROM texto_busqueda WHERE texto like 'Palomino';
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Index Scan using texto_busqueda_texto_idx on texto_busqueda  (cost=0.00..8.27 rows=1 width=136) (actual time=0.374..1.780 rows=1 loops=1)
  Index Cond: (texto ~~ 'Palomino'::text)
Total runtime: 1.979 ms
(3 rows)


palominodb=# EXPLAIN ANALYZE  SELECT id, texto FROM texto_busqueda WHERE texto like '%Palomino%';
                                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Index Scan using texto_busqueda_texto_idx on texto_busqueda  (cost=0.00..8.27 rows=1 width=136) (actual time=0.171..1.732 rows=1 loops=1)
  Index Cond: (texto ~~ '%Palomino%'::text)
Total runtime: 1.882 ms
(3 rows)



To use an index for  match equal strings, we need to create a BTREE index. But in case of BTREE there is a limitation of 8191 bytes per index row. So, if you have very large text columns you will not allowed to create a BTREE index without using functional indexes.

We get this result because, unlike BTREE indexes, the search string is not left-anchored.

The creation of indexes with the pg_trgm operator class is simple:

CREATE INDEX ON texto_busqueda USING GIST(texto gist_trgm_ops);
or
CREATE INDEX ON texto_busqueda USING GIN(texto gin_trgm_ops);

If you want a more comprehensive understanding of GIST or GIN implementation on Postgres, you can download the the source code and read  src/backend/access/gist/README and src/backend/access/gin/README.


Another useful technique

Combining % operator to get the strings that have a similarity greater than the established threshold and similarity function -that returns the similarity-, we can get ordered from the most similar to the less one discarding all the strings that aren’t similar enough:

palominodb=# SELECT ctid, similarity(texto, 'Palominodb') AS simil
palominodb-#  FROM texto_busqueda
palominodb-#  WHERE texto % 'Palominodb'
palominodb-#  ORDER BY simil DESC;
 ctid  |  simil
--------+----------
(55,3) | 0.666667
(1 row)

The same query, but with its EXPLAIN plan shows that the generated index is used in the condition:

palominodb=# EXPLAIN ANALYZE  SELECT ctid, similarity(texto, 'Palominodb') AS sml
palominodb-#  FROM texto_busqueda
palominodb-#  WHERE texto % 'Palominodb'
palominodb-#  ORDER BY sml DESC;
                                                             QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Sort  (cost=14.26..14.27 rows=3 width=138) (actual time=3.428..3.437 rows=1 loops=1)
  Sort Key: (similarity(texto, 'Palominodb'::text))
  Sort Method: quicksort  Memory: 17kB
  ->  Bitmap Heap Scan on texto_busqueda  (cost=4.28..14.24 rows=3 width=138) (actual time=3.336..3.383 rows=1 loops=1)
        Recheck Cond: (texto % 'Palominodb'::text)
        ->  Bitmap Index Scan on texto_busqueda_texto_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=3.278..3.278 rows=1 loops=1)
              Index Cond: (texto % 'Palominodb'::text)
Total runtime: 3.578 ms
(8 rows)


Well, this is the first part. Hope you enjoyed the reading and I’ll wait your comments and feedback!

 


[1] http://www.postgresql.org/docs/9.1/static/pgtrgm.html

Postgresql 9.1 - Part 1: General Features

General scope

 
Postgresql 9.1 runs over the theme “features, innovation and extensibility” and it really does. This version was born to overcome Postgresql 9.0 ‘s limitations and known bugs in replication. If you are developing over 9.0, it’s time to think seriously about preparing your code for Postgresql 9.1.
 
The intent of this series of posts are not to be another release features posts. I offer a vision based on my personal experience and focus on the features that I saw exciting for the most of the projects where I’m involved. If you want to read an excellent general article about the new features of this version, web to [2].
 
At the moment of this post, the last PostgreSQL version is 9.1.1 . It includes 11 commits to fix GIST memory leaks, VACUUM improvements, catalog fixes and others. A description of the minor release can be check at [3].
 
The main features included are:

  • Synchronous Replication
  • Foreign Data Support
  • Per Column collation support
  • True SSI (Serializable Snapshot Isolation)
  • Unlogged tables for ephemeral data
  • Writable Common Table Expressions
  • K-nearest-neighbor added to GIST indexes
  • Se-linux integration with the SECURITY LEVEL command
  • Update the PL/Python server-side language
  • To come: PGXN Client for install extensions easily from the command line. More information: http://pgxn.org/faq/  The source will be on https://github.com/pgxn/pgxn-client



Some of these features could be considered minor, but many think they are very cool while using 9.1  in their environments.

Considerations before migrating


If you are an old Pg user, you may already know the migration risks listed on the next page. Still, I advise that you note and carefully learn about these risks. Many users freeze their developments to older versions simply because they didn’t know how to solve new issues. The most notable case is when 8.3 stopped using implicit casts for some datatypes and many queries didn’t work as a result.  



There are some important changes that could affect your queries, so take a pen and note:

  • The default value of standard_conforming_strings is now turned on by default. That means that backslashes are normal characters (which is the SQL standard behavior). So, if you have backslashes in your SQL code, you must add E’’ strings. For example: E’Don’t’
  • Function-style and attribute-style data type casts were disallowed for composite types. If you have code like value_composite.text or text(value_composite), you will need to use CAST or :: operator.
  • Whereas before the checks were skipped, domains are now based on arrays when they are updated, which results in a rechecking of the constraints.
  • String_to_array function returns now an empty array for a zero-length string (before it returned NULL). The same function splits into characters if you use the NULL separator.
  • The inclusion of the INSTEAD OF action for triggers will require you to recheck the logic of your triggers.
  • If you are an actual 9.0 replication user, you may know that in 9.1 you can control the side effects of VACUUM operations during big queries execution and replication. This is a really important improvement. Basically, if you run a big query in the slave server and the master starts a VACUUM process, the slave server can request the master postpone the cleanup of death rows that are being used by the query.


Brief description of main features


Don’t worry about the details, we’ll cover each feature in future posts.

  • Synchronous Replication
    • This feature enhances the durability of the data. Only one server can be synchronous with the master, the rest of the replicated servers will be asynchronous. If the actual synchronous server goes down, another server will become synchronous (using a list of servers in synchronous_standby_names).  Failover is not automatic, so you must use external tools to activate the standby sync server, one of the most popular is pgpool [4].
  • Foreign Data Support
    • The feature of Foreign Data Wrappers has been included since 8.4, but now it is possible to reach data from any database where a plugin exists. Included in the contribs, is a file called file_fwd, which connects CSV files to a linked table. Basically it provides an interface to connect to external data. In my opinion, this is perhaps one of the most useful features of this versions, especially if you have different data sources in your environment.
  • Serializable Snapshot Isolation
    • This new level of serialization is the strictest. Postgres now supports READ COMMITED, REPEATABLE READ (old serializable) and SERIALIZABLE. It uses predicate locking to keep the lock if the write would have an impact on the result. You will not need explicit locks to use this level, due to the automatic protection provided.
  • Unlogged tables
    • Postgres uses the WAL log to have a log of all the data changes to prevent data loss and guarantee consistency in the event of a crash, but it consumes resources and sometimes we have data that we can recover from other sources or that is ephemeral. In these cases, creation of unlogged tables allows the database to have tables without logging into the WAL, reducing the writes to disk. Otherwise, this data will not be replicated, due to the mechanism of replication used by Postgres (through WAL records shipping).
  • Writable Common Table Expressions
    • CTE was included in 8.4 version, but in this version, it was improved to allow you to use writes inside the CTE (WITH clause). This could save a lot of code in your functions.
  • K-nearest-neighbor added to GIST indexes
    • Postgres supports multiple types of indexes; one of them is GiST (Generalized Search Tree). With 9.1, we can define a ‘distance’ for datatypes and use it for with a GiST index. Right now, this feature is implemented for point, pg_trgm contrib and others btree_gist datatypes. The operator for distance is <-> . Another feature you will enjoy is that LIKE and ILIKE operators can use the tgrm index without scanning the whole table.
  • SE-Linux integration
    • Postgres is now the first database to be fully integrated with military security-grade. SECURITY LABEL applies a security label to a database object. This facility is intended to allow integration with label-based mandatory access control (MAC) systems such as SE-Linux instead of the more traditional access control - discretionary with users and groups. (DAC).



References:

[1] http://www.postgresql.org/docs/9.1/static/release-9-1.html
[2] http://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.1
[3] http://www.postgresql.org/docs/9.1/static/release-9-1-1.html
[4] http://pgpool.projects.postgresql.org/


Syndicate content
Website by Digital Loom