Sunday, July 14, 2013

monetdb_fdw: PostgreSQL meets column store. Finally.

As you may know, FDW (foreign data wrapper) support is one of the advantages of PostgreSQL. There are a variety of FDWs to federate different data sources (including PostgreSQL itself) with PostgreSQL.

Also, as you may know, everyone in the industry is talking about Big Data and analytics every day, though we know that most of it is just buzz. :p

Yesterday, I released a brand-new FDW, monetdb_fdw, which allows you to federate MonetDB, an open source column-oriented RDBMS, with PostgreSQL.

The column-store pioneer | MonetDB

MonetDB was originally developed at the University of Amsterdam, and it has been actively developed and maintained as open source.

While I was trying DBT-3 queries on MonetDB, I found that this column-oriented RDBMS delivers far better performance than PostgreSQL (up to 100x faster) in analytic query execution.

So, I decided to develop a way to take advantage of MonetDB's performance from PostgreSQL, and here comes monetdb_fdw.

monetdb_fdw is an FDW which allows you to execute analytic queries on a remote MonetDB server and handle the result set in PostgreSQL.

A demo video is available that shows how MonetDB and monetdb_fdw work.

monetdb_fdw demo from uptimejp on Vimeo.

This video shows (1) an analytic query being processed on PostgreSQL in 177 seconds, (2) the same query being processed on MonetDB in 8 seconds, and (3) the same analytic query being processed on the remote MonetDB server through monetdb_fdw in 1 second.

If you're a PostgreSQL user interested in making analytic workloads faster, you would benefit from monetdb_fdw. Take a look!



Ross Reedstrom said...

I realize this is a quick demo, but how do you explain that accessing monetdb via pgsql is faster than doing it directly? Is this a demo effect, caused by cache preloading from the 8sec direct monetdb evaluation?

Satoshi Nagayasu said...

Exactly. I often observed that the query could be executed in 1 second when I ran it a second time (or more) directly on MonetDB.

Martin Kersten said...

Indeed, for any database query it makes a difference if you run in 'cold' mode (data on disk) or 'hot' mode (data cached).