Monday, December 28, 2009

PostgreSQL Cluster: “Too Many Projects” problem

“Too Many Projects” problem

The “too many projects” problem of PostgreSQL Cluster was raised and discussed in the PostgreSQL Clustering Developer Meeting at the JPUG Conference.

Nowadays, the PostgreSQL developer community has many clustering/replication projects, but many of them are not production ready. Even so, new projects keep starting while the old ones are still immature.

I think this is one of the biggest challenges for the open source development community.

Why does this happen? How can we solve it and help these solutions mature?


What is the problem? And why does it happen?

As Josh said, the problem is that each project has only a few users and a few developers. With so few users and developers, it is difficult for a project to improve quickly. As a result, many projects are not mature and not ready for production use.

As I explained at the meeting, the most difficult challenge in PostgresForest development was not a technical issue, but the development process and the sustainability of the project.

I think this problem develops in three steps.
  • There are several types of clustering requirements.
  • PostgreSQL code is very clean, and making a new “add-on” is very easy. So someone starts making a solution “XX”.
  • But it does not fit other users' needs (because feature “YY” is missing), so he/she thinks “It’s time to make a new one, called ZZ”.
Thus, many users start building their own solutions to solve their own (similar, but slightly different) problems.


Thinking about the Goal

I have worked both as a cluster developer and as a cluster user implementing production systems.

From the user’s point of view, the “To-Be” of a clustering project is a mature clustering solution that meets your requirements and comes with enough technical information. Users will also look for other production users (or case studies) and for some kind of support (Q&A, bug fixes, new releases).

These give the solution credibility, and credibility attracts more users.

How to solve this?

One conclusion of the meeting was that we should share information about the various solutions. Here is a new PostgreSQL cluster wiki page.

I think the information we should share is:
  • Requirements (and/or use cases) that the solution “XX” can cover.
  • Features that are implemented, or will be implemented, in the solution “XX”.
We know that “one size does not fit all”. So understanding each solution from these points of view is very important, both for improving existing projects and for avoiding the start of yet another similar project.

So if you are a cluster developer, please put your information on the wiki page. (Of course, I should add mine too.)


2 comments:

Robert Treat said...

I think we not only have the problem of too many solutions, but apparently also a problem of too many attempts at organizing all of the solutions.

In other words, can't we just use the existing wiki page for this information? http://wiki.postgresql.org/wiki/Replication%2C_Clustering%2C_and_Connection_Pooling

Satoshi Nagayasu said...

Thanks for the comment. I think what you mentioned is very important.

Many people are trying to do this kind of thing, which means the existing solution (for organizing the information) is not enough for others. "Enough" is not only about the contents, but also about its appearance.

I think both should be merged, but we need a better solution (or process) for sharing and promoting information about PostgreSQL clustering.

How can we prevent such duplicated organizing work?