Tuesday, April 12, 2011

Load Balancing for PostgreSQL Database

     As explained in this post, we achieved Read Scalability with Slony for our PostgreSQL databases. The next task was to achieve Load Balancing and some level of High Availability for our databases within each Data Center. This blog gives an overview of how we achieved that.

Before Load Balancing

[Diagram: one-to-one mapping between App Nodes and DB Nodes]
        As shown in the diagram above, our web application had a one-to-one correspondence with the database: for each client request, the App Node hit its database with a short, stateless transaction. On average each Database Node was serving around 3000 TPS, and to avoid each request creating a new connection we used PgBouncer as our connection pooler. This setup worked well to begin with, until we started facing the following issues.
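For reference, a pooler setup like the one described above can be sketched in a minimal pgbouncer.ini; the hostnames, file paths, and pool sizes below are illustrative, not the actual values from this deployment:

```ini
; Minimal PgBouncer sketch (names, paths, and sizes are illustrative).
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling suits short, stateless transactions
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 50
```

With transaction pooling, a server connection is returned to the pool as soon as each transaction ends, which is why short stateless transactions reuse a small pool so effectively.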

  • Requests to the App Nodes were random in nature, so the DB Nodes saw uneven, unpredictable load
  • Overall DB Node utilization was very low, which forced us to answer tough questions during capacity expansion
  • During maintenance of a DB Node, the corresponding App Node had to be taken out of live traffic because of the one-to-one correspondence
  • Whenever a DB Node went down, we faced the dilemma of whether an existing live node could take the additional traffic before hitting capacity
After Load Balancing

       After a bit of research and a check of our hardware inventory, we moved to the architecture below, which solved most of the above issues.

   As shown above, we introduced a Load Balancer between the App Nodes and the DB Nodes. All incoming DB requests are now intercepted by the Load Balancer and forwarded to an appropriate DB Node based on the configured algorithm (round-robin, least connections, etc.). This gave us Load Balancing of the DB requests.
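The two forwarding algorithms mentioned above can be sketched in a few lines of Python. This is a toy illustration of the policies, not the load balancer's actual implementation; the class and node names are made up:

```python
from collections import Counter

class ToyBalancer:
    """Toy sketch of two forwarding policies: round-robin and
    least connections. Illustrative only."""

    def __init__(self, nodes):
        self.nodes = list(nodes)   # DB Nodes behind the Load Balancer
        self.cursor = 0            # rotating position for round-robin
        self.active = Counter()    # open connections per node

    def round_robin(self):
        # Hand out nodes in a fixed rotating order.
        node = self.nodes[self.cursor % len(self.nodes)]
        self.cursor += 1
        return node

    def least_connections(self):
        # Pick the node currently serving the fewest connections.
        return min(self.nodes, key=lambda n: self.active[n])

    def route(self, pick):
        node = pick()
        self.active[node] += 1
        return node

lb = ToyBalancer(["db1", "db2", "db3"])
order = [lb.route(lb.round_robin) for _ in range(4)]
# round-robin cycles through the nodes: db1, db2, db3, db1
```

Round-robin spreads requests evenly regardless of how long each one runs, while least connections adapts when some queries hold connections longer than others.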
  The other interesting piece in the image above is the "Health Check" component. It runs multiple checks against PgBouncer and the PostgreSQL database to determine whether a node is healthy and can take live traffic. The Load Balancer runs these checks periodically in the background and marks each DB Node as UP or DOWN. Each incoming request is then redirected to an appropriate DB Node that is UP, avoiding any node marked DOWN. This gave us a partial level of High Availability for our applications; the one case it cannot cover is when all DB Nodes are marked DOWN.
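The simplest flavor of such a health check can be sketched as a TCP probe of the PgBouncer port. This is an assumption about the general shape of the check, not the actual checks used here, and the hosts and ports are illustrative:

```python
import socket

def is_up(host, port, timeout=1.0):
    """Shallow health check: can we open a TCP connection to the
    PgBouncer port? (A deeper check would also run `SELECT 1`
    through the pooler to confirm PostgreSQL itself answers.)"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# The Load Balancer runs checks like this periodically and marks
# each DB Node UP or DOWN. Hosts/ports here are illustrative.
nodes = [("127.0.0.1", 6432), ("127.0.0.1", 6433)]
status = {port: ("UP" if is_up(host, port) else "DOWN")
          for host, port in nodes}
```

A port probe alone only proves PgBouncer is listening; checking both PgBouncer and the database behind it, as described above, avoids marking a node UP when the pooler is alive but PostgreSQL is not.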
   In summary, by adding a Load Balancer and introducing "Health Checks" we achieved Load Balancing and a partial level of High Availability. More specifics about the implementation are reserved for another blog post coming soon.

  

2 comments:

  1. Hi Sivakumar. I'm interested in your findings for PostgreSQL load balancing. Can you share more information about the Netscaler configuration to accomplish this? Did you need to make new configurations on your DB servers? Thank you.

    ReplyDelete
    Replies
    1. Hi MoBlanc,
      No, we didn't need to make any new configurations on the DB servers. From the LB perspective we created a Virtual IP (VIP) with the DB servers as real servers underneath it. We also had a dynamic health check to enable/disable traffic to each DB server.

      Delete