Supercharging PostgreSQL with PgBouncer
As I’ve written about previously, Postgres is an incredibly rock-solid database. What’s more, when the good vibes go your way and you need to scale Postgres, its august ecosystem answers the call quite nicely. In particular, PgBouncer enables you to scale Postgres vertically and to take better advantage of horizontal scaling.
PgBouncer is primarily a connection pooler and router for Postgres. It acts as middleware between applications and one or more Postgres instances, managing both routes to individual databases and a pool of connections. Leveraging PgBouncer can significantly improve an application's performance by reducing the overhead of opening and closing database connections, and it makes it easy to route traffic to multiple Postgres instances (such as read replicas).
With PgBouncer managing database connections, you free up valuable resources on your database instances, thereby enabling you to scale vertically more efficiently. Moreover, PgBouncer's routing features make it copacetically easy to route queries to multiple instances. In this way, PgBouncer is conceptually akin to a load balancer, enabling efficient horizontal scaling.
Getting started with PgBouncer
Getting started with PgBouncer couldn't be any easier; in fact, I've set up a GitHub repository, dubbed Recoil, that'll enable you to familiarize yourself with both PgBouncer's connection pooling and routing features from the comfort of your laptop. All you need is Docker Engine and you'll be good to go, baby!
The connection pooling boogie
Connections in Postgres are expensive because Postgres creates a new backend process for each connection, which involves memory allocation and initialization work. A connection pooler manages a set of long-lived connections on the database's behalf, freeing the database's resources to focus on what matters, such as rapidly returning data.
To see how PgBouncer works as a connection pool, you can use pgbench, which is a simple command-line utility that runs benchmark tests on a Postgres instance. It's an efficient way to exhaust database resources, which is easy to do if you set a high value for pgbench's client connections (i.e. via the -c flag). You can, of course, increase Postgres's max_connections; however, you'll potentially do so at the cost of other important resources.
Using the handy docker-compose.yml file found in the Recoil repository and following the detailed directions in the README, you can fire up a single instance of Postgres and do two things: initialize the database for benchmarking and then run a benchmark test.
Initializing the database for benchmarking is easy. Run the following command in your favorite terminal:
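A command along the following lines will do the trick (the host, port, user, and database name here are assumptions on my part; check Recoil's README for the exact values):

    # Initialize the pgbench tables with a scale factor of 5
    pgbench -i -s 5 -h localhost -p 5432 -U postgres recoil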
The -i flag initializes the benchmark tables, and the -s flag sets a scale factor of 5, which creates 500,000 rows in the pgbench_accounts table.
By default, Postgres's max_connections is set to 100. You can quickly exhaust this resource by running a benchmark test that exceeds this limit like so:
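Here's a sketch of such a test (again, the connection details are assumptions; the pgbench flags match the description that follows):

    # 101 client connections across 2 threads, 100,000 transactions per client
    pgbench -c 101 -j 2 -t 100000 -h localhost -p 5432 -U postgres recoil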
In this case, 101 clients, configured via the -c flag, were thrown at the local Postgres via 2 threads (i.e. -j). Each client attempted to issue 100,000 transactions via the -t flag. Unfortunately, this test fails as Postgres quickly exhausts its valuable connections. The database will error out with a message along the lines of FATAL: sorry, too many clients already.
This is a great use case for employing a connection pool like PgBouncer! Following the groovy directions I outlined in Recoil's README, you can stand up PgBouncer in front of Postgres and let it manage database connections; what's more, if you set PgBouncer's connection pool to a large number, such as 200, you can get through the above pgbench test without failure.
Accordingly, you can reissue the pgbench benchmark test pointing to a different port, since you're going directly to PgBouncer (instead of Postgres), like so:
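Assuming PgBouncer is listening on its conventional port of 6432 (Recoil may map it differently), the command looks nearly identical:

    # Same benchmark, but pointed at PgBouncer rather than Postgres
    pgbench -c 101 -j 2 -t 100000 -h localhost -p 6432 -U postgres recoil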
Connection pooling isn't a panacea; indeed, you can play around with pgbench and quickly exhaust PgBouncer as well. What's important is that you've offloaded connection management to a specialized process that handles it more efficiently. Moreover, PgBouncer adds a few other handy features for your scaling needs, including routing.
Routin' SQL requests
Horizontally scaling Postgres is fairly simple, especially if you read my blog post on read replicas. Of course, when you stand up more than one database, you'll want an easy way to route SQL requests. A simple load balancer would do this automatically for you; however, with PgBouncer, you can easily route via a name. In the case of read replicas, you'll want to route writes to the primary instance and reads to one or more replicas. Conceptually, the architecture of using PgBouncer as a combination pooler and router looks like so:
With PgBouncer, routes are configured like connection strings, where a logical database name maps to a specific backend database. For instance, in my Recoil example, the database is named recoil. In a normal read-replica environment, the replicas would share the same database name; with PgBouncer, however, you can give each read replica a unique identifier, which clients can leverage for specific routing.
You can see routing in action by following the directions in the Recoil GitHub repository. If you examine the docker-compose.yml file, you'll note that the PgBouncer instance, pgbouncer, defines two databases. Defining the first database is done like so:
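Here's a minimal sketch of what that can look like, assuming the bitnami/pgbouncer image's environment-variable conventions; the exact image, keys, and credentials in Recoil's docker-compose.yml may differ:

    pgbouncer:
      image: bitnami/pgbouncer:latest   # assumed image
      ports:
        - "6432:6432"
      environment:
        # The default route: point PgBouncer at the primary Postgres service
        - POSTGRESQL_HOST=postgres
        - PGBOUNCER_DATABASE=recoil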
Subsequent databases, however, are defined using a somewhat esoteric string like so:
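Based on the DSN_0 name and the route details described below, the entry likely resembles the following (the exact variable prefix and any credentials are assumptions):

    environment:
      # Route the logical name "recoil-ro" to the read replica service
      - "PGBOUNCER_DSN_0=recoil-ro=host=postgres-replica port=5432 dbname=recoil"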
The crucial part of this string is the DSN_0 aspect, which gives this route the name recoil-ro. This logical name points to the postgres-replica service listening on port 5432.
You can see routing (and replication too!) in action by using psql to log into the primary and issuing two SQL queries. First, shell into the primary using the following command:
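Something like the following should work, assuming PgBouncer is exposed locally on port 6432 with the default postgres user (adjust to match Recoil's configuration):

    # Connect to the primary by way of PgBouncer's "recoil" route
    psql -h localhost -p 6432 -U postgres recoil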
If you followed the directions in Recoil's repository, ran bunzip2 on the employee_data.sql.bz2 file, and then started things up with docker compose up -d, then when you run the following count query, you should see a count of 300024.
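(The table name below is an assumption based on the employee_data.sql dump; adjust it to match the actual schema.)

    -- Count the rows loaded from employee_data.sql
    SELECT count(*) FROM employee;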
Next, insert an additional employee record like so:
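Here's a hypothetical insert; the column names and values are placeholders, since the real schema lives in employee_data.sql:

    -- Hypothetical: column names and values are placeholders for the actual schema
    INSERT INTO employee (first_name, last_name, hire_date)
    VALUES ('Pat', 'Benatar', '2024-01-01');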
With Postgres replication, behind the scenes this record is automatically added to the configured read replica (i.e. the postgres-replica service found in the Docker compose file). You can verify the new row was added by using psql to shell into the replica instance via the route name elaborated earlier, recoil-ro.
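As before, the connection details are assumptions; the important bit is the database name, which is the recoil-ro route:

    # Connect to the read replica by way of PgBouncer's "recoil-ro" route
    psql -h localhost -p 6432 -U postgres recoil-ro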
Run the aforementioned count query and you should see the answer is 300025, which reflects the newly added employee record inserted into the primary instance and replicated to the replica.
As you can see from the code above, PgBouncer acts like a proxy, routing queries to the appropriate database instance by name. In fact, conceptually, PgBouncer is akin to a load balancer. It does represent a single point of failure, but if you zoom out slightly, you'll recall that PgBouncer is also a connection pool.
Keep on truckin' with PgBouncer
If you're using Postgres and are looking for a way to improve the performance and stability of your application, then PgBouncer is a groovy option. It's easy to set up and use, and it can provide a significant improvement in performance. Can you dig it?