113 changes: 113 additions & 0 deletions docs/enterprise_edition/autodiscovery.md
@@ -0,0 +1,113 @@
---
icon: material/database-eye
---

# Automatic database discovery

When deployed in front of AWS Aurora databases, PgDog can automatically detect the cluster instances and configure them in `pgdog.toml`. This is useful when Aurora uses replica autoscaling, which can add or remove instances at any time.

## How it works

This feature is **disabled** by default. To enable it, add at least one Aurora host to `pgdog.toml` and enable autodiscovery:

=== "pgdog.toml"
    ```toml
    [[databases]]
    name = "postgres"
    host = "any-instance.account-id.region.rds.amazonaws.com"

    [autodiscovery]
    enabled = true
    ```
=== "Helm chart"
    ```yaml
    databases:
      - name: "postgres"
        host: "any-instance.account-id.region.rds.amazonaws.com"
    autodiscovery:
      enabled: true
    ```

When enabled, PgDog connects to the first available host for each database in the configuration and runs the [`aurora_replica_status()`](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora_replica_status.html) function to get the list of instances in the cluster.

PgDog then replaces all entries in `pgdog.toml` with the discovered hosts and reloads its configuration automatically.
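
The discovery step can be pictured with a minimal Python sketch. It is an illustration only, not PgDog's implementation. It relies on two documented columns of `aurora_replica_status()`, `server_id` and `session_id` (which Aurora reports as `MASTER_SESSION_ID` for the writer). The `Instance` type, the `discovered_instances` helper, and the rule that every instance endpoint shares the seed host's DNS suffix are assumptions made for the example.

```python
from typing import NamedTuple

# Value Aurora reports in the session_id column for the writer instance.
WRITER_SESSION_ID = "MASTER_SESSION_ID"


class Instance(NamedTuple):
    """Hypothetical record for one discovered instance."""
    host: str
    role: str  # "primary" or "replica"


def discovered_instances(seed_host: str, rows: list[tuple[str, str]]) -> list[Instance]:
    """Turn (server_id, session_id) rows into instance hosts.

    Assumption: every instance endpoint shares the seed host's suffix and
    differs only in its first DNS label (the instance identifier).
    """
    _, _, suffix = seed_host.partition(".")
    instances = []
    for server_id, session_id in rows:
        role = "primary" if session_id == WRITER_SESSION_ID else "replica"
        instances.append(Instance(f"{server_id}.{suffix}", role))
    return instances
```

Calling `discovered_instances("any-instance.account-id.region.rds.amazonaws.com", rows)` with one writer row and one reader row yields one `primary` and one `replica` entry, each with a host under the same suffix as the seed host.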

### Autoscaling events

To keep the list of databases in sync with Aurora autoscaling events, PgDog periodically queries the first available host in its config and runs the replica discovery function again.

If the list of databases has changed, PgDog updates its config and reloads it. The interval for this check is configurable:

=== "pgdog.toml"
    ```toml
    [autodiscovery]
    enabled = true
    check_interval = 5_000
    ```

=== "Helm chart"
    ```yaml
    autodiscovery:
      enabled: true
      checkInterval: 5000
    ```
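
The "reload only when the list has changed" rule amounts to a set comparison. The Python sketch below illustrates that idea; `plan_reload` and `needs_reload` are hypothetical helper names, and the sketch makes no claim about how PgDog tracks hosts internally.

```python
def plan_reload(configured: set[str], discovered: set[str]) -> tuple[set[str], set[str]]:
    """Return (added, removed) hosts relative to the current configuration."""
    added = discovered - configured
    removed = configured - discovered
    return added, removed


def needs_reload(configured: set[str], discovered: set[str]) -> bool:
    """A reload is only warranted when at least one host was added or removed."""
    added, removed = plan_reload(configured, discovered)
    return bool(added or removed)
```

If autoscaling adds one replica, `plan_reload` reports it in `added` and `needs_reload` returns `True`. If the discovered set matches the configured one, nothing is reloaded.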

### Filtering databases

When enabled, autodiscovery runs for all databases in `pgdog.toml` by default. If some of your databases are not running on Aurora, or you want autodiscovery on some databases but not others, you can configure which ones it applies to:

=== "pgdog.toml"
    ```toml
    [[databases]]
    name = "postgres"
    host = "any-instance.account-id.region.rds.amazonaws.com"

    [[databases]]
    name = "staging"
    host = "not-aurora.account-id.region.rds.amazonaws.com"

    [autodiscovery]
    enabled = true

    [[autodiscovery.databases]]
    name = "postgres"
    ```
=== "Helm chart"
    ```yaml
    databases:
      - name: "postgres"
        host: "any-instance.account-id.region.rds.amazonaws.com"
      - name: "staging"
        host: "not-aurora.account-id.region.rds.amazonaws.com"
    autodiscovery:
      enabled: true
      databases:
        - name: "postgres"
    ```

In this example, only the `postgres` database will have autodiscovery enabled.

!!! note "Configuring databases"
    If you specify the `[[autodiscovery.databases]]` config, any database _not_ listed there will
    not have autodiscovery enabled.
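
The selection rule described in this section (all databases by default, only the listed ones once a filter is present) can be expressed in a few lines of Python. This is a sketch of the rule as documented here, with a hypothetical function name, not PgDog's code.

```python
def autodiscovery_targets(databases: list[str], listed: list[str]) -> list[str]:
    """Names of the databases autodiscovery applies to.

    An empty `listed` mirrors a config without any [[autodiscovery.databases]]
    entries: every configured database is a target.
    """
    if not listed:
        return list(databases)
    return [name for name in databases if name in listed]
```

With the configuration above, `autodiscovery_targets(["postgres", "staging"], ["postgres"])` selects only `postgres`. With an empty filter list, both databases are selected.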

### Replicas only

If you're not using the read/write separation (single endpoint) feature of the [load balancer](../features/load-balancer/index.md#single-endpoint), you may want to configure read and write connection pools separately.

To exclude the writer instance from host discovery for a read-only connection pool, set `replicas_only` to `true` in that database's autodiscovery settings:

=== "pgdog.toml"
    ```toml
    [[autodiscovery.databases]]
    name = "prod_readonly"
    replicas_only = true
    ```

=== "Helm chart"
    ```yaml
    autodiscovery:
      databases:
        - name: "prod_readonly"
          replicasOnly: true
    ```
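
As a rough illustration of what `replicas_only` means for the resulting pool, the Python sketch below drops the writer from a list of discovered hosts. The `(host, role)` pair shape and the `pool_hosts` name are assumptions for the example; PgDog's internal representation may differ.

```python
def pool_hosts(instances: list[tuple[str, str]], replicas_only: bool) -> list[str]:
    """Hosts to place in a pool.

    `instances` holds (host, role) pairs, where role is "primary" or "replica".
    With replicas_only set, the primary (writer) is left out.
    """
    return [
        host
        for host, role in instances
        if not (replicas_only and role == "primary")
    ]
```

A pool with `replicas_only` enabled receives only the reader hosts, while a pool without it receives every discovered host, writer included.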
175 changes: 138 additions & 37 deletions docs/features/authentication.md

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions docs/features/transaction-mode.md
@@ -27,6 +27,7 @@ Transaction mode is **enabled** by default. This is controllable via configurati
[general]
pooler_mode = "transaction"
```
<br>
```yaml title="Helm chart"
poolerMode: transaction
```
@@ -37,6 +38,7 @@ Transaction mode is **enabled** by default. This is controllable via configurati
host = "127.0.0.1"
pooler_mode = "transaction"
```
<br>
```yaml title="Helm chart"
databases:
- name: prod
@@ -50,6 +52,7 @@ Transaction mode is **enabled** by default. This is controllable via configurati
database = "prod"
pooler_mode = "transaction"
```
<br>
```yaml title="Helm chart"
users:
- name: alice
@@ -145,6 +148,7 @@ To use statement mode, you can configure it globally or per user/database, for e
[general]
pooler_mode = "statement"
```
<br>
```yaml title="Helm chart"
poolerMode: statement
```
@@ -155,6 +159,7 @@ To use statement mode, you can configure it globally or per user/database, for e
host = "127.0.0.1"
pooler_mode = "statement"
```
<br>
```yaml title="Helm chart"
databases:
- name: prod
@@ -168,6 +173,7 @@ To use statement mode, you can configure it globally or per user/database, for e
database = "prod"
pooler_mode = "statement"
```
<br>
```yaml title="Helm chart"
users:
- name: alice
Binary file added docs/images/logo-blue-64x64.png
Binary file added docs/images/resharding-intro.png
46 changes: 36 additions & 10 deletions docs/index.md
@@ -4,30 +4,56 @@ icon: material/home

# Introduction to PgDog

[PgDog](https://pgdog.dev) is a sharder, connection pooler and load balancer for PostgreSQL. Written in Rust, PgDog is fast, reliable and scales databases horizontally without requiring changes to application code.
PgDog is a connection pooler, load balancer and database sharder for PostgreSQL. Written in Rust, PgDog is fast, reliable and scales Postgres databases without requiring changes to your application.

## The problem
## Getting started

Unlike NoSQL databases, PostgreSQL can serve `INSERT`, `UPDATE`, and `DELETE` queries from only one machine. Once the capacity of that machine is exceeded, applications have to find new and creative ways to reduce their impact on the database, like batching requests or delaying workloads to run overnight.
PgDog is an open source project. You can download its code from our [repository](https://github.com/pgdogdev/pgdog) on GitHub. If you're deploying PgDog to your cloud account (or on-premises), you can either use the [compiled binaries](https://github.com/pgdogdev/pgdog/releases) we provide or build it from source.

At the same time, database operators are faced with increasing operating costs, like behind schedule vacuums, table bloat and downtime. Incidents are frequent and engineers are more focused on not breaking the DB than building new features.
Every commit in the `main` branch and weekly tagged releases have corresponding images in our [Docker](https://github.com/orgs/pgdogdev/packages/container/package/pgdog) repository, for example:

```bash
docker run ghcr.io/pgdogdev/pgdog:v0.1.46
```

## Why PgDog

PostgreSQL is a process-based, single-primary database. As such, it has hard limits on how many clients can connect, how many queries a single server can execute, and how much data it can write at any given time.

PgDog helps you work around these limits. It's a single binary, deployed between your application and the database, that speaks both the protocol applications use to talk to Postgres and the replication protocol Postgres uses internally. This lets PgDog do its job transparently, without changes to Postgres or the applications that query it.

## Connection pooler

Like PgBouncer or RDS Proxy, PgDog is a connection pooler: it multiplexes thousands of application connections over just a handful of Postgres server connections. This effectively removes the limit on how many clients a PostgreSQL database can serve at once.

Unlike those proxies, PgDog handles features that usually force a pooler to pin or reset connections. It supports SET statements, LISTEN/NOTIFY, and advisory locks without breaking connection state, so your application keeps working as if it were talking to Postgres directly.

PgDog is also multithreaded, so a single instance can serve many more clients while still relying on the same small number of Postgres connections.

You can read more about how PgDog handles transactions [here](features/transaction-mode.md).

## Load balancer

If your database has read replicas, PgDog can distribute read queries across them using one of several built-in load balancing [algorithms](features/load-balancer/index.md). This lets you scale reads simply by adding replicas, with no application changes and no extra infrastructure like HAProxy or Patroni.

You can read more about how PgDog load balances queries [here](features/load-balancer/index.md).


## Sharding PostgreSQL

The solution to an overextended database is **sharding**: splitting all tables and indices equally between multiple machines. For example, if your primary database is 750 GB, splitting it into 3 shards will produce 3 databases of 250 GB each. As databases get smaller, vacuums start to catch up, indices fit into memory again, and queries run faster with reliable performance.
Unlike NoSQL databases, PostgreSQL can serve INSERT, UPDATE, and DELETE queries from only one server. Once that server's capacity is exceeded, applications have to find new and creative ways to reduce their load on the database, such as batching requests or deferring workloads to run overnight.

As shards store more data and grow, they can be split again, scaling PostgreSQL horizontally. Sharded databases can grow into petabytes (that's thousands of TB), while serving OLTP and OLAP use cases.
At the same time, database operators face rising operating costs from vacuums that fall behind schedule, table bloat and downtime. Incidents become frequent, and engineers end up more focused on keeping the database from breaking than on building new features.

## How PgDog works
The solution to an overextended database is sharding: splitting all tables and indices equally between multiple machines. For example, if your primary database is 750 GB, splitting it into 3 shards will produce 3 databases of 250 GB each. As databases get smaller, vacuums start to catch up, indices fit into memory again, and queries run faster with reliable performance.

PgDog operates on the application layer of the stack: it speaks PostgreSQL and understands not only the queries sent by applications but also the logical replication protocol used by the server. This allows it to automatically route queries while moving data between machines to create more capacity.
As shards accumulate more data and grow, they can be split again, scaling PostgreSQL horizontally. Sharded databases can grow into petabytes (thousands of TB) while serving both OLTP and OLAP workloads.

<center>
<img src="/images/intro.png" width="95%" alt="How PgDog works" />
<img src="/images/resharding-intro.png" class="theme-aware-image" width="85%" alt="How PgDog sharding works" />
</center>

While PgDog focuses a lot on sharding PostgreSQL, it is also a load balancer and transaction pooler that can be used with simpler PostgreSQL deployments.
PgDog operates on the application layer of the stack: it speaks PostgreSQL and understands not only the queries sent by applications but also the logical replication protocol used by the server. This allows it to automatically route queries while moving data between machines to create more capacity.

This documentation provides a detailed overview of all PgDog features, along with reference material for production operations.

37 changes: 18 additions & 19 deletions docs/installation.md
@@ -31,34 +31,35 @@ docker run ghcr.io/pgdogdev/pgdog:v0.1.44
### AWS ECS

!!! note "New feature"
This is a new feature. Please report any issues you may encounter.
The ECS Terraform module is a new project. Please report any issues you may encounter. Community contributions are welcome.

We recently added a [Terraform module](https://github.com/pgdogdev/pgdog-ecs-terraform) to deploy PgDog on AWS ECS. It works with the same Docker image as our Helm chart, so the experience should be familiar.
We recently added a [Terraform module](https://github.com/pgdogdev/pgdog-ecs-terraform) to deploy PgDog on AWS ECS. It works with the same Docker image as our [Helm chart](#kubernetes), so the experience should be familiar.

## Pre-built binaries

Each PgDog release (weekly, on Thursdays) contains pre-built binaries for Linux (arm64, amd64), Mac (aarch64, i.e. Apple Silicon), and Debian packages (`.deb`) for convenient installation on Debian/Ubuntu.
Each PgDog release (every week, on Thursday) contains pre-built binaries for Linux (arm64, amd64), Mac (aarch64, i.e. Apple Silicon), and Debian packages (`.deb`) for convenient installation on Debian/Ubuntu.

You can download pre-built binaries in [GitHub](https://github.com/pgdogdev/pgdog/releases).
You can download all binaries from the [releases page](https://github.com/pgdogdev/pgdog/releases) on GitHub.

#### glibc
The Linux binaries are built on Ubuntu 24.04, and are linked against glibc version 2.39. To run them, your system needs glibc 2.39 or later.
#### Linux binaries and glibc

#### Mac OS
The Linux binaries are built on Ubuntu 24.04 and are linked against glibc version 2.39. To run them, your system needs glibc 2.39 or later.
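
To check whether a machine meets this requirement before downloading a binary, a short Python sketch like the one below can help. It uses the standard library's `platform.libc_ver()`; the `glibc_ok` helper and the `REQUIRED` constant are names made up for this example, and the check only compares the major and minor version numbers.

```python
import platform

# Minimum glibc version the Linux binaries are linked against.
REQUIRED = (2, 39)


def glibc_ok(version: str, required: tuple[int, int] = REQUIRED) -> bool:
    """Return True if a glibc version string such as "2.39" meets the minimum."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= required


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        # Non-glibc systems (e.g. macOS or musl-based Linux) end up here.
        print("glibc not detected on this system")
    elif glibc_ok(version):
        print(f"glibc {version} is new enough")
    else:
        print(f"glibc {version} is older than {REQUIRED[0]}.{REQUIRED[1]}")
```

If the detected version is too old, building from source is the alternative.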

The Mac OS binary is not signed. To run it locally, make sure to de-quarantine it:
#### Mac OS security

The Mac OS binary is not signed. To run it locally, make sure to de-quarantine it first, e.g.:

```bash
xattr -d com.apple.quarantine ./pgdog
```

## From source

PgDog can be easily compiled from source. For production deployments, a `Dockerfile` is provided in [GitHub](https://github.com/pgdogdev/pgdog/tree/main/Dockerfile). If you prefer to deploy on bare-metal or you're looking to run PgDog locally, you'll need to install a few dependencies.
PgDog can be easily compiled from source. For production deployments, a [`Dockerfile`](https://github.com/pgdogdev/pgdog/tree/main/Dockerfile) is provided in our code repository. If you prefer to deploy on bare metal or you are looking to run PgDog locally, you will need to install a few dependencies.

### Dependencies

Parts of PgDog depend on C/C++ libraries, which are compiled from source. Make sure to have a working version of a C/C++ compiler installed.
Parts of PgDog depend on C/C++ libraries, which are compiled from source. Make sure to have a working version of a C/C++ compiler installed before building from source:

=== "macOS"
Install [Xcode](https://developer.apple.com/xcode/) from the App Store and CMake & Clang from Homebrew:
@@ -102,15 +103,15 @@ git clone https://github.com/pgdogdev/pgdog && \
cd pgdog
```

To make sure you get all performance benefits, PgDog should be compiled in release mode:
To make sure you get all performance benefits, PgDog should be compiled in release mode with all optimizations:

```bash
cargo build --release
```

### Launch PgDog

You can start PgDog by running the binary directly, located in `target/release/pgdog`, or with Cargo:
You can start PgDog by running the binary directly, which is located in `target/release/pgdog`, or by running it with Cargo:

```bash
cargo run --release
@@ -122,13 +123,12 @@ PgDog is configured via two files:

| Configuration file | Description |
|-|-|
| [pgdog.toml](configuration/index.md) | Contains general settings and information about PostgreSQL servers. |
| [users.toml](configuration/users.toml/users.md) | Contains users and passwords that are allowed to connect to PgDog. |
| [pgdog.toml](configuration/index.md) | General settings and information about PostgreSQL servers. |
| [users.toml](configuration/users.toml/users.md) | Usernames and passwords that are allowed to connect to PgDog. |

Users are configured separately to allow them to be encrypted at rest in environments that support it, like in Kubernetes or with the AWS Secrets Manager.
Users are configured separately, which allows them to be encrypted at rest in environments that support it, like in Kubernetes or with the AWS Secrets Manager.

Both config files should be placed in the current working directory (`$PWD`) for PgDog to detect them. Alternatively,
you can pass their paths at startup as arguments:
If the configuration files are placed in the current working directory (`$PWD`), PgDog will detect them automatically. Alternatively, you can pass their paths at startup as arguments:

```bash
pgdog \
@@ -155,7 +155,7 @@ Most configuration options have sensible defaults. This makes single-database co

#### [users.toml](configuration/users.toml/users.md)

This config file contains a mapping between databases, users, and passwords. Unless you configured [passthrough authentication](features/authentication.md#passthrough-authentication), users not specified in this file won't be able to connect:
This config file contains a mapping between databases, users, and passwords. Unless you configured [passthrough authentication](features/authentication.md#passthrough-authentication), users not specified in this file will not be able to connect:

=== "users.toml"
```toml
@@ -173,7 +173,6 @@ This config file contains a mapping between databases, users, and passwords. Unl
```

!!! note "Configuring users"

PgDog creates connection pools for each user/database pair. If no users are configured in `users.toml`, all connection pools will be disabled and PgDog won't connect to the database(s).

## Next steps