
Add an Index to a Large Postgres Table Without Locking It

CREATE INDEX locks writes. CREATE INDEX CONCURRENTLY does not. Here is how to use it, monitor it, and recover when it fails halfway.

You have a Postgres table with 50 million rows. Queries against one of the columns are slow. The right fix is an index. The wrong way to add it — on a live production database — is:

CREATE INDEX idx_orders_customer ON orders (customer_id);

That statement takes a SHARE lock on the table. Reads are unaffected, but SHARE conflicts with the ROW EXCLUSIVE lock that every INSERT, UPDATE, and DELETE needs, so writes block for the entire duration of the build. On a 50M-row table that can be several minutes. Your write traffic stacks up behind it: order processing stalls, the queue backs up, and on-call's phone rings.
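You can watch this happen from another session. A minimal sketch (using the `orders` example above) that lists every session currently stuck waiting on a lock:

-- Sessions waiting on a heavyweight lock, how long they have waited,
-- and what they were trying to run.
SELECT pid,
       wait_event_type,
       wait_event,
       now() - query_start AS waiting_for,
       left(query, 60)    AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';

During a plain CREATE INDEX build, blocked writers show up here with wait_event = 'relation'.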

The right way uses CONCURRENTLY. Here is how.


Step 1 — Use CREATE INDEX CONCURRENTLY

CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);

The CONCURRENTLY keyword changes the build strategy. Postgres takes a lighter lock that permits concurrent inserts, updates, and deletes throughout the build. The index is built in two table scans plus a final validation pass, which is slower than the normal build — often 2–3x — but your application never notices.

A few things to know before you run it:

  • It cannot run inside a transaction block. You must execute it as a standalone statement. If you are using a migration tool that wraps each migration in BEGIN/COMMIT, disable that for this migration.
  • It can fail. If the build hits a constraint violation or is interrupted, Postgres is left with an invalid index. We'll handle that in Step 4.
  • It takes two to three times as long as a plain build. Plan around that — on a 50M-row table, expect the operation to run for tens of minutes.
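The transaction restriction from the first bullet is easy to see for yourself in psql. This sketch exists only to demonstrate the failure mode:

BEGIN;
CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);
-- ERROR:  CREATE INDEX CONCURRENTLY cannot run inside a transaction block
ROLLBACK;

The same error is what surfaces, sometimes less legibly, when a migration tool wraps the statement in a transaction for you.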

Step 2 — Tell the migration tool to skip its transaction wrapper

Different tools, different flags.

Rails / ActiveRecord:

class AddIndexOnCustomerId < ActiveRecord::Migration[7.1]
  disable_ddl_transaction!

  def change
    add_index :orders, :customer_id, algorithm: :concurrently
  end
end

Django:

from django.db import migrations

class Migration(migrations.Migration):
    atomic = False  # ← disables the transaction wrapper

    operations = [
        migrations.RunSQL(
            "CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);",
            reverse_sql="DROP INDEX CONCURRENTLY IF EXISTS idx_orders_customer;",
        ),
    ]

sqlx (Rust), golang-migrate, Flyway, Liquibase — all have equivalent flags. The common failure mode is forgetting the flag, getting a cryptic error like CREATE INDEX CONCURRENTLY cannot run inside a transaction block, and then the migration is half-applied and blocks all future deploys. Double-check the flag before you merge.


Step 3 — Watch the build from another session

In a separate psql session, run:

SELECT
  now() - query_start AS duration,
  state,
  query
FROM pg_stat_activity
WHERE query LIKE '%CREATE INDEX CONCURRENTLY%'
  AND pid <> pg_backend_pid();  -- exclude this monitoring query, which also matches the LIKE

For Postgres 12+, you can get granular progress from pg_stat_progress_create_index:

SELECT
  phase,
  round(100.0 * blocks_done / NULLIF(blocks_total, 0), 1) AS block_pct,
  round(100.0 * tuples_done / NULLIF(tuples_total, 0), 1) AS tuple_pct
FROM pg_stat_progress_create_index;

This gives you a live phase (building index: scanning table, sorting tuples, validating, etc.) and a rough percent-done. No more guessing whether the build is close or just started.
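If more than one build can be running at once, it helps to see which table and index each row belongs to. A variant of the same view, with the relation names resolved via regclass casts:

SELECT
  pid,
  relid::regclass       AS table_name,
  index_relid::regclass AS index_name,
  phase,
  round(100.0 * blocks_done / NULLIF(blocks_total, 0), 1) AS block_pct
FROM pg_stat_progress_create_index;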


Step 4 — Clean up if it fails

If the CREATE INDEX CONCURRENTLY is interrupted — the session died, the database restarted, a deadlock was detected, or a unique index build hit a duplicate — Postgres leaves the index in an invalid state. It exists, it takes up disk space, writes still pay the cost of maintaining it, but queries cannot use it.

Check for invalid indexes:

SELECT
  schemaname,
  indexrelname AS index_name,
  relname AS table_name
FROM pg_stat_user_indexes i
JOIN pg_index idx ON idx.indexrelid = i.indexrelid
WHERE NOT idx.indisvalid;

Drop the invalid one and retry:

DROP INDEX CONCURRENTLY IF EXISTS idx_orders_customer;
CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);

You can also use REINDEX INDEX CONCURRENTLY (Postgres 12+), but in my experience dropping and recreating is more predictable and avoids edge cases where REINDEX has its own locking behavior.
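For completeness, the REINDEX path mentioned above is a single statement:

REINDEX INDEX CONCURRENTLY idx_orders_customer;

Note that if this itself fails, it leaves behind its own invalid index (suffixed _ccnew) to clean up, which is part of why drop-and-recreate tends to be the more predictable recovery.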


Step 5 — Verify the index is being used

After the build completes, confirm the query planner actually picks it up:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

Look for Index Scan using idx_orders_customer in the output. If you still see Seq Scan, check:

  • Is the table large enough that Postgres thinks the index is worth it? On tiny tables a seq scan is genuinely faster.
  • Are statistics up to date? Run ANALYZE orders; after the build if the planner's choices look off.
  • Does the query actually match the index? customer_id = 42 uses it. customer_id::text = '42' does not, because the cast applied to the column means the planner no longer sees the indexed expression.
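Beyond a one-off EXPLAIN, you can confirm the index keeps getting used in production by watching its scan counter in the standard statistics view:

-- idx_scan increments each time the planner actually uses the index.
SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE indexrelname = 'idx_orders_customer';

If idx_scan stays at zero after a day of traffic, the planner is not picking it, and the checklist above applies.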

A quick note on unique indexes

If you need a unique constraint, you can still build it concurrently:

CREATE UNIQUE INDEX CONCURRENTLY idx_users_email_unique ON users (email);
ALTER TABLE users ADD CONSTRAINT users_email_unique UNIQUE USING INDEX idx_users_email_unique;

The ALTER TABLE ... USING INDEX form attaches the constraint to the already-built index without another scan. You get the safety of CONCURRENTLY and the semantics of a real UNIQUE constraint.

Beware: if the table contains duplicate values, the index build will fail partway through and you will have an invalid index to clean up. Deduplicate the data first, or be prepared to drop and start over.
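A quick way to check for duplicates before you start the build, using the users/email example above:

-- Any rows returned here will make the unique index build fail.
SELECT email, count(*)
FROM users
GROUP BY email
HAVING count(*) > 1;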


What this looks like in a migration PR

A well-scoped migration PR for this kind of change:

  • One migration, one index
  • CONCURRENTLY keyword visible in the SQL
  • Transaction wrapper disabled at the migration level
  • A rollback that also uses CONCURRENTLY
  • A comment with the expected runtime based on table size
  • (Optional) an IF NOT EXISTS for idempotency

The SQL itself can be as small as:

-- Expected runtime: ~15 minutes on a 50M-row table.
-- Does not lock writes thanks to CONCURRENTLY.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_customer
  ON orders (customer_id);

That is the difference between a deploy that merely takes longer and a production outage. Worth the extra care.