SQL – Schema Design & Best Practices

A well-designed schema is the foundation of every performant, maintainable database. Schema decisions made early are extremely difficult and expensive to undo once data is in production.

Data Types

Use the most appropriate, specific data type for every column. The right type enforces data integrity at the database level and enables the query optimiser to work effectively.

Numeric Types

-- ✅ Integer types
id              BIGINT              -- Primary keys and foreign keys: always BIGINT
quantity        INTEGER             -- Standard integer: counters, quantities, ages

-- ✅ Monetary values — NEVER use FLOAT or DOUBLE for money
price           NUMERIC(12,2)       -- NUMERIC for exact decimal arithmetic
total_amount    NUMERIC(12,2)       -- NUMERIC(precision, scale) — no rounding errors
tax_rate        NUMERIC(5,4)        -- e.g. 0.2000 for 20%

-- ❌ Float for money — causes rounding errors in financial calculations
price           FLOAT,              -- 0.1 + 0.2 ≠ 0.3 in floating-point
total_amount    REAL                -- never use for financial data

Never use FLOAT for money

Floating-point arithmetic is imprecise by design. 0.1 + 0.2 in a FLOAT column produces 0.30000000000000004. Use NUMERIC(precision, scale) for all monetary values without exception.
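
This is easy to verify directly in PostgreSQL (a minimal psql sketch; output shown for PostgreSQL 12+ default float formatting):

-- ✅ NUMERIC is exact; float drifts
SELECT 0.1::float8  + 0.2::float8;   -- 0.30000000000000004
SELECT 0.1::numeric + 0.2::numeric;  -- 0.3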

Text Types

-- ✅ Text types
name            VARCHAR(255)        -- bounded string with a known maximum
description     TEXT                -- unbounded text — no length limit needed
currency_code   CHAR(3)             -- fixed-width when all values are exactly the same length
email           VARCHAR(254)        -- RFC 5321 maximum email length

-- ❌ Incorrectly specified VARCHAR
name            VARCHAR(10000)      -- use TEXT if there is no meaningful upper bound
email           VARCHAR(50)         -- emails can be up to 254 characters

Date and Time Types

-- ✅ Always use TIMESTAMPTZ — stores in UTC, displays in any timezone
created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
event_time      TIMESTAMPTZ NOT NULL

-- ✅ DATE for calendar dates with no time component
birth_date      DATE
invoice_date    DATE

-- ❌ Timezone-naive TIMESTAMP — almost always wrong
created_at      TIMESTAMP           -- 'what timezone?' becomes unanswerable later

Boolean Types

-- ✅ Use BOOLEAN — not TINYINT(1) or a 'yes'/'no' VARCHAR
is_active       BOOLEAN NOT NULL DEFAULT TRUE
has_paid        BOOLEAN NOT NULL DEFAULT FALSE

-- ❌ Do not simulate booleans
is_active       INTEGER             -- what does 0, 1, 2 mean?
has_paid        VARCHAR(3)          -- 'yes'/'no' — no constraint enforcement

UUIDs vs BIGINTs

-- ✅ BIGINT for internal tables — faster joins, smaller storage
CREATE TABLE orders (
    id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    ...
);

-- ✅ UUID for globally distributed or externally-exposed IDs
CREATE TABLE api_tokens (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    ...
);

-- ✅ Both — BIGINT internally, UUID as external reference
CREATE TABLE payments (
    id            BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    external_id   UUID   NOT NULL UNIQUE DEFAULT gen_random_uuid(),
    ...
);

Constraints

Database constraints are the last line of defence. Application code has bugs. Constraints enforce data integrity regardless of which application, migration script, or admin query writes to the table.

-- ✅ Complete constraint coverage
CREATE TABLE users (
    id              BIGINT       PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    email           VARCHAR(254) NOT NULL,
    first_name      VARCHAR(100) NOT NULL,
    last_name       VARCHAR(100) NOT NULL,
    password_hash   VARCHAR(255) NOT NULL,
    role            VARCHAR(50)  NOT NULL DEFAULT 'standard',
    is_active       BOOLEAN      NOT NULL DEFAULT TRUE,
    created_at      TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
    updated_at      TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
    deleted_at      TIMESTAMPTZ,

    CONSTRAINT users_email_key      UNIQUE (email),
    CONSTRAINT users_role_check     CHECK (role IN ('standard', 'admin', 'support')),
    CONSTRAINT users_email_format   CHECK (email ~ '^[^@\s]+@[^@\s]+\.[^@\s]+$')
);

ON DELETE Reference

Action        Behaviour                                        Use when
RESTRICT      Prevents deletion of the parent row              Most FK relationships — the safe default
CASCADE       Deletes child rows when the parent is deleted    Child rows are meaningless without the parent
SET NULL      Sets the FK to NULL when the parent is deleted   Child can exist without the parent
SET DEFAULT   Sets the FK to its default value                 Rare — only when a sensible fallback exists
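
The action is declared on the foreign key itself. A brief sketch (the order_items and products tables are illustrative here):

-- ✅ Make the delete behaviour explicit on every FK
CREATE TABLE order_items (
    id         BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    order_id   BIGINT NOT NULL REFERENCES orders(id)   ON DELETE CASCADE,   -- items are meaningless without their order
    product_id BIGINT NOT NULL REFERENCES products(id) ON DELETE RESTRICT   -- block deleting a product still referenced
);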

Normalisation

Normalise to reduce redundancy and ensure consistency. Aim for Third Normal Form (3NF) minimum:

-- ❌ Denormalised — customer data duplicated in every order
CREATE TABLE orders (
    id               BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    customer_name    VARCHAR(255),
    customer_email   VARCHAR(254),
    customer_address TEXT,          -- duplicated and potentially stale on every order
    total_amount     NUMERIC(12,2)
);

-- ✅ Normalised — customer data lives once
CREATE TABLE customers (
    id       BIGINT       PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    name     VARCHAR(255) NOT NULL,
    email    VARCHAR(254) NOT NULL UNIQUE,
    address  TEXT
);

CREATE TABLE orders (
    id           BIGINT        PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    customer_id  BIGINT        NOT NULL REFERENCES customers(id),
    total_amount NUMERIC(12,2) NOT NULL
);

Normalise for integrity, denormalise for read performance

Normalisation eliminates redundancy and enforces consistency. If a normalised schema is genuinely too slow for a specific read-heavy query, consider a materialised view or controlled denormalisation — with documentation explaining the trade-off.
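
A minimal sketch of the materialised-view approach (the view and column names here are illustrative; REFRESH ... CONCURRENTLY requires a unique index on the view):

-- ✅ Precomputed read model, refreshed on a schedule
CREATE MATERIALIZED VIEW mv_daily_revenue AS
    SELECT created_at::date  AS revenue_date,
           SUM(total_amount) AS revenue
    FROM   orders
    GROUP  BY created_at::date;

CREATE UNIQUE INDEX mv_daily_revenue_date_idx
    ON mv_daily_revenue(revenue_date);

REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_revenue;  -- refresh without blocking readers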

Transactions

Group related operations that must succeed or fail together in a transaction:

-- ✅ Transaction wraps operations that must all succeed together
BEGIN;

    UPDATE accounts
    SET    balance = balance - 500.00
    WHERE  id = 1;

    UPDATE accounts
    SET    balance = balance + 500.00
    WHERE  id = 2;

COMMIT;

-- ✅ Full order creation in a single transaction
-- (RETURNING ... INTO is PL/pgSQL-only — in plain SQL, chain the insert through a CTE)
BEGIN;

    WITH new_order AS (
        INSERT INTO orders (user_id, total_amount, order_status)
        VALUES ($1, $2, 'confirmed')
        RETURNING id
    )
    INSERT INTO order_items (order_id, product_id, quantity, unit_price)
    SELECT new_order.id, product_id, quantity, unit_price
    FROM   cart_items
    CROSS  JOIN new_order
    WHERE  cart_id = $3;

    DELETE FROM cart_items WHERE cart_id = $3;

COMMIT;

Soft Deletes

Never hard-delete records referenced by other tables or needed for audit/compliance:

-- ✅ Soft delete pattern
ALTER TABLE orders
    ADD COLUMN deleted_at TIMESTAMPTZ;

-- "Delete" — set the timestamp
UPDATE orders SET deleted_at = NOW() WHERE id = $1;

-- All queries exclude deleted rows via a view
CREATE VIEW v_active_orders AS
    SELECT * FROM orders WHERE deleted_at IS NULL;

-- ✅ Partial index for performance — only indexes active rows
CREATE INDEX orders_active_user_idx
    ON orders(user_id)
    WHERE deleted_at IS NULL;

Migrations

All schema changes must be made via migration files — never by manually altering a production database.

Approved Migration Tools

Language / Stack      Approved tool          Notes
Java (Spring Boot)    Flyway                 SQL-first, convention-based versioning
Java (alternative)    Liquibase              XML/YAML/SQL changelogs, supports rollback
Python (SQLAlchemy)   Alembic                Autogeneration from ORM models, env.py for multi-env
Python (Django)       Django Migrations      Built-in, closely tied to the ORM
Node.js (Prisma)      Prisma Migrate         Schema-first, auto-generates SQL
Node.js (raw SQL)     db-migrate or Flyway   Choose one and standardise per project
Any                   Flyway (Community)     Works with any language — SQL files only

Migration File Standards

-- ✅ Flyway naming convention: V{version}__{description}.sql
-- V20260315_001__add_is_gift_to_orders.sql

-- Up migration — always idempotent where possible
ALTER TABLE orders
    ADD COLUMN IF NOT EXISTS is_gift BOOLEAN NOT NULL DEFAULT FALSE;

COMMENT ON COLUMN orders.is_gift
    IS 'True if order was placed as a gift. Triggers gift wrap and hides receipt from recipient.';

-- CONCURRENTLY builds the index without blocking writes, but cannot run inside
-- a transaction block — configure the migration tool to run this file non-transactionally
CREATE INDEX CONCURRENTLY IF NOT EXISTS orders_is_gift_idx
    ON orders(is_gift)
    WHERE is_gift = TRUE;

# ✅ Alembic migration: {timestamp}_{description}.py
# alembic/versions/20260315_001_add_is_gift_to_orders.py
from alembic import op
import sqlalchemy as sa

def upgrade() -> None:
    op.add_column(
        'orders',
        sa.Column('is_gift', sa.Boolean(), nullable=False, server_default='false')
    )
    op.create_index(
        'orders_is_gift_idx', 'orders', ['is_gift'],
        postgresql_where=sa.text('is_gift = true')
    )

def downgrade() -> None:
    op.drop_index('orders_is_gift_idx', table_name='orders')
    op.drop_column('orders', 'is_gift')

Migration Rules

  • Every migration must be backward compatible where possible — add columns as nullable or with defaults before making them NOT NULL
  • Never rename or drop a column in the same deploy as code that depends on the new name — deploy in two stages
  • Always write and test the rollback (down migration) before deploying
  • Never modify data in a migration that runs as part of deployment — use a separate reviewed one-time script
  • Migrations run on main branch only — never on feature branches
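
The two-stage rule in practice is the expand/contract pattern. A sketch, assuming a rename of first_name/last_name to a single full_name column (column names are illustrative):

-- ✅ Stage 1 (deploy N): add the new column; old and new code can both run
ALTER TABLE users
    ADD COLUMN IF NOT EXISTS full_name VARCHAR(200);
-- backfill runs as a separate reviewed one-time script, not in this migration

-- ✅ Stage 2 (deploy N+1, once no code references the old columns): drop them
ALTER TABLE users DROP COLUMN IF EXISTS first_name;
ALTER TABLE users DROP COLUMN IF EXISTS last_name;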

Audit Logging

For tables containing sensitive or compliance-relevant data, maintain a full audit trail of all changes:

-- ✅ Audit log table — one row per change
CREATE TABLE audit_log (
    id           BIGINT       PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    table_name   VARCHAR(100) NOT NULL,
    record_id    BIGINT       NOT NULL,
    action       VARCHAR(10)  NOT NULL CHECK (action IN ('INSERT', 'UPDATE', 'DELETE')),
    changed_by   BIGINT       REFERENCES users(id),
    changed_at   TIMESTAMPTZ  NOT NULL DEFAULT NOW(),
    old_values   JSONB,        -- NULL for INSERT
    new_values   JSONB         -- NULL for DELETE
);

CREATE INDEX audit_log_table_record_idx
    ON audit_log(table_name, record_id);

CREATE INDEX audit_log_changed_at_idx
    ON audit_log(changed_at DESC);

-- ✅ Trigger function to automatically log changes
CREATE FUNCTION trg_audit_log_changes()
RETURNS TRIGGER
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO audit_log (table_name, record_id, action, changed_by, old_values, new_values)
    VALUES (
        TG_TABLE_NAME,
        CASE WHEN TG_OP = 'DELETE' THEN OLD.id ELSE NEW.id END,  -- NEW is unassigned on DELETE
        TG_OP,
        NULLIF(current_setting('app.current_user_id', true), '')::BIGINT,
        CASE WHEN TG_OP = 'INSERT' THEN NULL ELSE to_jsonb(OLD) END,
        CASE WHEN TG_OP = 'DELETE' THEN NULL ELSE to_jsonb(NEW) END
    );
    RETURN COALESCE(NEW, OLD);
END;
$$;

-- Apply to sensitive tables
CREATE TRIGGER after_orders_change
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH ROW EXECUTE FUNCTION trg_audit_log_changes();

Row-Level Security (RLS)

Use PostgreSQL Row-Level Security to enforce data isolation at the database level — ensuring users can only access their own data, even if application-level filtering is bypassed:

-- ✅ Enable RLS on a table
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

-- ✅ Policy — users can only see their own orders
CREATE POLICY orders_user_isolation
    ON orders
    USING (user_id = NULLIF(current_setting('app.current_user_id', true), '')::BIGINT);

-- ✅ Admin bypass policy — policies are permissive by default, so this is OR'd with the isolation policy
CREATE POLICY orders_admin_bypass
    ON orders
    USING (
        NULLIF(current_setting('app.current_user_role', true), '') = 'admin'
    );

-- ✅ Set the user context from the application before each query
-- (in application code — Node.js / Python / Java)
-- SET LOCAL only takes effect inside a transaction block
BEGIN;
SET LOCAL app.current_user_id = '42';
SET LOCAL app.current_user_role = 'standard';
SELECT * FROM orders;  -- automatically filtered to user_id = 42
COMMIT;
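
One caveat: by default, RLS policies do not apply to the table owner or to superusers. If the application connects as the owning role, force the policies on (a one-line sketch):

-- ✅ Apply policies to the table owner as well
ALTER TABLE orders FORCE ROW LEVEL SECURITY;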

Scheduled Jobs with pg_cron

Use pg_cron for scheduled database maintenance tasks — materialised view refreshes, archiving old records, and statistics updates:

-- Enable pg_cron extension (requires postgresql.conf: shared_preload_libraries = 'pg_cron')
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- ✅ Refresh materialised view nightly at 02:00 UTC
SELECT cron.schedule(
    'refresh-daily-revenue',
    '0 2 * * *',
    'REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_revenue'
);

-- ✅ Archive old audit logs monthly (retain 1 year)
SELECT cron.schedule(
    'archive-old-audit-logs',
    '0 3 1 * *',
    $$
        INSERT INTO audit_log_archive SELECT * FROM audit_log WHERE changed_at < NOW() - INTERVAL '1 year';
        DELETE FROM audit_log WHERE changed_at < NOW() - INTERVAL '1 year';
    $$
);

-- ✅ Update table statistics weekly
SELECT cron.schedule(
    'weekly-analyze',
    '0 4 * * 0',
    'ANALYZE'
);

-- View all scheduled jobs
SELECT jobname, schedule, command, active FROM cron.job ORDER BY jobname;

Database Maintenance

-- ✅ VACUUM — reclaims storage from dead rows
VACUUM orders;
VACUUM FULL orders;  -- rewrites the table — takes a lock, use in maintenance windows only

-- ✅ ANALYZE — updates query planner statistics
ANALYZE orders;
ANALYZE;             -- updates all tables

-- ✅ REINDEX — rebuilds bloated or corrupt indexes (CONCURRENTLY requires PostgreSQL 12+)
REINDEX TABLE CONCURRENTLY orders;
REINDEX INDEX CONCURRENTLY orders_user_id_idx;

-- ✅ Monitor autovacuum activity
SELECT
    schemaname,
    relname                                   AS table_name,
    last_autovacuum,
    last_autoanalyze,
    autovacuum_count,
    n_dead_tup                                AS dead_tuples,
    n_live_tup                                AS live_tuples
FROM  pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;

Security — Parameterised Queries

Never build SQL strings by concatenating user input:

-- ❌ SQL injection vulnerability
query = "SELECT * FROM users WHERE id = " + user_input;

-- ✅ Node.js + pg
const result = await client.query(
    'SELECT * FROM users WHERE id = $1',
    [userId]
);

-- ✅ Python + psycopg2
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))

-- ✅ Java + JDBC
PreparedStatement stmt = conn.prepareStatement("SELECT * FROM users WHERE id = ?");
stmt.setLong(1, userId);

Best Practices Quick Reference

Practice                Rule
Data types              NUMERIC for money, TIMESTAMPTZ for time, BOOLEAN for flags, BIGINT for PKs
NOT NULL                Every column that must have a value — default to NOT NULL
Constraints             Name all constraints; use CHECK, UNIQUE, FK on every relevant column
FK behaviour            Define ON DELETE explicitly — RESTRICT is the safe default
Normalisation           3NF minimum; denormalise only with materialised views and documentation
Transactions            Wrap all multi-statement operations that must succeed or fail together
Soft deletes            deleted_at TIMESTAMPTZ — never hard-delete audit-relevant records
Migrations              Version-controlled, reviewed, backward-compatible, with rollback
Migration tool          Flyway (Java), Alembic (Python), Prisma Migrate (Node.js)
Audit logging           Trigger-based audit_log table for sensitive and compliance data
Row-level security      Enforce data isolation at the DB level for multi-tenant data
Scheduled jobs          Use pg_cron for maintenance tasks and materialised view refreshes
Maintenance             Schedule VACUUM/ANALYZE; monitor dead tuple counts and index bloat
Parameterised queries   Mandatory — never concatenate user input into SQL