SQL – Schema Design & Best Practices
A well-designed schema is the foundation of every performant, maintainable database. Schema decisions made early are extremely difficult and expensive to undo once data is in production.
Data Types
Use the most appropriate, specific data type for every column. The right type enforces data integrity at the database level and enables the query optimiser to work effectively.
Numeric Types
-- ✅ Integer types
id BIGINT -- Primary keys and foreign keys: always BIGINT
quantity INTEGER -- Standard integer: counters, quantities, ages
-- ✅ Monetary values — NEVER use FLOAT or DOUBLE for money
price NUMERIC(12,2) -- NUMERIC for exact decimal arithmetic
total_amount NUMERIC(12,2) -- NUMERIC(precision, scale) — no rounding errors
tax_rate NUMERIC(5,4) -- e.g. 0.2000 for 20%
-- ❌ Float for money — causes rounding errors in financial calculations
price FLOAT -- 0.1 + 0.2 ≠ 0.3 in floating-point
total_amount REAL -- never use for financial data
Never use FLOAT for money
Floating-point arithmetic is imprecise by design. 0.1 + 0.2 in a FLOAT column produces 0.30000000000000004. Use NUMERIC(precision, scale) for all monetary values without exception.
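The same rounding error is easy to reproduce outside the database. A minimal Python sketch contrasting binary floats with exact decimal arithmetic — the same distinction as FLOAT vs NUMERIC:

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so the error surfaces immediately in arithmetic:
print(0.1 + 0.2)            # 0.30000000000000004
print(0.1 + 0.2 == 0.3)     # False

# Decimal arithmetic is exact — the same guarantee NUMERIC gives:
print(Decimal('0.1') + Decimal('0.2'))                     # 0.3
print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))   # True
```

This is why monetary values must round-trip through exact decimal types end to end — an exact NUMERIC column gains nothing if the application converts it to a float.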
Text Types
-- ✅ Text types
name VARCHAR(255) -- bounded string with a known maximum
description TEXT -- unbounded text — no length limit needed
currency_code CHAR(3) -- fixed-width when all values are exactly the same length
email VARCHAR(254) -- RFC 5321 maximum email length
-- ❌ Incorrectly specified VARCHAR
name VARCHAR(10000) -- use TEXT if there is no meaningful upper bound
email VARCHAR(50) -- emails can be up to 254 characters
Date and Time Types
-- ✅ Always use TIMESTAMPTZ — stores in UTC, displays in any timezone
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
event_time TIMESTAMPTZ NOT NULL
-- ✅ DATE for calendar dates with no time component
birth_date DATE
invoice_date DATE
-- ❌ Timezone-naive TIMESTAMP — almost always wrong
created_at TIMESTAMP -- 'what timezone?' becomes unanswerable later
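The same rule applies on the application side. A short Python sketch showing why a naive timestamp is ambiguous while an aware one pins down an exact instant:

```python
from datetime import datetime, timezone

# Naive — no timezone attached; 'what timezone?' is unanswerable later
naive = datetime(2026, 3, 15, 9, 30)
print(naive.tzinfo)        # None

# Aware — carries its UTC offset, so it names one unambiguous instant
aware = datetime(2026, 3, 15, 9, 30, tzinfo=timezone.utc)
print(aware.isoformat())   # 2026-03-15T09:30:00+00:00
```

Mixing the two (e.g. comparing a naive and an aware datetime) raises TypeError in Python — mirroring the ambiguity a timezone-naive TIMESTAMP column bakes into the schema.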
Boolean Types
-- ✅ Use BOOLEAN — not TINYINT(1) or VARCHAR('yes'/'no')
is_active BOOLEAN NOT NULL DEFAULT TRUE
has_paid BOOLEAN NOT NULL DEFAULT FALSE
-- ❌ Do not simulate booleans
is_active INTEGER -- what does 0, 1, 2 mean?
has_paid VARCHAR(3) -- 'yes'/'no' — no constraint enforcement
UUIDs vs BIGINTs
-- ✅ BIGINT for internal tables — faster joins, smaller storage
CREATE TABLE orders (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
...
);
-- ✅ UUID for globally distributed or externally-exposed IDs
CREATE TABLE api_tokens (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
...
);
-- ✅ Both — BIGINT internally, UUID as external reference
CREATE TABLE payments (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
external_id UUID NOT NULL UNIQUE DEFAULT gen_random_uuid(),
...
);
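The hybrid pattern keeps small sequential keys for joins while handing clients an opaque identifier that leaks nothing about row counts. A hedged Python sketch of the idea — the `payments` dict here is an illustrative stand-in for the table, not real persistence:

```python
import uuid

payments = {}           # external_id -> row; stand-in for the payments table
next_internal_id = 1    # stand-in for GENERATED ALWAYS AS IDENTITY

def create_payment(amount_cents):
    """Insert a payment; return only the external UUID to the caller."""
    global next_internal_id
    external_id = str(uuid.uuid4())   # random v4 — unguessable, safe to expose
    payments[external_id] = {'id': next_internal_id, 'amount_cents': amount_cents}
    next_internal_id += 1
    return external_id

ext = create_payment(9995)
print(len(ext))                # 36 — canonical UUID string length
print(payments[ext]['id'])     # 1 — internal BIGINT key never leaves the service
```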
Constraints
Database constraints are the last line of defence. Application code has bugs. Constraints enforce data integrity regardless of which application, migration script, or admin query writes to the table.
-- ✅ Complete constraint coverage
CREATE TABLE users (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
email VARCHAR(254) NOT NULL,
first_name VARCHAR(100) NOT NULL,
last_name VARCHAR(100) NOT NULL,
password_hash VARCHAR(255) NOT NULL,
role VARCHAR(50) NOT NULL DEFAULT 'standard',
is_active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
deleted_at TIMESTAMPTZ,
CONSTRAINT users_email_key UNIQUE (email),
CONSTRAINT users_role_check CHECK (role IN ('standard', 'admin', 'support')),
CONSTRAINT users_email_format CHECK (email ~ '^[^@\s]+@[^@\s]+\.[^@\s]+$')
);
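That constraints fire regardless of which code path writes can be demonstrated with an in-memory SQLite database — the DDL is simplified relative to the PostgreSQL table above, but UNIQUE and CHECK behave the same way:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        role  TEXT NOT NULL DEFAULT 'standard'
              CHECK (role IN ('standard', 'admin', 'support'))
    )
""")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

# Duplicate email — UNIQUE rejects it no matter who issues the INSERT
try:
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
except sqlite3.IntegrityError as e:
    print('blocked:', e)

# Invalid role — CHECK rejects it at the database level
try:
    conn.execute("INSERT INTO users (email, role) VALUES ('b@example.com', 'root')")
except sqlite3.IntegrityError as e:
    print('blocked:', e)
```

Both bad writes raise IntegrityError before any bad row lands — exactly the "last line of defence" behaviour the section describes.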
ON DELETE Reference
| Action | Behaviour | Use When |
|---|---|---|
| RESTRICT | Prevent deletion of parent row | Most FK relationships — safe default |
| CASCADE | Delete child rows when parent deleted | Child rows are meaningless without parent |
| SET NULL | Set FK to NULL when parent deleted | Child can exist without the parent |
| SET DEFAULT | Set FK to default value | Rare — when a default fallback exists |
Normalisation
Normalise to reduce redundancy and ensure consistency. Aim for Third Normal Form (3NF) minimum:
-- ❌ Denormalised — customer data duplicated in every order
CREATE TABLE orders (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
customer_name VARCHAR(255),
customer_email VARCHAR(254),
customer_address TEXT, -- duplicated and potentially stale on every order
total_amount NUMERIC(12,2)
);
-- ✅ Normalised — customer data lives once
CREATE TABLE customers (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
name VARCHAR(255) NOT NULL,
email VARCHAR(254) NOT NULL UNIQUE,
address TEXT
);
CREATE TABLE orders (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
customer_id BIGINT NOT NULL REFERENCES customers(id),
total_amount NUMERIC(12,2) NOT NULL
);
Normalise for integrity, denormalise for read performance
Normalisation eliminates redundancy and enforces consistency. If a normalised schema is genuinely too slow for a specific read-heavy query, consider a materialised view or controlled denormalisation — with documentation explaining the trade-off.
Transactions
Group related operations that must succeed or fail together in a transaction:
-- ✅ Transaction wraps operations that must all succeed together
BEGIN;
UPDATE accounts
SET balance = balance - 500.00
WHERE id = 1;
UPDATE accounts
SET balance = balance + 500.00
WHERE id = 2;
COMMIT;
-- ✅ Full order creation in a single transaction
-- (RETURNING ... INTO is PL/pgSQL-only — in plain SQL, chain the insert via a CTE)
BEGIN;
WITH new_order AS (
INSERT INTO orders (user_id, total_amount, order_status)
VALUES ($1, $2, 'confirmed')
RETURNING id
)
INSERT INTO order_items (order_id, product_id, quantity, unit_price)
SELECT new_order.id, ci.product_id, ci.quantity, ci.unit_price
FROM cart_items ci
CROSS JOIN new_order
WHERE ci.cart_id = $3;
DELETE FROM cart_items WHERE cart_id = $3;
COMMIT;
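The transfer example can be exercised end to end with SQLite standing in for PostgreSQL: if any statement fails, rolling back leaves both balances exactly as they were.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- cents, never negative
    )
""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 1000), (2, 0)])
conn.commit()

def transfer(amount_cents):
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = 1",
                     (amount_cents,))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = 2",
                     (amount_cents,))
        conn.commit()               # both updates land together...
    except sqlite3.IntegrityError:
        conn.rollback()             # ...or neither does

transfer(50000)   # overdraws account 1 — CHECK fails, whole transfer rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)   # {1: 1000, 2: 0} — untouched
```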
Soft Deletes
Never hard-delete records referenced by other tables or needed for audit/compliance:
-- ✅ Soft delete pattern
ALTER TABLE orders
ADD COLUMN deleted_at TIMESTAMPTZ;
-- "Delete" — set the timestamp
UPDATE orders SET deleted_at = NOW() WHERE id = $1;
-- All queries exclude deleted rows via a view
CREATE VIEW v_active_orders AS
SELECT * FROM orders WHERE deleted_at IS NULL;
-- ✅ Partial index for performance — only indexes active rows
CREATE INDEX orders_active_user_idx
ON orders(user_id)
WHERE deleted_at IS NULL;
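The view-plus-partial-index pattern can be sketched with SQLite, where views and partial indexes work the same way: the "deleted" row stays on disk for audit purposes while the view hides it from normal reads.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE orders (
        id         INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        deleted_at TEXT                 -- NULL = active
    );
    CREATE VIEW v_active_orders AS
        SELECT * FROM orders WHERE deleted_at IS NULL;
    -- partial index — only active rows are indexed
    CREATE INDEX orders_active_user_idx
        ON orders(user_id) WHERE deleted_at IS NULL;
    INSERT INTO orders (user_id) VALUES (7), (7);
""")

# 'Delete' one order — the row survives, the view stops showing it
conn.execute("UPDATE orders SET deleted_at = datetime('now') WHERE id = 1")

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])           # 2
print(conn.execute("SELECT COUNT(*) FROM v_active_orders").fetchone()[0])  # 1
```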
Migrations
All schema changes must be made via migration files — never by manually altering a production database.
Approved Migration Tools
| Language / Stack | Approved Tool | Notes |
|---|---|---|
| Java (Spring Boot) | Flyway | SQL-first, convention-based versioning |
| Java (alternative) | Liquibase | XML/YAML/SQL changelogs, supports rollback |
| Python (SQLAlchemy) | Alembic | Autogeneration from ORM models, env.py for multi-env |
| Python (Django) | Django Migrations | Built-in, closely tied to the ORM |
| Node.js (Prisma) | Prisma Migrate | Schema-first, auto-generates SQL |
| Node.js (raw SQL) | db-migrate or Flyway | Choose one and standardise per project |
| Any | Flyway (community edition) | Works with any language — SQL files only |
Migration File Standards
-- ✅ Flyway naming convention: V{version}__{description}.sql
-- V20260315_001__add_is_gift_to_orders.sql
-- Up migration — always idempotent where possible
ALTER TABLE orders
ADD COLUMN IF NOT EXISTS is_gift BOOLEAN NOT NULL DEFAULT FALSE;
COMMENT ON COLUMN orders.is_gift
IS 'True if order was placed as a gift. Triggers gift wrap and hides receipt from recipient.';
-- CREATE INDEX CONCURRENTLY cannot run inside a transaction — configure the
-- migration tool to run this script non-transactionally
-- (e.g. Flyway: executeInTransaction=false in the script's config file)
CREATE INDEX CONCURRENTLY orders_is_gift_idx
ON orders(is_gift)
WHERE is_gift = TRUE;
# ✅ Alembic migration: {timestamp}_{description}.py
# alembic/versions/20260315_001_add_is_gift_to_orders.py
from alembic import op
import sqlalchemy as sa

def upgrade() -> None:
    op.add_column(
        'orders',
        sa.Column('is_gift', sa.Boolean(), nullable=False, server_default='false')
    )
    op.create_index(
        'orders_is_gift_idx', 'orders', ['is_gift'],
        postgresql_where=sa.text('is_gift = true')
    )

def downgrade() -> None:
    op.drop_index('orders_is_gift_idx', table_name='orders')
    op.drop_column('orders', 'is_gift')
Migration Rules
- Every migration must be backward compatible where possible — add columns as nullable or with defaults before making them NOT NULL
- Never rename or drop a column in the same deploy as code that depends on the new name — deploy in two stages
- Always write and test the rollback (down migration) before deploying
- Never modify data in a migration that runs as part of deployment — use a separate reviewed one-time script
- Migrations run on the main branch only — never on feature branches
Audit Logging
For tables containing sensitive or compliance-relevant data, maintain a full audit trail of all changes:
-- ✅ Audit log table — one row per change
CREATE TABLE audit_log (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
table_name VARCHAR(100) NOT NULL,
record_id BIGINT NOT NULL,
action VARCHAR(10) NOT NULL CHECK (action IN ('INSERT', 'UPDATE', 'DELETE')),
changed_by BIGINT REFERENCES users(id),
changed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
old_values JSONB, -- NULL for INSERT
new_values JSONB -- NULL for DELETE
);
CREATE INDEX audit_log_table_record_idx
ON audit_log(table_name, record_id);
CREATE INDEX audit_log_changed_at_idx
ON audit_log(changed_at DESC);
-- ✅ Trigger function to automatically log changes
CREATE FUNCTION trg_audit_log_changes()
RETURNS TRIGGER
LANGUAGE plpgsql AS $$
BEGIN
INSERT INTO audit_log (table_name, record_id, action, changed_by, old_values, new_values)
VALUES (
TG_TABLE_NAME,
COALESCE(NEW.id, OLD.id),
TG_OP,
NULLIF(current_setting('app.current_user_id', true), '')::BIGINT,
CASE WHEN TG_OP = 'INSERT' THEN NULL ELSE to_jsonb(OLD) END,
CASE WHEN TG_OP = 'DELETE' THEN NULL ELSE to_jsonb(NEW) END
);
RETURN COALESCE(NEW, OLD);
END;
$$;
-- Apply to sensitive tables
CREATE TRIGGER after_orders_change
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION trg_audit_log_changes();
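The trigger mechanism itself can be demonstrated with SQLite. There is no plpgsql or JSONB there, so this hedged sketch logs a single column as text rather than the whole row, but the key property is the same: the audit row is written by the database on every UPDATE, regardless of which application issued it.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE audit_log (
        id         INTEGER PRIMARY KEY,
        record_id  INTEGER NOT NULL,
        action     TEXT NOT NULL,
        old_status TEXT,
        new_status TEXT
    );
    -- fires on every UPDATE, no matter which code path issued it
    CREATE TRIGGER after_orders_update AFTER UPDATE ON orders
    BEGIN
        INSERT INTO audit_log (record_id, action, old_status, new_status)
        VALUES (OLD.id, 'UPDATE', OLD.status, NEW.status);
    END;
    INSERT INTO orders (status) VALUES ('pending');
""")

conn.execute("UPDATE orders SET status = 'confirmed' WHERE id = 1")
row = conn.execute("SELECT action, old_status, new_status FROM audit_log").fetchone()
print(row)   # ('UPDATE', 'pending', 'confirmed')
```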
Row-Level Security (RLS)
Use PostgreSQL Row-Level Security to enforce data isolation at the database level — ensuring users can only access their own data, even if application-level filtering is bypassed:
-- ✅ Enable RLS on a table
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
-- ✅ Policy — users can only see their own orders
CREATE POLICY orders_user_isolation
ON orders
USING (user_id = NULLIF(current_setting('app.current_user_id', true), '')::BIGINT);
-- ✅ Admin bypass policy
CREATE POLICY orders_admin_bypass
ON orders
USING (
NULLIF(current_setting('app.current_user_role', true), '') = 'admin'
);
-- ✅ Set the user context from the application before each query
-- (in application code — Node.js / Python / Java)
-- SET LOCAL only lasts until the end of the current transaction,
-- so it must run inside the same transaction as the query
BEGIN;
SET LOCAL app.current_user_id = '42';
SET LOCAL app.current_user_role = 'standard';
SELECT * FROM orders; -- automatically filtered to user_id = 42
COMMIT;
Scheduled Jobs with pg_cron
Use pg_cron for scheduled database maintenance tasks — materialised view refreshes, archiving old records, and statistics updates:
-- Enable pg_cron extension (requires postgresql.conf: shared_preload_libraries = 'pg_cron')
CREATE EXTENSION IF NOT EXISTS pg_cron;
-- ✅ Refresh materialised view nightly at 02:00 UTC
SELECT cron.schedule(
'refresh-daily-revenue',
'0 2 * * *',
'REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_revenue'
);
-- ✅ Archive old audit logs monthly (retain 1 year)
-- DELETE ... RETURNING feeds the archive insert, so the move is a single
-- atomic statement — no window where rows exist in both tables or neither
SELECT cron.schedule(
'archive-old-audit-logs',
'0 3 1 * *',
$$
WITH archived AS (
DELETE FROM audit_log
WHERE changed_at < NOW() - INTERVAL '1 year'
RETURNING *
)
INSERT INTO audit_log_archive SELECT * FROM archived
$$
);
-- ✅ Update table statistics weekly
SELECT cron.schedule(
'weekly-analyze',
'0 4 * * 0',
'ANALYZE'
);
-- View all scheduled jobs
SELECT jobname, schedule, command, active FROM cron.job ORDER BY jobname;
Database Maintenance
-- ✅ VACUUM — reclaims storage from dead rows
VACUUM orders;
VACUUM FULL orders; -- rewrites the table — takes a lock, use in maintenance windows only
-- ✅ ANALYZE — updates query planner statistics
ANALYZE orders;
ANALYZE; -- updates all tables
-- ✅ REINDEX — rebuilds bloated or corrupt indexes
REINDEX TABLE CONCURRENTLY orders;
REINDEX INDEX CONCURRENTLY orders_user_id_idx;
-- ✅ Monitor autovacuum activity
SELECT
schemaname,
relname AS table_name,
last_autovacuum,
last_autoanalyze,
autovacuum_count,
n_dead_tup AS dead_tuples,
n_live_tup AS live_tuples
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
Security — Parameterised Queries
Never build SQL strings by concatenating user input:
-- ❌ SQL injection vulnerability
query = "SELECT * FROM users WHERE id = " + user_input;
-- ✅ Node.js + pg
const result = await client.query(
'SELECT * FROM users WHERE id = $1',
[userId]
);
-- ✅ Python + psycopg2
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
-- ✅ Java + JDBC
PreparedStatement stmt = conn.prepareStatement("SELECT * FROM users WHERE id = ?");
stmt.setLong(1, userId);
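The difference is directly observable. In this sqlite3 sketch, a classic injection string rewrites the concatenated query but is treated as inert data by a bound parameter:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('alice'), ('bob')")

malicious = "'' OR 1=1 --"

# ❌ Concatenation — the attacker's OR 1=1 becomes part of the query text
leaked = conn.execute(
    "SELECT * FROM users WHERE name = " + malicious
).fetchall()
print(len(leaked))   # 2 — every row returned

# ✅ Bound parameter — the same string is matched literally as a name
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (malicious,)
).fetchall()
print(len(safe))     # 0 — no user has that name
```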
Best Practices Quick Reference
| Practice | Rule |
|---|---|
| Data types | NUMERIC for money, TIMESTAMPTZ for time, BOOLEAN for flags, BIGINT for PKs |
| NOT NULL | Every column that must have a value — default to NOT NULL |
| Constraints | Name all constraints; use CHECK, UNIQUE, FK on every relevant column |
| FK behaviour | Define ON DELETE explicitly — RESTRICT is the safe default |
| Normalisation | 3NF minimum; denormalise only with materialised views and documentation |
| Transactions | Wrap all multi-statement operations that must succeed or fail together |
| Soft deletes | deleted_at TIMESTAMPTZ — never hard-delete audit-relevant records |
| Migrations | Version-controlled, reviewed, backward-compatible, with rollback |
| Migration tool | Flyway (Java), Alembic (Python), Prisma Migrate (Node.js) |
| Audit logging | Trigger-based audit_log table for sensitive and compliance data |
| Row-level security | Enforce data isolation at the DB level for multi-tenant data |
| Scheduled jobs | Use pg_cron for maintenance tasks and materialised view refreshes |
| Maintenance | Schedule VACUUM/ANALYZE; monitor dead tuple counts and index bloat |
| Parameterised queries | Mandatory — never concatenate user input into SQL |