Why Indexes Matter

Without an index, a database performs a full table scan — reading every single row to find the ones that match your query. On a small table this is fine. On a table with millions of rows, it's a performance disaster. Indexes solve this by creating a data structure that lets the database jump directly to the relevant rows.

Think of an index like the index at the back of a textbook: instead of reading every page to find "normalization," you look it up alphabetically and go straight to page 47.

How a B-Tree Index Works

The most common index type is the B-tree index, a self-balancing tree. It stores column values in sorted order with pointers back to the actual table rows. When you query WHERE last_name = 'Smith', the database walks the tree from root to leaf in O(log n) steps rather than scanning all n rows in O(n) time.

Most databases (PostgreSQL, MySQL, SQL Server) use B-tree indexes by default.
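
To make the O(log n) lookup concrete, here is a minimal Python sketch that uses a sorted list and binary search as a stand-in for the index. It is an analogy only: a real B-tree stores many keys per page and is maintained by the database, and the rows, names, and helper functions below are made up for illustration.

```python
import bisect

# Hypothetical table rows: (row_id, last_name), stored in no particular order.
rows = [(0, "Nguyen"), (1, "Smith"), (2, "Adams"), (3, "Smith"), (4, "Lopez")]

def full_scan(rows, last_name):
    """O(n): inspect every row, as a query without an index would."""
    return [row_id for row_id, name in rows if name == last_name]

# The "index": (last_name, row_id) pairs kept in sorted order.
index = sorted((name, row_id) for row_id, name in rows)

def index_lookup(index, last_name):
    """O(log n) binary search to the first match, then walk adjacent entries."""
    i = bisect.bisect_left(index, (last_name,))
    matches = []
    while i < len(index) and index[i][0] == last_name:
        matches.append(index[i][1])  # the pointer back to the table row
        i += 1
    return matches

print(full_scan(rows, "Smith"))      # [1, 3]
print(index_lookup(index, "Smith"))  # [1, 3]
```

Both functions return the same row ids. The difference is how many entries they touch on the way, and that gap widens as the table grows.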

Creating an Index

-- Basic single-column index
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- Composite index (column order matters!)
CREATE INDEX idx_orders_status_date ON orders(status, created_at);

-- Unique index (also enforces uniqueness)
CREATE UNIQUE INDEX idx_users_email ON users(email);
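
The uniqueness guarantee is easy to try out. The sketch below runs the unique-index statement against an in-memory SQLite database through Python's built-in sqlite3 module. SQLite is an assumption made so the example runs without a server. The users schema and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE UNIQUE INDEX idx_users_email ON users(email);
""")
conn.execute("INSERT INTO users (email) VALUES ('ada@example.com')")

def try_insert(email):
    """Return True if the row was inserted, False if the unique index rejected it."""
    try:
        conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_accepted = try_insert("ada@example.com")  # same email again
new_accepted = try_insert("grace@example.com")      # a different email
print(duplicate_accepted, new_accepted)  # False True
```

Other databases report the violation with their own error type, but the principle is the same: the index doubles as a constraint.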

When to Add an Index

Good candidates for indexing include:

  • Columns frequently used in WHERE clauses
  • Columns used in JOIN conditions
  • Columns used in ORDER BY or GROUP BY
  • Foreign key columns (not always indexed automatically: MySQL's InnoDB creates the index for you, while PostgreSQL and SQL Server do not)

When NOT to Add an Index

More indexes aren't always better. Indexes have real costs:

  • Write overhead: Every INSERT, UPDATE, and DELETE must also update all relevant indexes.
  • Storage: Indexes consume disk space, sometimes significantly.
  • Small tables: A full scan is often faster than an index lookup on tiny tables.
  • Low-cardinality columns: Indexing a boolean column is rarely useful, because each value typically matches a large share of the table and the planner will prefer a scan anyway.
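
The storage cost in the list above can be observed directly. This is a rough sketch using an in-memory SQLite database via Python's sqlite3 module (an assumption chosen for convenience). The orders schema and the 10,000 synthetic rows are invented for the demonstration. Exact page counts depend on SQLite's page size, so only the direction of the change is meaningful.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
)
# Synthetic data: 10,000 orders spread across 500 hypothetical customers.
conn.executemany(
    "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
    [(i % 500, "shipped") for i in range(10_000)],
)
conn.commit()

def page_count(conn):
    """Number of database pages currently allocated."""
    return conn.execute("PRAGMA page_count").fetchone()[0]

pages_before = page_count(conn)
conn.execute("CREATE INDEX idx_orders_customer_id ON orders(customer_id)")
pages_after = page_count(conn)

print(f"pages before index: {pages_before}, after: {pages_after}")
```

The extra pages hold the index entries, and every later write to orders has to keep them up to date as well.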

Composite Index Column Order

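With composite indexes, order matters. An index on (status, created_at) will efficiently support queries filtering by status alone, or by status AND created_at. But it won't efficiently support queries filtering only by created_at, because the entries are sorted by status first and matching dates are scattered throughout the index. As a rule of thumb, lead with the columns your queries compare by equality and follow them with columns used in range conditions or sorting. When several candidates remain, put the most selective (highest cardinality) column first.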
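
The leftmost-prefix behaviour can be checked with a small experiment. The sketch below assumes SQLite (through Python's sqlite3 module) and a hypothetical orders schema. Other databases word their plans differently, and some have extra strategies such as skip scans, so treat this as an illustration rather than a guarantee for your engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        status TEXT,
        created_at TEXT
    );
    CREATE INDEX idx_orders_status_date ON orders(status, created_at);
""")

def plan(sql):
    """Return SQLite's query plan as one string (column 3 is the readable detail)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

by_status = plan("SELECT * FROM orders WHERE status = 'shipped'")
by_both = plan(
    "SELECT * FROM orders WHERE status = 'shipped' AND created_at >= '2024-01-01'"
)
by_date = plan("SELECT * FROM orders WHERE created_at >= '2024-01-01'")

for label, p in [("status", by_status), ("status+date", by_both), ("date only", by_date)]:
    print(f"{label:12} -> {p}")
```

The first two plans should mention idx_orders_status_date, while the date-only query falls back to scanning the table.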

Using EXPLAIN to Verify Index Usage

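Always verify that your indexes are being used with EXPLAIN. In PostgreSQL (and MySQL 8.0.18 or later), EXPLAIN ANALYZE goes a step further: it actually executes the query and reports real row counts and timings, so be careful when using it on statements that modify data.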

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

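The wording depends on the database. In PostgreSQL, look for Index Scan (or Index Only Scan) in the output. SQL Server calls the equivalent operation an Index Seek. A Seq Scan in PostgreSQL, or type: ALL in MySQL's EXPLAIN output, means a full table scan. On a large table, that is a warning sign that your index isn't being used or doesn't exist.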
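
Here is a before-and-after check you can run locally. It assumes SQLite via Python's sqlite3 module, where the command is EXPLAIN QUERY PLAN and the output says SCAN for a full table scan and SEARCH ... USING INDEX for an index lookup. Those labels are SQLite's own and differ from the other databases' terms. The orders schema is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

def plan(sql):
    """Return SQLite's query plan as one string (column 3 is the readable detail)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

query = "SELECT * FROM orders WHERE customer_id = 42"

before = plan(query)  # no index yet: expect a full scan
conn.execute("CREATE INDEX idx_orders_customer_id ON orders(customer_id)")
after = plan(query)   # same query, now with an index available

print("before:", before)
print("after: ", after)
```

Running the same comparison against a copy of your real schema is a cheap way to confirm that a new index is picked up before you ship it.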

Common Index Pitfalls

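  1. Wrapping indexed columns in functions: WHERE YEAR(created_at) = 2024 prevents use of a plain index on created_at, because the index stores the raw column values, not the function's results. Use a range condition instead: WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'.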
  2. Leading wildcard searches: WHERE name LIKE '%Smith' can't use a B-tree index. LIKE 'Smith%' can.
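  3. Implicit type conversions: In MySQL and SQL Server, comparing an indexed VARCHAR column to an integer literal forces a cast of the column value for every row, which bypasses the index. Quote the literal so both sides are strings.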
  4. Too many indexes: Tables with heavy write loads suffer when over-indexed.
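
The first pitfall is straightforward to reproduce. The sketch below assumes SQLite via Python's sqlite3 module. SQLite has no YEAR() function, so strftime('%Y', ...) stands in for it. The orders schema and the three sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, created_at TEXT);
    CREATE INDEX idx_orders_created_at ON orders(created_at);
    INSERT INTO orders (customer_id, created_at) VALUES
        (1, '2023-12-31'), (2, '2024-06-15'), (3, '2025-01-01');
""")

def plan(sql):
    """Return SQLite's query plan as one string (column 3 is the readable detail)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# Function applied to the indexed column (SQLite stand-in for YEAR(created_at) = 2024).
wrapped_sql = "SELECT * FROM orders WHERE strftime('%Y', created_at) = '2024'"
# Equivalent half-open range on the bare column.
ranged_sql = (
    "SELECT * FROM orders "
    "WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'"
)

wrapped_plan, ranged_plan = plan(wrapped_sql), plan(ranged_sql)
wrapped_rows = conn.execute(wrapped_sql).fetchall()
ranged_rows = conn.execute(ranged_sql).fetchall()

print("wrapped:", wrapped_plan)
print("ranged: ", ranged_plan)
```

Both queries return the same single 2024 order, but only the range version should mention idx_orders_created_at in its plan.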

Key Takeaways

  • Indexes dramatically speed up reads on large tables by avoiding full scans.
  • Every index adds overhead to write operations — choose thoughtfully.
  • Use EXPLAIN to confirm your queries are actually using indexes.
  • Composite index column order is critical — plan it around your actual queries.