Database Indexing Types and Usage

Database indexing is a performance optimization technique that improves query speed by enabling faster data retrieval. Indexes function like a table of contents in a book, allowing the database engine to locate data without scanning the entire table.

While indexes speed up read operations, they increase storage usage and can slow down insert/update operations due to the additional overhead of maintaining index structures. Choosing the right index depends on query patterns and database workload characteristics.

Indexing Strategies in PostgreSQL

1. B-Tree Index (Default Index)

B-tree (Balanced Tree) indexing is a type of data structure used in databases to store and manage sorted data efficiently. B-Tree indexing is the default method in most databases. It supports both equality (`=`) and range filtering (`<`, `>`, `BETWEEN`). It is best for sorting and general queries.

Key Characteristics:

Balanced structure: All leaf nodes are at the same depth, ensuring consistent performance.
Multi-level nodes: Each node can have multiple keys and children, reducing the height of the tree.
Sorted keys: Data is stored in sorted order for fast retrieval.

How it Works:

When you create a B-tree index on a column, the database builds a tree structure where:

Internal nodes guide the search.
Leaf nodes contain pointers to the actual data rows.

This structure allows:

Fast lookups (O(log n) time).
Efficient range queries (e.g., WHERE age BETWEEN 20 AND 30).
Ordered traversal (for ORDER BY, etc.).

CREATE INDEX idx_token_timestamp ON app_market_data (token, timestamp);

✅ Pros:

Efficient for most lookup and sorting queries
Supports both equality and range filtering
Balanced tree structure ensures predictable performance

❌ Cons:

Can slow down inserts/updates on large tables
Takes additional storage

2. Hash Index (Fast Exact Match)

Hash indexing uses a hash function to map search keys to a location (bucket) in the index. It’s a fast and efficient way to retrieve data when you’re looking for exact matches but do not support range queries.

How It Works:

A hash function is applied to the search key (e.g., id = 101).
The result determines the bucket where the data is stored.
The database jumps directly to the bucket and retrieves the row.

CREATE INDEX idx_symbol_hash ON option_datas USING hash (symbol);

✅ Pros:

Very fast for equality lookups (`=`)
Smaller storage compared to B-Tree

❌ Cons:

Does not support range queries (`<`, `>`, `BETWEEN`)
Cannot be used for sorting

3. GIN Index (Generalized Inverted Index)

GIN is a special type of index used in PostgreSQL to efficiently index composite, array, and full-text data types, especially when a single column contains multiple values (like arrays or documents).

Instead of indexing rows directly, GIN creates an inverted index — it maps values to the rows they appear in, rather than mapping rows to their values.

Best for **full-text search, JSONB fields, and array indexing**.

CREATE INDEX idx_json_gin ON logs USING GIN (data_jsonb);

✅ Pros:

Ideal for **text search** and **JSONB queries**
Great for **array indexing**

❌ Cons:

Slower **write performance**
High **storage overhead**

4. BRIN Index (Block Range Index)

BRIN (Block Range Index) is a type of index in PostgreSQL that stores summaries of data ranges, rather than individual row values. BRIN is ideal for **very large, sequentially stored datasets** such as time-series data.

How It Works:

If your table has 1 million rows with a created_at column ordered chronologically, BRIN will:

Divide the table into blocks (e.g., every 128 pages)
Store:
- min(created_at)
- max(created_at)
  for each block

Then, if you query for rows within a time range, BRIN can skip entire blocks that don’t match.

CREATE INDEX idx_brin_created_at ON big_table USING BRIN (created_at);

✅ Pros:

Minimal storage overhead
Best for **time-series and large datasets**

❌ Cons:

Not effective for **randomly distributed data**

Comparison between Indexing Strategies

Index Type	Best Used For	Pros	Cons
B-Tree	General-purpose queries	Fast for most queries	Slower inserts/updates
Hash	Exact equality search	Very fast for `=` queries	Does not support range queries
GIN	Full-text search, JSONB	Efficient for complex searches	High storage cost
BRIN	Large, sequential datasets	Minimal storage cost	Not useful for random data

Viewing Indexes and Definitions

SELECT tablename, indexname AS index_name,
indexdef AS index_definition
FROM pg_indexes
WHERE schemaname = 'public' 
ORDER BY tablename, indexname;

To view indexes for a specific table:

SELECT indexname AS index_name,
indexdef AS index_definition
FROM pg_indexes 
WHERE tablename = 'inst_datas';

Viewing Table and Index Size

SELECT relname AS table_name, 
pg_size_pretty(pg_total_relation_size(oid)) AS total_size, pg_size_pretty(pg_table_size(oid)) AS table_size, 
pg_size_pretty(pg_indexes_size(oid)) AS indexes_size 
FROM pg_class 
WHERE relkind = 'r' -- Only tables
ORDER BY pg_total_relation_size(oid) DESC;

To check a specific table’s storage:

SELECT pg_size_pretty(pg_total_relation_size('public.app_market_data')) 
AS total_size,pg_size_pretty(pg_indexes_size('public.app_market_data')) 
AS indexes_size;

Summary

Indexes play a crucial role in enhancing the performance of database queries by allowing faster data retrieval. However, it’s important to note that while indexes can significantly improve query speed, they do come with a trade-off in the form of additional storage requirements.

B-Tree is the most common and works for general queries.
Hash Index is best for exact match but does not support sorting.
GIN Index is great for text search and JSONB fields.
BRIN Index is best for large, sequentially stored datasets.

In conclusion, selecting the most appropriate indexing strategy should be guided by a clear understanding of your database’s structure, the nature of your data, and the types of queries you intend to run. Making the right choice can dramatically improve performance while maintaining efficient resource usage.

Albin Manoj