Disclaimer: This post was written with the help of Claude Sonnet 4
PostgreSQL, the database that has been adapted to do nearly everything, sometimes shouldn’t be used to do everything. PostgreSQL shines for ACID compliance, for relational joins, and for its near-limitless extensibility. But what if you have millions of records that need to be searched in seconds? What if data replication across nodes is not an afterthought but part of the fundamental system? What if you want to distribute load horizontally across multiple machines? In such cases, you might want to explore something without the shackles of PostgreSQL. Something in the league of MongoDB or Elasticsearch.
MongoDB shines for write-heavy workloads and flexible documents, and it has its place. But you know what’s better for search than MongoDB? Elasticsearch! Elasticsearch is a distributed NoSQL document store that builds an inverted index over your data by default. An inverted index maps each term to the documents that contain it, so a search looks up matching documents directly instead of scanning every record. That is what makes queries return in milliseconds even across millions of documents: a term lookup is close to constant time, O(1) in the best case, while an unindexed scan degrades to O(n).
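To make the indexing idea concrete, here is a toy sketch in plain Python (not how Elasticsearch is implemented internally, just the shape of the idea) of an inverted index: each term maps to the set of documents containing it, so a search is a dictionary lookup rather than a scan.

```python
from collections import defaultdict

# A handful of "documents" to index.
docs = {
    1: "postgres is a relational database",
    2: "elasticsearch is a search engine",
    3: "elasticsearch indexes every field by default",
}

# Build the inverted index: term -> set of document ids.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        inverted[term].add(doc_id)

# Searching is now a dictionary lookup, not a scan over all documents.
print(sorted(inverted["elasticsearch"]))  # -> [2, 3]
```

The lookup cost no longer depends on how many documents you have, only on how many documents match, which is the essence of why indexed search stays fast at scale.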
Understanding the Fundamental Differences
Think of PostgreSQL as a well-organized filing cabinet with strict rules about where everything goes. Each piece of data has its designated place, relationships are clearly defined, and you need to know exactly which drawer to open to find what you’re looking for. Elasticsearch, on the other hand, is like having a librarian with a perfect memory who can instantly tell you every book that mentions “dragons” or “medieval architecture” across the entire library, regardless of which shelf they’re on.
Where PostgreSQL excels at structured queries with known relationships, Elasticsearch excels at discovery. You might query PostgreSQL with “Show me all users from California who signed up in March.” But with Elasticsearch, you might ask “Find me all documents that contain content similar to this blog post” or “What are customers saying about our product in posts that mention performance issues?”
The schema flexibility also differs dramatically. In PostgreSQL, you define your table structure upfront. Adding a column requires an ALTER TABLE statement. In Elasticsearch, you can index a document with completely new fields, and the system adapts automatically. This makes it perfect for evolving data structures or when you’re not sure what fields you’ll need to search on later.
When PostgreSQL Hits Its Limits
PostgreSQL starts to show strain when you’re doing full-text search across large datasets. Sure, you can use GIN indexes and tsvector, but you’ll find yourself writing increasingly complex queries for what should be simple searches. Try searching for “machine learning” across millions of documents in PostgreSQL, accounting for variations like “ML,” “artificial intelligence,” or “neural networks.” You’ll end up with a nightmare of OR conditions and LIKE statements.
Elasticsearch handles this naturally. With a synonym filter in its analyzer chain, it can treat “ML” and “machine learning” as related concepts. It can rank results by relevance, handle typos through fuzzy matching, and even suggest corrections. This isn’t just convenience; it’s a fundamental architectural difference.
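As a rough illustration of what a synonym filter does (a simplified sketch, not Elasticsearch’s actual analyzer code; the synonym map is made up for this example), you can think of it as expanding query terms before matching:

```python
# Hypothetical synonym map, analogous to a synonym filter
# configured in an Elasticsearch analyzer.
SYNONYMS = {
    "ml": {"ml", "machine learning"},
    "machine learning": {"ml", "machine learning"},
}

def expand_terms(query: str) -> set:
    """Expand a query term into itself plus its configured synonyms."""
    q = query.lower()
    return SYNONYMS.get(q, {q})

print(expand_terms("ML"))  # contains both "ml" and "machine learning"
```

Because the expansion happens inside the search engine, your application sends one simple query instead of hand-building a pile of OR conditions.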
Another breaking point is horizontal scaling. PostgreSQL can scale vertically (bigger servers) and has read replicas, but truly distributing data across multiple nodes is complex. Elasticsearch was built for this from day one. Adding nodes to your cluster is straightforward, and shards automatically rebalance across the cluster.
Elasticsearch can be accessed in several ways. The primary one is the REST API; the official SDKs are wrappers around it, so your applications ultimately communicate over REST either way. On top of that sit its query languages: the JSON-based Query DSL and the newer ES|QL (Elasticsearch Query Language), which uses a piped syntax to query documents across your indices. The idea is simple: you describe what you want to match and how strict the match should be, using query clauses and their parameters.
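Whatever client you use, what goes over the wire is a JSON body sent to a REST endpoint. A minimal sketch in Python of building such a Query DSL body (the index and field names are made up for illustration, and this only constructs the request body; it does not contact a cluster):

```python
import json

# Query DSL body for GET /blog_posts/_search (illustrative names).
search_body = {
    "query": {
        "match": {
            "content": {
                "query": "elasticsearch tutorial",
                # "operator" controls how exact the match must be:
                # "or" matches either term, "and" requires both.
                "operator": "and",
            }
        }
    },
    "size": 10,  # cap the number of hits returned
}

payload = json.dumps(search_body)
print(payload)
```

The `operator` and `size` parameters are examples of the knobs that control how exact and how large your result set is.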
From Tables to Indices: Rethinking Your Data Model
Moving from PostgreSQL to Elasticsearch requires a mental shift. Instead of tables, you work with indices. Instead of rows, you have documents. Instead of columns, you have fields. But the real difference is in how you think about relationships.
In PostgreSQL, you might have a users table related to an orders table through foreign keys. In Elasticsearch, you might denormalize this into an orders index where each order document contains the relevant user information directly. This trades storage efficiency for query performance, a classic Elasticsearch pattern.
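A sketch of that denormalization step in Python (the user and order records here are hypothetical; in a real pipeline you would read them from PostgreSQL and bulk-index the result):

```python
# Normalized source rows, as they might come out of PostgreSQL.
users = {1: {"id": 1, "name": "Ada", "email": "ada@example.com"}}
orders = [
    {"id": 101, "user_id": 1, "product_name": "Keyboard", "amount": 49.99},
    {"id": 102, "user_id": 1, "product_name": "Mouse", "amount": 19.99},
]

# Denormalize: embed the full user record directly in each order
# document, so no join is needed at query time.
order_docs = [
    {
        "product_name": o["product_name"],
        "amount": o["amount"],
        "user": users[o["user_id"]],
    }
    for o in orders
]

print(order_docs[0]["user"]["name"])  # -> Ada
```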
Consider this PostgreSQL structure:
-- Users table
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255)
);
-- Orders table
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
product_name VARCHAR(255),
amount DECIMAL(10,2)
);
The equivalent Elasticsearch mapping might look like:
{
  "mappings": {
    "properties": {
      "user": {
        "properties": {
          "id": {"type": "keyword"},
          "name": {"type": "text"},
          "email": {"type": "keyword"}
        }
      },
      "product_name": {"type": "text"},
      "amount": {"type": "double"},
      "order_date": {"type": "date"}
    }
  }
}
Notice how the user information is nested directly in the order document. This eliminates the need for joins but means updating a user’s name requires updating all their order documents.
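When that mass update does become necessary, Elasticsearch’s `_update_by_query` API can rewrite the embedded copies in one request. A sketch of the request body built in Python (index and field names follow the example above and are illustrative):

```python
# Body for POST /orders/_update_by_query: find every order belonging
# to user 1 and rewrite the embedded user name via a Painless script.
update_body = {
    "query": {"term": {"user.id": "1"}},
    "script": {
        "source": "ctx._source.user.name = params.new_name",
        "params": {"new_name": "Ada Lovelace"},
    },
}

print(update_body["script"]["params"]["new_name"])
```

This is still far more expensive than a single-row UPDATE in PostgreSQL, which is exactly the trade-off denormalization makes.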
Stack Management
If you are accessing Elasticsearch from a self-hosted setup, you’d create a new user after the enrollment-token step has completed. This lets you configure permissions for your users: a role in Elasticsearch is essentially a named list of permissions. After assigning the right role, you’d use that user to log in instead of the built-in elastic superuser.
Setting Up User Permissions
The permission system in Elasticsearch is role-based, which feels familiar if you’re coming from PostgreSQL’s grant system, but with some key differences. Instead of database-level permissions, you’re working with cluster-level, index-level, and document-level permissions.
Here’s how you create a user with specific permissions:
- First, create a role that defines what actions are allowed:
POST /_security/role/blog_writer
{
  "cluster": ["monitor"],
  "indices": [
    {
      "names": ["blog_posts", "comments"],
      "privileges": ["read", "write", "create_index"]
    }
  ]
}
- Then create a user and assign the role:
POST /_security/user/john_blogger
{
  "password": "secure_password_here",
  "roles": ["blog_writer"],
  "full_name": "John the Blogger"
}
[Image placeholder: Screenshot of Elasticsearch security management interface showing role creation and user assignment]
The role system is more granular than PostgreSQL’s. You can restrict access not just by index (like a PostgreSQL database), but by document query, specific fields, or even API operations.
Accessing Your Indices
Once you have proper permissions, accessing your indices through Kibana gives you a visual interface similar to a database administration tool. The Index Management page shows you all your indices, their health status, document counts, and storage usage.
[Image placeholder: Screenshot of Kibana’s Index Management interface showing multiple indices with health status, document counts, and storage metrics]
From here, you can:
- Monitor index performance and storage usage
- Set up index lifecycle management policies
- Create index templates for consistent mapping across similar indices
- Monitor cluster health and node distribution
This is roughly equivalent to PostgreSQL’s system tables and monitoring views, but presented in a more user-friendly dashboard format.
Getting API Keys
You can use several authentication strategies with Elasticsearch, but a better approach is to generate an API key from your deployment with only the permissions your application needs, and deploy with that. You should also plan for key rotation and management.
API keys in Elasticsearch work similarly to application-specific passwords in PostgreSQL, but with more flexibility. You can create keys that expire automatically, have limited privileges, or are restricted to specific indices:
POST /_security/api_key
{
  "name": "my_app_key",
  "expiration": "1d",
  "role_descriptors": {
    "app_writer": {
      "cluster": ["monitor"],
      "index": [
        {
          "names": ["app_logs"],
          "privileges": ["write"]
        }
      ]
    }
  }
}
This creates a key that can only write to the app_logs index and expires in one day. Unlike PostgreSQL user accounts that typically live indefinitely, API keys can be designed to self-destruct, which is perfect for temporary access or automated systems.
Installing Elasticsearch and Kibana
First off, Elasticsearch was designed to run as the sole major service on its host machine. This means it will try to take as many resources as it can get.
Things to do:
- Install Elasticsearch following the instructions on their documentation
- Store the admin password as an environment variable in your ~/.bashrc file
- Install Kibana following their docs
- Get the kibana enrollment token from elasticsearch:
cd /usr/share/elasticsearch/
sudo bin/elasticsearch-create-enrollment-token --scope kibana
- Get the verification code
cd /usr/share/kibana
sudo bin/kibana-verification-code
- Login using the elastic username and the password (remember you added it to ~/.bashrc)
- Succeed!
The installation process is more involved than PostgreSQL because you’re setting up both the database engine (Elasticsearch) and the administration interface (Kibana). Think of Kibana as the equivalent of pgAdmin, but it comes as a separate service rather than a standalone application.
One key difference is resource allocation. PostgreSQL typically starts conservatively and you tune it based on your needs. Elasticsearch, by default, tries to use a significant portion of your available memory. On a development machine, you might want to limit this in the jvm.options file:
-Xms1g
-Xmx1g
This sets both minimum and maximum heap size to 1GB, preventing Elasticsearch from consuming all available memory on smaller systems.
Designing Your Schema (Or Lack Thereof)
Coming from PostgreSQL’s strict schema requirements, Elasticsearch’s dynamic mapping can feel liberating and terrifying at the same time. You can just start indexing documents without defining the structure upfront:
POST /my_blog_posts/_doc
{
  "title": "My First Post",
  "content": "This is the content of my blog post",
  "tags": ["elasticsearch", "tutorial"],
  "published_date": "2024-12-06"
}
Elasticsearch will automatically detect that title and content should be text fields (for full-text search), that tags should be an array of text values, and that published_date should be a date field.
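A simplified sketch of how such type detection can work (this is the general idea of JSON-value-to-field-type inference, not Elasticsearch’s actual implementation; the document is the one from the example above):

```python
import re

# Date detection simplified to a single yyyy-MM-dd pattern.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def infer_field_type(value):
    """Guess an Elasticsearch field type from a JSON value (simplified)."""
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "date" if DATE_RE.match(value) else "text"
    if isinstance(value, list) and value:
        # Arrays take the type of their elements.
        return infer_field_type(value[0])
    return "object"

doc = {
    "title": "My First Post",
    "tags": ["elasticsearch", "tutorial"],
    "published_date": "2024-12-06",
}
print({field: infer_field_type(v) for field, v in doc.items()})
```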
[Image placeholder: Screenshot of Kibana’s Index Patterns interface showing automatically detected field mappings with data types like text, keyword, date, etc.]
However, just because you can work without explicit mappings doesn’t mean you should. For production systems, you’ll want to define mappings to ensure consistent field types and optimize for your specific use cases:
PUT /blog_posts
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "content": {"type": "text"},
      "tags": {"type": "keyword"},
      "published_date": {"type": "date"},
      "author": {
        "properties": {
          "name": {"type": "text"},
          "email": {"type": "keyword"}
        }
      }
    }
  }
}
Notice the dual mapping for title: it’s indexed both as text (for searching) and as keyword (for exact matching and sorting). This is a common Elasticsearch pattern that has no direct PostgreSQL equivalent.
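A quick sketch of why the dual mapping matters (simplified: real analyzers do far more than lowercase-and-split, but the contrast holds):

```python
def analyze(value: str) -> list:
    """Roughly what a 'text' field stores: individual lowercased tokens."""
    return value.lower().split()

titles = ["My First Post", "A Second Post"]

# The 'text' view is good for matching individual words...
print(analyze(titles[0]))  # -> ['my', 'first', 'post']

# ...but the whole string is gone, so it cannot drive sorting or exact
# matching. The 'keyword' view keeps the original string intact, which
# is what makes sorting on the full title possible.
print(sorted(titles))
```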
Queries and Operations
Elasticsearch was built for querying, but that does not mean updates and deletes are afterthoughts. The query structure in Elasticsearch is JSON-based and can express complex search logic that would require multiple joins in PostgreSQL.
Basic Search Operations
A simple match query in Elasticsearch:
GET /blog_posts/_search
{
  "query": {
    "match": {
      "content": "elasticsearch tutorial"
    }
  }
}
This finds all blog posts containing “elasticsearch” or “tutorial” in the content, ranked by relevance. The equivalent PostgreSQL query might look like:
SELECT *,
ts_rank(to_tsvector('english', content),
to_tsquery('english', 'elasticsearch | tutorial')) as rank
FROM blog_posts
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'elasticsearch | tutorial')
ORDER BY rank DESC;
The Elasticsearch version is more readable and the ranking algorithm is more sophisticated out of the box.
Complex Boolean Logic
Elasticsearch excels at combining multiple conditions:
GET /blog_posts/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"content": "elasticsearch"}},
        {"range": {"published_date": {"gte": "2024-01-01"}}}
      ],
      "should": [
        {"match": {"tags": "tutorial"}},
        {"match": {"tags": "beginner"}}
      ],
      "must_not": [
        {"match": {"tags": "advanced"}}
      ]
    }
  }
}
This query finds posts that:
- Must contain “elasticsearch” in content
- Must be published after 2024-01-01
- Should have “tutorial” or “beginner” tags (boosts score)
- Must not have “advanced” tag
The equivalent PostgreSQL query would be considerably more verbose and wouldn’t provide the same relevance scoring.
Updates and Bulk Operations
Unlike PostgreSQL’s straightforward UPDATE statements, Elasticsearch updates work differently due to its immutable document model. When you update a document, Elasticsearch actually marks the old version as deleted and indexes a new version:
POST /blog_posts/_update/1
{
  "doc": {
    "title": "Updated Blog Title",
    "updated_date": "2024-12-06"
  }
}
For bulk operations, Elasticsearch provides a bulk API that’s more efficient than individual operations:
POST /_bulk
{"update": {"_index": "blog_posts", "_id": "1"}}
{"doc": {"view_count": 150}}
{"index": {"_index": "blog_posts", "_id": "2"}}
{"title": "New Post", "content": "New content here"}
This can handle thousands of operations in a single request, something that would require careful transaction management in PostgreSQL.
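One gotcha worth knowing: the bulk API body is newline-delimited JSON (NDJSON), not a JSON array. Each action line is followed by its source line, and the body must end with a newline. A sketch of assembling such a payload in Python (document contents mirror the example above and are illustrative):

```python
import json

# (action metadata, payload) pairs for the _bulk endpoint.
actions = [
    ({"update": {"_index": "blog_posts", "_id": "1"}},
     {"doc": {"view_count": 150}}),
    ({"index": {"_index": "blog_posts", "_id": "2"}},
     {"title": "New Post", "content": "New content here"}),
]

# Each action line is followed by its payload line, one JSON object
# per line, and the whole body ends with a trailing newline.
lines = []
for action, source in actions:
    lines.append(json.dumps(action))
    lines.append(json.dumps(source))
bulk_body = "\n".join(lines) + "\n"

print(bulk_body)
```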
Performance Characteristics
The performance profile of Elasticsearch differs significantly from PostgreSQL. Where PostgreSQL optimizes for consistent query performance and ACID transactions, Elasticsearch optimizes for search speed and horizontal scaling, sometimes at the cost of immediate consistency.
Elasticsearch uses a “refresh” interval (default 1 second) before new documents appear in search results. This near-real-time behavior is different from PostgreSQL’s immediate consistency. For applications requiring instant visibility of writes, you can force a refresh, but this impacts performance:
POST /blog_posts/_doc?refresh=true
{
"title": "Immediately Searchable Post"
}
Indexing performance in Elasticsearch scales almost linearly with the number of nodes, while PostgreSQL’s write performance is largely limited by the primary server. However, PostgreSQL’s read replicas can provide excellent read scaling for traditional queries.
Conclusion
The choice of a technical solution for a technical problem is often the choice between a blazing fast system and a sluggish, bottlenecked one. Building a data retrieval layer that indexes every field of your data would normally require a database engineer, or a backend engineer focused on databases. With Elasticsearch, you get this without the hassle of designing complex indexing strategies yourself.
Being a NoSQL database, Elasticsearch requires you to learn its query language and use it well, or you’ll create new bottlenecks in the very solution you picked to avoid them. The basic read and write operations take only a few queries to learn, but understanding when to use match vs. term queries, how to structure your mappings, and when to denormalize your data will determine whether your Elasticsearch implementation flies or crawls.
The transition from PostgreSQL’s structured world to Elasticsearch’s search-oriented architecture requires rethinking how you model and query your data. But for use cases involving complex search requirements, real-time analytics, or massive scale, Elasticsearch provides capabilities that are difficult or impossible to achieve with traditional relational databases.
Remember that this isn’t necessarily an either-or choice. Many successful applications use PostgreSQL for transactional data and critical business logic, while leveraging Elasticsearch for search, analytics, and discovery features. Each tool excels in its domain, and the best solutions often combine their strengths rather than forcing one to handle everything.