PostgreSQL’s JSON functionalities as an alternative to NoSQL
Optimizing GeoJSON processing with built-in JSON operations
The Challenge: Efficient GeoJSON Processing at Scale
At CodeComply.ai, we transform floor plans into actionable data by converting them into GeoJSON features using Togal.ai's processing capabilities. These features enable us to perform dozens of automated checks for fire safety, accessibility, and state-specific statutes.
As our system grew, we faced a critical challenge: efficiently processing large volumes of memory-intensive GeoJSON objects stored in PostgreSQL’s JSONB columns.
Our initial approach of fetching entire collections before filtering was causing performance issues — high memory usage and slow response times.
While a dedicated NoSQL database seemed like a good solution, we discovered that PostgreSQL’s native JSON operations could solve our problems without adding architectural complexity.
Let’s examine our approach using Koa.js, Sequelize, and PostgreSQL. These optimizations work with most recent PostgreSQL versions, so you won’t need the absolute latest release.
The Data Model
Here are the primary data structures we work with:
Individual features — representing elements like doors:
const point: Feature<Point> = {
type: "Feature",
geometry: {
type: "Point",
coordinates: [5230, 4957]
},
properties: {
name: "Door center",
id: "0193a241-2fd1-705a-aa18-4f9fdbc6c10f",
types: ["DOOR_POINT", "MANUAL"]
}
}
These point features contain essential metadata: coordinates, a unique identifier, and classification types we use for filtering.
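Because the `types` array drives our classification filtering, a small helper (illustrative only, not from our codebase) shows what that check looks like in memory:

```typescript
// Minimal structural type for the door-point features above.
type PointFeature = {
  type: "Feature";
  geometry: { type: "Point"; coordinates: [number, number] };
  properties: { name: string; id: string; types: string[] };
};

// Keep only features tagged with a given classification type, e.g. "DOOR_POINT".
function featuresOfType(features: PointFeature[], tag: string): PointFeature[] {
  return features.filter((f) => f.properties.types.includes(tag));
}
```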
Feature Collections — how we store related features:
const collection: FeatureCollection = {
  type: "FeatureCollection",
  features: [
    {
      type: "Feature",
      geometry: {
        type: "Polygon",
        coordinates: [
          [
            [480, 3552],
            [529, 3552],
            [529, 3606],
            [480, 3606],
            [480, 3552],
          ],
        ],
      },
      properties: {
        id: "019312ff-0a6e-751d-8f7e-419a099ef666",
        str: "D",
        name: "text_1",
      },
    },
    // (other features)
  ],
};
Our database stores these collections in JSONB columns, allowing us to leverage PostgreSQL’s JSON operations for efficient querying.
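Since a JSONB column comes back from the driver as untyped JSON, a lightweight runtime guard helps preserve type safety at the application boundary. This is a sketch with deliberately minimal types (a small subset of the GeoJSON spec), not our production validator:

```typescript
// Deliberately minimal structural types — a subset of the GeoJSON spec.
type Feature = {
  type: "Feature";
  geometry: { type: string; coordinates: unknown } | null;
  properties: Record<string, unknown>;
};
type FeatureCollection = { type: "FeatureCollection"; features: Feature[] };

// Narrow an untyped JSONB payload to a FeatureCollection before using it.
function isFeatureCollection(value: unknown): value is FeatureCollection {
  const v = value as FeatureCollection;
  return (
    typeof value === "object" &&
    value !== null &&
    v.type === "FeatureCollection" &&
    Array.isArray(v.features) &&
    v.features.every((f) => f?.type === "Feature" && typeof f.properties === "object")
  );
}
```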
Steps to Optimize Queries
Our initial implementation retrieved complete FeatureCollections for processing:
const objects = await Model.findAll({
  attributes: ["geojson_text"], // fetches the complete FeatureCollection for every row
});
This approach produced the following metrics:
In the next step, we implemented database-level filtering using PostgreSQL’s JSON operators — we’ll only return text features with a specific prefix:
const text_path = `feature->'properties'->>'name'`;
const PARAGRAPH_PREFIX = "paragraph_";
const filtered_text = literal(`
jsonb_build_object(
'type', 'FeatureCollection',
'features', COALESCE(
(
SELECT jsonb_agg(feature)
FROM jsonb_array_elements(geojson_text->'features') feature
WHERE ${text_path} LIKE :paragraph_prefix || '%'
),
'[]'::jsonb
)
)
`);
const objects = await Model.findAll({
where,
attributes: [
[filtered_text, "geojson_text"],
],
replacements: {
paragraph_prefix: PARAGRAPH_PREFIX,
},
});
Key components of this optimization:
JSON path operations:
- ->: returns a JSON(B) value (object, array, or scalar)
- ->>: extracts the value as text
- Combined in expressions like feature->'properties'->>'name' for nested property access
PostgreSQL Functions:
- jsonb_build_object(): Creates a JSON object from key-value pairs, useful for restructuring our output
- jsonb_agg(): Aggregates multiple records into a JSON array
- jsonb_array_elements(): Expands a JSON array into a set of rows, enabling us to query individual array elements
- COALESCE(): Returns the first non-null value from its arguments; we use it to return '[]'::jsonb when no features match our criteria
Sequelize helpers with text matching:
- Sequelize’s literal(): Embeds raw SQL in a query when we need complex operations that Sequelize’s standard query builder doesn’t support
- replacements: binds the :paragraph_prefix parameter at query time, keeping the prefix value out of the raw SQL string and allowing flexible text matching. For example:
- With PARAGRAPH_PREFIX = 'text_', we match features named 'text_1', 'text_2', etc.
- The WHERE ${text_path} LIKE :paragraph_prefix || '%' condition performs this prefix matching efficiently at the database level
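To make the SQL's behavior concrete, here is the same filtering expressed as plain TypeScript. This is an illustrative reference, not what runs in production — the whole point of the optimization is to push this work into PostgreSQL:

```typescript
// Redeclared here so the snippet stands alone.
type Feature = { type: "Feature"; properties: { name?: unknown; [k: string]: unknown } };
type FeatureCollection = { type: "FeatureCollection"; features: Feature[] };

// Mirrors the SQL: jsonb_array_elements maps to .filter, the LIKE 'prefix%'
// predicate maps to startsWith, and COALESCE(..., '[]'::jsonb) maps to the
// empty features array returned when nothing matches.
function filterByNamePrefix(fc: FeatureCollection, prefix: string): FeatureCollection {
  const features = fc.features.filter(
    (f) => typeof f.properties.name === "string" && f.properties.name.startsWith(prefix)
  );
  return { type: "FeatureCollection", features };
}
```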
We immediately see a 26% shorter duration and a 76% reduction in response size. Pages with a zero count simply had no text features matching our filter.
Analysis of our test cases revealed we were sending unnecessary data — particularly geometry information. We implemented property selection:
const selected_properties = ["id", "str", "name"]; // fixed whitelist, safe to interpolate below
const filtered_text = literal(`
jsonb_build_object(
'type', 'FeatureCollection',
'features', COALESCE(
(
SELECT jsonb_agg(jsonb_build_object(
'type', 'Feature',
'properties', jsonb_build_object(
${selected_properties
.map((prop) => `'${prop}', feature->'properties'->>'${prop}'`)
.join(",")}
)
))
FROM jsonb_array_elements(geojson_text->'features') feature
WHERE ${text_path} LIKE :paragraph_prefix || '%'
),
'[]'::jsonb
)
)
`);
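Conceptually, the nested jsonb_build_object calls perform the projection below, shown here as illustrative TypeScript so the transformation is easy to follow (the real work still happens inside PostgreSQL):

```typescript
// Redeclared here so the snippet stands alone.
type Feature = { type: "Feature"; properties: Record<string, unknown> };

// Rebuild each feature with only the whitelisted properties and no geometry,
// mirroring the inner jsonb_build_object in the query above.
function projectProperties(features: Feature[], keys: string[]): Feature[] {
  return features.map((f) => ({
    type: "Feature",
    // ->> yields NULL for missing keys in SQL; ?? null mirrors that here.
    properties: Object.fromEntries(keys.map((k) => [k, f.properties[k] ?? null])),
  }));
}
```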
Below are the results of this optimization:
The response size shrinks again: an 87% reduction from baseline and 52% from the second approach. The duration is 38% faster than baseline and 15% faster than the filtered approach.
Our optimization journey demonstrates that PostgreSQL’s JSON capabilities can effectively replace dedicated NoSQL solutions while:
- Significantly reducing response sizes (87% reduction)
- Improving query performance (38% faster)
- Maintaining a simpler architecture
- Preserving type safety and data integrity
In the follow-up article, we’ll explore how we abstracted these patterns into a reusable query generator, covering:
- Dynamic property selection
- Complex filtering conditions
- Type-safe JSON operations
- Performance optimization techniques
Stay tuned!