Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Introduction

PostgreSQL has extensive support for storing and querying JSON documents. In fact, it seems to fare very well compared to MongoDB for features and especially for performance. Each platform has its own advantages. For Meadowlark 0.2.0, the development team will implement CRUD operations using PostgreSQL in addition to MongoDB, thus enabling direct comparison of features and benefits, and demonstrating the flexibility inherent in the design of the Meadowlark code.

Also see: Meadowlark 0.2.0 - Durable Change Data Capture for more information on streaming data out to OpenSearch.

Design

Overview

The PostgreSQL schema would be set up in a similar non-relational design to the other NoSQL designs, but take advantage of PostgreSQL's document store features. The basic principal continues that the API document is stored along with metadata to be used for existence/reference validation. Metadata would continue to be stored alongside the API document in columns. Fast document lookups continue to be done by id, constructed as before from API document project name, entity type, version and natural key. Transactions will again be used to check for existence/references before performing create/update/delete operations.

In order to simplify a PostgreSQL deployment, this design is flexible on the requirement of OpenSearch for queries. This also means change data capture streaming becomes optional.

Instead of using OpenSearch, a "standalone deployment" will take advantage of PostgreSQLs JSON inverted-index support. Rather than split the entities into separate tables, an additional index on project_name/entity_type/entity_version  will be required for query support. Once a deployment reaches the performance constraints of this design, all these indexes can be dropped and an Elastic/OpenSearch configuration introduced that will continue to use a single table design.

Entity Table

This implementation will use a single table named Entity.

Columns

  • id - A string hash of the project name, entity type, entity version and natural key body of the API document. This is identical to PK identifier in the DynamoDB implementation. This is the primary key.
  • project_name - The MetaEd project name the entity is defined in e.g. "EdFi" for a data standard entity.
  • entity_type - The entity type as a string e.g. "Student".
  • entity_version - The entity version as a string. This is the same as MetaEd project version the entity is defined in e.g. "3.3.1-b" for a 3.3b data standard entity.
  • api_doc - The ODS/API document body as a jsonb field.
  • security_annotations - if any, *half-baked idea

Indexes:

  • On api_doc as a GIN jsonb_path_ops index - for query support in standalone deployment
  • On project_name/entity_type/entity_version - for query support in standalone deployment

Reference Table

This implementation will also use a reference table for reference validation.

Columns

  • parent_id - The id of the parent entity, Fked to Entity and part of the PK
  • external_id - The id of the externally referenced entity, Fked to Entity and part of the PK
  • reference_type - an indicator of reference type (*half baked idea, most likely to help with security)
  • document_location - The JSONPath expression of where the external reference is located in the parent document (* this is a maybe, we'll already have it, interesting use cases like "Ref at this JSONPath is this resource URI" in API response as metadata)

Indexes

  • On external_id - for pre-delete lookups

Data Processing

Insert Transaction Steps

Inserting a new Entity document into the table will follow these steps:

  • Check that id does not exist in Entity (indexed query)
  • Check that external reference ids for the document all exist in Entity (index query per reference)
  • Perform insert of document into Entity
  • Perform insert of external references into Reference
  • Note: PostgreSQL has upsert support, but we may need to know if the outcome was insert or update to return the correct API response.

Update Transaction Steps

Updating an existing Entity document into the table will follow these steps:

  • Check that id exists in Entity (indexed query)
  • Check that external reference ids for the document all exist in Entity (index query per reference)
  • Perform update into Entity
  • Perform replacement of prior external references in Reference (delete all old + insert)
  • Note: PostgreSQL has upsert support, but we may need to know if the outcome was insert or update to return the correct API response.

Delete Transaction Steps

Deleting an existing Entity document from the table will follow these steps:

  • Check that id exists in Entity (indexed query)
  • Check that there are no external references in Reference for this id (indexed query)
  • Perform delete

Queries

A PostgreSQL installation will operate in two modes. In standalone mode, get all and get-by-key queries will be done directly on PostgreSQL by project_name/entity_type/entity_version plus the GIN-indexed api_doc. In "normal" mode, get all and get-by-key queries will be serviced by OpenSearch/Elasticsearch via CDC streaming.

Open Issues

Need a partioning / sharding paradigm for large databases. See https://www.percona.com/blog/2019/05/24/an-overview-of-sharding-in-postgresql-and-how-it-relates-to-mongodbs/