Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Introduction

PostgreSQL has extensive support for storing and querying JSON documents. In fact, it seems to fare very well compared to MongoDB for features and especially for performance. Each platform has its own advantages. For Meadowlark 0.2.0, the development team will implement CRUD operations using PostgreSQL in addition to MongoDB, thus enabling direct comparison of features and benefits, and demonstrating the flexibility inherent in the design of the Meadowlark code.

Also see: Meadowlark 0.2.0 - Durable Change Data Capture for more information on streaming data out to OpenSearch.

Design

Overview

The PostgreSQL schema would be set up in a similar non-relational design to the other NoSQL designs, but take advantage of PostgreSQL's document store features. The basic principal continues that the API document is stored along with metadata to be used for existence/reference validation. Metadata would continue to be stored alongside the API document in columns. Fast document lookups continue to be done by id, constructed as before from API document project name, entity type, version and natural key. Transactions will again be used to check for existence/references before performing create/update/delete operations.

In order to simplify a PostgreSQL deployment, this design is flexible on the requirement of OpenSearch for queries. This also means change data capture streaming becomes optional.

Instead of using OpenSearch, a "standalone deployment" will take advantage of PostgreSQLs JSON inverted-index support. Rather than split the entities into separate tables, an additional index on project_name/entity_type/entity_version  will be required for query support. Once a deployment reaches the performance constraints of this design, all these indexes can be dropped and an Elastic/OpenSearch configuration introduced that will continue to use a single table design.

Entity

Documents Table

This implementation will use a single table named Entity.

Columns

Column NameData TypeDescription
_pkbigserialSynthetic primary key, analogous to MongoDB's _id
id​varyingA string hash derived from the project name, resource name, resource version and identity of the API document. This field will be a unique index on the collection.​

document_identity 

varyingThe identity elements extracted from the API document.
project_namevaryingThe MetaEd project name the API document resource is defined in e.g. "EdFi" for a data standard entity.
resource_namevaryingThe name of the resource. Typically, this is the same as the corresponding MetaEd entity name. However, there are exceptions, for example descriptors have a "Descriptor" suffix on their resource name.
resource_versionvaryingThe resource version as a string. This is the same as the MetaEd project version the entity is defined in e.g. "3.3.1-b" for a 3.3b data standard entity.
is_descriptorBooleanIndicator
validatedBooleanIndicator
edfi_doc JSONBThe Ed-Fi ODS/API document itself.

Indexes:

  • On edfi_doc  as a GIN jsonb_path_ops index - for query support in standalone deployment
  • On project_name & entity_type & entity_version - for query support in standalone deployment
Reference

References Table

This implementation will also use a reference table for reference validation.

Columns

Column NameData TypeDescription
_pkbigserialSynthetic primary key
reference_tovaryingThe parent document's id (~ foreign key)
reference_fromvaryingThe child document's id (~ document' natural key)


Tip

Potential addition:

document_locationvaryingJSONPath expression to the external reference in the parent document

Might be useful in API response metadata?

Indexes

Need to be able to look up references in both directions:

  • reference_to - e.g. when trying to delete a resource, determine if there are any external references to it
  • reference_from - e.g. when trying to delete a resource, also deletes its own references to a "parent" resource.

Data Processing

Insert Transaction Steps

Inserting a new Entity document into the table will follow these steps:

  • Check that id does not exist in Entity (indexed query)
  • Check that external reference ids for the document all exist in Entity (index query per reference)
  • Perform insert of document into Entity
  • Perform insert of external references into Reference
  • Note: PostgreSQL has upsert support, but we may need to know if the outcome was insert or update to return the correct API response.

Update Transaction Steps

Updating an existing Entity document into the table will follow these steps:

  • Check that id exists in Entity (indexed query)
  • Check that external reference ids for the document all exist in Entity (index query per reference)
  • Perform update into Entity
  • Perform replacement of prior external references in Reference (delete all old + insert)
  • Note: PostgreSQL has upsert support, but we may need to know if the outcome was insert or update to return the correct API response.

Delete Transaction Steps

Deleting an existing Entity document from the table will follow these steps:

  • Check that id exists in Entity (indexed query)
  • Check that there are no external references in Reference for this id (indexed query)
  • Perform delete

Queries

A PostgreSQL installation will operate in two modes. In standalone mode, get all and get-by-key queries will be done directly on PostgreSQL by project_name/entity_type/entity_version plus the GIN-indexed api_doc. In "normal" mode, get all and get-by-key queries will be serviced by OpenSearch/Elasticsearch via CDC streaming.

Open Issues

Need a partioning / sharding paradigm for large databases. See https://www.percona.com/blog/2019/05/24/an-overview-of-sharding-in-postgresql-and-how-it-relates-to-mongodbs/

Table of Contents

Table of Contents