Introduction

In hindsight, DynamoDB was a poor choice of data store for the first release of Meadowlark for two primary reasons:

Except for a little-known open source implementation, it is entirely restricted to Amazon Web Services.
The design model is interesting, but idiosyncratic.

MongoDB would have been a better starting point:

It is supported, directly and/or through emulation, on all major cloud platforms and on-premises.
It is a mature product, with strong documentation and design patterns.
The scalability features, such as replication and sharding, are very attractive for large implementations.

There are other NoSQL databases with similar benefits and other attractive features, such as Couchbase. However, their support is less widespread, so they will not be investigated at this time.

Although PostgreSQL is one of the traditional relational databases, it has powerful built-in support for NoSQL-style document operations. Because of the Ed-Fi community's growing adoption of PostgreSQL, it will be explored as an alternative to MongoDB. See Meadowlark 0.2.0 - PostgreSQL.

Also see: Meadowlark 0.2.0 - Durable Change Data Capture for more information on streaming data out to OpenSearch.

Design

This proposal takes its cue from the team's experience with DynamoDB. The basic principle remains that the API document is stored along with metadata used for existence and reference validation. However, instead of storing the metadata in separate columns, it will be part of a single larger document. Fast document lookups will continue to be done by id, constructed as before from the API document's project name, entity type, version and identity (natural key). Transactions will again be used to check for existence and references before performing create/update/delete operations. Reference validation for deletes is greatly simplified in the MongoDB version compared with the DynamoDB version, because MongoDB can index arrays: an index on each document's array of outgoing references makes "is this document referenced by any other document?" a single indexed query.
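The id derivation described above can be sketched as a pure function of its four inputs. This design does not specify the hash algorithm or how the inputs are serialized, so this sketch assumes SHA-224 (chosen only because its 56-character hex digest matches the length of the ids in the examples on this page) over a JSON array of the inputs, with identity elements sorted by name. It will not reproduce the example ids exactly, and `documentIdFrom` is a hypothetical helper name.

```typescript
import { createHash } from "crypto";

// One element of documentIdentity, e.g. { name: "schoolId", value: 123 }.
interface IdentityElement {
  name: string;
  value: string | number | boolean;
}

// Hypothetical helper. Assumptions: SHA-224 as the hash, and the JSON array
// [projectName, resourceName, resourceVersion, [[name, value], ...]] as input,
// with identity elements sorted by name so extraction order does not matter.
function documentIdFrom(
  projectName: string,
  resourceName: string,
  resourceVersion: string,
  identity: IdentityElement[],
): string {
  const sortedIdentity = [...identity]
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    .map((element) => [element.name, element.value]);
  const canonical = JSON.stringify([projectName, resourceName, resourceVersion, sortedIdentity]);
  return createHash("sha224").update(canonical).digest("hex");
}

const schoolDocumentId = documentIdFrom("Ed-Fi", "School", "3.3.1-b", [
  { name: "schoolId", value: 123 },
]);
console.log(schoolDocumentId.length); // 56 hex characters, like the ids in the examples
```

Because the id depends only on its inputs, a document can be located with a single indexed lookup on id rather than a search. It also means that changing any input, including the resource version, produces a different id.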

To support potential deployment to Amazon DocumentDB or Azure Cosmos DB, all code and design should conform to the MongoDB 4.0 API.

Entity Collection

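The MongoDB implementation will only need one collection, to be called Entity. Apart from the _id field that MongoDB adds automatically (visible in the examples below), the shape of the Entity document is as follows (all fields required):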

id - A string hash derived from the project name, resource name, resource version and identity of the API document. This field will have a unique index on the collection.
documentIdentity - The identity elements extracted from the API document.
projectName - The MetaEd project name the API document resource is defined in, e.g. "Ed-Fi" for a data standard entity.
resourceName - The name of the resource. Typically, this is the same as the corresponding MetaEd entity name. However, there are exceptions; for example, descriptors have a "Descriptor" suffix on their resource name.
resourceVersion - The resource version as a string. This is the same as the MetaEd project version the entity is defined in, e.g. "3.3.1-b" for a 3.3b data standard entity.
isDescriptor - Boolean indicator of whether the document is a descriptor.
edfiDoc - The Ed-Fi ODS/API document itself.
outRefs - An array of ids extracted from the ODS/API document for all externally referenced documents.
validated - Boolean indicator.
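The shape above can be written as a TypeScript interface, together with the two indexes this design relies on: the unique index on id and an index on the outRefs array used by delete validation. This is a sketch: the interface and index names are hypothetical, and the index specifications are plain objects in the { key, name, unique } form that the MongoDB Node.js driver's collection.createIndexes() accepts, so no driver import is needed to read them.

```typescript
// One element of documentIdentity, e.g. { name: "schoolId", value: 123 }.
interface IdentityElement {
  name: string;
  value: string | number | boolean;
}

// Fields of a document in the Entity collection (all required). MongoDB adds
// its own _id (an ObjectId) on insert; lookups use the id field instead.
interface EntityDocument {
  id: string; // hash of project name, resource name, resource version and identity
  documentIdentity: IdentityElement[];
  projectName: string; // e.g. "Ed-Fi"
  resourceName: string; // e.g. "School" or "AbsenceEventCategoryDescriptor"
  resourceVersion: string; // e.g. "3.3.1-b"
  isDescriptor: boolean;
  edfiDoc: Record<string, unknown>; // the ODS/API document itself
  outRefs: string[]; // ids of all externally referenced documents
  validated: boolean;
}

// Index specification in the form accepted by collection.createIndexes().
interface IndexSpec {
  key: Record<string, 1>;
  name: string; // hypothetical index names
  unique?: boolean;
}

const entityIndexes: IndexSpec[] = [
  // Unique index on id: every existence check is a single indexed lookup.
  { key: { id: 1 }, name: "entity_id_unique", unique: true },
  // Index on the outRefs array. MongoDB makes this a multikey index with one
  // entry per array element, so a filter like { outRefs: someId } is indexed.
  { key: { outRefs: 1 }, name: "entity_outRefs" },
];

// The School example on this page, checked against the interface at compile time.
const school: EntityDocument = {
  id: "8d111d14579c51e8aff915e7746cda7e0730ed74837af960b31c4fa6",
  documentIdentity: [{ name: "schoolId", value: 123 }],
  projectName: "Ed-Fi",
  resourceName: "School",
  resourceVersion: "3.3.1-b",
  isDescriptor: false,
  edfiDoc: { schoolId: 123, gradeLevels: [], nameOfInstitution: "abc", educationOrganizationCategories: [] },
  outRefs: [],
  validated: true,
};
```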

Examples

Example: Descriptor

{
    "_id" : ObjectId("6287c039abf2ff4430376b3d"),
    "documentIdentity" : [ 
        {
            "name" : "descriptor",
            "value" : "uri://ed-fi.org/AbsenceEventCategoryDescriptor#Bereavement"
        }
    ],
    "projectName" : "Ed-Fi",
    "resourceName" : "AbsenceEventCategoryDescriptor",
    "resourceVersion" : "3.3.1-b",
    "isDescriptor" : true,
    "id" : "546c96c905374bed9287409ba1ca77d28fdcd08c9d3ea3e9085d8a10",
    "edfiDoc" : {
        "codeValue" : "Bereavement",
        "shortDescription" : "Bereavement",
        "description" : "Bereavement",
        "namespace" : "uri://ed-fi.org/AbsenceEventCategoryDescriptor"
    },
    "outRefs" : [],
    "validated" : true
}

Example: School

{
    "_id" : ObjectId("6287e93cabf2ff4430384af2"),
    "documentIdentity" : [ 
        {
            "name" : "schoolId",
            "value" : 123
        }
    ],
    "projectName" : "Ed-Fi",
    "resourceName" : "School",
    "resourceVersion" : "3.3.1-b",
    "isDescriptor" : false,
    "id" : "8d111d14579c51e8aff915e7746cda7e0730ed74837af960b31c4fa6",
    "edfiDoc" : {
        "schoolId" : 123,
        "gradeLevels" : [],
        "nameOfInstitution" : "abc",
        "educationOrganizationCategories" : []
    },
    "outRefs" : [],
    "validated" : true
}

In the following example, note that the outRefs array contains the id of the School document from the example above.

Example: AcademicWeek, with Reference to School

{
    "_id" : ObjectId("6287e993abf2ff4430384bfd"),
    "documentIdentity" : [ 
        {
            "name" : "schoolReference.schoolId",
            "value" : 123
        }, 
        {
            "name" : "weekIdentifier",
            "value" : "1st"
        }
    ],
    "projectName" : "Ed-Fi",
    "resourceName" : "AcademicWeek",
    "resourceVersion" : "3.3.1-b",
    "isDescriptor" : false,
    "id" : "20325050be22032deaeaddeb6a82cc160ce85911c9ad5ca8de5482e2",
    "edfiDoc" : {
        "schoolReference" : {
            "schoolId" : 123
        },
        "weekIdentifier" : "1st",
        "beginDate" : "2022-12-01",
        "endDate" : "2022-12-31",
        "totalInstructionalDays" : 30
    },
    "outRefs" : [ 
        "8d111d14579c51e8aff915e7746cda7e0730ed74837af960b31c4fa6"
    ],
    "validated" : true
}
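Because an id is a pure function of its inputs, the outRefs entry for the school reference can be computed from the reference's own fields, with no database lookup. The sketch below assumes an illustrative id scheme (a SHA-224 hash over canonical JSON), so it will not reproduce the ids shown above, and a hypothetical ReferenceInfo description of each reference property, which in practice would come from MetaEd model metadata. It also assumes, as in this example, that the fields inside a reference object carry the same names as the referenced resource's identity fields (schoolReference.schoolId maps to schoolId); references with different field names, or arrays of references, would need more mapping than shown.

```typescript
import { createHash } from "crypto";

type IdentityValue = string | number | boolean;

// Illustrative id scheme (assumption): SHA-224 over the JSON array
// [projectName, resourceName, resourceVersion, [[name, value], ...]], names sorted.
function documentIdFrom(
  projectName: string,
  resourceName: string,
  resourceVersion: string,
  identity: Record<string, IdentityValue>,
): string {
  const sortedIdentity = Object.keys(identity)
    .sort()
    .map((name) => [name, identity[name]]);
  const canonical = JSON.stringify([projectName, resourceName, resourceVersion, sortedIdentity]);
  return createHash("sha224").update(canonical).digest("hex");
}

// Hypothetical metadata describing one reference property of a resource.
interface ReferenceInfo {
  propertyName: string; // e.g. "schoolReference"
  projectName: string; // of the referenced resource
  resourceName: string; // e.g. "School"
  resourceVersion: string;
}

// Computes outRefs for an API document. Assumes each reference object's fields
// are exactly the referenced document's identity fields, with the same names.
function extractOutRefs(edfiDoc: Record<string, unknown>, references: ReferenceInfo[]): string[] {
  const outRefs: string[] = [];
  for (const reference of references) {
    const referenceObject = edfiDoc[reference.propertyName];
    if (referenceObject == null) continue; // optional reference not supplied
    outRefs.push(
      documentIdFrom(
        reference.projectName,
        reference.resourceName,
        reference.resourceVersion,
        referenceObject as Record<string, IdentityValue>,
      ),
    );
  }
  return outRefs;
}

const academicWeekDoc = {
  schoolReference: { schoolId: 123 },
  weekIdentifier: "1st",
  beginDate: "2022-12-01",
  endDate: "2022-12-31",
  totalInstructionalDays: 30,
};

// One entry: the id the School document with schoolId 123 would have.
const academicWeekOutRefs = extractOutRefs(academicWeekDoc, [
  { propertyName: "schoolReference", projectName: "Ed-Fi", resourceName: "School", resourceVersion: "3.3.1-b" },
]);
```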

Separate collections per resource type would be better than a single collection if Meadowlark needed to query inside documents, or to serve GET ALL requests by resource type, directly from MongoDB. However, even with MongoDB, we still plan to use OpenSearch or Elasticsearch for those functions. Therefore, a single-"table" (single-collection) design is appropriate, and it makes sharding easy.

Insert Transaction Steps

Inserting a new Entity document into the collection will follow these steps:

Check that id does not exist (indexed query)
Check that external reference ids for the document all exist (indexed query per reference)
Perform upsert
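The insert steps above can be sketched as follows. EntityStore and InMemoryEntityStore are hypothetical stand-ins for the Entity collection so that the logic runs without a database, and the ids and result strings are placeholders. With the MongoDB Node.js driver, the existence checks would be findOne({ id }) calls and the write a replaceOne({ id }, doc, { upsert: true }), all awaited inside a session.withTransaction(...) callback so that the checks and the write commit or abort together.

```typescript
// Minimal fields needed for the insert checks (hypothetical subset of an Entity document).
interface Entity {
  id: string;
  outRefs: string[];
  edfiDoc: Record<string, unknown>;
}

// Hypothetical stand-in for the Entity collection.
interface EntityStore {
  exists(id: string): boolean; // MongoDB: findOne({ id }) on the unique id index
  upsert(doc: Entity): void; // MongoDB: replaceOne({ id: doc.id }, doc, { upsert: true })
}

type InsertResult = "inserted" | "already-exists" | "missing-reference";

function insertEntity(store: EntityStore, doc: Entity): InsertResult {
  // Step 1: the id must not already exist.
  if (store.exists(doc.id)) return "already-exists";
  // Step 2: every externally referenced id must exist (one indexed lookup each).
  if (!doc.outRefs.every((ref) => store.exists(ref))) return "missing-reference";
  // Step 3: write the new document.
  store.upsert(doc);
  return "inserted";
}

// In-memory implementation so the sketch runs without a database.
class InMemoryEntityStore implements EntityStore {
  private readonly docs = new Map<string, Entity>();
  exists(id: string): boolean {
    return this.docs.has(id);
  }
  upsert(doc: Entity): void {
    this.docs.set(doc.id, doc);
  }
}

// Placeholder ids stand in for real hashed ids.
const store = new InMemoryEntityStore();
const results = [
  insertEntity(store, { id: "school-123", outRefs: [], edfiDoc: { schoolId: 123 } }),
  insertEntity(store, { id: "week-1st", outRefs: ["school-123"], edfiDoc: { weekIdentifier: "1st" } }),
  insertEntity(store, { id: "week-2nd", outRefs: ["school-999"], edfiDoc: { weekIdentifier: "2nd" } }),
  insertEntity(store, { id: "school-123", outRefs: [], edfiDoc: { schoolId: 123 } }),
];
console.log(results.join(", ")); // inserted, inserted, missing-reference, already-exists
```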

Update Transaction Steps

Updating an existing Entity document in the collection will follow these steps:

Check that id exists (indexed query)
Check that external reference ids for the document all exist (indexed query per reference)
Perform overwrite
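A corresponding sketch of the update steps, under the same kind of assumptions: EntityStore and InMemoryEntityStore are hypothetical stand-ins for the Entity collection, and the placeholder ids replace real hashed ids. With the MongoDB Node.js driver, the overwrite would be a replaceOne({ id }, doc) inside the same transaction as the checks. References are re-validated because the new version of the document may reference different documents than the old one.

```typescript
// Minimal fields needed for the update checks (hypothetical subset of an Entity document).
interface Entity {
  id: string;
  outRefs: string[];
  edfiDoc: Record<string, unknown>;
}

// Hypothetical stand-in for the Entity collection.
interface EntityStore {
  exists(id: string): boolean; // MongoDB: findOne({ id }) on the unique id index
  replace(doc: Entity): void; // MongoDB: replaceOne({ id: doc.id }, doc)
}

type UpdateResult = "updated" | "not-found" | "missing-reference";

function updateEntity(store: EntityStore, doc: Entity): UpdateResult {
  // Step 1: the id must already exist.
  if (!store.exists(doc.id)) return "not-found";
  // Step 2: every referenced id in the new version must exist.
  if (!doc.outRefs.every((ref) => store.exists(ref))) return "missing-reference";
  // Step 3: overwrite the stored document, including its outRefs.
  store.replace(doc);
  return "updated";
}

// In-memory implementation so the sketch runs without a database.
class InMemoryEntityStore implements EntityStore {
  private readonly docs = new Map<string, Entity>();
  constructor(initialDocs: Entity[]) {
    initialDocs.forEach((doc) => this.docs.set(doc.id, doc));
  }
  exists(id: string): boolean {
    return this.docs.has(id);
  }
  replace(doc: Entity): void {
    this.docs.set(doc.id, doc);
  }
  get(id: string): Entity | undefined {
    return this.docs.get(id);
  }
}

const store = new InMemoryEntityStore([
  { id: "school-123", outRefs: [], edfiDoc: { schoolId: 123 } },
  { id: "week-1st", outRefs: ["school-123"], edfiDoc: { weekIdentifier: "1st", totalInstructionalDays: 30 } },
]);
const results = [
  updateEntity(store, { id: "week-1st", outRefs: ["school-123"], edfiDoc: { weekIdentifier: "1st", totalInstructionalDays: 31 } }),
  updateEntity(store, { id: "week-2nd", outRefs: ["school-123"], edfiDoc: { weekIdentifier: "2nd" } }),
  updateEntity(store, { id: "week-1st", outRefs: ["school-999"], edfiDoc: { weekIdentifier: "1st" } }),
];
console.log(results.join(", ")); // updated, not-found, missing-reference
```

Note that MongoDB 4.0 supports multi-document transactions only on replica sets, so a deployment relying on these transactional checks needs at least a single-node replica set.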

Delete Transaction Steps

Deleting an existing Entity document from the collection will follow these steps:

Check that id exists (indexed query)
Check that no other document's outRefs array contains this id (indexed query on outRefs)
Perform delete
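The delete steps can be sketched as below. The key difference from DynamoDB is step 2: in MongoDB, a filter such as { outRefs: id } matches any document whose outRefs array contains id, and the multikey index on outRefs answers it without scanning the collection. EntityStore and InMemoryEntityStore are hypothetical stand-ins (the in-memory version scans, which MongoDB would not), the ids are placeholders, and in MongoDB the three steps would run inside one transaction.

```typescript
// Minimal fields needed for the delete checks (hypothetical subset of an Entity document).
interface Entity {
  id: string;
  outRefs: string[];
}

// Hypothetical stand-in for the Entity collection.
interface EntityStore {
  exists(id: string): boolean; // MongoDB: findOne({ id }) on the unique id index
  isReferenced(id: string): boolean; // MongoDB: findOne({ outRefs: id }) via the multikey index
  remove(id: string): void; // MongoDB: deleteOne({ id })
}

type DeleteResult = "deleted" | "not-found" | "referenced";

function deleteEntity(store: EntityStore, id: string): DeleteResult {
  // Step 1: the id must exist.
  if (!store.exists(id)) return "not-found";
  // Step 2: refuse the delete if any document lists this id in its outRefs.
  if (store.isReferenced(id)) return "referenced";
  // Step 3: remove the document.
  store.remove(id);
  return "deleted";
}

// In-memory implementation so the sketch runs without a database.
class InMemoryEntityStore implements EntityStore {
  private readonly docs = new Map<string, Entity>();
  constructor(initialDocs: Entity[]) {
    initialDocs.forEach((doc) => this.docs.set(doc.id, doc));
  }
  exists(id: string): boolean {
    return this.docs.has(id);
  }
  // Linear scan here; MongoDB answers the same question from the outRefs index.
  isReferenced(id: string): boolean {
    return Array.from(this.docs.values()).some((doc) => doc.outRefs.includes(id));
  }
  remove(id: string): void {
    this.docs.delete(id);
  }
}

const store = new InMemoryEntityStore([
  { id: "school-123", outRefs: [] },
  { id: "week-1st", outRefs: ["school-123"] },
]);
const results = [
  deleteEntity(store, "school-123"), // week-1st still references the school
  deleteEntity(store, "week-1st"),
  deleteEntity(store, "school-123"), // no remaining references
  deleteEntity(store, "school-123"),
];
console.log(results.join(", ")); // referenced, deleted, deleted, not-found
```

Because every insert and update writes the document's complete outRefs array, the outRefs index always reflects current references, which is why this single query is enough to protect referenced documents.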

Queries

Get all and get-by-key queries will continue to be serviced by OpenSearch. See Meadowlark 0.2.0 - Durable Change Data Capture for more information on how data will flow out to OpenSearch.

Future Considerations

Security

Investigate adding security annotations based on indexable API document attributes
- Examples: ownership field, extracted education organization field
Investigate using CASL.js for attribute-based authorization

https://casl.js.org/v5/en
Slide deck intro: CASL presentation by author

Improve version migration support

Consider ways we might want to change the id design to make migrating to newer DS versions easier. In the current design, the id includes the project name, entity type, version, and natural key.

Suppose a new DS version comes out and a Meadowlark implementation wants to upgrade its documents to that version. Assume School is unchanged between the two DS versions. From the API client's perspective, it would be very nice if the School resource ids didn't change. However, in the current design they would have to change, because the version is part of the id hash.

Addressing this may require changes to how DS versions are incorporated into resource URLs, and/or a separate MongoDB collection per DS version so that an id only needs to be unique within its collection.
