July 2024 - Project Tanager Workgroup
July 25, 2024, 11:00 am - 11:45 am central. Contact @Stephen Fuqua for the meeting invitation.
Agenda
Demonstration
Review the roadmap
Discuss / provide input on architecture
Attendees: JF Guertin (EdWire), Max Paulson (Denver Public Schools), Geoff McElhanon (Edufied), Doug Loyo (Edufied), Joshua Impson (Resultant), Erik Joranlien (Education Analytics), Brad Banister (Doubleline Partners on behalf of Ed-Fi), Stephen Fuqua (Ed-Fi Alliance).
Demonstration
Rather than reviewing the milestone v0.1.0 release, let’s look at what we have today, now that reference validation has been added.
Meeting notes:
Walked through execution of a demonstration script showing referential integrity checks.
Described that JSON validation checks are also in place at this time.
Roadmap
Milestone | Functional Goals |
---|---|
0.1 done | Compliant Discovery API, Descriptor API, and Resource API definition (except GET by query): able to run bulk upload, smoke test. Includes JSON validation based on API schema file. Fake OAuth (1). |
0.2 in progress | Reference validation, Streaming, and Profiles: rejects POST, PUT, and DELETE requests that would violate referential integrity. Streaming data out. Build basic Profiles support (2). |
0.3 by Summit | GET by query and cascading updates: use search engine or relational DB to fulfill GET by query requests. Support cascading updates on allowed resources. |
0.4 need to accelerate | Namespace authorization: real OAuth; JWT inspection; duplicate ODS/API's namespace authorization. First release of the Configuration Service. |
0.5 | Data model flexibility and Concurrency: extensions, choosing between DS 4 and DS 5, swapping data standards at start up (not compile). Dynamic Discovery API definition, based on actual Data Standard/extensions. Full support for eTag-based concurrency. |
0.6 | Dynamic profiles and multitenancy: full-fledged support for XML-based dynamic profiles, and for ODS/API 7 style multi-tenant routing and database segmentation. |
0.7 tech congress? | Ed-org based authorization. (3) |
0.8 | Change queries. |
Meeting notes:
Milestone 0.2 under way now, and targeting at least milestone 0.3 complete by the time of the Ed-Fi Summit at the end of September.
Swap “get by query” and “basic profiles” support between milestones 0.2 and 0.3.
Need to try to accelerate for a good release candidate at Tech Congress 2025.
Who can help?
JF: perhaps help with multitenancy / routing? Suggested writing up design notes first in this GitHub discussion.
Max: Lambda function as alternate front end interface. We can also look into other areas of interest in the roadmap.
Architecture
Database Design
“Single table” design inspired by NoSQL document stores, with added benefit of RDBMS-managed foreign key references.
document: holds the Resource and Descriptor JSON documents.
alias: helps with referential integrity in inheritance structures, e.g. a School is an EducationOrganization. If a JSON payload has an educationOrganizationId, then this table solves the problem of not knowing which type of Education Organization to query.
reference: further helps with referential integrity checks, for example not being able to delete a document that is referenced by another document.
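A minimal sketch of how the three tables could fit together, using SQLite in Python so it runs anywhere. The column names, the `referential_id` helper, and the UUIDv5 hashing are assumptions made for illustration; they are not the actual DMS schema, which targets PostgreSQL / MSSQL.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE document (
    id            INTEGER PRIMARY KEY,
    resource_name TEXT NOT NULL,
    body          TEXT NOT NULL               -- the JSON payload
);
CREATE TABLE alias (
    referential_id TEXT PRIMARY KEY,          -- hypothetical identity hash
    document_id    INTEGER NOT NULL REFERENCES document(id) ON DELETE CASCADE
);
CREATE TABLE reference (
    parent_document_id        INTEGER NOT NULL REFERENCES document(id) ON DELETE CASCADE,
    referenced_referential_id TEXT NOT NULL REFERENCES alias(referential_id)
);
""")


def referential_id(resource_name, identity):
    """Deterministic key for (resource type, identity); hypothetical scheme."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{resource_name}#{identity}"))


def insert_document(resource_name, body, aliases, references=()):
    """Insert a document with its aliases and outgoing references in one transaction."""
    with conn:
        doc_id = conn.execute(
            "INSERT INTO document (resource_name, body) VALUES (?, ?)",
            (resource_name, json.dumps(body)),
        ).lastrowid
        conn.executemany("INSERT INTO alias VALUES (?, ?)", [(a, doc_id) for a in aliases])
        conn.executemany("INSERT INTO reference VALUES (?, ?)", [(doc_id, r) for r in references])
    return doc_id


def delete_document(doc_id):
    """Return False when the database rejects the delete because of a reference."""
    try:
        with conn:
            conn.execute("DELETE FROM document WHERE id = ?", (doc_id,))
        return True
    except sqlite3.IntegrityError:
        return False


# A School is registered under its own type and under its supertype.
school_id = insert_document(
    "School",
    {"schoolId": 123},
    aliases=[referential_id("School", 123), referential_id("EducationOrganization", 123)],
)
# The Course only knows an educationOrganizationId, so it points at the supertype alias.
course_id = insert_document(
    "Course",
    {"courseCode": "ALG-1", "educationOrganizationId": 123},
    aliases=[referential_id("Course", "ALG-1")],
    references=[referential_id("EducationOrganization", 123)],
)

print(delete_document(school_id))  # the Course still references the School
```

In this sketch the referencing document never needs to know that organization 123 is a School: it resolves the supertype alias, and the foreign key on `reference` is what blocks the delete.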
Each of these three tables is partitioned out of the box for better performance. The default is 16 partitions. The design easily supports up to 256 partitions, and could support more with a very small tweak.
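The notes do not say how a row is assigned to a partition. One plausible reading of the 256 ceiling is a one-byte partition key; the sketch below assumes the key is the last byte of a document UUID, reduced modulo the partition count. The function name and the choice of byte are hypothetical.

```python
import uuid

PARTITION_COUNT = 16  # default stated in the notes


def partition_key(document_uuid: uuid.UUID, partitions: int = PARTITION_COUNT) -> int:
    """Map a UUID to a partition number using a single byte (assumed scheme)."""
    if not 1 <= partitions <= 256:
        raise ValueError("a one-byte key can address at most 256 partitions")
    return document_uuid.bytes[-1] % partitions


example = uuid.UUID("12345678-1234-5678-1234-5678123456ff")
print(partition_key(example))        # last byte 0xff, 16 partitions
print(partition_key(example, 256))   # same document, 256 partitions
```

Under this assumption, going beyond 256 partitions would mean widening the key to two bytes, which is the kind of small tweak the notes mention.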
Bulk Load Performance
Grand Bend data set (“populated template”). Running in Docker containers on localhost.
System | Data Set | Timing (min:sec) | Row count |
---|---|---|---|
DMS | Descriptors only | 0:19.9 | 3,201 in + 3,201 in |
ODS/API 7.1 | Descriptors only | 6:45.6 | 3,201 in + 3,201 across all of the descriptor tables |
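For a rough sense of scale, the two timings can be converted to seconds and compared. This is plain arithmetic on the single localhost run recorded in the table, not a benchmark claim; the helper name is made up for the example.

```python
def to_seconds(timing: str) -> float:
    """Convert a 'minutes:seconds' string such as '6:45.6' to seconds."""
    minutes, seconds = timing.split(":")
    return int(minutes) * 60 + float(seconds)


dms_seconds = to_seconds("0:19.9")
ods_seconds = to_seconds("6:45.6")
speedup = ods_seconds / dms_seconds

print(f"DMS: {dms_seconds:.1f} s, ODS/API 7.1: {ods_seconds:.1f} s, ratio: {speedup:.1f}x")
# ratio prints as 20.4x
```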
Isolation Level
Read Committed vs. Repeatable Read vs. Snapshot vs. Serializable
Example: a read before write is needed when a delete must fail because the item being deleted is referenced by something else. Under read committed, a concurrent transaction could change the “something else” between the read and the write, making it impossible to report the problem with the delete reliably. Repeatable read solves this, but can cause a transaction rollback for another “concurrent” (but second-in) transaction. We think this conflict is unlikely to occur frequently. If using read committed, we would probably want to do more manual locking.
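The notes do not say how the rolled-back second transaction would be handled. A common pattern is to retry it a bounded number of times, sketched below in plain Python. `SerializationFailure` is a hypothetical stand-in for whatever a real driver raises (PostgreSQL reports this condition as SQLSTATE 40001), and `run_with_retry` is not part of any DMS code.

```python
import random
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver's SQLSTATE 40001 error."""


def run_with_retry(transaction, max_attempts=3, base_delay=0.01):
    """Run a callable that executes one whole transaction, retrying on conflict."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and let the caller report the conflict
            # Short randomized pause so the competing transaction can finish.
            time.sleep(base_delay * attempt * random.random())


# Simulated transaction: loses the race once, then succeeds.
attempts = {"count": 0}


def flaky_delete():
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise SerializationFailure("concurrent update detected")
    return "deleted"


result = run_with_retry(flaky_delete)
print(result, "after", attempts["count"], "attempts")
```

The callable has to contain the whole transaction, reads included, so that a retry re-reads current data instead of reusing what the failed attempt saw.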
Search Database
Project-Tanager/docs/DMS/CDC-STREAMING.md at main · Ed-Fi-Alliance-OSS/Project-Tanager (github.com)
Reading straight from dms.document, with no outbox event table → open to suggestions / design for an additional outbox table.
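As a starting point for that discussion, here is a sketch of the general transactional outbox pattern, again using SQLite so it is runnable. The `outbox` table, its columns, and both function names are hypothetical; nothing here reflects a decided DMS design.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE document (
    id            INTEGER PRIMARY KEY,
    resource_name TEXT NOT NULL,
    body          TEXT NOT NULL
);
CREATE TABLE outbox (
    event_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    document_id INTEGER NOT NULL,
    operation   TEXT NOT NULL,
    payload     TEXT NOT NULL
);
""")


def insert_with_event(resource_name, body):
    """Write the document and its outbox event in the same transaction."""
    payload = json.dumps(body)
    with db:  # commits both rows together, or rolls both back
        doc_id = db.execute(
            "INSERT INTO document (resource_name, body) VALUES (?, ?)",
            (resource_name, payload),
        ).lastrowid
        db.execute(
            "INSERT INTO outbox (document_id, operation, payload) VALUES (?, 'insert', ?)",
            (doc_id, payload),
        )
    return doc_id


def drain_outbox():
    """Return pending events in order and clear them (a publisher would run this)."""
    with db:
        rows = db.execute(
            "SELECT event_id, document_id, operation, payload FROM outbox ORDER BY event_id"
        ).fetchall()
        db.execute("DELETE FROM outbox")
    return rows


insert_with_event("School", {"schoolId": 123})
events = drain_outbox()
print(len(events), events[0][2])
```

The trade-off compared with streaming straight from the document table is an extra write per change in exchange for an explicit, ordered event log that consumers can read without depending on the document table's layout.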
Alternative
Additional query tables in PostgreSQL / MSSQL with painful indexing. States are likely to want this, and we have a design in mind already.