Project Meadowlark Survey Results - July 28, 2022

Introduction

A survey of SIG members was conducted to gather the following information:

  1. Which features to prioritize (or remove!) in Meadowlark
  2. Interest and ability to establish a pilot project in your deployment environment 

Below are the survey results.

The Ed-Fi tech team will follow up one-on-one with potential pilot projects.

Survey Results

Thinking in the long term, how do you prioritize these broad features that are currently in planning for Meadowlark?

[Chart: feature prioritization responses from the "No Pilot - Follow-up" group]

[Chart: feature prioritization responses from the "Pilot - Follow-up" group]

Can you get by with just the two modes of authorization, ownership and full-access? If not, please tell us a little more about your minimum authorization requirements for a pilot test. More info: https://edfi.atlassian.net/wiki/x/TEtXAQ

Yes
Your link took me to placeholder pages, not sure how to reply here...
The link is not working... I would need to better understand the difference between the two before forming an opinion.
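
Since the linked page was not reachable for these respondents, the following is a minimal illustrative sketch (in TypeScript) of the distinction the question draws, assuming "ownership" restricts a client to documents it created and "full-access" applies no such filter. All names and shapes here are hypothetical, not Meadowlark's actual code.

    // Hypothetical sketch of the two proposed authorization modes.
    // Not Meadowlark's actual implementation.
    type AuthorizationMode = 'ownership' | 'full-access';

    type ClientClaim = {
      clientId: string;
      mode: AuthorizationMode;
    };

    type StoredDocument = {
      documentId: string;
      createdBy: string; // clientId of the client that created this document
    };

    // full-access: the client may act on any document.
    // ownership: the client may act only on documents it created.
    function isAuthorized(claim: ClientClaim, doc: StoredDocument): boolean {
      return claim.mode === 'full-access' || doc.createdBy === claim.clientId;
    }

Under this reading, an ownership-mode client that creates a student record can later read and update that record but cannot see records created by other clients, while a full-access client can see everything.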

Currently, we are not proposing to include these features and tools in a pilot test. Will that be a problem?


Anything else we should know right now in preparation for a possible pilot?

We don't have access to AWS/Google Cloud - we're currently doing everything in Azure using the Cloud ODS deployment script that was posted on the Exchange a while ago, so our experience is limited to that.


I remain skeptical of a document store as the driving engine here. I could be convinced, but my understanding is that there are several RDBMSes that have significantly improved both read and write scalability, and will inherently be better at maintaining data integrity. I have doubts about the performance and feasibility of things like schema enforcement, referential integrity, etc., particularly if using an engine that doesn't truly support some of the necessary primitives (joins, upserts, etc.).

I don't personally think year-rollover is sufficiently burdensome to motivate schemalessness as a top-line feature. I think a lot could be accomplished with read replicas, surrogate keys, indices, better defaults for bypassing security checks on certain account types, and some tweaks to the paging mechanism. I'd be curious if there are good analytics on where bottlenecks occur in both large write and large read situations. For data out, I know that the paging mechanism and excess security checks are the two largest hurdles we've seen.

It seems to me that a relational engine will in general be a better backing for highly relational data, and that these can be scaled to absurd speeds if the appetite is there. Uber ran on Postgres for many years, and then migrated to MySQL. Scaling can absolutely be accomplished without giving up on schema/integrity enforcement, or performing them outside of the database.

That said, if this approach performs without serious drawbacks, I can absolutely put my distaste for document stores aside. I'll be very curious to see how some of the enforcement mechanisms are being implemented, and how they perform at scale.
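
As background on that last point, below is a minimal sketch (in TypeScript, using the MongoDB Node.js driver) of one way referential integrity can be enforced in application code over a document store. The collection names and document shapes are hypothetical, and this is not a description of how Meadowlark actually implements its checks.

    // Minimal sketch: enforcing a reference check in application code over a
    // document store, using a multi-document transaction. Collection names
    // and document shapes are hypothetical.
    import { MongoClient } from 'mongodb';

    type Section = {
      sectionIdentifier: string;
      schoolReference: { schoolId: number };
    };

    async function insertSectionWithReferenceCheck(
      client: MongoClient,
      section: Section,
    ): Promise<void> {
      const db = client.db('edfi');
      const session = client.startSession();
      try {
        await session.withTransaction(async () => {
          // The check a relational engine would perform via a foreign key:
          // the referenced school must already exist.
          const school = await db
            .collection('schools')
            .findOne({ schoolId: section.schoolReference.schoolId }, { session });
          if (school == null) {
            throw new Error(
              `Reference violation: school ${section.schoolReference.schoolId} does not exist`,
            );
          }
          await db.collection('sections').insertOne(section, { session });
        });
      } finally {
        await session.endSession();
      }
    }

Even inside a transaction, this read-then-write pattern is vulnerable to write skew: a concurrent transaction could delete the school after the findOne and still commit, leaving a dangling reference. Guarding against that typically requires reverse-reference bookkeeping or additional locking, which is exactly the kind of enforcement cost the respondent is asking about.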