...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
Considerations for Integration of the Ed-Fi Ecosystem into Data Lake and Next Generation Data Warehouse Architectures
...
- System interoperability: "helping every school district and state achieve data interoperability"
- Serving data to end users: "empower educators with comprehensive, real-time (2) insights into their students’ performance and needs"
The diagrams below illustrate these two problems and will serve as a foundation for the many architectural diagrams that follow.
[Diagrams: "Problem: interoperability" and "Problem: serving data"]
...
- Data Model Extensions that support exchange and storage of new data elements and domains that are not otherwise covered by the Data Standard;
- API Composites that denormalize data, so that data can be extracted from the API with fewer calls; and
- Custom Analytics Middle Tier (AMT) views that denormalize more of the ODS tables, compared to what is available out of the box.
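To make "denormalize" concrete, here is a minimal sketch of the idea behind composites and custom views: several normalized record sets are joined once, so a consumer gets flat rows without chaining lookups. The record shapes, field names, and the `denormalize` function are hypothetical simplifications for illustration only; they are not the actual Ed-Fi resource models, composite definitions, or AMT views.

```python
from __future__ import annotations

from typing import Any

# Hypothetical, simplified records (not the real Ed-Fi resource shapes).
students = [
    {"studentUniqueId": "S1", "firstName": "Ada", "lastSurname": "Lovelace"},
    {"studentUniqueId": "S2", "firstName": "Alan", "lastSurname": "Turing"},
]
enrollments = [
    {"studentUniqueId": "S1", "schoolId": 100, "entryDate": "2023-08-15"},
    {"studentUniqueId": "S2", "schoolId": 200, "entryDate": "2023-08-16"},
]
schools = [
    {"schoolId": 100, "nameOfInstitution": "North High"},
    {"schoolId": 200, "nameOfInstitution": "South High"},
]


def denormalize(
    students: list[dict[str, Any]],
    enrollments: list[dict[str, Any]],
    schools: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Join three normalized record sets into one flat row per enrollment."""
    # Index the lookup sets by key so each join is a dictionary access.
    students_by_id = {s["studentUniqueId"]: s for s in students}
    schools_by_id = {s["schoolId"]: s for s in schools}

    rows = []
    for enrollment in enrollments:
        student = students_by_id[enrollment["studentUniqueId"]]
        school = schools_by_id[enrollment["schoolId"]]
        rows.append(
            {
                "studentUniqueId": student["studentUniqueId"],
                "studentName": f"{student['firstName']} {student['lastSurname']}",
                "schoolId": school["schoolId"],
                "schoolName": school["nameOfInstitution"],
                "entryDate": enrollment["entryDate"],
            }
        )
    return rows


flat_rows = denormalize(students, enrollments, schools)
```

The trade-off is the usual one for denormalization: the flat rows are easier to consume, but each new reporting need tends to require another custom join.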
The Data Lake Hypothesis
Amazon defines a data lake as
"... a centralized repository that allows you to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data, and run different types of analytics—from dashboards and visualizations to big data processing, real-time analytics, and machine learning to guide better decisions."
A central premise of the data lake concept is to get the data into the hands of analysts as quickly as possible and let them sort out the details. That brings us to the question, "why not put it in a data lake?"
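As a rough illustration of the "store your data as-is" premise, the sketch below lands raw JSON records in a date-partitioned folder without imposing any schema. The folder layout, the file name, and the `land_raw` function are assumptions made for this example; a real data lake would more likely use object storage and an established ingestion tool.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def land_raw(lake_root: Path, source: str, payload: list[dict], day: date) -> Path:
    """Write records unchanged, one JSON document per line, under a dated folder."""
    # Hypothetical layout: <root>/raw/<source>/ingest_date=YYYY-MM-DD/
    target_dir = lake_root / "raw" / source / f"ingest_date={day.isoformat()}"
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / "part-0000.jsonl"
    with target.open("w", encoding="utf-8") as handle:
        for record in payload:
            # No validation or reshaping: structure is left for analysts to sort out.
            handle.write(json.dumps(record) + "\n")
    return target


# Usage: land two differently shaped records from a hypothetical "sis" source.
with tempfile.TemporaryDirectory() as tmp:
    landed = land_raw(
        Path(tmp),
        "sis",
        [{"studentUniqueId": "S1"}, {"attendance": {"present": True}}],
        date(2024, 1, 2),
    )
    landed_name = landed.parent.name
```

Note that nothing here checks the records against a data standard, which is exactly the convenience and the risk that the question above raises.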
...
(1) Ed-Fi mission statement, from About Ed-Fi:
"The Ed-Fi Alliance is a nonprofit devoted to helping every school district and state achieve data interoperability. By connecting educational data systems, we empower educators with comprehensive, real-time insights into their students’ performance and needs."
(2) Anecdotally, "real-time" in educational data is not meant literally, as it is in some other industries. In educational settings, the greater concern is that the data are up to date within a day or two. A counter-example might be literal real-time notifications on classroom attendance. On the other hand, manually entered attendance data may be prone to errors or recording delays, unless the school is using automated proximity detection (e.g. RFID). In the manual case, actual real-time may not be desirable, and in the automated case, the proximity system itself likely takes responsibility for notifications.
(3) Colloquially known as Change Data Capture (CDC), following the Microsoft terminology. Examples: CDC on Microsoft SQL Server and Azure SQL; roll your own with PostgreSQL (1) (2); or use an add-on such as Materialize, Hevo, or Aiven.