Introduction
Analytics Middle Tier 2.0 is becoming a full citizen in the Ed-Fi platform, rather than just a proof-of-concept on the Ed-Fi Exchange. As it grows up, it needs to correct some architectural concerns that came up as feedback from the field. It also needs to be up to par with the latest release of the ODS/API, version 3.3. This document aims to inform about the challenges and elicit feedback on the utility of the proposed solutions.
Multi Data Standard Support
Requirement
Version 1.3.0 added support for Data Standard 3.1, which was used by ODS/API 3.1.1 and 3.2, through the use of the –dataStandard <Ds2 | Ds31>
argumnet. Analytics Middle Tier 2.0.0 will continue in this vein with flag Ds32
for Data Standard 3.2 (ODS/API 3.3).
Design
Forcing the user to remember which data standard is installed is sub-optimal. We should be able to detect this implicitly and install the correct version without user input.
- If table
AddressType
exists, then install Data Standard 2. - Else if table
VersionLevel
exists, then install Data Standard 3.1. - Else if table
DeployJournal
exists, then install Data Standard 3.2. - Else throw an error: "Unable to determine the ODS database version".
Student, Parent, and Staff Keys
Requirement
The views should expose "Key" fields based on the natural key of the underlying table. In the case of StudentDimension
, ContactPersonDimension
, and UserDimension
, the original release used StudentUSI
, ParentUSI
, and StaffUSI
respectively. The "USI" columns are primary keys and were used by mistake. The "UniqueId" columns are the correct natural keys.
Design
Change all instances of StudentKey
, ContactPersonKey
, and UserKey
to use the corresponding "UniqueId" column from the source table.
Descriptor and Type Mapping
Requirement
Decouple the views from hard-coded Descriptor and Type values.
Context: many of the views need to lookup records by Descriptor value - for instance, looking up the Attendance records where a student has an "Excused Absence" or "Unexcused Absence." Because the original developer had access to only a limited dataset, it was not realized that the Descriptor values will vary widely from one implementation to the next. Thus the hard-coding needs to be decoupled, allowing the implementation to provide a mapping from their Descriptor value to the concept used by the Analytics Middle Tier.
In theory, the various "Types" values in Data Standard 2 should provide a more universal constant than the Descriptors. However, some community members report that these too are mutable. Therefore, (a) using Types is not a solution for Data Standard 2, and (b) even those views with hard-coding to Types instead of Descriptors must be modified for greater independence. Note: Type tables were removed in Data Standard 3 precisely because they were not being used in the originally-designed manner.
Design
Summary
- Move hard-coded values to a "Constants" table.
- Create mapping tables that link Descriptors or Types to Constants.
- Modify all views as needed to join to the Constants and new mapping tables.
Example
In Version 1.x, the StudentEarlyWarningFact
view reports on absences, looking for StudentSchoolAttendanceEvent
records with attendance descriptor values of either "Excused Absence" or "Unexcused Absence".
In version 2, the view would now search for all StudentSchoolAttendanceEvent
records whose descriptor maps to constant "Absent". Thus there would be two relevant DescriptorMap
values, one each for "Excused Absence" and "Unexcused Absence." Any school who uses a different term than these two would create a DescriptorMap
record mapping that term to the DescriptorConstant
value of "Absent".
Implications
Those who install the Analytics Middle Tier will need to carefully assess their Descriptors and Types, and then manage the DescriptorMap
table (and TypeMap
, for Data Standard 2) accordingly. The release notes for 2.0 will provide a script that maps the default Ed-Fi Descriptors and Types as a starting point.
Student Dimension Uniqueness
Requirement
The Early Warning System fact views both assumed that the StudentDimension
would only have a single record for a student, as shown in these diagrams:
However, a student could be enrolled in multiple schools at the same time, resulting in two records in the StudentDimension for the same StudentKey. This is problematic for the PowerBI Starter Kit, which has a hard requirement for unique StudentKeys.
At the same time, the Program fact and event views are looking at a Student in a given LocalEducationAgency, whereas the facts above are looking at a Student at a School. Although the ODS database supports a Student enrolled in a Program associated with a School, our input from the field tells us that Program association is on the LEA not the school. Thus we need to balance these competing tensions in the current model.
Design
Data Standard 3 moved the student demographics into the StudentEducationOrganizationAssociation
table, away from the Student
table. The current StudentDimension
view provides student name, primary contact, and demographics. The proposal is to split this into two different views, change the Early Warning System star schema to a snowflake.
The demographics on the view are secondary to the student-school relationship. A data analyst might naturally look for school enrollment data. Therefore we will have a new view called StudentEnrollment
to show active enrollment/school association information, with a StudentEnrollmentKey
that combines the StudentUniqueId
and SchoolId
. The early warning views would then change to use the single StudentEnrollmentKey
instead of the two separate StudentKey
and SchoolKey
columns.
The StudentDimension
would continue to exist in a reduced form, providing only name and primary contact information. The Program views would continue to have the StudentKey
in them.
<create diagrams here>
Problem
What if someone wants to build queries that analyze program participation by demographic data?