Version 2.0 Requirements and Design
Introduction
Analytics Middle Tier 2.0 is becoming a full citizen in the Ed-Fi platform in 2020, rather than just a proof-of-concept on the Ed-Fi Exchange. As it grows up, it needs to correct some architectural concerns that came up as feedback from the field. It also needs to be up to par with the latest release of the ODS/API, version 3.3. This document aims to inform about the challenges and elicit feedback on the real-world usefulness of the proposed solutions.
Naming Convention
Requirement
Hold names to under 63 characters for PostgreSQL compatibility.
Design
Currently, none of the objects have a name that violates this constraint. To help avoid problems with future views, it is proposed to either truncate "Dimension" to "Dim" or drop the word altogether. Generally, clarity should be preferred over length when naming objects, hence the question is: how much clarity is lost if we move away from the "Dimension" suffix?
Old Name | A - Truncate | B - Drop |
|---|---|---|
analytics.StudentDimension | analytics.StudentDim | analytics.Student |
Is there risk of confusing the analytics.Student view with the real edfi.Student table when "Dimension" is entirely dropped? The typical use case for Analytics Middle Tier is to only import the views into a business intelligence / reporting data model - thus the end-user would not see the edfi.Student table
Status
Committing to "Dim" suffix on dimensions for best balance between name length and clarity of intent. Jan 31, 2020 @Stephen Fuqua
Multi Data Standard Support
Requirement
Support installing the views on ODS databases supporting multiple data standards (2.2, 3.1, 3.2).
Design
Version 1.3.0 added support for Data Standard 3.1, which was used by ODS/API 3.1.1 and 3.2, through the use of the –dataStandard <Ds2 | Ds31> argument.
Forcing the user to remember which data standard is installed is sub-optimal. We should be able to detect this implicitly and install the correct version without user input.
If table
AddressTypeexists, then install Data Standard 2. if neededElse if table
VersionLevelexists, then install Data Standard 3.1.Else if table
DeployJournalexists, then install Data Standard 3.2.Else throw an error: "Unable to determine the ODS database version".
Status
Will proceed with this design. Jan 10, 2020 @Stephen Fuqua
Data Standard 2 support plan:
New views submitted by the community on Data Standard 3+ will not be translated to Data Standard 2 by the Alliance. Pull requests from community members adding the view(s) to Data Standard 2 will be welcomed.
This will be documented in the official notes.
Data Standard 2 support will signal deprecation - that is, we reserve the right to remove Data Standard from a future Analytics Middle Tier 3.0 release.
Student, Parent, and Staff Keys
Requirement
The views should expose "Key" fields based on the natural key of the underlying table.
Design
In the case of StudentDimension , ContactPersonDimension , and UserDimension , the original release used StudentUSI , ParentUSI , and StaffUSI respectively. The "USI" columns are primary keys and were used by mistake. The "UniqueId" columns are the correct natural keys.
Change all instances of StudentKey , ContactPersonKey , and UserKey to use the corresponding "UniqueId" column from the source table.
Status
Will proceed with this design. Dec 31, 2019 @Stephen Fuqua
Descriptor and Type Mapping
Requirement
Decouple the views from hard-coded Descriptor and Type values.
Context
many of the views need to lookup records by Descriptor value - for instance, looking up the Attendance records where a student has an "Excused Absence" or "Unexcused Absence." Because the original developer had access to only a limited dataset, it was not realized that the Descriptor values will vary widely from one implementation to the next. Thus the hard-coding needs to be decoupled, allowing the implementation to provide a mapping from their Descriptor value to the concept used by the Analytics Middle Tier.
In theory, the various "Types" values in Data Standard 2 should provide a more universal constant than the Descriptors. However, some community members report that these too are mutable. Therefore, (a) using Types is not a solution for Data Standard 2, and (b) even those views with hard-coding to Types instead of Descriptors must be modified for greater independence. Note: Type tables were removed in Data Standard 3 precisely because they were not being used in the originally-designed manner.
Design
Summary
Move hard-coded values to a "Constants" table.
Create mapping tables that link Descriptors or Types to Constants.
Modify all views as needed to join to the Constants and new mapping tables.
Example
In Version 1.x, the StudentEarlyWarningFact view reports on excused and unexcused absences, looking for StudentSchoolAttendanceEvent records with attendance descriptor values of either "Excused Absence" or "Unexcused Absence".
In version 2, the view would now search for all StudentSchoolAttendanceEvent records whose descriptor maps to the relevant constant. Thus there would be two DescriptorMap values, one each for "Excused Absence" and "Unexcused Absence." Any school who uses a different term than these two would create a DescriptorMap record mapping that term to the DescriptorConstant value of "Absent".
Implications
Those who install the Analytics Middle Tier will need to carefully assess their Descriptors and Types, and then manage the DescriptorMap table (and TypeMap , for Data Standard 2) accordingly.
Default Mappings
A new command-line Option will be provided to run a script that loads the default Descriptor mapping for the default Ed-Fi descriptors (minimal/populated template descriptors).
.\EdFi.AnalyticsMiddleTier.exe --connectionString "..." --options DefaultMap
Status
Will proceed with this design. Dec 31, 2019 @Stephen Fuqua
Changes to the Student Dimension
Requirements
Create a "Student" dimension with a single unique key.
Provide intuitive access to student demographics.
Context
Student Dimension Uniqueness
The Early Warning System fact views both assumed that the StudentDimension would only have a single record for a student. However, a student could be enrolled in multiple schools at the same time, resulting in two records in the StudentDimension for the same StudentKey. This is problematic for the PowerBI Starter Kit, which has a hard requirement for unique StudentKeys.
Demographics in Ed-Fi UDM v2.2
Sources for student demographics:
edfi.Studentcontains sex, Hispanic/Latino ethnicity, economic disadvantaged (Bool), school foodservice eligibility, limited English proficiency.One-to-many tables:
edfi.StudentCohortYearedfi.StudentDisabilityedfi.StudentLanguageedfi.StudentLanguageUseedfi.StudentProgramAssociationedfi.StudentCharacteristicis a generic table, and contains begin/end dateedfi.StudentRace
Demographics in Ed-Fi UDM v3.x
Sources for student demographics:
edfi.StudentSchoolAssociationcontains School Year, Enrollment Date and Grade Leveledfi.StudentEducationOrganizationAssociationcontains Sex, Hispanic/Latino ethnicity, and Limited English ProficiencyThere are a series of many-to-many tables to store specific types of multi-value demographic characteristics - note these can be saved for either the school or the district (or charter, state, ESC, etc.)
edfi.StudentEducationOrganizationAssociationCohortYearedfi.StudentEducationOrganizationAssociationDisabilityedfi.StudentEducationOrganizationAssociationLanguageedfi.StudentEducationOrganizationAssociationLanguageUseedfi.StudentEducationOrganizationAssociationRaceedfi.StudentEducationOrganizationAssociationTribalAffiliation
And there is the generic
edfi.StudentEducationOrganizationAssociationStudentCharacteristictable, which has a time Period associated with it.Includes food service eligibility, which was present on
Studentas a Boolean in version 1.
Dimension or Fact?
While gender, race, and ethnicity all have strings associated with them, some elements of the demographic and enrollment data are more fact-oriented than dimension-oriented:
IsHispanic
IsEconomicallyDisadvantaged
LimitedEnglishProficiency
SchoolEnrollmentDate
The ODS does not support slowly-changing dimensions, so there is only ever one current snapshot of these data - one cannot tie them to a date unless referring to the enrollment date... except in the case of food service eligibility because there is a time period.
Foodservice Eligibility
Foodservice eligibility is tracked via a Program Association, which not a demographic in the Ed-Fi Unified Data Model. Therefore it should be removed from demographics and placed in the program association views. For supporting the Power Bi Starter Kit, a new Early Warning System view might be needed that tries to preserve the old StudentDimension in some ways, including flattening the foodservice eligibility into a single Boolean value.
Design Proposal
Summary:
Eliminate the idea of a separate "Student Dimension" in the core data collection.
Create two new, very similar, dimensions to replace the old
StudentDimension:StudentSchoolDimStudentLocalEducationAgencyDim
Combine the data from various
edfi.StudentEducationOrganizationAssociationXYZtables into a single view,DemographicDim.Create two bridge tables to link student information to the characteristics
StudentSchoolDemographicBridgeStudentLocalEducationAgencyDemographicBridge
Rationale for the Student Dimension replacement:
Students don't (or shouldn't) exist in isolation from an organization - hence no need for a
StudentDim.Across the Ed-Fi data model, there are two different student relationships:
With the school (e.g. StudentSchoolAssociation, StudentDisciplineIncident, StudentGradebookEntry, etc.).
And with the more generic Education Organization - which in the current context of the Analytics Middle Tier, generally means Local Education Agency.
In defining meaningful
StudentSchoolandStudentLocalEducationAgencyentities, there will be some overlap of fields - but the data could be different. This is an inherently dangerous area of the Ed-Fi data model. If we were to combine the data into a single perspective, then we would be hiding the danger. The data analyst will need to read and understand why there are two "root entities" for data reporting, and then choose which one to use based on their implementation.The Analytics Middle Tier is intended for Local Education Agency use cases. Other use cases can be added in the future as needed to support other types of Education Organizations (e.g. a future view
StudentStateEducationAgencyDim).
Rationale for combining the various characteristic tables into a single point in time view:
There are seven (in Data Standard 3+) similar characteristics tables that do not have time periods associated with them
CohortYear
Disability
DisabilityDesignation
Language
LanguageUse
Race
TribalAffiliation
And there is one with a time period: StudentCharacteristic
Those without a time period can be combined into a single view for "demographics"
Those without a time period can also be included in that view, so long as the data analyst understands that the "Bridge" between the student and the demographics represents "data as of right now".
The two with a time period can, in the future, be used to create new Fact views that link to the date range. See Program Views below.
DemographicDim
Ultimately these values come from the edfi.Descriptor table, although not all descriptors will be here. String values will be used for keys instead of DescriptorId in order to allow combining data from multiple year-specific ODS databases into a single data mart - this would not be possible with the auto-incremented DescriptorId since that value will differ between ODS database instances.
Structure
Column | Data Type | Source | Description |
|---|---|---|---|
DemographicKey | String | "{Source Table}" or "{Source Table}.{Descriptor.CodeValue}" | Primary key. Made up of the table source and the Descriptor value. To support hierarchies, there will also be a root Key with only the table source value. |
DemographicParentKey | String | same as above | Facilitates creation of roll-up / hierarchy in BI tools by relating each individual record to its "parent concept". |
DemographicLabel | String | "{Descriptor}.{CodeValue}" for all Descriptors related to the relevant tables*. | For parent entities, will be the same as the Key. For child entities, will be the actual demographic label. |
Data Standard 2.2 Source Tables
Descriptors for the following tables:
StudentCohortYear
StudentDisability
StudentLanguage
StudentLanguageUse
StudentRace
StudentCharacteristic (where time dates encompass "now")
Student contains "IsEconomicDisadvantaged" in DS 2, whereas this is now one of the "StudentCharacteristics" in DS 3. In order to have parity between the two data standards, the DemographicDim view therefor needs a hard-coded row that does not come from a table:
DemographicKey | ParentKey | DemographicLable |
|---|---|---|
StudentCharacteristic#EconomicDisadvantaged | StudentCharacteristic | Economic Disadvantaged |
Otherwise the sample records will be as with the Data Standard 3+ samples below.
Data Standard 3+ Source Tables
Descriptors for the following tables:
StudentEducationAgencyCohortYear
StudentEducationAgencyDisability
StudentEducationAgencyDisabilityDesignation
StudentEducationAgencyLanguage
StudentEducationAgencyLanguageUse
StudentEducationAgencyRace
StudentEducationAgencyTribalAffiliation
StudentEducationAgencyStudentCharacteristic (where time StudentEducationAgencyStudentCharacteristicPeriod encompasses "now")
Sample Records
DemographicKey | ParentKey | DemographicLabel |
|---|---|---|
StudentCharacteristic | StudentCharacteristic | StudentCharacteristic |
StudentCharacteristic#Economic Disadvantaged | StudentCharacteristic | Economic Disadvantaged |
StudentCharacteristic#Homeless | StudentCharacteristic | Homeless |
StudentCharacteristic#Runaway | StudentCharacteristic | Runaway |
Race | Race | Race |
Race#American Indian - Alaska Native | Race | American Indian - Alaska Native |
Race#Asian | Race | Asian |
Race#Black - African American | Race | Black - African American |
Language | None | Language |
Language#Adyghe | Language | Adyghe |
Language#Swiss German | Language | Swiss German |
etc. |
|
|
StudentSchoolDim
Data Standard 2.2
Column | Data Type | Source | Description |
|---|---|---|---|
StudentSchoolKey | String | "{Student.StudentUniqueId}-{StudentSchoolAssociation.SchoolId}" | Primary key |
StudentKey | String | edfi.Student.UniqueId |
|
SchoolKey | String | edfi.StudentSchoolAssociation.SchoolId |
|
SchoolYear | String | edfi.StudentSchoolAssocation.SchoolYear | convert to string to signal to modeling tools that this is not an aggregatable number |
StudentFirstName | String | edfi.Student.FirstName |
|