Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Introduction

Analytics Middle Tier 2.0 is becoming a full citizen in the Ed-Fi platform in 2020, rather than just a proof-of-concept on the Ed-Fi Exchange. As it grows up, it needs to correct some architectural concerns that came up as feedback from the field. It also needs to be up to par with the latest release of the ODS/API, version 3.3. This document aims to inform about the challenges and elicit feedback on the real-world usefulness of the proposed solutions.

Naming Convention

Requirement

Hold names to under 63 characters for PostgreSQL compatibility.

Design

Currently, none of the objects have a name that violates this constraint. To help avoid problems with future views, it is proposed to either truncate "Dimension" to "Dim" or drop the word altogether. Generally, clarity should be preferred over length when naming objects, hence the question is: how much clarity is lost if we move away from the "Dimension" suffix?

Old NameA - TruncateB - Drop
​analytics.StudentDimensionanalytics.StudentDim​analytics.Student​


Info

Is there risk of confusing the analytics.Student  view with the real edfi.Student  table when "Dimension" is entirely dropped? The typical use case for Analytics Middle Tier is to only import the views into a business intelligence / reporting data model - thus the end-user would not see the edfi.Student  table


Tip
titleStatus

Committing to "Dim" suffix on dimensions for best balance between name length and clarity of intent.  Stephen Fuqua

Jira Legacy
serverEd-Fi Issue Tracker
columnskey,summary,type,created,updated,due,assignee,reporter,priority,status,resolution
serverIde04b01cb-fd08-30cd-a7d6-c8f664ef7691
keyBIA-289

Multi Data Standard Support

Requirement

Support installing the views on ODS databases supporting multiple data standards (2.2, 3.1, 3.2).

Warning

Is support for Data Standard 2 really necessary in this next release? Ed-Fi Alliance is committed to both 3.1 and 3.2 in all releases in 2020, but has not decided about 2.x. Welcoming input on this.

Design

Version 1.3.0 added support for Data Standard 3.1, which was used by ODS/API 3.1.1 and 3.2, through the use of the –dataStandard <Ds2 | Ds31>  argument. 

Forcing the user to remember which data standard is installed is sub-optimal. We should be able to detect this implicitly and install the correct version without user input.

  1. If table AddressType exists, then install Data Standard 2. (warning) if needed
  2. Else if table VersionLevel exists, then install Data Standard 3.1.
  3. Else if table DeployJournal exists, then install Data Standard 3.2.
  4. Else throw an error: "Unable to determine the ODS database version".


Tip
titleStatus

Will proceed with this design.  Stephen Fuqua

Jira Legacy
serverEd-Fi Issue Tracker
columnskey,summary,status
maximumIssues20
jqlQuerykey in (BIA-247, BIA-272)
serverIde04b01cb-fd08-30cd-a7d6-c8f664ef7691

Data Standard 2 support plan:

  • New views submitted by the community on Data Standard 3+ will not be translated to Data Standard 2 by the Alliance. Pull requests from community members adding the view(s) to Data Standard 2 will be welcomed.
  • This will be documented in the official notes.
  • Data Standard 2 support will signal deprecation - that is, we reserve the right to remove Data Standard from a future Analytics Middle Tier 3.0 release.

Student, Parent, and Staff Keys

Requirement

The views should expose "Key" fields based on the natural key of the underlying table. 

Design

In the case of StudentDimension , ContactPersonDimension , and UserDimension , the original release used StudentUSI , ParentUSI , and StaffUSI  respectively. The "USI" columns are primary keys and were used by mistake. The "UniqueId" columns are the correct natural keys.

Change all instances of StudentKey , ContactPersonKey , and UserKey  to use the corresponding "UniqueId" column from the source table.

Tip
titleStatus

Will proceed with this design.  Stephen Fuqua

Jira Legacy
serverEd-Fi Issue Tracker
columnskey,summary,status
maximumIssues20
jqlQuerykey in (BIA-152, BIA-227)
serverIde04b01cb-fd08-30cd-a7d6-c8f664ef7691

Descriptor and Type Mapping

Requirement

Decouple the views from hard-coded Descriptor and Type values. 

Context

many of the views need to lookup records by Descriptor value - for instance, looking up the Attendance records where a student has an "Excused Absence" or "Unexcused Absence." Because the original developer had access to only a limited dataset, it was not realized that the Descriptor values will vary widely from one implementation to the next. Thus the hard-coding needs to be decoupled, allowing the implementation to provide a mapping from their Descriptor value to the concept used by the Analytics Middle Tier.

In theory, the various "Types" values in Data Standard 2 should provide a more universal constant than the Descriptors. However, some community members report that these too are mutable. Therefore, (a) using Types is not a solution for Data Standard 2, and (b) even those views with hard-coding to Types instead of Descriptors must be modified for greater independence. Note: Type tables were removed in Data Standard 3 precisely because they were not being used in the originally-designed manner. 

Design

Summary

  1. Move hard-coded values to a "Constants" table.
  2. Create mapping tables that link Descriptors or Types to Constants.
  3. Modify all views as needed to join to the Constants and new mapping tables.

Gliffy
nameDescriptor Map 2
pagePin1

Expand
titleList of Descriptor and Type Constants...


ConstantNamePurpose
AddressType.Home Looking up ContactPerson's Home address
AddressType.Mailing Looking up ContactPerson's Mailing address
AddressType.Physical Looking up ContactPerson's Physical address
AddressType.TemporaryLooking up ContactPerson's Temporary address
AddressType.Work Looking up ContactPerson's Work address
Descriptor.Absent

Looking up StudentAbsenceEvents that should be treated as "Absent" in an Early Warning System. Example descriptor values to map might be "Excused Absence" and "Unexcused Absence."

As another example, if a Field Trip absence event should be treated as an absence from school for the purpose of Early Warning, then one would also map the descriptor for "Field Trip" to the constant "Descriptor.Absent".

Descriptor.TardyLooks up StudentAbsenceEvents that should be treated as "Tardy".
Descriptor.InstructionalDay

Determines if a calendar date is an instructional day that should be used in calculating attendance rates.

The Ed-Fi default template mapping would use both the "Instructional Day" and "Make Up Day" descriptors.

  
EmailType.Home/Personal Looking up ContactPerson's home or personal e-mail address.
EmailType.WorkLooking up ContactPerson's work e-mail address.  
FoodServicesDescriptor.FullPriceDetermines if a student is eligible for school food service.
GradeType.Grading PeriodLooking up the Grade records by the most granular period, which by default is "Grading Period". Some implementations might instead use terms like "Quarter" or "Six Weeks".
TelephoneNumberType.HomeLooking up ContactPerson's Home phone number.
TelephoneNumberType.MobileLooking up ContactPerson's Mobile phone number.
TelephoneNumberType.WorkLooking up ContactPerson's Work phone number.
Group.TeacherSupports creation of row-level authorization data.
Group.PrincipalSupports creation of row-level authorization data.
Group.SuperintendentSupports creation of row-level authorization data.


Example

In Version 1.x, the StudentEarlyWarningFact view reports on excused and unexcused absences, looking for StudentSchoolAttendanceEvent  records with attendance descriptor values of either "Excused Absence" or "Unexcused Absence".

In version 2, the view would now search for all StudentSchoolAttendanceEvent records whose descriptor maps to the relevant constant. Thus there would be two DescriptorMap  values, one each for "Excused Absence" and "Unexcused Absence." Any school who uses a different term than these two would create a DescriptorMap  record mapping that term to the DescriptorConstant  value of "Absent".

Implications

Those who install the Analytics Middle Tier will need to carefully assess their Descriptors and Types, and then manage the DescriptorMap table (and TypeMap , for Data Standard 2) accordingly. 

Default Mappings

A new command-line Option will be provided to run a script that loads the default Descriptor mapping for the default Ed-Fi descriptors (minimal/populated template descriptors).

Code Block
.\EdFi.AnalyticsMiddleTier.exe --connectionString "..." --options DefaultMap


Tip
titleStatus

Will proceed with this design.  Stephen Fuqua

Jira Legacy
serverEd-Fi Issue Tracker
columnskey,summary,status
maximumIssues20
jqlQuerykey in (BIA-150, BIA-238, BIA-288)
serverIde04b01cb-fd08-30cd-a7d6-c8f664ef7691

Changes to the Student Dimension

Requirements

  1. Create a "Student" dimension with a single unique key.
  2. Provide intuitive access to student demographics.

Context

Student Dimension Uniqueness

The Early Warning System fact views both assumed that the StudentDimension  would only have a single record for a student. However, a student could be enrolled in multiple schools at the same time, resulting in two records in the StudentDimension for the same StudentKey. This is problematic for the PowerBI Starter Kit, which has a hard requirement for unique StudentKeys.

Demographics in Ed-Fi UDM v3.x

In Data Standard 3 there are several sources for student demographics:

  • edfi.StudentSchoolAssociation  contains School Year, Enrollment Date and Grade Level
  • edfi.StudentEducationOrganizationAssociation  contains Sex, Hispanic/Latino ethnicity, and Limited English Proficiency
  • There are a series of many-to-many tables to store specific types of multi-value demographic characteristics - (warning) note these can be saved for either the school or the district (or charter, state, ESC, etc.)
    • edfi.StudentEducationOrganizationAssociationCohortYear 
    • edfi.StudentEducationOrganizationAssociationDisability 
    • edfi.StudentEducationOrganizationAssociationLanguage 
    • edfi.StudentEducationOrganizationAssociationLanguageUse 
    • edfi.StudentEducationOrganizationAssociationRace 
    • edfi.StudentEducationOrganizationAssociationTribalAffiliation 
  • And there is the generic edfi.StudentEducationOrganizationAssociationStudentCharacteristic table, which has a time Period associated with it.
    • Includes food service eligibility, which was present on Student  as a Boolean in version 1.

Dimension or Fact?

While gender, race, and ethnicity all have strings associated with them, some elements of the demographic and enrollment data are more fact-oriented than dimension-oriented:

  • IsHispanic
  • IsEconomicallyDisadvantaged
  • LimitedEnglishProficiency
  • IsEligibleForSchoolFoodService
  • SchoolEnrollmentDate

The ODS does not support slowly-changing dimensions, so there is only ever one current snapshot of these data - one cannot tie them to a date unless referring to the enrollment date... except in the case of food service eligibility because there is a time period.

Design Proposal

Expand
titleArchived proposals...

1. Shrink the Student Dimension 

Only the following columns truly belong on a StudentDimension  that has unique primary keys:

ColumnSource​StudentKeyedfi.Student.UniqueId​StudentFirstNameedfi.Student.FirstNameStudentMiddleNameedfi.Student.MiddleNameStudentLastNameedfi.Student.LastSurnameContactNamefrom analytics.ContactDimension for the first primary contact recordContactRelationshipContactAddressContactMobilePhoneNumberContactWorkPhoneNumberContactEmailAddressLastModifiedDateedfi.Student or analytics.ContactDimension

2. Handling Demographics

2a. Kick the Can Down the Road

The Early Warning System use case needs to know about a student's school enrollment - which is already provided via the StudentEarlyWarningFact  view. It does not need any demographic information. That information was included in version 1.0 to support ad hoc analytics without having a specific use case to anchor the structure of the data. Therefore we could simply remove all demographic data from 2.0. Re-introduce in 2.1 (or beyond) when there is a specific use case.

2b. Demographics Dimension for Ad Hoc Exploration

To support ad-hoc analytics exploration, continue supporting dimensional demographics data. Why dimension over fact?

  • Fact views should generally be use case driven, and we don't have a clearly defined use case for demographics right now.
  • A data analyst using a BI tool probably wants to slice data by demographics, which favors dimensional perspective over date-in-time fact or event perspective.
  • Image RemovedWhen slicing or filtering by some bit of demographic data, a BI tool should automatically re-calculate aggregations / metrics. This is different than creating a report - this is about real-time exploration. For example, "how many fifth graders are at risk for dropout based on attendance, behavior, and course performance?" The analyst simply wants to filter on GradeLevel=Fifth. Then drill down to "ask" about limited English proficiency. In Power BI, this looks the image to the right. In this case, none of the fifth graders are recorded as having limited English proficiency, so the filter only shows "not applicable." Thus the ad hoc exploration just uncovered something interesting.
  • That filter would not be so easy to achieve with a fact that is tied to a date, unless the data were provided for every single available date. Doing that would greatly increase the size of the analytics database without providing any additional value: since the ODS doesn't store past data, only current records, the demographic data would never change from one day to the next.

Create a new dimension that includes the following columns. Name it StudentSchoolDemographicsDim.

ColumnSource​StudentKeyedfi.Student.StudentUniqueId via edfi.StudentSchoolAssociation​.StudentUSISchoolKeyedfi.StudentSchoolAssociation.SchoolIdEnrollmentDateKeyedfi.StudentSchoolAssociation.EntryDate formatted as YYYY-MM-DDGradeLeveledfi.Descriptor.CodeValue via edfi.StudentSchoolAssociation.EntryGradeLevelDescriptorIdLimitedEnglishProficiency

edfi.Descriptor.CodeValue via edfi.StudentEducationOrganizationAssociation

→ If multiple records available in StudentEducationOrganizationAssociation, let School record's value take precedence over District's value.

→ Relies on the Descriptor Mapping described above

→ If not set, reports "Not Applicable" instead of null value

IsEconomicallyDisadvantaged

edfi.Descriptor via edfi.StudentEducationOrganizationAssociationStudentCharacteristic

→ If multiple records available in StudentEducationOrganizationAssociation, let School record's value take precedence over District's value.

→ If record present, then value is true - not displaying the Descriptor value

→ Relies on the Descriptor Mapping described above

THIS IS A MANY-TO-MANY CHARACTERISTIC THUS MOVING TO NEXT VIEW DOWN THE PAGE

IsEligibleForSchoolFoodService

edfi.StudentSchoolFoodServiceProgramAssociation via StudentUSI

→ Relies on the Descriptor Mapping described above

→ Any program enrollment that is not linked to "Full Price" is taken to imply that yes, the student is eligible for school food service.

IsHispanic

edfi.StudentEducationOrganizationAssociation.HispanicLatinoEthnicity

→ If multiple records available in StudentEducationOrganizationAssociation, let School record's value take precedence over District's value.

Sexedfi.Descriptor.CodeValue via edfi.StudentEducationOrganizationAssociation

2c. Single Bridge Table for Complete Set of Demographics

There are several other group categorizations that might be of interest:

  • CohortYear
  • Disability
  • DisabilityDesignation
  • Language / LanguageUse
  • StudentCharacteristic
  • Race
  • TribalAffiliation

These are all represented with many-to-many relationships, making them difficult to model on the StudentDemographics  view. We have already used one: Student Characteristic of "Economic Disadvantaged" - so there is one model: add columns for a bunch of characteristics of interest, e.g. an IsHomeless  column or HasTribalAffiliation  column. Alternately, could combine values by concept, e.g. single "StudentCharacteristic" field that might have value of "Homeless, Refugee" to represent two records in the Edfi.StudentEducationOrganizationAssociationStudentCharacteristic . The latter would be more flexible from a modeling standpoint, but less useful from an analytics standpoint (i.e. you don't want to force users to query with language such as studentcharacteristic like "%homeless%" because that will lead to terrible performance.

Instead of flattening the data in some arbitrary fashion, we can create a Bridge table to support the many-to-many student-demographic relationships, and a separate fact table to represent enrollment.

Gliffy
nameDemographics Bridge
pagePin5

The Demographic Key might be the descriptor value. The ParentKey supports the natural hierarchy of demographics, e.g. Tribal Affiliation would be a parent label with child keys. In the setting of a view, if there is any difficulty in writing a performant query, then the parent could be moved to an outrigger table. As the parent would not be directly used in the bridge, this should provide no difficulties for the data model.

Expand
titleSample records...

StudentCharacteristicDim

StudentCharacteristicKeyParentKeyStudentCharacteristicValue

​None

(warning) This row exists so that no record will have a null ParentKey .

None​Top Parent​StudentCharacteristicNoneStudentCharacteristicStudentCharacteristic#Economic DisadvantagedStudentCharacteristicEconomic DisadvantagedStudentCharacteristic#HomelessStudentCharacteristicHomelessStudentCharacteristic#RunawayStudentCharacteristicRunawayRaceNoneRaceRace#American Indian - Alaska NativeRaceAmerican Indian - Alaska NativeRace#AsianRaceAsianRace#Black - African AmericanRaceBlack - African AmericanLanguageNoneLanguageLanguage#AdygheLanguageAdygheLanguage#Swiss GermanLanguageSwiss Germanetc.

StudentSchoolCharacteristicBridge

The compound primary key is added for the sake of modeling tools that require a primary key column. As these are views, not tables, it would be impractical to try to create a synthetic key here. The example below shows a single student with three different demographic characteristics.

StudentDemographicsBridgeKeyStudentKeySchoolKeyStudentCharacteristicKey​604823-​Language#Chinese​604823123456​Language#Chinese604823-Race#Asian604823123456Race#Asian604823-StudentCharacteristic#Homeless604823123456StudentCharacteristic#Homeless

2d. Multiple Bridge Tables

Option (c) is tidy-looking, with its one "table". Nonetheless two objections quickly emerge:

  1. Difficult to write these views.
  2. Not very intuitive for the data analyst - must do too much work to get to a specific demographic category.

So as an alternative, don't combine these various demographics into a single view. Leave as separate views. There are only two bridge views in the diagram below, but in fact there would be many more for the various demographic characteristics listed above. This might be appropriately easy for the data analyst, but it certainly crowds the data model with a lot of normalized / snowflake data - contrary to the general point of the Analytics Middle Tier!

Gliffy
nameMultiple Bridges
pagePin2

The current proposal (as of ):

  1. Eliminate the idea of a separate "Student Dimension" in the core data collection.
  2. Create two new, very similar, dimensions to replace the old StudentDimension:
    1. StudentSchoolDim 
    2. StudentLocalEducationAgencyDim 
  3. Combine the data from various edfi.StudentEducationOrganizationAssociationXYZ  tables into a single view, StudentDemographicDim.
  4. Create two bridge tables to link student information to the characteristics
    1. StudentSchoolDemographicBridge 
    2. StudentLocalEducationAgencyDemographicBridge 

Gliffy
nameStudent to Characteristic Relationships
pagePin2

Rationale for the Student Dimension replacement:

  • Students don't (or shouldn't) exist in isolation from an organization - hence no need for a StudentDim.
  • Across the Ed-Fi data model, there are two different student relationships:
  • With the school (e.g. StudentSchoolAssociation, StudentDisciplineIncident, StudentGradebookEntry, etc.).
  • And with the more generic Education Organization - which in the current context of the Analytics Middle Tier, generally means Local Education Agency.

    Note

    Exception: StudentAssessment is only connected directly to a Student! From analytics viewpoint, we will define the Analytics Middle Tier as assuming that analytics on assessment data will always be in the context of a School or Local Education Agency.

  • In defining meaningful StudentSchool  and StudentLocalEducationAgency  entities, there will be some overlap of fields - but the data could be different. This is an inherently dangerous area of the Ed-Fi data model. If we were to combine the data into a single perspective, then we would be hiding the danger. The data analyst will need to read and understand why there are two "root entities" for data reporting, and then choose which one to use based on their implementation.
  • The Analytics Middle Tier is intended for Local Education Agency use cases. Other use cases can be added in the future as needed to support other types of Education Organizations (e.g. a future view StudentStateEducationAgencyDim ).
  • Rationale for combining the various characteristic tables into a single point in time view:

    1. There are seven similar characteristics tables that do not have time periods associated with them
      1. CohortYear
      2. Disability
      3. DisabilityDesignation
      4. Language
      5. LanguageUse
      6. Race
      7. TribalAffiliation
    2. And there are two with a time period
      1. ProgramParticipation
      2. StudentCharacteristic
    3. Those without a time period can be combined into a single view for "demographics"
    4. Those  without a time period can also be included in that view, so long as the data analyst understands that the "Bridge" between the student and the demographics represents "data as of right now".
    5. The two with a time period can, in the future, be used to create new Fact views that link to the date range. See Program Views below.

    DemographicDim

    Ultimately these values come from the edfi.Descriptor  table, although not all descriptors will be here. String values will be used for keys instead of DescriptorId in order to allow combining data from multiple year-specific ODS databases into a single data mart - this would not be possible with the auto-incremented DescriptorId  since that value will differ between ODS database instances.

    Structure

    ColumnData TypeSourceDescription​DemographicKeyString​

    "{Source Table}" or 

    "{Source Table}.{Descriptor.CodeValue}"

    Primary key.​

    Made up of the table source and the Descriptor value. To support hierarchies, there will also be a root Key with only the table source value.

    ParentKeyStringsame as above

    Facilitates creation of roll-up / hierarchy in BI tools by relating each individual record to its "parent concept".

    DemographicLabelString"{Descriptor}.{CodeValue}" for all Descriptors related to the relevant tables*.For parent entities, will be the same as the Key. For child entities, will be the actual demographic label.

    * source tables:

    1. CohortYear
    2. Disability
    3. DisabilityDesignation
    4. Language
    5. LanguageUse
    6. Race
    7. TribalAffiliation
    8. ProgramParticipation (where time period encompasses "now")
    9. StudentCharacteristic (where time period encompasses "now")

    Sample Records

    DemographicKeyParentKeyDemographicLabel

    ​None

    (warning) This row exists so that no record will have a null ParentKey .

    None​Top Parent​StudentCharacteristicNoneStudentCharacteristicStudentCharacteristic#Economic DisadvantagedStudentCharacteristicEconomic DisadvantagedStudentCharacteristic#HomelessStudentCharacteristicHomelessStudentCharacteristic#RunawayStudentCharacteristicRunawayRaceNoneRaceRace#American Indian - Alaska NativeRaceAmerican Indian - Alaska NativeRace#AsianRaceAsianRace#Black - African AmericanRaceBlack - African AmericanLanguageNoneLanguageLanguage#AdygheLanguageAdygheLanguage#Swiss GermanLanguageSwiss Germanetc.

    StudentSchoolDim

    Gliffy
    nameStudentSchoolDim ERD
    pagePin2

    Structure

    ColumnData TypeSourceDescriptionStudentSchoolKeyString"{Student.StudentUniqueId}-{StudentSchoolAssociation.SchoolId}"Primary key​StudentKeyStringedfi.Student.UniqueId​SchoolKeyStringedfi.StudentSchoolAssociation.SchoolIdStudentFirstNameStringedfi.Student.FirstNameStudentMiddleNameStringedfi.Student.MiddleNameStudentLastNameStringedfi.Student.LastSurnameContactNameStringanalytics.ContactDimension
    First primary contact
    ContactRelationshipStringContactAddressStringContactMobilePhoneNumberStringContactWorkPhoneNumberStringContactEmailAddressStringEnrollmentDateKeyStringedfi.StudentSchoolAssociation.EntryDateformatted as YYYY-MM-DDGradeLevelStringedfi.Descriptor.CodeValue via edfi.StudentSchoolAssociation.EntryGradeLevelDescriptorIdReplace null with "Unknown"LimitedEnglishProficiency

    String

    edfi.Descriptor.CodeValue via edfi.StudentEducationOrganizationAssociation

    → If not set, reports "Not Applicable" instead of null value

    Replace null with "Not Applicable"SchoolFoodService

    String

    edfi.Descriptor.CodeValue via

    edfi.StudentSchoolFoodServiceProgramAssociation

    Replace null with "Not Applicable"IsHispanic

    Boolean

    edfi.StudentEducationOrganizationAssociation.HispanicLatinoEthnicity

    Replace null with falseSexStringedfi.Descriptor.CodeValue via edfi.StudentEducationOrganizationAssociationReplace null with "Unknown"LastModifiedDateDateTime

    Most recent date from any source that has a LastModifiedDate column

    Sample Records

    DemographicKeyParentKeyDemographicLabel

    ​None

    (warning) This row exists so that no record will have a null ParentKey .

    None​Top Parent​StudentCharacteristicNoneStudentCharacteristic

    StudentLocalEducationAgencyDim

    WarningThis is still a work in progress. Also a problem of mixing granularity of School and Ed Org data. 04

    Summary:

    1. Eliminate the idea of a separate "Student Dimension" in the core data collection.
    2. Create two new, very similar, dimensions to replace the old StudentDimension:
      1. StudentSchoolDim 
      2. StudentLocalEducationAgencyDim 
    3. Combine the data from various edfi.StudentEducationOrganizationAssociationXYZ  tables into a single view, StudentDemographicDim.
    4. Create two bridge tables to link student information to the characteristics
      1. StudentSchoolDemographicBridge 
      2. StudentLocalEducationAgencyDemographicBridge 


    Gliffy
    nameStudent to Characteristic Relationships
    pagePin2

    Rationale for the Student Dimension replacement:

    1. Students don't (or shouldn't) exist in isolation from an organization - hence no need for a StudentDim.
    2. Across the Ed-Fi data model, there are two different student relationships:
      1. With the school (e.g. StudentSchoolAssociation, StudentDisciplineIncident, StudentGradebookEntry, etc.).
      2. And with the more generic Education Organization - which in the current context of the Analytics Middle Tier, generally means Local Education Agency.

        Note

        Exception: StudentAssessment is only connected directly to a Student! From analytics viewpoint, we will define the Analytics Middle Tier as assuming that analytics on assessment data will always be in the context of a School or Local Education Agency.


    3. In defining meaningful StudentSchool  and StudentLocalEducationAgency  entities, there will be some overlap of fields - but the data could be different. This is an inherently dangerous area of the Ed-Fi data model. If we were to combine the data into a single perspective, then we would be hiding the danger. The data analyst will need to read and understand why there are two "root entities" for data reporting, and then choose which one to use based on their implementation.
    4. The Analytics Middle Tier is intended for Local Education Agency use cases. Other use cases can be added in the future as needed to support other types of Education Organizations (e.g. a future view StudentStateEducationAgencyDim ).

    Rationale for combining the various characteristic tables into a single point in time view:

    1. There are seven similar characteristics tables that do not have time periods associated with them
      1. CohortYear
      2. Disability
      3. DisabilityDesignation
      4. Language
      5. LanguageUse
      6. Race
      7. TribalAffiliation
    2. And there are two with a time period
      1. ProgramParticipation
      2. StudentCharacteristic
    3. Those without a time period can be combined into a single view for "demographics"
    4. Those  without a time period can also be included in that view, so long as the data analyst understands that the "Bridge" between the student and the demographics represents "data as of right now".
    5. The two with a time period can, in the future, be used to create new Fact views that link to the date range. See Program Views below.

    DemographicDim


    Ultimately these values come from the edfi.Descriptor  table, although not all descriptors will be here. String values will be used for keys instead of DescriptorId in order to allow combining data from multiple year-specific ODS databases into a single data mart - this would not be possible with the auto-incremented DescriptorId  since that value will differ between ODS database instances.

    Structure

    ColumnData TypeSourceDescription
    ​DemographicKeyString​

    "{Source Table}" or 

    "{Source Table}.{Descriptor.CodeValue}"

    Primary key.​

    Made up of the table source and the Descriptor value. To support hierarchies, there will also be a root Key with only the table source value.

    ParentKeyStringsame as above

    Facilitates creation of roll-up / hierarchy in BI tools by relating each individual record to its "parent concept".

    DemographicLabelString"{Descriptor}.{CodeValue}" for all Descriptors related to the relevant tables*.For parent entities, will be the same as the Key. For child entities, will be the actual demographic label.

    * source tables:

    1. CohortYear
    2. Disability
    3. DisabilityDesignation
    4. Language
    5. LanguageUse
    6. Race
    7. TribalAffiliation
    8. ProgramParticipation (where time period encompasses "now")
    9. StudentCharacteristic (where time period encompasses "now")

    Sample Records

    DemographicKeyParentKeyDemographicLabel

    ​None

    (warning) This row exists so that no record will have a null ParentKey .

    None​Top Parent​
    StudentCharacteristicNoneStudentCharacteristic
    StudentCharacteristic#Economic DisadvantagedStudentCharacteristicEconomic Disadvantaged
    StudentCharacteristic#HomelessStudentCharacteristicHomeless
    StudentCharacteristic#RunawayStudentCharacteristicRunaway
    RaceNoneRace
    Race#American Indian - Alaska NativeRaceAmerican Indian - Alaska Native
    Race#AsianRaceAsian
    Race#Black - African AmericanRaceBlack - African American
    LanguageNoneLanguage
    Language#AdygheLanguageAdyghe
    Language#Swiss GermanLanguageSwiss German
    etc.



    StudentSchoolDim

    Gliffy
    nameStudentSchoolDim ERD
    pagePin3

    ColumnData TypeSourceDescription
    StudentSchoolKeyString"{Student.StudentUniqueId}-{StudentSchoolAssociation.SchoolId}"Primary key
    ​StudentKeyStringedfi.Student.UniqueId​
    SchoolKeyStringedfi.StudentSchoolAssociation.SchoolId
    StudentFirstNameStringedfi.Student.FirstName
    StudentMiddleNameStringedfi.Student.MiddleName
    StudentLastNameString

    edfi.Student.LastSurname


    EnrollmentDateKeyStringedfi.StudentSchoolAssociation.EntryDateformatted as YYYY-MM-DD
    GradeLevelStringedfi.Descriptor.CodeValue via edfi.StudentSchoolAssociation.EntryGradeLevelDescriptorIdReplace null with "Unknown"
    LimitedEnglishProficiency

    String

    edfi.Descriptor.CodeValue via edfi.StudentEducationOrganizationAssociation

    Replace null with "Not Applicable"
    SchoolFoodService

    String

    edfi.Descriptor.CodeValue via

    edfi.StudentSchoolFoodServiceProgramAssociation

    Replace null with "Not Applicable"
    IsHispanic

    Boolean

    edfi.StudentEducationOrganizationAssociation.HispanicLatinoEthnicity

    Replace null with false
    SexStringedfi.Descriptor.CodeValue via edfi.StudentEducationOrganizationAssociationReplace null with "Unknown"
    LastModifiedDateDateTime

    Most recent date from any source that has a LastModifiedDate column



    Info

    The (first) primary contact was included in the original Student Dimension to further flatten the model. However, this had a large performance cost. To improve performance, flattening the primary contact is now left as an exercise for downstream semantic models - for example in a SSAS Tabular Data Model. 

    StudentLocalEducationAgencyDim

    Gliffy
    nameStudentLocalEducationAgencyDim
    pagePin1

    ColumnData TypeSourceDescription
    StudentLocalEducationAgencyKeyString"{Student.StudentUniqueId}-{LocalEducationAgency.EducationOrganizationId}"Primary key
    ​StudentKeyStringedfi.Student.UniqueId​
    LocalEducationAgencyKeyStringedfi.LocalEducationAgency.EducationOrganizationId
    StudentFirstNameString

    edfi.Student.FirstName


    StudentMiddleNameStringedfi.Student.MiddleName
    StudentLastNameStringedfi.Student.LastSurname
    LimitedEnglishProficiency

    String

    edfi.Descriptor.CodeValue via edfi.StudentEducationOrganizationAssociation

    Replace null with "Not Applicable"
    IsHispanic

    Boolean

    edfi.StudentEducationOrganizationAssociation.HispanicLatinoEthnicity

    Replace null with false
    SexStringedfi.Descriptor.CodeValue via edfi.StudentEducationOrganizationAssociation
    LastModifiedDateDateTime

    Most recent date from any source that has a LastModifiedDate column


    Alternatives

    The following alternatives were considered and rejected

    Split StudentDim into StudentDim and StudentEnrollmentDim

    The original StudentDimension  would be split in two: StudentDim  with no SchoolKey  in it and a StudentEnrollmentDim  (or StudentDemographicDim ) holding the Student-to-school relationship and demographics. Rejected for these reasons:


    1. Trying to keep the number of views as small as possible, so that the domain model is easier to understand compared to the source Ed-Fi data model.
    2. Generally need to query for that student-school relationship - not for a student in isolation.
    3. Keeping only a single "enrollment" or "demographic" dimension for the student requires implementing business logic to determine which demographics take precedence - if demographics are saved for both school and local education agency, then when one should be used? Whichever choice is made, it will likely be wrong for many implementations.

    Change StudentDim to StudentSchoolDim

    In this version, the old StudentDimension  is essentially renamed to StudentSchoolDim - largely preserving the old structure. Compared to the proposed model, this version allows the data analyst to quickly and easily find the right student information. It also relieves the analyst from having to decide which version of truth to use - the School or the Local Education Agency. As mentioned above, it has been decided that the Analytics Middle Tier should not gloss over this difficulty: the data analyst must inspect their implementation and decide which perspective (School or Local Education Agency) is appropriate in each circumstance.

    Create Separate Bridge Tables for Each Demographic

    Instead of combining the demographics into a single bridge view, we could have created one for each concept: Disability, Race, Tribal Affiliation, etc. On one level, this would have simplified the data analyst's work when looking for a particular demographic field: they can just look for the view with the word "Race" in the name, for example. However, this comes at the expense of proliferating more tables, making the Analytics Middle Tier look too much like the Ed-Fi data standard.

    Gliffy
    nameMultiple Bridges
    pagePin2

    Note

    Beginning to feel confident about the outlined proposal.  Stephen Fuqua

    Jira Legacy
    serverEd-Fi Issue Tracker
    columnskey,summary,status
    maximumIssues20
    jqlQuery"Epic Link" = BIA-303
    serverIde04b01cb-fd08-30cd-a7d6-c8f664ef7691

    Program Views

    Requirement

    Support analytics on Program Participation at the school level.

    Context

    A small set of Program-related views was added to Analytics Middle Tier as an experiment in supporting a second use-case: analyzing student program participation. Programs in the default Ed-Fi ODS template include "Bilingual", "Career and Technical Education", "Special Education", and a few others. These data are represented in two different fact views: analytics.StudentProgramEvent  and analytics.StudentProgramFact 

    The "Event" view represents the date on which a student entered or exited a program. The "Fact" view represents every day on which the student was in a program. Each perspective has its own utility in analytics / reporting.

    Note, however, that they both join to analytics.LocalEducationAgencyDimension . There is no linkage to schools. This is because the data modeler originally heard (or thought he heard) that program enrollment is "always" at the district level. Since then, he has received feedback that many implementations do link students to programs at the school level, or even at the state level.

    Design

    a Remove the Views

    Eliminate the problem by eliminating the views, unless and until we get a detailed real-world use case definition that would solve these problems.

    b Add a SchoolKey to Both Views

    This implies that SchoolKey  or LocalEducationAgencyKey  could be null, generally an undesirable situation in dimensional modeling. A few options:

    1. Ignore the problem: downstream data analyst have to join the program views to SchoolDimension  or LocalEducationAgencyDimension  with an outer join.
      1. (tick) Good for data architect.
      2. (warning) Dangerous for data analyst.

    2. Nulls can be eliminated - or at least nearly eliminated - for LocalEducationAgencyKey by loading a School's LocalEducationAgencyKey  value. 
      1. (warning) Moderate additional complexity for data architecture.
      2. (tick)(warning) Resolves one outer join problem but leaves the other in place.

    3. Create a "fake school" for each LEA in the SchoolDimension , with SchoolName = 'n/a' . Use this as the SchoolKey  when program participation is only at the LEA level.
      1. (warning) Ugly for the data architect, although not impossible.
      2. (tick)(warning) Resolves the other outer join problem, at the expense of having a strange "District" entry show up in School filters. Dubious value.

    4. Separate the views into copies for School and LocalEducationAgency.
      1. (error) Just forces the problem onto the data analyst.

    The Data Standard  shows that a School can belong to 0 or 1 Local Education Agency. Side note: that Agency might be a Charter Management Organization. Thus option 2 can still lead to lost records when using an INNER JOIN. As with Option 3, null/missing records can be eliminated by creating a "n/a" LocalEducationAgency for these schools.

    If these program views are to be kept, then a combination of options 2 and 3 seems like the only option that presents a useful interface to the data analyst.

    Tip
    titleStatus

    Going to defer for a real use case so that we don't mislead anyone. Taking the program views out of Analytics Middle Tier 2.0. 

    Jira Legacy
    serverEd-Fi Issue Tracker
    columnskey,summary,type,created,updated,due,assignee,reporter,priority,status,resolution
    serverIde04b01cb-fd08-30cd-a7d6-c8f664ef7691
    keyBIA-293

     Stephen Fuqua

    School Year

    Requirement

    Add SchoolYear to help support longitudinal data / multi-year databases. Wherever possible, would be nice to support drill-down hierarchies by school year.

    Design

    The following dimension views could have a SchoolYear  column in them; Data Standard 2's support for School Year is limited compared to Data Standard 3. 

    Data Standard 2Data Standard 3
    ​Student / Student EnrollmentStudent / Student Enrollment
    Student SectionStudent Section

    Date

    Grading Period

    The multi-year use-case was not originally one of the goals of the Analytics MIddle Tier, so no consideration was given to adding to the two views that could support it. It will be trivial to add to these two views in common above.

    For Date and Grading Period, there is real value. To support in Data Standard 2, we would need to create a mapping table or extra column on each of those two tables. This takes into account that one record could below to multiple school years in some edge cases. The additional effort required may push solving for Date and Grading Period to a future release, e.g. Analytics Middle Tier 2.1.

    Tip

    Decided to support SchoolYear in the student-school relationship and in the student-section relationship in Analytics Middle Tier 2.0. SchoolYear column will exist for Data Standard 2 but will not be populated where not available.  Stephen Fuqua

    Jira Legacy
    serverEd-Fi Issue Tracker
    columnskey,summary,status
    maximumIssues20
    jqlQuery"epic link" = BIA-295
    serverIde04b01cb-fd08-30cd-a7d6-c8f664ef7691

    Separation Between Core and Use-Case Views

    Requirement

    Manage a collection of "core" views and separate collections of use-case specific views.

    Design

    The application already has a concept for installing optional components, which was first created for optional install of additional indexes in the ODS. Proposal:

    1. Always install a core set of views
      1. ContactPersonDimension
      2. DateDimension
      3. GradingPeriodDimension
      4. LocalEducationAgencyDimension
      5. MostRecentGradingPeriod
      6. SchoolDimension
      7. SchoolNetworkAssociationDimension
      8. StudentDimension
      9. StudentEnrollmentDimension (if created, see above)
      10. StudentSectionDimension

    2. Move some of the existing views into new optional collections:
      1. Row-level Security (RLS)
        1. StudentDataAuthorization
        2. UserAuthorization
        3. UserDimension

      2. Early Warning System (EWS)
        1. StudentEarlyWarningFact
        2. StudentSectionGradeFact

      3. QuickSight-Early Warning System (QEWS)
        1. Ews_SchoolRiskTrend
        2. Ews_StudentAttendanceTrend
        3. Ews_StudentEnrolledSectionGrade
        4. Ews_StudentEnrolledSectionGradeTrend
        5. Ews_StudentIndicators
        6. Ews_StudentIndicatorsByGradingPeriod
        7. Ews_UserSchoolAuthorization

      4. Program Analysis (PROGRAM)
        1. ProgramTypeDimension
        2. StudentProgramEvent
        3. StudentProgramFact

          Tip

          Thus to install the Early Warning System and Row-level security collections used by the Power BI Starter Kit v2, the admin user would run this command:

          Code Block
          .\EdFi.AnalyticsMiddleTier.exe --connectionString "..." --options EWS RLS



    3. Avoid name overlaps
      1. Option 1: separate by "namespace" (schema).  Instead of having a single analytics  schema, we could create an analytics_core  schema and other schemas to match use cases:

        v1 Namev2 Name
        analytics.​ContactPersonDimensionanalytics_core.​ContactPersonDimension
        analytics.DateDimensionanalytics_core.DateDimension
        analytics.Ews_SchoolRiskTrendanalytics_qews.SchoolRiskTrend
        analytics.Ews_StudentAttendanceTrendanalytics_qews.StudentAttendanceTrend
        analytics.Ews_StudentEnrolledSectionGradeanalytics_qews.StudentEnrolledSectionGrade
        analytics.Ews_StudentEnrolledSectionGradeTrendanalytics_qews.StudentEnrolledSectionGradeTrend
        analytics.Ews_StudentIndicatorsanalytics_qews.StudentIndicators
        analytics.Ews_StudentIndicatorsByGradingPeriodanalytics_qews.StudentIndicatorsByGradingPeriod
        analytics.Ews_UserSchoolAuthorizationanalytics_qews.UserSchoolAuthorization
        analytics.GradingPeriodDimensionanalytics_core.GradingPeriodDimension
        analytics.LocalEducationAgencyDimensionanalytics_core.LocalEducationAgencyDimension
        analytics.MostRecentGradingPeriodanalytics_core.MostRecentGradingPeriod
        analytics.ProgramTypeDimensionanalytics_program.ProgramTypeDimension
        analytics.SchoolDimensionanalytics_core.SchoolDimension
        analytics.SchoolNetworkAssociationDimensionanalytics_core.SchoolNetworkAssociationDimension
        analytics.StudentDataAuthorizationanalytics_rls.StudentDataAuthorization
        analytics.StudentDimensionanalytics_core.StudentDimension
        analytics.StudentEarlyWarningFactanalytics_ews.StudentEarlyWarningFact
        analytics.StudentProgramEventanalytics_program.StudentProgramEvent
        analytics.StudentProgramFactanalytics_program.StudentProgramFact
        analytics.StudentSectionDimensionanalytics_core.StudentSectionDimension
        analytics.StudentSectionGradeFactanalytics_ews.StudentSectionGradeFact
        analytics.UserAuthorizationanalytics_rls.UserAuthorization
        analytics.UserDimensionanalytics_rls.UserDimension
        analytics.UserStudentDataAuthorizationanalytics_rls.UserStudentDataAuthorization


      2. Option 2: keep everything in a single schema, ensuring unique names, so that downstream data models (without namespaces/schemas) do not need to name their models differently than the views. Put use case name as object name prefix.

        v1 Namev2 Name
        analytics.​ContactPersonDimensionanalytics.​ContactPersonDim
        analytics.DateDimensionanalytics.DateDim
        analytics.Ews_SchoolRiskTrendanalytics.qews_SchoolRiskTrend
        analytics.Ews_StudentAttendanceTrendanalytics.qews_StudentAttendanceTrend
        analytics.Ews_StudentEnrolledSectionGradeanalytics.qews_StudentEnrolledSectionGrade
        analytics.Ews_StudentEnrolledSectionGradeTrendanalytics.qews_StudentEnrolledSectionGradeTrend
        analytics.Ews_StudentIndicatorsanalytics.qews_StudentIndicators
        analytics.Ews_StudentIndicatorsByGradingPeriodanalytics.qews_StudentIndicatorsByGradingPeriod
        analytics.Ews_UserSchoolAuthorizationanalytics.qews_UserSchoolAuthorization
        analytics.GradingPeriodDimensionanalytics.GradingPeriodDim
        analytics.LocalEducationAgencyDimensionanalytics.LocalEducationAgencyDim
        analytics.MostRecentGradingPeriodanalytics.MostRecentGradingPeriod
        analytics.ProgramTypeDimensionanalytics.program_ProgramTypeDimension
        analytics.SchoolDimensionanalytics.SchoolDim
        analytics.SchoolNetworkAssociationDimensionanalytics.SchoolNetworkAssociationDim
        analytics.StudentDataAuthorizationanalytics.rls_StudentDataAuthorization
        analytics.StudentDimensionanalytics.StudentDim
        analytics.StudentEarlyWarningFactanalytics.ews_StudentEarlyWarningFact
        analytics.StudentProgramEventanalytics.program_StudentProgramEvent
        analytics.StudentProgramFactanalytics.program_StudentProgramFact
        analytics.StudentSectionDimensionanalytics.StudentSectionDim
        analytics.StudentSectionGradeFactanalytics.ews_StudentSectionGradeFact
        analytics.UserAuthorizationanalytics.rls_UserAuthorization
        analytics.UserDimensionanalytics.rls_UserDim
        analytics.UserStudentDataAuthorizationanalytics.rls_UserStudentDataAuthorization


      3. Option 3: keep everything in single schema and don't force prefixing for use cases. Just have clear and unique names for views. Prefix on case-by-case basis.

        Note

        Leaning toward option (b). Additional benefit: helps the reader know where to look up additional information about use-case specific views, such as important usage notes.


    Tip

    Going with option (b).  Stephen Fuqua

    Jira Legacy
    serverEd-Fi Issue Tracker
    columnskey,summary,status
    maximumIssues20
    jqlQuery"Epic Link" = BIA-298
    serverIde04b01cb-fd08-30cd-a7d6-c8f664ef7691

    Additional Views

    Will not add any new views in the 2.0 release. New views can be added with 2.1, 2.2 etc. This 2.0 release is all about fixing architectural problems and setting the stage for broader adoption.

    Documentation

    End-Users

    Decisions made in defining the Ed-Fi data model are allowing a great deal of flexibility in storing data, at the expense of un-intuitive complexity. Users of the Analytics Middle Tier need to know about the complexities in order to use this tool effectively. For example, if adopting option (b) to solve the Program view problem, there needs to be clear guidance to help the end-user.

    Users also need to be made aware of potential data quality issues, for example with the Student Demographics. If a student is enrolled in two schools at a time, and they don't both enter the same demographic information (e.g. one accidentally clicks on the wrong gender, or one does not mark student as Hispanic/Latino), then how will the data analyst know and reconcile this? The Ed-Fi Alliance cannot prescribe an answer: it depends on the implementation.

    For the (rare?) case that the console deployment tool does not work, provide guidance on directly accessing the views from the source code repository. Warn that scripts, when manually executed, need to be run in numeric order of file name, starting with the Core collection first and then installing other collections as needed.

    Other issues will likely arise, so that end-user documentation will be an ongoing exercise.

    Contributors

    Documentation for contributors to the project will need to spell out how to contribute; how to create use-cases; naming conventions; when and how to place a new view into the Core collection.

    Version 2 versus Version 3 support.


    Note
    titleDocument Status

    Work-in-progress draft. This notice will be removed when the "final" design decisions are documented.

     :

    • Completely reworked Student Dimension section, splitting out Program view problems and giving more context, as well as new solutions, for the demographics / enrollment problem.
    • Notes on documentation requirements.

     :

    • Demographics bridge designs.
    • Committing to "Dim" suffix.

     :

    • Only committing to partial Data Standard 2 support.
    • Commitment to School Year on two simple cases for now.
    • Removing program views.
    • Moving original 1-1 student demographics to StudentSchoolDemographics, and M-M demographics to a single bridge table StudentSchoolCharacteristics.
      • Actually needs a little more thought: Student → EducationOrganization.

     :

    • Decision on collection naming convention.

     

    • Fully rewritten proposal for student enrollment and demographics.


    Table of Contents
    maxLevel3