Skip to end of metadata
Go to start of metadata

You are viewing an old version of this content. View the current version.

Compare with Current View Version History

« Previous Version 19 Next »

Introduction

Analytics Middle Tier 2.0 is becoming a full citizen in the Ed-Fi platform in 2020, rather than just a proof-of-concept on the Ed-Fi Exchange. As it grows up, it needs to correct some architectural concerns that came up as feedback from the field. It also needs to be up to par with the latest release of the ODS/API, version 3.3. This document aims to inform about the challenges and elicit feedback on the real-world usefulness of the proposed solutions.

Naming Convention

Requirement

Hold names to under 63 characters for PostgreSQL compatibility.

Design

Currently, none of the objects have a name that violates this constraint. To help avoid problems with future views, it is proposed to either truncate "Dimension" to "Dim" or drop the word altogether.

Old NameA - TruncateB - Drop
​analytics.StudentDimensionanalytics.StudentDim​analytics.Student​

Is there too much risk of confusing the analytics.Student  view with the real edfi.Student  table when "Dimension" is entirely dropped? The typical use case for Analytics Middle Tier is to only import the views into a business intelligence / reporting data model - thus the end-user would not see the edfi.Student  table

Status

This is a substantial breaking change compared to 1.0. Since 1.0 adoption is limited, this breaking change would also be very limited - but it needs to happen now if it is going to happen.

Requesting additional input before committing. Respond to Stephen Fuqua, #dev-analytics-middle-tier on Slack, or with a comment on this page.

Multi Data Standard Support

Requirement

Support installing the views on ODS databases supporting multiple data standards (2.2, 3.1, 3.2).

Is support for Data Standard 2 really necessary in this next release? Ed-Fi Alliance is committed to both 3.1 and 3.2 in all releases in 2020, but has not decided about 2.x. Welcoming input on this.

Design

Version 1.3.0 added support for Data Standard 3.1, which was used by ODS/API 3.1.1 and 3.2, through the use of the –dataStandard <Ds2 | Ds31>  argument. 

Forcing the user to remember which data standard is installed is sub-optimal. We should be able to detect this implicitly and install the correct version without user input.

  1. If table AddressType exists, then install Data Standard 2. (warning) if needed
  2. Else if table VersionLevel exists, then install Data Standard 3.1.
  3. Else if table DeployJournal exists, then install Data Standard 3.2.
  4. Else throw an error: "Unable to determine the ODS database version".

Status

Will proceed with this design.  Stephen Fuqua

Student, Parent, and Staff Keys

Requirement

The views should expose "Key" fields based on the natural key of the underlying table. 

Design

In the case of StudentDimension , ContactPersonDimension , and UserDimension , the original release used StudentUSI , ParentUSI , and StaffUSI  respectively. The "USI" columns are primary keys and were used by mistake. The "UniqueId" columns are the correct natural keys.

Change all instances of StudentKey , ContactPersonKey , and UserKey  to use the corresponding "UniqueId" column from the source table.

Will proceed with this design.  Stephen Fuqua

Descriptor and Type Mapping

Requirement

Decouple the views from hard-coded Descriptor and Type values. 

Context

many of the views need to lookup records by Descriptor value - for instance, looking up the Attendance records where a student has an "Excused Absence" or "Unexcused Absence." Because the original developer had access to only a limited dataset, it was not realized that the Descriptor values will vary widely from one implementation to the next. Thus the hard-coding needs to be decoupled, allowing the implementation to provide a mapping from their Descriptor value to the concept used by the Analytics Middle Tier.

In theory, the various "Types" values in Data Standard 2 should provide a more universal constant than the Descriptors. However, some community members report that these too are mutable. Therefore, (a) using Types is not a solution for Data Standard 2, and (b) even those views with hard-coding to Types instead of Descriptors must be modified for greater independence. Note: Type tables were removed in Data Standard 3 precisely because they were not being used in the originally-designed manner. 

Design

Summary

  1. Move hard-coded values to a "Constants" table.
  2. Create mapping tables that link Descriptors or Types to Constants.
  3. Modify all views as needed to join to the Constants and new mapping tables.

 List of Descriptor and Type Constants...
ConstantNamePurpose
AddressType.Home Looking up ContactPerson's Home address
AddressType.Mailing Looking up ContactPerson's Mailing address
AddressType.Physical Looking up ContactPerson's Physical address
AddressType.TemporaryLooking up ContactPerson's Temporary address
AddressType.Work Looking up ContactPerson's Work address
Descriptor.Absent

Looking up StudentAbsenceEvents that should be treated as "Absent" in an Early Warning System. Example descriptor values to map might be "Excused Absence" and "Unexcused Absence."

As another example, if a Field Trip absence event should be treated as an absence from school for the purpose of Early Warning, then one would also map the descriptor for "Field Trip" to the constant "Descriptor.Absent".

Descriptor.TardyLooks up StudentAbsenceEvents that should be treated as "Tardy".
Descriptor.InstructionalDay

Determines if a calendar date is an instructional day that should be used in calculating attendance rates.

The Ed-Fi default template mapping would use both the "Instructional Day" and "Make Up Day" descriptors.

  
EmailType.Home/Personal Looking up ContactPerson's home or personal e-mail address.
EmailType.WorkLooking up ContactPerson's work e-mail address.  
FoodServicesDescriptor.FullPriceDetermines if a student is eligible for school food service.
GradeType.Grading PeriodLooking up the Grade records by the most granular period, which by default is "Grading Period". Some implementations might instead use terms like "Quarter" or "Six Weeks".
TelephoneNumberType.HomeLooking up ContactPerson's Home phone number.
TelephoneNumberType.MobileLooking up ContactPerson's Mobile phone number.
TelephoneNumberType.WorkLooking up ContactPerson's Work phone number.
Group.TeacherSupports creation of row-level authorization data.
Group.PrincipalSupports creation of row-level authorization data.
Group.SuperintendentSupports creation of row-level authorization data.

Example

In Version 1.x, the StudentEarlyWarningFact view reports on excused and unexcused absences, looking for StudentSchoolAttendanceEvent  records with attendance descriptor values of either "Excused Absence" or "Unexcused Absence".

In version 2, the view would now search for all StudentSchoolAttendanceEvent records whose descriptor maps to the relevant constant. Thus there would be two DescriptorMap  values, one each for "Excused Absence" and "Unexcused Absence." Any school who uses a different term than these two would create a DescriptorMap  record mapping that term to the DescriptorConstant  value of "Absent".

Implications

Those who install the Analytics Middle Tier will need to carefully assess their Descriptors and Types, and then manage the DescriptorMap table (and TypeMap , for Data Standard 2) accordingly. 

Default Mappings

A new command-line Option will be provided to run a script that loads the default Descriptor mapping for the default Ed-Fi descriptors (minimal/populated template descriptors).

.\EdFi.AnalyticsMiddleTier.exe --connectionString "..." --options DefaultMap

Status

Will proceed with this design.  Stephen Fuqua

Student Dimension Uniqueness

Requirement

Create a "Student" dimension with a single unique key.

Context

The Early Warning System fact views both assumed that the StudentDimension  would only have a single record for a student, as shown in these diagrams:

However, a student could be enrolled in multiple schools at the same time, resulting in two records in the StudentDimension for the same StudentKey. This is problematic for the PowerBI Starter Kit, which has a hard requirement for unique StudentKeys.

At the same time, the Program-related fact and event views are looking at a Student in a given Local Education Agency, whereas the facts above are looking at a Student at a School. Although the ODS database supports a Student enrolled in a Program associated with a School, our input from the field tells us that Program association is on the LEA not the school. Thus we need to balance these competing tensions in any change to the data model.

Design

Data Standard 3 moved the student demographics into the StudentEducationOrganizationAssociation  table, away from the Student table. The current StudentDimension  view provides student name, primary contact, and demographics. The proposal is to split this into two different views, changing the Early Warning System star schema to a snowflake.

The demographics on the view are secondary to the student-school enrollment relationship. Therefore we will have a new view called StudentEnrollment to show active enrollment/school association information, with a StudentEnrollmentKey  that combines the StudentUniqueId  and SchoolId (maintaining the principal of needing only one column for a join). The early warning views would then change to use the single StudentEnrollmentKey  in addition to the two separate StudentKey  and SchoolKey  columns. The original keys are retained to facilitate aggregating by student or school.

The StudentDimension  would continue to exist in a reduced form, providing name and primary contact information. The Program views would continue to have the StudentKey in them (no changes).

Impact on Security

By leaving StudentKey  and SchoolKey in the views above, the current row-level security views will not require any changes.

Problem

What if someone wants to build queries that analyze program participation by demographic data?

Suppose we have the following data (partial representation):

ProgramTypeDimension

ProgramTypeKeyProgramType
​3Spanish Grammar for Native Speakers​

StudentProgramEvent

StudentKeyLocalEducationAgencyKeyDateKeyProgramTypeKeyProgramEventType
​12​20191230​3​Enter

StudentEnrollmentDimension

StudentEnrollmentKeyStudentKeySchoolKeyEnrollmentDateGradeLevelLimitedEnglishProficiency
​1-41​4​201901108Limited
1-515201912308Not applicable

Here a student is enrolled at two schools, presumably in the same district. Perhaps the program enrollment is due to a specialized class in the student's home language, at a different school than the one attended daily. Since data entry occurred twice - once for each school - there is an opportunity for inconsistency. In this case, one of the school enrollment entries failed to note that the student has limited English Proficiency.

We can write the following query to analyze this program proficiency by LimitedEnglishProficiency.

select
	ProgramTypeDimension.ProgramType,
    count(1) as EnrolledCount,
    sum(case when StudentEnrollmentDimension.LimitedEnglishProficiency <> 'Not applicable' then 1 else 0 end) as LimitedEnglishProficiencyCount
from
	analytics.ProgramTypeDimension
inner join
	analytics.StudentProgramEvent on ProgramTypeDimension.ProgramTypeKey = StudentProgramEvent.ProgramTypeKey
inner join
	analytics.StudentEnrollmentDimension on StudentProgramEvent.StudentKey = StudentEnrollmentDimension.StudentKey
where
	StudentProgramEvent.ProgramEventType = 'Enter'
group by
	ProgramTypeDimension.ProgramType

The result will be

ProgramTypeEnrolledCountLimitedEnglishProficiencyCount
​Spanish Grammar for Native Speakers​2​1​

In reality there is only one student, yet this naive query counted two students. And it will have the odd result of showing that 50% of the students have limited English proficiency!

Arguably, there is a data quality problem that the Analytics Middle Tier cannot solve. But there is also a training issue: the data analyst needs a deeper understanding of the nuances of Ed-Fi data storage and of the Analytics Middle Tier views. Ignoring the data inconsistency problem, the following query would resolve the double-count problem:

select
	ProgramTypeDimension.ProgramType,
    count(1) as EnrolledCount,
    sum(case when StudentDimensionEnrollment.LimitedEnglishProficiency <> 'Not applicable' then 1 else 0 end) as LimitedEnglishProficiencyCount
from
	analytics.ProgramTypeDimension
inner join
	analytics.StudentProgramEvent on ProgramTypeDimension.ProgramTypeKey = StudentProgramEvent.ProgramTypeKey
outer apply (
	select distinct
		LimitedEnglishProficiency
	from
		analytics.StudentEnrollmentDimension 
	where
		StudentKey = StudentProgramEvent.StudentKey
) as StudentDimensionEnrollment
where
	StudentProgramEvent.ProgramEventType = 'Enter'
group by
	ProgramTypeDimension.ProgramType

These problems are rather ugly, and would be difficult to detect unless you're actively looking for them. Furthermore, while writing an updated SQL query was easy, how would one represent this in a business intelligence tool like Tableau or PowerBI?

Status

Requesting additional input before committing. Respond to Stephen Fuqua, #dev-analytics-middle-tier on Slack, or with a comment on this page.

School Year

Requirement

Add SchoolYear to help support longitudinal data.

Design

Data Standard 2's support for School Year is limited compared to Data Standard 3. The following dimension views could have a SchoolYear  column in them:

Data Standard 2Data Standard 3
​Student / Student EnrollmentStudent / Student Enrollment
Student SectionStudent Section

Date

Grading Period

The DateDimension  view does not include a SchoolKey as it was intended as a generic date table. Although it strikes the writer as a nonsensical, there could be two different school years for the same date at different schools. Therefore it does not seem appropriate to add this column to the DateDimension

There might be value in having the SchoolYear  in the GradingPeriod  and making the decision that Data Standard 2 would always return null for this column (so that the interfaces stay consistent): one could use this to create a filter hierarchy School Year > Grading Period. However, other than interpolating the begin and end dates on the Grading Period, there is currently no way to build out the hierarchy beyond this. The PeriodSequence is insufficient for determining a parent-child relationship between grading periods.

When trying to filter the StudentEarlyWarningFact  or StudentSectionGradeFact views, the available school years could be extracted from either the Student or Student Section dimensions to create a slicer. However, this would be much simpler if the SchoolYear  were simply included in those fact views.

Status

Requesting additional input before committing. Respond to Stephen Fuqua, #dev-analytics-middle-tier on Slack, or with a comment on this page.

Questions to address. Should we add a SchoolYear  column to...

ViewStephen's View
​Student / Student EnrollmentYes, easy, no harm​
Student SectionYes, easy, no harm
StudentEarlyWarningFactYes: while redundant, it simplifies ability to create self-serve analytics tools
StudentSectionGradeFactYes: while redundant, it simplifies ability to create self-serve analytics tools
GradingPeriodNo: prefer to avoid inconsistency between data standards, and the objective of slicing Facts by SchoolYear can be met better with the tools above.
DateDimensionNo

Separation Between Core and Use-Case Views

Requirement

Manage a collection of "core" views and separate collections of use-case specific views.

Design

The application already has a concept for installing optional components, which was first created for optional install of additional indexes in the ODS. Proposal:

  1. Always install a core set of views
    1. ContactPersonDimension
    2. DateDimension
    3. GradingPeriodDimension
    4. LocalEducationAgencyDimension
    5. MostRecentGradingPeriod
    6. SchoolDimension
    7. SchoolNetworkAssociationDimension
    8. StudentDimension
    9. StudentEnrollmentDimension (if created, see above)
    10. StudentSectionDimension

  2. Move some of the existing views into new optional collections:
    1. Row-level Security (RLS)
      1. StudentDataAuthorization
      2. UserAuthorization
      3. UserDimension

    2. Early Warning System (EWS)
      1. StudentEarlyWarningFact
      2. StudentSectionGradeFact

    3. QuickSight-Early Warning System (Q-EWS)
      1. Ews_SchoolRiskTrend
      2. Ews_StudentAttendanceTrend
      3. Ews_StudentEnrolledSectionGrade
      4. Ews_StudentEnrolledSectionGradeTrend
      5. Ews_StudentIndicators
      6. Ews_StudentIndicatorsByGradingPeriod
      7. Ews_UserSchoolAuthorization

    4. Program Analysis (PROGRAM)
      1. ProgramTypeDimension
      2. StudentProgramEvent
      3. StudentProgramFact

  3. Establish clear standards for distinguishing view names and avoiding name overlaps
    1. Option 1: separate by "namespace" (schema).  Instead of having a single analytics  schema, we could create an analytics_core  schema and other schemas to match use cases:
      1. analytics_rls  for Row-level Security
      2. analytics_ews  for Early Warning System
      3. analytics_quicksight  for QuickSight
      4. analytics_program  for Program Analysis

      Each new use case, whether created by the Alliance or submitted by the field, would get its own schema name. 

    2. Option 2: keep everything in a single schema, ensuring unique names, so that downstream data models (without namespaces/schemas) do not need to name their models differently than the views. 


Thus to install the Early Warning System and Row-level security collections used by the Power BI Starter Kit v2, the admin user would run this command:

.\EdFi.AnalyticsMiddleTier.exe --connectionString "..." --options EWS RLS




For deployment purposes, it would be ideal to separate these by "namespace" - or in SQL terms, by schema.

Requesting additional input before committing. Respond to Stephen Fuqua, #dev-analytics-middle-tier on Slack, or with a comment on this page.

Additional Views

Will not add any new views in the 2.0 release. New views can be added with 2.1, 2.2 etc. This 2.0 release is all about fixing architectural problems and setting the stage for broader adoption.

  • No labels