Longitudinal Indicators

The Ed-Fi community has struggled with the non-longitudinal nature of many indicators in the model. For the purposes of this analysis, an "indicator" is a field describing an important characteristic of a primary data model entity (e.g., a student or a school). The definition is deliberately broad: an indicator's value can be binary (true/false) or carry a specific value ("red", "not submitted", "3.3", etc.).

Tickets: DATASTD-1561

Use Cases

Historically, there have been two important use cases relating to indicators:

Use Case 1: as an agency performing data collections and reporting, I want to capture the value of an indicator for a period of time or at a moment in time (e.g., at the end of a month) so I can perform operations such as issuing payments to districts or performing compliance reporting.

Use Case 2: as a school district, I want to track the movement of indicators over time so I can make instructional decisions about how to intervene to improve the student's education.

The longitudinal nature of indicators is important in case #2, but not in case #1. Because Ed-Fi evolved from simpler to more complex use cases, many indicators still support only case #1.

Current Situation

To take a very simple example, consider this data model. It is not part of the Ed-Fi data model today, but it illustrates the problem as it is found in the Ed-Fi model:

Figure 1: A hypothetical but common situation today, in which an indicator has no dates attached.

The obvious solution is to add dates, but because Ed-Fi uses natural keys, doing so is not straightforward. The model above shows why: adding dates to StudentIndicator (as optional or required fields) in a non-breaking fashion, i.e., outside the key, would not solve use case #2, because the data model would still reject multiple records for the same student and IndicatorName:

Figure 2: An attempt to use this model longitudinally fails because StudentUniqueId and IndicatorName form the primary key, so a second record for the same combination cannot be loaded. From: Situation.sql
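To make the failure concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. It is not a reproduction of Situation.sql. The table shapes and the IndicatorValue, BeginDate, and EndDate columns are assumptions approximating Figure 1, and "AtRisk" is a hypothetical indicator name. Because only StudentUniqueId and IndicatorName form the key, the second, later-dated record is rejected.

```python
import sqlite3

# Hypothetical tables approximating Figure 1. Only StudentUniqueId and
# IndicatorName come from the text; the other columns are assumptions.
SCHEMA = """
CREATE TABLE Student (
    StudentUniqueId TEXT PRIMARY KEY
);
CREATE TABLE StudentIndicator (
    StudentUniqueId TEXT NOT NULL REFERENCES Student (StudentUniqueId),
    IndicatorName   TEXT NOT NULL,
    IndicatorValue  TEXT NOT NULL,
    BeginDate       TEXT,          -- dates added outside the key ("non-breaking")
    EndDate         TEXT,
    PRIMARY KEY (StudentUniqueId, IndicatorName)
);
"""

def load_history() -> list[str]:
    """Try to load two dated values of one indicator; report each insert's outcome."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Student VALUES ('S1')")
    rows = [
        ("S1", "AtRisk", "red", "2023-09-01", "2023-12-31"),
        ("S1", "AtRisk", "green", "2024-01-01", None),
    ]
    outcomes = []
    for row in rows:
        try:
            conn.execute("INSERT INTO StudentIndicator VALUES (?, ?, ?, ?, ?)", row)
            outcomes.append("loaded")
        except sqlite3.IntegrityError:  # primary key violation
            outcomes.append("rejected")
    conn.close()
    return outcomes

print(load_history())  # ['loaded', 'rejected']
```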

The next obvious step is to add BeginDate to the key, and that option is sometimes justified. However, such a change has drawbacks, as discussed below.

This document attempts to "step back" and consider the problem holistically by examining several options for adding longitudinality.

Options

The three main options considered are depicted in this diagram:

Generated from: Example.sql

Option #1

This option is modeled as StudentIndicator1 in the diagram above. In this model, BeginDate is added to the key. This solves the core longitudinality problem, as it is now possible to submit multiple records for the same indicator.
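Below is a minimal sketch of Option #1. It again uses SQLite through Python's sqlite3 module and assumes hypothetical column names beyond BeginDate. With BeginDate in the key, both records load. A point-in-time ("as of") query in the spirit of use case #1 can still pick out the value in effect on a given day. The value_as_of function is illustrative only.

```python
import sqlite3

# Option #1 sketch: BeginDate joins the primary key (other columns hypothetical).
SCHEMA = """
CREATE TABLE StudentIndicator1 (
    StudentUniqueId TEXT NOT NULL,
    IndicatorName   TEXT NOT NULL,
    BeginDate       TEXT NOT NULL,   -- required now, because it is part of the key
    IndicatorValue  TEXT NOT NULL,
    EndDate         TEXT,
    PRIMARY KEY (StudentUniqueId, IndicatorName, BeginDate)
);
"""

def value_as_of(conn, student, name, day):
    """Return the indicator value in effect on `day` (ISO date string), or None."""
    row = conn.execute(
        """SELECT IndicatorValue FROM StudentIndicator1
           WHERE StudentUniqueId = ? AND IndicatorName = ?
             AND BeginDate <= ? AND (EndDate IS NULL OR EndDate >= ?)
           ORDER BY BeginDate DESC LIMIT 1""",
        (student, name, day, day),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Both rows load: they differ in BeginDate, which is now part of the key.
conn.executemany(
    "INSERT INTO StudentIndicator1 VALUES (?, ?, ?, ?, ?)",
    [("S1", "AtRisk", "2023-09-01", "red", "2023-12-31"),
     ("S1", "AtRisk", "2024-01-01", "green", None)],
)
print(value_as_of(conn, "S1", "AtRisk", "2023-10-31"))  # red
print(value_as_of(conn, "S1", "AtRisk", "2024-02-29"))  # green
```

Note that BeginDate has to be NOT NULL once it is part of the key. This is the constraint that forces a source system without a date to invent one.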

PRO
- Allows the indicator to be longitudinal.
- Can help expand the data, and improve data quality, by requiring more data that can be used in analysis.

CON
- In some cases, this forces a source system to invent a begin date if the indicator has no date in the system. This can be acceptable for point-to-point transmission, but in an ecosystem an invented date can cause confusion or even errors. Typically, invented dates are set to term dates or similar.
- Breaking change.
- Dates have been found to be poor key fields, even when they are highly monitored, as they are frequently mis-entered into systems. This is likely not a strong drawback, as indicators are generally "wipe-and-replaced" when their host entity is updated. However, this drawback does raise the question of whether such an entity should have independent identity at all (see Option #3).
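The wipe-and-replace argument can be sketched under the same assumptions: hypothetical columns, with SQLite run through Python's sqlite3. Suppose a mis-entered BeginDate is later corrected. A key-based upsert treats the corrected row as a new record and leaves the stale one behind. Deleting all of the student's indicators and reloading the full set does not leave a stale row. The upsert and wipe_and_replace functions are hypothetical, and INSERT OR REPLACE is SQLite-specific syntax.

```python
import sqlite3

# Hypothetical Option #1 table with BeginDate in the key.
SCHEMA = """
CREATE TABLE StudentIndicator1 (
    StudentUniqueId TEXT NOT NULL,
    IndicatorName   TEXT NOT NULL,
    BeginDate       TEXT NOT NULL,
    IndicatorValue  TEXT NOT NULL,
    PRIMARY KEY (StudentUniqueId, IndicatorName, BeginDate)
);
"""

def upsert(conn, rows):
    """Key-based update: a corrected BeginDate is treated as a brand-new record."""
    conn.executemany("INSERT OR REPLACE INTO StudentIndicator1 VALUES (?, ?, ?, ?)", rows)

def wipe_and_replace(conn, student, rows):
    """Delete every indicator for the student, then load the current full set."""
    conn.execute("DELETE FROM StudentIndicator1 WHERE StudentUniqueId = ?", (student,))
    conn.executemany("INSERT INTO StudentIndicator1 VALUES (?, ?, ?, ?)", rows)

def count_after(loader) -> int:
    """Load a typo'd date, then its correction; return how many rows remain."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    typo = [("S1", "AtRisk", "2032-09-01", "red")]   # year mis-entered
    fixed = [("S1", "AtRisk", "2023-09-01", "red")]  # corrected in the source system
    loader(conn, typo)
    loader(conn, fixed)
    n = conn.execute("SELECT COUNT(*) FROM StudentIndicator1").fetchone()[0]
    conn.close()
    return n

print(count_after(upsert))                                           # 2: stale row remains
print(count_after(lambda c, rows: wipe_and_replace(c, "S1", rows)))  # 1
```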
APPROPRIATE USES
This model is appropriate when the date being added is highly managed and widely available in the source system, i.e., when a date does not have to be invented. Examples include an enrollment date or the date a student enters or exits a program, as such dates tend to be carefully managed.


Option #2

In this option, longitudinality is delivered via a Period entity attached to the indicator entity. To make this option work, the key of StudentIndicator has to be expanded to include not only IndicatorName but also IndicatorValue, as a date history must be written for each name/value pair.
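Here is a minimal sketch of Option #2. It assumes hypothetical table names (StudentIndicator2 and StudentIndicator2Period) and uses SQLite through Python's sqlite3 module. It shows why the history is fragmented: a consumer must join every name/value record to its periods and re-sort them to rebuild a single timeline. The unified_history function is illustrative.

```python
import sqlite3

# Option #2 sketch: name/value pair in the key, with a child Period table.
SCHEMA = """
CREATE TABLE StudentIndicator2 (
    StudentUniqueId TEXT NOT NULL,
    IndicatorName   TEXT NOT NULL,
    IndicatorValue  TEXT NOT NULL,
    PRIMARY KEY (StudentUniqueId, IndicatorName, IndicatorValue)
);
CREATE TABLE StudentIndicator2Period (
    StudentUniqueId TEXT NOT NULL,
    IndicatorName   TEXT NOT NULL,
    IndicatorValue  TEXT NOT NULL,
    BeginDate       TEXT NOT NULL,
    EndDate         TEXT,
    PRIMARY KEY (StudentUniqueId, IndicatorName, IndicatorValue, BeginDate),
    FOREIGN KEY (StudentUniqueId, IndicatorName, IndicatorValue)
        REFERENCES StudentIndicator2 (StudentUniqueId, IndicatorName, IndicatorValue)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# One parent record per name/value pair; periods hang off each pair.
conn.executemany("INSERT INTO StudentIndicator2 VALUES (?, ?, ?)",
                 [("S1", "AtRisk", "red"), ("S1", "AtRisk", "green")])
conn.executemany("INSERT INTO StudentIndicator2Period VALUES (?, ?, ?, ?, ?)",
                 [("S1", "AtRisk", "red", "2023-09-01", "2023-12-31"),
                  ("S1", "AtRisk", "green", "2024-01-01", "2024-03-31"),
                  ("S1", "AtRisk", "red", "2024-04-01", None)])

def unified_history(conn, student, name):
    """Re-assemble one timeline from periods scattered across name/value records."""
    return conn.execute(
        """SELECT p.BeginDate, p.EndDate, p.IndicatorValue
           FROM StudentIndicator2 i
           JOIN StudentIndicator2Period p
             USING (StudentUniqueId, IndicatorName, IndicatorValue)
           WHERE i.StudentUniqueId = ? AND i.IndicatorName = ?
           ORDER BY p.BeginDate""",
        (student, name)).fetchall()

for period in unified_history(conn, "S1", "AtRisk"):
    print(period)
```

The "red" history lives under one parent record and the "green" history under another. The single chronological view exists only after the join and sort.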

PRO
- Allows the indicator to be longitudinal.
- Use of longitudinality is optional: there is no need to invent dates when they don't exist.

CON
- Can make data management complex for clients, as periods must be attached to each name/value pair, and this is unlikely to match the source system (a unified history is more likely).
- More difficult for clients to consume, as the date history for an entity is fragmented across records.
- The long key for entities may be problematic in database systems.
- Breaking change.
APPROPRIATE USES
This is not a strong option given the number of cons, but it is a possibility when dates may or may not be present.

Option #3

In this option, the entity has no primary key, but has a foreign key/reference back to Student.
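Here is a minimal sketch of Option #3 under the same assumptions: hypothetical names, with SQLite run through Python's sqlite3. The reference to Student is still enforced and undated records are accepted, but nothing stops conflicting values from being loaded. The CONFLICTS query is one hypothetical consumer-side check. It assumes that "inconsistent" means two different values of the same indicator whose periods overlap, with a missing date treated as open-ended.

```python
import sqlite3

# Option #3 sketch: no primary key, only a reference back to Student.
# Table and column names are hypothetical.
SCHEMA = """
CREATE TABLE Student (StudentUniqueId TEXT PRIMARY KEY);
CREATE TABLE StudentIndicator3 (
    StudentUniqueId TEXT NOT NULL REFERENCES Student (StudentUniqueId),
    IndicatorName   TEXT NOT NULL,
    IndicatorValue  TEXT NOT NULL,
    BeginDate       TEXT,           -- optional: no need to invent a date
    EndDate         TEXT
);
"""

# Pairs of rows for the same student and indicator with different values
# and overlapping periods (a missing date is treated as open-ended).
CONFLICTS = """
SELECT a.IndicatorName, a.IndicatorValue, b.IndicatorValue
FROM StudentIndicator3 a
JOIN StudentIndicator3 b
  ON a.StudentUniqueId = b.StudentUniqueId
 AND a.IndicatorName = b.IndicatorName
 AND a.rowid < b.rowid
 AND a.IndicatorValue <> b.IndicatorValue
WHERE COALESCE(a.BeginDate, '') <= COALESCE(b.EndDate, '9999-12-31')
  AND COALESCE(b.BeginDate, '') <= COALESCE(a.EndDate, '9999-12-31')
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Student VALUES ('S1')")
conn.executemany(
    "INSERT INTO StudentIndicator3 VALUES (?, ?, ?, ?, ?)",
    [("S1", "AtRisk", "red", "2023-09-01", "2023-12-31"),
     ("S1", "AtRisk", "green", "2023-11-01", None),   # overlaps "red": accepted anyway
     ("S1", "EnglishLearner", "true", None, None)],   # undated: accepted
)
print(conn.execute(CONFLICTS).fetchall())  # [('AtRisk', 'red', 'green')]

# The reference back to Student is still enforced.
try:
    conn.execute("INSERT INTO StudentIndicator3 VALUES ('S2', 'AtRisk', 'red', NULL, NULL)")
    unknown_student_rejected = False
except sqlite3.IntegrityError:
    unknown_student_rejected = True
print(unknown_student_rejected)  # True
```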

PRO
- Allows the indicator to be longitudinal.
- Use of longitudinality is optional: there is no need to invent dates when they don't exist.
- Likely non-breaking for submitters and consumers.

CON
- Records can be inconsistent, e.g., the same indicator can have multiple values.
APPROPRIATE USES
- When dates may or may not be present, and for indicators strongly managed by source systems, as the source system must accept more responsibility for data quality.
- When there is a strong desire to avoid breaking changes or otherwise limit change.


Earlier Analysis