Indicators with Option Sets

The descriptor is the primary Ed-Fi modeling pattern for situations in which the value of an indicator must be constrained to a member of a predefined option set. Ed-Fi API specifications require that descriptor values be constrained to those options, and failure to meet that requirement will block an API transaction.

Of course, API clients are sometimes able, and allowed, to introduce new option set values for a descriptor into a data exchange context, but that capability can be monitored and controlled.

Descriptors, however, have limitations:

They are not longitudinal; that is, they do not have dates attached.¹ This makes it difficult to track their changes over time. For example, descriptors provide no native way to record that, during a school year, a student was at one point evaluated to be in grade level 10 but later (perhaps when the student's credits were recounted) judged to be in grade level 11.² The model provides no means of attaching dates to such changes.
Introducing a new descriptor (not a new descriptor value) is complex, as it requires the organization to extend the data model. This can be a problem in contexts where more agility is needed due to rapid change or short timelines (e.g., organizing community collaboration; see DATASTD-1459). In some cases, such additions are also understood to be local in nature, i.e., they have little or no value in a larger data exchange context.

This document looks at solutions – alternative models, design patterns, etc. – that could address these problems.

Note that this analysis overlaps somewhat with the analysis of Longitudinal Indicators.

Tickets: DATASTD-1480, DATASTD-1459

Requirements

The pattern must support longitudinality, but must not require it
The pattern must be simple to implement and flexible
Ideally, the pattern constrains indicator values to the canonical set defined by the data exchange context (and potentially an "edfi" context); i.e., it should be simple to recognize and reject a transaction that does not use an allowed value

Proposed

One proposed pattern appears below. It is displayed as a database model and applied specifically to student indicators.

Figure 1: a proposed model. Note that this example and the sample implementation below show a relational database implementation as a convenience for explanation. The foreign keys here should be understood as references between entities that would be enforced by the API host.

This model sets up StudentIndicatorMeasurement and StudentIndicatorMeasurementValue as separate domain entities. These would be managed by the API host in most situations, but write access could be granted to API clients if necessary.
- This pattern would be replicated for other domain entities likely to have indicators attached (e.g., EducationOrganizationsIndicatorMeasurement).
Values can be referenced directly as a collection attached to StudentEducationOrganizationAssociation. This enforces that a student has only one value for each StudentIndicatorMeasurement, and referential integrity enforces that each combination of Measurement and Value exists in the separate domain entity.
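To make the key structure concrete, here is a minimal sketch of the relational form using Python's built-in sqlite3 module. The table and column names are hypothetical simplifications. This includes StudentEdOrgAssociationIndicator, which stands in for the value collection on StudentEducationOrganizationAssociation. None of it reproduces the attached scripts.

The sketch shows the two constraints described above:
- A composite primary key allows only one value per student and measurement.
- A composite foreign key rejects any value not defined for that measurement.

```python
import sqlite3

NAMESPACE = "uri://ed-fi.org/"
MEASUREMENT = "InternetAccessInResidence"

# Hypothetical, simplified DDL for the proposed pattern; names are illustrative
# assumptions and the association table is reduced to its key columns.
DDL = """
CREATE TABLE StudentIndicatorMeasurement (
    Namespace      TEXT NOT NULL,
    IndicatorName  TEXT NOT NULL,
    Description    TEXT,
    IndicatorGroup TEXT,
    PRIMARY KEY (Namespace, IndicatorName)
);
CREATE TABLE StudentIndicatorMeasurementValue (
    Namespace     TEXT NOT NULL,
    IndicatorName TEXT NOT NULL,
    Indicator     TEXT NOT NULL,
    Description   TEXT,
    PRIMARY KEY (Namespace, IndicatorName, Indicator),
    FOREIGN KEY (Namespace, IndicatorName)
        REFERENCES StudentIndicatorMeasurement (Namespace, IndicatorName)
);
CREATE TABLE StudentEdOrgAssociationIndicator (
    EducationOrganizationId INTEGER NOT NULL,
    StudentUniqueId         TEXT NOT NULL,
    Namespace               TEXT NOT NULL,
    IndicatorName           TEXT NOT NULL,
    Indicator               TEXT NOT NULL,
    -- At most one value per student, education organization and measurement
    PRIMARY KEY (EducationOrganizationId, StudentUniqueId, Namespace, IndicatorName),
    -- The measurement/value combination must already exist
    FOREIGN KEY (Namespace, IndicatorName, Indicator)
        REFERENCES StudentIndicatorMeasurementValue (Namespace, IndicatorName, Indicator)
);
"""


def build_db() -> sqlite3.Connection:
    """Create the tables and load the InternetAccessInResidence metadata."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(DDL)
    conn.execute(
        "INSERT INTO StudentIndicatorMeasurement VALUES (?, ?, ?, ?)",
        (NAMESPACE, MEASUREMENT,
         "The level of access to the internet on a primary learning device at home.",
         "DigitalEquity"),
    )
    for value in ("Yes", "No - Not Available", "No - Not Affordable", "No - Other"):
        # Value descriptions omitted (NULL) to keep the sketch short
        conn.execute(
            "INSERT INTO StudentIndicatorMeasurementValue VALUES (?, ?, ?, NULL)",
            (NAMESPACE, MEASUREMENT, value),
        )
    return conn


def record_value(conn: sqlite3.Connection, ed_org_id: int, student_id: str, indicator: str) -> bool:
    """Try to attach a value to a student; return False if a key constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO StudentEdOrgAssociationIndicator VALUES (?, ?, ?, ?, ?)",
            (ed_org_id, student_id, NAMESPACE, MEASUREMENT, indicator),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this sketch, recording "No - Not Available" for student 111111 succeeds. Recording "Maybe" (not a defined value) returns False, and so does a second value for the same student and measurement. In an Ed-Fi API these checks would be performed by the API host rather than by a database directly.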

Sample Data

One example is the collection of indicators intended to track student digital equity. The current proposal advises storing the metadata and values in StudentIndicator. The main drawback of this approach is the lack of a controlled vocabulary: Indicator is an open string, so there is a greater chance of data quality issues. Using the proposed pattern above, the data for tracking a student's internet access could look like the following:

The first two code blocks populate the metadata that defines the StudentIndicatorMeasurement and StudentIndicatorMeasurementValue records. StudentIndicatorMeasurement defines the individual student indicators, and StudentIndicatorMeasurementValue defines the possible values that can be submitted for each student indicator.

StudentIndicatorMeasurement

{
  "indicatorName": "InternetAccessInResidence",
  "namespace": "uri://ed-fi.org/",
  "description": "The level of access to the internet on a primary learning device at home.",
  "indicatorGroup": "DigitalEquity"
}

StudentIndicatorMeasurementValue

[
  {
    "indicator": "Yes",
    "studentIndicatorMeasurementReference": {
      "indicatorName": "InternetAccessInResidence",
      "namespace": "uri://ed-fi.org/"
    },
    "description": "Yes, the student has internet access in residence."
  },
  {
    "indicator": "No - Not Available",
    "studentIndicatorMeasurementReference": {
      "indicatorName": "InternetAccessInResidence",
      "namespace": "uri://ed-fi.org/"
    },
    "description": "No, the student does not have internet access in residence."
  },
  {
    "indicator": "No - Not Affordable",
    "studentIndicatorMeasurementReference": {
      "indicatorName": "InternetAccessInResidence",
      "namespace": "uri://ed-fi.org/"
    },
    "description": "No, the student does not have and cannot afford internet access in residence."
  },
  {
    "indicator": "No - Other",
    "studentIndicatorMeasurementReference": {
      "indicatorName": "InternetAccessInResidence",
      "namespace": "uri://ed-fi.org/"
    }
  }
]

Once the metadata is defined for a student indicator measurement, StudentIndicatorResponse records can be created. Note the following:

The API host can reject a StudentIndicatorResponse if either the studentIndicatorMeasurementReference or the studentIndicatorMeasurementValueReference cannot be resolved (i.e., the referenced record does not exist in the API host).
The periods collection can be used to attach a date history to values, providing longitudinality.
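The checks in the list above might look roughly like the following on the API host. This is a sketch under stated assumptions:
- validate_response, MEASUREMENTS and VALUES are hypothetical names.
- The host's stored metadata is stood in for by in-memory sets.
- The period check (beginDate not after endDate) is an extra sanity rule, not something this proposal specifies.

```python
from datetime import date

# Hypothetical in-memory stand-ins for metadata the API host already stores,
# keyed the same way as the references in the sample payloads.
MEASUREMENTS = {("uri://ed-fi.org/", "InternetAccessInResidence")}
VALUES = {
    ("uri://ed-fi.org/", "InternetAccessInResidence", v)
    for v in ("Yes", "No - Not Available", "No - Not Affordable", "No - Other")
}


def validate_response(resp: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload would be accepted."""
    errors = []
    m = resp["studentIndicatorMeasurementReference"]
    v = resp["studentIndicatorMeasurementValueReference"]
    if (m["namespace"], m["indicatorName"]) not in MEASUREMENTS:
        errors.append("unresolved studentIndicatorMeasurementReference")
    if (v["namespace"], v["indicatorName"], v["indicator"]) not in VALUES:
        errors.append("unresolved studentIndicatorMeasurementValueReference")
    # The value must belong to the same measurement the response points at
    if (v["namespace"], v["indicatorName"]) != (m["namespace"], m["indicatorName"]):
        errors.append("value belongs to a different measurement")
    for p in resp.get("periods", []):
        if date.fromisoformat(p["beginDate"]) > date.fromisoformat(p["endDate"]):
            errors.append(f"period {p['beginDate']}..{p['endDate']} ends before it begins")
    return errors
```

Applied to the first sample response below, this returns an empty list. Changing its indicator to an undefined value such as "Maybe" yields a single "unresolved studentIndicatorMeasurementValueReference" error.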

StudentIndicatorResponse

[
  {
    "studentEducationOrganizationAssociationReference": {
      "educationOrganizationId": 255901,
      "studentUniqueId": "111111"
    },
    "studentIndicatorMeasurementReference": {
      "indicatorName": "InternetAccessInResidence",
      "namespace": "uri://ed-fi.org/"
    },
    "studentIndicatorMeasurementValueReference": {
      "indicator": "No - Not Available",
      "indicatorName": "InternetAccessInResidence",
      "namespace": "uri://ed-fi.org/"
    },
    "periods": [
      {
        "beginDate": "2020-03-20",
        "endDate": "2020-09-04"
      }
    ]
  },
  {
    "studentEducationOrganizationAssociationReference": {
      "educationOrganizationId": 255901,
      "studentUniqueId": "111111"
    },
    "studentIndicatorMeasurementReference": {
      "indicatorName": "InternetAccessInResidence",
      "namespace": "uri://ed-fi.org/"
    },
    "studentIndicatorMeasurementValueReference": {
      "indicator": "Yes",
      "indicatorName": "InternetAccessInResidence",
      "namespace": "uri://ed-fi.org/"
    },
    "periods": [
      {
        "beginDate": "2020-09-07",
        "endDate": "2021-05-28"
      },
      {
        "beginDate": "2020-01-01",
        "endDate": "2020-03-19"
      }
    ]
  }
]
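Because each response carries its own periods, a consumer can recover the value in effect on a given date. The following sketch uses value_on, a hypothetical function name. It reduces the two records above to (value, periods) pairs and assumes that periods for one student and measurement do not overlap; if they did, it would simply return the first match.

```python
from datetime import date

# The two StudentIndicatorResponse records above, reduced to (value, periods).
# Assumes periods for one student and measurement never overlap.
RESPONSES = [
    ("No - Not Available", [("2020-03-20", "2020-09-04")]),
    ("Yes", [("2020-09-07", "2021-05-28"), ("2020-01-01", "2020-03-19")]),
]


def value_on(day: str, responses=RESPONSES):
    """Return the value whose period covers `day` (bounds inclusive), or None if no period does."""
    target = date.fromisoformat(day)
    for value, periods in responses:
        for begin, end in periods:
            if date.fromisoformat(begin) <= target <= date.fromisoformat(end):
                return value
    return None


print(value_on("2020-02-15"))  # Yes
```

Similarly:
- value_on("2020-06-01") returns "No - Not Available".
- value_on("2020-09-05") returns None, because no period covers that date.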

Attached are SQL scripts to generate the student indicator tables as extension tables to the core Ed-Fi model.

StudentIndicatorModelScripts.zip

Analysis

Change Management

In this model, "configuration is data," which makes it very simple to add new indicators. However, because the key to StudentIndicatorMeasurement is replicated to other entities, it would not be simple to change an indicator name. That could possibly be addressed by using a surrogate key field (e.g., StudentIndicatorMeasurementIdentifier). Such a model, however, places the data model at higher risk of accidental duplication of indicators: if the client does not manage the surrogate key carefully, multiple versions of the same indicator can appear. Said another way, the name provides some stability for the identity.
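The duplication risk can be illustrated with a toy example. Suppose, hypothetically, that a client resends the same indicator definition, for instance after a timeout:
- Keyed by namespace and indicatorName, the second submission overwrites the first.
- Keyed by a server-issued surrogate identifier, it becomes a second indicator.

The store functions below are illustrative only and do not model any Ed-Fi API behavior.

```python
import itertools

# Hypothetical: the same indicator definition submitted twice, e.g. a retry after a timeout.
submissions = [
    {"namespace": "uri://ed-fi.org/", "indicatorName": "InternetAccessInResidence"},
    {"namespace": "uri://ed-fi.org/", "indicatorName": "InternetAccessInResidence"},
]


def store_by_natural_key(records):
    """Identity = (namespace, indicatorName): a resend overwrites the existing entry."""
    store = {}
    for record in records:
        store[(record["namespace"], record["indicatorName"])] = record
    return store


def store_by_surrogate_key(records):
    """Identity = server-issued counter: every submission becomes a new indicator."""
    ids = itertools.count(1)
    store = {}
    for record in records:
        store[next(ids)] = record
    return store


print(len(store_by_natural_key(submissions)), len(store_by_surrogate_key(submissions)))  # 1 2
```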

The alternative is to treat changes to indicators as a process of introducing new values and retiring old ones, which is likely preferable.

Cannibalizing Descriptors / Impacts on Standardization Efforts (or "New and Improved Descriptors")

Given this model, agencies would be tempted to start expressing current descriptors as indicators. Doing so would save time, as no extension would be required. That could represent a significant change and a move away from a proven pattern.

Is that a problem?

Absence from schema = lower visibility. Because a descriptor expressed as an indicator is not part of the schema, the core concept it models becomes less visible in many contexts, which could reduce the likelihood of community collaboration on standards over time. Note that the namespace on indicators does provide a useful anchor point for collaboration, as it establishes ownership over indicators and values.
Flexibility and agility. Clearly, indicators modeled in this fashion are easier to manage and change, as they are not part of the data schema. In some ways, this could enhance the agility of the community. For example, the Digital Equity data collection efforts of the Ed-Fi community (from which the examples above were taken) initially adopted a similar "indicator"-like pattern to allow for simpler community collaboration (see /wiki/spaces/EFDSDRAFT/pages/22773854).

It is clear that even if such a pattern were adopted, there would still be considerable value, from a data standardization perspective, in the Ed-Fi community promoting descriptors as the primary means of capturing the core concepts of the shared data model.

Advantages of Schema as Data

This model turns descriptors into data, which was noted above as a potential drawback because it reduces their visibility. However, there are some possible advantages:

In this model, it would be much easier to track the history of descriptors, as they are managed as data. This may also simplify related processes such as data migration: since descriptors can evolve over time, there is no need to migrate older value references to newer ones in an operational analytics system.
Operational context: mapping descriptors to other descriptors is critical when moving data between different operational contexts. If descriptors are data, such a mapping theoretically becomes simpler to implement.
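As a rough illustration of the operational-context point, a crosswalk between option sets can itself be stored as ordinary records and applied with a lookup. The source namespace uri://example-district.org/ and its values are invented for illustration; only the target values come from the sample data above. translate is a hypothetical helper.

```python
# Hypothetical crosswalk stored as data: (source namespace, source value) ->
# (target namespace, target value). The district namespace and values are invented.
CROSSWALK = {
    ("uri://example-district.org/", "Has internet"): ("uri://ed-fi.org/", "Yes"),
    ("uri://example-district.org/", "No service"): ("uri://ed-fi.org/", "No - Not Available"),
    ("uri://example-district.org/", "Too expensive"): ("uri://ed-fi.org/", "No - Not Affordable"),
}


def translate(namespace: str, value: str) -> tuple[str, str]:
    """Map a value into the target context; raise LookupError if no mapping row exists."""
    try:
        return CROSSWALK[(namespace, value)]
    except KeyError:
        raise LookupError(f"no mapping for {value!r} in {namespace}") from None
```

Supporting a new source value then means adding a row to the crosswalk, not changing a schema.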

1. Note that descriptor values do have "effective dates" in the Ed-Fi ODS. However, these are not part of the data standard, and in any case, in the ODS they capture the period during which the value itself was "effective"/valid, not when the value applied in the context of an individual data element.

2. Note that the primary purpose of the Ed-Fi data model is to describe data in transit and not data at rest. Therefore, the Ed-Fi data model generally does not focus on longitudinal tracking unless the source systems are themselves capable of providing dates (e.g., effective dates, change dates, etc.) for elements they provide.