Ed-Fi Level 2 Data Validation System Requirements

Prepared for the Ed-Fi Alliance by: Jonathan Hickam, Learning Tapestry

Introduction

This document was prepared as part of a larger initiative examining scalable, economical, and reusable solutions for level 2 validations. For further context, architecture, and vocabulary, refer to the associated Ed-Fi Validation Architecture document.

Validation System Functional Requirements

Validation Rule Specification

The validation engine must allow for the specification of level 2 and greater validation rules:

  1. Each validation rule must have a unique id.
    1. If a validation rule is removed, then its id must not be recycled. Recycling an id could cause confusion when de-duplicating future validation results.
  2. Validation engine must be able to specify different severities of validation events.
    1. Validation engine must be able to specify a 'validation warning' to denote that a situation has been observed that may or may not be a problem but will require further attention.
    2. Validation engine must be able to specify 'minor validation errors' to denote that a situation has been observed that will likely present a problem but may be of little consequence.
    3. Validation engine must be able to specify 'major validation errors' to denote that a situation has been observed that will present a problem and will likely have significant impact.
  3. Validation engine must be able to validate against any entity in the ODS.
  4. Validation engine must be able to validate using a variety of conditionals.
    1. Conditionals must be able to evaluate a variety of data sources.
      1. Conditionals must be able to evaluate using attributes on the target entity.
      2. Conditionals must be able to evaluate using attributes on child entities.
      3. Conditionals must be able to evaluate using attributes on parent entities.
      4. Conditionals must be able to evaluate using explicitly defined static values in the validation rule.
      5. Conditionals must be able to evaluate using the current date/time in the validation rule.
      6. Conditionals must be able to evaluate using basic math functions on attributes.
      7. Conditionals must be able to evaluate on an aggregate of a child attribute (SUM, MAX, MIN, AVG, COUNT).
    2. The validation engine must be able to do several types of evaluations for conditionals.
      1. Conditionals must be able to evaluate for the presence or absence of an entity (EXISTS / NOT EXISTS).
      2. Conditionals must be able to evaluate for the presence or absence of an attribute (NULL / NOT NULL).
      3. Conditionals must be able to evaluate if an attribute is equal, not equal, greater than, or less than a provided value.
      4. Conditionals must be able to evaluate an attribute for membership or exclusion from an explicit list of values (IN / NOT IN).
      5. Conditionals must be able to evaluate an attribute against wildcard text matching (LIKE / NOT LIKE).
      6. Conditionals must be able to evaluate a string (VARCHAR) attribute against expected formats (alphanumeric vs. digits only, expected values, string length).
    3. A conditional must be able to be made up of several other conditionals.
      1. Conditionals can be combined using standard logic operators (AND / OR / NOT).
      2. Conditionals can be combined using an explicit evaluation sequence (this is usually done in SQL using parentheses).
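
To make the list above concrete, the following is a minimal sketch of how a rule with a unique id, a severity, and a composable conditional might be modeled. It is illustrative only and is not part of any Ed-Fi specification: the names `Rule`, `Severity`, `is_null`, `in_list`, `exists_child`, `all_of`, `any_of`, and `negate`, the rule id `L2-0001`, and the attribute names `grade_level`, `birth_date`, and `enrollments` are all assumptions made for this example. Nesting the combinators plays the role that parentheses play in SQL.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Collection, Dict

# An entity is modeled as a plain dict; child entities are lists under a key.
Entity = Dict[str, Any]
Conditional = Callable[[Entity], bool]


class Severity(Enum):
    WARNING = "validation warning"
    MINOR_ERROR = "minor validation error"
    MAJOR_ERROR = "major validation error"


@dataclass(frozen=True)
class Rule:
    rule_id: str             # unique; never recycled after a rule is removed
    severity: Severity
    fails_when: Conditional  # True means a validation result is raised


def is_null(attr: str) -> Conditional:
    """NULL check on an attribute of the target entity."""
    return lambda e: e.get(attr) is None


def in_list(attr: str, values: Collection[Any]) -> Conditional:
    """IN check against an explicit list of static values."""
    return lambda e: e.get(attr) in values


def exists_child(child_key: str) -> Conditional:
    """EXISTS check on child entities."""
    return lambda e: len(e.get(child_key) or []) > 0


def all_of(*conds: Conditional) -> Conditional:
    """Logical AND of several conditionals."""
    return lambda e: all(c(e) for c in conds)


def any_of(*conds: Conditional) -> Conditional:
    """Logical OR of several conditionals."""
    return lambda e: any(c(e) for c in conds)


def negate(cond: Conditional) -> Conditional:
    """Logical NOT of a conditional."""
    return lambda e: not cond(e)


# Hypothetical rule:
# grade_level IN (...) AND (birth_date IS NULL OR NOT EXISTS enrollments)
rule = Rule(
    rule_id="L2-0001",
    severity=Severity.MAJOR_ERROR,
    fails_when=all_of(
        in_list("grade_level", {"09", "10", "11", "12"}),
        any_of(is_null("birth_date"), negate(exists_child("enrollments"))),
    ),
)

student_ok = {"grade_level": "10", "birth_date": "2008-01-01",
              "enrollments": [{"school_id": 1}]}
student_bad = {"grade_level": "10", "birth_date": None,
               "enrollments": [{"school_id": 1}]}
```

Calling `rule.fails_when(student_bad)` evaluates the nested conditionals against the entity. A real engine would more likely translate such a structure into SQL against the ODS than evaluate it in memory.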

Validation Rule Control

The validation engine must allow for the configuration and control of validation rules.

  1. The validation engine must allow for the management of a set of validation rules.
    1. Validation rule sets must be able to be scheduled to run at a given time-of-day on a daily basis.
    2. Validation rule sets must be able to be scheduled to run at a given day-of-week / time-of-day on a weekly basis.
    3. Validation rule sets must be able to be scheduled to run on a specific day-of-month and time-of-day on a monthly basis.
    4. Validation rule sets must be able to be scheduled to run on specific calendar days / time-of-day on an annual basis.
    5. Within a set, some validation rules should depend on the successful completion of prerequisite rules. For example, if a prerequisite rule fails for a student, it may not make sense to run any further validation rules on that student.
  2. The validation engine must show the status and history of validation activity.
    1. The validation engine must have a way to show what (if any) validation sets or individual validation rules are currently running.
    2. The validation engine must have a way to show the history of an individual validation rule, including:
      1. Last time the rule was run.
      2. How many validation results were produced.
    3. The validation engine must show the history of a validation rule set, including:
      1. Last time the rule set was run.
      2. What individual rules were included in that run.
      3. For the validation rule set, the total number of validation results raised.
      4. For each individual validation rule, the number of validation results raised.
  3. The validation engine must allow for the management of individual validation rules.
    1. Validation rules must be able to be added by end-user.
    2. Validation rules must be able to be deleted by end-user.
    3. Validation rules must have the ability to be enabled or disabled by the end-user. A disabled validation rule is still in the system but it will not evaluate.
    4. The validation engine must allow for ad-hoc initiation of individual validation rules for a single run.
    5. The validation engine must allow for individual validation rules to be added or removed from one or more validation rule sets.
  4. Validation rule evaluation must be triggered by a scheduling mechanism.
    1. The scheduling mechanism must have provisions for running a set of rules at a given time-of-day (e.g., run the main batch of rules at night).
    2. The scheduling mechanism must have provisions for running a set of rules at a given time-of-day and day-of-week (e.g., run the weekly rules on Sunday at 2:00 AM).
  5. The validation engine must have an optional mechanism for limiting the number of validation results that an individual validation rule can raise.
    1. A validation rule must have a 'maximum number of validation results' threshold that would limit the total number of validation results for a given run of the validation rule.
    2. A validation rule must have a 'maximum number of validation results per school' threshold that would limit the total number of validation results per school for a given run of the validation rule. This would only apply if the target entity is associated with a school.
    3. The validation engine must have a way to denote that the result set from a validation rule was limited due to such a threshold.
  6. Validation rules must be stored in a database.
    1. Validation rules must be able to be backed up so that the set of validation rules can be recovered to a specific point in time.
    2. Validation rules must have the ability to be implemented via an automated deployment script.
    3. Validation rules MAY be able to be versioned in a git repository.
    4. Validation rule versions must be able to be associated with a school year.
  7. Validation rules must be able to be exported and imported to another system with the same validation engine.
    1. Validation rule exports and imports must use locally assigned primary ids to avoid duplicates. This means that if a validation rule is exported from one system and imported into another, its primary id will be assigned by the importing system and not reused from the exporting system.
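
The result-limiting thresholds in item 5 above can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed implementation: the function name `apply_limits`, the `LimitedResults` type, and the `school_id` key on each result are hypothetical. A result whose target entity is not associated with a school is modeled here with `school_id` set to `None`, and the per-school threshold does not apply to it.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class LimitedResults:
    results: List[dict]
    truncated: bool  # True when a threshold suppressed at least one result


def apply_limits(
    candidates: Iterable[dict],
    max_total: Optional[int] = None,
    max_per_school: Optional[int] = None,
) -> LimitedResults:
    """Apply the optional per-run and per-school caps to one rule's results."""
    kept: List[dict] = []
    per_school: Counter = Counter()
    truncated = False

    for result in candidates:
        # Overall cap for this run of the rule.
        if max_total is not None and len(kept) >= max_total:
            truncated = True
            break

        # Per-school cap; skipped when the entity has no associated school.
        school = result.get("school_id")
        if (
            max_per_school is not None
            and school is not None
            and per_school[school] >= max_per_school
        ):
            truncated = True
            continue

        kept.append(result)
        if school is not None:
            per_school[school] += 1

    return LimitedResults(results=kept, truncated=truncated)


# Hypothetical candidate results for a single run of one rule.
candidates = [
    {"school_id": 1}, {"school_id": 1}, {"school_id": 1},
    {"school_id": 2}, {"school_id": None},
]
limited = apply_limits(candidates, max_per_school=2)
```

The `truncated` flag is one way to satisfy the requirement that a limited result set be marked as such, so that downstream reports do not mistake a capped count for the true count.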

API Availability

Validation results must be available via an API.

  1. Validation result collection, storage, distribution, and reporting must be provided via an API and repository, as described in Ed-Fi Validation API Design.

Ed-Fi Interoperability

The validation engine must run against an Ed-Fi ODS database.

  1. The validation engine must be able to access all core ODS data objects.
  2. The validation engine must be able to access all extensions.
  3. The validation engine is only required to access the data that exists in an Ed-Fi ODS at the time of the validation run. Specifically, it cannot validate against previous values, data sets, or changes.

Reporting

The validation results must be available for reporting with the following pieces of data:

  1. Aggregate total number of validation results.
  2. Aggregate total number of target entities analyzed.
  3. Aggregate total time elapsed doing validation.
  4. Aggregate start (min) clock time of validation.
  5. Aggregate end (max) clock time of validation.
  6. Computed value of % of scenarios with validation result.
  7. Dimension of validation rule.
  8. Dimension of education organization.

Level 3 Validations

The validation engine must have a method for delegating 'level 3' validation analysis to another processing system.

  1. The validation engine must have a method for initiating procedural scripts for evaluating level 3 validation rules.
    1. The validation engine must be agnostic to language with respect to initiating other scripts.
    2. The validation engine must be able to pass a validation rule code to the script(s) processing level 3 validation rules for inclusion in the validation results.
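
One language-agnostic way to meet these requirements is to launch the level 3 script as a separate process and pass the rule code on the command line. The sketch below assumes a hypothetical contract that is not defined by this document: the script receives the rule code as its last argument and writes one JSON object per validation result to standard output. The function name `run_level3_script` and the demo command are likewise illustrative.

```python
import json
import subprocess
import sys
from typing import List


def run_level3_script(
    command: List[str], rule_code: str, timeout_s: int = 60
) -> List[dict]:
    """Run an external script and collect its results as a list of dicts.

    The command can launch any executable (Python, R, shell, a compiled
    binary), so the engine stays agnostic to the script's language.
    """
    completed = subprocess.run(
        command + [rule_code],   # rule code is passed as the final argument
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,              # raise if the script exits with an error
    )
    results = [
        json.loads(line)
        for line in completed.stdout.splitlines()
        if line.strip()
    ]
    # Make sure every result carries the rule code that produced it.
    for result in results:
        result.setdefault("rule_code", rule_code)
    return results


# Stand-in for a real level 3 script: echoes the rule code it was given.
DEMO_COMMAND = [
    sys.executable,
    "-c",
    "import json, sys; "
    "print(json.dumps({'rule_code': sys.argv[1], 'entity_id': 42}))",
]

demo_results = run_level3_script(DEMO_COMMAND, "L3-0007")
```

Exchanging results as line-delimited JSON is only one possible choice; writing directly to the validation results repository or calling the validation API from the script would satisfy the same requirement.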

Non-functional Requirement Considerations

The following considerations should be taken into account when evaluating rules engines. Unlike the functional requirements, these are not a matter of a particular solution complying or not complying. Instead, they are factors to weigh when comparing validation engines.

System Performance

The validation engine must be able to complete all necessary validations during after-hours processing times. The amount of data, the number of rules, and the complexity of the rules will all impact this metric.

Reliability Metrics

Are there expectations about how often the solution should be available?

System Architecture

What sort of development languages and frameworks are used in the solution?

Software Dependencies

What type and versions of operating systems are supported by the solution? What, if any, versions of databases are supported by the solution?

Cost and Licensing

What initial and recurring licensing costs are associated with the solution and its required infrastructure?

Required Skills

What sort of skill sets are required by the end users of the solution?

Reference documents