Prepared for the Ed-Fi Alliance by: Jonathan Hickam, Learning Tapestry
Contents
Table of Contents maxLevel 2
Introduction
This document has been prepared as part of a larger initiative that is looking at scalable, economical, and reusable solutions for level 2 validations. For further context, architecture, and vocabulary refer to the associated Ed-Fi Validation Architecture document.
A high-level assessment of a wide variety of open source solutions was undertaken that could potentially address all or some of the Ed-Fi Level 2 Data Validation System Requirements. A high-level analysis of the tools that were considered can be found hereat Survey of Rule Engines. However, the final approach taken was to take a deeper look at three different types of solutions from this list. The first one was to look at an open-source business rules engine . The second was to experiment with using a database-centric testing framework. The third was to look at what a pure-SQL implementation without external tool dependencies might look like.
Option 1: Business rule engine solution
Our initial hypothesis when we started this evaluation was that one of the many open source business rules engines would be able to address a significant part of our requirements. During our analysis we settled in on two Java-based business rules engines, Drools (https://www.drools.org/) and Open Rules (https://openrules.com/). For our prototype we decided to try Open Rules, which is geared more towards business analysts instead of Java developers.
...
The resulting spreadsheet can be seen here.
Results
Business rule engines are well suited for complex logic flows and decision trees. However, most of the validation rules that were gathered as examples do not necessarily have complex linear decisions, it is more of a matter of finding specific examples in a complex data set. This means that expressing these rules in the language that the rules engine is looking for might require a level of abstraction and complexity than it would be to do something closer to the data.
...
If an organization is already using a business rules infrastructure, it could be adapted to address the validation engine requirements. This would be especially well suited for complex validations that had multiple decision points. However, for an organization that is just looking to solve for the requirements identified, the overhead and complexity of this type of solution would be hard to justify.
Option 2: Testing framework solution
Test driven and continuous integration development methodologies are widely considered best practices in contemporary software development shops, especially those embracing agile development. The ubiquity of these approaches has led to a wide variety of automated testing frameworks, some of which are very database-centric. Our hypothesis is that such a tool could be useful for implementing a data validation testing infrastructure.
...
Testing infrastructures do provide a tool set that would allow for a standardized methodology for storing and defining tests. Potentially this could be leveraged for data validation tests, but the toolset would have to be heavily customized so that it would output specific instances of non-compliant data and so that it would output results to the API. If an organization was already heavily invested in a testing infrastructure this work could be justified, otherwise it might be a hard sell because the amount of effort to implement the customizations is probably less then just doing something from the ground up (per the previous example).
Option 3: Pure SQL Ed-Fi Data Validation Proof-Of-Concept
Overview
This solution shows a simplified version of a data validation engine built using only dynamic SQL.
...
The data model includes three tables:
Table | Description |
---|---|
ValidationRun | This table shows the status of runs. It will have one row for each time the validation engine is started. This should evolve into a community standard. |
ValidationResults | This table has all of the validation results. This is the data that would be consumed by the SIS via the API. This table has a foreign key that ties it back to the validation run. This table should also evolve into a community standard. |
ValidationRules | This is a table that has the SQL that produces validation results. This is what drives the T-SQL validation engine, and should serve as a starting point and example. This would NOT evolve into a community standard - different organizations could implement this functionality in their preferred technology. |
Processing Logic
The prototype includes a stored procedure, RunValidationRules, that will iterate through all of the rules in the ValidationRule table for a given ValidationRuleSet and populate the results into the API tables. When called with a ValidationRuleSet it does:
...
- Obtain the Glendale sample database from here and restore it to a local SQL Server database.
- Change to the local SQL Server database.
Create a new schema 'dv' for the data validation objects.
Code Block language sql create schema dv;
Create the Validation API tables and constraints.
Code Block Code Block language sql IF language sql Create the rules table.
Code Block language sql Create the stored procedure.
collapse true -- drop FK(s) at the beginning IF (OBJECT_ID('dv.RunValidationRulesFK_ValidationResult_ValidationRunId', 'PF') IS NOT NULL) BEGIN ALTER TABLE dv.ValidationResult DROP CONSTRAINT FK_ValidationResult_ValidationRunId END -- tracking table for validation runs IF OBJECT_ID('dv.ValidationRun', 'U') IS NOT NULL BEGIN DROP TABLE dv.ValidationRun; END IF OBJECT_ID('dv.ValidationRun', 'U') IS NOT NULL BEGIN DROP TABLE dv.ValidationRun; END Create table dv.ValidationRun (ValidationRunId int not null identity primary key, ValidationRunStatus varchar(60) not null, ValidationRunStartDateTime datetime not null default current_timestamp, ValidationRunSource varchar(60), ValidationRunFinishDateTime datetime); --tracking table for validation results IF OBJECT_ID('dv.ValidationResult', 'U') IS NOT NULL BEGIN DROP TABLE dv.ValidationResult; END Create table dv.ValidationResult (ValidationResultId int not null identity primary key, ValidationRunId int , ValidationRuleId Varchar(60) not null, ValidationResourceId varchar(60) not null, ValidationResourceType varchar(60) not null, ValidationSeverity varchar(60), EducationOrganizationId int, NameSpace varchar(255), ValidationDetails varchar(2056), ValidationCategory varchar(60), ResultCreatedTime datetime default current_timestamp); -- foriegn key ALTER TABLE dv.ValidationResult ADD CONSTRAINT FK_ValidationResult_ValidationRunId FOREIGN KEY(ValidationRunId) REFERENCES dv.ValidationRun (ValidationRunId) GO
Create the rules table.
Code Block language sql collapse true IF OBJECT_ID('dv.ValidationRules', 'U') IS NOT NULL BEGIN DROP TABLE dv.ValidationRules; END create table dv.ValidationRules (ValidationRuleId Varchar(60) not null primary key, ValidationSQL Varchar(2024) not null, ValidationRuleSet varchar(60), ValidationSeverity varchar(60), ValidationCategory varchar(60), RuleDescription Varchar(2024) ) go
Create the stored procedure.
Code Block language sql collapse true IF OBJECT_ID('dv.RunValidationRules', 'P') IS NOT NULL BEGIN DROP PROC dv.RunValidationRules; END go CREATE PROC dv.RunValidationRules @ValidationRuleSet varchar(60) AS DECLARE @ValidationRuleId varchar(max); DECLARE @ValidationSQL varchar(max); DECLARE @ValidationSeverity varchar(max); DECLARE @ValidationCategory varchar(max); Declare @RunId INT; -- build a cursor that gets all of the validations for a given rule set DECLARE db_cursor CURSOR for SELECT ValidationRuleId, ValidationSQL, ValidationSeverity, ValidationCategory FROM dv.ValidationRules WHERE ValidationRuleSet = @ValidationRuleSet; -- insert a row in the tracking table insert into dv.ValidationRun( validationRunStatus, ValidationRunSource) VALUES ('Running', @ValidationRuleSet); --capture the ID from that insert SELECT @RunId = SCOPE_IDENTITY() ; OPEN db_cursor; FETCH NEXT FROM db_cursor INTO @ValidationRuleId, @ValidationSQL, @ValidationSeverity, @ValidationCategory; WHILE @@FETCH_STATUS = 0 BEGIN -- sub in the following values set @ValidationSQL= REPLACE(@ValidationSQL,'~SEVERITY~',@ValidationSeverity); set @ValidationSQL= REPLACE(@ValidationSQL,'~VALIDATION_RULE_ID~',@ValidationRuleId); set @ValidationSQL= REPLACE(@ValidationSQL,'~VALIDATION_CATEGORY~',@ValidationCategory); set @ValidationSQL= REPLACE(@ValidationSQL,'~VALIDATION_RUN_ID~',@RunId); set @ValidationSQL = concat(' INSERT into dv.ValidationResult (ValidationRunId, ValidationRuleId, ValidationResourceId, ValidationResourceType, ValidationSeverity, EducationOrganizationId, Namespace, ValidationCategory, ValidationDetails) ', @ValidationSQL); print(@ValidationSQL); exec(@ValidationSQL); FETCH NEXT FROM db_cursor INTO @ValidationRuleId, @ValidationSQL, @ValidationSeverity, @ValidationCategory; -- Tell the tracking table we are done update dv.validationRun set ValidationRunStatus = 'Finished', ValidationRunFinishDateTime = current_timestamp where ValidationRunId = @RunId; END close db_cursor DEALLOCATE db_cursor go
Insert the sample rules into the rules table.
Code Block language sql collapse true INSERT INTO dv.ValidationRules (ValidationRuleId, ValidationSQL, ValidationRuleSet, ValidationSeverity, ValidationCategory, RuleDescription) VALUES ('1.1', 'SELECT top(10) ~VALIDATION_RUN_ID~ ValidationRunId, ''~VALIDATION_RULE_ID~'' ValidationRuleId, StudentUniqueId as ValidationResourceId, ''Student'' as ValidationResourceType, ''~SEVERITY~'' as ValidationSeverity, SchoolId as EducationOrganizationId, '''' Namespace, ''~VALIDATION_CATEGORY~'' ValidationCategory, '''' ValidationDetails FROM ( select DISTINCT c.StudentUniqueId , SchoolId from edfi.student c JOIN edfi.StudentSchoolAssociation on c.studentUSI = StudentSchoolAssociation.StudentUSI LEFT JOIN (SELECT studentUSI, b.CodeValue FROM edfi.studentAddress a JOIN edfi.AddressType b on a.AddressTypeId = b.addressTypeId and b.CodeValue in ( ''Physical'', ''Home'') ) d on c.studentUSI = d.studentUSI )e', 'nightly', 'Warning', 'Student', 'Raise a validation warning for every student that does not have a physical address specified. Limit result set to 10.'); INSERT INTO dv.ValidationRules (ValidationRuleId, ValidationSQL, ValidationRuleSet, ValidationSeverity, ValidationCategory, RuleDescription) VALUES ('1.2', 'SELECT top(10) ~VALIDATION_RUN_ID~ ValidationRunId, ''~VALIDATION_RULE_ID~'' ValidationRuleId, StudentUniqueId as ValidationResourceId, ''Student'' as ValidationResourceType, ''~SEVERITY~'' as ValidationSeverity, SchoolId as EducationOrganizationId, '''' Namespace, ''~VALIDATION_CATEGORY~'' ValidationCategory, ValidationDetails FROM ( SELECT distinct studentUniqueId, SchoolId, concat(''exit date '',ExitWithdrawDate) ValidationDetails from edfi.studentSchoolAssociation JOIN edfi.Student on student.StudentUSI = StudentSchoolAssociation.StudentUSI where ExitWithdrawDate is not null and ExitWithdrawTypeDescriptorId is null) tab', 'nightly', 'Major Validation Error', 'Student', 'Raise a major validation error for every StudentSchoolAssociation where there is an ExitWithdrawDate specified but not an ExitWithdrawDescriptor specified. Limit result set to 10.');
- Run the stored procedure and check out the results.
Code Block language sql exec dv.RunValidationRules 'Nightly'; Select * from dv.ValidationRun; Select * from dv.ValidationResult;
...