Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Prepared for the Ed-Fi Alliance by: Jonathan Hickam, Learning Tapestry

Contents

Table of Contents
maxLevel2

Introduction

This document has been prepared as part of a larger initiative that is looking at scalable, economical, and reusable solutions for level 2 validations.  For further context, architecture, and vocabulary refer to the associated Ed-Fi Validation Architecture document.

A high-level assessment of a wide variety of open source solutions was undertaken that could potentially address all or some of the Ed-Fi Level 2 Data Validation System Requirements.  A high-level analysis of the tools that were considered can be found hereat  Survey of Rule Engines.  However, the final approach taken was to take a deeper look at three different types of solutions from this list. The first one was to look at an open-source business rules engine . The second was to experiment with using a database-centric testing framework. The third was to look at what a pure-SQL implementation without external tool dependencies might look like.

Option 1: Business rule engine solution

Our initial hypothesis when we started this evaluation was that one of the many open source business rules engines would be able to address a significant part of our requirements.  During our analysis we settled in on two Java-based business rules engines, Drools (https://www.drools.org/) and Open Rules (https://openrules.com/).  For our prototype we decided to try Open Rules, which is geared more towards business analysts instead of Java developers.

...

The resulting spreadsheet can be seen here.

Results

Business rule engines are well suited for complex logic flows and decision trees.  However, most of the validation rules that were gathered as examples do not necessarily have complex linear decisions, it is more of a matter of finding specific examples in a complex data set.   This means that expressing these rules in the language that the rules engine is looking for might require a level of abstraction and complexity than it would be to do something closer to the data.

...

If an organization is already using a business rules infrastructure, it could be adapted to address the validation engine requirements.  This would be especially well suited for complex validations that had multiple decision points. However, for an organization that is just looking to solve for the requirements identified, the overhead and complexity of this type of solution would be hard to justify.

Option 2: Testing framework solution

Test driven and continuous integration development methodologies are widely considered best practices in contemporary software development shops, especially those embracing agile development.  The ubiquity of these approaches has led to a wide variety of automated testing frameworks, some of which are very database-centric. Our hypothesis is that such a tool could be useful for implementing a data validation testing infrastructure.

...

Testing infrastructures do provide a tool set that would allow for a standardized methodology for storing and defining tests.  Potentially this could be leveraged for data validation tests, but the toolset would have to be heavily customized so that it would output specific instances of non-compliant data and so that it would output results to the API.  If an organization was already heavily invested in a testing infrastructure this work could be justified, otherwise it might be a hard sell because the amount of effort to implement the customizations is probably less then just doing something from the ground up (per the previous example).

Option 3: Pure SQL Ed-Fi Data Validation Proof-Of-Concept

Overview

This solution shows a simplified version of a data validation engine built using only dynamic SQL. 

...

The data model includes three tables:


TableDescription

ValidationRun

This table shows the status of runs.  It will have one row for each time the validation engine is started. This should evolve into a community standard.

ValidationResults

This table has all of the validation results. This is the data that would be consumed by the SIS via the API.  This table has a foreign key that ties it back to the validation run. This table should also evolve into a community standard.

ValidationRules

This is a table that has the SQL that produces validation results.  This is what drives the T-SQL validation engine, and should serve as a starting point and example.  This would NOT evolve into a community standard - different organizations could implement this functionality in their preferred technology.

Processing Logic

The prototype includes a stored procedure, RunValidationRules, that will iterate through all of the rules in the ValidationRule table for a given ValidationRuleSet and populate the results into the API tables.  When called with a ValidationRuleSet it does:

...

  1. Obtain the Glendale sample database from here  and restore it to a local SQL Server database.
  2. Change to the local SQL Server database.
  3. Create a new schema 'dv' for the data validation objects.

    Code Block
    languagesql
    create schema dv;


  4. Create the Validation API tables and constraints.

    IF
    Code Block
    Code Block
    languagesql
    languagesql

    Create the rules table.

    Code Block
    languagesql

    Create the stored procedure.

    collapsetrue
    -- drop FK(s) at the beginning
    IF (OBJECT_ID('dv.RunValidationRulesFK_ValidationResult_ValidationRunId', 'PF') IS NOT NULL)
    BEGIN
        ALTER TABLE dv.ValidationResult DROP CONSTRAINT FK_ValidationResult_ValidationRunId
    END
    
    -- tracking table for validation runs
    IF OBJECT_ID('dv.ValidationRun', 'U') IS NOT NULL 
    BEGIN
      DROP TABLE dv.ValidationRun; 
    END
      
    
    IF OBJECT_ID('dv.ValidationRun', 'U') IS NOT NULL 
    BEGIN
      DROP TABLE dv.ValidationRun; 
    END
      
    Create table dv.ValidationRun
    	(ValidationRunId int not null identity primary key,
    	ValidationRunStatus varchar(60)  not null,
    	ValidationRunStartDateTime datetime not null default current_timestamp,
          ValidationRunSource varchar(60),
    	ValidationRunFinishDateTime datetime);
    
    --tracking table for validation results
    IF OBJECT_ID('dv.ValidationResult', 'U') IS NOT NULL 
    BEGIN
      DROP TABLE dv.ValidationResult; 
    END
     
    Create table dv.ValidationResult
    	(ValidationResultId int not null identity primary key,
    	ValidationRunId int ,
    	ValidationRuleId Varchar(60) not null,
    	ValidationResourceId varchar(60) not null,
    	ValidationResourceType varchar(60) not null,
        ValidationSeverity varchar(60),
    EducationOrganizationId int,
    NameSpace varchar(255),
    ValidationDetails varchar(2056),
    ValidationCategory varchar(60),
    	ResultCreatedTime datetime default current_timestamp);
    
    
    
    -- foriegn key
    ALTER TABLE dv.ValidationResult 
    	ADD  CONSTRAINT FK_ValidationResult_ValidationRunId FOREIGN KEY(ValidationRunId)
    		REFERENCES dv.ValidationRun (ValidationRunId)
    GO


  5. Create the rules table.

    Code Block
    languagesql
    collapsetrue
    IF OBJECT_ID('dv.ValidationRules', 'U') IS NOT NULL
    BEGIN 
      DROP TABLE dv.ValidationRules; 
     END
    
    create table dv.ValidationRules
    (ValidationRuleId Varchar(60) not null primary key,
    ValidationSQL Varchar(2024) not null,
    ValidationRuleSet	varchar(60),
    ValidationSeverity varchar(60),
    ValidationCategory varchar(60),
    RuleDescription Varchar(2024)
    )
    go


  6. Create the stored procedure.

    Code Block
    languagesql
    collapsetrue
    IF OBJECT_ID('dv.RunValidationRules', 'P') IS NOT NULL
    BEGIN 
      DROP PROC dv.RunValidationRules; 
     END
    go
    
    
    CREATE PROC  dv.RunValidationRules  @ValidationRuleSet varchar(60)
    AS
    	DECLARE @ValidationRuleId  varchar(max);
    	DECLARE @ValidationSQL  varchar(max);
    	DECLARE @ValidationSeverity  varchar(max);
    	DECLARE @ValidationCategory  varchar(max);
    	
    	Declare @RunId INT;
    
    	-- build a cursor that gets all of the validations for a given rule set
    	DECLARE db_cursor CURSOR for 
    		SELECT ValidationRuleId,
    				ValidationSQL,
    				ValidationSeverity,
    				ValidationCategory
    		FROM dv.ValidationRules
    	WHERE ValidationRuleSet = @ValidationRuleSet;
    
    
    	-- insert a row in the tracking table
    	insert into dv.ValidationRun(
    	validationRunStatus, ValidationRunSource) VALUES
    	('Running', @ValidationRuleSet);
    
    	--capture the ID from that insert
    	SELECT @RunId =  SCOPE_IDENTITY() ;
    
    	OPEN db_cursor;
    
    	FETCH NEXT FROM db_cursor INTO @ValidationRuleId,
    				@ValidationSQL,
    				@ValidationSeverity,
    				@ValidationCategory;
    	WHILE @@FETCH_STATUS = 0
    	BEGIN
    		-- sub in the following values
    		set @ValidationSQL= REPLACE(@ValidationSQL,'~SEVERITY~',@ValidationSeverity);
    		set @ValidationSQL= REPLACE(@ValidationSQL,'~VALIDATION_RULE_ID~',@ValidationRuleId);
    		set @ValidationSQL= REPLACE(@ValidationSQL,'~VALIDATION_CATEGORY~',@ValidationCategory);
    		set @ValidationSQL= REPLACE(@ValidationSQL,'~VALIDATION_RUN_ID~',@RunId);
    		set @ValidationSQL = concat('
    			INSERT into dv.ValidationResult
    				(ValidationRunId, ValidationRuleId, ValidationResourceId, ValidationResourceType, ValidationSeverity,
    				EducationOrganizationId, Namespace, ValidationCategory, ValidationDetails) 
    				 ', @ValidationSQL);
    	
    		print(@ValidationSQL);
    		exec(@ValidationSQL);
    				
    	FETCH NEXT FROM db_cursor INTO @ValidationRuleId,
    				@ValidationSQL,
    				@ValidationSeverity,
    				@ValidationCategory;
    
    	-- Tell the tracking table we are done
    	
    	update dv.validationRun
    	set ValidationRunStatus = 'Finished',
    		ValidationRunFinishDateTime = current_timestamp
    	where ValidationRunId = @RunId;
    	END
    
    	close db_cursor
    	DEALLOCATE db_cursor
    go


  7. Insert the sample rules into the rules table.

    Code Block
    languagesql
    collapsetrue
    INSERT INTO dv.ValidationRules (ValidationRuleId, 
    ValidationSQL,
    ValidationRuleSet,
    ValidationSeverity,
    ValidationCategory,
    RuleDescription)
    VALUES
    ('1.1',
    'SELECT top(10)
    	~VALIDATION_RUN_ID~ ValidationRunId,
    	''~VALIDATION_RULE_ID~'' ValidationRuleId,
    	StudentUniqueId as ValidationResourceId,
    	''Student'' as ValidationResourceType,
    	''~SEVERITY~'' as ValidationSeverity,
    	SchoolId as EducationOrganizationId,
    	'''' Namespace,
    	''~VALIDATION_CATEGORY~'' ValidationCategory,
    	'''' ValidationDetails
    FROM (	
    	select DISTINCT
    		c.StudentUniqueId ,
    		SchoolId
    	from edfi.student c
    	JOIN edfi.StudentSchoolAssociation on c.studentUSI = StudentSchoolAssociation.StudentUSI
    	LEFT JOIN 
    		(SELECT studentUSI, b.CodeValue
    		FROM edfi.studentAddress a
    			JOIN edfi.AddressType b	on a.AddressTypeId = b.addressTypeId
    		and b.CodeValue in ( ''Physical'', ''Home'') 
    		) d on c.studentUSI = d.studentUSI
    		)e',
    'nightly',
    'Warning',
    'Student',
    'Raise a validation warning for every student that does not have a physical address specified. Limit result set to 10.');
    
    
    
    INSERT INTO dv.ValidationRules (ValidationRuleId, 
    ValidationSQL,
    ValidationRuleSet,
    ValidationSeverity,
    ValidationCategory,
    RuleDescription)
    VALUES
    ('1.2',
    'SELECT top(10)
    	~VALIDATION_RUN_ID~ ValidationRunId,
    	''~VALIDATION_RULE_ID~'' ValidationRuleId,
    	StudentUniqueId as ValidationResourceId,
    	''Student'' as ValidationResourceType,
    	''~SEVERITY~'' as ValidationSeverity,
    	SchoolId as EducationOrganizationId,
    	'''' Namespace,
    	''~VALIDATION_CATEGORY~'' ValidationCategory,
    	ValidationDetails
    FROM (
    SELECT distinct studentUniqueId, SchoolId,
    concat(''exit date '',ExitWithdrawDate) ValidationDetails
    from edfi.studentSchoolAssociation JOIN
         edfi.Student on student.StudentUSI = StudentSchoolAssociation.StudentUSI
    where ExitWithdrawDate is not null
    and ExitWithdrawTypeDescriptorId is null) tab',
    'nightly',
    'Major Validation Error',
    'Student',
    'Raise a major validation error for every StudentSchoolAssociation where there is an ExitWithdrawDate specified but not an ExitWithdrawDescriptor specified. Limit result set to 10.');


  8. Run the stored procedure and check out the results.


    Code Block
    languagesql
    exec dv.RunValidationRules 'Nightly';
    Select * from dv.ValidationRun;
    Select * from dv.ValidationResult;


...