Introduction to Data Quality - Quick Start
- Vinaya Mayya
- Ian Christopher
- Mark Ramon
- Miguel Kaminski
In this primer on data quality, you will check your data for errors using the provided stored procedures. The Introduction to API - Quick Start showed you how to explore the API with Postman. The first collection, SEA Starter Kit, not only exercised the API by executing GET/POST requests but also introduced several data anomalies for use in this validation guide.
NOTE: In the API guide you were instructed to import the Postman environment and collection. There is a second collection, "SEA Modernization Starter Kit Rectification", required for this Validation exercise. If you imported the collection and worked through the Introduction to API guide, then you are set. If not, import it using Postman, but do not run that collection yet.
Now it's time to run the validation process. Connect to your server using SQL Server Management Studio (SSMS), then use Object Explorer to navigate to your Ed-Fi ODS. Expand the "Programmability" folder and then "Stored Procedures." There you will find the included stored procedure, which performs validation checks on your data.
Execute the stored procedure by right-clicking it and selecting "Execute Stored Procedure."
When prompted for parameter values, enter "all" for @StateOrganizationId and "2023" for @Datayear, then click "OK" to execute the procedure.
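If you prefer to script this step rather than use the SSMS dialog, the same call can be sketched in Python. This is a minimal illustration, not part of the Starter Kit: the helper name `build_validation_call` is hypothetical, and passing both values as strings is an assumption based on how they are entered in the dialog. The statement uses `?` placeholders in the style accepted by ODBC drivers; it only builds the statement and does not connect to a server.

```python
def build_validation_call(state_organization_id="all", data_year="2023"):
    """Return a parameterized T-SQL statement and its argument tuple.

    The procedure and parameter names come from this guide; the string
    types of the two values are an assumption.
    """
    sql = (
        "EXEC validation.LoadValidationErrors "
        "@StateOrganizationId = ?, @Datayear = ?"
    )
    return sql, (state_organization_id, data_year)


sql, params = build_validation_call()
print(sql)
print(params)
```

With a database driver of your choice, the returned statement and tuple would typically be handed to the cursor's execute call; check your procedure's actual parameter types before relying on this.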
Once you've run the "validation.LoadValidationErrors" procedure, navigate to the "Tables" folder in Object Explorer and expand it to view the database tables. Locate "validation.DistrictErrorLog". This table stores the validation errors produced by the stored procedure.
You can now write a simple query to view the records in the table. If the table holds only a few errors and you have no need to filter the results, you can instead right-click the table and select "Select Top 1000 Rows".
The result of the query for our example shows all the errors stored in the "validation.DistrictErrorLog" table. You'll notice the validation procedure found one student record with over 100% FullTimeEquivalency and another record with no race reported for the student.
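As a rough sketch of the "simple query" mentioned above, the following Python helper builds a T-SQL statement equivalent to the "Select Top 1000 Rows" shortcut. The helper name `build_error_log_query` is hypothetical, and because this guide does not list the table's columns, the query selects all of them rather than guessing at names.

```python
def build_error_log_query(limit=1000):
    """Build a T-SQL query returning up to `limit` rows of the error log."""
    # Reject anything that is not a positive integer so the value can be
    # embedded in the statement safely.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT TOP ({limit}) * FROM validation.DistrictErrorLog;"


print(build_error_log_query())
```

To filter the results, you would add a WHERE clause on whichever columns your copy of the table exposes.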
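To make the two findings concrete, here is a small Python sketch of the kind of rule each error represents. It is not the logic of the provided stored procedure: the field names, the made-up sample records, and the convention that full-time equivalency is stored as a fraction (1.0 meaning 100%) are all assumptions for illustration.

```python
def find_errors(records):
    """Return (student_id, message) pairs for records that break a rule."""
    errors = []
    for rec in records:
        fte = rec.get("full_time_equivalency")
        # Rule 1: full-time equivalency may not exceed 100% (1.0 here).
        if fte is not None and fte > 1.0:
            errors.append((rec["student_id"], "FullTimeEquivalency over 100%"))
        # Rule 2: at least one race must be reported for the student.
        if not rec.get("races"):
            errors.append((rec["student_id"], "No race reported"))
    return errors


# Made-up records for illustration only.
sample = [
    {"student_id": "A", "full_time_equivalency": 1.25, "races": ["White"]},
    {"student_id": "B", "full_time_equivalency": 1.0, "races": []},
    {"student_id": "C", "full_time_equivalency": 0.5, "races": ["Asian"]},
]

for student_id, message in find_errors(sample):
    print(student_id, message)
```

In this sample, student A trips the first rule and student B the second, mirroring the two kinds of error described above.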
Now that we've worked through running validations and examining the errors, let's use another Postman collection to resolve the errors. Open up Postman as an Administrator if it's not already open. Navigate to the Collections tab on the left side of the application screen and select it. You'll see the collections you've imported, including the "SEA Modernization Starter Kit Rectification" collection we covered in the opening section of this guide. Locate the collection and select it, then click on the "Run" button.
This opens the Runner window in Postman. Just as when you ran the initial collection, all of the GET/POST requests are selected. Click the "Run SEA Modernization Starter Kit Rectification" button.
A successful run will show all "Pass" values and status codes of 200 or 201.
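The success criterion above can be expressed as a short check. The sketch below assumes you have copied the run results into a list of dictionaries; the keys `name`, `status`, and `passed`, and the request names in the sample, are hypothetical and do not reflect Postman's own export format.

```python
SUCCESS_CODES = {200, 201}


def failed_requests(results):
    """Return the results that did not pass or lack a 200/201 status."""
    return [
        r for r in results
        if not (r["passed"] and r["status"] in SUCCESS_CODES)
    ]


# Made-up results for illustration only.
results = [
    {"name": "Update student record", "status": 200, "passed": True},
    {"name": "Create association", "status": 201, "passed": True},
    {"name": "Post bad payload", "status": 400, "passed": False},
]

for r in failed_requests(results):
    print(r["name"], r["status"])
```

An empty list from `failed_requests` corresponds to a run in which every request shows "Pass" with a 200 or 201 status.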
After the Rectification collection has run, return to SSMS and rerun the validation stored procedure.
View the results in the "validation.DistrictErrorLog" table. Your output should now show no errors, as the Rectification collection has resolved the data errors.
Reminder
States validate data according to the state business rules and log any errors. Typically, errors are reported back to the LEA via a state error portal. The LEA then fixes the errors, and the data is re-transmitted to the API. This Quick Start contains sample validation scripts.