Ed-Fi Sample Data Generator


The Sample Data Generator (SDG) produces realistic, cohesive, yet fictional datasets for demonstrations, testing, and other scenarios useful to Ed-Fi implementations, without using actual data. The SDG favors statistically realistic patterns: for example, a student with poor attendance generally has poor grades, students are the appropriate age for their grade level, and students who are English learners have home languages that track to their ethnicity. The system is configurable and can produce arbitrarily large datasets. Although the SDG creates data with realistic patterns, the data is randomly generated and must not be used in place of real-world data for purposes such as training machine learning models or other algorithmic approaches.
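To make the idea of "statistically realistic patterns" concrete, here is a minimal Python sketch of correlated random generation. It is an illustration only and is not taken from the SDG code base: the function name `generate_student`, the field names, and all numeric ranges are hypothetical assumptions chosen for the example.

```python
import random

def generate_student(rng, grade_level):
    """Return one fictional student record with correlated attributes.

    Hypothetical illustration only; names and ranges are assumptions,
    not the SDG's actual model.
    """
    # Age tracks grade level: grade 0 (kindergarten) is assumed to be
    # age 5, with some students one year older.
    age = grade_level + 5 + rng.choice([0, 0, 0, 1])

    # Attendance rate is drawn uniformly between 70% and 100%.
    attendance = rng.uniform(0.70, 1.00)

    # The grade average rises with attendance, plus random noise, so poor
    # attendance generally (but not always) goes with poor grades.
    base = 50 + (attendance - 0.70) / 0.30 * 40
    grade_average = max(0.0, min(100.0, base + rng.gauss(0, 5)))

    return {
        "grade_level": grade_level,
        "age": age,
        "attendance_rate": round(attendance, 3),
        "grade_average": round(grade_average, 1),
    }

# A fixed seed makes the fictional dataset reproducible.
rng = random.Random(42)
students = [generate_student(rng, rng.randint(0, 12)) for _ in range(1000)]
```

Each attribute is drawn at random, but dependent attributes are derived from the ones they should track, which is one simple way to keep a fictional dataset internally consistent.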

Starting with SDG v1.3, EdGraph contributed a major addition to the SDG code base that enables automated configuration using NCES CCD (National Center for Education Statistics Common Core of Data) data, a listing of local education agencies and schools. The SDG can be used either with automated configuration, which draws on existing NCES CCD data to match realistic scenarios, or with manual configuration for other scenarios as needed. The documentation below covers both use cases, along with developer information for customizing or extending the tool. SDG v1.3 supports Ed-Fi data standard v3.3a; older versions of the tool are available for prior data standard versions.



  • By: Ed-Fi Alliance (SDG v1.0-1.2), with major contributions from EdGraph (SDG v1.3)
  • License Terms: Apache 2.0 License
  • Released: February 2022

At a Glance

Generation: Tech Suite 3