Like-My-District Configuration Using NCES CCD Data
As a design document, this may not 100% match the implementation.
Overview
Sample Data Generator (SDG) is an Ed-Fi application that generates realistic, cohesive, yet fictional data sets for use in Ed-Fi ODS / API instances. Rather than producing purely random values, it generates data for scenarios commonly found in education; for example, low-performing students consistently show low-performance patterns. Sample data from this tool can be used for demonstrations, load testing, and other evaluative scenarios that SEAs and LEAs need when deciding whether to incorporate Ed-Fi technology into their environments. Sample Data Generator outputs Ed-Fi XML, which can be bulk imported into an ODS / API, and covers much of the core data structure, such as education organizations, students, staff, courses, attendance, discipline, and more, providing a strong basis for test scenarios with rich data.
While Sample Data Generator is a well-built, well-documented tool, usage in the field has been limited to date despite advocacy and promotion. End users report that Sample Data Generator has a significant learning curve when configuring it for custom data use cases, and that learning curve appears to block the majority of end users who seek to use the tool. The tool appears best suited to developers who can customize the configuration, whereas data and business analysts, who are known to need a tool like this, find the configuration tasks a blocker to their work. This feature proposal sets forth an idea to help generate configuration for SDG, making the tool more usable "out-of-the-box" and accelerating usage in the field.
Feature Proposal
The feature proposal is to find a method of using NCES CCD data to generate SDG XML configuration tied to LEA/SEA IDs as a basis for sample data. NCES CCD data would be used to generate SDG configuration that models known characteristics of an LEA or SEA, such as agency name and location data (from SEA down to school); student and staff counts; sex, race, and ethnicity; free and reduced lunch program information; disability information; and other known attributes. The end user would be prompted for an LEA or SEA NCES ID (via a lookup tool); once it is entered, SDG would use that ID to retrieve information from NCES CCD source files and populate the SDG XML configuration. By reusing pre-existing data produced by NCES, this would save the end user the daunting and time-consuming task of modeling their agency manually.
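The lookup step described above could be sketched roughly as follows. This is a minimal illustration, not the SDG implementation: the column names (LEAID, LEA_NAME, LCITY, LSTATE, LZIP) are assumed to resemble a CCD LEA directory file, the rows are invented, and the helper names `normalize_lea_id` and `find_lea` are hypothetical.

```python
import csv
import io

# Tiny stand-in for a CCD LEA directory file. The column names are
# assumptions about the real file layout and the rows are invented.
SAMPLE_LEA_DIRECTORY = """\
LEAID,LEA_NAME,LCITY,LSTATE,LZIP
9900001,Example Unified School District,Springfield,ZZ,00001
9900002,Sample County Schools,Riverton,ZZ,00002
"""


def normalize_lea_id(raw_id):
    """Strip whitespace and left-pad to 7 digits so '100005' matches '0100005'."""
    cleaned = raw_id.strip()
    if not cleaned.isdigit() or len(cleaned) > 7:
        raise ValueError(f"not a valid NCES LEA ID: {raw_id!r}")
    return cleaned.zfill(7)


def find_lea(csv_file, lea_id):
    """Return the first directory row whose LEAID matches, or None."""
    wanted = normalize_lea_id(lea_id)
    for row in csv.DictReader(csv_file):
        if normalize_lea_id(row["LEAID"]) == wanted:
            return row
    return None


# Simulate the end user typing an ID with stray whitespace.
lea = find_lea(io.StringIO(SAMPLE_LEA_DIRECTORY), " 9900001 ")
print(lea["LEA_NAME"] if lea else "LEA not found")
```

Normalizing the ID before comparing guards against leading zeros being lost when a file has been opened and re-saved in a spreadsheet, which is a common hazard with numeric-looking identifiers.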
- EDFI-249 has been registered to represent this feature request. EdWire has validated the need and flagged the feature request using NCES CCD data via Slack.
Note that this feature will not solve all known issues with SDG. School calendars, course names, and other data generated from "seed files" may still need to be customized to match an LEA/SEA scenario. Not all domains and elements are generated by SDG, so entities that are not currently supported would require code customization. In addition, each released data model triggers additional manual work within SDG that must be accommodated. These are not trivial efforts and will remain barriers to adoption and usage of SDG.
Common Core of Data Overview
The U.S. Department of Education manages an office called the National Center for Education Statistics (NCES), which is responsible for many of the official statistics published by the Department. The Common Core of Data (CCD) is an annual survey of K-12 schools, districts, and state agencies. The Common Core of Data should not be confused with the "Common Core" education standards; although they share a name, the two efforts are unrelated. The Common Core of Data contains statistics and directory information for state agencies, districts, and schools across the U.S. and its territories. Much of this data is generated via the EDFacts state submission process and is required for S/LEAs to receive federal funding. Below is an overview of what can be found in NCES CCD data.
Data Overview
- State Education Agency (SEA) Level Data
- Data (CSV and SAS format)
- Directory Information - State Agency Name, State, Address (Mailing and Location), Web Site and Number of Schools in System
- Membership Information - Counts of Students in each state system, delineated by grade level, race/ethnicity and sex
- Staff Information - Counts of Staff in each state system, delineated by job type (elementary, secondary, counselors, support, etc.)
- Documentation for Data (Excel format)
- Local Education Agency (LEA) Level Data
- Data (CSV and SAS format)
- Directory Information - District Agency Name, State, Address (Mailing and Location), Web Site and Number of Schools in System
- Membership Information - Counts of Students in each district system, delineated by grade level, race/ethnicity and sex
- Staff - Counts of Staff in each district, delineated by job type (elementary, secondary, counselors, support, etc.)
- Children with Disabilities
- English Learners
- Documentation for Data (Excel format)
- School Level Data
- Data (CSV and SAS format)
- Directory Information - School Name, State, Address (Mailing and Location), Grade Levels offered
- Membership Information - Counts of Students in each school
- Staff - Counts of Staff in each school
- School Characteristics
- Lunch Program Eligibility
- Documentation for Data (Excel format)
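To illustrate how the membership counts listed above might be condensed into per-school distributions, here is a minimal sketch. It assumes a long layout with one row per school, grade, and sex combination; the column names (NCESSCH, GRADE, SEX, STUDENT_COUNT) are assumptions, the rows are invented, and `count_by` and `as_shares` are hypothetical helpers. If the real file carries subtotal or total rows, those would need to be filtered out first to avoid double counting.

```python
import csv
import io
from collections import Counter

# Invented rows in a long layout: one row per school/grade/sex combination.
# Column names are assumptions about the real membership file.
SAMPLE_MEMBERSHIP = """\
NCESSCH,GRADE,SEX,STUDENT_COUNT
990000100001,Grade 1,Female,30
990000100001,Grade 1,Male,20
990000100001,Grade 2,Female,25
990000100001,Grade 2,Male,25
990000100002,Grade 1,Female,10
"""


def count_by(csv_file, school_id, column):
    """Sum STUDENT_COUNT for one school, grouped by the given column."""
    totals = Counter()
    for row in csv.DictReader(csv_file):
        if row["NCESSCH"] != school_id:
            continue
        count = row["STUDENT_COUNT"].strip()
        if count.isdigit():  # skip blank or non-numeric counts
            totals[row[column]] += int(count)
    return totals


def as_shares(totals):
    """Convert counts to fractions of the overall total."""
    overall = sum(totals.values())
    if not overall:
        return {}
    return {key: value / overall for key, value in totals.items()}


by_grade = count_by(io.StringIO(SAMPLE_MEMBERSHIP), "990000100001", "GRADE")
by_sex = count_by(io.StringIO(SAMPLE_MEMBERSHIP), "990000100001", "SEX")
print(dict(by_grade))
print(as_shares(by_sex))
```

Shares rather than raw counts may be the more useful output when the generated data set is meant to be smaller than the real agency but keep its proportions.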
Summary of Updates for Generating Data From NCES CCD Data
The proposal is to generate Sample Data Generator XML configuration and CSV seed data from NCES CCD data. Below is a list of the main touchpoints where NCES CCD data files would feed Sample Data Generator's configuration. Note that this is not a comprehensive guide; other files may contain more data to enhance the base updates below.
Sample Data Generator Files
- Samples\SampleDataGenerator\SampleConfig.Xml (or NorthridgeConfig.xml) - base structure for generated Ed-Fi sample data
- Samples\SampleDataGenerator\DataFiles - directory of CSV seed data files
NCES CCD Updates into Sample Data Generator Configuration and Seed Data
Entity | NCES CCD File | XML Configuration File Updates (XPath) | CSV Seed Data Updates (Filename) |
---|---|---|---|
LocalEducationAgency | ccd_lea_029_1819_w_1a_091019.csv (LEA Data, Directory Information) | /SampleDataGeneratorConfig/DistrictProfile/DistrictName; /SampleDataGeneratorConfig/DistrictProfile/LocationInfo | Samples\DataFiles\LocalEducationAgency.csv |
School | ccd_sch_029_1819_w_1a_091019.csv (School Data, Directory Information) | /SampleDataGeneratorConfig/DistrictProfile/DistrictName/SchoolProfile[x]; /SampleDataGeneratorConfig/DistrictProfile/DistrictName/SchoolProfile[x]/StudentPopulationProfile; /SampleDataGeneratorConfig/DistrictProfile/DistrictName/SchoolProfile[x]/GradeProfile | Samples\DataFiles\School.csv |
School / Staff | ccd_sch_059_1819_l_1a_091019.csv (School Data, Staff Information) | /SampleDataGeneratorConfig/DistrictProfile/DistrictName/SchoolProfile[x]/StaffProfile | |
School / Student | ccd_SCH_052_1819_l_1a_091019.csv (School Data, Membership Information) | /SampleDataGeneratorConfig/DistrictProfile/DistrictName/SchoolProfile[x]/StudentProfile | |
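As a rough sketch of the LocalEducationAgency row in the table, the snippet below turns one LEA directory record into a configuration fragment containing DistrictName and LocationInfo. The input keys (LEA_NAME, LCITY, LSTATE) are assumed column names, the record itself is invented, and the City and State children of LocationInfo are hypothetical placeholders; the actual SDG schema may structure location data differently.

```python
import xml.etree.ElementTree as ET

# Invented directory record; the keys are assumed CCD column names.
SAMPLE_LEA_ROW = {
    "LEA_NAME": "Example Unified School District",
    "LCITY": "Springfield",
    "LSTATE": "ZZ",
}


def build_district_profile(lea_row):
    """Build a config fragment with DistrictName and LocationInfo."""
    root = ET.Element("SampleDataGeneratorConfig")
    district = ET.SubElement(root, "DistrictProfile")
    ET.SubElement(district, "DistrictName").text = lea_row["LEA_NAME"]

    # The children of LocationInfo are hypothetical; check the SDG
    # configuration schema for the real element names.
    location = ET.SubElement(district, "LocationInfo")
    ET.SubElement(location, "City").text = lea_row["LCITY"]
    ET.SubElement(location, "State").text = lea_row["LSTATE"]
    return root


config = build_district_profile(SAMPLE_LEA_ROW)
xml_text = ET.tostring(config, encoding="unicode")
print(xml_text)
```

Building the fragment with an XML library rather than string concatenation means agency names containing characters such as `&` are escaped automatically.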
References
- https://nces.ed.gov/ccd/ - National Center for Education Statistics, Common Core of Data general web site
- https://nces.ed.gov/ccd/files.asp - Direct data files for NCES CCD data. Be aware that NCES CCD data is heavily reviewed, normalized, and scrutinized before publishing and is time-lagged by 12-24 months. "Provisional" and "preliminary" data are early-access releases of partial data. To feed full data into SDG, the most recent official release is the best choice because it is the most comprehensive.
- https://nces.ed.gov/ccd/files.asp#Fiscal:2,SchoolYearId:33,Page:1 - Accessing 2018-2019 raw data - SEA, LEA and school level data