The Data Import (DI) SIG had 3 goals:
...
Details of this toolkit are below, with links to GitHub for the source code and documentation of each component:
- Earthmover - CLI tool for transforming collections of tabular source data into a variety of text-based data formats via YAML configuration and Jinja templates.
- Lightbeam - CLI tool for validating and transmitting payloads from JSONL files into an Ed-Fi API.
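As a rough illustration of the pattern these two tools cover, the sketch below renders each CSV row through a Jinja template into JSONL, then POSTs each record to an Ed-Fi resource endpoint. It is a minimal Python sketch, not Earthmover or Lightbeam source code; the template fields, file names, endpoint URL, and token are hypothetical.

```python
# Illustrative sketch of the transform-and-load pattern (not the tools' code):
# CSV rows -> Jinja template -> JSONL, then JSONL -> POST to an Ed-Fi API.
import csv
import json

import requests
from jinja2 import Template

# Hypothetical row-level template; Earthmover-style configs pair tabular
# sources with Jinja templates that emit one JSON object per row.
TEMPLATE = Template(
    '{"studentUniqueId": "{{ student_id }}", "firstName": "{{ first_name }}", '
    '"lastSurname": "{{ last_name }}"}'
)

def csv_to_jsonl(csv_path: str, jsonl_path: str) -> None:
    """Render one JSON line per CSV row (the transform step)."""
    with open(csv_path, newline="") as src, open(jsonl_path, "w") as out:
        for row in csv.DictReader(src):
            out.write(TEMPLATE.render(**row) + "\n")

def send_jsonl(jsonl_path: str, api_url: str, token: str) -> None:
    """POST each JSONL record to an Ed-Fi resource endpoint (the load step)."""
    headers = {"Authorization": f"Bearer {token}"}
    with open(jsonl_path) as f:
        for line in f:
            payload = json.loads(line)  # confirm the line parses before sending
            resp = requests.post(api_url, json=payload, headers=headers)
            resp.raise_for_status()

if __name__ == "__main__":
    csv_to_jsonl("students.csv", "students.jsonl")
    send_jsonl(
        "students.jsonl",
        "https://example.edfi.org/data/v3/ed-fi/students",  # hypothetical URL
        token="...",
    )
```

The actual tools add what this sketch omits, such as configuration-driven pipelines, authentication handling, and payload validation.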
Ed-Fi Educator Preparation Program Evaluation
The Ed-Fi Educator Preparation Program (EPP) work applies Ed-Fi technology in the higher-education space to integrate data for evaluating the performance and growth of educator preparation programs. The majority of data in this domain comes from non-API-ready systems, such as legacy databases and CSV source files. Data Import is known to serve this domain, aiding the ETL process into Ed-Fi environments. Because EPP work can involve high volumes of data and enterprise-type environments, the team evaluated open-source, low-cost, and cloud-ready ETL tools. In this evaluation, each tool was proven to load data into Ed-Fi ODS / APIs, with detailed notes captured on performance, the process to install, map, and load data with the tool, and the pros and cons of each tool.
Below is a summary listing of the tools reviewed as part of this effort:
- Standalone Tools
- Cloud-Based Tools
Outcomes: Ed-Fi will continue to rely on non-enterprise tools and ad-hoc integration for now; at the point where at-scale needs emerge, the alternatives above deserve a stronger look. No single tool is a perfect fit across solutions.
Should we begin to look at how to work more closely with the tools above? From this evaluation, Data Import seems to work well for non-enterprise environments and their data needs and processes. The tools listed above work well for enterprise environments but require a level of knowledge and effort to maintain for ETL needs. Each ETL situation is different and should be evaluated against the list of tools to determine the right fit for the project. In the future, Ed-Fi may look at paths to combine the domain knowledge and mapping capabilities of Data Import with pre-existing tools for transforming and loading non-API data, in a hybrid solution approach.
3.) Data Import users will prefer an open-source path forward for the product
...
4.) Additional Data Import SIG requests led to these feature requests
- John Bailey - Has noticed that the first time they import data, it seems faster than subsequent imports (SF:
- Emilio and Rosh - Would like improved logging; the tool logs too much, and they often have to truncate the log table
- Zurab - Duplicated headers in files have posed issues; Emilio noted that the pre-processor could help with this issue (see the sketch after this list)
- DI-1135 - Array Format in CSV
- John Bailey - 1.3.2 included a Docker container, but it is unclear how to kick off a scheduled import
- Zurab - The source code uses a library to work with FTP servers; the library does not work if the FTP server has a certain setting turned on. It works fine with SFTP, but not FTP.
- Mike Werner - Documentation is lacking
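As a rough sketch of the pre-processor idea raised for the duplicated-header issue above, the following Python example (illustrative only, not a Data Import pre-processor, and with hypothetical file names) renames repeated CSV column headers so each becomes unique before import:

```python
# Minimal sketch of the header cleanup a pre-processor could perform:
# rename duplicated CSV column headers so each header becomes unique.
import csv

def dedupe_headers(headers: list[str]) -> list[str]:
    """Append a counter to any header name seen more than once."""
    seen: dict[str, int] = {}
    result = []
    for name in headers:
        if name in seen:
            seen[name] += 1
            result.append(f"{name}_{seen[name]}")
        else:
            seen[name] = 0
            result.append(name)
    return result

def preprocess(in_path: str, out_path: str) -> None:
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(dedupe_headers(next(reader)))  # fix the header row
        writer.writerows(reader)                       # copy data rows as-is

if __name__ == "__main__":
    preprocess("input.csv", "input_clean.csv")  # hypothetical file names
```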