The Data Import (DI) SIG had 3 goals:

...

Details of this toolkit are below, with links to GitHub for the source code and documentation of each component:

  • Earthmover - CLI tool for transforming collections of tabular source data into a variety of text-based data formats via YAML configuration and Jinja templates.
  • Lightbeam - CLI tool for validating and transmitting payloads from JSONL files into an Ed-Fi API.
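As a rough illustration of the kind of transformation these tools automate, the sketch below turns tabular CSV rows into JSONL payloads, one JSON object per line. It uses only the Python standard library and does not use Earthmover's or Lightbeam's actual configuration or API. The function name `rows_to_jsonl`, the column names, and the output field names are all hypothetical.

```python
import csv
import io
import json


def rows_to_jsonl(csv_text, field_map):
    """Convert CSV text into JSONL, one JSON object per source row.

    field_map maps an output field name to a source column name.
    Both sides of the mapping are hypothetical examples, not a real schema.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        # Build the payload from the mapped columns only.
        payload = {out_name: row[column] for out_name, column in field_map.items()}
        lines.append(json.dumps(payload))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "student_id,first_name\n1,Ana\n2,Bo\n"
    mapping = {"studentUniqueId": "student_id", "firstName": "first_name"}
    print(rows_to_jsonl(sample, mapping))
```

In the real toolkit, this mapping would be expressed through YAML configuration and Jinja templates rather than Python code, and the resulting JSONL files would then be validated and sent to the API in a separate step.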

Ed-Fi Educator Preparation Program Evaluation

...

The Ed-Fi Educator Preparation Program (EPP) works in the higher education space, using Ed-Fi technology to integrate data for evaluating the performance and growth of educator preparation programs. Most data in this domain comes from systems that are not API-ready, such as legacy databases and CSV source files. Data Import already serves this domain and supports the ETL process into Ed-Fi environments. Because EPP can involve high data volumes and enterprise-type environments, the team evaluated open-source, low-cost, and cloud-ready ETL tools.

Below is a summary listing of the tools reviewed as part of this effort:



Outcomes: Ed-Fi will continue with non-enterprise tools and ad-hoc integration. At some point, at scale, the alternatives above deserve a closer look. There is no "perfect line" among these tool solutions.

...

4.) Additional Data Import SIG requests led to these feature requests:

  • John Bailey - Has noticed that the first import of data seems faster than subsequent imports  (SF:  
  • Emilio and Rosh - Would like improved logging. It logs too much, and they often have to truncate the table.
  • Zurab - Duplicated headers in files have caused issues. Emilio - The pre-processor could help with this issue.
  • DI-1135 - Array Format in CSV
  • John Bailey - 1.3.2 included a Docker container, but it is unclear how to kick off a schedule.
  • Zurab - The source code uses a library to work with FTP servers. The library does not work if the FTP server has a certain setting turned on. It works fine with SFTP, but not with FTP.
  • Mike Werner - Documentation is lacking.
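For the duplicated-header issue raised above, a pre-processing step could rename repeated column names before the file is mapped. The sketch below is one possible approach in standalone Python, not the Data Import pre-processor itself. The function names `dedupe_headers` and `preprocess_csv` and the `_2`, `_3` suffix scheme are assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def dedupe_headers(headers):
    """Rename repeated header names by appending _2, _3, and so on.

    The first occurrence keeps its original name. This simple scheme does
    not check whether a generated name (for example "score_2") already
    exists as a separate column in the file.
    """
    seen = Counter()
    result = []
    for name in headers:
        seen[name] += 1
        result.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return result


def preprocess_csv(csv_text):
    """Return CSV text whose header row has unique column names."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text  # Nothing to rewrite in an empty file.
    rows[0] = dedupe_headers(rows[0])
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(preprocess_csv("id,score,score\n1,90,85\n"))
```

Renaming rather than dropping the repeated columns keeps all source values available, so the decision about which column to use is left to the mapping step.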