The Data Import (DI) SIG had 3 goals:
...
- EdAnalytics
- Great contribution to ETL tooling, plus a report-out from EA
- Built with the cloud in mind, and Python-based too
- https://github.com/edanalytics/earthmover
- https://github.com/edanalytics/lightbeam
- EPP alternatives eval
- Link to the published report (comes out today)
- Summary brief of tools used:
- Talend (free version)
- NiFi (open source)
- Azure Data Factory (cloud-based, Databricks)
- AWS Glue (cloud-based, Databricks)
Outcomes: Ed-Fi will continue with non-enterprise tools and ad-hoc integration. At some point, for at-scale use, the alternatives above deserve a closer look. There is no "perfect line" among these tool solutions.
Should we begin looking at how to work more closely with the tools above?
3.) Data Import users will prefer an open-source path ahead for the product
- Based on conversations heard in 2022, Ed-Fi moved to make Data Import open source
- In November 2022, we released Data Import 2.0 to an open-source repo
- We've moved the Template Sharing Service (TSS) to a GitHub Exchange repo
4.) Additional Data Import SIG requests led to these feature requests
- John Bailey - has noticed that the first import of data seems faster than subsequent imports (SF:
- Emilio and Rosh - would like improved logging. It logs too much, and they often have to truncate the table
- Zurab - duplicated headers in files have posed issues; Emilio - the pre-processor could help with this issue
- DI-1135 - Array Format in CSV
- John Bailey - 1.3.2 included a Docker container, but it is unclear how to kick off a schedule
- Zurab - Source code uses a library to work with FTP servers. The library does not work if the FTP server has a certain setting turned on. Works fine with SFTP, but not FTP.
- Mike Werner - Documentation is lacking
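
For the duplicated-headers item above, a pre-processing step could rename repeated column names before the file reaches the import. The following is only a minimal sketch in Python, not the Data Import pre-processor itself. The function names `dedupe_headers` and `preprocess_csv` and the `_2`, `_3` suffix convention are assumptions made for illustration.

```python
import csv
import io


def dedupe_headers(header):
    """Return a copy of the header row with repeated names suffixed _2, _3, ...

    Hypothetical helper: the suffix convention is an assumption, and this
    sketch does not guard against a file that already has a column
    literally named e.g. "name_2".
    """
    seen = {}
    result = []
    for name in header:
        count = seen.get(name, 0) + 1
        seen[name] = count
        result.append(name if count == 1 else f"{name}_{count}")
    return result


def preprocess_csv(text):
    """Rewrite only the first row of CSV text so that header names are unique."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return text
    rows[0] = dedupe_headers(rows[0])
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(preprocess_csv("id,name,name\n1,Ann,Lee\n"), end="")
```

Data rows are left untouched; only the header row changes, so a mapping could then refer to each column by a unique name.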