DI SIG - September 8, 2022
Attendees
Support
Marcos Alcozer, Nancy Wilson, Ann Su - Ed-Fi Governance Support
The meeting was held via WebEx 2022-09-08 1:00-2:15pm CT
Meeting Materials PPT
Agenda/Notes
Data Import - Meeting One Agenda - "Data Import in Review"
- Welcome and Intros
- Participants introduced themselves with name, org, and role
- Data Import Review, SIG Goals and Data Import 1.4
- Jason - Meeting is focused on reviewing SIG goals, understanding how Data Import is used today, and understanding current pain points
- Jason -
- Data Import is an ETL solution for data that cannot get modernized on the API path
- Transforming data to the Ed-Fi format
- Loading into the Ed-Fi API
- Would like to discuss alternative ETL tools that perhaps didn’t exist when Data Import was initially developed
- Data Import is currently under the Ed-Fi License, but is moving to Apache 2.0 open-source
- Data Import is LEA-sized. Performance suffers when CSVs exceed 100k rows
- Next release of Data Import is coming in November
- SSO/OIDC/2FA improvements
- .NET 6 and multi-threaded performance enhancements
- John Bailey - has noticed that the first import of data seems faster than subsequent imports
- Stephen - We need some more information, let’s get a Tracker ticket opened for that one
- What's working with Data Import?
- Emilio
- the latest version has resolved a lot of the small issues he has seen
- Would like improved logging. It logs too much and they have to truncate the table often.
- Rosh - “Yep, we've had to add SQL Server management plans to truncate that table”
- Jason -
- We have seen that a lot with customers who have been using the tool for over a year
- We will try to be less chatty with logs in the November release
- We are going to look into this more
- Will look into file, database, and console logging options
- Long-Tail/non-API Assessment (CSV based), TPDM/EPP domain
- Template Sharing Service - useful templates?
- Jason - We have had some performance and login issues this year. We are working on it and should be OK by end of September.
- Jon - TSS has been a blocker. Would like to use Data Import without the sharing service
- Emilio - Likes that having TSS as a requirement promotes the use of it, but it does get in the way if you don’t have credentials to it
- Jason - We have found that some Data Import deployments don’t have network access and the TSS requirement has caused problems
- Emilio - Templates have been working well. One of his engineers is reworking the templates to work with Data Standard v4
- Current templates will break with DS 4
- Zurab - duplicated headers in files have posed issues
- Emilio - the pre-processor could help with this issue
- Jason - we will open up an epic or ticket to look at that
- Zurab - Source code uses a library to work with FTP servers. The library does not work if the FTP server has a certain setting turned on. Works fine with SFTP, but not FTP.
- Mike Werner - Documentation is lacking
- John Bailey - 1.3.2 included a Docker container, but it is unclear how to kick off a schedule
- Jason - Good points. We are going to look into that.
- JF - we want multi-tenancy
- Multiple instances of Data Import in the same environment
- James Nadeau - Vermont would be interested
- Zurab - would like support for multiple files in one import
- Today a pre-processor can do it, but it would be nicer for DI to do it without a pre-processor
- John Bailey - Question about Data Import and CEDS Generate?
- Emilio
- What's not working with Data Import or getting flat files into ODS/API?
- DI-1135 - Array Format in CSV
- Other tools and approaches?
- Close out