
Purpose

The primary goals for introducing preprocessor enhancements to Data Import are to provide support for a wider range of file import use cases and improve the experience of working with PowerShell scripts within Data Import.

...

  • Provide support for non-CSV files (e.g. tab-delimited, fixed-width, or XML)
  • Enable reshaping of the row grain of the file (one row becomes multiple or vice-versa)
  • Enable exclusion of rows (e.g. student assessment rows that indicate the student was not tested)
  • Enable invocation of the ODS API from preprocessor scripts (dynamically selecting the API used by the executing agent)
  • Improve the management and sharing capabilities of PowerShell scripts used by preprocessors
    • Maintain scripts in the database instead of the file system, and provide a UI for managing them
    • Include script dependencies in template sharing and import/export tools

Related epic: EDFI-411 (Ed-Fi Issue Tracker)

Preprocessing Capabilities in Data Import

Data Import already supports two PowerShell integration capabilities for preprocessing of files:

  • Custom Record Processing
    • Provided via an optional selection of a Row Processor (PS1 file) on the agent screen.
    • This feature enables scripting of modifications to field values within a row, including adding new fields.
    • More information
  • Custom File Generation
    • Provided via a "File System / PowerShell" agent type choice on the agent screen, which enables the selection of a Generator (PS1 file).
    • This feature enables scripting of file generation on a predefined schedule.
    • More information
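As an illustrative sketch only (the file name, output path, and sample data below are hypothetical, and the exact contract Data Import expects of a Generator script is not specified here), a scheduled generator might assemble rows and write them out as CSV:

Code Block
languagepowershell
titleSketch: Custom File Generation (illustrative)
# Hypothetical Generator (PS1) script: build rows and emit a CSV file
# for the scheduled agent to pick up. Names and paths are illustrative.
$rows = @(
    [PSCustomObject]@{ StudentId = "1001"; LastName = "Smith" }
    [PSCustomObject]@{ StudentId = "1002"; LastName = "Jones" }
)

# Export-Csv writes a header row followed by one line per object.
$rows | Export-Csv -Path "generated-students.csv" -NoTypeInformation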

...

  • Custom File Processing
    • Provided via an optional selection of a File Processor (PS1 file) on the map screen.
    • This feature integrates the source file into the PowerShell pipeline to support a wide variety of transformations resulting in a tabular output compatible with the existing mapping designer.

Understanding Custom File Processing

The PowerShell script implementation of Custom File Processing works a little differently than the existing Custom Row Processing.

How It Works: Custom Row Processing

The existing Custom Row Processing operates against the agent.

...

Code Block
languagepowershell
titleExample: Custom Row Processing
$row.'sasid' = $row.'sasid'.Trim()   # modify an existing field in place
$row.'newfield' = $row.'sasid' + '!' # add a new field to the row

How It Works: Custom File Processing

By contrast, Custom File Processing uses pipeline input via an Advanced Function, and it operates against the data map rather than the agent.

...

Alternatively, substitute a flat array parameter for the array-of-arrays parameter if you want your script's pipeline processing to be executed per row instead of once for the whole file:

...

The PowerShell advanced function uses input process methods, represented by the Begin, Process, and End blocks of the function, to process the stream as it passes through the PowerShell pipeline.
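As a minimal sketch of that shape (the transformation and names here are illustrative, not part of the Data Import contract), a per-line processor using all three blocks might look like:

Code Block
languagepowershell
titleSketch: Advanced Function Skeleton (illustrative)
[CmdletBinding()]
Param(
    [Parameter(Mandatory = $true, ValueFromPipeline = $true)][string[]]$line
)

Begin {
    # Runs once, before the first pipeline item: emit headers or set up state.
    $lineCount = 0
}

Process {
    # Runs once per pipeline item: transform the current line and emit output.
    $lineCount++
    Write-Output ([string]$line).Trim()
}

End {
    # Runs once, after the last pipeline item: flush buffers or log a summary.
    Write-Verbose "Processed $lineCount lines."
}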

Custom File Processing Examples

This example demonstrates processing the entire file at once. The Begin and End blocks are optional. They're included here for illustration.

...

Code Block
languagepowershell
titleExample: Fixed Width to CSV Conversion
[CmdletBinding()]
Param( 
    [Parameter(Mandatory = $true, ValueFromPipeline = $true)][string[]]$line
)

Begin {
    $fieldDefinitions = @(
        [PSCustomObject]@{ "Name" = "SCHOOL_ID";  "Start" = 0;  "Length" = 10; }
        [PSCustomObject]@{ "Name" = "STUDENT_ID"; "Start" = 10; "Length" = 10; }
        [PSCustomObject]@{ "Name" = "FIRST_NAME"; "Start" = 20; "Length" = 12; }
        [PSCustomObject]@{ "Name" = "LAST_NAME";  "Start" = 32; "Length" = 12; }
        [PSCustomObject]@{ "Name" = "GENDER";     "Start" = 44; "Length" = 6; }
    )

    Write-Output (($fieldDefinitions | %{ $_.Name }) -join ",")
}

Process {
    # Pipeline binding wraps each incoming line in a one-element string array;
    # unwrap it before indexing into the record by character position. (Testing
    # against $line.Length directly would compare to the array length, not the
    # record length.)
    $record = [string]$line

    $fields = foreach ($definition in $fieldDefinitions) {
        $field = $record.Substring($definition.Start, $definition.Length).Trim()

        # Quote any value that contains the CSV delimiter.
        if ($field -match ",") { "`"$field`"" } else { $field }
    }

    Write-Output ($fields -join ",")
}

Understanding ODS API Invocation

In the context of preprocessing a file, it can be useful to read data from the ODS via the API. This is supported by exposing the agent connection details at run-time via the following PowerShell variables:

  • $ODS.BaseUrl - The URL for the ODS API (including the path, e.g. /data/v3[/year])
  • $ODS.AccessToken - A valid bearer token to authorize access to the ODS API

This partial example demonstrates building an associative array of students for later lookups.

Code Block
languagepowershell
titleInvoking the ODS API from a Preprocessor Script
Write-Information "Preparing to load student data..."
$runStatus.ScriptStartTime = Get-Date
$runStatus.StudentLoadStartTime = Get-Date
$studentIds = @{}
try {
    $continue = $true
    $offset = 0
    $limit = 100
    while ($continue) {
        $response = Invoke-WebRequest -URI "$($ODS.BaseUrl)/students?limit=$limit&offset=$offset" -Headers @{"Authorization"="Bearer $($ODS.AccessToken)"} -UseBasicParsing 
        if ($response.StatusCode -ne 200) { Write-Error "Error invoking the Ed-Fi API: HTTP status $($response.StatusCode)"; return }
        
        $students = ConvertFrom-Json $response.Content
        if ($students.Length -gt 0) {
            foreach ($student in $students) {
                $districtId = "NOT SET"
                $stateId = "NOT SET"
                foreach ($idCode in $student.identificationCodes) {
                    if ($idCode.studentIdentificationSystemDescriptor -eq "District") { $districtId = $idCode.identificationCode }
                    if ($idCode.studentIdentificationSystemDescriptor -eq "State") { $stateId = $idCode.identificationCode }
                }
                $studentIds[$stateId] = $districtId   # indexer assignment tolerates repeated keys, unlike .Add()
            }
        }
        else {
            $continue = $false
        }
        $offset += $limit
    }
}
catch {
    Write-Error "Error loading list of Student State/District IDs from the ODS: $($_.Exception.Message)"
    return
}

Understanding the Big Picture


Design Considerations (WIP)

Do we need both Custom Record Processing and Custom File Processing?

...

  • Both can modify values at the row level.
  • Custom File Processing additionally supports adding new fields to the row as well as processing of the entire file for more complex scenarios.

Since Custom File Processing provides a superset of functionality, there is a consideration to remove (or at least deprecate) the existing Custom Record Processing feature:

  • Benefits of removing:
    • Strategically, if two file preprocessing methods are not needed, then we should progress to that end state.
    • Avoid additional work to refactor Custom Record Processing to support the improved preprocessing management and sharing capabilities planned for Custom File Processing.
  • Benefits of retaining:
    • There may be use cases for agent-specific processing. NEED INPUT
    • Opportunity to incorporate invocation of the ODS API from the agent preprocessor (rather than the data map preprocessor), which naturally aligns with the concrete API server connection.

Are there any use cases for the agent-based Custom Record Processing that will not be supported by the new map-based Custom File Processing feature?

Where does Custom File Generation fit in?

Should we implement ODS API invocation support in the existing Custom Row Processing (i.e. on the agent) rather than the new Custom File Processing (on the data map)?

Does Custom File Generation need to be considered in context of design considerations?

Custom File Generation uses a PowerShell script in an entirely different context than Custom Record Processing and Custom File Processing. Whereas the latter two modify the content of an uploaded file during processing, Custom File Generation executes a script on a schedule to generate a file.

As a result, Custom File Generation is not impacted by the proposed preprocessor enhancements.

However, there is value in consistently managing all PowerShell scripts. Currently, these scripts are discovered and used from known paths in the file system. The proposed enhancements include migrating script storage to the database and providing a UI for managing the scripts. It would likely be beneficial to apply these enhancements to Custom File Generation as well.

Where will ODS API invocation support be implemented?

There are three possibilities:

  • API invocation is supported by Custom Record Processing (Agent)
    • This works well if we are confident that any logic requiring use of the API happens after the file has been converted to tabular data by the Custom File Processor.
    • However, it is of course only applicable if we retain the Custom Record Processing capability.
  • API invocation is supported by Custom File Processing (Data Map)
    • Building a Data Map requires discovery of the source file columns. When a Custom File Processor is used and that script requires use of the API, then we must prompt for an API Connection for executing the script (per Multi-Connection enhancements).
    • If we use this approach, it may be preferred that ODS API support for the preprocessor is explicitly enabled as a configuration option for each applicable Preprocessor script.
  • API invocation is supported by Custom Record Processing and Custom File Processing
    • This favors flexibility for undiscovered use cases.

User Interface Changes

UI Feature | Modifications




Database Changes

Table | Change | Justification