...
Understanding ODS API Invocation
Security Note: The method for invoking the ODS API will change from what is described below since Invoke-WebRequest will not be an allowed cmdlet. Instead of exposing $ODS variables directly, we will expose a pre-defined cmdlet that explicitly supports accessing the ODS API.
In the context of preprocessing a file, it can be useful to read data from the ODS via the API. This is supported by exposing the agent connection details at run-time via the following PowerShell variables:
- $ODS.BaseUrl - The URL for the ODS API (including the path, e.g. /data/v3[/year])
- $ODS.AccessToken - A valid bearer token to authorize access to the ODS API
This partial example demonstrates building an associative array of students for later lookups.
...
Invoking the ODS API from a Preprocessor Script (PowerShell)
...
This example uses the custom cmdlet Invoke-OdsApiRequest. More information on this is provided in the section "PowerShell Sandbox".
```powershell
Write-Information "Preparing to load student data..."
$runStatus.ScriptStartTime = Get-Date
$runStatus.StudentLoadStartTime = Get-Date
$studentIds = @{}
try {
    $continue = $true
    $offset = 0
    $limit = 100
    while ($continue) {
        # Invoke-OdsApiRequest supplies the base URL and bearer token on our behalf
        $response = Invoke-OdsApiRequest -UriPath "/students?limit=$limit&offset=$offset"
        if ($response.StatusCode -ne 200) {
            Write-Error "Error invoking the EdFi API: $_"
            return
        }
        $students = ConvertFrom-Json $response
        if ($students.Length -gt 0) {
            foreach ($student in $students) {
                $districtId = "NOT SET"
                $stateId = "NOT SET"
                foreach ($idCode in $student.identificationCodes) {
                    if ($idCode.studentIdentificationSystemDescriptor -eq "District") {
                        $districtId = $idCode.identificationCode
                    }
                    if ($idCode.studentIdentificationSystemDescriptor -eq "State") {
                        $stateId = $idCode.identificationCode
                    }
                }
                $studentIds.Add($stateId, $districtId)
            }
        }
        else {
            $continue = $false
        }
        $offset += $limit
    }
}
catch {
    Write-Error "Error loading list of Student State/District IDs from the ODS: $($_.Exception.Message)"
    return
}
```
Understanding the Big Picture
It is helpful to understand how files are processed today, and how this workflow changes with the proposed enhancements.
Design Considerations
Do we need both Custom Record Processing and Custom File Processing? NEED INPUT (working assumption is that we retain both)
There is an overlap between the capabilities of the existing Custom Record Processing and the new Custom File Processing feature:
...
- Benefits of removing:
- Strategically, if two file preprocessing methods are not needed, then we should progress to that end state.
- Avoid the additional work to refactor Custom Record Processing to support the improved preprocessor management and sharing capabilities planned for Custom File Processing.
- However, we will need to identify a migration strategy for customers currently using Custom Record Processing.
- Benefits of retaining:
- If there are use cases for agent-specific processing.
- Opportunity to incorporate invocation of the ODS API from an agent preprocessor (rather than a data map preprocessor), which naturally aligns with the concrete API server connection.
Recommendation:
- Retain Custom Record Processing capability. An example scenario in which we may want to apply a preprocessor script to an agent rather than the map is to resolve state-assigned student IDs to student unique IDs for select customers, depending on how a particular file specification was populated.
Does Custom File Generation need to be considered in this design?
Custom File Generation uses a PowerShell script in an entirely different context than Customer Record Processing and Custom File Processing. Whereas the latter items modify the content of an uploaded file during processing, Custom File Generation executes a script on a schedule to generate a file.
...
However, there is value in consistently managing all PowerShell scripts. Currently, these scripts are discovered and used from known paths in the file system. The proposed enhancements include migrating script storage to the database and providing a UI for managing the scripts. These enhancements would likely benefit Custom File Generation as well.
One of the goals stated by Jason for preprocessing is to support database access via ODBC. Custom File Generation is a good fit for this.
Where will ODS API invocation support be implemented? NEED INPUT (working assumption is that we support both)
There are three possibilities:
- API invocation is supported by Custom Record Processing (on Agent screen)
- This works well if we are confident that any logic requiring the use of the API happens after the file has been converted to tabular data by the Custom File Processor.
- However, it is of course only applicable if we retain the Custom Record Processing capability.
- API invocation is supported by Custom File Processing (on Data Map screen)
- Building a Data Map requires the discovery of the source file columns. When a Custom File Processor is used and that script requires the use of the API, then we must prompt for an API Connection for executing the script (per Multi-Connection enhancements).
- If we use this approach, it may be preferred that ODS API support for the preprocessor is explicitly enabled as a configuration option for each applicable Preprocessor script.
- API invocation is supported by Custom Record Processing and Custom File Processing
- This favors flexibility for undiscovered use cases.
Recommendation:
- We will support API invocation for all preprocessors.
Security concerns? NEED INPUT
Providing the capability for an application to dynamically execute user-provided PowerShell scripts adds a potentially significant attack vector to the web application and the transform/load console application.
...
We need to put some thought into ensuring effective security measures are in place, whether through constraints on script capabilities or by documenting risks and recommendations for restricting access to the server. We can also consider an option to disable the use of scripts entirely.
Recommendation:
- Implement PowerShell Sandbox (described in more detail in its own section)
Template Sharing Service compatibility? NEED INPUT
TSS storage accepts the payload as a single JSON file that contains all the packaged assets. We can add PowerShell scripts to this without requiring changes to the service.
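As an illustration, a packaged template extended with preprocessor scripts could still serialize to that single JSON file. This is only a sketch: the property names used below (preprocessors, script, and so on) are assumptions for illustration, not the actual TSS schema.

```powershell
# Hypothetical template payload extended with preprocessor scripts.
# NOTE: the property names below are illustrative assumptions, not the real TSS schema.
$template = [ordered]@{
    title         = 'Student Demographics Template'
    maps          = @(@{ name = 'Student Map' })
    preprocessors = @(
        @{
            name   = 'StudentIdResolver'
            # The script body itself travels inside the JSON payload:
            script = 'param($line) $line'
        }
    )
}

# The whole package, scripts included, remains a single JSON document:
$json = $template | ConvertTo-Json -Depth 5
```

Because the scripts ride along as ordinary JSON properties, the service itself needs no changes; only Data Import's import/export logic must understand the new properties.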
However, by providing support for PowerShell scripts in templates, Data Import version compatibility becomes a consideration. An older version of Data Import will likely ignore the addition of PowerShell scripts and new properties referencing them, but this will result in the user importing data maps that are non-functional.
Is it sufficient to say that by convention any usage related information will be described in the template description?
It is also proposed that Custom File Processing is the only preprocessor supported by template sharing.
Recommendation:
- Custom File Processing scripts are automatically included in the shared template based on map selections.
PowerShell Sandbox
Certica evaluated a number of strategies to reduce the security exposure of executing user-maintained PowerShell scripts in Data Import:
- Script signing (requires a public key infrastructure)
- Execute in a constrained runspace (with only the cmdlets needed for mapping available)
- Execute in container (introduces a complicated infrastructure and dependencies)
Striking a balance between maximizing security and retaining deployment and usage simplicity, the recommended approach is to execute scripts in-process using a constrained runspace, colloquially a "PowerShell Sandbox".
Characteristics of the sandbox:
- PowerShell scripts executed in a constrained runspace.
- The runspace is initialized with no available cmdlets.
- Cmdlets designed specifically for Data Import use are added to the runspace
- Invoke-OdsApiRequest
- ...
- Cmdlets explicitly whitelisted by the administrator are added to the runspace (likely defined in a configuration file)
- High-risk cmdlets should be avoided (New-Object, Invoke-WebRequest).
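A minimal sketch of such a sandbox, assuming an in-process host built on the System.Management.Automation runspace API. ConvertFrom-Json stands in here for the Data Import-specific cmdlets (e.g. Invoke-OdsApiRequest, which is not implemented in this sketch); each whitelisted cmdlet would be registered the same way via its implementing type.

```powershell
# Sketch: build a runspace that starts with NO cmdlets, then add back only an
# explicit whitelist. ConvertFrom-Json is used as a stand-in for the Data
# Import cmdlets; real cmdlets would be registered identically.
using namespace System.Management.Automation
using namespace System.Management.Automation.Runspaces

$iss = [InitialSessionState]::Create()          # empty: no cmdlets, no providers
$iss.LanguageMode = [PSLanguageMode]::ConstrainedLanguage

# Whitelist a single cmdlet by name and implementing type:
$iss.Commands.Add([SessionStateCmdletEntry]::new(
    'ConvertFrom-Json',
    [Microsoft.PowerShell.Commands.ConvertFromJsonCommand],
    $null))

$runspace = [RunspaceFactory]::CreateRunspace($iss)
$runspace.Open()

$ps = [PowerShell]::Create()
$ps.Runspace = $runspace
$userScript = @'
'{"studentUniqueId":"ABC123"}' | ConvertFrom-Json
'@
$ps.AddScript($userScript) | Out-Null
$result = $ps.Invoke()

# Any cmdlet outside the whitelist (Invoke-WebRequest, New-Object, ...) now
# fails with CommandNotFoundException inside the sandbox.
```

Starting from an empty InitialSessionState (rather than removing cmdlets from a default one) makes the whitelist the single source of truth for what a preprocessor script can do.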
User Interface Changes
Menu
...
Manage Preprocessors
Add Preprocessor
Add Data Map
Export Template
...