Data Import CLI Design

DRAFT

This is preliminary work and is not yet ready for review or feedback.

Purpose

Provide CLI capabilities for Data Import to enable:

  • Integration with source control systems, via scriptable tasks that import and export assets to and from Data Import as individual files
  • Integration with CI/CD automation for automatically updating Data Import environments

The idea for this enhancement arose from Certica's enhancements to Data Flow (the predecessor to Data Import), in which Data Map JSON definitions and PowerShell preprocessor scripts were maintained in source control to provide accountability, revision comparison, and auditing.

Requirements

  1. All assets managed via the Data Import user interface are transferable into and out of the system.
    1. PowerShell Scripts
    2. Data Maps
    3. Lookups (by Source Table)
    4. Bootstrap Data
    5. Agents
    6. API Connections
  2. Each asset is represented as an individual file on the file system.
  3. Assets can be transferred individually.
  4. Groups of assets can be transferred without specifying each asset individually. This includes the full set of all assets.
  5. A transferred group of assets can be specified by asset type (e.g., data maps).
  6. A transferred group of assets can be specified by an explicit definition (e.g., all assets used for importing STAAR files).
  7. A configurable folder convention allows transferred assets to align to the desired organization of files committed to source control.
  8. When importing, validation exists for verifying the integrity of individual assets as well as dependencies between assets.
  9. Import results are reported (summary, list of imported / not imported / unchanged assets).
  10. Importing can be configured to either overwrite or skip existing items.
  11. Error state can be communicated to the caller.
  12. The tooling is PowerShell-friendly.
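Requirements 9, 11, and 12 suggest a calling pattern along these lines. This is an illustrative sketch only: the cmdlet name comes from the syntax proposed later in this document, and the assumption that failures surface as terminating errors is not yet decided.

```powershell
# Sketch of the intended CI/CD calling pattern (assumes the proposed
# Import-DataImportAssets cmdlet reports failures as terminating errors).
try {
    Import-DataImportAssets -FolderPath .\assets -ConnectionString $connStr -ErrorAction Stop
}
catch {
    Write-Error "Data Import asset import failed: $_"
    exit 1   # communicate error state to the CI/CD runner (requirement 11)
}
```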

Design Considerations

Does this affect the existing import, exporting, and sharing features?

The current functionality is designed to be interactive. Assets are packaged into a single file, which is optimal for exchanging with other parties. The CLI feature is focused on providing integration with development tools. Other than sharing some common terms ("import", "export"), the two features appear to be independent of each other.

How much of the existing code needs to be refactored to share common logic between the web application and the new CLI console app?

Minimal (though I'd like a second opinion on this).

What are the filename conventions for exported asset files?

All transferable assets will follow a standard filename convention:

  • <Name>.<Type>.json

For example:

  • Grand Bend 2019 Sample State ELA Assessment Results.datamap.json

Are there issues with persisting the asset name as a filename? NEED INPUT

Use Path.GetInvalidFileNameChars to identify any characters in the name that are forbidden in a filename. 

Forbidden Characters (Windows): <>:"/\|?*

We will need to either decide on a substitution scheme or declare such names invalid for export. NEED INPUT
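One possible substitution scheme could be sketched as follows. This is illustrative only: the function name and the choice of underscore as the replacement character are assumptions, pending the input requested above.

```powershell
# Hypothetical sanitizer: replaces each forbidden filename character
# with an underscore before building "<Name>.<Type>.json".
function ConvertTo-SafeAssetFileName {
    param([string]$Name, [string]$AssetType)

    $invalid = [System.IO.Path]::GetInvalidFileNameChars()
    $safe = ($Name.ToCharArray() | ForEach-Object {
        if ($invalid -contains $_) { '_' } else { $_ }
    }) -join ''

    "$safe.$AssetType.json"
}

# ConvertTo-SafeAssetFileName 'STAAR: Grade 3' 'datamap'
# would yield 'STAAR_ Grade 3.datamap.json'
```

A drawback of substitution is that it is lossy: two distinct asset names could collapse to the same filename, which would need to be detected at export time.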

Console app or PowerShell cmdlet?

I tend to think PowerShell should be the default approach for this. It is broadly supported, provides flexibility, and fits naturally into the CI/CD tooling this feature targets.

It should be noted that this is divergent from the existing TransformLoad app, which is a console app.

We could also consider supporting both, but there would need to be a compelling reason to do so.

How does the CLI connect to Data Import? NEED INPUT

A few options:

  1. Implicitly connect to the database (this strategy is used by TransformLoad)
    1. While this is convenient from a traditional dotnet deployment perspective, separation of concerns between connection configuration and code is preferred.
  2. Explicit credentials provided to the database (à la Invoke-Sqlcmd)
    1. Robust, flexible, clean integration with CI/CD stack
  3. API-first approach (Data Import exposes an API for administration activities)
    1. Architecturally optimal, but this increases the scope of work 10-fold. Data Import is not currently architected for API-first, and doing so would require an entirely new SPA-based front-end.

Prefer option 2.

PowerShell Cmdlets

Syntax

Data Import Cmdlets

Export-DataImportAssets
    -FolderPath <String>
    -ConnectionString <String>
    [-Filter <String>]
    [-FolderPerAssetType]
    [-NoClobber]

Import-DataImportAssets
    -FolderPath <String>
    -ConnectionString <String>
    [-Filter <String>]
    [-FolderPerAssetType]
    [-NoClobber]

Description

The Export-DataImportAssets cmdlet exports assets from Data Import as individual files.

The filename of each asset is as follows: <Name>.<AssetType>.json

Possible AssetType values are:

  • preprocessor
  • datamap
  • lookuptable
  • bootstrap
  • agent
  • connection

Examples of exported files:

  • Grand Bend 2019 Sample State ELA Assessment Results.datamap.json
  • CA-IREADY_AssessedGradeLevel.lookuptable.json
  • Default API Connection.connection.json

The AssetType value is used in three contexts:

  • The filename segment (as demonstrated above).
  • The folder name if -FolderPerAssetType is specified.
  • Values for the AssetType attribute in the -Filter expression.

The Import-DataImportAssets cmdlet works similarly but in reverse. It imports assets into Data Import from individual files.

Parameters

-FolderPath

  • A required parameter that specifies the location of the transferred Data Import assets.

-ConnectionString

  • A required parameter that specifies a connection string to the database used by Data Import.

-Filter

  • Specifies a query string that matches transferable assets. This string uses the PowerShell Expression Language syntax.
  • <FilterComponentList> ::= <FilterComponent> | <FilterComponent> <JoinOperator> <FilterComponent> | <NotOperator> <FilterComponent>
  • <FilterComponent> ::= <attr> <FilterOperator> <value> | "(" <FilterComponent> ")"
  • <FilterOperator> ::= "-eq" | "-le" | "-ge" | "-ne" | "-lt" | "-gt" | "-approx" | "-bor" | "-band" | "-recursivematch" | "-like" | "-notlike"
  • <JoinOperator> ::= "-and" | "-or"
  • <NotOperator> ::= "-not"
  • <attr> ::= AssetType | Name
  • Note: PowerShell wildcards other than *, such as ?, are not supported by the Filter syntax.
  • (This was modeled after -Filter for Get-ADGroup)
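For illustration, a few filter expressions that would be valid under this grammar, assuming the attribute and operator sets above:

```powershell
# Match all data maps
-Filter 'AssetType -eq "datamap"'

# Match all STAAR-related assets of any type
-Filter 'Name -like "STAAR*"'

# Match data maps and lookup tables whose names start with STAAR
-Filter '(AssetType -eq "datamap" -or AssetType -eq "lookuptable") -and Name -like "STAAR*"'
```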

-FolderPerAssetType

  • Assets are organized into one folder per asset type.

-NoClobber

  • Use this parameter to prevent the command from overwriting an existing asset. By default, if the asset exists, the command overwrites it without warning.

Examples

Export all assets to the current folder

Export-DataImportAssets `
    -FolderPath . `
    -ConnectionString "Data Source=MYSERVER;Initial Catalog=EdFi_DataImport;Integrated Security=True;ApplicationIntent=ReadOnly"

Export all STAAR assets to folders by asset type

Export-DataImportAssets `
    -FolderPath . `
    -ConnectionString "Data Source=MYSERVER;Initial Catalog=EdFi_DataImport;Integrated Security=True;ApplicationIntent=ReadOnly" `
    -Filter 'Name -like "STAAR*"' `
    -FolderPerAssetType

Import assets that do not already exist, from folders organized by asset type

Import-DataImportAssets `
    -FolderPath . `
    -ConnectionString "Data Source=MYSERVER;Initial Catalog=EdFi_DataImport;Integrated Security=True" `
    -FolderPerAssetType `
    -NoClobber

Note that the import example omits ApplicationIntent=ReadOnly, since importing writes to the database.