T-Bulk Design

Introduction

This documentation provides an overview of the Temporal Data Bulk Load pipeline and its interactions with existing Ed-Fi Bulk Load technology components. 

T-Bulk Features 

  1. Supports full snapshot creation or overwrite.
  2. Bulk loads data to a staging ODS by reusing the existing pipeline. Modifications support passing snapshot metadata to the entry points.
  3. Executes the snapshot process against a temporary staging ODS.
  4. Ties the staging ODS to the bulk load ODS; the staging ODS is built on deployment and initialization.
  5. Provides an Incremental Snapshot Mode to support incremental bulk load operations before snapshot execution.

T-Bulk Components 

Entry Points

Bulk temporal data is accepted by the Ed-Fi Console Bulk Loader and the Ed-Fi ODS / API Bulk Load Services. Bulk operations support new parameters used to create snapshot metadata. BulkOperationResource is also modified to expose these same properties to the API.

Support for Temporal Snapshot Metadata Creation/Replacement

The Entity Framework BulkOperation entity and the database (through a migration) were modified to support a new set of parameters.

The temporal context parameters understood by the bulk console application are listed in 'Loading Ed-Fi Temporal Data Using Console Bulk Loader'. The BulkOperationResource exposes these properties to the API.

T-Bulk Empty ODS DB

A new empty temporal staging ODS database is generated by a new PowerShell script. It is implemented similarly to the existing bulk staging database build, which allows reuse between deployment and local scenarios. Type data is pre-loaded into the ODS.

Updates to Build Process Required for Initialization of the Database

Creation of the empty staging database has been added to the initdev process, modeled after the way the existing bulk staging database is generated. InitializeDevelopmentEnvironment.psm1 now reads Web.config to determine whether temporal features are enabled, and runs the TemporalOds subtype and the creation of the temporal bulk staging database only if the temporal flag is true. The flag is set to true by default for local development.
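The flag check can be illustrated with a minimal sketch. It is written in Python for illustration only (the real logic lives in the PowerShell module), and the appSettings key name `temporal:enabled` is a placeholder, not the actual key used by the ODS / API.

```python
import xml.etree.ElementTree as ET

def temporal_enabled(web_config_xml: str, key: str = "temporal:enabled") -> bool:
    """Return True when the appSettings entry named `key` has the value 'true'.

    `key` is a hypothetical setting name chosen for this example.
    """
    root = ET.fromstring(web_config_xml)  # root element is <configuration>
    for add in root.findall("./appSettings/add"):
        if add.get("key") == key:
            return (add.get("value") or "").strip().lower() == "true"
    return False  # flag absent: treat temporal features as disabled

# Sample Web.config fragment used only for this illustration.
SAMPLE = """<configuration>
  <appSettings>
    <add key="temporal:enabled" value="true" />
  </appSettings>
</configuration>"""

if __name__ == "__main__":
    print(temporal_enabled(SAMPLE))
```

Treating a missing key as disabled is a design choice of this sketch; the actual script may behave differently.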

When temporal parameters are included, the bulk pipeline targets the temporal staging ODS using a modified connection string.
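As a rough illustration of this routing, the sketch below returns the staging connection string when any temporal parameter is present and the regular ODS connection string otherwise. The function name, the set of keys treated as "temporal", and the connection string values are assumptions made for the example; the actual pipeline is .NET code and may decide differently.

```python
from typing import Mapping, Optional

# Assumed set of temporal parameters, taken from names used elsewhere in this document.
TEMPORAL_KEYS = ("SnapshotBeginDate", "SnapshotEndDate", "SnapshotDate", "SnapshotCode")

def select_target(params: Mapping[str, Optional[str]],
                  ods_conn: str, staging_conn: str) -> str:
    """Pick the temporal staging connection when any temporal parameter is set."""
    has_temporal = any(params.get(k) for k in TEMPORAL_KEYS)
    return staging_conn if has_temporal else ods_conn

if __name__ == "__main__":
    # Hypothetical connection strings.
    print(select_target({"SnapshotCode": "FALL"}, "Database=Ods", "Database=TemporalStaging"))
```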

Steps have been added to TeamCity and Octopus to support deploying an additional database as part of the existing database NuGet package.

T-Bulk ODS Snapshot Execution

Once the temporal staging database has been loaded, the bulk load process triggers a load of the snapshot data using the load stored procedure. The target database is determined by the system: it is passed in for the Bulk Console and resolved through IoC for the Windows service. The Snapshot Audit Log and UDM Count processes are run at snapshot completion.

Error Reporting

Validation errors are output to the console.

Logging

Bulk operation errors are logged through the existing bulk logging framework to the bulk database.

Validation

Additional validation rules have been added to BulkOperationCreateValidator to verify the following scenarios.

  • Providing none of the temporal parameters is valid. This simply triggers a normal bulk load against the current ODS/API.
  • SnapshotBeginDate must be prior to or equal to today.
  • If SnapshotEndDate exists, SnapshotBeginDate must exist.
  • If SnapshotEndDate exists, SnapshotBeginDate must be less than SnapshotEndDate.
  • If SnapshotBeginDate and SnapshotEndDate are valid, additional checks are run:
    • If SnapshotEndDate is null, SnapshotBeginDate must be greater than the highest existing SnapshotBeginDate, and greater than or equal to the highest existing not null SnapshotEndDate. Note that if there is an existing SnapshotEndDate that is null, it will be updated to the bulk SnapshotBeginDate.
    • Check if any snapshots overlap with the given range. Partial overlap of any kind should be an error.
    • No overlap at all is valid.
  • If ForceOverwriteSnapshot is false or not provided, and the snapshot date and snapshot code exactly match an existing snapshot, throw an error.
  • If ForceOverwriteSnapshot is provided, the snapshot date and snapshot code match, and the snapshot is locked, throw an error.
  • If the ForceOverwriteSnapshot flag is true, no matching snapshot is found (where snapshot date and code match), and all other validations pass, a new snapshot is created.
  • If the ForceOverwriteSnapshot flag is true but validations fail, the appropriate validation error is displayed.
  • If the ForceOverwriteSnapshot flag is true, a matching snapshot is found, and all other validations pass, the snapshot is overwritten, any changed metadata is updated, and the snapshot is re-initialized.
  • If SnapshotCode is provided, it must be unique across all Snapshots.
  • If SnapshotDate is provided, it must be unique across all Snapshots.
  • If SnapshotDate is provided, it must be greater than or equal to SnapshotBeginDate, and less than SnapshotEndDate (if SnapshotEndDate is not null).
  • If loading to a locked snapshot with o/ flag, the locked snapshot is not overwritten.
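
The rules above can be sketched as a single validation function. This is one possible reading, not the BulkOperationCreateValidator implementation: the `Snapshot` type and `validate` function are hypothetical, the sketch assumes SnapshotBeginDate, SnapshotDate, and SnapshotCode are always supplied when any temporal parameter is, and it excludes the snapshot being overwritten from the uniqueness and overlap checks.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class Snapshot:  # hypothetical stand-in for snapshot metadata
    code: str
    snapshot_date: date
    begin: date
    end: Optional[date] = None
    locked: bool = False

def validate(new: Optional[Snapshot], existing: List[Snapshot],
             force_overwrite: bool = False,
             today: Optional[date] = None) -> List[str]:
    """Return validation errors; an empty list means the request is valid."""
    if new is None:  # no temporal parameters: normal bulk load
        return []
    today = today or date.today()
    errors: List[str] = []
    if new.begin > today:
        errors.append("SnapshotBeginDate must be prior to or equal to today")
    if new.end is not None and new.begin >= new.end:
        errors.append("SnapshotBeginDate must be less than SnapshotEndDate")
    if new.snapshot_date < new.begin or (
            new.end is not None and new.snapshot_date >= new.end):
        errors.append("SnapshotDate must be >= SnapshotBeginDate and < SnapshotEndDate")
    # A "matching" snapshot has the same snapshot date and snapshot code.
    match = next((s for s in existing if s.code == new.code
                  and s.snapshot_date == new.snapshot_date), None)
    if match is not None:
        if not force_overwrite:
            errors.append("snapshot exists and ForceOverwriteSnapshot is not set")
        elif match.locked:
            errors.append("matching snapshot is locked")
    others = [s for s in existing if s is not match]  # assumption: skip the match
    for s in others:
        if s.code == new.code:
            errors.append("SnapshotCode must be unique")
        if s.snapshot_date == new.snapshot_date:
            errors.append("SnapshotDate must be unique")
    if new.end is None:
        if any(new.begin <= s.begin for s in others):
            errors.append("SnapshotBeginDate must exceed the highest existing SnapshotBeginDate")
        if any(s.end is not None and new.begin < s.end for s in others):
            errors.append("SnapshotBeginDate must be >= the highest existing SnapshotEndDate")
    else:
        for s in others:  # half-open ranges [begin, end); open-ended rows skipped
            if s.end is not None and new.begin < s.end and s.begin < new.end:
                errors.append(f"range overlaps snapshot {s.code}")
    return errors
```

Returning a list of messages rather than raising on the first failure mirrors the statement that the appropriate validation errors are displayed; how the real validator reports them is not specified here.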

Incremental T-Bulk Loads to Accommodate Bulk Process Race Condition/Threading Issues

The current bulk pipeline has a known issue in which records fail to load when a heavy load is pushed in a single bulk operation. See https://tracker.ed-fi.org/browse/ODS-1373.

To load bulk data successfully, a user must bulk load incrementally or retry (upsert) the same load until all errors are resolved. To work around this, the T-Bulk process allows a user to request an incremental load (using the temporalBulkLoad parameter), which runs the bulk load as normal against the temporal staging ODS but does not run a snapshot or clear temporal staging data. Once the user is satisfied that the bulk load has succeeded, the user runs SnapshotIncremental mode, which executes the snapshot and clears the temporal staging ODS.
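
The incremental flow can be modeled with a toy in-memory stand-in for the temporal staging ODS. The class and method names below are hypothetical and exist only to show the ordering: incremental loads upsert into staging without snapshotting, and a later snapshot run captures the staged data and clears staging.

```python
from typing import Dict, List

class TemporalStaging:
    """Toy, in-memory stand-in for the temporal staging ODS (hypothetical)."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}
        self.snapshots: List[Dict[str, dict]] = []

    def bulk_load(self, records: Dict[str, dict], incremental: bool = False) -> None:
        # Upsert by key, so retrying the same load is safe.
        self.rows.update(records)
        if not incremental:
            self.run_snapshot()  # non-incremental load snapshots right away

    def run_snapshot(self) -> None:
        # Corresponds to running SnapshotIncremental mode after incremental loads.
        self.snapshots.append(dict(self.rows))
        self.rows.clear()  # staging is cleared once the snapshot has executed

if __name__ == "__main__":
    staging = TemporalStaging()
    staging.bulk_load({"student-1": {"grade": 9}}, incremental=True)
    staging.bulk_load({"student-1": {"grade": 10}}, incremental=True)  # retry/upsert
    staging.run_snapshot()
    print(staging.snapshots)
```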

Temporal Bulk Load Flow Diagram