This version of the Ed-Fi ODS / API is no longer supported. See the Ed-Fi Technology Version Index for a link to the latest version.

 


The Ed-Fi ODS / API contains endpoints that allow client applications to send XML data files through the API for bulk loading. Bulk loading is often the easiest way to populate a new instance of the ODS / API. It also suits implementations that require only periodic uploads from clients, i.e., "batch loading" scenarios.

This article provides overview and technical information to help platform hosts and client developers use the bulk load endpoints.

Note that platform hosts have an alternate way of bulk loading files directly from disk (i.e., not through the API) using the Ed-Fi Console ODS Bulk Loader. See the article How To: Use the Ed-Fi Console ODS Bulk Loader for more information.

Overview

A bulk operation can insert and update thousands of records spread across multiple files.

A few key points about the API surface worth understanding are:

  • Clients must include a manifest file. Every bulk load operation requires a manifest file. See the "Bulk Load XML Format" section below for more detail.
  • Clients must send bulk data that conforms to the Ed-Fi Data Standard XML definitions. See the "Bulk Load XML Format" section below for more detail.
  • Error Handling. Records are parsed and operations are executed as individual transactions. This means that one failing record will not fail the entire batch. The errors from individual transactions are logged. The error log can be inspected per bulk operation.
  • API Profiles do not impact bulk data loads. Platform hosts can optionally implement API Profiles to create data policies for the API JSON endpoints. Those policies do not affect data sent through bulk load services.
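The error-handling point above can be pictured with a short sketch. This is not the ODS / API implementation. It is a minimal Python illustration, assuming a hypothetical `upsert` callable that raises an exception when a record is rejected, of how per-record processing keeps one failure from failing the batch while collecting errors for later inspection. The record fields are illustrative only.

```python
from typing import Callable, Dict, Iterable, List


def load_records(records: Iterable[Dict], upsert: Callable[[Dict], None]) -> List[Dict]:
    """Apply `upsert` to each record independently and return the error log.

    `upsert` is a hypothetical stand-in for whatever persists one record;
    it is expected to raise an exception when a record is rejected.
    """
    errors = []
    for index, record in enumerate(records):
        try:
            upsert(record)  # each record succeeds or fails on its own
        except Exception as exc:  # a failure is logged, not propagated
            errors.append({"index": index, "message": str(exc)})
    return errors


# Illustrative data only: the second record is missing a required field.
stored = []


def demo_upsert(record: Dict) -> None:
    if "studentUniqueId" not in record:
        raise ValueError("studentUniqueId is required")
    stored.append(record)


error_log = load_records(
    [{"studentUniqueId": "1"}, {"firstName": "Ann"}, {"studentUniqueId": "3"}],
    demo_upsert,
)
```

In this sketch the first and third records are stored, and `error_log` holds a single entry for the second record.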

Before going into the details, it is useful to understand how the transactional operations of the API surface differ from the bulk load services. The following table compares the major differences:

| Transactional API Surface | Bulk Load Services |
|---|---|
| JSON | Ed-Fi Data Standard XML |
| Synchronous responses | Asynchronous responses |
| Near real-time, as data is changing in client applications | For initial load or batch mode updates |
| Full range of create, read, update, and delete operations | Upsert (i.e., create and update) only |
| Create and retrieve UniqueIds | No ability to create or retrieve UniqueIds |

Sequence

A high-level sequence of operations from the client is as follows:

  • Step 1. Create the operation.
    • POST the bulk operation manifest to /bulkOperations.
    • Obtain an operation ID from the response. The client uses this ID as a reference in future calls.
  • Step 2. Upload XML file(s).
    • POST each "chunk" to /uploads/{uploadId}/chunk?offset={}&size={}.
  • Step 3. Commit.
    • POST to /uploads/{uploadId}/commit.
  • Step 4. Check Status.
    • GET /bulkOperations/{id}
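
The four steps above can be sketched from the client's side. The following Python is a minimal sketch, not a reference client. The transport is an injected `send` callable (hypothetical), so authentication, base URL, and HTTP library remain the caller's choice. The manifest is treated as an opaque, caller-supplied object. The response field names (`id`, `uploadFiles`) and the use of one per-file identifier in both the chunk and commit paths are assumptions that should be checked against the actual API responses. Only the paths come from the sequence above.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# send(method, path, body) -> parsed response; supplied by the caller (hypothetical).
Send = Callable[[str, str, Optional[object]], dict]


def chunk_plan(data: bytes, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, chunk) pairs that cover `data` in order."""
    for offset in range(0, len(data), chunk_size):
        yield offset, data[offset:offset + chunk_size]


def run_bulk_operation(send: Send, manifest: dict, files: List[bytes],
                       chunk_size: int = 1024 * 1024) -> dict:
    # Step 1: create the operation and keep its ID for later calls.
    operation = send("POST", "/bulkOperations", manifest)
    operation_id = operation["id"]  # assumed field name

    # Assumption: the response lists one upload entry per file, in manifest order.
    upload_ids = [entry["id"] for entry in operation["uploadFiles"]]

    for upload_id, data in zip(upload_ids, files):
        # Step 2: upload the file in chunks.
        for offset, chunk in chunk_plan(data, chunk_size):
            path = f"/uploads/{upload_id}/chunk?offset={offset}&size={len(chunk)}"
            send("POST", path, chunk)
        # Step 3: commit the upload once every chunk has been sent.
        send("POST", f"/uploads/{upload_id}/commit", None)

    # Step 4: check status; the operation is asynchronous, so this may not be final.
    return send("GET", f"/bulkOperations/{operation_id}", None)
```

Injecting `send` keeps the sketch independent of any particular HTTP library and makes the call order easy to verify with a fake transport before pointing it at a real platform.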


Bulk Load XML Format

Specifics about the manifest and the interchanges accepted...

Endpoints

Specifics and more detail about the API endpoints...

Platform Setup and Testing

This section outlines the basics of setting up and testing bulk loading for an ODS / API platform.

  • Microsoft Message Queue (MSMQ). Bulk load services work against Microsoft message queues, and the console workers share the same internal logic. The internal logic is covered by unit tests that verify the ability to process messages from one queue to the next. 
  • Smoke Testing. A "smoke test" is typically all that is required for these services. Platform hosts perform a bulk upload operation as outlined in this article and verify that the data reaches the ODS, either by inspecting the data tables directly or by calling the API surface to search for the loaded information.
  • Troubleshooting. When troubleshooting the services, the bulk worker and upload services can be temporarily stopped and their associated message queues examined for unprocessed messages. Restarting a service should eventually clear that service's source queue. If messages are building up in either queue, the problem is typically one of credentials, and an appropriate error will appear in the event log. Proper credentialing of the services is covered in the deployment documentation; see the sandbox deployment information and production deployment information in the Platform Developers' Guide.
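
For smoke testing, a host usually needs to wait for the asynchronous operation to finish before inspecting the ODS. The sketch below polls through a caller-supplied `get_status` function, a hypothetical wrapper around GET /bulkOperations/{id} that returns the operation's status string. The status values treated as terminal are assumptions for illustration; substitute the values the deployed API actually returns.

```python
import time
from typing import Callable, Iterable


def wait_for_completion(get_status: Callable[[], str],
                        terminal: Iterable[str] = ("Completed", "Error"),
                        interval_seconds: float = 5.0,
                        max_attempts: int = 60,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until `get_status()` returns a terminal value, or raise TimeoutError.

    `get_status` is a hypothetical wrapper around GET /bulkOperations/{id}.
    The default `terminal` values are assumed, not taken from the API.
    """
    terminal_set = set(terminal)
    for attempt in range(max_attempts):
        status = get_status()
        if status in terminal_set:
            return status
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before asking again
    raise TimeoutError(f"operation not finished after {max_attempts} polls")
```

If the poll times out while the services are running, the message queues and event log described under Troubleshooting are the next place to look.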

An Example

The Ed-Fi Alliance hosted a "boot camp" training session for API Client developers that included a walkthrough of bulk loading. An instructional overview and training materials are available online here.

 
