How To: Use the Ed-Fi API Bulk Load Services

This version of the Ed-Fi ODS / API is no longer supported. See the Ed-Fi Technology Version Index for a link to the latest version.

 

The Ed-Fi ODS / API contains endpoints that allow client applications to send XML data files through the API for bulk loading. This is useful for a number of scenarios. Bulk loading is often the easiest way to populate a new instance of the ODS / API. In addition, some implementations only require periodic uploads from clients. Bulk loading is useful for these "batch loading" scenarios.

This article provides an overview and technical information to help platform hosts and client developers use the bulk load endpoints.

Note that platform hosts have an alternate way of bulk loading files directly from disk (i.e., not through the API) using the Ed-Fi Console ODS Bulk Loader. See the Console Bulk Loader documentation for more information.

Overview

A bulk operation can include thousands of records across multiple files to insert and update.

A few key points about the API surface worth understanding are:

  • Clients must post a representation of the files to be uploaded. This includes the format, the interchange type, and the file size. We'll look at an example below.
  • Clients must send bulk data that conforms to the Ed-Fi Data Standard XML definitions. The ODS / API is compatible with the Ed-Fi Bulk Data Exchange for XML v3.1 standard.
  • Error Handling. Records are parsed and operations are executed as individual transactions. This means that one failing entity record will not fail the entire batch. The errors from individual transactions are logged. The error log can be inspected per bulk operation.
  • API Profiles do not impact bulk data loads. Platform hosts can optionally implement API profiles to create data policies for the API JSON endpoints. Those policies do not affect data sent through bulk load services.

Before we dive into the details, it's useful to understand the differences between the transactional operations of the API surface and the bulk load services discussed in this article. The following table compares the two:

Transactional API Surface                                   | Bulk Load Services
JSON                                                        | Ed-Fi Data Standard XML
Synchronous responses                                       | Asynchronous responses
Near real-time, as data is changing in client applications  | For initial load or batch mode updates
Full range of create, read, update, and delete operations   | Upsert (i.e., create and update) only
Create and retrieve UniqueIds                               | No ability to create or retrieve UniqueIds

Platform Setup and Testing

This section outlines the basics of setting up and testing bulk loading through the ODS / API surface.

  • Microsoft Message Queue (MSMQ). The API bulk load services work against Microsoft message queues, and the console workers share the same internal logic. The internal logic is covered by unit tests that verify the ability to process messages from one queue to the next. 
  • Smoke Testing. A "smoke test" is typically all that is required for these services. Platform hosts perform a bulk upload operation as outlined in this article and verify that the data reaches the ODS, either by inspecting the data tables directly or by calling the API surface to search for the loaded information.
  • Troubleshooting. When troubleshooting the API bulk load services, the bulk worker and upload services can be temporarily stopped, and their associated message queues examined for unprocessed messages. Turning on the associated service should eventually clear the service's source queue. If there are messages building up in either of the queues, the problem is typically one of credentials, and an appropriate error will be in the event log. Proper credentialing of the services is covered in the deployment documentation. See, e.g., the sandbox deployment information and production deployment information in the Platform Developers' Guide.

Client Walkthrough Example

This walkthrough demonstrates the sequence of operations clients use to load bulk data via the API. Note that all URLs are within the /bulk path, which is under the relative path api/bulk/v1/. For a local development instance, this would be http://localhost:54746/bulk/v1/. We'll use XML files with student and enrollment data as an example.

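The high-level sequence of operations from the client is as follows:

  • Step 1. Create the operation by POSTing a representation of the files to /bulkOperations.
  • Step 2. Upload each XML file in one or more chunks.
  • Step 3. Commit each upload.
  • Step 4. Check the status of the operation.

Detail on each step follows.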

Step 1. Create the Operation

POST a representation of the files to upload to /bulkOperations.

{
  "uploadFiles": [{
    "format": "text/xml",
    "interchangeType": "student",
    "size": 574
  }]
}

Create one uploadFiles entry for every file you're including. The format should always be "text/xml", interchangeType should be the type of interchange, and size is the total size of the file in bytes. You can get the file size from new FileInfo(filePath).Length or, if you're opening a file stream to send the file, from the stream's Length property.
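As a minimal sketch of building that request body, the following C# assembles one uploadFiles entry per file. The BulkOperationPayload class and its method names are hypothetical helpers for illustration, not part of the Ed-Fi SDK, and the use of System.Text.Json is an assumption about the client's runtime.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;

// 574 is the sample file size used in this walkthrough. For a real file,
// BulkOperationPayload.FromFile(path, "student") reads the size from disk.
var json = BulkOperationPayload.Build(new[] { ("student", 574L) });
Console.WriteLine(json);

// Hypothetical helper (not part of the Ed-Fi SDK).
static class BulkOperationPayload
{
    // Describes one file on disk as an (interchange type, size in bytes) pair.
    public static (string InterchangeType, long Size) FromFile(string path, string interchangeType) =>
        (interchangeType, new FileInfo(path).Length);

    // Builds the JSON body for POST /bulkOperations, one uploadFiles entry per file.
    public static string Build(IEnumerable<(string InterchangeType, long Size)> files)
    {
        var body = new
        {
            uploadFiles = files.Select(f => new
            {
                format = "text/xml", // always "text/xml" for bulk loads
                interchangeType = f.InterchangeType,
                size = f.Size
            }).ToArray()
        };
        return JsonSerializer.Serialize(body);
    }
}
```

Keeping the size lookup separate from the serialization makes the payload easy to check without touching the file system.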

Sample Response (which will have the HTTP status 201 Created):

HTTP/1.1 201 Created
Content-Type: application/json; charset=utf-8
Location: http://localhost:54746/bulk/v1/bulkOperations/956b7fd9-e289-4a01-9527-036822acf2b9?controller=bulkoperations
Server: Microsoft-IIS/10.0
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: *
X-SourceFiles: =?UTF-8?B?QzpcR2l0XFBlcnNvbmFsXEVkRmlBbGxpYW5jZVxFZC1GaS1PRFMtSW1wbGVtZW50YXRpb25cQXBwbGljYXRpb25cRWRGaS5PZHMuV2ViQXBpXGFwaVxCdWxrXHYxXGJ1bGtPcGVyYXRpb25z?=
X-Powered-By: ASP.NET
Date: Tue, 16 Jan 2018 20:36:35 GMT
Content-Length: 290

{
  "id": "956b7fd9-e289-4a01-9527-036822acf2b9",
  "uploadFiles": [
    {
      "id": "5782929c-c6c3-46de-b14c-a3efcab77d8b",
      "size": 574,
      "format": "text/xml",
      "interchangeType": "student",
      "status": "Initialized"
    }
  ],
  "status": "Initialized"
}

From the response, you can obtain the overall operation id (the root id), as well as individual file IDs for each file to be uploaded as part of the operation.
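One way to pull those IDs out of the response body is sketched below. It assumes the response has the shape shown in the sample above. BulkOperationResponse is a hypothetical helper name, and the generated SDK may already expose typed models that make this unnecessary.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// A trimmed copy of the sample response from this walkthrough.
var responseBody =
    "{\"id\":\"956b7fd9-e289-4a01-9527-036822acf2b9\"," +
    "\"uploadFiles\":[{\"id\":\"5782929c-c6c3-46de-b14c-a3efcab77d8b\",\"status\":\"Initialized\"}]," +
    "\"status\":\"Initialized\"}";

var (operationId, fileIds) = BulkOperationResponse.ReadIds(responseBody);
Console.WriteLine($"Operation {operationId} has {fileIds.Count} file(s) to upload.");

// Hypothetical helper (not part of the Ed-Fi SDK).
static class BulkOperationResponse
{
    // Returns the root operation id and the id of every uploadFiles entry, in order.
    public static (string OperationId, List<string> FileIds) ReadIds(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;
        var operationId = root.GetProperty("id").GetString();
        var fileIds = new List<string>();
        foreach (var file in root.GetProperty("uploadFiles").EnumerateArray())
        {
            fileIds.Add(file.GetProperty("id").GetString());
        }
        return (operationId, fileIds);
    }
}
```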

Step 2. Upload XML Files

For each file to upload, the client takes the returned file ID and submits the file as one or more "chunks." Each chunk of the file can be up to 150 MB. For simplicity, the attached example file can be submitted as a single chunk.

POST the file to /uploads/fileId/chunk?offset=offset&size=size, where fileId is the value returned from creating the bulk operation, offset is the current offset in the file starting with 0, and size is the actual size of the chunk being uploaded. This POST must be submitted as multipart/form-data with the binary data streamed along in the body. An easy way to do this correctly is to use (or deconstruct) the code provided in the generated SDK for the UploadsApi, as it will handle submitting the appropriate headers and data.

The following is an example HttpRequest with Headers and embedded XML:

POST http://localhost:54746/bulk/v1/uploads/5782929c-c6c3-46de-b14c-a3efcab77d8b/chunk?offset=0&size=707 HTTP/1.1
Host: localhost:54746
Content-Length: 1012
Authorization: Bearer 3319197ab6264138825027a16ed7aaa5
Cache-Control: no-cache
Content-Type: multipart/form-data; boundary=-----------------------------28947758029299
Accept: */*
Accept-Encoding: gzip, deflate, br

-------------------------------28947758029299
Content-Disposition: form-data; name="5782929c-c6c3-46de-b14c-a3efcab77d8b"; filename="5782929c-c6c3-46de-b14c-a3efcab77d8b"
Content-Type: application/octet-stream
 
<?xml version="1.0" encoding="UTF-8"?>
<InterchangeStudent xmlns="http://ed-fi.org/0300" xmlns:ann="http://ed-fi.org/annotation" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ed-fi.org/0300 ../../Schemas/Bulk/Interchange-Student.xsd">
	<Student>
		<StudentUniqueId>68</StudentUniqueId>
		<Name>
			<PersonalTitlePrefix>Mrs</PersonalTitlePrefix>
			<FirstName>Sarah</FirstName>
			<LastSurname>Stevens</LastSurname>
		</Name>
		<BirthData>
			<BirthDate>2005-06-12</BirthDate>
		</BirthData>
	</Student>
</InterchangeStudent> 
-------------------------------28947758029299--

The expected response is a status code of 201 Created, with no body.
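For clients that don't use the generated SDK, the request above can be approximated with HttpClient types, as in the following sketch. ChunkRequest is a hypothetical helper. Using the file ID as both the form field name and the file name mirrors the sample request, and the base URI and token are the sample values from this walkthrough. Whether the service accepts other field names has not been verified here.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

var request = ChunkRequest.Build(
    new Uri("http://localhost:54746/bulk/v1/"), // base URI must end with a slash
    "5782929c-c6c3-46de-b14c-a3efcab77d8b",
    offset: 0,
    chunk: new byte[707],                        // placeholder bytes standing in for the XML
    accessToken: "3319197ab6264138825027a16ed7aaa5");
Console.WriteLine($"{request.Method} {request.RequestUri}");

// Hypothetical helper (not part of the Ed-Fi SDK).
static class ChunkRequest
{
    // Builds a multipart/form-data POST for one chunk of a file.
    public static HttpRequestMessage Build(Uri bulkBaseUri, string fileId, long offset, byte[] chunk, string accessToken)
    {
        var part = new ByteArrayContent(chunk);
        part.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

        // Field name and file name are both the file ID, as in the sample request.
        var form = new MultipartFormDataContent();
        form.Add(part, fileId, fileId);

        var uri = new Uri(bulkBaseUri, $"uploads/{fileId}/chunk?offset={offset}&size={chunk.Length}");
        var request = new HttpRequestMessage(HttpMethod.Post, uri) { Content = form };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
        return request;
    }
}
```

The request would then be sent with HttpClient.SendAsync, and the client would check for a 201 Created status.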

You would repeat this process until the entire file has been uploaded, adding the size of the chunk to the offset value for each subsequent upload. For example, submitting two 300-byte chunks, the first offset would be 0, the second would be 300, and both would have a size of 300.
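The offset arithmetic can be sketched as a small planner that lists the offset and size of every chunk for a given file size. ChunkPlanner is a hypothetical name used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

foreach (var (offset, size) in ChunkPlanner.Plan(fileSize: 600, chunkSize: 300))
{
    Console.WriteLine($"offset={offset}&size={size}");
}
// prints:
// offset=0&size=300
// offset=300&size=300

// Hypothetical helper (not part of the Ed-Fi SDK).
static class ChunkPlanner
{
    // Yields (offset, size) pairs covering the whole file; the last chunk may be shorter.
    public static IEnumerable<(long Offset, long Size)> Plan(long fileSize, long chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        for (long offset = 0; offset < fileSize; offset += chunkSize)
        {
            yield return (offset, Math.Min(chunkSize, fileSize - offset));
        }
    }
}
```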

The following is example code for handling a large file:

using System;
using System.IO;
using System.Net;

// Inside a method of the uploading class. fileId, filePath, and uploadApi
// (the generated SDK's UploadsApi) come from the surrounding code.
int offset = 0;
int bytesRead;
var buffer = new byte[3 * 1000000]; // read the file in chunks of up to 3 MB

this.Logger.DebugFormat("Uploading file {0}", filePath);
using (var stream = File.Open(filePath, FileMode.Open, FileAccess.Read))
{
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) != 0)
    {
        // Send only the bytes actually read, and keep the full-size buffer for the next read.
        var chunk = buffer;
        if (bytesRead != buffer.Length)
        {
            chunk = new byte[bytesRead];
            Array.Copy(buffer, chunk, bytesRead);
        }

        // Submit the chunk through the SDK's uploads API.
        var response = uploadApi.PostUploads(new Upload
        {
            id = fileId,
            size = bytesRead,
            offset = offset,
            fileBytes = chunk
        });

        // Stop on the first failed chunk, before advancing the offset.
        if (response.StatusCode != HttpStatusCode.Created)
        {
            this.Logger.DebugFormat("Error uploading file {0}.", filePath);
            break;
        }

        offset += bytesRead;
        this.Logger.DebugFormat("{0} bytes uploaded.", offset);
    }
}

Step 3. Commit the Upload

For each file, after finishing the upload, take the fileId and commit the upload.

POST to /uploads/fileId/commit where fileId is the same fileId that was uploaded. The expected response is a 202 Accepted with no body.
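A commit loop over several files might look like the following sketch. UploadCommit is a hypothetical helper. Sending the same Bearer token as the upload request is an assumption, and the base URI is the local development address from this walkthrough.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

var sample = UploadCommit.BuildRequest(
    new Uri("http://localhost:54746/bulk/v1/"),
    "5782929c-c6c3-46de-b14c-a3efcab77d8b",
    "3319197ab6264138825027a16ed7aaa5");
Console.WriteLine($"{sample.Method} {sample.RequestUri}");

// Hypothetical helper (not part of the Ed-Fi SDK).
static class UploadCommit
{
    // Builds the body-less POST that commits one uploaded file.
    public static HttpRequestMessage BuildRequest(Uri bulkBaseUri, string fileId, string accessToken)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, new Uri(bulkBaseUri, $"uploads/{fileId}/commit"));
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
        return request;
    }

    // Commits every file in turn and fails fast if any response is not 202 Accepted.
    public static async Task CommitAllAsync(HttpClient client, Uri bulkBaseUri, IEnumerable<string> fileIds, string accessToken)
    {
        foreach (var fileId in fileIds)
        {
            using var request = BuildRequest(bulkBaseUri, fileId, accessToken);
            using var response = await client.SendAsync(request);
            if (response.StatusCode != HttpStatusCode.Accepted)
            {
                throw new HttpRequestException($"Commit of {fileId} returned {(int)response.StatusCode}.");
            }
        }
    }
}
```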

Step 4. Check Status

At this point, the upload is complete, and the bulk operation will be processed on the server asynchronously. Once the commit command is received, the operation is pushed to a queue that will trigger the actual processing. Status can be checked at any time by performing a GET to /bulkOperations/bulkOperationId, where bulkOperationId is the id sent back from the original creation of the operation.

On the happy path, after committing all the files, the status will be Started, as in this example:

{
  "id": "956b7fd9-e289-4a01-9527-036822acf2b9",
  "uploadFiles": [
    {
      "id": "5782929c-c6c3-46de-b14c-a3efcab77d8b",
      "size": 574,
      "format": "text/xml",
      "interchangeType": "student",
      "status": "Started"
    }
  ],
  "status": "Started"
}

Once the operation is done processing, the status should be Completed, as in this example:

{
    "id": "956b7fd9-e289-4a01-9527-036822acf2b9",
    "uploadFiles": [
        {
            "id": "5782929c-c6c3-46de-b14c-a3efcab77d8b",
            "size": 574,
            "format": "text/xml",
            "interchangeType": "student",
            "status": "Completed"
        }
    ],
    "status": "Completed"
}

If any of the data elements don't load correctly, the status will come back as Error, as in this example:

{
    "id": "956b7fd9-e289-4a01-9527-036822acf2b9",
    "uploadFiles": [
        {
            "id": "5782929c-c6c3-46de-b14c-a3efcab77d8b",
            "size": 574,
            "format": "text/xml",
            "interchangeType": "student",
            "status": "Error"
        }
    ],
    "status": "Error"
}
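Because processing is asynchronous, a client will usually poll until the operation reaches a final status. The sketch below treats Completed and Error as the only final statuses, which is an assumption based on the examples in this article. BulkOperationStatus is a hypothetical helper, and the delegate stands in for the GET to /bulkOperations/bulkOperationId so that the loop can run without a server.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

// Canned responses standing in for two successive GET calls.
var responses = new Queue<string>(new[] { "{\"status\":\"Started\"}", "{\"status\":\"Completed\"}" });
var finalStatus = await BulkOperationStatus.WaitAsync(
    () => Task.FromResult(responses.Dequeue()),
    delay: TimeSpan.FromMilliseconds(10),
    maxAttempts: 5);
Console.WriteLine(finalStatus); // prints "Completed"

// Hypothetical helper (not part of the Ed-Fi SDK).
static class BulkOperationStatus
{
    // Reads the root "status" value from a bulk operation response body.
    public static string Read(string json)
    {
        using var doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty("status").GetString();
    }

    // Assumption: only these two statuses mean that processing has finished.
    public static bool IsFinal(string status) => status == "Completed" || status == "Error";

    // Polls until a final status is seen or maxAttempts is reached; returns the last status read.
    public static async Task<string> WaitAsync(Func<Task<string>> getOperationJson, TimeSpan delay, int maxAttempts)
    {
        var status = string.Empty;
        for (var attempt = 0; attempt < maxAttempts; attempt++)
        {
            status = Read(await getOperationJson());
            if (IsFinal(status)) return status;
            await Task.Delay(delay);
        }
        return status;
    }
}
```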

An Error status doesn't necessarily mean every record failed to load. To see which parts failed to load, you can perform a GET against /bulkOperations/operationId/exceptions/fileId?offset=0&limit=50 to get 50 exceptions per file at a time. You can adjust the offset and limit to page through all the exceptions until you've received them all.
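The paging described above can be sketched as a loop that advances offset by limit until a short page comes back. The format of the exceptions response is not shown in this article, so the sketch takes a delegate that returns one page of already-parsed items. ExceptionPager is a hypothetical helper, and treating a page shorter than limit as the end of the data is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// 120 fake items standing in for the exceptions of one file.
var fake = Enumerable.Range(1, 120).ToList();
var all = ExceptionPager.ReadAll<int>((offset, limit) => fake.Skip(offset).Take(limit).ToList());
Console.WriteLine(all.Count); // prints 120

Console.WriteLine(ExceptionPager.BuildPath("956b7fd9-e289-4a01-9527-036822acf2b9", "5782929c-c6c3-46de-b14c-a3efcab77d8b", 0, 50));

// Hypothetical helper (not part of the Ed-Fi SDK).
static class ExceptionPager
{
    // Relative path for one page of exceptions, following the pattern given in this article.
    public static string BuildPath(string operationId, string fileId, int offset, int limit) =>
        $"bulkOperations/{operationId}/exceptions/{fileId}?offset={offset}&limit={limit}";

    // Fetches pages until one comes back with fewer than `limit` items (assumed to mark the end).
    public static List<T> ReadAll<T>(Func<int, int, IReadOnlyList<T>> fetchPage, int limit = 50)
    {
        var all = new List<T>();
        for (var offset = 0; ; offset += limit)
        {
            var page = fetchPage(offset, limit);
            all.AddRange(page);
            if (page.Count < limit) break;
        }
        return all;
    }
}
```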

A client system typically can't access the student record loaded in this example until a StudentEnrollment record establishing the relationship between the student and an education organization is also loaded. This relationship is required for authorization. You can follow the same steps to load the StudentEnrollment interchange from the provided sample file SampleStudentEnrollment.xml (size: 2917 bytes).

Further Information

This section contains a few additional resources related to bulk loading through the API:

  • Bulk Operation Endpoint Documentation. The endpoints used in the example above are documented on the Ed-Fi API sandbox instance at https://api.ed-fi.org/v3.1.1/docs/. See the "Other" API section.
  • Ed-Fi Tracker / JIRA. The Ed-Fi Alliance's issue tracking system is a good resource for fine points and troubleshooting specific implementation issues. See, e.g., the discussion on Tracker ticket ODS-820.
  • Deployment Documentation. The Deployment section in the Platform Developers' Guide has additional information that platform hosts and operations teams may find useful.

Downloads

The following links provide the sample XML files used in the walkthrough above.

SampleStudent.xml
SampleStudentEnrollment.xml

Platform hosts and client application developers may find these files useful for testing their implementations.