Bulk Data Proxy Service

First draft proposal. Please provide input or feedback via comments, either inline or at the bottom of the page.

User Story

As an API client, I want to submit data to the Ed-Fi API in bulk, in order to optimize network transactions.

Implementation Proposal

Create a stand-alone proxy service that accepts JSON arrays asynchronously and forwards them to the real ODS/API, providing feedback to the client applications through asynchronous means such as a webhook, data stream, and/or status API.

Such a service could look like the final proposal from the Bulk Data Exchange Over REST API Special Interest Group. The distinguishing feature of the current proposal is the decoupling of the bulk interface from the ODS/API software.
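
The submit-then-poll flow described in this proposal can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not a proposed design: the class `BulkProxy`, its method names, and the state values `in_progress` and `complete` are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Batch:
    """One submitted JSON array and the outcome recorded for each document."""
    resource_path: str
    documents: list
    # Maps document index -> HTTP status code returned by the Ed-Fi API.
    results: dict = field(default_factory=dict)


class BulkProxy:
    """Hypothetical in-memory model of the proxy's client-facing contract."""

    def __init__(self):
        self._batches = {}

    def submit(self, resource_path, documents):
        """Accept a JSON array and return a batch ID; nothing is forwarded here."""
        if not isinstance(documents, list):
            raise TypeError("expected a JSON array of documents")
        batch_id = str(uuid.uuid4())
        self._batches[batch_id] = Batch(resource_path, documents)
        return batch_id

    def record_result(self, batch_id, index, status_code):
        """Called by the forwarding component once the Ed-Fi API has answered."""
        self._batches[batch_id].results[index] = status_code

    def status(self, batch_id):
        """Summary a client could poll for, or receive through a webhook."""
        batch = self._batches[batch_id]
        succeeded = sum(1 for code in batch.results.values() if 200 <= code < 300)
        failed = len(batch.results) - succeeded
        pending = len(batch.documents) - len(batch.results)
        state = "complete" if pending == 0 else "in_progress"
        return {"state": state, "succeeded": succeeded, "failed": failed, "pending": pending}
```

A client would call `submit` once, keep the returned batch ID, and later call `status` with it. The same summary could equally be pushed to a webhook or a data stream.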

Advantages of decoupling:

  • Can work with any Ed-Fi API, not only the current .NET-based ODS/API version
  • Fits with the philosophy of building tools to purpose, instead of making the ODS/API into a Swiss Army knife
  • Could be completely independent of the data standard: simply accept JSON, forward it, and store the HTTP responses without validation
  • Can therefore sit in front of older implementations
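
To illustrate the standard-agnostic point, the sketch below forwards each element of a submitted array untouched and keeps whatever the target API returns. The function name `forward_batch`, the `send` callable (standing in for an HTTP POST to the target Ed-Fi API), and the shape of the stored record are assumptions for illustration only. The only parsing performed is splitting the outer JSON array; no document is checked against any data standard.

```python
import json


def forward_batch(raw_body, send):
    """Forward each element of a JSON array and store the responses as-is.

    `send` is a hypothetical callable that takes the serialized document and
    returns a (status_code, response_body) tuple from the target API.
    """
    documents = json.loads(raw_body)
    if not isinstance(documents, list):
        raise ValueError("request body must be a JSON array")

    responses = []
    for index, document in enumerate(documents):
        # No schema or data-standard validation: the target API is the judge.
        status_code, body = send(json.dumps(document))
        responses.append({"index": index, "status": status_code, "body": body})
    return responses
```

Because rejections are recorded rather than interpreted, the same code would work unchanged in front of any API version that accepts JSON over HTTP.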

Challenges:

  • Another application to build
  • Authorization: the ODS/API's OAuth implementation is partial and therefore not ready to serve as a full-fledged OAuth 2.0 server for generating and validating tokens.
    • Simplest solution would be to store the client credentials and let the Proxy service handle authentication. Warning: this is risky.
    • Another solution would be to create a more full-fledged OAuth 2 service on top of the ODS/API's Admin database, perhaps integrated into the new Admin API.
    • Do not want to build this directly into the Bulk Data Proxy Service: that would couple it too tightly to the current implementation.
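
The first option above, where the proxy holds the client credentials, amounts to a cached client-credentials token exchange. The sketch below is an illustration only. It assumes the target API's token endpoint accepts a form-encoded `grant_type=client_credentials` request with HTTP Basic credentials and answers with JSON containing `access_token` and `expires_in`. The class `TokenCache` and the `fetch` callable (standing in for the HTTP call to the token endpoint) are hypothetical.

```python
import base64
import time
from urllib.parse import urlencode


class TokenCache:
    """Hypothetical holder for one client's credentials and current token."""

    def __init__(self, client_id, client_secret, fetch, clock=time.monotonic):
        self._client_id = client_id
        self._client_secret = client_secret
        self._fetch = fetch    # callable(headers, body) -> parsed JSON response
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def _request_parts(self):
        """Build the headers and form body for a client-credentials request."""
        pair = f"{self._client_id}:{self._client_secret}".encode("utf-8")
        headers = {
            "Authorization": "Basic " + base64.b64encode(pair).decode("ascii"),
            "Content-Type": "application/x-www-form-urlencoded",
        }
        return headers, urlencode({"grant_type": "client_credentials"})

    def get_token(self):
        """Return a cached token, requesting a new one shortly before expiry."""
        if self._token is None or self._clock() >= self._expires_at:
            headers, body = self._request_parts()
            response = self._fetch(headers, body)
            self._token = response["access_token"]
            # Refresh 30 seconds early so a token does not expire in flight.
            self._expires_at = self._clock() + response["expires_in"] - 30
        return self._token
```

This keeps the client secret inside the proxy for as long as a batch is being processed, which is exactly the risk flagged for this option.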

Additional implementation notes:

  • Needs a backing datastore for JSON documents. Options:
    • PostgreSQL
    • SQL Server
    • SQLite
    • Filesystem / blob storage
  • While represented as a single service above, it would make more sense to have two components:
    • API service that accepts and queues up the documents, and provides status monitoring
    • Long-running service that orchestrates communication with the Ed-Fi API, which can scale independently from the API service.
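
The two-component split can be sketched with SQLite, one of the datastore options listed above, acting as the queue between them. This is a minimal sketch under stated assumptions: the table name `documents`, its columns, the state values `queued` and `done`, the function names, and the `forward` callable (standing in for the HTTP call to the Ed-Fi API) are all hypothetical. A single worker is assumed; several concurrent workers would need an explicit claiming step that is not shown here.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id            INTEGER PRIMARY KEY,
    batch_id      TEXT NOT NULL,
    resource_path TEXT NOT NULL,
    payload       TEXT NOT NULL,
    state         TEXT NOT NULL DEFAULT 'queued',
    http_status   INTEGER,
    response_body TEXT
)
"""


def enqueue(conn, batch_id, resource_path, documents):
    """API service side: persist each document and return immediately."""
    with conn:
        conn.executemany(
            "INSERT INTO documents (batch_id, resource_path, payload) VALUES (?, ?, ?)",
            [(batch_id, resource_path, json.dumps(doc)) for doc in documents],
        )


def work_once(conn, forward, limit=100):
    """Worker side: forward up to `limit` queued documents and record the outcome."""
    rows = conn.execute(
        "SELECT id, resource_path, payload FROM documents "
        "WHERE state = 'queued' ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()
    for row_id, resource_path, payload in rows:
        http_status, body = forward(resource_path, payload)
        with conn:
            conn.execute(
                "UPDATE documents SET state = 'done', http_status = ?, response_body = ? "
                "WHERE id = ?",
                (http_status, body, row_id),
            )
    return len(rows)


def batch_status(conn, batch_id):
    """API service side: document counts per state, for status monitoring."""
    rows = conn.execute(
        "SELECT state, COUNT(*) FROM documents WHERE batch_id = ? GROUP BY state",
        (batch_id,),
    ).fetchall()
    return dict(rows)
```

Only `enqueue` and `batch_status` sit on the request path, so the API service stays responsive however slowly the target API answers, and the worker running `work_once` in a loop can be scaled or restarted on its own.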