Using the Changed Record Queries
- Eric Jansson
- Miguel Kaminski
The Ed-Fi ODS / API platform contains data that gets updated frequently. The platform tracks inserts, updates, and deletes, and surfaces those changes to client systems through a feature called changed record queries, or "change queries." Change queries allow client systems to narrow requests for data to only data that has changed since a specified point in time. This allows client systems to stay in sync with the ODS / API without having to pull a complete dataset.
Change queries is an optional feature, so you'll need to check with your target platform host to see if it's enabled.
About Change Queries
The change queries feature was designed to have a simple architecture, integrate with the core API client authorization, and be simple to use. This ensures that the system is performant, secure, and easy to maintain. This approach results in the following properties:
- The solution provides a reference to records that have changed since a previous point. It does not directly provide the changed data itself. This lets each client system apply changes in whatever way is most efficient for that system.
- The reference provided by the solution will only be the highest (i.e., the most recent) change version. This is different from, say, a change data capture system that provides a log of every change.
- The solution does not target immediate consistency, but rather provides a reliable means of eventual consistency.
- The solution will work for most use cases, but absolute consistency cannot be guaranteed. A periodic re-synchronization may be required for some uses.
- The technical article Changed Record Queries has implementation details which may be of interest to some client system developers.
Overview of Change Query Endpoints
The simple design means that the core operations are basic. This section provides an overview.
Snapshots
A global Available Snapshots API resource provides information on the snapshots that the host has made available for change processing.
GET /changeQueries/v1/snapshots
A successful request returns a response body that looks like the following:
{ "id": "d19c86ced5ff49c19d56e6b5c8f1ec68", "snapshotIdentifier": "abcd", "snapshotDateTime": "2021-02-15T17:19:57.7866667Z" }
If multiple snapshots are returned, the API client should use the most recent snapshot based on the snapshotDateTime property. Once the most recent snapshot is identified, the API client should add the Snapshot-Identifier HTTP header to each request, with the value of the snapshotIdentifier property.
If no snapshots are returned, it means that the host has not set up snapshots for change query processing. API clients can use the change query feature without the Snapshot-Identifier HTTP header, but using snapshots is recommended for data consistency and accuracy where available.
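The selection rule above can be sketched in Python as follows. This is a minimal sketch rather than an official client: it assumes the response body has already been parsed from JSON into a list of objects shaped like the example above, and the helper names `parse_snapshot_time` and `snapshot_headers` are hypothetical.

```python
from datetime import datetime, timezone


def parse_snapshot_time(value: str) -> datetime:
    """Parse a timestamp such as 2021-02-15T17:19:57.7866667Z as UTC.

    The example value carries 7 fractional digits, so the fraction is
    truncated to the 6 digits that strptime's %f accepts.
    """
    text = value.rstrip("Z")
    if "." in text:
        whole, fraction = text.split(".", 1)
        fraction = fraction[:6].ljust(6, "0")
    else:
        whole, fraction = text, "000000"
    parsed = datetime.strptime(f"{whole}.{fraction}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def snapshot_headers(snapshots: list) -> dict:
    """Return the extra HTTP headers to send with each request.

    Assumption: `snapshots` is the parsed response body as a list of
    dicts. An empty list means the host has not set up snapshots, so
    no Snapshot-Identifier header is added.
    """
    if not snapshots:
        return {}
    latest = max(snapshots, key=lambda s: parse_snapshot_time(s["snapshotDateTime"]))
    return {"Snapshot-Identifier": latest["snapshotIdentifier"]}
```

The returned dictionary can be merged into the headers of every subsequent request, so the same calling code works whether or not the host provides snapshots.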
Available Change Versions Resource
The ODS / API uses a change version (as opposed to, say, a date) in the form of a sequential long integer. A global Available Change Versions API resource provides information on the current change version. This resource allows clients to request a reference to changed records they have not already requested or processed.
GET /changeQueries/v1/availableChangeVersions
Minimum and Maximum Change Version Parameters
The Minimum Change Version and Maximum Change Version parameters allow clients to request the latest representation of all resources that were modified within the given change version window. These parameters are available on every data resource, both as part of the Ed-Fi Data Standard and in extension models. The parameters are also compatible with the existing offset and limit paging parameters. Using paging parameters plus change version parameters, all records can be retrieved over multiple calls.
GET /data/v3/ed-fi/students?minChangeVersion=234378&maxChangeVersion=234974&offset=100&limit=100
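As an illustration of combining the change version window with paging, the sketch below builds request paths like the one above and walks through pages until the data is exhausted. It is a sketch under stated assumptions: `change_query_url` and `fetch_all` are hypothetical helpers, `fetch_page` is a caller-supplied function (for example, a wrapper around your HTTP client) that returns the parsed list of records for a path, and stopping when a page comes back shorter than the limit is an assumed convention, not documented API behavior.

```python
from urllib.parse import urlencode


def change_query_url(resource_path, min_change_version=None,
                     max_change_version=None, offset=0, limit=100):
    """Build a request path with optional change version bounds plus paging."""
    params = {}
    if min_change_version is not None:
        params["minChangeVersion"] = min_change_version
    if max_change_version is not None:
        params["maxChangeVersion"] = max_change_version
    params["offset"] = offset
    params["limit"] = limit
    return f"{resource_path}?{urlencode(params)}"


def fetch_all(fetch_page, resource_path, min_change_version=None,
              max_change_version=None, limit=100):
    """Collect every record in the window by advancing offset page by page.

    Assumption: a page with fewer than `limit` records is the last one.
    """
    records = []
    offset = 0
    while True:
        url = change_query_url(resource_path, min_change_version,
                               max_change_version, offset, limit)
        page = fetch_page(url)
        records.extend(page)
        if len(page) < limit:
            return records
        offset += limit
```

Keeping the HTTP call behind `fetch_page` means the same loop can be reused for any data resource, and authentication and any snapshot header stay in one place.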
Deletes Route
The "Deletes" route allows clients to get the Id for deleted resources. This route also supports the existing paging parameters of offset and limit.
GET /data/v3/ed-fi/students/deletes?minChangeVersion=234378&maxChangeVersion=234974&offset=100&limit=100
Note that the /deletes endpoint will include all deleted resource Ids without filtering the result based on an API Client's authorization. This is regarded as harmless from a security aspect as the access to actual resource data is governed by the authorization.
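Because the route returns only resource Ids, a client that wants to apply deletes needs its own record of which local row each Id corresponds to. The sketch below shows one way that lookup might work. The names `apply_deletes`, `id_map`, and `delete_local` are hypothetical, and the code assumes each returned item is an object with a lowercase `id` key, mirroring the snapshot example earlier; check the actual response casing on your target platform.

```python
def apply_deletes(deleted_items, id_map, delete_local):
    """Apply deletes from the /deletes route to a downstream store.

    deleted_items: parsed response items, each assumed to carry an "id" key.
    id_map: dict mapping API resource Id -> local key, saved during earlier syncs.
    delete_local: caller-supplied function that removes one local record.

    Returns the Ids that were not found in id_map. Since the route is not
    filtered by the client's authorization, Ids the client never received
    are expected and are skipped rather than treated as errors.
    """
    unknown = []
    for item in deleted_items:
        resource_id = item["id"]
        local_key = id_map.pop(resource_id, None)
        if local_key is None:
            unknown.append(resource_id)
        else:
            delete_local(local_key)
    return unknown
```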
Synchronization Using the Change Query Endpoints
The primary purpose of the change queries feature is to support periodic synchronization of data. This section covers the basics.
Simple Daily Synchronization Example
The following example shows the logical flow for a daily synchronization process that only looks at Student records.
For more expansive processing, consider using the resource dependency metadata endpoint.
- Initial processing to get all data:
  - GET /changeQueries/v1/snapshots to get the current snapshot identifier and add that value to all subsequent request headers. If a snapshot is not available, you can still use the change query feature, but be aware of the caveats mentioned below.
  - GET /changeQueries/v1/availableChangeVersions. As an example, assume we get 100 as a response. Store this value in a variable to be saved as the starting point for the first increment of processing if the full export is successful.
  - GET /data/v3/ed-fi/students.
    - If you are not using a snapshot, add the maxChangeVersion=100 parameter to return only Student records up through change version 100. Note that the minChangeVersion is not required for the initial export, but you may need to perform multiple paged requests (using offset and limit parameters) to retrieve all the available data.
    - When using a snapshot, the maxChangeVersion on these requests is unnecessary since the source data is read-only and isolated from any changes. Using a snapshot greatly simplifies client processing.
    - The results will not include deleted Student records, so for the initial full export you won't need any special delete handling.
    - As with any large data retrieval process, it is recommended to perform the initial export process during a period of low activity on the API, to reduce contention for resources.
If a snapshot is not available, it is strongly recommended to run an incremental processing of changes immediately after the initial synchronization due to the time required to transfer all the data. While this may help reduce the chances of downstream referential integrity problems, it cannot prevent all data consistency error scenarios.
- Incremental processing of changes (e.g., 1 day later):
  - GET /changeQueries/v1/snapshots to get the current snapshot identifier and add that value to all subsequent requests. If snapshots are not available, you can still use the change query feature, but be aware of some caveats mentioned below.
  - GET /changeQueries/v1/availableChangeVersions. Assume we get 250 as a response.
  - GET /data/v3/ed-fi/students?minChangeVersion=101. This request returns any created or updated Student records. The minChangeVersion is the previously processed maximum incremented by 1.
    - If not using a snapshot, you'll also need to add the maxChangeVersion=250 parameter to the requests to prevent changes after that point from being returned in the responses. Be aware that this mode of processing has an associated set of data consistency failure scenarios, and the only way to ensure data consistency is to process with the snapshot isolation.
  - GET /data/v3/ed-fi/students/deletes?minChangeVersion=101. This API call is, of course, optional if your system does not need to be aware of deleted records.
    - If not using a snapshot, you'll also need to add the maxChangeVersion=250 parameter to prevent deletions that occur after the start of processing from being applied to the downstream system.
    - In order for your system to process deletes, the current API implementation will require you to save (and possibly map) each resource identifier (i.e., the Id property) to your system. There has been some discussion about exposing natural key values in the deletes child resource, but currently only the Id is provided (see ODS-3672).
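The rules in the two passes above (no minimum for the initial export, previous maximum plus 1 for increments, and a maximum only when no snapshot is in use) can be condensed into a small helper. This is a sketch of the logical flow only: `student_requests` is a hypothetical name, it returns request paths rather than calling the API, and paging parameters are omitted for brevity.

```python
def student_requests(current_version, previous_version=None, using_snapshot=False):
    """Return the Student request paths for one synchronization pass.

    current_version: value just read from availableChangeVersions.
    previous_version: maximum processed by the last successful pass,
        or None for the initial full export.
    using_snapshot: True when a Snapshot-Identifier header is being sent.
    """
    base = "/data/v3/ed-fi/students"
    params = []
    if previous_version is not None:
        # Resume just past the previously processed maximum.
        params.append(f"minChangeVersion={previous_version + 1}")
    if not using_snapshot:
        # Without snapshot isolation, cap the window at the version read
        # at the start of this pass.
        params.append(f"maxChangeVersion={current_version}")
    query = "?" + "&".join(params) if params else ""
    paths = [base + query]
    if previous_version is not None:
        # The initial export needs no delete handling.
        paths.append(base + "/deletes" + query)
    return paths
```

For example, `student_requests(250, previous_version=100)` yields the two incremental requests from the walkthrough with both bounds applied, while `student_requests(100, using_snapshot=True)` yields the bare `/data/v3/ed-fi/students` path.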
Usage Notes
A few things to keep in mind when developing API client processing:
- If enabled by the platform host, using an ODS snapshot simplifies client processing significantly and ensures data consistency on the downstream system. There are failure scenarios that can cause data loss or referential integrity violations when the source data is allowed to change during processing.
- Keep dependency order in mind when pulling updates and deletes. For example, if you have a system that enforces referential integrity, you'll need to create and update data in dependency order, and delete data in reverse dependency order to ensure that valid relationships are maintained.
- If you are not using an ODS snapshot, try spanning your change version window across two (or more) change windows if the data is being modified during your processing. Updates during the time the client is processing can cause errors in loading some resources, including missed data. You can try to prevent or recover from these scenarios by spanning over multiple change windows.
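To illustrate the dependency-order note above, the sketch below turns an ordered list of resources into a processing plan: creates and updates run in dependency order, and deletes run in reverse. The function name `processing_plan` is hypothetical, the resource list is supplied by the caller, and running all upserts before any deletes is an assumption of this sketch rather than a documented requirement.

```python
def processing_plan(resources_in_dependency_order):
    """Return (action, resource) steps for one synchronization pass.

    Upserts follow dependency order so that referenced records exist
    before the records that point at them; deletes follow the reverse
    order so that dependents are removed before what they reference.
    """
    upserts = [("upsert", name) for name in resources_in_dependency_order]
    deletes = [("delete", name) for name in reversed(resources_in_dependency_order)]
    return upserts + deletes


# Illustrative ordering only: an association depends on both of its parents.
plan = processing_plan(["schools", "students", "studentSchoolAssociations"])
```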
As an example of processing using multiple change windows in lieu of snapshots, and assuming a daily synchronization schedule:
Synchronization | When Performed | AvailableChangeVersions Result | MinChangeVersion Used | MaxChangeVersion Used |
---|---|---|---|---|
Initial | Start | 100 | | 100 |
Incremental #1 | Immediately after completion of start | 200 | 100 | 200 |
Incremental #2 | 1 day after start | 300 | 100 | 300 |
Incremental #3 | 2 days after start | 400 | 200 | 400 |
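The window selection behind the incremental rows of this table can be expressed as a short function. This is a sketch that reproduces the pattern shown, not a prescribed algorithm: `spanning_window` is a hypothetical name, and it assumes the client stores the availableChangeVersions result from every completed synchronization.

```python
def spanning_window(previous_results, current_result, span=2):
    """Pick (minChangeVersion, maxChangeVersion) across `span` change windows.

    previous_results: availableChangeVersions results from earlier
        synchronizations, oldest first.
    current_result: the result read at the start of this synchronization.

    The minimum reaches back `span` synchronizations, or to the oldest
    stored result when fewer than `span` exist. The initial export has
    no minimum.
    """
    if not previous_results:
        return (None, current_result)
    if len(previous_results) >= span:
        start = previous_results[-span]
    else:
        start = previous_results[0]
    return (start, current_result)


history = [100]
windows = []
for result in (200, 300, 400):
    windows.append(spanning_window(history, result))
    history.append(result)
# windows now holds the three incremental rows as (min, max) pairs.
```

Because consecutive windows overlap, the downstream system will see some records more than once, so applying changes needs to be idempotent.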