The Ed-Fi Resources API, based on the Ed-Fi Data Standard, provides fine-grained access to educational data, modeled largely on the common denominators of the source systems that provide the data. For applications that consume data from an Ed-Fi API, this can result in a very “chatty” application integration: the consumer must make large numbers of calls over the network to retrieve the required data.
Furthermore, the authorization model in an Ed-Fi API is designed for client-server interactions, not for user interactions. Thus, the Ed-Fi API should not be used directly from a user interface.
In this article, we will explore design patterns and implementation concerns for building backend applications that address these problems.
Data Access Patterns
These patterns describe several different architectures for an application to access Ed-Fi data.
Direct Database Interaction
This is an anti-pattern.
How it works
A dedicated backend application interacts directly with the Ed-Fi database.
When to use
The Ed-Fi Alliance strongly discourages this pattern for the following reasons:
It bypasses the authorization security in the Ed-Fi API, potentially affecting both read and write operations.
This approach may put too much strain on the primary data storage, causing resource contention for other Ed-Fi client applications that are using the Ed-Fi API.
The Ed-Fi database structure is not a standard. Thus, different implementations or even different versions of the same implementation could have unstated breaking changes at the database layer. For example, the Ed-Fi ODS/API Platform (https://edfi.atlassian.net/wiki/spaces/ODSAPIS3V72) and the Data Management Service Platform have very different backend database structures. An integration built on the ODS/API’s EdFi_ODS database would not be compatible with the Data Management Service’s database.
Implementation Notes
Although this approach is no longer considered advisable, its performance benefits make it a tempting option, and many applications have been built on this model in the past. In such cases, it is advisable to limit the direct database interaction to read operations only, and to run them against a read-only database copy. The copy could be a snapshot or a replica. Using a read-only copy mitigates the resource contention concern on the primary database. Limiting the integration to read operations eliminates the write half of the authorization security concern; reads still bypass the API's authorization.
Also see Row-Level Security below.
Real-time API Interaction
How it works
A dedicated backend application interacts directly with the Ed-Fi API, translating the incoming coarse-grained request into many fine-grained requests that utilize the Ed-Fi Resources API or other API specifications implemented in the Ed-Fi service application.
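The fan-out described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name, the resource paths, and the injected `fetch` callable (standing in for an authenticated HTTP GET against the Ed-Fi API) are all assumptions made for the example. Issuing the calls concurrently is one possible design choice to limit the added latency.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

# Illustrative resource paths; adjust to the resources the use case requires.
RESOURCE_PATHS = {
    "student": "/students",
    "enrollments": "/studentSchoolAssociations",
    "attendance": "/studentSchoolAttendanceEvents",
}

# Hypothetical signature: fetch(path, query_params) -> list of records.
Fetch = Callable[[str, dict], list]


def get_student_summary(student_unique_id: str, fetch: Fetch) -> dict[str, Any]:
    """Answer one coarse-grained request with several fine-grained calls."""
    params = {"studentUniqueId": student_unique_id}
    # Run the fine-grained calls concurrently so that total latency approaches
    # that of the slowest call rather than the sum of all calls.
    with ThreadPoolExecutor(max_workers=len(RESOURCE_PATHS)) as pool:
        futures = {
            name: pool.submit(fetch, path, params)
            for name, path in RESOURCE_PATHS.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

The caller receives a single combined payload, while the number of upstream calls still grows with the number of resources involved.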
When to use
Use when true “real-time” interaction with the Ed-Fi API is required. This pattern works best when only a small number of calls to the Ed-Fi API are needed or when the calling service can safely wait for an extended period of time. If many requests are required of the Ed-Fi API to fulfill the “user” application’s originating request, then there could be a substantial delay before responding. This might not be acceptable in a user interface application.
Caution: If this integration is intended to read data from the Ed-Fi API that were sourced from a different system, then real-time integration might not be feasible. Check to see if the other source system(s) have real-time or batched integrations. If batched, then help the end-users adjust their expectations about data freshness.
Implementation Notes
Ed-Fi API client credentials need to be managed directly in the backend application. The client_id and client_secret should be secured as strongly as one would secure credentials to a backend database.
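As a rough sketch of credential handling, the example below reads the client_id and client_secret from environment variables and builds (without sending) a token request. It assumes the API issues bearer tokens through the OAuth 2.0 client-credentials grant at an `/oauth/token` path; confirm the actual path and flow against the target API's documentation. The environment variable names and function names are hypothetical.

```python
import base64
import os
import urllib.parse
import urllib.request


def load_credentials() -> tuple[str, str]:
    """Read credentials from the environment rather than from source code."""
    # EDFI_CLIENT_ID and EDFI_CLIENT_SECRET are hypothetical variable names.
    return os.environ["EDFI_CLIENT_ID"], os.environ["EDFI_CLIENT_SECRET"]


def build_token_request(
    base_url: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client-credentials token request."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        # Assumed token path; verify it for the deployment in question.
        url=base_url.rstrip("/") + "/oauth/token",
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

In production, a secrets manager would typically populate those environment variables, and the secret should never be written to logs.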
Application performance may be improved by caching some data from the Ed-Fi API if real-time updates are not required for those cached data.
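One simple way to cache slowly changing data is a time-to-live (TTL) cache. The sketch below is illustrative only: the class name and the `load` callable (standing in for a call to the Ed-Fi API) are assumptions, and a real deployment might prefer a shared cache so that multiple application nodes see the same data.

```python
import time
from typing import Any, Callable


class TtlCache:
    """Cache values for a fixed number of seconds, then reload them."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so that expiry can be tested
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str, load: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the API call
        value = load()  # missing or stale: call the Ed-Fi API
        self._entries[key] = (now, value)
        return value
```

The TTL is the maximum staleness the application accepts for that data, so it should be chosen per resource type.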
The Ed-Fi ODS/API Platform has a feature allowing API clients to use a read replica database. Using a read replica on GET requests can help reduce contention with systems that are actively writing to the API.
It may be useful to prepare for additional server load by monitoring resource consumption and performance and having a contingency plan for vertical scale-up (additional memory or CPU) and/or horizontal scale-out (additional nodes in clustered deployments).
Batch and Save
How it works
The backend application retrieves optimized data from a local data store, thus improving the response time on the originating request. A separate ETL process runs on a schedule to pull data from the Ed-Fi API, reshape it according to the backend application’s needs, and place it into that local data store.
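The split between the scheduled ETL job and the request path can be sketched as below, using SQLite as a stand-in for the local data store. The table layout, function names, and the injected `fetch_enrollments` callable are assumptions for illustration; the record shape assumes enrollment records that carry nested school and student references.

```python
import sqlite3
from typing import Callable, Iterable


def run_etl(fetch_enrollments: Callable[[], Iterable[dict]], db: sqlite3.Connection) -> None:
    """Scheduled job: pull from the API, reshape, and save locally."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS school_roster ("
        "school_id INTEGER, student_id TEXT, PRIMARY KEY (school_id, student_id))"
    )
    rows = [
        # Flatten the nested API references into the columns the UI needs.
        (r["schoolReference"]["schoolId"], r["studentReference"]["studentUniqueId"])
        for r in fetch_enrollments()
    ]
    with db:  # one transaction per batch
        db.executemany("INSERT OR REPLACE INTO school_roster VALUES (?, ?)", rows)


def get_roster(db: sqlite3.Connection, school_id: int) -> list[str]:
    """Request path: a single local query, with no calls to the Ed-Fi API."""
    cursor = db.execute(
        "SELECT student_id FROM school_roster WHERE school_id = ? ORDER BY student_id",
        (school_id,),
    )
    return [row[0] for row in cursor]
```

Because the upsert is idempotent, rerunning a batch after a failure does not create duplicate rows.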
When to use
Use when user interface responsiveness is more critical than data freshness, and/or when a single “front end” request would generate more than two or three synchronous calls to the Ed-Fi API.
This pattern is also advisable when further preparation is necessary before using the data – the “transform” portion of “ETL”.
Implementation Notes
API Credentials
Ed-Fi API client credentials will need to be managed in the ETL application. The client_id and client_secret should be secured as strongly as one would secure credentials to a backend database.
Scheduling
To optimize the batch scheduling, it may be useful to analyze the arrival time of data in the Ed-Fi API, potentially using queries on the backend database if it is accessible. If that database is not accessible, then try having a conversation with the service host to see if they can provide insight on the frequency and time of day when modifications are made in the Ed-Fi API.
Change Queries
If satisfying the frontend requirements only requires storing hundreds to thousands of records, it may be feasible to perform a full refresh of the data on a schedule. As the number of records to retrieve increases, a full refresh will take longer and can put significant strain on the Ed-Fi API. In such cases, the Change Queries API can be used to detect deleted records and to look for new or updated records.
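An incremental sync built on change versions might look roughly like the sketch below. The three injected callables are hypothetical wrappers around the Change Queries API: one returns the newest available change version, one returns records changed within a version range, and one returns deletions within that range. The exact endpoints and query parameter names should be taken from the Change Queries documentation for the API in use.

```python
from typing import Callable


def sync_changes(
    store: dict,
    last_synced_version: int,
    get_newest_version: Callable[[], int],
    fetch_changed: Callable[[int, int], list],
    fetch_deleted: Callable[[int, int], list],
) -> int:
    """Apply changes made since the last run; return the new high-water mark."""
    newest = get_newest_version()
    if newest <= last_synced_version:
        return last_synced_version  # nothing new since the last run

    # Fix the upper bound first so that writes arriving mid-sync are
    # picked up by the next run instead of being partially applied.
    low, high = last_synced_version + 1, newest
    for record in fetch_changed(low, high):
        store[record["id"]] = record  # insert or update
    for deleted in fetch_deleted(low, high):
        store.pop(deleted["id"], None)  # tolerate records never seen locally
    return newest
```

The returned version should be persisted only after the batch succeeds, so that a failed run is retried over the same range.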
Streaming Data
How it works
Ed-Fi Resources are copied into a streaming platform, such as Kafka, generally in real-time. Another application reads from the data streams, transforms data to fit the frontend application requirements, and saves the results into a local data store.
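The consuming side of this flow can be sketched as a function that applies change events to a local store. The event shape here is an assumption for illustration: each event has a `key` and a `value`, with a null value treated as a delete marker. In a real deployment the events would come from a stream consumer (for example, a Kafka client) rather than an in-memory list, and the actual message format depends on how the stream is produced.

```python
from typing import Iterable


def apply_events(events: Iterable[dict], store: dict) -> int:
    """Apply a stream of change events to a local store; return the count applied."""
    applied = 0
    for event in events:
        key = event["key"]
        if event["value"] is None:
            # Assumed convention: a null value marks a deleted record.
            store.pop(key, None)
        else:
            # Transform: keep only the fields the frontend needs.
            doc = event["value"]
            store[key] = {"name": f'{doc["firstName"]} {doc["lastSurname"]}'}
        applied += 1
    return applied
```

Events for the same key must be applied in order, which is why the sketch processes them sequentially.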
When to use
This pattern inverts the ETL process described in the Batch and Save pattern, by pushing changed records into the database instead of requiring a scheduled pull operation. It combines the “real-time” benefit of direct API integration with the data storage optimization of Batch and Save.
This architecture would be most appropriate when the end-user application is managed by the same organization that is managing the Ed-Fi API. Otherwise, it may be difficult to overcome the network and authorization security challenges between two different parties.
Caution: this pattern is not advisable when using the Ed-Fi ODS/API Platform. Although technically feasible, it would require running Change Data Capture on the Ed-Fi ODS database. The result would be a data stream that looks like the ODS database, rather than looking like the Ed-Fi Unifying Data Model (as surfaced in the Ed-Fi API). Thus, this is in essence a more complex version of the Direct Database Interaction anti-pattern described above.
Other Ed-Fi API applications, such as the forthcoming Ed-Fi Data Management Service, or applications developed by parties other than the Ed-Fi Alliance, may support this pattern.
Implementation Notes
This is an emergent pattern that the Ed-Fi community has not widely used. It requires deployment of several additional components that are not present in other patterns (stream processor, change data capture, etc.).
Frontend API Design Patterns
These patterns describe common approaches for building an end-user application that uses one of the Data Access patterns above to access the Ed-Fi data.
Backend-for-frontend
How it works
This pattern starts from the needs of a specific user interface, creating a finely tuned API specification that optimizes data transfer for that application. The backend-for-frontend service (BFF) then handles translation of the custom specification into requests for data from a local data store or from the Ed-Fi API, following one of the Data Access Patterns above.
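As a small illustration, a BFF handler for one hypothetical screen component (an attendance card) might return only the fields that component displays. The function name, payload fields, and the two injected loaders are assumptions; the loaders could be backed by a local data store or by the Ed-Fi API, depending on the Data Access Pattern chosen.

```python
from typing import Callable


def attendance_card(
    student_id: str,
    load_student: Callable[[str], dict],
    load_absences: Callable[[str], list],
) -> dict:
    """Build the exact payload one hypothetical UI component needs."""
    student = load_student(student_id)
    absences = load_absences(student_id)
    return {
        "displayName": f'{student["firstName"]} {student["lastSurname"]}',
        "absenceCount": len(absences),
        # ISO 8601 date strings sort chronologically, so max() finds the latest.
        "mostRecentAbsence": max((a["eventDate"] for a in absences), default=None),
    }
```

Fields the screen does not show, such as the birth date, never leave the backend, which reduces both payload size and data exposure.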
When to use
Use when only a single frontend application needs access to the Ed-Fi resources.
Implementation Notes
See Row-Level Security below.
Also see: Backends for Frontends pattern - Azure Architecture Center | Microsoft Learn
Central Aggregating Gateway
How it works
Like the BFF pattern, this pattern creates a custom API that is more appropriate to the use case than the Ed-Fi API, aggregating what would otherwise be multiple calls to fetch data into a single (or fewer, at least) call to the gateway service. Unlike the BFF, the aggregating gateway is generalized to support multiple use cases or applications. It may even present a GraphQL interface instead of a REST interface.
When to use
Use when multiple applications need access to Ed-Fi data, with strong overlap in the data required. If the required data sets are very different, then the optimization of a BFF service may be a better fit for purpose.
Implementation Notes
See Row-Level Security below.
Also see: Gateway Aggregation pattern - Azure Architecture Center | Microsoft Learn
Row-Level Security
The Family Educational Rights and Privacy Act (FERPA) outlines certain data privacy rights for students plus the rules by which student data can be shared with anyone other than the student or parent. Systems that utilize Ed-Fi data must provide appropriate data security so that school officials, parents, and so forth are only authorized to view "need-to-know" records. What is appropriate may vary from state to state and district to district.
In K–12 scenarios, several common roles clearly require different degrees of authorization to view student data:
Superintendents see data for all students in their district.
Principals see data for all students in their school.
Teachers see data for all students in their classes.
Parents see data for all their children.
Students see only data for themselves.
Real-world usage might not map job titles to data authorization levels in such a simple manner. There may be district employees other than superintendents who need access to all students. An Assistant Principal might take the lead on checking an Early Warning system. Rather than speaking about roles, it may be more useful to speak of access scopes, such as:
District
School
Section
Each application will need to devise its own mechanism for determining the correct scope of access for a user.
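One possible shape for such a mechanism is sketched below: each user holds one or more grants, each pairing an access scope with the ID of a district, school, or section, and rows are filtered against those grants. All type and function names are hypothetical, and how grants are assigned to users (for example, from staff assignment records or an identity provider) is left to the application.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    DISTRICT = "district"
    SCHOOL = "school"
    SECTION = "section"


@dataclass(frozen=True)
class Grant:
    scope: Scope
    entity_id: str  # ID of the district, school, or section the user may see


@dataclass(frozen=True)
class StudentRow:
    student_id: str
    district_id: str
    school_id: str
    section_ids: frozenset


def can_view(grant: Grant, row: StudentRow) -> bool:
    """Check a single grant against a single student row."""
    if grant.scope is Scope.DISTRICT:
        return row.district_id == grant.entity_id
    if grant.scope is Scope.SCHOOL:
        return row.school_id == grant.entity_id
    return grant.entity_id in row.section_ids


def visible_rows(grants: list, rows: list) -> list:
    """Keep only rows that at least one of the user's grants allows."""
    return [row for row in rows if any(can_view(g, row) for g in grants)]
```

A user with no grants sees nothing, which keeps the default behavior restrictive. In practice the same filter is better pushed into the data store query than applied after loading all rows.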