In December 2015, the Ed-Fi Alliance concluded an effort to test the full spectrum of ODS / API capabilities under load. The testing covered both transactional operations to create, read, update, and delete entities and bulk operations supporting the import of large files. This technical article reports the results of the transactional performance testing. Performance testing results for bulk loading are covered in a companion article, ODS / API Bulk Load Performance Testing.
In transactional tests, the API web server's CPU and memory usage increased with activity, but the SQL Server hosting the ODS rarely experienced a spike greater than 10% CPU utilization. For this reason, testing focused on the performance characteristics of the API web server(s).
The performance tests were run against a single-web-server configuration and a load-balanced, multiple-web-server configuration. Each type of configuration was tested with increasingly powerful virtual machines. The configurations were designed to be characteristic of production environments with a vertical-scaling strategy (i.e., achieving scale by investing in a few, very powerful servers) and a horizontal-scaling strategy (i.e., achieving scale by balancing load across multiple, relatively inexpensive servers).
The performance tests applied increasing pressure (i.e., an increasing number of requests per second) in stages to determine the point of stability, stress, and failure for each configuration at each virtual machine size. The virtual servers used were Amazon Web Services (AWS) machines. The high-level testing results are summarized in the table below.
| Scaling Strategy | Virtual Web Server Size | Stable Requests/sec. | Burst Requests/sec. | Failure Requests/sec. |
| --- | --- | --- | --- | --- |
| Horizontal | 2 x Medium | 525-550 | 575-600 | 625-650 |
| Horizontal | 4 x Medium | 875-900 | 1050-1075 | 1275-1300 |
| Vertical | Medium | 175-200 | 225-250 | 275-300 |
| Vertical | Large | 375-400 | 475-500 | 650-675 |
| Vertical | Extra Large | 575-600 | 775-800 | 850-875 |
Detailed results and server specifications can be found later in this document.
Notes:
- The ODS / API system as a whole proved to be stable under sustained transactional load.
- Stability was defined as a consistent average response time of less than 1 sec. / request. The minimum response time for any operation on the configured system was measured at 0.013 seconds.
- The load-balanced, horizontal scaling configuration outperformed the vertical scaling strategy using comparable total processor and memory resources.
The load simulated by these tests approximates a fairly high degree of activity at a mid-sized organization, using easily accessible and relatively inexpensive virtual machines. As a point of comparison, an SEA-sponsored production system with over 250K students experiences around 40 transactions/second during business hours on a “normal” day. The intent in using this testing approach was to provide a baseline for organizations to use in planning. The solution can easily be scaled to handle larger organizations or increased performance needs.
This section provides detail about the objectives, scope, and methodology of the performance testing effort, as well as the architecture tested.
The transactional load testing objectives were:
- Validate that the ODS / API is stable under sustained transactional load.
- Determine practical limits of various server sizes.
- Compare the performance of vertical and horizontal scaling strategies.
- Report the results to assist implementers in planning for production deployments.
The transactional load testing exercised all types of API operations under varying load levels.
- API Coverage. The testing exercised every type of domain aggregate exposed by the ODS / API except StudentGradebookEntry, covering nearly all of the API resource surface. The tests did not include "helper" API endpoints such as Types and Descriptors, or the bulk load endpoints (discussed in a separate technical article).
- Operations. Transactional requests come in four flavors: Create, Read, Update, and Delete (CRUD) operations for each domain aggregate exposed by the ODS / API. Each operation result is categorized as either “success,” meaning the operation completed without error, or “failure,” accompanied by an error message indicating the type of error.
- Request Load. The load testing application allowed the transactional tempo to be increased by increasing the number of threads. The transactional tests also provide configuration options to set the mixture ratio of operations, which was important for simulating scenarios such as initial setup, enrollment, day-to-day use, and end of year. A minimal sketch of such a driver appears after this list.
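To make the thread-count and mixture-ratio ideas concrete, here is a minimal, hypothetical sketch in Python. It is not the Ed-Fi load testing application; the base URL, bearer token, resource payloads, and resource ids are invented placeholders.

```python
"""Minimal CRUD load-driver sketch; NOT the Ed-Fi load testing application."""
import random
import threading
import time

import requests

BASE_URL = "https://ods.example.org/api/v2.0/2016"    # placeholder
HEADERS = {"Authorization": "Bearer <access-token>"}  # placeholder

# Mixture ratio: relative weight of each CRUD operation. A read-heavy mix
# approximates day-to-day use; a create-heavy mix approximates initial setup.
MIXTURE = {"create": 1, "read": 6, "update": 2, "delete": 1}

def worker(stop_at, results):
    """Issue randomly mixed CRUD requests until the deadline passes."""
    session = requests.Session()
    while time.time() < stop_at:
        op = random.choices(list(MIXTURE), weights=list(MIXTURE.values()))[0]
        started = time.time()
        if op == "create":
            resp = session.post(f"{BASE_URL}/students",
                                json={"studentUniqueId": "604821"}, headers=HEADERS)
        elif op == "read":
            resp = session.get(f"{BASE_URL}/students?limit=25", headers=HEADERS)
        elif op == "update":
            resp = session.put(f"{BASE_URL}/students/<resource-id>",
                               json={"studentUniqueId": "604821"}, headers=HEADERS)
        else:
            resp = session.delete(f"{BASE_URL}/students/<resource-id>",
                                  headers=HEADERS)
        results.append((op, resp.status_code, time.time() - started))

def run_load(threads, seconds):
    """Raise the transactional tempo by raising the thread count."""
    stop_at, results = time.time() + seconds, []
    pool = [threading.Thread(target=worker, args=(stop_at, results))
            for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return results
```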
The goal of this phase of transactional load testing was to determine approximately how many requests per second various server configurations can handle. For comparison purposes, each configuration was analyzed to determine three levels of performance.
The first level is stable throughput, a level that a server could handle with reasonable response times (<1 second) and continue to handle indefinitely. The second level is burst throughput, a level that a server can handle but that has a noticeable impact on response times (>1 second) and eventually leads to Service Unavailable errors if the burst continues for too long. The final level is the point of failure, the requests per second that almost immediately lead to very slow response times and a noticeable number of server failures (Service Unavailable or Gateway Timeout).
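As a rough illustration, the sketch below buckets one stage of load-driver results into these three levels. The sub-second response-time threshold comes from the definitions above; the 5% failure-rate cutoff is an assumed illustrative value, not taken from the testing methodology.

```python
def classify_stage(results):
    """Classify one load stage; results holds (op, http_status, seconds) tuples."""
    avg_response = sum(seconds for _, _, seconds in results) / len(results)
    # 503 = Service Unavailable, 504 = Gateway Timeout
    failures = sum(1 for _, status, _ in results if status in (503, 504))
    failure_rate = failures / len(results)
    if avg_response < 1.0 and failures == 0:
        return "stable"    # sustained sub-second responses, no errors
    if failure_rate < 0.05:  # assumed cutoff for a "noticeable" error rate
        return "burst"     # degraded response times, mostly still succeeding
    return "failure"       # slow responses and widespread server errors
```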
The load testing was performed using a custom application available to Ed-Fi Licensees. Details on downloading, building, and running load tests using the application can be found in the technical article at /wiki/spaces/ODSAPI23/pages/21562157.
The Ed-Fi ODS / API can be deployed in a variety of architectural configurations, from a single server (as in a development or test machine) to various load-balanced, multi-machine configurations.
Performance tests were run against configurations representative of typical, cloud-based production environments. Both a horizontally scaled and a vertically scaled solution were tested, each with a variety of server instance types. Since hardware characteristics can greatly affect results, testing was performed using Amazon Web Services (AWS) to provide a more-or-less standard point of reference.
Vertical configuration testing aimed to understand the performance profile as web server specifications were increased, while horizontal configuration testing provided insight into performance when multiple web servers are used.
Testing was performed against horizontally scaled components distributed on AWS in the following configurations:
| Web Server(s) | ODS Database Server | Load Balancer |
| --- | --- | --- |
| 2 x Medium | Medium | AWS Elastic Load Balancing |
| 4 x Medium | Medium | AWS Elastic Load Balancing |
Testing was performed against vertically scaled components distributed on AWS in the following configurations:
| Web Server | ODS Database Server | Load Balancer |
| --- | --- | --- |
| Medium | Medium | No Load Balancing |
| Large | Large | No Load Balancing |
| Extra Large | Extra Large | No Load Balancing |
- Microsoft Internet Information Services (IIS)
- SQL Server 2012 Enterprise
- Ed-Fi ODS / API v2.0 Public Release
- ODS Web API. Encompasses the RESTful endpoints that allow CRUD operations against the ODS database, plus the API endpoints related to the Bulk Load Services.
- ODS Database. The SQL Server installation hosting the ODS and its supporting databases.
This section provides detail about the server configurations and associated test results.
Horizontal testing generally showed stability across the board, to the point of hitting the limits of what each infrastructure level can handle. In contrast to the issues described in the Vertical Scaling section below for the Extra Large test, horizontal tests showed that the IIS queues were not overloaded. With multiple servers, each with its own IIS queue, a very large number of requests is required to fill the queues.
The horizontal configuration also drastically outperformed vertical configurations with a similar number of CPU cores, a result of the inherent benefits of the load balancer distributing requests. The individual web servers remained stable because if one server was tied up or blocked by a bad request, the other servers continued to process. The load balancer also helped once the configuration was under load: a dedicated server checking the health of the underlying web servers provided fast responses once the service was unavailable, and reasonably graceful behavior even when overloaded.
Based on these findings and test results, we conclude that a horizontally scaled implementation is generally more performant than a vertical configuration and is the recommended approach for large-scale implementations.
Under normal, steady load, single-web-server vertical configurations were stable. However, vertical configurations failed when overloaded, often leaving the server blocked for up to a minute after requests stopped being sent.
Without a load balancer, the API web server is responsible for sending the Service Unavailable response. Often the server was so busy with requests that it took upwards of 30 seconds to inform the client that the service was unavailable. This caused the dramatic jumps in response time near the upper reaches of requests per second.
Finally, there were noticeable queue issues in the Extra Large test scenario. The powerful hardware in this setup caused the IIS request queue to fill up at times, even when the server itself wasn't overloaded. This is visible in the data as small numbers of Service Unavailable errors even while response times were still low and the server wasn't highly utilized. This could be mitigated by increasing the IIS application pool queue length when running on a stronger server.
The figures below show response times and requests per second at each request level. Graphs are shown for CPU usage on the Medium, Large, and Extra Large web server configurations.
In general, a healthy server should show low (sub-second) response times and a responses-per-second rate very similar to the requests-per-second rate. These numbers were used to determine the approximate stable requests-per-second range for a given configuration. As shown in the "Unhealthy Server" chart below, the response time and the number of responses per second vary greatly once the server becomes overloaded, leading to inconsistent results for client applications. The response times and responses will try to catch up, because IIS and the load balancer are designed to recover in these scenarios, but spikes continue to occur because the server simply can't handle the number of requests being sent to it.
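This health heuristic can be made concrete with a small sketch. Assumptions not taken from the article: log entries arrive as (request start epoch seconds, elapsed seconds) tuples, and the 90% response-rate tolerance is an illustrative value.

```python
from collections import defaultdict

def health_report(samples, tolerance=0.9):
    """Yield (second, verdict) pairs; samples holds (start_epoch, elapsed) tuples."""
    sent = defaultdict(int)        # requests issued per one-second window
    completed = defaultdict(list)  # response times keyed by completion window
    for started, elapsed in samples:
        sent[int(started)] += 1
        completed[int(started + elapsed)].append(elapsed)
    for second in sorted(sent):
        responses = completed.get(second, [])
        # Healthy: response rate tracks request rate and mean time stays sub-second.
        rate_ok = len(responses) >= tolerance * sent[second]
        time_ok = bool(responses) and sum(responses) / len(responses) < 1.0
        yield second, "healthy" if (rate_ok and time_ok) else "unhealthy"
```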
Complex, multi-tier systems under load sometimes exhibit errors that aren't reproducible and are difficult to diagnose. The following graph shows a request/response profile for an event the team encountered during testing, where internal server errors caused dramatic spikes in response times. The response times are reasonable until the scenario occurs (around 60 seconds into the test run), at which point the server stops sending back responses for a period of time. The server eventually recovers and works the queue to catch up, bursting a large number of responses. Eventually the server levels off, until the issue happens again.
In production, these types of errors need to be worked individually, and can be caused by a number of factors in the configuration or the code. (In fact, the errors the team encountered in the test runs are being investigated by Ed-Fi technologists and tracked as ODS-631 to see if a code fix is indicated.)
This section summarizes the recommendations based on the latest round of load testing.
- Large-scale implementations should prefer horizontal, load-balanced scaling strategies over vertical scaling.
- Set logging levels appropriately for production. The log4net configuration should be set to ERROR only in production instances, except when troubleshooting. Turn off SystemDiagnosticsTracing in production systems. A minimal illustration of the log4net setting appears after this list.
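As a minimal sketch of the log4net recommendation above, the root logging level can be restricted to ERROR as shown below. The appender name is a placeholder; match it to the appenders actually defined in the site's Web.config.

```xml
<!-- Illustrative log4net fragment for production: log errors only.
     "RollingFileAppender" is a placeholder appender name. -->
<log4net>
  <root>
    <level value="ERROR" />
    <appender-ref ref="RollingFileAppender" />
  </root>
</log4net>
```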