In December 2015, the Ed-Fi Alliance concluded an effort to test the full spectrum of ODS / API capabilities under load. The testing covered both transactional operations to create, read, update, and delete entities and bulk operations supporting the import of large files. This technical article reports the results of the transactional performance testing. Performance testing results for bulk loading are covered in a companion article, ODS / API Bulk Load Performance Testing.
In transactional tests, the API web server's CPU and memory usage increased with activity, but the SQL Server hosting the ODS rarely experienced a spike greater than 10% CPU utilization. For this reason, testing focused on the performance characteristics of the API web server(s).
The performance tests were run against a single-web-server configuration and a load-balanced, multiple-web-server configuration. Each type of configuration was tested with increasingly powerful virtual machines. The configurations were designed to be characteristic of production environments with a vertical-scaling strategy (i.e., achieving scale by investing in a few, very powerful servers) and a horizontal-scaling strategy (i.e., achieving scale by balancing load across multiple, relatively inexpensive servers).
The performance tests applied increasing pressure (i.e., an increasing number of requests per second) in stages to determine the point of stability, stress, and failure for each configuration at each virtual machine size. The virtual servers used were Amazon Web Services (AWS) machines. The high-level testing results are summarized in the table below.
| Scaling Strategy | Virtual Web Server Size | Stable Requests/sec. | Burst Requests/sec. | Failure Requests/sec. |
| --- | --- | --- | --- | --- |
| Horizontal | 2 x Medium | 525-550 | 575-600 | 625-650 |
| Horizontal | 4 x Medium | 875-900 | 1050-1075 | 1275-1300 |
| Vertical | Medium | 175-200 | 225-250 | 275-300 |
| Vertical | Large | 375-400 | 475-500 | 650-675 |
| Vertical | Extra Large | 575-600 | 775-800 | 850-875 |
Detailed results and server specifications can be found later in this document.
Notes:
- The ODS / API system as a whole proved to be stable under sustained transactional load.
- Stability was defined as a consistent average response time of less than 1 sec. / request. The minimum response time for any operation on the configured system was measured at 0.013 seconds.
- The load-balanced, horizontal scaling configuration outperformed the vertical scaling strategy using comparable total processor and memory resources.
The load simulated by these tests approximates a fairly high degree of activity at a mid-sized organization, using easily accessible and relatively inexpensive virtual machines. As a point of comparison, an SEA-sponsored production system with over 250K students experiences around 40 transactions/second during business hours on a “normal” day. The intent in using this testing approach was to provide a baseline for organizations to use in planning. The solution can easily be scaled to handle larger organizations or increased performance needs.
This section provides detail about the objectives, scope, and methodology of the performance testing effort, as well as the architecture tested.
The transactional load testing objectives were:
- Validate that the ODS / API is stable under sustained transactional load.
- Determine practical limits of various server sizes.
- Compare the performance of vertical and horizontal scaling strategies.
- Report the results to assist implementers in planning for production deployments.
The transactional load testing exercised all types of API operations under varying load levels.
- API Coverage. The testing exercised every type of domain aggregate exposed by the ODS / API except StudentGradebookEntry, covering nearly all of the API resource surface. The tests did not include "helper" API endpoints such as Types and Descriptors, or the bulk load endpoints (discussed in a separate technical article).
- Operations. Transactional requests come in four flavors: Create, Read, Update, and Delete (CRUD) operations for each domain aggregate exposed by the ODS / API. Each operation result is categorized as either “success,” meaning the operation completed without error, or “failure,” accompanied by an error message indicating the type of error.
- Request Load. The load testing application allowed the transactional tempo to be increased by increasing the number of threads. The transactional tests also provide configuration options to set the mixture ratio of operations, which was important for simulating scenarios such as initial setup, enrollment, day-to-day use, and end of year. A minimal sketch of such a driver appears after this list.
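To make the thread-count and mixture-ratio ideas concrete, here is a minimal, hypothetical sketch in Python. It is not the Ed-Fi load testing application; the base URL, bearer token, resource payloads, and resource ids are invented placeholders.

```python
"""Minimal CRUD load-driver sketch; NOT the Ed-Fi load testing application."""
import random
import threading
import time

import requests

BASE_URL = "https://ods.example.org/api/v2.0/2016"    # placeholder
HEADERS = {"Authorization": "Bearer <access-token>"}  # placeholder

# Mixture ratio: relative weight of each CRUD operation. A read-heavy mix
# approximates day-to-day use; a create-heavy mix approximates initial setup.
MIXTURE = {"create": 1, "read": 6, "update": 2, "delete": 1}

def worker(stop_at, results):
    """Issue randomly mixed CRUD requests until the deadline passes."""
    session = requests.Session()
    while time.time() < stop_at:
        op = random.choices(list(MIXTURE), weights=list(MIXTURE.values()))[0]
        started = time.time()
        if op == "create":
            resp = session.post(f"{BASE_URL}/students",
                                json={"studentUniqueId": "604821"}, headers=HEADERS)
        elif op == "read":
            resp = session.get(f"{BASE_URL}/students?limit=25", headers=HEADERS)
        elif op == "update":
            resp = session.put(f"{BASE_URL}/students/<resource-id>",
                               json={"studentUniqueId": "604821"}, headers=HEADERS)
        else:
            resp = session.delete(f"{BASE_URL}/students/<resource-id>",
                                  headers=HEADERS)
        results.append((op, resp.status_code, time.time() - started))

def run_load(threads, seconds):
    """Raise the transactional tempo by raising the thread count."""
    stop_at, results = time.time() + seconds, []
    pool = [threading.Thread(target=worker, args=(stop_at, results))
            for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return results
```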
The goal of this phase of transactional load testing was to determine approximately how many requests per second various server configurations can handle. For comparison purposes, each configuration was analyzed to determine three levels of performance.
The first level is stable throughput, a level that a server could handle with reasonable response times (<1 second) and continue to handle indefinitely. The second level is burst throughput, a level that a server can handle but that has a noticeable impact on response times (>1 second) and eventually leads to Service Unavailable errors if the burst continues for too long. The final level is the point of failure, the requests per second that almost immediately lead to very slow response times and a noticeable number of server failures (Service Unavailable or Gateway Timeout).
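As a rough illustration, the sketch below buckets one stage of load-driver results into these three levels. The sub-second response-time threshold comes from the definitions above; the 5% failure-rate cutoff is an assumed illustrative value, not taken from the testing methodology.

```python
def classify_stage(results):
    """Classify one load stage; results holds (op, http_status, seconds) tuples."""
    avg_response = sum(seconds for _, _, seconds in results) / len(results)
    # 503 = Service Unavailable, 504 = Gateway Timeout
    failures = sum(1 for _, status, _ in results if status in (503, 504))
    failure_rate = failures / len(results)
    if avg_response < 1.0 and failures == 0:
        return "stable"    # sustained sub-second responses, no errors
    if failure_rate < 0.05:  # assumed cutoff for a "noticeable" error rate
        return "burst"     # degraded response times, mostly still succeeding
    return "failure"       # slow responses and widespread server errors
```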
The load testing was performed using a custom application available to Ed-Fi Licensees. Details on downloading, building, and running load tests using the application can be found in the technical article at /wiki/spaces/ODSAPI23/pages/21562157.
The Ed-Fi ODS / API can be deployed in a variety of architectural configurations, from a single server (as in a development or test machine) to various load-balanced, multi-machine configurations.
Performance tests were run against configurations representative of typical, cloud-based production environments. Both a horizontally scaled and a vertically scaled solution were tested, each with a variety of server instance types. Since hardware characteristics can greatly affect results, testing was performed using Amazon Web Services (AWS) to provide a more-or-less standard point of reference.
Vertical configuration testing aimed to understand the performance profile as web server specifications were increased, while horizontal configuration testing provided insight into performance when multiple web servers are used.
Testing was performed against horizontally scaled components distributed on AWS in the following configurations:
| Web Server(s) | ODS Database Server | Load Balancer |
| --- | --- | --- |
| 2 x Medium | Medium | AWS Elastic Load Balancing |
| 4 x Medium | Medium | AWS Elastic Load Balancing |
Testing was performed against vertically scaled components distributed on AWS in the following configurations:
| Web Server | ODS Database Server | Load Balancer |
| --- | --- | --- |
| Medium | Medium | No Load Balancing |
| Large | Large | No Load Balancing |
| Extra Large | Extra Large | No Load Balancing |
- Microsoft Internet Information Services (IIS)
- SQL Server 2012 Enterprise
- Ed-Fi ODS / API v2.0 Public Release
- ODS Web API. Encompasses the RESTful endpoints that allow CRUD operations against the ODS database, plus the API endpoints related to the Bulk Load Services.
- ODS Database. The SQL Server installation hosting the ODS and its supporting databases.
This section provides detail about the server configurations and associated test results.
Horizontal testing generally showed stability across the board, to the point of hitting the limits of what each infrastructure level can handle. In contrast to the issues described in the Vertical Scaling section below for the Extra Large test, horizontal tests showed that the IIS queues were not overloaded. With multiple servers, each with its own IIS queue, a very large number of requests is required to fill the queues.
The horizontal configuration also drastically outperformed vertical configurations with a similar number of CPU cores, a result of the inherent benefits of the load balancer distributing requests. The individual web servers remained stable because if one server was tied up or blocked by a bad request, the other servers continued to process. The load balancer also helped once the configuration was under load: a dedicated server checking the health of the underlying web servers provided fast responses once the service was unavailable, and reasonably graceful behavior even when overloaded.
Based on these findings and test results, we conclude that a horizontally scaled implementation is generally more performant than a vertical configuration and is the recommended approach for large-scale implementations.
Under normal, steady load, single-web-server vertical configurations were stable. However, vertical configurations failed when overloaded, often leaving the server blocked for up to a minute after requests stopped being sent.
Without a load balancer, the API web server is responsible for sending the Service Unavailable response. Often the server was so busy with requests that it took upwards of 30 seconds to inform the client that the service was unavailable. This caused the dramatic jumps in response time near the upper reaches of requests per second.
Finally, there were noticeable queue issues in the Extra Large test scenario. The powerful hardware in this setup caused the IIS request queue to fill up at times, even when the server itself wasn't overloaded. This is visible in the data as small numbers of Service Unavailable errors even while response times were still low and the server wasn't highly utilized. This could be mitigated by increasing the IIS application pool queue length when running on a stronger server.
The figures below show response times and requests per second at each request level. Graphs are shown for CPU usage on the Medium, Large, and Extra Large web server configurations.
In general, a healthy server should show low (sub-second) response times and a responses-per-second rate very similar to the requests-per-second rate. These numbers were used to determine the approximate stable requests-per-second range for a given configuration. As shown in the "Unhealthy Server" chart below, the response time and the number of responses per second vary greatly once the server becomes overloaded, leading to inconsistent results for client applications. The response times and responses will try to catch up, because IIS and the load balancer are designed to recover in these scenarios, but spikes continue to occur because the server simply can't handle the number of requests being sent to it.
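This health heuristic can be made concrete with a small sketch. Assumptions not taken from the article: log entries arrive as (request start epoch seconds, elapsed seconds) tuples, and the 90% response-rate tolerance is an illustrative value.

```python
from collections import defaultdict

def health_report(samples, tolerance=0.9):
    """Yield (second, verdict) pairs; samples holds (start_epoch, elapsed) tuples."""
    sent = defaultdict(int)        # requests issued per one-second window
    completed = defaultdict(list)  # response times keyed by completion window
    for started, elapsed in samples:
        sent[int(started)] += 1
        completed[int(started + elapsed)].append(elapsed)
    for second in sorted(sent):
        responses = completed.get(second, [])
        # Healthy: response rate tracks request rate and mean time stays sub-second.
        rate_ok = len(responses) >= tolerance * sent[second]
        time_ok = bool(responses) and sum(responses) / len(responses) < 1.0
        yield second, "healthy" if (rate_ok and time_ok) else "unhealthy"
```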
Complex, multi-tier systems under load sometimes exhibit errors that aren't reproducible and are difficult to diagnose. The following graph shows a request/response profile for an event the team encountered during testing, where internal server errors caused dramatic spikes in response times. The response times are reasonable until the scenario occurs (around 60 seconds into the test run), at which point the server stops sending back responses for a period of time. The server eventually recovers and works the queue to catch up, bursting a large number of responses. Eventually the server levels off, until the issue happens again.
In production, these types of errors need to be worked individually, and can be caused by a number of factors in the configuration or the code. (In fact, the errors the team encountered in the test runs are being investigated by Ed-Fi technologists and tracked as ODS-631 to see if a code fix is indicated.)
This section summarizes the recommendations based on the latest round of load testing.
- Large-scale implementations should prefer horizontal, load-balanced scaling strategies over vertical scaling.
- Set logging levels appropriately for production. The log4net configuration should be set to ERROR only in production instances, except when troubleshooting. Turn off SystemDiagnosticsTracing in production systems. A minimal illustration of the log4net setting appears after this list.
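As a minimal sketch of the log4net recommendation above, the root logging level can be restricted to ERROR as shown below. The appender name is a placeholder; match it to the appenders actually defined in the site's Web.config.

```xml
<!-- Illustrative log4net fragment for production: log errors only.
     "RollingFileAppender" is a placeholder appender name. -->
<log4net>
  <root>
    <level value="ERROR" />
    <appender-ref ref="RollingFileAppender" />
  </root>
</log4net>
```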