Improving ODS Query Performance With Indexes
- Stephen Fuqua
- Ian Christopher (Deactivated)
Overview
The Analytics Middle Tier views — especially fact-type views — are complex and contain many conditions in the JOIN and WHERE clauses. The ODS database is optimized for high performance on REST API operations, necessarily striking a balance between write and read performance. Thus there are performance-enhancing indexes on natural keys, which are used in some GET requests, but not much else. Adding other indexes can significantly boost performance of ad hoc queries and permanent views such as those created by the Analytics Middle Tier, but they come with a cost: higher disk space utilization and slower write performance.
Optional Indexes
In addition to the optional AMT Collections, there exists an option to install extra indexes that significantly improve the performance of some of the AMT queries. Install is simple: just provide the "indexes" keyword on the options flag (-o
or --options
):
.\EdFi.AnalyticsMiddleTier.Console.exe -c "..." -o indexes
This installs the following indexes:
Table | On Columns | Include Columns |
---|---|---|
edfi.Grade | StudentUSI, SchoolId, LocalCourseCode, SchoolYear, SectionIdentifier, SessionName | NumericGradeEarned |
edfi.StudentSectionAssociation | SchoolId, LocalCourseCode, SchoolYear, SessionName | EndDate, LastModifiedDate |
edfi.AcademicSubjectType | CodeValue | (none) |
On the Glendale sample database, the Grade
table storage allocation grows by approximately 9% with the addition of this index: 75 MB additional space for sample grade data covering 16,724 students. This is not a significant additional burden, and may not cause a noticeable performance impact. The impact for a large district or a state, however, could be problematic for write performance to the Grade table. The storage allocation for StudentSectionAssociation
similarly grows by about 10% in Glendale, with approximately 30 MB dedicated to the new index. The v2.2-only AcademicSubjectType
index has a negligible footprint.
SELECT tn.[name] AS [Table name], ix.[name] AS [Index name], SUM(sz.[used_page_count]) * 8 AS [Index size (KB)] FROM sys.dm_db_partition_stats AS sz INNER JOIN sys.indexes AS ix ON sz.[object_id] = ix.[object_id] AND sz.[index_id] = ix.[index_id] INNER JOIN sys.tables tn ON tn.OBJECT_ID = ix.object_id GROUP BY tn.[name], ix.[name] ORDER BY 3 desc
Adding New Indexes
If you have a performance problem on one or more views, one of the patterns for limiting impact on the ODS is to copy the data to a reporting server via replication or log shipping. This reporting database is a better place for adding the optional indexes collection. Once you have created some representative queries against the Analytics Middle Tier views, either directly or through a BI/reporting tool, you may find that the performance warrants another look at additional indexes. Don't just blindly create those "missing" indexes! is a good reference article on how to detect potential indexes and decide on whether to implement them.
Alternatively, if the views all return within minutes (where microseconds to seconds is the objective), then you may benefit from streaming the view contents out to new tables on another database, where you can then optimize a concrete table instead of optimizing a view.
LastModifiedDate
Finally, a special word about LastModifiedDate: this column is used on many different tables for display only; it is not part of the WHERE clause or other selection criteria in the Analytics Middle Tier views. However, if you are regularly querying for changed records based on this column, your query time will benefit greatly from adding an index on this column to the key tables used by the Core View Collection and/or other view collections.
Contents