T-ODS Design & Data Model
- Chris Moffatt (Deactivated)
Introduction
This documentation serves as an introduction to the overall design and data model for the Temporal ODS (T-ODS), which consists of data structures and database support for an efficient means of loading, storing, and making available data at multiple points in time. These structures handle related snapshot and other historical data functions, and provide support for applications to access and query that historical data.
Audience
The audience for this documentation are analysts, developers, and database administrators who are working with the T-ODS. Those considering Ed-Fi ODS / API adoption may find this information useful as well.
T-ODS Design
The T-ODS is comprised of metadata tables and data tables.
The snapshot features of the T-ODS are driven by two metadata tables and a set of metadata-to-data bridge tables to support the snapshot architecture and T-ODS / T-API functionality.
- The first metadata table, tods.Snapshot, stores snapshot-level metadata, such as the SnapshotName and the Status (e.g., ACTIVE).
- The second metadata table, tods.SnapshotCode, associates individual Snapshot records to one or more SnapshotCodes. This supports snapshot aliasing, meaning that client systems and queries can use multiple names to query or perform operations against a single Snapshot.
- The set of metadata-to-data bridge tables consists of a t_[ODS namespace].[ODS table name]SnapshotRecord (e.g., t_edfi.StudentSnapshotRecord), which associates individual records in the T-ODS domain aggregate data tables to one or more snapshots.
The T-ODS contains a set of data tables parallel to the Ed-Fi ODS data tables, including Ed-Fi Extensions, if present. T-ODS schema artifacts (e.g., tables, foreign keys, stored procedures) are generated using the MetaEd domain-specific language files and the MetaEd database script generation capability. To support the ODS extension schema, as well as plans for future namespace extensions, T-ODS data tables are generated in a separate schema for every ODS table generated presently. The T-ODS data tables are generated using the schema naming convention: t_[ODS namespace] (e.g., t_edfi, t_extension). The T-ODS tables are generated within the ODS database.
A given T-ODS data table has the same definition and patterns relative to its ODS counterpart with the following exceptions:
- An additional "tid_TableName" INT IDENTITY column is added as the new primary key
- An additional "t_PartitionYearId" SMALLINT column is added as the partition column
- ODS surrogate columns are replaced with the ODS natural key columns (e.g. AcademicSubjectTypeId is replaced with AcademicSubjectTypeCodeValue, StudentUSI is replaced with StudentUniqueId, LanguageDescriptorId is replaced with LanguageDescriptorCodeValue and LanguageDescriptorNamespace)
- The edfi.Descriptor table is excluded
- Its data columns are included in each Descriptor table
- Its data is loaded directly to the specific Descriptor data tables identified by the Namespace data
- Domain aggregate tables are linked internally by a T-ODS tid identity column
- Domain aggregate child tables do not contain the ODS parent key columns
While the T-ODS includes data tables for the ODS type tables, the T-ODS type tables are loaded during the build process and are not "snapshotted" since type enumerations are a core part of the standard and should not be modified. However, descriptors are snapshotted since descriptors are designed to provide a mechanism for local enumerations and could change over time.
A major challenge in the design of the T-ODS is avoiding an explosion of data storage needs caused by copies of data that do not change much over time. For example, the results of a local assessment delivered in the first week of school is unlikely to change much after loaded. If a licensee were to take a snapshot just once per week, ideally the T-ODS would not store 35 copies of this unchanged data.
The T-ODS solves this problem. Data is stored at the domain entity resource level where unique combinations of attribute values found across all the domain tables that represent a record for that domain entity. Each generated T-ODS snapshot stores only the unique combinations of domain record values found across all snapshots executed on its ODS counterpart domain entity tables. This is also true of data loaded via the bulk loader. This approach significantly reduces the physical space required to store snapshots, especially for records in the ODS with low churn, as only data that provides new information is added to the database. For example, if a given record in edfi.Student does not change throughout an entire school year, the student will be represented in the T-ODS as a single record even if the ODS is snapshotted weekly or even daily.
Temporal Data Model
Documentation Conventions
The following conventions were used in the model diagrams to illustrate the associations between the temporal data model entities:
For the purpose of describing types and Ed-Fi Descriptors, the naming convention "ContextDescriptor" and "ContextType" are used as a generic representation of the descriptive name of the individual Descriptor and type tables (e.g., AcademicSubjectDescriptor, RaceType).
T-ODS Metadata Model
The T-ODS contains two metadata tables and a set of metadata-to-data bridge tables to support the snapshot architecture and T-ODS/T-API functionality:
- tods.Snapshot
- Stores snapshot-level metadata, such as the SnapshotName and the Status (e.g.,'ACTIVE', 'INACTIVE')
- Must be created prior to loading the snapshot data tables
- tods.SnapshotCode
- Stores the multiple codes of a given Snapshot
- A Snapshot must have at least one code
- t_[ODS namespace].[table name]SnapshotRecord
- Associates individual records in the T-ODS data tables to one or more snapshots
- Most "core" Ed-Fi tables will be prefixed with t_edfi (e.g., t_edfi.Student, t_edfi.Staff)
- Extension tables have a different schema name; examples that ship with the T-ODS are prefixed with t_extension (e.g., t_extension.StudentTransportation, t_extension.BusRouteDayOfWeekDescriptor)
Snapshot Metadata to Type Entities Association
Types, including map types, are not associated to the Snapshot metadata:
- Each Type table is equivalent to its ODS counterpart in both structure and data for the columns:
- CodeValue
- ShortDescription
- Description
- Id
- LastModifiedDate
- CreateDate
- The ODS surrogate [Type table name ]Id column is dropped
A referential integrity SQL script maintains the external references between the data tables and the type tables
Abstract Model
The following model illustrates the relationships between:
- Snapshot metadata and the type entities
- Type entities and domain entities
- Type entities and Ed-Fi Descriptor tables (e.g., AcademicSubjectDescriptor)
Snapshot Metadata to Descriptor Entities Association
Descriptors are associated to the Snapshot metadata as follows:
- t_edfi.DescriptorSnapshotRecord associates the t_edfi.Descriptor, the temporal counterpart of the Core edfi.Descriptor entity
- Core and Extension Descriptors are associated via the t_[ODS namespace].[table name]SnapshotRecord (e.g., AcademicSubjectDescriptorSnapshotRecord)
- A Descriptor's association to the t_edfi.Descriptor entity is not defined by database constraints
Abstract Model
The following model illustrates the relationships between:
- Snapshot metadata and the t_edfi.Descriptor entity
- Snapshot metadata and the descriptor entities
- Descriptor entities and the t_edfi.Descriptor entity
- Type entities and descriptor entities
Snapshot Metadata to Domain Entities Association
The following abstract model illustrates these modeling points:
- Snapshot metadata is associated to a domain root entity via the t_[ODS namespace].[table name]SnapshotRecord (e.g., t_edfi.StudentSnapshotRecord)
- Domain child entities are not directly associated to the snapshot metadata
The following partial Student model uses a concrete example to show the relationships between:
- Snapshot metadata and the t_[ODS namespace].[domain root table name]SnapshotRecord tables
- Relationships between the domain data tables, type tables, and Ed-Fi Descriptor tables
Note that relationships within the domain data tables such as the parent-child relationship are established via new primary key identities (i.e., "tid_[TableName]") instead of natural keys or their ODS surrogate keys.
Temporal Data Tables Model
The T-ODS consists of a set of data tables parallel to the present and future Ed-Fi ODS data tables, including extensions.
To support extensions, as well as plans for future namespace extensions, the T-ODS data tables are generated in a separate schema: t_[ODS namespace]
The T-ODS data tables follow the same pattern relative to its ODS counterpart with the following exceptions:
- Generated using the naming convention t_[ODS namespace].[ODS table name]
- An additional tid_[table name] INT IDENTITY column is added as the new primary key
- An additional t_PartitionYearId SMALLINT column is added as the partition column
- ODS surrogate columns are replaced with the ODS natural key columns (e.g., AcademicSubjectTypeId is replaced with AcademicSubjectTypeCodeValue, StudentUSI is replaced with StudentUniqueId, LanguageDescriptorId is replaced with LanguageDescriptorCodeValue and LanguageDescriptorNamespace)
- The edfi.Descriptor table is excluded:
- Its data columns are included in each Descriptor table
- Its data is loaded directly into the specific Descriptor data tables identified by the Namespace data
- Domain aggregate tables are linked internally by a T-ODS tid identity column
- Domain aggregate child tables do not contain the ODS parent key columns
The LastModifiedDate of the root entity is used to determine if the ODS record was modified. For a given row, if the ODS LastModifiedDate <> the T-ODS LastModifiedDate then a new row is stored in the T-ODS resource table(s) during the snapshot process.
Other Topics
Replacement of ODS Surrogate Key Columns
The one-to-one relationship between the surrogate key and the natural key is maintained in the ODS since it stores current data only. However, in the T-ODS, there can be a many-to-one relationship between the ODS surrogate key value and the natural key. For this reason, the ODS surrogate keys are replaced with natural key column and values in the T-ODS. For example, the "USI" column in the person tables Student, Staff, and Parent is replaced by the "UniqueId" column. This change results in:
- The "USI" column being dropped in the Student, Staff, and Parent tables
- The "USI" column being replaced by the "UniqueId" in all external references
Surrogate key removal also impacts the "TypeId" and "DescriptorId" surrogate columns:
- "TypeId" column is replaced with the CodeValue of the counterpart type table (e.g. SexTypeId is replaced by SexTypeCodeValue)
- "DescriptorId" column is replaced with the CodeValue and Namespace columns of the counterpart descriptor table (e.g. LanguageDescriptorId is replaced by LanguageDescriptorCodeValue and LanguageDescriptorNamespace)
The reliability of data loaded from an ODS database is dependent on the data integrity enforced by the sourced ODS database and assumes:
- a primary key was defined for every table in the sourced database
- foreign key constraints were defined for every table with child-parent relationship(s)
The snapshot load process includes an integrity SQL script that audits the expected relationships and logs any exceptions identified. The T-API and T-Bulk features will use the same scripts to ensure that referential integrity is enforced for as updates and inserts
ODS Surrogate Key Column replacement and Impact on the T-ODS API POC
The guiding approach of the T-ODS API POC is to reuse the existing ODS API where possible. The resource and entity models along with their mapping code are one area where this design philosophy, while possible, requires significant code modifications to achieve this reuse. Sub-classes of the entities were created. References to person, type, and descriptor caches were replaced with analogous T-ODS API classes. Additional attributes were added.
With the changes to the internal and external key structures, the mapping model between resources and entities is simplified because the USI cache, Type cache, and Descriptor cache are not needed to lookup their corresponding IDs.
Partition Scheme
The T-ODS supports partitioning the data tables by school year via a t_PartitionYearId
column to enable linear or better performance impact as data grows to multiple years. Additionally, a stored procedure is provided for adding the front-end partition in preparation for a new school year. MetaEd generates the artifacts to create the objects required to support partitioning. The configuration table and stored procedures are created during the build process and the load process was designed to fill the t_PartitionYearId
column. The Snapshot utility provides functionality to create the initial partition scheme, as well as functionality to add the front-end partition in future years.
The following tables contain the t_PartitionYearId
column:
t_[ODS namespace].[T-ODS table name]SnapshotRecord
t_[ODS namespace].[ODS domain root table name]
and child tables within the resourcetods.Snapshot
tods.SnapshotCode
The T-ODS Type tables do not contain the t_PartitionYearId
column since the Types are not snapshotted.
The configurations needed for partitioning by school year are housed in the tods.PartitionYearConfig
table:
Column | Data Type | Purpose |
---|---|---|
PartitionYearConfigId | SMALLINT INDENTITY(1,1) |
|
BeginDate | Date |
|
EndDate | Date |
|
SchoolYear | NVARCHAR(10) | School year represented in format YYYY-YYYY |
CreateDate | DATETIME | Captures the date and time when the row was crated |
LastModifiedDate | DATETIME | Captures the date and time when the row was last modified |
Id | UNIQUEIDENTIFIER | T-ODS GUID for the partition year |
During the build process the dbo.PartitionYearConfig
table is loaded to reflect the SchoolYear
values (current, plus previous 11) with related default BeginDate
and EndDate
values. The Snapshot utility provides functionality for a snapshot administrator to manage the configuration table.
The tods.PartitionYearConfig contains the configurations required by the following stored procedures:
tods.usp_CreateInitialPartitionScheme
tods.usp_CreateNextPartition
tods.usp_DerivePartitionYearId
The stored procedure usp_CreateInitialPartitionScheme
performs the following operations to create the initial partitioning scheme:
- Creates the filegroups required to create the partition scheme
- Add the filegroups to the database
- Adds the database files to the filegroups
- Creates the partition range function that defines the range of values that are to be stored in the different partitions
- Creates the partition scheme for the partition range function using a list of Partition Years derived from the
tods.PartitionYearConfig
table - Creates the CLUSTERED INDEX on the
t_PartitionYearId
column of the T-ODS tables (Snapshot, SnapshotCode, all SnapshotRecord tables, and all data tables)
The stored procedure usp_CreateNextPartition
performs the following operations to add the front-end partition:
- Adds the next row to the tods.PartitionYearConfig table
- Generates the new file group required for the new front-end partition
- Adds the new file group to the database
- Adds the database files to the filegroup
- Adds the new file group to the partition scheme
- Adds the new value to the partition function range
The file groups for the partition scheme are created with the following properties:
SIZE = 750MB
MAXSIZE = 50000MB -> 50GB
FILEGROWTH = 500MB
The tods.usp_DerivePartitionYearId
is used during the snapshot load process to derive the value of the t_PartitionYearId
column for all data stored during the load process. The SnapshotDate
is used to derive the t_PartitionYearId
value.
Queries
Avoiding an explosion of data storage arising from snapshotting operations involves a performance tradeoff.
Read queries would be most performant if physical copies of every ODS record – including unchanged records – were stored with every snapshot. However, unique combinations of attribute values, and logical snapshots are reconstructed by joining to snapshot metadata tables at query runtime to reduce the storage requirements for a T-ODS instance. This join occurs using the SnapshotId and the associated tid primary key of the domain resource, which is an IDENTITY generated integer (rather than using an alphanumeric GUID or similar) to improve performance.
Because the join occurs over a primary key / foreign key relationship, the operation is efficient. The storage-optimized approach does not implement database constraints on “natural” primary keys and foreign keys from the Ed-Fi ODS tables into the T-ODS tables. As a result, joins between T-ODS data tables do not naturally have indexes that would allow the database optimizer to use efficient join algorithms. To remedy this, non-unique indexes were added on the T-ODS to match the columns making up the non-surrogate natural key.
To simplify querying the T-ODS data tables, a set of table-valued functions was created for each T-ODS data table:
- t_[ODS namespace].ufn_Get[ODS table name]ByDate (e.g., t_edf.ufn_GetStudentByDate), which is a table-valued function for viewing historical data by AsOfDate. This function returns data for single date within a snapshot, based on snapshot record effective dates.
- t_[ODS namespace].ufn_Get[ODS table name]ByRange (e.g., t_edf.ufn_GetStudentByRange), which is a table-valued function for viewing historical data by Effective date range. This function returns data where (Snapshot Record Effective Begin Date < input End Date OR input End Date IS NULL) AND Snapshot Record Effective End Date >= input Begin Date.
- t_[ODS namespace].ufn_Get[ODS table name]BySnapshot (e.g., t_edf.ufn_GetStudentBySnapshot), which is a table-valued function for viewing historical data by Snapshot. The parameter is a pipe-delimited string that accepts:
- Identifier, which returns ALL data within the Snapshot for the given snapshot identifier.
- Snapshot Date, which returns ALL data within the Snapshot for the given snapshot date.
- Snapshot Code, which returns ALL data within the Snapshot for the given snapshot code.
Referential Integrity Approach
The Referential integrity of updates and inserts submitted through the T-API will be enforced using a combination of database constraints and SQL scripts:
Internal References
Internal References are handled via Database constraints applied within a domain entity
- Within the T-ODS tables, the
tid (PK)
is established for the Root table and is used to associate the Root table to the relatedSnapshotRecord
table For example:tid (PK)
would be established for thet_edfi.Student
table and used to associate thet_edfi.Student
table and thet_edfi.StudentSnapshotRecord
table - Child table(s) contain a
tid (FK)
column to associate the child table(s) to the parent table. For example:t_edfi.StudentAddress
table would contain atid_Student (FK)
to thet_edfi.Student
table
External References
External references will be validated through the use of SQL stored procedures that check references external to the domain entity where the domain entity and external references are linked by the SnapshotId
. These are easiest to understand by example:
- Type example. For a given
SnapshotId
, Theusp_eStudent
SQL stored procedure checks if thet_edfi.StudentAddress.AddressType
is a valid value within thet_edfi.AddressType
table. - Descriptor
**
example. For a givenSnapshotId
, The sp_eStudent SQL stored procedure checks if thet_edfi.Student.BirthCountryDescriptorCodeValue
is a valid value within thet_edfi.CountryDescriptor
table. - Association example. For a given SnapshotId. The sp_eStudentProgramAssociation stored procedure checks if the t_edfi.StudentProgramAssociation.StudentUniqueId is a valid value with the t_edfi.Student table.
** As noted elsewhere, the T-ODS doesn't contain a t_edfi.Descriptor
table since the edfi.Descriptor
table columns have been moved into each specific Descriptor table in the T-ODS.
Temporal Field Data Dictionary
This section provides a data dictionary for the tables referenced in this documentation.
tods.Snapshot
The metadata table tods.Snapshot stores snapshot-level metadata.
Field | Data Type | Defintion |
---|---|---|
SnapshotId | INT IDENTITY |
|
SnapshotDate | DATE |
|
EffectiveBeginDate | DATE |
|
EffectiveEndDate | DATE |
|
SnapshotName | NVARCHAR(450) |
|
Description | NVARCHAR(1024) | User friendly description of the snapshot |
IsLockedForProcessing | BIT |
|
IsLocked | BIT |
|
Status | NVARCHAR(10) |
|
LoadStartTime | DATETIME | Captures the start time of the most recent data load process performed on the snapshot |
LoadEndTime | DATETIME | Captures the end time of the most recent data load process performed on the snapshot |
LastModifiedDate | DATETIME | Captures the last date the snapshot metadata was modified |
Id | UNIQUEIDENTIFIER | T-ODS GUID for the snapshot |
t_PartitionYearId | SMALLINT |
|
tods.SnapshotCode
The metadata table tods.SnapshotCode stores one or more "aliases" of a given Snapshot metadata record.
Field | Data Type | Definition |
---|---|---|
SnapshotCodeId | INT IDENTITY |
|
SnapshotCode | NVARCHAR(50) |
|
SnapshotId | INT | FK to associated Snapshot metadata record |
t_PartitionYearId | SMALLINT |
|
t_[ODS namespace].[data table name]SnapshotRecord
The Snapshot Record metadata tables associate individual records in the T-ODS data tables to one or more snapshots.
Field | Definition | |
---|---|---|
SnapshotId | INT | FK to associated Snapshot metadata record |
tid_[data table name] (e.g. tid_Student, tid_SexType, tid_AcademicSubjectDescriptor) | INT | FK to associated temporal data table |
EffectiveBeginDate | DATE |
|
EffectiveEndDate | DATE |
|
Id | UNIQUEIDENTIFIER | T-ODS GUID for the snapshot |
t_PartitionYearId | SMALLINT |
|
T-ODS Data Tables
A given T-ODS data table has the same definition and patterns relative to its ODS counterpart with a few exceptions as described below.
Columns Across All t_[ODS namespace].[data table name]
The following lists the fields that are common across all T-ODS data tables.
Field | Data Type | Definition |
---|---|---|
tid_[dataTableName] (e.g., the t_edfi.Student table has tid_Student PK; the t_edfi.SexType table has tid_SexType PK, the t_edfi.AcademicSubjectDescriptor table has the tid_AcademicSubjectDescriptor PK)
| INT IDENTITY |
|
t_PartitionYearId | SMALLINT | To support partitioning scheme in order to enable linear or better performance impact as data grows to multiple years |
Patterns Specific to Domain Child t_[ODS namespace].[data table name]
The following lists the patterns specific to T-ODS domain child data tables.
Field | Data Type | Definition |
---|---|---|
tid_[parentDataTableName] | INT IDENTITY | FK to associated parent - the immediate parent |
ODS surrogate key column(s) replaced with natural key column(s) (e.g., AcademicSubjectDescriptorId replaced with AcademicSubjectDescriptorCodeValue and AcademicSubjectDescriptorNamespace) | data type of natural key | External references monitored by SQL script |
ODS parent key column(s) removed from child tables (e.g., the t_edfi.StudentLanguage table does not contain the t_edfi.Student.StudentUniqueId natural key column) | n/a | ODS parent key column(s) will be surfaced by the functions to appear "as if" they were in the table structure |
ODS Field Reference
As noted above, the T-ODS contains a set of data tables parallel to the present and future Ed-Fi ODS data tables, including extensions, with an additional "tid_TableName" column as the primary key.
While the transactional, current ODS does not have any fields specifically to support the T-ODS, the snapshot loading process leverages the following ODS field references:
ODS Table Column(s) | Purpose |
---|---|
Primary Key columns NOTE: surrogate key columns converted to natural key values | To uniquely identify and relate the ODS record to its T-ODS counterpart |
LastModifiedDate | Used to determine if the ODS record was modified: if a record's ODS LastModifiedDate <> T-ODS LastModifiedDate then a new record is stored in the T-ODS resource table(s) during the snapshot process. |
DRAFT NOTES (to be removed) | |
---|---|
Requirement | DOC-01 |
Audience | Technical |
Type | Technical Reference Material |
Due Date | TBD |