T-ODS Design & Data Model

Introduction

This documentation serves as an introduction to the overall design and data model for the Temporal ODS (T-ODS), which consists of data structures and database support for an efficient means of loading, storing, and making available data at multiple points in time. These structures handle related snapshot and other historical data functions, and provide support for applications to access and query that historical data.

Audience

The audience for this documentation are analysts, developers, and database administrators who are working with the T-ODS. Those considering Ed-Fi ODS / API adoption may find this information useful as well.

T-ODS Design

The T-ODS is comprised of metadata tables and data tables.

The snapshot features of the T-ODS are driven by two metadata tables and a set of metadata-to-data bridge tables to support the snapshot architecture and T-ODS / T-API functionality.

  • The first metadata table, tods.Snapshot, stores snapshot-level metadata, such as the SnapshotName and the Status (e.g., ACTIVE). 
  • The second metadata table, tods.SnapshotCode, associates individual Snapshot records to one or more SnapshotCodes. This supports snapshot aliasing, meaning that client systems and queries can use multiple names to query or perform operations against a single Snapshot. 
  • The set of metadata-to-data bridge tables consists of a t_[ODS namespace].[ODS table name]SnapshotRecord (e.g., t_edfi.StudentSnapshotRecord), which associates individual records in the T-ODS domain aggregate data tables to one or more snapshots.

The T-ODS contains a set of data tables parallel to the Ed-Fi ODS data tables, including Ed-Fi Extensions, if present. T-ODS schema artifacts (e.g., tables, foreign keys, stored procedures) are generated using the MetaEd domain-specific language files and the MetaEd database script generation capability. To support the ODS extension schema, as well as plans for future namespace extensions, T-ODS data tables are generated in a separate schema for every ODS table generated presently. The T-ODS data tables are generated using the schema naming convention: t_[ODS namespace] (e.g., t_edfi, t_extension). The T-ODS tables are generated within the ODS database.

A given T-ODS data table has the same definition and patterns relative to its ODS counterpart with the following exceptions:

  • An additional "tid_TableName" INT IDENTITY column is added as the new primary key
  • An additional "t_PartitionYearId" SMALLINT column is added as the partition column
  • ODS surrogate columns are replaced with the ODS natural key columns (e.g. AcademicSubjectTypeId is replaced with AcademicSubjectTypeCodeValue, StudentUSI is replaced with StudentUniqueId, LanguageDescriptorId is replaced with LanguageDescriptorCodeValue and LanguageDescriptorNamespace)
  • The edfi.Descriptor table is excluded
    • Its data columns are included in each Descriptor table
    • Its data is loaded directly to the specific Descriptor data tables identified by the Namespace data
  • Domain aggregate tables are linked internally by a T-ODS tid identity column
  • Domain aggregate child tables do not contain the ODS parent key columns

While the T-ODS includes data tables for the ODS type tables, the T-ODS type tables are loaded during the build process and are not "snapshotted" since type enumerations are a core part of the standard and should not be modified. However, descriptors are snapshotted since descriptors are designed to provide a mechanism for local enumerations and could change over time.

A major challenge in the design of the T-ODS is avoiding an explosion of data storage needs caused by copies of data that do not change much over time. For example, the results of a local assessment delivered in the first week of school is unlikely to change much after loaded. If a licensee were to take a snapshot just once per week, ideally the T-ODS would not store 35 copies of this unchanged data.

The T-ODS solves this problem. Data is stored at the domain entity resource level where unique combinations of attribute values found across all the domain tables that represent a record for that domain entity. Each generated T-ODS snapshot stores only the unique combinations of domain record values found across all snapshots executed on its ODS counterpart domain entity tables. This is also true of data loaded via the bulk loader. This approach significantly reduces the physical space required to store snapshots, especially for records in the ODS with low churn, as only data that provides new information is added to the database. For example, if a given record in edfi.Student does not change throughout an entire school year, the student will be represented in the T-ODS as a single record even if the ODS is snapshotted weekly or even daily. 

Temporal Data Model

Documentation Conventions

The following conventions were used in the model diagrams to illustrate the associations between the temporal data model entities:

For the purpose of describing types and Ed-Fi Descriptors, the naming convention "ContextDescriptor" and "ContextType" are used as a generic representation of the descriptive name of the individual Descriptor and type tables (e.g., AcademicSubjectDescriptor, RaceType).

T-ODS Metadata Model

The T-ODS contains two metadata tables and a set of metadata-to-data bridge tables to support the snapshot architecture and T-ODS/T-API functionality:

  • tods.Snapshot
    • Stores snapshot-level metadata, such as the SnapshotName and the Status (e.g.,'ACTIVE', 'INACTIVE')
    • Must be created prior to loading the snapshot data tables
  • tods.SnapshotCode
    • Stores the multiple codes of a given Snapshot
    • A Snapshot must have at least one code
  • t_[ODS namespace].[table name]SnapshotRecord 
    • Associates individual records in the T-ODS data tables to one or more snapshots
    • Most "core" Ed-Fi tables will be prefixed with t_edfi (e.g., t_edfi.Student, t_edfi.Staff)
    • Extension tables have a different schema name; examples that ship with the T-ODS are prefixed with t_extension (e.g., t_extension.StudentTransportation, t_extension.BusRouteDayOfWeekDescriptor)

Snapshot Metadata to Type Entities Association

Types, including map types, are not associated to the Snapshot metadata:

  • Each Type table is equivalent to its ODS counterpart in both structure and data for the columns:
    • CodeValue
    • ShortDescription
    • Description
    • Id
    • LastModifiedDate
    • CreateDate
  • The ODS surrogate [Type table name ]Id column is dropped
  • A referential integrity SQL script maintains the external references between the data tables and the type tables

Abstract Model


The following model illustrates the relationships between:

  • Snapshot metadata and the type entities
  • Type entities and domain entities
  • Type entities and Ed-Fi Descriptor tables (e.g., AcademicSubjectDescriptor)




Snapshot Metadata to Descriptor Entities Association

Descriptors are associated to the Snapshot metadata as follows:

  • t_edfi.DescriptorSnapshotRecord associates the t_edfi.Descriptor, the temporal counterpart of the Core edfi.Descriptor entity
  • Core and Extension Descriptors are associated via the t_[ODS namespace].[table name]SnapshotRecord (e.g., AcademicSubjectDescriptorSnapshotRecord)
    • A Descriptor's association to the t_edfi.Descriptor entity is not defined by database constraints
Abstract Model


The following model illustrates the relationships between:

  • Snapshot metadata and the t_edfi.Descriptor entity
  • Snapshot metadata and the descriptor entities
  • Descriptor entities and the t_edfi.Descriptor entity
  • Type entities and descriptor entities

 

Snapshot Metadata to Domain Entities Association

The following abstract model illustrates these modeling points:

  • Snapshot metadata is associated to a domain root entity via the t_[ODS namespace].[table name]SnapshotRecord (e.g., t_edfi.StudentSnapshotRecord)
  • Domain child entities are not directly associated to the snapshot metadata


The following partial Student model uses a concrete example to show the relationships between:

  • Snapshot metadata and the t_[ODS namespace].[domain root table name]SnapshotRecord tables
  • Relationships between the domain data tables, type tables, and Ed-Fi Descriptor tables

Note that relationships within the domain data tables such as the parent-child relationship are established via new primary key identities (i.e., "tid_[TableName]") instead of natural keys or their ODS surrogate keys.


Temporal Data Tables Model

The T-ODS consists of a set of data tables parallel to the present and future Ed-Fi ODS data tables, including extensions.

To support extensions, as well as plans for future namespace extensions, the T-ODS data tables are generated in a separate schema: t_[ODS namespace]

The T-ODS data tables follow the same pattern relative to its ODS counterpart with the following exceptions:

  • Generated using the naming convention t_[ODS namespace].[ODS table name]
  • An additional tid_[table name] INT IDENTITY column is added as the new primary key
  • An additional t_PartitionYearId SMALLINT column is added as the partition column
  • ODS surrogate columns are replaced with the ODS natural key columns (e.g., AcademicSubjectTypeId is replaced with AcademicSubjectTypeCodeValue, StudentUSI is replaced with StudentUniqueId, LanguageDescriptorId is replaced with LanguageDescriptorCodeValue and LanguageDescriptorNamespace)
  • The edfi.Descriptor table is excluded:
    • Its data columns are included in each Descriptor table
    • Its data is loaded directly into the specific Descriptor data tables identified by the Namespace data
  • Domain aggregate tables are linked internally by a T-ODS tid identity column
  • Domain aggregate child tables do not contain the ODS parent key columns

The LastModifiedDate of the root entity is used to determine if the ODS record was modified. For a given row, if the ODS LastModifiedDate <> the T-ODS LastModifiedDate then a new row is stored in the T-ODS resource table(s) during the snapshot process.


Other Topics

Replacement of ODS Surrogate Key Columns

The one-to-one relationship between the surrogate key and the natural key is maintained in the ODS since it stores current data only. However, in the T-ODS, there can be a many-to-one relationship between the ODS surrogate key value and the natural key. For this reason, the ODS surrogate keys are replaced with natural key column and values in the T-ODS. For example, the "USI" column in the person tables Student, Staff, and Parent is replaced by the "UniqueId" column. This change results in:

  • The "USI" column being dropped in the Student, Staff, and Parent tables
  • The "USI" column being replaced by the "UniqueId" in all external references

Surrogate key removal also impacts the "TypeId" and "DescriptorId" surrogate columns:

  • "TypeId" column is replaced with the CodeValue of the counterpart type table (e.g. SexTypeId is replaced by SexTypeCodeValue)
  • "DescriptorId" column is replaced with the CodeValue and Namespace columns of the counterpart descriptor table (e.g. LanguageDescriptorId is replaced by LanguageDescriptorCodeValue and LanguageDescriptorNamespace)

The reliability of data loaded from an ODS database is dependent on the data integrity enforced by the sourced ODS database and assumes:

  1. a primary key was defined for every table in the sourced database
  2. foreign key constraints were defined for every table with child-parent relationship(s)

The snapshot load process includes an integrity SQL script that audits the expected relationships and logs any exceptions identified. The T-API and T-Bulk features will use the same scripts to ensure that referential integrity is enforced for as updates and inserts 

ODS Surrogate Key Column replacement and Impact on the T-ODS API POC

The guiding approach of the T-ODS API POC is to reuse the existing ODS API where possible. The resource and entity models along with their mapping code are one area where this design philosophy, while possible, requires significant code modifications to achieve this reuse. Sub-classes of the entities were created. References to person, type, and descriptor caches were replaced with analogous T-ODS API classes. Additional attributes were added.

With the changes to the internal and external key structures, the mapping model between resources and entities is simplified because the USI cache, Type cache, and Descriptor cache are not needed to lookup their corresponding IDs.

Partition Scheme

The T-ODS supports partitioning the data tables by school year via a t_PartitionYearId column to enable linear or better performance impact as data grows to multiple years. Additionally, a stored procedure is provided for adding the front-end partition in preparation for a new school year. MetaEd generates the artifacts to create the objects required to support partitioning. The configuration table and stored procedures are created during the build process and the load process was designed to fill the t_PartitionYearId column. The Snapshot utility provides functionality to create the initial partition scheme, as well as functionality to add the front-end partition in future years.

The following tables contain the t_PartitionYearId column:

  • t_[ODS namespace].[T-ODS table name]SnapshotRecord
  • t_[ODS namespace].[ODS domain root table name] and child tables within the resource
  • tods.Snapshot
  • tods.SnapshotCode

The T-ODS Type tables do not contain the t_PartitionYearId column since the Types are not snapshotted.

The configurations needed for partitioning by school year are housed in the tods.PartitionYearConfig table:

ColumnData TypePurpose
PartitionYearConfigIdSMALLINT INDENTITY(1,1)
  • PK column
  • database generated identity
  • value derived by tods.usp_DerivePartitionYearId is stored in the t_PartitionYearId column
BeginDateDate
  • begin date of the school year
  • default: last Monday in August of current year
EndDateDate
  • end date of the school year
  • default: Sunday prior to last Monday in August of the following year
SchoolYearNVARCHAR(10)School year represented in format YYYY-YYYY
CreateDateDATETIMECaptures the date and time when the row was crated
LastModifiedDateDATETIMECaptures the date and time when the row was last modified
IdUNIQUEIDENTIFIERT-ODS GUID for the partition year

During the build process the dbo.PartitionYearConfig table is loaded to reflect the SchoolYear values (current, plus previous 11) with related default BeginDate and EndDate values. The Snapshot utility provides functionality for a snapshot administrator to manage the configuration table.

The tods.PartitionYearConfig contains the configurations required by the following stored procedures:

  • tods.usp_CreateInitialPartitionScheme
  • tods.usp_CreateNextPartition
  • tods.usp_DerivePartitionYearId

The stored procedure usp_CreateInitialPartitionScheme performs the following operations to create the initial partitioning scheme:

  • Creates the filegroups required to create the partition scheme
  • Add the filegroups to the database
  • Adds the database files to the filegroups
  • Creates the partition range function that defines the range of values that are to be stored in the different partitions
  • Creates the partition scheme for the partition range function using a list of Partition Years derived from the tods.PartitionYearConfig table
  • Creates the CLUSTERED INDEX on the t_PartitionYearId column of the T-ODS tables (Snapshot, SnapshotCode, all SnapshotRecord tables, and all data tables)

The stored procedure usp_CreateNextPartition performs the following operations to add the front-end partition:

  • Adds the next row to the tods.PartitionYearConfig table
  • Generates the new file group required for the new front-end partition
  • Adds the new file group to the database
  • Adds the database files to the filegroup
  • Adds the new file group to the partition scheme
  • Adds the new value to the partition function range

The file groups for the partition scheme are created with the following properties:

                  SIZE = 750MB

                  MAXSIZE = 50000MB -> 50GB

                  FILEGROWTH = 500MB

The tods.usp_DerivePartitionYearId is used during the snapshot load process to derive the value of the t_PartitionYearId column for all data stored during the load process. The SnapshotDate is used to derive the t_PartitionYearId value.

Queries

Avoiding an explosion of data storage arising from snapshotting operations involves a performance tradeoff.

Read queries would be most performant if physical copies of every ODS record – including unchanged records – were stored with every snapshot. However, unique combinations of attribute values, and logical snapshots are reconstructed by joining to snapshot metadata tables at query runtime to reduce the storage requirements for a T-ODS instance. This join occurs using the SnapshotId and the associated tid primary key of the domain resource, which is an IDENTITY generated integer (rather than using an alphanumeric GUID or similar) to improve performance.

Because the join occurs over a primary key / foreign key relationship, the operation is efficient. The storage-optimized approach does not implement database constraints on “natural” primary keys and foreign keys from the Ed-Fi ODS tables into the T-ODS tables. As a result, joins between T-ODS data tables do not naturally have indexes that would allow the database optimizer to use efficient join algorithms. To remedy this, non-unique indexes were added on the T-ODS to match the columns making up the non-surrogate natural key.

To simplify querying the T-ODS data tables, a set of table-valued functions was created for each T-ODS data table:

  • t_[ODS namespace].ufn_Get[ODS table name]ByDate (e.g., t_edf.ufn_GetStudentByDate), which is a table-valued function for viewing historical data by AsOfDate. This function returns data for single date within a snapshot, based on snapshot record effective dates.
  • t_[ODS namespace].ufn_Get[ODS table name]ByRange (e.g., t_edf.ufn_GetStudentByRange), which is a table-valued function for viewing historical data by Effective date range. This function returns data where (Snapshot Record Effective Begin Date < input End Date OR input End Date IS NULL) AND Snapshot Record Effective End Date >= input Begin Date.
  • t_[ODS namespace].ufn_Get[ODS table name]BySnapshot (e.g., t_edf.ufn_GetStudentBySnapshot), which is a table-valued function for viewing historical data by Snapshot. The parameter is a pipe-delimited string that accepts:
    • Identifier, which returns ALL data within the Snapshot for the given snapshot identifier. 
    • Snapshot Date, which returns ALL data within the Snapshot for the given snapshot date.
    • Snapshot Code, which returns ALL data within the Snapshot for the given snapshot code.

Referential Integrity Approach

The Referential integrity of updates and inserts submitted through the T-API will be enforced using a combination of database constraints and SQL scripts:

Internal References

Internal References are handled via Database constraints applied within a domain entity

  • Within the T-ODS tables, the tid (PK) is established for the Root table and is used to associate the Root table to the related SnapshotRecord table   For example: tid (PK) would be established for the t_edfi.Student table and used to associate the t_edfi.Student table and the t_edfi.StudentSnapshotRecord table
  • Child table(s) contain a tid (FK) column to associate the child table(s) to the parent table.  For example: t_edfi.StudentAddress table would contain a tid_Student (FK) to the t_edfi.Student table

External References

External references will be validated through the use of SQL stored procedures that check references external to the domain entity where the domain entity and external references are linked by the SnapshotId. These are easiest to understand by example:

  • Type example. For a given SnapshotId, The usp_eStudent SQL stored procedure checks if the t_edfi.StudentAddress.AddressType is a valid value within the t_edfi.AddressType table.
  • Descriptor** example. For a given SnapshotId, The sp_eStudent SQL stored procedure checks if the t_edfi.Student.BirthCountryDescriptorCodeValue is a valid value within the t_edfi.CountryDescriptor table.
  • Association example. For a given SnapshotId.  The sp_eStudentProgramAssociation stored procedure checks if the t_edfi.StudentProgramAssociation.StudentUniqueId is a valid value with the t_edfi.Student table.

** As noted elsewhere, the T-ODS doesn't contain a t_edfi.Descriptor table since the edfi.Descriptor table columns have been moved into each specific Descriptor table in the T-ODS.

Temporal Field Data Dictionary

This section provides a data dictionary for the tables referenced in this documentation.

tods.Snapshot

The metadata table tods.Snapshot stores snapshot-level metadata.

FieldData TypeDefintion
SnapshotIdINT IDENTITY
  • (PK) column
  • Database generated IDENTITY
SnapshotDateDATE
  • Indicates the single day or point-in-time when the data was correct
  • Used by the Temporal Bulk API context to associate bulk loaded information with a particular snapshot
  • Represents the finest grain of time supported by the Temporal ODS: a single day
  • There will only be one record for any given entity on a SnapshotDate / single day
EffectiveBeginDateDATE
  • Represents the begin date of the snapshot range
  • Value cannot result in overlapping date range across snapshots
EffectiveEndDateDATE
  • Represents the end date of the snapshot range
  • Value cannot result in overlapping date range across snapshot
  • A null value indicates that this is the current record
SnapshotNameNVARCHAR(450)
  • Free-form, brief, human-readable text name, e.g., “September 1, 2017 End-of-Year Snapshot.”
  • Required unique
DescriptionNVARCHAR(1024)User friendly description of the snapshot
IsLockedForProcessingBIT
  • Used during the Long-running async snapshot process to facilitate the two-step create-and-populate snapshot needed by the API.
  • IsLockedForProcessing = 1/TRUE prevents updates to the snapshot metadata record, as well as prevents additional data loads
  • Data related to Snapshot should be excluded from queries when IsLockedForProcessing = 1/TRUE

IsLockedBIT
  • Flag indicating whether the records associated with this snapshot are locked (i.e., not available for editing).
  • IsLocked = 1/TRUE prevents future updates to the snapshot metadata record, as well as prevents additional data loads
StatusNVARCHAR(10)
  • 'ACTIVE' indicates Snapshot load process completed successfully without error
  • 'INACTIVE' indicates Snapshot load process encountered error (note: errors logged on ExceptionLog)
LoadStartTimeDATETIMECaptures the start time of the most recent data load process performed on the snapshot
LoadEndTimeDATETIMECaptures the end time of the most recent data load process performed on the snapshot
LastModifiedDateDATETIMECaptures the last date the snapshot metadata was modified
IdUNIQUEIDENTIFIERT-ODS GUID for the snapshot
t_PartitionYearIdSMALLINT
  • The partitioning column
  • Value derived based on SnapshotDate and its associated row in the tods.PartitionYearConfig table

tods.SnapshotCode

The metadata table tods.SnapshotCode stores one or more "aliases" of a given Snapshot metadata record.

FieldData TypeDefinition
SnapshotCodeIdINT IDENTITY
  • (PK) column
  • Database generated IDENTITY
SnapshotCodeNVARCHAR(50)
  • Human readable URI friendly name for the snapshot
  • Unique across all Snapshot names
SnapshotIdINT

FK to associated Snapshot metadata record

t_PartitionYearIdSMALLINT
  • The partitioning column
  • Value derived based on SnapshotDate and its associated row in the tods.PartitionYearConfig table

t_[ODS namespace].[data table name]SnapshotRecord

The Snapshot Record metadata tables associate individual records in the T-ODS data tables to one or more snapshots.

Field Definition
SnapshotIdINTFK to associated Snapshot metadata record

tid_[data table name]

(e.g. tid_Student, tid_SexType, tid_AcademicSubjectDescriptor)

INTFK to associated temporal data table
EffectiveBeginDateDATE
  • Supports corrections that do not align to Snapshot date range
  • Represents the begin date of the temporal data record
  • Value cannot result in overlapping date range across data records
  • Value must fall within date range of Snapshot metadata record
EffectiveEndDateDATE
  • Supports corrections that do not align to Snapshot date range
  • Represents the end date of the temporal data record
  • Value cannot result in overlapping date range across data records
  • Value must fall within date range of Snapshot metadata record
IdUNIQUEIDENTIFIERT-ODS GUID for the snapshot
t_PartitionYearIdSMALLINT
  • The partitioning column
  • Value derived based on SnapshotDate and its associated row in the tods.PartitionYearConfig table

T-ODS Data Tables

A given T-ODS data table has the same definition and patterns relative to its ODS counterpart with a few exceptions as described below.

Columns Across All t_[ODS namespace].[data table name]

The following lists the fields that are common across all T-ODS data tables.

FieldData TypeDefinition

tid_[dataTableName]     

(e.g., the t_edfi.Student table has tid_Student PK; the t_edfi.SexType table has tid_SexType PK, the t_edfi.AcademicSubjectDescriptor table has the tid_AcademicSubjectDescriptor PK)

                             

INT IDENTITY
  • New primary key column
  • Database generated IDENTITY
t_PartitionYearIdSMALLINTTo support partitioning scheme in order to enable linear or better performance impact as data grows to multiple years

Patterns Specific to Domain Child t_[ODS namespace].[data table name]

The following lists the patterns specific to T-ODS domain child data tables.

FieldData TypeDefinition
tid_[parentDataTableName]INT IDENTITYFK to associated parent - the immediate parent
ODS surrogate key column(s) replaced with natural key column(s) (e.g., AcademicSubjectDescriptorId replaced with AcademicSubjectDescriptorCodeValue and AcademicSubjectDescriptorNamespace)data type of natural keyExternal references monitored by SQL script

ODS parent key column(s) removed from child tables

(e.g., the t_edfi.StudentLanguage table does not contain the t_edfi.Student.StudentUniqueId natural key column)

n/aODS parent key column(s) will be surfaced by the functions to appear "as if" they were in the table structure

ODS Field Reference

As noted above, the T-ODS contains a set of data tables parallel to the present and future Ed-Fi ODS data tables, including extensions, with an additional "tid_TableName" column as the primary key.

While the transactional, current ODS does not have any fields specifically to support the T-ODS, the snapshot loading process leverages the following ODS field references:

ODS Table Column(s)Purpose

Primary Key columns

NOTE: surrogate key columns converted to natural key values

To uniquely identify and relate the ODS record to its T-ODS counterpart

LastModifiedDate

Used to determine if the ODS record was modified: if a record's ODS LastModifiedDate <> T-ODS LastModifiedDate then a new record is stored in the T-ODS resource table(s) during the snapshot process.

DRAFT NOTES (to be removed)
RequirementDOC-01
AudienceTechnical
TypeTechnical Reference Material
Due DateTBD