Info | ||
---|---|---|
| ||
This document provides a rough analysis of what will be needed if the Ed-Fi ODS/API were to be converted over to run on another relational database management system (RDBMS). PostgreSQL was used as an anchor; however, many of the findings will be generally applicable to other RDBMS's. |
Table of Contents |
---|
...
...
placeholder for generalized analysis
PostgreSQL Conversion
An entire 2.5 ODS database was scripted out and paintstakingly converted over so that it would execute successfully as pg-sql, thus building out the full ODS schema in a postgres database.
- Eliminate boilerplate at the top of the script - things related to SQL Cmd mode, connection properties (e.g. ansi_padding), etc.
- Replace all [ ] qualifiers with nothing. Postgres uses quotation marks as qualifiers as needed. However, when you use them to create a object, then you are stuck permanently using them. Best to create without the quotation marks.
- Create a
dbo
schema, belonging to thepostgres
user. - Alter other schemas so that they belong to the
postgres
user instead ofdbo
. Fix data types
unicode encodings on-the-fly. Since the Ed-Fi ODS is for use in the United States, UTF-16 support hardly seems necessary anywayT-SQL Type PG-SQL Type Notes NVARCHAR VARCHAR postgres implicitly converts handles all text as Unicode. DATETIME TIMESTAMP Not using TIMESTAMPTZ (with timezone) because the source data has no record of the time zone to begin with. DATETIME2 TIMESTAMP UNIQUEIDENTIFIER CHAR(40) BIT SMALLINT While postgres has a
bit
data type, it does not implicitly convert integer to bit. Instead it requires use a string '0' or '1' (or 'true', etc.). For maximum compatibility with existing code that uses the integer 0/1 instead of string '0'/'1', we should simply switch tosmallint
.- Postgres's
cluster
concept is different and more analysis is needed to know if it is needed (explicitly) for performance benefits. RemoveCLUSTERED
andNONCLUSTERED
keywords from all indexes and primary key constraints. Also removePAD_INDEX
from constraints. Safe to leave inFILLFACTOR
. - Remove
WITH CHECK
andNOCHECK
from constraints. - The pg-equivalent of
INT IDENTITY
isSERIAL
. Reg ex:/INT\s+IDENTITY\s*\(1, 1\)/
replace withSERIAL
- Remove all PRINT statements and extended properties (table/column descriptions). Extended properties can be rewritten as
COMMENT
s but the effort was not worthwhile in this POC. - Views
- Remove
WITH SCHEMABINDING
- Replace
COUNT_BIG(*)
withCOUNT(*)
, as postgres'sCOUNT
function already returns a big integer. Remove all indexes on views.
Note In a real conversion, some views might need to materialized views for performance optimization.
- Ensure that all view definitions end in a semi-colon.
- In auth.LocalEducationAgencyIdToStaffUSI, cast null values as integer.
UNPIVOT must be rewritten with UNION ALL statements.
Expand T-SQL PG-SQL Code Block language sql CREATE VIEW [auth].[EducationOrganizationIdToStateAgencyId] AS SELECT DISTINCT StateEducationAgencyId, EdOrgId as EducationOrganizationId FROM (SELECT * FROM auth.EducationOrganizationIdentifiers) As Source UNPIVOT (EdOrgId FOR IdType IN (EducationServiceCenterId, LocalEducationAgencyId, SchoolId)) -- Only ESCs, LEAs and Schools are accessible to State-level claims As UnpivotedSource UNION -- State-level claims also can access the State SELECT StateEducationAgencyId, StateEducationAgencyId AS EducationOrganizationId FROM edfi.StateEducationAgency GO
Code Block language sql CREATE VIEW auth.EducationOrganizationIdToStateAgencyId AS SELECT StateEducationAgencyId, EducationServiceCenterId as EducationOrganizationId FROM auth.EducationOrganizationIdentifiers WHERE EducationServiceCenterId IS NOT NULL UNION SELECT StateEducationAgencyId, LocalEducationAgencyId as EducationOrganizationId FROM auth.EducationOrganizationIdentifiers WHERE LocalEducationAgencyId IS NOT NULL UNION SELECT StateEducationAgencyId, SchoolId as EducationOrganizationId FROM auth.EducationOrganizationIdentifiers WHERE SchoolId IS NOT NULL UNION -- State-level claims also can access the State SELECT StateEducationAgencyId, StateEducationAgencyId AS EducationOrganizationId FROM edfi.StateEducationAgency;
- Views that use common table expressions (CTE) recursively need to be modified by adding the keyword
RECURSIVE
immediately afterWITH
. Affected views: StudentAssessmentStudentObjectiveAssessmentH, ObjectiveAssessmentH_CTE, LocalEducationAgencyH, LearningStandardH, LearninbjectiveH, AssessmentFamilyH. Default values in postgres are not constraints, so the syntax needs to be altered.
Substitute
now()
forgetdate()
.From To /ALTER TABLE ([^\s]+)[\s\n]+(?:ADD CONSTRAINT (?:[^\s]+)) DEFAULT \(getdate\(\)\) FOR (\w+)/ ALTER TABLE $1 ALTER COLUMN $2 SET DEFAULT now() Replace
newid()
withuuid_generate_v4()
. This requires first configuring uuid support by runningCREATE EXTENSION "uuid-ossp"
.From To /ALTER TABLE ([^\s]+)[\s\n]+(?:ADD CONSTRAINT (?:[^\s]+)) DEFAULT \(newid\(\)\) FOR (\w+)/ ALTER TABLE $1 ALTER COLUMN $2 SET DEFAULT uuid_generate_v4()
The maximum identifier length is 63. Longer identifiers are automatically truncated. This results in many foreign key names overlapping. To avoid this, use reasonable abbreviations to shorten these seven problematic beginnings to foreign keys:
From To fk_educationorganizationnetworkassociation_educationorganization fk_edorgnetworkassociation_edorg fk_learningstandardprerequisitelearningstandard_learningstandard
fk_lrnstdprerequisitelrnstd_lrnstd
fk_educationcontentderivativesourceeducationcontent_educationcontent
fk_edconderivativesourceedcon_edcon
fk_studentassessmentstudentobjectiveassessmentperformancelevel
fk_studassessmentstudobjassessmentperflvl
fk_postsecondaryeventpostsecondaryinstitutionidentificationcode
fk_pstscdryeventpstscdryinstidcode
fk_postsecondaryeventpostsecondaryinstitutionmediumofinstruction
fk_pstscdryeventpstscdryinstmediumofnstrn
fk_educationorganizationidentificationcode_educationorganization
fk_edorgidcode_edorg
- Remove
Change Queries
I was able to evaluate some of the code for the Change Queries feature coming in version 3.1. Thankfully it is possible to convert this code.
Potential Application Problems
- As noted above,
COUNT()
in postgres returns a big integer, not an integer. This could potentially mean needing to switch return types in C# fromint
tolong
- and it might be difficult to find where this is needed. - Exceptions coming back from the SQL connections will be of a different type, with different codes. Will need to look closely at the code base to see if this requires any changes.
- Hard-coded SQL exists in some places in the applications, and in many places in the unit tests. These SQL statements will all have to be evaluated with the conversion rules in mind. Preferably they would be completely removed from the code base, instead relying only on an ORM. Complex queries should be replaced with views. This is just a good practice, as hard-coded SQL embedded in C# is hard to maintain even if using only a single RDBMS.
Solution Challenges
Code Generation
placeholder
Install/Migration Scripts
Individual SQL scripts in the repositories often have some T-SQL-specific syntax in them. An example I saw while doing a code review of the new operational context work for 3.1:
In this example, lines 1, 2, 3, 4, 5, and 7 are all problematic in pg-sql.