Introduction

This document provides a rough analysis of what will be needed if the Ed-Fi ODS/API were to be converted over to run on another relational database management system (RDBMS). PostgreSQL was used as an anchor; however, many of the findings will be generally applicable to other RDBMS's.

PostgreSQL Conversion

An entire 2.5 ODS database was scripted out and paintstakingly converted over so that it would execute successfully as pg-sql, thus building out the full ODS schema in a postgres database.

Eliminate boilerplate at the top of the script - things related to SQL Cmd mode, connection properties (e.g. ansi_padding), etc.
Replace all [ ] qualifiers with nothing. Postgres uses quotation marks as qualifiers as needed. However, when you use them to create a object, then you are stuck permanently using them. Best to create without the quotation marks.
Create a dbo schema, belonging to the postgres user.
Alter other schemas so that they belong to the postgres user instead of dbo.

Fix data types

T-SQL Type	PG-SQL Type	Notes
NVARCHAR	VARCHAR	postgres implicitly converts handles all text as Unicode.
DATETIME	TIMESTAMP	Not using TIMESTAMPTZ (with timezone) because the source data has no record of the time zone to begin with.
DATETIME2	TIMESTAMP
UNIQUEIDENTIFIER	CHAR(40)
BIT	SMALLINT	While postgres has a `bit` data type, it does not implicitly convert integer to bit. Instead it requires use a string '0' or '1' (or 'true', etc.). For maximum compatibility with existing code that uses the integer 0/1 instead of string '0'/'1', we should simply switch to `smallint`.

Postgres's cluster concept is different and more analysis is needed to know if it is needed (explicitly) for performance benefits. Remove CLUSTERED and NONCLUSTERED keywords from all indexes and primary key constraints. Also remove PAD_INDEX from constraints. Safe to leave in FILLFACTOR.
Remove WITH CHECK and NOCHECK from constraints.
The pg-equivalent of INT IDENTITY is SERIAL. Reg ex: /INT\s+IDENTITY\s*$1, 1$/ replace with SERIAL
Remove all PRINT statements and extended properties (table/column descriptions). Extended properties can be rewritten as COMMENTs but the effort was not worthwhile in this POC.

Views

Remove WITH SCHEMABINDING
Replace COUNT_BIG(*) with COUNT(*), as postgres's COUNT function already returns a big integer.
Remove all indexes on views.

In a real conversion, some views might need to materialized views for performance optimization.
Ensure that all view definitions end in a semi-colon.
In auth.LocalEducationAgencyIdToStaffUSI, cast null values as integer.

UNPIVOT must be rewritten with UNION ALL statements.

Click here to expand...

T-SQL

PG-SQL

CREATE VIEW [auth].[EducationOrganizationIdToStateAgencyId]
AS
    SELECT    DISTINCT StateEducationAgencyId, EdOrgId as EducationOrganizationId
    FROM
            (SELECT    * FROM auth.EducationOrganizationIdentifiers) As Source
    UNPIVOT (EdOrgId FOR IdType IN (EducationServiceCenterId, LocalEducationAgencyId, SchoolId)) -- Only ESCs, LEAs and Schools are accessible to State-level claims
            As UnpivotedSource
    UNION    
    -- State-level claims also can access the State
    SELECT    StateEducationAgencyId, StateEducationAgencyId AS EducationOrganizationId
    FROM    edfi.StateEducationAgency
GO

CREATE VIEW auth.EducationOrganizationIdToStateAgencyId
AS


    SELECT 
        StateEducationAgencyId,
        EducationServiceCenterId as EducationOrganizationId
    FROM
        auth.EducationOrganizationIdentifiers
    WHERE
        EducationServiceCenterId IS NOT NULL
    UNION
    SELECT 
        StateEducationAgencyId,
        LocalEducationAgencyId as EducationOrganizationId
    FROM
        auth.EducationOrganizationIdentifiers
    WHERE
        LocalEducationAgencyId IS NOT NULL
    UNION
    SELECT 
        StateEducationAgencyId,
        SchoolId as EducationOrganizationId
    FROM
        auth.EducationOrganizationIdentifiers
    WHERE
        SchoolId IS NOT NULL
    UNION   
    -- State-level claims also can access the State
    SELECT  StateEducationAgencyId, StateEducationAgencyId AS EducationOrganizationId
    FROM    edfi.StateEducationAgency;

Views that use common table expressions (CTE) recursively need to be modified by adding the keyword RECURSIVE immediately after WITH. Affected views: StudentAssessmentStudentObjectiveAssessmentH, ObjectiveAssessmentH_CTE, LocalEducationAgencyH, LearningStandardH, LearninbjectiveH, AssessmentFamilyH.
Default values in postgres are not constraints, so the syntax needs to be altered.
1. Substitute now() for getdate().
  From To
  /ALTER TABLE ([^\s]+)[\s\n]+(?:ADD CONSTRAINT (?:[^\s]+)) DEFAULT $getdate\($\) FOR (\w+)/ ALTER TABLE $1 ALTER COLUMN $2 SET DEFAULT now()
2. Replace newid() with uuid_generate_v4(). This requires first configuring uuid support by running CREATE EXTENSION "uuid-ossp".
  From To
  /ALTER TABLE ([^\s]+)[\s\n]+(?:ADD CONSTRAINT (?:[^\s]+)) DEFAULT $newid\($\) FOR (\w+)/ ALTER TABLE $1 ALTER COLUMN $2 SET DEFAULT uuid_generate_v4()

The maximum identifier length is 63. Longer identifiers are automatically truncated. This results in many foreign key names overlapping. To avoid this, use reasonable abbreviations to shorten these seven problematic beginnings to foreign keys:

From	To
fk_educationorganizationnetworkassociation_educationorganization	fk_edorgnetworkassociation_edorg
fk_learningstandardprerequisitelearningstandard_learningstandard	fk_lrnstdprerequisitelrnstd_lrnstd
fk_educationcontentderivativesourceeducationcontent_educationcontent	fk_edconderivativesourceedcon_edcon
fk_studentassessmentstudentobjectiveassessmentperformancelevel	fk_studassessmentstudobjassessmentperflvl
fk_postsecondaryeventpostsecondaryinstitutionidentificationcode	fk_pstscdryeventpstscdryinstidcode
fk_postsecondaryeventpostsecondaryinstitutionmediumofinstruction	fk_pstscdryeventpstscdryinstmediumofnstrn
fk_educationorganizationidentificationcode_educationorganization	fk_edorgidcode_edorg

Change Queries

I was able to evaluate some of the code for the Change Queries feature coming in version 3.1. Thankfully it is possible to convert this code. There are two components to the Change Queries work: a sequence table and triggers to set the ChangeVersion column on a table after update, utilizing the sequence table. The syntax for the s equence itself is the same in both. The trigger structure is a little different in postgres, and it in fact requires creating a function and then calling that function from the trigger.

From

To

CREATE TRIGGER
edfi.edfi_academichonorcaterytype_TR_UpdateChangeVersion 
ON
edfi.academichonorcaterytype AFTER UPDATE AS
BEGIN
    SET NOCOUNT ON
   
    UPDATE edfi.academichonorcaterytype
	SET changeversion = (NEXT VALUE FOR
changes.ChangeVersionSequence)
    FROM edfi.academichonorcaterytype
    WHERE EXISTS (SELECT 1 from inserted i
where i.id = u.id)

END
GO

CREATE OR REPLACE FUNCTION edfi.edfi_academichonorcaterytype_tr_updatechangeversion()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
BEGIN
    UPDATE edfi.academichonorcaterytype
        set changeversion = nextval('changes.ChangeVersionSequence')
        where id = new.id;
        
    RETURN NEW;
END;
$function$

CREATE TRIGGER edfi_academichonorcaterytype_TR_UpdateChangeVersion 
AFTER UPDATE ON edfi.academichonorcaterytype
FOR EACH ROW
WHEN (pg_trigger_depth() = 0)
EXECUTE PROCEDURE edfi.edfi_academichonorcaterytype_TR_UpdateChangeVersion();

Solution Challenges

Potential .NET Application Problems

As noted above, COUNT() in postgres returns a big integer, not an integer. This could potentially mean needing to switch return types in C# from int to long - and it might be difficult to find where this is needed.
Exceptions coming back from the SQL connections will be of a different type, with different codes. Will need to look closely at the code base to see if this requires any changes.
Hard-coded SQL exists in some places in the applications, and in many places in the unit tests. These SQL statements will all have to be evaluated with the conversion rules in mind. Preferably they would be completely removed from the code base, instead relying only on an ORM. Complex queries should be replaced with views. This is just a good practice, as hard-coded SQL embedded in C# is hard to maintain even if using only a single RDBMS.
Bulk load might be a problem. The Npgsql library does support bulk copy, but I have not been able to find anything recent about NHibernate support. Old stack overflow discussions appear to pre-date the introduction of the bulk copy facility in Npgsql.

Code Generation

placeholder

Install/Migration Scripts

Individual SQL scripts in the repositories often have some T-SQL-specific syntax in them. An example I saw while doing a code review of the new operational context work for 3.1:

In this example, lines 1, 2, 3, 4, 5, and 7 are all problematic in pg-sql.

Converting to PostgreSQL: Analysis