Multi-Instance Management Solution Design
- Stephen Fuqua
Introduction
As described in detail at Multi-Instance Management, there is a feature request for the Admin API to have the ability to create and drop SQL Server and PostgreSQL databases on an instance. This document describes detailed design issues around process flow and security, and presents multiple options for supporting the business requirements.
Admin API Security Model
Admin API uses OAuth2 for access authentication and authorization, the latter via JSON Web Tokens (JWT). At this time, it only support the Client Credentials grant.
Option 1: Direct Command Execution
A system administrator will authenticate with the Admin API and issue a "create ODS instance" (or "delete ODS instance") command. The Admin API in turn issues the correct T-SQL or pgSQL command to create the new database on the remote server.
Limitations
This model may be limited in its ability to create or destroy ODS instances on different database servers. The details will vary by the environment and the exact SQL statement used to create the copy.
For example, the DBCC CLONEDATABASE
command in MSSQL only creates a new copy on the same server. The same is true when using PostgreSQL's create from template capability.
Some managed services may support creating a copy on a different server. For example, see Azure SQL's AS COPY OF
syntax for the CREATE DATABASE
command.
It would, of course, be feasible to write several different function calls, differing by the environment. There is already a need to write different SQL statements for MSSQL and PostgreSQL, after all. The Alliance would need to decide which platforms to support, research how to support them, and schedule that functionality into appropriate releases.
Security Concerns
These commands require administrative credentials. There are two significant concerns with that:
- What if someone gains illicit access to the appsettings file for Admin API, stealing SQL credentials?
- Alternately, what if someone finds a way to execute arbitrary code in the Admin API context?
Option 2: Queued Command Execution
This scenario tries to address the limitations and security concerns by decoupling the Admin API as recipient of an API command from the execution of that command, by introducing a message/event queue.
Advantages
- The Admin API no longer needs high-security credentials on the database server: it simply creates an event in a queue, and a separate process - which is not directly exposed over the Internet - has the credentials required to carry out the action.
- The Admin API itself does not need to know how to manage multiple types of databases - it simply writes to the queue.
- The Alliance could create something simple to handle basic use cases, and the community could develop their own solutions for cloud management, without having to contribute new code to the Admin API itself.
- The end user gets a quick response that the command has been accepted, without having to wait for execution of the command to finish.
- Event / queue pattern may be helpful for other situations.
- For simplicity, write the queue on the main Admin API database (MSSQL or pgSQL).
- If someone in the community wants to put these events on Kinesis, Kafka, Event Bridge, etc., then they can either contribute some code to Admin API, or use change data capture (CDC) to push changes into a stream.
- Asynchronous process could run in cloud functions or as a lightweight service
Disadvantages
- More moving parts to develop, test, and manage.
- Need an additional API endpoint and back-end capability for checking on status.
There could be a middle ground option using something like Hangfire or Quartz.net for job control. These tools are built into a .NET application, and historically the Ed-Fi Alliance has used Hangfire directly in the admin applications. Option 2 does not propose having one of these tools directly in the Admin API, because (a) doing so does not solve the security problem and (b) that adds resource overhead to the Admin API runtime. These tools could be used in a stand-alone back-end service that reads from the job queue.
This document is not, at this time, making any recommendation on how to manage the asynchronous process. Such recommendations can be made if and when option 2 gains traction.
Option 3: Decoupled Execution
The Admin API does not directly manage the database instances. However, Ed-Fi provides SQL scripts that can be used as part of an orchestration process. The system administrator is responsible for managing the orchestration tool (e.g. TerraForm) and writing an appropriate shell script to execute the SQL script(s) (via psql or sqlcmd command line utilities, for example).
This version introduces an implied status on the instance metadata stored by Admin API:
- The initial POST request to Admin API creates a database record for the instance, and that record will have
status: "pending"
. - When the database instance has been created, PUT an update with
status: "ready"
.
Note that the diagram below also introduces the idea of loading descriptors into an empty database.
Advantages
- Admin API does not need super-user credentials in the database server.
- Much simpler process than Option 2 in terms of moving parts and required services.
- The main script (leftmost lifeline) can be orchestrated via tools like TerraForm.
Disadvantages
- Had to introduce the
status
column in the instance metadata to deal with the fact that we can't have a database transaction rollback in the Admin API context in case theCREATE DATABASE
command fails for some reason. only a minor inconvenience
Table of Contents