August 2024 - Project Tanager Workgroup

August 28, 2024, 11:00 am - 11:45 am central. Contact @Stephen Fuqua for the meeting invitation.

Agenda

  • Demonstration of current work

  • Review the roadmap

  • Design Questions

Attendees: Max Paulson (Denver Public Schools), JF Guertin (EdGraph), Joshua Impson (Resultant), Erik Joranlien (Education Analytics), Doug Loyo (Edufied), Geoff McElhanon (Edufied), Adam Hopkins (Simpat Tech on behalf of Ed-Fi), Brad Banister (Doubleline Partners on behalf of Ed-Fi), Vinaya Mayya (Ed-Fi Alliance), Stephen Fuqua (Ed-Fi Alliance).

Demonstration

 

image-20240828-161611.png

Meeting notes:

  • Working demonstration of:

    • POST multiple resources

    • See data flowing into Kafka and OpenSearch

    • GET ALL and GET by query examples, pulling from OpenSearch

  • We think this will work with Elasticsearch, but have not tested yet.

  • Recommendation: consider adding advanced query capabilities found in Elasticsearch/OpenSearch.

Roadmap

Milestone 0.4

Milestone

Functional Goals

Milestone

Functional Goals

0.1 done

Compliant Discovery API, Descriptor API, and Resource API definition (except GET by query): able to run bulk upload, smoke test. Includes JSON validation based on API schema file. Fake OAuth (1).

0.2 DONE

  • 📢 Milestone 0.2.0 has been reached!

  • .NET application with PostgreSQL storage

  • Level 0 and Level 1 document validation

  • Reference and descriptor validation

  • Error message like ODS/API 7.2

  • Docker and Kubernetes

0.3 in progress

  • Streaming data out via Kafka

  • GET by query using OpenSearch

  • Cascading updates on allowed key changes

  • Abandoned direct Kubernetes support as too costly

  • by 10/1 (Ed-Fi Summit)

0.4 need to accelerate

  • Token authorization

  • Client credentials management

  • Namespace authorization

  • Maybe by mid-October?

0.5

  • Design spikes around ed-org authorization

  • Concurrency management with eTags

  • Extensions

  • Multiple data standards

  • Maybe by mid-December?

0.6

  • Dynamic profiles

  • Multitenancy and routing

  • Maybe by mid-February?

0.7

  • Education organization authorization

  • Claimset customization

  • By Tech Congress?

Meeting notes:

  • Kubernetes: there is strong interest, but it does not make sense for the Ed-Fi development team to continue learning Kubernetes.

  • JF may pursue more about Kubernetes setup.

  • Recommendation: place production deployment utilities in a separate repository.

Architecture

Storage of Metadata

This is a sample message in Kafka, created from Debezium:

image-20240828-162238.png
  • _lastModifiedDate, _etag (and _lineage in the future?)

  • Store as metadata or add to the document itself? What should downstream consumers receive?

  • Already sending “API Standard” rather than “Data Standard”. Example: “SchoolYearType” instead of “SchoolYear”.

Tenancy Routing

GitHub Discussion for asynchronous conversation: Multitenancy / routing / instance management

There are the requirements gathered from the multitenancy / routing / instance management workgroup:

  • Allow ODS instances to reside on separate database servers.

  • Support various database segmentation strategies: ODS per year, ODS per district, ODS per district per year, and ODS per tenant. (Reference: Segmentation strategies in ODS API)

  • Enable API administrators to choose between implicit routes (segmentation information not visible in the routes) and explicit routes (Reference: Context-Based Routing).

  • Provide isolated configuration for client and security setup by tenants, making it easier to onboard new tenants or remove existing ones.

Authentication

Given that client credentials will ultimately be in a 3rd party OAuth provider… do we want to give API clients the 3rd party OAuth token URL, or put it behind a proxy? If behind a proxy, three possible options:

  • Data Management Service

  • DMS Configuration Service

  • or API gateway / load balancer

Logging

Showing output from docker logs:

Initial Startup

dms-postgresql:5432 - no response Waiting for PostgreSQL to start... dms-postgresql:5432 - no response Waiting for PostgreSQL to start... dms-postgresql:5432 - accepting connections PostgreSQL is ready. Installing Data Management Service schema. Master ConnectionString => Host=dms-postgresql;Port=5432;Username=postgres;Password=******;Database=postgres Beginning database upgrade Checking whether journal table exists.. Journal table does not exist Executing Database Server script 'EdFi.DataManagementService.Backend.Postgresql.Deploy.Scripts.0000_Create_DMS_Schema.sql' Checking whether journal table exists.. Creating the "schemaversions" table The "schemaversions" table has been created Executing Database Server script 'EdFi.DataManagementService.Backend.Postgresql.Deploy.Scripts.0001_Create_Document_Table.sql' Executing Database Server script 'EdFi.DataManagementService.Backend.Postgresql.Deploy.Scripts.0002_Create_Alias_Table.sql' Executing Database Server script 'EdFi.DataManagementService.Backend.Postgresql.Deploy.Scripts.0003_Create_Reference_Table.sql' Executing Database Server script 'EdFi.DataManagementService.Backend.Postgresql.Deploy.Scripts.0004.1_Configure_Replication.sql' Executing Database Server script 'EdFi.DataManagementService.Backend.Postgresql.Deploy.Scripts.0004.3_Configure_Replication.sql' Executing Database Server script 'EdFi.DataManagementService.Backend.Postgresql.Deploy.Scripts.0010_Create_Insert_References_Procedure.sql' Executing Database Server script 'EdFi.DataManagementService.Backend.Postgresql.Deploy.Scripts.1000_Create_Reference_Validation_FKs.sql' Upgrade successful 2024-08-28 16:32:13.988 INF Injecting PostgreSQL as the primary backend datastore 2024-08-28 16:32:14.042 INF Injecting OpenSearch as the backend query handler 2024-08-28 16:32:14.133 DBG Hosting starting 2024-08-28 16:32:14.248 INF Now listening on: http://[::]:8080 2024-08-28 16:32:14.251 DBG Loaded hosting startup assembly EdFi.DataManagementService.Frontend.AspNetCore 2024-08-28 16:32:14.252 INF Application started. Press Ctrl+C to shut down. 2024-08-28 16:32:14.253 INF Hosting environment: Production 2024-08-28 16:32:14.253 INF Content root path: /app 2024-08-28 16:32:14.254 DBG Hosting started
  • First several lines are from a Bash script

  • Next block of lines, before the first one with a date/time stamp, is obviously from initial database setup.

  • any concern about these lines not having date/time stamp or a log level? If so, likely we can remedy that with a small user story. But if there is no concern, then it is not worth spending any time on it.

Happy Path Flow

2024-08-28 16:35:39.815 DBG Connection id "0HN67FTPTI90Q" accepted. 2024-08-28 16:35:39.816 DBG Connection id "0HN67FTPTI90Q" started. 2024-08-28 16:35:39.839 INF Request starting HTTP/1.1 POST http://localhost:8080/data/ed-fi/schoolYearTypes - null 103 2024-08-28 16:35:39.843 DBG Wildcard detected, all requests with hosts will be allowed. 2024-08-28 16:35:39.872 INF {"method":"POST","path":"/data/ed-fi/schoolYearTypes","traceId":"0HN67FTPTI90Q:00000001","clientId":{"Value":"localhost:8080","HasValue":true,"Host":"localhost","Port":8080}} 2024-08-28 16:35:39.915 DBG 1 candidate(s) found for the request path '/data/ed-fi/schoolYearTypes' 2024-08-28 16:35:39.917 DBG Endpoint 'HTTP: POST /data/{**dmsPath} => Upsert' with route pattern '/data/{**dmsPath}' is valid for the request path '/data/ed-fi/schoolYearTypes' 2024-08-28 16:35:39.918 DBG Request matched endpoint 'HTTP: POST /data/{**dmsPath} => Upsert' 2024-08-28 16:35:39.920 INF Executing endpoint 'HTTP: POST /data/{**dmsPath} => Upsert' 2024-08-28 16:35:40.046 DBG Connection id "0HN67FTPTI90Q", Request id "0HN67FTPTI90Q:00000001": started reading request body. 2024-08-28 16:35:40.046 DBG Connection id "0HN67FTPTI90Q", Request id "0HN67FTPTI90Q:00000001": done reading request body. 2024-08-28 16:35:40.056 DBG FrontendRequest: FrontendRequest { Path = /ed-fi/schoolYearTypes, Body = { "schoolYear": 2022, "schoolYearDescription": "2022-2023", "currentSchoolYear": true }, QueryParameters = System.Collections.Generic.Dictionary`2[System.String,System.String], TraceId = TraceId { Value = 0HN67FTPTI90Q:00000001 } } 2024-08-28 16:35:40.056 DBG Entering ApiSchemaValidationMiddleware- TraceId { Value = 0HN67FTPTI90Q:00000001 } 2024-08-28 16:35:40.058 DBG Entering ApiSchemaFileLoader 2024-08-28 16:35:40.147 DBG Entering ApiSchemaSchemaProvider 2024-08-28 16:35:40.269 DBG Entering ProvideApiSchemaMiddleware- TraceId { Value = 0HN67FTPTI90Q:00000001 } 2024-08-28 16:35:40.270 DBG Entering ParsePathMiddleware - TraceId { Value = 0HN67FTPTI90Q:00000001 } 2024-08-28 16:35:40.273 DBG Entering ParseBodyMiddleware - TraceId { Value = 0HN67FTPTI90Q:00000001 } 2024-08-28 16:35:40.274 DBG Entering DuplicatePropertiesMiddleware - TraceId { Value = 0HN67FTPTI90Q:00000001 } 2024-08-28 16:35:40.275 DBG Entering ValidateEndpointMiddleware - TraceId { Value = 0HN67FTPTI90Q:00000001 } 2024-08-28 16:35:40.282 DBG Entering RejectResourceIdentifierMiddleware - TraceId { Value = 0HN67FTPTI90Q:00000001 } 2024-08-28 16:35:40.284 DBG Entering CoerceFromStringsMiddleware - TraceId { Value = 0HN67FTPTI90Q:00000001 } 2024-08-28 16:35:40.288 DBG Entering ValidateDocumentMiddleware- TraceId { Value = 0HN67FTPTI90Q:00000001 } 2024-08-28 16:35:40.296 DBG Entering ValidateEqualityConstraintMiddleware- TraceId { Value = 0HN67FTPTI90Q:00000001 } 2024-08-28 16:35:40.298 DBG Entering BuildResourceInfoMiddleware - TraceId { Value = 0HN67FTPTI90Q:00000001 } 2024-08-28 16:35:40.301 DBG Entering ExtractDocumentInfoMiddleware - TraceId { Value = 0HN67FTPTI90Q:00000001 } 2024-08-28 16:35:40.302 DBG IdentityExtractor.ExtractDocumentIdentity 2024-08-28 16:35:40.303 DBG IdentityExtractor.DeriveSuperclassIdentityFrom 2024-08-28 16:35:40.304 DBG ReferenceExtractor.Extract 2024-08-28 16:35:40.305 DBG DescriptorExtractor.Extract 2024-08-28 16:35:40.314 DBG Entering DuplicateReferencesMiddleware - TraceId { Value = 0HN67FTPTI90Q:00000001 } 2024-08-28 16:35:40.317 DBG Entering UpsertHandler - TraceId { Value = 0HN67FTPTI90Q:00000001 } 2024-08-28 16:35:40.323 DBG Resilience pipeline executing. Source: 'unknownFailureCircuitBreaker/(null)', Operation Key: 'null' 2024-08-28 16:35:40.332 DBG Entering PostgresqlDocumentStoreRepository.UpsertDocument - TraceId { Value = 0HN67FTPTI90Q:00000001 } 2024-08-28 16:35:40.403 DBG Entering UpsertDocument.Upsert - TraceId { Value = 0HN67FTPTI90Q:00000001 } 2024-08-28 16:35:40.496 DBG Upsert success as insert - TraceId { Value = 0HN67FTPTI90Q:00000001 } 2024-08-28 16:35:40.507 DBG Resilience pipeline executed. Source: 'unknownFailureCircuitBreaker/(null)', Operation Key: 'null', Result: 'InsertSuccess { NewDocumentUuid = DocumentUuid { Value = b0544561-30a3-4814-90ea-9a7c3e4104d5 } }', Execution Time: 182.7581ms 2024-08-28 16:35:40.509 DBG Document store UpsertDocument returned EdFi.DataManagementService.Core.External.Backend.UpsertResult+InsertSuccess- TraceId { Value = 0HN67FTPTI90Q:00000001 } 2024-08-28 16:35:40.511 DBG FrontendResponse: FrontendResponse { StatusCode = 201, Body = , Headers = System.Collections.Generic.Dictionary`2[System.String,System.String], LocationHeaderPath = /ed-fi/schoolYearTypes/b0544561-30a3-4814-90ea-9a7c3e4104d5 } 2024-08-28 16:35:40.519 INF Setting HTTP status code 201. 2024-08-28 16:35:40.519 INF Write content with HTTP Response ContentType of application/json; charset=utf-8 2024-08-28 16:35:40.520 INF Executed endpoint 'HTTP: POST /data/{**dmsPath} => Upsert' 2024-08-28 16:35:40.522 INF Request finished HTTP/1.1 POST http://localhost:8080/data/ed-fi/schoolYearTypes - 201 0 application/json; charset=utf-8 683.6045ms 2024-08-28 16:35:40.523 DBG Connection id "0HN67FTPTI90Q" disconnecting. 2024-08-28 16:35:40.523 DBG Connection id "0HN67FTPTI90Q" stopped. 2024-08-28 16:35:40.524 DBG Connection id "0HN67FTPTI90Q" sending FIN because: "The Socket transport's send loop completed gracefully." 2024-08-28 16:35:41.834 DBG Connection id "0HN67FTPTI90R" accepted. 2024-08-28 16:35:41.835 DBG Connection id "0HN67FTPTI90R" started. 2024-08-28 16:35:41.837 INF Request starting HTTP/1.1 GET http://localhost:8080/data/ed-fi/schoolYearTypes - null null 2024-08-28 16:35:41.837 INF {"method":"GET","path":"/data/ed-fi/schoolYearTypes","traceId":"0HN67FTPTI90R:00000001","clientId":{"Value":"localhost:8080","HasValue":true,"Host":"localhost","Port":8080}} 2024-08-28 16:35:41.839 DBG 1 candidate(s) found for the request path '/data/ed-fi/schoolYearTypes' 2024-08-28 16:35:41.839 DBG Endpoint 'HTTP: GET /data/{**dmsPath} => Get' with route pattern '/data/{**dmsPath}' is valid for the request path '/data/ed-fi/schoolYearTypes' 2024-08-28 16:35:41.839 DBG Request matched endpoint 'HTTP: GET /data/{**dmsPath} => Get' 2024-08-28 16:35:41.840 INF Executing endpoint 'HTTP: GET /data/{**dmsPath} => Get' 2024-08-28 16:35:41.843 DBG FrontendRequest: FrontendRequest { Path = /ed-fi/schoolYearTypes, Body = , QueryParameters = System.Collections.Generic.Dictionary`2[System.String,System.String], TraceId = TraceId { Value = 0HN67FTPTI90R:00000001 } } 2024-08-28 16:35:41.843 DBG Entering ApiSchemaValidationMiddleware- TraceId { Value = 0HN67FTPTI90R:00000001 } 2024-08-28 16:35:41.936 DBG Entering ProvideApiSchemaMiddleware- TraceId { Value = 0HN67FTPTI90R:00000001 } 2024-08-28 16:35:41.936 DBG Entering ParsePathMiddleware - TraceId { Value = 0HN67FTPTI90R:00000001 } 2024-08-28 16:35:41.936 DBG Entering ValidateEndpointMiddleware - TraceId { Value = 0HN67FTPTI90R:00000001 } 2024-08-28 16:35:41.936 DBG Entering BuildResourceInfoMiddleware - TraceId { Value = 0HN67FTPTI90R:00000001 } 2024-08-28 16:35:41.938 DBG Entering ValidateQueryMiddleware - TraceId { Value = 0HN67FTPTI90R:00000001 } 2024-08-28 16:35:41.940 DBG Entering QueryRequestHandler - TraceId { Value = 0HN67FTPTI90R:00000001 } 2024-08-28 16:35:41.940 DBG Resilience pipeline executing. Source: 'unknownFailureCircuitBreaker/(null)', Operation Key: 'null' 2024-08-28 16:35:41.942 DBG Entering OpenSearchQueryHandlerRepository.QueryDocuments - TraceId { Value = 0HN67FTPTI90R:00000001 } 2024-08-28 16:35:41.944 DBG Entering QueryOpenSearch.Query - TraceId { Value = 0HN67FTPTI90R:00000001 } 2024-08-28 16:35:42.030 DBG Resilience pipeline executed. Source: 'unknownFailureCircuitBreaker/(null)', Operation Key: 'null', Result: 'QuerySuccess { EdfiDocs = [], TotalCount = 0 }', Execution Time: 89.5347ms 2024-08-28 16:35:42.031 DBG QueryHandler returned EdFi.DataManagementService.Core.External.Backend.QueryResult+QuerySuccess- TraceId { Value = 0HN67FTPTI90R:00000001 } 2024-08-28 16:35:42.031 DBG FrontendResponse: FrontendResponse { StatusCode = 200, Body = [], Headers = System.Collections.Generic.Dictionary`2[System.String,System.String], LocationHeaderPath = } 2024-08-28 16:35:42.032 INF Setting HTTP status code 200. 2024-08-28 16:35:42.032 INF Write content with HTTP Response ContentType of application/json; charset=utf-8 2024-08-28 16:35:42.033 INF Executed endpoint 'HTTP: GET /data/{**dmsPath} => Get' 2024-08-28 16:35:42.033 INF Request finished HTTP/1.1 GET http://localhost:8080/data/ed-fi/schoolYearTypes - 200 2 application/json; charset=utf-8 196.1507ms 2024-08-28 16:35:42.033 DBG Connection id "0HN67FTPTI90R" disconnecting. 2024-08-28 16:35:42.033 DBG Connection id "0HN67FTPTI90R" stopped. 2024-08-28 16:35:42.033 DBG Connection id "0HN67FTPTI90R" sending FIN because: "The Socket transport's send loop completed gracefully."
  • Most of this is telemetry information. Should this be going into the regular console log, or should we move that off to another log sink? (possibly using Open Telemetry).

  • We have not been very careful yet with our logging messages and we should do a little cleanup to make these more useful. For example:

    INF Request starting HTTP/1.1 GET http://localhost:8080/data/ed-fi/schoolYearTypes - null null INF {"method":"GET","path":"/data/ed-fi/schoolYearTypes","traceId":"0HN67FTPTI90R:00000001","clientId":{"Value":"localhost:8080","HasValue":true,"Host":"localhost","Port":8080}}
    • We ended up logging the GET request twice

    • We used structured logging in the second entry, but nowhere else.

    • Sidenote: the traceId that you see here was auto-generated by ASP.NET. However, we have already built in support for using a correlation / trace ID header if it is present.

    • I don’t know what that clientId value is about. Certainly not showing anything useful. I would have expected clientId to be something we pull from an auth token claim, but there was no auth token in this case.

    • structured logs or unstructured? If we write our logging code correctly, we may be able to control this purely through Serilog settings, letting you choose structured or unstructured.

    • The date/time and the level should probably be in the structured message.

  • Most of these debug messages show entry into a function. With the TraceId present, you can follow the performance for an individual request.

    • The function entered is not consistently separated from TraceId with “space dash space”.

    • Wouldn’t a structured message be easier to use here?

Client Error

Performed a POST operation with an invalid payload.

  • The debug messages include the raw input value with no data masking. In Meadowlark, we kept the structure while masking the values. Shall we do the same in the DMS?

  • Note that the client error is not treated as an application error (with ERR log level).

Application Error

Stopped the PostgreSQL container. Client received the following response:

Console log:

  • We deliberately left the stack trace in place.

    • Not “structured” today

  • bug - Some of the DBG messages are mistakenly missing the TraceId in their output.

Application Log - Circuit Breaker

We’ve implemented the circuit breaker pattern, with configurable settings. Sample log after “a bunch” of calls to the DMS while PostgreSQL is down:

  • Note the WRN, FTL, and ERR logs.

  • Probably better to try to avoid the final stack trace, on the circuit breaker, as that does not add useful information.

  • However, a search on 0HN67FTPTI917 reveals that the actual error that finally cause the circuit to open has not been logged.

  • Should the FTL message have simply been an ERR? After all, the application is still running, and the database connection may restore itself after a time.

Restart PostgreSQL

  • We shouldn’t be logging the query result when the circuit successfully re-closes.