Internationalization Work Group 2020-06-22
Agenda
- Review IWG goals
- Go through changes and questions since April
- Include Person entity updates
Materials
- (as PDF)
- IWG-DataModelConcepts - June 2020.vsdx (as Visio format)
Notes
Review of Charter Goals
Clarification that the goal is a version that is usable, but that the Alliance will not drive that usage - the model will be to initiate a community discussion on timelines for change
Model Reviews
Organization
- No changes here. No discussion here.
Person Domain
June Update: Ed-Fi v3.2.0-c includes this model and is being validated through the TPDM project
- Note: Another approach is to have a PersonRelationship rather than person-role relationships. This gives flexibility to define any type of relationship needed.
- This structure would work well as long as there is guidance as to the breadth of the semantics. But you wouldn't want to generalize it to the point of representing associations that have other implications in the model (like StaffStudent). If you could bound this intent, then that could work very well.
- Highly semantic relationships may need to have their own associations (StudentGuardian). Beyond these, relationships between person-roles are not very useful. StaffGuardian is probably the one example where this is useful, for things like transfer requests. Because of this, K12 has not invested in Person Identification Systems that can cross source systems. So, how do you identify a person? This could be a weakness of the strategy, though it is more of an ecosystem maturity issue. Maybe Ed's idea of keeping some highly semantic relationships is an "out" for this issue.
- Ed: In K12, person relationships outside of StudentGuardian have not been important. But that doesn't mean they won't be important in the future. Would it be useful to know the "friends" of students from social media?
- In this model, would want a single Person record to represent the individual. And that Person would have different id's and source systems. So you need to have a separate table "Person Identifier" to hold various id's from different systems.
- Shared out PersonIdentifier model from MSFT. Would also recommend having Relationship value sets that are determined by region.
- Let's look at this concept with the TPDM use cases we now have.
- There is a difference between capturing the relationship at a concrete level versus how it affects the systems that are involved. There could be a variety of person relationships but in the LMS, you just need to know if someone is an "Observer" in a class like a Parent/Guardian/Specialist. Access may depend on the setting.
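The single-Person-with-many-identifiers idea discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names, field names, source system labels, and the regional relationship values are all hypothetical and are not taken from the MSFT PersonIdentifier model or from Ed-Fi.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class PersonIdentifier:
    """One id for a person, as issued by one source system (hypothetical shape)."""
    source_system: str  # e.g. "SIS" or "LMS"; labels are assumptions
    value: str


@dataclass
class Person:
    """A single record for the individual, independent of any role."""
    first_name: str
    last_name: str
    identifiers: List[PersonIdentifier] = field(default_factory=list)

    def id_for(self, source_system: str) -> Optional[str]:
        """Return this person's id in the given source system, if one is known."""
        for ident in self.identifiers:
            if ident.source_system == source_system:
                return ident.value
        return None


# Hypothetical relationship value sets keyed by region, per the suggestion
# that allowed relationship types be determined regionally.
RELATIONSHIP_TYPES_BY_REGION: Dict[str, Set[str]] = {
    "US": {"Guardian", "Parent", "Specialist"},
    "UK": {"Carer", "Parent"},
}


def is_allowed_relationship(region: str, relationship_type: str) -> bool:
    """Check a relationship type against the value set for a region."""
    return relationship_type in RELATIONSHIP_TYPES_BY_REGION.get(region, set())
```

A Person built this way carries one row per contributing system, so `person.id_for("LMS")` and `person.id_for("SIS")` can both resolve to the same individual without either system's id becoming the primary key.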
Student Membership Domain
Distinction between highly semantic relationships versus weak relationships (more generic). Open discussion on keys. Ed-Fi uses natural keys.
- Q: Any thought of associating StudentGroup with Section?
- We have not looked at this yet.
- This is something some systems do in addition to clubs and sports. Within a classroom there may be groups that work on projects. Same thing for Teams concept.
- Agreed. This is a need they see as well.
- Would like to avoid a strong surrogate id. Would prefer to stick with the natural keys.
- Voice of strong supporter of surrogate keys. Would like to hear more about why natural keys are preferred. It seems like more technology is moving towards surrogate keys.
- One of the goals of the natural key model is to re-use the strong identification systems that already exist for some of these entities, for discoverability. Student Id is an example. If students already have an id, and you create a new one, then you've duplicated this id. If you intend your system to be a closed system, this is not an issue. But if you're releasing these id's out then this is a drawback. Natural keys also help provide an initial set of constraints to validate data. Ed-Fi has key unification. So the data quality benefits are useful but they are not complete, by any means. This is a lesser benefit than not muddying the waters with more id's.
- One issue found is that sometimes systems do not supply very clean data. If you have a surrogate id, there is less of a barrier to entry. Any contributing system is providing their own unique id. So you do have to maintain unique id's per system. But for most cases, you would expect systems to not overlap on many entities. So data validation would be better served after the data has landed.
- One option would be to make this configurable to meet both options.
- A data lake standard has a different purpose than an ODS or analytics data store. Another note is that Ed-Fi today does provide a vehicle to use partial surrogate keys. For example, Ed-Fi v3 uses a partial surrogate key for Section because of pains with the previously complex natural keys. In the assessment example, there is a vendor surrogate key (AssessmentIdentifier), but it must be distinguished across vendors so Namespace is added.
- Is there a way for data to flow in an unconstrained way and then be validated after landing to be more constrained? Two ideas: "in transit" and "at rest".
- Wouldn't want to characterize this as a transactional versus data lake concept. Want to capture StudentSchoolEnrollment but maybe cannot provide all elements required for a natural key. And these could still power downstream applications. Upfront requirements can be a big barrier, even from a transactional standpoint.
- Data cleanliness is a process. It is a current problem for new implementations because they get "stuck." For State implementations, they want the extra validation. But these barriers are tough in other contexts.
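The "land first, validate at rest" idea raised in this discussion could look something like the sketch below. It is a minimal illustration, not Ed-Fi behavior: the record shape, the `NATURAL_KEY_FIELDS` tuple, and the function names are all assumptions. Records are accepted under a per-source surrogate key (source system plus that system's own id) and only checked for natural key completeness afterward.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical natural key fields for an enrollment record (an assumption,
# not the actual StudentSchoolEnrollment key definition).
NATURAL_KEY_FIELDS = ("student_id", "school_id", "entry_date")


@dataclass
class LandedRecord:
    source_system: str                 # contributing system, e.g. "SIS-A"
    source_id: str                     # that system's own unique id
    payload: Dict[str, Optional[str]]  # may be missing natural key parts

    @property
    def landing_key(self) -> Tuple[str, str]:
        """Surrogate key scoped per source system, so ids cannot collide."""
        return (self.source_system, self.source_id)


def land(store: Dict[Tuple[str, str], LandedRecord], record: LandedRecord) -> None:
    """'In transit': accept the record without natural key constraints."""
    store[record.landing_key] = record


def missing_key_fields(record: LandedRecord) -> List[str]:
    """List the natural key fields that are absent or empty in the payload."""
    return [name for name in NATURAL_KEY_FIELDS if not record.payload.get(name)]


def validate_at_rest(store: Dict[Tuple[str, str], LandedRecord]):
    """'At rest': split landed records into complete and incomplete sets."""
    complete, incomplete = [], []
    for record in store.values():
        (incomplete if missing_key_fields(record) else complete).append(record)
    return complete, incomplete
```

Under this arrangement an incomplete record is still stored and can feed downstream applications, while the stricter natural key check becomes a later reporting step instead of a barrier to entry.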
Staff Membership
- Mirrors Student Membership models. No discussion here.
Attendance
- Again, not a lot of changes here. No discussion here.
Programs
Removed abstraction that is in the core model.
Open question around how to handle specialized programs (like Special Education).
- One thought: could programs and courses be parallel, as instructional opportunities for students?
- That came from the higher ed space, but based on Ed-Fi's description of programs, these may not be as close after all. When we define these things, the goal is to associate a student with a program. In the higher ed use case, this meant a student in a Master's program, so that would be the relationship.
- Ed-Fi defined these differently in K12 by following domain driven design. These are not programs of study, like in higher ed.
Action Items
- TODO: Look at person identification concept from MS with the TPDM use cases.
- TODO: Update Person on Person Attributes tab.
- TODO: Add key information to these diagrams.
- TODO: Eric/Ed to provide some background information on the surrogate/partial surrogate/natural keys. This is good for our next meeting.
- TODO: Do a high level comparison of models between courses and programs.
- TODO: Diagram out options to handle specialty programs (e.g., Special Education, Title I).