Data: Application requirements, data flow, and person registry Tom Barton University of Chicago
CAMP Directory Workshop Feb 3-6, 2004 Copyright Tom Barton 2004. This work is the intellectual property of the author. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author.
Outline
  Three stages of managing identity information
    1. Feeding the person registry - integrating identity from many authoritative sources
    2. Processes & business logic at the person registry
    3. Feeding consumers of identity information
  Some examples sprinkled in
  Selected policy & process issues (time permitting)
Core middleware for an integrated architecture
Potential sources of identity info
  “Big” administrative systems: student systems, payroll/HR systems, academic records systems, financials, telecom mgmt system, alumni systems, library systems, …
  “Small” sources: affiliated organizations with fairly simple administrative operations (Excel?)
  Collateral operational systems: application-specific directories/databases, NOS directories, campus card systems, other metadirectory/ID Mgmt operations
  People’s heads: “ad hoc” affiliations, self, proxies
UofC sources: now
  Student info & campus card system by live RDBMS views
  Payroll & faculty by periodic batches
  Dozen or so “small feeds” by aperiodic upload
  Self
  Trusted Agents to make temporary and “pre-feed” accounts
  370 or so departmental directory reviewers
  Network security group
UofC sources: planning or earnest discussion
  Feed from UC Hospitals
  Alumni system
  Select distributed IT support staff (mail & password resets)
  Potentially anyone to manage ad hoc groups
Feed mechanics
  Source system selection criteria
    – Express the set of affiliation types or constituencies authoritatively represented in the source
    – Affiliation indicator attributes
  Format & transmission technology
    – Complete selections vs. differentials vs. transactions
    – Automated vs. semi-manual (e.g., maildrop) vs. manual
    – scp flatfiles, live views, varieties of EAI (what are you using?)
    – Actual metadirectory products (what are you using?)
    – Ad hoc record structure, XML (what are you doing?)
Identity Matching
  Matching strategies
    – Match personal IDs for each source record
    – Per-source shared identifier with prior matching
    – Broadly used institutional identifier with prior matching
  The query “is this person new” is resolved somewhere, somehow.
    – Inaccurate answers spoil the 1–1 relationship between registry objects and real world subjects
    – It’s worthwhile to think about how to improve it!
  Insert “rational” ID Mgmt spiel here …
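The three matching strategies can be pictured as a lookup cascade that ends by answering "is this person new". The Python sketch below is illustrative only and is not UofC's implementation: the `SourceRecord` and `Registry` names, the field names, and the use of name plus birth date as the "personal ID" match are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceRecord:
    source: str                             # hypothetical source name, e.g. "payroll"
    source_id: str                          # identifier the source uses for the person
    institutional_id: Optional[str] = None  # broadly used ID, if the source carries one
    name: str = ""
    birth_date: str = ""


class Registry:
    """Toy registry that resolves 'is this person new' by trying three lookups."""

    def __init__(self):
        self.by_source_id = {}      # (source, source_id) -> registry key
        self.by_institutional = {}  # institutional_id -> registry key
        self.by_personal = {}       # (normalized name, birth_date) -> registry key
        self._next_key = 1

    @staticmethod
    def _personal_key(rec):
        # Only usable when both personal attributes are present.
        if rec.name.strip() and rec.birth_date:
            return (" ".join(rec.name.lower().split()), rec.birth_date)
        return None

    def match(self, rec):
        """Return (registry_key, is_new) for one incoming source record."""
        key = self.by_source_id.get((rec.source, rec.source_id))
        if key is None and rec.institutional_id:
            key = self.by_institutional.get(rec.institutional_id)
        personal = self._personal_key(rec)
        if key is None and personal is not None:
            key = self.by_personal.get(personal)
        is_new = key is None
        if is_new:
            key = self._next_key
            self._next_key += 1
        # Remember every identifier seen so later feeds match directly.
        self.by_source_id[(rec.source, rec.source_id)] = key
        if rec.institutional_id:
            self.by_institutional[rec.institutional_id] = key
        if personal is not None:
            self.by_personal[personal] = key
        return key, is_new
```

A real matcher would need stronger personal evidence than name and birth date, plus a review queue for near matches; the point here is only that each strategy reduces to a lookup followed by a crosswalk update.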
Identity matching at UofC: now
  SSN
  StudentID (after prior match by SSN)
  “CorpID” (mangling of substrings of lastname, SSN)
  Several options for identifying “self” as authoritative source
Identity matching at UofC: upcoming (dose of rationality)
  UCID (SSN replacement) assigned as unique key in payroll & student systems at record creation time
  Person registry is authoritative source of UCID
  “Is this person new” is answered when a new record is to be created in payroll or student systems
  Tightly-coupled and loosely-coupled designs are being considered
  UC Hospitals feed might also use a similar design
Canonicalization
  Provide simpler, consistent representation of certain data
    – Name
    – Phone number(s)
    – Address(es)
    – Department names
    – Names of “major” affiliations
  Transformation rules and business logic
    – Which source trumps name
    – Phone & address mappings
    – Rules to determine expressed affiliations
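As a small illustration of canonicalization rules, the Python sketch below normalizes North American phone numbers to one format and picks a name according to source precedence. The precedence order, the default area code, and the output format are assumptions chosen for the example, not rules taken from the presentation.

```python
import re

# Hypothetical precedence: the first source in this list that supplies a name wins.
NAME_PRECEDENCE = ["self", "payroll", "student"]


def canonical_phone(raw, default_area="773"):
    """Return '+1 AAA EEE NNNN' for a 7-, 10- or 11-digit number, else None."""
    digits = re.sub(r"\D", "", raw)          # drop punctuation and spaces
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                  # strip the country code
    if len(digits) == 7:
        digits = default_area + digits       # assume a local campus number
    if len(digits) != 10:
        return None                          # leave unparseable values for review
    return f"+1 {digits[:3]} {digits[3:6]} {digits[6:]}"


def canonical_name(names_by_source):
    """Pick the name from the highest-precedence source and tidy whitespace."""
    for source in NAME_PRECEDENCE:
        name = names_by_source.get(source)
        if name:
            return " ".join(name.split())
    return None
```

For instance, `canonical_phone("(773) 555-0100")` and `canonical_phone("555-0100")` both yield `+1 773 555 0100`, so consumers see one representation regardless of how each source stored the number.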
Fat or thin?
  Fat = contains selected data from sources
  Thin = contains only links to sources
  Issues with thin:
    – Source system availability
    – Source system security (apps need creds)
    – App complexity (feed mechanics, identity matching, canonicalization rules)
    – Policy complexity (authorize N apps to access M sources)
  Issues with fat:
    – Data freshness
    – Downstream from canonicalization (usually a pro, but can be a con)
  Most campuses are fat!
Functional requirements for a registry entry
  Private primary key
    – Never reassigned, never revoked
    – Not used for any other purpose
    – GUIDs are preferable to keys that are unique only within a database
  Publicly visible key
    – Available for sources or consumers to use to refer to the person (better than, say, a username)
    – Probably numeric string <= 9 digits to ensure that it fits in most predefined fields
    – Reduces exposure in case of disaster with primary key
  Crosswalk source- and consumer-specific identifiers
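The key requirements above can be sketched as a small data structure. This Python example is a hedged illustration: `RegistryEntry`, `KeyIssuer`, and the random zero-padded 9-digit public ID are assumptions made here, not a design described in the talk.

```python
import secrets
import uuid


class RegistryEntry:
    """One person: a private GUID, a public numeric ID, and a crosswalk."""

    def __init__(self, public_id):
        self.private_key = uuid.uuid4()  # GUID; never reassigned, never exposed
        self.public_id = public_id       # numeric string of at most 9 digits
        self.crosswalk = {}              # system name -> that system's identifier


class KeyIssuer:
    """Hands out public IDs that have not been issued before."""

    def __init__(self):
        self.issued = set()

    def new_public_id(self):
        while True:
            # Zero-padded so the ID always fits a 9-character field.
            candidate = f"{secrets.randbelow(10**9):09d}"
            if candidate not in self.issued:
                self.issued.add(candidate)
                return candidate
```

Sources and consumers would refer to the person by `public_id`, while `crosswalk` records entries such as `entry.crosswalk["payroll"] = "P-0042"` (a made-up identifier), keeping the private key out of circulation.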
Functional requirements for a registry entry
  Personal information
    – Answer the “is this person new” query with sufficient accuracy
    – Support account claiming, initialization, or re-initialization
  Storage for whatever’s authoritative in the person registry
    – E.g.: support for provisioning, email, username(s)
  Information obtained from source systems that is valuable to authorization or entitlement algorithms and policies
  The entry and its principal identifiers and personal info (at least) are never deleted from the registry (except…)
Registry record structure at UofC
  RDBMS (Sybase) with tables for:
    – Each major source system
    – One in which to collect all “small feeds”
    – Individuals, one row per person
    – Tracking usernames
    – Supporting service baskets and (de-)provisioning
    – Supporting the security model for registry operations
  DB-local primary key (not a GUID), no PVID
  Records for “temporary” affiliations are removed
Logging & reporting requirements
  Audit
    – Who had which identifiers when
    – State changes (when using a stateful provisioning model)
    – Activity, to a degree
  Diagnostic views/reports for selected helpdesk and operational staff
  Refer requests for reports outside of the scope of IT operational needs to the data warehouse group!
Provisioning strategy
  Provisioning = maintenance of electronic ephemera required to facilitate users’ access to services
  Format & transmission technology
    – Incremental vs. differential vs. full consumer rebuilds
    – Periodic vs. asynchronous updates
    – Per-consumer or standard record formats
    – Transmission techniques (what do you do?)
Provisioning strategy
  Service baskets
    – Business logic that determines which categories of people are entitled to participate in which services, with which service levels
    – One aspect of a more inclusive access control architecture
    – E.g.: shell accounts & quotas, mailboxes, email forwarding, dialup profiles, VPN, wireless, computer registration, calendar, …
    – Issue of excessive granularization
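A service basket can be thought of as a mapping from categories of people to sets of services. The Python sketch below shows that idea in its simplest form; the particular basket contents are invented for illustration and do not describe any campus's actual entitlements.

```python
# Hypothetical baskets: major affiliation -> services that affiliation confers.
BASKETS = {
    "faculty": {"mailbox", "vpn", "wireless", "calendar"},
    "staff": {"mailbox", "vpn", "wireless", "calendar"},
    "enrolled student": {"mailbox", "wireless", "shell account"},
    "alum": {"email forwarding"},
}


def entitled_services(affiliations):
    """Union of the baskets for every affiliation the person currently holds."""
    services = set()
    for affiliation in affiliations:
        # Unknown affiliations confer nothing rather than raising an error.
        services |= BASKETS.get(affiliation, set())
    return services
```

Keeping the baskets coarse, as here, is one way to avoid the excessive granularization the slide warns about: a person who is both an alum and an enrolled student simply receives the union of two baskets.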
Stateful provisioning
  Not shown: transitions to prospective state from grace, limbo, slide, IDonly.
Independent variables for state transitions
  state
  substate
  date the present state was reached
  date by which the present state might end (expiration date)
  major affiliation (faculty, staff, enrolled student, accepted student, registered student, alum, …)
  list of the identifiers of resources being managed for this account
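The six variables listed above map naturally onto a record type. The Python sketch below is a guess at how such a record and a single expiry-driven transition might look; the `ON_EXPIRY` successor table and the "active" state name are hypothetical, since the actual state diagram is not reproduced in this transcript.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class AccountState:
    state: str
    substate: str
    reached_on: date                 # date the present state was reached
    expires_on: Optional[date]       # date by which the present state might end
    major_affiliation: str
    resource_ids: List[str] = field(default_factory=list)


# Hypothetical successor table; the real transitions are not given in the source.
ON_EXPIRY = {"active": "grace", "grace": "limbo"}


def step(acct, today):
    """Advance one state if the expiration date has passed, else return acct unchanged."""
    if acct.expires_on is None or today < acct.expires_on:
        return acct
    successor = ON_EXPIRY.get(acct.state)
    if successor is None:
        return acct
    # A real model would also set a new expiration date for the successor state.
    return AccountState(successor, "", today, None,
                        acct.major_affiliation, list(acct.resource_ids))
```

Because the managed resources travel with the record, a transition can be reviewed or reversed without losing track of what was provisioned for the account.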
Fault avoidance & recovery
  Bad source data arrives – what happens?
  Flux high water marks
    – Hold update when the number of changes exceeds a threshold
    – Possible on the source side, more often seen in consumer provisioning techniques
  “Semantical filters”
    – E.g., can absence from the HR feed mean anything other than they’re gone?
    – Construct source filters based on knowledge of business practices that relate to selection criteria on the source system.
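A flux high water mark amounts to counting the changes a feed would cause and refusing to apply it when the count is implausibly large. The Python sketch below illustrates the check; the 10% default threshold and the function name `apply_feed` are assumptions for the example.

```python
_MISSING = object()  # sentinel so an absent record differs from any real value


def apply_feed(current, incoming, max_change_fraction=0.10):
    """Return (new_state, applied, change_count).

    current and incoming map a person identifier to that person's record.
    The feed is held (not applied) when adds + removals + modifications
    exceed max_change_fraction of the current population.
    """
    changed = {
        key
        for key in current.keys() | incoming.keys()
        if current.get(key, _MISSING) != incoming.get(key, _MISSING)
    }
    limit = max_change_fraction * max(len(current), 1)
    if len(changed) > limit:
        return current, False, len(changed)   # hold for human review
    return dict(incoming), True, len(changed)
```

An empty or truncated HR file would show up as nearly every record changing, so the update is held and existing consumers keep their data until someone confirms the feed.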
Fault avoidance & recovery
  Person registry change log
    – Enables rollback & replay of consumer updates
    – Good diagnostic info
    – Supports a “hit me with the new ones” incremental provisioning strategy
  Stateful provisioning model can be constructed to ensure continuity of service & buy time to fix effects of bad source data
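To make the change log idea concrete, the Python sketch below keeps an append-only list of (sequence, key, old value, new value) entries and uses it for incremental pulls, replay, and rollback. The class and function names are hypothetical, and a `None` value is assumed to mean "no record".

```python
class ChangeLog:
    """Append-only log of registry changes with increasing sequence numbers."""

    def __init__(self):
        self.entries = []  # each entry: (seq, key, old_value, new_value)

    def record(self, key, old, new):
        seq = len(self.entries) + 1
        self.entries.append((seq, key, old, new))
        return seq

    def since(self, seq):
        """'Hit me with the new ones': entries after the consumer's last seq."""
        return [entry for entry in self.entries if entry[0] > seq]


def replay(state, entries):
    """Apply entries in order to a copy of a consumer's state."""
    result = dict(state)
    for _, key, _, new in entries:
        if new is None:
            result.pop(key, None)
        else:
            result[key] = new
    return result


def rollback(state, entries):
    """Undo entries in reverse order on a copy of a consumer's state."""
    result = dict(state)
    for _, key, old, _ in reversed(entries):
        if old is None:
            result.pop(key, None)
        else:
            result[key] = old
    return result
```

A consumer only needs to remember the last sequence number it processed; after bad source data is discovered, the same entries that were replayed can be rolled back.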
Expression of rules
  Hard coded or abstracted rule syntax?
  Rules for
    – Affiliation
    – State transitions
    – Inclusion in service baskets
    – Memberships in selected groups (“minor” affiliations, privilege classes)
  Stanford, Memphis examples
    – Rules expressed in terms of registry object methods
    – External configuration file eval’d by the code that manages the registry
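One way to abstract rules out of code is to keep them as data that the registry code loads and evaluates. The Python sketch below uses a JSON rule list for group membership; the rule format, group names, and attribute names are invented for illustration and are not the Stanford or Memphis syntax.

```python
import json

# Hypothetical external configuration: a person joins a group when every
# listed attribute has one of the allowed values.
RULES_JSON = """
[
  {"group": "library-borrowers",
   "when": {"major_affiliation": ["faculty", "staff", "enrolled student"]}},
  {"group": "physics-dept",
   "when": {"department": ["Physics"]}}
]
"""


def load_rules(text):
    """Parse the rule list; in practice this would be read from a file."""
    return json.loads(text)


def groups_for(person, rules):
    """Return the groups whose conditions the person's attributes satisfy."""
    groups = []
    for rule in rules:
        conditions = rule["when"].items()
        if all(person.get(attr) in allowed for attr, allowed in conditions):
            groups.append(rule["group"])
    return groups
```

Parsing declarative data rather than evaluating arbitrary code trades some expressiveness for safety: policy staff can edit the rules without being able to run code inside the registry.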
Common consumers
  Minimum set of consumers & consumer technologies needed to meet application requirements!
    – Authentication, attributes, groups, coordinated identity management
  Types
    – Generic LDAP (maybe >1 replication networks)
    – Active Directory (maybe >1 consuming domain)
    – Kerberos
    – eDirectory, NIS, Ph, RDBMS (show hands?, others?)
    – Applications as direct consumers
    – Affiliated identity management operations
UofC consumers
  Consumers
    – OpenLDAP (1 replication network), Kerberos, Active Directory, NIS, Ph
      uid is RDN
      uid namespace issues: regular, temporary, hospital people
    – Above with periodic diffs, high water hold, async self & management updates
    – Peer ID Mgmt operations (periodic full)
  Service baskets & statefulness being developed
    – Manual quarterly account closures suit UofC culture
    – Automated stateful approach to loss of services per-basket
Selected policy & process issues
  How will the University operate its identity management infrastructure?
    – What balance between centralized and distributed operation?
      Registry – singular, centralized function
      Consumers – high degree of distribution possible
      Registration Authorities – small number??
    – Who may have which role with what authority & obligations?
    – Leverages & extends existing data administration policies & processes, or begs if those are insufficient
    – Highly cross-functional activity demanding organizational flexibility
Selected policy & process issues
  What entitlements should attend each type of affiliation?
    – “Major” affiliations: student, faculty, alum, …
      Possibly former or recent student, faculty, …?
    – “Minor” affiliations: in course 123, in department X, in degree program Y, occupant of building Z, …
    – What processes should determine entitlements for each affiliation?
  How should affiliations be structured?
Selected policy & process issues
  Who should be issued a credential? What assurance level should authentication for each constituency achieve? What constraints may pertain to each?
    – Applicants (student, faculty, staff)
    – Admitted students, accepted faculty or staff
    – Alums
    – Parents
    – Library patrons
    – Guests: visiting academics, conference attendees, hotel guests, arbitrary “friends”, …