Purdue University

Identity and Access Management

Purdue University Identification:
Management Tools: Initial Investigation


Introduction

The objective of this project is to develop a database of people affiliated with Purdue University. As part of this project, a unique identifier would be established for each individual. This identifier has come to be known as the Purdue University Identifier, or PUID. It became apparent, however, that the letters "P-U-I-D" conjured up very specific images in most people's minds. Therefore, a conscious effort has been made in the initial stages of this project to avoid placing emphasis on the identifier. Instead, the focus has been on the creation of the entire people database.

A series of interviews was held with people around the University to determine whether this central database was necessary, practical, and/or desired. The overwhelming response was that such a database would be of great service to the University and its customers, and that all of the areas represented in the interviews would benefit from it.

In addition, an effort was made to understand how the various major databases (HR, UD, and Student) interact in ways that might be impacted by this central database. Several relationships were found and are described later in this document.

Those interviewed for this effort were:

  • Laverne Knodle, MI
  • Nancy Yuochunas, MI
  • Helen Greene, MI
  • Herman Buchanan, Human Resources
  • Lee Gordon, Student Services
  • John Steele, PUCC
  • Nancy Grenard, University Development
  • Dale Daniels and Wayne Hilt, HFS (Access Card)

An additional interview was held via teleconference with Jon Ives at MIT. MIT has recently implemented a similar system, and valuable insights were gained from this conversation.

This document will define the Business Problem Statement, which covers the problems uncovered relating to this project; the Business Criteria against which the proposed database must be measured; the Business Solution, which will define the recommended format and use of the database; the Business Benefits of this database; and the Business Implementation, which will highlight the recommended implementation strategy for the database.

Business Problem Statement

  1. Identification of a Unique Person between Databases

    Each of the people interviewed said their areas have difficulty when it is necessary to match or compare people's information from their office's files with the same person's information in another office's files. Three causes of this difficulty were identified:

    1. Different offices' computer systems use different identification numbering strategies. Even when the strategies are basically the same, exceptions are handled differently.
    2. Some offices allow (and some are required to allow) clients to request an alternate identification number, especially when Social Security Number (SSN) is the typical identifier.
    3. Some offices have changed the way numbers are assigned over time.


  2. Acceptability/Utility of SSN as a Unique Identifier

    For several years, a student's SSN has also served as their unique Student Identifier (SID). Through the interviews, it became clear that SSN is no longer a satisfactory number to use as the SID. The reasons are twofold: Acceptability and Utility.

    The acceptability issue is a combination of client preference and legality. Several students request not to have their SSN as their SID. By law, such requests must be honored. In such situations, a SID beginning with '999' is assigned to the student.

    The utility issue arises because of the number of SSN changes that occur. Many students change their SID during their time at Purdue, either by requesting that SSN not be used, or when mistakes are found in setting the SID to the SSN.

    The real problem is when the number that is intended to uniquely identify an individual no longer does so. To indicate the magnitude of the problem, here are counts of the number of students that changed their SID since 1990:

    Academic Year    Student SID Changes
    1990-91          251
    1991-92          417
    1992-93          605
    1993-94          503
    1994-95          514
    1995-96          526
    1996-97          534
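
    To put these counts in perspective, the short sketch below totals and averages them. The figures are copied from the table; the variable names are illustrative only.

```python
# Student SID changes per academic year, copied from the table above.
sid_changes = {
    "1990-91": 251,
    "1991-92": 417,
    "1992-93": 605,
    "1993-94": 503,
    "1994-95": 514,
    "1995-96": 526,
    "1996-97": 534,
}

total = sum(sid_changes.values())      # all changes across the seven years
average = total / len(sid_changes)     # mean changes per academic year

print(f"Total SID changes: {total}")
print(f"Average per year:  {average:.0f}")
```

    Over the seven years shown, this comes to 3,350 changes, or roughly 479 per year.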

    In addition, HR estimates there are about 1,200 employee SSN changes in a given year. Many of these are new Graduate Student / Employees who do not already have an SSN. These people are initially entered into HR with a '999' "SSN", and then the number is changed when they get a real SSN. But not all of the changes are for students getting a new SSN. Many are simply correcting errors introduced when the SSN was entered into the HR system. The error could have been made when the number was written down, when it was read, or when it was typed into the system. The result is the same, however, and it must be fixed.

    Hence, the utility of using SSN as a unique, constant identifier over a period of time is not high.

  3. Transfer of Data Between Databases

    The differences in unique identification information of a given individual have resulted in real-time difficulties with data transfer between databases.

    Multiple difficulties arise as HR must compare Student systems data for a student with HR data for that same person. Taxable fee remissions, student address, Social Security taxability status, and more, all affect HR calculations of payroll. When the student has elected not to be identified as a student by his SSN, HR must manually check the student's status.

    Lee Gordon estimates that 75 to 100 work-study student / employees have requested not to have their SSN as their SID. The Work Study office must manually track these students' hours and earnings to ensure they follow work study guidelines.

    Even systems within the same general administrative area can be affected. While students can request that their SSN not be used as their SID, federal reporting purposes require that financial aid be reported by SSN.

  4. Propagation of Data Changes Between Databases

    When data such as a person's address is changed, there is no way to know that the data may be stored in multiple databases here at Purdue. And because there is no way to know the data may be in multiple databases, there is no way to automatically propagate the changed data to other databases. Nor is there a way to even notify the other database owners that the data has changed.

    Instead, we expect people to know the various offices where their data is stored, and to report data changes to each of those offices.

    It has been demonstrated time and again that this method doesn't work. It forces people, from recent graduates to Vice Presidents of the University, into a trial-and-error process of re-correcting data as it is mis-reported, or into a guessing game to determine what information they aren't receiving any more.

  5. Individual Authentication

    With the growing implementation and use of computer technology at the university, authentication of valid users of university resources has become increasingly important and difficult.

    This is especially evident for the Purdue Identity and Access Management Office (IAMO). IAMO provides services to individuals with a variety of relationships to the university. In order to provide expedient service to its customers, IAMO needs to be able to quickly validate if the person seeking resources is authorized for that resource. In addition, since the person may need to take advantage of the resource soon after starting the relationship, IAMO must have access to recently updated information.

    There is no one list to which IAMO can go to verify that individuals have a relationship with the University and what that relationship is. A list such as this would have to be quickly available, and contain virtually current data, in order to be used to validate users.

  6. Customer Services

    When a person presents himself/herself to a department and is not on that department's system, he/she is treated much as though he/she were new to the University.

    Since there is no centrally gathered data of people's relationships with the University, there is no way for the department to know whether the person is truly new to the University, or whether he/she has already had a long and mutually beneficial relationship with the University.

    There is no opportunity for the department to adapt their procedures to handle long-term Purdue people in a manner different from people new to Purdue University. There is no opportunity to verify existing data versus gathering new data. There is no way to know that a person may be very familiar with Purdue's practices, and which people need more explanation.

    For instance, if data were available for a 10-year employee, a department could verify the address from HR, verify knowledge about University policies, and maybe even accept a check numbered less than 150 without extra identification. For someone completely new to the University, a department must gather all the information they need, explain thoroughly University policies and procedures, and be cautiously business-like in accepting payment for services.

    Last but not least, there is a timing problem with having information available to verify new staff. Most of the fix will result from the new HR system, and changed timing and opportunities to directly enter data into the system for new employees. Having this new information available in one location to verify status will help departments provide better customer service to all clients.

Business Criteria for a Solution
  1. K.I.S.S.

    One theme which came up repeatedly in the interviews was the necessity to follow the "KISS" principle: "Keep It Simple, Simon." (Editorial privilege taken.) In order for the database to be accepted and utilized, its use must be basic, simple, and effective.

  2. Phased Implementation

    Given the complexity of identifying and combining "people" data from all sources into this database, the solution must be a phased implementation. An initial implementation with enough data can provide most of the benefit to the University of a complete implementation. The remaining holders of people data can see the demonstrated benefits, and this can build desire to be included in future phases.

  3. Location

    The database must be centrally located to facilitate delivery of information to all users, and to simplify its creation and comparison with people data in participating operational systems.

  4. Support

    Another recurring theme from the interviews is that this database must enjoy widespread support from the Purdue data community. If the data is to be kept up to date, all of the departments which enter and maintain individual data must be committed to support this effort. Otherwise the integrity of the data will be quickly compromised.

    It was understood by all who mentioned this that, on the surface, it looks like checking another system to see if a person is already in Purdue data will add work. Each person also mentioned that either this check is already being done, using less-friendly sources, or that work will be saved in the long run by having a defined database that links people from system to system.

  5. Accuracy

    In order for the database to be accepted by the data-user community, the system must provide users with data that is accurate and timely. The more immediate the information is, the more value the database will provide.

  6. Availability

    Since this database will be supporting business needs in a number of university areas, widespread availability of this system must be guaranteed. The data must be available for both on-line access as well as for batch jobs.

  7. Accessibility

    The database must be accessible through widely available technology. Generic access to the database must be available. Any special technology which requires a reworking of an individual department's standard must be avoided.

  8. Disruption

    The database must be available without interfering with normal operations of any of the departmental databases.

  9. Timeliness

    The first phase of the project must be completed by the end of April, 1998 in order to support the potential reissue of the Access Cards at that time.

Business Solution
  1. Project Ownership

    There are two aspects of ownership with this project.

    By far, the most important is to have each involved client area accept ownership of their part and involvement in the project. Without their commitment to the following areas, this project should not be attempted:

    1. Helping to define their part of the database.

    2. Working in concert with the other involved client areas during conversion to create a combined list of people from the client areas. This will be a non-trivial effort to define and refine the matching rules, and to manually handle the list(s) of "suspected but not certain" duplicates.

    3. Continued use of the people database as a primary source for identifying people before they are added to operational data systems.
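
    The "suspected but not certain" duplicates mentioned in item 2 could be separated from clear matches along the lines of the sketch below. This is only an illustration: the record fields, the use of Python's difflib for name similarity, and both thresholds are assumptions, not matching rules agreed by the client areas.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify(rec_a: dict, rec_b: dict, sure: float = 0.95, maybe: float = 0.75) -> str:
    """Label a pair of records as 'match', 'suspected', or 'distinct'.

    The thresholds are hypothetical and would need tuning against real data.
    """
    same_dob = rec_a["dob"] == rec_b["dob"]
    score = name_similarity(rec_a["name"], rec_b["name"])
    if same_dob and score >= sure:
        return "match"
    if (same_dob and score >= maybe) or score >= sure:
        return "suspected"  # goes on the list for manual review
    return "distinct"


# Made-up records standing in for rows from two client areas.
hr = {"name": "Pat Example", "dob": "1970-01-01"}
student = {"name": "Pat Example", "dob": "1970-01-01"}
typo_dob = {"name": "Pat Example", "dob": "1970-01-10"}
other = {"name": "Quinn Wu", "dob": "1982-06-15"}

for candidate in (student, typo_dob, other):
    print(candidate["name"], candidate["dob"], "->", classify(hr, candidate))
```

    Only the pairs labeled "suspected" would need manual handling; refining the thresholds and adding fields is where the non-trivial effort described above would go.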

    The second aspect is to champion the project to the Administrative Services Steering Committee to ensure it is assigned resources in Management Information necessary to develop and populate the database, and to build a supporting system to present and maintain the database. Laverne Knodle will provide this.

  2. Unique Identification

    A unique identifier will be assigned to each person with a relationship with the university. The features and constraints of this identifier will be as follows:

    • For reference purposes, the unique identifier will be called the Purdue University Identifier, or PUID.

    • The PUID will be numeric and have a maximum length of 9 digits. This will allow assignment of 1 billion unique numbers. We cannot foresee exhausting this supply of numbers. In addition, this number of digits will properly fit into the space available in the ISO numbers for the new student access cards due out in the next year.

    • Once assigned, a PUID for a given person will never change. Once a person is assigned a PUID, they have that PUID for life.

    • The PUID will be stored in the People database along with the unique identifiers from each of the other systems. This will alleviate the need for departments to make changes to their systems as all matches can take place in the People database.

    • All individuals in the people database must have a PUID.
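
    A minimal sketch of how these constraints might be enforced follows. The sequential numbering, the zero-padding to 9 digits, and the in-memory registry are assumptions made for illustration; the text above specifies only that the PUID is numeric, at most 9 digits long, and never changes once assigned.

```python
from dataclasses import dataclass, field

PUID_DIGITS = 9                   # maximum length stated above
PUID_LIMIT = 10 ** PUID_DIGITS    # 1,000,000,000 possible values


@dataclass
class PuidRegistry:
    """Hypothetical in-memory allocator; a real system would use the database."""

    next_value: int = 1                              # assumption: sequential assignment
    by_person: dict = field(default_factory=dict)    # person key -> PUID

    def assign(self, person_key: str) -> str:
        # A person who already has a PUID keeps it for life.
        if person_key in self.by_person:
            return self.by_person[person_key]
        if self.next_value >= PUID_LIMIT:
            raise OverflowError("PUID space exhausted")
        puid = f"{self.next_value:0{PUID_DIGITS}d}"  # assumption: zero-padded
        self.next_value += 1
        self.by_person[person_key] = puid
        return puid


registry = PuidRegistry()
first = registry.assign("person-a")
again = registry.assign("person-a")   # same person, same PUID
other = registry.assign("person-b")
print(first, again, other)
```

    Asking for a PUID a second time returns the original value rather than issuing a new one, which is the property that makes the identifier usable as a constant key.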

  3. Data Ownership

    The only piece of information controlled by this application will be the PUID. The PUID will be centrally owned by MI (?). All other information will come from existing database systems from other departments. These departments will maintain ownership and responsibility for their data.

    Even with the minimal identification data kept in the People database, the possibility exists that different systems will have different values. A system of rules must be worked out to allow the data-gathering process to handle most or all of the data changes. The rules should acknowledge that the "most important system of record" will naturally change as people move through their experience at Purdue. The rules may also need to allow manual overrides in particular situations, or they may not. These rules need to be worked out with the DCMs and Data Stewards.
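
    One possible shape for such rules is sketched below. The relationship names, the precedence order for each relationship, and the override mechanism are all hypothetical; the actual rules remain to be worked out with the DCMs and Data Stewards.

```python
# Assumed precedence: which system is trusted first for each relationship.
# HR, UD, and Student are the major databases named in this document.
PRECEDENCE = {
    "student": ["Student", "HR", "UD"],
    "employee": ["HR", "Student", "UD"],
    "alumnus": ["UD", "Student", "HR"],
}


def resolve(field_name, values_by_system, relationship, overrides=None):
    """Pick one value for a field when the source systems disagree.

    Returns (value, source). A manual override, if present, always wins.
    """
    overrides = overrides or {}
    if field_name in overrides:
        return overrides[field_name], "manual override"
    for system in PRECEDENCE[relationship]:
        value = values_by_system.get(system)
        if value:
            return value, system
    return None, None


# Made-up conflicting values for one person's last name.
last_names = {"HR": "Smith", "Student": "Smyth"}

print(resolve("last_name", last_names, "student"))
print(resolve("last_name", last_names, "employee"))
print(resolve("last_name", last_names, "employee", {"last_name": "Smythe"}))
```

    The same conflicting data resolves differently as the person's primary relationship changes, which is one way to express a system of record that shifts over time.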

  4. Identification Information

    • Date of Birth
    • Gender
    • Last Name
    • First Name
    • Middle Name
    • Maiden Name
    • Title

  5. Link Information

    Information needs to be stored with the PUID for each client area (or relationship) involved with the People database. This information would include the identifier used in the operational system, a status (of the sort now provided to service providers such as PUCC and the Libraries), and additional identifying information from the operational system.

    The identifier would allow simple and certain data lookup in on-line systems by giving the terminal operator the person's system identifier. It could also be used to verify a person is in another system, and to translate identifiers from one system to the other.
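
    The translation described above can be pictured as a lookup through the PUID. In the sketch below, the link records, system names, and identifier values are invented for illustration; a real implementation would query the People database rather than a dictionary.

```python
# Hypothetical link records: PUID -> {system name: identifier in that system}.
LINKS = {
    "000000001": {"HR": "E1001", "Student": "S2001"},
    "000000002": {"HR": "E1002", "UD": "D3002"},
}


def translate(links, from_system, identifier, to_system):
    """Return (puid, identifier in to_system) for a person known in from_system.

    Returns (None, None) if the person is not found, and (puid, None) if the
    person exists but has no record in the target system.
    """
    for puid, ids in links.items():
        if ids.get(from_system) == identifier:
            return puid, ids.get(to_system)
    return None, None


print(translate(LINKS, "HR", "E1001", "Student"))   # known in both systems
print(translate(LINKS, "HR", "E1002", "Student"))   # known in HR only
print(translate(LINKS, "Student", "S9999", "HR"))   # not in the People database
```

    Because every match goes through the PUID, no operational system needs to store another system's identifier.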

    The additional identifying information should be kept simple, providing information primarily to help determine if this is the person for whom you are looking.

    One use for additional data that I have heard before, but we did not hear during the interviews, was to identify people who have significant relationships with Purdue. This needs to be considered and either accepted or rejected. If accepted, each client area making use of data from the system will need to know that "VIP" information may be present, and make decisions about what impact, if any, this information should have on services provided.

  6. Database Platform

    For reference purposes, this central database of people associated with the university will be called the People database. The database will be kept in an Oracle database. Oracle has the best combination of presentation tools and commonality with the newest systems that are being developed or installed by Management Information. There are also defined and demonstrated means of transferring data as needed from MI's mainframe data systems.

Business Benefits
  • This system will be a foundation system allowing verification of all individuals with relationships to Purdue University. It can serve as a source for a future overall security scheme for identifying University customers when they attempt to access a University resource.

  • The system could allow new staff to be verified to access University resources without waiting until they've been paid the first time. These resources include IAMO-administered computer resources, the libraries, and Access Cards.

  • This system can be an information source for individuals with "non-traditional" university relationships. These include associate and adjunct faculty, emeritus faculty, professors from other universities working here at Purdue, classes of personnel not on the payroll, campus ministers, and so on. While some of these people have records on the HR system files, not all of them do.

  • This system can track attributes of a person not otherwise tracked. This would set up this system as a system of record for University relationships that are not otherwise tracked. While the attributes tracked would necessarily be minimal, there would be a defined place to track people.

  • This system would serve as a known, documented, and convenient central database that provides a quick and reliable way to link an individual's records between databases, whether operational databases or data stores.

  • The system could serve as a source for unique identifiers for remote databases like the Libraries, or the VET school, who could then tie their data to other shared University data for the same person.

Business Solution Implementation

Phase I, needed by 5/1/98, needs to provide for the following. Some earlier decisions are needed as noted.
  • Identify server software for name matching and comparison, by 12/1/97. Obtain use of the software as soon as practical after that.

  • Import of current data for initial area(s) involved. We must decide before ??/??/97 which areas will be initially incorporated.

  • Provide numbers for Access Cards to be assigned to new Students

  • Provide data to PUCC for areas involved

  • Maintenance features to handle new people being entered into the operational systems for the areas initially incorporated. Both batch and online procedures are needed.

Future phases will be needed to incorporate the following:
  • Shared data update notification

  • Shared data update propagation

  • Real-time links with ODS databases (HR and UD?)

  • Real-time links with mainframe databases

  • Import of people from additional operational areas

  • Other interfaces, such as for a security sys

Maintained by: IAMO Team

Purdue University, West Lafayette, IN 47907, (765) 494-4600
© 2010 - 2013 Purdue University | An equal access/equal opportunity university