1 Glossary


Artificial Intelligence (AI) 

Describes a computer’s ability to automate various repetitive tasks that are traditionally done by humans. Three well-known, closely related areas of AI are machine learning, deep learning (a subset of machine learning), and neural networks. Additionally, the three main areas where AI can be integrated into education are: “learning with AI (e.g. the use of AI-powered tools in classrooms), learning about AI (its technologies and techniques) and preparing for AI (e.g. enabling all citizens to better understand the potential impact of AI on human lives)” (UNESCO, 2021).

  • Examples of AI in education include: personalized learning, optimization of learning management systems and accessible learning technologies.

Business Intelligence 

Using cleaned but not yet analyzed data stored in a variety of sources (e.g., data warehouses, data marts, data lakes), business intelligence transforms data into informational assets that strengthen an organization’s data analysis capabilities and provide key support for decision-making and strategic planning. These assets can include dashboards, spreadsheets, data visualizations, and other tools. Business intelligence leverages software and services that enterprises use to turn data into actionable insights drawn from historical, current, and predictive views of business operations (Knight, 2021). 


Chief Information Officer 

The most senior technology executive within an organization who oversees day-to-day IT operations. They are responsible for leading technological initiatives and strategies, and assisting the business side of the organization with maximizing data value.  


Communication Matrix 

A guiding document intended to formalize data management and governance (DM&G) communications within an organization. The matrix should include key messages about DM&G activities and their intended audiences. Some versions also include the timeline and modality for each communication. 


Cross-Functional Teams 

A group of people with different functional expertise working toward a common goal. Organizations benefit from such teams because members with varied expertise bring multiple perspectives to data problems, from simple to complex. 


Crosswalk 

A document describing how data elements are mapped across systems. 
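
As a minimal sketch, assuming a crosswalk is kept as a simple lookup table, the Python below renames fields from a source system to their equivalents in a target system. The field names (`stu_id`, `student_id`, and so on) and the `apply_crosswalk` helper are hypothetical; real crosswalks often also record data types and value-level mappings.

```python
# Hypothetical crosswalk: source-system field name -> target-system field name.
CROSSWALK = {
    "stu_id": "student_id",
    "fname": "first_name",
    "lname": "last_name",
    "enr_dt": "enrollment_date",
}

def apply_crosswalk(record: dict, crosswalk: dict) -> dict:
    """Rename every field using the crosswalk; fail loudly if a field has no mapping."""
    unmapped = set(record) - set(crosswalk)
    if unmapped:
        raise KeyError(f"No crosswalk entry for: {sorted(unmapped)}")
    return {crosswalk[field]: value for field, value in record.items()}

source = {"stu_id": "A123", "fname": "Ana", "lname": "Diaz", "enr_dt": "2021-08-23"}
print(apply_crosswalk(source, CROSSWALK))
```

Raising an error on unmapped fields, rather than silently dropping them, surfaces gaps in the crosswalk itself.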


Data

Because every term in this section begins with the word “data,” we have omitted it from the term names in the following definitions (e.g., “Analysis” refers to data analysis).

Analysis is the ability to convert raw data sets into forms that the organization deems insightful and useful, which can later lead to differentiated business outcomes. 

Architecture captures and defines specific artifacts to be used enterprise-wide, and relates more to the systems that affect data movement (Algim, p. 62). It has three characteristics: it focuses on outcomes, employs specific activities, and reinforces a collaborative mindset. 

Asset describes the value of data to the operation of the enterprise. 

Culture is the collective behaviors and beliefs of people who value, practice, and encourage the use of data to improve decision-making within the organization. As a result, data is woven into the operations, mindset, and identity of the organization. It draws on a team of experts in data engineering, data science, data visualization, and business intelligence (Tableau, 2022). 

Dictionary documents and defines all data fields, including each field’s purpose, its characteristics, and its owner, which organizes the data so the organization can use it more efficiently. Input should be gathered from all data users across the organization to reach consensus on how the organization defines each data item. 
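
A data dictionary entry can be pictured as a small structured record holding a field’s purpose, characteristics, and owner. The sketch below is one possible shape, not a standard format: the `DictionaryEntry` attributes, the example fields, and the owning offices are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DictionaryEntry:
    """One data dictionary entry; attribute names are illustrative."""
    field_name: str
    purpose: str
    data_type: str
    allowed_values: tuple  # an empty tuple means there is no fixed list of values
    owner: str

DATA_DICTIONARY = {
    entry.field_name: entry
    for entry in [
        DictionaryEntry("enrollment_status", "Whether a student is currently enrolled",
                        "str", ("enrolled", "withdrawn", "graduated"), "Registrar"),
        DictionaryEntry("gpa", "Cumulative grade point average",
                        "float", (), "Institutional Research"),
    ]
}

def owner_of(field_name: str) -> str:
    """Look up who owns a field, e.g., to route a data question to the right person."""
    return DATA_DICTIONARY[field_name].owner

print(owner_of("gpa"))  # Institutional Research
```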

Flow Design describes how data moves from one system to another. It can be described for two states: the Current State and the Desired State. 
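
One way to compare a Current State with a Desired State is to treat each flow as a (source, target) pair and take set differences. This Python sketch uses hypothetical system names (`SIS`, `LMS`, `data_warehouse`, and so on) purely for illustration.

```python
# Each flow is a (source_system, target_system) pair; system names are hypothetical.
current_state = {
    ("SIS", "spreadsheet"),
    ("spreadsheet", "state_report"),
    ("LMS", "SIS"),
}
desired_state = {
    ("SIS", "data_warehouse"),
    ("LMS", "data_warehouse"),
    ("data_warehouse", "state_report"),
}

def flow_changes(current: set, desired: set) -> dict:
    """Compare two flow designs and list the flows to add, retire, and keep."""
    return {
        "add": sorted(desired - current),
        "retire": sorted(current - desired),
        "keep": sorted(current & desired),
    }

changes = flow_changes(current_state, desired_state)
for action, flows in changes.items():
    for source, target in flows:
        print(f"{action}: {source} -> {target}")
```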

Governance focuses on maintaining data assets and developing policies and procedures for strategic data management, including practices for security, privacy, and storage. It involves strategic activities by senior-level staff and systematic maintenance of the organization’s data assets.

Governance Council is composed of operational and domain data stewards. The Data Governance Council’s job is to resolve data disputes and organizational pain points in order to increase the value of data and ensure its worth as an organizational asset. 

Handling Ethics applies to all members of the organization who are involved in the collection, storage, analysis, and reporting of data. They keep watch over data quality, security, privacy, and transparency. 

Management is a joint effort focused on mid-level staff in the IT and business departments. It centers on implementing the policies and procedures necessary to manage data effectively across the organization. 

Management Maturity (DMM) Model bridges the gap between business and IT to improve data management practices. The model defines the organization’s levels of data governance, reward, and risk and provides a common language that depicts progress in the primary data management disciplines. 

Modeling uses discovery to understand the scope, shape, and purpose of the data. Different data models are used in operational settings and in storage settings. 

Profiling uses statistical techniques to inspect data, assess quality, and analyze content and structure. Examples of profiling techniques include count of nulls, frequency distribution, and cross-column analysis to identify duplicate values. Data profiling is part of the discovery stage and, thus, helps organizations identify areas of data quality improvement.  
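
The three profiling techniques named above can be sketched in plain Python. The rows, column names, and helper functions are hypothetical. The cross-column check looks at a combination of columns and flags rows that share a last name and birth date (here A1 and A3) as possible duplicate records, even though their IDs differ.

```python
from collections import Counter

# Hypothetical student rows; None marks a missing value.
rows = [
    {"student_id": "A1", "last_name": "Diaz", "birth_date": "2007-03-14", "grade_level": 9},
    {"student_id": "A2", "last_name": "Le", "birth_date": None, "grade_level": 10},
    {"student_id": "A3", "last_name": "Diaz", "birth_date": "2007-03-14", "grade_level": 10},
    {"student_id": "A4", "last_name": "Nguyen", "birth_date": "2006-11-02", "grade_level": None},
]

def null_counts(rows):
    """Count missing (None) values in each column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                counts[column] += 1
    return dict(counts)

def frequency_distribution(rows, column):
    """Count how often each non-missing value appears in one column."""
    return Counter(row[column] for row in rows if row[column] is not None)

def duplicate_combinations(rows, columns):
    """Cross-column check: value combinations that appear in more than one row.
    Rows missing a value in any of the given columns are skipped."""
    combos = Counter(
        tuple(row[c] for c in columns)
        for row in rows
        if all(row[c] is not None for c in columns)
    )
    return sorted(combo for combo, n in combos.items() if n > 1)

print(null_counts(rows))  # {'birth_date': 1, 'grade_level': 1}
print(frequency_distribution(rows, "grade_level"))
print(duplicate_combinations(rows, ["last_name", "birth_date"]))  # [('Diaz', '2007-03-14')]
```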

Quality Dimensions are used to measure the quality of data by establishing certain metrics. Dimensions can be objective or subjective. An organization can choose which data quality dimensions are most important based on the type of data they use. 

Six core dimensions of data quality are: 

  • Completeness
  • Uniqueness
  • Timeliness
  • Validity
  • Accuracy
  • Consistency  
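
Several of these dimensions can be expressed as simple ratios. The sketch below assumes one common way to define completeness, uniqueness, and validity as proportions between 0 and 1; the sample values and the deliberately loose email rule are hypothetical. Accuracy and consistency are not shown because they require a trusted reference source or a second system to compare against.

```python
import re

# Hypothetical values from one column; None marks a missing value.
emails = ["a@example.edu", None, "b@example.edu", "a@example.edu", "not-an-email"]

# Deliberately loose validation rule: something@something.something
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def completeness(values):
    """Share of values that are present (not None)."""
    return sum(v is not None for v in values) / len(values)

def uniqueness(values):
    """Share of present values that are distinct."""
    present = [v for v in values if v is not None]
    return len(set(present)) / len(present)

def validity(values, pattern):
    """Share of present values that fully match the validation rule."""
    present = [v for v in values if v is not None]
    return sum(bool(pattern.fullmatch(v)) for v in present) / len(present)

print(f"completeness: {completeness(emails):.2f}")               # 0.80
print(f"uniqueness:   {uniqueness(emails):.2f}")                 # 0.75
print(f"validity:     {validity(emails, EMAIL_PATTERN):.2f}")    # 0.75
```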

Quality is a multi-step, iterative process that considers characteristics, implementation, and management techniques to improve how data is consumed. Data is considered high quality if it meets the expectations and intended usage of the consumer. Checking data quality is an ongoing and long-term process. 

Scientist is an individual within an organization who applies an array of statistical analysis techniques across many data sets. Data scientists use their programming and analytical skills to transform data into forms that can be applied in many ways within the organization. 

Security Policy is a statement outlining the processes and actions that will best protect the organization’s data assets. Security policies should be evaluated regularly so necessary updates can be implemented. Users must be trained on security policies so they are able to understand and follow them. These policies are supported by data security standards that provide more detailed guidance for users. 

Security requirements are the needs that an organization must meet in relation to its data security practices. Some requirements are internal, based on processes and/or directed by stakeholders, whereas others are external, based on regulations or laws (such as the Family Educational Rights and Privacy Act).  

Security is controlled access to, and appropriate handling of, data in matters of privacy and confidentiality. Data security is most successful when it is an organization-wide, collaborative effort. It is relevant to many aspects of the organization, such as physical facilities, devices, and networks. 

Standards are official documents or agreements that provide guidelines for how data should be collected, recorded, and stored in a consistent format. Data standards enhance the interoperability of data across multiple systems and ensure that data are correctly represented and interpreted. Transparency and ethical standards are two important aspects of data standards that are commonly found in practice. 

Storage describes how data are retained and archived for immediate or future use. 

Warehouse is the architectural technology designed and implemented to integrate data from a range of sources into an easy-to-access location, so that data can be used for decision support and leveraged to enhance organizational value. Data warehouses are composed of multiple databases containing row-level data and can execute queries and perform analysis. Data warehouses often house large amounts of historical data, centralizing and consolidating these data from multiple sources (Henderson & Earley, 2017). All data stored in the data warehouse are cleaned and assessed for quality in order to provide a single, reliable source of truth to support business operations. As a data management system, a data warehouse supports the organization’s operational functions and enables business intelligence activities that generate insights into the organization and its data consumers (Knight, 2021).


Database Administrator 

An organizational leader responsible for ensuring that systems are functional, operate as intended, and perform well. Responsibilities can include administering data systems, facilities, storage, networks, and other components related to the data systems. 


Domain Data Steward 

A person who has a deep understanding of the data source, the specific collection details (e.g., each question asked and all possible answers), and the quality and security of the data in that domain. 


Gap-Risk Assessment 

Identifies the differences between best practices and current practices, along with potential risks, by recognizing gaps and actively looking for ways to improve data management practices.


Governance Activity Matrix 

A data governance tool consisting of a two-dimensional matrix that cross-references an organization’s data processes with data governance roles and responsibilities; it can be customized to the organization’s data governance program. The matrix cross-references business units and their responsibilities with the corresponding data activities, such as data migration tasks, data quality tasks, and master data tasks. The objective of the tool is to convey which data-related activities need to be completed, who needs to complete them, and the estimated effort required. The matrix enables an organization to identify areas of impact when changes to data occur and how those changes will be reflected across the organization (Seiner, 2014). 
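
A governance activity matrix maps naturally onto a nested lookup table. In this hypothetical sketch, rows are data activities, columns are roles, and each cell records a responsibility and an estimated effort in hours; the role names, responsibility labels, and numbers are illustrative only.

```python
# Rows: data activities. Columns: roles. Cell: (responsibility, estimated_hours).
# All names and numbers below are hypothetical.
MATRIX = {
    "data migration": {"Registrar": ("approve", 4), "IT": ("execute", 40)},
    "data quality checks": {"Institutional Research": ("execute", 12), "Registrar": ("review", 3)},
    "master data updates": {"IT": ("execute", 8), "Institutional Research": ("review", 2)},
}

def roles_affected(activity: str) -> list:
    """Roles that must be engaged when this activity, or the data it touches, changes."""
    return sorted(MATRIX[activity])

def workload_by_role() -> dict:
    """Total estimated hours per role across all activities."""
    totals = {}
    for cells in MATRIX.values():
        for role, (_responsibility, hours) in cells.items():
            totals[role] = totals.get(role, 0) + hours
    return totals

print(roles_affected("data migration"))  # ['IT', 'Registrar']
print(workload_by_role())
```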


Interoperability 

The ability of data to flow to another system. 


Key Performance Indicators (KPIs) 

Metrics that can be influenced, such as sales goals or customer satisfaction.  


Machine Learning (ML) 

A branch of Artificial Intelligence (AI) and computer science that focuses on how computers use advanced software and complex algorithms to learn, adapt, and improve at tasks (e.g., prediction, classification) based on existing data, without requiring constant updates to their code. Arthur Samuel first defined Machine Learning as “the field of study that gives computers the ability to learn without being explicitly programmed” (1959).

In 1997, Tom Mitchell, a computer science professor at Carnegie Mellon University, provided a more modern definition of ML: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E” (Mitchell, 1997).

There are three main types of Machine Learning:

  1. Supervised learning
  2. Unsupervised learning
  3. Reinforcement learning

Measurements 

Data that describe a quantity with a number, such as temperature, length, weight, or cost. 


Metrics 

Measurements used for a comparative purpose, such as average temperature or average cost.  
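
The difference between measurements and metrics can be shown with the average-temperature example above. The sites and readings below are hypothetical.

```python
from statistics import mean

# Measurements: individual recorded values (hypothetical daily temperatures, in °F).
measurements = {
    "Site A": [68.0, 71.5, 70.0],
    "Site B": [74.0, 73.0, 75.5],
}

# Metric: average temperature per site, which puts the sites on a comparable footing.
average_temperature = {site: round(mean(values), 1) for site, values in measurements.items()}

warmest = max(average_temperature, key=average_temperature.get)
print(average_temperature)  # {'Site A': 69.8, 'Site B': 74.2}
print(warmest)              # Site B
```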


Onboarding 

The process used throughout an organization to encourage employee participation in newly implemented data management and governance programs. It helps employees quickly and smoothly adjust to new performance requirements while developing new attitudes, knowledge, skills, and behaviors required to effectively fulfill new data management and governance policies and procedures.


Ongoing Communications 

A process which includes maintaining records of any type of communication (e.g. newsletter, email, or meeting notes) used to track changes in the data governance process or changes in tools used for data governance.


Operational system 

A system that holds data actively being used to make data-driven decisions. Data is stored temporarily in operational systems until it is transitioned to a storage system.


Orientation Communication 

The communications used to introduce new employees to their jobs, workplace, co-workers, and responsibilities. It focuses on the roles a new employee will play in the organization and requires individuals to acknowledge data security, privacy, and compliance policies and procedures.


Readme File 

A text file that documents context about your data files or folders, which you can refer back to and use to track work. Readme files are also helpful when working collaboratively with others because they share context and encourage explanation of nuances in the data for easier interpretation (University of Iowa, n.d.). 


Record Retention Policy 

Provides guidance on how long certain data items should be stored, when they should be disposed of, and how they should be disposed of. The policy may also cover exceptions related to legal compliance (Vital Records Control, 2020). 
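
A retention schedule can be checked programmatically. This sketch assumes retention periods counted in whole years from a record’s creation date and treats a legal hold as an exception that blocks disposal; the record types, periods, and the `disposal_due` helper are hypothetical and are not drawn from any actual policy.

```python
from datetime import date

# Hypothetical retention schedule: record type -> years to keep after creation.
RETENTION_YEARS = {"attendance": 3, "application": 5, "transcript": 100}

def disposal_due(record_type: str, created: date, today: date, legal_hold: bool = False) -> bool:
    """Return True if a record has passed its retention period and may be disposed of."""
    if legal_hold:  # legal-compliance exception: records under hold are never disposed of
        return False
    years = RETENTION_YEARS[record_type]
    try:
        expires = created.replace(year=created.year + years)
    except ValueError:  # created on Feb 29 and the expiry year is not a leap year
        expires = created.replace(year=created.year + years, day=28)
    return today >= expires

print(disposal_due("attendance", date(2018, 9, 1), date(2021, 9, 1)))   # True
print(disposal_due("application", date(2018, 9, 1), date(2021, 9, 1)))  # False
```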


Reference and Master Data 

Shared descriptions and definitions of data that rarely change and are used throughout the organization.


Risk 

Organizations can classify risks by assessing their potential impact and general likelihood. This helps organizations prioritize data security efforts. The term risk can refer to:

  • Potential data threats: actions by users or external actors, either with malicious intent or due to lack of awareness, that impact security.
  • Conditions within the security system that permit these actions, such as vulnerabilities: security weak points through which ill-intentioned actors can gain access to the system.
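
Classifying risks by impact and likelihood is often done with a simple score. The sketch below assumes 1 to 5 scales and scores each risk as impact multiplied by likelihood, which is one common convention rather than a prescribed method; the listed risks are hypothetical examples of a threat (phishing) and of vulnerabilities.

```python
# Hypothetical risks rated on 1-5 scales for impact and likelihood.
risks = [
    {"name": "phishing of staff credentials", "impact": 4, "likelihood": 4},
    {"name": "unpatched database server", "impact": 5, "likelihood": 2},
    {"name": "lost unencrypted laptop", "impact": 3, "likelihood": 2},
]

def risk_score(risk: dict) -> int:
    """Simple scoring rule: impact multiplied by likelihood."""
    return risk["impact"] * risk["likelihood"]

def prioritize(risks: list) -> list:
    """Order risks from highest to lowest score to guide security efforts."""
    return sorted(risks, key=risk_score, reverse=True)

for risk in prioritize(risks):
    print(risk_score(risk), risk["name"])
```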

Roll-up Process 

Establishes the rules which guide the proper aggregation of data prior to reporting. An example of this is reporting students as Hispanic or Latinx.
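
A roll-up rule can be written as a function that maps detailed responses to a single reporting category before counts are aggregated. The sketch below assumes a rule modeled on a common U.S. federal reporting convention (Hispanic or Latinx ethnicity takes precedence; otherwise multiple races roll up to “Two or more races”). An organization’s actual rules may differ, and the “Unknown” category and sample responses are hypothetical.

```python
from collections import Counter

def rollup_race_ethnicity(hispanic_or_latinx: bool, races: set) -> str:
    """Assign one reporting category from detailed responses (assumed rule)."""
    if hispanic_or_latinx:      # ethnicity takes precedence over race selections
        return "Hispanic or Latinx"
    if len(races) > 1:          # multiple races roll up to a single category
        return "Two or more races"
    if len(races) == 1:
        return next(iter(races))
    return "Unknown"            # hypothetical placeholder for missing responses

# Hypothetical detailed responses: (hispanic_or_latinx, set of races selected).
students = [
    (True, {"White"}),
    (False, {"Asian"}),
    (False, {"Black or African American", "White"}),
    (True, set()),
]
report = Counter(rollup_race_ethnicity(h, r) for h, r in students)
print(report)
```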


Security Processes

Activities that support an organization’s data policies and procedures, which are used to protect data. These processes address issues related to:

  • Access: Determines who is authorized to view and use which data. Restricts and allows access depending on level of data sensitivity and individual permissions. 
  • Audit: Used to determine whether data is being used in accordance with organization policies and national, state, and local regulations. Managed via activity logs and documentation.
  • Authentication: Allows data users to verify their identity before interacting with or using data. This process is controlled by verifying user credentials (usually through a login process) prior to accessing data.  
  • Authorization and Entitlement: Ensures users have access to the data needed for their roles, and defines the full scope of their access rights, which are usually set according to permissions.
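
These processes can be sketched together in a few functions. This is an illustration under stated assumptions, not a production design: the users, roles, passphrases, and sensitivity levels are hypothetical, passwords are hashed with Python’s standard `hashlib.pbkdf2_hmac`, and every access decision is appended to an in-memory audit log.

```python
import hashlib
import hmac
import os
from datetime import datetime, timezone

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256 so plain passwords are never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def _make_user(password: str, role: str) -> tuple:
    salt = os.urandom(16)
    return (salt, hash_password(password, salt), role)

# Hypothetical user directory: username -> (salt, password hash, role).
USERS = {
    "mruiz": _make_user("example-passphrase-1", "registrar"),
    "jlee": _make_user("example-passphrase-2", "teacher"),
}
# Hypothetical permissions: role -> data sensitivity levels the role may read.
ROLE_PERMISSIONS = {
    "teacher": {"public", "internal"},
    "registrar": {"public", "internal", "confidential"},
}
audit_log = []  # append-only record of access decisions, reviewed during audits

def authenticate(username: str, password: str) -> bool:
    """Authentication: verify the user's credentials before any data access."""
    record = USERS.get(username)
    if record is None:
        return False
    salt, stored_hash, _role = record
    return hmac.compare_digest(hash_password(password, salt), stored_hash)

def can_read(username: str, sensitivity: str) -> bool:
    """Access and authorization: allow reads only at levels the user's role permits."""
    record = USERS.get(username)
    role = record[2] if record else None
    allowed = sensitivity in ROLE_PERMISSIONS.get(role, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": username,
        "sensitivity": sensitivity,
        "allowed": allowed,
    })
    return allowed

if authenticate("jlee", "example-passphrase-2"):
    print(can_read("jlee", "confidential"))  # False
```

Logging denied requests as well as granted ones gives auditors a complete picture of attempted access.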

Simple Virtuous Cycle

The Simple Virtuous Cycle (SVC) can be implemented without prior experience in a specific domain. It minimizes the complexity of establishing data value by:

1. Measuring the situation

2. Identifying improvement to inform action steps

3. Implementing optimal action steps. 

License

Data Management and Governance Glossary Copyright © 2021 by Aishah Shubily; Emily Hires; Gisselle Diaz; Guillermo Lopez; Kenny Le; Kevin K. Nguyen; Lavanya Jawaharlal; and Maureen Ruiz-Sundstrom is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.
