
Taking Responsibility for Data Quality through Data Governance

David Plotkin, Finance Data Quality Manager

Bank of America

Data Quality 2012 Asia Pacific Congress


Last revised: 01/21/2012

Agenda
Introduction
Understanding Data Governance and its impact (and value add) to the Enterprise
Data Governance and Data Stewardship
How to implement Data Governance:
    What does the organization look like
    Figuring out what you've got
    Adding DG to the Project Methodology
    The tools you'll need
Setting up a Communications Plan
Measuring Success

Data Challenges: Data Needs to be Managed


Collecting Data Definitions for isolated databases is not enough:
Definitions written in haste by project staff
Not rationalized across the Enterprise
Documentation gets lost

Formal Enterprise-Wide Data Governance


Treat data as an asset: inventory it, assign an owner, publish the inventory, owner, and glossary
Ownership at a granular level of detail
Consistent names & definitions across all apps and databases
Data Governance involved in all aspects of Data Quality
Include data governance processes in software lifecycles

Understanding Data Governance


Data Governance is the execution of authority over data management:
It's all about data ownership at the organizational level (Data Governance board) and decision making at the data element level (data stewardship).

The exercise and enforcement of decision-making authority over the management of data assets and the performance of data functions.
(Robert Seiner, TDAN and KIK Consulting)

Ensuring that the enterprise's data assets are formally managed. Coordinating communication to achieve collective goals through collaboration.
(Steven Adler, IBM)

What is Data Governance (Practical)?


Represents the Enterprise in all things data and metadata
Metadata: Mandates capture of this information
Data Quality: Issues, fixes, rules, and projects
Data-related policies and procedures
Champions data quality improvement projects
Instigates methodology changes to ensure capture of data and metadata

Owns the data and metadata
Driven by relatively high-ranking individuals who can make decisions for the Enterprise.

Data Governance Value


Data Governance must tie back to the universal value drivers:
Increase revenue and value
Manage cost and complexity
Ensure survival through attention to risk, compliance, security, and privacy
(Gwen Thomas)

And does it? Think about:


How much time is wasted arguing over ill-defined or undefined data elements.
How many bad decisions are made due to undefined elements and poor quality.
How lack of trust in data drives analysts to do strange things.

Enterprise Data Governance in a Nutshell


Data Governance ensures that data is treated as a valuable asset and that it is well defined, accurate, consistent, and meets business needs. Data Governance provides project support along with an evolving set of policies, procedures, and guidelines to achieve these goals.
Everyone:
Inventory shared data, requirements, and issues
Project Data Stewards work on projects to collect these
The DG SharePoint site facilitates the work

Data Governance Team:
Publish policies, processes, and organization
Coordinate committees
Publish definitions, valid values, and rules in the Business Glossary
Work with project teams to align deliverables to definitions
Publish data quality issues and resolution decisions

Data Stewardship Committee:
Ownership is by Business Function
A Business Data Steward represents each Business Function
Escalation path is to the Data Governance Council

Data Stewards:
Define Data Elements, Valid Values & Derivation Rules
Perform data quality analysis
Work with SMEs and Technical Data Stewards
Choose DQ remediation plan

Identify Data Elements and Issues

Assign Owner

Communicate Process, Decisions, Results

Define, Assess, Make Decisions

The DGI Data Governance Framework

Enables overcoming challenges and achieving common goals


[Diagram: data-related inhibitors undermine the enterprise's goals. Goals: Revenue Generation; Cost Reduction/Avoidance; Compliance & Risk; Strategy & Business Agility. Inhibitors: ad-hoc data quality methods; data errors; high infrastructure cost; high potential remediation costs; no single view of the customer; can't easily identify high-value customers, key relationships & hierarchies, or cross-sell/up-sell opportunities; can't easily customize product offerings and bundles; non-compliance with state & federal regulations; tarnished brand reputation; higher than necessary probability of data misuse; lack of data retention policies; weak data security monitoring; exposure of Personally Identifiable Information in non-production; can't easily consolidate data from silos or integrate new systems quickly (M&A); difficult to meet demands of new business channels.]

Courtesy of Steven Adler, IBM

Data Governance and Data Stewardship


A data stewardship program is a key part of an overall data governance program. It is the operational aspect of data governance. If (as we stated earlier) data governance is the execution of authority over the management of data, then data stewardship is formalized accountability for the management of that data.
(courtesy of Robert Seiner, KIK Consulting)

This is where the day-to-day work gets done.

What do we mean by a Data Steward?


A key representative in a specific business area that is accountable for the quality and use of that data throughout the organization. The data stewards are the owners of the data and the decision-makers about the data.
(Sherry Michaels, Erie Insurance)

Data stewards are the ones who can reach into the organization and pull out the knowledge (and knowledgeable people) that are needed.
Data Stewardship is NOT a job; it is the formalizing of data responsibilities that are likely already in place in an informal way.
Data Stewardship involves specific tasks for which the stewards must be trained.

Data Stewardship: Needed for Data Quality


A data quality initiative introduces new constraints on the ways that individuals create, access, use, modify, and retire data. To ensure that these constraints are not violated, the data governance and data quality staff must introduce stewardship. That means:
Data Quality policies: introduced and monitored
Enough metadata to support the data quality processes
Incorporation of data quality into system design by the developers
Data Quality requirements must support enterprise usage of the data (not just what is needed for the source system)
Identifying important business impacts of poor quality

Data Stewards are Accountable


Data Stewardship establishes accountability for:
Data definitions and derivations
Data quality rules and their enforcement
Key role in improving data quality
Data-related communications
Data element rationalization
Contributing to data-related policies and procedures
Understanding the downstream uses of their data and how proposed changes impact those uses

Data Stewards have authority


Their decisions are enforceable
They oversee all data-related work in their business function
They represent their business function as the single point of contact

What Happens Without Data Governance?


Different parts of the organization:
Use their own definitions for data, so they may enter different values. Leads to bad decisions, numbers that don't match, etc.
Derive their numbers based on different calculations, and the numbers don't match.
Make different determinations of the data quality, leading to different degrees of confidence in the numbers (or even a decision not to use certain data).
Have long arguments about meaning and quality.

Master Data Management (MDM) is impossible! Improving Data Quality is very hard except in limited silos.

The organization without Data Governance

Data Quality without Data Governance


Data quality deteriorates over time. It is hard to correct because:
Data producers are incented to be fast, but not necessarily accurate. Stewards must champion changing the business priorities.
Data quality rules are not defined. Stewards can define the rules and required quality levels.
Individuals make their own corrections. Stewardship exposes this and the costs of these processes.
Poor quality data is not detected proactively. Stewards can demand (and demand funding for) enforcement of DQ rules during system loads.

Data Governance Organization


Business and IT view of the Data Governance Organization:

[Organization chart]
Business side: Data Governance Business Sponsor (PT), Data Owners (PT), Business Data Stewards (PT).
IT side: Data Governance IT Sponsor (PT), Chief Data Steward (FT), Enterprise Application Owner / Delivery Manager (PT), Application Domain Owner / Business Partners (PT), Enterprise Data Steward (FT), Project Data Stewards (FT), Data Domain Stewards (FT), Technical Data Stewards (PT).
Legend: Data Governance Committee; Data Stewardship Council; PT = Part Time, FT = Full Time.
Creates working groups as needed.

The Stewardship Organization

Data Stewardship Council: the Enterprise Data Steward plus representatives of each Business Function: Sales, Membership, Products, Insurance Services, Claims, HR, Underwriting, Operations, Call Center, Marketing, Financial Modeling, IT, Financial Transactions, Travel, and Actuarial.

Data Stewardship Committee


Functional body for the data governance program:
Apply data standards, policies, and principles.
Participate in and contribute to data governance processes.
Evaluate effectiveness of processes.
Approve and manage data-related information.
Contribute to and ensure completeness of data-related documentation (metadata).
Make decisions on ownership of data.
Communicate the data governance vision & objectives to the business function and data analyst community.
Shape data governance design and implementation; ensure alignment to the business.
Communicate decisions of the committee.

Why Add Data Governance to Project Methodology?


DG tasks benefit from scope limitations of a project.
Limited block of data
Limited number of source systems

Management of tasks and deliverables benefits from professionals (Project Managers).


PMs will bird-dog the deliverables and ensure they get done (that's the theory, anyway).
PMs will schedule the tasks and allocate the resources.

Projects have the business's attention.


Subject matter experts are assigned. Time is allocated to work on the project tasks.

What needs to be added to Project Methodology?


Integration with Project Management
Metadata components (definitions, derivations, data quality rules)
Data Quality components
Solution Evaluation components
QA components (including Data Quality Assurance)

Data Governance Value to a Project


Collection of data definitions
Building a body of stewarded and understood data definitions benefits all those in the enterprise who use the data, and alleviates confusion when discussing the data. This also helps with conversions.

Collection of data derivations


Building a body of stewarded and validated data derivations leads to a common way of calculating numbers. The result is not only that the project delivers results that match the official calculation method, but also that much less time is spent by data analysts across the company attempting to reconcile reports.

Identification and resolution of data quality issues


Poor data quality can keep a project from going into production. The risk to a project is lessened by early identification (and where possible, resolution) of data quality issues. Data profiling measures specifics of the data, and provides a comparison between what the data looks like and what the data quality rules say it should look like.

Adjust Project Methodology: Data Quality


Collect (during Analysis and Design):
Data Quality issues and rules for measuring quality (meet guidelines)
Data Quality rules: When the data goes bad, how do you know?
Information to verify the issues and quantify severity

Project resources
Guided by the Project Data Steward, collected from business analysts/SMEs
Documented in the Mapping document or DQ rule dictionary

Measure and validate rules against data using Data Profiling.


Quantifies the extent of the data quality problem.
Rules may need to be restated if fit to data is poor.
Data is examined and results reported back to the business.
Determination must be made as to fitness for use.

Metrics:
Total DQ rules stated and validated
Fit of data to stated rules
Change in quality of data over time
(A minimal sketch of checking rules and computing fit appears below.)
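As a minimal sketch of what "fit of data to stated rules" can look like in practice, the snippet below expresses a few rules as pandas checks and reports the pass rate for each. The column names, valid values, and range are illustrative assumptions, not the project's actual rules.

```python
# Minimal sketch: expressing a few data quality rules as checks over a
# pandas DataFrame and computing "fit of data to stated rules" metrics.
# Column names (policy_id, policy_type, premium) are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "policy_id":   ["P001", "P002", None, "P004"],
    "policy_type": ["AUTO", "HOME", "AUTO", "BOAT"],
    "premium":     [1200.0, -50.0, 800.0, 950.0],
})

# Each rule returns a boolean Series: True where the row passes.
rules = {
    "policy_id is mandatory":      df["policy_id"].notna(),
    "policy_type in valid values": df["policy_type"].isin(["AUTO", "HOME", "LIFE"]),
    "premium in valid range":      df["premium"].between(0, 100_000),
}

# Metric: fit of data to each stated rule (percent of rows passing).
for name, passed in rules.items():
    print(f"{name}: {passed.mean():.0%} of rows pass")
```

Pass rates captured this way at each profiling run also feed the "change in quality of data over time" metric.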

Adjust Project Methodology: QA


QA test cases written using Data Quality rules
Test cases run as part of the regular QA process
Data defects tracked in the QA system and prioritized and worked just like any other defects
Some business rules and relationships may show up as data defects (policies without drivers)

QA test cases written using metadata (definitions)


Do screens show the data expected based on definitions?
Do valid value sets show the values expected based on definitions and stated value sets?
Do screens show multiple fields that are actually the same thing (due to acronyms)?
Has the metadata been entered into the EMR and glossary?
(A sketch of DQ rules restated as automated test cases follows below.)
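One hedged illustration of turning DQ rules into QA test cases: the pytest-style checks below assume hypothetical extract files (policy_extract.csv, driver_extract.csv) and column names; a real test suite would use the project's own rules and data sets.

```python
# Illustrative only: DQ rules restated as automated QA test cases
# (pytest style). File names and column names are hypothetical.
import pandas as pd

def load_extract():
    return pd.read_csv("policy_extract.csv")  # assumed QA test data set

def test_policy_id_is_mandatory():
    df = load_extract()
    assert df["policy_id"].notna().all(), "policy_id must be populated"

def test_policy_type_uses_stated_valid_values():
    df = load_extract()
    assert df["policy_type"].isin(["AUTO", "HOME", "LIFE"]).all()

def test_every_policy_has_a_driver():
    # Relationship rule of the "policies without drivers" kind.
    policies = load_extract()
    drivers = pd.read_csv("driver_extract.csv")  # assumed related extract
    assert policies["policy_id"].isin(drivers["policy_id"]).all()
```

Failures surface in the QA system like any other defect and can be prioritized and worked the same way.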

Data Governance and Data Quality


A primary deliverable for Data Governance is improved data quality.
This should go beyond just responding to DQ issues (reactive) and include defining, finding, and fixing DQ issues before the customer does (proactive).
Should include Data Quality Analysis and Reconciliation.
Needs to be driven by the Business Impacts of poor quality: some data may be bad, but if it doesn't stop important business processes, MOVE ON.

The Data Quality Improvement Cycle


The cycle loops continuously through Analyze and Act phases:
(1) Identify and measure how poor data quality impedes business objectives
(2) Define business-related data quality rules & performance targets
(3) Design quality improvement processes that remediate process flaws
(4) Implement quality improvement methods and processes
(5) Monitor data quality against targets, then return to (1)
(A sketch of step (5), comparing measured quality against targets, follows below.)
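A small sketch of step (5), monitoring measured quality against the targets set in step (2). The rule names, measurements, and target thresholds are illustrative assumptions.

```python
# Sketch of step (5): compare measured rule pass rates against the
# performance targets set in step (2) and flag rules that fall short.
measured = {"policy_id mandatory": 0.997, "premium in range": 0.942}
targets  = {"policy_id mandatory": 0.999, "premium in range": 0.980}

for rule, target in targets.items():
    actual = measured.get(rule, 0.0)
    status = "OK" if actual >= target else "BELOW TARGET - investigate"
    print(f"{rule}: measured {actual:.1%}, target {target:.1%} -> {status}")
```

Rules that fall below target feed back into step (1) as newly identified business impacts.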

Business Results Metrics Example


Cost of poor quality data to your business:
Calling/Mailing costs: How many times did we contact someone who already had a particular type of policy or who was not eligible for that type of policy? How much postage/time was wasted?
Loss of productivity/opportunity cost: How many policies could have been sold if agents had only contacted eligible policyholders? How much would those policies have been worth?
Loss of business cost: How many policyholders canceled their policies because we didn't understand their needs or didn't appear to value their business (a survey can give you an idea)? What is the lost lifetime value of those customers?
Compliance cost: How much did we spend responding to regulatory or audit requests (demands!)? How much of that was attributable to poor data quality or information not available?
(A worked example of this arithmetic follows below.)
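The arithmetic behind such an estimate is straightforward; the worked example below uses entirely made-up volumes and unit costs, purely to show how the pieces combine.

```python
# Worked example with assumed numbers, illustrating the arithmetic behind
# a "cost of poor quality" estimate for the mailing scenario above.
ineligible_contacts = 40_000   # contacts sent to ineligible policyholders (assumed)
cost_per_contact    = 1.25     # postage + handling per piece (assumed)
lost_sales          = 300      # policies not sold due to mistargeting (assumed)
value_per_policy    = 900.0    # average lifetime value per policy (assumed)

mailing_waste    = ineligible_contacts * cost_per_contact   # 50,000
opportunity_cost = lost_sales * value_per_policy            # 270,000

print(f"Wasted mailing cost:   ${mailing_waste:,.0f}")
print(f"Lost opportunity cost: ${opportunity_cost:,.0f}")
print(f"Cost of poor data quality (these two items alone): "
      f"${mailing_waste + opportunity_cost:,.0f}")
```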

Steps to Data Quality Analysis and Reconciliation


Data Profiling
Data Profiling Results Review
Reviewing the data quality analysis with Data Stewards to determine acceptable ranges of data quality, associated risk, transformation guidelines, and recommendations on data cleansing.

Data Cleansing
The development of required ETL processing to cleanse the data. You only want to do this once, after the process has been fixed. Or that's the theory, anyway.
(A minimal cleansing sketch follows below.)
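A minimal sketch of such a one-time cleansing pass, assuming hypothetical column names and standardization rules; real ETL cleansing logic would come from the stewards' transformation guidelines.

```python
# Minimal sketch of a one-time cleansing pass of the kind described above,
# applied after the upstream process has been fixed. Column names and the
# standardization map are hypothetical.
import pandas as pd

def cleanse(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Trim stray whitespace and normalize case on a code field.
    out["policy_type"] = out["policy_type"].str.strip().str.upper()
    # Standardize legacy codes to the agreed valid values.
    out["policy_type"] = out["policy_type"].replace({"AUTOMOBILE": "AUTO"})
    # Null out values that violate the agreed range rule, so they can be
    # reported back to the Data Stewards rather than silently loaded.
    out.loc[~out["premium"].between(0, 100_000), "premium"] = pd.NA
    return out
```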

Collecting the Data Quality Rules


Get the rules from the Data Stewards. Create a template to collect the quality rules:
Mandatory/optional, valid values, valid range, data type, patterns
Relationships between data elements
Relationships between records in different tables

Guided conversations with stewards to gather the rules: helping the business help us define what we mean by good quality for a data element. It can help to pre-profile the data (do a sample extract) to show the stewards what is actually present now.
(One way to structure such a template is sketched below.)
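One possible shape for that rule-collection template, sketched as a small data structure whose fields mirror the bullet points above; the field names and the example rule are illustrative assumptions.

```python
# A sketch of a rule-collection template: every rule captured from the
# stewards gets the same fields. Names are illustrative.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DataQualityRule:
    element_name: str                    # data element the rule applies to
    mandatory: bool = False              # mandatory vs. optional
    data_type: Optional[str] = None      # e.g. "date", "decimal(10,2)"
    valid_values: list[str] = field(default_factory=list)
    valid_range: Optional[tuple] = None  # (min, max) for numeric/date fields
    pattern: Optional[str] = None        # regex for formatted fields
    relationship: Optional[str] = None   # cross-element or cross-table rule, in words

# Example entry captured during a steward conversation (hypothetical):
premium_rule = DataQualityRule(
    element_name="premium",
    mandatory=True,
    data_type="decimal(10,2)",
    valid_range=(0, 100_000),
    relationship="premium must be 0 when policy_status = 'QUOTED'",
)
```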

What is Data Profiling?


Data Profiling is a process whereby one examines the data available in an existing database and collects statistics and information about that data.
Wikipedia, http://en.wikipedia.org/wiki/data_profiling

Data Profiling is the use of analytical techniques to discover the structure, content, and quality of data.
Danette McGilvray, Granite Falls Consulting, Inc.

Data Profiling is a set of algorithms for statistically analyzing and assessing the quality of data values within a data set, as well as exploring relationships that exist between data elements or across data sets.
David Loshin, Knowledge Integrity, Inc.

What is Data Profiling (continued)?


Uses both real data and metadata to determine the quality of data.
Identified source data requires both a detailed analysis of the raw data values currently stored in existing databases and files, and a review of the existing metadata, to determine the actual meaning, descriptions, and relationships that should be found in the data.
Data profiling should be used whenever data is being converted, migrated, warehoused, or mined.
Can help discover business rules embedded within data sets, which can be used for ongoing inspection and monitoring.

General Benefits from Data Profiling


Identify or validate availability of information.
Improve predictability of project timelines.
Lower the risk of design changes late in the project.
Data integration and migration testing support.
Support compliance and audit requirements.
Rapid assessment of which fields are consistently populated against model expectations.
Focus data quality efforts where they are really needed.
Improve visibility into the quality of data that supports business decision making.
Compare source, target, and transitional data stores.
Identify transformation rules for migration and integration.
Danette McGilvray, Granite Falls Consulting, Inc.

Benefits: Saves the Programmers' Time and Effort


Programmers already examine the data to make sure their work doesn't lead to code/load/explode.
If they believe what they are told about the data contents, it invariably leads to code failures. They end up reviewing the results with the project team to decide whether to code around the bad data or fix it. Profiling puts a rigorous process in place to prevent the need for this effort.
Real example: 24 defects, $556,000 in development time, $142,000 in QA time, and a 6-month delivery delay because of unexpected data in the feed.

Scope of the Data Profiling Process


Not just done on raw data elements:
Includes counts and aggregations
Other derived values

Can be run on:


Individual columns
Across columns in a table
Across tables
Across applications and databases
(A rough profiling sketch at each of these levels follows below.)
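A rough sketch of profiling at each of those levels, assuming hypothetical pandas tables (policies, claims) and columns; a dedicated profiling tool does the same kinds of checks at scale.

```python
# Rough sketch of the three levels of profiling listed above, using pandas.
# Table and column names are hypothetical.
import pandas as pd

policies = pd.read_csv("policies.csv")
claims   = pd.read_csv("claims.csv")

# 1. Individual columns: nulls, distinct values, min/max.
for col in policies.columns:
    s = policies[col]
    print(col, "nulls:", s.isna().sum(), "distinct:", s.nunique(),
          "min:", s.min(), "max:", s.max())

# 2. Across columns in a table: a dependency between two columns.
bad = policies[(policies["status"] == "CANCELLED") & policies["cancel_date"].isna()]
print("cancelled policies missing cancel_date:", len(bad))

# 3. Across tables: orphaned claims with no matching policy.
orphans = ~claims["policy_id"].isin(policies["policy_id"])
print("claims with no matching policy:", orphans.sum())
```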

Using Data Profiling for DQ Assessment


1. Extract the data to be profiled.
2. Analysts profile the data using a profiling tool and review the results.
3. Potential anomalies are noted within the tool's repository. Record: the data element in question, the potential issue, and why it might be an issue.
4. Reports are generated from the profiling tool and reviewed by business subject matter experts.
5. Issues are reviewed and evaluated, e.g., Red: definitely an issue; Green: not an issue; Yellow: requires additional review; Gray: out of scope.
6. Results are reviewed for next steps.
(A sketch of such an anomaly log, with triage colors, follows below.)
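A sketch of what that anomaly log might look like, with each entry recording the element, the observation, why it might matter, and the triage color assigned by the business SMEs; the entries themselves are hypothetical.

```python
# Sketch of the anomaly log produced in steps 3-5. Entries are hypothetical.
anomalies = [
    {
        "data_element": "birth_date",
        "potential_issue": "4.2% of values fall on January 1",
        "why_it_matters": "suggests a default value used when the real date is unknown",
        "triage": "Red",      # definitely an issue
    },
    {
        "data_element": "policy_type",
        "potential_issue": "code 'BOAT' not in the stated valid value set",
        "why_it_matters": "may be a legitimate product missing from the glossary",
        "triage": "Yellow",   # requires additional review
    },
]

for a in anomalies:
    print(f"[{a['triage']}] {a['data_element']}: {a['potential_issue']}")
```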

Data Profiling is also a process


1. Define Data Quality Rules
2. Profile the data using a Data Profiling tool
3. Review Data Findings
4. Analyze Data Quality Issues (determine issues worth fixing)
5. Set and Enforce Data Quality targets
6. Monitor ongoing Data Quality

Impacts on Metadata
The data quality rules discovered via data profiling are metadata. The results (the quality of the data) are also metadata and must be documented. Profiling results in a determination that either:
The interpretation of the data given by the metadata is correct and the data is wrong, or
The data is correct and the metadata (data quality rules) are wrong
Unless they are both wrong

Metadata needs to be recorded!

What Data Profiling Achieves

[Diagram: Data Profiling takes metadata (accurate and inaccurate) and data (accurate and inaccurate) as inputs, and produces accurate metadata, facts about inaccurate data, and data quality issues.]

Analysis: An example of birthdates


[Chart of birthdate frequency across the year: check out the beginning of the year, which looks too high, and the end of the year.]

Finishing Up
Data Governance is a program that needs corporate support and an organization.
Data is an asset that must be defined, managed, stewarded, and governed.
Accountability and Communication are crucial.
Data Quality and Robust Metadata are benefits of a Data Governance program.
Taking responsibility for Data Quality across the corporation is a primary goal of Data Governance.

Thank you, and any questions?
