Data Governance is a journey, not a destination

While “Data Governance” is not a new buzzword in the corporate world, it has been increasing in popularity due to the demands of capturing mountains of complex data for reporting, analysis, or data science purposes. When I read definitions of data governance online, I often end up in a never-ending rabbit hole of looking up the jargon used in their descriptions. After a lot of research and numerous implementations, I can summarize it more simply: Data Governance is the umbrella term for a company’s processes and procedures on how employees interact with and secure data. There are a lot of components that fall under that umbrella, and it’s often tricky to determine where to start. To help with this, let's dive into some examples and key aspects of a data governance framework.

Examples of daily data governance tasks

Information is used in almost every profession, which means the vast majority of us are at least partially knowledge workers. As knowledge workers, there is a really good chance that we participate in at least a piece of the governance process. To quickly touch upon some examples of data governance interactions:

  • A finance end user should only have access to data for their own business unit and no others
  • A different finance end user can see forecast information for their region, and they are the only one who can see employee-level salary data
  • A business intelligence administrator is the only person in the system who can interact with personally identifiable information for their client-specific data
  • The CRM sales administrator maintains the sales data that is used as the single version of truth for a customer list
  • The ERP system administrator maintains the enterprise data that is used as the single version of truth for entity, company, and GL structure
  • IT puts network policies in place to better secure their technology infrastructure, which means anybody who needs access has to get manager and IT approval
  • A customer-facing representative gets a new job as a sales manager so he or she needs to pass along their standard operating procedures to a new hire

Within an organization, there are many end users and consumers of data. Keeping everyone on the same page requires a lot of work.

I believe it was a late NY rapper who once stated: “More data, more problems”. However, with the right tools and processes in place, we can change that paradigm to “more data, more solutions”.

Still there?  The best part of this blog is yet to come...

Let’s take a look at some key aspects or tasks of a governance framework. Later I will describe where to put all of the documentation. I’ll go ahead and tease that it should be contained in a shared location or better yet a “wiki”.

Aspects of Governance

  • Aspect 1: Get Documenting:
    • Existing (and future) sources
      • To quote Maui from Disney’s Moana – in today’s high-tech world he may describe the basis of data governance as “Knowing where you are by knowing where you've been”
      • Having a list of data sources and owners will help identify data stewards, the folks who interact with the data and can contribute to valuable discussions on data integration and sharing
    • Data dictionary
      • What are the fields (calculated and otherwise) that make up the staged and transformed data within the database(s)?
      • What are the formulas and special security rules that govern who should be able to view this data?
      • How does the business describe the fields?
      • Where does the data come from?
      • There are many fields to add to a data dictionary so start documenting and the fields will come
    • Reporting inventory
      • List the existing reports along with their type, technology, audience, dimensions, attributes, measures, etc., usually in a large table with filtering capabilities
    • Standard Operating Procedures (SOPs)
      • This is the meat and potatoes of documentation, but it is always worthwhile because people come and go within a department
      • Having a one-stop shop for all policies and procedures within a company that is organized will be invaluable for when Brenda from Accounting wins the lottery and moves to the south of France
      • The operating procedures could be anything from basic “how to” and training documentation to regulatory and compliance filings that need to happen every few years
    • Master Data Management procedures
      • Assess what the rules are for the single version of truth when it comes to entities or other dimensions
      • For entities or datasets that can be compiled through multiple data sources, what are the rules that govern these combined datasets?
    • Data Remediation procedures
      • What do you do when it all hits the fan?
      • What about when something is a little “off”?
      • This should outline who to go to for each of the various scenarios for incorrect data
    • Development framework
      • This is a big one
      • What are the steps that a data engineer or architect should take to promote code when system changes are required?
      • Staging and transformational logic should be stored in a repository
      • Testing and tracking of the changes and versions should be traceable
      • This promotional path should be outlined in a development framework document
  • Aspect 2: Governance roles, user roles and security:
    • Data Governance Team
      • Depending on the size and culture of your organization, this task means assembling a team from various departments. The goal is to keep lines of communication open and to decide how the consumers of company data are expected to interact with the systems
      • Data stewards are the front line of handling data and will consult with other stewards on how to handle data issues and metadata requirements
      • A council of elders, just kidding... a council of governors can help prioritize issues and provide guidance on matters concerning source integration and reporting needs
      • The team should meet regularly to discuss a variety of topics, document decisions made, and create a list of actions
    • Permissions Matrix for Role Based Access Control (RBAC)
      • Define the roles within your organization and how they relate to the data structures that require editing vs reporting
        • What admin roles are required?
        • What are the data engineering roles for ETL or ELT?
        • Who can edit source data?
        • Who needs access to the ODS vs who needs access to the reporting views?
        • How do permissions get inherited through the roles?
      • The RBAC model will then be used for the security setup within the data solution
    • RACI Matrix
      • From an organizational standpoint, come up with all of the roles and then a list of tasks. Decide who is:
        • Responsible
        • Accountable
        • Consulted
        • Informed
      • There needs to be at least one person accountable and one person (maybe the same person) responsible for each task
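
The RACI rule above (every task needs someone Responsible and someone Accountable) is easy to check automatically once the matrix lives somewhere structured. Here is a minimal Python sketch of that check. The task names, role names, and the `find_raci_gaps` helper are all made up for illustration, and I am assuming a role can carry more than one letter (for example "AR") when the same person is both responsible and accountable.

```python
from typing import Dict, List

# Hypothetical RACI matrix: task -> {role: letters from "R", "A", "C", "I"}.
# A role may hold several letters at once, e.g. "AR".
RaciMatrix = Dict[str, Dict[str, str]]


def find_raci_gaps(matrix: RaciMatrix) -> List[str]:
    """Return one message per task that lacks a Responsible or Accountable role."""
    problems: List[str] = []
    for task, assignments in matrix.items():
        for letter, label in (("R", "Responsible"), ("A", "Accountable")):
            if not any(letter in code for code in assignments.values()):
                problems.append(f"{task}: no one is {label}")
    return problems


# Illustrative data only; swap in your own tasks and roles.
raci = {
    "Maintain data dictionary": {
        "Data steward": "AR",
        "BI administrator": "C",
        "Finance end user": "I",
    },
    "Approve system access": {"IT": "R", "Manager": "C"},
}

gaps = find_raci_gaps(raci)
for gap in gaps:
    print(gap)
```

With the sample data, only the second task gets flagged, because nobody has been marked Accountable for it. That is exactly the kind of hole the governance team can put on the agenda for its next meeting.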

 

Documentation of Governance (Wiki)

The above inventories and documentation should make their way into a shared location such as a SharePoint site. Or better yet, if your company has access to a wiki (think Wikipedia with file sharing capabilities), you could benefit a lot from this type of real-time communication. I personally find the wiki an invaluable part of the governance process, especially when it comes to documenting the standard operating procedures and solution architecture.

How technology helps

When it comes to technology: the more I learn, the more I am surprised by how little I know. And then someone asks me what I do for work, and I go on and on about how I manage technology projects for various departments, where we implement technology solutions, usually in the form of a large data warehouse, and give clients the ability to analyze their data to get better insights. I don’t know if there’s a better elevator pitch out there, folks. Maybe I do know a thing or two, or at least “enough to be dangerous” (like you). But what technologies help when it comes to data governance?

We talked about the wiki: a one-stop shop for documentation needs within an organization. No more sending files through email. “Hey! Can someone send me the latest version of the project charter?”  “No, Steve, I sent that to you 10 times last year. Now just check the wiki, here’s the link.”

Code Repositories and Version Control.  For the development framework, it is important to maintain a single version of truth for the code and rules that govern how the data is staged and transformed through the reporting layers. Version control software allows for continuous development and integration into the reporting solution, which makes for happy users.

Data Warehousing.  For the data solution, having a single version of the truth for data is the end goal. Having a reliable solution with a focus on security is where the rubber meets the road. The RBAC model in practice will allow the end users to interact with the data in the correct manner. People will not be seeing things they shouldn’t. Internal security audits will go smoothly. Someone will get a bonus for all this great work, heyoo!
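
To make "the RBAC model in practice" a little more concrete, here is a minimal Python sketch of how permissions might be inherited through roles. It is not how any particular warehouse implements security. The role names, permission strings, and helper functions are hypothetical, and a real solution would use the platform's own role and grant features.

```python
from typing import Dict, Set

# Hypothetical role hierarchy: each role inherits everything its parents can do.
ROLE_PARENTS: Dict[str, Set[str]] = {
    "report_viewer": set(),
    "data_engineer": {"report_viewer"},
    "admin": {"data_engineer"},
}

# Hypothetical permissions granted directly to each role.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "report_viewer": {"read:reporting_views"},
    "data_engineer": {"read:ods", "write:staging"},
    "admin": {"manage:roles"},
}


def effective_permissions(role: str) -> Set[str]:
    """Collect a role's own permissions plus everything inherited from parents."""
    seen: Set[str] = set()
    perms: Set[str] = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current in seen:  # guards against accidental cycles in the hierarchy
            continue
        seen.add(current)
        perms |= ROLE_PERMISSIONS.get(current, set())
        stack.extend(ROLE_PARENTS.get(current, set()))
    return perms


def can(role: str, permission: str) -> bool:
    """True if the role holds the permission directly or through inheritance."""
    return permission in effective_permissions(role)


print(can("admin", "read:reporting_views"))
print(can("report_viewer", "read:ods"))
```

In this sketch an admin can read the reporting views because the permission flows up from `report_viewer`, while a report viewer has no access to the ODS. Writing the matrix down this explicitly is also what makes a security audit quick to answer.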

Reporting Tools.  There are many reporting tools out there. Finding the right one will take some trial and error. The reporting layer, from a governance standpoint, is the perfect place to begin when it comes to security roles. Users with view access to their business units will be able to review live system data and help with any data remediation issues, which are bound to come up at some point or another.

Where to begin?

Start with the lowest hanging fruit. Document your sources (source inventory). Then document the fields (data dictionary). Document the outputs (report inventory). Talk with the data owners about security rules and how metadata is handled. Talk to your business partners about the operating procedures that need documentation. Re-read this article and keep going through the things I talked about in the aspects of governance section.
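
If a spreadsheet feels too loose for the data dictionary, even a small structured record helps. The Python sketch below shows one possible shape for a dictionary entry, covering the questions from the data dictionary section: where the field comes from, how the business describes it, its formula, and who may view it. The class, field names, and sample entries are hypothetical, not a standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DictionaryEntry:
    """One row of a hypothetical data dictionary."""
    field_name: str
    source_system: str                      # where the data comes from
    business_description: str = ""          # how the business describes the field
    formula: Optional[str] = None           # set only for calculated fields
    viewer_roles: List[str] = field(default_factory=list)  # who may view it


def missing_descriptions(entries: List[DictionaryEntry]) -> List[str]:
    """Return the names of fields that still need a business description."""
    return [e.field_name for e in entries if not e.business_description.strip()]


# Illustrative entries only.
entries = [
    DictionaryEntry(
        field_name="gross_margin",
        source_system="ERP",
        business_description="Revenue minus cost of goods sold",
        formula="revenue - cost_of_goods_sold",
        viewer_roles=["finance_end_user"],
    ),
    DictionaryEntry(field_name="employee_salary", source_system="HR"),
]

todo = missing_descriptions(entries)
print(todo)
```

A list like `todo` makes a handy running checklist: start documenting, and each pass leaves fewer fields without a description.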

There is no end!

Aerosmith once sang, “[data governance] is a journey, not a destination.” There is no end point when it comes to a data governance process. Governance is like the Olympic torch of the corporate world. It burns all year long and every so often it is seen and people just go bananas for it. It is a cycle of actions that need to be maintained, reviewed, improved, etc. 

Plea

If you have any questions on what this all means or you want to discuss each of these heavy topics in much greater detail, please reach out. What do you find helps keep your organization aligned when it comes to data governance principles?