Data is quickly becoming the currency of the digital economy, so it’s a good time to examine data governance in our “How to Speak Like a Data Center Geek” series. Massive growth in big data is leading to new opportunities for organizations to mine actionable insights and create monetization approaches. As one example, the Insights-As-A-Service market is projected to grow to $3.31 billion by 2021.i
However, as is often the case, hyper-growth can lead to hyper-attention, and data governance is no different. Businesses and governments alike are increasingly focused on key questions around data such as:
- How can I best make use of and govern the data that’s running my organization today?
- What is the quality of information that I have?
- How do I ensure that the right parties have access to the data?
- What kind of data protection is in place, and how do I validate and monitor that my customers' and citizens' data is safe?
- Where can I tap into the right tools and innovation to monetize my data in the future? Is that in the public cloud or somewhere else?
Let’s dive into the topic for this Geek entry to see if it can help shed some light.
Googling the term “data governance” may lead to more confusion than clarity. Every organization seems to have its own version of what it means – for example, a Lights on Data article lists ten different definitions.ii Some of the confusion may stem from the fact that it is both a business strategy and an IT process for how an organization manages its data. Data is a business asset with value – similar to buildings, employees or supply chains. As with those other business assets, business and IT policies and processes help ensure that data complies with regulations, stays safe, is of high quality and is leveraged for maximum business benefit. In short, data governance is the strategy and set of processes that determine how data will be used in business operations, and it is reflected in the system architecture. Let’s take a look at some of the elements of data governance:
Data acquisition: The first point of data entry for a business, whether the data is input manually, collected automatically from sensors or ingested by tools. Open source software such as Apache Spark and Apache Kafka helps enable ingestion of data across distributed IT architectures. In addition to data collection, data governance in this phase is focused on data cleansing and ensuring high data quality.
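To make ingest-time cleansing concrete, here is a minimal sketch of a pipeline that parses a raw sensor feed and drops malformed rows on entry. The feed format and field names are invented for illustration; a real deployment would consume from something like Kafka or Spark rather than a string.

```python
import csv
import io

# Hypothetical raw sensor feed: CSV lines of "sensor_id,timestamp,reading".
RAW_FEED = """s1,2024-01-01T00:00:00,21.5
s2,2024-01-01T00:00:00,not-a-number
s3,2024-01-01T00:00:00,19.8
"""

def ingest(feed: str):
    """Parse the raw feed, cleansing rows that fail basic checks at the point of entry."""
    for row in csv.reader(io.StringIO(feed)):
        if len(row) != 3:
            continue  # drop structurally broken rows
        sensor_id, timestamp, reading = row
        try:
            value = float(reading)  # cleanse: reject non-numeric readings
        except ValueError:
            continue
        yield {"sensor": sensor_id, "ts": timestamp, "value": value}
```

Filtering at ingest means downstream systems only ever see records that passed the checks – one way governance policy gets enforced in the architecture itself.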
Data quality: Data quality is the degree to which the data is fit to serve its purpose in a given context – usually its intended use in business operations, decision making and planning. Data quality is determined by how accurate, complete, relevant and up-to-date it is. In a distributed IT architecture, maintaining a single source of truth for high data quality can be challenging, especially as the number of data sources proliferates with sensors and other devices. Data governance helps to ensure consistent data quality between various systems with processes that can address data conflicts and ensure that data is complete and accurate.
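The completeness, accuracy and timeliness dimensions above can be sketched as simple per-record rules. The field names and the one-year staleness threshold here are hypothetical, chosen only to illustrate the idea:

```python
from datetime import datetime, timezone

# Hypothetical customer record schema; field names are illustrative.
REQUIRED_FIELDS = {"id", "email", "updated_at"}

def quality_issues(record, max_age_days=365):
    """Return a list of data-quality problems found in one record."""
    issues = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            issues.append(f"missing {field}")
    # Accuracy (format check): a very rough email sanity test.
    email = record.get("email", "")
    if email and "@" not in email:
        issues.append("malformed email")
    # Timeliness: flag records that have not been updated recently.
    updated = record.get("updated_at")
    if updated:
        age = datetime.now(timezone.utc) - updated
        if age.days > max_age_days:
            issues.append("stale record")
    return issues
```

Running rules like these across every system that holds a copy of the data is one way to detect the cross-system conflicts the paragraph above describes.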
Data movement: Moving data from one place (the source) to another (the destination), generally through loads, feeds, transfers or stores. This is often done through a data fabric – a set of data services that manage data across various endpoints of on-premises and multicloud environments.
Data sovereignty: Data sovereignty is typically tied to regulations or data privacy laws that require companies to maintain certain data within the borders of the country where it is collected. A good example is the European Union’s (EU) General Data Protection Regulation (GDPR), which governs how data about EU citizens can be used or accessed. Data governance ensures compliance with these regulations by putting protections in place to manage who has access to the data and to confirm that the data is being used for the right purposes. In cases where data needs to be moved out of the country for processing or analysis, data masking or data tokenization can help ensure compliance by replacing sensitive data with dummy or modified content.
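As a rough sketch of tokenization, the snippet below replaces sensitive values with stable keyed-hash tokens before a record leaves the country. The key name and record fields are illustrative; production systems would use a dedicated token vault or key-management service rather than a hard-coded key.

```python
import hashlib
import hmac

# Hypothetical secret that stays in-country; in practice this lives in a key vault.
SECRET_KEY = b"example-key-kept-in-country"

def tokenize(value: str) -> str:
    """Replace a sensitive value with a deterministic HMAC-SHA256 token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return "tok_" + digest.hexdigest()[:16]

def mask_record(record: dict, sensitive_fields: set) -> dict:
    """Return a copy of the record with sensitive fields tokenized."""
    return {
        key: tokenize(val) if key in sensitive_fields else val
        for key, val in record.items()
    }
```

Because the same input always yields the same token, analytics and joins still work on the exported data, but the original values cannot be recovered without the key, which never leaves the country of collection.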
Data privacy vs data protection: Data privacy comprises the regulations, laws and policies that govern the use of an individual’s data. Data privacy is generally focused on personally identifiable or other sensitive information such as healthcare or financial records, contact information or web surfing behavior. Data protection, or data security, is the set of tools, procedures and technology used to enforce those policies and regulations. This includes preventing unauthorized access to or misuse of the data a customer agrees to share with a company,iii as well as protecting digital data from cyberattacks and data breaches.
Data analysis: Data analysis is the process and science of extracting insights from raw data sets, increasingly with the aid of real-time automation and artificial intelligence. Traditionally data was moved to a centralized data warehouse for analytics, but this work is increasingly moving to the digital edge as data sources and volumes continue to proliferate. By deploying an IT architecture based on Interconnection Oriented Architecture™ (IOA™) best practices, businesses can collect and aggregate data with local event processing to optimize data streaming flows and edge analytics.
Data monetization: As data and real-time insights continue to grow, leaders are focusing more on how to turn data into business value. Data can be sold directly or indirectly turned into new product or service offerings. For example, medical services could be developed based on data collected from wearable devices, or a grocery store could streamline its inventory based on the contents of its customers’ refrigerators. The central question for these new marketplaces is: how can companies buy and sell data (or the algorithms behind it) in a compliant and trustworthy way?
Future-proofing as a service
Leaders concerned about future-proofing their organization cannot afford to ignore data governance. In the digital economy, competitive advantage is derived from an organization’s ability to harness data. Getting the biggest bang for the buck means making sure that data is of high quality, protected and compliant with regulations from the time it’s first collected until it’s monetized.
Learn how to gain greater control of globally distributed data by exploring the Digital Edge Playbook for Distributed Security.
Also, to really learn how to speak like a data center geek, check out every post in the series. (NOTE: Binge reading geeks welcome!)