The Democratization of Data and AI in the Digital Economy

Everyone can be a data scientist and a developer

Herbert J. Preuss

There’s an interesting generational divide happening as the digital economy continues to expand. On one end of the spectrum is the Silent Generation (over 75) who grew up before the internet or even PCs were a thing. And then there are the digital natives growing up today in a world that is increasingly defined by technology. This generation, Gen Z, knows how to ask Google anything, watch YouTube to learn any skill and stay connected with their friends via social media.

My son is part of this generation, and he’s learning to code in Scratch, the visual programming language developed by MIT primarily for children. The fact that this coding language is simple enough for children to use is a great indicator of a major trend happening today – the democratization of data and technology. As of March 2020, the Scratch community has more than 50 million shared projects, over 52 million users and over 46 million monthly website visits.[i] Taking a cue from Scratch, “data democratization” needs digital ecosystems that enable fast, secure data exchange between participants in order to reach its full potential.

What is the democratization of data and technology?

The phrase “knowledge is power”, coined by Sir Francis Bacon in 1597, is best translated to “data is power” or, better yet, “useful data is power” nowadays. In the past, access to information was limited to the societal elite, and, more recently, to the technical elite. Big data was stored in centralized data warehouses managed by IT teams. Gaining access to that information and deriving useful insights from it depended on having the right skills or working with IT and data analyst teams that did. But things are changing with the convergence of IT infrastructures and technologies that make data easier to share and interpret – even if you’re not a data scientist.

In comes data democratization, named by Gartner as one of the “Top 10 Strategic Technology Trends for 2020”.[ii] Data democratization is defined as making digital information accessible to the average non-technical user of information systems, without requiring the involvement of IT. You can think of it as “crowd enabling”: anybody can use data at any time with no barriers to access or understanding.[iii] Examples include intuitive tools and user interfaces for data consumption, code snippets and plugins, open source data and code, and pretrained AI models. These initiatives aim to make it easier for anyone to build things and derive insights. For instance, whereas in the past a developer may have had to partner with a data scientist to build an AI model from the ground up, today they could take an AI model that’s pretrained on image recognition and use a small training dataset to get it to recognize a particular type of image, such as anomalies in X-ray images.
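The pretrained-model pattern described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not a real imaging pipeline: `pretrained_features` is a hypothetical stand-in for a frozen pretrained image model, the “images” are short lists of pixel intensities invented for this sketch, and only a small logistic-regression “head” is trained on four labeled examples.

```python
import math
from typing import List, Sequence, Tuple


def pretrained_features(image: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a frozen, pretrained model.

    A real model would output learned features; this toy version just
    returns the mean and peak pixel intensity. It is never updated.
    """
    return [sum(image) / len(image), max(image)]


def predict(weights: Sequence[float], bias: float, image: Sequence[float]) -> float:
    """Return the head's estimated probability that the image is an anomaly."""
    feats = pretrained_features(image)
    z = sum(w * f for w, f in zip(weights, feats)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_head(
    examples: Sequence[Tuple[Sequence[float], int]],
    epochs: int = 500,
    learning_rate: float = 0.5,
) -> Tuple[List[float], float]:
    """Fit only the small logistic-regression head with plain SGD."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for image, label in examples:
            error = predict(weights, bias, image) - label
            feats = pretrained_features(image)
            weights = [w - learning_rate * error * f for w, f in zip(weights, feats)]
            bias -= learning_rate * error
    return weights, bias


# Invented training set: label 1 = contains a bright spot (the "anomaly").
SMALL_DATASET = [
    ([0.1, 0.2, 0.1, 0.2], 0),
    ([0.2, 0.3, 0.2, 0.1], 0),
    ([0.1, 0.9, 0.2, 0.1], 1),
    ([0.2, 0.1, 0.8, 0.1], 1),
]

if __name__ == "__main__":
    weights, bias = train_head(SMALL_DATASET)
    # Two unseen toy images: one with a bright spot, one without.
    print(predict(weights, bias, [0.1, 0.1, 0.85, 0.2]))
    print(predict(weights, bias, [0.2, 0.2, 0.1, 0.2]))
```

The point of the design is that the expensive part (the feature extractor) is reused as-is, so the developer only has to fit a handful of parameters, which is why a small dataset can be enough.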

When PCs first came out, you had to be an expert in the DOS operating system to use them. Now they’re so intuitive that anyone can use them. I think democratization will make that happen for other development areas as well.

Data democratization will disrupt computer and data science education, as it levels the playing field in terms of the required skills. Most importantly, it will accelerate the adoption of data analytics and AI across a wider array of organizations. And the “enabled crowd” will fuel the explosion of “software-defined everything.”

Data vs useful data in the age of AI

AI at its essence is a probability model, and it needs to be trained on a lot of data to be accurate – a previous Interconnections blog on AI in the data center provides a good explanation of how the training process works. So, as an example, a healthcare AI model helping doctors diagnose disease would provide several candidate diagnoses, each with its own probability, such as three diagnoses with 90%+ probability, two with 80%+ probability and so on. And, as AI models mature, they will enable us to interpret larger volumes of data. As a case in point, today only a few dozen snapshots are retained from an ultrasound, but as AI-enabled diagnosis matures, the entire ultrasound video will be useful.[iv]
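To make the idea of ranked, probability-weighted recommendations concrete, here is a minimal Python sketch. The condition names and raw scores are invented for illustration, and treating each diagnosis as an independent yes/no probability (one sigmoid per label) is an assumption of this sketch, not a description of any particular clinical model.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(score: float) -> float:
    """Map a raw model score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


def rank_diagnoses(raw_scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Convert raw scores to probabilities and sort from most to least likely."""
    scored = [(label, sigmoid(score)) for label, score in raw_scores.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def at_least(ranked: List[Tuple[str, float]], threshold: float) -> List[str]:
    """Return the labels whose probability meets the threshold."""
    return [label for label, prob in ranked if prob >= threshold]


# Hypothetical raw outputs of a model for one patient.
EXAMPLE_SCORES = {
    "condition_a": 3.2,
    "condition_b": 2.5,
    "condition_c": 1.6,
    "condition_d": -0.4,
}

if __name__ == "__main__":
    ranked = rank_diagnoses(EXAMPLE_SCORES)
    for label, prob in ranked:
        print(f"{label}: {prob:.0%}")
    print("90%+:", at_least(ranked, 0.9))
    print("80%+:", at_least(ranked, 0.8))
```

The model does not return a single answer. It returns a ranked list that a doctor can weigh against other evidence.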

Data + AI democratization in action

Given that AI needs a lot of data to be trained well, a number of open data and open AI initiatives are working toward that goal. One example is CAPTCHA, the “Completely Automated Public Turing test to tell Computers and Humans Apart” found on many websites. Every time you successfully pick the stoplight out of the grid of images, you are helping to label data that can be used to train AI models. Other examples include:

  • Open data repositories sponsored by governments, scientific organizations and nonprofits, such as data.gov, United Nations / World Bank open data, the Human Genome Project, the Center for Open Science, Dataverse, etc.
  • Open source AI such as OpenAI, TensorFlow, Apache SystemML, OpenCog, Microsoft Cognitive Toolkit, AI Explainability 360 Open Source Toolkit, CAPTCHA solving kits, etc.
  • Data exchanges such as CommonWell (health data), IBM Data Asset eXchange, Grower Information Services Cooperative (GiSC), Springbot Exchange (retail), IoT data exchanges, etc.

Companies will benefit from democratization too. By making it easy for anyone to use data and applications, they will gain a much greater pool of brainpower.

Interconnection is essential for democratization to flourish

For data + AI democratization to work, it needs to be trusted. And that means keeping data and AI algorithms safe and secure. The public internet is insecure and prone to performance and latency issues, but a digital ecosystem that interconnects its participants using private interconnection can improve performance, scalability and resilience while ensuring trusted data exchange. Global interconnection solutions, such as those found on Platform Equinix®, enable low-latency, secure connectivity for digital ecosystem participants to exchange data, insights and AI models.

Source: GXI Volume 3

Download the white paper on Machine Intelligence – The Killer App for the Digital Economy to learn more.

You may also be interested in reading our blogs on data exchanges and AI.

[i] MIT, Scratch community statistics as of Mar 2020.

[ii] Gartner, Top 10 Strategic Technology Trends for 2020: 10 technology trends IT can’t afford to ignore, 2020.

[iii] Alation, What is Data Democratization?; Forbes, Bernard Marr, What Is Data Democratization? A Super Simple Explanation And The Key Pros And Cons, Jul 2017.

[iv] Cisco, Five things that are bigger than the Internet: Findings from this year’s Global Cloud Index, Feb 2018.