ūüáļūüᶠ News Our roots are in Eastern Europe. We are actively committed to helping Ukraine refugees with our resources & expertise
From DevOps to DataOps

December 26, 2018

Modern software development runs at a faster pace than ever before. As cloud computing divides winners and losers in the data-driven marketplace, converging applications and their data with the infrastructure they run on makes more and more sense.

Converging software development with infrastructure operations is not news. Everyone is familiar with DevOps. DevOps originated over a decade and a half ago in the first generation of web and mobile, and it was a game changer. Now that "Big Data" has given way to bigger data, the disciplines of DevOps need to encompass what has come to be called "DataOps."

The ideal goal is to apply the discipline of well-behaved release and validation processes to make data as predictable, reliable, and fresh as updates to mobile apps and websites. The challenge is that data infrastructure is much more complicated.

Data Drift: When Data is a Moving Target

The era of bigger, faster data goes beyond cloud economics and cheaper storage. With so many sources of data running independently, there has been an explosion of complexity in the data supply chain. Gone are the days when one database managed all the transactions of a business process.

Market risks and opportunities drive changes in business requirements; changes in business requirements drive changes in apps; changes in apps drive changes in their data. Part of the problem is the speed of this dynamic. What complicates it further is that the multiple systems producing the data undergo their own changes independently, without accounting for the impact on downstream consumers of data and analytics.

Contrast this with 20th-century practices centered on the relational database and the data warehouse. All data and applications were tightly controlled, and their output flowed to a "single source of truth." The same people who ran the data warehouse ran the business infrastructure. All the answers came from the same place; reports and analytics were always predictable, just like business models.

That's now changed irreversibly. More apps produce more data, broadening the range of stakeholders who consume data from those apps. Data infrastructure needs a stable way to metabolize those changes without interruption. Data quality used to mean "never change the data." Now it means "the data is always changing; what are you going to do about it?"

Agile Development in the Age of Bigger-Faster Data

20th-century software release cycles were measured in months, and sometimes years. Development requirements were stable and transparent. Technologists built systems knowing how they would work and who would use them. They were tested to perfection before release into the wild.

Those days are gone. The change is felt most acutely by businesses that have to compete with the likes of Netflix, Facebook, Amazon, Google, LinkedIn, and their mass-market business models. When those titans emerged, no one really knew what software would best suit consumers. When mobile apps became first-class citizens, mass-market feedback accelerated even further, driving an even more rapid cycle of discovery, experimentation, and improvement.

It was at about this time that the Agile Manifesto set forth a new view of software engineering that has come to dominate modern software development. Once there was a deep, well-divided set of specialties over a long life cycle of building a complex system. In contrast, agile development values speed of iteration over prescriptive assumptions about the best way to do each thing. In the words of the Agile Manifesto, it prioritizes:

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

Modern compute economics and the cloud in some ways forced the hand of software development teams. There was just too much complexity to divide and conquer with a single, long-term work assignment across a single team. The agile approach systematically favors the adaptive strategies on the left side of each pairing over the classic disciplines on the right.

A good idea that works is better than an idea you haven't tried yet, or falling back on "we've always done it that way." Bigger and faster data has made the need even more acute.

Pipeline Development and Management

Relaxing the constraints of 20th-century software engineering doesn't mean that iterative development is free of structure. Upleveling from DevOps to DataOps introduces new infrastructure complexities. There are four stepping stones around which the multiple specialties of data operations must converge.

  • Build: design topologies of flexible, repeatable dataflow pipelines using configurable tools rather than brittle scripting and one-off data movement and imports
  • Execute: run pipelines on edge systems and in auto-scaling cloud environments
  • Operate: manage dataflow performance through continuous monitoring and enforcement of Data SLAs to tie development goals to operational reality
  • Protect: secure data end to end, in and around the pipeline, at every point of its journey, guarding against bad actors and meeting compliance and governance requirements

The job of DataOps really revolves around pipelines: the movement of data from origin to consumption, in a fabric of many-to-many relationships. What makes this work is that data pipelines are not monolithic. They are continually evolving as sources and endpoints change. Data pipelines include four types of logic for managing data in these relationships:

  • Data Origins: the one or more systems where data enters the pipeline
  • Transformations: the data processing you want to perform, by which changes are introduced into data that passes through the pipeline
  • Executors: tasks triggered on receiving an event; executors do not write or store events
  • Destinations: the target(s) for a pipeline; you can use one or more destinations in a pipeline
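The four stage types can be sketched in a few lines of Python. This is a hypothetical minimal model for illustration, not any specific product's API; the stage names simply mirror the list above.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

@dataclass
class Pipeline:
    origin: Callable[[], Iterable[dict]]              # where data enters
    transforms: list = field(default_factory=list)    # processing steps that change records
    executors: list = field(default_factory=list)     # event-triggered tasks; no writes
    destinations: list = field(default_factory=list)  # one or more targets

    def run(self):
        for record in self.origin():
            for t in self.transforms:
                record = t(record)       # transformations produce a new record
            for e in self.executors:
                e(record)                # side effects only; record is unchanged
            for d in self.destinations:
                d(record)                # deliver to every target

# Example: read two records, uppercase a field, log, and collect.
sink = []
p = Pipeline(
    origin=lambda: [{"name": "ada"}, {"name": "grace"}],
    transforms=[lambda r: {**r, "name": r["name"].upper()}],
    executors=[lambda r: print("saw", r["name"])],
    destinations=[sink.append],
)
p.run()
# sink now holds [{"name": "ADA"}, {"name": "GRACE"}]
```

The point of the separation is that each stage type can evolve independently: swapping a destination or adding a transform does not disturb the origin.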

By applying the DevOps idea of "infrastructure as code," data pipelines can be managed as a resilient set of resources. Versioning of configurations ensures that pipelines can continue to deliver valuable data at high quality even as the sources and destinations of data change.
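As a sketch of what "pipelines as code" can look like, the wiring of a pipeline can live in a plain data structure that is committed, diffed, and versioned like any other source file. The config shape and the `orders-to-warehouse` name here are invented for illustration; a content hash is one simple way to detect drift between what is in version control and what is deployed.

```python
import hashlib
import json

# Hypothetical pipeline definition kept in version control.
pipeline_config = {
    "name": "orders-to-warehouse",
    "version": 3,
    "origin": {"type": "kafka", "topic": "orders"},
    "transforms": [{"type": "mask_field", "field": "card_number"}],
    "destinations": [{"type": "jdbc", "table": "fact_orders"}],
}

def fingerprint(cfg: dict) -> str:
    """Stable hash of a config, so deployed vs. committed versions can be compared."""
    canonical = json.dumps(cfg, sort_keys=True)  # key order must not change the hash
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

print(fingerprint(pipeline_config))
```

Because the fingerprint is computed from a canonical serialization, two copies of the same config always hash identically, and any edit to a source, transform, or destination shows up as a new fingerprint.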

Along with maintaining well-configured pipeline instances in the repository, DataOps needs to account for alerting and response to changes. The "executors" function of a pipeline can do more than move data from one stage to the next. It can also provide automated data validation and feedback, so that if there are problems at any step of the pipeline, even at initial ingest, they can be addressed. It's a set of assumptions analogous both to test automation on the development side and to systems monitoring on the operations side.

With thresholds and validations built into the pipeline, ops teams can take prompt action to iterate and make changes, because the best way to prevent garbage-in/garbage-out is to ensure no garbage enters the pipeline to begin with.
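A validation gate at ingest might look something like the following sketch. The required fields and the 2% error-rate threshold are illustrative stand-ins for a real Data SLA, not values from any standard.

```python
def validate_batch(records, required_fields=("id", "amount"), max_error_rate=0.02):
    """Drop records missing required fields; fail the batch if too many are bad."""
    bad = [r for r in records
           if any(r.get(f) is None for f in required_fields)]
    error_rate = len(bad) / max(len(records), 1)
    if error_rate > max_error_rate:
        # In a real pipeline this would alert the ops team and divert the
        # batch to a quarantine destination rather than raise an exception.
        raise ValueError(f"error rate {error_rate:.1%} exceeds SLA")
    return [r for r in records if r not in bad]

clean = validate_batch([{"id": 1, "amount": 10.0},
                        {"id": 2, "amount": 7.5}])
```

Running the gate at the first stage means a surge of malformed records halts or quarantines the flow immediately, instead of quietly polluting every downstream report.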

David Fishman

David is a Silicon Valley executive with several decades of experience driving product marketing and product management at companies innovating in IT and cloud infrastructure, analytics, and open source. He has led alliance and demand gen operations at multiple startups and large public companies.
