Welcome to part two of DataOps for Dummies, a series where we take a step back from the technical content and look at the different aspects of DataOps. If you’re wondering what DataOps is and where it came from, you might’ve missed part one, which you can find here.
Last time, we talked about how DataOps evolved from a set of best practices into a whole new data handling discipline. That leads us to our current topic: what are the best practices that spawned the DataOps discipline? Let’s walk through the steps of applying DataOps so everyone can follow along.
Scale your hardware and infrastructure
One of the significant and influential movements at the origin of DataOps is the agile way of working. A core idea of an agile work process is flexible project management, working in smaller ‘sprints’. When developing your data product, the first sprint works towards an MVP, a minimum viable product, while keeping scaling in mind. What is an MVP, and how does it generally help?
When working towards a big project, it’s easy to get caught up in all the little details and ambitious functionalities you want your product to have. The problem lies with scalability and the inability to deliver results fast. That’s why it’s imperative to get the basics right first. This is what you would call an MVP: a fundamental basis that lays the groundwork for efficient further development. This could be a robust data flow, a platform to integrate and manage data, or even a workshop to set goals and priorities for your ideal data situation. It’s the first step to build upon in your journey to a successful data product.
Small steps towards rapid success
Now that you’ve got your agile MVP, it’s time to take a page out of the DevOps book and work towards your dream product with incremental development cycles. This basically boils down to working on small updates to your existing framework. That way, you can evaluate what you’re working towards in smaller timeframes and gather feedback from all your stakeholders every step of the way.
The greatest pitfall for data projects is what we call the waterfall method of development. This happens when you plan your entire project according to the result as you envision it at the start of production and try to develop the whole product in one go. When working this way, you risk spending months on something that doesn’t, or worse, can’t produce the results you were aiming for. Another danger of waterfall development is drifting away from market needs: if you never pause along the way to ask what need you’re answering, you can waste a lot of time and effort.
With incremental development cycles, you’re not only taking fewer risks when it comes to delivering a quality product, but you’re also able to roll out faster updates that produce visible results. You’re able to adapt to unforeseen problems along the way. By working with smaller development cycles and gathering faster feedback from all stakeholders and market response, you’re able to pivot faster. This also means you can be critical towards your MVP and rework it if deemed necessary.
Using a sandbox to test the structural integrity
Another benefit of working with small incremental development cycles is being able to test your changes on the existing environment that you’ve set up with your MVP. This process of continuously integrating updates into the existing product and rolling them out is called Continuous Integration and Continuous Deployment, or CI/CD for short.
You’ve got to be smart about this, of course, and this is where the sandbox comes in. You don’t want to test your new update on what’s live; there’s no telling what might happen to what’s already working. Instead, you work in a digital copy (called staging) of your live production environment. Think of building a downscaled model of a bridge, adding another extension, and checking whether it keeps its structural integrity under the extra weight. Hence the sandbox name: you create an environment for your experimental testing that’s a faithful representation of your actual product.
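As a minimal sketch of this idea, an automated check can run your updated pipeline against a staging copy of the data before it ever touches production. The transformation, field names, and checks below are hypothetical stand-ins, not a prescribed implementation:

```python
# Minimal sketch of a sandbox (staging) check before deployment.
# The pipeline, field names, and data here are illustrative assumptions.

def transform(records):
    """The update under test: add an amount normalised to cents."""
    return [{**r, "amount_cents": round(r["amount"] * 100)} for r in records]

def run_staging_check(staging_records):
    """Run the new transform on a staging copy and verify it still holds up."""
    result = transform(staging_records)
    # Structural integrity checks, like load-testing the model bridge:
    assert len(result) == len(staging_records), "rows were dropped"
    assert all("amount_cents" in r for r in result), "missing output field"
    return result

# A small staging copy of production-like data:
staging = [{"id": 1, "amount": 9.99}, {"id": 2, "amount": 0.5}]
checked = run_staging_check(staging)
```

Only when checks like these pass on staging would the update be promoted to the live environment.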
Ensure quality with constant monitoring
With ongoing DataOps processes, monitoring doesn’t come at the very end, but at every stage in the process, or more specifically at every step of the data flow. When applying the DataOps methodology to your work, it’s essential to keep an eagle’s eye out and monitor your data, processes and results.
This is why one of the main components of DataOps is Statistical Process Control, or SPC for short. In practice, this means installing data quality checks within your data flow that test for irregular, null, or incorrect data. It’s essential to spot this kind of data as soon as possible, as it might have a devastating effect on your results.
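A minimal sketch of such a check might look as follows; the control limits and readings are illustrative assumptions, and real SPC setups would derive limits statistically from historical data:

```python
# Minimal sketch of an SPC-style data quality check in a data flow.
# Control limits and field values are illustrative assumptions.

def quality_check(values, lower, upper):
    """Flag null or out-of-range readings, returning (index, reason) pairs."""
    issues = []
    for i, v in enumerate(values):
        if v is None:
            issues.append((i, "null value"))
        elif not (lower <= v <= upper):
            issues.append((i, f"out of control limits [{lower}, {upper}]"))
    return issues

# Example: sensor readings with control limits 0-100
readings = [21.5, 98.3, None, 130.0, 45.2]
problems = quality_check(readings, lower=0, upper=100)
# problems -> [(2, 'null value'), (3, 'out of control limits [0, 100]')]
```

Wiring a check like this into each step of the flow is what lets you catch bad data before it reaches your results.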
Read more about the risk of bad data and malfunctioning models here.
Monitoring is not only about data quality but also about process quality. This means regularly testing your models and scripts, the transforming processes between data and output, for inconsistencies or unexpected behaviour. Beyond the aforementioned CI/CD method, you can put your systems under extreme load and test the speed of your processes over a longer period of time. Nothing is perfect, but regular testing is what differentiates a lousy data flow from a good one.
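One simple form of process-quality monitoring is timing each run of a pipeline step and flagging runs that exceed an agreed time budget. The wrapper, step, and budget below are illustrative assumptions, a sketch rather than a prescribed setup:

```python
# Minimal sketch of process-quality monitoring: time each run of a
# pipeline step and report whether it met its time budget.
import time

def monitored(step, budget_seconds):
    """Wrap a pipeline step so every run reports its duration and budget status."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = step(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return result, elapsed, elapsed <= budget_seconds
    return wrapper

def aggregate(values):
    """A stand-in transformation step: sum the incoming values."""
    return sum(values)

checked_aggregate = monitored(aggregate, budget_seconds=1.0)
result, elapsed, within_budget = checked_aggregate(range(1000))
```

Collecting these timings over a longer period is what reveals slow drift in your processes, not just one-off failures.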
Better results with transparent communication
A necessity in DataOps is transparency about the data itself, but also open communication about the processes and other parts of the surrounding work. However technical it may get, we’re all still human, and that’s why transparent, open communication is vital in these processes.
Communication ideally starts with daily stand-ups to keep track of progress and plans and to be transparent about roadblocks, because two heads are better than one when it comes to solving everyday problems. The same goes for weekly, monthly, or quarterly meetings to check in on progress and on whatever has popped up, so you can react in a fast, agile way and move towards your ideal situation.
A happy ending to your data projects thanks to the DataOps discipline
There you have it, your first steps into the DataOps discipline. With agile working, a strong foundation upon which you build with small incremental changes, testing and monitoring along the way, and transparent communication, you can solve almost every data problem the market throws at you.
This guide isn’t foolproof, of course; one missing ingredient is company-wide collaboration, but that might best be a subject for another blog (or an entire book). If you’re looking for a guide to implementing all these principles in your business, we have a series on becoming data-driven that you can use.
Want a boost towards your ideal data situation? Consider subscribing to our mailing list or book a hands-on workshop.