Why Your Data Scientists Are Not Innovating Fast Enough

Posted on 15-aug-2017 13:33:00

Data scientists and data engineers are in high demand these days. For instance, I see marketing agencies quickly hiring a "data scientist" for fear of missing the "big data train", or because they are afraid of losing business if they cannot prove they are "data driven" enough. For most other organisations, however, the main goal of hiring big data specialists is to optimize internal operations and to gain insight into customer behaviour and potential new trends.

Data teams vs Startup teams

On some level, you could compare these big data teams with startup teams. Both run experiments in search of new solutions for a particular audience or user group. For a startup entrepreneur, that audience might be a niche or a part of the market that is yet to be discovered, while for the corporate data team it is their own organisation or existing customer base.

An important difference between these two groups, however, is that startup entrepreneurs have the freedom to experiment and discover things with whatever they can find on the internet. They don't have to adhere to corporate IT/compliance standards, nor do they need to follow a whole bunch of standard operating procedures (SOPs), which can take days or even weeks.

This is one of the reasons Google created a "Moonshot factory": a place completely separated from the main building, an inspiring place where R&D teams operate free from corporate processes and culture. If they require a tool for an experiment, they simply buy it and use it immediately. The result: 10x more experiments and, as such, a much higher probability of finding a new product/market fit for a cool new technology or product.

So in short: more experiments mean a higher probability of finding at least one working business case that results in a competitive advantage for the company.
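To put a rough number on that intuition, here is a minimal sketch in Python. It rests on assumptions that are not in the original argument: every experiment is treated as independent and is given the same, purely hypothetical, 5% chance of producing a working business case. Real experiments will rarely be that tidy, so read the output as an illustration only.

```python
def chance_of_at_least_one_success(n_experiments: int, p_success: float) -> float:
    """Probability that at least one of n independent experiments succeeds.

    Assumes every experiment has the same success probability and that
    experiments do not influence each other (a simplification).
    """
    if n_experiments < 0:
        raise ValueError("n_experiments must be non-negative")
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    # Complement rule: 1 minus the probability that every experiment fails.
    return 1.0 - (1.0 - p_success) ** n_experiments


if __name__ == "__main__":
    HYPOTHETICAL_P = 0.05  # assumed success rate per experiment, not a measured value
    for n in (1, 10, 100):
        chance = chance_of_at_least_one_success(n, HYPOTHETICAL_P)
        print(f"{n:>3} experiments -> {chance:.0%} chance of at least one success")
```

Under these assumptions the chance grows quickly with the number of experiments, which is the whole point of removing anything that slows experimentation down.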

But to deploy big data experiments, the teams require IT infrastructure, and that is a huge bottleneck nowadays. In traditional SMEs and larger companies, IT departments are not always organized to deploy "new" things quickly. As a result, big data teams become reluctant to try new things or get used to a slower tempo of innovation. That is not beneficial in this fast-moving world.

Minutes vs. Weeks

However, the open-source community and the rise of automation tools such as the Tengu Platform, which is also open-source, offer a way around this bottleneck.

Steven Van Canneyt, lead Data Scientist at Realo, points out that he was astonished when he saw Tengu setting up a website visitor tracking system for VRT and Newsmonkey, the dominant news sites in Belgium. Important building blocks were Storm and Cassandra, big data infrastructure for streaming analysis, which Tengu deploys in a fully automated way. According to Steven: "It makes a huge difference when you can start processing after 5 minutes instead of days or even weeks. What's more, instant experimentation with open-source technologies gave us the opportunity to quickly validate business cases and move on to production."

A typical visualisation for streaming analysis. In this case, a Storm cluster automatically deployed by Tengu.

From 3-week sprints to 1-week sprints

Another company managed to cut the time needed to validate a new business case from three weeks to only one week. Before they started automating their deployments, they could only start working on the data in the last (third) week, because the first two weeks were spent deploying big data components such as clusters and setting up cloud infrastructure.

By using an automation platform (Tengu.io), they were able to reduce the deployment time from two weeks to only a few hours, since the platform made it child's play to deploy whatever big data setup was required.

The result?

  • The big data team was able to reduce its 3-week sprints to 1-week sprints and, as such, validate 3x more business cases in the same amount of time.
  • Overworked system admins (who are not easy to find anyway) could be reassigned to more strategic tasks instead of doing repetitive deployment work.
  • The company is able to find operational optimizations and market trends much quicker than its competitors, because it can execute big data experiments three times faster than it used to.
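The "3x" figure follows directly from the sprint lengths. A minimal sketch of that arithmetic in Python, assuming one business case is validated per sprint and taking a hypothetical 12-week planning period (neither the period nor the function name comes from the company in question):

```python
def business_cases_validated(period_weeks: int, sprint_weeks: int) -> int:
    """Number of complete sprints, and so business cases, that fit in a period.

    Assumes exactly one business case is validated per sprint.
    """
    if sprint_weeks <= 0:
        raise ValueError("sprint_weeks must be positive")
    return period_weeks // sprint_weeks


if __name__ == "__main__":
    PERIOD_WEEKS = 12  # hypothetical planning period

    # Before automation: two weeks of deployment plus one week of data work.
    before = business_cases_validated(PERIOD_WEEKS, sprint_weeks=2 + 1)
    # After automation: deployment takes hours, so a sprint fits in one week.
    after = business_cases_validated(PERIOD_WEEKS, sprint_weeks=1)

    print(f"Before automation: {before} business cases in {PERIOD_WEEKS} weeks")
    print(f"After automation:  {after} business cases in {PERIOD_WEEKS} weeks")
    print(f"Speed-up: {after / before:.0f}x")
```

The gain comes entirely from removing the two deployment weeks; the week spent on the data itself stays the same.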

Big data teams should focus on data, not IT. With Tengu, there is no longer any need to manually install, configure or integrate the required technologies. This allows big data teams to focus immediately on conceptualization and business intelligence, and less on operations.

Topics: Big Data

Written by Daan Moreels
