Suppose we want to install an Apache Hadoop cluster with three Hadoop slaves. On top of it we deploy Spark, which can process the data stored on HDFS, and Zeppelin, which lets us visualize the data after it has been processed.
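As a minimal sketch of that layout, the three slaves would be listed by hostname in Hadoop's workers file on the master node. The hostnames below are hypothetical placeholders, not part of any real setup:

```
# etc/hadoop/workers on the master node
# (in Hadoop 2.x this file is named etc/hadoop/slaves)
hadoop-slave1
hadoop-slave2
hadoop-slave3
```

The master's start scripts read this file to know which nodes should run DataNode and NodeManager daemons.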
Data scientists and data engineers are in high demand these days. For instance, I see marketing agencies hastily hiring a "data scientist" for fear of missing the "big data train", or because they worry about losing business if they cannot prove they are sufficiently "data driven". For most other organisations, however, the main goal of hiring big data specialists is to optimize internal operations and to gain insights into customer behaviour and potential new trends.