Copywriting example
Data Science vs. Data Engineering: What Is the Difference?
It all really started with the tabulating machine invented by Herman Hollerith for the 1890 US Census. Little did Mr. Hollerith know what it would lead to 130 years later. The transition from vacuum tubes to transistors in the 1960s marked the puberty of the Data Era. The arrival of the Internet in the 1980s marked its coming of age, the adulthood of the information age. Since then, the appetite for vast amounts of data has become the narcotic of our society.
While data science and data engineering share some overarching functions, they are two distinct disciplines. Data engineers serve data scientists by providing the structure needed for their work. Data scientists depend on the systems built by data engineers to work their magic.
The flow of data is like a river. Picture the Nile River as a giant flow of data. Some of it is used for agricultural irrigation, some for tourism, and some by Egyptian citizens for domestic purposes: drinking water, bathing, sewage, and so on. It is also used to generate electricity at hydroelectric dams. Each of these uses has its own purpose, and different data needs to be produced to manage it. Data scientists decide what data is available, or how to create it, and how it can be used to manage the various resources. Data engineers then build the infrastructure based on the models that data scientists have created. The irrigation canals, water treatment plants, hydroelectric dams, drinking water distribution services, and sewage removal networks are all distinct systems, each generating its own types of data that can be used to manage it.
What are data scientists?
“These professionals are well-rounded, data-driven individuals with high-level technical skills who are capable of building complex quantitative algorithms to organize and synthesize large amounts of information used to answer questions and drive strategy in their organization. This is coupled with the experience in communication and leadership needed to deliver tangible results to various stakeholders across an organization or business,” says the UC Berkeley School of Information, an educational leader located in the San Francisco Bay Area.
To put it in simpler terms: Data scientists build digital models of problems and their solutions using whatever data is available.
They all use the following principles and tools to accomplish their work:
The scientific method is fundamental to Data Science
All branches of science, including chemistry, biology, medical research, and agriculture, use this method to organize their work, and data science is no exception. It helps control and manage processes and systems in a logical way in order to reach a reliable result.
1. Make an observation.
2. Ask a question.
3. Form a hypothesis, or an explanation that can be tested.
4. Make a prediction according to the hypothesis.
5. Test the prediction.
6. Iterate: use the results to make another hypothesis.
7. Start the cycle over again.
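As a rough illustration of steps 3 through 5, here is a minimal Python sketch. It is not taken from any of the sources quoted in this article: the sign-up numbers are invented, `permutation_test` is a hypothetical helper name, and the 0.05 cut-off is just a common convention. The idea is to state a hypothesis (the new layout brings more sign-ups), predict what the data should look like if it is true, and then test that prediction by checking how often random relabeling of the data produces a gap at least as large.

```python
import random
from statistics import mean


def permutation_test(control, treated, n_shuffles=2000, seed=0):
    """Return the observed mean gap and an estimated p-value for it."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = mean(treated) - mean(control)
    pooled = list(control) + list(treated)
    n_control = len(control)
    hits = 0
    for _ in range(n_shuffles):
        # Shuffle the pooled values, then relabel: first part "control", rest "treated".
        rng.shuffle(pooled)
        gap = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if gap >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p-value of 0.
    return observed, (hits + 1) / (n_shuffles + 1)


# Steps 1-2 (observation, question): invented daily sign-up counts.
old_layout = [1, 2, 3, 4, 5]
new_layout = [11, 12, 13, 14, 15]

# Steps 3-4 (hypothesis, prediction): the new layout yields more sign-ups,
# so a gap this large should rarely appear by chance.
gap, p_value = permutation_test(old_layout, new_layout)

# Step 5 (test): compare against the assumed 0.05 threshold.
supported = p_value < 0.05
print(f"gap={gap}, p-value={p_value:.4f}, hypothesis supported: {supported}")
```

If the hypothesis is not supported, steps 6 and 7 apply: form a new hypothesis and run the test again.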
Processes are important too
While each process is almost always tailored to a specific goal, processes usually share similarities. Each one can differ from the one before and the one after, depending on the outcome needed.
Algorithms used in the field of Data Science
Then there are some standard algorithms that are commonly used, as well as bespoke ones tailored to the task.
Watch these YouTube videos for more:
https://www.youtube.com/watch?v=X3paOmcrTjQ&ab_channel=Simplilearn
https://youtu.be/xC-c7E5PK0Y
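To make "standard algorithm" a little more concrete, here is a minimal sketch of one widely used example, ordinary least-squares linear regression, written in plain Python. The function name `fit_line` and the numbers are made up for illustration; in practice a data scientist would more likely reach for an established library than write this by hand.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations (not divided by n).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: hours a pump has run vs. units of water delivered.
hours = [1, 2, 3, 4]
units = [3, 5, 7, 9]

slope, intercept = fit_line(hours, units)
predicted = slope * 5 + intercept  # estimate for a fifth hour
print(slope, intercept, predicted)
```

A bespoke algorithm would follow the same pattern, taking data in and returning a model, but with logic tailored to the task at hand.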
Now that we have discussed what data scientists do, let’s look into data engineering and see what some of the considerations are, and some of the ways that data engineers deliver their parts of the solutions.
Data Engineering
Robert Chang, a prominent Data Engineer from Airbnb, says that Data Engineering is “the Close Cousin to Data Science.”
Dr. Bhushan Kapoor of datasciencegraduateprograms.com offers a longer, more in-depth description:
“Data engineers focus on the applications and harvesting of big data. Their role doesn’t include a great deal of analysis or experimental design. Instead, they are out where the rubber meets the road (literally, in the case of self-driving vehicles), creating interfaces and mechanisms for the flow and access of information. They may be experts in:
System architecture
Programming
Database design and configuration
Interface and sensor configuration
Although data engineers don’t always get the glory of coming up with crazy insights by querying and combining big data sources, their work is important in building the data stores that are used in that work, and in taking those insights and putting them to practical use.”
Data engineers design and build the various systems that actually collect, store, and process data and produce some useful output.
Let’s start with some of the standard skills and tools they use to do their jobs.
Main Data Engineering Skills
Basic Language: Python.
Extensive Knowledge of Operating Systems.
Complete Database Knowledge – SQL and NoSQL.
Data Warehousing – Hadoop, MapReduce, Hive, Pig, Apache Spark, Kafka, and Amazon Web Services (AWS).
Basic Machine Learning Familiarity.
Two of the most important decisions in a solution that a data engineer creates are how to store the data and which operating system to use.
Whether it’s for an Amazon warehouse or a local accounting firm, it is critical not only to store information but also to make sure it is automatically backed up using more than one state-of-the-art technique, and to design systems so that various users can retrieve data in a useful and interactive format. Storing data in a central system built for this kind of retrieval is often referred to as Data Warehousing.
A few of the things Data Engineers create:
Data Pipelines
Data Modeling for Streaming Platforms
Data Modeling
Data Lakes
Data Warehouses
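To give a feel for the first item on that list, here is a minimal sketch of a data pipeline using only Python's standard library. Everything in it is an assumption made for illustration: the sensor names, the readings, the `readings` table, and the function names `extract`, `transform`, and `load`. Real pipelines usually run on tools such as those listed earlier, but the shape is the same: pull raw data in, clean it, and load it somewhere it can be queried.

```python
import csv
import io
import sqlite3

# Invented raw input; note the missing reading on the second data row.
RAW_CSV = """sensor,reading
pump_a,10.5
pump_b,
pump_a,11.5
pump_b,20.0
"""


def extract(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows with a missing reading and convert the rest to floats."""
    return [(row["sensor"], float(row["reading"])) for row in rows if row["reading"]]


def load(records, conn):
    """Store the cleaned records in a SQL table."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (sensor TEXT, reading REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", records)
    conn.commit()


# An in-memory database stands in for a real data store.
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# A downstream user, such as a data scientist, can now query clean data.
averages = dict(
    conn.execute("SELECT sensor, AVG(reading) FROM readings GROUP BY sensor")
)
print(averages)
```

Keeping the three stages as separate functions is a design choice, not a requirement; it makes each stage easy to test or replace on its own.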
Examples of Data Engineering Projects by Market Leaders
Now let’s examine an example of Data Engineering to see what exciting projects are underway at market leaders.
SpaceX
Virtually every system on SpaceX vehicles such as the Falcon Heavy rocket and the Crew Dragon capsule requires a team of Data Scientists and a team of Data Engineers. From the propulsion system to the toilet, everything produces critical data that needs to be managed. Some of it concerns life-safety issues, and some concerns health and recycling.
Of course, the ultimate goal is to put people on Mars. This will expand the roles of Data Scientists and Data Engineers dramatically. For now, the aim is just to test and perfect the systems over short durations, but when humans are able to make the journey to the red planet there will be far more data to manage. Everything from food to psychological issues to entertainment will produce data managed by systems that Data Engineers design.
https://www.express.co.uk/news/science/-/spacex-crew-dragon-iss-how-do-astronauts-go-to-the-toilet
Follow this link to examine some contemporary projects using:
1. Data pipelines with Apache Airflow
2. Data Lakes with Apache Spark
3. Build a production-grade data pipeline using Airflow
https://shravan-kuchkula.github.io/data-engineering/#1-data-pipelines-with-apache-airflow
The Future of Data Science and Engineering
Whatever your position on the modern state of data and its collection and use, for good or for bad (in my mind the good outweighs the bad), there are several undeniable truths. Big Data isn’t going away; it has become ubiquitous, and the need to manage it is fundamental and critical.
It has created dozens of new job titles since the 1970s and will continue to do so. With the IoT in full swing and Quantum Computing on the horizon, Data Science and Data Engineering are bound to rush forward hand in hand. A dozen more sub-specialties have been created in the past decade alone, and they have blended into an almost homogenized field. Who knows what the subcategories of the future will be?
While researching this piece, I found many articles in which Data Engineers referred to themselves as Data Scientists and vice versa. However, this reflected a perception that they had to be masters of both, not a fusion of the two jobs. Although some experts may master both, the two fields, Data Science and Data Engineering, will always rightly be separate disciplines.
I hope this article has explained the differences and has been entertaining, informative, and hopeful.
https://ischoolonline.berkeley.edu/data-science/what-is-data-science/
https://medium.com/@rchang/a-beginners-guide-to-data-engineering-part-i-4227c5c457d7
https://www.datasciencegraduateprograms.com/data-engineering/
https://www.kdnuggets.com/2016/03/data-science-process.html