From “data is the new oil” to the Data Scientist as a highly coveted role, Data Science is probably the most popular buzzword of the decade.
In this post, we summarize the challenges that data science enthusiasts face when they set out to become successful Data Scientists.
We then list the key ingredients for making impactful contributions, citing advice from industry experts along the way.
Popularity of Data Science
- Data is ubiquitous, and so is Data Science. From healthcare and education management to finance and Ad Tech, Data Science is industry-agnostic.
- We can leverage the power of data in tandem with tools from Math, Statistical Analysis and Machine Learning.
- We can generate insights and solve problems at scale.
Challenges of an Aspiring Data Scientist
Almost all of us have been ‘Data Science Enthusiasts’ and ‘Aspiring Data Scientists’ at some point. This widespread interest also makes the job search highly competitive.
For a beginner who is just looking to get started, the umpteen sources of information, online courses, and an almost infinitely long set of good-to-have skills can often be overwhelming.
The gap between the number of high-impact data science positions and the number of aspiring data scientists also deserves attention. Which foundational skills one should master to build a successful Data Science career is a widely discussed question in itself.
As part of the never-ending skill requirements, there is often a need:
- to be familiar with Math and core Machine Learning,
- to be able to develop models that are production-ready, and
- to have a certain degree of domain expertise to build solutions that actually are meaningful and impactful.
In this post, we highlight advice for aspiring data scientists from some of the most eminent personalities in the Data Science community, along with useful resources and insights, all cherry-picked from the ongoing interview series at AI Time Journal.
Importance of Domain Expertise
Let’s start from the fact that you could work as a data scientist in any industry out there. For example, Healthcare and Education management are disparate fields, but you could be a data scientist in either of them. This is where domain expertise comes in.
Is domain expertise important for building effective solutions? It definitely is. But how much domain expertise is enough? Let’s see what our guests had to say.
Christina Stathopoulos, Analytical Lead at Waze and an Adjunct Professor at IE Business School, shares the following on the importance of domain knowledge:
“Domain knowledge is very important but not always necessary to get started. I have worked in industries as diverse as travel, telecom, security and FMCG. In each case, I started with minimal domain knowledge and one of the biggest challenges was bringing myself up to speed with the industry dynamics. It is vital as a data scientist that you understand the industry that you form a part of in order to truly understand the meaning behind the data you work with.”
Sayak Paul from PyImageSearch shares a similar opinion on the importance of domain knowledge in data science:
“Domain knowledge is extremely important for effectively working through a project, particularly in this field. It helps to gauge what is and is not possible to achieve realistically. With an abundance of data, people often think if we throw them at a learning algorithm, we would get expected results for the task at hand. This is where domain knowledge comes right in helping to determine what level of data preprocessing is required in order to correctly feed that data to the algorithm.”
In other words, it is natural to start with minimal domain knowledge, but getting up to speed quickly on the industry and its recent advances is essential to truly understanding the problem and building effective solutions.
Essential Data Skills – Engineering and Data Fluency
Let’s now move on to the skills that prospective candidates should have. As noted earlier, the list is too long to master in a lifetime. So let’s hear from the experts on which skills are indispensable and deserve diligent effort.
Konstantin Golyaev, Data Science Manager at Microsoft, shared the following in his interview with AI Time Journal.
“My team tends to hire people with a fairly specific set of skills at the intersection of data science and ML engineering.”
While a working knowledge of tools and frameworks is handy for many tasks, understanding what goes on under the hood is important for a complete picture. Here is what Konstantin had to say about coding ML algorithms from scratch.
“One of the primary things we seek in candidates is a track record of building ML models from scratch. We don’t expect every candidate to be able to train a new GPT-3 model, and we like to reuse existing solutions whenever possible. But people who succeed on our DS team can recreate existing solutions from scratch if it becomes truly necessary. To use an analogy: if we’re in the business of shipping vehicles to customers, then my team seeks to hire mechanics rather than just drivers.”
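To make the “mechanics rather than drivers” idea concrete, here is a minimal sketch of what building a model from scratch can look like: a one-variable linear regression fitted by batch gradient descent, using only the Python standard library. The function name, learning rate, and toy data are illustrative assumptions and do not come from the interview.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration; hyperparameters are arbitrary
    values that happen to converge on the toy data below.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for this example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_regression(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
```

On this noiseless data, the fitted slope and intercept should land very close to 2 and 1. In practice one would reach for an existing library, but writing the update rule by hand shows exactly what such a library does internally.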
Understanding the math behind the algorithms is therefore beneficial. But why does familiarity with machine learning engineering matter? Here’s Konstantin’s answer to this very relevant question.
“Another skill we look for is being comfortable with the engineering aspects of data science. We don’t develop awesome predictive models just to see them get stale inside Jupyter notebooks. Instead, we modularize the model code and develop a battery of end-to-end tests so that we could deploy the models into production using CI/CD workflows.”
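As a rough illustration of modularized model code with an end-to-end test, the sketch below splits a toy pipeline into small functions and adds one test that exercises the whole flow. Everything here (the `MeanModel` class, the function names, the data layout) is a hypothetical example and not a description of the team’s actual codebase.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class MeanModel:
    """Toy model that always predicts the mean target seen in training."""
    value: float = 0.0


def clean(rows):
    """Drop rows whose target is missing."""
    return [r for r in rows if r.get("target") is not None]


def train(rows):
    """Fit the toy model on cleaned rows."""
    return MeanModel(value=mean(r["target"] for r in rows))


def predict(model, rows):
    """Return one prediction per input row."""
    return [model.value for _ in rows]


def run_pipeline(raw_rows):
    """Single entry point: clean, train, then predict on the cleaned rows."""
    rows = clean(raw_rows)
    model = train(rows)
    return model, predict(model, rows)


def test_pipeline_end_to_end():
    """End-to-end check that raw input flows through to predictions."""
    raw = [{"target": 1.0}, {"target": None}, {"target": 3.0}]
    model, preds = run_pipeline(raw)
    assert model.value == 2.0
    assert preds == [2.0, 2.0]


test_pipeline_end_to_end()
```

Because each step is a plain function, a test runner such as pytest can pick up `test_pipeline_end_to_end` and run it on every commit, which is the kind of check a CI/CD workflow relies on before deployment.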
There’s no data science without data. What skills do prospective candidates need to work comfortably with large amounts of data?
“Finally, I know it’s not a particularly sexy skill, but I’m still convinced that the biggest return on invested learning effort in Data Science comes from getting good at SQL. It is almost impossible to avoid interacting with some sort of database if you’re working on a real business problem, and all databases speak SQL. Most data preparation and cleaning logic can and should be implemented inside a database (read: in SQL) whenever possible,” says Konstantin.
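To illustrate what pushing preparation and cleaning logic into SQL can look like, here is a small sketch that uses Python’s built-in `sqlite3` module with an in-memory database. The `raw_orders` table, its columns, and the sample rows are made up for this example.

```python
import sqlite3

# In-memory database with a deliberately messy, hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [("  Alice ", 10.0), ("alice", 5.0), ("Bob", None), ("BOB", 7.5)],
)

# Cleaning done inside the database: trim whitespace, normalize case,
# drop rows with missing amounts, and aggregate per customer.
cleaned = conn.execute(
    """
    SELECT LOWER(TRIM(customer)) AS customer_name,
           SUM(amount)           AS total_amount
    FROM raw_orders
    WHERE amount IS NOT NULL
    GROUP BY LOWER(TRIM(customer))
    ORDER BY customer_name
    """
).fetchall()

print(cleaned)  # [('alice', 15.0), ('bob', 7.5)]
conn.close()
```

The Python side only sends the query and receives tidy rows, so the same `SELECT` could in principle run against a production database with little change beyond dialect details.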
Ioannis Tsamardinos, Co-founder and CEO of JADBio, also shares his insights on what he thinks makes an ideal data scientist.
“They do not call Data Science a science by accident. Analyzing data always has a research component. You are discovering the patterns, you are exploring. You do not necessarily always know what exactly you are looking for, or what is interesting, or what is the best way to visualize results. So, some major skills for a data scientist are to be inquisitive, take the initiative, and be research-oriented. The ideal data scientist to me is a scientist, an artist, and a researcher.”
How beautiful and insightful is that! It sums up all the traits of a good data scientist in just a few words.
Useful Resources to Master the Fundamentals
On a final note, let’s list a few resources that the guests have suggested, which could benefit anyone seeking strong foundational skills.
Ioannis Tsamardinos gives the following advice to all data enthusiasts.
“I am not going to recommend the next Python library to use or an R package or a new programming language for machine learning. I would recommend young data scientists learn statistics and machine learning theory behind the tools they use. I would recommend reading classic books like ‘The Elements of Statistical Learning’ by Friedman et al. Do not get caught up with the hype of Deep Learning alone. One must have a more complete background that includes basic statistical and machine learning concepts and techniques. Learning causal discovery and modeling completely changes your view and perspective of data science and is definitely worth it. The book ‘Causation, Prediction, and Search’ is a must on the subject.”
Sayak Paul recommends the following set of books and courses that would together facilitate a great learning path.
- Grokking Deep Learning (Andrew Trask)
- Deep Learning with Python (François Chollet)
- Data Science from Scratch (Joel Grus)
- Automate The Boring Stuff with Python (Al Sweigart)
- Practical Deep Learning for Coders (fastai)
- Deep Learning Specialization (DeepLearning.AI)
Thank you for reading. We hope you enjoyed the post and found a few key takeaways on the qualities and skills that together pave the path to becoming a successful data scientist.
References
This post draws heavily on the following interviews and includes quotes from the guests.
[2] Interview with Sayak Paul, Deep Learning Associate, PyImageSearch
[3] Interview with Konstantin Golyaev, Data Science Manager, Microsoft
[4] Interview with Ioannis Tsamardinos, CEO & Co-Founder, JADBio