Data Scientist: Industry Reaches “Maturity Point” as Specializations Grow
Data scientists are essential employees, in the sense that every industry needs them now.
When asked if we’ve gotten to a point where every company will employ or need someone in data science – if they haven’t already – Michelle McSweeney, Data Science Domain Lead at Codecademy, answered simply and categorically.
“Yes. Full stop. I don’t think I can stress this enough: almost every job is a data job.”
And there are plenty of those jobs available right now. In fact, Glassdoor listed more than 10,000 data scientist job openings at the time of publication. The site also ranked data scientist third on its list of the 50 Best Jobs in America for 2022, behind enterprise architect and full-stack engineer, which ranked first and second.
“Data is so ingrained in our lives at this point that I don’t think there’s a single industry that isn’t impacted by data. There is not a single company that is untouched by data,” McSweeney said.
“I have a very broad view of this,” she continued. “If you’re a restaurant and you collect paper receipts, and that’s how you do your taxes, that’s still data. It might not be digitized, and you might be more efficient if you digitized it, but it’s still data. So I don’t think there’s a single thing we do that doesn’t involve data on some level.”
As with any other professional skill set, specializations exist in data science, and they matter to employers and job seekers. This shift toward specialization, McSweeney said, is an inflection point for the data science industry.
That’s one of the reasons Codecademy introduced four new data science career learning paths in June: Machine Learning, Data Analytics, Inference, and Natural Language Processing. Codecademy says it has analyzed job criteria and actual roles within major tech companies to shape the programs.
The e-learning platform said 2.3 million people have enrolled in its data science career path over the past two years.
In a recent conversation with ZDNet, McSweeney talked about universal skills for data scientists, the content and direction of the new specializations, and the value of asynchronous learning.
Here is our interview. It has been condensed and edited.
Data science reaches “a point of maturity”
Michelle McSweeney: I think data science has reached that point of maturity.
In 2018, Elena Grewal, who was director of data science at Airbnb, published a blog post on LinkedIn explaining how she organized her data science team. There was a lot of discussion about it afterward, with refinements and iterations, but ultimately it boiled down to the idea that there are three general flavors of data science.
There’s inference-based data science, there’s analytics-based data science, and there’s machine learning.
These terms don’t mean a whole lot to people outside of data science. And even within the field, it’s hard to say what flavor of data science you do, because it’s simply what you do. [If] a company says, “I need a data scientist, I’m going to put out a call for a data scientist,” they don’t even realize they’re looking for one slice of this pie.
A closer look: analytics, inference and machine learning
McSweeney: Analytics focuses a lot on SQL; it focuses a lot on answering direct questions with data. I like to describe it as answering the “what” questions: What happened? What’s going on? What is it?
Then there are the inference-based data scientists. These are the people who answer the “why” questions. They do a lot of statistics – they can work in R, they can work in SQL, they can work in Python – but ultimately their goal is statistical analysis of data in various ways. They are the ones doing the A/B testing; they are the ones who do the hypothesis testing.
When we talk about [how] data science is just a fancy word for statistics, we are talking about inference-based data scientists. That’s the one flavor people mean. They answer questions about why.
And then we have the machine learning data scientists… There was this Harvard Business Review article from 2012 [about how] data science is going to be the sexiest career [of the 21st century].
They were talking about machine learning-based data scientists. … They are the ones who push right up against the limits of what computers can do today. We couldn’t do machine learning 20 years ago because our computers weren’t powerful enough.
And that’s where machine learning comes in. It’s where the power of computers and the power of data have finally come together. These data scientists answer questions about the future: what is going to happen. They answer the predictive questions.
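To make the A/B testing that McSweeney attributes to inference-based data scientists concrete, here is a minimal sketch of a two-proportion z-test in Python. It is an illustration only, not Codecademy’s curriculum or any company’s method; the function name and the conversion counts are hypothetical, invented for this example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: variant A converts 200 of 2,000 visitors, variant B 250 of 2,000.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be chance alone, which is the kind of “why” evidence the inference role is asked to provide.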
Programming and statistics: the universal skills of data science
McSweeney: [All Codecademy learning paths include] a foundation in programming. No matter what. And it’s both Python and SQL. SQL is for databases, and Python lets you write free-form programs. So that’s the foundation that everyone has. Everyone also gets a foundation in data visualization and a bit of a specific Python package called Pandas.
Pandas is an invaluable tool for data science because it allows you to work programmatically with what is essentially a table of data. So we don’t go in depth, but everyone gets a solid foundation.
[And] even if you are the most technical data scientist, you will still need to communicate the results of your experiments to stakeholders. It is therefore important that everyone has this foundation in communication.
There are also basic statistics for everyone, because every data scientist needs to understand how to run summary statistics, like getting the average of something. [There’s also a focus on] standard deviations and distributions because when we start thinking about data, once you think about it technically, you have to think about it statistically. The two are deeply linked.
Even if you don’t get into heavy statistics, this statistical foundation is what differentiates a data scientist from someone who can manipulate data in a table. It’s what takes you from being a programmer to being a data scientist.
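As a rough illustration of the summary statistics McSweeney describes, the sketch below computes an average, a median and a standard deviation using only Python’s standard `statistics` module. The daily order counts are hypothetical and invented for this example; they are not from Codecademy or the interview.

```python
import statistics

# Hypothetical daily order counts for one week, invented for illustration.
# One unusually busy day (40) is included on purpose.
daily_orders = [12, 15, 11, 14, 13, 40, 12]

mean = statistics.mean(daily_orders)      # the "average of something"
median = statistics.median(daily_orders)  # middle value, less sensitive to the busy day
std_dev = statistics.stdev(daily_orders)  # sample standard deviation

print(f"mean = {mean:.1f}, median = {median}, standard deviation = {std_dev:.1f}")
```

The gap between the mean and the median here comes from a single outlier, which is one reason distributions matter as much as averages when thinking about data statistically.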
Data science is not one-size-fits-all
McSweeney: Not all organizations do all three of those things, right? Not all organizations need to use their data to make predictions or to make recommendations.
Most organizations need an analytics-driven data scientist. Most organizations need to be able to look at their data and say, “What’s going on?” That’s analytics. Not everyone does A/B testing.
Not everyone needs an inference person. Not everyone needs a machine learning person.
So being able to focus on the type of data science that an organization does, and on the skills that an individual needs to contribute to that organization, allows people to focus, I think, and get into a career much faster because they’re not trying to learn everything.
You don’t have to be a jack of all trades.
Opportunities for beginners and advanced learners
McSweeney: The majority of our learners are entry-level job seekers. So, it’s someone who starts from zero and says, “I want to land a job as soon as possible.” And that’s who these career paths are really for.
We have other products in development to more specifically target people who are upskilling. … Just over a third of our learners are upskillers. They usually don’t follow career paths from start to finish.
When I look at user progress data and see cohorts jumping in at the middle – or jumping in at the end – I think, “Those are my upskillers.” And that’s one of the cool things about these career paths: You can start at the beginning and go all the way to the end, have all the skills you need for the job and get a certificate for that, or you can choose what is actually relevant to you and fill in your knowledge gaps.
The future of learning is asynchronous
McSweeney: I really think asynchronous [learning] is just as valid as synchronous, probably more so.
I taught in the classroom for a very, very long time, and I taught across STEM. Teaching programming skills synchronously is terrible because it excludes so many people. Invariably there will be someone who is fast and someone who is slow. And that creates a lot of conflict in the synchronous experience.
But learning these skills asynchronously – if you understand programming very quickly? Brilliant, fantastic. You can move through content faster. If it takes you longer, that’s fine too. You can sit down and spend some time thinking about these ideas.
I support asynchronous learning so enthusiastically because I think it gets more people interested in the content, and especially in data science, which is very good, stable work. It gives more people from different backgrounds access than a synchronous classroom experience does.
That doesn’t mean I think we should close all the schools — that’s not what I’m saying.
But I am saying that, particularly [for] data science skills, where there are so many different aspects and people are thinking about these things and learning these skills at such different rates, [asynchronous learning] allows different people at different times in their lives to have access to a career in data science.