If you want to learn data science, you have your choice of online classes, lectures and other resources to build your skills.
With so many training options available, though, it can be hard to determine where to focus your efforts. This guide will help you sort out how to start and what you need to know for a career change.
But whatever route you choose, don’t think you can learn it all in a weekend cram session. Those working in this broad field, which involves using computer programming and statistics to crunch numbers on a large scale, need to be expert critical thinkers and demonstrate a certain level of tech acumen. It may take months or even years to learn what you need to know, so make sure you enjoy the work.
“If you really like it, you get into and enjoy it so much that the time you have to spend really isn’t such a big deal,” says Joel Sokol, director of the Master of Science in Analytics program on campus and online at the Georgia Institute of Technology.
Data science might be for you if you’re interested in drawing insights from the patterns and trends you notice in data sets. You’ll need an inquisitive mind and a head for numbers as well as a knack for computer programming.
As a field, data science is relatively new. Many colleges didn’t offer data science programs until recently, so it’s not uncommon to see people who have “data scientist” or “data analyst” in their job title who didn’t receive formal training.
So how do you begin building your data science skills?
“The answer to that question really depends on the individual,” says Michael P. Cummings, director of the Master of Professional Studies program in Data Science and Analytics at the University of Maryland.
For example, a social sciences major might have taken college math courses like introductory probability and statistics but might lack computer programming experience. A person with a computing background may be an expert coder but have only basic stats knowledge.
“This is a sophisticated field. It’s nontrivial things to learn,” Cummings says. “That said, if someone has substantial background in some of the basic elements of it, their time may be accelerated.”
If you don’t have a strong foundation in math or computing, don’t be discouraged, says Joseph Nelson, distinguished faculty at General Assembly, which offers tech boot camps and classes, and co-founder of artificial intelligence startup Roboflow. You’ll need to spend more time getting up to speed, but your current career might have given you strong critical thinking skills that would serve you well in data science. Also, you likely have expertise in a subject, which can be an asset.
“The toughest things in data science aren’t necessarily writing the best code or having the most awareness of how matrix multiplication works,” he says. “Those things are important, but sometimes the principal problem is defining the problem to be solved.”
Whether you want to learn all you need through self-study or you’re planning to apply to a data science program, you need to master a few skills. Here’s where to begin.
Mathematics and Data Science
A strong math background is essential for data scientists. Cummings says learning basic statistics and probability is the best place to start. Both subjects serve as the foundation for a lot of data science.
As you delve deeper into the field, you’ll find that many algorithms, the sets of instructions you give the computer, use elements of calculus, so that’s important to understand as well. Finally, linear algebra is key for exploring topics like machine learning.
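To give a sense of what these subjects look like in practice, here is a minimal sketch in Python. The sales figures, prices and quantities are made up for illustration, and the example uses only Python’s built-in statistics module; real projects would likely work with far larger data sets.

```python
import statistics

# Hypothetical data: daily sales counts for eight days (made up for illustration).
daily_sales = [2, 4, 4, 4, 5, 5, 7, 9]

# Basic statistics: describe the center and spread of the data.
mean_sales = statistics.mean(daily_sales)      # average value
median_sales = statistics.median(daily_sales)  # middle value
spread = statistics.pstdev(daily_sales)        # population standard deviation

# Basic probability: the share of days with more than 4 sales.
p_busy_day = sum(1 for s in daily_sales if s > 4) / len(daily_sales)

# Linear algebra: a dot product combines two vectors into one number,
# here hypothetical unit prices times quantities sold.
prices = [3.0, 1.5, 2.0]
quantities = [10, 4, 6]
revenue = sum(p * q for p, q in zip(prices, quantities))

print(mean_sales, median_sales, spread, p_busy_day, revenue)
```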
Free resources are available to those who need a refresher on these subjects. Check out:
- Khan Academy, an organization that publishes free courses on a variety of subjects.
- Platforms like EdX and Coursera, which allow you to audit classes for free.
- MIT OpenCourseWare, a free repository of course materials from the Massachusetts Institute of Technology.
- Fast.ai, a free technology education platform that offers a linear algebra course geared toward those who already have coding skills.
Programming Languages and Data Science
Data scientists use programming languages to extract, reformat and help analyze their data. Among the most popular languages right now are:
- Python. This programming language has a reputation for being easier to learn, and its code uses a lot of English words, like “print” and “class.” If you’re hoping to become a Pythonista – industry lingo for Python enthusiasts – the Python Software Foundation has a list of free tools and tutorials to get beginners started.
- R. As with Python, novices may find R relatively easy to pick up. You can download it for free.
Both have additional packages that you can download to increase their capabilities. You can learn these languages by taking classes on platforms like DataCamp, EdX and Coursera or even by searching for lectures on YouTube.
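For a taste of how Python reads, here is a short sketch that uses the “print” and “class” keywords mentioned above. The Temperature class and its method are hypothetical names chosen for illustration, not part of any course or library.

```python
class Temperature:
    """A tiny class that stores a temperature in degrees Celsius."""

    def __init__(self, celsius):
        self.celsius = celsius

    def to_fahrenheit(self):
        # Standard conversion: multiply by 9/5, then add 32.
        return self.celsius * 9 / 5 + 32


reading = Temperature(100)
print("Boiling point in Fahrenheit:", reading.to_fahrenheit())
```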
In the short term, Sokol says knowing at least R or Python is important. That will change, though. “No matter what you’re using now, that’s not going to be the language or platform of choice even in the medium-term future,” he says.
It’s essential to know how to pick up new languages. Sokol says you can practice that skill by working on a data set with a language you don’t know. If you run into problems, you can post a question on Stack Overflow, an online community that even professional programmers use for troubleshooting.
As you’re learning a new programming language, consider why you’re doing what you’re doing and don’t get too hung up on the nitty-gritty of the language itself.
Indeed, it’s one thing to learn the syntax for a particular language, Cummings says. It’s another to recognize a problem or a question you can translate into something a computer can operate on. Those sorts of skills might not be covered in a class.
“The language is not the important thing,” Cummings says. “An elementary school child knows basic English. But do you want them to be writing corporate reports? Probably not.”
Once you’ve taken a programming language course, Sokol suggests taking a data pipelining course, which he teaches at Georgia Tech. He says this type of class takes you through the steps of data analysis – “from collecting and scraping the data you need, cleaning it, and storing it; through accessing the data for computation; to creating visualizations (like graphs, charts and other helpful images) of the data and results.”
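The steps Sokol describes can be sketched in a few dozen lines of Python. This is only an illustration under stated assumptions, not material from his course: the temperature readings are invented, the clean function is a hypothetical helper, an in-memory SQLite database stands in for real storage, and a text bar chart stands in for a proper graph.

```python
import csv
import io
import sqlite3

# Step 1: collect. A real project might scrape or download this;
# here a hypothetical CSV export stands in for the raw data.
RAW_CSV = """city,temperature_f
Atlanta,88
Atlanta,91
Chicago,75
Chicago,
Denver,n/a
Denver,80
"""


def clean(rows):
    """Step 2: keep only rows whose temperature parses as a number."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append((row["city"], float(row["temperature_f"])))
        except (TypeError, ValueError):
            continue  # skip blank or malformed readings
    return cleaned


rows = clean(csv.DictReader(io.StringIO(RAW_CSV)))

# Step 3: store the cleaned rows in a database (in memory, for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (city TEXT, temperature_f REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)

# Step 4: access the data for computation with a SQL query.
averages = conn.execute(
    "SELECT city, AVG(temperature_f) FROM readings GROUP BY city ORDER BY city"
).fetchall()
conn.close()

# Step 5: visualize. A text bar chart keeps the sketch dependency-free.
for city, avg in averages:
    print(f"{city:<8} {'#' * round(avg / 10)} {avg:.1f}")
```

In practice, each step would probably use a dedicated tool, such as a plotting library for the final charts, but the order of operations stays the same.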
Also consider taking a course on using data in the industry you are interested in, such as health care or business.
Machine Learning and Data Science
Critical to data science is machine learning, the idea that someone can train a computer to notice patterns in data.
Here’s a classic example, Nelson says: Say you want to find a way for a computer to tell the difference between pictures of dogs and pictures of cats. “If we were doing explicit programming – meaning a non-machine-learning method – we might say, ‘look for pointy ears, look for whiskers, look for this triangle-shaped nose.’”
On the other hand, you could give the computer a ton of pictures identified as either cats or dogs and tell it to create its own way of finding the differences between the two. It might come up with the same distinguishing factors you’d know about anyway. But it might find differences you hadn’t considered.
“That’s the power of machine learning,” Nelson says. “It discovers relationships that humans might not have discovered.” The task of the data scientist, he says, is to find the algorithms that allow a computer to make those rules.
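The contrast Nelson draws can be shown with a toy sketch in Python. Everything here is an assumption made for illustration: the two measurements (ear pointiness and snout length, each scored from 0 to 1) are invented stand-ins for real photos, and the “learning” is a very simple method that averages the labeled examples. Real image classifiers are far more sophisticated.

```python
# Hypothetical, hand-made measurements standing in for real photos:
# (ear_pointiness, snout_length), each scored from 0 to 1.
TRAINING_DATA = [
    ((0.9, 0.2), "cat"),
    ((0.8, 0.3), "cat"),
    ((0.7, 0.1), "cat"),
    ((0.3, 0.8), "dog"),
    ((0.4, 0.7), "dog"),
    ((0.2, 0.9), "dog"),
]


def explicit_rule(features):
    """Explicit programming: a person writes the rule by hand."""
    ear_pointiness, _snout_length = features
    return "cat" if ear_pointiness > 0.5 else "dog"


def train(examples):
    """Machine learning: average each label's measurements from labeled examples."""
    sums, counts = {}, {}
    for features, label in examples:
        totals = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            totals[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [t / counts[label] for t in totals] for label, totals in sums.items()}


def predict(centroids, features):
    """Pick the label whose average example is closest to the new one."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(center, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


centroids = train(TRAINING_DATA)
new_animal = (0.6, 0.8)  # somewhat pointy ears, but a long snout
print("Hand-written rule says:", explicit_rule(new_animal))
print("Learned model says:", predict(centroids, new_animal))
```

For this made-up animal, the hand-written rule looks only at the ears and answers “cat,” while the learned model also weighs snout length, a pattern it picked up from the examples, and answers “dog.”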
You probably encounter machine learning more than you’d think: For instance, machine learning connects you with shows on Netflix you might like, and Gmail uses it to make its spam filter more effective.
Business and Communications Skills
In data science, running the numbers is only half the job. Data scientists need to know how to ask the right questions and understand what sort of analysis would be valuable to the business. You must also find ways to explain your results to less tech-savvy colleagues who need to use your analysis to make decisions.
“Those are sort of hidden skills that people don’t talk about so much,” Sokol says. “But I think they’re really important.”
To practice your communications skills, you could maintain a blog that describes the work you’re doing. You might even consider taking a class in public speaking.
Here are some skills and tools that data scientists may need when they’re applying for jobs.
Top Skills and Attributes Listed as Requirements for Data Science Jobs
- Technical skills
- Computer Science
- Data Analytics
- Big Data
- Machine Learning
Source: ZipRecruiter, based on data science job postings from between Jan. 1 and May 15, 2020.
Keep in mind, though: Data science is a broad field, so the tools you use may vary depending on where you work and what your job function is. For example, Cummings says some data scientists may never use SQL, a common database querying language, but often use R. Also, which tools are popular changes constantly.
“The main thing is being able to translate a problem that’s been articulated to you … into a computational problem that addresses the specific needs of the audience,” Cummings says.
The first step is determining whether you like working with data, Nelson says. Usually that means doing some studying on your own, even if you plan on pursuing a master’s degree or enrolling in a boot camp later.
As you shop for online courses, check out their ratings and ask people for recommendations. Then, try things out.
Self-directed study might be enough for you if you’re hoping to make the transition to a more data-focused role inside your current company, Cummings says. You might take on more responsibilities as your skills develop, which may not require a formal credential.
“History is filled with people that have made contributions to art, humanities, all sorts of fields and had little or no formal training or no degree,” Cummings says.
However, he adds, “the certification becomes important because that tells potential employers that you’ve met certain standards.”
So, a boot camp or a master’s degree in data science may be worth considering when you’re making a more dramatic switch. If you’re weighing more formal education options, think about what’s most desirable in the industry you wish to work in, what you can afford, how much time you can spend completing your education, the quality of the program and whether it covers the topics you’re interested in.
“If someone is looking for the right credential, looking a little deeper than the name is important. That’s true at every level, from the individual course to the boot camp to the certificate to the degree,” Sokol says. “Look beyond the name and see what’s actually being taught.”
The costs of online data classes vary.
Many are available for free, particularly if you opt to audit the course, while others are on platforms that charge as much as $400 a month. You may be able to try out courses with a free trial. Some courses may be part of a track that leads to a certification or a microdegree, a credential in a particular subject that you may be able to count toward a full degree later.
For online master’s programs in data science at a major university, total tuition ranges from $9,900 to more than $42,000. Data science boot camps cost around $8,940 to $16,000. A boot camp may take only a couple of months to finish, while a master’s program could take a year or longer, depending on the program and whether you attend classes part time or full time.
Whether you’re ready to make the switch is not always clear. “It’s sort of hard to go by feeling,” Sokol says, but if you look at a posting and think, “I could do that” as you read what you’d be responsible for, “then, you’re probably ready.”
The good news is that data scientists have been in demand in recent years. “Before COVID-19, data scientist was one of the fastest-growing occupations,” says Julia Pollak, labor economist for the job site ZipRecruiter. Nationwide, job openings rose 21% between 2017 and 2018, and 86% between 2018 and 2019.
Though the number of data science jobs has experienced a steeper-than-average drop in the last few months, Pollak says that number is “very likely to rebound.”
“As more activities shift online, businesses will be able to capture even more consumer data,” she says. “Those best at analyzing and using the data will be at a distinct advantage in the future. So the business case for hiring data scientists will only get stronger.”
When applying for data science jobs, you need a resume. Peter Cooman, senior applied data scientist at Civis Analytics, a Chicago-based data science firm, also recommends having a portfolio that not only shows your technical skills but also includes blog posts and other items that demonstrate your ability to communicate.
“It’s by no means a requirement,” Cooman says of a portfolio, but it can help make the case that you’re well-suited for the job. You can host your portfolio on a personal website or GitHub, a platform that allows users to share code.
During the application process, data scientist hopefuls typically complete a take-home test to demonstrate their technical proficiency, Cooman says.
Candidates who do well come in for an interview. Cooman looks for people with strong communication and problem-solving skills, enthusiasm for the work and experience that would complement the existing team.
Where you learned the skills is less of a factor, says Cooman, a biomedical engineer by training who taught himself data science through Coursera and EdX courses.
“The people who are active in the data science workforce, like me, who are interviewing candidates and who are making hiring decisions, became data scientists without any formal academic training,” he says. “We understand people’s plight. We understand there are many different paths people can take to get into the field.”
When vetting candidates, Ashok Srivastava, senior vice president and chief data officer at Intuit, says he prioritizes technical know-how over a particular type of education.
Regardless of where they learned their skills, applicants at the financial software company are tested on their knowledge of math, statistics, computer science, machine learning and other areas related to data science.
“In today’s world, there are many ways to learn,” Srivastava says. “We put the priority on the depth and breadth of the applicant’s knowledge, rather than how they acquired that knowledge.”