We hear about data science all the time. Harvard Business Review called it the sexiest job of the 21st century, Glassdoor has repeatedly ranked it the best job in the USA, and every week a new business publication is singing its praises. In fact, if you’re reading this, there’s a good chance that you’re looking to find the right school to teach you data science.
The weird thing about data science, though, is that nobody can say exactly what it is: data science isn’t a well-defined term. We know, on some level, that it involves data analysis and programming, but we also know that there’s more to it than that.
Prospective students and seasoned data scientists alike find this frustrating. After all, if you’re looking to get into a field, generally you’d like to know exactly what you need to learn and how it all ties together. If you’re working in a field, it’s nice to be able to give a clear definition of what you do and why it’s important.
Now, people are starting to define data science. After working for months with a group of experts, Thinkful has put together a big answer to the perennial question: what is data science? It examines the history of data, looks at the ways companies are using big data and explains the key techniques data scientists are leveraging to make sense of it all.
While you should definitely check out the full piece, we want to share a few of the key aspects of data science, according to the article:
First, there’s data analysis, the bedrock. Most of us have worked with a spreadsheet program, typically Microsoft Excel (or, if you’re trendy, Google Sheets). This is small-scale data analysis. You bring in your information, organize it into a useful set of rows and columns, and then take it apart to gain some insights. Usually, you’ll present your findings visually, with charts.
But what happens when you have so many rows that you spend most of your time scrolling or just waiting for the thing to load?
That’s when you take the next step. Learning SQL lets you store all of that information in a database and pull only what you need. Understanding Python or R — the two top programming languages for data work — will help you process that data more efficiently and quickly than you ever could with a spreadsheet.
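To make that step concrete, here is a minimal sketch of the idea using Python’s built-in `sqlite3` module. The `sales` table, its columns, the sample numbers, and the `revenue_for` helper are all made up for illustration; a real project would load its own data and likely use a larger database.

```python
import sqlite3
from statistics import mean

# Hypothetical sales records, invented for this example.
rows = [
    ("north", "2023-01", 120.0),
    ("north", "2023-02", 135.5),
    ("south", "2023-01", 98.0),
    ("south", "2023-02", 110.25),
    ("west", "2023-01", 143.0),
]

# Store everything in a database (in memory here, so nothing is written to disk).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


def revenue_for(region):
    """Pull only the rows we need: the revenue figures for one region."""
    cursor = conn.execute(
        "SELECT revenue FROM sales WHERE region = ? ORDER BY month",
        (region,),
    )
    return [row[0] for row in cursor.fetchall()]


# SQL narrows the data down; Python then does the analysis.
north = revenue_for("north")
north_average = mean(north)
print(north, north_average)
```

The SQL query does the filtering, so Python only ever sees the handful of rows it needs, no matter how large the table gets.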
Building on top of that, we have machine learning. This is where things start to get data sciencey. There are three primary areas: supervised learning, unsupervised learning, and reinforcement learning. Here’s a quick definition of each. Supervised learning trains on a big dataset of past examples where the outcome is already known, then uses the patterns it finds to predict the outcome for new cases. Unsupervised learning takes in data with no labels or known outcomes and groups together the items that resemble each other. Reinforcement learning is less widely used, but it basically works by rewarding or punishing the computer for the actions it takes, so that it gradually learns which actions pay off. Game-playing systems such as AlphaZero, which learned chess by playing against itself, work this way.
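For a rough feel of the difference between the first two areas, here is a small sketch in plain Python. It is only an illustration under our own assumptions: nearest-neighbor stands in for supervised learning and a one-dimensional k-means stands in for unsupervised learning, and the function names and sample data are invented. Real projects would normally reach for a dedicated machine learning library instead.

```python
def nearest_neighbor_predict(examples, point):
    """Supervised: predict a label by copying the closest labeled example."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(examples, key=lambda ex: squared_distance(ex[0], point))
    return label


# Labeled examples: each pair of measurements comes with a known answer.
labeled = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]
prediction = nearest_neighbor_predict(labeled, (7.5, 8.0))


def k_means_1d(values, centers, iterations=10):
    """Unsupervised: group unlabeled numbers around the nearest center."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for value in values:
            # Assign each value to whichever center is currently closest.
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        # Move each center to the average of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups, centers


# Unlabeled data: no answers provided, the algorithm finds the groups itself.
unlabeled = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
groups, centers = k_means_1d(unlabeled, centers=[0.0, 5.0])
print(prediction, groups, centers)
```

Notice that the supervised function needs the "small" and "large" labels to work, while the unsupervised one is handed nothing but raw numbers and a starting guess for the group centers.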
This post was sponsored by Thinkful.