Most colleges have statistics courses. But there aren’t (yet) many data science courses. There is huge demand for data scientists, but a college student can’t take “Data Science 101”.
What is the difference between statistics and data science? w3r discussed this in a presentation to a group of upper-level math students at the University of Michigan – Dearborn. Here is a recap:
- Data Science projects draw upon a variety of data sources such as the web and large administrative data sets. Such data are not collected for analysis, so wrangling the data into a format conducive to analysis can be the biggest chunk of a data science project.
- Statistics projects typically have very structured output, such as a table of coefficients, confidence intervals, and p-values. The product of a data science project, by contrast, is a story. While the creator of the data science product needs to know statistics, the consumer need not. Take this visualisation of World Cup loyalties by the New York Times. This is the product of a data science project that any soccer fan can appreciate, with no statistics background required.
- Data science projects may or may not involve heavy statistical modeling or machine learning. The point is to turn messy data into an insightful story, not to do complex modeling for its own sake.
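To make the wrangling step from the first point a little more concrete, here is a minimal sketch in Python. The records, field layout, and function names (`RAW_ROWS`, `wrangle`, `total_by_city`) are all invented for illustration and do not come from any real data set or from the presentation. The sketch only shows the kind of cleanup (stray whitespace, inconsistent capitalization, thousands separators, missing values) that tends to come before any analysis.

```python
from collections import defaultdict

# Hypothetical raw records, made up for this example: city, date, visit count.
RAW_ROWS = [
    " Detroit , 2014-06-12, 1,204 ",
    "detroit,2014-06-13,980",
    "DEARBORN,2014-06-12,n/a",
    "Dearborn ,2014-06-13, 415",
    "",
]


def wrangle(rows):
    """Turn messy comma-separated strings into clean (city, date, count) tuples."""
    clean = []
    for row in rows:
        if not row.strip():
            continue  # skip blank lines
        # Split on the first two commas only, so "1,204" stays in one field.
        city, date, count = [field.strip() for field in row.split(",", 2)]
        count = count.replace(",", "")  # drop thousands separators
        if not count.isdigit():
            continue  # drop rows whose count is missing or not a number
        clean.append((city.title(), date, int(count)))
    return clean


def total_by_city(clean_rows):
    """Sum the counts per city, a first step toward something worth telling."""
    totals = defaultdict(int)
    for city, _date, count in clean_rows:
        totals[city] += count
    return dict(totals)


if __name__ == "__main__":
    print(total_by_city(wrangle(RAW_ROWS)))
```

Dropping the rows with missing counts is a choice made for the sketch. In a real project, deciding what to do with them is part of the work.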
Data science takes ugly, unruly data and turns it into a story.