Data Science Process

Break down any data science problem using this process

by Mohamed Jendoubi, founder @ uluumy

According to a KDnuggets poll, 43% of advanced analytics projects use the CRISP-DM methodology, which we’re going to present.

This article sums up the lecture Data Science Process from our course Become a Citizen Data Scientist.
In this article, we’re going to present the CRISP-DM Data Science Process.

6 Steps

The process is composed of six steps.
The key point to note is that the process is iterative rather than linear: we can, and should, go back and forth between the steps.

Business Understanding

The first step is Business Understanding. It’s the most critical step of the process, because this is where you frame the problem.
At the end of this stage, you should have a deep understanding of the problem you want to solve and a clear idea of the data you will use.

Data Understanding

The second step is Data Understanding.
Your business knowledge will help you put your data in context.
Notice that the Business Understanding and Data Understanding steps are linked by a double arrow: you will often move back and forth between them.

Data Preparation

The third step is Data Preparation.
In this stage, you check for common issues such as missing values and outliers. You also perform operations such as filtering, merging, and transformation.
You also run some data exploration using graphics and tables.
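As a minimal sketch of the checks described above, the following Python snippet handles missing values, outliers, and filtering on a small list. The data and the variable names are hypothetical, and the choices of median imputation and the 1.5 x IQR rule are assumptions made for illustration, not something the process prescribes.

```python
from statistics import median, quantiles

# Hypothetical raw column: None marks a missing value.
ages = [34, 29, None, 41, 38, 120, 31, None, 36]

# 1. Missing values: fill the gaps with the median of the observed values.
observed = [a for a in ages if a is not None]
fill_value = median(observed)
filled = [fill_value if a is None else a for a in ages]

# 2. Outliers: flag values more than 1.5 * IQR outside the quartiles.
q1, _, q3 = quantiles(filled, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [a for a in filled if a < low or a > high]

# 3. Filtering: keep only the values inside the accepted range.
cleaned = [a for a in filled if low <= a <= high]

print("filled:", filled)
print("outliers:", outliers)
print("cleaned:", cleaned)
```

Whether an outlier should be dropped, capped, or kept is a business decision, which is one reason this stage leans on the understanding built in the first two steps.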


Modeling

The fourth step is Modeling.
In this stage, you build your model (for example, a regression or a classification).
Notice that this step is linked to the previous one by a double arrow, which means that you will often need to step back to Data Preparation.
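To make the idea of building a model concrete, here is a minimal sketch of a simple linear regression fitted by ordinary least squares in plain Python. The data, and the names `fit_line` and `predict`, are hypothetical; a real project would more likely rely on a dedicated library.

```python
# Hypothetical training data: advertising spend (x) and sales (y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)


def predict(x):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept


print(slope, intercept, predict(6.0))
```

If the fit is poor, the usual next move is to return to Data Preparation, for instance to transform a variable or handle an outlier differently.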


Evaluation

The next step is Evaluation.
Every model you build has to be evaluated in terms of accuracy, robustness, and deployability.
At this stage, you may have to step back to the Business Understanding stage if the model you have built cannot be deployed.
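As an illustration of the accuracy part of the evaluation, the sketch below computes the share of correct predictions on held-out data for a classification model. The labels and the function name `accuracy` are hypothetical, and robustness and deployability need other checks that are not shown here.

```python
def accuracy(y_true, y_pred):
    """Return the share of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty data")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical held-out labels and the model's predictions for them.
y_true = ["churn", "stay", "stay", "churn", "stay"]
y_pred = ["churn", "stay", "churn", "churn", "stay"]

score = accuracy(y_true, y_pred)  # 4 of the 5 predictions are correct
print(f"accuracy: {score:.2f}")
```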


Deployment

The last step is Deployment.
The final purpose of any data science project is to deliver actionable insight.
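The six steps and the backward links described in this article can be sketched as a small data structure. This is only an illustration of the iterative flow: the names `STEPS`, `BACK_LINKS`, and `next_steps` are hypothetical, and only the backward links mentioned in the text are encoded.

```python
# Forward order of the six steps as presented in this article.
STEPS = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
]

# Backward links mentioned in the text: the two double arrows and the
# return from Evaluation to Business Understanding.
BACK_LINKS = {
    "Data Understanding": ["Business Understanding"],
    "Modeling": ["Data Preparation"],
    "Evaluation": ["Business Understanding"],
}


def next_steps(step):
    """Return the steps reachable from `step`: forward first, then backward."""
    i = STEPS.index(step)
    forward = STEPS[i + 1:i + 2]  # empty list for the last step
    return forward + BACK_LINKS.get(step, [])


for step in STEPS:
    print(step, "->", next_steps(step))
```

For example, `next_steps("Modeling")` lists Evaluation as the forward move and Data Preparation as the step to fall back to.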

Here is a Prezi presentation of the main points covered in this article.

Take a look at our course Become a Citizen Data Scientist.

CRISP-DM: Cross-Industry Standard Process for Data Mining
