Data Science Process
How to break down any data science problem using this process
According to a KDnuggets poll, 43% of advanced analytics projects use the CRISP-DM methodology, which we present here.
The process is composed of 6 steps.
The key point to note is that the process is iterative rather than linear: we can, and should, go back and forth between the steps.
The first step is Business Understanding. It’s the most critical step of the process. You need to frame the problem.
At the end of this stage, you should have a deep understanding of the problem you want to solve and a clear idea of the data you will use.
The second step is Data Understanding.
Your business knowledge will help you put your data in context.
Notice that the Business Understanding and Data Understanding steps are linked by a double arrow, which means that you will often move back and forth between them.
The third step is Data Preparation.
In this stage, you check for common issues such as missing values and outliers, and perform operations such as filtering, merging, and transformation.
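As a rough illustration of these checks, here is a minimal sketch in plain Python. The data, the function names, the choice of median imputation, and the 1.5 x IQR outlier rule are all assumptions made for the example; in practice a library such as pandas is commonly used, and the right treatment depends on the business context.

```python
from statistics import median, quantiles

# Hypothetical raw values: None marks a missing entry, 250 is a suspicious age.
raw_ages = [34, 29, None, 41, 38, 250, 31, None, 36]


def impute_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences based on the interquartile range."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values):
    """Filter out values that fall outside the IQR fences."""
    low, high = iqr_bounds(values)
    return [v for v in values if low <= v <= high]


filled = impute_missing(raw_ages)   # missing values handled
clean = drop_outliers(filled)       # outliers filtered out
print(clean)
```

Dropping outliers is only one option; depending on what the value means for the business, capping it or investigating its source may be more appropriate.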
You also run some data exploration using graphics and tables.
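A small exploration table can be sketched as follows. The records, the segment names, and the `summarize` helper are hypothetical and only illustrate the idea of a per-group summary with a crude text bar standing in for a chart.

```python
from statistics import mean, median

# Hypothetical cleaned records: (customer segment, monthly spend).
records = [
    ("retail", 120.0), ("retail", 95.5),
    ("online", 210.0), ("online", 180.0), ("online", 199.0),
    ("partner", 60.0),
]


def summarize(rows):
    """Group values by segment and compute basic summary statistics."""
    groups = {}
    for segment, spend in rows:
        groups.setdefault(segment, []).append(spend)
    return {
        seg: {"count": len(vals), "mean": round(mean(vals), 2), "median": median(vals)}
        for seg, vals in groups.items()
    }


summary = summarize(records)
for seg, stats in sorted(summary.items()):
    bar = "#" * stats["count"]  # one mark per record, as a rough text histogram
    print(f"{seg:<8} n={stats['count']} mean={stats['mean']:>7.2f} "
          f"median={stats['median']:>7.2f} {bar}")
```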
The fourth step is Modeling.
In this stage you build your model (for example a regression or a classification).
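To make the regression case concrete, here is a minimal sketch of a one-variable linear regression fitted by ordinary least squares. The data points and the `fit_line` and `predict` names are made up for illustration; a real project would more likely rely on a modeling library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Apply the fitted line to a new input."""
    return slope * x + intercept


# Hypothetical prepared data: one input feature and one numeric target.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept, predict(6, slope, intercept))
```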
Notice that this step is linked to the previous one by a double arrow, which means that you will often need to step back to Data Preparation.
The next step is Evaluation.
Every model you build has to be evaluated in terms of accuracy, robustness, and deployability.
Notice that at this stage you may have to step back to the Business Understanding stage if the model you have built cannot be deployed.
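Accuracy and a rough view of robustness can be sketched as below. The labels and predictions are hypothetical, and the per-fold scores here are computed on fixed predictions as a simple stability check; proper cross-validation would retrain the model on each split. Deployability is a business and engineering judgment and is not covered by this sketch.

```python
from statistics import pstdev


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def k_folds(n_items, k):
    """Yield index lists for k contiguous folds of near-equal size."""
    fold_size, remainder = divmod(n_items, k)
    start = 0
    for i in range(k):
        end = start + fold_size + (1 if i < remainder else 0)
        yield list(range(start, end))
        start = end


# Hypothetical true labels and model predictions for a binary classifier.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

overall = accuracy(y_true, y_pred)
fold_scores = [
    accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx])
    for idx in k_folds(len(y_true), 3)
]
spread = pstdev(fold_scores)  # a large spread hints at an unstable model
print(overall, fold_scores, spread)
```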
The last step is Deployment.
The final purpose of any data science project is to deliver actionable insights.
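The back-and-forth between steps described above can be summarized as a small transition table. This is only a sketch: the `TRANSITIONS` structure and helper names are illustrative, and the forward moves are assumed to follow the order in which the steps are listed.

```python
# The six steps, in the order presented in the article.
PHASES = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
]

# Allowed moves from each step (forward moves plus the step-backs described above).
TRANSITIONS = {
    "Business Understanding": ["Data Understanding"],
    "Data Understanding": ["Business Understanding", "Data Preparation"],
    "Data Preparation": ["Modeling"],
    "Modeling": ["Data Preparation", "Evaluation"],
    "Evaluation": ["Business Understanding", "Deployment"],
    "Deployment": [],
}


def can_move(src, dst):
    """Return True if the process allows going directly from src to dst."""
    return dst in TRANSITIONS[src]


def backward_moves():
    """List the (src, dst) pairs that go back to an earlier step."""
    order = {name: i for i, name in enumerate(PHASES)}
    return [
        (src, dst)
        for src in PHASES
        for dst in TRANSITIONS[src]
        if order[dst] < order[src]
    ]


for src, dst in backward_moves():
    print(f"{src} -> {dst}")
```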
Here is a Prezi presentation of the main points of this article.