Data Mining: Basic and Advanced Concepts - Part 1: Variables
It is important to get to know your data. It's very tempting to jump straight into analysis and forecasting, or into building predictive or regression models, but the data needs to be ready for that first. Real-world data is typically noisy, imperfect, and heterogeneous, so cleaning it and getting familiar with its structure is a crucial task. It is also usually a time-consuming one: from personal experience, I would say that about 75 to 80 percent of the time spent with data goes into getting it into a shape suitable for analysis and model building.