Data is at the core of our business at ConWX, and we know that the quality of our input data is reflected in the accuracy of our forecasts. That is why data cleaning is the first step in making sure we have the best possible input for training our models. It helps us diagnose issues such as outliers, missing values, and noisy data, all of which affect data quality.
Some estimate that data scientists spend 80% of their time cleaning and manipulating data, and less than 20% analysing it. In our experience, the ratio is not that dire, but truth be told, data cleaning is a big part of our work at ConWX.
Given the amount of time we spend cleaning data, we have put together a few guidelines on how to make data cleaning as smooth and easy as possible.
Advice on data cleaning from our data scientists