1. Missing Data – Why does it matter so much?
Ever worked on an analytics project and found blank, NaN, or undefined values in the records, and needed to deal with them correctly? This is routine when working with real-world data. Choosing a sound technique for handling these missing values is a crucial step, and it should come after understanding what analysis the data must support, because data that is useful to one party can be noise to another. Data can be missing because of corruption, an incomplete extraction process, data entry errors, or simply because the value is rare and genuinely absent. Handling such data well is challenging, yet it is necessary for making the right decisions and building robust predictive models or reports. This article sums up the key steps for handling missing values using Smarten Augmented Analytics and illustrates them with the Employee Salary Prediction dataset.
2. Just leave it or impute it!!
The main ways to handle missing data are:
2.1. Remove records with missing values:
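For readers who want to try the idea outside Smarten, record removal can be sketched in pandas. This is a minimal illustration, not the Smarten workflow: the column names and values below are invented for the example and are not taken from the article's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample records; all values are invented for illustration.
df = pd.DataFrame({
    "Gender": ["Male", "Female", None, "Female"],
    "Bonus %": [6.9, np.nan, 11.8, 9.3],
    "Salary": [97308, 61933, 130590, 138705],
})

# Drop every row that has at least one missing value.
complete_rows = df.dropna()

# Alternatively, drop rows only when one specific column is missing.
has_bonus = df.dropna(subset=["Bonus %"])

print(len(df), len(complete_rows), len(has_bonus))
```

Dropping on a single column keeps more records than dropping on any missing value, which matters when the dataset is small.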
2.2. Replace missing values:
2.2.1. Replace numeric variables with median
When replacing missing values in a numeric variable with a constant, the median is often a better choice than the mean, mode, and other summary statistics because it is robust to skewed data and outliers. When data is missing completely at random, it is reasonable to assume the missing values lie close to the centre of the distribution, so median replacement is a fast way to complete the dataset. However, if a substantial share of the data is missing, this technique distorts the data distribution and shrinks its original variance.
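The effect described here can be seen on a tiny example. The sketch below uses pandas rather than Smarten, and the bonus values are hypothetical; it shows how an outlier pulls the mean but not the median, and how filling gaps with a constant reduces the variance.

```python
import pandas as pd

# Hypothetical bonus percentages with one outlier (40.0) and two gaps.
bonus = pd.Series([2.0, 3.0, None, 4.0, 40.0, None], name="Bonus %")

median_value = bonus.median()  # missing values are skipped by default
mean_value = bonus.mean()      # pulled upward by the outlier

# Replace every missing value with the median of the observed values.
filled = bonus.fillna(median_value)

print("median:", median_value, "mean:", mean_value)
print("variance before:", bonus.var(), "after:", filled.var())
```

Because every imputed value sits at the same central point, the filled series is less spread out than the observed one, and the effect grows with the share of missing values.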
2.2.2. Replace categorical variables with mode
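For a categorical variable, the analogous step is to fill gaps with the most frequent category. A minimal pandas sketch follows, assuming a hypothetical Team column with invented values; it is an illustration of the technique, not of how Smarten implements it.

```python
import pandas as pd

# Hypothetical team labels with two missing entries.
teams = pd.Series(
    ["Sales", "Finance", None, "Sales", None, "Legal"], name="Team"
)

# mode() returns a Series because ties are possible; take the first entry.
most_frequent = teams.mode().iloc[0]

# Replace every missing label with the most frequent category.
filled = teams.fillna(most_frequent)

print(filled.value_counts())
```

When two categories tie for most frequent, taking the first entry is an arbitrary choice, so it is worth checking the length of the mode() result before relying on it.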
3. Smarten Assisted Predictive Modelling: Take the Guesswork out of Planning!
Every organization must plan and forecast results. To succeed, an enterprise must strive for accuracy and identify the market and industry trends and patterns that help it predict future results, plan for growth, and capitalize on opportunities. Smarten Insight provides predictive modeling with auto-recommendations and auto-suggestions that simplify its use, so business users can apply predictive algorithms without the expertise and skill of a data scientist.
4. Above all else, show the data
Let’s take a look at the employee salary prediction dataset.
Employee Salary Prediction Dataset
The goal is to predict each employee’s Salary from their Gender, whether they belong to Senior Management, the Team they are associated with, and the Bonus percentage offered. The dataset contains many missing values, which need to be dealt with at the pre-processing stage. Note also that Bonus percentage is the only measure (numeric) predictor; the rest are dimensions (categorical). Let’s see how to handle such data using Smarten Augmented Analytics.
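Before cleaning, it helps to count how much is missing in each column and to separate measures from dimensions. The pandas sketch below assumes a table shaped like the columns named above; the rows are invented placeholders, not records from the actual dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical rows using the column names mentioned in the article.
employees = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male", "Female"],
    "Senior Management": [True, False, None, True, False],
    "Team": ["Marketing", None, "Finance", "Finance", None],
    "Bonus %": [6.9, 4.3, np.nan, 11.8, np.nan],
    "Salary": [97308, 61933, 130590, 138705, 101004],
})

# Number and share of missing values per column.
missing_count = employees.isna().sum()
missing_share = employees.isna().mean()

# Split the predictors into numeric measures and categorical dimensions.
predictors = employees.drop(columns="Salary")
measures = predictors.select_dtypes(include="number").columns.tolist()
dimensions = predictors.columns.difference(measures).tolist()

print(missing_count)
print("measures:", measures, "dimensions:", dimensions)
```

A split like this suggests which replacement applies where: median for the measure, mode for the dimensions.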
4.1. Create a new Smarten Insight
Creating a new Smarten Insight
4.2. Select the data of your interest and click NEXT
Selecting the dataset to be handled for missing values
4.3. Perform Sampling and Filtering if required and click NEXT
Sampling and Filtering using Smarten
4.4. And here we go, perform data cleaning to handle missing data
Handling missing values using Smarten
We have to learn to interrogate our data collection process, not just our algorithms! With too little data, we cannot draw conclusions that can be trusted. Making replacements in the data without understanding it yields information that leads to poor decisions. Hence, a healthy trade-off between removing and replacing values, together with an understanding of why the data is missing, is important for handling the remaining data correctly!
Note: This article is based on Smarten Version 5.2 and may not match the Smarten version you are using.
Original Post : Handling Missing Values using Smarten Augmented Analytics!