This article summarizes our recent article series on the definition, meaning and use of the various algorithms and analytical methods and techniques used in predictive analytics for business users, and in augmented data preparation and augmented data discovery tools.
The article series is designed to help business users better understand the analytical techniques so that the average user can feel more confident in adopting, embracing and sharing these tools.
This thirty (30) article series includes:
Naïve Bayes Classification:
Use Case(s): Weather Forecasting, Fraud Analysis and more.
Frequent Pattern Mining (Association):
Use Case(s): Market Basket Analysis, Frequently Bundled Products and more.
Use Case(s): Predicting Loan Default, Predicting Success of Medical Treatment and more.
Multiple Linear Regression:
Use Case(s): Impact of product pricing and promotion on sales, impact of rainfall and humidity on crop yield, and more.
Independent Samples T Test:
Use Case(s): Are men more satisfied with their jobs than women? Does customer group A spend more on products than customer group B? And more.
Simple Random Sampling and Stratified Random Sampling:
Use Case(s): Estimate the average value of all cars in the U.S. based on a sample; sample by age, gender, religion, race, educational attainment, socioeconomic status and nationality; and more.
Spearman’s Rank Correlation:
Use Case(s): Cluster survey respondents into groups based on rank correlation, assess agreement between student ratings given by department chairs and by faculty members, and more.
Binary Logistic Regression Classification:
Use Case(s): Predict whether a loan will default based on attributes of the applicant; predict the likelihood of successful treatment of a new patient based on patient attributes; and more.
Paired Sample T Test:
Use Case(s): Analyze, as a manufacturing unit manager, the statistical significance of the difference in cycle time before and after a process change; determine whether sales increased following a particular campaign; and more.
Simple Linear Regression:
Use Case(s): Measure the impact of product price on product sales, measure the impact of temperature on crop yield and more.
Use Case(s): Forecast product line growth from the past 30 years of data, using the yearly consumer inflation rate and yearly GDP data; forecast target variables for user-specified time periods to clearly illustrate results for planning, production, sales and other factors; and more.
Karl Pearson Correlation Analysis:
Use Case(s): Correlation between income and credit card delinquency rate, identify negative, positive and neutral correlations between the age of a consumer and the color of shirt they might purchase and more.
Use Case(s): Group loan applicants into high/medium/low risk based on attributes such as loan amount, installments, or employment tenure, organize customers into groups/segments based on similar traits, product preferences and expectations and more.
SVM Classification Analysis:
Use Case(s): Predict treatment success based on attributes of a patient, improve weather forecasting results and more.
Use Case(s): Outliers are sometimes discounted, or in other cases, they will indicate that the organization should focus solely on those outliers; identify when a person recovered from a particular disease in spite of the fact that most other patients did not survive, and more.
Decision Tree Analysis:
Use Case(s): Classify customers into those that will default and those that will not, and assess the characteristics of customers that are likely to default; predict the future purchases of customers based on customer attributes and past online shopping behavioral data; and more.
Chi Square Test of Association:
Use Case(s): Determine if a product sells better in certain locations, verify if gender has an influence on purchasing decisions, identify if demographic factors influence banking channel/product/service preference or selection of a type of term insurance plan and more.
FP Growth Analysis:
Use Case(s): Select items in a business catalog to complement each other so that buying one item will lead to buying another, analyze the association of purchased items in a single basket or single purchase and more.
Use Case(s): Predict sales of a drug for the next 2 months based on drug sales from the past 12 months; suitable for forecasting when data is stationary or non-stationary; produces accurate, dependable forecasts when planning for short-term business results; and more.
Multinomial-Logistic Regression Classification:
Use Case(s): Based on the attributes of a respondent (e.g., demographics, marital status, gender, income, age, qualification), check the likely level of satisfaction with life/job/product/services; given a list of symptoms, predict whether a patient is likely to be diagnosed with the initial, intermediate or serious stage of a particular disease; and more.
KMeans Clustering Algorithm:
Use Case(s): Group loan applicants as low, medium and high risk based on applicant age, annual income and employment tenure; a movie ticket booking website can group users into frequent, moderate and occasional ticket buyers based on past movie ticket purchases; and more.
Use Case(s): Average age and income for a particular type of product category purchased, identify the most popular dish served in a restaurant, find the most frequent rating given by customers for a given movie/restaurant or the most frequent size or category of a sold product and more.
Use Case(s): Forecast the number of viewers by day for a particular game show for the next two months, using daily viewer counts from the last six months as input data; an insurance claim manager can forecast policy sales for the next month based on data from the past 12 months; and more.
Trends and Patterns:
Use Case(s): Identify a seasonal pattern, where fluctuations repeat over fixed periods of time and do not extend beyond one year; analyze a stationary time series, whose statistical properties such as variance are constant over time; identify a cyclical pattern, where fluctuations do not repeat over fixed periods of time, are unpredictable and extend beyond a year; and more.
Gradient Boosting Regression:
Use Case(s): Impact of temperature, rainfall and humidity on crop production, clarify relationships among factors such as seasonality, product pricing and product promotions and more.
Random Forest Regression:
Use Case(s): The impact of average rainfall, city location, parking availability, distance from hospital, and distance from shopping on the price of a house, or the impact of years of experience, position and productive hours on employee salary and more.
Use Case(s): Decide loan eligibility based on an applicant's annual income, employment period, debt-to-income ratio, etc.; predict diamond prices using basic measurement metrics.
Random Forest Classification:
Use Case(s): Classify customers into defaulters and non-defaulters based on historical data related to credit card payments, loan payments, existing loan status and job status; determine the quality of red wine based on its chemical composition.
Generalized Linear Regression (Gaussian Distribution):
Use Case(s): Identify the profit made by each product based on factors such as total revenue, number of units sold and region of sale; the predictive model helps identify profit on different products based on sales, region and other cost factors.
Multilayer Perceptron Classifier:
Use Case(s): Identify the important factors that lead to employee attrition; identify the right type of medication/treatment for patients admitted to the hospital.
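To make two entries in the list above more concrete, here is a minimal Python sketch of Karl Pearson correlation and simple linear regression, applied to the "impact of product price on product sales" use case. The function names and the price and sales figures are made up for illustration and do not come from the article series or from any Smarten tool.

```python
from math import sqrt


def _centered_sums(xs, ys):
    """Return (covariance sum, x variance sum, y variance sum, mean x, mean y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov, var_x, var_y, mean_x, mean_y


def pearson_correlation(xs, ys):
    """Karl Pearson correlation coefficient, a value between -1 and 1."""
    cov, var_x, var_y, _, _ = _centered_sums(xs, ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sample is constant")
    return cov / sqrt(var_x * var_y)


def simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    cov, var_x, _, mean_x, mean_y = _centered_sums(xs, ys)
    if var_x == 0:
        raise ValueError("slope is undefined when x is constant")
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: unit price and units sold for one product.
prices = [10.0, 12.0, 14.0, 16.0, 18.0]
units_sold = [200.0, 180.0, 160.0, 140.0, 120.0]

r = pearson_correlation(prices, units_sold)
slope, intercept = simple_linear_regression(prices, units_sold)
predicted_at_15 = intercept + slope * 15.0
```

In this toy data every price increase of 1 lowers sales by 10 units, so the correlation is strongly negative and the fitted line can be used to estimate sales at a price that was not observed. Real business data will be noisier, and a correlation alone does not show that price changes cause the change in sales.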
Each of these techniques, methods and algorithms has a unique value in advanced analytics. Augmented Data Discovery tools allow business users to gather and analyze data using these techniques within a sophisticated, intuitive navigation that is designed to guide users through the process of selecting the appropriate algorithm or analytical technique based on the type of data selected.
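The idea of choosing a technique from the type of data selected can be sketched as a simple lookup. The function below is a hypothetical illustration only: its name, its parameters and its rules are assumptions, and it does not represent how Smarten or any other augmented analytics tool actually makes recommendations. It maps the kind of target column a user picks to a few of the techniques named in this series.

```python
def suggest_techniques(target_type, n_predictors=1):
    """Suggest candidate techniques for a dataset (hypothetical rules).

    target_type: "numeric", "binary", "multiclass", or None when there is
                 no target column and the goal is to find groups.
    n_predictors: number of input columns used to explain the target.
    """
    if target_type is None:
        # No target column: look for natural groupings in the data.
        return ["KMeans Clustering Algorithm"]
    if target_type == "numeric":
        if n_predictors == 1:
            return ["Simple Linear Regression"]
        return [
            "Multiple Linear Regression",
            "Random Forest Regression",
            "Gradient Boosting Regression",
        ]
    if target_type == "binary":
        return [
            "Binary Logistic Regression Classification",
            "Decision Tree Analysis",
            "Random Forest Classification",
        ]
    if target_type == "multiclass":
        return [
            "Multinomial-Logistic Regression Classification",
            "Decision Tree Analysis",
            "Random Forest Classification",
        ]
    raise ValueError(f"unknown target type: {target_type!r}")


# Example: crop yield (numeric) explained by temperature, rainfall and humidity.
crop_yield_options = suggest_techniques("numeric", n_predictors=3)
# Example: will a loan applicant default or not (binary)?
loan_default_options = suggest_techniques("binary", n_predictors=5)
```

A real tool would look at much more than the target type, for example sample size, missing values and the distribution of each column, so this sketch only shows the general shape of such guidance.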
This article series will help business users understand the concepts and the benefits of each technique, as well as the logic behind the application of these techniques, and the value-added auto-recommendations and suggestions provided by comprehensive augmented analytics tools.
The Smarten approach to augmented analytics and modern business intelligence focuses on the business user and provides tools for Advanced Data Discovery so users can perform early prototyping and test hypotheses without the skills of a data scientist. Smarten Augmented Analytics tools include Assisted Predictive Modeling, Smart Data Visualization, Self-Serve Data Preparation, Clickless Analytics with natural language processing (NLP) for search analytics, Auto Insights, Key Influencer Analytics, and SnapShot monitoring and alerts. These tools are designed for business users with average skills and require no specialized knowledge of statistical analysis or support from IT or data scientists. Businesses can advance Citizen Data Scientist initiatives with in-person and online workshops and self-paced eLearning courses designed to introduce users and businesses to the concept, illustrate the benefits and provide introductory training on analytical concepts and the Citizen Data Scientist role.
The Smarten approach to data discovery is designed as an augmented analytics solution to serve business users. Smarten is a representative vendor in multiple Gartner reports including the Gartner Modern BI and Analytics Platform report and the Gartner Magic Quadrant for Business Intelligence and Analytics Platforms Report.