Augmented Analytics Algorithms and Techniques: Learning for Citizen Data Scientists

This article summarizes our recent series on the definition and use of the algorithms, analytical methods and techniques used in predictive analytics for business users, and in augmented data preparation and augmented data discovery tools.

The article series is designed to help business users better understand the analytical techniques so that the average user can feel more confident in adopting, embracing and sharing these tools.

This thirty (30) article series includes:

Naïve Bayes Classification:

What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?

Use Case(s): Weather Forecasting, Fraud Analysis and more.

Frequent Pattern Mining (Association):

What is Frequent Pattern Mining (Association) and How Does it Support Business Analysis?

Use Case(s): Market Basket Analysis, Frequently Bundled Products and more.

KNN Classification:

What is KNN Classification and How Can This Analysis Help an Enterprise?

Use Case(s): Predicting Loan Default, Predicting Success of Medical Treatment and more.

Multiple Linear Regression:

What is Multiple Linear Regression and How Can it be Helpful for Business Analysis?

Use Case(s): Impact of product pricing and promotion on sales, impact of rainfall and humidity on crop yield, and more.

Independent Samples T Test:

What is the Independent Samples T Test Method of Analysis and How Can it Benefit an Organization?

Use Case(s): Are men more satisfied with their jobs than women? Does customer group A spend more on products than customer group B? And more.

Simple Random Sampling and Stratified Random Sampling:

What Are Simple Random Sampling and Stratified Random Sampling Analytical Techniques?

Use Case(s): Estimate the average value of all cars in the U.S. based on a sample; sample by age, gender, religion, race, educational attainment, socioeconomic status and nationality; and more.

Spearman’s Rank Correlation:

What is Spearman’s Rank Correlation and How is it Useful for Business Analysis?

Use Case(s): Cluster survey respondents into groups based on rank correlation, assess student ratings given by department chairs and by faculty members, and more.

Binary Logistic Regression Classification:

What is Binary Logistic Regression Classification and How is it Used in Analysis?

Use Case(s): Predict whether a loan will default based on the attributes of the applicant; predict the likelihood of successful treatment of a new patient based on patient attributes; and more.

Paired Sample T Test:

What is the Paired Sample T Test and How is it Beneficial to Business Analysis?

Use Case(s): A manufacturing unit manager analyzes the statistical significance of the difference in cycle time before and after a process change; determine whether sales increased following a particular campaign; and more.

Simple Linear Regression:

What is Simple Linear Regression and How Can an Enterprise Use this Technique to Analyze Data?

Use Case(s): Measure the impact of product price on product sales, measure the impact of temperature on crop yield, and more.

ARIMAX Forecasting:

What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?

Use Case(s): Forecast product line growth from the past 30 years of data, taking into account the yearly consumer inflation rate and yearly GDP data; forecast target variables for user-specified time periods to clearly illustrate results for planning, production, sales and other factors; and more.

Karl Pearson Correlation Analysis:

What is Karl Pearson Correlation Analysis and How Can it be Used for Enterprise Analysis Needs?

Use Case(s): Measure the correlation between income and credit card delinquency rate; identify negative, positive and neutral correlations between the age of a consumer and the color of shirt they might purchase; and more.

Hierarchical Clustering:

What is Hierarchical Clustering and How Can an Organization Use it to Analyze Data?

Use Case(s): Group loan applicants into high, medium and low risk based on attributes such as loan amount, installments or employment tenure; organize customers into groups or segments based on similar traits, product preferences and expectations; and more.

SVM Classification Analysis:

What is SVM Classification Analysis and How Can It Benefit Business Analytics?

Use Case(s): Predict treatment success based on the attributes of a patient, improve weather forecasting results, and more.

Outlier Analysis:

What is Outlier Analysis and How Can It Improve Analysis?

Use Case(s): Outliers are sometimes discounted; in other cases, they indicate that the organization should focus solely on those outliers. For example, identify when a person recovered from a particular disease even though most other patients did not survive, and more.

Decision Tree Analysis:

What is the Decision Tree Analysis and How Does it Help a Business to Analyze Data?

Use Case(s): Classify customers into those who will default and those who will not, and assess the characteristics of customers who are likely to default; predict the future purchases of customers based on customer attributes and past online shopping behavior; and more.

Chi Square Test of Association:

What is the Chi Square Test of Association and How Can it be Used for Analysis?

Use Case(s): Determine whether a product sells better in certain locations; verify whether gender has an influence on purchasing decisions; identify whether demographic factors influence banking channel, product or service preference, or the selection of a type of term insurance plan; and more.

FP Growth Analysis:

What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining to Analyze Data?

Use Case(s): Select items in a business catalog that complement each other, so that buying one item will lead to buying another; analyze the association of purchased items in a single basket or single purchase; and more.

ARIMA Forecasting:

What is ARIMA Forecasting and How Can it Be Used for Enterprise Analysis?

Use Case(s): Predict sales of a drug for the next 2 months based on drug sales from the past 12 months. ARIMA is suitable for forecasting whether the data is stationary or non-stationary, and can produce dependable forecasts when planning for short-term business results, and more.

Multinomial-Logistic Regression Classification:

What is the Multinomial-Logistic Regression Classification Algorithm and How Does One Use it for Analysis?

Use Case(s): Based on the attributes of a respondent (e.g., demographics, marital status, gender, income, age, qualification), check the likely level of satisfaction with life, a job, a product or a service; given a list of symptoms, predict whether a patient is likely to be diagnosed with the initial, intermediate or serious stage of a particular disease; and more.

KMeans Clustering Algorithm:

What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to Analyze Data?

Use Case(s): Group loan applicants as low, medium and high risk based on applicant age, annual income and employment tenure; a movie ticket booking website can group users into frequent, moderate and occasional ticket buyers based on past movie ticket purchases; and more.

Descriptive Statistics:

What is Descriptive Statistics and How Do You Choose the Right One for Enterprise Analysis?

Use Case(s): Find the average age and income for a particular type of product category purchased; identify the most popular dish served in a restaurant, the most frequent rating given by customers for a given movie or restaurant, or the most frequent size or category of a sold product; and more.

Holt-Winters Forecasting:

What is the Holt-Winters Forecasting Algorithm and How Can it be Used for Enterprise Analysis?

Use Case(s): Forecast the number of viewers by day for a particular game show for the next two months, using daily viewer count data from the last six months as input; an insurance claim manager can forecast policy sales for the next month based on data from the past 12 months; and more.

Trends and Patterns:

What Are Data Trends and Patterns, and How Do They Impact Business Decisions?

Use Case(s): Identify a seasonal pattern, in which fluctuations repeat over fixed periods of time and do not extend beyond one year; analyze a stationary time series, whose statistical properties such as variance are constant over time; identify a cyclical pattern, in which fluctuations do not repeat over fixed periods of time, are unpredictable and extend beyond a year; and more.

Gradient Boosting Regression: 

What is Gradient Boosting and How is it Used for Enterprise Analysis?

Use Case(s): Measure the impact of temperature, rainfall and humidity on crop production; clarify relationships among factors such as seasonality, product pricing and product promotions; and more.

Random Forest Regression: 

What is Random Forest Regression and How Can it Help Your Business?

Use Case(s): The impact of average rainfall, city location, parking availability, distance from hospital, and distance from shopping on the price of a house, or the impact of years of experience, position and productive hours on employee salary and more.

Isotonic Regression:

What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?

Use Case(s): Decide loan eligibility based on an applicant’s annual income, employment period, debt-to-income ratio, etc., and predict diamond prices using basic measurement metrics.

Random Forest Classification:

What Is Random Forest Classification And How Can It Help Your Business?

Use Case(s): Classify customers into defaulters and non-defaulters based on historical data on credit card payments, loan payments, existing loan status and job status; determine the quality of red wine based on its chemical composition.

Generalized Linear Regression (Gaussian Distribution):

What Is Generalized Linear Regression with Gaussian Distribution And How Can An Enterprise Use This Technique To Analyze Data?

Use Case(s): Identify the profit made by each product based on factors such as total revenue, number of units sold and region of sale; build a predictive model that estimates profit on different products based on sales, region and other cost factors.

Multilayer Perceptron Classifier:

What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise Analysis?

Use Case(s): Identify the important factors that lead to employee attrition, and identify the right type of medication or treatment for patients admitted to the hospital.

Each of these techniques, methods and algorithms has a unique value in advanced analytics. Augmented Data Discovery tools allow business users to gather and analyze data using these techniques within sophisticated, intuitive navigation that is designed to guide users through the process of selecting the appropriate algorithm or analytical technique based on the type of data selected.
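To make the idea of selecting a technique based on the type of data more concrete, here is a minimal Python sketch. It is purely illustrative and is not how any particular augmented analytics product works: the `Column` class, the `recommend_technique` function and the simple rules inside it are all hypothetical, and they cover only a handful of the techniques listed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    """A hypothetical description of one column in a dataset."""
    name: str
    kind: str                 # assumed to be "numeric" or "categorical"
    distinct_values: int = 0  # only meaningful for categorical columns


def recommend_technique(target: Optional[Column], predictors: List[Column]) -> str:
    """Suggest a technique from the target and predictor columns.

    Illustrative rules only (an assumption, not a vendor's actual logic):
    - no target column           -> clustering
    - categorical target, 2 values  -> binary logistic regression
    - categorical target, >2 values -> multinomial logistic regression
    - numeric target, 1 predictor   -> simple linear regression
    - numeric target, >1 predictor  -> multiple linear regression
    """
    if target is None:
        return "KMeans Clustering"
    if target.kind == "categorical":
        if target.distinct_values == 2:
            return "Binary Logistic Regression Classification"
        return "Multinomial-Logistic Regression Classification"
    if len(predictors) == 1:
        return "Simple Linear Regression"
    return "Multiple Linear Regression"


# Example: predicting loan default (yes/no) from applicant attributes.
loan_default = Column("loan_default", "categorical", distinct_values=2)
applicant = [Column("annual_income", "numeric"), Column("employment_tenure", "numeric")]
print(recommend_technique(loan_default, applicant))
```

A real tool would likely weigh many more signals, such as sample size, missing values and whether the data is a time series, before making a recommendation.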

This article series will help business users understand the concepts and the benefits of each technique, as well as the logic behind the application of these techniques, and the value-added auto-recommendations and suggestions provided by comprehensive augmented analytics tools.

You can find more educational resources by browsing our Augmented Analytics Learning and Augmented Analytics Videos pages.

About Smarten

The Smarten approach to augmented analytics and modern business intelligence focuses on the business user and provides tools for Advanced Data Discovery so users can perform early prototyping and test hypotheses without the skills of a data scientist. Smarten Augmented Analytics tools include Assisted Predictive Modeling, Smart Data Visualization, Self-Serve Data Preparation, Clickless Analytics with natural language processing (NLP) for search analytics, Auto Insights, Key Influencer Analytics, and SnapShot monitoring and alerts. These tools are designed for business users with average skills and require no specialized knowledge of statistical analysis or support from IT or data scientists. Businesses can advance Citizen Data Scientist initiatives with in-person and online workshops and self-paced eLearning courses designed to introduce users and businesses to the concept, illustrate the benefits and provide introductory training on analytical concepts and the Citizen Data Scientist role.

The Smarten approach to data discovery is designed as an augmented analytics solution to serve business users. Smarten is a representative vendor in multiple Gartner reports including the Gartner Modern BI and Analytics Platform report and the Gartner Magic Quadrant for Business Intelligence and Analytics Platforms Report.
