Data Literacy

Enable Business Transformation
Improve Knowledge and Skill

In this rapidly changing business environment and market, every organization must make the best of precious human resources. Just as markets benefit from disruption, the enterprise too can benefit from disruption. Improving Data Literacy is a good way to disrupt the business culture and transform business users into Citizen Data Scientists. When business users are empowered with business analytics, each team member can leverage core business skills and knowledge, and complement that knowledge with augmented analytics to innovate, solve problems and support organizational goals.

A Data Literate business user can manipulate, analyze and understand data, and has a basic understanding of analytical techniques and outcomes.

Enabling Data Literacy within your organization is a foundational step to the transformation of business users into Citizen Data Scientists, enabling team members to make confident day-to-day decisions, share data and perform Advanced Analytics, all without the assistance of IT professionals or analysts.

Algorithms, Analytical Methods and Techniques

Predictive Analytics includes numerous types of algorithms and, while data scientists are skilled in choosing and executing these algorithms or analyses, the average business user does not have these skills. The learning opportunities provided here represent an important part of our support for business users. These presentations help users to understand the basic premise of a particular algorithm and how it works, as well as the types of use cases or scenarios that would most benefit from application of this algorithm.

Naive Bayes is a classification algorithm that is suitable for binary and multiclass classification. It is a supervised classification technique used to classify future objects by assigning class labels to instances/records using conditional probability. In supervised classification, training data are already labeled with a class. For example, if fraudulent transactions are already flagged in transactional data and if we want to classify future transactions into fraudulent/non fraudulent, then that type of classification would be called supervised...

See Our Article on this Topic:
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
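
To make the conditional-probability idea above more concrete, here is a minimal sketch of a categorical Naive Bayes classifier for the fraud example. The transaction features (channel and amount band), the sample rows and the function names are all hypothetical, and add-one (Laplace) smoothing is an assumption made so that unseen values do not produce zero probabilities.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count how often each feature value occurs within each class."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values_seen = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            values_seen[i].add(value)
    return class_counts, feature_counts, values_seen

def predict(model, row):
    """Return the class with the highest (log) posterior score for row."""
    class_counts, feature_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # class prior
        for i, value in enumerate(row):
            # Add-one smoothing (an assumption of this sketch).
            numerator = feature_counts[label][i][value] + 1
            denominator = count + len(values_seen[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical, already-labeled training data: (channel, amount band).
rows = [
    ("online", "high"), ("online", "high"), ("online", "low"),
    ("store", "low"), ("store", "low"), ("store", "high"),
    ("online", "low"), ("store", "low"),
]
labels = ["fraud", "fraud", "ok", "ok", "ok", "ok", "ok", "ok"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("online", "high")))
print(predict(model, ("store", "low")))
```

With this toy data the first transaction is scored as more likely fraudulent and the second as more likely legitimate; real data would need many more records and features.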

Frequent Pattern Mining (AKA Association Rule Mining) is an analytical process that finds frequent patterns, associations, or causal structures from data sets found in various kinds of databases such as relational databases, transactional databases, and other data repositories. Given a set of transactions, this process aims to find the rules that enable us to predict the occurrence of a specific item based on the occurrence of other items in the transaction...

See Our Article on this Topic:
What is Frequent Pattern Mining (Association) and How Does it Support Business Analysis?
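
Association rules are usually judged by support (how often items occur together) and confidence (how often the rule holds when its left-hand side occurs). The sketch below computes both for a handful of hypothetical market-basket transactions; the data, function names and the brute-force pair search are illustrative assumptions, not a production mining algorithm.

```python
from itertools import combinations

# Hypothetical transactions: each set is one shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated probability of consequent given antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def frequent_pairs(transactions, min_support):
    """Brute-force search for item pairs that meet the support threshold."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(pair, transactions)
        if s >= min_support:
            result[pair] = s
    return result

print(frequent_pairs(transactions, min_support=0.4))
print(confidence({"bread"}, {"milk"}, transactions))
```

Here bread and milk appear together in three of five baskets, and milk appears in three of the four baskets that contain bread, so the rule "bread implies milk" has a confidence of about 0.75.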

The KNN (K Nearest Neighbors) algorithm stores all available data points along with their known classes, then classifies each new case according to the classes of the data points that are most similar to it. It is useful for recognizing patterns and for estimating. Let’s say we want to determine the likelihood of loan default based on two predictors (age and loan type), with ‘default’ being the target...

See Our Article on this Topic:
What is KNN Classification and How Can This Analysis Help an Enterprise?
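
As a rough illustration of the loan-default scenario above, the following sketch classifies a new applicant by majority vote among the k closest known cases. To keep the distance calculation simple it assumes two numeric predictors (age and loan amount in thousands) rather than age and loan type; the data and the function name are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical history: ((age, loan amount in thousands), outcome).
train = [
    ((25, 40), "default"),
    ((30, 35), "default"),
    ((28, 45), "default"),
    ((45, 10), "no default"),
    ((50, 15), "no default"),
    ((55, 12), "no default"),
]

print(knn_predict(train, (27, 38)))
print(knn_predict(train, (52, 11)))
```

In practice the predictors would normally be scaled to comparable ranges first, since Euclidean distance is dominated by whichever variable has the largest numeric spread.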

Multiple Linear Regression is a statistical technique that is designed to explore the relationship between two or more variables (Xi and Y). It is useful in identifying important factors (Xi) that will impact a dependent variable (Y), and the nature of the relationship between each of the factors and the dependent variable...

See Our Article on this Topic:
What is Multiple Linear Regression and How Can it be Helpful for Business Analysis?

The independent sample t-test is a statistical method of hypothesis testing that determines whether there is a statistically significant difference between the means of two independent samples...

See Our Article on this Topic:
What is the Independent Samples T Test Method of Analysis and How Can it Benefit an Organization?
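
The test statistic behind this method can be sketched in a few lines. The version below uses Welch's form of the statistic, which does not assume equal variances; that choice, the sample figures and the function name are assumptions made for illustration. The sketch stops at the t value and does not compute a p-value.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """t statistic for the difference between two independent sample means."""
    n_a, n_b = len(sample_a), len(sample_b)
    standard_error = math.sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    return (mean(sample_a) - mean(sample_b)) / standard_error

# Hypothetical order values from two unrelated customer groups.
group_a = [10, 12, 14]
group_b = [20, 22, 24]

print(welch_t(group_a, group_b))
```

A t value far from zero, judged against the t distribution for the relevant degrees of freedom, suggests that the two group means differ by more than chance alone would explain.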

Sampling is the technique of selecting a representative part of a population for the purpose of determining the characteristics of the whole population. There are two types of sampling analysis: Simple Random Sampling and Stratified Random Sampling...

See Our Article on this Topic:
What Are Simple Random Sampling and Stratified Random Sampling Analytical Techniques?
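
The difference between the two sampling approaches is easiest to see side by side. This sketch assumes a hypothetical customer list in which 80% of customers are in a "north" region and 20% in a "south" region; the function names, the fixed seed and the 10% sampling fraction are illustrative choices.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, seed=0):
    """Draw n items so that every item has the same chance of selection."""
    return random.Random(seed).sample(population, n)

def stratified_sample(population, key, fraction, seed=0):
    """Draw the same fraction from each stratum defined by key(item)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical population of (customer id, region).
population = [(i, "north") for i in range(80)] + [(i, "south") for i in range(80, 100)]

random_pick = simple_random_sample(population, 10)
stratified_pick = stratified_sample(population, key=lambda c: c[1], fraction=0.1)
print(len(random_pick), len(stratified_pick))
```

The simple random sample may over- or under-represent the smaller region by chance, while the stratified sample always contains 8 north and 2 south customers, mirroring the population mix.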

Correlation is a statistical measure that indicates the extent to which two variables fluctuate together. A positive correlation indicates the extent to which those variables increase or decrease in parallel. A negative correlation indicates the extent to which one variable increases as the other decreases. Spearman’s Rank Correlation is a measure of correlation between two ranked (ordered) variables. This method measures the strength and direction of association between two sets of data when ranked by each of their quantities...

See Our Article on this Topic:
What is Spearman’s Rank Correlation and How is it Useful for Business Analysis?
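
A minimal sketch of the ranking idea follows. It uses the classic rank-difference formula, which is only valid when there are no tied values; that restriction, the data and the function names are assumptions of this example.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical data: y grows with x, but not in a straight line.
x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 16, 25]

print(spearman_rho(x, y))
```

Because only the ordering matters, the curved but steadily increasing relationship above still yields a coefficient of 1.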

Logistic regression measures the relationship between the categorical target variable and one or more independent variables. It is useful for situations in which the outcome for a target variable can have only two possible types (in other words, it is binary). Binary Logistic Regression Classification makes use of one or more predictor variables that may be either continuous or categorical to predict the target variable classes. This technique helps to identify important factors (Xi) impacting the target variable (Y) and also the nature of the relationship between each of these factors and the dependent variable...

See Our Article on this Topic:
What is Binary Logistic Regression Classification and How is it Used in Analysis?

The Paired Sample T Test is used to determine whether the mean of a dependent variable (e.g., weight, anxiety level, salary, reaction time) is the same in two related groups. For example, one might consider two groups of participants that are measured at two different “time points” or two groups that are subjected to two different “conditions”. The Paired T Test is used to evaluate the before and after of a situation, treatment, condition, etc...

See Our Article on this Topic:
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
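
For a before-and-after comparison, the paired test works on the differences within each pair. The sketch below computes only the t statistic; the measurements and the function name are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """t statistic for the mean of the paired differences (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error

# Hypothetical measurements for the same four participants.
before = [80, 75, 90, 85]
after = [78, 72, 89, 81]

print(paired_t(before, after))
```

A t value far from zero, compared against the t distribution with n - 1 degrees of freedom, indicates that the average change is unlikely to be due to chance.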

Simple Linear Regression is a statistical technique that attempts to explore the relationship between one independent variable (X) and one dependent variable (Y). This method helps a business to identify the relationship between X and Y and the nature and direction of that relationship...

See Our Article on this Topic:
What is Simple Linear Regression and How Can an Enterprise Use this Technique to Analyze Data?
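
The relationship between one X and one Y can be estimated with ordinary least squares, as in this minimal sketch. The advertising and sales figures and the function name are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread = sum((a - mean_x) ** 2 for a in x)
    slope = covariance / spread
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (X) and resulting sales (Y).
ad_spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

slope, intercept = fit_line(ad_spend, sales)
print(slope, intercept)
```

The sign of the slope gives the direction of the relationship and its size gives the expected change in Y for each one-unit change in X.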

An Autoregressive Integrated Moving Average with Explanatory Variable (ARIMAX) model can be viewed as a multiple regression model with one or more autoregressive (AR) terms and/or one or more moving average (MA) terms. This method is suitable for forecasting when data is stationary or non-stationary, and multivariate with any type of data pattern, i.e., level/trend/seasonality/cyclicity...

See Our Article on this Topic:
What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?

Correlation is a statistical measure that indicates the extent to which two variables fluctuate together. A positive correlation indicates the extent to which those variables increase or decrease in parallel. A negative correlation indicates the extent to which one variable increases as the other decreases. The Karl Pearson’s correlation measures the degree of linear relationship between two variables...

See Our Article on this Topic:
What is Karl Pearson Correlation Analysis and How Can it be Used for Enterprise Analysis Needs?
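
The degree of linear relationship described above can be computed directly from the deviations of each variable around its mean. The data and function name in this sketch are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)

# Hypothetical data: units sold rise in step with store visits.
visits = [1, 2, 3, 4]
units_sold = [2, 4, 6, 8]

print(pearson_r(visits, units_sold))
```

The result always lies between -1 and 1, with values near either extreme indicating a strong linear relationship and values near 0 indicating little or none.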

Hierarchical Clustering is a process by which objects are classified into a number of groups so that they are as dissimilar as possible from one group to another and as similar as possible within each group...

See Our Article on this Topic:
What is Hierarchical Clustering and How Can an Organization Use it to Analyze Data?

SVM Classifications are based on the idea of finding a hyperplane that best divides a dataset into predefined classes, as shown in the image below. The goal is to choose a hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of new data being classified correctly...

See Our Article on this Topic:
What is SVM Classification Analysis and How Can It Benefit Business Analytics?

An outlier is an element of a data set that distinctly stands out from the rest of the data. In other words, outliers are those data points that lie outside the overall pattern of distribution, as shown in the figure below...

See Our Article on this Topic:
What is Outlier Analysis and How Can It Improve Analysis?
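
One common way to flag points that "stand out" is the interquartile range (IQR) rule, sketched below. The text above does not name a specific detection method, so the IQR rule, the 1.5 multiplier, the data and the function name are all assumptions of this example.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the middle 50% of the data."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily order counts with one unusual day.
daily_orders = [10, 12, 11, 13, 12, 11, 10, 95]

print(iqr_outliers(daily_orders))
```

Whether a flagged value is an error to be corrected or a genuine event worth investigating is a business judgment that the calculation alone cannot make.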

There are two basic types of decision tree analysis: Classification and Regression...

See Our Article on this Topic:
What is the Decision Tree Analysis and How Does it Help a Business to Analyze Data?

The Chi Square Test of Association is used to determine whether there is a statistically significant association between two categorical variables. This technique is used to determine whether a relationship exists between any two business parameters that are of categorical data type. One might use this technique to determine whether gender is related to a voting preference or whether the two data points are independent and unrelated. An enterprise might also use Chi Square to determine if there is a relationship between the region in which a product is purchased, and the product or category of product that is purchased...

See Our Article on this Topic:
What is the Chi Square Test of Association and How Can it be Used for Analysis?
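
The test compares the counts actually observed in a contingency table with the counts that would be expected if the two variables were unrelated. This sketch computes the statistic for a hypothetical region-by-product-category table; the counts and the function name are illustrative.

```python
def chi_square_statistic(table):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical purchases: rows are regions, columns are product categories.
purchases = [
    [30, 10],
    [20, 40],
]

print(chi_square_statistic(purchases))
```

For a 2 x 2 table there is one degree of freedom, and a statistic above roughly 3.84 is significant at the 5% level, so the toy table above would suggest that region and product category are associated.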

Frequent pattern mining (previously known as Association) is an analytical algorithm that is used by businesses and is accessible in some self-serve business intelligence solutions. The FP Growth analytical technique finds frequent patterns, associations, or causal structures from data sets in various kinds of databases such as relational databases, transactional databases, and other forms of data repositories...

See Our Article on this Topic:
What is FP Growth Analysis and How Can a Business Use Frequent Pattern Mining to Analyze Data?

Autoregressive Integrated Moving Average (ARIMA) predicts future values of a time series using a linear combination of its past values and a series of errors. This analytical forecasting method is suitable for instances when data is stationary/non stationary and is univariate, with any type of data pattern, i.e., level/trend/seasonality/cyclicity.

See Our Article on this Topic:
What is ARIMA Forecasting and How Can it Be Used for Enterprise Analysis?

Logistic regression measures the relationship between the categorical target variable and one or more independent variables. It deals with situations in which the outcome for a target variable can have two or more possible types. Logistic regression makes use of one or more predictor variables that can be either continuous or categorical and predicts the target variable classes. Logistic regression model output is helpful in identifying important factors that will impact the target variable and the nature of the relationship between each of these factors and the dependent variable...

See Our Article on this Topic:
What is the Multinomial-Logistic Regression Classification Algorithm and How Does One Use it for Analysis?

The KMeans Clustering algorithm is a process by which objects are classified into a number of groups so that they are as dissimilar as possible from one group to another, and as similar as possible within each group. KMeans Clustering is a grouping of similar things or data. For example, objects within group 1 (cluster 1) shown in the image below should be as similar as possible...

See Our Article on this Topic:
What is the KMeans Clustering Algorithm and How Does an Enterprise Use it to Analyze Data?
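
The grouping process can be sketched for one-dimensional data as a loop that alternates between assigning each value to its nearest cluster center and moving each center to the average of its members. The data, the starting centers, the fixed iteration count and the function name are assumptions of this example.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster numbers around the given starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster with the nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its members.
        centroids = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical purchase amounts forming two natural groups.
amounts = [1, 2, 3, 10, 11, 12]

centroids, clusters = kmeans_1d(amounts, centroids=[0, 5])
print(centroids, clusters)
```

Results depend on the starting centers and on the number of clusters chosen, so in practice several starting points are usually tried.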

Descriptive statistics helps users to describe and understand the features of a specific dataset, by providing short summaries and a graphic depiction of the measured data. There are numerous methods of descriptive statistics, including Mean, Median, and Mode methods of averaging data and percentile, quartile, skewness and standard deviation/variance measurements as well as plotting methods like box plots and histograms...

See Our Article on this Topic:
What is Descriptive Statistics and How Do You Choose the Right One for Enterprise Analysis?
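
Several of the summaries listed above are available in Python's standard `statistics` module, as this short sketch shows. The monthly sales figures are hypothetical.

```python
import statistics

# Hypothetical monthly sales figures with one unusually strong month.
monthly_sales = [12, 15, 15, 18, 20, 22, 45]

summary = {
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "mode": statistics.mode(monthly_sales),
    "standard deviation": statistics.stdev(monthly_sales),
    "quartiles": statistics.quantiles(monthly_sales, n=4),
}

for name, value in summary.items():
    print(name, value)
```

Here the mean (21) sits above the median (18) because the single large value pulls the average upward, which is one reason to look at more than one measure before drawing conclusions.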

The Holt-Winters algorithm is a time-series forecasting method. Time series forecasting methods are used to extract and analyze data and statistics, and to characterize the results, in order to predict the future more accurately based on historical data...

See Our Article on this Topic:
What is the Holt-Winters Forecasting Algorithm and How Can it be Used for Enterprise Analysis?

Gradient Boosting Regression is a statistical technique used to explore the relationship between two or more variables (Xi and Y). Analytical output identifies important factors (Xi) impacting the dependent variable (Y) and the nature of the relationship between each of these factors and the dependent variable. Gradient Boosting Regression is limited to predicting numeric output, so the dependent variable must be numeric in nature…

See Our Article on this Topic:
What is Gradient Boosting Regression and How is it Used for Enterprise Analysis? 

Random Forest Regression creates a set of Decision Trees from randomly selected subsets of the training set, and averages the values predicted by the different decision trees to decide the final target value. Random Forest Regression is limited to predicting numeric output, so the dependent variable must be numeric in nature…

See Our Article on this Topic:
What is Random Forest Regression and How Can it Help Your Business? 

 

Isotonic Regression is a statistical technique of fitting a free-form line to a sequence of observations such that the fitted line is non-decreasing (or non-increasing) everywhere, and lies as close to the observations as possible. Isotonic Regression is limited to predicting numeric output so the dependent variable must be numeric in nature…

See Our Article on this Topic:
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
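
A non-decreasing fit of this kind is commonly computed with the Pool Adjacent Violators approach: whenever a value breaks the required order, it is pooled with its neighbors and replaced by their average. The sketch below is a minimal, unweighted version; the observations and the function name are hypothetical.

```python
def isotonic_fit(y):
    """Non-decreasing least-squares fit to y using Pool Adjacent Violators."""
    blocks = []  # each block holds [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

# Hypothetical observations that rise overall but dip locally.
observations = [1, 3, 2, 4, 3, 5]

print(isotonic_fit(observations))
```

The two local dips (3 then 2, and 4 then 3) are each replaced by the average of the pair, giving a fitted sequence that never decreases while staying as close to the data as that constraint allows.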

 

Random Forest Classification is a machine learning technique utilizing the aggregated outcome of many decision tree classifiers in order to improve precision of the outcome. It measures the relationship between the categorical target variable and one or more independent variables.

See Our Article on this Topic:
What Is Random Forest Classification And How Can It Help Your Business?

 

Generalized Linear Regression with Gaussian Distribution is a statistical technique that offers a flexible generalization of ordinary linear regression, allowing for response variables whose error distribution models are not limited to the normal distribution. The Generalized Linear Model (GLM) generalizes linear regression by allowing the linear model to be related to the response variable via a link function, with the error distribution (in this case, the Gaussian distribution) specified separately, and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.

See Our Article on this Topic:
What Is Generalized Linear Regression with Gaussian Distribution And How Can An Enterprise Use This Technique To Analyze Data?

 

Multilayer Perceptron (MLP) is a type of feed-forward artificial neural network that uses a backpropagation learning method to classify the target variable in supervised learning. It consists of multiple layers with nonlinear activation, allowing it to distinguish data that is not linearly separable.

See Our Article on this Topic:
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise Analysis?

 


Browse our Learn and Explore page for more learning and educational opportunities, including Smarten workshops, which provide a foundation for Citizen Data Scientist transformation and define new and expanded roles within the organization. If you or your business are ready to take the Citizen Data Scientist journey, you can find more information on our FREE online Citizen Data Scientist course and one-day in-person or online instructor-led workshops.

Contact us now to find out more about our services and products.

Get started today with our self-paced FREE online Citizen Data Scientist course.
