# Machine Learning Comprehensive (Using R)

### Overview

Machine Learning is the basis for the most exciting careers in data analysis today. You’ll learn the models and methods and apply them to real-world situations, ranging from identifying trending news topics and building recommendation engines to ranking sports teams and plotting the path of movie zombies.

### Objective

The objective of this course is to give you an in-depth understanding of, and hands-on experience with, current business problems using Machine Learning classification and regression techniques.

### Prerequisites

• Basic knowledge of statistics and mathematics
• Basic programming knowledge

### What you'll learn

• Statistical concepts used in Machine Learning
• R programming
• Data pre-processing and exploratory data analysis
• Advanced regression and classification techniques
• Model evaluation techniques and choosing the best-fit model
• How to solve a real business problem using Machine Learning

### Course Outline

#### Introduction to Machine Learning

• Applications of Machine Learning in various fields
• How is Machine Learning different from traditional programming and reporting?
• Who are data scientists?
• What do they do, and what kinds of projects do they work on?

• Data types
• Continuous variable
• Ordinal variable
• Categorical variable
• Time series
• Miscellaneous
• Descriptive statistics
• Inferential statistics

#### Sampling

• What is sampling?
• Different types of sampling
• Simple random sampling
• Systematic sampling
• Stratified sampling

#### Data Distribution

• Normal Distribution
• Binomial Distribution
• Skewness

#### Measures of Central Tendency and Dispersion

• Mean, Median, Mode
• Variance and Standard Deviation

#### Normalization

• What is Normalization?
• Different Types of Normalization
• Z score

#### Hypothesis testing

• Null and alternative hypotheses
• Type I and Type II errors

#### Correlation

• What is Correlation?
• Correlation Coefficient
• Positive and Negative Correlation

#### Introduction to R

• An introduction to R programming.
• Types of objects in R
• Creating new variables or uploading existing variables
• If statements and for loops.
• String searching and manipulations.
• Reading data from data frames and text files.
• Casting and melting data to different formats.
• Merging datasets
• Filtering data using dplyr

#### Exploratory Data analysis and visualization

• Getting data into R – reading from files
• Linear Vs Non-Linear data
• Bi-variate and Multi-variate analysis
• Cleaning and preparing the data – converting data types (Character to numeric etc.)
• Handling missing values – Imputation or replacing with place holder values
• Visualization in R using ggplot2 (plots and charts) – histograms, bar charts, box plots, scatter plots
• Adding more dimensions to the plots – geom_*() layers, dodge, etc.
• Correlation – Positive, negative and no correlation
• Correlation vs causation
• Data transformation

#### Predictive Analytics

• Different types of predictive analytics – prediction, forecasting, etc.
• Supervised learning

#### Linear Regression

• Assumptions
• Model development and interpretation
• Model validation – tests to validate assumptions
• Multiple linear regression
• Disadvantages of linear models

#### Classification: Logistic Regression

• Need for logistic regression
• Model development and interpretation – Example
• Confusion matrix – error measurement
• ROC curve
• Measuring sensitivity and specificity
• Advantages and disadvantages of logistic regression models

#### Decision Trees - Classification and Regression Trees

• Process of tree building
• Entropy and Gini index
• The problem of overfitting
• Pruning a tree back
• Classification model development and validation – Example
• CART and CTREE – Example
• Advantages and disadvantages of tree-based models

#### KNN - K-Nearest Neighbours

• What is KNN?
• Model development and validation – Example

#### SVM - Support Vector Machines

• What is SVM?
• Maximum margin classifier
• SVM for Non-Linear data – Kernels
• Model development and validation – Example

#### Cross validation

• Different types of cross validation techniques

#### Ensemble Methods: Random Forest

• Bagging
• Random feature selection
• Hyperparameter tuning
• Model development and validation – Example

#### Ensemble Methods: XGBoost (GBM)

• Boosting – Gradient boosting machines
• Model development and validation – Example
• XGBoost – Extreme Gradient Boosting
• Model development and validation – Example

#### Unsupervised learning

• What is unsupervised learning?
• Distance measures and linkage criteria
• Cluster analysis
• Hierarchical clustering
• Model development and interpretation – Example
• Cluster dendrogram

#### K-means clustering

• Model development and interpretation – Example
• Choosing the optimal value of k (elbow, average silhouette, and gap statistic methods)

#### Principal component analysis

• Need for PCA (curse of dimensionality)
• Advantages of principal components
• Applications of PCA – Example

#### Model Validation and deployment

• Error measurement
• RMSE – Root mean squared error
• Area under the curve
• Cross validation
• Different types of cross validation techniques

#### Practical use cases and best practices

• Translating a business problem into an analytical problem
• Problem definition and analytical method selection
• Guidelines in model development

Course Id: A101