Chapter 13 – Instance-Based Learning
Chapter Outline
- Chapter Outline
- Quick Recap
- Basics of Instance-based Learning
- k-Nearest Neighbor (k-NN) Algorithm
- Applying k-NN Algorithm on Numeric Data – A Step by Step Example
- Applying k-NN Algorithm on Categorical Data – A Step by Step Example
- Chapter Summary
Quick Recap
- Quick Recap – Bayesian Learning
- Alhamdulillah, up till now we have studied the following Machine Learning Algorithms
- FIND-S Algorithm
- List Then Eliminate Algorithm
- Candidate Elimination Algorithm
- ID3 Algorithm
- Perceptron Algorithm (Two Layer ANN)
- Regular Neural Network (Multi-layer ANN)
- Major Problems
- Problem 1
- These ML Algorithms do not allow us to combine prior knowledge (or past experience) with our Training Examples, when learning a Concept
- Problem 2
- These ML Algorithms either accept or reject a Hypothesis (h) i.e. take a Binary Decision
- Instead of assigning a probability to each Hypothesis (h) in the Hypothesis Space (H)
- A Possible Solution
- Bayesian Learning Algorithms
- Naïve Bayes – Summary
- Representation of Training Examples (D)
- Attribute-Value Pair
- Representation of Hypothesis (h)
- Probability Distribution
- Searching Strategy
- I am not clear about the Search Strategy of the Naïve Bayes Algorithm. Please drop me an email if you know it.
- Training Regime
- Batch Method
- Strengths
- Naïve Bayes Algorithm allows us to combine prior knowledge (or past experience) with our Training Examples, when learning a Concept
- Naïve Bayes Algorithm assigns a probability to each Hypothesis (h) in the Hypothesis Space (H)
- Instead of simply accepting or rejecting a Hypothesis (h)
- Naïve Bayes Algorithm can handle noisy Data
- Naïve Bayes Algorithm shows good Accuracy on Testing Data even if the Set of Training Examples (D) is rather small
- Naïve Bayes Algorithm can avoid Overfitting
- Naïve Bayes Algorithm is used in Deep Learning Algorithms, which allows Deep Learning Algorithms to learn from small datasets / corpora
- Naïve Bayes Algorithm can make Probabilistic Predictions
- Instead of making a Binary Prediction i.e. 0 or 1
- Weaknesses
- Naive Bayes Algorithm assumes that Attributes / Features are independent of each other
- However, in Real world, Attributes / Features depend on each other
- Bayesian Learning Algorithms are Probabilistic Machine Learning Algorithms
- Bayesian Learning uses Bayes’ Theorem to determine the Conditional Probability of a Hypothesis (h) given a Set of Training Examples (D)
- Bayesian Learning Algorithms allow us to combine prior knowledge (or past experience) with our Set of Training Examples (D)
- Bayesian Learning Algorithms are suitable for following Machine Learning Problems
- When you want to combine prior knowledge (or past experience) with your Set of Training Examples (D)
- When you want to assign a probability to each Hypothesis (h) in the Hypothesis Space (H)
- Instead of simply accepting or rejecting a Hypothesis (h)
- When you have noisy Data
- When you have small Training Data
- In recent years, Bayesian Learning Algorithms have been used in Deep Learning Algorithms, which allows Deep Learning Algorithms to learn from small datasets / corpora
- When you want to make Probabilistic Predictions
- Instead of making a Binary Prediction i.e. 0 or 1
- A Brute Force Bayesian Learning Algorithm works as follows
- Step 1: Compute the Posterior Probability of every Hypothesis (h) in the Hypothesis Space (H)
- Step 2: Output the Most Probable Hypothesis (a.k.a. Maximum A Posteriori (MAP) Hypothesis, h_MAP)
- Bayes Optimal Classifier works as follows to calculate the Most Probable Classification of an unseen instance x
- Step 1: Calculate Posterior Probability for every Hypothesis h ∈ H
- Step 2: Calculate the prediction of each Hypothesis (h) for each new instance
- Step 3: Use the combination of Steps 1 and 2 to classify each new instance
- Note: the prediction returned by h_MAP(x) is not always the Most Probable Classification
- To reduce the computation cost in finding the Most Probable Classification using Bayes Optimal Classifier, the Gibbs Algorithm works as follows
- Step 1: Choose one Hypothesis (h) at random, according to P(h|D)
- Step 2: Use this h to classify new unseen instances
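To make the relationship between the MAP Hypothesis, the Bayes Optimal Classifier, and the Gibbs Algorithm concrete, here is a minimal Python sketch over a tiny, made-up hypothesis space (the posteriors and predictions below are illustrative, not taken from any dataset):

```python
import random

# Illustrative posteriors P(h|D) and predictions h(x) for a tiny Hypothesis Space
posteriors  = {"h1": 0.4, "h2": 0.3, "h3": 0.3}   # assumed already computed from D
predictions = {"h1": 1,   "h2": 0,   "h3": 0}     # each h classifies unseen instance x

# MAP Hypothesis: output the single most probable hypothesis and use its prediction
h_map = max(posteriors, key=posteriors.get)         # "h1"
map_prediction = predictions[h_map]                  # 1

# Bayes Optimal Classifier: weight every hypothesis's vote by its posterior
vote = {0: 0.0, 1: 0.0}
for h, p in posteriors.items():
    vote[predictions[h]] += p
bayes_optimal_prediction = max(vote, key=vote.get)   # 0 (weight 0.6 vs. 0.4)

# Gibbs Algorithm: sample one hypothesis according to P(h|D) and use it to classify x
h_gibbs = random.choices(list(posteriors), weights=list(posteriors.values()), k=1)[0]
gibbs_prediction = predictions[h_gibbs]
```

This tiny example also illustrates the note above: the prediction of h_MAP(x) need not be the Most Probable Classification.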
- Naive Bayes Algorithm is a highly practical Bayesian Learning Algorithm
- Performance comparable to Artificial Neural Networks and Decision Tree Learning Algorithms on some tasks
- Naïve Bayes is very good for
- Text Classification
Basics of Instance-based Learning
- Summary - Machine Learning Algorithms Studied So Far
- Machine Learning Algorithms studied so far are
- FIND-S Algorithm
- List Then Eliminate Algorithm
- Candidate Elimination Algorithm
- ID3 Algorithm (Decision Tree Learning)
- Perceptron (Two Layer Artificial Neural Network)
- Regular Neural Network (Multi-layer Artificial Neural Network)
- Naïve Bayes Algorithm (Bayesian Learning Algorithm)
- How do these ML Algorithms Work?
- Given
- Set of Training Examples (D)
- Set of Functions / Hypothesis (H)
- Training Phase
- Build an explicit representation of the Target Function (called approximated Target Function) by
- Searching Hypothesis Space (H) to find a Hypothesis h (approximated Target Function), which best fits the Set of Training Examples (D)
- Testing Phase
- Use the explicit representation (approximated Target Function) to
- classify unseen instances
- Approximated Target Function is applied on an unseen instance and it predicts the Target Classification
- Note
- These Machine Learning Algorithms are also called Eager Machine Learning Algorithms
- Instance-based Machine Learning Algorithms
- How do Instance-based ML Algorithms Work?
- Given
- Set of Training Examples (D)
- Training Phase
- Simply store the Set of Training Examples (D)
- Testing Phase
- To classify a new unseen instance x
- Step 1: Compare unseen instance x with all the stored Training Examples
- Step 2: Depending on relationship of unseen instance x with stored Training Examples
- Predict the Target Classification for unseen instance x
- Note
- These Machine Learning Algorithms are also called Lazy Machine Learning Algorithms
- Eager ML Algorithms vs. Lazy ML Algorithms
- Eager Machine Learning Algorithms
- Machine Learning Algorithms which build an explicit representation of the Target Function as soon as Training Examples are presented are called Eager Machine Learning Algorithms
- Lazy Machine Learning Algorithms
- Machine Learning Algorithms which defer processing until a new unseen instance must be classified are called Lazy Machine Learning Algorithms
- Basic Idea - Instance-based Classifiers
- Basic Idea
- If it has long hair like a Female and a head covered with a scarf like a Female, then it is probably a Female
K-Nearest Neighbor (k-NN) Algorithm
- K-Nearest Neighbor (k-NN) Algorithm
- k-NN Algorithm is the grand-daddy of Instance-based Machine Learning Algorithms
- Question
- What do we mean by k nearest neighbors in k-NN Algorithm?
- Answer
- The k nearest neighbors of an unseen instance x are the k Training Examples that have the smallest distances to x
- Example – k-Nearest Neighbors
- K-Nearest Neighbor (k-NN) Algorithm Cont…
- Representation of Training Examples in k-NN Algorithm
- Attribute-Value Pair
- Attribute Values can be
- Categorical
- Numeric
- Question
- How to Compute Relationship between unseen instance x and stored Training Examples (D)?
- Answer
- Situation 1
- Attribute Values are Categorical
- Similarity / Distance (or relationship) between unseen instance x and stored Training Examples can be computed using following measures
- Jaccard Similarity Co-efficient
- Hamming Distance etc.
- Situation 2
- Attribute Values are Numeric
- Similarity / Distance (relationship) between unseen instance x and stored Training Examples can be computed using following measures
- Euclidean Distance
- Cosine Similarity etc.
- Note
- For normalized measures: Similarity = 1 – Distance
- Equivalently: Distance = 1 – Similarity
- k-Nearest Neighbor (k-NN) Algorithm requires three things
- Set of Training Examples (D)
- Distance Metric to compute distance between unseen instance x and Set of stored Training Examples
- Value of k
- Number of Nearest Neighbors to consider when classifying unseen instance x
- In k-Nearest Neighbor (k-NN) Algorithm, the value of k should be
- Carefully chosen
- If k is too small
- Sensitive to noise points (Training Examples)
- If k is too large
- Neighborhood may include points (Training Examples) from other Classes
- Given
- Set of 15 Training Examples
- Distance Metric = Euclidean Distance
- k = 3
- Steps – Classify an unseen instance x using k-NN Algorithm (a code sketch of these steps is given after this list)
- Step 1: Compute Euclidean Distance between unseen instance x and all 15 Training Examples
- Step 2: Identify 3-nearest neighbors (as k = 3)
- Step 3: Use Class labels of k nearest neighbors to determine the Class label of unseen instance x (e.g., by taking Majority Vote)
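A minimal Python sketch of these three steps, assuming numeric (already scaled) Attribute Values; the training examples, labels, and the unseen instance in the usage lines are made up for illustration:

```python
import math
from collections import Counter

def euclidean_distance(a, b):
    """Distance between two numeric feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(training_examples, x, k=3):
    """training_examples: list of (feature_vector, class_label) pairs; x: unseen instance."""
    # Step 1: compute the distance between x and every stored Training Example
    distances = [(euclidean_distance(features, x), label)
                 for features, label in training_examples]
    # Step 2: identify the k nearest neighbors
    neighbors = sorted(distances, key=lambda d: d[0])[:k]
    # Step 3: majority vote over the class labels of the k neighbors
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Illustrative usage (feature values and labels are made up):
D = [([0.1, 0.2], "A"), ([0.9, 0.8], "B"), ([0.2, 0.1], "A")]
print(knn_classify(D, [0.15, 0.15], k=3))   # "A" by a 2-to-1 majority vote
```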
- Scaling Issue – k-NN Algorithm
- Scaling Issue
- Attributes may have to be scaled to prevent distance measures from being dominated by one of the Attributes
- Example – Scaling Issue
- Height of a person may vary from
- 1.5m to 1.8m
- Weight of a person may vary from
- 90lb to 300lb
- Income of a person may vary from
- Rs. 10,000 to Rs. 1,000,000
Applying k-NN Algorithm on Numeric Data – A Step by Step Example
- Gender Identification Problem
- Machine Learning Problem
- Gender Identification
- Input
- Human
- Output
- Gender of a Human
- Task
- Given a Human (Input), predict the Gender of the Human (Output)
- Treated as
- Instance-based Learning Problem
- Representation of Examples
- Example = Input + Output
- Input
- Human
- Output
- Gender
- Representation of Input and Output
- Attribute-Value Pair
- Representation of Input
- Input is represented as a
- Set of 3 Input Attributes
- Height
- Weight
- Salary
- Representation of Output
- Output is represented as a
- Set of 1 Output Attribute
- Gender
- Sample Data
- We obtained a Sample Data of 7 Examples
- Scaling Issue in Sample Data
- In our Sample Data
- Values of Salary Attribute are very high compared to other two Attributes (Height and Weight)
- High value of Salary Attribute will dominate other two Attributes in computing distance
- Therefore, we need to scale / normalize values of all three Attributes in a
- Range of [0 – 1]
- Question
- How to scale / normalize Attribute Values?
- Answer
- There are many possible solutions
- A simple and widely used approach is
- Min-Max Scaling
- Min-Max Scaling Approach
- Min-Max Scaling Approach scales / normalizes a value in the
- Range of [0 – 1]
- Formula: Xnorm = (X - Xmin) / (Xmax - Xmin)
- Where X is the Value of Attribute (A) for an instance
- Xmin is the minimum Value of Attribute (A) among all instances in Sample Data
- Xmax is the maximum Value of Attribute (A) among all instances in Sample Data
- Xnorm is the scaled / normalized value of Attribute (A) for an instance
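A one-line implementation of this formula (the function name min_max_scale is my own; the sample call reproduces the Height computation for Example E1 shown next):

```python
def min_max_scale(x, x_min, x_max):
    """Scale a raw Attribute Value x into the range [0, 1]."""
    return (x - x_min) / (x_max - x_min)

print(min_max_scale(6.0, 4.8, 6.0))   # Height of Example E1 -> 1.00
```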
- Applying Min-Max Scaling on the Height Attribute
- Heightmin = 4.8
- Heightmax = 6.0
- Scaling / Normalizing Value of Height for Example E1
- Xnorm = (6.0 - 4.8) / (6.0 - 4.8) = 1.2 / 1.2 = 1.00
- Scaling / Normalizing Value of Height for Example E2
- Xnorm = (4.8 - 4.8) / (6.0 - 4.8) = 0 / 1.2 = 0.00
- Scaling / Normalizing Value of Height for Example E3
- Xnorm = (5.2 - 4.8) / (6.0 - 4.8) = 0.4 / 1.2 = 0.33
- Scaling / Normalizing Value of Height for Example E4
- Xnorm = (5.3 - 4.8) / (6.0 - 4.8) = 0.5 / 1.2 = 0.42
- Scaling / Normalizing Value of Height for Example E5
- Xnorm = (5.0 - 4.8) / (6.0 - 4.8) = 0.2 / 1.2 = 0.16
- Scaling / Normalizing Value of Height for Example E6
- Xnorm = (5.7 - 4.8) / (6.0 - 4.8) = 0.9 / 1.2 = 0.75
- Scaling / Normalizing Value of Height for Example E7
- Xnorm = (5.0 - 4.8) / (6.0 - 4.8) = 0.2 / 1.2 = 0.16
- Applying Min-Max Scaling on the Weight Attribute
- Weightmin = 55
- Weightmax = 150
- Scaling / Normalizing Value of Weight for Example E1
- Xnorm = (100 - 55) / (150 - 55) = 45 / 95 = 0.47
- Scaling / Normalizing Value of Weight for Example E2
- Xnorm = (70 - 55) / (150 - 55) = 15 / 95 = 0.16
- Scaling / Normalizing Value of Weight for Example E3
- Xnorm = (80 - 55) / (150 - 55) = 25 / 95 = 0.26
- Scaling / Normalizing Value of Weight for Example E4
- Xnorm = (150 - 55) / (150 - 55) = 95 / 95 = 1.00
- Scaling / Normalizing Value of Weight for Example E5
- Xnorm = (75 - 55) / (150 - 55) = 20 / 95 = 0.21
- Scaling / Normalizing Value of Weight for Example E6
- Xnorm = (125 - 55) / (150 - 55) = 70 / 95 = 0.74
- Scaling / Normalizing Value of Weight for Example E7
- Xnorm = (55 - 55) / (150 - 55) = 0 / 95 = 0.00
- Applying Min-Max Scaling on the Salary Attribute
- Salarymin = 20000
- Salarymax = 100000
- Scaling / Normalizing Value of Salary for Example E1
- Xnorm = (20000 - 20000) / (100000 - 20000) = 0 / 80000 = 0.00
- Scaling / Normalizing Value of Salary for Example E2
- Xnorm = (30000 - 20000) / (100000 - 20000) = 10000 / 80000 = 0.12
- Scaling / Normalizing Value of Salary for Example E3
- Xnorm = (80000 - 20000) / (100000 - 20000) = 60000 / 80000 = 0.75
- Scaling / Normalizing Value of Salary for Example E4
- Xnorm = (100000 - 20000) / (100000 - 20000) = 80000 / 80000 = 1.00
- Scaling / Normalizing Value of Salary for Example E5
- Xnorm = (50000 - 20000) / (100000 - 20000) = 30000 / 80000 = 0.37
- Scaling / Normalizing Value of Salary for Example E6
- Xnorm = (40000 - 20000) / (100000 - 20000) = 20000 / 80000 = 0.25
- Scaling / Normalizing Value of Salary for Example E7
- Xnorm = (60000 - 20000) / (100000 - 20000) = 40000 / 80000 = 0.50
- Sample Data after Scaling / Normalization
- Note
- Values of all three Attributes are scaled / normalized in a
- Range of [0 – 1]
- Split the Sample Data
- We split the Sample Data using Random Split Approach into
- Training Data – 2 / 3 of Sample Data
- Testing Data – 1 / 3 of Sample Data
- Training Data
- Testing Data
- Machine Learning Cycle
- Four phases of a Machine Learning Cycle are
- Training Phase
- Build the Model using Training Data
- Testing Phase
- Evaluate the performance of Model using Testing Data
- Application Phase
- Deploy the Model in Real-world, to make prediction on Real-time Unseen Data
- Feedback Phase
- Take Feedback from the Users and Domain Experts to improve the Model
- Training Phase
- In the Training Phase, k-NN Algorithm will
- Store the Set of 5 Training Examples
- Recall
- Training Data
- Model
- Stored Set of 5 Training Examples
- In the next Phase i.e. Testing Phase, Insha Allah, we will
- Evaluate the performance of the Model
- Testing Phase
- Question
- How well has the Model learned?
- Answer
- Evaluate the performance of the model on unseen data (or Testing Data)
- Evaluation Measures
- Evaluation will be carried out using
- Error measure
- Error
- Definition
- Error is defined as the proportion of incorrectly classified Test instances
- Formula: Error = (Number of incorrectly classified Test instances) / (Total Number of Test instances)
- Note
- Accuracy = 1 – Error
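A small sketch of this Error / Accuracy computation (the predicted and true labels below are placeholders, not the actual Test Data):

```python
def error_rate(predicted_labels, true_labels):
    """Proportion of incorrectly classified Test instances."""
    wrong = sum(p != t for p, t in zip(predicted_labels, true_labels))
    return wrong / len(true_labels)

predicted = ["Male", "Female"]        # placeholder predictions for 2 Test Examples
actual    = ["Male", "Female"]        # placeholder true labels
err = error_rate(predicted, actual)   # 0.0
print(err, 1 - err)                   # Error and Accuracy
```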
- Evaluate Model
- Given
- Model
- Set of 5 Training Examples
- Distance Measure
- Euclidean Distance
- k Nearest Neighbors
- k = 3
- Classify Test Examples (unseen instances) using k-NN Algorithm
- For each Test Example (unseen instance) E
- Step 1: Compute Euclidean Distance between unseen instance E and all 5 Training Examples
- Step 2: Identify 3-nearest neighbors (as k = 3)
- Step 3: Use Class labels of 3 nearest neighbors to determine the Class label of unseen instance E (e.g., by taking Majority Vote)
- Euclidean Distance
- Definition
- The Euclidean Distance between two Points (Examples) is the square root of the sum of squared differences between their corresponding Attribute Values
- Formula: d(p, q) = sqrt( Σᵢ (pᵢ - qᵢ)² )
- Where p represents Point 01 (Example 01)
- q represents Point 02 (Example 02)
- Example 1 – Computing Euclidean Distance for Unnormalized Data
- Consider the following two (Unnormalized) Training Examples
- Euclidean Distance between Training Examples E1 and E2 is :
- Example 2 – Computing Euclidean Distance for Normalized Data
- Consider the following two (Normalized) Training Examples
- Euclidean Distance between Instances E1 and E2 is
- Comparison of Normalized and Unnormalized Data
- Considering the previous two Examples
- Euclidean Distance for Unnormalized Data
- 10000.04
- Euclidean Distance for Normalized Data
- 1.05
- Conclusion
- Normalization / Scaling has a significant impact on the
- Calculation of Euclidean Distance
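The two distances above can be reproduced with a few lines of Python; the raw and normalized values of E1 and E2 below are taken from the scaling computations earlier in this example:

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# E1 and E2 in original units (Height, Weight, Salary) and after Min-Max scaling
e1_raw,  e2_raw  = [6.0, 100, 20000], [4.8, 70, 30000]
e1_norm, e2_norm = [1.00, 0.47, 0.00], [0.00, 0.16, 0.12]

print(euclidean_distance(e1_raw, e2_raw))    # ~10000.05 (reported as 10000.04 above): dominated by Salary
print(euclidean_distance(e1_norm, e2_norm))  # ~1.05: all three Attributes contribute
```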
- Predicting Class of Test Example E6 using k-NN Algorithm
- Step 1: Compute Euclidean Distance between Test Example E6 and all 5 Training Examples
- Computing Euclidean Distance between Training Example E1 and Test Example E6
- Euclidean Distance between Training Example E1 and Test Example E6
- Computing Euclidean Distance between Training Example E2 and Test Example E6
- Euclidean Distance between Training Example E2 and Test Example E6
- Computing Euclidean Distance between Training Example E3 and Test Example E6
- Euclidean Distance between Training Example E3 and Test Example E6
- Computing Euclidean Distance between Training Example E4 and Test Example E6
- Euclidean Distance between Training Example E4 and Test Example E6
- Computing Euclidean Distance between Training Example E5 and Test Example E6
- Euclidean Distance between Training Example E5 and Test Example E6
- Step 2: Identify k-Nearest Neighbors
- Identify 3-nearest neighbors (as k = 3)
- Distance of Test Example E6 with all 5 Training Examples
- d (E6, E1) = 0.44
- d (E6, E2) = 0.95
- d (E6, E3) = 0.81
- d (E6, E4) = 0.48
- d (E6, E5) = 0.80
- 3-Nearest Neighbors
- Male (Votes) = 2
- Female (Votes) = 1
- Prediction of Test Example E6 (using Majority Vote)
- Male
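A short sketch of Steps 2 and 3 for Test Example E6, using the five distances listed above; the class labels of E1–E5 are not listed explicitly in the text, so the assignment below is an assumption chosen to be consistent with the reported vote counts:

```python
from collections import Counter

# Step 1 output: distances of Test Example E6 to the 5 stored Training Examples
distances = {"E1": 0.44, "E2": 0.95, "E3": 0.81, "E4": 0.48, "E5": 0.80}
# Assumed class labels, inferred from the vote counts reported in this chapter
labels = {"E1": "Male", "E2": "Female", "E3": "Female", "E4": "Male", "E5": "Female"}

# Step 2: identify the 3 nearest neighbors (k = 3)
neighbors = sorted(distances, key=distances.get)[:3]   # ['E1', 'E4', 'E5']

# Step 3: majority vote over their class labels
votes = Counter(labels[e] for e in neighbors)          # Male: 2, Female: 1
print(votes.most_common(1)[0][0])                      # Male
```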
- Remember
- The best way to learn anything is to
- Do It Yourself 😊
- Whenever you want to learn any concept, Learn it
- Step by Step
- Simple to Complex
- Using a Detailed Example
- Predicting Class of Test Example E7 using k-NN Algorithm
- Step 1: Compute Euclidean Distance between Test Example E7 and all 5 Training Examples
- Computing Euclidean Distance between Training Example E1 and Test Example E7
- Euclidean Distance between Training Example E1 and Test Example E7
- Computing Euclidean Distance between Training Example E2 and Test Example E7
- Euclidean Distance between Training Example E2 and Test Example E7
- Computing Euclidean Distance between Training Example E3 and Test Example E7
- Euclidean Distance between Training Example E3 and Test Example E7
- Computing Euclidean Distance between Training Example E4 and Test Example E7
- Euclidean Distance between Training Example E4 and Test Example E7
- Computing Euclidean Distance between Training Example E5 and Test Example E7
- Euclidean Distance between Training Example E5 and Test Example E7
- Step 2: Identify k-Nearest Neighbors
- Identify 3-nearest neighbors (as k = 3)
- Distance of Test Example E7 with all 5 Training Examples
- d (E7, E1) = 1.08
- d (E7, E2) = 0.43
- d (E7, E3) = 0.39
- d (E7, E4) = 1.26
- d (E7, E5) = 0.24
- 3-Nearest Neighbors
- Male (Votes) = 0
- Female (Votes) = 3
- Prediction of Test Example E7 (using Majority Vote)
- Female
- Computing Error
- Application Phase
- We assume that our Model
- performed well on large Test Data and can be deployed in Real-world
- Model is deployed in the Real-world and now we can make
- predictions on Real-time Data
- Steps – Making Predictions on Real-time Data (a code sketch of these steps is given after this list)
- Step 1: Take Input from User
- Step 2: Convert User Input into Feature Vector
- Exactly same as Feature Vectors of Training and Testing Data
- Step 3: Convert User Input into Normalized Feature Vector
- Step 4: Compute Euclidean Distance between Unseen Example and all 5 Training Examples
- Step 5: Identify k Nearest Neighbors
- Step 6: Use Class labels of k nearest neighbors to determine the Class label of unseen instance E (e.g., by taking Majority Vote)
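A sketch of the whole Application Phase pipeline under the assumptions of this example (Min-Max bounds taken from the Training Data, Euclidean Distance, k = 3); the function and variable names are my own:

```python
import math
from collections import Counter

# Assumed Min/Max values of each Attribute, taken from the Training Data used above
MINMAX = {"Height": (4.8, 6.0), "Weight": (55, 150), "Salary": (20000, 100000)}

def normalize(raw):
    """Steps 2-3: turn a raw user-input dict into a normalized feature vector."""
    return [(raw[a] - lo) / (hi - lo) for a, (lo, hi) in MINMAX.items()]

def predict(stored_examples, raw_input, k=3):
    """stored_examples: list of (normalized_feature_vector, class_label) pairs (the Model)."""
    x = normalize(raw_input)
    # Step 4: Euclidean Distance between the unseen example and every stored Training Example
    dist = [(math.sqrt(sum((a - b) ** 2 for a, b in zip(v, x))), y)
            for v, y in stored_examples]
    # Step 5: identify the k nearest neighbors; Step 6: majority vote over their labels
    neighbors = sorted(dist, key=lambda d: d[0])[:k]
    return Counter(y for _, y in neighbors).most_common(1)[0][0]

# Step 1: user input, in the same Attribute order as the Training and Testing Examples
user = {"Height": 5.2, "Weight": 70, "Salary": 30000}
print(normalize(user))   # ~ [0.33, 0.16, 0.12]
```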
- Making Predictions on Real-time Data
- Step 1: Take input from User
- Step 2: Convert User Input into Feature Vector
- Note that the order of Attributes must be exactly the same as that of the Training and Testing Examples
- Step 3: Convert User Input into Normalized Feature Vector
- Note that the order of Attributes must be exactly the same as that of the Training and Testing Examples
- Scaling / Normalizing Value of Height for Unseen Example
- Xnorm = (5.2 - 4.8) / (6.0 - 4.8) = 0.4 / 1.2 = 0.33
- Scaling / Normalizing Value of Weight for Unseen Example
- Xnorm = (70 - 55) / (150 - 55) = 15 / 95 = 0.16
- Scaling / Normalizing Value of Salary for Unseen Example
- Xnorm = (30000 - 20000) / (100000 - 20000) = 10000 / 80000 = 0.12
- Step 4: Compute Euclidean Distance between Unseen Example and all 5 Training Examples
- Computing Euclidean Distance between Training Example E1 and Unseen Example
- Euclidean Distance between Training Example E1 and Unseen Example
- Computing Euclidean Distance between Training Example E2 and Unseen Example
- Euclidean Distance between Training Example E2 and Unseen Example
- Computing Euclidean Distance between Training Example E3 and Unseen Example
- Euclidean Distance between Training Example E3 and Unseen Example
- Computing Euclidean Distance between Training Example E4 and Unseen Example
- Euclidean Distance between Training Example E4 and Unseen Example
- Computing Euclidean Distance between Training Example E5 and Unseen Example
- Euclidean Distance between Training Example E5 and Unseen Example
- Step 5: Identify k Nearest Neighbors
- Identify 3-nearest neighbors (as k = 3)
- Distance of Unseen Example with all 5 Training Examples
- d (E, E1) = 0.74
- d (E, E2) = 0.34
- d (E, E3) = 0.63
- d (E, E4) = 0.85
- d (E, E5) = 0.31
- 3-Nearest Neighbors
- Male (Vote) = 0
- Female (Vote) = 3
- Step 6: Use Class labels of k nearest neighbors to determine the Class label of unseen instance E (e.g., by taking Majority Vote)
- Unseen Example E is predicted as
- Female
- Note
- You can take Input from user, apply Model and return predictions as many times as you like 😊
- Feedback Phase
- Only Allah is Perfect 😊
- Take Feedback on your deployed Model from
- Domain Experts and
- Users
- Improve your Model based on Feedback 😊
TODO and Your Turn
Todo Tasks
Your Turn Tasks
Todo Tasks
TODO Task 1
- Task 1
- Consider the Titanic Dataset with the following Attributes
- Gender (Numeric)
- Ticket Class (Numeric)
- Parent/Child Aboard (Numeric)
- Annual Income (Numeric)
- Survival: No, Yes
- We obtained the following Sample Data
- Sample Data was split into Training and Testing Data in a
- Train-Test Split Ratio of 67%-33%
- Training Data
- Testing Data
- Note
- Consider k-NN Algorithm when answering questions given below
- Your answer should be
- Well Justified
- Questions
- Write down the Input and Output for the above Machine Learning Problem?
- How is a Training Example represented?
- Execute the Machine Learning Cycle on the Normalized Sample Data using Euclidean Distance and Min-Max Scaling?
- Classify Test Examples using k = 1, 3 and 5
- What is the impact of changing the value of k on classification of Test Examples?
- Execute the Machine Learning Cycle on the Normalized Sample Data using the Cosine Similarity Measure and Min-Max Scaling?
- Classify Test Examples using k = 1, 3 and 5
- What is the impact of changing the value of k on classification of Test Examples?
- Compare results obtained (on Test Data) using Cosine Similarity Measure and Euclidean Distance?
Your Turn Tasks
Your Turn Task 1
- Task 1
- Select a Machine Learning Problem (similar to: Titanic – Machine Learning from Disaster) and answer the questions given below.
- Note
- Consider k-NN Algorithm in answering all the questions.
- Questions
- Write down the Input and Output for the above Machine Learning Problem?
- How is a Training Example represented?
- Execute the Machine Learning Cycle on the Normalized Sample Data using Euclidean Distance and Min-Max Scaling?
- Classify Test Examples using k = 1, 3 and 5
- What is the impact of changing the value of k on classification of Test Examples?
- Execute the Machine Learning Cycle on the Normalized Sample Data using the Cosine Similarity Measure and Min-Max Scaling?
- Classify Test Examples using k = 1, 3 and 5
- What is the impact of changing the value of k on classification of Test Examples?
- Compare results obtained (on Test Data) using Cosine Similarity Measure and Euclidean Distance?
Applying k-NN Algorithm on Categorical Data – A Step by Step Example
- Gender Identification Problem
- Machine Learning Problem
- Gender Identification
- Input
- Human
- Output
- Gender of a Human
- Task
- Given a Human (Input), predict the Gender of the Human (Output)
- Treated as
- Instance-based Learning Problem
- Representation of Examples
- Example = Input + Output
- Input
- Human
- Output
- Gender
- Representation of Input and Output
- Attribute-Value Pair
- Representation of Input
- Input is represented as a
- Set of 6 Input Attributes
- Height
- Weight
- HairLength
- HeadCovered
- WearingChain
- ShirtSleeves
- Representation of Output
- Output is represented as a
- Set of 1 Output Attribute
- Gender
- TIP
- To become a great person
- Always respect
- Children
- Women
- Old People
- Ulama-e-Karam
- Sample Data
- We obtained a Set of 7 Examples
- Split the Sample Data
- We split the Sample Data using Random Split Approach into
- Training Data – 2 / 3 of Sample Data
- Testing Data – 1 / 3 of Sample Data
- Training Data
- Testing Data
- Machine Learning Cycle
- Four phases of a Machine Learning Cycle are
- Training Phase
- Build the Model using Training Data
- Testing Phase
- Evaluate the performance of Model using Testing Data
- Application Phase
- Deploy the Model in Real-world, to make prediction on Real-time Unseen Data
- Feedback Phase
- Take Feedback from the Users and Domain Experts to improve the Model
- Training Phase
- In the Training Phase, k-NN Algorithm will
- Store the Set of 5 Training Examples
- Recall
- Training Data
- Model
- Stored Set of 5 Training Examples
- In the next Phase i.e. Testing Phase, Insha Allah, we will
- Evaluate the performance of the Model
- Testing Phase
- Question
- How well has the Model learned?
- Answer
- Evaluate the performance of the model on unseen data (or Testing Data)
- Evaluation Measures
- Evaluation will be carried out using
- Error measure
- Error
- Definition
- Error is defined as the proportion of incorrectly classified Test instances
- Formula: Error = (Number of incorrectly classified Test instances) / (Total Number of Test instances)
- Note
- Accuracy = 1 – Error
- Evaluate Model
- Given
- Model
- Set of 5 Training Examples
- Distance Measure
- Hamming Distance
- k Nearest Neighbors
- k = 3
- Classify Test Examples (unseen instances) using k-NN Algorithm
- For each Test Example (unseen instance) E
- Step 1: Compute Hamming Distance between unseen instance E and all 5 Training Examples
- Step 2: Identify 3-nearest neighbors (as k = 3)
- Step 3: Use Class labels of 3 nearest neighbors to determine the Class label of unseen instance E (e.g., by taking Majority Vote)
- Hamming Distance
- Definition
- Hamming Distance between two instances with the same number of Attributes is the number of Attributes for which
- the corresponding Attribute Values of the two instances are different
- Example
- Hamming Distance between Instances E1 and E2 is
- d (E1, E2) = 1
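A minimal Hamming Distance implementation; the two categorical feature vectors in the usage lines are made up, but they differ in exactly one Attribute, reproducing d(E1, E2) = 1 from the example above:

```python
def hamming_distance(a, b):
    """Number of Attribute positions at which two instances of equal length differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

# Made-up categorical feature vectors that differ in exactly one Attribute
e1 = ["Short", "Light", "Long", "Yes", "Yes", "Full"]
e2 = ["Short", "Light", "Long", "Yes", "Yes", "Half"]
print(hamming_distance(e1, e2))   # 1
```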
- Predicting Class of Test Example E6 using k-NN Algorithm
- Step 1: Compute Hamming Distance between Test Example E6 and all 5 Training Examples
- Computing Hamming Distance between Training Example E1 and Test Example E6
- Hamming Distance between Training Example E1 and Test Example E6
- d (E6, E1) = 3
- Computing Hamming Distance between Training Example E2 and Test Example E6
- Hamming Distance between Training Example E2 and Test Example E6
- d (E6, E2) = 4
- Computing Hamming Distance between Training Example E3 and Test Example E6
- Hamming Distance between Training Example E3 and Test Example E6
- d (E6, E3) = 3
- Computing Hamming Distance between Training Example E4 and Test Example E6
- Hamming Distance between Training Example E4 and Test Example E6
- d (E6, E4) = 4
- Computing Hamming Distance between Training Example E5 and Test Example E6
- Hamming Distance between Training Example E5 and Test Example E6
- d (E6, E5) = 4
- Step 2: Identify k-Nearest Neighbors
- Identify 3-nearest neighbors (as k = 3)
- Distance of Test Example E6 with all 5 Training Examples
- d (E6, E1) = 3
- d (E6, E2) = 4
- d (E6, E3) = 3
- d (E6, E4) = 4
- d (E6, E5) = 4
- 3-Nearest Neighbors
- Male (Vote) = 1
- Female (Vote) = 2
- Step 3: Use Class labels of 3 nearest neighbors to determine the Class label of unseen instance E (e.g., by taking Majority Vote)
- Prediction of Test Example E6 (using Majority Vote)
- Female
- Remember
- The best way to learn anything is to
- Do It Yourself 😊
- Whenever you want to learn any concept, Learn it
- Step by Step
- Simple to Complex
- Using a Detailed Example
- Predicting Class of Test Example E7 using k-NN Algorithm
- Step 1: Compute Hamming Distance between Test Example E7 and all 5 Training Examples
- Computing Hamming Distance between Training Example E1 and Test Example E7
- Hamming Distance between Training Example E1 and Test Example E7
- d (E7, E1) = 3
- Computing Hamming Distance between Training Example E2 and Test Example E7
- Hamming Distance between Training Example E2 and Test Example E7
- d (E7, E2) = 2
- Computing Hamming Distance between Training Example E3 and Test Example E7
- Hamming Distance between Training Example E3 and Test Example E7
- d (E7, E3) = 4
- Computing Hamming Distance between Training Example E4 and Test Example E7
- Hamming Distance between Training Example E4 and Test Example E7
- d (E7, E4) = 2
- Computing Hamming Distance between Training Example E5 and Test Example E7
- Hamming Distance between Training Example E5 and Test Example E7
- d (E7, E5) = 4
- Step 2: Identify k-Nearest Neighbors
- Identify 3-nearest neighbors (as k = 3)
- Distance of Test Example E7 with all 5 Training Examples
- d (E7, E1) = 3
- d (E7, E2) = 2
- d (E7, E3) = 3
- d (E7, E4) = 2
- d (E7, E5) = 4
- 3-Nearest Neighbors
- Male (Vote) = 0
- Female (Vote) = 3
- Step 3: Use Class labels of 3 nearest neighbors to determine the Class label of unseen instance E (e.g., by taking Majority Vote)
- Prediction of Test Example E7 (using Majority Vote)
- Female
- Computing Error
- Application Phase
- We assume that our Model
- performed well on large Test Data and can be deployed in Real-world
- Model is deployed in the Real-world and now we can make
- predictions on Real-time Data
- Steps – Making Predictions on Real-time Data
- Step 1: Take Input from User
- Step 2: Convert User Input into Feature Vector
- Exactly same as Feature Vectors of Training and Testing Data
- Step 3: Compute Hamming Distance between Unseen Example and all 5 Training Examples
- Step 4: Identify k Nearest Neighbors
- Step 5: Use Class labels of k nearest neighbors to determine the Class label of unseen instance E (e.g., by taking Majority Vote)
- Making Predictions on Real-time Data
- Step 1: Take input from User
- Step 2: Convert User Input into Feature Vector
- Note that the order of Attributes must be exactly the same as that of the Training and Testing Examples
- Step 3: Compute Hamming Distance between Unseen Example and all 5 Training Examples
- Computing Hamming Distance between Training Example E1 and Unseen Example
- Hamming Distance between Training Example E1 and Unseen Example
- d (E, E1) = 1
- Computing Hamming Distance between Training Example E2 and Unseen Example
- Hamming Distance between Training Example E2 and Unseen Example
- d (E, E2) = 0
- Computing Hamming Distance between Training Example E3 and Unseen Example
- Hamming Distance between Training Example E3 and Unseen Example
- d (E, E3) = 2
- Computing Hamming Distance between Training Example E4 and Unseen Example
- Hamming Distance between Training Example E4 and Unseen Example
- d (E, E4) = 2
- Computing Hamming Distance between Training Example E5 and Unseen Example
- Hamming Distance between Training Example E5 and Unseen Example
- d (E, E5) = 2
- Step 4: Identify k-Nearest Neighbors
- Identify 3-nearest neighbors (as k = 3)
- Distance of Unseen Example with all 5 Training Examples
- d (E, E1) = 1
- d (E, E2) = 0
- d (E, E3) = 2
- d (E, E4) = 2
- d (E, E5) = 2
- 3-Nearest Neighbors
- Male (Vote) = 1
- Female (Vote) = 2
- Step 5: Use Class labels of 3 nearest neighbors to determine the Class label of unseen instance E (e.g., by taking Majority Vote)
- Unseen example is predicted as
- Female
- Note
- You can take Input from user, apply Model and return predictions as many times as you like 😊
- Feedback Phase
- Only Allah is Perfect 😊
- Take Feedback on your deployed Model from
- Domain Experts and
- Users
- Improve your Model based on Feedback 😊
- Strengths and Weaknesses – k-NN Algorithm
- Strengths
- Very simple implementation
- Robust with respect to the search space
- For example, Classes don’t have to be linearly separable
- k-NN Algorithm can be easily updated with new Training Examples (Online Learning) at very little cost
- Need to tune only two main parameters
- Distance Metric and
- Value of k
- Weaknesses
- Classifying each test instance is expensive, since distances to all stored Training Examples must be computed
- Possible Solution
- R-tree or kd-tree approaches can be used to speed up the identification of k nearest neighbors
- Sensitive to noisy or irrelevant Attributes, which can result in less meaningful Distance scores
- Possible Solution
- Scaling and/or Feature Selection can be used to resolve this issue
- Sensitive to highly imbalanced datasets
- Possible Solution
- Use a balanced dataset for k-NN Algorithm
- Offline Learning vs. Online Learning
- Offline Learning
- You learn a Concept from a static dataset
- Online Learning
- You learn a Concept from data and keep on updating your Hypothesis (h) as more data is available for the same Concept
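Because a Lazy learner keeps no explicit model, Online Learning for k-NN amounts to appending new Training Examples to the stored set; a tiny sketch (the stored values below are illustrative):

```python
# For a Lazy learner such as k-NN, the "Model" is just the stored Training Examples,
# so Online Learning is simply appending new examples as they arrive.
stored_examples = [([0.75, 0.74, 0.25], "Male")]   # existing stored Training Examples

def add_training_example(features, label):
    """Incorporate a new Training Example at almost no cost: no retraining, just store it."""
    stored_examples.append((features, label))

add_training_example([0.16, 0.00, 0.50], "Female")
print(len(stored_examples))   # 2
```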
TODO and Your Turn
Todo Tasks
Your Turn Tasks
Todo Tasks
TODO Task 2
- Task 1
- Consider the Titanic Dataset with the following Attributes
- Gender: Male, Female
- Ticket Class: Upper, Middle, Lower
- Parent/Child Aboard: Zero, One, Two, Three
- Embarked: Cherbourg, Queenstown, Southampton
- Survival: No, Yes
- We obtained the following Sample Data
- Sample Data was split into Training and Testing Data in a
- Train-Test Split Ratio of 80%-20%
- Training Data
- Testing Data
- Note
- Consider k-NN Algorithm when answering questions given below
- Your answer should be
- Well Justified
- Questions
- Write down the Input and Output for the above Machine Learning Problem?
- How is a Training Example represented?
- Execute the Machine Learning Cycle using Hamming Distance measure?
- Classify Test Examples using k = 1, 3 and 5
- What is the impact of changing the value of k on classification of Test Examples?
Your Turn Tasks
Your Turn Task 2
- Task 1
- Select a Machine Learning Problem (similar to: Titanic – Machine Learning from Disaster) and answer the questions given below.
- Note
- Consider k-NN Algorithm in answering all the questions.
- Questions
- Write down the Input and Output for the above Machine Learning Problem?
- How is a Training Example represented?
- Execute the Machine Learning Cycle using Hamming Distance measure?
- Classify Test Examples using k = 1, 3 and 5
- What is the impact of changing the value of k on classification of Test Examples?
Chapter Summary
- Chapter Summary
- Machine Learning Algorithms studied so far are
- FIND-S Algorithm
- List Then Eliminate Algorithm
- Candidate Elimination Algorithm
- ID3 Algorithm (Decision Tree Learning)
- Perceptron (Two Layer Artificial Neural Network)
- Regular Neural Network (Multi-layer Artificial Neural Network)
- Naïve Bayes Algorithm (Bayesian Learning Algorithm)
- The above-mentioned ML Algorithms are called Eager Methods
- How do Eager ML Algorithms Work?
- Given
- Set of Training Examples (D)
- Set of Functions / Hypothesis (H)
- Training Phase
In Next Chapter
- In Next Chapter
- In Sha Allah, in the next Chapter, I will present
- Evaluating Hypothesis (Models)