Chapter 9 - Version Space Algorithms
Chapter Outline
- Chapter Outline
- Quick Recap
- Version Space
- List Then Eliminate Algorithm
- Candidate Elimination Algorithm
- Chapter Summary
Quick Recap
- Quick Recap – FIND-S Algorithm
- Learning is a Searching Problem
- Research is defined as a systematic investigation of sources and materials to develop solutions for Real-world Problems and improve the quality (peace and happiness) of human life
- A Researcher is a person of character, and (s)he should safeguard that character
- The word Research is made up of
- Re and
- search
- Therefore, in Research, we mainly refine the solution(s) proposed for a Real-world Problem
- The main steps of a Research Cycle are as follows
- Step 1: Identify the Real-world Problem
- Step 2: Propose Solution (called Solution 01) to solve the Real-world Problem
- Step 3: List down Strengths and Weaknesses of Solution 01
- Step 4: Propose Solution (called Solution 02) to
- further strengthen the Strengths of Solution 01
- overcome limitations of Solution 01
- Step 5: List down Strengths and Weaknesses of Solution 02
- Step 6: Propose Solution (called Solution 03) to
- further strengthen the Strengths of Solution 02
- overcome limitations of Solution 02
- Step 7: Continue this cycle till the Day of Judgment
- Considering FIND-S Algorithm
- Input to Learner (FIND-S Algorithm)
- Set of Training Examples (D)
- Set of Functions / Hypothesis (H)
- Output by Learner (FIND-S Algorithm)
- A Hypothesis (h) from H which best fits the Training Examples (D)
- Note that h is an approximation of Target Function
- Inductive Bias is the set of assumptions needed, in addition to the Training Examples, to justify the Learner’s classifications deductively
- FIND-S Algorithm – Summary
- Representation of Example
- Attribute-Value Pair
- Representation of Hypothesis (h)
- Conjunction (AND) of Constraints on Attributes
- Searching Strategy
- I am not clear about this. Please drop me an email if you know.
- Training Regime
- Incremental Method
- Inductive Bias of FIND-S Algorithm
- Training Data is error-free
- Target Function / Concept is present in the Hypothesis Space (H)
- Strengths
- Returns a Model (h), which can be used to make predictions on unseen data
- Weaknesses
- Only works on error-free Data
- However, Real-world Data is noisy
- Works on the assumption that the Target Function is present in the Hypothesis Space (H)
- However, the Target Function may or may not be present in the Hypothesis Space (H), and we may not know whether it is
- Only returns one hypothesis which best fits the Training Data
- However, there can be multiple hypotheses which best fit the Training Data
Version Space
- Recap - FIND-S Algorithm
- Strengths
- Returns a Model (h), which can be used to make predictions on unseen data
- Weaknesses
- Only works on error-free Data
- However, Real-world Data is noisy
- Works on the assumption that the Target Function f is present in the Hypothesis Space (H)
- However, the Target Function f may or may not be present in the Hypothesis Space (H), and we may not know whether it is
- Only returns one hypothesis which best fits the Training Data
- However, there can be multiple hypotheses which best fit the Training Data
- Main Problem in FIND-S Algorithm
- Problem
- Only returns one hypothesis which best fits the Training Data
- However, there can be multiple hypotheses which best fit the Training Data
- Proposed Solution
- Version Space
- Version Space vs Version Space Algorithm
- Version Space (VSH,D)
- Version Space (VSH,D) contains the set of all hypotheses consistent with the Training Examples
- Version Space Algorithm
- A Version Space Algorithm computes the set of all hypotheses consistent with the Training Examples
- Version Space Algorithms
- In this Chapter, In sha Allah we will study two Version Space Algorithms
- List Then Eliminate Algorithm
- Candidate Elimination Algorithm
- Gender Identification Problem
- In this Chapter, we will use the same Gender Identification Problem presented in
- Chapter 8 – FIND-S Algorithm
- Representation of Training Example (d) and Hypothesis (h)
- Representation of Training Example (d)
- Attribute-Value Pair
- Representation of Hypothesis (h)
- Conjunction of Constraints on Input Attributes
- Sample Data – Vector Representation
- Vector Representation of Examples
x1= <Short, Light, Short, Yes, Yes, Half> +
x2= <Short, Light, Long, Yes, Yes, Half> +
x3 = <Tall, Heavy, Long, Yes, Yes, Full> –
x4 = <Short, Light, Long, Yes, No, Full> +
x5 = <Short, Light, Short, Yes, Yes, Half> –
x6 = <Tall, Light, Short, No, Yes, Full> –
- Training Data – Vector Representation
- Vector Representation of Training Examples
x1= <Short, Light, Short, Yes, Yes, Half> +
x2= <Short, Light, Long, Yes, Yes, Half> +
x3 = <Tall, Heavy, Long, Yes, Yes, Full> –
x4 = <Short, Light, Long, Yes, No, Full> +
- Testing Data – Vector Representation
- Vector Representation of Test Examples
x5 = <Short, Light, Short, Yes, Yes, Half> –
x6 = <Tall, Light, Short, No, Yes, Full> –
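- The vectors above translate directly into Python. Below is a minimal sketch (assuming Python as the working language; attribute names are not given in this chapter, so only the value order from the slides is kept) that stores the Training and Testing Data and defines a small helper for checking whether a conjunctive hypothesis such as <Short, Light, ?, Yes, ?, ?> covers an example, where “?” means “any value”.

```python
# Training Data (x1-x4): (example, label) pairs, values in the same order as the slides
training_data = [
    (("Short", "Light", "Short", "Yes", "Yes", "Half"), "+"),  # x1
    (("Short", "Light", "Long",  "Yes", "Yes", "Half"), "+"),  # x2
    (("Tall",  "Heavy", "Long",  "Yes", "Yes", "Full"), "-"),  # x3
    (("Short", "Light", "Long",  "Yes", "No",  "Full"), "+"),  # x4
]

# Testing Data (x5-x6)
testing_data = [
    (("Short", "Light", "Short", "Yes", "Yes", "Half"), "-"),  # x5
    (("Tall",  "Light", "Short", "No",  "Yes", "Full"), "-"),  # x6
]

def matches(hypothesis, example):
    """True if the conjunctive hypothesis covers the example ('?' accepts any value)."""
    return all(h == "?" or h == v for h, v in zip(hypothesis, example))

def is_consistent(hypothesis, data):
    """A hypothesis is consistent with the data if h(x) = c(x) for every example,
    i.e. it covers exactly the positive examples."""
    return all(matches(hypothesis, x) == (label == "+") for x, label in data)
```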
List Then Eliminate Algorithm
- Machine Learning Cycle
- Four phases of a Machine Learning Cycle are
- Training Phase
- Build the Model using Training Data
- Testing Phase
- Evaluate the performance of Model using Testing Data
- Application Phase
- Deploy the Model in Real-world, to make prediction on Real-time unseen Data
- Feedback Phase
- Take Feedback from the Users and Domain Experts to improve the Model
- Note
- For List Then Eliminate Algorithm, In sha Allah I will only execute the
- Training Phase 😊
- Hypothesis Space (H)
- Consider a Hypothesis Space (H) of 15 Hypotheses (h)
- H = {h1, h2, h3, h4, h5, h6, h7, h8, h9, h10, h11, h12, h13, h14, h15}
- In sha Allah, in the next Slides, I will try to explain how to use the List Then Eliminate Algorithm to find out
- All hypotheses that are consistent with the Training Data
- i.e. Version Space
- List Then Eliminate Algorithm
- Training Phase
- Version Space ← a list containing every hypothesis in H
- VSH,D = {h1, h2, h3, h4, h5, h6, h7, h8, h9, h10, h11, h12, h13, h14, h15}
- Training Phase, Cont…
- Remove from the Version Space any hypothesis that is inconsistent with the Training Example, i.e., any h for which h(x) ≠ c(x)
- Training Example x1
- h1, h2, h3 are inconsistent with Training Example x1
- VSH,D = { h4, h5, h6, h7, h8, h9, h10, h11, h12, h13, h14, h15}
- Training Example x2
- h14, h15 are inconsistent with Training Example x2
- VSH,D = { h4, h5, h6, h7, h8, h9, h10, h11, h12, h13}
- Training Example x3
- h6, h9, h12 are inconsistent with Training Example x3
- VSH,D = { h4, h5, h7, h8, h10, h11, h13}
- Training Example x4
- h5, h11 are inconsistent with Training Example x4
- VSH,D = { h4, h7, h8, h10, h13}
- Training Phase, Cont…
- Output the list of hypotheses in the Version Space
- The list of hypotheses in the Version Space is
- VSH,D = { h4, h7, h8, h10, h13}
- Conclusion
- The List Then Eliminate Algorithm returned
- 5 Hypotheses that are consistent with the Training Data, i.e., the Version Space (VSH,D); a small Python sketch of this elimination loop is given below
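- The sketch assumes each hypothesis can be tested against an example with a matches helper like the one shown after the Testing Data; since the 15 hypotheses h1 to h15 are not spelled out in the slides, the Hypothesis Space is passed in as a plain list.

```python
def matches(hypothesis, example):
    # '?' accepts any value; otherwise the hypothesis value must equal the example value
    return all(h == "?" or h == v for h, v in zip(hypothesis, example))

def list_then_eliminate(hypothesis_space, training_data):
    """Start with every hypothesis in H, then remove any hypothesis that is
    inconsistent with a Training Example (h(x) != c(x))."""
    version_space = list(hypothesis_space)            # VS <- a list containing every h in H
    for x, label in training_data:
        version_space = [h for h in version_space
                         if matches(h, x) == (label == "+")]   # keep only consistent hypotheses
    return version_space                              # output the list of hypotheses in VS
```

- Applied to the walkthrough above, the surviving hypotheses would be VSH,D = {h4, h7, h8, h10, h13}.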
- Inductive Bias - List Then Eliminate Algorithm
- Inductive Bias
- Inductive Bias is the set of assumptions needed, in addition to the Training Examples, to justify the Learner’s classifications deductively
- Inductive Bias of List Then Eliminate Algorithm
- Training data is error free
- Target Function / Concept is present in the Hypothesis Space (H)
- Training Regime – List Then Eliminate Algorithm
- Incremental Method
- See Chapter 2 – Basics of Machine Learning
- Strengths and Weaknesses – List Then Eliminate Algorithm
- Strengths
- The List Then Eliminate Algorithm overcomes the limitation of the FIND-S Algorithm and
- returns multiple hypotheses that best fit the Training Data, i.e., the Version Space (VSH,D)
- Weaknesses
- Only works on error-free Data
- However, Real-world Data is noisy
- Works on the assumption that the Target Function is present in the Hypothesis Space (H)
- However, the Target Function f may or may not be present in the Hypothesis Space (H), and we may not know whether it is
- The List Then Eliminate Algorithm is computationally expensive because it makes a
- pairwise comparison of each Training Example with every hypothesis in the Hypothesis Space (H)
- Why is the List Then Eliminate Algorithm Computationally Expensive?
- Question
- Why is the List Then Eliminate Algorithm computationally expensive?
- Answer
- List Then Eliminate Algorithm is computationally expensive because
- Hypothesis Space (H) is represented as a List
- Representing Hypothesis Space (H) as a List, forces
- List Then Eliminate Algorithm to make pairwise comparisons
- Conclusion
- To reduce the computational cost of Version Space Algorithms
- Have a more compact representation of Hypothesis Space (H)
Candidate Elimination Algorithm
- Limitation of List Then Eliminate Algorithm
- Problem
- List Then Eliminate Algorithm is computationally expensive because
- representation of Hypothesis Space (H) is not compact
- Proposed Solution
- Represent Hypothesis Space (H) as
- General to Specific Ordering of Hypotheses
- Representation of Hypothesis Space (H) in Candidate Elimination Algorithm
- In Candidate Elimination Algorithm, Hypothesis Space (H) is represented as
- General to Specific Ordering of Hypotheses
- Machine Learning Cycle
- Four phases of a Machine Learning Cycle are
- Training Phase
- Build the Model using Training Data
- Testing Phase
- Evaluate the performance of Model using Testing Data
- Application Phase
- Deploy the Model in Real-world, to make prediction on Real-time unseen Data
- Feedback Phase
- Take Feedback from the Users and Domain Experts to improve the Model
- Training Phase – Candidate Elimination Algorithm
- Training Phase
- Representing Version Space (VSH,D)
- The General Boundary G of Version Space VSH,D is the set of maximally general members (or hypotheses)
- The Specific Boundary S of Version Space VSH,D is the set of maximally specific members (or hypotheses)
- Every member (or hypothesis) of the Version Space VSH,D lies between these two boundaries
- After observing all the Training Examples, the Candidate Elimination Algorithm will
- Output the Version Space (comprising 6 hypotheses)
- VSH,D
- <Short, Light, ?, Yes, ?, ?>
- <Short, ?, ?, Yes, ?, ?>
- <Short, Light, ?, ?, ?, ?>
- <?, Light, ?, Yes, ?, ?>
- <Short, ?, ?, ?, ?, ?>
- <?, Light, ?, ?, ?, ?>
- Note that all 6 hypotheses in the Version Space (VSH,D) are approximations of the Target Function f (a small Python sketch of the boundary updates follows below)
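- The sketch assumes Python, takes the attribute domains to be the values seen in the Training Data, and omits some bookkeeping from the full algorithm (e.g. pruning redundant members of S); on the four training examples it yields the S and G boundaries whose in-between hypotheses are exactly the 6 Models listed above.

```python
def matches(h, x):
    return all(a == "?" or a == v for a, v in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if hypothesis h1 is at least as general as h2."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def candidate_elimination(training_data, domains):
    n = len(domains)
    S = [tuple(["0"] * n)]   # Specific boundary: "0" means "matches nothing" initially
    G = [tuple(["?"] * n)]   # General boundary: all-'?' matches everything

    for x, label in training_data:
        if label == "+":
            # Drop general hypotheses that fail to cover the positive example
            G = [g for g in G if matches(g, x)]
            new_S = []
            for s in S:
                if matches(s, x):
                    new_S.append(s)
                else:
                    # Minimal generalization of s that covers x
                    gen = tuple(v if a in ("0", v) else "?" for a, v in zip(s, x))
                    if any(more_general_or_equal(g, gen) for g in G):
                        new_S.append(gen)
            S = new_S
        else:
            # Drop specific hypotheses that (wrongly) cover the negative example
            S = [s for s in S if not matches(s, x)]
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                    continue
                # Minimal specializations of g that exclude x and still cover some s in S
                for i in range(n):
                    if g[i] != "?":
                        continue
                    for value in domains[i]:
                        if value != x[i]:
                            spec = g[:i] + (value,) + g[i + 1:]
                            if any(more_general_or_equal(spec, s) for s in S):
                                new_G.append(spec)
            # Keep only maximally general members of G
            G = [g for g in new_G
                 if not any(g2 != g and more_general_or_equal(g2, g) for g2 in new_G)]
    return S, G

training_data = [
    (("Short", "Light", "Short", "Yes", "Yes", "Half"), "+"),  # x1
    (("Short", "Light", "Long",  "Yes", "Yes", "Half"), "+"),  # x2
    (("Tall",  "Heavy", "Long",  "Yes", "Yes", "Full"), "-"),  # x3
    (("Short", "Light", "Long",  "Yes", "No",  "Full"), "+"),  # x4
]
# Attribute domains taken from the values that appear in the Training Data
domains = [sorted({x[i] for x, _ in training_data}) for i in range(6)]
S, G = candidate_elimination(training_data, domains)
print("S =", S)   # [('Short', 'Light', '?', 'Yes', '?', '?')]
print("G =", G)   # [('Short', '?', '?', '?', '?', '?'), ('?', 'Light', '?', '?', '?', '?')]
```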
- Recall
- Training Data
- x1= <Short, Light, Short, Yes, Yes, Half> +
- x2= <Short, Light, Long, Yes, Yes, Half> +
- x3 = <Tall, Heavy, Long, Yes, Yes, Full> –
- x4 = <Short, Light, Long, Yes, No, Full> +
- 6 Models in the Version Space VSH,D
- <Short, Light, ?, Yes, ?, ?>
- <Short, ?, ?, Yes, ?, ?>
- <Short, Light, ?, ?, ?, ?>
- <?, Light, ?, Yes, ?, ?>
- <Short, ?, ?, ?, ?, ?>
- <?, Light, ?, ?, ?, ?>
- Models – in the form of Rules
- In the next phase i.e. Testing Phase, we will
- Evaluate the performance of the Model(s)
- Testing Phase
- Question
- How well has the Model (or Models) learned?
- Answer
- Evaluate the performance of the Model(s) on unseen data (or Testing Data)
- Evaluation Measures
- Evaluation will be carried out using
- Error measure
- Error
- Definition
- Error is defined as the proportion of incorrectly classified Test instances
- Note
- Accuracy = 1 - Error
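- As a quick illustration, a minimal sketch of this measure in Python; the predictions used here anticipate the Single Model Testing Phase shown later, where x5 is misclassified and x6 is classified correctly.

```python
def error(predictions, true_labels):
    """Error = proportion of incorrectly classified Test instances."""
    incorrect = sum(1 for p, t in zip(predictions, true_labels) if p != t)
    return incorrect / len(true_labels)

# x5 predicted Positive but is actually Negative; x6 predicted and actually Negative
print(error(["+", "-"], ["-", "-"]))        # Error    = 0.5
print(1 - error(["+", "-"], ["-", "-"]))    # Accuracy = 1 - Error = 0.5
```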
- Testing Phase, Cont…
- Two Approaches for Testing Phase
- Single Model
- Ensemble Model
- Single Model
- A Single Model is trained on the Training Data and used to make predictions on the Test Data
- For Candidate Elimination Algorithm, we can
- Randomly select a Model (or h) from the VSH,D and use it to make predictions on the Test Data
- Strengths
- It is computationally fast and requires less time to make predictions compared to Ensemble Model
- Weaknesses
- Error is likely to be high compared to Ensemble Model
- Ensemble Model
- An Ensemble Model works by training different Models on the same Training Data and using each Model to individually make predictions on the Test Data
- For Candidate Elimination Algorithm, we can
- Select all 6 Models (or h) from the VSH,D (Ensemble Model) and use them to make predictions on the Test Data
- Strengths
- Error is likely to be low compared to Single Model
- Weaknesses
- It is computationally expensive and requires more time to make predictions compared to Single Model
- Problem
- How to combine predictions of different Models to make a Final Prediction
- A Possible Solution
- Use Voting Approach
- Voting Approach
- Voting Approach
- Voting is one of the simplest approaches to combine predictions from different Models
- Model Weight
- In Voting Approach, we may assign
- Same weight to all Models
- Different weights to all Models
- Steps – How Voting Approach Works?
- Given a Test instance (x)
- Step 1: Make individual predictions using different Models
- Step 2: Combine predictions of individual Models
- Step 3: Final Prediction of x will be the class which has the majority vote
- Example 1 – Voting Approach
- Assumption
- All Models have the same weight i.e. 1
- Consider a Binary Classification Problem
- Class 1 = Yes
- Class 2 = No
- Five different Models are trained on the same Training Data
- Question
- What will be the Final Prediction of Test Instance (x) using Ensemble Model?
- Using Voting Approach to Make Predictions on Test Instance (x)
- Step 1: Make individual predictions using different Models
- Step 2: “Combine predictions” of “individual” Models
- No. of Votes for Yes Class = 1 +1 +1 = 3
- No. of Votes for No Class = 1 + 1 = 2
- Step 3: “Final Prediction of x” will be the “class” which has the majority vote
- Majority Vote = Yes
- Final Prediction of x
- Yes
- Note
- For Binary Classification Problem with same weight
- Use an “odd” number of Models, i.e., 3, 5, 7, 11, etc. 😊
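- A minimal Python sketch of the three steps above for equally weighted Models; the order of the five individual predictions is assumed, since the slides only give the vote totals (3 for Yes, 2 for No).

```python
from collections import Counter

def majority_vote(predictions):
    """Steps 2-3: count one vote per Model and return the class with the most votes."""
    return Counter(predictions).most_common(1)[0][0]

# Step 1 (assumed ordering): five Models predict the class of the Test Instance x
predictions = ["Yes", "No", "Yes", "Yes", "No"]
print(majority_vote(predictions))   # Yes  (3 votes for Yes vs 2 votes for No)
```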
- Example 2 – Voting Approach
- Assumption
- All Models have “different” weights
- Consider a Binary Classification Problem
- Class 1 = Yes
- Class 2 = No
- Five “different” Models are trained on the “same” Training Data
- Question
- What will be the “Final Prediction” of Test Instance (x) using Ensemble Model?
- Using Voting Approach to Make Predictions on Test Instance (x)
- Step 1: Make individual predictions using different Models
- Step 2: Combine predictions of individual Models
- No. of Weighted Votes for Yes Class = 0.2 + 0.6 + 0.4 = 1.2
- No. of Weighted Votes for No Class = 1.0 + 0.8 = 1.8
- Step 3: Final Prediction of x will be the class which has the majority vote
- Majority Vote = No
- Final Prediction of x
- No
- Effect of Weights on Models in Making Final Predictions
- When the Models had same weights
- Final Prediction was Yes
- When the Models had different weights
- Final Prediction was No
- Conclusion
- Model Weight plays an important role in making the Final Prediction
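- The weighted variant of the vote is just as short. This sketch assumes the same five predictions as above, paired with weights so that the totals match Example 2 (Yes = 0.2 + 0.6 + 0.4 = 1.2, No = 1.0 + 0.8 = 1.8); the exact pairing of weights to Models is assumed.

```python
def weighted_vote(predictions, weights):
    """Add up the weight behind each predicted class and return the heaviest class."""
    totals = {}
    for prediction, weight in zip(predictions, weights):
        totals[prediction] = totals.get(prediction, 0.0) + weight
    return max(totals, key=totals.get)

predictions = ["Yes", "No", "Yes", "Yes", "No"]
weights     = [0.2,   1.0,  0.6,   0.4,   0.8]
print(weighted_vote(predictions, weights))   # No  (Yes = 1.2, No = 1.8)
```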
- Voting vs Voting Classifier
- Voting
- Voting is one of the simplest methods to combine predictions from multiple Machine Learning Algorithms
- Voting Classifier
- Voting Classifier is not an actual Classifier (Machine Learning Algorithm) but a wrapper for a set of different Machine Learning Algorithms, which are trained and tested on the “same” Data
- A Voting Classifier combines predictions of individual Machine Learning Algorithms to make Final Predictions on unseen Data
- Note
- In sha Allah, in the next Slides, I will first try to show the
- Testing Phase, Application Phase and Feedback Phase using
- Single Model Approach
- After that, I will try to show
- Testing Phase, Application Phase and Feedback Phase using
Ensemble Model Approach
Single Model Approach
- Single Model – Testing Phase
- Randomly select a Single Model from the VSH,D
- Single Model = <Short, Light, ?, Yes, ?, ?>
- Apply Single Model on Test Data
- x5 = < Short, Light, Short, Yes, Yes, Half> –
- x6 = <Tall, Light, Short, No, Yes, Full> –
- Applying Single Model on x5
- Prediction returned by Single Model
- x5 is predicted Positive (Incorrectly Classified Instance)
- Applying Single Model on x6
- Prediction returned by Single Model
- x6 is predicted Negative (Correctly Classified Instance)
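- A minimal Python sketch of this Testing Phase, reusing the same covering check as before (redefined here so the snippet is self-contained):

```python
def matches(h, x):
    return all(a == "?" or a == v for a, v in zip(h, x))

single_model = ("Short", "Light", "?", "Yes", "?", "?")

test_data = [
    ("x5", ("Short", "Light", "Short", "Yes", "Yes", "Half"), "-"),
    ("x6", ("Tall",  "Light", "Short", "No",  "Yes", "Full"), "-"),
]

for name, x, true_label in test_data:
    prediction = "Positive" if matches(single_model, x) else "Negative"
    print(name, "->", prediction)
# x5 -> Positive  (Incorrectly Classified Instance)
# x6 -> Negative  (Correctly Classified Instance)
```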
- Testing Phase – Single Model
- Single Model - Application Phase
- We assume that our Single Model
- performed well on a large Test Data set and can be deployed in the Real-world
- Single Model is deployed in the Real-world and now we can make
- predictions on Real-time Data
- Steps – Making Predictions on Real-Time Data
- Step 1: Take Input from User
- Step 2: Convert User Input into Feature Vector
- Exactly same as Feature Vectors of Training and Testing Data
- Step 3: Apply Model on the Feature Vector
- Step 4: Return Prediction to the User
- Example - Making Predictions on Real-time Data
- Step 1: Take input from User
- Step 2: Convert User Input into Feature Vector
- Note that order of Attributes must be exactly same as that of Training and Testing Examples
Step 3: Apply Single Model on Feature Vector
- Step 4: Return Prediction to the User
- Positive
- Note
- You can take Input from user, apply Model and return predictions as many times as you like 😊
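- A minimal sketch of these four steps. The attribute names below are placeholders (the chapter only shows the value order, not the attribute names), and the user's answers are assumed for illustration; with these values the Single Model returns Positive, as in the example above.

```python
# Hypothetical attribute names used only to order the user's answers
ATTRIBUTE_ORDER = ["attr1", "attr2", "attr3", "attr4", "attr5", "attr6"]

def matches(h, x):
    return all(a == "?" or a == v for a, v in zip(h, x))

single_model = ("Short", "Light", "?", "Yes", "?", "?")

# Step 1: take Input from the User (values assumed for illustration)
user_input = {"attr1": "Short", "attr2": "Light", "attr3": "Long",
              "attr4": "Yes", "attr5": "No", "attr6": "Half"}

# Step 2: convert the User Input into a Feature Vector (same attribute order as before)
x = tuple(user_input[a] for a in ATTRIBUTE_ORDER)

# Step 3: apply the Model; Step 4: return the Prediction to the User
print("Positive" if matches(single_model, x) else "Negative")   # Positive
```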
- Single Model - Feedback Phase
- Only Allah is Perfect 😊
- Take Feedback on your deployed Single Model from
- Domain Experts and
- Users
Improve your Single Model based on Feedback 😊
Ensemble Model Approach
- Ensemble Model – Testing Phase
- Ensemble Model
- Comprises all six hypotheses (Models) in the Version Space VSH,D
- Below are the six Models used to make the Ensemble Model
- Model 1 = <Short, Light, ?, Yes, ?, ?>
- Model 2 = <Short, ?, ?, Yes, ?, ?>
- Model 3 = <Short, Light, ?, ?, ?, ?>
- Model 4 = <?, Light, ?, Yes, ?, ?>
- Model 5 = <Short, ?, ?, ?, ?, ?>
- Model 6 = <?, Light, ?, ?, ?, ?>
- Apply Ensemble Model on Test Data
- x5 = < Short, Light, Short, Yes, Yes, Half> –
- x6 = <Tall, Light, Short, No, Yes, Full> –
- Applying Ensemble Model on x5
- Note
- Final Prediction was calculated based on Majority Vote
- Prediction returned by Ensemble Model
- x5 is predicted Positive (Incorrectly Classified Instance)
- Applying Ensemble Model on x6
- Note
- Final Prediction was calculated based on Majority Vote
- Prediction returned by Ensemble Model
- x6 is predicted Negative (Correctly Classified Instance)
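- A minimal sketch of the Ensemble Model Testing Phase, applying all six Models to x5 and x6 and taking a simple majority vote (equal Model weights assumed):

```python
from collections import Counter

def matches(h, x):
    return all(a == "?" or a == v for a, v in zip(h, x))

ensemble = [
    ("Short", "Light", "?", "Yes", "?", "?"),   # Model 1
    ("Short", "?",     "?", "Yes", "?", "?"),   # Model 2
    ("Short", "Light", "?", "?",   "?", "?"),   # Model 3
    ("?",     "Light", "?", "Yes", "?", "?"),   # Model 4
    ("Short", "?",     "?", "?",   "?", "?"),   # Model 5
    ("?",     "Light", "?", "?",   "?", "?"),   # Model 6
]

test_data = [
    ("x5", ("Short", "Light", "Short", "Yes", "Yes", "Half")),
    ("x6", ("Tall",  "Light", "Short", "No",  "Yes", "Full")),
]

for name, x in test_data:
    votes = Counter("Positive" if matches(m, x) else "Negative" for m in ensemble)
    print(name, "->", votes.most_common(1)[0][0])
# x5 -> Positive  (all 6 Models vote Positive; Incorrectly Classified Instance)
# x6 -> Negative  (5 of 6 Models vote Negative; Correctly Classified Instance)
```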
- Testing Phase – Ensemble Model
- Ensemble Model – Application Phase
- We assume that our Ensemble Model
- performed well on a large Test Data set and can be deployed in the Real-world
- Ensemble Model is deployed in Real-world and now we can make
- predictions on Real-time Data
- Steps – Making Predictions on Real-time Data
- Step 1: Take Input from User
- Step 2: Convert User Input into Feature Vector
- Exactly same as Feature Vectors of Training and Testing Data
- Step 3: Apply Ensemble Model on the Feature Vector
- Step 4: Return Prediction to the User
- Making Predictions on Real-time Data
- Step 1: Take input from User
- Step 2: Convert User Input into Feature Vector
- Note that order of Attributes must be exactly same as that of Training and Testing Examples
Step 3: Apply Ensemble Model on Feature Vector
- Note
- Final Prediction was calculated based on Majority Vote
- Step 4: Return Prediction to the User
- Ensemble Model predicts the unseen example as
- Positive
- Note
- You can take Input from user, apply Model and return predictions as many times as you like 😊
- Ensemble Model – Feedback Phase
- Only Allah is Perfect 😊
- Take Feedback on your deployed Ensemble Model from
- Domain Experts and
- Users
Improve your Ensemble Model based on Feedback
Discussion – Candidate Elimination Algorithm
- Inductive Bias - Candidate Elimination Algorithm
- Inductive Bias
- Inductive Bias is the set of assumptions needed, in addition to the Training Examples, to justify the Learner’s classifications deductively
- Inductive Bias of Candidate Elimination Algorithm
- Training Data is error-free
- Target Function / Concept is present in the Hypothesis Space (H)
- Training Regime – Candidate Elimination Algorithm
- Incremental Method
- See Chapter 3 – Basics of Machine Learning
- Strengths and Weaknesses – Candidate Elimination Algorithm
- Strengths
- Candidate Elimination Algorithm overcomes the limitation of List Then Eliminate Algorithm and
- provides a more compact representation of the Version Space
- Version Space returned by Candidate Elimination Algorithm can be used to make predictions on unseen Data using either a
- Single Model or
- Ensemble Model
- Weaknesses
- Only works on error-free Data
- However, Real-world Data is noisy
- Works on the assumption that the Target Function f is present in the Hypothesis Space (H)
- However, the Target Function f may or may not be present in the Hypothesis Space (H), and we may not know whether it is
- Only works when values of Attributes are Categorical
- However, in Real-world many Machine Learning Problems have
- Attributes / Features with Numeric values
- Only works for simple representations of Data
- However, in the Real-world many Machine Learning Problems have complex representations (e.g., Face Detection, Face Recognition, etc.)
TODO and Your Turn
TODO Task 1
- Task 1
- Consider the Titanic Dataset with the following Attributes
- Gender: Male, Female
- Ticket Class: Upper, Middle, Lower
- Parent/Child Aboard: Zero, One, Two, Three
- Embarked: Cherbourg, Queenstown, Southampton
- Survival: No, Yes
- We obtained the following Sample Data
- Sample Data was split into Training and Testing Data in a
- Train-Test Split Ratio of 67%-33%
- Training Data
- Testing Data
- Note
- Consider Candidate Elimination Algorithm when answering questions given below
- Your answer should be
- Well Justified
- Questions
- Write down the Input and Output for the above Machine Learning Problem
- How is a Training Example represented?
- How should a Hypothesis (h) be represented?
- Execute the Machine Learning Cycle
- Execute the Testing and Application Phases using a
- Single Model and
- Ensemble Model
- Write down the observations you made during the execution of the Machine Learning Cycle
Your Turn Task 1
- Task 1
- Select a Machine Learning Problem (similar to: Titanic – Machine Learning from Disaster) and answer the questions given below.
- Note
- Consider Candidate Elimination Algorithm in answering all the questions.
- Questions
- Write down the Input and Output for the selected Machine Learning Problem
- How is a Training Example represented?
- How should a Hypothesis (h) be represented?
- Calculate the Size of the Instance Space, Concept Space, Syntactically Distinct Hypotheses and Semantically Distinct Hypotheses
- Execute the Machine Learning Cycle
- Execute the Testing and Application Phases using a
- Single Model and
- Ensemble Model
- Write down the observations you made during the execution of the Machine Learning Cycle
Chapter Summary
- Chapter Summary
- Problem – FIND-S Algorithm
- Only returns one hypothesis which best fits the Training Data
- However, there can be multiple hypotheses which best fit the Training Data
- Proposed Solution
- Version Space
- Version Space (VSH,D) contains the set of all hypotheses consistent with the Training Examples
- A Version Space Algorithm computes the set of all hypotheses consistent with the Training Examples
- Two Algorithms which return a Version Space are
- List Then Eliminate Algorithm
- Candidate Elimination Algorithm
- List Then Eliminate Algorithm – Summary
- Representation of Example
- Attribute-Value Pair
- Representation of Hypothesis (h)
- Conjunction (AND) of Constraints on Attributes
- Searching Strategy
- I am not clear about this. Please drop me an email if you know. Jazak Allah Khair
- Training Regime
- Incremental Method
- Inductive Bias of List Then Eliminate Algorithm
- Training Data is error-free
- Target Function / Concept is present in the Hypothesis Space (H)
- Strengths
- The List Then Eliminate Algorithm overcomes the limitation of the FIND-S Algorithm and
- returns multiple hypotheses that best fit the Training Data, i.e., the Version Space (VSH,D)
- Weaknesses
- Only works on error-free Data
- However, Real-world Data is noisy
- Works on the assumption that the Target Function is present in the Hypothesis Space (H)
- However, the Target Function may or may not be present in the Hypothesis Space (H), and we may not know whether it is
- The List Then Eliminate Algorithm is computationally expensive because it makes a
- pairwise comparison of each Training Example with every hypothesis in the Hypothesis Space (H)
- The List Then Eliminate Algorithm is computationally expensive because
- the Hypothesis Space (H) is represented as a List, which forces
- the List Then Eliminate Algorithm to make pairwise comparisons
- To reduce the computational cost of Version Space Algorithms
- Have a more compact representation of Hypothesis Space (H)
- for e.g. Candidate Elimination Algorithm
- Candidate Elimination Algorithm – Summary
- Representation of Example
- Attribute-Value Pair
- Representation of Hypothesis (h)
- Conjunction (AND) of Constraints on Attributes
- Searching Strategy
- I am not clear about this. Please drop me an email if you know. Jazak Allah Khair
- Training Regime
- Incremental Method
- Inductive Bias of Candidate Elimination Algorithm
- Training Data is error-free
- Target Function / Concept is present in the Hypothesis Space (H)
- Strengths
- Candidate Elimination Algorithm overcomes the limitation of List Then Eliminate Algorithm and
- provides a more compact representation of the Version Space
- Version Space returned by Candidate Elimination Algorithm can be used to make predictions on unseen Data using either a
- Single Model or
- Ensemble Model
- Weaknesses
- Only works on error-free Data
- However, Real-world Data is noisy
- Works on the assumption that the Target Function is present in the Hypothesis Space (H)
- However, the Target Function may or may not be present in the Hypothesis Space (H), and we may not know whether it is
- Only works when values of Attributes are Categorical
- However, in Real-world many Machine Learning Problems have
- Attributes / Features with Numeric values
- Only works for simple representations of Data
- However, in the Real-world many Machine Learning Problems have complex representations (e.g., Face Detection, Face Recognition, etc.)
- Single Model
- A Single Model is trained on the Training Data and used to make predictions on the Test Data
- Strengths
- It is computationally fast and requires less time to make predictions compared to Ensemble Model
- Weaknesses
- Error is likely to be high compared to Ensemble Model
- Ensemble Model
- An Ensemble Model works by training different Models on the same Training Data and using each Model to individually make predictions on the Test Data
- Strengths
- Error is likely to be low compared to a Single Model
- Weaknesses
- It is computationally expensive and requires more time to make predictions compared to a Single Model
- Voting Approach is one of the simplest approaches to combine predictions from different Models
- In Voting Approach, we may assign
- Same weight to all Models
- Different weights to all Models
- Model Weight plays an important role in making the Final Prediction
- Voting Approach works as follows
- Given a Test instance x
- Step 1: Make individual predictions using different Models
- Step 2: Combine predictions of individual Models
- Step 3: Final Prediction of x will be the class which has the majority vote
- Voting Classifier is not an actual Classifier (Machine Learning Algorithm) but a wrapper for a set of different Machine Learning Algorithms, which are trained and tested on the same Data
- A Voting Classifier combines predictions of individual Machine Learning Algorithms to make Final Predictions on unseen Data
In Next Chapter
- In Next Chapter
- In Sha Allah, in the next Chapter, I will present
- Decision Tree Learning