Ilm o Irfan Technologies

Machine Learning – January 10, 2023

Chapter 7 - Concept Learning and Hypothesis Representation

Chapter Outline

  • Chapter Outline
  • Quick Recap
  • Basics of Concept Learning
  • Instance Space, Training Data, Concept Space and Hypothesis Space
  • Hypothesis (h) Representation
  • Hypothesis Space (H) Representation
  • Chapter Summary

 

Quick Recap

  • Quick Recap – Treating a Problem as a Machine Learning Problem - Step by Step Examples
  • A Real-world Problem can be treated as a Machine Learning Problem using the following Step by Step approach
    • Step 1: Decide the Learning Setting 
    • Step 2: Obtain Sample Data 
    • Step 3: Understand and Pre-process Sample Data
    • Step 4: Represent Sample Data in Machine Understandable Format 
    • Step 5: Select Suitable Machine Learning Algorithms
    • Step 6: Split Sample Data into Training Data and Testing Data
    • Step 7: Select Suitable Evaluation Measure(s)
    • Step 8: Execute First Two Phases of Machine Learning Cycle
      • Training Phase
      • Testing Phase
    • Step 9: Analyze Results

    •  Step 10: Execute 3rd and 4th Phases of Machine Learning Cycle
      • Application Phase

      • Feedback Phase

    • Step 11: Based on Feedback
      • Go to Step 1 and Repeat all the Steps
  • Three main Learning Settings are
    • Supervised Learning
    • Unsupervised Learning
    • Semi-supervised Learning
  • What Type of Data should be obtained depends upon
    • Learning Setting you selected in Step 1
  • Two Main Choices to Obtain Sample Data are
    • Use Existing Corpora / Datasets
    •  Develop your Own Corpora / Datasets
  • If (Corpora / Datasets Exist for your Research Problem)
    • Then
      • Use existing Corpora / Datasets
    • Else 
      • You will need to develop your own Corpora / Datasets 
  •  Very often, Machine Learning Algorithms understand Data represented in the form of 
    • Attribute-Value Pair
  • We should consider the following main points when choosing suitable Machine Learning Algorithms for your Machine Learning Problem
    1. Type of Machine Learning Problem
    2. Number of Parameters
    3. Size of Training and Testing Data
    4. Number of Features
    5. Training and Testing Time
    6. Accuracy
    7. Speed and Accuracy in Application Phase
  • Machine Learning Algorithms are designed to solve specific Machine Learning Problems
  • Two Important Points to Know
    1. Complete and correct understanding of the Type of Machine Learning Problem you are trying to solve using Machine Learning Algorithms
    2. In previous studies, what Machine Learning Algorithms have proven to be most effective for the Type of Machine Learning Problem you are solving?
  •  Good Starting Points for Classification Problems
    • Feature-based ML Algorithms
      • For Textual Data
        • Random Forest
        • Support Vector Machine
        • Logistic Regression
        • Naïve Bayes
        • Gradient Boost
      • For Image / Video Data 
        • Support Vector Machine
        • Regular Neural Networks
        • Logistic Regression
        • Naive Bayes
        • Extreme Learning Machines
        • Random Forest
        • Extreme Gradient Boost
        • Type II Approximate Reasoning
      • For Audio Data
        • Connectionist Temporal Classification 
    • Deep Learning ML Algorithms
      • For Textual Data
        • Recurrent Neural Networks (RNN)
        • Long Short-Term Memory (LSTM)
        • BI-LSTM
        • Gated Recurrent Units (GRU)
        • BI-GRU
      • For Image / Video Data
        • Convolutional Neural Networks (most popular)
      • For Audio Data
        • Recurrent Neural Networks (RNN)
  •  Good Starting Points for Regression Problems
    • Feature-based ML Algorithms
      • Linear Regression
      • Regression Trees
      • Lasso Regression
      • Multivariate Regression
  •  Good Starting Points for Sequence to Sequence Problems
      • For Textual Data
        • Recurrent Neural Networks (RNN)
        • Long Short-Term Memory (LSTM)
        • BI-LSTM
        • Gated Recurrent Units (GRU)
        • BI-GRU
      • For Image / Video Data
        • Convolutional Neural Networks
      • For Audio Data
        • Recurrent Neural Networks (RNN)
  •  Good Starting Points for Unsupervised Learning Problems
    • Feature-based ML Algorithms
      • For Textual Data
        • K-Means 
        • Agglomerative Hierarchical Clustering
        • Mean-Shift Clustering Algorithm
        • DBSCAN – Density-Based Spatial Clustering of Applications with Noise
        • EM using GMM – Expectation-Maximization (EM) Clustering using Gaussian Mixture Models (GMM)
      • For Image / Video Data
        • K-Means
        • Fuzzy C Means
    • Deep Learning ML Algorithms
      • For Image / Video Data
        • Generative Adversarial Networks
        • Auto-encoders
  •  Good Starting Points for Semi-supervised Learning Problems
    • Feature based ML Algorithms
      • Label Spreading Algorithm
  • Question
    • Which Machine Learning Algorithm(s) is best for a specific Machine Learning Problem?
  • Answer
    • Apply all available Machine Learning Algorithms and see which performs best 😊
  • Problem
    • It requires a lot of effort, time and resources to 
      • Apply all available Machine Learning Algorithms and find the best one 
  • A Possible Solution
    • Start with Good Starting Points 😊
  • Machine Learning Experts say that following Machine Learning Algorithms are Good Starting Points 
    • Feature based ML Algorithms
      • For Structured / Unstructured / Semi-structured Data
        • Support Vector Machine
        • Logistic Regression
    • Deep Learning ML Algorithms
      • For Textual Data
        • Recurrent Neural Network (RNN)
      • For Image / Video Data
        • Convolutional Neural Network (CNN)
      • For Audio Data
        • Recurrent Neural Network (RNN)
  • An ML Algorithm’s behavior is affected by
    • No. of Parameters
  • Size of Training Data
    • Size of Training Data plays a very important role in the Selection of Suitable ML Algorithms 
    • Feature-based ML Algorithms
      • Feature-based ML Algorithms (a.k.a. Classical ML Algorithms) can be accurately trained, even if the Training Data is small
    • Deep Learning ML Algorithms 
      • To accurately train Deep Learning Algorithms, a huge amount of Training Data is required
  • Size of Testing Data
    • Size of Testing Data plays a very important role when evaluating a Machine Learning Algorithm
    • To deploy a Model in Real-world (Application Phase), it should fulfill the following two conditions
      • Model should perform well (Condition 01) on large Test Data (Condition 02)
  • Number of Features used to Train a Model, have a significant impact on the performance of the Model 
  • Selection of most discriminating Features is important to get good results 
  • In Text / Image / Video / Genetic Corpora / Datasets 
    •  Number of Features is very high compared to the No. of Instances in a Corpus / Dataset 
  • Two popular and widely used approaches to reduce Number of Features in a Corpus / Dataset are
    • Feature Reduction
      • Feature Reduction (a.k.a. Dimensionality Reduction) is a process which transforms Features into a lower dimension 
    • Feature Selection 
      • Feature Selection is the process of selecting the most discriminating (or important) subset of Features (excluding redundant or irrelevant Features) from the Original Set of Features (without changing them)
  •  Popular Methods for Feature Reduction are 
    • Principal Component Analysis
    • Generalized Discriminant Analysis
    • Auto-encoders
    • Non-negative Matrix Factorization
  •  Popular Methods for Feature Selection are 
    • Wrapper Methods
    • Filter Methods
  • Feature Extraction 
    •  Creates new Features
  • Feature Selection 
    •  Selects a subset of Features from the Original Set of Features 
  • Given a Corpus / Dataset 
    •  First carry out
      • Feature Extraction then
      • Feature Reduction / Feature Selection 
  • Training and Testing Time mainly depends upon two main factors
    •  Size of Training and Testing Data
    • Target Accuracy
  •  Training Time of Deep Learning ML Algorithms is quite high compared to Feature based ML Algorithms
  • The Target Accuracy may differ from Machine Learning Problem to Machine Learning Problem
  • Speed and Accuracy requirements in Application Phase, may vary from Machine Learning Problem to Machine Learning Problem
  • Standard Practice for Splitting Sample Data
    • Use a Train-Test Split Ratio of
      • 67% – 33%
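As a sketch of the 67% – 33% split described above (the 100-example Sample Data and the shuffle seed are illustrative assumptions, not part of the original text):

```python
import random

def train_test_split(sample_data, train_ratio=0.67, seed=42):
    """Shuffle the Sample Data and split it into Training and Testing Data."""
    data = list(sample_data)
    random.Random(seed).shuffle(data)     # reproducible shuffle
    cut = round(len(data) * train_ratio)  # 67% boundary
    return data[:cut], data[cut:]         # (Training Data, Testing Data)

# Hypothetical Sample Data: 100 numbered examples
sample = list(range(100))
train, test = train_test_split(sample)
print(len(train), len(test))  # 67 33
```

In practice a library routine (e.g. one provided by an ML toolkit) would be used, but the idea is the same: shuffle once, then cut at the chosen ratio.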
  • Selection of Suitable Evaluation Measure(s) is important to
    •  correctly evaluate the performance of a Model
  • Selection of Suitable Evaluation Measure(s) mainly depends on 
    • Type of Machine Learning Problem
  • Some of the most popular and widely used Evaluation Measures for Classification Problems are
    • Baseline Accuracy (a.k.a. Most Common Categorization (MCC))
    • Accuracy
    • True Negative Rate
    • False Positive Rate
    • False Negative Rate
    • Recall or True Positive Rate or Sensitivity 
    • Precision (a.k.a. Positive Predictive Value)
    • F1
    • Area Under the Curve (AUC)
  • Some of the most popular and widely used Evaluation Measures for Regression Problems are
    • Mean Absolute Error (MAE)
    • Mean Squared Error (MSE)
    • Root Mean Squared Error (RMSE)
    • R2 or Coefficient of Determination
    • Adjusted R2
  • Some of the most popular and widely used Evaluation Measures for Sequence-to-Sequence Problems are
    • ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
    • BLEU (Bi-Lingual Evaluation Understudy)
    • METEOR (Metric for Evaluation of Translation with Explicit Ordering)
  • Recall the Equation
    • Data = Model + Error
  • Training Phase
    • Use Training Data to build the Model 
  • Testing Phase
    • Use Testing Data to evaluate the performance of the Model
      • i.e. calculate Error in the Model
  • When analyzing results, remember the Machine Learning Assumption
    • i.e. Training Data and future (unseen) Data are assumed to come from the same distribution
  • In Application Phase
    •  Deploy the Model in the Real-world to make predictions on unseen data 
  • In Feedback Phase 
    • Take Feedback from
      • Domain Experts
      • Users of the ML system
    • Based on Feedback from Domain Experts and Users
      •  Improve your Model 
  • In this Chapter, we treated (Step by Step) following four Real-world Problems as Machine Learning Problems
    • GPA Prediction Problem
    • Emotion Prediction Problem
    • Text Summarization Problem
    • Machine Translation Problem

Basics of Concept Learning

  • Recap
  • Recall that 
    • Most of the Machine Learning Algorithms use Inductive Learning Approach to learn
    • Concept Learning is a major subclass of Inductive Learning 
  • Concept Learning – Psychology
  • Definition 
    • Concept Learning refers to the human ability
      • to learn categories for Real-world Objects and
      • to recognize new instances of those categories
  • Purpose
    • To completely and correctly understand the characteristics and behaviors of Real-world Objects falling in different categories and use them according to instructions of Allah (to get maximum benefits)
  • Importance 
    • Without Concept Learning, it would be difficult to learn the category of a Real-world Object and use it efficiently and properly
  • Applications
    • Concept learning helps to speed up the learning process because
      • If you completely and correctly understand the characteristics and behaviors of a category of a Real-world Object, then
        • new (unseen) Real-world Objects falling in the same category will have similar characteristics and behavior
  • Object – Machine Learning
  • Definition
    • In Machine Learning, a Real-world Object is a material thing that can be seen and touched
  • Two Main Properties of a Real-world Object 
    1. Characteristics / Attributes / Features
    2. Behavior
  • Importance - Characteristics and Behavior of a Real-world Object
  • Characteristics and Behavior of a Real-world Object help us to 
    • learn to categorize Real-world Objects (or distinguish one Real-world Object from another)
  • Rule of Thumb
    • Similar Real-world Objects will have Similar Characteristics and Behavior and vice versa
  • Example 
    • Consider the following two Real-world Objects
    • Real-world Object 1 – Book
      • Characteristics 
        • Author(s), Title, No. of Pages, Price
      • Behavior 
        • Open, Read, Close
    • Real-world Object 2 – Human
      • Characteristics 
        • Height, Weight, Hair Length, Beard, Scarf
      • Behavior 
        • Walk, Talk, Eat, Run
  • Observation 
    • Characteristics and Behavior of Real-world Object 1 (Book) are entirely different from those of Real-world Object 2 (Human)
  • Conclusion
    • Book and Human are two different Real-world Objects and will fall in different categories
  • Example 1 – Categorizing Real-world Objects
  • Given
    • A Real-world Object 
      • Instructor of Machine Learning Course
    • Categories 
      • Human vs Non-Human
  • Goal
    • Learn to categorize Real-world Object (Instructor) into one of the two categories
      • Category / Class 1 = Human
      • Category / Class 2 = Non-Human
  • Observation
    • Characteristics and Behavior of Instructor (Real-world Object) are 
      • similar to Human Category and
      • dissimilar to Non-Human Category
  • Conclusion 
    • Instructor (Real-world Object) is categorized as
      • Human
  • Example 2 – Categorizing Real-world Objects
  • Given
    • A Real-world Object 
      • Car 
    • Categories 
      • Human vs Non-Human
  • Goal
    • Learn to categorize Real-world Object (Car) into one of the two categories
      • Category / Class 1 = Human
      • Category / Class 2 = Non-Human
  • Observation
    • Characteristics and Behavior of Car (Real-world Object) are 
      • similar to Non-Human Category and
      • dissimilar to Human Category
  • Conclusion 
    • Car (Real-world Object) is categorized as
      • Non-Human
  • Representing Real-world Objects in Machines
  • A Major Problem
    • A Machine is dumb and cannot understand Real-world Objects (as they are)
  • Solution
    • Change the representation of Real-world Objects to a Format which Machine Learning Algorithms can understand
  • Example 1 – Representing Real-world Objects in Machines
  • Real-world Object
    • Student 
  • Few Characteristics of a Student 
    • Student ID
    • Student Name
    • Gender
    • Degree Program
    • CGPA
  • Question 
    • How to represent COMSATS University, Lahore Campus students (Real-world Objects) in Machine?
  • A Possible Solution 
    • Use Attribute-Value Pair
      • Represent Characteristics of a Student (Real-world Object) using a set of Attributes
  • Representation of 3 COMSATS Students in Machine (using a Database Table called Tbl-Student)

  • Real-world Object Representation in Object-Oriented Programming (OOP)
  • In Object-Oriented Programming (OOP)
    • Characteristics are represented using Variables and 
    • Behavior is represented using Functions / Methods
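As a minimal Python sketch of this OOP representation, using the Book object from the earlier example (the attribute values passed in at the bottom are illustrative):

```python
class Book:
    """Real-world Object 'Book' in OOP terms."""
    def __init__(self, authors, title, no_of_pages, price):
        # Characteristics are represented using Variables
        self.authors = authors
        self.title = title
        self.no_of_pages = no_of_pages
        self.price = price
        self.is_open = False
    # Behavior is represented using Functions / Methods
    def open(self):
        self.is_open = True
    def read(self):
        return f"Reading '{self.title}'" if self.is_open else "The book is closed"
    def close(self):
        self.is_open = False

book = Book(authors=["A. Author"], title="Machine Learning", no_of_pages=414, price=50)
book.open()
print(book.read())  # Reading 'Machine Learning'
```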
  • Concept Learning – Machine Learning
  • Definition 
    • Concept Learning refers to a machine’s ability
      • to learn categories for Real-world Objects and
      • to recognize new instances of those categories
  • Much learning involves acquiring general Concepts from specific examples
  • Question
    • Why use specific examples to learn general Concepts?
  • Answer
    • It is practically impossible to collect all examples of a Concept
  • Example 
    • Machine Learning Problem
      • Gender Identification 
    • Two Main Situations in Collecting Examples to Learn the Concept of Gender Identification
      • Ideal Situation
        • All examples (all human beings in the world) are collected to learn the Concept of Gender Identification
      • Realistic Situation
        • Only a small subset of examples (or Sample) is collected to learn the Concept of Gender Identification
  • Concept Learning – Formal Definition
  • Definition
    • Given 
      • Set of Training Examples (D) and 
      • Set of Functions / Hypotheses (H)
    • Infer 
      • A Boolean-valued Function using the Set of Training Examples 
  • Possible Output Values of a Boolean-valued Function
    • A Boolean-valued Function has two possible Output values
      • For example
        • 1 (Yes)
        • 0 (No)
  • Recall 
    • In a Binary Classification Problem, there are 
      • Two possible Output values
    • Therefore, Concept Learning is also known as 
      • Binary Classification
  • Positive Examples vs Negative Examples
  • Positive Example
    • In Concept Learning, if Output is 1 / True / Yes, then it is called a Positive Example
  • Negative Example
    • In Concept Learning, if Output is 0 / False / No, then it is called a Negative Example
  • Example – Learning Boolean-valued Functions
  • Concept 
    • For a given Real-world Object 
      • Identify if it is a Human or not?
  • Input 
    • Object
  • Categories / Classes
    • Category / Class 1 = Human (True)
    • Category / Class 2 = Non-Human (False)
  • Output
    • True 
      • If Input (Real-world Object) is Human
    • False
      • If Input (Real-world Object) is Non-Human
  • Example 1
    • Input 
      • Car
    • Output 
      • False
  • Example 2
    • Input 
      • Student 
    • Output 
      • True
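The two examples above can be sketched as a Boolean-valued target function in Python (the set of known Human objects is purely illustrative):

```python
def c(real_world_object):
    """Target Concept 'is it a Human?': a Boolean-valued Function."""
    humans = {"Student", "Instructor"}  # illustrative known Human objects
    return real_world_object in humans

print(c("Car"))      # False -> a Negative Example
print(c("Student"))  # True  -> a Positive Example
```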
  • Concept as a Function
  • Recall
    • Much learning involves
      • Learning Input-Output Functions
  • Therefore, to learn any Concept, the solution is to 
    • Learn a Function 
  • Conclusion
    • A Concept is a Function, which we want to learn

TODO and Your Turn​

Todo Tasks

TODO Task 1

  • Task 1
    • In Sufia (صوفیا), there are four main Silsila Alia (سلسلہ)
      • Chistia (چشتیہ)
      • Qadria (قادریہ)
      • Naqshbandia (نقشبندیہ‎ )
      • Suharwardia  (سهروردیہ )
    • Some of the main branches of Chistia Silsila Alia are
      • Chistia Owaisia
      • Chistia Sabria (Hazrat Ali Ahmed Sabir Sb R.A.)
      • Chistia Nizamia (Hazrat Khawaja Nizam-ud-Din Auliya R.A.)
  • Note
    • Your answer should be 
      • Well Justified
  • Questions
    • What is the main goal of all these Silsila Alia?
    • Chistia Silsila Alia was started by Hazrat Khawaja Moin-ud-Din Chishti Ajmeri R.A.
      • Write down the names of great personalities who started the other Silsila Alia mentioned in the Task?
    • Write down Characteristics of each Silsila Alia?
    • Write down similarities and differences in all Silsila Alia mentioned in the Task?
    • Search if there are branches in other Silsila Alia (similar to Chistia Silsila Alia)?
    • Find out common patterns in branches of Silsila Alia?
Your Turn Tasks

Your Turn Task 1

  • Task 1
    • Select a Task (similar to Silsila Alia in Sufia) and answer the questions below
  • Questions
    • Write down the Characteristics and Behavior of each Real-world Object / Concept present in your selected Task?
    • Identify similar and different Real-world Objects / Concepts present in your selected Task?
    • Identify common patterns?

Instance Space, Training Data, Concept Space and Hypothesis Space

  • Instance Space (X)
  • Definition
    • Instance Space (X) is the set of all possible combinations of Input Attribute values
  • In Machine Learning, the Instance Space is the same as
    • Population in Statistics
  • Symbols
    • X
      • represents Instance Space
    • |X| 
      • represents Size of Instance Space
  • Formula to Compute |X| 
    • |X| = (No. of Possible Values of Input Attribute 1) x (No. of Possible Values of Input Attribute 2) x … x (No. of Possible Values of Input Attribute n)
    • where n represents 
      • Total Number of Input Attributes
  • Example 1 – Computing Size of Instance Space
  • Concept to be Learned
    • Gender Identification
  • Input
    • Human
  • Output 
    • Gender of the Human
  • Representation of Input i.e. Human
    • Set of 2 Input Attributes
      • Input Attribute 1
        • Height
      • Input Attribute 2
        • Weight
    • Possible Values of Input Attributes
      • Values – Input Attribute 1 (Height)
        • Short
        • Medium
        • Tall
      • Values – Input Attribute 2 (Weight)
        • Normal
        • Heavy
  • Representation of Output i.e. Gender of a Human
    • Set of 1 Output Attribute
      • Output Attribute
        • Gender
    • Possible Values of Output Attribute
      • Values – Output Attribute (Gender)
        • Male
        • Female 
    • Note
      • Gender Identification is a Concept Learning Problem
        • Therefore, the Output Attribute (Gender) has only
          • 2 possible Output values
    • Size of Instance Space
      • |X| = 3 (Height) x 2 (Weight) = 6
  • Example 2 – Computing Size of Instance Space
  • Concept to be Learned
    • Gender Identification
  • Input
    • Human
  • Output 
    • Gender of the Human
  • Representation of Input i.e. Human
    • Set of 3 Input Attributes
      • Input Attribute 1
        • Height
      • Input Attribute 2
        • Weight
      • Input Attribute 3
        • Scarf
    • Possible Values of Input Attributes
      • Values – Input Attribute 1 (Height)
        • Short
        • Medium
        • Tall
      • Values – Input Attribute 2 (Weight)
        • Small
        • Normal
        • Heavy
      • Values – Input Attribute 3 (Scarf)
        • Yes
        • No
  • Representation of Output i.e. Gender of a Human
    • Set of 1 Output Attribute
      • Output Attribute
        • Gender
    • Possible Values of Output Attribute
      • Values – Output Attribute (Gender)
        • Male
        • Female
    • Note
      • Gender Identification is a Concept Learning Problem
        • Therefore, the Output Attribute (Gender) has only
          • 2 possible Output values
    • Size of Instance Space
      • |X| = 3 (Height) x 3 (Weight) x 2 (Scarf) = 18
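As a quick sketch, the |X| formula applied to the two Gender Identification examples above:

```python
from math import prod

def instance_space_size(attributes):
    """|X| = product over all Input Attributes of the number of possible values."""
    return prod(len(values) for values in attributes.values())

example_1 = {"Height": ["Short", "Medium", "Tall"],
             "Weight": ["Normal", "Heavy"]}
example_2 = {"Height": ["Short", "Medium", "Tall"],
             "Weight": ["Small", "Normal", "Heavy"],
             "Scarf":  ["Yes", "No"]}

print(instance_space_size(example_1))  # 3 x 2 = 6
print(instance_space_size(example_2))  # 3 x 3 x 2 = 18
```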

  • Training Data (D)
  • Sample Data
    • Subset of Instance Space (X)
  • Size of Sample Data is always less than the Size of Instance Space
  • In Machine Learning, Sample Data is the same as
    • Sample in Statistics
  • Sample Data is split into
    • Training Data (D)
    • Testing Data
  • Formally, each d ∈ D is composed of 
    • ⟨di, c(di)⟩
      • di represents the Input 
      • c(di) represents the Output
  • Why a Concept Cannot be Completely Learned?
  • Recall
    • Concept Learning is a major subclass of Inductive Learning and
      • Inductive Learning Approach has Scope of Error
  • Why a Concept Cannot be Completely Learned? 
    • To completely learn a Concept, we need 
      • entire Instance Space (X)
    • Problem
      • Practically impossible to acquire the entire Instance Space (X)
    • What we have is Training Data (D)
      • D is a subset of X
    • A Machine Learning Model learns from Training Data
      • i.e. Machine Learning Model learns from incomplete data
    • Therefore, a Concept cannot be completely learned
  • Concept Space (C)
  • Definition 
    • Concept Space (C) is the set of all possible combinations of Output values
  • Symbols
    • C
      • represents Concept Space
    • |C| 
      • represents Size of Concept Space
  • Formula to Compute |C| 
    • |C| = 2^|X|
      • where |X| represents the Size of Instance Space
  • Examples – Computing Size of Concept Space
  • Example 1
    • Considering a Concept Learning Problem in which 
      • |X| = 4
    • |C| = 2^|X| = 2^4 = 16
  • Example 2
    • Considering a Concept Learning Problem in which 
      • |X| = 9
    • |C| = 2^|X| = 2^9 = 512
  • Example 3
    • Considering a Concept Learning Problem in which 
      • |X| = 96
    • |C| = 2^|X| = 2^96 = 79,228,162,514,264,337,593,543,950,336
  • Note
    • In these three examples
      • As |X| increases, the value of |C| grows exponentially and
        • For |X| = 96, the value of |C| is already astronomically large
    • Real-world Corpora / Datasets may comprise a large number of Input Attributes, which results in a
      • very large Concept Space
    • Recall 
      • Learning is a Searching Problem
      • A Major Problem
        • The bigger the |C|, the more challenging it becomes to search the Concept Space
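The exponential growth of |C| can be checked with a few lines of Python reproducing the three examples above:

```python
def concept_space_size(x_size):
    """|C| = 2 ** |X|: each Concept labels every instance either True or False."""
    return 2 ** x_size

for x in (4, 9, 96):
    print(x, concept_space_size(x))
# For |X| = 96, |C| = 79,228,162,514,264,337,593,543,950,336
```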
  • How to Search a Huge Concept Space?
  • Problem with a Huge Concept Space 
    • The computational cost and time required to search a huge Concept Space will be very high
  • Question 
    • How to efficiently and quickly search a huge Concept Space to find a Concept / Function which best fits the Training Data?
  • A Possible Solution
    • Reduce the size of Concept Space
  • Hypothesis Space (H)
  • Definition
    • The reduced Concept Space (C) is called the Hypothesis Space (H)
  • Question
    • How to reduce the size of Concept Space (C)?
  • A Possible Solution 
    • Introduce search bias
  • Example – Using Search Bias to Reduce Size of Concept Space
  • Consider the following searching problem
    • There are a total of 40 students (25 Male and 15 Female) in Machine Learning class. Adeel is the topper of the class with a CGPA of 3.86. Our goal is to automatically search and find out the topper of the class.
  • Target Concept / Function to be Searched (or Learned)
    • Topper of Machine Learning Class 
      • i.e. Adeel (CGPA = 3.86)
  • |C| = Size of Machine Learning Class = 40
  • Question
    • How to introduce search bias to reduce the size of Concept Space?
  • A Possible Solution
    • Introduce Gender Bias
  • Choice 1 – Gender Bias
    • Exclude Males from Concept Space 
      • |HFemales| = 40 – 25 = 15
    • Note that Hypothesis Space (HFemales) contains only the Female students of the Machine Learning class
  • Choice 2 – Gender Bias
    • Exclude Females from Concept Space 
      • |HMales| = 40 – 15 = 25
    • Note that Hypothesis Space (HMales) contains only the Male students of the Machine Learning class
  • Example – Using Search Bias to Reduce Size of Concept Space, Cont…
  • Now we have three spaces 
    • Concept Space 
      • |C| = 40
    • Hypotheses Space only containing Female students 
      • |HFemales| = 15
    • Hypothesis Space containing only Male students
      • |HMales| = 25
  • Example – Using Search Bias to Reduce Size of Concept Space, Cont…
  • Target Concept / Function to be Searched (or Learned)
    • Topper of Machine Learning Class 
      • i.e. Adeel (CGPA = 3.86)
  • Searching Target Function / Concept in |C|
    • Outcome of Search
      • We will definitely find the Target Function / Concept in Concept Space (C)
    • Reason 
      • We are searching the entire Concept Space (C)
        • Therefore, we cannot miss the Target Function / Concept
  • Searching Target Function / Concept in HFemales
    • Outcome of Search
      • We will not find the Target Function / Concept in Hypothesis Space (HFemales)
    • Reason
      • We are searching a reduced Concept Space, and the Gender Bias excluded Adeel (a Male student)
        • In general, we may / may not find the Target Function / Concept in a Hypothesis Space
  • Searching Target Function / Concept in HMales
    • Outcome of Search
      • We will find the Target Function / Concept in Hypothesis Space (HMales)
    • Reason
      • We are searching a reduced Concept Space, and the Gender Bias retained Adeel (a Male student)
        • In general, we may / may not find the Target Function / Concept in a Hypothesis Space
  • Conclusion
    • Searching a Concept Space
      • We will definitely find the Target Function / Concept
    • Searching a Hypothesis Space
      • We may / may not find the Target Function / Concept and 
        • this may / may not be known to us 
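The class-topper search above can be sketched in Python; the student names other than Adeel, and the other students' CGPAs, are made-up illustrative values:

```python
# Hypothetical Machine Learning class: (name, gender, cgpa); Adeel is the topper
students = ([("Adeel", "Male", 3.86)]
            + [(f"Male{i}", "Male", 3.00) for i in range(24)]
            + [(f"Female{i}", "Female", 3.20) for i in range(15)])

C = students                                    # Concept Space, |C| = 40
H_males   = [s for s in C if s[1] == "Male"]    # Gender Bias, |H_males| = 25
H_females = [s for s in C if s[1] == "Female"]  # Gender Bias, |H_females| = 15

def find_topper(space):
    """Search a space for the student with the highest CGPA."""
    return max(space, key=lambda s: s[2])[0]

print(find_topper(C))          # Adeel: the full Concept Space never misses the target
print(find_topper(H_males))    # Adeel: this bias happened to keep the target
print(find_topper(H_females))  # not Adeel: this bias excluded the target
```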
  • Concept Space vs Hypothesis Space
  • Searching a Concept Space
    • Strengths 
      • We will definitely find the Target Function / Concept
    • Weaknesses
      • Computational cost is high
      • Amount of Time required for searching is high
  • Searching a Hypothesis Space
    • Strengths 
      • Computational cost is low
      • Amount of Time required for searching is small
    • Weaknesses 
      • We may / may not find the Target Function / Concept and this may / may not be known to us

TODO and Your Turn​

Todo Tasks

TODO Task 2

  • Task 1
    • Titanic sank in 1912 (some passengers survived and remaining died). Kaggle launched a competition titled: Titanic: Machine Learning from Disaster
      • URL: https://www.kaggle.com/c/titanic
    • Below are some of the Attributes (with their Possible Values) taken from Titanic Project Dataset
      • Gender: Male, Female
      • Ticket Class: Upper, Middle, Lower
      • Survival: No, Yes
      • Parent/Child Aboard: Zero, One, Two, Three
      • Embarked: Cherbourg, Queenstown, Southampton
    • Goal of a Machine Learning Algorithm is to predict 
      • Whether a Passenger Survived or Not? 
  • Note 
    • Your answer should be
      • Well Justified
  • Questions
    • Write down Input and Output for the above discussed Machine Learning Problem?
    • Compute the Size of Instance Space
    • Compute the Size of Concept Space
    • Can we say that this Machine Learning Problem is a Binary Classification Problem? Explain.
    • Can you completely learn the above-discussed Concept?
  • Task 2
    • Rizwan wants to find Adeel in a group of 300 people. There are a total of 3 groups: (1) Young – contains 100 people, (2) Middle – contains 125 people and (3) Old – contains 75 people. Note that Adeel falls in the group of young people.
  • Questions
    • What is the Target Function / Concept?
    • What is the Size of Concept Space?
    • How can Rizwan introduce search bias to reduce the Size of Concept Space?
    • How many Search Spaces will be made after introducing search bias?
  • Discuss each Search Space separately and identify from which Search Spaces you will find the Target Function / Concept and from which you will miss the Target Function / Concept
Your Turn Tasks

Your Turn Task 2

  • Task 1
    • Select a Machine Learning Problem (similar to Titanic: Machine Learning from Disaster) and answer the questions given below
    • Questions
      • Write down the Input and Output for your selected Machine Learning Problem
      • Compute the Size of Instance Space
      • Compute the Size of Concept Space
      • Is your selected Machine Learning Problem a Binary Classification Problem? Explain.
    • Can you completely learn the Target Function / Concept presented in your Machine Learning Problem?

 

  • Task 2
    • Select a scenario (similar to the one given in TODO Task 2) and answer the questions given below. 
    • Questions
      • What is the Target Function / Concept?
      • What is the Size of Concept Space?
      • How can you introduce search bias to reduce the Size of Concept Space?
      • How many Search Spaces will be made after introducing search bias?
      • Discuss each Search Space separately and identify in which Search Spaces you will find the Target Function / Concept and in which you will miss it

Hypothesis (h) Representation

  • Machine is Dumb
  • Recall
    • Input to Learner
      • Set of Training Examples (D)
      • Set of Functions / Hypothesis (H)
  • Machine is Dumb
    • Therefore, we will need to represent
      • Both Training Example (d) and Hypothesis (h) in a Format, which
        • Machine can understand
  • Representation of Training Example (d)
  • Very often, a Training Example (d) is represented as
    • Attribute-Value Pair
  • Values of Attributes can be mainly categorized as
    • Categorical
    • Numeric
  • Examples
    • Following Machine Learning Algorithms represent a Training Example (d) in the form of Attribute-Value Pair
      • FIND-S Algorithm
      • List Then Eliminate Algorithm
      • Candidate Elimination Algorithm
      • Decision Tree Learning Algorithms
        • For example, ID3, J48, Random Forest
      • Artificial Neural Networks
        • For example, Multi-Layer Perceptron
      • Bayesian Learning Algorithms
        • For example, Naïve Bayes
      • Instance-based Learning Algorithms
        • For example, k-NN
  • Representation of Hypothesis (h)
  • Representation of Hypothesis (h) may vary from Machine Learning Algorithm to Machine Learning Algorithm
  • Examples
    • Representation of Hypothesis (h) for FIND-S, List Then Eliminate and Candidate Elimination Algorithms is
      • Conjunction (AND) of Constraints on Attributes
    • Representation of Hypothesis (h) for ID3 Algorithms is
      • Disjunction (OR) of Conjunction (AND) of Constraints on Attributes
    • Representation of Hypothesis (h) for Regular Neural Network Algorithms is
      • Combination of Weights between Units
  • Example –Representation of Training Example and Hypothesis
  • In sha Allah, in this example, I will consider the Gender Identification Problem and will try to explain the representation of both
    • Training Example (d) and Hypothesis (h)
  • Example –Representation of Training Example and Hypothesis
  • Concept to be Learned
    • Gender Identification
  • Input
    • Human
  • Output 
    • Gender of the Human
  • Representation of Example
  • Recall 
    • Example = Input + Output
  • Representation of Input i.e. Human
    • Set of 2 Input Attributes
      • Input Attribute 1
        • Height
      • Input Attribute 2
        • Weight
    • Values of Input Attributes
      • Values – Input Attribute 1 (Height)
        • Short
        • Medium
        • Tall
      • Values – Input Attribute 2 (Weight)
        • Small
        • Normal
        • Heavy
  • Representation of Output i.e. Gender of a Human
    • Set of 1 Output Attributes
      • Output Attribute
        • Gender
    • Values of Output Attribute
      • Values – Output Attribute (Gender)
        1. Yes (Female)
        2. No (Male)
  • Note
    • Example (both Input and Output) are represented as 
      • Attribute-Value Pair
  • Computing Size of Instance Space (X)

  • Instance Space (X)
  • X = {x1, x2, x3, x4, x5, x6, x7, x8, x9}
  • Let’s see each example separately, considering the following order of Input Attributes
  • <Height, Weight>
  • x1 = <Short, Small> 
  • x2 = <Short, Normal>
  • x3 = <Short, Heavy>
  • x4 = <Medium, Small>
  • x5 = <Medium, Normal>
  • x6 = <Medium, Heavy>
  • x7 = <Tall, Small>
  • x8 = <Tall, Normal>
  • x9 = <Tall, Heavy>
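The enumeration above can be reproduced programmatically. A minimal sketch in Python, using the attribute values of the Gender Identification Problem:

```python
from itertools import product

# Possible values of each Input Attribute (Gender Identification Problem)
heights = ["Short", "Medium", "Tall"]
weights = ["Small", "Normal", "Heavy"]

# Instance Space (X): every combination of Input Attribute values,
# in the order <Height, Weight>
X = [(h, w) for h, w in product(heights, weights)]

for x in X:
    print(x)

print("|X| =", len(X))  # 3 x 3 = 9
```

The same pattern works for any number of categorical attributes: the size of the Instance Space is the product of the number of possible values of each attribute.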

  • Sample Data
  • Assume we obtained Sample Data of 6 instances

  • Sample Data Split
  • Sample Data is split into
    • Training Data – 2 / 3 of Sample Data 
    • Testing Data – 1 / 3 of Sample Data
  • Formally, each d ∈ D is composed of di, c(di) 
    • di represents the Input 
    • c(di) represents the Output
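The pair structure of a training example can be sketched directly in Python. Note that the rows below are illustrative placeholders of my own, not the book's actual Training Data table:

```python
# Training Data (D): each training example d is a pair (d_i, c(d_i)),
# where the input d_i is an attribute-value vector <Height, Weight>
# and the output c(d_i) is the Gender label.
# These rows are illustrative only, not the book's actual table.
D = [
    (("Short", "Small"),  "Yes"),   # Female
    (("Tall",  "Heavy"),  "No"),    # Male
    (("Short", "Normal"), "Yes"),   # Female
    (("Tall",  "Normal"), "No"),    # Male
]

for d_i, c_d_i in D:
    print(d_i, "->", c_d_i)
```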
  • Training Data (D)

  • Testing Data

  • Note
  • We have completed the Representation of Training Examples (D) in the form of 
    • Attribute-Value Pair
  • In sha Allah, in next slides we will discuss the 
    • Representation of Hypothesis (h)
  • Concept Space (C)

  • Concept Space (C)

  • Representation of Concepts
  • Two Main Representations
    • Conjunction (AND) of Input Attributes 
    • Disjunction (OR) of Conjunction (AND) of Input Attributes 
  • Note a Concept can also be written in the form of
    • Rules (if-then)
  • Example 1 - Concept Representation as Conjunction (AND) of Input Attributes
  • Concept 
    • C = < Height = Short AND Weight = Normal>
  • Concept in the form of Rules

  • Concept in Tabular Form

  • Example 2 - Concept Representation as Conjunction (AND) of Input Attributes
  • Concept 
    • C = < Height = Medium AND Weight = Normal>
  • Concept in the form of Rules

  • Concept in Tabular Form

  • Example 3 - Concept Representation as Conjunction (AND) of Input Attributes
  • Concept 
    • C = < Height = Short AND Weight = Heavy>
  • Concept in the form of Rules

  • Concept in Tabular Form

  • Observations - Concept Representation as Conjunction (AND) of Input Attributes
  • In all three examples given on previous slides, Output is Yes only for
    • One combination of Height and Weight
  • Example 1 - Concept Representation as Disjunction (OR) of Conjunction (AND) of Input Attributes
  • Concept 
    • C = < Height = Short AND Weight = Small OR

                           Height = Medium AND Weight = Small OR

                           Height = Short AND Weight = Normal OR

                           Height = Medium AND Weight = Normal OR

                           Height = Short AND Weight = Heavy >

  • Concept in the form of Rules

  • Concept in Tabular Form

  • Example 2 - Concept Representation as Disjunction (OR) of Conjunction (AND) of Input Attributes
  • Concept 
    • C = < Height = Short AND Weight = Small OR

                            Height = Short AND Weight = Normal OR

                            Height = Medium AND Weight = Normal OR

                            Height = Short AND Weight = Heavy >

  • Concept in the form of Rules

  • Concept in Tabular Form

  • Example 3 - Concept Representation as Disjunction (OR) of Conjunction (AND) of Input Attributes
  • Concept 
    • C = < Height = Medium AND Weight = Normal OR

                            Height = Short AND Weight = Heavy >

  • Concept in the form of Rules

  • Concept in Tabular Form

  • Observations - Concept Representation as Disjunction (OR) of Conjunction (AND) of Input Attributes
  • In all three examples given on previous slides, Output is Yes for
    • Multiple combinations of Height and Weight
  • Comparison of Concept Representations
  • Question 
    • Which of the two Concept Representations is more powerful?
      • Conjunction (AND) of Input Attributes 
      • Disjunction (OR) of Conjunction (AND) of Input Attributes 
  • Answer 
    • Disjunction (OR) of Conjunction (AND) of Input Attributes 
  • Reason
    • We can represent more complex Concepts using Disjunction (OR) of Conjunction (AND) of Input Attributes compared to Conjunction (AND) of Input Attributes
  • Learning Is a Searching Problem
  • Recall
    • Learner has to search the Concept Space / Hypothesis Space using the Set of Training Examples (D) to find a Concept / Hypothesis / Function, which best fits the Training Data
  • In our Gender Identification Problem 
    • |C| = 512
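The size of the Concept Space follows from the formula |C| = 2^|X|, because every Concept assigns one of two outputs (Yes / No) to each of the |X| instances. A quick check in Python:

```python
# Gender Identification Problem: |X| = 3 Heights x 3 Weights = 9
size_X = 3 * 3

# Each Concept labels every instance as Yes or No, so |C| = 2^|X|
size_C = 2 ** size_X

print("|X| =", size_X)  # 9
print("|C| =", size_C)  # 512
```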
  • Hypothesis Space (H)
  • To reduce the size of Concept Space (C)
    • introduce search bias
  • The reduced Concept Space (C) is called the Hypothesis Space (H)
  • Most Common Bias
    • uses the AND (Conjunction) relationship between Attributes
  • In other words, Hypothesis Space (H) uses the 
    • Conjunction (AND) of Attributes
      • i.e. h = <Height, Weight> 
  • Representation of Hypothesis (h)
  • Question 
    • How can a Hypothesis (h) be represented?
  • Answer 
    • There are many possible representations of a Hypothesis (h)
  • Representation of Hypothesis (h)
  • In our Gender Identification Problem, a Hypothesis (h) is represented as
    • Conjunction (AND) of Constraints on Input Attributes
  • Each constraint can be
    • A specific value : e.g.
      • Weight = Small
    • A don’t care value (any of Possible Values): e.g. 
      • Weight = ?
    • No value allowed (Null Hypothesis Ø): e.g.          
      • Weight = Ø
  • Constraints on Input Attributes
  • Consider the Gender Identification Problem
    • We have two Input Attributes
      • Height
      • Weight
  • Constraints – Height Attribute
    • No value allowed
      • Ø
    • A specific value 
      • Short, Medium, Tall
    • Any value allowed
      • ?
  • Total No. of Constraints – Height Attribute 
    • Ø, Short, Medium, Tall, ?
  • Constraints – Weight Attribute
    • No value allowed
      • Ø
    • A specific value 
      • Small, Normal, Heavy
    • Any value allowed
      • ?
  • Total No. of Constraints – Weight Attribute 
    • Ø, Small, Normal, Heavy, ?
  • Examples – Hypothesis (h)
  • In our Gender Identification Problem
    • h = <Height, Weight>
  • Some of the Hypotheses (h) for our Gender Identification Problem are as follows
    • h = <Height, Weight>
    • h1 = <Short, Small>
    • h2 = <Tall, Heavy>
    • h3 = <Ø, Small>
    • h4 = <?, ?>
  • Example (d) vs Hypothesis (h)
  • An Example (d) is
    • Vector of Attribute Values
    • Examples
      • d = Height, Weight, Gender
      • d1 = Short, Small, Yes
      • d2 = Tall, Heavy, No
  • Note 
    • In d1 and d2 
      • Small, Short, Heavy, Tall, Yes and No are 
        • Attribute Values
  • A hypothesis (h) is
    • Vector of Constraints on Input Attributes
    • Examples
      • h = <Height, Weight>
      • h1 = <Short, Small >
      • h2 = <Tall, Heavy>
      • h3 = <Ø, Small>
      • h4 = <?, ?>
  • Note 
    • In h1 , h2 and h3
      • Small, Short, Heavy and Tall are 
        • Constraints on Input Attributes 
  • Hypothesis in the form of Rules
  • Hypothesis (h) can be written in the form of Rules
  • Example 1
    • h = <Height, Weight>
    • h1 = <Short, Small>

  • Example 2
    • h = <Height, Weight>
    • h4 = <?, ?>

  • Reduction in Size of Concept Space (C)
  • Hypothesis (h) is represented as
    • Conjunction (AND) of Constraints on Input Attributes
  • Constraints on attributes include
    • Ø – no value is allowed; the Hypothesis (h) will never output 1 (Yes)
    • A specific value – the attribute must take the specified value
    • ? – any value is allowed; the Hypothesis (h) outputs 1 (Yes) for every value of the attribute
  • In Gender Identification Problem
    • There are two Input Attributes
      • Height 
      • Weight
    • Constraints on Height Attribute (total 5 constraints)
      • Ø, Short, Medium, Tall, ?
    • Constraints on Weight Attribute (total 5 constraints)
      • Ø, Small, Normal, Heavy, ?
    • |H| = No. of Constraints on Weight Attribute x No. of Constraints on Height Attribute 
    • |H| = 5 x 5 = 25
  • Recall 
    • |C| = 512 and 
    • |H| = 25
  • Conclusion
    • Introduction of search bias reduced the size of Concept Space (C)
  • Syntactically Distinct Hypothesis vs Semantically Distinct Hypothesis
  • Consider the following two print statements
    • In C Programming Language
      • printf("Allah loves those who serve humanity");
    • In Python Programming Language
      • print("Allah loves those who serve humanity")
  • Both statements are
    • Semantically same but
    • Syntactically distinct
  • Syntactically Distinct Hypothesis vs Semantically Distinct Hypothesis
  • No. of Syntactically Distinct Hypotheses = 25
  • h = <Height, Weight>
    • h1 = < Ø, Ø >
    • h2 = < Ø, Small >
    • h3 = < Ø, Normal >
    • h4 = < Ø, Heavy >, 
    • h5 = < Ø, ? >, 
    • h6 = < Short, Ø >,
    • h7 = < Short, Small >, 
    • h8 = < Short, Normal >, 
    • h9 = < Short, Heavy >, 
    • h10 = < Short, ? >, 
    • h11 = < Medium, Ø >,
    • h12 = < Medium, Small >, 
    • h13 = < Medium, Normal >, 
    • h14 = < Medium, Heavy >, 
    • h15 = < Medium, ? >, 
    • h16 = < Tall, Ø >, 
    • h17 = < Tall, Small >, 
    • h18 = < Tall, Normal >, 
    • h19 = < Tall, Heavy >, 
    • h20 = < Tall, ? >, 
    • h21 = < ?, Ø >, 
    • h22 = < ?, Small >, 
    • h23 = < ?, Normal >, 
    • h24 = < ?, Heavy >, 
    • h25 = < ?, ? >
  • Syntactically Distinct Hypothesis vs Semantically Distinct Hypothesis
  • There are redundancies within these 25 hypotheses
    • Caused by Ø
  • Note
    • Whenever there is Ø in any of the Input Attributes and we are considering Conjunctions (AND) 
      • Output will always be 0
  • The following Hypotheses are semantically the same
  • h = <Height, Weight>
    • h1 = < Ø, Ø >
    • h2 = < Ø, Small >
    • h3 = < Ø, Normal >, 
    • h4 = < Ø, Heavy >, 
    • h5 = < Ø, ? >, 
    • h6 = < Short, Ø >, 
    • h11 = < Medium, Ø >, 
    • h16 = < Tall, Ø >, 
    • h21 = < ?, Ø >
  • Syntactically Distinct Hypothesis vs Semantically Distinct Hypothesis, Cont…
  • No. of Semantically Distinct Hypotheses = 1 + (Small, Normal, Heavy, ?) x (Short, Medium, Tall, ?)
  • No. of Semantically Distinct Hypotheses = 1 + 4 x 4 = 17
  • h = <Height, Weight>
    • h1 = < Ø, Ø >
    • h2 = < Short, Small >
    • h3 = < Short, Normal >, 
    • h4 = < Short, Heavy >, 
    • h5 = < Short, ? >, 
    • h6 = < Medium, Small >, 
    • h7 = < Medium, Normal >, 
    • h8 = < Medium, Heavy >, 
    • h9 = < Medium, ? >, 
    • h10 = < Tall, Small >, 
    • h11 = < Tall, Normal >, 
    • h12 = < Tall, Heavy >, 
    • h13 = < Tall, ? >, 
    • h14 = < ?, Small >, 
    • h15 = < ?, Normal >, 
    • h16 = < ?, Heavy >, 
    • h17 = < ?, ? >
  • Conclusion - Reduction in Size of Concept Space
  • To summarize
    • |C| = 512
    • |H| = 25 (Syntactically Distinct Hypotheses)
    • |H| = 17 (Semantically Distinct Hypotheses)
  • Conclusion
    • Search bias has reduced the size of Concept Space from 512 to 17, which is a 
      • Huge reduction
    • Advantage of Reduction
      • A Learner can quickly search with low computational cost
        • A space of 17 Hypotheses (h) instead of 512 Concepts
    • Disadvantage of Reduction
      • The Target Function / Concept may / may not be present in the Hypothesis Space (H) and this may / may not be known
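The counts above (25 syntactically distinct vs. 17 semantically distinct hypotheses) can be verified with a short script; a sketch, assuming the constraint sets listed earlier:

```python
from itertools import product

# 5 possible constraints per Input Attribute: Ø, the 3 specific values, ?
height_constraints = ["Ø", "Short", "Medium", "Tall", "?"]
weight_constraints = ["Ø", "Small", "Normal", "Heavy", "?"]

# Syntactically distinct hypotheses: every <Height, Weight> constraint pair
syntactic = list(product(height_constraints, weight_constraints))
print(len(syntactic))  # 5 x 5 = 25

# Any hypothesis containing Ø classifies every instance as Negative,
# so all Ø-hypotheses collapse into a single semantic hypothesis
without_null = [h for h in syntactic if "Ø" not in h]
semantic_count = 1 + len(without_null)
print(semantic_count)  # 1 + 4 x 4 = 17
```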

TODO and Your Turn​

Todo Tasks

TODO Task 3

  • Task
    • Consider the Titanic Dataset with the following Attributes
      • Gender: Male, Female
      • Ticket Class: Upper, Middle, Lower
      • Survival: No, Yes
    • Goal of the Learner is to predict
      • Whether a Passenger Survived or Not?
  • Note
    • Your answer should be
      • Well Justified
  • Questions
    • Write down the Input and Output for the Machine Learning Problem
    • Write down all instances in Instance Space in a Table
    • Graphically represent the Concept Space (similar to what I did for the Gender Identification Problem in this Chapter)
    • Write 4 Concepts from Concept Space in the form of: (1) Rules, (2) Table, considering
      • Conjunction (AND) of Constraints on Attributes (2 Concepts)
      • Disjunction (OR) of Conjunction (AND) of Constraints on Attributes (2 Concepts)
    • Suppose we introduce search bias as Conjunction (AND) of Constraints on Attributes 
      • Calculate the Size of Instance Space, Concept Space, Syntactically Distinct Hypotheses and Semantically Distinct Hypotheses
      • Write down the Syntactically Distinct Hypotheses 
      • Write down the Semantically Distinct Hypotheses
Your Turn Tasks

Your Turn Task 3

  • Task 
    • Select a Machine Learning Problem (similar to the Titanic Dataset given in TODO) 
  • Questions
    • Write down the Input and Output for the selected Machine Learning Problem
    • Write down all instances in Instance Space in a Table
    • Graphically represent the Concept Space (similar to what I did for the Gender Identification Problem in this Chapter)
    • Write 4 Concepts from Concept Space in the form of: (1) Rules, (2) Table, considering
      • Conjunction (AND) of Constraints on Attributes (2 Concepts)
      • Disjunction (OR) of Conjunction (AND) of Constraints on Attributes (2 Concepts)
    • Suppose we introduce search bias as Conjunction (AND) of Constraints on Attributes 
      • Calculate the Size of Instance Space, Concept Space, Syntactically Distinct Hypotheses and Semantically Distinct Hypotheses
      • Write down the Syntactically Distinct Hypotheses
      • Write down the Semantically Distinct Hypotheses

Hypothesis Space (H) Representation

  • Learning is a Searching Problem
  • In Machine Learning, Concept Learning can be formulated as
    • Problem of searching through a predefined space of potential hypotheses (H) (Hypothesis Space) for the Hypothesis (h) that best fits the Training Data (D)
  • Representation of Hypothesis Space (H)
  • Hypothesis Space (H)
    • Hypothesis Space (H) is a predefined space of Hypothesis (h)
  • Question
    • How can we represent a Hypothesis Space (H) for a Learner?
  • Answer
    • There are many possible representations of Hypothesis Space (H)
  • Representation of Hypothesis Space (H) Cont...
  • A simple representation of Hypothesis Space (H) is
    • General-to-Specific Ordering of Hypotheses
  • Hypothesis Space (H) Representation vs Hypothesis (h) Representation
  • Hypothesis (h) Representation
    • Conjunction of Constraints on Input Attributes
  • Hypothesis Space (H) Representation 
    • General-to-Specific Ordering of Hypotheses
  • Note
    • Representation of Hypothesis (h) is entirely different from the Hypothesis Space (H)
  • General-to-Specific Ordering of Hypotheses
  • Consider the Gender Identification Problem discussed in the previous Section 
    • Hypothesis (h) Representation
  • Most Specific Hypothesis (h)
    • h = <∅, ∅>
    • The Output of the Most Specific Hypothesis (h) will always be 0 / False / No
      • i.e. it will classify all instances as Negative
  • Most General Hypothesis (h)
    • h = <?, ?>
    • The Output of the Most General Hypothesis (h) will always be 1 / True / Yes
      • i.e. it will classify all instances as Positive
  • General-to-Specific Ordering of Hypotheses, Cont...
  • If we represent Hypothesis Space (H) as General-to-Specific Ordering of Hypotheses then
    • Specific Boundary = Most Specific Hypothesis (h)
    • The remaining Hypotheses will lie between the two boundaries
    • General Boundary = Most General Hypothesis (h)

  • General and Specific Hypothesis
  • A hypothesis (hi) is more general than or equal to another hypothesis (hj) if 
    • hi puts fewer constraints than hj and therefore classifies 
      • more instances as Positive
  • Example - General and Specific Hypothesis
  • Consider the following two instances
    • x1 = <Short, Heavy>
    • x2 = <Tall, Heavy>
  • Consider the following three Hypotheses
    • h1 = <Short, Heavy>
    • h2 = <Tall, Heavy>
    • h3 = <?, Heavy>
  • Let’s see which Hypothesis is more general than other Hypotheses
  • Apply h1 on x1 and x2
    • x1 is classified as Positive
    • x2 is classified as Negative
  • Apply h2 on x1 and x2
    • x1 is classified as Negative
    • x2 is classified as Positive
  • Apply h3 on x1 and x2
    • x1 is classified as Positive
    • x2 is classified as Positive
  • Conclusion
    • h3 is more general than h1 and h2 because
      • it puts fewer constraints compared to h1 and h2, and
      • it classified more instances as Positive compared to h1 and h2
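The generality check above can be made concrete in code. A minimal sketch; the helper names `matches` and `more_general_or_equal` are my own, not from the chapter:

```python
def matches(h, x):
    """True if hypothesis h covers instance x:
    'Ø' accepts nothing, '?' accepts anything, otherwise values must be equal."""
    return all(c != "Ø" and (c == "?" or c == v) for c, v in zip(h, x))

def more_general_or_equal(hi, hj, instances):
    """hi >= hj if hi covers every instance (from the given set) that hj covers."""
    return all(matches(hi, x) for x in instances if matches(hj, x))

# Instances and hypotheses from the example above
x1, x2 = ("Short", "Heavy"), ("Tall", "Heavy")
h1, h2, h3 = ("Short", "Heavy"), ("Tall", "Heavy"), ("?", "Heavy")

print(matches(h1, x1), matches(h1, x2))  # True False
print(matches(h2, x1), matches(h2, x2))  # False True
print(matches(h3, x1), matches(h3, x2))  # True True

# h3 is more general than h1 over these instances, but not vice versa
print(more_general_or_equal(h3, h1, [x1, x2]))  # True
print(more_general_or_equal(h1, h3, [x1, x2]))  # False
```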

TODO and Your Turn

Todo Tasks

TODO Task 4

  • Task 1
    • Consider the Titanic Dataset with the following Attributes
      • Gender: Male, Female
      • Ticket Class: Upper, Middle, Lower
      • Survival: No, Yes
    • Goal of the Learner is to predict
      • Whether a Passenger Survived or Not?
  • Note
    • Your answer should be
      • Well Justified
  • Questions
    • Write down the Input and Output for the Machine Learning Problem
    • Considering Conjunction (AND) of Constraints on Attributes
      • How will you represent your Hypothesis Space (H)?

 

  • Task 2
    • Give 2 examples in which one hypothesis is more general than or equal to another hypothesis
Your Turn Tasks

Your Turn Task 4

  • Task 1
    • Select a Machine Learning Problem (similar to the Titanic Dataset in TODO Task)
  • Questions
    • Write down the Input and Output for the selected Machine Learning Problem
    • Considering Conjunction (AND) of Constraints on Attributes
      • How will you represent your Hypothesis Space (H)?
  • Task 2
    • Give 2 examples in which one hypothesis is more general than or equal to another hypothesis

Chapter Summary

  • Chapter Summary
  • Most of the Machine Learning Algorithms use Inductive Learning Approach to learn 
  • Concept Learning is a major subclass of Inductive Learning 
  • In Psychology, Concept Learning refers to the human ability 
    • to learn categories for Real-world Objects and 
    • to recognize new instances of those categories
  • In Machine Learning, a Real-world Object is a material thing that can be seen and touched 
  • Two Main Properties of a Real-world Object are
    1. Characteristics / Attributes / Features
    2. Behavior
  • Characteristics and Behavior of a Real-world Object help us to
    • learn to categorize Real-world Objects (or distinguish one Real-world Object from another)
    • Similar Real-world Objects will have similar Characteristics and Behavior and vice versa
  • Problem
    • Machine is dumb and cannot understand Real-world Objects (as they are)
  • Solution
    •  Change representation of Real-world Objects in a Format, which Machine Learning Algorithms can understand 
  • In Machine Learning, Concept Learning refers to a machine’s ability 
    • to learn categories for Real-world Objects and 
    • to recognize new instances of those categories
  • Concept Learning – Formal Definition
    • Given 
      • Set of Training Examples (D) and 
      • Set of Functions / Hypothesis (H)
    • Infer 
      • A Boolean-valued Function using the Set of Training Examples 
  • Concept Learning is also known as Binary Classification
  • In Concept Learning
    • if Output is 1 / True / Yes, then it is called a Positive Example
    • if Output is 0 / False / No, then it is called a Negative Example
  • A Concept is a Function, which we want to learn 
  • Instance Space (X) is the set of all possible combinations of Input Attribute values 
  • In Machine Learning, Instance Space is same as 
    • Population in Statistics
  • Sample Data
    •  Subset of Instance Space (X)
  •  Size of Sample Data is always less than the Size of Instance Space 
  • In Machine Learning, Sample Data is same as
    • Sample in Statistics
  • Sample Data is split into
    • Training Data (D)
    • Testing Data
  • Formally, each d ∈ D is composed of 
    • di, c(di) 
      • di represents the Input 
      • c(di) represents the Output
  • Concept Space (C) is the set of all possible combinations of Output values 
  • Symbols
  • Formula to Compute |C| 
    • |C| = 2^|X| 
      • where |X| represents the Size of Instance Space 
  • The computational cost and time required to search a huge Concept Space will be very high 
  • The reduced Concept Space (C) is called the Hypothesis Space (H)
  • Most Common Approach to reduce the size of Concept Space is to 
    • Introduce search bias 
  •  Most Common Bias
    • uses the AND (Conjunction) relationship between Attributes
  • Searching a Concept Space
    • Strengths 
      • We will definitely find the Target Function / Concept
    • Weaknesses
      •  Computational cost is high 
      •  Amount of Time required for searching is high 
  • Searching a Hypothesis Space
    • Strengths 
      •  Computational cost is low 
      •  Amount of Time required for searching is small 
    • Weaknesses 
      • We may / may not find the Target Function / Concept and 
        • this may / may not be known to us
  •  Very often , a Training Example (d) is represented as
    • Attribute-Value Pair
  •  Values of Attributes can be mainly categorized as
    • Categorical
    • Numeric
  • Examples
    • Following Machine Learning Algorithms represent a Training Example (d) in the form of Attribute-Value Pair
      • FIND-S Algorithm
      • List Then Eliminate Algorithm
      • Candidate Elimination Algorithm
      • Decision Tree Learning Algorithms
        • For example, ID3, J48, Random Forest
      • Artificial Neural Networks
        • For example, Multi-Layer Perceptron
      • Bayesian Learning Algorithms
        • For example, Naïve Bayes
      • Instance-based Learning Algorithms
        • For example, k-NN
  • Representation of Hypothesis (h) may vary from Machine Learning Algorithm to Machine Learning Algorithm
  • Examples
    •  Representation of Hypothesis (h) for FIND-S, List Then Eliminate and Candidate Elimination Algorithms is
      • Conjunction (AND) of Constraints on Attributes
    •  Representation of Hypothesis (h) for ID3 Algorithms is
      • Disjunction (OR) of Conjunction (AND) of Constraints on Attributes
    •  Representation of Hypothesis (h) for Regular Neural Network Algorithms is
      • Combination of Weights between Units
  • Hypothesis Space (H)
    • Hypothesis Space (H) is a predefined space of Hypothesis (h)
  • There are many possible Representations of Hypothesis Space (H) 
  • A simple Representation of Hypothesis Space (H) is
    • General-to-Specific Ordering of Hypotheses

In Next Chapter

  • In Next Chapter
  • In Sha Allah, in next Chapter, I will present
  • FIND-S Algorithm