Machine Learning

Chapter 9. Decision Trees

jun1-cs 2025. 11. 1. 19:53

*parametric model vs. nonparametric model

parametric model: training fits a fixed set of model parameters to the training data.

nonparametric model: the input space is divided into local regions -> using multiple splitting criteria, a test point is assigned to its specific region.

 

1. What is a Decision Tree model?

A hierarchical model that splits the input space into multiple local regions, based on a hierarchy of decision criteria. The first node is called the root node, and branches come out of each node to form new regions.

A tree is composed of one root node + multiple branches forming the body of the tree + leaf nodes (the terminal ends of the tree)

=> leaves hold groups of data, while the root node and branches are the criteria that classify data into local regions.

(if-then criteria)

 

Two types of decision tree models -> classification trees and regression trees -> they differ in the impurity function they use

 

2. Classification Trees

From a node m, several branches can come out. The goal of a decision tree is to classify data into many regions while making the leaf nodes as pure as possible. -> to measure that purity, entropy, the Gini index (a second-order polynomial in p), and misclassification error are adopted.

1) Entropy: -plog2(p)-(1-p)log2(1-p)

2) Gini index: 2p(1-p)

3) Misclassification error: 1-max(p,1-p)

=> p = 1 or p = 0 -> maximum purity (least impurity) / p = 0.5 -> minimum purity (highest impurity)
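The three impurity functions above can be checked directly; a minimal sketch (function names are my own):

```python
import math

def entropy(p):
    # Binary entropy: -p*log2(p) - (1-p)*log2(1-p); define 0*log2(0) = 0.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gini(p):
    # Gini index for a two-class node: 2p(1-p).
    return 2 * p * (1 - p)

def misclassification_error(p):
    # 1 - max(p, 1-p): fraction of instances the majority vote gets wrong.
    return 1 - max(p, 1 - p)

# All three are 0 at a pure node (p = 0 or 1) and peak at p = 0.5.
for p in (0.0, 0.5, 1.0):
    print(p, entropy(p), gini(p), misclassification_error(p))
```

All three peak at p = 0.5, but entropy and Gini are differentiable, which is why they are usually preferred for choosing splits.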

Fig 1. Total impurity equation in a classification tree, based on entropy:

I'_m = -Σ_j (N_mj / N_m) Σ_i p^i_mj log2(p^i_mj), where p^i_mj = N^i_mj / N_mj

In the equation above, node m is split into n branches (indexed by j), and there are K classes (indexed by i); p^i_mj is the proportion of class i among the N_mj instances that take branch j.

i) discrete attributes -> a branch tests for an exact value -> such as "if x1 == 0"

ii) numeric attributes -> a branch considers all possible binary splits based on an inequality -> such as "if x1 > 10"

 

Branching Algorithm:

def splitAttribute(X):

   # return the attribute whose split gives the minimum impurity
   minEnt = MAX
   for each attribute xi (discrete or numeric):
      e = splitEntropy(X, xi)
      if e < minEnt: minEnt = e; bestAttribute = xi
   return bestAttribute

def generateTree(X):

   # recurse until a node is pure enough, then make it a leaf
   xi = splitAttribute(X)
   for each branch j of xi:
      Xj = instances of X that take branch j
      generateTree(Xj)

=> this is the basis of CART (Classification and Regression Trees), ID3, and C4.5
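The pseudocode above can be made concrete for discrete attributes; a runnable sketch (names like split_attribute and the toy data are my own, not from any library):

```python
import math
from collections import Counter

def entropy(labels):
    # Entropy of a label multiset: -sum_i p_i * log2(p_i).
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_entropy(X, y, attr):
    # Weighted impurity after splitting on a discrete attribute:
    # sum_j (N_j / N) * entropy(labels in branch j).
    n = len(y)
    branches = {}
    for row, label in zip(X, y):
        branches.setdefault(row[attr], []).append(label)
    return sum(len(lbls) / n * entropy(lbls) for lbls in branches.values())

def split_attribute(X, y):
    # Choose the attribute whose split minimizes total impurity.
    return min(range(len(X[0])), key=lambda a: split_entropy(X, y, a))

# Toy data: attribute 0 separates the classes perfectly, attribute 1 does not.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = ['A', 'A', 'B', 'B']
print(split_attribute(X, y))  # -> 0 (zero post-split entropy)
```

Numeric attributes would additionally require scanning all threshold positions between sorted values, which this sketch omits.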

 

-> we must choose the split that minimizes the impurity -> to make the tree more pure

 

3. Regression Trees

bm(x): 1 if instance x^t reaches node m, else 0

gm: average of the true labels r^t that reach node m

Em: mean squared error between gm and the true labels that reach node m, i.e. E_m = (1/N_m) Σ_t b_m(x^t) (r^t - g_m)^2

 

-> now the node takes branches!

bmj(x): 1 if instance x^t reaches node m and takes branch j, else 0

gmj: average of the true labels r^t that reach node m through branch j

E'm: total error after the split, the weighted mean of the branch errors over all branches j

Fig 2. Total impurity of node m after splitting, based on MSE:

E'_m = (1/N_m) Σ_j Σ_t b_mj(x^t) (r^t - g_mj)^2
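The split error E'_m above is just the within-branch squared error, averaged over all instances reaching node m; a minimal sketch (the function name and toy labels are my own):

```python
def branch_means_and_error(groups):
    # groups: one list of true labels r^t per branch j reaching node m.
    # g_mj = mean label in branch j; E'_m = (1/N_m) * sum_j sum_t (r^t - g_mj)^2.
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    err = sum((r - m) ** 2 for g, m in zip(groups, means) for r in g) / n
    return means, err

# One branch receives labels [1, 3], the other [10, 10]:
means, err = branch_means_and_error([[1, 3], [10, 10]])
print(means, err)  # -> [2.0, 10.0] 0.5
```

A regression tree keeps splitting while E'_m is acceptably below E_m, and each leaf predicts its branch mean g_mj.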

4. Pruning the trees

1) Prepruning: stop splitting before nodes have too few instances, to reduce variance and generalization error.

-> restricting the minimum size of nodes, such as 5% of the total instances
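In scikit-learn (assuming it is available), this kind of prepruning maps to the min_samples_leaf parameter, which accepts a fraction, so 0.05 forbids leaves smaller than 5% of the data; a sketch:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fully grown tree vs. a preprune: leaves must hold >= 5% of the instances.
full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(min_samples_leaf=0.05, random_state=0).fit(X, y)

# The restricted tree has fewer leaves: a simpler, lower-variance model.
print(full.get_n_leaves(), pruned.get_n_leaves())
```

Related prepruning knobs include min_samples_split and max_depth.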

 

2) Postpruning: divide the instances into two datasets: an initial (growing) set + a pruning set

-> validate the trained tree on the pruning set to find subtrees that do not perform noticeably better than a single leaf. -> if replacing a subtree with a leaf does not perform worse, the subtree is pruned.
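scikit-learn (assumed available) does not implement reduced-error subtree replacement directly; it exposes cost-complexity postpruning via ccp_alpha, but the held-out pruning set plays the same validating role, as this sketch shows:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_init, X_prune, y_init, y_prune = train_test_split(X, y, random_state=0)

# Each candidate alpha corresponds to pruning away one weakest subtree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_init, y_init)

# Keep the candidate tree that performs best on the pruning set.
best = max(
    (DecisionTreeClassifier(ccp_alpha=a, random_state=0).fit(X_init, y_init)
     for a in path.ccp_alphas),
    key=lambda t: t.score(X_prune, y_prune),
)
print(best.get_n_leaves(), best.score(X_prune, y_prune))
```

Breaking ties toward larger alpha would prefer the simplest tree among equally accurate candidates, matching the pruning rule above.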

 

pruning rules -> simplify the tree by removing unnecessary subtrees

 

5. Learning rules from data

One way is to extract if-then rules from a trained decision tree -> the other way is to learn the rules directly from data.

There are two ways:

1) Rule induction - DFS-based, generates one rule (path) at a time

sequential covering: the outer loop adds one rule at a time to the rule base -> the inner loop adds one condition at a time to the current rule

=> two pruning steps are used for better generalization
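The two nested loops of sequential covering can be sketched on a toy representation (each example is a (dict of attribute -> value, label) pair; all names here are my own, not a library API, and pruning is omitted):

```python
def purity(subset, target):
    # Fraction of examples in `subset` whose label equals `target`.
    if not subset:
        return 0.0
    return sum(lbl == target for _, lbl in subset) / len(subset)

def learn_rules(examples, target):
    rules = []                 # the rule base, grown one rule at a time
    remaining = list(examples)
    while any(lbl == target for _, lbl in remaining):      # outer loop
        rule, covered = {}, remaining
        while any(lbl != target for _, lbl in covered):    # inner loop
            # Greedily add the single condition whose covered subset is purest.
            attr, val = max(
                ((a, v) for attrs, _ in covered for a, v in attrs.items()
                 if a not in rule),
                key=lambda c: purity(
                    [e for e in covered if e[0].get(c[0]) == c[1]], target),
            )
            rule[attr] = val
            covered = [e for e in covered if e[0].get(attr) == val]
        rules.append(rule)
        # Remove the examples covered by the new rule, then repeat.
        remaining = [e for e in remaining
                     if not all(e[0].get(a) == v for a, v in rule.items())]
    return rules

examples = [
    ({'outlook': 'sunny', 'wind': 'weak'}, 'yes'),
    ({'outlook': 'sunny', 'wind': 'strong'}, 'yes'),
    ({'outlook': 'rain', 'wind': 'weak'}, 'no'),
    ({'outlook': 'rain', 'wind': 'strong'}, 'no'),
]
print(learn_rules(examples, 'yes'))  # -> [{'outlook': 'sunny'}]
```

This greedy sketch assumes consistent data; real rule inducers such as Ripper add the pruning steps mentioned above to avoid overfitting each rule.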

 

2) Tree induction - BFS-based, generates all paths simultaneously

 

 

 

   
