ensemble learning: combine multiple classifier models and aggregate their predictions to make better decisions and reduce prediction error
1. Independent ensemble learning: ML models operate in parallel -> fast
1) voting: train multiple types of models on the training data and let them vote on the classification
Ex) plurality voting, majority voting
- we can also prove mathematically that a majority vote over n independent classifiers, each with the same error rate e < 0.5, achieves an error rate lower than or equal to that of an individual classifier
- classification problem -> use VotingClassifier from sklearn.ensemble
- regression problem -> use VotingRegressor from sklearn.ensemble
-> key parameters of those classes: 'weights' for each model, and voting='hard'/'soft' for plurality voting vs. averaging predicted probabilities
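A minimal sketch of hard voting with sklearn's VotingClassifier; the dataset (Iris) and the three base models are illustrative choices, not from the notes above:

```python
# Hard voting over three different model types, as described above.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# voting='hard' -> plurality vote on predicted labels;
# voting='soft' would average predicted class probabilities instead.
clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('nb', GaussianNB()),
                ('knn', KNeighborsClassifier())],
    voting='hard',
    weights=[1, 1, 1],  # optional per-model vote weights
)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```

VotingRegressor works the same way for regression, averaging the base models' numeric predictions instead of voting on labels.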
2) bagging (bootstrap aggregating): train one type of model multiple times on bootstrap samples of the same data and vote for the final decision
step 1. draw n bootstrap samples (sampling with replacement, independent trials)
step 2. train n models using n bootstrap samples
step 3. vote n predictions to make final decision
Ex) Random forest model: bagging of decision trees where each split considers only a random subset of features; n_estimators -> how many times steps 1-2 are repeated, max_depth -> maximum depth of each tree
- classification problem -> use RandomForestClassifier or BaggingClassifier from sklearn.ensemble
-> key parameters of BaggingClassifier: the base estimator (model type) and the number of estimators
2. Dependent ensemble learning: ML models operate sequentially, each depending on the previous ones -> slow, costly
1) boosting: aggregate a number of weak learners to make a strong learner -> dynamically control weight values of weak learners to better focus on tricky datapoints
Ex) adaboost algorithm, gradient boost algorithm
- AdaBoost: update a weight for each data point so that later learners focus on points misclassified earlier and the ensemble predicts all data points well
i) discrete target (yi in {-1, +1}): set initial weight wi = 1/n for each data point -> train fj(x) -> error rate for fj: ej = Sigma[wi * I(yi != fj(xi))] -> model weight aj = 1/2 log((1-ej) / ej): if the error rate is high, the model weight gets lower -> data-point weight update: wi = wi * exp[-aj * yi * fj(xi)]: if fj misclassified xi, wi gets bigger -> later models should classify xi well! -> normalize wi by the total sum of wi -> repeat these steps for the number of weak learners -> final prediction: compute Sigma[aj * fj(x)] over all models and take the sign
ii) continuous target (probability of belonging to the positive class): set initial weight wi = 1/n for each data point -> train pj(x) in [0,1] -> set fj(x) = 1/2 log(pj(x) / (1 - pj(x))) -> wi = wi * exp[-yi * fj(xi)] -> normalize wi -> repeat these steps for the number of weak learners -> final prediction: compute Sigma[fj(x)] and take the sign
=> the continuous-target variant has no separate model weight aj; the weighting is absorbed directly into fj(x)
=> import AdaBoostClassifier, AdaBoostRegressor from sklearn.ensemble
=> one can also implement AdaBoost by hand with various types of weak learners, e.g. SVM, logistic regression, naive Bayes, neural networks
- gradient boosting
* naive training process: x1 -> F(x1) -> retrain F to minimize the residual y1 - F(x1)
But gradient boosting instead fits a separate 'residual function' f(x): f(xi) = yi - F(xi)
-> train f on the pairs (xi, yi - F(xi))
=> L(yi, F(xi)) = 1/2 (yi - F(xi))^2
=> J = Sigma[L(yi, F(xi))]
-> partial derivative of J w.r.t. F(xi) = F(xi) - yi, so the residual yi - F(xi) = -(partial derivative of J w.r.t. F(xi))
-> Fm(x) = Fm-1(x) + f(x) = Fm-1(x) + (y - Fm-1(x)) = Fm-1(x) - partial derivative of J w.r.t. F(x) -> gradient descent in function space
(1) set initial F_0(x)
(2) calculate residuals: e_m-1 = y - F_m-1(x)
(3) fit f_m(x) to e_m-1
(4) F_m(x) = F_m-1(x) + eta * f_m(x), where eta is the learning rate
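The four steps above can be hand-rolled for regression with squared-error loss; the synthetic sine data, the choice of shallow sklearn trees as the weak learners f_m, and eta = 0.1 are all illustrative assumptions:

```python
# From-scratch gradient boosting for regression (squared-error loss).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

eta, M = 0.1, 100
F = np.full_like(y, y.mean())          # (1) initial model F_0 = mean of y
learners = []
for m in range(M):
    resid = y - F                      # (2) residuals e_m-1
    f_m = DecisionTreeRegressor(max_depth=2).fit(X, resid)  # (3) fit f_m
    F = F + eta * f_m.predict(X)       # (4) F_m = F_m-1 + eta * f_m
    learners.append(f_m)

print(np.mean((y - F) ** 2))  # training MSE shrinks as M grows
```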
- GradientBoostingClassifier from sklearn.ensemble, XGBClassifier from the XGBoost library, or LGBMClassifier from the LightGBM library
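A minimal sketch with sklearn's GradientBoostingClassifier; the breast-cancer dataset and hyperparameter values are illustrative:

```python
# learning_rate plays the role of eta in step (4); n_estimators is M.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3, random_state=0)
gbc.fit(X_train, y_train)
print(gbc.score(X_test, y_test))
```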
2) stacking: stack multiple learners -> base learners + a meta learner
Ex) k-fold CV -> split the training data into k folds -> for each base learner, k models are trained: each model is trained on (k-1) folds and outputs predictions on the held-out fold -> aggregate these out-of-fold predictions of the base learners
-> use them as input data for the meta learner
* similar to stacking multiple NN layers in DNN model
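A minimal sketch with sklearn's StackingClassifier, which builds the meta learner's training features from out-of-fold base predictions exactly as described above; the Iris dataset and the choice of base/meta models are illustrative:

```python
# Stacking: base learners' out-of-fold predictions feed the meta learner.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[('rf', RandomForestClassifier(random_state=0)),
                ('svc', SVC(random_state=0))],
    final_estimator=LogisticRegression(),  # meta learner
    cv=5,  # k-fold CV generates the out-of-fold base predictions
)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))
```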