ensemble learning: combine multiple classifier models and aggregate their predictions to make better decisions and reduce prediction error
1. Independent ensemble learning: ML models operate in parallel -> fast
1) voting: train multiple types of models on the training data and let them vote on the classification
Ex) plurality voting, majority voting
- we can also prove mathematically that a majority vote over n independent classifiers, each with the same error rate e < 0.5, achieves an error rate lower than or equal to that of an individual classifier
- classification problem -> use VotingClassifier from sklearn.ensemble
- regression problem -> use VotingRegressor from sklearn.ensemble
-> key parameters of those classes: 'weights' for each model, and voting='hard'/'soft' for plurality voting vs. averaging predicted probabilities
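A minimal sketch of hard voting with sklearn's VotingClassifier; the dataset (Iris) and the three base models are illustrative choices, not from the notes above:

```python
# Hard voting over three different model types, as described above.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# voting='hard' -> plurality vote on predicted labels;
# voting='soft' would average predicted class probabilities instead.
clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('nb', GaussianNB()),
                ('knn', KNeighborsClassifier())],
    voting='hard',
    weights=[1, 1, 1],  # optional per-model vote weights
)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```

VotingRegressor works the same way for regression, averaging the base models' numeric predictions instead of voting on labels.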
2) bagging (bootstrap aggregating): train one type of model multiple times on bootstrap samples of the same data and vote for the final decision
step 1. draw n bootstrap samples (sampling with replacement, independent trials)
step 2. train n models using n bootstrap samples
step 3. vote n predictions to make final decision
Ex) Random forest model: bagging of decision trees where each split considers only a random subset of features; n_estimators -> how many times steps 1-2 are repeated, max_depth -> maximum depth of each tree
- classification problem -> use RandomForestClassifier or BaggingClassifier from sklearn.ensemble
-> key parameters of BaggingClassifier: the base estimator (model type) and the number of estimators
2. Dependent ensemble learning: ML models operate sequentially, each depending on the previous ones -> slow, costly
1) boosting: aggregate a number of weak learners to make a strong learner -> dynamically control weight values of weak learners to better focus on tricky datapoints
Ex) adaboost algorithm, gradient boost algorithm
- AdaBoost: update a weight for each data point so that later learners focus on points misclassified earlier and the ensemble predicts all data points well
i) discrete target (yi in {-1, +1}): set initial weight wi = 1/n for each data point -> train fj(x) -> error rate for fj: ej = Sigma[wi * I(yi != fj(xi))] -> model weight aj = 1/2 log((1-ej) / ej): if the error rate is high, the model weight gets lower -> data-point weight update: wi = wi * exp[-aj * yi * fj(xi)]: if fj misclassified xi, wi gets bigger -> later models should classify xi well! -> normalize wi by the total sum of wi -> repeat these steps for the number of weak learners -> final prediction: compute Sigma[aj * fj(x)] over all models and take the sign
ii) continuous target (probability of belonging to the positive class): set initial weight wi = 1/n for each data point -> train pj(x) in [0,1] -> set fj(x) = 1/2 log(pj(x) / (1 - pj(x))) -> wi = wi * exp[-yi * fj(xi)] -> normalize wi -> repeat these steps for the number of weak learners -> final prediction: compute Sigma[fj(x)] and take the sign
=> the continuous-target variant has no separate model weight aj; the weighting is absorbed directly into fj(x)
=> import AdaBoostClassifier, AdaBoostRegressor from sklearn.ensemble
=> one can also implement AdaBoost by hand with various types of weak learners, e.g. SVM, logistic regression, naive Bayes, neural networks
- gradient boosting
* naive training process: x1 -> F(x1) -> retrain F to minimize the residual y1 - F(x1)
But gradient boosting instead fits a separate 'residual function' f(x): f(xi) = yi - F(xi)
-> train f on the pairs (xi, yi - F(xi))
=> L(yi, F(xi)) = 1/2 (yi - F(xi))^2
=> J = Sigma[L(yi, F(xi))]
-> partial derivative of J w.r.t. F(xi) = F(xi) - yi, so the residual yi - F(xi) = -(partial derivative of J w.r.t. F(xi))
-> Fm(x) = Fm-1(x) + f(x) = Fm-1(x) + (y - Fm-1(x)) = Fm-1(x) - partial derivative of J w.r.t. F(x) -> gradient descent in function space
(1) set initial F_0(x)
(2) calculate residuals: e_m-1 = y - F_m-1(x)
(3) fit f_m(x) to e_m-1
(4) F_m(x) = F_m-1(x) + eta * f_m(x), where eta is the learning rate
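The four steps above can be hand-rolled for regression with squared-error loss; the synthetic sine data, the choice of shallow sklearn trees as the weak learners f_m, and eta = 0.1 are all illustrative assumptions:

```python
# From-scratch gradient boosting for regression (squared-error loss).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

eta, M = 0.1, 100
F = np.full_like(y, y.mean())          # (1) initial model F_0 = mean of y
learners = []
for m in range(M):
    resid = y - F                      # (2) residuals e_m-1
    f_m = DecisionTreeRegressor(max_depth=2).fit(X, resid)  # (3) fit f_m
    F = F + eta * f_m.predict(X)       # (4) F_m = F_m-1 + eta * f_m
    learners.append(f_m)

print(np.mean((y - F) ** 2))  # training MSE shrinks as M grows
```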
- GradientBoostingClassifier from sklearn.ensemble, XGBClassifier from the XGBoost library, or LGBMClassifier from the LightGBM library
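A minimal sketch with sklearn's GradientBoostingClassifier; the breast-cancer dataset and hyperparameter values are illustrative:

```python
# learning_rate plays the role of eta in step (4); n_estimators is M.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3, random_state=0)
gbc.fit(X_train, y_train)
print(gbc.score(X_test, y_test))
```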
2) stacking: stack multiple learners -> base learners + a meta learner
Ex) k-fold CV -> split the training data into k folds -> for each base learner, k models are trained: each model is trained on (k-1) folds and outputs predictions on the held-out fold -> aggregate these out-of-fold predictions of the base learners
-> use them as input data for the meta learner
* similar to stacking multiple NN layers in DNN model
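A minimal sketch with sklearn's StackingClassifier, which builds the meta learner's training features from out-of-fold base predictions exactly as described above; the Iris dataset and the choice of base/meta models are illustrative:

```python
# Stacking: base learners' out-of-fold predictions feed the meta learner.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[('rf', RandomForestClassifier(random_state=0)),
                ('svc', SVC(random_state=0))],
    final_estimator=LogisticRegression(),  # meta learner
    cv=5,  # k-fold CV generates the out-of-fold base predictions
)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))
```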