1. Curse of dimensionality
: as the dimension of the feature space grows, the number of samples required to cover that space grows rapidly -> with a fixed amount of data, the model can easily be overfitted
Use dimension reduction methods => keep only the important directions, i.e., the vectors along which the datapoints show high variance
-> eliminate unimportant features (noise)
Ex) PCA, LDA
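A minimal sketch of the curse in action (toy data, all names and sizes are my own assumptions): with a fixed number of samples, padding two informative features with more and more irrelevant noise features degrades a kNN classifier's test accuracy.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=(n, 2))                 # 2 informative features
y = (signal[:, 0] + signal[:, 1] > 0).astype(int)

accs = {}
for n_noise in [0, 10, 100, 500]:
    noise = rng.normal(size=(n, n_noise))        # irrelevant noise features
    X = np.hstack([signal, noise])
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    accs[n_noise] = KNeighborsClassifier().fit(Xtr, ytr).score(Xte, yte)
    print(f"{2 + n_noise:3d} features -> test accuracy {accs[n_noise]:.2f}")
```

Distance-based methods suffer especially badly here, since in high dimensions the noise coordinates dominate the Euclidean distance; this is exactly the situation dimension reduction is meant to fix.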
2. Principal components analysis
Ex) input data X [100,4] -> center each feature column so that its mean is 0 (standardize)
-> form the covariance matrix Sigma [4,4]
-> compute Sigma's 4 eigenvalues and 4 eigenvectors (each eigenvalue of Sigma = size of variance along its eigenvector)
-> eigenvectors e1,e2,e3,e4 (normalized, mutually orthogonal)
-> select the eigenvectors with large eigenvalues, which stand for the directions along which X is highly variable (important directions)
(the ei are mutually orthogonal, so the projected coordinates are uncorrelated; dropping the ei with small eigenvalues discards only directions with little variance.)
if) y1 = X e1 (X is projected onto axis e1 -> the projection y1 has variance lambda1)
(1) var(y1) = e1^T Sigma e1 = e1^T (lambda1 e1) = lambda1
(2) cov(y1,y2) = e1^T Sigma e2 = lambda2 e1^T e2 = 0 (since e1 and e2 are orthogonal)
*explained variance ratio of component i = lambda_i / sum of all lambda values
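The derivation above can be checked numerically. A minimal NumPy sketch (toy random data, shapes matching the [100,4] example): center X, form Sigma, eigendecompose, project, and verify that var(y_i) = lambda_i and cov(y1, y2) = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # toy [100,4] data

Xc = X - X.mean(axis=0)                  # center each feature column (mean 0)
Sigma = np.cov(Xc, rowvar=False)         # [4,4] covariance matrix
lam, E = np.linalg.eigh(Sigma)           # eigenvalues (ascending), eigenvectors in columns
lam, E = lam[::-1], E[:, ::-1]           # sort descending by eigenvalue

Y = Xc @ E                               # project X onto the eigenvector axes
print(np.allclose(Y.var(axis=0, ddof=1), lam))        # var(y_i) = lambda_i -> True
print(np.isclose(np.cov(Y[:, 0], Y[:, 1])[0, 1], 0))  # cov(y1, y2) = 0 -> True
print(lam / lam.sum())                   # explained variance ratios
```

Note `np.linalg.eigh` is used rather than `eig` because Sigma is symmetric; it returns real eigenvalues in ascending order, hence the reversal.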
PCA
=> turns n features into n new orthonormal vectors whose eigenvalues stand for the variance of X projected onto them
=> finds the best directions (vectors) in feature space, the ones along which the datapoints show high variance
=> uses fewer features to predict the target (choose a moderate n_components to eliminate noise yet still predict the target well)
3. Kernel PCA
transforms the input data X into a higher dimension (implicitly, via a kernel) before going through the PCA process
(1) X -> X' = psi(X) : mapping the datapoints of X into a higher-dimensional feature space (changing the feature space)
(2) operate the PCA process on X'
* kernel options (as in scikit-learn's KernelPCA): linear, poly, rbf, sigmoid, cosine, precomputed
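A short sketch of steps (1)-(2) with scikit-learn (the concentric-circles dataset is my choice of illustration): an RBF kernel implicitly maps X into a higher-dimensional space, so Kernel PCA can unfold structure that ordinary linear PCA cannot.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

Z_lin = PCA(n_components=2).fit_transform(X)
Z_rbf = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

# Z_lin is essentially a rotation, so the circles stay nested;
# along Z_rbf's first component the two rings pull apart.
print(Z_lin.shape, Z_rbf.shape)
```

The `gamma=10` value is an assumption for this toy data; in practice the kernel and its parameters are tuned like any other hyperparameter.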
new insights
(1) could I make my own kernel for medical features, to build latent variables from basic demographic and lab variables?
(2) a revised form of the PCA process that prioritizes medical features?
(3) embedding the PCA process inside the ML model itself?