1. Curse of dimensionality
: as the dimension of the feature space grows, the number of samples required to cover that space grows rapidly -> with a fixed amount of data, the model can easily be overfitted
Use dimension reduction methods => keep only the important directions, i.e., the vectors along which the datapoints show high variance
-> eliminate unimportant features (noise)
Ex) PCA, LDA
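A minimal sketch of the curse in action (toy data, all names and sizes are my own assumptions): with a fixed number of samples, padding two informative features with more and more irrelevant noise features degrades a kNN classifier's test accuracy.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=(n, 2))                 # 2 informative features
y = (signal[:, 0] + signal[:, 1] > 0).astype(int)

accs = {}
for n_noise in [0, 10, 100, 500]:
    noise = rng.normal(size=(n, n_noise))        # irrelevant noise features
    X = np.hstack([signal, noise])
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    accs[n_noise] = KNeighborsClassifier().fit(Xtr, ytr).score(Xte, yte)
    print(f"{2 + n_noise:3d} features -> test accuracy {accs[n_noise]:.2f}")
```

Distance-based methods suffer especially badly here, since in high dimensions the noise coordinates dominate the Euclidean distance; this is exactly the situation dimension reduction is meant to fix.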
2. Principal components analysis
Ex) input data X [100,4] -> center each feature column so that its mean is 0 (standardize)
-> form the covariance matrix Sigma [4,4]
-> compute Sigma's 4 eigenvalues and 4 eigenvectors (each eigenvalue of Sigma = size of variance along its eigenvector)
-> eigenvectors e1,e2,e3,e4 (normalized, mutually orthogonal)
-> select the eigenvectors with large eigenvalues, which stand for the directions along which X is highly variable (important directions)
(the ei are mutually orthogonal, so the projected coordinates are uncorrelated; dropping the ei with small eigenvalues discards only directions with little variance.)
if) y1 = X e1 (X is projected onto axis e1 -> the projection y1 has variance lambda1)
(1) var(y1) = e1^T Sigma e1 = e1^T (lambda1 e1) = lambda1
(2) cov(y1,y2) = e1^T Sigma e2 = lambda2 e1^T e2 = 0 (since e1 and e2 are orthogonal)
*explained variance ratio of component i = lambda_i / sum of all lambda values
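The derivation above can be checked numerically. A minimal NumPy sketch (toy random data, shapes matching the [100,4] example): center X, form Sigma, eigendecompose, project, and verify that var(y_i) = lambda_i and cov(y1, y2) = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # toy [100,4] data

Xc = X - X.mean(axis=0)                  # center each feature column (mean 0)
Sigma = np.cov(Xc, rowvar=False)         # [4,4] covariance matrix
lam, E = np.linalg.eigh(Sigma)           # eigenvalues (ascending), eigenvectors in columns
lam, E = lam[::-1], E[:, ::-1]           # sort descending by eigenvalue

Y = Xc @ E                               # project X onto the eigenvector axes
print(np.allclose(Y.var(axis=0, ddof=1), lam))        # var(y_i) = lambda_i -> True
print(np.isclose(np.cov(Y[:, 0], Y[:, 1])[0, 1], 0))  # cov(y1, y2) = 0 -> True
print(lam / lam.sum())                   # explained variance ratios
```

Note `np.linalg.eigh` is used rather than `eig` because Sigma is symmetric; it returns real eigenvalues in ascending order, hence the reversal.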
PCA
=> turns n features into n new orthonormal vectors whose eigenvalues stand for the variance of X projected onto them
=> finds the best directions (vectors) in feature space, the ones along which the datapoints show high variance
=> uses fewer features to predict the target (choose a moderate n_components to eliminate noise yet still predict the target well)
3. Kernel PCA
transforms the input data X into a higher dimension (implicitly, via a kernel) before going through the PCA process
(1) X -> X' = psi(X) : mapping the datapoints of X into a higher-dimensional feature space (changing the feature space)
(2) operate the PCA process on X'
* kernel options (as in scikit-learn's KernelPCA): linear, poly, rbf, sigmoid, cosine, precomputed
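A short sketch of steps (1)-(2) with scikit-learn (the concentric-circles dataset is my choice of illustration): an RBF kernel implicitly maps X into a higher-dimensional space, so Kernel PCA can unfold structure that ordinary linear PCA cannot.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

Z_lin = PCA(n_components=2).fit_transform(X)
Z_rbf = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

# Z_lin is essentially a rotation, so the circles stay nested;
# along Z_rbf's first component the two rings pull apart.
print(Z_lin.shape, Z_rbf.shape)
```

The `gamma=10` value is an assumption for this toy data; in practice the kernel and its parameters are tuned like any other hyperparameter.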
new insights
(1) could I make my own kernel for medical features, to build latent variables from basic demographic and lab variables?
(2) a revised form of the PCA process that prioritizes medical features?
(3) embedding the PCA process inside the ML model itself?