PRINCIPAL_COMPONENT_ANALYSIS
TYehan edited this page Feb 27, 2025 · 1 revision
This document explains the primary machine learning concepts demonstrated in the Principal Component Analysis (PCA) practical notebook. The notebook applies PCA to the Olivetti faces dataset to perform dimensionality reduction and visualize the principal components (eigenfaces).
- **Dataset Acquisition:** The Olivetti faces dataset is loaded using `fetch_olivetti_faces` from `sklearn.datasets`. The data contains 64x64-pixel images of faces, which are flattened into feature vectors.
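The loading step can be sketched as follows (the dataset is downloaded on first use, so this requires network access the first time it runs):

```python
from sklearn.datasets import fetch_olivetti_faces

# Download (on first use) and load the Olivetti faces dataset.
faces = fetch_olivetti_faces()

# faces.images holds the 64x64 grayscale images;
# faces.data holds the same images flattened into 4096-element vectors.
print(faces.images.shape)  # (400, 64, 64)
print(faces.data.shape)    # (400, 4096)
```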
- **Basic Visualization:** A grid of sample images is displayed to provide an overview of the dataset.
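A minimal sketch of such a grid, using random arrays as stand-ins for the face images so the snippet is self-contained:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Stand-in data: in the notebook these would be faces.images from
# fetch_olivetti_faces; random arrays keep this sketch offline.
rng = np.random.default_rng(0)
images = rng.random((12, 64, 64))

# Display a 3x4 grid of sample images.
fig, axes = plt.subplots(3, 4, figsize=(8, 6))
for ax, img in zip(axes.ravel(), images):
    ax.imshow(img, cmap="gray")
    ax.axis("off")
fig.savefig("sample_grid.png")
```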
- **PCA Initialization:** The PCA model is initialized with `n_components=150` and `whiten=True` to standardize the influence of each component.
- **Transformation:** The dataset is projected onto its principal components using the PCA model. This reduces dimensionality while retaining most of the variance in the data.
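The initialization and transformation steps together can be sketched as below; a random matrix with the dataset's shape (400 samples x 4096 features) stands in for the flattened face vectors:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the flattened face data (the real dataset is 400 x 4096).
rng = np.random.default_rng(0)
X = rng.random((400, 4096)).astype(np.float32)

# whiten=True rescales each projected component to unit variance,
# standardizing the influence of the components.
pca = PCA(n_components=150, whiten=True)
X_pca = pca.fit_transform(X)

# Each sample is now a 150-dimensional vector instead of 4096-dimensional.
print(X_pca.shape)  # (400, 150)
```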
- **Variance Reporting:** The notebook prints the percentage of total variance explained by each of the first 12 principal components.
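With a fitted model, this report comes from the `explained_variance_ratio_` attribute; a sketch on stand-in data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((400, 4096)).astype(np.float32)  # stand-in for face vectors

pca = PCA(n_components=150, whiten=True)
pca.fit(X)

# explained_variance_ratio_ gives the fraction of total variance captured
# by each component, in decreasing order; report the first 12 as percentages.
for i, ratio in enumerate(pca.explained_variance_ratio_[:12], start=1):
    print(f"PC{i}: {ratio * 100:.2f}%")
```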
- **Cumulative Variance Visualization:** A cumulative explained-variance plot illustrates how variance accumulates as more principal components are added. Dashed lines mark the variance explained by the first 12 components.
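A plot along these lines can be built from the cumulative sum of the variance ratios (again using stand-in data and a headless backend):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((400, 4096)).astype(np.float32)  # stand-in for face vectors

pca = PCA(n_components=150, whiten=True)
pca.fit(X)

# Cumulative share of total variance as components are added.
cumulative = np.cumsum(pca.explained_variance_ratio_)

fig, ax = plt.subplots()
ax.plot(np.arange(1, 151), cumulative)
# Dashed guide lines marking the variance covered by the first 12 components.
ax.axvline(12, linestyle="--", color="gray")
ax.axhline(cumulative[11], linestyle="--", color="gray")
ax.set_xlabel("Number of components")
ax.set_ylabel("Cumulative explained variance")
fig.savefig("cumulative_variance.png")
```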
- **Eigenfaces Display:** The first 12 principal components (eigenfaces) are reshaped back into the original 64x64 image dimensions and visualized in a grid to show the key patterns captured from the original dataset.
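The reshape-and-display step can be sketched as follows; each row of `components_` is a 4096-element vector that folds back into a 64x64 image (stand-in data as before):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((400, 4096)).astype(np.float32)  # stand-in for face vectors

pca = PCA(n_components=150, whiten=True)
pca.fit(X)

# Reshape the first 12 component vectors back to 64x64 "eigenfaces".
eigenfaces = pca.components_[:12].reshape(12, 64, 64)

# Show them in a 3x4 grid, one panel per principal component.
fig, axes = plt.subplots(3, 4, figsize=(8, 6))
for i, ax in enumerate(axes.ravel()):
    ax.imshow(eigenfaces[i], cmap="gray")
    ax.set_title(f"PC {i + 1}", fontsize=8)
    ax.axis("off")
fig.savefig("eigenfaces.png")
```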
This practical notebook is a comprehensive example of applying PCA for dimensionality reduction: it analyzes the variance captured by the principal components and visualizes the resulting eigenfaces, demonstrating how a fundamental machine learning technique can be used for feature extraction and data visualization.