Principal Component Analysis for Feature Extraction
Keywords:
PCA, Dimensionality, Robust Technique, Image Processing
Abstract
Feature extraction in image processing transforms raw pixel data into a more meaningful representation that can be used for tasks such as image classification, object detection, and image retrieval. The goal is to extract the attributes (features) of an image that capture its essential information, reducing the dimensionality of the data while preserving its most significant aspects. Principal Component Analysis (PCA) is one of the most common techniques for dimensionality reduction and feature extraction, and it is used in various domains, including image processing, finance, and bioinformatics. This paper explores the fundamentals of PCA, its mathematical foundation, and its practical applications for feature extraction. We demonstrate how high-dimensional data can be converted into a lower-dimensional space using PCA while retaining significant information, which can enhance computational efficiency and improve model performance. Using PCA for feature extraction means transferring as much as possible of the variance (information) in the original high-dimensional data into a space with fewer dimensions. Images are inherently high-dimensional data, with each pixel representing a feature. For example, a 256x256 grayscale image has 65,536 features. Analyzing and processing such high-dimensional data can be computationally intensive and may lead to overfitting in machine learning models. PCA reduces this dimensionality by identifying the most significant components (principal components), those that account for most of the variation in the image data. An autism facial image dataset is used in this paper.
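The projection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `pca_fit_transform`, the synthetic 32x32 "images", and the choice of 5 components are assumptions made only for illustration, and the autism facial image dataset itself is not used here. The sketch centers the flattened images, takes a singular value decomposition, and keeps the directions of largest variance.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project the rows of X onto the top principal components (illustrative helper)."""
    mean = X.mean(axis=0)
    Xc = X - mean                                  # center each pixel/feature
    # Economy SVD of the centered data: rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Variance carried by each direction, and the share kept by the retained ones.
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    Z = Xc @ components.T                          # low-dimensional features
    return Z, components, mean, ratio

# Synthetic stand-in for an image set (assumption): 40 flattened 32x32 grayscale
# images generated from 5 hidden factors plus a little noise.
rng = np.random.default_rng(0)
n_images, side = 40, 32
latent = rng.normal(size=(n_images, 5))
basis = rng.normal(size=(5, side * side))
images = latent @ basis + 0.01 * rng.normal(size=(n_images, side * side))

features, components, mean, ratio = pca_fit_transform(images, n_components=5)
approx = features @ components + mean              # map features back to pixel space

print(features.shape)        # each image is now described by 5 numbers
print(ratio.sum())           # fraction of total variance retained
```

In this toy setting each image shrinks from 1,024 pixel values to 5 features. For real images the retained variance depends on the data, so the number of components is usually chosen by inspecting the explained-variance ratio.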
License
Copyright (c) 2024 Noor Hasan Fadhil, Akmam Majed Mosa, Ihsan Sahib Abdulsaheed
This work is licensed under a Creative Commons Attribution 4.0 International License.