Dimensionality Reduction and Feature Selection Methods for Script Identification on Document Images
Abstract
The goal of this research is to explore the effects of dimensionality reduction and feature selection on the problem of script identification from images of printed documents. The k-adjacent segment feature is well suited to this task because of its ability to capture visual patterns. We used principal component analysis to reduce the feature matrix to a more manageable size on which a classifier can be trained easily, and we experimented with varying combinations of dimensions of the super feature set. A modular neural network approach was used to classify seven languages: Arabic, Chinese, English, Japanese, Tamil, Thai and Korean.
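The reduction step described above can be illustrated with a minimal sketch of principal component analysis computed through a singular value decomposition. It is not the authors' implementation: the function name `pca_reduce`, the random placeholder feature matrix, and the choice of 10 retained components are all assumptions made for illustration, since the abstract does not state the feature dimensions or the number of components kept.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project the rows of `features` onto the top principal components.

    `features` is an (n_samples, n_dims) matrix. Returns the reduced
    matrix, the principal directions, and the fraction of variance
    explained by each kept component.
    """
    # Center each feature dimension at zero mean.
    centered = features - features.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal
    # directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    reduced = centered @ components.T
    variance_ratio = singular_values**2 / np.sum(singular_values**2)
    return reduced, components, variance_ratio[:n_components]


# Placeholder data standing in for a feature matrix (hypothetical sizes):
# 200 document images, 50 feature dimensions each.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 50))

reduced, components, variance_ratio = pca_reduce(features, n_components=10)
print(reduced.shape)
```

The reduced matrix would then serve as the classifier input. Varying `n_components` is one simple way to run the kind of experiment the abstract mentions, comparing results across different numbers of retained dimensions.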