Dimensionality Reduction and Feature Selection Methods for Script Identification on Document Images

Bruce Poon, Rahman Saami, M. Ashraful Amin, Hong Yan

doi:10.17762/itii.v2i1.6

Published: Mar 31, 2014

Abstract

The goal of this research is to explore the effects of dimensionality reduction and feature selection on the problem of script identification from images of printed documents. The k-adjacent segment feature is well suited to this task because of its ability to capture visual patterns. We used principal component analysis to reduce the feature matrix to a more manageable size that can be trained easily, and experimented with varying combinations of dimensions of the super feature set. A modular neural network approach was used to classify seven languages: Arabic, Chinese, English, Japanese, Tamil, Thai and Korean.
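
As a rough illustration of the dimensionality reduction step mentioned in the abstract, the following is a minimal sketch of principal component analysis using NumPy. It is not the authors' implementation: the function name pca_reduce, the synthetic 200 x 50 feature matrix, and the choice of 10 retained components are all assumptions made for the example, and the random data merely stands in for real k-adjacent segment features.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project the rows of `features` onto the top principal components.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Center each feature column so the components describe variance.
    mean = features.mean(axis=0)
    centered = features - mean

    # SVD of the centered data: rows of vt are the principal directions,
    # ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]

    # Coordinates of every sample in the reduced space.
    reduced = centered @ components.T

    # Fraction of total variance carried by each retained component.
    variance = singular_values ** 2
    explained = variance[:n_components] / variance.sum()
    return reduced, components, explained


# Synthetic placeholder data: 200 samples with 50 features each (assumed sizes).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 50))

reduced, components, explained = pca_reduce(features, n_components=10)
print(reduced.shape)     # shape of the reduced feature matrix
print(explained.sum())   # share of variance kept by the 10 components
```

The reduced matrix could then be passed to a classifier in place of the full feature set. How many components to keep is a tuning decision, typically guided by the retained variance or by validation accuracy.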

Issue

Vol. 2 No. 1 (2014)

Section

Articles
