This paper presents a two-step method for offline handwritten Farsi word recognition. In the first step, to improve recognition accuracy and speed, an algorithm is proposed that initially eliminates lexicon entries unlikely to match the input image. For lexicon reduction, the words of the lexicon are clustered using the ISOCLUS and hierarchical clustering algorithms. Clustering is based on features that describe the overall shape of the word. In the second step, a new method is proposed for extracting a histogram of the gradient image, which captures the correspondence between different samples of the same handwritten word well. The gradient feature vectors of input words are compared with those of the candidate words using a K-nearest-neighbor classifier. Recognition results on handwritten words of the IRANSHAR dataset show that the lexicon reduction step and the new gradient feature extraction method increase recognition accuracy and speed by reducing classifier confusion.
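The two steps described above can be sketched as follows. This is only an illustration under assumptions: the function names, the Euclidean distance, and the toy feature vectors are hypothetical, and the actual shape and gradient features of the paper are not reproduced here.

```python
import math
from collections import Counter

def dist(a, b):
    # Euclidean distance between two equal-length feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def reduce_lexicon(shape_vec, clusters, keep=1):
    """Step 1 (assumed form): keep only the words of the `keep` clusters
    whose centroids are closest to the input's global shape features.
    clusters: list of (centroid, [word, ...])."""
    ranked = sorted(clusters, key=lambda c: dist(shape_vec, c[0]))
    return {w for _, words in ranked[:keep] for w in words}

def knn_word(grad_vec, samples, candidates, k=3):
    """Step 2 (assumed form): K-nearest-neighbor vote on gradient features,
    restricted to the candidate words left after lexicon reduction.
    samples: list of (gradient_vector, word)."""
    pool = [(dist(grad_vec, v), w) for v, w in samples if w in candidates]
    votes = Counter(w for _, w in sorted(pool)[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data
clusters = [((0.0, 0.0), ["ab", "cd"]), ((10.0, 10.0), ["ef"])]
samples = [((1.0,), "ab"), ((1.1,), "ab"), ((5.0,), "cd"), ((1.05,), "ef")]
candidates = reduce_lexicon((1.0, 1.0), clusters)
word = knn_word((1.0,), samples, candidates)
```

Restricting the vote to the reduced lexicon is what removes confusable entries such as "ef" in this toy case, even though its gradient vector is close to the input.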
This paper considers the classification of motor imagery EEG signals using a sparse representation-based classifier (SRC). Designing a powerful dictionary matrix, i.e. extracting proper features, is an important issue in such a classifier. Owing to its high performance, the Common Spatial Patterns (CSP) algorithm is widely used for this purpose in BCI systems. The main disadvantages of the CSP algorithm are its sensitivity to noise and overfitting when the number of training samples is limited. To overcome these problems, this study uses two modified forms of the CSP algorithm, namely DLRCSP and GLRCSP. With these methods, the average detection rate increases by about 7.78%. A further problem of the SRC, which uses the standard BP algorithm, is the computational complexity of BP. To overcome this weakness, we use a new algorithm called SL0. Our classification results show that SL0 speeds up the classification process considerably. Moreover, it increases the average correct detection rate by about 1.61% compared with the standard algorithm.
Given the many factors that degrade speech, speech enhancement has become an important topic in speech processing. Beamforming is one of the well-known methods for improving speech quality and is conventionally applied with regular (classical) microphone arrays. Because of the restrictions imposed by a regular arrangement of microphones, recent years have seen a trend toward microphone arrays with irregular arrangements (so-called ad hoc microphone arrays). Since the location and arrangement of the microphones are unknown and the microphones are spread throughout the environment, this paper considers clustering them. We propose a new method for clustering microphones in directional noise fields that is based on the energy of the received signals. The method is designed to be applicable to different directional noise fields. We also propose a modified structure for the GSC beamformer that assigns different roles to the microphone clusters. Our evaluations indicate that in some situations employing a microphone cluster produces better results than using all microphones. This in turn shows that clustering can improve the performance of the speech enhancement system while also decreasing the computational load (due to the reduction in the number of microphones employed).
The performance of automatic speech recognition (ASR) systems is adversely affected by variations in speakers, audio channels and environmental conditions. Making these systems robust to such variations remains a major challenge. One of the main sources of speaker variation is the difference in Vocal Tract Length (VTL). Vocal Tract Length Normalization (VTLN) is an effective method for coping with this variation. In this method, the speech spectrum of each speaker is frequency-warped according to a warping factor specific to that speaker. In this paper, we first implement the common search-based method for obtaining the appropriate warping factor in an HMM-based Persian continuous speech recognition system. Then, noting the computational cost of the search-based method, we propose a linear regression process that estimates the warping factor from the scores generated by our gender detection system. Experimental results on a Persian conversational speech database show an improvement of about 0.54 percent in word recognition accuracy, as well as a significant reduction in the computational cost of estimating the warping factor, compared with the search-based approach.
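The regression idea can be illustrated with an ordinary least-squares line fitted from gender-detection scores to warping factors found by search. This is a minimal sketch, not the paper's system: the training pairs, the sign convention of the score, and the clipping range 0.88 to 1.12 are assumptions made for the example.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def predict_warp(score, slope, intercept, lo=0.88, hi=1.12):
    # Clip to an assumed range of admissible warping factors
    return max(lo, min(hi, slope * score + intercept))

# Hypothetical training pairs: (gender score, warping factor found by search)
scores = [-2.0, -1.0, 0.0, 1.0, 2.0]
warps = [1.08, 1.04, 1.00, 0.96, 0.92]
slope, intercept = fit_line(scores, warps)
alpha = predict_warp(0.5, slope, intercept)
```

Once the line is fitted, estimating a warping factor costs one multiplication and one addition per speaker, instead of one recognition pass per candidate factor as in the search-based method.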
Automatically identifying the words that bear semantic roles (such as Agent, Patient, Source, etc.) in a sentence and attaching the correct roles to them may improve many natural language processing tasks, including information extraction, question answering, text summarization and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing, and therefore the syntactic representation chosen affects the overall performance of the system. In this research, we present a semantic role labeling system based on full syntactic parsing. For this purpose, we use a dependency parser and machine learning methods. Our system aims to overcome the problems of previous semantic role labelers for Persian, all of which are based on shallow syntactic parsing. The results of the system are promising.
Given the increasing number of XML documents, organizing them effectively so that useful information can be retrieved from them is essential. One possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional text clustering uses similarity measures based only on document content, so the structural information contained in XML documents is ignored. In this paper, a new model, named the matrix space model, is proposed to represent both the structural and the content features of XML documents. Based on this model, a Jaccard similarity measure is defined, and the colonial competitive algorithm is used to cluster the XML documents. Experimental results show that the proposed model is effective in identifying similar documents, i.e. those with closely matching structure and content. The method can improve clustering accuracy and make the use of XML data more productive.
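As an illustration of a Jaccard measure that takes both structure and content into account, each document can be represented as a set of (element path, term) pairs. This set representation is an assumption made for the example; it stands in for the matrix space model, whose definition is not given here.

```python
def jaccard(a, b):
    # |A intersect B| / |A union B|; two empty sets are treated as identical
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical documents as sets of (element path, term) pairs
doc1 = {("book/title", "xml"), ("book/author", "smith"), ("book/year", "2010")}
doc2 = {("book/title", "xml"), ("book/author", "jones"), ("book/year", "2010")}
similarity = jaccard(doc1, doc2)
```

Because the path is part of each feature, the same term appearing under a different element does not count as a match, which is how structural information enters the measure in this sketch.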
The most important method for recognizing the behavior of recurrent maps is plotting a bifurcation diagram. In the conventional method of plotting a bifurcation diagram, several time series are generated for different values of the model parameter, and, after the transient state, their points are plotted against the parameter value. This method is not accurate enough for period detection or for discriminating long-period behavior from chaotic behavior. Moreover, because the diagram is two-dimensional, it cannot be used to investigate the effect of the initial condition within the basin of attraction.
In this research, a new bifurcation diagram, called the Qualitative Bifurcation Diagram (QBD), is presented. QBD determines periodicity accurately. Results of implementing our algorithm on the logistic map demonstrate its ability to determine long periods and period windows. The bifurcation diagram of the logistic map does not obey mosaic tiling patterns (patterns created by arrangement, not interaction) as a form of order in addition to its dynamic order. The benefits of QBD include long-period discrimination, period window detection, reduced computation time, and presentation of the period instead of the amplitude. We then give an analytical survey of the Lyapunov exponent, a common measurement tool for chaotic behavior, and state some important notes. Finally, Recurrence Quantification Analysis (RQA) and QBD are compared.
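To make the idea of reporting the period rather than the amplitude concrete, the sketch below detects the period of the logistic map x -> r x (1 - x) by direct comparison of orbit points after the transient. It is a plain periodicity check written for illustration, not the QBD algorithm; the transient length, maximum period and tolerance are assumed values.

```python
def logistic_period(r, x0=0.5, transient=2000, max_period=64, tol=1e-9):
    """Return the smallest period p <= max_period of the logistic map orbit
    after the transient, or None if no such period is found."""
    x = x0
    for _ in range(transient):          # discard the transient state
        x = r * x * (1 - x)
    orbit = [x]
    for _ in range(2 * max_period):     # record enough points to test each p
        x = r * x * (1 - x)
        orbit.append(x)
    for p in range(1, max_period + 1):
        if all(abs(orbit[i + p] - orbit[i]) < tol for i in range(max_period)):
            return p
    return None

# Period as a function of the parameter r (values chosen for illustration)
periods = {r: logistic_period(r) for r in (2.8, 3.2, 3.5)}
```

Plotting the returned period against r gives a diagram of periods instead of amplitudes; a result of None marks parameter values where no period up to `max_period` was detected, which may be chaotic or long-periodic.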
The artificial immune system (AIS) is one of the meta-heuristic algorithms used to solve complex problems. With large amounts of data, making rapid decisions and obtaining stable results are the most challenging tasks because of rapid variation in the real world. Clustering is a possible solution to these problems. The goal of cluster analysis is to group similar objects.
The AIS algorithm can be used in data clustering analysis. Although AIS is able to represent the configuration of the search space well, determining the clusters of a data set directly from the AIS output is very difficult and costly. Accordingly, this paper proposes a two-step algorithm based on the AIS algorithm and the hierarchical clustering technique. High execution speed and no need to specify the number of clusters are the benefits of hierarchical clustering. However, this technique is sensitive to outliers.
Therefore, in the first stage of the proposed algorithm, the search space and the configuration space are identified using the proposed AIS algorithm, and the outliers are thereby determined. In the second stage, the clusters and their number are determined using the hierarchical clustering technique. Consequently, the first stage of the proposed algorithm eliminates the disadvantage of hierarchical clustering, and the problems of AIS are resolved in the second stage.
The proposed algorithm is evaluated using two metrics: (i) execution time and (ii) Sum of Squared Error (SSE), the average total distance between the center of a cluster and its members, which measures the goodness of a clustering structure. The proposed algorithm has been applied to a real data set of earthquakes in Iran and compared with a similar algorithm, the Improved Ant System-based Clustering algorithm (IASC). IASC is a meta-heuristic clustering algorithm based on the Ant Colony System (ACS). It is a fast algorithm and is suitable for dynamic environments. Table 1 shows the results of the evaluation.
Table 1: Comparison of the two algorithms
| Metric | Proposed algorithm | IASC |
| Execution time (s) | 12 | 18 |
| SSE | 5.3 | 9.4 |
The results show that the proposed algorithm overcomes the drawbacks of the AIS and hierarchical clustering techniques and, at the same time, offers high precision and acceptable run speed.
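For reference, SSE can be computed as in the sketch below, which uses the standard definition: the sum, over all clusters, of the squared Euclidean distances between each member and its cluster centroid. Whether the paper normalizes this sum is not stated, so the unnormalized form and the toy clusters are assumptions.

```python
def sse(clusters):
    """Sum of squared Euclidean distances from each point to its cluster centroid.
    clusters: list of clusters, each a non-empty list of equal-length tuples."""
    total = 0.0
    for members in clusters:
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members
        )
    return total

# Hypothetical two-cluster example
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0), (5.0, 7.0)]]
value = sse(clusters)
```

A lower value indicates more compact clusters, which is the sense in which SSE measures the goodness of a clustering structure.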
Epilepsy is the second most common neurological disorder after stroke, affecting about 1 percent of people in the world. It can affect people of all ages, altering behavior or impairing awareness, and it affects the patient's social life. In 75% of cases, epilepsy can be treated if it is diagnosed early and properly.
Among the existing methods for detecting epileptic brain activity, EEG is the most widely applicable because of its particular advantages, including low cost and harmlessness. Despite these advantages, visual scoring of EEG records by a human scorer is very time-consuming and costly, given the large number of epileptic patients admitted to hospitals and the amount of data that needs to be scored. Researchers have therefore devoted tremendous effort to automatic detection of epileptic seizures in EEG.
This paper offers a novel method based on a heuristic, intelligent algorithm, inclined planes system optimization (IPO), to distinguish epileptic samples from those of healthy subjects. Like other heuristic algorithms, IPO is inspired by nature and its laws. Its main idea comes from the way spherical objects move on a frictionless slope and tend toward the lowest point. In IPO, small balls, like the particles in PSO, are placed randomly in the search space. The balls explore the search space to find the optimal point, which is the lowest point (relative to a reference point) on the surface.
This work uses the data described by Andrzejak et al., which contains five sets (Z, O, N, F and S). Three different classification problems are created from this dataset in order to compare the performance of our method with other approaches.
The EEG signal under study is first decomposed into five sub-bands through the DWT (D1–D4 and A4), each of which carries information from a different frequency band. Four statistical parameters (maximum, minimum, average and standard deviation) are then calculated for each sub-band. Next, the IPO optimization algorithm computes the best weights for the OVA classifier in order to find the best hyperplane separating the two classes. The fitness function defined in the IPO algorithm is the number of incorrectly classified signals.
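The feature extraction step can be sketched as below: a four-level Haar (db1) decomposition gives D1 to D4 and A4, and four statistics per sub-band give a 20-element feature vector. The hand-written Haar transform, the use of the population standard deviation, and the synthetic input signal are assumptions made to keep the example self-contained; the IPO weight search is not shown.

```python
import math
import statistics

def haar_step(signal):
    # One level of the orthonormal Haar (db1) transform on an even-length signal
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def subband_features(signal, levels=4):
    """Return [max, min, mean, std] for D1..D4 and A4, in that order."""
    bands = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)            # D1, D2, D3, D4
    bands.append(approx)                # A4
    features = []
    for band in bands:
        features += [max(band), min(band),
                     statistics.mean(band), statistics.pstdev(band)]
    return features

# Synthetic stand-in for an EEG segment (length divisible by 2**4)
segment = [math.sin(0.3 * n) + 0.5 * math.sin(2.1 * n) for n in range(64)]
features = subband_features(segment)
```

The resulting 20 values per signal are what a weighted linear classifier would consume; in the paper the weights are found by IPO with the number of misclassified signals as the fitness value.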
To classify the EEG signals in the three problems, 10-fold cross-validation is used. In this method, the data is divided into 10 subsets; one subset is used for testing and the other nine for training. This procedure is repeated 10 times, until all the data has been used for testing. The proposed algorithm has been run 10 times for each of the two wavelet functions db1 and db2. With the proposed method, the accuracy obtained for the three problems is 100%, 98.1% and 97.34%, respectively. The proposed method also diagnoses epilepsy very quickly: the results show that the algorithm can distinguish epileptic from non-epileptic signals in less than 5 milliseconds, which makes it suitable for real-time systems.
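The 10-fold procedure described above can be sketched as an index splitter. The shuffling seed and the sample count of 200 are assumptions for the example; any classifier can be trained and tested on the index lists it yields.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # fixed seed for repeatability
    folds = [idx[i::k] for i in range(k)]     # k nearly equal subsets
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

splits = list(k_fold_indices(200, k=10))
```

Each sample appears in exactly one test fold, so after the 10 repetitions all the data has been used for testing once.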
The distance metric plays a key role in many machine learning and computer vision algorithms, so choosing an appropriate metric has a direct effect on their performance. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning. Studies in this area have shown that algorithms based on distance metric learning considerably outperform commonly used metrics such as the Euclidean distance. In the kernelized version of metric learning algorithms, the data points are implicitly mapped into a new feature space using a non-linear kernel function, and the associated distance metric is then learned in this new feature space. Using a kernel function improves the performance of pattern recognition algorithms; however, choosing a proper kernel and tuning its parameter(s) are the main issues in such methods. Using an appropriate composite kernel instead of a single kernel is one of the best solutions to this problem. In this study, a multiple kernel is constructed as the weighted sum of a set of basis kernels. Within this framework, we propose different learning approaches for determining the kernel weights. The proposed learning techniques arise from distance metric learning concepts. They operate within a semi-supervised framework in which different cost functions are considered and learning uses a limited amount of supervisory information, in the form of a small set of similarity and/or dissimilarity pairs. We define four distance-metric-based cost functions to optimize the multiple kernel weights. In the first structure, the cost function is the average distance between the similarity pairs, which is minimized subject to maximizing the average distance between the dissimilarity pairs. 
This is, in fact, a commonly used goal in distance metric learning. In the second structure, the topological structure of the data is preserved using the idea of the graph Laplacian: a penalty term that preserves this structure is added to the cost function. This penalty term is also used in the other two structures. In the third structure, the effect of each dissimilarity pair is treated as an independent constraint. Finally, in the last structure, maximizing the distance between the dissimilarity pairs is included in the cost function rather than imposed as a constraint. The proposed methods are examined in a clustering application using the kernel k-means algorithm. Both synthetic (an XOR data set) and real data sets (the UCI data) are used in the experiments, and the performance of the clustering algorithm with single kernels is taken as the baseline. Our experimental results confirm that using the multiple kernel not only improves the clustering result but also makes the algorithm independent of the choice of the best kernel. The results also show that increasing the number of constraints, as in the third structure, makes the algorithm unstable, as expected.
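The quantities that the cost functions are built from can be sketched as follows: a multiple kernel formed as a weighted sum of basis kernels, the squared distance it induces in feature space, d²(x, y) = k(x, x) − 2k(x, y) + k(y, y), and the average of that distance over a set of pairs. The choice of a linear and a Gaussian basis kernel, the fixed weights, and the XOR-style pairs are assumptions for illustration; the weight optimization itself is not shown.

```python
import math

def linear(x, y):
    return sum(a * b for a, b in zip(x, y))

def rbf(gamma):
    # Gaussian kernel with width parameter gamma
    return lambda x, y: math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def combined_kernel(kernels, weights):
    # Multiple kernel: weighted sum of basis kernels
    return lambda x, y: sum(w * k(x, y) for w, k in zip(weights, kernels))

def kernel_distance_sq(k, x, y):
    # Squared distance in the feature space induced by kernel k
    return k(x, x) - 2 * k(x, y) + k(y, y)

def mean_pair_distance(k, pairs):
    return sum(kernel_distance_sq(k, x, y) for x, y in pairs) / len(pairs)

# XOR-style supervisory information (hypothetical pairs)
similar = [((0, 0), (1, 1)), ((0, 1), (1, 0))]
dissimilar = [((0, 0), (0, 1)), ((1, 1), (1, 0))]

k = combined_kernel([linear, rbf(1.0)], [0.5, 0.5])
cost_similar = mean_pair_distance(k, similar)
cost_dissimilar = mean_pair_distance(k, dissimilar)
```

With these arbitrary weights the similar pairs of the XOR example are farther apart than the dissimilar ones, which is the situation the weight learning is meant to correct by changing the kernel weights.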
A rate control algorithm at the group of pictures (GOP) level is proposed in this paper for variable bit rate applications of the H.265/HEVC video coding standard with a buffer constraint. Because of structural changes in HEVC compared with previous standards, new rate control algorithms need to be designed. In the proposed algorithm, the quantization parameter (QP) of each GOP is obtained by modifying the QP of the previous GOP according to the target bit rate and the buffer status. Buffer status and target bit rate are the input variables that span a two-dimensional lookup table. The output of the lookup table is chosen to allow short-term variations in bit rate, in order to achieve better and more uniform visual quality of the reconstructed video. In addition, a QP cascading technique is used to calculate the QP of the frames in each GOP; it operates like a bit allocation scheme and yields a suitable trade-off between quality and compression rate. Unlike conventional methods, the proposed scheme uses a lookup table instead of a rate-distortion model, which significantly reduces the computational complexity. Several video sequences with completely different content were used in the experiments. Some short video sequences were concatenated to obtain long sequences that are closer to variable bit rate applications. The proposed lookup-table-based (LUT) algorithm is implemented in the HM reference software and compared with the λ-domain rate control algorithm (λ-RC) and the constant QP (CQP) case, which is defined as the anchor. At almost the same average bit rate (CQP: 1527.97, LUT: 1520.92, λ-RC: 1529.41), the average QP (28.09, 28.18, 29.91) and average peak signal-to-noise ratio (PSNR) (37.88, 37.87, 37.76) of LUT are closer to those of CQP than those of λ-RC are. The average QP standard deviation (1.13, 2.28, 4.27) and PSNR standard deviation (1.37, 2.11, 2.15) of LUT are smaller than those of λ-RC and closer to those of CQP. 
From the rate control point of view, the minimum buffering delay averaged over all video sequences is almost the same for LUT as for λ-RC, which is one of the best rate controllers proposed for HEVC (0.94, 0.36, 0.35). Consequently, the experimental results show not only that the bit rate is perfectly controlled according to the buffer constraints but also that the quality of the reconstructed video is well maintained.
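The GOP-level update can be illustrated with a small two-dimensional lookup table indexed by the ratio of actual to target bit rate and by buffer fullness. All table entries and bin edges below are hypothetical values chosen for the example, not those of the paper; the QP limits 0 to 51 are the usual HEVC range for 8-bit video.

```python
import bisect

# Hypothetical bin edges: rate ratio = actual bit rate / target bit rate,
# buffer fullness = occupied fraction of the buffer
RATE_EDGES = [0.9, 1.1]
BUFFER_EDGES = [0.3, 0.7]

# Hypothetical QP offsets; rows: rate below/near/above target,
# columns: buffer low/medium/high
DELTA_QP = [
    [-2, -1, 0],
    [-1, 0, 1],
    [0, 1, 2],
]

def next_gop_qp(prev_qp, rate_ratio, buffer_fullness, qp_min=0, qp_max=51):
    """QP of the next GOP = QP of the previous GOP + table offset, clipped."""
    r = bisect.bisect_right(RATE_EDGES, rate_ratio)
    b = bisect.bisect_right(BUFFER_EDGES, buffer_fullness)
    return max(qp_min, min(qp_max, prev_qp + DELTA_QP[r][b]))

qp = next_gop_qp(28, rate_ratio=1.3, buffer_fullness=0.9)
```

A zero offset in the centre of the table leaves the QP unchanged while the rate is near its target and the buffer is moderately full, which is one way to permit short-term bit rate variation without a rate-distortion model.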
Automatic spoken language identification is the task of identifying a language from the speech signal. Language identification systems can be divided into two categories: spectral-based methods and phonetic-based methods. In the former, short-time characteristics of the speech spectrum are extracted as a multi-dimensional vector, and a statistical model of these features is then obtained for each language. The Gaussian mixture model is the most common statistical model in spectral-based language identification systems. In phonetic-based methods, on the other hand, speech signals are divided into a sequence of tokens using a hidden Markov model (HMM), and a language model is trained on the resulting sequence. PRLM, PPRLM and PR-SVM are examples of phonetic-based methods. In the literature, a combination of phonetic-based and spectral-based systems is usually used to achieve a high-quality language identification system. Spectral-based methods have been the focus of researchers, since they need no labeled data and usually achieve better results than phonetic approaches. Therefore, in this paper, different spectral methods for spoken language identification are introduced, implemented and compared.
The basic spectral language identification method is the Gaussian Mixture Model-Universal Background Model (GMM-UBM). In this paper, MMI discriminative training is used to improve the Gaussian model of each language. Moreover, in order to model the language dynamically, the GMM is replaced with an ergodic hidden Markov model (EHMM). The GSV-SVM and GMM tokenizer methods are also implemented as two popular spectral approaches. In addition, novel speaker and channel variation modeling methods are used as language identification approaches, including joint factor analysis (JFA), the identity vector (i-Vector), and several variation compensation methods that improve the results of i-Vector.
Furthermore, in order to boost the performance of language recognition systems, different post-processing methods are applied. Each element of the raw score vector indicates the degree to which the spoken signal belongs to a language. Post-processing methods are applied to this vector as a classifier and allow better language detection decisions by mapping the raw score vector to the space of desired languages. Different studies have employed different post-processing methods, including GMM, NN, SVM and LLR. This study exploits several score post-processing methods to improve the quality of language recognition.
The goal of the experiments in this article is to detect Farsi, English and Arabic, both individually and simultaneously, and to distinguish them from other languages. The latter is also called open-set language identification. The signals considered in this paper are two-sided conversations, whose quality is usually poor because of strong noise, background speech or music, accents, etc.
The Gaussian mixture model-universal background model (GMM-UBM) was implemented as the basic method. With this approach, the mean EER of the three target languages (Farsi, English and Arabic) was 13.58. Experimental results indicated that training the GMM language identification system with the MMI discriminative training algorithm is more effective than training with the ML algorithm alone: the mean EER of the three target languages was reduced by about 8 percent in comparison with GMM-UBM. The GMM tokenizer method was also tested as a novel spectral approach; with this method, the mean EER of the three target languages was about 5 percent better than with GMM-UBM.
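Since the results are reported as equal error rates, a minimal sketch of how an EER can be estimated from detection scores may be helpful. It sweeps the observed scores as thresholds and reports the point where the miss rate and the false alarm rate are closest; the score lists are hypothetical, and evaluation toolkits usually interpolate between thresholds rather than taking the nearest point.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: average of miss and false alarm rates at the
    threshold where the two rates are closest."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # Accept a trial when its score is >= t
        miss = sum(s < t for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer

# Hypothetical scores for one target language
target = [0.9, 0.8, 0.7, 0.4]
nontarget = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(target, nontarget)
```

A mean EER over several target languages, as reported here, would be the average of this value computed separately for each language.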
The GSV-SVM discriminative method was also used for language recognition. Its results were considerably better than those of common spectral approaches: the mean EER of the three target languages was reduced by 11 percent in comparison with GMM-UBM. This study addresses the low speed of this method using a model pushing method.
This study also implemented two novel methods, JFA and i-Vector. Both provide better results than GMM-UBM: the mean EER of the three target languages is reduced by 1% with JFA and by 12% with i-Vector. Overall, the experimental results showed that i-Vector provides better results than the other spectral language identification systems.
This study is the result of seven years of research on spoken language identification at the advanced technology development center of Khajeh Nasiredin Tousi. Ongoing research includes studying and implementing novel spectral language identification algorithms such as PLDA and state-of-the-art phonetic language identification methods, with the aim of combining the spectral and phonetic systems and eventually achieving a high-quality language identification system.
© 2015 All Rights Reserved | Signal and Data Processing

