Signal and Data Processing (JSDP), ISSN 2538-4201 / 2538-421X, DOI 10.52547/jsdp; 2018, Vol. 14, No. 4, pp. 3-18, published 2018-03-01.
Title: An Access Management System to Mitigate Operational Threats in SCADA System
Authors: Payam Mahmoudi Nasr (P.mahmoudi@umz.ac.ir), Ali Yazdian Varjani (Yazdian@modares.ac.ir)
Abstract: One of the most dangerous insider threats in a supervisory control and data acquisition (SCADA) system is the operational threat. An operational threat occurs when an authorized operator misuses their permissions and causes catastrophic damage by sending legitimate control commands. Granting too many permissions can backfire when an operator mistakenly or deliberately abuses those privileges, so an access management system is required to provide the necessary permissions while preventing malicious use. An operational threat against a critical infrastructure has the potential to cause large financial losses and irreparable damage at the national level. In this paper, we propose a new alarm-trust-based access management system that reduces the potential for operational threats in SCADA systems. In the proposed system, access to a remote substation is determined by the operator's trust value and the criticality level of the substation. The operator's trust value is calculated from the operator's performance, either periodically or in emergencies when an anomaly is detected. The criticality level of the substation is computed from its properties. Our system is able to detect anomalies that may result from operational threats. Simulation results on the Iranian SCADA power system show the effectiveness of our system.
Keywords: access control; trust; insider threat; anomaly detection; SCADA
Full text: http://jsdp.rcisp.ac.ir/article-1-440-en.pdf (DOI: 10.29252/jsdp.14.4.3)
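As an illustration of the access decision described above, the sketch below combines an operator trust score with a substation criticality level. The formulas, thresholds, and field names are assumptions for the example, not the paper's actual model.

```python
# Hypothetical sketch of a trust/criticality access decision. All formulas
# and thresholds below are illustrative assumptions, not the paper's model.

def operator_trust(successes: int, violations: int, anomalies: int = 0,
                   anomaly_penalty: float = 0.2) -> float:
    """Toy trust estimate: success ratio, penalized for detected anomalies."""
    total = successes + violations
    base = successes / total if total else 0.5  # neutral prior with no history
    return max(0.0, base - anomaly_penalty * anomalies)

def substation_criticality(load_mw: float, customers: int,
                           max_load_mw: float = 500.0,
                           max_customers: int = 100_000) -> float:
    """Toy criticality: normalized combination of load and affected customers."""
    return 0.5 * min(load_mw / max_load_mw, 1.0) + 0.5 * min(customers / max_customers, 1.0)

def grant_access(trust: float, criticality: float) -> bool:
    """More critical substations demand higher operator trust."""
    required = 0.5 + 0.5 * criticality  # threshold grows with criticality
    return trust >= required

t = operator_trust(successes=95, violations=5, anomalies=1)
c = substation_criticality(load_mw=320.0, customers=40_000)
print(f"trust={t:.2f} criticality={c:.2f} access={grant_access(t, c)}")
```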
Signal and Data Processing (JSDP), ISSN 2538-4201 / 2538-421X, DOI 10.52547/jsdp; 2018, Vol. 14, No. 4, pp. 19-30, published 2018-03-01.
Title: Signal Detection Based on GPU-Assisted Parallel Processing for Infrastructure-based Acoustical Sensor Networks
Authors: Hamed Sadeghi (h.sadeghi@mail.ru), Amir Akhavan Bitaghsir (amir.akhavan@aut.ac.ir)
Abstract: Nowadays, several infrastructure-based low-frequency acoustical sensor networks are employed in different applications to monitor the activity of diverse natural and man-made phenomena, such as avalanches, earthquakes, volcanic eruptions, severe storms, and supersonic aircraft flights. Two signal detection methods are usually implemented in these networks for identifying event occurrences: the progressive multi-channel correlator (PMCC) and the so-called Fisher detector. The Fisher method is the more important and applicable in low signal-to-noise ratio (SNR) conditions, which are of special interest in acoustical monitoring networks. Unfortunately, an important disadvantage of this algorithm is its relatively high detection time, which limits its application in real-time detection scenarios. This disadvantage stems fundamentally from the beamforming step of the Fisher algorithm, which requires an exhaustive search over a slowness grid constructed from the possible incoming wavefront directions and speeds. To address this issue, we propose an implementation of this beamforming on a graphics processing unit (GPU) in order to realize a fast, near real-time signal processing technique. In addition, we propose a parallel-processing algorithm to further enhance the performance of this GPU-based Fisher detector. Simulation results confirm the performance improvement of the Fisher detector in terms of the processing time required for acoustical signal detection applications.
Keywords: sensor network; array processing; beamforming; parallel processing; GPU
Full text: http://jsdp.rcisp.ac.ir/article-1-424-en.pdf (DOI: 10.29252/jsdp.14.4.19)
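To make the slowness-grid search concrete, here is a minimal NumPy sketch of a Fisher-style detector. The exact form of the Fisher statistic and the integer-sample alignment are simplifying assumptions (formulations differ in their scaling constants), and the per-cell loop is the part a GPU implementation would parallelize, e.g. one thread block per slowness cell, or a drop-in CuPy replacement for NumPy.

```python
# Sketch of an exhaustive Fisher-detector search over a slowness grid.
# Shapes, constants, and the statistic's scaling are assumptions.
import numpy as np

def fisher_statistic(x: np.ndarray) -> float:
    """x: (channels, samples), already delay-aligned for one slowness cell.
    Ratio of beam power to residual power; scaling constants vary across
    formulations in the literature."""
    c = x.shape[0]
    beam = x.mean(axis=0)            # delay-and-sum beam
    residual = x - beam              # per-channel misfit
    return c * (c - 1) * np.sum(beam**2) / np.sum(residual**2)

def grid_search(signals, coords, slowness_grid, fs):
    """Evaluate F for every (sx, sy) slowness cell; return the maximum."""
    best = (-np.inf, None)
    for sx, sy in slowness_grid:
        # plane-wave delay at each sensor, rounded to whole samples
        # (np.roll wraps around; acceptable for a sketch)
        delays = np.round((coords @ np.array([sx, sy])) * fs).astype(int)
        aligned = np.stack([np.roll(sig, -d) for sig, d in zip(signals, delays)])
        f = fisher_statistic(aligned)
        if f > best[0]:
            best = (f, (sx, sy))
    return best

rng = np.random.default_rng(0)
fs, n_ch, n_samp = 20.0, 6, 400
coords = rng.uniform(-1000, 1000, (n_ch, 2))   # sensor positions in meters
signals = rng.standard_normal((n_ch, n_samp))  # noise-only example
grid = [(sx, sy) for sx in np.linspace(-3e-3, 3e-3, 21)
                 for sy in np.linspace(-3e-3, 3e-3, 21)]
print(grid_search(signals, coords, grid, fs))
```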
Signal and Data Processing (JSDP), ISSN 2538-4201 / 2538-421X, DOI 10.52547/jsdp; 2018, Vol. 14, No. 4, pp. 31-42, published 2018-03-01.
Title: Data Clustering Based On Key Identification
Authors: Ehsan Fazl-Ersi (fazlersi@um.ac.ir), Masoud Kazemi Nooghabi (masoud.kazemi@stu.um.ac.ir)
Abstract: Clustering has been one of the main building blocks in the fields of machine learning and computer vision. Given a pairwise distance measure, it is challenging to find a proper way to identify a subset of representative exemplars and their associated cluster structures. The recent trend toward big data analysis places a more demanding requirement on new clustering algorithms: they must be both scalable and accurate. A recent advance in graph-based clustering extends to millions of data points through massive engineering effort and parallel optimization; however, most other existing clustering algorithms, though promising in theory, are limited in scalability. In this paper, a novel clustering method is proposed that is both accurate and scalable. Based on a simple criterion, "key" items that are representative of the whole data set are iteratively selected and thus form the associated cluster structures. Taking a pairwise distance measure between data instances as input, the proposed method searches for cluster centers by identifying data items far away from the selected keys but representative of the unselected items. Inspired by hierarchical clustering, small clusters are iteratively merged until a desired number of clusters is obtained. To solve the scalability problem, a novel tracking-table technique is designed to reduce the time complexity, making it possible to cluster millions of data points within a few minutes. To assess the performance of the proposed method, several experiments are conducted. The first experiment tests the ability of our algorithm on different manifold structures and various numbers of clusters; our clustering algorithm outperforms existing alternatives in capturing different shapes of data distributions. In the second experiment, the scalability of our algorithm to large-scale data is assessed by clustering up to one million data points with dimensions of up to 100; even with one million data points, the proposed method takes only a few minutes to perform clustering. The third experiment is conducted on the ORL database, which consists of 400 face images of 40 individuals; the proposed clustering method outperforms the compared alternatives in this experiment as well. In the final experiment, shape clustering is performed on the MPEG-7 dataset, which contains 1400 silhouette images from 70 classes, with 20 different shapes per class. The goal here is to cluster the data items (here, the binary shapes) into 70 clusters, so that each cluster includes only shapes that belong to one class; the proposed method outperforms the alternative clustering algorithms on this dataset as well. Extensive empirical experiments demonstrate the superiority of the proposed method over existing alternatives, in terms of both effectiveness and efficiency. Furthermore, our algorithm is capable of large-scale data clustering, where millions of data points can be clustered in a few minutes.
Keywords: clustering; key identification; large scale
Full text: http://jsdp.rcisp.ac.ir/article-1-529-en.pdf (DOI: 10.29252/jsdp.14.4.31)
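A rough sketch in the spirit of the method: greedy selection of "key" exemplars that are far from already-selected keys, assignment of points to the nearest key, and hierarchical merging of the resulting small clusters. The paper's exact selection criterion and its tracking-table speedup are not reproduced here.

```python
# Key-selection clustering sketch over a pairwise distance matrix.
# The selection rule and merge criterion are illustrative assumptions.
import numpy as np

def key_clustering(dist: np.ndarray, n_keys: int, n_clusters: int):
    keys = [int(np.argmin(dist.sum(axis=1)))]    # start from a central item
    while len(keys) < n_keys:                    # farthest-point key selection
        d_to_keys = dist[:, keys].min(axis=1)
        keys.append(int(np.argmax(d_to_keys)))
    labels = np.argmin(dist[:, keys], axis=1)    # assign to the nearest key
    clusters = {k: list(np.where(labels == k)[0]) for k in range(len(keys))}
    # iteratively merge the two closest clusters (mean inter-cluster distance)
    while len(clusters) > n_clusters:
        ids = list(clusters)
        a, b = min(((p, q) for i, p in enumerate(ids) for q in ids[i + 1:]),
                   key=lambda pq: dist[np.ix_(clusters[pq[0]], clusters[pq[1]])].mean())
        clusters[a].extend(clusters.pop(b))
    return clusters

pts = np.random.default_rng(1).standard_normal((200, 2))
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)  # pairwise distances
print({k: len(v) for k, v in key_clustering(d, n_keys=12, n_clusters=4).items()})
```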
Signal and Data Processing (JSDP), ISSN 2538-4201 / 2538-421X, DOI 10.52547/jsdp; 2018, Vol. 14, No. 4, pp. 43-54, published 2018-03-01.
Title: Improving Named Entity Recognition Using Izafe in Farsi
Authors: Mohammad Abdous (mohammadabdous@comp.iust.ac.ir), Behrooz Minaei (B_minaei@iust.ac.ir)
Abstract: Named entity recognition is a process in which people's names, names of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), dates, currencies, and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine translation, semantic search, relation extraction, and quotation recognition. Named entity recognition in Persian is far more complex and difficult than in English: in English texts, proper nouns usually begin with capital letters, and this feature makes it easy to identify named entities, but it is absent in Persian texts. To build a named entity recognition system, three kinds of methods are generally used: rule-based, machine-learning-based, and hybrid, each with its own advantages and disadvantages. The lack of labeled named-entity data is the greatest challenge for Persian text; because of this, rule-based methods are usually used to extract entities. In this paper, dictionaries of organizations, places, and people were first extracted from Wikipedia, which is one of the best sources for extracting entities and in which more than 200,000 Farsi named entities are known to exist. The proposed algorithm classifies each Wikipedia article title using its categories; each Wikipedia title has several categories that can be used to partially identify the named-entity type. The precision of named entity recognition was then increased using rules, which can be divided into three categories: morphological rules, adjacency rules, and text patterns. The most important are the adjacency rules, with which the type of an entity can be identified from the words adjacent to it (such as Mr, Mrs, etc.). To evaluate the system, 42,000 tokens of the BijanKhan corpus were manually annotated. An initial F-measure of 78.79 percent was obtained. The precision of named entity recognition was then improved using izāfe, one of the important features of the Persian language, and an F-measure of 81.94 percent was achieved. The results show that using izāfe in named entity recognition systems significantly increases their accuracy.
Keywords: named entity recognition; natural language processing; rule-based; Wikipedia; izafe
Full text: http://jsdp.rcisp.ac.ir/article-1-495-en.pdf (DOI: 10.29252/jsdp.14.4.43)
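The toy sketch below illustrates the gazetteer plus adjacency-rule idea. The trigger words and dictionaries are tiny stand-ins for the paper's Wikipedia-derived resources, and the organization rule loosely mimics how an izāfe construction (e.g., دانشگاهِ تهران, "University of Tehran") licenses extending an entity to the following noun.

```python
# Toy rule-based NER: gazetteer lookup + adjacency rules. The word lists
# and rules are illustrative stand-ins for the paper's resources.

PERSON_TRIGGERS = {"آقای", "خانم", "دکتر"}   # e.g. Mr, Mrs, Dr
LOC_GAZETTEER = {"تهران", "اصفهان"}          # toy Wikipedia-derived place names
ORG_GAZETTEER = {"دانشگاه"}                  # "university" as an org head noun

def tag_tokens(tokens):
    tags = ["O"] * len(tokens)
    for i, tok in enumerate(tokens):
        # gazetteer rule: known place names (unless already typed)
        if tok in LOC_GAZETTEER and tags[i] == "O":
            tags[i] = "LOC"
        # adjacency rule: the token right after a person trigger is a PERSON
        if i > 0 and tokens[i - 1] in PERSON_TRIGGERS:
            tags[i] = "PER"
        # izafe-style rule: an org head noun plus the following noun form one ORG
        if tok in ORG_GAZETTEER and i + 1 < len(tokens):
            tags[i] = tags[i + 1] = "ORG"
    return list(zip(tokens, tags))

# "Mr Ahmadi gave a talk at the University of Tehran"
print(tag_tokens("آقای احمدی در دانشگاه تهران سخنرانی کرد".split()))
```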
Signal and Data Processing (JSDP), ISSN 2538-4201 / 2538-421X, DOI 10.52547/jsdp; 2018, Vol. 14, No. 4, pp. 55-78, published 2018-03-01.
Title: An Approach for Extraction of Keywords and Weighting Words for Improvement Farsi Documents Classification
Authors: Vahideh Rezaie (vahidehrezaie@gmail.com), Mahid Mohammadpour (m.mohammadpour@iauyasooj.ac.ir), Hamid Parvin (parvin@iust.ac.ir), Samad Nejatian (samad.nej.2007@gmail.com)
Abstract: Due to the ever-increasing expansion of information and the huge amount of unstructured documents, keywords play a very important role in information retrieval. Because manual extraction of keywords faces various challenges, automated extraction seems inevitable. In this research, we use a thesaurus (a structured word-net) to extract keywords automatically. The authors claim that more meaningful keywords can be extracted from documents by employing a thesaurus, and that the keywords extracted this way can improve document classification. To increase the comprehensiveness of search, the stop words are first removed and the remaining words are stemmed. Then, with the help of the thesaurus, equivalent, hierarchical, and related words are found. Next, to determine the relative importance of the words, a numerical weight is assigned to each word, representing the word's effect on the subject matter in comparison with the other words used in the text. Following these steps, and with the help of the thesaurus, an accurate text classification is performed. The KNN algorithm is used for classification; due to its simplicity and effectiveness, it is widely used for classifying texts. The cornerstone of KNN is to compare a test text against the training texts to determine their similarity. The empirical results show that the quality and accuracy of the extracted keywords are satisfactory for users, and they confirm that document classification is enhanced.
Keywords: thesaurus; information retrieval; keyword extraction; weighting
Full text: http://jsdp.rcisp.ac.ir/article-1-449-en.pdf (DOI: 10.29252/jsdp.14.4.55)
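A minimal sketch of the described pipeline: stop-word removal, thesaurus expansion, word weighting, and KNN classification by cosine similarity. The thesaurus, the 0.5 synonym weight, and the corpus are assumed placeholders.

```python
# Thesaurus-expanded bag-of-words + KNN text classification sketch.
# Thesaurus, weights, and data are toy assumptions.
from collections import Counter
import math

THESAURUS = {"car": ["automobile", "vehicle"], "film": ["movie", "cinema"]}
STOP = {"the", "a", "is", "of", "my", "about", "and", "on", "was"}

def vectorize(text: str) -> Counter:
    words = [w for w in text.lower().split() if w not in STOP]
    expanded = words + [syn for w in words for syn in THESAURUS.get(w, [])]
    vec = Counter(expanded)
    for w in vec:                      # thesaurus terms weigh less than originals
        if w not in words:
            vec[w] *= 0.5
    return vec

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_label(query: str, train, k: int = 3) -> str:
    qv = vectorize(query)
    ranked = sorted(train, key=lambda doc: cosine(qv, vectorize(doc[0])), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

train = [("a fast car on the road", "auto"),
         ("the new film was long", "cinema"),
         ("vehicle repair and automobile parts", "auto")]
print(knn_label("my movie about a car", train, k=3))
```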
Signal and Data Processing (JSDP), ISSN 2538-4201 / 2538-421X, DOI 10.52547/jsdp; 2018, Vol. 14, No. 4, pp. 79-96, published 2018-03-01.
Title: Converting Dependency Treebank to Constituency Treebank for Persian
Authors: Ahmad Pouramini (pouramini@gmail.com), Masood Ghayoomi (masood.ghayoomi@gmail.com), Amine Naseri (naseri.amine@sirjantech.ac.ir)
Abstract: There are two major types of treebanks, dependency-based and constituency-based, and both have applications in natural language processing and computational linguistics. Several dependency treebanks have been developed for Persian; however, no large constituency treebank is available for this language. In this paper, we propose an algorithm for the automatic conversion of a dependency treebank into a constituency treebank for Persian. Our method builds on an existing method, with modifications to enhance its accuracy. The base algorithm constructs a constituency structure according to a set of conversion rules: each rule maps a dependency relation to a constituency subtree, and the constituency structure is built by combining these subtrees. We investigate the effect of the order in which dependency relations are processed on the output constituency structure, and show that the best order depends on the characteristics of the target language. We also modify the algorithm for matching the conversion rules: to match a dependency relation to a conversion rule, we start with detailed information and, if no match is found, reduce the level of detail and change the matching method. We further modify the algorithm used for combining the constituency subtrees, using statistical data derived from a treebank to find a proper position for attaching a constituency subtree to the projection chain of the head. The experimental results show that these modifications improve the accuracy of the conversion algorithm by 16.48%.
Keywords: natural language processing; treebanks; dependency structure; phrase structure
Full text: http://jsdp.rcisp.ac.ir/article-1-492-en.pdf (DOI: 10.29252/jsdp.14.4.79)
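To illustrate the rule-driven conversion, the sketch below maps each dependency relation to a constituency subtree and attaches the dependent on an assumed side of the head's projection. The relations, phrase labels, and attachment sides are invented for this example, and the paper's statistical attachment model is omitted; note how the processing order of the relations determines the nesting.

```python
# Rule-based dependency -> constituency conversion sketch for a tiny
# Persian sentence. Relations, labels, and sides are toy assumptions.

SENT = ["کتاب", "را", "خواندم"]   # "book ACC read-1SG" = "I read the book"
# (dependent_index, head_index, relation); head 0 marks the root
DEPS = [(2, 1, "acc"), (1, 3, "obj"), (3, 0, "root")]

# relation -> (phrase label, side of the head the dependent attaches on)
RULES = {"acc": ("NP", "right"), "obj": ("VP", "left"), "root": ("S", "left")}

def convert(tokens, deps):
    # each token starts as its own constituency fragment
    node = {i + 1: tokens[i] for i in range(len(tokens))}
    root = None
    for dep, head, rel in deps:            # processing order matters (see text)
        label, side = RULES.get(rel, ("XP", "left"))
        if head == 0:                      # root rule: wrap the finished tree
            node[dep] = f"({label} {node[dep]})"
            root = dep
        elif side == "left":               # dependent precedes the head
            node[head] = f"({label} {node[dep]} {node[head]})"
        else:                              # dependent follows the head
            node[head] = f"({label} {node[head]} {node[dep]})"
    return node[root]

print(convert(SENT, DEPS))   # (S (VP (NP کتاب را) خواندم))
```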
Signal and Data Processing (JSDP), ISSN 2538-4201 / 2538-421X, DOI 10.52547/jsdp; 2018, Vol. 14, No. 4, pp. 97-116, published 2018-03-01.
Title: Using of Model Based Hand Poses Estimation for Imitation of User's Arm Movements by Robot Arm
Authors: Maryam Zare Mehrjardi (zaremaryam@stu.yazd.ac.ir), Mehdi Rezaeian (mrezaeian@yazd.ac.ir)
Abstract: Pose estimation is the process of identifying how a human body and/or individual limbs are configured in a given scene. Hand pose estimation is an important research topic with a variety of applications in human-computer interaction (HCI) scenarios, such as gesture recognition, animation synthesis, and robot control. However, capturing hand motion is quite a challenging task due to the hand's high flexibility. Many sensor-based and vision-based methods have been proposed for this task. In sensor-based systems, specialized hardware is used for hand motion capture. Vision-based hand pose estimation methods can generally be divided into two categories: appearance-based and model-based. In appearance-based approaches, various features are extracted from the input images to estimate the hand pose; usually, many training samples are used to learn, in advance, a mapping function from the features to the hand poses, and given the learned mapping function the hand pose can be estimated efficiently. In model-based approaches, the hand pose is estimated by aligning a projected 3D hand model to the hand features extracted from the inputs, so the model state must be provided at every time step; these methods require extensive computation, which makes real-time implementation difficult in practice. Hand pose estimation using color/depth images consists of three steps: (1) hand detection and segmentation, (2) feature extraction, and (3) setting the model parameters using the extracted features and updating the model. To extract the features necessary for pose estimation, depending on the model used and the purpose of the hand gesture analysis, features such as fingertip positions, the number of fingers, palm position, and joint angles are extracted. In this paper, a model-based markerless dynamic hand pose estimation scheme is presented. Motion capture is the process of recording a live motion event and translating it into usable mathematical terms by tracking a number of key points in space over time and combining them into a single 3D representation of the performance. The sequences of depth images, color images, and skeleton data obtained from a Kinect (a tool for markerless motion capture) at 30 frames per second are the inputs to this scheme. The proposed scheme exploits both temporal and spatial features of the input sequences, and focuses on localizing the index and thumb fingertips and on the joint angles of the robot arm, so as to mimic the user's arm movements in 3D space in an uncontrolled environment. The RoboTECH II ST240 is used as a real robot arm model. Depth and skeleton data are used to determine the angles of the robot joints. Three approaches to identifying the tips of the thumb and index fingers are presented using the available data, each with its own limitations; these approaches use concepts such as thresholding, edge detection, convex hulls, skin modeling, and background subtraction. Finally, comparing the tracked trajectories of the user's wrist and the robot end effector shows an average error of about 0.43 degrees, which is an appropriate performance for this research. The key contribution of this work is hand pose estimation for every input frame and updating the robot arm according to the estimated pose. Thumb and index fingertip detection forms part of the feature vector produced by the presented approaches. User movements are translated into corresponding Move instructions for the robot; the features needed for a Move instruction are the rotation values around the joints in different directions and the degree of opening between the index finger and thumb.
Keywords: pose estimation; depth data; markerless motion capture; Kinect; 3D model
Full text: http://jsdp.rcisp.ac.ir/article-1-522-en.pdf (DOI: 10.29252/jsdp.14.4.97)
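One step of the pipeline, recovering a robot joint angle from skeleton data, can be sketched as follows. The joint coordinates and the Move command are hypothetical placeholders, not the RoboTECH interface.

```python
# Per-frame sketch: turn Kinect skeleton joints into an elbow angle for the
# robot arm. Coordinates and the "Move" command are illustrative assumptions.
import numpy as np

def joint_angle(a, b, c) -> float:
    """Angle at joint b (in degrees) formed by 3D points a-b-c."""
    v1 = np.asarray(a) - np.asarray(b)
    v2 = np.asarray(c) - np.asarray(b)
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))

# one frame of hypothetical skeleton data (meters, camera coordinates)
shoulder, elbow, wrist = (0.0, 1.4, 2.0), (0.25, 1.15, 2.0), (0.45, 1.3, 1.9)
elbow_deg = joint_angle(shoulder, elbow, wrist)
print(f"send Move(elbow={elbow_deg:.1f} deg) to the arm")  # update per frame
```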
Signal and Data Processing (JSDP), ISSN 2538-4201 / 2538-421X, DOI 10.52547/jsdp; 2018, Vol. 14, No. 4, pp. 117-128, published 2018-03-01.
Title: Improving the accuracy of the author name disambiguation by using clustering ensemble
Authors: Sayed Mohammad Mortazavi (mortazavi.s.m@outlook.com), Mohammad Hossein Nadimi Shahraki (nadimi@iaun.ac.ir), Mostafa Mosakhani (m_student1367@yahoo.com)
Abstract: Today, digital libraries are important academic resources containing millions of citations and essential bibliographic information such as titles, author names, and publication venues. From a knowledge management perspective, the ability to search desired content quickly and accurately is of great importance. The complexity and similarity within these resources cause many challenges and ambiguities; one of the most important is author name disambiguation, which constitutes an extensive research area. Although many effective methods using clustering techniques have been developed for author name disambiguation, their accuracy is not acceptable, and problems such as fragmentation and errors remain in their results, since there is no uniform citation standard and names appear in various combinations and numerous written and spoken patterns. In fact, experience has shown that, despite the concerns expressed above, using a single method to disambiguate names does not produce highly accurate results. In this paper, a new method is proposed to disambiguate author names in different formats and combinations with higher accuracy. The proposed solution carries out the disambiguation in two steps. In the first step, an agglomerative hierarchical clustering algorithm produces clusters using similarity functions and different thresholds. In the second step, the clusters produced in the first step are combined with a clustering ensemble technique to provide more accurate clusters with less fragmentation. The proposed method is experimentally evaluated on DBLP datasets using the K metric. The evaluation results show that the proposed method enhances the accuracy of disambiguation of author names in different formats.
Keywords: digital libraries; author name disambiguation; ambiguous names; clustering ensemble
Full text: http://jsdp.rcisp.ac.ir/article-1-524-en.pdf (DOI: 10.29252/jsdp.14.4.117)
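The two-step scheme can be sketched as follows: several agglomerative partitions at different thresholds vote into a co-association matrix (how often two records land in the same cluster), and that consensus matrix is clustered once more to obtain the final, less fragmented partition. Simple feature vectors stand in for the paper's citation similarity functions.

```python
# Clustering-ensemble sketch: base agglomerative partitions at several
# thresholds -> co-association matrix -> consensus clustering.
# Features and thresholds are toy assumptions.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(m, 0.3, (20, 2)) for m in (0, 2, 4)])  # 3 "authors"
n = len(X)

# step 1: base partitions from one dendrogram cut at different thresholds
Z = linkage(pdist(X), method="average")
thresholds = [0.5, 1.0, 1.5, 2.0]
co = np.zeros((n, n))
for t in thresholds:
    labels = fcluster(Z, t=t, criterion="distance")
    co += (labels[:, None] == labels[None, :])   # vote: same cluster?
co /= len(thresholds)                            # co-association in [0, 1]

# step 2: consensus clustering, treating (1 - co-association) as a distance
Zc = linkage(squareform(1 - co, checks=False), method="average")
final = fcluster(Zc, t=0.5, criterion="distance")
print(np.unique(final, return_counts=True))
```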
Signal and Data Processing (JSDP), ISSN 2538-4201 / 2538-421X, DOI 10.52547/jsdp; 2018, Vol. 14, No. 4, pp. 129-142, published 2018-03-01.
Title: A Fuzzy C-means Clustering Approach for Continuous Stress Detection during Driving
Authors: Sara Pourmohammadi (pourmohammadi@semnan.ac.ir), Ali Maleki (amaleki@semnan.ac.ir)
Abstract: Stress is one of the main causes of physical and mental disorders, leading to various types of diseases. In the last two decades, detecting stress levels during driving in order to avoid accidents has attracted much of researchers' attention. However, existing studies usually neglect the fact that the stress level during driving varies due to irregular events. In contrast to previous works, this paper demonstrates that assuming a fixed level of stress over a long period, e.g. while driving on a highway, is unreasonable. Based on this observation, a novel approach for continuous stress detection is proposed, built on fuzzy c-means clustering and cluster labeling by an expert. Fuzzy c-means clustering is used to specify the levels of stress instead of the classification and labeling methods used previously. Using background knowledge of the data together with the clustering results, the label of each cluster is obtained, and proper weights are then assigned to the labeled clusters. By combining the cluster membership values with the weights associated with each cluster's label, a stress score is obtained over short time intervals. The "stress in driving" dataset provides stressful conditions during real driving: the experiments were performed on a specific route of open roads, and the drives were limited to daily commutes. For each drive, electrocardiogram (ECG), electromyogram (EMG), foot and hand galvanic skin response (GSR), respiration, and marker signals were acquired from sensors worn by the driver. Clearly, the more physiological signals are used, the higher the computational cost; therefore, in this work, heart rate, EMG, foot GSR, and hand GSR were selected from the dataset. Six features, consisting of the mean heart rate, the mean EMG, the mean hand GSR, and the mean foot GSR, plus the mean absolute differences of the hand and foot GSR, are extracted from each window of the signals (100-second windows with 90% overlap, i.e. a 10-second step). The next step is clustering via the fuzzy c-means algorithm. In this study, the data are placed in 5 clusters, and according to the membership degree of each window, the input signals, and the background data from the dataset, an appropriate label is assigned by the expert to each cluster. The labels of these five clusters are "very low", "low", "medium", "high", and "very high" stress, ordered from least to most stressful. A base weight vector with one weight per stress level is defined accordingly, and the weights assigned to the clusters are a permutation of this base vector. After assigning the cluster weights, in each window the membership degree obtained by the fuzzy c-means method is multiplied by the weight assigned to that cluster, and the resulting values are accumulated over the 5 clusters. The calculated value is scaled to the range 0 to 100 in order to quantify the stress. For better presentation, a collection of 100 colors ranging from dark blue to dark red in the visible spectrum is defined using the "colormap" command in MATLAB; by mapping the calculated value to the range 0 to 100, one of these colors is chosen and associated with the stress value of the corresponding window.
In this paper, in addition to a qualitative assessment of the results, the correlation between the detected stress and subjective rating scores is used as a quantitative criterion. The results illustrate the effectiveness of the proposed method in improving both the precision and the accuracy of stress detection. In fact, the "stress in driving" dataset has imprecise labels, and the proposed systematic approach estimates stress continuously using background knowledge of the data. The results provide a valid, efficient measure of stress at each moment of driving without using a long time window, show the continuous stress from the beginning of the experiment to its end, and reflect individual differences and unexpected hazards during the experiment.
Keywords: continuous stress detection; stress during driving; fuzzy c-means clustering
Full text: http://jsdp.rcisp.ac.ir/article-1-383-en.pdf (DOI: 10.29252/jsdp.14.4.129)
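The scoring step can be sketched as follows: fuzzy c-means memberships per window are combined with per-cluster weights and scaled to 0-100. The small FCM implementation is a generic textbook version, and the weights are an assumed 0-4 ramp rather than the paper's calibrated values.

```python
# Fuzzy c-means membership -> weighted continuous stress score, as a sketch.
# The FCM below is generic; features, weights, and scaling are assumptions.
import numpy as np

def fcm(X, c=5, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means; returns (centers, membership matrix U of shape (n, c))."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))      # random fuzzy partition
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))              # standard FCM update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# toy feature windows: [mean HR, mean EMG, mean hand GSR, mean foot GSR, ...]
X = np.random.default_rng(3).standard_normal((300, 6))
centers, U = fcm(X)

# expert assigns each cluster a stress label; the cluster weights are a
# permutation of a base vector, here simply 0..4 for "very low".."very high"
weights = np.array([2, 0, 4, 1, 3])
score = 100 * (U @ weights) / weights.max()         # one score per window
print(score[:5].round(1))
```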
Signal and Data Processing (JSDP), ISSN 2538-4201 / 2538-421X, DOI 10.52547/jsdp; 2018, Vol. 14, No. 4, pp. 143-157, published 2018-03-01.
Title: A comparison of machine learning techniques for Persian Extractive Speech to Speech Summarization without Transcript
Authors: Hoda Sadat Jafari (hodas.jafari@gmail.com), Mohammadmehdi Homayounpour (homayoun@aut.ac.ir)
Abstract: In this paper, extractive speech summarization using different machine learning algorithms is investigated. Speech summarization deals with extracting important and salient segments from speech in order to access, search, extract, and browse speech files more easily and at lower cost. A new method for speech summarization without an automatic speech recognition (ASR) system is proposed; ASR systems usually have high error rates, especially in adverse acoustic environments and for low-resource languages. Our goal is to answer this question: is it possible to summarize Persian speech without ASR, using little or no training data? We propose a method that discovers salient parts directly from the speech signal using a semi-supervised algorithm. The proposed algorithm consists of three main stages: feature extraction, identifying key patterns, and selecting important sentences. First, the speech was segmented into sentences manually, in order to eliminate sentence segmentation errors and allow a fairer comparison between different summarization methods. Then some features were extracted from each sentence, such as its duration and whether it is the first or last sentence in the speech. Repetitive patterns between each pair of sentences are also discovered directly from the speech signal using the S-DTW algorithm, which finds repetitive patterns between two speech signals using MFCC features. From these repetitive patterns a similarity matrix over sentence pairs is built, so the similarity between each pair of sentences can be measured and redundant sentences can be eliminated from the summary without the need for an ASR system. After computing the similarity between each pair of speech segments and extracting features from each segment, various machine learning algorithms, including unsupervised (MMR, TextRank), supervised (SVM, naive Bayes), and semi-supervised (self-training, co-training) algorithms, are used to extract the salient parts. Experiments were conducted on read Persian news. The results show that, using the semi-supervised co-training method and appropriate features, the performance of the speech summarization system on the read Persian news corpus improves by about 3% compared to selecting the first sentences, and by 5% compared to selecting the longest sentences, when ROUGE-3 is used as the evaluation measure.
Keywords: extractive speech summarization; speech signal; key patterns; S-DTW algorithm; machine learning
Full text: http://jsdp.rcisp.ac.ir/article-1-491-en.pdf (DOI: 10.29252/jsdp.14.4.143)
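As a simplified stand-in for the S-DTW similarity step, the sketch below computes a plain DTW distance between the MFCC sequences of two sentences and fills a sentence-by-sentence similarity matrix. True S-DTW instead finds local repeating fragments rather than one global alignment, so this is only an approximation of the idea.

```python
# Sentence-pair similarity from MFCC sequences via plain DTW (a simplified
# stand-in for S-DTW). Sequence shapes and data are toy assumptions.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """a, b: (frames, mfcc_dims) sequences; returns a length-normalized cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

rng = np.random.default_rng(4)
# four "sentences" of 13-dim MFCC frames with varying lengths
sentences = [rng.standard_normal((int(rng.integers(40, 80)), 13)) for _ in range(4)]
sim = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        sim[i, j] = -dtw_distance(sentences[i], sentences[j])  # higher = more alike
print(sim.round(2))  # redundant sentences can then be pruned from the summary
```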