Combining an Ensemble Clustering Method and a New Similarity Criterion for Modeling the Hereditary Behavior of Diseases

Mojarad, Musa; Parvin, Hamid; Nejatiyan, Samad; Bagheri Fard, Karam allah

doi:10.52547/jsdp.18.2.97

Volume 18, Issue 2 (10-2021) JSDP 2021, 18(2): 97-114

Mojarad M, Parvin H, Nejatiyan S, Bagheri Fard K A. Combining an Ensemble Clustering Method and a New Similarity Criterion for Modeling the Hereditary Behavior of Diseases. JSDP 2021; 18(2): 97-114
URL: http://jsdp.rcisp.ac.ir/article-1-980-en.html

Musa Mojarad*, Hamid Parvin, Samad Nejatiyan, Karam allah Bagheri Fard

Department of Computer Engineering, Firoozabad Branch, Islamic Azad University

Abstract:

Background: There are many theories about the causes of hereditary diseases, but physicians believe that genetic and environmental factors together play an important role in the development and progression of these diseases, although the extent of their effect is not yet clear. To detect the genes involved in the development of diseases, it is necessary to determine the relationships between cells/tissues.
Objective: Inter-cell or inter-tissue communications indicate the hereditary relationships between patients. Detecting these communications helps to identify common parts of the body that are affected by various diseases. The interaction between different cells/tissues can be demonstrated through the gene expression between them. Sampling chromosomes yields useful information about the type of disease and how it is transmitted, and examining this information makes it possible to identify disorders that have led to substantial changes. In previous research, various clustering methods have been used to discover the links between diseases based on gene expression data. However, ensemble clustering approaches have not yet been used for this purpose.
Method: In this paper, intercellular and inter-tissue interactions in various diseases are recognized using the characteristics of the topological structure of the graph and an improved ensemble clustering method. The proposed clustering algorithm uses a consensus similarity function to measure the similarity between objects. The proposed method has two stages. In the first stage, several clustering models are combined to identify the initial relationships between cells or tissues, with the aim of producing better results than the individual algorithms. In the second stage, the similarity between cells or tissues in each cluster is calculated using a similarity criterion based on the topological structure of the graph. Finally, the maximum similarity between cells or tissues in each cluster is used to discover the relationships between diseases. In addition, an algorithm that reduces the uncertainty of objects by reallocating them to other clusters is evaluated as a way to enhance the quality of the final clusters.
Results: To evaluate the performance of the proposed method, several UCI datasets and the FANTOM5 dataset were used. On the FANTOM5 dataset, the proposed method achieves a silhouette value of 0.901 with 18 clusters for cells and 0.762 with 13 clusters for tissues.
Conclusion: The evaluations confirm the strength of the proposed clustering algorithm in terms of accuracy. Clustering the cells or tissues increased the accuracy and focus of the graph topological similarity criterion within the range of similar cells or tissues.
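
The two stages described in the Method can be illustrated with a minimal sketch. The abstract does not give the base clusterers, the consensus function, or the exact similarity formula, so everything here is an assumption: the first stage is approximated by a co-association (evidence accumulation) matrix with a simple threshold rule, and the graph-topology criterion is approximated by the Jaccard overlap of neighbour sets. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Return an n x n matrix holding the fraction of base clusterings
    in which objects i and j share a cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]


def consensus_clusters(matrix, threshold=0.5):
    """Assumed consensus rule: link objects whose co-association exceeds
    `threshold`, then label the connected components."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def jaccard_similarity(graph, u, v):
    """Neighbour-overlap similarity, an assumed stand-in for the
    graph-topology criterion mentioned in the abstract."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def most_similar_pair(graph, labels, cluster):
    """Pair of members of `cluster` with the highest topological similarity."""
    members = [node for node, lab in enumerate(labels) if lab == cluster]
    return max(
        combinations(members, 2),
        key=lambda pair: jaccard_similarity(graph, *pair),
        default=None,
    )


# Toy data (hypothetical): three base clusterings of six cells or tissues.
base_labelings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
# Toy interaction graph as adjacency sets (hypothetical).
interaction_graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}

consensus = consensus_clusters(co_association(base_labelings))
print(consensus)  # [0, 0, 0, 1, 1, 1]
for cluster in sorted(set(consensus)):
    print(cluster, most_similar_pair(interaction_graph, consensus, cluster))
```

In this sketch the pair with the maximum similarity inside each consensus cluster plays the role the abstract assigns to "the maximum similarity between cells or tissues in each cluster"; a real pipeline would replace both the threshold rule and the Jaccard measure with the paper's own definitions.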

Keywords: Intercellular communication, Improved clustering, Graph topological structure, FANTOM5 dataset
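
For readers unfamiliar with the silhouette values quoted in the Results, the following is a minimal sketch of the standard silhouette coefficient. It assumes Euclidean distance and small in-memory data; the abstract does not state which distance measure the authors used, and the function name and toy points are hypothetical.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    For each point, a is the mean distance to the other members of its own
    cluster and b is the smallest mean distance to any other cluster;
    the point's silhouette is (b - a) / max(a, b).
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # usual convention: a singleton contributes 0
        a = mean_dist(i, own)
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy example (hypothetical): two well-separated groups of two points each.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(silhouette_score(toy_points, [0, 0, 1, 1]))  # close to 1: compact, separated
print(silhouette_score(toy_points, [0, 1, 0, 1]))  # negative: clusters cut across groups
```

Values lie between -1 and 1, and values near 1 indicate compact, well-separated clusters, which is how the reported 0.901 and 0.762 should be read.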

Type of Study: Applicable | Subject: Paper
Received: 2019/02/26 | Accepted: 2020/08/18 | Published: 2021/10/8 | ePublished: 2021/10/8

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Signal and Data Processing
