Improving Near Real Time Data Warehouse Refreshment

Hazrati, Isa; Daneshpour, Negin

doi:10.29252/jsdp.15.2.31

Signal and Data Processing Journal. A scientific journal officially licensed by the Commission for Scientific Publications of the Ministry of Science, Research and Technology (MSRT). Publisher: Research Center for Development of Technologies

Volume 15, Issue 2 (9-2018) JSDP 2018, 15(2): 31-44

Hazrati I, Daneshpour N. Improving Near Real Time Data Warehouse Refreshment. JSDP 2018; 15(2): 31-44
URL: http://jsdp.rcisp.ac.ir/article-1-636-en.html

Isa Hazrati, Negin Daneshpour*

Shahid Rajaee Teacher Training University

Abstract:

A near-real-time data warehouse gives end users the information they need to make sound decisions. The fresher its data, the better the resulting decisions. To keep the data fresh and up to date, changes that occur at the sources must be added to the data warehouse with little delay, and to that end they must first be transformed into the data warehouse format. One well-known algorithm in this area is X-HYBRIDJOIN, which exploits the characteristics of real-world data to speed up the join operation and keeps the most frequently used partitions in main memory. The algorithm proposed in this paper joins a disk-based relation with an input data stream in order to enrich the stream. It uses a clustered index on the join attribute of the disk-based relation, and it assumes that the join attribute is unique throughout the relation. The proposed algorithm improves on X-HYBRIDJOIN in two stages. In the first stage, records of the source table that are accessed frequently are detected while the algorithm runs: a counter tracks the accesses to each record, and once the count exceeds a predetermined threshold, the record is considered frequently used and is placed in a hash table. This hash table keeps the frequently used records in main memory, and each stream tuple is looked up in it when it enters the join area. In the second stage, the method for choosing which partition to load into main memory is changed. A one-dimensional array is used to select, among all partitions of the source table, the one with the highest number of records to join. Using this array in each iteration always selects the best partition to load into memory.
To evaluate the proposed algorithm, several experiments were conducted. The experimental results show that the proposed algorithm achieves a higher service rate than the existing algorithms. The service rate is the number of records joined per unit of time, so a higher service rate indicates a more effective algorithm.
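
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `enrich_stream`, `partition_of`, `THRESHOLD`, and `PARTITION_SIZE`, the batch-wise reading of the stream, the integer join keys, and the in-memory dictionary standing in for the disk-based relation are all assumptions made for illustration. The sketch also assumes that every stream key exists in the relation and that the relation is not empty.

```python
from collections import Counter, defaultdict

THRESHOLD = 2        # hypothetical access-count threshold for "frequent" records
PARTITION_SIZE = 4   # hypothetical number of join keys per disk partition


def partition_of(key):
    """Map an integer join key to its partition (keys are stored in key order)."""
    return key // PARTITION_SIZE


def enrich_stream(stream, disk_relation, batch_size=4):
    """Join (key, payload) stream tuples with disk_relation, a dict {key: record}.

    Returns the joined tuples, the hash table of frequent records,
    and the number of simulated partition loads.
    """
    num_partitions = max(disk_relation) // PARTITION_SIZE + 1
    hot_records = {}                       # stage 1: frequent records kept in memory
    access_count = Counter()               # stage 1: one access counter per record
    pending = defaultdict(list)            # stream tuples waiting, grouped by partition
    pending_count = [0] * num_partitions   # stage 2: one-dimensional array of waiting counts
    output, disk_loads = [], 0
    stream = iter(stream)
    exhausted = False

    while not exhausted or any(pending_count):
        # Read the next batch; probe the hash table before queueing a tuple.
        for _ in range(batch_size):
            item = next(stream, None)
            if item is None:
                exhausted = True
                break
            key, payload = item
            if key in hot_records:
                output.append((key, payload, hot_records[key]))
            else:
                p = partition_of(key)
                pending[p].append((key, payload))
                pending_count[p] += 1

        if not any(pending_count):
            continue

        # Stage 2: load the partition with the most waiting stream tuples.
        best = max(range(num_partitions), key=pending_count.__getitem__)
        disk_loads += 1
        for key, payload in pending.pop(best):
            record = disk_relation[key]
            output.append((key, payload, record))
            # Stage 1: count the access and promote the record past the threshold.
            access_count[key] += 1
            if access_count[key] > THRESHOLD:
                hot_records[key] = record
        pending_count[best] = 0

    return output, hot_records, disk_loads
```

In this sketch, a key that recurs often in the stream is promoted to the hash table after its access count exceeds the threshold, and later tuples with that key are joined without another partition load. Dividing the number of joined tuples by the elapsed time would give the service rate as the abstract defines it.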

Keywords: Near Real Time Data Warehouse, Join, Data Stream, Decision Making

Type of Study: Research | Subject: Paper
Received: 2017/06/24 | Accepted: 2018/04/29 | Published: 2018/09/16 | ePublished: 2018/09/16

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.