Providing a Dynamic Technique for Answering Ad-hoc Continuous Aggregate

,; Safaei, Ali

doi:10.29252/jsdp.14.3.3

Signal and Data Processing Journal A scientific journal officially licensed by the Commission for Scientific Publications of the (MSRT). Publisher: Research Ceter for Developmen of Technologies

EN FA

Volume 14, Issue 3 (12-2017) JSDP 2017, 14(3): 3-22 | Back to browse issues page

‎ 10.29252/jsdp.14.3.3

Mendeley

Zotero

RefWorks

Safaei A. Providing a Dynamic Technique for Answering Ad-hoc Continuous Aggregate. JSDP 2017; 14 (3) :3-22
URL: http://jsdp.rcisp.ac.ir/article-1-486-en.html

Providing a Dynamic Technique for Answering Ad-hoc Continuous Aggregate

Ali Safaei ^*

Abstract: (6931 Views)

Data Streams are infinite, fast, time-stamp data elements which are received explosively. Generally, these elements need to be processed in an online, real-time way. So, algorithms to process data streams and answer queries on these streams are mostly one-pass. The execution of such algorithms has some challenges such as memory limitation, scheduling, and accuracy of answers. They will be more important and serious, chiefly if the queries are not predefined but Ad-hoc, and also should be executed after data stream tuples are gone.
Countinous aggregate queries are types of queries with some special characteristics making it possible to perform more specific, efficient qeury processing techniques, specifiaclly beneficient for ad-hoc ones.
In this paper, a dynamic efficient techinque is proposed for answering the ad-hoc continiues aggregate queries over data streams. The main idea of the proposed technique is to generate and handle an efficiet tree data structure as the synopse, in the form of Dynamic Prefix Aggregate Tree.
In general, the two following approaches can be used to calculate any function such as ; either implementation of an algorithm for the calculation of function f, or storing the answers of function f for all possible states. When the algorithm runtime is high, the second method strengthened by proper selection of indices can return a proper answer in a very short time (even ).
But the major problem of the second method is the total number of possible answers which can be very high and also can be out of the possible storage capacity and processing potential within a certain acceptable time period. For example, suppose that the cardinality of each of the parameters of is 10. In this case, the total number of possible states will be . As it is evident, the total number of states increases with the number of parameters and their cardinalities.When the total number of states is so great that generating answers with respect to consumed time and space is impossible, a more convenient, practical method should be employed. This more practical approach can be the storing of some of the answers (selectively) with respect to the following conditions:

Obtaining un-stored answers from the set of stored answers.
Higher probability of utilizing stored answers (i.e. higher probability of submitting requests from stored set).
Eliminating (not storing) null answers.

The same idea can be implemented for online and almost real time processing of queries, so that by receiving each tuple, all possible answers get obtained and stored. By doing so, in the time of need (when answering to an ad-hoc query) stored answers will be used instead of calculating each answer.
Accordingly, some answers are stored in a tree structure to be used at the right time. In this paper, in order to answer ad-hoc continuous aggregate queries over data streams, a method is proposed that uses a tree structure for storing the aggregate results. The important point in this method is that all steps of the construction, maintenance and using of the tree must be online. For these purposes, it is enough to keep all possible answers. But to apply an online construction and maintenance of tree, we must keep some answers, according to the inherent features of data streams. In this way, the main goal is to choose the answers possessing the most overlap with responses answers of received ad-hoc queries. The proposed method, creates the tree structure and maintains it dynamically to answer ad-hoc aggregate continuous queries over data streams.
For this purpose, queries at instant are modeled as in form of , where or (when , the aggregate over the whole sliding window is returned) and is the size of sliding window and (when , the aggregate over the whole is returned).
In order to increase the overlapping, a statistical task is performed on a dimensions of the received queries. In this way, dimensions are determined with the highest, lowest request. When , means that there is no request for this dimension. Therefore, we select and store the answers related to the dimension with highest request, and ignore those with the lowest. Obviously, these answers should be obtained and presented using stored answers.
As the request for dimensions may change, the tree structure must be dynamically constructed and maintenance that will be presented this dynamic structure in this paper. Experimental evaluattion of the proposed method shows that, using the proposed Dynamic Aggregate Tree for ansering countinous Ad-hoc aggregate queies is more cost-effective, in terms of response time and memory usage.

Keywords: Data Stream, Continuous ad-hoc aggregate queries, Dynamic Prefix Tree, Aggregate cell

Full-Text [PDF 5832 kb] (3721 Downloads)

Type of Study: Research | Subject: Paper
Received: 2016/02/9 | Accepted: 2016/10/29 | Published: 2018/01/29 | ePublished: 2018/01/29

References

1. [1] M. Chen, S. Mao and Y. Liu, "Big Data: A Survey," Mobile Networks and Applications, vol. 19, no. 2, pp. 171-209, April 2014. [DOI:10.1007/s11036-013-0489-0]

2. [2] C. L. Philip Chen and C. Y. Zhang, "Data-intensive applications, challenges, techniques and technologies: A survey on Big Data," Elsevier, vol. 275, pp. 314-347, January 2014.

3. [3] R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. Manku, C. Olston, J. Rosenstein and R. Varma, "Query Processing, Resource Management, and Approximation in a Data Stream Management System," in Proceedings of the 2003 CIDR Conference, Asilomar, 2003.

4. [4] R. Sabbagh Go and N. Daneshpour, "An Improved View Selection Algorithm in Data Warehouses by Finding Frequent Queries," JSDP, vol. 14, no. 1, pp. 29-40, 2017. [DOI:10.18869/acadpub.jsdp.14.1.29]

5. [5] M. Cho, J. Pei and K. Wang, "Answering ad hoc aggregate queries from data streams using prefix aggregate trees," Knowledge and Information Systems, vol. 12, no. 3, pp. 301 - 329, August 2007. [DOI:10.1007/s10115-006-0024-8]

6. [6] C. J. Hahn, S. G. Warren and R. Eastman, "Extended Edited Synoptic Cloud Reports from Ships and Land Stations Over the Globe," 1952-2009. [Online]. Available: http://cdiac.ornl.gov/epubs/ndp/ndp026c/ndp026c.html.

7. [7] K. Beyer and R. Ramakrishnan, "Bottom-up computation of sparse and Iceberg CUBE," in SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data, New York, 1999. [DOI:10.1145/304182.304214]

8. [8] K. A. Ross and D. Srivastava, "Fast Computation of Sparse Datacubes," in VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases, San Francisco, 1997.

9. [9] Y. Sismanis, A. Deligiannakis, N. Roussopoulos and Y. Kotidis, "Dwarf: Shrinking the PetaCube," in SIGMOD '02 Proceedings of the 2002 ACM SIGMOD international conference on Management of data, New York, 2002. [DOI:10.1145/564691.564745]

10. [10] W. Wang, J. Feng, H. Lu and J. Yu, "Condensed cube: an effective approach to reducing data cube size," in 18th International Conference on Data Engineering, San Fransisco,CA, 2002. [DOI:10.1109/ICDE.2002.994705]

11. [11] J. Li, D. Maier, K. Tufte, V. Papadimos and P. A. Tucker, "Semantics and evaluation techniques for window aggregates in data streams," in International Conference on Management of Data, New York, 2005. [DOI:10.1145/1066157.1066193]

12. [12] S. Madden, M. J. Franklin, J. M. Hellerstein and W. Hong, "TAG: a Tiny AGgregation Service for Ad- Hoc Sensor Networks," ACM SIGOPS Operating Systems Review - OSDI '02: Proceedings of the 5th symposium on Operating systems design and implementation, vol. 36, no. SI, pp. 131-146, 2002. [DOI:10.1145/1060289.1060303]

13. [13] J. Considine, F. Li, G. Kollios and J. Byers, "Approximate aggregation techniques for sensor databases," in ICDE '04 Proceedings of the 20th International Conference on Data Engineering, Washington, DC, USA, 2004. [DOI:10.1109/ICDE.2004.1320018]

14. [14] C. Intanagonwiwat, R. Govindan and D. Estrin, "Directed diffusion: a scalable and robust communication paradigm for sensor networks," in MobiCom '00 Proceedings of the 6th annual international conference on Mobile computing and networking, New York, NY, USA, 2000. [DOI:10.1145/345910.345920]

15. [15] Y. Yao and J. Gehrke, "The cougar approach to in-network query processing in sensor networks," ACM SIGMOD Record, vol. 31, no. 3, pp. 9-18, September 2002. [DOI:10.1145/601858.601861]

16. [16] Y. Yao and J. Gehrke, "Query Processing for Sensor Networks," Asilomar, 2003.

17. [17] J. Zhao, R. Govindan and D. Estrin, "Computing Aggregates for Monitoring Wireless Sensor Networks," in SNPA, 2003.

18. [18] S. Madden, M. J. Franklin, J. M. Hellerstein and W. Hong, "The Design of an Acquisitional Query Processor for Sensor Networks," in International Conference on Management of Data, New York, NY, USA, 2003. [DOI:10.1145/872757.872817]

19. [19] A. A. Safaei, M. Haghjoo and F. Abdi, "Semantic Cache Schema for Query Processing in Mobile Databases," in IEEE ICDIM'08, London, UK, 2008. [DOI:10.1109/ICDIM.2008.4746755]

20. [20] A. A. Safaei, K. Mehran, M. Haghjoo and K. Izadi, "Caching Intermediate Results for Multiple-Query Optimization," in 5th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA-2007), Jordan, 2007. [DOI:10.1109/AICCSA.2007.370914]

21. [21] P. Flajolet and G. N. Martin, "Probabilistic counting algorithms for data base applications," Journal of Computer and System Sciences, vol. 31, no. 2, pp. 182 - 209, 1985. [DOI:10.1016/0022-0000(85)90041-8]

22. [22] N. Alon, Y. Matias and M. Szegedy, "The space complexity of approximating the frequency moments," JOURNAL OF COMPUTER AND SYSTEM SCIENCES, vol. 58, no. 1, pp. 137-147, February 1999. [DOI:10.1006/jcss.1997.1545]

23. [23] Z. Bar-yossef, T. S. Jayram, R. Kumar, D. Sivakumar and L. Trevisan, "Counting Distinct Elements in a Data Stream," in Randomization and Approximation Techniques in Computer Science, Springer-Verlag, 2002, pp. 1-10. [DOI:10.1007/3-540-45726-7_1]

24. [24] G. Cormode, M. Datar, P. Indyk and S. Muthukrishnan, "Comparing data streams using hamming norms (how to zero in)," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 3, pp. 529-540, March 2003. [DOI:10.1109/TKDE.2003.1198388]

25. [25] P. Flajolet, "On Adaptive Sampling," Computing, vol. 43, no. 4, pp. 391 - 400, February 1990. [DOI:10.1007/BF02241657]

26. [26] S. Ganguly, M. Garofalakis and R. Rastogi, "Processing set expressions over continuous update streams," in MOD International Conference on Management of Data, New York, NY, USA, 2003. [DOI:10.1145/872757.872790]

27. [27] P. B. Gibbons and S. Tirthapura, "Estimating simple functions on the :union: of data streams," in SPAA '01 Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures, New York, NY, USA, 2001. [DOI:10.1145/378580.378687]

28. [28] F. Dehne, Q. Kong, A. Rau-Chaplin, H. Zaboli and R. Zhou, "Scalable real-time OLAP on cloud architectures," Parallel and Distributed Computing, Vols. 79-80, pp. 31-41, May 2015. [DOI:10.1016/j.jpdc.2014.08.006]

29. [29] A. A. Safaei, "Real-time Processing of Streaming Big Data," Journal of Real-Time Systems, August 2016.

30. [30] B. Babcock, S. Babu, M. Datar, R. Motwani and J. Widom, "Models and issues in data stream systems," in twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database, New York, 2002. [DOI:10.1145/543613.543615]

31. [31] A. Arasu, B. Babcock, S. Babu, J. Cieslewicz, K. Ito, R. Motwani, U. Srivastava and J. Widom, "Stream: The stanford data stream management system," Stanford InfoLab, 2004.

32. [32] D. J. DeWitt and J. Gray, "Parallel Database Systems: The Future of High Performance Database Processing," Communications of the ACM, vol. 36, no. 6, pp. 85-98, June 1992. [DOI:10.1145/129888.129894]

33. [33] A. A. Safaei and M. Haghjoo, "Dispatching of Stream Operators in Parallel Execution of Continuous Queries," Supercomputing, vol. 61, pp. 619-641, 2012. [DOI:10.1007/s11227-011-0621-5]

34. [34] A. A. Safaei and M. Haghjoo, "Parallel processing of continuous queries over data streams," Distributed and Parallel Databases, vol. 28, no. 2, pp. 93-118, 2010. [DOI:10.1007/s10619-010-7066-3]

35. [35] Safaei, A. A., and M. S. Haghjoo. "Parallel processing of data streams." J Comput Sci Eng 11.2 (2014): 11-29

Send email to the article author

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.