
SpeCH : a scalable framework for data placement of data-intensive services in geo-distributed clouds
- Author
- Ankita Atrey, Gregory Van Seghbroeck (UGent), Higinio Mora, Filip De Turck (UGent) and Bruno Volckaert (UGent)
- Abstract
- The advent of big data analytics and cloud computing technologies has resulted in widespread research on the data placement problem. Since data-intensive services require access to multiple datasets within each transaction, traditional schemes of uniformly partitioning the data into distributed nodes, as employed by many popular data stores like HDFS or Cassandra, may cause network congestion, thereby affecting system throughput. In this article, we propose a scalable and unified framework for placing the data of data-intensive services into geographically distributed clouds. The proposed framework introduces a new paradigm for partitioning a set of data items into geo-distributed clouds using Spectral Clustering on Hypergraphs, and is therefore called SpeCH. Scaling spectral methods to large workloads is challenging, since computing the spectra of the hypergraph Laplacian is a computationally intensive task. SpeCH provides two solutions to tackle this problem: (1) an algorithm, called SpectralApprox, that leverages randomized techniques for obtaining low-rank approximations of the hypergraph matrix with bounded guarantees, thereby significantly improving the efficiency of spectral clustering while also providing high-quality solutions in practice; (2) an algorithm, called SpectralDist, that exploits the highly parallel nature of the spectral clustering algorithm and uses Apache Spark to speed up the process while retaining the same quality guarantees as the exact algorithm. Additionally, being distributed in nature, SpectralDist enables SpeCH to perform data placement on workloads that require resources beyond the capacity of a single machine. Experiments on a real-world trace-based online social network dataset show that SpeCH is effective, efficient, and scalable. Empirically, SpectralApprox is comparable in efficacy to state-of-the-art techniques on the evaluated metrics, while being up to 10 times faster in execution time. On the other hand, although SpectralApprox is 7-8 times faster than SpectralDist, the latter is up to 50% better in terms of efficacy on the evaluated metrics.
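The core idea above, partitioning data items by spectral clustering on a hypergraph whose hyperedges are the sets of items accessed together in one transaction, can be illustrated with a small two-cloud sketch. This is not SpectralApprox or SpectralDist: it assumes the normalized hypergraph Laplacian of Zhou et al. (I minus Theta, with Theta = Dv^-1/2 H W De^-1 H^T Dv^-1/2), nonnegative hyperedge weights (unit by default), only two target clouds, and plain power iteration with deflation instead of randomized low-rank approximation or Apache Spark. The function names `normalized_adjacency` and `spectral_bipartition` and the sample workload are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Set, Tuple


def normalized_adjacency(
    n: int, hyperedges: Sequence[Set[int]], weights: Sequence[float]
) -> Tuple[List[List[float]], List[float]]:
    """Build Theta = Dv^-1/2 H W De^-1 H^T Dv^-1/2 and the vertex degrees.

    The normalized hypergraph Laplacian (assumed here) is I - Theta, so its
    smallest eigenvectors are the largest eigenvectors of Theta.
    """
    deg = [0.0] * n
    for e, w in zip(hyperedges, weights):
        for v in e:
            deg[v] += w  # d(v): total weight of hyperedges containing v
    if any(d == 0.0 for d in deg):
        raise ValueError("every data item must appear in some hyperedge")
    theta = [[0.0] * n for _ in range(n)]
    for e, w in zip(hyperedges, weights):
        if not e:
            continue
        share = w / len(e)  # w(e) / delta(e), where delta(e) = |e|
        for u in e:
            for v in e:
                theta[u][v] += share / math.sqrt(deg[u] * deg[v])
    return theta, deg


def spectral_bipartition(
    n: int,
    hyperedges: Sequence[Set[int]],
    weights: Optional[Sequence[float]] = None,
    iters: int = 500,
    seed: int = 0,
) -> List[int]:
    """Assign each data item to cloud 0 or 1 by the sign of the relaxed cut."""
    w = list(weights) if weights is not None else [1.0] * len(hyperedges)
    theta, deg = normalized_adjacency(n, hyperedges, w)

    # Theta's top eigenvector (eigenvalue 1) is proportional to sqrt(deg).
    top = [math.sqrt(d) for d in deg]
    top_norm = math.sqrt(sum(t * t for t in top))
    top = [t / top_norm for t in top]

    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Deflate: project out the trivial eigenvector.
        dot = sum(a * b for a, b in zip(x, top))
        x = [a - dot * b for a, b in zip(x, top)]
        # Power step. With nonnegative weights Theta is positive semidefinite,
        # so this converges to its second-largest eigenvector, i.e. the
        # eigenvector of the Laplacian's second-smallest eigenvalue.
        x = [sum(row[j] * x[j] for j in range(n)) for row in theta]
        norm = math.sqrt(sum(a * a for a in x))
        if norm == 0.0:
            raise ValueError("start vector collapsed; try another seed")
        x = [a / norm for a in x]

    # Undo the degree normalization, then split the items by sign.
    f = [a / math.sqrt(d) for a, d in zip(x, deg)]
    return [0 if v >= 0.0 else 1 for v in f]


if __name__ == "__main__":
    # Hypothetical workload: each hyperedge is the set of data items
    # accessed together by one transaction.
    transactions = [{0, 1, 2}, {0, 1}, {1, 2}, {3, 4, 5}, {3, 4}, {4, 5}, {2, 3}]
    print(spectral_bipartition(6, transactions))
```

On this toy workload, two tightly co-accessed groups joined by a single bridging transaction {2, 3}, the sketch should place items 0 to 2 in one cloud and items 3 to 5 in the other; which group receives label 0 depends on the sign the iteration settles on. Placing data into more than two clouds would typically use several of the smallest Laplacian eigenvectors followed by k-means, and computing those eigenvectors at scale is the expensive step the abstract identifies.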
- Keywords
- ALGORITHM, NETWORK, Data placement, Geo-distributed clouds, Location-based services, Online social networks, Scalability, Spectral clustering, Hypergraphs, Approximation, Distribution, Apache Spark
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-8628784
- MLA
- Atrey, Ankita, et al. “SpeCH : A Scalable Framework for Data Placement of Data-Intensive Services in Geo-Distributed Clouds.” JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, vol. 142, Academic Press Ltd- Elsevier Science Ltd, 2019, pp. 1–14, doi:10.1016/j.jnca.2019.05.012.
- APA
- Atrey, A., Van Seghbroeck, G., Mora, H., De Turck, F., & Volckaert, B. (2019). SpeCH : a scalable framework for data placement of data-intensive services in geo-distributed clouds. JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 142, 1–14. https://doi.org/10.1016/j.jnca.2019.05.012
- Chicago author-date
- Atrey, Ankita, Gregory Van Seghbroeck, Higinio Mora, Filip De Turck, and Bruno Volckaert. 2019. “SpeCH : A Scalable Framework for Data Placement of Data-Intensive Services in Geo-Distributed Clouds.” JOURNAL OF NETWORK AND COMPUTER APPLICATIONS 142: 1–14. https://doi.org/10.1016/j.jnca.2019.05.012.
- Chicago author-date (all authors)
- Atrey, Ankita, Gregory Van Seghbroeck, Higinio Mora, Filip De Turck, and Bruno Volckaert. 2019. “SpeCH : A Scalable Framework for Data Placement of Data-Intensive Services in Geo-Distributed Clouds.” JOURNAL OF NETWORK AND COMPUTER APPLICATIONS 142: 1–14. doi:10.1016/j.jnca.2019.05.012.
- Vancouver
- 1. Atrey A, Van Seghbroeck G, Mora H, De Turck F, Volckaert B. SpeCH : a scalable framework for data placement of data-intensive services in geo-distributed clouds. JOURNAL OF NETWORK AND COMPUTER APPLICATIONS. 2019;142:1–14.
- IEEE
- [1] A. Atrey, G. Van Seghbroeck, H. Mora, F. De Turck, and B. Volckaert, “SpeCH : a scalable framework for data placement of data-intensive services in geo-distributed clouds,” JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, vol. 142, pp. 1–14, 2019.
@article{8628784,
  abstract     = {{The advent of big data analytics and cloud computing technologies has resulted in widespread research on the data placement problem. Since data-intensive services require access to multiple datasets within each transaction, traditional schemes of uniformly partitioning the data into distributed nodes, as employed by many popular data stores like HDFS or Cassandra, may cause network congestion, thereby affecting system throughput. In this article, we propose a scalable and unified framework for placing the data of data-intensive services into geographically distributed clouds. The proposed framework introduces a new paradigm for partitioning a set of data items into geo-distributed clouds using Spectral Clustering on Hypergraphs, and is therefore called SpeCH. Scaling spectral methods to large workloads is challenging, since computing the spectra of the hypergraph Laplacian is a computationally intensive task. SpeCH provides two solutions to tackle this problem: (1) an algorithm, called SpectralApprox, that leverages randomized techniques for obtaining low-rank approximations of the hypergraph matrix with bounded guarantees, thereby significantly improving the efficiency of spectral clustering while also providing high-quality solutions in practice; (2) an algorithm, called SpectralDist, that exploits the highly parallel nature of the spectral clustering algorithm and uses Apache Spark to speed up the process while retaining the same quality guarantees as the exact algorithm. Additionally, being distributed in nature, SpectralDist enables SpeCH to perform data placement on workloads that require resources beyond the capacity of a single machine. Experiments on a real-world trace-based online social network dataset show that SpeCH is effective, efficient, and scalable. Empirically, SpectralApprox is comparable in efficacy to state-of-the-art techniques on the evaluated metrics, while being up to 10 times faster in execution time. On the other hand, although SpectralApprox is 7-8 times faster than SpectralDist, the latter is up to 50% better in terms of efficacy on the evaluated metrics.}},
  author       = {{Atrey, Ankita and Van Seghbroeck, Gregory and Mora, Higinio and De Turck, Filip and Volckaert, Bruno}},
  issn         = {{1084-8045}},
  journal      = {{JOURNAL OF NETWORK AND COMPUTER APPLICATIONS}},
  keywords     = {{ALGORITHM,NETWORK,Data placement,Geo-distributed clouds,Location-based services,Online social networks,Scalability,Spectral clustering,Hypergraphs,Approximation,Distribution,Apache Spark}},
  language     = {{eng}},
  pages        = {{1--14}},
  publisher    = {{Academic Press Ltd- Elsevier Science Ltd}},
  title        = {{SpeCH : a scalable framework for data placement of data-intensive services in geo-distributed clouds}},
  url          = {{http://doi.org/10.1016/j.jnca.2019.05.012}},
  volume       = {{142}},
  year         = {{2019}},
}