
SpeCH : a scalable framework for data placement of data-intensive services in geo-distributed clouds

Abstract
The advent of big data analytics and cloud computing technologies has resulted in widespread research on the data placement problem. Since data-intensive services require access to multiple datasets within each transaction, traditional schemes that uniformly partition the data across distributed nodes, as employed by many popular data stores like HDFS or Cassandra, may cause network congestion, thereby affecting system throughput. In this article, we propose a scalable and unified framework for placing the data of data-intensive services in geographically distributed clouds. The proposed framework introduces a new paradigm for partitioning a set of data items across geo-distributed clouds using Spectral Clustering on Hypergraphs, and is therefore called SpeCH. Scaling spectral methods to large workloads is challenging, since computing the spectra of the hypergraph Laplacian is a computationally intensive task. SpeCH provides two solutions to tackle this problem: (1) an algorithm, called SpectralApprox, that leverages randomized techniques for obtaining low-rank approximations of the hypergraph matrix with bounded guarantees, thereby significantly improving the efficiency of spectral clustering while also providing high-quality solutions in practice; (2) an algorithm, called SpectralDist, that exploits the highly parallel nature of the spectral clustering algorithm and uses Apache Spark to speed up the process while retaining the same quality guarantees as the exact algorithm. Additionally, being distributed in nature, SpectralDist enables SpeCH to perform data placement on workloads that require resources beyond the capacity of a single machine. Experiments on a real-world trace-based online social network dataset show that SpeCH is effective, efficient, and scalable. Empirically, SpectralApprox is comparable in efficacy on the evaluated metrics, while being up to 10 times faster in execution time when compared to state-of-the-art techniques. On the other hand, though SpectralApprox is 7-8 times faster than SpectralDist, the latter is up to 50% better in terms of efficacy on the evaluated metrics.
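The general idea of spectral clustering on a hypergraph with a randomized low-rank step can be sketched as follows. This is an illustrative sketch only and not the authors' SpectralApprox or SpectralDist: it assumes the commonly used normalized hypergraph Laplacian, a generic randomized range finder, and a plain k-means step. The function names, the toy incidence matrix `H`, and the weights `w` are all hypothetical.

```python
import numpy as np

def hypergraph_embedding(H, w, k, oversample=5, n_iter=2, seed=0):
    """Approximate the top-k eigenvectors of Dv^-1/2 H W De^-1 H^T Dv^-1/2.

    H is an (items x hyperedges) 0/1 incidence matrix and w holds hyperedge
    weights. Assumes every item belongs to at least one hyperedge.
    """
    H = np.asarray(H, dtype=float)
    w = np.asarray(w, dtype=float)
    dv = H @ w            # weighted vertex degrees
    de = H.sum(axis=0)    # hyperedge sizes
    # A @ A.T is the normalized hypergraph adjacency, so the left singular
    # vectors of A are its eigenvectors.
    A = (H / np.sqrt(dv)[:, None]) * np.sqrt(w / de)[None, :]
    rng = np.random.default_rng(seed)
    r = min(k + oversample, min(A.shape))
    # Randomized range finder with a few power iterations.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((A.shape[1], r)))
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    U_small, _, _ = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ U_small[:, :k]
    # Row-normalize so that items are compared by direction only.
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy workload: 8 data items, 7 requests (hyperedges). Items 0-3 and 4-7 are
# mostly co-accessed within their own group; one light request links 3 and 4.
edges = [(0, 1, 2), (1, 2, 3), (0, 3), (4, 5, 6), (5, 6, 7), (4, 7), (3, 4)]
w = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.1])
H = np.zeros((8, len(edges)))
for e, members in enumerate(edges):
    H[list(members), e] = 1.0

labels = kmeans(hypergraph_embedding(H, w, k=2), k=2)
print(labels)  # one partition id per data item
```

In this toy setting each partition id would stand for one cloud location, so items that are frequently requested together end up in the same place. Capacity limits, replication, and inter-cloud latency are not modeled here.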
Keywords
ALGORITHM, NETWORK, Data placement, Geo-distributed clouds, Location-based services, Online social networks, Scalability, Spectral clustering, Hypergraphs, Approximation, Distribution, Apache Spark

Downloads

  • (...).pdf | full text | UGent only | PDF | 1.96 MB
  • 7469 i.pdf | full text | open access | PDF | 478.85 KB

Citation

Please use this url to cite or link to this publication:

MLA
Atrey, Ankita, et al. “SpeCH : A Scalable Framework for Data Placement of Data-Intensive Services in Geo-Distributed Clouds.” JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, vol. 142, Academic Press Ltd- Elsevier Science Ltd, 2019, pp. 1–14, doi:10.1016/j.jnca.2019.05.012.
APA
Atrey, A., Van Seghbroeck, G., Mora, H., De Turck, F., & Volckaert, B. (2019). SpeCH : a scalable framework for data placement of data-intensive services in geo-distributed clouds. JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 142, 1–14. https://doi.org/10.1016/j.jnca.2019.05.012
Chicago author-date
Atrey, Ankita, Gregory Van Seghbroeck, Higinio Mora, Filip De Turck, and Bruno Volckaert. 2019. “SpeCH : A Scalable Framework for Data Placement of Data-Intensive Services in Geo-Distributed Clouds.” JOURNAL OF NETWORK AND COMPUTER APPLICATIONS 142: 1–14. https://doi.org/10.1016/j.jnca.2019.05.012.
Chicago author-date (all authors)
Atrey, Ankita, Gregory Van Seghbroeck, Higinio Mora, Filip De Turck, and Bruno Volckaert. 2019. “SpeCH : A Scalable Framework for Data Placement of Data-Intensive Services in Geo-Distributed Clouds.” JOURNAL OF NETWORK AND COMPUTER APPLICATIONS 142: 1–14. doi:10.1016/j.jnca.2019.05.012.
Vancouver
1. Atrey A, Van Seghbroeck G, Mora H, De Turck F, Volckaert B. SpeCH : a scalable framework for data placement of data-intensive services in geo-distributed clouds. JOURNAL OF NETWORK AND COMPUTER APPLICATIONS. 2019;142:1–14.
IEEE
[1] A. Atrey, G. Van Seghbroeck, H. Mora, F. De Turck, and B. Volckaert, “SpeCH : a scalable framework for data placement of data-intensive services in geo-distributed clouds,” JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, vol. 142, pp. 1–14, 2019.
@article{8628784,
  abstract     = {{The advent of big data analytics and cloud computing technologies has resulted in widespread research on the data placement problem. Since data-intensive services require access to multiple datasets within each transaction, traditional schemes that uniformly partition the data across distributed nodes, as employed by many popular data stores like HDFS or Cassandra, may cause network congestion, thereby affecting system throughput. In this article, we propose a scalable and unified framework for placing the data of data-intensive services in geographically distributed clouds. The proposed framework introduces a new paradigm for partitioning a set of data items across geo-distributed clouds using Spectral Clustering on Hypergraphs, and is therefore called SpeCH. Scaling spectral methods to large workloads is challenging, since computing the spectra of the hypergraph Laplacian is a computationally intensive task. SpeCH provides two solutions to tackle this problem: (1) an algorithm, called SpectralApprox, that leverages randomized techniques for obtaining low-rank approximations of the hypergraph matrix with bounded guarantees, thereby significantly improving the efficiency of spectral clustering while also providing high-quality solutions in practice; (2) an algorithm, called SpectralDist, that exploits the highly parallel nature of the spectral clustering algorithm and uses Apache Spark to speed up the process while retaining the same quality guarantees as the exact algorithm. Additionally, being distributed in nature, SpectralDist enables SpeCH to perform data placement on workloads that require resources beyond the capacity of a single machine. Experiments on a real-world trace-based online social network dataset show that SpeCH is effective, efficient, and scalable. Empirically, SpectralApprox is comparable in efficacy on the evaluated metrics, while being up to 10 times faster in execution time when compared to state-of-the-art techniques. On the other hand, though SpectralApprox is 7-8 times faster than SpectralDist, the latter is up to 50% better in terms of efficacy on the evaluated metrics.}},
  author       = {{Atrey, Ankita and Van Seghbroeck, Gregory and Mora, Higinio and De Turck, Filip and Volckaert, Bruno}},
  issn         = {{1084-8045}},
  journal      = {{JOURNAL OF NETWORK AND COMPUTER APPLICATIONS}},
  keywords     = {{ALGORITHM,NETWORK,Data placement,Geo-distributed clouds,Location-based services,Online social networks,Scalability,Spectral clustering,Hypergraphs,Approximation,Distribution,Apache Spark}},
  language     = {{eng}},
  pages        = {{1--14}},
  publisher    = {{Academic Press Ltd- Elsevier Science Ltd}},
  title        = {{SpeCH : a scalable framework for data placement of data-intensive services in geo-distributed clouds}},
  url          = {{http://doi.org/10.1016/j.jnca.2019.05.012}},
  volume       = {{142}},
  year         = {{2019}},
}
