Optimized R functions for analysis of ecological community data using the R virtual laboratory (RvLab)

Abstract
Background: Parallel data manipulation using R has previously been addressed by members of the R community; however, most of these studies produce ad hoc solutions that are not readily available to the average R user. Our targeted users, ranging from expert ecologists and microbiologists to computational biologists, often experience difficulties in finding optimal ways to exploit the full capacity of their computational resources. In addition, improving the performance of commonly used R scripts becomes increasingly difficult, especially with large datasets. Furthermore, the implementations described here can be of significant interest to expert bioinformaticians or R developers. Therefore, our goals can be summarized as: (i) description of a complete methodology for the analysis of large datasets by combining capabilities of diverse R packages, and (ii) presentation of their application through a virtual R laboratory (RvLab) that makes execution of complex functions and visualization of results easy and readily available to the end-user.
New information: In this paper, the novelty stems from implementations of parallel methodologies which rely on the processing of data on different levels of abstraction and the availability of these processes through an integrated portal. Parallel implementation R packages, such as the pbdMPI (Programming with Big Data – Interface to MPI) package, are used to implement Single Program Multiple Data (SPMD) parallelization on primitive mathematical operations, allowing for interplay with functions of the vegan package. The dplyr and RPostgreSQL R packages are further integrated, offering connections to dataframe-like objects (databases) as secondary storage solutions whenever memory demands exceed available RAM resources.
The RvLab is running on a PC cluster, using R version 3.1.2 (2014-10-31) on an x86_64-pc-linux-gnu (64-bit) platform, and offers an intuitive virtual environment interface enabling users to perform analysis of ecological and microbial communities based on optimized vegan functions. A beta version of the RvLab is available after registration at: https://portal.lifewatchgreece.eu/

Downloads

  • BDJ article 8357.pdf (full text, open access, PDF, 1.07 MB)

Citation

Please use this url to cite or link to this publication:

Chicago
Varsos, Constantinos, Theodore Patkos, Anastasis Oulas, Christina Pavloudi, Alexandros Gougousis, Umer Zeeshan Ijaz, Irene Filiopoulou, et al. 2016. “Optimized R Functions for Analysis of Ecological Community Data Using the R Virtual Laboratory (RvLab).” Biodiversity Data Journal 4.
APA
Varsos, C., Patkos, T., Oulas, A., Pavloudi, C., Gougousis, A., Ijaz, U. Z., Filiopoulou, I., et al. (2016). Optimized R functions for analysis of ecological community data using the R virtual laboratory (RvLab). Biodiversity Data Journal, 4.
Vancouver
1. Varsos C, Patkos T, Oulas A, Pavloudi C, Gougousis A, Ijaz UZ, et al. Optimized R functions for analysis of ecological community data using the R virtual laboratory (RvLab). Biodiversity Data Journal. 2016;4.
MLA
Varsos, Constantinos, Theodore Patkos, Anastasis Oulas, et al. “Optimized R Functions for Analysis of Ecological Community Data Using the R Virtual Laboratory (RvLab).” Biodiversity Data Journal 4 (2016): n. pag. Print.
@article{8507220,
  abstract     = {Background: Parallel data manipulation using R has previously been addressed by members of the R community; however, most of these studies produce ad hoc solutions that are not readily available to the average R user. Our targeted users, ranging from expert ecologists and microbiologists to computational biologists, often experience difficulties in finding optimal ways to exploit the full capacity of their computational resources. In addition, improving the performance of commonly used R scripts becomes increasingly difficult, especially with large datasets. Furthermore, the implementations described here can be of significant interest to expert bioinformaticians or R developers. Therefore, our goals can be summarized as: (i) description of a complete methodology for the analysis of large datasets by combining capabilities of diverse R packages, and (ii) presentation of their application through a virtual R laboratory (RvLab) that makes execution of complex functions and visualization of results easy and readily available to the end-user.
New information: In this paper, the novelty stems from implementations of parallel methodologies which rely on the processing of data on different levels of abstraction and the availability of these processes through an integrated portal. Parallel implementation R packages, such as the pbdMPI (Programming with Big Data -- Interface to MPI) package, are used to implement Single Program Multiple Data (SPMD) parallelization on primitive mathematical operations, allowing for interplay with functions of the vegan package. The dplyr and RPostgreSQL R packages are further integrated, offering connections to dataframe-like objects (databases) as secondary storage solutions whenever memory demands exceed available RAM resources.
The RvLab is running on a PC cluster, using R version 3.1.2 (2014-10-31) on an x86\_64-pc-linux-gnu (64-bit) platform, and offers an intuitive virtual environment interface enabling users to perform analysis of ecological and microbial communities based on optimized vegan functions.
A beta version of the RvLab is available after registration at: https://portal.lifewatchgreece.eu/},
  articleno    = {e8357},
  author       = {Varsos, Constantinos and Patkos, Theodore and Oulas, Anastasis and Pavloudi, Christina and Gougousis, Alexandros and Ijaz, Umer Zeeshan and Filiopoulou, Irene and Pattakos, Nikolaos and Vanden Berghe, Edward and Fern{\'a}ndez-Guerra, Antonio and Faulwetter, Sarah and Chatzinikolaou, Eva and Pafilis, Evangelos and Bekiari, Chryssoula and Doerr, Martin and Arvanitidis, Christos},
  issn         = {1314-2828},
  journal      = {BIODIVERSITY DATA JOURNAL},
  language     = {eng},
  pages        = {28},
  title        = {Optimized R functions for analysis of ecological community data using the R virtual laboratory (RvLab)},
  url          = {http://dx.doi.org/10.3897/bdj.4.e8357},
  volume       = {4},
  year         = {2016},
}
