R package source code: https://gitlab.gwdg.de/mpievolbio-it/repeatr

R package pages: https://mpievolbio-it.pages.gwdg.de/repeatr/

R package issues: https://gitlab.gwdg.de/mpievolbio-it/repeatr/issues

repeatR - Description

repeatR is a fork of, and add-on to, the repeat alignment algorithm introduced by Vara C et al. (2019) and originally implemented in R by Luca Ferretti at https://github.com/lucaferretti/RepeatDistance.

The algorithm is described in more detail in Vara C et al. (2019) and in Ferretti L et al. (2018).

This R package aims to make the original R function easier to use by adding wrapper scripts and bindings to Biostrings.

Installation

See also the R package pages: https://mpievolbio-it.pages.gwdg.de/repeatr/

R specific installation prerequisites

install packages from CRAN

In most cases, you first need to install the following system-wide packages so that the R dependencies can be compiled.

Ubuntu/Debian

sudo apt-get install libcurl4-openssl-dev libssl-dev libxml2-dev

CentOS

sudo yum install libcurl-devel openssl-devel libxml2-devel

Then, from within R:

install.packages("devtools")
install.packages("testthat")
install.packages("ape")
install.packages("adegenet")
install.packages("ade4")
install.packages("pegas")

install packages from Bioconductor

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("Biostrings")

install MSA2dist

library(devtools)
install_gitlab("mpievolbio-it/msa2dist", host = "https://gitlab.gwdg.de",
build_vignettes = TRUE, dependencies = TRUE)
#install_github("kullrich/MSA2dist", build_vignettes = TRUE, dependencies = TRUE)

install repeatR

library(devtools)
install_gitlab("mpievolbio-it/repeatr", host = "https://gitlab.gwdg.de",
build_vignettes = FALSE, dependencies = FALSE)
#install_github("kullrich/repeatr", build_vignettes = FALSE, dependencies = FALSE)
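After working through the installation steps above, it can help to confirm that every package is actually available before moving on. The following is a minimal sketch using only base R; the variable names (`required_pkgs`, `is_installed`, `missing_pkgs`) are illustrative and not part of repeatR, and the package list is simply the set of names mentioned in this README.

```r
# Packages named in the installation steps of this README
required_pkgs <- c("devtools", "testthat", "ape", "adegenet", "ade4", "pegas",
                   "Biostrings", "MSA2dist", "repeatR")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
is_installed <- vapply(required_pkgs, requireNamespace, logical(1),
                       quietly = TRUE)

# Collect the names of packages that could not be loaded
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are installed.")
}
```

Any package reported as missing can then be installed with the corresponding command from the sections above.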

Vignettes

These vignettes introduce repeatR

Quick-guide

library(repeatR)
## load example sequence data
data("mousePRDM9", package="repeatR")
## define repeat pattern
myRepPattern<-"PY"
## define repeat length
myRepLength<-84
## select 20 random samples
mousePRDM9.random <- sample(mousePRDM9, 20)
## split original CDS file into repeats
mousePRDM9.random.split<-repeatR::splitRepByPattern(mousePRDM9.random,
    myRepPattern, myRepLength)
## get distance for all-vs-all comparison excluding highly variable sites
dist.mat.hamming.exclude.pos<-repeatR::ListPairwiseDistance(
    x=mousePRDM9.random.split$cds,
    dist.type="hamming",
    wmut=1,
    windel=3.5,
    wslippage=1.75,
    exclude.pos=c(37:39,46:48,55:57),
    post.include=FALSE,
    output.dist="distance")
## calculate bionj tree from resulting distances and write tree in newick format
mousePRDM9.random.bionj<-ape::bionj(as.dist(dist.mat.hamming.exclude.pos))
ape::write.tree(mousePRDM9.random.bionj)
plot(mousePRDM9.random.bionj)
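The exclude.pos argument in the quick-guide is given as nucleotide positions within a repeat unit. The triplet ranges (37:39, 46:48, 55:57) suggest that whole codons are being excluded, although this README does not say so explicitly. Under that assumption, and assuming positions are counted from the first base of an in-frame repeat unit, the vector can be built from codon numbers instead of being typed by hand. The helper `codons_to_positions` below is hypothetical and not part of repeatR.

```r
# Hypothetical helper (not part of repeatR): map 1-based codon numbers to the
# nucleotide positions they cover, assuming the repeat unit starts in frame.
codons_to_positions <- function(codons) {
  # Codon k covers nucleotide positions 3k-2, 3k-1 and 3k
  as.vector(vapply(codons, function(k) (3L * k - 2L):(3L * k), integer(3)))
}

# A repeat length of 84 nucleotides corresponds to 84 / 3 = 28 codons
myRepLength <- 84
stopifnot(myRepLength %% 3 == 0)

# Codons 13, 16 and 19 reproduce the positions used in the quick-guide
exclude.pos <- codons_to_positions(c(13L, 16L, 19L))
identical(exclude.pos, c(37:39, 46:48, 55:57))  # TRUE
```

The resulting vector can be passed as the exclude.pos argument in the quick-guide call, which makes it easier to change the set of excluded sites without recomputing nucleotide ranges.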

Todo

  • write Vignette
  • add co-phylo-plot

License

GPL-3 (see LICENSE)

Contributing Code

If you would like to contribute to repeatR, please file an issue so that we can establish a statement of need, avoid redundant work, and track progress on your contribution.

Before you open a pull request, always file an issue and make sure that someone from the repeatR developer team agrees that it is a problem and is happy with your basic proposal for fixing it.

Once an issue has been filed and we have identified how best to align your contribution with package development as a whole, fork the main repo, create a feature branch from master, commit and push your changes to your fork, and submit a pull request for repeatr:master.

By contributing to this project, you agree to abide by the Code of Conduct terms.

Bug reports

Please report any errors or requests regarding repeatR to Kristian Ullrich

or use the issue tracker at https://gitlab.gwdg.de/mpievolbio-it/repeatr/issues

Code of Conduct - Participation guidelines

This repository adheres to the Contributor Covenant code of conduct in any interactions you have within this project (see Code of Conduct).

See also the Max Planck Society policy against sexualized discrimination, harassment and violence (Code-of-Conduct).

By contributing to this project, you agree to abide by its terms.

References

Vara C., Capilla L., Ferretti L., Ledda A., Sanchez-Guillen RA., Gabriel SI., Albert-Lizandra G., Florit-Sabater B., Bello-Rodriguez J., Ventura J., Searle JB., Mathias ML., and Ruiz-Herrera A. (2019). PRDM9 Diversity at Fine Geographical Scale Reveals Contrasting Evolutionary Patterns and Functional Constraints in Natural Populations of House Mice. Molecular Biology and Evolution, 36(8), 1686-1700. https://doi.org/10.1093/molbev/msz091

Ferretti L., Ruiz-Herrera A., Ledda A. (2018). Genetic distance between complex repeats. https://github.com/lucaferretti/RepeatDistance/blob/master/RepeatAlignment.pdf