MSA2dist calculates pairwise distances between all sequences of a DNAStringSet or an AAStringSet using a custom score matrix, and conducts codon-based analyses.
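
To illustrate what a distance "using a custom score matrix" can mean, the base-R sketch below averages the score of each aligned residue pair over the compared positions and skips gap columns. The helper `score_dist()` and the matrix `toy_score` are hypothetical and not part of MSA2dist. How MSA2dist itself treats gaps and normalises the scores is an assumption here and may differ, so treat this as a conceptual sketch only.

```r
## Hypothetical sketch (not MSA2dist code): pairwise distance as the mean
## score over compared sites; gap handling and normalisation are assumptions.
score_dist <- function(seqs, score, gap = "-") {
  chars <- strsplit(seqs, "")
  n <- length(seqs)
  d <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      a <- chars[[i]]
      b <- chars[[j]]
      keep <- a != gap & b != gap            # compare only ungapped columns
      ## look up scores by residue names via character-matrix indexing
      d[i, j] <- mean(score[cbind(a[keep], b[keep])])
    }
  }
  d
}

## toy symmetric score matrix with zeros on the diagonal
toy_score <- matrix(c(0, 1, 2,
                      1, 0, 3,
                      2, 3, 0), nrow = 3,
                    dimnames = list(c("A", "C", "D"), c("A", "C", "D")))

seqs <- c(s1 = "ACDA", s2 = "ACCA", s3 = "A-DD")
score_dist(seqs, toy_score)
```

Swapping `toy_score` for a real substitution or distance matrix (for example Grantham's) changes only the lookup table, not the procedure.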

Installation

See also the R package pages at https://mpievolbio-it.pages.gwdg.de/msa2dist/

R-specific installation prerequisites

install packages from CRAN

In most cases, you first need to install the following system-wide packages to compile the R dependencies.

Ubuntu/Debian

sudo apt-get install libcurl4-openssl-dev libssl-dev libxml2-dev libglu1-mesa-dev libgit2-dev
#pkgdown dependencies - pkgdown is used to build R package pages
#sudo apt-get install libssh2-1-dev libfontconfig1-dev libharfbuzz-dev libfribidi-dev

CentOS

sudo yum install libcurl-devel openssl-devel libxml2-devel mesa-libGLU-devel libgit2-devel
#pkgdown dependencies - pkgdown is used to build R package pages
#sudo yum install libssh2-devel fontconfig-devel harfbuzz-devel fribidi-devel

Then, in R, install the CRAN dependencies:

install.packages(c("Rcpp", "RcppThread", "devtools", "testthat", "ape",
                   "doParallel", "dplyr", "foreach", "rlang", "seqinr",
                   "tibble", "tidyr", "stringi"))
## ggplot2 is additionally used by the codon plot example in the Quick-guide
install.packages("ggplot2")
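
If some of these packages are already installed, a small helper can restrict installation to the missing ones. This is a base-R sketch; `missing_pkgs()` is a hypothetical helper and not part of MSA2dist.

```r
## Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_pkgs <- function(pkgs) {
  ## system.file() returns "" when a package is not installed
  installed <- nzchar(vapply(pkgs, function(p) system.file(package = p),
                             character(1)))
  pkgs[!installed]
}

cran_pkgs <- c("Rcpp", "RcppThread", "devtools", "testthat", "ape",
               "doParallel", "dplyr", "foreach", "rlang", "seqinr",
               "tibble", "tidyr", "stringi")

## install only what is missing (uncomment to run)
# todo <- missing_pkgs(cran_pkgs)
# if (length(todo) > 0) install.packages(todo)
```

Using `system.file()` avoids loading each namespace just to check whether it is present.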

install packages from Bioconductor

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(c("Biostrings", "GenomicRanges"))

install MSA2dist

library(devtools)
install_gitlab("mpievolbio-it/msa2dist", host = "https://gitlab.gwdg.de",
               build_vignettes = TRUE, dependencies = TRUE)
## alternatively, install from GitHub:
#install_github("kullrich/MSA2dist", build_vignettes = TRUE, dependencies = TRUE)

Quick-guide

library(MSA2dist)
## load example sequence data
data("hiv", package="MSA2dist")

## calculate pairwise AA distances based on Grantham's distance
aa.dist <- hiv |> cds2aa() |> aastring2dist(score=granthamMatrix())
head(aa.dist$distSTRING)

## create and plot bionj tree
aa.dist.bionj <- ape::bionj(as.dist(aa.dist$distSTRING))
plot(aa.dist.bionj)

## calculate pairwise DNA distances based on IUPAC distance
dna.dist <- hiv |> dnastring2dist(model="IUPAC")
head(dna.dist$distSTRING)

## create bionj tree
dna.dist.bionj <- ape::bionj(as.dist(dna.dist$distSTRING))

## create the association matrix (both trees share the same tip labels)
association <- cbind(aa.dist.bionj$tip.label, aa.dist.bionj$tip.label)

## cophyloplot
ape::cophyloplot(aa.dist.bionj,
                 dna.dist.bionj,
                 assoc=association,
                 length.line=4,
                 space=28,
                 gap=3,
                 rotate=TRUE)

## calculate pairwise DNA distances based on K80 distance
dna.dist.K80 <- hiv |> dnastring2dist(model="K80")
head(dna.dist.K80$distSTRING)
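
## Illustrative base-R sketch (not MSA2dist code) of the Kimura (1980)
## two-parameter distance: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q),
## where P and Q are the proportions of transitions and transversions.
## Assumes two equal-length, ungapped, unambiguous DNA strings; MSA2dist's
## own handling of gaps and ambiguity codes may differ.
k80_dist <- function(x, y) {
  a <- strsplit(toupper(x), "")[[1]]
  b <- strsplit(toupper(y), "")[[1]]
  purine <- c("A", "G")
  mismatch <- a != b
  ## a transition keeps the purine/pyrimidine class, a transversion changes it
  ts <- mismatch & ((a %in% purine) == (b %in% purine))
  P <- sum(ts) / length(a)
  Q <- sum(mismatch & !ts) / length(a)
  -0.5 * log(1 - 2 * P - Q) - 0.25 * log(1 - 2 * Q)
}
k80_dist("ACGTACGTAC", "GCGTACATAC")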

## calculate pairwise AA distances based on the AAMatrix created with the getAAMatrix() function from the alakazam package
data("AAMatrix", package="MSA2dist")
aa.dist <- hiv |> cds2aa() |> aastring2dist(score=AAMatrix)

## example: calculate all pairwise Ka/Ks values given an MSA
hiv_kaks_Li <- hiv |> dnastring2kaks(model="Li")
head(hiv_kaks_Li)

hiv_kaks_NG86 <- hiv |> dnastring2kaks(model="NG86")
head(hiv_kaks_NG86)
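
## Illustrative base-R sketch (not MSA2dist code) of the Jukes-Cantor
## correction used in the Nei-Gojobori (1986) method: the proportions of
## nonsynonymous (pN) and synonymous (pS) differences per site are turned
## into distances with d = -3/4 ln(1 - 4/3 p).
## pN and pS below are made-up example values, not results for hiv.
jc_correct <- function(p) -3/4 * log(1 - 4/3 * p)
pN <- 0.05
pS <- 0.20
dN <- jc_correct(pN)
dS <- jc_correct(pS)
dN / dS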

## using KaKs_Calculator2 models

hiv_kaks_YN <- hiv |> dnastring2kaks(model="YN")
head(hiv_kaks_YN)

hiv_kaks_MYN <- hiv |> dnastring2kaks(model="MYN")
head(hiv_kaks_MYN)

## example: calculate all pairwise Ka/Ks values, computing each
## pairwise codon alignment on the fly (see ?cds2codonaln)

hiv_kaks_Li <- hiv |> dnastring2kaks(model="Li", isMSA=FALSE)
head(hiv_kaks_Li)

## codon plot - sites under possible positive selection
library(tidyr)
library(dplyr)
library(ggplot2)
hiv_xy <- hiv |> dnastring2codonmat() |> codonmat2xy()
hiv_xy %>% dplyr::select(Codon,SynMean,NonSynMean,IndelMean) %>%
  tidyr::pivot_longer(-Codon, names_to="variable", values_to="values") %>%
  ggplot2::ggplot(aes(x=Codon, y=values)) + 
    ggplot2::geom_line(aes(colour=factor(variable))) + 
    ggplot2::geom_point(aes(colour=factor(variable))) + 
    ggplot2::ggtitle("HIV-1 sample 136 patient 1 from Sweden envelope glycoprotein (env) gene")
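
To list candidate codons instead of only plotting them, one could filter the per-codon means. The sketch below uses a made-up data frame with the same column names as in the plot (`Codon`, `SynMean`, `NonSynMean`) and the crude heuristic "mean nonsynonymous changes above mean synonymous changes". This heuristic is an assumption for illustration and is not a statistical test for positive selection.

```r
## toy stand-in for the codonmat2xy() output (values are invented)
xy <- data.frame(Codon      = 1:5,
                 SynMean    = c(0.2, 0.0, 0.5, 0.1, 0.0),
                 NonSynMean = c(0.1, 0.3, 0.5, 0.4, 0.0))

## codons where nonsynonymous changes outnumber synonymous ones on average
candidates <- subset(xy, NonSynMean > SynMean)
candidates$Codon
```

With the real data, the same condition can be applied to `hiv_xy`, e.g. `subset(hiv_xy, NonSynMean > SynMean)`.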

TODO

  • codonmat2pnps: support alternative translation tables

License

GPL-3 (see LICENSE)

Contributing Code

If you would like to contribute to MSA2dist, please file an issue so that one can establish a statement of need, avoid redundant work, and track progress on your contribution.

Before you do a pull request, you should always file an issue and make sure that someone from the MSA2dist developer team agrees that it’s a problem, and is happy with your basic proposal for fixing it.

Once an issue has been filed and we’ve identified how to best orient your contribution with package development as a whole, fork the main repo, branch off a feature branch from main, commit and push your changes to your fork and submit a pull request for MSA2dist:main.

By contributing to this project, you agree to abide by the Code of Conduct terms.

Bug reports

Please report any errors or requests regarding MSA2dist to Kristian Ullrich ()

or use the issue tracker at https://gitlab.gwdg.de/mpievolbio-it/msa2dist/issues

Code of Conduct - Participation guidelines

This repository adheres to the Contributor Covenant code of conduct for any interactions you have within this project (see Code of Conduct).

See also the Max Planck Society Code of Conduct and its policy against sexualized discrimination, harassment and violence.

By contributing to this project, you agree to abide by its terms.