This function calculates Ka/Ks (pN/pS)
for all combinations given in an indices list
of a
DNAStringSet
.
If the sequences in the DNAStringSet
are not a multiple-sequence
alignment, pairwise codon alignments can be calculated on the fly.
Models used and implemented according to
Li (1993) (using seqinr
) or
Nei and Gojobori (1986) (own implementation) or models from
KaKs_Calculator2
ported to MSA2dist
with Rcpp
.
indices2kaks(
cds,
indices,
model = "Li",
threads = 1,
isMSA = TRUE,
sgc = "1",
verbose = FALSE,
...
)
DNAStringSet
coding sequence alignment [mandatory]
list
list of indices to calculate Ks/Ks [mandatory]
specify codon model either "Li" or "NG86" or
one of KaKs_Calculator2
model "NG", "LWL", "LPB", "MLWL", "MLPB",
"GY", "YN", "MYN", "MS", "MA", "GNG", "GLWL", "GLPB", "GMLWL", "GMLPB",
"GYN", "GMYN" [default: Li]
number of parallel threads [default: 1]
cds DNAStringSet
represents MSA [default: TRUE]
standard genetic code (for KaKs Calculator models) [default: 1]
verbosity (for KaKs Calculator models) [default: FALSE]
other codon alignment parameters
A data.frame
of KaKs
values
"MS/MA/GNG/GLWL/GLPB/GMLWL/GMLPB/GYN:" Wang et al. (2010) KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies.Genomics, proteomics & bioinformatics. 8(1), 77-80.
"Li/LWL:" Li et al. (1985) A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol., 2(2), 150-174.
"Li/LPB:" Li (1993). Unbiased estimation of the rates of synonymous and nonsynonymous substitution. Journal of molecular evolution, 36(1), pp.96-99.
"NG86/NG:" Nei and Gojobori. (1986) Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol., 3(5), 418-426.
"LPB:" Pamilo and Bianchi. (1993) Evolution of the Zfx and Zfy genes: Rates and interdependence between genes. Mol. Biol. Evol., 10, 271-281.
"MLWL/MLPB:" Tzeng et al. (2004). Comparison of three methods for estimating rates of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol., 21(12), 2290-2298.
"GY:" Goldman and Yang (1994). A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol., 11(5) 725-736.
"YN:" Yang et al. (2000) Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 155(1), 431-449.
"MYN:" Zhang et al. (2006). Computing Ka and Ks with a consideration of unequal transitional substitutions. BMC evolutionary biology, 6(1), 1-10.
"data(hiv):" Ganeshan et al. (1997) Human immunodeficiency virus type 1 genetic evolution in children with different rates of development of disease. J. Virology. 71(1), 663-677.
Wang et al. (2009). gamma-MYN: a new algorithm for estimating Ka and Ks with consideration of variable substitution rates. Biology Direct, 4(1), 1-18.
## load example sequence data
data("hiv", package="MSA2dist")
## create indices
idx <- list(c(2, 3), c(5,7,9))
#indices2kaks(hiv, idx, model="Li")
hiv |> indices2kaks(idx, model="Li")
#> Comp1 Comp2 seq1 seq2 ka ks vka vks
#> 1 2 3 U68497 U68498 0.1229150 0.01780069 0.001145105 0.0005798517
#> 2 5 7 U68500 U68502 0.1598801 0.14384729 0.001607816 0.0064428544
#> 3 5 9 U68500 U68504 0.1280077 0.16775540 0.001386777 0.0080191008
#> 4 7 9 U68502 U68504 0.1080925 0.15227654 0.001025740 0.0058858088
#> Ka Ks Ka/Ks
#> 1 0.1229150 0.01780069 6.9050687
#> 2 0.1598801 0.14384729 1.1114569
#> 3 0.1280077 0.16775540 0.7630618
#> 4 0.1080925 0.15227654 0.7098433
#indices2kaks(hiv, idx, model="NG86")
hiv |> indices2kaks(idx, model="NG86")
#> Comp1 Comp2 seq1 seq2 Codons Compared Ambigiuous Indels Ns
#> result.1 2 3 U68497 U68498 91 91 0 0 0
#> result.1.1 5 7 U68500 U68502 91 91 0 0 0
#> result.2 5 9 U68500 U68504 91 91 0 0 0
#> result.2.1 7 9 U68502 U68504 91 91 0 0 0
#> Sd Sn S N
#> result.1 1.5 22.5 57.5 215.5
#> result.1.1 6.33333333333333 29.6666666666667 56.6666666666667 216.333333333333
#> result.2 6.5 24.5 56.1666666666667 216.833333333333
#> result.2.1 7.83333333333333 21.1666666666667 57.5 215.5
#> ps pn pn/ps
#> result.1 0.0260869565217391 0.104408352668213 4.00232018561485
#> result.1.1 0.111764705882353 0.13713405238829 1.22698888978996
#> result.2 0.115727002967359 0.112990007686395 0.976349553597824
#> result.2.1 0.136231884057971 0.0982211910286156 0.72098533840154
#> ds dn dn/ds
#> result.1 0.026551445288187 0.112429520622234 4.23440303915417
#> result.1.1 0.121024643713475 0.15144523277581 1.25135863348919
#> result.2 0.125695312562783 0.122465898420489 0.97430760084485
#> result.2.1 0.150342008285561 0.10527596415443 0.700243167927274
#> Ka Ks Ka/Ks
#> result.1 0.112429520622234 0.026551445288187 4.23440303915417
#> result.1.1 0.15144523277581 0.121024643713475 1.25135863348919
#> result.2 0.122465898420489 0.125695312562783 0.97430760084485
#> result.2.1 0.10527596415443 0.150342008285561 0.700243167927274
#indices2kaks(hiv, idx, model="NG86", threads=2)
hiv |> indices2kaks(idx, model="NG86", threads=2)
#> Comp1 Comp2 seq1 seq2 Codons Compared Ambigiuous Indels Ns
#> result.1 2 3 U68497 U68498 91 91 0 0 0
#> result.1.1 5 7 U68500 U68502 91 91 0 0 0
#> result.2 5 9 U68500 U68504 91 91 0 0 0
#> result.2.1 7 9 U68502 U68504 91 91 0 0 0
#> Sd Sn S N
#> result.1 1.5 22.5 57.5 215.5
#> result.1.1 6.33333333333333 29.6666666666667 56.6666666666667 216.333333333333
#> result.2 6.5 24.5 56.1666666666667 216.833333333333
#> result.2.1 7.83333333333333 21.1666666666667 57.5 215.5
#> ps pn pn/ps
#> result.1 0.0260869565217391 0.104408352668213 4.00232018561485
#> result.1.1 0.111764705882353 0.13713405238829 1.22698888978996
#> result.2 0.115727002967359 0.112990007686395 0.976349553597824
#> result.2.1 0.136231884057971 0.0982211910286156 0.72098533840154
#> ds dn dn/ds
#> result.1 0.026551445288187 0.112429520622234 4.23440303915417
#> result.1.1 0.121024643713475 0.15144523277581 1.25135863348919
#> result.2 0.125695312562783 0.122465898420489 0.97430760084485
#> result.2.1 0.150342008285561 0.10527596415443 0.700243167927274
#> Ka Ks Ka/Ks
#> result.1 0.112429520622234 0.026551445288187 4.23440303915417
#> result.1.1 0.15144523277581 0.121024643713475 1.25135863348919
#> result.2 0.122465898420489 0.125695312562783 0.97430760084485
#> result.2.1 0.10527596415443 0.150342008285561 0.700243167927274
## define three unaligned cds sequences
cds1 <- Biostrings::DNAString("ATGCAACATTGC")
cds2 <- Biostrings::DNAString("ATGCATTGC")
cds3 <- Biostrings::DNAString("ATGCAATGC")
cds_sequences <- Biostrings::DNAStringSet(list(cds1, cds2, cds3))
names(cds_sequences) <- c("cds1", "cds2", "cds3")
## create indices
idx <- list(c(1, 2), c(1,3))
## set isMSA to FALSE to automatically create pairwise codon alignments
#indices2kaks(cds_sequences, idx, model="Li", isMSA=FALSE)
cds_sequences |> indices2kaks(idx, model="Li", isMSA=FALSE)
#> Comp1 Comp2 seq1 seq2 ka ks vka vks Ka Ks Ka/Ks
#> result.1 1 2 cds1 cds2 0 9.999999 0 9.999999 0 9.999999 0
#> result.2 1 3 cds1 cds3 0 9.999999 0 9.999999 0 9.999999 0