
Table 1 Selectivity score based selected codons

From: A Partial Least Squares based algorithm for parsimonious variable selection

| Phylum | Gen. | Perf. (%) | Positive and negative impact |
| --- | --- | --- | --- |
| Actinobacteria | 42 | 90.6 | TCCGTA, TACGGA, GTGAAG, CTTCAC, TGTACA, TCCGTT, AGAAGG, CCTTCT, GAGGCT, GGAACA, TCCACC, TGTTCC, TTCCGT, CTTAAG, GGGATC, GATCCC, CCTTAA, TTAAGG, AACGGA, GGTGGA, GTCGAC |
| Bacteroides | 16 | 96.3 | TATATA, TCTATA, CTATAT, TATAGA, ATATAG, TATAGT, TTATAG, CTTATA, CTATAA, ACTATA, TATATC, GATATA, CTATAG, TATACT, TATAAG, ATATAT |
| Crenarchaeota | 16 | 96.5 | AACGCT, AGCGTT, ACGAGT, ACTCGT, ACGACT, TTAGGG, TCGTGT, ACACGA, CCCTAA, TAGCGT, TACGAG, ACGCTA, CGTGTT, AACACG, GGGCTA, CTACGA, TCGTAG, CGAGTA, TACTCG, GCGTTT, AGTCGT, CTCGTA, TAGCCC |
| Cyanobacteria | 17 | 97.1 | CAATTG, GTTCAA, TTGAAC, TAAGAC, GTCTTA, CTTAGT, TTAGTC, GGTCAA, GACTAA, ACTAAG, CTTGAT, AAGTCA, ATCAAG, TGGTTC, GAACCA, AGTCAA, GACCAA, TTGGTC, TTGATC, GACTTG, TCTTAG, CAAGTC, TTGACC, TGACTT, TTGACT, GATCAA |
| Euryarchaeota | 31 | 93.3 | ACACCG, CGGTGT, TCGGGT, GGTGTC, TCGGTG, CACCGA, ACCCGA, CCGCGG, GGTGTG, TCACCG, TATCGT, TACGCT, TTCTGC, GACACC, CACACC |
| Firmicutes | 89 | 80.3 | TCGGTA, TACCGA, ACAGGA, TCCTGT |
| Alphaproteobacteria | 70 | 85.9 | TCGCGA, AAGATC, GATCTT, TTCGCG, AAATTT, CGCGAA |
| Betaproteobacteria | 42 | 90.8 | GGAACA, TGTTCC, TAGTCG, CGACTA, GCTAGC, AAGCTC, GAGCTT, TACGAG, CTCGTA, CTTGCA, GATCTT, TGCAAG, AAGATC, AGGCTT, AAGCCT, CTCGAG |
| Gammaproteobacteria | 92 | 81.2 | CTCAGT, ACTGAG, GACTCA, TGAGTC, ACTCAG, ACTCTG, CAGAGT, CTCAGA, TCTGAG, CTGTCT, CCAGAG, CTCTGG, TCACCT, TGACTC, CTCTGT, AGGTGA, GAGTCA, TCACTC, GAGTGA, CTGAGT, AGACAG, ACAGAG |
| Deltaproteobacteria | 18 | 96.0 | GACATT, TCATGT, ACATGA, AATGTC, AACATC, ATGTTG, CAACAT, CATTGT, ACAATG, ACATTG, ACAACA, TGTTGT, AACAAC, GTTGTT, CATTTC, GTTCCA, TGGAAC, CAACAA, TTGTTG, GAAACA, GGAACA, TGTTCC, AATGAC, GTCATT, GATGTT, CAATGT, GAAATG, TGTTTC |
| Epsilonproteobacteria | 12 | 96.9 | TCCTGT, ACAGGA, GTATCC, TCAGGA, TCCTGA, TGCAGA, TCTGCA, TTCAGG, CCTGAA, ATATCC, GAACCT, AGGTTC, GGAGAT, ATCTCC, TTGCAG, GGATAC, GGATAT, CTGCAA, TCCCTG, CAGGGA, ACTGCA, TGCAGT, TTCCTG, TACAGG |

  1. Results obtained for each phylum using the VIP criterion. Gen. is the number of genomes for that phylum in the data set. Perf. is the average test-set performance, i.e. the percentage of correctly classified samples when classifying the corresponding phylum; this is equivalent to the true positive rate. Positive impact variables are those with a selectivity score above 0.01 and positive regression coefficients; negative impact variables are those with a selectivity score above 0.01 and negative regression coefficients.
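
The two rules stated in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `split_by_impact` and `true_positive_rate` are hypothetical, the selectivity scores and regression coefficients are assumed to have been computed elsewhere, and a coefficient of exactly zero is assumed to fall into neither group.

```python
def split_by_impact(names, selectivity, coefficients, threshold=0.01):
    """Partition selected variables by the sign of their regression coefficient.

    A variable counts as selected only if its selectivity score is strictly
    above `threshold` (0.01 in the caption). Hypothetical helper.
    """
    positive, negative = [], []
    for name, score, coef in zip(names, selectivity, coefficients):
        if score <= threshold:
            continue  # not selected at all
        if coef > 0:
            positive.append(name)
        elif coef < 0:
            negative.append(name)
        # Assumption: a coefficient of exactly 0 is assigned to neither group.
    return positive, negative


def true_positive_rate(is_member, predicted_member):
    """Percentage of genuine members of a phylum that were classified as members."""
    hits = [pred for truth, pred in zip(is_member, predicted_member) if truth]
    if not hits:
        raise ValueError("no samples belong to the phylum")
    return 100.0 * sum(hits) / len(hits)
```

With invented inputs such as scores `[0.05, 0.02, 0.005]` and coefficients `[0.4, -0.2, 0.9]`, the first variable would be reported as positive impact, the second as negative impact, and the third would be dropped because its score does not exceed the threshold.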