On weighted k-mer dictionaries

Table 1 Basic statistics of the datasets used in the experiments, for \(k=31\): number of distinct \(k\)-mers (n), number of distinct weights (\(|\mathcal{D}|\)), largest weight (max), expected weight value (E), and empirical entropy of the weights (\(H_0(W)\))

Dataset	n	\(|\mathcal{D}|\)	\(\lceil \log_{2} |\mathcal{D}| \rceil\)	max	\(\lceil \log_{2} \max \rceil\)	E	\(H_{0}(W)\)
E-Coli	5,235,781	22	5	27	5	1.05	0.206
S-Enterica-100	12,408,741	620	10	7956	13	38.94	4.155
Human-Chr-13	90,911,778	806	10	6354	13	1.08	0.160
C-Elegans	94,006,897	398	9	3478	12	1.07	0.223
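
The columns of Table 1 can be recomputed from a list of per-\(k\)-mer weights. The sketch below is a minimal illustration and not the authors' code. It assumes that E is the arithmetic mean of the weights over all distinct \(k\)-mers and that \(H_0(W)\) is the standard zeroth-order empirical entropy, \(\sum_{w \in \mathcal{D}} (n_w/n) \log_2 (n/n_w)\), where \(n_w\) is the number of \(k\)-mers with weight \(w\). The function and key names (`weight_statistics`, `ceil_log2`) are hypothetical.

```python
import math
from collections import Counter


def ceil_log2(x):
    """Return ceil(log2(x)) for an integer x >= 1, using exact integer arithmetic."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    return (x - 1).bit_length()


def weight_statistics(weights):
    """Compute Table-1-style statistics for a list of positive integer weights.

    Assumption: one weight per distinct k-mer, so n == len(weights).
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    n = len(weights)
    counts = Counter(weights)      # weight -> number of k-mers with that weight
    num_distinct = len(counts)     # |D|
    largest = max(counts)          # max
    expected = sum(weights) / n    # E, assumed to be the arithmetic mean
    # Zeroth-order empirical entropy in bits per weight (assumed definition).
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return {
        "n": n,
        "distinct": num_distinct,
        "bits_distinct": ceil_log2(num_distinct),
        "max": largest,
        "bits_max": ceil_log2(largest),
        "E": expected,
        "H0": entropy,
    }


if __name__ == "__main__":
    # Toy input, not taken from any dataset in the table.
    print(weight_statistics([1, 1, 1, 2]))
```

Under these assumptions, the two logarithm columns can be read as the bits per weight of a fixed-width encoding (of an index into \(\mathcal{D}\), or of the raw value), while \(H_0(W)\) is the corresponding zeroth-order bound. For E-Coli this is 5 bits against roughly 0.2 bits per weight.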

ISSN: 1748-7188