Table 1 Detailed description of the ten gene families of the mammalian dataset

From: Aligning coding sequences with frameshift extension penalties

| Gene family | Human gene | # of genes | # of CDS | Total CDS length | \(\frac{N(N-1)}{2}\) |
|---|---|---|---|---|---|
| I (FAM86) | ENSG00000118894 | 6 | 14 | 10335 | 91 |
| II (HBG017385) | ENSG00000143867 | 6 | 10 | 8988 | 45 |
| III (HBG020791) | ENSG00000179526 | 6 | 10 | 11070 | 45 |
| IV (HBG004532) | ENSG00000173020 | 17 | 33 | 52356 | 528 |
| V (HBG016641) | ENSG00000147041 | 13 | 33 | 64950 | 528 |
| VI (HBG014779) | ENSG00000233803 | 28 | 44 | 45813 | 946 |
| VII (HBG012748) | ENSG00000134545 | 24 | 44 | 28050 | 946 |
| VIII (HBG015928) | ENSG00000178287 | 5 | 19 | 5496 | 171 |
| IX (HBG004374) | ENSG00000140519 | 13 | 30 | 36405 | 435 |
| X (HBG000122) | ENSG00000105717 | 11 | 24 | 27081 | 276 |
| Total number of pairs of CDS | | | | | 4011 |

For each gene family, the table gives the family identifier used in [6] or [12], the Ensembl identifier of a human member of the family, the number of human, mouse and cow genes in the family, the total number of CDS of these genes (N), the sum of the lengths of these CDS, and the number of distinct pairs of CDS, \(\frac{N(N-1)}{2}\).
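The last column is the number of unordered pairs of distinct CDS in a family, \(\frac{N(N-1)}{2}\), where N is the family's number of CDS; summing it over the ten families gives the total of 4011. The short Python check below recomputes these values from the CDS counts copied from the table. The names `cds_counts` and `distinct_pairs` are illustrative only and are not taken from the paper.

```python
from math import comb

# Number of CDS (N) per gene family, copied from Table 1.
cds_counts = {
    "I": 14, "II": 10, "III": 10, "IV": 33, "V": 33,
    "VI": 44, "VII": 44, "VIII": 19, "IX": 30, "X": 24,
}

def distinct_pairs(n: int) -> int:
    """Number of unordered pairs of distinct items among n, i.e. n*(n-1)/2."""
    return n * (n - 1) // 2

# Pairs per family, cross-checked against the binomial coefficient C(N, 2).
pairs = {}
for family, n in cds_counts.items():
    p = distinct_pairs(n)
    assert p == comb(n, 2)
    pairs[family] = p

total_pairs = sum(pairs.values())

for family, p in pairs.items():
    print(f"{family}: {p}")
print(f"Total: {total_pairs}")
```

Running it prints the per-family pair counts shown in the last column, followed by `Total: 4011`.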