Articles
Citation: Algorithms for Molecular Biology 2022 17:1 -
An optimized FM-index library for nucleotide and amino acid search
Pattern matching is a key step in a variety of biological sequence analysis pipelines. The FM-index is a compressed data structure for pattern matching, with search run time that is independent of the length o...
Citation: Algorithms for Molecular Biology 2021 16:25 -
An improved approximation algorithm for the reversal and transposition distance considering gene order and intergenic sizes
In the comparative genomics field, one of the goals is to estimate a sequence of genetic changes capable of transforming a genome into another. Genome rearrangement events are mutations that can alter the gene...
Citation: Algorithms for Molecular Biology 2021 16:24 -
A simpler linear-time algorithm for the common refinement of rooted phylogenetic trees on a common leaf set
The supertree problem, i.e., the task of finding a common refinement of a set of rooted trees, is an important topic in mathematical phylogenetics. The special case of a common leaf set L is known to be solvable i...
Citation: Algorithms for Molecular Biology 2021 16:23 -
Testing the agreement of trees with internal labels
A semi-labeled tree is a tree where all leaves as well as, possibly, some internal nodes are labeled with taxa. Semi-labeled trees encompass ordinary phylogenetic trees and taxonomies. Suppose we are given a c...
Citation: Algorithms for Molecular Biology 2021 16:22 -
Approximation algorithm for rearrangement distances considering repeated genes and intergenic regions
The rearrangement distance is a method to compare genomes of different species. Such distance is the number of rearrangement events necessary to transform one genome into another. Two commonly studied events a...
Citation: Algorithms for Molecular Biology 2021 16:21 -
DeepGRP: engineering a software tool for predicting genomic repetitive elements using Recurrent Neural Networks with attention
Repetitive elements contribute a large part of eukaryotic genomes. For example, about 40 to 50% of human, mouse and rat genomes are repetitive. So identifying and classifying repeats is an important step in ge...
Citation: Algorithms for Molecular Biology 2021 16:20 -
Heuristic algorithms for best match graph editing
Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connec...
Citation: Algorithms for Molecular Biology 2021 16:19 -
A novel method for inference of acyclic chemical compounds with bounded branch-height based on artificial neural networks and integer programming
Analysis of chemical graphs is becoming a major research topic in computational molecular biology due to its potential applications to drug design. One of the major approaches in such a study is inverse quant...
Citation: Algorithms for Molecular Biology 2021 16:18 -
INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis
Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a ...
Citation: Algorithms for Molecular Biology 2021 16:17 -
Approximate search for known gene clusters in new genomes using PQ-trees
Gene clusters are groups of genes that are co-locally conserved across various genomes, not necessarily in the same order. Their discovery and analysis is valuable in tasks such as gene annotation and predicti...
Citation: Algorithms for Molecular Biology 2021 16:16 -
Shape decomposition algorithms for laser capture microdissection
In the context of biomarker discovery and molecular characterization of diseases, laser capture microdissection is a highly effective approach to extract disease-specific regions from complex, heterogeneous ti...
Citation: Algorithms for Molecular Biology 2021 16:15 -
Distinguishing linear and branched evolution given single-cell DNA sequencing data of tumors
Cancer arises from an evolutionary process where somatic mutations give rise to clonal expansions. Reconstructing this evolutionary process is useful for treatment decision-making as well as understanding evol...
Citation: Algorithms for Molecular Biology 2021 16:14 -
Bayesian optimization with evolutionary and structure-based regularization for directed protein evolution
Directed evolution (DE) is a technique for protein engineering that involves iterative rounds of mutagenesis and screening to search for sequences that optimize a given property, such as binding affinity to a ...
Citation: Algorithms for Molecular Biology 2021 16:13 -
Using Robinson-Foulds supertrees in divide-and-conquer phylogeny estimation
One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life i...
Citation: Algorithms for Molecular Biology 2021 16:12 -
Using the longest run subsequence problem within homology-based scaffolding
Genome assembly is one of the most important problems in computational genomics. Here, we suggest addressing an issue that arises in homology-based scaffolding, that is, when linking and ordering contigs to ob...
Citation: Algorithms for Molecular Biology 2021 16:11 -
Disk compression of k-mer sets
K-mer based methods have become prevalent in many areas of bioinformatics. In applications such as database search, they often work with large multi-terabyte-sized datasets. Storing such large datasets is a de...
Citation: Algorithms for Molecular Biology 2021 16:10 -
The Bourque distances for mutation trees of cancers
Mutation trees are rooted trees in which nodes are of arbitrary degree and labeled with a mutation set. These trees, also referred to as clonal trees, are used in computational oncology to represent the mutati...
Citation: Algorithms for Molecular Biology 2021 16:9 -
LazyB: fast and cheap genome assembly
Advances in genome sequencing over the last years have led to a fundamental paradigm shift in the field. With steadily decreasing sequencing costs, genome projects are no longer limited by the cost of raw seq...
Citation: Algorithms for Molecular Biology 2021 16:8 -
The energy-spectrum of bicompatible sequences
Genotype-phenotype maps provide a meaningful filtration of sequence space and RNA secondary structures are particular such phenotypes. Compatible sequences, which satisfy the base-pairing constraints of a give...
Citation: Algorithms for Molecular Biology 2021 16:7 -
Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph
Genome wide optical maps are high resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single molecule optical maps, which...
Citation: Algorithms for Molecular Biology 2021 16:6 -
Exact transcript quantification over splice graphs
The probability of sequencing a set of RNA-seq reads can be directly modeled using the abundances of splice junctions in splice graphs instead of the abundances of a list of transcripts. We call this model gra...
Citation: Algorithms for Molecular Biology 2021 16:5 -
Natural family-free genomic distance
A classical problem in comparative genomics is to compute the rearrangement distance, that is the minimum number of large-scale rearrangements required to transform a given genome into another given genome. Th...
Citation: Algorithms for Molecular Biology 2021 16:4 -
Improving metagenomic binning results with overlapped bins using assembly graphs
Metagenomic sequencing allows us to study the structure, diversity and ecology in microbial communities without the necessity of obtaining pure cultures. In many metagenomics studies, the reads obtained from m...
Citation: Algorithms for Molecular Biology 2021 16:3 -
Fast lightweight accurate xenograft sorting
With an increasing number of patient-derived xenograft (PDX) models being created and subsequently sequenced to study tumor heterogeneity and to guide therapy decisions, there is a similarly increasing need fo...
Citation: Algorithms for Molecular Biology 2021 16:2 -
Quantifying steric hindrance and topological obstruction to protein structure superposition
In computational structural biology, structure comparison is fundamental for our understanding of proteins. Structure comparison is, e.g., algorithmically the starting point for computational studies of struct...
Citation: Algorithms for Molecular Biology 2021 16:1 -
Fast and accurate structure probability estimation for simultaneous alignment and folding of RNAs with Markov chains
Simultaneous alignment and folding (SA&F) of RNAs is the indispensable gold standard for inferring the structure of non-coding RNAs and their general analysis. The original algorithm, proposed by Sankoff, solv...
Citation: Algorithms for Molecular Biology 2020 15:19 -
gsufsort: constructing suffix arrays, LCP arrays and BWTs for string collections
The construction of a suffix array for a collection of strings is a fundamental task in Bioinformatics and in many other applications that process strings. Related data structures, such as the Longest Common Prefix...
Citation: Algorithms for Molecular Biology 2020 15:18 -
A linear-time algorithm that avoids inverses and computes Jackknife (leave-one-out) products like convolutions or other operators in commutative semigroups
Data about herpesvirus microRNA motifs on human circular RNAs suggested the following statistical question. Consider independent random counts, not necessarily identically distributed. Conditioned on the sum, ...
Citation: Algorithms for Molecular Biology 2020 15:17 -
Reconstruction of time-consistent species trees
The history of gene families—which are equivalent to event-labeled gene trees—can to some extent be reconstructed from empirically estimated evolutionary event-relations containing pairs of orthologous, paralo...
Citation: Algorithms for Molecular Biology 2020 15:16 -
On an enhancement of RNA probing data using information theory
Identifying the secondary structure of an RNA is crucial for understanding its diverse regulatory functions. This paper focuses on how to enhance target identification in a Boltzmann ensemble of structures via...
Citation: Algorithms for Molecular Biology 2020 15:15 -
Algorithms for the quantitative Lock/Key model of cytoplasmic incompatibility
Cytoplasmic incompatibility (CI) relates to the manipulation by the parasite Wolbachia of its host reproduction. Despite its widespread occurrence, the molecular basis of CI remains unclear and theoretical models...
Citation: Algorithms for Molecular Biology 2020 15:14 -
Fast computation of genome-metagenome interaction effects
Association studies have been widely used to search for associations between common genetic variant observations and a given phenotype. However, it is now generally accepted that genes and environment must be...
Citation: Algorithms for Molecular Biology 2020 15:13 -
Evolution through segmental duplications and losses: a Super-Reconciliation approach
The classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assum...
Citation: Algorithms for Molecular Biology 2020 15:12 -
Precise parallel volumetric comparison of molecular surfaces and electrostatic isopotentials
Geometric comparisons of binding sites and their electrostatic properties can identify subtle variations that select different binding partners and subtle similarities that accommodate similar partners. Becaus...
Citation: Algorithms for Molecular Biology 2020 15:11 -
Context-aware seeds for read mapping
Most modern seed-and-extend NGS read mappers employ a seeding scheme that requires extracting t non-overlapping seeds in each read in order to find all valid mappings under an edit distance threshold of t. As t g...
Citation: Algorithms for Molecular Biology 2020 15:10 -
Detecting transcriptomic structural variants in heterogeneous contexts via the Multiple Compatible Arrangements Problem
Transcriptomic structural variants (TSVs), large-scale transcriptome sequence changes due to structural variation, are common in cancer. TSV detection from high-throughput sequencing data is a computationally c...
Citation: Algorithms for Molecular Biology 2020 15:9 -
The distance and median problems in the single-cut-or-join model with single-gene duplications
In the field of genome rearrangement algorithms, models accounting for gene duplication often lead to hard problems. For example, while computing the pairwise distance is tractable in most duplication-free mod...
Citation: Algorithms for Molecular Biology 2020 15:8 -
Non-parametric and semi-parametric support estimation using SEquential RESampling random walks on biomolecular sequences
Non-parametric and semi-parametric resampling procedures are widely used to perform support estimation in computational biology and bioinformatics. Among the most widely used methods in this class is the stand...
Citation: Algorithms for Molecular Biology 2020 15:7 -
Linear-time algorithms for phylogenetic tree completion under Robinson–Foulds distance
We consider two fundamental computational problems that arise when comparing phylogenetic trees, rooted or unrooted, with non-identical leaf sets. The first problem arises when comparing two trees where the le...
Citation: Algorithms for Molecular Biology 2020 15:6 -
From pairs of most similar sequences to phylogenetic best matches
Many of the commonly used methods for orthology detection start from mutually most similar pairs of genes (reciprocal best hits) as an approximation for evolutionary most closely related pairs of genes (recipr...
Citation: Algorithms for Molecular Biology 2020 15:5 -
Alignment- and reference-free phylogenomics with colored de Bruijn graphs
The increasing amount of available genome sequence data enables large-scale comparative studies. A common task is the inference of phylogenies—a challenging task if close reference sequences are not available,...
Citation: Algorithms for Molecular Biology 2020 15:4 -
GrpClassifierEC: a novel classification approach based on the ensemble clustering space
Advances in molecular biology have resulted in big and complicated data sets; therefore, a clustering approach that is able to capture the actual structure and the hidden patterns of the data is required. Moreover...
Citation: Algorithms for Molecular Biology 2020 15:3 -
Finding all maximal perfect haplotype blocks in linear time
Recent large-scale community sequencing efforts allow at an unprecedented level of detail the identification of genomic regions that show signatures of natural selection. Traditional methods for identifying su...
Citation: Algorithms for Molecular Biology 2020 15:2 -
Non-parametric correction of estimated gene trees using TRACTION
Estimated gene trees are often inaccurate, due to insufficient phylogenetic signal in the single gene alignment, among other causes. Gene tree correction aims to improve the accuracy of an estimated gene tree ...
Citation: Algorithms for Molecular Biology 2020 15:1 -
Kohdista: an efficient method to index and query possible Rmap alignments
Genome-wide optical maps are ordered high-resolution restriction maps that give the position of occurrence of restriction cut sites corresponding to one or more restriction enzymes. These genome-wide optical m...
Citation: Algorithms for Molecular Biology 2019 14:25 -
NANUQ: a method for inferring species networks from gene trees under the coalescent model
Species networks generalize the notion of species trees to allow for hybridization or other lateral gene transfer. Under the network multispecies coalescent model, individual gene trees arising from a network ...
Citation: Algorithms for Molecular Biology 2019 14:24 -
TMRS: an algorithm for computing the time to the most recent substitution event from a multiple alignment column
As the number of sequenced genomes grows, researchers have access to an increasingly rich source for discovering detailed evolutionary information. However, the computational technologies for inferring biologi...
Citation: Algorithms for Molecular Biology 2019 14:23 -
Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics
Genomic data analyses such as Genome-Wide Association Studies (GWAS) or Hi-C studies are often faced with the problem of partitioning chromosomes into successive regions based on a similarity matrix of high-re...
Citation: Algorithms for Molecular Biology 2019 14:22 -
Super short operations on both gene order and intergenic sizes
The evolutionary distance between two genomes can be estimated by computing a minimum length sequence of operations, called genome rearrangements, that transform one genome into another. Usually, a genome is mode...
Citation: Algorithms for Molecular Biology 2019 14:21
- ISSN: 1748-7188 (electronic)