MicroRNAs (miRNAs) are single-stranded, non-coding RNAs averaging 21 nucleotides (nt) in length. The mature miRNA is cleaved from a 70–110 nt "hairpin" precursor with a double-stranded region containing one or more single-stranded loops. MiRNAs target messenger RNAs (mRNAs), primarily by repressing translation and causing mRNA cleavage and degradation.
Several computational approaches have been applied to miRNA gene prediction using methods based on sequence conservation and/or structural similarity [3–7]. All of these methods rely on binary classification and therefore artificially generate a non-miRNA class, defined by the absence of the features used to define the positive class. Nam, et al. constructed a highly specific probabilistic hidden Markov model (HMM) using the features of miRNA sequence and secondary structure; a negative class consisting of 1,000 extended stem-loop structures was generated based on several criteria, including sequence length (64–90 nt), stem length (above 22 nt), bulge size (under 15 nt), loop size (3–20 nt), and folding free energy (under -25 kcal/mol). Pfeffer, et al. used support vector machines (SVMs) to predict conserved miRNAs in herpes viruses. Features were extracted from the stem-loop and represented in a vector space. The negative class was generated from mRNAs, rRNAs, or tRNAs from human and viral genomes. The same technique was also applied to clustered miRNAs. Xue, et al. defined a negative class called pseudo pre-miRNAs. The criteria for this negative class included a minimum of 18 paired bases, a folding free energy of at most -15 kcal/mol, and no multiple loops. See  for a full review of miRNA discovery approaches.
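The criteria-based construction of a negative class described above can be sketched as a simple filter. This is a minimal illustration only, not the implementation used by any of the cited authors. It assumes that the structural features have already been computed by an RNA folding tool, that the listed thresholds are combined with a logical AND, and that range bounds are inclusive. The names `StemLoop`, `passes_criteria`, and `select_negatives` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StemLoop:
    """Pre-computed summary of a candidate hairpin (hypothetical field names)."""
    seq_len: int        # total sequence length, nt
    stem_len: int       # stem length, nt
    max_bulge: int      # largest bulge size, nt
    loop_len: int       # terminal loop size, nt
    free_energy: float  # folding free energy, kcal/mol


def passes_criteria(c: StemLoop) -> bool:
    """Apply the thresholds quoted in the text.

    Assumption: all criteria must hold at once and range bounds are inclusive.
    """
    return (
        64 <= c.seq_len <= 90
        and c.stem_len > 22
        and c.max_bulge < 15
        and 3 <= c.loop_len <= 20
        and c.free_energy < -25.0
    )


def select_negatives(candidates, limit=1000):
    """Keep at most `limit` candidates that satisfy every criterion."""
    selected = [c for c in candidates if passes_criteria(c)]
    return selected[:limit]


# Toy candidates with invented feature values, for illustration only.
candidates = [
    StemLoop(seq_len=70, stem_len=25, max_bulge=4, loop_len=8, free_energy=-30.0),
    StemLoop(seq_len=120, stem_len=25, max_bulge=4, loop_len=8, free_energy=-30.0),
    StemLoop(seq_len=70, stem_len=25, max_bulge=4, loop_len=8, free_energy=-10.0),
]
negatives = select_negatives(candidates)
```

Every threshold in such a filter is a design decision, and the resulting negative class changes whenever a threshold changes. That dependence is the difficulty a one-class formulation avoids.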
In a recent publication we described a two-class machine learning approach for miRNA prediction using the naïve Bayes classifier. Four criteria were used to select a pool of negative examples from candidate stem loops: stem length outside the range 42–85 nt, folding free energy of at most -25 kcal/mol, loop length greater than 26 nt, and a number of base pairs (bp) outside the range of the positive class (16–45). This approach, like all of the binary classifiers mentioned earlier, does not address how many negative examples should be used, a choice that influences the balance between false positive and false negative predictions. A comparison of a genuine negative class with one generated from random data for miRNA target prediction [14, 15] showed that the two negative classes did not produce the same results.
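To see why the number of negative examples matters, consider a naïve Bayes classifier whose class priors are estimated from the class frequencies in the training set. This is a common default, but it is an assumption here, not a detail confirmed by the text. Under that assumption, the decision for a candidate is the sum of its log-likelihood ratio and the log prior odds, so enlarging the negative pool shifts every decision toward "non-miRNA". The function names and the numbers below are illustrative only.

```python
import math


def log_prior_odds(n_pos: int, n_neg: int) -> float:
    """log(P(+)/P(-)) when the priors are the training class frequencies."""
    return math.log(n_pos / n_neg)


def predict_positive(log_likelihood_ratio: float, n_pos: int, n_neg: int) -> bool:
    """Naive Bayes decision rule: positive if the posterior log-odds exceed zero."""
    return log_likelihood_ratio + log_prior_odds(n_pos, n_neg) > 0


# The same candidate (log-likelihood ratio of 1.0, an invented value) is
# classified differently depending only on the size of the negative pool.
balanced = predict_positive(1.0, n_pos=300, n_neg=300)     # log prior odds = 0
imbalanced = predict_positive(1.0, n_pos=300, n_neg=3000)  # log prior odds = log(0.1)
```

With equal class sizes the candidate is accepted; with ten times as many negatives the same candidate is rejected. Fewer false positives are thus traded for more false negatives, without any change in the evidence for the candidate itself.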
Recently, Wang, et al. developed an elegant algorithm, positive sample only learning (PSoL), which predicts non-coding RNA (ncRNA) genes by using a two-class SVM to generate an optimized negative class from so-called "unlabeled" data. This method predicts ncRNA genes without requiring negative training examples, but the procedure is quite complicated. Using their data set, we tested one of the one-class approaches, the one-class SVM (OC-SVM), to demonstrate a solution to the problem they addressed.
The method we now describe uses only the known miRNAs (positive class) to train the miRNA classifier. We emphasize that the one-class approach is a good tool not only because of its simplicity, but also because it avoids generating a negative class when the basis for defining this class is unclear. The only required input is the set of miRNA sequences from a specific genome (or multiple genomes), from which the model is built for later use as a miRNA predictor. In addition, we have tested the accuracy of the one-class method in identifying miRNAs in "newly sequenced" organisms, such as the Epstein-Barr virus genome, that were not used for training the classifier. The results are comparable to our two-class approach, with high sensitivity and similar numbers of new predictions.
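The idea of training on positive examples only can be illustrated with a very small one-class model. The sketch below is not the classifier evaluated in this work. It swaps in a diagonal-Gaussian model, chosen only because it needs nothing beyond the Python standard library. It assumes each known miRNA has already been converted to a numeric feature vector. The class name `OneClassGaussian`, the `accept_fraction` parameter, and the toy feature values are all hypothetical.

```python
import math
from statistics import fmean, pvariance


class OneClassGaussian:
    """Fit per-feature mean and variance on positives; reject distant points."""

    def __init__(self, accept_fraction: float = 0.9):
        if not 0.0 < accept_fraction <= 1.0:
            raise ValueError("accept_fraction must be in (0, 1]")
        self.accept_fraction = accept_fraction

    def fit(self, positives):
        columns = list(zip(*positives))
        self.means = [fmean(col) for col in columns]
        # A small floor avoids division by zero for constant features.
        self.variances = [max(pvariance(col), 1e-9) for col in columns]
        # The threshold is the training score below which the requested
        # fraction of known positives falls; no negative examples are needed.
        scores = sorted(self.score(x) for x in positives)
        k = max(1, math.ceil(self.accept_fraction * len(scores)))
        self.threshold = scores[k - 1]
        return self

    def score(self, x) -> float:
        """Squared distance from the class mean, scaled by per-feature variance."""
        return sum(
            (v - m) ** 2 / s for v, m, s in zip(x, self.means, self.variances)
        )

    def predict(self, x) -> bool:
        """True if x lies within the region occupied by the training positives."""
        return self.score(x) <= self.threshold


# Invented feature vectors: (stem length in nt, free energy in kcal/mol, loop length in nt).
known_mirnas = [
    (30, -35.0, 8), (32, -38.0, 7), (28, -33.0, 9),
    (31, -36.0, 8), (29, -34.0, 10), (33, -40.0, 6),
]
model = OneClassGaussian(accept_fraction=1.0).fit(known_mirnas)
typical = model.predict((30, -36.0, 8))
outlier = model.predict((10, -5.0, 40))
```

The only tuning decision is how much of the positive class to enclose (`accept_fraction`); a lower value tightens the boundary and trades sensitivity for fewer new predictions. An OC-SVM follows the same fit-on-positives, threshold-on-distance pattern with a kernel-based boundary in place of the Gaussian one.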