Automated Classification of DNA Structure from Sequence Information (1997)
Abstract: We introduce an algorithm, lllama, which combines simple pattern recognizers into a general method for estimating the entropy of a sequence. Each pattern recognizer exploits a partial match between subsequences to build a model of the sequence. Since the primary features of interest in biological sequence domains are subsequences with small variations in exact composition, lllama is particularly suited to such domains. We describe two methods, lllama-length and lllama-alone, which use this entropy estimate to perform maximum a posteriori classification. We apply these methods to several problems in three-dimensional structure classification of short DNA sequences. The results include a surprisingly low 3.6% error rate in predicting helical conformation of oligonucleotides. We compare our results to those obtained using more traditional methods for automated generation of classifiers.
Rights: This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).