YTÜ DSpace Institutional Archive

Splice site prediction using machine learning

dc.contributor.author Pashaei, Elham
dc.date.accessioned 2022-12-21T11:35:48Z
dc.date.available 2022-12-21T11:35:48Z
dc.date.issued 2017
dc.identifier.uri http://dspace.yildiz.edu.tr/xmlui/handle/1/13160
dc.description Thesis (Doctoral) - Yıldız Teknik Üniversitesi, Fen Bilimleri Enstitüsü (Graduate School of Natural and Applied Sciences), 2017 en_US
dc.description.abstract Due to the explosion in the quantity of DNA sequence data over the past decades, the development of new methods for accurate gene detection is vital. The success of these methods depends strongly on precise identification of splice sites. In eukaryotic genomes, each gene is composed of exons and introns. During gene expression, only the exons, which code for proteins, are retained in the mature mRNA. The term splice site refers to the boundary between an exon and an intron. The intron-exon junction, with consensus dinucleotide AG, is called the acceptor splice site, while the donor splice site is the exon-intron junction, with consensus dinucleotide GT. In a DNA sequence, splice site prediction is a search problem: finding the donor and acceptor boundaries. Numerous machine learning methods have been used for splice site identification. The performance of these methods depends heavily on the DNA encoding approach, which tries to extract informative features from DNA sequences. Using the AdaBoost classifier, we propose three new DNA encoding methods for feature extraction that combine several approaches already proven successful in capturing patterns around splice sites. The proposed approaches performed significantly better than eleven current state-of-the-art algorithms on several performance criteria. We have also developed an online prediction server (HSSAda) based on the proposed approach, freely available at https://pashaei.shinyapps.io/hssada. On an independent test set, HSSAda achieved higher accuracy than existing tools such as NNSplice, WMM, MM1, and MEM. Because of their high prediction accuracy and simplicity, we believe the proposed methods can help in discovering the location and structure of eukaryotic genes. We also assessed the performance of random forest (RF) as a classification and feature selection method for splice site prediction. The investigation asked whether RF outperforms the support vector machine (SVM), the most prominent classification approach in splice site detection, when Markovian encoding methods are used. Finally, we proposed another DNA encoding method, based on SVM and a second-order Markov model, for splice site detection. en_US
dc.language.iso en en_US
dc.subject Gene detection en_US
dc.subject Splice sites prediction en_US
dc.subject Machine learning en_US
dc.title Splice site prediction using machine learning en_US
dc.type Thesis en_US

