Genomic Data Analysis using Machine Learning Methods for Disease Classification and Causal Gene Analysis
Principal Investigator:
Ms Nuriye Ozlem Ozcan Simsek
Approved Research ID:
44337
Approval date:
December 19th 2018
Lay summary
Complex diseases often result from mutations in multiple genes, each of which may influence the disease to a different degree. Diagnosing such diseases is therefore cumbersome and may require a long period of patient observation. Because treatment is planned according to the diagnosis, a late diagnosis prevents timely treatment and often results in the loss of the patient. As DNA sequencing technologies improve and become cheaper, genomic data can be used for disease classification. In general, each complex disease has its own set of causal genes, and the disease emerges from a combination of mutations in these genes. Machine learning techniques can model these causal gene sets and can serve as an assistive tool for medical experts. With improving hardware, the output of such techniques will be available in milliseconds. This will help save more lives by providing correct assistance in diagnosis. Precise detection of these genes will also lead to improvements in the field of personalized medicine.

In this study, we propose using the mutation information in DNA sequencing data for disease classification. We will apply and compare both traditional machine learning and deep learning techniques on this problem. The best-performing technique will be further analyzed for causal gene detection: the genes it relies on for classification will be proposed as causal genes. These genes will provide new targets for biomedical researchers and can also be studied as targets for personalized medicine development.

This research study has two main contributions. The first is an assistive classification agent for medical experts. This agent will provide a precise diagnosis proposal in milliseconds; it will therefore shorten the time to diagnosis and lead to correct treatment. The second is the detection of new causal or target genes for diseases. These genes can be used either to build diagnostic tests or to develop personalized medicines.
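
As a rough illustration of the approach summarized above (classify patients from mutation data, then inspect which genes the classifier relies on), the following is a minimal sketch and not the study's actual method. Everything in it is an assumption made for illustration: the synthetic cohort, the choice of logistic regression as the classifier, the planted causal genes at indices 1 and 4, and the helper names `make_synthetic_cohort`, `train_logistic_regression`, and `rank_genes`.

```python
import math
import random


def make_synthetic_cohort(n_samples=500, n_genes=8, causal=(1, 4), seed=0):
    """Build hypothetical data: each row is a 0/1 vector marking mutated genes."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        row = [1 if rng.random() < 0.3 else 0 for _ in range(n_genes)]
        # Assumed disease model: only mutations in the causal genes raise risk.
        logit = -2.0 + sum(3.0 * row[g] for g in causal)
        p_disease = 1.0 / (1.0 + math.exp(-logit))
        X.append(row)
        y.append(1 if rng.random() < p_disease else 0)
    return X, y


def train_logistic_regression(X, y, lr=0.5, epochs=300):
    """Fit weights by plain batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_genes(w):
    """Order gene indices by absolute weight, largest first."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)


X, y = make_synthetic_cohort()
w, b = train_logistic_regression(X, y)
top_two = rank_genes(w)[:2]
print("Candidate causal genes (by index):", top_two)
```

In this toy setting the genes with the largest absolute weights are treated as candidate causal genes. With real sequencing data, a large weight only indicates association with the disease label, so such candidates would still need biological validation.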