Data Driven Bioinformatics
Emanuel Weitschek (emanuel.weitschek <at> iasi.cnr.it)
In recent years, biomedical technologies have developed rapidly, and the volume of data they produce has grown even faster. The introduction of next-generation sequencers in particular has considerably increased the amount of data produced. As a result, computer science and big data have become key instruments for managing and analyzing these data. Algorithms and computational methods have also been developed to support the collection, storage, and analysis of biological data.
The course will provide a brief introduction to bioinformatics and big data, with particular attention to the following areas of interest: introduction to molecular biology (genomics, DNA, RNA, genetic code), bioinformatics, next-generation DNA sequencing (NGS), growth of biological and biomedical data, biomedical data collection and biomedical databases, biomedical data management, distributed platforms and infrastructures for NGS data analysis, biological applications, classification of genomic data, case-control studies, early diagnosis, and personalized medicine.
1. Introduction to Bioinformatics
2. Biomedical Big Data, biomedical databases, and biomedical data management systems
3. Biomedical data analysis and supervised learning
4. Applications and case studies (I): DNA barcoding, gene expression, and DNA methylation
5. Applications and case studies (II): clinical data, biomedical signals, and biomedical images
Primary source material will be readings in the form of research papers and material provided by the instructor.
Students will be assigned a project consisting of a biomedical data management and analysis task. Open-source bioinformatics software (SAMtools, Bowtie, GMQL, etc.) and big data environments (Hadoop, Spark, etc.) will be used. In addition, free access to the CINECA supercomputing facilities (grant number HP10CTJZAM) will be provided.