CS 681 Projects
Anything in the general field of "bioinformatics / computational
biology" can be proposed. Projects can be proposed and submitted by
groups of 2 or 3 people, or individually. The purpose of the
project is to increase your knowledge of bioinformatics /
computational biology in general, and to see which problems can be worked
on for further improvement. You can:
- Apply known algorithms / available tools to different datasets, such as:
- Download next-generation sequencing reads for an organism (e.g.
E. coli, C. elegans) and try to assemble its genome. Then compare your
assembly with the available reference genome for accuracy.
- Analyze the genome(s) of one or more individuals (of any
organism) to discover variants such as SNPs, small indels, and
structural variation.
- Analyze RNA-seq data (from different tissues); estimate
coverage and/or expression for each gene (beware of alternative
splicing).
- Implement previously described algorithms in an efficient and user-friendly manner. For example:
- The NovelSeq framework is hard to work with. Reimplement the tools
within the framework, and write a "single command line" program that
reads the different alignment files. A sample dataset and contact with
the original implementers will be provided.
- Proposals for other algorithms / tools can be submitted.
- Implement basic tools for genome analysis or other topics within the theme of the course. For example:
- Given a set of genomic variants (SNPs, indels, etc.) and a
reference genome, patch the reference genome with the variation
information to compute, in silico, the genome of the analyzed individual.
- Develop your own algorithm. Obviously this cannot be for a very
complex and hard problem due to time limitations. A simple, yet useful
problem would suffice. For example:
- A simple scaffolding algorithm that uses data from multiple
sequencing platforms (e.g. short reads [Illumina] and strobe reads
[Pacific Biosciences]) to improve assembly of a single BAC clone. For this project, raw reads as well as pre-assembled contigs can be provided. This can be a very complicated problem, but simple solutions may be accepted.
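
To make the "patch the reference genome" project idea above more concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions, not a required design: the function name `patch_reference`, the tuple layout `(position, ref_allele, alt_allele)`, and the toy sequence are all hypothetical. Positions are assumed to be 0-based, variants are assumed not to overlap, and only a single haplotype is produced. A real project would likely read variants from a standard file format and handle diploid genotypes.

```python
def patch_reference(reference, variants):
    """Apply non-overlapping variants to a reference sequence.

    reference: the reference sequence as a string.
    variants:  iterable of (pos, ref_allele, alt_allele) tuples, where
               pos is a 0-based offset into the reference (an assumption
               of this sketch; many file formats use 1-based positions).
    Returns the patched sequence as a new string.
    """
    patched = []
    cursor = 0  # first reference position not yet copied to the output
    for pos, ref_allele, alt_allele in sorted(variants):
        if pos < cursor:
            raise ValueError(f"overlapping variant at position {pos}")
        if reference[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError(f"reference mismatch at position {pos}")
        patched.append(reference[cursor:pos])  # unchanged stretch
        patched.append(alt_allele)             # substituted allele
        cursor = pos + len(ref_allele)         # skip the replaced bases
    patched.append(reference[cursor:])         # remainder after last variant
    return "".join(patched)


# Toy example (made-up sequence and variants, for illustration only).
reference = "ACGTACGTAC"
variants = [
    (2, "G", "T"),    # SNP: G -> T
    (4, "AC", "A"),   # small deletion: removes the C at position 5
    (8, "A", "ATT"),  # small insertion: adds TT after position 8
]
individual = patch_reference(reference, variants)
print(individual)
```

Sorting the variants first and walking the reference with a single cursor keeps the coordinates valid even though insertions and deletions change the length of the output. Checking the reference allele before patching catches mismatches between the variant set and the reference build early.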