Big Data in Bioinformatics

Biology is becoming increasingly data-intensive as high-throughput genomic assays become more accessible to greater numbers of biologists. Working with large-scale data sets requires user-friendly yet powerful software tools that stimulate user’s intuition, reveal outliers, detect deeper structures embedded in the data, and trigger insights and ideas for new experiments. We are interested in developing tools and techniques that help biologists manage, integrate, visualize, analyze, and interpret large-scale data sets from genomics.

Tools developed by our Faculty:


– Alternative splicing database
Bioviz – Integrated Genome Browser, integrative visual analytics for next-generation genomics CressExpress – CressExpress, gene networks analysis in Arabidopsis
GleClubs – Global Ensemble Clusters of Binding Sites, algorithm for predicting cis-regulatory binding sites in prokaryotes
PlantGDB – Resource for comparative plant genomics
Supramap – Web application for integrating genetic, evolutionary, geospatial, and temporal data

Amino Acid
Supramap Legend

Molecular evolution of the polymerase basic 2 gene of H5N1 “avian influenza” over time and space.

The lineages of H5N1 carrying lysine at position 627 (red lines) are mutants adapted to mammalian hosts.

The lineages of H5N1 carrying glutamic acid at position 627 (white lines) are wild-types adapted to avian hosts.

Faculty in this Research Area: