- Rapid advances in high-throughput experimental technologies are producing biological data at an unprecedented, exponential rate. Answering the most important and complex biological questions very often involves integrating diverse data from multiple sources, which requires harnessing collective contributions and building bioinformatic Web APIs for massive data integration.
- The fast-growing volume of biological data makes it imperative to develop time-efficient applications for large-scale data analysis. This requires the use of highly efficient computing technologies (e.g., cloud and parallel computing) and the establishment of a lightweight programming environment that makes full use of both computing and storage resources.
- Data, broadly speaking, including raw data, algorithms, results, pipelines, publications, knowledge and even connections among people, are growing at an unparalleled pace. It is therefore necessary to link researchers all over the world and build scientific social networks for efficient and effective data sharing.
- Omics Bioinformatic Cloud: a Hadoop-based bioinformatics cloud for large-scale NGS data storage, analysis and sharing
- Participants: Siqi Liu, Dong Zou, Ang Li, Chao Xu
- RiceWiki: a community-based annotation platform for rice genes
- Participants: Chao Xu, Siqi Liu, Dong Zou, Ang Li, Lina Ma, Hao Wu, Gang Wu, Dawei Huang
Computational Molecular Evolution
Modeling Compositional Dynamics
- Sequence composition at different levels (e.g., codon) reflects the interplay of mutation and selection. To better understand sequence evolution, it is of fundamental importance to study sequence composition, which is closely related to gene expression, translation speed and/or accuracy, gene function, protein structure, the intrinsic nature of the genetic code, and so on.
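
As a rough illustration of what codon-level composition means in practice, the sketch below counts codons and computes the GC fraction at each of the three codon positions. It is a minimal example only, not the group's method: the function names and the example sequence are hypothetical, and the input is assumed to be an in-frame coding sequence.

```python
from collections import Counter


def codon_counts(seq):
    """Count codons in an in-frame coding sequence.

    Assumption: the sequence starts at codon position 1; any trailing
    partial codon is ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def gc_by_position(seq):
    """Return the GC fraction at codon positions 1, 2 and 3."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    fractions = []
    for pos in range(3):
        bases = seq[pos:usable:3]  # every third base, starting at this position
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return fractions


example = "ATGGCCGCGTAA"  # made-up sequence, for illustration only
print(codon_counts(example))
print(gc_by_position(example))
```

Comparing GC content at the third codon position with that at the first two is one simple way to look at composition, since third-position changes are more often synonymous.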
Detecting Mutation and Selection
- A number of models have been proposed for the evolution of protein-coding sequences. It would be desirable to model sequence evolution and detect selective pressure not merely in protein-coding sequences but also in non-coding sequences.
Simulating Evolutionary Process
- Simulating the evolution of molecular sequences over time is essential for a broad range of evolutionary studies. To perform simulations in a biologically realistic way, it is necessary to take full account of a variety of parameters, such as mutation rate, functional and structural constraints, patterns of site substitution, co-evolving sites, and site-specific evolutionary constraints.
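
To make two of these parameters concrete, here is a deliberately simplified sketch of a sequence simulation with a per-generation mutation rate and site-specific rate multipliers. It is a toy model under stated assumptions, not a biologically realistic simulator: all names are hypothetical, substitutions are drawn uniformly from the three other bases (a Jukes-Cantor-style assumption), and co-evolving sites and structural constraints are not modelled.

```python
import random

BASES = "ACGT"


def simulate(seq, generations, mutation_rate, site_rates=None, seed=None):
    """Evolve a DNA sequence over discrete generations.

    Assumptions (illustrative only):
    - seq contains only the uppercase letters A, C, G, T
    - each site mutates independently with probability
      mutation_rate * site_rates[i] per generation
    - a mutating site changes to one of the other three bases uniformly
    """
    rng = random.Random(seed)
    if site_rates is None:
        site_rates = [1.0] * len(seq)
    if len(site_rates) != len(seq):
        raise ValueError("site_rates must have one entry per site")

    current = list(seq)
    for _ in range(generations):
        for i, base in enumerate(current):
            if rng.random() < mutation_rate * site_rates[i]:
                current[i] = rng.choice([b for b in BASES if b != base])
    return "".join(current)


ancestor = "ATGGCCGCGTAA"  # made-up sequence, for illustration only
# A rate multiplier of 0.0 keeps the first three sites fully constrained.
rates = [0.0] * 3 + [1.0] * 9
print(simulate(ancestor, generations=100, mutation_rate=0.01,
               site_rates=rates, seed=1))
```

Setting a site's multiplier below 1.0 mimics stronger constraint at that site, and fixing the seed makes a run reproducible.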
- Non-coding sequences: composition, evolution, expression, function
- Participants: Lina Ma, Hao Wu, Gang Wu, Dawei Huang, Siqi Liu, Ang Li
- Prokaryotic genomes: three dnaE-based groups
- Participants: Hao Wu, Gang Wu, Dawei Huang, Lina Ma, Dong Zou, Ang Li
- Circadian genes: identification, clustering, expression
- Participants: Gang Wu, Hao Wu, Dawei Huang
- Substitution features & evolutionary models:
- Participants: Dawei Huang
We look forward to world-wide collaborations as well as comments, suggestions and guidance from colleagues and peers with common research interests.
Permission and Copyright of Images
Permission is required to use the above two images. High-resolution versions of these images are available upon request.