Ongoing Project
Last update: 2012-03-14
Binning method for metagenomic sequences
Researchers:
Sen-Lin Tang, Issam Saeed, Arthur Hsu and Saman Halgamuge (University of Melbourne)
Introduction:
My colleagues and I are developing binning methods for metagenomic sequences from microbial communities. Because microbial communities are complex, metagenomic sequences are often highly fragmented. Sorting the sequences by the microbial group they belong to therefore helps researchers analyze the functional, evolutionary and ecological relationships among the microbes in a community. We are one of several groups worldwide developing methods for binning metagenomic sequences according to their taxonomy. In 2006 and 2008, we presented an improved method, S-GSOM (seeded growing self-organizing map). The method allows us to accurately sort metagenomic sequences that are longer than 8 kb. With the seeding method, we can furthermore predict the sequence clusters with considerable accuracy. The methods are described in the following papers:

Chon-Kit Kenneth Chan, Arthur L. Hsu, Saman K. Halgamuge and Sen-Lin Tang, 2008, Binning sequences using very sparse labels within a metagenome, BMC Bioinformatics, 9:215, doi:10.1186/1471-2105-9-215.

Chon-Kit Kenneth Chan, Arthur L. Hsu, Sen-Lin Tang and Saman K. Halgamuge, 2008, Using Growing Self-Organising Maps to Improve the Binning Process in Environmental Whole Genome Shotgun Sequencing, Journal of Biomedicine and Biotechnology, 2008:513701.

Johannes Reinhard, Chon-Kit Kenneth Chan, Saman Halgamuge, Sen-Lin Tang and Rudolf Kruse, Region Identification on a Trained Growing Self-Organizing Map for Sequence Separation between Different Phylogenetic Genomes, BIOINFO 2005, Proceedings of the 2005 International Joint Conference of InCoB, AASBi and KSBI, pp. 124-129.

A new strategy in binning metagenomic sequences

We have developed another, even more robust strategy for binning metagenomic sequences. The details can be found in our publication. Here is the abstract of the paper:

An approach to infer the unknown microbial population structure within a metagenome is to cluster nucleotide sequences based on common patterns in base composition, otherwise referred to as binning. When functional roles are assigned to the identified populations, a deeper understanding of microbial communities can be attained, more so than gene-centric approaches that explore overall functionality. In this study, we propose an unsupervised, model-based binning method with two clustering tiers, which uses a novel transformation of the oligonucleotide frequency-derived error gradient and GC content to generate coarse groups at the first tier of clustering; and tetranucleotide frequency to refine these groups at the secondary clustering tier. The proposed method has a demonstrated improvement over PhyloPythia, S-GSOM, TACOA and TaxSOM on all three benchmarks that were used for evaluation in this study.
The proposed method is then applied to a pyrosequenced metagenomic library of mud volcano sediment sampled in southwestern Taiwan, with the inferred population structure validated against complementary sequencing of 16S ribosomal RNA marker genes. Finally, the proposed method was further validated against four publicly available metagenomes, including a highly complex Antarctic whale-fall bone sample, which was previously assumed to be too complex for binning prior to functional analysis.

I. Saeed, S-L. Tang, S.K. Halgamuge, "Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition", Nucleic Acids Research, 40(5):e34, 2012.
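
As a rough illustration of two of the base-composition features named in the abstract (GC content and tetranucleotide frequency), a minimal Python sketch follows. It is not the published method: the error-gradient transformation and the model-based clustering tiers are omitted, and the function names, the feature ordering and the handling of ambiguous bases are assumptions made only for this example.

```python
from itertools import product

# All 256 possible tetranucleotides in a fixed order.
# The ordering is an assumption for this sketch, not taken from the paper.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(TETRAMERS)}


def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def tetranucleotide_frequencies(seq):
    """Normalized counts of overlapping 4-mers.

    Windows containing anything other than A, C, G or T are skipped
    (an assumption of this sketch).
    """
    seq = seq.upper()
    counts = [0] * len(TETRAMERS)
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [c / total for c in counts]
```

Calling gc_content and tetranucleotide_frequencies on each contig gives one number and one 256-element vector per sequence. In a two-tier scheme like the one the abstract describes, features of the first kind would feed the coarse grouping and features of the second kind the refinement, although how the published method combines them is not shown here.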
Biodiversity Research Center, Academia Sinica - No.128, Academia Road, Sec.2, Nankang, Taipei 115, Taiwan
Copyright © 2008-2023 Sen-Lin Tang Microbial Lab. All Rights Reserved.