Primary research (Open Access)

Development of an integrated genome informatics, data management and workflow infrastructure: A toolbox for the study of complex disease genetics

Oliver S Burren, Barry C Healy, Alex C Lam, Helen Schuilenburg, Geoffrey E Dolman, Vincent H Everett, Davide Laneri, Sarah Nutland, Helen E Rance, Felicity Payne, Deborah Smyth, Chris Lowe, Bryan J Barratt, Rebecca CJ Twells, Daniel B Rainbow, Linda S Wicker, John A Todd, Neil M Walker* and Luc J Smink*

Author Affiliations

Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Wellcome Trust/MRC Building, Addenbrooke's Hospital, Cambridge, CB2 2XY, UK


Human Genomics 2004, 1:98-109  doi:10.1186/1479-7364-1-2-98

Published: 1 January 2004


The genetic dissection of complex disease remains a significant challenge. Sample tracking, together with the recording, processing and storage of high-throughput laboratory data alongside public domain data, requires the integration of databases, genome informatics and genetic analyses in an easily updated and scalable format. To find genes involved in multifactorial diseases such as type 1 diabetes (T1D), chromosome regions are defined on the basis of functional candidate gene content, human linkage information and animal model mapping information. For each region, genomic information is extracted from Ensembl, converted and loaded into ACeDB for manual gene annotation. Homology information is examined using ACeDB tools and the gene structure is verified. Manually curated genes are extracted from ACeDB and read into the feature database, which holds relevant local genomic feature data and an audit trail of laboratory investigations. Public domain information, manually curated genes, polymorphisms, primers, and linkage and association analyses, with links to our genotyping database, are displayed in Gbrowse. The system scales to include genetic, statistical, quality control (QC) and biological data, such as RNA or protein expression analyses, all linked from an integrative genomics display. It is applicable to any genetic study of complex disease, whether large or small in scale.

type 1 diabetes; complex disease; genome informatics; data management; genetics