Open Access Primary research

Gene selection and cancer type classification of diffuse large-B-cell lymphoma using a bivariate mixture model for two-species data

Yuhua Su1*, Dahlia Nielsen2, Lei Zhu3, Kristy Richards4, Steven Suter5, Matthew Breen6, Alison Motsinger-Reif7 and Jason Osborne7

Author Affiliations

1 Dr. Su’s Statistics, Department of Human Nutrition, Food, and Animal Sciences, University of Hawaii at Manoa, Honolulu, HI 96822, USA

2 Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695, USA

3 Biomarker and Predictive Analytics, GlaxoSmithKline, 5 Moore Drive, Research Triangle Park, NC 27709, USA

4 Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA

5 Department of Clinical Sciences, Center for Comparative Medicine & Translational Research, North Carolina State University, Raleigh, NC 27695, USA

6 Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, NC 27695, USA

7 Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA

Human Genomics 2013, 7:2  doi:10.1186/1479-7364-7-2

Published: 5 January 2013


A bivariate mixture model that uses information across two species was proposed to address the fundamental problem of identifying differentially expressed genes in microarray experiments. The utility of the model was illustrated using a dog and human lymphoma data set prepared by a group of scientists in the College of Veterinary Medicine at North Carolina State University. A small number of genes were identified as differentially expressed in both species, and the human genes in this cluster serve as a good predictor for classifying diffuse large-B-cell lymphoma (DLBCL) patients into two subgroups: germinal center B-cell-like DLBCL and activated B-cell-like DLBCL. The number of human genes found to be significantly differentially expressed in the two-species analysis (21) was very small compared with the number identified in the one-species analysis of the human data alone (190). These genes may be clinically relevant, as this small set achieved low misclassification rates for DLBCL subtypes. Additionally, the two subgroups defined by this cluster of human genes had significantly different survival functions, indicating that stratification based on gene-expression profiling with the proposed mixture model provided improved insight into the clinical differences between the two cancer subtypes.

Keywords: Mixture models; Gene expression; Homology; Lymphoma