About WeGET: Weighted Gene CoExpression Tool

WeGET is a computational tool for finding mammalian genes that strongly co-express with a human query gene set of interest. Currently, WeGET uses over one thousand human and murine microarray datasets in its analysis. Importantly, datasets are weighted by their relevance to the query genes.

Purpose

WeGET performs a computational analysis to find genes that co-express with a set of query genes across a large compendium of human and murine microarray experiments. The central idea is that when the query genes are involved in a common biological system (e.g. a pathway, process or function), other (possibly unknown) genes that strongly co-express with this set might also be relevant to that system. WeGET weights the datasets by their relevance to the query gene set and ranks all other genes by their degree of weighted co-expression. Finally, the human and murine ranks are integrated using a robust method based on rank-order statistics.
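The steps in this paragraph can be illustrated with a minimal sketch. It is not WeGET's implementation: the function names, the toy numbers and the use of a geometric mean of normalized ranks as the integration step are all assumptions made for illustration (the text only says that a robust rank-order statistic is used). The sketch also assumes that dataset weights are already available and that human and murine genes have been mapped to shared identifiers.

```python
from math import exp, log


def weighted_scores(coexpr, weights):
    """Weighted average co-expression per gene.

    coexpr[dataset][gene] is the gene's mean correlation with the query
    genes in that dataset; weights[dataset] is a non-negative weight.
    """
    totals, weight_sums = {}, {}
    for dataset, per_gene in coexpr.items():
        w = weights.get(dataset, 0.0)
        for gene, value in per_gene.items():
            totals[gene] = totals.get(gene, 0.0) + w * value
            weight_sums[gene] = weight_sums.get(gene, 0.0) + w
    return {g: totals[g] / weight_sums[g] for g in totals if weight_sums[g] > 0}


def to_ranks(scores):
    """Rank genes from highest to lowest score (rank 1 is best)."""
    ordered = sorted(scores, key=lambda g: (-scores[g], g))
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def combine_ranks(rank_tables):
    """Order shared genes by the geometric mean of their normalized ranks.

    This is a stand-in for the rank integration step, chosen for brevity.
    """
    shared = set.intersection(*(set(r) for r in rank_tables))
    combined = {}
    for gene in shared:
        log_sum = sum(log(r[gene] / len(r)) for r in rank_tables)
        combined[gene] = exp(log_sum / len(rank_tables))
    return sorted(combined, key=lambda g: (combined[g], g))


# Toy values for illustration only, not real WeGET output.
human_coexpr = {
    "H1": {"A": 0.9, "B": 0.2, "C": 0.5},
    "H2": {"A": 0.1, "B": 0.8, "C": 0.4},
}
human_weights = {"H1": 0.9, "H2": 0.1}
mouse_coexpr = {"M1": {"A": 0.7, "B": 0.1, "C": 0.6}}
mouse_weights = {"M1": 1.0}

human_scores = weighted_scores(human_coexpr, human_weights)
human_ranks = to_ranks(human_scores)
mouse_ranks = to_ranks(weighted_scores(mouse_coexpr, mouse_weights))
combined = combine_ranks([human_ranks, mouse_ranks])
print(combined)
```

In this toy example the highly weighted dataset H1 dominates the human scores, so gene A ranks first even though it co-expresses poorly with the query genes in the low-weight dataset H2.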

Weighting microarray datasets by relevance

Mammalian gene (co)expression is condition and tissue specific. When the query genes are strongly expressed and co-expressed in a dataset, this indicates that the microarray experiment is relevant to the biological system of interest. Therefore, when selecting co-expressing genes, WeGET assigns more weight to such datasets. Users can see which datasets were assigned the highest weights, and so check whether the experimental design of those experiments is indeed relevant to their query.
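As a rough illustration of the weighting idea, the sketch below scores a dataset by the mean pairwise Pearson correlation of the query genes within it, clipped at zero. This is an assumption made for illustration and not WeGET's actual weighting formula; in particular it looks only at co-expression and ignores the expression level of the query genes. The function names and the toy expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def dataset_weight(expression, query_genes):
    """Mean pairwise correlation of the query genes in one dataset, clipped at 0.

    expression[gene] is the list of expression values across the samples
    of a single dataset. Query genes missing from the dataset are skipped.
    """
    present = [g for g in query_genes if g in expression]
    pairs = list(combinations(present, 2))
    if not pairs:
        return 0.0
    mean_corr = sum(pearson(expression[a], expression[b]) for a, b in pairs) / len(pairs)
    return max(0.0, mean_corr)


# Toy datasets for illustration only.
datasets = {
    "relevant": {"Q1": [1, 2, 3, 4], "Q2": [2, 4, 6, 8], "Q3": [1, 3, 5, 7]},
    "irrelevant": {"Q1": [1, 2, 3, 4], "Q2": [4, 1, 3, 2], "Q3": [2, 4, 1, 3]},
}
query = ["Q1", "Q2", "Q3"]
weights = {name: dataset_weight(expr, query) for name, expr in datasets.items()}
print(weights)
```

Here the query genes move together in the first toy dataset and not in the second, so only the first receives a positive weight.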

Define your "query genes"

The query genes are a set of genes that have (or are thought to have) some biological relationship. For instance, they may be (or be expected to be) involved in the same pathway, expressed in the same tissue, or respond similarly to stress or perturbations. You can define a set of query genes by entering their Entrez IDs or gene symbols, and WeGET will provide the computational results by email.

Refine your "query genes"

Using all the microarray datasets, WeGET computes the relatedness between the query genes. It can happen that a few of the query genes do not co-express well with the other genes in the set. This might indicate that these genes are not (or only weakly) related to the underlying biological system of interest. WeGET shows the relatedness of the query genes graphically as a network, in which related genes cluster together and less related genes are more distant from this cluster.
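One simple way to spot poorly related query genes, without drawing a network, is to compare each gene's average co-expression with the rest of the set. The sketch below is only an illustration: the pairwise values, the gene names and the cutoff of 0.3 are hypothetical, and WeGET's own relatedness measure may differ.

```python
def mean_relatedness(pairwise, genes):
    """Average co-expression of each gene with all other genes in the set.

    pairwise maps frozenset({gene_a, gene_b}) to a co-expression value.
    """
    if len(genes) < 2:
        return {}
    result = {}
    for gene in genes:
        values = [pairwise[frozenset((gene, other))] for other in genes if other != gene]
        result[gene] = sum(values) / len(values)
    return result


# Toy pairwise co-expression values for illustration only.
genes = ["G1", "G2", "G3", "G4"]
pairwise = {
    frozenset(("G1", "G2")): 0.8,
    frozenset(("G1", "G3")): 0.7,
    frozenset(("G2", "G3")): 0.9,
    frozenset(("G1", "G4")): 0.1,
    frozenset(("G2", "G4")): 0.0,
    frozenset(("G3", "G4")): 0.2,
}

relatedness = mean_relatedness(pairwise, genes)
cutoff = 0.3  # hypothetical threshold, chosen only for this example
outliers = [g for g in genes if relatedness[g] < cutoff]
print(outliers)
```

A gene flagged this way (G4 in the toy data) is a candidate for removal when refining the query set.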

The Gene Ontology (GO) co-expression database

Gene Ontology (GO) is widely used to group genes involved in a common biological process, cellular component or molecular function. We used WeGET to precompute the weighted co-expression for all GO terms that have between 5 and 500 associated genes. If you have a set of genes that is enriched for certain GO terms, selecting a GO term here and looking up the rank of your genes of interest might give a first, quick idea of their relevance to this GO term. In particular, genes that are not annotated with the GO term might still show significant co-expression with the genes that are.
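The size filter and the rank lookup described here can be sketched as follows. The term names, gene names and ranking are placeholders, the function names are hypothetical, and treating the bounds of 5 and 500 as inclusive is an assumption.

```python
def eligible_terms(go_to_genes, min_size=5, max_size=500):
    """Keep GO terms whose number of associated genes lies within the bounds."""
    return {
        term: genes
        for term, genes in go_to_genes.items()
        if min_size <= len(genes) <= max_size
    }


def rank_of(gene, ranking):
    """1-based position of a gene in a ranked list, or None if it is absent."""
    return ranking.index(gene) + 1 if gene in ranking else None


# Placeholder annotations for illustration only.
go_to_genes = {
    "TERM_SMALL": ["g1", "g2", "g3"],
    "TERM_OK": ["g1", "g2", "g3", "g4", "g5"],
    "TERM_LARGE": [f"g{i}" for i in range(501)],
}
eligible = eligible_terms(go_to_genes)

# Placeholder precomputed ranking for one term (best co-expressing gene first).
precomputed_ranking = ["g2", "x7", "g1", "x3"]
print(list(eligible), rank_of("x7", precomputed_ranking))
```

In the toy ranking, x7 is not among the genes annotated with the term but still appears near the top, which is the situation the paragraph above describes.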

Robustness

To test how sensitive the final ranking is to the specific query gene set, WeGET performs a cross-validation analysis. For query sets of up to 50 genes, leave-one-out cross-validation is used; for larger sets we use 10-fold cross-validation. Because there is generally no gold standard of which genes are related, robustness is computed as the area under the rank-recall curve (AUC). A high AUC indicates that a large fraction of the query genes rank among the top genes when they are held out of the query gene list.
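The leave-one-out case can be sketched as below. This is an illustration under stated assumptions, not WeGET's code: the ranking function is a toy stand-in that orders genes by distance between made-up one-dimensional profiles, the AUC is taken as the mean recall over all rank cutoffs (so it lies between 0 and 1), and the 10-fold case for larger query sets is not shown.

```python
def held_out_ranks(query_genes, rank_fn):
    """Leave-one-out: rank of each query gene when it is left out of the query."""
    ranks = []
    for gene in query_genes:
        training = [g for g in query_genes if g != gene]
        ranking = rank_fn(training)  # candidate genes, best first
        ranks.append(ranking.index(gene) + 1)
    return ranks


def rank_recall_auc(ranks, n_candidates):
    """Mean recall over rank cutoffs 1..n_candidates."""
    total = 0.0
    for cutoff in range(1, n_candidates + 1):
        total += sum(1 for r in ranks if r <= cutoff) / len(ranks)
    return total / n_candidates


# Toy one-dimensional "profiles" for illustration only.
profiles = {
    "Q1": 1.0, "Q2": 1.2, "Q3": 0.8, "Q4": 4.0,
    "X1": 2.5, "X2": 3.0, "X3": 6.0, "X4": 8.0,
}


def toy_rank_fn(training):
    """Stand-in ranking: order non-training genes by mean distance to the training genes."""
    candidates = [g for g in profiles if g not in training]

    def mean_distance(gene):
        return sum(abs(profiles[gene] - profiles[t]) for t in training) / len(training)

    return sorted(candidates, key=lambda g: (mean_distance(g), g))


query = ["Q1", "Q2", "Q3", "Q4"]
ranks = held_out_ranks(query, toy_rank_fn)
n_candidates = len(profiles) - (len(query) - 1)  # genes ranked in each round
auc = rank_recall_auc(ranks, n_candidates)
print(ranks, auc)
```

In the toy data three query genes are recovered at rank 1 when held out, while Q4 is only recovered at rank 3, which lowers the AUC below 1.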

Citation

If you use WeGET in your work, please cite: Szklarczyk, R., Megchelenbrink, W., Cizek, P., Ledent, M., Velemans, G., Szklarczyk, D., and Huynen, M. A. (2015). WeGET: predicting new genes for molecular systems by weighted co-expression. Nucleic Acids Research, gkv1228.

Contact

For questions, please e-mail w.megchelenbrink[at]science.ru.nl