- Starting a search
Enter the query as gene symbols separated by spaces (Figure 1). The query can be a single gene or multiple genes (up to ~150). A multi-gene query should have some connection among its genes (such as coexpression) or be biologically coherent (for example, the genes describe a common biological process, function, or module, or they physically interact).
Figure 1: Home screen
- Viewing the retrieved genes and datasets
Expression View is the first view of the search results. Here, the query genes and their coexpressed neighbor genes are displayed, and a side-by-side comparison across datasets is shown (Figure 2).
Figure 2: Expression view
Datasets are ordered by relevance score to the query genes. Panel 1 shows the dataset panel, with dataset keywords, title, and relevance score. Users can mouse over it (see example).
Co-expressed genes (Panel 2) can be moused over to display the full gene name (see example). Click on a gene to view its description and co-expression score. SEEK derives a single integrated co-expressed gene ranking, since an integrated ranking is more reliable than one from a single dataset. The integration weights datasets differently depending on the query genes.
Use the export links at the bottom of the page to see the complete rank lists. For example, the gene-list export (Figure 3) shows a table with gene name, ENTREZ ID, co-expression score, gene P-value, and description. The P-value is estimated by comparing the observed gene rank with the gene's ranks in 10,000 random queries.
Figure 3: Export gene-list
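The P-value estimate described for the gene-list export can be illustrated with a short sketch. The function name, the add-one correction, and the idea that a lower rank number means a better rank are assumptions made for illustration; SEEK's actual procedure may differ in its details.

```python
def empirical_p_value(observed_rank, random_ranks):
    """Estimate how often a random query ranks the gene at least as well.

    observed_rank: rank of the gene for the real query (1 = best, assumed).
    random_ranks:  ranks of the same gene in a set of random queries.
    """
    if not random_ranks:
        raise ValueError("random_ranks must not be empty")
    # Count random queries in which the gene ranked as well or better.
    as_good = sum(1 for r in random_ranks if r <= observed_rank)
    # Add-one correction (an assumption here) keeps the estimate above zero.
    return (as_good + 1) / (len(random_ranks) + 1)
```

With 10,000 random queries, a gene that is matched or beaten in only a handful of them gets a P-value on the order of 0.001 or smaller.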
- Gene-enrichment analysis
Click on "Enrichment of genes" to biologically interpret retrieved co-expressed genes. SEEK performs a hypergeometric test based on a selected number of top retrieved genes and a selected functional database (Figure 4).
Figure 4: KEGG pathway enrichment
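As a rough illustration of the hypergeometric test behind the enrichment analysis, the sketch below computes the upper-tail probability of seeing at least a given overlap between the top retrieved genes and one functional category. The function and parameter names are hypothetical, and the choice of gene universe and any multiple-testing correction are not covered here.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """P(X >= overlap) under the hypergeometric distribution.

    population: number of genes in the background (assumed gene universe)
    annotated:  background genes belonging to the category (e.g. a pathway)
    selected:   number of top retrieved genes being tested
    overlap:    selected genes that belong to the category
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

A small P-value suggests that the category is over-represented among the selected top genes relative to the background.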
- Limiting user’s search to tissue or disease related datasets
By default, SEEK searches through the entire compendium to discover relevant datasets and co-expressed genes. Users can, however, limit the scope of the search to certain disease, cell, or tissue categories. This is helpful if users want to view expression only in a given context.
To do so, first search the query globally (i.e., enter query in the search box on main page). Then on the result page, click on "Refine Search" located next to "Enrichment of genes". A dialog box will appear. Choose "Limit datasets by tissue/cell/disease type". Then choose among the tissue or disease categories displayed on the page ("brain" is selected in Figure 5 below). Once the categories are confirmed, click "Refine".
Figure 5: Limit datasets to brain
SEEK will now perform the current query utilizing only datasets related to user-selected categories (brain).
- Checking up- and down-regulated conditions
First, search query "GLI1 GLI2 PTCH1", then refine search to "Brain" category of datasets. The following result page (Figure 6) is displayed:
Figure 6: GLI1 GLI2 PTCH1 query in brain datasets
Let's focus on the first dataset on the result page. The dendrogram at the top of the heat-map is made by grouping the conditions in the dataset based on the expression of the top 100 retrieved genes. There are 3 large groups in the first dataset. The blue group contains down-regulated conditions and the red group contains up-regulated conditions. Click on the row corresponding to "SMO" in the heat-map to see the condition-specific view (Figure 7).
Figure 7: Condition-specific view
This condition-level view allows users to see the conditions in the selected dataset. Users can compare the expression pattern of the selected gene "SMO" with that of the query (GLI1, GLI2, PTCH1). They can also see that the up-regulation (red) corresponds to medulloblastoma and rhabdoid tumors, while the down-regulation (blue) corresponds to normal brain and control. Mouse over a condition label to read more about the condition (see example). Users can also choose which attributes to display and which to sort conditions by.
- Viewing the co-expression landscape
First, search query "GLI1 GLI2 PTCH1", then refine search to "Brain" category of datasets. In the result page, click on "Co-expression" tab on the top right.
This is the co-expression landscape across 50 datasets (Figure 8). Each cell shows the coexpression score of the gene in a dataset.
Panel 1 shows the co-expression among the query genes themselves, based on how well each retrieves the rest of the query (this is displayed in the heat-map).
Panel 2 shows the co-expression score of each retrieved gene to the query. The co-expression is normalized to z-scores between -3 and +3 (negative: negative correlation, positive: positive correlation).
Figure 8: Co-expression view
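To make the z-score scale of Panel 2 concrete, here is a minimal sketch that standardizes a list of raw co-expression scores and bounds the result to the range -3 to +3. Clipping at the limits and the use of the population standard deviation are assumptions for illustration; the text above does not say how SEEK keeps the values within that range.

```python
from statistics import mean, pstdev


def clipped_z_scores(scores, limit=3.0):
    """Standardize scores and clip them to [-limit, +limit]."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores are identical, so no gene stands out.
        return [0.0 for _ in scores]
    return [max(-limit, min(limit, (s - mu) / sigma)) for s in scores]
```

Under this convention, a value near +3 marks a gene strongly positively correlated with the query in that dataset, and a value near -3 marks a strongly anti-correlated one.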
In the "Genes (1-100)" tab, go to "Genes (17601-17700)". This displays the anti-correlated genes located at the bottom of the gene ranking (see example).
- Checking the coherence of query genes
Because users may not know whether the query genes are coexpressed, they can use Panel 1 in Figure 8 to check the coherence of the query. An example is provided here. Each colored cell represents the cross-validation of a query gene in a dataset, meaning the ability of this gene to retrieve the remaining query genes based on co-expression. In this example, some of the query genes (EXT2, SGCE, VCL, etc.) appear to have no co-expression with the rest of the query across datasets, perhaps because little transcriptional regulation involves these genes. Users can thus remove these genes to construct a more coherent, improved query.
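The idea of checking each query gene against the rest of the query can be sketched offline for a single dataset. The version below is a simplified stand-in, not SEEK's cross-validation score: it uses the mean Pearson correlation between one query gene and the other query genes, and the function names and input layout (a dictionary mapping gene symbol to expression values across the same conditions) are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (vx * vy) ** 0.5


def query_coherence(expression):
    """Map each query gene to its mean correlation with the other query genes.

    expression: dict of gene symbol -> list of expression values.
    Requires at least two genes.
    """
    genes = list(expression)
    return {
        g: mean(pearson(expression[g], expression[h]) for h in genes if h != g)
        for g in genes
    }
```

Genes whose score stays near zero or negative across many datasets are candidates for removal from the query, in the same spirit as the example above.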