- When should I use the Refine Search function?
Refine Search is designed to filter the dataset selection after a query has been performed. This function is helpful for single-gene queries or small queries (<5 genes), which may not provide adequate information for the dataset weighting algorithm to find relevant datasets. It serves a complementary role to SEEK's weighting in situations where the query's coexpression is very noisy and heterogeneous across the datasets. For such small queries, the solution that SEEK recommends, short of expanding the query gene set, is Refine Search, an option that allows users to focus their search on a more coherent dataset category (such as a tissue, a disease, or even general categories such as cancer or non-cancer).
Essentially, this function provides metadata-assisted search. For example, if a user knows that a query is associated with a rare concept such as "pancreatic beta cell", which the automatic algorithm may have trouble finding based on the query genes alone, then refining selection to "beta cell" datasets would help guide SEEK to produce more accurate results.
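The idea of metadata-assisted search can be illustrated with a minimal Python sketch. It is not SEEK's implementation: the dataset names, gene names, scores, and the `refine` and `rank_genes` helpers are all hypothetical, and an unweighted mean stands in for SEEK's dataset weighting. It only shows how restricting the search to one category can change which genes rank highest.

```python
from statistics import mean

# Hypothetical toy data (all names and values are made up for illustration):
# each dataset carries metadata categories and per-gene coexpression scores
# with the query gene.
DATASETS = {
    "ds_beta_1": {"categories": {"beta cell", "Non-Cancer"},
                  "scores": {"GENE_A": 0.9, "GENE_B": 0.2}},
    "ds_beta_2": {"categories": {"beta cell", "Non-Cancer"},
                  "scores": {"GENE_A": 0.7, "GENE_B": 0.4}},
    "ds_tumor_1": {"categories": {"Primary Cancer Tumor"},
                   "scores": {"GENE_A": 0.1, "GENE_B": 0.8}},
    "ds_tumor_2": {"categories": {"Primary Cancer Tumor"},
                   "scores": {"GENE_A": 0.2, "GENE_B": 0.9}},
}


def refine(datasets, category):
    """Keep only the datasets annotated with the given category."""
    return {name: d for name, d in datasets.items()
            if category in d["categories"]}


def rank_genes(datasets):
    """Rank genes by their mean score over the given datasets.

    An unweighted mean is an assumption made for this sketch; it is a
    stand-in for a real dataset weighting scheme.
    """
    genes = set()
    for d in datasets.values():
        genes.update(d["scores"])
    averaged = {
        g: mean(d["scores"][g] for d in datasets.values() if g in d["scores"])
        for g in genes
    }
    # Highest mean first; ties broken alphabetically for a stable order.
    return sorted(averaged, key=lambda g: (-averaged[g], g))


all_ranking = rank_genes(DATASETS)
beta_ranking = rank_genes(refine(DATASETS, "beta cell"))
```

In this toy data the tumor datasets pull `GENE_B` to the top when all datasets are used, while refining to "beta cell" datasets puts `GENE_A` first.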
- Examples of some queries analyzed using the Refine Search function
STAT4 (Signal transducer and activator of transcription 4) in Multi-tissue profiling datasets shows elevated expression in CD8 and CD4 T cells compared to other cell types.
STAT4 in Non-Cancer datasets correctly prioritizes datasets related to systemic inflammatory response syndrome, sepsis, and inflammation.
STAT4 in (Variation): Disease state datasets identifies coexpressed genes separating chronic lymphocytic leukemia patients from conventional mantle cell lymphoma patients (GSE16455), and septic shock patients from a normal control group (GSE26440).
- How do I use the Refine Search function?
Click on the Refine Search link at the bottom of the search result page. A window (Figure 1) will pop up.
Figure 1: Refine Search (step 1)
Choose the first option, "tissue/cell/disease types, major categories", which is the most popular.
Figure 2: Refine Search (step 2)
You will see three types listed in the table (Figure 2):
- (Category): Major categories, with explanations of some categories below:
  - Leukemia: leukemia, lymphoma
  - Non-Cancer: anything not associated with cancer or leukemia
  - Non-Cancer, Others: non-cancer other than blood cells, brain, muscle, fat, and stem cells
  - Primary Cancer Tumor: primary tumor datasets from TCGA and GEO
  - Multiple Tissue Profiling: 13 datasets, each spanning many organs, tissues, cell types, or cell lines (useful for checking expression)
- (Variation): Datasets that exhibit variation in cell line, cell type, or disease state characteristics (see Figure 3)
- Disease/Tissue/Cell types: more specific categories including AML, Acne, Adenocarcinoma, Adenovirus infection, etc (see Figure 3)
Figure 3: Refine Search (step 2 continued), variation and disease/tissue type categories
Next, choose the category of interest. There are many pages of categories, which you can browse using the navigation panel at the bottom (Figure 4, box 1). Alternatively, you can search for a category by entering a term such as "brain" in the search box (Figure 4, box 2). Once a category is selected, click on "Check Selection" (Figure 4, box 3).
Figure 4: Refine Search (step 3)
Finally, click on the "Refine" button.