Biological Case Studies
  1. Study a pathway of interest: the Hedgehog signaling pathway
    This example shows how SEEK can help users achieve three objectives:
    • Explore the Hedgehog (Hh) signaling pathway across the diverse datasets in the compendium
    • Find the disease states and cancer types in which Hh pathway genes are co-expressed (i.e. find datasets associated with the Hh pathway)
    • Find other candidate genes in this pathway

    The Hedgehog (Hh) pathway is a major developmental and cancer pathway. It is perturbed in cancer patients, likely as a result of mutations. The pathway depends on the ligands SHH, DHH, and IHH; upon ligand binding, it produces the transcription factors GLI1 and GLI2, which then activate a wide range of downstream processes.

    To start exploring this pathway, we enter GLI1 GLI2 PTCH1 as the query. GLI1 and GLI2 are transcription factors and PTCH1 is a receptor protein; all three are markers of this pathway and central to its machinery.

    Figure 1 shows the result of this query. Panel 1 shows the datasets prioritized by the co-expression of the query genes (the top 4 datasets are shown in the figure). These prioritized datasets represent cancer studies in which the expression and co-expression of the pathway genes indicate the importance of Hh pathway activation. Hover over a dataset header to learn about the type of tissues or diseases studied (see example).

    Figure 1: Hh query GLI1 GLI2 PTCH1.

    For example, examining the top datasets simultaneously reveals Hh activation across a diverse set of disease states, such as medulloblastoma, rhabdoid tumors, and small-cell lung carcinoma. Many of these have literature-confirmed associations with aberrant Hh signaling [1] [2] [3] [4].

    It is already known that Hh misregulation often results in constitutive activation of the pathway. Here we use the co-expression of the pathway genes GLI1/2 and PTCH1 as a proxy for pathway activity. Co-regulation of the Hh genes in this case indicates active pathway signaling. The retrieved datasets show pathway expression profiles consistent with activating Hh dysfunction.
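
    The idea of using query co-expression as a proxy for pathway activity can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not SEEK's actual scoring method, which this page does not describe: each dataset is scored by the mean pairwise Pearson correlation among the query genes, and the function names (pearson, query_coexpression, rank_datasets), the dataset names, and all expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def query_coexpression(dataset, query):
    """Mean pairwise correlation among the query genes present in a dataset."""
    present = [gene for gene in query if gene in dataset]
    pairs = list(combinations(present, 2))
    if not pairs:
        return 0.0
    return sum(pearson(dataset[a], dataset[b]) for a, b in pairs) / len(pairs)


def rank_datasets(compendium, query):
    """Order dataset names from most to least coherent for the query."""
    scores = {name: query_coexpression(ds, query) for name, ds in compendium.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy, made-up expression values: each gene maps to its values across 4 samples.
compendium = {
    "dataset_A": {  # query genes rise and fall together
        "GLI1": [1.0, 2.0, 3.0, 4.0],
        "GLI2": [1.1, 2.1, 2.9, 4.2],
        "PTCH1": [0.9, 2.2, 3.1, 3.8],
    },
    "dataset_B": {  # query genes vary independently
        "GLI1": [1.0, 2.0, 3.0, 4.0],
        "GLI2": [4.0, 1.0, 3.0, 2.0],
        "PTCH1": [2.0, 2.5, 1.0, 3.0],
    },
}
ranking = rank_datasets(compendium, ["GLI1", "GLI2", "PTCH1"])
```

    In this toy compendium, dataset_A is ranked ahead of dataset_B because its query genes vary together across samples, which is the pattern the text interprets as active pathway signaling.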

    Pinpointing the disease and cancer types associated with a pathway can be very useful. It can suggest a stratification of cancer patients based on pathway profiles, which may lead to treatment strategies that target the Hh pathway. Because SEEK searches across thousands of datasets, the co-expression landscape across diverse tissue and disease states can now be examined comprehensively.

    To address the third objective, panel 2 displays an integrated list of genes co-expressed with the query. These genes are predicted to be associated with Hh. SEEK retrieved many known members of the Hh machinery, such as SMO, HHIP, BOC, and PTCH2. One of the top-ranked genes that SEEK identified, KIF7 (rank 22 out of 17680), is the homolog of the Cos2 protein in Drosophila melanogaster and was recently verified experimentally as an Hh regulator [5] [6].
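
    One simple way to picture an integrated gene list is to average each gene's correlation to the query across datasets, giving more weight to datasets in which the query itself is coherent. The sketch below does this with hypothetical names (rank_genes, weights, UNRELATED) and made-up expression values; it is an assumption-laden illustration and not a description of how SEEK integrates its results.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_genes(compendium, weights, query):
    """Rank non-query genes by dataset-weighted mean correlation to the query."""
    totals, weight_sums = {}, {}
    for name, dataset in compendium.items():
        weight = weights.get(name, 0.0)
        present = [q for q in query if q in dataset]
        if weight <= 0 or not present:
            continue  # skip datasets with no weight or no query genes
        for gene, profile in dataset.items():
            if gene in query:
                continue
            # Average correlation of this gene to the query genes in this dataset.
            score = sum(pearson(profile, dataset[q]) for q in present) / len(present)
            totals[gene] = totals.get(gene, 0.0) + weight * score
            weight_sums[gene] = weight_sums.get(gene, 0.0) + weight
    final = {gene: totals[gene] / weight_sums[gene] for gene in totals}
    return sorted(final, key=final.get, reverse=True)


# Toy, made-up values; "UNRELATED" is a hypothetical placeholder gene.
compendium = {
    "dataset_A": {
        "GLI1": [1.0, 2.0, 3.0, 4.0],
        "PTCH1": [1.2, 1.9, 3.2, 3.9],
        "SMO": [0.8, 2.1, 2.9, 4.1],
        "UNRELATED": [3.0, 1.0, 4.0, 2.0],
    },
    "dataset_B": {
        "GLI1": [2.0, 4.0, 1.0, 3.0],
        "PTCH1": [2.1, 3.8, 1.2, 3.1],
        "SMO": [1.9, 4.2, 0.8, 2.9],
        "UNRELATED": [1.0, 2.0, 3.0, 4.0],
    },
}
weights = {"dataset_A": 0.7, "dataset_B": 0.3}  # assumed dataset weights
ranked = rank_genes(compendium, weights, ["GLI1", "PTCH1"])
```

    Here SMO ranks first because it tracks the query genes in both toy datasets, while the placeholder gene does not. A rank such as "22 out of 17680" in the text is the position of a gene in a list of this kind.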

  2. Study a differentially expressed gene-set and glean the underlying pathways and processes

    Investigators often wish to know which biological processes and pathways underlie a differentially expressed gene-set generated from an independent microarray or RNA-seq study. For various reasons, however, gene enrichment analysis sometimes finds no pathways, or fails to detect the relevant ones. This can be due to factors such as heterogeneity of the gene-set, biological noise in the data, or too few genes for enrichment analysis. SEEK offers an alternative: first perform a co-expression expansion of the gene-set.

    For example, we have a set of 10 genes that represent biomarkers for the ERBB2 subtype of breast cancer (obtained from [7]). Gene-set enrichment analysis on these 10 genes did not yield any significantly enriched processes.

    We query these 10 genes in SEEK. Figure 2 shows the results.

    Figure 2: 10-gene ERBB2 query.

    SEEK ranks several independent breast cancer studies highly among the thousands of studies stored in the compendium. This is reassuring, given that the gene-set is derived from breast cancer transcriptomic experiments. Investigators can examine these datasets to learn about the experimental design, selection of patient subjects, and clinical characteristics of the patients in these related studies.

    More importantly, through the co-expressed genes we can now discover several important biological processes underlying the ERBB2+ breast cancer subtype. For example, we found enrichment for endocytosis in this subtype, which is supported by literature evidence [8] (Figure 3).
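
    To see why expanding a small gene-set can rescue an enrichment analysis, consider a one-sided hypergeometric test, a common choice for over-representation analysis. Whether SEEK's enrichment module uses this exact test is an assumption. The function name hypergeom_pvalue, the term size, and the overlap counts below are made up for illustration; the background of 17680 genes is taken from the rank denominators quoted elsewhere on this page.

```python
from math import comb


def hypergeom_pvalue(overlap, set_size, term_size, background):
    """P(X >= overlap) for a gene-set of set_size drawn from background genes,
    of which term_size are annotated to the term being tested."""
    upper = min(set_size, term_size)
    total = comb(background, set_size)
    tail = sum(
        comb(term_size, k) * comb(background - term_size, set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


BACKGROUND = 17680  # number of ranked genes, as quoted in the text
TERM_SIZE = 200     # hypothetical number of genes annotated to one process

# Original 10-gene set: 1 annotated gene (made-up count).
p_small = hypergeom_pvalue(1, 10, TERM_SIZE, BACKGROUND)

# Expanded set of the 100 top co-expressed genes: 12 annotated (made-up count).
p_expanded = hypergeom_pvalue(12, 100, TERM_SIZE, BACKGROUND)
```

    With these illustrative counts, a single annotated gene among 10 is unremarkable, whereas 12 annotated genes among 100 is far more than the roughly one expected by chance. The expansion gives the test enough genes to detect a process that the original set could not.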

    Figure 3: Enrichment for ERBB2 query.

    The idea of expanding a set of genes is not limited to differentially expressed gene-sets. It also applies to gene-sets derived from GWAS or other genetic screens. For GWAS, investigators can enter disease-associated genes into SEEK to look for evidence of co-expression in existing datasets, and to identify disease-associated chromosomal locations that are enriched among the co-expressed genes. (This is what the MSigDB chromosomal location option of the Gene Enrichment analysis provides.)

  3. Find functionally related gene pairs involving the query

    The metalloproteinases MMP2 and MMP9, which function together to promote cell migration and break down the extracellular matrix, are often expressed at elevated levels in various types of cancer [9]. Investigators can use SEEK to find the substrates of these two enzymes and the proteins that they interact with.

    The results for this query (MMP2 and MMP9) show several collagens ranked highly (COL1A2, COL1A1, COL5A1), as well as fibronectin (FN1, rank 7 out of 17680). These findings make sense because collagens are degraded by MMPs [10], and fibronectin promotes the activation of MMPs by stimulating their secretion [11].

    Other proteins with experimental evidence of physical interaction with MMPs are also retrieved, such as the thrombospondins (THBS2 [11]: rank 37, THBS1 [11]: rank 131 out of 17680), the TIMP metallopeptidase inhibitors (TIMP1 [12]: rank 39, TIMP2 [13]: rank 103, TIMP3 [14]: rank 106 out of 17680), and SERPINF1 [15] (rank 16 out of 17680; also known as PEDF, a substrate of MMP2 and MMP9). In particular, the regulation of MMPs by SERPINF1 is important in the context of angiogenesis and has recently been described as a promising target for cancer therapy [16].