About EXPOLDB
Institute of Genomics and Integrative Biology

EXPOLDB is a resource for investigating natural variation in human gene expression. It aims to provide insights into gene regulation by linking gene expression data from microarray experiments with the distribution of cis modulators of transcription. It is the first systematic effort to collect gene expression data from microarrays and link them with the distribution of (TG/CA)n repeats. In this release, the database contains gene expression data for blood leukocytes from 13 normal human individuals (five pairs of monozygotic twins and three unrelated individuals), measured using HG-U95A oligonucleotide microarrays containing probes for ~10,000 genes.

EXPOLDB provides information on the 2,888 genes that were differentially expressed (signal log ratio > 1.585) among unrelated individuals and on the 212 genes differentially expressed in twins. Information on mean expression and variability (CV) can be accessed for the more than 5,000 genes that were expressed in blood and received a Present (P) call. EXPOLDB also provides information on the expression status of 542 known housekeeping genes. Because it links expression profiles with the distribution of (TG/CA)n repeats, the database can serve as a resource for examining the role of these repeats.
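Affymetrix signal log ratios are base-2 logarithms, so the 1.585 cut-off corresponds to roughly a threefold change in expression (log2 3 ≈ 1.585). The minimal Python sketch below illustrates the conversion. The helper names are hypothetical, and treating large negative ratios (decreases) as differential expression is an assumption, not something stated on this page.

```python
import math

# Cut-off quoted above for calling a gene differentially expressed.
SLR_THRESHOLD = 1.585


def fold_change_from_slr(slr: float) -> float:
    """Convert a base-2 signal log ratio into a fold change."""
    return 2.0 ** slr


def is_differentially_expressed(slr: float, threshold: float = SLR_THRESHOLD) -> bool:
    """Hypothetical helper: flag a gene whose |SLR| exceeds the cut-off.

    Assumption: decreases (negative SLR) are treated like increases.
    """
    return abs(slr) > threshold


print(round(math.log2(3), 3))                          # 1.585
print(round(fold_change_from_slr(SLR_THRESHOLD), 2))   # 3.0
print(is_differentially_expressed(2.0))                # True
```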

To make the results more comprehensive, the database also provides annotation, chromosomal location, cellular localization, Gene Ontology terms, the biochemical roles of the gene products, tissue-specific expression, and hyperlinks to other public databases.

Because the database incorporates both genotype and phenotype data, and can grow with additional data sets contributed by other investigators and from our own studies, it can serve as a unique resource for those who study the effects of repetitive sequences on gene expression.

Salient features of EXPOLDB

  • Examine the expression patterns of genes in blood (mean expression, variation, high, moderate or low).
  • Explore the differentially expressed genes in different datasets: Unrelated Individuals, Twins and Housekeeping genes.
  • Identify genes whose expression varies within a specified range.
  • Explore the expression status and differential expression of genes using annotation-based keyword search.
  • Explore the human chromosomes with respect to gene expression and variability.
  • Interrogate the distribution of (TG/CA)n and Alu repeats in a gene or a set of genes.
  • Find the expression and variability shown by a gene containing a known polymorphic repeat.
  • Explore biochemical pathways for variability in gene expression and the distribution of (TG/CA)n repeats.
  • Find repeats in a given nucleotide sequence using 'SimRep'.

Query Options

The information in EXPOLDB can be retrieved through the 'Query EXPOLDB' page, which offers two query categories:

I) Expression in Blood

II) Differentially Expressed Genes

The user can select either of these two query categories. Both query pages provide the following search options:

A. Chromosome:
This query option searches for the expression and variability of genes on "All" chromosomes or on a selected chromosome.

The default option returns a list of genes from "All" chromosomes; selecting a chromosome returns the genes on that chromosome only. The user can also select multiple chromosomes from the list box by holding down the Shift key.

B. Gene Symbol

This query option allows the user to search a given gene in this database.

The user should enter the HGNC-approved gene symbol in the search box, e.g. STAT6.

 

Wildcard searches are also supported. For example, genes whose symbols begin with STAT can be retrieved by entering "STAT*".

The wildcard option lets the user search for genes belonging to a closely related family; for example, RUNX* finds RUNX1, RUNX2 and RUNX3.

To retrieve information on multiple genes, the user can enter several gene symbols separated by spaces.
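The wildcard and multi-symbol behaviour can be sketched with Python's standard `fnmatch` module. This is an illustration under assumptions, not EXPOLDB's own code: the symbol list is hypothetical, matching is assumed to be case-insensitive, and `fnmatch` also treats '?' and '[...]' as wildcards, which the EXPOLDB form may not do.

```python
from fnmatch import fnmatchcase


def match_symbols(query: str, symbols: list[str]) -> list[str]:
    """Return the symbols matching any space-separated term in `query`.

    '*' acts as a wildcard; matching is case-insensitive (an assumption).
    """
    terms = query.upper().split()
    return [s for s in symbols if any(fnmatchcase(s.upper(), t) for t in terms)]


# Hypothetical in-memory list standing in for the database's gene symbols.
SYMBOLS = ["RUNX1", "RUNX2", "RUNX3", "STAT1", "STAT6", "TP53"]

print(match_symbols("RUNX*", SYMBOLS))       # ['RUNX1', 'RUNX2', 'RUNX3']
print(match_symbols("STAT6 TP53", SYMBOLS))  # ['STAT6', 'TP53']
```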

C. Gene Definition

This query option allows the user to search for genes on the basis of a keyword in their annotation. For example, a search for the keyword "protease" returns the list of genes whose annotation contains the word "protease". This query can help in finding genes on the basis of their function.
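A keyword search of this kind amounts to a substring match over each gene's annotation text. The sketch below assumes case-insensitive matching and uses made-up symbols and definitions. It also shows that the keyword can match anywhere in a definition, so "protease" finds protease inhibitors as well as proteases.

```python
def search_definitions(keyword: str, definitions: dict[str, str]) -> list[str]:
    """Return symbols whose definition contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [symbol for symbol, text in definitions.items() if needle in text.lower()]


# Hypothetical gene definitions, for illustration only.
DEFINITIONS = {
    "GENE_A": "serine protease 1",
    "GENE_B": "protease inhibitor 3",
    "GENE_C": "protein kinase C",
}

print(search_definitions("protease", DEFINITIONS))  # ['GENE_A', 'GENE_B']
```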

D. UniGene ID

Genes in the database can be searched by their UniGene Cluster ID. The user should enter the complete UniGene ID in the search box; wildcard searches are not available for this query.
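Because only complete IDs are accepted, the lookup behaves like an exact dictionary match. The sketch below assumes human cluster IDs of the form "Hs." followed by a number; the index contents and the helper name are hypothetical.

```python
import re
from typing import Optional

# Human UniGene cluster IDs have the form "Hs." followed by a number.
UNIGENE_HS = re.compile(r"Hs\.\d+")


def lookup_unigene(cluster_id: str, index: dict[str, str]) -> Optional[str]:
    """Exact-match lookup of a complete UniGene ID; wildcards are rejected."""
    cluster_id = cluster_id.strip()
    if not UNIGENE_HS.fullmatch(cluster_id):
        raise ValueError(f"not a complete human UniGene ID: {cluster_id!r}")
    return index.get(cluster_id)


# Hypothetical cluster-to-symbol index, for illustration only.
INDEX = {"Hs.12345": "GENE_A"}

print(lookup_unigene("Hs.12345", INDEX))  # GENE_A
print(lookup_unigene("Hs.1234", INDEX))   # None (no partial matches)
```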

E. Functional pathways

Information on 134 biochemical pathways from the KEGG and GenMAPP databases is available for the genes in this database. The user can retrieve information by choosing a pathway from the list box or by entering the name of a functional pathway; for example, entering 'glycolysis' returns the list of EXPOLDB genes belonging to that pathway. Because the pathway information is compiled from two different resources, a few pathways appear twice: glycolysis and gluconeogenesis are available both as 'Glycolysis_and_Gluconeogenesis' from GenMAPP and as 'Glycolysis/Gluconeogenesis' from KEGG, and the pentose phosphate pathway is available both as 'Pentose_phosphate_pathway' from GenMAPP and as 'Pentose phosphate pathway' from KEGG. By convention, KEGG pathway names use "/" (as in Glycolysis/Gluconeogenesis), whereas GenMAPP names use "_" to separate words. The reference list of the pathways incorporated in EXPOLDB is available in the 'Biochemical Pathway' section.
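Because the same pathway can appear under both a KEGG-style and a GenMAPP-style name, anyone merging query results may want a common key for the two. The sketch below uses a simple normalization heuristic assumed for illustration ('_' becomes a space, '/' becomes ' and '); it is not part of EXPOLDB, and the function names are hypothetical.

```python
import re


def normalize_pathway(name: str) -> str:
    """Reduce KEGG- and GenMAPP-style names to one comparable key.

    Heuristic: '_' (GenMAPP) becomes a space and '/' (KEGG) becomes ' and '.
    """
    key = name.replace("_", " ").replace("/", " and ")
    return re.sub(r"\s+", " ", key).strip().lower()


def find_pathways(keyword: str, names: list[str]) -> list[str]:
    """Return pathway names whose normalized form contains the keyword."""
    needle = normalize_pathway(keyword)
    return [n for n in names if needle in normalize_pathway(n)]


PATHWAYS = [
    "Glycolysis_and_Gluconeogenesis",  # GenMAPP
    "Glycolysis/Gluconeogenesis",      # KEGG
    "Pentose_phosphate_pathway",       # GenMAPP
    "Pentose phosphate pathway",       # KEGG
]

print(find_pathways("glycolysis", PATHWAYS))          # both glycolysis entries
print(len({normalize_pathway(n) for n in PATHWAYS}))  # 2
```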

The number of pathway genes returned by a query depends on whether probe sets for those genes are present on the HG-U95Av2 array and whether the genes received a Present (P) call in the array experiments. An example of the Glycolysis pathway is also available in the 'Biochemical Pathway' section.

F. Polymorphic repeat

A known polymorphic (TG/CA)n repeat can be searched by its marker ID to find the gene that contains this polymorphic marker. Either the D number (for example, "D12S1644") or the GenBank accession number (for example, "Z53110") can be entered in this box. No wildcard search is available for this query box.
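A client could decide which kind of identifier was entered before sending the query. The sketch below is hypothetical: it covers the common D-number convention (D, chromosome 1-22, X or Y, S, serial number) and the two classic GenBank accession layouts (one letter plus five digits, or two letters plus six digits). The exact validation EXPOLDB applies is not documented on this page.

```python
import re

# D number: D + chromosome (1-22, X or Y) + S + serial number, e.g. D12S1644.
D_NUMBER = re.compile(r"D(?:[1-9]|1\d|2[0-2]|X|Y)S\d+")
# Classic GenBank accessions: 1 letter + 5 digits (e.g. Z53110) or 2 letters + 6 digits.
GENBANK_ACC = re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")


def classify_marker_query(text: str) -> str:
    """Hypothetical helper: decide which kind of marker ID a query contains."""
    query = text.strip().upper()
    if "*" in query:
        raise ValueError("wildcards are not supported for this query")
    if D_NUMBER.fullmatch(query):
        return "d_number"
    if GENBANK_ACC.fullmatch(query):
        return "genbank_accession"
    raise ValueError(f"unrecognized marker ID: {text!r}")


print(classify_marker_query("D12S1644"))  # d_number
print(classify_marker_query("Z53110"))    # genbank_accession
```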

To find out whether a gene of interest contains a polymorphic repeat, the user can submit either its HGNC-approved symbol in the Gene Symbol input box or a keyword in the Gene Definition input box. The 'EXPOL profile' page of the gene reports any known polymorphic (TG/CA)n repeats, as obtained from the CEPH database
(ftp://ftp.cephb.fr/ceph_genotype_db/ceph_db/Ver_9/mkr/).


G. Variability

The variability of a gene's expression is measured in the database either by the Coefficient of Variation (CV) or by the Signal Log Ratio.
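As a rough illustration of the two measures: the CV is the standard deviation of a gene's signal divided by its mean, and the signal log ratio is a base-2 log ratio between two samples. The sketch below assumes the sample standard deviation (n - 1 denominator) and uses hypothetical signal values. Affymetrix software computes the signal log ratio from probe-pair data rather than from two summary signals, so the second function is only an approximation.

```python
import math
import statistics


def coefficient_of_variation(signals: list[float]) -> float:
    """CV = standard deviation / mean of a gene's signal across individuals.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    mean = statistics.mean(signals)
    return statistics.stdev(signals) / mean


def approx_signal_log_ratio(baseline: float, experiment: float) -> float:
    """Rough base-2 log ratio of two summary signals (approximation only)."""
    return math.log2(experiment / baseline)


print(round(coefficient_of_variation([90.0, 110.0]), 3))  # 0.141
print(round(approx_signal_log_ratio(100.0, 300.0), 3))    # 1.585
```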

Differentially expressed genes can be searched in the "Differential Expression" query form on the basis of their expression variability, measured as the signal log ratio. The user can:

1) Enter a range for the signal log ratio (the minimum available value is 1.6).

2) Keep the default option, "All", as for the other query fields.

3) Select one of the predefined signal log ratio ranges from the list.


The expression status of a gene and the variability of its expression (CV) can be queried with the "Expression in Blood" form. The user can:

 

1) Keep the default option, "All", as for the other query fields.

2) Select one of the predefined CV ranges from the list.

3) Enter a range for the CV (minimum 0, maximum 0.5).
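Both forms offer the same three choices (keep "All", pick a predefined range, or type a range), which can be modelled as an optional lower and upper bound checked against the limits quoted above. A minimal sketch, assuming inclusive bounds; the gene names and values are hypothetical, and the predefined list ranges are not reproduced because this page does not give them.

```python
from typing import Optional

# Limits quoted on this page for user-entered ranges.
CV_LIMITS = (0.0, 0.5)
SLR_MIN = 1.6


def filter_by_range(values: dict[str, float],
                    low: Optional[float] = None,
                    high: Optional[float] = None,
                    allowed: tuple[float, float] = (float("-inf"), float("inf"))) -> list[str]:
    """Return genes whose value lies in [low, high]; no bounds means "All"."""
    lo = allowed[0] if low is None else low
    hi = allowed[1] if high is None else high
    if not allowed[0] <= lo <= hi <= allowed[1]:
        raise ValueError(f"range [{lo}, {hi}] is outside {allowed}")
    return [gene for gene, v in values.items() if lo <= v <= hi]


# Hypothetical per-gene CVs and signal log ratios.
cvs = {"GENE_A": 0.05, "GENE_B": 0.20, "GENE_C": 0.45}
slrs = {"GENE_A": 1.7, "GENE_B": 2.4, "GENE_C": 3.1}

print(filter_by_range(cvs, 0.1, 0.5, allowed=CV_LIMITS))             # ['GENE_B', 'GENE_C']
print(filter_by_range(slrs, 2.0, allowed=(SLR_MIN, float("inf"))))   # ['GENE_B', 'GENE_C']
print(filter_by_range(cvs))                                          # ['GENE_A', 'GENE_B', 'GENE_C']
```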


H. Data Set
  • For the "Expression in Blood" query form, two dataset options are available: the user can query the expression status of all 5,407 genes or of the 542 human housekeeping genes.
  • For the "Differential Expression" query form, four dataset options are available: 'All', 'Housekeeping Genes', 'Unrelated individuals' and 'Twins'. The default option is "All".

I. Submitting the Query

The query can be submitted by clicking the Submit button at the end of the form.


Submitting a query displays a 'Result page' listing the genes that match it. For example, submitting the keyword "glycolysis" in the "Biochemical Pathway" query option with chromosome 11 selected returns the genes in that pathway located on chromosome 11.
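A query that combines several options, such as the pathway keyword and chromosome 11 example above, can be read as a filter in which every filled-in field must match. The sketch below assumes AND between fields and OR among multiple selected chromosomes (option A); the records and the `run_query` helper are hypothetical and do not reflect EXPOLDB's internal schema.

```python
from typing import Optional


def run_query(genes: list[dict], chromosomes: Optional[set[str]] = None,
              pathway_keyword: Optional[str] = None) -> list[str]:
    """Combine query fields with AND; an empty field behaves like "All".

    Several selected chromosomes are combined with OR.
    """
    hits = []
    for gene in genes:
        if chromosomes and gene["chromosome"] not in chromosomes:
            continue
        if pathway_keyword and not any(
            pathway_keyword.lower() in p.lower() for p in gene["pathways"]
        ):
            continue
        hits.append(gene["symbol"])
    return hits


# Hypothetical records, for illustration only.
GENES = [
    {"symbol": "GENE_A", "chromosome": "11", "pathways": ["Glycolysis/Gluconeogenesis"]},
    {"symbol": "GENE_B", "chromosome": "11", "pathways": ["Pentose phosphate pathway"]},
    {"symbol": "GENE_C", "chromosome": "1", "pathways": ["Glycolysis_and_Gluconeogenesis"]},
]

print(run_query(GENES, {"11"}, "glycolysis"))  # ['GENE_A']
print(run_query(GENES, None, "glycolysis"))    # ['GENE_A', 'GENE_C']
```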


Clicking a Gene Symbol opens a detailed description of the gene, including functional information, tissue expression, expression values, variability, and the distribution of (TG/CA)n and Alu repeats. The EXPOL profile page also provides hyperlinks to the source databases on the web to make the information more comprehensive.


SimRep, an online tool developed in Perl, identifies dinucleotide and other simple repeats in a given nucleotide sequence. The sequence must be submitted in raw format, i.e. a plain sequence containing only the nucleotides A, T, G and C, without any spaces.

The user can search for a perfect dinucleotide repeat by selecting it from the list box, or can enter a specific repeat (maximum 6 nucleotides) in the "Other Simple Repeat" field. The cut-off for scoring a repeat must be specified in the "Enter Cut-Off" field; only repeats with at least the specified number of units are reported. The program searches for the given repeat or pattern on both strands and reports "+" when the repeat is on the given strand and "-" when it is on the complementary strand.
SimRep reports, in a table, the number of times the repeat has occurred in the sequence, together with the start and end positions of the repeat in the nucleotide sequence.
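SimRep itself is written in Perl and its source is not shown here. The Python sketch below only illustrates the behaviour described above, under these assumptions: lowercase input is accepted, a repeat is a perfect tandem run of the unit, the cut-off counts units, and a hit on the complementary strand is found by searching the given strand for the reverse complement of the unit, with positions reported on the given strand. `find_repeats` and `RepeatHit` are hypothetical names. For self-complementary units such as AT, this sketch reports the same run on both strands.

```python
import re
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class RepeatHit:
    motif: str
    strand: str  # "+" = given strand, "-" = complementary strand
    units: int   # number of tandem copies of the motif in this run
    start: int   # 1-based start position on the submitted sequence
    end: int     # 1-based inclusive end position on the submitted sequence


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_repeats(seq: str, motif: str, cutoff: int) -> list[RepeatHit]:
    """Report perfect tandem runs of `motif` with at least `cutoff` units."""
    seq, motif = seq.upper(), motif.upper()
    if not re.fullmatch(r"[ACGT]+", seq):
        raise ValueError("sequence must be raw A/C/G/T with no spaces")
    if not re.fullmatch(r"[ACGT]{1,6}", motif):
        raise ValueError("repeat unit must be 1-6 nucleotides")
    if cutoff < 1:
        raise ValueError("cut-off must be at least 1 unit")
    hits = []
    # A run of the motif on the complementary strand shows up on the given
    # strand as a run of its reverse complement (e.g. TG on "-" reads as CA).
    for strand, unit in (("+", motif), ("-", reverse_complement(motif))):
        for m in re.finditer(f"(?:{unit}){{{cutoff},}}", seq):
            hits.append(RepeatHit(motif, strand, len(m.group()) // len(unit),
                                  m.start() + 1, m.end()))
    return hits


seq = "AAA" + "TG" * 4 + "AAA" + "CA" * 3 + "GG"
for hit in find_repeats(seq, "TG", cutoff=3):
    print(hit.strand, hit.units, hit.start, hit.end)
# + 4 4 11
# - 3 15 20
```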


A list of the HGNC gene symbols (version Feb 2006) of the human genes available in EXPOLDB can be accessed via the Gene List link.

 


©2003 IGIB