About EXPOLDB HelpDesk |
“Expression Linked Polymorphism Database (EXPOLDB): A resource for linking genome-wide expression with cis modulators of transcription in the Human Genome”
The availability
of the human genome sequence and the parallel accumulation of microarray
data offer new opportunities to examine the role of repetitive sequences as
modulators of gene expression underlying the variation in expression.
Therefore, to gain insight into gene regulation in humans, a necessary
prerequisite is to link gene expression data from microarray experiments
with the distribution of cis modulators of transcription such as the dinucleotide (TG/CA)n repeats. At present there is no database linking genome-wide expression
with the distribution of (TG/CA)n and other simple repeats.
Tetrapodic layout of EXPOLDB: The four domains of the layout are shown in boldface type. Note that all attributes of a gene are linked uniquely to its official HGNC symbol, which serves as the primary key. |
| Database and web interface |
The backend data was prepared in MS Access 2000 (Microsoft Corporation, Inc. USA). Server-side scripting was written in ASP (Active Server Pages, version 3.0), PHP (PHP: Hypertext Preprocessor, version 5.0) and Perl (Practical Extraction and Report Language, version 5.8.1). Client-side scripting was written in JavaScript and HTML (HyperText Markup Language, version 4.0). Internet Information Server (IIS) version 6.0 was used as the web server.
|
| EXPOLDB as a resource to examine natural variation in population |
Natural
variation in gene expression between healthy human individuals has been
largely unexplored. The variation in gene expression is an outcome of
the complex inter-play of genetic polymorphisms (acting in cis or in trans),
physiological variations (such as time of day, gender) and environmental
factors. In order to understand the genetic basis of variation in gene
expression between normal human individuals, we need to obtain genome-wide
expression data from various populations. We examined gene expression
profiles in blood leukocytes of 13 normal human individuals, comprising five pairs of monozygotic
twins and three unrelated individuals, measured using
HG U95Av2 oligonucleotide microarrays containing probes for ~10,000 genes.
A total of 5,407 genes were found to be expressed in blood leukocytes. Of these, 2,888 genes were found to be differentially expressed in pairwise comparisons between unrelated individuals
and 212 genes in pairwise comparisons between monozygotic twins. Information on mean expression and variability (CV) for human housekeeping genes that had a present (P) call in any of the arrays used is also provided in EXPOLDB. This database is likely to be a useful resource for those interested in studying natural variation in humans. |
| Mean expression and Measures of variability |
The mean expression of each gene was computed from the log10 transformed 'signal' values with P calls across all 13 arrays. All genes considered for mean
expression had a present (P) call in the arrays. Replicate probe sets, if
present, were clustered together and averaged. A total of 5,407 genes were
found to be expressed in blood leukocytes using these criteria. For these genes, the coefficient
of variation (CV) was computed as SD/Mean, where SD is
the standard deviation of the log10 transformed 'signal' values across the 13 arrays.
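The computation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used to build EXPOLDB: the function names and the input values are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which form of SD was used.

```python
import math
from statistics import mean, stdev


def log10_signals(signals, calls):
    """Log10-transform signal values, keeping only arrays with a present (P) call."""
    return [math.log10(s) for s, c in zip(signals, calls) if c == "P"]


def mean_and_cv(log_values):
    """Return (mean, CV), where CV = SD / mean of the log10 values.

    Assumption: SD is the sample standard deviation (n - 1 denominator).
    """
    m = mean(log_values)
    return m, stdev(log_values) / m


# Hypothetical signal values for one gene on four arrays (not EXPOLDB data).
signals = [100.0, 1000.0, 100.0, 1000.0]
calls = ["P", "P", "P", "P"]

gene_mean, gene_cv = mean_and_cv(log10_signals(signals, calls))
print(f"mean = {gene_mean:.2f} log10 units, CV = {gene_cv:.3f}")
```

Because the CV is computed on log10 values rather than raw signals, it is not directly comparable with CVs reported on the raw signal scale.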
Examining the role of (TG/CA)n repeats as modulators of gene expression |
| (TG/CA)n repeats as cis modulator of transcription |
About 50% of the human genome consists of repetitive elements comprising simple repeats, short interspersed nucleotide elements (SINEs), medium reiteration repeats, long terminal repeats and long interspersed nucleotide elements (LINEs). Among the dinucleotide repeats, (TG/CA)n is the most frequent in the human genome, and many of these repeats exhibit length polymorphism. This property has been extensively used in the construction of genetic maps. (TG/CA)n repeats, which have alternating purine/pyrimidine sequences, have a propensity to undergo conformational transition on methylation under physiological conditions.
(TG/CA)n repeats can influence transcription of a gene in cis (variations within the gene) through the ‘incidence’ or ‘secondary elongation’ of these repeats, either within or in close proximity to the gene. The functional roles of (TG/CA)n repeats are beginning to emerge. Since the early observation of the modulation of transcription by (TG)n tracts by Hamada et al. (1984), experimental evidence for the role of (TG/CA)n repeats in the regulation of gene expression has been steadily accumulating. Up-regulation or down-regulation of transcription by (TG/CA)n repeats has been reported for the following genes: rat alpha-lactalbumin (Meera et al. 1989), rat prolactin (Naylor et al. 1990), acetyl-CoA carboxylase ACC (Tae et al. 1994), matrix metalloprotease MMP-9 (Shimajiri et al. 1999), gamma interferon IFN-gamma (Pravica et al. 1999), epidermal growth factor receptor EGFR (Gebhardt et al. 1999), salt sensitivity HSD11B2 (Agarwal et al. 2000), and tilapia Prolactin1 (Streelman et al. 2002). Expression differences in these genes due to polymorphic (TG/CA)n repeats vary from one gene to another, span a wide range, and in some cases are reported to be as high as 20-fold. While the effects induced by microsatellite variation may be complicated by the presence of other transcriptional regulatory elements in the proximity of a given gene, these observations underscore the importance of (TG/CA)n repeats and their polymorphisms in gene regulation. A list of more examples from the literature illustrating the role of these repeats as modulators of gene expression is given below. "Examples from Literature"
Recently, several reports have described
the association of polymorphism in (TG/CA)n repeats with genetic diseases,
such as coronary heart disease (eNOS, endothelial
nitric oxide synthase; Laule et al. 2003), diabetic retinopathy (ALR2,
aldose reductase; Kumaramanickavel et al. 2003), asthma (IFN-gamma,
gamma interferon; Nagarkatti et al. 2002) and breast cancer (IGF-I,
insulin-like growth factor-I; Yu et al. 2001).
Case studies to examine the role of (TG/CA)n repeats as modulators of gene expression |
| Housekeeping genes |
Housekeeping genes are expressed constitutively in
all tissues to maintain cellular functions and are hence used as controls
when examining the expression of other genes. The housekeeping genes are
less likely to be affected by variations in tissue-specific factors, by the
number of different cell types in blood leukocytes (if similar quantities
of total RNA are taken) and by other alterations in chromatin
structure that may vary between different individuals. The clustering
of housekeeping genes in the human genome supports the above rationale
and indicates that it may be advantageous to assemble them in a common
region that remains in an open conformation across all the cells (Lercher
et al., 2002).
|
| The RUNX family |
The mammalian RUNX genes comprise a small
family of three genes, RUNX1, RUNX2 and RUNX3, that act as master
regulators of gene expression in major developmental pathways (Levanon
et al. 2003). They contain a highly conserved region designated ‘runt
domain’ (RD), found in the Drosophila gene Runt. RUNX1 and
RUNX2 play fundamental roles in organogenesis and are associated
with human diseases. Only recently has RUNX3 become the focus
of investigations. Sequence analysis suggests that RUNX3 is
the evolutionary founder of the mammalian RUNX family (Bangsow
et al. 2001) and that RUNX1 and RUNX3 have similar architecture. In adults, both
RUNX1 and RUNX3 are highly expressed in the hematopoietic
system with high levels of mRNA and proteins in spleen, thymus and blood
(Levanon et al. 1994, 1996; Meyers et al. 1996; Le et al. 1999; Levanon
et al. 2003). Thus, the RUNX family provides a set of genes
with similar architecture in which to investigate the effects of (TG/CA)n repeats
on expression.
The Expol profiles of these two genes indicate that RUNX1 has several (TG/CA)n repeats whereas RUNX3 does not have any (TG/CA)n repeats (n ≥ 6 units). RUNX1 has many long (TG/CA)n repeats (n ≥ 12 units): (CA)17, (CA)22 and (TG)12, (TG)13, (TG)14, (TG)21, (TG)23, (TG)24 in introns, and one interrupted (TG)7-CG-(TG)9 repeat in exon 8. The mean expression of RUNX3 (2.90 log10 signal units) is higher than the mean expression of RUNX1 (2.37 log10 signal units). The difference was statistically significant (t-test, df = 13, P < 0.0002). The uniformity observed in the difference between the expression of RUNX3 and RUNX1 in all experiments suggests that the incidence of (TG/CA)n repeats in RUNX1 correlates with its generally observed reduced expression. These results corroborate previous experimental studies of interferon-gamma (IFN-gamma, (CA)10-15), epidermal growth factor receptor (EGFR, (CA)21, 14) and the salt sensitivity gene HSD11B2 ((CA)14, 23) (Pravica et al. 1999; Gebhardt et al. 1999; Agarwal et al. 2000), in which reduced expression levels correlate with either the presence (vs. absence) of repeats or an increase in repeat length. |
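To give a sense of scale for a difference expressed in log10 signal units, the short Python sketch below converts the two mean values quoted above into a fold difference. The function name is hypothetical, and the result should be read as a ratio of geometric-mean signals, because the means were taken over log10 transformed values.

```python
def fold_difference(log10_mean_a, log10_mean_b):
    """Fold difference in signal implied by two means on the log10 scale."""
    return 10 ** (log10_mean_a - log10_mean_b)


# Mean expression values quoted in the text (log10 signal units).
runx3_mean = 2.90
runx1_mean = 2.37

ratio = fold_difference(runx3_mean, runx1_mean)
print(f"RUNX3 / RUNX1 geometric-mean signal ratio: {ratio:.2f}")
```

A difference of 0.53 log10 units therefore corresponds to a ratio of roughly 3.4 on the original signal scale.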
| The Eukaryotic Initiation Factor housekeeping genes |
Five eukaryotic initiation factor genes, EIF3S5,
EIF3S6, EIF4A1, EIF4A2 and EIF4G2, were retrieved by submitting
EIF* in the Gene Symbol field and selecting the housekeeping genes dataset.
All these genes had a present (P) call in all array experiments. Of these,
only EIF3S6 has a perfect dinucleotide stretch, (TG)12, within
the gene. The remaining four genes did not have (TG/CA)n repeats (n
≥ 6 units). The mean expression of the EIF genes without
(TG/CA)n repeats was 3.27 log10 signal units, which is higher than the
mean expression of EIF3S6 (2.73 log10 signal units). The difference was statistically
significant (t-test, df = 43, P < 0.0001).
Table: Mean expression in different individual samples
These results provide leads for further experimental investigations
to correlate the ‘incidence’ or ‘secondary elongation’ of (TG/CA)n repeats
with the observed gene expression and demonstrate the usefulness of
EXPOLDB.
Studying genome wide expression pattern of genes |
| Examining gene expression and variability in Biochemical Pathways |
With biology shifting towards a systemic approach to understanding the complexity of human biology, biochemical pathways have become a focus of recent investigations. EXPOLDB offers a unique utility by providing information on gene expression and its variability across different individuals for 134 biochemical pathways.
This potential utility of EXPOLDB is illustrated here with the example of the glycolysis pathway, which is a basic source of energy in mammals. Submitting the keyword 'glycolysis' in the 'Functional Pathway' field of the query page 'Expression in Blood' retrieved the records for genes coding for enzymes involved in glycolysis and related linked pathways as outlined in the KEGG or GenMAPP databases. We examined the expression of ten known genes of this pathway involved in the conversion of glucose to pyruvate by selecting the 'square boxes' adjacent to the gene symbols. All ten genes of the glycolysis pathway were present in EXPOLDB and showed a coefficient of variation (CV) of expression below 0.15, indicating low variability in accordance with the most constant housekeeping genes described by Hsiao et al. 2002. Since the expression of the gene GAPDH, coding for glyceraldehyde-3-phosphate dehydrogenase, was found to be highly variable between different individuals [20,23], analysis using EXPOLDB provides information on the variability of additional genes of the glycolytic pathway that are worth examining for their low variability at large scale. If verified, some of these genes, if not all, could be used as internal controls in mRNA quantitation experiments. The potential utility of EXPOLDB illustrated through the example of the 'Glycolysis' pathway and the list of available pathways can be accessed through the following link. |
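The selection step in this example, keeping genes whose CV falls below 0.15, can be sketched as a simple filter over records keyed by gene symbol. This is a minimal illustration under stated assumptions: the function name, the placeholder gene symbols and the CV values are all hypothetical and are not taken from EXPOLDB.

```python
def low_variability_genes(cv_by_gene, threshold=0.15):
    """Return the gene symbols whose CV is below the threshold, sorted by name."""
    return sorted(gene for gene, cv in cv_by_gene.items() if cv < threshold)


# Hypothetical CV values keyed by placeholder gene symbols (not EXPOLDB data).
cv_by_gene = {"GENE_A": 0.05, "GENE_B": 0.22, "GENE_C": 0.149}

stable = low_variability_genes(cv_by_gene)
print(stable)
```

The threshold is a parameter so that the same filter can be rerun with a stricter or looser cut-off when screening candidate internal controls.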
| Identification of genes varying in twins and unrelated individuals |
Natural variation in gene expression between healthy human individuals has been the focus of recent studies (Cheung et al. 2003; Whitney et al. 2003). EXPOLDB houses information on the expression patterns of genes that vary in monozygotic twins and in unrelated individuals; these distinct datasets can be queried using the appropriate metric of variability (CV or signal log ratio) (Sharma et al. 2005). The genome-wide expression data from various individuals (within monozygotic twins and unrelated individuals) compiled in EXPOLDB provide an opportunity to understand the genetic basis of variation in gene expression between normal human individuals, and may serve as a resource for researchers in this field and aid systemic approaches. |
| References |
Click here to refer to the list of references. |
©2003 IGIB