Predicting Pathways for Unknown Genes Using Gene Expression Patterns in Yeast
Vandana Sreedharan,1 Olac Fuentes,2 and Stephen Aley1
(1) Bioinformatics Program and Department of Biological Sciences, The University of Texas at El Paso
(2) Computer Science Department, The University of Texas at El Paso
Abstract
The task of mapping genes to biological pathways is consequential, because every detail of a pathway must be understood in order to understand how it functions and where it can malfunction. Studies of gene expression patterns have revealed that a gene's expression is linked to the pathways to which the gene belongs and can be used to predict the gene's pathway. Because the relationship between expression patterns and pathways is complex, classifiers such as Support Vector Machines and Artificial Neural Networks can be used to predict pathways from expression patterns.
Yeast has many genes whose pathways are not known, so finding pathways for these unknown genes could be highly beneficial. Considering 17 major pathways, 639 microarray experiments, and 1169 genes in yeast, predictions were made with Artificial Neural Networks and Support Vector Machines, using both the complete data set and a reduced data set. Prediction and validation performance was somewhat pathway-dependent: the true positive rate ranged from 33% to 53% for the overall data but was around 67% for eight pathways, and the false positive rate ranged from 10% to 25% for the overall data set and from 1% to 15% for those eight pathways. The same models were then used to predict pathways for genes with no known pathway. These predictions were validated with two techniques, GO term analysis and BLAST. Most of the results agree with each other, indicating the pathways in which the unknown genes might be involved.
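The true and false positive rates quoted above are per-pathway, one-vs-rest quantities. A minimal sketch of how they can be computed, assuming each gene carries a single pathway label (the abstract does not say how genes in several pathways were handled); the pathway names and labels below are hypothetical:

```python
def per_pathway_rates(y_true, y_pred, pathways):
    """Return {pathway: (true_positive_rate, false_positive_rate)}.

    Each pathway is scored one-vs-rest; assumes one true label per gene.
    """
    rates = {}
    for p in pathways:
        tp = fn = fp = tn = 0
        for t, pred in zip(y_true, y_pred):
            if t == p:
                if pred == p:
                    tp += 1
                else:
                    fn += 1
            elif pred == p:
                fp += 1
            else:
                tn += 1
        tpr = tp / (tp + fn) if tp + fn else float("nan")
        fpr = fp / (fp + tn) if fp + tn else float("nan")
        rates[p] = (tpr, fpr)
    return rates


# Hypothetical toy labels, not data from the study.
truth = ["glycolysis", "glycolysis", "TCA cycle", "TCA cycle"]
predicted = ["glycolysis", "TCA cycle", "TCA cycle", "TCA cycle"]
print(per_pathway_rates(truth, predicted, ["glycolysis", "TCA cycle"]))
# {'glycolysis': (0.5, 0.0), 'TCA cycle': (1.0, 0.5)}
```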
Linkage Analysis with Sib-Pair Data
Lizette Ortega,1 Jaclyn Scholl,2 Jaime Ramos,3 Javier Rojo4
(1) University of Arizona
(2) Providence College
(3) The University of Texas at El Paso
(4) Rice University, Houston, TX
Abstract
Many statistical tests have been devised to help identify connections between a particular genotype of interest and a phenotype. The purpose of this work is not only to acknowledge the substantial contributions of several sib-pair methods but also to move beyond the isolated conclusions of each paper by presenting more general comparisons and, ultimately, more comprehensive results. Using computer simulation, five regression methods proposed by Haseman, Elston, and Drigalenko are compared under different conditions, including three sample sizes and three phenotype distributions. The methods are evaluated in terms of Type I error and power.
Research done at Rice University Summer Institute of Statistics
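For readers unfamiliar with this regression family, the original Haseman-Elston test regresses each sib pair's squared trait difference on the estimated proportion of alleles the pair shares identical by descent (IBD) at a marker; a significantly negative slope suggests linkage. The sketch below covers only that original formulation (the later Elston and Drigalenko variants compared in this work are not reproduced), and the toy data are hypothetical:

```python
import math


def haseman_elston(trait_pairs, ibd_share):
    """Original Haseman-Elston regression.

    trait_pairs: (x1, x2) phenotype values for each sib pair.
    ibd_share:   estimated proportion of alleles shared IBD (pi-hat) per pair.
    Returns (intercept, slope, t_statistic). Linkage is suggested by a
    significantly negative slope; compare t with a t distribution on n - 2 df.
    """
    y = [(a - b) ** 2 for a, b in trait_pairs]  # squared sib-pair differences
    n = len(y)
    mean_x = sum(ibd_share) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in ibd_share)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(ibd_share, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((v - intercept - slope * x) ** 2 for x, v in zip(ibd_share, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope / se_slope


# Hypothetical toy sib-pair data, not from the simulation study.
pairs = [(3, 0), (2, 0), (1, 0), (4, 0), (2, 0), (0, 0)]
pi_hat = [0.0, 0.5, 1.0, 0.0, 0.5, 1.0]
b0, b1, t = haseman_elston(pairs, pi_hat)
print(round(b1, 3), t < 0)  # -12.0 True
```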
Three-Dimensional Structural Analysis of the Intact Thermus thermophilus ATP Synthase by Electron Microscopy
Sudheer Molugu,1 Sandhya Samavedam,1 Daniela Stock,2 and Ricardo A. Bernal1,2
(1) Department of Chemistry, The University of Texas at El Paso
(2) MRC-LMB Cambridge, UK
Abstract
ATPases/synthases are proton-translocating molecular machines that are essential to all living organisms because of their role in energy interconversion. These rotary molecular motors are not only responsible for the synthesis of ATP but, depending on their location, can also acidify intracellular compartments at the expense of ATP. The most intensively studied and best understood is the eukaryotic F-type ATPase, which synthesizes ATP using the proton motive force generated by photosynthesis or respiration. V-type ATPases are a related group of enzymes that rotate in the opposite direction to F-type ATPases and hydrolyze ATP in order to pump protons across the membrane. A-type ATPases make up the third group, which is mostly composed of ATPases of archaebacterial origin. All three ATPase types share a broadly conserved structure and are evolutionarily related. The general arrangement is a multi-subunit water-soluble domain (V1/F1/A1) connected by a central stalk to a multi-subunit integral membrane domain (Vo/Fo/Ao).
The exact stoichiometry and location of various ATPase proteins remain in question, particularly for those in the peripheral stalk. To gain insight into the structure and function of the bacterial ATPase, we have examined the structure of the ATPase from the hyperthermophilic eubacterium Thermus thermophilus using electron microscopy. A three-dimensional negative-stain reconstruction has revealed the presence of not one but two peripheral stalks. The central stalk is well resolved, especially with respect to its interaction with a single catalytic subunit in the soluble sector, which gives rise to an asymmetry comparable to the three catalytic states identified in the F-ATPase. Moreover, density corresponding to the membrane domain reveals 6-fold symmetry, indicating that there are probably 12 proteolipids in the membrane component of the rotor. As a whole, the ATPase appears to be about 20 Å longer along its long axis than the X-ray structure of the F1c10 ATPase. The increased length appears to be due solely to a longer central stalk rather than to a larger soluble or membrane domain.
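Rotational symmetry in a density map can be illustrated with a simple angular analysis: sample the density along a ring around the rotor axis and find which angular harmonic carries the most power. The sketch below assumes such a ring profile has already been extracted (the synthetic profile is hypothetical) and is not the reconstruction software used in this work. An n-fold symmetric profile can also carry power at multiples of n (2n, 3n, ...), so the result is a hint about the symmetry order, not a proof:

```python
import cmath
import math


def dominant_rotational_order(profile, max_order=None):
    """Return the angular harmonic k >= 2 with the largest Fourier power.

    profile: density values sampled at equally spaced angles around a ring.
    """
    n = len(profile)
    if max_order is None:
        max_order = n // 2
    mean = sum(profile) / n
    best_k, best_power = None, -1.0
    for k in range(2, max_order + 1):
        # k-th discrete Fourier coefficient of the mean-subtracted profile
        coeff = sum((v - mean) * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(profile))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k


# Synthetic (hypothetical) ring profile with six maxima, sampled every 5 degrees.
ring = [1.0 + math.cos(6 * 2 * math.pi * t / 72) for t in range(72)]
print(dominant_rotational_order(ring))  # 6
```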
Poster: Biochemical and Computational Analyses of Calcium Binding Proteins in Bacteria
Charmy Gandhi1,2 and Delfina C. Domínguez2
(1) Bioinformatics Program, The University of Texas at El Paso
(2) Clinical Laboratory Science Program, The University of Texas at El Paso
Abstract
The function of calcium (Ca2+) as a cell regulator is well documented in eukaryotes. However, little is known about the role of Ca2+ in prokaryotes. Calcium ions play a pivotal role in eukaryotes by maintaining and regulating many vital functions, including cell differentiation, gene expression, transport, motility, and cell division. Ca2+ homeostasis depends on the existence of calcium binding proteins (CaBPs) as well as other mechanisms. Recent studies suggest that bacteria, like eukaryotes, keep tight control of cytosolic free Ca2+ and have Ca2+ transporters and CaBPs. We hypothesize that CaBPs play an important role in Ca2+ homeostasis and that Ca2+ ions are involved in the regulation of several intracellular processes in bacteria. An essential step toward understanding the role of Ca2+ in prokaryotes is the identification of intracellular CaBPs. Our preliminary data indicate that several CaBPs are present in bacteria (E. coli, B. subtilis and B. pertussis). These proteins share characteristics with eukaryotic CaBPs, including calmodulin (CaM): the identified proteins are acidic and of low molecular weight, cross-react with both monoclonal anti-calmodulin and anti-calerythrin antibodies, and bind radioactive calcium (45CaCl2). To identify and sequence these proteins, we analyzed crude cell lysates by 2D electrophoresis followed by mass spectrometry. Most of the proteins with CaBP characteristics are associated with stress responses (including DnaK, EF-Tu/Ts, AhpC, L7/L12, and GroEL). Based on these findings and other published data, the purpose of this research is to perform a computational analysis of these bacterial protein sequences to investigate the presence of Ca2+ binding domains (including the EF-hand, C2 domain, Gla domain, and ANX domain). The long-term goal of this research is to illuminate the role of Ca2+ in bacteria.
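As a first pass at the domain search described above, EF-hand loops can be scanned with a regular expression. The sketch assumes the PROSITE EF-hand calcium-binding pattern PS00018, transcribed here from memory and worth checking against the current PROSITE entry. A pattern hit only flags a candidate loop; C2, Gla, and annexin domains need profile-based tools. The test sequence is the N-terminal part of human calmodulin, and `find_ef_hand_loops` is a hypothetical helper name:

```python
import re

# EF-hand calcium-binding loop, PROSITE PS00018 (EF_HAND_1) as we recall it:
#   D-{W}-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-[DE]
# {..} (excluded residues) becomes [^..]; the lookahead allows overlapping hits.
EF_HAND_LOOP = re.compile(
    r"(?=(D[^W][DNS][^ILVFYW][DENSTG][DNQGHRK][^GP][LIVMC][DENQSTAGC]..[DE]))"
)


def find_ef_hand_loops(sequence):
    """Return (1-based start position, 12-residue loop) for each candidate."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in EF_HAND_LOOP.finditer(seq)]


# N-terminal half of human calmodulin (contains EF-hand loops I and II).
cam_fragment = ("MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEV"
                "DADGNGTIDFPEFLTMMARK")
for pos, loop in find_ef_hand_loops(cam_fragment):
    print(pos, loop)
# 21 DKDGDGTITTKE
# 57 DADGNGTIDFPE
```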
Development, Implementation and Testing of a DNA Microarray Test Suite
Ehsanul Haque
Bioinformatics Program, The University of Texas at El Paso
Abstract
Affymetrix GeneChip technology is one of the most popular platforms for measuring gene expression in medical science and basic biological research. After an experiment has been performed, a series of computational processing steps converts the raw image data file into one intensity value per gene. The number of competing microarray data processing methods is large and growing, and each has its own strengths and weaknesses. I initiated the development of a test suite to help users identify the best microarray data processing method for their particular purpose. The test suite includes graphics and summary statistics for parameters such as CV (coefficient of variation) and RA (relative accuracy), which help users compare different processing methods. I used the test suite to compare the results of four microarray data processing methods.
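The CV statistic is a standard quantity: for each gene, the standard deviation of its processed intensities across replicate arrays divided by their mean, with lower values indicating a more reproducible processing method. A minimal sketch, assuming each method's output is available as a mapping from gene ID to replicate intensities (the gene IDs, values, and helper names are hypothetical); RA is not sketched because the abstract does not define it:

```python
from statistics import mean, median, stdev


def coefficient_of_variation(replicates):
    """Sample standard deviation divided by the mean of replicate intensities."""
    return stdev(replicates) / mean(replicates)


def median_cv(method_output):
    """Summarize one processing method by the median per-gene CV."""
    return median(coefficient_of_variation(v) for v in method_output.values())


# Hypothetical processed intensities for two genes on three replicate arrays.
method_a = {"gene_1": [10.0, 12.0, 14.0], "gene_2": [100.0, 100.0, 100.0]}
print(round(coefficient_of_variation(method_a["gene_1"]), 4))  # 0.1667
print(round(median_cv(method_a), 4))  # 0.0833
```

Running the same summary over the output of each processing method gives one comparable reproducibility number per method.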
Using a Proteomic Approach to Identify Tumor-Associated Antigens as Markers in Hepatocellular Carcinoma (HCC)
Kok Sun Looi and Jianying Zhang
Department of Biological Sciences, The University of Texas at El Paso
Abstract
Liver cancer, especially hepatocellular carcinoma (HCC), affects the Hispanic population of the United States at double the rate of the white population. The majority of people with HCC die within one year of detection. This high case-fatality rate can in part be attributed to the lack of diagnostic methods that allow early detection. In this project, we identified tumor-associated antigens (TAAs) in HCC using two-dimensional polyacrylamide gel electrophoresis (2-DE) and mass spectrometry. We identified 29 proteins that immunoreacted with HCC sera; of these, 17 have been reported to be related to cancer and five to apoptosis. The molecular identification and characterization of TAAs in HCC will also contribute to our understanding of their role in the malignant transformation of the liver, thereby providing attractive candidates for early diagnosis and targeted therapies.
Computational Data (Physical Properties) of Structurally Modified Lead Compounds of Thiophene Derivatives
Rama Krishna Empati,1 Suman Sirimulla,2 G. Nagrajan,3 K.S. Manjunath,3 and S. Mohan3
(1) Department of Chemistry, The University of Texas at El Paso
(2) PES College of Pharmacy, Bangalore, India.
(3) SSR College of Pharmacy, Mahabubnagar, India.
Abstract
We consider a novel series of thiophene compounds bearing chlorine substituents. These compounds were chosen because chlorine-containing drugs, such as the β-lactam antibiotics cloxacillin and dicloxacillin and the azole antifungals clotrimazole, miconazole, and ketoconazole, have been synthesized and screened for antifungal and antimicrobial activity. Using computational software (Gaussian, PC Model, Titan), we calculated physical properties of the structurally modified lead compounds of thiophene derivatives, including melting point, heat of formation, HOMO and LUMO energies, dipole moment, area, volume, electronic charges, and energy.
Applying a Hybrid Data Mining Approach to Tumor Malignancy Prediction
Tzu-Liang (Bill) Tseng, Udayvarun Konada, Alexander Nadackal, and Kalyan Aleti
Department of Mechanical and Industrial Engineering, The University of Texas at El Paso
Abstract
Automated decision support for clinicians has been proposed in recent years. However, little work has been devoted to the development of computer-based systems that support clinicians' judgments and diagnoses. This paper presents a new hybrid approach to automated clinical decision support. The approach consists of a novel rough-set method for feature selection and an enhanced support vector machine algorithm for accurate prediction. The approach can derive decision rules and identify the most significant features simultaneously, which makes it useful for solving medical decision problems. We tested the approach using data from diagnoses of real patients with solitary pulmonary nodules, an indication of potential lung malignancy. Variants of the approach achieved over 90 percent diagnostic accuracy, and the derived rules were shown to effectively assist further examination. This research thus contributes to developing and validating a useful approach to automated clinical decision support.
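The abstract does not detail its novel rough-set method, but the core rough-set idea behind this kind of feature selection can be sketched. The dependency degree (gamma) is the fraction of cases whose condition-attribute values unambiguously determine the decision, and a greedy search adds attributes until gamma matches that of the full attribute set. This is the classic QuickReduct heuristic, a swapped-in textbook technique and not the authors' method; the attribute names and cases are hypothetical toy data, not patient records:

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, attrs, decision):
    """Rough-set dependency degree: share of rows in the positive region."""
    positive = 0
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:  # consistent block
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows, conditions, decision):
    """Greedily add the attribute that most increases the dependency degree."""
    target = dependency(rows, conditions, decision)
    reduct = []
    current = dependency(rows, reduct, decision)
    while current < target:
        best, best_gamma = None, current
        for a in conditions:
            if a not in reduct:
                gamma = dependency(rows, reduct + [a], decision)
                if gamma > best_gamma:
                    best, best_gamma = a, gamma
        if best is None:  # greedy search stalled (e.g. XOR-like data); stop
            break
        reduct.append(best)
        current = best_gamma
    return reduct


# Hypothetical toy cases for illustration only.
cases = [
    {"size": "large", "spiculated": "yes", "age": "old", "malignant": 1},
    {"size": "large", "spiculated": "no", "age": "young", "malignant": 1},
    {"size": "small", "spiculated": "yes", "age": "old", "malignant": 0},
    {"size": "small", "spiculated": "no", "age": "young", "malignant": 0},
]
print(quick_reduct(cases, ["size", "spiculated", "age"], "malignant"))  # ['size']
```

The selected attributes would then be passed to the classifier; in the hybrid approach described above that role is played by an enhanced support vector machine.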
"Histrionics":
A Database Mining Approach for Classification of Functional Disorders of the Autonomic Nervous System
Elise Marshall
Bioinformatics Program, The University of Texas at El Paso
Abstract
A statistical association approach applied to medical history information provides a means to characterize syndromes, potentially facilitating the identification of pathophysiological mechanisms. In dysautonomias, altered function of one or more components of the autonomic nervous system adversely affects health. Chronic orthostatic intolerance (COI) syndromes exemplify dysautonomias in which the patient cannot tolerate prolonged standing. Postural tachycardia syndrome (POTS) is characterized by an excessive increase in heart rate during standing, and neurocardiogenic syncope (NCS), the most common cause of acute loss of consciousness in adults, can be evoked by orthostasis. For instance, the symptom cluster in POTS could reflect decreased venous return to the heart and compensatory activation of the sympathetic nervous and adrenomedullary hormonal systems.
The Function of Protein Disulfide Isomerase
Yu-Hsiang Wang1 and Mahesh Narayan2
(1) Department of Biological Sciences, The University of Texas at El Paso
(2) Department of Chemistry, The University of Texas at El Paso
Abstract
Multi-disulfide-bond-containing proteins acquire their native structures through an oxidative folding reaction that involves the formation of native disulfide bonds through thiol-disulfide exchange reactions and the formation of native structure through a conformational folding event. In many proteins, the rate-determining step in oxidative folding is the formation of a structured intermediate from its unstructured isomers through isomerisation of non-native disulfide bonds to native ones, coupled with the conformational folding reaction; the ensuing native-like tertiary structure protects the formed native disulfides from further thiol-disulfide isomerisation reactions. In vivo, the 56-kDa oxidoreductase protein disulfide isomerase (PDI) catalyzes the oxidative folding of “substrate proteins” before their export to their respective extracellular environments.
We have studied the PDI-catalyzed formation of des [40-95], a three-disulfide-bond-containing structured intermediate of the four-disulfide-bond-containing protein bovine pancreatic ribonuclease A (RNase A), from its unstructured isomers as a function of pH. Our data indicate that PDI has the greatest impact on the reaction rate at pH 7, with decreasing influence as the pH of the reaction environment is increased.
Given the anomalously low pKa (6.7) of a PDI thiol, our results demonstrate that the isomerisation activity of PDI is ideally suited to the environment of the ER lumen, where the pH is ~7 and uncatalyzed thiol-disulfide reactions are inherently slow. These results have important implications for the development of PDI mimics that might eventually be used as chemotherapeutics for alleviating misfolding-related diseases such as Alzheimer’s, Parkinson’s and Creutzfeldt-Jakob disease.
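The pKa argument can be made concrete with the Henderson-Hasselbalch equation, which gives the fraction of a thiol present as the reactive thiolate at a given pH. This is a back-of-the-envelope illustration, not an analysis from the study; the comparison value of about 8.3 is the commonly cited thiol pKa of free cysteine, and `thiolate_fraction` is a hypothetical helper name:

```python
def thiolate_fraction(pka, ph):
    """Fraction of a thiol ionized to thiolate (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


er_ph = 7.0
print(round(thiolate_fraction(6.7, er_ph), 2))  # PDI thiol: 0.67
print(round(thiolate_fraction(8.3, er_ph), 2))  # typical cysteine: 0.05
```

At an ER pH of about 7, roughly two thirds of the low-pKa PDI thiol is ionized, versus about 5% for a typical cysteine thiol, which is consistent with the point that uncatalyzed thiol-disulfide exchange is slow at this pH.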
Small-molecule Catalyzed Oxidative Protein Folding: The Quest for In Vivo Chemotherapeutics
Paul Nieves,1,2 Saemin Chang,2 Matthew Fink,2 Luis Martínez,2 and Mahesh Narayan2
(1) Universidad Metropolitana, PR
(2) Department of Chemistry, The University of Texas at El Paso
Abstract
Multi-disulfide-bond-containing proteins acquire their native structures through an oxidative folding reaction, a process involving the formation of the native set of protein disulfide bonds through thiol-disulfide exchange reactions (oxidation, isomerisation and reduction) of their cysteines/disulfides, coupled with a conformational folding event. In vivo, the 56-kDa oxidoreductase protein disulfide isomerase (PDI) catalyzes oxidative folding reactions in the lumen of the ER prior to export of the “substrates” (disulfide-bond-containing proteins) to their extracellular environs.
The oxidative folding rate of the four-disulfide-bond-containing protein bovine pancreatic ribonuclease A (RNase A) was examined in the presence of a synthetic small-molecule dithiol, (+/-)-trans-1,2-bis(2-mercaptoacetamido)cyclohexane (BMC), alone and in combination with a naturally occurring osmolyte, trimethylamine-N-oxide (TMAO). The results indicate that, relative to the control experiment, the oxidative folding rate of RNase A is enhanced 2-fold by BMC (0.4 mM) and 3-fold by the combination of the dithiol (0.4 mM) and the osmolyte (0.2 M).
Current efforts are geared towards the synthesis of a second-generation small-molecule mimic of PDI, viz., (+/-)-trans-1,2,4,5-tetra(2-mercaptoacetamido)cyclohexane, which will be tested for its efficacy in catalyzing oxidative folding reactions. The ultimate objective is the synthesis of a small-molecule chemotherapeutic that can catalyze in vivo protein folding, thereby alleviating misfolding-related diseases such as Alzheimer’s, Parkinson’s and Creutzfeldt-Jakob disease.
Cross-validated QSAR Studies of a Systematic Simple Traditional Protocol versus Fallacious and Complicated Protocols
Suman Sirimulla, Carrie Ash-Mott, and William C. Herndon
Department of Chemistry, The University of Texas at El Paso
Abstract
A common procedure for QSAR analysis consists of data selection (generally sets of congeneric compounds and their corresponding biological activities), tabulation of trial physicochemical or ad hoc molecular structural descriptors, and a multilinear statistical analysis to derive a statistically valid QSAR correlation of the activity data using a subset of the trial descriptors. A final important step is cross-validation to assess the putative predictive (rather than merely correlative) capabilities of the derived QSAR model equation.
This study presents an analysis of three recent cross-validated studies in which the antimalarial activities of a set of aromatic mefloquine derivatives are correlated with calculated atomic charges using increasingly complex statistical procedures. The reported conclusions are that these methods give high-quality statistical results and provide useful techniques with very good predictive power. However, these conclusions are negated by the fact that over 60% of the compounds (13 out of 21) in these studies are assumed to have insensible, fictitious structures.
The perceived high quality of the overall statistical results may indicate deficiencies in the modeling protocols used in these studies and in the rationales that have been used to justify cross-validation procedures. In particular, we criticize the interpretation of cross-validation results as a measure of the predictive power of a QSAR model. We argue that cross-validation is valuable primarily for establishing the robustness of the fit to a model equation, and that the leave-one-out procedure in particular gives useful information about outliers.
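The leave-one-out statistic at issue can be sketched in a few lines: refit the model with each compound left out, predict that compound, and sum the squared prediction errors (PRESS) to get q² = 1 − PRESS / SS_total. For ordinary least squares q² never exceeds r², and the per-compound LOO residuals are what flag outliers, in line with the argument above that the procedure measures robustness of the fit rather than predictive power for new structures. The sketch uses a single descriptor for brevity (the studies discussed used multilinear models), and the descriptor and activity values are hypothetical:

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def r2_and_loo_q2(x, y):
    """Return (r2, q2, loo_residuals) for a one-descriptor linear model.

    q2 = 1 - PRESS / SS_total, where PRESS sums the squared errors made when
    each compound is predicted by a model refit without it.
    """
    n = len(y)
    my = sum(y) / n
    ss_total = sum((yi - my) ** 2 for yi in y)
    a, b = fit_line(x, y)
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    loo_residuals = []
    for i in range(n):
        a_i, b_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        loo_residuals.append(y[i] - (a_i + b_i * x[i]))
    press = sum(r * r for r in loo_residuals)
    return 1 - rss / ss_total, 1 - press / ss_total, loo_residuals


# Hypothetical descriptor and activity values for six compounds.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.1, 12.3]
r2, q2, loo_res = r2_and_loo_q2(descriptor, activity)
print(q2 <= r2)  # True
```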
We also present the results of a very successful elementary QSAR study of the actual compounds used in the work outlined above, using substituent indicator variables coupled with two calculated theoretical AM1 parameters.
OGPET v1.0: Prediction of mucin-type O-glycosylation residues using variation profiling.
Rafael Torres, Jr.,1 Yash Dayal,2 Ming-Ying Leung,2 and Igor Almeida1
(1) Department of Biological Sciences, Border Biomedical Research Center, The University of Texas at El Paso.
(2) Department of Mathematical Sciences, Bioinformatics Program, Border Biomedical Research Center, The University of Texas at El Paso.
Abstract
O-Glycosylation (OG) is a key post-translational modification of proteins that is considerably altered in certain pathologies (e.g., cancer). Owing to its potential diagnostic and therapeutic relevance, a few algorithms for predicting OG sites have been developed. However, these algorithms exhibit rather low specificity in predicting true OG sites. Based on experimentally mapped mucin-type OG residues, we have developed an algorithm, the O-Glycosylation Prediction Electronic Tool (OGPET), which shows very high sensitivity and specificity. OGPET builds amino acid (aa) prediction motifs from 5 relevant positions (-3, -1, +1, +3, and +4) around the candidate Thr/Ser residue (position 0) that are known to influence the interaction of the polypeptide GalNAc-transferase (ppGalNAcT) with the target protein. Furthermore, analysis of the physical and chemical properties of the amino acids allowed the algorithm to interchangeably switch amino acids in any of the 5 relevant positions without increasing the rate of false-positive predictions. Our results showed a sensitivity of 0.97 and a specificity of 0.98 in standard performance tests. Using this amino acid variation approach (variation profiling), OGPET predicted true-positive sites despite mutations in the protein primary sequence. Finally, a new set of prediction constraints found novel sites that were not included in the original training sets. OGPET is currently available through the WWW (http://129.108.112.23/OGPET/).
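The motif step described above starts by collecting, for every candidate Ser/Thr, the residues at the five positions named in the abstract. A minimal sketch of that extraction, followed by a position-specific scoring step; the weight table, helper names, and test sequence are hypothetical and are not OGPET's actual parameters or code:

```python
# Positions relative to the candidate Ser/Thr (position 0) named in the abstract.
OFFSETS = (-3, -1, 1, 3, 4)


def candidate_contexts(sequence):
    """Yield (1-based position, residue, context) for every Ser/Thr.

    context holds the residues at OFFSETS; '-' pads positions past either end.
    """
    seq = sequence.upper()
    for i, aa in enumerate(seq):
        if aa in "ST":
            context = "".join(seq[i + k] if 0 <= i + k < len(seq) else "-"
                              for k in OFFSETS)
            yield i + 1, aa, context


def score_context(context, profile):
    """Sum hypothetical per-(offset, residue) weights from a profile dict."""
    return sum(profile.get((k, aa), 0.0) for k, aa in zip(OFFSETS, context))


# Hypothetical weights for illustration only; not OGPET's parameters.
toy_profile = {(-1, "P"): 0.5, (3, "P"): 1.0}
for pos, aa, ctx in candidate_contexts("APTTPAPG"):
    print(pos, aa, ctx, score_context(ctx, toy_profile))
# 3 T -PTAP 0.5
# 4 T ATPPG 1.0
```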
Project supported by Grant #5G12RR008124 from the National Center for Research Resources (NCRR)/NIH. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of NCRR or NIH. R.T., Jr. is a recipient of an NIH/MARC U*STAR scholarship.
RNAVLab: An Open-source User-friendly Virtual Laboratory for the Study of RNA Secondary Structures
Michela Taufer,1 Ming-Ying Leung,2 Kyle Johnson,3 Abel Licon,1 Prayook Tungjatooronrusamee,2 Yash Dayal,2 Daniel Catarino,1 Hao Lei2
(1) Computer Science Department, The University of Texas at El Paso
(2) Bioinformatics Program, The University of Texas at El Paso
(3) Department of Biological Sciences, The University of Texas at El Paso
Abstract
The goal of the RNAVLab project is to design and build an adaptive grid computing system that, at runtime, identifies and exploits computing resources across The University of Texas at El Paso (UTEP) campus to study the secondary structures of large numbers of RNA segments using a variety of prediction programs. The grid environment at UTEP is based on a unified software tool for RNA secondary structure prediction, alignment, comparison, and classification. Our tool uses grid computing to provide the computing power needed to predict the structures of large RNA sequences, and its modular design makes new features easy to integrate. We are currently using the tool to study the prediction accuracy of a variety of codes for RNA secondary structure prediction, including pseudoknot prediction; to identify common motifs in virus secondary structures and their functions, e.g., in viral replication; and to identify common pseudoknots across viruses within the same family, species, or genus.
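Each RNA segment can be folded independently, which is what makes this workload a good fit for opportunistically harvested grid resources. The sketch below illustrates that task structure with a deliberately simple predictor, the classic Nussinov base-pair maximization (a textbook stand-in, not one of the prediction codes studied, and it cannot represent pseudoknots), and uses a local thread pool as a hypothetical stand-in for dispatching jobs to grid nodes:

```python
from concurrent.futures import ThreadPoolExecutor

# Watson-Crick and G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs, with hairpin loops >= min_loop nt."""
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: j is unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def predict_many(sequences, workers=4):
    """Fold independent segments concurrently (local stand-in for grid dispatch).

    Threads only show the task structure; CPU-bound Python code gains little
    from them, whereas a grid would run each task on a separate machine.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(nussinov_max_pairs, sequences))


print(predict_many(["GGGAAAUCC", "AAAA"]))  # [3, 0]
```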
DAPLDS: Dynamically Adaptive Protein-ligand Docking System Using Volunteer Computing
Michela Taufer,1 Patricia J. Teller,1 Martine Ceberio,1 David Anderson,2 Charles L. Brooks III,3 Andre Kerstens,1 Trilce Estrada,1 David Flores,1 Richard Zamudio,1 Karina Escapita,1 Guillermo Lopez,1 Roger Armen3
(1) Computer Science Department, The University of Texas at El Paso
(2) Space Sciences Laboratory, The University of California at Berkeley
(3) Department of Molecular Biology, The Scripps Research Institute
Abstract
DAPLDS (Dynamically Adaptive Protein-Ligand Docking System) is a collaborative project among The University of Texas at El Paso, The Scripps Research Institute (TSRI), and the University of California, Berkeley. Through the implementation and use of a cyber tool, DAPLDS, that enables adaptive multi-scale modeling in a GC environment, this project will further knowledge of the atomic details of protein-ligand interactions and thereby accelerate the discovery of novel pharmaceuticals. The goals of the project are (1) to explore the multi-scale nature of algorithmic adaptations in protein-ligand docking and (2) to develop cyber infrastructures based on computational methods and models that efficiently accommodate these adaptations.
Topaz: A Friendly Tool for Scientists to Access Data on Grid Repositories
Richard Zamudio,1 Daniel Catarino,1 Michela Taufer,1 Karan Bhatia,2 and Brent Stern2
(1) Computer Science Department, The University of Texas at El Paso
(2) San Diego Supercomputer Center, University of California at San Diego
Abstract
As grid infrastructures mature, an increasing challenge is to provide end-user scientists with intuitive interfaces to computational services, data management capabilities, and visualization tools. The current approach used in a number of cyber-infrastructure projects is to leverage the capabilities of the Mozilla framework to provide rich end-user tools that seamlessly integrate with remote resources such as web/grid services and data repositories.
The goal of this project is to provide the scientific community with a user-friendly, efficient interface to grid technologies. To this end, we are designing and implementing Topaz, an open-source GridFTP protocol extension to the Firefox browser. In the design, implementation and performance analysis of Topaz, we have been guided by rigorous software engineering tools such as data flow diagrams (DFDs). GridFTP servers, similar to the FTP servers used on the Internet, provide a data repository for files and are optimized for grid use (support for very large file sizes, high-performance data transfer, third-party transfer, and integration with the Grid Security Infrastructure). Topaz provides scientists with a familiar, user-friendly interface for accessing arbitrary GridFTP servers, offering upload and download functionality as well as the ability to obtain and manage certificates.