Computational Biology Seminar Series

Fall 02 Seminars:

Monday, December 9
4pm, CSIC 1115
University of Maryland

Title: Inferring Gene Transcription Networks: The Davidson Model

Sorin Istrail,
Celera Genomics

Joint work with Vladimir Filkov (UC Davis)
and Eric Davidson (Caltech)

Abstract

In 2001 Eric Davidson published his book "Genomic Regulatory Systems,"
in which he reports on 30 years of work, together with his colleagues,
on the purple sea urchin. Their work provided a general experimental framework
for the study of a gene's cis-regulatory region (an upstream DNA sequence
containing a series of consecutive binding sites). Their approach consisted
of a systematic, almost exhaustive series of mutations of individual binding
sites, together with the associated measurements of the transcription rates.
By quantitative analysis, they were able to infer a complete set of minimal
functional units of regulation and their interrelations. They proceeded
hierarchically to uncover the "modularity" and "hardwired information processing
logic" of a gene's cis-region. Most of their work was focused on the endo16
gene. Their extraordinary technology and the inference of the underlying
"network" for this gene resulted in the most completely understood
transcriptional system to date.

It is quite remarkable how combinatorial and robust their approach is. We
will present an analysis and a mathematical formalism for the Davidson
transcriptional network inference technology. We will also offer a glimpse
into our recent work with Eric Davidson toward the identification of the
regulatory "programming language."

Monday, October 14
4pm, CSIC 1115
University of Maryland

Title: Genome as Literature

Dr. David B. Searls,
SrVP Bioinformatics,
GlaxoSmithKline Pharmaceuticals

Abstract

The human genome has been called the "book of life," a natural extension of the long-standing metaphor of DNA as a language. Taking this conceit seriously, we can ask to what extent the genome may profitably be viewed as a work of literature, subject to critical exegesis. While seemingly at opposite poles from the "hard science" of molecular biology, in fact such an approach is not so far from the increasingly hermeneutic role of the bioinformatician, insofar as both are concerned with comparing texts, detecting subtle patterns and relationships, elucidating theme and variation, etc. In this talk I will explore literary and linguistic aspects of the genome, by means of a "genomic" textual analysis of Lewis Carroll's Jabberwocky.

Wednesday, December 4
3pm, AVW 1112
University of Maryland

Title: Why is Sequence Comparison Useful?

Dr. David Lipman, NCBI, NIH

Abstract

There seems to be no question that biologists believe sequence comparison is useful. The BLAST server at NCBI alone performs over 70,000 database searches daily and over 120,000 scientific papers refer to some aspect of biological sequence comparison. Furthermore, one of the most compelling yet implicit justifications for the investment in high throughput genome sequencing projects has been the expectation that many of the gene products within this growing inventory will match previously studied proteins.

It was not always so: the first papers describing useful discoveries from sequence database searches often described the detection of evolutionary relationships as "serendipitous" or "unexpected". Subsequent studies on protein sequences and structures showed that detectable conservation over hundreds of millions and even billions of years of evolution is the rule, rather than the exception, in biology. Extrapolations made by several groups using different methods suggested that there are only about 1000 basic protein folds and that a complete classification of all protein families is a realistic goal for the near future.

Though we don't yet know why most proteins evolve so slowly, it is important to realize that the conservative mode of protein evolution determines our very ability to make sense out of genome comparisons and that theoretical and empirical studies in molecular evolution are directly relevant for the practical goals of functional genomics. I will review some notable case stories from the early days of database searching and our growing understanding of the universe of protein families.

Fall 01 Seminars:

Friday, December 14
Primer 10:15-10:55, Seminar 11:00-12:00
3258 A.V. Williams Bldg
University of Maryland

Primer Title: Some Biology That Computer Scientists Need for Bioinformatics
Seminar Title: Functional Genomics and Bioinformatics Applied to Understanding Oxidative Stress Resistance in Plants

Lenwood S. Heath,
Associate Professor, Department of Computer Science,
Virginia Polytechnic Institute and State University

Primer Abstract (40 minutes)

Improved experimental technologies in the life sciences, such as DNA sequencing, microarray miniaturization of gene expression studies, and high resolution mass spectrometry for proteomics, have created an explosion in the production and availability of biological data. Biologists now deposit data from their experiments in databases, as it is no longer feasible to directly report the mass of detailed experimental data in a journal paper. There are numerous online databases of sequence data--genomic DNA, cDNAs, open reading frames, and proteins. The entire genomes of over 800 organisms have been sequenced and placed online, including Drosophila (fruit fly), human, mouse, and Arabidopsis (thale cress). Numerous microarray gene expression data sets are also available through the Internet. This abundance of biological data demands computational resources for managing, searching, analyzing, and mining that data, giving rise to the interdisciplinary field of bioinformatics. Bioinformatics, in turn, presents major new career and research opportunities for computer scientists. In this talk, we give an overview of some of the key biological concepts needed by computer scientists to understand the challenges and opportunities of bioinformatics. We will also give a succinct list of the bioinformatics challenges that our group currently finds most interesting.

Seminar Abstract (50 minutes)

In this talk, we discuss the application of microarray technology and bioinformatics to studying successful resistance of loblolly pine trees to drought stress. Microarray technology gives biologists access to information about gene expression for thousands of genes simultaneously. The computational component of the study is supported by an NSF-funded project named Expresso. Expresso is a problem-solving environment that is being developed by computer scientists at Virginia Tech to support the management of the process of microarray experiment design, data capture, and data analysis. It is being developed in parallel with several biological studies involving microarray technology, including a large study of drought stress in pine. We describe both that biological study and the computational ideas being developed by our bioinformatics collaborators at Virginia Tech and elsewhere.

Wednesday, December 5
2:00 PM, 1112 A.V. Williams
University of Maryland

Title: Scaling Law in Sizes of Protein Sequence Families: From Super-Families to Orphan Genes

Ron Unger,
Faculty of Life Science,
Bar-Ilan University, Ramat-Gan 52900, Israel
On Sabbatical at UMIACS and CARB, University of Maryland

Abstract

It has been observed that the sizes of protein sequence families are unevenly distributed, with a few super-families that have a large number of members and many "orphan" proteins that do not belong to any family. Here it is shown that the distribution of sizes of protein families in different databases and classifications (ProtoMap, ProDom, COG) follows a power-law behavior with similar scaling exponents, which is characteristic of self-organizing systems. A simple model of protein evolution is proposed, in which proteins are dynamically generated and clustered into families. The model yields a scaling behavior very similar to the distribution observed in the actual sequence databases, and thus shows that the existence of "super-families" of proteins and of "orphan" proteins are two manifestations of the same evolutionary process. (Joint work with Shlomo Havlin)

October 4, 2001
10:00 AM, 2460 A.V. Williams
University of Maryland

Title: Target Selection, Model Organism Genetics, and Comparative Genomics

Christian Burks, Ph.D.,
Vice President, Chief Informatics Officer,
Exelixis

Abstract

Completion of the human genome has created a new challenge for the pharmaceutical industry when selecting targets for screening: rather than focusing on simply finding and identifying genes and proteins -- they have in principle all been identified, subject to end-game closure and corrections -- we are focusing on characterizing their function and using this information for prioritizing them with respect to their relative merits as agents of or targets for therapeutic intervention. A similar paradigm shift is in progress for targets in insects and other pests for pesticide development and targets in plants for trait improvement, exemplified by the recent completion of the Arabidopsis genome. Model organism genetic screens and comparative genomics provide both speed and facility in optimizing target selection, particularly from the point of view that a target should be viewed as a pathway, or network of interacting proteins, rather than as an individual protein.