IXORA BITS 2013
Wednesday, May 22, 2013
BIOINFORMATICS GLOSSARY
Accession number
An identifier supplied by the curators of the major
biological databases upon submission of a novel entry that uniquely identifies
that sequence (or other) entry.
Active site
The amino acid residues at the catalytic site of an enzyme.
These residues provide the binding and activation energy needed to place the
substrate into its transition state and bridge the energy barrier of the
reaction undergoing catalysis
Algorithm
A series of steps defining a procedure or formula for
solving a problem, that can be coded into a programming language and executed.
Bioinformatics algorithms typically are used to process, store, analyze,
visualize and make predictions from biological data.
Alignment
The result of a comparison of two or more gene or protein
sequences in order to determine their degree of base or amino acid similarity.
Sequence alignments are used to determine the similarity, homology, function or
other degree of relatedness between two or more genes or gene products.
Allele
A given form of a gene that occupies a specific position or
locus on a chromosome. Variant forms of genes occurring at the same locus are
said to be alleles of one another.
Analogy
Reasoning by which the function of a novel gene or protein
sequence may be deduced from comparisons with other gene or protein sequences
of known function. Identifying analogous
or homologous genes via similarity searching and alignment is one of the chief
uses of Bioinformatics. (See also alignment, similarity search.)
Annotation
A combination of comments, notations, references, and
citations, either in free format or utilizing a controlled vocabulary, that
together describe all the experimental and inferred information about a gene or
protein. Annotations can also be applied
to the description of other biological systems.
Batch, automated annotation of bulk biological sequence is one of the
key uses of Bioinformatics tools.
Anticodon
The triplet of contiguous bases on tRNA that binds to the
codon sequence of nucleotides on mRNA. Example: GGG codes for Glycine.
Antisense
DNA or RNA composed of the complementary sequence to the
target DNA/RNA. Also used to describe a therapeutic strategy that uses
antisense DNA or RNA sequences to target specific gene DNA sequences or mRNA
implicated in disease, in order to bind and physically inhibit their expression
by physically blocking them.
Assembly
Compilation of overlapping sequences from one or more
related genes that have been clustered together based on their degree of
sequence identity or similarity. Sequence assembly may be used to piece
together "shotgun" sequencing fragments (see shotgun sequencing)
based upon overlapping restriction enzyme digests, or may be used to identify
and index novel genes from "single-pass" cDNA sequencing efforts.
Autoradiography
A method used to locate radioisotope-labeled materials which
have been separated in gels or are present in blots. The location of the
radiolabeled material is determined by overlaying the test material with a photographic
film that is sensitive to the radioisotope.
Bacterial artificial chromosome (BAC)
Cloning vector that can incorporate large fragments of DNA.
(see YACS)
Bacteriophage
A virus that infects bacteria. The bacteriophage DNA has
served as a basis for cloning vectors, and is also utilized to create phage
libraries containing human or other genes.
Baculovirus
An insect virus which forms the basis of a protein
expression system
Base pair
A pair of nitrogenous bases (a purine and a pyrimidine),
held together by hydrogen bonds, that form the core of DNA and RNA i.e the A:T,
G:C and A:U interactions.
Beta sheet
A three dimensional arrangement taken up by polypeptide
chains that consists of alternating strands linked by hydrogen bonds. The
alternating strands together form a sheet that is frequently twisted. One of
the secondary structural elements characteristic of proteins.
Bioinformatics
The field of endeavor that relates to the collection,
organization and analysis of large amounts of biological data using networks of
computers and databases (usually with reference to the genome project and DNA
sequence information)
Bivalent
Having two binding sites; having 2 free electrons available
for binding.
Blunt-end (ligation)
The joining of DNA fragments that contain no overhang at
either end and consequently no DNA bases available for hybridization (cf.
sticky-end ligation).
cDNA (complementary DNA)
A DNA strand copied from mRNA using reverse transcriptase. A
cDNA library represents all of the expressed DNA in a cell.
cDNA library
A set of DNA fragments prepared from the total mRNA obtained
from a selected cell, tissue or organism.
Chimeric clone
A cloning artifact created by a foreign gene being inserted
into a vector in an incorrect orientation resulting in theexpression of a
protein consisting of a fusion of two different gene products.
Chromat
Data file output from most popular DNA sequencers. Chromat
files consist of the fluorescent traces generated by the sequencer for each of
the four chemical bases, A, C, G, and T, together with the sequence and
measures of the error in the traces at each sequence position.
Chromatin
The chromosome as it appears in its condensed state,
composed of DNA and associated proteins (mainly histones).
Chromosome
The structure in the cell nucleus that contains all of the
cellular DNA together with a number of proteins that compact and package the
DNA.
Clone
A population of genetically identical cells or DNA
molecules.
Cloning
The formation of clones or exact genetic replicas.
Cluster
The grouping of similar objects in a multidimensional
space. Clustering is used for
constructing new features which are abstractions of the existing features of
those objects. The quality of the clustering depends crucially on the distance
metric in the space. In bioinformatics, clustering is performed on sequences,
high-throughput expression and other experimental data. Clusters of partial or
complete gene sequences can be used to identify the complete (contiguous)
sequence and to better identify its function. Clustering expression data
enables the researcher to discern patterns of co-regulation in groups of genes.
Coding regions (CDS)
The portion of a genomic sequence bounded by start and stop
codons that identifies the sequence of the protein being coded for by a
particular gene.
Codon
A sequence of three adjacent nucleotides that designates a
specific amino acid or start/stop site for transcription.
Combinatorial chemistry
The use of chemical methods to generate all possible
combinations of chemicals starting with a subset of compounds. The building
blocks may be peptides, nucleic acids or small molecules. The libraries of
compounds formed by this methodology are used to probe for new pharmaceutical
reagents (see high-throughput screening).
Complementary determining region (CDR)
The hypervariable regions of an antibody molecule,
consisting of three loops from the heavy chain and three from the light chain,
that together form the antigen-binding site.
Complexity (of gene sequence)
Contig
A length of contiguous sequence assembled from partial,
overlapping sequences, generated from a "shotgun" sequencing
project. Contigs are typically created
computationally, by comparing the overlapping ends of several sequencing reads
generated by restriction enzyme digestion of a segment of genomic DNA. The creation of contigs in the presence of
sequencing errors, ambiguities and the presence of repeats is one of the most
computationally challenging aspects of the role of Bioinformatics in genome
analysis.
Cosmids
DNA vectors that allow the insertion of long fragments of
DNA (up to 50 kbases).
Term used to describe the high resolution molecular
structure derived by x- ray crytallographic analysis of protein or other
biomolecular crystals.
Data Cleaning
A process whereby automated or semi-automated algorithms are
used to process experimental data, including noise, experimental errors and
other artifacts, in order to generate and store high-quality data for use in
subsequent analysis. Data cleaning is typically required in high-throughput
sequencing where compression or other experimental artifacts limit the amount
of sequence data generated from each sequencing run or "read."
Data Mining
The ability to query very large databases in order to
satisfy a hypothesis ("top-down" data mining); or to interrogate a
database in order to generate new hypotheses based on rigorous statistical
correlations ("bottom-up" data mining).
Data Processing
Data processing is defined as the systematic performance of
operations upon data such as handling, merging, sorting, and computing. The
semantic content of the original data should not be changed, but the semantic
content of the processed data may be changed.
Data Warehouses
Vast arrays of heterogeneous (biological) data, stored
within a single logical data repository, that are accessible to different
querying and manipulation methods.
Database
Any file system by which data gets stored following a
logical process. (see also relational
database)
Dendrogram
A graphical procedure for representing the output of a
hierarchical clustering method. A
dendrogram is strictly defined as a binary tree with a distinguished root, that
has all the data items at its leaves.
Conventionally, all the leaves are shown at the same level of the
drawing. The ordering of the leaves is
arbitrary, as is their horizontal position. The heights of the internal nodes
may be arbitrary, or may be related to the metric information used to form the
clustering.
Dimer
A composite molecule formed by the binding of two molecules
(see homo and heterodimers).
Disulphide bond
Covalent link formed between the sulphur atoms of two
different cysteine residues in a protein. Important in maintaining the folded
structure of a protein, and also for linking different proteins in a complex.
DNA (deoxyribonucleic acid)
The chemical that forms the basis of the genetic material in
virtually all organisms. DNA is composed of the four nitrogenous bases Adenine,
Cytosine, Guanine, and Thymine, which are covalently bonded to a backbone of
deoxyribose-phosphate to form a DNA strand. Two complementary strands (where
all Gs pair with Cs and As with Ts) form a double helical structure which is
held together by hydrogen bonding between the cognate bases.
DNA fingerprinting
A technique for identifying human individuals based on a
restriction enzyme digest of tandemly repeated DNA sequences that are scattered
throughout the human genome, but are unique to each individual.
DNA microarrays
The deposition of oligonucleotides or cDNAs onto an inert
substrate such as glass or silicon. Thousands of molecules may be organized
spatially into a high-density matrix. These DNA chips may be probed to allow
expression monitoring of many thousands of genes simultaneously. Uses include
study of polymorphisms in genes, de novo sequencing or molecular diagnosis of
disease.
DNA polymerase
An enzyme that catalyzes the synthesis of DNA from a DNA
template given the deoxyribonucleotide precursors.
DNA probes
Short single stranded DNA molecules of specific base
sequence, labeled either radioactively or immunologically, that are used to
detect and identify the complementary base sequence in a gene or genome by
hybridizing specifically to that gene or sequence.
DNA sequencing
The technique in which the specific sequence of bases
forming a particular DNA region is deciphered.
DNase (Deoxyribonuclease)
One of a series of enzymes that can digest DNA.
Domain (protein)
A region of special biological interest within a single
protein sequence. However, a domain may also be defined as a region within the
three-dimensional structure of a protein that may encompass regions of several
distinct protein sequences that accomplishes a specific function. A domain
class is a group of domains that share a common set of well-defined properties
or characteristics.
Drug discovery cycle
The cycle of events required to develop a new drug.
Typically this involves research, preclinical testing and clinical development,
and can take from 5 to 12 years.
Electrophoresis
The use of an external electric field to separate large
biomolecules on the basis of their charge by running them through acrylamide or
agarose gels.
Enhancers
DNA sequences that can greatly increase the transcription
rates of genes even though they may be far upstream or downstream from the
promoter they stimulate.
Epigenomics
The study of complex expression networks or linkages both
spatially (within the body) and temporally (at different times in development).
Exon
The region of DNA within a gene that codes for a polypeptide
chain or domain. Typically a mature protein is composed of several domains
coded by different exons within a single gene.
Expressed Sequence Tags (ESTs)
A small sequence from an expressed gene that can be
amplified by PCR. ESTs act as physical markers for cloning and full length
sequencing of the cDNAs of expressed genes. Typically identified by purifying
mRNAs, converting to cDNAs, and then sequencing a portion of the cDNAs.
Expression (gene or protein)
A measure of the presence, amount, and time-course of one or
more gene products in a particular cell or tissue. Expression studies are typically performed at
the RNA (mRNA) or protein level in order to determine the number, type, and
level of genes that may be up-regulated or down-regulated during a cellular
process, in response to an external stimulus, or in sickness or disease. Gene chips and proteomics now allow the study
of expression profiles of sets of genes or even entire genomes.
Expression profile
The level and duration of expression of one or more genes,
selected from a particular cell or tissue type, generally obtained by a variety
of high-throughput methods, such as sample sequencing, serial analysis, or
microarray-based detection.
Expression vector
A cloning vector that is engineered to allow the expression
of protein from a cDNA. The expression vector provides an appropriate promoter
and restriction sites that allow insertion of cDNA.
Fingerprint
A fingerprint is a set of motifs used to predict the
occurrence of similar motifs, in either an individual sequence or in a
database. Fingerprints are refined by iterative scanning of a composite protein
sequence database. A composite or
multiple-motif fingerprint contains a number of aligned motifs taken from different
parts of a multiple alignment. True
family members are then easy to identify by virtue of possessing all elements
of the fingerprint, while subfamily members may be identified by possessing
only part of it.
Frameshift
A deletion, substitution, or duplication of one or more
bases that causes the reading-frame of a structural gene to shift from the
normal series of triplets.
Functional genomics
The use of genomic information to delineate protein
structure, function, pathways and networks. Function may be determined by
"knocking out" or "knocking in" expressed genes in model
organisms such as worm, fruitfly, yeast or mouse.
Fusion protein
The protein resulting from the genetic joining and
expression of 2 different genes (see chimeric)
Gaps (affine gaps)
A gap is defined as any maximal, consecutive run of spaces
in a single string of a given alignment. Gaps help create alignments that
better conform to underlying biological models and more closely fit patterns
that one expects to find in meaningful alignment. The idea is to take in
account the number of continuous gaps and not only the number of spaces when
calculating an alignment. Affine gaps contain a component for gap insertion and
a component for gap extension, where the extension penalty is usually much
lower than the insertion penalty. This mimics biological reality as multiple
gaps would imply multiple mutations, but a single mutation can lead to a long
gap quite easily.
Gap penalties
The penalty applied to a similarity score for the
introduction of an insertion or deletion gap, the extension of a gap, or both.
Gap penalties are usually subtracted from a cumulative score being determined
for the comparison of two or more sequences via an optimization algorithm that
attempts to maximize that score.
Gel electrophoresis
A technique by which molecules are separated by size or
charge by passing them through a gel under the influence of an external
electric field.
Gene Index
A listing of the number, type, label and sequence of all the
genes identified within the genome of a given organism. Gene indices are
usually created by assembling overlapping EST sequences into clusters, and then
determining if each cluster corresponds to a unique gene. Methods by which a
cluster can be identified as representing a unique gene include identification
of long open reading frames (ORFs), comparison to genomic sequence, and
detection of SNPs or other features in the cluster that are known to exist in
the gene.
GenBank
Data bank of genetic sequences operated by a division of the
National Institutes of Health.
Gene
Classically, a unit of inheritance. In practice, a gene is a
segment of DNA on a chromosome that encodes a protein and all the regulatory
sequences (promoter) required to control expression of that protein.
Gene chips (also Gene arrays)
The covalent attachment of oligonucleotides or cDNA directly
onto a small glass or silicon chip in organized arrays. Over 50,000 different
DNA fragments can be presented on a single chip providing a high throughput
parallel method of probing gene expression, genotype or gene function.
Gene expression
The conversion of information from gene to protein via
transcription and translation.
Gene families
Subsets of genes containing homologous sequences which
usually correlate with a common function.
Gene library
A collection of cloned DNA fragments created by restriction
endonuclease digestion that represent part or all of an organism's genome.
Gene product
The product, either RNA or protein, that results from
expression of a gene. The amount of gene product reflects the activity of the
gene.
Gene therapy
The use of genetic material for therapeutic purposes. The
therapeutic gene is typically delivered using recombinant virus or liposome
based delivery systems.
Genetic code
The mapping of all possible codons into the 20 amino acids
including the start and stop codons.
Genetic engineering (Recombinant DNA technology)
The procedures used to isolate, splice and manipulate DNA
outside the cell. Genetic Engineering allows a recombinantly engineered DNA
segment to be introduced into a foreign cell or organism, and be able to
replicate and function normally.
Genetic marker
Any gene that can be readily recognized by its phenotypic
effect, and which can be used as a marker for a cell, chromosome, or individual
carrying that gene. Also, any detectable polymorphism used to identify a
specific gene.
Genome
The complete genetic content of an organism.
Genomic DNA (sequence)
DNA sequence typically obtained from mammalian or other
higher-order species, which includes both intron and exon sequence (coding
sequence), as well as non-coding regulatory sequences such as promoter, and
enhancer sequences.
Genomics
The analysis of the entire genome of a chosen organism.
Genotype
Strictly, all of the genes possessed by an individual. In
practice, the particular alleles present in a specific genetic locus.
Glycosylation
The addition of carbohydrate groups (sugars) e.g. to
polypeptide chains
Haploid
A cell or organism containing only one set of chromsomes
without the homologous pairs. (cf. diploid)
Heterodimer
Protein composed of 2 different chains or subunits .
Heteroduplex
Hybrid structure formed by the annealing of two DNA strands
(or an RNA and DNA) that have sufficient complementarity in their sequence to
allow hydrogen bonding.
Hidden Markov model (HMM)
A joint statistical model for an ordered sequence of
variables. The result of stochastically
perturbing the variables in a Markov chain (the original variables are thus
"hidden"), where the Markov chain has discrete variables which select
the "state" of the HMM at each step. The perturbed values can be
continuous and are the "outputs" of the HMM. A Hidden Markov Model is
equivalently a coupled mixture model where the joint distribution over states
is a Markov chain. Hidden Markov models are valuable in bioinformatics because
they allow a search or alignment algorithm to be trained using unaligned or
unweighted input sequences; and because they allow position-dependent scoring
parameters such as gap penalties, thus more accurately modeling the
consequences of evolutionary events on sequence families.
High-throughput screening
The method by which very large numbers of compounds are
screened against a putative drug target in either cell-free or whole-cell
assays. Typically, these screenings are carried out in 96 well plates using
automated, robotic station based technologies or in higher- density array
("chip") formats.
HLA complex
Another name for the MHC in humans; refers to the
"Human Leukocyte Antigen" complex located on chromosome 6.
Homeobox
A highly conserved region in a homeotic gene composed of 180
bases (60 amino acids) that specifies a protein domain (the homeodomain) that
serves as a master genetic regulatory element in cell differentiation during
development in species as diverse as worms, fruitflies, and humans.
Homeodomain
A 60 amino-acid protein domain coded for by the homeobox
region of a homeotic gene.
Homeotic gene
A gene that controls the activity of other genes involved in
the development of a body plan. Homeotic genes have been found in organisms
ranging from plants to humans.
Homology
(strict) Two or more biological species, systems or
molecules that share a common evolutionary ancestor. (general) Two or more gene
or protein sequences that share a significant degree of similarity, typically
measured by the amount of identity (in the case of DNA), or conservative
replacements (in the case of protein), that they register along their lengths.
Sequence "homology" searches are typically performed with a query DNA
or protein sequence to identify known genes or gene products that share
significant similarity and hence might inform on the ancestry, heritage and
possible function of the query gene.
Housekeeping genes
Genes that are always expressed (ie. they are said to be
constitutively expressed) due to their constant requirement by the cell.
Hybridization
The interaction of complementary nucleic acid strands. This
can occur between two DNA strands or between DNA and RNA strands, and is the
basis of many techniques such as Southern and northern blots.
Hydrogen bond
A weak chemical interaction between an electronegative atom
(e.g. nitrogen or oxygen) and a hydrogen atom that is covalently attached to
another atom. This bond maintains the two-helices of DNA together and is also
the primary interaction between water molecules.
Hydrophilicity
(lit. water-loving) The degree to which a molecule is
soluble in water. Hydrophilicity depends to a large degree on the charge and
polarizability of the molecule and its ability to form transient hydrogen-bonds
with (polar) water molecules.
Hydrophobicity
(lit. water-hating) The degree to which a molecule is
insoluble in water, and hence is soluble in lipids. If a molecule lacking polar
groups is placed in water, it will be entropically driven to finding a
hyrdophobic environment (such as the interior of a protein or a membrane).
Idiotype
Antibody variants localized to the variable portion of an
immunoglobulin that are recognised by their antigenic determinants. The
determinants are composed from the antigen-combining site or CDRs. Every unique
antigenic determinant has a specific antibody with its own unique idiotype.
Immunoglobulin
A member of the globulin protein family consisting of two
light and two heavy chains linked by disulfide bonds. All antibodies are
immunoglobulins.
in silico (biology)
(Lit. computer mediated). The use of computers to simulate,
process, or analyse a biological experiment.
in situ hybridization
A variation of the DNA/RNA hybridization procedure in which
the denatured DNA is in place in the cell and is then challenged with RNA or
DNA extracted from another source. (See also fluorescence in situ
hybridization).
Integration
The physical insertion of DNA into the host cell genome. The
process is used by retroviruses where a specific enzyme catalyses the process
or can occur at random sites with other DNA (eg. transposons).
Intracellular signalling
The communication of a molecular message from the surface of
the cell to the nucleus via the participation of a series of molecules,
including receptors, enzymes, proteins, and small-molecules. The end result of
the signalling process is the up- or down-regulation of a particular series of
genes that may be involved in cell growth, division or differentiation.
Introns
Nucleotide sequences found in the structural genes of
eukaryotes that are non-coding and interrupt the sequences containing
information that codes for polypeptide chains. Intron sequences are spliced out
of their RNA transcripts before maturation and protein synthesis. (cf. Exons)
Isoschizomers
Two different restriction enzymes which recognize and cut
DNA at the same recognition site. e.g Sma I and Xma I both recognize and cut
the sequence CCCGGG.
Isozymes
Two or more enzymes capable of catalyzing the same reaction
but varying in their specificity due to differences in their structures and
hence their efficiencies under different environmental conditions.
Iteration
A series of steps in an algorithm whereby the processing of
data is performed repetitively until the result exceeds a particular threshold.
Iteration is often used in multiple sequence alignments whereby each set of
pairwise alignments are compared with every other, starting with the most
similar pairs and progressing to the least similar, until there are no longer
any sequence-pairs remaining to be aligned.
Junk DNA
Term used to describe the excess DNA that is present in the
genome beyond that required to encode proteins. A misleading term since these
regions are likely to be involved in gene regulation, and other as yet
unidentified functions.
Karyotype
The constitution (typically number and size) of chromosomes
in a cell or individual.
Knockout mice (gene targeting)
Mice which have been engineered to lack a chosen gene. The
gene is inactivated in so called embryonic stem cells using the technique of
homologous recombination. These cells are then introduced into a early stage
embryo (blastocyst) and this is then transplanted into a recipient mouse. The subsequent
progeny lack the targeted gene in some cells. This technique is used to
determine the function of the chosen gene.
"Lab on a chip"
Term describing microdevices that allow rapid,
microanalytical analysis of DNA or protein in a single, fully integrated
system. Typically, these devices are miniature surfaces, made of silicon, glass
or plastic, which carry the necessary microdevices (pumps, valves, microfluidic
controllers, and detectors) that allow sample separation and analysis. These
devices are used in drug discovery, genetic testing and separation science.
Lead compound
A candidate compound identified as the best "hit"
(tight binder) after screening of a combinatorial (or other) compound library,
that is then taken into further rounds of screening to determine its
suitability as a drug.
Lead optimization
The process of converting a putative lead compound
("hit") into a therapeutic drug with maximal activity and minimal
side affects, typically using a combination of computer-based drug design, medicinal
chemistry and pharmacology.
Leucine zipper
Protein motif which binds DNA in which 4-5 Leucines are
found at 7 amino acid intervals. This motif is present typically in
transcription factors and other proteins that bind DNA.
Lexicon
In Bioinformatics, a lexicon refers to a pre-defined list of
terms that together completely define the contents of a particular database.
Library
A large collection of compounds, peptides, cDNAs or genes
which may be screened in order to isolate cognate molecules.
Ligand
Any small molecule that binds to a protein or receptor; the
cognate partner of many cellular proteins, enzymes, and receptors.
Linkage
The association of genes (or genetic loci) on the same
chromosome. Genes that are linked together tend to be transmitted together.
Linkage map
A genetic map of a chromosome or genome delineated by
mapping the positions of genes to their chromosomes by their linkage to readily
identifiable genetic loci.
Locus
The specific position occupied by a gene on a chromosome. At
a given locus, any one of the variant forms of a gene may be present. The
variants are said to be alleles of that gene.
Map unit
A measure of genetic distance between two linked genes that
corresponds to a recombination frequency of 1%.
Markov chain
Any multivariate probability density whose independence
diagram is a chain.The variables are ordered, and each variable
"depends" only on its neighbors in the sense of being conditionally
independent of the others. Markov chains
are an integral component of hidden Markov models.
Meiosis
A process within the cell nucleus that results in the
reduction of the chromosome number from diploid (two copies of each chromosome)
to haploid (a single copy) through two reductive divisions in germ cells.
Melting (of DNA)
The denaturation of double-stranded DNA into two single
strands by the application of heat. (Denaturation breaks the hydrogen bonds
holding the double-stranded DNA together).
Messenger RNA (mRNA)
The complementary RNA copy of DNA formed from a
single-stranded DNA template during transcription that migrates from the
nucleus to the cytoplasm where it is processed into a sequence carrying the
information to code for a polypeptide domain.
Methylation
The addition of -CH3 (methyl) groups to a target site.
Typically such addition occurs on to the cytosine bases of DNA. (see maternal
imprinting).
Microarray
A 2D array, typically on a glass, filter, or silicon wafer,
upon which genes or gene fragments are deposited or synthesized in a
predetermined spatial order allowing them to be made available as probes in a
high-throughput, parallel manner.
Microfluidics
The miniaturization of chemical reactions or pharmacalogical
assays into microscopic tubes or vessels in order to greatly increase their
throughput, by placing many of them side-by-side in an array.
Mimetics
Compounds that mimic the function of other molecules via
their high degree of structural (conformational) similarity, and hence
physio-chemical properties.
Missense mutation
A point mutation in which one codon (triplet of bases) is
changed into another designating a different amino acid.
Mitosis
The nuclear division that results in the replication of the
genetic material and its redistribution into each of the daughter cells during
cell division.
Modeling
In bioinformatics, modeling usually refers to molecular
modeling, a process whereby the three-dimensional architecture of biological
molecules is interpreted (or predicted), visually represented, and manipulated
in order to determine their molecular properties. (general) A series of
mathematical equations or procedures which simulate a real-life process, given
a set of assumptions, boundary parameters, and initial conditions.
Monomer
A single unit of any biological molecule or macromolecule,
such as an amino acid, nucleic acid, polypeptide domain, or protein.
Monovalent
Having one binding site; strictly, an atom with only one
free electron available for binding in its highest energy shell.
Motif
A conserved element of a protein sequence alignment that usually
correlates with a particular function. Motifs are generated from a local
multiple protein sequence alignment corresponding to a region whose function or
structure is known. It is sufficient that it is conserved, and is hence likely
to be predictive of any subsequent occurrence of such a structural/functional
region in any other novel protein sequence.
Multigene family
A set of genes derived by duplication of an ancestral gene,
followed by independent mutational events resulting in a series of independent
genes either clustered together on a chromosome or dispersed throughout the
genome.
Multiple (sequence) alignment
A Multiple Alignment of k sequences is a rectangular array,
consisting of characters taken from the alphabet A , that satisfies the following
conditions: There are exactly k rows; ignoring the gap character, row number i
is exactly the sequence s I ; and each column contains at least one character
different from "-". In practice multiple sequence alignments include
a cost/weight function, that defines the penalty for the insertion of gaps (the
"-" character) and weights identities and conservative substitutions
accordingly. Multiple alignment algorithms attempt to create the optimal
alignment defined as the one with the lowest cost/weight score.
Multiplex sequencing
Approach to high-throughput sequencing that uses several
pooled DNA samples run through gels simultaneously and then separated and
analyzed.
Mutagen
Any agent that can cause an increase in the rate of
mutations in an organism.
Mutation
An inheritable alteration to the genome that includes
genetic (point or single base) changes, or larger scale alterations such as
chromosomal deletions or rearrangements.
Naked DNA
Pure, isolated DNA devoid of any proteins that may bind to
it.
NCEs (New Chemical Entity)
Compounds identified as potential drugs that are sent from
research and development into clinical trials to determine their suitability .
Nested PCR
The second round amplification of an already PCR-amplified
sequence using a new pair of primers which are internal to the original
primers. Typically done when a single PCR reaction generates insufficient
amounts of product.
Neural net
A neural net is an interconnected assembly of simple
processing elements, units or nodes, whose functionality is loosely based on
the animal brain. The processing ability of the network is stored in the
inter-unit connection strengths, or weights, obtained by a process of
adaptation to, or learning from, a set of training patterns. Neural nets are
used in bioinformatics to map data and make predictions, such as taking a
multiple alignment of a protein family as a training set in order to identify
novel members of the family from their sequence data alone.
Nonsense mutation
A point mutation in which a codon specific for an amino-acid
is converted into a nonsense codon.
Northern blotting
A technique to identify RNA molecules by hybridization that
is analogous to Southern blotting (see Southern blotting).
Nuclease
Any enzyme that can cleave the phosphodiester bonds of
nucleic acid backbones.
Nucleoside
A five-carbon sugar covalently attached to a nitrogen base.
Nucleotide
A nucleic acid unit composed of a five carbon sugar joined
to a phosphate group and a nitrogen base.
Object-Relational Database
Object databases combine the elements of object orientation
and object-oriented programming languages with database capabilities. They
provide more than persistent storage of programming language objects. Object
databases extend the functionality of object programming languages (e.g., C++,
Smalltalk, or Java) to provide full-featured database programming capability.
The result is a high level of congruence between the data model for the
application and the data model of the database.
Object-relational databases are used in Bioinformatics to map molecular
biological objects (such as sequences, structures, maps and pathways) to their
underlying representations (typically within the rows and columns of relational
database tables.) This enables the user to deal with the biological objects in
a more intuitive manner, as they would in the laboratory, without having to
worry about the underlying data model of their representation.
Oligonucleotide
A short molecule consisting of several linked nucleotides
(typically between 10 and 60) covalently attached by phosphodiester bonds.
Open reading frame (ORF)
Any stretch of DNA that potentially encodes a protein. Open
reading frames start with a start codon, and end with a termination codon. No
termination codons may be present internally. The identification of an ORF is
the first indication that a segment of DNA may be part of a functional gene.
Operator
A segment of DNA that interacts with the products of
regulatory genes and facilitates the transcription of one or more structural
genes.
Operon
A unit of transcription consisting of one or more structural
genes, an operator, and a promoter.
Ortholog
Orthologs are genes in different species that evolved from a
common ancestral gene by speciation. Normally, orthologs retain the same
function in the course of evolution. Identification of orthologs is critical
for reliable prediction of gene function in newly sequenced genomes. (See also
Paralogs.)
Overlapping clones
Collection of cloned sequences made by generating randomly
overlapping DNA fragments with infrequently cutting restriction enzymes.
Palindrome
A region of DNA with a symmetrical arrangement of bases
occuring about a single point such that the base sequences on either side of
that point are identical (if the strands are both read in the same direction)
e.g 5' GAATTC 3' whose complementary sequence is 3' CTTAAG 5'.
Pattern
Molecular biological patterns usually occur at the level of
the characters making up the gene or protein sequence. A pattern language must
be defined in order to apply different criteria to different positions of a
sequence. In order to have position-specific comparison done by a computer, a
pattern-matching algorithm must allow alternative residues at a given position,
repetitions of a residue, exclusion of alternative residues, weighting, and
ideally, combinatorial representation.
Pathways
Bioinformatics strives to define representations of key
biological datatypes, algorithms and inference procedures, including sequences,
structures, biological pathways and reactions. Representing and computing with
biological pathways requires ontologies for representing pathway knowledge;
User interfaces to these databases; Physico-chemical properties of enzymes and
their substrates in pathways; And pathway analysis of whole genomes including
identifying common patterns across species and species differences.
Paralog
Paralogs are genes related by duplication within a genome.
Orthologs retain the same function in the course of evolution, whereas paralogs
evolve new functions, even if these are related to the original one.
Peptide bond (amide bond)
A covalent bond formed between two amino acids when the
amino group of one is linked to the carboxy group of another (resulting in the
elimination of one water molecule).
Pharmacogenomics
The use of (DNA-based) genotyping in order to target
pharmaceutical agents to specific patient populations. Genetic differences are
known to affect responses to many types of drug therapy, and pharmacogenomics
analysis serves to customize the use of pharmaceuticals for specific subgroups
of patients.The rationale for this approach is that observed gene expression
differences may correlate with, and explain, the differences in side effects
and efficacy to drugs in humans.
Pharmacophore
The three dimensional spatial arrangment of atoms,
substituents, functional groups, or chemical features that together are
sufficient to describe the pharmacologically active components of a drug
molecule or molecule series.
Phenotype
Any observable feature of an organism that is the result of
one or more genes.
Phylum
The segmentation of the animal kingdom into about 30 major
groups collectively known as phyla. The members of each phylum share the same
basic structure and organization. For instance, fish, birds, and human beings
belong to one phylum - the Chordata - because all have spinal cords.
Physical map
A physical map consists of a linearly ordered set of DNA
fragments encompassing the genome or region of interest. Physical maps are of
two types, macro-restriction maps and ordered clone maps. The former consists
of an ordered set of large DNA fragments generated by using restriction enzymes
whose recognition sequences are infrequently represented in the genome. An
ordered clone map consists of an overlapping collection of cloned DNA
fragments. The DNA may be cloned into any one of the available vector
systems--YACs, cosmids, phage, or even plasmids. Major advantages of ordered
clone
maps are that they are of high resolution and directly
provide the clones for further study.
Plasmid
Any replicating DNA element that can exist in the cell
independently of the chromosomes. Synthetic plasmids are used for DNA cloning.
Most commonly found in bacterial cells.
Pleitropy
The multiple effects on an organism's phenotype due to a
single gene or allele e.g the cytokines which can bind to multiple cellular
receptors and effect growth and multiple immune pathways.
Point mutation
A mutation in which a single nucleotide in a DNA sequence is
substituted by another nucleotide.
Poly(A) tail
The stretch of Adenine (A) residues at the 3' end of
eukaryotic mRNA that is added to the pre-mRNA as it is processed, before its
transport from the nucleus to the cytoplasm and subsequent translation at the
ribosome.
Polyadenylation site
A site on the 3'-end of messenger RNA (mRNA) that signals
the addition of a series of Adenines during the RNA processing step and before
the mRNA migrates to the cytoplasm.
These so-called poly(A) "tails" increase mRNA stability
andallow one to isolate mRNA from cells by PCR-amplification using poly(T)
primers.
Polygenic inheritance
Inheritance involving alleles at many genetic loci.
Polymerase chain reaction (PCR )
Technique used to amplify or generate large amounts of
replica DNA of a segment of any DNA whose "flanking" sequences are
known. Oligonucleotide primers which bind these flanking sequences are used by
an enzyme (Taq polymerase) to copy the sequence in between the primers. Cycles
of heat to break apart the DNA strands, cooling to allow the primers to bind,
and heating again to allow the enzyme to copy the intervening sequence lead to
a doubling of DNA at each cycle. The reactions are typically carried out on a
regulated heating block and consist of 30-35 cycles of repeated amplification
of all the DNA present. Single molecules of "target" DNA can be
amplified to microgram amounts of DNA. The target DNA can be of any origin.
Polymorphism
(lit. many forms) The existence of a gene in a population in
at least two different forms at a frequency far higher than that attributable
to recurrent mutation alone. Variations in a population may be measured by
determining the rate of mutation in polymorphic genes.
Polypeptide
A single chain of covalently attached amino acids joined by
peptide bonds. Polypeptide chains usually fold into a compact, stable form (a
domain) that is part (or all) of the final protein.
Positional cloning
Method used to define the location of a gene on a chromosome
and use this information to identify and clone the gene. The location of the
gene is determined by linkage analysis of DNA from a large family containing
afflicted and normal members to identify linkages between the transmission of
the disease gene and observable genetic markers. This information is then used
to screen (by chromosomal jumping and walking) the location for putative genes.
The disease gene must be compared between the afflicted and normal family
members and be shown to be different in the two groups. The full sequencing of
the gene will then provide information regarding the characteristics and
function of the gene product, and a potential explanation for the cause of the
disease.
Post-transcriptional modification
Alterations made to pre-mRNA before it leaves the nucleus
and becomes mature mRNA.
Post-translational modification
Alterations made to a protein after its synthesis at the
ribosome. These modifications, such as the addition of carbohydrate or fatty
acid chains, may be critical to the function of the protein.
Primary sequence (protein)
The linear sequence of a polypeptide or protein.
Primary structure (protein)
Primer
A short oligonucleotide that provides a free 3' hydroxyl for
DNA or RNA synthesis by the appropriate polymerase (DNA polymerase or RNA
polymerase).
Probe
Any biochemical that is labelled or tagged in some way so
that it can be used to identify or isolate a gene, RNA, or protein.
Promoter (site)
A promoter site is defined by its recognition by eukaryotic
RNA polymerase II; its activity in a higher eukaryote; by experimentally evidence,
or homology and sufficient similarity to an experimentally defined promoter;
and by observed biological function.
Protein families
Sets of proteins that share a common evolutionary origin
reflected by their relatedness in function which is usually reflected by
similarities in sequence, or in primary, secondary or tertiary structure.
Subsets of proteins with related structure and function.
Proteome
The entire protein complement of a given organism.
Proteomics
The study of the proteome. Typically, the cataloging of all
the expressed proteins in a particular cell or tissue type, obtained by
identifying the proteins from cell extracts using a combination of 2D gel
electrophoresis and mass spectrometry. The large scale analysis of the protein
composition and function. (cf genomics)
Query (sequence)
A DNA, RNA of protein sequence used to search a sequence
database in order to identify close or remote family members (homologs) of
known function, or sequences with similar active sites or regions (analogs),
from whom the function of the query may be deduced.
Rational drug design (Structure based drug design)
The development of drugs based on the 3-dimensional
molecular structure of a particular target.
A sequence of codons beginning with an intiation codon and
ending with a termination codon, typically of at least 150 bases (50 amino
acids) coding for a polypeptide or protein chain (see ORF and URF).
Recessive
Any trait that is expressed phenotypically only when present
on both alleles of a gene (cf dominant).
Recombinant DNA (rDNA)
DNA molecules resulting from the fusion of DNA from
different sources. The technology employed for splicing DNA from different
sources and for amplifying the resultant heterogenous DNA.
Recombination
A new combination of alleles resulting from the
rearrangement occuring by crossing-over or by independent assortment (see
crossing over).
Recursion
An algorithmic procedure whereby an algorithm calls on
itself to perform a calculation until the result exceeds a threshold, in which
case the algorithm exits. Recursion is a powerful procedure with which to
process data and is computationally quite efficient.
Regulatory gene
A DNA sequence that functions to control the expression of
other genes by producing a protein that modulates the synthesis of their
products (typically by binding to the gene promoter). (cf. Structural gene).
Relational Database
A database that follows E. F. Codd's 11 rules, a series of
mathematical and logical steps for the organization and systemization of data
into a software system that allows easy retrieval, updating, and expansion. An
RDBMS stores data in a database consisting of one or more tables of rows and
columns. The rows correspond to a record (tuple); the columns correspond to
attributes (fields) in the record. In an RDBMS, a view, defined as a subset of
the database that is the result of the evaluation of a query, is a table.
RDBMSs use Structured Query Language (SQL) for data definition, data
management, and data access and retrieval. Relational and object-relational
databases are used extensively in bioinformatics to store sequence and other
biological data.
Relational Database Management Systems (RDBMS)
A software system that includes a database architecture,
query language, and data loading and updating tools and other ancillary
software that together allow the creation of a relational database application.
Repeats (repeat sequences)
Repeat sequences and approximate repeats occur throughout
the DNA of higher organisms (mammals). For example, the Alu sequences of length
about 300 characters, appear hundreds of thousands of times in Human DNA with
about 87% homology to a consensus Alu string. Some short substrings such as
TATA-boxes, poly-A and (TG)* also appear more often than by chance. Repeat
sequences may also occur within genes, as mutations or alterations to those
genes. Repetitive sequences, especially mobile elements, have many applications
in genetic research. DNA transposons and retroposons are routinely used for
insertional mutagenesis, gene mapping, gene tagging, and gene transfer in
several model systems.
Repetitive elements
Repetitive elements provide important clues about chromosome
dynamics, evolutionary forces, and mechanisms for exchange of genetic
information between organisms The most ubiquitous class of repetitive elements
in the DNA sequence in primate genomes is the Alu family of interspersed
repeats which have arisen in the last 65 million years of evolution Alu repeats
belong to a class of sequences defined as short interspersed elements (SINEs).
Approximately 500,000 Alu SINEs exist within the human genome, representing
about 5% of the genome by mass.
Replication
The synthesis of an informationally identical macromolecule
(e.g. DNA) from a template molecule.
Repressor
The protein product of a regulatory gene that combines with
a specific operator (regulatory DNA sequence) and hence blocks the
transcription of genes in an operon.
Restriction enzyme (restriction endonuclease)
A type of enzyme that recognizes specific DNA sequences
(usually palindromic sequences 4, 6, 8 or 16 base pairs in length) and produces
cuts on both strands of DNA containing those sequences only. The
"molecular scissors" of rDNA technology.
Restriction fragment length polymorphisms (RFLPs)
Variation within the DNA sequences of organisms of a given
species that can be identified by fragmenting the sequences using restriction
enzymes, since the variation lies within the restriction site. RFLPs can be
used to measure the diversity of a gene in a population.
Restriction map
A physical map or depiction of a gene (or genome) derived by
ordering overlapping restriction fragments produced by digestion of the DNA
with a number of restriction enzymes.
Reverse Genetics
The use of protein information to elucidate the genetic
sequence encoding that protein. Used to describe the process of gene isolation
starting with a panel of afflicted patients
Reverse transcriptase
A DNA polymerase that can synthesise a complementary DNA
(cDNA) strand using RNA as a template - a so-called RNA-dependent DNA
polymerase.
Reverse transcriptase-PCR (RT-PCR)
Procedure in which PCR amplification is carried out on DNA
that is first generated by the conversion of mRNA to cDNA using reverse
transcriptase.
Ribonucleic acid (RNA)
A category of nucleic acids in which the component sugar is
ribose and consisting of the four nucleotides Thymidine, Uracil, Guanine, and
Adenine. The three types of RNA are messenger RNA (mRNA), transfer RNA (tRNA)
and ribosomal RNA (rRNA).
Secondary structure (protein)
The organization of the peptide backbone of a protein that
occurs as a result of hydrogen bonds e.g alpha helix, Beta pleated sheet.
Selectivity
Selectivity of bioinformatics similarity search algorithms
is defined as the significance threshold for reporting database sequence
matches. As an example, for BLAST searches, the parameter E is interpreted as
the upper bound on the expected frequency of chance occurrence of a match
within the context of the entire database search. E may be thought of as the number of matches
one expects to observe by chance alone during the database search.
Sense strand
The strand of double-stranded DNA that acts as the template
strand for RNA synthesis. Typically only one gene product is produced per gene,
reading from the sense strand only. (Some viruses have open reading frames in
both the sense and the antisense strands).
Sensitivity
Sensitivity of bioinformatics similarity search algorithms
centers around two areas: First, how well can the method detect biologically
meaningful relationships between two related sequences in the presence of
mutations and sequencing errors; Secondly how does the heuristic nature of the
algorithm affect the probability that a matching sequence will not be detected.
At the user's discretion, the speed of most similarity search programs can be
sacrificed in exchange for greater sensitivity - with an emphasis on detecting
lower scoring matches.
Sequence Tagged Site (STS)
A unique sequence from a known chromosomal location that can
be amplified by PCR. STSs act as physical markers for genomic mapping and
cloning.
Sexual PCR (Molecular Diversity)
Sexual PCR is a form of PCR in which similar, but not
identical, DNA sequences are reassembled to obtain novel juxtapositions, simulating
the result of genetic recombination. The result is the creation of an array of
related genes which may possess improved characteristics. By repeated rounds of
recombination, selection and PCR-based amplification vastly improved
gene-products, such as enzymes with greater activity, may be generated and
selected.
Shotgun cloning
The cloning of an entire gene segment or genome by
generating a random set of fragments using restriction endonucleases to create
a gene library that can be subsequently mapped and sequenced to reconstruct the
entire genome.
Similarity (homology) search
Given a newly sequenced gene, there are two main approaches
to the prediction of structure and function from the amino acid sequence.
Homology methods are the most powerful and are based on the detection of
significant extended sequence similarity to a protein of known structure, or of
a sequence pattern characteristic of a protein family. Statistical methods are
less successful but more general and are based on the derivation of structural
preference values for single residues, pairs of residues, short oligopeptides
or short sequence patterns. The transfer of structure/function information to a
potentially homologous protein is straightforward when the sequence similarity
is high and extended in length, but the assessment of the structural
significance of sequence similarity can be difficult when sequence similarity
is weak or restricted to a short region.
Signal sequence (leader sequence)
A short sequence added to the amino-terminal end of a
polypeptide chain that forms an amphipathic helix allowing the nascent
polypeptide to migrate through membranes such as the endoplasmic reticulum or
the cell membrane. It is cleaved from the polypeptide after the protein has
crossed the membrane.
Single nucleotide polymorphisms (SNPs)
Variations of single base pairs scattered throughout the
human genome that serve as measures of the genetic diversity in humans. About 1
million SNPs are estimated to be present in the human genome, and SNPs are useful
markers for gene mapping studies.
Single-pass sequencing
Rapid sequencing of large segments of the genome of an
organism by isolating as many expressed (cDNA) sequences as possible and
performing single sequencer runs on their 5' or 3' ends. Single-pass sequencing
typically results in individual, error-prone sequencing reads of 400-700 bases,
depending on the type of sequencer used. However, if many of these are
generated from numerous clones from different tissues, they may be overlapped
and assembled to remove the errors and generate a contiguous sequence for the
entire expressed gene.
Site
Sites in sequences can be located either in DNA (e.g.
binding sites, cleavage sites) or in proteins. In order to identify a site in
DNA, ambiguity symbols are used to allow several different symbols at one
position. Proteins, however, need a different mechanism (see Pattern).
Restriction enzyme cleavage sites, for instance, have the following
properties: limited length (typically,
less than 20 base pairs); definition of the cleavage site and its appearance
(3', 5' overhang or blunt); definition of the binding site.
Southern blotting
A procedure for the identification of DNA by transmitting a
fragment isolated on an agarose gel to a nitrocellulose filter where it can be
hybridized with a complementary "probe" sequence.
Splice site
The sequence found at the 5' and 3' region of exon/intron
boundaries, usually defined by a consensus sequence:
Intron
5' CAGGTAAGT---------TNCAGG 3'
A G C T
N represents any nucleotide; the bottom line represents
alternative nucleotides at the indicated positions.
Splice form
By using alternative splicing, a single message precursor
from DNA can generate an entire family of mRNAs and proteins. This can be
utilized to create specificity in cell-cell or cell-ligand interactions. A cell
may produce a given protein, but it will be a different splice-form of the
protein than that produced by an adjacent cell. In this manner, the two cells
have the potential to interact differently with other cells or molecules. Two
places where this has been extremely important is in the production of
cell-surface specificity proteins in the immune and nervous systems.
Splicing
The joining together of separate DNA or RNA component parts.
For example, RNA splicing in eukaryotes involves the removal of introns and the
stitching together of the exons from the pre-mRNA transcript before maturation.
Solvent accessibility
The surface area (typically measured in square angstroms) of
a biological molecule, usually a protein, that is exposed to solvent in its
native, folded form. Determining the solvent accessibility of a protein helps
define which amino acids in its molecular sequence are on the exterior of the
molecule, and thus available to participate in interactions with other
molecules.
Structural gene
Gene which encodes a structural protein (cf. Regulatory
gene).
Structure prediction
Algorithms that predict the secondary, tertiary and
sometimes even quarternary structure of proteins from their sequences. Determining protein structure from sequence
has been dubbed "the second half of the Genetic Code" since it is the
folded tertiary structure of a protein that governs how it functions as a gene
product. As yet most structure
prediction methods are only partially successful, and typically work best for
certain well-defined classes of proteins.
Substitution matrix
A model of protein evolution at the sequence level resulting
in the development of a set of widely used substitution matrices. These are
frequently called Dayhoff, MDM (Mutation Data Matrix), BLOSUM or PAM (Percent
Accepted Mutation) matrices. They are derived from global alignments of closely
related sequences. Matrices for greater
evolutionary distances are extrapolated from those for lesser ones.
Subtraction library
A cDNA library that only contains cDNAs uniquely expressed
in a given cell or tissue. e.g T cells and B cells will express many common
RNAs, as well as a very small percentage which will be unique for T cells and B
cells respectively. To make a T cell subtraction library, the cDNA from a T
cell library is hybridized with a vast excess of B cell RNA. The commonly
expressed genes will result in RNA-cDNA hybrids which can be removed (or
subtracted) to leave only T cell specific cDNAs.
Tentative Consensus (TC)
The identification of a sequence from an EST cluster that
represents part or all of a complete gene.
TCs are usually determined by clustering ESTs allowing for sequencing
errors, artefacts such as chimeric clones, and naturally occuring biological
phenomena such as alternative splicing.
Creation of a cluster allows one to generate a consensus sequence and
then identify a long open reading frame which would suggest the possibility of
that consensus representing abona fide gene.
Tentative Human Consensus sequences (THCs)
A consensus sequence generated from human EST fragments.
THCs may be validated by comparison against databases of known human gene
sequences, human genomic sequences, or by identification of the ORFs or other
sequence features contained within the consensus as belonging to a known human
gene product.
Tertiary structure
Folding of a protein chain via interactions of its
sideschain molecules including formation of disulphide bonds between cysteine
residues.
Transcript
The single-stranded mRNA chain that is assembled from a gene
template.
Transcription
The assembly of complementary single-stranded RNA on a DNA
template.
Transcription factors
A group of regulatory proteins that are required for
transcription in eukaryotes. Transcription factors bind to the promoter region
of a gene and facilitate transcription by RNA polymerase.
Transfer RNA (tRNA)
A small RNA molecule that recognizes a specific amino acid,
transports it to a specific codon in the mRNA, and positions it properly in the
nascent polypeptide chain.
Transformation
A genetic alteration to a cell as a result of the
incorporation of DNA from a genetically diferent cell or virus; can also refer
to the introduction of DNA into bacterial cells for genetic manipulation.
Transgene
A foreign gene that is introduced into a cell or whole
organism (eg.transgenic mice) for therapeutic or experimental purposes.
Translation
The process of converting RNA to protein by the assembly of
a polypeptide chain from an mRNA molecule at the ribosome.
Transmembrane region
The region of a transmembrane protein that actually spans
the membrane. Transmembrane regions are
usually hydrophobic in order to be thermodynamically compatible with the lipid
bilayer portion of the membrane. They
may consist of either alpha-helical or beta-strand secondary structure
elements, but in either case the external residues (the ones facing the
membrane) are invariably hydrophobic while the internal residues may be
hydrophilic (as in the case of a pore or channel) or polar. One common transmembrane structural domain is
the seven-helix bundle seen in numerous channel proteins.
Tentative Consensus (TC)
The identification of a sequence from an EST cluster that
represents part or all of a complete gene.
TCs are usually determined by clustering ESTs allowing for sequencing
errors, artefacts such as chimeric clones, and naturally occuring biological
phenomena such as alternative splicing.
Creation of a cluster allows one to generate a consensus sequence and
then identify a long open reading frame which would suggest the possibility of
that consensus representing abona fide gene.
Tentative Human Consensus sequences (THCs)
A consensus sequence generated from human EST fragments.
THCs may be validated by comparison against databases of known human gene
sequences, human genomic sequences, or by identification of the ORFs or other
sequence features contained within the consensus as belonging to a known human
gene product.
Tertiary structure
Folding of a protein chain via interactions of its
sideschain molecules including formation of disulphide bonds between cysteine
residues.
Transgene
A foreign gene that is introduced into a cell or whole
organism (eg.transgenic mice) for therapeutic or experimental purposes.
Translation
The process of converting RNA to protein by the assembly of
a polypeptide chain from an mRNA molecule at the ribosome.
Unidentified reading frame (URF)
An open reading frame encoding a protein of undefined
function .
Variable numbers of tandem repeats (VNTRs)
DNA sequence blocks of 2-60 base pairs which are repeated
from two to more than 20 times in different individuals. This polymorphism
makes VNTRs very useful DNA markers used in genomic mapping, linkage analysis
and also DNA fingerprinting.
Variation (genetic)
Variation in genetic sequences and the detection of DNA
sequence variants genome-wide allow studies relating the distribution of
sequence variation to a population history. This in turn allows one to
determine the density of SNPS or other markers needed for gene mapping
studies. Quantitation of these
variations together with analytical tools for studying sequence variation also
relate genetic variations to phenotype.
Vector
Any agent that transfers material (typically DNA) from one
host to another. Typically DNA vectors are autonomous DNA elements (such as
plasmids) that can be manipulated and integrated into a host's DNA or
recombinant viruses.
Virtual libraries
The creation and storage of vast collections of molecular
structures in an electronic database. These databases may be queried for
subsets that exhibit specific physicochemical features, or may be
"virtually screened" for their ability to bind a drug target. This
process may be performed prior to the synthesis and testing of the molecules
themselves.
Western blot
Technique in which specific antibodies are used to identify
their antigens from a mixture of proteins. Typically, these proteins mixtures
are first separated by electrophoresis and then transfered onto nylon sheets by
electrotransfer. Radiolabeled or enzyme-linked antibodies are incubated with
the sheets and unbound antibodies washed away allowing the position of the
bound antibody to be revealed by autoradiography or color which is formed upon
addition of a substrate.
Wild type
Form of a gene or allele that is considered the
"standard" or most common.
X chromosome
In mammals, the sex chromosome that is found in two copies
in the homogametic sex (female in humans) and one copy in the hererogametic sex
(male in humans).
Yeast 2-hybrid system
A yeast-based method used to simultaneously identify, and
clone the gene for, proteins interacting with a known protein. The basis of
this method is a "transcriptional reporter assay" (see definition) in
which reporter gene expression is dependent on two domains. The first domain is
linked to the known protein. The second domain is genetically linked to a
library. If the library is screened against the known protein the two domains
will interact only if a protein from the library binds the known protein,
resulting in transcription activation of the reporter gene, and a blue color.
The "blue yeast clone" will contain the gene encoding the newly
identified protein.
Z-DNA
A conformation of DNA existing as a left-handed double helix
(the phosphate-sugar backbone forms a left-handed zig-zag course), which may
play a role in gene regulation.
Zinc fingers
A protein motif formed by the interaction of repeated
cysteine and histidine residues with a zinc ion. The spacing of the repeats results
in finger like arrangements of the protein loops formed from the interaction
which interact with DNA. These motifs are typically found in transcription
factors.
Subscribe to:
Posts (Atom)