Molecular biology

If there's one field of biology which is most likely to interest computer people, it would probably be molecular biology. That's because molecular biology is the view of biology in which life seems the most structured.

Unlike closely related fields like biochemistry or genetics, in which living things seem more abstract, molecular biology views living things at, of course, the molecular level, where everything is built from large molecules like DNA, RNA, and proteins. Indeed, from the standpoint of molecular biology, most living things start to seem as though they are structured much like a computer. Molecular biology is especially concerned with the structure of DNA and RNA molecules, which are the "software" of life.

Molecular biology is not to be confused with microbiology, which is simply the study of microorganisms like bacteria and viruses.

What makes molecular biology different from genetics (which is also mainly interested in DNA and RNA) is that genetics tends to be predominantly concerned with the effects of gene replication (like heredity), while molecular biology is more concerned with the process of gene replication itself. Indeed, some people maintain that molecular biology is more of an engineering practice than a true science; certainly, molecular biology has little to do with the study of life forms, as it is more interested in the molecules that make up life forms. Perhaps not surprisingly, then, molecular biology is somewhat looked down upon by the "pure" scientists.

For a person entering the field of biology from the computer world, however, the field of molecular biology is likely to be the most attractive. Since this website is more about the science of information than the science of life itself, if there's going to be anything on biology here, molecular biology seems to be the most appropriate field to go with. Thus, herein lies a very brief introduction to some of the essentials of molecular biology. Note that I am not a biologist at all, and so much of the information here is quite liable to be inaccurate, incomplete, or misleading.


Structure of DNA

DNA (deoxyribonucleic acid) is quite possibly the most recognizable symbol of biological science in the world today. It has the classic "double helix" shape, and it contains the information needed to guide the growth of all forms of life that are composed of cells.

The outer edges of the DNA helix are two long strands often called the backbone of the DNA molecule. Each backbone is a chain of alternating phosphate groups and molecules of a pentose sugar called 2-deoxyribose.

Between these two strands are "rungs" (they look like the rungs on a ladder), each of which is composed of two bases, one attached to each backbone. (Recall that a base is the opposite of an acid.) In DNA, there are only four types of bases that can exist within the center of the double helix: adenine, thymine, guanine, and cytosine. Since these bases all conveniently happen to start with different letters, they are frequently abbreviated A, T, G, and C. The bases found in DNA and RNA are called nucleobases; a base together with its sugar and phosphate group is called a nucleotide.

The chemical formulas for these four bases are as follows:

Adenine: C5H5N5
Thymine: C5H6N2O2
Guanine: C5H5N5O
Cytosine: C4H5N3O

Each of these bases can only bond to one other type of base within the DNA structure. Adenine can only bind to thymine (and vice-versa), while guanine can only connect to cytosine.
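For computer people, the pairing rule is easiest to see as a lookup table. The following is only a minimal sketch: the names `PAIRS` and `complement` are my own, and it ignores the direction in which each strand is read.

```python
# Pairing rules for DNA as stated above: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Return, base by base, the partner that sits on the opposite strand."""
    return "".join(PAIRS[base] for base in strand.upper())

print(complement("ATGC"))  # TACG
```

Because the rule is symmetric, taking the complement twice gives back the original strand, which is why one strand of the helix is enough to reconstruct the other.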

The rungs of twin base molecules that exist in the middle of the DNA double helix are often called "base pairs". In a single human cell, there are about 6 billion base pairs. (The human genome contains about 3 billion base pairs, and each cell has two copies of the genome, meaning each cell has 6 billion pairs.)

Fundamental DNA operations

Determining the order of bases in a DNA or RNA molecule is called sequencing. Two classic sequencing methods are the chain termination method, developed by Frederick Sanger, which was by far the more commonly used of the two, and the chemical cleavage method. For large-scale work, both have largely given way to newer high-throughput methods.

Just as in computer usage, it is important to be able to "cut and paste" portions of DNA. A big part of genetic engineering work is just taking pieces of DNA and joining them onto other pieces of DNA.

To cut DNA at a specific point along its strand, a restriction enzyme is used. This is an enzyme that cuts the DNA chain wherever a specific short sequence of bases (its recognition site) occurs.

Similarly, when you want to join two pieces of DNA together (which is called ligation), you use a ligase enzyme.
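The "cut and paste" analogy can be taken fairly literally with strings. Here is a rough sketch, assuming the well-known restriction enzyme EcoRI, which recognizes the sequence GAATTC and cuts just after the G. The function names `cut` and `ligate` are hypothetical, and the model is simplified to a single strand, so it ignores the staggered ends that a real cut leaves on the two strands.

```python
ECORI_SITE = "GAATTC"  # recognition sequence of EcoRI
ECORI_OFFSET = 1       # EcoRI cuts after the first base: G^AATTC

def cut(dna, site=ECORI_SITE, offset=ECORI_OFFSET):
    """Cut a strand at every occurrence of `site`; return the fragments in order."""
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        fragments.append(dna[start:pos + offset])
        start = pos + offset
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments

def ligate(*fragments):
    """Join fragments end to end, as a ligase enzyme would."""
    return "".join(fragments)

dna = "TTGAATTCAAGGGAATTCCC"
pieces = cut(dna)
print(pieces)                        # ['TTG', 'AATTCAAGGG', 'AATTCCC']
print(ligate(pieces[0], pieces[2]))  # TTGAATTCCC (middle fragment removed)
```

Joining all the pieces back in order restores the original sequence, which is a handy sanity check on the cutting logic.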

Making many copies of a particular stretch of DNA is called amplifying the DNA. Polymerase Chain Reaction (PCR) is a common technique for amplifying DNA.
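To get a feel for why PCR is so effective, it helps to know that in the idealized case each PCR cycle doubles the number of copies. The sketch below works out the arithmetic. The function `pcr_copies` and its `efficiency` parameter are my own illustration, not a standard formula from any library, and real reactions fall short of perfect doubling.

```python
def pcr_copies(starting_copies, cycles, efficiency=1.0):
    """Copies after `cycles` rounds; efficiency=1.0 means perfect doubling."""
    return starting_copies * (1 + efficiency) ** cycles

print(pcr_copies(1, 30))  # 1073741824.0, about a billion copies from one
```

Thirty cycles is therefore enough, in the ideal case, to turn a single molecule into roughly a billion.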

Amino acids and proteins

Amino acids are a class of organic compounds which are probably of greatest interest for being what proteins are made of.

Although more than 100 amino acids are known, there are only 20 which are of interest in genetics because they can be genetically coded. These amino acids are often called standard amino acids. Each standard amino acid has both a three-letter and a one-letter abbreviation. Here is a complete alphabetical list of the 20 standard amino acids, along with their 3-letter and 1-letter abbreviations:

Amino acid      3-letter  1-letter
Alanine         Ala       A
Arginine        Arg       R
Asparagine      Asn       N
Aspartic Acid   Asp       D
Cysteine        Cys       C
Glutamic Acid   Glu       E
Glutamine       Gln       Q
Glycine         Gly       G
Histidine       His       H
Isoleucine      Ile       I
Leucine         Leu       L
Lysine          Lys       K
Methionine      Met       M
Phenylalanine   Phe       F
Proline         Pro       P
Serine          Ser       S
Threonine       Thr       T
Tryptophan      Trp       W
Tyrosine        Tyr       Y
Valine          Val       V
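The one-letter codes are what you will usually see in sequence files, since a protein can then be written as a plain string. As a small sketch, the table above can be turned into a dictionary; the names `THREE_TO_ONE` and `to_one_letter` are my own.

```python
# The 20 standard amino acids: three-letter code -> one-letter code.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Glu": "E", "Gln": "Q", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def to_one_letter(residues):
    """Convert a list of three-letter codes into a one-letter string."""
    return "".join(THREE_TO_ONE[r] for r in residues)

print(to_one_letter(["Met", "Lys", "Trp"]))  # MKW
```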

A protein is a compound made of one or more long chains of amino acids. The word "protein" is often used in everyday speech as if protein were only one uniform substance. This is quite far from the truth; there are many different known proteins.

Proteins are manufactured in the cytoplasm of cells (the region of the cell that lies outside the cell nucleus) by structures called ribosomes.


RNA

RNA (ribonucleic acid) has a structure very similar to that of DNA, but it usually exists as a single strand rather than a double helix. Its backbone is also different: while in DNA the backbone sugar is 2-deoxyribose, in RNA it is ribose.

RNA also has *almost* the same bases that DNA has, and those bases can pair up in the same way (for example, when an RNA strand folds back on itself or pairs with a DNA strand). The only difference is that in RNA, adenine binds to uracil instead of thymine. The T letter of the DNA alphabet therefore becomes U in the RNA alphabet.

Despite this difference in the alphabets, all RNA is actually encoded by DNA; that is, all RNA molecules receive information on what sequence to form from "parent" DNA molecules.
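In string terms, going from the DNA alphabet to the RNA alphabet is a one-character substitution. The sketch below assumes the DNA sequence is written the same way round as the RNA that is made from it, which glosses over the fact that the cell actually builds RNA by pairing against the opposite strand. The name `transcribe` is my own.

```python
def transcribe(dna):
    """Rewrite a DNA sequence in the RNA alphabet: every T becomes U."""
    return dna.upper().replace("T", "U")

print(transcribe("ATGGCT"))  # AUGGCU
```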

Although RNA molecules all follow the same general structure, they vary in their function. There are three main functional types of RNA molecules, all of which play different roles in the creation of protein:

1. Messenger RNA (mRNA) contains the information blueprint for protein. mRNA is created in the nucleus of a cell and takes on information from DNA. It then moves outside the nucleus of the cell and uses that information to specify amino acid sequences for proteins.

2. Ribosomal RNA (rRNA) is what ribosomes are mostly made out of. Ribosomes, recall, are the structures in cells that manufacture proteins.

3. Transfer RNA (tRNA) acts as a delivery mechanism. It carries amino acids to ribosomes so that the amino acids can be turned into proteins.
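The way mRNA "specifies" amino acids can be sketched as a table lookup. This goes slightly beyond what is described above: the mRNA is read three bases at a time, each triplet (called a codon) stands for one amino acid, and certain codons mean "stop". The table below is deliberately incomplete (the real one has 64 entries), and the names `CODON_TABLE`, `STOP_CODONS`, and `translate` are my own.

```python
# A small excerpt of the genetic code: codon -> one-letter amino acid code.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start signal
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "AAA": "K",  # lysine
    "UGG": "W",  # tryptophan
    "GCU": "A",  # alanine
}
STOP_CODONS = {"UAA", "UAG", "UGA"}

def translate(mrna):
    """Read mRNA three bases at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        protein.append(CODON_TABLE[codon])  # KeyError for codons not in the excerpt
    return "".join(protein)

print(translate("AUGGCUUGGUAA"))  # MAW
```

In the cell, the lookup is performed physically: each tRNA carries one amino acid and matches one codon, and the ribosome strings the delivered amino acids together.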


Genes

A gene is a segment of a DNA strand that carries the instructions for making a particular product, usually a protein, and thereby determines something about how a host organism will develop.

One molecule of DNA usually contains many genes. Human beings, for example, are estimated to have roughly 20,000 protein-coding genes.


Chromosomes

A chromosome, structurally, is simply a molecule of DNA combined with proteins. In living things whose cells have a nucleus, several chromosomes exist in the nucleus of each cell. The DNA contained in these chromosomes constitutes that living thing's genome. The genome of a living thing is its entire genetic code, from which you could (at least in theory) create an identical clone of that living thing.

A species of living thing can be characterized, in part, by how many chromosomes it has per cell. For example, human beings have 46 chromosomes in every cell of their body (except for sperm and egg cells, which contain only 23 chromosomes).

The Human Genome Project

The Human Genome Project (HGP) was, simply, a project to determine the sequence of the base pairs in the human genome and to identify the genes within it. Given the approximately 3 billion base pairs in the human genome, this was understandably an enormous project.

The HGP was formally considered to be completed in April 2003, with the researchers claiming to have sequenced the human genome with 99.99% accuracy, a bold claim for so great a task.

The information gained through the HGP has, in a remarkable act of public service and yet another example of the capabilities of the Internet, been made available free to the public through online databases such as GenBank. You can access GenBank from the homepage of the U.S. National Center for Biotechnology Information (NCBI).


Cloning

The term cloning is usually used today to refer to the process of creating a genetically identical organism (typically an animal or a human) from an existing one. However, on a similar but much smaller scale, "cloning" can also mean creating an identical copy of a gene or a cell. I'll focus on cloning genes and cells, since doing so is much more important to the field of molecular biology. (Cloning of whole animals or even humans is not as useful as you might think, because cloning animals isn't as simple as just letting them breed naturally, and the lives of humans are shaped more by their psychology than their DNA. Cloning a sheep was a neat trick, but now that it's been done, scientists are discovering that there isn't much point in doing so.)

To clone a gene, the sequence of DNA you want to clone (remember, a gene is just a particular sequence of a DNA strand) must first be isolated (separated from other material). This is done with a restriction enzyme.

Once you have the gene by itself, it is implanted into an existing cell. Then, as the cell multiplies, the gene will multiply along with it. This sounds simple, but there's an added complication: The cell will not copy a piece of DNA unless that DNA contains a sequence telling the cell to replicate it. Since the gene which you are trying to clone will probably not contain such a sequence, a vector is used. A vector, in gene cloning, is a piece of DNA which the host cell will replicate and to which you can attach a gene so that the gene is copied along with it. Typically, a small circular DNA molecule called a plasmid is used as a vector. A classic plasmid for gene cloning is called pBR322. It's relatively cheap and readily available, and it multiplies rapidly. In any case, the gene is combined with the vector using a ligase enzyme.

Once the gene you wish to clone has been implanted within the vector, the resultant DNA combination is implanted in a host cell, often a bacterium like E. coli, which, like pBR322, is cheap, simple, and ubiquitous. Getting the DNA into the cell isn't as scientific a process as you might think; the DNA is simply stirred into a mixture containing a mass of host cells in the hopes that the DNA will get into one of them. Although heating up the mixture or adding a salt like CaCl2 can make the cells more susceptible to absorbing the DNA, it's still really just a mixing process.

Once the cells have had some time to multiply, you'll have several copies of the gene, one in each cell that spawned from the original cell that absorbed the original gene. Since some non-affected cells will probably still remain in the mix, you'll have to screen for cells that contain the gene you want, but once you find them, you can simply extract the gene from their DNA using, again, a restriction enzyme.
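The hit-or-miss nature of this process, and the need for screening afterwards, can be illustrated with a toy simulation. Everything here is made up for illustration: the function `clone_gene`, the uptake probability, and the assumption that every cell doubles once per generation and always passes the vector on to both daughter cells.

```python
import random

def clone_gene(n_cells, uptake_probability, generations, seed=0):
    """Toy model: random uptake of the vector, then uniform doubling."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    # Mixing step: each host cell independently absorbs the vector or doesn't.
    with_gene = sum(rng.random() < uptake_probability for _ in range(n_cells))
    without_gene = n_cells - with_gene
    # Growth step: every cell doubles each generation.
    with_gene *= 2 ** generations
    without_gene *= 2 ** generations
    return with_gene, without_gene

carriers, others = clone_gene(n_cells=1000, uptake_probability=0.01, generations=10)
print(f"{carriers} cells carry the gene, {others} do not")
```

Even with a low uptake rate, a handful of successful cells turns into thousands of copies of the gene, but they remain a small minority of the culture, which is why the screening step matters.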

How to change your DNA

There is DNA in nearly every cell of your body. It is therefore quite a large task to try to change all of it. Nonetheless, it is possible, at least in principle, to change the DNA in many of your cells. The art of changing your DNA hinges on the use of retroviruses.

A retrovirus is a "virus" containing a genome made of RNA which can copy the RNA sequence into a DNA molecule. This is the reverse of what usually happens; normally, genetic information passes only from DNA to RNA, and not vice-versa, which is why such structures are called "retro"viruses--their direction of genetic information travel is the inverse of what is typical. Retroviruses rely on an enzyme called reverse transcriptase to accomplish this feat.
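In terms of the alphabets described earlier, reverse transcription can be sketched as building the DNA strand that pairs with a given RNA strand. This is only an illustration: the names `RNA_TO_DNA` and `reverse_transcribe` are my own, and the sketch ignores the direction in which strands are read.

```python
# Pairing of each RNA base with its DNA partner: U pairs with A, just as T would.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def reverse_transcribe(rna):
    """Return the DNA strand that pairs, base by base, with `rna`."""
    return "".join(RNA_TO_DNA[base] for base in rna.upper())

print(reverse_transcribe("AUGC"))  # TACG
```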

Once the retrovirus' DNA copy is inserted into a host cell's DNA, the cell starts producing new copies of the virus, which go on to infect other cells, and so the new gene sequence spreads through the host. This sounds scary, and indeed, retroviruses can potentially be extremely damaging; HIV, the virus which causes AIDS, is a retrovirus. It works by inserting its genes into the DNA of the immune cells it infects.

However, there is no reason why a retrovirus has to be harmful; it is merely a vehicle for delivering new genes into cells. This could just as well produce beneficial effects by "fixing" unwanted genes or adding desired genes, which is the idea behind using modified retroviruses in gene therapy.
