I am not a DNA expert, so I am a little confused about the 35,000 number. What is it exactly? Here are the basics as I understand them.
The Basics
Cells are the fundamental working units of every living system. All the instructions needed to direct their activities are contained within the chemical DNA (deoxyribonucleic acid).
DNA from all organisms is made up of the same chemical and physical components. The DNA sequence is the particular side-by-side arrangement of bases along the DNA strand (e.g., ATTCCGGA). This order spells out the exact instructions required to create a particular organism with its own unique traits.
The genome is an organism’s complete set of DNA. Genomes vary widely in size: the smallest known genome for a free-living organism (a bacterium) contains about 600,000 DNA base pairs, while human and mouse genomes have some 3 billion. Except for mature red blood cells, all human cells contain a complete genome.
DNA in the human genome is arranged into 24 distinct chromosomes--physically separate molecules that range in length from about 50 million to 250 million base pairs. A few types of major chromosomal abnormalities, including missing or extra copies or gross breaks and rejoinings (translocations), can be detected by microscopic examination. Most changes in DNA, however, are more subtle and require a closer analysis of the DNA molecule to find perhaps single-base differences.
Each chromosome contains many genes, the basic physical and functional units of heredity. Genes are specific sequences of bases that encode instructions on how to make proteins. Genes comprise only about 2% of the human genome; the remainder consists of noncoding regions, whose functions may include providing chromosomal structural integrity and regulating where, when, and in what quantity proteins are made. The human genome is estimated to contain 30,000 to 40,000 genes.
Although genes get a lot of attention, it’s the proteins that perform most life functions and even make up the majority of cellular structures. Proteins are large, complex molecules made up of smaller subunits called amino acids. Chemical properties that distinguish the 20 different amino acids cause the protein chains to fold up into specific three-dimensional structures that define their particular functions in the cell.
The constellation of all proteins in a cell is called its proteome. Unlike the relatively unchanging genome, the dynamic proteome changes from minute to minute in response to tens of thousands of intra- and extracellular environmental signals. A protein’s chemistry and behavior are specified by the gene sequence and by the number and identities of other proteins made in the same cell at the same time and with which it associates and reacts. Studies to explore protein structure and activities, known as proteomics, will be the focus of much research for decades to come and will help elucidate the molecular basis of health and disease.
So it occurs to me that we have to understand all of the DNA base pairs, not just the genes, to create a life form. Is that correct? And is knowing just the 35,000 genes not enough, any more than having a jet engine without the wings, tail, or body is enough to fly? Did I miss something here?
Some other stuff:
By the Numbers
The human genome contains 3164.7 million chemical nucleotide bases (A, C, T, and G).
The average gene consists of 3000 bases, but sizes vary greatly, with the largest known human gene being dystrophin at 2.4 million bases.
The total number of genes is estimated at 30,000 to 40,000, much lower than previous estimates of 80,000 to 140,000 that had been based on extrapolations from gene-rich areas as opposed to a composite of gene-rich and gene-poor areas.
The order of almost all (99.9%) nucleotide bases is exactly the same in all people.
The functions are unknown for more than 50% of discovered genes.
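To put these numbers side by side, here is a rough back-of-the-envelope check in Python. It uses only the figures quoted in this post; the variable and function names are my own, and I am assuming the "average gene" size can simply be multiplied by the gene count, which may not match how the source defines gene length. Treat it as a sketch, not an authoritative calculation.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in this post.
# All names below are illustrative, not from any genomics library.
GENOME_BASES = 3164.7e6        # total nucleotide bases in the human genome
AVG_GENE_BASES = 3000          # quoted average gene size
GENE_COUNT_RANGE = (30_000, 40_000)
CODING_FRACTION = 0.02         # "about 2%" encodes protein instructions
SHARED_FRACTION = 0.999        # base order shared by all people


def gene_span_fraction(gene_count):
    """Fraction of the genome covered if every gene had the average size."""
    return gene_count * AVG_GENE_BASES / GENOME_BASES


# Bases that directly encode proteins, per the 2% figure.
coding_bases = CODING_FRACTION * GENOME_BASES

# Bases that can differ between two people, per the 99.9% figure.
differing_bases = (1 - SHARED_FRACTION) * GENOME_BASES

if __name__ == "__main__":
    for count in GENE_COUNT_RANGE:
        print(f"{count} genes cover {gene_span_fraction(count):.1%} of the genome")
    print(f"protein-coding bases: about {coding_bases / 1e6:.0f} million")
    print(f"bases that vary between people: about {differing_bases / 1e6:.1f} million")
```

On these assumptions the genes themselves cover only roughly 3 to 4 percent of the bases, which is the same order of magnitude as the quoted 2%. Either way, well over 90% of the sequence lies outside the 30,000 to 40,000 genes, which is what prompted my question.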
The Wheat from the Chaff
About 2% of the genome encodes instructions for the synthesis of proteins.
Repeated sequences that do not code for proteins (junk DNA) make up at least 50% of the human genome.
Repetitive sequences are thought to have no direct functions, but they shed light on chromosome structure and dynamics. Over time, these repeats reshape the genome by rearranging it, thereby creating entirely new genes or modifying and reshuffling existing genes.
During the past 50 million years, a dramatic decrease seems to have occurred in the rate of accumulation of these repeats.
How It's Arranged
The human genome’s gene-dense “urban centers” are composed predominantly of the DNA building blocks G and C.
In contrast, the gene-poor “deserts” are rich in the DNA building blocks A and T. GC- and AT-rich regions usually can be seen through a microscope as light and dark bands on the chromosomes.
Genes appear to be concentrated in random areas along the genome, with vast expanses of noncoding DNA between.
Stretches of up to 30,000 C and G bases repeating over and over often occur adjacent to gene-rich areas, forming a barrier between the genes and the “junk DNA.” These CpG islands are believed to help regulate gene activity.
Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest (231).
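The GC-rich versus AT-rich distinction above is easy to measure on a sequence. Below is a minimal Python sketch of computing GC content overall and in fixed-size windows. The function names and the toy sequence are made up for illustration; real analyses use much larger windows over real chromosome sequence, and nothing here comes from the source text beyond the example string ATTCCGGA.

```python
def gc_content(seq):
    """Return the fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def windowed_gc(seq, window, step):
    """GC fraction for each full window of length `window`, advancing by `step`."""
    return [
        gc_content(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # The example sequence used earlier in this post.
    print(gc_content("ATTCCGGA"))

    # Toy sequence (hypothetical): a GC-only stretch followed by an AT-only stretch.
    toy = "GCGCGCGC" + "ATATATAT"
    print(windowed_gc(toy, window=8, step=8))
```

Scanning a chromosome this way and plotting the per-window values would show the GC-rich "urban centers" as peaks and the AT-rich "deserts" as troughs.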
How the Human Genome Compares with Those of Other Organisms
Unlike the human’s seemingly random distribution of gene-rich areas, many other organisms’ genomes are more uniform, with genes evenly spaced throughout.
Humans have on average three times as many kinds of proteins as the fly or worm because of mRNA transcript “alternative splicing” and chemical modifications to the proteins. This process can yield different protein products from the same gene.
Humans share most of the same protein families with worms, flies, and plants, but the number of gene family members has expanded in humans, especially in proteins involved in development and immunity.
The human genome has a much greater portion (50%) of repeat sequences than the mustard weed (11%), the worm (7%), and the fly (3%).
Although humans appear to have stopped accumulating repetitive DNA over 50 million years ago, there seems to be no such decline in rodents. This may account for some of the fundamental differences between hominids and rodents, although estimates of gene numbers are similar in both species. Scientists have proposed many theories to explain evolutionary contrasts between humans and other organisms, including life span, litter sizes, inbreeding, and genetic drift.
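The point above about alternative splicing yielding different protein products from the same gene can be illustrated with a toy model. This Python sketch assumes a hypothetical gene whose exons are either always included or optionally skipped, and enumerates every resulting transcript. The exon names and the "any optional exon can be skipped independently" rule are my simplifying assumptions; real splicing is regulated and far more constrained.

```python
from itertools import combinations


def splice_isoforms(exons, optional):
    """Enumerate transcripts of a toy gene.

    exons    -- ordered list of exon names
    optional -- set of exon names that may be skipped; all others are always kept
    """
    optional_in_order = [exon for exon in exons if exon in optional]
    isoforms = []
    # Skip 0, 1, 2, ... of the optional exons, in every combination.
    for skip_count in range(len(optional_in_order) + 1):
        for skipped in combinations(optional_in_order, skip_count):
            isoforms.append(tuple(exon for exon in exons if exon not in skipped))
    return isoforms


if __name__ == "__main__":
    # Hypothetical four-exon gene in which E2 and E3 are optional.
    toy_gene = ["E1", "E2", "E3", "E4"]
    for isoform in splice_isoforms(toy_gene, {"E2", "E3"}):
        print("-".join(isoform))
```

In this toy model, k optional exons give 2^k possible transcripts from a single gene, which is one way a modest gene count can still produce a much larger set of proteins.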
Variations and Mutations
Scientists have identified about 1.4 million locations where single-base DNA differences (single nucleotide polymorphisms, or SNPs) occur in humans. This information promises to revolutionize the processes of finding chromosomal locations for disease-associated sequences and tracing human history.
The ratio of germline (sperm or egg cell) mutations is 2:1 in males vs females. Researchers point to several reasons for the higher mutation rate in the male germline, including the greater number of cell divisions required for sperm formation than for eggs.
Applications, Future Challenges
Deriving meaningful knowledge from the DNA sequence will define research through the coming decades to inform our understanding of biological systems. This enormous task will require the expertise and creativity of tens of thousands of scientists from varied disciplines in both the public and private sectors worldwide.
The draft sequence already is having an impact on finding genes associated with disease. Genes have been pinpointed and associated with numerous diseases and disorders including breast cancer, muscle disease, deafness, and blindness. Additionally, finding the DNA sequences underlying such common diseases as cardiovascular disease, diabetes, arthritis, and cancers is being aided by the human SNP maps generated in the HGP in cooperation with the private sector. These genes and SNPs provide focused targets for the development of effective new therapies.
One of the greatest impacts of having the sequence may well be in enabling an entirely new approach to biological research. In the past, researchers studied one or a few genes at a time. With whole-genome sequences and new automated, high-throughput technologies, they can approach questions systematically and on a grand scale. They can study all the genes in a genome, for example, or all the gene products in a particular tissue or organ or tumor, or how tens of thousands of genes and proteins work together in interconnected networks to orchestrate the chemistry of life.