Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. However, before we focus on methods to build knowledge on biological processes, it is important to ask ourselves the question what life is.
When thinking about science it is crucial, as Bohm (1980) nicely formulates, to acknowledge that scientific theories are not “true knowledge” corresponding to “the reality as is”, but rather ever-changing insights that are giving shape and form to how we view and experience the world.
Timelapse of a growing sunflower (Source: https://www.youtube.com/watch?v=_owPRdbaMdY)
Remarkably, each plant cell contains all information from the seed, i.e. all its DNA as well as its cellular structure. However, as Bohm (1980) points out the seed contains little to nothing from the actual biomass of the plant that it has developed. The seed merely contains the information that is required to transform its environment to grow the plant from solar energy and the chaos of simple molecules CO\(_2\), nitrogen, phosphorous, etc. The plant will eventually produce new seeds, which will spread its information further throughout the environment and will transform it over and over again to grow plants. It thus becomes obvious that we do not need to make the distinction between “inanimated” and “animated” matter. Indeed, if we would stick to this old Cartesian way of thinking we are confronted with confusing questions like when would a so-called “inanimated” carbon atom become animated? As soon as it enters the leaf stomata or when it is assimilated into an organic compound? Does the carbon atom becomes inanimated again when it is released from the plant as it burns the sugars it has assimilated into CO\(_2\) during the night?
Indeed, we can consider that the environment is augmented with the information of the seed, which somehow seems to direct its environment to grow a corresponding plant. So life itself can be regarded as belonging to a totality including plant and the environment. So Bohm (1980) concludes that “we do not need to fragment the whole into life and inanimated matter, nor do we have to try to reduce life completely to nothing but an outcome of the latter”.
The remaining of this chapter builds on the work of five scientists:
In his seminal lecture series: “What Is Life? The Physical Aspect of the Living Cell” Schrödinger (1944) defined life as
an open system that can generate order from chaos by exploiting external energy sources,
with the capacity to transmit its own specific blueprint from generation to generation.
Note, that in his seminal lecture series Schrödinger (1944) also laid out key functions of the molecule that was involved in this specific blueprint and this while DNA was not even discovered.
At first, the property that life can generate order or structure seems to contradict the second law of thermodynamics that states that: a system is always gearing towards maximal entropy.
Entropy can be loosely defined as a physical quantity for the level of how energy is spread out. It is thus the tendency of a system to evolve from a concentrated energy state to an energy state that is more spread out. This can be easily understood from a simple example known to each of us: if you put a hot pot in a large room, the pot will cool down and the temperature in the room will slightly rise until the pot and the room have the same temperature. As a result the concentrated heat energy from the pot is nicely spread out over the entire room.
Schrödinger understood that life was not violating this second law. Indeed, by being an open system, it can interact with the environment, and by eating and breathing there has to be a way of “concentrating order” or maintaining a higher more concentrated energy state.
Prigogine was the first one who provided the theoretical framework for a type of chemistry that is key for life (Prigogine and Stengers 1984). He realized that the chemistry of life is totally different from most chemical systems that were studied up to then. Indeed, the chemistry and processes of life are highly non-linear with many feedback loops and are operated far from equilibrium by continuously pushing matter and energy through it.
We intuitively know the latter: our body is typically at a higher energy state than the corresponding material in a non-living state. We can simply acknowledge that as we are hotter than the room, and a crude way to estimate how long we are death is by measuring the temperature of a corpse. To maintain our higher energy state, to concentrate chemicals in our cells, and, to build complex molecules, cells and tissues from it, we have to keep on eating and breathing.
Prigogine discovered that complex self-organising systems can spontaneously arise if they are open and can exchange lots of energy and matter with their surroundings. Key for it is a chaos of matter and a flow of energy through the system.
Convection currents are for instance an example of a simple physical process generating massive structures. They can be observed when simply heating a shallow layer of fluid or gas from below, which establishes a temperature gradient that lets the fluids rest state to become unstable. Indeed, the warmer fluid at the bottom is lighter and starts to float to the top where it cools down again, whereas the colder fluid at the top is more heavy and sinks to the bottom, where it is heated. Large ensembles of molecules therefore start to exhibit coherent patterns of motion (see Figure @ref(fig:convectionCells)).
The convection motion leads to a spontaneous complex spatial organisation, almost a dance, of billions of molecules moving coherently and forming hexagonal convection cells. Such convection cells can be observed in a small dish of a fluid in a lab, up to the level of the earth’s atmosphere and the solar photosphere (see movie below).
Movie of the solar photosphere observed with the Swedish 1-m Solar Telescope (SST) on La Palma, Spain. The movie shows solar granulation which is a result of convective motions of bubbles of hot gas that rise from the solar interior. When these bubbles reach the surface, the gas cools and flows down again in the darker lanes between the bright cells. In these so-called intergranular lanes, we can also see small bright points and more extended bright elongated structures. These are regions with strong magnetic fields. (Source: Wikipedia, user Luc.rouppe, link: https://en.wikipedia.org/wiki/File:Granulation_Quiet_Sun_SST_25May2017.webm)
Prigogine showed that spontaneous self-organisation also occurs in chemical systems that are operated far from equilibrium by continuously pushing matter and energy through it. These self-organising processes are largely unpredictable and spontaneously arise in our universe. Prigogine and Stengers (1984) further argue that such chemical and physical processes are the basis of the complex self-organisation that we observe in living systems.
In chemical systems the influx of energy allows the generation of structure while producing lots of entropy by dissipation. Dissipation means that energy has been converted from one energy form to another and that it can no longer be converted back in its original form. So dissipation is irreversible, which adds the arrow of time to life. Indeed conversion of energy to heat inherently takes place in all underlying chemical reactions and these cannot be reversed without the addition of new energy. Hence, a large part of the concentrated energy that enters cells through sunlight or chemical energy in food, is spread out in a less concentrated energy form: heat.
The structure that spontaneously emerges in living organisms is thus not violating the second law because of the rise in entropy by the dissipation of heat. Prigogine therefore coined the novel term dissipative structures for such systems.
Dissipative structures typically have multiple attractors, i.e. states, regimes, forms, shapes or structures to which they spontaneously evolve. The specific attractor to which the system is organising itself highly depends on the initial environmental conditions. Note, that dissipative structures are also characterized by feedback loops. These feedback loops allow the system to remain at its current attractor upon changes in the environment, e.g. a kind of homeostasis. However, some environmental stimuli are amplified by the feedback loops and can switch the dissipative structure towards another attractor and thus induce a regime switch.
Prigogine argued that chemistry in cells, cells itself, tissues, organs, organisms, populations of organisms, ecosystems and our globe can be seen as dissipative structures (Prigogine and Stengers 1984). The fruiting body of slime molds are a compelling example of a living system that often changes from attractor, see Figure @ref(fig:Dictyostelium). Dictyostelium slime molds, spend most of their lives as separate single-celled amoeba. But, upon stress one of the amoebas releases a chemical signal cAMP. Others detect this signal, and respond in two ways: the amoeba moves towards the signal and secretes more cAMP effectively boosting the signal. So the signal gets amplified by a feedback loop which eventually triggers the system to switch from their current attractor - a state of free living amoebas - towards a novel attractor - their fruiting body - that will release spores and will spread the slime molds towards novel environments (see e.g. youtube movie https://youtu.be/bkVhLJLG7ug).
Our whole globe can be seen as a large dissipative structure located in our solar system, which is a dissipative zone.
There is an energy source, the sun that is radiating concentrated energy in photons.
Life forms are organising structure from the chaos of molecules on Earth by dissipating energy from the solar photons, light and UV, to heat through their organic pigments, e.g. chlorophyll.
Animals, bacteria and fungi are secondary dissipative structures that are feeding on the concentrated chemical energy in the form of sugar, starch, proteins and fats that are produced by plants and cyanobacteria. They again dissipate energy in the form of heat through respiration.
The heat produced by life is dissipated to water and air and this induces tertiary dissipative processes like the water cycle, wind and sea currents etc.
Eventually heat is radiated to space, which acts as an energy sink.
Because the temperature of Earth remains roughly stable, more or less a similar amount of energy is radiated to space under the form of heat as what comes in under the form of photons from the sun. As the energy content of heat is lower and more disperse it has a higher entropy than that of the incoming sunlight. Because almost no mass is exchanged between Earth and space, mass has to be recycled and life is inherently cyclic.
In his Book “Life Evolving - Molecules, Mind and Meaning”, De Duve (2002) gave a very simple but brilliant definition of life:
Life is
In the following sections we will explore what de Duve meant with each component in his definition.
Life is one because all living organisms on earth
“Life is One” because all organisms are composed of cells. Green algae that can perform photosynthesis are a beautiful example of unicellular organisms. They were key for the development of life on our planet by releasing oxygen in our atmosphere (Figure @ref(fig:greenAlgae)).
A vital component of a cell is its membrane (the outer layer of the cell) that is separating them from their environment while enabling them to interact with it and to concentrate chemicals inside the cell.
Multicellular organisms are composed of multiple cells. The cells are organised in tissues, e.g. spongy tissue in our bones or epithelium cells of our stomach; organs, e.g. bones or our stomach; organ systems, e.g. skeleton or digestive system; upto the organism. Figure @ref(fig:multiCellular) also shows how organisms are further organised in populations of organisms, ecosystems, and eventually our entire Biosphere. So “Life is One” because all living organisms consist of cells that are organised in large networks that work together.
“Life is One” because all species evolved from the same ancestral population of cells. This is also referred to as the Last Universal Common Ancestor (LUCA). It is nicely indicated by the tree of life in Figure @ref(fig:treeOfLife), which is one of the most important organising principles in biology. It shows the evolutionary relationships among different organisms and that all living beings eventually can be traced back to LUCA who is located at the root of the tree. Note, that the animal kingdom to which we belong is only a small branch in the tree.
“Life is One” because all living organisms use the same “energy coin”, the ATP-ADP system, to store and reuse energy. Adinosine-triphosphate (ATP) consists of a ribose sugar with 3 phosphate groups and a base adinine. Splitting a phosphate group from adinosine-triphosphate (ATP) results in adinosine-diphosphate (ADP), a free phosphate group and energy. The other way around, energy can be stored as chemical energy by binding a phosphate group to ADP (see Figure @ref(fig:atp-adp)).
Note, that ATP is also used to build RNA, an important molecule involved in storing, passing and expressing our genetic information. Indeed, ATP is incorporated in RNA upon splitting two phosphate groups. The resulting AMP (adinosine monophosphate) is one of the building blocks of RNA. See Section @ref(sectionNucleicAcids) for more details on RNA. So there is a close link between energy and genetic information!
(ref:atp-adp-cap) Our energy coin ATP-ADP. Adinosine-triphosphate (ATP) consists of a base adinine, a ribose sugar and 3 phosphate groups. Splitting a phosphate group from adinosine-triphosphate (ATP) by a reaction with water (H\(_2\)O) results in adinosine-diphosphate (ADP), a free phosphate group and releases energy. The other way around, energy can be stored as chemical energy by binding a phosphate group to ADP (Source: Adapted from Wikipedia)
###3 Building Blocks of Life
“Life is One” because all living organisms are composed of the same basic bio-molecules and we all know more than we think about them because we are what we eat!
Almost all molecules of living organisms are composed of
Lipids, fats and oils, are used for storage. However, an important class of lipids, the phospholipids, make up membranes of cells and unicellular compartments called organelles see Figure @ref(fig:lipids). Indeed phospholipids have a polar head that likes to be in water and a long apolar tail that does not mix with water. Therefore bilayers of phospholipid molecules are spontaneously formed in aqueous solutions.
Membranes provide the basis for concentrating specific molecules in (compartments of) living cells. This enable cells to build up a disequilibrium of chemical molecules that can perform work as a concentration gradient spontaneously dissipates toward its equilibrium value. So phospholipids are key for creating the boundary conditions necessary for cellular organization.
Carbohydrates are important bio-molecules that are used for storage, energy and structure (Figure @ref(fig:carbohydrates)).
Amino acids by themselves are simple molecules (Figure @ref(fig:aminoAcids)). Although hundreds of amino acids exist in nature, life is using only 20 of them and it is combining them in long molecules, polymers, which are referred to as proteins. Proteins are hetero-polymers, consisting of the 20 amino acids that are arranged in long sequences that differ from protein to protein. So unlike starch, that only consists of glucose molecules that are all identical, proteins are capable of carrying information.
(ref:aminoAcidsCap) Amino acids are simple molecules. They can be combined in long molecules, polymers, also known as proteins. Their long chain of amino acids spontaneously folds into a complex 3D structure from which their biological function emerges (Source: thebiologynotes.com)
The crux of proteins is that their long chain of amino acids spontaneously folds into a complex 3D structure, which is determined by the properties and the specific sequence of its amino acids. From their complex and specific 3D structure their biological function emerges.
Indeed, many proteins work as a “lock” in which specific (bio)molecules fit as a “key”. This enables proteins to bring molecules close together and to promote chemical reactions without being consumed. This process is also referred to as catalysis and proteins that perform catalysis are called enzymes, see Figure @ref(fig:enzyme).
Proteins are the main workhorses of the cell and they are important for moving food, digesting food, copying DNA, for giving the cell structure, affecting the rates at which other proteins work, etc…
Nucleic acids are build of nucleotides. Each nucleotide is composed of one of four nitrogen-containing nucleobases, which carry the information
a phosphate group and a sugar, i.e. ribose in ribonucleic acid (RNA, Figure @ref(fig:RNA)) and deoxyribose in deoxyribonucleic acid (DNA, Figure @ref(fig:DNA)).
These nucleotides are the building blocks that are combined in long polymers: RNA and DNA. DNA and RNA are hetero-polymers and the nucleotides conceptually can be combined in any order. So they can contain information. Indeed, they are key to store and use the genetic information we inherit from our parents.
DNA typically occurs as a double strand, where C and G, and, A and T hybridize to each other using respectively three and two hydrogen bridges to form the iconic double stranded helix structure, see Figure @ref(fig:DNA).
There is a general consensus that life probably began by using RNA as information carrier. DNA, however, is much more stable and lasts longer in water than RNA. Therefore, DNA probably emerged later and is now used to store the genetic information in most organisms.
Life is also one because we share the same genetic code, but we refer to Section @ref(lifeInformation) “Life is Information” for more details.
“Life is Chemistry” because a cell consists of a complex network of chemical reactions that are connected to each other. Many feedback loops exist in this chemistry making it largely nonlinear. The feedback loops enables a cell to maintain in its regime or to switch from regime or attractor upon external and/or internal stimuli. An overview of most reactions in a living cell is given in Figure @ref(fig:chemicalReactionsCell). You can zoom in on this map on http://biochemical-pathways.com/#/map/1.
In the remainder of this section we will introduce energy and catalysis in more detail, and we also give an intriguing example of how proteins play a role in the self-organisation of a cell.
“Life is Chemistry” because it uses chemical reactions to store and use energy. See the ATP-ADP system in Section @ref(sectionEnergyCoin)
“Life is Chemistry” because most chemical reactions are catalyzed, i.e. initiated, promoted and made faster by proteins. The proteins facilitate the reaction without being used. The reactions would never take place if we would only mix the molecules at the concentrations that are typically occurring in a cell.
A catalyst is a chemical substance that helps a reaction to take place without being consumed. Proteins that are catalysts are also referred to as enzymes.
Enzymes are proteins that
Loosely speaking they are
These binding sites emerged from to the unique 3D structure of the protein. In Figure @ref(fig:enzyme) a schematic overview is given of the enzyme action of a protein.
In most cases enzymes work together in pathways, which consist of multiple chemical reactions for which (part of the) molecules produced in the previous reaction are used by another enzyme to facilitate the next reaction.
The Krebs cycle is a well known example. This pathway is the main source of energy for a cell through metabolising carbohydrates, lipids and/or proteins. The Krebs cycle is a cyclic pathway of multiple chemical reactions each catalyzed by another enzyme (see Figure @ref(fig:krebsCycle) or the you tube movie https://www.youtube.com/embed/yk14dOOvwMk).
The Krebs cycle is also used to generate building blocks for constructing certain nucleotides and amino acids.
We can end this section with a quote of De Duve (2002): “Any living organism is a reflection of its enzyme arsenal”.
Life is also characterized by its ability for self-organisation. Some proteins are also important to give structure to a cell and they can spontaneously form structure. An intriguing illustration of self-organisation is provided by Cheng and Ferrell (2019) who homogenized Xenopus laevis egg cell cytoplasma extracts, i.e. the liquid with structures from the cell, and showed that the homogeneous extract spontaneously reorganized itself in cell like structures in a matter of minutes (see Figure @ref(fig:selforganisation) and you tube movie https://www.youtube.com/embed/prq1Occu22s).
(ref:selforganisationCaption) Homogenized Xenopus laevis egg cytoplasmic extracts spontaneously organized into cell-like compartments (Cheng and Ferrell 2019)
Cheng and Ferrell (2019) found that ATP, the energy currency of a cell; microtubuli, a kind of filamentous proteins; and dynein, a kind of motor protein, were required and driving this process of self-organisation.
Note, that proteins thus play a central role in life. They are key for catalysis and giving structure. Hence, a cell thus not only inherits genetic information, but, also its spatial organisation from a mother cell!
Our genetic information to build all our bio-molecules, which we inherit from our parents, is stored in our DNA. DNA is a hetero-polymer or a long chain of 4 different building blocks, nucleotides. So the genetic information is stored in an alphabet of 4 letters. This is adenine (A), cytosine (C), guanine (G) and thymine (T) for DNA.
It makes no physical or chemical difference what the identity of the next nucleotide is in the DNA chain. So our DNA can be seen as a coding system to store and pass genetic information from one generation to the next. Life is therefore also information and not only chemistry. It is absolutely mind blowing how variation in the sequence of this “four letter code” can lead to such a wealth of distinct characteristics and functions that can be observed between different specimens of the same species, and, even more so between different life forms.
In the reminder of this section we will focus on the flow of information in a cell in more detail.
The “central paradigm” of molecular biology states that the sequence of nucleotides in DNA are first transcribed into RNA and then translated into proteins, see Figure @ref(fig:centralParadigm).
Note, that some RNA molecules are also end products. Indeed RNA can also have a catalytic function, i.e. initiate and promote chemical reactions without being consumed.
A gene is the unit of genetic material, a DNA sequence that is encoding for the synthesis of a gene product, either a protein or a functional RNA.
Figure @ref(fig:transcriptionTranslation) shows the process of transcription of a gene from DNA to RNA and the translation of RNA to proteins.
The transcription of DNA to RNA is initiated by opening the DNA double strand. Then a complementary RNA strand is synthesized by exploiting that A hybridizes to U (or T) and G to C through hydrogen bounds.
Once that the complementary RNA strand is made, it is further processed in the nucleus into messenger RNA (mRNA). The mRNA travels from the cell nucleus to the cell cytosol (cell liquid) where it is translated into proteins, and, where the majority of the reactions take place.
There mRNA is translated into proteins by ribosomes. In the ribosomes the mRNA is bound to transfer RNA (tRNA). tRNA can bind to the mRNA molecule if it has 3 consecutive nucleotides that are complementary to the triplet of the 3 consecutive nucleotides on the mRNA template that is lying in the ribosome.
The transfer RNA transports one specific amino-acid that is then incorporated in the protein that is being formed, which is the growing amino acid chain in Figure @ref(fig:transcriptionTranslation). The sequence of 3 consecutive nucleotides of DNA of a gene is therefore also called a codon because it encodes for a specific amino acid.
Upon incorporation, the ribosome shifts to the next triplet of the mRNA and the next transfer RNA is bound to it, and so on until a stop codon is reached and the protein is finished.
(ref:transcriptionTranslationCaption) In the cell nucleus the DNA strand opens up and is transcribed into an RNA molecule. Upon processing, the messenger RNA (mRNA) travels from the nucleus to the cytosol where it is translated into proteins. The mRNA is the template that fits into a ribosome that has the function to bind transfer RNA molecules to the mRNA template. It does that by hybridising to a tRNA, which has a triplet of three nucleotides that are complementary to those of the mRNA template. The tRNAs have a amino acid on their tail, which is incorporated into a long chain of amino acids, the protein that is produced. If the amino acid is incorporated, the ribosome moves to the next triplet and the process happens all over with a new tRNA (Source: tokresources.org)
There are in total 64=4\(^3\) codons encoding for each of the 20 amino acids, and, for a start and a stop codon to initiate and stop the protein translation, respectively. Hence, there is a redundancy in the code, see @ref(fig:codonTable).
We can learn an important message from the codon table:
There is no chemical necessity that explicitly connects the three nucleotides “CGG” to the amino acid arginine rather than to glutamine.
Nucleotides themselves do not seem to a have chemical connection to the amino acids they encode
Therefore we call it a code or information rather than just “genetic chemistry”
The code apparently evolved so that many mutations give rise to
so that protein function is conserved.
DNA is thus the carrier of genetic information. However, RNA plays a more central role:
Capra and Luisi (2014) write in their book “The Systems View of Life”, that the “term autopoeisis was coined by Varela and Maturana in the 1970s. Auto, of course, means self and refers to the autonomy of self-organising systems; and poiesis (which shares the same Greek root as the word poetry) means making. So, autopoiesis means self-making”.
They continue stating that “the main characteristic of life is self-maintenance due to internal networking of a chemical system that continuously reproduces itself within the boundary of its own making”.
Indeed, a living cell is the smallest autopoietic unit. The cell consists of a complex internal network of chemical reactions and this within the boundary of its outer membrane. It can organize itself, maintain itself, can make copies of itself and this with the enzymatic machinery that is present in the cell itself.
Networks of cells are then combined in larger autopoietic units, e.g.
An organism or a biological system can therefore not be broken down or reduced to its parts to provide a complete understanding of it. A complete understanding is only possible when it is viewed as a whole. Indeed, new functions emerge at a higher level from the network that has been formed between its lower-level parts. Varela refers to these properties with the term emergent properties.
Emergent properties can be found at each level. At the chemical level, where the protein hemoglobin for instance has the unique property that it can transport oxygen. This property emerges from its unique 3D structure (see Figure @ref(fig:hemoglobin)). We could never have derived its biological function at the protein-level by breaking it down to its constituting amino acids and only studying the individual properties of each of these amino acids (see section @ref(sectionAminoAcids) for more details on amino acids).
So hemoglobins’ property to transport oxygen naturally emerges at the protein level upon combining many simple amino acid molecules in the appropriate sequence. Indeed, this specific long chain of amino acid spontaneously folds into a unique 3D structure that gives the hemoglobin protein its unique function.
Lichens are another compelling example of emergent properties (Figure @ref(fig:lichen)). Lichens are true colonists. They were among the first organisms that colonized the earth surface. Indeed, when glaciers pull back lichens appear on the bare hostile rocks and in the course of thousands of years they can convert these rocks into soil, an environment that is fertile for other species. However, their unique and important properties for life on earth could never have been expected when only studying a lichen’s components.
Although lichens macroscopically look like one being, they are effectively two beings: a green algae with the precious gift of photosynthesis that turns sunlight and air into sugars, and, a fungus that provides shelter and is gifted with the art to mine the rocks for minerals, but which cannot make sugars. They are joined in a symbiosis that is so close that their union becomes an entirely new organism. When biologists tried to unravel this unique algal-fungal marriage, they started to grow them in the lab in ideal conditions for both algae and fungus. However, under these conditions they both happily lived next to each other. So the unique properties of lichens could not be delineated from studying both species apart. Only when the scientists exposed them to ecofactors that became so harsh, they teamed up in reciprocity and gained their unique transformative power of turning rocks into fertile environments (Kimmerer 2013).
Capra and Luisi (2014) convincingly argue that the same holds at any level and for life itself. Indeed the autopoietic property of a cell originates by properly combining all its molecules in the large networks of chemical reactions from which its structure and organisation, self maintenance, and self replication emerge. We cannot localize life. Life is simply a global property that emerges from the collective interactions of all kind of molecules in a cell, of the collection of cells in a tissue, of the collection of tissues in a organ system, of the collection of organ systems for an organism, of the collection of organisms in a population etc. At each level we have a network of lower level components from which new properties naturally emerge.
So life itself is an emergent property, it is not present in the parts it originates from, and only emerges when its parts are assembled together correctly.
For Maturana and Varela cognition is also inseparable from autopoiesis (Capra and Luisi 2014). Each living organism is an open system that interacts with its environment through its sensory arsenal. It senses its environment, it feeds on its environment, releases products in it, it changes its environment, and, actualizes itself to its environment. The fact that an organism is changing its environment is often overlooked. However, life has dramatically transformed Earth. Indeed, think of photosynthesis for instance that radically changed Earth by the release of the highly reactive molecule oxygen.
An organism at any time in its development is a record of its previous interactions with its environment and that determines its future interactions. Hence, cognition is a natural product of its evolution.
Varela sees life as a gestalt of three domains (Capra and Luisi 2014):
Life is the synergy of these three domains.
An overview of the evolution of our universe is given in Figure @ref(fig:evolutionUniverse).
The most common theory is that our cosmos started with the Big Bang. A very energy-rich state.
By expanding the universe quickly cooled down enough for energy to be converted into mass, i.e. the majority in hydrogen and a fraction in more heavy helium nuclei.
Under the influence of the gravitational force, matter then started to cluster in nebula, gaseous clouds essentially consisting of hydrogen (H) and helium (He).
Due to concentration difference in these first nebula, these clouds further contracted and eventually imploded (See Figure @ref(fig:genesisStar)).
Extreme heating in the nebula that imploded by gravity gave rise to a condition in which all matter was in the form of a super hot plasma. In such a plasma nuclear fusion spontaneously takes place. In this process light atoms are combined into more heavy atoms and much energy is released.
First all hydrogen atoms are converted into helium (see Figure @ref(fig:nuclearFusion)). The mass of the resulting Helium nuclei is slightly lower than that of the original helium nuclei and the mass difference has been converted again into energy.
If all hydrogen in a star is used, the fusion stops because the activation energy to convert helium to lithium is too high (Figure @ref(fig:fusionEnergy)).
Therefore, the star cools down, implodes and subsequently explodes into a supernova (see Figure @ref(fig:supernova)). During the supernova nuclear fusion further occurs towards more heavy elements up to iron (Figure @ref(fig:fusionEnergy)) and these elements are projected in the universe.
A supernova gives rise to new nebula and these will eventually lead to new stars and the whole process begins all over.
Hence, through nuclear fusion in the stars the more heavy atoms have been generated of which the bio-molecules of life are eventually built. So as often mentioned in a catchphrase: we are built out of cosmic dust.
Upon nuclear fusion in the stars many different elements are formed. For life, hydrogen, carbon, oxygen, sulfur, phosphorous and nitrogen atoms, among others, are key to compose organic matter.
In interstellar space, carbon reacts spontaneously with other elements to form poly-aromatic carbohydrates (PAHs). This is for instance visible in a photograph of the Cat’s paw nebula (Figure @ref(fig:catPawNebula)). In the green regions radiation of hot stars induces fluorescence of PAHs.
PAHs were already be generated shortly after the Big Bang. Once they are produced, they are further transformed in interstellar space upon reaction with hydrogen and oxygen among others to form precursor molecules (“forerunners”) for amino acids and nucleotides, which are the building blocks of proteins, and, of RNA and DNA, respectively. So these simple building blocks that are essential for the chemistry of life are already omnipresent in space.
In a remote corner of our milky way, which is an average galaxy, an average planet was formed, which we know as our planet Earth. After the Earth cooled down liquid water was formed. The unique features of liquid water are essential for life as we know it. Indeed, once liquid water was available, life emerged in less than 300 million years, which is very short time interval on the geological timescale.
So on earth conditions emerged that made the “genesis of life” as we know it, possible. The exact process that led to the life forms on earth is unknown and will probably remain so. However, a general rationale is displayed in Figure @ref(fig:originOfLife).
First, a chemical evolutionary process gave rise to organic molecules with increasing complexity. It is a general consensus that RNA molecules emerged first from free nucleotides. RNA molecules are known to be catalytic and are carriers of genetic information. They can also replicate themselves under anoxic conditions (in the absence of oxygen) and in the presence of iron, conditions that occurred at young planet Earth (e.g. Hsiao et al. (2013)). These RNA molecules evolved either alone, the RNA-world hypothesis, or already in conjunction with proteins.
Next, it is assumed that prebiotic molecules, which could self-replicate, were encapsulated by phospholipids, which spontaneously form membrane like structures. This resulted in a protocell with a membrane (see Figure @ref(fig:originOfLife)). In these protocells a further chemical evolution was possible, separated from the environment, which ultimately led to a self-replicating and self-organizing cell that was built from the four essential type of bio-molecules of life: lipids, carbohydrates, proteins and nucleic acids.
Further biological evolution of these first living cells eventually gave rise to our last universal common ancestor (LUCA). LUCA must have had at least 355 genes, which all living organisms have in common. LUCA was anaerobic, living without oxygen, and maintained its hereditary material using DNA, the genetic code, and produced proteins from RNA templates in ribosomes. LUCA retrieved its energy from oceanic volcanic activity in deep sea vents, i.e. the intensely hot plumes caused by seawater interacting with magma erupting through the ocean floor. And eventually LUCA evolved into all species that currently live on Earth.
All species evolved from the same ancestral population of cells. This is also referred to as the same Last Universal Common Ancestor (LUCA). In the tree of life the evolutionary relations are summarized between different organisms (and groups of organisms). All living beings eventually can be traced back to the last universal common ancestor who is located at the root of the tree, see Figure @ref(fig:treeOfLifeBis).
Phylogenesis is the process of the origin of all species from the tree of life from LUCA.
4.5 BYA | 4.3 BYA | 3.8 BYA | 3.5 BYA | 540 MYA | 520 MYA |
---|---|---|---|---|---|
(Source: naturedocumetaries.org)
At its origin about 4.5 Billion Years Ago (BYA) the Earth was black, a hot basalt rock and dust in a cold vacuum. As the Earth cooled down she became Grey Earth (4.3 BYA) as she was mainly covered by granite and rocks. Another 500 million years later she was covered with liquid water: Blue Earth (3.8 BYA).
In only 300 million years she radically changed into Red Earth (3.5 BYA) due to life. Indeed, cyanobacteria emerged that can do photosynthesis and produce oxygen, a very reactive molecule. This led all free iron in the ocean to precipitate as iron oxide or rust (red). There was an explosion of the number of minerals, they went from approximately 250 to more than 5000 mineral species. Oxygen also caused a mass extinction because only few organisms could cope with its highly reactive nature. The coming 3 billion years life on Earth stayed relatively similar.
Around 540 MYA Earth was struck by a large ice age, White Earth. Again leading to a mass extinction due to the cold. However, volcanic activity came to the rescue by producing greenhouse gasses and an atmosphere that could retain more heat.
In less than 20 million years (MY) Earth radically changed again and turned into Green Earth (520 MYA). There was an explosion of life, and life suddenly evolved from being mainly unicellular to complex multicellular forms.
There are two archetypes of cells:
From 3.5 BYA - 540 MYA we find mainly prokaryotic and some simple eukaryotic organisms in fossils and life was mainly unicellular.
Eukaryotic cells are believed to be generated by endosymbiosis, i.e. a symbiotic relationship where one organism lives inside the other, which is beneficial for both organisms.
The process of endosymbiosis that led to eukaryotic cells is displayed in Figure @ref(fig:endosymbiosis).
First an anaerobic1 prokaryotic cell was assumed to have grown and to have developed membrane systems inside the cell, which gave rise to the formation of the nucleus. At a certain point of time this large prokaryotic cell with internal membrane systems must have ingested an aerobic2 proteobacterium, which managed to avoid digestion. The two cells started with an endosymbiotic relation.
The proteobacterium provided the host cell with additional energy through the synthesis of ATP by metabolizing carbohydrates and lipids using oxygen and the host cell fed the proteobacterium with these bio-molecules. This gave the endosymbiotic pair the ability to grow further in size and to thrive in the oxygen rich environment of planet Earth that was established during the Cambrian era that started 540 million years ago. The increase of oxygen is believed to be one of the major triggers that gave rise to the “biological big bang” that led to an explosion of novel species and the emergence of all large groups3 in the animal kingdom (He et al. 2019).
During their endosymbiotic evolution the proteobacterium gradually handed over many genes to the cell nucleus, and became under regulation of the nucleus. It eventually evolved to mitochondria, the organelles that are the energy plants of a cell. So, we inherit more from mother than father: we inherit our cell structure as well as our energy system, i.e. our mitochondria and their remaining DNA, from mother side through her egg cell.
For plant cells, a second endosymbiotic event must have occurred with eukaryotic cells and cyanobacteria. The ingested cyanobacteria has evolved in a similar way into chloroplasts, which give plants the ability for photosynthesis and led them become the most successful life form of the globe. Fossil evidence of land plants dates back to 485 MYA - 420 MYA, however, phylogenetic analysis also suggests an earlier origin in the Cambrian (Strother and Foster 2021).
Other key differences between Prokaryota and Eukaryota are in terms of their reproduction. For Prokaryota that only takes place by cell division. A mutation in the DNA is thus fixed in all daughter cells. While nearly all Eukaryota have a phase of sexual reproduction. They are diploid organisms, i.e. they have two copies of each gene, one from fathers’ and mothers’ side. This enables successive mutations to be made in one copy because another functional copy of the gene is available. Moreover during sexual reproduction recombination of chromosomes occurs, i.e. reshuffling of paternal and maternal genes, which leads to much more variation.
The Eukaryota further evolved in protists, which are unicellular organisms, fungae, plants and animals.
How does the evolution of LUCA to all species of the tree of life occurred? Evolution is largely driven by variability and selection.
Variation occurs, among others, through errors in the copying of DNA for cell division of prokaryotic cells and gamete4 production for eukaryotic cells (see Figure @ref(fig:dnaPolymerase)).
The error margin of DNA replication is 1 error per billion base pairs that are copied, which can lead to point mutations. Note, that we humans have a genome size of about 6.4 billion base pairs.
Other sources of variability are
Most mutations are neutral, i.e. the mutated codon encodes for the same amino acid. Neutral mutations can be used as a molecular/genetic clock to tell how far apart two species are in an evolutionary sense.
But, mutations are not always neutral! For instance, sickle cell anemia is caused by one point mutation in hemoglobin (see Figure @ref(fig:sickleCell1) and @ref(fig:sickleCell2)). A T (thymine base) is mutated to an A (adenine base), which results in a codon that encodes for the amino acid valine instead of glutamic acid. This causes dramatic changes to the 3D structure of the hemoglobin protein and causes a deformation of the red blood cells from a round to sickle shape, which leads to anemia.
Why does this mutation remain? It mainly occurs in Africa, where the mutation for sickle cell anemia is selected because it makes affected individuals resistant against malaria, which gives them a evolutionary competitive advantage despite their anemia. However, offspring who inherit the mutation of both parents are not viable. Therefore, the regular variant of hemoglobin also remains.
Hence, evolution is a natural process that is driven by two opposing forces: variation and selection.
Variation occurs by spontaneous copy errors in genetic code or mutations, among others.
Selection occurs upon ecofactors: Is a mutation beneficial or harmful for a particular organism in its specific environment? The odds on fixation of the mutation thus depends on the reproductive success.
The process of genetic variation and selection can eventually lead to evolution of new species upon many generations.
Another important process for the origin of species is genetic drift. This are random fluctuations of alleles, which are particular sequence variants of a gene. Genetic drift is particularly strong in small populations. As opposed to selection it is not adaptive.
New species will thus originate more quickly when a small fraction of the population gets isolated in a new environment.
Horizontal gene transfer, which is non-sexual transfer of genetic information between two distinct organisms, is also important for the evolutionary process.
This is very common between Prokaryota, i.e. eubacteria and archaea bacteria, think on the super bugs in hospitals that acquired resistance to multiple antibiotics. It also occurs between Eukaryota. Mainly in protists, which are unicellular organisms with a nucleus. As well as between Prokaryota on the one hand and Eukaryota on the other hand.
Endosymbiosis is also a very important driver of evolution. Niles Eldredge and Stephen Jay Gould have shown that the fossil record indicates that evolution happens in bursts: it is slow most of the time until suddenly rapid changes occur over a brief time (Margulis 1999). According to Lynn Margulis this can be explained due to endosymbiosis, where one species starts to live within another species, which is a source of evolutionary novelty that can give rise to an explosion of novel life forms. We already touched in more detail upon an important endosymbiosis event in section @ref(endosymbiosis).
The term teleonomy means that evolution only has the primitive goal to maintain and reproduce the species, and, that it thus has no other purpose and thus also no direction.
When complex organs and organisms originate it might seem as if there is a direction or purpose, but, that is not the case! A good example is the development of an eye, see Figure @ref(fig:evolutionEye).
Indeed,
The eye is not developed by evolution with the purpose to see.
The eye only has the function to see.
It is the result of a gradual process where each adaptation gave a reproductive advantage in a particular environment.
In another environment it can be no longer functional and than it might disappear or might become dysfunctional, e.g. moles eyes.
Hence, the origin of a species is the result of evolution, but, not the purpose of evolution. Evolution is thus adaptation with as sole goal maintenance and reproduction.
From the distribution of the complexity of species in Figure @ref(fig:distributionComplexity), it is clear that the most abundant species in number always has been bacteria. Indeed, evolution has no direction. The distribution of complexity at present, however, has a tail to the right. This mainly stems from the fact that there is lower bound of complexity of living organisms. Hence, evolution could not generate organisms with a complexity below this lower bound and it therefore only seems as if it slightly favors increasing complexity.
(ref:gould) Distribution of complexity of species (Gould 1997)
Bar-On, Phillips, and Milo (2018) published the distribution of carbon mass fixated in different types of species in Figure @ref(fig:carbonFixated), which also shows that there is no preference towards more complex organisms.
(ref:carbonFixatedCap) Mass in giga tons of carbon for different groups of species. (Bar-On, Phillips, and Milo 2018)
Note, that
Plants are the most successful group in terms of the mass of carbon that they fixate.
Bacteria clearly dominate the more complex animal kingdom.
Animals represent a relative small fraction. Among animals, cattle and humans are over represented.
Other compelling evidence that there is no direction to evolution stems from the number of bacterial cells in our body. Sender, Fuchs, and Milo (2016) showed that the ratio of the number of bacterial cells to the number of human cells is close to 1. A human with a weight of 70kg has \(\pm\) 38 trillion bacterial cells and 30 trillion humane cells. Note, that trillion is a thousand billion or 10\(^{12}\)!
So, all these examples make clear that evolution has no direction. For every organism with a higher complexity there also emerge many organisms with lower complexity! However, we can view evolution as a creative process that is generating novel life forms that are well adapted to their environment.
The ontogenesis is the development of an organism from a fertilized egg cell up to the adult stage until we eventually die.
Our ontogenesis starts from the genetic potential we inherit from our parents through their egg and sperm cell. The fertilized egg cell then starts to divide and at a certain point the cells start to differentiate into different tissues. Each of our cells, apart from our germ cells, has the same genetic material!
Another very important factor in our development is our environment and how we interact with our environment. Indeed, our phenotype, our observable traits and characteristics, stem from the complex interplay between our genomic makeup, our genotype, and environmental factors.
Key questions that arise in this respect is how cells of different tissues of the same organism that share the same genetic code can be morphologically so distinct, and how the environment and changes in our environment can impact our phenotype?
This is due to epigenetics!
Epigenetics is the study of how development, behavior and environment can cause changes that affect the way we use our genes. Unlike genetic changes, epigenetic changes are reversible and do not change your DNA sequence, but they can change how our body can (or cannot) read a DNA sequence, see Figure @ref(fig:epigenetics).
Epigenetic markers, however, are copied during cell division to the two new cells that are formed. This causes a liver cell to remain a liver cell and a brain cell to remain a brain cell upon cell division.
Epigenetic markers are small molecules that interact either directly with the DNA or with histones, which are specific proteins that act as spools that wind the long DNA molecule in a more compact/condense form. They can make genes accessible or inaccessible for RNA transcription and thus eventually for the production of proteins. Two common types are
Both types can impact the activity of genes. On the one hand DNA methylation actively represses the expression of a specific gene, while DNA demethylation enhances gene expression. On the other hand histone acetylation makes the structure of a region of DNA more open and thus more accessible for transcription of the genes in that region, while de-acetylation has the opposite effect.
Epigenetics plays an important role in embryonic development (see Figure @ref(fig:epiEmbryo)).
(ref:epiEmbryoLab) Epigenetics in embryo genesis at CpG islands, regions that can bind with many methyl groups. All CpG islands of a sperm cell are almost fully methylated. That of an egg cell are methylated around 50%. Upon fertilization the methylation drops and almost all genes become accessible for the undifferentiated blastula. As cells differentiate and get specific functions in tissues methylation increases again (Source: Mariuswalter, Wikipedia)
CpG islands are regions in our DNA that can be heavily methylated. They are major regulatory units and around 70% of our genes have CpG islands in their promotor regions7. Methylation of the promotor region is typically associated with a reduction of gene expression. In Figure @ref(fig:epiEmbryo) we observe that the CpG islands of a sperm cell are almost fully methylated, which indicates that many genes are likely to be silenced. Indeed, a sperm cell has one main function and that is to swim forward and fertilize the egg cell. The CpG islands of an egg cell are methylated around 50%. Upon fertilization the methylation drops in the blastocyst (an early stage of embryonic development, about five days upon fertilization). This is required for the cells to regain the property that they can divide and differentiate to all cell-types of the developing organism. As differentiation of the cells starts and as they evolve into tissues, methylation increases again so that the cells have access to less genes and get more specific functions. Note, that the methylation status is also passed upon cell division, which leads to invariance of differentiated cells: indeed, a liver cell remains a liver cell upon division and a brain cell a brain cell, etc.
Epigenetics are also largely driven by ecofactors. This can be nicely illustrated with identical twins who have almost the same genome. It gets more easy to tell them apart over time due to epigenetic changes that are triggered by the different environments to which they were exposed and that effectively changed how they are using their genes. A good example is the difference in skin ageing between twins that had a different exposure to UV radiation (e.g. Figure @ref(fig:epiUV)).
(ref:epiUVlab) Difference in skin ageing between twins is largely induced by epigenetic changes originating from a difference in exposure to UV radiation (Schwab and Hogenson 2017)
Aging and longevity are influenced by genetic, epigenetic, and environmental factors during development, growth, maturity, and older stages. It has been shown that the genetic component (heritability) plays only a moderate role in aging and longevity, so epigenetics seems to be a major contributor (Adwan-Shekhidem and Atzmon 2018). Indeed, many epigenetics markers and in particularly DNA methylation sites have been shown to be remarkable predictors of chronological age. Aging has been shown to be associated with profound changes in the epigenetic landscape, that give rise to alterations of gene expression and genome architecture.
Epigenetics is plays also an important rol in learning (Creighton et al. 2020).
In the last decade, it has been shown that epigenetics is very important in development of the brain and for learning.
Animal research has shown that the disruption and inhibition of gene expression and translation in the brain shortly after a learning event, has a tremendous impact on long time memories (\(\geq 24\) hours), but not on short term memories. These manipulation had most impact in the first hours upon the learning event. Suggesting that the initial memory consolidation happens within a 6 hour time window.
Learning typically happens in waves. Genes activated shortly after learning return to baseline within 24 hours, a time point at which a second wave of gene expression is triggered. In the second wave, genes are included that encode for epigenetic regulators. So, the first wave within 6 hours upon learning is known to up regulate transcription factors important to initiate the second wave of transcription, which is needed to establish long-lasting memories.
Understanding memory consolidation gets even more complex due to the reorganization of neural networks associated with recent memory storage and later memory phases (\(>7\) days).
In a large number of studies, DNA methylation has been shown to play an important role in multiple stages of memory formation.
On the one hand it is shown that DNA methylation is transient, i.e. rapidly induced and reversed in the first hours upon learning. On the other hand, stably altered cortical DNA methylation has been shown up to 4 weeks upon learning and blocking the cortical DNA methylations has been shown to impair memory. So methylation plays a role in different stages of the process of memory formation.
It also has been shown that methylation influences memory through different mechanisms, e.g.
A large body of literature support the role of methylation in learning and it is shown that methylation associated to learning occurs in the hippocampus, prefrontal cortex and the amygdala of the brain (Figure @ref(fig:brainRegionsLearning)).
Another type of epigenetic regulation is through the modification of histones, i.e. the proteins that act as spools around which the long DNA molecule is winded and which can affect the accessibility of a DNA region upon modification.
Histone modifications with epigenetic markers have been shown to be mainly important in the initial phase of memory consolidation. They are rapidly modified after which they return to baseline. These modifications cause “unwinding or winding the DNA from or around the histones” making particular genes accessible or inaccessible for expression.
Histone epigenetic markers seem to regulate the transcriptional sensitivity towards external stimuli. Histone acetylation that makes the DNA more accessible, for instance, has been shown to increase learning induced gene expression and seem to be particularly important for regulating memory strength.
So recent progress has shown the tremendous importance of epigenetics for brain development and learning, however, the field of neuro-epigenetics is still in its infancy and many open questions remain.
Identification and quantification of peptides and proteins
Data exploration and quality control using plots
Preprocessing: log-transformation, Filtering, Normalization, Summarization
Dealing with batch effects and other confounders
Statistical Concepts
NGS Data exploration
Preprocessing/normalization
Additional Statistical Concepts
Theory and Tutorials are blended
Communication and submission of projects via Ufora
All tutorials from week 2 onwards are based on R/Bioconductor via R-studio Scripts are made in R/markdown: a file format to combine text, R code and R output.
Anaerobic: living in the absence of oxygen↩︎
Aerobic: oxygen using and respiration with oxygen provides more energy↩︎
Large groups of a kingdom are more formally referred to as phyla↩︎
Gamete: sperm or egg cell↩︎
A methyl group (-CH\(_3\)) consists of one carbon and three hydrogen atoms↩︎
An acetyl group (-CO-CH\(_3\)) consists of two carbon, one oxygen and three hydrogen atoms↩︎
A promoter is a sequence of DNA to which proteins bind to initiate transcription of the gene downstream of the promotor↩︎