Francisco Requena - The incredible robustness of the human genome

In a recent interview, one of the most prolific Spanish researchers in the biomedical field, Carlos López Otín, stated the following:

“I have turned 63, which seems to me a cosmic feat: 63 years resisting thousands and thousands of changes in my genome every day. It is clear to me that the amazing thing is not that we get cancer, but that we do not.”

Clinicians and researchers often focus on variants that cause phenotypic consequences. The reality is that most variants have limited or no impact on the human body and, despite mutations accumulated over a lifetime, most humans remain healthy.

The advent of whole-genome sequencing (WGS) and the transition from disease-based to population-based cohorts are making it possible to identify more variants in healthy individuals. For instance, a long-read sequencing study of 3,622 Icelanders reported a median of 22,636 structural variants (SVs) per individual, spanning a cumulative median length of 10.02 Mb per haploid genome [1], and the gnomAD project reported a median of 205 heterozygous and 33 homozygous loss-of-function (LoF) variants [2].

These figures alone give the impression that the human genome is immune to mutations. Nothing could be further from the truth. For instance, the population prevalence of rare diseases is estimated at 3.5-5.9% [9], and OMIM (updated April 11th, 2022) reports 6,356 phenotypes with a known molecular basis. Interestingly, many of these diseases are monogenic.

Therefore, we face two opposing scenarios: in most people the combined effect of more than 22,000 SVs does not lead to a pathogenic phenotype, whereas in others a single base-pair change can lead to irreversible developmental disorders. Similarly, some LoF variants cause embryonic lethality in humans in a heterozygous state, while others are benign even in homozygosity.

This dichotomy has hindered one of the central goals of genetics: understanding the links between genetic variation and disease. In response, multiple pathogenicity scores have been developed over the past few years, leveraging different sources of information, such as interspecies conservation (e.g. PhastCons [10]), intraspecies constraint (e.g. RVIS [4], pLI [2], GDI [3]), or aggregation-based methods that integrate further lines of evidence such as epigenetic information (e.g. CADD [5]).

The clinical value of these methods has been corroborated. However, they only report single-variant predictions, and the potential combinatorial effects (e.g. additive, suppressor) of two or more variants remain unexplored.

Personally, I believe that the transition from single-variant predictions to combined-variant predictions will be a crucial step for the advancement of our field.

But a question arises: how do we prioritize each potential combination?

For instance, given the median number of structural variants in a healthy human being (22,636), more than 250 million potential combinations would have to be evaluated, and this figure covers only pairwise combinations.
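The figure above can be checked with a short calculation. The sketch below simply counts unordered combinations of distinct variants using the median SV count quoted in the text; the function name `n_combinations` is an illustrative choice, and the calculation assumes every variant can be paired with every other one.

```python
from math import comb

# Median number of SVs per individual, as quoted in the text [1].
MEDIAN_SVS = 22_636


def n_combinations(n_variants: int, k: int) -> int:
    """Number of unordered combinations of k distinct variants."""
    return comb(n_variants, k)


pairs = n_combinations(MEDIAN_SVS, 2)    # 256,182,930 pairwise combinations
triples = n_combinations(MEDIAN_SVS, 3)  # grows by a further factor of ~7,500

print(f"pairs:   {pairs:,}")
print(f"triples: {triples:,}")
```

Moving from pairs to triples already pushes the count beyond a trillion, which is why exhaustive evaluation is not realistic and some form of prioritization is needed.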

To overcome this, we need new tools that help annotate the human genome according to its degree of mutational robustness [11].

The human genome, like any other biological system, is resistant to genetic variation and environmental changes. Therefore, the assessment of variant combinations can only be carried out if we have a good understanding of the mechanisms that make our genome robust and how this robustness changes across the human genome.

Phenotypic robustness

The robustness of biological systems has been described since the earliest studies in developmental biology. Waddington realized that wing development in Drosophila is generally resilient to minor perturbations caused by environmental changes, such as heat or osmotic stress [12]. Knockout strains of the yeast Saccharomyces cerevisiae exist for 96% of open reading frames, indicating a remarkable tolerance to single gene deletions, although many of these genes are required under certain growth conditions [13].

One may argue that this tolerance can be explained by the fact that most variants disrupt genes with no relevant biological functions. We know that this is true for some groups of genes, such as olfactory receptor genes [14]. At the same time, we also know that mutations that disrupt genes with critical functions can have no apparent phenotypic consequences. For instance, mammalian cell division is a critical biological process controlled mainly by cyclin-dependent kinases (Cdk). Surprisingly, mice lacking Cdk4, Cdk6 or Cdk2 are viable due to compensatory mechanisms. Interestingly, the simultaneous loss of Cdk4 and Cdk6 leads to embryonic lethality [15].

Due to these compensatory genetic mechanisms, it is plausible that many genes with essential functions will never be associated with human diseases. However, we must exclude from this list those genes whose disruption leads to embryonic lethality, for which no clinical assessment is possible.

Multiple mechanisms of mutational robustness have been described that protect against damaging variants in protein-coding genes:

Genetic redundancy. The inactivation of one gene has little or no effect on the phenotype of the organism due to compensation by one or more other genes. Total or partial functional overlap can arise from different genes with similar functions or from identical copies of a gene produced by gene duplication. For instance, paralogous genes (genes that have been duplicated at some point in evolutionary history) are often less essential than singletons [16] due to the buffering effect of the paralogous copy. In addition, about half of pathogenic CNVs have been reported to affect a cluster of functionally related genes, compared with only 4% of benign CNVs [17].
Distributed robustness. This mechanism is prominent in metabolic pathways, but also in signaling and transcriptional networks. It describes the compensation of a perturbed gene’s function by another gene or set of genes that do not share its function [18]. Interestingly, these genes may be in different pathways or within the same network.
Transcriptional adaptation. A mutation can activate the mRNA surveillance machinery, which in turn increases the expression of related genes that have sequence similarity to the mRNA of the mutated gene. These upregulated genes take over the function of the mutated gene and act as buffers against mutations [19].

From an evolutionary perspective, these mechanisms should be more prevalent in genes whose alteration could compromise the viability of the system. This idea challenges (i) the common belief that clinically relevant genes are the most biologically important and (ii) the notion that focusing only on translational or clinical research may be sufficient to understand fundamental biological mechanisms. In fact, due to this level of robustness, many biologically relevant genes will not be sufficiently studied because they are not clinically relevant.

We can therefore assume that clinically relevant genes are not necessarily the most biologically important ones, but rather genes that have a certain biological value and are, at the same time, vulnerable to mutations.

Regulatory regions are especially characterized by their genetic robustness, even more so than protein-coding genes. A protein-coding gene can be regulated by several regulatory elements (e.g. enhancers, non-coding RNAs). For instance, disease-associated genes are associated with more redundant enhancer domains and are therefore buffered against the effects of non-coding mutations. In fact, this would explain why disease-associated genes are depleted of cis-eQTLs [7].

Exploring the genetic robustness of regulatory regions would shed light on the contradictory results about the non-essentiality of TADs [8] and enhancers [20].

Holistic vision

In humans, we still hope to find links between single genetic alterations and phenotypes. While this may be true in some cases, the reality is that in most patients more than one genomic element is affected.

The high degree of robustness precludes the assumption that a phenotype arising from a system with such a level of complexity can be explained by a single genetic factor (Burgess 2022).

As a consequence of this functional redundancy, it is time to consider the human genome for what it is, a complex system, and to follow a holistic approach, i.e. to consider that the interaction of genomic elements and environmental factors is more than the mere sum of the individual parts.

Medicine is replete with clinical observations showing that we are still missing pieces of the big picture. For instance, individuals carrying the HLA-B27 antigen are about 300 times more likely to develop the autoimmune disease ankylosing spondylitis. Yet around 8% of people in the UK carry this antigen, and most of them do not suffer from the disease.

With this in mind, some concerns arise about studies that assess the impact of individual genes, such as perturbation screens in cell lines or knockout and knockdown experiments in mouse models, without considering, for instance, the dependencies between genes with redundant functions.

Complex diseases

The dependency problem is well recognized in the cancer field. Continued efforts to identify driver mutation(s) have led to the search for gene dependencies as a strategy for developing new drugs and diagnostic tools. For instance, recent perturbation screens based on digenic dependencies in pathway-driven cancers are uncovering vulnerabilities that can be used as therapeutic targets [21].

In addition, for many traits, even the most important loci in the genome have small effect sizes and, taken together, the significant hits explain only a modest fraction of the heritability.

The search for the “missing heritability” [6] has prompted the research community to develop polygenic scores (PGS) that allow genetic prediction of complex traits. Boyle et al. went a step further and proposed an omnigenic model in which the whole cellular regulatory network should be assessed, including genes that are distant from the “core disease genes” [23].
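As a rough illustration of what a polygenic score computes, the sketch below uses the common additive formulation: a weighted sum of effect-allele dosages across variants. The variant identifiers, effect sizes, and the function name `polygenic_score` are hypothetical and are not taken from any of the cited studies; real PGS pipelines also handle variant selection, linkage disequilibrium, and ancestry, none of which is modelled here.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Additive score: sum of (effect size x effect-allele dosage).

    Dosages are 0, 1 or 2 copies of the effect allele. Variants that
    have a weight but were not genotyped in this individual are skipped.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


# Hypothetical per-variant effect sizes (e.g. from a GWAS) and one genotype.
weights = {"var_A": 0.12, "var_B": -0.05, "var_C": 0.30}
person = {"var_A": 2, "var_B": 1, "var_C": 0}

score = polygenic_score(person, weights)
print(f"polygenic score: {score:.2f}")
```

Each variant contributes independently in this formulation, so interactions between variants are not captured by the score.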

Only aggregation-based methods will find the missing pieces of the puzzle called heritability and explain surprising findings, such as the high degree of overlap of common SNPs in the population that are risk variants for autism spectrum disorder [22].

Rare diseases

This vision is particularly interesting for patients with rare diseases. Even in well-described monogenic disorders, symptoms may differ from one patient to another. Concepts such as variable penetrance and variable expressivity among patients with the same disorder clearly support this approach.

In the field of rare diseases, the two-hit model has been well established as a factor that can explain the clinical variability of patients carrying the same mutation. For instance, a two-hit model was able to predict the variable expressivity of children with severe developmental delay who are carriers of a recurrent microdeletion at chromosome 16p12.1 [23].

In addition, the identification of other mutations in different genes may explain not only the variable expressivity but also the presence or absence of the disease. Some pathologies are characterized by digenic inheritance, in which the alteration of two different genes is necessary and sufficient to cause a pathology with a defined diagnosis [24]. For instance, retinitis pigmentosa and Bardet–Biedl syndrome are well-characterized human diseases with digenic inheritance [24]. To date, [16] digenic combinations involved in different human digenic diseases have been described [25], and more recent approaches have been developed for predicting the pathogenicity of gene combinations that are simultaneously disrupted [26].

CNVs are a good example of oligogenic effects. Many of the CNVs currently classified as pathogenic harbor more than one gene. Phenotypic differences between probands with partially overlapping deletions could be explained by differences in the disrupted genomic elements. More recent approaches suggest an interaction-based model for the assessment of CNVs [27], for example through interaction networks or enrichment analyses that identify shared biological pathways.

At the same time, like any other trait, rare disorders can be influenced by additional factors. For instance, beta-thalassemia is a monogenic disorder with very diverse clinical features among patients, which is partially explained by the existence of secondary and tertiary factors that modulate the severity of the disease [28]. Some developmental disorders are associated with environmental factors, such as fetal alcohol syndrome [29] and microcephaly caused by infectious agents [30].
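To make the idea of an enrichment analysis concrete, the sketch below applies a standard over-representation test (the upper tail of the hypergeometric distribution) to the genes disrupted by a CNV. This is a generic textbook test and not the specific model proposed in [27]; the function name and all gene counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background: int, n_pathway: int,
                           n_hits: int, n_overlap: int) -> float:
    """P(X >= n_overlap) under the hypergeometric distribution.

    n_background: genes in the background set
    n_pathway:    background genes annotated to the pathway
    n_hits:       genes disrupted by the CNV
    n_overlap:    disrupted genes that belong to the pathway
    """
    total = comb(n_background, n_hits)
    upper = min(n_pathway, n_hits)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: a CNV disrupts 8 genes, 3 of which belong to a
# 100-gene pathway, against a background of 20,000 genes.
p_value = hypergeom_enrichment_p(20_000, 100, 8, 3)
print(f"enrichment p-value: {p_value:.2e}")
```

A small p-value suggests that the disrupted genes share a pathway more often than expected by chance, which is one way to flag CNVs whose genes may act together.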

References

[1] Beyter D, Ingimundardottir H, Oddsson A, Eggertsson HP, Bjornsson E, Jonsson H, et al. Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits. Nat Genet. 2021. doi:10.1038/s41588-021-00865-4
[2] Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581: 434–443.
[3] Itan Y, Shang L, Boisson B, Patin E, Bolze A, Moncada-Vélez M, et al. The human gene damage index as a gene-level approach to prioritizing exome variants. Proc Natl Acad Sci U S A. 2015;112: 13615–13620.
[4] Petrovski S, Gussow AB, Wang Q, Halvorsen M, Han Y, Weir WH, et al. The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity. PLoS Genet. 2015;11: 1–25.
[5] Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46: 310–315.
[6] Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461: 747–753.
[7] Wang X, Goldstein DB. Enhancer Domains Predict Gene Pathogenicity and Inform Gene Discovery in Complex Disease. Am J Hum Genet. 2020;106: 215–233.
[8] Despang A, Schöpflin R, Franke M, Ali S, Jerković I, Paliou C, et al. Functional dissection of the Sox9-Kcnj2 locus identifies nonessential and instructive roles of TAD architecture. Nat Genet. 2019;51: 1263–1271.
[9] Nguengang Wakap S, Lambert DM, Olry A, Rodwell C, Gueydan C, Lanneau V, et al. Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database. Eur J Hum Genet. 2020;28: 165–173.
[10] Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005;15: 1034–1050.
[11] Fares MA. The origins of mutational robustness. Trends Genet. 2015;31: 373–381.
[12] Waddington CH. Canalization of Development and Genetic Assimilation of Acquired Characters. Nature. 1959. pp. 1654–1655.
[13] Ni L, Connelly C, Riles L, Véronneau S, Dow S. Functional profiling of the Saccharomyces cerevisiae genome. Nature. 2002.
[14] Rausell A, Luo Y, Lopez M, Seeleuthner Y, Rapaport F, Favier A, et al. Common homozygosity for predicted loss-of-function variants reveals both redundant and advantageous effects of dispensable human genes. Proc Natl Acad Sci U S A. 2020;117: 13626–13636.
[15] Barrière C, Santamaría D, Cerqueira A, Galán J, Martín A, Ortega S, et al. Mice thrive without Cdk4 and Cdk2. Mol Oncol. 2007;1: 72–83.
[16] De Kegel B, Ryan CJ. Paralog buffering contributes to the variable essentiality of genes in cancer cell lines. PLoS Genet. 2019;15: e1008466.
[17] Andrews T, Honti F, Pfundt R, de Leeuw N, Hehir-Kwa J, Vulto-van Silfhout A, et al. The clustering of functionally related genes contributes to CNV-mediated disease. Genome Res. 2015;25: 802–813.
[18] Sundaram MV. The love–hate relationship between Ras and Notch. Genes Dev. 2005;19: 1825–1839.
[19] El-Brolosy MA, Kontarakis Z, Rossi A, Kuenne C, Günther S, Fukuda N, et al. Genetic compensation triggered by mutant mRNA degradation. Nature. 2019;568: 193–197.
[20] Osterwalder M, Barozzi I, Tissières V, Fukuda-Yuzawa Y, Mannion BJ, Afzal SY, et al. Enhancer redundancy provides phenotypic robustness in mammalian development. Nature. 2018;554: 239–243.
[21] Ito T, Young MJ, Li R, Jain S, Wernitznig A, Krill-Burger JM, et al. Paralog knockout profiling identifies DUSP4 and DUSP6 as a digenic dependence in MAPK pathway-driven cancers. Nat Genet.
[22] Grove J, Ripke S, Als TD, Mattheisen M, Walters RK, Won H, et al. Identification of common genetic risk variants for autism spectrum disorder. Nat Genet. 2019.
[23] Girirajan S, Rosenfeld JA, Cooper GM, Antonacci F, Siswara P, Itsara A, et al. A recurrent 16p12.1 microdeletion supports a two-hit model for severe developmental delay. Nat Genet.
[25] Gazzo AM, Daneels D, Cilia E, Bonduelle M, Abramowicz M, Van Dooren S, et al. DIDA: A curated and annotated digenic diseases database. Nucleic Acids Res. 2016;44: D900–7.
[26] Papadimitriou S, Gazzo A, Versbraegen N, Nachtegael C, Aerts J, Moreau Y, et al. Predicting disease-causing variant combinations. bioRxiv. 2019; 520353.
[27] Jensen M, Girirajan S. An interaction-based model for neuropsychiatric features of copy-number variants. PLoS Genet. 2019;15: 1–14.
[28] Weatherall DJ. Phenotype-genotype relationships in monogenic disease: lessons from the thalassaemias. Nat Rev Genet. 2001;2: 245–255.
[29] Mattson SN, Bernes GA, Doyle LR. Fetal Alcohol Spectrum Disorders: A Review of the Neurobehavioral Deficits Associated With Prenatal Alcohol Exposure. Alcohol Clin Exp Res. 2019;43: 1046–1062.
[30] Mlakar J, Korva M, Tul N, Popović M, Poljšak-Prijatelj M, Mraz J, et al. Zika Virus Associated with Microcephaly. N Engl J Med. 2016;374: 951–958.