There is a longstanding interest in identifying the subset of our genome that is most essential to life and normal development. Such regions should be under the strongest purifying selection and should therefore exhibit lower nucleotide diversity. In the case of protein-coding genes, especially strong “constraint” should be observed against protein-altering (e.g., missense, stop-gain, and frameshift) variants. Prior studies have attempted to identify constrained genes, but have been unable to identify focal regions of constraint within each gene: in other words, which specific regions of protein-coding genes are most intolerant of variation, and therefore most likely to cause disease when mutated?
To address this question, Quinlan and colleagues studied genetic variation detected among >120,000 human exomes to reveal focal coding regions that lack variation in healthy individuals. These “constrained coding regions” (CCRs) are inferred to be under strong purifying selection and are enriched for known pathogenic variants. Perhaps the most intriguing aspect of this map of CCRs is that many of the most constrained regions lie within genes that lacked prior disease association. Thus, these regions hold promise for new disease gene discovery in the context of developmental disorders and can be used to prioritize mutations in rare human diseases.
A map of constrained coding regions in the human genome. Havrilla JM, Pedersen BS, Layer RM, Quinlan AR. Nat Genet. 2019 Jan;51(1):88 (cover article).
Press Releases and Media:
University of Utah Health: “Big Datasets Pinpoint New Regions to Explore the Genome for Disease”