Dr. Sanchari Sinha Dutta, Ph.D.
Overlapping genes (OLGs), or dual-coding genes, use two different reading frames or initiation codons to encode more than one protein. Because both the coding (sense) and non-coding (antisense) strands can be used during transcription, an OLG can arise from the overlap of two genes on the same strand or on opposite strands.
An OLG may develop through the extension of an open reading frame, which can result from start or stop codon substitutions, or from the elimination of start or stop codons by deletions and frameshifts.
Moreover, for two genes to overlap, the transcription initiation signal of one gene must be located inside the second gene, whose own transcription initiation site lies further upstream.
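The open-reading-frame extension described above can be illustrated with a minimal Python sketch. The function name `orf_length` and the two short sequences are hypothetical and chosen only for illustration; the example assumes the standard stop codons and shows how a single substitution in a stop codon lets translation continue into downstream sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def orf_length(seq, start=0):
    """Count codons from `start` up to (not including) the first in-frame stop.

    If no in-frame stop codon is found, the count runs to the end of `seq`.
    """
    count = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return count
        count += 1
    return count

# Hypothetical sequence: ATG GCT TAA GCT GGA TGA
original = "ATGGCTTAAGCTGGATGA"
# A T->C substitution turns the first stop (TAA) into CAA (glutamine),
# so the reading frame now extends to the next in-frame stop (TGA).
mutant = original[:6] + "C" + original[7:]

print(orf_length(original))  # 2
print(orf_length(mutant))    # 5
```

In a real genome, the extended frame could run into a neighbouring gene, which is one way an overlap may arise.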
In general, OLGs can be classified by the degree of overlap and the relative direction of transcription of the two genes. Convergent overlaps involve the 3′ termini of the gene pair, whereas divergent overlaps involve the 5′ termini.
In nested overlaps, one gene is positioned entirely within an intron of the second gene, whereas in embedded overlaps, the genes share more than one intron or exon.
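As a rough sketch of how these categories could be assigned from gene coordinates, the Python example below classifies a pair of genes by position and strand. The `Gene` class, the `classify_overlap` function, and all coordinates are hypothetical. Because the sketch has no exon or intron information, it cannot separate nested from embedded overlaps and simply reports both as "contained".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost genomic coordinate (inclusive)
    end: int     # rightmost genomic coordinate (inclusive)
    strand: str  # "+" or "-"

def classify_overlap(a, b):
    """Classify the overlap between two genes from coordinates and strand."""
    if a.end < b.start or b.end < a.start:
        return "none"
    # One gene lies entirely inside the span of the other.
    if (a.start <= b.start and b.end <= a.end) or \
       (b.start <= a.start and a.end <= b.end):
        return "contained"
    if a.strand == b.strand:
        return "same-strand"
    left = a if a.start < b.start else b
    # The left gene on "+" ends (3') inside the overlap, and the right gene
    # on "-" also ends (3') there: convergent. The reverse arrangement puts
    # both 5' ends inside the overlap: divergent.
    return "convergent" if left.strand == "+" else "divergent"

print(classify_overlap(Gene("a", 100, 500, "+"), Gene("b", 400, 900, "-")))  # convergent
print(classify_overlap(Gene("a", 100, 500, "-"), Gene("b", 400, 900, "+")))  # divergent
```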
How are overlapping proteins formed?
Unlike common mechanisms of novel protein formation, such as gene duplication or horizontal gene transfer, OLG-encoded proteins are created de novo via a process called overprinting.
In this process, mutations within an existing coding sequence create a new reading frame, while the protein-coding ability of the original frame remains intact.
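To make the idea of two reading frames on one stretch of DNA concrete, here is a small Python sketch. The sequence is invented for illustration, and the helper names `translate_from` and `first_orf` are hypothetical. It assumes the standard genetic code and translates the same sequence from the first ATG found in frame 0 and in frame +1.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate_from(seq, start):
    """Translate codon by codon from `start` until a stop codon or the end."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def first_orf(seq, frame):
    """Translate from the first ATG found in the given frame (0, 1 or 2)."""
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] == "ATG":
            return translate_from(seq, i)
    return ""

# Invented sequence carrying an ATG in frame 0 and another in frame +1.
seq = "ATGAATGGCACGTTAGCTTGA"
print(first_orf(seq, 0))  # MNGTLA
print(first_orf(seq, 1))  # MAR
```

The two peptides share no residues even though they are read from the same nucleotides, which is why proteins produced from overlapping frames can differ completely in sequence.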
Studies have shown that these de novo proteins play vital roles in viral pathogenic responses by suppressing interferon release from host cells, by neutralizing RNA-dependent gene silencing, and by triggering host cell death.
On the other hand, gene overlap can reduce the rate of virus evolution, as well as adaptability to the host cell microenvironment, by decreasing the frequency of synonymous mutations (nucleotide changes in an exon of a protein-coding gene that do not alter the amino acid sequence). A change that is synonymous in one reading frame often alters the amino acid encoded in the overlapping frame, so fewer mutations are tolerated.
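The constraint on synonymous mutations can be checked by brute force on a toy example. The Python sketch below is an illustration only: the function names and the six-nucleotide sequence are hypothetical, and the standard genetic code is assumed. For every position covered by a complete codon in both frames, it tries all three possible substitutions and counts how many are synonymous in the first frame and how many are synonymous in both frames at once.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codon_start(seq, pos, frame):
    """Start index of the codon containing `pos` in `frame`, or None if incomplete."""
    start = pos - ((pos - frame) % 3)
    if start < 0 or start + 3 > len(seq):
        return None
    return start

def is_synonymous(seq, pos, base, frame):
    """True if substituting `base` at `pos` keeps the amino acid in `frame`."""
    start = codon_start(seq, pos, frame)
    mutated = seq[:pos] + base + seq[pos + 1:]
    return CODON_TABLE[seq[start:start + 3]] == CODON_TABLE[mutated[start:start + 3]]

def count_synonymous(seq, frame_a=0, frame_b=1):
    """Return (substitutions tested, synonymous in frame_a, synonymous in both)."""
    total = syn_a = syn_both = 0
    for pos in range(len(seq)):
        if codon_start(seq, pos, frame_a) is None or codon_start(seq, pos, frame_b) is None:
            continue  # position not covered by a complete codon in both frames
        for base in "ACGT":
            if base == seq[pos]:
                continue
            total += 1
            a = is_synonymous(seq, pos, base, frame_a)
            b = is_synonymous(seq, pos, base, frame_b)
            syn_a += int(a)
            syn_both += int(a and b)
    return total, syn_a, syn_both

# Toy sequence: frame 0 reads CTG CTG, frame +1 reads TGC.
print(count_synonymous("CTGCTG"))  # (9, 4, 1)
```

In this toy case, four of the nine possible substitutions are silent in the first frame, but only one of them is also silent in the overlapping frame.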
What are the benefits of overlapping genes?
Given the burden of maintaining two functional genes in the same sequence, OLGs are likely to persist over long evolutionary periods only when the overlap proves beneficial to the organism. OLGs are known to play an important role in modulating host-pathogen interactions.
OLGs in the viral genome
In viruses, OLGs are particularly beneficial in overcoming the limitation of a small genome. Importantly, OLGs allow more than one protein to be produced from a single DNA segment, which is not possible when genes are arranged sequentially.
OLGs were first identified in the genome of bacteriophage PhiX174, a single-stranded DNA virus with a very small genome. If read linearly, without overlap, this tiny genome would not be able to encode all 11 proteins that the virus needs for survival, pathogenicity, and virulence. Thus, gene overlapping is an important evolutionary mechanism for this type of organism.
Studies have found that most viral proteins encoded by OLGs are unfolded or disordered and have an unusual amino acid composition that is rich in disorder-promoting residues. These are mainly accessory proteins responsible for viral pathogenicity and transmission of infection. They are generally not involved in core functions such as viral replication or structural assembly.
Since disordered proteins can rapidly switch between different structural forms, the unstructured characteristics of overlapping proteins can reduce the evolutionary restrictions imposed by the overlap.
OLGs in the human genome
Recent evidence suggests the presence of OLGs in the human genome. Although the exact functions of OLGs in humans are yet to be identified, it is generally believed that human OLGs are not the result of an evolutionary constraint to reduce genome size.
Studies have found that about 25% of protein-coding genes in the human genome overlap. Among these, same-strand overlaps are more frequent than opposite-strand overlaps. Although OLGs are distributed throughout the human genome, the pattern of distribution varies between chromosomes.
A prominent example of an OLG in the human genome is INK4a/ARF, in which conventional reading of the DNA sequence leads to the production of the INK4a protein. The same locus also encodes a completely different protein, ARF, through the use of an alternate reading frame (hence the name ARF). The two proteins differ completely in structure and function.
In humans, protein products of OLGs are believed to play important roles in regulating many physiological functions, preventing disease development and progression, and improving longevity. For example, both INK4a and ARF act as tumor suppressors but exert their effects through separate tumor-suppressing pathways.
In cancer patients, these genes are often found to be silenced or inactivated. Moreover, studies conducted in mice have shown that mutations in the INK4a/ARF gene locus are associated with the development and progression of tumors.