In recent years, considerable interest has been drawn to interactions between genomic regions of interphase chromosomes that lie far apart along the linear chromosomal DNA. Physically separated chromosomal loci interact with each other more frequently within certain domains, called “topological domains” or “topologically associated domains” (TADs), with the chromatin of the intervening DNA region looping out. These domains persist in interphase chromosomes of different cell types and are evolutionarily conserved in syntenic chromosomal regions of different organisms. The average size of topological domains was originally estimated at 800 kb in the chromosomes of mouse and human embryonic stem cells. Later, higher-resolution Hi-C (1 kb) reduced the estimated average size of these domains in human chromosomes to 185 kb. Similar domains with an average length of 60 kb have been found in D. melanogaster chromosomes.
TADs were identified using various modifications of the 3C method (Chromosome Conformation Capture). One of the first steps of the method is formaldehyde fixation of protein-protein and protein-DNA interactions, along with salt and EDTA treatment of cell nuclei, which may distort the picture of the existing interactions. It is therefore not surprising that a substantial part of the identified interactions are enhancer-promoter interactions (related to the pattern of expressed genes) and interactions involving insulator proteins.
However, one cannot exclude the involvement of direct DNA-DNA interactions in the structural and functional organization of chromosomes. Short polypurine/polypyrimidine tracts (9-10 bp) capable of forming triple-helix DNA structures (H-form DNA) are possible participants in this process. Triplex DNA structures have been revealed by immunocytochemistry in different types of chromosomes (interphase, metaphase, meiotic, polytene); they are usually detected in intergenic regions and in the introns of different genes. Polypurine/polypyrimidine tracts 9-10 bp long, but not longer ones, are only very weakly recombinogenic (for details see ). When complementary polypurine and polypyrimidine tracts are located at a distance from each other, their interaction (formation of H-form DNA) loops out the nucleotide chain located between them. This was shown in vitro using both synthetic oligonucleotides and the human alpha-1-antitrypsin gene, as well as in bioinformatic assays. It should be noted that protein-coding and non-coding sequences (satellite DNA and repeats of the LINE and SINE types) can form loop structures by a fundamentally similar mechanism.
The question that still needs to be elucidated is how the “borders” between adjacent TADs are organized, including the nucleotide sequences of these chromosome regions. However, by the definition of TADs (see above), a “border” must both ensure cooperation between genomic regions inside the domain and prevent (or reduce the probability of) cross-domain interactions. The same assumption applies to the function of insulators.
Earlier, using a transgenic system of the D. melanogaster yellow and white genes, we showed that nuclear envelope DNA attachment sites (neDNA) are capable of protecting reporter genes from Position Effect Variegation (PEV), and that this ability is improved in the presence of the Wari insulator. If only one of the transgenes (white gene + Wari) is flanked by neDNA, it is better protected against PEV. In other words, the insulator “works” well between two neDNA fragments, while it has only a slight effect across an neDNA fragment. Interestingly, neDNA is evolutionarily conserved. These point attachment sites of chromosomes to the nuclear envelope may also form looped chromatin structures. The neDNA fragment (EnvM4) used in our transgenic experiments possesses an extremely conserved motif, (AAAGA)n. GAGA DNA motifs responsible for attachment to the nuclear envelope were revealed within LADs (Lamina Associated Domains). However, it should be noted that the reported consensus contains AAAGA motifs as well. The most common feature of these two types of DNA motifs is the presence of tandem polypurine copies.
TADs are detected neither in mitotic chromosomes nor in the inactivated X chromosome. Perhaps one reason why no TADs are identified in mitotic chromosomes is that one of the components involved in forming TAD borders is absent at metaphase. That component appears to be the nuclear envelope/nuclear lamina, to which interphase chromosomes are attached at various sites. For the inactivated (compacted) X chromosome, the absence of TADs may have the same cause, since chromosome compaction is accompanied by a change in the pattern of its interaction with the nuclear envelope/nuclear lamina.
Despite the identification of TADs in many organisms and the many studies in this field, much remains unclear. The data on TADs mostly give a general idea of the interactions between different sections of DNA. A detailed study of nucleotide sequences, transcription, and epigenetic status, as well as of higher levels of chromatin organization, can help to create a complete picture of the processes taking place in the genome.
This study is devoted to the D. melanogaster heat shock gene (hsp70) locus. These genes encode proteins of the hsp70 family, which protect the cell by preventing the denaturation and aggregation of proteins. Heat shock genes are known to be expressed regardless of the surrounding chromatin. We explored a possible mechanism for establishing an independent expression domain of heat shock genes at the level of loop chromatin organization, with the involvement of neDNA (the nucleotide motifs (AAAGA)n and (GAGA)n, which serve as nuclear envelope attachment “markers”), short “complementary” polypurine/polypyrimidine tracts, and the insulators of the locus (scs/scs’-elements) with their proteins (BEAF32 and Zw5). This paper presents an analysis of the nucleotide sequences of the locus and an attempt to compare it with the available experimental data.
2. Materials and Methods
D. melanogaster chromosomal locus 87A7 contains the hsp70Aa and hsp70Ab genes. The analyzed region (3R: 11,904,163 .. 12,006,544) is 102,382 bp long (FlyBase, release 6.0).
The search for nucleotide sequences able to form three-stranded DNA structures was performed according to the following criteria: polypurine/polypyrimidine tracts should be potentially able to form T(AT), A(AT), C(GC), and G(GC) triplexes and be no less than 9 bp long. To simplify the simulation, the search for complementary polypurine/polypyrimidine tracts was carried out on the same DNA strand. We used NCBI resources and flybase.org, including the Drosophila Sequence Coordinates Converter (http://flybase.org/static_pages/downloads/COORD.html), and Vector NTI software. Juicebox software was also used to analyse Hi-C data.
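The tract search described above can be illustrated with a minimal Python sketch. It is not the procedure actually used (the analysis relied on Vector NTI and FlyBase tools); the function names are hypothetical, and the pairing rule in `can_pair` is an assumption, since the text does not spell out the complementarity criterion: here two tracts are treated as complementary if they share a Watson-Crick (reverse-complement) stretch of at least 9 nt.

```python
import re

MIN_LEN = 9  # minimum tract length stated in the Methods


def find_tracts(seq, bases, min_len=MIN_LEN):
    """Return (start, end, tract) tuples (1-based, inclusive) for runs built only of `bases`."""
    pattern = re.compile("[%s]{%d,}" % (bases, min_len))
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(seq.upper())]


def reverse_complement(seq):
    """Reverse complement of a DNA string written with A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def can_pair(purine_tract, pyrimidine_tract, min_len=MIN_LEN):
    """ASSUMED rule: the tracts share a reverse-complementary stretch of >= min_len nt."""
    rc = reverse_complement(pyrimidine_tract)
    return any(purine_tract[i:i + min_len] in rc
               for i in range(len(purine_tract) - min_len + 1))


# Toy sequence (not from the locus): a 10-nt purine tract and its reverse complement.
toy = "ACGT" + "AAGGAGAAAG" + "CATGCATG" + "CTTTCTCCTT" + "ACGT"
purine_tracts = find_tracts(toy, "AG")       # polypurine (R) tracts
pyrimidine_tracts = find_tracts(toy, "CT")   # polypyrimidine (Y) tracts
pairs = [(r, y) for r in purine_tracts for y in pyrimidine_tracts
         if can_pair(r[2], y[2])]
```

On a real sequence, `pairs` would list every R/Y combination that satisfies the assumed rule; the DNA between the two members of a pair is the candidate looped-out segment.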
3.1. General Characteristics of the Region
The analyzed segment of D. melanogaster chromosome 3 (3R: 11,904,163 .. 12,006,544) is 102,382 bp long and contains 17 genes (from the CG14731 gene to the Ect3 gene) (Figure 1). The analysis revealed 152 polypyrimidine (Y) tracts and 186 polypurine (R) tracts no less than 9 bp long in the region (the longest identified tract is 27 bp). Of these, 120 polypyrimidine tracts and 135 polypurine tracts are potentially able to interact with each other, looping out the DNA located between them. Most of the tracts can potentially interact with numerous complementary tracts located at various sites of the analyzed region of chromosome 3.
The scale and representation of genes are taken from FlyBase. Arrows show the direction of transcription. The localization of the functional elements identified in the hsp70 gene locus is shown with symbols: black squares, poly-A/T tracts; triangles, class II insulators (Su(Hw)); circles, class I insulators (CP190, BEAF32, CTCF); asterisks, (AAAGA)2 tracts; rhombi, scs/scs’-elements. The border between physical domains is marked with vertical dashed lines. Diagonal hatching represents the areas of frequent intra-domain interactions. Diagonal cells denote regions found within the loop for hsp70 genes or within the single loop for the whole locus, and black rectangles denote the bases of the respective loop structures.
The insulators of the hsp70 gene locus are localized as follows. The scs-element region overlaps with the promoter of the CG31211 gene (3R: 11,948,272 .. 11,950,063) and includes tracts Y82-Y84 and R108-R110, which have 31 complementary tracts in the analyzed region. The other insulator, the scs’-element, is located in the region 3R: 11,962,738 .. 11,963,832, overlapping with the promoters of the CG3281 and aurora genes. This region includes tracts R123 and R124, and there is only one polypyrimidine tract complementary to R124 (data not shown). FlyBase shows two types of recognition sites for insulator proteins at the locus. The first type comprises class II.mE01 insulators (containing recognition sites for the Su(Hw) protein): insulator_II_2339 and insulator_II_2340. The second type comprises class I.mE01 insulators (containing recognition sites for two of the three insulator proteins CP190, BEAF32, and CTCF): insulator_I_3071, insulator_I_3072, insulator_I_3073, and insulator_I_3074 (Table 1, Figure 1).
Figure 1. Schematic representation of D. melanogaster hsp70 genes locus.
Table 1. Sequence analysis results for the D. melanogaster hsp70 genes locus (3R: 11,904,163 .. 12,006,544; 102,382 bp).
Sequence analysis of the region revealed a large number of single GAGA and AAAGA tracts but few dimers: only 10 (GAGA)2 and 2 (AAAGA)2 tracts (Table 1, Figure 1). The chromosomal region flanked by the dimers of the conserved AAAGA tract is about 60 kb long (Table 1, Figure 1). It should be noted that the part of the locus bounded on both sides by (AAAGA)2 tracts lies within an area bounded on both sides by binding sites for the insulator protein Su(Hw). In turn, the binding sites for the insulator proteins CP190, BEAF32, and CTCF are located within the area bounded by the (AAAGA)2 tracts. Most of the (GAGA)2 tracts are also located in the same region.
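A search for tandem dimers such as (AAAGA)2 and (GAGA)2 can be sketched as follows. This is only an illustration under stated assumptions: the helper name `tandem_sites` is hypothetical, only the given strand is scanned, and overlapping hits are reported separately, so a longer run such as (GAGA)2.5 yields more than one start position. The counting rules behind Table 1 are not described in the text and may differ.

```python
import re


def tandem_sites(seq, motif, copies=2):
    """1-based start positions of `copies` tandem repeats of `motif` (overlaps included)."""
    # A zero-width lookahead lets the regex report overlapping occurrences.
    pattern = re.compile("(?=(?:%s){%d})" % (re.escape(motif), copies))
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]


# Toy sequence (not from the locus): one (AAAGA)2 site and a GAGAGAGAGA run.
toy = "TT" + "AAAGAAAAGA" + "TT" + "GAGAGAGAGA" + "CC"
aaaga_dimers = tandem_sites(toy, "AAAGA")
gaga_dimers = tandem_sites(toy, "GAGA")
```

Merging overlapping hits into a single site, or scanning the reverse complement as well, would be straightforward extensions if the intended counting rule requires them.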
The formation of DNA loop structures (bending of the DNA molecule) is facilitated by poly-A and poly-T tracts. Furthermore, DNA regions rich in poly-A and poly-T have been shown to interact selectively with nuclear envelopes in vitro and to be enriched in genomic regions that interact with the nuclear lamina. Sequence analysis of the locus revealed 31 poly-A/T tracts in the region (Table 1, Figure 1). Interestingly, the (AAAGA)2 sites are enriched with poly-A/T tracts (Figure 1).
Accordingly, a detailed analysis of the possibilities for loop structure formation was carried out in the region flanked on both sides by (AAAGA)2 tracts. In this region, 55 polypyrimidine tracts capable of interacting with 189 polypurine tracts were found.
3.2. The Whole Area between (AAAGA)2 Tracts Can Form a Single Loop Structure
The first question we addressed was whether the entire chromosomal region located between the two conserved AAAGA dimers can form a loop by means of three-stranded DNA structures. It turned out that this is potentially possible, with 11 variants of such loops. Four polypyrimidine and 14 polypurine tracts in the region of about 10.5 kb between the CG14731 and CG31211 genes (3R: 11,938,828 .. 11,949,416) have complementary polypurine (4) and polypyrimidine (7) tracts in the region of about 4 kb between the mfas and Ect3 genes (3R: 11,994,767 .. 11,998,654). The size of these potential loop structures is about 60 kb. Interestingly, the sites with the potential to form such looped structures are flanked by conserved (AAAGA)2 tracts and are rich in poly-A and poly-T (Table 1, Figure 1).
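The sizes quoted above can be checked with simple coordinate arithmetic. The sketch below is illustrative only: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive, and each anchor region is treated as a whole rather than as a set of individual tracts, so the loop size is obtained as a range.

```python
def interval_length(start, end):
    """Length in bp of a closed interval with 1-based inclusive coordinates."""
    return end - start + 1


def loop_span(anchor_5p, anchor_3p):
    """Smallest and largest number of bp looped out between two anchor regions."""
    shortest = anchor_3p[0] - anchor_5p[1] - 1  # innermost possible tract pair
    longest = anchor_3p[1] - anchor_5p[0] - 1   # outermost possible tract pair
    return shortest, longest


locus = (11904163, 12006544)     # whole analyzed region on 3R
anchor_5p = (11938828, 11949416)  # between CG14731 and CG31211
anchor_3p = (11994767, 11998654)  # between mfas and Ect3

locus_len = interval_length(*locus)            # 102382 bp
anchor_5p_len = interval_length(*anchor_5p)    # 10589 bp, i.e. about 10.5 kb
anchor_3p_len = interval_length(*anchor_3p)    # 3888 bp, i.e. about 4 kb
big_loop = loop_span(anchor_5p, anchor_3p)     # (45350, 59825) bp
```

Under these assumptions the looped-out DNA ranges from roughly 45 kb to roughly 60 kb depending on which tracts pair; the upper bound matches the figure of about 60 kb given in the text.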
3.3. hsp70 Genes Can Be Localized in Loop Structures of Small Size
The hsp70Aa and hsp70Ab genes can be organized in a smaller loop (about 30 kb). Two such looped structures can potentially be formed. In the first, 6 tracts located between the CG14731 and CG31211 genes (11,938,916 .. 11,948,717) are complementary to 4 tracts in the CG12213 gene (11,967,091 .. 11,967,605). In this case, the CG31211, hsp70Aa, hsp70Ab, CG3281, and aurora genes are found within the loop. In the second, 11 tracts located between the CG14731 and CG31211 genes (11,938,828 .. 11,948,218) are complementary to 5 tracts in the CG18347 gene (11,968,206 .. 11,971,622). In this case, the CG31211, hsp70Aa, hsp70Ab, CG3281, aurora, and CG12213 genes enter the loop. Since the tracts between the CG14731 and CG31211 genes largely overlap, only one of the two loops can potentially be realized at a time (Figure 1).
Interestingly, the polypurine/polypyrimidine tracts located in the 5’-region of the locus that form the single large loop also overlap with those that form the smaller loops of the analyzed chromosome segment (Figure 1). This means that the formation of a smaller loop (isolation of the hsp70Aa and hsp70Ab genes) is accompanied by the destruction of the larger loop and, vice versa, the formation of the “big” loop may be accompanied by the destruction of the “smaller” loop. Perhaps the destruction of the large loop appears as puffing at the cytological level. In that case, the scs- and scs’-elements are physically placed inside the area of decondensed chromatin. This consequence of our simulation agrees with some experimental data. First, upon heat shock the scs- and scs’-elements are found within the puff, not at the borders between condensed and decondensed chromatin. Second, conserved AAAGA tracts were found at the band/interband border on D. melanogaster polytene chromosomes using FISH analysis (unpublished data).
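The overlap of the 5’ anchor regions, which underlies the idea that the loops are alternative states, can be illustrated with an interval check. This is a sketch under stated assumptions: the helper name is hypothetical, coordinates are taken from the text as 1-based inclusive intervals, and whole anchor regions are compared, not the individual tracts inside them (which are not listed here).

```python
def overlap(a, b):
    """Return the shared closed interval of a and b, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


# 5' anchor regions between CG14731 and CG31211, as quoted in the text.
big_loop_5p = (11938828, 11949416)     # anchor of the ~60 kb loop
small_loop1_5p = (11938916, 11948717)  # anchor of the first ~30 kb loop
small_loop2_5p = (11938828, 11948218)  # anchor of the second ~30 kb loop

shared_small = overlap(small_loop1_5p, small_loop2_5p)
shared_big_1 = overlap(big_loop_5p, small_loop1_5p)
shared_big_2 = overlap(big_loop_5p, small_loop2_5p)
```

All three pairs of anchor regions share most of their length, which is consistent with the statement that forming one loop would occupy tracts needed for the others.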
3.4. A Possible Mechanism for hsp70 Gene Independent Expression Domain Formation
It can be assumed that the domain of active hsp70 genes is formed in two steps. In the first step, a loop is formed with the participation of polypurine/polypyrimidine tracts (two types of loops are possible). In the second step, the BEAF32 and Zw5 proteins of the scs/scs’-elements participate (Figure 2).
Step 1. When the first loop forms (see above), the CG31211, hsp70Aa, hsp70Ab, CG3281, and aurora genes are found within the loop. When the second loop forms (see above), the CG31211, hsp70Aa, hsp70Ab, CG3281, aurora, and CG12213 genes enter the loop. Since the tracts between the CG14731 and CG31211 genes largely overlap, only one of the two loops can potentially be realized at a time.
Step 2. When the loop structure forms in step 1, the scs- and scs’-elements come closer to each other (a small distance apart in the nuclear volume). Perhaps this improves the conditions for the interaction of the BEAF32 and Zw5 proteins, which has been shown in vitro and in vivo. A loop within the loop is formed (Figure 2). This results in inactivation of the CG31211, CG3281, and aurora genes, because the binding site of the BEAF32 protein, which specifically binds to the scs’-element, overlaps with the DREF transcription factor binding site in the promoters of the CG3281 and aurora genes. A similar reason probably underlies CG31211 gene inactivation, as the scs-element, which is specifically bound by the Zw5 protein, overlaps with the promoter of this gene. Thus, in the first loop formed by polypurine/polypyrimidine tracts, only the hsp70Aa and hsp70Ab genes can be in the active state. In the second loop formed by polypurine/polypyrimidine tracts, inactivation of the CG31211, CG3281, and aurora genes can be achieved by the same means. Isolation of the CG12213 gene from the hsp70Aa and hsp70Ab genes may be due to the fact that it is situated between the bases of the loops formed by (a) triplex DNA structures and (b) the BEAF32 and Zw5 proteins (Figure 2).
Figure 2. Schematic representation of the two variants of loop structures for hsp70 genes that can be formed with the participation of triple-stranded DNA.
3.5. Loops and TADs of 87A7 Locus
To analyze Drosophila TADs, we used the data of Sexton et al. It turned out that the 87A7 locus comprises two physical domains. The boundary between them is located upstream of the hsp70Aa and hsp70Ab genes and falls within both the “small” chromatin loops and the “big” loop formed by DNA-DNA interactions. The Hi-C (TAD) data show that areas of frequent intra-domain interactions are localized within the loop of the hsp70Aa and hsp70Ab genes (regardless of the loop formation variant), at the class I.mE01 insulators and the scs’ region. Another area characterized by an increased frequency of contacts is localized in the big loop, whose formation involves the conserved (AAAGA)n tracts.
4.1. Chromosomal Domain of 87A7 Locus
The results of our study show that short polypurine/polypyrimidine tracts may be involved in the loop organization of the D. melanogaster chromosomal locus of hsp70 genes. The (AAAGA)2 tracts appeared to be the most significant nucleotide sequences determining the domain at the 87A7 locus. These tandem tracts are extremely highly conserved in evolution and are localized mainly in the intergenic regions of the genomes of various organisms (including D. melanogaster) and at band/interband borders on Drosophila polytene chromosomes (unpublished data).
The entire locus region located between the two (AAAGA)2 tracts can be arranged into loops with the help of complementary polypurine/polypyrimidine tracts (Figure 1, Figure 2). Moreover, this area has the potential to form a large number of smaller loops with the participation of complementary polypurine/polypyrimidine tracts. Given its size (approximately 60 kb) and the large number of potential intra-domain interactions (135 of 186 polypurine tracts (72.6%) have complementary polypyrimidine tracts and, in turn, 120 of 152 polypyrimidine tracts (78.9%) have complementary polypurine tracts), such a domain may correspond to a TAD of Drosophila chromosomes. We assume that a TAD may correspond to the area of an interphase chromosome located between two sites that anchor the chromosome to the nuclear envelope (in this particular case, between two evolutionarily conserved AAAGA dimers).
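The two percentages quoted above follow directly from the tract counts. A minimal check, with a hypothetical helper name and rounding to one decimal place assumed:

```python
def percent(part, total):
    """Share of `part` in `total`, as a percentage rounded to one decimal place."""
    return round(100.0 * part / total, 1)


purine_with_partner = percent(135, 186)      # polypurine tracts with a complementary tract
pyrimidine_with_partner = percent(120, 152)  # polypyrimidine tracts with a complementary tract
```

Both values agree with the figures given in the text (72.6% and 78.9%).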
A comparison of our data (Table 1 and Figure 1) with the data of Sexton et al. shows that there are two physical domains in the investigated locus, and the boundary between them is localized within the region of the “small” loops (the “functional” domain of hsp70 genes). Both physical domains comprising locus 87A7 have “inactive domain” status. This can be explained by the fact that in the Drosophila embryonic cells used for the experiment the heat shock genes are inactive, i.e. in a compact state, which may limit their interaction with neighboring regions. In addition, the compact state of this region can inhibit interactions between the flanking areas. It should be noted that the chromatin architecture of a specific locus, as well as its functional state, can be tissue- and stage-specific; this specificity should therefore be taken into account when interpreting the results.
According to the Hi-C data, two of the three areas of frequent intra-domain interactions (TADs) at the 87A7 locus are located within the region bounded on both sides by conserved (AAAGA)n tracts. More frequent interactions occur between sites that may be involved in forming both the “large” and the “small” chromatin loops in the locus through direct DNA-DNA interactions. That is, an increased frequency of contacts is observed in the areas of chromatin organized in loops. These data are also consistent with the idea that a TAD may correspond to the interphase chromosome region located between two sites of its “anchoring” to the nuclear envelope.
4.2. Functional Domain for hsp70 Genes
A large number of differentially expressed genes are localized between the conserved (AAAGA)2 tracts of the 87A7 locus (Figure 1). Short polypurine/polypyrimidine tracts, as well as the insulators (scs/scs’-elements) and their proteins (BEAF32, Zw5), may be involved in forming the functional domain of hsp70 genes, which is about 30 kb long. Perhaps the “big” loop (60 kb) and the “small” loop (30 kb) are alternative states of chromatin in the 87A7 locus, because the polypurine/polypyrimidine tracts involved in the formation of both loops overlap with each other in the 5’-region of the locus.
It cannot be excluded that insulators form functional domains most effectively within these chromosomal domains, and that their “work” outside these domains may be impeded. This assumption is supported by our experimental data obtained with a double-gene transgenic system (Drosophila yellow and white genes). We showed that neDNA fragments (EnvM4) flanking the two reporter genes are able to protect them from PEV in the host chromosomes of D. melanogaster. The maximum protective effect was observed in the presence of the Wari insulator located in the 3’-region of the white gene. When neDNA fragments flank only one of the reporter genes (white + Wari), this gene is protected against PEV to a greater extent than the other gene (yellow), which is not flanked by neDNA fragments.
The results of our analysis of the loop chromatin organization of the 87A7 locus can logically explain some experimental facts. Firstly, scs/scs’-elements are not able to protect a transgene from PEV when integrated into heterochromatin. This may be because transgenic systems are artificial and may lack the set of chromosomal elements required to form an independent expression domain for the transgene. In addition, integration sites are not native for the transgene and may not contain, for example, the complementary polypurine/polypyrimidine tracts necessary for the formation of loop structures. Thus, according to our simulation of the loop organization of chromatin at the 87A7 locus, three kinds of elements may be involved in forming the independent expression domain of hsp70 genes: chromosomal attachment sites to the nuclear envelope, short complementary polypurine/polypyrimidine tracts, and the scs/scs’-insulators with their proteins (Zw5, BEAF32). In the case of the experimental transgenic system (yellow and white genes), only two elements, the sites of chromosome attachment to the nuclear envelope and the Wari insulator, are involved in forming the independent expression domain of the transgenes.
Secondly, upon heat shock the scs/scs’-elements are localized not at the puff boundaries but inside the puff. The results of our analysis indicate that the “small” loop (hsp70 genes) is formed within the expanded “big” loop (possibly the puff) upon activation of heat shock genes. This assumption is supported by the fact that the conserved (AAAGA)n tract was detected by FISH at band/interband borders on polytene chromosomes of D. melanogaster (unpublished data). The barrier function of scs/scs’-elements is also questionable because these elements overlap with the promoters of neighboring genes: the scs-element includes the CG31211 gene promoter, and the scs’-element includes the promoters of the CG3281 and aurora genes (FlyBase database). In particular, the BEAF32 (insulator protein) and DREF (transcriptional activator) proteins have many binding sites on Drosophila chromosomes, and in approximately 50% of cases these sites overlap. Their competition for DNA binding sites has been shown as well. Perhaps the overlap of the sites of an insulator protein and a transcriptional activator protein of the adjacent genes is one of the functional mechanisms of hsp70 gene isolation. The mechanism might be based on a relationship between the insulator protein and the transcription activator protein similar to a “repressor-activator” relationship.
Using the D. melanogaster chromosomal locus 87A7 as a model, we have demonstrated that various elements could be involved in the structural and functional organization of chromosomes: short polypurine/polypyrimidine tracts, insulators and their proteins, and regions that attach chromosomes to the nuclear envelope, i.e. not only DNA-protein but also DNA-DNA interactions. The combinatorics of the interactions of these elements can determine alternative states of a chromosomal locus and of the genes belonging to it. Moreover, the same structural and functional status of a chromosomal locus can be achieved by several variants of DNA-DNA interactions, which may underlie self-regulation of the locus and, in a wider sense, the “stability” of the genetic system.
This work was supported by a grant from the Subprogramme “Gene pools of wildlife and its preservation” of the Presidium RAS Programme for Basic Research “Biodiversity of natural systems”.
List of Abbreviations
TADs: Topologically Associated Domains; neDNA: chromosomal DNA fragments that attach chromosomes to the nuclear envelope; LINE: Long Interspersed Repeat Sequences; SINE: Short Interspersed Repeat Sequences; LADs: Lamina Associated Domains; scs/scs’: specialized chromatin structures; FISH: Fluorescence in situ Hybridization.