AAR  Vol.10 No.4 , July 2021
Insilico Identification of Genes and Molecular Pathways during Aging in Drosophila Brain
Abstract: How the regulation of gene expression in the brain changes during aging is still poorly understood. Differential gene expression is a key means of identifying important landmarks within the brain transcriptome for the study of neuronal aging. Transcriptomic studies are now widely used to compare diseased and normal states, because next generation sequencing (NGS) captures biological context across the entire genome. Comparing gene expression in old flies with that in young flies provides a signature of gene expression and regulation during aging. In this study, we used raw NGS data from the heads of young and old flies, obtained from the SRA database of NCBI, to characterize the regulation of gene expression during normal aging in the Drosophila model. We identified 350 genes with significant differential expression between young and old flies at an FDR of 0.01%. The pathways associated with these genes include autophagy, cell death and apoptosis, proteolysis, oxidative stress, decline of grey and white matter and of neurotransmitter levels, mitochondrial dysfunction, the electron transport chain, sugar degradation pathways, activation of transcription factors involved in epigenetic changes, negative and positive regulators of WNT signaling, and G protein coupled receptors. All of these factors contribute to neurodegeneration and possibly to dementia in normal aging. To find the specific genes and regulators that are differentially expressed in normal aging, we investigated the brain transcriptome of aging flies compared with young flies, which offers a repertoire of genes, regulators and factors involved in the network of neurodegeneration and helps to establish a direct correlation between aging and dementia. We also identified the pathways involved in aging and the corresponding gene regulation in the brains of aging flies. We found that some pathways are shared, with genes and regulators that are strongly differentially regulated in both aging and dementia.

1. Introduction

Aging is accompanied by cognitive decline and is the basic risk factor for neurodegenerative disorders. Many toxic proteins are involved in brain aging and give rise to disorders such as Alzheimer’s disease and Parkinson’s disease, and to memory loss and impaired synaptic plasticity. Toxic protein aggregation is an influential factor in neurodegeneration [1]. Aging may thus lead to the progression of age-related disorders such as Alzheimer’s disease (AD). Amyloid beta plaques and neurofibrillary tangles are major hallmarks of AD; they increase with the accumulation of these toxic proteins, and pathways such as oxidative stress and low antioxidant levels are associated with the disease [2]. A recent publication showed that AD is not only neurodegenerative but also systemic [3], which points to a repertoire of genes involved in neurodegeneration that are expressed not only in the brain but also in other tissues. Therefore, before distinctive features such as tangles and plaques appear, it may be possible to identify the hallmark regulators responsible for this accumulation.

Global transcriptome analysis provides important information about brain aging in various animal models, including Drosophila, which is relatively easy to handle because of its small brain size and its amenability to genetic manipulation. Fly brain studies help to untangle neurological disorders that occur with aging, such as Alzheimer’s [4] and Parkinson’s [5], and to understand the roles of the pathways and regulators that interact in aging and neurodegeneration. Several groups have demonstrated modulation of gene expression with aging in mammals [6] [7] [8] across many tissue types, including brain, but they used microarray data, which are noisier and yield more false positives than RNA sequencing. Single-cell RNA sequencing has also been used to study brain aging, but without the gene expression changes and pathway information specific to aging [9]. Recently, a whole-transcriptome profile of the brains of aging flies was published that covers both sexes and several ages, with coexpression modules of genes affecting learning and memory [10], but it did not show how neurodegenerative disorders correlate with aging.

Here, we present a differential gene expression study of whole fly brain in both sexes, young versus old, together with the pathways and genes that are common to aging and dementia.

2. Materials and Methods

In this study we used eight paired-end RNA-seq samples of fly heads. Four samples are from the brain tissue of young flies aged 5 days (1 male and 1 female) and 10 days (1 male and 1 female). The other four samples are from the brain tissue of old flies aged 20 days (1 male and 1 female) and 30 days (1 male and 1 female). The data were taken from the Sequence Read Archive (SRA) database of NCBI under accession no. PRJNA418957. The samples were divided into young and old groups, with one male and one female at each age. Raw RNA-seq reads contain noise from experimental and sequencing errors, including adapter sequences, which need to be removed or trimmed. The reads were therefore filtered with the cutadapt program, which removes adapter sequences and sequencing errors from the reads.
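The idea behind adapter trimming can be illustrated with a minimal Python sketch. It is not cutadapt's algorithm (cutadapt tolerates mismatches and also handles quality trimming); it only does exact matching of a 3' adapter, either complete or partially hanging off the end of the read. The adapter sequence and the `min_overlap` parameter below are assumptions chosen for illustration, since the adapters used for this dataset are not stated.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read using exact matching only.

    Illustrative sketch: real trimmers also allow mismatches and indels.
    """
    # Case 1: the complete adapter occurs in the read; cut from its start.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: only a prefix of the adapter hangs off the 3' end of the read.
    # Try the longest possible overlap first, down to min_overlap bases.
    for k in range(min(len(adapter), len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter found: return the read unchanged.
    return read


# Hypothetical adapter sequence, used here only as an example.
ADAPTER = "AGATCGGAAGAGC"

if __name__ == "__main__":
    for read in ("ACGTACGTAGATCGGAAGAGC", "ACGTACGTAGATC", "ACGTACGT"):
        print(read, "->", trim_adapter(read, ADAPTER))
```

Requiring a minimum overlap avoids clipping one or two bases that match the start of the adapter purely by chance.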

2.1. Mapping and Alignment of Reads

The filtered reads are aligned and mapped against the whole Drosophila genome to identify the regions that are expressed in young and old fly heads. Mapping is done with bowtie 2.0, which produces BAM files containing a header, the mapping quality and the sequenced reads. Sixteen sets of aligned reads are obtained, covering forward and reverse reads, together with a set of reads that did not align.
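As a rough illustration of how aligned and unaligned reads can be told apart in the alignment output, the sketch below summarises SAM-formatted text lines (the uncompressed form of BAM) using the FLAG field. In the SAM format, bit 0x4 marks an unmapped read, and bits 0x100 and 0x800 mark secondary and supplementary records. The function name and the example records are hypothetical and are not taken from the study's data.

```python
def alignment_summary(sam_lines):
    """Count mapped and unmapped primary records in SAM-formatted lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        # Skip empty lines and header lines, which start with '@'.
        if not line or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # second column of a SAM record is FLAG
        # Ignore secondary (0x100) and supplementary (0x800) records
        # so that each read is counted once.
        if flag & 0x100 or flag & 0x800:
            continue
        if flag & 0x4:  # 0x4 = segment unmapped
            unmapped += 1
        else:
            mapped += 1
    total = mapped + unmapped
    rate = mapped / total if total else 0.0
    return {"mapped": mapped, "unmapped": unmapped, "rate": rate}


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    example = [
        "@HD\tVN:1.6",
        "r1\t99\tX\t100\t42\t50M\t=\t300\t250\t*\t*",
        "r1\t147\tX\t300\t42\t50M\t=\t100\t-250\t*\t*",
        "r2\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
        "r3\t256\tX\t500\t0\t50M\t*\t0\t0\t*\t*",
    ]
    print(alignment_summary(example))
```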

2.2. Counting the Reads to Annotate against the Respective Genes

The aligned reads are taken from the BAM files generated by bowtie and counted against the protein-coding genes to annotate them. The featureCounts program is used to obtain the counts: reads that overlap a gene are counted and assigned to it. The number of overlapping reads also depends on the sequencing depth and quality. The resulting count file is a simple tabular file containing gene IDs and the number of reads assigned to each.
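The counting step can be pictured with a small Python sketch that assigns reads to genes by interval overlap. This is a simplified illustration, not the featureCounts implementation: it ignores strand, exons and paired-end fragments, and it assumes that a read overlapping more than one gene is left unassigned. The gene IDs and coordinates are made up for the example.

```python
from collections import Counter


def count_reads(genes, reads):
    """Count reads per gene by coordinate overlap.

    genes: list of (gene_id, chrom, start, end), inclusive coordinates.
    reads: list of (chrom, start, end), inclusive coordinates.
    Reads overlapping no gene, or more than one gene, are not counted.
    """
    counts = Counter({gene_id: 0 for gene_id, _, _, _ in genes})
    for chrom, r_start, r_end in reads:
        hits = [
            gene_id
            for gene_id, g_chrom, g_start, g_end in genes
            if g_chrom == chrom and r_start <= g_end and g_start <= r_end
        ]
        if len(hits) == 1:  # unambiguous assignment only
            counts[hits[0]] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical annotation and reads, for illustration only.
    genes = [("geneA", "X", 100, 200), ("geneB", "X", 180, 300), ("geneC", "2L", 50, 150)]
    reads = [("X", 90, 110), ("X", 120, 150), ("X", 190, 195), ("X", 250, 260), ("2L", 140, 160)]
    for gene_id, n in count_reads(genes, reads).items():
        print(f"{gene_id}\t{n}")
```

The printed output has the same shape as the count file described above: one gene ID per row with its read count.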

2.3. Differential Gene Expression Analysis and Gene Annotation

The count file is used as input for differential expression analysis [11] [12]: the reads counted against each gene are compared between young and old flies to determine how genes are regulated in the brains of old flies relative to young flies. The limma-voom package [13] [14] is used to find differentially expressed genes. The tool models the mean-variance relationship of the samples and normalises the counts of each sample. The resulting statistics measure the expression of each gene in each sample and can be filtered further to obtain the most strongly differentially expressed genes [15] [16] [17]. The differentially expressed genes are then submitted to gene ontology analysis to obtain the biological function of each gene and the pathways in which it is involved. Pathways are also mapped to identify the connections between genes, yielding a gene regulatory network of the highly expressed genes.

3. Results

3.1. Identification of Genes Mapped with Young and Old Fly Heads of Male and Female

The sequences obtained from the SRA dataset for male and female flies of different ages are raw reads containing noise and sequencing errors, so they must pass a quality check to give reliable results. The trimmomatic and cutadapt [18] programs were used to remove noise and adapter sequences. Around 60% of reads were unique in the reverse strand and 48.5% in the forward strand of the paired-end RNA-seq data from the heads of aging flies. The filtered data were mapped with the bowtie 2.0 program against the reference genome of Drosophila melanogaster to identify the genes affected in old flies in comparison with young flies. The mapping statistics (Figure 1) indicate that 40% - 60% of reads aligned uniquely and around 20% - 40% of reads did not align to the Drosophila genome.

The output of mapping is a BAM file, a compressed format containing the mapping information. It was inspected and visualised with IGV (Integrative Genomics Viewer) [19] [20], which shows the mapped reads at each position; connecting lines between aligned reads indicate reads that span introns (Figure 2).

Figure 1. Quality of the reads generated from RNA-seq of fly heads after applying trimmomatic and cutadapt. (a) Percentage of overrepresented sequences in the forward and reverse strands. (b) N content remaining after trimmomatic was applied to the reads. (c) GC content of each sequence in the forward and reverse strands. (d) Duplication level of the sequences in each strand. (e) Length distribution of the reads. (f) Sequence counts in the forward and reverse strands.

3.2. Counting Number of Reads per Annotated Genes

Many programs are available for counting the number of reads corresponding to each annotated gene [21] [22] [23]. We used the featureCounts program [15], which takes the mapped BAM files and the strandedness as input; because our RNA-seq data are unstranded, we obtained counts in both the forward and reverse directions. We also supplied a GTF file to assign the reads to annotated genes. Between 20% and 30% of reads mapped to genes; the proportion is low because a single tissue type was compared against the whole fly genome (Figure 3).

Figure 2. Mapped regions on chromosome X for all 8 samples. The regions are mapped differently in each sample, indicating differential expression of these regions.

Figure 3. Basic statistics of the reads mapped to genes. The X-axis shows the percentage of reads assigned to genes and the Y-axis shows the sample number.

3.3. Identification of Differentially Expressed Features

To identify differential gene expression [24] in old flies, the full dataset of 8 samples with forward and reverse strands was analysed: 4 samples of young flies (5 and 10 days old) and 4 samples of old flies (30 and 40 days old), giving 8 files of forward and reverse counts for each gene. Some samples have more reads than others, which reflects a higher sequencing depth. The number of reads mapped to a gene therefore depends on sequencing depth and on gene length: the longer the gene, the more reads are found against it. To deal with this, the data were normalised, and limma-voom [13] [14] was used to run the differential expression analysis. An R script with the limma package normalises the count table obtained from the different samples. This step equalises the relative abundance of each gene across RNA samples, because in some cases a small number of genes are highly expressed in one sample compared with another, which may cause false positives. We included one factor, age, with two levels: young flies and old flies. The resulting data describe the differential regulation of genes in old flies compared with young flies. We found that 141 genes are upregulated and 262 genes are downregulated in old flies relative to young flies, out of approximately 17,555 genes. The summary table contains the gene identifiers, the mean of normalised counts averaged over all samples, the log2 fold change, the standard error estimate of the log2 fold change, the Wald statistic, the p value, and the p value adjusted for multiple testing, which controls the FDR. A graphical summary of the results includes the mean-variance plot, MDS and box plots, and a volcano plot (Figures 4-6), which show the variance and similarity in gene expression across samples.
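The effect of depth normalisation can be seen in a short Python sketch of the log2 counts-per-million transform, using the offsets of 0.5 on the count and 1 on the library size described for voom [13]. This is only the first step of voom: the sketch uses raw library sizes as an assumption, and it omits scaling factors, the mean-variance trend and the precision weights. The sample and gene names are hypothetical.

```python
import math


def log_cpm(counts):
    """Convert raw counts to log2 counts per million, per sample.

    counts: dict mapping sample name -> dict mapping gene ID -> raw count.
    The 0.5 and 1 offsets avoid taking the logarithm of zero.
    """
    result = {}
    for sample, gene_counts in counts.items():
        lib_size = sum(gene_counts.values())  # raw library size (assumption)
        result[sample] = {
            gene: math.log2((count + 0.5) / (lib_size + 1.0) * 1e6)
            for gene, count in gene_counts.items()
        }
    return result


if __name__ == "__main__":
    # Two hypothetical samples with the same composition but a tenfold
    # difference in sequencing depth.
    raw = {
        "young_1": {"geneA": 100, "geneB": 900},
        "old_1": {"geneA": 1000, "geneB": 9000},
    }
    for sample, values in log_cpm(raw).items():
        print(sample, {g: round(v, 3) for g, v in values.items()})
```

After the transform, the two samples give nearly identical values for each gene even though their raw counts differ tenfold, which is the purpose of equalising relative abundance before testing.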

3.4. Extraction and Annotation of Differentially Expressed Genes

These data were filtered to extract and annotate the differentially expressed genes with an absolute fold change greater than 2. First, genes with an adjusted p value below 0.05 were selected, which gave 430 significantly differentially expressed genes. This set was then narrowed by keeping genes with an absolute log2 fold change greater than 1, leaving 350 genes that are differentially expressed with a significant adjusted p value. Z scores were calculated from the mean-variance table, and the genes were annotated using DAVID. The top 32 genes by Z score are shown in the heatmap (Figure 6).
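The two-step filter described here can be sketched in Python as follows. The sketch assumes that the multiple-testing adjustment is the Benjamini-Hochberg procedure, a common way of controlling the FDR; the text does not name the method. The function names and the example rows are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value to the smallest so that the adjusted
    # values never decrease with rank (monotonicity).
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_de_genes(rows, alpha=0.05, min_abs_lfc=1.0):
    """Split significant genes into up- and downregulated lists.

    rows: list of (gene_id, log2_fold_change, p_value).
    A gene is kept if adjusted p < alpha and |log2FC| > min_abs_lfc,
    i.e. an absolute fold change greater than 2 for the default.
    """
    adjusted = benjamini_hochberg([p for _, _, p in rows])
    up, down = [], []
    for (gene, lfc, _), q in zip(rows, adjusted):
        if q < alpha and abs(lfc) > min_abs_lfc:
            (up if lfc > 0 else down).append(gene)
    return up, down


if __name__ == "__main__":
    # Hypothetical results table for illustration only.
    table = [("g1", 2.0, 0.005), ("g2", -1.5, 0.01), ("g3", 0.5, 0.03), ("g4", 3.0, 0.9)]
    up, down = select_de_genes(table)
    print("up:", up, "down:", down)
```

Applying the significance filter before the fold-change filter, as in the text, gives the same final set as applying both at once; the order only matters for reporting the intermediate count.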

3.5. Gene Ontology Analysis

The genes are further analysed to identify their presence in various biological pathways and to understand the process of aging in more depth. GO analysis reduces complexity by linking each gene directly to its biological and molecular function. The goseq program is used to identify the pathways related to these genes. The Wallenius method is used to rank the categories and to calculate, for each GO term, the over-represented p value, the under-represented p value, the number of differentially expressed genes in the category, the number of genes in the category, and the details of the term. A graph of the top 10 over-represented GO terms is given in Figure 7. It shows genes involved in stress response, apoptotic process, histone modification, covalent chromatin modification, protein modification by small protein and protein ubiquitination, and regulation of immune and defence responses. The KEGG pathways, plotted using pathview [25] for the most strongly differentially expressed genes, show the genes and their roles in pathways involved in aging and dementia (Figure 8).

Figure 4. (a) Fold change values: red dots signify genes with positive values, i.e. upregulated genes, and blue dots signify genes with negative log fold change values, i.e. downregulated genes. (b) Mean-variance model showing the genes that are outliers and the genes that fall within the mean-variance trend. (c) Log fold change values of the genes: negative values are downregulated genes and positive values are upregulated genes.

Figure 5. (a) Box plots of the normalised and unnormalised counts; log counts per million are used to normalise the counts of the different samples. (b) Strip charts of the genes that are differentially expressed according to logFC and adjusted p value.

Figure 6. Heatmap of the most differentially expressed genes. Rows represent genes and columns represent samples. Blue shows negative scores, i.e. downregulated genes, and red shows upregulated genes in the head samples of old and young flies.

Figure 7. Top over-represented GO terms for the genes that are differentially expressed in old flies compared with young flies.
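The over-represented p value reported for each GO term in Section 3.5 can be approximated with a plain hypergeometric test, sketched below in Python. This is a simplification and an assumption on our part: the Wallenius method used in the analysis additionally corrects for gene-length bias, which this sketch ignores. The argument names and the example numbers are hypothetical.

```python
from math import comb


def overrep_pvalue(n_genes, n_in_category, n_de, n_de_in_category):
    """Hypergeometric upper-tail probability P(X >= n_de_in_category).

    n_genes:          total number of genes tested
    n_in_category:    genes annotated to the GO term
    n_de:             differentially expressed (DE) genes
    n_de_in_category: DE genes annotated to the GO term
    """
    total = comb(n_genes, n_de)  # ways to choose the DE set
    upper = min(n_in_category, n_de)
    tail = sum(
        comb(n_in_category, k) * comb(n_genes - n_in_category, n_de - k)
        for k in range(n_de_in_category, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy example: 10 genes, 4 in the category, 3 DE genes, 2 of them
    # in the category.
    print(overrep_pvalue(10, 4, 3, 2))
```

A small p value indicates that the category contains more differentially expressed genes than expected if the DE genes were drawn at random from all tested genes.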

Our results link the aging brain with dementia and can be used to develop therapeutics against neurodegeneration and brain aging. The data show that the two run parallel to each other and that the factors affecting brain aging contribute strongly to neurodegeneration and dementia.

4. Conclusion

Figure 8. Metabolic and biological pathways containing genes that are differentially expressed in aging flies. A red star in a pathway denotes a gene found to be differentially expressed. (a) Oxidative phosphorylation pathway and metabolic pathway. (b) Glycan degradation pathway. (c) Glycerophospholipid metabolism. (d) Neuroactive ligand-receptor interaction.

Mitochondrial DNA damage and oxidative stress run parallel during aging and lead to dementia, as the production of reactive oxygen species surpasses the cellular antioxidant defence system, which includes antioxidant enzymes; this decline can be readily seen in the AD brain [8]. There is also evidence that AD affects both the brain and the periphery, where many causative pathways interact, and that many factors increase the risk of AD, such as diabetes, obesity, hypertension, stroke and other cardiovascular risk factors [26]. Neurodegeneration and aging are inseparable, but which drives the other is still debated, because brain function changes and the aggregation of toxic proteins increases with aging. At the same time, apoptosis occurs, leading to a reduction in brain volume [27]. In addition, mitochondrial energy production decreases as oxidative stress increases, and the resulting decline in mitochondrial function is a cause of aging.

A study of brain transcriptome changes in aging flies was recently published [9], showing the pathways and genes involved in aging. In our study, we found that 262 genes are downregulated in old flies; these are associated with many pathways required for the normal functioning of the brain and body and for promoting neuronal growth and neurogenesis in the aging brain. Recent work has reported a progressive increase, with aging and in AD, in the expression of genes involved in nucleic acid oxidation in mitochondrial DNA, which leads to oxidative stress [28] [29] [30] and contributes strongly to aging and dementia. Here too, gene ontology analysis of the genes we identified (CG10211, CG13280, CG16761 and ND6 in Drosophila) shows high expression of nucleic acid oxidation genes present in mitochondrial DNA. Dementia is a common consequence of diabetes, as insulin levels may affect neurotransmission, cell survival and amyloid trafficking [31] [32]. The differential expression data in our study also include genes involved in insulin modulation, such as CG7985 and Glucosidase 2 alpha subunit. Inflammatory proteins in plasma are also associated with the severity of dementia [33]. Lectin-galC1, Drosomycin like gene, which provides defence against fungal infection, is also found to be downregulated. Okouchi et al. [29] showed that changes in the regulation of apoptosis lead to many neurodegenerative disorders, such as Alzheimer’s, Parkinson’s and Huntington’s (HD) diseases, amyotrophic lateral sclerosis (ALS), spinal muscular atrophy (SMA), and diabetic encephalopathy. Our results show that Drep3, DNA pol alpha and psn (presenilin) are differentially regulated with aging in flies. Aberrant regulation of signal transduction pathways, epigenetic regulation, the immune system, the vascular system and angiogenesis are major factors affecting neurogenesis, and we found various genes, receptors and hormones related to these pathways that change in expression in adult flies [34].
Many genes, peptides and receptors mapped by the reads of adult (aging) and young flies show changes in expression related to neurogenesis and neurotransmitter decline, including chemosensory protein, calcineurin, Pyrokinin 2 receptor, mucin, Adipokinetic hormone, spatzle, suppressor of zeste, pleiohomeotic and mthl. An in-depth study of these genes, receptors, peptides and regulators will therefore help to identify specific genes that could be controlled during aging, through medical therapeutics, to keep the brain functioning normally.

Cite this paper: Parakh, A. , Begde, D. and Dhingra, N. (2021) Insilico Identification of Genes and Molecular Pathways during Aging in Drosophila Brain. Advances in Aging Research, 10, 78-96. doi: 10.4236/aar.2021.104005.

[1]   Yankner, B.A., Lu, T. and Loerch, P. (2008) The Aging Brain. Annual Review of Pathology: Mechanisms of Disease, 3, 41-66.

[2]   Mecocci, P., Boccardi, V., Cecchetti, R., et al. (2018) A Long Journey into Aging, Brain Aging, and Alzheimer's Disease Following the Oxidative Stress Tracks. Journal of Alzheimer’s Disease, 62, 1319-1335.

[3]   Parker, W.D., Filley, C.M. and Parks, J.K. (1990) Cytochrome Oxidase Deficiency in Alzheimer’s Disease. Neurology, 40, 1302-1303.

[4]   Moloney, A., Sattelle, D.B., Lomas, D.A. and Crowther, D.C. (2010) Alzheimer’s Disease: Insights from Drosophila melanogaster Models. Trends in Biochemical Sciences, 35, 228-235.

[5]   Haywood, A.F. and Staveley, B.E. (2006) Mutant Alpha-Synuclein-Induced Degeneration Is Reduced by Parkin in a Fly Model of Parkinson’s Disease. Genome, 49, 505-510.

[6]   Zhan, M., Yamaza, H., Sun, Y., Sinclair, J., Li, H. and Zou, S. (2007) Temporal and Spatial Transcriptional Profiles of Aging in Drosophila melanogaster. Genome Research, 17, 1236-1243.

[7]   Girardot, F., Lasbleiz, C., Monnier, V. and Tricoire, H. (2006) Specific Age-Related Signatures in Drosophila Body Parts Transcriptome. BMC Genomics, 7, 69.

[8]   Kim, T.S., Pae, C.U., Yoon, S.J., Jang, W.Y., Lee, N.J., Kim, J., Lee, S.J., Lee, C., Paik, I.H. and Lee, C.U. (2006) Decreased Plasma Antioxidants in Patients with Alzheimer’s Disease. International Journal of Geriatric Psychiatry, 21, 344-348.

[9]   Davie, K., Janssens, J., Koldere, D., et al. (2018) A Single-Cell Transcriptome Atlas of the Aging Drosophila Brain. Cell, 174, 982-998.e20.

[10]   Pacifico, R., MacMullen, C.M., Walkinshaw, E., Zhang, X. and Davis, R.L. (2018) Brain Transcriptome Changes in the Aging Drosophila melanogaster Accompany Olfactory Memory Performance Deficits. PLoS ONE, 13, e0209405.

[11]   Robinson, M.D., McCarthy, D.J. and Smyth, G.K. (2010) edgeR: A Bioconductor Package for Differential Expression Analysis of Digital Gene Expression Data. Bioinformatics, 26, 139-140.

[12]   Robinson, M.D. and Smyth, G.K. (2007) Moderated Statistical Tests for Assessing Differences in Tag Abundance. Bioinformatics, 23, 2881-2887.

[13]   Law, C.W., Chen, Y., Shi, W., et al. (2014) Voom: Precision Weights Unlock Linear Model Analysis Tools for RNA-seq Read Counts. Genome Biology, 15, R29.

[14]   Liu, R.J., Holik, A.Z., Su, S.A., Jansz, N., Chen, K.L., Leong, H.S., Blewitt, M.E., Asselin-Labat, M.-L., Smyth, G.K. and Ritchie, M.E. (2015) Why Weight? Modelling Sample and Observational Level Variability Improves Power in RNA-seq Analyses. Nucleic Acids Research, 43, e97.

[15]   Liao, Y., Smyth, G.K. and Shi, W. (2013) featureCounts: An Efficient General Purpose Program for Assigning Sequence Reads to Genomic Features. Bioinformatics, 30, 923-930.

[16]   Anders, S., Pyl, P.T. and Huber, W. (2015) HTSeq—A Python Framework to Work with High-Throughput Sequencing Data. Bioinformatics, 31, 166-169.

[17]   Kim, D., Langmead, B. and Salzberg, S.L. (2015) HISAT: A Fast Spliced Aligner with Low Memory Requirements. Nature Methods, 12, 357-360.

[18]   Martin, M. (2011) Cutadapt Removes Adapter Sequences from High-Throughput Sequencing Reads. EMBnet Journal, 17, 10-12.

[19]   Robinson, J.T., Thorvaldsdóttir, H., Winckler, W., Guttman, M., Lander, E.S., et al. (2011) Integrative Genomics Viewer. Nature Biotechnology, 29, 24-26.

[20]   Wang, L., Wang, S. and Li, W. (2012) RSeQC: Quality Control of RNA-seq Experiments. Bioinformatics, 28, 2184-2185.

[21]   Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., et al. (2013) TopHat2: Accurate Alignment of Transcriptomes in the Presence of Insertions, Deletions and Gene Fusions. Genome Biology, 14, R36.

[22]   Loerch, P.M., Lu, T., Dakin, K.A., et al. (2008) Evolution of the Aging Brain Transcriptome and Synaptic Regulation. PLoS ONE, 3, e3329.

[23]   Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., et al. (2013) STAR: Ultrafast Universal RNA-seq Aligner. Bioinformatics, 29, 15-21.

[24]   Love, M.I., Huber, W. and Anders, S. (2014) Moderated Estimation of Fold Change and Dispersion for RNA-seq Data with DESeq2. Genome Biology, 15, Article No. 550.

[25]   Luo, W. and Brouwer, C. (2013) Pathview: An R/Bioconductor Package for Pathway-Based Data Integration and Visualization. Bioinformatics, 29, 1830-1831.

[26]   Nelson, M.E., Rejeski, W.J., Blair, S.N., et al. (2007) Physical Activity and Public Health in Older Adults: Recommendation from the American College of Sports Medicine and the American Heart Association. Medicine & Science in Sports & Exercise, 39, 1435-1445.

[27]   Cho, S.G. and Choi, E.J. (2002) Apoptotic Signaling Pathways: Caspases and Stress-Activated Protein Kinases. Journal of Biochemistry and Molecular Biology, 35, 24-27.

[28]   Haddadi, M., Jahromi, S.R., Sagar, B.K., Patil, R.K., Shivanandappa, T. and Ramesh, S.R. (2014) Brain Aging, Memory Impairment and Oxidative Stress: A Study in Drosophila melanogaster. Behavioural Brain Research, 259, 60-69.

[29]   Okouchi, M., Ekshyyan, O., Maracine, M. and Aw, T.Y. (2007) Neuronal Apoptosis in Neurodegeneration. Antioxidants & Redox Signaling, 9, 1059-1096.

[30]   Grotewiel, M.S., Martin, I., Bhandari, P. and Cook-Wiens, E. (2005) Functional Senescence in Drosophila melanogaster. Ageing Research Reviews, 4, 372-397.

[31]   Morris, J.K., Vidoni, E., Honea, R.A. and Burns, J.M. (2014) Impaired Glycemia and Alzheimer’s Disease. Neurobiology of Aging, 35, e23.

[32]   Sandhir, R. and Gupta, S. (2015) Molecular and Biochemical Trajectories from Diabetes to Alzheimer’s Disease: A Critical Appraisal. World Journal of Diabetes, 6, 1223-1242.

[33]   Leung, R., Proitsi, P., Simmons, A., Lunnon, K., Güntert, A., Kronenberg, D., Pritchard, M., Tsolaki, M., Mecocci, P., Kloszewska, I., Vellas, B., Soininen, H., Wahlund, L.O. and Lovestone, S. (2013) Inflammatory Proteins in Plasma Are Associated with Severity of Alzheimer’s Disease. PLoS ONE, 8, e64971.

[34]   Horgusluoglu, E., Nudelman, K., Nho, K. and Saykin, A.J. (2017) Adult Neurogenesis and Neurodegenerative Diseases: A Systems Biology Perspective. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics, 174, 93-112.