Edited by: Mark Webber Miller, Boston University School of Medicine, USA
Reviewed by: Sulev Kõks, University of Tartu, Estonia; Mark Logue, Boston University School of Medicine, USA
*Correspondence: Seth G. N. Grant
This article was submitted to Neurogenomics, a section of the journal Frontiers in Neuroscience
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
The cell types that trigger the primary pathology in many brain diseases remain largely unknown. One route to understanding the primary pathological cell type for a particular disease is to identify the cells expressing susceptibility genes. Although this is straightforward for monogenic conditions where the causative mutation may alter expression of a cell type specific marker, methods are required for the common polygenic disorders. We developed the Expression Weighted Cell Type Enrichment (EWCE) method that uses single cell transcriptomes to generate the probability distribution associated with a gene list having an average level of expression within a cell type. Following validation, we applied EWCE to human genetic data from cases of epilepsy, Schizophrenia, Autism, Intellectual Disability, Alzheimer's disease, Multiple Sclerosis and anxiety disorders. Genetic susceptibility primarily affected microglia in Alzheimer's and Multiple Sclerosis; was shared between interneurons and pyramidal neurons in Autism and Schizophrenia; while intellectual disabilities and epilepsy were attributable to a range of cell-types, with the strongest enrichment in interneurons. We hypothesized that the primary cell type pathology could trigger secondary changes in other cell types and these could be detected by applying EWCE to transcriptome data from diseased tissue. In Autism, Schizophrenia and Alzheimer's disease we find evidence of pathological changes in all of the major brain cell types. These findings give novel insight into the cellular origins and progression in common brain disorders. The methods can be applied to any tissue and disorder and have applications in validating mouse models.
The brain has a highly complex cellular architecture characterized by a diverse set of cell types that are highly interconnected. Identifying the cell types involved with the pathogenesis of disease is particularly challenging in heterogeneous tissues where cell types are often poorly defined. In the majority of brain disorders evidence exists for changes affecting multiple cell types. It has proven problematic to determine which cells are associated with the primary disease pathology and which are altered as secondary “reactive” responses. Genomic technologies have contributed important mechanistic insights into the primary genetic basis of pathogenesis through studies of mutations and variants that increase disease susceptibility. The recent availability of single cell transcriptomes (SCT) from brain tissue (Darmanis et al.,
A method known as Population Specific Expression Analysis (PSEA) (Kuhn et al.,
The EWCE method has been enabled by single cell transcriptomes from brain tissue (Darmanis et al.,
Here, we have used EWCE with single cell mouse brain transcriptome data to obtain significant interpretative advances with two important data sources: (1) human disease-associated genes and (2) transcriptomic datasets from post-mortem human brains of diseased and control individuals. These applications inform on the primary and secondary cell types involved in brain disorders. The EWCE method is a robust approach, detecting consistent cellular signatures across transcriptome datasets from 17 Alzheimer's brain regions and two independent Autism datasets. The EWCE method enables diverse sets of omic data to be integrated and can be applied to a wide range of biological problems in metazoan organisms.
The EWCE method takes two arguments: (1) a target gene list of length
Raw cell type mRNA expression data was downloaded from the Linnarsson lab webpage (data annotated as being from 17th August 2014) (Zeisel et al.,
The dataset contains data from
For some of the analyses (when testing enrichment in genetic susceptibility genes) the values for expression in cell types rather than sub-cell types are used (definition stated above). For this, the values for the expression matrix
For the remainder of the Methods section, all formulae are provided using
The background set used for EWCE analysis depends on the analysis being performed. For gene set (but not transcriptome) enrichment, the background gene set comprises all genes that have orthologs between human and mouse (including those in the target list), excluding any that were not detected in the SCT dataset. For transcriptome enrichment analysis the background set has an additional restriction relative to simple gene set analysis: the background genes must also be expressed in the disease transcriptome dataset. Human genes are converted to mouse orthologs using BioMart.
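The two background definitions above can be sketched with plain set operations. This is a minimal illustration, not the authors' implementation: the function name `build_background`, the one-to-one ortholog dictionary, and the toy gene symbols are all assumptions made for the example, as is the choice to restrict the target list to genes present in the background.

```python
def build_background(orthologs, sct_detected, target_mouse_genes,
                     disease_expressed=None):
    """Return (background, target) sets of mouse gene symbols.

    orthologs: dict mapping human symbol -> mouse symbol (assumed one-to-one).
    sct_detected: mouse genes detected in the single cell dataset.
    disease_expressed: human genes expressed in the disease transcriptome;
        pass None for gene set (non-transcriptome) enrichment.
    """
    # Genes with a human-mouse ortholog that were detected in the SCT data.
    background = set(orthologs.values()) & set(sct_detected)
    if disease_expressed is not None:
        # Extra restriction for transcriptome enrichment analysis.
        background &= {orthologs[h] for h in disease_expressed if h in orthologs}
    # Assumption: target genes outside the background are dropped.
    target = set(target_mouse_genes) & background
    return background, target


# Toy data (hypothetical, for illustration only).
orthologs = {"GRIN2B": "Grin2b", "DLG4": "Dlg4", "APOE": "Apoe", "TREM2": "Trem2"}
sct_detected = {"Grin2b", "Dlg4", "Apoe", "Gfap"}

bg_geneset, tg_geneset = build_background(orthologs, sct_detected, ["Grin2b", "Trem2"])
bg_tx, tg_tx = build_background(orthologs, sct_detected, ["Grin2b", "Trem2"],
                                disease_expressed={"GRIN2B", "APOE", "TREM2"})
```

In the toy data, `Trem2` has an ortholog but is not detected in the single cell data, so it is excluded from both the background and the target list.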
The proportion of expression in each cell type is calculated as a matrix for each gene, then summed to get total expression in each cell type across the whole gene list. Thus, for a gene list indexed by
This calculation is then repeated for 100,000 randomly generated gene lists, having the same length as the target gene list, with the genes randomly selected from the background gene set. Sampling of random gene sets is done without replacement. When the target gene list has length
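The proportion calculation and the bootstrap described above can be sketched as follows. This is a hedged illustration under stated assumptions rather than the published code: the function names, the toy expression matrix, and the reduced number of replicates are invented for the example, and the empirical p-value is assumed to be the fraction of random lists whose summed proportion is at least as large as that of the target list.

```python
import numpy as np


def cell_type_proportions(expr):
    """Scale each gene (row) so its expression across cell types sums to 1.

    Assumes every gene has non-zero total expression, which holds if
    undetected genes were removed when building the background set.
    """
    return expr / expr.sum(axis=1, keepdims=True)


def ewce_bootstrap(expr, target_idx, n_reps=100_000, seed=0):
    """Summed cell type proportions for a target list versus random lists."""
    rng = np.random.default_rng(seed)
    props = cell_type_proportions(expr)
    target_score = props[target_idx].sum(axis=0)
    n_genes, n_types = props.shape
    boot = np.empty((n_reps, n_types))
    for r in range(n_reps):
        # Random list of the same length, sampled without replacement.
        sample = rng.choice(n_genes, size=len(target_idx), replace=False)
        boot[r] = props[sample].sum(axis=0)
    # Assumed definition: share of random lists scoring at least the target.
    p = (boot >= target_score).mean(axis=0)
    return target_score, boot, p


# Toy background of 50 genes and 3 cell types (hypothetical values).
expr = np.ones((50, 3))
expr[:5, 0] = 10.0            # first five genes are specific to cell type 0
target_idx = np.arange(5)
score, boot, p = ewce_bootstrap(expr, target_idx, n_reps=2000)
```

Because each gene's proportions sum to one, every row of `boot` sums to the list length, which is a convenient sanity check on the matrix.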
Where probabilities are stated for gene list enrichments, all
Throughout the paper the number of standard deviations which γ(
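The standard deviation measure mentioned above can be illustrated in a few lines. The sketch assumes that the target score is compared with the mean and sample standard deviation (`ddof=1`) of the bootstrap scores for the same cell type; the function name and the toy numbers are hypothetical.

```python
import numpy as np


def sd_from_mean(target_score, boot_scores):
    """Number of standard deviations separating the target score from the
    bootstrap mean, computed per cell type (one column per cell type)."""
    boot_scores = np.asarray(boot_scores, dtype=float)
    mu = boot_scores.mean(axis=0)
    sigma = boot_scores.std(axis=0, ddof=1)   # sample SD; an assumption
    return (np.asarray(target_score, dtype=float) - mu) / sigma


# Three bootstrap replicates for two cell types (toy values).
toy_boot = np.array([[1.0, 4.0],
                     [2.0, 6.0],
                     [3.0, 8.0]])
z = sd_from_mean([5.0, 6.0], toy_boot)
```

Here the first cell type lies three standard deviations above the bootstrap mean and the second lies exactly on it.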
Enrichments found in gene sets from genetic studies have been shown to be biased by gene characteristics including transcript length and GC content (Jia et al.,
The disease gene associations were curated from the literature, being largely based on the most recent and authoritative studies. The sources are shown in and the genes comprising each list are in Supplementary Table
The transcriptome datasets used in the study were all obtained from publicly available sources (Katsel et al.,
For the Schizophrenia analysis, data from multiple independent studies were available for a number of brain regions. Data for the superior temporal gyrus came from two studies (Katsel et al.,
To merge these Schizophrenia datasets, the EWCE method was extended as follows. Standard cell type bootstrapping was done for each individual study and the cell type expression proportions for each bootstrap sample were stored as a matrix, with a row for each of the 100,000 bootstrap replicates and a column for each cell type. For each individual study being merged, the bootstrap output matrices were summed to form a consensus estimated distribution of random cell type expression proportions. The cell type proportions for the target gene list in each individual study were summed. Calculation of
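The merging step described above amounts to element-wise sums over studies. The following is a minimal sketch, assuming each study supplies a bootstrap matrix of identical shape (replicates by cell types) and that the merged p-values and standard deviations are then computed as for a single study; the function name and toy matrices are hypothetical.

```python
import numpy as np


def merge_studies(target_scores, boot_matrices):
    """Pool EWCE results across studies of the same brain region.

    target_scores: list of 1-D arrays (one per study, length = cell types).
    boot_matrices: list of 2-D arrays (replicates x cell types), same shape.
    """
    merged_target = np.sum(target_scores, axis=0)
    merged_boot = np.sum(boot_matrices, axis=0)
    # Assumed: significance is assessed against the summed distribution.
    p = (merged_boot >= merged_target).mean(axis=0)
    z = (merged_target - merged_boot.mean(axis=0)) / merged_boot.std(axis=0, ddof=1)
    return merged_target, merged_boot, p, z


# Two toy studies with three replicates and two cell types each.
boot_a = np.array([[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]])
boot_b = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
target_a = np.array([3.0, 1.0])
target_b = np.array([2.0, 1.0])

m_target, m_boot, m_p, m_z = merge_studies([target_a, target_b], [boot_a, boot_b])
```

Summing row by row pairs replicate *i* of one study with replicate *i* of another, which is harmless when the replicates are independent random draws.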
We first sought to confirm that the method detects expected cell type enrichments (Figure
We then examined 185 genes which have Human Phenotype Ontology annotations for abnormal myelination with the prior hypothesis that they would be enriched for oligodendrocyte genes. The majority of the genes are associated with rare neurological disorders in which patients/families have been shown to exhibit either demyelination or absence of myelinated fibers. EWCE analysis confirmed that oligodendrocytes are the cell type most enriched amongst these genes (
We then tested for cell type enrichments in susceptibility genes for seven major brain disorders: Alzheimer's disease, Anxiety disorders, Autism, Intellectual Disability, Multiple Sclerosis, Schizophrenia and epilepsy (gene lists used shown in Supplementary Table
Schizophrenia and Autism were found to be the only exclusively neuronal disorders, and both had their strongest enrichment in pyramidal neurons rather than interneurons. Schizophrenia associated genes were enriched for all three classes of neurons with
Interneurons were found to be the most enriched cell type for intellectual disabilities (ID) (
We next sought to ascertain whether the method could be used to describe the cellular nature of disease phenotypes found in post-mortem brain samples. Many transcriptome studies have been performed for major brain disorders, in an effort to cast light on the pathological basis of the conditions. We reasoned that because our method utilizes genome-wide data to define “set membership” in a quantitative and brain specific manner, it may be more robust and relevant than GO enrichments at detecting the hidden variables underlying brain diseases.
The first step towards obtaining the gene set from transcriptomics was in each case a standard differential expression analysis. Genes were rank ordered by t-statistic and the 250 most upregulated and the 250 most downregulated genes were taken for further analysis. We then performed an Expression Weighted Cell type Enrichment (EWCE) analysis, wherein the random samples are obtained by reordering the ranked list 100,000 times (see Figure
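The gene selection step described above can be sketched as a sort on the t-statistic. This is an illustration only: the function name `top_up_down` and the toy values are assumptions, and the differential expression analysis that produces the t-statistics is taken as given.

```python
import numpy as np


def top_up_down(genes, t_stats, k=250):
    """Return the k most upregulated and k most downregulated genes.

    genes: sequence of gene identifiers.
    t_stats: matching t-statistics (positive = higher in disease).
    """
    order = np.argsort(t_stats)                    # ascending t-statistic
    down = [genes[i] for i in order[:k]]           # most negative first
    up = [genes[i] for i in order[::-1][:k]]       # most positive first
    return up, down


# Toy example with five genes and k = 2 (hypothetical values).
toy_genes = ["a", "b", "c", "d", "e"]
toy_t = [-3.0, 2.0, 0.5, -1.0, 4.0]
up, down = top_up_down(toy_genes, toy_t, k=2)
```

Each of the two lists would then be passed to the enrichment test separately, so that up- and down-regulated cell type signatures are kept apart.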
The pathological characteristics of Alzheimer's disease are relatively well understood, with inflammatory gliosis and synapse loss becoming more acute as the disease progresses. We applied the method to an Alzheimer's dataset that examined changes in 14 cortical and three non-cortical regions, with between 51 and 70 samples per region. Differential expression was calculated for genes whose expression is affected by increases in Braak score. Across all the brains tested, a consistent cell enrichment signature was detected (see Figure
Having validated the method's ability to detect known pathological phenotypes, we sought to apply the method to two disorders that are less well characterized: Autism and Schizophrenia. Two publicly available transcriptome datasets were used for Autism (Voineagu et al.,
Both Autism studies, like the Alzheimer's disease studies, indicated that the disease processes affect every major cell type in the brain. As in Alzheimer's disease, interneuron and pyramidal enrichments were found in down-regulated genes, while enrichments of glial and endothelial cells were present in the up-regulated gene sets. The cellular changes expected to correspond to decreased expression of neuronal transcripts are unclear, with no consensus in the literature about how neurons are affected in autistic patients (van Kooten et al.,
We next extended the study to Schizophrenia. We utilized data from six independent transcriptomic studies, providing data for many brain areas. Four of the studies included samples from the dorsolateral prefrontal cortex, while other brain regions including hippocampus and cingulate cortex were covered by at least two of the datasets. To maximize the utility of these replicate studies, the cell type bootstrap data was summed across each independent cohort allowing pooled estimates for cellular changes. As was expected based on the disease literature, regional changes were found to be divergent. Some regions (e.g., the primary visual cortex) were found to show no significant changes, while the most pronounced enrichments were seen in the prefrontal and cingulate cortices.
Those regions showing alterations were found to cluster into two groups: (1) those with decreased oligodendrocyte expression and upregulation of pyramidal neuron genes; (2) those with increased astrocyte and/or endothelial expression. All samples from the prefrontal cortex fell into the second cluster. Four regions fell outside of these clusters and showed few or no significant changes: the primary visual cortex, putamen, superior frontal gyrus and inferior frontal gyrus. While we showed in Figure
One of the most significant changes was in the dorsolateral prefrontal cortex (BA46), with an enrichment of astrocyte genes 12.7 standard deviations from the bootstrapped mean (
The second set of brain regions affected in schizophrenia show totally distinct phenotypes from those described above. Astrocyte and endothelial expression appears normal, while highly significant down-regulation of Oligodendrocyte genes was found in the Caudate Nucleus (
Using the EWCE method, we have shown that single cell transcriptome data can be integrated with genetic susceptibility data or tissue transcriptome data to identify cell types involved with disease. Using lists of genetic susceptibility data with single cell transcriptome data that define cell types, we could identify specific cell types that are the likely primary targets of the genetic susceptibility. In a separate analysis, using lists of genes from post-mortem transcriptomes, we found that a broader range of cell types were affected, indicating that the cellular pathology of the disease extends from the cells affected by the primary genetic susceptibility to a wider set of “secondary” or “reactive” cell types.
For seven different brain disorders, the EWCE method was used to identify the putative primary cell types affected by genetic susceptibility. Consistent with current models, pyramidal neurons were the cell type most associated with schizophrenia and autism genes. A primary role of microglia in multiple sclerosis is also consistent with primary pathology in the immune system (Hemmer et al.,
Intellectual disabilities and epilepsy were both found to be most strongly associated with interneurons. Past studies strongly support the finding that interneurons are the causative cell type underlying seizures (Ogiwara et al.,
Once denser sequencing of interneurons has been performed from a wider set of brain regions it may be possible to identify specific interneuron populations associated with distinct types of seizures, as well as particular cognitive deficits. The ability to precisely distinguish affected subtypes of interneurons may however require a modification to the method: at present, four of the five disorders that are stated as affecting one population of neurons have significant enrichments for all three neuron types. As neurons share many genes, and vary only in limited subsets and graduated expression levels, it may be that one can only distinguish between them by considering the single most enriched category. The limitations of the method can be seen by considering the case of autism, which shows its strongest enrichment in pyramidal cells but is also enriched in interneurons to a lesser degree. These results could support either of two hypotheses: that autism is a primary pyramidal neuron disorder, or that it is a disorder of broad neuronal dysfunction. An extension to the EWCE method that penalizes neuronal subtypes for absence of expression when a gene is expressed in similar cells could potentially resolve this issue.
Contrasting the results of EWCE from genetic susceptibility data with transcriptomes from diseased tissue suggested that the cellular pathology spreads from the primary affected cells to secondary cells. These putative secondary changes appear to extend between classes of cells. For example, the genetic susceptibility of Autism and Schizophrenia appears to primarily impact neurons, yet both show evidence for secondary endothelial disruption. Interestingly, both disorders have been shown to have decreased cerebral blood flow (Sabri et al.,
Implementation of EWCE in mouse models of human disease could underlie a new approach to studying brain disorders. Once the transcriptomic cell type enrichments are determined for a disease, conditional mouse models (carrying cell type specific mutations) could then be validated or rejected based on whether they recapitulate some or all of the disease associated secondary effects. One problem with this approach is that a range of distinct changes could result in the same transcriptomic alteration (for instance, down regulation of interneuron genes could be caused by either decreased cell density, altered cellular state, or decreased synaptic connectivity). The directionality and regional specificity of transcriptional phenotypes should however be able to act as a guide to narrow down the nature of the cellular changes. Even without more extensive follow-up EWCE could heighten confidence in the biological validity of disease models for which only behavioral similarities to human disease could otherwise be shown.
Three limitations of the current study are that the cells were obtained from mice, that the animals were immature, and that the cells came only from CA1 and somatosensory cortex. Using human rather than mouse data may make a significant difference for disease enrichments: one study which compared expression profiles of cell types between humans and mice found that as few as 52% of genes identified as astrocyte-enriched in mice were also astrocyte-enriched in humans (Zhang et al.,
In the coming years the quantity of available single cell data will increase, and the utility of the EWCE method is expected to expand commensurately. The single cell transcriptome dataset used for this study comprised 3005 cells from the cortex and CA1 of mice aged p21–31. With greater depth of cellular sequencing, it may become possible to detect changes in more specific populations of cells. This is potentially of greatest importance for interneurons, for which the data presently available are sparse: of the 1314 cells sequenced from CA1, only 126 were interneurons. With over thirty types of interneuron estimated to exist just within CA1 based on morphology, electrophysiology and expression of classical molecular markers (Wheeler et al.,
We also note that there is no reason why EWCE should be restricted to the study of brain disorders and as sufficient cellular data becomes available the same methodology could be applied to any other disease. Indeed the EWCE method can be applied to other omic gene lists for the purposes of interrogating the relevant cell types. For example, gene lists from mouse phenotyping studies, such as the International Mouse Phenotyping Consortium (Brown and Moore,
NS (Implementation, analysis, and author); SG (Interpretation, drafting, and revising)
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The reviewer, Dr. Logue, and the handling Editor, Dr. Miller, declared their shared affiliation, and the handling Editor states that the process nevertheless met the standards of a fair and objective review.
This work was supported by the Wellcome Trust and the European Union Seventh Framework Programme under grant agreements n° HEALTH-F2-2009-241995 (“Gencodys” project) and HEALTH-F2-2009-242167 (“Synsys” project) and W. Richardson (UCL).
The Supplementary Material for this article can be found online at: