Advertisement
Archival Report|Articles in Press

Ultra-rare missense variants implicated in Utah pedigrees multiply affected with schizophrenia

Open AccessPublished:February 15, 2023DOI:https://doi.org/10.1016/j.bpsgos.2023.02.002

      Abstract

      Introduction

      Recent work from the Schizophrenia Exome Sequencing Meta-analysis (SCHEMA) consortium has showed a significant enrichment of ultra-rare variants in schizophrenia cases. Family-based studies offer a unique opportunity to evaluate rare variants, since risk in multiplex pedigrees is more likely to be influenced by the same collection of variants than an unrelated cohort.

      Methods

      Here we examine whole genome sequencing data from 35 individuals across six pedigrees multiply affected by schizophrenia. We applied a rigorous filtering pipeline to search for classes of protein-coding variants that co-segregated with disease status and these we examined for evidence of enrichment in the SCHEMA dataset. Additionally, we applied a family-based consensus approach to call copy number variants and screen against a list of schizophrenia-associated risk variants.

      Results

      We identified deleterious missense variants in three genes (ATP2B2, SLC25A28, and GSK3A) that co-segregated with disease in three of the pedigrees. In the SCHEMA the gene ATP2B2 shows highly suggestive evidence for deleterious missense variants in schizophrenia cases (p=0.000072). ATP2B2 is involved in intracellular calcium homeostasis, is expressed in multiple brain tissue types, and is predicted to be intolerant to loss-of-function and missense variants.

      Conclusions

      We have identified genes that are likely increasing schizophrenia risk in three of the six pedigrees examined; the strongest evidence being for a gene involved in calcium homeostasis. Further work is required to examine other classes of variants which may be contributing to disease burden.

      Keywords

      1. Introduction

      Schizophrenia is a clinically heterogeneous brain disorder that impacts on how people perceive and understand the world around them. Existing treatments are partially effective in most people, but outcomes are often poor, and patients die 12-15 years before the average population (
      • Tiihonen J.
      • Lonnqvist J.
      • Wahlbeck K.
      • Klaukka T.
      • Niskanen L.
      • Tanskanen A.
      • et al.
      11-year follow-up of mortality in patients with schizophrenia: a population-based cohort study (FIN11 study).
      ). The underlying etiology is poorly understood, but most of the variation in risk between individuals is genetically mediated, involving a spectrum of risk alleles from many common alleles of small effect to rare copy number and coding variants of larger effect. To date more than 270 common variants have been confirmed by the Psychiatric Genomics Consortium (PGC), but collectively these explain only a small fraction of risk and likely represent <10% of all contributory common variants (

      Ripke S, Walters JT, O'Donovan MC (2020): Mapping genomic loci prioritises genes and implicates synaptic biology in schizophrenia. medRxiv.2020.2009.2012.20192922.

      ). Twelve known copy number variants (CNV) substantially increase individual risk by 2 to 60-fold (
      • Rees E.
      • Kirov G.
      Copy number variation and neuropsychiatric illness.
      ), but are only present in about 2% of patients (
      • Kirov G.
      CNVs in neuropsychiatric disorders.
      ). The causal mechanism underlying most of these risk loci is still to be determined as the associated loci often overlap multiple genes or regulatory features. Identifying rare, coding variants that contribute to risk may be particularly important in understanding the complex molecular network involved and in developing better treatments for schizophrenia.
      Recently, the Schizophrenia Exome Meta-Analysis (SCHEMA) consortium reported 10 genes in which the burden of ultra-rare variants (URVs) was significantly higher in schizophrenia cases (

      Singh T, Poterba T, Curtis D, Akil H, Al Eissa M, Barchas JD, et al. (2020): Exome sequencing identifies rare coding variants in 10 genes which confer substantial risk for schizophrenia. medRxiv.2020.2009.2018.20192815.

      ). Their study also showed that schizophrenia cases carry an excess of pathogenic URVs in loss-of-function (LoF) intolerant genes compared to controls, even when these ten genes were removed. Highly deleterious missense variants had as strong a signal as protein-truncating variants (PTV). These results suggest that many more genes in which URVs contribute to schizophrenia risk are yet to be discovered. Family-based studies offer a unique opportunity to identify and evaluate rare variants conferring risk for schizophrenia, since densely affected pedigrees are more likely to be influenced by the same subset of variants compared to an unrelated cohort. Ultra-rare, family-private variants have been shown to be enriched in multiplex compared to simplex pedigrees in autism, indicating that they are good candidates for novel gene discovery (
      • Wilfert A.B.
      • Turner T.N.
      • Murali S.C.
      • Hsieh P.
      • Sulovari A.
      • Wang T.
      • et al.
      Recent ultra-rare inherited variants implicate new autism candidate risk genes.
      ). Historically, linkage analysis has been used to identify candidate causal genes or regions in pedigrees, but this usually requires data from many individuals across several generations. Instead, a co-segregation analysis can be substituted to analyze variants present in appropriate individuals within each family. Here, we examine whole genome sequencing (WGS) data from six multiplex pedigrees affected by schizophrenia to identify URVs likely contributing to the phenotypic risk.

      2. Methods and Materials

      2.1 Sample procurement and assessment

      Methods used in sample ascertainment and assessment were approved by the University of Utah Institutional Review Board. Multiplex pedigrees were identified by screening hospitalized patients with diagnoses of schizophrenia. Following written informed consent subjects were interviewed by a clinician using the Schedule for Affective Disorders and Schizophrenia-Lifetime Version (“SADS-L”) (
      • Endicott J.
      • Spitzer R.L.
      A diagnostic interview: the schedule for affective disorders and schizophrenia.
      ). Medical records were obtained for any individual who received psychiatric care. The interview results and any medical records were then presented to a diagnostic panel comprising two clinicians who played no role in ascertainment or assessment. Consensual diagnoses were made using Research Diagnostic Criteria (“RDC”) (
      • Spitzer R.L.
      • Endicott J.
      • Robins E.
      Research diagnostic criteria: rationale and reliability.
      ). Six pedigrees of predicted European ancestry in which schizophrenia was the dominant phenotype were selected from the cohort (Supplementary Figure 1).

      2.2 Whole genome sequencing

      Batch 1

      Twenty-six samples across the six pedigrees had been selected previously for WGS. DNA concentrations were quantified by Qubit and the quality of DNA was determined by agarose gel electrophoresis. WGS was performed by MedGenome, Inc. on an Illumina HiSeqX to an average depth of coverage of at least 30x per sample. Some of these samples had been aligned to the GRCh38 reference genome using the Sentieon Genomics proprietary pipeline but were re-aligned locally to match additional data from these pedigrees.

      Batch 2

      Twelve additional samples were selected for WGS. DNA concentrations were quantified by Nanodrop and the quality of DNA was determined by agarose gel electrophoresis. WGS was performed by Edinburgh Genomics (Clinical Genomics) on an Illumina HiSeqX to an average depth of coverage of at least 30x per sample.

      2.3 Data pre-processing and quality control

      All BAM files from Batch 1 were converted to paired-end FASTQ files using the picard toolkit so they could be re-aligned using the same pipeline and to the same reference genome as Batch 2. This involved reverting the BAM to remove all mapping information, fixing errors in mate pairs, removing reads without matching pairs and randomizing the reads to remove any potential downstream bias due to the original sort order. All FASTQ files were examined using FastQC and samtools (
      • Li H.
      • Handsaker B.
      • Wysoker A.
      • Fennell T.
      • Ruan J.
      • Homer N.
      • et al.
      The Sequence Alignment/Map format and SAMtools.
      ) to identify DNA contamination or degradation. Reads were aligned to the GRCh38 reference genome (GCA_000001405.15, including decoy, HLA and alternative contigs) using BWA-MEM (

      Li H (2013): Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM.

      ), following the GATK v3 “Best Practices” (

      Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, et al. (2013): From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Current protocols in bioinformatics. 43:11.10.11-33.

      ). This involved marking PCR duplicates with picard, base quality score recalibration, local realignment of reads around indels, and variant calling with HaplotypeCaller (GATK version 3.8-0-ge9d806836). All sample were jointly genotyped using the GenotypeGVCFs module from GATK using default parameters.
      After calling genotypes, variants whose depth of coverage was five standard deviations greater than the average depth of coverage across all sites were removed. Variants were then split by type (SNVs, indels and other), retaining those on the standard 23 pairs of chromosomes. Variant quality score recalibration (VQSR) was applied to SNVs and indels separately, using truth-sets provided in the GATK resource bundle. A tranche sensitivity threshold of 99.9% for SNVs and 99.0% for indels was used to remove low confidence variant sites. For variants other than SNVs and indels, the following hard filters were applied: QD < 2.0; ReadPosRankSum < -20.0; FS > 200.0; SOR > 10.0. Multi-allelic sites were split into multiple bi-allelic sites with bcftools norm (
      • Danecek P.
      • Bonfield J.K.
      • Liddle J.
      • Marshall J.
      • Ohan V.
      • Pollard M.O.
      • et al.
      Twelve years of SAMtools and BCFtools.
      ). If any sample had GQ < 20.0 or DP < 10.0, the genotype for that sample was set to missing (
      • Vadgama N.
      • Pittman A.
      • Simpson M.
      • Nirmalananthan N.
      • Murray R.
      • Yoshikawa T.
      • et al.
      De novo single-nucleotide and copy number variation in discordant monozygotic twins reveals disease-related genes.
      ). Finally, variants marked with a filter and variants with missingness greater than 20% were removed.
      The software peddy (
      • Pedersen B.S.
      • Quinlan A.R.
      Who's Who? Detecting and Resolving Sample Anomalies in Human DNA Sequencing Studies with Peddy.
      ) was applied to all samples jointly to perform the following pedigree and data checks: i) relatedness discordance; ii) sex discordance; iii) low median coverage; and iv) ancestry clustering by a principal component analysis (PCA) based on 1000 Genomes Project data (
      • Auton A.
      • Brooks L.D.
      • Durbin R.M.
      • Garrison E.P.
      • Kang H.M.
      • Korbel J.O.
      • et al.
      A global reference for human genetic variation.
      ). To investigate the presence of batch effects and technological stratification, the software XPAT (
      • Yu Y.
      • Hu H.
      • Bohlender R.J.
      • Hu F.
      • Chen J.S.
      • Holt C.
      • et al.
      XPAT: a toolkit to conduct cross-platform association studies with heterogeneous sequencing datasets.
      ) was applied to all samples. Given the minimum samples requirements for XPAT, additional families recruited from the same cohort and sequenced in the same batches were included.

      2.4 SNV and Indel Prioritization

      SnpSift v5.0 (
      • Cingolani P.
      • Patel V.M.
      • Coon M.
      • Nguyen T.
      • Land S.J.
      • Ruden D.M.
      • et al.
      Using Drosophila melanogaster as a Model for Genotoxic Chemical Mutational Studies with a New Program, SnpSift.
      ) was used to identify private variants in families to reduce the computation burden of annotation. Here, private indicates that the variant is present in one family and absent from all other families. We considered all individuals with a diagnosis of schizophrenia to be cases, and individuals with no diagnosis as controls. Next, variants were retained if there were no Mendelian violations and they followed either a full co-segregation pattern, (carried by all in-family cases, absent from all in-family controls and absent from all marry-in samples) or a reduced co-segregation pattern (carried by all but one in-family cases, absent from all in-family controls and absent from all marry-in samples). Custom JavaScript code was added to the FilterVcf module from picard to identify the case/control status and in-family/marry-in status of samples for the co-segregation filter. We removed variants not present in the coding sequence of a protein-coding gene, as defined by the RefSeq ncbiRefSeqCurated table (
      • O'Leary N.A.
      • Wright M.W.
      • Brister J.R.
      • Ciufo S.
      • Haddad D.
      • McVeigh R.
      • et al.
      Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.
      ), downloaded from the UCSC Table Browser (
      • Haeussler M.
      • Zweig A.S.
      • Tyner C.
      • Speir M.L.
      • Rosenbloom K.R.
      • Raney B.J.
      • et al.
      The UCSC Genome Browser database: 2019 update.
      ).
      The Variant Effects Predictor (VEP) v97.0 (
      • McLaren W.
      • Gil L.
      • Hunt S.E.
      • Riat H.S.
      • Ritchie G.R.
      • Thormann A.
      • et al.
      The Ensembl Variant Effect Predictor.
      ) was used to annotate each variant, taking the canonical transcript of that gene. Where variants overlapped multiple genes, we examined the canonical transcript of each gene separately. As part of the annotation, we included the Genome Aggregation Database (gnomAD) v2.1.1 exome allele frequencies (
      • Karczewski K.J.
      • Francioli L.C.
      • Tiao G.
      • Cummings B.B.
      • Alföldi J.
      • Wang Q.
      • et al.
      Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes.
      ) and the database of non-synonymous functional prediction (dbNSFP) v4.1 (
      • Liu X.
      • Li C.
      • Mou C.
      • Dong Y.
      • Tu Y.
      dbNSFP v4: a comprehensive database of transcript-specific functional predictions and annotations for human nonsynonymous and splice-site SNVs.
      ) from which several deleteriousness metrics were extracted, namely MPC (

      Samocha KE, Kosmicki JA, Karczewski KJ, O’Donnell-Luria AH, Pierce-Hoffman E, MacArthur DG, et al. (2017): Regional missense constraint improves variant deleteriousness prediction. bioRxiv.148353.

      ), SIFT v2.2 (
      • Vaser R.
      • Adusumalli S.
      • Leng S.N.
      • Sikic M.
      • Ng P.C.
      SIFT missense predictions for genomes.
      ), PolyPhen2 v2.2.2 (

      Adzhubei I, Jordan DM, Sunyaev SR (2013): Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet. Chapter 7:Unit7.20.

      ) and CADD v1.6 (
      • Rentzsch P.
      • Schubach M.
      • Shendure J.
      • Kircher M.
      CADD-Splice-improving genome-wide variant effect prediction using deep learning-derived splice scores.
      ). Gene-based probability of loss-of-function intolerance (pLI) scores (
      • Lek M.
      • Karczewski K.J.
      • Minikel E.V.
      • Samocha K.E.
      • Banks E.
      • Fennell T.
      • et al.
      Analysis of protein-coding genetic variation in 60,706 humans.
      ), missense Z-scores (
      • Samocha K.E.
      • Robinson E.B.
      • Sanders S.J.
      • Stevens C.
      • Sabo A.
      • McGrath L.M.
      • et al.
      A framework for the interpretation of de novo mutation in human disease.
      ) and LOEUF scores (
      • Karczewski K.J.
      • Francioli L.C.
      • Tiao G.
      • Cummings B.B.
      • Alföldi J.
      • Wang Q.
      • et al.
      The mutational constraint spectrum quantified from variation in 141,456 humans.
      ) calculated from gnomAD allele frequencies were also annotated from dbNSFP. Custom bash code was used to extract the variant-level scores from dbNSFP that corresponded to the appropriate transcript where scores for different transcripts were supplied.
      To prioritize variants likely to be implicated in schizophrenia based on the SCHEMA work, we retained those that satisfied the following: (i) ultra-rare in gnomAD (minor allele count ≤ 5 across all samples); (ii) either PTV (i.e. frameshift, stop gain or splice acceptor/donor); or predicted-deleterious missense variants (MPC > 2); and (iii) present in a highly LoF-intolerant gene (pLI > 0.9).

      2.5 Copy Number Variant Calling

      We used a family-based consensus of four software tools to call CNVs from the BAM files (Supplementary Figure 5). CNVs were called by tools derived from two classes of calling methods: CNVnator (
      • Abyzov A.
      • Urban A.E.
      • Snyder M.
      • Gerstein M.
      CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing.
      ) and ERDS (
      • Zhu M.
      • Need A.C.
      • Han Y.
      • Ge D.
      • Maia J.M.
      • Zhu Q.
      • et al.
      Using ERDS to infer copy-number variants in high-coverage genomes.
      ) (read-depth based callers); LUMPY (
      • Layer R.M.
      • Chiang C.
      • Quinlan A.R.
      • Hall I.M.
      LUMPY: a probabilistic framework for structural variant discovery.
      ) and Manta (
      • Chen X.
      • Schulz-Trieglaff O.
      • Shaw R.
      • Barnes B.
      • Schlesinger F.
      • Kallberg M.
      • et al.
      Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications.
      ) (paired-end /split-read based callers). A collapsing strategy was applied to raw CNV calls to eliminate multiple calls which represent the same site, similar to that described in Trost et al. (
      • Trost B.
      • Walker S.
      • Wang Z.
      • Thiruvahindrapuram B.
      • MacDonald J.R.
      • Sung W.W.L.
      • et al.
      A Comprehensive Workflow for Read Depth-Based Identification of Copy-Number Variation from Whole-Genome Sequence Data.
      ). Then, sites that overlapped reciprocally by 50% were merged, first considering within-method callsets (e.g. LUMPY vs Manta) and then between the resulting across-method callsets. Finally, sites for which over 50% of their length comprised of Repeat and Low Complexity Regions (RLCR) were removed. RLCRs are defined in Trost et al. as: (i) assembly gaps, (UCSC “gap” table); (ii) segmental duplications (UCSC “genomicSuperDups” table); (iii) the pseudo-autosomal regions of the sex chromosomes (
      • Trost B.
      • Walker S.
      • Wang Z.
      • Thiruvahindrapuram B.
      • MacDonald J.R.
      • Sung W.W.L.
      • et al.
      A Comprehensive Workflow for Read Depth-Based Identification of Copy-Number Variation from Whole-Genome Sequence Data.
      ).
      Within each pedigree, we removed CNV calls that were identified by one calling method and were only found in one individual. The resulting calls have the support of either at least two calling methods or multiple individuals in the same pedigree. For variants identified by one tool and present in multiple related samples, if the proportion of reads supporting an event at the breakpoints was low, such calls were removed.

      3. Results

      3.1 WGS Data

      Thirty-eight samples across six pedigrees were sent for sequencing at a minimum average depth of coverage of 30x (Methods, Supplementary Figure 1). Two samples (K1501_9 and K1527_33) failed sequencing due to low quality DNA. Additionally, sample K1524_3 was found to be unrelated to their four siblings and was removed. The remaining 35 samples passed all pedigree-related checks and were carried forward for analysis (Table 1). A PCA confirmed that all samples clustered with European individuals from the 1000 Genomes Project (Supplementary Figure 2). XPAT did not reveal any evidence of technological stratification or batch effects across the samples, with only family-based clustering observed (Supplementary Figures S3-4).
      Table 1The total number of individuals sequenced who were retained for analysis, broken down by diagnosis and whether they were members of the family (In-Family) or marry-in (Marry-In) samples. SCZ: schizophrenia; UN: unaffected.
      PedigreeTotalIn-FamilyMarry-In
      SCZUN
      K148054-1
      K149454-1
      K15017421
      K152444--
      K152765-1
      K15468521
      Total352645

      3.2 Prioritized SNVs

      In total 11,509,434 variants passed all quality control measures following the joint genotyping of all samples, of which 3,069,960 were private to one of the families. After applying the main prioritization filters, no ultra-rare, functionally relevant variants were identified that had full co-segregation with cases (Table 2). However, three deleterious missense URVs (Class II variants from SCHEMA: 2≤MPC<3) that followed a reduced co-segregation pattern were identified, one in each of pedigrees K1546, K1494, and K1524 (Table 3; Figure 1).
      Table 2The number of variants remaining after each stage of the prioritization process. Counts on the left are for full co-segregation and on the right are for reduced co-segregation. LoF: loss-of-function
      DescriptionVariants
      Quality control filters11,509,434
      Family-private variants3,069,960
      Co-segregation patternFullReduced
      57,253172,748
      In coding sequence5151,904
      Ultra-rare in gnomAD50210
      Functional relevance010
      LoF intolerant gene03
      Table 3Details of the three prioritized variants with reduced co-segregation. Positions are given on GRCh38. Included are the protein sequence ID (HGVSp), the minor allele count (MAC) from gnomAD (exome), and several deleteriousness prediction metrics. For SIFT and PolyPhen2, D represents “damaging” and “deleterious” respectively. CADD: combined annotation dependent depletion.
      PedigreeChrPositionVariantGeneExonHGVSpMACMPCCADDSIFTPolyPhen2
      K1546310360021G>AATP2B213/23R588C12.2331.0DD
      K15241099610923T>CSLC25A284/4I341V02.1125.6DD
      K14941942232651A>GGSK3A9/11I377T02.3926.9DD
      Figure thumbnail gr1
      Figure 1Pedigree images of the families that harbor an ultra-rare missense variant with reduced co-segregation. Fully shaded boxes denote individuals with a diagnosis of schizophrenia, and sequenced individuals are marked with a colored dot. The genotype of the identified SNV is shown beneath all sequenced individuals that were carried forward for analysis. Additional details for the variants are given in . (A) K1546 (ATP2B2:p.Arg588Cys); (B) K1524 (SLC25A28:p.Ile341Val); and (C) K1494 (GSK3A:p.Ile377Thr)

      3.3 Schizophrenia-associated CNVs

      We called CNVs in all samples using a consensus of four calling tools to rule out any variants with a known association with schizophrenia (Supplementary Table 1). Several such variants were initially identified in the cohort, most were excluded when examining the level of support at the breakpoints (Supplementary Table 2). Only one rare variant was retained, a duplication on chromosome 16p11.2 in sample K1524_5. This CNV was called by both read-depth based callers and was not observed in any other samples in this pedigree.

      4. Discussion

      We examined whole genome sequencing data from 35 individuals across six pedigrees recruited from Utah that were multiply affected by schizophrenia. In the absence of sufficient number of individuals to perform linkage analysis we performed a co-segregation analysis to identify variants which are likely to increase disease burden. Following recent work from the SCHEMA consortium, we investigated the presence of ultra-rare, deleterious variants in LoF-intolerant genes. While no fully co-segregating pathogenic URVs were found, we did observe three missense variants with a reduced co-segregation pattern in three families. All three variants were predicted to be deleterious by additional pathogenicity metrics (Table 2). None of the three genes survived FDR correction in the reported SCHEMA analysis, but there was a suggestive excess of the same class of missense variants at ATP2B2 in the schizophrenia cases compared to controls in the SCHEMA dataset (Table 4). Additionally, the variant in ATP2B2 identified here was observed only in one schizophrenia sample in the SCHEMA dataset. ATP2B2 has the highest missense Z-score of the three genes, indicating that it is the most intolerant to missense variants.
      Table 4Gene-level constraint information from gnomAD, including the probability of LoF-intolerant (pLI) score (
      • Lek M.
      • Karczewski K.J.
      • Minikel E.V.
      • Samocha K.E.
      • Banks E.
      • Fennell T.
      • et al.
      Analysis of protein-coding genetic variation in 60,706 humans.
      ), the missense Z-score (
      • Samocha K.E.
      • Robinson E.B.
      • Sanders S.J.
      • Stevens C.
      • Sabo A.
      • McGrath L.M.
      • et al.
      A framework for the interpretation of de novo mutation in human disease.
      ), the LOEUF metric (
      • Karczewski K.J.
      • Francioli L.C.
      • Tiao G.
      • Cummings B.B.
      • Alföldi J.
      • Wang Q.
      • et al.
      The mutational constraint spectrum quantified from variation in 141,456 humans.
      ), and results (odds ratio (OR), p-value) for Class II variants from the SCHEMA analysis (

      Singh T, Poterba T, Curtis D, Akil H, Al Eissa M, Barchas JD, et al. (2020): Exome sequencing identifies rare coding variants in 10 genes which confer substantial risk for schizophrenia. medRxiv.2020.2009.2018.20192815.

      ).
      GeneConstraintSCHEMA (Class II)
      pLImis_ZLOEUFORp-value
      ATP2B21.004.550.151.9200.000719
      SLC25A280.932.920.370.6170.744000
      GSK3A1.003.220.130.8300.835000
      ATP2B2 (ATPase plasma membrane Ca2+ transporting 2) is a member of the plasma membrane Ca2+ ATPase (PMCA) protein family which is involved in intracellular calcium homeostasis (
      • O'Leary N.A.
      • Wright M.W.
      • Brister J.R.
      • Ciufo S.
      • Haddad D.
      • McVeigh R.
      • et al.
      Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.
      ). It is found to be expressed in multiple brain tissue types in the GTEx project (

      (2013): The Genotype-Tissue Expression (GTEx) project. Nature genetics. 45:580-585.

      ). In a GWAS meta-analysis of autism spectrum disorder (ASD) and schizophrenia, an intronic variant in this gene (rs9879311) was found to be genome-wide significant (

      (2017): Meta-analysis of GWAS of over 16,000 individuals with autism spectrum disorder highlights a novel locus at 10q24.32 and a significant overlap with schizophrenia. Mol Autism. 8:21.

      ). Additionally, damaging de novo variants in ATP2B2 have been shown to be significantly enriched in ASD cases compared to unaffected siblings in a Japanese cohort (
      • Takata A.
      • Miyake N.
      • Tsurusaki Y.
      • Fukai R.
      • Miyatake S.
      • Koshimizu E.
      • et al.
      Integrative Analyses of De Novo Mutations Provide Deeper Biological Insights into Autism Spectrum Disorder.
      ). A protein-protein interaction analysis of genes implicated in schizophrenia from both rare variants (CNVs and de novo SNVs) and common SNPs pointed to N-methyl-D-aspartate receptor (NMDAR) genes as having significant combined effects between rare and common variants (
      • Chang X.
      • Lima L.A.
      • Liu Y.
      • Li J.
      • Li Q.
      • Sleiman P.M.A.
      • et al.
      Common and Rare Genetic Risk Factors Converge in Protein Interaction Networks Underlying Schizophrenia.
      ). ATP2B2 was found to be connected to the core members of this NMDAR interactome. A paralog of this gene is ATP2A2, which is a member of the sarco/endoplasmic reticulum CA2+ ATPase (SERCA) protein family. Variants in this gene cause Darier’s disease, which is known to increase risk for schizophrenia and bipolar disorder (
      • Cederlöf M.
      • Bergen S.E.
      • Långström N.
      • Larsson H.
      • Boman M.
      • Craddock N.
      • et al.
      The association between Darier disease, bipolar disorder, and schizophrenia revisited: a population-based family study.
      ). Fine-mapping of the significant loci from the PGC schizophrenia phase 3 GWAS identified an intronic variant of ATP2A2 as highly probable of being causal (

      Ripke S, Walters JT, O'Donovan MC (2020): Mapping genomic loci prioritises genes and implicates synaptic biology in schizophrenia. medRxiv.2020.2009.2012.20192922.

      ).
      SLC25A28 (“Solute carrier Family 25 Member 28”) is part of the mitochondrial carrier sub-group of the SLC gene family. It is a mitochondrial iron transporter that mediates iron uptake, and is expressed in most tissue types, including several brain tissues (

      (2013): The Genotype-Tissue Expression (GTEx) project. Nature genetics. 45:580-585.

      ). There is no previous evidence of association between SLC25A28 and schizophrenia or related disorders. GSK3A (“glycogen synthase kinase-3α”) is one of the two isoforms of the GSK-3 protein kinase and is expressed in multiple brain tissues (

      (2013): The Genotype-Tissue Expression (GTEx) project. Nature genetics. 45:580-585.

      ). Lithium, used to treat bipolar disorder, inhibits the activity of the paralog of this gene (GSK3B) (
      • Young W.
      Review of lithium effects on brain and blood.
      ). The variant in this gene was also present in the SCHEMA analysis, where one allele was observed in an individual with schizophrenia. Only one individual across the six families was found to carry a rare, schizophrenia associated CNV: a duplication on chromosome 16p11.2 was called in sample K1524_5. This individual was the only sample in the pedigree not to carry the URV in SLC25A28.
      This study represents an exploratory analysis of these pedigrees to screen for classes of gene-affecting variants that are known to be associated with schizophrenia. Full co-segregation would provide the strongest evidence for candidate causal variants, but within each pedigree the inheritance patterns of the three SNVs are consistent with a variant inherited from a common ancestor of most of the affected individuals. The majority of the unaffected individuals from these three pedigrees were last contacted after the typical age of onset of schizophrenia, so are likely to have remained as such (Supplementary Table 3). It is worth noting that the number of samples sequenced in each pedigree is modest, so additional evidence could be gained by the availability of other affected and unaffected family members. Further work is needed to examine other fully co-segregating variants, as well as variants outside the protein-coding regions. We employed strict cut-off thresholds for our filtering and acknowledge that other analytical strategies may identify plausible candidate causal variants missed by the filtering approach.
      The methodology implemented here is robust and follows practices established by large-scale genomics consortia, but the SNVs and CNV identified need to be validated with additional sequencing methods. Given how rare these variants are in the population, large samples sizes are required in the discovery analysis to achieve statistical significance, and future work by the SCHEMA consortium may implicate ATP2B2 as a schizophrenia risk gene.

      5 Acknowledgements

      The authors acknowledge the support of the Trinity Centre for High Performance Computing (ResearchIT). This work as part of the Psychiatric Genomics Consortium was supported by the National Institutes of Health [5U01MH 109499-04 to A.C], and Science Foundation Ireland [16/SPP/3324 to A.C].

      Supplementary Material

      7 References

        • Tiihonen J.
        • Lonnqvist J.
        • Wahlbeck K.
        • Klaukka T.
        • Niskanen L.
        • Tanskanen A.
        • et al.
        11-year follow-up of mortality in patients with schizophrenia: a population-based cohort study (FIN11 study).
        Lancet. 2009; 374: 620-627
      1. Ripke S, Walters JT, O'Donovan MC (2020): Mapping genomic loci prioritises genes and implicates synaptic biology in schizophrenia. medRxiv.2020.2009.2012.20192922.

        • Rees E.
        • Kirov G.
        Copy number variation and neuropsychiatric illness.
        Current opinion in genetics & development. 2021; 68: 57-63
        • Kirov G.
        CNVs in neuropsychiatric disorders.
        Human molecular genetics. 2015; 24: R45-49
      2. Singh T, Poterba T, Curtis D, Akil H, Al Eissa M, Barchas JD, et al. (2020): Exome sequencing identifies rare coding variants in 10 genes which confer substantial risk for schizophrenia. medRxiv.2020.2009.2018.20192815.

        • Wilfert A.B.
        • Turner T.N.
        • Murali S.C.
        • Hsieh P.
        • Sulovari A.
        • Wang T.
        • et al.
        Recent ultra-rare inherited variants implicate new autism candidate risk genes.
        Nature genetics. 2021; 53: 1125-1134
        • Endicott J.
        • Spitzer R.L.
        A diagnostic interview: the schedule for affective disorders and schizophrenia.
        Archives of general psychiatry. 1978; 35: 837-844
        • Spitzer R.L.
        • Endicott J.
        • Robins E.
        Research diagnostic criteria: rationale and reliability.
        Archives of general psychiatry. 1978; 35: 773-782
        • Li H.
        • Handsaker B.
        • Wysoker A.
        • Fennell T.
        • Ruan J.
        • Homer N.
        • et al.
        The Sequence Alignment/Map format and SAMtools.
        Bioinformatics. 2009; 25: 2078-2079
      3. Li H (2013): Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM.

      4. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, et al. (2013): From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Current protocols in bioinformatics. 43:11.10.11-33.

        • Danecek P.
        • Bonfield J.K.
        • Liddle J.
        • Marshall J.
        • Ohan V.
        • Pollard M.O.
        • et al.
        Twelve years of SAMtools and BCFtools.
        Gigascience. 2021; 10
        • Vadgama N.
        • Pittman A.
        • Simpson M.
        • Nirmalananthan N.
        • Murray R.
        • Yoshikawa T.
        • et al.
        De novo single-nucleotide and copy number variation in discordant monozygotic twins reveals disease-related genes.
        European journal of human genetics : EJHG. 2019;
        • Pedersen B.S.
        • Quinlan A.R.
        Who's Who? Detecting and Resolving Sample Anomalies in Human DNA Sequencing Studies with Peddy.
        American journal of human genetics. 2017; 100: 406-413
        • Auton A.
        • Brooks L.D.
        • Durbin R.M.
        • Garrison E.P.
        • Kang H.M.
        • Korbel J.O.
        • et al.
        A global reference for human genetic variation.
        Nature. 2015; 526: 68-74
        • Yu Y.
        • Hu H.
        • Bohlender R.J.
        • Hu F.
        • Chen J.S.
        • Holt C.
        • et al.
        XPAT: a toolkit to conduct cross-platform association studies with heterogeneous sequencing datasets.
        Nucleic acids research. 2018; 46: e32
        • Cingolani P.
        • Patel V.M.
        • Coon M.
        • Nguyen T.
        • Land S.J.
        • Ruden D.M.
        • et al.
        Using Drosophila melanogaster as a Model for Genotoxic Chemical Mutational Studies with a New Program, SnpSift.
        Front Genet. 2012; 3: 35
        • O'Leary N.A.
        • Wright M.W.
        • Brister J.R.
        • Ciufo S.
        • Haddad D.
        • McVeigh R.
        • et al.
        Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.
        Nucleic acids research. 2016; 44: D733-745
        • Haeussler M.
        • Zweig A.S.
        • Tyner C.
        • Speir M.L.
        • Rosenbloom K.R.
        • Raney B.J.
        • et al.
        The UCSC Genome Browser database: 2019 update.
        Nucleic acids research. 2019; 47: D853-d858
        • McLaren W.
        • Gil L.
        • Hunt S.E.
        • Riat H.S.
        • Ritchie G.R.
        • Thormann A.
        • et al.
        The Ensembl Variant Effect Predictor.
        Genome biology. 2016; 17: 122
        • Karczewski K.J.
        • Francioli L.C.
        • Tiao G.
        • Cummings B.B.
        • Alföldi J.
        • Wang Q.
        • et al.
        Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes.
        bioRxiv. 2019; 531210
        • Liu X.
        • Li C.
        • Mou C.
        • Dong Y.
        • Tu Y.
        dbNSFP v4: a comprehensive database of transcript-specific functional predictions and annotations for human nonsynonymous and splice-site SNVs.
        Genome medicine. 2020; 12: 103
      5. Samocha KE, Kosmicki JA, Karczewski KJ, O’Donnell-Luria AH, Pierce-Hoffman E, MacArthur DG, et al. (2017): Regional missense constraint improves variant deleteriousness prediction. bioRxiv.148353.

        • Vaser R.
        • Adusumalli S.
        • Leng S.N.
        • Sikic M.
        • Ng P.C.
        SIFT missense predictions for genomes.
        Nat Protoc. 2016; 11: 1-9
      6. Adzhubei I, Jordan DM, Sunyaev SR (2013): Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet. Chapter 7:Unit7.20.

        • Rentzsch P.
        • Schubach M.
        • Shendure J.
        • Kircher M.
        CADD-Splice-improving genome-wide variant effect prediction using deep learning-derived splice scores.
        Genome medicine. 2021; 13: 31
        • Lek M.
        • Karczewski K.J.
        • Minikel E.V.
        • Samocha K.E.
        • Banks E.
        • Fennell T.
        • et al.
        Analysis of protein-coding genetic variation in 60,706 humans.
        Nature. 2016; 536: 285-291
        • Samocha K.E.
        • Robinson E.B.
        • Sanders S.J.
        • Stevens C.
        • Sabo A.
        • McGrath L.M.
        • et al.
        A framework for the interpretation of de novo mutation in human disease.
        Nature genetics. 2014; 46: 944-950
        • Karczewski K.J.
        • Francioli L.C.
        • Tiao G.
        • Cummings B.B.
        • Alföldi J.
        • Wang Q.
        • et al.
        The mutational constraint spectrum quantified from variation in 141,456 humans.
        Nature. 2020; 581: 434-443
        • Abyzov A.
        • Urban A.E.
        • Snyder M.
        • Gerstein M.
        CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing.
        Genome research. 2011; 21: 974-984
        • Zhu M.
        • Need A.C.
        • Han Y.
        • Ge D.
        • Maia J.M.
        • Zhu Q.
        • et al.
        Using ERDS to infer copy-number variants in high-coverage genomes.
        American journal of human genetics. 2012; 91: 408-421
        • Layer R.M.
        • Chiang C.
        • Quinlan A.R.
        • Hall I.M.
        LUMPY: a probabilistic framework for structural variant discovery.
        Genome biology. 2014; 15: R84
        • Chen X.
        • Schulz-Trieglaff O.
        • Shaw R.
        • Barnes B.
        • Schlesinger F.
        • Kallberg M.
        • et al.
        Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications.
        Bioinformatics. 2016; 32: 1220-1222
        • Trost B.
        • Walker S.
        • Wang Z.
        • Thiruvahindrapuram B.
        • MacDonald J.R.
        • Sung W.W.L.
        • et al.
        A Comprehensive Workflow for Read Depth-Based Identification of Copy-Number Variation from Whole-Genome Sequence Data.
        American journal of human genetics. 2018; 102: 142-155
      7. (2013): The Genotype-Tissue Expression (GTEx) project. Nature genetics. 45:580-585.

      8. (2017): Meta-analysis of GWAS of over 16,000 individuals with autism spectrum disorder highlights a novel locus at 10q24.32 and a significant overlap with schizophrenia. Mol Autism. 8:21.

        • Takata A.
        • Miyake N.
        • Tsurusaki Y.
        • Fukai R.
        • Miyatake S.
        • Koshimizu E.
        • et al.
        Integrative Analyses of De Novo Mutations Provide Deeper Biological Insights into Autism Spectrum Disorder.
        Cell Rep. 2018; 22: 734-747
        • Chang X.
        • Lima L.A.
        • Liu Y.
        • Li J.
        • Li Q.
        • Sleiman P.M.A.
        • et al.
        Common and Rare Genetic Risk Factors Converge in Protein Interaction Networks Underlying Schizophrenia.
        Front Genet. 2018; 9: 434
        • Cederlöf M.
        • Bergen S.E.
        • Långström N.
        • Larsson H.
        • Boman M.
        • Craddock N.
        • et al.
        The association between Darier disease, bipolar disorder, and schizophrenia revisited: a population-based family study.
        Bipolar disorders. 2015; 17: 340-344
        • Young W.
        Review of lithium effects on brain and blood.
        Cell Transplant. 2009; 18: 951-975