The goat pan-genome reveals patterns of gene loss during domestication

Fenghua Lv

doi:10.1186/s40104-024-01092-7

Short report

The goat pan-genome reveals patterns of gene loss during domestication

¹ ,¹ ,¹ ,¹ ,¹ ,¹

Abstract

Background

Unveiling genetic diversity features and understanding the genetic mechanisms of diverse goat phenotypes are pivotal in facilitating the preservation and utilization of these genetic resources. However, the total genetic diversity within a species can’t be captured by the reference genome of a single individual. The pan-genome is a collection of all the DNA sequences that occur in a species, and it is expected to capture the total genomic diversity of the specific species.

Results

We constructed a goat pan-genome using map-to-pan assemble based on 813 individuals, including 723 domestic goats and 90 samples from their wild relatives, which presented a broad regional and global representation. In total, 146 Mb sequences and 974 genes were identified as absent from the reference genome (ARS1.2; GCF_001704415.2). We identified 3,190 novel single nucleotide polymorphisms (SNPs) using the pan-genome analysis. These novel SNPs could properly reveal the population structure of domestic goats and their wild relatives. Presence/absence variation (PAV) analysis revealed gene loss and intense negative selection during domestication and improvement.

Conclusions

Our research highlights the importance of the goat pan-genome in capturing the missing genetic variations. It reveals the changes in genomic architecture during goat domestication and improvement, such as gene loss. This improves our understanding of the evolutionary and breeding history of goats.

Keywords

Agricultural GeneticsConservation genomicsDomesticationGenetic mappingGenomeGenome EvolutionGenome duplicationGoatImprovementPan-genomePresence/absence variation

Background

Under long-term adaptation to diverse environments and artificial selection, goats have evolved various phenotypes and specific genomic patterns compared with their wild ancestors [1]. In addition, domestic goats have incorporated genetic elements from multiple wild species, such as Caucasian tur (Capra caucasica) and Bezoar (Capra aegagrus) during the domestication and dispersal process [2]. Thus, the genomes of goats exhibit complex genetic characteristics [3]. Unveiling genetic diversity features and understanding the genetic mechanisms of diverse phenotypes of goats are pivotal in facilitating the preservation and utilization of these genetic resources.

Over the last decade, reference genomes have provided a roadmap and a fundamental framework for discovering genomic features. By aligning short reads to a reference genome, the genomic variations, such as single nucleotide polymorphisms (SNPs) and small insertions and deletions (INDELs), were identified, which elucidated the genetic mechanisms of the diverse phenotypes in humans [4], animals [5], and plants [6]. The current goat reference genome was generated from one individual and used in most goat genomic studies [7, 8]. However, the genetic diversity and genomic patterns within a species can’t be fully captured by the reference genome from a single individual [9]. The pan-genome is the nonredundant set of all DNA sequences in a specific species [10]. Pan-genome research in humans [10], animals [11], and plants [12] has provided novel scientific insights into the genetic diversity features and genetic mechanisms of diverse phenotypes. Compared with a reference genome from a single individual, a pan-genome could be applied to detect structural variations (SVs), such as copy number variations (CNVs) and presence/absence variations (PAVs) [13, 14]. These variations may significantly impact phenotype more than SNPs by altering gene dosage, interrupting coding sequence (CDS), and affecting long-range gene regulation [15,16,17]. The first goat pan-genome was assembled by Li et al. [18] using 10 individuals, including 5 samples from Capra genus and 5 individuals from sibling genus. However, the limited sample size can’t fully represent goats’ genetic and phenotypic diversity and constrain the detection of SVs, such as PAVs.

To obtain the comprehensive genome sequence and improve our understanding of the genomic characteristics of goats, the map-to-pan strategy was used to construct a goat pan-genome based on 723 domestic goats and 90 samples from their wild relatives, covering genetically, phylogenetically and geographically diverse samples (Additional file 1: Table S1). To identify the previously unknown SNPs, we aligned the reads that couldn’t be mapped to ARS1.2 to the nonreference sequences. PAVs were detected among the wild, native, and improved populations to reveal genetic changes through domestication and breeding history. This research will enhance our understanding of the changes in the genetic diversity and genomic architecture during domestication and improvement in goats.

Methods

Genome data collection, processing, and pan-genome construction

A total of 723 whole genome sequencing data of domestic goats from 85 populations and 90 samples from 6 wild goat species of Capra genus were downloaded from the National Center for Biotechnology Information Sequence Read Archive database (NCBI, https://www.ncbi.nlm.nih.gov) (Fig. 1, Table 1, Additional file 1: Table S1). The map-to-pan strategy was used to construct a goat pan-genome [19]. More specifically, raw Illumina reads were filtered to remove adapters and low-quality sequences using Trimmomatic (v0.39) [20] with parameters “SLIDINGWINDOW:4:15 MINLEN:50”. The high-quality reads of each individual were mapped to ARS1.2 using BWA (v0.7.12) [21] with the default parameters. The reads unmapped to ARS1.2 were extracted using SAMtools (v1.9) [22] with parameters "-b -f 4", "-b -f 68 -F 8" and "-b -f 132 -F 8", respectively. Subsequently, the unmapped reads were merged into a unified BAM file by SAMtools (v1.9) [22], followed by conversion of the BAM file to FASTQ format using bamtools (v2.5.2) (https://github.com/pezmaster31/bamtools). The FASTQ file was divided into paired FASTQ files corresponding to the forward and reverse strands utilizing an in-house Python script. Then, the paired unmapped reads were assembled using MaSuRCA (v4.1.0) [23] with parameters “USE_LINKING_MATES = 1, JF_SIZE = 5,000,000,000, FLYE_ASSEMBLY = 0”. Contigs assembled with a length exceeding 500 bp were retained for further analysis. We polished the contigs following 4 steps. Firstly, the Mummer (v4.0.0-beta2) [24] was employed to filter the initial contigs by aligning them to the ARS1.2. Contigs exhibiting an identity of ≥ 90% and a coverage of ≥ 80% were subsequently excluded. Secondly, the CD-HIT (v4.8.1) [25] was applied for filtering, where the assembly contigs were aligned against themselves, and contigs displaying an identity ≥ 90% and coverage ≥ 90% were removed. Thirdly, all contigs were carried out a search within the NCBI-NT database using BLASTN (v 2.14.0) [26] to discern sequences associated with archaea, viruses, bacteria, fungi, and viridiplantae, and identified sequences from these categories were eliminated. Finally, the Kraken2 (v2.0.9-beta) [27] was employed to classify and filter the remaining contigs using the kraken2-microbial database, encompassing sequences of archaea, bacteria, fungi, protozoa, viruses, and humans. The unclassified contigs were retained and integrated with ARS1.2 to assemble a goat pan-genome.

Table 1 Summary of sample in this study

Full size table

Annotation of the nonreference sequences

The nonreference sequences were annotated using a pipeline that integrated three distinct methodologies: ab initio gene prediction, RNA-Seq, and protein homology [28]. Firstly, the repetitive sequences were identified by scanning a custom repeat library and Bovidae repeat library in RepeatMasker (v4.1.2-p1) [29] and masked in subsequent annotation analysis, in which a custom repeat library was built by screening the nonreference sequence using RepeatModeler (v2.0.3) [30]. Secondly, ab initio gene prediction was conducted using Augustus (v3.1.0) [31] under the Bovidae model with default parameters, and SNAP (2006-07-28) [32] with the guidance of MAKER2 (v2.31.8) [28]. Thirdly, we collected 37 RNA-Seq data from 28 tissues as transcript evidence (Additional file 1: Table S2). The RNA-Seq reads were filtered to remove the adapter and low-quality sequences using Trimmomatic (v0.39) [20]. The clean reads were aligned to the nonreference sequences using Hisat2 (v2.2.1) [33], and the gene models were subsequently constructed from the result of alignments using StringTie (v2.1.7) [34]. We further downloaded protein sequences of Homo sapiens, Mus musculus, Canis lupus familiaris, Bos taurus, Sus scrofa, Ovis aries, and Capra hircus from RefSeq, and the sequences were aligned to the nonreference sequences using Spaln (v3.0.3) [35]. Finally, we integrated gene predictions based on evidence of ab initio gene prediction, RNA-Seq, and protein homology following the MAKER2 pipeline [28], and a set of high-confidence gene models were obtained using MAKER2 (v2.31.8) [28]. We further investigated gene description by aligning the protein sequences to the UniProt database (https://ftp.uniprot.org/pub/databases) using BLASTP (v2.11.0) [26].

Novel SNP calling and population genetic analysis

The reads of each sample that couldn't be mapped to ARS1.2 were mapped to the nonreference sequence using BWA-MEM (v0.7.17-r1188) [21] with the default parameters. The duplicated mapped reads were filtered using MarkDuplicates module in GATK (v4.1.9.0) [36]. Short variations (SNPs and INDELs) were identified by HaplotypeCaller module in GATK (v4.1.9.0) [36]. GenomicsDBImport module in GATK (v4.1.9.0) [36] was used to generate a database for GVCF files of all samples. And then, genotypes of variations were detected using the GenotypeGVCFs module in GATK (v4.1.9.0) [36]. SNPs were selected by the SelectVariants module in GATK (v4.1.9.0) [36] and filtered out potential false-positive loci using the VariantFiltration module of GATK (v4.1.9.0) [36] under the parameters “QD < 2.0 || MQ < 40.0 || FS > 60.0 || SOR > 3.0 || MQRankSum < −12.5 || ReadPosRankSum < −8.0”. Biallelic SNPs were extracted using VCFtools (v0.1.15) [37] with parameters “--minDP 2 --min-alleles 2 --max-alleles 2”. We further filtered out SNPs with the following criteria using PLINK (v1.90b6.26) [38]: (1) minor allele frequency (MAF) < 0.01; (2) SNP call rate < 0.5; (3) individual call rate < 0.5. After the quality control, 3,190 novel SNPs were retained for subsequent analysis.

To investigate features of these novel SNPs, we performed population genetic analysis using novel SNP datasets, such as principal component analysis (PCA), an approximately maximum likelihood phylogenetic tree, and model-based clustering. We performed the above analysis based on two datasets: one comprising 723 domestic goats and the other including all the 813 individuals (Additional file 1: Table S1). PCA was performed using PLINK (v1.90b6.26) [38]. An approximately maximum likelihood phylogenetic tree was constructed using the FastTree (v2.1.11) [39] with default parameters. The final tree topology was visualized using iTOL tool [40]. We performed model-based clustering to estimate the ancestry of each individual using the ADMIXTURE (v1.3.0) [41] with the number of ancestry kinships (K) set to 2–8.

PAV calling and analysis

To investigate the characteristics of PAV in different populations, individuals with a minimum average sequencing depth of 10 × were retained for PAV calling [9, 13]. A total of 277 individuals were retained for PAV calling, including 85 wild, 121 native, and 71 improved goats (Additional file 1: Table S3). The presence or absence of each gene in each individual was determined using SGSGeneloss-based method [42]. We retrieved the longest transcript for each gene, considering it as the gene body, and subsequently calculated coverage based on this gene body [13]. If a minimum of two reads covered less than 5% of the exon regions, the gene was defined as absent in this individual; otherwise, it was classified as present [43].

We further investigated effects of artificial and natural selection, which were among the most important factors that reshaped the genome's architecture [44]. To identify genes under selection during domestication and improvement, we implemented PAV selection analysis between Bezoar (n = 24) and native goats (n = 121) to identify genes associated with domestication (Table 2). We also performed PAV selection analysis between native goats (n = 121) and improved dairy goats (n = 57) to identify genes related to improvement process (Table 2). Fisher’s exact test was conducted to determine the significance of the difference in the presence frequencies for each gene. A gene with a P value less than 0.005 was chosen as the putative gene.

Table 2 Candidate PAV genes associated with selection in goats

Full size table

Results

Pan-genome of Capra genus

A total of 813 individuals were collected, including 723 domestic goats and 90 wild goats from Europe, Africa, and Asia (Fig. 1, Additional file 1: Table S1), and were used to assemble a goat pan-genome. The average sequence coverage of all the individuals was 16.40× (1.33×–48.64×). After removing contaminants and redundancies, 146 Mb sequences absent from ARS1.2 were retained. These nonreference sequences contained 133,959 contigs (501 bp ≤ length of contig ≤ 113,829 bp) with an average length of 1,091 bp. In addition, 974 protein-coding genes were predicted in the nonreference genome (Additional file 1: Table S4). Ultimately, the goat pan-genome, comprising the ARS1.2 and the nonreference sequences, harbored 25,743 genes (21,546 protein-coding genes and 4,197 other types of genes) covering DNA length up to 3,068,830,139 bp. We also compared the size and number of protein-coding genes of the goat pan-genome with other published goat reference genomes, such as CHIR_1.0 (GCF_000317765.1) and CVASU_BBG_1.0 (GCA_004361675.1). The genome size and number of protein-coding genes in the goat pan-genome were higher than the two reference genomes (Additional file 1: Table S7).

Novel SNP calling and population genetic analysis

To identify missing SNPs in previous studies using a single reference genome, reads that couldn't be aligned to ARS1.2 were mapped to the nonreference sequences, and SNP calling was performed. We detected 3,190 high-quality novel SNPs in 813 samples. We conducted PCA, approximately-maximum-likelihood phylogenetic tree, and model-based clustering analysis using the novel SNP dataset. The results showed that samples from the same species tended to cluster (Fig. 2A, Additional files 2 and 3: Fig. S1 and S2). Most domestic goats were clustered based on their breeds (Additional file 4: Fig. S3), and domestic goats were split into 3 geographically structured groups (Asian, African, and European populations) (Fig. 2B–D, Additional files 2 and 3: Fig. S1 and S2). Population genetic analysis showed a close genetic relationship between Bezoar Ibex/Markhor and domestic goats (Fig. 2A, Additional files 2 and 3: Fig. S1 and S2). The pattern was consistent with previous studies using whole genome sequence datasets [1].

Characterization of gene PAVs

A total of 277 individuals with an average sequence depth of more than 10× (Additional file 1: Table S3) were used to perform PAV calling. The presence frequency of each gene in all 277 individuals was calculated. In accordance with previous pan-genome studies [9, 13, 45], the genes were grouped into four categories (core, softcore, shell, cloud) according to their presence frequencies: core gene presenting in all individuals (100%); softcore gene presenting in more than 99% of individuals but less than 100% of individuals; shell gene presenting in 1%–99% individuals and cloud gene presenting in less than 1% individuals. Finally, 23,098 genes (89.73%) were found in all 277 individuals and were classified as core genes. The remaining 2,645 genes were classified as dispensable genes, including 914 softcore genes (3.55%), 867 shell genes (3.37%), and 864 cloud genes (3.36%) (Fig. 3A). The distribution pattern of gene PAVs was consistent with pan-genome studies of other species, such as chicken [13], Brassica oleracea [46], and arabidopsis [47]. It showed moderately high conserved genes (core and softcore genes) (Fig. 3A). The model of the pan-genome size by 200 iteratively randomly sampling individuals from 1 to 277 indicated an open pan-genome with an estimated total of 25,394 genes (Fig. 3B). The result suggested that the goat pan-genome assembled in this study didn’t reach saturation and could not include all or nearly all Capra genus gene contents. We also noted a different distribution of gene PAV across various species or populations (Fig. 3C). A striking feature was the decrease in gene presence from wild species to native populations and further from native populations to improved populations (Fig. 3C).

Gene PAVs associated with domestication and improvement of goats

We compared gene presence between Bezoar and native goat populations (Table 2, Additional file 1: Table S2). The genes with higher frequencies in native goat populations than Bezoar population were considered as possible favorable genes, and genes with lower frequencies in native goat populations than Bezoar population were considered as possible unfavorable genes [9]. We identified 89 genes with significantly altered frequencies (Fig. 4A), comprising 15 favorable genes and 74 unfavorable genes (Fig. 4B). Annotation analysis indicated that all 89 genes were in ARS1.2 (Additional file 1: Table S5). Among the 15 favorable genes, CLEC2D (LOC102184901) was associated with immune and inflammatory responses (Table 2) and has been fixed in native goats (frequency = 99.17%) (Additional file 1: Table S5). We also detected genes related to immune and inflammatory responses in unfavorable genes, such as FAM26F (LOC102181592). FAM26F that encoding tetraspanin-like membrane glycoprote [48] was detected in all Bezoar populations and 64.46% of native populations (Additional file 1: Table S5). We performed a comparative analysis of gene presence between native goats and dairy goats. Of note, 35 favorable genes (with a higher frequency in improved goat populations compared with native goat populations) and 72 unfavorable genes (with a lower frequency in improved goat populations compared with native goat populations) were identified (Fig. 4C and D, Additional file 1: Table S6). Genes associated with immune and inflammatory responses were also found in unfavorable genes, such as GIMAP6 (LOC108635866) and CLECL1 (LOC108636138). GIMAP6 was detected in 96.69% of native goats and 54.39% of dairy goats (Additional file 1: Table S6) and has been reported to be associated with the regulation of autophagy, immune competence, and inflammation in mammals [49].

Notably, a prominent feature was the loss of genes during domestication and improvement (Fig. 4B and D). Specifically, 75 (83.15%) and 72 genes (67.29%) were identified as unfavorable genes, respectively (Additional file 1: Table S5 and S6). We further compared the gene count within each individual of the Bezoar population and the native goat and dairy goat populations. An average of 24,664, 24,623, and 24,615 genes were detected, showing a decreasing trend in gene number from wild species to native populations and from native populations to dairy populations (Fig. 4E).

Discussion

This study used the map-to-pan strategy to construct a goat pan-genome. Taking into account the potential impact of sample size and representativeness on the pan-genome in humans and other organisms [13, 50, 51], we collected a diverse range of samples, covering almost all goats and their wild relatives across various geographic regions (Fig. 1, Additional file 1: Table S1), surpassing the scope of the previous study [18]. A total of 146 Mb nonreference sequences and 974 additional genes were identified as absent from ARS1.2. Notably, both indices were higher than those in the previous goat pan-genome study [18]. Utilizing the goat pan-genome, we identified 3,190 novel SNPs, indicating that the pan-genome can capture a greater diversity of genetic variations than a single reference genome.

Modeling of the goat pan-genome size revealed an open pan-genome (Fig. 3B). The result suggested that 277 samples in our PAV analysis were insufficient to capture the full spectrum of genetic diversity within goats. One potential reason was the limited sample size. Our study’s sample size was moderate compared with other animals, such as 268 individuals in chicken [13] and 250 individuals in pigs [11]. Nevertheless, the mosaic domestication of goats in the Fertile Crescent has led to genetically and geographically distinct goat populations since the Neolithic period, and continues to influence their diversity today [3, 52]. In addition, a high level of interspecies sequence variation has been observed compared with intraspecies in Capra genus [53], a high proportion of wild goats (31%, Additional file 1: Table S3) in our PAV analysis could contribute to the heterogeneity observed in genomic sequences. Therefore, by incorporating additional samples, particularly from improved and native breeds outside of China, PAV analysis will encompass a broader spectrum of goat gene contents.

Compared with other species, the goat pan-genome exhibited a higher proportion of core genes (89.73%), such as in pigs (75%) [54], chickens (76.32%) [13] and goose (75.86%) [55]. Although the proportion of core genes may decrease with an increase in sample size, 11% of variable genes within the goat pan-genome indicate substantial genomic potential for goat breeding. The PAV analysis unveiled gene loss events during domestication and improvement (Fig. 4). Comparable patterns have been observed in domesticated plants, such as tomato and cotton [9, 56]. The annotation of PAVs suggested that domestication and improvement may have influenced genomic features associated with immune and inflammatory responses [9, 56]. We also identified several immune and inflammation-related genes, including CLEC2D, FAM26F, GIMAP6, and CLECL1 (Table 2), that may be related to the domestication and improvement of goats. CLEC2D was found to be fixed (frequency = 0.99) within native goat populations. It plays a role in regulating immune and inflammatory responses as a ligand of CD161 receptor, and inhibiting the function of NK cells and T cells by its expression [57, 58]. FAM26F was observed to be fixed (frequency = 1.00) within Bezoar population. The gene could be activated by various infections, such as bacteria, viruses, and parasites, and expressed in various immune cells [48]. This observation is consistent with variable environments encountered by Bezoar. Bezoars are primarily distributed in a discontinuous range across Central Asia and the Caucasus, extending to southwestern Turkey [59]. Their natural habitats are characterized by harsh environmental conditions, including arid deserts and high-altitude rocky mountains [60]. GIMAP6 and CLECL1 exhibited a higher frequency in native goats than dairy goats (Additional file 1: Table S6) and are associated with immune responses [49, 61, 62]. This might also reflect the diverse environments encountered by native goats [53].

Conclusion

We constructed a goat pan-genome based on 813 individuals, including 723 domestic goats and 90 samples from their wild relatives, which presented a broad genetic and geographical representativeness. A total of 146 Mb sequences and 974 protein-coding genes were identified as absent from ARS1.2. Additionally, 3,190 novel SNPs were identified from the nonreference sequences, which suggested that the goat pan-genome could capture more genetic variations. PAV analysis revealed evidence of gene loss during domestication and subsequent improvement processes. PAV selection analysis identified several genes related to immune regulation and inflammatory responses. This research enhances our understanding of the genomic changes throughout the history of goat breeding and highlight the importance of pan-genome in goat genomic studies.

Data Availability

The whole genome re-sequence data used for the study is publicly available under the sample accession numbers listed in Additional file 1 : Table S1. All scripts used for this work were performed using open-source software tools and are available from the corresponding authors upon request.

Abbreviations

CDS:: Coding sequence
CNV :: Copy number variation
INDELs:: Small insertions and deletions
MAF:: Minor allele frequency
PAV :: Presence/absence variation
PCA:: Principal component analysis
SNP:: Single nucleotide polymorphism
SV :: Structural variation

References

1.Denoyelle L, Talouarn E, Bardou P, Colli L, Alberti A, Danchin C, et al. VarGoats project.(2021)a dataset of 1159 whole-genome sequences to dissectCapra hircusglobal diversity.Genet Sel Evol.: 86.
10.1186/s12711-021-00659-6
Article|CAS|PubMed|Google Scholar
2.Pogorevc N, Dotsev A, Upadhyay M, Sandoval-Castellanos E, Hannemann E, Simčič M, et al. Whole-genome SNP genotyping unveils ancestral and recent introgression in wild and domestic goats. Mol Ecol. 2024;33.(2024)e17190.
10.1111/mec.17190
Article|CAS|PubMed|Google Scholar
3.Daly KG, Delser PM, Mullin VE, Scheu A, Mattiangeli V, Teasdale MD, et al. Ancient goat genomes reveal mosaic domestication in the Fertile Crescent. Science. 2018;361.(2018)85–8.: 85.
10.1126/science.aas9411
Article|CAS|PubMed|Google Scholar
4.Fan S, Spence JP, Feng Y, Hansen MEB, Terhorst J, Beltrame MH, et al. Whole-genome sequencing reveals a complex African population demographic history and signatures of local adaptation. Cell. 2023;186.(2023)923–39.: 923.
10.1016/j.cell.2023.01.042
Article|CAS|PubMed|Google Scholar
5.Li X, Yang J, Shen M, Xie X, Liu G, Xu Y, et al. Whole-genome resequencing of wild and domestic sheep identifies genes associated with morphological and agronomic traits. Nat Commun. 2020;11.(2020)2815.: 2815.
10.1038/s41467-020-16485-1
Article|CAS|PubMed|Google Scholar
6.Dang D, Guan Y, Zheng H, Zhang X, Zhang A, Wang H, et al. Genome wide association study and genomic prediction on plant architecture traits in sweet corn and waxy corn. Plants (Basel). 2023;12.(2023)303.: 303.
Article|CAS|PubMed|Google Scholar
7.Dong Y, Xie M, Jiang Y, Xiao N, Du X, Zhang W, et al. Sequencing and automated whole-genome optical mapping of the genome of a domestic goat (Capra hircus). Nat Biotechnol. 2013;31.(2013)135–41.: 135.
10.1038/nbt.2478
Article|CAS|PubMed|Google Scholar
8.Bickhart DM, Rosen BD, Koren S, Sayre BL, Hastie AR, Chan S, et al. Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat Genet. 2017;49.(2017)643–50.: 643.
10.1038/ng.3802
Article|CAS|PubMed|Google Scholar
9.Gao L, Gonda I, Sun H, Ma Q, Bao K, Tieman DM, et al. The tomato pan-genome uncovers new genes and a rare allele regulating fruit flavor. Nat Genet. 2019;51.(2019)1044–51.: 1044.
10.1038/s41588-019-0410-2
Article|CAS|PubMed|Google Scholar
10.Li Q, Tian S, Yan B, Liu C, Lam T-W, Li R, et al. Building a Chinese pan-genome of 486 individuals. Commun Bio. 2021;4.(2021)1016.: 1016.
10.1038/s42003-021-02556-6
Article|CAS|PubMed|Google Scholar
11.Li Z, Liu X, Wang C, Li Z, Jiang B, Zhang R, et al. The pig pangenome provides insights into the roles of coding structural variations in genetic diversity and adaptation. Genome Res. 2023;33.(2023)1833–47.: 1833.
10.1101/gr.277638.122
Article|CAS|PubMed|Google Scholar
12.Torkamaneh D, Lemay MA, Belzile F. The pan-genome of the cultivated soybean (PanSoy) reveals an extraordinarily conserved gene content. Plant Biotechnol J. 2021;19.(2021)1852–62.: 1852.
10.1111/pbi.13600
Article|CAS|PubMed|Google Scholar
13.Wang K, Hu H, Tian Y, Li J, Scheben A, Zhang C, et al. The chicken pan-genome reveals gene content variation and a promoter region deletion inIGF2BP1affecting body size. Mol Biol Evol. 2021;38.(2021)5066–81.: 5066.
10.1093/molbev/msab231
Article|CAS|PubMed|Google Scholar
14.Gong Y, Li Y, Liu X, Ma Y, Jiang L. A review of the pangenome.(2023)how it affects our understanding of genomic variation, selection and breeding in domestic animals? J Anim Sci Biotechnol.: 73.
Article|CAS|PubMed|Google Scholar
15.Sudmant PH, Mallick S, Nelson BJ, Hormozdiari F, Krumm N, Huddleston J, et al. Global diversity, population stratification, and selection of human copy-number variation. Science. 2015;349.(2015)aab3761.
10.1126/science.aab3761
Article|CAS|PubMed|Google Scholar
16.Alkan C, Coe BP, Eichler EE. Genome structural variation discovery and genotyping. Nat Rev Genet. 2011;12.(2011)363–76.: 363.
10.1038/nrg2958
Article|CAS|PubMed|Google Scholar
17.Abel HJ, Larson DE, Regier AA, Chiang C, Das I, Kanchi KL, et al. Mapping and characterization of structural variation in 17,795 human genomes. Nature. 2020;583.(2020)83–9.: 83.
10.1038/s41586-020-2371-0
Article|CAS|PubMed|Google Scholar
18.Li R, Fu W, Su R, Tian X, Du D, Zhao Y, et al. Towards the complete goat pan-genome by recovering missing genomic segments from the reference genome. Front Genet. 2019;10.(2019)1169.: 1169.
10.3389/fgene.2019.01169
Article|CAS|PubMed|Google Scholar
19.Wang W, Mauleon R, Hu Z, Chebotarov D, Tai S, Wu Z, et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature. 2018;557.(2018)43–9.: 43.
10.1038/s41586-018-0063-9
Article|CAS|PubMed|Google Scholar
20.Bolger AM, Lohse M, Usadel B. Trimmomatic.(2014)a flexible trimmer for Illumina sequence data.Bioinformatics.: 2114.
10.1093/bioinformatics/btu170
Article|CAS|PubMed|Google Scholar
21.Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25.(2009)1754–60.: 1754.
10.1093/bioinformatics/btp324
Article|CAS|PubMed|Google Scholar
22.Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10.(2021)giab008.
10.1093/gigascience/giab008
Article|CAS|PubMed|Google Scholar
23.Sakaue S, Kanai M, Tanigawa Y, Karjalainen J, Kurki M, Koshiba S, et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet. 2021;53.(2021)1415–24.: 1415.
10.1038/s41588-021-00931-x
Article|CAS|PubMed|Google Scholar
24.Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. MUMmer4.(2018)a fast and versatile genome alignment system.PLoS Comput Biol.
10.1371/journal.pcbi.1005944
Article|CAS|PubMed|Google Scholar
25.Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT.(2012)accelerated for clustering the next-generation sequencing data.Bioinformatics.: 3150.
10.1093/bioinformatics/bts565
Article|CAS|PubMed|Google Scholar
26.Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+.(2009)architecture and applications.BMC Bioinformatics.: 421.
10.1186/1471-2105-10-421
Article|CAS|PubMed|Google Scholar
27.Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019;20.(2019)257.: 257.
10.1186/s13059-019-1891-0
Article|CAS|PubMed|Google Scholar
28.Holt C, Yandell M. MAKER2.(2011)an annotation pipeline and genome-database management tool for second-generation genome projects.BMC Bioinformatics.: 491.
10.1186/1471-2105-12-491
Article|CAS|PubMed|Google Scholar
29.Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 2009;25.(2009)10.1-14.: 4.
Article|CAS|PubMed|Google Scholar
30.Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA. 2020;117.(2020)9451–7.: 9451.
10.1073/pnas.1921046117
Article|CAS|PubMed|Google Scholar
31.Stanke M, Schöffmann O, Morgenstern B, Waack S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 2006;7.(2006)62.: 62.
10.1186/1471-2105-7-62
Article|CAS|PubMed|Google Scholar
32.Korf I. Gene finding in novel genomes. BMC Bioinformatics. 2004;5.(2004)59.: 59.
10.1186/1471-2105-5-59
Article|CAS|PubMed|Google Scholar
33.Kim D, Langmead B, Salzberg SL. HISAT.(2015)a fast spliced aligner with low memory requirements.Nat Methods.: 357.
10.1038/nmeth.3317
Article|CAS|PubMed|Google Scholar
34.Pertea M, Pertea GM, Antonescu CM, Chang T-C, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33.(2015)290–5.: 290.
10.1038/nbt.3122
Article|CAS|PubMed|Google Scholar
35.Iwata H, Gotoh O. Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features. Nucleic Acids Res. 2012;40.(2012)e161.
10.1093/nar/gks708
Article|CAS|PubMed|Google Scholar
36.DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43.(2011)491–8.: 491.
10.1038/ng.806
Article|CAS|PubMed|Google Scholar
37.Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27.(2011)2156–8.: 2156.
10.1093/bioinformatics/btr330
Article|CAS|PubMed|Google Scholar
38.Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK.(2007)A tool set for whole-genome association and population-based linkage analyses.Am J Hum Genet.: 559.
10.1086/519795
Article|CAS|PubMed|Google Scholar
39.Price MN, Dehal PS, Arkin AP. FastTree.(2009)computing large minimum evolution trees with profiles instead of a distance matrix.Mol Biol Evol.: 1641.
10.1093/molbev/msp077
Article|CAS|PubMed|Google Scholar
40.Letunic I, Bork P. Interactive tree of life (iTOL) v3.(2016)an online tool for the display and annotation of phylogenetic and other trees.Nucleic Acids Res.
10.1093/nar/gkw290
Article|CAS|PubMed|Google Scholar
41.Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19.(2009)1655–64.: 1655.
10.1101/gr.094052.109
Article|CAS|PubMed|Google Scholar
42.Golicz AA, Martinez PA, Zander M, Patel DA, Van De Wouw AP, Visendi P, et al. Gene loss in the fungal canola pathogen Leptosphaeria maculans. Funct Integr Genomics. 2015;15.(2015)189–96.: 189.
10.1007/s10142-014-0412-1
Article|CAS|PubMed|Google Scholar
43.Golicz AA, Bayer PE, Barker GC, Edger PP, Kim H, Martinez PA, et al. The pangenome of an agronomically important crop plantBrassica oleracea. Nat Commun. 2016;7.(2016)13390.: 13390.
10.1038/ncomms13390
Article|CAS|PubMed|Google Scholar
44.Saravanan KA, Panigrahi M, Kumar H, Bhushan B, Dutt T, Mishra BP. Selection signatures in livestock genome.(2020)a review of concepts, approaches and applications.Livest Sci.: 104257.
10.1016/j.livsci.2020.104257
Article|CAS|PubMed|Google Scholar
45.Gordon SP, Contreras-Moreira B, Woods DP, Des Marais DL, Burgess D, Shu S, et al. Extensive gene content variation in theBrachypodium distachyonpan-genome correlates with population structure. Nat Commun. 2017;8.(2017)2184.: 2184.
10.1038/s41467-017-02292-8
Article|CAS|PubMed|Google Scholar
46.Li X, Wang Y, Cai C, Ji J, Han F, Zhang L, et al. Large-scale gene expression alterations introduced by structural variation drive morphotype diversification inBrassica oleracea. Nat Genet. 2024;56.(2024)517–29.: 517.
10.1038/s41588-024-01655-4
Article|CAS|PubMed|Google Scholar
47.Kang M, Wu H, Liu H, Liu W, Zhu M, Han Y, et al. The pan-genome and local adaptation ofArabidopsis thaliana. Nat Commun. 2023;14.(2023)6259.: 6259.
10.1038/s41467-023-42029-4
Article|CAS|PubMed|Google Scholar
48.Malik U, Javed N. FAM26F.(2023)An enigmatic protein having a complex role in the immune system.Int Rev Immunol.: 247.
10.1080/08830185.2016.1206098
Article|CAS|PubMed|Google Scholar
49.Yao Y, Jiang P, Chao BN, Cagdas D, Kubo S, Balasubramaniyam A, et al.GIMAP6regulates autophagy, immune competence, and inflammation in mice and humans. J Exp Med. 2022;219.(2022)e20201405.
10.1084/jem.20201405
Article|CAS|PubMed|Google Scholar
50.Li R, Li Y, Zheng H, Luo R, Zhu H, Li Q, et al. Building the sequence map of the human pan-genome. Nat Biotechnol. 2010;28.(2010)57–63.: 57.
10.1038/nbt.1596
Article|CAS|PubMed|Google Scholar
51.Sherman RM, Forman J, Antonescu V, Puiu D, Daya M, Rafaels N, et al. Assembly of a pan-genome from deep sequencing of 910 humans of African descent. Nat Genet. 2019;51.(2019)30–5.: 30.
10.1038/s41588-018-0273-y
Article|CAS|PubMed|Google Scholar
52.Wu M, Wang D, Li M, Lv F. Artificial selection shapes the lower genomic diversity and higher selective pressures on the sex chromosomes of domestic animals. Sci China Life Sci. 2024;67.(2024)1072–5.: 1072.
10.1007/s11427-023-2478-5
Article|CAS|PubMed|Google Scholar
53.Zheng Z, Wang X, Li M, Li Y, Yang Z, Wang X, et al. The origin of domestication genes in goats. Sci Adv. 2020;6.(2020)eaaz5216.
10.1126/sciadv.aaz5216
Article|CAS|PubMed|Google Scholar
54.Tian X, Li R, Fu W, Li Y, Wang X, Li M, et al. Building a sequence map of the pig pan-genome from multiple de novo assemblies and Hi-C data. Sci China Life Sci. 2020;63.(2020)750–63.: 750.
10.1007/s11427-019-9551-7
Article|CAS|PubMed|Google Scholar
55.Gao G, Zhang H, Ni J, Zhao X, Zhang K, Wang J, et al. Insights into genetic diversity and phenotypic variations in domestic geese through comprehensive population and pan-genome analysis. J Anim Sci Biotechnol. 2023;14.(2023)150.: 150.
10.1186/s40104-023-00944-y
Article|CAS|PubMed|Google Scholar
56.Li J, Yuan D, Wang P, Wang Q, Sun M, Liu Z, et al. Cotton pan-genome retrieves the lost sequences and genes during domestication and selection. Genome Biol. 2021;22.(2021)119.: 119.
10.1186/s13059-021-02351-w
Article|CAS|PubMed|Google Scholar
57.Lai JJ, Cruz FM, Rock KL. Immune sensing of cell death through recognition of histone sequences by C-Type Lectin-Receptor-2d causes inflammation and tissue injury. Immunity. 2020;52.(2020)123–35.: 123.
10.1016/j.immuni.2019.11.013
Article|CAS|PubMed|Google Scholar
58.Mathewson ND, Ashenberg O, Tirosh I, Gritsch S, Perez EM, Marx S, et al. Inhibitory CD161 receptor identified in glioma-infiltrating T cells by single-cell analysis. Cell. 2021;184.(2021)1281–98.: 1281.
10.1016/j.cell.2021.01.022
Article|CAS|PubMed|Google Scholar
59.Amills M, Capote J, Tosser-Klopp G. Goat domestication and breeding.(2017)a jigsaw of historical, biological and molecular data with missing pieces.Anim Genet.: 631.
10.1111/age.12598
Article|CAS|PubMed|Google Scholar
60.Weinberg P, Ambarli H.Capra aegagrus. The IUCN Red List of Threatened Species. 2020;e.T3786A22145942.https.(2020)//dx. doi. org/10. 2305/IUCN.UK.
Article|CAS|PubMed|Google Scholar
61.Pascall JC, Webb LMC, Eskelinen E-L, Innocentin S, Attaf-Bouabdallah N, Butcher GW. GIMAP6 is required for T cell maintenance and efficient autophagy in mice. PLoS ONE. 2018;13.(2018)e0196504.
10.1371/journal.pone.0196504
Article|CAS|PubMed|Google Scholar
62.Ryan EJ, Marshall AJ, Magaletti D, Floyd H, Draves KE, Olson NE, et al. Dendritic cell-associated lectin-1.(2002)A novel dendritic cell-associated, C-type lectin-like molecule enhances T cell secretion of IL-41.J Immunol.: 5638.
10.4049/jimmunol.169.10.5638
Article|CAS|PubMed|Google Scholar

Acknowledgements

We thank the High-performance Computing Platform of China Agricultural University for providing computing resources.

Funding

This study was financially supported by grants from the Strategic Priority Research Program of Chinese Academy of Sciences (No. XDA24030205), the National Natural Science Foundation of China (Nos. U21A20246, 32102511), the National Key Research and Development Program-Key Projects (2021YFD1200900 and 2021YFD1300904), the Second Tibetan Plateau Scientific Expedition and Research Program (STEP) (No. 2019QZKK0501), Biological Breeding-National Science and Technology Major Project(2023ZD0407106), and Chinese Universities Scientific Fund (2024TC162).

Ethics Declaration

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Rights and Permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Reprints and permissions

[1] 1.Denoyelle L, Talouarn E, Bardou P, Colli L, Alberti A, Danchin C, et al. VarGoats project.(2021)a dataset of 1159 whole-genome sequences to dissectCapra hircusglobal diversity.Genet Sel Evol.: 86.
10.1186/s12711-021-00659-6
Article|CAS|PubMed|Google Scholar

[2] 2.Pogorevc N, Dotsev A, Upadhyay M, Sandoval-Castellanos E, Hannemann E, Simčič M, et al. Whole-genome SNP genotyping unveils ancestral and recent introgression in wild and domestic goats. Mol Ecol. 2024;33.(2024)e17190.
10.1111/mec.17190
Article|CAS|PubMed|Google Scholar

[3] 3.Daly KG, Delser PM, Mullin VE, Scheu A, Mattiangeli V, Teasdale MD, et al. Ancient goat genomes reveal mosaic domestication in the Fertile Crescent. Science. 2018;361.(2018)85–8.: 85.
10.1126/science.aas9411
Article|CAS|PubMed|Google Scholar

[4] 4.Fan S, Spence JP, Feng Y, Hansen MEB, Terhorst J, Beltrame MH, et al. Whole-genome sequencing reveals a complex African population demographic history and signatures of local adaptation. Cell. 2023;186.(2023)923–39.: 923.
10.1016/j.cell.2023.01.042
Article|CAS|PubMed|Google Scholar

[5] 5.Li X, Yang J, Shen M, Xie X, Liu G, Xu Y, et al. Whole-genome resequencing of wild and domestic sheep identifies genes associated with morphological and agronomic traits. Nat Commun. 2020;11.(2020)2815.: 2815.
10.1038/s41467-020-16485-1
Article|CAS|PubMed|Google Scholar

[6] 6.Dang D, Guan Y, Zheng H, Zhang X, Zhang A, Wang H, et al. Genome wide association study and genomic prediction on plant architecture traits in sweet corn and waxy corn. Plants (Basel). 2023;12.(2023)303.: 303.
Article|CAS|PubMed|Google Scholar

[7] 7.Dong Y, Xie M, Jiang Y, Xiao N, Du X, Zhang W, et al. Sequencing and automated whole-genome optical mapping of the genome of a domestic goat (Capra hircus). Nat Biotechnol. 2013;31.(2013)135–41.: 135.
10.1038/nbt.2478
Article|CAS|PubMed|Google Scholar

[8] 8.Bickhart DM, Rosen BD, Koren S, Sayre BL, Hastie AR, Chan S, et al. Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat Genet. 2017;49.(2017)643–50.: 643.
10.1038/ng.3802
Article|CAS|PubMed|Google Scholar

[9] 9.Gao L, Gonda I, Sun H, Ma Q, Bao K, Tieman DM, et al. The tomato pan-genome uncovers new genes and a rare allele regulating fruit flavor. Nat Genet. 2019;51.(2019)1044–51.: 1044.
10.1038/s41588-019-0410-2
Article|CAS|PubMed|Google Scholar

[10] 10.Li Q, Tian S, Yan B, Liu C, Lam T-W, Li R, et al. Building a Chinese pan-genome of 486 individuals. Commun Bio. 2021;4.(2021)1016.: 1016.
10.1038/s42003-021-02556-6
Article|CAS|PubMed|Google Scholar

[11] 11.Li Z, Liu X, Wang C, Li Z, Jiang B, Zhang R, et al. The pig pangenome provides insights into the roles of coding structural variations in genetic diversity and adaptation. Genome Res. 2023;33.(2023)1833–47.: 1833.
10.1101/gr.277638.122
Article|CAS|PubMed|Google Scholar

[12] 12.Torkamaneh D, Lemay MA, Belzile F. The pan-genome of the cultivated soybean (PanSoy) reveals an extraordinarily conserved gene content. Plant Biotechnol J. 2021;19.(2021)1852–62.: 1852.
10.1111/pbi.13600
Article|CAS|PubMed|Google Scholar

[13] 13.Wang K, Hu H, Tian Y, Li J, Scheben A, Zhang C, et al. The chicken pan-genome reveals gene content variation and a promoter region deletion inIGF2BP1affecting body size. Mol Biol Evol. 2021;38.(2021)5066–81.: 5066.
10.1093/molbev/msab231
Article|CAS|PubMed|Google Scholar

[14] 14.Gong Y, Li Y, Liu X, Ma Y, Jiang L. A review of the pangenome.(2023)how it affects our understanding of genomic variation, selection and breeding in domestic animals? J Anim Sci Biotechnol.: 73.
Article|CAS|PubMed|Google Scholar

[15] 15.Sudmant PH, Mallick S, Nelson BJ, Hormozdiari F, Krumm N, Huddleston J, et al. Global diversity, population stratification, and selection of human copy-number variation. Science. 2015;349.(2015)aab3761.
10.1126/science.aab3761
Article|CAS|PubMed|Google Scholar

[16] 16.Alkan C, Coe BP, Eichler EE. Genome structural variation discovery and genotyping. Nat Rev Genet. 2011;12.(2011)363–76.: 363.
10.1038/nrg2958
Article|CAS|PubMed|Google Scholar

[17] 17.Abel HJ, Larson DE, Regier AA, Chiang C, Das I, Kanchi KL, et al. Mapping and characterization of structural variation in 17,795 human genomes. Nature. 2020;583.(2020)83–9.: 83.
10.1038/s41586-020-2371-0
Article|CAS|PubMed|Google Scholar

[18] 18.Li R, Fu W, Su R, Tian X, Du D, Zhao Y, et al. Towards the complete goat pan-genome by recovering missing genomic segments from the reference genome. Front Genet. 2019;10.(2019)1169.: 1169.
10.3389/fgene.2019.01169
Article|CAS|PubMed|Google Scholar

[19] 19.Wang W, Mauleon R, Hu Z, Chebotarov D, Tai S, Wu Z, et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature. 2018;557.(2018)43–9.: 43.
10.1038/s41586-018-0063-9
Article|CAS|PubMed|Google Scholar

[20] 20.Bolger AM, Lohse M, Usadel B. Trimmomatic.(2014)a flexible trimmer for Illumina sequence data.Bioinformatics.: 2114.
10.1093/bioinformatics/btu170
Article|CAS|PubMed|Google Scholar

[21] 21.Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25.(2009)1754–60.: 1754.
10.1093/bioinformatics/btp324
Article|CAS|PubMed|Google Scholar

[22] 22.Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10.(2021)giab008.
10.1093/gigascience/giab008
Article|CAS|PubMed|Google Scholar

[23] 23.Sakaue S, Kanai M, Tanigawa Y, Karjalainen J, Kurki M, Koshiba S, et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet. 2021;53.(2021)1415–24.: 1415.
10.1038/s41588-021-00931-x
Article|CAS|PubMed|Google Scholar

[24] 24.Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. MUMmer4.(2018)a fast and versatile genome alignment system.PLoS Comput Biol.
10.1371/journal.pcbi.1005944
Article|CAS|PubMed|Google Scholar

[25] 25.Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT.(2012)accelerated for clustering the next-generation sequencing data.Bioinformatics.: 3150.
10.1093/bioinformatics/bts565
Article|CAS|PubMed|Google Scholar

[26] 26.Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+.(2009)architecture and applications.BMC Bioinformatics.: 421.
10.1186/1471-2105-10-421
Article|CAS|PubMed|Google Scholar

[27] 27.Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019;20.(2019)257.: 257.
10.1186/s13059-019-1891-0
Article|CAS|PubMed|Google Scholar

[28] 28.Holt C, Yandell M. MAKER2.(2011)an annotation pipeline and genome-database management tool for second-generation genome projects.BMC Bioinformatics.: 491.
10.1186/1471-2105-12-491
Article|CAS|PubMed|Google Scholar

[29] 29.Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 2009;25.(2009)10.1-14.: 4.
Article|CAS|PubMed|Google Scholar

[30] 30.Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA. 2020;117.(2020)9451–7.: 9451.
10.1073/pnas.1921046117
Article|CAS|PubMed|Google Scholar

[31] 31.Stanke M, Schöffmann O, Morgenstern B, Waack S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 2006;7.(2006)62.: 62.
10.1186/1471-2105-7-62
Article|CAS|PubMed|Google Scholar

[32] 32.Korf I. Gene finding in novel genomes. BMC Bioinformatics. 2004;5.(2004)59.: 59.
10.1186/1471-2105-5-59
Article|CAS|PubMed|Google Scholar

[33] 33.Kim D, Langmead B, Salzberg SL. HISAT.(2015)a fast spliced aligner with low memory requirements.Nat Methods.: 357.
10.1038/nmeth.3317
Article|CAS|PubMed|Google Scholar

[34] 34.Pertea M, Pertea GM, Antonescu CM, Chang T-C, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33.(2015)290–5.: 290.
10.1038/nbt.3122
Article|CAS|PubMed|Google Scholar

[35] 35.Iwata H, Gotoh O. Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features. Nucleic Acids Res. 2012;40.(2012)e161.
10.1093/nar/gks708
Article|CAS|PubMed|Google Scholar

[36] 36.DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43.(2011)491–8.: 491.
10.1038/ng.806
Article|CAS|PubMed|Google Scholar

[37] 37.Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27.(2011)2156–8.: 2156.
10.1093/bioinformatics/btr330
Article|CAS|PubMed|Google Scholar

[38] 38.Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK.(2007)A tool set for whole-genome association and population-based linkage analyses.Am J Hum Genet.: 559.
10.1086/519795
Article|CAS|PubMed|Google Scholar

[39] 39.Price MN, Dehal PS, Arkin AP. FastTree.(2009)computing large minimum evolution trees with profiles instead of a distance matrix.Mol Biol Evol.: 1641.
10.1093/molbev/msp077
Article|CAS|PubMed|Google Scholar

[40] 40.Letunic I, Bork P. Interactive tree of life (iTOL) v3.(2016)an online tool for the display and annotation of phylogenetic and other trees.Nucleic Acids Res.
10.1093/nar/gkw290
Article|CAS|PubMed|Google Scholar

[41] 41.Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19.(2009)1655–64.: 1655.
10.1101/gr.094052.109
Article|CAS|PubMed|Google Scholar

[42] 42.Golicz AA, Martinez PA, Zander M, Patel DA, Van De Wouw AP, Visendi P, et al. Gene loss in the fungal canola pathogen Leptosphaeria maculans. Funct Integr Genomics. 2015;15.(2015)189–96.: 189.
10.1007/s10142-014-0412-1
Article|CAS|PubMed|Google Scholar

[43] 43.Golicz AA, Bayer PE, Barker GC, Edger PP, Kim H, Martinez PA, et al. The pangenome of an agronomically important crop plantBrassica oleracea. Nat Commun. 2016;7.(2016)13390.: 13390.
10.1038/ncomms13390
Article|CAS|PubMed|Google Scholar

[44] 44.Saravanan KA, Panigrahi M, Kumar H, Bhushan B, Dutt T, Mishra BP. Selection signatures in livestock genome.(2020)a review of concepts, approaches and applications.Livest Sci.: 104257.
10.1016/j.livsci.2020.104257
Article|CAS|PubMed|Google Scholar

[45] 45.Gordon SP, Contreras-Moreira B, Woods DP, Des Marais DL, Burgess D, Shu S, et al. Extensive gene content variation in theBrachypodium distachyonpan-genome correlates with population structure. Nat Commun. 2017;8.(2017)2184.: 2184.
10.1038/s41467-017-02292-8
Article|CAS|PubMed|Google Scholar

[46] 46.Li X, Wang Y, Cai C, Ji J, Han F, Zhang L, et al. Large-scale gene expression alterations introduced by structural variation drive morphotype diversification inBrassica oleracea. Nat Genet. 2024;56.(2024)517–29.: 517.
10.1038/s41588-024-01655-4
Article|CAS|PubMed|Google Scholar

[47] 47.Kang M, Wu H, Liu H, Liu W, Zhu M, Han Y, et al. The pan-genome and local adaptation ofArabidopsis thaliana. Nat Commun. 2023;14.(2023)6259.: 6259.
10.1038/s41467-023-42029-4
Article|CAS|PubMed|Google Scholar

[48] 48.Malik U, Javed N. FAM26F.(2023)An enigmatic protein having a complex role in the immune system.Int Rev Immunol.: 247.
10.1080/08830185.2016.1206098
Article|CAS|PubMed|Google Scholar

[49] 49.Yao Y, Jiang P, Chao BN, Cagdas D, Kubo S, Balasubramaniyam A, et al.GIMAP6regulates autophagy, immune competence, and inflammation in mice and humans. J Exp Med. 2022;219.(2022)e20201405.
10.1084/jem.20201405
Article|CAS|PubMed|Google Scholar

[50] 50.Li R, Li Y, Zheng H, Luo R, Zhu H, Li Q, et al. Building the sequence map of the human pan-genome. Nat Biotechnol. 2010;28.(2010)57–63.: 57.
10.1038/nbt.1596
Article|CAS|PubMed|Google Scholar

[51] 51.Sherman RM, Forman J, Antonescu V, Puiu D, Daya M, Rafaels N, et al. Assembly of a pan-genome from deep sequencing of 910 humans of African descent. Nat Genet. 2019;51.(2019)30–5.: 30.
10.1038/s41588-018-0273-y
Article|CAS|PubMed|Google Scholar

[52] 52.Wu M, Wang D, Li M, Lv F. Artificial selection shapes the lower genomic diversity and higher selective pressures on the sex chromosomes of domestic animals. Sci China Life Sci. 2024;67.(2024)1072–5.: 1072.
10.1007/s11427-023-2478-5
Article|CAS|PubMed|Google Scholar

[53] 53.Zheng Z, Wang X, Li M, Li Y, Yang Z, Wang X, et al. The origin of domestication genes in goats. Sci Adv. 2020;6.(2020)eaaz5216.
10.1126/sciadv.aaz5216
Article|CAS|PubMed|Google Scholar

[54] 54.Tian X, Li R, Fu W, Li Y, Wang X, Li M, et al. Building a sequence map of the pig pan-genome from multiple de novo assemblies and Hi-C data. Sci China Life Sci. 2020;63.(2020)750–63.: 750.
10.1007/s11427-019-9551-7
Article|CAS|PubMed|Google Scholar

[55] 55.Gao G, Zhang H, Ni J, Zhao X, Zhang K, Wang J, et al. Insights into genetic diversity and phenotypic variations in domestic geese through comprehensive population and pan-genome analysis. J Anim Sci Biotechnol. 2023;14.(2023)150.: 150.
10.1186/s40104-023-00944-y
Article|CAS|PubMed|Google Scholar

[56] 56.Li J, Yuan D, Wang P, Wang Q, Sun M, Liu Z, et al. Cotton pan-genome retrieves the lost sequences and genes during domestication and selection. Genome Biol. 2021;22.(2021)119.: 119.
10.1186/s13059-021-02351-w
Article|CAS|PubMed|Google Scholar

[57] 57.Lai JJ, Cruz FM, Rock KL. Immune sensing of cell death through recognition of histone sequences by C-Type Lectin-Receptor-2d causes inflammation and tissue injury. Immunity. 2020;52.(2020)123–35.: 123.
10.1016/j.immuni.2019.11.013
Article|CAS|PubMed|Google Scholar

[58] 58.Mathewson ND, Ashenberg O, Tirosh I, Gritsch S, Perez EM, Marx S, et al. Inhibitory CD161 receptor identified in glioma-infiltrating T cells by single-cell analysis. Cell. 2021;184.(2021)1281–98.: 1281.
10.1016/j.cell.2021.01.022
Article|CAS|PubMed|Google Scholar

[59] 59.Amills M, Capote J, Tosser-Klopp G. Goat domestication and breeding.(2017)a jigsaw of historical, biological and molecular data with missing pieces.Anim Genet.: 631.
10.1111/age.12598
Article|CAS|PubMed|Google Scholar

[60] 60.Weinberg P, Ambarli H.Capra aegagrus. The IUCN Red List of Threatened Species. 2020;e.T3786A22145942.https.(2020)//dx. doi. org/10. 2305/IUCN.UK.
Article|CAS|PubMed|Google Scholar

[61] 61.Pascall JC, Webb LMC, Eskelinen E-L, Innocentin S, Attaf-Bouabdallah N, Butcher GW. GIMAP6 is required for T cell maintenance and efficient autophagy in mice. PLoS ONE. 2018;13.(2018)e0196504.
10.1371/journal.pone.0196504
Article|CAS|PubMed|Google Scholar

[62] 62.Ryan EJ, Marshall AJ, Magaletti D, Floyd H, Draves KE, Olson NE, et al. Dendritic cell-associated lectin-1.(2002)A novel dendritic cell-associated, C-type lectin-like molecule enhances T cell secretion of IL-41.J Immunol.: 5638.
10.4049/jimmunol.169.10.5638
Article|CAS|PubMed|Google Scholar

Journal of Animal Science and Biotechnology

The goat pan-genome reveals patterns of gene loss during domestication

Abstract

Background

Results

Conclusions

Keywords

Background

Methods

Genome data collection, processing, and pan-genome construction

Annotation of the nonreference sequences

Novel SNP calling and population genetic analysis

PAV calling and analysis

Results

Pan-genome of Capra genus

Novel SNP calling and population genetic analysis

Characterization of gene PAVs

Gene PAVs associated with domestication and improvement of goats

Discussion

Conclusion

Data Availability

Abbreviations

References

Acknowledgements

Funding

Ethics Declaration

Ethics approval and consent to participate

Consent for publication

Competing interests

Rights and Permissions