Integrating parental genomes to reduce reference bias and identify intramuscular fat genes in Qinchuan Black pigs

Taiyong Yu

doi:10.1186/s40104-025-01236-3

Methodology article

Integrating parental genomes to reduce reference bias and identify intramuscular fat genes in Qinchuan Black pigs

¹ ,¹ ,¹ ,^1,2 ,¹ ,¹ ,¹ ,^1,3 ,¹ ,¹

1.Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, Laboratory of Animal Fat Deposition & Muscle Development, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China

2.Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China

3.Key Laboratory of Agroecological Processes in Subtropical Region, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha, 410125, China

Corresponding author: Rongrong Ding (dingrongrong92@nwafu.edu.cn)

Contributions: Author notes Guangquan Lv and Peiyu Yang contributed equally to this work. Authors and Affiliations Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, Laboratory of Animal Fat Deposition & Muscle Development, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, 712100, China Guangquan Lv, Peiyu Yang, Ao Guo, Minghao Cao, Dong Li, Zhe Liu, Mingyu Wang, Jingchun Sun, Rongrong Ding & Taiyong Yu Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, 650201, China Minghao Cao Key Laboratory of Agroecological Processes in Subtropical Region, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha, 410125, China Jingchun Sun Authors Guangquan Lv View author publications Search author on: PubMed Google Scholar Peiyu Yang View author publications Search author on: PubMed Google Scholar Ao Guo View author publications Search author on: PubMed Google Scholar Minghao Cao View author publications Search author on: PubMed Google Scholar Dong Li View author publications Search author on: PubMed Google Scholar Zhe Liu View author publications Search author on: PubMed Google Scholar Mingyu Wang View author publications Search author on: PubMed Google Scholar Jingchun Sun View author publications Search author on: PubMed Google Scholar Rongrong Ding View author publications Search author on: PubMed Google Scholar Taiyong Yu View author publications Search author on: PubMed Google Scholar Contributions LGQ: Writing – original draft, Visualization, Software, Methodology, Investigation, Formal analysis, Conceptualization, Validation. YPY: Software, Methodology, Visualization, Resources, Investigation. GA: Validation, Software, Formal analysis. CMH: Software, Investigation. LD Software, Methodology. LZ: Investigation, Formal analysis. WMY: Investigation, Formal analysis. SJC: Formal analysis. DRR and YTY: Writing – review & editing, Supervision, Project administration, Conceptualization. Corresponding authors Correspondence to Rongrong Ding or Taiyong Yu .

Abstract

Background

Traditional genomic analysis relies on a single reference genome, which struggles to effectively characterize the genetic diversity among populations. This is due to the substantial genetic differences between the genome of the studied species and the reference genome, potentially introducing reference bias.

Results

In this study, we focused on Guanzhong Black pigs (GZB), Danish Large White pigs (DLW), and their hybrid offspring, Qinchuan Black pigs (QCB). We provided two high-quality parental genomes at the chromosomal level and constructed a parental genomic reference panel to detect SNPs (single nucleotide polymorphisms), INDELs (insertions and deletions), and SVs (structural variations). Compared with the single-reference method, the integrated parental genomic strategy identified 5.48% more SNPs and 67.84% more INDELs. The uniformity of variant distribution and genome functional annotation remained consistent before and after integration, while the ratio of non-reference/non-reference genotypes was also improved. In population genetic structure analysis, principal component analysis (PCA) of the three variant types (SNPs, INDELs, and SVs) exhibited good clustering effects, and ADMIXTURE analysis demonstrated consistent stratification. Selection signal analysis based on the integrated parental genomic strategy successfully identified more differentiated windows and positively selected genes. By leveraging multiple variant types and employing two selection signal methods, we jointly identified several novel intramuscular fat candidate genes (MSMO1,SMC6,CCDC158,KIT,CCNC, etc.), which could not be identified by the single-reference method alone. Functional validation of the geneMSMO1revealed its role in promoting intramuscular adipocyte proliferation and inhibiting adipogenic differentiation.

Conclusions

This study is the first to construct a parental genomic reference panel specifically for pig hybrid populations, which significantly reduces reference bias and exhibits superior performance in downstream analyses. This strategy offers new possibilities for genomic selection breeding of livestock and establishes a methodological foundation for precisely dissecting complex traits in hybrid populations.

Keywords

Comparative GenomicsComparative genomic hybridizationConservation genomicsGenetic mappingGenomic AnalysisGenotypingIntramuscular fatParental genomePigQuantitative trait

Background

Long-read sequencing has revolutionized genomics, offering critical insights into genome complexity, structural variations, and genetic diversity [1]. Unlike short-read sequencing, which may have difficulty processing repetitive regions and miss some hidden variants (long fragment insertions, methylation modifications, etc.) [2,3,4]. Long-read sequencing is conducive to improving the continuity, accuracy, and scope of variant phasing [5], thereby achieving more accurate de novo assembly and mapping to more genetic variations. This progress is particularly valuable in the field of animal genetics, where high-quality genome assembly can better identify previously undiscovered variants associated with desired traits [6,7,8,9,10]. In recent years, with the advancement of bioinformatics tools, a variety of new methods have been proposed that have significantly improved the performance of long-read sequencing in variant detection and data processing, making it an extremely powerful tool for exploring genomes [11].

A key challenge in population genomics is the reliance on a single reference genome, which can introduce biases when studying populations that diverge greatly from the reference genome, particularly in highly diverse individuals [12, 13]. For populations with large phylogenetic distances or rich genetic diversity (e.g., hybrid populations), analyses based on a single reference can obscure variant frequencies and mask important genetic information [14, 15]. Because single-reference approaches often underestimate or simplify genetic diversity, they fail to fully capture the unique variation in a population’s genome. Utilizing a population-specific reference genome can reduce these biases, improve the reliability of genetic variant calls, and enhance downstream analyses [16]. Ideally, for hybrid populations, genotyping that incorporates both parental genomes would capture all genetic variation segregating within the population [17], creating a comprehensive reference framework for more efficient dissection of the genetic basis of complex traits.

Qinchuan Black pig (QCB) is a hybrid breed derived from the crossing of Guanzhong Black pig (GZB) and Danish Large White pig (DLW). Renowned for its high-quality meat and intramuscular fat content [18], QCB holds significant research value. Given the notable genetic differences between its parental breeds, traditional single-reference genome-based analyses may encounter substantial limitations. The strategy of integrating both parental genomes holds great promise in the genomic studies of QCB. By combining the genomic information of the two parents, it may potentially enhance the comprehensiveness of genetic variation detection. Since the genetic composition of QCB is a combination of the genomes of two distinct breeds, using a traditional single-reference strategy may overlook many breed-specific and hybrid-related genetic variations. The strategy of integrating both parental genomes may be able to identify these concealed variations, which is crucial for dissecting the genetic mechanisms underlying complex traits in QCB. Furthermore, by incorporating the genomic information of both parents, this strategy may effectively improve the completeness of variation detection and the accuracy of genotyping. It also aids in revealing the genomic characteristics of various types of variations, such as Structural Variation (SVs) and Small Insertion and Deletion (INDELs).

In addition to the research value of the QCB breed itself, the integration strategy of both parental genomes also has broad application potential in the entire livestock genomics. For genome selection and genetic map construction in livestock breeding, this method can improve the resolution of complex traits, especially in cross-breeding projects, and is expected to significantly reduce bias in genotyping, thereby improving breeding accuracy. By integrating the genomes of both parents, breeders have the potential to more accurately capture genetic variations associated with key traits such as disease resistance, rapid growth, and high meat quality, providing a scientific basis for selecting desirable traits in the breeding process and thereby effectively supporting future selective breeding efforts.

Materials and methods

Sample information

Regarding the breeding of QCB pigs, in the early stage, through phenotypic comparison and other analyses, GZB sows (n = 200) and DLW boars (n = 21) were selected as the F0 generation parental stock. By crossing GZB♀ (sows) × DLW♂ (boars), an F1 generation heterozygous population was obtained. From the F2 generation to the F4 generation, pigs from different families within each generation were mated, and superior individuals were selected for the next generation based on growth, reproductive and other relevant indicators. Starting from the F5 generation, based on the collected phenotypic data, genomic selection (GS) and other analytical methods were employed to conduct closed-group breeding among superior individuals, gradually fixing favorable haplotypes. Up to now, the QCB pigs have been bred to the F6 generation.

Fresh blood was collected from a GZB sow at the Guanzhong Black Pig Core Breeding Farm (Xianyang, Shaanxi Province, China) as the DNA source for generating de novo genome assembly sequence data. In addition, we collected genomic data from 40 DLW pigs, 31 GZB pigs, and 37 QCB pigs. The DLW pigs and GZB pigs all originated from the same F0 generation population or their offspring, and the QCB pigs were all from the F5 generation. These re-sequencing data were downloaded from the NCBI Sequence Read Archive (SRA; https://www.ncbi.nlm.nih.gov/sra/; Table S1).

Genome sequencing

For cycle consensus sequencing (CCS), SMRT bell libraries (15 kb) were constructed according to the PacBio protocol and sequenced on the PacBio Sequel II (e) platform to generate HiFi reads. A total of 10,168,284 reads were generated with a coverage of 35.42 ×. For short-read DNA sequencing, the sequencing library was constructed according to the MGISEQ protocol and sequenced on the DNBSEQ-T7 platform, with paired-end sequencing of 150 bp.

Genome assembly and annotation

For the genome of GZB pig, we used a fast haplotype-resolved de novo assembler, Hifiasm (v0.16.0; with default parameters) [19] to assemble the PacBio HiFi reads. To generate non-redundant genome sequences, the haplotigs and contig overlaps in the initial assemblies were removed using the purge_dups program, which is based on PacBio read depth and a K-mer counting strategy. The primary contig assembly was scaffolded to chromosome level using RagTag (v2.0.1) [20] with the reference genome assembly (Sscrofa 11.1). Repetitive sequences were identified de novo from the genome of GZB pig using RepeatModeler (v2.0.1) [21]. Then, RepeatMasker (v4.1.1) [22] was used to mask the genome using the de novo repeat library and Repbase (v22.11; https://www.girinst.org/repbase/) [23]. Genome annotation was performed on the repeat-masked pseudochromosome genome assembly using Liftoff (v1.6.1) [24] based on the reference genome. Functional annotation was achieved by comparing predicted proteins against public databases with an E-value threshold of 1e-5, including NCBI non-redundant protein sequences database (Nr), SwissProt, and TrEMBL using Diamond BLASTP (v2.0.11.149) [25]. QUAST (v5.0.2) [26] and BUSCO (v5.2.2) [27] were used with the vertebrate odb10 database to assess genome assembly quality and completeness.

Single reference genotyping

We genotyped 108 high-depth (average sequencing depth 16.6 ×) samples (Table S1). Adapters and low-quality reads were filtered out using fastp (v0.20.0) [28] with default parameters. Clean reads were mapped to the reference genome (Sscrofa 11.1) using BWA-MEM (v0.7.17) [29] with default parameters. SNPs (single nucleotide polymorphisms) and INDELs (insertions and deletions) calling was performed using the GATK pipeline [30] and HaplotypeCaller models.

Construction of pan-genome graphs and graph-based genotyping

The DLW genome has been assembled in previous research studies [10]. The reference genome, the GZB pig genome, and the DLW pig genome were visualized using NGenomeSyn (v2.0) [31] and Minimap2 (v2.28) [32], demonstrating high collinearity of most chromosomal regions.

We ran the minigraph-cactus pipeline using the cactus (v2.4.2) [33] Docker image and the nextflow pipeline (https://github.com/WarrenLab/minigraph-cactus-nf) built for this purpose. We took the GZB pig genome and the DLW pig genome as input. We specified Sscrofa 11.1 as the reference because, while it is not the highest quality assembly, it is the best RefSeq annotated assembly.

The base-level pan-genome map generated above was preprocessed using the genotyping pipeline (https://bitbucket.org/jana_ebler/hprc-experiments/src/chm13-based-pipeline/genotyping-experiments/). The three genomes were subsequently integrated into the biallelic VCF generated here. We genotyped a total of 108 samples from the hybrid population using PanGenie (v3.00) (Table S1) [34]. The integration of variant sets was based on single reference genotyping and graph-based genotyping. Prior to this, we removed a small number of multi-allelic variants and variants containing deletions. For overlapping sites, variants with a genotype consistency exceeding 90% were retained. The following variants were excluded: those with a proportion of missing genotypes greater than 10%, and those with a minor allele frequency (MAF) lower than 5%. For both SNPs and INDELs, variants that deviated from Hardy–Weinberg equilibrium (HWE) with a significance threshold set at P < 0.000001 were also removed.

The variants were further annotated and classified into SNPs, INDELs, other synonymous or non-synonymous variants, intronic variants, and variants located in upstream or downstream regions of genes or intergenic regions using VEP (v110.0) [35].

Phylogenetic and population genomic analyses

We utilized variants that had undergone quality control procedures and employed VCFtools (v0.1.16) [36] and Plink (v1.90) [37] software to convert the VCF-formatted file into Plink binary format. Subsequently, we used the top two principal components (PCs) to assign the 108 pig accessions to PCA clusters. To elucidate the genetic ancestry and the admixture patterns among breeds, further population structure was investigated using ADMIXTURE (v1.3.0) [38] with the optimal population size chosen from K = 2 as that with least error after resampling for cross-validation.

Selective scan analysis

Here, variants were filtered based on the criteria of a missing genotype proportion less than 10% and a minor allele frequency (MAF) below 0.05. Fst values and XP-CLR scores between QCB pigs and DLW pigs were calculated using VCFtools and the XP-CLR package (v1.1.2) [39] with a sliding window of 50 kb and a step size of 25 kb. Results were visualized using the qqman R package (v0.1.4) [40]. The top 5% of regions were designated as candidate selection regions, and the genes in these regions were considered candidate genes. Candidate selected regions were mapped to Animal QTLdb for QTL annotation [41].

We conducted a genome-wide association study (GWAS) based on an integration strategy to identify candidate loci under selection. The approach employed was EigenGWAS [42], which utilized the eigenvector (the first principal component; PC1) derived from principal component analysis (PCA) as the "phenotype" to analyze quality-controlled SNPs, INDELs, and SVs from 37 QCB pigs and 40 DLW pigs. The fact that PC1 clearly distinguished QCB pigs from DLW pigs supported the application of EigenGWAS (Fig. S4). SNPs, INDELs, and SVs with Bonferroni-corrected P-values less than 0.05/n were considered significant loci. We performed gene annotation for the regions 100 kb upstream and downstream of the loci.

Isolation of porcine intramuscular preadipocytes

All experiments were conducted in accordance with relevant guidelines and regulations. Select a healthy 3-day-old male Qinchuan Black piglet (F5 generation), washed in warm water and surface sterilized with 75% ethanol. The longest dorsal muscle was taken and placed in a petri dish under aseptic conditions and rinsed twice with PBS containing 2% penicillin–streptomycin (2% penicillin–streptomycin: PBS = 2:100). The fascia of the longest dorsal muscle was removed and then cut into small pieces and placed into a 50-mL centrifuge tube to which trypsin was added and the tissue was digested on a water bath shaker at 37 °C until the tissue was loosened. At the end of digestion, the tissue was allowed to settle and the supernatant was collected in a centrifuge tube after being filtered through 70 mesh and 200 mesh filters and centrifuged at 1,800 r/min for 10 min and the supernatant was discarded. The cells were washed by adding PBS to the centrifuge tube, centrifuged at 1,500 r/min for 5 min, the supernatant discarded, the cells were resuspended by adding complete medium, and inoculated into Petri dishes and incubated at 37 °C for 2.5 h at 5% CO₂, the supernatant discarded and replaced with complete medium for further incubation.

Cell culture and differentiation of pig intramuscular adipocytes

We used DMEM:F12 = 1:1 medium (DMEM Hyclone, Logan, UT, USA) to culture pig intramuscular adipocytes. The medium was supplemented with 10% fetal bovine serum and 2% penicillin–streptomycin. Cells were cultured in an incubator environment at 37 °C with 5% carbon dioxide until the cells grew to the point of contact inhibition. After 2 d of contact inhibition, the cells were switched to differentiation induction medium I (10 µmol/L insulin, 1 µmol/L DEXA, 0.5 mmol/L IBMX, 1 µmol/L rosiglitazone) (Sigma, Shanghai, China) for 2 d. The culture was then switched to induction medium II (10 µmol/L insulin) for 6 d and the medium was changed every 2 d until convergence of fat droplets was seen.

RT-qPCR and Western blot

Total RNA was extracted from pig intramuscular adipocytes using TRIzol reagent (Takara Bio, Otsu, Japan) according to the manufacturer’s instructions. cDNA was reverse transcribed from RNA using PrimeScript RT kit (Takara Bio, Otsu, Japan). Real-time quantitative polymerase chain reaction (RT-qPCR) was performed using a SYBR premixed Ex Taq kit (Vazyme Biotech, Nanjing, China). All primers are shown in Table S2.

After rinsing the cells with PBS, total protein was extracted from the supernatant using RIPA Lysis Buffer (Beyotime, Shanghai, China) containing phosphatase inhibitor A, phosphatase inhibitor B, and protease inhibitors. Protein samples were separated, and then transferred to a polyvinylidene difluoride (PVDF) membrane (Millipore, Bedford, MA, USA). After rinsing 3 times with 1 × TBST, secondary antibody (IgG 1:1,000, Goat Anti-Rabbit IgG, Boster, BA1039; Boster Biological Technology, Pleasanton, CA, USA) was added and incubated for 1.5 h at room temperature. Immunoblots were scanned using a gel imager (Bio-Rad, USA) and the lightness and darkness of the bands were analyzed using an image analysis application (ImageJ).

Statistical analysis of data

GraphPad (v8.3.0) software was used for significance analysis and multiple comparison. For the comparison between two groups, an independent-samples t-test was conducted. P < 0.05 was defined as the significance threshold. RT-qPCR data through the 2^-ΔΔCt method analysis.

Results

GZB pig genome assembly, genome characterization, and genome variation

For de novo assembly of the GZB pig genome, we used PacBio long-read sequencing (35.4 × coverage) to generate a chromosome-scale genome assembly for GZB pig (Table S3 and S4). The contig N50 length of the GZB pig genome is 42.07 Mb, which was similar to the official reference genome assembly (Sscrofa 11.1; 48.2 Mb) (Table 1) [43]. Evaluation of base accuracy and completeness of the genome assembly using Quast [26] and BUSCO [27] indicated the high quality of the genome assemblies (Table 1).

Table 1 Summary statistics of pig genome assemblies

Full size table

The percentage of repetitive sequences was 43.1% in the GZB pig genome (Table S5). Combining evidence from ab initio predictions and protein homology, a total of 20,813 protein-coding genes were predicted in the GZB pig genome. BUSCO analysis showed that the predicted genes in the GZB pig genome captured 91.7% of the core conserved animal genes (Table 1).

The three genome assemblies showed significant differences in the number, type, and relative abundance of transposable elements (TEs) (Table S5). The most common TE class found in the pig genome corresponds to LINE (long interspersed nuclear element) retrotransposons (Table S5). We observed a high level of colinearity between the parental genome of QCB pigs and Sscrofa 11.1, and there may be more insertions, deletions, or rearrangements on some chromosomes (Fig. 1b–e).

Integration and comparative analysis of genotyping results based on traditional single reference genome and two parental genomes

We used Sscrofa 11.1 as a reference and constructed a small pan-genome containing Sscrofa 11.1 and the parental genomes of QCB pigs based on whole-genome alignment (Fig. S1). We found that there were some large structural variations between the parental genomes of QCB pigs and Sscrofa 11.1 (Table S6). A total of 10,649,404 SNPs, 3,216,185 INDELs and 61,078 SVs were identified from the pan-genome, and the distribution of SNPs and INDELs on the genome was similar (Fig. 1a). Among them, INDELs and SVs were integrated into the reference genome, resulting in a pan-genome containing a total of 28.59 Mb of sequences not present in the reference genome. Of these sequences, 15.50 Mb and 14.67 Mb came from the GZB pig and DLW pig genomes, respectively, and 2.27 Mb came from two germplasms (Table S6).

Initially, we used the traditional single reference genome strategy to genotype 108 high-depth samples from the QCB pig hybrid population (Fig. S1a). We then merged the single reference genome with the graph-based genotyping results (Fig. S1c). We set the identifier for single reference genome typing as SR, the identifier for graph-based genotyping as Pan, and the identifier for the merger of the two as Merged.

The SR method is implemented based on tools such as BWA (Burrows-Wheeler Aligner) and GATK (Genome Analysis Toolkit). This method usually involves aligning short read sequences to a fixed reference genome and calling SNPs and INDELs by identifying variant sites in the alignment. Although this strategy performs well in the detection of SNPs and INDELs, its reliance on a single reference genome may lead to reference bias, especially in regions with high repetitiveness and variation in the genome.

PanGenie, which we used in the Pan method, is an accurate and efficient genotyping algorithm that improves genotyping performance by utilizing the pan-genome reference and k-mer counts of haplotype resolution, and is particularly outstanding in variant identification in large insertion and repeat regions [34]. We input the variant set in the upstream pan-genome into PanGenie, and use it as a graph-based typing panel to type the population reads.

The SR method can call variants from population reads, while the Pan method is only based on the variant set obtained during the construction of the pan-genome. In the case of a closed hybrid population, the variants detected by the SR method will increase with the increase in the number of samples and gradually reach saturation. The results of the Pan method typing directly depend on the variant set obtained during the construction of the pan-genome, and are less correlated with the number of population samples. This difference makes the results full of diversity and uncertainty.

In order to reduce the false positives caused by sequencing data in the SR method and improve the typing effect, we used high-depth (average depth of 16.6 ×) sample data (Table S1). After excluding multi-allelic sites and sites with missing sample genotypes, we counted the number of different variant types before and after quality control (Fig. 2a and b; Table S7), and the number of variants increased after merging (4.26% increase in SNPs and 58.40% increase in INDELs before quality control; 5.48% increase in SNPs and 67.84% increase in INDELs after quality control), which was different from the expected increase ratio for SNPs and INDELs. The increase ratio of the total number of variants after merging was 2.12% higher after filtering (12.65%) than the original (10.53%), which was caused by the small change of the variant set obtained by the pan method before and after quality control.

After merging the variants obtained by the two methods, the proportion of calls for both heterozygous reference/non-reference and non-reference/non-reference genotypes increased (Fig. 2c; Table S8). INDELs had a higher proportion of heterozygous reference/non-reference and non-reference/non-reference genotypes than SNPs, while SVs had the most non-reference/non-reference genotypes (Fig. 2c; Table S8). This suggests that the method based on the two parental genomes not only improves the detection ability of variants, but also that the diversity of SVs and INDELs is richer which may reflect the genetic variability and selection pressure of these variants in the population.

We compared the density of SNPs and INDELs obtained by the SR method, SVs obtained by the Pan method, and SNPs and INDELs obtained by merging the two methods on the whole genome (Fig. 2d). For SNPs and INDELs, the uniformity of the variant density after merging was generally improved, which may be because the typing based on the two parental genomes supplemented the insufficient detection of traditional single reference methods in some areas and reduced the influence of "reference bias". The annotation results showed that regardless of the type of variation, most of them were located in intergenic or intronic regions, and the proportional distributions of different variant types before and after data integration exhibited a high degree of consistency (Fig. 2e; Table S9). This reflects that the introduction of the two parental genomes did not significantly change the functional annotation distribution of variation types, but expanded the variation detection in these regions.

Population structure

To validate the integration strategy, we performed a population structure analysis to verify that the population structure was consistent with the structure inferred from the traditional single reference genome strategy. The principal component plot shows that all individuals from the same breed cluster together, the clustering in the principal component plot based on SNPs became more pronounced after applying the integration strategy. (Fig. 3a; Fig. S2a). Both before and after applying the integration strategy, the percentage of variance explained by INDELs was higher than that by SNPs. Moreover, the percentage of variance explained by PC1 for SVs was greater than that for both SNPs and INDELs, which highlights the potential of INDELs and SVs for population genetic structure analysis. Cluster analysis based on maximum likelihood estimation divided the samples into three independent clusters: GZB pigs, DLW pigs and QCB pigs (Fig. 3b; Fig. S2b). Our results show that the population structures derived from SNPs, INDELs and SVs are similar. QCB pigs each possess more than half DLW pig ancestry, which is likely a result of artificial selection during the breeding process.

Integrative strategy reveals novel candidate genes for intramuscular fat in pigs

Significant differences in intramuscular fat content between QCB and DLW pigs were found in a previous study, and the samples used in that study were from the same population as those used in this study [18]. We used two statistical methods (Fst and XPCLR) to search for regions in the genome, and regions with values in the top 5% were considered candidate genomic regions (Fig. 4a–d; Fig. S3).

In this study, SNPs-based genome scanning results showed that the Fst value of the variant set after integration was significantly higher than that before integration (The average difference was 0.004458 ± 3.9 × 10⁻⁵, t = 114.836, P < 0.0001), and the XPCLR value also significantly improved (mean difference 0.083877 ± 2.4 × 10⁻², t = 3.483, P = 0.0005). This trend indicates that after integrating the mutation set, the differentiation signal of SNPs among populations is stronger. In the genome scan of INDELs, we observed that the Fst value of the integrated variant set also increased (The average difference was 0.015884 ± 1.27 × 10⁻⁴, t = 125.036, P < 0.0001), while the XPCLR value decreased (The mean difference is −0.217820 ± 1.61 × 10⁻², t = −13.550, P < 0.0001). By annotating the top 5% of candidate regions, we counted the number of windows, the number of annotated candidate genes, the percentage of overlapping abnormal genes before and after integration, and QTL annotation entries (Table 2). The results of SNPs-based genome scanning showed that the number of windows and the number of annotated candidate genes corresponding to the integration strategy increased. The INDELs-based Fst results also show a similar trend, indicating that the integration process may help reveal more candidate regions. The results of INDELs-based XPCLR analysis showed that although the number of windows increased after integrating the variant set, the number of candidate genes decreased. In addition, INDELs-based Fst analysis also showed that the percentage of overlapping abnormal genes before and after integration was significantly lower than other gene types. This phenomenon suggests that the sensitivity of the XPCLR method may vary when dealing with INDELs variants or may be affected by specific selection signal patterns.

Table 2 Summary of outlier windows, outlier genes, overlapping genes, and QTL annotations

Full size table

We annotated QTL for the first 5% of the selected scanned areas. The results showed that QTL related to meat and carcass had the highest detection rate (Fig. 4f). To identify genes that specifically regulate intramuscular fat deposition, we analyzed RNA-seq data from the longissimus dorsi muscle of QCB pigs and DLW pigs [18]. By integrating genes identified by selection signals and transcriptome screening for DEGs related to fat metabolism, we identified 14 candidate genes most likely to affect IMF in pigs (Fig. 4e; Table S10).

EigenGWAS was conducted using three types of genetic variations (SNPs, INDELs, and SVs). The results of PCA analysis indicated that the first eigenvector could successfully separate QCB pigs and DLW pigs into distinct groups. However, the second eigenvector did not show a significant effect in distinguishing these two pig breeds and failed to achieve group separation (Fig. S4a, d, g). Therefore, the first eigenvector was used as a phenotype for EigenGWAS, which identified a total of 5,115 significant SNPs, 2,149 significant INDELs, and 62 significant SVs (Fig. S4b, c, e, f, h, i). Unlike the results from Fst and XPCLR analyses, these significant loci were almost exclusively concentrated on chromosome 8. After annotation, it was found that they, along with the selection signals identified previously, jointly pointed to the genes KIT and MSMO1 (Table S11).

We observed that both the Fst values and the XPCLR scores in the MSMO1 gene region are generally high, yet the XPCLR scores based on SNPs in this region are all 0 (Fig. 4g). We constructed haplotypes of the 200 kb flanking genomic region of the MSMO1 gene and found a consistent haplotype pattern (Fig. 4h). Subsequently, we screened for variants within the exons of the MSMO1 gene and found a missense mutation (Chr8:43,738,941; rs81509043). This mutation was all Homozygous reference in the Large White pig population, but only 56% was a homozygous reference in the Qinchuan Black pig population (41% were heterozygous variant and 3% were homozygous variant). The differential expression of this gene may be due to differences in protein structure and function caused by exon variations.

MSMO1 promotes pig intramuscular adipocyte proliferation and inhibits lipogenic differentiation

To further investigate the role of MSMO1 on adipogenic differentiation, we selected pig intramuscular adipocytes for subsequent experiments. RT-qPCR results showed that mRNA expression of proliferation-related genes CDK4, CDK6 and cyclin D was significantly increased by overexpression of MSMO1 (Fig. 5a and b). Protein expression of proliferation-related genes was also significantly increased in the treated group compared to the control group (Fig. 5c and d). We simultaneously inhibited with the gene using siRNA in the cells, which showed a significant decrease in the expression of some proliferation-related genes (Fig. 5e–h). The above results suggest that MSMO1 promotes the proliferation of pig intramuscular adipocytes.

Next, we explored the effect of MSMO1 on adipocyte lipogenic differentiation. The results showed that after overexpressing MSMO1 and inducing differentiation for 8 d, the mRNA expression of genes related to adipogenic differentiation, C/EBPα, PPARγ, and FABP4, was significantly reduced (Fig. 5i and j). Protein expression of C/EBPα and FABP4 showed the same trend (Fig. 5k and l). Inhibition of MSMO1 showed opposite results, and we found a significant increase in the expression of lipogenic differentiation genes (Fig. 5m–p). Together, these results suggest that MSMO1 inhibits lipogenic differentiation of pig intramuscular adipocytes.

Discussion

The Danish Large White pig (DLW) is a widely used lean pig breed, and the Guanzhong Black pig (GZB) is an excellent meat-fat dual-purpose breed developed in China in the early days. Both breeds are well-represented in the study of fat deposition. Although there have been comparative studies on pig fat deposition and some fat-related genes have been reported, most of these studies are based on mapping short-read data onto reference genomes to analyze differences between species or different physiological stages [44, 45]. However, reference genomes can introduce mapping biases that significantly affect downstream analysis and inference. Therefore, although the traditional single reference genome method occupies an important position in the field of genotyping, its limitations have prompted the development of new genotyping tools and strategies. A recent three-spined stickleback study demonstrated that the use of local reference genomes can provide higher mapping efficiency and reduce the proportion of missing data, thereby increasing the confidence of downstream analyses [16]. In addition, it is generally accepted in the genetics community that the parental genome contains the complete genomic information of the offspring. In a previously reported study of melon populations, a graph-based approach using parental genomes was able to improve variant calling rate and calling accuracy [17]. However, this strategy has not been reported in livestock.

This study provided a more complete and accurate genetic variation panel for offspring samples based on an integration strategy, and ultimately identified candidate genes associated with intramuscular fat (IMF) content in pigs after analysis. Compared with the traditional single-reference genome method, the comprehensiveness and accuracy of variant detection are significantly improved after integrating the genomes of both parents. Specifically, this strategy integrates data from the reference genome and parental genome, resulting in a markedly increased number of detected variants (including SNPs, INDELs, and SVs), with an even greater increase observed after quality control. There may be two reasons for this. One is that the typing panel used by the Pan method has a high similarity with the variants in the population sample; the other is that the PanGenie algorithm itself is more sensitive to variants in different samples, thereby reducing false positives caused by differences between samples. Furthermore, the detection rate of non-reference genotypes has improved, and the characteristics of variants have shown greater consistency, which further substantiates the reliability of the integration strategy.

We have identified an interesting phenomenon: when based on SNPs, the MSMO1 gene region along with its 150 kb upstream and downstream regions exhibit high Fst values, while the XPCLR score is 0. This phenomenon may be explained by the intense selective pressure this gene underwent during the early stages of breeding, leading to its fixation and subsequent population differentiation (high Fst). However, the XPCLR method becomes ineffective due to the absence of allele frequency gradients, and it fails to detect any recent selection signals. In particular, the integrated selection signal analysis identified more differential windows and differential genes, identifying 14 of the most promising candidate genes: MSMO1, SMC6, CCDC158, KIT, CCNC, IRS2, TRIB2, NREP, SPARCL1, EXOC6, RAPGEF1, SYCP2, RND3 and MTHFD2L (Table S10). The KIT gene is a crucial determinant of variations in pig coat color [46, 47]. In Yorkshire pigs, after eliminating the duplication and splicing mutations in the KIT gene, although there was no change in coat color, the meat color improved and the red blood cell count increased [48]. This suggests that KIT may indirectly influence meat color by affecting erythropoiesis rather than directly participating in melanin synthesis, providing new clues for unraveling the link between coat color genes and meat quality. Chinese indigenous black pigs often exhibit both black coats and high-quality meat, implying a potential correlation between coat color genes and meat quality traits. In the future, by precisely regulating the KIT gene and its interaction network, it may be possible to breed improved pig breeds that retain desirable coat colors while enhancing meat quality. The MSMO1 gene has been reported to negatively regulate adipogenesis in 3T3-L1 cells and is associated with IMF in chickens [49, 50]. This study further elucidates the dual regulatory functions of MSMO1: it stimulates the proliferation of pig primary intramuscular adipocytes while concurrently suppressing their adipogenic differentiation. Remarkably, most of these genes were screened in pigs for the first time. This suggests that it is promising to integrate parental genome information for downstream analysis to identify candidate genes.

Conclusion

In this study, we demonstrated the advantages of integrating biparental genomic information for variant detection and population genomic studies. Compared with traditional single-reference methods, the integration of biparental genomes enhanced the detection of genetic variants, including SNPs, INDELs, and SVs. The integrated strategy improved population structure analysis and revealed clearer clustering patterns. In addition, selection signal analysis identified candidate genes associated with IMF content, including MSMO1. Functional validation showed that MSMO1 promoted intramuscular adipocyte proliferation and inhibited adipogenic differentiation, indicating its potential to be a key regulator of IMF in pigs. Overall, our results highlight the use of strategies integrating biparental genomic information to improve variant detection and enhance our understanding of complex traits, providing new insights for future genetic research in livestock breeding.

Data Availability

Data are available from the corresponding author on reasonable request.

Abbreviations

C/EBPα:: CCAAT/enhancer binding protein alpha
CCS:: Circular consensus sequencing
CDK4:: Cyclin-dependent kinase 4
CDK6:: Cyclin-dependent kinase 6
cDNA:: Complementary deoxyribonucleic acid
DEGs:: Differentially expressed genes
DEXA:: Dexamethasone
DLW:: Danish Large White pig
DMEM:: Dulbecco’s Modified Eagle’s Medium
FABP4:: Fatty acid binding protein 4
Fst:: Fixation index
GZB:: Guanzhong Black pig
IBMX:: 3-Isobutyl-1-methylxanthine/Isobutylmethylxanthine
IM:: Intramuscular
IMF:: Intramuscular fat
INDELs:: Insertions and deletions
LINE:: Long interspersed nuclear element
MAF:: Minor allele frequency
mRNA:: Messenger ribonucleic acid
MSMO1:: Methylsterol monooxygenase 1
PBS:: Phosphate buffer saline
PCA:: Principal component analysis
PPARγ:: Peroxisome proliferator-activated receptor gamma
PVDF:: Polyvinylidene difluoride
QCB:: Qinchuan Black pig
RIPA:: Radioimmunoprecipitation assay
RNA:: Ribonucleic acid
RT-qPCR:: Real-time quantitative polymerase chain reaction
siRNA:: Small interfering RNA
SNPs:: Single nucleotide polymorphisms
SVs:: Structural variants
TBST:: Tris-buffered saline + Tween
TEs:: Transposable elements
XPCLR:: Cross-population composite likelihood ratio

References

1.Espinosa E, Bautista R, Larrosa R, Plata O. Advancements in long-read genome sequencing technologies and algorithms. Genomics. 2024;11.(2024)110842.: 110842.
10.1016/j.ygeno.2024.110842
Article|CAS|PubMed|Google Scholar
2.Sedlazeck FJ, Lee H, Darby CA, Schatz MC. Piercing the dark matter.(2018)bioinformatics of long-range sequencing and mapping.Nat Rev Genet.: 329.
10.1038/s41576-018-0003-4
Article|CAS|PubMed|Google Scholar
3.Mahmoud M, Gobet N, Cruz-Dávalos DI, Mounier N, Dessimoz C, Sedlazeck FJ. Structural variant calling.(2019)the long and the short of it.Genome Biol.: 1.
Article|CAS|PubMed|Google Scholar
4.Ho SS, Urban AE, Mills RE. Structural variation in the sequencing era. Nat Rev Genet. 2020;21(3).(2020)171–89.: 171.
10.1038/s41576-019-0180-9
Article|CAS|PubMed|Google Scholar
5.De Coster W, Weissensteiner MH, Sedlazeck FJ. Towards population-scale long-read sequencing. Nat Rev Genet. 2021;22(9).(2021)572–87.: 572.
10.1038/s41576-021-00367-3
Article|CAS|PubMed|Google Scholar
6.Li X, He S-G, Li W-R, Luo L-Y, Yan Z, Mo D-X, et al. Genomic analyses of wild argali, domestic sheep, and their hybrids provide insights into chromosome evolution, phenotypic variation, and germplasm innovation. Genome Res. 2022;32(9).(2022)1669–84.: 1669.
10.1101/gr.276769.122
Article|CAS|PubMed|Google Scholar
7.Miao J, Wei X, Cao C, Sun J, Xu Y, Zhang Z, et al. Pig pangenome graph reveals functional features of non-reference sequences. J Anim Sci Biotechnol. 2024;15.(2024)32.: 32.
10.1186/s40104-023-00984-4
Article|CAS|PubMed|Google Scholar
8.Leonard AS, Mapel XM, Pausch H. Pangenome-genotyped structural variation improves molecular phenotype mapping in cattle. Genome Res. 2024;34(2).(2024)300–9.: 300.
10.1101/gr.278267.123
Article|CAS|PubMed|Google Scholar
9.Rice ES, Alberdi A, Alfieri J, Athrey G, Balacco JR, Bardou P, et al. A pangenome graph reference of 30 chicken genomes allows genotyping of large and complex structural variants. BMC Biol. 2023;21.(2023)267.: 267.
10.1186/s12915-023-01758-0
Article|CAS|PubMed|Google Scholar
10.Li D, Wang Y, Yuan T, Cao M, He Y, Zhang L, et al. Pangenome and genome variation analyses of pigs unveil genomic facets for their adaptation and agronomic characteristics. IMeta. 2024;3(6).(2024)e257.
10.1002/imt2.257
Article|CAS|PubMed|Google Scholar
11.Nguyen TV, Vander Jagt CJ, Wang J, Daetwyler HD, Xiang R, Goddard ME, et al. In it for the long run.(2023)perspectives on exploiting long-read sequencing in livestock for population scale studies of structural variants.Genet Sel Evol.: 9.
10.1186/s12711-023-00783-5
Article|CAS|PubMed|Google Scholar
12.Sherman RM, Forman J, Antonescu V, Puiu D, Daya M, Rafaels N, et al. Assembly of a pan-genome from deep sequencing of 910 humans of African descent. Nat Genet. 2019;51.(2019)30–5.: 30.
10.1038/s41588-018-0273-y
Article|CAS|PubMed|Google Scholar
13.Sherman RM, Salzberg SL. Pan-genomics in the human genome era. Nat Rev Genet. 2020;21(4).(2020)243–54.: 243.
10.1038/s41576-020-0210-7
Article|CAS|PubMed|Google Scholar
14.Bohling J. Evaluating the effect of reference genome divergence on the analysis of empirical RADseq datasets. Ecol Evol. 2020;10(14).(2020)7585–601.: 7585.
10.1002/ece3.6483
Article|CAS|PubMed|Google Scholar
15.Prasad A, Lorenzen ED, Westbury MV. Evaluating the role of reference-genome phylogenetic distance on evolutionary inference. Mol Ecol Resour. 2022;22(1).(2022)45–55.: 45.
10.1111/1755-0998.13457
Article|CAS|PubMed|Google Scholar
16.Thorburn DMJ, Sagonas K, Binzer-Panchal M, Chain FJ, Feulner PG, Bornberg-Bauer E, et al. Origin matters.(2023)using a local reference genome improves measures in population genomics.Mol Ecol Resour.: 1706.
10.1111/1755-0998.13838
Article|CAS|PubMed|Google Scholar
17.Vaughn JN, Branham SE, Abernathy B, Hulse-Kemp AM, Rivers AR, Levi A, et al. Graph-based pangenomics maximizes genotyping density and reveals structural impacts on fungal resistance in melon. Nat Commun. 2022;13.(2022)7897.: 7897.
10.1038/s41467-022-35621-7
Article|CAS|PubMed|Google Scholar
18.Yu T, Tian X, Li D, He Y, Yang P, Cheng Y, et al. Transcriptome, proteome and metabolome analysis provide insights on fat deposition and meat quality in pig. Food Res Int. 2023;166.(2023)112550.: 112550.
10.1016/j.foodres.2023.112550
Article|CAS|PubMed|Google Scholar
19.Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021;18(2).(2021)170–5.: 170.
10.1038/s41592-020-01056-5
Article|CAS|PubMed|Google Scholar
20.Alonge M, Lebeigle L, Kirsche M, Jenike K, Ou S, Aganezov S, et al. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol. 2022;23.(2022)258.: 258.
10.1186/s13059-022-02823-7
Article|CAS|PubMed|Google Scholar
21.Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci. 2020;117(17).(2020)9451–7.: 9451.
10.1073/pnas.1921046117
Article|CAS|PubMed|Google Scholar
22.Chen N. Using Repeat Masker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 2004;5(1).(2004)4–10.: 4.
Article|CAS|PubMed|Google Scholar
23.Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110(1–4).(2005)462–7.: 462.
10.1159/000084979
Article|CAS|PubMed|Google Scholar
24.Shumate A, Salzberg SL. Liftoff.(2021)accurate mapping of gene annotations.Bioinformatics.: 1639.
10.1093/bioinformatics/btaa1016
Article|CAS|PubMed|Google Scholar
25.Buchfink B, Reuter K, Drost HG. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods. 2021;18(4).(2021)366–8.: 366.
10.1038/s41592-021-01101-x
Article|CAS|PubMed|Google Scholar
26.Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST.(2013)quality assessment tool for genome assemblies.Bioinformatics.: 1072.
10.1093/bioinformatics/btt086
Article|CAS|PubMed|Google Scholar
27.Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO.(2015)assessing genome assembly and annotation completeness with single-copy orthologs.Bioinformatics.: 3210.
10.1093/bioinformatics/btv351
Article|CAS|PubMed|Google Scholar
28.Chen S, Zhou Y, Chen Y, Gu J. fastp.(2018)an ultra-fast all-in-one FASTQ preprocessor.Bioinformatics.
10.1093/bioinformatics/bty560
Article|CAS|PubMed|Google Scholar
29.Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv. 2013;1303.3997.
Article|CAS|PubMed|Google Scholar
30.McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis Toolkit.(2010)a MapReduce framework for analyzing next-generation DNA sequencing data.Genome Res.: 1297.
10.1101/gr.107524.110
Article|CAS|PubMed|Google Scholar
31.He W, Yang J, Jing Y, Xu L, Yu K, Fang X. NGenomeSyn.(2023)an easy-to-use and flexible tool for publication-ready visualization of syntenic relationships across multiple genomes.Bioinformatics.
Article|CAS|PubMed|Google Scholar
32.Li H. Minimap2.(2018)pairwise alignment for nucleotide sequences.Bioinformatics.: 3094.
10.1093/bioinformatics/bty191
Article|CAS|PubMed|Google Scholar
33.Hickey G, Monlong J, Ebler J, Novak AM, Eizenga JM, Gao Y, et al. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat Biotechnol. 2024;42(4).(2024)663–73.: 663.
10.1038/s41587-023-01793-w
Article|CAS|PubMed|Google Scholar
34.Ebler J, Ebert P, Clarke WE, Rausch T, Audano PA, Houwaart T, et al. Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes. Nat Genet. 2022;54(4).(2022)518–25.: 518.
10.1038/s41588-022-01043-w
Article|CAS|PubMed|Google Scholar
35.McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, et al. The ensembl variant effect predictor. Genome Biol. 2016;17.(2016)122.: 122.
10.1186/s13059-016-0974-4
Article|CAS|PubMed|Google Scholar
36.Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15).(2011)2156–8.: 2156.
10.1093/bioinformatics/btr330
Article|CAS|PubMed|Google Scholar
37.Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK.(2007)a tool set for whole-genome association and population-based linkage analyses.Am J Hum Genet.: 559.
10.1086/519795
Article|CAS|PubMed|Google Scholar
38.Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19(9).(2009)1655–64.: 1655.
10.1101/gr.094052.109
Article|CAS|PubMed|Google Scholar
39.Chen H, Patterson N, Reich D. Population differentiation as a test for selective sweeps. Genome Res. 2010;20(3).(2010)393–402.: 393.
10.1101/gr.100545.109
Article|CAS|PubMed|Google Scholar
40.Turner S D. qqman.(2018)an R package for visualizing GWAS results using QQ and manhattan plots.J Open Source Softw.: 731.
Article|CAS|PubMed|Google Scholar
41.Hu Z-L, Park CA, Reecy JM. Bringing the animal QTLdb and CorrDB into the future.(2022)meeting new challenges and providing updated services.Nucleic Acids Res.
10.1093/nar/gkab1116
Article|CAS|PubMed|Google Scholar
42.Chen GB, Lee SH, Zhu ZX, Benyamin B, Robinson MR. EigenGWAS.(2016)finding loci under selection through genome-wide association studies of eigenvectors in structured populations.Heredity.: 51.
10.1038/hdy.2016.25
Article|CAS|PubMed|Google Scholar
43.Warr A, Affara N, Aken B, Beiki H, Bickhart DM, Billis K, et al. An improved pig reference genome sequence to enable pig genetics and genomics research. Gigascience. 2020;9(6).(2020)giaa051.
10.1093/gigascience/giaa051
Article|CAS|PubMed|Google Scholar
44.Gozalo-Marcilla M, Buntjer J, Johnsson M, Batista L, Diez F, Werner CR, et al. Genetic architecture and major genes for backfat thickness in pig lines of diverse genetic backgrounds. Genet Sel Evol. 2021;53.(2021)76.: 76.
10.1186/s12711-021-00671-w
Article|CAS|PubMed|Google Scholar
45.Wang L, Zhang Y, Zhang B, Zhong H, Lu Y, Zhang H. Candidate gene screening for lipid deposition using combined transcriptomic and proteomic data from Nanyang black pigs. BMC Genomics. 2021;22.(2021)441.: 441.
10.1186/s12864-021-07764-2
Article|CAS|PubMed|Google Scholar
46.Moller MJ, Chaudhary R, Hellmén E, Höyheim B, Chowdhary B, Andersson L. Pigs with the dominant white coat color phenotype carry a duplication of the KIT gene encoding the mast/stem cell growth factor receptor. Mamm Genome. 1996;7.(1996)822–30.: 822.
10.1007/s003359900244
Article|CAS|PubMed|Google Scholar
47.Marklund S, Kijas J, Rodriguez-Martinez H, Rönnstrand L, Funa K, Moller M, et al. Molecular basis for the dominant white phenotype in the domestic pig. Genome Res. 1998;8(8).(1998)826–33.: 826.
10.1101/gr.8.8.826
Article|CAS|PubMed|Google Scholar
48.Liang X, Lan J, Xu M, Qin K, Liu H, Sun G, et al. Impact ofKITediting on coat pigmentation and fresh meat color in Yorkshire pigs. CRISPR J. 2022;5(6).(2022)825–42.: 825.
10.1089/crispr.2022.0039
Article|CAS|PubMed|Google Scholar
49.Xin Y, Li C, Guo Y, Xiao R, Zhang H, Zhou G. RNA-Seq analysis reveals a negative role of MSMO1 with a synergized NSDHL expression during adipogenesis of 3T3-L1. Biosci Biotechnol Biochem. 2019;83(4).(2019)641–52.: 641.
10.1080/09168451.2018.1559719
Article|CAS|PubMed|Google Scholar
50.Liu L, Liu X, Cui H, Liu R, Zhao G, Wen J. Transcriptional insights into key genes and pathways controlling muscle lipid metabolism in broiler chickens. BMC Genomics. 2019;20.(2019)863.: 863.
10.1186/s12864-019-6221-0
Article|CAS|PubMed|Google Scholar

Acknowledgements

We thank the High‐Performance Computing (HPC) of Northwest A&F University (NWAFU) for providing computing resources.

Funding

This research was supported by the National Key Research and Development Program (2021YFD1301200 and 2021YFF1000602).

Ethics Declaration

Ethics approval and consent to participate

All the animal studies were approved by the Institutional Animal Care and Use Committee of Northwest A&F University (approval number: NWAFU-314021168). All operations were carried out in accordance with the university guidelines for animal research. All pigs were cared for and sacrificed following the guidelines of the Institutional Animal Care and Use Committee of Northwest A&F University (Yangling, China).

Consent for publication

Not applicable.

Competing interests

No potential conflict of interest was reported by the author(s).

Rights and Permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Reprints and permissions

[1] 1.Espinosa E, Bautista R, Larrosa R, Plata O. Advancements in long-read genome sequencing technologies and algorithms. Genomics. 2024;11.(2024)110842.: 110842.
10.1016/j.ygeno.2024.110842
Article|CAS|PubMed|Google Scholar

[2] 2.Sedlazeck FJ, Lee H, Darby CA, Schatz MC. Piercing the dark matter.(2018)bioinformatics of long-range sequencing and mapping.Nat Rev Genet.: 329.
10.1038/s41576-018-0003-4
Article|CAS|PubMed|Google Scholar

[3] 3.Mahmoud M, Gobet N, Cruz-Dávalos DI, Mounier N, Dessimoz C, Sedlazeck FJ. Structural variant calling.(2019)the long and the short of it.Genome Biol.: 1.
Article|CAS|PubMed|Google Scholar

[4] 4.Ho SS, Urban AE, Mills RE. Structural variation in the sequencing era. Nat Rev Genet. 2020;21(3).(2020)171–89.: 171.
10.1038/s41576-019-0180-9
Article|CAS|PubMed|Google Scholar

[5] 5.De Coster W, Weissensteiner MH, Sedlazeck FJ. Towards population-scale long-read sequencing. Nat Rev Genet. 2021;22(9).(2021)572–87.: 572.
10.1038/s41576-021-00367-3
Article|CAS|PubMed|Google Scholar

[6] 6.Li X, He S-G, Li W-R, Luo L-Y, Yan Z, Mo D-X, et al. Genomic analyses of wild argali, domestic sheep, and their hybrids provide insights into chromosome evolution, phenotypic variation, and germplasm innovation. Genome Res. 2022;32(9).(2022)1669–84.: 1669.
10.1101/gr.276769.122
Article|CAS|PubMed|Google Scholar

[7] 7.Miao J, Wei X, Cao C, Sun J, Xu Y, Zhang Z, et al. Pig pangenome graph reveals functional features of non-reference sequences. J Anim Sci Biotechnol. 2024;15.(2024)32.: 32.
10.1186/s40104-023-00984-4
Article|CAS|PubMed|Google Scholar

[8] 8.Leonard AS, Mapel XM, Pausch H. Pangenome-genotyped structural variation improves molecular phenotype mapping in cattle. Genome Res. 2024;34(2).(2024)300–9.: 300.
10.1101/gr.278267.123
Article|CAS|PubMed|Google Scholar

[9] 9.Rice ES, Alberdi A, Alfieri J, Athrey G, Balacco JR, Bardou P, et al. A pangenome graph reference of 30 chicken genomes allows genotyping of large and complex structural variants. BMC Biol. 2023;21.(2023)267.: 267.
10.1186/s12915-023-01758-0
Article|CAS|PubMed|Google Scholar

[10] 10.Li D, Wang Y, Yuan T, Cao M, He Y, Zhang L, et al. Pangenome and genome variation analyses of pigs unveil genomic facets for their adaptation and agronomic characteristics. IMeta. 2024;3(6).(2024)e257.
10.1002/imt2.257
Article|CAS|PubMed|Google Scholar

[11] 11.Nguyen TV, Vander Jagt CJ, Wang J, Daetwyler HD, Xiang R, Goddard ME, et al. In it for the long run.(2023)perspectives on exploiting long-read sequencing in livestock for population scale studies of structural variants.Genet Sel Evol.: 9.
10.1186/s12711-023-00783-5
Article|CAS|PubMed|Google Scholar

[12] 12.Sherman RM, Forman J, Antonescu V, Puiu D, Daya M, Rafaels N, et al. Assembly of a pan-genome from deep sequencing of 910 humans of African descent. Nat Genet. 2019;51.(2019)30–5.: 30.
10.1038/s41588-018-0273-y
Article|CAS|PubMed|Google Scholar

[13] 13.Sherman RM, Salzberg SL. Pan-genomics in the human genome era. Nat Rev Genet. 2020;21(4).(2020)243–54.: 243.
10.1038/s41576-020-0210-7
Article|CAS|PubMed|Google Scholar

[14] 14.Bohling J. Evaluating the effect of reference genome divergence on the analysis of empirical RADseq datasets. Ecol Evol. 2020;10(14).(2020)7585–601.: 7585.
10.1002/ece3.6483
Article|CAS|PubMed|Google Scholar

[15] 15.Prasad A, Lorenzen ED, Westbury MV. Evaluating the role of reference-genome phylogenetic distance on evolutionary inference. Mol Ecol Resour. 2022;22(1).(2022)45–55.: 45.
10.1111/1755-0998.13457
Article|CAS|PubMed|Google Scholar

[16] 16.Thorburn DMJ, Sagonas K, Binzer-Panchal M, Chain FJ, Feulner PG, Bornberg-Bauer E, et al. Origin matters.(2023)using a local reference genome improves measures in population genomics.Mol Ecol Resour.: 1706.
10.1111/1755-0998.13838
Article|CAS|PubMed|Google Scholar

[17] 17.Vaughn JN, Branham SE, Abernathy B, Hulse-Kemp AM, Rivers AR, Levi A, et al. Graph-based pangenomics maximizes genotyping density and reveals structural impacts on fungal resistance in melon. Nat Commun. 2022;13.(2022)7897.: 7897.
10.1038/s41467-022-35621-7
Article|CAS|PubMed|Google Scholar

[18] 18.Yu T, Tian X, Li D, He Y, Yang P, Cheng Y, et al. Transcriptome, proteome and metabolome analysis provide insights on fat deposition and meat quality in pig. Food Res Int. 2023;166.(2023)112550.: 112550.
10.1016/j.foodres.2023.112550
Article|CAS|PubMed|Google Scholar

[19] 19.Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021;18(2).(2021)170–5.: 170.
10.1038/s41592-020-01056-5
Article|CAS|PubMed|Google Scholar

[20] 20.Alonge M, Lebeigle L, Kirsche M, Jenike K, Ou S, Aganezov S, et al. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol. 2022;23.(2022)258.: 258.
10.1186/s13059-022-02823-7
Article|CAS|PubMed|Google Scholar

[21] 21.Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci. 2020;117(17).(2020)9451–7.: 9451.
10.1073/pnas.1921046117
Article|CAS|PubMed|Google Scholar

[22] 22.Chen N. Using Repeat Masker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 2004;5(1).(2004)4–10.: 4.
Article|CAS|PubMed|Google Scholar

[23] 23.Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110(1–4).(2005)462–7.: 462.
10.1159/000084979
Article|CAS|PubMed|Google Scholar

[24] 24.Shumate A, Salzberg SL. Liftoff.(2021)accurate mapping of gene annotations.Bioinformatics.: 1639.
10.1093/bioinformatics/btaa1016
Article|CAS|PubMed|Google Scholar

[25] 25.Buchfink B, Reuter K, Drost HG. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods. 2021;18(4).(2021)366–8.: 366.
10.1038/s41592-021-01101-x
Article|CAS|PubMed|Google Scholar

[26] 26.Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST.(2013)quality assessment tool for genome assemblies.Bioinformatics.: 1072.
10.1093/bioinformatics/btt086
Article|CAS|PubMed|Google Scholar

[27] 27.Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO.(2015)assessing genome assembly and annotation completeness with single-copy orthologs.Bioinformatics.: 3210.
10.1093/bioinformatics/btv351
Article|CAS|PubMed|Google Scholar

[28] 28.Chen S, Zhou Y, Chen Y, Gu J. fastp.(2018)an ultra-fast all-in-one FASTQ preprocessor.Bioinformatics.
10.1093/bioinformatics/bty560
Article|CAS|PubMed|Google Scholar

[29] 29.Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv. 2013;1303.3997.
Article|CAS|PubMed|Google Scholar

[30] 30.McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis Toolkit.(2010)a MapReduce framework for analyzing next-generation DNA sequencing data.Genome Res.: 1297.
10.1101/gr.107524.110
Article|CAS|PubMed|Google Scholar

[31] 31.He W, Yang J, Jing Y, Xu L, Yu K, Fang X. NGenomeSyn.(2023)an easy-to-use and flexible tool for publication-ready visualization of syntenic relationships across multiple genomes.Bioinformatics.
Article|CAS|PubMed|Google Scholar

[32] 32.Li H. Minimap2.(2018)pairwise alignment for nucleotide sequences.Bioinformatics.: 3094.
10.1093/bioinformatics/bty191
Article|CAS|PubMed|Google Scholar

[33] 33.Hickey G, Monlong J, Ebler J, Novak AM, Eizenga JM, Gao Y, et al. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat Biotechnol. 2024;42(4).(2024)663–73.: 663.
10.1038/s41587-023-01793-w
Article|CAS|PubMed|Google Scholar

[34] 34.Ebler J, Ebert P, Clarke WE, Rausch T, Audano PA, Houwaart T, et al. Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes. Nat Genet. 2022;54(4).(2022)518–25.: 518.
10.1038/s41588-022-01043-w
Article|CAS|PubMed|Google Scholar

[35] 35.McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, et al. The ensembl variant effect predictor. Genome Biol. 2016;17.(2016)122.: 122.
10.1186/s13059-016-0974-4
Article|CAS|PubMed|Google Scholar

[36] 36.Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15).(2011)2156–8.: 2156.
10.1093/bioinformatics/btr330
Article|CAS|PubMed|Google Scholar

[37] 37.Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK.(2007)a tool set for whole-genome association and population-based linkage analyses.Am J Hum Genet.: 559.
10.1086/519795
Article|CAS|PubMed|Google Scholar

[38] 38.Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19(9).(2009)1655–64.: 1655.
10.1101/gr.094052.109
Article|CAS|PubMed|Google Scholar

[39] 39.Chen H, Patterson N, Reich D. Population differentiation as a test for selective sweeps. Genome Res. 2010;20(3).(2010)393–402.: 393.
10.1101/gr.100545.109
Article|CAS|PubMed|Google Scholar

[40] 40.Turner S D. qqman.(2018)an R package for visualizing GWAS results using QQ and manhattan plots.J Open Source Softw.: 731.
Article|CAS|PubMed|Google Scholar

[41] 41.Hu Z-L, Park CA, Reecy JM. Bringing the animal QTLdb and CorrDB into the future.(2022)meeting new challenges and providing updated services.Nucleic Acids Res.
10.1093/nar/gkab1116
Article|CAS|PubMed|Google Scholar

[42] 42.Chen GB, Lee SH, Zhu ZX, Benyamin B, Robinson MR. EigenGWAS.(2016)finding loci under selection through genome-wide association studies of eigenvectors in structured populations.Heredity.: 51.
10.1038/hdy.2016.25
Article|CAS|PubMed|Google Scholar

[43] 43.Warr A, Affara N, Aken B, Beiki H, Bickhart DM, Billis K, et al. An improved pig reference genome sequence to enable pig genetics and genomics research. Gigascience. 2020;9(6).(2020)giaa051.
10.1093/gigascience/giaa051
Article|CAS|PubMed|Google Scholar

[44] 44.Gozalo-Marcilla M, Buntjer J, Johnsson M, Batista L, Diez F, Werner CR, et al. Genetic architecture and major genes for backfat thickness in pig lines of diverse genetic backgrounds. Genet Sel Evol. 2021;53.(2021)76.: 76.
10.1186/s12711-021-00671-w
Article|CAS|PubMed|Google Scholar

[45] 45.Wang L, Zhang Y, Zhang B, Zhong H, Lu Y, Zhang H. Candidate gene screening for lipid deposition using combined transcriptomic and proteomic data from Nanyang black pigs. BMC Genomics. 2021;22.(2021)441.: 441.
10.1186/s12864-021-07764-2
Article|CAS|PubMed|Google Scholar

[46] 46.Moller MJ, Chaudhary R, Hellmén E, Höyheim B, Chowdhary B, Andersson L. Pigs with the dominant white coat color phenotype carry a duplication of the KIT gene encoding the mast/stem cell growth factor receptor. Mamm Genome. 1996;7.(1996)822–30.: 822.
10.1007/s003359900244
Article|CAS|PubMed|Google Scholar

[47] 47.Marklund S, Kijas J, Rodriguez-Martinez H, Rönnstrand L, Funa K, Moller M, et al. Molecular basis for the dominant white phenotype in the domestic pig. Genome Res. 1998;8(8).(1998)826–33.: 826.
10.1101/gr.8.8.826
Article|CAS|PubMed|Google Scholar

[48] 48.Liang X, Lan J, Xu M, Qin K, Liu H, Sun G, et al. Impact ofKITediting on coat pigmentation and fresh meat color in Yorkshire pigs. CRISPR J. 2022;5(6).(2022)825–42.: 825.
10.1089/crispr.2022.0039
Article|CAS|PubMed|Google Scholar

[49] 49.Xin Y, Li C, Guo Y, Xiao R, Zhang H, Zhou G. RNA-Seq analysis reveals a negative role of MSMO1 with a synergized NSDHL expression during adipogenesis of 3T3-L1. Biosci Biotechnol Biochem. 2019;83(4).(2019)641–52.: 641.
10.1080/09168451.2018.1559719
Article|CAS|PubMed|Google Scholar

[50] 50.Liu L, Liu X, Cui H, Liu R, Zhao G, Wen J. Transcriptional insights into key genes and pathways controlling muscle lipid metabolism in broiler chickens. BMC Genomics. 2019;20.(2019)863.: 863.
10.1186/s12864-019-6221-0
Article|CAS|PubMed|Google Scholar

Journal of Animal Science and Biotechnology

Integrating parental genomes to reduce reference bias and identify intramuscular fat genes in Qinchuan Black pigs

Abstract

Background

Results

Conclusions

Keywords

Background

Materials and methods

Sample information

Genome sequencing

Genome assembly and annotation

Single reference genotyping

Construction of pan-genome graphs and graph-based genotyping

Phylogenetic and population genomic analyses

Selective scan analysis

Isolation of porcine intramuscular preadipocytes

Cell culture and differentiation of pig intramuscular adipocytes

RT-qPCR and Western blot

Statistical analysis of data

Results

GZB pig genome assembly, genome characterization, and genome variation

Integration and comparative analysis of genotyping results based on traditional single reference genome and two parental genomes

Population structure

Integrative strategy reveals novel candidate genes for intramuscular fat in pigs

MSMO1 promotes pig intramuscular adipocyte proliferation and inhibits lipogenic differentiation

Discussion

Conclusion

Data Availability

Abbreviations

References

Acknowledgements

Funding

Ethics Declaration

Ethics approval and consent to participate

Consent for publication

Competing interests

Rights and Permissions