Near telomere-to-telomere genome assemblies of Silkie Gallus gallus and Mallard Anas platyrhynchos restored the structure of chromosomes and “missing” genes in birds

Zhuocheng Hou

doi:10.1186/s40104-024-01141-1

Research


Abstract

Background

Chickens and ducks are vital sources of animal protein for humans. Recent pangenome studies suggest that a single genome is insufficient to represent the genetic information of a species, highlighting the need for more comprehensive genomes. Bird genomes contain dozens of microchromosomes, but comparative genomics, annotation, and variant discovery are hindered by the lack of telomere-to-telomere-level assemblies. We aim to complete the chicken and duck genomes, recover missing genes, and reveal chromosomal features that are shared between, or unique to, the two species.

Results

The near telomere-to-telomere genomes of Silkie (Gallus gallus) and Mallard (Anas platyrhynchos) were assembled using multiple high-coverage complementary technologies. Quality values were 36.65 and 44.17 and BUSCO scores were 96.55% and 96.97% for Silkie and Mallard, respectively, and read mapping rates exceeded 99.52% for both assemblies, indicating high completeness and accuracy. We annotated 20,253 and 19,621 protein-coding genes for Silkie and Mallard, respectively, and assembled gap-free sex chromosomes in Mallard for the first time. Comparative analysis revealed that microchromosomes differ from macrochromosomes in GC content, repetitive sequence abundance, gene density, and 5mC methylation level. Centromeres with different arrangements of centromeric repeat sequences exist in both the Silkie and Mallard genomes, and Mallard centromeres have been invaded by CR1 elements. The highly heterochromatic W chromosome serves as a refuge for ERVs and contains a disproportionately large amount of ERV sequence. Both genomes presented relatively high 5mC methylation levels on sex chromosomes and microchromosomes, and telomeres and centromeres presented significantly higher 5mC methylation levels than the whole genome. Finally, we recovered 325 missing genes using the new genomes and annotated TNFA in Mallard for the first time, revealing conserved protein structures and tissue-specific expression.

Conclusions

The near telomere-to-telomere assemblies in Mallard and Silkie, with the first gap-free sex chromosomes in ducks, significantly enhanced our understanding of genetic structures in birds, specifically highlighting the distinctive chromosome features between the chicken and duck genomes. This foundational work also provides a series of newly identified missing genes for further investigation.

Keywords

5mC methylation level; Avian; Centromere; Chromosomes; Genome; Genome Evolution; Genome assembly algorithms; Missing gene; Replisome; Telomere-to-telomere genome; Telomeres

Background

Chickens and ducks are the two most widely farmed poultry species, providing a significant amount of animal protein and occupying an important position in human society. However, previous studies have suggested that a substantial number of protein-coding genes are missing from avian genomes compared with mammalian and amphibian genomes [1, 2]. Sustained efforts are being made to recover these "missing" genes, which may have been overlooked in incomplete genomes [3,4,5], especially in complex regions such as centromeres and telomeres that telomere-to-telomere (T2T) genomes can resolve. In recent years, an increasing number of pangenome studies [6, 7] have shown that a single genome is not sufficient to represent all the genetic information of a species. Relying on a single reference genome can therefore impede the discovery of functional genes, and more complete genomes of different breeds are needed to characterize avian genomes collectively. Comparative genomics can help identify similarities and differences between species. Bird genomes contain microchromosomes, but their properties relative to macrochromosomes, such as centromere composition and 5-methylcytosine (5mC) methylation levels, remain unclear. Centromeres are repeat-rich heterochromatic regions critical for faithful chromosome segregation during cell division [8], and the sequence and structure of centromeric regions are highly diverse among species. Compared with conventional genome assemblies, T2T genomes offer significant advantages in assembly completeness and accuracy, discovery of functional genes, and detection of structural variants. Therefore, we aimed to improve the chicken and duck genomes to the T2T level and to use comparative genomics to study the differences and commonalities between chickens and ducks in chromosome types, centromeres, transposable elements, and 5mC methylation. 
Furthermore, by combining the T2T genomes of chickens and ducks with other published avian genomes, we can systematically search for and recover genes that were previously thought to be missing from avian species.

Methods

Sample collection and sequencing

To achieve T2T genome assembly, we generated new sequencing data in addition to the existing Silkie and Mallard data from our previous work [9, 10]. For the Mallard, fresh blood from the same individual was used for both high-fidelity (HiFi) sequencing and Oxford Nanopore Technologies (ONT) sequencing; the sequencing summary is shown in Table S1 and Fig. S1. For Silkie, fresh blood from the same individual was used for nanopore sequencing; the sequencing summary is shown in Table S2 and Fig. S1. The DNA preparation used to generate the Silkie ONT sequencing libraries was the same as that used for the Mallard.

To construct sequencing libraries for Pacific Biosciences (PacBio) HiFi sequencing, more than 20 µg of sheared DNA was size-selected with the BluePippin system, and ~15 kb Sequel SMRTbell libraries were prepared according to the protocol provided by PacBio. Four SMRT cells were run on a PacBio RSII system with P6‒C4 chemistry. Genomic DNA for ONT sequencing was isolated from blood. DNA was extracted with phenol:chloroform:isoamyl alcohol (25:24:1) from samples treated with a Tris + SDS (sodium dodecyl sulfate) + EDTA + NaCl lysis reagent, without a purification step, to preserve the length of the genomic DNA. Sequencing libraries were prepared with the Ligation Sequencing 1D Kit (SQK-LSK109, Oxford Nanopore Technologies, UK) according to the manufacturer’s instructions. Four DNA libraries were constructed and sequenced on the PromethION platform (Oxford Nanopore Technologies, UK). Guppy (v5.0) was used for base calling, with output written to FASTQ files.

Genome assembly and assessment

After trying a variety of assembly strategies, we developed an integrated method suitable for assembling, correcting, and gap-filling bird microchromosomes. The assembly pipelines are shown in Figs. S2 and S3. For both CAU_Silkie_2.0 and CAU_Wild_2.0, we integrated multiple data sources and used a manual assembly pipeline based on HiFi phased assembly, merging contigs from multiple data sources and methods according to genome collinearity.

Specifically, for CAU_Silkie_2.0, PacBio subreads were filtered and corrected with the circular consensus sequencing (CCS) pipeline v6.0.0 (https://github.com/PacificBiosciences/ccs). Adapters were then removed from HiFi reads with HiFiAdapterFilt (v2.0.1) [11] and trimmed from ONT reads with Porechop (v0.2.4) [12]; Hi-C reads were preprocessed with fastp (v0.20.1) [13]. HiFi reads, ONT reads longer than 50 kb, and Hi-C reads were input to hifiasm (v0.19.6) [14] for double-graph phased assembly. The hifiasm phased contigs were used to phase ONT reads longer than 30 kb by mapping with minimap2 (v2.26) [15]. All ONT reads longer than 30 kb and the phased reads were then assembled with NextDenovo (v2.5.2) [16] and Canu (v2.2) [17]. Hi-C read quality was controlled with HiC-Pro (v3.0) [18], and only valid pairs were used for subsequent analysis. The primary hifiasm contigs were scaffolded with Hi-C reads using YaHS (v1.2a.2) [19], with Hi-C reads mapped by Chromap (v0.2.5-r473) [20]. Contigs assembled from ONT reads that did not map to any scaffold were rescued. Scaffolds were manually curated in Juicebox (v1.11.08) [21] using Hi-C interaction signals and collinearity with CAU_Silkie_1.0. Gaps were filled step by step using various versions of contigs and various types of reads. First, chromosomal structures were corrected by comparing different versions of contigs, Hi-C signals, and collinearity with CAU_Silkie_1.0. LINKVIEW2 (https://yangjianshun.github.io/LINKVIEW2/) and scripts from GitHub (https://github.com/ZhouQiLab/DuckGenome/tree/master/anchoring_chr, referred to as anchor scripts) were used to manually inspect and integrate sequences into chromosomes or to link chromosomes together. Reads were then used to patch the remaining gaps with TGS-GapCloser (v1.2.0) [22]. 
The remaining gaps were closed using long reads or contigs that spanned both ends of each gap, aided by scripts from GitHub (https://github.com/zhangleiworld/gapfill_by_reads), DEGAP [23], and the anchor scripts, all under manual inspection. Duplications were then purged with purge_dups (v1.2.6; purged with HiFi reads, cutoffs checked manually) [24]. Contaminants were identified with KrakenUniq (v1.04) [25] against reference databases of human, vector, and microbial sequences. The scaffolds were subsequently polished for 2 rounds with HiFi reads using NextPolish (v1.4.1) [26]. The mitochondrial genome was assembled with MitoHiFi (-o 2, v3.0.0) [27].

For CAU_Wild_2.0, a similar pipeline was used. In addition, CLR reads were filtered to retain only those longer than 17 kb with Filtlong (--min_length 17000, v0.2.1, https://github.com/rrwick/Filtlong) and assembled with NextDenovo. Illumina reads were assembled with MEGAHIT (v1.2.9) [28]. The scaffolds were further scaffolded with RunBNG (v1.03) [29] by hybrid assembly with Bionano optical maps. Contigs from CLR and ONT reads that did not map to any scaffold were rescued. HiFi and WGS reads were used to polish the genome for 2 rounds with NextPolish2 (v0.2.0) [30] and Pilon (v1.24-0) [31], respectively.
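Several steps above keep only reads above a length cutoff (ONT reads longer than 50 kb or 30 kb, CLR reads longer than 17 kb). The sketch below illustrates that kind of minimum-length filter on plain 4-line FASTQ records. It is not Filtlong, which also scores reads by quality; the function names are hypothetical, and the parser assumes well-formed input.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Parse a well-formed, 4-lines-per-record FASTQ from any iterable of lines."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it)
        next(it)  # '+' separator line, ignored
        qual = next(it)
        yield header, seq, qual


def filter_by_length(records: Iterable[Record], min_length: int = 17_000) -> Iterator[Record]:
    """Keep only reads whose sequence is at least `min_length` bases long."""
    for header, seq, qual in records:
        if len(seq) >= min_length:
            yield header, seq, qual
```

Because `read_fastq` accepts any iterable of lines, an open file handle can be passed directly, for example `filter_by_length(read_fastq(fh), min_length=30_000)` for the 30 kb ONT cutoff.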

Benchmarking Universal Single-Copy Orthologs (BUSCO) (aves_odb10, n = 8,338, v5.0.0) [32] was used to assess the completeness and accuracy of the assembled genomes. To test the consistency between the raw data and the assemblies, we aligned all reads back to the genomes. For CAU_Silkie_2.0, the quality value (QV) was calculated with Merqury (v1.3) [33] from HiFi reads, whereas for CAU_Wild_2.0 it was calculated from Illumina reads.
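The QV reported by Merqury is a reference-free estimate derived from k-mers. The sketch below shows the idea under the assumption that every assembly k-mer absent from the reads reflects a base error: the fraction of read-supported k-mers is treated as the probability that all k bases are correct, so the per-base error rate is E = 1 - (1 - K_asm_only / K_total)^(1/k) and QV = -10·log10(E). The function name and inputs are illustrative, not Merqury's interface.

```python
import math


def merqury_style_qv(asm_only_kmers: int, total_asm_kmers: int, k: int) -> float:
    """Consensus QV from k-mer support (a Merqury-style estimate, sketched).

    asm_only_kmers: k-mers present in the assembly but absent from the read set
    total_asm_kmers: all k-mers counted in the assembly
    k: k-mer size
    """
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers: estimated error rate is zero
    # Fraction of read-supported k-mers ~ probability that all k bases are correct,
    # so the per-base accuracy is its k-th root.
    shared_fraction = 1.0 - asm_only_kmers / total_asm_kmers
    error_rate = 1.0 - shared_fraction ** (1.0 / k)
    return -10.0 * math.log10(error_rate)
```

With k = 1 the formula reduces to the plain Phred scale (0.1% unsupported k-mers gives QV 30); larger k spreads each unsupported k-mer over k bases and raises the QV accordingly.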

Centromere and telomere identification

We searched for telomere repeats (TTAGGG)n with quarTeT (v1.03) [34]. CENPA ChIP-seq data were aligned with the BWA-MEM algorithm using the options “-k 50 -c 1000000”. Duplicate alignments were marked with sambamba (v0.6.3) [35], and alignments were filtered with samtools (view -q 30 -F 2308, v1.15.1). Read coverage was counted with BEDTools genomecov (v2.29.2) [36]. To annotate putative centromeres in CAU_Wild_2.0, we searched the genome for the reported 190-bp duck centromeric repeat [37] using Tandem Repeats Finder (TRF; parameters 2 5 7 80 10 50 2000, v4.09) [38] and SRF [39], followed by manual curation. Similarity heatmaps were generated with StainedGlass (v0.6) [40].
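The samtools filter `view -q 30 -F 2308` combines a mapping-quality cutoff with a FLAG mask. The sketch below decodes that mask using the bit values from the SAM specification and mirrors the keep/drop decision for a single alignment. The function name is hypothetical. Note that 2308 does not include the duplicate bit (0x400), so this mask by itself does not remove alignments that sambamba marked as duplicates.

```python
# SAM FLAG bits, as defined in the SAM format specification
UNMAPPED = 0x4         # 4: segment unmapped
SECONDARY = 0x100      # 256: secondary alignment
SUPPLEMENTARY = 0x800  # 2048: supplementary alignment
DUPLICATE = 0x400      # 1024: PCR/optical duplicate (not part of 2308)

EXCLUDE_MASK = UNMAPPED | SECONDARY | SUPPLEMENTARY  # 4 + 256 + 2048 = 2308


def keep_alignment(flag: int, mapq: int, min_mapq: int = 30) -> bool:
    """Mirror `samtools view -q 30 -F 2308`: drop any record with an excluded
    FLAG bit set, or with MAPQ below the cutoff."""
    return (flag & EXCLUDE_MASK) == 0 and mapq >= min_mapq
```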

Genome structure prediction and annotation

We mapped the RNA-seq data (Table S3) to the genome assembly with HISAT2 (v2.1.0) [41] and assembled transcripts with StringTie (v2.0) [42]. TransDecoder (v5.5.0, https://github.com/TransDecoder/TransDecoder) was used to predict protein-coding regions in the assembled transcripts. Gene models were annotated with the EVidenceModeler (EVM) genome annotation pipeline (v2.31.8) [43], which integrated ab initio gene predictions from Braker3 (v2.1.6) [44] and Helixer (online server) [45], protein-coding regions of the genome-guided transcript assembly, and homology evidence from SwissProt protein sequences aligned with exonerate (v2.4.0) (https://github.com/nathanweeks/exonerate). The gene models were further refined in two rounds with PASA (v2.4.1) [46]. To assess annotation completeness and accuracy, we computed BUSCO scores for the annotations with compleasm (v0.2.2) [47].

Identification of noncoding RNA genes

Noncoding RNAs, including microRNA (miRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), and small nuclear RNA (snRNA), were annotated with several methods. tRNAs were predicted with tRNAscan-SE (v1.3.1) [48] using default parameters before repeat masking. miRNAs and snRNAs were annotated by searching the genome against Rfam (v14.0) [49] models with Infernal (v1.1.3) [50] using default parameters. The results are shown in Table S4.

Annotation of repeats and transposable elements

Repeats were annotated by combining de novo structural analysis with homology comparison. First, RepeatModeler (-LTRStruct, v2.0.2a) [51] was used to construct a repeat library. Repeat regions were then annotated with RepeatMasker (v4.1.2-p1) [52] using a combined library of the de novo predictions, reference libraries (Dfam and Repbase), and an avian repeat library [53]. Repetitive elements accounted for 15%–17% of the genomes, most of which were long interspersed nuclear elements (Table S5). ClassifyTE [54] was used to classify unclassified transposable elements. TRASH (v1.2) [55] was used to identify and extract tandem repeats from the genome sequences and to investigate their higher-order structures.
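Repeat percentages such as the 15%–17% above depend on counting each base only once, because annotations from different libraries can overlap. A minimal sketch, assuming repeat annotations have already been reduced to half-open (start, end) intervals on a single sequence (for example, parsed from RepeatMasker output), merges overlaps before dividing by the sequence length. The function names are hypothetical.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def merged_length(intervals: Iterable[Interval]) -> int:
    """Total bases covered by the intervals, counting overlapping bases once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the current block and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def repeat_fraction(intervals: Iterable[Interval], sequence_length: int) -> float:
    """Fraction of a sequence masked as repeat."""
    return merged_length(intervals) / sequence_length
```

Sorting costs O(n log n). Per-chromosome results can be summed before dividing by the genome size to obtain a genome-wide percentage.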

DNA methylome analysis

DNA 5mC methylation was called with Nanopolish (-q cpg, v0.13.2) [56], which uses a hidden Markov model, with ONT fast5 files as input. The methylation frequency at each cytosine site in the reference was calculated as the number of reads called as methylated divided by the total number of reads covering that site.
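The per-site frequency defined above (methylated reads divided by covering reads) can be sketched from per-read calls. This sketch assumes each call is a (chromosome, position, log-likelihood ratio) triple in which a positive ratio favours methylation, and that calls with |ratio| below a cutoff are treated as ambiguous and dropped from both counts. The cutoff of 2.0 is an assumed default, not a value stated in this study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Site = Tuple[str, int]


def methylation_frequency(
    calls: Iterable[Tuple[str, int, float]], threshold: float = 2.0
) -> Dict[Site, float]:
    """Per-CpG methylation frequency from per-read log-likelihood ratios.

    calls: (chromosome, position, log_lik_ratio) for each read at each site;
    a positive ratio favours "methylated". `threshold` is an assumed cutoff.
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [methylated reads, covering reads]
    for chrom, pos, llr in calls:
        if abs(llr) < threshold:
            continue  # ambiguous call: excluded from numerator and denominator
        site_counts = counts[(chrom, pos)]
        site_counts[1] += 1
        if llr > 0:
            site_counts[0] += 1
    return {site: methylated / covering for site, (methylated, covering) in counts.items()}
```

Chromosome-level averages, such as those compared between macro- and microchromosomes, can then be taken over the per-site frequencies.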

Strategy to identify missing genes

We searched the proteins predicted from the assembled genomes for homologs of any of the 571 genes previously thought to be missing from bird genomes, 274 of which were thought to be missing from all avian genomes [1,2,3, 5, 57]. The human protein sequences of these missing genes were used as queries to search for homologs in the newly assembled Silkie and Mallard genomes using the reciprocal best-hit algorithm in MMseqs2 (Release 15-6f452) [58]. We manually checked each candidate match against the list of missing genes to distinguish synonyms, paralogs, and alignment errors. The AlphaFold 3 server [59] was used to predict the conformation of the TNFA protein with seed 12346. Finally, JCVI (v1.3.9) [60] was used to plot gene collinearity.
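The reciprocal best-hit criterion accepts a pair of proteins only when each is the other's top-scoring match. The sketch below applies that rule to two hit tables (human queries against assembly proteins, and the reverse search), assuming each hit is a (query, target, bit score) triple such as those parsed from a tabular search output. It illustrates the criterion only and is not MMseqs2's implementation. The identifiers in the test are made up.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query, target, bit score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring target (first one wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, target, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(forward: Iterable[Hit], reverse: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit AND a is b's best hit."""
    fwd = best_hits(forward)  # e.g. human protein -> assembly protein
    rev = best_hits(reverse)  # assembly protein -> human protein
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)
```

A query whose best target prefers a different protein in the reverse search, as can happen with paralogs, is rejected. This is one reason the candidates above still required manual checking.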

Results

Near telomere-to-telomere genome assembly and completeness evaluation

To achieve complete assembly of the Silkie and Mallard genomes, we adopted multiple high-coverage complementary technologies. The CAU_Silkie_2.0 genome was assembled from ONT and PacBio HiFi long reads together with high-throughput chromatin conformation capture (Hi-C) data (~39X HiFi, ~245X ONT and ~193X Hi-C, Table S2), and the N50 of the ONT reads reached 33 kb (Fig. S1). For CAU_Wild_2.0, in addition to HiFi, ONT, and Hi-C data, we used PacBio continuous long reads (CLR), Bionano optical maps (BOMs), and Illumina reads (~36X HiFi, ~207X ONT, ~88X BOM, ~93X CLR, ~116X Hi-C and 121X Illumina, Table S1), and the N50 of the ONT reads reached 32.6 kb (Fig. S1). Multiple complementary high-depth sequencing datasets help ensure the continuity, completeness, and accuracy of the assembly. For both CAU_Silkie_2.0 and CAU_Wild_2.0, we integrated multiple data sources and used a manual assembly pipeline based on HiFi phased contigs, merging overlapping contigs (Tables S6 and S7) from multiple data sources and assembly software (Figs. S2 and S3).
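Read N50 (here 33 kb and 32.6 kb) and scaffold N50 are the same statistic: the length L such that sequences of length L or longer contain at least half of all bases. The following is a minimal sketch of the standard definition. The function name is illustrative.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total bases."""
    sorted_lengths = sorted(lengths, reverse=True)
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        if 2 * running >= total:  # crossed the halfway point of all bases
            return length
    return 0  # empty input
```

Unlike the mean, N50 is weighted toward long sequences, so a few very long ONT reads or scaffolds can raise it considerably.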

The final CAU_Silkie_2.0 genome is 1.09 Gb, with a scaffold N50 of 90.91 Mb (Table S8). A total of 1.08 Gb (99.03%) of the sequence was assigned to 40 chromosomes with only 12 gaps, including 36 gap-free chromosomes and 16 T2T chromosomes (Fig. 1a and b, Table S9). Compared with CAU_Silkie_1.0, the W chromosome was rescued (Fig. S4), as were the telomeric and subtelomeric regions of Chr1, ChrZ, Chr31, and Chr35 (Fig. 1b, Table S9). Hi-C interaction signals indicate the absence of large-scale structural errors (Fig. S5). Furthermore, the near-T2T assembly contains a total of 33 Mb of new sequence (7 kb to 6.59 Mb per chromosome) that was absent from CAU_Silkie_1.0 chromosomes; these regions have extremely high GC or AT content (Fig. 1b, Fig. S6b, Table S10).
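Gap counts and gap-free chromosomes, as reported above, can be tallied directly from the sequences. This sketch assumes the usual FASTA convention that each assembly gap is a run of N bases, so every maximal N run counts as one gap. The chromosome names in the test are toy examples.

```python
import re
from typing import Dict, List, Tuple

GAP_RE = re.compile(r"[Nn]+")  # one gap = one maximal run of N


def gap_runs(seq: str) -> List[Tuple[int, int]]:
    """0-based half-open (start, end) coordinates of each N run."""
    return [(m.start(), m.end()) for m in GAP_RE.finditer(seq)]


def summarize_gaps(chromosomes: Dict[str, str]) -> Tuple[int, List[str]]:
    """Total number of gaps and the sorted names of gap-free chromosomes."""
    counts = {name: len(gap_runs(seq)) for name, seq in chromosomes.items()}
    gap_free = sorted(name for name, n in counts.items() if n == 0)
    return sum(counts.values()), gap_free
```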

The final CAU_Wild_2.0 genome is 1.22 Gb, with a scaffold N50 of 76.95 Mb (Table S8), making it the highest-quality duck genome. A total of 1.21 Gb (99.06%) of the sequence was assigned to 41 chromosomes with only 3 gaps, including 39 gap-free chromosomes and 23 T2T chromosomes (Table S11). Compared with CAU_Wild_1.0, the number of gaps decreased by 99% (3 vs. 318), inversions on Chr4, Chr10, and ChrW were corrected, and the centromeres of 36 chromosomes were identified (Fig. 1c and d, Table S12). Hi-C interaction signals indicate the absence of large-scale structural errors in the assembly (Fig. S7). Furthermore, the near-T2T assembly contains a total of 72 Mb of new sequence (ranging from 219 kb to 4.26 Mb) that was absent from CAU_Wild_1.0 chromosomes; these regions have extremely high GC or AT content (Fig. 1d, Fig. S6a, Table S10).

To further evaluate chromosome completeness, we searched for telomere repeats (TTAGGG)n and centromeres in CAU_Wild_2.0, CAU_Wild_1.0, and SKLA1.0 [61] for comparison. SKLA1.0 is a recently generated chromosome-scale assembly of the Pekin duck (Anas platyrhynchos). Telomere repeats were present at the ends of 36 chromosomes in CAU_Wild_2.0 (Fig. 1c and d), with an average length of 10.90 kb (446.88 kb in total, Table S11), whereas few telomere repeats were observed in CAU_Wild_1.0 and SKLA1.0 (Figs. S8 and S9). Centromere sequences were predicted on 36 chromosomes of CAU_Wild_2.0 (Fig. 1c and d; Table S12), but few were observed in CAU_Wild_1.0 and SKLA1.0 (Figs. S8 and S9). In CAU_Silkie_2.0, telomere repeats were present at the ends of 36 chromosomes, with an average length of 8.44 kb (337.77 kb in total, Table S9, Fig. 1a and b). Functional centromeres can be identified from ChIP-seq data for centromere protein A (CENPA), which are available for chickens. We downloaded these data [62] and detected functional centromeres across the entire genome: 24 chromosomes in CAU_Silkie_2.0 have peaks marking functional centromeres (Fig. 1a and b; Table S13).
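Checking whether a chromosome end carries telomere repeats amounts to measuring tandem (TTAGGG)n runs near each end. The sketch below does this with a regular expression. It also accepts the reverse complement (CCCTAA)n, so orientation does not matter. The 20 kb end window and the 10-copy minimum are assumptions for illustration and are not quarTeT's parameters.

```python
import re
from typing import Tuple

# (TTAGGG)n on the forward strand or its reverse complement (CCCTAA)n,
# so the check works whichever way the chromosome is oriented.
TELO_RE = re.compile(r"(?:TTAGGG)+|(?:CCCTAA)+")


def telomeric_bp(region: str, min_copies: int) -> int:
    """Bases in motif runs of at least `min_copies` tandem copies."""
    return sum(
        m.end() - m.start()
        for m in TELO_RE.finditer(region)
        if m.end() - m.start() >= 6 * min_copies
    )


def telomere_ends(seq: str, window: int = 20_000, min_copies: int = 10) -> Tuple[int, int]:
    """Telomeric base pairs in the first and last `window` (> 0) bases of a chromosome."""
    seq = seq.upper()
    return telomeric_bp(seq[:window], min_copies), telomeric_bp(seq[-window:], min_copies)
```

A chromosome would be called T2T under this sketch when both returned values are nonzero. Per-end lengths summed across chromosomes give totals comparable in kind to those in Tables S9 and S11.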

We assessed the genomes using BUSCO, QV, and read mapping rates. BUSCO scores indicated high assembly quality for both CAU_Silkie_2.0 (96.55%) and CAU_Wild_2.0 (96.97%) (Table S14). For CAU_Silkie_2.0, the QV reached 36.65, corresponding to a base accuracy of 99.978%. The mapping rates of HiFi and ONT reads were 99.52% and 99.63%, respectively, and reads from GGswu (Huxu chicken, Gallus gallus) mapped at 99.92% (HiFi) and 99.40% (ONT). For CAU_Wild_2.0, the QV reached 44.17, corresponding to a base accuracy of 99.99627%; the mapping rates of ONT, HiFi, and Illumina reads were 99.75%, 99.60%, and 99.89%, respectively, and reads from SKLA1.0 (Pekin duck, Anas platyrhynchos) mapped at 99.74% (Illumina) and 99.51% (ONT). These indicators place both assemblies in the top tier of bird genomes.
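QV is a Phred-scaled error rate, QV = -10·log10(error), so the base accuracy in percent is 100·(1 - 10^(-QV/10)). The helper functions below are a minimal sketch of the conversion in both directions. Their names are illustrative.

```python
import math


def qv_to_accuracy(qv: float) -> float:
    """Phred-scaled QV to per-base accuracy in percent (error = 10**(-QV/10))."""
    return (1.0 - 10 ** (-qv / 10.0)) * 100.0


def accuracy_to_qv(accuracy_percent: float) -> float:
    """Per-base accuracy in percent back to a Phred-scaled QV."""
    error = 1.0 - accuracy_percent / 100.0
    return -10.0 * math.log10(error)
```

For example, `qv_to_accuracy(36.65)` gives about 99.978%, the base accuracy stated for CAU_Silkie_2.0, and QV 40 corresponds to 99.99%, or one error per 10 kb.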

Annotation of repetitive elements, noncoding RNAs, and protein-coding genes

Repeat annotation revealed that 17.62% (21.8 Mb) and 15.17% (16.6 Mb) of the CAU_Wild_2.0 and CAU_Silkie_2.0 genomes, respectively, consist of repetitive elements. Long interspersed nuclear elements (LINEs) constitute the largest class of transposable elements in both assemblies; other predominant repetitive elements are summarized in Table S5. Noncoding RNAs account for 0.01% of both CAU_Wild_2.0 and CAU_Silkie_2.0 (Table S4): 244 miRNAs, 521 tRNAs, 264 rRNAs, and 297 snRNAs were annotated in CAU_Wild_2.0, and 255 miRNAs, 318 tRNAs, 60 rRNAs, and 296 snRNAs in CAU_Silkie_2.0. Protein-coding genes were then annotated by combining ab initio, homology-based, and transcript-evidence prediction. Transcript evidence came from 42 tissues for CAU_Silkie_2.0 and 16 tissues for CAU_Wild_2.0 (Table S3), and a total of 20,264 and 19,621 genes were identified in CAU_Silkie_2.0 and CAU_Wild_2.0, respectively. For functional annotation, InterPro, PANNZER2, EggNOG, SwissProt, and NR were used, and 18,697 (92.27%) and 18,574 (94.66%) genes were assigned to at least one database for CAU_Silkie_2.0 and CAU_Wild_2.0, respectively (Table S15). Assessment of annotation completeness and accuracy showed high-quality results for both the chicken (94.78%) and duck (96.03%) annotations (Table S16).

Notably, gap-free sex chromosomes (ChrW and ChrZ) were assembled for the first time in Mallard. Gene collinearity with CAU_Wild_1.0 was good, and the new sex chromosomes contain more genes (Fig. S10). There are 864 and 182 protein-coding genes on ChrZ and ChrW, respectively, of which 805 and 149 have functional annotations; in total, 96 new genes with complete open reading frames (ORFs) were identified relative to CAU_Wild_1.0 (Tables S17 and S18, Fig. S10).

Differences between macro- and microchromosomes and diverse centromere types

A comparison of the newly assembled near-T2T genomes CAU_Wild_2.0 and CAU_Silkie_2.0 with their previous versions showed that the majority of centromeric and telomeric sequences (59/82, 36/42; 48/80, 24/40; Tables S9, S11, S12, and S13) were identified in the new assemblies. Using the near-T2T genomes, we also identified differences between avian macrochromosomes and microchromosomes in GC content, repeat content, gene density, and 5mC methylation level (Fig. 2a and b). In both Silkie and Mallard, microchromosomes tended to have higher GC content, a greater proportion of repetitive sequences, higher gene density, and a higher level of 5mC methylation than macrochromosomes (Fig. 2a and b).
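Comparisons between chromosome classes, such as GC content, reduce to a per-chromosome statistic averaged within each class. This sketch computes GC content over unambiguous bases and averages it by a class label that the caller supplies. The macro/micro assignment is an input here because no size cutoff is given in this section, and the toy sequences in the test are made up.

```python
from statistics import mean
from typing import Dict, List


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (N and other codes are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return gc / acgt if acgt else 0.0


def mean_gc_by_class(chromosomes: Dict[str, str], classes: Dict[str, str]) -> Dict[str, float]:
    """Average per-chromosome GC content for each class label, e.g. 'macro' or 'micro'."""
    grouped: Dict[str, List[float]] = {}
    for name, seq in chromosomes.items():
        grouped.setdefault(classes[name], []).append(gc_content(seq))
    return {label: mean(values) for label, values in grouped.items()}
```

The same grouping can be reused for repeat fraction, gene density, or mean methylation by swapping `gc_content` for another per-chromosome function.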

Focusing on newly assembled sequences, particularly centromeres, our comparative analysis of centromeric repeat structures in Silkie and Mallard revealed that their centromeres fall essentially into three types. In Mallard, the inherent APL-HaeIII repeat is present in the centromeres of almost all chromosomes (Fig. 2c–e), and the centromeres of Chr5 and Chr21 have been invaded by chicken repeat 1 (CR1) transposable elements (Fig. 2d, Fig. S11). In the chicken genome, centromeric regions are composed primarily of satellite sequences, CNM-41 [63] sequences, and simple repeats, with the dominant repeat shifting from satellite sequences in macrochromosomes to CNM-41 sequences in microchromosomes. In addition to centromeres with tandem repeats, we also identified centromeres on Chr5, ChrZ, and Chr27 of the chicken genome that lack tandem repeat sequences [62].

Highly heterochromatic W chromosomes serve as refuges for ERV accumulation

The larger Mallard genome also has a higher content of repetitive sequences, approximately 2% more than the chicken genome (Table S5). After classifying previously unclassified repetitive sequences, we found that DNA transposons make up 5.29% of the Mallard genome (4.26% more than in the chicken genome) and are relatively evenly dispersed across all chromosomes, whereas in the chicken genome they are located predominantly on the macrochromosomes (Fig. 3a). Moreover, as mentioned above, active LINE transposable elements were identified that primarily target the centromeres of Chr5 and Chr21 for transposition (Fig. 3b); this active LINE type in the Mallard is CR1 (Fig. 3d), located mainly on Chr5 and Chr21 (Fig. 3e). Systematic verification showed that only ZDHHC20 and RRP9 carry insertions of active CR1 elements, which may affect the function of these genes. In both the chicken and Mallard genomes, W chromosomes contain disproportionately high amounts of endogenous retroviruses (ERVs), totaling more than 4.5 Mb and 8 Mb (47.87% and 45.45%, respectively) (Fig. 3c). The primary ERV type on the Silkie W chromosome is ERVL, whereas on the Mallard W chromosome it is mainly ERV1 and ERVL (Fig. 3d).
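Deciding which genes carry a transposable-element insertion is an interval-overlap test between gene spans and TE annotations on the same chromosome. The sketch below uses half-open coordinates and a simple nested loop, which is adequate for a few hundred active CR1 copies. Sorted sweeps or interval trees scale better. The gene names and coordinates in the test are hypothetical.

```python
from typing import Iterable, List, Tuple

Gene = Tuple[str, str, int, int]  # (name, chromosome, start, end), half-open
Insertion = Tuple[str, int, int]  # (chromosome, start, end), half-open


def genes_hit_by_insertions(genes: Iterable[Gene], insertions: Iterable[Insertion]) -> List[str]:
    """Sorted names of genes whose span overlaps any insertion on the same chromosome."""
    insertions = list(insertions)  # iterated once per gene
    hits = set()
    for name, chrom, start, end in genes:
        for ins_chrom, ins_start, ins_end in insertions:
            # Two half-open intervals overlap iff each starts before the other ends.
            if chrom == ins_chrom and ins_start < end and start < ins_end:
                hits.add(name)
                break
    return sorted(hits)
```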

Relatively high methylation levels of sex chromosomes, microchromosomes, centromeres, and telomeres

Averaged across all chromosomes, the 5mC methylation level of the duck genome was slightly higher than that of the chicken genome (0.5919 vs. 0.5698, 4.58% higher, Fig. 4a). Comparing chromosome types, we observed that in both Silkie and Mallard, the methylation levels of sex chromosomes and microchromosomes were higher than those of macrochromosomes (Silkie: 21.16% and 4.44% higher, respectively; Mallard: 9.93% and 5.52% higher, respectively; Fig. 4b). We also compared the methylation levels of the newly assembled telomeres and centromeres with those of the whole Silkie and Mallard genomes. As expected, these gene-poor, repeat-rich regions presented significantly higher methylation levels than the whole genome (Fig. 4c).

Within gene regions, only the 5′ untranslated region (5′ UTR) presented significantly lower methylation than the average chromosomal methylation level (Fig. 4d). The 5′ UTR lies at the 5′ end of a protein-coding gene and is transcribed into mRNA but not translated into protein. This region contains various regulatory elements and plays a major role in controlling translation initiation [64].

Recovery of “missing genes” from the newly assembled genomes

Using the new genomes together with genomes from RefSeq, we revisited the list of missing genes. A total of 325 (56.9%) missing genes were identified in the new genomes (Fig. 5a, Table S19). Searching more broadly (all avian genomes in RefSeq), we found that 315 genes (55.1%) were present in the RefSeq avian orthologous gene database (Table S19); combined with our results, a total of 401 (70.2%) genes could be recovered (Table S18). The missing genes recovered in this study are concentrated mainly in centromeres, telomeres, acrocentric chromosomes, and microchromosomes, which are difficult to assemble (Chr12, Chr14, Chr16, and Chr29–38 of CAU_Silkie_2.0; Chr2, ChrZ, Chr17, Chr30, Chr33, Chr35, and Chr37–39 of CAU_Wild_2.0; Figs. S12 and S13). This suggests that these genes were previously missed because certain highly heterochromatic microchromosomes were difficult to assemble completely, causing blocks of adjacent genes to be missing together.
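The combined figure counts a gene once if either source recovers it, so it is the size of a set union rather than a sum. The sketch below illustrates that arithmetic, with percentages taken relative to the full missing-gene list (571 genes here, so 325 corresponds to 56.9%). The gene identifiers in the test are placeholders.

```python
from typing import Dict, Iterable, Tuple


def recovery_summary(
    missing: Iterable[str], found_here: Iterable[str], found_refseq: Iterable[str]
) -> Dict[str, Tuple[int, float]]:
    """Count and percentage of a missing-gene list recovered by each source and by their union."""
    missing_set = set(missing)
    here = set(found_here) & missing_set      # only genes on the missing list count
    refseq = set(found_refseq) & missing_set
    combined = here | refseq                  # a gene found by either source counts once

    def pct(genes: set) -> float:
        return 100.0 * len(genes) / len(missing_set)

    return {
        "this_study": (len(here), pct(here)),
        "refseq": (len(refseq), pct(refseq)),
        "combined": (len(combined), pct(combined)),
    }
```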

Tumor necrosis factor alpha (TNFA) is a pleiotropic cytokine that plays a significant regulatory role in avian energy metabolism, insulin sensitivity, appetite, and disease pathogenesis [65,66,67]. Although the TNFA gene has been annotated manually on Chr16 in chicken genomes [9], it has not been annotated in any published duck genome. Here, TNFA was annotated in both our newly assembled Silkie and Mallard genomes, as well as in the previously published CAU_Silkie_1.0 genome (Fig. S19). To validate our assembly and annotation of TNFA, we analyzed gene collinearity between the annotated TNFA gene in the cuckoo and the TNFA genes identified in this study. The strong collinearity confirms the accuracy of the TNFA locus and of our assembly and annotation process (Fig. 5b). We further confirmed the identification of TNFA with phylogenetic trees, motif analysis, and analysis of conserved protein domains, which indicate that the TNFA protein sequence is conserved with that of mammals (Fig. 5c). To characterize the expression pattern of TNFA in ducks, we quantified its expression across 19 duck tissues (Table S20). TNFA was most highly expressed in the brain and spleen, was not expressed in the liver, and was expressed in all other examined tissues, consistent with its role as a pleiotropic cytokine (Fig. S14). We also predicted the TNFA protein conformation and found that TNFA proteins from different species exhibit similar conformations within conserved structural domains, with the highest confidence ratings (blue and light blue, Fig. 5d). These findings suggest that the functions of TNFA proteins are highly conserved across species.

Several genes that had not previously been annotated in chickens and ducks, but have attracted significant research interest, were assembled and annotated for the first time in both Silkie and Mallard. For example, BAX encodes a protein that undergoes a conformational change causing translocation to the mitochondrial membrane, leading to the release of cytochrome c, which then triggers apoptosis under stress conditions [68]; CFP encodes a plasma glycoprotein that positively regulates the alternative complement pathway of the innate immune system [69]; and GAPDHS encodes a member of the glyceraldehyde-3-phosphate dehydrogenase family of enzymes, which may play an important role in regulating the switch between energy-producing pathways during spermiogenesis and is required for sperm motility and male fertility [70] (Table S19).

Discussion

We assembled and annotated near-T2T genomes for Silkie and Mallard using multiple high-coverage complementary technologies, including a gap-free pair of ZW sex chromosomes in ducks, a milestone not previously achieved in avian genomic research. A review of the latest studies, such as SKLA1.0 (Fig. S9) and the nearly complete chicken genome GGswu [71], shows that no previous study has assembled complete ZW chromosomes. In the Silkie genome, ChrZ was assembled without gaps, whereas the W chromosome still contains some gaps, which might be resolved with ultralong ONT reads (N50 greater than 100 kb). This parallels the W chromosome assembly in GGswu chickens [71].

Furthermore, we compared the centromere structures of Silkie and Mallard for the first time, revealing novel centromeric repeat sequences (Fig. 2c–e). Notably, CR1 elements have invaded the centromeric regions of the Mallard genome alongside the previously identified APL-HaeIII repeat [37]. In the Silkie genome, centromeric repeats shift from satellite sequences on macrochromosomes to the CNM-41 sequences characteristic of microchromosomes (Fig. 2f–h); in addition, we identified centromeres on Silkie Chr5, ChrZ, and Chr27 that lack tandem repeat sequences (Fig. 1a and b), in agreement with previous studies [62, 71]. Our analysis of repetitive sequences showed that the heterochromatic W chromosome serves as a refuge for ERVs (Fig. 3c), consistent with prior research [72]. The predominant ERV types differ between chickens and ducks: ERV1 is the most prevalent on Silkie ChrW, whereas Mallard ChrW is enriched for both ERV1 and ERVL (Fig. 3d).

Our genome-wide quantification of 5mC methylation showed that telomeric and centromeric regions, which are gene-poor and rich in repetitive sequences, are significantly more methylated than the genome as a whole (Fig. 4c). These regions constitute constitutive heterochromatin, which a study of 13 bird species from 10 families across 7 orders also found to be relatively highly methylated [73]. Our results further showed elevated 5mC methylation on sex chromosomes and microchromosomes, and another study [74] likewise reported increased 5mC methylation on the W chromosome and dense chromosomes in chicken. Interestingly, only the 5' UTRs of genes showed significantly lower methylation (Fig. 4d). The 5' UTR contains various regulatory elements, most of which are associated with the promoter region [64], suggesting that 5mC methylation in this region may be involved in regulating gene expression. Because this study quantified 5mC methylation only in DNA from blood, further investigation across additional tissues and developmental stages will be necessary for comprehensive validation.
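The comparison of telomeric and centromeric methylation with the genome-wide level can be thought of as a region-versus-background contrast of per-CpG methylation calls. The Python below is a minimal illustration under stated assumptions: each call is a hypothetical (chromosome, position, methylated fraction) tuple, regions are BED-style half-open intervals, and only mean levels are compared; the statistical test behind the reported significance is not reproduced, and the data are invented. For genome-scale data, an interval index or a tool such as BEDTools would replace the linear scan used here.

```python
from statistics import mean

# Hypothetical record types: a CpG call is (chromosome, 0-based position,
# fraction of reads methylated at that CpG); a region is a BED-style
# half-open interval (chromosome, start, end).
CpGCall = tuple[str, int, float]
Region = tuple[str, int, int]


def in_regions(chrom: str, pos: int, regions: list[Region]) -> bool:
    """True if the position falls inside any region (simple linear scan)."""
    return any(c == chrom and start <= pos < end for c, start, end in regions)


def region_vs_genome(calls: list[CpGCall], regions: list[Region]) -> tuple[float, float]:
    """Return (mean 5mC level inside the regions, genome-wide mean 5mC level)."""
    inside = [frac for chrom, pos, frac in calls if in_regions(chrom, pos, regions)]
    if not inside:
        raise ValueError("no CpG calls fall inside the regions")
    return mean(inside), mean(frac for _, _, frac in calls)


# Toy data: two hypothetical telomere- or centromere-like regions.
calls = [
    ("chr1", 100, 0.95), ("chr1", 150, 0.90),
    ("chr1", 5_000, 0.60), ("chr1", 9_000, 0.40),
    ("chr2", 300, 0.85), ("chr2", 7_000, 0.50),
]
regions = [("chr1", 0, 200), ("chr2", 250, 400)]
inside_mean, genome_mean = region_vs_genome(calls, regions)
print(f"regions: {inside_mean:.2f}  genome-wide: {genome_mean:.2f}")
```

The same function applied to 5' UTR intervals instead of telomeric or centromeric intervals would give the opposite contrast described for Fig. 4d.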

Ultimately, we recovered 401 (70.20%) missing genes in this study and 325 (56.92%) missing genes from the new Silkie and Mallard genomes, including TNFA, which was identified in ducks for the first time and shows diverse expression patterns across tissues. Compared with CAU_Silkie_1.0 and CAU_Wild_1.0, our current assemblies, CAU_Silkie_2.0 and CAU_Wild_2.0, recovered 165 (150%) and 203 (271%) more missing genes, respectively.
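The percentages in the paragraph above are easier to compare once the counts they imply are made explicit. The sketch below assumes, because the section does not state it, that 70.20% and 56.92% are fractions of one common set of previously missing genes and that the 150% and 271% figures are relative increases over the counts obtained with CAU_Silkie_1.0 and CAU_Wild_1.0. Under those assumptions, both percentages point to a reference set of about 571 genes, and the 1.0 assemblies would have recovered about 110 (Silkie) and 75 (Mallard) missing genes, rising to about 275 and 278 with the 2.0 assemblies. The helper names are hypothetical.

```python
def implied_total(count: int, percent: float) -> float:
    """Size of the reference set implied by a statement 'count (percent%)'."""
    return count / (percent / 100)


def implied_baseline(extra: int, percent_increase: float) -> float:
    """Earlier count implied by 'extra more genes (percent_increase%)', read as
    a relative increase: extra = baseline * percent_increase / 100."""
    return extra / (percent_increase / 100)


# Figures quoted in the text.
print(f"reference set implied by 401 (70.20%): {implied_total(401, 70.20):.1f}")
print(f"reference set implied by 325 (56.92%): {implied_total(325, 56.92):.1f}")

for assembly, extra, pct in [("CAU_Silkie", 165, 150.0), ("CAU_Wild", 203, 271.0)]:
    before = implied_baseline(extra, pct)
    print(f"{assembly}: ~{before:.0f} in 1.0 -> ~{before + extra:.0f} in 2.0")
```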

Birds represent over 30% of known tetrapod diversity [75], and the chicken (Gallus gallus) and duck (Anas platyrhynchos) are two important model species for scientific discovery in developmental biology, genetics, virology, and immunology [76,77,78]. These two near-complete avian genomes provide important data for addressing biological questions such as missing genes, avian genome evolution, and avian phenotypic diversity. Although chickens and ducks are the two most widely studied poultry species, some functionally important genes had not previously been annotated, such as TNFA in ducks and the other genes annotated for the first time in this paper (Table S19). The near T2T genomes and annotations of Silkie and Mallard provide a valuable resource for functional and evolutionary analyses of these genes and their related economic traits.

Conclusion

In conclusion, the near T2T assemblies of Mallard and Silkie, including the first gap-free reconstruction of sex chromosomes in ducks, substantially enrich our understanding of avian genome architecture. This study reveals differences among chromosome types in centromeres, repetitive sequences, and methylation patterns. Moreover, the identification and annotation of genes previously thought to be missing lay the groundwork for future research into their functional significance. This work not only demonstrates the importance of T2T genomes but also provides a foundation for investigating the functions of missing genes.

Data Availability

Abbreviations

References

1. Zhang G, Li C, Li Q, Li B, Larkin DM, Lee C, et al. Comparative genomics reveals insights into avian genome evolution and adaptation. Science. 2014;346:1311. https://doi.org/10.1126/science.1251385.
2. Lovell PV, Wirthlin M, Wilhelm L, Minx P, Lazar NH, Carbone L, et al. Conserved syntenic clusters of protein coding genes are missing in birds. Genome Biol. 2014;15:565. https://doi.org/10.1186/s13059-014-0565-1.
3. Botero-Castro F, Figuet E, Tilak MK, Nabholz B, Galtier N. Avian genomes revisited: hidden genes uncovered and the rates versus traits paradox in birds. Mol Biol Evol. 2017:3123. https://doi.org/10.1093/molbev/msx236.
4. Bravo GA, Schmitt CJ, Edwards SV. What have we learned from the first 500 avian genomes? Annu Rev Ecol Evol S. 2021;52:611. https://doi.org/10.1146/annurev-ecolsys-012121-085928.
5. Warren WC, Hillier LW, Tomlinson C, Minx P, Kremitzki M, Graves T, et al. A new chicken genome assembly provides insight into avian genome structure. G3-Genes Genom Genet. 2017;7:109.
6. Liao WW, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, et al. A draft human pangenome reference. Nature. 2023;617:312. https://doi.org/10.1038/s41586-023-05896-x.
7. Rice ES, Alberdi A, Alfieri J, Athrey G, Balacco JR, Bardou P, et al. A pangenome graph reference of 30 chicken genomes allows genotyping of large and complex structural variants. BMC Biol. 2023;21:267. https://doi.org/10.1186/s12915-023-01758-0.
8. Cleveland DW, Mao Y, Sullivan KF. Centromeres and kinetochores: from epigenetics to mitotic checkpoint signaling. Cell. 2003:407. https://doi.org/10.1016/s0092-8674(03)00115-6.
9. Zhu F, Yin ZT, Zhao QS, Sun YX, Jie YC, Smith J, et al. A chromosome-level genome assembly for the Silkie chicken resolves complete sequences for key chicken metabolic, reproductive, and immunity genes. Commun Biol. 2023;6:1233. https://doi.org/10.1038/s42003-023-05619-y.
10. Zhu F, Yin ZT, Wang Z, Smith J, Zhang F, Martin F, et al. Three chromosome-level duck genome assemblies provide insights into genomic variation during domestication. Nat Commun. 2021;12:5932. https://doi.org/10.1038/s41467-021-26272-1.
11. Sim SB, Corpuz RL, Simmonds TJ, Geib SM. HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in PacBio HiFi reads and their negative impacts on genome assembly. BMC Genomics. 2022;23:157. https://doi.org/10.1186/s12864-022-08375-1.
12. Wick RR, Judd LM, Gorrie CL, Holt KE. Completing bacterial genome assemblies with multiplex MinION sequencing. Microb Genom. 2017;3. https://doi.org/10.1099/mgen.0.000132.
13. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018. https://doi.org/10.1093/bioinformatics/bty560.
14. Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021;18:170. https://doi.org/10.1038/s41592-020-01056-5.
15. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018:3094. https://doi.org/10.1093/bioinformatics/bty191.
16. Hu J, Wang Z, Sun Z, Hu B, Ayoola AO, Liang F, et al. NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads. Genome Biol. 2024:107. https://doi.org/10.1186/s13059-024-03252-4.
17. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017:722. https://doi.org/10.1101/gr.215087.116.
18. Servant N, Varoquaux N, Lajoie BR, Viara E, Chen CJ, Vert JP, et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 2015:259. https://doi.org/10.1186/s13059-015-0831-x.
19. Zhou C, McCarthy SA, Durbin R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics. 2023.
20. Zhang H, Song L, Wang X, Cheng H, Wang C, Meyer CA, et al. Fast alignment and preprocessing of chromatin profiles with Chromap. Nat Commun. 2021;12:6566. https://doi.org/10.1038/s41467-021-26865-w.
21. Durand NC, Robinson JT, Shamim MS, Machol I, Mesirov JP, Lander ES, et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 2016;3:99. https://doi.org/10.1016/j.cels.2015.07.012.
22. Xu M, Guo L, Gu S, Wang O, Zhang R, Peters BA, et al. TGS-GapCloser: a fast and accurate gap closer for large genomes with low coverage of error-prone long reads. Gigascience. 2020.
23. Huang Y, Wang Z, Schmidt MA, Su H, Xiong L, Zhang J. DEGAP: dynamic elongation of a genome assembly path. Brief Bioinform. 2024.
24. Guan D, McCarthy SA, Wood J, Howe K, Wang Y, Durbin R. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 2020;36:2896. https://doi.org/10.1093/bioinformatics/btaa025.
25. Breitwieser FP, Baker DN, Salzberg SL. KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biol. 2018:198. https://doi.org/10.1186/s13059-018-1568-0.
26. Hu J, Fan J, Sun Z, Liu S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics. 2020:2253. https://doi.org/10.1093/bioinformatics/btz891.
27. Uliano-Silva M, Ferreira JGRN, Krasheninnikova K, Darwin Tree of Life Consortium, Formenti G, Abueg L, et al. MitoHiFi: a Python pipeline for mitochondrial genome assembly from PacBio high fidelity reads. BMC Bioinformatics. 2023:288.
28. Li D, Liu CM, Luo R, Sadakane K, Lam TW. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015:1674. https://doi.org/10.1093/bioinformatics/btv033.
29. Yuan Y, Bayer PE, Lee HT, Edwards D. runBNG: a software package for BioNano genomic analysis on the command line. Bioinformatics. 2017:3107. https://doi.org/10.1093/bioinformatics/btx366.
30. Hu J, Wang Z, Liang F, Liu SL, Ye K, Wang DP. NextPolish2: a repeat-aware polishing tool for genomes assembled using HiFi long reads. Genom Proteom Bioinf. 2024.
31. Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE. 2014. https://doi.org/10.1371/journal.pone.0112963.
32. Manni M, Berkeley MR, Seppey M, Zdobnov EM. BUSCO: assessing genomic data quality and beyond. Curr Protoc. 2021. https://doi.org/10.1002/cpz1.323.
33. Rhie A, Walenz BP, Koren S, Phillippy AM. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020:245. https://doi.org/10.1186/s13059-020-02134-9.
34. Lin Y, Ye C, Li X, Chen Q, Wu Y, Zhang F, et al. quarTeT: a telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification. Hortic Res. 2023.
35. Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P. Sambamba: fast processing of NGS alignment formats. Bioinformatics. 2015:2032. https://doi.org/10.1093/bioinformatics/btv098.
36. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010:841. https://doi.org/10.1093/bioinformatics/btq033.
37. Uno Y, Nishida C, Hata A, Ishishita S, Matsuda Y. Molecular cytogenetic characterization of repetitive sequences comprising centromeric heterochromatin in three Anseriformes species. PLoS ONE. 2019;14. https://doi.org/10.1371/journal.pone.0214028.
38. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999:573. https://doi.org/10.1093/nar/27.2.573.
39. Zhang Y, Chu J, Cheng H, Li H. De novo reconstruction of satellite repeat units from sequence data. Genome Res. 2023;33:1994. https://doi.org/10.1101/gr.278005.123.
40. Vollger MR, Kerpedjiev P, Phillippy AM, Eichler EE. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics. 2022:2049. https://doi.org/10.1093/bioinformatics/btac018.
41. Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37:907. https://doi.org/10.1038/s41587-019-0201-4.
42. Shumate A, Wong B, Pertea G, Pertea M. Improved transcriptome assembly using a hybrid of long and short reads with StringTie. PLoS Comput Biol. 2022;18. https://doi.org/10.1371/journal.pcbi.1009730.
43. Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008;9:R7. https://doi.org/10.1186/gb-2008-9-1-r7.
44. Gabriel L, Bruna T, Hoff KJ, Ebel M, Lomsadze A, Borodovsky M, et al. BRAKER3: fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS and TSEBRA. Genome Res. 2024:769.
45. Stiehler F, Steinborn M, Scholz S, Dey D, Weber APM, Denton AK. Helixer: cross-species gene annotation of large eukaryotic genomes using deep learning. Bioinformatics. 2021:5291. https://doi.org/10.1093/bioinformatics/btaa1044.
46. Haas BJ, Zeng Q, Pearson MD, Cuomo CA, Wortman JR. Approaches to fungal genome annotation. Mycology. 2011;2:118–41. https://doi.org/10.1080/21501203.2011.606851.
47. Huang N, Li H. compleasm: a faster and more accurate reimplementation of BUSCO. Bioinformatics. 2023.
48. Chan PP, Lin BY, Mak AJ, Lowe TM. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 2021:9077.
49. Daub J, Eberhardt RY, Tate JG, Burge SW. Rfam: annotating families of non-coding RNA sequences. Methods Mol Biol. 2015:349. https://doi.org/10.1007/978-1-4939-2291-8_22.
50. Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013.
51. Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci U S A. 2020;117:9451. https://doi.org/10.1073/pnas.1921046117.
52. Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinform. 2009;Chapter 4.
53. Peona V, Blom MPK, Xu L, Burri R, Sullivan S, Bunikis I, et al. Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise. Mol Ecol Resour. 2021;21:263. https://doi.org/10.1111/1755-0998.13252.
54. Panta M, Mishra A, Hoque MT, Atallah J. ClassifyTE: a stacking-based prediction of hierarchical classification of transposable elements. Bioinformatics. 2021:2529. https://doi.org/10.1093/bioinformatics/btab146.
55. Wlodzimierz P, Hong M, Henderson IR. TRASH: tandem repeat annotation and structural hierarchy. Bioinformatics. 2023.
56. Loman NJ, Quick J, Simpson JT. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat Methods. 2015;12:733. https://doi.org/10.1038/nmeth.3444.
57. Yin ZT, Zhu F, Lin FB, Jia T, Wang Z, Sun DT, et al. Revisiting avian "missing" genes from de novo assembled transcripts. BMC Genomics. 2019;20:4. https://doi.org/10.1186/s12864-018-5407-1.
58. Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017;35:1026. https://doi.org/10.1038/nbt.3988.
59. Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630:493. https://doi.org/10.1038/s41586-024-07487-w.
60. Tang H, Krishnakumar V, Zeng X, Xu Z, Taranto A, Lomas JS, et al. JCVI: a versatile toolkit for comparative genomics analysis. iMeta. 2024.
61. Hu J, Song L, Ning M, Niu X, Han M, Gao C, et al. A new chromosome-scale duck genome shows a major histocompatibility complex with several expanded multigene families. BMC Biol. 2024;22:31. https://doi.org/10.1186/s12915-024-01817-0.
62. Shang WH, Hori T, Toyoda A, Kato J, Popendorf K, Sakakibara Y, et al. Chickens possess centromeres with both extended tandem repeats and short non-tandem-repetitive sequences. Genome Res. 2010;20:1219. https://doi.org/10.1101/gr.106245.110.
63. Deryusheva S, Krasikova A, Kulikova T, Gaginskaya E. Tandem 41-bp repeats in chicken and Japanese quail genomes: FISH mapping and transcription analysis on lampbrush chromosomes. Chromosoma. 2007:519. https://doi.org/10.1007/s00412-007-0117-5.
64. Barrett LW, Fletcher S, Wilton SD. Regulation of eukaryotic gene expression by the untranslated gene regions and other non-coding elements. Cell Mol Life Sci. 2012;69:3613. https://doi.org/10.1007/s00018-012-0990-9.
65. Borst SE. The role of TNF-alpha in insulin resistance. Endocrine. 2004;23:177. https://doi.org/10.1385/ENDO:23:2-3:177.
66. Kalliolias GD, Ivashkiv LB. TNF biology, pathogenic mechanisms and emerging therapeutic strategies. Nat Rev Rheumatol. 2016;12:49. https://doi.org/10.1038/nrrheum.2015.169.
67. Akash MSH, Rehman K, Liaqat A. Tumor necrosis factor-alpha: role in development of insulin resistance and pathogenesis of type 2 diabetes mellitus. J Cell Biochem. 2018:105. https://doi.org/10.1002/jcb.26174.
68. Czabotar PE, Lee EF, Thompson GV, Wardak AZ, Fairlie WD, Colman PM. Mutation to Bax beyond the BH3 domain disrupts interactions with pro-survival proteins and promotes apoptosis. J Biol Chem. 2011;286:7123. https://doi.org/10.1074/jbc.M110.161281.
69. Ferreira VP, Cortes C, Pangburn MK. Native polymeric forms of properdin selectively bind to targets and promote activation of the alternative pathway of complement. Immunobiology. 2010;215:932–40. https://doi.org/10.1016/j.imbio.2010.02.002.
70. Nicholls C, Li H, Liu JP. GAPDH: a common enzyme with uncommon functions. Clin Exp Pharmacol Physiol. 2012:674. https://doi.org/10.1111/j.1440-1681.2011.05599.x.
71. Huang Z, Xu Z, Bai H, Huang Y, Kang N, Ding X, et al. Evolutionary analysis of a complete chicken genome. Proc Natl Acad Sci U S A. 2023;120. https://doi.org/10.1073/pnas.2216641120.
72. Peona V, Palacios-Gimenez OM, Blommaert J, Liu J, Haryoko T, Jonsson KA, et al. The avian W chromosome is a refugium for endogenous retroviruses with likely effects on female-biased mutational load and genetic incompatibilities. Philos Trans R Soc Lond B Biol Sci. 2021;376:20200186. https://doi.org/10.1098/rstb.2020.0186.
73. Schmid M, Steinlein C. The hypermethylated regions in avian chromosomes. Cytogenet Genome Res. 2017;151:216. https://doi.org/10.1159/000464268.
74. Krasikova AV, Kulikova TV. Distribution of heterochromatin markers in lampbrush chromosomes in birds. Russ J Genet. 2017;53:1022. https://doi.org/10.1134/S1022795417090071.
75. Friedman-Einat M, Seroussi E. Avian leptin: bird's-eye view of the evolution of vertebrate energy-balance control. Trends Endocrinol Metab. 2019:819. https://doi.org/10.1016/j.tem.2019.07.007.
76. International Chicken Genome Sequencing Consortium. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 2004;432:695. https://doi.org/10.1038/nature03154.
77. Tregaskes CA, Kaufman J. Chickens as a simple system for scientific discovery: the example of the MHC. Mol Immunol. 2021:12. https://doi.org/10.1016/j.molimm.2021.03.019.
78. Bean AGD, Baker ML, Stewart CR, Cowled C, Deffrasnes C, Wang LF, et al. Studying immunity to zoonotic diseases in the natural host: keeping it real. Nat Rev Immunol. 2013;13:851. https://doi.org/10.1038/nri3551.

Acknowledgements

Funding

Ethics Declaration

Rights and Permissions

[1] 1.Zhang G, Li C, Li Q, Li B, Larkin DM, Lee C, et al. Comparative genomics reveals insights into avian genome evolution and adaptation. Science. 2014;346.(2014)1126/science.1251385.: 1311.
10.1126/science.1251385
Article|CAS|PubMed|Google Scholar

[2] 2.Lovell PV, Wirthlin M, Wilhelm L, Minx P, Lazar NH, Carbone L, et al. Conserved syntenic clusters of protein coding genes are missing in birds. Genome Biol. 2014;15.(2014)org/10.1186/s13059-014-0565-1.: 565.
10.1186/s13059-014-0565-1
Article|CAS|PubMed|Google Scholar

[3] 3.Botero-Castro F, Figuet E, Tilak MK, Nabholz B, Galtier N. Avian genomes revisited.(2017)Hidden genes uncovered and the rates versus traits paradox in birds.Mol Biol Evol.: 3123.
10.1093/molbev/msx236
Article|CAS|PubMed|Google Scholar

[4] 4.Bravo GA, Schmitt CJ, Edwards SV. What have we learned from the first 500 avian genomes? Annu Rev Ecol Evol S. 2021;52.(2021)org/10.1146/annurev-ecolsys-012121-085928.: 611.
10.1146/annurev-ecolsys-012121-085928
Article|CAS|PubMed|Google Scholar

[5] 5.Warren WC, Hillier LW, Tomlinson C, Minx P, Kremitzki M, Graves T, et al. A new chicken genome assembly provides insight into avian genome structure. G3-Genes Genom Genet. 2017;7.(2017)116.035923.: 109.
Article|CAS|PubMed|Google Scholar

[6] 6.Liao WW, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, et al. A draft human pangenome reference. Nature. 2023;617.(2023)org/10.1038/s41586-023-05896-x.: 312.
10.1038/s41586-023-05896-x
Article|CAS|PubMed|Google Scholar

[7] 7.Rice ES, Alberdi A, Alfieri J, Athrey G, Balacco JR, Bardou P, et al. A pangenome graph reference of 30 chicken genomes allows genotyping of large and complex structural variants. BMC Biol. 2023;21.(2023)org/10.1186/s12915-023-01758-0.: 267.
10.1186/s12915-023-01758-0
Article|CAS|PubMed|Google Scholar

[8] 8.Cleveland DW, Mao Y, Sullivan KF. Centromeres and kinetochores.(2003)From epigenetics to mitotic checkpoint signaling.Cell.: 407.
10.1016/s0092-8674(03)00115-6
Article|CAS|PubMed|Google Scholar

[9] 9.Zhu F, Yin ZT, Zhao QS, Sun YX, Jie YC, Smith J, et al. A chromosome-level genome assembly for the silkie chicken resolves complete sequences for key chicken metabolic, reproductive, and immunity genes. Commun Biol. 2023;6.(2023)org/10.1038/s42003-023-05619-y.: 1233.
10.1038/s42003-023-05619-y
Article|CAS|PubMed|Google Scholar

[10] 10.Zhu F, Yin ZT, Wang Z, Smith J, Zhang F, Martin F, et al. Three chromosome-level duck genome assemblies provide insights into genomic variation during domestication. Nat Commun. 2021;12.(2021)org/10.1038/s41467-021-26272-1.: 5932.
10.1038/s41467-021-26272-1
Article|CAS|PubMed|Google Scholar

[11] 11.Sim SB, Corpuz RL, Simmonds TJ, Geib SM. HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in pacbio HiFi reads and their negative impacts on genome assembly. BMC Genomics. 2022;23.(2022)org/10.1186/s12864-022-08375-1.: 157.
10.1186/s12864-022-08375-1
Article|CAS|PubMed|Google Scholar

[12] 12.Wick RR, Judd LM, Gorrie CL, Holt KE. Completing bacterial genome assemblies with multiplex minion sequencing. Microb Genom. 2017;3.(2017)0.000132.
10.1099/mgen.0.000132
Article|CAS|PubMed|Google Scholar

[13] 13.Chen S, Zhou Y, Chen Y, Gu J. Fastp.(2018)An ultra-fast all-in-one fastq preprocessor.Bioinformatics.
10.1093/bioinformatics/bty560
Article|CAS|PubMed|Google Scholar

[14] 14.Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021;18.(2021)org/10.1038/s41592-020-01056-5.: 170.
10.1038/s41592-020-01056-5
Article|CAS|PubMed|Google Scholar

[15] 15.Li H. Minimap2.(2018)Pairwise alignment for nucleotide sequences.Bioinformatics.: 3094.
10.1093/bioinformatics/bty191
Article|CAS|PubMed|Google Scholar

[16] 16.Hu J, Wang Z, Sun Z, Hu B, Ayoola AO, Liang F, et al. NextDenovo.(2024)An efficient error correction and accurate assembly tool for noisy long reads.Genome Biol.: 107.
10.1186/s13059-024-03252-4
Article|CAS|PubMed|Google Scholar

[17] 17.Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu.(2017)Scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.Genome Res.: 722.
10.1101/gr.215087.116
Article|CAS|PubMed|Google Scholar

[18] 18.Servant N, Varoquaux N, Lajoie BR, Viara E, Chen CJ, Vert JP, et al. HiC-Pro.(2015)An optimized and flexible pipeline for Hi-C data processing.Genome Biol.: 259.
10.1186/s13059-015-0831-x
Article|CAS|PubMed|Google Scholar

[19] 19.Zhou C, McCarthy SA, Durbin R. Yahs.(2023)Yet another Hi-C scaffolding tool.Bioinformatics.
Article|CAS|PubMed|Google Scholar

[20] 20.Zhang H, Song L, Wang X, Cheng H, Wang C, Meyer CA, et al. Fast alignment and preprocessing of chromatin profiles with chromap. Nat Commun. 2021;12.(2021)org/10.1038/s41467-021-26865-w.: 6566.
10.1038/s41467-021-26865-w
Article|CAS|PubMed|Google Scholar

[21] 21.Durand NC, Robinson JT, Shamim MS, Machol I, Mesirov JP, Lander ES, et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 2016;3.(2016)07.012.: 99.
10.1016/j.cels.2015.07.012
Article|CAS|PubMed|Google Scholar

[22] 22.Xu M, Guo L, Gu S, Wang O, Zhang R, Peters BA, et al. TGS-GapCloser.(2020)A fast and accurate gap closer for large genomes with low coverage of error-prone long reads.Gigascience.
Article|CAS|PubMed|Google Scholar

[23] 23.Huang Y, Wang Z, Schmidt MA, Su H, Xiong L, Zhang J. DEGAP.(2024)Dynamic elongation of a genome assembly path.Brief Bioinform.
Article|CAS|PubMed|Google Scholar

[24] 24.Guan D, McCarthy SA, Wood J, Howe K, Wang Y, Durbin R. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 2020;36.(2020)org/10.1093/bioinformatics/btaa025.: 2896.
10.1093/bioinformatics/btaa025
Article|CAS|PubMed|Google Scholar

[25] 25.Breitwieser FP, Baker DN, Salzberg SL. Krakenuniq.(2018)Confident and fast metagenomics classification using unique k-mer counts.Genome Biol.: 198.
10.1186/s13059-018-1568-0
Article|CAS|PubMed|Google Scholar

[26] 26.Hu J, Fan J, Sun Z, Liu S. Nextpolish.(2020)A fast and efficient genome polishing tool for long-read assembly.Bioinformatics.: 2253.
10.1093/bioinformatics/btz891
Article|CAS|PubMed|Google Scholar

[27] 27.Uliano-Silva M, Ferreira JGRN, Krasheninnikova K, Darwin Tree of Life Consortium, Formenti G, Abueg L, et al. MitoHiFi.(2023)A python pipeline for mitochondrial genome assembly from PacBio high fidelity reads.BMC Bioinformatics.: 288.
Article|CAS|PubMed|Google Scholar

[28] 28.Li D, Liu CM, Luo R, Sadakane K, Lam TW. MEGAHIT.(2015)An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de bruijn graph.Bioinformatics.: 1674.
10.1093/bioinformatics/btv033
Article|CAS|PubMed|Google Scholar

[29] 29.Yuan Y, Bayer PE, Lee HT, Edwards D. runBNG.(2017)A software package for bionano genomic analysis on the command line.Bioinformatics.: 3107.
10.1093/bioinformatics/btx366
Article|CAS|PubMed|Google Scholar

[30] 30.Hu J, Wang Z, Liang F, Liu SL, Ye K, Wang DP. NextPolish2.(2024)A repeat-aware polishing tool for genomes assembled using HiFi long reads.Genom Proteom Bioinf.
Article|CAS|PubMed|Google Scholar

[31] 31.Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, et al. Pilon.(2014)An integrated tool for comprehensive microbial variant detection and genome assembly improvement.PLoS ONE.
10.1371/journal.pone.0112963
Article|CAS|PubMed|Google Scholar

[32] 32.Manni M, Berkeley MR, Seppey M, Zdobnov EM. BUSCO.(2021)Assessing genomic data quality and beyond.Curr Protoc.
10.1002/cpz1.323
Article|CAS|PubMed|Google Scholar

[33] 33.Rhie A, Walenz BP, Koren S, Phillippy AM. Merqury.(2020)Reference-free quality, completeness, and phasing assessment for genome assemblies.Genome Biol.: 245.
10.1186/s13059-020-02134-9
Article|CAS|PubMed|Google Scholar

[34] 34.Lin Y, Ye C, Li X, Chen Q, Wu Y, Zhang F, et al. quarTeT.(2023)A telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification.Hortic Res.
Article|CAS|PubMed|Google Scholar

[35] 35.Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P. Sambamba.(2015)Fast processing of ngs alignment formats.Bioinformatics.: 2032.
10.1093/bioinformatics/btv098
Article|CAS|PubMed|Google Scholar

[36] 36.Quinlan AR, Hall IM. BEDTools.(2010)A flexible suite of utilities for comparing genomic features.Bioinformatics.: 841.
10.1093/bioinformatics/btq033
Article|CAS|PubMed|Google Scholar

[37] 37.Uno Y, Nishida C, Hata A, Ishishita S, Matsuda Y. Molecular cytogenetic characterization of repetitive sequences comprising centromeric heterochromatin in three anseriformes species. PLoS ONE. 2019;14.(2019)pone.0214028.
10.1371/journal.pone.0214028
Article|CAS|PubMed|Google Scholar

[38] 38.Benson G. Tandem repeats finder.(1999)A program to analyze DNA sequences.Nucleic Acids Res.: 573.
10.1093/nar/27.2.573
Article|CAS|PubMed|Google Scholar

[39] 39.Zhang Y, Chu J, Cheng H, Li H. De novo reconstruction of satellite repeat units from sequence data. Genome Res. 2023;33.(2023)278005.123.: 1994.
10.1101/gr.278005.123
Article|CAS|PubMed|Google Scholar

[40] 40.Vollger MR, Kerpedjiev P, Phillippy AM, Eichler EE. StainedGlass.(2022)Interactive visualization of massive tandem repeat structures with identity heatmaps.Bioinformatics.: 2049.
10.1093/bioinformatics/btac018
Article|CAS|PubMed|Google Scholar

[41] Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37:907.
10.1038/s41587-019-0201-4

[42] Shumate A, Wong B, Pertea G, Pertea M. Improved transcriptome assembly using a hybrid of long and short reads with StringTie. PLoS Comput Biol. 2022;18:e1009730.
10.1371/journal.pcbi.1009730

[43] Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 2008;9:R7.
10.1186/gb-2008-9-1-r7

[44] Gabriel L, Bruna T, Hoff KJ, Ebel M, Lomsadze A, Borodovsky M, et al. BRAKER3: fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS and TSEBRA. Genome Res. 2024:769.

[45] Stiehler F, Steinborn M, Scholz S, Dey D, Weber APM, Denton AK. Helixer: cross-species gene annotation of large eukaryotic genomes using deep learning. Bioinformatics. 2021:5291.
10.1093/bioinformatics/btaa1044

[46] Haas BJ, Zeng Q, Pearson MD, Cuomo CA, Wortman JR. Approaches to fungal genome annotation. Mycology. 2011;2:118–41.
10.1080/21501203.2011.606851

[47] Huang N, Li H. Compleasm: a faster and more accurate reimplementation of BUSCO. Bioinformatics. 2023.

[48] Chan PP, Lin BY, Mak AJ, Lowe TM. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 2021:9077.

[49] Daub J, Eberhardt RY, Tate JG, Burge SW. Rfam: annotating families of non-coding RNA sequences. Methods Mol Biol. 2015:349.
10.1007/978-1-4939-2291-8_22

[50] Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013.

[51] Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci U S A. 2020;117:9451.
10.1073/pnas.1921046117

[52] Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinform. 2009;Chapter 4:Unit 4.10.
10.1002/0471250953.bi0410s25

[53] Peona V, Blom MPK, Xu L, Burri R, Sullivan S, Bunikis I, et al. Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise. Mol Ecol Resour. 2021;21:263.
10.1111/1755-0998.13252

[54] Panta M, Mishra A, Hoque MT, Atallah J. ClassifyTE: a stacking-based prediction of hierarchical classification of transposable elements. Bioinformatics. 2021:2529.
10.1093/bioinformatics/btab146

[55] Wlodzimierz P, Hong M, Henderson IR. TRASH: tandem repeat annotation and structural hierarchy. Bioinformatics. 2023.

[56] Loman NJ, Quick J, Simpson JT. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat Methods. 2015;12:733.
10.1038/nmeth.3444

[57] Yin ZT, Zhu F, Lin FB, Jia T, Wang Z, Sun DT, et al. Revisiting avian “missing” genes from de novo assembled transcripts. BMC Genomics. 2019;20:4.
10.1186/s12864-018-5407-1

[58] Steinegger M, Soding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017;35:1026.
10.1038/nbt.3988

[59] Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630:493.
10.1038/s41586-024-07487-w

[60] Tang H, Krishnakumar V, Zeng X, Xu Z, Taranto A, Lomas JS, et al. JCVI: a versatile toolkit for comparative genomics analysis. iMeta. 2024.

[61] Hu J, Song L, Ning M, Niu X, Han M, Gao C, et al. A new chromosome-scale duck genome shows a major histocompatibility complex with several expanded multigene families. BMC Biol. 2024;22:31.
10.1186/s12915-024-01817-0

[62] Shang WH, Hori T, Toyoda A, Kato J, Popendorf K, Sakakibara Y, et al. Chickens possess centromeres with both extended tandem repeats and short non-tandem-repetitive sequences. Genome Res. 2010;20:1219.
10.1101/gr.106245.110

[63] Deryusheva S, Krasikova A, Kulikova T, Gaginskaya E. Tandem 41-bp repeats in chicken and Japanese quail genomes: FISH mapping and transcription analysis on lampbrush chromosomes. Chromosoma. 2007:519.
10.1007/s00412-007-0117-5

[64] Barrett LW, Fletcher S, Wilton SD. Regulation of eukaryotic gene expression by the untranslated gene regions and other non-coding elements. Cell Mol Life Sci. 2012;69:3613.
10.1007/s00018-012-0990-9

[65] Borst SE. The role of TNF-alpha in insulin resistance. Endocrine. 2004;23:177.
10.1385/ENDO:23:2-3:177

[66] Kalliolias GD, Ivashkiv LB. TNF biology, pathogenic mechanisms and emerging therapeutic strategies. Nat Rev Rheumatol. 2016;12:49.
10.1038/nrrheum.2015.169

[67] Akash MSH, Rehman K, Liaqat A. Tumor necrosis factor-alpha: role in development of insulin resistance and pathogenesis of type 2 diabetes mellitus. J Cell Biochem. 2018:105.
10.1002/jcb.26174

[68] Czabotar PE, Lee EF, Thompson GV, Wardak AZ, Fairlie WD, Colman PM. Mutation to Bax beyond the BH3 domain disrupts interactions with pro-survival proteins and promotes apoptosis. J Biol Chem. 2011;286:7123.
10.1074/jbc.M110.161281

[69] Ferreira VP, Cortes C, Pangburn MK. Native polymeric forms of properdin selectively bind to targets and promote activation of the alternative pathway of complement. Immunobiology. 2010;215:932–40.
10.1016/j.imbio.2010.02.002

[70] Nicholls C, Li H, Liu JP. GAPDH: a common enzyme with uncommon functions. Clin Exp Pharmacol Physiol. 2012:674.
10.1111/j.1440-1681.2011.05599.x

[71] Huang Z, Xu Z, Bai H, Huang Y, Kang N, Ding X, et al. Evolutionary analysis of a complete chicken genome. Proc Natl Acad Sci U S A. 2023;120:e2216641120.
10.1073/pnas.2216641120

[72] Peona V, Palacios-Gimenez OM, Blommaert J, Liu J, Haryoko T, Jonsson KA, et al. The avian W chromosome is a refugium for endogenous retroviruses with likely effects on female-biased mutational load and genetic incompatibilities. Philos Trans R Soc Lond B Biol Sci. 2021;376:20200186.
10.1098/rstb.2020.0186

[73] Schmid M, Steinlein C. The hypermethylated regions in avian chromosomes. Cytogenet Genome Res. 2017;151:216.
10.1159/000464268

[74] Krasikova AV, Kulikova TV. Distribution of heterochromatin markers in lampbrush chromosomes in birds. Russ J Genet. 2017;53:1022.
10.1134/S1022795417090071

[75] Friedman-Einat M, Seroussi E. Avian leptin: bird’s-eye view of the evolution of vertebrate energy-balance control. Trends Endocrinol Metab. 2019:819.
10.1016/j.tem.2019.07.007

[76] International Chicken Genome Sequencing Consortium. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 2004;432:695.
10.1038/nature03154

[77] Tregaskes CA, Kaufman J. Chickens as a simple system for scientific discovery: the example of the MHC. Mol Immunol. 2021:12.
10.1016/j.molimm.2021.03.019

[78] Bean AGD, Baker ML, Stewart CR, Cowled C, Deffrasnes C, Wang LF, et al. Studying immunity to zoonotic diseases in the natural host - keeping it real. Nat Rev Immunol. 2013;13:851.
10.1038/nri3551

Journal of Animal Science and Biotechnology
