Integrated Full-Length Transcriptome and RNA-Seq to Identify Immune System Genes from the Skin of Sperm Whale (Physeter macrocephalus)

Cetaceans are a group of secondarily aquatic mammals whose ancestors returned to the ocean from land, and during evolution their immune systems adapted to the aquatic environment. Their skin, as the primary barrier against environmental pathogens, is presumed to have evolved to suit this new environment. However, the immune system of cetacean skin and the associated molecular mechanisms remain largely unknown. To better understand this immune system, we extracted RNA from the skin of a sperm whale (Physeter macrocephalus) and performed PacBio full-length sequencing and RNA-seq. We obtained a total of 96,350 full-length transcripts with an average length of 1705 bp, and gene annotation enrichment analysis identified 5150 genes associated with 21 immune-related pathways. Moreover, 89 genes encoding 33 proteins were annotated in the NOD-like receptor (NLR)-signaling pathway, including NOD1, NOD2, RIP2, and NF-κB genes; these are discussed in detail and predicted to play essential roles in the immune system of the sperm whale. Furthermore, sequence comparison and phylogenetic analysis indicated that NOD1 is highly conserved during evolution. These results provide new information about the immune system in cetacean skin, as well as about the evolution of immune-related genes.


Introduction
Cetaceans (whales, dolphins, and porpoises) are a group of secondarily aquatic mammals whose ancestors returned to the ocean from land and then gradually evolved into the dominant group of marine mammals approximately 53 million years ago [1], one of the most dramatic events in the history of biological evolution. As cetaceans moved from land into the ocean and rapidly radiated into different waters around the world, their immune systems were likely challenged by many kinds of pathogenic microorganisms in different environments. Cetaceans may therefore be ideal models for studying the evolution of vertebrate immune genes and the mechanisms driving it. The sperm whale (P. macrocephalus) belongs to the first branching lineage of extant Odontoceti, family

Animal and Tissue Collection
The sperm whale (P. macrocephalus) used in this study was found alive and entangled in fishing nets in the waters of Daya Bay (Guangdong, China) on 12 March 2017. It died on 15 March after a rescue attempt of approximately 72 h. The most likely cause of death was considered to be prolonged entanglement leading to malnutrition/starvation, exhaustion, dehydration, and death [28]. The carcass was salvaged and necropsied according to standard protocols shortly after death. The animal was about 10.78 m in length and 14.18 t in weight, and the anatomical investigation showed that it was a pregnant female carrying a fetus. During the necropsy, skin was collected with a stainless steel scalpel, preserved in 1 mL RNAlater solution (Applied Biosystems, Warrington, UK), stored at 4 °C for 24 h, and then transferred to −80 °C until RNA extraction.

RNA Extraction and Quantification
Total RNA was isolated from a piece of skin tissue from the sperm whale's tail using TRIzol™ reagent (Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer's instructions. A schematic diagram of the skin tissue sample collected from the sperm whale is shown in Figure 1A,B. The skin tissue sample contained epidermis and dermis, without hypodermis or blubber. RNA quality was evaluated with an RNA 6000 Nano kit on an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). The quality criteria for the RNA samples were RIN ≥ 7.0 (or close to 7.0) and 1.8 < OD260/280 < 2.2. RNA meeting these criteria was used for Illumina library construction, PacBio Iso-Seq library construction, and subsequent analysis.

PacBio Iso-Seq Library Preparation and Sequencing
Optional poly(A) selection was performed on the qualified total RNA, and poly(A) RNA was isolated with the Purist™ Kit (Ambion, Inc., Austin, TX, USA) according to the manufacturer's instructions. The poly(A) RNA was then reverse transcribed with a Clontech SMARTer PCR cDNA Synthesis Kit (Clontech, Mountain View, CA, USA) to synthesize first-strand cDNA. After PCR optimization, large-scale PCR was performed to synthesize second-strand cDNA. To ensure that long, low-abundance transcripts were sequenced adequately, cDNA fragments were size-selected with the BluePippin™ system (Sage Science, Beverly, MA, USA). The >4 kb cDNA fragments were then mixed with the library without size selection to form the combined SMRTbell library, which was sequenced by single-molecule real-time (SMRT) sequencing on a PacBio Sequel system.

RNA-seq Library Preparation and Sequencing
Poly(A) mRNA was enriched from total RNA with oligo(dT) magnetic beads, and the mRNA was fragmented by adding an appropriate amount of fragmentation reagent at high temperature. The fragmented mRNA served as the template for first-strand cDNA synthesis, after which a second-strand synthesis reaction was set up to synthesize second-strand cDNA. The double-stranded cDNA was purified and recovered with AMPure XP beads and end-repaired, and an "A" base was added to the 3′ end. Adapters were then ligated to the cDNA, fragments were size-selected, and PCR amplification was performed following the manufacturer's instructions. The constructed library was quality-checked with an Agilent 2100 Bioanalyzer and an ABI StepOnePlus Real-Time PCR System, and sequencing was performed on an Illumina HiSeq X Ten.

Bioinformatics Analysis of PacBio Data and RNA-seq Data
The raw full-length Iso-Seq data were processed following the standard Iso-Seq protocol (SMRT Analysis 2.3) of the PacBio SMRT sequencing platform. In this project, RNA extracted from the skin sample was used to construct PacBio Iso-Seq libraries (0-5 kb), which were sequenced on the PacBio Sequel platform. The raw reads produced during sequencing, which contain the sense- and antisense-strand inserts, the 3′ and 5′ adapters, and the SMRTbell linker sequences, are called polymerase reads. Polymerase reads were split at the linkers, and the repeated passes over each insert were combined and error-corrected to obtain reads of insert (ROIs) of different lengths. ROIs were processed with a minimum sequence length of 300 bp and a phmmer score threshold of 10 for primer detection, and were classified into full-length non-chimeric (FLnc), non-full-length (nFL), chimeric, and short reads. An ROI was considered full length if the 5′ adapter, 3′ adapter, and poly(A) tail were all detected. The iterative clustering and error correction (ICE) algorithm was used to cluster FLnc reads into isoforms, and the Quiver program was then used to polish each cluster into a consensus sequence when the cluster had sufficient full-length and non-full-length coverage. Consensus isoforms with a Quiver quality value >0.95 were considered high quality, and the rest were considered low quality. After clustering and correction, the high-quality sequences from each library were merged, and redundant sequences were removed to obtain the final transcriptome.
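The classification rules above can be sketched in Python as follows. This is a simplified illustration, not the SMRT Analysis implementation: the ReadOfInsert record and its boolean flags are hypothetical, chimera detection is taken as given rather than computed, and only the thresholds (300 bp minimum length, Quiver quality >0.95) come from the text.

```python
from dataclasses import dataclass

MIN_LENGTH = 300      # minimum ROI length stated in the text (bp)
HQ_THRESHOLD = 0.95   # Quiver quality cut-off stated in the text


@dataclass
class ReadOfInsert:
    """Hypothetical ROI record; real Iso-Seq tools store these flags differently."""
    length: int
    has_5p_adapter: bool
    has_3p_adapter: bool
    has_polya: bool
    is_chimeric: bool


def classify_roi(roi: ReadOfInsert) -> str:
    """Assign an ROI to short, nFL, chimeric, or FLnc using the rules in the text."""
    if roi.length < MIN_LENGTH:
        return "short"
    # Full length requires the 5' adapter, 3' adapter, and poly(A) tail together.
    if not (roi.has_5p_adapter and roi.has_3p_adapter and roi.has_polya):
        return "nFL"
    return "chimeric" if roi.is_chimeric else "FLnc"


def is_high_quality(quiver_accuracy: float) -> bool:
    """Polished consensus isoforms above the threshold count as high quality."""
    return quiver_accuracy > HQ_THRESHOLD


roi = ReadOfInsert(length=1800, has_5p_adapter=True, has_3p_adapter=True,
                   has_polya=True, is_chimeric=False)
print(classify_roi(roi))  # FLnc
```

Checking the length first mirrors the order in the text, where the 300 bp minimum is applied before reads are split into full-length and non-full-length categories.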
Raw RNA-seq reads were summarized with SOAPnuke, a filtering tool developed by the Beijing Genomics Institute (BGI), and Trimmomatic was used to remove adapter sequences, reads with an unknown-base (N) content >5%, and low-quality reads. Clean reads (with PCR duplicates removed to improve assembly efficiency) were assembled de novo with Trinity, and the assembled transcripts were clustered with TGICL to remove redundancy and obtain UniGenes [29]. Bowtie2 was used to align the clean reads to the full-length transcriptome of sperm whale skin, used as the reference, and RSEM was then used to calculate gene and transcript expression levels [30,31]. The raw data have been deposited in the National Center for Biotechnology Information (NCBI) database under accession numbers SRR13038369 (full-length transcriptome) and SRR13024481 (RNA-seq transcriptome).
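The N-content and quality filters described above amount to simple per-read checks. The following sketch is a hypothetical illustration, not the SOAPnuke or Trimmomatic implementation: only the 5% N threshold comes from the text, while the Phred+33 encoding and the low-quality thresholds are assumptions chosen for the example. Adapter trimming is not shown.

```python
MAX_N_FRACTION = 0.05        # reads with >5% unknown bases (N) are removed, as in the text
MIN_BASE_QUALITY = 15        # hypothetical Phred cut-off for a "low-quality" base
MAX_LOW_QUAL_FRACTION = 0.2  # hypothetical fraction of low-quality bases allowed
PHRED_OFFSET = 33            # assumes Phred+33 FASTQ quality encoding


def n_fraction(seq):
    """Fraction of positions in a read that are the unknown base N."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def low_quality_fraction(qual):
    """Fraction of bases whose Phred score falls below MIN_BASE_QUALITY."""
    scores = [ord(c) - PHRED_OFFSET for c in qual]
    return sum(s < MIN_BASE_QUALITY for s in scores) / len(scores) if scores else 1.0


def is_clean(seq, qual):
    """A read is kept only if it passes both the N filter and the quality filter."""
    return (n_fraction(seq) <= MAX_N_FRACTION
            and low_quality_fraction(qual) <= MAX_LOW_QUAL_FRACTION)


reads = [
    ("ACGTACGTAC" * 10, "I" * 100),  # no N, all Q40: kept
    ("ACGTNNNNNN" * 10, "I" * 100),  # 60% N: removed
    ("ACGTACGTAC" * 10, "#" * 100),  # all Q2: removed
]
clean = [r for r in reads if is_clean(*r)]
print(len(clean))  # 1
```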

Quantification and Annotation of Gene Expression Levels
We used the full-length transcripts generated by SMRT Iso-Seq analysis as reference sequences and aligned the short-read dataset from the Illumina platform to them with Bowtie2 [26] to quantify UniGene expression in the sperm whale skin transcriptome, which allows quantitative investigation of specific immune genes. Briefly, all clean reads from the Illumina platform were mapped back to the full-length transcripts, and RSEM was used to obtain read counts for each isoform in the skin transcriptome. To eliminate the effects of differences in sequencing depth and transcript length, read counts were converted into FPKM values (fragments per kilobase of transcript per million mapped reads) to represent the expression of each isoform in the skin sample, following the RPKM normalization method [32].
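For reference, FPKM for transcript i is computed as FPKM_i = C_i × 10^9 / (L_i × N), where C_i is the number of fragments assigned to transcript i, L_i is its length in bp, and N is the total number of mapped fragments. The sketch below applies this formula to hypothetical counts. It assumes N is the sum of the supplied counts unless given explicitly and uses plain transcript length, whereas RSEM derives counts from its own expectation-maximization estimates and uses effective lengths.

```python
def fpkm(counts, lengths_bp, total_mapped=None):
    """Convert fragment counts to FPKM values.

    counts: transcript ID -> number of mapped fragments
    lengths_bp: transcript ID -> transcript length in bp
    total_mapped: total mapped fragments; defaults to the sum of counts (assumption)
    """
    if total_mapped is None:
        total_mapped = sum(counts.values())
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million mapped fragments)
    return {tid: c * 1e9 / (lengths_bp[tid] * total_mapped) for tid, c in counts.items()}


# Hypothetical example: two isoforms with 2000 mapped fragments in total,
# so the values are large compared with a real library of millions of reads.
values = fpkm({"iso_a": 500, "iso_b": 1500}, {"iso_a": 1000, "iso_b": 2000})
print(values)  # {'iso_a': 250000.0, 'iso_b': 375000.0}
```

Dividing by both length and library size is what makes values comparable across isoforms of different lengths and across samples of different sequencing depth.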
To obtain comprehensive functional annotation of the sperm whale skin transcriptome, all clustered and corrected transcripts were annotated with BLAST [33], Blast2GO [34], and InterProScan5 [35] against seven public protein and nucleotide databases: NCBI non-redundant protein sequences (Nr), NCBI non-redundant nucleotide sequences (NT), clusters of orthologous groups for complete eukaryotic genomes (KOG), Swiss-Prot, InterPro, Gene Ontology (GO), and the Kyoto Encyclopedia of Genes and Genomes (KEGG).
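Homology-based annotation of this kind typically keeps the best-scoring hit per transcript. The sketch below assumes BLAST tabular output (outfmt 6 with its default 12 columns) and a hypothetical e-value cut-off of 1e-5, which the text does not specify. It picks the hit with the lowest e-value per query, breaking ties by bit score; the transcript and subject identifiers are illustrative only.

```python
def best_hits(lines, max_evalue=1e-5):
    """Return query ID -> (subject ID, e-value, bit score) of the best hit.

    Expects the 12 default columns of BLAST -outfmt 6:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # too weak to support an annotation
        current = best.get(query)
        # Lower e-value wins; equal e-values fall back to the higher bit score.
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits (identifiers are illustrative only)
rows = [
    "iso_1\tsubj_A\t98.0\t950\t19\t0\t1\t950\t1\t950\t0.0\t1800",
    "iso_1\tsubj_B\t85.0\t900\t135\t2\t1\t900\t1\t900\t1e-150\t900",
    "iso_2\tsubj_C\t30.0\t100\t70\t5\t1\t100\t1\t100\t0.5\t40",
]
print(best_hits(rows))  # {'iso_1': ('subj_A', 0.0, 1800.0)}
```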

Sequence Analysis
TransDecoder [36] was used to identify candidate coding regions in the transcripts. The longest open reading frames (ORFs) were extracted, and the predicted coding sequences were verified by BLAST searches against the Swiss-Prot database and hmmscan searches of the Pfam database to detect homologous proteins based on sequence similarity. The sperm whale NOD1 amino acid sequences were obtained from the PacBio full-length sequencing data uploaded to NCBI (Accession ID: SRR13038369), and the longest sequence, Isoform_6383, was used for analysis. NOD1 amino acid sequences of other species were retrieved from NCBI; their accession IDs are provided as additional material (Table A1). The average length of these amino acid sequences was 957 aa. Multiple sequence alignment of NOD1 from different species was performed with the MEGA X package, the alignment was saved in FASTA format, and IQ-TREE 2 (IQ-TREE-2.0.6-windows) was used to construct the phylogenetic tree [37], with the JTT + G4 model selected according to the Bayesian information criterion (BIC). The iTOL tool (https://itol.embl.de/ (accessed on 4 February 2021)) was then used to display, edit, and format the tree. In addition, the Simple Modular Architecture Research Tool (SMART) was used to predict the conserved domains of NOD1 and RIP2 based on sequence homology [38], and the predictions were confirmed with BLAST conserved domain prediction.
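The idea behind extracting the longest ORF can be illustrated with a short scan over all six reading frames. This is a simplified sketch, not TransDecoder itself: it reports only complete ORFs (ATG to stop codon), whereas TransDecoder also reports 5′- and 3′-partial ORFs and scores candidates with a coding-potential model. The 100-amino-acid minimum mirrors TransDecoder's default but is an assumption here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq, min_aa=100):
    """Return the longest complete ORF (ATG...stop) on either strand, or ''."""
    seq = seq.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    orf = strand[start:i + 3]
                    if len(orf) > len(best):
                        best = orf
                    start = None  # look for the next ATG downstream
    # Protein length excludes the stop codon.
    return best if len(best) // 3 - 1 >= min_aa else ""


print(longest_orf("CCATGAAATTTGGGTAACC", min_aa=1))  # ATGAAATTTGGGTAA
```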

Full-Length Transcripts from the Skin of Sperm Whale
To analyze gene expression in the skin of the sperm whale, the extracted mRNA was used to generate a full-length transcriptome on the PacBio Sequel platform. Two SMRT cells from the PacBio library produced a total of 21.5 Gb of subread bases, yielding 939,114,987 bp of circular consensus sequences (CCS) (1,056,247 reads of insert) with a mean read length of 1776 bp and a mean read quality of 0.93. All CCS reads were classified into four categories: full-length non-chimeric (FLnc), chimeric, non-full-length (nFL), and short reads. FLnc reads accounted for the highest proportion, followed by nFL reads (Figure 2A). We obtained 187,767 nFL reads and 289,512 FLnc reads with a mean length of 1429 bp; in FLnc reads, the 5′ adapter, 3′ adapter, and poly(A) tail were all detected. Only FLnc and nFL reads were used in further analysis. FLnc reads were clustered into consensus sequences with the ICE algorithm, and for each cluster with sufficient FLnc and nFL read coverage, Quiver was run to polish the consensus. This produced 129,174 high-quality consensus isoforms with accuracies >99.91%. The high-quality consensus sequences from each library were merged by clustering and error correction, and redundancy was removed to obtain the final full-length transcripts. A total of 96,350 full-length transcripts were generated, with a mean length of 1705 bp and an N50 length of 1996 bp (Table 1). In parallel, RNA-seq of RNA extracted from the sperm whale skin generated 50.61 Mb raw reads, and 43.02 Mb clean reads remained after low-quality data were filtered out. Assembly of the clean reads and removal of redundancy yielded 35,982 UniGenes with an N50 length of 2047 bp, a GC content of 54.14%, and an average length of 1105 bp. The length distribution is shown in Table 2.
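The N50 values reported above follow the usual definition: sort sequence lengths in decreasing order and take the length at which the running total first reaches half of all bases. A minimal sketch, with a toy length list rather than the study's data:

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def mean_length(lengths):
    return sum(lengths) / len(lengths) if lengths else 0.0


toy = [2, 3, 4, 5, 6]  # hypothetical lengths, 20 bases in total
print(n50(toy), mean_length(toy))  # 5 4.0
```

Unlike the mean, N50 is weighted toward long sequences, which is why an assembly can have an N50 well above its mean length, as with the UniGenes above.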
Benchmarking Universal Single-Copy Orthologs (BUSCO) was used to evaluate the completeness of the transcriptome generated by Illumina sequencing [39]. Complete BUSCOs, comprising single-copy (S) and duplicated (D) BUSCOs, accounted for 60.07% of all BUSCOs (Figure 3), while fragmented and missing BUSCOs accounted for 15.51% and 24.42%, respectively. This suggests that the transcriptome is reasonably complete.

Annotations and Analysis of Full-Length Transcriptome
Of the 96,350 full-length transcripts, 95,230 (98.84%) were annotated by homology searches against the seven databases (Figure 2B), with annotation rates exceeding 50% in each of Nr, NT, Swiss-Prot, KEGG, and KOG. Functional annotation and KEGG pathway analysis (Figure 4A) assigned 58,661 transcripts to 342 pathways, including the chemokine-signaling pathway, cell pathway molecules, leukocyte transendothelial migration, and other pathways related to the immune system. The full-length transcripts were assigned to multiple KEGG pathways and KOG categories. The main representative pathways, such as focal adhesion, regulation of actin cytoskeleton, endocytosis, and phagosome, belong to the level 1 KEGG category of cellular processes. These pathways were also associated with functional categories including cell motility, cell communication, and intracellular trafficking, secretion, and vesicular transport (Figures 4A and A1, in Appendix A), suggesting that multiple cellular processes are active in the skin of the sperm whale.
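The overall annotation rate counts a transcript once if it is annotated in at least one database, which is why it can exceed every single-database rate. A minimal sketch using set union; the toy IDs are hypothetical, and the final line simply re-derives the reported 98.84% from 95,230 of 96,350 transcripts.

```python
def annotation_rates(total, annotated_by_db):
    """Per-database and overall (any database) annotation rates, in percent."""
    per_db = {db: 100 * len(ids) / total for db, ids in annotated_by_db.items()}
    annotated_any = set().union(*annotated_by_db.values())
    return per_db, 100 * len(annotated_any) / total


# Hypothetical toy input: 10 transcripts, two databases
per_db, overall = annotation_rates(10, {"Nr": {"t1", "t2", "t3"}, "KEGG": {"t3", "t4"}})
print(per_db, overall)  # {'Nr': 30.0, 'KEGG': 20.0} 40.0

# Re-deriving the reported overall rate
print(round(100 * 95230 / 96350, 2))  # 98.84
```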

Enrichment of Immune-Related Pathways in the Skin of Sperm Whale
Because skin plays an important role in the immune system, transcripts annotated in immune-related KEGG pathways were analyzed further. A total of 34,660 transcripts were annotated in these KEGG pathways, and 20,476 transcripts overlapped among the pathways. Moreover, the five immune pathways with the largest numbers of transcripts among the 21 immune pathways had no overlapping transcripts (Figure A2, in Appendix A). In total, 5150 genes associated with 21 immune-related pathways were enriched (Figure 4A). The pathways with the most genes were platelet activation (1035 genes) and leukocyte transendothelial migration (905 genes), followed by the NOD-like receptor-signaling pathway (875 genes), FcγR-mediated phagocytosis (678 genes), the chemokine-signaling pathway (643 genes), the IL-17-signaling pathway (619 genes), antigen processing and presentation (476 genes), the C-type lectin-receptor-signaling pathway (439 genes), Th17 cell differentiation (424 genes), and the Toll-like receptor-signaling pathway (410 genes). These results outline the immune-related pathways annotated in sperm whale skin tissue.
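Because one gene can belong to several pathways, per-pathway counts overlap: the ten counts listed above already sum to 6504, more than the 5150 unique genes. The sketch below shows how per-pathway counts, the number of unique genes, and the genes shared by all of the largest pathways can be derived from a pathway-to-gene mapping. The input format and toy gene sets are hypothetical, and "shared by all" is only one possible reading of the overlap reported in Figure A2.

```python
def summarize_pathways(pathway_genes, top_n=5):
    """Genes per pathway, unique genes overall, and genes shared by all top pathways.

    pathway_genes: pathway name -> set of gene IDs (hypothetical input format)
    """
    counts = {name: len(genes) for name, genes in pathway_genes.items()}
    unique_genes = set().union(*pathway_genes.values())
    top = sorted(counts, key=counts.get, reverse=True)[:top_n]
    shared_by_top = set.intersection(*(pathway_genes[name] for name in top)) if top else set()
    return counts, len(unique_genes), top, shared_by_top


toy = {
    "platelet activation": {"g1", "g2", "g3"},
    "leukocyte migration": {"g3", "g4"},
    "NLR signaling": {"g5"},
}
counts, n_unique, top, shared = summarize_pathways(toy, top_n=2)
print(sum(counts.values()), n_unique, shared)  # 6 5 {'g3'}
```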

NOD-Like Receptor Signaling Pathway
Based on functional annotation and metabolic pathway analysis of the skin transcriptome, genes encoding 33 proteins related to the NOD-like receptor-signaling pathway were identified in the skin of the sperm whale, and the FPKM value of each transcript is listed in Table A2 (in Appendix A). Guided by the KEGG results, the high-quality transcript sequences involved in the NOD-like receptor-signaling pathway were searched with BLAST against NCBI and showed high similarity to the corresponding protein-coding genes of three toothed whale species (Table A2, in Appendix A). Based on these annotation results, we infer that a NOD-like receptor-signaling pathway is present in the sperm whale (Figure 5). After NOD1 or NOD2 interacts with bacterial peptidoglycan (PGN), the downstream kinase RIP2 is recruited and activated through homotypic interactions between caspase recruitment domains (CARD) [40,41]. The IKK complex, composed of IKKβ, IKKα, and NEMO, is subsequently activated indirectly, leading to degradation of the NF-κB inhibitor IκBα, translocation of NF-κB into the nucleus, and transcription of chemokines and other mediators. In addition to activating the NF-κB pathway, NOD2 stimulation can also activate MAPKs (ERK, JNK, and p38). Both the NF-κB and MAPK pathways can induce the secretion of proinflammatory cytokines and chemokines and the production of antimicrobial peptides. Furthermore, NOD2 can recognize single-stranded RNA (ssRNA) and indirectly activate IRF3 through mitochondrial antiviral-signaling protein (MAVS) signaling, leading to the production of IFNα, a type I IFN.

Structural and Evolutional Analysis of NOD1 Gene in Sperm Whale
Analysis of the domains encoded by the sperm whale NOD1 gene (Figure 6) showed that NOD1 proteins from different species share a common architecture: an N-terminal CARD domain, a central nucleotide-binding oligomerization domain (NBD, here a NACHT domain), and C-terminal leucine-rich repeat (LRR) domains.
To construct the phylogenetic tree, NOD1 amino acid sequences of 30 species from five classes (Mammalia, Aves, Reptilia, Amphibia, and Actinopterygii) were aligned. The sperm whale NOD1 sequence clustered with orthologous sequences of other cetaceans and was closest to that of the Yangtze River dolphin (Lipotes vexillifer) (Figure 7). As expected, the cetacean NOD1 orthologs formed a cluster separate from the other species of the five classes and were closely related to their terrestrial relative, the water buffalo (Bubalus bubalis), in the order Artiodactyla.
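The Methods state that JTT + G4 was chosen by the Bayesian information criterion, BIC = k ln(n) − 2 ln L, where L is the maximized likelihood, k the number of free parameters, and n the number of alignment sites; the model with the lowest BIC is preferred. The sketch below illustrates that comparison with made-up log-likelihoods and parameter counts, not values from this study. In practice IQ-TREE's ModelFinder performs this step and includes branch lengths among the free parameters.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def select_model(candidates, n_sites):
    """candidates: model name -> (log-likelihood, number of free parameters)."""
    scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical values for illustration only
candidates = {
    "JTT": (-20150.0, 59),
    "JTT+G4": (-20000.0, 60),
    "LG+G4": (-20010.0, 60),
}
best, scores = select_model(candidates, n_sites=1000)
print(best)  # JTT+G4
```

The k ln(n) term penalizes extra parameters, so a richer model such as one adding a gamma rate parameter is chosen only when it improves the log-likelihood enough to offset that penalty.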

Discussion
As cetaceans returned from land to the sea, the change in pathogenic microorganisms posed a severe challenge to their survival and may have driven the evolution and adaptation of immune genes [42]. Skin is not only one of the interfaces between cetaceans and the surrounding water but also the first barrier of defense, playing an important role against pathogens [43]. The dominant cell type in the cetacean epidermis is the lipokeratinocyte, which contributes to the mechanical strength, buoyancy, and insulation of cetacean skin [44]. In addition to acting as a physical barrier, cetacean skin can also detoxify chemicals that pass through the stratum corneum via xenobiotic pathways [45]. The skin is also an immune organ: proinflammatory cytokines have been shown to induce the production of β-defensins, which may serve as a nonspecific defense against bacteria, fungi, and algae [46]. An analysis of selective pressure on immune-related genes of the TLR-signaling pathway in the cetacean innate immune system found that these genes were under selective pressure, suggesting that the cetacean immune system adapted to pathogenic microorganisms during the transition from terrestrial to marine ecosystems [20]. However, the molecular pathways of innate immunity in sperm whales remain unknown. In this study, we obtained RNA-seq short reads and PacBio Iso-Seq full-length transcripts from sperm whale skin and used them to investigate immune-related pathways in this tissue. Complete BUSCOs (single-copy (S) plus duplicated (D)) accounted for more than 60% of all BUSCOs, similar to other studies [47][48][49].
Moreover, many genes in the full-length transcriptome were annotated in protein-related pathways, such as transcription and amino acid transport and metabolism (Figure A1, in Appendix A), suggesting that these processes may underlie protein biosynthesis and secretion in sperm whale skin. These data provide a basis for future research on the immune and metabolic pathways of the sperm whale.
Because aquatic environments contain more pathogenic organisms, such as bacteria, fungi, and parasites, than terrestrial environments [9,10], the immune mechanisms and immune substances of aquatic organisms have recently attracted considerable interest [50][51][52]. As the first line of defense, the innate immune system detects and removes harmful microbes. Because aquatic organisms are more exposed to microorganisms than terrestrial organisms, they may have developed a range of strategies to cope with this strong environmental pressure. For example, fish rely mainly on innate immunity during their developmental stages [53].
As aquatic mammals, cetaceans may also have evolved a more effective innate immune system. Innate immunity relies on a family of proteins that recognize microbial components, called pattern recognition receptors (PRRs) [54]. TLRs, the first class of cellular PRRs to be identified, are transmembrane PRRs [55]. They have been studied widely in many animals, including fish, chickens, and humans [53,56], and even in cetaceans [57]. NOD-like receptors (NLRs) are important intracellular cytoplasmic PRRs that have been investigated in immunity, inflammation, and disease. The roles of the NLR family in immune mechanisms have been well studied in humans, mice, and fish [52,58,59]; however, the NLR pathway had not been reported in any cetacean until now.
In this study, the genes annotated in the NLR-signaling pathway by gene annotation and metabolic pathway analysis of the skin transcriptome were searched with BLAST against NCBI, and the transcripts were highly consistent with sequences from the sperm whale RefSeq genome. Because the NLR-signaling pathway has been characterized in other species (human, mice, and fish) [53,60,61], we infer that this pathway is very likely present in the sperm whale. NOD1/NOD2 recognize bacterial PGN and then transmit immune signals that induce several central immune factors, immune substances, and antimicrobial peptides with significant roles in the body's immune responses (Figure 5). Unlike in other mammals studied, NOD2 was not annotated in the NOD-like receptor-signaling pathway in our sperm whale skin sample, possibly because of low expression; NOD2 expression has commonly been reported in the intestine, lung, and oral cavity [22].
In the NOD-like receptor-signaling pathway, NOD1 and NOD2 are key proteins that recognize PGNs through their LRR domains [62]. Upon binding bacterial PGN, NOD1 and NOD2 recruit and interact with RIP2, leading to downstream signaling events that eventually activate NF-κB and MAP kinases (Figure 6). NOD2 can recognize not only the bacterial PGN component muramyl dipeptide (MDP) but also single-stranded RNA (ssRNA), triggering a series of immune-signaling pathways and producing immune factors that resist pathogen invasion [63,64] (Figure 5). NOD1 and NOD2 can also affect the abundance of immune substances such as chemokines and cytokines, which are important mediators of the immune response and whose abundance directly affects the occurrence and regulation of immune effects [65].
RIP2 is a serine/threonine kinase of the RIP family [66,67]. In our annotation, sperm whale RIP2 (Isoform_25948, Accession ID: SRR13038369) contains an N-terminal kinase domain and a CARD domain that can interact with other CARD-containing proteins (Figure A3, in Appendix A), such as NOD1 or NOD2 [40,41]. As shown in Table A2 (in Appendix A), the FPKM of RIP2 ranged from about 0.71 to 1.52. RIP2 has been shown to play a vital role in NOD1- and NOD2-mediated innate immune responses, and RIP2 deficiency is closely related to altered cellular signaling and cytokine responses in the NLR-signaling pathway [68][69][70]. In vitro stimulation of cells from RIP2-deficient and RIP2 kinase-dead mice with NOD1 and NOD2 ligands indicated that neither NLR-mediated inflammatory chemokines nor cytokines are produced normally in the absence of RIP2 or its kinase activity, further demonstrating the importance of RIP2 kinase in maintaining the inflammatory immune response [71]. Moreover, RIP2 overexpression was shown to mediate the phosphorylation and activation of TAK1, which is involved in NLR-mediated signaling [72].
NF-κB is an evolutionarily conserved transcription factor that is widely present in various tissues. Many pathogens and viruses can activate NF-κB, indicating that it is an evolutionarily conserved mediator of immune and inflammatory responses [73,74]. The NF-κB family consists mainly of five members: c-REL, RELB, NF-κB1 (p105/p50), NF-κB2 (p100/p52), and RELA (p65) [75]. NF-κB proteins act as homo- or heterodimeric nuclear transcription factors formed by members of this family and play important roles in cells. For example, p50 and p65 have been shown to form a heterodimeric NF-κB in the human placenta [76]. In our transcriptome, genes encoding RELA and NFKB1 were annotated, with FPKM ranges of about 0.08-10.86 and 1.34-3.73, respectively (Table A2, in Appendix A). BLAST searches showed that RELA corresponds to the p65 subunit of NF-κB and NFKB1 to p105, which serves as the precursor of p50 and can be processed into p50 [77]. We speculate that NF-κB may also exist in sperm whale skin as a p65-p50 heterodimer, which warrants further verification. NF-κB plays an essential role in the immune response. Most NF-κB dimers are bound by IκB inhibitors and retained in the cytoplasm until the IKK complex phosphorylates IκB [78]. Invading intestinal bacteria can activate NF-κB in human intestinal epithelial cells, leading to the production of inflammatory factors such as IL-8, MCP-1, and TNF-α, which are vital components of the inflammatory, immune, and stress responses [79]. In addition to the NLR-signaling pathway, NF-κB plays a similar key role in the TNF-signaling pathway by producing important signaling molecules that mediate cell proliferation and death [80].
Most NOD-family members have a tripartite domain architecture comprising an N-terminal effector-binding domain (EBD), a central NOD domain, and a C-terminal ligand-recognition domain (LRD). The EBD interacts with downstream effector molecules after the signaling cascade is activated. The EBD is structurally variable among NOD proteins and is mainly either a caspase recruitment domain (CARD) or a pyrin domain (PYD) [81]; in NOD1, the EBD is a CARD domain. The central NOD domain, classified as a NACHT domain in NOD1, is responsible for ATPase activity and induces self-oligomerization. The LRR domain recognizes exogenous and endogenous ligands: when small peptides derived from PGNs are released into the cytoplasm, they are recognized and bound by the LRR domain of NOD1 [69], NOD1 then recruits RIP2 through CARD-CARD interactions [40], and NF-κB is subsequently activated (Figure 5).
In this study, the NOD1 structure of the sperm whale was compared with those of other species. The sperm whale NOD1 amino acid sequence was highly similar to the orthologous sequences of other vertebrates, especially in the highly conserved CARD, NACHT, and LRR domains (Figure 6). The NOD1 domain structure was very similar among cetacean species and also similar to that of Bos taurus in the order Artiodactyla. The main difference was that the LRR domain of sperm whale NOD1 contained one fewer leucine-rich repeat than those of B. taurus and Ovis aries (Figure 6). A phylogenetic tree constructed from the full-length sperm whale NOD1 sequence (Figure 7) showed that the sperm whale NOD1 protein clustered with orthologs from other cetaceans and was closely related to that of its terrestrial relative B. bubalis in the order Artiodactyla. Based on the structural and phylogenetic analyses, we presume that NOD1 function is conserved in the sperm whale and possibly in other cetaceans, implying that it may play a vital role in the sperm whale immune process.
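Statements such as "highly similar" are usually backed by pairwise identity computed from the multiple alignment. A minimal sketch, assuming two already-aligned sequences of equal length with "-" as the gap character and counting only columns where neither sequence has a gap (one of several common conventions); the toy fragments are hypothetical, not NOD1 sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are excluded under this convention
        compared += 1
        identical += x == y
    return 100.0 * identical / compared if compared else 0.0


print(percent_identity("MKV-LA", "MRVQLA"))  # 80.0
```

Dividing by the full alignment length instead would give 4 of 6, about 66.7%, so the convention should always be reported alongside identity values.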

Conclusions
A total of 96,350 full-length transcripts with an average length of 1705 bp were generated, and gene annotation enrichment analysis identified 5150 genes associated with 21 immune-related pathways. Moreover, 89 genes encoding 33 proteins related to the NOD-like receptor-signaling pathway of innate immunity were identified in the skin of the sperm whale. Sequence comparison revealed that these protein-coding genes are highly consistent with sequences from the sperm whale RefSeq genome (a deduced transcriptome in NCBI), as well as with sequences of L. vexillifer and T. truncatus. We also found that NOD1, RIP2, and NF-κB, which have been shown to play important roles in resisting pathogen invasion in many other species, are present in the NOD-like receptor-signaling pathway. Furthermore, structural and phylogenetic analyses with orthologs from other vertebrates suggest that the function and domains of sperm whale NOD1 are highly conserved. Our results provide information about the NLR-signaling pathway in the skin of the sperm whale and deepen our understanding of the innate immune processes of this secondarily adapted marine mammal.