Understanding Animal Evolution: The Added Value of Sponge Transcriptomics and Genomics

Sponges are important but often‐neglected organisms. The absence of classical animal traits (nerves, digestive tract, and muscles) makes sponges challenging for non‐specialists to work with and has delayed getting high quality genomic data compared to other invertebrates. Yet analyses of sponge genomes and transcriptomes currently available have radically changed our understanding of animal evolution. Sponges are of prime evolutionary importance as one of the best candidates to form the sister group of all other animals, and genomic data are essential to understand the mechanisms that control animal evolution and diversity. Here we review the most significant outcomes of current genomic and transcriptomic analyses of sponges, and discuss limitations and future directions of sponge transcriptomic and genomic studies.


Introduction
Bilaterians represent the majority of extant animal species and are, unsurprisingly, well represented among fully sequenced genomes. There is particular interest, however, in studying non-bilaterian taxa (Placozoa, Cnidaria, Ctenophora, and Porifera) because they hold the key to understanding the origin of major transitions in animal body plans. [1,2] Sequencing and analyzing the genomes of non-bilaterians will help determine the origins of major features of bilaterians such as axial polarity, symmetry, nervous systems, muscles, and even the origin of germ layers and the gut. Far fewer genomes are available for non-bilaterians, and one of the most poorly represented phyla is also one of the earliest-branching animal lineages, and one with widespread ecological and evolutionary importance: Porifera (sponges).
The first genomes of non-bilaterian species began to appear between 2005 and 2010: the cnidarian Nematostella vectensis, [3] the placozoan Trichoplax adhaerens, [4] the demosponge Amphimedon queenslandica, [5] and the lobate ctenophore Mnemiopsis leidyi. [6] For the last 8 years it has therefore been possible to compare the genome content and architecture of at least some non-bilaterians with the more readily available genomes of bilaterian species.
But in order to sketch the portrait of the last common ancestor of animals and to trace back the early evolution of molecular, cellular, and morphological characters, knowing the relative phylogenetic position of these non-bilaterian phyla is important. [7] Interest in their evolutionary placement increased after a first phylogenomic study incidentally suggested that ctenophores may have arisen prior to sponges. [8] The debate about which of the two phyla (sponges or ctenophores) is the best candidate to be sister group of all other animals has been fueled ever since. [8-26] While the strength of that debate boosted molecular and cellular studies on these two phyla, still only minimal transcriptomic and genomic data are available from both groups, especially when compared to the data now available for single vertebrate species; for example, genomes of more than 160 dog breeds have been sequenced. [27] Of the two phyla (ctenophores and sponges), Porifera (sponges) are the more diverse, with more than 8500 species described (compared to ~200 for ctenophores), [2,28] and yet the genomes sequenced do not reflect this diversity. According to recent phylogenetic and phylogenomic studies, sponges form a monophyletic group with four classes: Demospongiae, Hexactinellida, Calcarea, and Homoscleromorpha, each with very different ecological, embryological, and morphological characteristics (Table 1). [11,29] Consequently, the acquisition of large genetic datasets in each of these classes is expected to provide insight into the ancestral poriferan genetic toolkit. However, the taxon sampling of transcriptomic and genomic data from sponges is currently quite poor (Figure 1, see also http://www.spongebase.net, resource accessed in April 2018). Indeed, although transcriptomic and partial genomic data exist from individual researchers, those data are unfortunately not always made publicly available.
In all, there are only four genome drafts publicly available so far that are complete enough to provide reliable estimates of gene content and genome size: these are from two demosponges Amphimedon queenslandica and Tethya wilhelma, [5,30] one homoscleromorph Oscarella pearsei, [31,32] and one calcareous sponge Sycon ciliatum. [33] Despite this, the data that currently are available have already given rise to several major discoveries that challenged many previous (mis)conceptions. These include, but are not limited to i) that genome sizes of sponges are not particularly small, and so do not correlate with morphological or body plan complexity; ii) that gene content is vastly more complex than our understanding of sponge morphology and function can explain; and iii) that, like other metazoans, sponge lineages too have undergone lineage specific gene diversifications, including significant losses of genes in some classes and duplications in others. Here, we review the major outcomes from comparative analyses of transcriptomic and genomic data from sponges, in large part due to advances in next generation sequencing technology. We also try to evaluate objectively what the best collective strategy for future progress should be. [34,35]

Sponge Genomes Have a Surprisingly High Gene Number
The most extensive estimate of sponge genome size so far reports genomes ranging from about 40 to 810 Mb (0.04-0.81 pg), with a mean estimated genome size of 200 Mb. [36] Thus, while sponges have some of the smallest animal genome sizes reported so far, [36] some species have genome sizes in the same range as protostomes with substantially more complex body plans. For example, the Amphimedon queenslandica genome is about 166 Mb and that of another demosponge, Tethya wilhelma, is about 125 Mb; [30,37] these values are not much different from that of Drosophila melanogaster (175 Mb). [36] Genome size thus appears very variable in demosponges, although unfortunately the genome sizes of other sponge classes are currently unknown, including those of the calcareous sponge Sycon ciliatum and the homoscleromorph sponge Oscarella pearsei. The results so far, however, fit the C-value paradox concept, since i) different sponges with similar complexity can display very different genome sizes and ii) despite their relative anatomical simplicity, some sponge genomes are not smaller than those of some invertebrate bilaterians. [38] Little is known about sponge genome organization. According to the most complete study of sponge karyotypes published to date, work that focused on freshwater demosponges only, sponge genomes are thought to be organized on about 23 micro-chromosomes. [39] In terms of gene content, sponge genomes have between 17 000 and 41 000 genes. [30,40-43] The highest estimates are high even compared to the estimated gene number of some bilaterians. [44] This gene number may be overestimated by the accidental inclusion of symbiont or transposable element-coding sequences in draft genomes, and further analysis is needed on a larger set of sponges for a better estimate. Nevertheless, according to recent data on A. queenslandica, sponge genomes may be compact, with gene density at about 9 genes per 50 kb window, and therefore more similar to choanoflagellates than to bilaterians. [37] This may partly explain the relatively high number of genes compared to total genome size. If confirmed, sponges would illustrate the "G-value paradox", that is, a disconnect between gene number and body complexity. [45] It is nevertheless premature to extrapolate from these data to all sponges.
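The figures quoted above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only: it assumes the commonly used conversion of about 978 Mb per picogram of DNA (a factor not stated in the text) and, unrealistically, a uniform gene density across the genome. The function names are hypothetical.

```python
# Back-of-the-envelope genome arithmetic; not an analysis from the cited studies.
# ASSUMPTION: 1 pg of double-stranded DNA corresponds to roughly 978 Mb.
MB_PER_PG = 978.0


def pg_to_mb(picograms: float) -> float:
    """Convert a C-value in picograms to megabases."""
    return picograms * MB_PER_PG


def genes_per_mb(genes_per_window: float, window_kb: float) -> float:
    """Convert a windowed gene density (genes per window of window_kb) to genes per Mb."""
    return genes_per_window * 1000.0 / window_kb


def implied_gene_count(genome_mb: float, density_per_mb: float) -> float:
    """Gene count implied by a genome size and a density, ASSUMING uniform density."""
    return genome_mb * density_per_mb


density = genes_per_mb(9, 50)                    # 9 genes per 50 kb window
amphimedon = implied_gene_count(166, density)    # 166 Mb genome quoted in the text
print(f"{density:.0f} genes/Mb, about {amphimedon:.0f} genes implied")
print(f"0.04-0.81 pg is roughly {pg_to_mb(0.04):.0f}-{pg_to_mb(0.81):.0f} Mb")
```

Under these assumptions, a density of 9 genes per 50 kb over a 166 Mb genome implies a gene count that falls inside the 17 000-41 000 range reported for sponges, which is consistent with compactness contributing to the high gene numbers.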

Sponge Lineages Have Different Gene Contents
Among the four sponge classes (Demospongiae, Hexactinellida, Calcarea, and Homoscleromorpha), [11,29] the class Demospongiae has the largest species diversity (Table 1), which at least in part explains why most of the molecular data published so far focus on species of this taxon. Indeed, the first sponge DNA sequence analyses began in the 1980s in various demosponges, [46] and increased in number in the 1990s, [47] based on Sanger sequencing. With the subsequent development of "next generation sequencing" (NGS) technologies, such as 454 and Illumina, deeper sponge transcriptome sequencing was initiated, and these technological advances significantly aided the accumulation of transcriptome and genome data needed for the comparison of gene content between sponge classes.
Figure 1. Phylogenetic relationships among the main sponge taxa according to molecular phylogenetic studies. [29,60,101-103] A more complete and updated taxonomy can be found in the World Porifera Database. [104] The names of species are written as follows: 1) in red, when transcriptomic and genomic analyses are already published (whether the corresponding databases are publicly available or not); [30,33,40,43,83,105] 2) in violet, when transcriptome analyses are already published (whether the corresponding databases are publicly available or not) and there is a known* on-going genome project; [71,106] 3) in blue, when transcriptome analyses are already published (whether the corresponding databases are publicly available or not) and there is NO known* on-going genome project; [13,49,52,55,56,59,107] 4) in gray, when there are known* on-going transcriptome and/or genome projects but not yet published sequence analyses (except on symbiotic content). *According to what was presented at the last World Sponge Conference (Galway, June 2017).
www.advancedsciencenews.com www.bioessays-journal.com
The first draft genome of a demosponge, Amphimedon queenslandica, was finally published in 2010, [5] but one might wonder why A. queenslandica was chosen as a model by the Joint Genome Institute for sequencing. As with each early sequencing project, an appropriate species had to be chosen, but unfortunately no sponge is known that is widely distributed or easy to culture in any lab, and that also spawns both eggs and sperm for manipulative work. Although A. queenslandica does not reproduce when kept in laboratory/aquarium culture, at the time it had the most abundant and readily available larvae. Its larvae also showed distinctive polarity and behavior, [48] characteristics that were thought to enable better comparison of the group with other metazoans.
The most exhaustive comparison, in terms of number of genes compared, was carried out on the transcriptomes of eight sponges from all four classes, nevertheless with most data from demosponges. [49] Other studies focused on specific gene families or signaling pathways, [50][51][52][53] and were based on both transcriptomic and genomic comparisons, but still tended to confirm key findings (unexpected diversified gene repertoire in sponges but different gene repertoire between sponge classes).
Although a single sponge species is often treated as representative of the whole diverse phylum, because all sponges seem to non-specialists to have a similar level of complexity (although there are well-known differences; see Table 1), each class has a distinct set of developmental genes in both gene presence/absence and number of copies (Figure 2). Even if it is still too early to propose general rules about gene content, a few trends are noticeable. Regardless of the gene family or pathway studied so far, in most cases, glass sponges (Hexactinellida) seem to be the class in which more genes are absent or divergent (note, however, that most studies presently rely on transcriptomes only), while calcareous sponges (Calcarea), and even more so homoscleromorph sponges (Homoscleromorpha), seem to harbor the most complete and conserved gene repertoire relative to bilaterians. [49,54-56] Also, in many cases calcareous sponges are found to have a much higher number of genes in a given gene family. [43,54,57-59] According to our present understanding of relationships among sponges, [11,25] Hexactinellida and Demospongiae are sister groups (Silicea sensu stricto [60,61]) and Homoscleromorpha and Calcarea are sister groups. [60,61] Therefore, the previously cited findings (absence of genes in glass sponges and numerous gene copies in Calcarea) suggest that glass sponges have undergone gene loss and gene divergence while calcareous sponges underwent polyploidization. In addition, the larger bilaterian orthologous gene repertoire found in calcareous and homoscleromorph sponges may lead to a re-evaluation of the ancestral molecular toolkit of sponges. Importantly, this may suggest that many genes putatively present in the last common ancestor of sponges were secondarily lost in Silicea sensu stricto. [33,50,55,62]
This molecular divergence between the four sponge classes (Figure 2) is perhaps to be expected not only because of their morphological, developmental and ecological differences (Table 1), but also because sponge classes probably diverged about 750 Myr ago. [63] In short, it is important to gather and compare additional genomic and transcriptomic data from all four classes to fully explore sponge genome diversity, to understand the genetic underpinnings of the morphological and developmental differences between classes, and to trace back the evolution of gene families since the divergence of Porifera and other animal lineages.

Gene Families Are Conserved in Sponges, With a Few Key Absences
One of the main and unexpected results since transcriptome and genome data were first obtained from sponges is the diversity of gene families present. Their seemingly simple body plan, without digestive cavity, neurons, and muscles, might have suggested a general absence of some of the main metazoan gene families, but despite differences in gene content between sponge classes, that seems not to be the case.
In particular, sponges possess a large part of the molecular toolkit that is widely understood to be instrumental for the development of most other animals: for example, genes encoding members of the four main signaling pathways (Wnt, TGF beta, Hedgehog, and Notch, Figure 2A), genes needed for maintenance of pluripotent stem cells or sex cell differentiation (Figure 2A), and the main transcription factor families (Figure 2A). This means, assuming that sponges are the sister group to all other animals, that most of the bilaterian developmental molecular toolkit could have been present in the last common ancestor of all Metazoa. [49,50,64] Nevertheless, one can also notice (Figure 2) that sponges often have a lower number of gene copies or gene types in a given family compared to cnidarians and bilaterians. This suggests that even if an already diverse gene content was present in the last common ancestor of sponges and other metazoans, numerous duplication and divergence events occurred later on, in the last common ancestor of cnidarians and bilaterians.
The most conspicuously absent genes in these surveys are the well-known Hox genes, which all studies so far have failed to find in any sponge species. Nevertheless, their possible origin through duplication/divergence events was discussed following the discovery of NK gene clusters in A. queenslandica and of a Cdx ParaHox member in calcareous sponges. [33,65,66] Also worth noting is the absence of key members of the three other signaling pathways involved in development (Receptor Tyrosine Kinase [RTK]; Jak/STAT; nuclear hormone receptor), suggesting de novo emergence of genes after the divergence of sponges or drastic losses in sponges.
As described for many bilaterians, paralogous genes sometimes play redundant or similar roles: for example, Raf replacing Mos in oocyte cytostasis in Caenorhabditis elegans, [67] Boule/DazL in vertebrate gametogenesis, [68] or Drosophila septate junctions engaging different complements of proteins. [69] Caution is therefore needed in interpreting gene content, since gene absence does not necessarily imply absence of function and gene presence does not exclude neofunctionalization.

Wnt Signaling
Until recently the Wnt signaling pathway was considered to be instrumental in animal development for the acquisition of multicellularity, body polarity, cell polarity, and in sponges for the patterning of the aquiferous system. [70] The Wnt pathway seems to be absent in glass sponges according to transcriptome analyses (Figure 2A). [49,56,71] According to the presently proposed intra-sponge phylogenetic relationships (see Section 1 and Figure 1), this absence can be interpreted as a secondary loss, because the Wnt pathway is found in all other sponge classes. However, despite not having a canonical Wnt pathway, glass sponges nevertheless have a clear body axis at larval and adult stages, clear polarity of the aquiferous system with water entering through the dermal incurrent pores, passing through canals and chambers and exiting through the osculum. This finding calls for a re-evaluation of the previously proposed roles of the Wnt signaling pathway in animals. It is also worth noting that Hexactinellida currently is the only metazoan taxon, except for some groups of Myxozoans (parasitic cnidarians), in which Wnt is absent from the genome. [72] However, both taxa are unusual: glass sponges have a body plan organized as a vast multinucleated syncytial tissue, [73] and myxozoans are highly reduced parasites of only a few cells. [74] Future functional studies are needed to fully understand the roles of Wnt signaling in sponges.
Figure 2. Synthesis of the most complete transcriptomic and genomic comparative studies performed so far on sponges. Only data where comparison between at least 2 classes was possible were included. The most parsimonious hypotheses concerning presence/absence in the last common ancestor of sponges (LCAS) were provided when possible. A) Key developmental genes. Ligand/receptor of main signaling pathways (blue square): Wnt, Notch, TGFβ seem complete enough in sponges to predict functionality (here, only the ligand/receptor pairs are mentioned). The case of the hedgehog (Hh) pathway is puzzling because the receptor Patched (Ptc) is present and a gene related to Hedgehog (hedgling) is present in sponges but does not have the diagnostic hog domain; a hedgehog gene sensu stricto is thus considered absent here. [49,52,56,71,105,108-112] Hippo and Nitric Oxide pathways were also found to be present in Amphimedon queenslandica and thought to be ancestral. [113,114] Diversity and heterogeneity in Homeobox and bHLH transcription factors (TF) content (red square): at present exhaustive comparisons of their diversity were carried out only between demosponges and calcareous sponges. [50,62,65,115-119] For homeobox genes TF not members of the ANTP (Antennapedia) class are mentioned as "other homeobox" (these are members of the T-box, Sox, Sine, and Smad families). [50] For bHLH TF "others" stands for genes that cannot be assigned to a given group. Ribonucleoproteins involved in the Germline Multipotency Program (GMP) are well conserved in sponges (green square). [59,106,112,120,121] B) Conserved presence of genes involved in epithelial protein complexes: Type IV collagen (blue square) is considered a specific feature of basement membranes (BM). [106,122] Adherens-like junctions (red square) and genes encoding for alpha-catenin, beta-catenin, gamma-catenin, and E-cadherin (together forming the cadherin-catenin complex, CCC, in other animals). [43,84,85,106,123,124] Apical-basal (AB) polarity (green square) and polarity complexes (Par, Scribble, Crumbs). [43,106,125] C) Neurosensory related genes: sponges possess genes encoding various types of receptors (ionotropic/metabotropic glutamate [i/mGluR], GABA, glycine [GLR], triphosphate inositol [IP3R], receptor tyrosine kinases [among which ErbB]), voltage dependent (-v) ion channels, TRP (transient receptor potential) channels (other than TRP-N type); cryptochromes (CRY) and G protein-coupled receptor (GPCR) but not opsins. [30,88,90,105,106,126-132]
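The reasoning that the absence of Wnt in glass sponges is a secondary loss, and more generally the "most parsimonious hypotheses" about the last common ancestor of sponges, can be made explicit with a small parsimony calculation. The sketch below is an illustration only, not the method used in the cited studies: it applies the bottom-up pass of Fitch parsimony to the class-level topology described in the text, treats gains and losses as equally costly, and ignores outgroups. The names `SPONGE_TREE` and `fitch_root_states` are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a tip name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

# Class-level topology described in the text:
# (Hexactinellida + Demospongiae) sister to (Homoscleromorpha + Calcarea).
SPONGE_TREE: Tree = (
    ("Hexactinellida", "Demospongiae"),
    ("Homoscleromorpha", "Calcarea"),
)


def fitch_root_states(tree: Tree, tip_states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Fitch bottom-up pass: return (candidate root states, minimum number of changes).

    ASSUMPTION: gains and losses cost the same, and no outgroup is considered.
    """
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    left_set, left_cost = fitch_root_states(tree[0], tip_states)
    right_set, right_cost = fitch_root_states(tree[1], tip_states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change is needed.
        return shared, left_cost + right_cost
    # The children disagree: keep both possibilities and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# 1 = pathway detected, 0 = not detected (Wnt pattern as described in the text).
wnt = {"Hexactinellida": 0, "Demospongiae": 1, "Homoscleromorpha": 1, "Calcarea": 1}
root_states, changes = fitch_root_states(SPONGE_TREE, wnt)
print(root_states, changes)  # prints: {1} 1
```

With this pattern, the only most-parsimonious root state is presence, with a single change on the glass sponge lineage, which matches the interpretation of a secondary loss. Because most glass sponge data come from transcriptomes, a "0" here may also reflect lack of expression or detection rather than true absence.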

Epithelia
The basal lamina is usually considered a defining feature of epithelial layers. [75] Nevertheless, a distinct basement membrane and basal lamina are absent in different animal groups: Placozoa, [76] some Ctenophora, [77] most Porifera, [78] and most Acoelomorpha. [79] Following the finding of a basal lamina in homoscleromorph sponges and physiological evidence that sponge layers are functionally similar to other epithelia, sponges are now considered to have bona fide epithelia. [78,80-82] Recent transcriptome and genome analyses (Figure 2B) have also challenged the assumption that animals that lack a basal lamina should also lack genes encoding type IV collagen, one of the main constituents of the basal lamina: it is now evident that calcareous sponges, like placozoans and Acoelomorpha, possess genes encoding type IV collagen. [77] The same applies to the Cadherin Catenin Complex (CCC). Adherens junctions (AJs) of Bilateria are characterized by interactions between classical E-cadherin and alpha-, beta-, and gamma-catenins. Adherens-like junctions have so far only been properly identified, using electron microscopy, in calcareous and homoscleromorph sponges (Table 1). Nevertheless, all four sponge classes possess all the genes encoding these four proteins. [43,49,83,84,85] Here again, despite the presence of certain genes in sponge genomes, the absence of the corresponding structures found in other taxa is difficult to explain. Studies are needed to confirm that these proteins can interact as they do in other animals.

Nervous System
One of the main characteristic traits of Porifera, compared to almost all other Metazoa, is the absence of neurons. Other metazoans that lack nerves are either parasitic, like mesozoans and myxozoans, or thought to be morphologically reduced based on all other aspects of their gene complement, as is the case for placozoans. Although the mechanisms involved are not fully understood, it is well known that sponges are able to react to various stimuli. [86] Despite the absence of nerve cells and synapses in sponges, [87] genes encoding proteins involved in synapses and signal transduction (Figure 2C) and genes involved in neuronal patterning in other animals have also been described in sponges. [30,49,88-91] The exact role of these genes in sponges remains to be fully characterized, however.
In summary, although many genes are conserved between sponges and other animals, either they do not share the same roles as in other animals or we need new approaches for studying and understanding the structure and ultrastructure of sponges. Genome and transcriptome inventories are a prerequisite, but are not sufficient to fully understand how protein functions and their interplay with the phenotype evolved among sponges and how this compares to other animals. Future efforts should therefore focus both on acquiring genomes of additional species and, importantly, on developing functional techniques to explore the role of genes in sponges where the function is currently not fully understood.

Taxonomic and Holobiome Issues
Because of the limited number of discrete characters in sponges, species determination can sometimes be difficult, especially because cryptic species or substantial polymorphism between individuals of the same species can occur. [32,92-94] Therefore, one of the first steps before beginning expensive and time-consuming sequencing is firm species identification of the specimens in question, and also keeping a voucher of the sequenced specimen in a public collection to enable subsequent studies on the sequenced material. [95]
Another issue concerns the sponge microbiome. The association of bacteria with sponges has been known since the first electron microscopy observations were carried out, even though their specificity and functional role are still poorly understood today. [96] Nevertheless, recent metagenomic surveys have greatly enhanced our knowledge of the diversity of bacterial and archaeal species associated with sponges. [97] The taxonomic assignment of microbial sequences and their separation from the sponge genome is a critical step in genome analyses. The relative GC content is often used to separate their sequencing reads, but some ambiguities in the evaluation of the number of sponge genes/proteins can persist until a high quality assembly is achieved. [30,98] One of the current approaches is to use long-read technologies (such as PacBio or Oxford Nanopore) in the hope of avoiding difficulties faced with assembling short reads (Illumina). Another problem is high molecular weight DNA extraction, because DNA fragmentation occurs, often for unknown reasons, frequently resulting in fragments <20 kb. One possibility is that this degradation may sometimes be caused by the activity of microbial nucleases, and so methods to remove all or part of the symbionts before extracting DNA should be explored more deeply.
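The GC-based separation mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a decontamination pipeline from the cited studies: the function names, the bin labels, and the 0.5 cutoff are all hypothetical, and the sketch assumes, purely for illustration, that microbial contigs have a higher GC fraction than host contigs. In practice the GC distributions can overlap, which is one source of the ambiguities noted in the text.

```python
from typing import Dict, List


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def split_by_gc(contigs: Dict[str, str], threshold: float = 0.5) -> Dict[str, List[str]]:
    """Assign each contig to a bin by GC fraction.

    ASSUMPTION: the threshold is a hypothetical cutoff; a real value would have
    to be chosen from the observed GC distribution of the assembly in question.
    """
    bins: Dict[str, List[str]] = {"putative_host": [], "putative_microbial": []}
    for name, seq in contigs.items():
        key = "putative_microbial" if gc_fraction(seq) >= threshold else "putative_host"
        bins[key].append(name)
    return bins


# Toy sequences, invented for the example.
toy_contigs = {"contig_1": "ATATATGC", "contig_2": "GGCCGGAT"}
print(split_by_gc(toy_contigs))
```

A single cutoff is the simplest possible design; combining GC content with read coverage or taxonomic assignment of sequences would be a natural extension, but is beyond what this sketch shows.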

Divergence and Annotation Issues
Francis et al. recently noticed that there is considerable divergence of orthologous genes between the two demosponges Amphimedon queenslandica and Tethya wilhelma, despite the fact that they belong to the same subclass of demosponges, Heteroscleromorpha (Figure 1). [30] Indeed, the average identity found between the same genes of different sponges is estimated to be only 57.8%. Moreover, the divergence found between bilaterian and poriferan genes was greater than 48%, [30] which would partly explain why many genes in the three published draft sponge genomes were not annotated. [31,98,99] Although all genomes have lineage-specific genes, under-annotation seems to be a more common problem for non-model animal genomes because of the difficulty that automated annotation procedures (for example by Blast or GO Blast) have with divergent sequences. [99] Significant additional effort may be needed to fully assign a precise identity and function to the many "uncharacterized proteins" identified by automated annotation. As a direct consequence, the reader should be aware that the presence/absence status for genes given here (Figure 2) is based on the literature currently available and may be reevaluated by future in-depth bioinformatic analyses and functional characterization of gene families and protein domains.
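To make the notion of "average identity" concrete, the sketch below computes percent identity for already-aligned ortholog pairs and averages it. It is an assumption-laden illustration: the text does not say how the 57.8% figure was computed, and identity can be defined in several ways. Here columns containing a gap are excluded from the denominator, and the function names and toy sequences are hypothetical.

```python
from typing import Iterable, Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment.

    ASSUMPTION: columns with a gap ('-') in either sequence are skipped; other
    conventions (e.g. dividing by full alignment length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def mean_identity(pairs: Iterable[Tuple[str, str]]) -> float:
    """Unweighted mean of percent identity across aligned ortholog pairs."""
    values = [percent_identity(a, b) for a, b in pairs]
    if not values:
        raise ValueError("no ortholog pairs given")
    return sum(values) / len(values)


# Toy protein alignment, invented for the example: 4 of 5 gap-free columns match.
print(percent_identity("MKT-LV", "MRTALV"))  # prints: 80.0
```

Values in the range reported for sponge orthologs sit in a zone where automated similarity searches become less reliable, which is consistent with the under-annotation problem described above.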

Gaps to be Filled
Presently at least one genome from each of three sponge classes is publicly available (Figure 1). To fill the gap for glass sponges, the annotation of the genome draft of Oopsacas minuta, and sequencing and annotation of Aphrocallistes vastus are in progress and are expected to be published and available in the foreseeable future.
The genome of Ephydatia muelleri, a freshwater haplosclerid, is now underway following a prolonged sequencing project that began more than 6 years ago. Its cosmopolitan distribution in lakes and rivers worldwide, and its resulting availability, make it a useful sponge model to develop at the international scale. In addition, this freshwater species is easy to ship to other labs thanks to its gemmules, a dormant resistant stage, and it is so far the most advanced in terms of functional studies thanks to efficient siRNA experiments. [100] Because of the level of molecular divergence found among demosponges of the same subclass, we strongly encourage the acquisition of at least one transcriptome from each sponge subclass (gaps made obvious by Figure 1). Their comparative analyses should be a prerequisite before beginning any new, expensive and time-consuming genome project.
One of the main difficulties is choosing the species that would be most useful for experimental work and also most widely accessible. No sponge has yet been found that is either cosmopolitan in distribution or easy to ship and grow in any lab, and that also spawns both eggs and sperm for manipulative work. This makes sponges unique among metazoans, since even placozoan strains appear to be easier to culture in labs, although they share the problem of not sexually reproducing in culture. This last feature alone is probably responsible for the dearth of experimental approaches and the intractability of sponges as an experimental system. However, given that so many sponge species are known, it should only be a matter of time before even this obstacle is overcome. Until then, it will still be important to develop a better understanding of the commonalities and differences in morphology, physiology and function of sponges. Working with genomes from a range of species is an asset to understanding many aspects of sponge body plan evolution.

Conclusions
Because sponges do not fit with many classical zoological definitions, [2,82,86] they have long been sidelined from most cellular and molecular studies. This relative neglect has significantly delayed knowledge of sponge biology relative to the state of knowledge of Cnidaria, for example, and without question compared to what is known of the majority of Bilateria. The transcriptomic and genomic analyses performed so far, although still incomplete, have already revealed an important molecular disparity between sponge classes, which emphasizes the need to develop not only one, but ideally four, sponge models for evo-devo, functional, and physiological studies. Another result of these comparative gene surveys is that gene inventories alone fail to explain anatomical and physiological features of sponges and their differences compared with other animals. As a consequence, gene inventories have to be performed in line with the development of functional approaches in order to resolve the present discrepancy between gene presence and/or absence and the assumed corresponding anatomical features.