Abstract

As ecosystem engineers, seagrasses are angiosperms of paramount ecological importance in shallow shoreline habitats around the globe. Furthermore, the ancestors of independent seagrass lineages have secondarily returned to the sea in separate evolutionary events. Thus, understanding the molecular adaptation of these plants contributes not only to the field of ecology but also to the understanding of parallel evolution. With the use of Dr. Zompo, the first interactive seagrass sequence database, presented here, new insights into molecular adaptation to marine environments can be inferred. The database is based on a total of 14 597 ESTs obtained from two seagrass species, Zostera marina and Posidonia oceanica, which have been processed, assembled and comprehensively annotated. Dr. Zompo provides experimentalists with a broad foundation on which to build experiments and to consider the challenges associated with investigating this class of non-domesticated monocotyledon systems. Our database, based on the Ruby on Rails framework, is rich in features including the retrieval of experimentally determined heat-responsive transcripts, mining for molecular markers (SSRs and SNPs), and weighted keyword searches that allow access to annotation gathered on several levels including Pfam domains, GeneOntology and KEGG pathways. Well-established plant genome sites such as The Arabidopsis Information Resource (TAIR) and the Rice Genome Annotation Project are interfaced by Dr. Zompo. With this project, we have initiated a valuable resource for plant biologists in general and the seagrass community in particular. The database is expected to grow as more data become available in the near future, particularly with the recent initiation of the Zostera genome sequencing project.

The Dr. Zompo database is available at http://drzompo.uni-muenster.de/

Introduction

Seagrasses are underwater flowering plants (angiosperms) belonging to the order Alismatales and comprising about 60 species (Supplementary Figure 1). Around the globe, seagrasses occupy shallow sedimentary shorelines where they operate as ecosystem engineers (1,2), and seagrass meadows are ranked among the most valuable ecosystems on earth, surpassing the economic value of coral reefs or even tropical rain forests (3). Unique ecosystem services rendered by seagrasses include nutrient retention, erosion control, nursery habitat for animals and carbon sequestration (1,4).

Ecosystems founded by and based on seagrasses are rich and valuable, but they are also fragile. In recent years, seagrass populations have been reported to decline significantly as a result of rapid environmental and anthropogenic changes (1,5–8); these changes will become more pronounced as global warming continues (9). Seagrasses are of particular interest not only because they occupy an ecologically unique position, but also because they serve as an extraordinary study system for evolutionary biologists. During the Cretaceous period, ancestors of seagrasses secondarily returned to the sea in at least three independent lineages (10), offering great opportunities for the study of adaptation to a marine habitat and of the principles of parallel evolution in general (11,12).

The most abundant seagrass in the northern hemisphere, Zostera marina L., has recently been selected for a pilot genome sequencing project at the Joint Genome Institute (http://www.jgi.doe.gov/). As the first non-domesticated monocotyledon to be sequenced, this species will be an important addition to the poorly represented monocot branch of angiosperms. Thus, Zostera marina will become a cornerstone of marine molecular ecology research, with complementary data to follow from several other seagrass species such as Zostera noltii and Posidonia oceanica to enable further comparative studies.

With the Zostera genome project under way and a large world-wide community working with seagrasses, it is imperative to provide a central data resource which is easily accessible by ecologists, evolutionary biologists and the like. In this context, we see our database as a nucleus for seagrass data that will be generated at ever increasing speed with the advent of next generation sequencing, aiming to serve the rapidly growing seagrass community.

Dr. Zompo is a first step in creating this valuable community resource while also providing insight into molecular evolution and adaptation. This database serves as a basis for large-scale comparative analysis, a novel approach in contrast to earlier studies, which have focused on single gene loci only (10,13–16). A total of 14 597 ESTs of two seagrass species, Z. marina and P. oceanica, were analyzed. Such comparative analyses help to address the question of how independent seagrass lineages have adapted to putatively similar ecological conditions at the genetic level. Inferences can be drawn from orthologous genes shared between the two species that show significantly different expression levels, and from Pfam-A protein domains whose abundance is significantly skewed towards one species.

Implementation and architecture

The Dr. Zompo architecture consists of a relational database integrated in the Ruby on Rails (RoR) framework (http://www.rubyonrails.org/). RoR is an open source web application framework built on the Ruby programming language, intended for developing database-backed web applications according to the Model-View-Controller pattern. Overall, the RoR framework provides a sound basis for the efficient development of user-friendly and attractive web applications. Our application runs under an Apache web server on a Debian Linux system and integrates a MySQL database, Apache's Lucene search engine library, the BioRuby library, the statistical language R and NCBI BLAST.

Database content

Dr. Zompo was designed to store EST data and the corresponding annotation of multiple seagrass species while supporting inter-species investigations (Figure 1). The framework is scalable, so that the scope of content can be widened to the flowering plant order Alismatales if desired, and it is compatible with forthcoming genome data. The current release is based on 9412 Z. marina ESTs (17) and 5185 P. oceanica ESTs (18) which, after preprocessing, have been assembled into 3387 and 1219 tentative unigenes, respectively. The sequence length of EST reads within each library is summarized in Supplementary Table 1.

Figure 1.

Dr. Zompo: screenshot of numerous ways to access the seagrass EST sequences and their annotation. Browse/Search results view, interactive flash charts of EST statistics, Pfam domain annotation represented as a tag cloud, KEGG pathway annotation with highlighted seagrass genes.

Posidonia EST libraries

A single library was made from samples collected by SCUBA diving in a meadow located in Lacco Ameno, Ischia (Gulf of Naples). Total RNA was extracted from pooled young leaf tissue and meristematic portions of shoots of 10–15 genotypes from each of the two sampling depths (5 and 25 m) using a CTAB method. The cDNA was synthesized by vertis Biotechnologie AG, Freising, Germany. From about 11 000 000 recombinant clones, clones were randomly selected, and their plasmids were isolated (minipreps) and sequenced. Sequencing was performed both at the Stazione Zoologica Anton Dohrn (Naples, Italy) and at the Max Planck Institute of Molecular Genetics (Berlin, Germany).

Zostera EST libraries

Five different samples (experimental conditions and tissue types) were obtained from Schilksee and Maasholm (south-western Baltic Sea, Germany), from 1.6 to 2.5 m depth. Total RNA was extracted from young leaf tissue and the meristematic region of plants. Two libraries represent natural conditions and were collected under average summer and winter conditions, respectively. Two other libraries were collected at the same sites, but after a heat stress treatment in aquaria at the water surface. The purpose of a fifth, redundant library was solely to improve the assembly. Each library contained the pooled RNA of 4–6 genotypes. Library construction was performed with the Creator SMART cDNA library construction kit (Clontech) with directional ligation. Sequencing based on plasmid preps was performed at the Max-Planck-Institute for Limnology, Ploen, Germany.

EST sequence preprocessing, assembly, polymorphisms

In several preprocessing steps, using pregap4 from the Staden package (19) and cross_match from phrap (http://www.phrap.com/), the raw EST reads were processed to clip all uninformative sequence parts, including poor quality regions, vector and adapter fragments, and poly-A/-T tails. Remaining reads shorter than 100 nt were excluded from further processing. The resulting clean reads are publicly available on NCBI (P. oceanica accessions GO34959 to GO349047; Z. marina accessions AM268883 to AM268894, AM408830 to AM408843, AM766003 to AM773228, FC822029 to FC823188). For each species, separately, the clean reads were assembled into unigenes using CAP3 (20) with standard parameters and a minimum of one good read at clip position. We determined putative open reading frames (ORFs), microsatellites and single nucleotide polymorphisms for each of the resulting unigenes (Figure 2). Statistics and references to the tools employed for each of these steps are compiled in Table 1.
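The final filtering step described above can be illustrated with a minimal sketch. It is not the pipeline used for the database, which relied on pregap4 and cross_match; it only shows tail clipping followed by the 100 nt length cutoff stated in the text. The function names and the minimum tail length (`min_run`) are assumptions made for illustration.

```python
import re

MIN_LENGTH = 100  # reads shorter than 100 nt are excluded (cutoff from the text)


def clip_poly_tails(seq: str, min_run: int = 10) -> str:
    """Remove a leading poly-T run and a trailing poly-A run.

    min_run is a hypothetical threshold; the text does not state one.
    """
    seq = re.sub(r"^T{%d,}" % min_run, "", seq)
    seq = re.sub(r"A{%d,}$" % min_run, "", seq)
    return seq


def clean_reads(reads: dict) -> dict:
    """Clip tails, then keep only reads of at least MIN_LENGTH nt."""
    cleaned = {}
    for name, seq in reads.items():
        clipped = clip_poly_tails(seq.upper())
        if len(clipped) >= MIN_LENGTH:
            cleaned[name] = clipped
    return cleaned


# Hypothetical reads, for illustration only.
example = {
    "read_1": "ACGT" * 30 + "A" * 20,   # 120 nt after clipping: kept
    "read_2": "ACGT" * 20 + "A" * 30,   # 80 nt after clipping: dropped
    "read_3": "T" * 15 + "GCGC" * 30,   # 120 nt after clipping: kept
}
kept = clean_reads(example)
```

Clipping before measuring length matters here: a read can exceed 100 nt in raw form and still fall below the cutoff once its uninformative tail is removed.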

Figure 2.

Dr. Zompo: extract from the detailed view of an exemplary contig. Large body of functional annotation (SwissProt, GeneOntology, Pfam, KEGG) plus visual representation of assembled reads and sequence properties (ORF, domains, polymorphisms).

Table 1.

Summarizing statistics on the EST sequence preprocessing steps and sequence analysis for both Zostera marina and Posidonia oceanica EST collections

Statistic | Z. marina | P. oceanica
Number of raw EST sequences | 9412 | 5185
Number of clean sequences after pregap4 | 7876 | 3089
Number of clean sequences after crossmatch | 7416 | 3079
Number of sequences in contigs after EST assembly | 5024 | 2238
Number of singletons after EST assembly | 2392 | 841
Number of contigs after EST assembly | 995 | 378
Number of unigenes (contigs plus singletons) | 3387 | 1219
Number of unigenes with a putative ORF | 3387 | 1219
… based on homology to NCBI nr (BLASTX, E ≤ 1e−5) | 2467 | 963
… without homology but longest polypeptide >20aa | 920 | 256
Number of unigenes containing SSRs | 585 | 346
Total number of SSRs | 730 | 498
Number of unigenes containing potential SNPs | 115 | 18
Total number of potential SNPs | 393 | 436

Putative ORFs were predicted based on homology to known proteins or by identifying the longest stretch of DNA that could be translated into more than 20 amino acids; potential SNPs were determined using QualitySNP (Tang,J., Vosman,B., Voorrips,R.E., van der Linden,C.G., and Leunissen,J.A.M. (2006) QualitySNP: a pipeline for detecting single nucleotide polymorphisms and insertions/deletions in EST data from diploid and polyploid species. BMC Bioinformatics, 7, 438), and SSRs were detected using SSRIT (Temnykh,S., DeClerck,G., Lukashova,A., Lipovich,L., Cartinhour,S. and McCouch,S. (2001) Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): frequency, length variation, transposon associations, and genetic marker potential. Genome Res, 11, 1441–1452).
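The fallback rule for unigenes without homology (the longest stretch that can be translated into more than 20 amino acids) can be sketched as a search for the longest stop-free run of codons in all six reading frames. This is only an illustration under assumptions: the function names are hypothetical, the standard stop codons are assumed, and the tool actually used for the database is not reproduced here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def longest_open_stretch(seq: str, min_codons: int = 21):
    """Return (codons, strand, start, end) of the longest stop-free codon run.

    Coordinates refer to the given strand ("-" means the reverse complement);
    end is exclusive. Returns None if no run reaches min_codons, i.e. if
    nothing translates into more than 20 amino acids.
    """
    seq = seq.upper()
    best = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            # Position just past the last complete codon in this frame.
            last = frame + 3 * ((len(s) - frame) // 3)
            for pos in range(frame, last + 1, 3):
                if pos == last or s[pos:pos + 3] in STOP_CODONS:
                    codons = (pos - run_start) // 3
                    if codons >= min_codons and (best is None or codons > best[0]):
                        best = (codons, strand, run_start, pos)
                    run_start = pos + 3
    return best


# Hypothetical sequences, for illustration only.
toy = "ATG" + "GCT" * 30 + "TAA"
hit = longest_open_stretch(toy)
too_short = longest_open_stretch("ATGAAATAA")
```

Because only stop codons are inspected, no codon table is needed; translating the selected stretch would be a separate step.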

Unigene annotation

All seagrass unigenes were annotated comprehensively (Figure 2). This includes functional annotation on several levels (for annotation coverage see Supplementary Table 2): Seagrass unigenes were queried against (i) the manually curated, high quality SwissProt database (21) and (ii) the GeneOntology database (22) where annotation was assigned based on sequence similarity identified by BLAST with E-value ≤ 1e−5 (23). (iii) Protein domains were identified by comparing the seagrass sequences against hidden Markov models of the high quality Pfam-A domain database (24) using hmmpfam from the HMMER package (http://hmmer.janelia.org/). (iv) Within the unigene datasets, putative participants of molecular interactions and reaction networks were determined through KEGG [(25), KAAS, SBH method]. Moreover, we provide links to the corresponding entries on the main annotation websites: AmiGO, the official tool for searching and browsing the GeneOntology database, the Pfam website to show the summary entry, KEGG's orthology, hierarchy and pathway definitions.

Querying the database

The Dr. Zompo web application allows access to the unigenes from several different directions: (i) The complete set of unigenes can be browsed manually, with almost all the associated properties and characterizations of each sequence listed in one table. (ii) Using BLAST (23), unigenes that show sequence similarity to any query sequence can be identified. In addition, BLAST queries can be performed against the genomic, cDNA and peptide sequence databases of the two most important genetic plant model organisms, Arabidopsis thaliana and Oryza sativa, and the two genome projects are interfaced by directly linking all Arabidopsis and rice gene IDs to the respective resource. (iii) Search queries allow users to find unigenes with certain properties (for instance, the presence of polymorphisms), and full-text searches scan all annotation descriptions and return a list of results scored by relevance. (iv) KEGG annotation is accessible either via a browsable list of all unigenes that have been identified to have a KEGG orthologue, or via the KEGG pathway maps where maps of interest can be viewed. In those maps, proteins that have been associated with one of the seagrass unigenes are highlighted and clickable, linking to the corresponding unigene entry in our database. (v) For browsing through domains, we provide two features to uncover domain annotations of particular interest. First, we use a tag cloud representation in which the observed frequency of each annotated Pfam domain determines the size of its tag (Figure 1). Such a tag cloud provides intuitive and quick navigation through the most abundant domains found in the database. Second, we have generated a table of domains that are significantly over-represented in either of the two seagrass species based on a χ²-test. Each of the listed domains is linked to the Pfam entry in our database, where all associated transcripts and their numbers of reads are summarized.
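To make the domain over-representation test more concrete, the following sketch computes a χ² statistic for a 2×2 contingency table (domain present or absent, in each of the two species). How the counts were actually tabulated for the database is not stated in the text, so the table layout, the absence of a continuity correction and the example counts are all assumptions.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Chi-square test (1 degree of freedom, no continuity correction).

    Assumed layout:  a = species 1 with domain,  b = species 1 without,
                     c = species 2 with domain,  d = species 2 without.
    Returns (chi2, p_value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a row or column is empty; no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, for illustration only.
chi2, p = chi_square_2x2(30, 70, 10, 90)
same_chi2, same_p = chi_square_2x2(10, 90, 10, 90)
```

With small expected counts, an exact test would usually be preferred over the χ² approximation.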

The data interfaces among sequence, functional annotation and domain content allow the user to access the presented data in two complementary ways. First, we utilize interactive plots and tag clouds to allow for intuitive exploration of the data without any a priori research question. This is complemented by the ability to access data very selectively using complex search queries. For instance, if one is interested in candidate photosystem-related genes for heat stress experiments, search queries can be executed to find Z. marina unigenes that show a significant response to heat stress and that were annotated with the keyword ‘photosystem’. Similarly, seagrass unigenes that contain polymorphic molecular markers such as microsatellites or SNPs, and that can therefore be used for population studies, can be identified.
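The two example queries described above amount to combining simple filters over unigene records. The sketch below shows this logic on in-memory records; the record fields, function names and example entries are hypothetical and do not reflect the actual database schema or web interface.

```python
from dataclasses import dataclass


@dataclass
class Unigene:
    # Hypothetical record layout, for illustration only.
    name: str
    species: str
    annotation: str
    heat_responsive: bool = False
    n_ssrs: int = 0
    n_snps: int = 0


def heat_responsive_with_keyword(unigenes, keyword, species="Zostera marina"):
    """Unigenes of one species that are heat-responsive and match a keyword."""
    kw = keyword.lower()
    return [u for u in unigenes
            if u.species == species
            and u.heat_responsive
            and kw in u.annotation.lower()]


def with_markers(unigenes):
    """Unigenes carrying at least one SSR or potential SNP."""
    return [u for u in unigenes if u.n_ssrs > 0 or u.n_snps > 0]


# Hypothetical records, for illustration only.
records = [
    Unigene("unigene_1", "Zostera marina", "Photosystem II subunit", True, 1, 0),
    Unigene("unigene_2", "Zostera marina", "Photosystem I subunit", False, 0, 0),
    Unigene("unigene_3", "Posidonia oceanica", "Photosystem II subunit", True, 0, 2),
    Unigene("unigene_4", "Zostera marina", "Ribosomal protein", True, 0, 0),
]
candidates = heat_responsive_with_keyword(records, "photosystem")
marker_carriers = with_markers(records)
```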

Applications

Other applications of the database include finding orthologous genes that are differentially expressed and inter-species comparison of domain abundance and bias. The results of these two example applications highlight the versatility and usefulness of Dr. Zompo, in particular for ecosystem research with multiple seagrass species.

Currently, the interpretation of these results can be cumbersome, since the available datasets have not been collected under controlled conditions. Accordingly, several variables such as collection depth, collection season, treatment and light conditions differ between the Posidonia and the Zostera dataset (see Methods section). This variability affects the statistical analysis such that observed differences cannot be attributed unequivocally to one variable. Despite these issues, the methods applied allow the identification of significant differences between the two EST datasets. Thus, a list of prime candidate genes was compiled which should be taken into account in further investigations addressing the ecologically important question of how Zostera and Posidonia achieved adaptation to similar habitats. The work presented here demonstrates that significant insight can be gained even with this limited dataset and that the described methods are in place for deeper analysis of the more comprehensive datasets expected in the near future.

Comparing gene expression between orthologs

As an alternative to functional substitutions in the coding sequence, modulation of gene expression is a prominent mechanism of adaptation to environmental changes at the cellular level (26). Tuning gene expression allows for rapid phenotypic changes while maintaining functional robustness of the underlying regulatory modules (27,28). Comparing expression levels of orthologous genes between Posidonia and Zostera provides insight into the global similarity of the transcriptomes, which is a good indication of the degree of independent adaptation by the same mechanisms. Furthermore, differences in gene expression for gene clusters based on functional modules reflect non-parallel, lineage-specific adaptation to similar habitats.

In order to infer the overall similarity of the expression states between the two seagrasses, the transcription levels (i.e. the number of reads) between the orthologs were compared using Pearson correlation. Unigenes with significantly different numbers of reads (outliers) were determined using a χ²-test with P ≤ 0.01. To ensure that observed differences were not induced by the heat stress treatment, all cases where the Zostera unigene was found to be heat-responsive based on statistical evaluation of the read distribution in normal and heat-stressed libraries, as published before (17), were removed. Orthologs between the two seagrass species were determined by using a very conservative reciprocal best BLAST hit method on the protein sequence level. A threshold of E-value ≤ 1e−10 was used to define orthologs that have reciprocal best hits within the whole dataset, with the criteria that the region of similarity must span at least 80 amino acid residues with a minimum of 35% sequence identity.
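The reciprocal best hit criterion with the thresholds given above (E-value ≤ 1e−10, at least 80 aligned residues, at least 35% identity) can be sketched as follows. Several details are assumptions: the best hit is taken to be the one with the lowest E-value, the thresholds are applied after best hit selection, and the `Hit` record, function names and identifiers are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    identity: float  # percent identity over the aligned region
    length: int      # aligned length in amino acid residues


def best_hits(hits):
    """Map each query to its hit with the lowest E-value (assumed criterion)."""
    best = {}
    for h in hits:
        if h.query not in best or h.evalue < best[h.query].evalue:
            best[h.query] = h
    return best


def passes_thresholds(h: Hit) -> bool:
    return h.evalue <= 1e-10 and h.length >= 80 and h.identity >= 35.0


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit and pass the filters."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = []
    for query, hit in best_ab.items():
        back = best_ba.get(hit.subject)
        if (back is not None and back.subject == query
                and passes_thresholds(hit) and passes_thresholds(back)):
            pairs.append((query, hit.subject))
    return sorted(pairs)


# Hypothetical hits, for illustration only.
z_vs_p = [
    Hit("Zoma_A", "Pooc_X", 1e-50, 60.0, 120),
    Hit("Zoma_A", "Pooc_Y", 1e-20, 40.0, 90),
    Hit("Zoma_B", "Pooc_Y", 1e-30, 50.0, 100),
    Hit("Zoma_C", "Pooc_Z", 1e-40, 30.0, 100),  # identity below 35%
]
p_vs_z = [
    Hit("Pooc_X", "Zoma_A", 1e-48, 60.0, 120),
    Hit("Pooc_Y", "Zoma_A", 1e-25, 40.0, 90),
    Hit("Pooc_Z", "Zoma_C", 1e-40, 30.0, 100),
]
orthologs = reciprocal_best_hits(z_vs_p, p_vs_z)
```

In this toy input, only the first pair survives: the second candidate is not reciprocal and the third fails the identity threshold.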

The transcription levels of the 207 identified seagrass orthologs were found to be significantly correlated (Pearson correlation coefficient r² = 0.48, P = 1e−13). We tested whether this could be a consequence of the very low numbers of reads (between 1 and 3) observed for the vast majority of transcripts. When including only orthologous pairs in which both members had a minimum number of reads, the correlation coefficient remained quite stable: correlation of the 79 ortholog pairs with more than one read resulted in r² = 0.54 (P = 3e−7), and the 27 ortholog pairs with more than three reads produced r² = 0.49 (P = 0.009). Therefore, it can be concluded that the overall expression of orthologs in Z. marina and P. oceanica is significantly similar. This is quite remarkable for two reasons: first, their evolutionary distance is large, since the two seagrass lineages probably split before returning to the sea. Second, the Zostera dataset contained two heat stress libraries whereas Posidonia ESTs were exclusively collected under natural conditions.
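The robustness check described above, recomputing the correlation after requiring a minimum number of reads in both members of each pair, can be sketched in a few lines. The read counts below are invented for illustration and the function names are hypothetical; the sketch computes the Pearson coefficient r, from which a squared value follows directly.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def filter_pairs(pairs, min_reads):
    """Keep ortholog pairs in which both members have at least min_reads reads."""
    return [(z, p) for z, p in pairs if z >= min_reads and p >= min_reads]


# Hypothetical (Zostera reads, Posidonia reads) per ortholog pair.
read_pairs = [(1, 1), (2, 3), (4, 5), (10, 12), (1, 6)]

# "More than one read" corresponds to min_reads = 2.
subset = filter_pairs(read_pairs, min_reads=2)
r = pearson_r([z for z, _ in subset], [p for _, p in subset])
r_squared = r ** 2
```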

As deviations from this general observed correlation between ortholog expression levels, we identified 33 orthologs that showed significant differences in transcription levels when comparing the Zostera and the Posidonia dataset (Table 2). Although the libraries are not 100% comparable (see above), those genes are nonetheless prime candidates for further in-depth investigations addressing questions on parallel evolution between the two seagrass lineages.

Table 2.

List of Zostera–Posidonia ortholog pairs that were found to have significantly different expression levels (i.e. number of EST reads)

Zostera marina ID | Posidonia oceanica ID | Expected | Observed | χ² | SwissProt annotation
Zoma_ZMD12058 | Pooc_Contig125 | 24.8 : 16.2 | 1 : 40 | 1129.59 | Oxygen-evolving enhancer protein 2-2, chloroplast precursor
Zoma_Contig902 | Pooc_Contig3 | 24.8 : 16.2 | 10 : 31 | 436.04 | Ribulose bisphosphate carboxylase small chain SSU5B, chloroplast precursor
Zoma_Contig740 | Pooc_Contig287 | 21.1 : 13.9 | 33 : 2 | 281.26 | Photosystem II 10 kDa polypeptide, chloroplast precursor
Zoma_ZME03108 | Pooc_Contig281 | 10.9 : 7.1 | 1 : 17 | 194.94 | Probable glutathione S-transferase GSTU6
Zoma_Contig116 | Pooc_Contig279 | 12.7 : 8.3 | 3 : 18 | 187.59 | Light-regulated protein precursor
Zoma_Contig825 | Pooc_PC010E04 | 11.5 : 7.5 | 18 : 1 | 85.11 | 40S ribosomal protein S11
Zoma_Contig630 | Pooc_PC017D04 | 9.1 : 5.9 | 14 : 1 | 48.8 | Actin-depolymerizing factor 4
Zoma_Contig609 | Pooc_Contig167 | 12.1 : 7.9 | 17 : 3 | 48.4 | 60S ribosomal protein L10
Zoma_Contig14 | Pooc_Contig14 | 23.6 : 15.4 | 28 : 11 | 39.47 | Glyceraldehyde-3-phosphate dehydrogenase, cytosolic
Zoma_Contig680 | Pooc_PC019A01 | 7.9 : 5.2 | 12 : 1 | 34.4 | 40S ribosomal protein S18
Zoma_Contig421 | Pooc_Contig242 | 12.1 : 7.9 | 16 : 4 | 30.72 |
Zoma_Contig263 | Pooc_PC028C07 | 7.3 : 4.8 | 11 : 1 | 28.15 | Peptidyl-prolyl cis-trans isomerase
Zoma_Contig896 | Pooc_Contig170 | 11.5 : 7.5 | 15 : 4 | 24.83 | Translationally-controlled tumor protein homolog
Zoma_ZMC06047 | Pooc_Contig217 | 4.2 : 2.8 | 1 : 6 | 20.84 |
Zoma_Contig8 | Pooc_Contig92 | 10.9 : 7.1 | 14 : 4 | 19.56 | Chlorophyll a-b binding protein 40, chloroplast precursor
Zoma_Contig643 | Pooc_Contig193 | 9.1 : 5.9 | 6 : 9 | 18.73 | Nascent polypeptide-associated complex subunit alpha-like protein 1
Zoma_Contig977 | Pooc_Contig19 | 6.0 : 4.0 | 3 : 7 | 18.49 | Photosystem I reaction center subunit XI, chloroplast precursor
Zoma_Contig679 | Pooc_Contig8 | 6.6 : 4.4 | 4 : 7 | 13.99 | ADP-ribosylation factor 1
Zoma_ZME01071 | Pooc_Contig256 | 3.6 : 2.4 | 1 : 5 | 13.77 | 40S ribosomal protein S19
Zoma_Contig828 | Pooc_PC046B10 | 5.4 : 3.6 | 8 : 1 | 13.15 | ATP synthase subunit d, mitochondrial
Zoma_Contig636 | Pooc_PC027E02 | 5.4 : 3.6 | 8 : 1 | 13.15 |
Zoma_Contig195 | Pooc_PC034B02 | 5.4 : 3.6 | 8 : 1 | 13.15 | Adenosine kinase 2
Zoma_Contig880 | Pooc_Contig271 | 8.5 : 5.5 | 11 : 3 | 12.94 | Inosine-5'-monophosphate dehydrogenase
Zoma_Contig748 | Pooc_Contig278 | 5.4 : 3.6 | 3 : 6 | 11.87 | Peroxiredoxin Q, chloroplast precursor
Zoma_Contig239 | Pooc_PC023A07 | 4.8 : 3.2 | 7 : 1 | 9.4 |
Zoma_Contig69 | Pooc_Contig180 | 7.9 : 5.2 | 10 : 3 | 9.22 | Thioredoxin H-type 1
Zoma_ZMF05139 | Pooc_Contig367 | 3.0 : 2.0 | 1 : 4 | 8.16 | 60S ribosomal protein L18-2
Zoma_ZME04139 | Pooc_Contig4 | 3.0 : 2.0 | 1 : 4 | 8.16 |
Zoma_ZMC13030 | Pooc_Contig83 | 3.0 : 2.0 | 1 : 4 | 8.16 | Pollen-specific protein C13 precursor
Zoma_ZMA09041 | Pooc_Contig264 | 3.0 : 2.0 | 1 : 4 | 8.16 | Nucleoside diphosphate kinase 1
Zoma_Contig913 | Pooc_Contig362 | 4.8 : 3.2 | 3 : 5 | 6.71 | Zeaxanthin epoxidase, chloroplast precursor
Zoma_Contig736 | Pooc_Contig211 | 4.8 : 3.2 | 3 : 5 | 6.71 | Acyl carrier protein 1, chloroplast precursor
Zoma_Contig488 | Pooc_Contig61 | 4.8 : 3.2 | 3 : 5 | 6.71 | 60S ribosomal protein L7-4

Expression levels were compared using a χ²-test with P ≤ 0.01, where the expected number of reads was determined by averaging the overall number of reads in both species considering their different library sizes. All ortholog pairs which contained Zostera unigenes that have been found to be responsive to the heat stress treatment were removed.

Table 2.

List of ZosteraPosidonia ortholog pairs that were found to have significantly different expression levels (i.e. number of EST reads)

Zostera marina IDPosidonia oceanica IDExpectedObservedχ2SwissProt annotation
Zoma_ZMD12058Pooc_Contig12524.8 : 16.21 : 401129.59Oxygen-evolving enhancer protein 2-2,chloroplast precursor
Zoma_Contig902Pooc_Contig324.8 : 16.210 : 31436.04Ribulose bisphosphate carboxylase small chainSSU5B, chloroplast precursor
Zoma_Contig740Pooc_Contig28721.1 : 13.933 : 2281.26Photosystem II 10 kDa polypeptide, chloroplastprecursor
Zoma_ZME03108Pooc_Contig28110.9 : 7.11 : 17194.94Probable glutathione S-transferase GSTU6
Zoma_Contig116Pooc_Contig27912.7 : 8.33 : 18187.59Light-regulated protein precursor
Zoma_Contig825Pooc_PC010E0411.5 : 7.518 : 185.1140S ribosomal protein S11
Zoma_Contig630Pooc_PC017D049.1 : 5.914 : 148.8Actin-depolymerizing factor 4
Zoma_Contig609Pooc_Contig16712.1 : 7.917 : 348.460S ribosomal protein L10
Zoma_Contig14Pooc_Contig1423.6 : 15.428 : 1139.47Glyceraldehyde-3-phosphate dehydrogenase, cytosolic
Zoma_Contig680Pooc_PC019A017.9 : 5.212 : 134.440S ribosomal protein S18
Zoma_Contig421Pooc_Contig24212.1 : 7.916 : 430.72
Zoma_Contig263Pooc_PC028C077.3 : 4.811 : 128.15Peptidyl-prolyl cis-trans isomerase
Zoma_Contig896Pooc_Contig17011.5 : 7.515 : 424.83Translationally-controlled tumor protein homolog
Zoma_ZMC06047Pooc_Contig2174.2 : 2.81 : 620.84
Zoma_Contig8Pooc_Contig9210.9 : 7.114 : 419.56Chlorophyll a-b binding protein 40, chloroplastprecursor
Zoma_Contig643Pooc_Contig1939.1 : 5.96 : 918.73Nascent polypeptide-associated complex subunitalpha-like protein 1
Zoma_Contig977Pooc_Contig196.0 : 4.03 : 718.49Photosystem I reaction center subunit XI, chloroplast precursor
Zoma_Contig679Pooc_Contig86.6 : 4.44 : 713.99ADP-ribosylation factor 1
Zoma_ZME01071Pooc_Contig2563.6 : 2.41 : 513.7740S ribosomal protein S19
Zoma_Contig828Pooc_PC046B105.4 : 3.68 : 113.15ATP synthase subunit d, mitochondrial
Zoma_Contig636Pooc_PC027E025.4 : 3.68 : 113.15
Zoma_Contig195Pooc_PC034B025.4 : 3.68 : 113.15Adenosine kinase 2
Zoma_Contig880Pooc_Contig2718.5 : 5.511 : 312.94Inosine-5'-monophosphate dehydrogenase
Zoma_Contig748Pooc_Contig2785.4 : 3.63 : 611.87Peroxiredoxin Q, chloroplast precursor
Zoma_Contig239Pooc_PC023A074.8 : 3.27 : 19.4
Zoma_Contig69Pooc_Contig1807.9 : 5.210 : 39.22Thioredoxin H-type 1
Zoma_ZMF05139Pooc_Contig3673.0 : 2.01 : 48.1660S ribosomal protein L18-2
Zoma_ZME04139Pooc_Contig43.0 : 2.01 : 48.16
Zoma_ZMC13030Pooc_Contig833.0 : 2.01 : 48.16Pollen-specific protein C13 precursor
Zoma_ZMA09041Pooc_Contig2643.0 : 2.01 : 48.16Nucleoside diphosphate kinase 1
Zoma_Contig913Pooc_Contig3624.8 : 3.23 : 56.71Zeaxanthin epoxidase, chloroplast precursor
Zoma_Contig736Pooc_Contig2114.8 : 3.23 : 56.71Acyl carrier protein 1, chloroplast precursor
Zoma_Contig488Pooc_Contig614.8 : 3.23 : 56.7160S ribosomal protein L7-4

Expression levels were compared using a χ2-test with P ≤ 0.01, where the expected number of reads was derived from the combined number of reads in both species, weighted by their different library sizes. All ortholog pairs containing Zostera unigenes that were found to be responsive to the heat stress treatment were removed.
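
As a minimal sketch of the comparison described in this table note, the expected read counts of an ortholog pair can be obtained by splitting the pair's combined reads in proportion to the two library sizes, and then compared with the observed counts. The function names and the library sizes used in the test call are hypothetical. The plain Pearson form of the statistic is an assumption and is not guaranteed to reproduce the χ2 values listed in the table.

```python
# Critical chi-square value for one degree of freedom at P = 0.01.
CHI2_CRIT_P01_DF1 = 6.635


def expected_reads(obs_a, obs_b, lib_a, lib_b):
    """Split the combined reads of a pair in proportion to the library sizes."""
    total = obs_a + obs_b
    frac_a = lib_a / (lib_a + lib_b)
    return total * frac_a, total * (1.0 - frac_a)


def pearson_chi2(observed, expected):
    """Plain Pearson statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def differs_significantly(obs_a, obs_b, lib_a, lib_b):
    """True if the pair deviates from the library-size expectation at P <= 0.01."""
    expected = expected_reads(obs_a, obs_b, lib_a, lib_b)
    return pearson_chi2((obs_a, obs_b), expected) >= CHI2_CRIT_P01_DF1


if __name__ == "__main__":
    # Hypothetical library sizes, chosen only for illustration.
    print(differs_significantly(1, 40, lib_a=6000, lib_b=4000))
```

Working with a fixed critical value keeps the sketch free of external dependencies; a statistics library would be the natural choice when exact P-values are needed.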

Inter-species comparison of domain abundance

Protein domains are evolutionarily conserved parts of proteins that typically correspond to distinct functional units. By determining the expression level of each domain annotated with Pfam-A in the two seagrass species, the relative frequency of each functional module in the two EST datasets can be assessed. This allows the identification of functional modules that occur at different frequencies in the two sampled transcriptomes of Z. marina and P. oceanica. Since both organisms occupy similar ecological niches, one would expect all domains to occur at similar frequencies across the two datasets. Strongly biased domain frequencies between the two species, in contrast, should be a consequence of one or more external factors. While being aware of the limitations mentioned before, we consider the most likely explanations to be species differences (e.g. dissimilar expansion of gene families) that arose by chance or as a consequence of adaptation to slightly different ecological niches, or differences in the expressed repertoire of genes.

For each species, the expression level of each domain was approximated by summing the number of EST reads associated with that domain. Assuming a uniform distribution, expected frequencies were computed for each domain after correcting for the different sizes of the transcriptomes. Observed and expected numbers of occurrences were then compared in a χ2-test, which yielded 25 cases where the domain abundance differed significantly between the two seagrass species (P ≤ 0.05, FDR corrected, see Supplementary Table 3).
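
The procedure in this paragraph can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used for the database: the input dictionaries and function names are hypothetical, the transcriptome size is taken as the summed domain read counts of each dataset, and the Benjamini-Hochberg procedure stands in for the FDR correction, whose exact method is not named in the text. Both datasets are assumed to be non-empty.

```python
import math
from collections import Counter


def domain_read_counts(unigene_reads, unigene_domains):
    """Sum EST reads over all unigenes that carry a given domain."""
    counts = Counter()
    for unigene, reads in unigene_reads.items():
        # Count each domain once per unigene, even if it occurs repeatedly.
        for domain in set(unigene_domains.get(unigene, ())):
            counts[domain] += reads
    return counts


def chi2_pvalue_df1(stat):
    """Upper-tail P-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def compare_domains(counts_a, counts_b, alpha=0.05):
    """Return (domain, adjusted P) pairs that differ between the two datasets."""
    size_a, size_b = sum(counts_a.values()), sum(counts_b.values())
    frac_a = size_a / (size_a + size_b)  # corrects for dataset size
    domains = sorted(set(counts_a) | set(counts_b))
    pvalues = []
    for domain in domains:
        obs_a, obs_b = counts_a[domain], counts_b[domain]
        total = obs_a + obs_b
        exp_a, exp_b = total * frac_a, total * (1.0 - frac_a)
        stat = (obs_a - exp_a) ** 2 / exp_a + (obs_b - exp_b) ** 2 / exp_b
        pvalues.append(chi2_pvalue_df1(stat))
    adjusted = benjamini_hochberg(pvalues)
    return [(d, q) for d, q in zip(domains, adjusted) if q <= alpha]
```

Counting a domain once per unigene is a design choice of this sketch; counting every domain occurrence instead would give more weight to proteins with repeated domains.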

Six out of 15 domains were found to be over-represented in P. oceanica and are associated with the photosystems and the photosynthetic process in general. Moreover, at least three of the unequally distributed domains are clearly associated with the response to stress and/or pathogens (Harpin-induced protein 1, Glycine rich protein family, Gamma-thionin family), which could indicate that the sampled seagrasses were exposed to distinct sets of external stimuli to which they had to react.

The strongest signal was found for the metallothionein (MT) protein family domain, which is important for protein interactions with heavy metals such as zinc, copper, cadmium or nickel. In animals and fungi, MTs have been found to be involved in the detoxification of heavy metals (29); in plants, initial evidence suggests a role in metal homeostasis (30,31). The precise function of MTs, however, is still to be determined. Since the contig with the highest number of reads in both datasets is similar to known metallothionein proteins (Zoma_Contig761 and Pooc_Contig198), MT function seems to be extremely important for both species. From studies in P. oceanica, MTs are already known to comprise a multi-gene family of at least five members (13), enabling the seagrass to absorb and accumulate metals in its organs and tissues after uptake from sediments (32,33). Moreover, P. oceanica is considered a bioindicator for mercury since it can live at mercury-rich sites (34).

The high frequency of MT domains observed in Posidonia is a direct consequence of the significant similarity of the Pfam-A domain model to the highly expressed contig in Posidonia, but not to that of Zostera. Although both the Zostera and the Posidonia unigene contain an ORF that covers the same region of the protein and is of similar length, only the Posidonia contig matched the metallothionein Pfam-A domain model well (E = 2.6e−43). The Zostera contig Zoma_Contig761 probably contains a divergent variant of this domain, as it can only be found with a relaxed E-value threshold (E = 0.0015). Interestingly, two Posidonia transcripts (Pooc_Contig300 and Pooc_Contig55) that share high sequence similarity with the highly expressed Zoma_Contig761 show almost no sequence similarity to the MT Pfam domain (E = 0.00037). All in all, those unigenes are likely to represent different members of the MT gene family and, according to the samples we have gathered, the two seagrass species have distinct combinations of expression levels within this gene family.

Concluding remarks

In addition to their ecological importance, seagrasses provide extraordinary grounds for evolutionary analysis due to their specific evolutionary history. We have presented results from the analysis of the first EST data obtained for two representative seagrass species of independent lineages, including findings on the evolution of orthologs at the domain level. The analysis was facilitated by Dr. Zompo, which has been designed to manage and explore seagrass ESTs and genomic data from various species. This new database is the first large-scale resource designed specifically to accommodate seagrass sequence data. Such a repository will be of high value in bridging the gap between the genomic resources of model organisms and seagrass species. Furthermore, the database provides experimentalists with a knowledge base upon which to build experiments. For instance, it can be used to find experimentally verified homologs in other plants, or as a basis for targeted primer design for specific genes or molecular markers.

Dr. Zompo is not limited to the seagrass community and will be an important contribution to the general plant research community. This resource is expected to grow alongside the ecological and global warming-related research that is gaining momentum (35). The database will be regularly updated and extended by adding more species and more comprehensive and detailed annotations as they become available. Comprehensive collections of transcriptome data for the two sister species Z. marina and Z. noltii, obtained with 454 next-generation sequencing technology, will soon be gathered. Moreover, the full genomes of Z. marina and Spirodela polyrhiza (Great Duckweed) are expected by the end of 2009 (Joint Genome Institute, http://www.jgi.doe.gov/). Such extensions will increase the usefulness of this data repository beyond its current status.

Acknowledgements

We thank the FP6 NoE Marine Genomics Europe (EC contract reference: GOCE-CT-2004-505403) for supporting part of the EST sequencing through two small grants at the Technological Platform MPI in Berlin-Dahlem. We also thank the Molecular Biology Service at SZN, Italy, for generating part of the sequences and for providing technical support throughout the project. E.D. is a PhD student (tutor Prof. A. Innocenti) of the Dipartimento di Ecologia, Università della Calabria, Rende (CS), Italy. T.B.H.R. acknowledges funding through the DFG (Deutsche Forschungsgemeinschaft, Re 1108/9 AQUASHIFT). E.B.B. and A.D.M. acknowledge support by the DFG through grant BO2544/4-1.

Conflict of interest. None declared.

References

1. Duarte CM. The future of seagrass meadows. Environ. Conserv. 2002, vol. 29 (pg. 192-206).
2. Williams SL, Heck K Jr. Marine Community Ecology. 2000. Sinauer, Sunderland, MA.
3. Costanza R, d'Arge R, de Groot R, et al. The value of the world's ecosystem services and natural capital. Nature 1997, vol. 387 (pg. 253-260).
4. Micheli F, Peterson CH. Estuarine vegetated habitats as corridors for predator movements. Conserv. Biol. 1999, vol. 13 (pg. 869-881).
5. Short FT, Wyllie-Echeverria S. Natural and human-induced disturbance of seagrasses. Environ. Conserv. 1996, vol. 23 (pg. 17-27).
6. Duarte CM. Seagrass ecology at the turn of the millennium: challenges for the new century. Aquat. Bot. 1999, vol. 65 (pg. 7-20).
7. Waycott M, Duarte CM, Carruthers TJB, et al. Accelerating loss of seagrasses across the globe threatens coastal ecosystems. Proc. Natl Acad. Sci. USA 2009. doi:10.1073/pnas.0905620106
8. Orth RJ, Carruthers TJB, Dennison WC, et al. A global crisis for seagrass ecosystems. BioScience 2006, vol. 56 (pg. 987-996).
9. IPCC. Fourth Assessment Report: Climate Change 2007. 2007.
10. Les DH, Cleland MA, Waycott M. Phylogenetic studies in Alismatidae, II: evolution of marine angiosperms (seagrasses) and hydrophily. Syst. Botany 1997, vol. 22 (pg. 443-463).
11. Doolittle RF. Convergent evolution: the need to be explicit. Trends Biochem. Sci. 1994, vol. 19 (pg. 15-18).
12. Wood TE, Burke JM, Rieseberg LH. Parallel genotypic adaptation: when evolution repeats itself. Genetica 2005, vol. 123 (pg. 157-170).
13. Giordani T, Natali L, Maserti BE, et al. Characterization and expression of DNA sequences encoding putative type-II metallothioneins in the seagrass Posidonia oceanica. Plant Physiol. 2000, vol. 123 (pg. 1571-1582).
14. Procaccini G, Mazzella L, Alberte RS, Les DH. Chloroplast tRNALeu (UAA) intron sequences provide phylogenetic resolution of seagrass relationships. Aquat. Bot. 1999, vol. 62 (pg. 269-283).
15. Kato Y, Aioi K, Omori Y, et al. Phylogenetic analyses of Zostera species based on rbcL and matK nucleotide sequences: implications for the origin and diversification of seagrasses in Japanese waters. Genes Genet. Syst. 2003, vol. 78 (pg. 329-342).
16. Adhitya A, Thomas FIM, Ward BB. Diversity of assimilatory nitrate reductase genes from plankton and epiphytes associated with a seagrass bed. Microb. Ecol. 2007, vol. 54 (pg. 587-597).
17. Reusch TBH, Veron AS, Preuss C, et al. Comparative analysis of expressed sequence tag (EST) libraries in the seagrass Zostera marina subjected to temperature stress. Mar. Biotechnol. 2008, vol. 10 (pg. 297-309).
18. Migliaccio M, Cavallini A, Natali L, Procaccini G. New genomic approaches on the seagrass Posidonia oceanica (L.) Delile. Biol. Marina Mediterranea 2006, vol. 13 (pg. 64-67).
19. Staden R, Beal KF, Bonfield JK. The Staden package, 1998. Methods Mol. Biol. 2000, vol. 132 (pg. 115-130).
20. Huang X, Madan A. CAP3: a DNA sequence assembly program. Genome Res. 1999, vol. 9 (pg. 868-877).
21. Boeckmann B, Bairoch A, Apweiler R, et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 2003, vol. 31 (pg. 365-370).
22. Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000, vol. 25 (pg. 25-29).
23. Altschul SF, Madden TL, Schäffer AA, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, vol. 25 (pg. 3389-3402).
24. Finn RD, Tate J, Mistry J, et al. The Pfam protein families database. Nucleic Acids Res. 2008, vol. 36, Database issue (pg. D281-D288).
25. Mao X, Cai T, Olyarchuk JG, Wei L. Automated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary. Bioinformatics 2005, vol. 21 (pg. 3787-3793).
26. López-Maury L, Marguerat S, Bähler J. Tuning gene expression to changing environments: from rapid responses to evolutionary adaptation. Nat. Rev. Genet. 2008, vol. 9 (pg. 583-593).
27. Carroll SB, Grenier JK, Weatherbee SD. From DNA to Diversity. 2001. Blackwell Science, Malden, MA, USA.
28. Wagner A. Robustness and Evolvability in Living Systems. 2005. Princeton Studies in Complexity. Princeton University Press, Princeton, NJ, USA.
29. Robinson NJ, Tommey AM, Kuske C, Jackson PJ. Plant metallothioneins. Biochem. J. 1993, vol. 295, Pt 1 (pg. 1-10).
30. Murphy A, Taiz L. Comparison of metallothionein gene expression and nonprotein thiols in ten Arabidopsis ecotypes. Correlation with copper tolerance. Plant Physiol. 1995, vol. 109 (pg. 945-954).
31. Murphy A, Zhou J, Goldsbrough PB, Taiz L. Purification and immunological identification of metallothioneins 1 and 2 from Arabidopsis thaliana. Plant Physiol. 1997, vol. 113 (pg. 1293-1301).
32. Schlacher-Hoenlinger MA, Schlacher TA. Accumulation, contamination, and seasonal variability of trace metals in the coastal zone: patterns in a seagrass meadow from the Mediterranean. Marine Biol. 1998, vol. 131 (pg. 401-410).
33. Warnau M, Fowler SW, Teyssié J-L. Biokinetics of selected heavy metals and radionuclides in two marine macrophytes: the seagrass Posidonia oceanica and the alga Caulerpa taxifolia. Mar. Environ. Res. 1996, vol. 41 (pg. 343-362).
34. Pergent-Martini C. Posidonia oceanica: a biological indicator of past and present mercury contamination in the Mediterranean sea. Mar. Environ. Res. 1998, vol. 45 (pg. 101-111).
35. Procaccini G, Olsen JL, Reusch TBH. Contribution of genetics and genomics to seagrass biology and conservation. J. Exp. Mar. Biol. Ecol. 2007, vol. 350 (pg. 234-259).

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Supplementary data