Marine microbial eukaryotes are found in all major eukaryotic branches of the evolutionary tree. They perform a diverse range of functions in marine ecosystems, including photosynthesis, predation and parasitism. Despite the great abundance of microeukaryotes in the ocean, their importance as absorbers of carbon dioxide and their critical contribution to marine food webs (among many ecological roles), the gene content of these microorganisms is only beginning to be explored because their genomes can be structurally complex and many gigabases in size. This program is intended to increase the research community's baseline of scientific knowledge by creating catalogues of genes that specify how these organisms thrive in diverse marine habitats and how they influence marine ecosystems, biogeochemical cycles and the composition of the atmosphere.

Transcriptome datasets are complex and should be approached with awareness of the following potential features, especially in light of the very deep Illumina sequencing used to generate these data. Although many researchers attempted to provide axenic and uni-algal total RNA extracts for sequencing, not every sample met that standard. At this sequencing depth, very low levels of non-target RNA may occasionally have been detected, and low levels of bacteria may have been present without the knowledge of the laboratory that provided the sample.

1. The identity of the target alga should be verified through 18S rRNA sequence analysis of the rRNA reads included in the transcriptome. This analysis has been performed by NCGR, but you may wish to repeat the analysis using your own protocols. Please contact NCGR for information about their 18S rRNA sequence analysis performed on a given sample.

2. Many cultures were known to be non-axenic, and this is indicated on the sample page. Cultures that were believed to be axenic must still be verified through analysis of the transcriptome data, however.

3. Some cultures were known to contain multiple algal species (e.g. predator-prey experiments), and this is indicated on the sample page. Unanticipated very low levels of non-target algal species may also be present, however, as revealed by the deep sequencing of the transcriptomes (upwards of 2.5 Gb per sample).
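
As one illustration of the checks described above, the sketch below tallies taxon labels assigned to rRNA reads and reports any non-target taxa above a chosen fraction. It is a minimal sketch under stated assumptions, not the analysis performed by NCGR: the function name `summarize_taxa`, the `min_fraction` threshold, and the idea that each read already carries a taxon label from your own classification step are all hypothetical.

```python
from collections import Counter


def summarize_taxa(labels, target, min_fraction=0.001):
    """Summarize taxon labels assigned to rRNA reads (hypothetical helper).

    labels       -- iterable of taxon names, one per classified rRNA read
    target       -- name of the expected target organism
    min_fraction -- report non-target taxa at or above this read fraction
                    (an arbitrary illustrative default, not a recommendation)

    Returns (target_fraction, non_target), where non_target maps each
    reported non-target taxon to its fraction of classified reads.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no classified reads were supplied")

    fractions = {taxon: n / total for taxon, n in counts.items()}
    target_fraction = fractions.get(target, 0.0)
    non_target = {
        taxon: frac
        for taxon, frac in fractions.items()
        if taxon != target and frac >= min_fraction
    }
    return target_fraction, non_target


if __name__ == "__main__":
    # Made-up labels for illustration only; these are not project data.
    example = ["target alga"] * 97 + ["bacteria"] * 2 + ["other alga"]
    target_fraction, non_target = summarize_taxa(example, "target alga")
    print(f"target fraction: {target_fraction:.3f}")
    for taxon, frac in sorted(non_target.items(), key=lambda kv: -kv[1]):
        print(f"non-target {taxon}: {frac:.3f}")
```

A low target fraction would suggest checking the sample's identity (caveat 1), while reported non-target taxa would point to bacterial or algal contamination (caveats 2 and 3). The appropriate threshold depends on your own classification method and sequencing depth.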

We recommend that you analyze the transcriptome datasets with these caveats in mind. We also encourage discussion within the research community about best practices and the sharing of useful bioinformatics approaches to aid the community's analyses.

NCGR, CAMERA, Moore Foundation