Primers:
- 939F – 5′ TTG ACG GGG GCC CGC ACA AG-3′
- 1492R – 5′-GTT TAC CTT GTT ACG ACT T-3′
GGTTACCTTGTTAGACTTGTGCGGGCCCCCGTCAA over rep seq:
CGGTGGAGCATGTGGTTTAATTCGACGCAACGCGAAGAACCTTACCTGGG
CGGTCTTGCGACCGTACTCCCCAGGCGGAGTGTTTCACGCGTTAGCTGGG
GATGAACGCTGGCGGCATGCCTAACACATGCAAGTCGGACGGGAAGTGGT
fwd TTGACGGGGGCCCGCACAAG - 3889280 of 4115088
rev GTTTACCTTGTTACGACTT 1302 of 4115088 GGTTACCTTGTTACGACTT 3785444
GGCTACCTTGTTACGACTT 6109
Expected Insert Length:
- 554 bp
Raw Fastqs are in folder 0_RawFastqs/RawFastqs_nameformatted
Names are formatted so that the identifying sample label appears
before the first underscore. This makes the downstream auto fasta
header relabeling seemless and ensures that when the sequences
across samples are combined into 1 file, each sequence is properly
labeled in the headers.
For this project, sample ids were numeric values [1,316], so again
for simplifying downstream processing, I renamed each raw fastqs
file to have the prefix ‘hfd12wk’ and set all numberic values to
have 3 digits.
Names are formatted so that the identifying sample label appears
before the first underscore. This makes the downstream auto fasta
header relabeling seemless and ensures that when the sequences
across samples are combined into 1 file, each sequence is properly
labeled in the headers.
For this project, sample ids were numeric values [1,316], so again
for simplifying downstream processing, I renamed each raw fastqs
file to have the prefix ‘hfd12wk’ and set all numberic values to
have 3 digits.
phyloseq-class experiment-level object otu_table() OTU Table: [ 1090 taxa and 245 samples ] sample_data() Sample Data: [ 245 samples by 11 sample variables ] tax_table() Taxonomy Table: [ 1090 taxa by 7 taxonomic ranks ]
USEARCH version
usearch11.0.667_i86linux64
Ref db: rdp_16s_v19
Seqs per sample | Full 16S OTU table |
---|---|
min | 226 |
max | 32,036 |
mean | 20,585 |
sd | 3,467 |
total | 5,043,233 |
## [1] "Bacteria" "uncl_Domain" "Eukaryota"
Domain | reads_assigned | percentage_all_reads |
---|---|---|
Bacteria | 5,042,647 | 99.99% |
uncl_Domain | 437 | 0.01% |
Eukaryota | 149 | 0.00% |
For now, only continuing with reads assigned at >= 80% confidence to
domain = Bacteria or Archaea
Domain-level:
Remove anything unclassified at Domain-level (437 reads; 69 OTUs)
Remove Domain = Eukaryota (149 reads; 1 OTUs)
Phylum-level:
Remove Phylum = “uncl_Bacteria” (12,991 reads; 357 OTUs)
Remove Phylum = “uncl_Archaea” (0 reads; 0 OTUs)
Look at Phylum “Cyanobacteriota”
Class-level:
Remove Class = “Chloroplast” (122 reads; 4 OTUs)
Remove Class = “uncl_Cyanobacteria/Chloroplast” (0 reads; 0 OTUs)
Family-level:
Remaining data:
- 5,029,534
total reads
- 20,528.71
average reads/sample
- 659 OTUs
- 245 samples
At this point, the unrarefied 12wk 16S dataset with unwanted taxa
prunned has 659
OTUs and a total of
5,029,534
reads.
32
singleton OTUs were present and removed (corresponding to
32
reads)
627 OTUs
remain
5,029,502
total reads remain
average:
20,528.58
sd:
3,460.712
par(mar = c(5.1, 4.1, 5.1, 2.1)) # c(bottom, left, top, right)
dev.copy(png,'/Users/L347123/Desktop/hfd-microbiome/docs/figures/rarefaction/Rarecurve_12wk_fullrange.png',width = 650)
plot_rarecurves((phylo2vegan_OTU(hfd_12wk_unrare_min2)), step = 1000,xlim=c(0,max(sample_sums(hfd_12wk_unrare_min2)*1.1)),ylim=c(0,350),
title(main="Rarefaction Curves for all 12wk samples\nDomain=Bacteria or Archaea, taxa prunned", adj=0),
label=T,cex.main=1.2)
dev.off()
Rarefaction threshold of 13,006 sequences per sample -> we’d lose:
Of note, I am rarefying via subsampling WITHOUT replacement and
removing OTUs no longer in dataset
Indeed:
- 2 samples removed because they contained fewer reads than
sample.size
.
- 3 OTUs were removed because they are no longer present in any sample
after random subsampling
624 OTUs
remain
3,160,458
total reads remain
average:
13,006
sd:
0