Prevalence of Borrelia burgdorferi and diversity of its outer surface protein C (ospC) alleles in blacklegged ticks (Ixodes scapularis) in Delaware
Abstract
Characterizing the diversity of genes associated with virulence and transmission of a pathogen across the pathogen's distribution can inform our understanding of host infection risk. Borrelia burgdorferi is a vector-borne bacterium that causes Lyme disease in humans and is common in the United States. The outer surface protein C (ospC) gene of B. burgdorferi exhibits substantial genetic variation across the pathogen's distribution and plays a critical role in virulence and transmission in vertebrate hosts. In fact, B. burgdorferi infections that disseminate across host tissues in humans are associated with only a subset of ospC alleles. Delaware has a high incidence of Lyme disease, but the diversity of ospC in B. burgdorferi in the state has not been evaluated. We used PCR to amplify ospC in B. burgdorferi-infected blacklegged ticks (Ixodes scapularis) from sites statewide and used short-read sequencing to identify ospC alleles. B. burgdorferi prevalence in blacklegged ticks varied across sites, but not significantly so. We identified 15 previously characterized ospC alleles, accounting for nearly all of the expected diversity of alleles across the sites as estimated using the Chao1 index. Nearly 40% of sequenced infections (23/58) had more than one ospC allele present, suggesting mixed-strain infections, and the relative frequencies of alleles in single infections were positively correlated with their relative frequencies in mixed infections. Turnover of ospC alleles was positively related to distance between sites, with closer sites having more similar allele compositions than more distant sites. This suggests a degree of B. burgdorferi dispersal limitation or habitat specialization. OspC alleles known to cause disseminated infections in humans were found at the highest frequencies across sites, corresponding to Delaware's high incidence of Lyme disease.
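The Chao1 index mentioned above estimates total richness (here, the number of ospC alleles) from how many alleles were seen exactly once or exactly twice. The following is a minimal sketch of the bias-corrected form of the estimator. The abstract does not say which variant the authors used, and the allele counts below are hypothetical values chosen only for illustration.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate.

    counts: number of times each allele was observed (one entry per allele).
    """
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)                     # alleles actually observed
    f1 = sum(1 for c in observed if c == 1)   # singletons: seen exactly once
    f2 = sum(1 for c in observed if c == 2)   # doubletons: seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical allele counts (not data from the study).
example_counts = [10, 8, 5, 2, 2, 1, 1, 1]
estimate = chao1(example_counts)
print(f"observed = {len(example_counts)}, Chao1 estimate = {estimate}")
```

When the estimate is close to the observed count, as the abstract reports for the 15 alleles found, few additional alleles are expected to remain unsampled.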
Genetic structure and historic demography of endangered unarmoured threespine stickleback at southern latitudes signals a potential new management approach
Abstract
Habitat loss, flood control infrastructure, and drought have left most of southern California and northern Baja California's native freshwater fish near extinction, including the endangered unarmoured threespine stickleback (Gasterosteus aculeatus williamsoni). This subspecies, an unusual morph lacking the typical lateral bony plates of the G. aculeatus complex, occurs at arid southern latitudes in the eastern Pacific Ocean and survives in only three inland locations. Managers have lacked molecular data to answer basic questions about the ancestry and genetic distinctiveness of unarmoured populations. These data could be used to prioritize conservation efforts. We sampled G. aculeatus from 36 localities and used microsatellites and whole genome data to place unarmoured populations within the broader evolutionary context of G. aculeatus across southern California/northern Baja California. We identified three genetic groups with none consisting solely of unarmoured populations. Unlike G. aculeatus at northern latitudes, where Pleistocene glaciation has produced similar historical demographic profiles across populations, we found markedly different demographics depending on sampling location, with inland unarmoured populations showing steeper population declines and lower heterozygosity compared to low armoured populations in coastal lagoons. One exception involved the only high elevation population in the region, where the demography and alleles of unarmoured fish were similar to low armoured populations near the coast, exposing one of several cases of artificial translocation. Our results suggest that the current “management-by-phenotype” approach, based on lateral plates, is incidentally protecting the most imperilled populations; however, redirecting efforts toward evolutionary units, regardless of phenotype, may more effectively preserve adaptive potential.
Altered costimulatory signals and hypoxia support chromatin landscapes limiting the functional potential of exhausted T cells in cancer
Abstract
Immunotherapy has changed cancer treatment with major clinical successes, but response rates remain low due in part to elevated prevalence of dysfunctional, terminally exhausted T cells. However, the mechanisms promoting progression to terminal exhaustion remain undefined. We profiled the histone modification landscape of tumor-infiltrating CD8 T cells throughout differentiation, finding that terminally exhausted T cells possessed chromatin features limiting their transcriptional potential. Active enhancers enriched for bZIP/AP-1 transcription factor motifs lacked correlated gene expression, which was restored by immunotherapeutic costimulatory signaling. Epigenetic repression was also driven by an increase in histone bivalency, which we linked directly to hypoxia exposure. Our study is the first to profile the precise epigenetic changes during intratumoral differentiation to exhaustion, highlighting that their altered function is driven by both improper costimulatory signals and environmental factors. These data suggest that even terminally exhausted T cells remain poised for transcription in settings of increased costimulatory signaling and reduced hypoxia.
Comprehensive dataset of shotgun metagenomes from oxygen stratified freshwater lakes and ponds
Abstract
Stratified lakes and ponds featuring steep oxygen gradients are significant net sources of greenhouse gases and hotspots in the carbon cycle. Despite their significant biogeochemical roles, the microbial communities, especially in the oxygen depleted compartments, are poorly known. Here, we present a comprehensive dataset including 267 shotgun metagenomes from 41 stratified lakes and ponds mainly located in the boreal and subarctic regions, but also including one tropical reservoir and one temperate lake. For most lakes and ponds, the data includes a vertical sample set spanning from the oxic surface to the anoxic bottom layer. The majority of the samples were collected during the open water period, but also a total of 29 samples were collected from under the ice. In addition to the metagenomic sequences, the dataset includes environmental variables for the samples, such as oxygen, nutrient and organic carbon concentrations. The dataset is ideal for further exploring the microbial taxonomic and functional diversity in freshwater environments and potential climate change impacts on the functioning of these ecosystems.
Quantifying chromosomal instability from intratumoral karyotype diversity using agent-based modeling and Bayesian inference
Abstract
Chromosomal instability (CIN) — persistent chromosome gain or loss through abnormal
karyokinesis — is a hallmark of cancer that drives aneuploidy. Intrinsic chromosome mis-segregation rates, a measure of CIN, can inform prognosis and are a likely biomarker for response to anti-microtubule agents. However, existing methodologies to measure this rate are labor intensive, indirect, and confounded by karyotype selection reducing observable diversity.
We developed a framework to simulate and measure CIN, accounting for karyotype selection,
and recapitulated karyotype-level clonality in simulated populations. We leveraged approximate
Bayesian computation using phylogenetic topology and diversity to infer mis-segregation rates
and karyotype selection from single-cell DNA sequencing data. Experimental validation of this
approach revealed extensive chromosome mis-segregation rates caused by the chemotherapy
paclitaxel (17.5±0.14/division). Extending this approach to clinical samples revealed the inferred rates fell within direct observations of cancer cell lines. This work provides the necessary framework to quantify CIN in human tumors and develop it as a predictive biomarker.
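The general idea of approximate Bayesian computation (ABC) can be illustrated with a rejection sampler: draw a candidate rate from a prior, simulate data under it, and keep the candidate only if a summary of the simulation is close to the observed summary. The sketch below is a deliberately simplified toy, not the authors' framework. It assumes each of 46 chromosomes mis-segregates independently, uses the mean number of mis-segregations per division as the only summary statistic (the paper uses phylogenetic topology and diversity), and ignores karyotype selection. All names, the prior, the tolerance, and the observed value are hypothetical.

```python
import random

N_CHROMOSOMES = 46  # assumption: diploid human chromosome count


def simulate_mean_missegregations(rate, n_divisions, rng):
    """Mean mis-segregated chromosomes per division in the toy model, where
    each chromosome mis-segregates independently with probability `rate`."""
    total = 0
    for _ in range(n_divisions):
        total += sum(1 for _ in range(N_CHROMOSOMES) if rng.random() < rate)
    return total / n_divisions


def abc_rejection(observed, n_draws=1000, n_divisions=30, tolerance=0.3, seed=0):
    """Return the prior draws whose simulated summary statistic falls within
    `tolerance` of the observed summary statistic."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.0, 0.2)  # flat prior on the per-chromosome rate (assumption)
        simulated = simulate_mean_missegregations(rate, n_divisions, rng)
        if abs(simulated - observed) < tolerance:
            accepted.append(rate)
    return accepted


# Hypothetical observation: 2.3 mis-segregations per division,
# which corresponds to a per-chromosome rate of 0.05 in this toy model.
posterior = abc_rejection(observed=2.3)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean rate = {posterior_mean:.3f}")
```

The accepted draws approximate the posterior distribution of the rate. A smaller tolerance gives a sharper approximation at the cost of fewer accepted draws.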
Monodopsis and Vischeria Genomes Shed New Light on the Biology of Eustigmatophyte Algae
Abstract
Members of eustigmatophyte algae, especially Nannochloropsis and Microchloropsis, have been tapped for biofuel production owing to their exceptionally high lipid content. Although extensive genomic, transcriptomic, and synthetic biology toolkits have been made available for Nannochloropsis and Microchloropsis, very little is known about other eustigmatophytes. Here we present three near-chromosomal and gapless genome assemblies of Monodopsis strains C73 and C141 (60 Mb) and Vischeria strain C74 (106 Mb), which are the sister groups to Nannochloropsis and Microchloropsis in the order Eustigmatales. These genomes contain unusually high percentages of simple repeats, ranging from 12% to 21% of the total assembly size. Unlike Nannochloropsis and Microchloropsis, long interspersed nuclear element repeats are abundant in Monodopsis and Vischeria and might constitute the centromeric regions. We found that both mevalonate and nonmevalonate pathways for terpenoid biosynthesis are present in Monodopsis and Vischeria, which is different from Nannochloropsis and Microchloropsis that have only the latter. Our analysis further revealed extensive spliced leader trans-splicing in Monodopsis and Vischeria at 36–61% of genes. Altogether, the high-quality genomes of Monodopsis and Vischeria not only serve as the much-needed outgroups to advance Nannochloropsis and Microchloropsis research, but also shed new light on the biology and evolution of eustigmatophyte algae.
Optimization of Enzymatic Fragmentation is Crucial To Maximize Genome Coverage: A Comparison of Library Preparation Methods for Illumina Sequencing
Abstract
Novel commercial kits for whole genome library preparation for next-generation sequencing on Illumina platforms promise shorter workflows, lower inputs and
cost savings. Time savings are achieved by employing enzymatic DNA fragmentation and by combining end-repair and tailing reactions. Fewer cleanup steps
also allow greater DNA input flexibility (1 ng-1 μg), PCR-free options from 100 ng DNA, and lower price as compared to the well-established sonication and
tagmentation-based DNA library preparation kits.
We compared the performance of four enzymatic fragmentation-based DNA library preparation kits (from New England Biolabs, Roche, Swift Biosciences and
Quantabio) to a tagmentation-based kit (Illumina) using low input DNA amounts (10 ng) and PCR-free reactions with 100 ng DNA. With four technical
replicates of each input amount and kit, we compared the kits' fragmentation sequence-bias as well as performance parameters such as sequence coverage
and the clinically relevant detection of single nucleotide and indel variants. While all kits produced high quality sequence data and demonstrated similar
performance, several enzymatic fragmentation methods produced library insert sizes which deviated from those intended. Libraries with longer insert lengths
performed better in terms of coverage, SNV and indel detection. Lower performance of shorter-insert libraries could be explained by loss of sequence coverage
to overlapping paired-end reads, exacerbated by the preferential sequencing of shorter fragments on Illumina sequencers. We also observed that libraries
prepared with minimal or no PCR performed best with regard to indel detection.
The enzymatic fragmentation-based DNA library preparation kits from NEB, Roche, Swift and Quantabio are good alternatives to the tagmentation-based
Nextera DNA flex kit from Illumina, offering reproducible results using flexible DNA inputs, quick workflows and lower prices. Libraries with insert DNA
fragments longer than the combined length of both reads avoid read overlap and thus produce more informative data, leading to strongly improved
genome coverage and consequently also increased sensitivity and precision of SNP and indel detection. In order to best utilize such enzymatic fragmentation
reagents, researchers should be prepared to invest time to optimize fragmentation conditions for their particular samples.
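The effect of insert length on read overlap comes down to simple arithmetic: a read pair cannot cover more unique bases than the insert contains, so any sequenced bases beyond the insert length are read twice. The sketch below is an illustration under simplifying assumptions (no adapter read-through handling, no quality trimming, error-free reads). The function name is hypothetical, and the 150 bp default matches the paired-end read length used in this study.

```python
def paired_end_yield(insert_len, read_len=150):
    """Return (unique_bases, overlapping_bases) for one read pair.

    insert_len: length of the DNA insert between the adapters, in bp.
    read_len:   length of each of the two paired-end reads, in bp.
    """
    if insert_len <= 0 or read_len <= 0:
        raise ValueError("insert_len and read_len must be positive")
    per_read = min(read_len, insert_len)   # a read cannot extend past the insert
    sequenced = 2 * per_read               # insert bases read by both mates combined
    unique = min(insert_len, sequenced)    # distinct insert positions covered
    overlap = sequenced - unique           # positions covered by both mates
    return unique, overlap


# Compare a short insert with inserts at and beyond twice the read length.
for insert_len in (200, 300, 350):
    unique, overlap = paired_end_yield(insert_len)
    print(f"insert {insert_len} bp: {unique} unique bases, {overlap} overlapping bases")
```

In this simplified model, an insert shorter than 300 bp yields fewer unique bases per pair for the same sequencing effort, which is consistent with the lower coverage reported for shorter-insert libraries.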
During the last decade, Illumina technology has come to dominate short read next generation sequencing, offering cost-effective high precision data for a
wide variety of applications such as whole genome sequencing (WGS), metagenomics and transcriptomics. In medical genetics, WGS is increasingly applied
to identify disease-causing genetic variation (SNPs or structural variants), disease susceptibility, cancer evolution and drug response, among a plethora of
other applications (1–3).
Since the cost of next generation sequencing is still high and often the amount of available DNA from the biological source is limited, methodological efforts
are constantly underway to improve the efficiency of WGS, i.e. extracting the most unique genetic information, with the highest possible quality and coverage,
from a variety of input DNA amounts and qualities, at the lowest cost and shortest hands-on time.
Library preparation is an essential process preceding sequencing itself, and comprises several aspects that affect the efficiency of WGS. It typically involves
the following main steps: fragmentation of the input DNA, end-repair and A-tailing of the DNA fragments, ligation of indexed sequencing adapters and
optional amplification of the ligated products. In addition, one or more cleanup steps are necessary in between steps to purify the DNA reaction products of
reagents from the previous reaction.
The most commonly used methods for fragmentation of genomic DNA are sonication, tagmentation (i.e., transposition of partial adapters into the DNA), and
enzymatic digestion by DNA endonucleases. Prior to the widespread adoption of enzymatic fragmentation, sonication was preferred, as it produces near-random
fragmentation, and fragment length can be adjusted by varying sonication time and strength. However, this requires a DNA sonication instrument and
in some cases also special consumable sonication tubes, adding considerable cost and handling time to the procedure. An alternative to sonication is Illumina's
tagmentation-based Nextera reagents, albeit not allowing PCR-free prep (until recently, in its renamed version Illumina DNA prep). The principle of
tagmentation is the insertion by transposition of partial sequencing adapter sequences in genomic DNA that in effect fragments and adds adapters in a single
step. Subsequently, the adapters are extended to full length by PCR (in the Nextera DNA flex kit) and through an undisclosed PCR-free method in the Illumina
DNA PCR-free kit. The length of the DNA between the transposed adapters is dependent on the size of the beads and the concentration of the transposomes
(transposase loaded with adapters) coating on them (4), which is fixed for the respective kit. The sole possibility to modulate this length is by means of size
selection after library preparation is complete, which may discard a considerable portion of the library.
In recent years many competitor library prep kits that use enzymatic fragmentation have emerged. Offering quick and simple workflows, high flexibility of DNA
input amounts, PCR-free options with approximately 100 ng DNA, and importantly a lower price, these kits offer attractive alternatives. Here, we compare the
performance of several of these kits.
In order to evaluate the performance and sequencing data quality produced with enzymatic fragmentation-based library prep kits, we performed WGS using
four such kits (from New England Biolabs, Quantabio, Swift Biosciences and Roche) and the Nextera DNA flex tagmentation based kit (from Illumina) with 10
and 100 ng DNA inputs, and sequenced them on an Illumina HiSeq X instrument. All of the tested kits reproducibly delivered similar high-quality data in terms
of coverage and precision of single nucleotide variant (SNV) and indel detection. We observed that libraries with DNA insert sizes longer than the combined
length of both sequencing reads performed better than those with shorter inserts. However, there is an optimum insert length, beyond which further
increase in insert length does not augment the sequencing data performance but can reduce clustering efficiency and data yields.
We compared the WGS performance of four enzymatic fragmentation-based library preparation kits: NEBNext Ultra II FS from NEB (hereafter referred to as
NEB), Swift 2S Turbo flexible from Swift Biosciences (hereafter Swift2S), SparQ DNA Frag and Library Prep from Quantabio (Quanta) and KAPA HyperPlus
from Roche (Kapa) with the tagmentation-based Nextera DNA FLEX kit from Illumina (Nextera). In our study design we prepared libraries from 10 ng and 100
ng input DNA amounts, whereby the 100 ng input reactions were PCR-free (or with minimal PCR cycles, where absolutely required; see Table 1 for details). An
entirely PCR-free option was not possible for Nextera and NEB kits, in which PCR is necessary to complete the sequencing adapters (and add indexes). With
four technical replicates for each input amount and kit, we aimed to test the reproducibility and robustness of the kits, with respect to fragment size
distribution and quality of the sequencing data. As input DNA we used genomic DNA from the human fibroblast cell line NA12878 (purchased from Coriell
Institute), which has been well characterized (e.g. for indels and single nucleotide variants) and is often used as a standard control DNA source for genomic studies,
and is therefore also called “genome in a bottle” (5, 6). Library concentrations produced by each replicate are summarized in Additional Table 1. The libraries from 10
and 100 ng DNA inputs, each in four technical replicates, were pooled and sequenced over 20 lanes of HiSeq X flowcells (i.e. four lanes per kit), with 150 bp
paired end reads.
Microbial mitigation of greenhouse gas emissions from boreal lakes
Abstract
The climate change crisis has drawn the attention of both the public and scientific
community to the carbon cycle and particularly to the importance of greenhouse
gases (GHG) carbon dioxide (CO2) and methane (CH4). CO2 has been a key
component of Earth's climate regulation throughout its geological history and is now
the main driver of the current change in climate. CH4 has been responsible for a
quarter of the cumulative radiative forcing observed so far. Recent studies suggest
that lakes could be a major source of both CO2 and CH4. Boreal lakes are of special
interest as they represent 27% of the global lake area, and their production of CO2
and CH4 is expected to increase in the future.
This project aimed to investigate microbial processes with the potential to limit
the emissions of GHGs from boreal lakes. For that purpose, the impact of an increase
in phosphorus (P) concentration in the water on CH4 oxidation under the ice was
investigated as well as the community composition of the methanotrophic guild. We
also looked at the potential importance of chemolithoautotrophic microorganisms in
fixing CO2 in the water column. Using a combination of geochemical analysis,
genomic studies, and in vivo assays, we showed that P amendment has the potential
to increase methane oxidation, possibly limiting the expected increase in CH4
emissions due to anthropogenic fertilization of boreal lakes. We also showed that
methanotrophic community structure in boreal lakes is driven by CH4 concentration
and that alphaproteobacterial methanotrophs might play an important role in
removing CH4 from surface waters. Finally, we showed that dark carbon fixation is
a common trait in boreal lakes and that it seems related to the iron cycle.
ISSRseq: an extensible, low-cost, and efficient method for reduced representation sequencing
Abstract
1. The capability to generate densely sampled single nucleotide polymorphism (SNP) data
is essential in diverse subdisciplines of biology, including crop breeding, pathology,
forensics, forestry, ecology, evolution, and conservation. However, access to the
expensive equipment and bioinformatics infrastructure required for genome-scale
sequencing is still a limiting factor in the developing world and for institutions with
2. Here we present ISSRseq, a PCR-based method for reduced representation of genomic
variation using simple sequence repeats as priming sites to sequence inter-simple
sequence repeat (ISSR) regions. Briefly, ISSR regions are amplified with single primers,
pooled, and used to construct sequencing libraries with a low-cost, efficient commercial
kit, and sequenced on the Illumina platform. We also present a flexible bioinformatic
pipeline that assembles ISSR loci, calls and hard filters variants, outputs data matrices in
common formats, and conducts population analyses using R.
3. Using three angiosperm species as case studies, we demonstrate that ISSRseq is highly
repeatable, necessitates only simple wet-lab skills and commonplace instrumentation, is
flexible in terms of the number of single primers used, is low-cost, and can generate
genomic-scale variant discovery on par with existing RRS methods that require high
sample integrity and concentration.
4. ISSRseq represents a straightforward approach to SNP genotyping in any organism, and
we predict that this method will be particularly useful for those studying population
genomics and phylogeography of non-model organisms. Furthermore, the ease of
ISSRseq relative to other RRS methods should prove useful for those conducting research
in undergraduate and graduate environments, and more broadly by those lacking access
to expensive instrumentation or expertise in bioinformatics.
Complete Genomes of Symbiotic Cyanobacteria Clarify the Evolution of Vanadium-Nitrogenase
Abstract
Plant endosymbiosis with nitrogen-fixing cyanobacteria has independently evolved in diverse plant lineages, offering a unique window to study the evolution and genetics of plant–microbe interaction. However, very few complete genomes exist for plant cyanobionts, and therefore little is known about their genomic and functional diversity. Here, we present four complete genomes of cyanobacteria isolated from bryophytes. Nanopore long-read sequencing allowed us to obtain circular contigs for all the main chromosomes and most of the plasmids. We found that despite having a low 16S rRNA sequence divergence, the four isolates exhibit considerable genome reorganizations and variation in gene content. Furthermore, three of the four isolates possess genes encoding vanadium (V)-nitrogenase (vnf), which is uncommon among diazotrophs and has not been previously reported in plant cyanobionts. In two cases, the vnf genes were found on plasmids, implying possible plasmid-mediated horizontal gene transfers. Comparative genomic analysis of vnf-containing cyanobacteria further identified a conserved gene cluster. Many genes in this cluster have not been functionally characterized and would be promising candidates for future studies to elucidate V-nitrogenase function and regulation.