potuhanie

Bacterial gene regulation pdf free: mechanisms and evolution of transcription initiation



A specialized machine learning procedure for TSS recognition allowed us to map 15,923 TSSs: 14,360 in free-living bacteria, 4329 in symbiosis with soybean and 2766 in both conditions. Further, we provide proteomic evidence for 4090 proteins, among them 107 proteins corresponding to new genes and 178 proteins with N-termini different from the existing annotation (72 and 109 of them with TSS support, respectively). Guided by proteomics evidence, previously identified TSSs and TSSs experimentally validated here, we assign a score threshold to flag 14 % of the mapped TSSs as a class of lower confidence. However, this lower-confidence class still contains valid TSSs of low-abundance transcripts. Moreover, we developed a de novo algorithm to identify promoter motifs upstream of mapped TSSs, which is publicly available, and found motifs mainly used in symbiosis (similar to RpoN-dependent promoters) or under both conditions (similar to RpoD-dependent promoters). Mapped TSSs and putative promoters, proteomic evidence and updated gene annotation were combined into an annotation file.


The genome-wide TSS and promoter maps along with the extended genome annotation of B. japonicum represent a valuable resource for future systems biology studies and for detailed analyses of individual non-coding transcripts and ORFs. Our data will also provide new insights into bacterial gene regulation during the agriculturally important symbiosis between rhizobia and legumes.








The aim of this study was to generate dRNA-seq data of B. japonicum USDA 110 grown free-living or in symbiosis with soybean to be used for genome-wide mapping of TSSs and promoters, and for identification of new genes. To perform global mapping of TSSs, we developed a TSS-identification tool that uses machine-learning approaches to propagate expert knowledge initially applied to a subset of the data. For identification of new protein-coding genes, we used a proteogenomics approach. Furthermore, we used our condition-specific TSS map to predict and map promoters by a new algorithm, which is publicly available. Finally, we provide an updated and extended genome annotation with mapped promoters, TSSs and terminators in the Generic Feature Format version 3 (gff) and the GenBank sequence format (gbk). We expect that these data will serve as a useful resource both for detailed analysis of specific genes and for systems biology studies of the symbiosis between rhizobia and legumes, as well as for future annotations of bacterial genomes.
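To make the annotation format concrete, the following is a minimal sketch of how a single mapped TSS could be written as a GFF3 record (nine tab-separated columns). The TSS identifier and position come from the text; the sequence ID `chromosome`, the source label `dRNA-seq`, the strand and the helper name `gff3_line` are placeholders assumed for illustration and are not taken from the published annotation files.

```python
def gff3_line(seqid, source, ftype, start, end, strand, attributes,
              score=".", phase="."):
    """Format one GFF3 feature line (9 tab-separated columns).

    Attribute values are written as-is; real GFF3 requires URL-escaping
    of characters such as ';', '=', and tabs, which this sketch omits.
    """
    attr_field = ";".join(f"{key}={value}" for key, value in attributes.items())
    columns = [seqid, source, ftype, str(start), str(end),
               score, strand, phase, attr_field]
    return "\t".join(columns)


# Position and ID are quoted from the text; everything else is hypothetical.
line = gff3_line(
    seqid="chromosome",          # placeholder sequence name
    source="dRNA-seq",           # placeholder source label
    ftype="TSS",                 # a TSS is a single-nucleotide feature
    start=1_928_416,
    end=1_928_416,
    strand="+",                  # assumed strand, for illustration only
    attributes={"ID": "Bja_TSS_3777"},
)
print(line)
```

A TSS occupies one position, so start and end are equal. Score and phase are left as `.` because the scoring scheme of the published files is not described here.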


To assess the reliability of the SVM-based TSS mapping, we compared our data to previously published results. Our data matched 35 out of 38 previously determined TSSs of genes expressed under symbiotic conditions or in free-living cells, i.e., under the conditions investigated in this study (Additional file 3: Table S4). Well-known examples are genes blr1769 (nifH encoding the dinitrogenase reductase) and blr1759 (nifB encoding a nitrogenase cofactor biosynthesis protein). As expected, transcripts of these genes were detected only in bacteroids and the respective TSSs Bja_TSS_3777 and Bja_TSS_3758 were mapped at previously determined genomic positions 1,928,416 and 1,921,754 [36, 37]. Known TSSs induced under conditions not relevant to our study either did not pass our stringent filtering criteria (e.g., TSS T2 of the heat shock sigma factor gene rpoH2 at genomic position 8,074,642, used predominantly at high temperature [30]), or were scored but had low peaks consistent with low expression of the corresponding genes (ecfQ, ecfF, bsl1652; [13, 28, 38]). These and additional examples summarized in Additional file 3: Table S4 demonstrate the quality of TSS mapping based on dRNA-seq and machine learning.
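The comparison against published TSSs can be sketched as a nearest-position lookup. The three genomic positions below are the ones quoted in the text. The 2-nt tolerance, the function name `nearest_within` and the decision to ignore strand are assumptions made for this illustration, not the criteria used in the study.

```python
from bisect import bisect_left


def nearest_within(sorted_positions, target, tolerance):
    """Return the mapped position closest to target if it lies within
    tolerance nucleotides, otherwise None. Input must be sorted."""
    i = bisect_left(sorted_positions, target)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_positions[max(i - 1, 0): i + 1]
    best = min(candidates, key=lambda p: abs(p - target), default=None)
    if best is not None and abs(best - target) <= tolerance:
        return best
    return None


# Previously determined TSS positions quoted in the text.
known = {
    "nifH (blr1769)": 1_928_416,
    "nifB (blr1759)": 1_921_754,
    "rpoH2 T2": 8_074_642,   # reported as not passing the filtering criteria
}

# Mapped positions named in the text (Bja_TSS_3758 and Bja_TSS_3777).
mapped = sorted([1_921_754, 1_928_416])

matches = {name: nearest_within(mapped, pos, tolerance=2)
           for name, pos in known.items()}
recovered = sum(hit is not None for hit in matches.values())
print(matches, recovered)
```

A real comparison would also need to check strand and would run over the full TSS map rather than the two positions used here.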


We used the dRNA-seq analysis to compare the primary transcriptome of free-living B. japonicum to that of bacteroids in soybean root nodules. Of 15,923 TSSs identified in this study, 14,360 were detected in Free and 4329 in Nod, with 2766 being detected under both conditions (Fig. 2a). This is in agreement with previous transcriptomics data [13] showing that a much lower number of genes (2780) were expressed during symbiosis compared to free-living conditions (5439 genes) and can be explained by the non-dividing and thus transcriptionally less active state of nitrogen-fixing bacteroids [50, 51]. The data also indicate one advantage of a dRNA-seq approach: due to the ability to directly map reads against two reference genomes, more transcripts were identified in symbiosis by dRNA-seq compared to the hybridization-based microarray analysis [13], where these signals cannot be separated in a similar manner.
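The three counts are consistent with the reported total by inclusion-exclusion, as this short check shows. The variable names are chosen here for illustration.

```python
# Counts reported in the text.
free = 14_360   # TSSs detected in free-living cells (Free)
nod = 4_329     # TSSs detected in nodules (Nod)
both = 2_766    # TSSs detected under both conditions

# Size of the union: shared TSSs would otherwise be counted twice.
total = free + nod - both
free_only = free - both
nod_only = nod - both

print(total, free_only, nod_only)  # 15923 11594 1563
```

So 11,594 TSSs were specific to free-living cells and 1563 to nodules, which together with the 2766 shared TSSs give the 15,923 reported.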


To explore additional evidence for translation of transcripts with TSSs identified here, we re-analyzed existing proteomics data of B. japonicum USDA 110 grown under free-living conditions in rich PSY medium or in minimal medium [56], and in symbiosis with soybean (G. max) [15], cowpea (Vigna unguiculata) or siratro (Macroptilium atropurpureum) [57]. For this, we devised a novel variant of a proteogenomics approach that relies on generating an extended protein search database guided by the TSS evidence for (i) ORFs missed in the original RefSeq annotation, including short ORFs which are typically under-represented in genome annotations [58], here taken from the ISGA annotation (see Additional files 5, 6 and 7), (ii) ORFs that are longer or shorter compared to the RefSeq annotation, and (iii) evidence for proteins encoded by transcripts originating from an iTSS.


We analyzed the primary transcriptome of the soybean symbiont B. japonicum USDA 110 grown under free-living and symbiotic conditions, and provide the first genome-wide TSS and promoter maps for this bacterium. TSS recognition was performed with a specialized tool based on machine learning which enabled fast and sensitive global mapping of 14,360 TSSs in free-living bacteria and 4329 TSSs in bacteroids within the large B. japonicum genome. The TSS map served as a basis for de novo prediction of promoter motifs with similarity to RpoD- and RpoN-dependent promoters by a new algorithm. The algorithm is publicly available and will be useful for de novo prediction of bacterial promoters. Combining the global TSS map with a proteogenomics approach proved to be a powerful solution and led to an extension of the repertoire of protein-coding genes, providing evidence for 107 new proteins and identifying different N-termini for 178 proteins compared to the existing annotation. The score distribution of previously mapped TSSs, TSSs validated in this study and TSSs of new protein genes allowed us to define a score threshold that flags a lower confidence class of TSSs. This lower confidence class contains some functional TSSs of weakly expressed genes. Mapped TSSs and promoters were included in re-annotation files along with the proteomics evidence and predicted terminators and operons. Our updated and extended annotation is a valuable resource both for future systems biology studies and for in-depth analyses of specific genes and their regulation in B. japonicum and related bacteria.


Existing proteomics data of B. japonicum USDA 110 grown under free-living conditions (rich PSY medium and minimal medium [56]) and in symbiosis with soybean (G. max [15]), cowpea (V. unguiculata) or siratro (M. atropurpureum) [57] were re-analyzed as follows: fragment ion mass spectra were searched with MS-GF+ (MS-GFDB v9979, [81]) against a protein database containing sequences of 8317 B. japonicum USDA 110 proteins, 2857 shorter ORFs and 194 longer ORFs, 1391 newly predicted ORFs, 5894 protein sequences generated by in-silico translation starting from 593 iTSS with strong dRNA-seq evidence (up to 200 nt downstream), and 256 common contaminants (e.g., human keratin, trypsin). In total, the protein database contained 18,909 protein sequences. Spectra were searched for a match to fully-tryptic and semi-tryptic peptides with a mass tolerance of 25 ppm. Carbamidomethylation was set as fixed modification for all cysteines, while oxidation of methionines was considered as optional modification. Based on the target-decoy search strategy, a stringent score cutoff was determined that resulted in an estimated FDR of 0.1 % at the peptide spectrum match (PSM) level. PSMs above this cutoff were subjected to a PeptideClassifier analysis [82] and only peptides that unambiguously identify one protein, or that imply a longer or shorter form of an annotated protein (extending the concept of Gerster et al. [83]), were considered. We furthermore required at least 3 independent spectra for a protein identification as described [84], which resulted in a total of 4090 identified protein groups at an estimated protein level FDR below 1 % (0.9 %).
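Two of the quantities above lend themselves to a small illustration: the 25 ppm mass tolerance and the target-decoy score cutoff. The sketch below assumes the simple estimator FDR = decoy hits / target hits at or above a cutoff, with higher scores meaning better matches. The text does not say which estimator or score was used, and the function names and example PSMs are hypothetical.

```python
from itertools import groupby


def within_ppm(observed_mass, theoretical_mass, tol_ppm=25.0):
    """True if the relative mass error is at most tol_ppm parts per million."""
    error_ppm = abs(observed_mass - theoretical_mass) / theoretical_mass * 1e6
    return error_ppm <= tol_ppm


def score_cutoff_for_fdr(psms, max_fdr):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs; higher score = better match.
    Returns None if no cutoff satisfies the requested FDR.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    best_cutoff = None
    # Group equal scores so a cutoff always includes every PSM at that score.
    for score, group in groupby(ranked, key=lambda psm: psm[0]):
        for _, is_decoy in group:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Hypothetical toy PSM list: (score, is_decoy).
toy_psms = [(10, False), (9, False), (8, False), (7, True),
            (6, False), (5, False), (4, True)]
print(score_cutoff_for_fdr(toy_psms, max_fdr=0.25))
print(within_ppm(1000.02, 1000.0))
```

With real data, the 0.1 % PSM-level FDR reported above would correspond to `max_fdr=0.001`. The component counts listed in the text also add up as stated: 8317 + 2857 + 194 + 1391 + 5894 + 256 = 18,909.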


Cell-free gene expression (CFE) emerged as an alternative approach to living cells for specific applications in protein synthesis and labelling for structural biology and proteomics studies. CFE has since been repurposed as a versatile technology for synthetic biology and bioengineering. However, taking full advantage of this technology requires in-depth understanding of its fundamental workflow beyond existing protocols. This Primer provides new practitioners with a comprehensive, detailed and actionable guide to best practices in CFE, to inform research in the laboratory at the state of the art. We focus on Escherichia coli-based CFE systems, which remain the primary platform for efficient CFE. Producing proteins, biomanufacturing therapeutics, developing sensors and prototyping genetic circuits illustrate the broader utility and opportunities provided by this practical introduction to CFE. With its extensive functionality and portability, CFE is becoming a powerful and enabling research tool for biotechnology.


The synthesis of proteins using CFE is now used in educational kits [91, 92]. BioBits kits, for example, fill a significant gap in the available resources to teach molecular and synthetic biology to high school students. Because BioBits kits use freeze-dried CFE and plasmids, they do not require cold-chain distribution or sterile conditions to function, factors that are often limiting for implementation in educational environments. Concepts such as tuning of gene expression are illustrated by getting users to vary the plasmid DNA concentration in CFE reactions. Different proteins and materials can be produced, including fluorescent reporters, fragrances and hydrogels [91], each of which stimulates a different sense, thus engaging users.

