|Year : 2020 | Volume
| Issue : 1 | Page : 5-11
16S ribosomal RNA gene-based metagenomics: A review
Asmita Kamble, Shriya Sawant, Harinder Singh
Department of Biological Sciences, Sunandan Divatia School of Science, NMIMS Deemed to be University, Vile Parle (W), Mumbai, India
Date of Submission: 03-Mar-2020
Date of Decision: 09-May-2020
Date of Acceptance: 20-May-2020
Date of Web Publication: 12-Jun-2020
Dr. Harinder Singh
Department of Biological Sciences, Sunandan Divatia School of Science, NMIMS University, Vile Parle (West), Mumbai, Maharashtra
Source of Support: None, Conflict of Interest: None
With the advent of contemporary molecular tools, conventional microbiological isolation and enrichment techniques and approaches have changed considerably. Molecular techniques such as polymerase chain reaction, cloning, and sequencing have shown that the major percentage of microbial diversity in an ecosystem remains “unculturable” or “as yet uncultivable” owing to the lack of information on the biology of these microbes and to the limited selection media and culture conditions that could support their growth. Identifying these microbes and learning more about them has become an important objective in microbiological research. The ecological, environmental, and functional implications of a microbial ecosystem can be deciphered by knowing its microbial composition and interactions. Whole-cell and targeted gene metagenomics are playing a key role in accomplishing this objective. The present review discusses the 16S ribosomal RNA (16S rRNA) gene metagenomics approach, which has found major applications in identifying the composition of a given microbial ecosystem. Different systems, processes, and analysis tools are available to perform 16S rRNA metagenomics; however, a few concerns require more investigation to gain the maximum benefit from these techniques.
Keywords: 16S ribosomal RNA, metagenomics, microbiome, next-generation sequencing
|How to cite this article:|
Kamble A, Sawant S, Singh H. 16S ribosomal RNA gene-based metagenomics: A review. Biomed Res J 2020;7:5-11
Introduction
Microbial research has seen a major revolution in the past 25 years, especially after the introduction of contemporary molecular techniques in the last decade. Current microbial culturing methods on standard media replicate essential conditions such as pH, temperature, nutrients, and osmolarity, but these support the growth of only a small fraction of the total microbial diversity, while the majority remains unculturable. Next-generation sequencing data have shown that a vast uncultured microbial world remains unexplored by conventional culturing techniques. This is mainly because conventional microbiological isolation and enrichment techniques cannot support the growth of all the microbes present in a sample under laboratory conditions. Because of this drawback, the traditional microbiological approach of isolating and characterizing novel microbes from an environmental source has taken a back seat. Researchers are more interested in capturing and profiling the complete microbial diversity present in a given sample than the small percentage of it that is accessible by conventional methods. Along with the progress in tools and techniques, the range of sample sources has also expanded hugely: samples are now being explored from natural or artificial sources such as agricultural or environmentally relevant soil, aquatic habitats, and the flora present on or inside other organisms, such as domestic animals or the human body. The increased interest has initiated many local and global scale microbiome projects. One famous example is the Human Microbiome Project (HMP; https://hmpdacc.org/), initiated by the National Institutes of Health (NIH) and launched in 2008, which aims to identify the complete healthy human microbiome to appreciate the diversity and complexity of the microbial communities.
Similar initiatives have been taken by national funding agencies in different countries, such as the Commonwealth Scientific and Industrial Research Organization (Australia), Canadian Institutes of Health Research (Canada) (https://cihr-irsc.gc.ca/e/39939.html), European Commission (Europe) (https://www.gutmicrobiotaforhealth.com/metahit/), National Agency for Research (France), European Molecular Biology Laboratory (Germany), Medical Research Council (Gambia), Japan Science and Technology Agency (JST, Japan), National Research Foundation (Korea), and NIH (United States). The International Human Microbiome Consortium (IHMC; http://www.human-microbiome.org/) coordinates activities and policies, shares microbiome data and protocols, and promotes the generation of a robust data resource. Recently, the interdisciplinary Unified Microbiome Initiative was started with the objective of discovering and understanding the Earth's different microbial ecosystems. In the recent past, a huge impetus has been given to human microbiome research in different countries. Most of these studies have generated large datasets documenting the diverse microbes present in populations that differ in geographical location, lineage, diet, working lifestyle, etc. This has increased the demand for nontraditional and contemporary techniques to understand the microbial world that resists, avoids, or escapes routine cultivation. From this focused research, the field of metagenomics has developed, which involves the genomic examination of a population of microorganisms. Metagenomics is a habitat-based investigation of mixed microbial populations at the DNA level. The process involves isolating DNA from an environmental sample, or any sample to be tested, followed by sequencing and genome analysis.
Targeted metagenomics approaches, such as 16S ribosomal RNA (16S rRNA) gene metagenomics, include sample collection, DNA isolation, 16S rRNA polymerase chain reaction (PCR) amplification and cleanup, followed by next-generation sequencing (NGS) and sequence analysis using various computational tools.
The past decade has witnessed many reports on metagenomics studies for microbial diversity analysis. Large-scale projects such as the US NIH-funded HMP consortium have concentrated on producing reference genomes of a “healthy individual” using metagenomics approaches. Several methods, such as whole-genome shotgun sequencing and metagenomic sequencing of 16S rRNA, were employed to obtain reference microbial genomes of the human body. In addition, research groups worldwide are working on the dynamics, interactions, and specific components of the microbiome in a variety of disease conditions, including cancers. The various steps involved in such metagenomics studies are discussed below.
Sample collection and isolation of DNA
Isolating DNA from a given sample is an important step in metagenomics, as this is the starting material and determines the quality of the results of all downstream processes. DNA of good quality and sufficient quantity is necessary for moving forward with sequencing and analysis. Three factors in the isolation process are critical: proper sample collection, isolation of intact, high-quality DNA, and isolation of DNA free of PCR contaminants. Proper sample collection means ensuring that the collected sample is representative of the environment or location under consideration. If the location in question is a natural environment, it is crucial to check for and avoid man-made or artificial interference and contamination, which can distort the biodiversity analysis. For example, a deep-sea marine sample should be collected with precautions against microbial contaminants introduced by careless or incorrect collection practices or instruments. Similarly, a natural skin microflora sample should come from an untreated skin surface, as the diversity will change if the skin has been treated with chemicals such as soap or cosmetics. Rhizosphere soil should be collected strictly from the immediate surroundings of the plant roots, as it differs drastically from the neighboring root-free soil. The DNA isolation protocol must be chosen appropriately to avoid excessive shearing or degradation of DNA. A good yield of high-molecular-weight DNA is preferred, although it is not a strict requirement. A high yield is preferable because it can increase the size of the population under analysis compared with a low yield of genomic DNA. The yield, in turn, depends on an efficient cell lysis protocol, which is another important factor in DNA isolation. Cell lysis can be carried out by either direct or indirect lysis. 
In the direct lysis technique, cells are lysed in the sample itself and the DNA is then recovered, whereas in the indirect lysis technique, cells are first separated from the sample and the DNA is then extracted. More DNA is isolated using the direct lysis method than the indirect method, although the indirect method gives higher purity. Another way to improve the yield of DNA is to preculture the sample under the required growth conditions, or to collect the sample from an environment where it is exposed to favorable conditions, thus enriching the trait of interest naturally. This is particularly important when targeting a specific population of interest. For example, when looking for a novel thermostable microbe, a sample from a source such as a hot spring is preferable to one from an ordinary source. The isolation of novel microbes capable of utilizing a particular substrate can be improved by growing the sample in minimal media containing the required substrate to select for the preferred population. It can be argued that these modifications will change the diversity, but the objective of such studies is to find novel candidates rather than to assess diversity. Another important issue with specific samples such as soil is PCR inhibitors, such as humic and fulvic acids, which are coextracted with total soil DNA during extraction. To remove or avoid PCR inhibitors, DNA isolation kits are available and different protocols have been reported. A further consideration is the content of DNA from nontarget organisms. For example, genomic DNA isolated from a human skin or oral sample will contain only a small percentage of bacterial DNA and will consist largely of human DNA, which is easily isolated with any DNA isolation protocol. Similarly, a soil sample will contain DNA from fungi, protozoa, and viruses along with the bacteria. 
If the next step is a specific PCR to amplify the target gene, this nontarget DNA will not be a hindrance. However, when whole-genome sequencing is performed, removing the nontarget DNA data makes the analysis tedious. Hence, the DNA isolation protocol should be selected based on the type of sample and the objective of the study.
16S ribosomal RNA gene sequencing and metagenomics
The selection of genes for amplification is a major step in metagenomics. In most studies, 16S rRNA gene sequencing has been used for diversity analysis of polymicrobial populations. The use of rRNA gene sequences to characterize bacteria and study their evolution dates back to the 1970s, when Carl Woese described the use of molecular sequences to determine evolutionary relationships. The rRNA gene is present in all self-replicating systems, can be readily isolated, and its sequence changes slowly over time, allowing the detection of relatedness among different bacterial species. The 16S rRNA gene is universally present in prokaryotes and contains multiple hypervariable sub-regions, V1–V9, which can be used for the distinct identification of various prokaryotes. Along with the hypervariable regions, there are regions conserved across prokaryotes, which allow the design of universal primers (F01, 8F, 357F, 515F, 1237F, 519R, 1100R, 1391R) to amplify the 16S rRNA gene [Figure 1] and [Table 1]. These properties make the 16S rRNA gene a useful marker for taxonomic classification and separation, and have given rise to various 16S identification tools and databases such as the Ribosomal Database Project (https://rdp.cme.msu.edu/), Greengenes (https://greengenes.secondgenome.com/), and SILVA (https://www.arb-silva.de/). However, sequencing the entire 16S rRNA gene is not necessary for microbial diversity analysis; a single variable region or a combination of variable regions can be used. The choice of region varies, and comparative studies of compositional analysis have been carried out using different regions of the 16S rRNA gene. Regions V3–V5 are most often used for this purpose, although the use of region V1–V2 has also been reported. For the identification of archaeal sequences, regions V1–V4 showed the best results for richness at the genus level and region V3–V5 for family-level identification. 
For the analysis of bacterial sequences, regions V1–V4 showed the best results for sample richness at the species level and had the lowest error rates. The region used for sequencing should be chosen carefully; for example, the V3–V4 region of 16S rRNA is a preferred region that gives good results, has lower error rates than V8–V9, and is more appropriate for clustering analysis. However, other regions of the 16S rRNA gene have also been used for sequencing in various studies. Because the gene is universal, amplification of the 16S rRNA gene can be complicated by contaminated reagents, which lead to the amplification of unwanted products. Most bacteria carry multiple copies of the 16S rRNA gene per cell, which can affect quantitative studies of the proportions of different microbes in a sample. Recently, researchers have used other universal genes that are present in a single copy per cell (such as rpoB), which give a more accurate estimate of population numbers. A simple similarity search of universal primers showed that there is a slight possibility of failing to amplify unique and novel sequences. The solution to this problem is to update the universal primer sequences periodically to include new gene information added to the databases. The template DNA concentration used for PCR is crucial: there should be sufficient copies of the target for significant amplification. In certain cases, such as DNA from the human body, the concentration of the target template can be low because most of the DNA is human, and PCR amplification will not be efficient. An optimization PCR should be performed using specific target primers to decide the template concentration for efficient amplification.
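The effect of multiple 16S rRNA gene copies on quantitative estimates can be illustrated with a minimal Python sketch. It assumes that the copy number per genome is known for every taxon; the taxon names, read counts, and copy numbers below are hypothetical values chosen only for illustration, and the function name is not taken from any published tool.

```python
def copy_number_corrected_abundance(read_counts, copies_per_genome):
    """Estimate relative cell abundances from 16S read counts.

    read_counts: dict mapping taxon -> number of 16S reads (hypothetical data)
    copies_per_genome: dict mapping taxon -> 16S copies per cell (assumed known)
    """
    # Dividing reads by copy number gives a value proportional to cell number.
    cell_estimates = {
        taxon: reads / copies_per_genome[taxon]
        for taxon, reads in read_counts.items()
    }
    total = sum(cell_estimates.values())
    # Normalize so that the corrected abundances sum to 1.
    return {taxon: value / total for taxon, value in cell_estimates.items()}


# Hypothetical example: taxon_A dominates the reads only because it
# carries seven 16S copies per cell.
read_counts = {"taxon_A": 700, "taxon_B": 300}
copies_per_genome = {"taxon_A": 7, "taxon_B": 1}
corrected = copy_number_corrected_abundance(read_counts, copies_per_genome)
print(corrected)
```

In this illustration, taxon_A accounts for 70% of the reads but only a quarter of the estimated cells, which is the kind of bias that a single-copy marker avoids.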
|Figure 1: The figure shows the location of universal polymerase chain reaction primers used to amplify multiple sub regions (V1 to V9) of 16S ribosomal RNA gene for analyzing microbial diversity from various environmental samples|
|Table 1: Universal primers for 16S ribosomal RNA gene polymerase chain reaction|
The selection of a sequencing method depends on the requirements and specific objective of the research. Since metagenomics generally involves a mixed population, simple Sanger sequencing cannot resolve it. Next-generation sequencing technologies such as pyrosequencing, Illumina, nanopore, and PacBio sequencing are mostly used here. These techniques have been reviewed extensively, and readers are advised to refer to these review articles. Among the different NGS technologies, 454/Roche was one of the first sequencers to be manufactured. It clonally amplifies random DNA fragments attached to microscopic beads deposited in a picotitre plate. The average read length produced using 454/Roche is 600–800 bp. Illumina technology, originally developed by Solexa, amplifies DNA fragments immobilized on a surface, resulting in clusters of identical DNA fragments. The cluster density ranges from 170 to 1400 K clusters/mm², depending on the type of Illumina system used, and read lengths of 150–300 bp can be obtained using this method. A few more sequencing methods have been developed in recent times, such as Ion Torrent by Thermo Fisher Scientific, which produces read lengths from 200 to 600 bp with an output of 0.3–25 gigabases. This technology has a shorter run time, 2.5–4 h, than other technologies. Third-generation sequencers, which aim to produce longer reads by real-time sequencing, have been introduced more recently. Single-molecule real-time (SMRT) sequencing from Pacific Biosciences (PacBio) was the first popular third-generation platform; it introduced real-time sequence acquisition for read lengths > 1 kb using sequencing by synthesis and optical detection. In 2014, Oxford Nanopore Technologies released nanopore sequencing in the form of the MinION, a handheld sequencer that uses a grid of membrane-embedded biological nanopores. 
The MinION has the distinct advantage of being highly portable and capable of sequencing when plugged into a laptop.
For sequencing of 16S rRNA for taxonomic classification, reads of up to 200–250 bp are satisfactory. Read length plays an important role in metagenomics studies, since identification down to the species level can be difficult with shorter reads. Various kits are now available that can read up to 400 bp, making the identification of 16S rRNA sequences easier. Another important parameter in NGS metagenomics studies is the depth of sequencing (coverage), which is the number of times a genome has been sequenced. This primarily depends on the requirements and aim of the study. The sequencing depth varies from 30x–50x for whole-genome sequencing to 100x for whole-exome sequencing and ChIP-Seq. For RNA sequencing, sequencing depth is expressed as the number of millions (M) of reads to be sampled. For profiling highly expressed genes, 5–25 million reads per sample should be enough, whereas experiments aiming at in-depth transcriptome analysis might require 100–200 million reads per sample (https://sapac.support.illumina.com/bulletins/2017/04/considerations-for-rna-seq-read-length-and-coverage-.html). If the main goal of the study is only to identify the major bacterial phyla in the sample, a smaller sampling depth of up to 300–400 M reads per lane is sufficient. A study by Ley et al. in 2008 concluded that two major phyla are present in the vertebrate gut microbiome, Firmicutes (75%) and Bacteroidetes (18%), using a depth of 350 sequences/sample. Shallow-depth sampling can be carried out to detect major community data and large-scale patterns. On the other hand, studies that aim to identify species-level diversity in a sample require a higher number of sequence reads. For 16S rRNA sequencing, 1000 sequences/sample or more are recommended to provide species-level identification. In a 2018 study by Yang et al., which analyzed the diversity of the oral microbiome down to the species level in oral cancer patients at different oral cancer stages, a sequence depth of 10,000 sequences/sample was used as the threshold for analysis. Low-quality reads (Q-score <20) are generally not useful, as metagenomics software such as Quantitative Insights Into Microbial Ecology (QIIME) filters out low-quality reads that do not satisfy the requirement.
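The two thresholds discussed above, a minimum quality score and a minimum number of sequences per sample, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: quality strings are taken to use the common Phred+33 encoding, a read is judged by its mean Q-score, and the function names and example reads are hypothetical. Production pipelines such as QIIME apply their own, more elaborate filtering rules.

```python
def mean_quality(quality_string, offset=33):
    """Mean Phred score of a read, assuming Phred+33 encoded qualities."""
    scores = [ord(symbol) - offset for symbol in quality_string]
    return sum(scores) / len(scores)


def filter_reads(reads, min_mean_q=20):
    """Keep (sequence, quality) pairs whose mean Q-score is at least min_mean_q."""
    return [
        (sequence, quality)
        for sequence, quality in reads
        if quality and mean_quality(quality) >= min_mean_q
    ]


def samples_with_enough_depth(reads_per_sample, min_reads=10000):
    """Keep samples that reach the chosen sequences/sample threshold."""
    return {
        sample: count
        for sample, count in reads_per_sample.items()
        if count >= min_reads
    }


# Hypothetical reads: 'I' encodes Q40, '5' encodes Q20, and '#' encodes Q2.
reads = [("ACGT", "IIII"), ("ACGT", "####"), ("ACGT", "5555")]
kept_reads = filter_reads(reads)

# Hypothetical per-sample read counts.
depths = {"sample_1": 12500, "sample_2": 800}
kept_samples = samples_with_enough_depth(depths)
print(len(kept_reads), kept_samples)
```

The thresholds are passed as parameters because, as noted above, the appropriate depth depends on whether phylum-level or species-level resolution is required.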
Several factors are important while assembling the data obtained from sequencing. Two major concerns while processing raw data are the availability of appropriate read lengths and the reduction of data processing requirements. The pipelines used for gene prediction differ depending on the read length of the sequences. MG-RAST (http://api.mg-rast.org/api.html) and QIIME2 (https://qiime2.org/) are open-source pipelines that offer automated phylogenetic and functional analysis of metagenomes and are widely used to organize 16S rRNA sequences of length 100 bp and above, whereas IMG/M (Integrated Microbial Genomes with Microbiome Samples: https://img.jgi.doe.gov/m/) prefers data input in the form of contigs. However, the longer the sequence, the better the chance of obtaining reliable information when it is compared with existing genetic data by homology searching. To address the second concern, algorithms such as uclust or CD-HIT are used while annotating the data; they group similar reads into contigs and clusters, making further processing of the raw data easier. Longer and more complex sequences cannot be analyzed without assembly and might therefore be lost from the data set if they are not arranged appropriately.
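The idea of grouping similar reads around representative sequences can be sketched as a greedy, centroid-based procedure. The Python example below is a deliberately simplified illustration and not the actual uclust or CD-HIT algorithm: it assumes reads are already aligned and of comparable length, measures identity position by position, and uses hypothetical sequences. The 0.97 threshold is used only as a commonly quoted example value.

```python
def identity(seq_a, seq_b):
    """Fraction of identical positions (simplification: no gaps, no alignment)."""
    length = max(len(seq_a), len(seq_b))
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / length


def greedy_cluster(sequences, threshold=0.97):
    """Assign each read to the first centroid it matches, else start a new cluster."""
    clusters = []  # list of (centroid, members) pairs
    # Longer reads are processed first so that they become centroids.
    for seq in sorted(sequences, key=len, reverse=True):
        for centroid, members in clusters:
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            # No existing centroid is similar enough.
            clusters.append((seq, [seq]))
    return clusters


# Hypothetical 100 bp reads: 'near' differs from 'base' at 2 positions,
# 'far' differs at many positions.
base = "ACGT" * 25
near = "TT" + base[2:]
far = base[:50] + "A" * 50
clusters = greedy_cluster([base, near, far])
print(len(clusters))
```

Because each read is compared only with cluster centroids rather than with every other read, the number of comparisons stays manageable, which is the data-reduction benefit described above.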
For typical metagenomics data for the identification of bacterial diversity, common pipelines such as MG-RAST and QIIME2 involve five major processes that sequences have to undergo before the actual visualization of the data [Figure 2].
|Figure 2: The steps involved are the collection of various ecological samples, DNA extraction, 16S ribosomal RNA sequencing with the help of primers [Figure 1], and data analysis with the help of an appropriate pipeline|
First, the raw sequences are demultiplexed. Next-generation sequencers can sequence hundreds of samples in one run by multiplexing, where a unique barcode is added to one or both ends of each sequence. Once the samples are sequenced, these unique barcodes help in demultiplexing, i.e., identifying the sequences of individual samples. Depending on the type of raw data, various options are available for demultiplexing. For example, q2-demux and q2-cutadapt are used in QIIME 2 for sequences in EMP (Earth Microbiome Project; https://earthmicrobiome.org/) format and in multiplexed barcode-in-sequence format, respectively. The sequences in the raw data can be either single-end (sequenced in only one direction) or paired-end (each fragment sequenced in both directions). After demultiplexing, the two reads of each paired-end pair have to be identified and joined to each other for accurate information.
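The barcode-in-sequence case can be illustrated with a short Python sketch. It assumes that all barcodes have the same length, sit at the start of the read, and must match exactly; the barcodes, sample names, and reads are hypothetical, and the function is not part of q2-demux or q2-cutadapt, which handle mismatches and other layouts.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Sort reads into samples by their leading barcode and trim the barcode."""
    # Assumption: every barcode has the same length.
    barcode_length = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            # Unknown barcode: keep the read aside instead of guessing.
            unassigned.append(read)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample), unassigned


# Hypothetical 4-base barcodes and reads.
barcode_to_sample = {"AAAA": "sample_1", "CCCC": "sample_2"}
reads = ["AAAAGGTT", "CCCCTTGG", "AAAATTTT", "GGGGACGT"]
by_sample, unassigned = demultiplex(reads, barcode_to_sample)
print(by_sample, unassigned)
```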
The second step in the pipelines is denoising. This step aims to filter out noisy reads, dereplicate (reduce repetition), remove singletons, remove chimeric sequences, and correct errors in marginal sequences. In QIIME2, denoising can be carried out by two methods, Divisive Amplicon Denoising Algorithm (DADA2) and Deblur. Deblur requires basic quality score-based filtering beforehand, which is not necessary for DADA2. The outputs obtained after denoising are called “sequence variants (SVs)” or “amplicon SVs,” and these are equivalent to operational taxonomic units (OTUs) defined at 100% identity. This step is a prerequisite for the next step, clustering. OTU clustering is implemented by a closed-reference, open-reference, or de novo strategy. In closed-reference clustering, all sequences are compared against a reference sequence collection, and any reads that do not match the reference sequences are discarded from downstream analysis, whereas in open-reference clustering, reads are clustered against the reference and those that do not match it are clustered de novo, i.e., against one another without a reference. The final outcomes of denoising and clustering are a feature table artefact and a representative sequences artefact. The feature table summarizes all annotated features in the data (mRNA, genes, and sequences of 16S rRNA), along with information such as the number of times a feature occurs in the data and in how many samples. The representative sequences artefact, on the other hand, gives the actual DNA sequence of every annotated feature; these sequences are later assigned a taxonomy by classification against 16S rRNA databases such as SILVA, RDP, or Greengenes. These artefacts are used in all downstream analyses and are the central record of all observations of a sample.
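Two of the operations named above, dereplication and singleton removal, together with the construction of a feature table, can be sketched in Python as follows. This is a toy illustration only: it does not perform the error modelling or chimera removal that DADA2 and Deblur carry out, and the sample names and reads are hypothetical.

```python
from collections import Counter


def build_feature_table(reads_by_sample, min_count=2):
    """Dereplicate reads, drop rare sequences, and count features per sample.

    Returns the list of retained sequences (features) and a nested dict
    table[sample][sequence] -> count.
    """
    # Dereplication: count identical sequences across the whole data set.
    pooled = Counter(
        seq for reads in reads_by_sample.values() for seq in reads
    )
    # Singleton removal: keep sequences seen at least min_count times.
    features = sorted(seq for seq, count in pooled.items() if count >= min_count)
    table = {}
    for sample, reads in reads_by_sample.items():
        counts = Counter(reads)
        table[sample] = {seq: counts.get(seq, 0) for seq in features}
    return features, table


# Hypothetical reads after demultiplexing.
reads_by_sample = {
    "sample_1": ["AAA", "AAA", "CCC"],
    "sample_2": ["AAA", "GGG", "GGG"],
}
features, table = build_feature_table(reads_by_sample)
print(features)
print(table)
```

The list of retained sequences plays the role of the representative sequences, and the nested dictionary plays the role of the feature table described above.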
A series of programs then classifies the obtained sequences and feature table into taxonomy. Each sequence is assigned a taxonomy, which helps in carrying out phylogenetic analysis. The taxonomic compositions can also be viewed as bar plots and heat maps. Following this, detailed analyses, including alpha and beta diversity analysis, can be carried out using these data. Alpha diversity measures the level of diversity within individual samples. Beta diversity measures the level of diversity or dissimilarity between samples; dissimilarity in microbiome composition is expressed as a Bray-Curtis matrix (quantitative) or a Jaccard matrix (qualitative) and is visualized using principal coordinate analysis. Rarefaction can be used to analyze species richness for a given number of individual samples, based on the construction of so-called rarefaction curves. Diversity measures, including the Shannon index, Simpson index, and Faith's PD, are also determined using the pipelines. The whole data analysis results in a list of bacteria identified in the sample, which can be further divided into separate lists by phylum, class, order, family, genus, and species, and which depends on the database chosen as a reference, for example, RDP, SILVA, or Greengenes. The diversity is given in the form of absolute counts and percent composition. A sequence is identified when it matches a sequence present in the database with greater than or equal to 97% similarity. This will report only the species or genera present in the database, and any new genus will not be highlighted in the results. The unclassified candidates are clustered under the phylum Saccharibacteria, formerly known as Candidate Division TM7. This group lacks cultured isolates, so its practical role and implications are unknown and further investigation is required. Whole-genome sequencing can be one way to identify these novel microbes and their genetic content, which can further help in deciphering their role.
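The alpha and beta diversity measures mentioned above have simple closed forms, which the following Python sketch illustrates on hypothetical count vectors. Some conventions are assumptions made here: the Shannon index uses the natural logarithm, the Simpson index is given in its 1 − Σp² (Gini-Simpson) form, and the Jaccard measure is expressed as a distance on presence/absence data. Pipelines may use different conventions, and Faith's PD is omitted because it requires a phylogenetic tree.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over features with nonzero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def simpson_index(counts):
    """Simpson diversity in the 1 - sum(p^2) form."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def bray_curtis(counts_a, counts_b):
    """Quantitative dissimilarity between two samples (0 = identical counts)."""
    difference = sum(abs(a - b) for a, b in zip(counts_a, counts_b))
    return difference / (sum(counts_a) + sum(counts_b))


def jaccard_distance(counts_a, counts_b):
    """Qualitative dissimilarity based on presence/absence of features."""
    present_a = {i for i, c in enumerate(counts_a) if c > 0}
    present_b = {i for i, c in enumerate(counts_b) if c > 0}
    shared = present_a & present_b
    return 1.0 - len(shared) / len(present_a | present_b)


# Hypothetical feature counts for two samples (same feature order in both).
sample_1 = [10, 10, 0]
sample_2 = [0, 10, 10]
print(shannon_index(sample_1), simpson_index(sample_1))
print(bray_curtis(sample_1, sample_2), jaccard_distance(sample_1, sample_2))
```

Computing such pairwise dissimilarities for every pair of samples gives the matrix on which principal coordinate analysis is then performed.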
Conclusion
According to environmental microbiologists, <2% of bacteria from different ecological environments can be cultured in laboratories. The percent culturability (the percentage of culturable bacteria in comparison with total cell counts) varies between habitats. For instance, it ranges from 0.001% to 15% for habitats such as seawater, freshwater, mesotrophic lakes, unpolluted estuarine waters, activated sludge, soil, and sediments. On the other hand, in habitats like the human oral cavity, 50% of the microflora can be cultured. However, 16S rRNA gene sequence-based metagenomics approaches, widely used in the recent past, have enhanced our understanding of microbial diversity at such locations. These microbial diversity analysis studies have led to the increased discovery of novel bacterial lineages. Some famous projects, such as the HMP headed by NIH, became possible only because of the advances in metagenomics approaches. Not only do these studies help us understand the diversity of the microbiome in different organs of the human body, but they also help us understand the changing dynamics of this microbiome in disease and other unhealthy conditions. They have also helped in finding and understanding the diversity of environmentally relevant microbiome populations from different terrestrial and aquatic sources. Although most studies have concentrated on bacteria, more initiatives should be taken to explore other microbes such as fungi and viruses. Efforts should also be made to curate the available 16S rRNA gene databases by removing redundant entries and reducing the percentage of partial sequences, as species-level identification has more relevance from the perspective of applications. 
These studies can be further enriched by whole-genome analysis and related omics studies such as metatranscriptomics, metaproteomics, and metametabolomics, to understand the functionality and dynamics of a microbial community. The network of associations among microbes, and between microbes and the environment, at different levels is important for understanding the overall function and characteristics of an individual habitat and its microbial ecology. This can help in remediating or modifying certain habitats, such as soil, for various agricultural applications. These metagenomics studies can also lead to the exploration and discovery of novel genes, proteins, enzymes, metabolites, and active compounds of medical, environmental, and agricultural significance. With new population diversity being reported very frequently, such studies can help reveal novel microbes that have applications or potential in the medical, agricultural, and biotechnology industries. Future work in this field is expected to bring cost-effective and streamlined protocols and workflows, greater technical expertise, and efficient and robust computational/bioinformatics tools and pipelines. Recent metagenomics research has made significant contributions toward enriching microbial databases; however, the large repertoire of microbes present around us warrants further research in order to understand and exploit them for the benefit of mankind.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
References
Stewart EJ. Growing unculturable bacteria. J Bacteriol 2012;194:4151-60.
Pace NR, Stahl DA, Lane DJ, Olsen GJ. The analysis of natural microbial populations by ribosomal RNA sequences. In: Marshall KC, editor. Advances in Microbial Ecology. Vol. 9. Boston, MA: Springer; 1986.
Pace NR. Analyzing natural microbial populations by rRNA sequences. ASM News 1985;51:4-12.
Handelsman J, Rondon MR, Brady SF, Clardy J, Goodman RM. Molecular biological access to the chemistry of unknown soil microbes: A new frontier for natural products. Chem Biol 1998;5:R245-9.
Kirk HJ, Kelley ST, Pace NR. New perspective on uncultured bacterial phylogenetic division OP11. Appl Environ Microbiol 2004;70:845-9.
Turnbaugh PJ, Ley RE, Hamady M, Fraser-Liggett CM, Knight R, Gordon JI. The human microbiome project. Nature 2007;449:804-10.
Proctor LM, Creasy HH, Fettweis JM, Lloyd-Price J, Mahurkar A, Zhou W, et al. The integrative human microbiome project. Nature 2019;569:641-8.
Alivisatos AP, Blaser MJ, Brodie EL, Chun M, Dangl JL, Donohue TJ, et al. MICROBIOME. A unified initiative to harness Earth's microbiomes. Science 2015;350:507-8.
Huttenhower C, Gevers D, Knight R, Abubucker S, Badger JH, Chinwalla AT, et al. Structure, function and diversity of the healthy human microbiome. Nature 2012;486:207-14.
Diaz-Torres ML, McNab R, Spratt DA, Villedieu A, Hunt N, Wilson M, et al. Novel tetracycline resistance determinant from the oral metagenome. Antimicrob Agents Chemother 2003;47:1430-2.
Suenaga H. Targeted metagenomics: A high-resolution metagenomics approach for specific gene clusters in complex microbial communities. Environ Microbiol 2012;14:13-22.
Lazarevic V, Whiteson K, Huse S, Hernandez D, Farinelli L, Østerås M, et al. Metagenomic study of the oral microbiota by Illumina high-throughput sequencing. J Microbiol Methods 2009;79:266-71.
Banerjee J, Mishra N, Dhas Y. Metagenomics: A new horizon in cancer research. Meta Gene 2015;5:84-9.
Yu J, Feng Q, Wong SH, Zhang D, Liang QY, Qin Y, et al
. Metagenomic analysis of faecal microbiome as a tool towards targeted non-invasive biomarkers for colorectal cancer. Gut 2017;66:70-8.
Kim NH, Park JH, Chung E, So HA, Lee MH, Kim JC, et al
. Characterization of a soil metagenome-derived gene encoding wax ester synthase. J Microbiol Biotechnol 2016;26:248-54.
Xu X, He J, Xue J, Wang Y, Li K, Zhang K, et al
. Oral cavity contains distinct niches with dynamic microbial communities. Environ Microbiol 2015;17:699-710.
Dehingia M, Devi KT, Talukdar NC, Talukdar R, Reddy N, Mande SS, et al. Gut bacterial diversity of the tribes of India and comparison with the worldwide data. Sci Rep 2015;5:18563.
Sánchez-Sanhueza G, Bello-Toledo H, González-Rocha G, Gonçalves AT, Valenzuela V, Gallardo-Escárate C. Metagenomic study of bacterial microbiota in persistent endodontic infections using next-generation sequencing. Int Endod J 2018;51:1336-48.
Yang CY, Yeh YM, Yu HY, Chin CY, Hsu CW, Liu H, et al. Oral microbiota community dynamics associated with oral squamous cell carcinoma staging. Front Microbiol 2018;9:862.
Turner S, Pryer KM, Miao VP, Palmer JD. Investigating deep phylogenetic relationships among cyanobacteria and plastids by small subunit rRNA sequence analysis. J Eukaryot Microbiol 1999;46:327-38.
Methé BA, Nelson KE, Pop M, Creasy HH, Giglio MG, Huttenhower C, et al. A framework for human microbiome research. Nature 2012;486:215-21.
Elend C, Schmeisser C, Pop M, Creasy HH, Giglio MG, Huttenhower C, et al. A framework for human microbiome research. Nature 2012;486:215-21.
Schrader C, Schielke A, Ellerbroek L, Johne R. PCR inhibitors-occurrence, properties and removal. J Appl Microbiol 2012;113:1014-26.
Watson RJ, Blackwell B. Purification and characterization of a common soil component which inhibits the polymerase chain reaction. Can J Microbiol 2000;46:633-42.
Kamble A, Singh H. Different methods of soil DNA extraction. Bio Protocol 2020;10:1-23.
Fatima F, Pathak N, Rastogi Verma S. An improved method for soil DNA extraction to study the microbial assortment within rhizospheric region. Mol Biol Int 2014;2014:518960.
Fatima F, Chaudhary I, Ali J, Rastogi S, Pathak N. Microbial DNA extraction from soil by different methods and its PCR amplification. Biochem Cell Arch 2011;11:85-90.
Bag S, Saha B, Mehta O, Anbumani D, Kumar N, Dayal M, et al. An improved method for high quality metagenomics DNA extraction from human and environmental samples. Sci Rep 2016;6:26775.
Foong CP, Lakshmanan M, Abe H, Taylor TD, Foong SY, Sudesh K. A novel and wide substrate specific polyhydroxyalkanoate (PHA) synthase from unculturable bacteria found in mangrove soil. J Polym Res 2018;25:1-9.
Woese CR, Fox GE. Phylogenetic structure of the prokaryotic domain: The primary kingdoms. Proc Natl Acad Sci U S A 1977;74:5088-90.
Lan Y, Rosen G, Hershberg R. Marker genes that are less conserved in their sequences are useful for predicting genome-wide similarity levels between closely related prokaryotic strains. Microbiome 2016;4:18.
Yang B, Wang Y, Qian PY. Sensitivity and correlation of hypervariable regions in 16S rRNA genes in phylogenetic analysis. BMC Bioinformatics 2016;17:135.
Olsen GJ, Overbeek R, Larsen N, Woese CR. The ribosomal database project: Updated description. Nucleic Acids Res 1991;19:4817.
DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 2006;72:5069-72.
Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools. Nucleic Acids Res 2013;41:D590-6.
Abbai NS, Govender A, Shaik R, Pillay B. Pyrosequence analysis of unamplified and whole genome amplified DNA from hydrocarbon-contaminated groundwater. Mol Biotechnol 2012;50:39-48.
Kim M, Morrison M, Yu Z. Evaluation of different partial 16S rRNA gene sequence regions for phylogenetic analysis of microbiomes. J Microbiol Methods 2011;84:81-7.
Yong HS, Song SL, Chua KO, Lim PE. High diversity of bacterial communities in developmental stages of Bactrocera carambolae (Insecta: Tephritidae) revealed by Illumina MiSeq sequencing of 16S rRNA gene. Curr Microbiol 2017;74:1076-82.
Andersson AF, Lindberg M, Jakobsson H, Bäckhed F, Nyrén P, Engstrand L. Comparative analysis of human gut microbiota by barcoded pyrosequencing. PLoS One 2008;3:e2836.
Baker GC, Smith JJ, Cowan DA. Review and re-analysis of domain-specific 16S primers. J Microbiol Methods 2003;55:541-55.
Li W, Godzik A. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006;22:1658-9.
Klappenbach JA, Dunbar JM, Schmidt TM. rRNA operon copy number reflects ecological strategies of bacteria. Appl Environ Microbiol 2000;66:1328-33.
Adékambi T, Drancourt M, Raoult D. The rpoB gene as a tool for clinical microbiologists. Trends Microbiol 2009;17:37-45.
Kumar KR, Cowley MJ, Davis RL. Next-generation sequencing and emerging technologies. Semin Thromb Hemost 2019;45:661-73.
Escobar-Zepeda A, Vera-Ponce de León A, Sanchez-Flores A. The road to metagenomics: From microbiology to DNA sequencing technologies and bioinformatics. Front Genet 2015;6:348.
Frey KG, Herrera-Galeano JE, Redden CL, Luu TV, Servetas SL, Mateczun AJ, et al. Comparison of three next-generation sequencing platforms for metagenomic sequencing and identification of pathogens in blood. BMC Genomics 2014;15:96.
Sandmann S, de Graaf AO, van der Reijden BA, Jansen JH, Dugas M. GLM-based optimization of NGS data analysis: A case study of Roche 454, Ion Torrent PGM and Illumina NextSeq sequencing data. PLoS One 2017;12:e0171983.
Thomas RK, Nickerson E, Simons JF, Jänne PA, Tengs T, Yuza Y, et al. Sensitive mutation detection in heterogeneous cancer specimens by massively parallel picoliter reactor sequencing. Nat Med 2006;12:852-5.
Wommack KE, Bhavsar J, Ravel J. Metagenomics: Read length matters. Appl Environ Microbiol 2008;74:1453-63.
Merriman B, Ion Torrent R&D Team, Rothberg JM. Progress in Ion Torrent semiconductor chip based sequencing. Electrophoresis 2012;33:3397-417.
Rhoads A, Au KF. PacBio sequencing and its applications. Genomics Proteomics Bioinformatics 2015;13:278-89.
Lu H, Giordano F, Ning Z. Oxford Nanopore MinION sequencing and genome assembly. Genomics Proteomics Bioinformatics 2016;14:265-79.
Quick J, Loman NJ, Duraffour S, Simpson JT, Severi E, Cowley L, et al. Real-time, portable genome sequencing for Ebola surveillance. Nature 2016;530:228-32.
Zhang SW, Jin XY, Zhang T. Gene prediction in metagenomic fragments with deep learning. Biomed Res Int 2017;2017:4740354.
Brady A, Salzberg SL. Phymm and PhymmBL: Metagenomic phylogenetic classification with interpolated Markov models. Nat Methods 2009;6:673-6.
Ley RE, Lozupone CA, Hamady M, Knight R, Gordon JI. Worlds within worlds: Evolution of the vertebrate gut microbiota. Nat Rev Microbiol 2008;6:776-88.
Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods 2010;7:335-6.
Hernandez BY, Zhu X, Goodman MT, Gatewood R, Mendiola P, Quinata K, et al. Betel nut chewing, oral premalignant lesions, and the oral microbiome. PLoS One 2017;12:e0172196.
Staggs C, Galloway M. Development of a local cloud-based bioinformatics architecture. In: Latifi S, editor. Advances in Intelligent Systems and Computing. Cham: Springer Verlag; 2018. p. 559-65.
Knietsch A, Bowien S, Whited G, Gottschalk G, Daniel R. Identification and characterization of coenzyme B12-dependent glycerol dehydratase- and diol dehydratase-encoding genes from metagenomic DNA libraries derived from enrichment cultures. Appl Environ Microbiol 2003;69:3048-60.
Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 2010;26:2460-1.
Chan CK, Hsu AL, Halgamuge SK, Tang SL. Binning sequences using very sparse labels within a metagenome. BMC Bioinformatics 2008;9:215.
Crognale S, Tonanzi B, Valentino F, Majone M, Rossetti S. Microbiome dynamics and phaC synthase genes selected in a pilot plant producing polyhydroxyalkanoate from the organic fraction of urban waste. Sci Total Environ 2019;689:765-73.
Kioroglou D, Mas A, Portillo MD. Evaluating the effect of QIIME balanced default parameters on metataxonomic analysis workflows with a mock community. Front Microbiol 2019;10:1084.
Palmer RJ, Cotton SL, Kokaras AS, Gardner P, Grisius M, Pelayo E, et al. Analysis of oral bacterial communities: Comparison of HOMINGS with a tree-based approach implemented in QIIME. J Oral Microbiol 2019;11:1586413.
Ferrari B, Winsley T, Ji M, Neilan B. Insights into the distribution and abundance of the ubiquitous Candidatus Saccharibacteria phylum following tag pyrosequencing. Sci Rep 2014;4:3957.
Amann RI, Ludwig W, Schleifer KH. Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol Rev 1995;59:143-69.
Hugenholtz P, Pace NR. Identifying microbial diversity in the natural environment: A molecular phylogenetic approach. Trends Biotechnol 1996;14:190-7.
Schmidt TM, DeLong EF, Pace NR. Analysis of a marine picoplankton community by 16S rRNA gene cloning and sequencing. J Bacteriol 1991;173:4371-8.
Giovannoni SJ, Britschgi TB, Moyer CL, Field KG. Genetic diversity in Sargasso Sea bacterioplankton. Nature 1990;345:60-3.
Wade W. Unculturable bacteria--the uncharacterized organisms that cause oral infections. J R Soc Med 2002;95:81-3.
[Figure 1], [Figure 2]