Revealing the genetic code of ribosomal RNAs

Despite modern sequencing methods, determining the precise sequence of the genetic code for ribosomal RNA (rRNA) has been technically challenging due to its repetitive nature. The lab of Peter Schlögelhofer has now, for the first time for any organism, sequenced and assembled large parts of the rDNA-containing nucleolus organizing region of the model plant Arabidopsis thaliana. In their study, published in Nature Communications, the scientists also identified several tissue-specific rRNA variants, which may have functional roles in specialized ribosomes.

The ribosome is an ancient macromolecular machine that translates the information coded in messenger RNA into proteins. Ribosomes are found in all living cells and, in addition to proteins, crucially depend on rRNA for their function. The genes encoding rRNAs are arranged in clusters called nucleolus organizing regions (NORs). Knowledge of the sequence and organization of the rRNA genes within these clusters is essential for understanding rRNA gene regulation, their evolutionary origins, and the role of transcriptional variants in biology.

Until now, the sequencing of rDNA clusters has been hindered by the repetitive nature of the DNA, which poses a problem for standard sequencing methods. Traditional sequencing methods chop DNA strands into smaller pieces, sequence them, and then reassemble the pieces in the proper order by matching their regions of overlap. But what if the individual pieces look too similar? More modern methods permit longer individual reads, which preserve context, but these approaches are more error-prone.

To tackle this problem, the researchers combined the contextuality of long reads and the precision of short-read sequencing with a method to individually label rDNA repeats. “Each rDNA unit has minuscule variations”, explains senior postdoc Jason Sims, who is co-first author of the study together with Giovanni Sestini (now Institute of Molecular Biotechnology, IMBA). “We used these small differences to generate what we call a barcode for each unit. This way we could confidently assemble this highly repetitive rDNA region.”
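The barcode idea can be illustrated with a toy example. The sketch below is not the pipeline used in the study; it assumes a handful of made-up, already aligned repeat units of equal length, and the function names `variable_positions` and `barcodes` are hypothetical. It simply collects the positions at which the units differ and reads off the bases at those positions as a short label for each unit.

```python
def variable_positions(units):
    """Return the indices at which the aligned repeat units are not all identical."""
    return [i for i, column in enumerate(zip(*units)) if len(set(column)) > 1]


def barcodes(units):
    """Build one barcode per unit from the bases found at the variable positions."""
    positions = variable_positions(units)
    return ["".join(unit[i] for i in positions) for unit in units]


# Hypothetical, pre-aligned repeat units of equal length (toy data, not real rDNA).
units = [
    "ACGTACGTAC",
    "ACGTTCGTAC",
    "ACGTACGTAG",
    "ACCTACGTAC",
]

codes = barcodes(units)
for unit, code in zip(units, codes):
    print(unit, "->", code)
```

In this toy data the units differ at only three positions, yet each unit receives a distinct label, which is what lets otherwise near-identical repeats be told apart. Real reads would additionally require alignment and error correction, which this sketch leaves out.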

With the sequence data in hand, the team observed that sequence variations exist not only in intergenic regions, but also in the genes coding for rRNAs. These variants are integrated into mature ribosomes in a tissue-specific manner. “We learn in school that ribosomes are invariant machines – they just translate mRNA into proteins”, says group leader Peter Schlögelhofer. “However, it was recently shown that they have different protein contents in certain tissues and developmental contexts. Our data now underline the idea that they also contain different rRNAs, depending on tissue and developmental stage.” The biological significance of these genetic variants remains unclear, “but I would be surprised if there is none”, says Schlögelhofer. With their sequencing and bioinformatics pipeline, the scientists hope to have created a tool that can be used to analyze other repetitive DNA regions, which make up a large part of the genome.

Publication in Nature Communications

Jason Sims, Giovanni Sestini, Christiane Elgert, Arndt von Haeseler & Peter Schlögelhofer: Sequencing of the Arabidopsis NOR2 reveals its distinct organization and tissue-specific rRNA ribosomal variants. Nature Communications Volume 12 (2021)