Removing unwanted variation from large-scale RNA sequencing data with PRPS
Journal Title
Nature Biotechnology
Publication Type
Research article
Abstract
Accurate identification and effective removal of unwanted variation are essential to derive meaningful biological results from RNA sequencing (RNA-seq) data, especially when the data come from large and complex studies. Using RNA-seq data from The Cancer Genome Atlas (TCGA), we examined several sources of unwanted variation and demonstrate here how these can significantly compromise various downstream analyses, including cancer subtype identification, association between gene expression and survival outcomes, and gene co-expression analysis. We propose a strategy, called pseudo-replicates of pseudo-samples (PRPS), for deploying our recently developed normalization method, called removing unwanted variation III (RUV-III), to remove the variation caused by library size, tumor purity and batch effects in TCGA RNA-seq data. We illustrate the value of our approach by comparing it to the standard TCGA normalizations on several TCGA RNA-seq datasets. RUV-III with PRPS can be used to integrate and normalize other large transcriptomic datasets coming from multiple laboratories or platforms.
Publisher
Springer Nature
Keywords
Humans; *RNA; Gene Expression Profiling/methods; Sequence Analysis, RNA; *Neoplasms/genetics
Department(s)
Laboratory Research
PubMed ID
36109686
Open Access at Publisher's Site
https://doi.org/10.1038/s41587-022-01440-w
Terms of Use/Rights Notice
Refer to copyright notice on published article.


Creation Date: 2023-06-06 06:44:11
Last Modified: 2023-06-06 06:45:18

