FIGURE SUMMARY
Title

Computational prediction and experimental validation identify functionally conserved lncRNAs from zebrafish to human

Authors
Huang, W., Xiong, T., Zhao, Y., Heng, J., Han, G., Wang, P., Zhao, Z., Shi, M., Li, J., Wang, J., Wu, Y., Liu, F., Xi, J.J., Wang, Y., Zhang, Q.C.
Source
Full text @ Nat. Genet.

Fig. 1: Identification of coPARSE-lncRNAs and their homologs across vertebrates. a, A simplified workflow for lncHOME analysis of vertebrate lncRNAs. The phylogenetic tree shows the evolutionary descent of eight vertebrates, with the number of annotated lncRNAs in each species. The heatmap shows the Jaccard index of lncRNAs and protein-coding genes identified by sequence similarity across eight vertebrates (top). lncHOME defines coPARSE-lncRNAs by combining the alignment of homologous protein-coding genes and corresponding genomic anchors (bottom left) and analysis of similar motif distribution patterns (bottom right). b, Contour line plot of syntenic lncRNAs in human versus mouse and human versus zebrafish identified by lncHOME, in terms of the proportion of common protein-coding genes and the proportion of corresponding genomic anchors. The background density plot shows the proportion scores for protein-coding genes with one-to-one homology. c, The distribution of curated RNA motifs for representative RBPs. Representative motifs for two example RBPs (FUS and TARDBP) are shown. d, coPARSE-lncRNA homolog pairs with similar motif distribution patterns between human and mouse. A coPARSE-lncRNA with annotation in the lncRNAdb database is highlighted in red. The lncRNA THORLNC is highlighted in blue. Red dashed lines represent the median value of the MPSSs and the GPSs.
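
The Jaccard index in the Fig. 1a heatmap measures how much two sets overlap: the size of the intersection divided by the size of the union. The sketch below shows the calculation for a single pair of species. The function name and the gene-family identifiers are hypothetical and chosen for illustration only; how the authors grouped genes by sequence similarity is not described in the caption.

```python
def jaccard_index(set_a, set_b):
    """Return |A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical gene-family identifiers for two species (illustration only).
species_1 = {"fam1", "fam2", "fam3", "fam4"}
species_2 = {"fam3", "fam4", "fam5"}

# Two families are shared out of five distinct families in total.
print(jaccard_index(species_1, species_2))  # 0.4
```

A value of 1 means the two species share every family, and a value of 0 means they share none.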

Fig. 2: The coPARSE-lncRNAs and their predicted homologs share similar evolutionary and functional features. a, The distribution of average conservation scores (PhastCons) for coPARSE-lncRNA homolog pairs with sequence similarity (homolog_ss, n = 605/17 for human versus mouse/human versus zebrafish) and without sequence similarity (homolog_nss, n = 4,959/553 for human versus mouse/human versus zebrafish), and paired lncRNAs randomly selected from human and mouse lncRNAs (nonhomolog, n = 5,000). b, The distribution of common SNP density in motif or nonmotif regions in human coPARSE-lncRNAs among the homolog_ss (n = 605) and homolog_nss (n = 4,959) groups of lncRNA pairs. c, The distribution of major alternative allele frequency of SNPs in motif or nonmotif regions in human coPARSE-lncRNAs among the homolog_ss (n = 605) and homolog_nss (n = 4,959) groups of lncRNA pairs. d, The distribution of the common histone modification site rate among the homolog_ss (n = 605), homolog_nss (n = 4,959) and nonhomolog (n = 5,000) groups of lncRNA pairs. For a–d, two-sided Mann–Whitney U test. Boxes, IQR. Center lines, median. Whiskers, values within 1.5× IQR of the top and bottom quartiles. e, Heatmap of normalized expression values of coPARSE-lncRNAs and their predicted homologs in five organs (brain, kidney, liver, muscle and spleen) and three species (human, mouse and zebrafish) (top), and distribution of tissue-specific expression score (among the five organs) of the coPARSE-lncRNAs and their homologs (bottom). f, Correlation of tissue specificity of homolog_ss and homolog_nss groups of coPARSE-lncRNAs and their homologs among three species. g, Distribution of enrichment for human coPARSE-lncRNA genes with ClinVar mutations (excluding the mutations falling in exons of protein-coding genes), compared to randomly selected lncRNA genes (P value calculated using a permutation test). Blue dashed lines represent the nonenrichment threshold of 1. h, Enrichment of the homolog_ss and homolog_nss groups of human coPARSE-lncRNAs with homologs in mouse for differentially expressed lncRNAs across different cancer types. Each dot represents a cancer type, and the orange and yellow colors indicate significant enrichment (P values calculated using two-sided Fisher’s exact test). IQR, interquartile range.
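
The box plot convention stated for Fig. 2a–d (box = IQR, center line = median, whiskers = values within 1.5× IQR of the quartiles) can be sketched as follows. This is a minimal illustration using made-up numbers, not the paper's data. The function name is hypothetical, and the quartile interpolation method ("inclusive") is an assumption, since plotting packages differ on this detail.

```python
from statistics import quantiles


def box_summary(values):
    """Compute the quartiles and whisker ends of a standard box plot."""
    # Quartile interpolation method is an assumption; packages differ here.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points that lie inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


# Made-up scores with one extreme value (illustration only).
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(box_summary(scores))
```

In this example the value 100 falls outside the upper fence, so the upper whisker stops at 9 and 100 would be drawn as an outlier or omitted, depending on the plotting settings.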

Fig. 3: CRISPR–Cas12a screening and validation of coPARSE-lncRNA functions. a, The crRNA library was delivered into cells stably expressing Cas12a by lentiviral infection. Infected cells were collected by fluorescence-activated cell sorting (FACS; green fluorescence). For screening, cells were cultured for 15–45 d before genomic DNA extraction and high-throughput sequencing analysis of the barcoded crRNA regions. Each DNA oligonucleotide sequence encodes two crRNAs (represented in red and blue), which are transcribed and processed by Cas12a to generate individual mature crRNAs; these mature crRNAs guide Cas12a to cut target genomic regions. DR (19 nt). b, The RRA scores for the top-ranking negatively selected lncRNAs. Note that smaller RRA scores indicate a stronger selection of the corresponding lncRNAs. The coPARSE-lncRNAs of the top ten negatively selected lncRNAs are highlighted in red, whereas the non-coPARSE-lncRNAs are highlighted in orange. Nine positive control genes are shown in blue (round dots for lncRNAs and triangles for protein-coding genes). Background represents the overall distribution. c, The mean read count value for paired crRNAs at day 45 relative to that of day 0 for lncRNA genes in our screening library. Highlighted dots are paired crRNAs for five negatively selected candidate genes in our screening assay, and the background represents the overall distribution. d, Overlap of the negatively selected lncRNAs in the three indicated cell lines. e, Cell proliferation validation assays in HeLa cells treated with two independent shRNAs for each candidate lncRNA. Error bars, means ± s.d., n = 3 biologically independent experiments. DR, direct repeats.

Fig. 4: Functional validation of coPARSE-lncRNAs. a, KO-rescue lentivirus plasmid construction. The plasmid contains three functional cassettes for U6 promoter-driven expression of crRNAs, Dox-inducible ectopic expression of homologs and GFP labeling for infected cells. b, IncuCyte proliferation analysis. HeLa cells maintained in a Dox-free culture medium were split into two groups (Dox+/−) for lentivirus infection, followed by transient transfection of rtTA-expression or control pcDNA3.1 plasmids 24 h after infection. GFP-positive cells were sorted by FACS for IncuCyte proliferation analysis. Error bars, means ± s.d., n = 3 biologically independent experiments. c, KO-rescue assays of 21 candidate coPARSE-lncRNAs (THORLNC as a positive control). The relative cell confluence upon Dox induction was calculated for these coPARSE-lncRNAs (the fold change of 72 h versus 0 h for each coPARSE-lncRNA was normalized to AAVS1 in the Dox+/− groups). An AAVS1-targeting crRNA pair and a segment of the fly luciferase gene were used for the AAVS1 group. Error bars, means ± s.d., n = 3 biologically independent experiments, two-sided Student’s t-test. d, IncuCyte assay of the human coPARSE-lncRNA RP1-212P9.3 and its zebrafish homolog TCONS_00107744_zbf, using luciferase segments as a negative control, n = 2 biologically independent experiments. e, Time-matched images of early embryogenesis showing that injection of the four human coPARSE-lncRNAs rescued the developmental defect of the corresponding zebrafish lncRNA homolog knockdown embryos. The epiboly edge is marked by red dotted lines, and the embryonic shield is indicated by red arrowheads. Scale bars, 100 μm. f, Quantification of zebrafish lncRNA knockdown embryos complemented with human homologous coPARSE-lncRNAs, showing a rescue of the developmental delay. n = 3 biologically independent experiments. The number of embryos in each injection group is detailed in Methods. Error bars, means ± s.d., two-sided Student’s t-test. g, HeLa cell line xenograft tumors of Dox+/− groups of the human lncRNA RP1-212P9.3 KO and complementation samples by RP1-212P9.3 and its zebrafish homolog (TCONS_00107744_zbf), showing increased tumor growth of the complementation samples (top). Bar plot showing tumor weights (bottom). Error bars, means ± s.d., n = 13, 14, 6 and 7 biologically independent experiments, one-sided Student’s t-test.
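
The relative cell confluence described in Fig. 4c (fold change of 72 h versus 0 h, normalized to the AAVS1 control) can be sketched as a ratio of two fold changes. This is one plausible reading of the caption, not the authors' analysis code. The function names and confluence values below are hypothetical.

```python
def fold_change(confluence_0h, confluence_72h):
    """Fold change in confluence between 0 h and 72 h."""
    if confluence_0h <= 0:
        raise ValueError("confluence at 0 h must be positive")
    return confluence_72h / confluence_0h


def relative_confluence(target, control):
    """Normalize the target fold change to the control fold change.

    Both arguments are (confluence_0h, confluence_72h) pairs measured
    under the same Dox condition.
    """
    return fold_change(*target) / fold_change(*control)


# Hypothetical confluence percentages (illustration only).
lncrna_ko = (10.0, 40.0)   # KO cells grow 4-fold in 72 h
aavs1_ctrl = (10.0, 80.0)  # AAVS1 control cells grow 8-fold in 72 h

print(relative_confluence(lncrna_ko, aavs1_ctrl))  # 0.5
```

Under this reading, a value below 1 indicates slower growth than the AAVS1 control under the same Dox condition, and a rescue would appear as a value closer to 1 in the Dox+ group than in the Dox− group.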

Fig. 5: Identification and functional analysis of the RBP interactome for two coPARSE-lncRNAs. a, PCA of MS data for HeLa cell lysates pulled down for the indicated human coPARSE-lncRNAs and the predicted mouse and zebrafish homologs. The control samples are based on luciferase transcript segments. b, Distribution of the MiST scores of enriched RBPs upon pull-down using the human coPARSE-lncRNA RP1-212P9.3 and its predicted mouse and zebrafish homologs. Dashed lines represent a threshold of 0.7. Three commonly enriched RBPs from all comparisons (highlighted in colored circles) were validated by immunoblotting. The r represents the Pearson correlation coefficient, two-sided Student’s t-test. c, Venn diagram showing identified binding proteins of eight lncRNAs (human coPARSE-lncRNAs THORLNC, RP1-212P9.3 and RP11-1055B8.4, and their mouse and zebrafish homologs) in the RNA pull-down experiments (top). The table presents common binding proteins of three human lncRNAs and their homologs (bottom). Each dot represents a binding protein. d,e, Time-matched images (d) and quantifications (e) of early embryogenesis showing that injection of a human homologous coPARSE-lncRNA RP1-212P9.3 fragment and an RP1-212P9.3 fragment with the intact NONO-binding sites (RP1-212P9.3 re1) rescued the developmental defect of the corresponding zebrafish lncRNA homolog knockdown embryos. The epiboly edge is marked by red dotted lines, and the embryonic shield is indicated by red arrowheads. n = 3 biologically independent experiments. The number of embryos in each injection group is detailed in Methods. Scale bars, 100 μm. Error bars, means ± s.d., two-sided Student’s t-test. f,g, High-content imaging proliferation assays of RP1-212P9.3 (f) and RP11-1055B8.4 (g) KO HeLa cells rescued with wild-type zebrafish homologs and mutants bearing mutated RBP-binding sites. A luciferase segment was used as a negative control. AAVS1/FLU, control with pcrRNA targeting AAVS1 gene and overexpression of the luciferase segment. All groups were cultured with 500 ng ml−1 Dox. Error bars, means ± s.d., n = 3 biologically independent experiments. h, A simplified model for the evolution and function of coPARSE-lncRNAs. NS, not significant.

Acknowledgments
This image is the copyrighted work of the attributed author or publisher, and ZFIN has permission only to display this image to its users. Additional permissions should be obtained from the applicable author or publisher of the image.