
biomart

Using spread with duplicate identifiers for rows gives an error

My data looks like this:

    df <- read.table(header = T, text = "
    GeneID Gene_Name Species Paralogues Domains Functional_Diversity
    1234 DDR1 hsapiens 14 2 8.597482
    5678 CSNK1E celegans 70 4 8.154788
    9104 FGF1 Chicken 3 0 5.455874
    4575 FGF1 hsapiens 4 6 6.745845")

I need it to look like:

    Gene_Name  hsapiens  celegans  ggalus
    DDR1       8.597482  NA        NA
    CSNK1E     NA        8.154788  NA
    FGF1       6.745845  NA        5.455874

I've tried using:

    library(tidyverse)
    df %>%
      select(Gene_Name, Species, Functional_Diversity) %>%
      spread(Species, Functional_Diversity)

My actual data consists of 130,000 rows (many gene names, approx. 14,000 unique)

2021-05-13 10:59:32    Category: Q&A    r   dataframe   tidyr   spread   biomart
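For the spread question above, the error usually means that some Gene_Name/Species pair occurs more than once, so there is no single value to place in a cell. The sketch below is one possible approach, not a definitive answer. It adds a hypothetical duplicate row (GeneID 9999) to the sample data to reproduce the situation, lists the duplicated pairs, and then collapses them before reshaping. Taking the mean of duplicates is an assumption; keeping the maximum or the first record may suit the data better. pivot_wider() is tidyr's newer replacement for spread().

```r
library(dplyr)
library(tidyr)

# Sample data from the question, plus one hypothetical duplicate row
# (a second FGF1/hsapiens record, GeneID 9999) added for illustration.
df <- read.table(header = TRUE, text = "
GeneID Gene_Name Species Paralogues Domains Functional_Diversity
1234 DDR1 hsapiens 14 2 8.597482
5678 CSNK1E celegans 70 4 8.154788
9104 FGF1 Chicken 3 0 5.455874
4575 FGF1 hsapiens 4 6 6.745845
9999 FGF1 hsapiens 4 6 7.745845
")

# List the Gene_Name/Species pairs that occur more than once.
dupes <- df %>%
  count(Gene_Name, Species) %>%
  filter(n > 1)

# Collapse duplicates (by mean here, which is an assumption),
# then reshape to one row per gene and one column per species.
wide <- df %>%
  group_by(Gene_Name, Species) %>%
  summarise(Functional_Diversity = mean(Functional_Diversity),
            .groups = "drop") %>%
  pivot_wider(names_from = Species, values_from = Functional_Diversity)
```

Inspecting dupes first shows whether the repeated rows are true duplicates or distinct records that need a different summary. The column names come from the values in Species, so with this sample the chicken column is named Chicken.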

BioMart: Is there a way to easily change the species for all of my code?

Below is a small fraction of my code:

    library(biomaRt)
    ensembl_hsapiens <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
    hsapien_PC_genes <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                              filters = "biotype",
                              values = "protein_coding",
                              mart = ensembl_hsapiens)
    paralogues[["hsapiens"]] <- getBM(attributes = c("external_gene_name", "hsapiens_paralog_associated_gene_name"),
                                      filters = "ensembl_gene_id",
                                      values = c(ensembl_gene_ID),
                                      mart = ensembl_hsapiens)

This bit of code will only allow me to extract the paralogues for hsapiens, is there a way for me to easily get the

2021-04-21 08:30:21    Category: Q&A    r   bioinformatics   bioconductor   biomart
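For the BioMart question above, one way to avoid hard-coding the species is to build the dataset and attribute names from a short species code and wrap the queries in a function. This is a minimal sketch under stated assumptions: species_names() and get_paralogues() are hypothetical helpers, every species is assumed to follow the naming pattern seen for hsapiens ("<species>_gene_ensembl" and "<species>_paralog_associated_gene_name"), and the gene IDs passed to the second query are assumed to be the protein-coding IDs returned by the first one, since ensembl_gene_ID is not defined in the excerpt.

```r
# Build the species-specific names from a short species code such as
# "hsapiens". Assumes other species follow the pattern used for hsapiens.
species_names <- function(species) {
  list(
    dataset = paste0(species, "_gene_ensembl"),
    paralog_attr = paste0(species, "_paralog_associated_gene_name")
  )
}

# Hypothetical helper: fetch paralogues of protein-coding genes
# for a single species. Requires the biomaRt package and network access.
get_paralogues <- function(species) {
  nm <- species_names(species)
  mart <- biomaRt::useMart("ensembl", dataset = nm$dataset)

  # Protein-coding genes for this species.
  pc_genes <- biomaRt::getBM(
    attributes = c("ensembl_gene_id", "external_gene_name"),
    filters = "biotype",
    values = "protein_coding",
    mart = mart
  )

  # Paralogues of those genes (assumed input; see the note above).
  biomaRt::getBM(
    attributes = c("external_gene_name", nm$paralog_attr),
    filters = "ensembl_gene_id",
    values = pc_genes$ensembl_gene_id,
    mart = mart
  )
}

# Usage (queries Ensembl, so it needs network access):
# species <- c("hsapiens", "celegans", "ggallus")
# paralogues <- lapply(setNames(species, species), get_paralogues)
```

The lapply() call returns a named list, so paralogues[["hsapiens"]] keeps working as in the original code. It is worth checking the species codes against listDatasets() before running the loop, because a code that does not match an Ensembl dataset name will make useMart() fail.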