R Summarize Collapsed Data.Table

I have data such as this:

    data=data.table("School"=c(1,1,1,1,1,1,0,1,0,0,1,1,1,0,1,0,1,1,1,1,1,0,0,1,0,1,1,1,1,1,1,0,1,0,1,0),
    "Grade"=c(0,1,1,1,0,0,0,1,1,1,0,1,1,0,0,1,1,1,0,0,1,1,0,1,0,0,1,0,1,1,0,0,0,0,1,0),
    "CAT"=c(1,0,1,1,0,1,0,1,1,0,1,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1,1,0,0,1,1,0,1,1,1,1),
    "FOX"=c(1,1,0,1,1,1,1,1,0,0,0,1,1,1,0,0,1,1,1,1,1,1,1,0,1,1,0,0,1,0,0,1,0,0,1,0),
    "DOG"=c(0,0,0,1,0,0,1,0,0,1,0,1,1,1,0,1,1,0,0,1,1,0,0,1,0,1,1,0,1,0,1,1,1,0,1,1))

and wish to achieve a new data table such as this:

    dataWANT = data.frame("VARIABLE" = c('CAT', 'CAT', 'CAT', 'FOX', 'FOX', 'FOX', 'DOG', 'DOG', 'DOG'),
                          "SCHOOL" = c(1, 1, 0, 1, 1, 0, 1, 1, 0),
                          "GRADE" = c(0, 1, 1, 0, 1, 1, 0, 1, 1),
                          "MEAN" = c(NA))

dataWANT takes the mean of CAT, FOX, and DOG by SCHOOL, by GRADE, and by SCHOOL x GRADE, restricted to rows where the grouping variables equal 1.

I know how to do this one variable at a time, but that does not scale to a large dataset.

    data[, CAT1 := mean(CAT), by = list(School)]
    data[, FOX1 := mean(FOX), by = list(Grade)]
    data[, DOG1 := mean(DOG), by = list(School, Grade)]

    data$CAT2 = unique(data[School == 1, CAT1])
    data$FOX2 = unique(data[Grade == 1, FOX1])
    data$DOG2 = unique(data[School == 1 & Grade == 1, DOG1])

Please use only this data:

data=data.table("SCHOOL"=c(1,1,1,1,1,1,0,1,0,0,1,1,1,0,1,0,1,1,1,1,1,0,0,1,0,1,1,1,1,1,1,0,1,0,1,0),
                "GRADE"=c(0,1,1,1,0,0,0,1,1,1,0,1,1,0,0,1,1,1,0,0,1,1,0,1,0,0,1,0,1,1,0,0,0,0,1,0),
                "CAT"=c(1,0,1,1,0,1,0,1,1,0,1,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1,1,0,0,1,1,0,1,1,1,1),
                "FOX"=c(1,0,0,1,1,1,1,1,0,0,0,1,1,1,0,0,1,1,1,1,1,1,1,0,1,1,0,0,1,0,0,1,0,0,1,0),
                "DOG"=c(0,0,0,1,0,0,1,0,0,1,0,1,1,1,0,1,1,0,0,1,1,0,0,1,0,1,1,0,1,0,1,1,1,0,1,1))


data[, CAT1:=mean(CAT), by=list(SCHOOL)]
data[, CAT2:=mean(CAT), by=list(GRADE)]
data[, CAT3:=mean(CAT), by=list(SCHOOL, GRADE)]

data[, FOX1:=mean(FOX), by=list(SCHOOL)]
data[, FOX2:=mean(FOX), by=list(GRADE)]
data[, FOX3:=mean(FOX), by=list(SCHOOL, GRADE)]

data[, DOG1:=mean(DOG), by=list(SCHOOL)]
data[, DOG2:=mean(DOG), by=list(GRADE)]
data[, DOG3:=mean(DOG), by=list(SCHOOL, GRADE)]

dataWANT=data.frame("VARIABLE"=c('CAT','CAT','CAT','FOX','FOX','FOX','DOG','DOG','DOG'),
                    "TYPE"=c(1,2,3,1,2,3,1,2,3),
                    "MEAN"=c(0.48,0.44,0.428,0.6,0.611,0.6428,0.52,0.61,0.6428))

where TYPE equals 1 when MEAN is estimated by SCHOOL,

TYPE equals 2 when MEAN is estimated by GRADE,

and TYPE equals 3 when MEAN is estimated by SCHOOL and GRADE.

Comments

We could melt the dataset, take the MEAN for each set of grouping columns, and then use rbindlist on the resulting list (as in the other post):

library(data.table)
cols <- c('CAT', 'FOX', 'DOG')
data1 <- melt(data, measure.vars = cols)
list_cols <- list('SCHOOL', 'GRADE', c('SCHOOL', 'GRADE'))
lst1 <- lapply(list_cols, function(x)  
       data1[, .(MEAN = mean(value, na.rm = TRUE)), c(x, 'variable')])
rbindlist(lapply(lst1, function(x)  {
     nm1 <- setdiff(names(x), c('variable', 'MEAN'))
     x[Reduce(`&`, lapply(mget(nm1), as.logical)),
     .(VARIABLE = variable, MEAN)]}), idcol = 'TYPE')[order(VARIABLE)]
#   TYPE VARIABLE      MEAN
#1:    1      CAT 0.4800000
#2:    2      CAT 0.4444444
#3:    3      CAT 0.4285714
#4:    1      FOX 0.6000000
#5:    2      FOX 0.5555556
#6:    3      FOX 0.6428571
#7:    1      DOG 0.5200000
#8:    2      DOG 0.6111111
#9:    3      DOG 0.6428571
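As an aside (my suggestion, not part of the answers above): if your data.table version is recent enough, `groupingsets()` can compute all three grouping levels in a single call on the melted data, avoiding the explicit lapply/rbindlist step. A minimal sketch, using the question's second dataset:

```r
library(data.table)

# The question's second dataset (upper-case SCHOOL/GRADE)
data = data.table("SCHOOL"=c(1,1,1,1,1,1,0,1,0,0,1,1,1,0,1,0,1,1,1,1,1,0,0,1,0,1,1,1,1,1,1,0,1,0,1,0),
                  "GRADE"=c(0,1,1,1,0,0,0,1,1,1,0,1,1,0,0,1,1,1,0,0,1,1,0,1,0,0,1,0,1,1,0,0,0,0,1,0),
                  "CAT"=c(1,0,1,1,0,1,0,1,1,0,1,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1,1,0,0,1,1,0,1,1,1,1),
                  "FOX"=c(1,0,0,1,1,1,1,1,0,0,0,1,1,1,0,0,1,1,1,1,1,1,1,0,1,1,0,0,1,0,0,1,0,0,1,0),
                  "DOG"=c(0,0,0,1,0,0,1,0,0,1,0,1,1,1,0,1,1,0,0,1,1,0,0,1,0,1,1,0,1,0,1,1,1,0,1,1))

melted <- melt(data, measure.vars = c('CAT', 'FOX', 'DOG'))

# One pass over the three grouping sets; grouping columns absent
# from a given set come back as NA in the result
res <- groupingsets(melted,
                    j = .(MEAN = mean(value, na.rm = TRUE)),
                    by = c('SCHOOL', 'GRADE', 'variable'),
                    sets = list(c('SCHOOL', 'variable'),
                                c('GRADE', 'variable'),
                                c('SCHOOL', 'GRADE', 'variable')))

# Keep only the rows where the grouping indicator(s) equal 1
want <- res[(SCHOOL == 1 & is.na(GRADE)) |
            (is.na(SCHOOL) & GRADE == 1) |
            (SCHOOL == 1 & GRADE == 1)]
```

This reproduces the nine MEAN values above (e.g. 0.48 for CAT by SCHOOL). `groupingsets()` was added to data.table in 1.10.0, so treat this as a sketch to adapt rather than a guaranteed drop-in replacement.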

Do you mean to get something like this?

library(data.table)

melt(data, measure.vars = c('CAT', 'FOX', 'DOG'))[, 
        .(MEAN = mean(value, na.rm = TRUE)), .(School, Grade, variable)]

To group it by different columns, we can do :

cols <- c('CAT', 'FOX', 'DOG')
data1 <- melt(data, measure.vars = cols)
list_cols <- list('School', 'Grade', c('School', 'Grade'))

lapply(list_cols, function(x)  
         data1[, .(MEAN = mean(value, na.rm = TRUE)), c(x, 'variable')])

You could subset and calculate your means first using lapply(.SD, ...), then melt that into your output:

melt(data[School != 0 | Grade != 0, lapply(.SD, mean), by = .(School, Grade)], id.vars = c("School", "Grade"))

Appending this afterwards also adds the TYPE variable:

...][, TYPE := School + (2*Grade)]

Putting it all together and tidying it up, it matches your desired output:

dataWANT <- melt(data[School != 0 | Grade != 0, lapply(.SD, mean), by = .(School, Grade)], id.vars = c("School", "Grade"))[, TYPE := School + (2*Grade)][order(variable, TYPE), .("VARIABLE" = variable, TYPE, "MEAN" = value)] 
