summarize

Define and apply custom bins on a dataframe

Question: Using Python, I have created the following data frame, which contains similarity values:

      cosinFcolor cosinEdge cosinTexture histoFcolor histoEdge histoTexture    jaccard
    1       0.770     0.489        0.388  0.57500000 0.5845137    0.3920000 0.00000000
    2       0.067     0.496        0.912  0.13865546 0.6147309    0.6984127 0.00000000
    3       0.514     0.426        0.692  0.36440678 0.4787535    0.5198413 0.05882353
    4       0.102     0.430        0.739  0.11297071 0.5288008    0.5436508 0.00000000
    5       0.560     0.735        0.554  0.48148148 0.8168083    0.4603175 0.00000000
    6       0.029     0.302        0.558  0.08547009 0.3928234    0.4603175 0.00000000

I am trying to write an R script to generate another data frame that reflects the bins, where the binning condition applies only if the value is greater than 0.5. Pseudocode:

    if (cosinFcolor > 0.5 & cosinFcolor <= 0.6) bin = 1
    if (cosinFcolor > 0

2021-05-14 00:11:23    Category: Tech Sharing    r   dataframe   binning   summarize

R dplyr summarise multiple functions to selected variables

I have a dataset that I want to summarise by mean, while also calculating the max for just one of the variables. Let me start with an example of what I would like to achieve:

    iris %>%
      group_by(Species) %>%
      filter(Sepal.Length > 5) %>%
      summarise_at("Sepal.Length:Petal.Width", funs(mean))

which gives me the following result:

    # A tibble: 3 × 5
         Species Sepal.Length Sepal.Width Petal.Length Petal.Width
          <fctr>        <dbl>       <dbl>        <dbl>       <dbl>
    1     setosa          5.8         4.4          1.9         0.5
    2 versicolor          7.0         3.4          5.1         1.8
    3  virginica          7.9         3.8          6.9         2.5

Is there an easy way to add, for example, max(Petal.Width) to summarise? So far I have tried

2021-05-03 03:12:13    Category: Q&A    r   dplyr   summarize

Counting the occurrence of substrings in a column in R with group by

I would like to count the occurrences of a string in a column, per group. In this case the string is often a substring in a character column. I have some data, e.g.:

    ID  String          village
    1   fd_sec, ht_rm,  A
    2   NA, ht_rm       A
    3   fd_sec,         B
    4   san, ht_rm,     C

The code that I began with is obviously incorrect, and my search has failed to show how I could use the grep function in a column and group by village:

    impacts <- se %>%
      group_by(village) %>%
      summarise(c_NA = round(sum(sub$en41_1 == "NA")),
                c_ht_rm = round(sum(sub$en41_1 == "ht_rm")),
                c_san = round(sum(sub$en41_1 == "san")),
                c_fd_sec = round(sum

2021-05-03 01:07:30    Category: Q&A    r   summarize

Why does `summarize` drop a group?

I'm fooling around with the babynames pkg. A group_by command works, but after the summarize, one of the groups is dropped from the group list.

    library(babynames)
    babynames[1:10000, ] %>% group_by(year, name) %>% head(1)
    # A tibble: 1 x 5
    # Groups: year, name [1]
       year   sex  name     n       prop
      <dbl> <chr> <chr> <int>      <dbl>
    1  1880     F  Mary  7065 0.07238433

This is fine: two groups, year and name. But after a summarize (which respects the groups correctly), the name group is dropped. Am I missing an easy mistake?

    babynames[1:10000, ] %>% group_by(year, name) %>% summarise(n = sum(n)) %>% head(1)
    # A tibble: 1 x 3

2021-05-02 13:23:18    Category: Q&A    r   group-by   dplyr   summarize

Applying group_by and summarise(sum) but keep columns with non-relevant conflicting data?

My question is very similar to "Applying group_by and summarise on data while keeping all the columns' info", but I would like to keep columns which get excluded because they conflict after grouping.

    Label <- c("203c","203c","204a","204a","204a","204a","204a","204a","204a","204a")
    Type <- c("wholefish","flesh","flesh","fleshdelip","formula","formuladelip",
              "formula","formuladelip","wholefish", "wholefishdelip")
    Proportion <- c(1,1,0.67714,0.67714,0.32285,0.32285,0.32285, 0.32285, 0.67714,0.67714)
    N <- (1:10)
    C <- (1:10)
    Code <- c("c","a","a","b","a","b","c","d","c","d")
    df <- data.frame(Label

2021-05-01 12:24:23    Category: Q&A    r   group-by   tidyverse   mutate   summarize

What is the pandas equivalent of dplyr summarize/aggregate by multiple functions?

I'm having issues transitioning to pandas from R, where the dplyr package can easily group by and perform multiple summarizations. Please help improve my existing Python pandas code for multiple aggregations:

    import pandas as pd

    data = pd.DataFrame(
        {'col1': [1,1,1,1,1,2,2,2,2,2],
         'col2': [1,2,3,4,5,6,7,8,9,0],
         'col3': [-1,-2,-3,-4,-5,-6,-7,-8,-9,0]
        }
    )
    result = []
    for k, v in data.groupby('col1'):
        result.append([k, max(v['col2']), min(v['col3'])])
    print(pd.DataFrame(result, columns=['col1', 'col2_agg', 'col3_agg']))

Issues: too verbose; probably can be optimized and made more efficient. (I rewrote a for-loop

2021-04-22 03:04:58    Category: Q&A    python   r   pandas   pandas-groupby   summarize
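For the pandas question above, one possible way to replace the explicit loop is pandas' named aggregation in `groupby(...).agg(...)`. The sketch below is not taken from the original thread; it reuses the question's own data and output column names (`col2_agg`, `col3_agg`) and assumes a pandas version that supports named aggregation (0.25 or later).

```python
import pandas as pd

# Same toy data as in the question.
data = pd.DataFrame(
    {
        "col1": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
        "col2": [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
        "col3": [-1, -2, -3, -4, -5, -6, -7, -8, -9, 0],
    }
)

# Named aggregation: output_column=(input_column, function).
# This mirrors dplyr's summarise(col2_agg = max(col2), col3_agg = min(col3)).
result = (
    data.groupby("col1")
    .agg(col2_agg=("col2", "max"), col3_agg=("col3", "min"))
    .reset_index()  # turn the group key back into an ordinary column
)

print(result)
```

Each keyword argument names one output column, so further summaries (for example a mean of `col2`) can be added as extra keyword arguments without changing the rest of the call.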

Using R & dplyr to summarize - group_by, count, mean, sd [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.

Good day and greetings! This is my first post on Stack Overflow. I am fairly new to R and even newer to dplyr. I have a small data set comprising 2 columns, var1 and var2. The var1 column consists of num values. The var2 column consists of factors with 3 levels: A, B, and C.

           var1 var2
    1 1.4395244    A
    2 1.7698225    A
    3 3.5587083    A
    4 2.0705084    A
    5 2.1292877    A
    6

2021-04-20 18:36:35    Category: Q&A    r   dplyr   summarize

How to interpret dplyr message `summarise()` regrouping output by 'x' (override with `.groups` argument)?

I started getting a new message (see post title) when running group_by and summarise() after updating to dplyr development version 0.8.99.9003. Here is an example to recreate the output:

    library(tidyverse)
    library(hablar)
    df <- read_csv("year, week, rat_house_females, rat_house_males, mouse_wild_females, mouse_wild_males
    2018,10,1,1,1,1
    2018,10,1,1,1,1
    2018,11,2,2,2,2
    2018,11,2,2,2,2
    2019,10,3,3,3,3
    2019,10,3,3,3,3
    2019,11,4,4,4,4
    2019,11,4,4,4,4") %>%
      convert(chr(year,week)) %>%
      mutate(total_rodents = rowSums(select_if(., is.numeric))) %>%
      convert(num(year,week)) %>%
      group_by(year,week) %>%

2021-03-26 09:54:46    Category: Q&A    r   dplyr   summarize

Define and apply custom bins on a dataframe

Using Python, I have created the following data frame, which contains similarity values:

      cosinFcolor cosinEdge cosinTexture histoFcolor histoEdge histoTexture    jaccard
    1       0.770     0.489        0.388  0.57500000 0.5845137    0.3920000 0.00000000
    2       0.067     0.496        0.912  0.13865546 0.6147309    0.6984127 0.00000000
    3       0.514     0.426        0.692  0.36440678 0.4787535    0.5198413 0.05882353
    4       0.102     0.430        0.739  0.11297071 0.5288008    0.5436508 0.00000000
    5       0.560     0.735        0.554  0.48148148 0.8168083    0.4603175 0.00000000
    6       0.029     0.302        0.558  0.08547009 0.3928234    0.4603175 0.00000000

I am trying to write an R script to generate another data frame that

2021-03-25 07:16:35    Category: Q&A    r   dataframe   binning   summarize
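The binning question above asks for an R script, and its excerpt is cut off before the full set of bins is given. Purely as an illustration of the idea in pandas (where the data frame was created), the sketch below uses `pd.cut`. The bin edges 0.5, 0.6, ..., 1.0 and the rule that values at or below 0.5 get bin 0 are assumptions made for this sketch, not details confirmed by the question; only two of the seven columns are reproduced.

```python
import pandas as pd

# Two columns from the question's data frame, for brevity.
df = pd.DataFrame(
    {
        "cosinFcolor": [0.770, 0.067, 0.514, 0.102, 0.560, 0.029],
        "cosinEdge": [0.489, 0.496, 0.426, 0.430, 0.735, 0.302],
    }
)

# ASSUMPTION: right-closed intervals (0.5, 0.6], (0.6, 0.7], ..., (0.9, 1.0].
# Only the first interval appears in the question's pseudocode.
edges = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]


def to_bins(col: pd.Series) -> pd.Series:
    # labels=False returns the interval index (0-based), NaN outside the edges.
    codes = pd.cut(col, bins=edges, labels=False, right=True)
    # Shift to 1-based bin numbers; ASSUMPTION: unbinned values become 0.
    return codes.add(1).fillna(0).astype(int)


binned = df.apply(to_bins)
print(binned)
```

Because `to_bins` works on a single column, the same function could be applied to the remaining similarity columns unchanged, or given per-column edges if the bins differ.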