
rolling-computation

Pandas DataFrame: How to do Set Union Aggregation over a rolling window

I have a DataFrame that contains sets of ids in one column and dates in another: import pandas as pd df = pd.DataFrame([['2018-01-01', {1, 2, 3}], ['2018-01-02', {3}], ['2018-01-03', {3, 4, 5}], ['2018-01-04', {5, 6}]], columns=['timestamp', 'ids']) df['timestamp'] = pd.to_datetime(df['timestamp']) df.set_index('timestamp', inplace=True) ids timestamp 2018-01-01 {1, 2, 3} 2018-01-02 {3} 2018-01-03 {3, 4, 5} 2018-01-04 {5, 6} What I am looking for is a function that can give me the ids for the last x days per day. So, assuming x=3, I'd want the result to be: ids timestamp 2018-01-01 {1, 2, 3}
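`rolling` only supports numeric aggregations, so a set union needs a manual window; a sketch of one approach (the `rolling_union` helper name is mine, not from the question):

```python
import pandas as pd

df = pd.DataFrame([['2018-01-01', {1, 2, 3}],
                   ['2018-01-02', {3}],
                   ['2018-01-03', {3, 4, 5}],
                   ['2018-01-04', {5, 6}]],
                  columns=['timestamp', 'ids'])
df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.set_index('timestamp')

def rolling_union(s, days):
    # Union of every set whose timestamp lies in the trailing
    # `days`-day window ending at each row (endpoints: (t - days, t]).
    out = []
    for ts in s.index:
        window = s[(s.index > ts - pd.Timedelta(days=days)) & (s.index <= ts)]
        out.append(set().union(*window))
    return pd.Series(out, index=s.index)

df['ids_3d'] = rolling_union(df['ids'], 3)
```

The loop is O(n²) in the worst case; for long series a two-pointer sweep over the sorted index would avoid re-scanning the whole series per row.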

2022-01-16 18:07:23    Category: Q&A    python   pandas   set   union   rolling-computation

Pandas - Count frequency of value for last x amount of days

Question: I'm getting some unexpected results. What I want to do is create a column that looks at an ID number and a date, and counts how many times that ID number has appeared in the past 7 days (I'd also like to make this dynamic for x days, but am just trying 7 for now). So given this dataframe: import pandas as pd df = pd.DataFrame( [['A', '2020-02-02 20:31:00'], ['A', '2020-02-03 00:52:00'], ['A', '2020-02-07 23:45:00'], ['A', '2020-02-08 13:19:00'], ['A', '2020-02-18 13:16:00'], ['A', '2020-02-27 12:16:00'], ['A', '2020-02-28 12:16:00'], ['B', '2020-02-07 18:57:00'], ['B', '2020-02-07 21:50:00'], ['B', '2020-02-12 19:03:00'], ['C', '2020-02-01 13:50:00'], ['C', '2020-02-11 15:50:00'], ['C', '2020-02-21 10:50:00']], columns = ['ID', 'Date']) The code that counts how many times each instance occurred in the last 7 days: df['Date'] = pd.to
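One way the 7-day count can be computed is a time-based rolling window per ID; a sketch (not the asker's code), which sums a dummy column of ones over a trailing `'7D'` window:

```python
import pandas as pd

df = pd.DataFrame(
    [['A', '2020-02-02 20:31:00'], ['A', '2020-02-03 00:52:00'],
     ['A', '2020-02-07 23:45:00'], ['A', '2020-02-08 13:19:00'],
     ['A', '2020-02-18 13:16:00'], ['A', '2020-02-27 12:16:00'],
     ['A', '2020-02-28 12:16:00'], ['B', '2020-02-07 18:57:00'],
     ['B', '2020-02-07 21:50:00'], ['B', '2020-02-12 19:03:00'],
     ['C', '2020-02-01 13:50:00'], ['C', '2020-02-11 15:50:00'],
     ['C', '2020-02-21 10:50:00']],
    columns=['ID', 'Date'])
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(['ID', 'Date']).reset_index(drop=True)

# Per ID, count rows in the 7 days up to and including each row's
# timestamp; sorting by ID then Date keeps the result row-aligned.
df['count_7d'] = (df.assign(n=1)
                    .set_index('Date')
                    .groupby('ID')['n']
                    .rolling('7D')
                    .sum()
                    .to_numpy())
```

Making it dynamic for x days is just `rolling(f'{x}D')`.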

2022-01-16 10:10:20    Category: Tech Share    python   pandas   datetime   pandas-groupby   rolling-computation

Apply rolling function on pandas dataframe with multiple arguments

Question: I am trying to apply a rolling function, with a 3 year window, on a pandas dataframe. import pandas as pd # Dummy data df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'], 'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018], 'IB': [2, 5, 8, 10, 7, 5, 10, 14], 'OB': [5, 8, 10, 12, 5, 10, 14, 20], 'Delta': [2, 2, 1, 3, -1, 3, 2, 4]}) # The function to be applied def get_ln_rate(ib, ob, delta): n_years = len(ib) return sum(delta)*np.log(ob[-1]/ib[0]) / (n_years * (ob[-1] - ib[0])) The expected output is Product Year IB OB Delta Ln_Rate 0 A 2015 2 5 2 1 A 2016 5 8 2 2 A 2017 8 10 1 0.3353 3 A 2018 10 12 3 0.2501 4 B 2015 7 5 -1 5 B 2016

2022-01-16 09:00:37    Category: Tech Share    python   pandas   pandas-groupby   rolling-computation

Pandas sum over a date range for each category separately

Question: I have a dataframe with time series of sales transactions for different items: import pandas as pd from datetime import timedelta df_1 = pd.DataFrame() df_2 = pd.DataFrame() df_3 = pd.DataFrame() # Create datetimes and data df_1['date'] = pd.date_range('1/1/2018', periods=5, freq='D') df_1['item'] = 1 df_1['sales']= 2 df_2['date'] = pd.date_range('1/1/2018', periods=5, freq='D') df_2['item'] = 2 df_2['sales']= 3 df_3['date'] = pd.date_range('1/1/2018', periods=5, freq='D') df_3['item'] = 3 df_3['sales']= 4 df = pd.concat([df_1, df_2, df_3]) df = df.sort_values(['item']) df The resulting dataframe: date item sales 0 2018-01-01 1 2 1 2018-01-02 1 2 2 2018-01-03

2022-01-16 06:48:43    Category: Tech Share    python   pandas   time-series   grouping   rolling-computation

Numpy version of rolling maximum in pandas

Question: TL;DR: my question is how to improve my function so it beats pandas' own rolling maximum? Background: I am working with a lot of moving averages, moving maxima, moving minima and so on, and so far the only moving-window-like functions I have found are the pandas.rolling methods. The problem is: the data I have are numpy arrays, and the final result has to be a numpy array as well. As much as I'd like to simply convert to a pandas Series and back to a numpy array to get the job done, like: result2_max = pd.Series(data_array).rolling(window).max().to_numpy() , it feels wrong, since the type conversions seem unnecessary and there should be a way to do the same thing purely in numpy. Yet, as simple as it looks, it is faster than anything I have come up with or seen online. I'll give the small benchmarks below: import numpy as np import pandas as pd def numpy_rolling_max(data, window): data = data[::-1] data_strides = data.strides[0] movin_window = np.lib.stride_tricks.as_strided(data, shape=(data.shape[0] - window +1, window)
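With NumPy ≥ 1.20, `sliding_window_view` gives a cleaner pure-numpy sketch than raw `as_strided` (the `np_rolling_max` name is mine). Note it is O(n·window) per call, whereas pandas' rolling max uses an O(n) monotonic-deque algorithm internally, which is why pandas is hard to beat for large windows:

```python
import numpy as np

def np_rolling_max(a, window):
    # Trailing-window maximum; the first window-1 entries are NaN so the
    # output lines up with pd.Series(a).rolling(window).max().
    view = np.lib.stride_tricks.sliding_window_view(a, window)
    out = np.full(a.shape, np.nan)
    out[window - 1:] = view.max(axis=1)
    return out
```

For float inputs and moderate windows this avoids any pandas round-trip while staying copy-free on the windowed view.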

2022-01-16 03:08:00    Category: Tech Share    python   pandas   performance   numpy   rolling-computation

Pandas Aggregate Method on RollingGroupby

Question: Does the .agg method work on a RollingGroupby object? It seems like it should, and IPython autocompletes the method, but I am getting an error. Documentation: I didn't see anything specific to RollingGroupby objects. I may be looking in the wrong place, but I checked the standard moving window functions and GroupBy. Sample data: # test data df = pd.DataFrame({ 'animal':np.random.choice( ['panda','python','shark'], 12), 'period':np.repeat(range(3), 4 ), 'value':np.tile(range(2), 6 ), }) # this works as expected df.groupby(['animal', 'period'])['value'].rolling(2).count() animal period panda 0 2 1.0 2 8 1.0 10 2.0 python 0 0 1.0 1 2.0 1 6 1.0 2 11 1.0 shark 0 3 1.0 1 4 1.0 5 2.0 7 2.0 2 9 1.0 Name: value, dtype: float64 # this works as expected df.groupby(['animal', 'period'])[
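In recent pandas versions, `.agg` does accept a list of aggregations on a column-selected RollingGroupby. A minimal sketch on seeded dummy data (not the asker's exact frame, which depends on their random state):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    'animal': np.random.choice(['panda', 'python', 'shark'], 12),
    'period': np.repeat(range(3), 4),
    'value': np.tile(range(2), 6),
})

# Select the column first, then roll within each group and pass
# multiple aggregations at once; one output row per input row.
res = (df.groupby(['animal', 'period'])['value']
         .rolling(2)
         .agg(['mean', 'sum']))
```

The result is indexed by (animal, period, original row) with one column per aggregation.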

2022-01-14 15:24:17    Category: Tech Share    python   pandas   numpy   aggregate   rolling-computation

R - Rolling sum of two columns in data.table

I have a data.table as follows - dt = data.table( date = seq(as.Date("2015-12-01"), as.Date("2015-12-10"), by="days"), v1 = c(seq(1, 9), 20), v2 = c(5, rep(NA, 9)) ) dt date v1 v2 1: 2015-12-01 1 5 2: 2015-12-02 2 NA 3: 2015-12-03 3 NA 4: 2015-12-04 4 NA 5: 2015-12-05 5 NA 6: 2015-12-06 6 NA 7: 2015-12-07 7 NA 8: 2015-12-08 8 NA 9: 2015-12-09 9 NA 10: 2015-12-10 20 NA Question 1: I want to add the current row value of v1 with the previous row value of v2 so the output looks like the following. date v1 v2 1: 2015-12-01 1 5 2: 2015-12-02 2 7 3: 2015-12-03 3 10 4: 2015-12-04 4 14 5: 2015-12-05 5
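Question 1 is a recurrence, v2[i] = v2[i-1] + v1[i], which unrolls into a cumulative sum plus a constant offset. Since the other examples on this page are pandas, here is the same idea as a Python sketch (an R data.table answer would use `cumsum` in exactly the same way):

```python
import pandas as pd

dt = pd.DataFrame({
    'v1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 20],
    'v2': [5.0] + [float('nan')] * 9,
})

# v2[i] = v2[i-1] + v1[i] unrolls to v2 = cumsum(v1) + (v2[0] - v1[0]),
# so no row-by-row loop is needed.
dt['v2'] = dt['v1'].cumsum() + (dt['v2'].iloc[0] - dt['v1'].iloc[0])
```

This matches the desired output in the question: 5, 7, 10, 14, ...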

2022-01-14 12:53:15    Category: Q&A    r   data.table   rolling-computation   rollapply

duplicating records between date gaps within a selected time interval in a PySpark dataframe

I have a PySpark dataframe that keeps track of changes that occur in a product's price and status over months. This means that a new row is created only when a change occurred (in either status or price) compared to the previous month, like in the dummy data below ---------------------------------------- |product_id| status | price| month | ---------------------------------------- |1 | available | 5 | 2019-10| ---------------------------------------- |1 | available | 8 | 2020-08| ---------------------------------------- |1 | limited | 8 | 2020-10| ---------------------------------------- |2 |
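The usual gap-filling idea is to reindex each product onto the full month grid between its first and last change and forward-fill. A Spark solution would typically build the grid with `sequence`/`explode` plus a last-non-null window, but the logic reads most clearly as a pandas sketch using the question's column names:

```python
import pandas as pd

df = pd.DataFrame({
    'product_id': [1, 1, 1],
    'status': ['available', 'available', 'limited'],
    'price': [5, 8, 8],
    'month': ['2019-10', '2020-08', '2020-10'],
})
df['month'] = pd.PeriodIndex(df['month'], freq='M')

# For each product, reindex onto every month between its first and
# last change, then forward-fill status and price into the gaps.
def fill_months(g):
    idx = pd.period_range(g['month'].min(), g['month'].max(), freq='M')
    return (g.set_index('month')
             .reindex(idx)
             .ffill()
             .rename_axis('month')
             .reset_index())

filled = (df.groupby('product_id', group_keys=False)
            .apply(fill_months))
```

Product 1 expands from 3 change rows to 13 monthly rows, with the 2019-10 price of 5 carried through 2020-07.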

2022-01-13 09:15:43    Category: Q&A    apache-spark   pyspark   pyspark-dataframes   rolling-computation

Apply rolling function on pandas dataframe with multiple arguments

I am trying to apply a rolling function, with a 3 year window, on a pandas dataframe. import pandas as pd # Dummy data df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'], 'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018], 'IB': [2, 5, 8, 10, 7, 5, 10, 14], 'OB': [5, 8, 10, 12, 5, 10, 14, 20], 'Delta': [2, 2, 1, 3, -1, 3, 2, 4]}) # The function to be applied def get_ln_rate(ib, ob, delta): n_years = len(ib) return sum(delta)*np.log(ob[-1]/ib[0]) / (n_years * (ob[-1] - ib[0])) The expected output is Product Year IB OB Delta Ln_Rate 0 A 2015 2 5 2 1 A 2016 5 8 2 2 A
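`Rolling.apply` only passes one column at a time, so a common workaround is to slide a window over each group by position and index back into all three columns. A sketch (the `rolling_multi` helper is mine) that reproduces the 0.3353 and 0.2501 values shown in the expected output:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})

def get_ln_rate(ib, ob, delta):
    n_years = len(ib)
    return sum(delta) * np.log(ob[-1] / ib[0]) / (n_years * (ob[-1] - ib[0]))

# Slide a fixed-size window over each product's rows by position and
# hand all three columns to the multi-argument function at once.
def rolling_multi(g, window=3):
    out = [np.nan] * len(g)
    for end in range(window, len(g) + 1):
        w = g.iloc[end - window:end]
        out[end - 1] = get_ln_rate(w['IB'].to_numpy(),
                                   w['OB'].to_numpy(),
                                   w['Delta'].to_numpy())
    return pd.Series(out, index=g.index)

df['Ln_Rate'] = df.groupby('Product', group_keys=False).apply(rolling_multi)
```

The first window-1 rows of each product stay NaN, matching the blanks in the expected output.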

2022-01-13 02:59:41    Category: Q&A    python   pandas   pandas-groupby   rolling-computation

Pandas sum over a date range for each category separately

I have a dataframe with timeseries of sales transactions for different items: import pandas as pd from datetime import timedelta df_1 = pd.DataFrame() df_2 = pd.DataFrame() df_3 = pd.DataFrame() # Create datetimes and data df_1['date'] = pd.date_range('1/1/2018', periods=5, freq='D') df_1['item'] = 1 df_1['sales']= 2 df_2['date'] = pd.date_range('1/1/2018', periods=5, freq='D') df_2['item'] = 2 df_2['sales']= 3 df_3['date'] = pd.date_range('1/1/2018', periods=5, freq='D') df_3['item'] = 3 df_3['sales']= 4 df = pd.concat([df_1, df_2, df_3]) df = df.sort_values(['item']) df Resulting dataframe
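A time-based rolling sum per item covers this directly. A sketch assuming a 3-day trailing window (the preview cuts off before the asker states their window size), with the frame construction condensed into a loop:

```python
import pandas as pd

# Rebuild the question's data: 5 daily rows per item with constant sales.
frames = []
for item, sales in [(1, 2), (2, 3), (3, 4)]:
    frames.append(pd.DataFrame({
        'date': pd.date_range('1/1/2018', periods=5, freq='D'),
        'item': item,
        'sales': sales,
    }))
df = pd.concat(frames).sort_values(['item', 'date'])

# 3-day trailing sum of sales, computed independently per item;
# sorting by item then date keeps the result row-aligned.
df['sales_3d'] = (df.set_index('date')
                    .groupby('item')['sales']
                    .rolling('3D')
                    .sum()
                    .to_numpy())
```

For item 1 (sales = 2 per day) the trailing sums ramp up as 2, 4, 6 and then plateau at 6.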

2022-01-12 12:09:31    Category: Q&A    python   pandas   time-series   grouping   rolling-computation