pandas-resample

Finding the average of unequal values per month and distributing it based on some conditions

I am currently struggling to convert my data into a useful dataset. I need to distribute payments evenly from the first month through the last month. The problem is that the payments are inconsistent and unequal. Also, some payments have been paid in full and should be distributed starting from the first payment over the applicable term, based on the agreement dataframe. My tables are the following:

1st table: payments

cust_id  agreement_id  date     payment
1        A             12/1/20  200
1        A             2/2/21   200
1        A             2/3/21   100
1        A             5/1/21   200
1        B             1/2/21   50
1        B             1/9/21   20
1        B             3/1/21   80
1        B             4/23/21  90
2        C             1/21/21  600
3        D             3

2022-01-13 02:01:29    Category: Q&A    python   r   pandas-groupby   pandas-resample
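
The excerpt above is cut off before the agreement table is shown, so the following is only a minimal sketch of the first requirement: spreading the total paid per agreement evenly over every calendar month from the first payment to the last. The sample rows for agreements A and B come from the excerpt (dates rewritten in ISO form). The name `spread` and the reading of "evenly" as total divided by month count are assumptions, and the fully-paid/term condition is not handled.

```python
import pandas as pd

# Sample rows for agreements A and B, copied from the excerpt.
payments = pd.DataFrame({
    "cust_id": [1, 1, 1, 1, 1, 1, 1, 1],
    "agreement_id": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "date": pd.to_datetime([
        "2020-12-01", "2021-02-02", "2021-02-03", "2021-05-01",
        "2021-01-02", "2021-01-09", "2021-03-01", "2021-04-23",
    ]),
    "payment": [200, 200, 100, 200, 50, 20, 80, 90],
})

rows = []
for (cust_id, agreement_id), group in payments.groupby(["cust_id", "agreement_id"]):
    # Every calendar month from the first payment to the last, inclusive.
    months = pd.period_range(group["date"].min(), group["date"].max(), freq="M")
    # Assumption: "evenly" means total paid divided by the number of months.
    per_month = group["payment"].sum() / len(months)
    for month in months:
        rows.append({
            "cust_id": cust_id,
            "agreement_id": agreement_id,
            "month": month,
            "payment": per_month,
        })

# Hypothetical result name: one row per agreement per month.
spread = pd.DataFrame(rows)
print(spread)
```

Building the month range per group means months with no payment at all (January, March and April 2021 for agreement A) still receive a share.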

Groupby and resample using forward and backward fill in window in Python

Question: I want to resample the data column using forward fill (ffill) and backward fill (bfill) at a frequency of 1min, while grouping df by the id column.

df:

   id  timestamp                data
1  1   2017-01-02 13:14:53.040  10.0
2  1   2017-01-02 16:04:43.240  11.0
...
4  2   2017-01-02 15:22:06.540  1.0
5  2   2017-01-03 13:55:34.240  2.0
...

I used:

    pd.DataFrame(df.set_index('timestamp').groupby('id', sort=True)['data'].resample('1min').ffill().bfill())

How can I add an additional condition so that the resampling happens within a window covering the past 10 days? The last timestamp reading would be now, and the first timestamp reading would be datetime.datetime.now() - pd.to_timedelta("10day"). The goal is for every id group to have the same number of readings.

Update: I tried:

    start = datetime.datetime.now() - pd.to_timedelta("10day")
    end = datetime.datetime.now()
    r = pd.to_datetime(pd.date_range(start

2021-10-31 01:01:04    Category: Tech sharing    python   pandas   numpy   datetime   pandas-resample
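
One possible way to give every id group the same number of readings is to reindex each group onto a shared 1-minute grid that spans the window, forward-filling and then back-filling. The sketch below is not taken from the original thread. It uses a fixed `end` timestamp so that it is reproducible, where the question would use the current time. The names `grid`, `fill_on_grid` and `out` are hypothetical, and the sample rows come from the excerpt.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2017-01-02 13:14:53.040", "2017-01-02 16:04:43.240",
        "2017-01-02 15:22:06.540", "2017-01-03 13:55:34.240",
    ]),
    "data": [10.0, 11.0, 1.0, 2.0],
})

# Assumption: a fixed window end for reproducibility; the question
# would use the current time here instead.
end = pd.Timestamp("2017-01-04 00:00")
start = end - pd.Timedelta(days=10)
grid = pd.date_range(start, end, freq="1min")  # shared by all groups

def fill_on_grid(series):
    # Forward fill onto the grid, then back-fill the leading gap.
    return series.sort_index().reindex(grid, method="ffill").bfill()

filled = {
    key: fill_on_grid(group.set_index("timestamp")["data"])
    for key, group in df.groupby("id")
}
out = pd.concat(filled, names=["id", "timestamp"])
print(out.groupby(level="id").size())
```

Because every group is reindexed onto the same `grid`, the group sizes are equal by construction, whatever the original timestamps were.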

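Can I dynamically choose the method applied on a pandas Resampler object?

Question: I am trying to create a function that resamples time series data in pandas. I would like to be able to specify the type of aggregation depending on the kind of data I am sending through (i.e. for some data, taking the sum of each bin is appropriate, while for others, taking the mean is needed, etc.).

For example, with data like these:

    import pandas as pd
    import numpy as np
    dr = pd.date_range('01-01-2020', '01-03-2020', freq='1H')
    df = pd.DataFrame(np.random.rand(len(dr)), index=dr)

I could have a function like this:

    def process(df, freq='3H', method='sum'):
        r = df.resample(freq)
        if method == 'sum':
            r = r.sum()
        elif method == 'mean':
            r = r.mean()
        #...
        #more options
        #...
        return r

For a small number of aggregation methods this is fine, but it seems tedious if I want to choose from all the possible options. I was hoping to use getattr to implement something like what is described in this post (under "Using it: generalizing method calls"). However, I cannot find a way to do this:

    def process2(df, freq='3H', method='sum'):
        r = df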

2021-10-30 09:52:01    Category: Tech sharing    python   pandas   getattr   pandas-resample
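
The excerpt stops before the getattr attempt is shown, so the following is only a sketch of how the idea could work, not the thread's accepted answer. `getattr` looks up the aggregation method on the Resampler by name and the result is then called. The frequency strings are written in lowercase (`'h'`), which is an adjustment from the `'H'` used in the excerpt.

```python
import numpy as np
import pandas as pd

def process(df, freq="3h", method="sum"):
    resampler = df.resample(freq)
    # Look up the aggregation by name (e.g. "sum", "mean", "max") and call it.
    # An unknown name raises AttributeError.
    return getattr(resampler, method)()

dr = pd.date_range("2020-01-01", "2020-01-03", freq="h")
df = pd.DataFrame(np.random.rand(len(dr)), index=dr)

total = process(df, method="sum")
average = process(df, method="mean")
print(total.head())
```

`Resampler.agg` also accepts a method name as a string, so `df.resample(freq).agg(method)` is an alternative that avoids `getattr` altogether.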

Groupby and resample using forward and backward fill in window in Python

I want to resample the data column using forward fill (ffill) and backward fill (bfill) at a frequency of 1min, while grouping df by the id column.

df:

   id  timestamp                data
1  1   2017-01-02 13:14:53.040  10.0
2  1   2017-01-02 16:04:43.240  11.0
...
4  2   2017-01-02 15:22:06.540  1.0
5  2   2017-01-03 13:55:34.240  2.0
...

I used:

    pd.DataFrame(df.set_index('timestamp').groupby('id', sort=True)['data'].resample('1min').ffill().bfill())

How can I add an additional condition, by resampling within the window of the past 10 days from now? So the last timestamp reading is now and the first timestamp reading is datetime.datetime

2021-10-24 17:13:05    Category: Q&A    python   pandas   numpy   datetime   pandas-resample

Can I dynamically choose the method applied on a pandas Resampler object?

I am trying to create a function which resamples time series data in pandas. I would like to have the option to specify the type of aggregation that occurs depending on what type of data I am sending through (i.e. for some data, taking the sum of each bin is appropriate, while for others, taking the mean is needed, etc.).

For example, with data like these:

    import pandas as pd
    import numpy as np
    dr = pd.date_range('01-01-2020', '01-03-2020', freq='1H')
    df = pd.DataFrame(np.random.rand(len(dr)), index=dr)

I could have a function like this:

    def process(df, freq='3H', method='sum'):
        r = df.resample(freq

2021-10-23 20:01:47    Category: Q&A    python   pandas   getattr   pandas-resample

What is the difference between bins when using groupby apply vs resample apply?

This is a somewhat broad topic, but I will try to pare it down to some specific questions. I have noticed a difference between resample and groupby that I am curious to learn about. Here is some hourly time series data:

    In[]:
    import pandas as pd
    dr = pd.date_range('01-01-2020 8:00', periods=10, freq='H')
    df = pd.DataFrame({'A':range(10), 'B':range(10,20), 'C':range(20,30)}, index=dr)
    df

    Out[]:
                         A   B   C
    2020-01-01 08:00:00  0  10  20
    2020-01-01 09:00:00  1  11  21
    2020-01-01 10:00:00  2  12  22
    2020-01-01 11:00:00  3  13  23
    2020-01-01 12:00:00  4  14  24
    2020-01-01 13:00:00  5  15  25
    2020-01-01 14:00:00  6  16  26

2021-10-22 03:43:33    Category: Q&A    python   pandas   pandas-groupby   downsampling   pandas-resample

Resampling boolean values in pandas

I have run into a property which I find peculiar about resampling Booleans in pandas. Here is some time series data:

    import pandas as pd
    import numpy as np
    dr = pd.date_range('01-01-2020 5:00', periods=10, freq='H')
    df = pd.DataFrame({'Bools':[True,True,False,False,False,True,True,np.nan,np.nan,False], "Nums":range(10)}, index=dr)

So the data look like:

                         Bools  Nums
    2020-01-01 05:00:00  True   0
    2020-01-01 06:00:00  True   1
    2020-01-01 07:00:00  False  2
    2020-01-01 08:00:00  False  3
    2020-01-01 09:00:00  False  4
    2020-01-01 10:00:00  True   5
    2020-01-01 11:00:00  True   6
    2020-01-01 12:00:00  NaN    7
    2020-01-01 13

2021-08-02 15:57:43    Category: Q&A    python   pandas   boolean   pandas-resample
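
The excerpt is cut off before the peculiar behaviour is described, so this sketch does not try to reproduce it. It only illustrates one hedged way to make Boolean resampling explicit: a column that mixes True/False with NaN is stored as object dtype, and casting it to float first (True becomes 1.0, False 0.0, NaN stays NaN) lets `mean` report the fraction of True readings per bin. The 3-hour bin width and the name `frac_true` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

dr = pd.date_range("2020-01-01 05:00", periods=10, freq="h")
df = pd.DataFrame(
    {
        "Bools": [True, True, False, False, False,
                  True, True, np.nan, np.nan, False],
        "Nums": range(10),
    },
    index=dr,
)

# Mixing booleans with NaN yields an object column, so cast explicitly:
# True -> 1.0, False -> 0.0, NaN stays NaN.
as_float = df["Bools"].astype(float)

# Fraction of True readings per 3-hour bin; NaN readings are skipped.
frac_true = as_float.resample("3h").mean()
print(frac_true)
```

With the default bin origin (midnight of the first day), the bins start at 03:00, 06:00, 09:00 and 12:00, and the last bin contains two NaN readings and one False.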

Resample Pandas Dataframe Without Filling in Missing Times

Resampling a dataframe can take the dataframe to either a higher or a lower temporal resolution. Most of the time this is used to go to a lower resolution (e.g. resampling 1-minute data to monthly values). When the dataset is sparse (for example, no data were collected in Feb-2020), the Feb-2020 row in the resampled dataframe will be filled with NaNs. The problem is that when the data record is long AND sparse, there are a lot of NaN rows, which makes the dataframe unnecessarily large and takes a lot of CPU time. For example, consider this dataframe and resample operation:

    import numpy as np
    import

2021-05-13 05:24:56    Category: Q&A    python   pandas   dataframe   pandas-resample
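
The example dataframe in the excerpt is cut off, so the sketch below uses a small hypothetical frame with data in January and March 2020 only. It contrasts `resample`, which emits a row for the empty February bin, with grouping on the index converted to monthly periods, which only produces groups that actually contain data. This is one possible approach and not necessarily the one given in the original thread.

```python
import pandas as pd

# Hypothetical sparse data: nothing was recorded in February 2020.
idx = pd.to_datetime(["2020-01-05", "2020-01-20", "2020-03-10"])
df = pd.DataFrame({"value": [1.0, 3.0, 5.0]}, index=idx)

# resample creates every monthly bin in the range, including empty ones.
dense = df.resample("MS").mean()

# Grouping by the month each row falls in only yields months with data.
sparse = df.groupby(df.index.to_period("M")).mean()

print(dense)
print(sparse)
```

Calling `dropna` on the resampled frame would give the same rows, but only after the full set of empty bins has been built, which is the cost the question is trying to avoid.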