
statistics

Smoothing Small Data Set With Second Order Quadratic Curve

I'm doing some specific signal analysis, and I am in need of a method that would smooth out a given bell-shaped distribution curve. A running average approach isn't producing the results I desire. I want to keep the min/max, and general shape of my fitted curve intact, but resolve the inconsistencies in sampling. In short: if given a set of data that models a simple quadratic curve, what statistical smoothing method would you recommend? If possible, please reference an implementation, library, or framework. Thanks SO! Edit: Some helpful data (A possible signal graph) The dark colored quadratic

2022-01-19 01:00:30    Category: Q&A    c++   c   statistics   signal-processing   quadratic
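
One possible approach to the smoothing question above, offered as a sketch rather than a recommendation from the original thread: if the samples are assumed to follow a single quadratic, an ordinary least-squares fit of y = a*x^2 + b*x + c smooths the sampling noise while keeping the overall shape and the position of the peak. The function name `fit_quadratic` and the test data are hypothetical. The sketch is written in plain Python so it stays self-contained; `numpy.polyfit(xs, ys, 2)` would normally do the same job in one call, and the same normal equations translate directly to C or C++.

```python
import random


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c)."""
    # Power sums needed by the 3x3 normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]                   # s[k] = sum of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]   # t[k] = sum of y*x^k
    # Augmented matrix with the unknowns ordered (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Hypothetical noisy samples of an inverted parabola with its peak at (0, 25).
rng = random.Random(0)
xs = [i / 2 for i in range(-10, 11)]
noisy = [-(x ** 2) + 25 + rng.gauss(0, 0.5) for x in xs]

a, b, c = fit_quadratic(xs, noisy)
smoothed = [a * x * x + b * x + c for x in xs]
peak_x = -b / (2 * a)              # location of the fitted extremum
peak_y = c - b * b / (4 * a)       # value of the fitted extremum
```

Because the fit is global, it will not follow local bumps the way a running average does; that is the intended trade-off when the underlying curve really is quadratic.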

how to sample from an upside down bell curve

I can generate numbers with uniform distribution by using the code below: runif(1,min=10,max=20) How can I sample randomly generated numbers that fall more frequently closer to the minimum and maximum boundaries? (Aka an "upside down bell curve")

2022-01-18 08:00:34    Category: Q&A    r   statistics
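
One way to get the "upside down bell curve" asked about above, sketched here under the assumption that any U-shaped density is acceptable: a Beta(s, s) distribution with s < 1 piles its mass near 0 and 1, so rescaling it to [min, max] favours the boundaries. The helper name `sample_u_shaped` and the default `shape=0.5` are illustrative choices, not part of the question. The sketch is in Python; in R the same idea would presumably be `10 + 10 * rbeta(1, 0.5, 0.5)`.

```python
import random


def sample_u_shaped(low, high, shape=0.5, rng=random):
    """Draw one value in [low, high] that favours the two ends.

    Beta(shape, shape) with shape < 1 has a U-shaped density on [0, 1];
    shape=0.5 is the arcsine distribution. Smaller values of shape push
    more mass toward the boundaries.
    """
    return low + (high - low) * rng.betavariate(shape, shape)


rng = random.Random(42)
draws = [sample_u_shaped(10, 20, rng=rng) for _ in range(10_000)]

# Compare how many draws land near the edges with how many land in the middle.
outer = sum(1 for d in draws if d < 12 or d > 18)
middle = sum(1 for d in draws if 14 <= d <= 16)
```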

How to random sample lognormal data in Python using the inverse CDF and specify target percentiles?

I'm trying to generate random samples from a lognormal distribution in Python, the application is for simulating network traffic. I'd like to generate samples such that:

- The modal sample result is 320 (~10^2.5)
- 80% of the samples lie within the range 100 to 1000 (10^2 to 10^3)

My strategy is to use the inverse CDF (or Smirnov transform I believe):

1. Use the PDF for a normal distribution centred around 2.5 to calculate the PDF for 10^x where x ~ N(2.5,sigma).
2. Calculate the CDF for the above distribution.
3. Generate random uniform data along the interval 0 to 1.
4. Use the inverse CDF to transform the

2022-01-18 01:51:07    Category: Q&A    python   random   statistics   probability-density   cdf
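
The strategy in the question above can be sketched with the standard library alone. This is a minimal illustration, assuming the target interval is symmetric around the centre in log10 space, so that sigma follows from the 90th-percentile z-score: sigma = (3 - 2.5) / z(0.90). The function name `make_log10_normal_sampler` is hypothetical. Note one caveat: with this construction 10^2.5 is the median of the samples and the mode of their log10 values; the mode on the linear scale is lower.

```python
import random
from statistics import NormalDist


def make_log10_normal_sampler(centre=2.5, upper=3.0, coverage=0.80):
    """Build an inverse-CDF sampler for 10**x with x ~ N(centre, sigma).

    sigma is chosen so that `coverage` of the x values fall within
    centre +/- (upper - centre). Returns (sample_function, sigma).
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2)   # about 1.2816 for 80 %
    sigma = (upper - centre) / z
    dist = NormalDist(centre, sigma)

    def sample(rng=random):
        u = rng.random()
        while u == 0.0:            # inv_cdf is only defined on the open interval (0, 1)
            u = rng.random()
        return 10 ** dist.inv_cdf(u)

    return sample, sigma


rng = random.Random(1)
sample, sigma = make_log10_normal_sampler()
traffic = [sample(rng) for _ in range(20_000)]
share_in_range = sum(1 for v in traffic if 100 <= v <= 1000) / len(traffic)
```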

Can one extend the functionality of PDF, CDF, FindDistributionParameters etc in Mathematica?

I've started doing more and more work with the new Mathematica statistics and data analysis features. I attended the "Statistics & Data Analysis with Mathematica" online seminar on Tuesday (great presentation, I highly recommend it) but I've run into some problems that I hope someone on this forum might have a few moments to consider. I've created a rather extensive notebook to streamline my data analysis, call it "AnalysisNotebook". It outputs an extensive series of charts and data including: histograms, PDF and CDF plots, Q-Q plots, plots to study tail fit, hypothesis test data, etc. This

2022-01-17 20:58:29    Category: Q&A    statistics   wolfram-mathematica   distribution

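c++ discrete distribution sampling with frequently changing probabilities

Question

Problem: I need to sample from a discrete distribution constructed of certain weights, e.g. {w1,w2,w3,..}, and thus probability distribution {p1,p2,p3,...}, where pi=wi/(w1+w2+...). Some of the wi's change very frequently, but only a very low proportion of all wi's. The distribution itself therefore has to be renormalised every time this happens, so I believe the Alias method does not work efficiently, because one would need to build the whole distribution from scratch every time. The method I am currently thinking of is a binary tree (heap method), where all wi's are saved in the lowest level, the sum of each pair is saved in the next level up, and so on. The sum of all of them sits at the top level, which is also the normalisation constant. Thus, updating the tree after a change in wi takes log(n) changes, and drawing a sample from the distribution takes the same number.

Questions:

Q1. Do you have a better idea of how to achieve this faster?

Q2. Most important part: I am looking for a library that already does this.

Explanation: I did this myself several years ago by building the heap structure in a vector, but since then I have learned many things, including discovering libraries (:)) and containers such as maps... Now I need to rewrite that code with more functionality, and this time I want to do it right. So:

Q2.1 Is there a nice way to make a c++ map ordered and searched not by index, but by the cumulative sum of its elements (this is how we sample, right?...). (This is my current theory of how I would like to do it, but it doesn't have to be this way...)

Q2.2 Maybe there is a better way to do the same thing?

I believe this problem is so frequent

2022-01-16 19:01:07    Category: Tech Sharing    c++   statistics   distribution   probability   sampling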
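
The binary-tree idea described in the question can be sketched as follows. This is an illustrative Python version, not the asker's C++ code and not a reference to any existing library; the class name `SumTree` and its methods are hypothetical. The tree lives in a flat list of length 2n: leaves hold the weights, each internal node holds the sum of its two children, and `tree[1]` is the normalisation constant. Both `update` and `sample` touch one node per level, so they cost O(log n).

```python
import random


class SumTree:
    """Flat binary tree: leaves are weights, internal nodes are child sums."""

    def __init__(self, weights):
        self.n = len(weights)
        # Index 0 is unused; internal nodes are 1..n-1; leaves are n..2n-1.
        self.tree = [0.0] * self.n + [float(w) for w in weights]
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def total(self):
        """Sum of all weights (the normalisation constant)."""
        return self.tree[1]

    def update(self, index, weight):
        """Set one weight and refresh the sums on the path to the root."""
        i = index + self.n
        self.tree[i] = float(weight)
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def sample(self, rng=random):
        """Return an index drawn with probability weight / total."""
        r = rng.random() * self.tree[1]
        i = 1
        while i < self.n:              # descend until a leaf is reached
            left = self.tree[2 * i]
            if r < left:
                i = 2 * i
            else:
                r -= left
                i = 2 * i + 1
        return i - self.n


tree = SumTree([1, 2, 3, 4])
tree.update(0, 0)                      # weight 0 can no longer be drawn
tree.update(2, 30)                     # index 2 now dominates
rng = random.Random(7)
counts = [0, 0, 0, 0]
for _ in range(10_000):
    counts[tree.sample(rng)] += 1
```

Recomputing each parent from its children, rather than adding a delta, keeps floating-point drift from accumulating over many updates. The same layout maps directly onto a `std::vector<double>` in C++.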

Goodness of fit test for Weibull distribution in python

I have some data that I have to test to see if it comes from a Weibull distribution with unknown parameters. In R I could use https://cran.r-project.org/web/packages/KScorrect/index.html but I can't find anything in Python. Using scipy.stats I can fit parameters with: scipy.stats.weibull_min.fit(values) However, in order to turn this into a test I think I need to perform some Monte-Carlo simulation (e.g. https://en.m.wikipedia.org/wiki/Lilliefors_test), and I am not sure what to do exactly. How can I make such a test in Python?

2022-01-16 17:33:48    Category: Q&A    python   scipy   statistics   weibull   openturns
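
A Lilliefors-style Monte-Carlo test of the kind the question hints at could look like the sketch below. It is one possible construction, not an established library routine: fit the Weibull parameters, compute the Kolmogorov-Smirnov statistic against the fitted distribution, then repeat both steps on data simulated from the fit to build the null distribution of the statistic. Several things here are assumptions: the location parameter is fixed at 0 (a two-parameter Weibull), the data are positive and not all identical, and the names `fit_weibull`, `ks_statistic`, and `weibull_gof_pvalue` are hypothetical. The fit is hand-written so the example needs only the standard library; `scipy.stats.weibull_min.fit(values, floc=0)` could replace it.

```python
import math
import random


def fit_weibull(xs):
    """Maximum-likelihood fit of a two-parameter Weibull; returns (shape, scale)."""
    top = max(xs)
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)
    scaled = [x / top for x in xs]     # keeps x**k from overflowing

    def score(k):
        # Likelihood equation for the shape parameter; increasing in k.
        w = [s ** k for s in scaled]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0:               # bracket the root
        hi *= 2
    for _ in range(60):                # bisection
        mid = (lo + hi) / 2
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    shape = (lo + hi) / 2
    scale = top * (sum(s ** shape for s in scaled) / len(xs)) ** (1 / shape)
    return shape, scale


def ks_statistic(xs, shape, scale):
    """Largest gap between the empirical CDF and the Weibull CDF."""
    ordered = sorted(xs)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered):
        cdf = 1.0 - math.exp(-((x / scale) ** shape))
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def weibull_gof_pvalue(xs, n_boot=200, rng=None):
    """Parametric-bootstrap p-value for 'xs comes from some Weibull'."""
    rng = rng or random.Random()
    shape, scale = fit_weibull(xs)
    observed = ks_statistic(xs, shape, scale)
    exceed = 0
    for _ in range(n_boot):
        # random.weibullvariate takes (scale, shape), in that order.
        sim = [rng.weibullvariate(scale, shape) for _ in xs]
        k, s = fit_weibull(sim)        # refit, because the parameters are estimated
        if ks_statistic(sim, k, s) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)
```

Refitting on every simulated sample is the step that distinguishes this from a plain KS test: the tabulated KS p-values assume the parameters were known in advance, which is not the case here.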

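Fit a mixture of von Mises distributions in R

Question

I have a set of angular data that I'd like to fit a mixture of two von Mises distributions to. As shown below, the data are clustered at about 0 and ±π, so having a periodic boundary is required for this case. I have tried using the movMF package to fit a distribution to these data but it seems that it is normalizing each row, and since this is a set of 1D data, the result is a vector of ±1. How are others fitting a mixture of distributions like this in R?

Answer 1

The problem lies in passing a vector of angles as the input to the movMF function. Instead, the angles must be converted to points on the unit circle:

    pts_on_unit_circle <- cbind(cos(angle_in_degrees * pi / 180), sin(angle_in_degrees * pi / 180))
    d <- movMF(pts_on_unit_circle, number_of_mixed_vM_fxns)
    mu <- atan2(d$theta[,2], d$theta[,1])
    kappa <- sqrt(rowSums(d$theta^2))

Source: correspondence with Kurt Hornik, author of the movMF package.

2022-01-16 17:05:48    Category: Tech Sharing    r   statistics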
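
The two conversions in the answer above can be illustrated on their own. The sketch below is in Python rather than R and does not call movMF; it only shows the geometry, assuming (as the answer's formulas imply) that each row of `d$theta` is the mean direction on the unit circle scaled by the concentration kappa. The function names are hypothetical.

```python
import math


def angle_to_unit_vector(angle_deg):
    """Map an angle in degrees to a point (cos, sin) on the unit circle."""
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))


def theta_to_mu_kappa(theta):
    """Split theta = kappa * (cos mu, sin mu) into (mu, kappa).

    mu is returned in radians in (-pi, pi]; kappa is the vector length.
    """
    x, y = theta
    return math.atan2(y, x), math.hypot(x, y)


# Round trip for a component centred at 180 degrees with concentration 4.
direction = angle_to_unit_vector(180)
theta = (4 * direction[0], 4 * direction[1])
mu, kappa = theta_to_mu_kappa(theta)
```

Working with points instead of raw angles is what handles the periodic boundary: 179 degrees and -179 degrees map to neighbouring points on the circle even though the numbers are far apart.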

c++ discrete distribution sampling with frequently changing probabilities

Problem: I need to sample from a discrete distribution constructed of certain weights, e.g. {w1,w2,w3,..}, and thus probability distribution {p1,p2,p3,...}, where pi=wi/(w1+w2+...). Some of the wi's change very frequently, but only a very low proportion of all wi's. The distribution itself therefore has to be renormalised every time this happens, so I believe the Alias method does not work efficiently, because one would need to build the whole distribution from scratch every time. The method I am currently thinking of is a binary tree (heap method), where all wi's are saved in the lowest level, and

2022-01-16 16:05:52    Category: Q&A    c++   statistics   distribution   probability   sampling

Fit a mixture of von Mises distributions in R

I have a set of angular data that I'd like to fit a mixture of two von Mises distributions to. As shown below, the data are clustered at about 0 and ±π, so having a periodic boundary is required for this case. I have tried using the movMF package to fit a distribution to these data but it seems that it is normalizing each row, and since this is a set of 1D data, the result is a vector of ±1. How are others fitting a mixture of distributions like this in R?

2022-01-16 13:38:21    Category: Q&A    r   statistics

Django & Postgres - percentile (median) and group by

Question

I need to calculate the period median for each seller ID (see the simplified model below). The problem is that I am unable to construct the ORM query.

Model

    class MyModel:
        period = models.IntegerField(null=True, default=None)
        seller_ids = ArrayField(models.IntegerField(), default=list)
        aux = JSONField(default=dict)

Query

    queryset = (
        MyModel.objects.filter(period=25)
        .annotate(seller_id=Func(F("seller_ids"), function="unnest"))
        .values("seller_id")
        .annotate(
            duration=Cast(KeyTextTransform("duration", "aux"), IntegerField()),
            median=Func(
                F("duration"),
                function="percentile_cont",
                template="%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)",
            ),
        )
        .values("median", "seller_id")
    )

2022-01-16 09:07:43    Category: Tech Sharing    python   django   postgresql   statistics   subquery