Is there a reasonably fast way to do
np.percentile(ndarr, axis=0) on data containing NaN values?
For np.median there is the corresponding bottleneck.nanmedian (https://pypi.python.org/pypi/Bottleneck), which is pretty good.
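For example, a quick sketch of how nanmedian behaves (the array here is just illustrative):

    import numpy as np
    import bottleneck as bn

    a = np.array([[1.0, np.nan, 3.0],
                  [4.0, 5.0, np.nan]])
    print bn.nanmedian(a, axis=0)   # NaNs ignored per column -> [ 2.5  5.   3. ]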
The best I've come up with for percentile, which is incomplete and presently incorrect, is:
    from bottleneck import nanrankdata, nanmax, nanargmin

    def nanpercentile(x, q, axis):
        ranks = nanrankdata(x, axis=axis)        # NaN-aware ranks along axis
        peak = nanmax(ranks, axis=axis)          # highest rank in each slice
        pct = ranks / peak * 100.                # convert ranks to percentiles
        wh = nanargmin(abs(pct - q), axis=axis)  # index of nearest percentile
        return x[wh]                             # wrong: indexes whole rows, not per-slice elements
This doesn't work; really what is needed is some way to take the n'th element along the
axis, but I haven't found the numpy slicing trick to do that.
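To spell out what "take the n'th element along the axis" should do, here is the operation written as the loop I want to avoid (the array x3 and the index array n are hypothetical, using the last axis):

    import numpy as np

    # Desired semantics: out[i, j] == x3[i, j, n[i, j]], i.e. each slice
    # along the last axis contributes the single element named by n.
    x3 = np.arange(12.).reshape(2, 2, 3)
    n = np.array([[2, 0], [1, 2]])   # per-slice index along the last axis
    out = np.empty(x3.shape[:-1])
    for i in range(x3.shape[0]):
        for j in range(x3.shape[1]):
            out[i, j] = x3[i, j, n[i, j]]
    print out
    # array([[  2.,   3.],
    #        [  7.,  11.]])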
"Reasonably fast" means better than looping over indices, e.g.:
    import numpy as np

    q = 40
    x = np.array([[[1, 2, 3], [6, np.nan, 4]],
                  [[0.5, 2, 1], [9, 3, np.nan]]])
    out = np.empty(x.shape[:-1])
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            d = x[i, j, :]
            out[i, j] = np.percentile(d[np.isfinite(d)], q)
    print out
    # array([[ 1.8,  4.8],
    #        [ 0.9,  5.4]])
which works but can be exceedingly slow.
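np.apply_along_axis tidies the loop up, but as far as I can tell it still iterates over the slices in Python, so it is no real speedup (a sketch under that assumption):

    import numpy as np

    def nanpercentile_slow(a, q, axis):
        # Same per-slice work as the explicit loop above; apply_along_axis
        # loops in Python internally, so this is tidier, not faster.
        return np.apply_along_axis(
            lambda d: np.percentile(d[np.isfinite(d)], q), axis, a)

    print nanpercentile_slow(x, q, axis=2)   # same result as the loop above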
np.ma appears not to work as expected; it treats the nan value as if it were inf:
    xm = np.ma.masked_where(np.isnan(x), x)
    print np.percentile(xm, 40, axis=2)
    # array([[ 1.8,  5.6],
    #        [ 0.9,  7.8]])
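As a check on that interpretation, substituting inf for the NaNs directly reproduces the masked result:

    xinf = np.where(np.isnan(x), np.inf, x)
    print np.percentile(xinf, 40, axis=2)
    # array([[ 1.8,  5.6],
    #        [ 0.9,  7.8]])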