Data smoothing (math question)

Discussion in 'Physics & Math' started by pilpaX, Mar 9, 2011.

  1. pilpaX amateur-science.com Registered Senior Member

    Messages:
    239
    Hello,

    I'm looking for some method to smooth data. I have talked to the math wizards at my local university and googled the interwebs. So far I have found (almost) nothing robust enough for my data. I have also looked into the Python packages scipy and numpy, and into R.

    What I am looking for is something like a moving average, but a moving average only works correctly if \(X_n - X_{n-1} = \text{const}\), which is not always the case. Basically I'm looking for a method similar to a moving average, but one that takes into account that the sample spacing on the X axis might not be constant.

    For example, if one measures temperature every day for a year, one can smooth the data using a 10-day moving average. But if you measure temperature irregularly (some days twice, skipping other days), then the moving average will fail.

    Thank you.
     
  3. siphra Registered Senior Member

    Messages:
    344
    Honestly, if the data-smoothing tools in programs like Excel or some of the more complex packages won't do what you want, and if you can't find a method that is 'robust' enough, your data may not be presentable in that way.

    Even Excel can do polynomial smoothing.

    And your example is a bad one; at best, it shows poor data-gathering skills.
     
  5. Absane Rocket Surgeon Valued Senior Member

    Messages:
    8,989
    Have you considered polynomial interpolation? Though there's no guarantee that, given n points and the interpolating polynomial f through them, \(f(a_{n+1})\) will be anywhere near the actual value.

    Another problem is that if you have A LOT of points, your polynomial might not be of any practical interest, since people like to keep things simple.

    If you're sufficiently skilled, you might be able to look at the graphed data and make an educated guess at what sort of function you need. Maybe a/ln(b*x), e^(a*x), a/(x+b) - c*sin(x^3), etc., and just fit the constants a, b, c, .... Though it takes some experience, or a lot of time on your hands.

    Another possibility is to accept that the data sucks.
     
  7. RJBeery Natural Philosopher Valued Senior Member

    Messages:
    4,222
    lol!
     
  8. przyk squishy Valued Senior Member

    Messages:
    3,203
    Try taking the moving average over both the X and Y axes? E.g. if you want to average over sets of 10 consecutive points:
    \( \begin{align} \tilde{Y}_{n} \,&=\, \frac{Y_{n} \,+\, Y_{n+1} \,+\, \ldots \,+\, Y_{n+9}}{10} \\ \tilde{X}_{n} \,&=\, \frac{X_{n} \,+\, X_{n+1} \,+\, \ldots \,+\, X_{n+9}}{10} \,, \end{align} \)​
    and plot the \(\tilde{Y}_{n}\)'s vs. the \(\tilde{X}_{n}\)'s, though I don't know about calling a method like this "robust".
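    A minimal sketch of this in Python/NumPy (the window length of 10 matches the example above but is otherwise arbitrary; `x` and `y` are assumed to be arrays sorted by X):

    ```python
    import numpy as np

    def xy_moving_average(x, y, window=10):
        """Smooth irregularly spaced data by averaging both the X and Y
        coordinates over `window` consecutive points."""
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        kernel = np.ones(window) / window
        # mode="valid" keeps only windows that lie fully inside the data
        x_s = np.convolve(x, kernel, mode="valid")
        y_s = np.convolve(y, kernel, mode="valid")
        return x_s, y_s
    ```

    Plotting `y_s` against `x_s` then gives the smoothed curve.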
     
  9. rpenner Fully Wired Valued Senior Member

    Messages:
    4,833
    Since, for weather, the 4-hour variability can easily exceed the 24-hour variability, I'm not sure that smoothing irregularly sampled data is going to tell you more about the weather than about your sampling practice.
    However, it looks from first principles like you could adapt an Infinite Impulse Response (IIR) filter to irregularly sampled data.

    Let \(\tau\) be your smoothing constant with units of time.
    Let \(t_i\) be the times of your data collection.
    Let \(x_i\) be the values of your data.
    Let \(y_0 = x_0\) to help smooth the initial transient.

    Then let \(y_i = ( 1 - \alpha_i ) x_i + \alpha_i y_{i-1}\) where \(\alpha_i = e^{- \frac{t_i - t_{i-1}}{\tau}}\).

    In the case of uniformly sampled data with period \(\Delta t\) this is an exponentially weighted moving average (EWMA) \(y_i = ( 1 - e^{- \frac{\Delta t}{\tau}} ) x_i + e^{- \frac{\Delta t}{\tau}} y_{i-1}\), and in the limit of a small ratio \(\frac{\Delta t}{\tau} \), this is \(y_i = \frac{\Delta t}{\tau} x_i + (1 - \frac{\Delta t}{\tau}) y_{i-1}\).
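    The recursion above translates directly into code; a sketch in Python (assuming `t` is sorted ascending and `tau > 0`):

    ```python
    import numpy as np

    def irregular_ewma(t, x, tau):
        """Exponential moving average adapted to irregular sampling:
        y_i = (1 - a_i) x_i + a_i y_{i-1},  a_i = exp(-(t_i - t_{i-1}) / tau)."""
        t = np.asarray(t, dtype=float)
        x = np.asarray(x, dtype=float)
        y = np.empty_like(x)
        y[0] = x[0]  # y_0 = x_0 damps the initial transient
        for i in range(1, len(x)):
            a = np.exp(-(t[i] - t[i - 1]) / tau)
            y[i] = (1.0 - a) * x[i] + a * y[i - 1]
        return y
    ```

    A large gap \(t_i - t_{i-1}\) makes \(\alpha_i\) small, so after a long silence the filter mostly trusts the new sample, which is the behaviour you want for irregular data.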

    But statistically, this is just meaningless manipulation of your data and may not reflect the system you were measuring, especially when the signal has power at frequencies higher than half your average sample rate. It's only suitable for "guiding the eye" when you think a plain linear fit would not suit you.

    I think using AJ Johnson's non-uniform discrete Fourier transform, filtering out the high-frequency components and then reconstructing the time series should work (in the absence of high-frequency content missed by the sampling).

    Here's a pointer to it for MATLAB.
    http://www.mathworks.com/matlabcentral/newsreader/view_original/765504
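    A rough sketch of the same idea in Python, though not Johnson's code: least-squares fit a small set of low-frequency sinusoids to the irregular samples, then evaluate the fit. The cutoff `f_max` and the number of frequencies are arbitrary choices:

    ```python
    import numpy as np

    def lowpass_trig_fit(t, x, f_max, n_freqs=5):
        """Fit sinusoids with frequencies up to f_max to irregularly
        sampled data by least squares; return the reconstruction at
        the sample times.  High frequencies are simply never fitted."""
        t = np.asarray(t, dtype=float)
        x = np.asarray(x, dtype=float)
        freqs = np.linspace(0.0, f_max, n_freqs)
        # Design matrix: a constant column plus a cos/sin pair per frequency
        cols = [np.ones_like(t)]
        for f in freqs[1:]:
            cols.append(np.cos(2 * np.pi * f * t))
            cols.append(np.sin(2 * np.pi * f * t))
        A = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)
        return A @ coef
    ```

    Because the basis contains only low frequencies, the fit acts as a low-pass reconstruction without ever needing uniformly spaced samples.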
     
  10. rpenner Fully Wired Valued Senior Member

    Messages:
    4,833
    It's more "robust" if you use median rather than mean.
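    Applied to przyk's two-axis scheme, that amounts to replacing the window mean with a window median; a hedged sketch:

    ```python
    import numpy as np

    def xy_moving_median(x, y, window=10):
        """Like a moving average over both axes, but using the median,
        which is far less sensitive to outliers than the mean."""
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        n = len(x) - window + 1
        x_s = np.array([np.median(x[i:i + window]) for i in range(n)])
        y_s = np.array([np.median(y[i:i + window]) for i in range(n)])
        return x_s, y_s
    ```

    A single wild outlier in a window shifts the mean but leaves the median essentially untouched.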
     
  11. Dinosaur Rational Skeptic Valued Senior Member

    Messages:
    4,885
    Have you plotted the data points?

    This is usually a good idea prior to trying to do curve fitting. Sometimes a plot of the data indicates that there is no reasonable function that will approximate it.

    If the appearance of the point plot is encouraging, you might try a spline fit (use Google). If my memory is not way off base, a spline fit uses least-squares approximation methods but does not attempt to fit all the points with one function.

    It chooses sets of points and develops a different approximating function for each set. Using this method, data conforming to an exponential (or some more esoteric) function can be closely approximated using third- or fourth-order polynomials. Note that an exponential function cannot be approximated over a large range by a single third- or fourth-order polynomial, but a spline fit will do a good job.
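    In Python this kind of fit is available as a smoothing spline, e.g. `scipy.interpolate.UnivariateSpline`, which handles irregular X spacing and does not force the curve through every point. The data and the smoothing factor `s` below are made up for illustration:

    ```python
    import numpy as np
    from scipy.interpolate import UnivariateSpline

    # Irregularly spaced, noisy samples of a smooth trend
    rng = np.random.default_rng(1)
    t = np.sort(rng.uniform(0.0, 10.0, 80))
    x = np.sin(t) + rng.normal(0.0, 0.2, t.size)

    # s controls the fit/smoothness trade-off; s=0 would interpolate exactly
    spline = UnivariateSpline(t, x, k=3, s=len(t) * 0.2**2)
    smooth = spline(t)
    ```

    A common rule of thumb is to set `s` near the number of points times the noise variance, which is what the expression above does.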
     
  12. pilpaX amateur-science.com Registered Senior Member

    Messages:
    239
    Thank you for the responses; I'll try out rpenner's and przyk's methods.

    I already have a script for cubic-spline interpolation, but it passes through all the original data points; that's why I (sometimes) need to smooth the data before splining.

    I found that OpenOffice Calc has cubic- and B-spline line-smoothing options, and it seems that the B-spline could be usable for smoothing out noise, and robust enough when the data is irregularly spaced on the X axis.

    [Image: b-spline in OO Calc]
     
    Last edited: Mar 10, 2011
  13. rpenner Fully Wired Valued Senior Member

    Messages:
    4,833
    This doesn't look like data where you need to draw a line at all, unless you have a model that you are comparing to the data.
     

Share This Page