algorithm - Peak detection of measured signal


Translate

We use a data acquisition card to take readings from a device that increases its signal to a peak and then falls back to near the original value. To find the peak value we currently search the array for the highest reading and use the index to determine the timing of the peak value which is used in our calculations.

This works well if the highest value is the peak we are looking for but if the device is not working correctly we can see a second peak which can be higher than the initial peak. We take 10 readings a second from 16 devices over a 90 second period.

My initial thoughts are to cycle through the readings checking to see if the previous and next points are less than the current to find a peak and construct an array of peaks. Maybe we should be looking at a average of a number of points either side of the current position to allow for noise in the system. Is this the best way to proceed or are there better techniques?


We do use LabVIEW and I have checked the LAVA forums and there are a number of interesting examples. This is part of our test software and we are trying to avoid using too many non-standard VI libraries so I was hoping for feedback on the process/algorithms involved rather than specific code.


Toutes les réponses
  • Translate

    You could try signal averaging, i.e. for each point, average the value with the surrounding 3 or more points. If the noise blips are huge, then even this may not help.

    I realise that this was language agnostic, but guessing that you are using LabView, there are lots of pre-packaged signal processing VIs that come with LabView that you can use to do smoothing and noise reduction. The NI forums are a great place to get more specialised help on this sort of thing.


  • Translate

    There are lots and lots of classic peak detection methods, any of which might work. You'll have to see what, in particular, bounds the quality of your data. Here are basic descriptions:

    1. Between any two points in your data, (x(0), y(0)) and (x(n), y(n)), add up y(i + 1) - y(i) for 0 <= i < n and call this T ("travel") and set R ("rise") to y(n) - y(0) + k for suitably small k. T/R > 1 indicates a peak. This works OK if large travel due to noise is unlikely or if noise distributes symmetrically around a base curve shape. For your application, accept the earliest peak with a score above a given threshold, or analyze the curve of travel per rise values for more interesting properties.

    2. Use matched filters to score similarity to a standard peak shape (essentially, use a normalized dot-product against some shape to get a cosine-metric of similarity)

    3. Deconvolve against a standard peak shape and check for high values (though I often find 2 to be less sensitive to noise for simple instrumentation output).

    4. Smooth the data and check for triplets of equally spaced points where, if x0 < x1 < x2, y1 > 0.5 * (y0 + y2), or check Euclidean distances like this: D((x0, y0), (x1, y1)) + D((x1, y1), (x2, y2)) > D((x0, y0),(x2, y2)), which relies on the triangle inequality. Using simple ratios will again provide you a scoring mechanism.

    5. Fit a very simple 2-gaussian mixture model to your data (for example, Numerical Recipes has a nice ready-made chunk of code). Take the earlier peak. This will deal correctly with overlapping peaks.

    6. Find the best match in the data to a simple Gaussian, Cauchy, Poisson, or what-have-you curve. Evaluate this curve over a broad range and subtract it from a copy of the data after noting it's peak location. Repeat. Take the earliest peak whose model parameters (standard deviation probably, but some applications might care about kurtosis or other features) meet some criterion. Watch out for artifacts left behind when peaks are subtracted from the data. Best match might be determined by the kind of match scoring suggested in #2 above.

    I've done what you're doing before: finding peaks in DNA sequence data, finding peaks in derivatives estimated from measured curves, and finding peaks in histograms.

    I encourage you to attend carefully to proper baselining. Wiener filtering or other filtering or simple histogram analysis is often an easy way to baseline in the presence of noise.

    Finally, if your data is typically noisy and you're getting data off the card as unreferenced single-ended output (or even referenced, just not differential), and if you're averaging lots of observations into each data point, try sorting those observations and throwing away the first and last quartile and averaging what remains. There are a host of such outlier elimination tactics that can be really useful.


  • Translate

    This problem has been studied in some detail.

    There are a set of very up-to-date implementations in the TSpectrum* classes of ROOT (a nuclear/particle physics analysis tool). The code works in one- to three-dimensional data.

    The ROOT source code is available, so you can grab this implementation if you want.

    From the TSpectrum class documentation:

    The algorithms used in this class have been published in the following references:

    [1] M.Morhac et al.: Background elimination methods for multidimensional coincidence gamma-ray spectra. Nuclear Instruments and Methods in Physics Research A 401 (1997) 113- 132.

    [2] M.Morhac et al.: Efficient one- and two-dimensional Gold deconvolution and its application to gamma-ray spectra decomposition. Nuclear Instruments and Methods in Physics Research A 401 (1997) 385-408.

    [3] M.Morhac et al.: Identification of peaks in multidimensional coincidence gamma-ray spectra. Nuclear Instruments and Methods in Research Physics A 443(2000), 108-125.

    The papers are linked from the class documentation for those of you who don't have a NIM online subscription.


    The short version of what is done is that the histogram flattened to eliminate noise, and then local maxima are detected by brute force in the flattened histogram.


  • Translate

    I would like to contribute to this thread an algorithm that I have developed myself:

    It is based on the principle of dispersion: if a new datapoint is a given x number of standard deviations away from some moving mean, the algorithm signals (also called z-score). The algorithm is very robust because it constructs a separate moving mean and deviation, such that signals do not corrupt the threshold. Future signals are therefore identified with approximately the same accuracy, regardless of the amount of previous signals. The algorithm takes 3 inputs: lag = the lag of the moving window, threshold = the z-score at which the algorithm signals and influence = the influence (between 0 and 1) of new signals on the mean and standard deviation. For example, a lag of 5 will use the last 5 observations to smooth the data. A threshold of 3.5 will signal if a datapoint is 3.5 standard deviations away from the moving mean. And an influence of 0.5 gives signals half of the influence that normal datapoints have. Likewise, an influence of 0 ignores signals completely for recalculating the new threshold: an influence of 0 is therefore the most robust option.

    It works as follows:

    Pseudocode

    # Let y be a vector of timeseries data of at least length lag+2
    # Let mean() be a function that calculates the mean
    # Let std() be a function that calculates the standard deviaton
    # Let absolute() be the absolute value function
    
    # Settings (the ones below are examples: choose what is best for your data)
    set lag to 5;          # lag 5 for the smoothing functions
    set threshold to 3.5;  # 3.5 standard deviations for signal
    set influence to 0.5;  # between 0 and 1, where 1 is normal influence, 0.5 is half
    
    # Initialise variables
    set signals to vector 0,...,0 of length of y;   # Initialise signal results
    set filteredY to y(1,...,lag)                   # Initialise filtered series
    set avgFilter to null;                          # Initialise average filter
    set stdFilter to null;                          # Initialise std. filter
    set avgFilter(lag) to mean(y(1,...,lag));       # Initialise first value
    set stdFilter(lag) to std(y(1,...,lag));        # Initialise first value
    
    for i=lag+1,...,t do
      if absolute(y(i) - avgFilter(i-1)) > threshold*stdFilter(i-1) then
        if y(i) > avgFilter(i-1)
          set signals(i) to +1;                     # Positive signal
        else
          set signals(i) to -1;                     # Negative signal
        end
        # Adjust the filters
        set filteredY(i) to influence*y(i) + (1-influence)*filteredY(i-1);
        set avgFilter(i) to mean(filteredY(i-lag,i),lag);
        set stdFilter(i) to std(filteredY(i-lag,i),lag);
      else
        set signals(i) to 0;                        # No signal
        # Adjust the filters
        set filteredY(i) to y(i);
        set avgFilter(i) to mean(filteredY(i-lag,i),lag);
        set stdFilter(i) to std(filteredY(i-lag,i),lag);
      end
    end
    

    Demo

    Demonstration of robust thresholding algorithm

    > Original answer


  • Translate

    This method is basically from David Marr's book "Vision"

    Gaussian blur your signal with the expected width of your peaks. this gets rid of noise spikes and your phase data is undamaged.

    Then edge detect (LOG will do)

    Then your edges were the edges of features (like peaks). look between edges for peaks, sort peaks by size, and you're done.

    I have used variations on this and they work very well.


  • Translate

    I think you want to cross-correlate your signal with an expected, exemplar signal. But, it has been such a long time since I studied signal processing and even then I didn't take much notice.


  • Translate

    I don't know very much about instrumentation, so this might be totally impractical, but then again it might be a helpful different direction. If you know how the readings can fail, and there is a certain interval between peaks given such failures, why not do gradient descent at each interval. If the descent brings you back to an area you've searched before, you can abandon it. Depending upon the shape of the sampled surface, this also might help you find peaks faster than search.


  • Translate

    Is there a qualitative difference between the desired peak and the unwanted second peak? If both peaks are "sharp" -- i.e. short in time duration -- when looking at the signal in the frequency domain (by doing FFT) you'll get energy at most bands. But if the "good" peak reliably has energy present at frequencies not existing in the "bad" peak, or vice versa, you may be able to automatically differentiate them that way.


  • Translate

    You could apply some Standard Deviation to your logic and take notice of peaks over x%.