My own War Story - How I got stuck between frequency peaks

kuruma · March 20, 2014, 3:49am

Those of you who have read or at least skimmed your eyes over the famous Steven Skienna book: “The Algorithm Design Manual” will possibly recognize the title.

A War Story in that book, for those of you who are unaware of it, is a short story describing how Skienna and some of his students solve real-life company or corporation problems using standard algorithms with some twists and tricks that we all use here in Long Contests…

Today (and I assume that’s a good thing!) I have experienced my first War Story… While I was having breakfast, my dad told me about a problem he and some of his thesis students are facing… So, he came to me for help…

He and all of his students are Civil Engineers who work on a more theoretical and research field if you like to call it like that, and, according to my dad, he and his student are studying vibration data of a large portuguese dam, such that for a given time period they have the amplitude of the oscillations of the dam over time, so they have a function which is:

f(t) -> returns the amplitude of dam's oscillations at time t.

Now they are interested in studying the times when the dam oscillates the most according to some specified parameters (for example, they might want all the oscillations which are evenly spaced in time, or, they might want all the “interesting” oscillations that occur above a given cutoff oscillation value, etc, etc).

The main issue they are facing, is that usually the graph they can plot is obtained from experimental data and, as such, some values are always affected by some error, or, in Signal Processing theory, the graph they plot is full of “noise”, i.e., very small ambient-dependent oscillations they might want to ignore or discard, since they label them as “not-interesting”.

An example of a graph they might have is presented below (this is an illustration graph taken from wikipedia, ignore the axis labels):

![alt text][1]

For instance, and by inspection they are only interested in the 4 highest peaks that are clearly seen in the picture (the 2 neighbouring ones of the highest one and the other one slightly on the left).

So, my challenge was given the pairs <t, f(t)> I needed to devise a scheme to find all these “interesting peaks”.

I immediately had some ideas:

“You can use the derivative of the function and devise a sort of “hill climbing” algorithm where you only consider some peaks where the derivative in a very small neighbouring area is above a given value.”

“Good idea, but that doesn’t exclude the white noise points in the graph.”

Point taken, I thought… The noise was quite troublesome… So, I switched approach.

“How about doing sqrt decomposition? You divide your domain in sqrt(Domain Size) parts and successfully solve recursively for each part and merge the maximums of all parts to get sqrt(Domain Size) values which you can then cutoff as you please…”

At this point, I immediately realized that this wouldn’t work also, since the partitions can happen in “bad places” or near extreme values which could be accidentally left out…

Maybe using a sort of fixed distance to the peaks would rule out this “partition luck” factor, but I saw no practical use on it…

Then I thought about Simulated Annealing or on some heuristics but I’m unsure myself if this is of any help here…

Lastly, all I could think of was applying a sort of filter to eliminate all the noise and then perform a DFT on the filtered graph and sort the resulting oscilations to be above a given cutoff value, but, then again, I’m not sure if this helps…

So, if anyone can help me with this I’d be really glad.

PS: I know this is a bit of an ill-posed problem, but, to narrow it down, assume that what is wanted is all the peaks which are “interesting” and above a given cutoff value.

Bruno
[1]: http://discuss.codechef.com/upfiles/1486582_714574195232066_1266478383_n.jpg

avastpulkit · March 21, 2014, 12:08am

…bump…

kuruma · March 21, 2014, 12:22am

Thanks for the bump

My dad solved this more or less with the MATLAB function findpeaks(), which is part of the Signal Processing Toolbox… Still, a good algorithmic approach would be fun to devise and it would be particularly interesting to hear explanations from some pros

mjbpl · March 21, 2014, 11:16am

About your concern with the sqrt decomposition, you can modify the algorithm not to decompose blocks into exact sqrt sizes, rather decompose it near a point that is guaranteed not to be a local maxima.

You can also calculate local maxima using hill climbing algorithm and then ignore this small segment (that contained local maxima) and then reiterate hill climbing on left and right remaining blocks and then merge the final result.

About white noise there may be some non-promising solutions to smoothen the graph. One such solution would be to use some points at some intervals (more than your least count on x) and do a local curve fitting with those points.

If I understood your question right, following would be my solution:

Step 1: Initialize first peak found location as -infinity with F(x) = infinity.

Step 2: Find difference between F(x+epsilon) - F(x). Keep incrementing x by epsilon as long as you are getting a positive value. Once you get a negative or zero, F(x) was a peak. (Let’s decide next whether to ignore this one or consider). Let’s call this peak location as x1.

If x1 - x0 is larger than some threshold, Consult last peak found (I’m calling it x0). If abs( F(x1) - F(x0) ) is larger than ( max(const_xdiff_max, x1 - x0) * some const threshold), record x0 as a peak location. x0 = x1. You can use a quadratic instead of linear relation with some const threshold for better results.
Else if( F(x1) >= F(x0) ) x0 = x1.

Step 3: If (xend - x0) is larger than some threshold value, x0 was a peak location.

Record peaks only if F(x0) is larger than some threshold value.