Hi
Algo Geeks,
Some questions constantly nag me. And this happen many times in case of
Algo questions.
Well this time I will be straight forward while telling question and answer. Prior to this I have send many things but they are trickily said and people did not give it attention. So this time very straight.
The question goes like-
You are given a 1000 points in 2D space and you need to find the most dense 100 points.
The first answer people generally gave was "in worst case it can be solved in order of O(n^2).
That's by finding the distance between every 2 points and then sort it according to that and get the answer. But is it true.
There is grave misconception over here, we cannot do this. Can we actually find such 100 points by this way. Perhaps the top 100 least distances may be of points very far from each other. So what we should do? After so many years of constant nagging with this question i have bit answer, yet i hope that this may have been answered in unsupervised learning. But I want this to share with you, just let me know weather it goes right or it is going badly wrong. So here is that part answer...
1) sort the x-
coordinates of the points
1.a) get the minimum distance along x direction between the points, (say
dx)
2) sort the y-
coordinates of the points
2.a) get the minimum distance along y direction between the points, (say
dy)
(The above two ensures that there would be max 2 points in the grid, -to me it seems the max. is 1 but just to be careful i used here 2)
3) Divide the whole plane into grid of
dx by
dy sized rectangles
4) Calculate the density of points over the whole plane (say
avgden =
total no of given points/ total no. of grids the 2D plane spans to include all these points)
4.a)
minden =
avgden5)For each grid rectangle with
atleast one point do 5.a)
5.a) Now start collecting neighboring grids calculating density untill either there would be 100 points in the collection or its density goes below minden
5.b)keep updating minden for maximum density cluster found or if we only encountering clusters with density well below minden then we can re-estimate the avgden possible for the rest of points, which may help us finding higher density.
Doubt:
Since in case of "Find max. sum in the given array of integers" we never had a factor that decreases the value variably would this
algo be doing its job
Also in that case we knew the point below which we will not go that's zero but here we don't know what would be the highest density we are looking for, which in this case i had taken the
avgden and updating as needed, still there may be cases where this approach may not work. But the first pass must give us some idea about density we are seeking. And it has to be converging with next passes. But with how many iteration the answer may come, is the real hurdle this algo is facing. I am looking for loopholes in this algo and some work at this point.