Friday, April 25, 2008

Algorithms Mania

Things have been all about algorithms lately, one way or the other: explaining dynamic programming to my roommate, convincing people of the consequences of P != NP and how to come to terms with them, taking on witty interview questions, listening to one video lecture after another on NP-Completeness, and working through every possible source to get my assignment done (yes, that has to be done in polynomial time to meet the submission deadline, ya ya! I know I have been freaking out lately).

My algorithms instructor has got me hooked, I will admit that. He has been great all semester, making us feel good about appreciating algorithms. Note that I did not say we are learning to write new algorithms; we are merely coming to appreciate the beauty of existing ones. It is really fascinating that almost everything we have learned so far, at least everything with concrete proofs available in textbook form, was accomplished before the 1970s, or the mid-1970s to be on the safer side. In the three decades or so since, this field has undoubtedly contributed a lot, but in directions that would have seemed like dead ends back then.

The algorithm diaspora covers puzzles from all walks of life. From scheduling small tasks to building and optimizing complex objectives, we rely on algorithms which are executed by machines and give us results based on our input set. What is particularly worth noticing is that many of the problems being discussed did not originate in computer science at all. For example, the original framing of "3D Matching is NP-Complete" had to do with polygamy and marriage between three sexes. Is society ready for anything like that? No, not at all. But still, this small problem, once solved, motivates people to link their own problems to it and solve them. Or, for that matter, the N-Queens problem, whose textbook solution takes exponential time. Come on, I did not see any value in that problem when I learnt about it in my undergrad class. But something of that sort can be transformed into problems in graph theory, wow! I could not have envisioned that. So yes, it amazes me how much application this field has in whatever walk of life we are in.
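As a side note, here is a minimal backtracking sketch of N-Queens (a generic toy version in Python, not anything from my course): it places one queen per row and abandons any branch that creates a conflict, which is exactly where the exponential blow-up in the worst case comes from.

    def solve_n_queens(n):
        """Return one placement of n non-attacking queens, or None."""
        cols = [None] * n  # cols[r] = column of the queen placed in row r

        def safe(row, col):
            # Conflict check against the queens already placed in rows 0..row-1.
            for r in range(row):
                if cols[r] == col or abs(cols[r] - col) == row - r:
                    return False
            return True

        def place(row):
            if row == n:
                return True
            for col in range(n):          # branch over every column: exponential worst case
                if safe(row, col):
                    cols[row] = col
                    if place(row + 1):    # recurse into the next row
                        return True
                    cols[row] = None      # dead end: backtrack
            return False

        return cols if place(0) else None

    print(solve_n_queens(8))  # prints one valid placement, one column index per row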

So that brings us to the questions asked during interviews, the ones that are supposed to be cracked "on the spot". Hmmm. Quite a lot of expectation. Performing under pressure is never easy. But if you want to compete against an MIT undergrad who can churn out these solutions without taking a single class in this area, you had better hold some ground in this subject. I would classify the problems being asked into two categories: those that are crack-able and those that are merely attempt-able. My point is that it is in your interest for the third category, the can of problems that are effectively unsolvable, to be as rare as possible. Why? Because you are no Einstein, so you need not solve the problems he solved. And not everyone keeps Einstein's company either, so they won't ask questions that only an Einstein could solve in his lifetime.

But still, given a problem, it is the approach that matters. Sometimes all you need is to turn the problem upside down to see it laid bare and surrendering to you. Sometimes you need to get hold of the pivot point that makes the puzzle crack. How can I get it right every time? Wish I knew the answer. The only answer I have got is: practice, practice and practice. You see things when you dive into them. The effort put in makes the picture a lot clearer in this subject. And the best part is that when you are able to connect those dots, you feel a kind of eternal bliss. You feel like you just cracked the $1M Netflix Prize, when actually what you have done has been done, undone and redone at least a billion times.

Lately I have been thinking that it would have made a lot of sense if the subject had been split into two parts (Algo-I and Algo-II) and taught during our undergrad years. It is the heart and soul of computer science; come on, you need to give it some time!

Why so much of a rant on algorithms? Because what I am going to do in the next 8 months has everything to do with algorithms, though it will not be specific to them. While it is true that I will be using existing ones, the fact is that I am free to choose them, and that makes it more difficult because it tests my understanding of the subject and how good I am at expressing my problem and transforming it into an already-solved puzzle of this field. So why am I so afraid? The truth is that I did not do justice to this subject. I got busy with projects and blah blah and neglected it. But every time I tried to get hold of what was happening in the course, I just wished I had paid more attention to it. I am not great at solving puzzles, but I have noticed that by taking small steps towards breaking down the complexity of these questions, we can increase the likelihood of cracking them in minutes. All it requires is TIME. Give it undivided attention and you see things happening. But I must tell you that it is not something which can be done in 2 days. I know the consequences of that statement and I am ready to accept them.

Next time you see someone in computer science scribbling on paper and racking his head, give him some space and time :) Not that you should abandon him. What he is trying to do is give his mind a workout. And that makes him a healthy person in at least one aspect of life!

Sunday, April 20, 2008

Mocking Intelligence

I had to write this stuff down. It is taken straight from a presentation by Peter Norvig, Director of Research at Google. The discussion was about training on data sets, algorithms, machine learning, clustering of data sets, and finding potential patterns and outliers.

A nice problem on string segmentation was explained. I won't try to create my own version, but I will cite some of the examples mentioned during the presentation. When dealing with text written in languages such as Chinese, spaces between words are often omitted. A human mind reading the same text can easily recognize the pattern and understand from context what it is trying to convey.

But think about the computer brain. It goes nuts interpreting these combinations. For example, it is easy for us to see what the following line is trying to say:

livelisteningparty :: live listening party

But a combination like 'smallandinsignificant' becomes 'small and in significant' when it should have been 'small and insignificant'. Hence we conclude that some semantic training on the data set is required. Then again, there may be words that fit the context but do not rank high enough in the training set to make an impact. One of the training sets he mentioned was 1.7B in size, yet it would still fail to recognize an uncommon dictionary word and would break it into separate, highly ranked words that carry no meaning together.
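Just to make the idea concrete, here is a minimal sketch of this kind of segmentation in Python. It is my own toy version, not Norvig's actual code, and the word counts are made up for illustration: it scores every possible split with a unigram frequency table (memoized, so the exponential number of splits is not re-examined) and keeps the highest-scoring one.

    from functools import lru_cache

    # Toy unigram counts -- a stand-in for a real corpus-derived frequency table.
    WORD_COUNTS = {
        "live": 500, "listening": 120, "party": 400,
        "small": 300, "and": 5000, "in": 8000,
        "significant": 80, "insignificant": 20,
    }
    TOTAL = sum(WORD_COUNTS.values())

    def word_prob(word):
        # Unseen words get a tiny probability that shrinks with their length.
        if word in WORD_COUNTS:
            return WORD_COUNTS[word] / TOTAL
        return 1.0 / (TOTAL * 10 ** len(word))

    @lru_cache(maxsize=None)
    def segment(text):
        """Return (probability, words) for the best-scoring split of text."""
        if not text:
            return (1.0, [])
        candidates = []
        for i in range(1, len(text) + 1):
            first, rest = text[:i], text[i:]
            rest_prob, rest_words = segment(rest)
            candidates.append((word_prob(first) * rest_prob, [first] + rest_words))
        return max(candidates, key=lambda c: c[0])

    print(segment("livelisteningparty")[1])     # ['live', 'listening', 'party']
    print(segment("smallandinsignificant")[1])  # ['small', 'and', 'in', 'significant'] -- the failure above

With a richer frequency table (or word-pair statistics instead of single-word counts) 'insignificant' would win back; the toy numbers above deliberately reproduce the failure mode described in the post.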

OK, now the fun part. The examples that follow show the particular nuisance this parsing creates. In each of the examples below, you have a website hosted somewhere on the web; see what the computer makes of it while segmenting the name.

www.whorepresents.com : "who represents" provides contact info for celebrities etc. :: whore presents (Now imagine where the "similar searches" would lead)

www.therapistfinder.com : Finds you a Therapist in California :: the rapist finder (Gosh! The Dept. of Investigation would buy this one out!!)

Now, this one we all use for something or the other (cached results, remember :)

www.experts-exchange.com : provides answers to your queries :: expert sexchange (Yeah, I know you would say the delimiter should not be ignored, but then do you know the very reason it is there? ... Yeah :)

www.penisland.net : "pen island" provides custom-made pens on the internet :: (Aha! Pop quiz time .... left for you)

So, things out here still need to evolve. Computers need to socialize more, I guess, and learn what fits where.

