Showing posts with label Development. Show all posts

Sunday, April 20, 2008

Mocking Intelligence

I had to write this stuff down. It is taken straight from a presentation by Peter Norvig, Director of Research at Google. The discussion covered training on data sets, algorithms, machine learning, clustering of data sets, and finding potential patterns and outliers.

A nice problem was explained on String Segmentation. I won't try to create my own version but will cite some of the examples mentioned during the presentation. Text written in languages such as Chinese often has no spaces between words. A human mind reading it can easily recognize the pattern and understand from the context what it is trying to convey.

But think about the computer brain. It goes nuts while interpreting these combinations. For example, it is easy to observe what the following line tries to say:

livelisteningparty :: live listening party

But a combination like smallandinsignificant becomes 'small and in significant' when it should have been 'small and insignificant'. Hence we conclude that semantic training on the data set is required. Then again, there might be words which are right for the context but do not rank high enough in the training set to make an impact. One of the training sets he mentioned was of 1.7B size, yet it would still fail to recognize an uncommon dictionary word and would break it into highly ranked separate pieces lacking any meaning altogether.
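To make the failure concrete, here is a minimal sketch of frequency-based segmentation: try every split point, score each candidate by how common its words are, and keep the best one. The word counts are made up for illustration (they are not from the presentation), and the penalty for unknown words is just one plausible choice.

```python
from functools import lru_cache
from math import log

# Hypothetical toy counts; a real system would take these from a large corpus.
COUNTS = {
    "small": 500, "and": 5000, "in": 4000, "significant": 300,
    "insignificant": 20, "live": 400, "listening": 100, "party": 200,
}
TOTAL = sum(COUNTS.values())


def word_logprob(word):
    """Log-probability of a word; unknown words are penalised by length."""
    if word in COUNTS:
        return log(COUNTS[word] / TOTAL)
    return log(1.0 / (TOTAL * 10 ** len(word)))


@lru_cache(maxsize=None)
def segment(text):
    """Return (score, words) for the most probable split of text."""
    if not text:
        return 0.0, ()
    candidates = []
    for i in range(1, len(text) + 1):
        first, rest = text[:i], text[i:]
        rest_score, rest_words = segment(rest)
        candidates.append((word_logprob(first) + rest_score, (first,) + rest_words))
    return max(candidates)


print(" ".join(segment("livelisteningparty")[1]))     # live listening party
print(" ".join(segment("smallandinsignificant")[1]))  # small and in significant
```

With these counts, 'in' and 'significant' are each so much more frequent than 'insignificant' that the wrong split scores higher, which is exactly the kind of mistake described above.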

OK, now the fun part. The examples that follow show the particular nuisance created by this parsing. In each of them, you have a website hosted somewhere on the web. See what the computer makes of them while tagging.

www.whorepresents.com : who represents provides Contact Info for Celebrities etc :: whore presents (Now imagine what the similar searches would lead to)

www.therapistfinder.com : Finds you a Therapist in California :: the rapist finder (Gosh! The Dept. of Investigation would buy this one out!!)

Now, this one we all use for something or the other (Cached results remember :)

www.experts-exchange.com : Provides inputs to your queries :: expert sexchange (Yeah, I know you would say that the delimiter should not be ignored, but now you know the very reason for having it! ... Yeah :)

www.penisland.net : pen island provides Custom made pens on internet :: (Aha! Pop Quiz Time .... Left for you)
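The ambiguity is easy to reproduce. The sketch below lists every way a domain name can be split into words from a dictionary. The mini-vocabulary is hypothetical and only large enough to cover two of the examples above; the function name is mine too.

```python
def all_segmentations(text, vocabulary):
    """Return every way to split text into words drawn from vocabulary."""
    if not text:
        return [[]]
    results = []
    for i in range(1, len(text) + 1):
        head = text[:i]
        if head in vocabulary:
            # Keep this prefix and segment whatever is left after it.
            for tail in all_segmentations(text[i:], vocabulary):
                results.append([head] + tail)
    return results


# Hypothetical mini-vocabulary, just enough for the examples above.
VOCAB = {"who", "whore", "represents", "presents",
         "the", "therapist", "rapist", "finder"}

for domain in ("whorepresents", "therapistfinder"):
    for words in all_segmentations(domain, VOCAB):
        print(domain, "->", " ".join(words))
```

Both readings come out as perfectly valid dictionary splits, so the dictionary alone cannot tell the computer which one the site owner meant.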

So we still need things out here to evolve. Computers need to socialize more, I guess, and learn what fits in where.


Friday, March 14, 2008

Projects, Linux and Tricks

This is perhaps a useful post I am writing after a long, long time. Normally it is just rants on life and things which are bogus or seem to be. Without digressing much from the topic, let me tell you about my rebirth in the Linux world. It is still in a dormant phase, but a few minutes back I saved myself from buying 2 GB of RAM just to work full time on Linux and still use Windows for GTalk calls. Haha, what a reason to have Vista on my laptop.

Question 1

Sometimes while using Visual Studio C++, the auto complete feature stops working out of the blue. You close the solution and reopen it, but nothing changes.

Answer

The auto complete feature relies on an index which Visual Studio builds up slowly, time and again, while you are busy coding. It builds a huge repository of index data which helps you quickly reach your code snippets and lines through shortcuts, to make your life as a programmer simpler. Now, if you search in your solution directory, there will be a file with the extension .ncb. That's the culprit for the problem above. It has the same name as your project and a description along the lines of "Intelli sense ...". Just check the size and I promise you that it will be larger than your code base. Just for statistics, while my code base was just 380 KB in total, the .ncb file was 1.5 MB. So now you know exactly why, in databases, the data takes 20 GB of space and the indices take the remaining 80 GB. You need that many of them to make your life simpler, as all you know how to use is "search".

Hence, the simplest solution is to delete the .ncb file. Don't worry, nothing bad will happen. As long as your source code is there, this file has no significance. As soon as you open your IDE, the file will start building again, and in no time it will grow that big. Many a time, due to some **poor** programmers out there in the VS team, the .ncb file gets corrupted. From that point on, nothing works properly. Older additions might work, but if you add new classes, it will just not respond.

So check first if compilation goes through. If it does, there is no reason IntelliSense should not work.
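If you would rather not hunt for the file by hand, the cleanup can be scripted. This is only a sketch: the function name and the dry_run switch are my own, and I am assuming you close Visual Studio first, since the IDE normally keeps the file open.

```python
from pathlib import Path


def remove_ncb_files(solution_dir, dry_run=True):
    """List IntelliSense .ncb files under solution_dir; delete them unless dry_run."""
    found = sorted(Path(solution_dir).rglob("*.ncb"))
    for ncb in found:
        size_kb = ncb.stat().st_size / 1024
        print(f"{ncb} ({size_kb:.0f} KB)")
        if not dry_run:
            # Safe to remove: the IDE rebuilds the file from the sources.
            ncb.unlink()
    return found
```

Calling remove_ncb_files(r"C:\path\to\solution") only lists the files and their sizes (the path is a placeholder). Pass dry_run=False once you are happy with what it found.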

Question 2

I have just 1 GB of RAM. I don't want to work on Vista except that I want to receive calls on GTalk. And somehow Ubuntu on VMware runs slow.

Answer

I don't have a concrete answer for this one. But this is what saved me from buying an extra 2 GB and blowing out $80.

Go to msconfig and turn off all services you don't want.
Open Task Manager and kill all the processes you don't want. See to it that they are removed from startup.
Go to System Properties and see that the memory usage is optimized for **BEST PERFORMANCE** - which means back to Windows 95. :)

I am typing this blog in Ubuntu, and I can guarantee that if someone performs a GUI comparison between my Vista and Ubuntu, the latter will win hands down. Still, I am now getting a lot better performance than earlier, when my mouse used to hang and processes used to take a long time to spawn. So much for Vista!

Question 3

What's an ok-if-not-nice setup if I want to move to Linux?

Answer

I am playing safe here coz I know people tend to use a lot of tools and swear by them. When I install a package, it is because I feel I really need it. I hate to have unused software on my disk.

So I installed Ubuntu. Pretty easy and nice. Next was to get kdevelop. I know vim folks would dread me for this. But I am an IDE person (moving from eclipse, come on; you got to give me a chance).

If I evaluate kdevelop, I would rate it at 7/10. Nice integration with CVS and SVN. Autocomplete works just about perfectly. Switching between header and body is not that perfect, but it's manageable with the help of the File Tree. Integration with konsole is available at the bottom. It recommends kompare for diffs and is really very well suited for collaborative development.

What I did not like is that the debugging is not that great. I mean the interface is not that good, and it slows you down. But if you are comfortable with gdb, you can find your way around. Regular watchpoints, breakpoints etc work fine.

So, that's pretty much my Linux setup. I allocated around 16 GB of space and am pretty much all set to work full time on the 6 million record dataset I have :).

Question 4

What to do if you suck at using Linux Commands?

Answer

I am still trying to find that out. But what I have learned is that you can't memorize them in a day. It's only through long, long hours of code development that you suddenly find you need to do some filesystem or memory stuff. For example, one of my friends screwed up his vim version and had to roll back to the previous version, but could not figure out a way to leave glibc6 untouched. Such nifty things won't come to you overnight. You just have to make linux your mistress, accept your illicit relationship in public, and then you might find yourself getting into the groove.

This question arose because in my last 2 interviews, I have been shot and fired with unix commands which I have managed to ignore till today. But perhaps, destiny has its own thoughts.

So currently this is the list of open items for me:

1. Start using unix commands. Just grep won't do :)
2. Start learning python for scripting. You need one, once in a while apart from shell script.
3. Install mysql and start tweaking in.
4. Algorithms, Algorithms, Algorithms. - Do I have to say more?

I was asked today about memcached. I remember that before all these interviews began, when I first took the Database System Implementation course, read a lot about Google Big Table and was fascinated by Distributed Systems, the first thing which impressed me was memcached. Next was the hadoop project, a part of lucene. I just wished I would someday get a chance to work on something like memcached, or to learn what the Oracle Caching Algorithm is. I got the opportunity today - just that perhaps I blew it, and did so pretty nicely too. Hence, this rant. Oh, it comes again!

So guys, there are things open here and we need to work. Your comments are most welcome. If you have some new things to share, do leave me a note. Adios.

Tuesday, January 29, 2008

Language Bashing

"You know what - Java sucks". When someone says something along those lines in front of me - my blood boils. I am literally letting my heart speak rather than making the expression any less intense.

Well, as a matter of fact, I was one of the victims of this unnamed syndrome when I graduated from college. At that time, we had three choices - C or C++ or Java. Some went ahead with C, a few with C++ and many with Java. I belonged to the last camp. I frequently tried to educate people about Java and how powerful it is. I despised people who coded in C - as if they still lived in the sixties. For C++ people, I had some admiration - but I knew that 90% of the people who say they write C++ code actually mean they write "C with Classes" code. So, I often indulged in verbal bashing in favor of Java (quoting design patterns with which most were unfamiliar). Sometimes I could convince people - sometimes I just gave up.

After a while, AJAX came into the picture. That, I believe, was a turning point in defining what the perfect language would be. I will give the answer a few lines further on. The next noticeable turnaround was the chanting of the "python" mantra by Google-related sources. People said that C++ and Java would soon be extinct : scripting is the way to go. You will find millions of blog lines and comments arguing that this language is the future and that one cannot hold on any longer. Had those developers written that many lines of code - probably some company would have posted more profit than Google did (or whoever is the leader in this area). A few long-bearded gentlemen proposed Ruby and further Ruby-on-Rails. I guess that is the sweetest name of any framework I have ever heard. And again all hell broke loose. People against people - on something so abstract.

And still today I hear it - "Java is for web development Man! Do you want to go in that field?"

People who are experts in PERL should die. And those still writing code in Assembly - I guess they should take a hack-saw, put some salt & after-shave mix on the ridge and start cutting their throats right now. I am sad to tell them this - you guys don't even get discussed among these people. Read on.... you can do those things once you are done with this article. I hope you change your mind.

The ONLY answer I have for all these people - who ask such STUPID questions & are perhaps too adamant to see the truth - is : Someday, you are going to learn it the hard way - brutal and it will hurt.

"There exists no ONE language for all purposes". That being said, people still try to form coteries around languages. Oh you know what - if you are in this subset {ASM,C,C++}, then take the Kleene closure and you are cool. Otherwise, you are not among the best. Shocking!!

I have been riding this wave for 4 years now. And perhaps I can see the shore. My belief is that there exists no ONE language for all solutions. Nor do coteries help. What matters the most is this simple question - "What is the fastest way you can make money out of the effort you put in, and once done, how much does it take you to keep it generating worthy revenue?". I guess you can see how sensible that question is - you bring business in and you see everyone nodding and listening to you very carefully. Read on.

Money matters to us all. We may want to be filthy rich professionals or may be humble millionaires already through stock options - what we all believe at the end of the day is that we EARN our money and that this earning never dries up - as long as we are faithful to our work.

That said - we are always at home with certain technologies like C++, Java, Oracle, MS Office etc to name a few. Anything beyond that makes our ears go deaf. Protests are just knocked out without a second thought.

No doubt they are the best products of some of the VERY best minds in our industry - but didn't they each solve, almost too easily, some problem which at one time looked unsolvable? That is the main reason why they have gained so much popularity and people have embraced them. But one needs to know that someday a bigger, different problem will exist - and then all hell will break loose again. We will need a new solution and it will be one of its kind. Evolution is something we can't stop. We HAVE to embrace it and live our lives.

So for people who are still reading to find out why Java DOES NOT suck compared to C++ - I will answer next.

Any product, given a bunch of exceptionally talented engineers, can be written to the best it can perform. Did I forget the language part there?

No.

You guys might want to read the INTERNET more than a NORMAL / AVERAGE person who uses Java does. There are many benchmarks which cite performance measurements with Java outperforming C++ compiled with g++. I AM IN NO WAY SAYING THAT C++ IS BAD. There can be many reasons for this - the platform can be a mobile phone whose stack was implemented in Java, so JVM compatibility was best when the apps were written in Java. What you often forget are the following two simple things, which are of utmost importance to any product development:

1. Time to turn around the concept to reality - a neat and clean working code
2. Time to fix bugs

Now, if you really can't hold back any longer and want to ask the AGE-OLD tried and tested missile question - "What happened to performance" - all I can say is this - "Every language gives you the freedom to fiddle with it - to the degree that you can modify and use it the best way you want to". In the case of Java, you may want to tune the garbage collection and tweak the JVM. But you won't : as language bashing is the easier route.

A new age is upon us. When I started using computers, C++ was the standard. Java boomed in the late nineties. We bade goodbye to procedural languages and welcomed OO frameworks. Somewhere along the way, scripting languages sneaked in. Some trillions of lines were written overnight and they remain the backbone and sometimes the most important asset of many companies. We jumped the gun and embraced JavaScript through AJAX - because it just looked cool. The person who still does not know the POWER of AJAX will not give you an iota of respect if you say that your bread and butter is JavaScript. Maybe PHP might sound cool to him. (Don't say ASP or JScript - you will fall 'down' into some domain that he cannot define himself).

Then why is it that Cisco still uses Assembly in its best possible routing code? Why do companies employ people to learn PERL and write those very million dollars worth of UNIT test cases - is TDD bad after all? Or why does Google not do all its development just in C++? Why does Microsoft spend money on inventing F# or asking teams to build products using D? Why does Bruce Eckel take the pain of evaluating Scala and advising people that it is the most definitive successor to Java? Why, if you get admitted to MIT, do you still get to use Scheme? And you thought coding in Python was cool.

Bottom line - what matters is the time to build and sell the product. Do it in a language you are comfortable with - so that you write less buggy code and give a better output in the time assigned to you. But don't let that convince you that you have evolved as far as you can. Keep your eyes open. C++ has its place - maybe in embedded systems; maybe with C++0x you will find a Concurrent Language worthy of learning (you might want to learn a language which is inherently concurrent - not one that just supports threads as a library). When you speak about octa-core systems, if you write a program which does not exploit Concurrency - you should feel downright sick - coz you are wasting computing power. Don't you think there should be a language, ConcurrentJava, that would fit? Or do you still want to fall back into your coterie and be Multi-C?

Java or C++ - they have places to stay. So does ASM. You want real time - use ASM. You want a little less than that - use C. Maybe even C++ nowadays gives almost the same runtime support with HALF the development and perhaps ONE-EIGHTH the maintenance time. Use Java to build applications on Distributed Platforms - isn't all the work today done in distributed scenarios? You want performance - tweak the JVM. Profile and profile. When you give up - know that given ONE-TENTH the time you invested in coding it in Java, you can achieve the same mark if you switch to any different language if you have some excellent coders in your team and IF YOUR DESIGN IS PERFECT. People neglect the 80/20 rule and start cursing Java. They don't realize that the 80 belongs to the debugging and cross-platform issues in C++. In Java most of the time is spent in designing. Still, some good coders are faithful and, regardless of language, spend time architecting the entire thing - before even deciding what to use : I say they are the best coders, even if they have not written even one damn line!

So, start googling for the language problems you feel you have. It cannot be that all your problems are solved by one paradigm. Learn at least one language from each - Assembly, Procedural, OO, Scripting and Functional (sorry if I missed anything here - tell me and I will learn). You can build a browser in just 25 lines of Python code. While in some languages you can multiply two matrices in just one line.
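To back up the one-line claim: Python itself can do it for small matrices held as plain lists. This is just an illustration with made-up numbers, not a recommendation for heavy numeric work.

```python
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# One line: each entry is the dot product of a row of A with a column of B.
product = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

print(product)  # [[19, 22], [43, 50]]
```

The zip(*B) trick transposes B, so the inner loop walks over its columns.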

Remember that what matters is that you propose a solution which is independent of the underlying infrastructure. Model your design so that it does not compromise on anything. Then start picking what fits best based on the constraints you have :
1. Time to build and sell (Productivity counts)
2. Ease of collaboration
3. Ease of Maintenance
4. Debugging
5. Inherent features and Recent Developments
6. Open Source Initiatives and Available Plugins
7. Scalability and Performance Metrics
8. Agility

So the next time someone asks you - "Do you recommend me reading Java or C++?" - just stare and stay silent. Do not utter a word. I hope he feels like he asked a question which perplexed you so much that you could hit him at any moment. After some time he will realize it and start describing the actual problem in his design and then ask for advice. All you need to do now is educate him not to ask that little **stupid** question again.

Tuesday, November 13, 2007

FogBugz 6.0 - Devs' envy Testers' Pride

Well, it is 02:02 at night and I am not in a mood to write something long here. So I will point you guys to a good demo of FogBugz. It seemed pretty awesome to me and I hope you like it too. Believe me, developers, you would say a big NO to this software getting into the hands of the QA folks. Coz it hits where it matters most - your performance and consistency.

The software keeps track of the timelines you have missed and other small things which impact your consistency and reliability in solving bugs/issues. It is perhaps a thousand times better than Clarify, which I have seen people using.

What appeals to me most is the usage of a Thin Client (of course something which works on Firefox without giving JavaScript issues and which uses Web 2.0 features). You have collaborative tools which spur free interaction between Dev and QA.

I have only seen the demo, so I am not sure if this feature is present in 6.0 - I would have liked to see a quick draft form of a bug saved to a QA profile without getting filed right away. Many a time, the tester keeps a bug for further verification after encountering it for the first time. This feature should be included - though there are both pros and cons attached to it for the obvious reason - the management (you figure out yourself why).

The Demo URL is here

Just to add more interest - fogcreek is the one associated with Joel Spolsky who writes the famous blog http://joelonsoftware.com/.

This is part of an effort I am putting in nowadays to correct the process of writing code and implementing projects/designs efficiently. Good process comes through practice and awareness. In the beginning I always used to skip it. But as I slowly built the framework, it stopped taking much of my time. Using open source software frameworks is the first step in this direction ( ! FogBugz ). Gone are those times when we had time to rewrite something twice. You write something today - and that remains forever!

There are some steps one should use before writing even a single line of code. Along with OO design, keep in mind Version Control, Memory Management, Code Profiling, Code Documentation etc. I would write an article when I get some more time highlighting these aspects and how they help you to bring clarity and integrity to the overall project. You may not configure them right away, but keeping space for them in your overall project design helps in the long run.

And remember - never reinvent the wheel unless you have been asked to do so. Always build on the top of existing blocks (reusing efficiently) until : you hit the Great Wall of Performance Issues.

hmmm ... the glass finally goes empty ... sadly it was cranberry juice all the way.
