Sunday, July 13, 2014

Big Data - My perspective

As Nathan Marz puts it in his popular blog post http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html, all data is immutable and valid at a point in time. There are no updates to data, just new data at a new time. My perspective on big data is likewise valid only at the time of writing this post, and it is bound to change with time.
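To make the "no updates, just new data at a new time" idea concrete, here is a minimal Python sketch of an append-only log of timestamped facts. The names `Fact`, `FactLog`, `record` and `value_at`, as well as the sample data, are my own hypothetical choices for illustration and are not taken from the linked post.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Fact:
    """An immutable piece of data that was true at a given time."""
    key: str
    value: Any
    timestamp: int  # any monotonically increasing time unit


class FactLog:
    """Append-only store: facts are only ever added, never updated or deleted."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []

    def record(self, key: str, value: Any, timestamp: int) -> None:
        # A "change" is just a new fact with a later timestamp.
        self._facts.append(Fact(key, value, timestamp))

    def value_at(self, key: str, timestamp: int) -> Optional[Any]:
        """Return the latest value recorded for key at or before timestamp."""
        candidates = [
            f for f in self._facts
            if f.key == key and f.timestamp <= timestamp
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda f: f.timestamp).value


# Hypothetical sample data: a person "moving" is two facts, not one update.
log = FactLog()
log.record("location", "Sydney", 100)
log.record("location", "Melbourne", 200)

early = log.value_at("location", 150)    # the older fact is still true for its time
latest = log.value_at("location", 250)
unknown = log.value_at("location", 50)   # nothing was known this early
```

The older fact is never overwritten, so a question about any past point in time can still be answered from the same log.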

Big data, as I see it today, is a marketing term that creates a lot of hype without creating much business value. But maybe that's how such things come into existence. For some time, every business will want to ride the big data wave and build its own analytics solution on whatever data it has, even though most of the time that data is not really big enough. It may eventually help most organisations, although as of today it may not be serving much purpose for most businesses.

Data always keeps growing, so the definition of Big Data will keep changing. The only difference we are observing at present is that RDBMS technologies are not scaling to the volume of data being processed. I believe this new technology wave started with organisations like Google, Twitter and Facebook, which realised that their data had gone beyond the use case of relational databases, and so they started a bunch of new concepts and technologies to process and store high-volume data arriving at high velocity. The term Big Data came a lot later. MapReduce (MR), Hadoop and related technologies surely made it a lot easier for many companies to jump in and make money out of it. MR had a huge processing limitation in the form of disk spills and reads, and thus we saw a number of solutions (Spark, Storm, etc.) that tried to overcome it by doing as much of the processing in memory as possible.
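For readers who have not seen the MR model, here is a toy word count in plain Python that walks through the map, shuffle and reduce phases. It is only a sketch of the programming model, not the Hadoop or Spark API, and the function names are my own. Everything here lives in memory on one machine. In Hadoop MR the intermediate output between phases is written to disk, which is the cost the in-memory engines try to avoid.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # The phases are chained directly, with nothing written to disk in between.
    return reduce_phase(shuffle_phase(map_phase(lines)))


counts = word_count(["big data is big", "data keeps growing"])
```

With the two sample lines above, `counts["big"]` and `counts["data"]` are both 2 and every other word maps to 1.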

It seems this trend will continue for some time, while the in-memory technologies continue to mature. Once all this settles down and we have a highly scalable solution, we will likely see a long period (perhaps decades) in which there are only optimisations to those technologies, while the computer science world moves on to solve another set of problems.

So, working with and mastering the current set of Big Data technologies can be enough for a long time in one's career!