Big data is a term used to describe a massive volume of both structured and unstructured data. In most enterprise scenarios the data is too big, moves too fast, or exceeds current processing capacity. Examples include rolling web log data, network and system logs, and click information. What is considered “big data” varies depending on the capabilities of the organization managing the data set, and on the capabilities of the applications traditionally used to process and analyze the data set in its domain.


Big data involves the data produced by different devices and applications. Given below are some of the fields that come under Big Data.

  • Black Box Data : The black box is a component of helicopters, airplanes, and jets. It captures the voices of the flight crew, recordings from microphones and earphones, and the performance information of the aircraft.
  • Social Media Data : Social media such as Facebook and Twitter hold information and the views posted by millions of people across the globe.
  • Stock Exchange Data : The stock exchange data holds information about the ‘buy’ and ‘sell’ decisions that customers make on shares of different companies.
  • Power Grid Data : The power grid data holds information about the power consumed by a particular node with respect to a base station.
  • Transport Data : Transport data includes model, capacity, distance and availability of a vehicle.
  • Search Engine Data : Search engines retrieve lots of data from different databases.

“Big Data is when the Data itself becomes part of the Problem”

Thus Big Data includes data of huge volume, high velocity, and extensible variety. The data in big data can be classified into the following three types:

  • Structured data : Relational data.
  • Semi Structured data : XML data.
  • Unstructured data : Word, PDF, Text, Media Logs.
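
The three types listed above differ mainly in how much schema the data carries. As a rough illustration only, the following Python sketch reads one tiny sample of each kind. The sample strings, field names, and variable names are hypothetical and chosen for this example; real sources would be databases, XML feeds, and document or log stores.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: relational-style rows where every record has the same columns.
# (Hypothetical sample data standing in for a database table.)
structured_source = "user_id,page\n1,/home\n2,/cart\n"
rows = list(csv.DictReader(io.StringIO(structured_source)))

# Semi-structured: XML is self-describing, but fields may differ per record.
# Note that the second event has no 'page' attribute.
semi_structured_source = (
    "<events><event user='1' page='/home'/><event user='2'/></events>"
)
events = [dict(e.attrib) for e in ET.fromstring(semi_structured_source)]

# Unstructured: free text with no schema; without further processing we can
# only do something crude, such as counting words.
unstructured_source = "User 1 opened the home page and then left"
word_count = len(unstructured_source.split())

print(rows)
print(events)
print(word_count)
```

The structured rows can be queried by column name directly, the XML records must be checked for missing fields, and the free text needs its own parsing before any analysis is possible.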



The good news is that Big Data is here. The bad news is that we are struggling to store and analyze it. Hadoop helps us in this struggle.