Traditionally, big data has been described by the “3Vs”: Volume, Variety, and Velocity. What is a real analytics problem that is best solved using big data tools? What kind of metrics do you want to capture? The most common use cases today involve processing large volumes of log data. Log data tends to be largely unstructured, can come from multiple sources, and, especially for popular websites, can be huge (terabytes or more per day). A framework for distributed computing is therefore essential for this kind of problem.
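To make the log-processing use case concrete, here is a minimal sketch of the map-and-reduce pattern that distributed frameworks typically apply to this kind of work. It runs on a single machine and only illustrates the shape of the computation. The log format, the sample lines, and the names `map_chunk`, `reduce_counts`, `split_into_chunks`, and `count_statuses` are all assumptions made up for this example, not part of any particular tool.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample lines in a simplified access-log format
# (invented for illustration; real logs vary widely by source).
SAMPLE_LOGS = [
    '10.0.0.1 "GET /index.html" 200',
    '10.0.0.2 "GET /missing" 404',
    'corrupted line without a status',
    '10.0.0.1 "POST /login" 200',
    '10.0.0.3 "GET /index.html" 500',
]


def map_chunk(lines):
    """Map step: count HTTP status codes within one chunk of lines."""
    counts = Counter()
    for line in lines:
        # Assumption: the status code is the last space-separated field.
        status = line.rsplit(" ", 1)[-1]
        if status.isdigit():
            counts[status] += 1
        else:
            # Logs are messy, so track lines that do not fit the format.
            counts["malformed"] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def split_into_chunks(lines, size):
    """Split the input into fixed-size chunks, one per (imagined) worker."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def count_statuses(lines, chunk_size=2):
    # In a distributed setting each chunk would be mapped on a different
    # machine; here the chunks are simply processed one after another.
    partials = [map_chunk(chunk) for chunk in split_into_chunks(lines, chunk_size)]
    return reduce(reduce_counts, partials, Counter())


totals = count_statuses(SAMPLE_LOGS)
print(dict(totals))
```

Because each chunk is counted independently and the partial counts are merged afterwards, the map step can be spread across as many workers as there are chunks, which is what makes the approach workable when the logs grow to terabytes per day.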