Big data is difficult to work with using most legacy relational database management systems and desktop statistics and visualization packages, requiring instead “massively parallel software running on tens, hundreds, or even thousands of servers”. What is considered “big data” varies with the capabilities of the organization managing the set, and with the capabilities of the applications traditionally used to process and analyze the data set in its domain. “For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration.”
Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, ranging from a few dozen terabytes to many petabytes of data in a single data set.
Gartner updated its definition of big data: “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.” Some organizations add a fourth V, “veracity”, to describe the trustworthiness of the data.
The growing maturity of the concept has sharpened the distinction between big data and Business Intelligence, with regard to the data and how they are used:
Business Intelligence uses descriptive statistics on data with high information density to measure things, detect trends, etc.
Big data uses inductive statistics and concepts from nonlinear system identification to infer laws (regressions, nonlinear relationships, and causal effects) from large data sets, revealing relationships and dependencies and enabling predictions of outcomes and behaviors.
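The contrast above can be sketched in a minimal example. This is a hypothetical illustration (the data values are invented and no big-data tooling is involved): descriptive statistics summarize what the data already says, while an inductive least-squares fit infers a relationship and uses it to predict an unseen outcome.

```python
# Hypothetical sketch: descriptive statistics (Business Intelligence style)
# versus an inductive least-squares fit (big-data-style inference).
# Uses only the standard library; the data values are made up.

# Daily sales figures (dense, high-information-density data).
sales = [10.0, 12.0, 13.0, 15.0, 18.0, 21.0]

# Business Intelligence: describe what the observed data says.
mean_sales = sum(sales) / len(sales)

# Inductive statistics: infer a law (a linear regression y = a + b*x)
# and extrapolate beyond the observed data.
def linear_fit(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

days = list(range(len(sales)))
a, b = linear_fit(days, sales)
predicted_day_6 = a + b * 6  # prediction for a day not in the data

print(round(mean_sales, 2), round(b, 2), round(predicted_day_6, 2))
```

The descriptive summary (the mean) only measures the past, whereas the fitted slope `b` is an inferred relationship that supports prediction; at big-data scale the same distinction holds, only with far richer models and massively parallel computation.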