Big data—a market that IDC has forecast to surge at a 40 percent compound annual growth rate, from $3.2 billion in 2010 to $17 billion in 2015—presents a formidable new frontier in which data sets can grow so large that they become awkward to work with using traditional database management tools. The need for new tools, frameworks, hardware, software and services to handle this emerging issue represents a huge market opportunity. Good big data toolsets provide scalable, high-performance analytics at the lowest cost and in near-real time, as business users increasingly demand continuous access to data. By analyzing this data, companies gain greater intelligence as well as a competitive advantage. Here, John Schroeder, CEO and co-founder of Hadoop and big data specialist MapR, points out major developments he believes will drive big data to become must-have infrastructure for large enterprises in 2014—including the promise and challenges of SQL for big data, data security issues stemming from authentication, and Hadoop's future.
by Chris Preimesberger
SQL development for Hadoop enables business analysts to use their existing skills and SQL tools of choice for big data projects. Developers can choose from the Apache projects Hive and Drill, from Impala, and from proprietary technologies such as Hadapt, HAWQ and Splice Machine.
SQL requires data structure, and centrally structuring data causes delays and requires manual administration. SQL also limits the types of analysis that can be performed. An over-emphasis on SQL will delay organizations' efforts to fully leverage the value of their data and slow their reactions.
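The structure constraint Schroeder describes is the schema-on-write requirement: a SQL engine needs columns declared before data can be loaded, while schema-on-read tools (the approach Apache Drill takes) query raw records directly. A minimal Python sketch of the schema-on-read idea, with hypothetical field names, not any particular engine's implementation:

```python
import json

# Raw, semi-structured event records: fields vary by record,
# so no central schema had to be declared before loading.
raw_lines = [
    '{"user": "a1", "action": "click", "ms": 120}',
    '{"user": "b2", "action": "purchase", "amount": 19.99}',
    '{"user": "a1", "action": "click", "ms": 85}',
]

def clicks_per_user(lines):
    """Aggregate on read: fields are discovered as records are parsed."""
    counts = {}
    for line in lines:
        record = json.loads(line)
        if record.get("action") == "click":
            counts[record["user"]] = counts.get(record["user"], 0) + 1
    return counts

print(clicks_per_user(raw_lines))  # {'a1': 2}
```

Adding a new field to future records requires no schema migration here; a traditional warehouse would need an ALTER TABLE and an ETL change first, which is the delay the paragraph above warns about.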
With a growing array of access-control capabilities available in Hadoop, organizations quickly realize that wire-level authentication is the required foundation. Without adequate authentication, any higher-level control is easily bypassed, thwarting the intended security initiatives.
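In stock Apache Hadoop, wire-level authentication is typically enabled via Kerberos. As a sketch, the relevant properties in `core-site.xml` look like the following (the property names are standard Hadoop settings; the surrounding deployment details are omitted):

```xml
<!-- core-site.xml: switch from the default "simple" (trust-the-client)
     mode to Kerberos-backed wire-level authentication -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```

Without the first setting, Hadoop accepts whatever username a client claims, which is exactly why upper-level access controls can be bypassed.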
Data errors will occupy organizations in 2014. Do data errors indicate issues with underlying source systems? Are they the result of ETL processes that introduce biases into downstream analysis? Do they reflect definitional differences or a lack of consistency across departments and business segments? 2014 will be the year organizations embrace data anomalies.
2014 will see a dramatic increase in production deployments of Hadoop across industries. This will reveal the power of Hadoop in operations, as production applications combine analytics for measurable business advantage in areas such as customized retail recommendations, fraud detection and the use of sensor data for prescriptive maintenance.
Data hubs offload ETL processing and data from enterprise data warehouses to Hadoop, which acts as a central enterprise hub that is roughly 10 times cheaper and frees capacity for additional analytics processing or new applications.
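The ETL-offload pattern moves transformation work out of the warehouse: raw records land in Hadoop, are cleaned there, and only the refined result moves on. A minimal Python sketch of such a transform step (the field names and cleaning rules are hypothetical, chosen only to show the shape of the work being offloaded):

```python
def transform(raw_rows):
    """Toy ETL step of the kind offloaded from a warehouse to Hadoop:
    drop malformed rows, normalize fields, and emit clean records."""
    clean = []
    for row in raw_rows:
        # Skip rows missing required fields (extract/validate).
        if not row.get("id") or row.get("amount") is None:
            continue
        # Normalize types and formatting (transform).
        clean.append({
            "id": str(row["id"]).strip(),
            "amount": round(float(row["amount"]), 2),
        })
    return clean

raw = [
    {"id": " 42 ", "amount": "19.999"},
    {"id": None, "amount": "5"},      # malformed: dropped
    {"id": "7", "amount": 3},
]
print(transform(raw))  # [{'id': '42', 'amount': 20.0}, {'id': '7', 'amount': 3.0}]
```

In a real deployment this logic would run as a distributed job over the Hadoop cluster rather than consuming warehouse CPU cycles, which is where the cost advantage comes from.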
The ability to leverage big data will emerge as the competitive weapon in 2014. More companies will use big data and Hadoop to pinpoint individual consumers' preferences for profitable up-sell and cross-sell opportunities, better mitigate risk, and reduce production and overhead costs.
Organizations will transition away from developers driving big data initiatives. Increasingly, IT will be tasked with defining the data infrastructure required to support diverse applications, and will focus on the infrastructure required to deploy, process and protect an organization's core asset: its data.
There were a large number of SQL initiatives for Hadoop in 2013, and 2014 will be the year that search, the unstructured counterpart to SQL, comes into full focus. Integrating search into Hadoop provides a simple, intuitive way for any business user to locate important information. Search engines are also the core of many discovery and analysis applications, including recommendation engines.
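Search's role in discovery applications rests on the inverted index, which maps each term to the documents containing it. A minimal Python sketch of the data structure (an illustration only, not how any particular Hadoop search integration is implemented):

```python
from collections import defaultdict

def build_index(docs):
    """Build an inverted index: each term maps to the set of
    document IDs whose text contains that term."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {
    "d1": "fraud detection with hadoop",
    "d2": "retail recommendations with hadoop",
}
index = build_index(docs)
print(sorted(index["hadoop"]))  # ['d1', 'd2']
print(sorted(index["fraud"]))   # ['d1']
```

Because lookups go term-first rather than document-first, a business user can find relevant records without knowing any schema, which is exactly what makes search a natural complement to SQL on Hadoop.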
Hadoop will continue to displace other IT spending, disrupting the enterprise data warehouse and enterprise storage markets. For example, Oracle has missed top-line targets in five of the last 10 quarters, and Teradata has missed revenue and earnings expectations in four of the last five.
More organizations will realize that Apache Hadoop alone isn't enterprise-ready. Apache Hadoop wasn't designed for system administration or for common enterprise IT processes such as disaster recovery. Enterprises will continue to move toward hybrid solutions that combine architectural innovations with Apache Hadoop's open-source core.