Amazon Web Services (AWS) recently announced support for Impala, the open source query engine developed by Cloudera for querying data stored in the Hadoop Distributed File System (HDFS) or HBase using SQL-like syntax. The announcement elaborates:
Impala raises the bar for query performance while retaining a familiar user experience. With Impala, you can query data, whether stored in HDFS or Apache HBase – including SELECT, JOIN, and aggregate functions – in real time. Furthermore, it uses the same metadata, SQL syntax (Hive SQL), ODBC driver and user interface (Hue Beeswax) as Apache Hive, providing a familiar and unified platform for batch-oriented or real-time queries. (For that reason, Hive users can utilize Impala with little setup overhead.)
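To make the kind of query described above concrete, here is a minimal sketch of a statement that combines SELECT, JOIN, and aggregate functions. It is illustrative only: the `customers` and `orders` tables and their columns are hypothetical, and Python's built-in `sqlite3` module stands in for a real cluster so the example runs anywhere. It does not connect to Impala or Hive, and a real deployment would submit the query through an Impala client instead.

```python
import sqlite3

# A query of the general shape the announcement describes: a SELECT with a
# JOIN and aggregate functions. Table and column names are hypothetical.
QUERY = """
SELECT c.region, COUNT(*) AS order_count, SUM(o.amount) AS total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.region
ORDER BY c.region
"""


def run_demo():
    # In-memory SQLite database used purely as a local stand-in for a cluster.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "east"), (2, "west")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10), (1, 15), (2, 7)],
    )
    # Run the join-and-aggregate query and collect one row per region.
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for region, order_count, total_amount in run_demo():
        print(region, order_count, total_amount)
```

The query text itself is ordinary SQL of the sort Hive users already write, which is the point the announcement makes about shared syntax; only the engine executing it differs.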
Amazon Web Services introduced Impala as part of its Amazon Elastic MapReduce service. Users will need to run clusters on Hadoop 2.x in order to take advantage of Impala. Impala users can run queries on data sets in real time with low latency, thanks to the platform’s distributed query engine, which gives Impala speed and performance benefits over Apache Hive. The availability of Impala on the Amazon Web Services platform comes just weeks after AWS released Amazon Kinesis, its platform for collecting and storing real-time big data streams, and underscores the seriousness with which AWS plans to deploy products designed for the big data space.