Tuesday, February 26, 2013

What is Hive

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time, this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
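As a rough sketch of what plugging in a custom mapper can look like: Hive can stream rows through an external script with its TRANSFORM clause, handing each row to the script's standard input as one tab-separated line and reading tab-separated lines back from its standard output. The script below is only an illustration. The column layout (a user id and a page URL) and the function names are assumptions made for this example, not part of Hive.

```python
import sys


def map_row(line):
    """Turn one tab-separated input row into one tab-separated output row.

    Assumed (hypothetical) input columns: user_id, url.
    Output columns: user_id, host part of the url.
    """
    user_id, url = line.rstrip("\n").split("\t")
    # Drop the scheme if there is one, then keep everything before the first "/".
    host = url.split("://", 1)[-1].split("/", 1)[0]
    return user_id + "\t" + host


def map_rows(lines):
    """Yield one output row per non-blank input line."""
    for line in lines:
        if line.strip():
            yield map_row(line)


def main():
    # Hive writes rows to the script's stdin and reads results from its stdout.
    for row in map_rows(sys.stdin):
        sys.stdout.write(row + "\n")
```

Saved as a script (with a call to main() added at the bottom), this could be registered with ADD FILE and invoked from HiveQL along the lines of SELECT TRANSFORM (user_id, url) USING 'python extract_host.py' AS (user_id, host) FROM page_views; where the file name and the table name are hypothetical.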



Hive is an abstraction on top of MapReduce: it allows users to query data in the Hadoop cluster without knowing Java or MapReduce. It uses the HiveQL language, which is very similar to SQL.

8 points about Hive:
  • Hive was originally developed at Facebook
  • Provides a very SQL-like language
  • Can be used by people who know SQL
  • Enabling Hive requires almost no extra work by the system administrator
  • Hive ‘layers’ table definitions on top of data in HDFS
  • Hive tables are stored in Hive’s ‘warehouse’ directory in HDFS (by default, /user/hive/warehouse)
  • Tables are stored in subdirectories of the warehouse directory
  • Actual data is stored in flat files: control-character-delimited text or SequenceFiles
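
To make the storage layout concrete, here is a small sketch of how one line from such a text file could be read outside Hive. It assumes the table uses Hive's default text format, where fields are separated by the Ctrl-A control character (\x01) and NULL is written as \N. The table name, the sample row, and the helper names are made up for illustration.

```python
WAREHOUSE_DIR = "/user/hive/warehouse"  # default warehouse location in HDFS
FIELD_DELIMITER = "\x01"                # Ctrl-A, the default field separator
NULL_MARKER = "\\N"                     # how default text files spell NULL


def table_directory(table_name, warehouse_dir=WAREHOUSE_DIR):
    """Return the HDFS subdirectory holding a table in the default database."""
    return warehouse_dir.rstrip("/") + "/" + table_name


def parse_row(line):
    """Split one line of a delimited text file into a list of field values."""
    fields = line.rstrip("\n").split(FIELD_DELIMITER)
    return [None if field == NULL_MARKER else field for field in fields]


# Hypothetical table and row, for illustration only.
print(table_directory("page_views"))
print(parse_row("42\x01http://example.com/\x01\\N\n"))
```

Every value comes back as a string (or None); the column types live only in Hive's table definition, which is what layering a table on top of the files means in practice.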
Hive is a data warehousing tool on top of Hadoop. Its query language is much like SQL:
  • SQL-like queries
  • SHOW TABLES, DESCRIBE, DROP TABLE
  • CREATE TABLE, ALTER TABLE
  • SELECT, INSERT
Hive Limitations
  • Not all ‘standard’ SQL is supported
  • No support for UPDATE or DELETE
  • No support for INSERTing single rows
  • Relatively limited number of built-in functions
  • No datatypes for date or time; use the STRING datatype instead. Newer versions are expected to support date and time datatypes.
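
One way to live with the STRING workaround, sketched here outside Hive: if dates are written in the fixed-width yyyy-MM-dd form, plain string comparison puts them in chronological order, so range filters and sorting on a STRING column behave sensibly. The sample values are invented, and the format choice is an assumption for this example rather than something Hive enforces.

```python
from datetime import date


def to_date_string(d):
    """Format a date as zero-padded yyyy-MM-dd text for a STRING column."""
    return d.isoformat()


def in_range(value, start, end):
    """String comparison that matches date order for yyyy-MM-dd values."""
    return start <= value <= end


# Invented sample values, deliberately out of order.
days = [date(2013, 2, 26), date(2012, 12, 31), date(2013, 1, 5)]
as_strings = [to_date_string(d) for d in days]

# Sorting the strings gives the same order as sorting the dates.
assert sorted(as_strings) == [to_date_string(d) for d in sorted(days)]
```

The zero padding matters: a value such as 2013-2-6 would sort after 2013-10-01 as text, so unpadded dates would break this trick.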
