Big data: What is Hive

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time, this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.

Hive is an abstraction on top of MapReduce. It allows users to query data in the Hadoop cluster without knowing Java or MapReduce. It uses the HiveQL language, which is very similar to SQL.
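To get a feel for what Hive hides from the user, here is a small sketch in plain Python of a group-by count written as separate map and reduce steps. It is only an illustration of the idea, not Hive's actual execution plan; the rows and the names `map_phase` and `reduce_phase` are made up for this example. In HiveQL the same result would be roughly one line, such as SELECT page, COUNT(*) FROM visits GROUP BY page; (with `visits` being a hypothetical table).

```python
from collections import defaultdict

# Hypothetical input rows (page, user), standing in for a table stored in HDFS.
rows = [("home", "ann"), ("cart", "bob"), ("home", "cid"), ("home", "ann")]


def map_phase(rows):
    """Emit one (key, 1) pair per row, as a hand-written mapper would."""
    for page, _user in rows:
        yield page, 1


def reduce_phase(pairs):
    """Sum the values for each key, as a hand-written reducer would."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


page_counts = reduce_phase(map_phase(rows))
print(page_counts)  # {'home': 3, 'cart': 1}
```

With Hive, the user writes only the query and Hive generates the equivalent jobs.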

8 points about Hive:

Hive was originally developed at Facebook
Provides a very SQL-like language
Can be used by people who know SQL
Enabling Hive requires almost no extra work by the system administrator
Hive ‘layers’ table definitions on top of data in HDFS
Hive tables are stored in Hive’s ‘warehouse’ directory in HDFS (by default, /user/hive/warehouse)
Tables are stored in subdirectories of the warehouse directory
Actual data is stored in flat files: control-character-delimited text, or SequenceFiles
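The storage points above can be sketched in a few lines of Python. This assumes a table in the default database and Hive's default text field separator, Ctrl-A (`\x01`); the column names, the sample line, and the helpers `table_dir` and `parse_line` are hypothetical and only serve the illustration.

```python
import posixpath

WAREHOUSE_DIR = "/user/hive/warehouse"  # default warehouse directory from the list above
FIELD_DELIMITER = "\x01"  # Ctrl-A, Hive's default field separator for text tables


def table_dir(table_name):
    """Return the HDFS subdirectory holding a table of the default database."""
    return posixpath.join(WAREHOUSE_DIR, table_name)


def parse_line(line, columns):
    """Split one delimited line of a flat file and pair the values with column names."""
    values = line.rstrip("\n").split(FIELD_DELIMITER)
    return dict(zip(columns, values))


# Hypothetical table definition and one raw line from its data file.
columns = ["id", "name", "city"]
raw_line = "42\x01Ann\x01Pune\n"

record = parse_line(raw_line, columns)
print(table_dir("customers"))  # /user/hive/warehouse/customers
print(record)  # {'id': '42', 'name': 'Ann', 'city': 'Pune'}
```

Note that the file itself carries no column names or types. The table definition that Hive layers on top supplies them at query time.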

Hive is a data warehousing tool on top of Hadoop. Its query language is very similar to SQL.

SQL-like queries
SHOW TABLES, DESCRIBE, DROP TABLE
CREATE TABLE, ALTER TABLE
SELECT, INSERT

Hive Limitations

Not all ‘standard’ SQL is supported
No support for UPDATE or DELETE
No support for INSERTing single rows
Relatively limited number of built-in functions
No datatypes for date or time in older releases; use the STRING datatype instead. Newer versions add TIMESTAMP and DATE datatypes.
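When dates are kept in STRING columns, the format matters. A minimal Python sketch, assuming the dates are written as zero-padded year-month-day text (the sample values are made up): such strings sort and compare in chronological order, so ordering and range filters still behave as expected.

```python
from datetime import date

# Hypothetical values from a STRING column holding dates as yyyy-MM-dd.
order_dates = ["2013-02-26", "2012-11-05", "2013-01-15"]

# Zero-padded year-month-day strings sort chronologically as plain text.
sorted_as_text = sorted(order_dates)
sorted_as_dates = sorted(order_dates, key=date.fromisoformat)

# A range filter also works with plain string comparison.
in_2013 = [d for d in order_dates if "2013-01-01" <= d <= "2013-12-31"]

print(sorted_as_text == sorted_as_dates)  # True
print(in_2013)  # ['2013-02-26', '2013-01-15']
```

A format such as day/month/year would not have this property, because the text order would no longer match the date order.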

Tuesday, February 26, 2013