Big data: Hive Applications, Components, Data Model, and Layout

Monday, February 25, 2013

Hive Applications

Log processing
Text mining
Document indexing
Customer-facing business intelligence (e.g., Google Analytics)
Predictive modeling, hypothesis testing

Hive Components

Shell: allows interactive queries, much like the MySQL shell connected to a database; web and JDBC clients are also supported
Driver: session handles, fetch, execute
Compiler: parse, plan, optimize
Execution engine: runs a DAG of stages (MapReduce, HDFS, or metadata operations)
Metastore: schema, location in HDFS, SerDe
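
To make the "DAG of stages" idea concrete, here is a minimal sketch in Python. It is not Hive code: the stage names are hypothetical, and the standard-library `graphlib` module stands in for whatever scheduling the execution engine really does. It only shows that a compiled plan can be treated as a dependency graph and run in an order that respects those dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each stage maps to the set of stages it depends on.
# The names are illustrative only; they mirror the three stage kinds
# mentioned above (MapReduce, HDFS, metadata).
stages = {
    "mr_scan_filter": set(),
    "mr_group_by": {"mr_scan_filter"},
    "hdfs_move_output": {"mr_group_by"},
    "metadata_add_partition": {"hdfs_move_output"},
}

def execution_order(dag):
    """Return one valid order in which the stages could run."""
    return list(TopologicalSorter(dag).static_order())

order = execution_order(stages)
for position, stage in enumerate(order, start=1):
    print(position, stage)
```

A stage only appears after everything it depends on, which is the property a driver needs when it hands stages to the execution engine one at a time.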

Data Model

Tables: typed columns (int, float, string, date, boolean), plus list and map types for JSON-like data
Partitions: e.g., range-partition tables by date
Buckets: hash partitions within ranges (useful for sampling and join optimization)
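
The partition and bucket ideas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Hive's implementation: the column names `ds` and `user_id` are hypothetical, and the hash is a Java-style string hash used as a stand-in, since the real hash depends on the column type and Hive version.

```python
def java_style_hash(text: str) -> int:
    """Stand-in hash: 31 * h + char, wrapped to a signed 32-bit integer."""
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def bucket_for(value: str, num_buckets: int) -> int:
    """Map a value to a bucket number in [0, num_buckets)."""
    # Clear the sign bit first so the modulo result is never negative.
    return (java_style_hash(value) & 0x7FFFFFFF) % num_buckets

def place(row: dict, num_buckets: int):
    """Return (partition value, bucket number) for a row.

    Assumes a hypothetical table partitioned by 'ds' (a date string)
    and bucketed by 'user_id'.
    """
    return row["ds"], bucket_for(row["user_id"], num_buckets)

rows = [
    {"ds": "2013-02-25", "user_id": "alice"},
    {"ds": "2013-02-25", "user_id": "bob"},
    {"ds": "2013-02-26", "user_id": "alice"},
]
for row in rows:
    print(row["user_id"], place(row, num_buckets=4))
```

Because the bucket depends only on the hashed column, the same `user_id` lands in the same bucket number in every partition. That is what makes buckets useful for sampling (read one bucket) and for joins (match bucket to bucket).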

Metastore

Database: namespace containing a set of tables
Holds table definitions (column types, physical layout)
Holds partition metadata
Uses JPOX ORM for implementation; can be stored in Derby, MySQL, and many other relational databases
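
As a rough picture of what the metastore keeps track of, the sketch below models it as an in-memory Python dictionary. Everything here is hypothetical (`TableDef`, `ToyMetastore`, the field names, the sample table); the real metastore persists this information in a relational database through an ORM, as described above.

```python
from dataclasses import dataclass, field

@dataclass
class TableDef:
    """Hypothetical record for one table definition."""
    database: str        # namespace the table belongs to
    name: str
    columns: dict        # column name -> type name
    location: str        # directory holding the table's data
    serde: str           # name of the SerDe used to read and write rows
    partition_keys: list = field(default_factory=list)
    partitions: list = field(default_factory=list)  # known partition values

class ToyMetastore:
    """In-memory stand-in for the metastore, keyed by (database, table)."""

    def __init__(self):
        self._tables = {}

    def create_table(self, table: TableDef) -> None:
        self._tables[(table.database, table.name)] = table

    def get_table(self, database: str, name: str) -> TableDef:
        return self._tables[(database, name)]

    def add_partition(self, database: str, name: str, value: str) -> None:
        self.get_table(database, name).partitions.append(value)

store = ToyMetastore()
store.create_table(TableDef(
    database="default",
    name="page_views",
    columns={"user_id": "string", "url": "string"},
    location="/home/hive/warehouse/page_views",
    serde="example.TextSerDe",  # placeholder name, not a real class
    partition_keys=["ds"],
))
store.add_partition("default", "page_views", "2013-02-25")
print(store.get_table("default", "page_views"))
```

The point of the sketch is the split of responsibilities: the data files live in HDFS, while the schema, location, SerDe, and partition list live in the metastore.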

Physical Layout

Warehouse directory in HDFS, e.g., /home/hive/warehouse
Tables are stored in subdirectories of the warehouse; partitions form subdirectories of tables, and buckets are stored as files within them
Actual data is stored in flat files: control-character-delimited text or SequenceFiles; with a custom SerDe, an arbitrary format can be used
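
The layout can be sketched as simple path construction plus a line parser. This is a sketch under assumptions, not Hive code: the table and column names are hypothetical, partition directories are written as `key=value`, each bucket is assumed to be one file inside its partition directory with a name like `000003_0`, and fields are split on Ctrl-A (`\x01`), the default field delimiter for Hive text tables.

```python
import posixpath

WAREHOUSE = "/home/hive/warehouse"  # example warehouse path from the text

def partition_dir(table, partition_spec):
    """Directory for one partition; partition_spec is a list of (key, value)."""
    parts = [f"{key}={value}" for key, value in partition_spec]
    return posixpath.join(WAREHOUSE, table, *parts)

def bucket_file(table, partition_spec, bucket):
    """Path of one bucket file (file naming here is an assumption)."""
    return posixpath.join(partition_dir(table, partition_spec), f"{bucket:06d}_0")

def parse_row(line, columns, delimiter="\x01"):
    """Split one control-character-delimited text line into a dict."""
    fields = line.rstrip("\n").split(delimiter)
    return dict(zip(columns, fields))

path = bucket_file("page_views", [("ds", "2013-02-25")], bucket=3)
row = parse_row("alice\x01/index.html\n", ["user_id", "url"])
print(path)
print(row)
```

A custom SerDe replaces the role `parse_row` plays here: it decides how the bytes in a file turn into typed columns, which is why the on-disk format can be arbitrary.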
