Monday, February 25, 2013

Hive Application, Components, Model and Layout

Hive Applications
  • Log processing
  • Text mining
  • Document indexing
  • Customer-facing business intelligence (e.g., Google Analytics)
  • Predictive modeling, hypothesis testing
Hive Components
  • Shell: allows interactive queries, much like a MySQL shell connected to a database; web and JDBC clients are also supported
  • Driver: session handles, fetch, execute
  • Compiler: parse, plan, optimize
  • Execution engine: runs a DAG of stages (MapReduce, HDFS, or metadata operations)
  • Metastore: schema, location in HDFS, SerDe
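
The "DAG of stages" idea can be sketched in a few lines of Python. This is only an illustration of running stages in dependency order, not Hive's actual execution engine; the stage names and the plan itself are made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical query plan: each stage maps to the set of stages it depends on.
# The names only hint at the three stage kinds (MapReduce, HDFS, metadata).
plan = {
    "mr_scan_orders": set(),
    "mr_scan_users": set(),
    "mr_join": {"mr_scan_orders", "mr_scan_users"},
    "hdfs_move_result": {"mr_join"},
    "metadata_update": {"hdfs_move_result"},
}


def execution_order(stages):
    """Return the stage names in an order that respects every dependency."""
    return list(TopologicalSorter(stages).static_order())


order = execution_order(plan)
for position, stage in enumerate(order, start=1):
    print(position, stage)
```

Both scan stages have no dependencies, so either may come first; the join always runs after both of them.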
Data Model
  • Tables: typed columns (int, float, string, date, boolean), plus list and map types (for JSON-like data)
  • Partitions: e.g., to range-partition tables by date
  • Buckets: hash partitions within ranges (useful for sampling and join optimization)
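
To make partitions and buckets concrete, here is a rough Python sketch that places rows by date partition and then by hash bucket. The column names, the bucket count, and the use of a plain modulo as the hash are assumptions for illustration; Hive's own hash function may differ.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count chosen for this example


def place(row, num_buckets=NUM_BUCKETS):
    """Return (partition value, bucket number) for one row.

    The partition is the row's date; the bucket is a stand-in hash
    (plain modulo) of the user id, assumed to be a non-negative integer.
    """
    return (row["dt"], row["user_id"] % num_buckets)


# Made-up sample rows.
rows = [
    {"user_id": 7, "dt": "2013-02-24", "url": "/a"},
    {"user_id": 8, "dt": "2013-02-24", "url": "/b"},
    {"user_id": 11, "dt": "2013-02-25", "url": "/c"},
    {"user_id": 12, "dt": "2013-02-25", "url": "/d"},
]

layout = defaultdict(list)
for row in rows:
    layout[place(row)].append(row["url"])

for (dt, bucket), urls in sorted(layout.items()):
    print(dt, bucket, urls)
```

Reading a single bucket of each partition gives a sample of roughly 1/NUM_BUCKETS of the users, and two tables bucketed the same way on the join key can be joined bucket by bucket.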
Metastore
  • Database: namespace containing a set of tables
  • Holds table definitions (column types, physical layout)
  • Partition data
  • Uses the JPOX ORM for its implementation; can be stored in Derby, MySQL, or many other relational databases
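
As a rough picture of what a metastore entry records, the sketch below models one table definition in Python. The class, its field names, and the sample table are hypothetical; this is not the real metastore schema, just the kinds of information listed above.

```python
from dataclasses import dataclass, field


@dataclass
class TableDefinition:
    """Hypothetical stand-in for one table's metastore entry."""
    database: str          # namespace the table lives in
    name: str
    columns: dict          # column name -> type name
    location: str          # table directory in HDFS
    serde: str             # label for the SerDe used to read the files
    partition_keys: list = field(default_factory=list)
    partitions: list = field(default_factory=list)  # known partition locations

    def add_partition(self, **values):
        """Record a partition and return its directory under the table."""
        missing = [k for k in self.partition_keys if k not in values]
        if missing:
            raise ValueError(f"missing partition keys: {missing}")
        spec = "/".join(f"{k}={values[k]}" for k in self.partition_keys)
        path = f"{self.location}/{spec}"
        self.partitions.append(path)
        return path


# Made-up example table.
page_views = TableDefinition(
    database="default",
    name="page_views",
    columns={"user_id": "int", "url": "string"},
    location="/home/hive/warehouse/page_views",
    serde="delimited-text (illustrative label)",
    partition_keys=["dt"],
)
new_partition = page_views.add_partition(dt="2013-02-25")
print(new_partition)
```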
Physical Layout
  • Warehouse directory in HDFS, e.g., /home/hive/warehouse
  • Tables are stored in subdirectories of the warehouse; partitions form subdirectories of tables, and buckets are stored as files within them
  • Actual data is stored in flat files: control-character-delimited text or SequenceFiles; with a custom SerDe, an arbitrary format can be used
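
The layout can be sketched in Python as follows. The table name and sample line are made up. The key=value partition directory naming and the Ctrl-A (\x01) field delimiter follow Hive's defaults for text tables, but treat the helper functions themselves as hypothetical.

```python
import posixpath

WAREHOUSE = "/home/hive/warehouse"  # example warehouse path from the text
FIELD_DELIMITER = "\x01"            # Ctrl-A, Hive's default for text tables


def partition_dir(table, partitions):
    """Build the HDFS directory for one partition of a table.

    `partitions` is an ordered list of (key, value) pairs; each pair
    becomes one key=value subdirectory under the table directory.
    """
    parts = [f"{key}={value}" for key, value in partitions]
    return posixpath.join(WAREHOUSE, table, *parts)


def parse_line(line, delimiter=FIELD_DELIMITER):
    """Split one line of a delimited text file into its raw string fields."""
    return line.rstrip("\n").split(delimiter)


print(partition_dir("page_views", [("dt", "2013-02-25")]))
print(parse_line("7\x01/a\x012013-02-25\n"))
```

Every field comes back as a string; turning "7" into an int according to the column types held in the metastore is the SerDe's job.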
