Hive Applications
- Log processing
- Text mining
- Document indexing
- Customer-facing business intelligence (e.g., Google Analytics)
- Predictive modeling, hypothesis testing
Hive Components
- Shell: allows interactive queries, much like a MySQL shell connected to a database; also supports web and JDBC clients
- Driver: session handles, fetch, execute
- Compiler: parse, plan, optimize
- Execution engine: DAG of stages (M/R, HDFS, or metadata)
- Metastore: schema, location in HDFS, SerDe
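The compiler/execution-engine split can be illustrated with a small Python sketch. It assumes a compiled plan is a DAG in which each stage is a MapReduce job, an HDFS operation, or a metadata update. The engine then runs each stage only after all of the stages it depends on have finished. The stage names, the `PLAN` dictionary, and `run_plan` are hypothetical and are not Hive's internal classes. The sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) in place of Hive's own scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical compiled plan: two independent aggregations feed a combine step,
# whose output is moved into the warehouse and then registered in the metastore.
# Each key is a stage; its value is the set of stages that must finish first.
PLAN: dict[str, set[str]] = {
    "aggregate_clicks (M/R)": set(),
    "aggregate_views (M/R)": set(),
    "combine (M/R)": {"aggregate_clicks (M/R)", "aggregate_views (M/R)"},
    "move_to_warehouse (HDFS)": {"combine (M/R)"},
    "register_partition (metadata)": {"move_to_warehouse (HDFS)"},
}


def run_plan(plan: dict[str, set[str]]) -> list[str]:
    """Run every stage after all of its dependencies; return the order used."""
    executed: list[str] = []
    # static_order() yields each node only after all of its predecessors,
    # and raises graphlib.CycleError if the plan is not actually a DAG.
    for stage in TopologicalSorter(plan).static_order():
        print(f"running {stage}")  # a real engine would launch the job here
        executed.append(stage)
    return executed


if __name__ == "__main__":
    run_plan(PLAN)
```

The two aggregation stages have no dependency on each other, so an engine could run them in parallel. The sketch simply runs them one after the other.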
Hive Data Model
- Tables: typed columns (int, float, string, date, boolean); also list and map types (for JSON-like data)
- Partitions: e.g., to range-partition tables by date
- Buckets: hash partitions within ranges (useful for sampling and join optimization)
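Here is a minimal Python sketch of how partitions and buckets cut down the data a query must read. It assumes a hypothetical page-view table partitioned by date and bucketed on `user_id` into 4 buckets. In HiveQL this would be declared with `PARTITIONED BY` and `CLUSTERED BY (user_id) INTO 4 BUCKETS`. The rows and the names `bucket_of`, `layout`, and `scan` are illustrative only. A plain modulo stands in for Hive's actual bucketing hash function.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count

# Hypothetical page-view rows: (view_date, user_id, url)
ROWS = [
    ("2009-03-01", 101, "/home"),
    ("2009-03-01", 102, "/search"),
    ("2009-03-02", 101, "/cart"),
    ("2009-03-02", 203, "/home"),
    ("2009-03-02", 204, "/help"),
]


def bucket_of(user_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    # Simple modulo hash standing in for Hive's bucketing hash function.
    return user_id % num_buckets


def layout(rows):
    """Group rows by partition (date), then by bucket (hash of user_id)."""
    table = defaultdict(lambda: defaultdict(list))
    for view_date, user_id, url in rows:
        table[view_date][bucket_of(user_id)].append((view_date, user_id, url))
    return table


def scan(table, view_date=None, bucket=None):
    """Read only the partitions and buckets that a filter or sample needs."""
    out = []
    for part, buckets in table.items():
        if view_date is not None and part != view_date:
            continue  # partition pruning: skip a whole date range
        for b, rows in buckets.items():
            if bucket is not None and b != bucket:
                continue  # bucket sampling: read one hash slice only
            out.extend(rows)
    return out
```

For example, `scan(layout(ROWS), view_date="2009-03-02")` touches only one date partition. `scan(layout(ROWS), bucket=1)` returns every row for user 101 across all dates, which is the idea behind Hive's `TABLESAMPLE` on bucketed tables. The same property helps joins. If two tables are bucketed on the join key with the same bucket count, bucket *i* of one table only needs to be matched against bucket *i* of the other.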
Hive Metastore
- Database: namespace containing a set of tables
- Holds table definitions (column types, physical layout)
- Holds partition metadata (partition keys and the HDFS location of each partition)
- Implemented with the JPOX ORM; can be stored in Derby, MySQL, or many other relational databases
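The metastore is a catalog: it records where data lives and how to interpret it, but it never holds the rows themselves. The minimal Python sketch below assumes an in-memory dictionary in place of the JPOX-backed relational store. It maps database names to table definitions, and each definition carries columns, an HDFS location, partition keys, and a SerDe. `ToyMetastore` and `TableDef` are hypothetical names. The SerDe string is Hive's default text SerDe class. The directory naming (`<db>.db` for non-default databases, `key=value` for partitions) is assumed to follow Hive's usual convention.

```python
from dataclasses import dataclass, field


@dataclass
class TableDef:
    """Metadata only: the rows themselves live in files under `location`."""
    name: str
    columns: list[tuple[str, str]]  # (column name, Hive type name)
    location: str                   # HDFS directory for the table's data
    partition_keys: list[str] = field(default_factory=list)
    serde: str = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
    # partition values (one per key) -> HDFS directory of that partition
    partitions: dict[tuple[str, ...], str] = field(default_factory=dict)


class ToyMetastore:
    """In-memory stand-in for the metastore: database -> table -> TableDef."""

    def __init__(self, warehouse: str) -> None:
        self.warehouse = warehouse.rstrip("/")
        self.databases: dict[str, dict[str, TableDef]] = {"default": {}}

    def create_database(self, db: str) -> None:
        self.databases.setdefault(db, {})

    def create_table(self, db: str, name: str, columns: list[tuple[str, str]],
                     partition_keys: tuple[str, ...] = ()) -> TableDef:
        # Assumed layout: default-database tables sit directly in the
        # warehouse; other databases get a "<db>.db" subdirectory.
        base = self.warehouse if db == "default" else f"{self.warehouse}/{db}.db"
        table = TableDef(name, columns, f"{base}/{name}", list(partition_keys))
        self.databases[db][name] = table
        return table

    def add_partition(self, db: str, name: str, values: tuple[str, ...]) -> str:
        table = self.databases[db][name]
        if len(values) != len(table.partition_keys):
            raise ValueError("need exactly one value per partition key")
        # Partition directories are named key=value, nested in key order.
        subdir = "/".join(f"{k}={v}" for k, v in zip(table.partition_keys, values))
        path = f"{table.location}/{subdir}"
        table.partitions[tuple(values)] = path
        return path
```

Adding a partition here only records a path. Loading files into that directory would be a separate HDFS step, in the same way that metadata and data are handled separately in Hive.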
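Hive Physical Layout
- Warehouse directory in HDFS: e.g., /home/hive/warehouse (Hive's default is /user/hive/warehouse)
- Tables are stored in subdirectories of the warehouse; partitions form subdirectories of tables, and each bucket is a file inside its table or partition directory
- Actual data is stored in flat files: control-character-delimited text (Ctrl-A between fields by default) or SequenceFiles; with a custom SerDe, any format can be used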
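Hive's default text format uses control characters as delimiters. Ctrl-A (\001) separates fields, Ctrl-B (\002) separates collection items, and Ctrl-C (\003) separates a map key from its value. NULL is written as \N. The minimal SerDe-style reader and writer below handles that format for a hypothetical four-column schema. It is not Hive's LazySimpleSerDe: `deserialize`, `serialize`, and `SCHEMA` are illustrative names, and only the four types shown are supported.

```python
FIELD_DELIM = "\x01"       # Ctrl-A separates columns
COLLECTION_DELIM = "\x02"  # Ctrl-B separates list items and map entries
MAPKEY_DELIM = "\x03"      # Ctrl-C separates a map key from its value
NULL_TOKEN = "\\N"         # text representation of NULL

# Hypothetical schema: (column name, Hive type)
SCHEMA = [
    ("user_id", "int"),
    ("url", "string"),
    ("tags", "array<string>"),
    ("counts", "map<string,int>"),
]


def parse_value(raw: str, hive_type: str):
    if raw == NULL_TOKEN:
        return None
    if hive_type == "int":
        return int(raw)
    if hive_type == "string":
        return raw
    if hive_type == "array<string>":
        return raw.split(COLLECTION_DELIM) if raw else []
    if hive_type == "map<string,int>":
        entries = raw.split(COLLECTION_DELIM) if raw else []
        return {k: int(v) for k, v in (e.split(MAPKEY_DELIM, 1) for e in entries)}
    raise ValueError(f"unsupported type in this sketch: {hive_type}")


def deserialize(line: str, schema=SCHEMA) -> dict:
    raw_fields = line.rstrip("\n").split(FIELD_DELIM)
    # This sketch reads missing trailing columns as NULL and ignores extras.
    raw_fields += [NULL_TOKEN] * (len(schema) - len(raw_fields))
    return {name: parse_value(raw, t) for (name, t), raw in zip(schema, raw_fields)}


def format_value(value, hive_type: str) -> str:
    if value is None:
        return NULL_TOKEN
    if hive_type == "array<string>":
        return COLLECTION_DELIM.join(value)
    if hive_type == "map<string,int>":
        return COLLECTION_DELIM.join(f"{k}{MAPKEY_DELIM}{v}" for k, v in value.items())
    return str(value)


def serialize(row: dict, schema=SCHEMA) -> str:
    return FIELD_DELIM.join(format_value(row[name], t) for name, t in schema)
```

Because the delimiters are non-printing characters, they rarely clash with real field contents. A custom SerDe replaces exactly this parse/format pair when the files use some other format.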