1. Large amounts of data on the web
2. Nutch was built to crawl this web data
3. The large volume of data had to be saved: HDFS introduced
4. How to use this data? Reporting
5. MapReduce framework built for coding and running analytics
6. Unstructured data (web logs, click streams, Apache logs, server logs): FUSE, WebDAV, Chukwa, Flume and Scribe
7. Sqoop and HIHO for loading RDBMS data into HDFS
8. High-level interfaces required over low-level MapReduce programming: Hive, Pig, Jaql
9. BI tools with advanced UI reporting
10. Workflow tools over MapReduce processes and high-level languages: Oozie
11. Monitor and manage Hadoop, run jobs/Hive, view HDFS (high-level view): Hue, Karmasphere, Eclipse plugin, Cacti, Ganglia
12. Support frameworks: Avro (serialization), ZooKeeper (coordination)
13. More high-level interfaces/uses: Mahout, Elastic MapReduce
14. OLTP also possible: HBase
- Lucene is a text search engine library written in Java.
- HBase is the Hadoop database for random read/write access.
- Hive provides data warehousing tools to extract, transform and load data, and query this data stored in Hadoop files.
- Pig is a platform for analyzing large data sets. It provides a high-level language for expressing data analysis programs.
- Oozie - Workflow engine for interdependent Hadoop jobs. A simple workflow, for example, has four nodes: a start control node, a map-reduce action node, a kill control node, and an end control node.
- Flume - Highly reliable, configurable streaming data collection
- Sqoop - Integrates databases and data warehouses with Hadoop
- Hue - User interface framework and SDK for visual Hadoop applications
- Eclipse is a popular IDE donated by IBM to the open source community.
- Jaql (pronounced "jackal") is a query language for JavaScript Object Notation (JSON).
- ZooKeeper - Coordination service for distributed applications
- Avro is a data serialization system.
- UIMA is an architecture for the development, discovery, composition and deployment of analytics for unstructured data.
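
To make the MapReduce idea from point 5 more concrete, here is a minimal sketch in plain Python. It does not use Hadoop or its API; it only imitates the three phases (map, shuffle, reduce) in memory on a few made-up log lines. The function names map_phase, shuffle, reduce_phase and word_count, as well as the sample input, are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key.

    In a real cluster the framework does this between map and reduce;
    here it is just an in-memory dictionary (an assumption of this sketch).
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input standing in for web server log lines.
    sample = ["GET /index.html", "GET /about.html", "POST /index.html"]
    print(word_count(sample))
```

On a real cluster the mapper and reducer would run on many machines over files stored in HDFS, and tools such as Hive or Pig would generate jobs like this from higher-level queries.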