The main components of Hive are as follows:
- Metastore
- Driver
- Compiler
- Optimizer
- Executor
- Client
While Hadoop/Hive can process nearly any amount of data, optimizations can yield large savings in processing time and cost, and those savings grow in proportion to the amount of data. Many optimizations can be applied in Hive. The techniques we are going to cover are:
- Partitioning
- Bucketing
- Using Tez as Execution Engine
- Using Compression
- Using ORC Format
- Join Optimizations
- Cost-based Optimizer