Friday, April 16, 2021

Hive optimization techniques

The main components of Hive are as follows:

  • Metastore
  • Driver
  • Compiler
  • Optimizer
  • Executor
  • Client


Hadoop and Hive can process nearly any amount of data, but optimizations can yield large savings in processing time and cost, and those savings grow with the amount of data. Many optimizations can be applied in Hive. These are the techniques we are going to cover:

  1. Partitioning
  2. Bucketing
  3. Using Tez as Execution Engine
  4. Using Compression
  5. Using ORC Format
  6. Join Optimizations
  7. Cost-based Optimizer
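
To give a rough idea of how these techniques look in practice, here is a minimal HiveQL sketch. The table name page_views and its columns are hypothetical and chosen only for illustration. The bucket count and compression codec are arbitrary examples, not recommendations. It also assumes Tez is installed on the cluster.

```sql
-- Hypothetical table and column names, for illustration only.

-- 3. Use Tez as the execution engine (assumes Tez is available on the cluster).
SET hive.execution.engine=tez;

-- 7. Enable the cost-based optimizer; it relies on table statistics.
SET hive.cbo.enable=true;

-- 6. Let Hive convert joins against small tables into map-side joins.
SET hive.auto.convert.join=true;

-- 4. Compress intermediate data passed between stages.
SET hive.exec.compress.intermediate=true;

-- 1, 2, 5. A table partitioned by date, bucketed by user, stored as ORC.
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (view_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC
TBLPROPERTIES ('orc.compress' = 'SNAPPY');

-- Gather statistics so the cost-based optimizer has something to work with.
ANALYZE TABLE page_views PARTITION (view_date) COMPUTE STATISTICS;

-- Filtering on the partition column lets Hive read only the matching partition.
SELECT url, COUNT(*) AS views
FROM page_views
WHERE view_date = '2021-04-16'
GROUP BY url;
```

The settings are session-level here. Whether they help depends on the data and the cluster, so it is worth measuring a query before and after changing them.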
