Bigdata and data science by Kartheek Dachepalli: Hive optimization techniques

Friday, April 16, 2021

Hive optimization techniques

The main components of Hive are as follows:

Metastore
Driver
Compiler
Optimizer
Executor
Client

While Hadoop and Hive can process nearly any amount of data, optimizations can yield large savings in processing time and cost, and those savings grow in proportion to the amount of data. Many optimizations can be applied in Hive. This post covers the following techniques:

Partitioning
Bucketing
Using Tez as Execution Engine
Using Compression
Using ORC Format
Join Optimizations
Cost-based Optimizer
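
To make the list above more concrete, here is a minimal HiveQL sketch that touches each technique once. The table name sales_orc, its columns, the bucket count, and the Snappy codec are assumptions chosen for illustration and do not come from a real workload. The SET properties are standard Hive settings, but suitable values depend on your Hive version and cluster, so treat this as a starting point and not as a tuned configuration.

```sql
-- Hypothetical table and column names, for illustration only.

-- Use Tez as the execution engine (assumes Tez is installed on the cluster).
SET hive.execution.engine=tez;

-- Enable the cost-based optimizer and let it read column statistics.
SET hive.cbo.enable=true;
SET hive.stats.fetch.column.stats=true;

-- Compress intermediate data and final query output.
SET hive.exec.compress.intermediate=true;
SET hive.exec.compress.output=true;

-- Join optimization: convert joins against small tables to map-side joins.
SET hive.auto.convert.join=true;

-- Partitioning, bucketing, and ORC format in a single table definition.
CREATE TABLE sales_orc (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(10,2)
)
PARTITIONED BY (order_date STRING)          -- one directory per date
CLUSTERED BY (customer_id) INTO 32 BUCKETS  -- bucket count is an assumption
STORED AS ORC
TBLPROPERTIES ('orc.compress'='SNAPPY');

-- Gather the statistics that the cost-based optimizer relies on.
ANALYZE TABLE sales_orc PARTITION (order_date) COMPUTE STATISTICS;
ANALYZE TABLE sales_orc PARTITION (order_date) COMPUTE STATISTICS FOR COLUMNS;

-- Filtering on the partition column lets Hive read only the matching partition.
SELECT customer_id, SUM(amount) AS total_amount
FROM sales_orc
WHERE order_date = '2021-04-16'
GROUP BY customer_id;
```

Running EXPLAIN on the final query is a simple way to check whether the settings took effect, for example whether only one partition is scanned.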
