Why do we use $CONDITIONS in Apache Sqoop?

Usage of Sqoop:

Sqoop runs 4 mappers by default. When importing a table, Sqoop splits the work on the table's primary key. If the table has no primary key, you must either name a split column with --split-by or run a single mapper with -m 1. When more than one mapper runs, Sqoop divides the range of the split column evenly among the mappers so that they import in parallel, which improves performance.

Each mapper writes its own output file, so the number of mappers = the number of part files on HDFS.

Example:

sqoop import \
  --connect jdbc:mysql://localhost/demo \
  --username root \
  --password cloudera \
  --query 'select * from students where $CONDITIONS' \
  -m 1 \
  --target-dir /user
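The example above passes the query in single quotes. That matters because the shell would otherwise try to expand $CONDITIONS as one of its own variables before Sqoop ever sees it. The following is a small sketch of that shell behaviour only; it does not call Sqoop, and the variable names single, double and escaped are made up for the illustration.

```shell
#!/bin/sh
# Make sure no shell variable named CONDITIONS exists.
unset CONDITIONS

# Single quotes: the shell leaves $CONDITIONS untouched.
single='select * from students where $CONDITIONS'

# Double quotes: the shell expands $CONDITIONS (to nothing here),
# so the placeholder is lost before Sqoop could see it.
double="select * from students where $CONDITIONS"

# Double quotes with an escaped dollar sign keep the placeholder.
escaped="select * from students where \$CONDITIONS"

echo "single:  $single"
echo "double:  $double"
echo "escaped: $escaped"
```

So when the query has to be wrapped in double quotes, write \$CONDITIONS instead of $CONDITIONS.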

Sqoop uses the $CONDITIONS placeholder internally, both to fetch the metadata and to fetch the table data. A free-form --query import must include it in the WHERE clause.

1. To fetch the metadata (column names and data types), Sqoop replaces $CONDITIONS with 1 = 0. This is the first query; it returns an empty result set.

select * from students where $CONDITIONS
select * from students where 1 = 0 --> returns only the metadata (schema), no rows
2. To fetch the data:
2.1. If the number of mappers = 1, Sqoop replaces $CONDITIONS with 1 = 1. This is the next query; it fetches and returns all the data.

select * from students where $CONDITIONS
select * from students where 1 = 1 --> fetches all rows
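2.2. If the number of mappers > 1:
Sqoop replaces $CONDITIONS with a range condition so that each mapper fetches a subset of the data from the RDBMS. A --query import with more than one mapper also needs --split-by to name the column to split on. For example, suppose the students table has a primary key stdId whose values lie between 1 and 100, and we are using 4 mappers. The ranges must not overlap, otherwise rows on a boundary would be imported twice, so the queries look roughly like this:

select * from students where stdId >= 1 AND stdId < 26
select * from students where stdId >= 26 AND stdId < 51
select * from students where stdId >= 51 AND stdId < 76
select * from students where stdId >= 76 AND stdId <= 100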
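The substitutions described above can be imitated with a few lines of shell. This is only a sketch to make the idea concrete, not Sqoop's own code: substitute_conditions and split_predicates are hypothetical helpers, the split is a plain even division of an integer range, and it assumes the minimum and maximum of stdId are already known (Sqoop obtains them with a separate MIN/MAX query on the split column). The exact boundaries Sqoop picks may differ.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; none of this is Sqoop code.

# Replace the literal token $CONDITIONS in a query ($1) with a predicate ($2).
# Assumes the predicate contains no '|', '&' or '\' characters.
substitute_conditions() {
  printf '%s\n' "$1" | sed 's|\$CONDITIONS|'"$2"'|g'
}

# Print one range predicate per mapper for an integer column.
# Arguments: column, min, max, mappers. Assumes max - min + 1 >= mappers.
split_predicates() {
  col=$1; min=$2; max=$3; mappers=$4
  step=$(( (max - min + 1) / mappers ))
  lo=$min
  i=1
  # All ranges except the last are half-open, so no row is read twice.
  while [ "$i" -lt "$mappers" ]; do
    hi=$(( lo + step ))
    echo "$col >= $lo AND $col < $hi"
    lo=$hi
    i=$(( i + 1 ))
  done
  # The last range is closed so that the maximum value is included.
  echo "$col >= $lo AND $col <= $max"
}

query='select * from students where $CONDITIONS'

# Metadata probe, then the single-mapper fetch.
substitute_conditions "$query" '1 = 0'
substitute_conditions "$query" '1 = 1'

# One query per mapper for stdId in 1..100 with 4 mappers.
split_predicates stdId 1 100 4 | while IFS= read -r predicate; do
  substitute_conditions "$query" "$predicate"
done
```

Running the script prints six queries: the metadata probe, the single-mapper query, and four range queries, one per mapper.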
