Why do we use $CONDITIONS in Apache Sqoop?
How Sqoop uses mappers:
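By default, Sqoop runs an import with 4 mappers (map tasks). You can change this with -m or --num-mappers. For a table import (--table), Sqoop splits the work on the table's primary key. If the table has no primary key, you must either name a split column with --split-by or run a single mapper (-m 1 or --autoreset-to-one-mapper); otherwise the import fails. When more than one mapper runs, Sqoop divides the value range of the split column into equal-sized slices, one per mapper, so the mappers can read in parallel. The slices hold equal numbers of rows only if the column's values are evenly distributed.
Each mapper writes its own output file, so the number of mappers equals the number of part files on HDFS.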
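The mapper rules above can be sketched as a small Python model. It is a simplified illustration for a table import (--table), not Sqoop's source code; the function names effective_mappers and part_files are hypothetical.

```python
def effective_mappers(num_mappers=4, primary_key=None, split_by=None,
                      autoreset_to_one_mapper=False):
    """Hypothetical model: how many map tasks a --table import would use.

    Raises ValueError in the case where Sqoop refuses to run the import.
    """
    if num_mappers == 1:
        return 1  # a single mapper needs no split column
    if split_by or primary_key:
        return num_mappers  # this column's value range is divided among mappers
    if autoreset_to_one_mapper:
        return 1  # --autoreset-to-one-mapper falls back to one mapper
    raise ValueError("no primary key: pass --split-by or use -m 1")


def part_files(num_mappers):
    """One output file per mapper; map-only jobs name them part-m-NNNNN."""
    return [f"part-m-{i:05d}" for i in range(num_mappers)]


n = effective_mappers(primary_key="stdId")
print(n)              # 4
print(part_files(n))  # ['part-m-00000', 'part-m-00001', 'part-m-00002', 'part-m-00003']
```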
Example:
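sqoop import \
--connect jdbc:mysql://localhost/demo \
--username root \
--password cloudera \
--query 'select * from students where $CONDITIONS' \
-m 1 \
--target-dir /user
The query is wrapped in single quotes so that the shell passes $CONDITIONS to Sqoop unchanged; inside double quotes you would have to write \$CONDITIONS.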
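If you launch Sqoop from a program instead of a shell, you can pass the arguments as a list (for example to Python's subprocess.run), so no shell quoting is involved and $CONDITIONS reaches Sqoop literally. The sketch below assembles such a list for the command above and checks two rules Sqoop enforces for free-form queries: the query must contain $CONDITIONS, and more than one mapper requires --split-by. The helper build_query_import is hypothetical; it only builds the list and does not run Sqoop.

```python
import shlex


def build_query_import(connect, username, password, query, target_dir,
                       num_mappers=1, split_by=None):
    """Hypothetical helper: assemble `sqoop import` args for a --query import."""
    if "$CONDITIONS" not in query:
        raise ValueError("free-form query must contain $CONDITIONS")
    if num_mappers > 1 and not split_by:
        raise ValueError("--split-by is required when -m > 1 with --query")
    args = ["sqoop", "import",
            "--connect", connect,
            "--username", username,
            "--password", password,
            "--query", query,          # passed verbatim: no shell expansion
            "--target-dir", target_dir,
            "-m", str(num_mappers)]
    if split_by:
        args += ["--split-by", split_by]
    return args


args = build_query_import("jdbc:mysql://localhost/demo", "root", "cloudera",
                          "select * from students where $CONDITIONS", "/user")
# shlex.join shows the equivalent shell command, with the query single-quoted
print(shlex.join(args))
```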
Sqoop uses $CONDITIONS internally, both to fetch the table's metadata and to fetch its data.
1. To fetch the metadata (column names and data types), Sqoop replaces $CONDITIONS with 1=0. The first query it runs therefore returns an empty result set:
select * from students where $CONDITIONS
becomes
select * from students where 1=0 --> no rows come back, only the metadata (schema)
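The effect of the 1=0 substitution can be reproduced with Python's built-in sqlite3 module standing in for MySQL. This is only an analogy with assumed data (the name column and the sample rows are made up): the query returns no rows, but the cursor still reports the column names. Note that sqlite3 exposes only column names in cursor.description, whereas Sqoop reads names and types through JDBC metadata.

```python
import sqlite3

# In-memory stand-in for the MySQL students table (columns and rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (stdId INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])

query = "select * from students where $CONDITIONS"
metadata_query = query.replace("$CONDITIONS", "(1 = 0)")

cur = conn.execute(metadata_query)
rows = cur.fetchall()                        # always-false predicate: no rows
columns = [d[0] for d in cur.description]   # but the schema is still known

print(metadata_query)  # select * from students where (1 = 0)
print(columns, rows)   # ['stdId', 'name'] []
```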
2. To fetch the data:
2.1. If the number of mappers is 1, Sqoop replaces $CONDITIONS with 1=1, an always-true condition. The next query fetches and returns every row:
select * from students where $CONDITIONS
becomes
select * from students where 1=1 --> fetches all rows
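In the same sqlite3 stand-in, the 1=1 substitution is a no-op filter, so a single mapper receives every row. $CONDITIONS can also be combined with your own WHERE conditions using AND; the always-true replacement then leaves your filter in effect. The table contents are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the students table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (stdId INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(1, "Asha"), (2, "Ben"), (3, "Chen")])

# Single mapper: $CONDITIONS becomes an always-true predicate.
query = "select * from students where $CONDITIONS"
all_rows = conn.execute(query.replace("$CONDITIONS", "(1 = 1)")).fetchall()

# $CONDITIONS ANDed with the user's own filter: (1 = 1) keeps the filter intact.
filtered = "select * from students where name <> 'Ben' AND $CONDITIONS"
some_rows = conn.execute(filtered.replace("$CONDITIONS", "(1 = 1)")).fetchall()

print(len(all_rows), len(some_rows))  # 3 2
```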
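2.2. If the number of mappers is greater than 1, Sqoop replaces $CONDITIONS with a range condition so that each mapper fetches only its own subset of rows from the RDBMS. With a free-form --query import you must name the split column yourself with --split-by.
Suppose the students table has a primary key stdId whose values run from 1 to 100, and we use 4 mappers with --split-by stdId. Sqoop first runs a boundary query to find MIN(stdId) and MAX(stdId), then gives each mapper one range. Each range includes its lower bound and excludes its upper bound, except the last, which includes both, so no row is read twice:
select * from students where stdId >= 1 AND stdId < 25
select * from students where stdId >= 25 AND stdId < 50
select * from students where stdId >= 50 AND stdId < 75
select * from students where stdId >= 75 AND stdId <= 100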
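The splitting step can be sketched end to end in Python, again with sqlite3 as the database. This is a simplified integer splitter written for illustration, not Sqoop's exact algorithm (Sqoop's real boundaries can differ slightly), and split_clauses is a hypothetical helper. The sketch runs the boundary query, divides [MIN, MAX] into one range per mapper, substitutes each range for $CONDITIONS, and checks that every row is fetched exactly once.

```python
import sqlite3


def split_clauses(col, lo, hi, num_mappers):
    """Hypothetical simplified splitter: one range condition per mapper."""
    if lo == hi:
        return [f"{col} >= {lo} AND {col} <= {hi}"]
    # Evenly spaced lower bounds plus the maximum; duplicates removed.
    bounds = sorted({lo + (hi - lo) * i // num_mappers
                     for i in range(num_mappers)} | {hi})
    clauses = []
    for i in range(len(bounds) - 1):
        op = "<=" if i == len(bounds) - 2 else "<"  # last range includes MAX
        clauses.append(f"{col} >= {bounds[i]} AND {col} {op} {bounds[i + 1]}")
    return clauses


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (stdId INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(i, f"student{i}") for i in range(1, 101)])

query = "select * from students where $CONDITIONS"
# Boundary query: the user's query wrapped, with $CONDITIONS made always true.
lo, hi = conn.execute(
    "SELECT MIN(stdId), MAX(stdId) FROM ("
    + query.replace("$CONDITIONS", "(1 = 1)") + ") AS t1").fetchone()

counts, seen = [], set()
for clause in split_clauses("stdId", lo, hi, 4):
    split_query = query.replace("$CONDITIONS", f"({clause})")  # one per mapper
    ids = [row[0] for row in conn.execute(split_query)]
    counts.append(len(ids))
    seen.update(ids)

print(counts)     # [24, 25, 25, 26]
print(len(seen))  # 100
```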