Posts

Showing posts from October, 2020

Sqoop --target-dir & ---warehouse-dir

  Sqoop store imported data by 2 way: When we import record from RDBMS to HDFS using Sqoop then Sqoop use two way to store imported record on HDFS. 1. --target-dir directory path here 2. --warehouse-dir directory path here Import data using --target-dir in Sqoop: By using --target-dir path of directory, Sqoop create a directory which work as table name (directory path) with imported files. Example -1: Sqoop import --connect jdbc:mysql://localhost/sales --username root --password password --table students --target-dir /user/YT/tarData --where country_name = ‘India’ -m 1 This import statement stored all those students in target directory location, who belongs to India. Import data using --warehouse-dir in Sqoop: Sqoop create a directory which works as database directory and table name directory automatically created with imported files with in warehouse dir . So warehouse directory work like database directory for importing table. Example: Sqoop import --connect ...

Sqoop where condition , Sqoop join two tables

  1. Import data from MySQL using JOIN in Sqoop Example: Sqoop import --connect jdbc:mysql://localhost/sales --username root --password password --target-dir /user/YT/joinedData --query `select s.name, d.name from students s join department d on d.dept_id = s.dept_id where s.country_name = ‘India’ and $CONDITIONS order by d.name` -m 2 Above SQL will display all those student who belongs to India with the student name and department name. 2. Where Clause without query in Sqoop: Here we are not using the query, using only table name and where clause. It will return all the matching rows. Example: Sqoop import --connect jdbc:mysql://localhost/sales --username root --password password --table students --target-dir /user/YT/studentsCountryWiseData --where country_name = ‘India’ -m 1

Sqoop command and list all databases

  1. How to connect & list all databases in Sqoop: Sqoop list-databases tool parses and executes the show databases query against the database server. Syntax: $ sqoop list-databases connection-args Example: $ sqoop list-databases --connect jdbc:mysql//localhost/ --username username --password password It will display the list of databases. 2. Common Arguments in Sqoop: 2.1. --connect jdbc-uri : JDBC connect string. 2.2. --connect-manager class-name: If you write your java program and tell sqoop how to connect database. Suppose you have a java program class like myConnect.java then you can use:- --connect-manager myConnect 2.3. --driver class-name: We can manually specify the jdbc driver. Jdbc:mysql is the jdbc driver. 2.4. --hadoop-home dir: Override the $HADOOP_HOME 2.5. -P: Read the password from console. 2.6. --username username: Set Authentic username. 2.7. --password password: Set Authentic password. 2.8. --verbose: Print more information while working. 2.10...

Why do we use $CONDITIONS in Apache Sqoop?

Usage of Sqoop:  By default 4 mapper running in sqoop. When we are going to import data and if a table has a primary key then by default 4 mapper otherwise 1 mapper runs. If more than 1 mapper will run then sqoop distributes the data equally among the mappers to get high performance. Number of mappers = Number of part files on the HDFS.

How to use password-file in Apache Sqoop?

  There are 3 ways to pass password in sqoop import command: 1. --password mypassword --> not secured way 2. –P: Read the password from the console. 3. Reading password from a file. Reading the password from a file: Syntax: --password-file password-filename Example: Sqoop import --connect jdbc:mysql://localhost/sales --username root --password-file my_sqoop_password --table student

SQOOP file format

  How many file  formats  in Sqoop?   Sqoop supports the following 4 file  formats  for import operation.    1. Text file format.    2. Avro file format.    3. Sequence file format.    4. Parquet file format.    1st is  text file format  and  other three are binary file format .    Binary formats are Avro, Sequence and Parquet file.     Text file format in Sqoop:     1. Text file format: Apache Sqoop uses text file format as the default file format for importing the data from SQL to HDFS.    If we want to import data into text file then we need to specify --as- textfile  in  sqoop  command.  

Compression technique in Sqoop:

  Compression technique in Sqoop: While saving data to HDFS, decrease the overall size occupied on HDFS by using the compression technique. Mainly 3 compression techniques are available in sqoop. 1. Gzip Compression 2. Snappy Compression 3. Bzip2 Compression --compress is used to enable the compression. --compression-codec is used with a specific compression algorithm. Compression technique in Sqoop: When using the --compress parameter in sqoop command, output files will be compressed using the Gzip codec, and all files will end up with a .gz extension. Gzip Files Extension .gz. Bzip2 Files Extension .bz2 . Snappy Files Extension .snappy . Gzip Compression Speed medium . Bzip2 Compression Speed slow . Snappy Compression Speed fast . if 100GB size medium 50 ,high 40) Gzip Degree of Compression medium . Bzip2 Degree of Compression high . Snappy Degree of Compression medium .