Compression technique in Sqoop:

While saving data to HDFS, you can reduce the space it occupies by compressing the imported files. Three compression techniques are mainly used with Sqoop:

1. Gzip Compression
2. Snappy Compression
3. Bzip2 Compression

--compress enables compression. --compression-codec selects a specific compression algorithm.

When you use only the --compress parameter in a sqoop command, the output files are compressed with the Gzip codec and all of them end with a .gz extension.

Gzip: file extension .gz, compression speed medium, degree of compression medium.
Bzip2: file extension .bz2, compression speed slow, degree of compression high.
Snappy: file extension .snappy, compression speed fast, degree of compression medium (usually somewhat lower than Gzip).

As a rough illustration, 100 GB of data might shrink to about 50 GB at a medium degree of compression and to about 40 GB at a high degree.

You need to make sure that compressed map output is allowed in your Hadoop configuration file.

Configure Compression technique in Sqoop:

Find the properties below in mapred-site.xml:

name: mapreduce.map.output.compress
value: true

name: mapreduce.map.output.compress.codec
value: one of the following, depending on the codec you want
org.apache.hadoop.io.compress.GzipCodec
org.apache.hadoop.io.compress.SnappyCodec
org.apache.hadoop.io.compress.BZip2Codec

Gzip Compressing technique:

Compress imported data with the Gzip compression technique. By default, Sqoop uses GzipCodec when compression is enabled. The parameter that enables compression is --compress.

1.1. Gzip Compression using --compress

If you pass only --compress, Sqoop compresses the output with Gzip.

Example:
$ sqoop import --connect jdbc:mysql://localhost/database-name --username root --password mypassword --table cities --target-dir /user/YT/cities_gz --compress
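After an import like the one above, the files in the target directory should carry the extension of the codec that was used. The sketch below classifies file names by extension. The function name codec_for_file and the sample file names are hypothetical; in practice the names would come from a listing such as hdfs dfs -ls /user/YT/cities_gz.

```shell
#!/bin/sh
# Map an output file name to the compression format implied by its extension.
# The extensions are the ones listed in this post (.gz, .bz2, .snappy).
codec_for_file() {
  case "$1" in
    *.gz)     echo "gzip" ;;
    *.bz2)    echo "bzip2" ;;
    *.snappy) echo "snappy" ;;
    *)        echo "uncompressed or unknown" ;;
  esac
}

# Hypothetical file names, used only to exercise the function.
for f in part-m-00000.gz part-m-00000.bz2 part-m-00000.snappy part-m-00000; do
  printf '%s -> %s\n' "$f" "$(codec_for_file "$f")"
done
```

If a file in the target directory has no codec extension after a --compress import, that is a hint that compression may be disabled in the Hadoop configuration.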
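1.2. Gzip Compression using --compression-codec GzipCodec

Here the codec is named explicitly. Keep --compress in the command, since that is the parameter that enables compression. -m 1 runs the import with a single mapper.

Example:
$ sqoop import --connect jdbc:mysql://localhost/database-name --username root --password mypassword --table cities --target-dir /user/YT/cities_gz --compress --compression-codec org.apache.hadoop.io.compress.GzipCodec -m 1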
1.3. Snappy Compressing technique:

Compress imported data with the Snappy compression technique. As with the other codecs, --compress enables compression and --compression-codec selects the codec.

Example:
$ sqoop import --connect jdbc:mysql://localhost/database-name --username root --password mypassword --table cities --target-dir /user/YT/cities_snappy --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec

1.4. Bzip2 Compressing technique:

Compress imported data with the Bzip2 compression technique.

Example:
$ sqoop import --connect jdbc:mysql://localhost/database-name --username root --password mypassword --table cities --target-dir /user/YT/cities_bz2 --compress --compression-codec org.apache.hadoop.io.compress.BZip2Codec
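The three codec examples differ only in the codec class and the target directory, so the command can be assembled from a short codec name. The sketch below only prints the command and does not run Sqoop. codec_class and build_import are hypothetical helper names, the connection details are copied from the examples in this post, and passing --compress together with --compression-codec is an assumption that compression has to be enabled explicitly.

```shell
#!/bin/sh
# Resolve a short codec name to the Hadoop codec class named in this post.
codec_class() {
  case "$1" in
    gzip)   echo "org.apache.hadoop.io.compress.GzipCodec" ;;
    snappy) echo "org.apache.hadoop.io.compress.SnappyCodec" ;;
    bzip2)  echo "org.apache.hadoop.io.compress.BZip2Codec" ;;
    *)      echo "unknown codec: $1" >&2; return 1 ;;
  esac
}

# Print (do not execute) a compressed import command for a table and codec.
# Hypothetical helper: the target directory pattern /user/YT/<table>_<codec>
# is an illustration, not a Sqoop convention.
build_import() {
  table=$1
  codec=$2
  class=$(codec_class "$codec") || return 1
  echo "sqoop import --connect jdbc:mysql://localhost/database-name" \
       "--username root --password mypassword --table $table" \
       "--target-dir /user/YT/${table}_${codec}" \
       "--compress --compression-codec $class"
}

build_import cities snappy
```

Printing the command first makes it easy to review the codec class and target directory before running the import by hand.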
