LZ4 changes encoding

Soheil Pourbafrani
Hi,

I am writing text files to HDFS with the Java API. With the Gzip codec everything is fine: the compressed data is written to HDFS with the same encoding. With the LZ4 codec, however, the text encoding seems to change, and when I check the compressed text files on HDFS their contents are unreadable. Here is the code I use:

org.apache.hadoop.conf.Configuration conf = new Configuration();
CompressionCodecFactory ccf = new CompressionCodecFactory(conf);
CompressionCodec codec = ccf.getCodecByClassName(Lz4Codec.class.getName());
FileSystem fileSystem = FileSystem.get(conf);
FSDataOutputStream out = fileSystem.create(path);
OutputStream compressedOutputStream = codec.createOutputStream(out);
BufferedWriter cout = new BufferedWriter(new OutputStreamWriter(compressedOutputStream));
cout.write(text_data + "\n");
How can I force the LZ4 codec to compress the data as it is, using the same encoding or UTF-8?
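
For reference, here is a minimal sketch of how the encoding and the compression layers relate. It assumes the character encoding is decided only by the OutputStreamWriter, and that the codec only sees bytes. To stay self-contained it uses the JDK's GZIPOutputStream and GZIPInputStream as stand-ins for the Hadoop codec streams. The class and method names are hypothetical. With Hadoop, the same wrapping would presumably go through codec.createOutputStream(...) and codec.createInputStream(...).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: GZIP stands in for the Hadoop codec streams.
final class CompressedTextRoundTrip {

    // Encode the text as UTF-8 (explicit charset), then compress the bytes.
    static byte[] compress(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(
                new GZIPOutputStream(sink), StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed here, so the compressed stream is complete.
        return sink.toByteArray();
    }

    // Decompress the bytes, then decode them as UTF-8.
    static String decompress(byte[] compressed) {
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)),
                StandardCharsets.UTF_8)) {
            StringBuilder result = new StringBuilder();
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                result.append(buffer, 0, read);
            }
            return result.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "héllo wörld\n";
        byte[] compressed = compress(text);
        // The raw compressed bytes are binary, not readable text; the
        // original text only reappears after decompression and decoding.
        System.out.println(decompress(compressed).equals(text));
    }
}
```

In this sketch the compressed bytes are never meant to be read as text directly. Whether the file looks readable when inspected on HDFS depends on whether the inspecting tool decompresses it first, which is an assumption worth checking for each codec.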