Monday, October 15, 2012

Manipulating Zip Files with PeopleCode

I've seen a few forum posts that show how to zip files using either Exec or the XML Publisher PSXP_RPTDEFNMANAGER:Utility app package. Those are great options, but they might not fit every scenario. Since the Java API includes support for zip files, let's investigate how we can use it to create and extract zip files.

Java allows developers to create zip files by writing data to a ZipOutputStream. We've used OutputStreams a few times on this blog to write data to files. A ZipOutputStream is just a wrapper around an OutputStream that writes its contents in the zip file format. Here is an example of reading a text file and writing it out to a ZipOutputStream:

REM ** The file I want to compress;
Local string &fileNameToZip = "c:\temp\blah.txt";

REM ** The internal zip file's structure -- internal location of blah.txt;
Local string &zipInternalPath = "my/internal/zip/folder/structure";

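REM ** False = overwrite an existing compressed.zip; True (append) would leave the old archive's bytes in front of the new one;
Local JavaObject &zip = CreateJavaObject("java.util.zip.ZipOutputStream", CreateJavaObject("java.io.FileOutputStream", "c:\temp\compressed.zip", False));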

Local JavaObject &file = CreateJavaObject("java.io.File", &fileNameToZip);
REM ** We will read &fileNameToZip into a buffer and write it out to &zip;
Local JavaObject &buf = CreateJavaArray("byte[]", 1024);

Local number &byteCount;
Local JavaObject &in = CreateJavaObject("java.io.FileInputStream", &fileNameToZip);

Local JavaObject &zipEntry = CreateJavaObject("java.util.zip.ZipEntry", &zipInternalPath | "/" | &file.getName());

REM ** Make sure zip entry retains original modified date;
&zipEntry.setTime(&file.lastModified());

&zip.putNextEntry(&zipEntry);

&byteCount = &in.read(&buf);

While &byteCount > 0
   &zip.write(&buf, 0, &byteCount);
   &byteCount = &in.read(&buf);
End-While;

&in.close();
&zip.flush();
&zip.close();
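For comparison, here is roughly what the PeopleCode above does, written as plain Java. It is a sketch rather than a drop-in replacement, and the class name ZipOneFile and its zipFile method are made up for illustration. Java's try-with-resources closes both streams even when a read or write fails, something the PeopleCode version does not guarantee.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipOneFile {

    /** Writes sourceFile into a new zip (overwriting zipFile) as internalPath/sourceFile's name. */
    public static void zipFile(File sourceFile, String internalPath, File zipFile) {
        // try-with-resources closes both streams even if a read or write fails
        try (InputStream in = new FileInputStream(sourceFile);
             ZipOutputStream zip = new ZipOutputStream(new FileOutputStream(zipFile, false))) {
            ZipEntry entry = new ZipEntry(internalPath + "/" + sourceFile.getName());
            entry.setTime(sourceFile.lastModified()); // keep the original modified date
            zip.putNextEntry(entry);

            byte[] buf = new byte[1024];
            int byteCount;
            while ((byteCount = in.read(buf)) > 0) {
                zip.write(buf, 0, byteCount);
            }
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("blah", ".txt");
        Files.write(src.toPath(), "Some text to compress".getBytes(StandardCharsets.UTF_8));
        File zip = File.createTempFile("compressed", ".zip");
        zipFile(src, "my/internal/zip/folder/structure", zip);
        System.out.println("Wrote " + zip + " (" + zip.length() + " bytes)");
    }
}
```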

To add multiple files to a single zip file, we can convert the above code into a function (preferably a FUNCLIB function) and then call it multiple times, once for each file:

Function AddFileToZip(&zipInternalPath As string, &fileNameToZip As string, &zip As JavaObject)
   Local JavaObject &file = CreateJavaObject("java.io.File", &fileNameToZip);
   REM ** We will read &fileNameToZip into a buffer and write it out to &zip;
   Local JavaObject &buf = CreateJavaArray("byte[]", 1024);
   
   Local number &byteCount;
   Local JavaObject &in = CreateJavaObject("java.io.FileInputStream", &fileNameToZip);
   
   REM ** An empty &zipInternalPath puts the file at the root of the zip instead of under "/";
   Local string &entryName = &file.getName();
   If &zipInternalPath <> "" Then
      &entryName = &zipInternalPath | "/" | &entryName;
   End-If;
   Local JavaObject &zipEntry = CreateJavaObject("java.util.zip.ZipEntry", &entryName);
   
   REM ** Make sure zip entry retains original modified date;
   &zipEntry.setTime(&file.lastModified());
   
   &zip.putNextEntry(&zipEntry);
   
   &byteCount = &in.read(&buf);
   
   While &byteCount > 0
      &zip.write(&buf, 0, &byteCount);
      &byteCount = &in.read(&buf);
   End-While;
   
   &in.close();
   REM ** Finish this entry before the next call adds another one;
   &zip.closeEntry();
End-Function;


REM ** False = overwrite an existing compressed.zip rather than append to it;
Local JavaObject &zip = CreateJavaObject("java.util.zip.ZipOutputStream", CreateJavaObject("java.io.FileOutputStream", "c:\temp\compressed.zip", False));

AddFileToZip("folder1", "c:\temp\file1.txt", &zip);
AddFileToZip("folder1", "c:\temp\file2.txt", &zip);
AddFileToZip("folder2", "c:\temp\file1.txt", &zip);
AddFileToZip("folder2", "c:\temp\file2.txt", &zip);

&zip.flush();
&zip.close();
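The same multi-file loop in plain Java, as a hedged sketch: ZipManyFiles, addFileToZip and entryName are illustrative names, not part of any PeopleSoft API. Files.copy(Path, OutputStream) replaces the manual buffer loop, and an empty folder name puts the file at the root of the zip. ZipOutputStream throws a ZipException if the same entry name is added twice, which is why the four calls above use two different folders.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipManyFiles {

    /** An empty folder means "root of the zip", so no leading "/" ends up in the entry name. */
    public static String entryName(String internalPath, String fileName) {
        return internalPath.isEmpty() ? fileName : internalPath + "/" + fileName;
    }

    /** Rough equivalent of AddFileToZip: writes one file as a new entry in an open zip. */
    public static void addFileToZip(String internalPath, File file, ZipOutputStream zip) throws IOException {
        ZipEntry entry = new ZipEntry(entryName(internalPath, file.getName()));
        entry.setTime(file.lastModified()); // keep the original modified date
        zip.putNextEntry(entry);            // throws ZipException on a duplicate name
        Files.copy(file.toPath(), zip);     // stream the file's bytes into this entry
        zip.closeEntry();
    }

    public static void main(String[] args) throws IOException {
        File file1 = File.createTempFile("file1", ".txt");
        File file2 = File.createTempFile("file2", ".txt");
        Files.write(file1.toPath(), "one".getBytes(StandardCharsets.UTF_8));
        Files.write(file2.toPath(), "two".getBytes(StandardCharsets.UTF_8));
        File out = File.createTempFile("compressed", ".zip");

        // try-with-resources plays the role of &zip.flush() / &zip.close()
        try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream(out))) {
            addFileToZip("folder1", file1, zip);
            addFileToZip("folder1", file2, zip);
            addFileToZip("folder2", file1, zip);
            addFileToZip("", file2, zip); // lands at the root of the zip
        }
        System.out.println("Wrote " + out);
    }
}
```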

The content you zip doesn't have to come from a static file in your file system. It could come from the database or... well, anywhere. Here is an example of zipping static text. In this example I intentionally left out the internal zip file path (folder) to show how to create a zip file with no folder structure.

Local JavaObject &textToCompress = CreateJavaObject("java.lang.String", "This is some text to compress... probably a bloated XML document or something ;)");
Local string &zipInternalFileName = "contents.txt";

REM ** False = overwrite an existing compressed.zip rather than append to it;
Local JavaObject &zip = CreateJavaObject("java.util.zip.ZipOutputStream", CreateJavaObject("java.io.FileOutputStream", "c:\temp\compressed.zip", False));
Local JavaObject &zipEntry = CreateJavaObject("java.util.zip.ZipEntry", &zipInternalFileName);
REM ** Encode explicitly so the bytes don't depend on the server's default charset;
Local JavaObject &buf = &textToCompress.getBytes("UTF-8");
Local number &byteCount = &buf.length;

&zip.putNextEntry(&zipEntry);

&zip.write(&buf, 0, &byteCount);

&zip.flush();
&zip.close();
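Because ZipOutputStream wraps any OutputStream, the archive doesn't have to land on disk at all. This sketch (ZipText and zipText are hypothetical names) writes the zip into a java.io.ByteArrayOutputStream and returns the bytes; in PeopleCode you would presumably create that ByteArrayOutputStream with CreateJavaObject in place of the FileOutputStream. It also encodes the text as UTF-8 explicitly instead of relying on the server's default charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipText {

    /** Returns a complete zip archive, held in memory, containing one text entry. */
    public static byte[] zipText(String entryName, String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName)); // no folder: entry sits at the zip root
            zip.write(text.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Read the bytes only after close(), which writes the zip's central directory
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] archive = zipText("contents.txt", "This is some text to compress...");
        System.out.println("Zip archive is " + archive.length + " bytes");
    }
}
```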

And, finally, unzipping files. The following example prints the text inside each file from a zip file named "compressed.zip" that contains four fictitious text files named file1.txt, file2.txt, file3.txt, and file4.txt.

Local JavaObject &zipFileInputStream = CreateJavaObject("java.io.FileInputStream", "c:\temp\compressed.zip");
Local JavaObject &zipInputStream = CreateJavaObject("java.util.zip.ZipInputStream", &zipFileInputStream);
Local JavaObject &zipEntry = &zipInputStream.getNextEntry();
Local JavaObject &buf = CreateJavaArray("byte[]", 1024);
Local number &byteCount;

While &zipEntry <> Null
   
   REM ** Directory entries have no content, so only read file entries;
   If Not &zipEntry.isDirectory() Then
      Local JavaObject &out = CreateJavaObject("java.io.ByteArrayOutputStream");
      &byteCount = &zipInputStream.read(&buf);
      
      While &byteCount > 0
         &out.write(&buf, 0, &byteCount);
         &byteCount = &zipInputStream.read(&buf);
      End-While;
      
      MessageBox(0, "", 0, 0, &out.toString());
   End-If;
   
   &zipInputStream.closeEntry();
   &zipEntry = &zipInputStream.getNextEntry();
End-While;

&zipInputStream.close();
&zipFileInputStream.close();
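Here is a plain-Java version of the unzip loop, sketched under the assumption that every entry is UTF-8 text. Instead of showing a MessageBox it collects each file's contents into a map keyed by entry name. ReadZipText and readTextEntries are illustrative names.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ReadZipText {

    /** Maps each file entry's name to its contents, decoded as UTF-8; directories are skipped. */
    public static Map<String, String> readTextEntries(InputStream source) {
        Map<String, String> contents = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(source)) {
            byte[] buf = new byte[1024];
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    int byteCount;
                    // read() returns -1 at the end of the current entry, not the whole zip
                    while ((byteCount = zin.read(buf)) > 0) {
                        out.write(buf, 0, byteCount);
                    }
                    contents.put(entry.getName(), new String(out.toByteArray(), StandardCharsets.UTF_8));
                }
                zin.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return contents;
    }

    public static void main(String[] args) throws IOException {
        String zipName = args.length > 0 ? args[0] : "compressed.zip";
        Map<String, String> files = readTextEntries(new FileInputStream(zipName));
        for (Map.Entry<String, String> e : files.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```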

What about unzipping binary files into the file system? I'll let you write that one.
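If you want a starting point, here is one hedged way to do it in Java (UnzipToDirectory and unzip are hypothetical names). Files.copy streams each entry's raw bytes to disk, so binary files survive intact. Because entry names come from whoever built the archive, the sketch also refuses names such as ../../something that would resolve outside the target folder, a problem commonly called "zip slip".

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class UnzipToDirectory {

    /** Extracts every entry under targetDir, creating folders as needed. */
    public static void unzip(InputStream source, Path targetDir) {
        try (ZipInputStream zin = new ZipInputStream(source)) {
            Path root = targetDir.toAbsolutePath().normalize();
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path dest = root.resolve(entry.getName()).normalize();
                // Reject names such as "../../x" that would escape targetDir ("zip slip")
                if (!dest.startsWith(root)) {
                    throw new IOException("Entry outside target folder: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(dest);
                } else if (!dest.equals(root)) { // skip a file entry that names the folder itself
                    Files.createDirectories(dest.getParent());
                    // Copies only the current entry; REPLACE_EXISTING overwrites old files
                    Files.copy(zin, dest, StandardCopyOption.REPLACE_EXISTING);
                }
                zin.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: UnzipToDirectory <zip file> <target folder>");
            return;
        }
        unzip(new FileInputStream(args[0]), Paths.get(args[1]));
    }
}
```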

Password protected zip files? Java doesn't make this easy: the java.util.zip package has no support for encrypted entries. There are a few third-party Java libraries, but as Chris Rigsby has pointed out, using non-standard Java classes (including your own) can be hazardous. For now, the best way to password protect a zip file seems to be to use Exec to call a command-line zip program. On Linux, the zip utility's -P parameter encrypts the archive with a password.

22 comments:

Brett B said...

Is there a significant benefit to doing it this way instead of calling the system unzip?

Jim Marion said...

@Brett, good question. For some, there may be no difference.

The main benefit is that it doesn't require any system utilities. For those that run on Linux, finding a compression utility or writing a shell script is trivial. For Windows, however, there isn't a lot in the GPL/GNU space (except UnxUtils, of course). The example shown here uses the Java API, which is guaranteed to be in your PeopleSoft system.

The second benefit is shown in the third code listing. It shows how to stream information directly into a zip file without having to write to a file first. When it comes to distributed processing, servers, load balancing, etc., it is difficult to make assumptions about the file system. Now, what I didn't show was how to stream a zip out to the response object without ever writing to disk. That sounds like a good post for tonight :)

Neeraj Kholiya said...

Hi Jim

A few months back we had a similar requirement for one of our clients, and we ended up developing two different scripts, one for Unix and one for Windows.

Maintaining and developing the batch scripts was just a nightmare.

This looks like a much cleaner solution.

Jim Marion said...

@Neeraj, Thank you for the feedback. What you mention is a common scenario and why I went the Java/PeopleCode route instead of a batch file.

If you have to password protect the zip files, then the solution is not as clean, but for standard compression, this is a good solution.

Notice that I didn't have to use any reflection with the Java! Just nice, clean method calls.

kane81 said...

Hi, your post inspired me to put online some code I wrote a while back, more specifically for the zip password issue.

I have a free library that provides zipping with or without passwords as well as checking if office documents are encrypted.

The zip password is 'standard' encryption - so compatible with all zipping programs!

The library is Java and there is a wrapper so that it works in PeopleSoft with relative ease.

Example: &zipUtil.CreateZipFolderEncrypted(&FolderPathToZip, &ZipFileToSaveAs, &Password);


http://users.adam.com.au/kane81/PeopleSoft/Utilities/

Jim Marion said...

@Kane81, thanks for sharing!

Raajesh said...

Hi Jim,

A very useful tutorial.

Thanks for this post. I have expanded this tutorial to create TAR files in PeopleCode using the JTAR library.

Jim Marion said...

@Raajesh, thanks for sharing!

Narender Dontu said...

Hello Jim,

I am using the code to zip multiple PDFs into a zip file. But when I unzip the files, the PDFs are under sub-directories.

e.g., if the PDF files are under
\\1.1.1\folder1\folder2\OutPut1.pdf
\\1.1.1\folder1\folder2\OutPut2.pdf

then the resultant zip file has the following structure

1.1.1\folder1\folder2\OutPut1.pdf
1.1.1\folder1\folder2\OutPut2.pdf

Is there a way to leave out the sub-directories and put just the PDF files in the zip file?

Jim Marion said...

@Narender, yes, just set &zipInternalPath to "".

ChiDONEt said...

Is there any way to check the zip integrity before opening the file or beginning the process? Let me try to explain... I receive a lot of zip files by SFTP (and process them at night), but this week I had a problem with a big zip file: while the zip file was still arriving, my process began to open it, and I did not receive any security error from the OS. That is, I can open the zip file while it is still being transferred. I have tried several things, but so far I don't know how to check whether the transfer has finished so that I can open the zip file.

Jim Marion said...

@ChiDONEt, interesting use case. I don't have an answer for you. If this is PeopleSoft related, you can try posting your question on the PeopleSoft OTN General Discussion forum.

ChiDONEt said...

Thanks Jim... I will post on the PeopleSoft community forum...

I know how to resolve it, but I don't like my solution: check the file size of the zip file and store it in a table; on the second run, if the size is the same, I can begin to process the zip file.
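One more heuristic, not from this thread: java.util.zip.ZipFile locates the archive's central directory, which is written at the very end of the file, so a zip that is still being transferred usually fails to open. The sketch below (ZipIntegrityCheck and looksComplete are hypothetical names) opens the file and reads every entry to the end, treating any IOException as "not ready". It is only a heuristic: an archive that is complete but damaged also returns false.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipIntegrityCheck {

    /** True if the file opens as a zip and every entry can be read to the end without error. */
    public static boolean looksComplete(File zipFile) {
        // ZipFile reads the central directory at the end of the file when it opens
        try (ZipFile zf = new ZipFile(zipFile)) {
            byte[] buf = new byte[8192];
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                try (InputStream in = zf.getInputStream(entries.nextElement())) {
                    while (in.read(buf) != -1) {
                        // discard the data; we only care that it can be read and inflated
                    }
                }
            }
            return true;
        } catch (IOException e) {
            return false; // e.g. a ZipException because the end-of-archive record is missing
        }
    }

    public static void main(String[] args) {
        for (String name : args) {
            System.out.println(name + (looksComplete(new File(name)) ? ": complete" : ": not ready or damaged"));
        }
    }
}
```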

Jim Marion said...

@ChiDONEt, I don't know if it is feasible, but an MD5 hash sum is a good way to verify files as well. If you can store that instead of, or in addition to, the file size, that might be helpful. Hashes do take additional time to generate, though.
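A minimal Java sketch of the MD5 idea, using the standard java.security.MessageDigest class (the Md5Checksum class and its md5Hex methods are illustrative names). It streams the file through the digest in 8 KB chunks, so a large zip is never loaded into memory at once, and returns the usual 32-character lowercase hex string.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Checksum {

    /** MD5 of everything readable from the stream, as 32 lowercase hex digits. */
    public static String md5Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n); // feed the data through in chunks
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e); // Java SE requires MD5
        }
    }

    public static String md5Hex(File file) {
        try (InputStream in = new FileInputStream(file)) {
            return md5Hex(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String name : args) {
            System.out.println(md5Hex(new File(name)) + "  " + name);
        }
    }
}
```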

Sandeep Mohanty said...

Is it possible to zip multiple folders under a specified path into one zip file using PeopleCode?

Jim Marion said...

@Sandeep, absolutely! It is easier if you are on PT 8.53 and can use the new Java ZipFileSystem.
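A sketch of that approach, assuming Java 8 or later (the zip file system provider itself arrived in Java 7). Opening a jar:file: URI with the "create" option set to "true" makes a new archive, and Files.walkFileTree copies the folder tree into it, including empty child folders. ZipFolderTree, zipFolder and toZipPath are hypothetical names, and the target zip should not exist yet, since an existing empty file is not a valid zip.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Collections;

public class ZipFolderTree {

    /** Zips everything below sourceDir into a new archive; entry names are relative to sourceDir. */
    public static void zipFolder(final Path sourceDir, Path zipFile) {
        URI uri = URI.create("jar:" + zipFile.toUri());
        // "create" = "true" asks the zip file system provider to create a new archive
        try (FileSystem zipFs = FileSystems.newFileSystem(uri, Collections.singletonMap("create", "true"))) {
            final Path zipRoot = zipFs.getPath("/");
            Files.walkFileTree(sourceDir, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                    if (!dir.equals(sourceDir)) { // the zip root already exists
                        Files.createDirectories(toZipPath(zipRoot, sourceDir.relativize(dir)));
                    }
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                    Files.copy(file, toZipPath(zipRoot, sourceDir.relativize(file)));
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Rebuilds a relative path inside the zip one name at a time, so "\" vs "/" does not matter. */
    private static Path toZipPath(Path zipRoot, Path relative) {
        Path target = zipRoot;
        for (Path part : relative) {
            target = target.resolve(part.toString());
        }
        return target;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("usage: ZipFolderTree <parent folder> <new zip file>");
            return;
        }
        zipFolder(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```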

Sandeep Mohanty said...

@Jim - I am on PT 8.53. I have a requirement: there is a parent folder that contains many child folders (a child folder may or may not contain files like .doc/.pdf etc.). I have to create a zip that contains these child folders and the files within them. Can you please guide me or provide sample code snippets for this? I appreciate your help in this regard.

Thanks,
Sandeep

Jim Marion said...

@Sandeep, good idea. I'll add it to the list.

Becca said...

Jim,

I was surprised that I couldn't find a thread on this topic already - so please excuse my placing this comment here.

I'm creating a CSV file from data loaded into a temp table in an App Engine. This works flawlessly until you reach the requirement that any field in the CSV which is NULL should output a space:
abc, ,1234, ,8/9/10, ,zyx

I've ensured that the spaces are preserved in the record, but cannot seem to keep them when writing to the file. Any ideas?

@Becca

Jim Marion said...

@Becca, did you try enforcing quotes around text?

Jim Marion said...

@Becca, the only other option I know of is to skip the FileLayout and write directly to a file by iterating over a set of rows and applying the appropriate delimiters yourself.
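For that second option, the per-row part might look like this Java sketch (CsvLine and toCsvLine are hypothetical names; in an App Engine you would write each resulting line with the PeopleCode File object). Every null or empty value becomes a single space and no text qualifiers are added, so a value that itself contains a comma would need separate handling.

```java
import java.util.Arrays;
import java.util.List;

public class CsvLine {

    /** Joins fields with commas; a null or empty field is written as a single space. */
    public static String toCsvLine(List<String> fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                line.append(',');
            }
            String value = fields.get(i);
            line.append(value == null || value.isEmpty() ? " " : value);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(Arrays.asList("abc", null, "1234", "", "8/9/10", " ", "zyx")));
    }
}
```

With the row from Becca's example, this produces abc, ,1234, ,8/9/10, ,zyx.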

Becca said...

Jim,

Since I'm not supposed to have any text qualifiers, it looks like I'm going to have to go with option #2. I kinda figured that was the case, but didn't want to write it off.

Thanks again!

@Becca