HTAR - Introduction

HTAR is a utility for aggregating a set of files from the local file system directly into HPSS, creating a file that conforms to the POSIX TAR specification.

It does this without first creating an intermediate file on the local file system; instead, it uses a multithreaded buffering scheme to write files directly into HPSS, thereby achieving high performance.
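
As a rough illustration of writing a TAR stream straight to a destination without an intermediate archive on disk, the sketch below uses Python's standard tarfile module in stream mode. It is not HTAR's implementation: CountingSink and stream_archive are hypothetical names, the sink only counts bytes in place of a real HPSS connection, and the multithreaded buffering is left out.

```python
import io
import tarfile


class CountingSink(io.RawIOBase):
    """Hypothetical stand-in for a remote destination: accepts bytes, never seeks."""

    def __init__(self):
        super().__init__()
        self.bytes_received = 0

    def writable(self):
        return True

    def write(self, data):
        # A real sink would send the bytes over the network here.
        self.bytes_received += len(data)
        return len(data)


def stream_archive(members, sink):
    """Write (name, payload) pairs to the sink as a tar stream."""
    # Mode "w|" makes tarfile treat the sink as a pure stream of blocks,
    # so no temporary archive file is ever created locally.
    with tarfile.open(fileobj=sink, mode="w|", format=tarfile.USTAR_FORMAT) as tar:
        for name, payload in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


sink = CountingSink()
stream_archive([("a.txt", b"hello"), ("b.txt", b"x" * 1000)], sink)
print(sink.bytes_received)  # total bytes handed to the sink
```

The point of the design is that the archive exists only as a sequence of blocks in flight, so local disk space is never needed for the TAR file itself.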

When HTAR creates the TAR file, it also builds an index file, which is stored in the same directory as the TAR file, as shown by the diagram below:

[Figure: HTAR Archive/Index file creation]

The index filename is normally the same as the TAR file name, with a ".idx" suffix added. 
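
To show why an index of member offsets is useful, the following Python sketch scans the 512-byte headers of a TAR archive once, records where each member's data begins, and then reads a single member by seeking directly to it. The names build_index and read_member and the dictionary layout are hypothetical; this is not the format of HTAR's ".idx" file. It assumes a plain ustar archive of regular files with names shorter than 100 bytes.

```python
import io
import tarfile

BLOCK = 512  # tar archives are laid out in 512-byte blocks


def build_index(archive):
    """Scan tar headers and map each member name to (data_offset, size)."""
    index = {}
    pos = 0
    while True:
        archive.seek(pos)
        header = archive.read(BLOCK)
        # The end of the archive is marked by all-zero blocks.
        if len(header) < BLOCK or header == bytes(BLOCK):
            break
        # ustar header: name in bytes 0-99, size as octal text in bytes 124-135.
        name = header[0:100].rstrip(b"\0").decode("utf-8")
        size = int(header[124:136].rstrip(b" \0").decode("ascii"), 8)
        index[name] = (pos + BLOCK, size)
        # Skip the header plus the data, rounded up to a whole block.
        pos += BLOCK + (size + BLOCK - 1) // BLOCK * BLOCK
    return index


def read_member(archive, index, name):
    """Fetch one member by seeking to its recorded offset."""
    offset, size = index[name]
    archive.seek(offset)
    return archive.read(size)


# Build a small in-memory archive to exercise the sketch.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    for member_name, payload in [("a.txt", b"hello"), ("b.txt", b"world!")]:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

index = build_index(buf)
print(read_member(buf, index, "b.txt"))
```

Once the index exists, extracting one member costs a single seek and read rather than a scan of every header that precedes it, which matters most for archives with very many members.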

HTAR provides a number of commands to work with archive files, including:

  • actions to create, list and verify the contents of archive files, and to randomly extract files from within an archive file, using the offset(s) stored in the index file
  • ability to recreate an index file that has been accidentally deleted, and to create index files for TAR-format archive files that were created by other versions of TAR
  • ability to create and verify CRC checksums for member files within the archive file
  • ability to store member files of up to 8^12-1 bytes (approximately 68GB) within the archive
  • ability to specify the HPSS Class of Service for either the archive file, the index file, or both
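
The member-size limit quoted in the list above can be checked with a few lines of Python. Reading 8^12-1 as the largest value that fits in twelve octal digits is an inference from the form of the number, not something stated here; the variable names are illustrative only.

```python
# Largest member size quoted above, assumed to be in bytes.
max_member_bytes = 8**12 - 1

# 8^12 equals 2^36, so the limit is one byte short of 64 GiB.
assert 8**12 == 2**36

# The same value is the largest number expressible in twelve octal digits.
largest_12_digit_octal = int("7" * 12, 8)

print(max_member_bytes)                  # 68719476735
print(round(max_member_bytes / 10**9))   # 69 (decimal gigabytes, rounded)
print((max_member_bytes + 1) // 2**30)   # 64 (binary gibibytes)
```

In decimal units the limit is about 68.7 GB, which matches the "approximately 68GB" figure when truncated.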

HTAR was originally designed to work with 50,000 to 100,000 small (1 to 3 megabyte) files, but it has proven capable of working efficiently with very large archive files containing large numbers of member files. It is now routinely used at some sites to create archives containing tens of millions of files, with archive files reaching sizes of up to 10 terabytes.

HTAR is used as the aggregation mechanism for the newly developed IBM GPFS-HPSS interface, and was demonstrated as part of the Billion File Demo at Supercomputing 2007.