HTAR User Guide



HTAR Version 3.4.2

Last Revision: 10/20/08


Acknowledgements

Many thanks to the Lawrence Livermore National Laboratory (LLNL) for granting permission to use their “HTAR Reference Manual (UCRL-WEB-200720)” as the basis for this document. However, Gleicher Enterprises takes full responsibility for any mistakes or inaccuracies in the text.



Preface  


Scope:  This manual explains the features, usage, options, and error handling of the “HPSS TAR” (HTAR) program. HTAR was designed to bundle large numbers of related small files directly into a single HPSS “archive” file without having to first create a local file, and to efficiently randomly retrieve files from the archive. The manual introduces HTAR, describes the structure of the HTAR files, and provides a description of the HTAR command line, control options, and environment variables. Annotated examples are provided of using HTAR for the tasks it is designed to handle (such as retrieving member files from within an archive, successfully managing very large archives, and using CRC checksums to verify the integrity of member files in the archive). 


Availability:  HTAR is supported on a wide variety of Un*x-based software and hardware platforms, including AIX, Solaris, HPUX, SGI, Linux, MacOSX, and Cygwin on Windows XP.


Support:  Contact your local HPSS consulting staff for help with HTAR, or to report problems. Gleicher Enterprises, LLC also provides HSI and HTAR support for sites that have purchased a Support License.


Printing:  The printable version of this document can be found at: 

http://www.mgleicher.us/Downloads/htar/htar.pdf  



Introduction  

1.1 WHAT IS HTAR?


HTAR ("HPSS TAR") is a utility program that writes TAR-compatible archive (library) files directly into HPSS, without having to first create a local file. Its command line was originally based on that of the AIX “TAR” program, with a number of extensions added to provide extra features.


HTAR's many features include its ability to:  


• bundle many small files together in memory (without using more local disk space, as required by other bundling utilities such as TAR, CPIO or PAX) for more efficient handling and transfer

• send the resulting large archive file directly to HPSS without requiring you to use HSI or other HPSS transfer programs, such as PFTP

• randomly retrieve individual files from an HPSS archive without moving the whole large archive back to your local machine first (or, optionally, without even staging the whole archive to HPSS disk cache if the file currently resides on tape)

• perform high-speed transfers  to and from HPSS by deploying multiple threads and by automatically using the parallel I/O features of HPSS, such as network striping, without requiring any action on your part.

• optionally use Cyclic Redundancy Checksums (CRCs) to help verify the integrity of member files within the archive, at a cost of extra CPU cycles and some transfer performance degradation

• recreate index files that may have been inadvertently deleted

• create index files for TAR-format archives that were created by other TAR utilities

• optionally create and extract files using the local filesystem, using the -E option.

• optionally work with archives on a remote machine (using the -F option) at sites that use the LLNL-modified “wuftpd” FTP server, which can use the HPSS mover transfer protocols.

• easily create incremental backup archives to supplement a master archive with (only) files recently  changed (with -n). 


1.2 HTAR DESIGN GOALS

When HTAR was originally developed, there were a number of design goals that it was intended to achieve, including the following:


• Use the POSIX 1003.1 format for the archive file, in order to allow HTAR-generated archive files to be read by other TAR programs.

• Emulate the AIX “tar” program command line as much as possible

• Create an index file for all of the member files in the archive file, to provide the ability to randomly retrieve files from the archive without having to read the entire archive file.

• Efficiently store and retrieve HPSS-resident archive files containing several tens of thousands of “small” files, with sizes ranging from 1 byte to 3 megabytes, using multi-threaded I/O to be able to read member files in parallel when creating archive files. (Note that HTAR allows member files of up to 8^12-1 bytes, that is, 68,719,476,735 bytes.)

• Provide options to create, list, extract, and rebuild the index file if it should accidentally be deleted


A comparison of traditional UNIX TAR and HTAR features is contained in section "TAR and HTAR Compared". In most cases, HTAR provides equivalent or enhanced versions of standard TAR, with the exception of not providing the ability to append to existing archives. In general, HTAR maintains full output  compatibility with the POSIX 1003.1 standard TAR format while successfully archiving thousands or even  millions of member files, handling files of greatly mixed sizes or types, and imposing no limit on the total size of the HPSS-resident archive files that it builds. 


1.3 PERFORMANCE


HTAR is designed to achieve maximum efficiency when creating large archives in HPSS. HTAR does not invoke another utility to do the actual transfers, such as a separate parallel FTP (PFTP) client, but instead uses its own logic to perform parallel transfers to and from HPSS.


Its use of multiple concurrent threads and a sophisticated buffering scheme, combined with HPSS parallel I/O and network striping, enables HTAR to achieve the maximum transfer rates possible, limited only by the slowest component in the path (local filesystem I/O, network bandwidth, or HPSS bandwidth). Rates for creating HTAR archives in HPSS are usually tens or even hundreds of times faster than storing the small files individually. In most cases, creating an archive directly in HPSS with HTAR will be much faster than either creating a local TAR file and then copying it to HPSS with HSI, or piping TAR output into HSI.


1.4 ARCHIVE/MEMBER FILE RESTRICTIONS


Member files within an HTAR archive can be up to approximately 68Gbyte (exact max is 68,719,476,735 bytes). There is no maximum size for the whole HTAR  archive other than site-imposed restrictions or amount of space available. There is also a maximum number of member files that is defined by the HPSS administrator when HTAR is compiled; the release default is one million member files, with a command line option (-M) to increase this to a site-defined absolute maximum, with the release value set to five million. (HTAR has successfully been used to create archives up to 10 terabytes at one site, and to create archives containing 50 million member files). 
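The member-file limit quoted above can be checked with shell arithmetic. The following is a minimal sketch and not part of HTAR: the function names max_member_bytes and fits_in_archive are hypothetical, and the only fact taken from this manual is the limit itself (8^12 - 1 bytes, which is the same number as 2^36 - 1).

```shell
# Largest member file size stated in this manual: 8^12 - 1 bytes.
# 8^12 equals 2^36, so a 36-bit shift gives the same value.
max_member_bytes() {
    echo $(( (1 << 36) - 1 ))
}

# Succeeds (exit status 0) when the given size in bytes is within the limit.
# Hypothetical helper for illustration; HTAR performs its own checks.
fits_in_archive() {
    [ "$1" -le "$(max_member_bytes)" ]
}

max_member_bytes    # prints 68719476735
```

A script that prepares input for HTAR could call fits_in_archive with each file's size in bytes and report any file that is too large before the archive is built.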


1.5 MULTIPLE COPIES

The default number of copies for HPSS files that are created at each site is defined by the HPSS administrator, based upon the Class of Service that is used for HTAR-created archive files (see next section). HTAR also provides a command line option “-Hcopies=n” to specify the number of copies to be used, but only if automatic COS selection is used.


1.6 CLASS OF SERVICE


The HPSS administrator may specify the COS to be used for HTAR-created archive files at build time, or may specify that automatic COS selection should be used to pick a Class of Service based upon factors such as the archive file size. The HTAR “-Y” command line option can also be specified to pick the COS for either the archive file, the index file, or both, or to specify automatic COS selection. In addition, the HTAR_COS environment variable can be set to specify the COS to use for the archive and/or index files. 


Contact your local HPSS administrator to find out how to specify multiple copies at your site.


2 ABOUT THIS GUIDE. 

Because HTAR combines two features usually separate (file bundling and storing files in HPSS), having some  understanding of how it works can help you use it effectively. So one subsection below shows and explains the relationship among the three files (the archive file, the index file, and the consistency file) that HTAR uses to manage every transaction. Also shown in an appendix is a feature-by-feature comparison of HTAR and traditional UNIX TAR. 

One major section in this manual tells how to run HTAR, and spells out common error conditions, known limitations (with work-arounds), and HTAR environment variables. A second major section describes the function of each HTAR option (distinguishing the required action options from the control options). A third section gives annotated step-by-step examples of how to use HTAR to handle common file-archiving tasks and problems (including creation of archives containing very many files and optional transfer of archives to or from non-HPSS machines at sites which support an LLNL-compatible version of wuftpd with modifications to support the HPSS mover protocol).


HTAR users may also benefit from familiarity with the HSI utility, which provides a user-friendly Unix-style interface to HPSS, with a number of features, such as the ability to recursively store, retrieve and list entire trees with a single command. HSI information is available at:




USING HTAR

1 How HTAR Works 

HTAR makes an archive (or library) file in the standard POSIX 1003.1 TAR format, which allows most vendor-native TAR programs to open any HTAR archive file. But HTAR offers more services than ordinary TAR, and it therefore needs extra internal machinery to support those services. While much of this extra machinery is hidden during normal use, some of it reveals itself in HTAR status messages or command responses that might prove surprising or confusing without some insight into how HTAR works. So this section briefly explains how HTAR makes an archive file and the role that several support files play in that process.


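[Figure 1: HTAR archive creation. Local member files (top) are transferred into the archive file in HPSS (bottom).]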

1.1 Archive File (name.tar)

When you run HTAR with the create-archive (-c) option, the program first opens a connection to HPSS. It then deploys multiple threads to transfer in parallel the local-disk files that you specify (the "archive members," top part of the diagram) into a TAR-format envelope file created (unless you request otherwise) in your HPSS home directory (bottom part of the diagram).


This archive file never exists on local disk (unless you demand it with the -E option), even in temporary directories on the machine where HTAR runs. Instead, HTAR reads the member files piecewise into its internal buffers and moves the data directly to HPSS (or to another host specified by -F), where it assembles the archive, shown as “archive.tar” in Figure 1. The archive file consists of a series of TAR-format headers, each immediately followed by the data for the member file that is described by the TAR header.


HTAR simultaneously builds a separate index file (outside the archive) and a little  consistency file (deposited last inside the archive), which is discussed below. 


HPSS is very reliable, but files of special importance sometimes warrant a Class of Service (COS) that supports two or more copies of a file. COSs are defined by the HPSS administrator at each site, so check with your local HPSS administrator to determine which COS(s) support multiple copies at your site, and then use HTAR's “-Y” option or HTAR's -Hcopies=2 option (if using automatic COS selection) to force creation of a duplicate (invisible) backup copy. You can also use HTAR's “-K” option to verify your archived results.


1.2 Index File (name.tar.idx) 

To allow archives of virtually unlimited size, and to support the direct extraction of any stored  archive member(s) without retrieving the whole archive to local disk, HTAR automatically builds an external index file to accompany every archive that you create. 


While making the archive, HTAR temporarily writes the index file to the local /tmp  file system on the machine where it runs, then transfers it (by default) to the same HPSS (or other remote) directory where the archive itself resides at the end of the process. 


Each HTAR index file contains one 512-byte record for every member file, directory entry, or symbolic link stored in  the corresponding archive file, regardless of the member file's size (so even a 10,000-file archive will have an index file of only about 5 Mbyte). HTAR index files are so much smaller than the archives that they support, that the index file often remains on HPSS disk (to rapidly respond to queries) even when the larger archive file itself migrates from HPSS disk cache to HPSS tape volumes. If you use HTAR's -E or -F options to force the archive to a location other than HPSS, the index file is written to the same location as the archive file. 
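The index-size arithmetic in the paragraph above can be reproduced in a few lines. This is a rough estimate only, assuming exactly one 512-byte record per archived entry and nothing else in the file; the function name index_bytes is hypothetical.

```shell
# Estimated index file size in bytes for a given number of archived entries
# (member files, directory entries, and symbolic links), at 512 bytes each.
index_bytes() {
    echo $(( $1 * 512 ))
}

index_bytes 10000      # prints 5120000   (about 5 Mbyte)
index_bytes 1000000    # prints 512000000 (about 512 Mbyte)
```

Because the temporary index is written on the local machine first, an estimate like this can help judge whether the local temporary filesystem has room for the index of a very large archive.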


1.3 Consistency File (/usr/tmp/HTAR_CF_CHK_nnnnn_mmmmmmmmm) 

Because the archive and index files are separate, HTAR maintains a consistency check  between them in an additional 1-block (256-byte) file always parked (as a last step) at the end of each archive. This consistency file's name has the long numerical format shown above, but it begins with tmpdir, where tmpdir is the temporary file directory (normally /tmp) on the machine where HTAR created the archive file. HTAR never extracts this file (unless you specifically request it), but every use of -t and -v (together with -c or -x) reports this (perhaps unexpected) consistency file at the end of HTAR's list of archived contents. (Currently, the verification option (-K) neither reports this consistency file nor counts it, unlike table-of-contents option -t.) 

How to Use HTAR  

1 HTAR Command Line 

The HTAR command line has the general form:


htar action archive [options] [filelist]


and the specific form  


htar -c|D|K|t|U|x|X -f archivename [-BdEFhHILmMoOPpqSTvVwY] [flist]


where exactly one action and the archivename are always required, while the control options and (except when using -c) the filelist (or flist) can be omitted (and the options can share a hyphen flag with the action for convenience). 


Here,  


-c  (create) opens a connection to HPSS, creates an archive file in the HPSS directory (your home directory by default, not in the local filesystem) with the name specified by -f, and transfers (copies) into the archive each file specified by filelist (required whenever you use -c). If archivename already exists, HTAR overwrites it without warning. To create a local archive file instead (the way TAR does), also use -E; to deposit it on a non-HPSS host, also use -F. If filelist specifies any directories, HTAR includes them and all of their children recursively. Use -P with -c to automatically create all needed subdirectories along the archive pathname.


-D (soft-delete) opens a connection to HPSS, then reads the existing index file, creating a new temporary index file in the local filesystem and marking each of the specified member files as <deleted> in the new index file. It then replaces the existing index file with the new temporary copy.


-K  (uppercase kay, verify) opens a connection to HPSS (or to another remote host specified by -F), verifies the index file for the archive that you specify with -f, then uses the index file to verify every entry (member file) in the archive file itself. The default responses from -K appear very quickly and overwrite, so you may only be able to read the last one ("HTAR successful," if it is). If the index file is missing for an archive, -K reports the error message "no such file archivename.idx." If you combine -K with -v (verbose output), HTAR lists the name of each file (that it successfully finds) in the specified archive in alphabetical order, one per line, along with the size of each in bytes and in blocks (excluding the consistency file (section 1)), then gives a total file count. Compare this output with that from -t (below). 


-t  (table of contents) opens a connection to HPSS, then lists (in the order that they were stored) the files currently within the stored archive file specified by -f, along with their owner, size, permissions, and modification date (the list includes HTAR's own consistency file – see Figure 1). Here filelist defaults to * (all files in the archive), but you can specify a more restrictive subset (usually by making filelist a filter). Compare with -K output above.


-x  (lowercase eks, extract) opens a connection to HPSS (or to another remote host specified by -F), then transfers (extracts, copies) from the HPSS-resident (or remote) archive file specified by -f each internal file specified by filelist (or all files in the archive if you omit filelist). If filelist specifies any directories, HTAR extracts them and all their children recursively. If any file already exists locally, HTAR overwrites it without warning, and it creates all new files with the same owner and group IDs (and if you use -p, with the same UNIX permissions) as they had when stored in the archive. (If you lack needed permissions, extracted files get your own user and group IDs and the local UMASK permissions; if you lack write permission then -x creates no files at all.) Note that -x works directly on the remote archive file, using random access to avoid reading unneeded files; you never retrieve the whole archive from HPSS just to extract a few specified files from within it (impossible with TAR).


-U (undelete) opens a connection to HPSS, then reads the existing index file, creating a new temporary index file in the local filesystem and unsetting the <deleted> flag for each of the specified member files in the new index file. It then replaces the existing index file with the new temporary copy.


-X  (uppercase eks, index) opens a connection to HPSS (or to another remote host specified by -F), then creates an (external) index file for the existing archive file specified by -f (an HPSS-resident TAR-format file by default, a local TAR-format file if you also use -E, or remote if you use -F). Using -X rescues an HTAR archive whose (stored) index file was lost, and it enables HTAR to manage an archive originally created by traditional TAR. The resulting external index file is stored in HPSS if the corresponding archive lives in HPSS, but is stored on the local filesystem if the archive is also local (with -E). See the "How HTAR Works" section (page 11) for an explanation of HTAR index files.


-f archivename (required option) specifies the archive file on which HTAR performs the -c|t|x|X|K actions described above. HTAR has no default for -f (whose argument must appear immediately after the option name). Since HTAR (normally) operates on HPSS-resident archive files, archivename also locates the archive file relative to your HPSS home directory: a simple file name here (e.g., abc.tar) resides in your HPSS home directory, while a relative pathname (e.g., xyz/abc.tar) specifies a subdirectory of your HPSS home directory (i.e., /users/unn/username/xyz/abc.tar). Never use tilde (~) in archivename because the shell expands it into your local home directory, not your HPSS home directory, unless they happen to coincidentally be the same at your site. HTAR's -f makes no subdirectories; you must have created them in advance (with HSI's mkdir command) before you mention them in archivename, (except for the -c command, which provides the -P option to create missing intermediate subdirectories). When used with -F to make or read an archive on a non-HPSS machine, archivename should be the full pathname of the archive on the remote machine (e.g., /var/tmp/abc.tar). 


filelist  specifies the input files for -c and the subset of archived files to process (for -t, -x, or -X). Omitting filelist for -c yields a null result and the error message "refusing to create empty archive." Omitting filelist for -t, -x, -X, or -K defaults to *, all files within the archive file specified by -f. Here filelist can include a blank-delimited list of files, UNIX file filters (wildcard characters), or directory name(s) to be processed recursively.
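The actions described above combine into a typical create, list, verify, and extract sequence. The sketch below uses only options named in this section (-c, -P, -v, -t, -K, -x, -f). The run wrapper, the DRY_RUN variable, and the archive and file names are hypothetical; by default the wrapper only prints each command, so the sequence can be read or tried on a machine where HTAR is not installed.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
# (run and DRY_RUN are illustration-only names, not part of HTAR.)
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical archive path, relative to the HPSS home directory.
archive=projects/run42.tar

run htar -c -P -v -f "$archive" run42        # create; -P makes missing subdirectories
run htar -t -f "$archive"                    # list the archive's contents
run htar -K -v -f "$archive"                 # verify the index and every member
run htar -x -f "$archive" run42/input.dat    # extract one member by name
```

Setting DRY_RUN=0 in the environment makes the same script execute the commands for real. Note that every option precedes the file list, as the syntax rules require.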



1.1 SYNTAX ISSUES

Traditional TAR is such an old utility (whose original use, writing bundled files to local tape drives, is  seldom needed now) that syntax differences have evolved under different versions of the UNIX operating system. AIX and Linux (using the GNU Tar program), for example, offer some different TAR options and use some of the same options (such as -L) for different purposes. 


An itemized comparison of TAR and HTAR features appears in a later section of this  manual. Generally, HTAR syntax follows the more restrictive implementations of TAR. Thus with HTAR:

  1. one "action" -c|t|x|X|K is always required, but it need not come first on the HTAR command line. However, if the first option on the command line starts without a minus sign, but is an HTAR action character, then it is treated as if the option did start with a minus sign. For example, the following two command lines are equivalent:

     htar -c -v -f abc.tar *

     and

     htar cv -f abc.tar *

  2. the archive specifier -f is always required and it must immediately precede its argument (-farchivename), regardless of where that pair falls on the HTAR command line.
  3. any HTAR flag character that requires an argument, such as “-L pathname”, requires that the argument immediately follow the option character, with or without preceding whitespace.
  4. all HTAR options, whatever their order, must precede the first member file name (all options must precede flist or any filters that take the place of flist)
  5. options may share the flag character (-) as long as the other rules above are also followed.


Thus these three combinations  


htar  -c -v -f abc.tar * 
htar  -cvf abc.tar * 
htar  -v -fabc.tar -c * 


are all equivalent, acceptable HTAR command lines.  


1.2 DEFAULTS 

Directories

By default, HTAR creates an archive by copying files from the current local directory where you are working  into a file in your HPSS home directory, and it extracts files by reversing that process. You must always specify the name of the archive file on which HTAR operates (there is never a default archive). In its reports, HTAR appends slash (/) to each directory name listed. 


File Names.  

Once you name the archive with the “-f archivename” option, HTAR names the corresponding external index file archivename.idx by default, and normally stores it in the same HPSS directory as the archive file. HTAR's -I option lets you specify a nondefault name or location for the index file, although in practice this is seldom needed, and it should be used with care, as it is easy to forget where the index file is stored if it is not in the same directory as the archive file. The HTAR consistency file's name begins with tmpdir/HTAR, a default that you cannot change (here tmpdir is the name of the temporary directory used to write the consistency file on the local machine, normally “/tmp”). HTAR consults the “TMPDIR” environment variable to find the directory to use for temporary files on the local machine, but defaults to “/tmp” if TMPDIR is not set.
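The TMPDIR rule described above maps directly onto a shell parameter expansion, which makes it easy to see which directory HTAR would use and how much room it has. This is a sketch under stated assumptions: htar_tmpdir and tmp_free_kb are hypothetical helper names, and treating an empty TMPDIR the same as an unset one is a choice made here, not something this manual specifies.

```shell
# Directory used for temporary files as described above: $TMPDIR if set,
# otherwise /tmp. (Empty TMPDIR is treated as unset; an assumption.)
htar_tmpdir() {
    echo "${TMPDIR:-/tmp}"
}

# Free space, in kilobytes, on the filesystem holding that directory.
# df -P selects the portable POSIX output format, whose fourth column
# on the second line is the available space.
tmp_free_kb() {
    df -Pk "$(htar_tmpdir)" | awk 'NR == 2 { print $4 }'
}

echo "temporary directory: $(htar_tmpdir)"
echo "free space (KB):     $(tmp_free_kb)"
```

To point temporary files at a roomier filesystem for a single session, export TMPDIR before running HTAR, for example with export TMPDIR=/var/tmp in the Korn shell.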


Class of Service (COS).


The HPSS administrator defines (at build time) which COS HTAR should use when creating archive files. If the administrator specifies “auto”, then HTAR consults a site-specific setting to determine

a) whether to store files in a COS that migrates to one or two tape copies after being written to HPSS disk cache, and

b) which COS within the set of single or dual copy COSs to use, based on the archive file's size.

You can override the COS to use by specifying either:


• “-Ycos” where cos is either the numeric COS ID, or the word “auto”, which tells HTAR to select the COS based upon factors such as the archive file size. 


• Setting the “HTAR_COS” environment variable, which has the same form as the “-Y” option. For example, using the Korn shell:

export HTAR_COS=auto


If COS is set to “auto”, you can override the default number of copies by using HTAR's “-Hcopies=n” command line option. In this case, HTAR will only select from COSs that have been defined by the HPSS administrator to make n copies when migrating data from HPSS disk cache to tape.



HTAR Error Conditions  


HTAR prefixes all ordinary messages with the string 'HTAR:', but it prefixes  nonfatal errors with 'INFO:' and fatal errors with 'ERROR:'. Unexpected situations are usually flagged with a '###WARNING' prefix, as the cases below show. 
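When HTAR output is captured to a file, the prefixes described above make it easy to sort messages by severity. The following is a minimal sketch assuming each message occupies one line and the prefix appears at the start of that line; classify_htar_line is a hypothetical helper, and the sample messages are invented for illustration. Lines without one of these prefixes fall into a catch-all "other" class.

```shell
# Classify one line of captured HTAR output by its leading prefix.
classify_htar_line() {
    case "$1" in
        "ERROR:"*)      echo fatal ;;
        "INFO:"*)       echo nonfatal ;;
        "###WARNING"*)  echo warning ;;
        "HTAR:"*)       echo ordinary ;;
        *)              echo other ;;
    esac
}

classify_htar_line "ERROR: sample fatal message"    # prints fatal
classify_htar_line "HTAR: sample status message"    # prints ordinary
```

A loop such as while IFS= read -r line; do classify_htar_line "$line"; done < htar.log (where htar.log is a hypothetical capture file) applies the same test to a whole log.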


Some of the most common error conditions, and HTAR's responses to them, are summarized here  to help you troubleshoot: 


1.1 Problem: HPSS is down. 

When HPSS is unavailable to users (perhaps for maintenance), no HPSS-resident archive can  be read or written. HTAR returns a message of this form and terminates:


hpssex_OpenConnection: unable to obtain remote site info 
result = -5000, errno = 0 
Unable to setup communication to HPSS. Exiting... 


1.2 Problem: Specified archive directory does not exist. 

If -f specifies a child directory (of your HPSS home directory) that you have not previously created, for example with HSI's “mkdir” command, HTAR returns the following error message:


***Error -2 on hpss_Open (create) for archivename 


You can correct this problem by doing either of the following:

  1. use HSI or some other HPSS utility to create the missing subdirectories
  2. use HTAR's “-P” command line option, when initially creating the archive, to cause HTAR to automatically attempt to create any missing subdirectories in the archive file path.

When you attempt to extract files from an archive in a nonexistent (sub)directory,  HTAR replaces the first line of this error message with: 

***Fatal error opening index file archivename.idx 


1.3 Specified archive file does not exist. 

If -f specifies an archive file that does not exist (perhaps because you deleted it or  mistyped its name), HTAR responds: 


[FATAL] no such HPSS archive file: archivename 


1.4 Specified index file does not exist. 

If you try to list (-t) or extract (-x) files from an actual HTAR archive whose  corresponding external index file (archivename.idx) has been deleted or moved, HTAR pinpoints the problem by reporting the missing index name: 

No such file: archivename.idx 


You can work around the missing index by using HTAR's -X (uppercase eks) option  to rebuild the index while the archive remains stored, or you can retrieve the whole archive from HPSS to the local machine with HSI, and then open it with TAR. 


1.5 HTAR's filelist omitted. 

If you try to create (-c) an archive without specifying a  filelist (or without using a filelist replacement such as -L), HTAR connects to HPSS but quickly terminates with the message:


Refusing to create empty archive.  


If you try to list (-t) or extract (-x) without specifying a  filelist, HTAR defaults to processing all files in the archive. 


1.6 HTAR run with no options. 

Because HTAR requires exactly one action (-c|t|x|X|K) and a specified archive file  (-f) to run, executing the program with nothing else on the command line will cause HTAR to display a syntax summary and then terminate. 


1.7 Command line too long for shell. 

The easy way to build an HTAR archive of very many like-named files is to specify them indirectly by using a UNIX metacharacter (filter, wild card) such as * (to match any string) or ? (to match any single character). But if the selected file set has thousands of members, the list of input names that the UNIX shell generates by expanding such an "ambiguous file reference" may grow too long to handle. See the Limitations and Restrictions section below for several ways to work around such excessively long command lines when building large archives with HTAR. 


1.8 Wild cards (wildcard characters) used for retrieval. 

HTAR currently does not internally support the use of wildcard characters such as “*”. However, the shell can be used to expand the “*” character when creating an archive by expanding the “*” to match all of the local files. Since the shell does not have any knowledge of the files that are contained in the archive, “*” cannot be used to retrieve files from one ("no match"  is the usual, but not the only possible, error message). See the Retrieving Files section below for an analysis and possible ways to work around this limitation.



HTAR Limitations and Restrictions  

The current version of HTAR has the following known limitations or usage restrictions:  

1 I/O: 

You can redirect any HTAR output into a file (with >) for separate postprocessing (see the Retrieving  Files (page 41) example for one helpful application of this). But HTAR normally does not read from or write to UNIX pipes (standard input, standard output). Two HTAR control options, however, let you enable the use of pipes if you need them: 


◊ Read From Standard Input.  

Use HTAR's -Linputfile option with a hyphen (-) as the inputfile (that is, -L-) to read a list of files from a UNIX pipe (from standard input) instead of from the usual execute-line filelist. The "Too Many Names" discussion later in this section shows how to apply this technique to solve a practical problem when creating archives with very many input files. 


◊ Write To Standard Output.  

Use HTAR's -O (uppercase oh) option to write a file extracted with the -x option to standard  output (and hence to a UNIX pipe if you wish). Thus 


htar  -xf abc.tar -O def 


extracts file DEF from archive ABC.TAR in your storage home directory and displays it at  your terminal, while 


 htar  -xf abc.tar -O def | wc 


instead reports DEF's line, word, and character count. Since HTAR does not separate files in  the output stream, this usually is useful only when extracting a single file. 


2 WILDCARD CHARACTERS

HTAR leaves all processing of wildcard characters (filters or wild cards, such as *) to the shell. This  means that when you create an HTAR archive you can use * to select from among your local files to store, but when you retrieve specific files from within an already stored archive you CANNOT use * to select from among the stored files to get back. See the Retrieving Files (page 41) example below for details on this limitation, and a few suggested but inelegant ways to work around it. Another side effect of this approach to wildcard characters is that C shell (CSH) users must type the three-character string -\? (instead of -?) to display HTAR's help message. 
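Because the shell cannot see inside a stored archive, any pattern matching on member names has to be done on a list of those names instead. The sketch below shows only that filtering step. It assumes the member names have already been obtained, one per line (for example by post-processing saved -t output, whose exact column layout is not described in this section); select_members and the sample names are hypothetical.

```shell
# Read member names from standard input, one per line, and print those that
# match the given shell pattern. The pattern is applied by the shell's case
# statement, in which * also matches the / character.
select_members() {
    pattern=$1
    while IFS= read -r name; do
        case "$name" in
            $pattern) printf '%s\n' "$name" ;;
        esac
    done
}

printf '%s\n' data/a1.dat data/b2.dat data/a3.dat | select_members 'data/a*'
# prints:
# data/a1.dat
# data/a3.dat
```

The selected names could then be supplied to an extraction as the file list. Names containing blanks would need extra care, and a very long selection runs into the same command-line length limits discussed under "Too Many Names".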

3 UPDATES

No options exist to update (replace), remove (delete), or append to individual files that HTAR has already archived. You must replace (create again) an entire archive to alter the member files within it.

4 NAME LENGTH

To comply with POSIX 1003.1 standards regarding TAR-file input names, the longest input file name of the form prefix/name that HTAR can accept has 154 characters in the prefix and 99 characters in the name. Link names likewise cannot exceed 99 characters.
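Path names can be screened against these limits before an archive is built. The following is a sketch only: name_fits is a hypothetical helper, the limits (154 and 99 characters) are the ones quoted above, and the assumption that a path may be split into prefix and name at any slash is borrowed from the general TAR header layout rather than confirmed for HTAR.

```shell
# Succeeds (exit status 0) if the path fits in 99 characters as a whole, or
# can be split at some slash into a prefix of at most 154 characters and a
# name of at most 99 characters. Split positions are tried right to left.
name_fits() {
    path=$1
    [ "${#path}" -le 99 ] && return 0
    prefix=$path
    while :; do
        case "$prefix" in
            */*) prefix=${prefix%/*} ;;    # drop the last path component
            *)   return 1 ;;               # no slash left to split at
        esac
        name=${path#"$prefix"/}
        if [ "${#prefix}" -le 154 ] && [ "${#name}" -le 99 ]; then
            return 0
        fi
    done
}

name_fits "src/module/file.c" && echo "fits"    # prints fits
```

Feeding the output of a file-listing command through this check, one path at a time, identifies names that need shortening before the archive is created.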

5 FILE SIZE

The maximum size of a single member file within an HTAR archive is 68 Gbyte (expanded from  the former 8-Gbyte limit once imposed by the format of the TAR header). HTAR imposes no limit on the maximum size of an archive file (some have successfully reached 10 terabytes), but local disk space (when using -E or -F) or storage space might externally limit an archive's size. Users can specify a maximum number of member files per archive with HTAR's -M option. 

6 PASSWORDLESS FTP 

Because HTAR (unlike FTP) does not support user dialog with a server and has no password-passing  option, you can only manipulate HTAR archives, using HTAR's “-F” option, on machines with preauthenticated (passwordless) FTP servers. HTAR uses its own modules to perform FTP protocol - it does not run the PFTP client. 


7 TOO MANY NAMES

For users who make HTAR archives containing thousands of files, a different kind of limitation poses  problems, a limitation of the UNIX shell (csh, bsh, ksh) rather than of the HTAR program itself. One would normally select multiple files for archiving by using a UNIX "ambiguous file reference," a partial file name adjacent to one or more shell wildcard characters (or "wild card" filters, such as the asterisk(*)). Your current shell automatically expands the metacharacter(s) to generate a (long) alphabetical list of matching file names, which it inserts into the command line as if you had typed them all yourself. Thus 


 htar  -cf test.tar a* 


might become equivalent to a command line with dozens of a-named files on the end. Each shell has a  maximum length for command lines, however, and if your specified metacharacter filter matches thousands of file names, HTAR's command line may grow too long for the shell to accept. This would prevent building your intended many-file archive. 
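
Whether a given filter is close to the limit can be estimated in advance. The sketch below is only a rough estimate: it assumes a POSIX system where getconf ARG_MAX reports the limit that applies to the arguments plus the environment of a new process, and the helper name names_bytes is hypothetical.

```shell
# names_bytes NAME...  (hypothetical helper)
# Prints the number of bytes the given names would occupy on a command
# line, counting one separator byte per name. Shell functions and the
# printf builtin are not subject to the command-line limit themselves.
names_bytes() {
    printf '%s\n' "$@" | wc -c | tr -d ' '
}

limit=$(getconf ARG_MAX)      # limit for arguments plus environment
needed=$(names_bytes a*)      # a* is the filter from the example above
if [ "$needed" -ge "$limit" ]; then
    echo "a* expands to $needed bytes; the limit is $limit." >&2
fi
```

Because the environment also counts against the limit, a filter can fail somewhat before the estimate reaches the reported value.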


WORKAROUND 1: USE A DIRECTORY.  

The most effective, least resource-intensive way to work around the problem of having a (virtual) HTAR  command line too long for the shell to handle is to plan ahead and keep (or generate) in a single directory all and only the files that you want to archive. HTAR processes directory names recursively by default. So if you specify only the relevant directory name on HTAR's command line, HTAR will (internally) archive every file within the directory without any filter-induced length problems. For example, 


 htar  -cf test.tar projdir 


will successfully archive any number of files within the PROJDIR directory yet use no troublesome  shell-mediated file-name generation to do it. 


WORKAROUND 2: USE FIND.

The UNIX FIND utility is designed to produce lists of files (that meet specified criteria) to feed into other  programs for further processing. So FIND offers a second way for HTAR to archive very large numbers of files without having a very long command line. Indirection is required for success, however. The "natural" use of FIND's -EXEC option to run another program (here, HTAR) driven by a list of files from FIND, for example, 


[WRONG] 
find  . -name 'a*' -print -exec htar -cf test.tar {} \; 


fails  to produce the desired effect. This actually runs HTAR once (to build an archive called test.tar) for each successive input file (here, files beginning with A). If there are thousands of files, HTAR just repeatedly creates a one-file archive thousands of times (each replacing the previous archive) so that only the last file processed really remains in test.tar at the end. 


Instead, enable FIND to pipe standard input directly into HTAR by invoking HTAR's -L option with  a hyphen (-) argument instead of a file name. The correct sequence is: 


 find . -name 'a*' -print | htar -cf test.tar -L -  


(The use of the metacharacter * in FIND's command line here does not pose the same too-long problem as  it did originally in HTAR's command line because the surrounding quotes shelter the filter from shell processing. FIND's -NAME option generates the list of matching names internally, without expanding FIND's command line.) If you need to keep the list of input names (for verification or audit purposes, for example), you could break this single line into two equivalent steps mediated by a helper file (here called ALIST) that you preserve: 


 find . -name 'a*' -print > alist  
 htar -cf test.tar -L alist  


HTAR Environment Variables  

HTAR uses the following HPSS-related environment variables if they are available on the machine  where it runs: 


HPSS_HOSTNAME  

specifies the host name or IP address of the network interface to which HPSS mover(s) should connect when transferring data at HTAR's request. The default interface (the alternative to HPSS_HOSTNAME) is often slow, such as the control Ethernet of an IBM SP machine. This setting is overridden by the settings in the HPSS.conf file (/usr/local/etc/HPSS.conf), if one is found with the appropriate specifications. The HPSS.conf file is normally set up by the HPSS administrator, and can specify more than one network interface to be used for parallel transfers.


HPSS_PATH_ETC  

specifies the pathname of a local directory containing the HPSS network options file.  


HPSS_SERVER_HOST  

specifies the server host name and optional port number of the HTAR server.  


HTAR_COS  

specifies the default class of service (COS) ID for the archive file that HTAR creates,  or contains the string AUTO to force HPSS to automatically select the class of service based on the file size. HTAR option -Y overrides the HTAR_COS environment setting (see the end of the next section (page 38) for details).  The syntax for the environment variable is the same as the syntax for the -Y command line option.


TMPDIR

specifies the directory to use when HTAR creates temporary files, such as the index file and the consistency file.
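
As an illustration, these variables could be set in a Bourne-style shell before HTAR is run. Every value below is a placeholder chosen for this sketch (the host name and directories are assumptions, not site defaults); ask your HPSS administrator for the real values.

```shell
# All values are placeholders for illustration only.
export HPSS_HOSTNAME=fastnet0.example.org   # hypothetical data-network interface
export HPSS_PATH_ETC=/usr/local/etc         # assumed directory of the HPSS network options file
export HTAR_COS=auto                        # same syntax as the -Y option
export TMPDIR=/var/tmp                      # where HTAR creates its temporary files
```

C shell users would use setenv instead of export.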


HTAR Command Line  

1 Action Options 

Exactly ONE of these action options is required every time that you run HTAR.  


-c  (create) opens a connection to HPSS, creates an archive file within HPSS (not on the local filesystem) with the name specified by -f, and transfers (copies) into the archive each file specified by filelist (required whenever you use -c). If archivename already exists, HTAR overwrites it without warning. To create an archive file on the local filesystem instead (the way TAR does), also use -E; to write it to a nonHPSS host, also use -F. If filelist specifies any directories, HTAR includes them and all of their children recursively. Use -P with -c to automatically create all needed subdirectories along the archive pathname. 


-D soft-deletes the specified member files from the archive, by marking them as <deleted> in the index file. When the archive file is repacked (planned for a future release), deleted files will not be included in the repacked archive. 


-K  (uppercase kay, verify) opens a connection to HPSS (or to another remote host specified by -F), verifies the index file for the archive that you specify with -f, then uses the index file to verify every entry in (member of) the archive file itself. The default responses from -K appear very quickly and overwrite, so you may only be able to read the last one ("HTAR successful," if it is). If the index file is missing for an archive, -K reports the error message "no such file archivename.idx." If you combine -K with -v (verbose output), HTAR lists the name of each file (that it successfully finds) in the specified archive in alphabetical order, one per line, along with the size of each in bytes and in blocks (excluding the consistency file (page 11)), then gives a total file count. Compare this output with that from -t (below). 


-t  (table of contents) opens a connection to HPSS, then lists (in the order that they were stored) the files currently within the stored archive file specified by -f, along with their owner, size, permissions, and modification date (the list includes HTAR's own consistency file). Here filelist defaults to * (all files in the archive), but you can specify a more restrictive subset (usually by making filelist a filter). 




-U undeletes the specified member files from the archive which were previously soft-deleted by -D, by removing the <deleted> flag in their index file entries.


-x  (lowercase eks, extract) opens a connection to HPSS (or to another remote host specified by -F), then transfers (extracts, copies) from the stored (remote) archive file specified by -f each internal file specified by filelist (or all files in the archive if you omit filelist). If filelist specifies any directories, HTAR extracts them and all their children recursively. If any file already exists locally, HTAR overwrites it without warning, and it creates all new files with the same owner and group IDs (and if you use -p, with the same UNIX permissions) as they had when stored in the archive. (If you lack needed permissions, extracted files get your own user and group IDs and the local UMASK permissions; if you lack write permission then -x creates no files at all.) Note that -x works directly on the remote archive file by issuing random reads; you never have to retrieve the whole archive from HPSS just to extract a few specified files from within it, and you never have to read through the archive file from the start in order to find the file(s) that are being extracted (impossible with standard TAR). 


-X  (uppercase eks, index) opens a connection to HPSS (or to another remote host specified by -F), then creates an (external) index file for the existing archive file specified by -f (an HPSS-resident TAR-format file by default, a local TAR-format file if you also use -E, or remote non-HPSS machine if you use -F). Using -X rescues an HTAR archive whose (stored) index file was lost, and it enables HTAR to manage an archive originally created by traditional TAR. The resulting index file is stored in the same location as the archive file. See the "How HTAR Works" section (page 11) for an explanation of HTAR index files. 
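
The action options above can be combined into small routines. The following sketch uses only the -t, -K, -X, -v and -f options as described in this section; the function names htar_audit and htar_reindex are hypothetical wrappers, not HTAR features.

```shell
# Hypothetical wrappers around the action options described above.

# htar_audit ARCHIVE: list the members, then verify the index and the archive.
htar_audit() {
    htar -tf "$1" &&     # -t: table of contents, in stored order
    htar -Kvf "$1"       # -K with -v: verify and name each member found
}

# htar_reindex ARCHIVE: rebuild a lost index file, then verify the result.
htar_reindex() {
    htar -Xf "$1" &&     # -X: create the external index for an existing archive
    htar -Kf "$1"        # -K: verify the rebuilt index against the archive
}
```

For example, htar_audit case3/myproject.tar runs the two commands in order and stops if the listing fails.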



2 Archive File Option 

This option is required every time that you run HTAR unless you only use the “-?” option.

-f archivename 

(required option) specifies the archive file on which HTAR performs the -c|D|K|t|U|x|X actions described above. HTAR has no default for -f (whose archivename argument must appear immediately after the option name). 


Since HTAR (normally) operates on HPSS-resident  archive files, archivename also locates the archive file relative to your HPSS home directory: a simple file name here (e.g., abc.tar) resides in your HPSS home directory, while a relative pathname (e.g., xyz/abc.tar) specifies a subdirectory of your HPSS home directory (i.e., /users/unn/username/xyz/abc.tar). Never use tilde (~) in archivename because the shell expands it into your local home directory, not your HPSS home directory. HTAR's -f makes no subdirectories; you must have created them in advance (with HSI's mkdir option) before you mention them in archivename, or you must use the “-P” option (on creates) to cause HTAR to try to create any missing intermediate subdirectories. When used with -F to make or read an archive on a nonHPSS machine, archivename should be the full pathname of the archive on the remote machine (e.g., /var/tmp/abc.tar). 





3 Control Options 

These options change how HTAR behaves, but none is required (default values are indicated when  they exist). 


-?  displays a short help message (a syntax summary of the HTAR command line and a one-line description of each option). Users running HTAR under the C shell (CSH) will probably have to use the three-character string -\? to display this help message. 


-B  adds block numbers to the listing (-t) output (normally used only for debugging). 


-d debuglevel (default is 0) sets to an integer from 0 through 5 the level of debug output from HTAR, where 0 disables debug information for normal use and 1 to 5 enable progressively more elaborate debug output. 


-E  emulates TAR by forcing the archive file to reside on the local machine (where you run HTAR) rather than in HPSS, where it resides by default (-f always specifies the archive pathname, which -E interprets as local rather than remote). See also -F for making non-HPSS remote archives. The HTAR index file goes into the same (local) directory as the archive. Option -P works with -E. 


-F [user@]host[#port]

overrides the HTAR default of a stored archive and specifies on which remote machine (host) the archive resides other than in HPSS. For creating archives, host is the sink machine; for extracting files from existing archives host is the source machine (see the between-machine example for how to use -F properly). See also -E for making nonHPSS local archives. The HTAR index file goes into the same (remote) directory as the archive. HTAR still contacts the HPSS server even though the archive does not reside in HPSS, just to log all -F transactions. The user and port fields are seldom needed or appropriate because they usually betray the need for an FTP password and HTAR has no means to transmit one (the default user is you, the default port is 21). Important Note: The -F option is only available at sites that support an LLNL-compatible version of the wuftpd program with modifications to support the HPSS mover protocol. 


WARNINGS: using -F to specify a machine that is running the HPSS Parallel FTP Daemon, in order to transfer the file to HPSS, is not only completely  unnecessary, because it is HTAR's default, but also inefficient, because it undermines HTAR's built-in techniques for moving files to or from stored archives quickly and effectively. Also, the -P option does not work with -F. 


-H subopt[:subopt...] 

specifies a colon-delimited list of HTAR suboptions to control program execution.  Possible subopt values currently include: 


acct=id|acctname specifies the numeric account ID or alphabetic account name to use for the current HTAR run. This option is only meaningful for HPSS-resident archives, and is used at sites that configure HPSS for site-style accounting. See your local HPSS administrator for information about HPSS accounting at your site.


cix  used with the “extract” (-x) operation with HPSS-resident archives. If specified, precopies the index file to a temporary local file before reading the archive file. This option is normally not needed, but was added to avoid problems that were encountered with multithreaded I/O on some hardware platforms.


copies=n Used with the “create” (-c) option for HPSS-resident archives. n specifies the number of tape copies that HPSS should make when migrating the archive file from disk cache. This option is only used if automatic Class of Service selection is enabled. (see -Y, below)


crc  enables generation of Cyclic Redundancy Checksums (CRCs) when copying member files into the archive and when verifying the contents of the archive (-K command line option, or -Hverify option for creates). Enabling checksums usually degrades HTAR's I/O performance and increases its CPU utilization. 


exfile=path specifies a pathname to an “exceptions” file, which contains a list of failed member files and an explanation of the failure. Note: this option is currently implemented only for the GPFS/HPSS Interface (GHI).


family=id[,index_id] specifies the tape file family ID to use when creating HPSS-resident archive files, and, optionally, the family ID to use when creating the index file. This option is useful at sites which make use of the HPSS “file family” capability. Family ID 0, which is the default, uses the default pool of tapes. Contact your HPSS administrator to determine the file families that are available at your site.


okfile=path specifies the path to a file to contain a list of successfully transferred files. Note: this option is currently implemented only for the GPFS/HPSS Interface (GHI).


nocfchk  causes HTAR to disable the verification of the index file and the consistency file (see page 11). Use of this option can avoid extra tape mounts if the consistency file lives on a different tape cartridge than the specified member file(s). Currently, this option is only effective for the -D (soft-delete) action.


nocrc (the default) disables generation of CRCs when creating files in, extracting files from, or verifying existing archive files.


nostage  avoids prestaging tape-resident HPSS-resident archive files when HTAR performs -x or -X actions. 


port=x specifies the TCP port number to use when HTAR connects to the remote HPSS server. This parameter is only used in conjunction with the -Hserver parameter.


relpaths used with the “verify” (-K) action. When comparing member files in the archive file with local files, forces relative local file paths to be used by removing any leading “/” from the member file pathname before attempting to read it in the local filesystem.


rmlocal  removes local member files after HTAR has successfully written both the archive file and the index file (used with -c). 


server=host specifies the hostname or TCP/IP address of the HPSS server. The HPSS administrator defines the default server host or IP address when HTAR is built. The “-Hport” parameter (see above) can be used in conjunction with this option to completely specify the connection address to be used.


tss=stack_size specifies the thread stack size to be used when HTAR creates threads to read local files during a create (-c) operation. In most cases, the system default value can be used, but a very large default thread stack size (for example, on machines that are tuned for compute-type problems) can cause HTAR thread creation to fail. stack_size can be specified in bytes, kilobytes, or megabytes by appending a case-insensitive suffix (“k”, “kb”, “m” or “mb”).


umask=octal_mask used with the “-c” option. This specifies the HPSS umask value to be set during HTAR startup. This affects the permissions that are set on the resulting archive and index files that HTAR creates, in the same manner as the Unix “umask” command. The default umask value is defined by the HPSS administrator.


verify=test[,test,..] performs posttransfer verification after creating an archive, where test can be any of: 


cksum | nocksum  enables or disables (the default) verifying member file checksums by reading the archive file or by comparing the index file checksum with a checksum of the local member files. 


compare | nocompare  enables or disables (the default) byte-by-byte comparison of the local member files with the corresponding archive files. 


paranoid | noparanoid enables or disables (the default) extreme efforts to detect problems (such as discovering whether local files were modified during archive creation before deleting them if authorized by RMLOCAL). 
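
Several suboptions can be passed in one -H argument. The sketch below follows the colon-delimited syntax given above and combines crc with verify=cksum; the wrapper name htar_create_checked is hypothetical, and the particular combination is an illustration rather than a recommended setting.

```shell
# htar_create_checked ARCHIVE FILE...  (hypothetical wrapper)
# Creates ARCHIVE with CRC generation enabled (crc) and a post-transfer
# checksum verification (verify=cksum), using the colon-delimited -H syntax.
htar_create_checked() {
    archive=$1
    shift                                   # remaining arguments are the filelist
    htar -cvf "$archive" -H crc:verify=cksum "$@"
}
```

For example, htar_create_checked case3/myproject.tar projdir archives projdir recursively. Adding rmlocal to the list would also remove the local files after a successful run, so it is left out of this sketch.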


-h  (used only with -c; has no effect otherwise) for each symbolic link that it encounters, causes HTAR to replace the link with the actual contents of the linked-to file (stored under the link name, not under the file's original name). Later use of -t or -x treats the linked-to file as if it had always been present as an actual file with the link name. Without -h, HTAR records, reports, and restores every symbolic link overtly, but it does not replace the link with the linked-to contents. 



-I indexname specifies a nondefault name for the HTAR external index file that supports the archive specified by -f. 


WARNING: if you use -I to make any nondefault index name (3 cases, below) when  you create (-c) an archive, then you MUST also use -I with the same argument every time you extract (-x) files from that archive (else HTAR will look for the default index, not find it, and end with an error). 


There are three cases based on the first character of indexname:



. (dot)  If indexname begins with a period (dot), HTAR treats it as a suffix to append to the current archive name. 

Example: -I .xnd yields an index file called archivename.xnd 


/  If indexname begins with a / (slash), HTAR treats it as an absolute pathname (you must create all the subdirectories ahead of time with FTP's mkdir option). 

Example: -I /users/unn/yourname/projects/text.idx uses that absolute pathname in HPSS or on the local filesystem (-E) or remote filesystem (-F) for the index file. 


other  If indexname begins with any other character, HTAR treats it as a relative pathname (relative to the HPSS/local/remote directory where the archive file resides, which might be different from your HPSS/local/remote home directory). 


Example: -I projects/first.index locates first.index at HPSS-homedir/projects/first.index if the archive file is in your HPSS home directory (the default), but tries to locate first.index at HPSS-homedir/projects/projects/first.index if the archive was specified as -f projects/aname in the first place. (All such subdirectories must be created in advance or the -P command line option must be specified to create any missing intermediate subdirectories.) 
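
The three cases can be summarized as a small path-resolution routine. This is a sketch of the rules as stated above, not HTAR's own code; the function name index_path is hypothetical, and the printed path is relative to the same home directory as the archive name unless it is absolute.

```shell
# index_path ARCHIVENAME INDEXNAME  (hypothetical helper, not part of HTAR)
# Prints the index file location implied by the three -I cases.
index_path() {
    archive=$1 indexname=$2
    case $indexname in
        .*) printf '%s%s\n' "$archive" "$indexname" ;;     # suffix appended to the archive name
        /*) printf '%s\n' "$indexname" ;;                  # absolute pathname, used as given
        *)  case $archive in                               # relative to the archive's directory
                */*) printf '%s/%s\n' "${archive%/*}" "$indexname" ;;
                *)   printf '%s\n' "$indexname" ;;
            esac ;;
    esac
}
```

With these rules, index_path projects/aname projects/first.index prints projects/projects/first.index, which matches the example above.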


-L inputfile (used with -c) writes the files and directories specified by their literal names (in the inputfile, which contains file names one per line) into the archive specified by -f. Directories are treated recursively; a directory entry and its subdirectories or subfiles are all written to the archive. Currently, normal wildcard characters (tilde, asterisk, question mark) are treated literally, not expanded as filters. Replace inputfile with a hyphen (-L-) for HTAR to read the list of file names from standard input; the HTAR "Limitations" section shows how to use this technique. 


(used with -x) retrieves the files and directories specified by their literal names. See  the Retrieving Files (page 41) example below for how to use -L instead of wild cards to retrieve only specified files from a stored archive. 


WARNING: HTAR's -L differs from both AIX TAR's -L (which handles directories  nonrecursively) and Linux TAR's -L (which changes tapes). 


-M maxfiles 

(release default is 5,000,000) specifies the maximum number of member files  allowed when you use -c to create an HTAR archive. Internal limits are set when HTAR is compiled at each site; see your local HPSS administrator to find out what the limit is at your site. 


-m  (used only with -x; applies only to files) makes the time of extraction the last-modified time for each member file (the default preserves each file's original time of last modification). For directories, HTAR itself always preserves the original modification time for top-level directories that it copies from an archive, even if you invoke -m. However, subsequently creating subdirectories or files within a directory may cause the operating system to change the modification time on one or more directories (so that it too appears to be the time of extraction). 



-n timeinterval 

(used only with -c; has no effect otherwise) includes in a new archive only those files  (that meet your other naming criteria and) that were either created or modified between now and the start of timeinterval. Option -n is intended mostly to simplify the creation of incremental backup archives. Here timeinterval can have the form: 


d  an integer that specifies days (e.g., 5 for 5 days), or 


:h  an integer that specifies hours (e.g., :12 for 12 hours), or 


d:h  a pair of integers that specify days and then hours (e.g., 1:6 for 1 day and 6 hours). 
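
To make the three forms concrete, the sketch below converts a timeinterval string into a total number of hours. It is an illustration of the notation only: the function name interval_hours is hypothetical, HTAR does this parsing internally, and the sketch assumes plain decimal integers without leading zeros.

```shell
# interval_hours TIMEINTERVAL  (hypothetical helper, not part of HTAR)
# Prints the total number of hours denoted by a -n argument.
interval_hours() {
    case $1 in
        :*)  echo "${1#:}" ;;                          # :h   hours only
        *:*) echo $(( ${1%%:*} * 24 + ${1#*:} )) ;;    # d:h  days, then hours
        *)   echo $(( $1 * 24 )) ;;                    # d    days only
    esac
}
```

So a command such as htar -cf incr.tar -n 1:6 projdir would select files created or modified within the last 30 hours.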


-O  (uppercase, used only with -x, mimics the Linux TAR --to-stdout option) writes the file(s) extracted from an archive (with -x) to standard output (and hence to a UNIX pipe for postprocessing, if you wish). The HTAR "Limitations" section (page 23) above shows how to use this technique. Since HTAR does not separate files in the output stream, -O is usually useful only when you extract a single file. 


-o (lowercase, used only with -x) (default for all nonroot users) causes the extracted files  to take on the user and group ID (UID, GID) of the person running HTAR, not those of the original archive. This makes a difference for root users but not for ordinary HTAR users. 


-P  (uppercase pea; used only with -c, has no effect otherwise) automatically creates all intermediate subdirectories specified on the archive file's pathname if they do not already exist. HTAR's -P thus works the same as MKDIR's -P option. You can use -P with archives created in HPSS (the default) or on your local machine (with -E), but it is ignored for other remote archives (those created with -F). 


-p  (lowercase pea) preserves all UNIX permission fields (on extracted files) in their original modes, ignoring the present UMASK (the default changes the permissions to the local UMASK where HTAR extracts the files). Root users can also preserve the setuid, setgid, and sticky bit permissions with this option. 


-q  (quiet mode) suppresses most HTAR informational messages, such as its usual interactive progress reports as it creates an archive file. 


-S bufsize 

(release default is 8 Mbyte) specifies the buffer size to use when HTAR reads from or writes  to an HPSS archive file. Here bufsize can be a plain integer (interpreted as bytes), an integer suffixed by k, K, kb, or KB for kilobytes, or an integer suffixed by m, M, mb, or MB for megabytes (e.g., 16mb). HTAR's default bufsize is a compile-time option that is set by the HPSS administrator when HTAR is built. See your local HPSS administrator to find out what the default value is for HTAR at your site. 



-T maxthreads 

(release default is 30) specifies the maximum number of threads that HTAR will use to copy member files  to the archive file (-c). This value is ignored when extracting member files from an archive (-x). HTAR reports the actual number of threads used on each run if you invoke -v or -V. HTAR creates a maxthreads pool of threads; the actual number of threads used will vary based upon I/O buffer size (see -S), average member file size, and HPSS network transfer rates. Normally, the smaller the member file size, the more threads can be active when creating files. For small files, setting -T to a larger number (up to 100 has been tested) can dramatically improve the transfer rates if the operating system is able to support the load.


-U undeletes soft-deleted member files (see -D, above) by copying the existing index file to a temporary local file, removing the <deleted> flag in the specified index entries along the way, and then rewriting the temporary index to the same location (HPSS, local filesystem (-E) or remote filesystem (-F)).


-V  (uppercase vee) requests "slightly verbose" reporting of file-transfer progress (often very brief, overwritten messages to the terminal). Do not use with -v. 


-v  (lowercase vee) requests "very verbose" reporting of file-transfer progress. For each member file transferred to an archive, HTAR prints “A” (added) and its name on one line; for each member file extracted from an archive, HTAR prints “X”, its name, and its size on a line, along with a summary of the whole transfer at the end. For each file added during a “build index” (-X) operation, HTAR prints “i” and its name. For each file verified during a verify operation (-K), HTAR prints “v” (or “V” if comparing archive and local file contents), its name, and a trailing “/” if this is a directory. For each file that is soft-deleted during a delete (-D) operation, HTAR prints “d”; similarly, for an undelete (-U) operation, HTAR prints “u”.


Do not use with -V. 


-w  (works only with -x, -D or -U, not with -c) lists (one by one) each member file to be extracted from the archive and prompts you for your choice of confirmatory action, where possible responses are: 


y[es]  extracts the named file. 


n[o]  skips the named file. 


a[ll]  extracts the named file and all remaining (not yet processed) selected files too. 


q[uit]  skips the named file and stops prompting. HTAR ends. 


-Y auto | [archiveCOS][:indexCOS]

specifies the HPSS class of service (COS) for each stored archive and its corresponding  index file. The default is “auto”, which causes HTAR to use a site-specific COS chosen for archive suitability, based upon factors such as the archive file size and the number of copies desired. You can specify a nondefault COS for the archive, the index, or both (e.g., -Y120:110), but this is usually undesirable except when testing new HPSS features or devices - if your archive size grows to exceed that allowed by a nondefault COS, HPSS will stop the transfer and HTAR will end with an error.   Use the “-Hcopies=2” option to request dual-copy storage of any mission critical archive of any size for extra safety. Using -Y overrides the HTAR_COS environment variable. HSI's “ls -U” command (with the -H option if headings are desired) reports the COS for stored files (in output column 3), while HSI's “set cos=n|auto” command offers a different way to specify the HPSS class of service. 

Note: For HTAR release 3.5, either  archiveCOS or indexCOS may be specified as auto to facilitate specifying auto for the archive file, and a numeric COS ID for the index file, or vice-versa.




HTAR Examples  


1 Creating an HTAR Archive File 

OBJECTIVE:  

To create an HTAR archive file in a subdirectory of your HPSS home directory and use  a filter to install several files within that stored archive. 


STRATEGY:  


  1. One HTAR command line can perform all of the desired tasks quickly and in parallel:


• The -cvf options create (c) an archive, verbosely (v) report the incoming files, and  (f) name the envelope file. 


• The relative pathname case3/myproject.tar locates the archive (myproject.tar) in  pre-existing subdirectory case3 of your HPSS home directory (omitting case3/ leaves the archive at the top level of your HPSS home directory). HTAR will not create case3 by default, however; you must either have previously used HSI's mkdir command, or else you must invoke -P to explicitly request creation of all needed subdirectories along the archive pathname. 


• File filter tim* selects all and only the files whose names begin with TIM (in the  directory where you run HTAR) to be stored in the archive. 


  2. HTAR opens a preauthenticated connection to your storage (HPSS) home directory and reports its housekeeping activities (very quickly, in lines that overwrite, so you may not notice all of these status reports on your screen).

  3. HTAR creates your requested archive and uses parallel connections to move your requested files directly into it. Directories are handled recursively and directory names (if any) appear with a slash (/) appended to identify them.

  4. The last incoming file that HTAR reports is always the 256-byte consistency file by which HTAR coordinates your archive with its external index file.

  5. HTAR summarizes the work done (time, rate, amount, thread count), then copies the index file that it made into HPSS, destroys the local version, and ends.

htar -cvf case3/myproject.tar tim* ---(1)  
HTAR: Opening HPSS server connection ---(2)  
HTAR: Getting HPSS site info  
HTAR: Writing temp index file to /usr/tmp/aaamva09A  
HTAR: creating HPSS Archive file case3/myproject.tar ---(3)  
HTAR: a tim1.txt  
HTAR: a tim2.txt  
HTAR: a tim2a.txt  
HTAR: a tim3.a  
HTAR: a time.txt  
HTAR: a time2.gif  
HTAR: a /tmp/HTAR_CF_CHK_13805_997722535 ---(4)  
HTAR Create complete for case3/myproject.tar. 399,360 bytes written for 6 member files, max threads: 8 ---(5)
Transfer time: 0.055 seconds (7.257 MB/s)  
HTAR: Copying Index File to HPSS...Creating file  
HTAR: HTAR SUCCESSFUL  





2 Retrieving Files from within an Archive 

OBJECTIVE:  To retrieve several files from within an existing stored HTAR archive file (without retrieving the whole archive first). 


STRATEGY:  HTAR does not process wildcard characters (file filters such as *) itself, but leaves them for the shell to expand and compare with file names in your local directory. Hence, you CANNOT use * to select a subset of already archived files to retrieve. For example, "natural" command lines 


htar -xvf case3/myproject.tar time* [WRONG]  
htar -xvf case3/myproject.tar 'time*' [WRONG]  

both FAIL to select (and hence to retrieve) any stored files from the MYPROJECT.TAR stored archive (each yields its own set of error messages). These lines work only accidentally, if you happen to have files with the same name in both your local directory and your stored archive (unlikely except when you are just testing HTAR). 


WORKAROUNDS:  

(1A) Type the name of each file that you want to retrieve (at the end of the HTAR execute  line). 

(1B) If you have a long list of files to retrieve, or if you plan to reuse the same retrieval  list often, put the list of sought files into a file and use HTAR's -L option to invoke that list. You can use HTAR's -t (reporting) option to help generate that retrieval list by reporting all the files you have archived and then editing that report to include only the relevant file names to retrieve. For instance, 


htar -tf case3/myproject.tar > hout  
grep 'time' hout | cut -c 50-80 > tlist  


captures the list of all your stored files in the local file HOUT, and then selects just the  file names that contain the string TIME for use with HTAR's -L option (here, in local file TLIST). Note that HTAR automatically appends slash (/) at the end of every directory name that -t reports. 
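
The fixed column range in the cut command above depends on the exact layout of the -t report. A variant is sketched below under an assumption that this manual does not confirm: that each member's name is the last whitespace-separated field of its -t line and contains no spaces. The function name member_names is hypothetical.

```shell
# member_names REGEX  (hypothetical helper)
# Reads an "htar -tf" report on standard input and prints the last field
# of every line whose last field matches REGEX.
# Assumption: the member name is the last field and contains no spaces.
member_names() {
    awk -v pat="$1" '$NF ~ pat { print $NF }'
}
```

Under that assumption, htar -tf case3/myproject.tar | member_names '^time' > tlist would produce the same kind of list as the grep and cut pipeline, anchored to names that begin with "time". Check the resulting file by eye before passing it to -L.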


1. Once you have laid the groundwork above, a single HTAR command line can retrieve your specified files quickly and in parallel from within your stored archive:


• The -xvf options request retrieval/extraction (x), verbosely (v) report the retrieved files, and (f) name the target archive.


• The relative pathname case3/myproject.tar locates the archive (myproject.tar) in the pre-existing subdirectory case3 of your storage home directory.


• The explicit file list (1A) or name-containing file (1B) selects all and only the files that you want (here, those whose names begin with TIME, a subset of all files stored in this archive in the previous example).


2. HTAR opens a preauthenticated connection to your storage (HPSS) home directory and reports its housekeeping activities (very quickly, in lines that overwrite, so you may not notice all of these status reports on your screen).
3. HTAR uses its external index to locate in the archive the (two) specific files that you requested and then it transfers them by using parallel connections (but not the PFTP client) to your local machine without retrieving the whole archive file.
4. HTAR summarizes the work done (time, rate, amount) and then ends.


htar -xvf case3/myproject.tar time.txt time2.gif ---(1A)  

OR  

htar -xvf case3/myproject.tar -L tlist ---(1B)  


HTAR: Opening HPSS server connection ---(2)  
HTAR: Reading index file  
HTAR: Opening archive file  
HTAR: Reading archive file ---(3)  
HTAR: x time.txt, 1085 bytes, 4 media blocks  
HTAR: x time2.gif, 3452 bytes, 8 media blocks  
HTAR: Extract complete for case3/myproject.tar, ---(4)
2 files. total bytes read: 116,736 in 0.070 seconds (1.669 MB/s )

HTAR: HTAR SUCCESSFUL



3 Using CRC Checksums 

OBJECTIVE:  To include Cyclic Redundancy Checksums (CRCs) during a “creation” (-c) run 


STRATEGY:  HTAR provides a command line option (“-Hcrc”) to cause it to generate a CRC for each member file as it is being copied into the archive file. When this option is specified, HTAR will use additional CPU time, and the I/O performance will noticeably decrease. The actual amount of degradation varies from machine to machine. However, using this option is worthwhile if you are concerned about the reliability of HPSS storage media, such as tapes, and you want to verify that files have not changed when they are read back using HTAR's “extract” option (-x). 


In addition, if “-Hcrc” is specified for a listing run (-t), HTAR will display the member file CRC in square brackets “[]” after the permissions field on each line. If no CRC is available for the member file, an empty pair of brackets is displayed.


To illustrate the difference:

(1) first create an archive file without specifying “-Hcrc”,
(2) then display its contents (-t) with “-Hcrc” specified,
(3) then recreate the archive file, this time with “-Hcrc” specified during the create operation, and
(4) redisplay its contents with “-Hcrc” specified.


htar -cvf case3/myproject.tar tim* --- (1)
HTAR: a tim1.txt 
HTAR: a tim2.txt
HTAR: a tim2a.txt
HTAR: a tim3.a
HTAR: a time.txt
HTAR: a time2.gif
HTAR: a /tmp/HTAR_CF_CHK_28518_1220351609
HTAR Create complete for case3/myproject.tar. 399,360 bytes written for 6 member files, max threads: 8 Transfer time: 0.082 seconds (4.878 MB/s)
HTAR: HTAR SUCCESSFUL
htar -Hcrc -tvf case3/myproject.tar --- (2)
HTAR: -rw-r--r-- [] gleicher/hpss 9367 2005-02-04 09:05 tim1.txt
HTAR: -rw------- [] gleicher/hpss 10568 2008-09-02 03:12 tim2.txt
HTAR: -rwx------ [] gleicher/hpss 4396 2008-09-02 03:13 tim2a.txt
HTAR: -rwx------ [] gleicher/hpss 253839 2008-09-02 03:13 tim3.a
HTAR: -rwx------ [] gleicher/hpss 1786 2008-09-02 03:14 time.txt
HTAR: -rw------- [] gleicher/hpss 113512 2008-09-02 03:14 time2.gif
HTAR: -rw------- [] gleicher/gleicher 256 2008-09-02 03:33 /tmp/HTAR_CF_CHK_28518_1220351609
HTAR: Listing complete for case3/myproject.tar, 7 files 7 total objects
HTAR: HTAR SUCCESSFUL


htar -Hcrc -cvf case3/myproject.tar tim* --- (3)
HTAR: a tim1.txt 
HTAR: a tim2.txt
HTAR: a tim2a.txt
HTAR: a tim3.a
HTAR: a time.txt
HTAR: a time2.gif
HTAR: a /tmp/HTAR_CF_CHK_28618_1220351078
HTAR Create complete for case3/myproject.tar. 399,360 bytes written for 6 member files, max threads: 8 Transfer time: 0.066 seconds (6.075 MB/s)


htar -Hcrc -tvf case3/myproject.tar --- (4)
HTAR: -rw-r--r-- [0xf4124da1] gleicher/hpss 9367 2005-02-04 09:05 tim1.txt
HTAR: -rw------- [0x6a20d2c7] gleicher/hpss 10568 2008-09-02 03:12 tim2.txt
HTAR: -rwx------ [0xaee5f5b9] gleicher/hpss 4396 2008-09-02 03:13 tim2a.txt
HTAR: -rwx------ [0x036ebadc] gleicher/hpss 253839 2008-09-02 03:13 tim3.a
HTAR: -rwx------ [0x7acb1840] gleicher/hpss 1786 2008-09-02 03:14 time.txt
HTAR: -rw------- [0xcc713d8a] gleicher/hpss 113512 2008-09-02 03:14 time2.gif
HTAR: -rw------- [0xa9bcf0db] gleicher/hpss 256 2008-09-02 03:24 /tmp/HTAR_CF_CHK_28618_1220351078
HTAR: Listing complete for case3/myproject.tar, 7 files 7 total objects
HTAR: HTAR SUCCESSFUL
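Because an empty pair of brackets marks a member file that has no stored CRC, a saved listing can be scanned for such members. The sketch below is an illustration only. It assumes the listing was produced with “-Hcrc -tvf” and has the layout shown in the samples above ("HTAR:" prefix, permissions, bracketed CRC, and the member name as the last field, with no whitespace in names). The function name members_without_crc is hypothetical.

```shell
#!/bin/sh
# members_without_crc < listing
# Hypothetical helper: prints the names of members whose CRC field is
# the empty pair of brackets "[]".
members_without_crc() {
    awk '$1 == "HTAR:" && $3 == "[]" { print $NF }'
}

# Two sample lines copied from the listings above: one without a CRC
# and one with a CRC.
members_without_crc <<'EOF'
HTAR: -rw-r--r-- [] gleicher/hpss 9367 2005-02-04 09:05 tim1.txt
HTAR: -rw------- [0x6a20d2c7] gleicher/hpss 10568 2008-09-02 03:12 tim2.txt
EOF
# prints: tim1.txt
```

An empty result suggests that every listed member carries a CRC, under the layout assumptions stated above.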

4 Verify The Contents of an Archive During Creation

OBJECTIVE:  To verify the contents of the archive file during the creation run 

STRATEGY: HTAR provides the “-Hverify=option[,option...]” command line option, which causes HTAR to first create the archive file normally, and then to go back and check its work by performing a series of checks on the archive file. You choose the types of checks to be performed by specifying one or more comma-separated options. Each option can be an individual item, the keyword “all”, or a numeric level (0, 1, or 2). Each numeric level includes all of the checks for lower-valued levels and adds additional checks. The verification options are:

all Enables all possible verification options except “paranoid”

info Reads and verifies the tar-format headers that precede each member file in the archive

crc Reads each member file and recalculates the Cyclic Redundancy Checksum (CRC), and verifies that it matches the value that is stored in the index file. Note that this option only applies if the “-Hcrc” option was specified, which causes a CRC to be generated for each member file as it is copied into the archive file.

compare This option directs HTAR to compare each member file in the archive with the original local file. The “-Hrelpaths” option can be specified here to handle the case where the member files in the archive have an absolute path (starting with “/”), but were previously extracted (using “-x”) to a local directory, since HTAR removed the leading “/” when extracting. If “-Hrelpaths” is NOT specified, then HTAR will attempt to read the member files by using the original absolute pathname in the archive file TAR header. If, on the other hand, “-Hrelpaths” is specified, then HTAR will remove the leading “/” in the member file pathname and use it as a path relative to the current working directory when you start HTAR.


paranoid This option is only meaningful if “-Hrmlocal” is specified, which causes HTAR to remove any local files or symbolic links that have been successfully copied to the archive file. If “paranoid” is specified, then HTAR makes one last check before removing local files or symlinks to verify that:

a. For files, the modification time has not changed since the member file was copied into the archive

b. The object type has not changed, for example, if the original object was a file, it has not been deleted and recreated as a symlink or directory, etc.


0 Same as “info”


1 Same as “info,crc”


2 Same as “info,crc,compare”


It is also possible to specify a verification option such as “all”, or a numeric level, such as 0, 1 or 2, and then selectively disable one or more options. In practice, this is rarely, if ever, useful, but the following options are provided:


nocompare Disables comparison of member files with their original local files


nocrc Disables CRC checking


noparanoid Disables checking of modification time and object type changes 
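The way levels, keywords, and the “no” options combine can be modeled with a few lines of shell. This is only an illustration of the semantics described above, not HTAR's own code. The function name resolve_verify is hypothetical, processing the options left to right is an assumption, and “all” is treated like level 2 because info, crc, and compare are the only non-paranoid checks listed.

```shell
#!/bin/sh
# resolve_verify VALUE
# Hypothetical model: prints the checks that a -Hverify=VALUE string
# would enable according to the option descriptions in this guide.
# (Plain sh has no local variables, so these names are global.)
resolve_verify() {
    info=0; crc=0; compare=0; paranoid=0
    for opt in $(printf '%s' "$1" | tr ',' ' '); do
        case $opt in
            0|info)     info=1 ;;
            1)          info=1; crc=1 ;;
            2|all)      info=1; crc=1; compare=1 ;;
            crc)        crc=1 ;;
            compare)    compare=1 ;;
            paranoid)   paranoid=1 ;;
            nocrc)      crc=0 ;;
            nocompare)  compare=0 ;;
            noparanoid) paranoid=0 ;;
            *)          echo "unknown verify option: $opt" >&2; return 1 ;;
        esac
    done
    result=""
    [ "$info" -eq 1 ]     && result="$result info"
    [ "$crc" -eq 1 ]      && result="$result crc"
    [ "$compare" -eq 1 ]  && result="$result compare"
    [ "$paranoid" -eq 1 ] && result="$result paranoid"
    echo "${result# }"
}

resolve_verify 2              # prints: info crc compare
resolve_verify all,nocompare  # prints: info crc
resolve_verify 1,paranoid     # prints: info crc paranoid
```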


In the example, 

(1) the archive file is created (-c) with verification level 2, including CRC generation and checking. The verbose output option (-v) is used to cause HTAR to display information about each file that is added during the create phase, and then verified during the verification phase.

(2) the archive file is then listed (-t) using the "-Hcrc" option to cause HTAR to display the CRC value for each member file.


htar -cvf case3/myproject.tar -Hcrc -Hverify=2 tim* --- (1)
HTAR: a tim1.txt 
HTAR: a tim2.txt
HTAR: a tim2a.txt
HTAR: a tim3.a
HTAR: a time.txt
HTAR: a time2.gif
HTAR: a /tmp/HTAR_CF_CHK_28128_1220351451
HTAR Create complete for case3/myproject.tar. 399,360 bytes written for 6 member files, max threads: 7 Transfer time: 0.041 seconds (9.857 MB/s)
HTAR: V tim1.txt, 9367 bytes, 20 media blocks
HTAR: V tim2.txt, 10568 bytes, 22 media blocks
HTAR: V tim2a.txt, 4396 bytes, 10 media blocks
HTAR: V tim3.a, 253839 bytes, 497 media blocks
HTAR: V time.txt, 1786 bytes, 5 media blocks
HTAR: V time2.gif, 113512 bytes, 223 media blocks
HTAR: V /tmp/HTAR_CF_CHK_28128_1220351451, 256 bytes, 2 media blocks
HTAR: Verify complete. 0 total errors, 7 total objects (7 Files,0 Dirs,0 Hard Links,0 symlinks)
HTAR: HTAR SUCCESSFUL



htar -Hcrc -tvf case3/myproject.tar --- (2)
HTAR: -rw-r--r-- [0xf4124da1] gleicher/hpss 9367 2005-02-04 09:05 tim1.txt
HTAR: -rw------- [0x6a20d2c7] gleicher/hpss 10568 2008-09-02 03:12 tim2.txt
HTAR: -rwx------ [0xaee5f5b9] gleicher/hpss 4396 2008-09-02 03:13 tim2a.txt
HTAR: -rwx------ [0x036ebadc] gleicher/hpss 253839 2008-09-02 03:13 tim3.a
HTAR: -rwx------ [0x7acb1840] gleicher/hpss 1786 2008-09-02 03:14 time.txt
HTAR: -rw------- [0xcc713d8a] gleicher/hpss 113512 2008-09-02 03:14 time2.gif
HTAR: -rw------- [0x781f3241] gleicher/hpss 256 2008-09-02 03:30 /tmp/HTAR_CF_CHK_28128_1220351451
HTAR: Listing complete for case3/myproject.tar, 7 files 7 total objects
HTAR: HTAR SUCCESSFUL
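In a script, the error count reported on the "Verify complete" line of the creation run can be picked out of captured output. The sketch below assumes the line has exactly the wording shown in the sample output of this section. The function name verify_error_count is hypothetical, and whether HTAR writes this line to standard output or standard error is not stated here, so a real pipeline might need 2>&1.

```shell
#!/bin/sh
# verify_error_count < captured-output
# Hypothetical helper: prints the number from a line of the form
# "HTAR: Verify complete. N total errors, ..." and nothing otherwise.
verify_error_count() {
    sed -n 's/^HTAR: Verify complete\. \([0-9][0-9]*\) total errors.*/\1/p'
}

# Sample lines copied from the verification output in this section.
verify_error_count <<'EOF'
HTAR: V time2.gif, 113512 bytes, 223 media blocks
HTAR: Verify complete. 0 total errors, 7 total objects (7 Files,0 Dirs,0 Hard Links,0 symlinks)
HTAR: HTAR SUCCESSFUL
EOF
# prints: 0
```

Checking the command's exit status as well is advisable, since this helper prints nothing when the verify line is absent.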

5 Rebuilding a Missing Index 

OBJECTIVE:  To rebuild the missing index file for a stored HTAR archive file and thereby (re)enable blocked access to the files within it (and extract some). 


STRATEGY:  

1. You try to retrieve all files (-xvf) from the HTAR archive myproject.tar in the case3 subdirectory of your HPSS home directory.
2. But HTAR cannot find the external index file (here, called myproject.tar.idx) for this archive, and it returns an error message, retrieves no requested files, and ends. (File myproject.tar.idx may have been moved, renamed, or accidentally deleted from HPSS.)
3. So you execute HTAR again with the special action -X (uppercase X, not lowercase x) to request rebuilding the external index for the (same) disabled archive.
4. HTAR opens a preauthenticated connection to your HPSS home directory, locates the archive in subdirectory case3, scans (but does not retrieve) its contents, and thereby creates a new myproject.tar.idx file (temporarily on local disk, then moved to the same HPSS directory as the archive file that it supports). HTAR ends.
5. Now you again try your original (1) file-retrieval request.
6. HTAR opens a preauthenticated connection to your HPSS home directory and reports its housekeeping activities (very quickly, in lines that overwrite, so you may not notice all of these status reports on your screen).
7. HTAR uses its (newly rebuilt) external index to locate the files within the archive and transfers them by parallel connections to your local machine (it transfers all of them because there is no filelist on the command line).
8. HTAR summarizes the work done (time, rate, amount) and then ends.

htar -xvf case3/myproject.tar ---(1)
HTAR: Opening HPSS server connection
HTAR: Getting HPSS site info
ERROR: Received unexpected reply from server: 550 ---(2)
ERROR: Error -1 getting Index File attributes...
HTAR: HTAR FAILED
###WARNING htar returned non-zero exit status.
72 = /usr/local/bin/htar.exe...

htar -Xf case3/myproject.tar ---(3)
HTAR: Opening HPSS server connection ---(4)
HTAR: Reading archive
HTAR: Copying Index File to HPSS... creating file
HTAR: HTAR SUCCESSFUL

htar -xvf case3/myproject.tar ---(5)
HTAR: Opening HPSS server connection ---(6)
HTAR: Reading index file
HTAR: Opening archive file
HTAR: Reading archive file ---(7)
HTAR: x tim1.txt, 3503 bytes, 8 media blocks
HTAR: x tim2.txt, 4310 bytes, 10 media blocks
HTAR: x tim2a.txt, 5221 bytes, 12 media blocks
HTAR: x tim3a., 5851 bytes, 13 media blocks
HTAR: x time.txt, 1085 bytes, 4 media blocks
HTAR: x time2.gif, 3452 bytes, 8 media blocks
HTAR: Extract complete. total bytes read: ---(8)
28,160 in 0.141 seconds (0.200 MB/s)
HTAR: HTAR SUCCESSFUL
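The try, rebuild, retry sequence above can be wrapped in a small shell function. This is a sketch under stated assumptions rather than a recommended procedure: it treats any failed extraction as a possibly missing index, which is a simplification, and it uses only the -xvf and -Xf forms shown in this example. The variable HTAR_CMD and the function extract_with_rebuild are hypothetical names; HTAR_CMD exists so that a stub can replace the real program during testing.

```shell
#!/bin/sh
# HTAR_CMD is an assumption of this sketch: the program to run,
# defaulting to "htar".
HTAR_CMD=${HTAR_CMD:-htar}

# extract_with_rebuild ARCHIVE
# Hypothetical wrapper: extract all members; if that fails, rebuild the
# external index with -X and retry the extraction once.
extract_with_rebuild() {
    archive=$1

    # First attempt: ordinary extraction of every member file.
    if "$HTAR_CMD" -xvf "$archive"; then
        return 0
    fi

    # Simplification: any failure is treated as a missing index.
    echo "extract failed; rebuilding index for $archive" >&2
    "$HTAR_CMD" -Xf "$archive" || return 1

    # Second and final attempt, now using the rebuilt index.
    "$HTAR_CMD" -xvf "$archive"
}

# Example call (left as a comment because it needs a live HPSS system):
#   extract_with_rebuild case3/myproject.tar
```

A more careful version would inspect the error text or exit status before rebuilding, since a failure can have other causes, such as HPSS being unavailable.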



6 Specifying Very Many Files 

OBJECTIVE:  To specify a very large number of input files yet avoid an HTAR command line too long for the shell to accept. 


STRATEGY:  The input-line-too-long problem is analyzed and explained in the HTAR Limitations section above. The best solution is to keep in a separate directory all and only the intended input files, and then specify that directory's name on HTAR's command line (for recursive processing). But if you failed to take that precaution, you can work around the problem of having too many file names for the shell to accept by using the UNIX FIND utility as shown here. 


1. Run FIND to select the files that you want (here, those whose names begin with T) in a way that processes the file list internally, not by the shell. (The HTAR Limitations section above explains why FIND's -EXEC option will not finish the job here.) Then pipe (|) FIND's output directly into HTAR to build the stored archive that you want (-cf) using the -L option with a hyphen (-) argument to enable standard input (off by default).

2. Without -v, HTAR shows no verification but does copy its index file to storage when  the archive is successfully stored. 

	find . -name 't*' -print | htar -cf test.tar -L - ---(1)
	HTAR: Opening HPSS server connection  
	HTAR: creating HPSS archive file test.tar  
	HTAR: Copying index file to HPSS...creating file ---(2)  
	HTAR: HTAR SUCCESSFUL  
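When the same selection will be reused, FIND's output can be saved to a list file and passed to HTAR with -L, as in the retrieval example earlier. The sketch below shows only the list-building half, so it runs without HPSS. The function name build_list is hypothetical, and the -type f test (regular files only) is an addition that is not part of the command line shown above.

```shell
#!/bin/sh
# build_list DIR PATTERN LISTFILE
# Hypothetical helper: writes the paths of regular files under DIR whose
# names match PATTERN to LISTFILE, then prints how many were found.
# Keep LISTFILE outside DIR (or give it a non-matching name) so that the
# list does not end up listing itself.
build_list() {
    # The pattern is quoted, so find (not the shell) does the matching.
    find "$1" -type f -name "$2" -print > "$3"
    wc -l < "$3" | tr -d ' '
}

# Example calls (left as comments; the second needs a live HPSS system):
#   build_list . 't*' /tmp/tfiles.list
#   htar -cf test.tar -L /tmp/tfiles.list
```

Because the names travel through a file rather than the command line, the shell's limit on argument length does not apply.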


TAR and HTAR Compared  

TAR was originally intended to write a specified set of files to offline tape (or retrieve them), and hence by extension, to simply write (or retrieve) a specified set of files to a local envelope or library file for easier management.


HTAR in many ways returns TAR to its roots because it is specifically designed to efficiently  store a set of files together in HPSS or get them back, not merely to make an archive file and leave it, although you can force HTAR to do that. 


This table compares the more familiar TAR features and effects with those (often enhanced) of HTAR:  

Feature                                         TAR                             HTAR
-------                                         ---                             ----
Can create an archive file without storing      Yes (the default)               Only with -E or -F
  it in HPSS?
Can create an archive file without using        No                              Yes (the default)
  local disk space?
Can store an archive file while creating it?    No, needs HSI to store the      Yes (the default)
                                                archive file after creating
                                                on local disk
Can store an (existing) archive file?           No – must create the archive    No – must create the archive
Can write an archive to (offline) tape?         Yes                             No (to HPSS disk)
Can write an archive to another machine?        No                              Yes, with -F
Can read an archive from another machine?       No                              Yes, with -F
Can read any TAR archive file?                  Yes (the default)               Yes, if -X first
Can read any HTAR archive file?                 Yes                             Yes (the default)
Can extract just one file from an               No                              Yes (the default)
  HPSS-resident archive without reading all
  files preceding it?
Can add file(s) to an existing archive?         Yes                             No
Can create and verify CRC checksums of          No                              Yes
  member files?
Can verify contents of a newly created          No                              Yes
  archive as part of the creation operation?
Treats input directories recursively?           Yes                             Yes
Default target if no archive specified?         Yes (tape)                      No, -f required
Option -L disables directory recursion?         Yes (under AIX but not Linux)   No
Preserves original permissions on files?        No (uses UMASK)                 Yes, with -p
Depends on HPSS availability to work?           No                              Yes
Archive duplicated automatically in storage?    No                              Only with -Y 150
Builds and needs an external index file?        No                              Yes
Builds and needs a consistency check file?      No                              Yes
Overwrites existing files without warning?      Yes                             Yes (-w disables)
Can use standard input or output?               Yes (with -f -)                 Yes (with -L, -O)
Order of options important?                     Somewhat                        Somewhat
Table of contents (-t) reveals what?            File names only                 File names and properties


1 Keyword Index 

To see an alphabetical list of keywords for this document, consult the next section.  

Keyword Description  
------- -----------  
entire This entire document.  
title The name of this document.  
scope Topics covered in this document.  
availability Where HTAR runs.  
who Who to contact for assistance.  
introduction General HTAR overview, analysis.  
htar-role Goals, scope, performance of HTAR.  
htar-files Three key HTAR files diagrammed.  
tar-comparison TAR and HTAR features compared.  
htar-usage How to use HTAR.  
execute-line Required syntax, features, defaults.  
htar-errors Common error conditions, warnings.  
limitations Known HTAR limitations, work-arounds.  
input-output How to use standard input, output.  
environment-variables Env. variables used by HTAR.  
options HTAR options grouped, explained.  
action HTAR's action options (1 reqd).  
archive HTAR's archive option (always reqd).  
control HTAR's control options.  
examples Annotated sample HTAR sessions.  
create-archive How to make an HTAR archive.  
crc-generation How to add Cyclic Redundancy Checksums during creation.
retrieve-files How to extract HTAR files.  
rebuild-index How to rescue a lost HTAR index.  
verify-archive How to verify the contents of an archive during creation
many-files Specifying very many input files.  
find-input Specifying very many input files.  
index The structural index of keywords.  
a The alphabetical index of keywords.  
date The latest changes to this document.  
revisions The complete revision history.  



Alphabetical List of Keywords  

Keyword Description  
------- -----------  
a The alphabetical index of keywords.  
action HTAR's action options (1 reqd).  
archive HTAR's archive option (always reqd).  
availability Where HTAR runs.  
between-machines Archiving between NONstorage machines.  
control HTAR's control options.  
create-archive How to make an HTAR archive.  
date The latest changes to this document.  
entire This entire document.  
environment-variables Env. variables used by HTAR.  
examples Annotated sample HTAR sessions.  
execute-line Required syntax, features, defaults.  
find-input Specifying very many input files.  
htar-errors Common error conditions, warnings.  
htar-files Three key HTAR files diagrammed.  
htar-role Goals, scope, performance of HTAR.  
htar-usage How to use HTAR.  
index The structural index of keywords.  
input-output How to use standard input, output.  
introduction General HTAR overview, analysis.  
limitations Known HTAR limitations, work-arounds.  
many-files Specifying very many input files.  
options HTAR options grouped, explained.  
rebuild-index How to rescue a lost HTAR index.  
retrieve-files How to extract HTAR files.  
revisions The complete revision history.  
scope Topics covered in this document.  
tar-comparison TAR and HTAR features compared.  
title The name of this document.  
who Who to contact for assistance.  




Date and Revisions  

Revision Keyword Description of  
Date Affected Change  
-------- -------- ------  
20Aug07 all Original document, based upon LLNL's 
            “HTAR Reference Manual (UCRL-WEB-200720)”.