Re: cvs commit: src/usr.bin/tar Makefile bsdtar.1 bsdtar.c bsdtar.h bsdtar_platform.h matching.c read.c util.c write.c

From: Tim Kientzle (tim_at_kientzle.com)
Date: 04/06/04

    Date: Tue, 06 Apr 2004 11:14:48 -0700
    To: Ruslan Ermilov <ru@FreeBSD.org>
    
    

    Ruslan Ermilov wrote:
    > On Mon, Apr 05, 2004 at 02:32:18PM -0700, Tim Kientzle wrote:
    >
    >>kientzle 2004/04/05 14:32:18 PDT
    >>
    >> FreeBSD src repository
    >>
    >> Added files:
    >> usr.bin/tar Makefile bsdtar.1 bsdtar.c bsdtar.h
    >> bsdtar_platform.h matching.c read.c
    >> util.c write.c
    >> Log:
    >> Initial commit for bsdtar.
    >>
    >
    > Awesome! Are there some benchmarking results available?

    I haven't focused very closely on performance yet, to be honest, though
    the internal architecture is pretty clean (minimal data copying;
    reuse of internal buffers to avoid heap thrashing).

    I did some quick tests early on, and extraction performance was
    roughly comparable to GNU tar (within about 5-10%). That will
    improve somewhat as I continue to work on it. However, in general,
    I expect it to be a little slower because compression isn't
    handled in a separate process (so there's less overlap between
    I/O and computation).

    But, there are a lot of nice new features:

      * Fully automatic format/compression detection.
        In particular, the following commands all work:

           bsdtar -xf file.tgz
           bsdtar -xf file.tbz
           bsdtar -xf file.cpio

        or even

           fetch -o - http://...../file.tgz | bsdtar -xf -

        GNU tar can't do any of these; 'star' fails the last
        one. To be fair, "Heirloom tar" does support all of these.

      * Ability to interpolate an archive. The following
        combines the contents of "foo1.tgz" and "foo2.cpio"
        into a single archive called "out.tbz":

          bsdtar -cjf out.tbz @foo1.tgz @foo2.cpio

        Yes, you can mix interpolations and regular files on
        the command line. You can even interpolate from stdin:

          bsdtar -cjf - -F pax @-

        converts an archive read on stdin into a pax-format,
        bzip2-compressed archive on stdout. Once I get mtree
        read support, you'll be able to convert an mtree file
        into a shell script, for example:

          bsdtar -cf tree.sh -F shar @tree.mtree

      * Compliance with SUSv2. SUSv2 (POSIX.1-1997 ?) was
        the last official spec for tar. GNU tar does not
        comply with the file format specified there, nor does
        it correctly implement the command-line options specified
        there. By default, bsdtar will create standard ustar
        archives unless it finds a file attribute that is not
        supported by ustar (such as a very long filename or ACL),
        in which case it will use SUSv3 (POSIX.1-2001) extensions
        to carry the additional data. There are command-line options
        to force straight ustar format or permit SUSv3 ("pax")
        extensions even when not absolutely required. (The default
        format won't use SUSv3 extensions just to store atime/ctime
        or sub-second timestamps; specifying "pax" format will.)

      * Support for SUSv3 extensions. The "pax" format extensions
        eliminate essentially all of the historic limitations of
        tar in a way that is easily extensible and compatible with
        standard-compliant "pax" implementations on other platforms
        (as well as some modern tar implementations, notably Joerg
        Schilling's "star").

      * More complete archiving. With the "pax" format, bsdtar will
        archive ino/dev/nlink, sub-second resolution mtime/ctime/atime,
        ACLs, file flags, etc. Not all of this can currently be
        restored (ino/dev/nlink/ctime are currently ignored on extract),
        but it's all stored in the archive.

      * Broad format support. bsdtar reads the usual bevy of tar formats,
        and some cpio archives (only the odc variant at the moment).
        It writes standard tar formats, cpio, and shar. The
        underlying libarchive library is extensible and I have plans
        for reading mtree files, reading/writing more cpio
        formats, reading ZIP archives, etc.

      * Cleanly factored. The archive format support is all in a separate
        library. It should be fairly routine to build "cpio" or "pax"
        command-line interfaces to the same library, or to use the library
        for "pkg_install" or "pkg_create". For comparison, right now "bsdtar"
        is ~2,000 lines of C, while "libarchive" is closer to 10,000 lines.
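
As an illustration of the "fully automatic format/compression detection" item above, here is a minimal Python sketch of how detection of this kind can work by sniffing magic bytes. It is not libarchive's code and makes no claim about how bsdtar does it; the function names (`sniff_compression`, `sniff_format`, `detect`) are made up for this example, and it assumes the whole archive fits in memory.

```python
import bz2
import gzip
import io
import tarfile


def sniff_compression(head):
    """Guess the compression wrapper from the leading magic bytes."""
    if head.startswith(b"\x1f\x8b"):   # gzip magic number
        return "gzip"
    if head.startswith(b"BZh"):        # bzip2 magic number
        return "bzip2"
    return "none"


def sniff_format(block):
    """Guess the archive format from the first uncompressed bytes."""
    if block[257:262] == b"ustar":     # ustar/pax header magic at offset 257
        return "tar"
    if block.startswith(b"070707"):    # cpio odc (portable ASCII) magic
        return "cpio-odc"
    return "unknown"


def detect(data):
    """Return (compression, format) for an archive held in memory."""
    compression = sniff_compression(data)
    if compression == "gzip":
        data = gzip.decompress(data)
    elif compression == "bzip2":
        data = bz2.decompress(data)
    return compression, sniff_format(data)


if __name__ == "__main__":
    # Build a tiny tar archive in memory and wrap it in gzip.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        payload = b"hi\n"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    print(detect(gzip.compress(buf.getvalue())))
```

Because the decision depends only on the leading bytes, the same idea works on a pipe, where the file name is not available. The sketch reports pre-POSIX tar archives, which carry no "ustar" magic, as "unknown"; a real implementation would need further checks for those.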

    There is some performance work to be done; I need to build
    a uid/gid/uname/gname cache, for example. Part of my recent rewrite
    of the ACL support was to get to the point that there was one
    place where all such lookups were handled, regardless of whether
    it's a file owner or an ACL that needs the information.
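
A uid/gid name cache with a single lookup point, as described above, might look like the following Python sketch. This is only an illustration of the idea, not bsdtar's implementation (which is in C); the helper names `uname_for`, `gname_for`, and `owner_names` are hypothetical, and falling back to the numeric id for unknown ids is an assumption of this example.

```python
import grp
import os
import pwd
from functools import lru_cache


@lru_cache(maxsize=None)
def uname_for(uid):
    """Return the user name for uid, or the numeric id as text if unknown."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


@lru_cache(maxsize=None)
def gname_for(gid):
    """Return the group name for gid, or the numeric id as text if unknown."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def owner_names(uid, gid):
    """Single entry point usable for file owners and ACL entries alike."""
    return uname_for(uid), gname_for(gid)


if __name__ == "__main__":
    # The first call consults the passwd/group databases; repeats hit the cache.
    print(owner_names(os.getuid(), os.getgid()))
    print(uname_for.cache_info())
```

Since most files in a typical archive share a handful of owners, routing every lookup through one memoized function avoids repeated passwd/group database queries.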

    There are still a few bugs to iron out and a couple of features that
    are a bit incomplete, but it's getting better quickly. My hope
    is that a few adventurous souls will start using it and giving
    me feedback so that I can grow it into the system tar
    that FreeBSD deserves.

    Tim

    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

