Why do some files take up more disk space after being copied?

Why are the sizes reported by ls -l and du different?


Some files -- core files being one common example -- contain "holes", areas
which were seeked over without being written. These files are called
"sparse". When read back, these areas appear to contain zeros; however
they do not occupy disk space. The "length" of such a file (as reported by
"ls -l") will exceed its "size" (as reported by "ls -s" and reflected in
the results of du or df).
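
The difference between length and size can be observed directly. The
following is a minimal Python sketch, assuming a POSIX system and a
filesystem that supports holes; the function names and the default hole
size are made up for illustration. It seeks past a region without writing
it, then compares the length (st_size, what "ls -l" shows) with the space
actually allocated (st_blocks, counted in 512-byte units on most systems).

```python
import os
import tempfile


def make_sparse_file(path, hole_bytes=1024 * 1024):
    """Create a file by seeking over hole_bytes and writing a single byte."""
    with open(path, "wb") as f:
        f.seek(hole_bytes)   # skip the region without writing anything
        f.write(b"x")        # the only byte that is really written


def length_and_allocated(path):
    """Return (length in bytes, allocated bytes) for path."""
    st = os.stat(path)
    # st_size is the logical length; st_blocks is assumed to be in
    # 512-byte units, which is the usual POSIX convention.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.bin")
        make_sparse_file(path)
        length, allocated = length_and_allocated(path)
        print("length:   ", length)
        print("allocated:", allocated)
```

On a filesystem without hole support the two numbers will be close to
each other, since the skipped region is then stored as real zero blocks.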

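Traditional versions of cp, cpio, and tar do not detect holes; they read
and copy the zeros, and the resulting files will contain all-zero blocks
(which occupy space) where the input files contained holes (which do not).
Some newer implementations can preserve holes; GNU cp, for example, has a
"--sparse" option.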
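
A copy program can avoid this by seeking over all-zero blocks instead of
writing them. The following Python sketch illustrates the idea; it is not
how any particular tool is implemented, and the name copy_sparse and the
4096-byte block size are assumptions. It turns every all-zero block into a
hole, whether or not the input had a hole there.

```python
import os


def copy_sparse(src, dst, block_size=4096):
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(block_size)
            if not block:
                break
            if block.count(0) == len(block):
                # Block is entirely zeros: advance the output position
                # without writing, which leaves a hole where supported.
                fout.seek(len(block), os.SEEK_CUR)
            else:
                fout.write(block)
        # If the input ended in zeros, nothing was written at the end,
        # so set the output length explicitly to the current position.
        fout.truncate()
```

Reading the copy back yields the same bytes as the original; only the
amount of disk space allocated may differ.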

dump will detect holes in the dumped files, and restore will reproduce
them.

Thanks to Perry Hutchison

GNU tar has an "-S" option which preserves holes, and Joerg Schilling's
"star" has "-sparse" and "-force_hole" options which can be used to
preserve and re-insert holes, respectively. star is available for download
at ftp://ftp.fokus.gmd.de/pub/unix/star


