Linux File Operations and Search
- Description: Reading files (`cat`, `less`), comparing them (`cmp`, `diff`), searching the filesystem (`find`), and managing symbolic vs hard links (`ln`). Plus the `find ... -exec` vs `xargs` distinction, which trips up most newcomers.
- My Notion Note ID: K2B-3-4
- Created: 2020-06-03
- Updated: 2026-05-19
- License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io
Table of Contents
- 1. Reading Files: `cat`, `less`, `head`, `tail`
- 2. Output Redirection
- 3. Comparing Files: `cmp` and `diff`
- 4. `find`: Filesystem Search
- 5. `find -exec` vs `xargs`
- 6. Symbolic and Hard Links
- 7. References
1. Reading Files: cat, less, head, tail
- `cat [OPTION]... [FILE]...`: concatenate files to stdout. With no FILE, read stdin.
  - `-n, --number`: number all output lines.
  - `-b, --number-nonblank`: number only nonempty lines.
  - `-A`: show non-printables (`-v`, `-E`, `-T` combined): tabs as `^I`, EOL as `$`, control chars as `^X`.
  - `-s, --squeeze-blank`: collapse runs of blank lines.
- Pitfall: `cat file | grep ...`. This is the canonical "useless use of `cat`." `grep PATTERN file` does the same thing without the pipe; `cat` is only needed to concatenate multiple files or to feed a file to a tool that reads only stdin (and even there `tool < file` usually works).
- For browsing big files use a pager instead of `cat`:
  - `less FILE`: interactive, supports `/` search, `n`/`N`, `g`/`G`, `q`, follows tail with `F`.
  - `head -n 20 FILE` / `tail -n 20 FILE`: first / last N lines (here 20).
  - `tail -f FILE`: follow appended output (log files). `tail -F` re-opens on rotation.
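The flags above are easiest to see on a tiny file. This is a minimal sketch, assuming GNU coreutils (`cat -A` is a GNU extension); the scratch file and the variable names are made up for illustration.

```shell
#!/bin/sh
# Scratch file: a tab-separated line, two blank lines, then a final line.
tmp=$(mktemp -d)
printf 'a\tb\n\n\nlast\n' > "$tmp/demo.txt"

# -A makes tabs (^I) and line ends ($) visible.
visible=$(cat -A "$tmp/demo.txt" | head -n 1)
echo "$visible"

# -s squeezes the run of two blank lines into one, so 4 lines become 3.
squeezed=$(cat -s "$tmp/demo.txt" | wc -l)

# grep reads the file directly; no `cat file | grep` needed.
matches=$(grep -c 'last' "$tmp/demo.txt")

# head/tail pick lines from either end.
first=$(head -n 1 "$tmp/demo.txt")
final=$(tail -n 1 "$tmp/demo.txt")

rm -rf "$tmp"
```

The `echo` line prints `a^Ib$`, which is a quick way to spot stray tabs or trailing whitespace in a config file.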
2. Output Redirection
- `cmd > file`: overwrite.
- `cmd >> file`: append.
- `cmd < file`: feed file as stdin.
- `cmd 2> err.log`: redirect stderr (FD 2).
- `cmd > out 2>&1`: send stdout to `out`, then point stderr at the same place. Order matters: `cmd 2>&1 > out` leaves stderr on the original stdout (usually the terminal). `cmd &> out` (bash) is shorthand for the first form.
- `cmd1 | cmd2`: pipe stdout of `cmd1` to stdin of `cmd2`.
- `cmd | tee file`: split the stream: write to `file` and continue down the pipeline.
- `cat > file`, typing the text, then `Ctrl-D` is the original "type text into a file" recipe, but `printf 'line\n' > file` or a real editor is cleaner.
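The ordering of `>` and `2>&1` is the part that tends to surprise, so here is a small sketch that exercises it. The `noisy` function is a hypothetical helper invented for this example; everything else is plain POSIX shell.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# Hypothetical helper: writes one line to stdout and one to stderr.
noisy() { echo "to-stdout"; echo "to-stderr" >&2; }

# Split the two streams into separate files.
noisy > "$tmp/out.log" 2> "$tmp/err.log"

# stdout goes to the file first, then stderr is pointed at the same place.
noisy > "$tmp/both.log" 2>&1

# Reversed order: stderr copies the *current* stdout (captured by the
# command substitution), and only then is stdout moved to the file.
leaked=$(noisy 2>&1 > "$tmp/only-stdout.log")

# tee writes a copy to the file and passes the data down the pipeline.
count=$(noisy 2>/dev/null | tee "$tmp/tee.log" | wc -l)

out_lines=$(wc -l < "$tmp/out.log")
both_lines=$(wc -l < "$tmp/both.log")
tee_copy=$(cat "$tmp/tee.log")
rm -rf "$tmp"
```

After this runs, `leaked` holds only the stderr line, which shows that redirections are applied left to right.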
3. Comparing Files: cmp and diff
- `cmp FILE1 FILE2`: byte-level comparison. Exits 0 if identical, 1 if different, 2 on error. Use on binaries.
  - `-l`: print every differing byte.
  - `-s`: silent; use in scripts that only care about the exit code.
- `diff [OPTION]... FILE1 FILE2`: line-level comparison. Designed for text.
  - Default output is "normal diff" (`<` / `>` markers).
  - `-u, --unified[=N]`: unified diff (what `git diff` produces). Almost always what you want.
  - `-c, --context[=N]`: context diff (older format).
  - `-y, --side-by-side`: two columns; useful for short files. `-W NUM` sets column width.
  - `--suppress-common-lines`: only show changes (with `-y`).
  - `-r, --recursive`: compare directory trees.
  - `-q, --brief`: only print whether files differ.
  - `-i`: ignore case. `-w`: ignore all whitespace. `-B`: ignore blank lines.
- Symbols in the default and `-y` output:
  - `<` line present only in the first file
  - `>` line present only in the second file
  - `|` line differs between the two (side-by-side)
- For three-way merges or interactive diffs reach for `diff3`, `vimdiff`, or `git diff --no-index FILE1 FILE2`.
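In scripts, both tools are usually consumed through their exit status. The following sketch, with made-up file names, shows `cmp -s` as a yes/no check and `diff -u` producing a patch; note that `diff` also exits 1 when the files differ, which is not an error.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmp/old.txt"
printf 'alpha\nBETA\ngamma\n' > "$tmp/new.txt"
cp "$tmp/old.txt" "$tmp/copy.txt"

# cmp -s prints nothing; the exit status carries the answer.
if cmp -s "$tmp/old.txt" "$tmp/copy.txt"; then same="yes"; else same="no"; fi

# Capture the status without tripping `set -e` style error handling.
cmp_status=0
cmp -s "$tmp/old.txt" "$tmp/new.txt" || cmp_status=$?    # 1 = files differ

diff_status=0
diff -u "$tmp/old.txt" "$tmp/new.txt" > "$tmp/change.patch" || diff_status=$?

# In unified output, changed lines start with a single - or +.
removed=$(grep -c '^-[^-]' "$tmp/change.patch")
added=$(grep -c '^+[^+]' "$tmp/change.patch")

# -i ignores case, so the two files now compare equal (exit 0).
nocase_status=0
diff -i -q "$tmp/old.txt" "$tmp/new.txt" > /dev/null || nocase_status=$?

cat "$tmp/change.patch"
rm -rf "$tmp"
```

The `|| status=$?` idiom keeps the script alive under `set -e` while still recording which of the three exit codes came back.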
4. find — Filesystem Search
- `find [PATH...] [EXPRESSION]`: walk a directory tree and apply tests. Path defaults to `.`; expression defaults to `-print`.
- Tests (most-used):
| Test | Matches |
|---|---|
| `-name 'glob'` | basename matches glob (quote it, otherwise the shell expands first) |
| `-iname 'glob'` | case-insensitive `-name` |
| `-path 'glob'` | full path matches glob |
| `-type f/d/l/b/c/p/s` | file / dir / symlink / block / char / pipe / socket |
| `-size N[ckMG]` | size (in 512-byte blocks by default; c=bytes, k=KiB, M=MiB, G=GiB) |
| `-mtime ±N` | modified N×24h ago, counted in whole days (-7 = within the last 7 days, +30 = more than 30 whole days ago) |
| `-mmin ±N` | minutes instead of days |
| `-newer FILE` | newer than FILE's mtime |
| `-user NAME` / `-group NAME` | by owner / group |
| `-perm /222` | any of the write bits set |
| `-empty` | empty file or dir |
- Combinators: `-and` (implicit), `-or` (`-o`), `-not` (`!`).
  - Parentheses must be escaped or quoted: `find . \( -name '*.c' -o -name '*.h' \) -print`.
- Pruning (stop descending into a subtree): `find . -name node_modules -prune -o -name '*.js' -print`
  - The `-prune` must come before `-o`, and the action must be explicit on the right side; otherwise the implicit `-print` applies to the whole expression and the pruned directory itself is printed too.
- Actions:
  - `-print` (default), `-print0` (null-terminated, safe for filenames with spaces/newlines).
  - `-delete`: remove matching entries. Use only after a `-print` dry run.
  - `-exec CMD {} \;`: run CMD once per file. `\;` ends the exec clause.
  - `-exec CMD {} +`: batch many files into one invocation. Faster, like `xargs`.
  - `-ok`: same as `-exec` but prompts before each invocation.
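The tests, combinators, and actions above can be tried safely in a throwaway tree. This sketch builds one under `mktemp -d`; the directory layout and file names are invented for the example.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/node_modules/pkg"
touch "$tmp/src/main.c" "$tmp/src/util.h" "$tmp/src/app.js" "$tmp/src/empty.txt"
touch "$tmp/node_modules/pkg/index.js"
printf 'x' > "$tmp/src/notes.md"
cd "$tmp" || exit 1

# Grouping with escaped parentheses: C sources or headers.
c_files=$(find . \( -name '*.c' -o -name '*.h' \) -type f | sort)

# Prune node_modules, then print .js files found everywhere else.
js_files=$(find . -name node_modules -prune -o -name '*.js' -print)

# -empty combined with -type f: zero-byte regular files only.
empty_count=$(find . -type f -empty | wc -l)

# -exec ... + hands all matches to a single wc invocation.
md_bytes=$(find . -name '*.md' -exec wc -c {} + | awk '{print $1}')

echo "$c_files"
echo "$js_files"
cd / && rm -rf "$tmp"
```

Here `js_files` contains only `./src/app.js`; the copy under `node_modules` is never visited because the prune cuts the walk short.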
5. find -exec vs xargs
Two ways to feed find results into another command, with different tradeoffs.
| | `find ... -exec CMD {} \;` | `find ... -exec CMD {} +` | `find ... -print0 \| xargs -0 CMD` |
|---|---|---|---|
| One process per file | yes (slow on big trees) | no, batched | no, batched |
| Handles spaces / newlines in names | yes | yes | yes (with `-0` / `-print0`) |
| Stops on first failure | no | no | no, unless CMD exits with status 255 |
| Position of file arg | wherever `{}` is | end of arg list | end of arg list (or with `-I` token) |
- Pitfall: naïve `find ... | xargs` (no `-print0` / `-0`) breaks on filenames containing whitespace or quotes. Always use the null-delimited form, or just use `-exec ... +`.
- `xargs -P N` runs N invocations in parallel, handy for embarrassingly parallel work like `find . -name '*.jpg' -print0 | xargs -0 -P 4 -n 1 mogrify -resize 800x`.
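The whitespace pitfall is easy to reproduce. The sketch below uses two scratch files, one with a space in its name, and assumes a `find`/`xargs` pair that supports `-print0`/`-0` and `-P` (GNU and BSD both do).

```shell
#!/bin/sh
tmp=$(mktemp -d)
touch "$tmp/plain.txt" "$tmp/two words.txt"

# Naive pipe: xargs splits "two words.txt" into two arguments, so ls is
# asked about paths that don't exist and xargs reports a failure.
naive_status=0
find "$tmp" -name '*.txt' | xargs ls > /dev/null 2>&1 || naive_status=$?

# Null-delimited: each filename arrives intact.
safe_status=0
find "$tmp" -name '*.txt' -print0 | xargs -0 ls > /dev/null || safe_status=$?

# -exec ... + needs no delimiter handling at all.
exec_count=$(find "$tmp" -name '*.txt' -exec ls {} + | wc -l)

# -n 1 forces one invocation per file; -P 2 runs two at a time.
parallel_count=$(find "$tmp" -name '*.txt' -print0 | xargs -0 -n 1 -P 2 basename | wc -l)

rm -rf "$tmp"
```

Only the naive form ends with a nonzero status; the other three see both files.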
A useful `find -exec sh -c` pattern for complex commands (referenced in the original [Y] note):
find . -name '*.log' -exec sh -c 'gzip "$1" && mv "$1.gz" /archive/' _ {} \;
The `_` becomes `$0` (script name placeholder); `{}` becomes `$1`. This lets you use shell features (pipes, variables, redirection) per file without quoting hell.
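To try the pattern without touching real logs, here is a variation that runs in a scratch directory. It is a sketch, not the original note's recipe: the archive directory is passed as an extra positional argument, so inside the script it is `$1` and the file from `{}` becomes `$2`. The paths are hypothetical, and `gzip` is assumed to be installed.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir "$tmp/logs" "$tmp/archive"
printf 'one\n' > "$tmp/logs/a.log"
printf 'two\n' > "$tmp/logs/b c.log"   # a space in the name, on purpose

# The directory is passed as $1 and the file as $2 rather than pasted
# into the script text, so odd characters in either stay harmless.
find "$tmp/logs" -name '*.log' -exec sh -c '
  gzip "$2" && mv "$2.gz" "$1/"
' _ "$tmp/archive" {} \;

archived=$(find "$tmp/archive" -name '*.gz' | wc -l)
remaining=$(find "$tmp/logs" -name '*.log' | wc -l)
restored=$(gzip -dc "$tmp/archive/a.log.gz")
rm -rf "$tmp"
```

Passing values as arguments instead of interpolating them into the `sh -c` string is what keeps names with spaces or quotes from being re-parsed by the inner shell.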
6. Symbolic and Hard Links
- Hard link — a second directory entry pointing at the same inode. Indistinguishable from the original; deleting one entry doesn't free the file until the last link is gone. Limited to the same filesystem and (typically) not allowed on directories.
- Symbolic link (symlink, soft link) — a small file whose content is a path. Can cross filesystems, can dangle (point to nothing). Behaves like a shortcut.
- `ln TARGET LINK_NAME`: create a hard link. `ln -s TARGET LINK_NAME`: create a symlink.
- Common flags:
| Flag | Meaning |
|---|---|
| `-s, --symbolic` | make a symbolic link |
| `-f, --force` | remove an existing destination |
| `-n, --no-dereference` | treat the destination as a normal file (don't follow if it's a symlink to a directory) |
| `-v, --verbose` | print each link as created |
| `-r, --relative` | compute a relative path from LINK_NAME to TARGET (cleaner symlinks) |
- Pitfall: `ln -sf newtarget mylink`, when `mylink` already points to a directory, silently creates `mylink/<basename(newtarget)>` inside the old target instead of replacing the symlink. Add `-n` to force replacement:
ln -sfn /a/new/path mylink
This is the classic recipe from the original [Y] note, and it stays a footgun because the default behavior is rarely what you want when re-pointing a symlink-to-a-dir.
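The footgun can be reproduced in a few lines. This sketch uses hypothetical `v1`, `v2`, and `current` names in a scratch directory, the usual shape of a "current release" symlink.

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir v1 v2
ln -s v1 current            # current -> v1

# Without -n, ln follows `current` into v1/ and creates the link there.
ln -sf v2 current
stray=$(readlink v1/v2)     # a stray link now sits inside v1
still=$(readlink current)   # and current is unchanged

# With -n, `current` itself is replaced.
ln -sfn v2 current
fixed=$(readlink current)

echo "before: $still, after: $fixed, stray link inside v1: $stray"
cd / && rm -rf "$tmp"
```

The first `ln -sf` exits successfully, which is why the mistake usually goes unnoticed until something reads the stale `current`.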
- Inspect symlinks: `ls -l` shows `link -> target`. `readlink LINK` prints the raw target; `readlink -f LINK` resolves the full chain to a canonical path.
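The difference between the two link types shows up once the original name is removed. A minimal sketch with made-up file names, assuming a `readlink` that supports `-f` (GNU coreutils does):

```shell
#!/bin/sh
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'data\n' > original.txt
ln original.txt hard.txt          # second name for the same inode
ln -s original.txt soft.txt       # small file holding the path
ln -s soft.txt chain.txt          # symlink to a symlink

# Hard links share one inode number; ls -i prints it first.
ino_a=$(ls -i original.txt | awk '{print $1}')
ino_b=$(ls -i hard.txt | awk '{print $1}')

raw=$(readlink chain.txt)                          # one hop only
resolved=$(basename "$(readlink -f chain.txt)")    # whole chain

# Remove the original name: the hard link keeps the data, the symlink dangles.
rm original.txt
hard_content=$(cat hard.txt)
if [ -e soft.txt ]; then dangling="no"; else dangling="yes"; fi

cd / && rm -rf "$tmp"
```

`[ -e ]` follows symlinks, so it reports a dangling link as missing even though `ls -l` would still list it.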
7. References
- `man cat`, `man less`, `man diff`, `man cmp`, `man find`, `man xargs`, `man ln`, `man readlink`
- GNU `findutils` manual: https://www.gnu.org/software/findutils/manual/html_mono/find.html
- "Useless use of cat" award: http://porkmail.org/era/unix/award.html
- Symlink replacement: https://unix.stackexchange.com/questions/151999/how-to-change-where-a-symlink-points