Linux Text Processing Commands Guide

Text Processing Commands

Commands for displaying, searching, editing, and processing text files.

cat - Display and Concatenate Files

Displays file contents or concatenates multiple files.

Option  Description
-n      Number all output lines
-b      Number non-blank lines only
-s      Squeeze repeated blank lines into one
-A      Show non-printing characters, tabs (as ^I), and line ends (as $)

Examples:

cat file.txt - Display file contents
cat file1.txt file2.txt - Concatenate and display multiple files
cat file1.txt file2.txt > combined.txt - Concatenate multiple files and save to a new file
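
The difference between -n, -b, and -s is easiest to see on a file that contains blank lines. The following is a minimal sketch, assuming a hypothetical sample.txt created in a temporary directory; the variable names are illustrative only.

```shell
# Hypothetical sample file: two text lines separated by two blank lines.
tmpdir=$(mktemp -d)
printf 'alpha\n\n\nbeta\n' > "$tmpdir/sample.txt"

# -s squeezes the run of blank lines into one, so 4 lines become 3.
squeezed=$(cat -s "$tmpdir/sample.txt" | awk 'END {print NR}')

# -b numbers only non-blank lines, so the last number used is 2.
last_b=$(cat -b "$tmpdir/sample.txt" | awk 'NF {n = $1} END {print n}')

# -n numbers every line, so the last number used is 4.
last_n=$(cat -n "$tmpdir/sample.txt" | awk '{n = $1} END {print n}')

echo "squeezed=$squeezed last_b=$last_b last_n=$last_n"
rm -r "$tmpdir"
```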

grep - Search Text

Searches for text patterns in files.

Option  Description
-i      Case-insensitive search
-v      Display lines that do not match the pattern
-n      Prefix each output line with its line number
-r, -R  Search directories recursively (in GNU grep, -R also follows all symbolic links)
-l      Display only the names of files containing matches
-c      Display the count of matching lines
-o      Display only the matched part of each line
-E      Use extended regular expressions
-A n    Display n lines after the match
-B n    Display n lines before the match
-C n    Display n lines before and after the match

Examples:

grep "pattern" file.txt - Search for pattern in file
grep -i "pattern" file.txt - Case-insensitive search
grep -r "pattern" directory/ - Search recursively through all files in directory
grep -E "pattern1|pattern2" file.txt - Search for multiple patterns (OR)
grep -v "^#" config.txt | grep -v "^$" - Exclude comment lines and blank lines (using pipe)
ps aux | grep "[f]irefox" - Search for firefox processes excluding the grep process itself
grep -A 2 -B 1 "ERROR" log.txt - Display error lines with context
grep -oE "([0-9]{1,3}\.){3}[0-9]{1,3}" file.txt - Extract only IP addresses (-E is required for the (...) grouping and {n} repetition used here)
find . -name "*.log" -exec grep -l "ERROR" {} \; | xargs wc -l - Count lines in log files containing errors
history | grep "git commit" | sed 's/^[ 0-9]*//' - Extract only git commit commands from history
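
To make the effect of -c, -i, -v, -E, and -o concrete, here is a small sketch run against a hypothetical app.log. The file name and its contents are made up for illustration.

```shell
# Hypothetical log file used only for this illustration.
tmpdir=$(mktemp -d)
cat > "$tmpdir/app.log" <<'EOF'
# sample log (hypothetical data)
INFO  start from 10.0.0.1
ERROR disk full on 192.168.1.20
error retry from 10.0.0.1

ERROR timeout
EOF

# -c counts matching lines; adding -i ignores case.
exact=$(grep -c "ERROR" "$tmpdir/app.log")
anycase=$(grep -ci "error" "$tmpdir/app.log")

# -v inverts the match; -E lets one pattern cover comments and blank lines.
content=$(grep -cvE '^#|^$' "$tmpdir/app.log")

# -o prints only the matched text; sort -u keeps each address once.
ips=$(grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' "$tmpdir/app.log" | sort -u | paste -sd' ' -)

echo "exact=$exact anycase=$anycase content=$content ips=$ips"
rm -r "$tmpdir"
```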

sed - Stream Editor

Performs text transformations like substitution, deletion, and insertion.

Option/Syntax            Description
-i                       Edit files in place (by default, output goes to stdout)
-e                       Add a command; repeat -e to apply several commands
-n                       Suppress automatic printing (print only what p selects)
s/pattern/replacement/   Replace the first match of pattern on each line
s/pattern/replacement/g  Replace every match of pattern on each line (g means global)
/pattern/d               Delete lines matching pattern
/pattern/p               Print lines matching pattern (usually used with -n)

Examples:

sed 's/old/new/' file.txt - Replace first occurrence of "old" with "new" on each line
sed 's/old/new/g' file.txt - Replace all occurrences of "old" with "new"
sed -i 's/old/new/g' file.txt - Edit file in-place
sed '/pattern/d' file.txt - Delete lines matching pattern
sed -n '/pattern/p' file.txt - Print only lines matching pattern
sed -i.bak 's/old/new/g' file.txt - Create backup before editing file
sed '1,5s/old/new/g' file.txt - Replace "old" with "new" only in lines 1-5
sed '/start/,/end/d' file.txt - Delete range of lines from "start" to "end" pattern
sed 's/[0-9]\{3\}-[0-9]\{4\}/XXX-XXXX/g' file.txt - Mask phone numbers
sed 's/^[[:space:]]*//' file.txt - Remove leading whitespace (spaces and tabs) from each line
echo "hello world" | sed 's/\b\(.\)/\u\1/g' - Capitalize first letter of each word (\b and \u are GNU sed extensions)
sed -e 's/old/new/g' -e 's/foo/bar/g' file.txt - Perform multiple substitutions at once
grep "ERROR" log.txt | sed 's/.*ERROR: \(.*\)/\1/' - Extract only error messages
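
The substitution, deletion, and in-place forms above can be compared side by side. This is a sketch with made-up input; config.txt is a hypothetical file, and the attached -i.bak form is assumed to behave as it does in GNU sed.

```shell
# Without g only the first match on the line changes; with g every match does.
line='old old old'
first=$(printf '%s\n' "$line" | sed 's/old/new/')
every=$(printf '%s\n' "$line" | sed 's/old/new/g')

# /pattern/d drops matching lines; -n with /pattern/p keeps only them.
text='start
debug noise
end'
kept=$(printf '%s\n' "$text" | sed '/debug/d' | paste -sd' ' -)
only=$(printf '%s\n' "$text" | sed -n '/debug/p')

# -i.bak edits the file in place and keeps the original as config.txt.bak.
tmpdir=$(mktemp -d)
printf 'old value\n' > "$tmpdir/config.txt"
sed -i.bak 's/old/new/g' "$tmpdir/config.txt"
edited=$(cat "$tmpdir/config.txt")
backup=$(cat "$tmpdir/config.txt.bak")

echo "first=$first | every=$every | kept=$kept | only=$only"
echo "edited=$edited | backup=$backup"
rm -r "$tmpdir"
```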

awk - Text Processing Language

A programming language for text processing and data extraction.

Syntax/Pattern        Description
'{print $1}'          Print first field (column) of each line
'{print $1, $3}'      Print first and third fields
-F                    Specify field separator (default is whitespace)
'/pattern/ {action}'  Perform action on lines matching pattern
NR                    Current line number
NF                    Number of fields in current line

Examples:

awk '{print $1}' file.txt - Print first field of each line
awk -F, '{print $1, $3}' file.csv - Print first and third fields from CSV file
awk '/pattern/ {print $0}' file.txt - Print lines matching pattern
awk '{sum += $1} END {print sum}' file.txt - Calculate and print sum of first field
awk '{count[$1]++} END {for (word in count) print word, count[word]}' file.txt - Count occurrences of each distinct value in the first field
awk -F, '{if ($3 > 100) print $1, $2, $3}' data.csv - Print lines where third field is greater than 100
awk 'NR % 2 == 0' file.txt - Print only even-numbered lines
awk 'length($0) > 80' file.txt - Print lines longer than 80 characters
awk 'BEGIN {FS=","; OFS="\t"} {print $1, $2, $3}' file.csv - Convert CSV to TSV
ps aux | awk '$3 > 10.0 {print $2, $3, $11}' - Print processes using more than 10% CPU
cat /etc/passwd | awk -F: '{print "User: " $1 ", Home: " $6}' - Format user information
ls -l | awk '{sum += $5} END {printf "Total size: %.2f MB\n", sum/1024/1024}' - Calculate total file size in MB
grep -H "ERROR" *.log | awk -F: '{print $1}' | sort | uniq -c - Count errors by file (-H makes grep prefix every match with its filename, which awk then extracts)
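
The field, pattern, and END-block forms above can be tried on a small data set. The following sketch assumes a hypothetical three-row sales.csv with no quoted or embedded commas; awk -F, does not handle quoted CSV fields.

```shell
# Hypothetical CSV: name, category, amount.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sales.csv" <<'EOF'
alice,books,120
bob,games,80
alice,games,200
EOF

# Accumulate the third field and print the sum once all input is read.
total=$(awk -F, '{sum += $3} END {print sum}' "$tmpdir/sales.csv")

# A pattern without braces around it selects rows; here, amounts over 100.
big=$(awk -F, '$3 > 100 {print $1, $3}' "$tmpdir/sales.csv" | paste -sd';' -)

# Count rows per value of the first field (array order is unspecified, so sort).
per_name=$(awk -F, '{count[$1]++} END {for (k in count) print k, count[k]}' \
  "$tmpdir/sales.csv" | sort | paste -sd';' -)

echo "total=$total big=$big per_name=$per_name"
rm -r "$tmpdir"
```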

head/tail - Display Beginning/End of Files

Display the beginning or end portions of files.

Command/Option       Description
head file.txt        Display first 10 lines of file
head -n N file.txt   Display first N lines of file
tail file.txt        Display last 10 lines of file
tail -n N file.txt   Display last N lines of file
tail -n +N file.txt  Display file starting from line N
tail -f file.txt     Display last lines and follow file updates in real time (useful for log monitoring)

Examples:

head -n 5 file.txt - Display first 5 lines
tail -n 20 file.txt - Display last 20 lines
tail -f /var/log/syslog - Monitor system log in real-time
head -n 5 file.txt | tail -n 1 - Display only line 5
tail -n +10 file.txt | head -n 5 - Display lines 10-14
ls | head -n 5 - Display first 5 entries in directory (with ls -l, the "total" header line would take up one of the 5 lines)
find . -type f -name "*.log" | head - Display first 10 log files
ps aux | sort -nrk 3,3 | head -n 5 - Display top 5 CPU-consuming processes
tail -f log.txt | grep --color "ERROR" - Monitor log, showing only lines that contain ERROR (with the match highlighted)
tail -f /var/log/apache2/access.log /var/log/apache2/error.log - Monitor multiple log files simultaneously
find . -type f -mtime -1 | xargs wc -l | head - Display line counts for files modified within the last 24 hours (first 10 lines of output; not sorted by modification time)
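
The head and tail combinations for picking out a single line or a range can be checked on a file whose content equals its line number. A minimal sketch, assuming seq is available (it is part of GNU coreutils) and using a hypothetical numbers.txt:

```shell
# Hypothetical file with the numbers 1 to 20, one per line.
tmpdir=$(mktemp -d)
seq 1 20 > "$tmpdir/numbers.txt"

first3=$(head -n 3 "$tmpdir/numbers.txt" | paste -sd' ' -)
last2=$(tail -n 2 "$tmpdir/numbers.txt" | paste -sd' ' -)

# head keeps lines 1-5, then tail keeps the last of those: line 5.
fifth=$(head -n 5 "$tmpdir/numbers.txt" | tail -n 1)

# tail -n +10 starts output at line 10; head -n 5 then keeps lines 10-14.
range=$(tail -n +10 "$tmpdir/numbers.txt" | head -n 5 | paste -sd' ' -)

echo "first3=$first3 last2=$last2 fifth=$fifth range=$range"
rm -r "$tmpdir"
```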

sort - Sort Text

Sorts lines of text files.

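Option  Description
-r      Sort in reverse (descending) order
-n      Sort numerically
-h      Sort by human-readable sizes such as 2K or 1G (GNU sort)
-k N    Sort by a key starting at field N (use -k N,N to limit the key to that field)
-t      Specify field separator
-u      Remove duplicates (output unique lines only)
-f      Case-insensitive sort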

Examples:

sort file.txt - Sort file alphabetically
sort -r file.txt - Sort in reverse order
sort -n numbers.txt - Sort numerically
sort -t, -k 2,2 file.csv - Sort CSV file by second field only (-k 2 alone would use field 2 through the end of the line)
sort -t: -k 2,2n file.txt - Sort numerically by second field (delimiter is :)
sort -k 3,3 -k 1,1 file.txt - Sort by third field, then by first field
sort -u file.txt - Sort and remove duplicates
sort -h sizes.txt - Sort by human-readable sizes (K, M, G)
du -h | sort -hr - Sort directory sizes in descending order
ps aux | sort -nrk 3,3 | head -n 10 - Display top 10 CPU-consuming processes
cat /etc/passwd | sort -t: -k 3 -n - Sort users by UID
find . -type f -name "*.log" | xargs ls -l | sort -k 5 -nr - Sort log files by size
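
The difference between the default and numeric orderings, and the way multiple -k keys combine, shows up clearly on small inputs. This is a sketch with made-up name:age records.

```shell
# Default sort compares character by character, so "100" sorts before "9".
lexical=$(printf '10\n9\n100\n' | sort | paste -sd' ' -)
numeric=$(printf '10\n9\n100\n' | sort -n | paste -sd' ' -)

# Hypothetical name:age records. Primary key: field 2, numeric.
# Secondary key: field 1, used only to break ties on field 2.
people='carol:30
alice:25
bob:30'
by_age=$(printf '%s\n' "$people" | sort -t: -k 2,2n -k 1,1 | cut -d: -f1 | paste -sd' ' -)

# The r modifier on the first key reverses only that key.
oldest_first=$(printf '%s\n' "$people" | sort -t: -k 2,2nr -k 1,1 | cut -d: -f1 | paste -sd' ' -)

# -u sorts and drops repeated lines in one step.
unique=$(printf 'b\na\nb\n' | sort -u | paste -sd' ' -)

echo "lexical=$lexical numeric=$numeric by_age=$by_age oldest_first=$oldest_first unique=$unique"
```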

uniq - Process Duplicate Lines

Processes duplicate lines (typically used with sort).

Option  Description
-c      Prefix each line with its number of occurrences
-d      Display only duplicated lines (one copy of each)
-u      Display only lines that are not repeated
-i      Case-insensitive comparison

Examples:

sort file.txt | uniq - Sort and remove duplicate lines
sort file.txt | uniq -c - Count occurrences of each line
sort file.txt | uniq -d - Display only duplicate lines
sort file.txt | uniq -u - Display only unique lines
sort file.txt | uniq -c | sort -nr - Display lines by frequency (most frequent first)
cat access.log | cut -d' ' -f1 | sort | uniq -c | sort -nr - Count IP address access frequency
cat file.txt | tr '[:upper:]' '[:lower:]' | sort | uniq - Case-insensitive duplicate removal
history | cut -c8- | sort | uniq -c | sort -nr | head -n 10 - Display top 10 most used commands
find . -type f -name "*.txt" | xargs cat | sort | uniq -c - Count occurrences of lines across all text files
grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' access.log | sort | uniq -c | sort -nr - Count IP address frequency in log file
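
uniq only compares adjacent lines, which is why the examples above sort first. The following sketch, using a made-up list of words, shows what happens without the sort and how -d, -u, and -c differ.

```shell
# Hypothetical input in which no two equal lines are adjacent.
data='apple
banana
apple
cherry
banana
apple'

# Without sort nothing is removed: all 6 lines survive.
unsorted=$(printf '%s\n' "$data" | uniq | awk 'END {print NR}')

# After sort, equal lines are adjacent and collapse to 3 distinct lines.
distinct=$(printf '%s\n' "$data" | sort | uniq | awk 'END {print NR}')

# -d keeps one copy of each repeated line; -u keeps lines seen exactly once.
dups=$(printf '%s\n' "$data" | sort | uniq -d | paste -sd' ' -)
singles=$(printf '%s\n' "$data" | sort | uniq -u | paste -sd' ' -)

# -c prefixes counts; a numeric reverse sort puts the most frequent line first.
top=$(printf '%s\n' "$data" | sort | uniq -c | sort -nr | head -n 1 | awk '{print $2, $1}')

echo "unsorted=$unsorted distinct=$distinct dups=$dups singles=$singles top=$top"
```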

wc - Count Lines, Words, Bytes

Counts lines, words, and bytes in files.

Option  Description
-l      Count lines only
-w      Count words only
-c      Count bytes only
-m      Count characters only

Examples:

wc file.txt - Display line, word, and byte counts
wc -l file.txt - Display line count only
wc -w file.txt - Display word count only
wc -c file.txt - Display byte count only
wc -m file.txt - Display character count (correctly counts multibyte characters)
find . -name "*.py" | xargs wc -l - Count lines in all Python files
find . -name "*.py" | xargs wc -l | sort -nr - Sort Python files by line count
grep -v "^#" script.sh | grep -v "^$" | wc -l - Count non-comment, non-blank lines in script
cat file.txt | tr -s ' ' '\n' | sort | uniq | wc -l - Count unique words in file
ls -la | wc -l - Count directory entries, including hidden ones (the result also counts the "total" header line and the . and .. entries)
for file in *.txt; do echo -n "$file: "; wc -l < "$file"; done - Display line count for each text file
find . -type f -exec cat {} + | wc -l - Count total lines in all files under the current directory, including subdirectories
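
The counting modes can be verified on files small enough to count by hand. A minimal sketch, assuming two hypothetical files a.txt and b.txt; reading from stdin (wc -l < file) makes wc print the number without the file name.

```shell
# Hypothetical files: a.txt has 2 lines, 3 words, 14 bytes; b.txt has 1 line.
tmpdir=$(mktemp -d)
printf 'one two\nthree\n' > "$tmpdir/a.txt"
printf 'four\n' > "$tmpdir/b.txt"

lines=$(wc -l < "$tmpdir/a.txt")
words=$(wc -w < "$tmpdir/a.txt")
bytes=$(wc -c < "$tmpdir/a.txt")

# Concatenate every .txt file found and count the lines of the combined stream.
total=$(find "$tmpdir" -type f -name '*.txt' -exec cat {} + | wc -l)

echo "lines=$lines words=$words bytes=$bytes total=$total"
rm -r "$tmpdir"
```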