Unix Tools and Simple Automation
Master the text-processing toolkit that powers DevOps automation: grep, pipes, redirection, and your first useful shell scripts.
Searching text with grep
grep is one of the most frequently used tools in a DevOps engineer's day. Its name comes from the ed editor command g/re/p ("global regular expression print"), and it does exactly what that suggests: it searches files (or standard input) for lines matching a pattern and prints them.
# Search for 'error' in a log file
grep 'error' server.log
# Case-insensitive search
grep -i 'error' server.log
# Show line numbers alongside matches
grep -n 'error' server.log
42:ERROR: connection timeout
87:ERROR: database unreachable
# Count matching lines
grep -c 'error' server.log
23
# Show lines that do NOT match
grep -v 'DEBUG' server.log
# Search recursively through all files in a directory
grep -r 'TODO' ./src
# Show the filename with matches (useful with -r)
grep -rn 'TODO' ./src
./src/auth.py:15:# TODO: add input validation
./src/api.py:42:# TODO: handle timeout
# Show N lines of context around each match
grep -B2 -A2 'CRITICAL' server.log
# 2 lines Before and After
Regular expressions basics
grep supports regular expressions (regex) — a powerful pattern language. Mastering regex is a significant skill in its own right, but a handful of patterns covers the majority of practical cases.
# Lines starting with 'ERROR'
grep '^ERROR' server.log
# Lines ending with a number
grep -E '[0-9]+$' server.log
# IP addresses (simple pattern)
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' access.log
# Lines containing a word boundary match for 'fail'
grep -w 'fail' server.log
# matches 'fail' but not 'failure'
# Use -E (extended regex) for + ? | and ()
grep -E 'ERROR|CRITICAL' server.log
# match either word
You will sometimes see egrep used instead of grep -E. The two are equivalent, but egrep is deprecated; grep -E is preferred in modern scripts because it makes the behaviour explicit.
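One concrete difference is worth seeing: in basic regex, + is a literal character, so the same pattern behaves differently with and without -E. A quick sketch using printf as throwaway input:

```shell
# In basic regex (plain grep), '+' matches a literal plus sign;
# with -E it means "one or more of the preceding item".
printf 'ab\nabbb\nab+c\n' | grep 'ab+'       # only the literal match: ab+c
printf 'ab\nabbb\nab+c\n' | grep -E 'ab+'    # all three lines match
```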
Processing text: sort, uniq, cut, wc
These four tools form a text-processing toolkit that you will combine constantly with grep and pipes.
sort — order lines
# Sort alphabetically
sort names.txt
# Sort in reverse order
sort -r names.txt
# Sort numerically (not lexicographically)
sort -n numbers.txt
# Sort by second column (tab-separated)
sort -k2 data.tsv
# Reverse numeric sort (largest first)
sort -rn scores.txt
uniq — deduplicate consecutive lines
# Remove consecutive duplicate lines
sort names.txt | uniq
# Count occurrences
sort names.txt | uniq -c
3 alice
1 bob
5 charlie
# Show only lines that appear more than once
sort names.txt | uniq -d
Important: uniq only removes consecutive duplicates. You almost always need to sort before uniq to get meaningful deduplication.
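To see the pitfall concretely, here is a quick sketch using a scratch file (the path is arbitrary):

```shell
# Build a sample with NON-consecutive duplicates
printf 'alice\nbob\nalice\n' > /tmp/names-demo.txt

# Without sorting, the two 'alice' lines never touch, so uniq keeps both
uniq /tmp/names-demo.txt          # prints: alice, bob, alice
# Sorting first makes duplicates consecutive, so uniq collapses them
sort /tmp/names-demo.txt | uniq   # prints: alice, bob
```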
cut — extract columns
# Extract first column from comma-separated file
cut -d, -f1 data.csv
# Extract columns 1 and 3
cut -d, -f1,3 data.csv
# Extract characters 1-10 from each line
cut -c1-10 file.txt
# Extract all columns from 5 onwards
cut -d: -f5- /etc/passwd
wc — count things
# Lines, words, characters
wc file.txt
156 1203 8432 file.txt
wc -l file.txt
# lines only
wc -w file.txt
# words only
wc -c file.txt
# bytes
Pipes and redirection
Pipes and redirection are what turn individual tools into a powerful automation system. They follow the Unix philosophy: each program does one thing well, and you compose them freely.
The pipe operator
# How many ERROR lines in the log?
grep 'ERROR' server.log | wc -l
47
# Top 5 most frequent errors
grep 'ERROR' server.log | sort | uniq -c | sort -rn | head -5
# List Python files, most recently modified first
find . -name '*.py' | xargs ls -lt | head
# Check if a process is running
# (grep may match its own process; 'grep [p]ython3' avoids that)
ps aux | grep python3
Redirection
# Write stdout to file (overwrite)
ls -l > listing.txt
# Append stdout to file
echo 'new entry' >> log.txt
# Read stdin from file
wc -l < big-file.txt
# Redirect stderr (errors) to file
python3 script.py 2> errors.txt
# Redirect both stdout and stderr
python3 script.py > output.txt 2>&1
# Discard output entirely
noisy-command > /dev/null
# tee: write to file AND display on screen
make build | tee build.log
Every process has three standard streams: stdin (0) — input, stdout (1) — normal output, stderr (2) — error output. Redirection controls where each goes. This is why error output uses 2> — it is redirecting file descriptor 2.
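Because redirections are applied left to right, the order of > file and 2>&1 matters. A minimal sketch, using a throwaway demo function (not part of the chapter):

```shell
# demo writes one line to stdout and one to stderr
demo() { echo "normal output"; echo "error output" >&2; }

# Correct order: stdout goes to the file first, then stderr is
# duplicated onto stdout, so both lines land in both.log
demo > /tmp/both.log 2>&1

# Wrong order: stderr is duplicated onto stdout's OLD target (the
# terminal) before stdout is redirected, so the error stays on screen
demo 2>&1 > /tmp/stdout-only.log
```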
Stream editing with sed
sed (stream editor) lets you perform transformations on text without opening a file in an editor. Its most common use is find and replace.
# Replace first occurrence per line
sed 's/old/new/' file.txt
# Replace ALL occurrences per line (g = global)
sed 's/old/new/g' file.txt
# Case-insensitive replacement
sed 's/error/ERROR/gi' file.txt
# Edit in place (modifies the actual file)
sed -i 's/localhost/db.prod.example.com/g' config.yml
# Delete lines matching a pattern
sed '/^#/d' config.ini
# delete comment lines
# Print only lines 10-20
sed -n '10,20p' file.txt
On macOS, sed -i requires a backup extension: sed -i '' 's/old/new/g' file.txt. On Linux you can omit it. Keeping a backup is good practice: sed -i.bak 's/old/new/g' file.txt creates a .bak file automatically.
Shell scripts: structure and arguments
A shell script is a file of commands. It becomes powerful when it accepts arguments — values passed in when you run the script.
#!/bin/bash
# deploy.sh — deploy to a named environment
# Usage: ./deploy.sh staging

# $0 = script name, $1 = first argument, $2 = second...
ENVIRONMENT=$1

# $# = number of arguments
if [ $# -eq 0 ]; then
    echo "Usage: $0 <environment>"
    exit 1
fi

echo "Deploying to: $ENVIRONMENT"
# $@ = all arguments as separate words
# $* = all arguments as one string
# Double quotes: variable expansion happens
NAME=Alice
echo "Hello, $NAME"
Hello, Alice
# Single quotes: literal, no expansion
echo 'Hello, $NAME'
Hello, $NAME
# Always quote variables that might contain spaces
if [ -f "$FILENAME" ]; then
    echo "File exists"
fi
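The difference between $@ and $* only shows up once you quote them and an argument contains spaces. A small sketch (argcount is a throwaway helper, not part of the script above):

```shell
#!/bin/bash
# argcount simply reports how many arguments it received
argcount() { echo "$#"; }

set -- "one two" three   # simulate two script arguments; the first has a space

argcount "$@"   # "$@" keeps each argument as a separate word -> prints 2
argcount "$*"   # "$*" joins all arguments into ONE string    -> prints 1
```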
Conditionals: if and else
#!/bin/bash
# Test if a file exists
if [ -f config.yml ]; then
    echo "Config found"
else
    echo "Config missing — aborting"
    exit 1
fi
# Test if a directory exists
if [ -d logs ]; then
    echo "Logs dir exists"
fi
# String comparison
if [ "$ENV" = "production" ]; then
    echo "Running in production mode"
fi
# Numeric comparison
if [ "$COUNT" -gt 10 ]; then
    echo "Count exceeds threshold"
fi
Loops in depth
for loops
# Loop over a list of words
for env in staging production; do
    echo "Checking $env..."
done
# Loop over files matching a pattern
for f in *.py; do
    echo "Processing $f"
    python3 -m py_compile "$f"
done
# C-style numeric loop
for ((i=1; i<=5; i++)); do
    echo "Step $i"
done
# Loop over lines of a file
while IFS= read -r line; do
    echo "Line: $line"
done < file.txt
while loops
# Retry a command up to 3 times
ATTEMPTS=0
while [ "$ATTEMPTS" -lt 3 ]; do
    python3 app.py && break
    ATTEMPTS=$((ATTEMPTS + 1))
    echo "Attempt $ATTEMPTS failed, retrying..."
    sleep 5
done
Exit codes and error handling
Every command in Unix returns an exit code — a number indicating success or failure. Exit code 0 means success; anything else means failure.
# $? holds the exit code of the last command
ls /nonexistent
ls: cannot access '/nonexistent': No such file or directory
echo $?
2
ls ~
echo $?
0
# set -e: stop script immediately on any failure
#!/bin/bash
set -e
# set -u: treat undefined variables as errors
set -u
# set -x: print each command before executing (debug mode)
set -x
# Recommended for production scripts: -e, -u, plus pipefail,
# which makes a pipeline fail if ANY command in it fails
set -euo pipefail
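pipefail is worth a quick illustration: without it, a pipeline's exit code is that of its last command, so a failure earlier in the pipe is silently swallowed.

```shell
#!/bin/bash
# Without pipefail: 'false' fails, but 'wc -l' succeeds, so $? is 0
false | wc -l > /dev/null
echo "without pipefail: $?"   # prints 0

# With pipefail the pipeline reports failure if ANY command fails;
# '|| rc=$?' captures the status without aborting under set -e
set -o pipefail
false | wc -l > /dev/null || rc=$?
echo "with pipefail: ${rc:-0}"   # prints 1
```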
# Only proceed if a command succeeds
pytest && echo "Tests passed"
# Run fallback if a command fails
pytest || echo "Tests failed"
# Explicit exit code in a function
validate_config() {
    if [ ! -f config.yml ]; then
        echo "ERROR: config.yml not found" >&2
        return 1
    fi
    return 0
}
Redirecting error messages to stderr with >&2 is good practice. It means the message appears even if the caller redirects stdout. CI systems display stderr prominently, making it easier to find errors in logs.
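A quick way to convince yourself: redirect stdout to /dev/null and the stderr message still comes through. (warn is a hypothetical helper, not part of the chapter.)

```shell
# warn writes its message to stderr (file descriptor 2)
warn() { echo "WARN: $1" >&2; }

# stdout is discarded, yet the warning still reaches the terminal,
# because it travels on stderr
warn "disk almost full" > /dev/null
```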
A worked automation example
Let us build a realistic script that a small team might actually use. It checks the project for issues before a developer commits.
#!/bin/bash
set -euo pipefail

# Pre-commit checks for a Python project
# Usage: ./precheck.sh [--fix]

FIX=false
if [ "${1:-}" = "--fix" ]; then
    FIX=true
fi

check_dependencies() {
    echo "[1/4] Checking dependencies..."
    for tool in python3 git pip; do
        if ! command -v "$tool" &>/dev/null; then
            echo "ERROR: $tool is not installed" >&2
            exit 1
        fi
    done
}

install_packages() {
    echo "[2/4] Installing packages..."
    pip install -r requirements.txt -q
}

lint() {
    echo "[3/4] Linting..."
    if [ "$FIX" = "true" ]; then
        black .
    else
        black --check .
    fi
}

run_tests() {
    echo "[4/4] Running tests..."
    pytest -v
}

check_dependencies
install_packages
lint
run_tests
echo "All checks passed."
Running this script produces clear output at each stage. If any step fails, set -e stops the script immediately with a non-zero exit code. The --fix argument switches black from check mode to automatic formatting.
This script can then be called from your CI pipeline (Module 6) and optionally as a Git pre-commit hook (Module 7). The same checks run locally and in CI.
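As a preview of the Module 7 hook idea (the filenames here are assumptions), installing the script as a pre-commit hook takes only a few lines. Git runs .git/hooks/pre-commit before every commit and aborts the commit if it exits non-zero:

```shell
#!/bin/sh
# Run from the repository root. In a real repo .git/hooks already
# exists; mkdir -p just makes this sketch runnable anywhere.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
exec ./precheck.sh
EOF
chmod +x .git/hooks/pre-commit
```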
Exercises
Part A: Text processing pipeline
Download the sample log file from Moodle (server.log). Using only the command line:
- Count the total number of lines in the file.
- Count how many lines contain the word ERROR (case-insensitive).
- Extract just the ERROR lines and save them to a file called errors.txt.
- Find the five most frequently occurring error messages.
- Find any lines containing an IP address pattern (four numbers separated by dots).
- Count how many unique IP addresses appear in the log.
Part B: Shell script — environment checker
Write a script envcheck.sh that:
- Accepts a required argument: the name of the environment (e.g. staging, production).
- Prints an error and exits with code 1 if no argument is given.
- Checks that python3, git, and docker are installed. For each, print ✓ tool found or ✗ tool missing.
- Prints a summary: N/3 tools found.
- Returns exit code 0 if all three are found, 1 otherwise.
Part C: Combining tools
- Use find . -name "*.py" | wc -l to count Python files in a directory.
- Pipe that through sort and uniq to find duplicate filenames: find . -name "*.py" | xargs -n1 basename | sort | uniq -d
- Write a loop that runs wc -l on every Python file and prints: filename: N lines
- Extension: modify the loop to skip files with zero lines.
Part D: sed
- Create a file with the text server=localhost on one line.
- Use sed to replace localhost with the name of a fake production server.
- Use sed to delete all lines starting with # from a Python file (comment stripping).