Unix Tools and Simple Automation
Master the text-processing toolkit that powers DevOps automation: grep, pipes, redirection, and your first useful shell scripts.
Searching text with grep
grep is one of the most frequently used tools in a DevOps engineer's day. Its name comes from the ed editor command g/re/p ("global regular expression print"), and it does exactly what that suggests: it searches files (or standard input) for lines matching a pattern and prints them.
# Search for 'error' in a log file
grep 'error' server.log
# Case-insensitive search
grep -i 'error' server.log
# Show line numbers alongside matches
grep -n 'error' server.log
42:ERROR: connection timeout
87:ERROR: database unreachable
# Count matching lines
grep -c 'error' server.log
23
# Show lines that do NOT match
grep -v 'DEBUG' server.log
# Search recursively through all files in a directory
grep -r 'TODO' ./src
# Show the filename with matches (useful with -r)
grep -rn 'TODO' ./src
./src/auth.py:15:# TODO: add input validation
./src/api.py:42:# TODO: handle timeout
# Show N lines of context around each match
grep -B2 -A2 'CRITICAL' server.log
# 2 lines Before and After
Regular expressions basics
grep supports regular expressions (regex) — a powerful pattern language. Mastering regex is a significant skill in its own right, but a handful of patterns covers the majority of practical cases.
# Lines starting with 'ERROR'
grep '^ERROR' server.log
# Lines ending with a number
grep -E '[0-9]+$' server.log
# IP addresses (simple pattern)
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' access.log
# Lines containing a word boundary match for 'fail'
grep -w 'fail' server.log
# matches 'fail' but not 'failure'
# Use -E (extended regex) for + ? | and ()
grep -E 'ERROR|CRITICAL' server.log
# match either word
You will sometimes see egrep used instead of grep -E. The two are equivalent, but egrep is deprecated; grep -E is preferred in modern scripts because it makes the behaviour explicit.
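One concrete difference is worth seeing: in basic regex, + is a literal character, so the same pattern behaves differently with and without -E. A quick sketch using printf as throwaway input:

```shell
# In basic regex (plain grep), '+' matches a literal plus sign;
# with -E it means "one or more of the preceding item".
printf 'ab\nabbb\nab+c\n' | grep 'ab+'       # only the literal match: ab+c
printf 'ab\nabbb\nab+c\n' | grep -E 'ab+'    # all three lines match
```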
Processing text: sort, uniq, cut, wc
These four tools form a text-processing toolkit that you will combine constantly with grep and pipes.
sort — order lines
# Sort alphabetically
sort names.txt
# Sort in reverse order
sort -r names.txt
# Sort numerically (not lexicographically)
sort -n numbers.txt
# Sort by second column (tab-separated)
sort -k2 data.tsv
# Reverse numeric sort (largest first)
sort -rn scores.txt
uniq — deduplicate consecutive lines
# Remove consecutive duplicate lines
sort names.txt | uniq
# Count occurrences
sort names.txt | uniq -c
3 alice
1 bob
5 charlie
# Show only lines that appear more than once
sort names.txt | uniq -d
Important: uniq only removes consecutive duplicates. You almost always need to sort before uniq to get meaningful deduplication.
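To see the pitfall concretely, here is a quick sketch using a scratch file (the path is arbitrary):

```shell
# Build a sample with NON-consecutive duplicates
printf 'alice\nbob\nalice\n' > /tmp/names-demo.txt

# Without sorting, the two 'alice' lines never touch, so uniq keeps both
uniq /tmp/names-demo.txt          # prints: alice, bob, alice
# Sorting first makes duplicates consecutive, so uniq collapses them
sort /tmp/names-demo.txt | uniq   # prints: alice, bob
```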
cut — extract columns
# Extract first column from comma-separated file
cut -d, -f1 data.csv
# Extract columns 1 and 3
cut -d, -f1,3 data.csv
# Extract characters 1-10 from each line
cut -c1-10 file.txt
# Extract all columns from 5 onwards
cut -d: -f5- /etc/passwd
wc — count things
# Lines, words, characters
wc file.txt
156 1203 8432 file.txt
wc -l file.txt
# lines only
wc -w file.txt
# words only
wc -c file.txt
# bytes
Pipes and redirection
Pipes and redirection are what turn individual tools into a powerful automation system. They follow the Unix philosophy: each program does one thing well, and you compose them freely.
The pipe operator
# How many ERROR lines in the log?
grep 'ERROR' server.log | wc -l
47
# Top 5 most frequent errors
grep 'ERROR' server.log | sort | uniq -c | sort -rn | head -5
# List Python files, most recently modified first
find . -name '*.py' | xargs ls -lt | head
# Check if a process is running
# (grep may match its own process; 'grep [p]ython3' avoids that)
ps aux | grep python3
Redirection
# Write stdout to file (overwrite)
ls -l > listing.txt
# Append stdout to file
echo 'new entry' >> log.txt
# Read stdin from file
wc -l < big-file.txt
# Redirect stderr (errors) to file
python3 script.py 2> errors.txt
# Redirect both stdout and stderr
python3 script.py > output.txt 2>&1
# Discard output entirely
noisy-command > /dev/null
# tee: write to file AND display on screen
make build | tee build.log
Every process has three standard streams: stdin (0) — input, stdout (1) — normal output, stderr (2) — error output. Redirection controls where each goes. This is why error output uses 2> — it is redirecting file descriptor 2.
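Because redirections are applied left to right, the order of > file and 2>&1 matters. A minimal sketch, using a throwaway demo function (not part of the chapter):

```shell
# demo writes one line to stdout and one to stderr
demo() { echo "normal output"; echo "error output" >&2; }

# Correct order: stdout goes to the file first, then stderr is
# duplicated onto stdout, so both lines land in both.log
demo > /tmp/both.log 2>&1

# Wrong order: stderr is duplicated onto stdout's OLD target (the
# terminal) before stdout is redirected, so the error stays on screen
demo 2>&1 > /tmp/stdout-only.log
```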
Stream editing with sed
sed (stream editor) lets you perform transformations on text without opening a file in an editor. Its most common use is find and replace.
# Replace first occurrence per line
sed 's/old/new/' file.txt
# Replace ALL occurrences per line (g = global)
sed 's/old/new/g' file.txt
# Case-insensitive replacement
sed 's/error/ERROR/gi' file.txt
# Edit in place (modifies the actual file)
sed -i 's/localhost/db.prod.example.com/g' config.yml
# Delete lines matching a pattern
sed '/^#/d' config.ini
# delete comment lines
# Print only lines 10-20
sed -n '10,20p' file.txt
On macOS, sed -i requires a backup extension: sed -i '' 's/old/new/g' file.txt. On Linux you can omit it. Keeping a backup is good practice: sed -i.bak 's/old/new/g' file.txt creates a .bak file automatically.
Shell scripts: structure and arguments
A shell script is a file of commands. It becomes powerful when it accepts arguments — values passed in when you run the script.
#!/bin/bash
# deploy.sh — deploy to a named environment
# Usage: ./deploy.sh staging

# $0 = script name, $1 = first argument, $2 = second...
ENVIRONMENT=$1

# $# = number of arguments
if [ $# -eq 0 ]; then
    echo "Usage: $0 <environment>"
    exit 1
fi

echo "Deploying to: $ENVIRONMENT"
# $@ = all arguments as separate words
# $* = all arguments as one string
# Double quotes: variable expansion happens
NAME=Alice
echo "Hello, $NAME"
Hello, Alice
# Single quotes: literal, no expansion
echo 'Hello, $NAME'
Hello, $NAME
# Always quote variables that might contain spaces
if [ -f "$FILENAME" ]; then
    echo "File exists"
fi
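The difference between $@ and $* only shows up once you quote them and an argument contains spaces. A small sketch (argcount is a throwaway helper, not part of the script above):

```shell
#!/bin/bash
# argcount simply reports how many arguments it received
argcount() { echo "$#"; }

set -- "one two" three   # simulate two script arguments; the first has a space

argcount "$@"   # "$@" keeps each argument as a separate word -> prints 2
argcount "$*"   # "$*" joins all arguments into ONE string    -> prints 1
```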
Conditionals: if and else
#!/bin/bash
# Test if a file exists
if [ -f config.yml ]; then
    echo "Config found"
else
    echo "Config missing — aborting"
    exit 1
fi
# Test if a directory exists
if [ -d logs ]; then
    echo "Logs dir exists"
fi
# String comparison
if [ "$ENV" = "production" ]; then
    echo "Running in production mode"
fi
# Numeric comparison
if [ "$COUNT" -gt 10 ]; then
    echo "Count exceeds threshold"
fi
Loops in depth
for loops
# Loop over a list of words
for env in staging production; do
    echo "Checking $env..."
done
# Loop over files matching a pattern
for f in *.py; do
    echo "Processing $f"
    python3 -m py_compile "$f"
done
# C-style numeric loop
for ((i=1; i<=5; i++)); do
    echo "Step $i"
done
# Loop over lines of a file
while IFS= read -r line; do
    echo "Line: $line"
done < file.txt
while loops
# Retry a command up to 3 times
ATTEMPTS=0
while [ "$ATTEMPTS" -lt 3 ]; do
    python3 app.py && break
    ATTEMPTS=$((ATTEMPTS + 1))
    echo "Attempt $ATTEMPTS failed, retrying..."
    sleep 5
done
Exit codes and error handling
Every command in Unix returns an exit code — a number indicating success or failure. Exit code 0 means success; anything else means failure.
# $? holds the exit code of the last command
ls /nonexistent
ls: cannot access '/nonexistent': No such file or directory
echo $?
2
ls ~
echo $?
0
# set -e: stop script immediately on any failure
#!/bin/bash
set -e
# set -u: treat undefined variables as errors
set -u
# set -x: print each command before executing (debug mode)
set -x
# Recommended for production scripts: -e, -u, plus pipefail,
# which makes a pipeline fail if ANY command in it fails
set -euo pipefail
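pipefail is worth a quick illustration: without it, a pipeline's exit code is that of its last command, so a failure earlier in the pipe is silently swallowed.

```shell
#!/bin/bash
# Without pipefail: 'false' fails, but 'wc -l' succeeds, so $? is 0
false | wc -l > /dev/null
echo "without pipefail: $?"   # prints 0

# With pipefail the pipeline reports failure if ANY command fails;
# '|| rc=$?' captures the status without aborting under set -e
set -o pipefail
false | wc -l > /dev/null || rc=$?
echo "with pipefail: ${rc:-0}"   # prints 1
```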
# Only proceed if a command succeeds
pytest && echo "Tests passed"
# Run fallback if a command fails
pytest || echo "Tests failed"
# Explicit exit code in a function
validate_config() {
    if [ ! -f config.yml ]; then
        echo "ERROR: config.yml not found" >&2
        return 1
    fi
    return 0
}
Redirecting error messages to stderr with >&2 is good practice. It means the message appears even if the caller redirects stdout. CI systems display stderr prominently, making it easier to find errors in logs.
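A quick way to convince yourself: redirect stdout to /dev/null and the stderr message still comes through. (warn is a hypothetical helper, not part of the chapter.)

```shell
# warn writes its message to stderr (file descriptor 2)
warn() { echo "WARN: $1" >&2; }

# stdout is discarded, yet the warning still reaches the terminal,
# because it travels on stderr
warn "disk almost full" > /dev/null
```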
A worked automation example
Let us build a realistic script that a small team might actually use. It checks the project for issues before a developer commits.
#!/bin/bash
set -euo pipefail

# Pre-commit checks for a Python project
# Usage: ./precheck.sh [--fix]

FIX=false
if [ "${1:-}" = "--fix" ]; then
    FIX=true
fi

check_dependencies() {
    echo "[1/4] Checking dependencies..."
    for tool in python3 git pip; do
        if ! command -v "$tool" &>/dev/null; then
            echo "ERROR: $tool is not installed" >&2
            exit 1
        fi
    done
}

install_packages() {
    echo "[2/4] Installing packages..."
    pip install -r requirements.txt -q
}

lint() {
    echo "[3/4] Linting..."
    if [ "$FIX" = "true" ]; then
        black .
    else
        black --check .
    fi
}

run_tests() {
    echo "[4/4] Running tests..."
    pytest -v
}

check_dependencies
install_packages
lint
run_tests
echo "All checks passed."
Running this script produces clear output at each stage. If any step fails, set -e stops the script immediately with a non-zero exit code. The --fix argument switches black from check mode to automatic formatting.
This script can then be called from your CI pipeline (Module 6) and optionally as a Git pre-commit hook (Module 7). The same checks run locally and in CI.
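As a preview of the Module 7 hook idea (the filenames here are assumptions), installing the script as a pre-commit hook takes only a few lines. Git runs .git/hooks/pre-commit before every commit and aborts the commit if it exits non-zero:

```shell
#!/bin/sh
# Run from the repository root. In a real repo .git/hooks already
# exists; mkdir -p just makes this sketch runnable anywhere.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
exec ./precheck.sh
EOF
chmod +x .git/hooks/pre-commit
```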
Exercises
Part A: Text processing pipeline
Download the sample log file from Moodle (server.log). Using only the command line:
- Count the total number of lines in the file.
- Count how many lines contain the word ERROR (case-insensitive).
- Extract just the ERROR lines and save them to a file called errors.txt.
- Find the five most frequently occurring error messages.
- Find any lines containing an IP address pattern (four numbers separated by dots).
- Count how many unique IP addresses appear in the log.
Part B: Shell script — environment checker
Write a script envcheck.sh that:
- Accepts a required argument: the name of the environment (e.g. staging, production).
- Prints an error and exits with code 1 if no argument is given.
- Checks that python3, git, and docker are installed. For each, print ✓ tool found or ✗ tool missing.
- Prints a summary: N/3 tools found.
- Returns exit code 0 if all three are found, 1 otherwise.
Part C: Combining tools
- Use find . -name "*.py" | wc -l to count Python files in a directory.
- Pipe that through sort and uniq to find duplicate filenames: find . -name "*.py" | xargs -n1 basename | sort | uniq -d
- Write a loop that runs wc -l on every Python file and prints: filename: N lines
- Extension: modify the loop to skip files with zero lines.
Part D: sed
- Create a file with the text server=localhost on one line.
- Use sed to replace localhost with the name of a fake production server.
- Use sed to delete all lines starting with # from a Python file (comment stripping).