File Signatures
File signatures (also known as magic numbers or magic bytes) are characteristic byte sequences, usually at the very beginning of a file, that identify its format. They are not always unique: every ZIP-based format, for example, shares the ZIP signature. Understanding file signatures is crucial for forensic analysis, because files may have incorrect or missing extensions.
Common File Signatures
# Image Formats
PNG: 89 50 4E 47 0D 0A 1A 0A
JPEG: FF D8 FF E0 (JFIF) or FF D8 FF E1 (EXIF)
GIF: 47 49 46 38 (GIF87a or GIF89a)
BMP: 42 4D
# Archive Formats
ZIP: 50 4B 03 04
RAR: 52 61 72 21 1A 07 00 (RAR 1.5 to 4.x) or 52 61 72 21 1A 07 01 00 (RAR 5+)
GZIP: 1F 8B 08
7Z: 37 7A BC AF 27 1C
# Document Formats
PDF: 25 50 44 46 2D (%PDF-, followed by the version, e.g. 1.7 or 2.0)
DOCX: 50 4B 03 04 (ZIP-based)
RTF: 7B 5C 72 74 66
# Executable Formats
EXE: 4D 5A (MZ)
ELF: 7F 45 4C 46
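DMG: 78 01 73 0D 62 62 60 (seen in many compressed images, but not reliable; DMG files are identified by the "koly" trailer in the last 512 bytes)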
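The lookup that these signatures enable can be sketched in a few lines of POSIX shell. This is a minimal illustration, not how the file command works internally: the function names first_bytes_hex and identify_signature are hypothetical, only a handful of the signatures listed above are covered, and short signatures such as 42 4D or 4D 5A will produce false positives on arbitrary data.

```shell
#!/bin/sh
# Hypothetical helpers (not part of any standard tool).

# Print the first 8 bytes of a file as one lowercase hex string.
first_bytes_hex() {
    od -A n -v -t x1 -N 8 "$1" | tr -d ' \n'
}

# Compare the hex prefix with a few signatures from the list above.
identify_signature() {
    hex=$(first_bytes_hex "$1")
    case "$hex" in
        89504e470d0a1a0a*) echo "PNG" ;;
        ffd8ff*)           echo "JPEG" ;;
        47494638*)         echo "GIF" ;;
        504b0304*)         echo "ZIP container" ;;
        1f8b*)             echo "GZIP" ;;
        255044462d*)       echo "PDF" ;;
        7f454c46*)         echo "ELF" ;;
        424d*)             echo "BMP" ;;
        4d5a*)             echo "MZ executable" ;;
        *)                 echo "unknown" ;;
    esac
}
```

Calling identify_signature suspicious_file prints one label. Note that a DOCX file is reported as "ZIP container", because the signature alone cannot tell ZIP-based formats apart.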
Viewing File Signatures
# Using hexdump (first 32 bytes)
hexdump -C -n 32 suspicious_file
# Using xxd (more readable)
xxd suspicious_file | head
# Using od (hex bytes plus printable characters, first 32 bytes)
od -A x -t x1z -N 32 suspicious_file
Essential Analysis Tools
file - Identify File Types
The file command uses magic numbers and heuristics to determine file types.
# Basic file type identification
file suspicious_file
# Show MIME type and encoding (on macOS, use -I instead of -i)
file -i suspicious_file
# Check all files in directory
file *
# Follow symbolic links
file -L symlink_file
strings - Extract ASCII/Unicode Strings
Extract readable strings from binary files to find hidden messages, URLs, file paths, or credentials.
# Extract printable strings (minimum 4 characters)
strings suspicious_file
# Set minimum string length
strings -n 8 suspicious_file
# Extract UTF-16 strings instead of ASCII (common in Windows binaries)
strings -e l suspicious_file # 16-bit little-endian
strings -e b suspicious_file # 16-bit big-endian
# Search for specific patterns
strings suspicious_file | grep -i "password"
strings suspicious_file | grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
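To make clear what strings is doing, the sketch below approximates strings -n for plain ASCII using only tr and grep. It is an assumption-laden simplification: the function name ascii_strings is hypothetical, only bytes 0x20 to 0x7E count as printable (the real tool also accepts tabs), and 16-bit encodings are not handled.

```shell
#!/bin/sh
# Hypothetical helper: rough ASCII-only approximation of `strings -n MIN FILE`.
# Usage: ascii_strings FILE [MIN]   (MIN defaults to 4)
ascii_strings() {
    min=${2:-4}
    # Turn every non-printable byte into a newline, then keep only
    # the runs that are at least MIN characters long.
    LC_ALL=C tr -c '\040-\176' '\n' < "$1" | LC_ALL=C grep -E ".{$min,}"
}
```

For example, ascii_strings suspicious_file 8 | grep -i "password" mirrors the pattern search shown above.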
exiftool - Metadata Analysis
ExifTool reads and writes metadata in a wide variety of file formats.
# View all metadata
exiftool image.jpg
# Extract specific fields
exiftool "-GPS*" image.jpg # quoted so the shell does not expand the wildcard
exiftool -Creator -CreateDate document.pdf
# Remove all metadata (the original is kept as image.jpg_original)
exiftool -all= image.jpg
# Batch process all images
exiftool -r -csv directory/ > metadata.csv
binwalk - Analyze Binary Files
Binwalk searches binary files for embedded files and executable code.
# Scan for file signatures
binwalk firmware.bin
# Extract embedded files
binwalk -e firmware.bin
# Extract with custom directory
binwalk -e --directory=output firmware.bin
# Show entropy analysis (detect encryption/compression)
binwalk -E firmware.bin
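The entropy figure behind this kind of analysis is the Shannon entropy of the byte values, measured in bits per byte. The sketch below computes it for a whole file with od and awk; it is only an illustration (binwalk works block by block and plots the result), and the function name byte_entropy is hypothetical. Values close to 8 suggest compressed or encrypted data, while low values suggest text or padding.

```shell
#!/bin/sh
# Hypothetical helper: Shannon entropy of FILE in bits per byte (0.000 to 8.000).
byte_entropy() {
    # od prints every byte as an unsigned decimal number; awk counts them.
    od -A n -v -t u1 "$1" | awk '
        { for (i = 1; i <= NF; i++) { count[$i]++; total++ } }
        END {
            if (total == 0) { print "0.000"; exit }
            for (b in count) {
                p = count[b] / total
                h -= p * log(p) / log(2)
            }
            printf "%.3f\n", h
        }'
}
```

A file made of one repeated byte yields 0.000, and a file containing only "abababab" yields 1.000, since two equally likely values carry one bit each.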
File Carving
File carving is the process of recovering files from a disk image or unallocated space without relying on file system metadata. This technique is essential when dealing with deleted files or corrupted file systems.
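The core idea, header and footer matching, can be sketched by hand for a single format. The example below carves the first PNG from a raw image by locating the PNG header and the IEND trailer. It is a simplified sketch rather than how the tools below are implemented: the function name carve_first_png is hypothetical, it assumes GNU grep with -P support, and it ignores fragmentation and false matches.

```shell
#!/bin/sh
# Hypothetical helper: carve the first PNG found in IMAGE into OUT.
# Assumes GNU grep built with PCRE support (-P).
carve_first_png() {
    image=$1
    out=$2
    # Byte offset of the first PNG header (bytes 89 50 4E 47).
    start=$(LC_ALL=C grep -obUaP '\x89PNG' "$image" | head -n 1 | LC_ALL=C cut -d: -f1)
    [ -n "$start" ] || return 1
    # Byte offset of the first IEND chunk type located after that header.
    end=$(LC_ALL=C grep -obUaP 'IEND\xae\x42\x60\x82' "$image" |
          LC_ALL=C cut -d: -f1 | awk -v s="$start" '$1 > s { print; exit }')
    [ -n "$end" ] || return 1
    # "IEND" plus its 4-byte CRC is 8 bytes long.
    length=$((end + 8 - start))
    tail -c "+$((start + 1))" "$image" | head -c "$length" > "$out"
}
```

Running carve_first_png disk_image.dd first.png writes the carved bytes to first.png and returns a non-zero status when no complete header and trailer pair is found.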
foremost - File Carving Tool
# Carve all supported file types
foremost -i disk_image.dd -o carved_files/
# Carve specific file types only
foremost -t jpg,png,pdf -i disk_image.dd -o output/
# Use custom configuration
foremost -c custom.conf -i disk_image.dd -o output/
# Verbose output
foremost -v -i disk_image.dd -o output/
scalpel - Advanced File Carving
# Edit the configuration file first and uncomment the file types to carve
# /etc/scalpel/scalpel.conf
# Run scalpel
scalpel disk_image.dd -o carved_output/
# Carve specific types (edit config first)
scalpel -c custom_scalpel.conf disk_image.dd -o output/
PhotoRec - Recover Deleted Files
# Interactive mode
photorec disk_image.dd
# Command-line mode
photorec /d recovered_files/ /cmd disk_image.dd search
Hash Analysis
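Hash functions reduce a file to a short, fixed-length fingerprint. With a cryptographic hash such as SHA-256, two different files are, for practical purposes, never expected to share a fingerprint, which makes hashes essential for verifying integrity and identifying known malware.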
# MD5 (legacy; fast, but collisions are practical, so use it only to match existing hash sets)
md5sum file.exe
# SHA-1 (also collision-prone; avoid for new work)
sha1sum file.exe
# SHA-256 (recommended)
sha256sum file.exe
# Multiple files
sha256sum *.exe > hashes.txt
# Verify against hash list
sha256sum -c hashes.txt
# Compare two files
md5sum file1.txt file2.txt
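Two routine tasks built on these commands are checking a file against a published digest and checking it against a list of known hashes. The sketch below wraps both in small functions; the names verify_sha256 and is_known_hash are hypothetical, and the list format (one hex digest per line) is an assumption.

```shell
#!/bin/sh
# Hypothetical helpers built on sha256sum.

# Succeed when FILE's SHA-256 digest equals EXPECTED (hex, any letter case).
verify_sha256() {
    actual=$(sha256sum < "$1" | cut -d ' ' -f 1)
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ "$actual" = "$expected" ]
}

# Succeed when FILE's digest appears in HASHLIST (one hex digest per line).
is_known_hash() {
    digest=$(sha256sum < "$1" | cut -d ' ' -f 1)
    grep -qixF "$digest" "$2"
}
```

A typical use is is_known_hash file.exe known_bad.txt && echo "match", where known_bad.txt is a placeholder for whatever hash set is available.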
VirusTotal API
Check file hashes against VirusTotal's database to identify known malware.
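# Look up an existing report by hash (the file itself is not uploaded)
curl 'https://www.virustotal.com/api/v3/files/FILE_SHA256' \
-H 'x-apikey: YOUR_API_KEY'
# Upload a file for scanning (this shares the sample with VirusTotal)
curl -X POST 'https://www.virustotal.com/api/v3/files' \
-H 'x-apikey: YOUR_API_KEY' \
-F 'file=@suspicious_file.exe'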
Practice Challenges
Beginner Level
- Hidden in Plain Sight: Find data hidden in file extensions
- Magic Bytes: Identify files by their signatures
- String Theory: Extract meaningful strings from binaries
Intermediate Level
- Deleted But Not Forgotten: Recover deleted files from disk images
- Metadata Mystery: Analyze EXIF data for hidden clues
- Carved in Stone: Carve multiple file types from raw data