File Analysis

Learn to analyze files, extract metadata, and uncover hidden information

File Signatures

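File signatures (also known as magic numbers or magic bytes) are characteristic byte sequences, usually at the very beginning of a file, that identify its format. Because a signature reflects the file's actual contents rather than its name, it lets you identify files whose extensions are missing or deliberately wrong, which is routine in forensic analysis. Signatures are not always unique to one format: every ZIP-based container, DOCX included, starts with the same bytes.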

Common File Signatures
# Image Formats
PNG:  89 50 4E 47 0D 0A 1A 0A
JPEG: FF D8 FF E0 (JFIF) or FF D8 FF E1 (EXIF)
GIF:  47 49 46 38 (GIF87a or GIF89a)
BMP:  42 4D

# Archive Formats
ZIP:  50 4B 03 04
RAR:  52 61 72 21 1A 07 00 (RAR 1.5 to 4.x; RAR5 uses 52 61 72 21 1A 07 01 00)
GZIP: 1F 8B 08
7Z:   37 7A BC AF 27 1C

# Document Formats
PDF:  25 50 44 46 2D 31 2E (%PDF-1.)
DOCX: 50 4B 03 04 (ZIP-based)
RTF:  7B 5C 72 74 66

# Executable Formats
EXE:  4D 5A (MZ)
ELF:  7F 45 4C 46
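DMG:  78 01 73 0D 62 62 60 (zlib-compressed images only; not reliable, since DMGs are identified by a "koly" trailer at the end of the file)

Viewing File Signatures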
# Using hexdump (first 32 bytes)
hexdump -C -n 32 suspicious_file

# Using xxd (hex plus ASCII column, first 32 bytes)
xxd -l 32 suspicious_file

# Using od (despite the name, -t x1z prints hex bytes plus printable characters)
od -A x -t x1z -N 32 suspicious_file
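
The lookup against the signature list above can be automated. The following is a minimal sketch, assuming a POSIX shell with od and tr available. The helper names head_hex and guess_format are hypothetical, and the table covers only a handful of the formats listed.

```shell
# Print the first N bytes of a file as one lowercase hex string.
# Usage: head_hex FILE N
head_hex() {
  od -An -v -tx1 -N "$2" "$1" | tr -d ' \n'
}

# Map the leading bytes of a file to a format name.
# Hypothetical helper; only a few signatures from the list are covered.
guess_format() {
  case "$(head_hex "$1" 8)" in
    89504e470d0a1a0a) echo "PNG" ;;
    ffd8ff*)          echo "JPEG" ;;
    47494638*)        echo "GIF" ;;
    504b0304*)        echo "ZIP container (ZIP, DOCX, ...)" ;;
    1f8b*)            echo "GZIP" ;;
    25504446*)        echo "PDF" ;;
    7f454c46*)        echo "ELF" ;;
    4d5a*)            echo "EXE (MZ)" ;;
    *)                echo "unknown" ;;
  esac
}
```

Calling guess_format suspicious_file prints one label. Short signatures such as 4D 5A or 42 4D match many unrelated files by chance, so treat a hit as a hint and confirm it with the file command.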

Essential Analysis Tools

file - Identify File Types

The file command uses magic numbers and heuristics to determine file types.

# Basic file type identification
file suspicious_file

# Show MIME type and encoding (-i on Linux; macOS uses -I)
file -i suspicious_file

# Check all files in directory
file *

# Follow symbolic links
file -L symlink_file
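
Since extensions can be wrong, a common first pass is to compare each file's extension with the type that file reports. This is a rough sketch: expected_mime, check_extension and scan_extensions are hypothetical helpers, the extension table is illustrative rather than complete, and the --mime-type and -b options are assumed to be supported by the installed file command.

```shell
# Expected MIME type for a few extensions (illustrative, not exhaustive).
expected_mime() {
  case "$1" in
    jpg|jpeg) echo "image/jpeg" ;;
    png)      echo "image/png" ;;
    gif)      echo "image/gif" ;;
    pdf)      echo "application/pdf" ;;
    zip)      echo "application/zip" ;;
    *)        echo "" ;;
  esac
}

# Compare a file name's extension with a detected MIME type.
# Prints OK, MISMATCH, or UNKNOWN (extension not in the table).
check_extension() {
  name=$1
  detected=$2
  ext=$(printf '%s' "${name##*.}" | tr 'A-Z' 'a-z')
  want=$(expected_mime "$ext")
  if [ -z "$want" ]; then
    echo "UNKNOWN"
  elif [ "$want" = "$detected" ]; then
    echo "OK"
  else
    echo "MISMATCH"
  fi
}

# Ask file(1) for the detected type of each argument and report the verdict.
scan_extensions() {
  for f in "$@"; do
    printf '%s\t%s\n' "$(check_extension "$f" "$(file --mime-type -b "$f")")" "$f"
  done
}
```

Running scan_extensions * in an evidence directory prints one verdict per file. A MISMATCH line, such as a .jpg that is really a ZIP archive, is worth a closer look.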

strings - Extract ASCII/Unicode Strings

Extract readable strings from binary files to find hidden messages, URLs, file paths, or credentials.

# Extract printable strings (minimum 4 characters)
strings suspicious_file

# Set minimum string length
strings -n 8 suspicious_file

# Include UTF-16 strings (common in Windows binaries)
strings -e l suspicious_file  # 16-bit little-endian
strings -e b suspicious_file  # 16-bit big-endian

# Search for specific patterns
strings suspicious_file | grep -i "password"
strings suspicious_file | grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
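
The IP pattern above also matches impossible addresses such as 999.1.1.1, and raw strings output usually contains duplicates. A small sketch of filters that tidy this up follows. extract_urls and extract_ipv4 are hypothetical helper names, and both read text on standard input so they can sit at the end of a strings pipeline.

```shell
# Read text on stdin and print each unique http(s) URL once.
extract_urls() {
  grep -oE 'https?://[^[:space:]"<>]+' | sort -u
}

# Read text on stdin and print each unique dotted quad whose
# four octets are all in the range 0-255.
extract_ipv4() {
  grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' |
    awk -F. '$1 <= 255 && $2 <= 255 && $3 <= 255 && $4 <= 255' |
    sort -u
}
```

A typical use would be strings -n 6 suspicious_file | extract_urls. Version numbers such as 1.2.3.4 still pass the IPv4 filter, so review the results by hand.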

exiftool - Metadata Analysis

ExifTool reads and writes metadata in a wide variety of file formats.

# View all metadata
exiftool image.jpg

# Extract specific fields (quote the wildcard so the shell does not expand it)
exiftool "-GPS*" image.jpg
exiftool -Creator -CreateDate document.pdf

# Remove all metadata (rewrites the file; the original is kept as image.jpg_original)
exiftool -all= image.jpg

# Export metadata for every file under a directory, recursively, as CSV
exiftool -r -csv directory/ > metadata.csv

binwalk - Analyze Binary Files

Binwalk searches binary files for embedded files and executable code.

# Scan for file signatures
binwalk firmware.bin

# Extract embedded files
binwalk -e firmware.bin

# Extract with custom directory
binwalk -e --directory=output firmware.bin

# Show entropy analysis (detect encryption/compression)
binwalk -E firmware.bin
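
The measure behind entropy analysis is Shannon entropy in bits per byte: 0 for a file made of one repeated byte, and close to 8 for compressed or encrypted data. As a rough illustration, the sketch below computes a single figure for a whole file using only od and awk. file_entropy is a hypothetical helper, and unlike binwalk it does not break the file into blocks or plot anything.

```shell
# Print the Shannon entropy of a file in bits per byte (0.00 to 8.00).
# Usage: file_entropy FILE
file_entropy() {
  od -An -v -tx1 "$1" | tr -s ' ' '\n' |
    awk 'NF { count[$1]++; total++ }   # one hex byte per non-empty line
         END {
           if (total == 0) { print "0.00"; exit }
           for (b in count) {
             p = count[b] / total
             h -= p * log(p) / log(2)
           }
           printf "%.2f\n", h
         }'
}
```

Plain text typically scores well below 8, while a result near 8 suggests compression or encryption. A whole-file figure can hide a small high-entropy region inside a larger low-entropy file, which is why per-block analysis is more useful on firmware images.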

File Carving

File carving is the process of recovering files from a disk image or unallocated space without relying on file system metadata. This technique is essential when dealing with deleted files or corrupted file systems.
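
In practice, a carver scans the raw bytes for known headers (and footers, where a format has them) and copies out the data in between. The sketch below shows the idea with od, awk, tail and head. find_sig_offsets and carve_bytes are hypothetical helpers, and the approach is far too slow for real disk images, where a dedicated carving tool should be used.

```shell
# Print the 0-based byte offset of every occurrence of a hex signature.
# Usage: find_sig_offsets IMAGE HEXSIG   (HEXSIG in lowercase, no spaces)
find_sig_offsets() {
  od -An -v -tx1 "$1" | tr -s ' ' '\n' |
    awk -v sig="$2" '
      NF {
        n++                                   # bytes consumed so far
        win = win $1                          # sliding window of hex digits
        if (length(win) > length(sig))
          win = substr(win, length(win) - length(sig) + 1)
        if (win == sig)
          printf "%d\n", n - length(sig) / 2  # offset where the match starts
      }'
}

# Copy LENGTH bytes starting at OFFSET from IMAGE into OUTFILE.
# Usage: carve_bytes IMAGE OFFSET LENGTH OUTFILE
carve_bytes() {
  tail -c +"$(( $2 + 1 ))" "$1" | head -c "$3" > "$4"
}
```

For example, find_sig_offsets disk_image.dd 89504e470d0a1a0a lists candidate PNG start offsets, and carve_bytes can then copy a chosen range to a new file. Picking the length is the hard part: it requires finding the matching footer or parsing the format.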

foremost - File Carving Tool
# Carve all supported file types
foremost -i disk_image.dd -o carved_files/

# Carve specific file types only
foremost -t jpg,png,pdf -i disk_image.dd -o output/

# Use custom configuration
foremost -c custom.conf -i disk_image.dd -o output/

# Verbose output
foremost -v -i disk_image.dd -o output/
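
scalpel - Advanced File Carving
# Edit the configuration file first: every file type is commented out by default
# /etc/scalpel/scalpel.conf

# Run scalpel (the output directory must be empty or not yet exist)
scalpel disk_image.dd -o carved_output/

# Use a custom configuration that enables only the types you want
scalpel -c custom_scalpel.conf disk_image.dd -o output/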

PhotoRec - Recover Deleted Files
# Interactive mode
photorec disk_image.dd

# Command-line mode
photorec /d recovered_files/ /cmd disk_image.dd search

Hash Analysis

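Cryptographic hash functions reduce a file to a fixed-length fingerprint that changes if any byte of the file changes. Matching fingerprints are used to verify that evidence has not been altered and to identify known malware without opening the file.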

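# MD5 (legacy: fast, but collisions are practical; use only to match existing hash sets)
md5sum file.exe

# SHA-1 (also broken for collision resistance; legacy use only)
sha1sum file.exe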

# SHA-256 (recommended)
sha256sum file.exe

# Multiple files
sha256sum *.exe > hashes.txt

# Verify against hash list
sha256sum -c hashes.txt

# Compare two files (identical digests mean identical contents)
sha256sum file1.txt file2.txt
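
To identify known malware locally, a file's digest can be compared against a list of known-bad hashes. The sketch below assumes GNU sha256sum and a plain-text list with one hex digest per line. sha256_of and is_known_hash are hypothetical helpers, and the list file name is only a placeholder.

```shell
# Print only the SHA-256 digest of a file (no file name).
# Usage: sha256_of FILE
sha256_of() {
  sha256sum "$1" | cut -d ' ' -f 1
}

# Succeed (exit status 0) if the digest of FILE appears in HASHLIST.
# The match is case-insensitive and must cover the whole line.
# Usage: is_known_hash FILE HASHLIST
is_known_hash() {
  grep -qixF "$(sha256_of "$1")" "$2"
}
```

For example: is_known_hash file.exe known_bad.txt && echo "known sample". A miss only means the file is not in that particular list.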
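
VirusTotal API

Look up a file's hash in VirusTotal's database to see whether it is already known malware. A hash lookup does not disclose the file; uploading does, so upload only when sharing the sample is acceptable.

# Look up an existing report by hash (nothing is uploaded)
curl 'https://www.virustotal.com/api/v3/files/FILE_SHA256' \
  -H 'x-apikey: YOUR_API_KEY'

# Upload the file for scanning (the sample is shared with VirusTotal)
curl -X POST 'https://www.virustotal.com/api/v3/files' \
  -H 'x-apikey: YOUR_API_KEY' \
  -F 'file=@suspicious_file.exe'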

Practice Challenges

Beginner Level
  • Hidden in Plain Sight
    Find data hidden behind misleading file extensions
  • Magic Bytes
    Identify files by their signatures
  • String Theory
    Extract meaningful strings from binaries

Intermediate Level
  • Deleted But Not Forgotten
    Recover deleted files from disk images
  • Metadata Mystery
    Analyze EXIF data for hidden clues
  • Carved in Stone
    Carve multiple file types from raw data