FAQ - How the File Type Detector Works

How It Works

What are "magic bytes"?

Magic bytes (also called file signatures) are specific byte sequences at the beginning of a file that identify its format. For example:

PNG files always start with 89 50 4E 47
JPEG files start with FF D8 FF
PDF files start with 25 50 44 46 (which is "%PDF" in text)
ZIP files start with 50 4B 03 04 (the first two bytes are "PK" in text)

See Wikipedia's "List of file signatures" for a longer reference.

Our detector reads these magic bytes to identify the file type, regardless of the file extension.
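As a minimal illustration, here is prefix matching in Rust against a hand-written table containing only the four signatures listed above. The table and the `detect_by_magic` name are assumptions for this sketch. The real mimetype-detector crate has its own, much larger rule set.

```rust
/// Hypothetical signature table: (magic bytes, MIME type).
/// Only the four examples from the list above are included.
const SIGNATURES: &[(&[u8], &str)] = &[
    (b"\x89PNG", "image/png"),
    (b"\xFF\xD8\xFF", "image/jpeg"),
    (b"%PDF", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
];

/// Returns the MIME type whose signature is a prefix of `header`, if any.
/// The file name and extension are never consulted.
fn detect_by_magic(header: &[u8]) -> Option<&'static str> {
    SIGNATURES
        .iter()
        .find(|(magic, _)| header.starts_with(magic))
        .map(|(_, mime)| *mime)
}
```

Because only the leading bytes are compared, a PNG renamed to `photo.txt` still matches `image/png`.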

How does the detection process work?

The detection happens in four steps:

File Selection: You select or drag a file into the browser
Header Reading: JavaScript reads only the first 64KB of the file (the header)
WASM Processing: The header bytes are passed to our Rust WebAssembly module
Magic Byte Analysis: The detector analyzes the byte signature and identifies the format

All processing happens locally in your browser - no data is transmitted anywhere.
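The four steps can be sketched in Rust as an analogy. In the real tool, step 2 runs in JavaScript inside the browser. Here, any `std::io::Read` source stands in for the selected file. The `detect` function is a hypothetical two-signature stand-in for the WebAssembly module, and all function names are illustrative only.

```rust
use std::io::{self, Read};

/// 64 KB = 65,536 bytes, the header size described above.
const HEADER_LIMIT: u64 = 64 * 1024;

/// Step 2 analogue: read at most HEADER_LIMIT bytes from the start of `source`.
fn read_header<R: Read>(source: R) -> io::Result<Vec<u8>> {
    let mut header = Vec::new();
    source.take(HEADER_LIMIT).read_to_end(&mut header)?;
    Ok(header)
}

/// Steps 3-4 analogue: a hypothetical detector with two signatures only.
fn detect(header: &[u8]) -> &'static str {
    if header.starts_with(b"%PDF") {
        "application/pdf"
    } else if header.starts_with(b"PK\x03\x04") {
        "application/zip"
    } else {
        "application/octet-stream"
    }
}

/// Whole pipeline: read the header, then analyze it locally (no network I/O).
fn detect_file<R: Read>(source: R) -> io::Result<&'static str> {
    let header = read_header(source)?;
    Ok(detect(&header))
}
```

On a desktop, `detect_file(std::fs::File::open(path)?)` would follow the same path. Because `Read::take` stops at the limit, a multi-gigabyte source yields the same 65,536-byte header as a file of exactly 64KB. This is why memory use and detection time stay flat as file size grows.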

Why only read the first 64KB?

File type signatures are almost always located at the beginning of a file. Reading only the first 64KB (65,536 bytes) provides:

Fast detection - multi-GB files are analyzed about as quickly as small ones
Low memory usage - only 64KB is loaded regardless of file size
Accurate results - sufficient data for complex formats

This is enough to detect even complex container formats like DOCX (which is actually a ZIP file containing XML).

What is WebAssembly (WASM)?

WebAssembly is a low-level binary instruction format that web browsers can execute at near-native speed. Languages such as Rust can be compiled to it. We use it to:

Run Rust code directly in your browser
Process files faster than pure JavaScript
Maintain strong privacy guarantees (everything runs locally)
Support 450+ file formats with a compact 189KB binary

What are container formats and how does the detector handle them?

Many modern file formats are actually containers - they're based on standard archive formats (like ZIP) but contain specific structured content inside. The detector can identify both the specific format and the underlying container.

How container detection works:

The detector reads the magic bytes and identifies the container (e.g., ZIP)
It examines the internal structure and metadata within the first 64KB
Based on specific markers, it determines the actual format (e.g., DOCX)
The result shows both the specific format and its parent container
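Here is a deliberately naive Rust sketch of these four steps. It confirms the ZIP magic bytes, then scans the header for entry names that identify the inner format. The marker strings and function names are assumptions for illustration, not the crate's actual rules. The approach can produce false positives. For example, an ordinary ZIP containing a folder named `password/` would match `word/`.

```rust
/// Returns true if `needle` occurs anywhere in `haystack`.
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    haystack.windows(needle.len()).any(|w| w == needle)
}

const ZIP: &str = "application/zip";

/// Returns (format MIME, parent MIME) for a header, or None if it is not a ZIP.
fn detect_zip_based(header: &[u8]) -> Option<(&'static str, Option<&'static str>)> {
    // Step 1: the container's own magic bytes ("PK\x03\x04").
    if !header.starts_with(b"PK\x03\x04") {
        return None;
    }
    // Steps 2-3: look for inner-format markers within the header bytes.
    // EPUB stores an uncompressed "mimetype" entry first, so the entry name
    // and its content appear back to back as plain text.
    let inner = if contains_bytes(header, b"mimetypeapplication/epub+zip") {
        Some("application/epub+zip")
    } else if contains_bytes(header, b"word/") {
        Some("application/vnd.openxmlformats-officedocument.wordprocessingml.document")
    } else if contains_bytes(header, b"xl/") {
        Some("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
    } else {
        None
    };
    // Step 4: report the specific format with ZIP as its parent,
    // or plain ZIP with no parent if no marker matched.
    Some(match inner {
        Some(format) => (format, Some(ZIP)),
        None => (ZIP, None),
    })
}
```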

Common container-based formats:

ZIP-based:
- DOCX, XLSX, PPTX (Microsoft Office Open XML)
- ODT, ODS, ODP (OpenDocument formats)
- EPUB (eBook format)
- JAR (Java Archive)
- APK (Android Package)
- XPI (Firefox Extension)
Other containers:
- MP4, M4A, MOV (based on ISO Base Media File Format)
- OGG, OGV (Ogg container for audio/video)
- WebM (Matroska-based)

Why this matters:

Accuracy: A renamed ZIP file can be correctly identified as DOCX if it contains the right internal structure
Understanding: You learn what your file is really built on (useful for developers)
Compatibility: Knowing the container helps understand which tools can open the file
Recovery: If a DOCX file won't open, knowing it's a ZIP means you can extract its contents manually

Example detection: When you select a DOCX file, the detector shows:

Format: DOCX (Word Document)
MIME Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Based On: application/zip (.zip)

This "Based On" information tells you the file is fundamentally a ZIP archive with a specific internal structure.
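One way to model this output in Rust uses a hypothetical `Detection` record, which is not the crate's actual type. The parent container is an optional field, and the "Based On" line is printed only when a parent is present.

```rust
use std::fmt;

/// Hypothetical result record mirroring the three lines shown above.
struct Detection {
    name: &'static str,
    mime: &'static str,
    /// Parent container as (MIME type, extension), if the format has one.
    parent: Option<(&'static str, &'static str)>,
}

impl fmt::Display for Detection {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "Format: {}", self.name)?;
        write!(f, "MIME Type: {}", self.mime)?;
        // Only container-based formats get a "Based On" line.
        if let Some((mime, ext)) = self.parent {
            write!(f, "\nBased On: {} ({})", mime, ext)?;
        }
        Ok(())
    }
}
```

A standalone format such as PNG would carry `parent: None` and print only the first two lines.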

General Questions

Is my file uploaded to a server?

No, absolutely not. Your files never leave your browser. This is a completely client-side application - all processing happens locally using WebAssembly. We don't have a backend server to receive files even if we wanted to.

How many file types are supported?

Our detector supports over 450 file formats, including:

Images: JPEG, PNG, GIF, WebP, TIFF, BMP, SVG, ICO, HEIC
Videos: MP4, AVI, MKV, WebM, MOV, FLV, WMV
Audio: MP3, WAV, FLAC, OGG, AAC, M4A, WMA
Documents: PDF, DOCX, XLSX, PPTX, ODT, EPUB
Archives: ZIP, RAR, 7Z, TAR, GZIP, BZIP2
Executables: EXE, DLL, ELF, Mach-O
And many more...

What if the file type is not detected?

If a file type cannot be identified, the usual reasons are:

The file has no magic byte signature
It's a custom or proprietary format
The file is corrupted
It's a plain text file, which has no signature (these are reported as text/plain)

The detector will show application/octet-stream for unknown binary files.
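Here is a sketch of the text-versus-binary fallback in Rust. The rule shown treats valid UTF-8 containing no NUL bytes as text. This is a common heuristic and an assumption, not necessarily what the detector does. The `fallback_mime` name is hypothetical. `Utf8Error::error_len` is used so that a multi-byte character cut off at the 64KB boundary still counts as text.

```rust
/// Hypothetical fallback used when no magic-byte signature matched.
fn fallback_mime(header: &[u8]) -> &'static str {
    // NUL bytes are rare in text but common in binary data.
    if header.contains(&0) {
        return "application/octet-stream";
    }
    let valid_utf8 = match std::str::from_utf8(header) {
        Ok(_) => true,
        // error_len() == None means the input merely ended mid-character,
        // which happens when the header cut splits a multi-byte sequence.
        Err(e) => e.error_len().is_none(),
    };
    if valid_utf8 {
        "text/plain"
    } else {
        "application/octet-stream"
    }
}
```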

What is the ransomware warning?

Our detector warns when an executable file is disguised with a safe-looking extension. Examples:

An .exe file renamed to .jpg
A .dll file renamed to .pdf
A .scr file renamed to .mp3

This is a common malware technique. If you see this warning, do not open the file unless you trust its source.
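Here is a minimal Rust sketch of this kind of mismatch check. It recognizes only two executable signatures: "MZ" for Windows PE files such as EXE, DLL and SCR, and 7F 45 4C 46 for ELF. The list of safe-looking extensions is hypothetical. Like the real warning, the sketch compares content against the file name and says nothing about whether the file is malicious.

```rust
/// Magic-byte check for executables: "MZ" (Windows PE) or "\x7FELF" (ELF).
/// Mach-O and other executable formats are omitted for brevity.
fn is_executable(header: &[u8]) -> bool {
    header.starts_with(b"MZ") || header.starts_with(b"\x7FELF")
}

/// Hypothetical list of extensions users tend to treat as harmless.
const SAFE_LOOKING: &[&str] = &["jpg", "jpeg", "png", "gif", "pdf", "mp3", "txt", "docx"];

/// True when the content is executable but the name claims a safe type.
/// This is an extension-vs-content comparison only, not malware detection.
fn is_disguised_executable(file_name: &str, header: &[u8]) -> bool {
    let ext = match file_name.rsplit_once('.') {
        Some((_, ext)) => ext.to_ascii_lowercase(),
        None => return false,
    };
    is_executable(header) && SAFE_LOOKING.contains(&ext.as_str())
}
```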

⚠️ Important: This is a basic check only. We only detect extension mismatches, not malware itself.

This tool is NOT antivirus software. It cannot detect:

Malicious code within legitimate file types (macro viruses, PDF exploits)
Files with correct extensions that contain malware
Sophisticated malware or zero-day exploits

For complete protection, always use proper antivirus software and avoid opening files from untrusted sources.

Is there a file size limit?

No! Since we only read the first 64KB of each file, you can analyze files of any size - even terabyte-sized files will work. Detection time is essentially the same regardless of file size.

Can I use this offline?

After the initial page load (which downloads the WebAssembly module), the detector can work offline. However, the "Learn more" links (Wikipedia, Wikidata, Google) require an internet connection.

What browsers are supported?

File Type Detector works in all modern browsers that support WebAssembly:

Chrome 57+
Edge 16+
Firefox 52+
Safari 11+
Opera 44+

This covers every major browser released since late 2017.

Is this open source?

The detector uses the open-source mimetype-detector Rust crate for file type detection. The crate is publicly available and auditable on GitHub.

Why does it show a "parent format"?

Many modern file formats are actually containers based on other formats. For example:

DOCX is a ZIP file containing XML documents
XLSX is a ZIP file with spreadsheet data
EPUB is a ZIP file with HTML content
JAR files are ZIP archives

When detected, we show both the specific format (DOCX) and its parent container (ZIP).
