How It Works
Magic bytes (also called file signatures) are specific byte sequences at the beginning of a file that identify its format. For example:
- PNG files always start with 89 50 4E 47
- JPEG files start with FF D8 FF
- PDF files start with 25 50 44 46 (which is "%PDF" in text)
- ZIP files start with 50 4B 03 04 (the first two bytes are "PK" in text)
- See Wikipedia's List of File Signatures for a basic reference
Our detector reads these magic bytes to accurately identify file types, regardless of the file extension.
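As a rough illustration of signature matching, the sketch below checks a header against the four signatures listed above. The `SIGNATURES` table and the `sniff` function are hypothetical names used only for this example; they are not the detector's actual code, which covers far more formats.

```rust
/// Signature table built from the examples above: (magic bytes, MIME type).
/// Hypothetical and deliberately tiny; a real detector has hundreds of entries.
static SIGNATURES: [(&[u8], &str); 4] = [
    (&[0x89, 0x50, 0x4E, 0x47], "image/png"),
    (&[0xFF, 0xD8, 0xFF], "image/jpeg"),
    (&[0x25, 0x50, 0x44, 0x46], "application/pdf"),
    (&[0x50, 0x4B, 0x03, 0x04], "application/zip"),
];

/// Returns the MIME type of the first signature that `header` starts with,
/// or `None` if no signature matches.
fn sniff(header: &[u8]) -> Option<&'static str> {
    for (magic, mime) in SIGNATURES.iter() {
        if header.starts_with(magic) {
            return Some(*mime);
        }
    }
    None
}
```

Note that the file name never enters the function: only the bytes decide the result, which is why a wrong extension does not mislead this kind of check.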
The detection happens in four steps:
- File Selection: You select or drag a file into the browser
- Header Reading: JavaScript reads only the first 64KB of the file (the header)
- WASM Processing: The header bytes are passed to our Rust WebAssembly module
- Magic Byte Analysis: The detector analyzes the byte signature and identifies the format
All processing happens locally in your browser - no data is transmitted anywhere.
File type signatures are almost always located at the beginning of a file. Reading only the first 64KB (65,536 bytes) provides:
- Fast detection - even multi-GB files are analyzed instantly
- Low memory usage - only 64KB is loaded regardless of file size
- Accurate results - sufficient data for complex formats
This is enough to detect even complex container formats like DOCX (which is actually a ZIP file containing XML).
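The idea of a fixed-size header read can be sketched in Rust with the standard library alone. In the browser this step is done by JavaScript, so the snippet below is only an analogy; `HEADER_LIMIT` and `read_header` are hypothetical names chosen for the example.

```rust
use std::io::{self, Read};

/// 64 KB (65,536 bytes), the header size quoted in the text.
const HEADER_LIMIT: u64 = 64 * 1024;

/// Reads at most `HEADER_LIMIT` bytes from `source` and leaves the rest unread.
/// Shorter inputs are returned in full.
fn read_header<R: Read>(source: R) -> io::Result<Vec<u8>> {
    let mut header = Vec::new();
    // `take` caps the reader, so `read_to_end` stops at the limit.
    source.take(HEADER_LIMIT).read_to_end(&mut header)?;
    Ok(header)
}
```

Because the reader is capped before any data is pulled, memory use stays bounded by the limit no matter how large the underlying file is.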
WebAssembly (WASM) is a low-level binary instruction format that runs in web browsers at near-native speed. Languages such as Rust can be compiled to it. We use it to:
- Run Rust code directly in your browser
- Process files faster than pure JavaScript
- Maintain strong privacy guarantees (everything runs locally)
- Support 450+ file formats with a compact 189KB binary
Many modern file formats are actually containers - they're based on standard archive formats (like ZIP) but contain specific structured content inside. The detector can identify both the specific format and the underlying container.
How container detection works:
- The detector reads the magic bytes and identifies the container (e.g., ZIP)
- It examines the internal structure and metadata within the first 64KB
- Based on specific markers, it determines the actual format (e.g., DOCX)
- The result shows both the specific format and its parent container
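The steps above can be sketched as a two-stage check: confirm the ZIP signature, then look for entry names that are characteristic of a specific format. This is a simplified heuristic written for illustration, not the detector's actual logic. The `Detection` struct, `contains`, and `detect_zip_family` are hypothetical names, and the marker list is an assumption based on the well-known layout of these formats (for example, DOCX archives store their content under `word/`).

```rust
/// Result shape mirroring the "specific format plus parent container" output.
#[derive(Debug, PartialEq)]
struct Detection {
    mime: &'static str,
    based_on: Option<&'static str>,
}

/// True if `needle` occurs anywhere in `haystack`. `needle` must be non-empty.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    haystack.windows(needle.len()).any(|w| w == needle)
}

/// Step 1: check for the ZIP signature. Steps 2-3: scan the header for entry
/// names typical of a specific format. Step 4: report format and parent.
fn detect_zip_family(header: &[u8]) -> Option<Detection> {
    if !header.starts_with(b"PK\x03\x04") {
        return None; // not a ZIP container at all
    }
    // Illustrative markers only; a real detector inspects the archive more carefully.
    let markers: [(&[u8], &'static str); 4] = [
        (b"word/", "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
        (b"xl/", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"),
        (b"ppt/", "application/vnd.openxmlformats-officedocument.presentationml.presentation"),
        (b"application/epub+zip", "application/epub+zip"),
    ];
    for (marker, mime) in markers.iter() {
        if contains(header, marker) {
            return Some(Detection { mime: *mime, based_on: Some("application/zip") });
        }
    }
    // No known marker found: report a plain ZIP with no parent container.
    Some(Detection { mime: "application/zip", based_on: None })
}
```

A plain substring search like this can produce false positives (an ordinary ZIP may contain a folder named `word/`), which is one reason to treat it as a sketch rather than a production check.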
Common container-based formats:
- ZIP-based:
  - DOCX, XLSX, PPTX (Microsoft Office Open XML)
  - ODT, ODS, ODP (OpenDocument formats)
  - EPUB (eBook format)
  - JAR (Java Archive)
  - APK (Android Package)
  - XPI (Firefox Extension)
- Other containers:
  - MP4, M4A (ISO Base Media File Format) and MOV (QuickTime, the format ISO BMFF was derived from)
  - OGG, OGV (Ogg container for audio/video)
  - WebM (Matroska-based)
Why this matters:
- Accuracy: A DOCX file renamed to .zip (or any other extension) is still identified as DOCX, because it contains the right internal structure
- Understanding: You learn what your file is really built on (useful for developers)
- Compatibility: Knowing the container helps understand which tools can open the file
- Recovery: If a DOCX file won't open, knowing it's a ZIP means you can extract its contents manually
Example detection: When you select a DOCX file, the detector will show:
- Format: DOCX (Word Document)
- MIME Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
- Based On: application/zip (.zip)
This "Based On" information tells you the file is fundamentally a ZIP archive with a specific internal structure.
General Questions
No, your files are never uploaded. They never leave your browser. This is a completely client-side application: all processing happens locally using WebAssembly. We don't have a backend server to receive files even if we wanted to.
Our detector supports over 450 file formats, including:
- Images: JPEG, PNG, GIF, WebP, TIFF, BMP, SVG, ICO, HEIC
- Videos: MP4, AVI, MKV, WebM, MOV, FLV, WMV
- Audio: MP3, WAV, FLAC, OGG, AAC, M4A, WMA
- Documents: PDF, DOCX, XLSX, PPTX, ODT, EPUB
- Archives: ZIP, RAR, 7Z, TAR, GZIP, BZIP2
- Executables: EXE, DLL, ELF, Mach-O
- And many more...
If a file type cannot be detected, it means:
- The file has no magic byte signature
- It's a custom or proprietary format
- The file is corrupted
- It's a plain text file (which we detect as text/plain)
The detector will show application/octet-stream for unknown binary files.
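One common way to choose between text/plain and application/octet-stream when no signature matches is to test whether the header looks like text. The sketch below assumes a simple rule (no NUL bytes and valid UTF-8); this is a guess at a reasonable heuristic, not a description of the detector's real fallback. `fallback_mime` is a hypothetical name.

```rust
/// Fallback for headers that matched no known signature.
/// Assumed rule: valid UTF-8 without NUL bytes counts as text.
fn fallback_mime(header: &[u8]) -> &'static str {
    if header.contains(&0) {
        return "application/octet-stream"; // NUL bytes are rare in text files
    }
    match std::str::from_utf8(header) {
        Ok(_) => "text/plain",
        // `error_len() == None` means the input ended in the middle of a
        // multi-byte character, which is expected when a header is cut at 64 KB.
        Err(e) if e.error_len().is_none() => "text/plain",
        Err(_) => "application/octet-stream",
    }
}
```

The truncation case matters because a fixed-size header read can split a multi-byte character at the boundary, and that alone should not turn a text file into an unknown binary.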
Our detector warns when an executable file is disguised with a safe-looking extension. Examples:
- An .exe file renamed to .jpg
- A .dll file renamed to .pdf
- A .scr file renamed to .mp3
This is a common malware technique. If you see this warning, do not open the file unless you trust its source.
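A minimal sketch of such a mismatch check is shown below. It relies on the fact that Windows .exe, .dll, and .scr files share the same executable format, which begins with the bytes "MZ" (4D 5A). The function name `is_disguised_executable` and the extension list are hypothetical, and the detector's real rules may cover more executable formats and extensions.

```rust
use std::path::Path;

/// Extensions under which a Windows executable is expected (illustrative list).
const EXECUTABLE_EXTENSIONS: [&str; 4] = ["exe", "dll", "scr", "sys"];

/// True when the content looks like a Windows executable ("MZ" signature)
/// but the file name carries a non-executable extension.
fn is_disguised_executable(file_name: &str, header: &[u8]) -> bool {
    let extension = Path::new(file_name)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase())
        .unwrap_or_default();
    let is_executable_content = header.starts_with(b"MZ");
    is_executable_content && !EXECUTABLE_EXTENSIONS.contains(&extension.as_str())
}
```

For example, `is_disguised_executable("holiday.jpg", header)` returns true when `header` starts with "MZ", while the same header under the name `setup.exe` raises no warning.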
⚠️ Important: This is a basic check only. We only detect extension mismatches, not malware itself.
This tool is NOT antivirus software. It cannot detect:
- Malicious code within legitimate file types (macro viruses, PDF exploits)
- Files with correct extensions that contain malware
- Sophisticated malware or zero-day exploits
For complete protection, always use proper antivirus software and avoid opening files from untrusted sources.
There is no file size limit. Since we only read the first 64KB of each file, you can analyze files of any size, even terabyte-sized ones. The detection time is the same regardless of file size.
After the initial page load (which downloads the WebAssembly module), the detector can work offline. However, the "learn more" links (Wikipedia, Wikidata, Google) require an internet connection.
File Type Detector works in all modern browsers that support WebAssembly:
- Chrome 57+
- Edge 16+
- Firefox 52+
- Safari 11+
- Opera 44+
This covers every major browser version released since late 2017.
The detector uses the open-source mimetype-detector Rust crate for file type detection. The crate is publicly available and auditable on GitHub.
Many modern file formats are actually containers based on other formats. For example:
- DOCX is a ZIP file containing XML documents
- XLSX is a ZIP file with spreadsheet data
- EPUB is a ZIP file with HTML content
- JAR files are ZIP archives
When detected, we show both the specific format (DOCX) and its parent container (ZIP).
Still have questions?
Contact us at support@filetype-detector.online