How It Works
Think of magic bytes as a file's fingerprint - they're specific byte sequences at the very beginning of a file that tell you what type it is. Most binary file formats have their own signature:
- PNG files always start with 89 50 4E 47
- JPEG files start with FF D8 FF
- PDF files start with 25 50 44 46 (which spells "%PDF" in text)
- ZIP files start with 50 4B 03 04 (the first two bytes are the letters "PK")
- Check out Wikipedia's List of File Signatures to see more
Our detector reads these magic bytes to figure out what your file really is, no matter what the extension says.
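To make the idea concrete, here is a minimal sketch of signature matching. It is written in Python purely for readability and is not the detector's actual Rust code; the table holds only the four signatures listed above, and the names `SIGNATURES` and `detect_mime` are made up for this example.

```python
from typing import Optional

# Illustrative signature table: (magic bytes, MIME type).
# Only the four signatures mentioned above; a real detector knows hundreds.
SIGNATURES = [
    (bytes.fromhex("89504E47"), "image/png"),
    (bytes.fromhex("FFD8FF"), "image/jpeg"),
    (bytes.fromhex("25504446"), "application/pdf"),  # "%PDF"
    (bytes.fromhex("504B0304"), "application/zip"),  # "PK" + 03 04
]


def detect_mime(header: bytes) -> Optional[str]:
    """Return the MIME type whose signature prefixes `header`, or None."""
    for magic, mime in SIGNATURES:
        if header.startswith(magic):
            return mime
    return None


print(detect_mime(b"%PDF-1.7\n"))  # application/pdf
```

Note that the file name never enters the function: only the bytes decide the result.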
Here's what happens when you upload a file:
- You select a file - Either drag and drop or click to browse
- We read the header - JavaScript grabs just the first 64KB of the file
- WebAssembly takes over - Those bytes get passed to our super-fast Rust code
- We identify the format - The detector matches the magic bytes against its database
Everything happens right here in your browser - nothing gets sent anywhere.
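The "read the header" step can be sketched like this. In the browser the same effect is typically achieved by slicing the selected file (for example with `Blob.slice(0, 65536)`) before reading it; the Python version below is only an illustration of the idea, and `read_header` and `HEADER_SIZE` are hypothetical names.

```python
HEADER_SIZE = 64 * 1024  # 65,536 bytes


def read_header(path: str, limit: int = HEADER_SIZE) -> bytes:
    """Read at most `limit` bytes from the start of the file at `path`.

    The rest of the file is never loaded, so the cost does not grow
    with the file size.
    """
    with open(path, "rb") as f:
        return f.read(limit)
```

A file smaller than 64KB simply yields fewer bytes; anything larger is cut off at 65,536 bytes.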
We read only the first 64KB because that's where all the important stuff is! File signatures are pretty much always at the very beginning. Reading just the first 64KB (65,536 bytes) gives us:
- Blazing fast detection - Your 10GB video file? Analyzed in milliseconds
- Low memory usage - We only load 64KB no matter how big your file is
- High accuracy - More than enough data for even complex formats
Even tricky formats like DOCX (which is actually a ZIP file full of XML) usually reveal themselves in those first 64KB.
WebAssembly is like giving your browser superpowers. It's a way to run really fast code (like Rust or C++) directly in your browser, almost as fast as native desktop apps. We use it to:
- Run high-performance Rust code right in your browser
- Process files way faster than JavaScript alone could
- Keep everything private - all the code runs on your machine
- Pack support for ~500 file formats into a small package
Here's a fun fact: Many modern file formats are basically ZIP files in disguise! They're built on containers - standard archive formats (like ZIP) that hold specific structured content inside. Our detector identifies both the specific format and the container it's built on.
How we detect containers:
- We read the magic bytes and spot the container (like ZIP)
- We peek inside at the structure and metadata (all within that first 64KB)
- We look for specific markers that reveal the true format (like DOCX)
- We show you both - the specific format and what it's built on
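The steps above can be sketched roughly as follows. This is a simplified illustration in Python, not the detector's real logic: it assumes that entry names stored near the start of a ZIP file are enough to recognize the format, and the marker table `ZIP_MARKERS` and function `detect_container` are hypothetical. A production detector would parse the ZIP structure rather than search for substrings.

```python
from typing import Optional, Tuple

ZIP_MAGIC = b"PK\x03\x04"

# Hypothetical marker table: bytes expected near the start of each
# ZIP-based format, and the MIME type to report when they are found.
ZIP_MARKERS = [
    (b"word/",
     "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
    (b"xl/",
     "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"),
    # EPUB stores an uncompressed "mimetype" entry first, so the entry
    # name is immediately followed by its content.
    (b"mimetypeapplication/epub+zip", "application/epub+zip"),
]


def detect_container(header: bytes) -> Optional[Tuple[str, Optional[str]]]:
    """Return (mime, based_on) for ZIP-based data, or None if not a ZIP."""
    if not header.startswith(ZIP_MAGIC):
        return None
    for marker, mime in ZIP_MARKERS:
        if marker in header:
            return (mime, "application/zip")
    # A ZIP with no known marker is reported as a plain archive.
    return ("application/zip", None)
```

The second element of the result plays the role of the "Based On" value: it is set only when a more specific format was found on top of the container.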
Common container-based formats:
- ZIP-based:
- DOCX, XLSX, PPTX (Microsoft Office Open XML)
- ODT, ODS, ODP (OpenDocument formats)
- EPUB (eBook format)
- JAR (Java Archive)
- APK (Android Package)
- XPI (Firefox Extension)
- Other containers:
- MP4, M4A, MOV (based on ISO Base Media File Format)
- OGG, OGV (Ogg container for audio/video)
- WebM (Matroska-based)
Why should you care?
- Accuracy: Even if someone renames a DOCX to .zip, we'll still recognize it as a Word document
- Understanding: You'll know what your file is actually made of (super useful if you're a developer)
- Compatibility: Knowing the container tells you which tools can work with your file
- Recovery: If a DOCX won't open in Word, you can unzip it manually and grab your data
What you'll see: Upload a DOCX file and we'll show you:
- Format: DOCX (Word Document)
- MIME Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
- Based On: application/zip (.zip)
That "Based On" line reveals the secret - it's fundamentally a ZIP archive with Word-specific stuff inside.
General Questions
No, absolutely not. Your files never leave your browser. This is a completely client-side application - all processing happens locally using WebAssembly. We don't have a backend server to receive files even if we wanted to.
We support about 500 file formats! Here's a taste of what we can identify:
- Images: JPEG, PNG, GIF, WebP, TIFF, BMP, SVG, ICO, HEIC
- Videos: MP4, AVI, MKV, WebM, MOV, FLV, WMV
- Audio: MP3, WAV, FLAC, OGG, AAC, M4A, WMA
- Documents: PDF, DOCX, XLSX, PPTX, ODT, EPUB
- Archives: ZIP, RAR, 7Z, TAR, GZIP, BZIP2
- Executables: EXE, DLL, ELF, Mach-O
- ...and hundreds more
If we can't identify your file, it usually means:
- The file has no magic byte signature (some formats just don't have one)
- It's a custom or proprietary format we haven't seen before
- The file might be corrupted
- It's plain text (we'll show it as text/plain)
For unknown binary files, you'll see application/octet-stream - basically our way of saying "it's a file, but we're not sure what kind." Learn more about MIME types and media types.
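One common way to choose between these two fallbacks is to check whether the bytes look like text. The sketch below is an assumption about how such a check could work, not a description of the detector's actual rule: it treats data as text if it contains no NUL bytes and decodes as UTF-8. The function name `fallback_mime` is made up for this example.

```python
import codecs


def fallback_mime(header: bytes) -> str:
    """Guess text/plain vs application/octet-stream for unrecognized data."""
    if b"\x00" in header:
        # NUL bytes are rare in plain text, so treat this as binary.
        return "application/octet-stream"
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the
        # end of the 64KB window instead of rejecting the whole buffer.
        decoder.decode(header, final=False)
    except UnicodeDecodeError:
        return "application/octet-stream"
    return "text/plain"
```

This heuristic would misclassify text in encodings such as UTF-16, which is one reason real detectors use more elaborate checks.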
We'll warn you if we spot an executable file trying to look innocent. Common tricks include:
- An .exe file pretending to be a .jpg
- A .dll file disguised as a .pdf
- A .scr screensaver masquerading as a .mp3
This is Malware 101. If you see this warning, don't open the file unless you absolutely trust where it came from.
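A mismatch check of this kind can be sketched as below. It is an illustration only, under the assumption that "executable" means the content starts with the Windows "MZ" signature (used by EXE, DLL and SCR files) or the ELF signature; the extension list and the function `is_disguised_executable` are hypothetical, not the tool's real rules.

```python
import os

# Signatures that mark executable content: "MZ" (Windows EXE/DLL/SCR)
# and 7F 45 4C 46 (ELF). "MZ" is only two bytes, so false positives
# are possible in a sketch this simple.
EXECUTABLE_MAGICS = (b"MZ", b"\x7fELF")

# Extensions (or no extension) under which executable content is expected.
EXECUTABLE_EXTENSIONS = {".exe", ".dll", ".scr", ".sys", ".so", ""}


def is_disguised_executable(filename: str, header: bytes) -> bool:
    """True if the content looks executable but the extension hides it."""
    looks_executable = header.startswith(EXECUTABLE_MAGICS)
    ext = os.path.splitext(filename)[1].lower()
    return looks_executable and ext not in EXECUTABLE_EXTENSIONS
```

For example, `is_disguised_executable("holiday.jpg", b"MZ\x90\x00")` returns `True`, while the same bytes under the name `setup.exe` return `False`.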
⚠️ Important heads-up: We're doing a basic safety check, not a full security scan. We catch extension mismatches, not actual malware.
This tool is NOT antivirus software. We can't detect:
- Malicious code hiding inside legitimate files (macro viruses, PDF exploits, etc.)
- Files with correct extensions that happen to contain malware
- Sophisticated malware or zero-day exploits
For real protection, use actual antivirus software and be careful what you download.
Nope! Since we only peek at the first 64KB, you can throw files of any size at us - even your 100GB video project. A 1KB text file and a 1TB database will both be analyzed in the same amount of time (milliseconds).
Any modern browser that supports WebAssembly will work just fine:
- Chrome 57+
- Edge 16+
- Firefox 52+
- Safari 11+
- Opera 44+
Basically, if your browser was released after 2017, you're good to go.
The detection engine is open source! We use the mimetype-detector Rust crate. You can check out the code on GitHub and see exactly how it works.
You might see two formats for one file because lots of modern file formats are secretly based on other formats. Fun examples:
- DOCX is just a ZIP file full of XML documents
- XLSX is a ZIP file stuffed with spreadsheet data
- EPUB is a ZIP file containing HTML and images
- JAR files are literally just ZIP archives with Java code
We show both - the fancy name (DOCX) and what it's really built on (ZIP). Knowledge is power!
Learn More About File Detection
Want to dive deeper into file formats and detection techniques? Check out these authoritative resources:
- List of File Signatures (Magic Bytes) - Complete reference on Wikipedia
- File Format Overview - Understanding digital file structures
- MIME Types Specification - Internet media type standards
- File Format on Wikidata - Structured knowledge about formats
- File Magic Bytes Detection - Search Google for more techniques
- WebAssembly Technology - How we achieve browser-native speed
- Rust Programming Language - The language powering our detector
Still have questions?
Contact us at [email protected]