Frequently Asked Questions

Everything you need to know about how it works

How It Works

What are "magic bytes"?

Think of magic bytes as a file's fingerprint - they're specific byte sequences, usually at the very beginning of a file, that tell you what type it is. Most binary file formats have their own signature:

  • PNG files always start with 89 50 4E 47
  • JPEG files start with FF D8 FF
  • PDF files start with 25 50 44 46 (which spells "%PDF" in text)
  • ZIP files start with 50 4B 03 04 (the first two bytes spell "PK")
  • Check out Wikipedia's List of File Signatures to see more

Our detector reads these magic bytes to figure out what your file really is, no matter what the extension says.
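
To make that concrete, here's a minimal JavaScript sketch of signature matching. It only knows the four signatures listed above, and the names SIGNATURES and matchSignature are made up for illustration - the real detector does this in Rust with a much bigger table.

```javascript
// Illustrative signature table using only the four examples listed above.
const SIGNATURES = [
  { mime: "image/png", bytes: [0x89, 0x50, 0x4e, 0x47] },
  { mime: "image/jpeg", bytes: [0xff, 0xd8, 0xff] },
  { mime: "application/pdf", bytes: [0x25, 0x50, 0x44, 0x46] },
  { mime: "application/zip", bytes: [0x50, 0x4b, 0x03, 0x04] },
];

// Returns the MIME type whose signature prefixes `header`, or null if none match.
function matchSignature(header) {
  for (const { mime, bytes } of SIGNATURES) {
    if (header.length >= bytes.length && bytes.every((b, i) => header[i] === b)) {
      return mime;
    }
  }
  return null;
}

// Imagine a file named "report.jpg" whose content actually begins with "%PDF-1.7".
const sample = new TextEncoder().encode("%PDF-1.7");
console.log(matchSignature(sample)); // application/pdf
```

Notice that the file name never enters the picture: only the bytes decide the answer.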

How does the detection process work?

Here's what happens when you upload a file:

  1. You select a file - Either drag and drop or click to browse
  2. We read the header - JavaScript grabs just the first 64KB of the file
  3. WebAssembly takes over - Those bytes get passed to our super-fast Rust code
  4. We identify the format - The detector matches the magic bytes against its database

Everything happens right here in your browser - nothing gets sent anywhere.
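
Step 2 can be sketched in a few lines of JavaScript. This is an assumption about how such a read could look, not the page's actual source: HEADER_LIMIT, headerRange, readHeader and the commented-out detect call are hypothetical names. Blob.slice() and arrayBuffer() are standard browser APIs.

```javascript
const HEADER_LIMIT = 64 * 1024; // 65,536 bytes

// Byte range to read for a file of the given size: the whole file if it is
// smaller than the limit, otherwise just the first 64KB.
function headerRange(fileSize) {
  return { start: 0, end: Math.min(fileSize, HEADER_LIMIT) };
}

// Reads only the header bytes of a File or Blob. slice() creates a view of the
// range without copying the file; the bytes are read when arrayBuffer() resolves.
async function readHeader(file) {
  const { start, end } = headerRange(file.size);
  const buffer = await file.slice(start, end).arrayBuffer();
  return new Uint8Array(buffer);
}

// Hypothetical wiring in the page; `detect` stands in for the WebAssembly export.
// fileInput.addEventListener("change", async () => {
//   const header = await readHeader(fileInput.files[0]);
//   console.log(detect(header));
// });
```

Because only the slice is read, the amount of data loaded stays the same whether the file is 1KB or 10GB.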

Why only read the first 64KB?

Because that's where all the important stuff is! File signatures are pretty much always at the very beginning. Reading just the first 64KB (65,536 bytes) gives us:

  • Blazing fast detection - Your 10GB video file? Analyzed in milliseconds
  • Low memory usage - We only load 64KB no matter how big your file is
  • High accuracy - More than enough data for most formats, even complex ones

Even tricky formats like DOCX (which is actually a ZIP file full of XML) usually reveal themselves in those first 64KB.

What is WebAssembly (WASM)?

WebAssembly is like giving your browser superpowers. It's a way to run code compiled from languages like Rust or C++ directly in your browser, almost as fast as native desktop apps. We use it to:

  • Run high-performance Rust code right in your browser
  • Process files way faster than JavaScript alone could
  • Keep everything private - all the code runs on your machine
  • Pack support for ~500 file formats into a small package
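
If you're curious what "JavaScript calling into WebAssembly" looks like, here's a tiny sketch. The module below is hand-assembled and only adds two numbers - it is a stand-in for illustration, not our detector, and the names ADD_MODULE and addInstance are made up. Fun detail: WebAssembly files have magic bytes of their own (00 61 73 6D).

```javascript
// Hand-assembled WebAssembly module exporting add(a, b) for 32-bit integers.
// It stands in for a real compiled Rust module, which would be far larger.
const ADD_MODULE = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // one function using that type
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export it as "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body: a + b
]);

// Synchronous compilation is fine for a module this small; real pages
// typically load modules asynchronously, e.g. WebAssembly.instantiateStreaming.
const addInstance = new WebAssembly.Instance(new WebAssembly.Module(ADD_MODULE));
console.log(addInstance.exports.add(2, 3)); // 5
```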

What are container formats and how does the detector handle them?

Here's a fun fact: Many modern file formats are basically ZIP files in disguise! They're built on containers - standard archive formats (like ZIP) that hold specific structured content inside. Our detector identifies both the specific format (like DOCX) and the container it's built on (like ZIP).

How we detect containers:

  1. We read the magic bytes and spot the container (like ZIP)
  2. We peek inside at the structure and metadata (all within that first 64KB)
  3. We look for specific markers that reveal the true format (like DOCX)
  4. We show you both - the specific format and what it's built on
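
Here's a rough JavaScript sketch of steps 2 and 3 for ZIP-based files. It is a simplified illustration under a few assumptions, not the detector's Rust code: zipEntryNames and refineZip are hypothetical names, only a handful of markers are checked, and the walk stops early when an entry stores its sizes after the data.

```javascript
// Lists the entry names found in ZIP local file headers inside `bytes` (a Uint8Array).
function zipEntryNames(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const names = [];
  let pos = 0;
  // Each local file header starts with 50 4B 03 04 and is 30 bytes plus the name.
  while (pos + 30 <= bytes.length && view.getUint32(pos, true) === 0x04034b50) {
    const flags = view.getUint16(pos + 6, true);
    const compressedSize = view.getUint32(pos + 18, true);
    const nameLength = view.getUint16(pos + 26, true);
    const extraLength = view.getUint16(pos + 28, true);
    const nameStart = pos + 30;
    if (nameStart + nameLength > bytes.length) break; // name cut off by the 64KB limit
    names.push(String.fromCharCode(...bytes.subarray(nameStart, nameStart + nameLength)));
    if (flags & 0x0008) break; // sizes live in a trailing data descriptor; stop here
    pos = nameStart + nameLength + extraLength + compressedSize;
  }
  return names;
}

// Maps entry names to a more specific format, falling back to plain ZIP.
function refineZip(names) {
  if (names.includes("[Content_Types].xml")) {
    if (names.some((n) => n.startsWith("word/"))) return "DOCX";
    if (names.some((n) => n.startsWith("xl/"))) return "XLSX";
    if (names.some((n) => n.startsWith("ppt/"))) return "PPTX";
  }
  if (names.includes("META-INF/MANIFEST.MF")) return "JAR";
  return "ZIP";
}
```

The idea is that the ZIP signature only tells you the container; the file names inside tell you what the container is being used for.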

Common container-based formats:

  • ZIP-based:
    • DOCX, XLSX, PPTX (Microsoft Office Open XML)
    • ODT, ODS, ODP (OpenDocument formats)
    • EPUB (eBook format)
    • JAR (Java Archive)
    • APK (Android Package)
    • XPI (Firefox Extension)
  • Other containers:
    • MP4, M4A, MOV (the ISO Base Media File Format family, which grew out of QuickTime's MOV)
    • OGG, OGV (Ogg container for audio/video)
    • WebM (Matroska-based)

Why should you care?

  • Accuracy: Even if someone renames a DOCX to .zip, we'll still recognize it as a Word document
  • Understanding: You'll know what your file is actually made of (super useful if you're a developer)
  • Compatibility: Knowing the container tells you which tools can work with your file
  • Recovery: If a DOCX won't open in Word, you can unzip it manually and grab your data

What you'll see: Upload a DOCX file and we'll show you:

  • Format: DOCX (Word Document)
  • MIME Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
  • Based On: application/zip (.zip)

That "Based On" line reveals the secret - it's fundamentally a ZIP archive with Word-specific stuff inside.

General Questions

Is my file uploaded to a server?

No, absolutely not. Your files never leave your browser. This is a completely client-side application - all processing happens locally using WebAssembly. We don't have a backend server to receive files even if we wanted to.

How many file types are supported?

We support 500+ file formats! Here's a taste of what we can identify:

  • Images: JPEG, PNG, GIF, WebP, TIFF, BMP, SVG, ICO, HEIC
  • Videos: MP4, AVI, MKV, WebM, MOV, FLV, WMV
  • Audio: MP3, WAV, FLAC, OGG, AAC, M4A, WMA
  • Documents: PDF, DOCX, XLSX, PPTX, ODT, EPUB
  • Archives: ZIP, RAR, 7Z, TAR, GZIP, BZIP2
  • Executables: EXE, DLL, ELF, Mach-O
  • ...and hundreds more

What if the file type is not detected?

If we can't identify your file, it usually means:

  • The file has no magic byte signature (some formats just don't have one)
  • It's a custom or proprietary format we haven't seen before
  • The file might be corrupted
  • It's plain text (we'll show it as text/plain)

For unknown binary files, you'll see application/octet-stream - basically our way of saying "it's a file, but we're not sure what kind." Learn more about MIME types and media types.
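
The fallback logic can be sketched like this in JavaScript. The text check is an assumed heuristic chosen for illustration (no control bytes other than common whitespace); the real detector may decide differently, and looksLikeText and fallbackMime are hypothetical names.

```javascript
// Assumed heuristic: treat the header as text if it contains no control bytes
// other than tab, line feed, form feed and carriage return.
function looksLikeText(header) {
  for (const b of header) {
    const allowedControl = b === 0x09 || b === 0x0a || b === 0x0c || b === 0x0d;
    if (b < 0x20 && !allowedControl) return false;
  }
  return true;
}

// `matched` is the MIME type from signature matching, or null when nothing matched.
function fallbackMime(matched, header) {
  if (matched !== null) return matched;
  return looksLikeText(header) ? "text/plain" : "application/octet-stream";
}
```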

What is the ransomware warning?

We'll warn you if we spot an executable file trying to look innocent. Common tricks include:

  • An .exe file pretending to be a .jpg
  • A .dll file disguised as a .pdf
  • A .scr screensaver masquerading as a .mp3

Hiding an executable behind a harmless-looking extension is one of the oldest malware tricks. If you see this warning, don't open the file unless you absolutely trust where it came from.
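
In code, the check boils down to comparing the detected type with the file's extension. Here's a minimal JavaScript sketch; the two lists are assumptions picked for illustration (the tool's real lists may differ), and extensionOf and isDisguisedExecutable are hypothetical names.

```javascript
// Assumed MIME types treated as executable. Only the first is IANA-registered;
// the other two are common unofficial names.
const EXECUTABLE_MIMES = new Set([
  "application/vnd.microsoft.portable-executable", // EXE, DLL, SCR
  "application/x-elf",
  "application/x-mach-binary",
]);

// Assumed extensions under which an executable is not surprising.
const EXECUTABLE_EXTENSIONS = new Set(["exe", "dll", "scr", "com", "sys", "so", "dylib"]);

// Returns the lowercase extension without the dot, or "" if there is none.
function extensionOf(fileName) {
  const dot = fileName.lastIndexOf(".");
  return dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
}

// Warn when the content is executable but the extension claims something else.
// Files without any extension are not flagged, since that is normal for
// Linux and macOS binaries.
function isDisguisedExecutable(fileName, detectedMime) {
  const ext = extensionOf(fileName);
  return EXECUTABLE_MIMES.has(detectedMime) && ext !== "" && !EXECUTABLE_EXTENSIONS.has(ext);
}

console.log(isDisguisedExecutable("holiday.jpg", "application/vnd.microsoft.portable-executable")); // true
```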

⚠️ Important heads-up: We're doing a basic safety check, not a full security scan. We catch extension mismatches, not actual malware.

This tool is NOT antivirus software. We can't detect:

  • Malicious code hiding inside legitimate files (macro viruses, PDF exploits, etc.)
  • Files with correct extensions that happen to contain malware
  • Sophisticated malware or zero-day exploits

For real protection, use actual antivirus software and be careful what you download.

Is there a file size limit?

Nope! Since we only peek at the first 64KB, you can throw files of any size at us - even your 100GB video project. A 1KB text file and a 1TB database will both be analyzed in about the same amount of time (milliseconds).

Can I use this offline?

Yep! Once the page loads for the first time (downloading the WebAssembly module), the detector works offline. The only thing that won't work is the "Learn More" links to Wikipedia, Wikidata, and Google - those obviously need internet.

What browsers are supported?

Any modern browser that supports WebAssembly will work just fine:

  • Chrome 57+
  • Edge 16+
  • Firefox 52+
  • Safari 11+
  • Opera 44+

Basically, if your browser was released after 2017, you're good to go.
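
If you'd rather test than trust version numbers, a page can check for WebAssembly support directly. This is a common feature-detection pattern, sketched here with a hypothetical function name (supportsWebAssembly); it isn't necessarily the check this page uses.

```javascript
// Returns true if this environment can validate a WebAssembly module.
function supportsWebAssembly() {
  try {
    if (typeof WebAssembly !== "object" || typeof WebAssembly.validate !== "function") {
      return false;
    }
    // Smallest valid module: the "\0asm" magic bytes followed by version 1.
    const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
    return WebAssembly.validate(emptyModule);
  } catch (error) {
    return false;
  }
}

console.log(supportsWebAssembly());
```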

Is this open source?

The detection engine is! We use the open-source mimetype-detector Rust crate. You can check out the code on GitHub and see exactly how it works.

Why does it show a "parent format"?

Because lots of modern file formats are secretly based on other formats. Fun examples:

  • DOCX is just a ZIP file full of XML documents
  • XLSX is a ZIP file stuffed with spreadsheet data
  • EPUB is a ZIP file containing HTML and images
  • JAR files are literally just ZIP archives with Java code

We show both - the fancy name (DOCX) and what it's really built on (ZIP). Knowledge is power!

Still have questions?

Contact us at [email protected]