Introducing the fashion command-line tool

fashion is a Swift command-line tool that traverses a file hierarchy to compute or match popular hash digests. The project is hosted on GitHub and natively supports:

  • CryptoKit hash functions: SHA-2 (SHA256 by default), insecure SHA-1 and MD5
  • fuzzy hash functions: SSDeep and TLSH, included as submodules and bridged to Swift
  • matching against multiple digests with any algo, with a similarity (SSDeep) or distance (TLSH) score for fuzzy hashes
  • Git blob hashes (git hash-object)
  • symhash with any algo, separator and optional sort (Mach-O binaries)
  • eXtensible ARchiver (XAR) table of contents checksum, e.g. for installation packages on macOS, with any algo and optional zlib decompression
  • multithreading

With optimizations, fashion is very fast yet keeps its real memory footprint under 150 MB:

| Machine             | Apps | Files   | SHA256 time | TLSH time  |
|---------------------|------|---------|-------------|------------|
| Mac Studio M2 Ultra | 292  | 910,000 | 1 min       | 2 min      |
| MacBook Air M4      | 231  | 720,000 | 42 s        | 1 min 30 s |

Features

Supported algorithms

Project algorithm choices are driven by interoperability with existing tools and formats.

Insecure MD5 and SHA1

MD5 and SHA1 both have known collisions and are considered insecure.

TLSH

The Trend Micro library tracks the total data length in an unsigned int data_len, i.e. 32 bits, so the TLSH of any file larger than ~4 GB is undefined behavior. The Java port was even worse: it used a signed integer, which breaks at ~2 GB.

fashion follows Trend Micro recommendations and backports the Java fix from TLSH 4.6.0 to define the TLSH of a file as the TLSH of its first ~4GB.
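The cap amounts to feeding the hasher no more bytes than a 32-bit unsigned length can count. A minimal sketch, assuming the limit is exactly UInt32.max bytes (fashion's precise cutoff may differ) and using a hypothetical cappedChunks helper that trims a sequence of read buffers:

```swift
/// Hypothetical helper: trims a sequence of read buffers so that the total
/// number of bytes handed to the hasher never exceeds `limit`.
/// Assumption: the cap is UInt32.max, the largest value a 32-bit unsigned data_len holds.
func cappedChunks(_ chunks: [[UInt8]], limit: UInt64 = UInt64(UInt32.max)) -> [ArraySlice<UInt8>] {
    var remaining = limit
    var result: [ArraySlice<UInt8>] = []
    for chunk in chunks {
        if remaining == 0 { break }  // limit reached: ignore the rest of the file
        // Take the whole chunk, or only what still fits under the limit.
        let take = Int(min(UInt64(chunk.count), remaining))
        result.append(chunk[0..<take])
        remaining -= UInt64(take)
    }
    return result
}

// Without a cap, a 5 GB length silently wraps around in a 32-bit counter:
// UInt32(truncatingIfNeeded: 5_000_000_000 as UInt64) == 705_032_704
```

In a real reader the same logic would sit inside the read loop, so bytes beyond the cap are never even loaded.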

Git hash

The git and git256 algos compute the hash of a Git blob object. With that hash, you can look up the object across all branches and commits of a repository.

$ git log --raw --all --format='' --find-object=$(fashion --algo git --quiet .swiftformat)
:000000 100644 0000000 842281a A        .swiftformat
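Under the hood, a Git blob hash is the digest of the header "blob <size>" plus a NUL byte, followed by the raw file content. Here is a minimal, dependency-free Swift sketch of the SHA-1 (git) variant; git256 presumably applies SHA-256 to the same bytes. The SHA-1 routine is a textbook implementation for illustration only (fashion itself relies on CryptoKit), and sha1, hex, and gitBlobHash are hypothetical names.

```swift
/// Textbook SHA-1 (FIPS 180-4), for illustration only.
func sha1(_ message: [UInt8]) -> [UInt8] {
    var h: [UInt32] = [0x6745_2301, 0xEFCD_AB89, 0x98BA_DCFE, 0x1032_5476, 0xC3D2_E1F0]

    // Padding: 0x80, zeros up to 56 mod 64, then the length in bits as a 64-bit big-endian integer.
    var msg = message
    let bitLength = UInt64(message.count) * 8
    msg.append(0x80)
    while msg.count % 64 != 56 { msg.append(0) }
    for shift in stride(from: 56, through: 0, by: -8) {
        msg.append(UInt8(truncatingIfNeeded: bitLength >> shift))
    }

    func rotl(_ x: UInt32, _ n: UInt32) -> UInt32 { (x << n) | (x >> (32 - n)) }

    for base in stride(from: 0, to: msg.count, by: 64) {
        // Message schedule: 16 big-endian words expanded to 80.
        var w = [UInt32](repeating: 0, count: 80)
        for i in 0..<16 {
            for j in 0..<4 { w[i] = (w[i] << 8) | UInt32(msg[base + 4 * i + j]) }
        }
        for i in 16..<80 { w[i] = rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1) }

        var (a, b, c, d, e) = (h[0], h[1], h[2], h[3], h[4])
        for i in 0..<80 {
            let f: UInt32
            let k: UInt32
            switch i {
            case 0..<20: f = (b & c) | (~b & d); k = 0x5A82_7999
            case 20..<40: f = b ^ c ^ d; k = 0x6ED9_EBA1
            case 40..<60: f = (b & c) | (b & d) | (c & d); k = 0x8F1B_BCDC
            default: f = b ^ c ^ d; k = 0xCA62_C1D6
            }
            let temp = rotl(a, 5) &+ f &+ e &+ k &+ w[i]
            (e, d, c, b, a) = (d, c, rotl(b, 30), a, temp)
        }
        for (index, value) in [a, b, c, d, e].enumerated() { h[index] = h[index] &+ value }
    }
    // Serialize the five state words big-endian.
    return h.flatMap { word -> [UInt8] in
        [UInt8(truncatingIfNeeded: word >> 24), UInt8(truncatingIfNeeded: word >> 16),
         UInt8(truncatingIfNeeded: word >> 8), UInt8(truncatingIfNeeded: word)]
    }
}

/// Lowercase hex encoding, two characters per byte.
func hex(_ bytes: [UInt8]) -> String {
    bytes.map { byte -> String in
        let digits = String(byte, radix: 16)
        return byte < 0x10 ? "0" + digits : digits
    }.joined()
}

/// Hash of a Git blob object: SHA-1 over "blob <size>\0" followed by the content.
func gitBlobHash(_ content: [UInt8]) -> String {
    let header = Array("blob \(content.count)".utf8) + [0]
    return hex(sha1(header + content))
}
```

For example, gitBlobHash(Array("hello\n".utf8)) gives ce013625030ba8dba906f756967f9e9ca394464a, the same value as echo hello | git hash-object --stdin.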

CDHash

fashion computes the Code Directory hash of signed Mach-O binaries using the strongest hash type the signature supports. While we print the full hash, we can match both the full digest (CDFullHash) and the truncated 20-byte CDHash.
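Because the truncated CDHash is the first 20 bytes (40 hex characters) of the full Code Directory digest, matching both forms boils down to an equality or prefix check. A minimal sketch; matchesCDHash is a hypothetical name, and case-insensitive hex comparison is an assumption:

```swift
/// Hypothetical matcher: a query matches if it equals the full hex digest
/// (CDFullHash) or, when it is exactly 40 hex characters, the digest's 20-byte prefix (CDHash).
func matchesCDHash(fullHash: String, query: String) -> Bool {
    let full = fullHash.lowercased()
    let candidate = query.lowercased()
    let truncatedHexLength = 40  // 20 bytes
    return candidate == full
        || (candidate.count == truncatedHexLength && full.hasPrefix(candidate))
}
```

With a SHA-1 Code Directory the full digest is already 20 bytes, so the equality branch covers it.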

Quiet flag

With the -q / --quiet flag, we only print file digests.

Match mode

Use -m / --match to search for files matching one or more digests.

$ fashion --algo tlsh --match $(fashion --quiet --algo tlsh /usr/bin/true) /usr/bin/
T1B683F9DB67586C65EC98A97412CEE6237F33E7950FA2401760A1C4E93E437B67E3980C   40  /usr/bin/update_dyld_shared_cache
T10483D9DF1B582C51ED4C987012CEA6677F33E7950F92422B60A1C4E92E437BB6E3984C    0  /usr/bin/true
T16083DADB57582C64EC989C7412CEA727BF33E7550B92412B60A1C4EA3E437B67E3584C   26  /usr/bin/false

With the quiet flag, we only print matching paths.
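For exact algorithms, matching several digests at once can be as simple as normalizing the queries once and doing a set lookup per file. The sketch below is an assumption about how such a matcher could look, not fashion's code, and DigestMatcher is a hypothetical name. Fuzzy algorithms cannot work this way: each file has to be scored against every query, either as a TLSH distance (0 means identical, as for /usr/bin/true above) or as an SSDeep similarity (100 means identical).

```swift
/// Hypothetical exact-match lookup for cryptographic digests:
/// queries are lowercased once, then each file digest is a single Set lookup.
struct DigestMatcher {
    private let queries: Set<String>

    init(_ digests: [String]) {
        queries = Set(digests.map { $0.lowercased() })
    }

    func matches(_ digest: String) -> Bool {
        queries.contains(digest.lowercased())
    }
}
```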

Symbol mode

Following the convention used by ImpHash, symhash, and tools like VirusTotal, --symhash mode defaults to the MD5 of the sorted list of external symbols joined with the , separator, but you can pick any algo and separator, and keep the symbol list in its original order.

$ fashion --symhash --algo ssdeep --match $(fashion --symhash --algo ssdeep admobs.ru/agent/bin/Pods --quiet) admobs.ru
6:vILFGL4MPEm03Jc6DVXYtZPGujVpEKK7XBUL9xMw:vIJMgqIXYTnhpEKK7X8xMw  100  admobs.ru/agent/bin/cat
6:vILFGL4MPEm03Jc6DVXYtZPGujVpEKK7XBUL9xMw:vIJMgqIXYTnhpEKK7X8xMw  100  admobs.ru/agent/bin/Pods
6:vILFGL4MPEm03Jc6DVXYtZPGujVpEKK7XBUL9xMw:vIJMgqIXYTnhpEKK7X8xMw  100  admobs.ru/sys/bin/Pods
6:vILFGL4MPEm03Jc6DVXYtZPGujVpEKK7XBUL9xMw:vIJMgqIXYTnhpEKK7X8xMw  100  admobs.ru/d
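The symhash input itself is simple to build: take the external symbols, sort them unless asked not to, join them with the separator, and hash the resulting bytes with the selected algo. A minimal sketch with the hash function passed in as a closure; symhash and its parameters are hypothetical, and extracting the external symbols from the Mach-O symbol table is out of scope:

```swift
/// Hypothetical sketch: builds the symhash input (optionally sorted symbols
/// joined by `separator`) and hands its UTF-8 bytes to any hash function.
func symhash(
    symbols: [String],
    separator: String = ",",
    sort: Bool = true,
    hash: ([UInt8]) -> String
) -> String {
    let ordered = sort ? symbols.sorted() : symbols
    let joined = ordered.joined(separator: separator)
    return hash(Array(joined.utf8))
}
```

With the default arguments and an MD5 closure (on macOS, for instance, one wrapping CryptoKit's Insecure.MD5), this yields the conventional symhash; swapping the closure or separator covers the other combinations the mode allows.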

XAR mode

XAR archives include installation packages on macOS. According to xar(1):

xar is no longer under active development by Apple. Clients of xar should pursue alternative archive formats.

The XAR header is straightforward, so we implemented our own parser to read the table of contents (TOC) instead of bridging the xar library.

Just like xar --dump-toc-cksum, --xar-toc mode defaults to the SHA1 of the compressed TOC, but you can choose any algo and even hash the decompressed (zlib) TOC.
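The fixed part of the XAR header is 28 bytes, all big-endian: the magic xar!, the header size, the format version, the compressed and uncompressed TOC lengths (64 bits each), and a checksum algorithm identifier. The compressed TOC starts right after the header, at the offset given by the header size field. A minimal parsing sketch; XARHeader, readBE, and parseXAR are hypothetical names, and zlib decompression is left out:

```swift
struct XARHeader {
    let headerSize: UInt16
    let version: UInt16
    let tocLengthCompressed: UInt64
    let tocLengthUncompressed: UInt64
    let checksumAlgorithm: UInt32
}

/// Reads a big-endian unsigned integer of type T starting at `offset`.
func readBE<T: FixedWidthInteger & UnsignedInteger>(_ bytes: [UInt8], at offset: Int, as _: T.Type) -> T {
    var value: T = 0
    for i in 0..<MemoryLayout<T>.size { value = (value << 8) | T(bytes[offset + i]) }
    return value
}

/// Hypothetical parser: checks the "xar!" magic, decodes the header, and returns
/// the compressed TOC bytes that follow it (nil if the input is malformed).
func parseXAR(_ bytes: [UInt8]) -> (header: XARHeader, compressedTOC: ArraySlice<UInt8>)? {
    guard bytes.count >= 28, readBE(bytes, at: 0, as: UInt32.self) == 0x7861_7221 else { return nil }
    let header = XARHeader(
        headerSize: readBE(bytes, at: 4, as: UInt16.self),
        version: readBE(bytes, at: 6, as: UInt16.self),
        tocLengthCompressed: readBE(bytes, at: 8, as: UInt64.self),
        tocLengthUncompressed: readBE(bytes, at: 16, as: UInt64.self),
        checksumAlgorithm: readBE(bytes, at: 24, as: UInt32.self)
    )
    // The TOC begins after the (possibly extended) header and must fit in the input.
    let start = Int(header.headerSize)
    guard start >= 28, start <= bytes.count,
          header.tocLengthCompressed <= UInt64(bytes.count - start) else { return nil }
    let end = start + Int(header.tocLengthCompressed)
    return (header, bytes[start..<end])
}
```

Hashing compressedTOC with SHA1 then corresponds to the default --xar-toc behavior described above.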

Mach-O support

fashion parses universal and thin Mach-O binaries natively. The --slices flag hashes each architecture individually in addition to the whole file. Supported architectures: arm64, arm64e, x86_64, i386, and legacy ppc / ppc64.
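A universal binary starts with a big-endian fat header: the magic 0xCAFEBABE (or 0xCAFEBABF for the 64-bit variant), an architecture count, then one entry per slice giving its CPU type, CPU subtype, file offset, and size. Hashing each slice individually then amounts to hashing those byte ranges. The subtype matters: arm64e shares the arm64 CPU type and differs only by subtype. A minimal sketch; FatSlice, beInt, and fatSlices are hypothetical names, and a production parser would also guard against Java class files, which share the 0xCAFEBABE magic:

```swift
/// One architecture slice of a universal (fat) binary.
struct FatSlice: Equatable {
    let cpuType: UInt32
    let cpuSubtype: UInt32
    let offset: UInt64
    let size: UInt64
}

/// Reads a big-endian unsigned integer of type T starting at `offset`.
func beInt<T: FixedWidthInteger & UnsignedInteger>(_ bytes: [UInt8], _ offset: Int, _: T.Type) -> T {
    var value: T = 0
    for i in 0..<MemoryLayout<T>.size { value = (value << 8) | T(bytes[offset + i]) }
    return value
}

/// Hypothetical parser: returns the slices of a fat binary, or nil for thin
/// (or non-Mach-O) input. fat_arch entries are 20 bytes, fat_arch_64 entries 32 bytes.
func fatSlices(_ bytes: [UInt8]) -> [FatSlice]? {
    guard bytes.count >= 8 else { return nil }
    let is64: Bool
    switch beInt(bytes, 0, UInt32.self) {
    case 0xCAFE_BABE: is64 = false
    case 0xCAFE_BABF: is64 = true
    default: return nil
    }
    let count = Int(beInt(bytes, 4, UInt32.self))
    let entrySize = is64 ? 32 : 20
    guard bytes.count >= 8 + count * entrySize else { return nil }
    return (0..<count).map { i -> FatSlice in
        let base = 8 + i * entrySize
        let cpuType = beInt(bytes, base, UInt32.self)
        let cpuSubtype = beInt(bytes, base + 4, UInt32.self)
        if is64 {
            return FatSlice(cpuType: cpuType, cpuSubtype: cpuSubtype,
                            offset: beInt(bytes, base + 8, UInt64.self),
                            size: beInt(bytes, base + 16, UInt64.self))
        }
        return FatSlice(cpuType: cpuType, cpuSubtype: cpuSubtype,
                        offset: UInt64(beInt(bytes, base + 8, UInt32.self)),
                        size: UInt64(beInt(bytes, base + 12, UInt32.self)))
    }
}
```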

Concurrency

The -j / --jobs flag controls parallel workers. Set -j 0 to use all available CPU cores.

The --sort flag trades some throughput for a deterministic output order: paths are collected and sorted up front, and results are emitted in that order even though files are processed concurrently.
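One way to get a deterministic order without giving up parallelism is to sort the paths first, let workers write each result into the slot matching its path's index, and emit the slots in order. A minimal sketch with Dispatch; hashSorted is a hypothetical name, concurrentPerform picks its own degree of parallelism (so -j is not modeled here), and the hash closure must be safe to call from several threads:

```swift
import Dispatch

/// Hypothetical sketch: sorts the paths, hashes them in parallel, and returns
/// results in sorted order regardless of which worker finishes first.
func hashSorted<T>(_ paths: [String], _ hash: (String) -> T) -> [(path: String, digest: T)] {
    let sortedPaths = paths.sorted()
    var digests = [T?](repeating: nil, count: sortedPaths.count)
    digests.withUnsafeMutableBufferPointer { slots in
        // Each iteration writes a distinct index, so no two workers touch the same slot.
        DispatchQueue.concurrentPerform(iterations: sortedPaths.count) { i in
            slots[i] = hash(sortedPaths[i])
        }
    }
    // Every slot was filled above, so force-unwrapping is safe here.
    return zip(sortedPaths, digests).map { (path: $0, digest: $1!) }
}
```

Waiting for every slot before printing is the simplest choice; a streaming variant could print each result as soon as all earlier indices are done.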

Misc

Completion

swift-argument-parser provides shell completion for free, for bash, fish, and zsh:

fashion --generate-completion-script

With fish, you can even have the completion script generated and sourced on demand; once loaded, it remains available for the rest of the session:

echo "fashion --generate-completion-script fish | source" > ~/.config/fish/completions/fashion.fish

Very convenient when you craft your own tools.

History

The original fashion was a workaround for the overhead of repeatedly loading the Perl shasum command.

The name is a mashup of my original bash function fsha and the David Bowie song Fashion, more specifically his duet with Frank Black of the Pixies.

Python made it very easy: hashlib and io are built in, and the python-ssdeep and tlsh / py-tlsh modules were straightforward to use. But uv was complaining about pkg_resources, and I needed features Python couldn't give me without significant effort.

So here we are: Swift, native, concurrent, with C and C++ dependencies bridged as submodules, and zero external runtime requirements.