
1/7/2020 pHash.org: Home of pHash, the open source perceptual hash library

pHash
The open source perceptual hash library

What is a perceptual hash?


A perceptual hash is a fingerprint of a multimedia file derived from features of its content. Unlike
cryptographic hash functions, which rely on the avalanche effect (small changes in the input cause drastic
changes in the output), perceptual hashes are "close" to one another if the features are similar.
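
To make the contrast concrete, here is a minimal sketch that counts differing bits (the Hamming distance) in both cases. It uses only Python's standard `hashlib`. The helper names and the two 64-bit "perceptual" values are illustrative assumptions, not output from pHash.

```python
import hashlib


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two integers differ."""
    return bin(a ^ b).count("1")


def sha256_as_int(data: bytes) -> int:
    """Interpret a SHA-256 digest as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


# Cryptographic hash: changing one character flips roughly half of the 256 bits.
crypto_bits_changed = hamming_distance(
    sha256_as_int(b"perceptual hash"),
    sha256_as_int(b"perceptual hasH"),
)

# Hypothetical 64-bit perceptual hashes of an image and a recompressed copy
# (made-up values for illustration): only a couple of bits differ.
original_hash = 0xF0E1D2C3B4A59687
recompressed_hash = 0xF0E1D2C3B4A59286
perceptual_bits_changed = hamming_distance(original_hash, recompressed_hash)

print(crypto_bits_changed, "of 256 bits differ (SHA-256)")
print(perceptual_bits_changed, "of 64 bits differ (perceptual)")
```

A small Hamming distance between two perceptual hashes suggests similar content, whereas the distance between two cryptographic digests says nothing about how similar the inputs were.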

Relevance of Perceptual Hashing

Perceptual hashes must be robust enough to withstand transformations or "attacks" on a given input
and yet discriminating enough to distinguish between dissimilar files. Such attacks can include rotation, skew,
contrast adjustment and different compression/formats. These challenges make perceptual hashing an
interesting field of study at the forefront of computer science research.

What is pHash?

pHash is an open source software library released under the GPLv3 license that implements several
perceptual hashing algorithms, and provides a C-like API to use those functions in your own programs.
pHash itself is written in C++.

pHash 0.9.6 Released

04.23.2013 pHash 0.9.6 fixes some compilation errors and warnings and updates the automake files
to support building on Gentoo.

News and Updates:


04.23.2013 pHash 0.9.6 released. Fixes some compilation errors and warnings and updates the
automake files to support building on Gentoo.

11.23.2012 pHash 0.9.5 released. Fixes a compilation problem caused by the use of deprecated FFmpeg functions.

10.20.2011 Cumulix 1.0: Cumulix is an extremely fast and scalable cloud-based image search and retrieval
system based on pHash Pro and Neo4j.

01.31.2011 pHash 0.9.4 released. Added the radial image hash to the Java bindings, fixed compilation on Mac OS X
with the complex header type, and fixed the examples' linking to pthread.

12.24.2010 MVPTree v1.0: new download available. The MVPTree is a generic distance-based indexing
structure for storing n-dimensional data points. Both the distance function and the type of data are configurable.

https://www.phash.org 1/3

10.29.2010 AudioScout v1.0 audio content indexing software released! A scalable audio content indexing
solution for managing a collection of audio files, AudioScout is a set of distributed servers that index audio
signals based on low-level features in the signal, not simply on the filename or its metadata. This
makes it ideal for uses such as duplicate detection or copyright protection. It works on both music and
speech content. Read the preliminary paper or contact us for further details.

08.15.2010 pHash 0.9.3 released. Fixed a bug with the auxiliary header file causing mp3 support to break.

08.15.2010 pHash 0.9.2 released. Fixed a bug in the audio perceptual hash when converting from stereo to
mono for WAV/Ogg/FLAC audio files.

06.15.2010 pHash 0.9.1 released. Removed the dependency on FFmpeg for audio functions (now using the
libmpg123, libsndfile and libsamplerate libraries), cleaned up the Java bindings, fixed a bug in determining the number
of CPUs on Mac OS X, fixed a bug in the multi-threaded image, audio and video functions, and added preliminary bindings for
PHP and C#.

03.28.2010 pHash 0.9.0 released. Multithreading support added for hash functions, the audio hash can now read
Ogg and FLAC files, and the image hash can handle RGBA files. Fixed a heap corruption bug in the MVP storage
functions.

01.28.2010 pHash 0.8.1 released. Minor bug fixes for MH image hash and compilation with older gcc
releases.

01.25.2010 pHash 0.8.0 released. A new perceptual hash has been added based on the Marr/Mexican hat
wavelet, the JNI has been greatly improved, and several bugs have been fixed.

12.23.2009 pHash 0.7.2 released. Fixed a bug when building on systems where mremap is not present.

12.20.2009 pHash 0.7.1 released. Updated the Java bindings to use the new DCT video hash, removed the need for
FFTW, included a spec file for creating RPMs, and performed general code cleanup.

12.12.2009 pHash 0.7 released. Fixed a bug in the perceptual text hash to make the hash truly cyclic (credit to
Xiaofan Lin for discovering the bug). The library now works with the latest CImg versions, as well as on Windows and BSD
systems.

10.07.2009 pHash 0.6 released. The new release contains a variable-length DCT video hash which
supersedes the previous video hash.

09.12.2009 The end-stopped wavelets are taking longer than anticipated, so in the meantime we've been
devoting more time to improving the video hash to handle longer videos. Look for it in the 0.6 release.

07.22.2009 The next version of pHash is in the works and will include a new image hash based on Gabor and
end-stopped wavelets, leading to better feature extraction. We will also be improving the video hash to
account for longer videos. Stay tuned!

07.02.2009 pHash 0.5 released.

06.29.2009 Custom index technique added for quick storage, search and retrieval of all hash values within a
given distance of a query. This technique uses a specially developed file format for persistent storage and can
be used for virtually any size hash and distance metric. Preliminary testing reveals a 300% improvement in
search time over a simple linear search. For image or audio hashes, additional storage amounts to less than
0.05% of the space used by the actual files. To be included in the 0.5 release!

06.22.2009 Support for textual hashing is now in the library. Although support is limited to plain UTF-8
encoded text documents for now, the functions allow for a quick scan of documents to find string matches
and their offsets. Expect this in the next release.

06.05.2009 Changed the build system to use the GNU autoconf tools. This should make it easier to build
and install the pHash library and program files.


04.15.2009 Java bindings for all pHash library functions.

02.03.2009 pHash now supports hashing for audio files. Derived from frequency spectrum data along the
Bark scale, this hash is based on characteristics that tend to be the most prominent for the human auditory
system. Furthermore, the number of hashes generated per file varies according to the number of samples in the
audio file, so short clips can be matched to longer sound files. Naturally, the longer the clip, the more
successful the match will be. So far, this has proven to work well with 30-second music clips altered by
MP3 compression, telephone-simulated filtering, or both.

11.04.2008 The DCT hash method has been adapted to video. This is useful for short video clips only,
since the entire video is condensed to a fixed-length hash.

10.24.2008 Support for an image hash based on the discrete cosine transform (DCT). The DCT is a quick and
efficient way to compute a hash from frequency data of the underlying image. While it is generally not
sophisticated enough to identify visually similar images in any semantically meaningful way, it is fairly
robust against minor distortions of the image, such as blurring, rotation and different compression formats.
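
As a rough illustration of how a DCT-based image hash can work, the sketch below computes a 64-bit hash from the low-frequency DCT coefficients of a small grayscale image and thresholds them at their median. This is a common textbook construction written in plain Python, not pHash's actual implementation. The function names, the 32x32 input size, the 8x8 coefficient block and the synthetic test image are all assumptions made for the example, and image loading, filtering and resizing are omitted.

```python
import math
import statistics


def dct_1d(values):
    """Unnormalized DCT-II of a sequence (uniform scaling does not affect the hash)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(pixels):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in pixels]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def dct_hash(pixels):
    """64-bit hash from low-frequency DCT coefficients of a grayscale image."""
    coeffs = dct_2d(pixels)
    # Keep an 8x8 block of low frequencies, skipping the first row and column
    # (which includes the DC term that only encodes overall brightness).
    low = [coeffs[r][c] for r in range(1, 9) for c in range(1, 9)]
    median = statistics.median(low)
    bits = 0
    for value in low:
        bits = (bits << 1) | int(value > median)
    return bits


def hamming_distance(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Synthetic 32x32 "image" and a uniformly brightened copy (no clipping applied).
image = [[(3 * x * x + 7 * y + x * y) % 256 for x in range(32)] for y in range(32)]
brighter = [[p + 10 for p in row] for row in image]

print(f"{dct_hash(image):016x}")
print(hamming_distance(dct_hash(image), dct_hash(brighter)))
```

Because a uniform brightness change only affects the DC term, which the sketch skips, the two hashes should differ in very few bits, if any. Distortions such as blurring mostly alter high frequencies, which the hash also ignores.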

That's great but what is it good for?


Potential applications include copyright protection, similarity search for media files, or even digital
forensics. For example, YouTube could maintain a database of hashes submitted by the major
movie producers for the movies to which they hold the copyright. If a user then uploads one of those videos to
YouTube, its hash will be almost identical to a stored one, and the upload can be flagged as a possible copyright violation. The audio
hash could be used to automatically tag MP3 files with proper ID3 information, while the text hash could be
used for plagiarism detection.
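
The database lookup described above can be sketched as a simple linear scan that reports every stored hash within a given Hamming distance of the query. This is a simplified illustration, not pHash code: it treats each hash as a single 64-bit integer, and the titles, hash values, threshold of 6 bits and the `find_matches` helper are all hypothetical.

```python
def hamming_distance(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def find_matches(query, database, max_distance):
    """Return (title, distance) pairs within max_distance, closest first."""
    scored = (
        (title, hamming_distance(query, stored))
        for title, stored in database.items()
    )
    return sorted(
        (pair for pair in scored if pair[1] <= max_distance),
        key=lambda pair: pair[1],
    )


# Hypothetical reference hashes submitted by rights holders.
reference_hashes = {
    "movie_a": 0xA5A5A5A5A5A5A5A5,
    "movie_b": 0x0123456789ABCDEF,
    "movie_c": 0xFFFF0000FFFF0000,
}

# Hash of an uploaded file; it differs from movie_b in two low-order bits.
upload_hash = 0x0123456789ABCDEC

flagged = find_matches(upload_hash, reference_hashes, max_distance=6)
print(flagged)  # [('movie_b', 2)]
```

A linear scan is fine for small collections. For large ones, a distance-based index avoids comparing the query against every stored hash.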

Have another use for pHash? Let us know!

Copyright © 2008-2010 Evan Klinger & David Starkweather

