This web page is a repository of links to useful resources such as bibliographies, data sets, and reference source code. The expectation is that this archive will help newcomers jumpstart their work and encourage reproducible research. Should you know about an IFS-relevant resource that is not listed below, do not hesitate to drop us a message.
- Digital Forensic Database (Dartmouth College) - The DFD maintains a comprehensive bibliography of technical papers in the field of digital image, audio, and video forensics.
Public Data Sets
- FERET Database: The FERET image corpus was assembled to support government-monitored testing and evaluation of face recognition algorithms using standardized tests and procedures. The final corpus, presented here, consists of 14051 eight-bit grayscale images of human heads with views ranging from frontal to left and right profiles.
- CASIA Face Image Database Version 5.0: This database contains 2,500 color facial images of 500 subjects captured using a Logitech USB camera in one session. All face images are 16-bit color BMP files and the image resolution is 640 x 480. Typical intra-class variations include illumination, pose, expression, eye-glasses, imaging distance, etc.
- Quantum Signal Biometrics Database (QSBD): This database contains an extensive, high-quality set of still images, video recordings, and audio recordings of more than 300 human subjects. The collection experiment had the intent of providing a high-quality corpus from which to develop and test speech, face, or multimodal biometrics algorithms and/or software.
- Hong Kong Polytechnic University (PolyU) NIR Face Database: This database was built to advance research and to provide researchers working in the area of face recognition with an opportunity to compare the effectiveness of face recognition algorithms. It consists of images obtained by a NIR face capture device and is freely available for academic, noncommercial uses.
- FEI Face Database: This is a Brazilian face database that contains 14 images for each of 200 individuals, a total of 2800 images. All images are in colour and taken against a white homogeneous background. The subjects are mainly between 19 and 40 years old, with distinct appearance, hairstyle, and adornments.
- PUT Face Database: This database consists of 9971 images of 100 people. The focus is on development of validation algorithms. Images were taken in partially controlled illumination conditions over a uniform background. The main source of face appearance variation was changes in head pose.
- UMASS Labeled Faces in the Wild: This is a database of face photographs designed for studying the problem of unconstrained face recognition. The data set contains more than 13,000 labeled images of faces collected from the web. The only constraint on these faces is that they were detected by the Viola-Jones face detector.
- Sheffield Face Database: This database consists of 564 images of 20 individuals of mixed race and both genders. Each individual is shown in a range of poses from profile to frontal views. The files are all in PGM format, approximately 220 x 220 pixels with 256-level (8-bit) grey-scale. Some restrictions apply regarding publication of individual images.
- Stirling Face Datasets: This page consists of a number of different face databases hosted by Stirling University. Examples of database characteristics include smiling faces, images of expressions of pain, etc.
- BioID face database: This database has been recorded and is published to enable evaluation and comparison of face detection algorithms. Special emphasis has been placed on "real world" conditions. Therefore the test set features a large variety of illumination, background, and face size.
- Max Planck Institute Face Video Database: This database contains videos of facial action units which were recorded at the MPI for Biological Cybernetics. The Videolab technology allows recording of facial movements from six different viewpoints at the same time while maintaining a very precise synchronization between the different cameras.
- Essex University Facial Images: This database consists of 7900 images of 395 persons of both genders, different ages and different races. They are in 24-bit color JPEG format and taken under artificial lighting. Please note the restrictions that apply regarding the display of these images.
- MIT-CBCL face recognition database: This database contains face images of 10 subjects. They provide two training sets: (1) High resolution pictures, including frontal, half-profile and profile view. (2) Synthetic images (324/subject) rendered from 3D head models of the 10 subjects. The 3D models are not included in the database. The test set consists of 200 images per subject.
- Cohn-Kanade AU-Coded Facial Expression Database: This consists of two databases for research in automatic facial image analysis and synthesis and for perceptual studies. There are images of several posed and non-posed expressions. There are also sequences of images ranging from a neutral to a peak expression.
- The ORL Database of Faces - Cambridge University Computer Laboratory: There are ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). The files are in PGM format.
- NIST Standard Reference Data: This page links to 7 databases that are available for purchase through NIST. These include (1) Fingerprint Scale Images (2) Livescan Fingerprint Data (3) Dual Resolution from Paired Fingerprint Data (4) Latent and Matching Tenprint Data (5) Mated Fingerprint Card Pairs (6) Plain and Rolled Images (7) Supplemental Fingerprint Card Data.
- CASIA Fingerprint Image Database Version 5.0: This database contains 20,000 fingerprint images of 500 subjects. The fingerprint images were captured using a URU4000 fingerprint sensor in one session. Each volunteer contributed 40 fingerprint images of eight fingers (left and right thumb/second/third/fourth finger), i.e. 5 images per finger. The volunteers were asked to rotate their fingers with various levels of pressure to generate significant intra-class variations. All fingerprint images are 8-bit gray-level BMP files and the image resolution is 328 x 356.
- NIST Special Biometric Databases - This is a listing of test data produced by NIST's Image Group for use in evaluating automated OCR, fingerprint classification/matching, and face recognition systems.
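Several of the face databases listed above (for example the Sheffield and ORL sets) distribute their images as PGM files. As a hedged illustration of how little is needed to start working with such data, the following minimal sketch parses a binary (P5) PGM image using only the Python standard library. The function name `read_pgm` is hypothetical, the sketch assumes 8-bit images (maximum grey value of 255 or less), and it is not taken from any of the listed databases' own tooling.

```python
import re


def read_pgm(data: bytes):
    """Parse a binary (P5) PGM image held in memory.

    Returns (width, height, maxval, rows), where rows is a list of
    lists of integer grey values. Assumption: maxval <= 255, so each
    pixel occupies exactly one byte.
    """
    # Header fields (magic number, width, height, maxval) are separated
    # by whitespace; '#' starts a comment that runs to the end of the line.
    sep = rb"(?:\s+|#[^\n]*\n)+"
    header = re.match(
        rb"P5" + sep + rb"(\d+)" + sep + rb"(\d+)" + sep + rb"(\d+)\s",
        data,
    )
    if header is None:
        raise ValueError("not a binary (P5) PGM image")

    width, height, maxval = (int(g) for g in header.groups())
    if not 0 < maxval <= 255:
        raise ValueError("only 8-bit PGM images are handled in this sketch")

    # Pixel bytes follow the single whitespace character after maxval.
    pixels = data[header.end():header.end() + width * height]
    if len(pixels) != width * height:
        raise ValueError("truncated pixel data")

    rows = [list(pixels[r * width:(r + 1) * width]) for r in range(height)]
    return width, height, maxval, rows


if __name__ == "__main__":
    # Tiny synthetic 3 x 2 image, used instead of a real database file.
    sample = b"P5\n# synthetic example\n3 2\n255\n" + bytes([0, 1, 2, 3, 4, 5])
    print(read_pgm(sample))
```

For a file on disk, the same function can be applied to the bytes returned by `open(path, "rb").read()`; for serious work an established imaging library is likely a better choice than this sketch.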
Physical Object Security and Anti-counterfeiting
- Forensic Authentication Microstructure Optical Set aka. FAMOS 1 (University of Geneva) - The FAMOS1 is a dataset with 5000 unique microstructures from consumer packages for the development, testing and benchmarking of forensic identification and authentication technologies. All samples have been acquired 3 times with two different cameras giving 30,000 images in total.
- Netflix Prize - Participants were provided with data sets containing users' previous ratings of films, and were required to predict users' future ratings for those films. In this competition, two rating data sets were provided.
- CANT Competition (University of Rhode Island and Peking University) - Participants were provided with normal rating data, and were required to upload unfair ratings aiming to mislead the final reputation score. The attack dataset collected during the competition is available upon request.
- Epinions Data - The first dataset contains a 5-week crawl of the Epinions.com Web site. The second dataset contains additional distrust lists which are not available to the general public.
- Mobile App Installation Data (MIT Media Lab) - Collected from March to July 2010, this dataset recorded the installations of 821 apps from 55 participants as well as the social interactions (e.g. phone calls, Bluetooth, etc.) among participants.
- Kaggle - A platform for predictive modeling and analytics competitions. Companies and researchers post their data. Statisticians and data miners from all over the world compete to produce the best models. Some examples of the competitions are: Facebook Recruiting Competition, CPROD1: Consumer PRODucts contest #1, Detecting Insults in Social Commentary, Job Recommendation Engine Challenge.
- Toward steganalysis into the wild a.k.a ALASKA - Collection of 50,000 still images used during the contest (raw files, development scripts and resulting ppm/pgm files are available).
- Break Our Steganographic System aka. BOSS - Collection of still images used during the contest.
- LIRMMBase database (Color, 512x512, 256x256) - Collection of still images in color and in grey levels.
- Datasets for CANTATA project - In the context of the European CANTATA project, partners involved in multi content analysis validation methods combined their efforts to create a webpage to share knowledge about datasets (sets & metadata & ground truth & metrics...) for three different domains: surveillance, consumer electronics and medical.
- Break Our Watermarking System, 2nd edition, aka. BOWS 2 (Ecole Centrale Lille) - Large collection of still images.
Reference Source Code
- Binghamton download section (Binghamton University) - Collection of reference source code for various steganographic and steganalysis tools.
- Break Our Steganographic System aka. BOSS - Source code of HUGO, the reference algorithm used during the BOSS contest.
- Break Our Watermarking System, 2nd edition, aka. BOWS 2 (Ecole Centrale Lille) - Source code of Broken Arrows, the reference algorithm selected during the BOWS'2 contest.
- Digital Watermarking Source (Universität Salzburg) - C source code for a number of watermarking algorithms.
- Tardos decoding (INRIA) - Source code for joint decoding of Tardos fingerprinting codes.