How can a machine learn to recognize visual attributes emerging from online communities without a definitive supervised dataset? This paper proposes an automatic approach to discovering and analyzing visual attributes from a noisy collection of image-text data on the Web. Our approach is based on the relationship between attributes and neural activations in a deep network. We characterize the visual property of an attribute word as a divergence within a weakly annotated set of images. We show that neural activations are useful for discovering attributes and for learning a classifier that agrees well with human perception from noisy real-world Web data. Our empirical study suggests that the layered structure of deep neural networks also gives insight into the perceptual depth of a given word. Finally, we demonstrate that highly activating neurons can be used to find semantically relevant regions.
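As a rough illustration of the divergence idea, the sketch below scores each neuron by how differently it activates on images tagged with a word versus images without that tag. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `neuron_divergence`, the histogram binning, the choice of a symmetric KL divergence, and the synthetic activations are all assumptions made for illustration.

```python
import numpy as np

def neuron_divergence(pos, neg, bins=20, eps=1e-6):
    """Per-neuron symmetric KL divergence between two activation sets.

    pos, neg: arrays of shape (num_images, num_neurons) holding activations
    for images with and without the attribute word (hypothetical inputs).
    """
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    scores = np.empty(pos.shape[1])
    for j in range(pos.shape[1]):
        lo = min(pos[:, j].min(), neg[:, j].min())
        hi = max(pos[:, j].max(), neg[:, j].max())
        if hi == lo:
            # Constant activation carries no information about the word.
            scores[j] = 0.0
            continue
        # Histogram both sets on a shared range so the bins are comparable.
        p, _ = np.histogram(pos[:, j], bins=bins, range=(lo, hi))
        q, _ = np.histogram(neg[:, j], bins=bins, range=(lo, hi))
        # Smooth and normalize to avoid log(0) on empty bins.
        p = (p + eps) / (p + eps).sum()
        q = (q + eps) / (q + eps).sum()
        scores[j] = np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))
    return scores

# Synthetic stand-in for network activations (assumption): 8 neurons,
# of which only neuron 3 responds differently to the tagged images.
rng = np.random.default_rng(0)
neg = rng.normal(0.0, 1.0, size=(500, 8))
pos = rng.normal(0.0, 1.0, size=(500, 8))
pos[:, 3] += 3.0

scores = neuron_divergence(pos, neg)
top_neuron = int(np.argmax(scores))
```

Neurons with high scores are candidates for carrying the visual signal of the word; a word whose scores are uniformly low would, under this reading, be a weak candidate for a visual attribute.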


Sirion Vittayakorn, Takayuki Umeda, Kazuhiko Murasaki, Kyoko Sudo, Takayuki Okatani, Kota Yamaguchi. Automatic Attribute Discovery with Neural Activations.
European Conference on Computer Vision (ECCV) 2016. Amsterdam, The Netherlands.


  title     = {Automatic Attribute Discovery with Neural Activations},
  author    = {Sirion Vittayakorn and Takayuki Umeda and Kazuhiko Murasaki and Kyoko Sudo and Takayuki Okatani and Kota Yamaguchi},
  year      = {2016},
  booktitle = {European Conference on Computer Vision (ECCV)}


Etsy dataset: Product metadata from Etsy, such as title, description, tags, materials, or image URLs of 2.8 million product listings sold in Sep 2014.
Wear dataset: Post metadata from Wear.jp, such as description, tags, item list, or image URLs of 212K posts collected in Oct 2015.