Hadi Kiapour



Research Scientist, Ph.D.

Computer Vision

eBay, San Francisco, CA

CV | Email | Google Scholar | LinkedIn


About Me

I am a research scientist at eBay, where I work on computer vision and deep learning. At eBay, I have worked on Visual Search for eBay ShopBot and Shop The Look.

I received my Ph.D. in Computer Science from the University of North Carolina at Chapel Hill, where I was advised by Tamara Berg and worked closely with Alex Berg. I am interested in all aspects of computer vision and related problems. My thesis was on large-scale visual recognition of clothing, people, and fashion styles. I also work on image retrieval, fine-grained classification, deep adversarial networks, and multimodal learning.


News



Publications


Every Brand is a Story


Every Brand is a Story: Going beyond Logos in Fashion Brands by Visual Understanding

M. Hadi Kiapour, Robinson Piramuthu
Coming on arXiv soon, 2018

abstract | bibtex

Brands express themselves in various visual forms. Although the logo is a popular mode of expression, brands also show their uniqueness through other visual cues such as color, patterns, and shapes. In this work, we analyze the responses of neurons in deep networks to gain insights into the different visual cues of fashion brands. Neurons extract information at various levels of abstraction. In particular, we study the activation strength and extent of neurons to provide interesting insights about visual brand expressions. The proposed method identifies where a brand stands in the spectrum of branding strategy, i.e., from trademark-emblazoned goods with bold logos to implicit no-logo marketing. By quantifying attention maps, we are able to interpret the visual characteristics of a brand present in a single image and model the general design direction of the brand as a whole. We further investigate the versatility of neurons and discover "specialists" that are highly brand-specific and "generalists" that detect diverse visual features. A human experiment based on three main visual scenarios of fashion brands is conducted to verify the alignment of our quantitative measures with the human perception of brands. This paper encourages readers to think beyond logos in order to recognize a brand from an image.
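
The activation statistics above can be made concrete with a short sketch: given a convolutional feature map, "strength" is how strongly a neuron fires and "extent" is how much of the image it fires over. The backbone, layer choice, and threshold below are illustrative assumptions, not the paper's exact protocol.

import torch
import torchvision.models as models

# Minimal sketch of per-neuron "strength" and "extent" statistics.
# In practice the backbone would carry pretrained or brand-tuned weights;
# weights=None keeps the sketch self-contained.
model = models.resnet50(weights=None).eval()

features = {}
def hook(module, inp, out):
    features["conv"] = out.detach()          # (1, C, H, W) activation map

model.layer4.register_forward_hook(hook)

image = torch.rand(1, 3, 224, 224)           # stand-in for a product image
with torch.no_grad():
    model(image)

fmap = features["conv"][0]                   # (C, H, W)
# Strength: mean activation of each neuron (channel) over the image.
strength = fmap.mean(dim=(1, 2))
# Extent: fraction of spatial locations where the neuron fires above a
# threshold (here, half of its own maximum -- an assumed threshold).
thresh = 0.5 * fmap.amax(dim=(1, 2), keepdim=True)
extent = (fmap > thresh).float().mean(dim=(1, 2))

# A "specialist" would show high strength over a small extent on one
# brand's images; a "generalist" fires broadly across many brands.
print(strength.shape, extent.shape)          # torch.Size([2048]) each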

@article{kiapourarXiv18every,
    Author = {M. Hadi Kiapour and Robinson Piramuthu},
    Title = {Every Brand is a Story: Going beyond Logos in Fashion Brands by Visual Understanding},
    Journal = {arXiv},
    Year = {2018}
}

Image-Text Embedding


Conditional Image-Text Embedding Networks

Bryan A. Plummer, Paige Kordas, M. Hadi Kiapour, Shuai Zheng, Robinson Piramuthu, Svetlana Lazebnik
arXiv, 2018

pdf | code | abstract | bibtex

This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model. In order to differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments. Our proposed solution simplifies the representation requirements for individual embeddings and allows the underrepresented concepts to take advantage of the shared representations before feeding them into concept-specific layers. Comprehensive experiments verify the effectiveness of our approach across three phrase grounding datasets, Flickr30K Entities, ReferIt Game, and Visual Genome, where we obtain improvements of 3.5%, 2%, and 3.5%, respectively, in grounding performance over a strong region-phrase embedding baseline.
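
The concept weight branch is the piece that replaces predefined phrase-to-embedding assignments. A minimal PyTorch sketch of the idea follows; the feature dimensions, number of subspaces, and layer sizes are illustrative assumptions rather than the paper's configuration.

import torch
import torch.nn as nn

class ConditionalEmbedding(nn.Module):
    """Sketch: K learned subspaces plus a concept weight branch that
    softly assigns each phrase to subspaces (all sizes assumed)."""
    def __init__(self, img_dim=2048, txt_dim=300, emb_dim=256, k=4):
        super().__init__()
        self.img_proj = nn.ModuleList(
            [nn.Linear(img_dim, emb_dim) for _ in range(k)])
        self.txt_proj = nn.ModuleList(
            [nn.Linear(txt_dim, emb_dim) for _ in range(k)])
        # Concept branch: a distribution over the K subspaces predicted
        # from the phrase feature, instead of a hand-made assignment.
        self.concept = nn.Sequential(
            nn.Linear(txt_dim, k), nn.Softmax(dim=-1))

    def forward(self, region_feat, phrase_feat):
        w = self.concept(phrase_feat)                       # (B, K)
        sims = []
        for i in range(len(self.img_proj)):
            r = nn.functional.normalize(self.img_proj[i](region_feat), dim=-1)
            p = nn.functional.normalize(self.txt_proj[i](phrase_feat), dim=-1)
            sims.append((r * p).sum(-1))                    # cosine, (B,)
        sims = torch.stack(sims, dim=-1)                    # (B, K)
        return (w * sims).sum(-1)                           # weighted score

model = ConditionalEmbedding()
score = model(torch.rand(8, 2048), torch.rand(8, 300))
print(score.shape)  # torch.Size([8])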

@article{plummerarXiv18conditional,
    Author = {Bryan A. Plummer and Paige Kordas and M. Hadi Kiapour and Shuai Zheng and Robinson Piramuthu and Svetlana Lazebnik},
    Title = {Conditional Image-Text Embedding Networks},
    Journal = {arXiv},
    Year = {2018}
}

Twenty Questions Game


Twenty Questions Game: Finding Images using Human-in-the-loop Feedback

Bryan A. Plummer, M. Hadi Kiapour, Shuai Zheng, Robinson Piramuthu
Coming on arXiv soon, 2018

abstract | bibtex

In this paper, we introduce an attribute-based interactive image search with human-in-the-loop feedback that iteratively refines the search results. We study active image search where human feedback is solicited exclusively in visual form, without the relative attribute annotations used in prior work, which are expensive to collect for large datasets. To optimize the image selection strategy, a deep reinforcement model is trained to take advantage of the interplay between attributes. Additionally, we extend the recently introduced Conditional Similarity Network to incorporate global similarity in training visual embeddings, which results in a more natural transition as the user explores the learned similarity embeddings. Our experiments demonstrate the effectiveness of our approach by producing compelling results on both active image search and image attribute representation tasks.
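
The interaction loop can be sketched in a few lines. Since the abstract does not spell out the policy, a greedy image-selection rule stands in for the learned reinforcement model, and a simulated user supplies the visual feedback; the embeddings and all names below are assumptions.

import numpy as np

# Minimal human-in-the-loop image search loop (toy setup).
rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 64))          # learned visual embeddings
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
target_id = 42                                 # image the user has in mind

query = rng.normal(size=64)
query /= np.linalg.norm(query)
shown = set()

for turn in range(20):                         # the "twenty questions"
    scores = gallery @ query
    scores[list(shown)] = -np.inf              # never repeat an image
    pick = int(np.argmax(scores))              # greedy stand-in policy
    shown.add(pick)
    if pick == target_id:
        print(f"found the target at turn {turn}")
        break
    # Feedback is purely visual: the simulated user says whether the
    # shown image is closer to the target than the current query is,
    # and the query moves toward or away from it accordingly.
    closer = gallery[pick] @ gallery[target_id] > query @ gallery[target_id]
    step = 0.3 if closer else -0.3
    query = query + step * (gallery[pick] - query)
    query /= np.linalg.norm(query)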

@article{plummerarXiv18twenty,
    Author = {Bryan A. Plummer and M. Hadi Kiapour and Shuai Zheng and Robinson Piramuthu},
    Title = {Twenty Questions Game: Finding Images using Human-in-the-loop Feedback},
    Journal = {arXiv},
    Year = {2018}
}

Large Scale Visual Search


Visual Search at eBay

Fan Yang, Ajinkya Kale, Yury Bubnov, Leon Stein, Qiaosong Wang, M. Hadi Kiapour, Robinson Piramuthu
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2017

pdf | video | abstract | bibtex

In this paper, we propose a novel end-to-end approach for scalable visual search infrastructure. We discuss the challenges we faced with a massive, volatile inventory like eBay's and present our solution to overcome them. We harness the availability of a large image collection of eBay listings and state-of-the-art deep learning techniques to perform visual search at scale. A supervised approach, used both to restrict the search to the top predicted categories and to produce compact binary signatures, is key to scaling up without compromising accuracy and precision. Both use a common deep neural network requiring only a single forward inference. The system architecture is presented with in-depth discussions of its basic components and optimizations for a trade-off between search relevance and latency. This solution is currently deployed in a distributed cloud infrastructure and fuels visual search in eBay ShopBot and Close5. We show benchmarks on the ImageNet dataset on which our approach is faster and more accurate than several unsupervised baselines. We share our learnings in the hope that visual search becomes a first-class citizen for all large-scale search engines rather than an afterthought.
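
The scaling recipe above, one forward pass yielding both a category prediction and a compact binary signature, with search restricted to the top categories, can be sketched as follows. The shapes, the random stand-in for the network, and the top-3 cutoff are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_items, n_cats, dim = 100_000, 50, 256
item_codes = rng.integers(0, 2, size=(n_items, dim), dtype=np.uint8)
item_cats = rng.integers(0, n_cats, size=n_items)

def embed_and_classify(image):
    """Stand-in for the shared deep network's single forward pass."""
    emb = rng.normal(size=dim)
    cat_scores = rng.normal(size=n_cats)
    return (emb > 0).astype(np.uint8), cat_scores   # binarized signature

query_code, cat_scores = embed_and_classify(None)
top_cats = np.argsort(cat_scores)[-3:]              # restrict the search
candidates = np.flatnonzero(np.isin(item_cats, top_cats))

# Hamming distance on binary signatures is a cheap compare-and-count.
dists = (item_codes[candidates] != query_code).sum(axis=1)
best = candidates[np.argsort(dists)[:10]]
print(best)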

@inproceedings{yangKDD17visual,
    Author = {Fan Yang and Ajinkya Kale and Yury Bubnov and Leon Stein and Qiaosong Wang and M. Hadi Kiapour and Robinson Piramuthu},
    Title = {Visual Search at eBay},
    Booktitle = {ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Applied Data Science Track},
    Year = {2017}
}

Where to Buy It


Where to Buy It: Matching Street Clothing Photos to Online Shops

M. Hadi Kiapour, Xufeng Han, Svetlana Lazebnik, Alexander C. Berg, Tamara L. Berg
International Conference on Computer Vision (ICCV), 2015 (Oral presentation)

pdf | poster | oral presentation | spotlight video | project page | dataset | abstract | bibtex

In this paper, we define a new task, Exact Street to Shop, where our goal is to match a real-world example of a garment item to the same item in an online shop. This is an extremely challenging task due to visual differences between street photos (pictures of people wearing clothing in everyday uncontrolled settings) and online shop photos (pictures of clothing items on people, mannequins, or in isolation, captured by professionals in more controlled settings). We collect a new dataset for this application containing 404,683 shop photos collected from 25 different online retailers and 20,357 street photos, providing a total of 39,479 clothing item matches between street and shop photos. We develop three different methods for Exact Street to Shop retrieval, including two deep learning baseline methods, and a method to learn a similarity measure between the street and shop domains. Experiments demonstrate that our learned similarity significantly outperforms our baselines that use existing deep learning based representations.
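
The learned cross-domain similarity can be sketched as a small network scoring a street/shop feature pair; the feature dimension, layer sizes, and training setup below are illustrative assumptions, not the paper's exact model.

import torch
import torch.nn as nn

class StreetToShopSimilarity(nn.Module):
    """Scores whether a street photo and a shop photo show the same item,
    on top of precomputed deep features (sizes assumed)."""
    def __init__(self, feat_dim=4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 1))                  # logit: same item or not

    def forward(self, street_feat, shop_feat):
        return self.net(torch.cat([street_feat, shop_feat], dim=-1))

model = StreetToShopSimilarity()
loss_fn = nn.BCEWithLogitsLoss()
street = torch.rand(32, 4096)                   # street-photo CNN features
shop = torch.rand(32, 4096)                     # shop-photo CNN features
labels = torch.randint(0, 2, (32, 1)).float()   # exact-match pair or not
loss = loss_fn(model(street, shop), labels)
loss.backward()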

@inproceedings{kiapourICCV15where,
    Author = {M. Hadi Kiapour and Xufeng Han and Svetlana Lazebnik and Alexander C. Berg and Tamara L. Berg},
    Title = {Where to Buy It: Matching Street Clothing Photos to Online Shops},
    Booktitle = {International Conference on Computer Vision (ICCV)},
    Year = {2015}
}

Mid-Level Representations for Fine-Grained Classification


Mine the Fine: Fine-Grained Fragment Discovery

M. Hadi Kiapour, Wei Di, Vignesh Jagadeesh, Robinson Piramuthu
International Conference on Image Processing (ICIP), 2015

pdf | poster | abstract | bibtex

While discriminative visual element mining has been introduced before, in this paper we present an approach that requires minimal annotation at both training and test time. Given only a bounding box that contains the foreground object, it automatically transforms the input images into a roughly-aligned pose space and automatically discovers the most discriminative visual fragments of each category. These fragments are then used to learn robust classifiers that discriminate between very similar categories under challenging conditions such as large variations in pose or habitat. The minimal required input is a critical characteristic that enables our approach to generalize easily to other visual domains. Moreover, our approach takes advantage of deep networks that are targeted towards fine-grained classification. It learns mid-level representations that are specific to a category and, at the same time, generalize well across the category's instances. Our evaluations show that the automatically learned representation based on discriminative fragments significantly outperforms globally extracted deep features in classification accuracy.
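
A toy version of the mining step: score candidate fragments by how much more they resemble patches from their own category than patches from other categories, and keep the top scorers. The features and the scoring rule are simplified assumptions, not the paper's exact procedure.

import numpy as np

rng = np.random.default_rng(0)
patches = rng.normal(size=(500, 128))        # candidate fragment features
pos = rng.normal(size=(200, 128))            # patches from the category
neg = rng.normal(size=(800, 128))            # patches from other categories
for arr in (patches, pos, neg):              # cosine similarity via dot
    arr /= np.linalg.norm(arr, axis=1, keepdims=True)

# Discriminativeness: fires often within the category, rarely outside it.
score = (patches @ pos.T).mean(axis=1) - (patches @ neg.T).mean(axis=1)
fragments = np.argsort(score)[-20:]          # most discriminative fragments
print(fragments)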

@inproceedings{kiapourICIP15mine,
    Author = {M. Hadi Kiapour and Wei Di and Vignesh Jagadeesh and Robinson Piramuthu},
    Title = {Mine the Fine: Fine-Grained Fragment Discovery},
    Booktitle = {International Conference on Image Processing (ICIP)},
    Year = {2015}
}

Hipster Wars


Hipster Wars: Discovering Elements of Fashion Styles

M. Hadi Kiapour, Kota Yamaguchi, Alexander C. Berg, Tamara L. Berg
European Conference on Computer Vision (ECCV), 2014

pdf | poster | project page | dataset | HipsterWars game | abstract | bibtex

The clothing we wear and our identities are closely tied, revealing to the world clues about our wealth, occupation, and socio-identity. In this paper we examine questions related to what our clothing reveals about our personal style. We first design an online competitive Style Rating Game called Hipster Wars to crowdsource reliable human judgments of style. We use this game to collect a new dataset of clothing outfits with associated style ratings for 5 style categories: hipster, bohemian, pinup, preppy, and goth. Next, we train models for between-class and within-class classification of styles. Finally, we explore methods to identify clothing elements that are generally discriminative for a style, and methods for identifying items in a particular outfit that may indicate a style.
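
The rating game reduces style scoring to pairwise votes. The abstract does not specify the rating model, so the sketch below uses a simple Elo-style update as an illustrative stand-in for turning "who is more hipster?" outcomes into per-image scores.

# Pairwise game outcomes -> per-image style ratings (Elo-style stand-in).
ratings = {}  # image id -> style rating

def update(winner, loser, k=32.0):
    ra = ratings.setdefault(winner, 1500.0)
    rb = ratings.setdefault(loser, 1500.0)
    expected = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))  # P(winner wins)
    ratings[winner] = ra + k * (1.0 - expected)
    ratings[loser] = rb - k * (1.0 - expected)

# Each game shows two outfits and records the crowd's pick.
for winner, loser in [("img_3", "img_7"), ("img_3", "img_1"),
                      ("img_1", "img_7")]:
    update(winner, loser)
print(sorted(ratings.items(), key=lambda kv: -kv[1]))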

@inproceedings{kiapourECCV14hipster,
    Author = {M. Hadi Kiapour and Kota Yamaguchi and Alexander C. Berg and Tamara L. Berg},
    Title = {Hipster Wars: Discovering Elements of Fashion Styles},
    Booktitle = {European Conference on Computer Vision (ECCV)},
    Year = {2014}
}

Materials Discovery


Materials Discovery: Fine-Grained Classification of X-ray Scattering Images

M. Hadi Kiapour, Kevin Yager, Alexander C. Berg, Tamara L. Berg
IEEE Winter Conference on Applications of Computer Vision (WACV), 2014

pdf | poster | slides | abstract | bibtex

We explore the use of computer vision methods for organizing, searching, and classifying x-ray scattering images. X-ray scattering is a technique that shines an intense beam of x-rays through a sample of interest. By recording the intensity of x-ray deflection as a function of angle, scientists can measure the structure of materials at the molecular and nano-scale. Current and planned synchrotron instruments are producing x-ray scattering data at an unprecedented rate, making the design of automatic analysis techniques crucial for future research. In this paper, we devise an attribute-based approach to recognition in x-ray scattering images and demonstrate applications to image annotation and retrieval.
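
Attribute-based recognition doubles as both annotation and retrieval: classifier scores over a fixed attribute vocabulary become the image representation. The attribute names and scores below are illustrative assumptions.

import numpy as np

attributes = ["ring pattern", "halo", "diffuse scatter", "vertical streak"]
rng = np.random.default_rng(0)
db_scores = rng.uniform(size=(10_000, len(attributes)))  # classifier outputs

def annotate(scores, thresh=0.5):
    """Annotation: report the attributes predicted present."""
    return [a for a, s in zip(attributes, scores) if s > thresh]

query = rng.uniform(size=len(attributes))
print("annotation:", annotate(query))

# Retrieval: images whose attribute signatures are closest to the query.
dists = np.linalg.norm(db_scores - query, axis=1)
print("nearest images:", np.argsort(dists)[:5])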

@inproceedings{kiapourWACV14materials,
    Author = {M. Hadi Kiapour and Kevin Yager and Alexander C. Berg and Tamara L. Berg},
    Title = {Materials Discovery: Fine-Grained Classification of X-ray Scattering Images},
    Booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
    Year = {2014}
}

Retrieving Similar Styles to Parse Clothing


Retrieving Similar Styles to Parse Clothing

Kota Yamaguchi, M. Hadi Kiapour, Luis E. Ortiz, Tamara L. Berg
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2014

pdf | project page | dataset | code | abstract | bibtex

Clothing recognition is a societally and commercially important yet extremely challenging problem due to large variations in clothing appearance, layering, style, and body shape and pose. In this paper, we tackle the clothing parsing problem using a retrieval-based approach. For a query image, we find similar styles from a large database of tagged fashion images and use these examples to recognize clothing items in the query. Our approach combines parsing from: pre-trained global clothing models, local clothing models learned on the fly from retrieved examples, and transferred parse-masks (Paper Doll item transfer) from retrieved examples. We evaluate our approach extensively and show significant improvements over previous state-of-the-art for both localization (clothing parsing given weak supervision in the form of tags) and detection (general clothing parsing). Our experimental results also indicate that the general pose estimation problem can benefit from clothing parsing.
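
The fusion of the three parse sources can be sketched in a few lines; the uniform fusion weights and label count below are assumptions, where in practice the combination would be learned or tuned.

import numpy as np

rng = np.random.default_rng(0)
H, W, n_labels = 300, 200, 56                           # assumed sizes
global_scores = rng.uniform(size=(H, W, n_labels))      # pre-trained model
local_scores = rng.uniform(size=(H, W, n_labels))       # on-the-fly model
transfer_scores = rng.uniform(size=(H, W, n_labels))    # paper-doll transfer

# Combine the three pixelwise score maps, then label each pixel.
combined = (global_scores + local_scores + transfer_scores) / 3.0
parse = combined.argmax(axis=-1)
print(parse.shape)  # (300, 200)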

@article{yamaguchiTPAMI14retrieving,
    Author = {Kota Yamaguchi and M. Hadi Kiapour and Luis E. Ortiz and Tamara L. Berg},
    Title = {Retrieving Similar Styles to Parse Clothing},
    Journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
    Year = {2014}
}

Paper Doll Parsing


Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items

Kota Yamaguchi, M. Hadi Kiapour, Tamara L. Berg
International Conference on Computer Vision (ICCV), 2013

pdf | project page | dataset | demo | code | abstract | bibtex

Clothing recognition is an extremely challenging problem due to wide variation in clothing item appearance, layering, and style. In this paper, we tackle the clothing parsing problem using a retrieval-based approach. For a query image, we find similar styles from a large database of tagged fashion images and use these examples to parse the query. Our approach combines parsing from: pre-trained global clothing models, local clothing models learned on the fly from retrieved examples, and transferred parse masks (paper doll item transfer) from retrieved examples. Experimental evaluation shows that our approach significantly outperforms the state of the art in parsing accuracy.

@inproceedings{yamaguchiICCV13paper,
    Author = {Kota Yamaguchi and M. Hadi Kiapour and Tamara L. Berg},
    Title = {Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items},
    Booktitle = {International Conference on Computer Vision (ICCV)},
    Year = {2013}
}

Clothing Parsing


Parsing Clothing in Fashion Photographs

Kota Yamaguchi, M. Hadi Kiapour, Luis E. Ortiz, Tamara L. Berg
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012

pdf | project page | dataset | code | abstract | bibtex

In this paper we demonstrate an effective method for parsing clothing in fashion photographs, an extremely challenging problem due to the large number of possible garment items, variations in configuration, garment appearance, layering, and occlusion. In addition, we provide a large novel dataset and tools for labeling garment items, to enable future research on clothing estimation. Finally, we present intriguing initial results on using clothing estimates to improve pose identification, and demonstrate a prototype application for pose-independent visual garment retrieval.

@inproceedings{yamaguchiCVPR12parsing,
    Author = {Kota Yamaguchi and M. Hadi Kiapour and Luis E. Ortiz and Tamara L. Berg},
    Title = {Parsing Clothing in Fashion Photographs},
    Booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    Year = {2012}
}

Real-Time Facial Expression Recognition


Analysis, Interpretation, and Recognition of Facial Action Units and Expressions Using Neuro-Fuzzy Modeling

M. Khademi, M. Hadi Kiapour, M. T. Manzuri, A. Kiaei
International Workshop on Artificial Neural Networks in Pattern Recognition (ANNPR), 2010

pdf | abstract | bibtex

In this paper, an accurate real-time sequence-based system for the representation, recognition, interpretation, and analysis of facial action units (AUs) and expressions is presented. Our system has the following characteristics: 1) employing adaptive-network-based fuzzy inference systems (ANFIS) and temporal information, we developed a classification scheme based on neuro-fuzzy modeling of AU intensity, which is robust to intensity variations, 2) using both geometric and appearance-based features, and applying efficient dimension reduction techniques, our system is robust to illumination changes and can represent the subtle changes as well as the temporal information involved in the formation of facial expressions, and 3) by using continuous intensity values and employing top-down hierarchical rule-based classifiers, we can develop accurate, human-interpretable AU-to-expression converters. Extensive experiments on the Cohn-Kanade database show the superiority of the proposed method in comparison with support vector machines, hidden Markov models, and neural network classifiers.
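
The rule-based AU-to-expression converter can be illustrated with a tiny rule base over AU intensities. The two rules below follow common FACS practice (AU6 + AU12 for happiness; AU1 + AU4 + AU15 for sadness) and only illustrate the hierarchical-rule idea; they are not the paper's rule set.

# Toy AU-to-expression converter over continuous AU intensities.
def to_expression(au):  # au: dict of AU id -> intensity in [0, 1]
    if au.get(6, 0) > 0.5 and au.get(12, 0) > 0.5:
        return "happiness"  # cheek raiser + lip corner puller
    if au.get(1, 0) > 0.5 and au.get(4, 0) > 0.5 and au.get(15, 0) > 0.5:
        return "sadness"    # inner brow raiser + brow lowerer + lip depressor
    return "neutral"

print(to_expression({6: 0.8, 12: 0.9}))  # happiness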

@inproceedings{khademiANNPR10analysis,
    Author = {M. Khademi and M. Hadi Kiapour and M. T. Manzuri and A. Kiaei},
    Title = {Analysis, Interpretation, and Recognition of Facial Action Units and Expressions Using Neuro-Fuzzy Modeling},
    Booktitle = {International Workshop on Artificial Neural Networks in Pattern Recognition (ANNPR)},
    Year = {2010}
}

Recognizing Facial Action Units using HMM and Neural Network


Recognizing Combinations of Facial Action Units with Different Intensity Using a Mixture of Hidden Markov Models and Neural Network

M. Khademi, M. T. Manzuri, M. Hadi Kiapour, A. Kiaei
International Workshop on Multiple Classifier Systems (MCS), 2010

pdf | abstract | bibtex

The Facial Action Coding System consists of 44 action units (AUs) and more than 7000 combinations. Hidden Markov model (HMM) classifiers have been used successfully to recognize facial AUs and expressions due to their ability to deal with AU dynamics. However, a separate HMM is necessary for each single AU and each AU combination. Since AU combinations number in the thousands, a more efficient method is needed. In this paper, an accurate real-time sequence-based system for the representation and recognition of facial AUs is presented. Our system has the following characteristics: 1) employing a mixture of HMMs and a neural network, we develop a novel, accurate classifier that can deal with AU dynamics, recognize subtle changes, and remain robust to intensity variations, 2) although we use an HMM for each single AU only, by employing a neural network we can recognize both single AUs and AU combinations, and 3) using both geometric and appearance-based features, and applying efficient dimension reduction techniques, our system is robust to illumination changes and can represent the temporal information involved in the formation of facial expressions. Extensive experiments on the Cohn-Kanade database show the superiority of the proposed method in comparison with other classifiers.
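
The division of labor above, one HMM per single AU and a neural network on top of the per-AU scores, can be sketched in miniature. The HMM log-likelihoods and network weights below are random stand-ins; in the paper both are trained.

import numpy as np

rng = np.random.default_rng(0)
n_aus, n_combinations = 44, 30               # combination count assumed

def per_au_loglik(sequence):
    """Stand-in for 44 single-AU HMMs scoring one feature sequence."""
    return rng.normal(size=n_aus)

# A small linear network maps per-AU scores to combination labels, so no
# HMM is ever trained for a combination.
W = rng.normal(size=(n_aus, n_combinations)) * 0.1
b = np.zeros(n_combinations)

scores = per_au_loglik(sequence=None)
logits = scores @ W + b
probs = np.exp(logits - logits.max())        # stable softmax
probs /= probs.sum()
print("predicted combination:", int(np.argmax(probs)))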

@inproceedings{khademiMCS10recognizing,
    Author = {M. Khademi and M. T. Manzuri and M. Hadi Kiapour and A. Kiaei},
    Title = {Recognizing Combinations of Facial Action Units with Different Intensity Using a Mixture of Hidden Markov Models and Neural Network},
    Booktitle = {International Workshop on Multiple Classifier Systems (MCS)},
    Year = {2010}
}