Multi-label image classification has generated significant interest in recent years and the perfo... more Multi-label image classification has generated significant interest in recent years and the performance of such systems often suffers from the not so infrequent occurrence of incorrect or missing labels in the training data. In this paper, we extend the state-of the-art of training classifiers to jointly deal with both forms of errorful data. We accomplish this by modeling noisy and missing labels in multi-label images with a new Noise Modeling Network (NMN) that follows our convolutional neural network (CNN), integrates with it, forming an end-to-end deep learning system, which can jointly learn the noise distribution and CNN parameters. The NMN learns the distribution of noise patterns directly from the noisy data without the need for any clean training data. The NMN can model label noise that depends only on the true label or is also dependent on the image features. We show that the integrated NMN/CNN learning system consistently improves the classification performance, for diffe...
In this paper, we describe a cross-lingual information retrieval (CLIR) system that, given a quer... more In this paper, we describe a cross-lingual information retrieval (CLIR) system that, given a query in English, and a set of audio and text documents in a foreign language, can return a scored list of relevant documents, and present findings in a summary form in English. Foreign audio documents are first transcribed by a state-of-the-art pretrained multilingual speech recognition model that is finetuned to the target language. For text documents, we use multiple multilingual neural machine translation (MT) models to achieve good translation results, especially for low/medium resource languages. The processed documents and queries are then scored using a probabilistic CLIR model that makes use of the probability of translation from GIZA translation tables and scores from a Neural Network Lexical Translation Model (NNLTM). Additionally, advanced score normalization, combination, and thresholding schemes are employed to maximize the Average Query Weighted Value (AQWV) scores. The CLIR o...
Typically, relation extraction models are trained to extract instances of a relation ontology usi... more Typically, relation extraction models are trained to extract instances of a relation ontology using only training data from a single language. However, the concepts represented by the relation ontology (e.g. ResidesIn, EmployeeOf) are language independent. The numbers of annotated examples available for a given ontology vary between languages. For example, there are far fewer annotated examples in Spanish and Japanese than English and Chinese. Furthermore, using only language-specific training data results in the need to manually annotate equivalently large amounts of training for each new language a system encounters. We propose a deep neural network to learn transferable, discriminative bilingual representation. Experiments on the ACE 2005 multilingual training corpus demonstrate that the joint training process results in significant improvement in relation classification performance over the monolingual counterparts. The learnt representation is discriminative and transferable be...
Given training samples with class labels , we aim to learn a compact dictionary discriminative to... more Given training samples with class labels , we aim to learn a compact dictionary discriminative to distinguish the object from the background. Given a dictionary , the sparse code for a sample is computed by Motivated by [9], we assign a specific label to each dictionary and learn the classifier and the dictionary simultaneously. We incorporate an ideal sparse coding error and a linear regression loss into the objective function of dictionary learning where is the loss function. is the ideal sparse code error, where is an ideal sparse code. is the quadratic loss for linear regression. is the label vector where the non-zero position indicates label.
2008 International Conference on Machine Learning and Cybernetics, 2008
An improved mean shift method for object tracking based on nonparametric clustering and adaptive ... more An improved mean shift method for object tracking based on nonparametric clustering and adaptive bandwidth is presented in this paper. Based on partitioning the color space of a tracked object by using a modified nonparametric clustering, an appearance model of the tracked object is built. It captures both the color information and spatial layout of the tracked object. The similarity
2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance, 2013
ABSTRACT In this work, we present a framework to detect objects embedded in complex perspective g... more ABSTRACT In this work, we present a framework to detect objects embedded in complex perspective geometry. Our goal is to accurately identify objects such as people standing in balconies or windows on building facades of surrounding buildings. Compared to traditional computer vision work focused on activity analysis from a horizontal view, our framework provides a solution for the application domain of mobile surveillance in urban areas. A novel solution for a monocular camera is formulated by tightly coupling various computational modules including geometric analysis, segmentation, scale estimation, and object detection. In particular, our proposed approach alleviates the effect of the perspective geometry and corresponding distortion in object appearance effectively, and provides accurate scale priors to eliminate unlikely object detection hypotheses. The experimental results on collected video dataset show that the proposed approach is more accurate than traditional detection approaches based on brute-force scanning windows.
IEEE transactions on image processing : a publication of the IEEE Signal Processing Society, Jan 26, 2015
We now know that good mid-level features can greatly enhance the performance of image classificat... more We now know that good mid-level features can greatly enhance the performance of image classification, but how to efficiently learn the image features is still an open question. In this paper, we present an efficient unsupervised mid-level feature learning approach (MidFea), which only involves simple operations such as k-means clustering, convolution, pooling, vector quantization and random projection. We show this simple feature can also achieve good performance in traditional classification task. To further boost the performance, we model the neuron selectivity (NS) principle by building an additional layer over the mid-level features prior to the classifier. The NS-layer learns category-specific neurons in a supervised manner with both bottom-up inference and top-down analysis, and thus supports fast inference for a query image. Through extensive experiments, we demonstrate that this higher-level NS-layer notably improves the classification accuracy with our simple MidFea, achiev...
ABSTRACT A class-consistent k-means clustering algorithm (CCKM) and its hierarchical extension (H... more ABSTRACT A class-consistent k-means clustering algorithm (CCKM) and its hierarchical extension (Hierarchical CCKM) are presented for generating discriminative visual words for recognition problems. In addition to using the labels of training data themselves, we associate a class label with each cluster center to enforce discriminability in the resulting visual words. Our algorithms encourage data points from the same class to be assigned to the same visual word, and those from different classes to be assigned to different visual words. More specifically, we introduce a class consistency term in the clustering process which penalizes assignment of data points from different classes to the same cluster. The optimization process is efficient and bounded by the complexity of k-means clustering. A very efficient and discriminative tree classifier can be learned for various recognition tasks via the Hierarchical CCKM. The effectiveness of the proposed algorithms is validated on two public face datasets and four benchmark action datasets.
Multi-label image classification has generated significant interest in recent years and the perfo... more Multi-label image classification has generated significant interest in recent years and the performance of such systems often suffers from the not so infrequent occurrence of incorrect or missing labels in the training data. In this paper, we extend the state-of the-art of training classifiers to jointly deal with both forms of errorful data. We accomplish this by modeling noisy and missing labels in multi-label images with a new Noise Modeling Network (NMN) that follows our convolutional neural network (CNN), integrates with it, forming an end-to-end deep learning system, which can jointly learn the noise distribution and CNN parameters. The NMN learns the distribution of noise patterns directly from the noisy data without the need for any clean training data. The NMN can model label noise that depends only on the true label or is also dependent on the image features. We show that the integrated NMN/CNN learning system consistently improves the classification performance, for diffe...
In this paper, we describe a cross-lingual information retrieval (CLIR) system that, given a quer... more In this paper, we describe a cross-lingual information retrieval (CLIR) system that, given a query in English, and a set of audio and text documents in a foreign language, can return a scored list of relevant documents, and present findings in a summary form in English. Foreign audio documents are first transcribed by a state-of-the-art pretrained multilingual speech recognition model that is finetuned to the target language. For text documents, we use multiple multilingual neural machine translation (MT) models to achieve good translation results, especially for low/medium resource languages. The processed documents and queries are then scored using a probabilistic CLIR model that makes use of the probability of translation from GIZA translation tables and scores from a Neural Network Lexical Translation Model (NNLTM). Additionally, advanced score normalization, combination, and thresholding schemes are employed to maximize the Average Query Weighted Value (AQWV) scores. The CLIR o...
Typically, relation extraction models are trained to extract instances of a relation ontology usi... more Typically, relation extraction models are trained to extract instances of a relation ontology using only training data from a single language. However, the concepts represented by the relation ontology (e.g. ResidesIn, EmployeeOf) are language independent. The numbers of annotated examples available for a given ontology vary between languages. For example, there are far fewer annotated examples in Spanish and Japanese than English and Chinese. Furthermore, using only language-specific training data results in the need to manually annotate equivalently large amounts of training for each new language a system encounters. We propose a deep neural network to learn transferable, discriminative bilingual representation. Experiments on the ACE 2005 multilingual training corpus demonstrate that the joint training process results in significant improvement in relation classification performance over the monolingual counterparts. The learnt representation is discriminative and transferable be...
Given training samples with class labels , we aim to learn a compact dictionary discriminative to... more Given training samples with class labels , we aim to learn a compact dictionary discriminative to distinguish the object from the background. Given a dictionary , the sparse code for a sample is computed by Motivated by [9], we assign a specific label to each dictionary and learn the classifier and the dictionary simultaneously. We incorporate an ideal sparse coding error and a linear regression loss into the objective function of dictionary learning where is the loss function. is the ideal sparse code error, where is an ideal sparse code. is the quadratic loss for linear regression. is the label vector where the non-zero position indicates label.
2008 International Conference on Machine Learning and Cybernetics, 2008
An improved mean shift method for object tracking based on nonparametric clustering and adaptive ... more An improved mean shift method for object tracking based on nonparametric clustering and adaptive bandwidth is presented in this paper. Based on partitioning the color space of a tracked object by using a modified nonparametric clustering, an appearance model of the tracked object is built. It captures both the color information and spatial layout of the tracked object. The similarity
2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance, 2013
ABSTRACT In this work, we present a framework to detect objects embedded in complex perspective g... more ABSTRACT In this work, we present a framework to detect objects embedded in complex perspective geometry. Our goal is to accurately identify objects such as people standing in balconies or windows on building facades of surrounding buildings. Compared to traditional computer vision work focused on activity analysis from a horizontal view, our framework provides a solution for the application domain of mobile surveillance in urban areas. A novel solution for a monocular camera is formulated by tightly coupling various computational modules including geometric analysis, segmentation, scale estimation, and object detection. In particular, our proposed approach alleviates the effect of the perspective geometry and corresponding distortion in object appearance effectively, and provides accurate scale priors to eliminate unlikely object detection hypotheses. The experimental results on collected video dataset show that the proposed approach is more accurate than traditional detection approaches based on brute-force scanning windows.
IEEE transactions on image processing : a publication of the IEEE Signal Processing Society, Jan 26, 2015
We now know that good mid-level features can greatly enhance the performance of image classificat... more We now know that good mid-level features can greatly enhance the performance of image classification, but how to efficiently learn the image features is still an open question. In this paper, we present an efficient unsupervised mid-level feature learning approach (MidFea), which only involves simple operations such as k-means clustering, convolution, pooling, vector quantization and random projection. We show this simple feature can also achieve good performance in traditional classification task. To further boost the performance, we model the neuron selectivity (NS) principle by building an additional layer over the mid-level features prior to the classifier. The NS-layer learns category-specific neurons in a supervised manner with both bottom-up inference and top-down analysis, and thus supports fast inference for a query image. Through extensive experiments, we demonstrate that this higher-level NS-layer notably improves the classification accuracy with our simple MidFea, achiev...
ABSTRACT A class-consistent k-means clustering algorithm (CCKM) and its hierarchical extension (H... more ABSTRACT A class-consistent k-means clustering algorithm (CCKM) and its hierarchical extension (Hierarchical CCKM) are presented for generating discriminative visual words for recognition problems. In addition to using the labels of training data themselves, we associate a class label with each cluster center to enforce discriminability in the resulting visual words. Our algorithms encourage data points from the same class to be assigned to the same visual word, and those from different classes to be assigned to different visual words. More specifically, we introduce a class consistency term in the clustering process which penalizes assignment of data points from different classes to the same cluster. The optimization process is efficient and bounded by the complexity of k-means clustering. A very efficient and discriminative tree classifier can be learned for various recognition tasks via the Hierarchical CCKM. The effectiveness of the proposed algorithms is validated on two public face datasets and four benchmark action datasets.
Uploads
Papers by Zhuolin Jiang