DOI: 10.1145/1458082.1458097

Error-driven generalist+experts (edge): a multi-stage ensemble framework for text categorization

Published: 26 October 2008
Abstract

We introduce a multi-stage ensemble framework, Error-Driven Generalist+Expert (Edge), for improved classification on large-scale text categorization problems. Edge first trains a generalist, capable of classifying under all classes, to deliver a reasonably accurate initial category ranking for a given instance. Edge then computes a confusion graph for the generalist and allocates learning resources to train experts on relatively small groups of classes that the generalist tends to systematically confuse with one another. When invoked on a given instance, the experts' votes yield a reranking of the classes, thereby correcting the generalist's errors. Our evaluations showcase improved classification and ranking performance on several large-scale text categorization datasets. Edge is particularly efficient when the underlying learners are efficient. Our study of confusion graphs is also of independent interest.
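
As a concrete illustration, below is a minimal Python sketch of the pipeline the abstract describes, under stated assumptions: scikit-learn logistic regression serving as both generalist and experts, integer class labels 0..K-1, a symmetric count threshold for confusion-graph edges, and connected components of the graph as the confused-class groups. The names (train_edge, predict_edge, edge_threshold) are hypothetical, and overriding the generalist's top choice is a simplified stand-in for the paper's vote-based reranking.

    # Illustrative sketch only; NOT the paper's exact method.
    import numpy as np
    from scipy.sparse.csgraph import connected_components
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix

    def train_edge(X_train, y_train, X_val, y_val, edge_threshold=5):
        # Stage 1: a generalist trained over all classes.
        generalist = LogisticRegression(max_iter=1000).fit(X_train, y_train)

        # Confusion graph: classes are nodes; two classes are linked when
        # the generalist confuses them often enough on held-out data.
        n_classes = int(np.max(y_train)) + 1
        cm = confusion_matrix(y_val, generalist.predict(X_val),
                              labels=np.arange(n_classes))
        np.fill_diagonal(cm, 0)
        adjacency = (cm + cm.T) >= edge_threshold

        # Stage 2: one expert per connected component of two or more classes.
        n_groups, group_of_class = connected_components(adjacency, directed=False)
        experts = {}
        for g in range(n_groups):
            classes = np.flatnonzero(group_of_class == g)
            if len(classes) < 2:
                continue  # a singleton group needs no expert
            mask = np.isin(y_train, classes)
            experts[g] = LogisticRegression(max_iter=1000).fit(X_train[mask],
                                                               y_train[mask])
        return generalist, experts, group_of_class

    def predict_edge(x, generalist, experts, group_of_class):
        # The generalist proposes a class; if it belongs to a confused
        # group, that group's expert revises the decision. (The paper
        # reranks the full category ranking via expert votes; overriding
        # the top choice is a simplification.)
        c = int(generalist.predict(x.reshape(1, -1))[0])
        g = int(group_of_class[c])
        return int(experts[g].predict(x.reshape(1, -1))[0]) if g in experts else c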

      Published In

      CIKM '08: Proceedings of the 17th ACM Conference on Information and Knowledge Management
      October 2008
      1562 pages
      ISBN: 9781595939913
      DOI: 10.1145/1458082

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. ensemble learning
      2. many-class classification
      3. text categorization

      Qualifiers

      • Research-article

      Conference

      CIKM '08: Conference on Information and Knowledge Management
      October 26-30, 2008
      Napa Valley, California, USA

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
