DOI: 10.1145/1458082.1458097

Error-driven generalist+experts (edge): a multi-stage ensemble framework for text categorization

Published: 26 October 2008
Abstract

We introduce a multi-stage ensemble framework, Error-Driven Generalist+Expert (Edge), for improved classification on large-scale text categorization problems. Edge first trains a generalist, capable of classifying under all classes, to deliver a reasonably accurate initial category ranking for a given instance. Edge then computes a confusion graph for the generalist and allocates learning resources to train experts on relatively small groups of classes that the generalist tends to systematically confuse with one another. When invoked on a given instance, the experts' votes yield a reranking of the classes, thereby correcting the generalist's errors. Our evaluations showcase improved classification and ranking performance on several large-scale text categorization datasets. Edge is particularly efficient when the underlying learners are efficient. Our study of confusion graphs is also of independent interest.
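
As a concrete illustration, below is a minimal Python sketch of the pipeline the abstract describes, under stated assumptions: scikit-learn logistic regression serving as both generalist and experts, integer class labels 0..K-1, a symmetric count threshold for confusion-graph edges, and connected components of the graph as the confused-class groups. The names (train_edge, predict_edge, edge_threshold) are hypothetical, and overriding the generalist's top choice is a simplified stand-in for the paper's vote-based reranking.

    # Illustrative sketch only; NOT the paper's exact method.
    import numpy as np
    from scipy.sparse.csgraph import connected_components
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix

    def train_edge(X_train, y_train, X_val, y_val, edge_threshold=5):
        # Stage 1: a generalist trained over all classes.
        generalist = LogisticRegression(max_iter=1000).fit(X_train, y_train)

        # Confusion graph: classes are nodes; two classes are linked when
        # the generalist confuses them often enough on held-out data.
        n_classes = int(np.max(y_train)) + 1
        cm = confusion_matrix(y_val, generalist.predict(X_val),
                              labels=np.arange(n_classes))
        np.fill_diagonal(cm, 0)
        adjacency = (cm + cm.T) >= edge_threshold

        # Stage 2: one expert per connected component of two or more classes.
        n_groups, group_of_class = connected_components(adjacency, directed=False)
        experts = {}
        for g in range(n_groups):
            classes = np.flatnonzero(group_of_class == g)
            if len(classes) < 2:
                continue  # a singleton group needs no expert
            mask = np.isin(y_train, classes)
            experts[g] = LogisticRegression(max_iter=1000).fit(X_train[mask],
                                                               y_train[mask])
        return generalist, experts, group_of_class

    def predict_edge(x, generalist, experts, group_of_class):
        # The generalist proposes a class; if it belongs to a confused
        # group, that group's expert revises the decision. (The paper
        # reranks the full category ranking via expert votes; overriding
        # the top choice is a simplification.)
        c = int(generalist.predict(x.reshape(1, -1))[0])
        g = int(group_of_class[c])
        return int(experts[g].predict(x.reshape(1, -1))[0]) if g in experts else c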

      Published In

      CIKM '08: Proceedings of the 17th ACM Conference on Information and Knowledge Management
      October 2008
      1562 pages
      ISBN: 9781595939913
      DOI: 10.1145/1458082

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. ensemble learning
      2. many-class classification
      3. text categorization

      Qualifiers

      • Research-article

      Conference

      CIKM '08: Conference on Information and Knowledge Management
      October 26-30, 2008
      Napa Valley, California, USA

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
