Casey Whitelaw

Papers by Casey Whitelaw

Automatic recognition of named entities such as people, places, organizations, books, and movies across the entire web presents a number of challenges, both of scale and scope. Data for training general named entity recognizers is difficult to come by, and efficient machine learning methods are required once we have found hundreds of millions of labeled observations. We present an implemented system that addresses these issues, including a method for automatically generating training data, and a multi-class online ...

Using appraisal groups for sentiment analysis

Using Appraisal Taxonomies for Sentiment Analysis

Systemic Functional Features in Stylistic Text Classification

Stylistic text classification using functional lexical features

Journal of The American Society for Information Science and Technology, 2007

Most text analysis and retrieval work to date has focused on the topic of a text; that is, what it is about. However, a text also contains much useful information in its style, or how it is written. This includes information about its author, its purpose, feelings it is meant to evoke, and more. This article develops a new type of lexical feature for use in stylistic text classification, based on taxonomies of various semantic functions of certain choice words or phrases. We demonstrate the usefulness of such features for the stylistic text classification tasks of determining author identity and nationality, the gender of literary characters, a text's sentiment (positive/negative evaluation), and the rhetorical character of scientific journal articles. We further show how the use of functional features aids in gaining insight about stylistic differences among different kinds of texts.

Using the Web for Language Independent Spellchecking and Autocorrection

Named Entity Recognition Using a Character-based Probabilistic Approach

Selecting Systemic Features for Text Classification

Systemic features use linguistically-derived language models as a basis for text classification. The graph structure of these models allows for feature representations not available with traditional bag-of-words approaches. This paper explores the set of possible representations, and proposes feature selection methods that aim to produce the most compact and effective set of attributes for a given classification problem. We show that small sets of systemic features can outperform larger sets of word-based features in the task of identifying financial scam documents.

Identifying Interpersonal Distance using Systemic Features

Evaluating Corpora for Named Entity Recognition Using Character-Level Features

We present a new collection of training corpora for evaluation of language-independent named entity recognition systems. For the five languages included in this initial release, Basque, Dutch, English, Korean, and Spanish, we provide an analysis of the relative difficulty of the NER task for both the language in general, and as a supervised task using these corpora. We construct three strongly language-independent systems, each using only orthographic features, and compare their performance on both seen and unseen data. We achieve improved results through combining these classifiers, showing that ensemble approaches are suitable when dealing with language-independent problems.

SLINERC: The Sydney Language-Independent Named Entity Recogniser and Classifier

1 Introduction: Identification of named entities is an increasingly important task with applicati...

Web-scale named entity recognition