Recent work in Harmonic Serialism (HS) has led to new insights into several outstanding problems in phonology. The gradual, serial optimization inherent to HS offers novel perspectives on the too-many-repairs problem (McCarthy 2008a) and makes it possible to capture generalizations stated at intermediate
1. Introduction The three nominal paradigms in (1) illustrate the fundamental problem posed by Slavic alternating vowels, called yers. In (1b) a full vowel appears throughout the inflectional paradigm; the stem is invariant. In (1c) the stem is also invariant and ends in a consonant cluster regardless of the following inflectional morphology. The paradigm in (1a) contains the yer vowel (underlined). This vowel is phonetically identical to the [ε] in (1b), but, unlike the vowel in (1b), the yer only appears when the following inflection is null and deletes otherwise, creating an alternation in the stem.
Proceedings of the Annual Meetings on Phonology, 2021
In this paper we computationally implement four different theories for representing opaque and transparent phonological interactions: Harmonic Serialism, Stratal OT, Two-Level Constraints, and Indexed Constraints. We then show that these theories make unique predictions on two tasks: (1) a learning-bias task, based on previous experimental work with humans, and (2) a novel generalization task for which no human data exist. Our results in (1) show that serial models predict that transparent languages should be easier to acquire, while parallel models do not. Furthermore, the results for (2) show that all four of the theories we test make unique predictions for how humans should generalize to novel phonological interaction types.
This chapter provides a broad overview of recent research on constraint-based learning of phonology. The review covers the major results and the contributions of a wide range of approaches, comparing the computational properties and learning implications of these theories. Specifically, the review encompasses learning in classic OT as well as learning in related frameworks that formalize constraint interaction as (probabilistic) weighting or ranking. The learning problem is decomposed into several subproblems that highlight the complexity of the learning problem facing the child learner. The discussion emphasizes the challenges posed by these various subproblems, the insights and differences of the proposed solutions, and the outstanding questions that require further research.
This paper explores the relative merits of constraint ranking vs. weighting in the context of a major outstanding learnability problem in phonology: learning in the face of hidden structure. Specifically, the paper examines a well-known approach to the structural ambiguity problem, Robust Interpretive Parsing (RIP; Tesar & Smolensky 1998), focusing on its stochastic extension first described by Boersma (2003). Two related problems with the stochastic formulation of RIP are revealed, rooted in a failure to take full advantage of probabilistic information available in the learner's grammar. To address these problems, two novel parsing strategies are introduced and applied to learning algorithms for both probabilistic ranking and weighting. The novel parsing strategies yield significant improvements in performance, asymmetrically improving performance of OT learners. Once RIP is replaced with the proposed modifications, the apparent advantage of HG over OT learners reported in previ...
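As a rough illustration of the parsing step at issue here, the sketch below (with hypothetical constraint names and ranking values) samples a ranking from a stochastic OT grammar by adding evaluation noise and then selects the most harmonic full parse among those compatible with an overt form. It shows only the general mechanism that RIP-style learners rely on, not the alternative strategies proposed in the paper.

```python
# A minimal sketch of stochastic interpretive parsing (toy constraints and values).
import random

def sample_ranking(ranking_values, noise=2.0):
    """Sample a total constraint ranking by adding Gaussian noise to ranking values."""
    noisy = {c: v + random.gauss(0.0, noise) for c, v in ranking_values.items()}
    return sorted(noisy, key=noisy.get, reverse=True)  # highest value = top-ranked

def interpretive_parse(compatible_parses, ranking):
    """Among parses consistent with the overt form, return the most harmonic one,
    comparing violation profiles constraint by constraint in ranking order."""
    def profile(parse):
        return tuple(parse["violations"].get(c, 0) for c in ranking)
    return min(compatible_parses, key=profile)

# Toy example: two metrical parses compatible with the same overt stress pattern.
ranking_values = {"Trochee": 100.0, "Iamb": 90.0, "Parse-Syll": 95.0}
parses = [
    {"structure": "('ss)s", "violations": {"Iamb": 1, "Parse-Syll": 1}},
    {"structure": "('s)ss", "violations": {"Parse-Syll": 2}},
]
r = sample_ranking(ranking_values)
print(r, interpretive_parse(parses, r)["structure"])
```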
This paper develops a novel approach to learning phonological hidden structure relying on underexplored expectation-driven learning strategies rooted in the machine learning literature. A novel probabilistic grammatical representation that enables estimation of expectation-driven learning updates, and two learning algorithms utilizing these updates, are introduced. The algorithms are shown to surpass the performance of existing error-driven learning models on a large test system with hidden metrical structure. The paper then shows that the learning strategies are fully general and can also be successfully applied to learning of an entirely different sort of hidden structure created by unknown underlying representations.
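The sketch below gives a minimal MaxEnt-style illustration of what an expectation-based update with hidden structure can look like: weights move according to the difference between expected violations over all candidates and expected violations over just the parses compatible with the observed overt form. The paper's own algorithms use a probabilistic ranking representation rather than MaxEnt, so this is an analogy under that assumption, not the proposed model; all names and numbers are invented.

```python
# A minimal MaxEnt-style sketch of an expectation-based update with hidden structure.
import math

def maxent_probs(candidates, weights):
    """P(candidate) proportional to exp(-sum_c w_c * violations_c)."""
    scores = [math.exp(-sum(weights[c] * v.get(c, 0) for c in weights)) for v in candidates]
    z = sum(scores)
    return [s / z for s in scores]

def expected_violations(candidates, probs, constraint):
    return sum(p * v.get(constraint, 0) for p, v in zip(probs, candidates))

def expectation_update(weights, all_cands, observed_cands, rate=0.1):
    """Gradient-ascent step on the likelihood of the observed overt form,
    marginalizing over its hidden parses (observed_cands is a subset of all_cands)."""
    p_all = maxent_probs(all_cands, weights)
    p_obs = maxent_probs(observed_cands, weights)  # renormalized over compatible parses
    return {c: weights[c] + rate * (expected_violations(all_cands, p_all, c)
                                    - expected_violations(observed_cands, p_obs, c))
            for c in weights}
```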
A growing body of research in phonology addresses the representation and learning of variable processes and exceptional, lexically conditioned processes. Linzen et al. (2013) present a MaxEnt model with additive lexical scales to account for data exhibiting both variation and exceptionality. In this paper, we implement a learning model for lexically scaled MaxEnt grammars which we show to be successful across a range of data containing patterns of variation and exceptionality. We also explore how the model’s parameters and the rate of exceptionality in the data influence its performance and predictions for novel forms.
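A minimal sketch of how an additive lexical scale can be folded into MaxEnt evaluation is given below, assuming toy constraints, weights, and item names: the scale raises the effective weight of designated constraints only for specific lexical items, letting exceptional items resist an otherwise variable process. This illustrates the general architecture only, not the specific model implemented in the paper.

```python
# A toy lexically scaled MaxEnt evaluation (illustrative names and values only).
import math

def harmony(violations, weights, scales, lex_item, scaled_constraints):
    total = 0.0
    for c, count in violations.items():
        w = weights[c]
        if c in scaled_constraints:
            w += scales.get(lex_item, 0.0)      # additive, item-specific adjustment
        total += w * count
    return -total

def candidate_probs(candidates, weights, scales, lex_item, scaled_constraints):
    hs = [harmony(v, weights, scales, lex_item, scaled_constraints) for v in candidates]
    z = sum(math.exp(h) for h in hs)
    return [math.exp(h) / z for h in hs]

# Toy example: an exceptional item resists vowel deletion more than a regular item.
weights = {"Max-V": 2.0, "*ComplexCoda": 3.0}
scales = {"item_exceptional": 4.0}              # boosts Max-V only for this item
cands = [{"Max-V": 1}, {"*ComplexCoda": 1}]     # deletion vs. faithful candidate
print(candidate_probs(cands, weights, scales, "item_regular", {"Max-V"}))
print(candidate_probs(cands, weights, scales, "item_exceptional", {"Max-V"}))
```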
There exist a number of provably correct learning algorithms for Optimality Theory and closely related theories. These include Constraint Demotion (CD; Tesar 1995, et seq.), a family of algorithms for classic OT. For Harmonic Grammar (Legendre, Miyata and Smolensky 1990; Smolensky and Legendre 2006) and related theories (e.g. maximum entropy), there is Stochastic Gradient Ascent (SGA; Soderstrom, Mathis and Smolensky 2006, Jäger 2007). There is also the Gradual Learning Algorithm for Stochastic OT (GLA; Boersma 1997), which works well in most cases but is known not to be correct in the general case (see e.g., Pater 2008). The success of these algorithms (and correctness proofs in the case of CD and SGA) relies on the assumption that learners are provided with full structural descriptions of the data, including prosodic structure as well as underlying representations, which are not available to the human learner.
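For concreteness, the sketch below shows the kind of error-driven, weight-based update that SGA- and GLA-style learners for weighted grammars build on (toy constraint names; weights treated as penalties): when the current grammar selects the wrong candidate, each weight is nudged by the difference between the erroneous winner's and the observed winner's violation counts. It assumes full structural descriptions are available, which is exactly the assumption at issue here.

```python
# A minimal error-driven Harmonic Grammar update (toy constraints; weights as penalties).
def hg_winner(candidates, weights):
    """Return the candidate with the smallest weighted violation total."""
    return min(candidates, key=lambda v: sum(weights[c] * v.get(c, 0) for c in weights))

def error_driven_update(weights, observed_winner, candidates, rate=0.1):
    predicted = hg_winner(candidates, weights)
    if predicted == observed_winner:
        return weights                       # no error, no update
    return {c: weights[c] + rate * (predicted.get(c, 0) - observed_winner.get(c, 0))
            for c in weights}

# Toy usage: the observed faithful form loses under the current weights, so the update
# raises the weight of Max (violated by the error) and lowers NoCoda slightly.
weights = {"NoCoda": 2.0, "Max": 1.0}
cands = [{"NoCoda": 1}, {"Max": 1}]          # faithful [pat] vs. deleting [pa]
print(error_driven_update(weights, observed_winner=cands[0], candidates=cands))
```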
1 Introduction A central problem of language learnability is the learning of restrictive grammars, grammars that generate all the observed forms but as few others as possible. Given only positive evidence, there are many grammars consistent with the observed data, and the learner must select the most restrictive grammar among these. If the learner mistakenly adopts a broader grammar, no positive evidence will contradict this decision since the broader grammar is consistent with all the positive evidence plus additional data. This is known as the subset problem. A well-known solution to the restrictiveness problem is a ranking bias, a preference for a particular relative ranking between certain constraint types. Ranking biases have been successfully implemented in algorithms that focus on the learning of a language-particular ranking given the correct underlying forms. As has been pointed out, however, when the full problem of learning rankings and underlying forms is considered, ranking biases are ...
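The toy example below makes the subset problem concrete, using hypothetical constraints (NoCoda, Max, Dep) and candidate sets: both rankings derive the observed coda-free output, but only the markedness-over-faithfulness ranking yields the restrictive, coda-free language, so positive evidence alone cannot distinguish them.

```python
# A worked toy illustration of the subset problem with two OT rankings.
def ot_winner(candidates, ranking):
    """candidates: output -> violation dict; pick the winner by lexicographic comparison."""
    return min(candidates, key=lambda o: tuple(candidates[o].get(c, 0) for c in ranking))

inputs = {
    "/ba/":  {"ba":  {},              "bap": {"Dep": 1}},
    "/bap/": {"bap": {"NoCoda": 1},   "ba":  {"Max": 1}},
}
for ranking in (["NoCoda", "Max", "Dep"], ["Max", "Dep", "NoCoda"]):
    language = {i: ot_winner(cands, ranking) for i, cands in inputs.items()}
    print(ranking, language)
# Both rankings map /ba/ -> [ba]; only the markedness-high ranking also maps /bap/ -> [ba],
# yielding the more restrictive (coda-free) language.
```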
Cognitive Limitations Impose Advantageous Constraints on Word Segmentation. Katarzyna Hitczenko (Yale University, New Haven, CT, USA) and Gaja Jarosz (Yale University, New Haven, CT, USA). Abstract: This paper analyzes the learning constraints imposed by cognitive limitations on memory and processing in the domain of word segmentation. We analyze the properties of three learning algorithms that rely on the same probability model: an incremental, constrained learner proposed by Venkataraman (2001), a batch version of Venkataraman’s learner, and a variant of the batch model that imposes memory constraints mimicking incremental processing. We show that while Venkataraman’s original incremental learner performs well, the batch version predicts massive undersegmentation. We identify the crucial properties of the probability model responsible for undersegmentation and show how memory constraints impose an opposing pressure favoring segmentation. Our results indicate that cognitive limitations exert...
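The sketch below is a much-simplified incremental unigram segmenter in the spirit of (but not identical to) Venkataraman's learner: each unsegmented utterance is parsed with the current lexicon via a Viterbi search, and the resulting words update the counts before the next utterance. The novel-word probability is a crude placeholder, and the toy utterances are invented for illustration.

```python
# A simplified incremental unigram word segmenter (illustrative only).
import math
from collections import Counter

counts = Counter()

def word_logprob(word):
    total = sum(counts.values())
    if counts[word] > 0:
        return math.log(counts[word] / (total + 1))
    return math.log(1.0 / (total + 1)) + len(word) * math.log(0.3)  # novel-word penalty

def segment(utterance, max_len=6):
    """Viterbi search for the best segmentation of an unsegmented utterance."""
    n = len(utterance)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start][0] + word_logprob(utterance[start:end])
            if score > best[end][0]:
                best[end] = (score, start)
    words, i = [], n
    while i > 0:
        start = best[i][1]
        words.append(utterance[start:i])
        i = start
    return list(reversed(words))

for utt in ["lookatthedoggie", "thedoggie", "lookatthat"]:
    words = segment(utt)
    counts.update(words)          # incremental update after each utterance
    print(words)
```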
These proceedings include the full length paper submissions that were peer-reviewed as papers and accepted for either oral or poster presentation at SCiL 2018. The first 5 papers in the proceedings are those that were accepted as oral presentations, while the remaining 12 papers were presented as posters. Submissions to SCiL also included abstracts, which were presented as talks or posters at the conference. Further information, including the schedule and abstracts, can be found at our website: https://blogs.umass.edu/scil/scil2018/.
This paper examines the relationship between the language input and phonological development. Using novel data from a longitudinal corpus of spontaneous child speech in Polish, we evaluate and compare the predictions of a variety of input-based phonotactic models for syllable structure acquisition. We find that many commonly examined input statistics can make dramatically different predictions, as do different assumptions about the representational units over which statistics are calculated. Several models perform surprisingly well in predicting major effects in production accuracy, but results vary systematically depending on the representational level examined. The most successful models reference multiple abstract units of phonological representation. We also show there are departures between the predictions of the best phonotactic models and the children’s production patterns that suggest input frequency alone is not sufficient to explain the developmental patterns.
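As a concrete illustration of how the choice of representational unit changes an input statistic, the sketch below computes onset frequencies from a toy set of onsets at three levels (segment, sonority class, CV skeleton). The corpus, onsets, and sonority classes are invented for illustration and are not the statistics or data used in the paper.

```python
# Toy onset frequencies at three representational levels (segment, sonority, CV).
from collections import Counter

SONORITY = {"p": "Stop", "t": "Stop", "k": "Stop", "f": "Fric", "s": "Fric",
            "m": "Nas", "n": "Nas", "l": "Liq", "r": "Liq", "w": "Glide", "j": "Glide"}

def project(onset, level):
    if level == "segment":
        return onset
    if level == "sonority":
        return tuple(SONORITY[c] for c in onset)
    if level == "cv":
        return "C" * len(onset)
    raise ValueError(level)

def onset_frequencies(corpus_onsets, level):
    """Token frequency of each onset shape at the chosen representational level."""
    counts = Counter(project(o, level) for o in corpus_onsets)
    total = sum(counts.values())
    return {shape: c / total for shape, c in counts.items()}

toy_onsets = [("p", "r"), ("t",), ("s", "t"), ("m",), ("p", "r"), ("k",), ("s", "t"), ("t",)]
for level in ("segment", "sonority", "cv"):
    print(level, onset_frequencies(toy_onsets, level))
```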
We present novel findings on the acquisition of syllable structure in Polish and explore the ability of various formulations of frequency to account for the acquisition results. We analyze the spontaneous productions of four children, examining accuracy of syllable margins at several levels of representation (CV, sonority, segment). We use mixed-effects modeling to examine differences in accuracy after controlling for effects of word frequency, age, subject, and word length. We also examine differences in acquisition between children. We then compare the observed differences in accuracy to various frequency measures calculated from the child-directed speech to these children. We consider both type and token frequencies calculated at each representational level as well as versions of these measures calculated from speech directed at each individual child. Our results suggest that while frequency can be a good predictor of accuracy, no single frequency measure can fully explain accuracy differences across children or representational levels.