The current state-of-the-art Entity Linking (EL) systems are geared towards corpora that are as heterogeneous as the Web, and therefore perform sub-optimally on domain-specific corpora. A key open problem is how to construct effective EL systems for specific domains, as knowledge of the local context should in principle increase, rather than decrease, effectiveness. In this paper we propose the hybrid use of simple specialist linkers in combination with an existing generalist system to address this problem. Our main findings are the following. First, we construct a new reusable benchmark for EL on a corpus of domain-specific conversations. Second, we test the performance of a range of approaches under the same conditions, and show that specialist linkers obtain high precision in isolation, and high recall when combined with generalist linkers. Hence, we can effectively exploit local context and get the best of both worlds.
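The hybrid idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system evaluated in the paper: both linkers are hypothetical toy lexicon lookups, the entity identifiers are invented, and the precedence rule (the specialist overrides the generalist on shared mentions) is one plausible way to combine them.

```python
from typing import Callable, Dict

# A linker maps a text to {mention: entity_id}.
# All names and identifiers below are hypothetical, for illustration only.
Linker = Callable[[str], Dict[str, str]]


def specialist_linker(text: str) -> Dict[str, str]:
    """Toy domain-specific linker: small lexicon, assumed to be precise."""
    lexicon = {"the chair": "local/Committee_Chair"}
    lowered = text.lower()
    return {m: e for m, e in lexicon.items() if m in lowered}


def generalist_linker(text: str) -> Dict[str, str]:
    """Toy general-purpose linker: broad coverage, unaware of local context."""
    lexicon = {
        "the chair": "wiki/Chair_(furniture)",
        "amsterdam": "wiki/Amsterdam",
    }
    lowered = text.lower()
    return {m: e for m, e in lexicon.items() if m in lowered}


def hybrid_link(text: str, specialist: Linker, generalist: Linker) -> Dict[str, str]:
    """Start from the generalist's links, then let the specialist override."""
    links = dict(generalist(text))
    links.update(specialist(text))  # specialist wins on conflicting mentions
    return links


links = hybrid_link(
    "The chair opened the meeting in Amsterdam.",
    specialist_linker,
    generalist_linker,
)
print(links)
```

In this toy setup the specialist corrects the mention it knows about, while the generalist still contributes the mention that the specialist does not cover.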
In this paper we present LocLinkVis (Locate-Link-Visualize), a system which supports exploratory information access to a document collection based on geo-referencing and visualization. It uses a gazetteer which contains representations of places ranging from countries to buildings, and that is used to recognize toponyms, disambiguate them into places, and to visualize the resulting spatial footprints.
Proceedings of the first international workshop on Entity recognition & disambiguation - ERD '14, 2014
Recently, Entity Linking and Retrieval turned out to be one of the most interesting tasks in Information Extraction due to its various applications. Entity Linking (EL) is the task of detecting mentioned entities in a text and linking them to the corresponding entries of a Knowledge Base. EL is traditionally composed of three major parts: (i) spotting, (ii) candidate generation, and (iii) candidate disambiguation. The performance of an EL system is highly dependent on the accuracy of each individual part. In this paper, we focus on these three main building blocks of EL systems and try to improve on the results of one of the open source EL systems, namely DBpedia Spotlight. We propose to use text pre-processing and parameter tuning to "focus" a general-purpose EL system to perform better on different kinds of input text. Also, one of the main drawbacks of EL systems is identifying where a name does not refer to any known entity. To improve this so-called NIL-detection, we define different features using a set of texts and their known entities and design a classifier to automatically classify DBpedia Spotlight's output entities as "NIL" or "Not NIL". The proposed system has participated in the SIGIR ERD Challenge 2014 and the performance analysis of this system on the challenge's datasets shows that the proposed approaches successfully improve the accuracy of the baseline system.
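The three stages named above (spotting, candidate generation, candidate disambiguation) can be illustrated with a toy pipeline. This is a sketch under stated assumptions and does not use DBpedia Spotlight: the knowledge base `KB`, its prior scores, and all function names are hypothetical, and NIL detection is approximated here by a simple score threshold rather than the trained classifier the abstract describes.

```python
from typing import Dict, List, Optional

# Hypothetical toy knowledge base: surface form -> {entity_id: prior score}.
KB: Dict[str, Dict[str, float]] = {
    "paris": {"Paris_(France)": 0.9, "Paris_Hilton": 0.1},
    "spotlight": {"DBpedia_Spotlight": 0.3},
}


def spot(text: str) -> List[str]:
    """Stage 1, spotting: keep tokens that match a known surface form."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    return [t for t in tokens if t in KB]


def generate_candidates(mention: str) -> Dict[str, float]:
    """Stage 2, candidate generation: look up possible entities."""
    return KB.get(mention, {})


def disambiguate(mention: str, nil_threshold: float = 0.5) -> Optional[str]:
    """Stage 3, disambiguation: pick the best candidate, or None for NIL.

    The threshold is a stand-in for a NIL classifier (an assumption).
    """
    candidates = generate_candidates(mention)
    if not candidates:
        return None
    best, score = max(candidates.items(), key=lambda kv: kv[1])
    return best if score >= nil_threshold else None


def link(text: str) -> Dict[str, Optional[str]]:
    """Run all three stages and map each spotted mention to an entity."""
    return {mention: disambiguate(mention) for mention in spot(text)}


result = link("Paris uses Spotlight.")
print(result)
```

Lowering `nil_threshold` trades fewer NIL decisions for more low-confidence links, which mirrors the kind of parameter tuning the abstract mentions.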
Computer-supported group formation enables educators to assign students to project teams. The focus in this paper is placed on gathering data about student attributes that are relevant in the context of specific course projects. We developed a method that automatically produces learner models from existing documents, by linking students to topics and estimating the levels of skill, knowledge, and interest that students have in these topics. The method is evaluated in an experiment with student participants, wherein its performance is measured on two levels. Our results demonstrate that it is possible to link students to topics with high precision, but suggest that estimating mastery levels is a more challenging task.
An open content (OC) repository that has a formal knowledge structure, but which allows users to share across organizational boundaries, could solve existing knowledge search problems. However, such an OC repository that is used beyond the context of a team or firm will still need to have boundaries set for its contents in order to stay relevant. This paper deals with how these boundaries can be created for specific fields within product development and how users play a role in detailing these boundaries. Because the users of the repository have the freedom to author and edit articles, it is necessary to discuss which information is relevant to the intended field. The authors propose a framework for discourse that is based on descriptions of the field in terms of product areas, used disciplines and design aspects. A repository for Industrial Design Engineering serves as an example case.
WikID, an Industrial Design Engineering (IDE) wiki, is an online initiative targeted at designers to facilitate the finding of relevant information on the World Wide Web. Because the users of this website have the freedom to author and edit articles, it is necessary to reach consensus on which information is relevant to industrial design engineers. Therefore we have investigated ‘design relevance’ in literature and have conducted field research containing interviews with experts. As a secondary objective we wish to provide article-writing guidelines for users that help them to decide on design relevance and simultaneously lower the existing technical barrier towards writing and editing articles. Outcomes from the literature study are descriptions of the scope of IDE in terms of products, used disciplines and design aspects. These results were used to create a mission for WikID. We conclude that design relevance is not a property of information, but the situation where user expectations are met. The data that was gathered from the interviews showed commonalities between articles in the same category but not between every article. Therefore each category benefits from its own article-writing guidelines. These were applied on the website as ‘forms’ in order to validate the results.
Industrial design engineers use a wide variety of knowledge fields when making decisions in their design process. Designers cannot master every field, so they often look for a set of rules of thumb on a particular subject. To meet this need, a knowledge database in wiki format has been developed through a chain of studies: WikID, a portmanteau of wiki and industrial design. WikID aims to be a design tool. It offers information in a compact manner tailored to its targeted user group: industrial design engineers. The paper describes the development of WikID in a chain of studies. For a knowledge database the main issue is the labour- and time-consuming nature of collecting, selecting and structuring the contents for the database. For this issue, the focus is set on the use of wiki software, and consequently on the importance of a user base, on the creation of an import wizard and templates for reducing the effort required to maintain the database, and on the options to use semantic properties in a wiki. These topics are studied in literature and by means of a questionnaire amongst 70 respondents. One of the findings is that the targeted user group is willing to use a wiki-based database. Other results of this study are an import wizard and a specified content for a materials properties template. After these studies the time had come to put WikID online: www.WikID.eu.
Papers by Alex Olieman