LiRA-NLG 2017
Linguistic Resources for Automatic Natural Language Generation Workshop
@ INLG 2017 Conference, Santiago de Compostela, Spain
Workshop Date: September 4, 2017
Conference Dates: September 5-7, 2017
Workshop Proceedings are included in ACL Anthology.
Important dates
Abstract Submission deadline: June 2, 2017 (11:50pm CET)
Acceptance notification: July 3, 2017
Submission of camera ready papers: July 20, 2017
Workshop: September 4, 2017
Programme
Monday, September 4, 2017, at the School of Engineering (ETSE), Universidade de Santiago de Compostela
Location: Campus Vida, Lope Gómez de Marzoa Street, Room A2
Time | Authors | Paper |
---|---|---|
8:30 - 9:00 | Registration (Entrance Hall) | |
Chair | Peter Machonis | |
9:00 - 9:10 | LiRA @ NLG Welcomes you | |
9:10 - 9:35 | Max Silberztein | From FOAF to English: Linguistic Contribution to Web Semantics. Abstract: This paper presents a linguistic module capable of generating a set of English sentences that correspond to a Resource Description Framework (RDF) statement; I discuss how a generator can control the linguistic module, as well as the various limitations of a pure linguistic framework. |
9:35 - 10:00 | Silvia García Méndez, Milagros Fernández Gavilanes, Enrique Costa Montenegro, Jonathan Juncal Martínez and Francisco Javier González Castaño | Lexicon for Natural Language Generation in Spanish Adapted to Alternative and Augmentative Communication. Abstract: In this paper we present Elsa, the first lexicon for Spanish with morphological, syntactic and semantic information automatically generated from a well-known pictogram resource and especially tailored for Augmentative and Alternative Communication (AAC). This lexicon, focusing on that specific icon set widely used within AAC applications, is motivated by the need to improve Natural Language Generation (NLG) systems to aid people diagnosed with communication disorders. In addition, we design an automatic lexicon extension procedure by means of a training process to complete the linguistic data. For this we used a dataset composed of novels and tales in Spanish, with pictogram representations, since the lexicon is meant for AAC applications for children with disabilities. Moreover, we provide the algorithms used to build our lexicon and a use case of Elsa within an NLG system to observe the usability of our proposal. |
10:00 - 10:25 | Essia Bessaies, Slim Mesfar and Henda Ben Ghezala | Generating Answering Patterns from Factoid Arabic Questions. Abstract: This work deals with Arabic factoid Question Answering (QA) systems. Commonly, the task of QA is divided into three phases: question analysis, answer pattern generation, and answer extraction. Each phase plays a crucial role in overall performance. In this paper, we focus on the first two phases: Question Analysis and Answer Pattern Generation. We used the NooJ platform, which represents a valuable linguistic development environment. The first evaluations show that the actual results are encouraging and could be deployed for more types of questions other than factoid ones. |
10:25 - 10:50 | Kristina Kocijan, Božo Bekavac and Krešimir Šojat | Language Generation from DB Query. Abstract: This paper demonstrates how to generate natural language sentences from the pieces of data found in databases in the domain of flight tickets. By using NooJ to add context to specific customer data found in customer data sets, we are able to produce sentences that give a short textual summary of each customer, providing a list of possible suggestions on how to proceed. In addition, due to the rich morphology of Croatian, we pay special attention to matching gender, number and case information where appropriate. Thus we are able to provide individualized and grammatically correct text regardless of the customer's gender or the number of tickets bought and inquiries made. We believe that such short NL overviews can help ticket sellers get a quicker assessment of the type of customer and allow for the exchange of information with more confidence and greater speed. |
11:00 - 11:30 | Coffee break | |
Chair | Kristina Kocijan | |
11:30 - 11:55 | Peter Machonis | Using Electronic Dictionaries and NooJ to Generate Sentences Containing English Phrasal Verbs. Abstract: This paper attempts to explore NooJ’s “generation” mode to automatically produce transformations of sentences containing English Phrasal Verbs (PV). We exploit the same electronic dictionary and grammar previously used to recognize PV in large corpora (Machonis 2010, 2012), but have had to design a specific grammar for generating sentences, following the examples in Silberztein (2016), which showed how NooJ could generate over two million transformations or parallel sentences from the simple sentence Joe likes Lea. We created a grammar that can generate variations of a single phrase containing one of the PV found in the NooJ PV dictionary. For the moment the grammar only handles singular nouns in the present and past tense, but it is capable of applying a succession of transformations (particle movement, preterit, negation, clefting, modal insertion, aspect introduction, question formation, and passive voice, along with various combinations of these transformations) to over 1,200 PV from the electronic dictionary. |
11:55 - 12:20 | Hela Fehri and Sondes Dardour | Generating Text with Correct Verb Conjugation: Proposal for a New Automatic Conjugator with NooJ. Abstract: This paper describes a system that generates texts with correct verb conjugation. The proposed system integrates a conjugator developed using a linguistic approach. The latter is based on dictionaries and transducers built with the NooJ linguistic platform. The conjugator treats three languages: Arabic, French and English. It recognizes all verbs and allows their conjugation in different tenses. The results obtained are satisfactory and can easily be improved upon by processing other forms, such as the negative. |
12:20 - 12:45 | Jouda Ghorbel | Formalization of Speech Verbs with NooJ for Machine Translation: the French Verb accuser. Abstract: The mediocrity of sentences generated by online translators prompts us to try to find a solution to have more reliable translations. This is a very difficult task due to the ambiguity of natural languages and especially the deficiencies of translation systems in terms of syntactic and semantic knowledge. How can we make automatic translation more reliable and unambiguous? Our main objective will be to generate a text where the translation of French verbs into Arabic will be without ambiguities. In this contribution, we attempt to formalize a particular class of verbs, namely the so-called verbs of speech. We shall limit ourselves to the treatment of the verb accuser ‘to accuse’ as presented in the Dubois & Dubois-Charlier (1997) electronic dictionary, Les verbes français. We shall take this verb as a prototype to show how NooJ can perform a reliable machine translation and generate a good text without ambiguities. |
12:45 - 13:10 | Ikram Bououd and Rania Fafi | Using Serious Games to Correct French Dictations: Proposal for a New Unity3D/NooJ Connector. Abstract: The remarkable growth in serious game use has gradually pushed them to be present in every single domain. However, in language learning we did not find any reliable games developed for dictation exercises, commonly used for the teaching of French. This involves natural language processing in the form of an interactive game that can automatically generate corrections and assess game users. In order to fill this research gap, we propose to take advantage of the assets provided by the NooJ platform and develop a game combining NooJ and the 3D game platform Unity3D. |
13:10 - 13:35 | Ritamari Bucciarelli and Raffaele Marcone | Linguistic Resources for Automatic Natural Sign Language Generation. Abstract: Work-Tools (WT) is a linguistic resource for automatic natural sign language generation: software for automatic textual analysis that describes and transforms language into morphemes, lexemes, and fixed phrases. It involves building a communicative model of switching from non-verbal natural languages (L1) to verbal languages (L2). The WT software is structured for complex activities in natural language such as "parsing" to recognize and generate texts. It provides a man-machine interaction in the production, questioning, and evaluation of the construction of texts. It is used for didactic purposes and aids in the transformation of languages. WT consists of: (1) a search engine or database for data entry, in which the data described for cognitive areas are transformed into acronyms and indexed; (2) a writing corpus in which we organize the text in free sentences and fixed phrases, in accordance with the grammatical rules and the syntactic relations of the natural language that we propose with the data transfer from L1 to L2; (3) a transcoding corpus and faithful translation of text or textual parts from L1 to L2; and (4) a scroll bar where new text is transmitted in real time. |
13:35 | Max Silberztein | LiRA@NLG Workshop Wrap-Up |
Call for papers
In conjunction with INLG 2017 in Santiago de Compostela, we are organizing a half-day workshop, LiRA-NLG (Linguistic Resources for Automatic Natural Language Generation).
This workshop aims to bring together linguists who are interested in developing large-coverage linguistic resources and researchers with an interest in developing real-world NLG software. These two communities have been working separately for many years: NLG researchers are typically more focused on technical issues specific to text generation, where good performance (e.g. recall and precision) is crucial, whereas linguists tend to focus on problems related to the development of exhaustive and precise resources that are mainly 'neutral' vis-a-vis any NLP application (e.g. parsing or generating sentences), using various grammatical formalisms such as NooJ, TAG or HPSG.
However, recent progress in both fields is reducing many of these differences, with large-coverage linguistic resources increasingly used by robust NLP software. For instance, NLG researchers can now use large dictionaries of multiword units and expressions, and several linguistic experiments have shown the feasibility of using large phrase-structure grammars (a priori used for text parsing) in 'generation' mode, to automatically produce paraphrases of sentences that are described by grammars.
By encouraging members of both communities to discuss work on related topics with each other, we hope to move towards a better joint understanding of the problems involved. This workshop focuses on the following questions:
- How can we develop 'neutral' linguistic resources (dictionaries, morphological, phrase-structure and transformational grammars) that can be used both to parse and to generate texts automatically?
- Is it possible to generate grammatical sentences by using linguistic data alone, i.e. with no statistical methods to remove ambiguities? What are the limitations of rule-based systems, as opposed to stochastic ones?
Topics can relate to any aspect of NLG, such as:
- large-coverage linguistic resources
- lexicalization
- machine translation
- NLG for real-world applications
- paraphrase generation
- phraseology of specialized languages
- rule-based approaches to generation
- comparison between rule-based and statistical approaches to NLG
- surface realization
- text-to-text generation and summarization
- transformational analysis and generation.
We encourage participants to submit papers to the general INLG 2017 conference as well.
Submission Information
Authors are invited to submit short papers describing original, unpublished work, be it completed or in progress. Papers should be at most 2 pages of main content, with additional pages allowed for references and appendices. All accepted papers will be presented as talks.
Abstract submission will be electronic in PDF format through the EasyChair conference management system.
The abstract submission page will close on June 2nd, 2017 at 23:00 Standard European Time.
For full papers, please use the INLG text formatting style.
Workshop Proceedings are included in ACL Anthology.
Reviewing Policy
Reviewing will be single-blind, so authors do not need to conceal their identity. The paper should include the authors' names and affiliations. Self-references are also allowed.
Registration
- Regular: EUR 75 (early); EUR 125 (late); EUR 150 (on-site)
- Student: EUR 50 (early); EUR 75 (late); EUR 100 (on-site)
- Student helpers: free registration
For more information on registration fees and how to become a student helper, refer to the INLG website.
Workshop Organizers
- Kristina Kocijan, Assistant Professor of Information and Communication Sciences, University of Zagreb (Croatia)
- Peter Machonis, Professor of French and Linguistics, Florida International University (USA)
- Max Silberztein, Professor of Computer Science and Linguistics, Université de Franche-Comté (France)
Scientific Committee
- Héla Fehri (University of Gabes, Tunisia)
- Yuras Hetsevich (United Institute of Informatics Problems, Belarus)
- Kristina Kocijan (University of Zagreb, Croatia)
- Elena Lloret Pastor (Universidad de Alicante, Spain)
- Peter Machonis (Florida International University, USA)
- Slim Mesfar (University of Carthage, Tunisia)
- Simon Mille (Universitat Pompeu Fabra, Spain)
- Max Silberztein (Université de Franche-Comté, France)
More information
- Information about the host conference can be found on the INLG 2017 web page
- Accommodation: see the INLG 2017 web page
- Venue: see the INLG 2017 web page