Hassan Sawaf, eBay Inc., USA
Hassan Sawaf has more than 18 years of experience in research, development, management, and strategic planning of Human Language Technology. He was a senior researcher at the University of Aachen and co-founder and CEO of the German company AIXPLAIN AG, a spin-off of the University that was acquired by AppTek. As Chief Scientist at AppTek, then as Chief Scientist at SAIC, his team developed hybrid machine translation systems and automated speech recognition engines for media monitoring, telephony, and handheld devices. In 2011, Hassan became “SAIC Technical Fellow” for his work in technology thought leadership, and management of research and development in human language technology and machine learning. Last year, Hassan joined eBay to set up, grow, and lead the new machine translation team.
Language Technology for Commerce, the eBay Way - Hassan Sawaf, on Thursday 29 May at 9:00
Machine Translation and Human Language Technology play a key role in expanding the eBay user experience to other countries. But eBay has to use MT very differently from most other companies, so a range of challenges arise. These include the amount, complexity, and type of data, as well as the expectations on speed and the notion of what a “good” translation is. Hassan will present an overview of eBay’s work in research on language resource management, computational linguistics, machine learning for language technology, machine translation and evaluation.
Thórhallur Eythórsson, University of Iceland
Icelandic Quirks: Testing Linguistic Theories and Language Technology, on Thursday 29 May at 13:10
Linguists working on Icelandic have brought to the fore a number of important empirical facts that at the time of their initial discussion in the theoretical literature were believed to be crosslinguistically very rare, even unattested. Among such “quirks” are the following syntactic phenomena:
- Oblique (“quirky”) subjects (Andrews 1976, Thráinsson 1979)
- Stylistic Fronting (Maling 1980)
- Long Distance Reflexivization (Thráinsson 1979)
- Object Shift of full NPs (Holmberg 1986)
- The Transitive Expletive Construction (Ottósson 1989, Jonas & Bobaljik 1993)
- The New Passive (New Impersonal) (Maling & Sigurjónsdóttir 2001, Eythórsson 2008)
These phenomena provided a testing ground for various theoretical models because they contradicted conventional views on the nature of grammatical categories and syntactic structure; some even went as far as claiming that Icelandic is “not a natural language”. This pessimistic view was authoritatively examined and dismissed by Thráinsson (1996).
The present paper takes the issue one step further, by showing how the discovery of various linguistic structures of Icelandic has led to the recognition of similar facts in other (Germanic, Indo-European and even unrelated) languages, where they had previously gone unnoticed, or had at least not been problematized in terms of linguistic theory. For example, the insight that syntactic subjects can have a morphological case other than nominative was not generally acknowledged until after the oblique subject hypothesis had been proposed for Icelandic. As a consequence, earlier theories on the relation between case and grammatical function had to be revised. Thus, numerous descriptive facts from Icelandic have advanced theoretical linguistics, in that any model of natural language must take them into account.
In addition to their synchronic status, the syntactic phenomena listed above raise questions about the historical development of such “quirks”. On the one hand, Icelandic is known to be a “conservative” language that has preserved many archaic features; on the other hand, despite its relative stability, numerous innovations are known to have taken place in Icelandic, including a number of syntactic changes. Fortunately, we are now in a position to map, at least to a certain degree, the diachrony of Icelandic syntax from the earliest attested documents in the 12th century AD until the present day. This is in particular due to the existence of the Icelandic Parsed Historical Corpus (IcePaHC; Wallenberg et al. 2011), which is currently being put to use in work on Icelandic diachronic syntax. Among other things, this research tool is invaluable in distinguishing between archaisms and innovations in Icelandic syntax. A further corpus, Greinir skáldskapar (“Analyzer of Poetry”) (Karlsson et al. 2012), is particularly useful for the analysis of the syntax of the earliest poetic texts of Icelandic.
In conclusion, the above “quirks” present a challenge both to Linguistic Theory and Language Technology. This paper illustrates, by means of selected examples, how this challenge has been successfully met and how advances in linguistic research proceed in a constant interplay between description and theorizing.
Luc Steels, ICREA, IBE (UPF-CSIC) Barcelona, Spain and VUB AI Lab Brussels, Belgium
Luc Steels is currently an ICREA research professor at the Universitat Pompeu Fabra in Barcelona, working at the Institut de Biologia Evolutiva. He originally studied linguistics at the University of Antwerp (Belgium) and computer science and artificial intelligence at MIT (US). In 1983 he became founding director of the Artificial Intelligence Laboratory of the University of Brussels (VUB) and in 1996 the founding director of the Sony Computer Science Laboratory in Paris. Steels has worked in many areas of AI, from knowledge-based systems to robotics, but over the past decade he has focused on the question of the origins and evolution of language, trying to set up computer simulations and robotic experiments in which new languages emerge through situated embodied interactions. With his team, he has been building various language technologies to be able to do these experiments, including a new computational formalism called Fluid Construction Grammar that attempts to operationalise aspects of construction grammar.
When will robots speak like you and me? - Luc Steels, on Friday 30 May at 9:00
The incredible growth in language resources has led to unprecedented opportunities for language research, and a lot can still be done by exploiting existing corpora and statistical language processing techniques. Nevertheless, we should also remain ambitious. We should try to keep forging ahead with fundamental research, trying to tackle new application areas and improving existing applications by more sophisticated linguistic theories and language processing systems.
This talk reports on work in our group on grounded language interaction between humans and robots. This problem is extraordinarily difficult because we need to figure out how to achieve true language understanding, i.e. deep language parsing coupled to a semantics grounded in the sensori-motor embodiment of robots, and true language production, i.e. planning what to say, conceptualising the world for language and translation into utterances. We also need to figure out how artificial agents can cope with highly ungrammatical and fragmentary input by full exploitation of the context. On top of that, we can no longer view language as a static system of conventions but as a living system that is always changing and evolving, with new or shifting word senses and new or shifting usage of grammatical constructions. This implies that artificial speakers and listeners need to constantly learn, expand their language when needed, align themselves to the language use of others, and act as tutors to help others understand and acquire language.
I will present some of the key ideas that we are currently exploring to tackle these enormously challenging issues. They include a novel computational formalism called Fluid Construction Grammar, which is an attempt to operationalise key insights from construction grammar, cognitive linguistics and embodied semantics. Flexible language processing and learning is implemented using a meta-level in which diagnostics detect anomalies or gaps and repair strategies try to cope with them by ignoring ungrammaticalities or expanding the language system. We have also developed techniques for studying language as a complex adaptive system and conducted several experiments on how vocabularies and grammars can emerge in situated embodied interactions between robotic agents. The talk is illustrated with live demos and video clips of robots playing language games.