Big Data, Linked Open Data, LRs and HLT
The ever-increasing quantities of large and complex digital datasets, structured or unstructured, multilingual, multimodal or multimedia, pose new challenges but at the same time open up new opportunities for HLT and related fields. Ubiquitous data- and information-capturing devices, social media and networks, and the web at large, with its big data and knowledge bases and its other platforms for capturing, aggregating and publishing information, provide useful information and knowledge for a wide range of LT applications.
LREC 2014 puts a strong emphasis on the synergies between the big Linked Open Data community and the LRs/LT community, and on their complementarity in cracking LT problems and developing useful applications and services.
LRs in the Collaborative Age
The amount of collaboratively generated and used language data is constantly increasing and it is therefore time to open a wide discussion on such LRs at LREC. There is a need to discuss the types of LRs that can be collaboratively generated and used.
Are lexicons, dictionaries, corpora, ontologies (of language data), grammars, tagsets and data categories all possible fields in which a collaborative approach can be applied? Can collaboratively generated LRs be standardised/harmonised? How can quality control be applied to collaboratively generated LRs? And how can a collaborative approach ensure that less-resourced languages receive the same digital dignity as mainstream languages?
There is also a need to discuss legal aspects related to collaboratively generated LRs. And last but not least: are there different types of collaborative approaches, or is the Wikimedia style the best approach to collaborative generation and use of LRs?