Natural language processing and information retrieval
Enormous quantities of text are generated every day by online applications such as messaging apps, social media, blogs, and online publishing platforms. In addition, huge quantities of text are available through more conventional channels such as public policies (laws and regulations), academic publications, technical documentation (manuals), and clinical records.

Natural language processing

Traditionally in NLP, words have been represented as discrete and static units of meaning. This made it technically difficult to represent the fact that some words are related to each other by similar meaning, and to make use of this information in a computational system. Distributed word representations help to overcome this problem by representing words as numerical vectors, so that each word can be conceived of as a point in a multi-dimensional semantic space in which words with similar meanings lie close together. It is worth noting that a similar technique had long been used in IR to represent documents, which probably contributed to the early successes of IR as opposed to the slow progress of NLP.
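As a rough illustration of the idea of words as points in a semantic space, the following minimal Python sketch compares words by the cosine of the angle between their vectors. The three-dimensional vectors and the names `TOY_VECTORS`, `cosine_similarity` and `most_similar` are invented for this example and are not taken from the group's work; real word vectors are learned from large text collections and typically have hundreds of dimensions.

```python
import math

# Toy 3-dimensional word vectors. The values are invented for illustration only;
# real embeddings are learned from large corpora.
TOY_VECTORS = {
    "doctor": [0.9, 0.8, 0.1],
    "nurse": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors=TOY_VECTORS):
    """Return the other word whose vector is closest to `word` by cosine similarity."""
    return max(
        (other for other in vectors if other != word),
        key=lambda other: cosine_similarity(vectors[word], vectors[other]),
    )


if __name__ == "__main__":
    # With these toy values, "doctor" is closer to "nurse" than to "banana".
    print(most_similar("doctor"))
```

With a discrete representation, "doctor" and "nurse" would simply be two different symbols; with vectors, their relatedness becomes a quantity that a system can compute and use.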
Deep learning uses multi-layered neural networks to process the information provided by words and sentences (represented as word vectors). Increasingly sophisticated network architectures have brought significant progress, to the point that some models now perform better than humans at certain pre-defined language comprehension tasks.
The NLP research group at IDSIA specializes in applications of these recent techniques to practical problems such as browsing and extracting medical knowledge from scientific literature, or analysing social media streams to detect fake news.

Information retrieval

The IR research group, on the other hand, is working on the use of advanced text analysis and term weighting techniques for the detection and tracking of mental health disorders in social media. More specifically, the group developed a test collection, an evaluation methodology and several effectiveness metrics for the temporal tracking of the onset of such disorders, which are currently used by tens of research groups worldwide in the context of CLEF (Cross Language Evaluation Forum). The group also studies how to model the language used by people affected by these mental health disorders, for instance by means of the automatic generation of text showing symptoms of a mental health disorder.
A parallel line of research that the IR group is currently pursuing is related to the general area of Mobile IR, in which the group has been active for many years in the context of several past projects (Crestani, 2017). Currently the group is working on Conversational IR as a way to enhance Mobile IR. In this context, the group is exploring new deep learning models for the generation of clarifying questions that will make it possible for a conversational search system to interact in a multi-turn way with a mobile user (Aliannejadi, 2019; Sekulic, 2021).