Science Research at VU-UvA: Re-Search

When doing research, you often use data gathered by your predecessors. But how do you know whether the data you need exist, where you can find them, and what their history is? To help with such questions, a brand-new project, Re-Search, has been set up. It was awarded funding by the NWO programme Creative Industry on 5 October 2015 and will commence in early 2016.

10/06/2015 | 4:46 PM

The project is led by Maarten de Rijke. Re-Search makes interesting use of Amsterdam’s landscape, he explains, with UvA, VU Amsterdam and the eHumanities department of the KNAW joining forces with Elsevier, ‘one of the big international scientific publishers, just around the corner from here.’ The project revolves around research data sets, collected, for example, through questionnaires, lab studies, observations and experiments. Researchers share such data sets in scientific publications and digital archives. Re-Search wants to make use of this in three ways, De Rijke explains. ‘We want to make the data findable. We want to find out how we can make it easier to search through data, and how the use of data sets has changed over the years.’

The three participating institutes will each appoint a PhD student to work on this from a particular perspective. VU Amsterdam’s research, for instance, will focus primarily on knowledge representation. ‘How can you semantically enrich research data to make them easier to find?’ UvA researchers have expertise in search engines and in self-learning methods to improve them, De Rijke explains. Searching for research data is a greater challenge than simply searching for articles. ‘Research data sets are defined by how they are used and by the conclusions that follow from that. They build up a sort of curriculum vitae over time, and as with people, that CV defines data sets.’

To illustrate: ‘Research into natural language processing often concerns the recognition of entities in a text: places, buildings, persons. Such a study may begin with a collection of texts in which all entities have been correctly identified by hand. Other researchers then come up with an automatic method and test it using these manually annotated texts. So I may achieve an accuracy of 80 percent, someone else 82 percent in a subsequent study, and so on. The research data set gains more and more value throughout that process.’

As a result, users are not just interested in the original data set; what they especially want to know is what has been done with it since its creation, in what order, and with what results, De Rijke continues. ‘All that has to be made findable.’ What is needed are advanced search engines that dig through the data and automatically generate very accurate and detailed descriptions.

Re-Search partner Elsevier owns websites such as ScienceDirect, where researchers search for articles and research results, and where they will increasingly also be able to find research data sets. This is where the project participants will live-test their new algorithms, something De Rijke is looking forward to. ‘For us as an academic party, it is difficult to set up such an experiment. You don’t only need data sets, but also real users, and you need to be able to see how they use the search engine, what they find interesting and what not.’ The results obtained this way will be useful for information retrieval, De Rijke elaborates, allowing the algorithms to improve themselves continually. ‘But it will also allow us, as researchers, to learn about the needs of users and reveal matters that we had not yet considered. Empirical study always brings surprises. For us, in other words, this is an exciting opportunity for intellectual surprises and enrichment.’
