{"id":29798,"date":"2018-07-13T12:28:05","date_gmt":"2018-07-13T10:28:05","guid":{"rendered":"https:\/\/blog.wikimedia.de\/?p=29798"},"modified":"2018-07-13T12:28:05","modified_gmt":"2018-07-13T10:28:05","slug":"wikibase-workshop-in-berlin","status":"publish","type":"post","link":"https:\/\/blog.wikimedia.de\/2018\/07\/13\/wikibase-workshop-in-berlin\/","title":{"rendered":"Wikibase Workshop in Berlin"},"content":{"rendered":"
by Thomas Arrow<\/strong><\/p>\n Following a first Wikibase workshop in Antwerp<\/a>, a follow-up meeting was held in Berlin at the WMDE offices<\/a>. It was funded by the European Research Council and focussed on the modelling of grant data in Wikibases.<\/p>\n Participants of the Wikibase workshop in Berlin. Photo by Lisa-Marie K\u00f6hler, CC BY-SA 4.0<\/p><\/div>\n The workshop started at Sunday lunchtime with an afternoon of talks setting the scene for a variety of topic areas:<\/p>\n A talk by Diego from the ERC was first. He talked about his past work on modelling grant data and the challenges he had encountered.<\/p>\n Next came a talk on \u201cFederation first\u201d from Andra Waagmeester (Wikimedia volunteer, member of the Gene Wiki Project) about how he sees Wikibase federation, which he summed up succinctly as \u2018SPARQL\u2019: federation between graph database endpoints. This was followed by a description by Lydia Pintscher (WMDE) of all the different types of federation she could envision, including Andra\u2019s interpretation, and of how close or far we were from seeing each of these types within the Wikibase ecosystem.<\/p>\n Next up was a talk about ShEx from Eric Prud’hommeaux, who is part of the W3C Shape Expressions (ShEx) Community Group. He described how Shape Expressions could be used to validate data stored in places like Wikibase. He also showed an in-browser ShEx validator that can indicate whether a given graph meets the constraints of a particular shape expression.<\/p>\n We then heard about the FAIR data principles and how they could work with Wikidata and federated Wikibases.<\/p>\n After a short break we heard about the OpenAIRE project, which has an API providing open data on EU-funded research projects and their outputs.<\/p>\n Raz Shuty of Wikimedia Deutschland gave a short talk about the continued development of a tool called bubber. 
He explained that it was still under development, but that the goal was to provide a click-through interface for generating the configuration file needed to set up a containerized Wikibase.<\/p>\n Daniel Mietchen and Tom Arrow then presented another outcome of the earlier Antwerp workshop: the Wikibase Registry, a Wikibase that stores information about other Wikibases. They encouraged the audience to add their own setup to it if it was not yet listed.<\/p>\n To finish the day we heard about methods of loading data into Wikidata or another Wikibase. First, Gregg Stupp presented the WikidataIntegrator<\/a> tool, which has been used extensively by the Gene Wiki project. He explained that he had recently adapted it to work with Wikibases other than Wikidata. Antonin Delpeuch then told us about OpenRefine<\/a>, a tool he works on as a volunteer developer. It provides a graphical interface for loading data into Wikidata, and he said he was keen to adapt it to work with arbitrary Wikibases if possible.<\/p>\n Monday started bright and early, and we split into groups to work on different topic areas for the next day and a half.<\/p>\n We had one group working on importing data about grants into Wikidata using OpenRefine. Specifically, they worked on a small dataset of researchers and metadata about them, e.g. ORCID iDs and Scopus author IDs. They used OpenRefine to \u2018reconcile\u2019 this dataset against the parts of it that already existed on Wikidata.<\/p>\n Another area that was worked on was linking a variety of datasets that already exist on Wikidata to their funding sources. For example, they looked at cell lines and linked them to the publications describing their discovery, using SPARQL on the Wikidata Query Service. A similar strategy, but from \u2018the other direction\u2019, was employed by searching for scientific papers on Wikidata whose Main Subject<\/i> was a piece of scientific software. 
This could then help determine who funded that software and how.<\/p>\n The second group worked in great detail on a project that was eventually titled DIEGO (Data Integration Extension for Grants Ontology), a detailed graph model describing the funding of projects. They expressed the outcome in ShEx and could then use the online validator shown on Sunday by Eric Prud’hommeaux to validate examples taken directly from Wikidata against it. The model could be summed up verbally as: \u201cFunders empower bureaucrats, who provide money, in partial payments, to projects, which have participants, to attain given goals, possibly in collaboration with other projects.\u201d<\/p>\n The third group worked on the WikidataIntegrator tool previously presented by Gregg Stupp. They were keen to adapt it to work on a wider range of papers than before: it used to work only on papers available in PubMed or PubMed Central, and they succeeded in producing a working version that retrieves data on papers from Crossref.<\/p>\n A fourth group worked on infrastructure, thinking about the practicalities of a world in which many people run many Wikibases. They created a new Wikibase on the WMF Cloud VPS infrastructure to store data about grants and funders that may be too fine-grained for the Wikidata community to want to curate. This was named ORIG (Open Research Impact Graph).<\/p>\n
<\/a>