Open Data Quality

Research and publications on the quality of open data

WikiCheck: An End-to-End Open Source Automatic Fact-Checking API Based on Wikipedia

The study reviews state-of-the-art datasets and solutions for automatic fact-checking and tests their applicability in production environments. The authors discovered overfitting issues in these models and proposed a data filtering method that improves the models' performance and generalization. They then designed an unsupervised fine-tuning procedure for masked language models to improve their accuracy when working with Wikipedia.

Read More

Enhancing the Interactive Visualisation of a Data Preparation Tool from in-Memory Fitting to Big Data Sets

This study presents the challenges faced and the solutions adopted while evolving the web-based graphical user interface (GUI) of a tabular data preparation tool from in-memory fitting to Big Data sets. Traditional standalone processing and rendering solutions are no longer usable in a Big Data context.

Read More

Technical Usability of Wikidata’s Linked Data

Wikidata is an outstanding data source with potential applications in many scenarios, and it provides its data openly in RDF. This study evaluates the usability of Wikidata as a data source for robots operating on the web of data, according to the specifications and practices of linked data, the Semantic Web, and ontology reasoning.
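
One linked-data practice a robot relies on is dereferencing an entity IRI and content-negotiating an RDF serialization. The sketch below builds (but does not send) such a request with the standard library; the Wikidata entity base IRI is real, while Q42 is just a well-known example entity, and actually fetching the data would of course require network access.

```python
from urllib.request import Request

# Wikidata entity IRIs share this base; dereferencing one with an RDF
# Accept header should yield the entity's data in that serialization.
ENTITY_BASE = "http://www.wikidata.org/entity/"

def rdf_request(qid: str, rdf_format: str = "text/turtle") -> Request:
    """Build (but do not send) an HTTP request asking Wikidata for an
    RDF representation of an entity via content negotiation."""
    return Request(ENTITY_BASE + qid, headers={"Accept": rdf_format})

req = rdf_request("Q42")
print(req.full_url)              # http://www.wikidata.org/entity/Q42
print(req.get_header("Accept"))  # text/turtle
```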

Read More

Analyzing OpenStreetMap Contributions at Scale: Introducing OSM-Interactions Tilesets

OSM-interaction tilesets are vector tiles containing GeoJSON features that represent interactions between mappers (contributors) and objects in OpenStreetMap (OSM). Interactions are abstractions of edits to OSM elements called nodes, ways, and relations. Example interactions are contributors “adjusting the corners of a building” or “re-aligning a road” while the edit to the database is recorded as a “modification to the coordinates of a node.”
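
To make the abstraction concrete, here is a hypothetical sketch of what a single decoded interaction feature could look like as GeoJSON. The property names and values are illustrative assumptions, not the tileset's actual schema.

```python
import json

# Illustrative (not schema-accurate) GeoJSON feature for one mapper
# interaction with an OSM element. The underlying database edit might
# be "modification to the coordinates of a node", while the interaction
# abstracts it as, e.g., a geometry change such as re-aligning a road.
interaction = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [13.4, 52.5]},
    "properties": {
        "osm_type": "node",                # nodes, ways, or relations
        "osm_id": 123456789,
        "interaction": "geometry_change",  # hypothetical label
        "contributor": "example_mapper",   # hypothetical username
        "version": 4,
    },
}

# GeoJSON is plain JSON, so the feature round-trips losslessly.
print(json.dumps(interaction)[:40])
```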

Read More

Semantic Data Integration and Quality Assurance of Thematic Maps in the German Federal Agency for Cartography and Geodesy

In this paper, the authors present a new concept of geospatial quality assurance that is currently planned for implementation at the German Federal Agency for Cartography and Geodesy. Linked open data is enriched with Semantic Web data in order to create thematic maps relevant to the population.

Read More

Evaluating the Quantity of Incident-Related Information in an Open Cyber Security Dataset

Data-driven security has become essential in many organisations' attempts to tackle cyber security incidents. Whilst the dominant approach to data-driven security remains the mining of private, internal data, there is an increasing trend towards more open data through the sharing of cyber security information and experience over public and community platforms. However, questions remain over the quality and quantity of such open data.

Read More

Approach to Improving the Quality of Open Data in the Universe of Small Molecules

The authors describe an approach to improving the quality and interoperability of open data related to small molecules, such as metabolites, drugs, natural products, food additives, and environmental contaminants. The approach involves a computer implementation of an extended version of the IUPAC International Chemical Identifier (InChI) system that uses the three-dimensional structure of a compound to generate reproducible compound identifiers (standard InChI strings) and universally reproducible designators for all constituent atoms of each compound.
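
Standard InChI strings are layered, "/"-separated identifiers, which is the structure any extension of the system builds on. The sketch below splits a well-known example (ethanol) into its layers; the parsing function is an illustrative helper, not part of the official InChI software.

```python
# Standard InChI for ethanol: formula, connectivity ("c") and
# hydrogen ("h") layers separated by "/".
ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

def inchi_layers(inchi: str) -> dict:
    """Split a standard InChI string into its named layers.
    Illustrative helper only; real InChI handling should use the
    official IUPAC software."""
    header, formula, *rest = inchi.split("/")
    layers = {"header": header, "formula": formula}
    for layer in rest:
        layers[layer[0]] = layer[1:]  # e.g. 'c' -> connectivity table
    return layers

print(inchi_layers(ethanol)["formula"])  # C2H6O
```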

Read More

A New Tool for Automated Quality Control of Environmental Time Series (AutoQC4Env) in Open Web Services

The authors report on the development of a new software tool (AutoQC4Env) for automated quality control (QC) of environmental time series data. Novel features of this tool include a flexible Python software architecture, which makes it easy for users to configure the sequence of tests as well as their statistical parameters, and a statistical concept that assigns each value a probability of being a valid data point.
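
The general idea of a configurable test sequence with per-value validity probabilities can be sketched in a few lines. The test names, parameters, and the multiplicative combination below are illustrative assumptions, not AutoQC4Env's actual API.

```python
from statistics import mean, stdev

# Each QC test maps a time series to one validity probability per value;
# a pipeline combines the tests' probabilities (here by multiplication).

def range_test(values, lo, hi):
    """Hard physical-plausibility bounds: outside the range -> invalid."""
    return [1.0 if lo <= v <= hi else 0.0 for v in values]

def spike_test(values, z_max=3.0):
    """Soft z-score test: strong outliers keep only a small probability."""
    mu, sigma = mean(values), stdev(values)
    return [1.0 if sigma == 0 or abs(v - mu) / sigma <= z_max else 0.2
            for v in values]

def run_qc(values, tests):
    """Apply a user-configured sequence of tests and combine results."""
    probs = [1.0] * len(values)
    for test in tests:
        probs = [p * q for p, q in zip(probs, test(values))]
    return probs

series = [20.1, 20.3, 19.8, 85.0, 20.0]  # one implausible reading
probs = run_qc(series, [lambda v: range_test(v, -40, 60), spike_test])
print(probs)
```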

Read More

Special Issue “Quality of Open Data”

The 2nd International Workshop on Quality of Open Data (QOD 2019) will be held in June 2019 in Seville in conjunction with the 22nd International Conference on Business Information Systems. The goal of the workshop is to bring together different communities working on quality in Wikipedia, DBpedia, Wikidata, OpenStreetMap, Wikimapia, and other open knowledge bases. The workshop calls for sharing research experiences and knowledge related to quality assessment in open data.

Read More

Access Control and Quality Attributes of Open Data: Applications and Techniques

Open datasets provide one of the most popular ways to acquire insight and information about individuals, organisations, and multiple streams of knowledge. Exploring open datasets with comprehensive and rigorous data-processing techniques can provide the ground for innovation and value for everyone, if the data are handled in a legal and controlled way. In this study, the authors propose an argumentation and abductive reasoning approach to data processing that is based on the data quality background.

Read More


© 2017-2020 Open Data Quality