The study reviews state-of-the-art datasets and solutions for automatic fact-checking and tests their applicability in production environments. The authors discovered overfitting issues in those models and proposed a data filtering method that improves model performance and generalization. They then designed an unsupervised fine-tuning procedure for masked language models to improve their accuracy when working with Wikipedia.
This study presents the challenges faced and the solutions adopted while evolving the web-based graphical user interface (GUI) of a tabular data preparation tool from data sets that fit in memory to Big Data sets. Traditional standalone processing and rendering solutions are no longer usable in a Big Data context.
Wikidata is an outstanding data source with potential applications in many scenarios, and it provides its data openly in RDF. This study aims to evaluate the usability of Wikidata as a data source for robots operating on the web of data, according to the specifications and practices of linked data, the Semantic Web, and ontology reasoning.
OSM-interaction tilesets are vector tiles containing GeoJSON features that represent interactions between mappers (contributors) and objects in OpenStreetMap (OSM). Interactions are abstractions of edits to OSM elements called nodes, ways, and relations. Example interactions are contributors “adjusting the corners of a building” or “re-aligning a road” while the edit to the database is recorded as a “modification to the coordinates of a node.”
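Since the tileset's features are plain GeoJSON, a single interaction can be pictured as an ordinary feature object. The sketch below is illustrative only: the property names (`element`, `interaction`, `contributor`, `version`) are assumptions, not the tileset's actual schema.

```python
import json

# A minimal sketch of what one OSM-interaction feature might look like as
# GeoJSON. Property names are illustrative assumptions, not the real schema.
interaction_feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [13.3777, 52.5163],  # lon, lat of the edited node
    },
    "properties": {
        "element": "node",                    # OSM element: node, way, or relation
        "interaction": "modify-coordinates",  # abstraction of the raw edit
        "contributor": "example_mapper",      # hypothetical mapper name
        "version": 3,                         # element version after the edit
    },
}

# GeoJSON features are plain JSON, so they round-trip through the json module:
encoded = json.dumps(interaction_feature)
decoded = json.loads(encoded)
```

The point here is the abstraction the authors describe: the raw database change (a node's coordinates were modified) is recorded in the properties as a human-meaningful interaction.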
In this paper, the authors present a new concept of geospatial quality assurance that is currently planned for implementation at the German Federal Agency for Cartography and Geodesy. Linked open data is being enriched with Semantic Web data in order to create thematic maps relevant to the population.
Data-driven security has become essential in many organisations' attempts to tackle cybersecurity incidents. Whilst the dominant approach to data-driven security remains the mining of private and internal data, there is an increasing trend towards more open data through the sharing of cybersecurity information and experience on public and community platforms. However, questions remain over the quality and quantity of such open data.
The authors describe an approach to improving the quality and interoperability of open data related to small molecules, such as metabolites, drugs, natural products, food additives, and environmental contaminants. The approach involves a computer implementation of an extended version of the IUPAC International Chemical Identifier (InChI) system that utilizes the three-dimensional structure of a compound to generate reproducible compound identifiers (standard InChI strings) and universally reproducible designators for all constituent atoms of each compound.
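For context, a standard InChI string is itself a layered identifier: after the `InChI=` prefix come the version, the molecular formula, and further slash-separated layers, each tagged with a single letter (e.g. `c` for connectivity, `h` for hydrogens). The toy parser below only splits those layers for inspection; it is a sketch, not a substitute for the official IUPAC InChI software, and it does not implement the extended three-dimensional version the paper describes.

```python
def split_inchi_layers(inchi: str) -> dict:
    """Split a standard InChI string into version, formula, and tagged layers.

    Illustrative only: real InChI generation and validation should use the
    official IUPAC InChI software library.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        # each subsequent layer is prefixed by a one-letter tag, e.g. 'c', 'h'
        layers[part[0]] = part[1:]
    return layers

# Ethanol's standard InChI, split into its layers:
ethanol = split_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
```

Running this yields the formula layer `C2H6O`, the connectivity layer `1-2-3`, and the hydrogen layer `3H,2H2,1H3` for ethanol.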
The authors report on the development of a new software tool (AutoQC4Env) for automated quality control (QC) of environmental time series data. Novel features of this tool include a flexible Python software architecture, which makes it easy for users to configure the sequence of tests as well as their statistical parameters, and a statistical concept that assigns each value a probability of being a valid data point.
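The two ideas in that design, a user-configurable sequence of tests and a per-value validity probability, can be sketched in a few lines of Python. Everything below is an assumption for illustration: the test names, their parameters, and the multiplicative combination rule are not taken from AutoQC4Env itself.

```python
from statistics import mean, stdev

def range_test(values, lo, hi):
    """Hard physical-range check: probability 1.0 inside [lo, hi], else 0.0."""
    return [1.0 if lo <= v <= hi else 0.0 for v in values]

def spike_test(values, z_max=3.0):
    """Down-weight values far from the series mean (simple z-score heuristic)."""
    m, s = mean(values), stdev(values)
    probs = []
    for v in values:
        z = abs(v - m) / s if s > 0 else 0.0
        probs.append(max(0.0, 1.0 - z / z_max))
    return probs

def run_qc(values, tests):
    """Run a configured sequence of tests; combine per-value probabilities
    by product (an illustrative combination rule, not the tool's)."""
    combined = [1.0] * len(values)
    for test, params in tests:
        for i, p in enumerate(test(values, **params)):
            combined[i] *= p
    return combined

# A short temperature-like series with one obvious spike at index 3:
series = [10.1, 10.3, 9.9, 55.0, 10.2]
probs = run_qc(series, [(range_test, {"lo": -20, "hi": 50}),
                        (spike_test, {})])
```

Reordering or reparameterizing the `tests` list changes the pipeline without touching the test implementations, which is the kind of configurability the abstract highlights.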
The 2nd International Workshop on Quality of Open Data (QOD 2019) will be held in June 2019 in Seville in conjunction with the 22nd International Conference on Business Information Systems. The goal of the workshop is to bring together different communities working on quality in Wikipedia, DBpedia, Wikidata, OpenStreetMap, Wikimapia, and other open knowledge bases. The workshop calls for sharing research experiences and knowledge related to quality assessment in open data.
Open datasets provide one of the most popular ways to acquire insight and information about individuals, organizations and multiple streams of knowledge. Exploring open datasets with comprehensive and rigorous data-processing techniques can provide the ground for innovation and value for everyone, provided the data are handled in a legal and controlled way. In this study, the authors propose an argumentation and abductive reasoning approach to data processing that is based on the data quality background.
