
Columns-Property Annotation (CPA) Challenge

SemTab: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching


NEWS: Please join our discussion group and visit our website.

This is a task of the ISWC 2020 SemTab challenge (Semantic Web Challenge on Tabular Data to Knowledge Graph Matching). The task is to annotate column pairs (two ordered columns) within a table with properties defined by a knowledge graph (KG) such as DBpedia and Wikidata.

Task Description

Each submission should contain at most one property annotation per column pair, where a column pair is identified by a table ID, a head column ID and a tail column ID. Each column pair should be annotated with one property that is as fine-grained as possible while still being correct. Both object properties and data properties are possible. The annotation should be the property's full URI; the URI is NOT case-sensitive.

Briefly, each line of the submission file should include "table ID", "head column ID", "tail column ID", and "property URI". The header should be excluded from the submission file. Here is a one-line example for DBpedia: "50245608_0_871275842592178099","0","1","http://dbpedia.org/ontology/releaseDate". Another example for Wikidata: "KIN0LD6C","0","1","http://www.wikidata.org/prop/direct/P131".

Notes:

1) Table ID is the filename of the table data, but does not include the extension.

2) Column ID is the position of the column in the table file, starting from 0, i.e., first column’s ID is 0.

3) At most one property should be annotated for one column pair.

4) One submission file should have NO duplicate annotations for one column pair.

5) Annotations for column pairs outside the targets are ignored.
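The format and notes above can be sketched in Python as follows. This is a minimal illustration, not an official tool: the function name write_submission is hypothetical, the second property URI in the sample is illustrative only, and keeping the first annotation for a repeated column pair is an assumption about how to satisfy the no-duplicates rule.

```python
import csv
import io

def write_submission(annotations, out):
    """Write (table_id, head_col, tail_col, property_uri) rows with no header.

    Rows with an empty URI are skipped, and only the first annotation
    seen for each column pair is kept (an assumption of this sketch).
    """
    # QUOTE_ALL puts every field in double quotes, as in the examples above.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    seen = set()
    for table_id, head_col, tail_col, uri in annotations:
        key = (table_id, int(head_col), int(tail_col))
        if not uri or key in seen:
            continue
        seen.add(key)
        writer.writerow([table_id, key[1], key[2], uri])

rows = [
    ("KIN0LD6C", 0, 1, "http://www.wikidata.org/prop/direct/P131"),
    # Second annotation for the same pair (illustrative URI): dropped.
    ("KIN0LD6C", 0, 1, "http://www.wikidata.org/prop/direct/P17"),
    # Empty annotation: dropped.
    ("KIN0LD6C", 0, 2, ""),
]
buffer = io.StringIO()
write_submission(rows, buffer)
submission_text = buffer.getvalue()
```

To write a real file, pass a file opened with open(path, "w", newline="") instead of the in-memory buffer.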

Datasets

Table set for Round #1: Tables, Target Column Pairs, KG: Wikidata

Table set for Round #2: Tables, Target Column Pairs

Table set for Round #3: Tables, Target Column Pairs

Table set for Round #4: Tables, Target Column Pairs

Data Description: The tables for Round #1 are generated from Wikidata (Version: March 5, 2020). Each table is stored in its own CSV file, and each line corresponds to a table row. Note that the first row may be either the table header or content. The column pairs for annotation are saved in a separate CSV file.
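A minimal Python sketch of loading this data is shown below. The layout of the target file (table ID, head column ID, tail column ID, no header) is an assumption based on the submission format, and the table ID "TABLE01" and the cell values are made up for illustration. Because the first row may be a header or content, the sketch keeps it and leaves that decision to the caller.

```python
import csv
import io

def read_table(text):
    """Parse one table's CSV text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text)))

def read_targets(text):
    """Parse target lines, assumed to be: table_id, head_col, tail_col."""
    return [(t, int(h), int(c)) for t, h, c in csv.reader(io.StringIO(text))]

def column_pair(rows, head_col, tail_col):
    """Return (head, tail) cell values for every row, first row included.

    Rows too short to contain both columns are skipped.
    """
    return [(r[head_col], r[tail_col]) for r in rows
            if len(r) > max(head_col, tail_col)]

# Hypothetical sample data, not taken from the challenge datasets.
table_text = 'col0,col1\n"Oxford","Oxfordshire"\n"Leeds","West Yorkshire"\n'
targets = read_targets('"TABLE01","0","1"\n')
rows = read_table(table_text)
pairs = column_pair(rows, targets[0][1], targets[0][2])
```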

Evaluation Criteria

Precision, Recall and F1_Score will be calculated:

$\text{Precision} = \frac{\text{correct annotations}\ \#}{\text{annotations}\ \#}$

$\text{Recall} = \frac{\text{correct annotations}\ \#}{\text{target column pairs}\ \#}$

$\text{F1\_Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$

Notes:

1) # denotes the number.

2) F1_Score is used as the primary score; Precision is used as the secondary score.

3) An empty annotation for a column pair still counts as an annotated column pair; we suggest excluding column pairs with empty annotations from the submission file.
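The three scores can be computed as in the sketch below. It is an illustration of the formulas only, not the official evaluator: the function name score, the dictionary layout, and the single-property ground truth are assumptions, and the sample identifiers and URIs are placeholders.

```python
def score(submission, ground_truth):
    """Return (precision, recall, f1) for CPA annotations.

    Both arguments map (table_id, head_col, tail_col) to a property URI.
    The keys of ground_truth are taken to be the target column pairs.
    """
    # Annotations for column pairs outside the targets are ignored.
    annotated = {k: v for k, v in submission.items() if k in ground_truth}
    # URIs are compared case-insensitively; an empty URI is never correct
    # but still counts as an annotation.
    correct = sum(
        1 for k, uri in annotated.items()
        if uri.lower() == ground_truth[k].lower()
    )
    precision = correct / len(annotated) if annotated else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Placeholder data: 4 target pairs, 3 in-target annotations, 2 correct.
truth = {
    ("T1", 0, 1): "http://example.org/p/a",
    ("T1", 0, 2): "http://example.org/p/b",
    ("T2", 0, 1): "http://example.org/p/c",
    ("T2", 0, 2): "http://example.org/p/d",
}
submitted = {
    ("T1", 0, 1): "http://example.org/P/A",  # correct, differs only in case
    ("T1", 0, 2): "http://example.org/p/b",  # correct
    ("T2", 0, 1): "http://example.org/p/x",  # wrong
    ("T9", 0, 1): "http://example.org/p/a",  # outside the targets, ignored
}
precision, recall, f1 = score(submitted, truth)
```

With this sample, Precision is 2/3, Recall is 2/4, and F1_Score is 4/7.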

Submission

1. Each participant is allowed to make at most 5 submissions per day in Rounds #1 and #2.

Tentative Dates

1. Round #1: 26 May to 20 July

2. Round #2: 25 July to 30 Aug

3. Round #3: 3 September to 17 September

4. Round #4: 20 September to 4 October

Rules

Selected systems with the best results will be invited to present their results during the ISWC conference and the Ontology Matching workshop.
Participants are encouraged to submit a system paper describing their tool and the obtained results. Papers will be published online as a volume of CEUR-WS as well as indexed on DBLP. By submitting a paper, the authors accept the CEUR-WS and DBLP publishing rules.
Additional information is available on our official website.

Leaderboard

01	MTab4Wikidata	0.995
02	Team_DAGOBAH	0.993
03	LinkingPark	0.985
04	bbw	0.949
05	Unimib	0.941