SemTab 2020 Column-Type Annotation (CTA) Challenge

NEWS: Please join our discussion group and visit our website

This is a task of the ISWC 2020 challenge “SemTab: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching”. The goal is to annotate an entity column (i.e., a column composed of entity mentions) in a table with types from a knowledge graph (KG) such as DBpedia or Wikidata.

Task Description

The task is to annotate each entity column with KG components that serve as its type. The cases for DBpedia and Wikidata differ slightly. Please refer to the task description and evaluation metrics that correspond to the KG used in each dataset (round).

For DBpedia, annotations should be DBpedia ontology classes, excluding owl:Thing and owl:Agent. Each column can be annotated with multiple classes: the class that is as fine-grained as possible while still correct for all cells in the column is regarded as a perfect annotation; any ancestor of the perfect annotation is regarded as an okay annotation; all others are regarded as wrong annotations.

For Wikidata, the annotation can be a normal item such as https://www.wikidata.org/wiki/Q6256. Each column should be annotated with at most one item. A perfect annotation earns the full score, while an okay annotation still earns part of the score.

In both cases, each annotation should be given as its full URI (URIs are NOT case-sensitive). Each submission should be a CSV file. Each line identifies a column by its table ID and column ID and gives its annotation(s), so one line has three fields: “Table ID”, “Column ID” and “Annotation URI”. The submission file should not contain a header row. Multiple annotations should be separated by a space, and their order does not matter. Here is an example: "9206866_1_8114610355671172497","0","http://dbpedia.org/ontology/Country http://dbpedia.org/ontology/PopulatedPlace http://dbpedia.org/ontology/Place".

Notes:

1) Table ID is the filename of the table data, but does NOT include the extension.

2) Column ID is the position of the column in the input, starting from 0, i.e., first column’s ID is 0.

3) A submission file should contain NO more than one line per target column.

4) Annotations for columns outside the target columns are ignored.
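The submission format described above can be sketched in Python as follows. This is a minimal illustration, not an official tool: the function name `format_submission` and the dictionary layout are assumptions made for this example. Keying the dictionary by (table ID, column ID) guarantees at most one line per target column.

```python
import csv
import io


def format_submission(annotations):
    """Render annotations as submission CSV text (hypothetical helper).

    `annotations` maps (table_id, column_id) to a list of full annotation URIs.
    """
    buffer = io.StringIO()
    # QUOTE_ALL puts every field in double quotes, as in the example line.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for (table_id, column_id), uris in annotations.items():
        # Multiple annotations go into one field, separated by a space.
        writer.writerow([table_id, column_id, " ".join(uris)])
    return buffer.getvalue()


example = {
    ("9206866_1_8114610355671172497", 0): [
        "http://dbpedia.org/ontology/Country",
        "http://dbpedia.org/ontology/PopulatedPlace",
        "http://dbpedia.org/ontology/Place",
    ],
}
submission_text = format_submission(example)
print(submission_text, end="")
```

No header row is written, and the table ID is passed without the file extension.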

 

Datasets

Table set for Round #1: Tables, Target Columns, KG: Wikidata

Table set for Round #2: Tables, Target Columns

Table set for Round #3: Tables, Target Columns

Table set for Round #4: Tables, Target Columns

Data Description: One table is stored in one CSV file. Each line corresponds to a table row. The first row may either be the table header or content. The target columns for annotation are saved in a CSV file.
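A minimal sketch of reading this data in Python is shown below. The layout of the target columns file is not specified here, so the sketch assumes each of its lines holds a table ID followed by a column ID. The helper names and the inline sample data are hypothetical. Because the first row may be a header or content, the sketch keeps every row and leaves that decision to the caller.

```python
import csv
import io


def read_targets(handle):
    """Return (table_id, column_id) pairs; assumes two fields per line."""
    return [(row[0], int(row[1])) for row in csv.reader(handle) if row]


def read_column(handle, column_id):
    """Return all cells of one column, including the first row."""
    return [row[column_id] for row in csv.reader(handle) if len(row) > column_id]


# Inline stand-ins for the real files (made-up sample content).
targets_csv = io.StringIO('"tableA","0"\n"tableA","2"\n')
table_csv = io.StringIO(
    "country,capital,continent\nFrance,Paris,Europe\nJapan,Tokyo,Asia\n"
)

targets = read_targets(targets_csv)
cells = read_column(table_csv, targets[0][1])
print(targets)
print(cells)
```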

Evaluation Criteria

We use different metrics for DBpedia and Wikidata. Please calculate the metrics that match the KG given in the dataset for each round.

 

For DBpedia, two metrics are calculated for ranking: Average Hierarchical Score (AH_Score) and Average Perfect Score (AP_Score):

\(AH\_Score = {1 \times (perfect\ annotations\ \#) + 0.5\times(okay\ annotations\ \#) - 1\times (wrong\ annotations\ \#) \over target\ columns\ \#}\)

\(AP\_Score = {perfect\ annotations\ \# \over all\ annotations\ \#}\)

Notes:

1) # denotes the number.

2) AH_Score is used as the primary score to encourage as many correct annotations as possible; AP_Score is used as the secondary score.

3) See the SemTab 2019 resource paper for more details on the metrics.
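The two DBpedia metrics can be computed from simple counts, as in the sketch below. The function names and the example counts are made up for illustration, and the example assumes that the number of all annotations is the sum of perfect, okay and wrong annotations.

```python
def ah_score(perfect, okay, wrong, target_columns):
    """Average Hierarchical Score: +1 per perfect, +0.5 per okay, -1 per wrong."""
    return (1.0 * perfect + 0.5 * okay - 1.0 * wrong) / target_columns


def ap_score(perfect, all_annotations):
    """Average Perfect Score: share of submitted annotations that are perfect."""
    if all_annotations == 0:
        return 0.0  # assumption: an empty submission scores 0
    return perfect / all_annotations


# Made-up counts: 8 target columns, 12 annotations in total.
perfect, okay, wrong = 6, 4, 2
ah = ah_score(perfect, okay, wrong, target_columns=8)
ap = ap_score(perfect, all_annotations=perfect + okay + wrong)
print(ah)  # 0.75
print(ap)  # 0.5
```

Because wrong annotations are subtracted, AH_Score can become negative when a submission contains many wrong classes.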

 

For Wikidata, we encourage one perfect annotation, while still scoring an annotation that is one of its ancestors (an okay annotation). We therefore calculate Approximate Precision (APrecision), Approximate Recall (ARecall), and Approximate F1 Score (AF1):

\(APrecision = {\sum_{a \in all\ annotations}g(a) \over all\ annotations\ \#}\)

\(ARecall = {\sum_{col \in all\ target\ columns}(max\_annotation\_score(col)) \over all\ target\ columns\ \#}\)

\(AF1 = {2 \times APrecision \times ARecall \over APrecision + ARecall}\)

Notes:

1) # denotes the number.

2) \(g(a)\) returns the full score \(1.0\) if \(a\) is a perfect annotation; \(0.8^{d(a)}\) if \(a\) is an ancestor of the perfect annotation and its depth to the perfect annotation, \(d(a)\), is not larger than 5; \(0.7^{d(a)}\) if \(a\) is a descendant of the perfect annotation and \(d(a)\) is not larger than 3; and 0 otherwise. E.g., \(d(a)=1\) if \(a\) is a parent of the perfect annotation, and \(d(a)=2\) if \(a\) is a grandparent of the perfect annotation.

3) \(max\_annotation\_score(col)\) returns \(g(a)\) if \(col\) has an annotation \(a\), and 0 if \(col\) has no annotation.

4) \(AF1\) is used as the primary score, and \(APrecision\) is used as the secondary score. 
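The Wikidata metrics can be sketched in Python as below. This is an illustration under stated assumptions, not the official evaluator: the relation of each annotation to the perfect annotation ("perfect", "ancestor", "descendant" or anything else) and its depth are assumed to be known already, which in practice requires looking up the Wikidata class hierarchy. The names `g_score` and `approximate_scores` are hypothetical.

```python
def g_score(relation, depth=0):
    """Score one annotation by its relation and depth to the perfect annotation."""
    if relation == "perfect":
        return 1.0
    if relation == "ancestor" and depth <= 5:
        return 0.8 ** depth
    if relation == "descendant" and depth <= 3:
        return 0.7 ** depth
    return 0.0


def approximate_scores(annotation_scores, n_target_columns):
    """Return (APrecision, ARecall, AF1).

    `annotation_scores` holds g(a) for each annotated target column
    (at most one annotation per column); unannotated columns count as 0.
    """
    total = sum(annotation_scores)
    n_annotations = len(annotation_scores)
    precision = total / n_annotations if n_annotations else 0.0
    recall = total / n_target_columns if n_target_columns else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up case: 4 target columns, 3 annotated (perfect, parent, unrelated).
scores = [g_score("perfect"), g_score("ancestor", 1), g_score("other")]
a_precision, a_recall, a_f1 = approximate_scores(scores, n_target_columns=4)
print(a_precision, a_recall, a_f1)
```

In this made-up case the scores sum to 1.8, so APrecision is 1.8/3 and ARecall is 1.8/4; leaving a target column unannotated lowers ARecall but not APrecision.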

Submission

1. Each participant may make at most 5 submissions per day in Rounds #1 and #2.

Tentative Dates

1. Round #1: 26 May to 20 July

2. Round #2: 25 July to 30 August

3. Round #3: 3 September to 17 September

4. Round #4: 20 September to 4 October

Rules

  1. Selected systems with the best results will be invited to present their results during the ISWC conference and the Ontology Matching workshop.

  2. Participants are encouraged to submit a system paper describing their tool and the obtained results. Papers will be published online as a volume of CEUR-WS as well as indexed on DBLP. By submitting a paper, the authors accept the CEUR-WS and DBLP publishing rules.

  3. Please see our official website for additional information.
