
# BioTable Column Type Annotation by Wikidata (BioTable-CTA-WD)

This is a task of Round 2 of the ISWC 2021 “Semantic Web Challenge on Tabular Data to Knowledge Graph Matching”. The goal is to annotate each entity column (i.e., a column composed of entity mentions) in a table with types from Wikidata (version: 20210628).

The task is to annotate each entity column with a Wikidata item as its type. A column may have several valid types: the type that is as fine-grained as possible while still correct for all cells of the column is regarded as the perfect annotation; an ancestor of the perfect annotation is regarded as an okay annotation; all others are regarded as wrong annotations.

An annotation is a regular Wikidata entity, written with the prefix http://www.wikidata.org/entity/, such as http://www.wikidata.org/entity/Q8425. Each column should be annotated by at most one item. A perfect annotation earns the full score, while an okay annotation still earns part of the score. Example: "KIN0LD6C","0","http://www.wikidata.org/entity/Q8425". Please use the prefix http://www.wikidata.org/entity/ rather than the URL prefix https://www.wikidata.org/wiki/.

The annotation should be represented by its full IRI, which is treated as case-insensitive. Each submission should be a CSV file. Each line should identify a column by its table ID and column ID and give that column's annotation (a Wikidata item). That is, each line should contain three fields: “Table ID”, “Column ID” and “Annotation IRI”. The submission file should not contain a header row.

Notes:

1) Table ID is the filename of the table data, but does NOT include the extension.

2) Column ID is the position of the column in the input, starting from 0, i.e., first column’s ID is 0.

3) One submission file should have NO duplicate lines for each target column.

4) Annotations for columns out of the target columns are ignored.
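The format rules above can be sketched in Python as follows. This is a minimal illustration, not an official tool: the helper names `to_entity_iri` and `write_submission` are hypothetical, and quoting every field is an assumption based on the example line shown earlier.

```python
import csv
import io

WIKIDATA_ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def to_entity_iri(qid_or_url: str) -> str:
    """Normalize a bare QID or a wiki-page URL to the entity IRI form."""
    qid = qid_or_url.rstrip("/").rsplit("/", 1)[-1]
    return WIKIDATA_ENTITY_PREFIX + qid


def write_submission(annotations, out):
    """Write one header-less CSV line per annotated column.

    annotations: dict mapping (table_id, column_id) -> QID or URL.
    Using a dict keyed by column guarantees at most one line per column.
    """
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for (table_id, column_id), item in sorted(annotations.items()):
        writer.writerow([table_id, column_id, to_entity_iri(item)])


# Example: table "KIN0LD6C", first column (ID 0), annotated with Q8425.
buffer = io.StringIO()
write_submission({("KIN0LD6C", 0): "https://www.wikidata.org/wiki/Q8425"}, buffer)
print(buffer.getvalue(), end="")
```

Keying the annotations by `(table_id, column_id)` is one simple way to satisfy the no-duplicates rule; writing to a real file only requires passing an open file handle instead of the `StringIO` buffer.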

## Datasets

Bio table set for Round #1: Tables, Target Columns

Data Description: One table is stored in one CSV file. Each line corresponds to a table row. The first row may either be the table header or content. The target columns for annotation are saved in a CSV file.
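A minimal Python sketch of reading these files is shown below. The layout of the target-columns file is not spelled out here, so the sketch assumes each of its rows holds a table ID followed by a column ID; the function names `read_target_columns` and `column_cells` are hypothetical.

```python
import csv
import io


def read_target_columns(fh):
    """Return (table_id, column_id) pairs; assumes rows of 'table_id,column_id'."""
    return [(row[0], int(row[1])) for row in csv.reader(fh) if row]


def column_cells(fh, column_id):
    """Return all cells of one column of a table.

    The first row is kept, because it may be either a header or content.
    """
    return [row[column_id] for row in csv.reader(fh) if len(row) > column_id]


# Example with in-memory data standing in for the real files.
targets = read_target_columns(io.StringIO("KIN0LD6C,0\n"))
table = io.StringIO("name,value\naspirin,1\nibuprofen,2\n")
print(targets)
print(column_cells(table, targets[0][1]))
```

Whether the first row is a header has to be decided per table, for example by checking whether it looks like the remaining cells; the sketch deliberately leaves that decision to the caller.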

## Evaluation Criteria

We encourage one perfect annotation, and at the same time give partial credit to one of its ancestors (okay annotation). Thus we calculate Approximate Precision (APrecision), Approximate Recall (ARecall), and Approximate F1 Score (AF1):

$$APrecision = {\sum_{a \in all\ annotations}g(a) \over all\ annotations\ \#}$$

$$ARecall = {\sum_{col \in all\ target\ columns}(max\_annotation\_score(col)) \over all\ target\ columns\ \#}$$

$$AF1 = {2 \times APrecision \times ARecall \over APrecision + ARecall}$$

Notes:

1) # denotes the number.

2) $$g(a)$$ returns the full score $$1.0$$ if $$a$$ is a perfect annotation, returns $$0.8^{d(a)}$$ if $$a$$ is an ancestor of the perfect annotation and its depth to the perfect annotation $$d(a)$$ is not larger than 5, returns $$0.7^{d(a)}$$ if $$a$$ is a descendant of the perfect annotation and its depth to the perfect annotation $$d(a)$$ is not larger than 3, and returns 0 otherwise. E.g., $$d(a)=1$$ if $$a$$ is a parent of the perfect annotation, and $$d(a)=2$$ if $$a$$ is a grandparent of the perfect annotation.

3) $$max\_annotation\_score(col)$$ returns $$g(a)$$ if $$col$$ has an annotation $$a$$, and 0 if $$col$$ has no annotation.

4) $$AF1$$ is used as the primary score, and $$APrecision$$ is used as the secondary score.

5) A cell may have multiple equivalent Wikidata items as its GT (e.g., redirected pages Q20514736 and Q852446). For an annotated entity, our evaluator will calculate the score with each GT entity and select the maximum score.
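The scoring rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the official evaluator: the relation of an annotation to the perfect annotation (`"perfect"`, `"ancestor"`, `"descendant"` or anything else) and its depth are taken as inputs, since deriving them requires the Wikidata type hierarchy, and the names `g` and `approximate_scores` are chosen here for readability. It also assumes every submitted annotation is for a distinct target column, so the sum of $$max\_annotation\_score(col)$$ over all target columns equals the sum of $$g(a)$$ over all annotations.

```python
def g(relation: str, depth: int = 0) -> float:
    """Score one annotation from its relation and depth to the perfect annotation."""
    if relation == "perfect":
        return 1.0
    if relation == "ancestor" and 1 <= depth <= 5:
        return 0.8 ** depth
    if relation == "descendant" and 1 <= depth <= 3:
        return 0.7 ** depth
    return 0.0


def approximate_scores(annotation_scores, n_target_columns):
    """Return (APrecision, ARecall, AF1).

    annotation_scores: g(a) for each submitted annotation on a target column,
    at most one per column. Unannotated target columns contribute 0 to ARecall.
    """
    total = sum(annotation_scores)
    n_annotations = len(annotation_scores)
    aprecision = total / n_annotations if n_annotations else 0.0
    arecall = total / n_target_columns if n_target_columns else 0.0
    denominator = aprecision + arecall
    af1 = 2 * aprecision * arecall / denominator if denominator else 0.0
    return aprecision, arecall, af1


# Example: 4 target columns, 3 annotated (one perfect, one parent, one wrong).
scores = [g("perfect"), g("ancestor", 1), g("unrelated")]
print(approximate_scores(scores, n_target_columns=4))
```

In this example the three scores sum to 1.8, so APrecision is about 0.6 (1.8 / 3), ARecall is about 0.45 (1.8 / 4), and AF1 is about 0.514. When a cell has several equivalent ground-truth items, as in note 5, `g` would be evaluated against each and the maximum kept.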

## Submission

1. One participant is allowed to make at most 10 submissions per day in Round 2.

## Prizes

IBM Research has promised to contribute to the prizes!

## Rules

1. Selected systems with the best results will be invited to present their results during the ISWC conference and the Ontology Matching workshop.

2. The prize winners will be announced during the ISWC conference (October 24 - 28, 2021). We will take into account all evaluation rounds, especially those running until the conference dates.

3. Participants are encouraged to submit a system paper describing their tool and the obtained results. Papers will be published online as a volume of CEUR-WS as well as indexed on DBLP. By submitting a paper, the authors accept the CEUR-WS and DBLP publishing rules.