BioTable Cell Entity Annotation by Wikidata (BioTable-CEA-WD)
This is a task of Round 2 of the ISWC 2021 “Semantic Web Challenge on Tabular Data to Knowledge Graph Matching”. The goal is to annotate column cells (entity mentions) in a table with entities of Wikidata (version: 20210628).
The task is to annotate each target cell with an entity of Wikidata.
Each submission should contain the annotations of the target cells. A cell can be annotated with one entity, given as an IRI with the prefix http://www.wikidata.org/entity/. Any entity equivalent to the ground truth entity is regarded as correct. Matching is NOT case-sensitive.
The submission file should be in CSV format. Each line should contain the annotation of one cell, which is identified by a table ID, a row ID and a column ID. That is, each line should have four fields, in this order: “Table ID”, “Row ID”, “Column ID” and “Entity IRI”. Each cell should be annotated by at most one entity. The headers should be excluded from the submission file. Here is an example: "OHGI1JNY","32","1","http://www.wikidata.org/entity/Q5484". Please use the prefix http://www.wikidata.org/entity/ instead of https://www.wikidata.org/wiki/, which is the prefix of the Wikidata page URL.
1) Table ID does not include filename extension; make sure you remove the .csv extension from the filename.
2) Column ID is the position of the column in the table file, starting from 0, i.e., first column’s ID is 0.
3) Row ID is the position of the row in the table file, starting from 0, i.e., first row’s ID is 0.
4) A submission file should contain NO duplicate lines for the same cell.
5) Annotations for cells that are not target cells are ignored.
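The formatting rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not an official tool of the challenge: the helper names (table_id_from_filename, to_entity_iri, write_submission) are hypothetical, every field is quoted to mirror the example line, and Q42 is an arbitrary second annotation used to show that a duplicate cell is skipped.

```python
import csv
import io
from pathlib import Path

ENTITY_PREFIX = "http://www.wikidata.org/entity/"
PAGE_PREFIX = "https://www.wikidata.org/wiki/"


def table_id_from_filename(filename):
    """Drop the directory and the .csv extension: 'tables/OHGI1JNY.csv' -> 'OHGI1JNY'."""
    return Path(filename).stem


def to_entity_iri(value):
    """Turn a bare QID or a Wikidata page URL into an entity IRI."""
    if value.startswith(PAGE_PREFIX):
        value = value[len(PAGE_PREFIX):]
    if value.startswith(ENTITY_PREFIX):
        return value
    return ENTITY_PREFIX + value


def write_submission(annotations, out):
    """Write (table file, row id, column id, entity) tuples, one line per cell, no header."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    seen = set()
    for filename, row_id, col_id, entity in annotations:
        cell = (table_id_from_filename(filename), int(row_id), int(col_id))
        if cell in seen:  # keep only the first annotation of each cell
            continue
        seen.add(cell)
        # Field order: Table ID, Row ID, Column ID, Entity IRI.
        writer.writerow([cell[0], cell[1], cell[2], to_entity_iri(entity)])


buffer = io.StringIO()
write_submission(
    [
        ("OHGI1JNY.csv", 32, 1, "https://www.wikidata.org/wiki/Q5484"),
        ("OHGI1JNY.csv", 32, 1, "Q42"),  # same cell again, skipped
    ],
    buffer,
)
submission_text = buffer.getvalue()
print(submission_text, end="")
```

Keeping only the first annotation of a cell is one possible way to satisfy the no-duplicates rule; a system could equally keep its highest-scoring candidate.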
Data Description: Each table is stored in its own CSV file. Each line corresponds to a table row. The first row may be either the table header or content. The target cells for annotation are saved in a CSV file.
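As a rough sketch of how the data could be read, the Python snippet below loads a table and looks up the mentions of its target cells. The layout of the target file is an assumption here (Table ID, Row ID, Column ID, in the same order as the submission fields); the table content, the table ID TOY00001 and the function names are made up for illustration. Because the first line of the file is row 0 whether or not it is a header, rows are indexed directly by their position in the file.

```python
import csv
import io


def read_table(handle):
    """Read one table file; list index 0 is the first line, header or not."""
    return list(csv.reader(handle))


def target_mentions(tables, targets):
    """Yield (table_id, row_id, col_id, mention) for every listed target cell."""
    for table_id, row_id, col_id in targets:
        row_id, col_id = int(row_id), int(col_id)
        yield table_id, row_id, col_id, tables[table_id][row_id][col_id]


# Toy stand-ins for a table file and a target file (made-up content).
table_file = io.StringIO("gene,organism\nTP53,Homo sapiens\nBRCA1,Mus musculus\n")
# Assumed target layout: Table ID, Row ID, Column ID.
target_file = io.StringIO('"TOY00001","1","1"\n"TOY00001","2","0"\n')

tables = {"TOY00001": read_table(table_file)}
mentions = list(target_mentions(tables, csv.reader(target_file)))
for item in mentions:
    print(item)
```

With real files, the StringIO objects would be replaced by files opened with open(path, newline="", encoding="utf-8").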
Precision, Recall and F1 Score are calculated as follows:
Precision = (Correct Annotations #) / (Submitted Annotations #)
Recall = (Correct Annotations #) / (Ground Truth Annotations #)
F1 Score = (2 * Precision * Recall) / (Precision + Recall)
1) # denotes "the number of".
2) F1 Score is used as the primary score; Precision is used as the secondary score.
3) Each target cell has exactly one ground truth annotation, i.e., # ground truth annotations = # target cells. The ground truth annotation already covers all equivalent entities (e.g., entities reached through wiki page redirects); the ground truth is hit if any one of its equivalent entities is hit.
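The scoring described above can be approximated in a few lines of Python. This is a minimal sketch, not the official evaluator: the function name evaluate and the toy data are hypothetical, and it assumes that annotations outside the target cells are dropped before counting submitted annotations and that IRIs are compared case-insensitively.

```python
def evaluate(submission, ground_truth):
    """Return (precision, recall, f1).

    submission:   dict mapping (table_id, row_id, col_id) -> submitted entity IRI
    ground_truth: dict mapping (table_id, row_id, col_id) -> set of equivalent IRIs
    """
    # Assumption: annotations for non-target cells are ignored entirely.
    scored = {cell: iri for cell, iri in submission.items() if cell in ground_truth}
    correct = 0
    for cell, iri in scored.items():
        accepted = {g.lower() for g in ground_truth[cell]}
        if iri.lower() in accepted:  # case-insensitive match on any equivalent entity
            correct += 1
    precision = correct / len(scored) if scored else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


WD = "http://www.wikidata.org/entity/"
# Toy ground truth with four target cells (made-up entity assignments).
ground_truth = {
    ("T1", 1, 0): {WD + "Q1", WD + "Q10"},  # Q10 stands for an equivalent entity
    ("T1", 2, 0): {WD + "Q2"},
    ("T1", 3, 0): {WD + "Q3"},
    ("T1", 4, 0): {WD + "Q4"},
}
submission = {
    ("T1", 1, 0): WD + "Q10",   # correct via an equivalent entity
    ("T1", 2, 0): WD + "q2",    # correct, case differs
    ("T1", 3, 0): WD + "Q99",   # wrong
    ("T1", 9, 0): WD + "Q5",    # not a target cell, ignored
}
precision, recall, f1 = evaluate(submission, ground_truth)
print(precision, recall, f1)
```

In this toy case 2 of the 3 scored annotations are correct and 4 cells are targets, so Precision = 2/3, Recall = 2/4 and F1 Score = 4/7.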
IBM Research has promised to contribute to the prizes!
The prize winners will be announced during the ISWC conference (October 24 - 28, 2021). We will take into account all evaluation rounds, especially those running up to the conference dates.
Participants are encouraged to submit a system paper describing their tool and the obtained results. Papers will be published online as a volume of CEUR-WS as well as indexed on DBLP. By submitting a paper, the authors accept the CEUR-WS and DBLP publishing rules.
For additional information, please see our official website.