Annotations submission service

Introduction 


The annotations submission service is a mechanism to publish annotations on the Europe PMC annotations platform.

Simply put, you provide us with the annotations you wish to share, and we publish them on the Europe PMC website via SciLite and make them available through the Europe PMC Annotations API.

Ground rules 


  • Annotations should enrich the content of abstracts and Open Access full text articles in the Europe PMC platform by highlighting biological or methodological entities and providing links to related resources.
  • Links to the biological entities tagged by annotations should be public and accessible without restrictions (no subscriptions or login screens).
  • All annotations published on the Europe PMC annotations platform will be considered public domain and will be published on the website and shared via the Europe PMC Annotations API.
  • We strongly encourage making any algorithms or code used to generate annotation sets shareable, ideally open source.
  • We reserve the right to take down annotations if, for example, the content is out of scope or is no longer reasonably maintained.

Getting started 


Please email us at [email protected], briefly describing the annotations you wish to share. We will get in touch with you and provide the information you need to generate the annotations datasets and upload them into the platform.

Once you have generated annotation files according to the instructions below, you will be able to upload your data to the platform using your private Cloud Storage system.

We have tried to keep the process simple, so that you can contribute even without strong technical skills.

By using the Annotations Submission System you acknowledge that you have read and accepted the Europe PMC Advanced User Services Privacy Notice.

Submission process 


Once you have contacted us at [email protected], we will provide you with the following information:

  1. A provider ID to use in the submitted files.
  2. The URL of your private Cloud Storage system, together with the username and password needed to submit the annotations file. You are free to change these credentials at any time, but if you do, please let us know as soon as possible by emailing [email protected], so that we can update our system to work with the new credentials.
  3. Specifications for the dataset.

With this information you can submit the file containing the annotations to the platform, either through a web browser or programmatically. For more information, see the video "How to share text mining results in biology", which describes the process.

Each submission consists of a single file. Every file must contain fewer than 10,000 rows, where each row represents one article with all of its annotations. If your dataset exceeds this limit, split it into multiple files and submit them together as a single zip or gzipped tar archive (Unix command: tar -czvf submission_file.tar.gz ./*).
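Splitting a large dataset can be scripted. The following is a minimal sketch in Java using only the JDK, assuming your annotations are already in a single UTF-8 file with one article per line. It writes the parts into one zip archive, which is one of the two accepted formats. The class SubmissionSplitter, the MAX_ROWS constant (9,999, derived from "fewer than 10,000 rows") and the part_N.jsonl entry names are illustrative choices, not names required by the service.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper (not part of the service): splits a one-article-per-line
// annotations file into parts of fewer than 10,000 rows and packs them into one zip.
class SubmissionSplitter {
    // "Fewer than 10,000 rows" per file, so each part holds at most 9,999 rows.
    static final int MAX_ROWS = 9_999;

    // Splits rows into consecutive chunks of at most MAX_ROWS rows each.
    static List<List<String>> split(List<String> rows) {
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += MAX_ROWS) {
            int end = Math.min(start + MAX_ROWS, rows.size());
            chunks.add(new ArrayList<>(rows.subList(start, end)));
        }
        return chunks;
    }

    // Writes each chunk as a UTF-8 zip entry named part_1.jsonl, part_2.jsonl, ...
    static void writeZip(Path zipFile, List<List<String>> chunks) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (int i = 0; i < chunks.size(); i++) {
                zip.putNextEntry(new ZipEntry("part_" + (i + 1) + ".jsonl"));
                String body = String.join("\n", chunks.get(i)) + "\n";
                zip.write(body.getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
    }

    // Usage: java SubmissionSplitter annotations.jsonl submission_file.zip
    public static void main(String[] args) throws IOException {
        List<String> rows = new ArrayList<>(Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8));
        rows.removeIf(line -> line.trim().isEmpty()); // ignore blank lines
        writeZip(Paths.get(args[1]), split(rows));
    }
}
```

With 20,000 rows, for example, this produces three entries of 9,999, 9,999 and 2 rows.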

If you use the browser, you can follow the procedure described and shown below:

  • Go to the login page of the assigned private Cloud Storage system and log in with your credentials
  • Click the "submissions" link on the left-hand side of the page
  • Click the "+" icon and then the "Upload file" icon at the bottom right of the page to submit the file
Figure: Annotations submission

You can also submit files programmatically, using the driver for the Cloud Storage system described here. Drivers are available for many different languages; here is a simple example using the Java driver:

  MinioClient minioClient = new MinioClient("https://annotations.europepmc.org", "your_username", "your_password");
  File fileToSend = new File(fileName);
  // try-with-resources closes the input stream once the upload has finished
  try (InputStream in = new FileInputStream(fileToSend)) {
    minioClient.putObject("submissions", fileToSend.getName(), in, "application/octet-stream");
  }
			

Note that the URL used here is different from the URL you access in the browser to submit files.

Once submitted, the file is processed by the submission system. The system is scheduled to run every 60 minutes, so it may take some time before a new file is picked up. Once processing starts, you will receive two emails:

  1. The first email notifies you that your file is about to be processed and loaded into the system. Its subject will read "Loading of the Annotations file <File Name> of the provider <Provider Name> starting".
  2. A second email is sent once the submission file has been processed, and its subject reflects the outcome. If the operation succeeds, the subject is "Annotations Loading of the file <File Name> performed successfully", and the number of articles processed successfully is given in the email attachment. If it fails, the subject is "Annotations Loading of the file <File Name> failed", and the email states the reason for the failure and asks you to resolve the identified problems and resubmit the data.
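If you handle many submissions, you may want to sort these notifications automatically. The sketch below assumes the subject lines are exactly as quoted above; it classifies a subject and extracts the submitted file name with java.util.regex. SubmissionEmail and its Status values are hypothetical names, and the patterns would need updating if the wording of the emails changes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: classifies notification e-mails by the subject lines
// documented on this page. The wording is copied from the page and may change.
class SubmissionEmail {
    enum Status { STARTED, SUCCEEDED, FAILED, UNKNOWN }

    // Group 1 captures <File Name> in every pattern.
    private static final Pattern STARTED_RE =
            Pattern.compile("Loading of the Annotations file (.+) of the provider (.+) starting");
    private static final Pattern SUCCEEDED_RE =
            Pattern.compile("Annotations Loading of the file (.+) performed successfully");
    private static final Pattern FAILED_RE =
            Pattern.compile("Annotations Loading of the file (.+) failed");

    // Maps a subject line to the processing stage it reports.
    static Status status(String subject) {
        String s = subject.trim();
        if (STARTED_RE.matcher(s).matches()) return Status.STARTED;
        if (SUCCEEDED_RE.matcher(s).matches()) return Status.SUCCEEDED;
        if (FAILED_RE.matcher(s).matches()) return Status.FAILED;
        return Status.UNKNOWN;
    }

    // Returns the submitted file name from a recognised subject, or null otherwise.
    static String fileName(String subject) {
        String s = subject.trim();
        for (Pattern p : new Pattern[] {STARTED_RE, SUCCEEDED_RE, FAILED_RE}) {
            Matcher m = p.matcher(s);
            if (m.matches()) return m.group(1);
        }
        return null;
    }
}
```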

Information about the outcome of each submission is also available in the private Cloud Storage system: once logged in, use the "results" link on the left-hand side of the page. For example, if you submit a file "abstract.09_06_2018.tar.gz", the information sent by email can also be found in the submissions folder (file "Log_abstract.09_06_2018.tar.gz.txt") and in the results folder (file "Result_abstract.09_06_2018.tar.gz.txt").

Figure: Annotation submission results
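The file names in the example above suggest a simple convention: the log file is the submitted name with a "Log_" prefix and a ".txt" suffix, and the result file uses a "Result_" prefix. The sketch below assumes that convention holds for every submission (it is inferred from this single example, not a documented rule) and builds the expected paths, which you could then look for in the Cloud Storage web interface or through a client driver. SubmissionReports is a hypothetical name.

```java
// Hypothetical helper: predicts where the log and result files for a submission
// should appear, assuming the "Log_"/"Result_" prefix and ".txt" suffix seen in
// the documented example apply to every submitted file.
class SubmissionReports {
    static String logPath(String submittedFile) {
        return "submissions/Log_" + submittedFile + ".txt";
    }

    static String resultPath(String submittedFile) {
        return "results/Result_" + submittedFile + ".txt";
    }

    // Usage: java SubmissionReports abstract.09_06_2018.tar.gz
    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "abstract.09_06_2018.tar.gz";
        System.out.println(logPath(name));
        System.out.println(resultPath(name));
    }
}
```

Run without arguments, it prints submissions/Log_abstract.09_06_2018.tar.gz.txt and results/Result_abstract.09_06_2018.tar.gz.txt.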

Data Format 


Each file must be UTF-8 encoded.

Each row in the annotations file will contain one JSON object with all the annotations for a specific article:

{ "src":"MED", "id": "27105176", "provider": "europepmc", "anns": [  { "position": "1.2", "prefix": "Noninvasive Markers to Assess ", "exact": "Liver Fibrosis", "section": "Title", "postfix": ". ", "tags": [ { "name": "Liver Fibrosis", "uri": "http://linkedlifedata.com/resource/umls-concept/C0239946" } ] }, {"position": "2.1", "prefix": "", "exact": "Chronic liver disease", "section": "Abstract", "postfix": " represents a major public health proble", "tags": [ { "name": "Chronic liver disease", "uri": "http://linkedlifedata.com/resource/umls-concept/C0341439" } ] }, {"position": "3.2", "prefix": " and progression of ", "exact": "liver fibrosis", "section": "Abstract", "postfix": " with time and the r", "tags": [ { "name": "liver fibrosis", "uri": "http://linkedlifedata.com/resource/umls-concept/C0239946" } ] }, {"position": "3.4", "prefix": "h time and the risk of development of ", "exact": "cirrhosis", "section": "Abstract", "postfix": ". ", "tags": [ { "name": "cirrhosis", "uri": "http://linkedlifedata.com/resource/umls-concept/C0023890" } ] }, {"position": "7.2", "prefix": "essing the presence and the degree of ", "exact": "liver fibrosis", "section": "Abstract", "postfix": ". ", "tags": [ { "name": "liver fibrosis", "uri": "http://linkedlifedata.com/resource/umls-concept/C0239946" } ] }, {"position": "8.2", "prefix": "e methods useful in the evaluation of ", "exact": "liver fibrosis", "section": "Abstract", "postfix": ". ", "tags": [ { "name": "liver fibrosis", "uri": "http://linkedlifedata.com/resource/umls-concept/C0239946" } ] } ] }
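If you generate rows from your own pipeline, the key points are that each article's JSON object must fit on a single line and that the file must be written as UTF-8. The sketch below builds one such line in Java using only the JDK. The class AnnotationRow, its simplified single-tag Ann type and the quote helper are hypothetical; a JSON library such as Jackson or Gson could be used instead of the hand-written escaping.

```java
import java.util.List;

// Hypothetical builder (not an official library) that serialises one article and its
// named-entity annotations as a single JSON line. Only the JDK is used, so JSON
// string escaping is done by hand.
class AnnotationRow {
    // One element of "anns", simplified to a single tag; null optional fields are omitted.
    static final class Ann {
        final String position, prefix, exact, postfix, section, tagName, tagUri;

        Ann(String position, String prefix, String exact, String postfix,
            String section, String tagName, String tagUri) {
            this.position = position; this.prefix = prefix; this.exact = exact;
            this.postfix = postfix; this.section = section;
            this.tagName = tagName; this.tagUri = tagUri;
        }
    }

    // Quotes a string as a JSON string literal, escaping quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Appends ,"key":"value" only when the value is present.
    private static void optional(StringBuilder sb, String key, String value) {
        if (value != null) sb.append(",\"").append(key).append("\":").append(quote(value));
    }

    // Builds the whole row for one article; the result contains no newline characters.
    static String toJsonLine(String src, String id, String provider, List<Ann> anns) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"src\":").append(quote(src))
          .append(",\"id\":").append(quote(id))
          .append(",\"provider\":").append(quote(provider))
          .append(",\"anns\":[");
        for (int i = 0; i < anns.size(); i++) {
            Ann a = anns.get(i);
            if (i > 0) sb.append(',');
            sb.append("{\"exact\":").append(quote(a.exact));
            optional(sb, "position", a.position);
            optional(sb, "prefix", a.prefix);
            optional(sb, "postfix", a.postfix);
            optional(sb, "section", a.section);
            sb.append(",\"tags\":[{\"name\":").append(quote(a.tagName))
              .append(",\"uri\":").append(quote(a.tagUri)).append("}]}");
        }
        return sb.append("]}").toString();
    }
}
```

To write the rows, collect one string per article and call Files.write(path, lines, StandardCharsets.UTF_8), which writes each element as a separate UTF-8 line.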
			

There are two types of annotations on the platform: sentence-based annotations and named entity annotations.

The JSON schema that each object must adhere to depends on the annotation type. More details, as well as guidelines for validating your data before submission, can be found here.

An example of sentence-based annotations for one article:

  {
    "src": "PMC" , #source of the article			
    "id":"PMC5844054",  #identifier of the article in the context of the source field
    "provider":"Disgenet" , #name of the provider
    "anns": [
      {
        "exact": ".... loss of SBP1 may play... aggressive disease.", # annotation sentence
        "section": "abstract",  #section of the article where the annotation was found.
        "tags": [{    #tagged entities
          "name": "... loss of SBP1 may play... aggressive disease.", #identifying name of the tagged entity
          "uri": "http://purl.uniprot.org/uniprot/Q13228"  #specific URI of the tagged entity
        }]	
      },....#other annotations elements go here
    ]
  }
				

An example of named entity annotations for one article:

  {
    "src": "MED", #source of the article		
    "id": "27105176", #identifier of the article in the context of the source field
    "provider": "europepmc", #name of the provider
    "anns": [
      {
        "position": "1.2", #position of the entity in the article
        "prefix": "Noninvasive Markers to Assess ", #prefix of the entity inside the sentence of the article
        "postfix": ". ", #postfix of the entity inside the sentence of the article
        "exact": "Liver Fibrosis", # entity referred by the annotation
        "section": "Title",  #section of the article where the annotation was found.
        "tags": [{    #tagged entities
          "name": "Liver Fibrosis", #identifying name of the tagged entity
          "uri": "http://linkedlifedata.com/resource/umls-concept/C0239946" #specific URI of the tagged entity
        }]	
      },....#other annotations elements go here
    ]
  }
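When producing named entity annotations from your own text-mining output, the prefix and postfix fields can be cut from the sentence around the entity's character offsets. The sketch below assumes you know the start and end offsets of the entity within its sentence and that a fixed context window is acceptable. This page does not specify a window size, so the window parameter, like the EntityContext class, is hypothetical.

```java
// Hypothetical helper: derives the prefix, exact and postfix fields for an entity
// found at [start, end) within a sentence. The window size is an assumption;
// this page does not state how much context prefix/postfix should carry.
class EntityContext {
    final String prefix, exact, postfix;

    private EntityContext(String prefix, String exact, String postfix) {
        this.prefix = prefix; this.exact = exact; this.postfix = postfix;
    }

    // start: index of the entity's first character; end: index just past its last one;
    // window: maximum number of context characters kept on each side, clipped to the sentence.
    static EntityContext of(String sentence, int start, int end, int window) {
        if (start < 0 || end > sentence.length() || start >= end || window < 0) {
            throw new IllegalArgumentException("invalid entity span or window");
        }
        String prefix = sentence.substring(Math.max(0, start - window), start);
        String postfix = sentence.substring(end, Math.min(sentence.length(), end + window));
        return new EntityContext(prefix, sentence.substring(start, end), postfix);
    }

    public static void main(String[] args) {
        String title = "Noninvasive Markers to Assess Liver Fibrosis. ";
        int start = title.indexOf("Liver Fibrosis");
        EntityContext ctx = of(title, start, start + "Liver Fibrosis".length(), 40);
        System.out.println("[" + ctx.prefix + "][" + ctx.exact + "][" + ctx.postfix + "]");
    }
}
```

With the title from the example above and a 40-character window, main prints [Noninvasive Markers to Assess ][Liver Fibrosis][. ], matching the prefix, exact and postfix values shown.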
			

Here is the list of fields with their explanations:

Name | Meaning | Notes
src | Source of the article | Mandatory field. It must be one of the following values:
  • MED: PubMed MEDLINE abstract
  • PMC: PubMedCentral full text article
  • PAT: Patents
  • AGR: Agricola (USDA/NAL)
  • CBA: Chinese biological abstracts
  • HIR: NHS Evidence (UK HIR)
  • CTX: CiteXplore submission
  • ETH: EThOS theses (BL)
  • CIT: CiteSeer (PSU)
id | Identifier of the article in the context of the src field provided | Mandatory field
provider | Name of the provider | Mandatory field. It must match the identifying name assigned to the provider in the subscription phase
anns | List of annotations | Mandatory field
anns.position | Position of the annotation | Mandatory field, only for named entity annotations. We require the relative order of the mined entities within an article; this information is used to locate and highlight the mined entities in the text. For example, "1.3" means that the entity was found in the third chunk of the first sentence of the article
anns.prefix | Portion of the sentence that appears before the tagged entity | Relevant only for named entity annotations. For each annotation, at least one of prefix and postfix must be specified
anns.postfix | Portion of the sentence that appears after the tagged entity | Relevant only for named entity annotations. For each annotation, at least one of prefix and postfix must be specified
anns.exact | Text of the tagged entity | Mandatory field
anns.section | Name of the section of the article where the tagged entity appears | Optional field. For annotation datasets corresponding to a full text article (src=PMC), the possible values are:
  • Title
  • Abstract
  • Introduction
  • Methods
  • Results
  • Discussion
  • Acknowledgments
  • References
  • Table
  • Figure
  • Case study
  • Supplementary material
  • Conclusion
  • Abbreviations
  • Competing Interests
  • Article (the default, to use if the annotation cannot be mapped to any of the other sections)
For any other article source the possible values are:
  • Title
  • Abstract (default)
anns.tags | List of the entities tagged by this annotation | Mandatory field. This list should contain at least one tag
anns.tags.name | Name of the tagged entity | Mandatory field
anns.tags.uri | URI of the ID or accession number that the entity is linked to (e.g., UniProt: http://purl.uniprot.org/uniprot/[Acs_number]) | Mandatory field
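Checking each article against the rules in this table before uploading can save a failed-submission round trip. The sketch below is a hypothetical validator in Java (Java 9 or later, for Set.of and List.of) that works on in-memory objects rather than parsed JSON. It checks the allowed src values, the mandatory fields, the prefix/postfix rule and the allowed section names. Two choices are assumptions: section names are compared case-insensitively (the sentence-based example uses "abstract"), and position must look like "<sentence>.<chunk>". The platform's own validation may be stricter or looser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical pre-submission checks that mirror the field table. This is a sketch,
// not the platform's own validator; it works on in-memory objects, not parsed JSON.
class AnnotationValidator {
    static final Set<String> SOURCES =
            Set.of("MED", "PMC", "PAT", "AGR", "CBA", "HIR", "CTX", "ETH", "CIT");
    // Section names are compared in lower case (an assumption, see the lead-in).
    static final Set<String> PMC_SECTIONS = Set.of("title", "abstract", "introduction", "methods",
            "results", "discussion", "acknowledgments", "references", "table", "figure", "case study",
            "supplementary material", "conclusion", "abbreviations", "competing interests", "article");
    static final Set<String> OTHER_SECTIONS = Set.of("title", "abstract");
    // Assumed shape of anns.position: "<sentence>.<chunk>", e.g. "1.3".
    static final Pattern POSITION = Pattern.compile("[1-9][0-9]*\\.[1-9][0-9]*");

    static final class Tag {
        final String name, uri;
        Tag(String name, String uri) { this.name = name; this.uri = uri; }
    }

    static final class Ann {
        final String position, prefix, exact, postfix, section;
        final List<Tag> tags;
        Ann(String position, String prefix, String exact, String postfix, String section, List<Tag> tags) {
            this.position = position; this.prefix = prefix; this.exact = exact;
            this.postfix = postfix; this.section = section; this.tags = tags;
        }
    }

    // Returns human-readable problems; an empty list means the article passed these checks.
    static List<String> validate(String src, String id, String provider,
                                 List<Ann> anns, boolean namedEntity) {
        List<String> errors = new ArrayList<>();
        if (src == null || !SOURCES.contains(src)) errors.add("src: unknown source " + src);
        if (isBlank(id)) errors.add("id: missing");
        if (isBlank(provider)) errors.add("provider: missing");
        if (anns == null || anns.isEmpty()) {
            errors.add("anns: missing or empty");
            return errors;
        }
        Set<String> sections = "PMC".equals(src) ? PMC_SECTIONS : OTHER_SECTIONS;
        for (int i = 0; i < anns.size(); i++) {
            Ann a = anns.get(i);
            String at = "anns[" + i + "]";
            if (isBlank(a.exact)) errors.add(at + ".exact: missing");
            if (a.section != null && !sections.contains(a.section.toLowerCase(Locale.ROOT))) {
                errors.add(at + ".section: not allowed for src " + src + ": " + a.section);
            }
            if (namedEntity) {
                if (a.position == null || !POSITION.matcher(a.position).matches()) {
                    errors.add(at + ".position: missing or malformed");
                }
                if (a.prefix == null && a.postfix == null) {
                    errors.add(at + ": at least one of prefix and postfix is required");
                }
            }
            if (a.tags == null || a.tags.isEmpty()) {
                errors.add(at + ".tags: at least one tag is required");
            } else {
                for (Tag t : a.tags) {
                    if (isBlank(t.name)) errors.add(at + ".tags.name: missing");
                    if (isBlank(t.uri)) errors.add(at + ".tags.uri: missing");
                }
            }
        }
        return errors;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

Pass namedEntity = true for named entity annotations and false for sentence-based ones, since position and prefix/postfix apply only to the former.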