Terms of use

General

We are committed to Open Science and freely provide an online database, data and services relating to data contributed by authors and users from biological and biomedical science experiments. We impose no additional restrictions on access to, or use of, the data provided beyond those asserted by the data depositors.

Datasets associated with GigaScience journal articles are specifically released under a Creative Commons CC0 waiver unless stated otherwise. While CC0 waives any legal requirement for attribution, crediting the sources of research findings and data is a cultural norm and expected etiquette in science. In recognition of the extensive effort that underlies these projects, we ask that users appropriately acknowledge the use of any dataset(s). To help researchers find, access, and reuse data, we issue a citable digital object identifier (DOI) for each dataset, and our recommended data citation format is listed on each dataset's page, in accordance with the Data Citation Principles. Open Science policies and practices are only sustainable if scientific credit is generated for all parties involved; by making citation details available for every dataset, we are helping to develop a global research environment that rewards data sharing.

Users submitting data agree not to upload viruses or malware, and all users agree not to upload, post, transmit, distribute or otherwise publish on or to the system any material that contains a software virus or other harmful component.

We are not liable to you, or to third parties claiming through you, for any loss or damage, including the consequences of any discontinuity of service.

While we remain committed to Open Science, we reserve the right to update these Terms of Use. When changes are necessary, we will attempt to give reasonable notice by placing a notice on our website, but you may wish to check this page each time you use the website. The date of the most recent revision will appear on this Terms of Use page.

Any questions or comments concerning these Terms of Use can be addressed to the administrator by email at [email protected].

GigaDB Services

We provide these data in good faith, but make no warranty, expressed or implied, nor assume any legal liability or responsibility for any purpose for which they are used.

All feedback provided to us on our online services will be treated as non-confidential unless the individual or organisation providing it states otherwise.

Any use of our online services at a level that prevents, or appears likely to prevent, us from providing services to others will result in that use or user being blocked. We will attempt to contact the user to discuss their needs and how (and if) these can be met from other sources.

Some of the data provided from external sources may be subject to third-party constraints. Users are solely responsible for establishing the nature of, and complying with, any such intellectual property restrictions. Any software made available for download through our web pages (either directly or from third-party repositories such as GitHub) is covered by its own license agreement, which supersedes the default CC0 waiver.

Depositor agreement

As an Open Data repository, we require depositors to make sure that everyone can legally reuse the data they deposit and to confirm that it is their own work. Co-authorship must be credited, and any third-party materials created by someone else must have the necessary permissions and attribution to allow sharing in this manner. Publication irrevocably grants anyone the right to use this work under the Creative Commons Zero (CC0) waiver, which releases all rights, much like placing the work in the public domain. We agree to record depositors' stated wishes regarding specific licenses, but we will not restrict access to data files on the basis of those licenses. We require that all files hosted on our server are given an OSI or similarly open license; if the authors do not expressly provide one, the file(s) will be assumed to be under the most permissive option, the CC0 copyright waiver.

A note about BGI Data

As one of the world’s largest producers of biological data, BGI aims to maximize the use of its data by providing it to the research community in a timely manner. At the same time, BGI recognizes the need for researchers to be appropriately credited for their scientific contribution and investment in data generation. All researchers are therefore expected both to honor agreements in line with the Fort Lauderdale and Toronto International Data Release Workshop data sharing principles and to appropriately acknowledge the contributions of others.

Accordingly, raw data such as individual sequence read traces are submitted to the relevant database as soon as they have passed through BGI quality control pipelines. Whole genome sequence assemblies are released as soon as possible following appropriate quality analysis. Our repository contains draft versions of genome sequence assemblies, and we ask you to bear in mind that these are preliminary data that may contain omissions and errors. In addition, whole genome assemblies are likely to change as new data become available, and our website will document new assembly versions as they are released.

In recognition of the extensive effort that underlies these projects, we ask that you appropriately acknowledge the use of any preliminary data. To help researchers find, access, and reuse data, we have issued a citable digital object identifier for each dataset, and our recommended data citation format is listed on each dataset's page. This recommendation follows the guidelines adopted by the genome sequencing community in a statement of principles for the distribution and use of large-scale sequencing data (Community Resource Projects) and the resulting NHGRI policy statement. If you have any questions regarding the use of these data, please contact us at [email protected]. In line with these recognized research norms, we request that you contact the person listed as the contact for the relevant dataset before publishing genome-scale analyses of the sequences. We welcome collaborative interaction to provide the community with improved whole genome analyses and annotations.

Human data

Although all data are released under the most open licenses possible, the user agrees not to use data and/or metadata, alone or in combination with other data, to identify any individual or entity that has been anonymized. If you inadvertently discover the identity of any patient, individual or entity, then (a) you agree that you will make no use of this knowledge, (b) you will notify us ([email protected]) of the incident, and (c) you will inform no one else of the discovered identity.

Disclaimer

We provide these data in good faith, but make no warranty, expressed or implied, nor assume any legal liability or responsibility for any purpose for which they are used.

GigaScience, including GigaDB, was launched in 2012 in partnership with BGI. The stability of future content hosting is guaranteed and protected by the host organisations: BGI, an international company with over 7,000 employees and data centres in multiple countries, and China National GeneBank (CNGB), a non-profit organisation run by the Shenzhen Government. CNGB is a key infrastructure project arising from the Chinese Government's 12th Five-Year Plan, and this initial funding covers much of GigaScience's current infrastructure, hardware and staffing costs. A second phase of funding for a further 5 years has already been received, and once it is completed, it is envisaged that the grant will be continued for at least another 5 years. Additionally, BGI, the founding company of both CNGB and GigaScience, provides a fall-back position to ensure the continued preservation of the metadata and data, and has provided guarantees on storage through the next 5-year funding period and beyond.

There are three technical parts to GigaDB: (1) the data hosting (FTP server); (2) the website GigaDB.org; and (3) the metadata database. The latter two parts are hosted together on a server physically located in Hong Kong, whereas the data hosting server is located in Shenzhen, China.

The GigaDB data repository (FTP server)

GigaDB has two types of data hosting: (a) public open access; and (b) private FTP access. The public open-access FTP server has security measures in place to prevent removal or modification of the data it hosts; these measures are enforced by read-only access to data files. The private FTP server is administered by GigaDB staff, and individual submitters are given login details so that they can upload their data prior to publication. These details are shared with editors, who pass them on to the selected reviewers so that the reviewers can access the data for the purpose of review. The login details grant access to the data of a single manuscript/dataset, and the passwords are revoked once the submission process is complete. At this point there are two options: (1) the manuscript is accepted and the accompanying data are moved to the public server; or (2) the manuscript is rejected and the data are deleted.

Both the private and public FTP servers that host GigaDB data are installed and maintained by Ali Cloud in the data centre of China National GeneBank (CNGB) in Dapeng, Shenzhen. The physical security of the servers is assured in terms of power supply, fire protection equipment and temperature/humidity control. There is a dedicated server room at CNGB, and this area has CCTV monitoring and access control. Importantly, access to the server room and all activities within it are recorded. Network security is maintained by a firewall that allows only data flows complying with the security policy to pass through, an important measure for ensuring that data are accessed legitimately. Host system security is maintained by identifying users and restricting their access to data through authorisation controls, and sufficiently detailed logs are kept to allow auditors to audit and monitor activity.

As GigaDB is an open data repository, public data can be viewed, retrieved and downloaded by external users. In contrast, private data can only be viewed by the submitter or authorized reviewer(s). Public and private data are stored separately, preventing unauthorised access to private data. The FTP node uses a hot-standby policy to ensure that upload and download services remain available in the event of a failure. The CNGB data centre currently holds certification for the following information security standards: ISO-27001; China Information Security Level Protection, Level 3; and the Trusted Cloud Service Certification issued by the China Information and Communication Research Institute. Alibaba NAS storage promises 99.9999999% data reliability, which effectively reduces data security risks. The NAS Backup Service manages the entire backup process, including automated periodic backup tasks, with a new backup taken every 24 hours. Alibaba NAS can also restore a backup to an empty NAS storage system on Ali Cloud.

The database holds md5sum values for all original files. These are provided so that users can verify that data transfers completed successfully. In addition, we regularly compare them against the files on the FTP server to ensure data integrity.
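Users who want to check a downloaded file against its listed md5 value could do so with a few lines of code. The sketch below uses Python's standard `hashlib` module; the function names `md5sum` and `verify_download` are hypothetical, and it assumes the expected md5 value has been copied by hand from the dataset page.

```python
import hashlib
from pathlib import Path
from typing import Union


def md5sum(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file.

    The file is read in chunks (1 MiB by default) so that large
    sequence files do not need to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Union[str, Path], expected_md5: str) -> bool:
    """Compare a local file with an md5 value copied from the dataset page.

    Surrounding whitespace and letter case in the copied value are ignored,
    since hexdigest() always returns lowercase hex.
    """
    return md5sum(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_md5.py <file> <expected_md5>
    file_path, expected = sys.argv[1], sys.argv[2]
    ok = verify_download(file_path, expected)
    print("OK" if ok else "MISMATCH: re-download the file")
```

Reading in fixed-size chunks keeps memory use constant regardless of file size. On most Linux systems, the coreutils `md5sum` command produces the same digest and can be used instead.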

GigaDB is now also a member of CLOCKSS (Controlled LOCKSS, where LOCKSS stands for "Lots of Copies Keep Stuff Safe"), which ensures that all curated metadata, external links and documentation are preserved in the long term (see Metadata below). In addition, we are exploring the feasibility and costs of using CLOCKSS to archive the data files hosted on our FTP server as a longer-term succession plan.

To increase accessibility worldwide, we are also working with Complete Genomics (a subsidiary of the BGI group of companies) to provide a mirror FTP server in California, USA, which will serve as a further backup copy of the data.

GigaDB.org website

The GigaDB.org website is currently hosted on and served from the servers of BGI-Hong Kong Co. Ltd (a subsidiary of the BGI group of companies) in Hong Kong. BGI has run the Hong Kong data centre since 2010 to a high standard, in line with ISO/IEC 27001 certification. The GigaDB source code is available in our GitHub repository under the open-source GPLv3 license. The GigaDB application is developed using a Behavior-Driven Development process combined with Test-Driven Development and unit testing. We have also adopted Continuous Integration and Continuous Deployment practices to check the GigaDB source code automatically, so that changes do not break the application and new functionality is made available on the website in a timely manner. In the future we intend to move the website to a cloud service provider in an effort to deliver certified guarantees of stability and availability.

Metadata

All metadata collected by GigaDB are stored in a bespoke pSQL database. Only individual user information is kept private, to comply with local and international data privacy laws. All GigaDB dataset metadata are openly available and are therefore likely to have been duplicated in various archives and search engines around the world (e.g. Google, utilising schema.org metadata). This makes our datasets more discoverable.

The pSQL database is hosted on the BGI-Hong Kong Co. Ltd servers in Hong Kong and is backed up automatically every night.

A complete dump of all dataset metadata is available from our API in XML format. We publish all datasets through the DataCite DOI registry, which also ensures the continued availability of the basic dataset-level metadata submitted as part of registration.

GigaDB utilises CLOCKSS as an additional method of ensuring the persistence of all metadata in the face of any unforeseen business continuity issues. Under the CLOCKSS subscription, a copy of the dataset metadata, in the form of an XML dump from the API, is stored indefinitely, ensuring that the curated metadata, external links and documentation are all preserved in the long term. This guarantees the future availability of all metadata in the event that GigaDB services are terminated for any reason.

In the future we intend to move the GigaDB database and website to a cloud service provider. This will provide additional certified guarantees of stability and availability.

GigaDB has implemented appropriate technical and organisational measures to ensure a level of security which we deem appropriate, taking into account the categories of data we collect and the way we process it.

For the purposes of monitoring and improving online services, planning and scientific review, GigaDB will keep its own records of usage and may use services provided by other organisations. GigaDB may make information about the total volume of usage of particular software or data available to the third-party organisations that supply the software or databases, without details of any individual’s use.

In interacting with us through our website, you may choose to give us personal data. We will keep your personal data confidential and use it for purposes connected to our mission.

GigaDB will hold your personal data on our systems for as long as is considered necessary for the purpose(s) for which you provided us with your details, or for as long as required by applicable legislation. If you cancel your user account, we will ensure that your personal information is either deleted or no longer visible to others.

When you contribute scientific data to a database through our website or other submission tools, this information will be released at a time and in a manner consistent with the publication of the scientific data, and we may store it permanently.

If you post or send offensive, inappropriate or objectionable content anywhere on or to our websites or otherwise engage in any disruptive behaviour on any of our services, we may use your personal information to stop such behaviour. Where we reasonably believe that you are or may be in breach of any applicable laws, we may use your personal information to inform relevant third parties about the content and your behaviour.

GigaDB records visits to the website using cookies and page tagging, without collecting any personally identifiable information about users. Although a cookie can be used to identify a computer, it is not used to collect any personal information; in other words, it cannot identify an individual user of the website. Cookies are used to collect statistics about the number of visits to GigaDB and about users’ preferences regarding the website and the online services offered. You may choose to disable your browser’s cookies; if you do, you will not be able to use some of GigaDB's functions.

The Personal Data we may collect from you include:

Name
Email address
Affiliated institute
Connection information (e.g. IP address, web cookies, host information, approximate host location, pages visited, services used)

We may use your Personal Data:

To provide you with information about our services, activities or our online content by offering to subscribe you to newsletters, publications, event announcements, etc.
To personalise the way our web content is presented to you
To use IP addresses to identify your location for statistical reporting purposes and, if necessary, to block disruptive use
To analyse and improve our websites by examining statistical usage information
To generate statistical reports as to how our website and services are being used

We may disclose your Personal Data:

To service providers processing your information on our behalf, or to partners in scientific collaborations, all of whom are required to keep your information confidential
In statistical reports to external organisations (e.g. funding bodies or scientific collaborators) or appointed review committees
As part of the scientific record of data submitted to our data archives
To your employer, university, law enforcement or other government bodies, in exceptional circumstances where your activity is disruptive or may be illegal under your local laws

We may use your Personal Data to contact you:

When you have asked us to as part of a specific service request (e.g. password reset, help desk request, voicemail message)
In relation to any contribution (e.g. data submission or annotation) that you have provided to our archives
To invite your voluntary participation in surveys about our services
When you have opted in to receive further correspondence for scientific marketing and outreach purposes (e.g. newsletters)

We will NOT sell, give or otherwise share your personal data with third parties for any other reason.