Overview

This page describes each available dataset and provides release notes for the searchable and downloadable data in the Kids First Data Resource Portal. Users requesting access to controlled data are required to have an eRA Commons account. Dataset authentication is managed by dbGaP or consortia Data Access Committees (DACs). To learn more about how to apply for data access, please review the “Applying for Data Access” guide.

We are continuously adding more data and working on quality improvements. As such, the data in the file repository may change as we work through known issues and improve our processing pipelines.

Available Datasets

Pediatric Brain Tumor Atlas: CBTTC

  • First Portal Release Date (beta): June 18th, 2018
  • Data Types Available:
    • Whole Genome Sequencing (WGS), RNA-Seq, Histology Images, Pathology Reports, Radiology Images, Radiology Reports, Operation Reports
  • Sequencing Center: Various
  • About the Study: CBTTC Website
  • Data Access Committee: CBTTC Data Access Committee
  • Applying for Access: CBTTC Data Access Form
  • DOI: 10.24370/SD_BHJXBDQK
  • Known Data Issues:
    • CBTTC clinical event data is collected in a way that associates a diagnosis with a biospecimen, most often a tumor. A participant can have multiple tumors over time with different diagnoses. Currently, the Kids First Data Resource Portal presents each diagnosis as attached to the participant, and the association between tumor and diagnosis is not displayed. This issue is being worked on. In the meantime, a list of diagnoses and directly associated clinical events is available by emailing support@kidsfirstdrc.org.
  • Release Notes:
  • 10/11/18:
    • Released data model change to move Consent Type from Participant to Biospecimen.
  • 10/05/18:
    • Mapped the harmonized files created by the DRC to the source genomic files’ sequencing experiment. This allows both source & harmonized files to be searchable when filtering on Experiment Strategy.
    • As part of the mapping of harmonized files to Sequencing Experiments for CBTTC, we have changed the sequencing experiment external IDs.
    • Imported 36 new harmonized files for 36 existing biospecimens.
    • Associated diagnoses for 21 tumor biospecimens; the biospecimen-diagnosis relationships were previously missing.
    • Corrected a mis-assigned aliquot ID for sample 7316-238-T-232096.WGS.
    • Corrected the analyte type for 7316-14-N-8710.WGS from RNA to DNA.
  • 9/18/18:
    • Removed source Expression (rsem) files from the portal. We will be providing harmonized versions in the near future.
    • Removed the biospecimen/diagnosis association for diagnoses classified as “Pre Existing Medical Conditions” and “Cancer Predispositions”. These are associated with participants and not with specific biospecimens.
    • Refreshed all diagnosis values from the CBTTC source databases as the clinical team continues to reclassify missing or “Other” diagnoses into defined buckets. This refresh also brought in more diagnosis values for the “Pre Existing Medical Conditions” and “Cancer Predispositions”.
    • Updated composition for 12 biospecimens to Derived Cell Line. They were previously set to Solid Tissue: 7316-1746-T-365613.WGS, 7316-1746-T-365613.RNA-Seq, 7316-1763-T-365902.RNA-Seq, 7316-3058-T-548405.WGS, 7316-85-T-61659.WGS, 7316-85-T-61659.RNA-Seq, 7316-1763-T-365902.WGS, 7316-3058-T-548405.RNA-Seq, 7316-913-T-345474.WGS, 7316-913-T-345474.RNA-Seq, 7316-85-T-61659.WGS, 7316-85-T-61659.RNA-Seq
    • Set composition for 4 biospecimens to Plasma. They were previously set to Blood: 7316-931-P-345767.WGS, 7316-467-P-323685.WGS, 7316-883-P-344950.WGS, 7316-378-P-242813.WGS
    • Removed the following biospecimens and associated data files because potential tumor-normal mismatch issues have been identified and are undergoing further QC review: 7316-471-T-323762.WGS, 7316-406-T-311440.WGS, 7316-2658-T-479078.WGS, 7316-878-T-344873.WGS, 7316-471-N-323754.WGS, 7316-406-N-311439.WGS, 7316-2658-N-479074.WGS, 7316-878-N-344866.WGS
    • Set source text anatomical site to “Central Nervous System” for external ID 7316-333-N-242258.WGS
  • 9/10/18: Initial versioned release of the Pediatric Brain Tumor Atlas. CBTTC data are made publicly available pre-publication under the above DOI, with processed data available on PedCBioPortal.

Orofacial Cleft: European Ancestry

  • First Portal Release Date (beta): June 18th, 2018
  • Data Types Available: Aligned Reads, gVCFs
  • Sequencing Center: Washington University with harmonized data generated by the DRC
  • About the Study: NIH X01 Project Abstract - Mary Marazita, PI
  • Data Access Committee: Joint NIAMS-NIDCR Data Access Committee
  • Applying for Access: phs001168 dbGaP Study Page
  • Known Data Issues:
    • There is an external sample ID mismatch between dbGaP and the Kids First DRP: dbGaP sample PA1985 should be PA1985B. Because of this, users cannot download this file from the Kids First Data Resource Portal at this time.
  • Release Notes:
  • 10/11/18:
    • Released data model change to move Consent Type from Participant to Biospecimen.
  • 10/05/18:
    • Mapped the harmonized files created by the DRC to the source genomic files’ sequencing experiment. This allows both source & harmonized files to be searchable when filtering on Experiment Strategy.
  • 9/18/18:
    • Removed the following biospecimens due to QC issues found during genomic data review.
      • External Participant & Sample ID -> Exclusion Reason
      • MD0031 -> Uncertain Identity of samples
      • MD0032 -> Uncertain Identity of samples
      • MD0033 -> Uncertain Identity of samples
      • MD0280 -> Uncertain Identity of samples
      • MD0281 -> Uncertain Identity of samples
      • MD0282 -> Uncertain Identity of samples
      • PA2063 -> Uncertain identity of sample
      • PA2027 -> Uncertain identity of sample
      • PA2200 -> High missing rate
      • PA2254 -> Duplicate of another sample
      • IA2650 -> Uncertain identity of samples
      • IA2651 -> Uncertain identity of samples
      • IA2652 -> Uncertain identity of samples
      • IA2836 -> Uncertain identity of samples
      • IA2837 -> Uncertain identity of samples
      • IA2838 -> Uncertain identity of samples
      • IA4062 -> High missing rate
      • MD3181 -> High Het/Hom ratio
      • IA4019 -> High missing rate
      • IA4022 -> High missing rate
      • IA3038 -> Definitely unrelated to offspring
      • IA4054 -> High missing rate
  • 9/10/18: Initial versioned release of this study as part of the Kids First Data Resource Center. Latest dbGaP release notes found here.
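For users who saved a sample list from an earlier release of this study, the 9/18/18 exclusions above could be applied as a simple filter. The following is a minimal sketch only: the names `EXCLUDED_SAMPLES` and `drop_excluded` are hypothetical, and it assumes your local records are keyed by the external participant and sample IDs listed above.

```python
# External IDs removed on 9/18/18, with the exclusion reason
# copied from the release note above.
EXCLUDED_SAMPLES = {
    "MD0031": "Uncertain Identity of samples",
    "MD0032": "Uncertain Identity of samples",
    "MD0033": "Uncertain Identity of samples",
    "MD0280": "Uncertain Identity of samples",
    "MD0281": "Uncertain Identity of samples",
    "MD0282": "Uncertain Identity of samples",
    "PA2063": "Uncertain identity of sample",
    "PA2027": "Uncertain identity of sample",
    "PA2200": "High missing rate",
    "PA2254": "Duplicate of another sample",
    "IA2650": "Uncertain identity of samples",
    "IA2651": "Uncertain identity of samples",
    "IA2652": "Uncertain identity of samples",
    "IA2836": "Uncertain identity of samples",
    "IA2837": "Uncertain identity of samples",
    "IA2838": "Uncertain identity of samples",
    "IA4062": "High missing rate",
    "MD3181": "High Het/Hom ratio",
    "IA4019": "High missing rate",
    "IA4022": "High missing rate",
    "IA3038": "Definitely unrelated to offspring",
    "IA4054": "High missing rate",
}


def drop_excluded(sample_ids):
    """Return the IDs that were not removed in the 9/18/18 release, in order."""
    return [sid for sid in sample_ids if sid not in EXCLUDED_SAMPLES]
```

For example, `drop_excluded(["PA2200", "PA1985B"])` keeps only `"PA1985B"`, and `EXCLUDED_SAMPLES["PA2200"]` gives the reason the other ID was removed.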

Ewing Sarcoma: Genetic Risk

  • First Portal Release Date (beta): June 18th, 2018
  • Data Types Available: Aligned Reads, gVCFs
  • Sequencing Center: Washington University with harmonized data generated by the DRC
  • About the Study: NIH X01 Project Abstract - Joshua Schiffman, PI
  • Data Access Committee: NCI DAC
  • Applying for Access: phs001228 dbGaP Study Page
  • Known Data Issues:
    • None at this time.
  • Release Notes:
  • 10/11/18:
    • Released data model change to move Consent Type from Participant to Biospecimen.
  • 10/05/18:
    • Mapped the harmonized files created by the DRC to the source genomic files’ sequencing experiment. This allows both source & harmonized files to be searchable when filtering on Experiment Strategy.
  • 9/18/18: Updated Participant consent types to align with dbGaP.
  • 9/10/18: Initial versioned release of this study as part of the Kids First Data Resource Center. Latest dbGaP release notes found here.

Syndromic Cranial Dysinnervation

  • First Portal Release Date (beta): August 23rd, 2018
  • Data Types Available: Aligned Reads, gVCFs
  • Sequencing Center: Baylor College of Medicine with harmonized data generated by the DRC
  • About the Study: NIH X01 Project Abstract - Elizabeth Engle, PI
  • Data Access Committee: Kids First DAC
  • Applying for Access: phs001247 dbGaP Study Page
  • Known Data Issues:
    • None at this time.
  • Release Notes:
  • 10/11/18:
    • Released data model change to move Consent Type from Participant to Biospecimen.
  • 10/05/18:
    • Mapped the harmonized files created by the DRC to the source genomic files’ sequencing experiment. This allows both source & harmonized files to be searchable when filtering on Experiment Strategy.
  • 9/18/18:
    • Fixed participant with missing family ID. PT_BX9B2A7T now has the correct family ID.
    • Updated participant consent types to align with dbGaP.
  • 9/10/18: Initial versioned release of this study as part of the Kids First Data Resource Center. Latest dbGaP release notes found here.

Congenital Heart Defects

  • First Portal Release Date (beta): June 18th, 2018
  • Data Types Available: Aligned Reads, gVCFs
  • Sequencing Center: Baylor College of Medicine with harmonized data generated by the DRC
  • About the Study: NIH X01 Project Abstract - Christine Seidman, PI
  • Data Access Committee: Kids First DAC
  • Applying for Access: phs001138 dbGaP Study Page
  • Known Data Issues:
    • None at this time.
  • Release Notes:
  • 10/11/18:
    • Released data model change to move Consent Type from Participant to Biospecimen.
  • 10/05/18:
    • Mapped the harmonized files created by the DRC to the source genomic files’ sequencing experiment. This allows both source & harmonized files to be searchable when filtering on Experiment Strategy.
  • 9/18/18:
    • The study was successfully decoupled from its parent study. As part of this, the data is now downloadable from the portal for those who have been granted dbGaP access.
    • Updated participant consent types to align with dbGaP.
  • 9/10/18: Initial versioned release of this study as part of the Kids First Data Resource Center.

Adolescent Idiopathic Scoliosis

  • First Portal Release Date: October 12th, 2018
  • Data Types Available: Aligned Reads, gVCFs
  • Sequencing Center: HudsonAlpha with harmonized data generated by the DRC
  • About the Study: NIH X01 Project Abstract - Jonathan Rios, PI
  • Data Access Committee: Kids First DAC
  • Applying for Access: phs001410 dbGaP Study Page
  • Known Data Issues:
    • None at this time.
  • Release Notes:
  • 10/11/18:
    • Initial portal release

Congenital Diaphragmatic Hernia

  • First Portal Release Date (beta): June 18th, 2018
  • Data Types Available: Aligned Reads, gVCFs
  • Sequencing Center: Baylor College of Medicine with harmonized data generated by the DRC
  • About the Study: NIH X01 Project Abstract - Wendy Chung, PI
  • Data Access Committee: NICHD-DAC
  • Applying for Access: phs001110 dbGaP Study Page
  • Known Data Issues:
    • This study is currently missing some phenotypic and clinical data, which we are working to curate.
  • Release Notes:
  • 10/11/18:
    • Released data model change to move Consent Type from Participant to Biospecimen.
  • 10/05/18:
    • Mapped the harmonized files created by the DRC to the source genomic files’ sequencing experiment. This allows both source & harmonized files to be searchable when filtering on Experiment Strategy.
    • There are 12 participant/biospecimen IDs that have changed since the last release of this study. However, the old IDs are still referenced on dbGaP. Thus, in this release the External Sample Id and External Aliquot Id fields on biospecimen will refer to the old IDs. The External Id field on other clinical entities such as participant, family_relationship, diagnosis, phenotype, and outcome refer to/contain the new IDs. Once dbGaP is updated with the new IDs, the biospecimen External Sample Id and External Aliquot Id fields will be updated.
      • Old: 216 / New: CDH216
      • Old: 217 / New: CDH217
      • Old: 319 / New: CDH319
      • Old: 320 / New: CDH320
      • Old: 549 / New: CDH549
      • Old: 576 / New: CDH576
      • Old: 01-0218 / New: CDH218
      • Old: 01-0318 / New: CDH318
      • Old: 01-0577 / New: CDH577
      • Old: 05-0015 / New: CDH05-0015
      • Old: 5-15F / New: CDH5-15F
      • Old: 5-15M / New: CDH5-15M
  • 9/18/18:
    • Assigned all probands “Congenital diaphragmatic hernia” as a diagnosis. Previously, no diagnoses were assigned.
    • Added proband label to the children in the trios.
    • Updated participant consent type abbreviations.
  • 9/10/18: Initial versioned release of this study as part of the Kids First Data Resource Center. Latest dbGaP release notes found here.
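The ID changes listed in the 10/05/18 release note above mean that biospecimen fields carry the old IDs while other clinical entities carry the new ones. When joining the two locally, a lookup table could reconcile them. This is a minimal sketch; the names `OLD_TO_NEW_ID` and `to_new_id` are hypothetical and not part of any Kids First tooling.

```python
# Old -> new external IDs, copied from the 10/05/18 release note above.
OLD_TO_NEW_ID = {
    "216": "CDH216",
    "217": "CDH217",
    "319": "CDH319",
    "320": "CDH320",
    "549": "CDH549",
    "576": "CDH576",
    "01-0218": "CDH218",
    "01-0318": "CDH318",
    "01-0577": "CDH577",
    "05-0015": "CDH05-0015",
    "5-15F": "CDH5-15F",
    "5-15M": "CDH5-15M",
}


def to_new_id(external_id: str) -> str:
    """Map a changed old ID to its new ID; return any other ID unchanged."""
    return OLD_TO_NEW_ID.get(external_id, external_id)
```

Applying `to_new_id` to a biospecimen's External Sample Id before joining it to participant records leaves IDs that were not part of this change untouched. Note that the mapping is not a uniform prefix: for example, `01-0218` becomes `CDH218`, which is why an explicit table is used rather than string concatenation.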

Known Data Issues

Last Updated: 9/7/18

Below is a list of known data issues that we are actively working to resolve.

  • HPO Values: Some HPO values may be missing or incorrectly assigned. We are actively reviewing and QCing these across all studies.
  • Future Use Facets: The following facets are available in the portal but are in development for future use and may not have valid values at this time:
    • Alias Group
    • Availability
    • ICD ID Diagnosis
    • HPO Observed/Not Observed
    • Release Status
    • Shipment Date
    • Shipment Origin
    • SNOMED Observed/Not Observed
    • Spatial Descriptor (Biospecimen)
    • Spatial Descriptor (Diagnosis)

Notice an issue?

We are continuously looking for feedback on how to make the data on the Kids First DRP more searchable and usable to the community. If you notice an issue, have a question or want to provide a suggestion, please use the feedback widget within the portal or email us at support@kidsfirstdrc.org.

Access Data

Data Access Levels

The Kids First DRC supports three different data access tiers.

NIH Trusted Partner Environment

A “trusted partner” is defined as a public or private, national or international organization that is able to meet core NIH standards for establishing data quality and data management service protocols for NIH, based on the programmatic need of an NIH funding Institute or Center (IC).

Bionimbus is a cloud-agnostic trusted partner operated by the University of Chicago and powered by the Gen3 software stack.

Bionimbus fulfills the DRC’s Data Distribution roles in support of the NIH’s current genomic data sharing policies:

  • Data will be maintained through controlled access:
    • A. Permission to access data will be requested through NIH Data Access Committees, per NIH-prescribed processes for the institutional certification of data sharing requests
    • B. Standard telemetry will be used to communicate with NIH systems for authenticating Approved Users through the dbGaP data request process

Linking your eRA Commons Account to Gen3 & the Portal

To analyze data on Cavatica or to download genomic files locally, you must link your Kids First DRC Account to Gen3 via your eRA Commons login.

  1. Sign into your Kids First DRC Portal account.
  2. Navigate to Settings from the drop-down menu under your name in the upper right-hand corner.
  3. Under settings, scroll down to the Integrations section. Locate Gen3 Data Commons. Click Connect.
  4. You will be directed to https://auth.nih.gov/ to sign in. Providing your eRA Commons credentials will redirect you back to the Portal and complete your Gen3 integration.

Applying for Data Access

Access to controlled-access data requires authorization from the appropriate Data Access Committee (DAC). While most dataset access within the Kids First DRC is granted through dbGaP, access to some datasets is reviewed and granted through consortia DACs. Please reference the datasets above for their specific access management information. For any questions on how to apply for dbGaP access, please visit their page here.