Search Data
File Repository & Filter/Facets
The File Repository is the primary method of accessing data in the Kids First Data Resource Portal. It provides an overview of data available in Kids First and offers users a variety of filters for identifying and browsing participants and files of interest. Users can access the File Repository section from the top menu bar of the Kids First Data Resource Portal.
On the left, a panel of data facets allows users to filter participants and files using a variety of criteria. If facet filters are applied, the table on the right will display information about matching participants and files. If no filters are applied, the table on the right will display information about all available data.
When the user applies filters, a banner appears above the table. It displays the active filters and provides options to share the query or save it for later reference.
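The filtering behavior described above can be sketched in a few lines of Python. This is only an illustration, not the portal's implementation: the record fields, the example values, and the `apply_facets` function are all hypothetical, and the sketch assumes that a record must match every active facet and that a facet matches when the record's value is one of the selected values.

```python
from typing import Dict, List, Set

# Hypothetical file records; field names and values are illustrative only.
FILES: List[Dict[str, str]] = [
    {"file_id": "GF_EXAMPLE1", "data_type": "Aligned Reads", "file_format": "cram"},
    {"file_id": "GF_EXAMPLE2", "data_type": "Aligned Reads", "file_format": "bam"},
    {"file_id": "GF_EXAMPLE3", "data_type": "Variant Calls", "file_format": "vcf"},
]


def apply_facets(records: List[Dict[str, str]],
                 active_filters: Dict[str, Set[str]]) -> List[Dict[str, str]]:
    """Return the records that match every active facet.

    Assumption: a record matches a facet when its value for that field is
    one of the selected values. With no active filters, all records match.
    """
    return [
        record for record in records
        if all(record.get(field) in selected
               for field, selected in active_filters.items())
    ]


# No filters applied: the table would list all available data.
all_files = apply_facets(FILES, {})

# One facet applied: only matching files remain.
cram_files = apply_facets(FILES, {"file_format": {"cram"}})
```

With an empty filter set, `all()` over no conditions is true for every record, which mirrors the "no filters shows everything" behavior of the table.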
Advanced Filter Search
The File Repository section provides access to additional data filters beyond the defaults. Filters corresponding to additional properties listed in the Kids First Data Dictionary can be added using the ALL FILTERS button available at the top of the filter panel.
The button opens a search window that allows the user to find an additional filter by name or value. Not all filters have values available for filtering; checking the “Show only fields with values” checkbox will limit the search results to only those that do.
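As a rough sketch of how such a search could behave, the Python below matches a query against filter names and values and optionally skips filters that have no values. The `search_filters` function, the case-insensitive substring matching, and the example values are assumptions made for illustration; they do not describe how the portal performs the search.

```python
from typing import Dict, List

# Hypothetical filters mapped to the values available for filtering.
# An empty list stands for a filter with no values to filter on.
FILTERS: Dict[str, List[str]] = {
    "Vital Status": ["Alive", "Deceased"],   # example values, not from the portal
    "Shipment Date": [],
    "Alias Group": [],
}


def search_filters(filters: Dict[str, List[str]], query: str,
                   only_with_values: bool = False) -> List[str]:
    """Return the names of filters whose name or values contain the query."""
    needle = query.lower()
    matches = []
    for name, values in filters.items():
        # Mirrors the "Show only fields with values" checkbox.
        if only_with_values and not values:
            continue
        if needle in name.lower() or any(needle in v.lower() for v in values):
            matches.append(name)
    return matches


by_name = search_filters(FILTERS, "ship")
by_value = search_filters(FILTERS, "deceased")
populated_only = search_filters(FILTERS, "ship", only_with_values=True)
```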
File Browser Column Definitions
Below are the definitions and descriptions of the column headers in the Kids First Data Resource Portal File Browser. Some of these overlap with definitions in the Data Dictionary and are noted as such where applicable.
The portal displays a list of default columns, and more can be added to the view by selecting the “Columns” drop down within the file browser.
Column | Description | Notes |
File ID | The Kids First DRC unique identifier for the file. | These always start with GF_ for “Genomic File” |
Participants ID | The Kids First DRC unique identifier for the participant. | These always start with PT_ for “Participant” |
Study Name | See definition in Default Clinical Filters here. | |
Proband | See definition in Default Clinical Filters here. | |
Family ID | The Kids First DRC unique identifier for the family. | |
Data Type | See definition in Default File Filters here. | |
File Format | See definition in Default File Filters here. | |
File Size | See definition in File Filters here. | |
File Download (No label - always last column) | If you have access to download a file, you will see a clickable down-pointing arrow button. If you do not have access, you will see a lock icon instead. | Clicking the down-pointing arrow downloads the file. |
Participant External ID | The external ID of the participant provided by the investigator of the original study. This is a deidentified ID unique only within its given study. | |
File Name | The name of the file. | |
File External ID | The external ID of the file in the original study. This is often the file name. | |
Aliquot External ID | The external ID of the aliquot provided by the investigator of the original study. This is a deidentified ID unique only within its given study. | |
Sample External ID | The external ID of the sample/biospecimen provided by the investigator of the original study. This is a deidentified ID unique only within its given study. | |
Biospecimen ID | The Kids First DRC unique identifier for the biospecimen. | These always start with BS_ for “Biospecimen” |
Tissue Type (Source Text) | See definition in Default Clinical Filters here. | |
Diagnosis (Source Text) | See definition in Default Clinical Filters here. | |
Study ID | The Kids First DRC unique identifier for the study. | These always start with SD_ for “Study” |
Latest DID | The Gen3 Document ID (DID) of the file. |
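Because the Kids First identifiers listed in the table carry a fixed prefix, the entity type can be read directly from an ID. The Python sketch below uses only the four prefixes stated in the table (GF_, PT_, BS_, SD_); the `entity_type` helper and the example IDs are hypothetical, and no prefix is assumed for identifiers whose prefix the table does not state, such as Family ID.

```python
from typing import Optional

# Prefixes as stated in the column definitions.
ID_PREFIXES = {
    "GF_": "Genomic File",
    "PT_": "Participant",
    "BS_": "Biospecimen",
    "SD_": "Study",
}


def entity_type(kf_id: str) -> Optional[str]:
    """Return the entity type implied by a Kids First ID prefix.

    Returns None when the prefix is not one of the documented ones.
    """
    for prefix, name in ID_PREFIXES.items():
        if kf_id.startswith(prefix):
            return name
    return None


# Example IDs are made up for illustration.
kind = entity_type("PT_EXAMPLE1")
unknown = entity_type("XX_EXAMPLE1")
```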
Search Facet Definitions
Below are the definitions and some general notes for the Kids First DRC Data Dictionary and search facets. As some of these definitions are common across many online genomic data resources, we have modeled some of our definitions after the Genomic Data Commons located here. Any definition noted with a [1] has come from GDC.
Default Facets
The portal provides two tabs of default facets to search over. The search terms have been divided into the categories of Clinical and File to help make searching easier for the user. The definitions below describe the facets in each tab.
Clinical Filters
Facet | Description | Notes |
Study Name | The short name of the study derived from the Study Long Name. | Assigning short names for studies allows for easier searching on the portal. |
Diagnosis Category | An overarching classification based on the Diagnosis Source Text to aid in quick searching over Cancer and Structural Birth Defects. | |
Diagnosis (Source Text) | Analysis, and recognition of the presence and nature of disease, condition, or injury from expressed signs and symptoms; also, the scientific determination of any kind; the concise results of such a study/investigation. [1] | Data as obtained directly from the original study/investigation. |
Family Composition | A calculated value based on the family members present with genomic data within a participant’s pedigree. | |
Proband | The participant serving as the starting point for enrollment into study, often the first family member seeking medical attention. | |
Gender | Text designations that identify gender. Gender is described as the assemblage of properties that distinguish people on the basis of their societal roles. This value is self-reported and may come from a form, questionnaire, interview, etc. [1] | |
Race | An arbitrary classification of a taxonomic group that is a division of a species. It is characterized by shared hereditary, physical attributes and behavior, and in the case of humans, by common history, nationality, or geographic distribution. [1] | |
Tissue Type (Source Text) | Text term directly from the original investigation that represents a description of the kind of tissue collected with respect to disease status or proximity to tumor tissue. [1] | Data as obtained directly from the original study/investigation. |
File Filters
Facet | Description | Notes |
Experiment Strategy | The type of sequencing experiment performed on the biospecimen. | |
Harmonized Data | Indicates whether the file has been harmonized to the Kids First DRC standards so that it can be used alongside any other file in the DRC from any other study and sequencing center. | Kids First DRC harmonization pipelines currently align to GRCh38. Read more about harmonization here. |
Data Type | The high level type of data contained within the file. | |
File Format | The technical specification of the file, typically the file extension of the file. | |
Family Shared Data Types | All members in the family share this data type. |
All Filters
File
Facet | Description | Notes |
ACL | The access control list value assigned to the file based on data access committee (DAC) authorization. This is obtained from dbGaP for NIH-based datasets. | |
Availability | Value assigned that indicates whether or not the file is available for download within the DRC. This value does not take into account an individual user’s permissions. | For potential future use. All data on the portal are currently available for download, so this field is not populated. |
Access | The type of authorization required in order to download the file. Open Access files are available to any registered portal user. Controlled access files require approval & authorization by dbGaP or the controlling DAC. | For more information on Data Access, please see the Support page here. |
Created At | The date that the file was created in the DRC’s system. | For potential future use, currently not populated with data. |
Data Type | See definition in Default File Facets here. | |
File Format | See definition in Default File Facets here. | |
File Name | The exact name of the file. | |
Harmonized Data | See definition in Default File Facets here. | |
Modified At | The date that the file was last modified in the DRC’s system. | For potential future use, currently not populated with data. |
Reference Genome | The reference genome against which the sequencing experiment was run. | Unharmonized files from the various sequencing centers use different reference genomes depending on when and where they were sequenced. The DRC harmonized files are aligned to GRCh38. |
File Size | The measure of the size of the file in KB, MB or GB. |
Participant
Facet | Description | Notes |
Alias Group | For potential future use, currently not populated with data. | |
Available Data Types | The File data types available for the participants. | See the Data Type definition under File for more detail. |
Consent Type | The informed consent type that the participant agreed to at the time of sample collection. This is used to inform any data use limitations. | |
Ethnicity | An individual’s self-described social and cultural grouping, specifically whether an individual describes themselves as Hispanic or Latino. | |
Gender | See definition in Default Clinical Filters here. | |
Proband | See definition in Default Clinical Filters here. | |
Race | See definition in Default Clinical Filters here. |
Biospecimen
Facet | Description | Notes |
Age at Event (Days) | The age of the participant, in days, when the biospecimen was collected. | |
Analyte Type | Text term that represents the kind of molecular specimen analyte. [1] | |
Anatomical Site (Source Text) | The source text from the investigator that describes the disease site of the submitted sample. [1] | Data as obtained directly from the original study/investigation. |
Composition | Text term that represents the cellular composition of the sample. [1] | |
Concentration (mg/ML) | Numeric value that represents the concentration of analyte or aliquot extracted from the sample or sample portion, measured in milligrams per milliliter. [1] | |
NCIt ID Tissue Type | The National Cancer Institute Thesaurus (NCIt) ID associated with the Tissue Type (Source Text) value. Derived by matching the Tissue Type Source Text with NCIt lookups. | |
NCIt ID Anatomical Site | The National Cancer Institute Thesaurus (NCIt) ID associated with the Anatomical Site (Source Text) value. Derived by matching the Anatomical Site Source Text value with NCIt lookups. | |
Participant’s Biospecimens Dbgap Consent Code | The dbGaP-assigned consent code used for access granting that is derived directly from the participant’s consent. | See Consent Type under Participant for further definitions on consent. |
Shipment Date | Date the biospecimen was shipped to the sequencing center. | For potential future use, currently not populated with data. |
Shipment Origin | Location/institution from where the biospecimen was shipped. | For potential future use, currently not populated with data. |
Spatial Descriptor | Term to indicate precise, relative anatomical position from where the biospecimen was obtained. | For potential future use, currently not populated with data. |
Tissue Type (Source Text) | Text term directly from the original investigation that represents a description of the kind of tissue collected with respect to disease status or proximity to tumor tissue. [1] | Data as obtained directly from the original study/investigation. |
Uberon ID Anatomical Site | The Uberon ID associated with the Anatomical Site (Source Text) value. Derived by matching the Anatomical Site Source Text value with Uberon lookups. | |
Volume (uL) | The volume in microliters of the analytes derived from the analyte(s) shipped for sequencing and characterization. [1] |
Diagnosis
Facet | Description | Notes |
Age at Event (Days) | The participant’s age, in days, when they were diagnosed with the disease. | |
Diagnosis | A calculated rollup of a participant’s diagnoses. If the participant has only Cancer diagnoses, the participant’s value is Cancer. If the participant has only Structural Birth Defect diagnoses, the value is Structural Birth Defects. | |
Diagnosis (Source Text) | See definition in Default Clinical Filters here. | |
Diagnosis Category | See definition in Default Clinical Filters here. | |
ICD ID Diagnosis | ICD10 code for the diagnosis. | For potential future use, currently not populated with data. |
Mondo ID Diagnosis | The Mondo ID associated with the Diagnosis (Source Text) value. Derived by matching the Diagnosis Source Text value with Mondo ID lookups. | |
NCIt ID Diagnosis | The National Cancer Institute Thesaurus (NCIt) ID associated with the Diagnosis (Source Text) value. Derived by matching the Diagnosis Source Text value with NCIt lookups. | |
Spatial Descriptor | Term to indicate precise, relative anatomical position of the diagnosis. | For potential future use, currently not populated with data. |
Tumor Location (Source Text) | Text term from the investigator that describes the anatomic site of the tumor. [1] | Data as obtained directly from the original study/investigation. |
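The Diagnosis rollup rule in the table can be written out as a small Python sketch. The two rules come from the definition above; the function name and the input category labels are hypothetical. The definition does not say what value a participant receives when they have both kinds of diagnosis, or none, so this sketch returns None for those cases rather than guessing.

```python
from typing import Iterable, Optional


def diagnosis_rollup(categories: Iterable[str]) -> Optional[str]:
    """Roll up a participant's diagnosis categories into a single value.

    Assumes each diagnosis is labeled either "Cancer" or
    "Structural Birth Defect" (illustrative labels).
    """
    unique = set(categories)
    if unique == {"Cancer"}:
        return "Cancer"
    if unique == {"Structural Birth Defect"}:
        return "Structural Birth Defects"
    # Mixed or empty input: not covered by the stated definition.
    return None


only_cancer = diagnosis_rollup(["Cancer", "Cancer"])
only_sbd = diagnosis_rollup(["Structural Birth Defect"])
mixed = diagnosis_rollup(["Cancer", "Structural Birth Defect"])
```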
Family
Facet | Description | Notes |
Family Composition | See definition in Default Clinical Filters here. |
All other Family facets are derived from the definitions above. The Family facets select files that are shared by all members of a family.
Outcome
Facet | Description | Notes |
Age at Event (Days) | Participant’s age, in days, at the time of the Outcome event. | |
Disease Related | Text value describing whether or not the participant’s outcome is related to their disease. For example, whether their deceased status was due to their disease. | |
Vital Status | The survival state of the participant. |
Phenotype
Facet | Description | Notes |
Age at Event (Days) | Participant’s age, in days, when the phenotype was observed. | |
Ancestral HPO IDs | The Human Phenotype Ontology value associated with the Participant Phenotype (Source Text) value. Derived by matching the Phenotype Source Text value with HPO lookups. | |
External ID | External ID provided by the investigator of the original study for the Phenotype observation. | |
HPO Phenotype Observed | Files for which the HPO ID was positively observed. | |
Participant Phenotype (Source Text) | The observable characteristics in a participant resulting from the expression of genes, environment factors, and their interactions. | Data as obtained directly from the original study/investigation. |
Participants Phenotype HPO - HPO Phenotype Not Observed | Files for which the HPO ID was negatively observed. | |
Participants Phenotype HPO - Snomed Phenotype Not Observed | Files for which the Snomed value associated with the Participant Phenotype (Source Text) was negatively observed. Derived by matching the Phenotype Source Text value with Snomed lookups. | |
Participants Phenotype HPO - Snomed Phenotype Observed | Files for which the Snomed value associated with the Participant Phenotype (Source Text) was positively observed. Derived by matching the Phenotype Source Text value with Snomed lookups. |
Study
Facet | Description | Notes |
Data Access Authority | The authoritative group responsible for providing access. | |
Release Status | The status of the study within its Data Access Authority. | For potential future use, currently populated with “Pending”. |
Study Long Name | The name of the study for which the original sample was sequenced and researched. Samples and participants are organized in the portal by their originating study. | IDs are the Kids First ID, which starts with SD_*. If the study is a dbGaP study, the external ID for the study will be the PHS accession number. |
Study Name | See definition in Default Clinical Filters here. | |
Version | The dbGaP version of the study. |
Sequencing Experiment
Facet | Description | Notes |
Experiment Date | Date the sample was sequenced. | |
Experiment Strategy | See definition in Default File Filters here. | |
Instrument Model | The model of the sequencer used to obtain data. | |
Is Paired End | Are there paired ends? [1] | |
Library Name | Name of library. [1] | |
Library Strand | Library stranded-ness. [1] | |
Max Insert Size | Max number of bases found between paired-end adapters. | |
Mean Insert Size | Mean number of bases found between paired-end adapters. | |
Mean Read Length | Mean length of the sequenced fragments. [1] | |
Platform | Name of platform used to obtain data. [1] | |
Total Reads | Total number of reads from the sequencing experiment. |