
Class: DataSource

A cataloged data source within a lakehouse. Represents a namespace, database, object storage source, or other data collection. All required fields must be present on every submission.

URI: dcat:Dataset

Class diagram: DataSource is a subclass of CatalogEntity. It references AccessLevel (access_level, 1), ContactPoint (contact_point, 1), DataSourceStatus (status, 1), UpdateFrequency (update_schedule, 1), DataSourceCategory (category, 0..1), DatabaseEngine (database_engine, 0..1), SourceType (source_type, 0..1), and DataSource itself (previous_version and replaced_by, each 0..1). All remaining attributes are listed under Slots.

Inheritance

  • CatalogEntity
      • DataSource

Slots

Name | Cardinality and Range | Description | Inheritance
owner | 1 String | Person or team responsible for this data source | direct
contact_point | 1 ContactPoint | Structured contact information for this data source | direct
namespace | 1 String | Database name, source name, or space name within the lakehouse (e.g. "kbase_public", "jgi_object_store") | direct
status | 1 DataSourceStatus | Current lifecycle status of this data source | direct
is_deprecated | 1 Boolean | Whether this data source is deprecated. Must always be explicitly set | direct
update_schedule | 1 UpdateFrequency | How frequently this data source is updated | direct
access_level | 1 AccessLevel | Visibility/access level of this data source | direct
keywords | * String | Discovery tags for this data source | direct
project_affiliation | * String | BER program affiliations (e.g. KBase, NMDC, JGI, Phage Foundry) | direct
license | 0..1 String | License governing use of this data source | direct
domain | * String | Scientific domain(s) covered by this data source | direct
version | 0..1 String | Version identifier for this data source | direct
doi | 0..1 String | DOI if this data source has been published | direct
facility | 0..1 String | Originating facility (e.g. NERSC, JGI, EMSL) per HPDF report recommendations | direct
format | * String | Data format(s) available (e.g. Parquet, CSV, HDF5, Zarr, NetCDF, FITS) | direct
deprecation_date | 0..1 Date | Date this data source was deprecated | direct
deprecation_reason | 0..1 String | Explanation for why this data source was deprecated | direct
replaced_by | 0..1 DataSource | Reference to the data source that replaces this deprecated one | direct
previous_version | 0..1 DataSource | Reference to the previous version of this data source | direct
temporal_coverage_start | 0..1 Date | Start date of the temporal coverage of this data | direct
temporal_coverage_end | 0..1 Date | End date of the temporal coverage of this data | direct
spatial_coverage | 0..1 String | Geographic or spatial coverage description | direct
data_quality_notes | 0..1 String | Notes on data quality, known issues, or limitations | direct
lineage | 0..1 String | Provenance or lineage information for this data source | direct
documentation_url | 0..1 Uri | URL to external documentation for this data source | direct
instrument | 0..1 String | Instrument or sensor that generated the data | direct
modality | 0..1 String | Data modality (e.g. genomic, proteomic, imaging) | direct
size_bytes | 0..1 Integer | Total size of the data source in bytes | direct
row_count | 0..1 Integer | Number of rows or records in the data source | direct
table_count | 0..1 Integer | Number of tables or collections in the data source | direct
source_type | 0..1 SourceType | Type of data source within the lakehouse (e.g. namespace, object_storage, relational_database) | direct
database_engine | 0..1 DatabaseEngine | Database engine for Dremio database sources (e.g. postgresql, mysql, mongodb) | direct
category | 0..1 DataSourceCategory | Organizational category (project, shared, personal, system) | direct
id | 1 Uriorcurie | Unique identifier for this catalog entity | CatalogEntity
title | 1 String | Human-readable name for this entity | CatalogEntity
description | 1 String | Free-text description of this entity | CatalogEntity
created_date | 1 Date | Date this entity was first created or registered | CatalogEntity
last_modified | 0..1 Date | Date this entity was last updated | CatalogEntity
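
The cardinalities above can be checked before a record is submitted. The following is a minimal sketch in Python, not the registry's own validator: the function name check_record and the example record are hypothetical, the ContactPoint field shown is invented, and the enum values are placeholders because the permissible values of DataSourceStatus, UpdateFrequency, and AccessLevel are not listed on this page.

```python
# Slots with cardinality 1 in the Slots table (direct and inherited).
REQUIRED_SLOTS = (
    "id", "title", "description", "created_date",
    "owner", "contact_point", "namespace", "status",
    "is_deprecated", "update_schedule", "access_level",
)

# Slots with cardinality "*" (multivalued) in the Slots table.
MULTIVALUED_SLOTS = ("keywords", "project_affiliation", "domain", "format")


def check_record(record):
    """Return a list of problems found in a DataSource-like dict (empty if none)."""
    problems = []
    for slot in REQUIRED_SLOTS:
        # Absent or None counts as missing. False is a legitimate value for
        # is_deprecated, so compare against None rather than truthiness.
        if record.get(slot) is None:
            problems.append(f"missing required slot: {slot}")
    for slot in MULTIVALUED_SLOTS:
        if slot in record and not isinstance(record[slot], list):
            problems.append(f"slot must be a list: {slot}")
    return problems


# Hypothetical record: all values are invented for illustration.
example = {
    "id": "example:ds-001",
    "title": "Example source",
    "description": "Illustrative record only.",
    "created_date": "2024-01-01",
    "owner": "Example Team",
    "contact_point": {"name": "Example Contact"},  # ContactPoint fields are assumed
    "namespace": "kbase_public",
    "status": "PLACEHOLDER_STATUS",            # real values come from DataSourceStatus
    "is_deprecated": False,
    "update_schedule": "PLACEHOLDER_FREQUENCY",  # real values come from UpdateFrequency
    "access_level": "PLACEHOLDER_LEVEL",         # real values come from AccessLevel
    "keywords": ["example"],
}

print(check_record(example))
```

This only checks presence and list-ness. Range checks (dates, enums, URIs) are left to a schema-aware validator.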

Usages

used by | used in | type | used
Lakehouse | catalog_entries | range | DataSource
DataSource | replaced_by | range | DataSource
DataSource | previous_version | range | DataSource
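
Because replaced_by points from one DataSource to another, a deprecated source may sit at the start of a chain of replacements. The sketch below shows one way a consumer might follow that chain to the current source. It assumes a hypothetical in-memory index mapping each id to its replaced_by id; the schema defines only the slot, not such an index, and the ids are invented.

```python
def resolve_current(source_id, replaced_by):
    """Follow replaced_by links from source_id until a source has no replacement.

    replaced_by: dict mapping a DataSource id to the id of its replacement
    (hypothetical index built by the caller).
    """
    seen = set()
    current = source_id
    while current in replaced_by:
        if current in seen:
            # Guard against records that replace each other in a loop.
            raise ValueError(f"cycle in replaced_by chain at {current}")
        seen.add(current)
        current = replaced_by[current]
    return current


# Invented ids for illustration.
links = {"example:v1": "example:v2", "example:v2": "example:v3"}
print(resolve_current("example:v1", links))
```

The same traversal works for previous_version by passing an index built from that slot instead.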

Identifier and Mapping Information

Schema Source

  • from schema: https://w3id.org/ber-data/ber-data-registry

Mappings

Mapping Type | Mapped Value
self | dcat:Dataset
native | ber_registry:DataSource

LinkML Source

Direct

name: DataSource
description: A cataloged data source within a lakehouse. Represents a namespace, database,
  object storage source, or other data collection. All required fields must be present
  on every submission.
from_schema: https://w3id.org/ber-data/ber-data-registry
is_a: CatalogEntity
slots:
- owner
- contact_point
- namespace
- status
- is_deprecated
- update_schedule
- access_level
- keywords
- project_affiliation
- license
- domain
- version
- doi
- facility
- format
- deprecation_date
- deprecation_reason
- replaced_by
- previous_version
- temporal_coverage_start
- temporal_coverage_end
- spatial_coverage
- data_quality_notes
- lineage
- documentation_url
- instrument
- modality
- size_bytes
- row_count
- table_count
- source_type
- database_engine
- category
slot_usage:
  description:
    name: description
    required: true
  created_date:
    name: created_date
    required: true
  owner:
    name: owner
    required: true
  contact_point:
    name: contact_point
    required: true
  namespace:
    name: namespace
    required: true
  status:
    name: status
    required: true
  is_deprecated:
    name: is_deprecated
    required: true
  update_schedule:
    name: update_schedule
    required: true
  access_level:
    name: access_level
    required: true
class_uri: dcat:Dataset
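
The slot_usage block above tightens slots for this class, including description and created_date, which DataSource inherits from CatalogEntity. The sketch below illustrates the effect in Python as a simple dictionary overlay. It is not LinkML's actual induction algorithm, and the base definitions are assumptions, since CatalogEntity's own definition is not shown on this page.

```python
import copy

# Assumed, simplified base definitions of two inherited slots.
base_slots = {
    "description": {"name": "description", "range": "string"},
    "created_date": {"name": "created_date", "range": "date"},
}

# Two of the slot_usage entries from the Direct definition above.
slot_usage = {
    "description": {"name": "description", "required": True},
    "created_date": {"name": "created_date", "required": True},
}


def apply_slot_usage(base, usage):
    """Overlay class-specific slot_usage settings on copies of the base slots."""
    induced = copy.deepcopy(base)  # leave the base definitions untouched
    for slot_name, overrides in usage.items():
        induced.setdefault(slot_name, {"name": slot_name}).update(overrides)
    return induced


induced = apply_slot_usage(base_slots, slot_usage)
print(induced["description"])
```

The overlay changes only how the slot behaves within DataSource; other classes that use the same slot keep the base definition.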

Induced

name: DataSource
description: A cataloged data source within a lakehouse. Represents a namespace, database,
  object storage source, or other data collection. All required fields must be present
  on every submission.
from_schema: https://w3id.org/ber-data/ber-data-registry
is_a: CatalogEntity
slot_usage:
  description:
    name: description
    required: true
  created_date:
    name: created_date
    required: true
  owner:
    name: owner
    required: true
  contact_point:
    name: contact_point
    required: true
  namespace:
    name: namespace
    required: true
  status:
    name: status
    required: true
  is_deprecated:
    name: is_deprecated
    required: true
  update_schedule:
    name: update_schedule
    required: true
  access_level:
    name: access_level
    required: true
attributes:
  owner:
    name: owner
    description: Person or team responsible for this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcterms:publisher
    alias: owner
    owner: DataSource
    domain_of:
    - DataSource
    range: string
    required: true
  contact_point:
    name: contact_point
    description: Structured contact information for this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcat:contactPoint
    alias: contact_point
    owner: DataSource
    domain_of:
    - DataSource
    range: ContactPoint
    required: true
    inlined: true
  namespace:
    name: namespace
    description: Database name, source name, or space name within the lakehouse (e.g.
      "kbase_public", "jgi_object_store").
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: namespace
    owner: DataSource
    domain_of:
    - DataSource
    range: string
    required: true
  status:
    name: status
    description: Current lifecycle status of this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcat:status
    alias: status
    owner: DataSource
    domain_of:
    - DataSource
    range: DataSourceStatus
    required: true
  is_deprecated:
    name: is_deprecated
    description: Whether this data source is deprecated. Must always be explicitly
      set.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: is_deprecated
    owner: DataSource
    domain_of:
    - DataSource
    range: boolean
    required: true
  update_schedule:
    name: update_schedule
    description: How frequently this data source is updated.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcterms:accrualPeriodicity
    alias: update_schedule
    owner: DataSource
    domain_of:
    - DataSource
    range: UpdateFrequency
    required: true
  access_level:
    name: access_level
    description: Visibility/access level of this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: access_level
    owner: DataSource
    domain_of:
    - DataSource
    range: AccessLevel
    required: true
  keywords:
    name: keywords
    description: Discovery tags for this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcat:keyword
    alias: keywords
    owner: DataSource
    domain_of:
    - DataSource
    range: string
    multivalued: true
  project_affiliation:
    name: project_affiliation
    description: BER program affiliations (e.g. KBase, NMDC, JGI, Phage Foundry).
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: project_affiliation
    owner: DataSource
    domain_of:
    - DataSource
    range: string
    multivalued: true
  license:
    name: license
    description: License governing use of this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcterms:license
    alias: license
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  domain:
    name: domain
    description: Scientific domain(s) covered by this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcat:theme
    alias: domain
    owner: DataSource
    domain_of:
    - DataSource
    range: string
    multivalued: true
  version:
    name: version
    description: Version identifier for this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: pav:version
    alias: version
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  doi:
    name: doi
    description: DOI if this data source has been published.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: doi
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  facility:
    name: facility
    description: Originating facility (e.g. NERSC, JGI, EMSL) per HPDF report recommendations.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: facility
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  format:
    name: format
    description: Data format(s) available (e.g. Parquet, CSV, HDF5, Zarr, NetCDF,
      FITS).
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcterms:format
    alias: format
    owner: DataSource
    domain_of:
    - DataSource
    range: string
    multivalued: true
  deprecation_date:
    name: deprecation_date
    description: Date this data source was deprecated.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: deprecation_date
    owner: DataSource
    domain_of:
    - DataSource
    range: date
  deprecation_reason:
    name: deprecation_reason
    description: Explanation for why this data source was deprecated.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: deprecation_reason
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  replaced_by:
    name: replaced_by
    description: Reference to the data source that replaces this deprecated one.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: replaced_by
    owner: DataSource
    domain_of:
    - DataSource
    range: DataSource
  previous_version:
    name: previous_version
    description: Reference to the previous version of this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: previous_version
    owner: DataSource
    domain_of:
    - DataSource
    range: DataSource
  temporal_coverage_start:
    name: temporal_coverage_start
    description: Start date of the temporal coverage of this data.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: temporal_coverage_start
    owner: DataSource
    domain_of:
    - DataSource
    range: date
  temporal_coverage_end:
    name: temporal_coverage_end
    description: End date of the temporal coverage of this data.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: temporal_coverage_end
    owner: DataSource
    domain_of:
    - DataSource
    range: date
  spatial_coverage:
    name: spatial_coverage
    description: Geographic or spatial coverage description.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: spatial_coverage
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  data_quality_notes:
    name: data_quality_notes
    description: Notes on data quality, known issues, or limitations.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: data_quality_notes
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  lineage:
    name: lineage
    description: Provenance or lineage information for this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: prov:wasDerivedFrom
    alias: lineage
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  documentation_url:
    name: documentation_url
    description: URL to external documentation for this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: documentation_url
    owner: DataSource
    domain_of:
    - DataSource
    range: uri
  instrument:
    name: instrument
    description: Instrument or sensor that generated the data.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: instrument
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  modality:
    name: modality
    description: Data modality (e.g. genomic, proteomic, imaging).
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: modality
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  size_bytes:
    name: size_bytes
    description: Total size of the data source in bytes.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: size_bytes
    owner: DataSource
    domain_of:
    - DataSource
    range: integer
  row_count:
    name: row_count
    description: Number of rows or records in the data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: row_count
    owner: DataSource
    domain_of:
    - DataSource
    range: integer
  table_count:
    name: table_count
    description: Number of tables or collections in the data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: table_count
    owner: DataSource
    domain_of:
    - DataSource
    range: integer
  source_type:
    name: source_type
    description: Type of data source within the lakehouse (e.g. namespace, object_storage,
      relational_database).
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: source_type
    owner: DataSource
    domain_of:
    - DataSource
    range: SourceType
  database_engine:
    name: database_engine
    description: Database engine for Dremio database sources (e.g. postgresql, mysql,
      mongodb).
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: database_engine
    owner: DataSource
    domain_of:
    - DataSource
    range: DatabaseEngine
  category:
    name: category
    description: Organizational category (project, shared, personal, system).
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: category
    owner: DataSource
    domain_of:
    - DataSource
    range: DataSourceCategory
  id:
    name: id
    description: Unique identifier for this catalog entity.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: schema:identifier
    identifier: true
    alias: id
    owner: DataSource
    domain_of:
    - Catalog
    - CatalogEntity
    range: uriorcurie
    required: true
  title:
    name: title
    description: Human-readable name for this entity.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcterms:title
    alias: title
    owner: DataSource
    domain_of:
    - Catalog
    - CatalogEntity
    range: string
    required: true
  description:
    name: description
    description: Free-text description of this entity.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcterms:description
    alias: description
    owner: DataSource
    domain_of:
    - Catalog
    - CatalogEntity
    range: string
    required: true
  created_date:
    name: created_date
    description: Date this entity was first created or registered.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcterms:created
    alias: created_date
    owner: DataSource
    domain_of:
    - CatalogEntity
    range: date
    required: true
  last_modified:
    name: last_modified
    description: Date this entity was last updated.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcterms:modified
    alias: last_modified
    owner: DataSource
    domain_of:
    - CatalogEntity
    range: date
class_uri: dcat:Dataset
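
The slot_uri values in the induced definition tie many slots to terms from DCAT, Dublin Core, PAV, PROV, and schema.org. The sketch below rewrites a record's keys to those CURIEs, which can be a first step toward an RDF or JSON-LD export. The function name to_curie_keys is hypothetical, and falling back to the ber_registry prefix for slots without a slot_uri is an assumption based on the native mapping ber_registry:DataSource.

```python
# slot_uri values copied from the induced definition above.
SLOT_URIS = {
    "owner": "dcterms:publisher",
    "contact_point": "dcat:contactPoint",
    "status": "dcat:status",
    "update_schedule": "dcterms:accrualPeriodicity",
    "keywords": "dcat:keyword",
    "license": "dcterms:license",
    "domain": "dcat:theme",
    "version": "pav:version",
    "format": "dcterms:format",
    "lineage": "prov:wasDerivedFrom",
    "id": "schema:identifier",
    "title": "dcterms:title",
    "description": "dcterms:description",
    "created_date": "dcterms:created",
    "last_modified": "dcterms:modified",
}


def to_curie_keys(record):
    """Return a copy of record keyed by slot CURIE instead of slot name."""
    # Assumption: slots without an explicit slot_uri live under ber_registry.
    return {SLOT_URIS.get(name, f"ber_registry:{name}"): value
            for name, value in record.items()}


# Invented values for illustration.
print(to_curie_keys({"title": "Example source", "namespace": "kbase_public"}))
```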