Class: DataSource
A cataloged data source within a lakehouse. Represents a namespace, database, object storage source, or other data collection. All required fields must be present on every submission.
URI: dcat:Dataset
classDiagram
class DataSource
click DataSource href "../DataSource/"
CatalogEntity <|-- DataSource
click CatalogEntity href "../CatalogEntity/"
DataSource : access_level
DataSource --> "1" AccessLevel : access_level
click AccessLevel href "../AccessLevel/"
DataSource : category
DataSource --> "0..1" DataSourceCategory : category
click DataSourceCategory href "../DataSourceCategory/"
DataSource : contact_point
DataSource --> "1" ContactPoint : contact_point
click ContactPoint href "../ContactPoint/"
DataSource : created_date
DataSource : data_quality_notes
DataSource : database_engine
DataSource --> "0..1" DatabaseEngine : database_engine
click DatabaseEngine href "../DatabaseEngine/"
DataSource : deprecation_date
DataSource : deprecation_reason
DataSource : description
DataSource : documentation_url
DataSource : doi
DataSource : domain
DataSource : facility
DataSource : format
DataSource : id
DataSource : instrument
DataSource : is_deprecated
DataSource : keywords
DataSource : last_modified
DataSource : license
DataSource : lineage
DataSource : modality
DataSource : namespace
DataSource : owner
DataSource : previous_version
DataSource --> "0..1" DataSource : previous_version
click DataSource href "../DataSource/"
DataSource : project_affiliation
DataSource : replaced_by
DataSource --> "0..1" DataSource : replaced_by
click DataSource href "../DataSource/"
DataSource : row_count
DataSource : size_bytes
DataSource : source_type
DataSource --> "0..1" SourceType : source_type
click SourceType href "../SourceType/"
DataSource : spatial_coverage
DataSource : status
DataSource --> "1" DataSourceStatus : status
click DataSourceStatus href "../DataSourceStatus/"
DataSource : table_count
DataSource : temporal_coverage_end
DataSource : temporal_coverage_start
DataSource : title
DataSource : update_schedule
DataSource --> "1" UpdateFrequency : update_schedule
click UpdateFrequency href "../UpdateFrequency/"
DataSource : version
Inheritance
- CatalogEntity
    - DataSource
Slots
| Name | Cardinality and Range | Description | Inheritance |
|---|---|---|---|
| owner | 1 String | Person or team responsible for this data source | direct |
| contact_point | 1 ContactPoint | Structured contact information for this data source | direct |
| namespace | 1 String | Database name, source name, or space name within the lakehouse (e.g. "kbase_public", "jgi_object_store") | direct |
| status | 1 DataSourceStatus | Current lifecycle status of this data source | direct |
| is_deprecated | 1 Boolean | Whether this data source is deprecated. Must always be explicitly set | direct |
| update_schedule | 1 UpdateFrequency | How frequently this data source is updated | direct |
| access_level | 1 AccessLevel | Visibility/access level of this data source | direct |
| keywords | * String | Discovery tags for this data source | direct |
| project_affiliation | * String | BER program affiliations (e.g. KBase, NMDC, JGI, Phage Foundry) | direct |
| license | 0..1 String | License governing use of this data source | direct |
| domain | * String | Scientific domain(s) covered by this data source | direct |
| version | 0..1 String | Version identifier for this data source | direct |
| doi | 0..1 String | DOI if this data source has been published | direct |
| facility | 0..1 String | Originating facility (e.g. NERSC, JGI, EMSL) per HPDF report recommendations | direct |
| format | * String | Data format(s) available (e.g. Parquet, CSV, HDF5, Zarr, NetCDF, FITS) | direct |
| deprecation_date | 0..1 Date | Date this data source was deprecated | direct |
| deprecation_reason | 0..1 String | Explanation for why this data source was deprecated | direct |
| replaced_by | 0..1 DataSource | Reference to the data source that replaces this deprecated one | direct |
| previous_version | 0..1 DataSource | Reference to the previous version of this data source | direct |
| temporal_coverage_start | 0..1 Date | Start date of the temporal coverage of this data | direct |
| temporal_coverage_end | 0..1 Date | End date of the temporal coverage of this data | direct |
| spatial_coverage | 0..1 String | Geographic or spatial coverage description | direct |
| data_quality_notes | 0..1 String | Notes on data quality, known issues, or limitations | direct |
| lineage | 0..1 String | Provenance or lineage information for this data source | direct |
| documentation_url | 0..1 Uri | URL to external documentation for this data source | direct |
| instrument | 0..1 String | Instrument or sensor that generated the data | direct |
| modality | 0..1 String | Data modality (e.g. genomic, proteomic, imaging) | direct |
| size_bytes | 0..1 Integer | Total size of the data source in bytes | direct |
| row_count | 0..1 Integer | Number of rows or records in the data source | direct |
| table_count | 0..1 Integer | Number of tables or collections in the data source | direct |
| source_type | 0..1 SourceType | Type of data source within the lakehouse (e.g. namespace, object_storage, relational_database) | direct |
| database_engine | 0..1 DatabaseEngine | Database engine for Dremio database sources (e.g. postgresql, mysql, mongodb) | direct |
| category | 0..1 DataSourceCategory | Organizational category (project, shared, personal, system) | direct |
| id | 1 Uriorcurie | Unique identifier for this catalog entity | CatalogEntity |
| title | 1 String | Human-readable name for this entity | CatalogEntity |
| description | 1 String | Free-text description of this entity | CatalogEntity |
| created_date | 1 Date | Date this entity was first created or registered | CatalogEntity |
| last_modified | 0..1 Date | Date this entity was last updated | CatalogEntity |
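The cardinalities in the Slots table imply that eleven slots (seven direct, four inherited from CatalogEntity) must be present on every DataSource record. The following is a minimal Python sketch of that check, assuming records are plain dictionaries keyed by slot name. The helper name `missing_required_slots` and all sample values are hypothetical; they are not part of the schema or of any LinkML tooling, and the sketch does not check ranges or enum values.

```python
from __future__ import annotations

# Slots listed with cardinality 1 in the Slots table.
REQUIRED_SLOTS = (
    "id", "title", "description", "created_date",      # inherited from CatalogEntity
    "owner", "contact_point", "namespace", "status",   # direct
    "is_deprecated", "update_schedule", "access_level",
)


def missing_required_slots(record: dict) -> list[str]:
    """Return the required slot names that are absent or None in record.

    Only None counts as missing, so is_deprecated=False is accepted:
    the slot must be set explicitly, but it may be set to False.
    """
    return [slot for slot in REQUIRED_SLOTS if record.get(slot) is None]


# Hypothetical record. The enum-valued slots and contact_point use
# placeholders because their permissible values are defined on other pages.
example = {
    "id": "example:kbase_public",
    "title": "Example data source",
    "description": "Illustrative entry only.",
    "created_date": "2024-01-01",
    "owner": "Example team",
    "contact_point": {"placeholder": "fields defined by ContactPoint"},
    "namespace": "kbase_public",
    "status": "STATUS_PLACEHOLDER",
    "is_deprecated": False,
    "update_schedule": "UPDATE_FREQUENCY_PLACEHOLDER",
    "access_level": "ACCESS_LEVEL_PLACEHOLDER",
}

print(missing_required_slots(example))
```

Testing for `None` rather than truthiness is deliberate: a falsy test would wrongly report `is_deprecated: false` as missing.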
Usages
| used by | used in | type | used |
|---|---|---|---|
| Lakehouse | catalog_entries | range | DataSource |
| DataSource | replaced_by | range | DataSource |
| DataSource | previous_version | range | DataSource |
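Because `replaced_by` has range DataSource, deprecated entries can form a chain that leads to a current entry. The sketch below follows such a chain in Python. It assumes that each reference is stored as the `id` of the target entry and that the catalog is available as a dictionary keyed by `id`; the function name `resolve_replacement` and the sample identifiers are hypothetical.

```python
def resolve_replacement(source_id: str, catalog: dict) -> str:
    """Follow replaced_by links from source_id to the last entry in the chain.

    catalog maps each DataSource id to its record (a dict). A ValueError is
    raised if the links loop back on themselves; a dangling reference
    surfaces as a KeyError from the catalog lookup.
    """
    seen = set()
    current = source_id
    while True:
        if current in seen:
            raise ValueError(f"replaced_by cycle detected at {current!r}")
        seen.add(current)
        successor = catalog[current].get("replaced_by")
        if successor is None:
            return current
        current = successor


# Hypothetical three-entry chain: v1 -> v2 -> v3.
catalog = {
    "example:v1": {"replaced_by": "example:v2"},
    "example:v2": {"replaced_by": "example:v3"},
    "example:v3": {},
}
print(resolve_replacement("example:v1", catalog))
```

The same traversal works for `previous_version` by swapping the slot name, in which case it walks backwards to the earliest recorded version.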
Identifier and Mapping Information
Schema Source
- from schema: https://w3id.org/ber-data/ber-data-registry
Mappings
| Mapping Type | Mapped Value |
|---|---|
| self | dcat:Dataset |
| native | ber_registry:DataSource |
LinkML Source
Direct
name: DataSource
description: A cataloged data source within a lakehouse. Represents a namespace, database,
  object storage source, or other data collection. All required fields must be present
  on every submission.
from_schema: https://w3id.org/ber-data/ber-data-registry
is_a: CatalogEntity
slots:
- owner
- contact_point
- namespace
- status
- is_deprecated
- update_schedule
- access_level
- keywords
- project_affiliation
- license
- domain
- version
- doi
- facility
- format
- deprecation_date
- deprecation_reason
- replaced_by
- previous_version
- temporal_coverage_start
- temporal_coverage_end
- spatial_coverage
- data_quality_notes
- lineage
- documentation_url
- instrument
- modality
- size_bytes
- row_count
- table_count
- source_type
- database_engine
- category
slot_usage:
  description:
    name: description
    required: true
  created_date:
    name: created_date
    required: true
  owner:
    name: owner
    required: true
  contact_point:
    name: contact_point
    required: true
  namespace:
    name: namespace
    required: true
  status:
    name: status
    required: true
  is_deprecated:
    name: is_deprecated
    required: true
  update_schedule:
    name: update_schedule
    required: true
  access_level:
    name: access_level
    required: true
class_uri: dcat:Dataset
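The `slot_usage` entries in the Direct definition refine slots for this class only, here by marking them `required: true`. One way to picture the effect is as a per-slot dictionary merge, sketched below in Python. This is a simplification and not how LinkML itself is implemented: real induction also handles inheritance chains, mixins, and defaults. The base definitions in the example are assumptions, since this page does not show how CatalogEntity declares its slots, and `apply_slot_usage` is a hypothetical helper.

```python
def apply_slot_usage(base_slots: dict, slot_usage: dict) -> dict:
    """Return copies of base_slots with slot_usage overrides merged in.

    Both arguments map slot names to definition dicts. The inputs are
    not modified; override keys win over base keys.
    """
    induced = {name: dict(defn) for name, defn in base_slots.items()}
    for name, overrides in slot_usage.items():
        induced.setdefault(name, {}).update(overrides)
    return induced


# Assumed base definitions (not shown on this page).
base_slots = {
    "description": {"range": "string"},
    "last_modified": {"range": "date"},
}
# Two of the overrides listed under slot_usage above.
slot_usage = {
    "description": {"required": True},
    "created_date": {"required": True},
}

induced = apply_slot_usage(base_slots, slot_usage)
print(induced["description"])
```

Slots without a `slot_usage` entry, such as `last_modified`, pass through unchanged, which matches their `0..1` cardinality in the Slots table.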
Induced
name: DataSource
description: A cataloged data source within a lakehouse. Represents a namespace, database,
  object storage source, or other data collection. All required fields must be present
  on every submission.
from_schema: https://w3id.org/ber-data/ber-data-registry
is_a: CatalogEntity
slot_usage:
  description:
    name: description
    required: true
  created_date:
    name: created_date
    required: true
  owner:
    name: owner
    required: true
  contact_point:
    name: contact_point
    required: true
  namespace:
    name: namespace
    required: true
  status:
    name: status
    required: true
  is_deprecated:
    name: is_deprecated
    required: true
  update_schedule:
    name: update_schedule
    required: true
  access_level:
    name: access_level
    required: true
attributes:
  owner:
    name: owner
    description: Person or team responsible for this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcterms:publisher
    alias: owner
    owner: DataSource
    domain_of:
    - DataSource
    range: string
    required: true
  contact_point:
    name: contact_point
    description: Structured contact information for this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcat:contactPoint
    alias: contact_point
    owner: DataSource
    domain_of:
    - DataSource
    range: ContactPoint
    required: true
    inlined: true
  namespace:
    name: namespace
    description: Database name, source name, or space name within the lakehouse (e.g.
      "kbase_public", "jgi_object_store").
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: namespace
    owner: DataSource
    domain_of:
    - DataSource
    range: string
    required: true
  status:
    name: status
    description: Current lifecycle status of this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcat:status
    alias: status
    owner: DataSource
    domain_of:
    - DataSource
    range: DataSourceStatus
    required: true
  is_deprecated:
    name: is_deprecated
    description: Whether this data source is deprecated. Must always be explicitly
      set.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: is_deprecated
    owner: DataSource
    domain_of:
    - DataSource
    range: boolean
    required: true
  update_schedule:
    name: update_schedule
    description: How frequently this data source is updated.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcterms:accrualPeriodicity
    alias: update_schedule
    owner: DataSource
    domain_of:
    - DataSource
    range: UpdateFrequency
    required: true
  access_level:
    name: access_level
    description: Visibility/access level of this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: access_level
    owner: DataSource
    domain_of:
    - DataSource
    range: AccessLevel
    required: true
  keywords:
    name: keywords
    description: Discovery tags for this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcat:keyword
    alias: keywords
    owner: DataSource
    domain_of:
    - DataSource
    range: string
    multivalued: true
  project_affiliation:
    name: project_affiliation
    description: BER program affiliations (e.g. KBase, NMDC, JGI, Phage Foundry).
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: project_affiliation
    owner: DataSource
    domain_of:
    - DataSource
    range: string
    multivalued: true
  license:
    name: license
    description: License governing use of this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcterms:license
    alias: license
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  domain:
    name: domain
    description: Scientific domain(s) covered by this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcat:theme
    alias: domain
    owner: DataSource
    domain_of:
    - DataSource
    range: string
    multivalued: true
  version:
    name: version
    description: Version identifier for this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: pav:version
    alias: version
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  doi:
    name: doi
    description: DOI if this data source has been published.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: doi
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  facility:
    name: facility
    description: Originating facility (e.g. NERSC, JGI, EMSL) per HPDF report recommendations.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: facility
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  format:
    name: format
    description: Data format(s) available (e.g. Parquet, CSV, HDF5, Zarr, NetCDF,
      FITS).
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcterms:format
    alias: format
    owner: DataSource
    domain_of:
    - DataSource
    range: string
    multivalued: true
  deprecation_date:
    name: deprecation_date
    description: Date this data source was deprecated.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: deprecation_date
    owner: DataSource
    domain_of:
    - DataSource
    range: date
  deprecation_reason:
    name: deprecation_reason
    description: Explanation for why this data source was deprecated.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: deprecation_reason
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  replaced_by:
    name: replaced_by
    description: Reference to the data source that replaces this deprecated one.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: replaced_by
    owner: DataSource
    domain_of:
    - DataSource
    range: DataSource
  previous_version:
    name: previous_version
    description: Reference to the previous version of this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: previous_version
    owner: DataSource
    domain_of:
    - DataSource
    range: DataSource
  temporal_coverage_start:
    name: temporal_coverage_start
    description: Start date of the temporal coverage of this data.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: temporal_coverage_start
    owner: DataSource
    domain_of:
    - DataSource
    range: date
  temporal_coverage_end:
    name: temporal_coverage_end
    description: End date of the temporal coverage of this data.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: temporal_coverage_end
    owner: DataSource
    domain_of:
    - DataSource
    range: date
  spatial_coverage:
    name: spatial_coverage
    description: Geographic or spatial coverage description.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: spatial_coverage
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  data_quality_notes:
    name: data_quality_notes
    description: Notes on data quality, known issues, or limitations.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: data_quality_notes
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  lineage:
    name: lineage
    description: Provenance or lineage information for this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: prov:wasDerivedFrom
    alias: lineage
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  documentation_url:
    name: documentation_url
    description: URL to external documentation for this data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: documentation_url
    owner: DataSource
    domain_of:
    - DataSource
    range: uri
  instrument:
    name: instrument
    description: Instrument or sensor that generated the data.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: instrument
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  modality:
    name: modality
    description: Data modality (e.g. genomic, proteomic, imaging).
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: modality
    owner: DataSource
    domain_of:
    - DataSource
    range: string
  size_bytes:
    name: size_bytes
    description: Total size of the data source in bytes.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: size_bytes
    owner: DataSource
    domain_of:
    - DataSource
    range: integer
  row_count:
    name: row_count
    description: Number of rows or records in the data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: row_count
    owner: DataSource
    domain_of:
    - DataSource
    range: integer
  table_count:
    name: table_count
    description: Number of tables or collections in the data source.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: table_count
    owner: DataSource
    domain_of:
    - DataSource
    range: integer
  source_type:
    name: source_type
    description: Type of data source within the lakehouse (e.g. namespace, object_storage,
      relational_database).
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: source_type
    owner: DataSource
    domain_of:
    - DataSource
    range: SourceType
  database_engine:
    name: database_engine
    description: Database engine for Dremio database sources (e.g. postgresql, mysql,
      mongodb).
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: database_engine
    owner: DataSource
    domain_of:
    - DataSource
    range: DatabaseEngine
  category:
    name: category
    description: Organizational category (project, shared, personal, system).
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    alias: category
    owner: DataSource
    domain_of:
    - DataSource
    range: DataSourceCategory
  id:
    name: id
    description: Unique identifier for this catalog entity.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: schema:identifier
    identifier: true
    alias: id
    owner: DataSource
    domain_of:
    - Catalog
    - CatalogEntity
    range: uriorcurie
    required: true
  title:
    name: title
    description: Human-readable name for this entity.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcterms:title
    alias: title
    owner: DataSource
    domain_of:
    - Catalog
    - CatalogEntity
    range: string
    required: true
  description:
    name: description
    description: Free-text description of this entity.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcterms:description
    alias: description
    owner: DataSource
    domain_of:
    - Catalog
    - CatalogEntity
    range: string
    required: true
  created_date:
    name: created_date
    description: Date this entity was first created or registered.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcterms:created
    alias: created_date
    owner: DataSource
    domain_of:
    - CatalogEntity
    range: date
    required: true
  last_modified:
    name: last_modified
    description: Date this entity was last updated.
    from_schema: https://w3id.org/ber-data/ber-data-registry
    rank: 1000
    slot_uri: dcterms:modified
    alias: last_modified
    owner: DataSource
    domain_of:
    - CatalogEntity
    range: date
class_uri: dcat:Dataset
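Fifteen of the induced attributes carry a `slot_uri` that maps the slot to an external vocabulary term. The Python sketch below re-keys a record by those CURIEs, for instance as a first step towards a DCAT-style export. The mapping table is copied from the `slot_uri` values in the Induced definition. The fallback prefix `ber_registry` for slots without a `slot_uri` is an assumption based on the `native` row of the Mappings table, and `to_curie_keys` is a hypothetical helper, not a LinkML function.

```python
# slot name -> slot_uri, as listed in the induced attributes.
SLOT_URIS = {
    "id": "schema:identifier",
    "title": "dcterms:title",
    "description": "dcterms:description",
    "created_date": "dcterms:created",
    "last_modified": "dcterms:modified",
    "owner": "dcterms:publisher",
    "contact_point": "dcat:contactPoint",
    "status": "dcat:status",
    "update_schedule": "dcterms:accrualPeriodicity",
    "keywords": "dcat:keyword",
    "license": "dcterms:license",
    "domain": "dcat:theme",
    "version": "pav:version",
    "format": "dcterms:format",
    "lineage": "prov:wasDerivedFrom",
}


def to_curie_keys(record: dict, default_prefix: str = "ber_registry") -> dict:
    """Return a copy of record whose keys are CURIEs instead of slot names.

    Slots without a slot_uri fall back to default_prefix:slot_name
    (the prefix is an assumption, see the note above).
    """
    return {
        SLOT_URIS.get(slot, f"{default_prefix}:{slot}"): value
        for slot, value in record.items()
    }


# Hypothetical partial record.
print(to_curie_keys({"title": "Example data source", "namespace": "kbase_public"}))
```

Values are passed through unchanged; converting nested objects such as `contact_point`, or resolving the prefixes to full IRIs, would need the prefix declarations of the schema, which this page does not list.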