Skip to content

Terminology

A glossary of acronyms, projects, and technologies referenced across the BRIDGE / BER Data Lakehouse ecosystem.

Programs & Initiatives

BRIDGE

Biological and enviRonmental Infrastructure for Data manaGement and Exploration. This is the high-level BER Lakehouse effort. Formerly referred to as BERDL (BER Data Lakehouse).

  • BRIDGE DM — Data Modeling Team
  • BRIDGE PE — Platform Engineering Team
  • BRIDGE UR — User Research & Engagement Team

Note: BERDL has now been split into K-BERDL (KBase BER Data Lakehouse) and J-BERDL (JGI Data Lakehouse). BRIDGE is the higher-level effort to link them.

HPDF

High Performance Data Facility.

AmSC

American Science Cloud. DOE's next-generation, federated research ecosystem under the Genesis Mission Platform.

Genesis Mission

The top-level AI directive from DOE.

BER AI Projects

AIMS-LEAF

AI-driven Multimodal imaging for Linking gEnotype And Phenotype. Full title: AI-Driven Multimodal Imaging for Predicting Stress Response in Plants: Linking Genotype to Phenotype across Scales (PI: Samuel Webb).

BERtron

The prototype cross-BER data catalogue.

APPL ("apple")

Advanced Plant Phenotyping Lab. ORNL

ADP

Agentic Deep Phenotyping.

BSSD AI Pilot Projects

Projects funded by BER for FY25. Descriptions of projects relevant to the DM Team are below. List was obtained from the kickoff meeting agenda.

  • OPAL
  • A Generative Pretrained Transformer for Genomic Photosynthesis to Support Abundant Bioenergy Production
  • BERIL
  • BRIDGE
  • SLAC — AI-Driven Multimodal Imaging for Predicting Stress Response in Plants: Linking Genotype to Phenotype across Scales
  • LAMBDA

OPAL

Orchestrated Platform for Autonomous Laboratories. Berkeley Lab's portion of OPAL comprises three projects that are intended to work together:

  1. Microbial Genome-to-Function Mapping (Nathan Hillson, Chris Petzold)
  2. Data Architectures and Agentic AI (aka BERIL; Paramvir Dahal, Chris Mungall)
  3. Standardized federated data infrastructure, integration models (aka BRIDGE Data Modeling Team; Kjiersten Fagnan, Valerie Skye, Montana Smith, Sierra Moxon, AJ Ireland, Danielle Christianson, Ryan Ly)

FAMOUS

Foundational AI Models for Optimizing and Understanding Biological Systems. The Model Team will use AI to enable synthetic biology approaches. Leverage advanced AI to decipher protein and pathway function, automate data generation and collection. PIs: Douglas Mans, Paul Adams.

OPAL-FAMOUS

As of January 14, 2026, BER is referring to OPAL and FAMOUS collectively as "OPAL-FAMOUS."

A Generative Pretrained Transformer for Genomic Photosynthesis

Oak Ridge is developing a plant phenotype lakehouse framework.

LAMBDA

Lakehouse-enabled AI-ready Multi-modal Bioimaging Data Architecture. PI: Paul Adams. J. Harry Caufield and Chris Mungall are doing the data modeling for this project. Framework for all BER-supported structural biology and imaging resources. Establish common data structures, metadata standards, and APIs to enable integration of multimodal datasets into the Data Lakehouse. 6-Lab project. Web site: https://lambda-doe.org/

AMP2

Anaerobic Microbial Phenotyping Platform. Hardware platform currently being installed at EMSL.

Platforms & Frameworks

Ganymede

A commercial data lake product based on Google BigQuery. JGI sample data, growth condition information, and EMSL proteomics data. This data is part of the OPAL lab automation project.

CODA

The UI framework and host for Ganymede system.

BERIL

Integrated AI Infrastructure for Genome-to-Phenotype Research Using the BER Data Lakehouse.

CDM

Central Data Model. The KBase data model for the KBase Lakehouse.

KE

Knowledge Engine.

KBase Tenant

A space set up within the KBase Lakehouse infrastructure for a particular partner's data.

Storage & Data Technologies

S3

(Amazon Simple Storage Service.)

MinIO

(S3-compatible object storage.)

DuckDB

(In-process analytical database.)

Parquet

A columnar storage file format.

Iceberg

Open table format for analytic datasets.

Dremio

Data lakehouse query engine.

DBNS

GCS

Google Cloud Storage.

GOLD

AI & Agentic Tooling

Agentic AI

MCP

Model Context Protocol. A somewhat standardized API (e.g., FastMCP in Python, or a Node.js implementation) that is popular among Agentic AI frameworks. Often this is connected to some back-end API or database and is just a pass-through adapter for Agentic frameworks like Goose or Claude Code. This, when paired with any LLM, gives that LLM the ability to access resources provided by that MCP or even to perform actions (e.g., initiate an analytic workflow with specific parameters and data).

LLM

Large Language Model.

Goose

Claude

Claude Code

Data Concepts

ETL

Extract, Transform, Load.

Label / Tag / Keyword / Annotation

All often used as synonyms.

Data Governance

Data Dictionary

Synonym: Controlled Vocabulary (CV).