Medallion Layer Features

Explore how the Bronze, Silver, and Gold layers evolve across the Lakehouse pipeline to support clean analytics and machine learning.

Gold Layer: Aggregated User Features

Derived from Silver, this layer aggregates user interactions into ML-ready features for personalization and recommendation systems.

🔄 What's New in This Layer

Gold Glue job success

Gold job completed in under 2 minutes on AWS Glue 5.0 using 2 DPUs.

S3 gold layer output

Data written to s3://ai-lakehouse-project/gold/user_features/

Gold Table Schema

user_id               STRING
last_event_timestamp  TIMESTAMP
last_event_type       STRING
click_count           INT
purchase_count        INT
last_feature_hash     STRING
days_since_last_event INT
training_date         DATE
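
The aggregation from Silver events into this schema can be sketched in plain Python. This is only an illustration of the logic, not the actual Glue job: the function name `build_user_features`, the event type values "click" and "purchase", and the choice to compute days_since_last_event relative to training_date are all assumptions.

```python
from collections import defaultdict
from datetime import date, datetime


def build_user_features(silver_rows, training_date):
    """Aggregate Silver events into one Gold feature row per user (hypothetical helper)."""
    by_user = defaultdict(list)
    for row in silver_rows:
        by_user[row["user_id"]].append(row)

    features = []
    for user_id, events in sorted(by_user.items()):
        # The most recent event drives the "last_*" columns.
        last = max(events, key=lambda e: e["event_timestamp"])
        features.append({
            "user_id": user_id,
            "last_event_timestamp": last["event_timestamp"],
            "last_event_type": last["event_type"],
            # Assumed event_type values: "click" and "purchase".
            "click_count": sum(e["event_type"] == "click" for e in events),
            "purchase_count": sum(e["event_type"] == "purchase" for e in events),
            "last_feature_hash": last["feature_hash"],
            # Assumption: recency is measured against the training date.
            "days_since_last_event": (training_date - last["event_timestamp"].date()).days,
            "training_date": training_date,
        })
    return features


# Made-up sample rows shaped like the Silver schema.
silver_rows = [
    {"user_id": "u1", "event_type": "click",
     "event_timestamp": datetime(2025, 7, 8, 9, 0), "feature_hash": "a1"},
    {"user_id": "u1", "event_type": "purchase",
     "event_timestamp": datetime(2025, 7, 9, 12, 30), "feature_hash": "b2"},
    {"user_id": "u2", "event_type": "click",
     "event_timestamp": datetime(2025, 7, 10, 8, 15), "feature_hash": "c3"},
]
gold_rows = build_user_features(silver_rows, training_date=date(2025, 7, 10))
```

In a Spark-based Glue job the same shape would typically come from a group-by on user_id; the plain-Python version is used here only to keep the example self-contained.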

Sample Athena Query

SELECT user_id, click_count, purchase_count
FROM ai_lakehouse_db.gold_user_features
WHERE training_date = CURRENT_DATE
ORDER BY click_count DESC
LIMIT 10;

The Gold table was validated in Athena, which supports fast queries over partitioned data.

Athena querying gold_user_features

Athena query on gold_user_features with successful result preview and performance metrics.

⚙️ Automated ETL with Glue Workflow

The pipeline uses an AWS Glue workflow to orchestrate the Bronze → Silver → Gold transformations. The screenshot shows a full run that completed successfully, with the job dependencies displayed as a DAG.

AWS Glue Workflow for Medallion Architecture

✔️ Workflow execution status: Completed (July 10, 2025)
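
The dependency ordering that the workflow enforces can be illustrated with a small standard-library sketch. It does not call the Glue API; the job names `bronze_job`, `silver_job`, and `gold_job` are hypothetical placeholders, and the example only shows how a DAG of dependencies resolves to a run order.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must succeed before it starts.
# Job names are hypothetical; the real workflow's names are not given here.
dependencies = {
    "bronze_job": set(),
    "silver_job": {"bronze_job"},
    "gold_job": {"silver_job"},
}


def run_order(deps):
    """Return the jobs in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(" -> ".join(order))
```

Because the chain is strictly linear, only one valid order exists. A workflow with independent branches would admit several.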

Silver Layer: Cleaned and Partitioned Events

Silver builds on Bronze by removing duplicates, filtering out null values, and partitioning the data for efficient query access.

🔄 What's New in This Layer

Glue Silver job success

The Silver job partitioned the data and applied quality filters.

S3 silver layer

Partitioned data at s3://ai-lakehouse-project/silver/user_events/

Silver Table Schema

user_id         STRING
event_type      STRING
event_timestamp TIMESTAMP
event_date      DATE
feature_hash    STRING
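
A minimal sketch of the Silver cleaning steps, under stated assumptions: a duplicate is taken to mean the same user_id, event_type, and event_timestamp; the null filter applies to those three columns; and partitions are keyed Hive-style on event_date. None of these choices is confirmed by the job itself, and the helper names `to_silver` and `partition_by_date` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Assumed set of columns that must be non-null for a row to pass.
REQUIRED = ("user_id", "event_type", "event_timestamp")


def to_silver(bronze_rows):
    """Drop null and duplicate rows, then derive event_date (hypothetical helper)."""
    seen = set()
    silver = []
    for row in bronze_rows:
        # Quality filter: skip rows with a null in any required column.
        if any(row.get(col) is None for col in REQUIRED):
            continue
        # Assumed duplicate key: same user, event type, and timestamp.
        key = (row["user_id"], row["event_type"], row["event_timestamp"])
        if key in seen:
            continue
        seen.add(key)
        silver.append({
            "user_id": row["user_id"],
            "event_type": row["event_type"],
            "event_timestamp": row["event_timestamp"],
            # event_date is derived from the event timestamp.
            "event_date": row["event_timestamp"].date(),
            "feature_hash": row.get("feature_hash"),
        })
    return silver


def partition_by_date(silver_rows):
    """Group rows under Hive-style partition keys (assumed layout)."""
    partitions = defaultdict(list)
    for row in silver_rows:
        partitions[f"event_date={row['event_date'].isoformat()}"].append(row)
    return dict(partitions)


# Made-up sample rows: one duplicate and one row with a null user_id.
bronze_rows = [
    {"user_id": "u1", "event_type": "click",
     "event_timestamp": datetime(2025, 7, 9, 9, 0), "feature_hash": "a1"},
    {"user_id": "u1", "event_type": "click",
     "event_timestamp": datetime(2025, 7, 9, 9, 0), "feature_hash": "a1"},
    {"user_id": None, "event_type": "click",
     "event_timestamp": datetime(2025, 7, 9, 10, 0), "feature_hash": "x9"},
    {"user_id": "u2", "event_type": "purchase",
     "event_timestamp": datetime(2025, 7, 10, 8, 15), "feature_hash": "c3"},
]
silver_rows = to_silver(bronze_rows)
partitions = partition_by_date(silver_rows)
```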

Bronze Layer: Normalized Raw Ingestion

The Bronze layer is the raw landing zone. It transforms JSON into schema-aware Parquet, enriched with ingestion metadata (ingestion_ts) and AI-friendly fields (model_input_flag, feature_hash).

🔄 What's New in This Layer

Glue Bronze Job Success

Bronze ETL job executed on AWS Glue 5.0 with 2 DPUs. Parsed and enriched raw JSON from S3.

S3 Bronze Output Screenshot

Raw Parquet output written to s3://ai-lakehouse-project/bronze/user_events_parquet/

Bronze Table Schema

user_id           STRING
session_id        STRING
event_type        STRING
event_timestamp   TIMESTAMP
raw_payload       STRING
ingestion_ts      TIMESTAMP
model_input_flag  BOOLEAN
feature_hash      STRING
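
The following sketch shows how one raw JSON line might be mapped onto this schema. The source does not say how feature_hash or model_input_flag are computed, so both rules here are assumptions: a SHA-256 digest of user_id and event_type, and a flag that is true when the core fields are present. The raw JSON field names and the helper name `to_bronze` are also hypothetical.

```python
import hashlib
import json
from datetime import datetime


def to_bronze(raw_line, ingestion_ts):
    """Parse one raw JSON event into a Bronze-shaped row (hypothetical helper)."""
    event = json.loads(raw_line)
    user_id = event.get("user_id")
    event_type = event.get("event_type")
    ts = event.get("event_timestamp")

    # Assumption: feature_hash is a SHA-256 digest of user_id and event_type.
    digest = hashlib.sha256(f"{user_id}|{event_type}".encode("utf-8")).hexdigest()

    return {
        "user_id": user_id,
        "session_id": event.get("session_id"),
        "event_type": event_type,
        "event_timestamp": datetime.fromisoformat(ts) if ts else None,
        # Keep the untouched input so later layers can be rebuilt from it.
        "raw_payload": raw_line,
        "ingestion_ts": ingestion_ts,
        # Assumption: a row is usable as model input when core fields exist.
        "model_input_flag": bool(user_id and event_type and ts),
        "feature_hash": digest,
    }


# Made-up raw event; field names are assumed to match the Bronze columns.
raw_line = json.dumps({
    "user_id": "u1",
    "session_id": "s-100",
    "event_type": "click",
    "event_timestamp": "2025-07-10T08:15:00",
})
bronze_row = to_bronze(raw_line, ingestion_ts=datetime(2025, 7, 10, 8, 20))
```

Storing the original line in raw_payload keeps the Bronze layer replayable even if the parsing rules change later.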

ETL Layer Upgrade Summary

This project was upgraded to support:

Each job is optimized for fast execution (under 2 minutes) and minimal reprocessing.