data-doc & data-sim Project

Agentic data cleansing and normalization system


Project maintained by realSergiy. Hosted on GitHub Pages.

Project Plan: Agentic “data-doc” Internal Demo


This document outlines the project plan for developing an internal technology demonstration of the “data-doc,” an agentic data cleansing and normalization system.

Project Components

The project consists of two independent applications: data-sim and data-doc.

High-Level System Architecture

The end-to-end architecture is composed of two independent applications that interact via a shared data layer hosted on Supabase (PostgreSQL) and Google Drive.

System Components

data-sim

A standalone application that sets up the demo environment by populating the Supabase DB and Google Drive with a deliberately faulty dataset.

See data-sim Documentation for detailed specifications.
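The exact faults data-sim injects are defined in its own documentation. As a hedged illustration of what "a faulty dataset" could mean in practice, the sketch below generates clean records and then injects a few known defects, returning a log of what it changed. The record schema, the functions `make_clean_records` and `inject_faults`, and the three defect types (duplicate row, non-ISO date, missing email) are hypothetical. Writing the records to Supabase and Google Drive is left out.

```python
import random
from copy import deepcopy


def make_clean_records(n: int) -> list[dict]:
    """Generate n well-formed records (hypothetical customer schema)."""
    return [
        {
            "id": i,
            "name": f"Customer {i}",
            "email": f"customer{i}@example.com",
            "signup_date": f"2024-01-{(i % 28) + 1:02d}",  # ISO 8601 date
        }
        for i in range(1, n + 1)
    ]


def inject_faults(records: list[dict], seed: int = 0) -> tuple[list[dict], list[str]]:
    """Return a faulty copy of `records` plus a ground-truth log of injected defects."""
    rng = random.Random(seed)  # seeded so the demo scenario is reproducible
    faulty = deepcopy(records)  # never mutate the caller's clean data
    originals = faulty[: len(records)]  # same dict objects as in `faulty`
    log = []

    # Defect 1: exact duplicate of an existing record (same id).
    dup = deepcopy(rng.choice(originals))
    faulty.append(dup)
    log.append(f"duplicate id={dup['id']}")

    # Defect 2: one date rewritten from YYYY-MM-DD to DD/MM/YYYY.
    target = rng.choice(originals)
    year, month, day = target["signup_date"].split("-")
    target["signup_date"] = f"{day}/{month}/{year}"
    log.append(f"non-ISO date id={target['id']}")

    # Defect 3: one email blanked out.
    target = rng.choice(originals)
    target["email"] = None
    log.append(f"missing email id={target['id']}")

    return faulty, log


faulty, log = inject_faults(make_clean_records(10), seed=42)
print(len(faulty), log)
```

Keeping a ground-truth log alongside the faulty data is one way to check afterwards whether data-doc's Health Report found every injected defect.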

data-doc

The core agentic application, with three main components: a web frontend, a backend, and a LangGraph agent workflow.

See data-doc Documentation for detailed specifications.

Architectural Flow

  1. The data-sim is executed to set up the demo scenario, populating the Supabase DB and Google Drive with a faulty dataset
  2. The data-doc frontend connects to its backend
  3. The user initiates a “scan” from the web UI
  4. The data-doc backend triggers the LangGraph agent workflow
  5. The agents connect to the Supabase DB and Google Drive (read-only) to perform discovery and diagnosis
  6. The results (Health Report, Cleansing Plan) are sent back to the frontend for user review
  7. The user approves the plan
  8. The data-doc backend triggers the execution agent, which connects to the Supabase DB (read-write) to apply the approved fixes
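The key property of this flow is the split between read-only diagnosis (steps 4 to 6) and read-write execution gated on user approval (steps 7 and 8). A minimal sketch of that split follows, assuming a local SQLite file stands in for the Supabase PostgreSQL database and plain Python functions stand in for the LangGraph agents. The `customers` table, the email checks, and the functions `seed`, `scan`, `execute`, and `run_demo` are all hypothetical.

```python
import os
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def seed(path: str) -> None:
    """Stand-in for step 1 (data-sim): write a tiny table with known defects."""
    with closing(sqlite3.connect(path)) as conn:
        conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
        conn.executemany(
            "INSERT INTO customers (id, email) VALUES (?, ?)",
            [(1, "a@example.com"), (2, None), (3, " B@Example.com")],
        )
        conn.commit()


def scan(path: str) -> tuple[dict, list[str]]:
    """Steps 4-6: diagnose over a read-only connection; return (health report, cleansing plan)."""
    read_only_uri = Path(path).as_uri() + "?mode=ro"  # any write attempt would fail
    with closing(sqlite3.connect(read_only_uri, uri=True)) as conn:
        missing = conn.execute(
            "SELECT COUNT(*) FROM customers WHERE email IS NULL"
        ).fetchone()[0]
        unnormalized = conn.execute(
            "SELECT COUNT(*) FROM customers WHERE email <> lower(trim(email))"
        ).fetchone()[0]
    report = {"missing_email": missing, "unnormalized_email": unnormalized}
    plan = []
    if unnormalized:
        plan.append(
            "UPDATE customers SET email = lower(trim(email)) "
            "WHERE email <> lower(trim(email))"
        )
    # Missing emails are only reported: there is no safe value to fill in automatically.
    return report, plan


def execute(path: str, plan: list[str], approved: bool) -> None:
    """Step 8: apply the plan over a read-write connection, only after approval (step 7)."""
    if not approved:
        return
    with closing(sqlite3.connect(path)) as conn:
        with conn:  # single transaction: commit all fixes, or roll back all on error
            for statement in plan:
                conn.execute(statement)


def run_demo(approved: bool) -> tuple[dict, list[tuple]]:
    """Run the whole flow against a throwaway database and return the final rows."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        seed(path)
        report, plan = scan(path)
        execute(path, plan, approved)
        with closing(sqlite3.connect(path)) as conn:
            rows = conn.execute("SELECT id, email FROM customers ORDER BY id").fetchall()
    return report, rows


print(run_demo(approved=True))
```

Opening the diagnosis connection in read-only mode enforces the step 5 restriction at the database level rather than trusting agent behavior. In PostgreSQL the analogue would presumably be a separate database role granted only `SELECT`, while the execution agent uses a role that can also write.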

Mismatch Analysis

This demo plan specifies a Supabase (PostgreSQL) stack rather than the Oracle ecosystem. The reasons this technical difference does not invalidate the demo's purpose as a proof of concept are as follows: