data-doc & data-sim Project

Agentic data cleansing and normalization system


Project maintained by realSergiy. Hosted on GitHub Pages.

data-doc Application


Overview

data-doc is an agentic application that diagnoses and repairs data quality issues in manufacturing databases. It uses LangGraph-orchestrated agents, powered by the Claude API, to scan, analyze, and fix data problems with minimal human intervention.

Technology Stack

Agentic Framework Design (LangGraph)

The system is a multi-agent graph in which each agent has a specialized role. LangGraph manages the state transitions between agents and the control logic that decides which agent runs next.
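The hand-off between the three agents listed below can be sketched in plain Python. This is a minimal illustration, not the project's code: it deliberately avoids the LangGraph API, and the `PipelineState` fields and the placeholder agent bodies are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineState:
    """Shared state passed from agent to agent (field names are hypothetical)."""
    findings: list[str] = field(default_factory=list)
    plan: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)


def scanner_agent(state: PipelineState) -> PipelineState:
    # Placeholder: a real scanner would query the database and file store.
    state.findings.append("duplicate part in parts")
    return state


def diagnosis_agent(state: PipelineState) -> PipelineState:
    # Placeholder: turn each finding into one proposed action.
    state.plan = [f"fix: {finding}" for finding in state.findings]
    return state


def execution_agent(state: PipelineState) -> PipelineState:
    # Placeholder: record each step instead of touching any data.
    state.log = [f"executed {step}" for step in state.plan]
    return state


# Linear graph: each node runs once, in order.
PIPELINE: list[Callable[[PipelineState], PipelineState]] = [
    scanner_agent,
    diagnosis_agent,
    execution_agent,
]


def run_pipeline() -> PipelineState:
    state = PipelineState()
    for node in PIPELINE:
        state = node(state)
    return state
```

In the application, execution runs only after the user approves the plan, so a real graph would pause for that approval before the last node rather than running straight through.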

Agent 1: ScannerAgent

Agent 2: DiagnosisAgent

Agent 3: ExecutionAgent

Core Functionality

1. Connection & Discovery

The system securely connects to Supabase and Google Drive using API keys. The ScannerAgent dynamically discovers all tables and files in the target environment.

2. Intelligent Scanning

The ScannerAgent executes a pre-defined sequence of checks:

3. Reporting & Planning

The DiagnosisAgent produces:

4. Interactive Cleansing

The web UI displays the Cleansing Plan as a checklist. Users can:

Demo Walkthrough

Demo Scenario (15 minutes)

(0:00-2:00) The Problem

Start on the Supabase dashboard. Show the messy tables (parts, bill_of_materials). Point out a clear duplicate part and an orphaned BOM record. Briefly show the Google Drive folder, highlighting an ECO PDF that has no corresponding database entry. This establishes the “pain.”
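The two database problems shown in this step, a duplicate part and an orphaned BOM record, can each be found with a single query. The sketch below uses an in-memory SQLite database as a stand-in for the Supabase tables. The column names (`part_number`, `part_id`, `qty`) and the sample rows are assumptions, since the actual schema is not shown here.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create tiny parts / bill_of_materials tables with assumed columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE parts (id INTEGER PRIMARY KEY, part_number TEXT, name TEXT);
        CREATE TABLE bill_of_materials (id INTEGER PRIMARY KEY, part_id INTEGER, qty INTEGER);
        INSERT INTO parts VALUES
            (1, 'P-100', 'Bracket'), (2, 'P-100', 'Bracket'), (3, 'P-200', 'Bolt');
        INSERT INTO bill_of_materials VALUES (1, 1, 4), (2, 3, 8), (3, 99, 2);
    """)
    return conn


def find_duplicate_parts(conn):
    # Part numbers that appear on more than one row, with their row counts.
    rows = conn.execute(
        "SELECT part_number, COUNT(*) FROM parts "
        "GROUP BY part_number HAVING COUNT(*) > 1 ORDER BY part_number"
    )
    return rows.fetchall()


def find_orphaned_bom_rows(conn):
    # BOM rows whose part_id matches no row in parts.
    rows = conn.execute(
        "SELECT b.id FROM bill_of_materials AS b "
        "LEFT JOIN parts AS p ON p.id = b.part_id "
        "WHERE p.id IS NULL ORDER BY b.id"
    )
    return [row[0] for row in rows]
```

With the sample rows, the first query reports part number `P-100` twice and the second reports the BOM row that points at the nonexistent part id 99.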

(2:00-5:00) Diagnosis

Switch to the data-doc web app. Kick off a “New Scan.” The UI shows the ScannerAgent and DiagnosisAgent running in sequence. A “Health Report” is generated.

(5:00-9:00) The Plan

Walk through the Health Report, showing how the agent correctly identified the exact issues we saw manually. Click “Generate Cleansing Plan.” Review the proposed plan, showing the generated SQL for merging duplicate parts and deleting orphaned records. Highlight the step that proposes creating a new entry in the engineering_changes table based on the PDF content.

(9:00-12:00) Execution

The user clicks “Approve & Execute Plan.” A modal appears showing the ExecutionAgent logging its actions in real time: “Backing up parts table… Executing step 1/5: Merging duplicates… Success.”
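The backup-then-execute sequence in that log can be sketched as follows. This is an illustrative sketch under stated assumptions: it uses SQLite in place of Supabase, the `<table>_backup` naming and the `execute_plan` helper are hypothetical, and the log wording only mirrors the example messages.

```python
import sqlite3


def execute_plan(conn, table, steps):
    """Back up `table`, then run each (description, sql) step in one transaction.

    Returns the log lines. If any step fails, the data changes made by
    earlier steps in the same call are rolled back.
    """
    log = [f"Backing up {table} table..."]
    # Table names cannot be bound as SQL parameters, so `table` must be trusted input.
    conn.execute(f"DROP TABLE IF EXISTS {table}_backup")
    conn.execute(f"CREATE TABLE {table}_backup AS SELECT * FROM {table}")
    conn.commit()
    try:
        for number, (description, sql) in enumerate(steps, start=1):
            conn.execute(sql)
            log.append(
                f"Executing step {number}/{len(steps)}: {description}... Success."
            )
        conn.commit()
    except sqlite3.Error as exc:
        conn.rollback()
        log.append(f"Step failed, changes rolled back: {exc}")
    return log
```

Running all steps inside one transaction means a failure partway through leaves the table as it was before execution started, while the backup copy remains available for manual recovery.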

(12:00-15:00) The Result

The UI shows a “Cleansing Complete” summary. Switch back to the Supabase dashboard and refresh the parts table. Show that the duplicates are gone and the data is clean. Show the new record in the engineering_changes table. Conclude by reiterating how the agentic system automated a complex, error-prone manual task in minutes.

Web Interface Design

The frontend, built with React and Vite, is designed for clarity and user control.

Main Dashboard

Scan Results View

Health Report Tab

Displays the generated Markdown report with collapsible sections for each issue category (e.g., “Inconsistent Parts,” “Divergent ECOs”).

Cleansing Plan Tab

An interactive checklist of proposed actions. Each item shows:

Execution View