data-doc & data-sim Project

Agentic data cleansing and normalization system


Project maintained by realSergiy Hosted on GitHub Pages — Theme by mattgraham

data-sim Application


Overview

The data-sim is a standalone application that programmatically generates realistic manufacturing datasets with embedded data quality challenges. It creates controlled, faulty environments where the data-doc can demonstrate its diagnostic and repair capabilities.

Technology Stack

Objective

The data-sim will programmatically generate a realistic manufacturing dataset that embodies specific data quality challenges. Its primary function is to create a controlled, faulty environment where the data-doc can demonstrate its diagnostic and repair capabilities.

Data Schema

The simulator will generate the following assets:

Supabase (PostgreSQL) Tables

| Table Name | Columns | Description | Related Pain(s) |
| --- | --- | --- | --- |
| parts | part_id (TEXT), description (TEXT), revision (TEXT), source_system (TEXT), created_at (TIMESTAMP) | Canonical list of parts, simulating data from two different ERPs | Disparate data, Inconsistent BOM/Part data |
| bill_of_materials | assembly_part_id (TEXT), component_part_id (TEXT), quantity (INT), version (TEXT) | Defines product structures | Inconsistent BOM/Part data |
| suppliers | supplier_id (TEXT), supplier_name (TEXT), part_id (TEXT), lead_time_days (INT) | Supplier and lead time information for parts | Makeshift capacity planning |
| work_in_progress | wo_id (TEXT), part_id (TEXT), status (TEXT), last_scan_location (TEXT), last_scan_time (TIMESTAMP) | Snapshot of current work orders on the shop floor | WIP and quality issues |
| engineering_changes | eco_id (TEXT), product_affected (TEXT), change_description (TEXT), status (TEXT) | A database record of Engineering Change Orders | Divergent ECOs |
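
To make the schema above more concrete, here is a minimal Python sketch of how clean rows for the parts and bill_of_materials tables might be generated before any faults are injected. The column names come from the table; everything else is an assumption for illustration: the source-system labels `ERP_A` and `ERP_B`, the `P-00001` ID format, the revision letters, and the helper names `generate_parts` and `generate_bom` are all hypothetical and not taken from the project.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Part:
    # Mirrors the columns of the `parts` table.
    part_id: str
    description: str
    revision: str
    source_system: str
    created_at: datetime


@dataclass
class BomLine:
    # Mirrors the columns of the `bill_of_materials` table.
    assembly_part_id: str
    component_part_id: str
    quantity: int
    version: str


# Hypothetical labels: the schema only says parts come from "two different ERPs".
SOURCE_SYSTEMS = ("ERP_A", "ERP_B")


def generate_parts(n, seed=0):
    """Return n clean Part rows; a fixed seed keeps the dataset reproducible."""
    rng = random.Random(seed)
    base = datetime(2024, 1, 1)
    return [
        Part(
            part_id=f"P-{i:05d}",  # assumed ID format
            description=f"Component {i}",
            revision=rng.choice(["A", "B", "C"]),
            source_system=rng.choice(SOURCE_SYSTEMS),
            created_at=base + timedelta(days=rng.randrange(365)),
        )
        for i in range(1, n + 1)
    ]


def generate_bom(parts, lines_per_assembly=2, seed=0):
    """Treat the first half of parts as assemblies built from the second half."""
    rng = random.Random(seed)
    half = len(parts) // 2
    assemblies, components = parts[:half], parts[half:]
    bom = []
    for assembly in assemblies:
        k = min(lines_per_assembly, len(components))
        for component in rng.sample(components, k=k):
            bom.append(
                BomLine(assembly.part_id, component.part_id, rng.randint(1, 10), "1.0")
            )
    return bom


parts = generate_parts(10)
bom = generate_bom(parts)
```

Starting from a referentially consistent baseline like this means that every inconsistency found later can be traced to a deliberately injected fault rather than to noise in the generator.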

Google Drive Files

| File Name Pattern | Description | Related Pain(s) |
| --- | --- | --- |
| eco_‹eco_id›.pdf | Unstructured PDF documents detailing an engineering change | Divergent ECOs, Reporting and trust deficits |
| quality_reports/‹wo_id›.csv | CSV exports from a hypothetical quality system with inspection results | WIP and quality issues |
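
The file name patterns above can be sketched as two small helpers, together with a function that renders a quality report as CSV text. This is only an illustration: the inspection columns (`inspection_step`, `measured_value`, `result`), the sample values, and the function names are assumptions, since the actual layout of the quality reports is not specified here. Uploading to Google Drive is deliberately left out.

```python
import csv
import io


def eco_pdf_name(eco_id: str) -> str:
    """File name for an ECO document, following the eco_<eco_id>.pdf pattern."""
    return f"eco_{eco_id}.pdf"


def quality_report_path(wo_id: str) -> str:
    """Path for a work order's quality report, following quality_reports/<wo_id>.csv."""
    return f"quality_reports/{wo_id}.csv"


# Hypothetical column layout; the real quality-system export may differ.
QUALITY_COLUMNS = ["wo_id", "inspection_step", "measured_value", "result"]


def render_quality_csv(wo_id, inspections):
    """Render inspection rows for one work order as CSV text (header included)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=QUALITY_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in inspections:
        writer.writerow({"wo_id": wo_id, **row})
    return buf.getvalue()


report = render_quality_csv(
    "WO-1001",
    [
        {"inspection_step": "dimension_check", "measured_value": 12.01, "result": "pass"},
        {"inspection_step": "surface_finish", "measured_value": 0.8, "result": "fail"},
    ],
)
```

Rendering to a string first keeps the generator independent of where the file ends up, so the same output can be written to disk in tests or handed to an upload step.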

Fault Injection

The simulator will inject the following types of faults:

1. Disparate Data Faults

2. Inconsistent BOM/Part Data Faults

3. Divergent ECO Faults

4. WIP & Quality Faults
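
As a rough illustration of what injecting a fault could look like, the sketch below corrupts a fraction of work_in_progress rows so that their part_id no longer matches any known part. The specific fault (an orphaned part reference), the `rate` parameter, the `-X` suffix, the `in_progress` status value, and the fault log are all assumptions made for this example and are not the project's documented fault definitions.

```python
import copy
import random


def inject_orphan_part_refs(work_orders, valid_part_ids, rate=0.2, seed=0):
    """Return (faulty_rows, fault_log); the input rows are left unmodified.

    Each row is corrupted with probability `rate` by replacing its part_id
    with a value that is guaranteed not to be in `valid_part_ids`.
    """
    rng = random.Random(seed)
    faulty = copy.deepcopy(work_orders)
    fault_log = []
    for row in faulty:
        if rng.random() < rate:
            original = row["part_id"]
            bogus = f"{original}-X"
            while bogus in valid_part_ids:  # make sure the new ID is truly unknown
                bogus += "X"
            row["part_id"] = bogus
            # Record the ground truth so a later repair can be checked against it.
            fault_log.append(
                {
                    "wo_id": row["wo_id"],
                    "field": "part_id",
                    "original": original,
                    "injected": bogus,
                }
            )
    return faulty, fault_log


valid = {f"P-{i:05d}" for i in range(1, 6)}
clean = [
    {"wo_id": f"WO-{i}", "part_id": f"P-{(i % 5) + 1:05d}", "status": "in_progress"}
    for i in range(20)
]
faulty, log = inject_orphan_part_refs(clean, valid, rate=0.3, seed=1)
```

Keeping a log of what was changed is one way to make the environment "controlled": the repairs proposed by the data-doc can be compared against the recorded original values.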

Agent Architecture

The data-sim uses LangGraph to orchestrate agents that generate and inject faulty data:

Web Interface Features

The React Vite frontend provides: