Agentic data cleansing and normalization system
The data-sim is a standalone application that programmatically generates realistic manufacturing datasets with embedded data quality challenges. Its primary function is to create a controlled, faulty environment in which the data-doc can demonstrate its diagnostic and repair capabilities.
The simulator will generate the following assets:
| Table Name | Columns | Description | Related Pain(s) |
|---|---|---|---|
| parts | part_id (TEXT), description (TEXT), revision (TEXT), source_system (TEXT), created_at (TIMESTAMP) | Canonical list of parts, simulating data from two different ERPs | Disparate data, Inconsistent BOM/Part data |
| bill_of_materials | assembly_part_id (TEXT), component_part_id (TEXT), quantity (INT), version (TEXT) | Defines product structures | Inconsistent BOM/Part data |
| suppliers | supplier_id (TEXT), supplier_name (TEXT), part_id (TEXT), lead_time_days (INT) | Supplier and lead time information for parts | Makeshift capacity planning |
| work_in_progress | wo_id (TEXT), part_id (TEXT), status (TEXT), last_scan_location (TEXT), last_scan_time (TIMESTAMP) | Snapshot of current work orders on the shop floor | WIP and quality issues |
| engineering_changes | eco_id (TEXT), product_affected (TEXT), change_description (TEXT), status (TEXT) | A database record of Engineering Change Orders | Divergent ECOs |

| File Name Pattern | Description | Related Pain(s) |
|---|---|---|
| eco_‹eco_id›.pdf | Unstructured PDF documents detailing an engineering change | Divergent ECOs, Reporting and trust deficits |
| quality_reports/‹wo_id›.csv | CSV exports from a hypothetical quality system with inspection results | WIP and quality issues |
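The assets listed above can be sketched as a small schema plus two file-naming helpers. This is a minimal illustration, not the project's actual implementation: the use of SQLite, the function names (`create_schema`, `eco_pdf_name`, `quality_report_path`, `table_columns`), and the sample IDs are assumptions made for the example. Only the table names, column names, column types, and file name patterns come from the tables above.

```python
import sqlite3
from pathlib import PurePosixPath

# Table and column definitions copied from the asset tables above.
# Assumption: SQLite is used here only because it ships with Python;
# the real data-sim may target a different database engine.
SCHEMA = """
CREATE TABLE parts (
    part_id TEXT,
    description TEXT,
    revision TEXT,
    source_system TEXT,
    created_at TIMESTAMP
);
CREATE TABLE bill_of_materials (
    assembly_part_id TEXT,
    component_part_id TEXT,
    quantity INT,
    version TEXT
);
CREATE TABLE suppliers (
    supplier_id TEXT,
    supplier_name TEXT,
    part_id TEXT,
    lead_time_days INT
);
CREATE TABLE work_in_progress (
    wo_id TEXT,
    part_id TEXT,
    status TEXT,
    last_scan_location TEXT,
    last_scan_time TIMESTAMP
);
CREATE TABLE engineering_changes (
    eco_id TEXT,
    product_affected TEXT,
    change_description TEXT,
    status TEXT
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the five simulator tables on the given connection."""
    conn.executescript(SCHEMA)


def eco_pdf_name(eco_id: str) -> str:
    """File name for an ECO document, following the eco_<eco_id>.pdf pattern."""
    return f"eco_{eco_id}.pdf"


def quality_report_path(wo_id: str) -> str:
    """Relative path of a quality report CSV for one work order."""
    return str(PurePosixPath("quality_reports") / f"{wo_id}.csv")


def table_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Return the column names of a table in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    print(table_columns(conn, "parts"))
    # "ECO-001" and "WO-42" are hypothetical IDs used only for illustration.
    print(eco_pdf_name("ECO-001"))
    print(quality_report_path("WO-42"))
```

The sketch deliberately declares no `PRIMARY KEY` or `FOREIGN KEY` constraints, on the assumption that a fault-injecting simulator needs to be able to store duplicate or orphaned rows; a clean reference schema would likely add them.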
The simulator will inject the following types of faults:
The data-sim uses LangGraph to orchestrate agents that generate and inject faulty data:
The React Vite frontend provides: