data-doc & data-sim Project
Agentic data cleansing and normalization system
Project maintained by realSergiy
Agentic Data Cleansing System
Welcome to the data-doc and data-sim project documentation.
This project demonstrates an agentic data cleansing and normalization system designed for manufacturing environments.
Project Documentation
Project Overview - Complete project plan, including system architecture, components, and proof-of-concept approach.
Application Specifications
- data-sim - Data Simulator application that generates realistic datasets with embedded quality issues
- data-doc - Data Doctor application that diagnoses and repairs data quality problems using AI agents
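To make the split between the two applications concrete, here is a minimal sketch in plain Python. It is not the project's code: the function names `inject_issues` and `profile`, the two issue types (missing values and stray whitespace), and the sample `part_id`/`supplier` fields are all assumptions chosen for illustration. The first function plays the role of data-sim, the second the scanning half of data-doc.

```python
import random


def inject_issues(rows, seed=0, rate=0.3):
    """Hypothetical data-sim step: corrupt one field in a fraction of rows."""
    rng = random.Random(seed)  # seeded so the generated dataset is reproducible
    dirty = []
    for row in rows:
        row = dict(row)  # copy, so the clean input is left untouched
        if rng.random() < rate:
            field = rng.choice(sorted(row))
            value = row[field]
            # Either drop the value or pad it with stray whitespace.
            row[field] = None if rng.random() < 0.5 else f"  {value} "
        dirty.append(row)
    return dirty


def profile(rows):
    """Hypothetical data-doc scan: count issues per field."""
    report = {}
    for row in rows:
        for field, value in row.items():
            stats = report.setdefault(field, {"missing": 0, "untrimmed": 0})
            if value is None:
                stats["missing"] += 1
            elif isinstance(value, str) and value != value.strip():
                stats["untrimmed"] += 1
    return report


# Illustrative manufacturing-style records (made-up fields).
clean = [{"part_id": f"P-{i:03d}", "supplier": "Acme"} for i in range(20)]
dirty = inject_issues(clean, seed=42)
print(profile(dirty))
```

In the real system the diagnosis is described as agent-driven; a rule-based profile like this only illustrates the kind of signal a scan might hand to those agents.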
Key Features
- Agentic Architecture: LangGraph-orchestrated agents powered by the Claude API
- Intelligent Scanning: Automated discovery and profiling of data quality issues
- Interactive Cleansing: User-approved execution of generated repair plans
- Modern Stack: Python backend with FastAPI, React + Vite frontend
- Demo-Ready: Complete 15-minute walkthrough scenario
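The "user-approved execution of generated repair plans" idea can be sketched as a loop that applies a step only after an approval callback accepts it. This is an assumption-laden illustration, not the project's implementation: `RepairStep`, `execute_plan`, and the two example repairs are hypothetical names, and in the real system the plan would come from the agents and the approval from the frontend.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RepairStep:
    """One proposed repair (hypothetical structure)."""
    description: str
    apply: Callable[[list], list]


def trim_strings(rows):
    # Strip leading/trailing whitespace from every string value.
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in rows
    ]


def drop_incomplete(rows):
    # Remove rows that contain any missing value.
    return [row for row in rows if all(v is not None for v in row.values())]


def execute_plan(rows, plan, approve):
    """Apply each step only if approve(step) returns True."""
    applied = []
    for step in plan:
        if approve(step):
            rows = step.apply(rows)
            applied.append(step.description)
    return rows, applied


plan = [
    RepairStep("Trim whitespace", trim_strings),
    RepairStep("Drop incomplete rows", drop_incomplete),
]
rows = [{"supplier": "  Acme "}, {"supplier": None}]

# Stand-in for the user: approve trimming, reject row deletion.
result, applied = execute_plan(rows, plan, lambda s: s.description.startswith("Trim"))
print(result, applied)
```

Keeping approval as a callback means the same loop works whether the decision comes from a UI prompt, a test stub, or an auto-approve policy.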
Getting Started
Start with the Project Overview to understand the complete system architecture and design approach.