Inspiration

Merging datasets with differing schemas is time-consuming and error-prone, which is why we decided to take on the EYxHTV challenge. We wanted to speed up data engineering by automating schema matching, producing mapping docs, and giving analysts a one-click merge workflow powered by AI.

What it does

DataMerge Pro accepts two data bundles (and optional schema docs), uses AI to infer table and field mappings, applies transformations to merge heterogeneous datasets, and outputs merged files plus a downloadable PDF mapping report for review and audit.
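To make the "apply mappings, then merge" step concrete, here is a minimal sketch in Pandas. It assumes the merge is a row-wise union after renaming one dataset's columns to the other's schema; the mapping dictionary, the column names, and the `merge_with_mapping` helper are hypothetical and only stand in for whatever the AI step returns.

```python
import io

import pandas as pd

# Hypothetical mapping: column in dataset B -> matching column in dataset A.
FIELD_MAPPING = {"cust_id": "customer_id", "full_name": "name"}


def merge_with_mapping(df_a, df_b, mapping):
    """Rename df_b's columns to df_a's schema, then stack the rows."""
    renamed = df_b.rename(columns=mapping)
    # Keep only the target schema's columns; any that are missing become NaN.
    aligned = renamed.reindex(columns=df_a.columns)
    return pd.concat([df_a, aligned], ignore_index=True)


# Tiny in-memory CSVs stand in for the uploaded files.
df_a = pd.read_csv(io.StringIO("customer_id,name\n1,Ada\n2,Grace\n"))
df_b = pd.read_csv(io.StringIO("cust_id,full_name,extra\n3,Alan,x\n"))
merged = merge_with_mapping(df_a, df_b, FIELD_MAPPING)
```

In this sketch, unmapped columns such as `extra` are dropped; a real pipeline might instead keep them or flag them in the mapping report.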

How we built it

  • Frontend: Next.js (app router) with Tailwind CSS and react-icons for a clean, responsive UI. Drag-and-drop upload components and a compact download/mapping page.
  • Backend: FastAPI serving upload, merge, and PDF endpoints. Pandas handles file reads (CSV/Excel), and a generative AI model (Gemini) produces mapping suggestions from the uploaded data plus optional schema docs.
  • Orchestration: Merge runs as a background job with a job-status endpoint; mapping results are saved to outputs and returned to the frontend for review and download.
  • Docs: Server-side PDF generation of mapping documentation for audit and sharing.

Challenges we ran into

  • Prompt size: We had to keep the model's context window in mind and make sure whatever we upload for analysis fits within it. We mitigated this by sampling only the first few rows of each file.
  • Creating a good data merging pipeline: We initially struggled to formulate a technique for creating mappings and merging.
  • Edge cases in schema parsing: Mixed types and inconsistent column names can cause issues in our mapping and merging pipeline.
  • UI/UX: We struggled to find the right balance between a user-friendly UI/UX and a technically functional one.
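The row-sampling mitigation for prompt size can be sketched as follows. This is an illustration under assumptions: the sample size of 5, the helper names, and the prompt wording are hypothetical, and the call to the model itself is left out.

```python
import io

import pandas as pd


def sample_for_prompt(file_obj, n_rows=5):
    """Read only the first n_rows of a CSV and return them as compact CSV text."""
    df = pd.read_csv(file_obj, nrows=n_rows)
    return df.to_csv(index=False)


def build_mapping_prompt(sample_a, sample_b):
    """Combine two small samples into one prompt asking for column mappings."""
    return (
        "Suggest column mappings from dataset B to dataset A.\n\n"
        f"Dataset A sample:\n{sample_a}\n"
        f"Dataset B sample:\n{sample_b}"
    )


# A 100-row file is reduced to a header plus 5 rows before it reaches the prompt.
csv_text = "id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(100))
sample = sample_for_prompt(io.StringIO(csv_text), n_rows=5)
prompt = build_mapping_prompt(sample, sample)
```

Taking only the head of each file keeps the prompt small but can miss values that appear later, which is one way the mixed-type edge cases above can slip through.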

Accomplishments that we're proud of

  • Created an elegant UI/UX for merging two datasets easily.
  • Displayed detailed mapping documentation on the download page.
  • Built automatic generation of downloadable PDF documentation covering the mapping details.
  • Delivered a functioning full-stack project with smooth communication between the frontend and backend.

What we learned

  • Learned to use FastAPI with Python and the Google Gen AI Python SDK to implement AI-powered mapping capabilities in the application.
  • Learned to use Pandas for reading and merging CSV and Excel files.
  • Polished our UI/UX design skills by creating an elegant, user-friendly interface.

What's next for DataMerge Pro

  • Add an interactive mapping editor so users can tweak mappings before merging.
  • Add automated tests and CI, plus deployment as a containerized service with a small staging environment.
  • Test different prompts for creating the mappings and see which work best.
  • Implement data normalization techniques to make the process more rigorous.
