Inspiration

Many people love Excel and are expert users of it, but they can't answer questions when the data isn't in one spreadsheet. Data often comes from tables with different schemas or from unstructured text blobs. Data work is only 5% analysis; the other 95% is getting data into the table needed for analysis.

What it does

Introducing “DataJack” – your jack of all trades, your Virtual Data Engineer

How we built it

  • Self-hosted plugin has access to a directory with data CSVs (future: integration with other sources)
  • User provides desired output table schema through ChatGPT
  • DataJack reads the data CSVs and asks GPT-4 to produce the output table in the desired format
  • GPT-4 is given only the CSV headers and a few randomly sampled rows, not the full tables
  • GPT-4 is asked to produce code that reads the input CSVs and builds the output table, rather than to produce the output table directly
  • The generated code is executed within a validation loop
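
The steps above can be sketched roughly as follows. This is a minimal illustration, not DataJack's actual code. The names `summarize_csv`, `validate`, `run_with_validation`, and the `generate_code` callback (a stand-in for the GPT-4 call) are all hypothetical. It also assumes the generated code defines a function named `transform` and that validation means checking the output columns against the requested schema.

```python
import csv
import io
import random


def summarize_csv(text, n_rows=3, seed=0):
    """Return the header plus a few randomly sampled rows of a CSV string.

    Only this summary, not the full table, would be placed in the prompt.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    rng = random.Random(seed)
    sample = rng.sample(body, min(n_rows, len(body)))
    return {"header": header, "sample": sample}


def validate(table, schema):
    """Return an error message, or None if every row has exactly the schema's columns."""
    if not isinstance(table, list):
        return "output is not a list of rows"
    for i, row in enumerate(table):
        if not isinstance(row, dict) or set(row) != set(schema):
            return f"row {i} does not match expected columns {schema}"
    return None


def run_with_validation(generate_code, inputs, schema, max_attempts=3):
    """Request code, execute it, and retry with the error as feedback.

    generate_code(feedback) is a hypothetical callback that returns Python
    source defining transform(inputs); in practice it would wrap an LLM call.
    """
    feedback = None
    for _ in range(max_attempts):
        code = generate_code(feedback)
        namespace = {}
        try:
            # Assumption: generated code is trusted here. Real deployments
            # should run it in a sandbox, since exec runs arbitrary code.
            exec(code, namespace)
            table = namespace["transform"](inputs)
        except Exception as exc:
            feedback = f"{type(exc).__name__}: {exc}"
            continue
        feedback = validate(table, schema)
        if feedback is None:
            return table
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {feedback}")
```

Because the model only writes the transformation code, the size of the input tables does not affect the prompt; only the headers and the sampled rows do. Any error from execution or validation is passed back as feedback for the next attempt.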

What's next for DataJack

  • Scale to massive tables without increasing the context window or token costs

Built With

  • deeplake
  • gpt4
  • replit