{"id":78382,"date":"2024-06-13T09:00:28","date_gmt":"2024-06-13T16:00:28","guid":{"rendered":"https:\/\/github.blog\/?p=78382"},"modified":"2024-07-23T06:14:10","modified_gmt":"2024-07-23T13:14:10","slug":"unlocking-the-power-of-unstructured-data-with-rag","status":"publish","type":"post","link":"https:\/\/github.blog\/ai-and-ml\/llms\/unlocking-the-power-of-unstructured-data-with-rag\/","title":{"rendered":"Unlocking the power of unstructured data with RAG"},"content":{"rendered":"\n

Whether they’re building a new product or improving a process or feature, developers and IT leaders need data and insights to make informed decisions.<\/p>\n

In software development, this data comes in two forms: structured and unstructured. Structured data follows a specific, predefined format. Unstructured data, such as an email, an audio or visual file, a code comment, or a commit message, doesn’t. This makes unstructured data hard to organize and interpret, which means teams can miss out on potentially valuable insights.<\/p>\n

To make the most of their unstructured data, development teams are turning to retrieval-augmented generation, or RAG, a method for customizing large language models (LLMs). They can use RAG to keep LLMs up to date with organizational knowledge and the latest information available on the web. They can also use RAG and LLMs to surface and extract insights from unstructured data.<\/p>\n

GitHub data scientists Pam Moriarty<\/a> and Jessica Guo<\/a> explain unstructured data’s unique value in software development and how developers and organizations can use RAG to create greater efficiency and value in the development process.<\/p>\n

Unstructured data in software development<\/span><\/a><\/h2>\n

In software development, unstructured data includes source code and the context surrounding it<\/strong>, because neither follows a predefined format.<\/p>\n

Here are some examples of unstructured data on GitHub:<\/p>\n