{"id":78382,"date":"2024-06-13T09:00:28","date_gmt":"2024-06-13T16:00:28","guid":{"rendered":"https:\/\/github.blog\/?p=78382"},"modified":"2024-07-23T06:14:10","modified_gmt":"2024-07-23T13:14:10","slug":"unlocking-the-power-of-unstructured-data-with-rag","status":"publish","type":"post","link":"https:\/\/github.blog\/ai-and-ml\/llms\/unlocking-the-power-of-unstructured-data-with-rag\/","title":{"rendered":"Unlocking the power of unstructured data with RAG"},"content":{"rendered":"\n
Whether they’re building a new product or improving a process or feature, developers and IT leaders need data and insights to make informed decisions.<\/p>\n
In software development, this data comes in two forms: structured and unstructured. Structured data follows a specific, predefined format. Unstructured data (an email, an audio or visual file, a code comment, or a commit message, for example) doesn’t. That makes unstructured data hard to organize and interpret, which means teams can miss out on potentially valuable insights.<\/p>\n
To make the most of their unstructured data, development teams are turning to retrieval-augmented generation, or RAG, a method for customizing large language models (LLMs). They can use RAG to keep LLMs up to date with organizational knowledge and the latest information available on the web. They can also use RAG and LLMs to surface and extract insights from unstructured data.<\/p>\n
GitHub data scientists Pam Moriarty<\/a> and Jessica Guo<\/a> explain unstructured data’s unique value in software development and how developers and organizations can use RAG to create greater efficiency and value in the development process.<\/p>\n In software development, unstructured data includes source code and the context surrounding it<\/strong>, as these sources of information don’t follow a predefined format.<\/p>\n Here are some examples of unstructured data on GitHub:<\/p>\n The same features that make unstructured data valuable also make it hard to analyze.<\/p>\n Unstructured data lacks inherent organization, as it often consists of free-form text, images, or multimedia content.<\/p>\n “Without clear boundaries or predefined formats, extracting meaningful information from unstructured data becomes very challenging,” Guo says.<\/p>\n But LLMs can help to identify complex patterns in unstructured data<\/strong>, especially text. Though not all unstructured data is text, a lot of text is unstructured. And LLMs can help you to analyze it.<\/p>\n “When dealing with ambiguous, semi-structured or unstructured data, LLMs dramatically excel at identifying patterns, sentiments, entities, and topics within text data and uncover valuable insights that might otherwise remain hidden,” Guo explains.<\/p>\nUnstructured data in software development<\/span><\/a><\/h2>\n
\n
The value of unstructured data<\/span><\/a><\/h2>\n