<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-US"><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://msadr.ir/feed.xml" rel="self" type="application/atom+xml" /><link href="https://msadr.ir/" rel="alternate" type="text/html" hreflang="en-US" /><updated>2025-07-06T18:55:03+03:30</updated><id>https://msadr.ir/feed.xml</id><title type="html">Data Star</title><subtitle>I&apos;m Mohammad Sadr. Here I will talk about data and its applications. Data Star is my blog&apos;s name :)</subtitle><author><name>Data Star</name><email>jafarisadr+datastar@gmail.com</email></author><entry><title type="html">Introducing CryptoIntel</title><link href="https://msadr.ir/cryptointel-pro/" rel="alternate" type="text/html" title="Introducing CryptoIntel" /><published>2025-04-19T02:03:03+03:30</published><updated>2025-04-19T02:03:03+03:30</updated><id>https://msadr.ir/cryptointel-pro</id><content type="html" xml:base="https://msadr.ir/cryptointel-pro/"><![CDATA[<h1 id="-building-a-generative-ai-powered-analyst-for-crypto-market-intelligence">🚀 Building a Generative AI-Powered Analyst for Crypto Market Intelligence</h1>

<p><em>How we created an MVP that understands crypto whitepapers and answers investor-grade questions using LangChain, Gemini, and vector search.</em></p>

<hr />

<h2 id="-the-problem">🧩 The Problem</h2>

<p>In the fast-moving world of cryptocurrency, thousands of new tokens, protocols, and platforms launch every year. Whitepapers, blog posts, and documentation are released daily — and most of it is long, technical, and time-consuming to analyze.</p>

<p><strong>What if we had an AI assistant that could:</strong></p>
<ul>
  <li>Understand these documents,</li>
  <li>Extract the most important financial and technical details,</li>
  <li>And answer questions like a human analyst would — but instantly?</li>
</ul>

<p>That’s exactly what we set out to build.</p>

<hr />

<h2 id="-the-use-case">💡 The Use Case</h2>

<p>We created <strong>CryptoIntel</strong>, an MVP (minimum viable product) that uses <strong>generative AI</strong> to analyze crypto whitepapers and answer investor-facing questions, like:</p>

<blockquote>
  <p><em>“What is the total supply and vesting schedule of this token?”</em></p>
</blockquote>

<p>This isn’t just a chatbot — it’s a <strong>Retrieval-Augmented Generation (RAG)</strong> system that combines document understanding, vector search, and Google Gemini’s LLM to deliver intelligent, structured answers.</p>
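<p>To make the retrieval half of RAG concrete, here is a toy, self-contained sketch of vector search: rank text chunks by cosine similarity to a query embedding and keep the top matches. The three-dimensional vectors below are made up for illustration; in CryptoIntel the real embeddings come from Gemini and the search is handled by the vector store.</p>

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings" -- invented for this sketch, not real Gemini output.
chunks = {
    "Total supply is 1,000,000,000 ABC tokens.": [0.9, 0.1, 0.0],
    "The team spans three time zones.":          [0.1, 0.8, 0.3],
    "Team tokens vest over 24 months.":          [0.3, 0.2, 0.9],
}

def retrieve(query_vec, k=2):
    """Return the k chunks most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, chunks[c]), reverse=True)
    return ranked[:k]

# Pretend embedding of "What is the token supply?"
query_vec = [0.8, 0.1, 0.3]
passages = retrieve(query_vec)
```

<p>The retrieved passages are then appended to the LLM prompt, which is exactly what the pipeline below does at production scale.</p>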

<hr />

<h2 id="-how-we-built-it">🔧 How We Built It</h2>

<p>Let’s break down the key components:</p>

<h3 id="1-load-the-crypto-whitepaper">1. Load the Crypto Whitepaper</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">langchain.document_loaders</span> <span class="kn">import</span> <span class="n">UnstructuredPDFLoader</span>

<span class="n">PDF_PATH</span> <span class="o">=</span> <span class="s">"./sample_crypto_whitepaper.pdf"</span>
<span class="n">loader</span> <span class="o">=</span> <span class="n">UnstructuredPDFLoader</span><span class="p">(</span><span class="n">PDF_PATH</span><span class="p">)</span>
<span class="n">documents</span> <span class="o">=</span> <span class="n">loader</span><span class="p">.</span><span class="n">load</span><span class="p">()</span>
</code></pre></div></div>

<h3 id="2-chunk-and-embed-the-text">2. Chunk and Embed the Text</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">chromadb</span>
<span class="kn">from</span> <span class="nn">langchain.text_splitter</span> <span class="kn">import</span> <span class="n">RecursiveCharacterTextSplitter</span>

<span class="n">splitter</span> <span class="o">=</span> <span class="n">RecursiveCharacterTextSplitter</span><span class="p">(</span><span class="n">chunk_size</span><span class="o">=</span><span class="mi">800</span><span class="p">,</span> <span class="n">chunk_overlap</span><span class="o">=</span><span class="mi">150</span><span class="p">)</span>
<span class="n">docs_split</span> <span class="o">=</span> <span class="n">splitter</span><span class="p">.</span><span class="n">split_documents</span><span class="p">(</span><span class="n">documents</span><span class="p">)</span>
<span class="n">docs_texts</span> <span class="o">=</span> <span class="p">[</span><span class="n">doc</span><span class="p">.</span><span class="n">page_content</span> <span class="k">for</span> <span class="n">doc</span> <span class="ow">in</span> <span class="n">docs_split</span><span class="p">]</span>

<span class="c1"># GeminiEmbeddingFunction is our own chromadb EmbeddingFunction that
# calls the Gemini embedding API (defined earlier in the notebook).
</span><span class="n">DB_NAME</span> <span class="o">=</span> <span class="s">"cryptointel"</span>
<span class="n">embed_fn</span> <span class="o">=</span> <span class="n">GeminiEmbeddingFunction</span><span class="p">()</span>
<span class="n">chroma_client</span> <span class="o">=</span> <span class="n">chromadb</span><span class="p">.</span><span class="n">Client</span><span class="p">()</span>
<span class="n">db</span> <span class="o">=</span> <span class="n">chroma_client</span><span class="p">.</span><span class="n">get_or_create_collection</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="n">DB_NAME</span><span class="p">,</span> <span class="n">embedding_function</span><span class="o">=</span><span class="n">embed_fn</span><span class="p">)</span>
<span class="n">db</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">documents</span><span class="o">=</span><span class="n">docs_texts</span><span class="p">,</span> <span class="n">ids</span><span class="o">=</span><span class="p">[</span><span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">docs_split</span><span class="p">))])</span>
</code></pre></div></div>

<h3 id="3-ask-questions-with-contextual-understanding">3. Ask Questions with Contextual Understanding</h3>

<p>We query the vector store for the most relevant passages, then send them to <strong>Gemini</strong> with a custom prompt written for non-technical users:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">google</span> <span class="kn">import</span> <span class="n">genai</span>
<span class="kn">from</span> <span class="nn">google.genai</span> <span class="kn">import</span> <span class="n">types</span>

<span class="n">client</span> <span class="o">=</span> <span class="n">genai</span><span class="p">.</span><span class="n">Client</span><span class="p">()</span>  <span class="c1"># reads GOOGLE_API_KEY from the environment</span>

<span class="n">query</span> <span class="o">=</span> <span class="s">"What is the total supply and vesting schedule of this token?"</span>

<span class="c1"># Retrieve the passages most relevant to the question.
</span><span class="n">result</span> <span class="o">=</span> <span class="n">db</span><span class="p">.</span><span class="n">query</span><span class="p">(</span><span class="n">query_texts</span><span class="o">=</span><span class="p">[</span><span class="n">query</span><span class="p">],</span> <span class="n">n_results</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="n">all_passages</span> <span class="o">=</span> <span class="n">result</span><span class="p">[</span><span class="s">"documents"</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>

<span class="n">prompt</span> <span class="o">=</span> <span class="sa">f</span><span class="s">"""You are a friendly and informative assistant helping someone understand a crypto project's whitepaper.
Use the provided passage as a reference to answer the question below.
Make your explanation simple, engaging, and non-technical—like you're explaining it to a curious friend.
If the passage doesn't contain relevant information, let the user know kindly and skip it.

QUESTION: </span><span class="si">{</span><span class="n">query</span><span class="si">}</span><span class="s">
"""</span>

<span class="c1"># Add the retrieved passages to the prompt.
</span><span class="k">for</span> <span class="n">passage</span> <span class="ow">in</span> <span class="n">all_passages</span><span class="p">:</span>
    <span class="n">passage_oneline</span> <span class="o">=</span> <span class="n">passage</span><span class="p">.</span><span class="n">replace</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="s">" "</span><span class="p">)</span>
    <span class="n">prompt</span> <span class="o">+=</span> <span class="sa">f</span><span class="s">"PASSAGE: </span><span class="si">{</span><span class="n">passage_oneline</span><span class="si">}</span><span class="se">\n</span><span class="s">"</span>

<span class="n">model_config</span> <span class="o">=</span> <span class="n">types</span><span class="p">.</span><span class="n">GenerateContentConfig</span><span class="p">(</span>
    <span class="n">temperature</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span>  <span class="c1"># keep answers close to the source text</span>
<span class="p">)</span>

<span class="n">answer</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">models</span><span class="p">.</span><span class="n">generate_content</span><span class="p">(</span>
    <span class="n">model</span><span class="o">=</span><span class="s">"gemini-2.0-flash"</span><span class="p">,</span>
    <span class="n">config</span><span class="o">=</span><span class="n">model_config</span><span class="p">,</span>
    <span class="n">contents</span><span class="o">=</span><span class="n">prompt</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">answer</span><span class="p">.</span><span class="n">text</span><span class="p">)</span>
</code></pre></div></div>

<p>Output:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"token"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ABC"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"total_supply"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1,000,000,000"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"distribution"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"public"</span><span class="p">:</span><span class="w"> </span><span class="s2">"40%"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"team"</span><span class="p">:</span><span class="w"> </span><span class="s2">"20%"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"ecosystem"</span><span class="p">:</span><span class="w"> </span><span class="s2">"30%"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"reserves"</span><span class="p">:</span><span class="w"> </span><span class="s2">"10%"</span><span class="w">
  </span><span class="p">},</span><span class="w">
  </span><span class="nl">"vesting"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Team tokens vest over 24 months with a 6-month cliff."</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
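<p>One practical wrinkle: when you ask an LLM for structured output like the JSON above, the reply often arrives wrapped in markdown code fences. A small helper of our own (not part of any SDK) makes parsing tolerant of that:</p>

```python
import json
import re

def parse_llm_json(text: str):
    """Extract the first JSON object from an LLM reply,
    tolerating markdown code fences around it."""
    # Strip ```json ... ``` fences if present.
    cleaned = re.sub(r"```(?:json)?", "", text).strip()
    # Grab the outermost {...} block.
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in reply")
    return json.loads(cleaned[start:end + 1])

# Simulated model reply wrapped in a fence.
reply = """```json
{"token": "ABC", "total_supply": "1,000,000,000"}
```"""
data = parse_llm_json(reply)
```

<p>Failing loudly here is deliberate: a parse error is far easier to catch than a silently hallucinated field.</p>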

<hr />

<h2 id="️-limitations">⚠️ Limitations</h2>

<p>While this MVP shows promising results, there are challenges to address:</p>

<ul>
  <li><strong>Structured extraction isn’t perfect</strong>: The model can hallucinate values if the context isn’t clear.</li>
  <li><strong>Longer documents may lose context</strong>: RAG pipelines work best when the relevant chunk is retrieved, but that’s not always guaranteed.</li>
  <li><strong>No live data yet</strong>: This version analyzes static documents. It doesn’t yet connect to APIs like CoinGecko or social sentiment trackers.</li>
</ul>

<hr />

<h2 id="-the-future-of-cryptointel">🔮 The Future of CryptoIntel</h2>

<p>This is just the beginning. Here’s where we’re taking this:</p>

<p>✅ <strong>Real-Time API Integration</strong><br />
Pull data from CoinMarketCap, Etherscan, or Glassnode to enrich answers with live stats.</p>

<p>✅ <strong>Multi-Modal Understanding</strong><br />
Add image/chart recognition to analyze tokenomics diagrams or financial models in whitepapers.</p>

<p>✅ <strong>Sentiment Analysis</strong><br />
Scrape Twitter/X, Reddit, and news to add emotional context to market narratives.</p>

<p>✅ <strong>Agentic Reasoning</strong><br />
Use LangChain Agents to create multi-step workflows — e.g., retrieve info → correlate events → generate risk report.</p>

<p>✅ <strong>Enterprise Deployment</strong><br />
Offer crypto hedge funds and analysts a secure on-premises or API-based platform to automate research workflows.</p>

<hr />

<h2 id="-final-thoughts">👋 Final Thoughts</h2>

<p>This MVP is a glimpse into how <strong>Generative AI</strong> can transform the way we analyze, understand, and act on complex information — especially in the high-stakes, fast-paced world of crypto.</p>

<p>If you’re exploring how to integrate generative AI into your own financial analysis or document-heavy workflows, reach out — or fork the repo and build from here.</p>

<hr />]]></content><author><name>Data Star</name><email>jafarisadr+datastar@gmail.com</email></author><category term="crypto" /><category term="AI" /><category term="LLM" /><category term="RAG" /><category term="GenerativeAI" /><category term="Gen AI Intensive Course" /><category term="Capstone 2025Q1" /><summary type="html"><![CDATA[An AI-Powered Agent for Smart Crypto Market Insights]]></summary></entry><entry><title type="html">The beginning</title><link href="https://msadr.ir/the-beggining/" rel="alternate" type="text/html" title="The beginning" /><published>2018-03-03T02:03:03+03:30</published><updated>2018-03-03T02:03:03+03:30</updated><id>https://msadr.ir/the-beggining</id><content type="html" xml:base="https://msadr.ir/the-beggining/"><![CDATA[<h4 id="data-and-the-world">Data and the World</h4>

<p>Here I am going to publish some articles about Data Science, Big Data, and Business Intelligence.</p>

<p>And this is just the beginning…</p>]]></content><author><name>Data Star</name><email>jafarisadr+datastar@gmail.com</email></author><category term="zero" /><category term="beginning" /><summary type="html"><![CDATA[Data and the World]]></summary></entry></feed>