<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://aus10powell.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://aus10powell.github.io/" rel="alternate" type="text/html" /><updated>2025-08-05T06:01:13+00:00</updated><id>https://aus10powell.github.io/feed.xml</id><title type="html">aus10powell.io</title><subtitle>aus10powell.github.github.io</subtitle><author><name>Austin Powell</name></author><entry><title type="html">Wise Words of Ovid</title><link href="https://aus10powell.github.io/nlp/llm/WiseWordsofOvid/" rel="alternate" type="text/html" title="Wise Words of Ovid" /><published>2025-02-01T00:00:00+00:00</published><updated>2025-02-01T00:00:00+00:00</updated><id>https://aus10powell.github.io/nlp/llm/WiseWordsofOvid</id><content type="html" xml:base="https://aus10powell.github.io/nlp/llm/WiseWordsofOvid/"><![CDATA[<p><strong>A wise bot retrieving quotes from antiquity</strong></p>

<blockquote>
"The mind which is conscious of right, rectitude, undeviating integrity, despises, laughs at, treats with contempt, the lies of rumor."
<footer>— Ovid (Fasti)</footer>
</blockquote>

<h2 id="problem-definition-and-goal">Problem Definition and Goal</h2>

<h3 id="challenge">Challenge</h3>
<p>Generally, the goal is to reply to various X tweets (often political or opinionated in nature) with a quote that strikes a neutral tone, shifting to a humorous/ironic/sarcastic tone when the comment is overtly negative.</p>

<ul>
  <li>Prompt-based “wisdom” responses based on sentiment, using <a href="https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment?text=Whoa.+CNN+is+now+reporting+that+several+Republican+voters+have+said+that+they+voted+for+Tom+Suozzi+today+because+Republicans+sabotaged+the+border+security+deal.+This+is+huge+%26+amazing.+Voters+see+right+through+Republican+nonsense+%26+they+are+making+them+find+out+big+time.">cardiffnlp/twitter-roberta-base-sentiment</a>. For example:</li>
</ul>

<blockquote>
  <p>“Here is why I think we’re seeing this: Time &amp; time again, the media focuses on polls &amp; draws conclusions about the state of the race without focusing on 1.) all Democrats have done &amp; 2.) the real threat of Trump/MAGA extremism. But VOTERS ARE NOT DUMB! Wake up, media.”</p>
</blockquote>

<h3 id="goal">Goal</h3>
<ul>
  <li>Replying to tweets with relevant quotes.</li>
  <li>Using sentiment analysis to determine tone (e.g., humorous, neutral).</li>
</ul>
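<p>The sentiment-to-tone step can be sketched as follows. This is a minimal illustration only: the <code>choose_tone</code> helper and its 0.7 threshold are hypothetical, assuming the three labels (negative/neutral/positive) produced by the cardiffnlp model.</p>

```python
def choose_tone(label: str, score: float) -> str:
    """Pick a reply tone from a sentiment prediction.

    `label` is one of the labels emitted by
    cardiffnlp/twitter-roberta-base-sentiment (negative / neutral /
    positive) and `score` is its confidence. Overtly negative posts
    get a humorous/ironic reply; everything else stays neutral.
    The 0.7 cutoff is an illustrative choice, not a tuned value.
    """
    if label == "negative" and score > 0.7:
        return "humorous"
    return "neutral"
```

<p>In the actual pipeline, the label/score pair would come from running the Hugging Face sentiment model on the tweet text.</p>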

<h3 id="generatingextracting-quotes">Generating/Extracting Quotes:</h3>
<h4 id="data-and-challenges">Data and Challenges</h4>
<p>Most of Ovid’s works are easily found online, but they are written in poetic form, which is not as straightforward to parse as regular documents. I found a marked improvement in text extraction from Ovid’s works using 300-dimension rather than 200-dimension embeddings, which is consistent with many of the recommendations for building RAG systems.</p>

<ul>
  <li><strong>Data:</strong> Extracting quotes from Ovid’s poetic works.</li>
  <li><strong>Key Challenges:</strong>
    <ul>
      <li>Parsing poetic text.</li>
      <li>Avoiding hallucinations in LLM-generated quotes.</li>
    </ul>
  </li>
</ul>

<h4 id="tasks-involved-in-problem-definition">Tasks Involved in Problem Definition</h4>
<h5 id="ovid-quote-retrieval-task">Ovid Quote Retrieval Task</h5>
<ul>
  <li>Quote retrieval from Ovid’s different poetic works for the main tweet content.
    <ul>
      <li>Quote generation using different LLMs
        <ul>
          <li>The expected challenge of hallucinations arises here, particularly when the LLM was not allowed to choose whom it was quoting</li>
          <li>RAG: E.g. “What did Ovid have to say about different political parties”</li>
        </ul>
      </li>
      <li>Sentence-similarity (see <a href="https://arxiv.org/pdf/2403.05440.pdf">Is Cosine-Similarity of Embeddings Really About Similarity?</a>) was most useful for weeding out totally useless chunks of text. It was far less useful, however, at surfacing memorable quotes that were similar in concept:
        <ul>
          <li><strong>Input sentence:</strong> Plurality should never be proposed unless needed. (William Occam)</li>
          <li><strong>Output sentence 1</strong> (0.217 similarity using MLX LLM): Everything should be made as simple as possible, but not simpler. (Albert Einstein)</li>
          <li><strong>Output sentence 2</strong> (0.175 similarity using MLX LLM): Simplicity is the ultimate sophistication. (Leonardo da Vinci)</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Responses:
    <ul>
      <li>RAG-based:
        <ul>
          <li>Metrics:
            <ul>
              <li><strong>Context Relevance:</strong> Ensuring the retrieved context is pertinent to the user query, utilizing LLMs for context relevance scoring.</li>
              <li><strong>Groundedness:</strong> Separating the response into statements and verifying each against the retrieved context.</li>
              <li><strong>Answer Relevance:</strong> Checking if the response aptly addresses the original question.</li>
            </ul>
          </li>
        </ul>
      </li>
      <li>Political: For political tweets, sentiment can determine the type of response (the LLM can generate the actual text), but clustering may also be useful: <a href="https://towardsdatascience.com/clustering-sentence-embeddings-to-identify-intents-in-short-text-48d22d3bf02e">Clustering Sentence Embeddings to Identify Intents in Short Text</a></li>
      <li><strong>Take-away:</strong> RAG-based methods were better at making factual statements, e.g. “Who is Arachne?” -&gt; “Arachne is the protagonist of a tale in Greek mythology known primarily from the version told by the Roman poet Ovid, which is the earliest extant source for the story.” They offered less of an advantage on subjective prompts.</li>
    </ul>
  </li>
</ul>
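<p>As a rough sketch of the groundedness metric above: split the response into statements and score each against the retrieved context. Token overlap stands in here for the LLM-based verification the metric actually calls for; the function name and 0.5 threshold are hypothetical.</p>

```python
import re

def groundedness(response: str, context: str, threshold: float = 0.5) -> list:
    """Split `response` into sentences and flag each as grounded if
    enough of its content words (longer than 3 characters) appear in
    the retrieved context. Token overlap is a crude stand-in for an
    LLM-based verifier; returns a list of (sentence, grounded) pairs."""
    ctx_tokens = set(re.findall(r"[a-z']+", context.lower()))
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        words = [w for w in re.findall(r"[a-z']+", sentence.lower()) if len(w) > 3]
        overlap = sum(w in ctx_tokens for w in words) / max(len(words), 1)
        results.append((sentence, overlap >= threshold))
    return results
```

<p>A statement that fails the check would then be dropped or flagged before the reply is posted.</p>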

<h4 id="tweets-needed-tweet-probability-of-repeated-occurrence-over-time">Tweets needed: Tweet Probability of Repeated Occurrence over Time</h4>
<p>The bot seeds initial content by drawing from a bank of N tweets with replacement, so we need a baseline for the probability that a given tweet repeats within a window. With one post every m days, over two weeks:</p>

\[\begin{equation*}P(\text{repeated in 2 weeks}) = 1 - \left( \frac{N - 1}{N} \right)^{\frac{14}{m}}\end{equation*}\]

<p>Substituting N = 100 and m = 1:</p>

\[\begin{equation*} P(\text{repeated in 2 weeks}) = 1 - \left( \frac{99}{100} \right)^{14}\end{equation*}\]

<p>Calculating this gives:</p>

\[\begin{equation*} P(\text{repeated in 2 weeks}) \approx 0.131\end{equation*}\]

<p>Personally, I’d like it to stay under 10% over a longer window (a longer window increases the probability), so tweaking the numbers a bit by upping the bank of tweets to N = 200 over 21 days:</p>

\[\begin{equation*}P(\text{repeated in 3 weeks}) \approx 0.0999\end{equation*}\]
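<p>The arithmetic above can be checked with a direct transcription of the formula (N is the size of the tweet bank, m the number of days between posts):</p>

```python
def p_repeat(n: int, days: int, m: int = 1) -> float:
    """P(repeated in `days` days) = 1 - ((n - 1) / n) ** (days / m)
    for a bank of n tweets drawn with replacement every m days."""
    return 1 - ((n - 1) / n) ** (days / m)

print(f"{p_repeat(100, 14):.3f}")  # ~0.131
print(f"{p_repeat(200, 21):.3f}")  # ~0.100
```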

<h5 id="interesting-quote-hallucinations">Interesting Quote “Hallucinations”:</h5>
<ul>
  <li>Fictitious author
    <ul>
      <li><strong>Prompt:</strong> Respond to every input with a quote from your peers and the first and last name of who said it.</li>
      <li><strong>Input:</strong> “all of this has happened before, all of this will happen again” (Sam Altman)</li>
      <li><strong>Output:</strong> “The only way to deal with the void is to continue to try to fill it.” - John Smith</li>
    </ul>
  </li>
  <li>Fictitious quote from author:
    <ul>
      <li><strong>Prompt:</strong> <em>Respond to every input with a quote from your peers and the first and last name of who said it.</em></li>
      <li><strong>Input:</strong> <em>Everyone talking re: over-education of youth. But that’s not the problem. Real issue is believing certain work is beneath you. We’ve all answered phones. <u>Jensen Huang</u> (NW $81B) cleaned toilets. I was broke prep cook in a Jewish deli. Gotta start somewhere</em></li>
      <li><strong>Output:</strong> “Real issue is believing certain work is beneath you.” - <u>Jensen Huang</u></li>
    </ul>
  </li>
  <li>Postured quote: from no one in particular, and not particularly memorable.
    <ul>
      <li><strong>Prompt:</strong>  <em>Generate a response to a self-promotional tweet expressing a strong stance against politics.</em></li>
      <li><strong>Input:</strong>  <em>We all come from somewhere, but we’re all Americans.</em></li>
      <li><strong>Output:</strong> “We all come from different places and backgrounds, but at the end of the day, we’re all Americans.” -Unknown</li>
    </ul>
  </li>
</ul>

<h4 id="fine-tuning-based-on-gold-standard">Fine-tuning based on gold standard</h4>

<p>I wanted to see how much leverage I could get out of a few-shot set of training examples. I used the <em>babbage-002</em> model, so perhaps I would have gotten more out of the GPT-3.* family. Even so, the results were not encouraging for ~10 political tweet examples paired with glib and sometimes stoic quotes from figures such as MLK. It raises the question of how much effort it takes to make an LLM work for a task like this.</p>

<p><em>Gavin Newsom Tweet</em></p>
<blockquote>
  <p>Another day, another common sense gun safety law struck down by a right-wing judge.</p>

  <p>This is exactly why we need to amend the U.S. Constitution and establish a Right to Safety. https://sandiegouniontribune.com/news/courts/story/2024-03-12/san-diego-federal-judge-strikes-down-california-law-limiting-high-frequency-gun-purchases</p>
</blockquote>

<p><em>Fine-tuned response</em></p>
<blockquote>
  <p>#2
VenezuelanCuban View Post Originally posted by Hopefully the whole country with be armed like in Venezuela.</p>

  <p>It will be, but I don’t see how they will be armed. I don’t think it’s possible. They’re so used to the stateless existence that they will refuse to live under it. They will not adopt the military industrial complex mentality, which in their view is a tool of the state. They’ll have to form their own government. Only then can</p>
</blockquote>

<h2 id="dual-encoder">Dual-Encoder:</h2>
<p>A Dual-Encoder model can help with effective quote retrieval tailored to tweet contexts. This architecture excels in retrieval tasks by computing and comparing embeddings for two input types—in this case, tweet content and a collection of quotes from Ovid’s works.</p>

<p>Semantic Understanding: Dual-Encoders map inputs like tweets and quotes into high-dimensional embeddings that effectively capture their semantic meaning. This ensures the retrieved quote aligns with the tweet’s tone, sentiment, and context.</p>

<p>Scalability: Once encoded into embeddings, quotes can be efficiently searched using similarity measures such as cosine similarity. This approach is ideal for handling large datasets of poetic quotes, enabling real-time response generation.</p>

<p>Tailored Responses with Flexibility: Fine-tuning Dual-Encoder models allows for nuanced and highly relevant responses.</p>
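<p>A minimal sketch of the retrieval side, assuming tweet and quote embeddings have already been produced by the two encoders (toy 2-d vectors stand in for real encoder outputs here; in practice they would come from the fine-tuned dual-encoder):</p>

```python
import numpy as np

def retrieve_quote(tweet_emb: np.ndarray, quote_embs: np.ndarray, quotes: list) -> str:
    """Return the quote whose embedding is most cosine-similar to the
    tweet embedding. Rows are normalized so a plain dot product
    equals cosine similarity."""
    q = quote_embs / np.linalg.norm(quote_embs, axis=1, keepdims=True)
    t = tweet_emb / np.linalg.norm(tweet_emb)
    return quotes[int(np.argmax(q @ t))]

# Toy example: two quotes, two orthogonal embedding directions.
quotes = ["Quote about fame", "Quote about love"]
quote_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
print(retrieve_quote(np.array([0.9, 0.1]), quote_embs, quotes))  # Quote about fame
```

<p>For large quote banks, the same dot-product search can be delegated to an approximate nearest-neighbor index rather than a full scan.</p>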

<div align="center">
 <img src="/assets/images/wisewordsofovid/dual_encoder_architectures.png" alt="askDocs example" style="width:350px;" />
<figcaption>Different architectures of Dual-Encoders (https://aclanthology.org/2022.emnlp-main.640.pdf)</figcaption>
</div>

<h2 id="technology-stack">Technology Stack</h2>

<h3 id="infrastructure">Infrastructure:</h3>

<ul>
  <li>Digital Ocean Droplets utilized for hosting, ensuring efficient and scalable deployment.</li>
  <li>Github Actions employed for Continuous Integration/Continuous Deployment (CI/CD) and workflow automation, streamlining the development pipeline.</li>
  <li>X API for posting replies.</li>
</ul>

<h3 id="machine-learning-stack">Machine Learning Stack:</h3>

<ul>
  <li>Leveraging Hugging Face’s library for Twitter sentiment.</li>
  <li>Incorporating OpenAI GPT-3.5 Turbo for queries</li>
  <li>Langchain integrated to handle specific language-related tasks involving GPT.</li>
</ul>

<h3 id="model-training">Model Training:</h3>
<ul>
  <li>Training/experimentation with natural language models on diverse datasets to enhance performance.</li>
  <li>Fine-tuning GPT-3.5 Turbo on domain-specific tasks, optimizing for accuracy and relevance.</li>
</ul>

<h2 id="relevant-reading">Relevant Reading</h2>
<ul>
  <li>TWEETEVAL is a standardized test bed for seven tweet classification tasks. These are: sentiment analysis, emotion recognition, offensive language detection, hate speech detection, stance prediction, emoji prediction, and irony detection. <a href="https://arxiv.org/pdf/2010.12421.pdf">TWEETEVAL: Unified Benchmark and Comparative Evaluation for Tweet Classification</a></li>
</ul>

<h2 id="future-work">Future Work</h2>
<p>The current focus is on retrieval, but Dual-Encoders could evolve into a hybrid system that combines retrieval and generation. Using a recommendation system that understands context, relevance, and even mood, it could not only rank and refine retrieved quotes but also adapt or paraphrase them using a generative model. This approach ensures the insights provided are tailored, dynamic, and highly relevant.</p>]]></content><author><name>Austin Powell</name></author><category term="NLP" /><category term="LLM" /><summary type="html"><![CDATA[A bot generating memorable quotes and replies based on works of Ovid]]></summary></entry><entry><title type="html">2025 Technical Reading</title><link href="https://aus10powell.github.io/2025/papers/Reading/" rel="alternate" type="text/html" title="2025 Technical Reading" /><published>2025-01-01T00:00:00+00:00</published><updated>2025-01-01T00:00:00+00:00</updated><id>https://aus10powell.github.io/2025/papers/Reading</id><content type="html" xml:base="https://aus10powell.github.io/2025/papers/Reading/"><![CDATA[<p><strong>2025 Reading List</strong></p>

<h2 id="nlp">NLP</h2>

<h3 id="llms">LLMs</h3>

<h4 id="there-are-only-6-rag-evals"><a href="https://jxnl.co/writing/2025/05/19/there-are-only-6-rag-evals/">There Are Only 6 RAG Evals</a></h4>
<blockquote>
  <p>This article argues that RAG (Retrieval-Augmented Generation) evaluation can be simplified to six core metrics based on the relationships between the question (Q), context (C), and answer (A). The author organizes these into three tiers: Tier 1 focuses on retrieval precision and recall; Tier 2 covers primary relationships like context relevance (C|Q), faithfulness (A|C), and answer relevance (A|Q); and Tier 3 includes advanced metrics such as context support (C|A), question answerability (Q|C), and self-containment (Q|A). The framework aims to provide a systematic and exhaustive way to diagnose and improve RAG systems.</p>
</blockquote>

<h3 id="foundations-of-large-language-models"><a href="https://arxiv.org/abs/2501.09223">Foundations of Large Language Models</a></h3>
<blockquote>
  <p>Broad overview of the following topics: model architectures and training, activation functions, learning techniques, data handling and preprocessing, optimization and efficiency, concepts in learning, applications and methods, techniques and strategies.</p>
</blockquote>

<h3 id="scaling-laws-for-precision"><a href="https://arxiv.org/abs/2411.04330">Scaling Laws for Precision</a></h3>
<blockquote>
  <p>Training in lower precision reduces the model’s “effective parameter count,” allowing us to predict the additional loss incurred from training in low precision and post-train quantization. Authors find that lower precision training can be more compute-efficient, but it can also lead to worse performance. They also find that there is a trade-off between the amount of data a model is trained on and the precision at which it is trained. For example, a model that is trained on a lot of data may perform worse if it is quantized to a lower precision after training.
In conclusion, this paper shows that the precision of a language model can have a significant impact on its performance. It is important to consider both the training and inference precision when choosing a precision for a language model.</p>
</blockquote>

<h4 id="the-era-of-1-bit-llms-all-large-language-models-are-in-158-bits"><a href="https://arxiv.org/abs/2402.17764">The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits</a></h4>
<blockquote>
  <p>BitNet b1.58 is a new way to make AI language models use less memory and power by reducing each parameter to just three values: -1, 0, or 1. The model works as well as regular 16-bit models when it reaches 3 billion parameters, but uses about 3.5 times less memory and runs almost 3 times faster. At larger sizes (70 billion parameters), it runs even better - about 4 times faster and uses much less memory than standard models. The model saves a lot of energy too - using 71 times less power for its main calculations compared to regular models. Tests show it performs well on language tasks and can handle long training sessions (2 trillion tokens) with good results.</p>
</blockquote>

<h4 id="smarter-better-faster-longer-a-modern-bidirectional-encoder-for-fastmemory-efficient-and-long-context-finetuning-and-inference"><a href="https://arxiv.org/pdf/2412.13663">Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast,Memory Efficient, and Long Context Finetuning and Inference</a></h4>
<blockquote>
  <p>ModernBERT is an improved version of the BERT model. It’s designed for tasks like retrieval and classification. The model was trained on 2 trillion tokens and can handle sequences up to 8192 tokens long. It uses modern techniques like rotary positional embeddings and Gated Linear Units. This makes it faster and more memory-efficient. ModernBERT achieves top results in various evaluations, including classification tasks and retrieval in different domains, such as code. It’s also optimized to run efficiently on common GPUs.</p>
</blockquote>

<h4 id="star-a-simple-training-free-approach-for-recommendations-using-large-language-models"><a href="https://arxiv.org/pdf/2410.16458">STAR: A Simple Training-free Approach for Recommendations using Large Language Models</a></h4>
<blockquote>
  <p>STAR is a new framework using large language models for recommendations without fine-tuning, featuring a two-stage process: retrieval (using semantic similarity and collaborative information) and ranking. Testing on Amazon Review datasets showed STAR outperformed supervised models on Beauty and Toys categories, and performed nearly as well on Sports items. This suggests LLMs can be effective for recommendation systems without the cost and complexity of fine-tuning.</p>
</blockquote>

<h3 id="agents-llm-based">Agents (LLM-based)</h3>

<h4 id="agentigraph-an-interactive-knowledge-graph-platform-for-llm-based-chatbots-utilizing-private-data"><a href="https://arxiv.org/pdf/2410.11531">AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data</a></h4>
<blockquote>
  <p>LLMs are powerful but can struggle with factual consistency and require complex queries. Traditional KG tools are difficult to use and require technical expertise. AGENTiGraph bridges the gap between LLMs and KGs. It uses a multi-agent system where each agent has a specific role, such as interpreting user intent or translating queries into graph operations. This allows for more natural language interaction with KGs and improved accuracy in tasks like question answering.</p>
</blockquote>

<h4 id="agents-are-not-enough"><a href="https://arxiv.org/pdf/2412.16241v1">Agents are not enough</a></h4>

<p>Current AI agents are limited. They can’t handle complex tasks or adapt to different situations.</p>

<ul>
  <li>
    <p>There are historical challenges with agents, such as limited capabilities and lack of trust.</p>
  </li>
  <li>
    <p>To improve agents, the authors propose three things:</p>
    <ul>
      <li>A secure version for private tasks.</li>
      <li>A user representation to avoid constant user input.</li>
      <li>A program to manage interactions between user and agents.</li>
    </ul>
  </li>
</ul>

<p>The idea is to create an ecosystem with different components working together:</p>

<ul>
  <li>
    <p>Agents: Focus on specific tasks and can work with each other.</p>
  </li>
  <li>
    <p>Sims: Represent users with their preferences and privacy settings.</p>
  </li>
  <li>
    <p>Assistants: Interact with users and manage Sims and Agents to complete tasks.</p>
  </li>
</ul>

<h4 id="agents-chip-hyuen"><a href="https://huyenchip.com//2025/01/07/agents.html">Agents (Chip Hyuen)</a></h4>

<blockquote>
  <p>AI agents perceive and act on their environment, with their capabilities defined by available tools and the environment itself. Tools are essential for agents to perceive (read) and act (write), augmenting knowledge, extending capabilities (like math or code execution), and enabling real-world actions. Planning is crucial for complex tasks, requiring plan generation, validation (by heuristics or AI), and execution, ideally decoupled to prevent wasted resources. Foundation models can be used for planning, especially when provided with information about action outcomes, and function calling enables tool use within model APIs. Effective agents require careful consideration of planning granularity, reflection/error correction, tool selection, and robust evaluation to address potential failures in planning, tool usage, or efficiency.</p>
</blockquote>

<h4 id="a-survey-on-llm-powered-agents-for-recommender-systems"><a href="https://arxiv.org/abs/2502.10050">A Survey on LLM-powered Agents for Recommender Systems</a></h4>
<blockquote>
  <p>This survey reviews LLM-powered agents in recommender systems, categorizing them into three paradigms: recommender-oriented (enhancing core mechanisms), interaction-oriented (improving user dialogue), and simulation-oriented (modeling complex interactions). It analyzes agent architecture (profile, memory, planning, action) and discusses datasets, evaluation, challenges, and future research directions.</p>
</blockquote>

<h4 id="llm-powered-agents-for-recommender-systems-a-comprehensive-survey"><a href="https://arxiv.org/pdf/2308.14296">LLM-powered Agents for Recommender Systems: A Comprehensive Survey</a></h4>
<blockquote>
  <p>This comprehensive survey examines the integration of LLM-powered agents in recommender systems, focusing on their architecture, capabilities, and applications. The paper discusses how these agents can enhance recommendation quality through better understanding of user preferences, improved interaction patterns, and more sophisticated reasoning capabilities. It also addresses key challenges in deployment, including computational efficiency, privacy concerns, and evaluation methodologies.</p>
</blockquote>

<h4 id="mrkl-systems-a-modular-neuro-symbolic-architecture-that-combines-large-language-models-external-knowledge-sources-and-discrete-reasoning"><a href="https://arxiv.org/abs/2205.00445">MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning</a></h4>
<blockquote>
  <p>Proposes MRKL (Modular Reasoning, Knowledge and Language), a neuro-symbolic architecture to overcome limitations of large language models (LMs). It integrates multiple neural models with discrete knowledge and reasoning modules, aiming to handle complex tasks involving knowledge, reasoning, and language processing more effectively than standalone LMs. The paper outlines the architecture, implementation challenges, and AI21 Labs’ implementation, Jurassic-X.</p>
</blockquote>

<h4 id="design-patterns-for-securing-llm-agents-against-prompt-injections"><a href="https://arxiv.org/abs/2506.08837">Design Patterns for Securing LLM Agents against Prompt Injections</a></h4>
<blockquote>
  <p>As AI agents powered by Large Language Models (LLMs) become increasingly versatile and capable of addressing a broad spectrum of tasks, ensuring their security has become a critical challenge. This paper proposes a set of principled design patterns for building AI agents with provable resistance to prompt injection attacks, which exploit the agent’s reliance on natural language inputs. The authors systematically analyze these patterns, discuss their trade-offs in terms of utility and security, and illustrate their real-world applicability through case studies. This is particularly important when agents are granted tool access or handle sensitive information.</p>
</blockquote>

<h2 id="machine-learning">Machine Learning</h2>

<h4 id="improving-pinterest-search-relevance-using-large-language-models"><a href="https://medium.com/pinterest-engineering/improving-pinterest-search-relevance-using-large-language-models-4cd938d4e892">Improving Pinterest Search Relevance Using Large Language Models</a></h4>
<blockquote>
  <p>Pinterest improved their search relevance using a two-stage approach: (1) A cross-encoder LLM teacher model trained on human-annotated data to predict 5-scale relevance scores, and (2) A lightweight student model trained via knowledge distillation for production serving. The system leverages rich Pin text features including titles, descriptions, synthetic image captions, and user engagement data. The LLM-based approach improved search feed relevance by 2.18% (nDCG@20) and increased fulfillment rates by over 1.5% globally. Key innovations include using multilingual LLMs to generalize across languages, enriched text representations, and large-scale semi-supervised learning through distillation.</p>
</blockquote>

<h4 id="reinforcement-learning-an-overview"><a href="https://arxiv.org/pdf/2412.05265">Reinforcement Learning: An Overview</a></h4>
<blockquote>
  <p>This is a pretty hefty paper and can take several hours to read through once, but it is a good refresher and/or overview. It provides a comprehensive overview of reinforcement learning (RL) and sequential decision making as of the end of 2024 (December), based on a textbook by Kevin Murphy. The paper covers value-based RL, policy-gradient methods, and model-based approaches. It also briefly discusses the integration of RL with large language models (LLMs). This work updates and expands upon chapters 34 and 35 of Murphy’s earlier textbook.</p>
</blockquote>

<h4 id="healthbench-a-benchmark-for-evaluating-language-models-in-healthcare"><a href="https://openai.com/index/healthbench/">HealthBench: A Benchmark for Evaluating Language Models in Healthcare</a></h4>
<blockquote>
  <p>HealthBench is a new benchmark for evaluating language models in healthcare applications. It focuses on assessing models’ capabilities in medical knowledge, clinical reasoning, and patient communication. The benchmark includes tasks like medical question answering, clinical case analysis, and patient education. This is particularly important as healthcare applications require high accuracy and reliability, and the benchmark helps identify areas where models need improvement to be safely deployed in healthcare settings.</p>
</blockquote>]]></content><author><name>Austin Powell</name></author><category term="2025" /><category term="Papers" /><summary type="html"><![CDATA[Technical articles, research papers, etc. read in 2025]]></summary></entry><entry><title type="html">2024 Technical Reading</title><link href="https://aus10powell.github.io/2024/papers/Reading/" rel="alternate" type="text/html" title="2024 Technical Reading" /><published>2024-01-01T00:00:00+00:00</published><updated>2024-01-01T00:00:00+00:00</updated><id>https://aus10powell.github.io/2024/papers/Reading</id><content type="html" xml:base="https://aus10powell.github.io/2024/papers/Reading/"><![CDATA[<p><strong>2024 Reading List</strong></p>

<h2 id="nlp">NLP</h2>

<h3 id="large-language-models">Large-Language Models</h3>

<h4 id="memory-and-mips-max-inner-product-search"><a href="https://chat.openai.com/share/46ff149e-a4c7-4dd7-a800-fc4a642ea389">Memory and MIPS (Max Inner Product Search)</a></h4>
<blockquote>
  <p>Generally speaking, this deals with the concept of having a large corpus of documents represented in a high-dimensional space. Given a new document, you want to find the document in your existing collection that is most similar to it: the maximum inner product search. The chat with GPT discusses this idea in relation to memory.</p>
</blockquote>

<h4 id="transforming-wearable-data-into-health-insights-using-large-language-model-agents"><a href="https://arxiv.org/pdf/2406.06464">Transforming wearable data into health insights using large language model agents</a></h4>
<blockquote>
  <p>An AI system harnessing cutting-edge technology to revolutionize how wearable health data translates into actionable advice for individuals, tackling over 84% of numerical questions
and a vast majority of open-ended inquiries.</p>
</blockquote>

<h4 id="mealrec-a-meal-recommendation-dataset-with-meal-course-affiliation-for-personalization-and-healthiness"><a href="https://arxiv.org/pdf/2404.05386">MealRec+: A Meal Recommendation Dataset with Meal-Course Affiliation for Personalization and Healthiness</a></h4>
<blockquote>
  <p>The task of meal recommendation involves intricate connections among users, courses, and meals, particularly through meal-course affiliation, yet existing datasets lack this crucial affiliation. To address this gap, the MealRec+ dataset is introduced, leveraging simulation methods to derive meal-course affiliation and user-meal interactions, and demonstrating that cooperative learning of these interactions improves the effectiveness of meal recommendations, with efforts also made to enhance the healthiness of recommendations.</p>
</blockquote>

<h4 id="llmner-zerofew-shot-named-entity-recognition-exploiting-the-power-of-large-language-models"><a href="https://arxiv.org/abs/2406.04528">llmNER: (Zero|Few)-Shot Named Entity Recognition, Exploiting the Power of Large Language Models</a></h4>
<blockquote>
  <p>llmNER is a Python library designed for zero-shot and few-shot named entity recognition (NER) tasks using Large Language Models (LLMs). It simplifies prompt composition, model querying, and result parsing, facilitating efficient prompt engineering for NER applications. The library demonstrates versatility through validation on two NER tasks, aiming to streamline in-context learning research by enhancing prompt and parsing procedures.</p>
</blockquote>

<h4 id="machine-unlearning-in-2024"><a href="https://ai.stanford.edu/~kzliu/blog/unlearning">Machine Unlearning in 2024</a></h4>
<blockquote>
  <p>A focus on removing specific information from trained machine learning (ML) models without retraining them from scratch, the goal being to edit away undesirable data such as private information, outdated knowledge, copyrighted material, harmful content, and misinformation.</p>
</blockquote>

<h4 id="adaptive-retrieval-augmented-generation-for-conversational-systems"><a href="https://arxiv.org/pdf/2407.21712">Adaptive Retrieval-Augmented Generation for Conversational Systems</a></h4>
<blockquote>
  <p>This research addresses the question of whether Retrieval Augmented Generation (RAG) is always necessary in conversational AI systems. The study introduces RAGate, a gating model that predicts when external knowledge augmentation is needed for improved responses, based on conversation context and human judgments.</p>
</blockquote>

<h4 id="medical-llms-a-practical-guide"><a href="https://github.com/AI-in-Health/MedLLMsPracticalGuide">Medical LLMS: A practical guide</a></h4>
<blockquote>
  <p>A GitHub site curating papers and resources around healthcare and LLMs.</p>
</blockquote>

<h4 id="from-beginner-to-expert-modeling-medical-knowledge-into-general-llms"><a href="https://arxiv.org/abs/2312.01040">From Beginner to Expert: Modeling Medical Knowledge into General LLMs</a></h4>
<blockquote>
  <p>Demonstrates how a smaller-scale language model (10B parameters) can be effectively adapted for medical applications through a 3-stage optimization process: medical knowledge injection, domain instruction tuning, and specific task adaptation. The team developed large-scale medical datasets for each stage and introduced a novel Verification-of-Choice approach for multi-choice questions, significantly enhancing the model’s reasoning abilities. Remarkably, the resulting AntGLM-Med-10B model outperforms many larger language models, including those specifically trained on medical knowledge, showing that strategic fine-tuning and task-specific innovations can compensate for smaller model size in specialized domains.</p>
</blockquote>
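The Verification-of-Choice idea can be sketched as asking the model to affirm or reject each option independently, rather than picking one answer in a single pass. The stub model and prompt wording below are illustrative assumptions, not the paper's exact procedure:

```python
def stub_llm(prompt):
    """Stand-in for a real LLM call; answers a toy pharmacology question."""
    return "yes" if "aspirin" in prompt.lower() else "no"

def verification_of_choice(question, options, llm):
    """Ask the model to verify each option on its own and keep the affirmed ones."""
    verified = []
    for opt in options:
        verdict = llm(f"Question: {question}\n"
                      f"Is '{opt}' a correct answer? Answer yes or no.")
        if verdict.strip().lower().startswith("yes"):
            verified.append(opt)
    return verified

options = ["aspirin", "vitamin C", "antibiotics"]
print(verification_of_choice("Which drug inhibits platelet aggregation?",
                             options, stub_llm))
```

Checking each option separately forces the model to reason about every choice instead of latching onto the first plausible one.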

<h4 id="lora-low-rank-adaptation-of-large-language-models"><a href="https://arxiv.org/pdf/2106.09685">LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS</a></h4>
<blockquote>
  <p>LoRA (Low-Rank Adaptation) freezes the pretrained LLM weights and trains small low-rank update matrices instead, greatly reducing the amount of training required while preserving most of the original performance. According to the paper, the savings in training cost can exceed a factor of 4.</p>
</blockquote>
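The parameter savings are easy to see: instead of updating a d x d weight matrix, LoRA trains two factors B (d x r) and A (r x d) and uses W' = W + BA, where the rank r is tiny. A back-of-the-envelope count at a typical hidden size:

```python
# Full fine-tuning updates a d x d weight matrix; LoRA trains only
# two low-rank factors B (d x r) and A (r x d), with W' = W + B @ A.
d, r = 4096, 8

full_params = d * d          # trainable params under full fine-tuning
lora_params = d * r + r * d  # trainable params under LoRA

print(f"full fine-tune:   {full_params:,} trainable params")
print(f"LoRA (rank {r}):    {lora_params:,} trainable params")
print(f"reduction factor: {full_params / lora_params:.0f}x")
```

Because BA can be merged into W after training, LoRA adds no extra inference latency.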

<h2 id="computer-vision">Computer Vision</h2>

<h4 id="yolov10-real-time-end-to-end-object-detection"><a href="https://arxiv.org/abs/2405.14458">YoloV10: Real-Time End-to-End Object Detection</a></h4>
<blockquote>
  <p>Among other efficiency improvements, the highlight may be removing the inference-speed bottleneck caused by non-maximum suppression (NMS).</p>
</blockquote>
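For reference, the post-processing step YOLOv10 aims to eliminate looks like this classic greedy procedure, which must run serially over every detection at inference time:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlaps, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the two overlapping boxes collapse to one
```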

<h4 id="faiss-a-library-for-efficient-similarity-search"><a href="https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/">Faiss: A library for efficient similarity search</a></h4>
<blockquote>
  <p>Facebook AI Similarity Search (Faiss) is a library for quickly searching for multimedia documents that are similar to each other.</p>
</blockquote>
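Conceptually, Faiss accelerates the exact scan below, replacing it with quantized and clustered indexes so search scales to billions of vectors. A brute-force baseline in plain Python for contrast:

```python
import math

def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn(query, vectors, k=2):
    """Exact nearest-neighbor search: an O(n) scan over the whole database.
    Faiss swaps this scan for optimized, approximate index structures."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))
    return ranked[:k]

db = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.1, 0.0)]
print(knn((0.0, 0.1), db))  # indices of the two closest vectors
```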

<h4 id="discount-counting-in-large-image-collections-with-detector-based-importance-sampling"><a href="https://arxiv.org/abs/2306.03151">DISCount: Counting in Large Image Collections with Detector-Based Importance Sampling</a></h4>
<blockquote>
  <p>A more efficient and accurate method for counting objects in large image collections by combining automated detection with strategic human involvement.</p>
</blockquote>
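The core trick is importance sampling: use the detector's counts to decide which regions a human should label, then reweight the human counts into an estimate of the total. A simplified Horvitz-Thompson-style sketch (not the exact DISCount estimator):

```python
def importance_estimate(detector_counts, human_counts, sample_ids):
    """Estimate the total count by sampling regions proportionally to the
    detector's counts and correcting each sample with the human label."""
    total_det = sum(detector_counts)
    terms = []
    for i in sample_ids:
        p = detector_counts[i] / total_det  # sampling probability of region i
        terms.append(human_counts[i] / p)
    return sum(terms) / len(terms)

detector = [10, 40, 30, 20]   # automated counts per region (the proposal)
humans   = [12, 38, 30, 20]   # what a person counts in the sampled regions
print(importance_estimate(detector, humans, sample_ids=[1, 3]))
```

Note the useful property: when the detector is exactly right, the estimate equals the detector total regardless of which regions are sampled, so human effort only goes toward correcting the detector's errors.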

<h2 id="recommendation-systems">Recommendation Systems</h2>

<h4 id="page-level-optimization-of-e-commerce-item-recommendations"><a href="https://arxiv.org/abs/2108.05891">Page-level Optimization of e-Commerce Item Recommendations</a></h4>

<h4 id="recommending-what-video-to-watch-next-a-multitask-ranking-system"><a href="https://dl.acm.org/doi/10.1145/3298689.3346997">Recommending what video to watch next: a multitask ranking system</a></h4>]]></content><author><name>Austin Powell</name></author><category term="2024" /><category term="Papers" /><summary type="html"><![CDATA[Technical articles, research papers, etc. read in 2024]]></summary></entry><entry><title type="html">Twitter Account Sentiment and Text Generation</title><link href="https://aus10powell.github.io/nlp/api/TwitterAccounts/" rel="alternate" type="text/html" title="Twitter Account Sentiment and Text Generation" /><published>2023-02-16T00:00:00+00:00</published><updated>2023-02-16T00:00:00+00:00</updated><id>https://aus10powell.github.io/nlp/api/TwitterAccounts</id><content type="html" xml:base="https://aus10powell.github.io/nlp/api/TwitterAccounts/"><![CDATA[<p><strong>A text generation and sentiment analysis of 6 Twitter Accounts</strong></p>

<h1 id="twitterapi">TwitterAPI</h1>

<p>The TwitterAPI project aims to perform text generation and sentiment analysis on six different Twitter accounts. The project utilizes NLP techniques, Transformers, and the HuggingFace library to generate tweets in the style of the selected accounts and analyze the sentiment of their tweets. Additionally, the project provides a metric for the health of the account audience and identifies anomalous tweets within the last few months. The project is deployed using FastAPI, Docker, and Gunicorn, with models hosted on HuggingFace Spaces. The code and models are stored in a GitHub repository and synced using GitHub Actions.</p>

<div align="center">
 <img src="/assets/images/twitter_accounts/TwitterAPIScreenshot1.png" alt="askDocs example" style="width:450px;" />
<figcaption>Screenshot of App</figcaption>
</div>

<h2 id="goal">Goal:</h2>

<ul>
  <li>Generate a tweet in the style of one of the following Twitter accounts (“alikarimi_ak8”, “elonmusk”, “BarackObama”, “taylorlorenz”, “cathiedwood”, “ylecun”). NOTE: alikarimi_ak8 often tweets in Persian; generated tweets, however, are in English.</li>
  <li>Provide sentiment analysis for the 6 accounts. NOTE: Based on saved historical data rather than live data, due to the uncertainty around scraping tweets after the Elon Musk acquisition.</li>
  <li>“Account Health” Metric:</li>
</ul>
<div align="center">
 <img src="/assets/images/twitter_accounts/obamahealthmetric.png" alt="askDocs example" style="width:350px;" />
<figcaption>Distribution of Obama Twitter Account metric</figcaption>
</div>

<ul>
  <li>Provide a metric for the account audience health</li>
  <li>Anomalous tweets within the last few months:</li>
</ul>
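A minimal sketch of such a metric, assuming (as the project notes suggest) positive-sentiment engagement normalized by audience size so accounts of different scales are comparable; the function name and the numbers are hypothetical:

```python
import math

def account_health(positive_tweet_likes, subscriber_count):
    """Illustrative 'account health' score: total likes on positive-sentiment
    tweets, divided by audience size. The log keeps the heavy-tailed like
    counts on a usable scale (and maps zero engagement to zero)."""
    raw = sum(positive_tweet_likes) / subscriber_count
    return math.log1p(raw)

# Hypothetical numbers for two accounts of very different sizes.
print(account_health([120_000, 80_000, 50_000], subscriber_count=130_000_000))
print(account_health([300, 150], subscriber_count=40_000))
```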

<h2 id="twitter-accounts">Twitter Accounts</h2>

<p><em>Twitter App (warning: the app may be blank and take a minute to load while the free servers spin up; if the scraping library is being blocked, there may be an internal error)</em></p>

<div style="text-align: center;">
    <iframe src="https://aus10powell-twitteraccounts.hf.space" frameborder="0" width="1000" height="1550">
    </iframe>
</div>

<h3 id="deployment">Deployment</h3>

<p><strong>Tech Stack</strong></p>

<ul>
  <li><strong>Web Stack:</strong>
    <ul>
      <li>FastAPI</li>
      <li>Docker (to simplify building the base image for a potentially larger deployment)</li>
      <li>Gunicorn spawns the FastAPI app across four child worker processes using the asynchronous Uvicorn worker class, each running in its own process. Gunicorn itself listens on a configurable port and delegates incoming requests to the workers. All four FastAPI instances share the same database created in Azure Database for PostgreSQL; the connection is established and closed in the FastAPI startup and shutdown events, respectively. The App Service deployment configuration automatically pulls and deploys any changes made to the GitHub repository it is configured with.</li>
      <li>Huggingface Spaces (Free, larger-than-standard free compute and memory. Enough to save models on space without using API)</li>
      <li><em>Docker</em> container deployment on <em>HuggingFace Spaces</em></li>
    </ul>
  </li>
  <li><strong>ML/DevOps:</strong>
    <ul>
      <li>Github Actions (to sync with Huggingface Spaces)</li>
    </ul>
  </li>
  <li><strong>Models:</strong>
    <ul>
      <li>Persian to English (for tweets from alikarimi_ak8): <a href="https://huggingface.co/persiannlp/mt5-base-parsinlu-opus-translation_fa_en">mt5-base-parsinlu-opus-translation_fa_en</a></li>
      <li>Sentiment Model Used: <a href="https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment">twitter-roberta-base-sentiment</a></li>
      <li>Generative Model Used: <a href="https://huggingface.co/transformers/v4.4.2/model_doc/gpt2.html">OpenAI GPT2</a>.</li>
    </ul>
  </li>
</ul>
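The Gunicorn setup described above boils down to a single command. A sketch only: the module path <code>main:app</code> is an assumption about the project layout, and 7860 is the default port Hugging Face Spaces expects:

```shell
# 4 async Uvicorn workers behind Gunicorn; module path and port are assumptions.
gunicorn main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:7860
```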

<!-- # Notes:
* 04/26/2023:
  * Trained 2 accounts models
  * Buit response in index.html to generate one response from those 2 accounts
  * Investigated (potentially) hosting the site on huggingface spaces
  * Investigated (potentiallu) hosting models on huggingface to pull into site
  * NOTE: Account alikarimi_ak8 was particularily tricky as most/all tweets are written in Persian for which there is few translation apis freely available. I wound up using a huggingface library and dealing with some interesting emoji issues.
* 04/27/23:
  * TODO:
    * Train the rest of account models
      * Train english models
      * Train Persian to text model
      * resolve model names based of handles
    * Create sentiment dataset for all accounts using huggingface scraper from historical data until now and display in Altair.
    * Summarization:
      * Reformate old code and eliminate unnecessary code
      * Decide on next steps as far as necessary
  * COMPLETED:
    * Created a model for each account
    * Enabled translation for the Persion account
* 05/02/23
  * COMPLETED:
    * Successfully uploaded hugginface model to hugginface model hub in order to enable api from webpage to the hosted site at huggingface avoiding storage of models on webpage.
* 05/06/23
  * COMPLETED:
    * Successfully tested a hosted app as a RESTful API endpoint for all the trained GPT2 models. Main positive take-away from this is that the file requirements
    * Learned that HuggingFace Spaces doesn't allow function call to pass through URLs with their free space...or at least it is difficult to. Using a POST request is better.
  * TODO:
    * Investigate as to whether there is a app structure that can be flexible in deployment to either Azure or Huggingface spaces

* 05/08/23
  * COMPLETED:
    * Test whether the app can be hit with a RESTful call from another hosted site...on hugginface I suppose. It seems the spaces on Huggingface on distributed and need to run on Docker if departing from the strict format they have for static.
* 05/09/23
  * COMPLETED:
    * NOTE: It became too much to try and manage Docker as well as dealing with app when working with Azure Web Services, however the switch to Huggingface for the relative great utilization of their space makes Docker make more sense. For the generall use-case of trying to update and showcase personal projects that are not going to be leaking money, Docker does seem to be the way to go. You don't need to build a ton of images locally either. Just perhaps when you're trouble-shooting a Docker specific issue.
* 05/10/23
  * TODO:
    * Technical:
      * Having Huggingface spaces pull from the github page...or at least be automated.
    * Code within page:
      * Resolve visual issues with displaying notebook html within the page
      * Decide what should actually be displayed in the notebook
        * Sentiment analysis on the four accounts
    * Extra features:
      * Sentiment: Display a sentiment score on the generated response
      * Display a summary of the generated response. NOTE: this would fit in well with the tweet analysis over time.
      *
* 05/12/23
  * COMPLETED:
    * Somewhat justified display on Markdown page
    * Adjusted color-schemes and display of sentiments
    * Started again on summarization of tweets:
      * Reduce a long list of tweets down enough to run a deep learning summarizer on it

* 05/14/23:
  * COMPLETED:
    * Potentially tracked down one issue with the hugging face regarding why the response was not being generated.

* 05/15/23:
  * Completed index.html for returning tweets
  * added tweepy integration for returning specific tweets along with javascript callbacks.
  * TODO:
    * Reformate generate a reply...it doesn't make sense to be at the top

* 05/20/23
  * Completed refactoring of some old summarization and examples.ipynb code
  * Verified that additional information and metrics can be pulled for the tweets

* 06/01/23
  * COMPLETED:
    * Implemented sentiment score on table returned
    * Implemented Persian translation column to return if account was Persian text

* 06/02/23
  * COMPLETED:
    * Initial distributions for evaluating a good "account health" metric.
    *
* 06/03/23
  * COMPLETED:
    * moved spacy embeddings load to app start
    * Fixed the issue of reply not being generated

* 06/05/23
  * COMPLETED:
    * Dive deeper into thinking/justification for metrics
      * Conveneniece (don't have to look at audience tweets)
      * Normalized for different accounts to enable direct comparison
  * METRICS:
    * Account Health Metric:
      * "positives" * "likes" / account_subs:
        * Normalized to compare different "health" for different accunts
        * Reasonable metric for how engaged audience is with tweets that are positive messaging. May speak more to th
        * The logged distribution is surprisingly normal for the different accounts
    * Audience Health Metric:
      -->]]></content><author><name>Austin Powell</name></author><category term="NLP" /><category term="API" /><summary type="html"><![CDATA[A text generation and sentiment analysis of 6 Twitter Accounts]]></summary></entry><entry><title type="html">National Parks</title><link href="https://aus10powell.github.io/hobbies/National-Parks/" rel="alternate" type="text/html" title="National Parks" /><published>2023-01-01T00:00:00+00:00</published><updated>2023-01-01T00:00:00+00:00</updated><id>https://aus10powell.github.io/hobbies/National-Parks</id><content type="html" xml:base="https://aus10powell.github.io/hobbies/National-Parks/"><![CDATA[<h1 id="national-and-state-parks">National and State Parks</h1>

<h2 id="menu">Menu</h2>
<ul>
  <li><a href="#californiastateparks">California State Parks</a></li>
  <li><a href="#nationalparks">National Parks</a></li>
</ul>

<h2 id="california-state-parks">California State Parks</h2>
<p><a href="https://drive.google.com/file/d/14wrth4pBZKE63y84t5T74jB19BzBGOkW/view?usp=sharing">280 California State Parks</a></p>

<h2 id="national-parks">National Parks</h2>

<table>
  <thead>
    <tr>
      <th>NationalParkName</th>
      <th>State</th>
      <th>Year Est.</th>
      <th>Order Est.</th>
      <th>Visitors (2019)</th>
      <th>Location</th>
      <th>Latitude</th>
      <th>Longitude</th>
      <th>Area</th>
      <th>Description</th>
      <th>Visited</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Acadia</td>
      <td>Maine</td>
      <td>1919</td>
      <td>13th</td>
      <td>3,437,286</td>
      <td>44.35°N 68.21°W</td>
      <td>44.3876119</td>
      <td>-68.2039123</td>
      <td>47,389.67 acres (191.8 km2)</td>
      <td>Covering most of Mount Desert Island and other coastal islands, Acadia features the tallest mountain on the Atlantic coast, granite peaks, ocean shoreline, woodlands, and lakes. There are freshwater, estuary, forest, and intertidal habitats.[6]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>American Samoa</td>
      <td>American Samoa</td>
      <td>1988</td>
      <td>50th</td>
      <td>60,006</td>
      <td>14.25°S 170.68°W</td>
      <td>-14.2331268</td>
      <td>-169.4760133</td>
      <td>9,000.00 acres (36.4 km2)</td>
      <td>The southernmost national park is on three Samoan islands and protects coral reefs, rainforests, volcanic mountains, and white beaches. The area is also home to flying foxes, brown boobies, sea turtles, and 900 species of fish.[7]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Arches</td>
      <td>Utah</td>
      <td>1971</td>
      <td>35th</td>
      <td>1,659,702</td>
      <td>38.68°N 109.57°W</td>
      <td>38.5719944</td>
      <td>-109.4735066</td>
      <td>76,518.98 acres (309.7 km2)</td>
      <td>This site features more than 2,000 natural sandstone arches, including the Delicate Arch. In a desert climate millions of years of erosion have led to these structures, and the arid ground has life-sustaining soil crust and potholes, natural water-collecting basins. Other geologic formations are stone columns, spires, fins, and towers.[8]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Badlands</td>
      <td>South Dakota</td>
      <td>1978</td>
      <td>38th</td>
      <td>970,998</td>
      <td>43.75°N 102.50°W</td>
      <td>43.7480316</td>
      <td>-102.4983748</td>
      <td>242,755.94 acres (982.4 km2)</td>
      <td>The Badlands are a collection of buttes, pinnacles, spires, and grass prairies. It has the world’s richest fossil beds from the Oligocene epoch, and there is wildlife including bison, bighorn sheep, black-footed ferrets, and swift foxes.[9]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Big Bend</td>
      <td>Texas</td>
      <td>1978</td>
      <td>27th</td>
      <td>463,832</td>
      <td>29.25°N 103.25°W</td>
      <td>29.2669239</td>
      <td>-103.2201828</td>
      <td>801,163.21 acres (3,242.2 km2)</td>
      <td>Named for the Bend of the Rio Grande along the US–Mexico border, this park includes a part of the Chihuahuan Desert. A wide variety of Cretaceous and Tertiary fossils as well as cultural artifacts of Native Americans exist within its borders.[10]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Biscayne</td>
      <td>Florida</td>
      <td>1980</td>
      <td>41st</td>
      <td>708,522</td>
      <td>25.65°N 80.08°W</td>
      <td>25.65</td>
      <td>-80.08</td>
      <td>172,924.07 acres (699.8 km2)</td>
      <td>Located in Biscayne Bay, this park at the north end of the Florida Keys has four interrelated marine ecosystems: mangrove forest, the Bay, the Keys, and coral reefs. Threatened animals include the West Indian Manatee, American crocodile, various sea turtles, and peregrine falcon.[11]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Black Canyon of the Gunnison</td>
      <td>Colorado</td>
      <td>1999</td>
      <td>55th</td>
      <td>432,818</td>
      <td>38.57°N 107.72°W</td>
      <td>38.4948194</td>
      <td>-107.71625</td>
      <td>32,950.03 acres (133.3 km2)</td>
      <td>The park protects a quarter of the Gunnison River, which has dark canyon walls from the Precambrian era. The canyon has very steep descents, and it is a site for river rafting and rock climbing. The narrow, steep canyon, made of gneiss and schist, is often in shadow, appearing black.[12]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Bryce Canyon</td>
      <td>Utah</td>
      <td>1928</td>
      <td>17th</td>
      <td>432,818</td>
      <td>37.57°N 112.18°W</td>
      <td>37.6215335</td>
      <td>-112.1549442</td>
      <td>35,835.08 acres (145.0 km2)</td>
      <td>Bryce Canyon is a giant natural amphitheatre along the Paunsaugunt Plateau. The unique area has hundreds of tall hoodoos formed by erosion. The region was originally settled by Native Americans and later by Mormon pioneers.[13]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Canyonlands</td>
      <td>Utah</td>
      <td>1964</td>
      <td>31st</td>
      <td>733,996</td>
      <td>38.2°N 109.93°W</td>
      <td>38.3337024</td>
      <td>-110.8076084</td>
      <td>337,597.83 acres (1,366.2 km2)</td>
      <td>This landscape was eroded into canyons, buttes, and mesas by the Colorado River, Green River, and their tributaries, which divide the park into three districts. There are rock pinnacles and other naturally sculpted rock, as well as artifacts from Ancient Pueblo Peoples.[14]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Capitol Reef</td>
      <td>Utah</td>
      <td>1971</td>
      <td>36th</td>
      <td>1,226,519</td>
      <td>38.20°N 111.17°W</td>
      <td>37.9430873</td>
      <td>-110.9506636</td>
      <td>241,904.26 acres (979.0 km2)</td>
      <td>The park’s Waterpocket Fold is a 100-mile (160 km) monocline that shows the Earth’s geologic layers. Other natural features are monoliths and sandstone domes and cliffs shaped like the United States Capitol.[15]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Carlsbad Caverns</td>
      <td>New Mexico</td>
      <td>1930</td>
      <td>19th</td>
      <td>440,691</td>
      <td>32.17°N 104.44°W</td>
      <td>32.3391752</td>
      <td>-104.3791912</td>
      <td>46,766.45 acres (189.3 km2)</td>
      <td>Carlsbad Caverns has 117 caves, the longest of which is over 120 miles (190 km) long. The Big Room is almost 4,000 feet (1,200 m) long, and the caves are home to over 400,000 Mexican Free-tailed Bats and sixteen other species. Above ground are the Chihuahuan Desert and Rattlesnake Springs.[16]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Channel Islands</td>
      <td>California</td>
      <td>1980</td>
      <td>40th</td>
      <td>409,630</td>
      <td>34.01°N 119.42°W</td>
      <td>33.9948581</td>
      <td>-119.7194377</td>
      <td>249,561.00 acres (1,009.9 km2)</td>
      <td>Five of the eight Channel Islands are protected, and half of the park’s area is underwater. The islands have a unique Mediterranean ecosystem. They are home to over 2,000 species of land plants and animals, and 145 are unique to them. The islands were originally settled by the Chumash people.[17]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Congaree</td>
      <td>South Carolina</td>
      <td>2003</td>
      <td>57th</td>
      <td>159,445</td>
      <td>33.78°N 80.78°W</td>
      <td>33.8272919</td>
      <td>-80.7436141</td>
      <td>26,545.86 acres (107.4 km2)</td>
      <td>On the Congaree River, this park is the largest portion of old-growth floodplain forest left in North America. Some of the trees are the tallest in the Eastern US, and the Boardwalk Loop is an elevated walkway through the swamp.[18]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Crater Lake</td>
      <td>Oregon</td>
      <td>1902</td>
      <td>5th</td>
      <td>704,512</td>
      <td>42.94°N 122.1°W</td>
      <td>42.9445872</td>
      <td>-122.1090039</td>
      <td>183,224.05 acres (741.5 km2)</td>
      <td>Crater Lake lies in the caldera of Mount Mazama formed 7,700 years ago after an eruption. It is the deepest lake in the United States and is known for its blue color and water clarity. There are two islands in the lake, and, with no inlets or outlets, all water comes through precipitation.[19]</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Cuyahoga Valley</td>
      <td>Ohio</td>
      <td>2000</td>
      <td>56th</td>
      <td>2,237,997</td>
      <td>41.24°N 81.55°W</td>
      <td>41.2361578</td>
      <td>-81.55122</td>
      <td>32,860.73 acres (133.0 km2)</td>
      <td>This park along the Cuyahoga River has waterfalls, hills, trails, and displays about early rural living. The Ohio and Erie Canal Towpath Trail follows the Ohio and Erie Canal, where mules towed canal boats. The park has numerous historic homes, bridges, and structures.[20] The park also offers a scenic train ride with various trips available. [21]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Death Valley</td>
      <td>California</td>
      <td>1994</td>
      <td>53rd</td>
      <td>1,740,945</td>
      <td>36.24°N 116.82°W</td>
      <td>36.2460841</td>
      <td>-116.8185076</td>
      <td>3,372,401.96 acres (13,647.6 km2)</td>
      <td>Death Valley is the hottest, lowest, and driest place in the United States. Daytime temperatures have topped 130°F (54°C) and it is home to Badwater Basin, the lowest point in North America. There are canyons, colorful badlands, sand dunes, mountains, and over 1000 species of plants in this graben on a fault line. Further geologic points of interest are salt flats, springs, and buttes.[22]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Denali</td>
      <td>Alaska</td>
      <td>1917</td>
      <td>12th</td>
      <td>601,152</td>
      <td>63.33°N 150.50°W</td>
      <td>63.7477803</td>
      <td>-150.3064953</td>
      <td>4,740,911.72 acres (19,185.8 km2)</td>
      <td>Centered around Mount McKinley, the tallest mountain in North America, Denali is serviced by a single road leading to Wonder Lake. McKinley and other peaks of the Alaska Range are covered with long glaciers and boreal forest. Wildlife includes grizzly bears, Dall sheep, caribou, and gray wolves.[23]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Dry Tortugas</td>
      <td>Florida</td>
      <td>1992</td>
      <td>51st</td>
      <td>79,200</td>
      <td>24.63°N 82.87°W</td>
      <td>24.63</td>
      <td>-82.87</td>
      <td>64,701.22 acres (261.8 km2)</td>
      <td>The Dry Tortugas on the west end of the Florida Keys are the site of Fort Jefferson, the largest masonry structure in the Western Hemisphere. With most of the park being water, it is the home of coral reefs and shipwrecks and is only accessible by plane or boat.[24]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Everglades</td>
      <td>Florida</td>
      <td>1934</td>
      <td>20th</td>
      <td>1,118,300</td>
      <td>25.32°N 80.93°W</td>
      <td>25.2866149</td>
      <td>-80.8986502</td>
      <td>1,508,537.90 acres (6,104.8 km2)</td>
      <td>The Everglades are the largest subtropical wilderness in the United States. This mangrove ecosystem and marine estuary is home to 36 protected species, including the Florida panther, American crocodile, and West Indian manatee. Some areas have been drained and developed; restoration projects aim to restore the ecology.[25]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Gates of the Arctic</td>
      <td>Alaska</td>
      <td>1980</td>
      <td>42nd</td>
      <td>10,518</td>
      <td>67.78°N 153.30°W</td>
      <td>65.8443667</td>
      <td>-153.4302993</td>
      <td>7,523,897.74 acres (30,448.1 km2)</td>
      <td>This northernmost park protects part of the Brooks Range and has no park facilities. The land is home to Alaska natives, who have relied on the land and caribou for 11,000 years.[26]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Gateway Arch</td>
      <td>Missouri</td>
      <td>2018</td>
      <td>60th</td>
      <td>2,055,309</td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td>No</td>
    </tr>
    <tr>
      <td>Glacier</td>
      <td>Montana</td>
      <td>1910</td>
      <td>8th</td>
      <td>3,049,839</td>
      <td>48.80°N 114.00°W</td>
      <td>48.6162807</td>
      <td>-113.8594034</td>
      <td>1,013,572.41 acres (4,101.8 km2)</td>
      <td>Part of Waterton Glacier International Peace Park, this park has 26 remaining glaciers and 130 named lakes under the tall Rocky Mountain peaks. There are historic hotels and a landmark road in this region of rapidly receding glaciers. These mountains, formed by an overthrust, have the world’s best sedimentary fossils from the Proterozoic era.[27]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Glacier Bay</td>
      <td>Alaska</td>
      <td>1980</td>
      <td>43rd</td>
      <td>672,087</td>
      <td>58.50°N 137.00°W</td>
      <td>58.659708</td>
      <td>-136.573678</td>
      <td>3,224,840.31 acres (13,050.5 km2)</td>
      <td>Glacier Bay has numerous tidewater glaciers, mountains, and fjords. The temperate rainforest and the bay are home to grizzly bears, mountain goats, whales, seals, and eagles. When discovered in 1794 by George Vancouver, the entire bay was covered by ice, but the glaciers have receded over 65 miles (105 km).[28]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Grand Canyon</td>
      <td>Arizona</td>
      <td>1919</td>
      <td>14th</td>
      <td>5,974,411</td>
      <td>36.06°N 112.14°W</td>
      <td>36.0604107</td>
      <td>-112.1411217</td>
      <td>1,217,403.32 acres (4,926.7 km2)</td>
      <td>The Grand Canyon, carved out by the Colorado River, is 277 miles (446 km) long, up to 1 mile (1.6 km) deep, and up to 15 miles (24 km) wide. Millions of years of exposure has formed colorful layers of the Colorado Plateau in mesas and canyon walls.[29]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Grand Teton</td>
      <td>Wyoming</td>
      <td>1929</td>
      <td>18th</td>
      <td>3,405,614</td>
      <td>43.73°N 110.80°W</td>
      <td>43.7325324</td>
      <td>-110.8059887</td>
      <td>309,994.66 acres (1,254.5 km2)</td>
      <td>Grand Teton is the tallest mountain in the Teton Range. The park’s Jackson Hole valley and reflective piedmont lakes contrast with the tall mountains, which abruptly rise from the glacial sage-covered valley.[30]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Great Basin</td>
      <td>Nevada</td>
      <td>1986</td>
      <td>49th</td>
      <td>131,802</td>
      <td>38.98°N 114.30°W</td>
      <td>38.9648711</td>
      <td>-114.1914951</td>
      <td>77,180.00 acres (312.3 km2)</td>
      <td>Based around Wheeler Peak, the Great Basin has 5,000-year-old bristlecone pines, glacial moraines, and the limestone Lehman Caves. It has some of the country’s darkest night skies, and there are animal species including Townsend’s big-eared bat, Pronghorn, and Bonneville cutthroat trout.[31]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Great Sand Dunes</td>
      <td>Colorado</td>
      <td>2004</td>
      <td>58th</td>
      <td>527,546</td>
      <td>37.73°N 105.51°W</td>
      <td>37.6572135</td>
      <td>-105.5474373</td>
      <td>42,983.74 acres (173.9 km2)</td>
      <td>The tallest dunes in North America are up to 750 feet (230 m) tall and neighbor grasslands, shrublands and wetlands. They were formed by sand deposits of the Rio Grande on the San Luis Valley. The park also has alpine lakes, six 13,000-foot mountains, and ancient forests.[32]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Great Smoky Mountains</td>
      <td>Tennessee</td>
      <td>1934</td>
      <td>21st</td>
      <td>12,547,743</td>
      <td>35.68°N 83.53°W</td>
      <td>35.6839416</td>
      <td>-83.5323006</td>
      <td>521,490.13 acres (2,110.4 km2)</td>
      <td>The Great Smoky Mountains, part of the Appalachian Mountains, have a wide range of elevations, making them home to over 400 vertebrate species, 100 tree species, and 5000 plant species. Hiking is the park’s main attraction, with over 800 miles (1,300 km) of trails, including 70 miles (110 km) of the Appalachian Trail. Other activities are fishing, horseback riding, and visiting some of nearly 80 historic structures.[33]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Guadalupe Mountains</td>
      <td>Texas</td>
      <td>1966</td>
      <td>32nd</td>
      <td>188,833</td>
      <td>31.92°N 104.87°W</td>
      <td>31.7786811</td>
      <td>-104.9856176</td>
      <td>86,415.97 acres (349.7 km2)</td>
      <td>This park has Guadalupe Peak, the highest point in Texas, the scenic McKittrick Canyon full of Bigtooth Maples, part of the Chihuahuan Desert, and a fossilized reef from the Permian.[34]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Haleakala</td>
      <td>Hawaii</td>
      <td>1961</td>
      <td>29th</td>
      <td>994,394</td>
      <td>20.72°N 156.17°W</td>
      <td>20.7278194</td>
      <td>-156.3081217</td>
      <td>29,093.67 acres (117.7 km2)</td>
      <td>The Haleakalā volcano on Maui has a very large crater with many cinder cones, Hosmer’s Grove of alien trees, and the native Hawaiian Goose. The Kipahulu section has numerous pools with freshwater fish. This National Park has the greatest number of endangered species.[35]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Hawaii Volcanoes</td>
      <td>Hawaii</td>
      <td>1916</td>
      <td>10th</td>
      <td>1,368,376</td>
      <td>19.38°N 155.20°W</td>
      <td>19.4095008</td>
      <td>-155.0483131</td>
      <td>323,431.38 acres (1,308.9 km2)</td>
      <td>This park on the Big Island protects the Kīlauea and Mauna Loa volcanoes, two of the world’s most active. Diverse ecosystems of the park range from those at sea level to 13,000 feet (4,000 m).[36]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Hot Springs</td>
      <td>Arkansas</td>
      <td>1921</td>
      <td>16th</td>
      <td>1,467,153</td>
      <td>34.51°N 93.05°W</td>
      <td>34.5100997</td>
      <td>-93.0500476</td>
      <td>5,549.75 acres (22.5 km2)</td>
      <td>The only National Park in an urban area, this smallest National Park is based around the natural hot springs that have been managed for public use. Bathhouse Row preserves 47 of these with many beneficial minerals.[37]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Indiana Dunes</td>
      <td>Indiana</td>
      <td>2019</td>
      <td>61st</td>
      <td>2,134,285</td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td>No</td>
    </tr>
    <tr>
      <td>Isle Royale</td>
      <td>Michigan</td>
      <td>1940</td>
      <td>25th</td>
      <td>26,410</td>
      <td>48.10°N 88.55°W</td>
      <td>47.723087</td>
      <td>-86.940716</td>
      <td>571,790.11 acres (2,314.0 km2)</td>
      <td>The largest island in Lake Superior, this park is a site of isolation and wilderness. It has many shipwrecks, waterways, and hiking trails. The park also includes over 400 smaller islands in the waters up to 4.5 miles (7.2 km) from the island. There are only 20 mammal species and it is known for its wolf and moose relationship.[38]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Joshua Tree</td>
      <td>California</td>
      <td>1994</td>
      <td>54th</td>
      <td>2,988,547</td>
      <td>33.79°N 115.90°W</td>
      <td>33.6608927</td>
      <td>-115.9467997</td>
      <td>789,745.47 acres (3,196.0 km2)</td>
      <td>Covering parts of the Colorado and Mojave Deserts and the Little San Bernardino Mountains, this is the home of the Joshua tree. Across great elevation changes are sand dunes, dry lakes, rugged mountains, and granite monoliths.[39]</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Katmai</td>
      <td>Alaska</td>
      <td>1980</td>
      <td>44th</td>
      <td>84,167</td>
      <td>58.50°N 155.00°W</td>
      <td>58.0458356</td>
      <td>-156.5982286</td>
      <td>3,674,529.68 acres (14,870.3 km2)</td>
      <td>This park on the Alaska Peninsula protects the Valley of Ten Thousand Smokes, an ash flow formed by the 1912 eruption of Novarupta, as well as Mount Katmai. Over 2,000 brown bears come here to catch spawning salmon.[40]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Kenai Fjords</td>
      <td>Alaska</td>
      <td>1980</td>
      <td>45th</td>
      <td>356,601</td>
      <td>59.92°N 149.65°W</td>
      <td>60.1696088</td>
      <td>-149.2361887</td>
      <td>669,982.99 acres (2,711.3 km2)</td>
      <td>Near Seward on the Kenai Peninsula, this park protects the Harding Icefield and at least 38 glaciers and fjords stemming from it. The only area accessible to the public by road is Exit Glacier, while the rest can only be viewed by boat tours.[41]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Kings Canyon</td>
      <td>California</td>
      <td>1940</td>
      <td>24th</td>
      <td>632,110</td>
      <td>36.80°N 118.55°W</td>
      <td>36.7472811</td>
      <td>-119.0925356</td>
      <td>461,901.20 acres (1,869.2 km2)</td>
      <td>Home to several Giant sequoia groves and the General Grant Tree, the world’s second largest, this park also has part of the Kings River, site of the granite Kings Canyon, and San Joaquin River, as well as the Boyden Cave.[42]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Kobuk Valley</td>
      <td>Alaska</td>
      <td>1980</td>
      <td>46th</td>
      <td>15,766</td>
      <td>67.55°N 159.28°W</td>
      <td>67.2872981</td>
      <td>-160.0342625</td>
      <td>1,750,716.50 acres (7,084.9 km2)</td>
      <td>Kobuk Valley has 61 miles (98 km) of the Kobuk River and three regions of sand dunes. Created by glaciers, the Great Kobuk, the Little Kobuk, and the Hunt River Sand Dunes can reach 100 feet (30 m) high and 100 °F (38 °C), and they are the largest dunes in the arctic. Twice a year, half a million caribou migrate through the dunes and across river bluffs that contain ice age fossils.[43]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Lake Clark</td>
      <td>Alaska</td>
      <td>1980</td>
      <td>47th</td>
      <td>17,157</td>
      <td>60.97°N 153.42°W</td>
      <td>60.59772</td>
      <td>-152.7408824</td>
      <td>2,619,733.21 acres (10,601.7 km2)</td>
      <td>The region around Lake Clark has four active volcanoes, including Mount Redoubt, rivers, glaciers, and waterfalls. There are temperate rainforests, a tundra plateau, and three mountain ranges.[44]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Lassen Volcanic</td>
      <td>California</td>
      <td>1916</td>
      <td>11th</td>
      <td>517,039</td>
      <td>40.49°N 121.51°W</td>
      <td>40.5328759</td>
      <td>-121.8412133</td>
      <td>106,372.36 acres (430.5 km2)</td>
      <td>Lassen Peak, the largest plug dome volcano in the world, is joined by all three other types of volcanoes in this park: shield, cinder dome, and composite. Other than the volcano, which last erupted in 1915, the park has hydrothermal areas, including fumaroles, boiling pools, and steaming ground, heated by molten rock under the peak.[45]</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Mammoth Cave</td>
      <td>Kentucky</td>
      <td>1941</td>
      <td>26th</td>
      <td>551,590</td>
      <td>37.18°N 86.10°W</td>
      <td>37.2135982</td>
      <td>-86.1548269</td>
      <td>52,830.19 acres (213.8 km2)</td>
      <td>With 392 miles (631 km) of passageways mapped, Mammoth Cave is by far the world’s longest cave system. Cave animals include eight bat species, Kentucky cave shrimp, Northern cavefish, and cave salamanders. Above ground, the park contains the Green River, 70 miles of hiking trails, sinkholes, and springs.[46]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Mesa Verde</td>
      <td>Colorado</td>
      <td>1906</td>
      <td>7th</td>
      <td>556,203</td>
      <td>37.18°N 108.49°W</td>
      <td>37.1828071</td>
      <td>-108.4922277</td>
      <td>52,121.93 acres (210.9 km2)</td>
      <td>This area has over 4,000 archaeological sites of the Ancestral Pueblo, who lived here for 700 years. Cliff dwellings built in the 12th and 13th centuries include Cliff Palace, which has 150 rooms and 23 kivas, and the Balcony House, with passages and tunnels.[47]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Mount Rainier</td>
      <td>Washington</td>
      <td>1899</td>
      <td>4th</td>
      <td>1,501,621</td>
      <td>46.85°N 121.75°W</td>
      <td>46.8667428</td>
      <td>-121.8040452</td>
      <td>235,625.00 acres (953.5 km2)</td>
      <td>Mount Rainier, an active volcano, is the most prominent peak in the Cascades, and it is covered by 26 named glaciers including Carbon Glacier and Emmons Glacier, the largest in the continental United States. The mountain is popular for climbing, and more than half of the park is covered by subalpine and alpine forests. Paradise on the south slope is one of the snowiest places in the world, and the Longmire visitor center is the start of the Wonderland Trail, which encircles the mountain.[48]</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>New River Gorge</td>
      <td>West Virginia</td>
      <td>2020</td>
      <td>63rd</td>
      <td>N/A</td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td>New River Gorge is known for its stunning scenery, deep canyons, and whitewater rafting on the New River, one of the oldest rivers in the world.</td>
      <td>No</td>
    </tr>
    <tr>
      <td>North Cascades</td>
      <td>Washington</td>
      <td>1968</td>
      <td>33rd</td>
      <td>38,208</td>
      <td>48.70°N 121.20°W</td>
      <td>48.7147254</td>
      <td>-121.1533781</td>
      <td>504,780.94 acres (2,042.8 km2)</td>
      <td>This complex includes the two units of the National Park and the Ross Lake and Lake Chelan National Recreation Areas. There are numerous glaciers, and popular hiking and climbing areas are Cascade Pass, Mount Shuksan, Mount Triumph, and Eldorado Peak.[49]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Olympic</td>
      <td>Washington</td>
      <td>1938</td>
      <td>23rd</td>
      <td>3,245,806</td>
      <td>47.97°N 123.50°W</td>
      <td>47.9707283</td>
      <td>-123.5035728</td>
      <td>922,650.86 acres (3,733.8 km2)</td>
      <td>Situated on the Olympic Peninsula, this park ranges from Pacific shoreline with tide pools to temperate rainforests to Mount Olympus. The glaciated Olympic Mountains overlook the Hoh Rain Forest and Quinault Rain Forest, the wettest area of the continental United States.[50]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Petrified Forest</td>
      <td>Arizona</td>
      <td>1962</td>
      <td>30th</td>
      <td>643,588</td>
      <td>35.07°N 109.78°W</td>
      <td>35.0199151</td>
      <td>-109.7829791</td>
      <td>93,532.57 acres (378.5 km2)</td>
      <td>This portion of the Chinle Formation has a great concentration of 225-million-year-old petrified wood. The surrounding region, the Painted Desert, has eroded red-hued volcanic rock called bentonite. There are also dinosaur fossils and over 350 Native American sites.[51]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Pinnacles</td>
      <td>California</td>
      <td>2013</td>
      <td>59th</td>
      <td>177,224</td>
      <td>36.48°N 121.16°W</td>
      <td>36.4808829</td>
      <td>-121.1593104</td>
      <td>26,605.73 acres (107.7 km2)</td>
      <td>Known for the namesake eroded leftovers of half of an extinct volcano, it is popular for its rock climbing.[52]</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Redwood</td>
      <td>California</td>
      <td>1968</td>
      <td>34th</td>
      <td>504,722</td>
      <td>41.30°N 124.00°W</td>
      <td>41.3678119</td>
      <td>-124.0296856</td>
      <td>112,512.05 acres (455.3 km2)</td>
      <td>This park and the co-managed state parks protect almost half of all remaining Coastal Redwoods, the tallest trees on Earth. There are three large river systems in this very seismically active area, and the 37 miles (60 km) of protected coastline have tide pools and seastacks. The prairie, estuary, coast, river, and forest ecosystems have varied animal and plant species.[53]</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Rocky Mountain</td>
      <td>Colorado</td>
      <td>1915</td>
      <td>9th</td>
      <td>4,670,053</td>
      <td>40.40°N 105.58°W</td>
      <td>40.400054</td>
      <td>-105.57937</td>
      <td>265,828.41 acres (1,075.8 km2)</td>
      <td>This section of the Rocky Mountains has ecosystems varying in elevation from the over 150 riparian lakes to Montane and subalpine forests to the alpine tundra. Large wildlife including mule deer, bighorn sheep, black bears, and cougars inhabit these igneous mountains and glacier valleys. The fourteener Longs Peak and Bear Lake are popular destinations.[54]</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Saguaro</td>
      <td>Arizona</td>
      <td>1994</td>
      <td>52nd</td>
      <td>1,020,226</td>
      <td>32.25°N 110.50°W</td>
      <td>32.0149212</td>
      <td>-110.6645683</td>
      <td>91,439.71 acres (370.0 km2)</td>
      <td>Split into the separate Rincon Mountain and Tucson Mountain Districts, the dry Sonoran Desert is still home to much life in six biotic communities. Beyond the namesake Giant Saguaro cacti, there are barrel cacti, cholla cacti, and prickly pears, as well as Lesser Long-nosed Bats, Spotted Owls, and javelinas.[55]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Sequoia</td>
      <td>California</td>
      <td>1890</td>
      <td>2nd</td>
      <td>1,246,053</td>
      <td>36.43°N 118.68°W</td>
      <td>36.4495198</td>
      <td>-118.7198004</td>
      <td>404,051.17 acres (1,635.1 km2)</td>
      <td>This park protects the Giant Forest, which has the world’s largest tree, General Sherman, as well as four of the next nine. It also has over 240 caves, the tallest mountain in the continental U.S., Mount Whitney, and the granite dome Moro Rock.[56]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Shenandoah</td>
      <td>Virginia</td>
      <td>1935</td>
      <td>22nd</td>
      <td>1,425,507</td>
      <td>38.53°N 78.35°W</td>
      <td>38.5314894</td>
      <td>-78.3522476</td>
      <td>199,045.23 acres (805.5 km2)</td>
      <td>Shenandoah’s Blue Ridge Mountains are covered by hardwood forests that are home to tens of thousands of animals. The Skyline Drive and Appalachian Trail run the entire length of this narrow park that has more than 500 miles (800 km) of hiking trails along scenic overlooks and waterfalls of the Shenandoah River.[57]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Theodore Roosevelt</td>
      <td>North Dakota</td>
      <td>1978</td>
      <td>39th</td>
      <td>691,658</td>
      <td>46.97°N 103.45°W</td>
      <td>46.8747588</td>
      <td>-103.2519011</td>
      <td>70,446.89 acres (285.1 km2)</td>
      <td>This region that enticed and influenced President Theodore Roosevelt is now a park of three units in the badlands. Besides Roosevelt’s historic cabin, there are scenic drives and backcountry hiking opportunities. Wildlife includes American Bison, pronghorns, Bighorn sheep, and wild horses.[58]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Virgin Islands</td>
      <td>US Virgin Islands</td>
      <td>1956</td>
      <td>28th</td>
      <td>133,398</td>
      <td>18.33°N 64.73°W</td>
      <td>18.3368114</td>
      <td>-64.7280952</td>
      <td>14,688.87 acres (59.4 km2)</td>
      <td>The island of Saint John has rich human and natural history. There are Taino archaeological sites and ruins of sugar plantations from Columbus’s time. Past the pristine beaches are mangroves, seagrass beds, coral reefs and algal plains.[59]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Voyageurs</td>
      <td>Minnesota</td>
      <td>1971</td>
      <td>37th</td>
      <td>232,974</td>
      <td>48.50°N 92.88°W</td>
      <td>48.5494209</td>
      <td>-93.3763619</td>
      <td>218,200.17 acres (883.0 km2)</td>
      <td>This park on four main lakes, a site for canoeing, kayaking, and fishing, has a history of Ojibwe Native Americans, French fur traders called voyageurs, and a gold rush. Formed by glaciers, this region has tall bluffs, rock gardens, islands and bays, and historic buildings.[60]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>White Sands</td>
      <td>New Mexico</td>
      <td>2019</td>
      <td>62nd</td>
      <td>608,785</td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td>No</td>
    </tr>
    <tr>
      <td>Wind Cave</td>
      <td>South Dakota</td>
      <td>1903</td>
      <td>6th</td>
      <td>615,350</td>
      <td>43.57°N 103.48°W</td>
      <td>43.3759472</td>
      <td>-103.4511768</td>
      <td>28,295.03 acres (114.5 km2)</td>
      <td>Wind Cave is distinctive for its calcite fin formations called boxwork and needle-like growths called frostwork. The cave, which was discovered by the sound of wind coming from a hole in the ground, is the world’s densest cave system. Above ground is a mixed-grass prairie with animals such as bison, black-footed ferrets, and prairie dogs,[61] and Ponderosa pine forests home to cougars and elk.</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Wrangell - St. Elias</td>
      <td>Alaska</td>
      <td>1980</td>
      <td>48th</td>
      <td>74,518</td>
      <td>61.00°N 142.00°W</td>
      <td>61.3314317</td>
      <td>-142.939826</td>
      <td>8,323,147.59 acres (33,682.6 km2)</td>
      <td>This mountainous land has the convergence of the Alaska, Chugach, and Wrangell-Saint Elias Ranges, which have many of the continent’s tallest mountains over 16,000 feet (4,900 m), including Mount Saint Elias. More than 25% of this park of volcanic peaks is covered with glaciers, including the tidewater Hubbard Glacier, piedmont Malaspina Glacier, and valley Nabesna Glacier.[62]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Yellowstone</td>
      <td>Wyoming, Montana, Idaho</td>
      <td>1872</td>
      <td>1st</td>
      <td>4,020,288</td>
      <td>44.60°N 110.50°W</td>
      <td>44.4620852</td>
      <td>-110.6424411</td>
      <td>2,219,790.71 acres (8,983.2 km2)</td>
      <td>Situated on the Yellowstone Caldera, the first national park in the world has vast geothermal areas such as hot springs and geysers, the best-known being Old Faithful and Grand Prismatic Spring. The yellow-hued Grand Canyon of the Yellowstone River has numerous waterfalls, and four mountain ranges run through the park. There are almost 60 mammal species, including the gray wolf, grizzly bear, lynx, bison, and elk.[63]</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Yosemite</td>
      <td>California</td>
      <td>1890</td>
      <td>3rd</td>
      <td>4,422,861</td>
      <td>37.83°N 119.50°W</td>
      <td>37.8276596</td>
      <td>-119.5053357</td>
      <td>761,266.19 acres (3,080.7 km2)</td>
      <td>Yosemite has towering cliffs, waterfalls, and sequoias in a diverse area of geology and hydrology. Half Dome and El Capitan rise from the central glacier-formed Yosemite Valley, as does Yosemite Falls, North America’s tallest waterfall. Three Giant Sequoia groves and vast wilderness are home to diverse wildlife.[64]</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Zion</td>
      <td>Utah</td>
      <td>1919</td>
      <td>15th</td>
      <td>4,488,268</td>
      <td>37.30°N 113.05°W</td>
      <td>37.088313</td>
      <td>-113.2394452</td>
      <td>146,597.60 acres (593.3 km2)</td>
      <td>This geologically unique area has colorful sandstone canyons, high plateaus, and rock towers. Natural arches and exposed formations of the Colorado Plateau make up a large wilderness of four ecosystems.[65]</td>
      <td>No</td>
    </tr>
  </tbody>
</table>]]></content><author><name>Austin Powell</name></author><category term="Hobbies" /><summary type="html"><![CDATA[US Parks]]></summary></entry><entry><title type="html">2023 Technical Reading</title><link href="https://aus10powell.github.io/2023/papers/Reading/" rel="alternate" type="text/html" title="2023 Technical Reading" /><published>2023-01-01T00:00:00+00:00</published><updated>2023-01-01T00:00:00+00:00</updated><id>https://aus10powell.github.io/2023/papers/Reading</id><content type="html" xml:base="https://aus10powell.github.io/2023/papers/Reading/"><![CDATA[<p><strong>2023 Reading List</strong></p>

<h2 id="signal-processing">Signal Processing</h2>

<h3 id="general">General</h3>
<ul>
  <li><a href="https://deepwavedigital.com/wp-content/uploads/2019/11/GTCDC2019_Deepwave_Public_Release_FINAL.pdf">End-to-End Signal Processing and Deep Learning (using Embedded GPUs)</a>: An overview of full-stack solutions for deep learning and GPU-enabled signal processing on embedded GPUs, spanning edge compute hardware and custom applications. Tight hardware/software integration enables radio embedded systems combining FPGA, CPU, and GPU, GPU-based signal processing algorithms, and pruned neural networks for inference on edge RF systems.</li>
</ul>

<h3 id="audio">Audio</h3>
<ul>
  <li><a href="https://arxiv.org/abs/2305.01578">Self-supervised learning for infant cry analysis</a>
    <ul>
      <li>Self-supervised learning can be used to learn useful representations of infant cries from unlabeled data.</li>
      <li>The self-supervised approach was able to achieve comparable performance to a supervised learning approach that was trained on a small amount of labeled data.</li>
      <li>Self-supervised learning could be a valuable tool for developing new and improved infant cry analysis systems.</li>
    </ul>
  </li>
  <li><a href="https://arxiv.org/abs/2305.00969">CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds</a>
    <ul>
      <li>The CryCeleb dataset is a large and diverse dataset of infant cry sounds. The associated challenge focuses on the task of verifying whether two cry recordings come from the same infant.</li>
    </ul>
  </li>
</ul>

<h2 id="nlp">NLP</h2>
<ul>
  <li><a href="https://arxiv.org/abs/1910.13461">BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension</a>
    <ul>
      <li>BART is a denoising autoencoder, which means that it is trained on a dataset of corrupted text. The corruptions can be simple, such as replacing words with random words, or more complex, such as removing words or sentences. BART is trained to reconstruct the original text from the corrupted text.</li>
      <li>This training procedure helps BART to learn to represent the meaning of text, even when the text is corrupted. This makes BART well-suited for natural language generation, translation, and comprehension tasks.</li>
    </ul>
  </li>
  <li><a href="https://arxiv.org/abs/2305.16765">Backpack Language Models</a>
    <ul>
      <li>Backpacks is a neural architecture that combines strong modeling performance with interpretability and control. It learns multiple sense vectors for each word, representing different aspects of the word, and allows for intervention and modification of these vectors to change the model’s behavior. It outperforms larger models in lexical similarity evaluations and enables controllable text generation and debiasing through sense vector manipulation.</li>
    </ul>
  </li>
  <li>Date-specific (might not age well), as of 05/25/23:
    <ul>
      <li><a href="https://www.youtube.com/watch?v=bZQun8Y4L2A&amp;t=518s&amp;ab_channel=MicrosoftDeveloper">State of GPT (a Youtube update by Microsoft)</a></li>
      <li><a href="https://arxiv.org/abs/2212.09251">Discovering Language Model Behaviors with Model-Written Evaluations</a></li>
    </ul>
  </li>
  <li><a href="https://lilianweng.github.io/posts/2023-06-23-agent/">LLM Powered Autonomous Agents</a> An OpenAI researcher’s overview of LLM-powered autonomous agents, covering planning, memory, and tool use.</li>
  <li><a href="https://github.com/linexjlin/GPTs">Prompt Recipes</a> Great starting points for some specific use cases in prompting. Some are shockingly effective.</li>
</ul>
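<p>The denoising objective BART trains on can be illustrated with a toy noising function. This is purely illustrative, pure-Python sketch: real BART operates on subword tokens and uses richer corruptions such as span infilling and sentence permutation, but the idea is the same: the seq2seq model sees the noisy sequence and is trained to reconstruct the original.</p>

```python
import random

def corrupt(tokens, mask_token="<mask>", p_delete=0.1, p_mask=0.3, seed=0):
    """Toy BART-style noising: randomly delete or mask tokens.
    A seq2seq model would be trained to map corrupt(x) back to x."""
    rng = random.Random(seed)
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p_delete:
            continue                   # token deletion
        elif r < p_delete + p_mask:
            noisy.append(mask_token)   # token masking
        else:
            noisy.append(tok)          # token kept unchanged
    return noisy
```

<p>Because reconstruction requires recovering both what was masked and what was deleted, the model must capture meaning beyond surface word order.</p>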

<h3 id="large-language-models">Large-Language Models</h3>
<ul>
  <li>
    <p><a href="http://proceedings.mlr.press/v97/houlsby19a/houlsby19a.pdf">Parameter-Efficient Transfer Learning for NLP</a>: using adapter modules as an efficient alternative to fine-tuning large pre-trained models in the context of numerous downstream tasks. Adapter modules introduce minimal trainable parameters per task, enabling the incorporation of new tasks without retraining the entire model. Demonstrating their effectiveness, the authors apply adapter modules to 26 text classification tasks, including the GLUE benchmark, achieving near state-of-the-art performance with only a slight increase in parameters. This approach contrasts with traditional fine-tuning, which requires training all parameters for each task, showcasing the efficiency and flexibility of adapter modules in handling diverse tasks.</p>
  </li>
  <li><a href="https://huyenchip.com/2023/05/02/rlhf.html">RLHF: Reinforcement Learning from Human Feedback (A blog by Chip Huyen)</a></li>
  <li><a href="https://huggingface.co/blog/stackllama">StackLLaMA: A hands-on guide to train LLaMA with RLHF</a>: About the development of the StackLLaMA model, a reinforcement learning from human feedback (RLHF) fine-tuned LLaMA model for Stack Exchange question and answering. The model is trained using a combination of supervised fine-tuning, reward modeling, and reinforcement learning techniques. Challenges faced during training, such as balancing rewards and managing KL divergence, are highlighted, and the post emphasizes the ongoing efforts to improve RLHF methods. The StackLLaMA model demonstrates the application of RL techniques in natural language processing tasks, showcasing its potential for enhancing question-answering systems.</li>
  <li><a href="https://react-lm.github.io/">ReAct: Synergizing Reasoning and Acting in Language Models (web write-up)</a>: a novel approach that leverages Large Language Models (LLMs) to generate both reasoning traces and task-specific actions concurrently. By using reasoning and acting, ReAct demonstrates enhanced synergy between the two, enabling the model to induce, track, and update action plans based on reasoning traces and interact effectively with external sources. The results show ReAct’s effectiveness over existing methods in various language and decision-making tasks, addressing issues like hallucination and error propagation while improving human interpretability and trustworthiness.</li>
  <li><a href="https://arxiv.org/abs/2210.03629">ReAct: Synergizing Reasoning and Acting in Language Models</a></li>
  <li><a href="https://arxiv.org/pdf/2310.08123.pdf">Who Wrote it and Why? Prompting Large-Language Models for Authorship Verification</a>: The text discusses the significance of Authorship Verification (AV) in natural language processing, highlighting challenges in existing techniques such as data requirements and explainability. To address these issues, the paper introduces PromptAV, a novel approach leveraging Large-Language Models for AV. PromptAV utilizes step-by-step stylometric explanation prompts, outperforming state-of-the-art methods with limited data and providing intuitive explanations for predictions. The research aims to enhance AV effectiveness and interpretability, presenting PromptAV as a promising solution in forensic analysis, plagiarism detection, and identification of deceptive content.</li>
  <li>
    <p><a href="https://www.anyscale.com/blog/a-comprehensive-guide-for-building-rag-based-llm-applications-part-1">Building RAG-based LLM Applications for Production</a>: A summary of Retrieval Augmented Generation (RAG):</p>

    <p>Retrieval Augmented Generation (RAG) addresses the limitation of Large Language Models (LLMs) not being trained on specific user data. It involves incorporating user data into the LLMs’ existing knowledge. In the RAG process, user data is loaded and prepared for queries, creating an index. User queries filter the data down to the most relevant context, and this context, along with the query, is sent to the LLM, which generates a response.</p>

    <p>Key stages within RAG include:</p>

    <ul>
      <li>Loading: Retrieving user data from various sources and bringing it into the processing pipeline, supported by connectors like those provided by LlamaHub.</li>
      <li>Indexing: Creating a data structure, often involving vector embeddings and metadata strategies, to facilitate efficient querying and retrieval of contextually relevant data.</li>
      <li>Storing: Saving the indexed data and associated metadata to avoid the need for re-indexing in the future.</li>
      <li>Querying: Utilizing LLMs and LlamaIndex data structures for various query strategies, such as sub-queries, multi-step queries, and hybrid approaches.</li>
      <li>Evaluation: Assessing the effectiveness of the pipeline through objective measures, comparing strategies and evaluating the accuracy, fidelity, and speed of query responses.</li>
    </ul>
  </li>
</ul>
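<p>The loading, indexing, and querying stages of RAG can be sketched end to end with a toy in-memory index. Bag-of-words term-frequency vectors stand in for real embeddings, and the class and function names are illustrative assumptions, not from the guide:</p>

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class TinyRAG:
    def __init__(self):
        self.index = []                       # (vector, chunk) pairs

    def load(self, chunks):                   # Loading + Indexing stages
        self.index = [(embed(c), c) for c in chunks]

    def query(self, question, k=2):           # Querying stage
        qv = embed(question)
        ranked = sorted(self.index, key=lambda p: cosine(qv, p[0]), reverse=True)
        context = "\n".join(c for _, c in ranked[:k])
        # In a real pipeline this prompt would be sent to an LLM.
        return f"Context:\n{context}\n\nQuestion: {question}"
```

<p>Frameworks such as LlamaIndex replace each toy piece here with production components: connectors for loading, vector stores for indexing and storage, and richer query strategies.</p>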

<h2 id="vision">Vision</h2>
<ul>
  <li><a href="https://arxiv.org/abs/2405.14458">YOLOv10: Real-Time End-to-End Object Detection</a> Among other efficiency improvements, the highlight may be tackling the inference latency caused by NMS.</li>
  <li><a href="https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/">Faiss: A library for efficient similarity search</a>: Facebook AI Similarity Search (Faiss) is a library for quickly finding multimedia documents that are similar to each other.</li>
</ul>
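<p>At its core, what Faiss accelerates is k-nearest-neighbour search over dense vectors. A brute-force reference version in plain Python (illustrative only; Faiss provides optimized exact and approximate indexes for the same operation) looks like this:</p>

```python
import math

def knn_search(database, queries, k=1):
    """Brute-force k-nearest-neighbour search by L2 distance.
    Returns, for each query, the indices of its k closest database vectors."""
    results = []
    for q in queries:
        order = sorted(range(len(database)),
                       key=lambda i: math.dist(q, database[i]))
        results.append(order[:k])
    return results
```

<p>This exact scan is O(n) per query; Faiss's contribution is making the same search fast at the scale of millions or billions of vectors.</p>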

<h2 id="ml-general">ML (General)</h2>
<ul>
  <li><a href="http://medianetlab.ee.ucla.edu/papers/AAAI_2018_DeepHit">DeepHit: A Deep Learning Approach to Survival Analysis with Competing Risks</a> Time-to-event analysis is widely used in economics, finance, engineering, medicine, and many other areas. Previous models rely on strong parametric assumptions that are often violated. DeepHit uses a deep neural network to learn the distribution of survival times directly. Comparisons with previous models on real and synthetic datasets demonstrate that DeepHit achieves statistically significant performance improvements.</li>
</ul>
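<p>A small piece of the machinery behind discrete-time survival models like DeepHit: per-interval hazards (the probability of an event in interval t, given survival up to t) convert to a survival curve by multiplying the complements. This helper is an illustrative assumption, not DeepHit code:</p>

```python
def survival_curve(hazards):
    """Convert discrete hazards h_t = P(event in t | alive at start of t)
    into a survival curve S(t) = prod over u <= t of (1 - h_u)."""
    surv, s = [], 1.0
    for h in hazards:
        s *= 1.0 - h
        surv.append(s)
    return surv
```

<p>Models such as DeepHit learn these per-interval probabilities directly from data instead of assuming a parametric form like the exponential or Weibull.</p>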

<h3 id="recommendation-and-search">Recommendation and Search</h3>
<ul>
  <li><a href="https://eugeneyan.com/writing/counterfactual-evaluation/">Counterfactual Evaluation for Recommendation Systems</a>: the challenge of evaluating recommendation systems and suggests that they should be treated as interventional problems rather than observational ones. It explains that traditional offline evaluation methods may not capture the true impact of recommendations on user behavior. The article introduces counterfactual evaluation as an alternative approach, particularly focusing on Inverse Propensity Scoring (IPS) and its variants like Clipped IPS (CIPS) and Self-Normalized IPS (SNIPS), highlighting their advantages and limitations.</li>
  <li><a href="https://arxiv.org/pdf/2002.05515.pdf">Improving Deep Learning For Airbnb Search</a>: Airbnb’s transition to deep learning for search ranking significantly impacted its roadmap, leading to a shift in strategy. While the initial optimism about incorporating machine learning ideas from literature surveys faded due to application-specific challenges, the focus shifted towards a process-driven approach, emphasizing the importance of iterative strategies over individual techniques for enhancing deep learning models in industrial settings.</li>
</ul>
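<p>The core IPS estimator from the counterfactual-evaluation article fits in a few lines. The log format and function names here are illustrative assumptions; the optional <code>clip</code> argument sketches the Clipped IPS (CIPS) variant, which caps high-variance importance weights:</p>

```python
def ips_estimate(logs, target_policy, clip=None):
    """Inverse Propensity Scoring estimate of a target policy's value.

    logs: list of (context, action, reward, logging_propensity) tuples,
    where logging_propensity is the probability the logging policy gave
    to the logged action. target_policy(context, action) returns the
    probability the new policy would choose that action.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        # Re-weight each logged reward by how much more (or less) likely
        # the target policy was to take the logged action.
        weight = target_policy(context, action) / propensity
        if clip is not None:
            weight = min(weight, clip)   # CIPS: trade variance for bias
        total += reward * weight
    return total / len(logs)
```

<p>SNIPS further divides by the sum of the weights instead of the log count, which normalizes away some of the variance at the cost of a small bias.</p>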

<h2 id="ml-ops">ML Ops</h2>
<ul>
  <li><a href="https://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf">Rules of Machine Learning: Best Practices for ML Engineering (M. Zinkevich)</a></li>
  <li><a href="https://blog.tensorflow.org/2021/12/continuous-adaptation-for-machine.html">Continuous Adaptation for Machine (Learning)</a></li>
  <li><a href="https://cloud.google.com/blog/topics/developers-practitioners/model-training-cicd-system-part-i">Model training as a CI/CD system: Part I</a></li>
  <li><a href="https://cloud.google.com/blog/topics/developers-practitioners/model-training-cicd-system-part-ii">Model training as a CI/CD system: Part II</a></li>
  <li><a href="https://madewithml.com/courses/mlops/api/">APIs for Model Serving</a></li>
  <li>
    <p><a href="https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions">Github Actions</a> This is a great place to start especially since your project likely already has a Github repository. It covers all the basics of CI/CD. It discusses the components of a workflow and how to create one. Workflows can be triggered by events such as pushes to the repository. Each workflow run is executed on a runner, which can be a virtual machine or a self-hosted server. Jobs within a workflow can run in parallel or sequentially.</p>

    <ul>
      <li>The process of building a CI/CD system specifically for model training.</li>
      <li>Key components such as data management, environment setup, model training pipeline, and version control.</li>
      <li>Emphasizes the benefits of automation, highlights cloud infrastructure support, and suggests specific tools for implementing a model training CI/CD system.</li>
    </ul>
  </li>
  <li><a href="https://cloud.google.com/solutions/application-deployment-and-testing-strategies">Application deployment and testing strategies (Google Cloud)</a>
    <ul>
      <li>Best Practices: 1) Backward compatibility 2) Continuous integration/continuous deployment (CI/CD) 3) Automation 4) Operating environments and configuration management. 5) Rollback strategy in case things go wrong 6) Post-deployment monitoring</li>
    </ul>
  </li>
  <li><a href="https://blog.tensorflow.org/2021/01/leveraging-tensorflow-tensorrt-integration.html">Leveraging TensorFlow-TensorRT integration for Low latency Inference</a>: TensorRT integration in TensorFlow allows developers to optimize and accelerate their deep learning models for deployment on GPUs. By leveraging TensorRT’s optimizations, TensorFlow users can achieve faster inference times and a reduced memory footprint. This integration provides a seamless workflow, enabling efficient deployment of TensorFlow models with improved performance for real-time applications.</li>
  <li><a href="https://wandb.ai/site/articles/bayesian-hyperparameter-optimization-a-primer">Bayesian hyperparameter tuning: Nuts &amp; bolts</a></li>
</ul>

<h2 id="statistical">Statistical</h2>
<ul>
  <li><a href="https://watermark.silverchair.com/55-6-1031.pdf?token=AQECAHi208BE49Ooan9kkhW_Ercy7Dm3ZL_9Cf3qfKAc485ysgAAA1swggNXBgkqhkiG9w0BBwagggNIMIIDRAIBADCCAz0GCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMLx9G1vXE640UVIehAgEQgIIDDuVcAcoVq_sfbFu8CGHwHlYnMN-tbQb0neED632xB-3FiQ0-kQhsE5g0NRFg8gwNFssmYzIEeO6_4MGcdwX-mJrjT7_lnDBmJ7eETkYOW97K0674mc4BoQ4cIKgknvRaj3fMPPjGe_GDkr2vp8kOrSKxN8sAvBbxqOaDLVky_fyIhdVpX5pxHkbVxl30U4iY3wUNwGl2tuBv0atdLbNbocwpTpWCiiIlm1JjSDpDkdOyvoc9qYYg7pCh_gV45jMbNAHUktEJ_0OTcqnJYcvVWsqwjnkljnplWseVUhe6IgHGzN6juC5KuuB5Jciql8WZ7QQ5V7ju0Dn5YaUqk2G9y7EjxrEwcElx7ye48ZI2PmgEoLwMOiHMfV9iaNRsA9tErunzo2O0aH5ZT3MvCQZzcthfhTcx7j-vsRFxNj2Xnvkymp36xSKmF-7xjIzq2N2QuibF9bzFeOlzW3aIXtDhjZE99EG2EQmqcggTpUFyiXftzh0hBwFHLkygWx2oUMyI0sSYt-02MYrR8G2f_FSLEtUzKTKJ-Efuth9nm-4hLOvciBQkY_SnTvHqlAXy4wENtdXalYb5GDHN_07a92wpTm6pTL4BlgZwCj4MnLMwfnfUUSrJR4L20cKNSClQqE7hy_msylSEw_zYveyN-Pp9Sh7U1RQpqy_12Bp_UIqP5IsXTjAhX3Kg2UvuMNcmIZGUvYBludcopl1LWIafj_30RBY3BjCmVFGitP-hqa_veAmPx8rq7uTmlDN8IEeWz7TqsJEOlC6Y7sehB5zeKJORfNyjtXfzO-ImwLlk6kQm6aoUSHlHHRKNns-j76Ezhx6MCZ-Yr03gBRO1bz4dImQK7Euvf3dSVOT87QLsRTNyH5pEFe94TS90348PTXluxGy4FAqndt_rxc2epL2at7bHwGD_h3jcQzlXcVyekFzdFyFL5dh8qKRD4IPK7D203hD8U4UK-XoOUalY269vFMYuaNES3oosBQ4sugYMU7yH_IpMuoIZVG6kEhDyJ7sEBKm--AD9ggUqkLkT2_TC3ll6">Bayesian stock assessment: a review and example application using the logistic model</a></li>
</ul>]]></content><author><name>Austin Powell</name></author><category term="2023" /><category term="Papers" /><summary type="html"><![CDATA[Technical articles, research papers, etc. read in 2023]]></summary></entry><entry><title type="html">MIT Fishery Monitoring With Computer Vision</title><link href="https://aus10powell.github.io/computer-vision/MITFishery/" rel="alternate" type="text/html" title="MIT Fishery Monitoring With Computer Vision" /><published>2022-02-16T00:00:00+00:00</published><updated>2022-02-16T00:00:00+00:00</updated><id>https://aus10powell.github.io/computer-vision/MITFishery</id><content type="html" xml:base="https://aus10powell.github.io/computer-vision/MITFishery/"><![CDATA[<p><strong>A grant-funded computer vision-based system to count fish under various conditions entering Massachusetts Fisheries.</strong></p>

<h1 id="mit-fishery">MIT Fishery</h1>
<p>I am currently working on an ongoing project to detect and count fish entering Massachusetts fisheries, funded by a grant from the federal government. I started working on this project in 2022, and I work individually with a research professor and MIT PhD students. I am responsible for taking the initial research project and making it operational: 1) making the model performant enough to count fish, 2) working within the research grant funding to set up appropriate infrastructure, and 3) deploying that infrastructure.</p>

<p><a href="https://youtu.be/3UxDNxzXF5U"><img src="/assets/images/mitfishery/annotated_counter.png" alt="Click to view video" /></a></p>

<h2 id="table-of-contents">Table of Contents</h2>
<ul id="markdown-toc">
  <li><a href="#mit-fishery" id="markdown-toc-mit-fishery">MIT Fishery</a>    <ul>
      <li><a href="#table-of-contents" id="markdown-toc-table-of-contents">Table of Contents</a></li>
      <li><a href="#qa-bot-ask-questions-about-the-repository" id="markdown-toc-qa-bot-ask-questions-about-the-repository">QA Bot: Ask questions about the repository:</a></li>
      <li><a href="#public-demo-of-dashboard" id="markdown-toc-public-demo-of-dashboard">Public Demo of dashboard</a></li>
      <li><a href="#overview" id="markdown-toc-overview">Overview</a></li>
      <li><a href="#object-detection" id="markdown-toc-object-detection">Object Detection:</a>        <ul>
          <li><a href="#performance" id="markdown-toc-performance">Performance</a>            <ul>
              <li><a href="#performance-comparison-for-different-years" id="markdown-toc-performance-comparison-for-different-years">Performance Comparison for Different Years</a></li>
              <li><a href="#overview-1" id="markdown-toc-overview-1">Overview</a></li>
              <li><a href="#metrics-detecting-and-tracking" id="markdown-toc-metrics-detecting-and-tracking">Metrics: Detecting and Tracking</a>                <ul>
                  <li><a href="#detecting" id="markdown-toc-detecting">Detecting</a></li>
                </ul>
              </li>
              <li><a href="#performance-metrics" id="markdown-toc-performance-metrics">Performance Metrics</a></li>
              <li><a href="#discussion" id="markdown-toc-discussion">Discussion</a></li>
            </ul>
          </li>
          <li><a href="#data-preparation" id="markdown-toc-data-preparation">Data Preparation</a>            <ul>
              <li><a href="#video-considerations" id="markdown-toc-video-considerations">Video Considerations</a></li>
              <li><a href="#due-to-the-extreme-varition-in-camera-quality-and-image-quality-the-following-types-of-image-augmentation-have-become-extremely-usefuly-in-the-quality-of-model" id="markdown-toc-due-to-the-extreme-varition-in-camera-quality-and-image-quality-the-following-types-of-image-augmentation-have-become-extremely-usefuly-in-the-quality-of-model">Due to the extreme variation in camera and image quality, the following types of image augmentation have proven extremely useful for model quality:</a>                <ul>
                  <li><a href="#example-image-augmentation" id="markdown-toc-example-image-augmentation">Example Image Augmentation</a></li>
                  <li><a href="#example-of-challenging-atmospheric-conditions-for-id-ing-of-fish-for-purposes-of-tracking" id="markdown-toc-example-of-challenging-atmospheric-conditions-for-id-ing-of-fish-for-purposes-of-tracking">Example of Challenging Atmospheric Conditions for ID-ing of Fish for Purposes of Tracking</a></li>
                </ul>
              </li>
              <li><a href="#example-of-the-need-for-more-data-when-counting-and-tracking" id="markdown-toc-example-of-the-need-for-more-data-when-counting-and-tracking">Example of the need for more data when counting AND tracking</a></li>
              <li><a href="#breakdown-of-different-challenges-with-data-quality" id="markdown-toc-breakdown-of-different-challenges-with-data-quality">Breakdown of different challenges with data quality:</a></li>
              <li><a href="#annotation-and-general-approach-notes-for-object-counting-in-video" id="markdown-toc-annotation-and-general-approach-notes-for-object-counting-in-video">Annotation and General Approach Notes for Object Counting in Video</a></li>
              <li><a href="#what-does-a-label-balanced-look-like-in-object-detection" id="markdown-toc-what-does-a-label-balanced-look-like-in-object-detection">What does a label “balanced” look like in object detection?</a></li>
              <li><a href="#dataset-selection-and-training" id="markdown-toc-dataset-selection-and-training">Dataset Selection and Training:</a>                <ul>
                  <li><a href="#holdout-set-for-validation" id="markdown-toc-holdout-set-for-validation">Holdout Set for Validation:</a></li>
                </ul>
              </li>
              <li><a href="#bayesian-optimization-with-wandbsweeps" id="markdown-toc-bayesian-optimization-with-wandbsweeps">Bayesian Optimization with wandb.sweeps:</a></li>
              <li><a href="#tracking" id="markdown-toc-tracking">Tracking</a></li>
            </ul>
          </li>
        </ul>
      </li>
    </ul>
  </li>
</ul>

<h2 id="qa-bot-ask-questions-about-the-repository">QA Bot: Ask questions about the repository:</h2>
<p><a href="https://huggingface.co/chat/conversation/66b2a6ee5141de599b9cb1c1">
    <img src="/assets/images/mitfishery/ChatbotImage.png" alt="QA Chatbot: Ask questions about this repository" width="350" />
</a></p>

<h2 id="public-demo-of-dashboard">Public Demo of dashboard</h2>
<div style="display: flex; justify-content: center;">
  <iframe src="https://aus10powell-mit-fishery-app.hf.space" width="1100" height="1050" display="block" scrolling="yes" frameborder="0" float="right"></iframe>
</div>

<h2 id="overview">Overview</h2>
<p>Fisheries populations have a large impact on the U.S. economy. Each year the U.S. fishing industry contributes 90 billion dollars and 1.5 million jobs to the U.S. economy. Each species may serve as a predator or prey for another. In this regard, fisheries populations are interconnected and dependent. While humans may depend on these populations as a source of sustenance (food, goods, etc.), humans can also negatively impact population growth. Barriers to migration, pollution, overfishing, and other forms of human-interference may impact spawning patterns of fisheries species. In 2014, 17% of U.S. fisheries were classified as overfished. Therefore, it is necessary to monitor these fisheries populations to determine when policy must be changed in efforts to maintain healthy oceans.</p>

<p>Many groups, including NOAA Fisheries, state agencies, as well as regional fisheries councils and local municipalities, deploy camera and video equipment to monitor fisheries populations. Large amounts of video and photographic data are gathered at timed intervals. However, not all photos contain aquatic life. Currently, employees at these agencies among others are responsible for manually annotating the gathered videos and photos; this means they identify and count the relevant aquatic specimens in the data. Not only is this an inefficient use of time and resources, but also it can lead to inaccurate results due to human error. NOAA Fisheries Management can make a significant improvement in time and resource use through automation of the annotation process.</p>

<p>Throughout the project, I have made significant progress in addressing these challenges and have achieved promising results. Here’s an overview of the key aspects and advancements made:</p>

<h2 id="object-detection">Object Detection:</h2>
<p>The initial challenge involved designing an accurate object detection algorithm specifically tailored for fish tracking. It required careful consideration of model architecture, hyperparameters, and dataset selection, refined through iterative experimentation and fine-tuning.</p>

<ul>
  <li>Original YOLO paper: <a href="https://arxiv.org/abs/1506.02640">You Only Look Once: Unified, Real-Time Object Detection</a></li>
</ul>

<h3 id="performance">Performance</h3>

<p>The primary metric used to assess object detection performance during training is mAP50-95 (mean average precision at IoU (Intersection over Union) thresholds from 0.5 to 0.95). I baselined the project at 0.31; the current 0.72 has become sufficient for tracking in videos, yielding a 0.28 MAPE (Mean Absolute Percentage Error) on counts. This doesn’t reflect the full picture of how well the algorithm tracks an object from one point to another; there is more discussion on this later.</p>

<h4 id="performance-comparison-for-different-years">Performance Comparison for Different Years</h4>

<h4 id="overview-1">Overview</h4>
<p>This section provides a comparison of the algorithm’s performance across different years (2016, 2017, and 2018) at one specific site. The performance metrics considered include <code class="language-plaintext highlighter-rouge">videos_missed</code>, <code class="language-plaintext highlighter-rouge">mape</code>, <code class="language-plaintext highlighter-rouge">total_percent_error</code>, <code class="language-plaintext highlighter-rouge">misscounts</code>, <code class="language-plaintext highlighter-rouge">total_herring</code>, <code class="language-plaintext highlighter-rouge">rmse</code>, <code class="language-plaintext highlighter-rouge">mae</code>, and <code class="language-plaintext highlighter-rouge">f1</code>.</p>

<h4 id="metrics-detecting-and-tracking">Metrics: Detecting and Tracking</h4>
<p>To track and count fish effectively, the fish must first be detected and classified in each frame, and the quality of those detections directly affects how well a fish can be tracked from frame to frame.</p>

<h5 id="detecting">Detecting</h5>

\[\text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\text{True Counts}_i - \text{Predicted Counts}_i}{\text{True Counts}_i} \right| \times 100\]

<p>This is a good overall metric that captures how well, across a series of video segments, the algorithm counts the objects. In my use-case, recording is triggered by movement (which can be caused by seaweed as well as fish), so each video capturing fish is ~30 seconds long. Because MAPE is a percentage error, missing the only fish in a video is penalized more heavily than counting 1 of 2 fish in another video. This matters since many fish swim in parallel, which adds to the difficulty of detection.</p>
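<p>As a quick illustration of this penalization, MAPE over per-video counts can be sketched in a few lines of Python (a minimal sketch, not the project’s evaluation code):</p>

```python
def mape(true_counts, pred_counts):
    """Mean Absolute Percentage Error over per-video fish counts.

    Videos with a true count of 0 are skipped, since percentage
    error is undefined there.
    """
    errors = [
        abs(t - p) / t
        for t, p in zip(true_counts, pred_counts)
        if t > 0
    ]
    return 100 * sum(errors) / len(errors)

# Missing the only fish in a video (1 -> 0) is a 100% error,
# while counting 1 of 2 fish in another video is only a 50% error.
```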

<h4 id="performance-metrics">Performance Metrics</h4>
<p>Here’s a breakdown of the performance metrics for each year:</p>

<table>
  <thead>
    <tr>
      <th>Year</th>
      <th>Videos Missed</th>
      <th>MAPE</th>
      <th>Total Percent Error</th>
      <th>Misscounts</th>
      <th>Total Herring</th>
      <th>RMSE</th>
      <th>MAE</th>
      <th>F1</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>2016</td>
      <td>4</td>
      <td>0.196667</td>
      <td>0.333333</td>
      <td>7</td>
      <td>21</td>
      <td>1.9</td>
      <td>0.7</td>
      <td>0.333333</td>
    </tr>
    <tr>
      <td>2017</td>
      <td>3</td>
      <td>0.09375</td>
      <td>0.12</td>
      <td>3</td>
      <td>25</td>
      <td>0.1875</td>
      <td>0.1875</td>
      <td>0.856459</td>
    </tr>
    <tr>
      <td>2018</td>
      <td>11</td>
      <td>0.166667</td>
      <td>0.12</td>
      <td>9</td>
      <td>75</td>
      <td>0.543478</td>
      <td>0.326087</td>
      <td>0.611905</td>
    </tr>
  </tbody>
</table>

<h4 id="discussion">Discussion</h4>
<p>The performance variance across years can be attributed to several factors, including variations in video quality, lighting conditions, and the presence of other objects that can reduce detection confidence in herring from frame to frame. These factors pose challenges for accurate fish detection and tracking, leading to differences in the algorithm’s effectiveness across years. The primary driver of better performance seems to be light quality that clearly defines the fish. Despite these challenges, the algorithm has demonstrated consistent improvement over time, as evidenced by the evolving metrics.</p>

<h3 id="data-preparation">Data Preparation</h3>

<h4 id="video-considerations">Video Considerations</h4>
<p>The ideal sample rate depends on how fast objects move. For slower videos, a lower rate might suffice. But to capture every detail of speedy objects, you’ll need to annotate more frames, ensuring you have enough data to calculate their velocity accurately.</p>

<h4 id="due-to-the-extreme-varition-in-camera-quality-and-image-quality-the-following-types-of-image-augmentation-have-become-extremely-usefuly-in-the-quality-of-model">Due to the extreme variation in camera and image quality, the following types of image augmentation have proven extremely useful for model quality:</h4>

<h5 id="example-image-augmentation">Example Image Augmentation</h5>
<p><img src="/assets/images/mitfishery/example_augmentation.png" alt="Click to view video" /></p>

<h5 id="example-of-challenging-atmospheric-conditions-for-id-ing-of-fish-for-purposes-of-tracking">Example of Challenging Atmospheric Conditions for ID-ing of Fish for Purposes of Tracking</h5>
<p><img src="/assets/images/mitfishery/murky_water_2018.png" alt="Click to view video" /></p>

<h4 id="example-of-the-need-for-more-data-when-counting-and-tracking">Example of the need for more data when counting AND tracking</h4>
<p><img src="/assets/images/mitfishery/two_fish_2018.png" alt="Click to view video" /></p>

<h4 id="breakdown-of-different-challenges-with-data-quality">Breakdown of different challenges with data quality:</h4>

<p>1) <strong>Illumination Challenge:</strong> In object tracking, illumination challenges arise when the lighting conditions change dramatically across frames or in different parts of the video. This variation in brightness, shadows, and highlights can affect the appearance of the object, making it difficult for the tracker to maintain accurate identification.</p>

<p><img src="/assets/images/mitfishery/illumination.jpg" alt="Illumination &gt;&lt;" /></p>

<p>2) <strong>Occlusion Challenge:</strong> Occlusion occurs when an object being tracked is partially or completely obscured by other objects, elements, or even itself. Occlusions can disrupt the object’s visibility, leading to potential misidentifications or temporary loss of tracking. This is particularly problematic here since we want to pick up each fish as soon as possible to establish a unique tracking ID.</p>

<p><img src="/assets/images/mitfishery/occlusion.jpg" alt="Occlusion &gt;&lt;" /></p>

<p>3) <strong>Deformation Challenge:</strong> Deformation challenges refer to situations where the object of interest undergoes significant shape changes over time. This could occur due to the object’s inherent flexibility or interactions with other objects. Tracking such deformable objects requires handling non-rigid transformations effectively.</p>

<p><img src="/assets/images/mitfishery/deformation.gif" alt="Out of Plane &gt;&lt;" /></p>

<p>4) <strong>Noise Corruption/Blurring Challenge:</strong> Noise in the image data can be caused by various factors, such as sensor limitations, compression artifacts, or environmental conditions. Noise corruption poses a challenge to object tracking by introducing unwanted fluctuations and reducing the clarity of the object’s features.</p>

<p><img src="/assets/images/mitfishery/noise_corruption_blurring.jpg" alt="Occlusion &gt;&lt;" /></p>

<p>5) <strong>Out of Plane Rotation Challenge:</strong> Out of plane rotation occurs when the object is viewed from different angles, leading to changes in its appearance and perspective across frames. Accurately tracking objects with out of plane rotations requires handling geometric transformations.</p>

<p><img src="/assets/images/mitfishery/out_of_plane.gif" alt="Out of Plane &gt;&lt;" /></p>

<p>6) <strong>Motion Blurring Challenge:</strong> Motion blur is caused by the relative motion between the camera and the object during exposure, resulting in smearing of the object’s appearance. Tracking objects affected by motion blur requires robustness to handle the loss of fine details and sharpness in the image.</p>

<p><img src="/assets/images/mitfishery/motion_blurring.jpg" alt="Motion Blurring &gt;&lt;" /></p>

<h4 id="annotation-and-general-approach-notes-for-object-counting-in-video">Annotation and General Approach Notes for Object Counting in Video</h4>
<ul>
  <li>When splitting images between train/val/test, split by the complete journey of an object across the screen. I.e., put all the images of one fish traveling across the screen into train, then a separate fish’s journey into val.</li>
  <li>Set the confidence threshold of your YOLO inference at the confidence that maximizes your F1 score, e.g. a max of 98% @ confidence = 0.32. Although maximizing recall is tempting, the F1 score is more directly correlated with tracking a fish across the screen.</li>
  <li>Aim for ~1,000 instances of each object class that is to be tracked.</li>
</ul>
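<p>Choosing the confidence cut-off that maximizes F1 can be sketched as a simple search over candidate thresholds; the precision/recall values here would come from a validation sweep, and the numbers below are illustrative:</p>

```python
def best_confidence_threshold(thresholds, precisions, recalls):
    """Pick the detector confidence cut-off that maximizes F1.

    Inputs are parallel lists: precision and recall evaluated at
    each candidate confidence threshold on a validation set.
    """
    best_t, best_f1 = None, -1.0
    for t, p, r in zip(thresholds, precisions, recalls):
        # Harmonic mean of precision and recall (0 if both are 0).
        f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```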

<h4 id="what-does-a-label-balanced-look-like-in-object-detection">What does a label “balanced” look like in object detection?</h4>
<p>In this context, “balanced” doesn’t refer to the visual appearance of a label itself. Instead, it describes the distribution of labels within a dataset used to train an object detection model. A balanced dataset ensures that all object categories have roughly the same number of examples. This is crucial for optimal model performance, as an unbalanced dataset can lead the model to prioritize the frequently occurring categories and struggle to detect less frequent ones. Ideally, a “balanced” label distribution would have a similar number of labels for each object class, creating a level playing field for the model to learn from all categories effectively.</p>

<p><img src="/assets/images/mitfishery/2d_distribution.png" alt="reasonable balance &gt;&lt;" /></p>
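<p>As a rough illustration, a per-class tally over annotation records is enough to spot imbalance. The record format below is a simplified assumption, not the project’s actual annotation schema:</p>

```python
from collections import Counter

def label_distribution(annotations):
    """Tally per-class label counts from a list of annotation records.

    Assumes each record is a dict with a "class" key (a simplified,
    hypothetical annotation format).
    """
    counts = Counter(a["class"] for a in annotations)
    total = sum(counts.values())
    # Return both the raw count and the fraction per class.
    return {cls: (n, n / total) for cls, n in counts.items()}
```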

<h4 id="dataset-selection-and-training">Dataset Selection and Training:</h4>
<p>“garbage in, garbage out.” So, I spent a good amount of time curating a diverse dataset with annotated fish images and videos. Lighting conditions, backgrounds, and different fish species etc. By training the model on a curated dataset, I witnessed significant improvements in detection performance.</p>

<h5 id="holdout-set-for-validation">Holdout Set for Validation:</h5>
<p>To make sure our counting game is on point, I set aside a holdout set of videos with ground truth fish counts. It’s like having a benchmark to compare against. I tested the algorithm’s count predictions against the ground truth, giving us valuable insights into its accuracy and effectiveness. No fishy business here!</p>

<h4 id="bayesian-optimization-with-wandbsweeps">Bayesian Optimization with wandb.sweeps:</h4>
<p>I gained enough confidence to narrow down my parameter search space. With the help of <a href="https://docs.wandb.ai/guides/sweeps">Wandb.Sweeps’</a> Bayesian optimization capabilities, I let the algorithm do its magic overnight. It efficiently explored the parameter space and brought me some impressive results. It’s like having a super-smart assistant working while I catch some zzz’s.</p>
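<p>A Bayesian sweep in wandb is driven by a small configuration object; the sketch below shows the general shape, with hypothetical hyperparameter names and ranges rather than the project’s actual search space:</p>

```python
# Illustrative wandb sweep configuration for Bayesian optimization.
# The hyperparameters and ranges below are hypothetical examples,
# not the project's actual search space.
sweep_config = {
    "method": "bayes",  # Bayesian optimization over the parameter space
    "metric": {"name": "mAP50-95", "goal": "maximize"},
    "parameters": {
        "lr0": {"min": 1e-5, "max": 1e-2},       # initial learning rate
        "momentum": {"min": 0.8, "max": 0.98},
        "batch_size": {"values": [8, 16, 32]},
    },
}

# Launching the sweep (requires wandb and a `train` function
# that logs the target metric):
# sweep_id = wandb.sweep(sweep_config, project="mit-fishery")
# wandb.agent(sweep_id, function=train)
```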

<h4 id="tracking">Tracking</h4>

<p>The <a href="https://arxiv.org/abs/2206.14651">“botsort” algorithm</a> leverages motion patterns to estimate object displacement, matches appearances to maintain consistency across frames, and predicts future positions based on historical trajectory data. This comprehensive approach has significantly improved tracking accuracy and facilitated reliable fish counting.</p>

<p>It was selected because it had the best MOTA (Multiple Object Tracking Accuracy) that considers identity switches, false positives, and false negatives.</p>
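<p>As a minimal, illustrative sketch, MOTA can be computed from per-frame error tallies like so:</p>

```python
def mota(false_positives, false_negatives, id_switches, ground_truth_counts):
    """Multiple Object Tracking Accuracy.

    Each argument is a per-frame list of counts; ground_truth_counts
    is the number of true objects present in each frame.
    """
    errors = sum(
        fp + fn + idsw
        for fp, fn, idsw in zip(false_positives, false_negatives, id_switches)
    )
    return 1 - errors / sum(ground_truth_counts)
```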

\[\text{MOTA} = 1 - \frac{\frac{1}{n} \sum_{i=1}^{n}(\text{False Positives}_i + \text{False Negatives}_i + \text{Identity Switches}_i)}{\frac{1}{n} \sum_{i=1}^{n}(\text{Total Number of Ground Truth Objects}_i)}\]]]></content><author><name>Austin Powell</name></author><category term="Computer-Vision" /><summary type="html"><![CDATA[A computer vision-based system to count fish under various conditions entering Massachusetts Fisheries.]]></summary></entry><entry><title type="html">PyBay 2020 Lightning Talk on Predicting NoShows</title><link href="https://aus10powell.github.io/talks/ml/healthcare/pybay-lightning-talk/" rel="alternate" type="text/html" title="PyBay 2020 Lightning Talk on Predicting NoShows" /><published>2020-08-16T00:00:00+00:00</published><updated>2020-08-16T00:00:00+00:00</updated><id>https://aus10powell.github.io/talks/ml/healthcare/pybay-lightning-talk</id><content type="html" xml:base="https://aus10powell.github.io/talks/ml/healthcare/pybay-lightning-talk/"><![CDATA[<p><img src="/assets/images/predictingNoShows/PyBayNoShows.png" alt="PyBayNoShows" />
<a href="/assets/images/predictingNoShows/PyBay NoShows.pdf">Download PDF</a>
<!-- <a href="/assets/images/predictingNoShows/PyBayNoShows.pdf" target="/assets/images/predictingNoShows/PyBayNoShows.png">PDF.</a> --></p>]]></content><author><name>Austin Powell</name></author><category term="Talks" /><category term="ML" /><category term="Healthcare" /><summary type="html"><![CDATA[An overview of common industry challenge solved by machine learning]]></summary></entry><entry><title type="html">Poizon Plants</title><link href="https://aus10powell.github.io/computer/vision/ai/machine-learning/poison-oak/plants/poizon-plants/" rel="alternate" type="text/html" title="Poizon Plants" /><published>2020-05-26T00:00:00+00:00</published><updated>2020-05-26T00:00:00+00:00</updated><id>https://aus10powell.github.io/computer/vision/ai/machine-learning/poison-oak/plants/poizon-plants</id><content type="html" xml:base="https://aus10powell.github.io/computer/vision/ai/machine-learning/poison-oak/plants/poizon-plants/"><![CDATA[<p><strong>A practical iOS app identifying poison oak.</strong></p>
<h1 id="poizon-plants">Poizon Plants</h1>
<h2 id="table-of-contents">Table of Contents</h2>
<ul id="markdown-toc">
  <li><a href="#poizon-plants" id="markdown-toc-poizon-plants">Poizon Plants</a>    <ul>
      <li><a href="#table-of-contents" id="markdown-toc-table-of-contents">Table of Contents</a>        <ul>
          <li><a href="#intromotivation" id="markdown-toc-intromotivation">Intro/Motivation</a></li>
          <li><a href="#cold-start-problem" id="markdown-toc-cold-start-problem">Cold-Start Problem</a>            <ul>
              <li><a href="#object-detections-vs-classification" id="markdown-toc-object-detections-vs-classification">Object Detections Vs Classification</a></li>
              <li><a href="#labeling-process" id="markdown-toc-labeling-process">Labeling Process</a></li>
              <li><a href="#choice-of-cut-off-probability" id="markdown-toc-choice-of-cut-off-probability">Choice of Cut-off Probability</a></li>
            </ul>
          </li>
        </ul>
      </li>
      <li><a href="#ios-app" id="markdown-toc-ios-app">iOS App</a></li>
      <li><a href="#modeling" id="markdown-toc-modeling">Modeling</a>        <ul>
          <li><a href="#training" id="markdown-toc-training">Training</a></li>
        </ul>
      </li>
      <li><a href="#tayloring-model-for-mobile-platform-converting-tensorflow-to-tf-lite" id="markdown-toc-tayloring-model-for-mobile-platform-converting-tensorflow-to-tf-lite">Tayloring model for mobile platform (Converting Tensorflow to TF-Lite)</a>        <ul>
          <li><a href="#convert-keras-trained-model-to-tf-lite" id="markdown-toc-convert-keras-trained-model-to-tf-lite">Convert Keras trained model to TF-lite</a></li>
          <li><a href="#save-model-for-use-in-ios" id="markdown-toc-save-model-for-use-in-ios">Save model for use in iOS</a></li>
        </ul>
      </li>
      <li><a href="#other-training-lessons-learned" id="markdown-toc-other-training-lessons-learned">Other Training Lessons Learned</a></li>
      <li><a href="#possible-next-steps" id="markdown-toc-possible-next-steps">Possible Next Steps:</a></li>
      <li><a href="#conclusion" id="markdown-toc-conclusion">Conclusion</a>        <ul>
          <li><a href="#references" id="markdown-toc-references">References</a></li>
          <li><a href="#papers-i-checked-out" id="markdown-toc-papers-i-checked-out">Papers I checked out</a></li>
        </ul>
      </li>
    </ul>
  </li>
</ul>

<style>
    .iframe-container {
        float: right;
        margin-right: 10px;
        margin-left: 10px;
    }
</style>

<div class="iframe-container">
  <iframe src="https://aus10powell-poizon-app.hf.space" width="800" height="1500" display="block" scrolling="yes" frameborder="0"></iframe>
</div>

<p><img src="/assets/images/poizon_plants/poizon_plants_app.jpg" alt="image" style="float: left; margin-right: 1em;" /></p>

<h3 id="intromotivation">Intro/Motivation</h3>
<p>I have gotten poison oak multiple times. While exposure to the oil found on the plant will not cause an allergic reaction for everyone, the fact remains it’s not fun. Also, it should be noted that while some people definitely do not have an allergic reaction, there is no proof that you won’t develop an allergic reaction over time with continued exposure, according to the American Osteopathic College of Dermatology and other sources. Basically, no one is really safe.</p>

<p>Of course, if you don’t live on the West Coast, all of this might not matter too much. While <a href="https://en.wikipedia.org/wiki/Urushiol">Urushiol</a> (the oil on poison oak causing the allergic reaction) can be found in plants all over the world, it seems to really love the North American West Coast. Those of us living in California or Oregon for more than a few years are likely familiar with, or at least have heard of, poison oak. But even people native to California (much less one of the millions of tourists) have trouble identifying the plant if it’s not in its signature glowing, oily red. It seems like a good use-case for computer vision.</p>

<p>Given that mobile phones are ubiquitous even when out enjoying nature, creating a poison oak app seemed like a useful project and learning opportunity.</p>

<p>Side note: I had no interest in duplicating effort for something that already had a solution, so I did a little research to see if there were any existing solutions for this niche personal challenge. Interestingly enough, there were already a few apps on the iOS store that were simple classification apps like the one I proposed.</p>

<h3 id="cold-start-problem">Cold-Start Problem</h3>
<p><img src="/assets/images/poizon_plants/classic_red_poison_oak.jpg" alt="image" title="Classic Red Poison Oak Bush" style="float: right; margin-right: 1em;" /></p>

<p>As with any supervised learning challenge, it is critical to get a good source of labeled data. Perhaps the obvious place to start for poison oak was Google Images. Using a Chrome image-downloader extension (<a href="https://chrome.google.com/webstore/detail/imageye-image-downloader/agionbommeaifngbhincahgmoflcikhm?hl=en">here</a>), I was able to download large quantities of images.</p>

<p>Semantics of the search were somewhat important (“poison oak”, “poison oak bush”, “poison oak autumn”, etc.), so I focused on searches that captured images of the plant across different seasons, the idea being to capture as much of poison oak’s heterogeneity as possible for users of the app. Unsurprisingly, this is the first place the data revealed its bias: the season most often captured seemed to be when the plant is at its most obvious, red or orangish-red starting at the end of summer. But that only covers poison oak during the latter half of the year, and not necessarily when it’s at its oiliest.</p>

<p>For the “not poison” images, I tried to cover a broad flora that might exist geographically with poison oak and especially on plants whose leaves I thought would confuse the average hiker. This initial dataset netted me <strong>about 3k images</strong> in total after cleaning out the expected garbage images (cartoons images, stock photo images with watermarks, etc.). Since transfer learning is available, this was a great start and stage to baseline model performance.</p>

<p>To get a preliminary read on the signal present in the dataset, I trained a ResNet50 on an 80/20 random split of the data. With minor tuning, I achieved a validation accuracy of over 98%. However, this result appeared suspiciously ideal and indicated strong bias in the dataset: I suspect the images obtained from Google Images had themselves been surfaced by a model similar to the one I used for training, with minor differences in data cleaning and algorithms being all that kept the model trained on the Google Images short of 100% accuracy.</p>
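<p>The 80/20 random split itself is straightforward; here is a minimal, reproducible sketch using only the standard library (the seed value is arbitrary):</p>

```python
import random

def train_val_split(image_paths, val_fraction=0.2, seed=42):
    """Random train/validation split over image file paths.

    A seeded shuffle keeps the split reproducible across runs.
    Returns (train_paths, val_paths).
    """
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_val = int(len(paths) * val_fraction)
    return paths[n_val:], paths[:n_val]
```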

<p>Of course, the only solution to poor data is to get more data, ideally data that captures more of the variety in the use-case you’re modeling for. The benefit of acquiring data this way (collecting and labeling my own photos) is that I would be:</p>
<ul>
  <li>1) Labeling on the go and would have more intuition what my model was and wasn’t understanding well.</li>
  <li>2) Refining the concept-space of what constituted a reasonable recognition of an image containing poison oak since this could include multiple plants.</li>
  <li>3) Have a reason to go on more hikes.</li>
</ul>

<h4 id="object-detections-vs-classification">Object Detection vs. Classification</h4>
<p>The question arises as to why we don’t treat this strictly as an object detection problem. In an ideal scenario, you would draw a bounding box around the identified leaf or plant. However, the project is anchored in a practical use-case: an individual with a cell phone who wants to keep their distance from the plant in question. As they incrementally approach, their certainty about the classification converges.</p>

<h4 id="labeling-process">Labeling Process</h4>

<p>There is a growing demand for quality labeling services. Perhaps the best known is <a href="https://www.mturk.com">Mechanical Turk</a>, which provides relatively inexpensive labeling. The challenge there is finding domain experts; in my case it would most likely have meant relying on non-Californians to decide whether a plant was poison oak. For an in-house approach, services such as <a href="https://labelbox.com">LabelBox</a> and <a href="https://spacy.io/universe/project/prodigy/">spaCy Prodigy</a> greatly improve the efficiency of human labeling. While I’ve had great experiences with Prodigy for NLP projects, LabelBox provides limited free access, so I used their product. Regardless, it’s a tedious process.</p>

<p>Again, I hand-labeled 8,000 images that I took with my iPhone 7 Plus!</p>

<h4 id="choice-of-cut-off-probability">Choice of Cut-off Probability</h4>
<p>This is a choice I’ve rarely seen discussed for binary classification problems (at least in workshops/tutorials/books), yet it can have great practical implications for the end-user. Generally, the cut-off for a binary classifier is assumed to be 50/50 when deciding whether to bin the output of the softmax as a 1 or a 0. Based on feedback from the different people I had test the app, I chose instead to present a degree of certainty that a given image was poison oak. Binning the probabilities into categories such as “possibly poison oak”, “definitely poison oak”, etc. gave the user better intuition than a 60% vs. an 80% probability.</p>
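<p>A minimal sketch of that binning, assuming the model emits a single P(poison) from the softmax (the thresholds and label strings here are illustrative, not the exact ones shipped in the app):</p>

```python
def certainty_label(p_poison):
    """Map a softmax probability to a user-facing certainty bucket."""
    if p_poison >= 0.9:
        return "definitely poison oak"
    elif p_poison >= 0.6:
        return "possibly poison oak"
    elif p_poison >= 0.4:
        return "unsure"
    else:
        return "probably not poison oak"
```

<p>The point is that a hiker acts the same on 92% as on 99%, so collapsing those into one bucket loses nothing while making the output far easier to act on.</p>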

<figure class="half">
<a href="/assets/images/poizon_plants/IMG_2268.jpg"><img src="/assets/images/poizon_plants/probability_cutoff_not_poison.jpg" style="width:100%;height:90%" /></a>
<a href="/assets/images/poizon_plants/IMG_3408.jpg"><img src="/assets/images/poizon_plants/probability_cutoff_is_poison.jpg" style="width:100%;height:90%" /></a>
<figcaption>Probability Cut-off.</figcaption>
</figure>

<h2 id="ios-app">iOS App</h2>
<ul>
  <li><strong>App Starting point:</strong> <a href="https://developer.apple.com/documentation/createml/creating_an_image_classifier_model">Creating an image classifier</a></li>
  <li><strong>Adds on app:</strong> <a href="https://firebase.google.com/docs/admob">Admob and Firebase</a></li>
</ul>

<h2 id="modeling">Modeling</h2>
<p>While initially baselining the model was quite easy with Apple’s CoreML, it quickly became obvious that the developer-friendly approach to ML wasn’t going to cut it for my self-assigned project. It was a great generalizer and got me from 0 to 60 super quick.</p>

<h3 id="training">Training</h3>
<ul>
  <li><strong>Model Tracking:</strong> Utilization of Weights &amp; Biases (<a href="https://www.wandb.com/">wandb.com</a>) was very useful when iterating over different models. It even has an image viewer so that you can peek at how your model is predicting example images while it is training. <a href="https://www.wandb.com/articles/introduction-hyperparameter-sweeps">Wandb Hyper-parameter Sweeps</a> was also very helpful in reducing manual rerunning of hyperparameters.</li>
</ul>

<figure class="half">
<a href="/assets/images/poizon_plants/wb_terminal.png"><img src="/assets/images/poizon_plants/wb_terminal.png" style="width:100%;height:90%" /></a>
<a href="/assets/images/poizon_plants/wb_dashboard.png"><img src="/assets/images/poizon_plants/wb_dashboard.png" style="width:100%;height:90%" /></a>
    <figcaption>Contextual image information may often be a confounder.</figcaption>
</figure>

<ul>
  <li><strong>Choice of architecture:</strong>
Because I wanted a very controlled way of dealing with outliers (e.g. more distant pictures of a bush of poison oak), it was necessary to use both dropout and L2 regularization (<a href="http://www.chioka.in/differences-between-l1-and-l2-as-loss-function-and-regularization/">here is a good blog post on the topic</a>). L2 was a reasonable choice for regularization since I wanted to smooth more for those outliers (as opposed to L1, which does not penalize large weights as strongly). The reason for this is in the figure below:
    <ul>
      <li>We don’t particularly care whether this image has a probability of 92% vs. 99%.</li>
      <li>We DO care whether THIS image has a probability of 60% vs. 5%.</li>
    </ul>
  </li>
</ul>
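<p>The different treatment of large outlier-driven weights by the two penalties can be seen directly in a framework-free sketch (the weight values are made up for illustration):</p>

```python
def l1_penalty(weights, lam=1.0):
    """L1 regularization term: lambda * sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam=1.0):
    """L2 regularization term: lambda * sum of squared weights."""
    return lam * sum(w * w for w in weights)

small = [0.1, 0.1, 0.1]
outlier = [0.1, 0.1, 3.0]  # one large weight, e.g. driven by an outlier image

# L2 penalizes the large weight quadratically, L1 only linearly,
# so L2 pushes back much harder on weights blown up by outliers.
```

<p>This is exactly the smoothing behavior wanted here: the quadratic penalty discourages the network from letting a few unusual training images dominate any single weight.</p>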

<h2 id="tayloring-model-for-mobile-platform-converting-tensorflow-to-tf-lite">Tailoring the model for a mobile platform (Converting TensorFlow to TF Lite)</h2>
<ul>
  <li><strong>Why:</strong>
    <ul>
      <li>CoreML is not as well supported (documentation, etc.) for complex ML training</li>
      <li>TF-Lite can also easily be used on Android devices (or pretty much anywhere else, for that matter)</li>
      <li>TF Lite offers several optimizations:
        <ul>
          <li>Default</li>
          <li>Latency</li>
          <li>Size</li>
        </ul>
However, in my testing (TF v2.3.0) there was a significant (~5%) drop in F1 when optimizing for Latency or Size, and not a <u>significant change in user experience</u>.</li>
    </ul>
  </li>
</ul>

<h3 id="convert-keras-trained-model-to-tf-lite">Convert Keras-trained model to TF Lite</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># TF: v2.3.0
# Python: v3.8.2

import tensorflow as tf
from tensorflow.keras.models import load_model

# Load model (I had saved as a .model file and it worked fine)
model = load_model(model_path, custom_objects=dependencies, compile=True)

converter = tf.lite.TFLiteConverter.from_keras_model(model)

# Set to True to optimize the model for latency
optimize = False
if optimize:
    print('[INFO] Optimizing TF Lite model')
    converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_LATENCY]
tflite_model = converter.convert()
</code></pre></div></div>
<p>In my testing, the optimized models did not significantly change the user experience (even on my dated iPhone 8 Plus). The F1-score did drop a few points, however, and that was enough for me to stick with the default settings in TF Lite. I assume that on much smaller IoT devices, latency and size would greatly outweigh the relatively small metric gains of the default.</p>

<h3 id="save-model-for-use-in-ios">Save model for use in iOS</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">pathlib</span>

<span class="c1"># Write model to file
</span><span class="n">tflite_model_file</span> <span class="o">=</span> <span class="n">pathlib</span><span class="p">.</span><span class="n">Path</span><span class="p">(</span><span class="s">'./poison_not_poison.tflite'</span><span class="p">)</span>
<span class="n">tflite_model_file</span><span class="p">.</span><span class="n">write_bytes</span><span class="p">(</span><span class="n">tflite_model</span><span class="p">)</span>

<span class="c1"># Write labels to file
</span><span class="n">labels</span> <span class="o">=</span> <span class="p">[</span><span class="s">'is_poison'</span><span class="p">,</span><span class="s">'not_poison'</span><span class="p">]</span>

<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">'poison_labels.txt'</span><span class="p">,</span><span class="s">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="n">f</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">labels</span><span class="p">))</span>
</code></pre></div></div>

<div style="text-align:center"><img src="/assets/images/poizon_plants/xcode_model.png" /><figcaption>Drop model in iOS app</figcaption></div>

<h2 id="other-training-lessons-learned">Other Training Lessons Learned</h2>
<ul>
  <li>
    <p>Initially, I made all layers trainable, which resulted in an extremely spiky but generally decreasing validation loss, and only after a huge number of epochs (250). After running some experiments, I found a much steadier decrease in validation loss when training only the last 20 layers, which gave a smoother convergence and higher overall accuracy. I also experimented with different optimizers, including Adam and SGD, and found that Adam yielded the best results. In addition, I used data augmentation techniques such as random rotations and flips to further improve the model’s robustness. Finally, I fine-tuned hyperparameters such as the learning rate and batch size to achieve the best possible performance.</p>
  </li>
  <li>
    <p><strong>Choice of object detection vs classification task</strong>
When thinking about how to make the app practical for hikers vs. what was practical for modeling, there seemed to be two choices: an object detection task or a classification task.</p>
  </li>
</ul>

<div style="text-align:center"><img src="/assets/images/poizon_plants/object_dection_bush.jpg" /><figcaption>Practically speaking, it seems more useful that a person would pull out their phone and sweep the camera over a range of plants. Looking ahead, this would involve a lot of manual labeling of bounding boxes and too much work. The project could always be easily extended to include this scope also. So classification it was.</figcaption></div>

<ul>
  <li>
    <p><strong>Getting to &gt; 90% F1 score</strong>
This goal was set both for practical reasons (having a classifier app that is a satisfying product) and as a mental milestone for “I understand training an image classifier”. There was a lot of learning to be had between the 87% F1 score I struggled with for a while and the &gt;90% I finally achieved.</p>
  </li>
  <li>
    <p>It was particularly important to optimize for the 299x299 image size used by iOS during model development, for performance, compatibility with the iOS platform, and consistent testing.</p>
  </li>
  <li>
    <p><strong>Focus on your highest-loss images to understand true performance</strong> 
  This was useful both for cleaning bad/poor quality images as well as finding areas where potentially you might need to break out a new class, e.g. “Unknown bush”</p>
  </li>
  <li><strong>Think your data is well-labeled? Think again…and again:</strong>
    <ul>
      <li>Despite having hand-labeled thousands of images, examining the images with the top log-loss revealed my own human error. I think one reason is that during labeling I had additional context for “yes, this is poison oak”, such as having just walked past a large bush of it. Looking only at the photo with no additional context, which is all the neural net gets, it was often not clear even to my human eye.</li>
    </ul>
  </li>
  <li><strong>How do you know when your data is enough?</strong>
For the practicality of my problem, where a region captured with a phone camera “may contain poison oak”, it was not immediately clear whether I had taken pictures with sufficient variation. This issue first surfaced when a reddish plant that was not poison oak caused trouble for the algorithm. It presented a bit of a challenge because, while red is a strong indicator (for both humans and neural nets) for identifying sumac plants, it is not a definitive rule for identifying poison oak.
    <figure class="half">
  <a href="/assets/images/poizon_plants/IMG_2268.jpg"><img src="/assets/images/poizon_plants/IMG_2268.jpg" style="width:100%;height:90%" /></a>
  <a href="/assets/images/poizon_plants/IMG_3408.jpg"><img src="/assets/images/poizon_plants/IMG_3408.jpg" style="width:100%;height:90%" /></a>
  <figcaption>Strong red color can be indicative of poison oak but also a strong false positive.</figcaption>
  </figure>
  </li>
  <li>
    <p><strong>Training</strong>
Augmentation is quite helpful for this use-case, shift and rotation especially. But I hit a point where my validation accuracy would plateau around 70%: it wouldn’t get better, but it wouldn’t get worse either. This was due to the rookie mistake of augmenting the validation data as well as the training data.</p>
  </li>
  <li><strong>A fair comparison:</strong>
    <figure class="half">
  <a href="/assets/images/poizon_plants/IMG_4273..jpg"><img src="/assets/images/poizon_plants/IMG_4273.jpg" style="width:100%;height:90%" /></a>
  <a href="/assets/images/poizon_plants/IMG_IMG_4274.jpg"><img src="/assets/images/poizon_plants/IMG_4274.jpg" style="width:100%;height:90%" /></a>
   <figcaption>Contextual image information may often be a confounder.</figcaption>
  </figure>
  </li>
</ul>
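<p>The “highest-loss first” review from the lessons above is easy to script: compute the cross-entropy of each prediction against its label and inspect the worst offenders first. A sketch with made-up file names and probabilities:</p>

```python
import math

def log_loss(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy for a single example."""
    p = min(max(p_pred, eps), 1 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# (filename, true label, predicted P(poison)) -- illustrative values
preds = [
    ("oak_1.jpg", 1, 0.97),
    ("oak_2.jpg", 1, 0.05),   # confident miss: review this one first
    ("fern_1.jpg", 0, 0.40),
]
worst_first = sorted(preds, key=lambda r: log_loss(r[1], r[2]), reverse=True)
```

<p>Scanning the top of this list is what surfaced both mislabeled images and candidates for a new class like “unknown bush”.</p>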

<h2 id="possible-next-steps">Possible Next Steps:</h2>
<p>Model experimentation with better feature extraction to take better advantage of the obvious leaf colors (edge detection, color space conversion, texture analysis, etc.). Different model types such as a Probabilistic Neural Network (PNN) could accommodate a level of uncertainty among densely populated leaves, in addition to being faster, which is always an advantage on edge devices. PNNs would be particularly useful for classifying many different species of plants, where the number of plants can potentially be very large and dense.</p>

<h2 id="conclusion">Conclusion</h2>
<p>If you’ve gotten this far: thanks for reading!</p>

<p>Several training lessons were learned during the development of the Poizon Plant iOS app. I discovered that training only the last 20 layers and using the Adam optimizer led to a smoother convergence and higher overall accuracy. Data augmentation techniques, such as random rotations and flips, were used to further improve the model’s robustness. In addition, the hyperparameters of the model were fine-tuned to achieve the best possible performance. The choice of a classification task over object detection was made for practical reasons, and a goal of achieving over 90% F1 score was set. I found that focusing on high-loss images helped to understand true performance, and careful attention was given to labeling the data properly. Also, the use of GPUs was important to maintain the momentum of the project. Finally, a fair comparison between images was essential to avoid contextual information acting as a confounder. Overall, these lessons demonstrate the importance of careful consideration and experimentation during the development of deep learning models.</p>

<p>I deployed the model using TensorFlow Lite, a mobile framework for running machine learning models on mobile devices. To make the model even more efficient on mobile, I used the MobileNetV2 architecture, which is designed specifically for mobile devices and has a smaller memory footprint than other CNN architectures. With the model deployed and optimized for mobile, it was ready for use in real-world applications.</p>

<h3 id="references">References</h3>
<ul>
  <li><strong><a href="https://apps.apple.com/us/app/poizon-plant/id1475980295" title="Link to iOS App">Poizon Plant iOS app</a></strong> (Note: the app being available to download is contingent upon free ads covering the registration cost of iOS apps. If the link is broken, it’s likely because there wasn’t enough ad revenue)</li>
</ul>

<h3 id="papers-i-checked-out">Papers I checked out</h3>
<ul>
  <li><a href="https://ieeexplore.ieee.org/abstract/document/4458016">A Leaf Recognition Algorithm for Plant Classification Using Probabilistic Neural Network</a></li>
  <li><a href="https://arxiv.org/abs/2003.05379">Plant Disease Detection Using Deep learning</a></li>
  <li><a href="https://arxiv.org/pdf/2004.12599.pdf">Deploying Image Deblurring across Mobile Devices: A Perspective of Quality
and Latency</a></li>
</ul>]]></content><author><name>Austin Powell</name></author><category term="Computer" /><category term="Vision" /><category term="AI" /><category term="Machine-Learning" /><category term="Poison-Oak" /><category term="Plants" /><summary type="html"><![CDATA[A computer vision classifier to make identifying poison oak practical.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://aus10powell.github.io/assets/images/poizon_plants/poizon_plants_app.jpg" /><media:content medium="image" url="https://aus10powell.github.io/assets/images/poizon_plants/poizon_plants_app.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Word Embedding Comparisons for Disease Name Entity Recognition</title><link href="https://aus10powell.github.io/general/nlp-word-embeddings/" rel="alternate" type="text/html" title="Word Embedding Comparisons for Disease Name Entity Recognition" /><published>2018-10-31T00:00:00+00:00</published><updated>2018-10-31T00:00:00+00:00</updated><id>https://aus10powell.github.io/general/nlp-word-embeddings</id><content type="html" xml:base="https://aus10powell.github.io/general/nlp-word-embeddings/"><![CDATA[<h2 id="intro-and-overview">Intro and Overview</h2>
<p>Named Entity Recognition (NER) is a crucial task in natural language processing, particularly in the field of biomedical research. With the rise of ELMo embeddings, I was curious to see how they stacked up against other popular word embedding techniques for a DNER task. Accurate identification and extraction of disease entities from medical texts is necessary for several applications, such as clinical decision support systems, drug discovery, and epidemiological studies. It can help researchers and healthcare professionals quickly identify relevant information from a vast amount of medical literature and electronic health records, leading to better patient outcomes and more efficient medical research. In this quick comparison, I explore the performance of different word embedding techniques for the DNER task, with a focus on the much-touted ELMo embeddings.</p>

<p>The motivation for comparing word embeddings for Named Entity Recognition (NER) with diseases and adverse conditions stems from the recent popularity of ELMo embeddings in health-related natural language processing tasks. ELMo embeddings have shown promising results in capturing context-specific information and could potentially enhance the performance of NER models in the biomedical domain. Therefore, I aimed to investigate the suitability of ELMo embeddings alongside other popular techniques, such as word2vec, GloVe, and fastText, for DNER tasks.</p>

<p>ELMo embeddings, developed by researchers at Allen Institute for AI, are a breakthrough in the field of natural language processing. These embeddings are unique because they are contextualized: they capture different shades of meaning of a word based on its surrounding context. This is a significant improvement over traditional word embeddings, which treat each word as a static entity independent of its context. The ELMo model is based on a deep, bi-directional language model that learns to predict the next word in a sentence given both its preceding and following words. By doing so, it captures a rich representation of the word’s meaning, informed by both its syntactic and semantic context. The resulting embeddings are highly effective in a wide range of natural language processing tasks, including named entity recognition, sentiment analysis, and question answering.</p>

<p>I compared different word embedding techniques for the Disease Named Entity Recognition (DNER) task, evaluating ELMo against three popular methods (word2vec, GloVe, and fastText) on a biomedical text corpus.</p>

<p>The experiments were conducted on two datasets with distinct characteristics: one consisting of clinical notes from hospital settings, and the other from biomedical literature. The datasets were chosen to reflect the diversity of text sources encountered in real-world DNER tasks. To ensure fair evaluation, I employed a rigorous cross-validation protocol and carefully selected a representative subset of the data for training, validation, and testing. The resulting benchmark allows for meaningful comparisons of the performance of different embedding techniques across different text domains.</p>

<p>To evaluate the performance of the different word embeddings, we used a variety of evaluation metrics, including precision, recall, and F1 score. These metrics provide a comprehensive view of the embeddings’ strengths and limitations in identifying named entities in biomedical text. However, as pointed out by Rob Hyndman in his work on forecasting accuracy measures, it is important not to rely solely on these metrics and to consider other factors such as computational efficiency and interpretability when selecting a word embedding technique. Moreover, we should be cautious of overfitting to a particular dataset and ensure that the embeddings generalize well to new data. By taking a nuanced approach to evaluation, we can gain a better understanding of the capabilities and limitations of each word embedding technique and make informed decisions when selecting an appropriate method for a DNER task.</p>

<p>However, my study also revealed that combining different embedding techniques and incorporating domain-specific features could further improve the performance of the models. I suggested that further research is needed to explore the use of hybrid models for biomedical named entity recognition tasks.</p>

<p>The finding that the choice of embedding method had minimal impact on the models’ performance is important, as it suggests that practitioners can use a range of embedding methods without sacrificing performance. This echoes the sentiment expressed by Rob Hyndman in his work on time series forecasting, where he argues that the choice of forecasting method is less important than other factors such as data quality and feature selection. However, it’s worth noting that this finding is based on the specific datasets and evaluation metrics used in this effort. As such, practitioners should exercise caution in extrapolating these findings to other contexts and datasets.</p>

<p>It’s clear that the choice of word embedding technique can have a significant impact on the performance of natural language processing models, especially for tasks such as DNER.</p>

<h2 id="results">Results</h2>
<p>All scores reported are F1-Micro on BIO format for the Disease and Adverse Effect entities. Essentially, an entity is scored only if it is recognized as a whole: e.g. “heart failure atrial fibrillation chf” counts as a [Disease] match, but “heart failure atrial fibrillation” alone does not.
F1-Micro is the harmonic mean of micro-averaged precision and recall (see below). It is a reasonable (more on that below) choice of metric, as it accounts for the class imbalance between Adverse and Disease entities and we don’t have a particular metric we need to optimize for:</p>
<ul>
  <li>Micro-Precision: (TP1+TP2)/(TP1+TP2+FP1+FP2)</li>
  <li>Micro-Recall: (TP1+TP2)/(TP1+TP2+FN1+FN2)</li>
</ul>
<p>Results:</p>

<table>
  <thead>
    <tr>
      <th>Embeddings</th>
      <th>F1-Micro Score</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>ELMo Embeddings (5.5b,200d)</td>
      <td>0.779 ± 0.02</td>
    </tr>
    <tr>
      <td>EHR/Biomedical Text Embeddings (approx 3b words, w2v cbow, 200d)</td>
      <td>0.493 ± 0.05</td>
    </tr>
    <tr>
      <td>GloVe (42b,300d)</td>
      <td>0.811 ± 0.04</td>
    </tr>
    <tr>
      <td>GloVe (6b,50d)</td>
      <td>0.750 ± 0.04</td>
    </tr>
    <tr>
      <td>GloVe (6b,100d)</td>
      <td>0.780 ± 0.01</td>
    </tr>
    <tr>
      <td>GloVe (6b, 200d)</td>
      <td>0.804 ± 0.04</td>
    </tr>
    <tr>
      <td>GloVe (6b, 300d)</td>
      <td>0.816 ± 0.03</td>
    </tr>
    <tr>
      <td>FastText (16bn, 300d)</td>
      <td>0.791 ± 0.05</td>
    </tr>
  </tbody>
</table>
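<p>As a sketch, the micro-averaged scores defined above reduce to a few lines of code (the per-class counts here are made up for illustration):</p>

```python
def micro_f1(counts):
    """counts: per-entity-class dicts of tp/fp/fn (e.g. Disease, Adverse Effect).

    Micro-averaging pools the counts across classes before computing
    precision and recall, so the larger class dominates the score.
    """
    tp = sum(c["tp"] for c in counts)
    fp = sum(c["fp"] for c in counts)
    fn = sum(c["fn"] for c in counts)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

classes = [
    {"tp": 80, "fp": 10, "fn": 20},  # Disease (illustrative)
    {"tp": 20, "fp": 10, "fn": 10},  # Adverse Effect (illustrative)
]
score = micro_f1(classes)
```

<p>Because the counts are pooled, the class imbalance between Disease and Adverse Effect entities is reflected directly in the score rather than averaged away, which is why it was a reasonable default here.</p>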

<h2 id="conclusion">Conclusion</h2>
<p>The results (if they show anything) seem to suggest that the comparison may not be entirely fair. However, what stood out the most was the surprisingly poor performance of the EHR embeddings. This observation underscores the need for a much larger corpus, especially considering that these embeddings were trained using the cbow word2vec method, which may not be the optimal choice for capturing rare disease words. In contrast, the GloVe embeddings excel at weighing rare words through their co-occurrence frequency, as highlighted by their comparative performance.</p>

<p>It is worth noting that the AllenNLP website, which hosts the ELMo embeddings, acknowledges the omission of a comparison with GloVe, as they deemed them not directly comparable.</p>

<p>While the F1 score is a ‘safe place’ for data scientists, it should be considered carefully for NER tasks. Boundary errors, as it turns out, are one of the major sources of error in biological applications. Optimizing solely for F1 may cause us to overlook cases like ‘left flank’, where tagging only ‘flank’ as a location still represents a partial but significant success. Labeling errors remain a significant concern.</p>

<p>Another factor to consider is the potential sparsity of entities in the corpus used for training. Many openly available biomedical text datasets are based on research articles or abstracts, which are densely packed with biomedical concepts. Additionally, the tone of these datasets is more academic and may not capture the indications for certain diseases that are of importance.</p>

<p>On a related note, as I was writing this post, I stumbled upon Facebook’s Meta-embeddings, which provide a mechanism to determine the most effective embeddings for a specific prediction task. This ensemble-type approach allows for an intriguing exploration of specialized word variations between embeddings. The authors argue that the modeler should not personally choose the embeddings, but rather rely on the objective rigor of DME (Dynamic Meta-Embeddings).</p>

<p>It is important to keep in mind that many of these general-purpose, broad-label embeddings do not clearly define their data cleansing and tokenization methods or specify their particular optimization objectives.</p>

<p>For further reading, I found this resource helpful and relevant in exploring the best and latest in word embeddings. Additionally, if you wish to delve deeper into the trends of last year, you may find this article informative: http://ruder.io/word-embeddings-2017/.</p>

<h4 id="links-to-embeddings">Links to embeddings</h4>
<ul>
  <li>Elmo Embeddings: https://allennlp.org/elmo</li>
  <li>GloVe Embeddings (Common Crawl (42B tokens, 1.9M vocab, uncased, 300d vectors): https://nlp.stanford.edu/projects/glove/</li>
  <li>Disease and Adverse Effects NER dataset that I used: https://www.scai.fraunhofer.de/en/business-research-areas/bioinformatics/downloads/corpus-for-disease-names-and-adverse-effects.html</li>
</ul>]]></content><author><name>Austin Powell</name></author><category term="General" /><summary type="html"><![CDATA[An overview of out-of-box benefits evaluated on open-source health dataset]]></summary></entry></feed>