<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://aus10powell.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://aus10powell.github.io/" rel="alternate" type="text/html" /><updated>2025-08-05T06:01:13+00:00</updated><id>https://aus10powell.github.io/feed.xml</id><title type="html">aus10powell.io</title><subtitle>aus10powell.github.github.io</subtitle><author><name>Austin Powell</name></author><entry><title type="html">Wise Words of Ovid</title><link href="https://aus10powell.github.io/nlp/llm/WiseWordsofOvid/" rel="alternate" type="text/html" title="Wise Words of Ovid" /><published>2025-02-01T00:00:00+00:00</published><updated>2025-02-01T00:00:00+00:00</updated><id>https://aus10powell.github.io/nlp/llm/WiseWordsofOvid</id><content type="html" xml:base="https://aus10powell.github.io/nlp/llm/WiseWordsofOvid/"><![CDATA[<p><strong>A wise bot retrieving quotes from antiquity</strong></p>

<blockquote>
"The mind which is conscious of right, rectitude, undeviating integrity, despises, laughs at, treats with contempt, the lies of rumor."
<footer>— Ovid (Fasti)</footer>
</blockquote>

<h2 id="problem-definition-and-goal">Problem Definition and Goal</h2>

<h3 id="challenge">Challenge</h3>
<p>Generally, the goal is to reply to various X tweets (often political or opinionated in nature) with a quote that strikes a neutral tone, shifting to a humorous/ironic/sarcastic tone when the comment is overtly negative.</p>

<ul>
  <li>Prompt-based “wisdom” responses based on sentiment, using <a href="https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment?text=Whoa.+CNN+is+now+reporting+that+several+Republican+voters+have+said+that+they+voted+for+Tom+Suozzi+today+because+Republicans+sabotaged+the+border+security+deal.+This+is+huge+%26+amazing.+Voters+see+right+through+Republican+nonsense+%26+they+are+making+them+find+out+big+time.">cardiffnlp/twitter-roberta-base-sentiment</a>. For example:</li>
</ul>

<blockquote>
  <p>“Here is why I think we’re seeing this: Time &amp; time again, the media focuses on polls &amp; draws conclusions about the state of the race without focusing on 1.) all Democrats have done &amp; 2.) the real threat of Trump/MAGA extremism. But VOTERS ARE NOT DUMB! Wake up, media.”</p>
</blockquote>

<h3 id="goal">Goal</h3>
<ul>
  <li>Replying to tweets with relevant quotes.</li>
  <li>Using sentiment analysis to determine tone (e.g., humorous, neutral).</li>
</ul>
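<p>The sentiment-to-tone step can be sketched as follows. This is a minimal illustration only: the <code>choose_tone</code> helper and its 0.7 threshold are hypothetical, assuming the three labels (negative/neutral/positive) produced by the cardiffnlp model.</p>

```python
def choose_tone(label: str, score: float) -> str:
    """Pick a reply tone from a sentiment prediction.

    `label` is one of the labels emitted by
    cardiffnlp/twitter-roberta-base-sentiment (negative / neutral /
    positive) and `score` is its confidence. Overtly negative posts
    get a humorous/ironic reply; everything else stays neutral.
    The 0.7 cutoff is an illustrative choice, not a tuned value.
    """
    if label == "negative" and score > 0.7:
        return "humorous"
    return "neutral"
```

<p>In the actual pipeline, the label/score pair would come from running the Hugging Face sentiment model on the tweet text.</p>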

<h3 id="generatingextracting-quotes">Generating/Extracting Quotes:</h3>
<h4 id="data-and-challenges">Data and Challenges</h4>
<p>Most of Ovid’s works are easily found online, but they are written in poetic form, which is not as straightforward to parse as regular documents. I found a marked improvement in text extraction from Ovid’s works using 300-dimension rather than 200-dimension embeddings, which is consistent with many of the recommendations for building RAG systems.</p>

<ul>
  <li><strong>Data:</strong> Extracting quotes from Ovid’s poetic works.</li>
  <li><strong>Key Challenges:</strong>
    <ul>
      <li>Parsing poetic text.</li>
      <li>Avoiding hallucinations in LLM-generated quotes.</li>
    </ul>
  </li>
</ul>

<h4 id="tasks-involved-in-problem-definition">Tasks Involved in Problem Definition</h4>
<h5 id="ovid-quote-retrieval-task">Ovid Quote Retrieval Task</h5>
<ul>
  <li>Quote retrieval from Ovid’s different poetic works for the main tweet content.
    <ul>
      <li>Quote generation using different LLMs
        <ul>
          <li>The expected challenge of hallucinations arises here, particularly when the LLM was not allowed to choose whom it was quoting</li>
          <li>RAG: E.g. “What did Ovid have to say about different political parties”</li>
        </ul>
      </li>
      <li>Sentence-similarity (see <a href="https://arxiv.org/pdf/2403.05440.pdf">Is Cosine-Similarity of Embeddings Really About Similarity?</a>) was most useful for weeding out totally useless chunks of text. It was far less useful, however, at surfacing memorable quotes that were similar in concept:
        <ul>
          <li><strong>Input sentence:</strong> Plurality should never be proposed unless needed. (William Occam)</li>
          <li><strong>Output sentence 1</strong> (0.217 similarity using MLX LLM): Everything should be made as simple as possible, but not simpler. (Albert Einstein)</li>
          <li><strong>Output sentence 2</strong> (0.175 similarity using MLX LLM): Simplicity is the ultimate sophistication. (Leonardo da Vinci)</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Responses:
    <ul>
      <li>RAG-based:
        <ul>
          <li>Metrics:
            <ul>
              <li><strong>Context Relevance:</strong> Ensuring the retrieved context is pertinent to the user query, utilizing LLMs for context relevance scoring.</li>
              <li><strong>Groundedness:</strong> Separating the response into statements and verifying each against the retrieved context.</li>
              <li><strong>Answer Relevance:</strong> Checking if the response aptly addresses the original question.</li>
            </ul>
          </li>
        </ul>
      </li>
      <li>Political: For political tweets, sentiment can determine the type of response (the LLM can generate the actual text), but clustering may also be useful: <a href="https://towardsdatascience.com/clustering-sentence-embeddings-to-identify-intents-in-short-text-48d22d3bf02e">Clustering Sentence Embeddings to Identify Intents in Short Text</a></li>
      <li><strong>Take-away:</strong> RAG-based methods were better at making factual statements, e.g. “Who is Arachne?” -&gt; “Arachne is the protagonist of a tale in Greek mythology known primarily from the version told by the Roman poet Ovid, which is the earliest extant source for the story.” They offered less of an advantage on subjective prompts.</li>
    </ul>
  </li>
</ul>
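<p>As a rough sketch of the groundedness metric above: split the response into statements and score each against the retrieved context. Token overlap stands in here for the LLM-based verification the metric actually calls for; the function name and 0.5 threshold are hypothetical.</p>

```python
import re

def groundedness(response: str, context: str, threshold: float = 0.5) -> list:
    """Split `response` into sentences and flag each as grounded if
    enough of its content words (longer than 3 characters) appear in
    the retrieved context. Token overlap is a crude stand-in for an
    LLM-based verifier; returns a list of (sentence, grounded) pairs."""
    ctx_tokens = set(re.findall(r"[a-z']+", context.lower()))
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        words = [w for w in re.findall(r"[a-z']+", sentence.lower()) if len(w) > 3]
        overlap = sum(w in ctx_tokens for w in words) / max(len(words), 1)
        results.append((sentence, overlap >= threshold))
    return results
```

<p>A statement that fails the check would then be dropped or flagged before the reply is posted.</p>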

<h4 id="tweets-needed-tweet-probability-of-repeated-occurrence-over-time">Tweets needed: Tweet Probability of Repeated Occurrence over Time</h4>
<p>The bot seeds initial content by drawing from a bank of N tweets with replacement, so we need a baseline for the probability that a given tweet repeats within a window. With one post every m days, over two weeks:</p>

\[\begin{equation*}P(\text{repeated in 2 weeks}) = 1 - \left( \frac{N - 1}{N} \right)^{\frac{14}{m}}\end{equation*}\]

<p>Substituting N = 100 and m = 1:</p>

\[\begin{equation*} P(\text{repeated in 2 weeks}) = 1 - \left( \frac{99}{100} \right)^{14}\end{equation*}\]

<p>Calculating this gives:</p>

\[\begin{equation*} P(\text{repeated in 2 weeks}) \approx 0.131\end{equation*}\]

<p>Personally, I’d like it to stay under 10% over a longer window (a longer window increases the probability), so tweaking the numbers a bit by upping the bank of tweets to N = 200 over 21 days:</p>

\[\begin{equation*}P(\text{repeated in 3 weeks}) \approx 0.0999\end{equation*}\]
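<p>The arithmetic above can be checked with a direct transcription of the formula (N is the size of the tweet bank, m the number of days between posts):</p>

```python
def p_repeat(n: int, days: int, m: int = 1) -> float:
    """P(repeated in `days` days) = 1 - ((n - 1) / n) ** (days / m)
    for a bank of n tweets drawn with replacement every m days."""
    return 1 - ((n - 1) / n) ** (days / m)

print(f"{p_repeat(100, 14):.3f}")  # ~0.131
print(f"{p_repeat(200, 21):.3f}")  # ~0.100
```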

<h5 id="interesting-quote-hallucinations">Interesting Quote “Hallucinations”:</h5>
<ul>
  <li>Fictitious author
    <ul>
      <li><strong>Prompt:</strong> Respond to every input with a quote from your peers and the first and last name of who said it.</li>
      <li><strong>Input:</strong> “all of this has happened before, all of this will happen again” (Sam Altman)</li>
      <li><strong>Output:</strong> “The only way to deal with the void is to continue to try to fill it.” - John Smith</li>
    </ul>
  </li>
  <li>Fictitious quote from author:
    <ul>
      <li><strong>Prompt:</strong> <em>Respond to every input with a quote from your peers and the first and last name of who said it.</em></li>
      <li><strong>Input:</strong> <em>Everyone talking re: over-education of youth. But that’s not the problem. Real issue is believing certain work is beneath you. We’ve all answered phones. <u>Jensen Huang</u> (NW $81B) cleaned toilets. I was broke prep cook in a Jewish deli. Gotta start somewhere</em></li>
      <li><strong>Output:</strong> “Real issue is believing certain work is beneath you.” - <u>Jensen Huang</u></li>
    </ul>
  </li>
  <li>Postured quote: from no one in particular, and not particularly memorable.
    <ul>
      <li><strong>Prompt:</strong>  <em>Generate a response to a self-promotional tweet expressing a strong stance against politics.</em></li>
      <li><strong>Input:</strong>  <em>We all come from somewhere, but we’re all Americans.</em></li>
      <li><strong>Output:</strong> “We all come from different places and backgrounds, but at the end of the day, we’re all Americans.” -Unknown</li>
    </ul>
  </li>
</ul>

<h4 id="fine-tuning-based-on-gold-standard">Fine-tuning based on gold standard</h4>

<p>I wanted to see how much leverage I could get out of a few-shot set of training examples. I used the <em>babbage-002</em> model, so perhaps I would have gotten more out of the GPT-3.* family. Even so, the results were not encouraging for ~10 political tweet examples paired with glib and sometimes stoic quotes from figures such as MLK. It raises the question of how much effort it takes to make an LLM work for a task like this.</p>

<p><em>Gavin Newsom Tweet</em></p>
<blockquote>
  <p>Another day, another common sense gun safety law struck down by a right-wing judge.</p>

  <p>This is exactly why we need to amend the U.S. Constitution and establish a Right to Safety. https://sandiegouniontribune.com/news/courts/story/2024-03-12/san-diego-federal-judge-strikes-down-california-law-limiting-high-frequency-gun-purchases</p>
</blockquote>

<p><em>Fine-tuned response</em></p>
<blockquote>
  <p>#2
VenezuelanCuban View Post Originally posted by Hopefully the whole country with be armed like in Venezuela.</p>

  <p>It will be, but I don’t see how they will be armed. I don’t think it’s possible. They’re so used to the stateless existence that they will refuse to live under it. They will not adopt the military industrial complex mentality, which in their view is a tool of the state. They’ll have to form their own government. Only then can</p>
</blockquote>

<h2 id="dual-encoder">Dual-Encoder:</h2>
<p>A Dual-Encoder model can help with effective quote retrieval tailored to tweet contexts. This architecture excels in retrieval tasks by computing and comparing embeddings for two input types—in this case, tweet content and a collection of quotes from Ovid’s works.</p>

<p>Semantic Understanding: Dual-Encoders map inputs like tweets and quotes into high-dimensional embeddings that effectively capture their semantic meaning. This ensures the retrieved quote aligns with the tweet’s tone, sentiment, and context.</p>

<p>Scalability: Once encoded into embeddings, quotes can be efficiently searched using similarity measures such as cosine similarity. This approach is ideal for handling large datasets of poetic quotes, enabling real-time response generation.</p>

<p>Tailored Responses with Flexibility: Fine-tuning Dual-Encoder models allows for nuanced and highly relevant responses.</p>
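<p>A minimal sketch of the retrieval side, assuming tweet and quote embeddings have already been produced by the two encoders (toy 2-d vectors stand in for real encoder outputs here; in practice they would come from the fine-tuned dual-encoder):</p>

```python
import numpy as np

def retrieve_quote(tweet_emb: np.ndarray, quote_embs: np.ndarray, quotes: list) -> str:
    """Return the quote whose embedding is most cosine-similar to the
    tweet embedding. Rows are normalized so a plain dot product
    equals cosine similarity."""
    q = quote_embs / np.linalg.norm(quote_embs, axis=1, keepdims=True)
    t = tweet_emb / np.linalg.norm(tweet_emb)
    return quotes[int(np.argmax(q @ t))]

# Toy example: two quotes, two orthogonal embedding directions.
quotes = ["Quote about fame", "Quote about love"]
quote_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
print(retrieve_quote(np.array([0.9, 0.1]), quote_embs, quotes))  # Quote about fame
```

<p>For large quote banks, the same dot-product search can be delegated to an approximate nearest-neighbor index rather than a full scan.</p>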

<div align="center">
 <img src="/assets/images/wisewordsofovid/dual_encoder_architectures.png" alt="askDocs example" style="width:350px;" />
<figcaption>Different architectures of Dual-Encoders (https://aclanthology.org/2022.emnlp-main.640.pdf)</figcaption>
</div>

<h2 id="technology-stack">Technology Stack</h2>

<h3 id="infrastructure">Infrastructure:</h3>

<ul>
  <li>Digital Ocean Droplets utilized for hosting, ensuring efficient and scalable deployment.</li>
  <li>Github Actions employed for Continuous Integration/Continuous Deployment (CI/CD) and workflow automation, streamlining the development pipeline.</li>
  <li>X API for posting replies.</li>
</ul>

<h3 id="machine-learning-stack">Machine Learning Stack:</h3>

<ul>
  <li>Leveraging Hugging Face’s library for Twitter sentiment.</li>
  <li>Incorporating OpenAI GPT-3.5 Turbo for queries</li>
  <li>Langchain integrated to handle specific language-related tasks involving GPT.</li>
</ul>

<h3 id="model-training">Model Training:</h3>
<ul>
  <li>Training/experimentation with natural language models on diverse datasets to enhance performance.</li>
  <li>Fine-tuning GPT-3.5 Turbo on domain-specific tasks, optimizing for accuracy and relevance.</li>
</ul>

<h2 id="relevant-reading">Relevant Reading</h2>
<ul>
  <li>TWEETEVAL is a standardized test bed for seven tweet classification tasks. These are: sentiment analysis, emotion recognition, offensive language detection, hate speech detection, stance prediction, emoji prediction, and irony detection. <a href="https://arxiv.org/pdf/2010.12421.pdf">TWEETEVAL: Unified Benchmark and Comparative Evaluation for Tweet Classification</a></li>
</ul>

<h2 id="future-work">Future Work</h2>
<p>The current focus is on retrieval, but Dual-Encoders could evolve into a hybrid system that combines retrieval and generation. Using a recommendation system that understands context, relevance, and even mood, it could not only rank and refine retrieved quotes but also adapt or paraphrase them using a generative model. This approach ensures the insights provided are tailored, dynamic, and highly relevant.</p>]]></content><author><name>Austin Powell</name></author><category term="NLP" /><category term="LLM" /><summary type="html"><![CDATA[A bot generating memorable quotes and replies based on works of Ovid]]></summary></entry><entry><title type="html">2025 Technical Reading</title><link href="https://aus10powell.github.io/2025/papers/Reading/" rel="alternate" type="text/html" title="2025 Technical Reading" /><published>2025-01-01T00:00:00+00:00</published><updated>2025-01-01T00:00:00+00:00</updated><id>https://aus10powell.github.io/2025/papers/Reading</id><content type="html" xml:base="https://aus10powell.github.io/2025/papers/Reading/"><![CDATA[<p><strong>2025 Reading List</strong></p>

<h2 id="nlp">NLP</h2>

<h3 id="llms">LLMs</h3>

<h4 id="there-are-only-6-rag-evals"><a href="https://jxnl.co/writing/2025/05/19/there-are-only-6-rag-evals/">There Are Only 6 RAG Evals</a></h4>
<blockquote>
  <p>This article argues that RAG (Retrieval-Augmented Generation) evaluation can be simplified to six core metrics based on the relationships between the question (Q), context (C), and answer (A). The author organizes these into three tiers: Tier 1 focuses on retrieval precision and recall; Tier 2 covers primary relationships like context relevance (C|Q), faithfulness (A|C), and answer relevance (A|Q); and Tier 3 includes advanced metrics such as context support (C|A), question answerability (Q|C), and self-containment (Q|A). The framework aims to provide a systematic and exhaustive way to diagnose and improve RAG systems.</p>
</blockquote>

<h3 id="foundations-of-large-language-models"><a href="https://arxiv.org/abs/2501.09223">Foundations of Large Language Models</a></h3>
<blockquote>
  <p>Broad overview of the following topics: model architectures and training, activation functions, learning techniques, data handling and preprocessing, optimization and efficiency, concepts in learning, applications and methods, techniques and strategies.</p>
</blockquote>

<h3 id="scaling-laws-for-precision"><a href="https://arxiv.org/abs/2411.04330">Scaling Laws for Precision</a></h3>
<blockquote>
  <p>Training in lower precision reduces the model’s “effective parameter count,” allowing us to predict the additional loss incurred from training in low precision and post-train quantization. Authors find that lower precision training can be more compute-efficient, but it can also lead to worse performance. They also find that there is a trade-off between the amount of data a model is trained on and the precision at which it is trained. For example, a model that is trained on a lot of data may perform worse if it is quantized to a lower precision after training.
In conclusion, this paper shows that the precision of a language model can have a significant impact on its performance. It is important to consider both the training and inference precision when choosing a precision for a language model.</p>
</blockquote>

<h4 id="the-era-of-1-bit-llms-all-large-language-models-are-in-158-bits"><a href="https://arxiv.org/abs/2402.17764">The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits</a></h4>
<blockquote>
  <p>BitNet b1.58 is a new way to make AI language models use less memory and power by reducing each parameter to just three values: -1, 0, or 1. The model works as well as regular 16-bit models when it reaches 3 billion parameters, but uses about 3.5 times less memory and runs almost 3 times faster. At larger sizes (70 billion parameters), it runs even better - about 4 times faster and uses much less memory than standard models. The model saves a lot of energy too - using 71 times less power for its main calculations compared to regular models. Tests show it performs well on language tasks and can handle long training sessions (2 trillion tokens) with good results.</p>
</blockquote>

<h4 id="smarter-better-faster-longer-a-modern-bidirectional-encoder-for-fastmemory-efficient-and-long-context-finetuning-and-inference"><a href="https://arxiv.org/pdf/2412.13663">Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast,Memory Efficient, and Long Context Finetuning and Inference</a></h4>
<blockquote>
  <p>ModernBERT is an improved version of the BERT model. It’s designed for tasks like retrieval and classification. The model was trained on 2 trillion tokens and can handle sequences up to 8192 tokens long. It uses modern techniques like rotary positional embeddings and Gated Linear Units. This makes it faster and more memory-efficient. ModernBERT achieves top results in various evaluations, including classification tasks and retrieval in different domains, such as code. It’s also optimized to run efficiently on common GPUs.</p>
</blockquote>

<h4 id="star-a-simple-training-free-approach-for-recommendations-using-large-language-models"><a href="https://arxiv.org/pdf/2410.16458">STAR: A Simple Training-free Approach for Recommendations using Large Language Models</a></h4>
<blockquote>
  <p>STAR is a new framework using large language models for recommendations without fine-tuning, featuring a two-stage process: retrieval (using semantic similarity and collaborative information) and ranking. Testing on Amazon Review datasets showed STAR outperformed supervised models on Beauty and Toys categories, and performed nearly as well on Sports items. This suggests LLMs can be effective for recommendation systems without the cost and complexity of fine-tuning.</p>
</blockquote>

<h3 id="agents-llm-based">Agents (LLM-based)</h3>

<h4 id="agentigraph-an-interactive-knowledge-graph-platform-for-llm-based-chatbots-utilizing-private-data"><a href="https://arxiv.org/pdf/2410.11531">AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data</a></h4>
<blockquote>
  <p>LLMs are powerful but can struggle with factual consistency and require complex queries. Traditional KG tools are difficult to use and require technical expertise. AGENTiGraph bridges the gap between LLMs and KGs. It uses a multi-agent system where each agent has a specific role, such as interpreting user intent or translating queries into graph operations. This allows for more natural language interaction with KGs and improved accuracy in tasks like question answering.</p>
</blockquote>

<h4 id="agents-are-not-enough"><a href="https://arxiv.org/pdf/2412.16241v1">Agents are not enough</a></h4>

<p>Current AI agents are limited. They can’t handle complex tasks or adapt to different situations.</p>

<ul>
  <li>
    <p>There are historical challenges with agents, such as limited capabilities and lack of trust.</p>
  </li>
  <li>
    <p>To improve agents, the authors propose three things:</p>
    <ul>
      <li>A secure version for private tasks.</li>
      <li>A user representation to avoid constant user input.</li>
      <li>A program to manage interactions between user and agents.</li>
    </ul>
  </li>
</ul>

<p>The idea is to create an ecosystem with different components working together:</p>

<ul>
  <li>
    <p>Agents: Focus on specific tasks and can work with each other.</p>
  </li>
  <li>
    <p>Sims: Represent users with their preferences and privacy settings.</p>
  </li>
  <li>
    <p>Assistants: Interact with users and manage Sims and Agents to complete tasks.</p>
  </li>
</ul>

<h4 id="agents-chip-hyuen"><a href="https://huyenchip.com//2025/01/07/agents.html">Agents (Chip Hyuen)</a></h4>

<blockquote>
  <p>AI agents perceive and act on their environment, with their capabilities defined by available tools and the environment itself. Tools are essential for agents to perceive (read) and act (write), augmenting knowledge, extending capabilities (like math or code execution), and enabling real-world actions. Planning is crucial for complex tasks, requiring plan generation, validation (by heuristics or AI), and execution, ideally decoupled to prevent wasted resources. Foundation models can be used for planning, especially when provided with information about action outcomes, and function calling enables tool use within model APIs. Effective agents require careful consideration of planning granularity, reflection/error correction, tool selection, and robust evaluation to address potential failures in planning, tool usage, or efficiency.</p>
</blockquote>

<h4 id="a-survey-on-llm-powered-agents-for-recommender-systems"><a href="https://arxiv.org/abs/2502.10050">A Survey on LLM-powered Agents for Recommender Systems</a></h4>
<blockquote>
  <p>This survey reviews LLM-powered agents in recommender systems, categorizing them into three paradigms: recommender-oriented (enhancing core mechanisms), interaction-oriented (improving user dialogue), and simulation-oriented (modeling complex interactions). It analyzes agent architecture (profile, memory, planning, action) and discusses datasets, evaluation, challenges, and future research directions.</p>
</blockquote>

<h4 id="llm-powered-agents-for-recommender-systems-a-comprehensive-survey"><a href="https://arxiv.org/pdf/2308.14296">LLM-powered Agents for Recommender Systems: A Comprehensive Survey</a></h4>
<blockquote>
  <p>This comprehensive survey examines the integration of LLM-powered agents in recommender systems, focusing on their architecture, capabilities, and applications. The paper discusses how these agents can enhance recommendation quality through better understanding of user preferences, improved interaction patterns, and more sophisticated reasoning capabilities. It also addresses key challenges in deployment, including computational efficiency, privacy concerns, and evaluation methodologies.</p>
</blockquote>

<h4 id="mrkl-systems-a-modular-neuro-symbolic-architecture-that-combines-large-language-models-external-knowledge-sources-and-discrete-reasoning"><a href="https://arxiv.org/abs/2205.00445">MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning</a></h4>
<blockquote>
  <p>Proposes MRKL (Modular Reasoning, Knowledge and Language), a neuro-symbolic architecture to overcome limitations of large language models (LMs). It integrates multiple neural models with discrete knowledge and reasoning modules, aiming to handle complex tasks involving knowledge, reasoning, and language processing more effectively than standalone LMs. The paper outlines the architecture, implementation challenges, and AI21 Labs’ implementation, Jurassic-X.</p>
</blockquote>

<h4 id="design-patterns-for-securing-llm-agents-against-prompt-injections"><a href="https://arxiv.org/abs/2506.08837">Design Patterns for Securing LLM Agents against Prompt Injections</a></h4>
<blockquote>
  <p>As AI agents powered by Large Language Models (LLMs) become increasingly versatile and capable of addressing a broad spectrum of tasks, ensuring their security has become a critical challenge. This paper proposes a set of principled design patterns for building AI agents with provable resistance to prompt injection attacks, which exploit the agent’s reliance on natural language inputs. The authors systematically analyze these patterns, discuss their trade-offs in terms of utility and security, and illustrate their real-world applicability through case studies. This is particularly important when agents are granted tool access or handle sensitive information.</p>
</blockquote>

<h2 id="machine-learning">Machine Learning</h2>

<h4 id="improving-pinterest-search-relevance-using-large-language-models"><a href="https://medium.com/pinterest-engineering/improving-pinterest-search-relevance-using-large-language-models-4cd938d4e892">Improving Pinterest Search Relevance Using Large Language Models</a></h4>
<blockquote>
  <p>Pinterest improved their search relevance using a two-stage approach: (1) A cross-encoder LLM teacher model trained on human-annotated data to predict 5-scale relevance scores, and (2) A lightweight student model trained via knowledge distillation for production serving. The system leverages rich Pin text features including titles, descriptions, synthetic image captions, and user engagement data. The LLM-based approach improved search feed relevance by 2.18% (nDCG@20) and increased fulfillment rates by over 1.5% globally. Key innovations include using multilingual LLMs to generalize across languages, enriched text representations, and large-scale semi-supervised learning through distillation.</p>
</blockquote>

<h4 id="reinforcement-learning-an-overview"><a href="https://arxiv.org/pdf/2412.05265">Reinforcement Learning: An Overview</a></h4>
<blockquote>
  <p>This is a pretty hefty paper and can take several hours to read through once, but it is a good refresher and/or overview. It provides a comprehensive overview of reinforcement learning (RL) and sequential decision making as of the end of 2024 (December), based on a textbook by Kevin Murphy. The paper covers value-based RL, policy-gradient methods, and model-based approaches. It also briefly discusses the integration of RL with large language models (LLMs). This work updates and expands upon chapters 34 and 35 of Murphy’s earlier textbook.</p>
</blockquote>

<h4 id="healthbench-a-benchmark-for-evaluating-language-models-in-healthcare"><a href="https://openai.com/index/healthbench/">HealthBench: A Benchmark for Evaluating Language Models in Healthcare</a></h4>
<blockquote>
  <p>HealthBench is a new benchmark for evaluating language models in healthcare applications. It focuses on assessing models’ capabilities in medical knowledge, clinical reasoning, and patient communication. The benchmark includes tasks like medical question answering, clinical case analysis, and patient education. This is particularly important as healthcare applications require high accuracy and reliability, and the benchmark helps identify areas where models need improvement to be safely deployed in healthcare settings.</p>
</blockquote>]]></content><author><name>Austin Powell</name></author><category term="2025" /><category term="Papers" /><summary type="html"><![CDATA[Technical articles, research papers, etc. read in 2025]]></summary></entry><entry><title type="html">2024 Technical Reading</title><link href="https://aus10powell.github.io/2024/papers/Reading/" rel="alternate" type="text/html" title="2024 Technical Reading" /><published>2024-01-01T00:00:00+00:00</published><updated>2024-01-01T00:00:00+00:00</updated><id>https://aus10powell.github.io/2024/papers/Reading</id><content type="html" xml:base="https://aus10powell.github.io/2024/papers/Reading/"><![CDATA[<p><strong>2024 Reading List</strong></p>

<h2 id="nlp">NLP</h2>

<h3 id="large-language-models">Large-Language Models</h3>

<h4 id="memory-and-mips-max-inner-product-search"><a href="https://chat.openai.com/share/46ff149e-a4c7-4dd7-a800-fc4a642ea389">Memory and MIPS (Max Inner Product Search)</a></h4>
<blockquote>
  <p>Generally speaking, this deals with the concept of having a large corpus of documents represented in a high-dimensional space. Given a new document, you want to find the document in your existing collection that is most similar to it: the maximum inner product search. The chat with GPT discusses this idea in relation to memory.</p>
</blockquote>

<h4 id="transforming-wearable-data-into-health-insights-using-large-language-model-agents"><a href="https://arxiv.org/pdf/2406.06464">Transforming wearable data into health insights using large language model agents</a></h4>
<blockquote>
  <p>An AI system harnessing cutting-edge technology to revolutionize how wearable health data translates into actionable advice for individuals, tackling over 84% of numerical questions
and a vast majority of open-ended inquiries.</p>
</blockquote>

<h4 id="mealrec-a-meal-recommendation-dataset-with-meal-course-affiliation-for-personalization-and-healthiness"><a href="https://arxiv.org/pdf/2404.05386">MealRec+: A Meal Recommendation Dataset with Meal-Course Affiliation for Personalization and Healthiness</a></h4>
<blockquote>
  <p>The task of meal recommendation involves intricate connections among users, courses, and meals, particularly through meal-course affiliation, yet existing datasets lack this crucial affiliation. To address this gap, the MealRec+ dataset is introduced, leveraging simulation methods to derive meal-course affiliation and user-meal interactions, and demonstrating that cooperative learning of these interactions improves the effectiveness of meal recommendations, with efforts also made to enhance the healthiness of recommendations.</p>
</blockquote>

<h4 id="llmner-zerofew-shot-named-entity-recognition-exploiting-the-power-of-large-language-models"><a href="https://arxiv.org/abs/2406.04528">llmNER: (Zero|Few)-Shot Named Entity Recognition, Exploiting the Power of Large Language Models</a></h4>
<blockquote>
  <p>llmNER is a Python library designed for zero-shot and few-shot named entity recognition (NER) tasks using Large Language Models (LLMs). It simplifies prompt composition, model querying, and result parsing, facilitating efficient prompt engineering for NER applications. The library demonstrates versatility through validation on two NER tasks, aiming to streamline in-context learning research by enhancing prompt and parsing procedures.</p>
</blockquote>

<h4 id="machine-unlearning-in-2024"><a href="https://ai.stanford.edu/~kzliu/blog/unlearning">Machine Unlearning in 2024</a></h4>
<blockquote>
  <p>A focus on removing specific information from trained machine learning (ML) models without retraining them from scratch, the goal being to edit away undesirable data such as private information, outdated knowledge, copyrighted material, harmful content, and misinformation.</p>
</blockquote>

<h4 id="adaptive-retrieval-augmented-generation-for-conversational-systems"><a href="https://arxiv.org/pdf/2407.21712">Adaptive Retrieval-Augmented Generation for Conversational Systems</a></h4>
<blockquote>
  <p>This research addresses the question of whether Retrieval Augmented Generation (RAG) is always necessary in conversational AI systems. The study introduces RAGate, a gating model that predicts when external knowledge augmentation is needed for improved responses, based on conversation context and human judgments.</p>
</blockquote>

<h4 id="medical-llms-a-practical-guide"><a href="https://github.com/AI-in-Health/MedLLMsPracticalGuide">Medical LLMS: A practical guide</a></h4>
<blockquote>
  <p>A GitHub site curating papers and resources around healthcare and LLMs.</p>
</blockquote>

<h4 id="from-beginner-to-expert-modeling-medical-knowledge-into-general-llms"><a href="https://arxiv.org/abs/2312.01040">From Beginner to Expert: Modeling Medical Knowledge into General LLMs</a></h4>
<blockquote>
  <p>Demonstrates how a smaller-scale language model (10B parameters) can be effectively adapted for medical applications through a 3-stage optimization process: medical knowledge injection, domain instruction tuning, and specific task adaptation. The team developed large-scale medical datasets for each stage and introduced a novel Verification-of-Choice approach for multi-choice questions, significantly enhancing the model’s reasoning abilities. Remarkably, the resulting AntGLM-Med-10B model outperforms many larger language models, including those specifically trained on medical knowledge, showing that strategic fine-tuning and task-specific innovations can compensate for smaller model size in specialized domains.</p>
</blockquote>
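The Verification-of-Choice idea can be sketched as asking the model to affirm or reject each option independently, rather than picking one answer in a single pass. The stub model and prompt wording below are illustrative assumptions, not the paper's exact procedure:

```python
def stub_llm(prompt):
    """Stand-in for a real LLM call; answers a toy pharmacology question."""
    return "yes" if "aspirin" in prompt.lower() else "no"

def verification_of_choice(question, options, llm):
    """Ask the model to verify each option on its own and keep the affirmed ones."""
    verified = []
    for opt in options:
        verdict = llm(f"Question: {question}\n"
                      f"Is '{opt}' a correct answer? Answer yes or no.")
        if verdict.strip().lower().startswith("yes"):
            verified.append(opt)
    return verified

options = ["aspirin", "vitamin C", "antibiotics"]
print(verification_of_choice("Which drug inhibits platelet aggregation?",
                             options, stub_llm))
```

Checking each option separately forces the model to reason about every choice instead of latching onto the first plausible one.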

<h4 id="lora-low-rank-adaptation-of-large-language-models"><a href="https://arxiv.org/pdf/2106.09685">LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS</a></h4>
<blockquote>
  <p>LoRA (Low-Rank Adaptation) freezes the pretrained LLM weights and trains small low-rank update matrices instead, greatly reducing the amount of training required while preserving most of the original performance. According to the paper, the savings in training cost can exceed a factor of 4.</p>
</blockquote>
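The parameter savings are easy to see: instead of updating a d x d weight matrix, LoRA trains two factors B (d x r) and A (r x d) and uses W' = W + BA, where the rank r is tiny. A back-of-the-envelope count at a typical hidden size:

```python
# Full fine-tuning updates a d x d weight matrix; LoRA trains only
# two low-rank factors B (d x r) and A (r x d), with W' = W + B @ A.
d, r = 4096, 8

full_params = d * d          # trainable params under full fine-tuning
lora_params = d * r + r * d  # trainable params under LoRA

print(f"full fine-tune:   {full_params:,} trainable params")
print(f"LoRA (rank {r}):    {lora_params:,} trainable params")
print(f"reduction factor: {full_params / lora_params:.0f}x")
```

Because BA can be merged into W after training, LoRA adds no extra inference latency.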

<h2 id="computer-vision">Computer Vision</h2>

<h4 id="yolov10-real-time-end-to-end-object-detection"><a href="https://arxiv.org/abs/2405.14458">YoloV10: Real-Time End-to-End Object Detection</a></h4>
<blockquote>
  <p>Among other efficiency improvements, the highlight may be removing the inference-speed bottleneck caused by non-maximum suppression (NMS).</p>
</blockquote>
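For reference, the post-processing step YOLOv10 aims to eliminate looks like this classic greedy procedure, which must run serially over every detection at inference time:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlaps, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the two overlapping boxes collapse to one
```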

<h4 id="faiss-a-library-for-efficient-similarity-search"><a href="https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/">Faiss: A library for efficient similarity search</a></h4>
<blockquote>
  <p>Facebook AI Similarity Search (Faiss) is a library for quickly searching for multimedia documents that are similar to each other.</p>
</blockquote>
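Conceptually, Faiss accelerates the exact scan below, replacing it with quantized and clustered indexes so search scales to billions of vectors. A brute-force baseline in plain Python for contrast:

```python
import math

def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn(query, vectors, k=2):
    """Exact nearest-neighbor search: an O(n) scan over the whole database.
    Faiss swaps this scan for optimized, approximate index structures."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))
    return ranked[:k]

db = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.1, 0.0)]
print(knn((0.0, 0.1), db))  # indices of the two closest vectors
```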

<h4 id="discount-counting-in-large-image-collections-with-detector-based-importance-sampling"><a href="https://arxiv.org/abs/2306.03151">DISCount: Counting in Large Image Collections with Detector-Based Importance Sampling</a></h4>
<blockquote>
  <p>A more efficient and accurate method for counting objects in large image collections by combining automated detection with strategic human involvement.</p>
</blockquote>
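The core trick is importance sampling: use the detector's counts to decide which regions a human should label, then reweight the human counts into an estimate of the total. A simplified Horvitz-Thompson-style sketch (not the exact DISCount estimator):

```python
def importance_estimate(detector_counts, human_counts, sample_ids):
    """Estimate the total count by sampling regions proportionally to the
    detector's counts and correcting each sample with the human label."""
    total_det = sum(detector_counts)
    terms = []
    for i in sample_ids:
        p = detector_counts[i] / total_det  # sampling probability of region i
        terms.append(human_counts[i] / p)
    return sum(terms) / len(terms)

detector = [10, 40, 30, 20]   # automated counts per region (the proposal)
humans   = [12, 38, 30, 20]   # what a person counts in the sampled regions
print(importance_estimate(detector, humans, sample_ids=[1, 3]))
```

Note the useful property: when the detector is exactly right, the estimate equals the detector total regardless of which regions are sampled, so human effort only goes toward correcting the detector's errors.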

<h2 id="recommendation-systems">Recommendation Systems</h2>

<h4 id="page-level-optimization-of-e-commerce-item-recommendations"><a href="https://arxiv.org/abs/2108.05891">Page-level Optimization of e-Commerce Item Recommendations</a></h4>

<h4 id="recommending-what-video-to-watch-next-a-multitask-ranking-system"><a href="https://dl.acm.org/doi/10.1145/3298689.3346997">Recommending what video to watch next: a multitask ranking system</a></h4>]]></content><author><name>Austin Powell</name></author><category term="2024" /><category term="Papers" /><summary type="html"><![CDATA[Technical articles, research papers, etc. read in 2024]]></summary></entry><entry><title type="html">Twitter Account Sentiment and Text Generation</title><link href="https://aus10powell.github.io/nlp/api/TwitterAccounts/" rel="alternate" type="text/html" title="Twitter Account Sentiment and Text Generation" /><published>2023-02-16T00:00:00+00:00</published><updated>2023-02-16T00:00:00+00:00</updated><id>https://aus10powell.github.io/nlp/api/TwitterAccounts</id><content type="html" xml:base="https://aus10powell.github.io/nlp/api/TwitterAccounts/"><![CDATA[<p><strong>A text generation and sentiment analysis of 6 Twitter Accounts</strong></p>

<h1 id="twitterapi">TwitterAPI</h1>

<p>The TwitterAPI project aims to perform text generation and sentiment analysis on six different Twitter accounts. The project utilizes NLP techniques, Transformers, and the HuggingFace library to generate tweets in the style of the selected accounts and analyze the sentiment of their tweets. Additionally, the project provides a metric for the health of the account audience and identifies anomalous tweets within the last few months. The project is deployed using FastAPI, Docker, and Gunicorn, with models hosted on HuggingFace Spaces. The code and models are stored in a GitHub repository and synced using GitHub Actions.</p>

<div align="center">
 <img src="/assets/images/twitter_accounts/TwitterAPIScreenshot1.png" alt="askDocs example" style="width:450px;" />
<figcaption>Screenshot of App</figcaption>
</div>

<h2 id="goal">Goal:</h2>

<ul>
  <li>Generate a tweet in the style of one of the following Twitter accounts (“alikarimi_ak8”, “elonmusk”, “BarackObama”, “taylorlorenz”, “cathiedwood”, “ylecun”). NOTE: alikarimi_ak8 often tweets in Persian; generated tweets, however, are in English.</li>
  <li>Provide sentiment analysis for the 6 accounts. NOTE: Based on saved historical data rather than live data, due to the uncertainty around scraping tweets after the Elon Musk acquisition.</li>
  <li>“Account Health” Metric:</li>
</ul>
<div align="center">
 <img src="/assets/images/twitter_accounts/obamahealthmetric.png" alt="askDocs example" style="width:350px;" />
<figcaption>Distribution of Obama Twitter Account metric</figcaption>
</div>

<ul>
  <li>Provide a metric for the account audience health</li>
  <li>Anomalous tweets within the last few months:</li>
</ul>
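A minimal sketch of such a metric, assuming (as the project notes suggest) positive-sentiment engagement normalized by audience size so accounts of different scales are comparable; the function name and the numbers are hypothetical:

```python
import math

def account_health(positive_tweet_likes, subscriber_count):
    """Illustrative 'account health' score: total likes on positive-sentiment
    tweets, divided by audience size. The log keeps the heavy-tailed like
    counts on a usable scale (and maps zero engagement to zero)."""
    raw = sum(positive_tweet_likes) / subscriber_count
    return math.log1p(raw)

# Hypothetical numbers for two accounts of very different sizes.
print(account_health([120_000, 80_000, 50_000], subscriber_count=130_000_000))
print(account_health([300, 150], subscriber_count=40_000))
```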

<h2 id="twitter-accounts">Twitter Accounts</h2>

<p><em>Twitter App (warning: the app may be blank and take a minute to load while the free servers spin up; if the scraping library is being blocked, there may be an internal error)</em></p>

<div style="text-align: center;">
    <iframe src="https://aus10powell-twitteraccounts.hf.space" frameborder="0" width="1000" height="1550">
    </iframe>
</div>

<h3 id="deployment">Deployment</h3>

<p><strong>Tech Stack</strong></p>

<ul>
  <li><strong>Web Stack:</strong>
    <ul>
      <li>FastAPI</li>
      <li>Docker (to simplify building the base image for a potentially larger deployment)</li>
      <li>Gunicorn spawns the FastAPI app across four child worker processes using the asynchronous Uvicorn worker class, each running in its own process. Gunicorn itself listens on a configurable port and delegates incoming requests to the workers. All four FastAPI instances share the same database created in Azure Database for PostgreSQL; the connection is established and closed in the FastAPI startup and shutdown events, respectively. The App Service deployment configuration automatically pulls and deploys any changes made to the GitHub repository it is configured with.</li>
      <li>Huggingface Spaces (Free, larger-than-standard free compute and memory. Enough to save models on space without using API)</li>
      <li><em>Docker</em> container deployment on <em>HuggingFace Spaces</em></li>
    </ul>
  </li>
  <li><strong>ML/DevOps:</strong>
    <ul>
      <li>Github Actions (to sync with Huggingface Spaces)</li>
    </ul>
  </li>
  <li><strong>Models:</strong>
    <ul>
      <li>Persian to English (for tweets from alikarimi_ak8): <a href="https://huggingface.co/persiannlp/mt5-base-parsinlu-opus-translation_fa_en">mt5-base-parsinlu-opus-translation_fa_en</a></li>
      <li>Sentiment Model Used: <a href="https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment">twitter-roberta-base-sentiment</a></li>
      <li>Generative Model Used: <a href="https://huggingface.co/transformers/v4.4.2/model_doc/gpt2.html">OpenAI GPT2</a>.</li>
    </ul>
  </li>
</ul>
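The Gunicorn setup described above boils down to a single command. A sketch only: the module path <code>main:app</code> is an assumption about the project layout, and 7860 is the default port Hugging Face Spaces expects:

```shell
# 4 async Uvicorn workers behind Gunicorn; module path and port are assumptions.
gunicorn main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:7860
```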

<!-- # Notes:
* 04/26/2023:
  * Trained 2 accounts models
  * Buit response in index.html to generate one response from those 2 accounts
  * Investigated (potentially) hosting the site on huggingface spaces
  * Investigated (potentiallu) hosting models on huggingface to pull into site
  * NOTE: Account alikarimi_ak8 was particularily tricky as most/all tweets are written in Persian for which there is few translation apis freely available. I wound up using a huggingface library and dealing with some interesting emoji issues.
* 04/27/23:
  * TODO:
    * Train the rest of account models
      * Train english models
      * Train Persian to text model
      * resolve model names based of handles
    * Create sentiment dataset for all accounts using huggingface scraper from historical data until now and display in Altair.
    * Summarization:
      * Reformate old code and eliminate unnecessary code
      * Decide on next steps as far as necessary
  * COMPLETED:
    * Created a model for each account
    * Enabled translation for the Persion account
* 05/02/23
  * COMPLETED:
    * Successfully uploaded hugginface model to hugginface model hub in order to enable api from webpage to the hosted site at huggingface avoiding storage of models on webpage.
* 05/06/23
  * COMPLETED:
    * Successfully tested a hosted app as a RESTful API endpoint for all the trained GPT2 models. Main positive take-away from this is that the file requirements
    * Learned that HuggingFace Spaces doesn't allow function call to pass through URLs with their free space...or at least it is difficult to. Using a POST request is better.
  * TODO:
    * Investigate as to whether there is a app structure that can be flexible in deployment to either Azure or Huggingface spaces

* 05/08/23
  * COMPLETED:
    * Test whether the app can be hit with a RESTful call from another hosted site...on hugginface I suppose. It seems the spaces on Huggingface on distributed and need to run on Docker if departing from the strict format they have for static.
* 05/09/23
  * COMPLETED:
    * NOTE: It became too much to try and manage Docker as well as dealing with app when working with Azure Web Services, however the switch to Huggingface for the relative great utilization of their space makes Docker make more sense. For the generall use-case of trying to update and showcase personal projects that are not going to be leaking money, Docker does seem to be the way to go. You don't need to build a ton of images locally either. Just perhaps when you're trouble-shooting a Docker specific issue.
* 05/10/23
  * TODO:
    * Technical:
      * Having Huggingface spaces pull from the github page...or at least be automated.
    * Code within page:
      * Resolve visual issues with displaying notebook html within the page
      * Decide what should actually be displayed in the notebook
        * Sentiment analysis on the four accounts
    * Extra features:
      * Sentiment: Display a sentiment score on the generated response
      * Display a summary of the generated response. NOTE: this would fit in well with the tweet analysis over time.
      *
* 05/12/23
  * COMPLETED:
    * Somewhat justified display on Markdown page
    * Adjusted color-schemes and display of sentiments
    * Started again on summarization of tweets:
      * Reduce a long list of tweets down enough to run a deep learning summarizer on it

* 05/14/23:
  * COMPLETED:
    * Potentially tracked down one issue with the hugging face regarding why the response was not being generated.

* 05/15/23:
  * Completed index.html for returning tweets
  * added tweepy integration for returning specific tweets along with javascript callbacks.
  * TODO:
    * Reformate generate a reply...it doesn't make sense to be at the top

* 05/20/23
  * Completed refactoring of some old summarization and examples.ipynb code
  * Verified that additional information and metrics can be pulled for the tweets

* 06/01/23
  * COMPLETED:
    * Implemented sentiment score on table returned
    * Implemented Persian translation column to return if account was Persian text

* 06/02/23
  * COMPLETED:
    * Initial distributions for evaluating a good "account health" metric.
    *
* 06/03/23
  * COMPLETED:
    * moved spacy embeddings load to app start
    * Fixed the issue of reply not being generated

* 06/05/23
  * COMPLETED:
    * Dive deeper into thinking/justification for metrics
      * Conveneniece (don't have to look at audience tweets)
      * Normalized for different accounts to enable direct comparison
  * METRICS:
    * Account Health Metric:
      * "positives" * "likes" / account_subs:
        * Normalized to compare different "health" for different accunts
        * Reasonable metric for how engaged audience is with tweets that are positive messaging. May speak more to th
        * The logged distribution is surprisingly normal for the different accounts
    * Audience Health Metric:
      -->]]></content><author><name>Austin Powell</name></author><category term="NLP" /><category term="API" /><summary type="html"><![CDATA[A text generation and sentiment analysis of 6 Twitter Accounts]]></summary></entry><entry><title type="html">National Parks</title><link href="https://aus10powell.github.io/hobbies/National-Parks/" rel="alternate" type="text/html" title="National Parks" /><published>2023-01-01T00:00:00+00:00</published><updated>2023-01-01T00:00:00+00:00</updated><id>https://aus10powell.github.io/hobbies/National-Parks</id><content type="html" xml:base="https://aus10powell.github.io/hobbies/National-Parks/"><![CDATA[<h1 id="national-and-state-parks">National and State Parks</h1>

<h2 id="menu">Menu</h2>
<ul>
  <li><a href="#californiastateparks">California State Parks</a></li>
  <li><a href="#nationalparks">National Parks</a></li>
</ul>

<h2 id="california-state-parks">California State Parks</h2>
<p><a href="https://drive.google.com/file/d/14wrth4pBZKE63y84t5T74jB19BzBGOkW/view?usp=sharing">280 California State Parks</a></p>

<h2 id="national-parks">National Parks</h2>

<table>
  <thead>
    <tr>
      <th>NationalParkName</th>
      <th>State</th>
      <th>Year Est.</th>
      <th>Order Est.</th>
      <th>Visitors (2019)</th>
      <th>Location</th>
      <th>Latitude</th>
      <th>Longitude</th>
      <th>Area</th>
      <th>Description</th>
      <th>Visited</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Acadia</td>
      <td>Maine</td>
      <td>1919</td>
      <td>13th</td>
      <td>3,437,286</td>
      <td>44.35°N 68.21°W</td>
      <td>44.3876119</td>
      <td>-68.2039123</td>
      <td>47,389.67 acres (191.8 km2)</td>
      <td>Covering most of Mount Desert Island and other coastal islands, Acadia features the tallest mountain on the Atlantic coast, granite peaks, ocean shoreline, woodlands, and lakes. There are freshwater, estuary, forest, and intertidal habitats.[6]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>American Samoa</td>
      <td>American Samoa</td>
      <td>1988</td>
      <td>50th</td>
      <td>60,006</td>
      <td>14.25°S 170.68°W</td>
      <td>-14.2331268</td>
      <td>-169.4760133</td>
      <td>9,000.00 acres (36.4 km2)</td>
      <td>The southernmost national park is on three Samoan islands and protects coral reefs, rainforests, volcanic mountains, and white beaches. The area is also home to flying foxes, brown boobies, sea turtles, and 900 species of fish.[7]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Arches</td>
      <td>Utah</td>
      <td>1971</td>
      <td>35th</td>
      <td>1,659,702</td>
      <td>38.68°N 109.57°W</td>
      <td>38.5719944</td>
      <td>-109.4735066</td>
      <td>76,518.98 acres (309.7 km2)</td>
      <td>This site features more than 2,000 natural sandstone arches, including the Delicate Arch. In a desert climate millions of years of erosion have led to these structures, and the arid ground has life-sustaining soil crust and potholes, natural water-collecting basins. Other geologic formations are stone columns, spires, fins, and towers.[8]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Badlands</td>
      <td>South Dakota</td>
      <td>1978</td>
      <td>38th</td>
      <td>970,998</td>
      <td>43.75°N 102.50°W</td>
      <td>43.7480316</td>
      <td>-102.4983748</td>
      <td>242,755.94 acres (982.4 km2)</td>
      <td>The Badlands are a collection of buttes, pinnacles, spires, and grass prairies. It has the world’s richest fossil beds from the Oligocene epoch, and there is wildlife including bison, bighorn sheep, black-footed ferrets, and swift foxes.[9]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Big Bend</td>
      <td>Texas</td>
      <td>1978</td>
      <td>27th</td>
      <td>463,832</td>
      <td>29.25°N 103.25°W</td>
      <td>29.2669239</td>
      <td>-103.2201828</td>
      <td>801,163.21 acres (3,242.2 km2)</td>
      <td>Named for the Bend of the Rio Grande along the US–Mexico border, this park includes a part of the Chihuahuan Desert. A wide variety of Cretaceous and Tertiary fossils as well as cultural artifacts of Native Americans exist within its borders.[10]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Biscayne</td>
      <td>Florida</td>
      <td>1980</td>
      <td>41st</td>
      <td>708,522</td>
      <td>25.65°N 80.08°W</td>
      <td>25.65</td>
      <td>-80.08</td>
      <td>172,924.07 acres (699.8 km2)</td>
      <td>Located in Biscayne Bay, this park at the north end of the Florida Keys has four interrelated marine ecosystems: mangrove forest, the Bay, the Keys, and coral reefs. Threatened animals include the West Indian Manatee, American crocodile, various sea turtles, and peregrine falcon.[11]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Black Canyon of the Gunnison</td>
      <td>Colorado</td>
      <td>1999</td>
      <td>55th</td>
      <td>432,818</td>
      <td>38.57°N 107.72°W</td>
      <td>38.4948194</td>
      <td>-107.71625</td>
      <td>32,950.03 acres (133.3 km2)</td>
      <td>The park protects a quarter of the Gunnison River, which has dark canyon walls from the Precambrian era. The canyon has very steep descents, and it is a site for river rafting and rock climbing. The narrow, steep canyon, made of gneiss and schist, is often in shadow, appearing black.[12]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Bryce Canyon</td>
      <td>Utah</td>
      <td>1928</td>
      <td>17th</td>
      <td>432,818</td>
      <td>37.57°N 112.18°W</td>
      <td>37.6215335</td>
      <td>-112.1549442</td>
      <td>35,835.08 acres (145.0 km2)</td>
      <td>Bryce Canyon is a giant natural amphitheatre along the Paunsaugunt Plateau. The unique area has hundreds of tall hoodoos formed by erosion. The region was originally settled by Native Americans and later by Mormon pioneers.[13]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Canyonlands</td>
      <td>Utah</td>
      <td>1964</td>
      <td>31st</td>
      <td>733,996</td>
      <td>38.2°N 109.93°W</td>
      <td>38.3337024</td>
      <td>-110.8076084</td>
      <td>337,597.83 acres (1,366.2 km2)</td>
      <td>This landscape was eroded into canyons, buttes, and mesas by the Colorado River, Green River, and their tributaries, which divide the park into three districts. There are rock pinnacles and other naturally sculpted rock, as well as artifacts from Ancient Pueblo Peoples.[14]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Capitol Reef</td>
      <td>Utah</td>
      <td>1971</td>
      <td>36th</td>
      <td>1,226,519</td>
      <td>38.20°N 111.17°W</td>
      <td>37.9430873</td>
      <td>-110.9506636</td>
      <td>241,904.26 acres (979.0 km2)</td>
      <td>The park’s Waterpocket Fold is a 100-mile (160 km) monocline that shows the Earth’s geologic layers. Other natural features are monoliths and sandstone domes and cliffs shaped like the United States Capitol.[15]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Carlsbad Caverns</td>
      <td>New Mexico</td>
      <td>1930</td>
      <td>19th</td>
      <td>440,691</td>
      <td>32.17°N 104.44°W</td>
      <td>32.3391752</td>
      <td>-104.3791912</td>
      <td>46,766.45 acres (189.3 km2)</td>
      <td>Carlsbad Caverns has 117 caves, the longest of which is over 120 miles (190 km) long. The Big Room is almost 4,000 feet (1,200 m) long, and the caves are home to over 400,000 Mexican Free-tailed Bats and sixteen other species. Above ground are the Chihuahuan Desert and Rattlesnake Springs.[16]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Channel Islands</td>
      <td>California</td>
      <td>1980</td>
      <td>40th</td>
      <td>409,630</td>
      <td>34.01°N 119.42°W</td>
      <td>33.9948581</td>
      <td>-119.7194377</td>
      <td>249,561.00 acres (1,009.9 km2)</td>
      <td>Five of the eight Channel Islands are protected, and half of the park’s area is underwater. The islands have a unique Mediterranean ecosystem. They are home to over 2,000 species of land plants and animals, and 145 are unique to them. The islands were originally settled by the Chumash people.[17]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Congaree</td>
      <td>South Carolina</td>
      <td>2003</td>
      <td>57th</td>
      <td>159,445</td>
      <td>33.78°N 80.78°W</td>
      <td>33.8272919</td>
      <td>-80.7436141</td>
      <td>26,545.86 acres (107.4 km2)</td>
      <td>On the Congaree River, this park is the largest portion of old-growth floodplain forest left in North America. Some of the trees are the tallest in the Eastern US, and the Boardwalk Loop is an elevated walkway through the swamp.[18]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Crater Lake</td>
      <td>Oregon</td>
      <td>1902</td>
      <td>5th</td>
      <td>704,512</td>
      <td>42.94°N 122.1°W</td>
      <td>42.9445872</td>
      <td>-122.1090039</td>
      <td>183,224.05 acres (741.5 km2)</td>
      <td>Crater Lake lies in the caldera of Mount Mazama formed 7,700 years ago after an eruption. It is the deepest lake in the United States and is known for its blue color and water clarity. There are two islands in the lake, and, with no inlets or outlets, all water comes through precipitation.[19]</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Cuyahoga Valley</td>
      <td>Ohio</td>
      <td>2000</td>
      <td>56th</td>
      <td>2,237,997</td>
      <td>41.24°N 81.55°W</td>
      <td>41.2361578</td>
      <td>-81.55122</td>
      <td>32,860.73 acres (133.0 km2)</td>
      <td>This park along the Cuyahoga River has waterfalls, hills, trails, and displays about early rural living. The Ohio and Erie Canal Towpath Trail follows the Ohio and Erie Canal, where mules towed canal boats. The park has numerous historic homes, bridges, and structures.[20] The park also offers a scenic train ride with various trips available. [21]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Death Valley</td>
      <td>California</td>
      <td>1994</td>
      <td>53rd</td>
      <td>1,740,945</td>
      <td>36.24°N 116.82°W</td>
      <td>36.2460841</td>
      <td>-116.8185076</td>
      <td>3,372,401.96 acres (13,647.6 km2)</td>
      <td>Death Valley is the hottest, lowest, and driest place in the United States. Daytime temperatures have topped 130°F (54°C) and it is home to Badwater Basin, the lowest point in North America. There are canyons, colorful badlands, sand dunes, mountains, and over 1000 species of plants in this graben on a fault line. Further geologic points of interest are salt flats, springs, and buttes.[22]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Denali</td>
      <td>Alaska</td>
      <td>1917</td>
      <td>12th</td>
      <td>601,152</td>
      <td>63.33°N 150.50°W</td>
      <td>63.7477803</td>
      <td>-150.3064953</td>
      <td>4,740,911.72 acres (19,185.8 km2)</td>
      <td>Centered around Mount McKinley, the tallest mountain in North America, Denali is serviced by a single road leading to Wonder Lake. McKinley and other peaks of the Alaska Range are covered with long glaciers and boreal forest. Wildlife includes grizzly bears, Dall sheep, caribou, and gray wolves.[23]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Dry Tortugas</td>
      <td>Florida</td>
      <td>1992</td>
      <td>51st</td>
      <td>79,200</td>
      <td>24.63°N 82.87°W</td>
      <td>24.63</td>
      <td>-82.87</td>
      <td>64,701.22 acres (261.8 km2)</td>
      <td>The Dry Tortugas on the west end of the Florida Keys are the site of Fort Jefferson, the largest masonry structure in the Western Hemisphere. With most of the park being water, it is the home of coral reefs and shipwrecks and is only accessible by plane or boat.[24]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Everglades</td>
      <td>Florida</td>
      <td>1934</td>
      <td>20th</td>
      <td>1,118,300</td>
      <td>25.32°N 80.93°W</td>
      <td>25.2866149</td>
      <td>-80.8986502</td>
      <td>1,508,537.90 acres (6,104.8 km2)</td>
      <td>The Everglades are the largest subtropical wilderness in the United States. This mangrove ecosystem and marine estuary is home to 36 protected species, including the Florida panther, American crocodile, and West Indian manatee. Some areas have been drained and developed; restoration projects aim to restore the ecology.[25]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Gates of the Arctic</td>
      <td>Alaska</td>
      <td>1980</td>
      <td>42nd</td>
      <td>10,518</td>
      <td>67.78°N 153.30°W</td>
      <td>65.8443667</td>
      <td>-153.4302993</td>
      <td>7,523,897.74 acres (30,448.1 km2)</td>
      <td>This northernmost park protects part of the Brooks Range and has no park facilities. The land is home to Alaska natives, who have relied on the land and caribou for 11,000 years.[26]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Gateway Arch</td>
      <td>Missouri</td>
      <td>2018</td>
      <td>60th</td>
      <td>2,055,309</td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td>No</td>
    </tr>
    <tr>
      <td>Glacier</td>
      <td>Montana</td>
      <td>1910</td>
      <td>8th</td>
      <td>3,049,839</td>
      <td>48.80°N 114.00°W</td>
      <td>48.6162807</td>
      <td>-113.8594034</td>
      <td>1,013,572.41 acres (4,101.8 km2)</td>
      <td>Part of Waterton Glacier International Peace Park, this park has 26 remaining glaciers and 130 named lakes under the tall Rocky Mountain peaks. There are historic hotels and a landmark road in this region of rapidly receding glaciers. These mountains, formed by an overthrust, have the world’s best sedimentary fossils from the Proterozoic era.[27]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Glacier Bay</td>
      <td>Alaska</td>
      <td>1980</td>
      <td>43rd</td>
      <td>672,087</td>
      <td>58.50°N 137.00°W</td>
      <td>58.659708</td>
      <td>-136.573678</td>
      <td>3,224,840.31 acres (13,050.5 km2)</td>
      <td>Glacier Bay has numerous tidewater glaciers, mountains, and fjords. The temperate rainforest and the bay are home to grizzly bears, mountain goats, whales, seals, and eagles. When discovered in 1794 by George Vancouver, the entire bay was covered by ice, but the glaciers have receded over 65 miles (105 km).[28]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Grand Canyon</td>
      <td>Arizona</td>
      <td>1919</td>
      <td>14th</td>
      <td>5,974,411</td>
      <td>36.06°N 112.14°W</td>
      <td>36.0604107</td>
      <td>-112.1411217</td>
      <td>1,217,403.32 acres (4,926.7 km2)</td>
      <td>The Grand Canyon, carved out by the Colorado River, is 277 miles (446 km) long, up to 1 mile (1.6 km) deep, and up to 15 miles (24 km) wide. Millions of years of exposure has formed colorful layers of the Colorado Plateau in mesas and canyon walls.[29]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Grand Teton</td>
      <td>Wyoming</td>
      <td>1929</td>
      <td>18th</td>
      <td>3,405,614</td>
      <td>43.73°N 110.80°W</td>
      <td>43.7325324</td>
      <td>-110.8059887</td>
      <td>309,994.66 acres (1,254.5 km2)</td>
      <td>Grand Teton is the tallest mountain in the Teton Range. The park’s Jackson Hole valley and reflective piedmont lakes contrast with the tall mountains, which abruptly rise from the glacial sage-covered valley.[30]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Great Basin</td>
      <td>Nevada</td>
      <td>1986</td>
      <td>49th</td>
      <td>131,802</td>
      <td>38.98°N 114.30°W</td>
      <td>38.9648711</td>
      <td>-114.1914951</td>
      <td>77,180.00 acres (312.3 km2)</td>
      <td>Based around Wheeler Peak, the Great Basin has 5,000-year-old bristlecone pines, glacial moraines, and the limestone Lehman Caves. It has some of the country’s darkest night skies, and there are animal species including Townsend’s big-eared bat, Pronghorn, and Bonneville cutthroat trout.[31]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Great Sand Dunes</td>
      <td>Colorado</td>
      <td>2004</td>
      <td>58th</td>
      <td>527,546</td>
      <td>37.73°N 105.51°W</td>
      <td>37.6572135</td>
      <td>-105.5474373</td>
      <td>42,983.74 acres (173.9 km2)</td>
      <td>The tallest dunes in North America are up to 750 feet (230 m) tall and neighbor grasslands, shrublands and wetlands. They were formed by sand deposits of the Rio Grande on the San Luis Valley. The park also has alpine lakes, six 13,000-foot mountains, and ancient forests.[32]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Great Smoky Mountains</td>
      <td>Tennessee</td>
      <td>1934</td>
      <td>21st</td>
      <td>12,547,743</td>
      <td>35.68°N 83.53°W</td>
      <td>35.6839416</td>
      <td>-83.5323006</td>
      <td>521,490.13 acres (2,110.4 km2)</td>
      <td>The Great Smoky Mountains, part of the Appalachian Mountains, have a wide range of elevations, making them home to over 400 vertebrate species, 100 tree species, and 5000 plant species. Hiking is the park’s main attraction, with over 800 miles (1,300 km) of trails, including 70 miles (110 km) of the Appalachian Trail. Other activities are fishing, horseback riding, and visiting some of nearly 80 historic structures.[33]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Guadalupe Mountains</td>
      <td>Texas</td>
      <td>1966</td>
      <td>32nd</td>
      <td>188,833</td>
      <td>31.92°N 104.87°W</td>
      <td>31.7786811</td>
      <td>-104.9856176</td>
      <td>86,415.97 acres (349.7 km2)</td>
      <td>This park has Guadalupe Peak, the highest point in Texas, the scenic McKittrick Canyon full of Bigtooth Maples, part of the Chihuahuan Desert, and a fossilized reef from the Permian.[34]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Haleakala</td>
      <td>Hawaii</td>
      <td>1961</td>
      <td>29th</td>
      <td>994,394</td>
      <td>20.72°N 156.17°W</td>
      <td>20.7278194</td>
      <td>-156.3081217</td>
      <td>29,093.67 acres (117.7 km2)</td>
      <td>The Haleakalā volcano on Maui has a very large crater with many cinder cones, Hosmer’s Grove of alien trees, and the native Hawaiian Goose. The Kipahulu section has numerous pools with freshwater fish. This National Park has the greatest number of endangered species.[35]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Hawaii Volcanoes</td>
      <td>Hawaii</td>
      <td>1916</td>
      <td>10th</td>
      <td>1,368,376</td>
      <td>19.38°N 155.20°W</td>
      <td>19.4095008</td>
      <td>-155.0483131</td>
      <td>323,431.38 acres (1,308.9 km2)</td>
      <td>This park on the Big Island protects the Kīlauea and Mauna Loa volcanoes, two of the world’s most active. Diverse ecosystems of the park range from those at sea level to 13,000 feet (4,000 m).[36]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Hot Springs</td>
      <td>Arkansas</td>
      <td>1921</td>
      <td>16th</td>
      <td>1,467,153</td>
      <td>34.51°N 93.05°W</td>
      <td>34.5100997</td>
      <td>-93.0500476</td>
      <td>5,549.75 acres (22.5 km2)</td>
      <td>The only National Park in an urban area, this smallest National Park is based around the natural hot springs that have been managed for public use. Bathhouse Row preserves 47 of these with many beneficial minerals.[37]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Indiana Dunes</td>
      <td>Indiana</td>
      <td>2019</td>
      <td>61st</td>
      <td>2,134,285</td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td>No</td>
    </tr>
    <tr>
      <td>Isle Royale</td>
      <td>Michigan</td>
      <td>1940</td>
      <td>25th</td>
      <td>26,410</td>
      <td>48.10°N 88.55°W</td>
      <td>47.723087</td>
      <td>-86.940716</td>
      <td>571,790.11 acres (2,314.0 km2)</td>
      <td>The largest island in Lake Superior, this park is a site of isolation and wilderness. It has many shipwrecks, waterways, and hiking trails. The park also includes over 400 smaller islands in the waters up to 4.5 miles (7.2 km) from the island. There are only 20 mammal species and it is known for its wolf and moose relationship.[38]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Joshua Tree</td>
      <td>California</td>
      <td>1994</td>
      <td>54th</td>
      <td>2,988,547</td>
      <td>33.79°N 115.90°W</td>
      <td>33.6608927</td>
      <td>-115.9467997</td>
      <td>789,745.47 acres (3,196.0 km2)</td>
      <td>Covering parts of the Colorado and Mojave Deserts and the Little San Bernardino Mountains, this is the home of the Joshua tree. Across great elevation changes are sand dunes, dry lakes, rugged mountains, and granite monoliths.[39]</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Katmai</td>
      <td>Alaska</td>
      <td>1980</td>
      <td>44th</td>
      <td>84,167</td>
      <td>58.50°N 155.00°W</td>
      <td>58.0458356</td>
      <td>-156.5982286</td>
      <td>3,674,529.68 acres (14,870.3 km2)</td>
      <td>This park on the Alaska Peninsula protects the Valley of Ten Thousand Smokes, an ash flow formed by the 1912 eruption of Novarupta, as well as Mount Katmai. Over 2,000 brown bears come here to catch spawning salmon.[40]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Kenai Fjords</td>
      <td>Alaska</td>
      <td>1980</td>
      <td>45th</td>
      <td>356,601</td>
      <td>59.92°N 149.65°W</td>
      <td>60.1696088</td>
      <td>-149.2361887</td>
      <td>669,982.99 acres (2,711.3 km2)</td>
      <td>Near Seward on the Kenai Peninsula, this park protects the Harding Icefield and at least 38 glaciers and fjords stemming from it. The only area accessible to the public by road is Exit Glacier, while the rest can only be viewed by boat tours.[41]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Kings Canyon</td>
      <td>California</td>
      <td>1940</td>
      <td>24th</td>
      <td>632,110</td>
      <td>36.80°N 118.55°W</td>
      <td>36.7472811</td>
      <td>-119.0925356</td>
      <td>461,901.20 acres (1,869.2 km2)</td>
      <td>Home to several Giant sequoia groves and the General Grant Tree, the world’s second largest, this park also has part of the Kings River, site of the granite Kings Canyon, and San Joaquin River, as well as the Boyden Cave.[42]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Kobuk Valley</td>
      <td>Alaska</td>
      <td>1980</td>
      <td>46th</td>
      <td>15,766</td>
      <td>67.55°N 159.28°W</td>
      <td>67.2872981</td>
      <td>-160.0342625</td>
      <td>1,750,716.50 acres (7,084.9 km2)</td>
      <td>Kobuk Valley has 61 miles (98 km) of the Kobuk River and three regions of sand dunes. Created by glaciers, the Great Kobuk, the Little Kobuk, and the Hunt River Sand Dunes can reach 100 feet (30 m) high and 100 °F (38 °C), and they are the largest dunes in the arctic. Twice a year, half a million caribou migrate through the dunes and across river bluffs that contain ice age fossils.[43]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Lake Clark</td>
      <td>Alaska</td>
      <td>1980</td>
      <td>47th</td>
      <td>17,157</td>
      <td>60.97°N 153.42°W</td>
      <td>60.59772</td>
      <td>-152.7408824</td>
      <td>2,619,733.21 acres (10,601.7 km2)</td>
      <td>The region around Lake Clark has four active volcanoes, including Mount Redoubt, rivers, glaciers, and waterfalls. There are temperate rainforests, a tundra plateau, and three mountain ranges.[44]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Lassen Volcanic</td>
      <td>California</td>
      <td>1916</td>
      <td>11th</td>
      <td>517,039</td>
      <td>40.49°N 121.51°W</td>
      <td>40.5328759</td>
      <td>-121.8412133</td>
      <td>106,372.36 acres (430.5 km2)</td>
      <td>Lassen Peak, the largest plug dome volcano in the world, is joined by all three other types of volcanoes in this park: shield, cinder dome, and composite. Other than the volcano, which last erupted in 1915, the park has hydrothermal areas, including fumaroles, boiling pools, and steaming ground, heated by molten rock under the peak.[45]</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Mammoth Cave</td>
      <td>Kentucky</td>
      <td>1941</td>
      <td>26th</td>
      <td>551,590</td>
      <td>37.18°N 86.10°W</td>
      <td>37.2135982</td>
      <td>-86.1548269</td>
      <td>52,830.19 acres (213.8 km2)</td>
      <td>With 392 miles (631 km) of passageways mapped, Mammoth Cave is by far the world’s longest cave system. Cave animals include eight bat species, Kentucky cave shrimp, Northern cavefish, and cave salamanders. Above ground, the park contains the Green River, 70 miles of hiking trails, sinkholes, and springs.[46]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Mesa Verde</td>
      <td>Colorado</td>
      <td>1906</td>
      <td>7th</td>
      <td>556,203</td>
      <td>37.18°N 108.49°W</td>
      <td>37.1828071</td>
      <td>-108.4922277</td>
      <td>52,121.93 acres (210.9 km2)</td>
      <td>This area has over 4,000 archaeological sites of the Ancestral Pueblo, who lived here for 700 years. Cliff dwellings built in the 12th and 13th centuries include Cliff Palace, which has 150 rooms and 23 kivas, and the Balcony House, with passages and tunnels.[47]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Mount Rainier</td>
      <td>Washington</td>
      <td>1899</td>
      <td>4th</td>
      <td>1,501,621</td>
      <td>46.85°N 121.75°W</td>
      <td>46.8667428</td>
      <td>-121.8040452</td>
      <td>235,625.00 acres (953.5 km2)</td>
      <td>Mount Rainier, an active volcano, is the most prominent peak in the Cascades, and it is covered by 26 named glaciers including Carbon Glacier and Emmons Glacier, the largest in the continental United States. The mountain is popular for climbing, and more than half of the park is covered by subalpine and alpine forests. Paradise on the south slope is one of the snowiest places in the world, and the Longmire visitor center is the start of the Wonderland Trail, which encircles the mountain.[48]</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>New River Gorge</td>
      <td>West Virginia</td>
      <td>2020</td>
      <td>63rd</td>
      <td>N/A</td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td>New River Gorge is known for its stunning scenery, deep canyons, and whitewater rafting on the New River, one of the oldest rivers in the world.</td>
      <td>No</td>
    </tr>
    <tr>
      <td>North Cascades</td>
      <td>Washington</td>
      <td>1968</td>
      <td>33rd</td>
      <td>38,208</td>
      <td>48.70°N 121.20°W</td>
      <td>48.7147254</td>
      <td>-121.1533781</td>
      <td>504,780.94 acres (2,042.8 km2)</td>
      <td>This complex includes the two units of the National Park and the Ross Lake and Lake Chelan National Recreation Areas. There are numerous glaciers, and popular hiking and climbing areas are Cascade Pass, Mount Shuksan, Mount Triumph, and Eldorado Peak.[49]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Olympic</td>
      <td>Washington</td>
      <td>1938</td>
      <td>23rd</td>
      <td>3,245,806</td>
      <td>47.97°N 123.50°W</td>
      <td>47.9707283</td>
      <td>-123.5035728</td>
      <td>922,650.86 acres (3,733.8 km2)</td>
      <td>Situated on the Olympic Peninsula, this park ranges from Pacific shoreline with tide pools to temperate rainforests to Mount Olympus. The glaciated Olympic Mountains overlook the Hoh Rain Forest and Quinault Rain Forest, the wettest area of the continental United States.[50]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Petrified Forest</td>
      <td>Arizona</td>
      <td>1962</td>
      <td>30th</td>
      <td>643,588</td>
      <td>35.07°N 109.78°W</td>
      <td>35.0199151</td>
      <td>-109.7829791</td>
      <td>93,532.57 acres (378.5 km2)</td>
      <td>This portion of the Chinle Formation has a great concentration of 225-million-year-old petrified wood. The surrounding region, the Painted Desert, has eroded red-hued volcanic rock called bentonite. There are also dinosaur fossils and over 350 Native American sites.[51]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Pinnacles</td>
      <td>California</td>
      <td>2013</td>
      <td>59th</td>
      <td>177,224</td>
      <td>36.48°N 121.16°W</td>
      <td>36.4808829</td>
      <td>-121.1593104</td>
      <td>26,605.73 acres (107.7 km2)</td>
      <td>Known for the namesake eroded leftovers of half of an extinct volcano, it is popular for its rock climbing.[52]</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Redwood</td>
      <td>California</td>
      <td>1968</td>
      <td>34th</td>
      <td>504,722</td>
      <td>41.30°N 124.00°W</td>
      <td>41.3678119</td>
      <td>-124.0296856</td>
      <td>112,512.05 acres (455.3 km2)</td>
      <td>This park and the co-managed state parks protect almost half of all remaining Coastal Redwoods, the tallest trees on Earth. There are three large river systems in this very seismically active area, and the 37 miles (60 km) of protected coastline have tide pools and seastacks. The prairie, estuary, coast, river, and forest ecosystems have varied animal and plant species.[53]</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Rocky Mountain</td>
      <td>Colorado</td>
      <td>1915</td>
      <td>9th</td>
      <td>4,670,053</td>
      <td>40.40°N 105.58°W</td>
      <td>40.400054</td>
      <td>-105.57937</td>
      <td>265,828.41 acres (1,075.8 km2)</td>
      <td>This section of the Rocky Mountains has ecosystems varying in elevation from the over 150 riparian lakes to Montane and subalpine forests to the alpine tundra. Large wildlife including mule deer, bighorn sheep, black bears, and cougars inhabit these igneous mountains and glacier valleys. The fourteener Longs Peak and Bear Lake are popular destinations.[54]</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Saguaro</td>
      <td>Arizona</td>
      <td>1994</td>
      <td>52nd</td>
      <td>1,020,226</td>
      <td>32.25°N 110.50°W</td>
      <td>32.0149212</td>
      <td>-110.6645683</td>
      <td>91,439.71 acres (370.0 km2)</td>
      <td>Split into the separate Rincon Mountain and Tucson Mountain Districts, the dry Sonoran Desert is still home to much life in six biotic communities. Beyond the namesake Giant Saguaro cacti, there are barrel cacti, cholla cacti, and prickly pears, as well as Lesser Long-nosed Bats, Spotted Owls, and javelinas.[55]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Sequoia</td>
      <td>California</td>
      <td>1890</td>
      <td>2nd</td>
      <td>1,246,053</td>
      <td>36.43°N 118.68°W</td>
      <td>36.4495198</td>
      <td>-118.7198004</td>
      <td>404,051.17 acres (1,635.1 km2)</td>
      <td>This park protects the Giant Forest, which has the world’s largest tree, General Sherman, as well as four of the next nine. It also has over 240 caves, the tallest mountain in the continental U.S., Mount Whitney, and the granite dome Moro Rock.[56]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Shenandoah</td>
      <td>Virginia</td>
      <td>1935</td>
      <td>22nd</td>
      <td>1,425,507</td>
      <td>38.53°N 78.35°W</td>
      <td>38.5314894</td>
      <td>-78.3522476</td>
      <td>199,045.23 acres (805.5 km2)</td>
      <td>Shenandoah’s Blue Ridge Mountains are covered by hardwood forests that are home to tens of thousands of animals. The Skyline Drive and Appalachian Trail run the entire length of this narrow park that has more than 500 miles (800 km) of hiking trails along scenic overlooks and waterfalls of the Shenandoah River.[57]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Theodore Roosevelt</td>
      <td>North Dakota</td>
      <td>1978</td>
      <td>39th</td>
      <td>691,658</td>
      <td>46.97°N 103.45°W</td>
      <td>46.8747588</td>
      <td>-103.2519011</td>
      <td>70,446.89 acres (285.1 km2)</td>
      <td>This region that enticed and influenced President Theodore Roosevelt is now a park of three units in the badlands. Besides Roosevelt’s historic cabin, there are scenic drives and backcountry hiking opportunities. Wildlife includes American Bison, pronghorns, Bighorn sheep, and wild horses.[58]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Virgin Islands</td>
      <td>US Virgin Islands</td>
      <td>1956</td>
      <td>28th</td>
      <td>133,398</td>
      <td>18.33°N 64.73°W</td>
      <td>18.3368114</td>
      <td>-64.7280952</td>
      <td>14,688.87 acres (59.4 km2)</td>
      <td>The island of Saint John has rich human and natural history. There are Taino archaeological sites and ruins of sugar plantations from Columbus’s time. Past the pristine beaches are mangroves, seagrass beds, coral reefs and algal plains.[59]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Voyageurs</td>
      <td>Minnesota</td>
      <td>1971</td>
      <td>37th</td>
      <td>232,974</td>
      <td>48.50°N 92.88°W</td>
      <td>48.5494209</td>
      <td>-93.3763619</td>
      <td>218,200.17 acres (883.0 km2)</td>
      <td>This park on four main lakes, a site for canoeing, kayaking, and fishing, has a history of Ojibwe Native Americans, French fur traders called voyageurs, and a gold rush. Formed by glaciers, this region has tall bluffs, rock gardens, islands and bays, and historic buildings.[60]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>White Sands</td>
      <td>New Mexico</td>
      <td>2019</td>
      <td>62nd</td>
      <td>608,785</td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td> </td>
      <td>No</td>
    </tr>
    <tr>
      <td>Wind Cave</td>
      <td>South Dakota</td>
      <td>1903</td>
      <td>6th</td>
      <td>615,350</td>
      <td>43.57°N 103.48°W</td>
      <td>43.3759472</td>
      <td>-103.4511768</td>
      <td>28,295.03 acres (114.5 km2)</td>
      <td>Wind Cave is distinctive for its calcite fin formations called boxwork and needle-like growths called frostwork. The cave, which was discovered by the sound of wind coming from a hole in the ground, is the world’s densest cave system. Above ground is a mixed-grass prairie with animals such as bison, black-footed ferrets, and prairie dogs,[61] and Ponderosa pine forests home to cougars and elk.</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Wrangell - St. Elias</td>
      <td>Alaska</td>
      <td>1980</td>
      <td>48th</td>
      <td>74,518</td>
      <td>61.00°N 142.00°W</td>
      <td>61.3314317</td>
      <td>-142.939826</td>
      <td>8,323,147.59 acres (33,682.6 km2)</td>
      <td>This mountainous land has the convergence of the Alaska, Chugach, and Wrangell-Saint Elias Ranges, which have many of the continent’s tallest mountains over 16,000 feet (4,900 m), including Mount Saint Elias. More than 25% of this park of volcanic peaks is covered with glaciers, including the tidewater Hubbard Glacier, piedmont Malaspina Glacier, and valley Nabesna Glacier.[62]</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Yellowstone</td>
      <td>Wyoming, Montana, Idaho</td>
      <td>1872</td>
      <td>1st</td>
      <td>4,020,288</td>
      <td>44.60°N 110.50°W</td>
      <td>44.4620852</td>
      <td>-110.6424411</td>
      <td>2,219,790.71 acres (8,983.2 km2)</td>
      <td>Situated on the Yellowstone Caldera, the first national park in the world has vast geothermal areas such as hot springs and geysers, the best-known being Old Faithful and Grand Prismatic Spring. The yellow-hued Grand Canyon of the Yellowstone River has numerous waterfalls, and four mountain ranges run through the park. There are almost 60 mammal species, including the gray wolf, grizzly bear, lynx, bison, and elk.[63]</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Yosemite</td>
      <td>California</td>
      <td>1890</td>
      <td>3rd</td>
      <td>4,422,861</td>
      <td>37.83°N 119.50°W</td>
      <td>37.8276596</td>
      <td>-119.5053357</td>
      <td>761,266.19 acres (3,080.7 km2)</td>
      <td>Yosemite has towering cliffs, waterfalls, and sequoias in a diverse area of geology and hydrology. Half Dome and El Capitan rise from the central glacier-formed Yosemite Valley, as does Yosemite Falls, North America’s tallest waterfall. Three Giant Sequoia groves and vast wilderness are home to diverse wildlife.[64]</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Zion</td>
      <td>Utah</td>
      <td>1919</td>
      <td>15th</td>
      <td>4,488,268</td>
      <td>37.30°N 113.05°W</td>
      <td>37.088313</td>
      <td>-113.2394452</td>
      <td>146,597.60 acres (593.3 km2)</td>
      <td>This geologically unique area has colorful sandstone canyons, high plateaus, and rock towers. Natural arches and exposed formations of the Colorado Plateau make up a large wilderness of four ecosystems.[65]</td>
      <td>No</td>
    </tr>
  </tbody>
</table>]]></content><author><name>Austin Powell</name></author><category term="Hobbies" /><summary type="html"><![CDATA[US Parks]]></summary></entry><entry><title type="html">2023 Technical Reading</title><link href="https://aus10powell.github.io/2023/papers/Reading/" rel="alternate" type="text/html" title="2023 Technical Reading" /><published>2023-01-01T00:00:00+00:00</published><updated>2023-01-01T00:00:00+00:00</updated><id>https://aus10powell.github.io/2023/papers/Reading</id><content type="html" xml:base="https://aus10powell.github.io/2023/papers/Reading/"><![CDATA[<p><strong>2023 Reading List</strong></p>

<h2 id="signal-processing">Signal Processing</h2>

<h3 id="general">General</h3>
<ul>
  <li><a href="https://deepwavedigital.com/wp-content/uploads/2019/11/GTCDC2019_Deepwave_Public_Release_FINAL.pdf">End-to-End Signal Processing and Deep Learning (using Embedded GPUs)</a>: An overview of full-stack solutions for deep learning and GPU-enabled signal processing on embedded GPUs, spanning edge compute hardware and custom applications. Tight hardware/software integration enables radio embedded systems combining FPGA, CPU, and GPU, GPU-based signal processing algorithms, and pruned neural networks for inference on edge RF systems.</li>
</ul>

<h3 id="audio">Audio</h3>
<ul>
  <li><a href="https://arxiv.org/abs/2305.01578">Self-supervised learning for infant cry analysis</a>
    <ul>
      <li>Self-supervised learning can be used to learn useful representations of infant cries from unlabeled data.</li>
      <li>The self-supervised approach was able to achieve comparable performance to a supervised learning approach that was trained on a small amount of labeled data.</li>
      <li>Self-supervised learning could be a valuable tool for developing new and improved infant cry analysis systems.</li>
    </ul>
  </li>
  <li><a href="https://arxiv.org/abs/2305.00969">CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds</a>
    <ul>
      <li>The CryCeleb dataset is a large and diverse dataset of infant cry sounds. The associated challenge focuses on the task of verifying whether two cry recordings come from the same infant.</li>
    </ul>
  </li>
</ul>

<h2 id="nlp">NLP</h2>
<ul>
  <li><a href="https://arxiv.org/abs/1910.13461">BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension</a>
    <ul>
      <li>BART is a denoising autoencoder, which means that it is trained on a dataset of corrupted text. The corruptions can be simple, such as replacing words with random words, or more complex, such as removing words or sentences. BART is trained to reconstruct the original text from the corrupted text.</li>
      <li>This training procedure helps BART to learn to represent the meaning of text, even when the text is corrupted. This makes BART well-suited for natural language generation, translation, and comprehension tasks.</li>
    </ul>
  </li>
  <li><a href="https://arxiv.org/abs/2305.16765">Backpack Language Models</a>
    <ul>
      <li>Backpacks is a neural architecture that combines strong modeling performance with interpretability and control. It learns multiple sense vectors for each word, representing different aspects of the word, and allows for intervention and modification of these vectors to change the model’s behavior. It outperforms larger models in lexical similarity evaluations and enables controllable text generation and debiasing through sense vector manipulation.</li>
    </ul>
  </li>
  <li>Date-specific (might not age well), as of 05/25/23:
    <ul>
      <li><a href="https://www.youtube.com/watch?v=bZQun8Y4L2A&amp;t=518s&amp;ab_channel=MicrosoftDeveloper">State of GPT (a Youtube update by Microsoft)</a></li>
      <li><a href="https://arxiv.org/abs/2212.09251">Discovering Language Model Behaviors with Model-Written Evaluations</a></li>
    </ul>
  </li>
  <li><a href="https://lilianweng.github.io/posts/2023-06-23-agent/">LLM Powered Autonomous Agents</a> An OpenAI researcher’s overview of LLM-powered autonomous agents, covering planning, memory, and tool use.</li>
  <li><a href="https://github.com/linexjlin/GPTs">Prompt Recipes</a> Great starting points for some specific use cases in prompting. Some are shockingly effective.</li>
</ul>
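<p>The denoising objective BART trains on can be illustrated with a toy noising function. This is purely illustrative, pure-Python sketch: real BART operates on subword tokens and uses richer corruptions such as span infilling and sentence permutation, but the idea is the same: the seq2seq model sees the noisy sequence and is trained to reconstruct the original.</p>

```python
import random

def corrupt(tokens, mask_token="<mask>", p_delete=0.1, p_mask=0.3, seed=0):
    """Toy BART-style noising: randomly delete or mask tokens.
    A seq2seq model would be trained to map corrupt(x) back to x."""
    rng = random.Random(seed)
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p_delete:
            continue                   # token deletion
        elif r < p_delete + p_mask:
            noisy.append(mask_token)   # token masking
        else:
            noisy.append(tok)          # token kept unchanged
    return noisy
```

<p>Because reconstruction requires recovering both what was masked and what was deleted, the model must capture meaning beyond surface word order.</p>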

<h3 id="large-language-models">Large-Language Models</h3>
<ul>
  <li>
    <p><a href="http://proceedings.mlr.press/v97/houlsby19a/houlsby19a.pdf">Parameter-Efficient Transfer Learning for NLP</a>: using adapter modules as an efficient alternative to fine-tuning large pre-trained models in the context of numerous downstream tasks. Adapter modules introduce minimal trainable parameters per task, enabling the incorporation of new tasks without retraining the entire model. Demonstrating their effectiveness, the authors apply adapter modules to 26 text classification tasks, including the GLUE benchmark, achieving near state-of-the-art performance with only a slight increase in parameters. This approach contrasts with traditional fine-tuning, which requires training all parameters for each task, showcasing the efficiency and flexibility of adapter modules in handling diverse tasks.</p>
  </li>
  <li><a href="https://huyenchip.com/2023/05/02/rlhf.html">RLHF: Reinforcement Learning from Human Feedback (A blog by Chip Huyen)</a></li>
  <li><a href="https://huggingface.co/blog/stackllama">StackLLaMA: A hands-on guide to train LLaMA with RLHF</a>: About the development of the StackLLaMA model, a reinforcement learning from human feedback (RLHF) fine-tuned LLaMA model for Stack Exchange question and answering. The model is trained using a combination of supervised fine-tuning, reward modeling, and reinforcement learning techniques. Challenges faced during training, such as balancing rewards and managing KL divergence, are highlighted, and the post emphasizes the ongoing efforts to improve RLHF methods. The StackLLaMA model demonstrates the application of RL techniques in natural language processing tasks, showcasing its potential for enhancing question-answering systems.</li>
  <li><a href="https://react-lm.github.io/">ReAct: Synergizing Reasoning and Acting in Language Models (web write-up)</a>: a novel approach that leverages Large Language Models (LLMs) to generate both reasoning traces and task-specific actions concurrently. By using reasoning and acting, ReAct demonstrates enhanced synergy between the two, enabling the model to induce, track, and update action plans based on reasoning traces and interact effectively with external sources. The results show ReAct’s effectiveness over existing methods in various language and decision-making tasks, addressing issues like hallucination and error propagation while improving human interpretability and trustworthiness.</li>
  <li><a href="https://arxiv.org/abs/2210.03629">ReAct: Synergizing Reasoning and Acting in Language Models</a></li>
  <li><a href="https://arxiv.org/pdf/2310.08123.pdf">Who Wrote it and Why? Prompting Large-Language Models for Authorship Verification</a>: The text discusses the significance of Authorship Verification (AV) in natural language processing, highlighting challenges in existing techniques such as data requirements and explainability. To address these issues, the paper introduces PromptAV, a novel approach leveraging Large-Language Models for AV. PromptAV utilizes step-by-step stylometric explanation prompts, outperforming state-of-the-art methods with limited data and providing intuitive explanations for predictions. The research aims to enhance AV effectiveness and interpretability, presenting PromptAV as a promising solution in forensic analysis, plagiarism detection, and identification of deceptive content.</li>
  <li>
    <p><a href="https://www.anyscale.com/blog/a-comprehensive-guide-for-building-rag-based-llm-applications-part-1">Building RAG-based LLM Applications for Production</a>: A summary of Retrieval Augmented Generation (RAG):</p>

    <p>Retrieval Augmented Generation (RAG) addresses the limitation of Large Language Models (LLMs) not being trained on specific user data. It involves incorporating user data into the LLMs’ existing knowledge. In the RAG process, user data is loaded and prepared for queries, creating an index. User queries filter the data down to the most relevant context, and this context, along with the query, is sent to the LLM, which generates a response.</p>

    <p>Key stages within RAG include:</p>

    <ul>
      <li>Loading: Retrieving user data from various sources and bringing it into the processing pipeline, supported by connectors like those provided by LlamaHub.</li>
      <li>Indexing: Creating a data structure, often involving vector embeddings and metadata strategies, to facilitate efficient querying and retrieval of contextually relevant data.</li>
      <li>Storing: Saving the indexed data and associated metadata to avoid the need for re-indexing in the future.</li>
      <li>Querying: Utilizing LLMs and LlamaIndex data structures for various query strategies, such as sub-queries, multi-step queries, and hybrid approaches.</li>
      <li>Evaluation: Assessing the effectiveness of the pipeline through objective measures, comparing strategies and evaluating the accuracy, fidelity, and speed of query responses.</li>
    </ul>
  </li>
</ul>
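<p>The loading, indexing, and querying stages of RAG can be sketched end to end with a toy in-memory index. Bag-of-words term-frequency vectors stand in for real embeddings, and the class and function names are illustrative assumptions, not from the guide:</p>

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class TinyRAG:
    def __init__(self):
        self.index = []                       # (vector, chunk) pairs

    def load(self, chunks):                   # Loading + Indexing stages
        self.index = [(embed(c), c) for c in chunks]

    def query(self, question, k=2):           # Querying stage
        qv = embed(question)
        ranked = sorted(self.index, key=lambda p: cosine(qv, p[0]), reverse=True)
        context = "\n".join(c for _, c in ranked[:k])
        # In a real pipeline this prompt would be sent to an LLM.
        return f"Context:\n{context}\n\nQuestion: {question}"
```

<p>Frameworks such as LlamaIndex replace each toy piece here with production components: connectors for loading, vector stores for indexing and storage, and richer query strategies.</p>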

<h2 id="vision">Vision</h2>
<ul>
  <li><a href="https://arxiv.org/abs/2405.14458">YOLOv10: Real-Time End-to-End Object Detection</a> Among other efficiency improvements, the highlight may be tackling the inference latency caused by NMS.</li>
  <li><a href="https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/">Faiss: A library for efficient similarity search</a>: Facebook AI Similarity Search (Faiss) is a library for quickly finding multimedia documents that are similar to each other.</li>
</ul>
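<p>At its core, what Faiss accelerates is k-nearest-neighbour search over dense vectors. A brute-force reference version in plain Python (illustrative only; Faiss provides optimized exact and approximate indexes for the same operation) looks like this:</p>

```python
import math

def knn_search(database, queries, k=1):
    """Brute-force k-nearest-neighbour search by L2 distance.
    Returns, for each query, the indices of its k closest database vectors."""
    results = []
    for q in queries:
        order = sorted(range(len(database)),
                       key=lambda i: math.dist(q, database[i]))
        results.append(order[:k])
    return results
```

<p>This exact scan is O(n) per query; Faiss's contribution is making the same search fast at the scale of millions or billions of vectors.</p>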

<h2 id="ml-general">ML (General)</h2>
<ul>
  <li><a href="http://medianetlab.ee.ucla.edu/papers/AAAI_2018_DeepHit">DeepHit: A Deep Learning Approach to Survival Analysis with Competing Risks</a> Time-to-event analysis is widely used in economics, finance, engineering, medicine, and many other areas. Previous models rely on strong parametric assumptions that are often violated. DeepHit uses a deep neural network to learn the distribution of survival times directly. Comparisons with previous models on real and synthetic datasets demonstrate that DeepHit achieves statistically significant performance improvements.</li>
</ul>
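<p>A small piece of the machinery behind discrete-time survival models like DeepHit: per-interval hazards (the probability of an event in interval t, given survival up to t) convert to a survival curve by multiplying the complements. This helper is an illustrative assumption, not DeepHit code:</p>

```python
def survival_curve(hazards):
    """Convert discrete hazards h_t = P(event in t | alive at start of t)
    into a survival curve S(t) = prod over u <= t of (1 - h_u)."""
    surv, s = [], 1.0
    for h in hazards:
        s *= 1.0 - h
        surv.append(s)
    return surv
```

<p>Models such as DeepHit learn these per-interval probabilities directly from data instead of assuming a parametric form like the exponential or Weibull.</p>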

<h3 id="recommendation-and-search">Recommendation and Search</h3>
<ul>
  <li><a href="https://eugeneyan.com/writing/counterfactual-evaluation/">Counterfactual Evaluation for Recommendation Systems</a>: the challenge of evaluating recommendation systems and suggests that they should be treated as interventional problems rather than observational ones. It explains that traditional offline evaluation methods may not capture the true impact of recommendations on user behavior. The article introduces counterfactual evaluation as an alternative approach, particularly focusing on Inverse Propensity Scoring (IPS) and its variants like Clipped IPS (CIPS) and Self-Normalized IPS (SNIPS), highlighting their advantages and limitations.</li>
  <li><a href="https://arxiv.org/pdf/2002.05515.pdf">Improving Deep Learning For Airbnb Search</a>: Airbnb’s transition to deep learning for search ranking significantly impacted its roadmap, leading to a shift in strategy. While the initial optimism about incorporating machine learning ideas from literature surveys faded due to application-specific challenges, the focus shifted towards a process-driven approach, emphasizing the importance of iterative strategies over individual techniques for enhancing deep learning models in industrial settings.</li>
</ul>
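<p>The core IPS estimator from the counterfactual-evaluation article fits in a few lines. The log format and function names here are illustrative assumptions; the optional <code>clip</code> argument sketches the Clipped IPS (CIPS) variant, which caps high-variance importance weights:</p>

```python
def ips_estimate(logs, target_policy, clip=None):
    """Inverse Propensity Scoring estimate of a target policy's value.

    logs: list of (context, action, reward, logging_propensity) tuples,
    where logging_propensity is the probability the logging policy gave
    to the logged action. target_policy(context, action) returns the
    probability the new policy would choose that action.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        # Re-weight each logged reward by how much more (or less) likely
        # the target policy was to take the logged action.
        weight = target_policy(context, action) / propensity
        if clip is not None:
            weight = min(weight, clip)   # CIPS: trade variance for bias
        total += reward * weight
    return total / len(logs)
```

<p>SNIPS further divides by the sum of the weights instead of the log count, which normalizes away some of the variance at the cost of a small bias.</p>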

<h2 id="ml-ops">ML Ops</h2>
<ul>
  <li><a href="https://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf">Rules of Machine Learning: Best Practices for ML Engineering (M. Zinkevich)</a></li>
  <li><a href="https://blog.tensorflow.org/2021/12/continuous-adaptation-for-machine.html">Continuous Adaptation for Machine (Learning)</a></li>
  <li><a href="https://cloud.google.com/blog/topics/developers-practitioners/model-training-cicd-system-part-i">Model training as a CI/CD system: Part I</a></li>
  <li><a href="https://cloud.google.com/blog/topics/developers-practitioners/model-training-cicd-system-part-ii">Model training as a CI/CD system: Part II</a></li>
  <li><a href="https://madewithml.com/courses/mlops/api/">APIs for Model Serving</a></li>
  <li>
    <p><a href="https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions">Github Actions</a> This is a great place to start especially since your project likely already has a Github repository. It covers all the basics of CI/CD. It discusses the components of a workflow and how to create one. Workflows can be triggered by events such as pushes to the repository. Each workflow run is executed on a runner, which can be a virtual machine or a self-hosted server. Jobs within a workflow can run in parallel or sequentially.</p>

    <ul>
      <li>The process of building a CI/CD system specifically for model training.</li>
      <li>Key components such as data management, environment setup, model training pipeline, and version control.</li>
      <li>Emphasizes the benefits of automation, highlights cloud infrastructure support, and suggests specific tools for implementing a model training CI/CD system.</li>
    </ul>
  </li>
  <li><a href="https://cloud.google.com/solutions/application-deployment-and-testing-strategies">Application deployment and testing strategies (Google Cloud)</a>
    <ul>
      <li>Best Practices: 1) Backward compatibility 2) Continuous integration/continuous deployment (CI/CD) 3) Automation 4) Operating environments and configuration management. 5) Rollback strategy in case things go wrong 6) Post-deployment monitoring</li>
    </ul>
  </li>
  <li><a href="https://blog.tensorflow.org/2021/01/leveraging-tensorflow-tensorrt-integration.html">Leveraging TensorFlow-TensorRT integration for Low latency Inference</a>: TensorRT integration in TensorFlow allows developers to optimize and accelerate their deep learning models for deployment on GPUs. By leveraging TensorRT’s optimizations, TensorFlow users can achieve faster inference times and a reduced memory footprint. This integration provides a seamless workflow, enabling efficient deployment of TensorFlow models with improved performance for real-time applications.</li>
  <li><a href="https://wandb.ai/site/articles/bayesian-hyperparameter-optimization-a-primer">Bayesian hyperparameter tuning: Nuts &amp; bolts</a></li>
</ul>

<h2 id="statistical">Statistical</h2>
<ul>
  <li><a href="https://watermark.silverchair.com/55-6-1031.pdf?token=AQECAHi208BE49Ooan9kkhW_Ercy7Dm3ZL_9Cf3qfKAc485ysgAAA1swggNXBgkqhkiG9w0BBwagggNIMIIDRAIBADCCAz0GCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMLx9G1vXE640UVIehAgEQgIIDDuVcAcoVq_sfbFu8CGHwHlYnMN-tbQb0neED632xB-3FiQ0-kQhsE5g0NRFg8gwNFssmYzIEeO6_4MGcdwX-mJrjT7_lnDBmJ7eETkYOW97K0674mc4BoQ4cIKgknvRaj3fMPPjGe_GDkr2vp8kOrSKxN8sAvBbxqOaDLVky_fyIhdVpX5pxHkbVxl30U4iY3wUNwGl2tuBv0atdLbNbocwpTpWCiiIlm1JjSDpDkdOyvoc9qYYg7pCh_gV45jMbNAHUktEJ_0OTcqnJYcvVWsqwjnkljnplWseVUhe6IgHGzN6juC5KuuB5Jciql8WZ7QQ5V7ju0Dn5YaUqk2G9y7EjxrEwcElx7ye48ZI2PmgEoLwMOiHMfV9iaNRsA9tErunzo2O0aH5ZT3MvCQZzcthfhTcx7j-vsRFxNj2Xnvkymp36xSKmF-7xjIzq2N2QuibF9bzFeOlzW3aIXtDhjZE99EG2EQmqcggTpUFyiXftzh0hBwFHLkygWx2oUMyI0sSYt-02MYrR8G2f_FSLEtUzKTKJ-Efuth9nm-4hLOvciBQkY_SnTvHqlAXy4wENtdXalYb5GDHN_07a92wpTm6pTL4BlgZwCj4MnLMwfnfUUSrJR4L20cKNSClQqE7hy_msylSEw_zYveyN-Pp9Sh7U1RQpqy_12Bp_UIqP5IsXTjAhX3Kg2UvuMNcmIZGUvYBludcopl1LWIafj_30RBY3BjCmVFGitP-hqa_veAmPx8rq7uTmlDN8IEeWz7TqsJEOlC6Y7sehB5zeKJORfNyjtXfzO-ImwLlk6kQm6aoUSHlHHRKNns-j76Ezhx6MCZ-Yr03gBRO1bz4dImQK7Euvf3dSVOT87QLsRTNyH5pEFe94TS90348PTXluxGy4FAqndt_rxc2epL2at7bHwGD_h3jcQzlXcVyekFzdFyFL5dh8qKRD4IPK7D203hD8U4UK-XoOUalY269vFMYuaNES3oosBQ4sugYMU7yH_IpMuoIZVG6kEhDyJ7sEBKm--AD9ggUqkLkT2_TC3ll6">Bayesian stock assessment: a review and example application using the logistic model</a></li>
</ul>]]></content><author><name>Austin Powell</name></author><category term="2023" /><category term="Papers" /><summary type="html"><![CDATA[Technical articles, research papers, etc. read in 2023]]></summary></entry><entry><title type="html">MIT Fishery Monitoring With Computer Vision</title><link href="https://aus10powell.github.io/computer-vision/MITFishery/" rel="alternate" type="text/html" title="MIT Fishery Monitoring With Computer Vision" /><published>2022-02-16T00:00:00+00:00</published><updated>2022-02-16T00:00:00+00:00</updated><id>https://aus10powell.github.io/computer-vision/MITFishery</id><content type="html" xml:base="https://aus10powell.github.io/computer-vision/MITFishery/"><![CDATA[<p><strong>A grant-funded computer vision-based system to count fish under various conditions entering Massachusetts Fisheries.</strong></p>

<h1 id="mit-fishery">MIT Fishery</h1>
<p>I am currently working on an ongoing project to detect and count fish entering Massachusetts fisheries, funded by a grant from the federal government. I started working on this project in 2022, and I work individually with a research professor and MIT PhD students. I am responsible for taking the initial research project and making it operational: 1) making the model performant enough to count fish, 2) working within the research grant funding to set up appropriate infrastructure, and 3) deploying that infrastructure.</p>

<p><a href="https://youtu.be/3UxDNxzXF5U"><img src="/assets/images/mitfishery/annotated_counter.png" alt="Click to view video" /></a></p>

<h2 id="table-of-contents">Table of Contents</h2>
<ul id="markdown-toc">
  <li><a href="#mit-fishery" id="markdown-toc-mit-fishery">MIT Fishery</a>    <ul>
      <li><a href="#table-of-contents" id="markdown-toc-table-of-contents">Table of Contents</a></li>
      <li><a href="#qa-bot-ask-questions-about-the-repository" id="markdown-toc-qa-bot-ask-questions-about-the-repository">QA Bot: Ask questions about the repository:</a></li>
      <li><a href="#public-demo-of-dashboard" id="markdown-toc-public-demo-of-dashboard">Public Demo of dashboard</a></li>
      <li><a href="#overview" id="markdown-toc-overview">Overview</a></li>
      <li><a href="#object-detection" id="markdown-toc-object-detection">Object Detection:</a>        <ul>
          <li><a href="#performance" id="markdown-toc-performance">Performance</a>            <ul>
              <li><a href="#performance-comparison-for-different-years" id="markdown-toc-performance-comparison-for-different-years">Performance Comparison for Different Years</a></li>
              <li><a href="#overview-1" id="markdown-toc-overview-1">Overview</a></li>
              <li><a href="#metrics-detecting-and-tracking" id="markdown-toc-metrics-detecting-and-tracking">Metrics: Detecting and Tracking</a>                <ul>
                  <li><a href="#detecting" id="markdown-toc-detecting">Detecting</a></li>
                </ul>
              </li>
              <li><a href="#performance-metrics" id="markdown-toc-performance-metrics">Performance Metrics</a></li>
              <li><a href="#discussion" id="markdown-toc-discussion">Discussion</a></li>
            </ul>
          </li>
          <li><a href="#data-preparation" id="markdown-toc-data-preparation">Data Preparation</a>            <ul>
              <li><a href="#video-considerations" id="markdown-toc-video-considerations">Video Considerations</a></li>
              <li><a href="#due-to-the-extreme-varition-in-camera-quality-and-image-quality-the-following-types-of-image-augmentation-have-become-extremely-usefuly-in-the-quality-of-model" id="markdown-toc-due-to-the-extreme-varition-in-camera-quality-and-image-quality-the-following-types-of-image-augmentation-have-become-extremely-usefuly-in-the-quality-of-model">Due to the extreme variation in camera and image quality, the following types of image augmentation have proven extremely useful for model quality:</a>                <ul>
                  <li><a href="#example-image-augmentation" id="markdown-toc-example-image-augmentation">Example Image Augmentation</a></li>
                  <li><a href="#example-of-challenging-atmospheric-conditions-for-id-ing-of-fish-for-purposes-of-tracking" id="markdown-toc-example-of-challenging-atmospheric-conditions-for-id-ing-of-fish-for-purposes-of-tracking">Example of Challenging Atmospheric Conditions for ID-ing of Fish for Purposes of Tracking</a></li>
                </ul>
              </li>
              <li><a href="#example-of-the-need-for-more-data-when-counting-and-tracking" id="markdown-toc-example-of-the-need-for-more-data-when-counting-and-tracking">Example of the need for more data when counting AND tracking</a></li>
              <li><a href="#breakdown-of-different-challenges-with-data-quality" id="markdown-toc-breakdown-of-different-challenges-with-data-quality">Breakdown of different challenges with data quality:</a></li>
              <li><a href="#annotation-and-general-approach-notes-for-object-counting-in-video" id="markdown-toc-annotation-and-general-approach-notes-for-object-counting-in-video">Annotation and General Approach Notes for Object Counting in Video</a></li>
              <li><a href="#what-does-a-label-balanced-look-like-in-object-detection" id="markdown-toc-what-does-a-label-balanced-look-like-in-object-detection">What does a label “balanced” look like in object detection?</a></li>
              <li><a href="#dataset-selection-and-training" id="markdown-toc-dataset-selection-and-training">Dataset Selection and Training:</a>                <ul>
                  <li><a href="#holdout-set-for-validation" id="markdown-toc-holdout-set-for-validation">Holdout Set for Validation:</a></li>
                </ul>
              </li>
              <li><a href="#bayesian-optimization-with-wandbsweeps" id="markdown-toc-bayesian-optimization-with-wandbsweeps">Bayesian Optimization with wandb.sweeps:</a></li>
              <li><a href="#tracking" id="markdown-toc-tracking">Tracking</a></li>
            </ul>
          </li>
        </ul>
      </li>
    </ul>
  </li>
</ul>

<h2 id="qa-bot-ask-questions-about-the-repository">QA Bot: Ask questions about the repository:</h2>
<p><a href="https://huggingface.co/chat/conversation/66b2a6ee5141de599b9cb1c1">
    <img src="/assets/images/mitfishery/ChatbotImage.png" alt="QA Chatbot: Ask questions about this repository" width="350" />
</a></p>

<h2 id="public-demo-of-dashboard">Public Demo of dashboard</h2>
<div style="display: flex; justify-content: center;">
  <iframe src="https://aus10powell-mit-fishery-app.hf.space" width="1100" height="1050" display="block" scrolling="yes" frameborder="0" float="right"></iframe>
</div>

<h2 id="overview">Overview</h2>
<p>Fisheries populations have a large impact on the U.S. economy. Each year the U.S. fishing industry contributes 90 billion dollars and 1.5 million jobs to the U.S. economy. Each species may serve as a predator or prey for another. In this regard, fisheries populations are interconnected and dependent. While humans may depend on these populations as a source of sustenance (food, goods, etc.), humans can also negatively impact population growth. Barriers to migration, pollution, overfishing, and other forms of human-interference may impact spawning patterns of fisheries species. In 2014, 17% of U.S. fisheries were classified as overfished. Therefore, it is necessary to monitor these fisheries populations to determine when policy must be changed in efforts to maintain healthy oceans.</p>

<p>Many groups, including NOAA Fisheries, state agencies, as well as regional fisheries councils and local municipalities, deploy camera and video equipment to monitor fisheries populations. Large amounts of video and photographic data are gathered at timed intervals. However, not all photos contain aquatic life. Currently, employees at these agencies among others are responsible for manually annotating the gathered videos and photos; this means they identify and count the relevant aquatic specimens in the data. Not only is this an inefficient use of time and resources, but also it can lead to inaccurate results due to human error. NOAA Fisheries Management can make a significant improvement in time and resource use through automation of the annotation process.</p>

<p>Throughout the project, I have made significant progress in addressing these challenges and have achieved promising results. Here’s an overview of the key aspects and advancements made:</p>

<h2 id="object-detection">Object Detection:</h2>
<p>The initial challenge involved designing an accurate object detection algorithm specifically tailored for fish tracking. It required careful consideration of model architecture, hyperparameters, and dataset selection, refined through iterative experimentation and fine-tuning.</p>

<ul>
  <li>Original YOLO paper: <a href="https://arxiv.org/abs/1506.02640">You Only Look Once: Unified, Real-Time Object Detection</a></li>
</ul>

<h3 id="performance">Performance</h3>

<p>The primary metric used to assess object detection performance during training is mAP50-95 (mean average precision at IoU (Intersection over Union) thresholds from 0.5 to 0.95). I baselined the project at 0.31; the current 0.72 has become sufficient for tracking in videos, yielding a 0.28 MAPE (Mean Absolute Percentage Error) on counts. This doesn’t reflect the full picture of how well the algorithm tracks an object from one point to another; there is more discussion on this later.</p>

<h4 id="performance-comparison-for-different-years">Performance Comparison for Different Years</h4>

<h4 id="overview-1">Overview</h4>
<p>This section provides a comparison of the algorithm’s performance across different years (2016, 2017, and 2018) at one specific site. The performance metrics considered include <code class="language-plaintext highlighter-rouge">videos_missed</code>, <code class="language-plaintext highlighter-rouge">mape</code>, <code class="language-plaintext highlighter-rouge">total_percent_error</code>, <code class="language-plaintext highlighter-rouge">misscounts</code>, <code class="language-plaintext highlighter-rouge">total_herring</code>, <code class="language-plaintext highlighter-rouge">rmse</code>, <code class="language-plaintext highlighter-rouge">mae</code>, and <code class="language-plaintext highlighter-rouge">f1</code>.</p>

<h4 id="metrics-detecting-and-tracking">Metrics: Detecting and Tracking</h4>
<p>To track and count fish effectively, the fish must first be detected and classified in each frame, and the quality of those detections directly affects how well a fish can be tracked from frame to frame.</p>

<h5 id="detecting">Detecting</h5>

\[\text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\text{True Counts}_i - \text{Predicted Counts}_i}{\text{True Counts}_i} \right| \times 100\]

<p>This is a good overall metric that captures how well, across a series of video segments, the algorithm counts the objects. In my use-case, recording is triggered by movement (which can be caused by seaweed as well as fish), so each video capturing fish is ~30 seconds long. Because MAPE is a percentage error, missing the only fish in a video is penalized more heavily than counting 1 of 2 fish in another video. This matters since many fish swim in parallel, which adds to the difficulty of detection.</p>
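<p>As a quick illustration of this penalization, MAPE over per-video counts can be sketched in a few lines of Python (a minimal sketch, not the project’s evaluation code):</p>

```python
def mape(true_counts, pred_counts):
    """Mean Absolute Percentage Error over per-video fish counts.

    Videos with a true count of 0 are skipped, since percentage
    error is undefined there.
    """
    errors = [
        abs(t - p) / t
        for t, p in zip(true_counts, pred_counts)
        if t > 0
    ]
    return 100 * sum(errors) / len(errors)

# Missing the only fish in a video (1 -> 0) is a 100% error,
# while counting 1 of 2 fish in another video is only a 50% error.
```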

<h4 id="performance-metrics">Performance Metrics</h4>
<p>Here’s a breakdown of the performance metrics for each year:</p>

<table>
  <thead>
    <tr>
      <th>Year</th>
      <th>Videos Missed</th>
      <th>MAPE</th>
      <th>Total Percent Error</th>
      <th>Misscounts</th>
      <th>Total Herring</th>
      <th>RMSE</th>
      <th>MAE</th>
      <th>F1</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>2016</td>
      <td>4</td>
      <td>0.196667</td>
      <td>0.333333</td>
      <td>7</td>
      <td>21</td>
      <td>1.9</td>
      <td>0.7</td>
      <td>0.333333</td>
    </tr>
    <tr>
      <td>2017</td>
      <td>3</td>
      <td>0.09375</td>
      <td>0.12</td>
      <td>3</td>
      <td>25</td>
      <td>0.1875</td>
      <td>0.1875</td>
      <td>0.856459</td>
    </tr>
    <tr>
      <td>2018</td>
      <td>11</td>
      <td>0.166667</td>
      <td>0.12</td>
      <td>9</td>
      <td>75</td>
      <td>0.543478</td>
      <td>0.326087</td>
      <td>0.611905</td>
    </tr>
  </tbody>
</table>

<h4 id="discussion">Discussion</h4>
<p>The performance variance across years can be attributed to several factors, including variations in video quality, lighting conditions, and the presence of other objects that can reduce detection confidence in herring from frame to frame. These factors pose challenges for accurate fish detection and tracking, leading to differences in the algorithm’s effectiveness across years. The primary driver of better performance seems to be light quality that clearly defines the fish. Despite these challenges, the algorithm has demonstrated consistent improvement over time, as evidenced by the evolving metrics.</p>

<h3 id="data-preparation">Data Preparation</h3>

<h4 id="video-considerations">Video Considerations</h4>
<p>The ideal sample rate depends on how fast objects move. For slower videos, a lower rate might suffice. But to capture every detail of speedy objects, you’ll need to annotate more frames, ensuring you have enough data to calculate their velocity accurately.</p>

<h4 id="due-to-the-extreme-varition-in-camera-quality-and-image-quality-the-following-types-of-image-augmentation-have-become-extremely-usefuly-in-the-quality-of-model">Due to the extreme variation in camera and image quality, the following types of image augmentation have proven extremely useful for model quality:</h4>

<h5 id="example-image-augmentation">Example Image Augmentation</h5>
<p><img src="/assets/images/mitfishery/example_augmentation.png" alt="Click to view video" /></p>

<h5 id="example-of-challenging-atmospheric-conditions-for-id-ing-of-fish-for-purposes-of-tracking">Example of Challenging Atmospheric Conditions for ID-ing of Fish for Purposes of Tracking</h5>
<p><img src="/assets/images/mitfishery/murky_water_2018.png" alt="Click to view video" /></p>

<h4 id="example-of-the-need-for-more-data-when-counting-and-tracking">Example of the need for more data when counting AND tracking</h4>
<p><img src="/assets/images/mitfishery/two_fish_2018.png" alt="Click to view video" /></p>

<h4 id="breakdown-of-different-challenges-with-data-quality">Breakdown of different challenges with data quality:</h4>

<p>1) <strong>Illumination Challenge:</strong> In object tracking, illumination challenges arise when the lighting conditions change dramatically across frames or in different parts of the video. This variation in brightness, shadows, and highlights can affect the appearance of the object, making it difficult for the tracker to maintain accurate identification.</p>

<p><img src="/assets/images/mitfishery/illumination.jpg" alt="Illumination &gt;&lt;" /></p>

<p>2) <strong>Occlusion Challenge:</strong> Occlusion occurs when an object being tracked is partially or completely obscured by other objects, elements, or even itself. Occlusions can disrupt the object’s visibility, leading to potential misidentifications or temporary loss of tracking. This is particularly problematic here since we want to pick up each fish as soon as possible to establish a unique tracking ID.</p>

<p><img src="/assets/images/mitfishery/occlusion.jpg" alt="Occlusion &gt;&lt;" /></p>

<p>3) <strong>Deformation Challenge:</strong> Deformation challenges refer to situations where the object of interest undergoes significant shape changes over time. This could occur due to the object’s inherent flexibility or interactions with other objects. Tracking such deformable objects requires handling non-rigid transformations effectively.</p>

<p><img src="/assets/images/mitfishery/deformation.gif" alt="Out of Plane &gt;&lt;" /></p>

<p>4) <strong>Noise Corruption/Blurring Challenge:</strong> Noise in the image data can be caused by various factors, such as sensor limitations, compression artifacts, or environmental conditions. Noise corruption poses a challenge to object tracking by introducing unwanted fluctuations and reducing the clarity of the object’s features.</p>

<p><img src="/assets/images/mitfishery/noise_corruption_blurring.jpg" alt="Occlusion &gt;&lt;" /></p>

<p>5) <strong>Out of Plane Rotation Challenge:</strong> Out of plane rotation occurs when the object is viewed from different angles, leading to changes in its appearance and perspective across frames. Accurately tracking objects with out of plane rotations requires handling geometric transformations.</p>

<p><img src="/assets/images/mitfishery/out_of_plane.gif" alt="Out of Plane &gt;&lt;" /></p>

<p>6) <strong>Motion Blurring Challenge:</strong> Motion blur is caused by the relative motion between the camera and the object during exposure, resulting in smearing of the object’s appearance. Tracking objects affected by motion blur requires robustness to handle the loss of fine details and sharpness in the image.</p>

<p><img src="/assets/images/mitfishery/motion_blurring.jpg" alt="Motion Blurring &gt;&lt;" /></p>

<h4 id="annotation-and-general-approach-notes-for-object-counting-in-video">Annotation and General Approach Notes for Object Counting in Video</h4>
<ul>
  <li>When splitting images between train/val/test, split by the complete journey of an object across the screen. I.e., put all the images of one fish traveling across the screen into train, then a separate fish’s journey into val.</li>
  <li>Set the confidence threshold of your YOLO inference at the confidence that maximizes your F1 score, e.g. a max of 98% @ confidence = 0.32. Although maximizing recall is tempting, the F1 score is more directly correlated with tracking a fish across the screen.</li>
  <li>Aim for ~1,000 instances of each object class that is to be tracked.</li>
</ul>
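<p>Choosing the confidence cut-off that maximizes F1 can be sketched as a simple search over candidate thresholds; the precision/recall values here would come from a validation sweep, and the numbers below are illustrative:</p>

```python
def best_confidence_threshold(thresholds, precisions, recalls):
    """Pick the detector confidence cut-off that maximizes F1.

    Inputs are parallel lists: precision and recall evaluated at
    each candidate confidence threshold on a validation set.
    """
    best_t, best_f1 = None, -1.0
    for t, p, r in zip(thresholds, precisions, recalls):
        # Harmonic mean of precision and recall (0 if both are 0).
        f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```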

<h4 id="what-does-a-label-balanced-look-like-in-object-detection">What does a label “balanced” look like in object detection?</h4>
<p>In this context, “balanced” doesn’t refer to the visual appearance of a label itself. Instead, it describes the distribution of labels within a dataset used to train an object detection model. A balanced dataset ensures that all object categories have roughly the same number of examples. This is crucial for optimal model performance, as an unbalanced dataset can lead the model to prioritize the frequently occurring categories and struggle to detect less frequent ones. Ideally, a “balanced” label distribution would have a similar number of labels for each object class, creating a level playing field for the model to learn from all categories effectively.</p>

<p><img src="/assets/images/mitfishery/2d_distribution.png" alt="reasonable balance &gt;&lt;" /></p>
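<p>As a rough illustration, a per-class tally over annotation records is enough to spot imbalance. The record format below is a simplified assumption, not the project’s actual annotation schema:</p>

```python
from collections import Counter

def label_distribution(annotations):
    """Tally per-class label counts from a list of annotation records.

    Assumes each record is a dict with a "class" key (a simplified,
    hypothetical annotation format).
    """
    counts = Counter(a["class"] for a in annotations)
    total = sum(counts.values())
    # Return both the raw count and the fraction per class.
    return {cls: (n, n / total) for cls, n in counts.items()}
```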

<h4 id="dataset-selection-and-training">Dataset Selection and Training:</h4>
<p>“garbage in, garbage out.” So, I spent a good amount of time curating a diverse dataset with annotated fish images and videos. Lighting conditions, backgrounds, and different fish species etc. By training the model on a curated dataset, I witnessed significant improvements in detection performance.</p>

<h5 id="holdout-set-for-validation">Holdout Set for Validation:</h5>
<p>To make sure our counting game is on point, I set aside a holdout set of videos with ground truth fish counts. It’s like having a benchmark to compare against. I tested the algorithm’s count predictions against the ground truth, giving us valuable insights into its accuracy and effectiveness. No fishy business here!</p>

<h4 id="bayesian-optimization-with-wandbsweeps">Bayesian Optimization with wandb.sweeps:</h4>
<p>I gained enough confidence to narrow down my parameter search space. With the help of <a href="https://docs.wandb.ai/guides/sweeps">Wandb.Sweeps’</a> Bayesian optimization capabilities, I let the algorithm do its magic overnight. It efficiently explored the parameter space and brought me some impressive results. It’s like having a super-smart assistant working while I catch some zzz’s.</p>
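<p>A Bayesian sweep in wandb is driven by a small configuration object; the sketch below shows the general shape, with hypothetical hyperparameter names and ranges rather than the project’s actual search space:</p>

```python
# Illustrative wandb sweep configuration for Bayesian optimization.
# The hyperparameters and ranges below are hypothetical examples,
# not the project's actual search space.
sweep_config = {
    "method": "bayes",  # Bayesian optimization over the parameter space
    "metric": {"name": "mAP50-95", "goal": "maximize"},
    "parameters": {
        "lr0": {"min": 1e-5, "max": 1e-2},       # initial learning rate
        "momentum": {"min": 0.8, "max": 0.98},
        "batch_size": {"values": [8, 16, 32]},
    },
}

# Launching the sweep (requires wandb and a `train` function
# that logs the target metric):
# sweep_id = wandb.sweep(sweep_config, project="mit-fishery")
# wandb.agent(sweep_id, function=train)
```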

<h4 id="tracking">Tracking</h4>

<p>The <a href="https://arxiv.org/abs/2206.14651">“botsort” algorithm</a> leverages motion patterns to estimate object displacement, matches appearances to maintain consistency across frames, and predicts future positions based on historical trajectory data. This comprehensive approach has significantly improved tracking accuracy and facilitated reliable fish counting.</p>

<p>It was selected because it had the best MOTA (Multiple Object Tracking Accuracy) that considers identity switches, false positives, and false negatives.</p>
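<p>As a minimal, illustrative sketch, MOTA can be computed from per-frame error tallies like so:</p>

```python
def mota(false_positives, false_negatives, id_switches, ground_truth_counts):
    """Multiple Object Tracking Accuracy.

    Each argument is a per-frame list of counts; ground_truth_counts
    is the number of true objects present in each frame.
    """
    errors = sum(
        fp + fn + idsw
        for fp, fn, idsw in zip(false_positives, false_negatives, id_switches)
    )
    return 1 - errors / sum(ground_truth_counts)
```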

\[\text{MOTA} = 1 - \frac{\frac{1}{n} \sum_{i=1}^{n}(\text{False Positives}_i + \text{False Negatives}_i + \text{Identity Switches}_i)}{\frac{1}{n} \sum_{i=1}^{n}(\text{Total Number of Ground Truth Objects}_i)}\]]]></content><author><name>Austin Powell</name></author><category term="Computer-Vision" /><summary type="html"><![CDATA[A computer vision-based system to count fish under various conditions entering Massachusetts Fisheries.]]></summary></entry><entry><title type="html">PyBay 2020 Lightning Talk on Predicting NoShows</title><link href="https://aus10powell.github.io/talks/ml/healthcare/pybay-lightning-talk/" rel="alternate" type="text/html" title="PyBay 2020 Lightning Talk on Predicting NoShows" /><published>2020-08-16T00:00:00+00:00</published><updated>2020-08-16T00:00:00+00:00</updated><id>https://aus10powell.github.io/talks/ml/healthcare/pybay-lightning-talk</id><content type="html" xml:base="https://aus10powell.github.io/talks/ml/healthcare/pybay-lightning-talk/"><![CDATA[<p><img src="/assets/images/predictingNoShows/PyBayNoShows.png" alt="PyBayNoShows" />
<a href="/assets/images/predictingNoShows/PyBay NoShows.pdf">Download PDF</a>
<!-- <a href="/assets/images/predictingNoShows/PyBayNoShows.pdf" target="/assets/images/predictingNoShows/PyBayNoShows.png">PDF.</a> --></p>]]></content><author><name>Austin Powell</name></author><category term="Talks" /><category term="ML" /><category term="Healthcare" /><summary type="html"><![CDATA[An overview of common industry challenge solved by machine learning]]></summary></entry><entry><title type="html">Poizon Plants</title><link href="https://aus10powell.github.io/computer/vision/ai/machine-learning/poison-oak/plants/poizon-plants/" rel="alternate" type="text/html" title="Poizon Plants" /><published>2020-05-26T00:00:00+00:00</published><updated>2020-05-26T00:00:00+00:00</updated><id>https://aus10powell.github.io/computer/vision/ai/machine-learning/poison-oak/plants/poizon-plants</id><content type="html" xml:base="https://aus10powell.github.io/computer/vision/ai/machine-learning/poison-oak/plants/poizon-plants/"><![CDATA[<p><strong>A practical iOS app identifying poison oak.</strong></p>
<h1 id="poizon-plants">Poizon Plants</h1>
<h2 id="table-of-contents">Table of Contents</h2>
<ul id="markdown-toc">
  <li><a href="#poizon-plants" id="markdown-toc-poizon-plants">Poizon Plants</a>    <ul>
      <li><a href="#table-of-contents" id="markdown-toc-table-of-contents">Table of Contents</a>        <ul>
          <li><a href="#intromotivation" id="markdown-toc-intromotivation">Intro/Motivation</a></li>
          <li><a href="#cold-start-problem" id="markdown-toc-cold-start-problem">Cold-Start Problem</a>            <ul>
              <li><a href="#object-detections-vs-classification" id="markdown-toc-object-detections-vs-classification">Object Detections Vs Classification</a></li>
              <li><a href="#labeling-process" id="markdown-toc-labeling-process">Labeling Process</a></li>
              <li><a href="#choice-of-cut-off-probability" id="markdown-toc-choice-of-cut-off-probability">Choice of Cut-off Probability</a></li>
            </ul>
          </li>
        </ul>
      </li>
      <li><a href="#ios-app" id="markdown-toc-ios-app">iOS App</a></li>
      <li><a href="#modeling" id="markdown-toc-modeling">Modeling</a>        <ul>
          <li><a href="#training" id="markdown-toc-training">Training</a></li>
        </ul>
      </li>
      <li><a href="#tayloring-model-for-mobile-platform-converting-tensorflow-to-tf-lite" id="markdown-toc-tayloring-model-for-mobile-platform-converting-tensorflow-to-tf-lite">Tayloring model for mobile platform (Converting Tensorflow to TF-Lite)</a>        <ul>
          <li><a href="#convert-keras-trained-model-to-tf-lite" id="markdown-toc-convert-keras-trained-model-to-tf-lite">Convert Keras trained model to TF-lite</a></li>
          <li><a href="#save-model-for-use-in-ios" id="markdown-toc-save-model-for-use-in-ios">Save model for use in iOS</a></li>
        </ul>
      </li>
      <li><a href="#other-training-lessons-learned" id="markdown-toc-other-training-lessons-learned">Other Training Lessons Learned</a></li>
      <li><a href="#possible-next-steps" id="markdown-toc-possible-next-steps">Possible Next Steps:</a></li>
      <li><a href="#conclusion" id="markdown-toc-conclusion">Conclusion</a>        <ul>
          <li><a href="#references" id="markdown-toc-references">References</a></li>
          <li><a href="#papers-i-checked-out" id="markdown-toc-papers-i-checked-out">Papers I checked out</a></li>
        </ul>
      </li>
    </ul>
  </li>
</ul>

<style>
    .iframe-container {
        float: right;
        margin-right: 10px;
        margin-left: 10px;
    }
</style>

<div class="iframe-container">
  <iframe src="https://aus10powell-poizon-app.hf.space" width="800" height="1500" display="block" scrolling="yes" frameborder="0"></iframe>
</div>

<p><img src="/assets/images/poizon_plants/poizon_plants_app.jpg" alt="image" style="float: left; margin-right: 1em;" /></p>

<h3 id="intromotivation">Intro/Motivation</h3>
<p>I have gotten poison oak multiple times. While exposure to the oil found on the plant will not cause an allergic reaction for everyone, the fact remains it’s not fun. Also, it should be noted that while some people definitely do not have an allergic reaction, there is no proof that you won’t develop an allergic reaction over time with continued exposure, according to the American Osteopathic College of Dermatology and other sources. Basically, no one is really safe.</p>

<p>Of course, if you don’t live on the West Coast, all of this might not matter too much. While <a href="https://en.wikipedia.org/wiki/Urushiol">Urushiol</a> (the oil on poison oak causing the allergic reaction) can be found in plants all over the world, it seems to really love the North American West Coast. Those of us living in California or Oregon for more than a few years are likely familiar with, or at least have heard of, poison oak. But even people native to California (much less one of the millions of tourists) have trouble identifying the plant if it’s not in its signature glowing, oily red. It seems like a good use-case for computer vision.</p>

<p>Given that mobile phones are ubiquitous even when out enjoying nature, creating a poison oak app seemed like a useful project and learning opportunity.</p>

<p>Side note: I had no interest in duplicating effort for something that already had a solution, so I did a little research to see if there were any existing solutions for this niche personal challenge. Interestingly enough, there were already a few apps on the iOS store that were simple classification apps like the one I proposed.</p>

<h3 id="cold-start-problem">Cold-Start Problem</h3>
<p><img src="/assets/images/poizon_plants/classic_red_poison_oak.jpg" alt="image" title="Classic Red Poison Oak Bush" style="float: right; margin-right: 1em;" /></p>

<p>As with any supervised learning challenge, it is critical to get a good source of labeled data. Perhaps the obvious place to start for poison oak was Google Images. Using a Chrome image-downloader extension (<a href="https://chrome.google.com/webstore/detail/imageye-image-downloader/agionbommeaifngbhincahgmoflcikhm?hl=en">here</a>), I was able to download large quantities of images.</p>

<p>Semantics of the search were somewhat important (“poison oak”, “poison oak bush”, “poison oak autumn”, etc.), so I focused on searches that captured images of the plant across different seasons, the idea being to capture as much of poison oak’s heterogeneity as possible for users of the app. Unsurprisingly, this is the first place the data revealed its bias: the season most often captured seemed to be when the plant is at its most obvious, red or orangish-red starting at the end of summer. But that only covers poison oak during the latter half of the year, and not necessarily when it’s at its oiliest.</p>

<p>For the “not poison” images, I tried to cover a broad flora that might exist geographically with poison oak and especially on plants whose leaves I thought would confuse the average hiker. This initial dataset netted me <strong>about 3k images</strong> in total after cleaning out the expected garbage images (cartoons images, stock photo images with watermarks, etc.). Since transfer learning is available, this was a great start and stage to baseline model performance.</p>

<p>To get a preliminary read on the signal present in the dataset, I trained a ResNet50 on an 80/20 random split of the data. With minor tuning, I achieved a validation accuracy of over 98%. However, this result appeared suspiciously ideal and indicated strong bias in the dataset: I suspect the images obtained from Google Images had themselves been surfaced by a model similar to the one I used for training, with minor differences in data cleaning and algorithms being all that kept the model trained on the Google Images short of 100% accuracy.</p>
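<p>The 80/20 random split itself is straightforward; here is a minimal, reproducible sketch using only the standard library (the seed value is arbitrary):</p>

```python
import random

def train_val_split(image_paths, val_fraction=0.2, seed=42):
    """Random train/validation split over image file paths.

    A seeded shuffle keeps the split reproducible across runs.
    Returns (train_paths, val_paths).
    """
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_val = int(len(paths) * val_fraction)
    return paths[n_val:], paths[:n_val]
```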

<p>Of course, the only solution to poor data is to get more data, ideally data that captures more of the variety in the use-case you’re modeling for. The benefit of acquiring data this way (collecting and labeling my own photos) is that I would be:</p>
<ul>
  <li>1) Labeling on the go and would have more intuition what my model was and wasn’t understanding well.</li>
  <li>2) Refining the concept-space of what constituted a reasonable recognition of an image containing poison oak since this could include multiple plants.</li>
  <li>3) Have a reason to go on more hikes.</li>
</ul>

<h4 id="object-detections-vs-classification">Object Detection vs. Classification</h4>
<p>The question arises as to why we don’t treat this strictly as an object detection problem. In an ideal scenario, you would draw a bounding box around the identified leaf or plant. However, the project is anchored in a practical use-case: an individual with a cell phone who wants to keep their distance from the plant in question. As they incrementally approach, their certainty about the classification converges.</p>

<h4 id="labeling-process">Labeling Process</h4>

<p>There is a growing demand for quality labeling services. Perhaps the best known is <a href="https://www.mturk.com">Mechanical Turk</a>, which provides relatively inexpensive labeling. The challenge there is finding domain experts; in my case it would most likely have meant relying on non-Californians to decide whether a plant was poison oak. For an in-house approach, services such as <a href="https://labelbox.com">LabelBox</a> and <a href="https://spacy.io/universe/project/prodigy/">spaCy Prodigy</a> greatly improve the efficiency of human labeling. While I’ve had great experiences with Prodigy for NLP projects, LabelBox provides limited free access, so I used their product. Regardless, it’s a tedious process.</p>

<p>Again, I hand-labeled 8,000 images that I took with my iPhone 7 Plus!</p>

<h4 id="choice-of-cut-off-probability">Choice of Cut-off Probability</h4>
<p>This is a choice I’ve rarely seen discussed for binary classification problems (at least in workshops/tutorials/books), yet it can have great practical implications for the end-user. Generally, the cut-off for a binary classifier is assumed to be 50/50 when deciding whether to bin the output of the softmax as a 1 or a 0. Based on feedback from the different people I had test the app, I chose instead to present a degree of certainty that a given image was poison oak. Binning the probabilities into categories such as “possibly poison oak”, “definitely poison oak”, etc. gave the user better intuition than a 60% vs. an 80% probability.</p>
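<p>A minimal sketch of that binning, assuming the model emits a single P(poison) from the softmax (the thresholds and label strings here are illustrative, not the exact ones shipped in the app):</p>

```python
def certainty_label(p_poison):
    """Map a softmax probability to a user-facing certainty bucket."""
    if p_poison >= 0.9:
        return "definitely poison oak"
    elif p_poison >= 0.6:
        return "possibly poison oak"
    elif p_poison >= 0.4:
        return "unsure"
    else:
        return "probably not poison oak"
```

<p>The point is that a hiker acts the same on 92% as on 99%, so collapsing those into one bucket loses nothing while making the output far easier to act on.</p>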

<figure class="half">
<a href="/assets/images/poizon_plants/IMG_2268.jpg"><img src="/assets/images/poizon_plants/probability_cutoff_not_poison.jpg" style="width:100%;height:90%" /></a>
<a href="/assets/images/poizon_plants/IMG_3408.jpg"><img src="/assets/images/poizon_plants/probability_cutoff_is_poison.jpg" style="width:100%;height:90%" /></a>
<figcaption>Probability Cut-off.</figcaption>
</figure>

<h2 id="ios-app">iOS App</h2>
<ul>
  <li><strong>App Starting point:</strong> <a href="https://developer.apple.com/documentation/createml/creating_an_image_classifier_model">Creating an image classifier</a></li>
  <li><strong>Adds on app:</strong> <a href="https://firebase.google.com/docs/admob">Admob and Firebase</a></li>
</ul>

<h2 id="modeling">Modeling</h2>
<p>While initially baselining the model was quite easy with Apple’s CoreML, it quickly became obvious that the developer-friendly approach to ML wasn’t going to cut it for my self-assigned project. It was a great generalizer and got me from 0 to 60 super quick.</p>

<h3 id="training">Training</h3>
<ul>
  <li><strong>Model Tracking:</strong> Utilization of Weights &amp; Biases (<a href="https://www.wandb.com/">wandb.com</a>) was very useful when iterating over different models. It even has an image viewer so that you can peek at how your model is predicting example images while it is training. <a href="https://www.wandb.com/articles/introduction-hyperparameter-sweeps">Wandb Hyper-parameter Sweeps</a> was also very helpful in reducing manual rerunning of hyperparameters.</li>
</ul>

<figure class="half">
<a href="/assets/images/poizon_plants/wb_terminal.png"><img src="/assets/images/poizon_plants/wb_terminal.png" style="width:100%;height:90%" /></a>
<a href="/assets/images/poizon_plants/wb_dashboard.png"><img src="/assets/images/poizon_plants/wb_dashboard.png" style="width:100%;height:90%" /></a>
    <figcaption>Contextual image information may often be a confounder.</figcaption>
</figure>

<ul>
  <li><strong>Choice of architecture:</strong>
Because I wanted a very controlled way of dealing with outliers (e.g. more distant pictures of a bush of poison oak), it was necessary to use both dropout and L2 regularization (<a href="http://www.chioka.in/differences-between-l1-and-l2-as-loss-function-and-regularization/">here is a good blog post on the topic</a>). L2 was a reasonable choice for regularization since I wanted to smooth more for those outliers (as opposed to L1, which does not penalize large weights as strongly). The reason for this is in the figure below:
    <ul>
      <li>We don’t particularly care whether this image has a probability of 92% vs. 99%.</li>
      <li>We DO care whether THIS image has a probability of 60% vs. 5%.</li>
    </ul>
  </li>
</ul>
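<p>The different treatment of large outlier-driven weights by the two penalties can be seen directly in a framework-free sketch (the weight values are made up for illustration):</p>

```python
def l1_penalty(weights, lam=1.0):
    """L1 regularization term: lambda * sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam=1.0):
    """L2 regularization term: lambda * sum of squared weights."""
    return lam * sum(w * w for w in weights)

small = [0.1, 0.1, 0.1]
outlier = [0.1, 0.1, 3.0]  # one large weight, e.g. driven by an outlier image

# L2 penalizes the large weight quadratically, L1 only linearly,
# so L2 pushes back much harder on weights blown up by outliers.
```

<p>This is exactly the smoothing behavior wanted here: the quadratic penalty discourages the network from letting a few unusual training images dominate any single weight.</p>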

<h2 id="tayloring-model-for-mobile-platform-converting-tensorflow-to-tf-lite">Tailoring the model for a mobile platform (Converting TensorFlow to TF Lite)</h2>
<ul>
  <li><strong>Why:</strong>
    <ul>
      <li>CoreML is not as well supported (documentation, etc.) for complex ML training</li>
      <li>TF-Lite can also easily be used on Android devices (or pretty much anywhere else, for that matter)</li>
      <li>TF Lite offers several optimizations:
        <ul>
          <li>Default</li>
          <li>Latency</li>
          <li>Size</li>
        </ul>
However, in my testing (TF v2.3.0) there was a significant (~5%) drop in F1 when optimizing for Latency or Size, and not a <u>significant change in user experience</u>.</li>
    </ul>
  </li>
</ul>

<h3 id="convert-keras-trained-model-to-tf-lite">Convert Keras-trained model to TF Lite</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># TF: v2.3.0
# Python: v3.8.2

import tensorflow as tf
from tensorflow.keras.models import load_model

# Load model (I had saved as a .model file and it worked fine)
model = load_model(model_path, custom_objects=dependencies, compile=True)

converter = tf.lite.TFLiteConverter.from_keras_model(model)

# Set to True to optimize the model for latency
optimize = False
if optimize:
    print('[INFO] Optimizing TF Lite model')
    converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_LATENCY]
tflite_model = converter.convert()
</code></pre></div></div>
<p>In my testing, the optimized models did not significantly change the user experience (even on my dated iPhone 8 Plus). The F1-score did drop a few points, however, and that was enough for me to stick with the default settings in TF Lite. I assume that on much smaller IoT devices, latency and size would greatly outweigh the relatively small metric gains of the default.</p>

<h3 id="save-model-for-use-in-ios">Save model for use in iOS</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">pathlib</span>

<span class="c1"># Write model to file
</span><span class="n">tflite_model_file</span> <span class="o">=</span> <span class="n">pathlib</span><span class="p">.</span><span class="n">Path</span><span class="p">(</span><span class="s">'./poison_not_poison.tflite'</span><span class="p">)</span>
<span class="n">tflite_model_file</span><span class="p">.</span><span class="n">write_bytes</span><span class="p">(</span><span class="n">tflite_model</span><span class="p">)</span>

<span class="c1"># Write labels to file
</span><span class="n">labels</span> <span class="o">=</span> <span class="p">[</span><span class="s">'is_poison'</span><span class="p">,</span><span class="s">'not_poison'</span><span class="p">]</span>

<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">'poison_labels.txt'</span><span class="p">,</span><span class="s">'w'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="n">f</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">labels</span><span class="p">))</span>
</code></pre></div></div>

<div style="text-align:center"><img src="/assets/images/poizon_plants/xcode_model.png" /><figcaption>Drop model in iOS app</figcaption></div>

<h2 id="other-training-lessons-learned">Other Training Lessons Learned</h2>
<ul>
  <li>
    <p>Initially, I made all layers trainable, which resulted in an extremely spiky but generally decreasing validation loss, and only after a huge number of epochs (250). After running some experiments, I found a much steadier decrease in validation loss when training only the last 20 layers, which gave a smoother convergence and higher overall accuracy. I also experimented with different optimizers, including Adam and SGD, and found that Adam yielded the best results. In addition, I used data augmentation techniques such as random rotations and flips to further improve the model’s robustness. Finally, I fine-tuned hyperparameters such as the learning rate and batch size to achieve the best possible performance.</p>
  </li>
  <li>
    <p><strong>Choice of object detection vs classification task</strong>
When thinking about how to make the app practical for hikers vs. what was practical for modeling, there seemed to be two choices: an object detection task or a classification task.</p>
  </li>
</ul>

<div style="text-align:center"><img src="/assets/images/poizon_plants/object_dection_bush.jpg" /><figcaption>Practically speaking, it seems more useful that a person would pull out their phone and sweep the camera over a range of plants. Looking ahead, this would involve a lot of manual labeling of bounding boxes and too much work. The project could always be easily extended to include this scope also. So classification it was.</figcaption></div>

<ul>
  <li>
    <p><strong>Getting to &gt; 90% F1 score</strong>
This goal was set both for practical reasons (having a classifier app that is a satisfying product) and as a mental milestone for “I understand training an image classifier”. There was a lot of learning to be had between the 87% F1 score I struggled with for a while and the &gt;90% I finally achieved.</p>
  </li>
  <li>
    <p>It was particularly important to optimize for the 299x299 image size used by iOS during model development, for performance, compatibility with the iOS platform, and consistent testing.</p>
  </li>
  <li>
    <p><strong>Focus on your highest-loss images to understand true performance</strong> 
  This was useful both for cleaning bad/poor quality images as well as finding areas where potentially you might need to break out a new class, e.g. “Unknown bush”</p>
  </li>
  <li><strong>Think your data is well-labeled? Think again…and again:</strong>
    <ul>
      <li>Despite having hand-labeled thousands of images, examining the images with the top log-loss revealed my own human error. I think one reason is that during labeling I had additional context for “yes, this is poison oak”, such as having just walked past a large bush of it. Looking only at the photo with no additional context, which is all the neural net gets, it was often not clear even to my human eye.</li>
    </ul>
  </li>
  <li><strong>How do you know when your data is enough?</strong>
For the practicality of my problem, where a region captured with a phone camera “may contain poison oak”, it was not immediately clear whether I had taken pictures with sufficient variation. This issue first surfaced when a reddish plant that was not poison oak caused trouble for the algorithm. It presented a bit of a challenge because, while red is a strong indicator (for both humans and neural nets) for identifying sumac plants, it is not a definitive rule for identifying poison oak.
    <figure class="half">
  <a href="/assets/images/poizon_plants/IMG_2268.jpg"><img src="/assets/images/poizon_plants/IMG_2268.jpg" style="width:100%;height:90%" /></a>
  <a href="/assets/images/poizon_plants/IMG_3408.jpg"><img src="/assets/images/poizon_plants/IMG_3408.jpg" style="width:100%;height:90%" /></a>
  <figcaption>Strong red color can be indicative of poison oak but also a strong false positive.</figcaption>
  </figure>
  </li>
  <li>
    <p><strong>Training</strong>
Augmentation is quite helpful for this use-case, shift and rotation especially. But I hit a point where my validation accuracy would plateau around 70%: it wouldn’t get better, but it wouldn’t get worse either. This was due to the rookie mistake of augmenting the validation data as well as the training data.</p>
  </li>
  <li><strong>A fair comparison:</strong>
    <figure class="half">
  <a href="/assets/images/poizon_plants/IMG_4273..jpg"><img src="/assets/images/poizon_plants/IMG_4273.jpg" style="width:100%;height:90%" /></a>
  <a href="/assets/images/poizon_plants/IMG_IMG_4274.jpg"><img src="/assets/images/poizon_plants/IMG_4274.jpg" style="width:100%;height:90%" /></a>
   <figcaption>Contextual image information may often be a confounder.</figcaption>
  </figure>
  </li>
</ul>
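<p>The “highest-loss first” review from the lessons above is easy to script: compute the cross-entropy of each prediction against its label and inspect the worst offenders first. A sketch with made-up file names and probabilities:</p>

```python
import math

def log_loss(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy for a single example."""
    p = min(max(p_pred, eps), 1 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# (filename, true label, predicted P(poison)) -- illustrative values
preds = [
    ("oak_1.jpg", 1, 0.97),
    ("oak_2.jpg", 1, 0.05),   # confident miss: review this one first
    ("fern_1.jpg", 0, 0.40),
]
worst_first = sorted(preds, key=lambda r: log_loss(r[1], r[2]), reverse=True)
```

<p>Scanning the top of this list is what surfaced both mislabeled images and candidates for a new class like “unknown bush”.</p>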

<h2 id="possible-next-steps">Possible Next Steps:</h2>
<p>Model experimentation with better feature extraction to take better advantage of the obvious leaf colors (edge detection, color space conversion, texture analysis, etc.). Different model types such as a Probabilistic Neural Network (PNN) could accommodate a level of uncertainty among densely populated leaves, in addition to being faster, which is always an advantage on edge devices. PNNs would be particularly useful for classifying many different species of plants, where the number of plants can potentially be very large and dense.</p>

<h2 id="conclusion">Conclusion</h2>
<p>If you’ve gotten this far: thanks for reading!</p>

<p>Several training lessons were learned during the development of the Poizon Plant iOS app. I discovered that training only the last 20 layers and using the Adam optimizer led to a smoother convergence and higher overall accuracy. Data augmentation techniques, such as random rotations and flips, were used to further improve the model’s robustness. In addition, the hyperparameters of the model were fine-tuned to achieve the best possible performance. The choice of a classification task over object detection was made for practical reasons, and a goal of achieving over 90% F1 score was set. I found that focusing on high-loss images helped to understand true performance, and careful attention was given to labeling the data properly. Also, the use of GPUs was important to maintain the momentum of the project. Finally, a fair comparison between images was essential to avoid contextual information acting as a confounder. Overall, these lessons demonstrate the importance of careful consideration and experimentation during the development of deep learning models.</p>

<p>I deployed the model using TensorFlow Lite, a mobile framework for running machine learning models on mobile devices. To make the model even more efficient on mobile, I used the MobileNetV2 architecture, which is designed specifically for mobile devices and has a smaller memory footprint than other CNN architectures. With the model deployed and optimized for mobile, it was ready for use in real-world applications.</p>

<h3 id="references">References</h3>
<ul>
  <li><strong><a href="https://apps.apple.com/us/app/poizon-plant/id1475980295" title="Link to iOS App">Poizon Plant iOS app</a></strong> (Note: the app being available to download is contingent upon free ads covering the registration cost of iOS apps. If the link is broken, it’s likely because there wasn’t enough ad revenue)</li>
</ul>

<h3 id="papers-i-checked-out">Papers I checked out</h3>
<ul>
  <li><a href="https://ieeexplore.ieee.org/abstract/document/4458016">A Leaf Recognition Algorithm for Plant Classification Using Probabilistic Neural Network</a></li>
  <li><a href="https://arxiv.org/abs/2003.05379">Plant Disease Detection Using Deep learning</a></li>
  <li><a href="https://arxiv.org/pdf/2004.12599.pdf">Deploying Image Deblurring across Mobile Devices: A Perspective of Quality
and Latency</a></li>
</ul>]]></content><author><name>Austin Powell</name></author><category term="Computer" /><category term="Vision" /><category term="AI" /><category term="Machine-Learning" /><category term="Poison-Oak" /><category term="Plants" /><summary type="html"><![CDATA[A computer vision classifier to make identifying poison oak practical.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://aus10powell.github.io/assets/images/poizon_plants/poizon_plants_app.jpg" /><media:content medium="image" url="https://aus10powell.github.io/assets/images/poizon_plants/poizon_plants_app.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Word Embedding Comparisons for Disease Name Entity Recognition</title><link href="https://aus10powell.github.io/general/nlp-word-embeddings/" rel="alternate" type="text/html" title="Word Embedding Comparisons for Disease Name Entity Recognition" /><published>2018-10-31T00:00:00+00:00</published><updated>2018-10-31T00:00:00+00:00</updated><id>https://aus10powell.github.io/general/nlp-word-embeddings</id><content type="html" xml:base="https://aus10powell.github.io/general/nlp-word-embeddings/"><![CDATA[<h2 id="intro-and-overview">Intro and Overview</h2>
<p>Named Entity Recognition (NER) is a crucial task in natural language processing, particularly in the field of biomedical research. With the rise of ELMo embeddings, I was curious to see how they stacked up against other popular word embedding techniques for a DNER task. Accurate identification and extraction of disease entities from medical texts is necessary for several applications, such as clinical decision support systems, drug discovery, and epidemiological studies. It can help researchers and healthcare professionals quickly identify relevant information from a vast amount of medical literature and electronic health records, leading to better patient outcomes and more efficient medical research. In this quick comparison, I explore the performance of different word embedding techniques for the DNER task, with a focus on the much-touted ELMo embeddings.</p>

<p>The motivation for comparing word embeddings for Named Entity Recognition (NER) with diseases and adverse conditions stems from the recent popularity of ELMo embeddings in health-related natural language processing tasks. ELMo embeddings have shown promising results in capturing context-specific information and could potentially enhance the performance of NER models in the biomedical domain. Therefore, I aimed to investigate the suitability of ELMo embeddings alongside other popular techniques, such as word2vec, GloVe, and fastText, for DNER tasks.</p>

<p>ELMo embeddings, developed by researchers at Allen Institute for AI, are a breakthrough in the field of natural language processing. These embeddings are unique because they are contextualized: they capture different shades of meaning of a word based on its surrounding context. This is a significant improvement over traditional word embeddings, which treat each word as a static entity independent of its context. The ELMo model is based on a deep, bi-directional language model that learns to predict the next word in a sentence given both its preceding and following words. By doing so, it captures a rich representation of the word’s meaning, informed by both its syntactic and semantic context. The resulting embeddings are highly effective in a wide range of natural language processing tasks, including named entity recognition, sentiment analysis, and question answering.</p>

<p>I compared different word embedding techniques for the Disease Named Entity Recognition (DNER) task, evaluating ELMo against three popular methods (word2vec, GloVe, and fastText) on a biomedical text corpus.</p>

<p>The experiments were conducted on two datasets with distinct characteristics: one consisting of clinical notes from hospital settings, and the other from biomedical literature. The datasets were chosen to reflect the diversity of text sources encountered in real-world DNER tasks. To ensure fair evaluation, I employed a rigorous cross-validation protocol and carefully selected a representative subset of the data for training, validation, and testing. The resulting benchmark allows for meaningful comparisons of the performance of different embedding techniques across different text domains.</p>

<p>To evaluate the performance of the different word embeddings, we used a variety of evaluation metrics, including precision, recall, and F1 score. These metrics provide a comprehensive view of the embeddings’ strengths and limitations in identifying named entities in biomedical text. However, as pointed out by Rob Hyndman in his work on forecasting accuracy measures, it is important not to rely solely on these metrics and to consider other factors such as computational efficiency and interpretability when selecting a word embedding technique. Moreover, we should be cautious of overfitting to a particular dataset and ensure that the embeddings generalize well to new data. By taking a nuanced approach to evaluation, we can gain a better understanding of the capabilities and limitations of each word embedding technique and make informed decisions when selecting an appropriate method for a DNER task.</p>

<p>However, my study also revealed that combining different embedding techniques and incorporating domain-specific features could further improve the performance of the models. I suggested that further research is needed to explore the use of hybrid models for biomedical named entity recognition tasks.</p>

<p>The finding that the choice of embedding method had minimal impact on the models’ performance is important, as it suggests that practitioners can use a range of embedding methods without sacrificing performance. This echoes the sentiment expressed by Rob Hyndman in his work on time series forecasting, where he argues that the choice of forecasting method is less important than other factors such as data quality and feature selection. However, it’s worth noting that this finding is based on the specific datasets and evaluation metrics used in this effort. As such, practitioners should exercise caution in extrapolating these findings to other contexts and datasets.</p>

<p>It’s clear that the choice of word embedding technique can have a significant impact on the performance of natural language processing models, especially for tasks such as DNER.</p>

<h2 id="results">Results</h2>
<p>All scores reported are F1-Micro on BIO format for the Disease and Adverse Effect entities. Essentially, an entity is scored only if it is recognized as a whole: e.g. “heart failure atrial fibrillation chf” counts as a [Disease] match, but “heart failure atrial fibrillation” alone does not.
F1-Micro is the harmonic mean of micro-averaged precision and recall (see below). It is a reasonable (more on that below) choice of metric, as it accounts for the class imbalance between Adverse and Disease entities and we don’t have a particular metric we need to optimize for:</p>
<ul>
  <li>Micro-Precision: (TP1+TP2)/(TP1+TP2+FP1+FP2)</li>
  <li>Micro-Recall: (TP1+TP2)/(TP1+TP2+FN1+FN2)</li>
</ul>
<p>Results:</p>

<table>
  <thead>
    <tr>
      <th>Embeddings</th>
      <th>F1-Micro Score</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>ELMo Embeddings (5.5b,200d)</td>
      <td>0.779 ± 0.02</td>
    </tr>
    <tr>
      <td>EHR/Biomedical Text Embeddings (approx 3b words, w2v cbow, 200d)</td>
      <td>0.493 ± 0.05</td>
    </tr>
    <tr>
      <td>GloVe (42b,300d)</td>
      <td>0.811 ± 0.04</td>
    </tr>
    <tr>
      <td>GloVe (6b,50d)</td>
      <td>0.750 ± 0.04</td>
    </tr>
    <tr>
      <td>GloVe (6b,100d)</td>
      <td>0.780 ± 0.01</td>
    </tr>
    <tr>
      <td>GloVe (6b, 200d)</td>
      <td>0.804 ± 0.04</td>
    </tr>
    <tr>
      <td>GloVe (6b, 300d)</td>
      <td>0.816 ± 0.03</td>
    </tr>
    <tr>
      <td>FastText (16bn, 300d)</td>
      <td>0.791 ± 0.05</td>
    </tr>
  </tbody>
</table>
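<p>As a sketch, the micro-averaged scores defined above reduce to a few lines of code (the per-class counts here are made up for illustration):</p>

```python
def micro_f1(counts):
    """counts: per-entity-class dicts of tp/fp/fn (e.g. Disease, Adverse Effect).

    Micro-averaging pools the counts across classes before computing
    precision and recall, so the larger class dominates the score.
    """
    tp = sum(c["tp"] for c in counts)
    fp = sum(c["fp"] for c in counts)
    fn = sum(c["fn"] for c in counts)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

classes = [
    {"tp": 80, "fp": 10, "fn": 20},  # Disease (illustrative)
    {"tp": 20, "fp": 10, "fn": 10},  # Adverse Effect (illustrative)
]
score = micro_f1(classes)
```

<p>Because the counts are pooled, the class imbalance between Disease and Adverse Effect entities is reflected directly in the score rather than averaged away, which is why it was a reasonable default here.</p>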

<h2 id="conclusion">Conclusion</h2>
<p>The results (if they show anything) seem to suggest that the comparison may not be entirely fair. However, what stood out the most was the surprisingly poor performance of the EHR embeddings. This observation underscores the need for a much larger corpus, especially considering that these embeddings were trained using the cbow word2vec method, which may not be the optimal choice for capturing rare disease words. In contrast, the GloVe embeddings excel at weighing rare words through their co-occurrence frequency, as highlighted by their comparative performance.</p>

<p>It is worth noting that the AllenNLP website, which hosts the ELMo embeddings, acknowledges the omission of a comparison with GloVe, as they deemed them not directly comparable.</p>

<p>While the F1 score is a ‘safe place’ for data scientists, it should be considered carefully for NER tasks. Boundary errors, as it turns out, are one of the major sources of error in biological applications. Optimizing solely for F1 may cause us to overlook cases like ‘left flank’, where tagging only ‘flank’ as a location still represents a partial but significant success. Labeling errors remain a significant concern.</p>

<p>Another factor to consider is the potential sparsity of entities in the corpus used for training. Many openly available biomedical text datasets are based on research articles or abstracts, which are densely packed with biomedical concepts. Additionally, the tone of these datasets is more academic and may not capture the indications for certain diseases that are of importance.</p>

<p>On a related note, as I was writing this post, I stumbled upon Facebook’s Meta-embeddings, which provide a mechanism to determine the most effective embeddings for a specific prediction task. This ensemble-type approach allows for an intriguing exploration of specialized word variations between embeddings. The authors argue that the modeler should not personally choose the embeddings, but rather rely on the objective rigor of DME (Dynamic Meta-Embeddings).</p>

<p>It is important to keep in mind that many of these general-purpose, broad-label embeddings do not clearly define their data cleansing and tokenization methods or specify their particular optimization objectives.</p>

<p>For further reading, I found this resource helpful and relevant in exploring the best and latest in word embeddings. Additionally, if you wish to delve deeper into the trends of last year, you may find this article informative: http://ruder.io/word-embeddings-2017/.</p>

<h4 id="links-to-embeddings">Links to embeddings</h4>
<ul>
  <li>Elmo Embeddings: https://allennlp.org/elmo</li>
  <li>GloVe Embeddings (Common Crawl (42B tokens, 1.9M vocab, uncased, 300d vectors): https://nlp.stanford.edu/projects/glove/</li>
  <li>Disease and Adverse Effects NER dataset that I used: https://www.scai.fraunhofer.de/en/business-research-areas/bioinformatics/downloads/corpus-for-disease-names-and-adverse-effects.html</li>
</ul>]]></content><author><name>Austin Powell</name></author><category term="General" /><summary type="html"><![CDATA[An overview of out-of-box benefits evaluated on open-source health dataset]]></summary></entry></feed>