<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Mark Watson’s Artificial Intelligence Books and Blog]]></title><description><![CDATA[Author of 20+ books on AI and I have 50+ US patents. Here I talk about both technology and provide links to read my published books online.]]></description><link>https://marklwatson.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg</url><title>Mark Watson’s Artificial Intelligence Books and Blog</title><link>https://marklwatson.substack.com</link></image><generator>Substack</generator><lastBuildDate>Thu, 16 Apr 2026 09:35:32 GMT</lastBuildDate><atom:link href="https://marklwatson.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Mark Watson]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[marklwatson@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[marklwatson@substack.com]]></itunes:email><itunes:name><![CDATA[Mark Watson]]></itunes:name></itunes:owner><itunes:author><![CDATA[Mark Watson]]></itunes:author><googleplay:owner><![CDATA[marklwatson@substack.com]]></googleplay:owner><googleplay:email><![CDATA[marklwatson@substack.com]]></googleplay:email><googleplay:author><![CDATA[Mark Watson]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[I removed my OpenAI Account - notes for readers of my books]]></title><description><![CDATA[I was outraged by the pressure the Department of Defense and the Trump administration put on Anthropic last week to accept changes in their contract.]]></description><link>https://marklwatson.substack.com/p/i-removed-my-openai-account-notes</link><guid isPermaLink="false">https://marklwatson.substack.com/p/i-removed-my-openai-account-notes</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Sun, 01 Mar 2026 17:07:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I was outraged by the pressure the Department of Defense and the Trump administration put on Anthropic last week to accept changes in their contract.</p><p>I was also outraged by Sam Altman&#8217;s response and decided to simply remove my OpenAI account (easy to do: <a href="https://help.openai.com/en/articles/6378407-how-to-delete-your-account">https://help.openai.com/en/articles/6378407-how-to-delete-your-account</a>).</p><p>A number of my books have examples using OpenAI APIs. I will in the next few weeks modify the public GitHub repos for these books adding a directory OLD_MATERIAL that will contain the markdown file for any book chapters removed and the OpenAI example code). 
This will allow my readers to still easily read the removed material as well as the example code.</p><p>I will also push revised copies of the affected books to: <a href="https://leanpub.com/u/markwatson">https://leanpub.com/u/markwatson</a></p>]]></content:encoded></item><item><title><![CDATA[Balancing Privacy and Productivity]]></title><description><![CDATA[I build my digital life on two primary service providers:]]></description><link>https://marklwatson.substack.com/p/balancing-privacy-and-productivity</link><guid isPermaLink="false">https://marklwatson.substack.com/p/balancing-privacy-and-productivity</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Sun, 22 Feb 2026 13:48:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I build my digital life on two primary service providers:</p><ul><li><p>Proton: mail, cloud storage, and Lumo private LLM chat (an integrated web search tool with a strong Mistral model: my default tool that replaces plain web searches as well as covering 90% of my routine &#8216;LLM chat&#8217; use)</p></li><li><p>Google: Gemini APIs, occasional use of Gemini for deep research, very occasional use of AntiGravity for coding using Claude and Gemini models, and YouTube Premium for entertainment (philosophy talks, nature videos, Qi Gong exercise, etc.)</p></li></ul><p>I also use:</p><ul><li><p>DuckDuckGo: when I still do web search, DDG is my default.</p></li></ul><h3>Make your own decisions</h3><p>While I argue that almost everyone should think more carefully about what personal and business information they leak to thousands of companies that buy, sell, trade, and use that data - and about what to do to minimize these oozing data leaks - the question of how to stay productive is probably more difficult for you, dear reader.</p><p>I am happily retired so I understand that my tool and infrastructure requirements are easier to reconcile with privacy concerns than those of my online friends who are reading this.</p>]]></content:encoded></item><item><title><![CDATA[Using both a 16GB MacBook Air and a 32GB Mac mini for LLM Experiments: My Setup]]></title><description><![CDATA[As I write this I am using the new glm-4.7-flash model on my &#8216;headless&#8217; MacMini for general use and for coding assistants like Claude Code (not using Anthropic&#8217;s models).]]></description><link>https://marklwatson.substack.com/p/using-both-a-16b-macbook-air-and</link><guid isPermaLink="false">https://marklwatson.substack.com/p/using-both-a-16b-macbook-air-and</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Sat, 24 Jan 2026 17:26:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have a setup that other people might find useful. I leave my M2Pro MacMini tucked away, always running headless. I use the Tailscale service to assign a &#8220;sort of&#8221; IP address to the MacMini - in fact, other devices must have Tailscale installed and all devices need to be on the same Tailscale network.
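A quick way to confirm that the Tailscale network works is to hit the Ollama server&#8217;s root endpoint from another device (a minimal sketch, assuming Python 3; as elsewhere in this post, use the IP address Tailscale gives you, not the dummy one shown here):</p><pre><code># minimal reachability check for the headless MacMini's Ollama server
import urllib.request

OLLAMA_URL = "http://100.122.241.16:11434"  # replace with your Tailscale IP

with urllib.request.urlopen(OLLAMA_URL, timeout=5) as response:
    # Ollama's root endpoint answers with a plain "Ollama is running" message
    print(response.read().decode())</code></pre><p>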
I had never used Tailscale before, and it took almost 20 minutes to set up both of my Macs, my iPad Pro, and my iPhone.</p><p>On my MacMini, I always keep running:</p><p><code>OLLAMA_HOST=0.0.0.0 OLLAMA_CONTEXT_LENGTH=32768 ollama serve</code></p><p>Setting a larger-than-default context length is crucial. For example, running Claude Code using an Ollama-hosted model fails with a smaller context.</p><p>On my MacBook Air I need to set the OLLAMA_HOST address assigned by Tailscale <strong>(set the IP address Tailscale gives you, not the &#8216;dummy&#8217; one I show here)</strong>:</p><p><code>$ OLLAMA_HOST=100.122.241.16 ollama list</code></p><p><code>NAME                       ID              SIZE      MODIFIED</code></p><p><code>nemotron-3-nano:latest     b725f1117407    24 GB     4 weeks ago</code></p><p><code>devstral-small-2:latest    24277f07f62d    15 GB     4 weeks ago</code></p><p><code>....</code></p><p><strong>In Python code:</strong></p><p><code>import os, ollama</code></p><p><code># use a remote Ollama server if REMOTE_OLLAMA is set, else localhost</code></p><p><code>_remote = os.getenv("REMOTE_OLLAMA")</code></p><p><code>client = ollama.Client(host=_remote) if _remote else ollama</code></p><p><code>def completion(prompt):</code></p><p><code>    print(f"\n\n$$ completion: prompt: {prompt}\n\n")</code></p><p><code>    response = client.chat(</code></p><p><code>        model="rnj-1",</code></p><p><code>        messages=[{"role": "user", "content": prompt}]</code></p><p><code>    )</code></p><p><code>    ret = response["message"]["content"]</code></p><p><code>    print(f"\n\n$$ completion: ret: {ret}\n\n")</code></p><p><code>    return ret</code></p><p><code>print(completion("1 + 3"))</code></p><p>I also find it useful to set up a shell script for using ollama on the remote MacMini that I call &#8216;mollama&#8217;:</p><p><code>$ cat ~/bin/mollama</code></p><p><code>#!/usr/bin/env zsh</code></p><p><code>OLLAMA_HOST=100.122.241.16:11434 exec ollama "$@"</code></p><p><code>$ chmod +x ~/bin/mollama</code></p><h3>Using Open WebUI on my MacBook Air to access my MacMini:</h3><p><code>uv tool install open-webui --python 3.11</code></p><p>Run using:</p><p><code>OLLAMA_HOST=100.122.241.16 open-webui serve</code></p><p>To NOT show OpenAI models, just the models on the MacMini with Ollama:</p><p><code>OLLAMA_HOST=100.122.241.16 OPENAI_API_KEY='' OPENAI_KEY='' open-webui serve</code></p><p>Note: to uninstall, use:</p><p><code>uv tool uninstall open-webui</code></p><h3>Using Claude Code (not using Anthropic&#8217;s models)</h3><p>Set these environment variables:</p><p><code>export ANTHROPIC_AUTH_TOKEN=ollama</code></p><p><code>export ANTHROPIC_BASE_URL=http://100.122.241.16:11434</code></p><p><code>export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1</code></p><p>And then create an alias:</p><p><code>alias CLAUDE='~/.local/bin/claude --model glm-4.7-flash'</code></p><h3>Problems</h3><p>The only hassle with my setup is that even though I keep a few SSH sessions open on the remote MacMini, at least twice a week I find that I need to plug the MacMini into a monitor and keyboard/mouse for admin actions that I can&#8217;t easily do on the command line.</p>]]></content:encoded></item><item><title><![CDATA[Supporting Goals of Free Software Foundation while using Apple hardware]]></title><description><![CDATA[My personal
rationalizations...]]></description><link>https://marklwatson.substack.com/p/supporting-goals-of-free-software</link><guid isPermaLink="false">https://marklwatson.substack.com/p/supporting-goals-of-free-software</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Fri, 14 Nov 2025 22:29:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I just rejoined the FSF (<a href="https://www.fsf.org/">https://www.fsf.org/</a>) after almost ten years of not being a member. For decades I relied on Linux for work and writing, and being a member of the FSF made sense. However, when I decided to simplify my life and use a MacBook, an iPhone, and an iPad, I let my FSF membership drop. I am enjoying being a member again.</p><p>I am not recommending that anyone adopt my setup, but it may still be of general interest:</p><p><strong>Apps vs. Web Browser</strong></p><p>On mobile devices (and my MacBook) I don&#8217;t like to install apps unless it is something like a Chess game app. Services like Email, Mastodon, Facebook, YouTube, and X all work fine in a web browser, even with the DuckDuckGo browser, which is usually my default.</p><p><strong>Email</strong></p><p>I use my own custom domain on Apple iCloud email. Convenient enough. I keep backup ProtonMail and Gmail accounts.</p><p>I use the DuckDuckGo web browser, even on my iPhone, for these three services.</p><p><strong>AI vs. No AI</strong></p><p>Well, I have been a paid AI practitioner since 1982, but I believe that modern (LLM-based) AI should be used purposefully, keeping in mind &#8220;productivity vs. just playing around&#8221; issues. The resource costs of overuse of AI might cause an economic crash (a favorite little conspiracy theory of mine).</p><p>I like to have AI available only when I specifically want it. Usually I just use plain-old <strong>Emacs</strong> for coding and writing my books. If I specifically want AI help with something, then for an IDE experience I will use the TRAE coding agent. For the command line, I will use gemini-cli or codex. I like to use AI coding help 4 or 5 times a week. As an example, today I wanted some Python code that used a few libraries converted to Common Lisp (using several popular CL libraries). TRAE one-shotted this for me in two minutes. I think it would have taken me over 20 minutes to write it myself.</p><p>AI is OK for easy stuff you can do yourself; it saves time.</p><p>I prefer to use AI less rather than more often, but I love writing open source software and Open Content licensed books (<a href="https://leanpub.com/u/markwatson">https://leanpub.com/u/markwatson</a>) and have no strong desire to over-automate my work. But I appreciate AI tools when I do use them.</p><p><strong>Do-it-myself AI integrations</strong></p><p>I do write many small Python (and occasionally Lisp) utilities for specific purposes using LLM APIs. I prefer local LLMs on Ollama or LM Studio but I also use commercial APIs.</p><p><strong>AI / Web Browser Integration?</strong></p><p>In general I say NO! After reading their privacy docs I did experiment a few hours each with the Perplexity Comet Browser and OpenAI&#8217;s ChatGPT Atlas, and I have read the docs for Opera Neon and Brave with Leo.
(I used Proton&#8217;s Lumo AI, which is not really a browser integration, for two months and decided I didn&#8217;t want to keep paying for it.)</p><p>I am a human being, I like to think for myself, and I use AI sparingly.</p><p>One option I am investigating is the &#8220;AI tab&#8221; in the DuckDuckGo browser, which seems very useful and is privacy preserving. I am evaluating DDG&#8217;s paid subscription plan month by month; I haven&#8217;t committed to a full year yet.</p><p><strong>Software Development</strong></p><p>I find macOS almost as good as Linux for coding. I do keep one or two small Linux VPSs running on the web for the cases where Linux is better than macOS.</p>]]></content:encoded></item><item><title><![CDATA[Apple's Hybrid Intelligence Architecture]]></title><description><![CDATA[Combining local model inferencing with secure cloud computation]]></description><link>https://marklwatson.substack.com/p/apples-hybrid-intelligence-architecture</link><guid isPermaLink="false">https://marklwatson.substack.com/p/apples-hybrid-intelligence-architecture</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Thu, 13 Nov 2025 16:09:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I wrote an iPadOS/iOS app &#8220;Mark Chat&#8221; a few months ago: <a href="https://apps.apple.com/us/app/markchat/id6747982917">https://apps.apple.com/us/app/markchat/id6747982917</a>. I have good intentions of adding this app as an example to my Swift AI book (read online: https://leanpub.com/SwiftAI/read).</p><p>This app combines Apple&#8217;s local 3B parameter system model with secure cloud-based inference, as needed. Apple&#8217;s AI efforts have received some justified criticism for the slow rollout of features, but I believe Apple is on the right track.</p><p>When using Apple&#8217;s hybrid model system, what is computed locally? Here is a rough breakdown:</p><p><strong>Processed On-Device (Local):</strong> These tasks are typically low-complexity, highly repetitive, context-dependent, and privacy-sensitive.
They must be fast and available offline.</p><ul><li><p><strong>Writing Tools:</strong> Proofreading, tone adjustments (friendly, professional, concise).</p></li><li><p><strong>Summarization:</strong> Notification summaries, message preview summaries.</p></li><li><p><strong>Generation:</strong> Genmoji creation.</p></li><li><p><strong>Siri Context:</strong> Understanding on-device data (e.g., &#8220;What&#8217;s on my calendar?&#8221;).</p></li><li><p><strong>Photos:</strong> Intelligent search (after local indexing) and the &#8220;Clean up&#8221; tool.</p></li></ul><p><strong>Escalated to Private Cloud Compute (PCC):</strong> These tasks require more powerful generative models for higher-quality output, deeper reasoning, or broad-world knowledge.</p><ul><li><p><strong>Advanced Writing Tools:</strong> More complex &#8220;Rewrite&#8221; functions, Summary, Key Points, Lists, and Tables.</p></li><li><p><strong>Mail:</strong> Full email summarization and Smart Replies.</p></li><li><p><strong>Summarization (Broad):</strong> Summaries for Safari web pages and Notes audio recordings.</p></li><li><p><strong>ChatGPT Integration:</strong> Any request explicitly routed to a third-party model (which requires user permission).</p></li></ul><h3>Architecture of the ~3B Parameter On-Device Model</h3><p>The default on-device model is a highly optimized foundation model with approximately 3 billion parameters. It is not a generic, off-the-shelf model but a custom-built LLM designed specifically for efficient inference on Apple Silicon&#8217;s Neural Engine.</p><ul><li><p><strong>Performance and Optimization:</strong> To meet the strict memory, power, and performance requirements of a mobile device, the model employs aggressive optimization techniques. This includes a mixed 2-bit and 4-bit &#8220;low-bit palettization&#8221; strategy, achieving an average of 3.7 bits-per-weight. This aggressive quantization is key to fitting a 3B-parameter model into the device&#8217;s memory. On an iPhone 15 Pro (A17 Pro chip), this optimized model is capable of generating approximately 30 tokens per second.</p></li><li><p><strong>Model Capabilities (Internal vs. External View):</strong> Apple&#8217;s internal human-evaluation benchmarks show this ~3B model outperforming larger, well-regarded open-source models like Mistral-7B, Gemma-7B, and Llama-3-8B on a variety of user-facing tasks.</p></li><li><p>This data, however, is complemented by third-party developer benchmarks. These independent tests suggest that on raw academic NLP benchmarks (like MMLU), the <em>base</em> on-device model may underperform similarly-sized models like Phi-3 Mini. This juxtaposition does not imply a contradiction, but rather a clarification of the model&#8217;s purpose.
It is not a general-purpose, high-knowledge LLM; it is a highly-tuned, <em>task-oriented</em> engine optimized for the specific functions of Apple Intelligence (summarization, tone adjustment, etc.).</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Digital Diet]]></title><description><![CDATA[Removing clutter and saving time]]></description><link>https://marklwatson.substack.com/p/digital-diet</link><guid isPermaLink="false">https://marklwatson.substack.com/p/digital-diet</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Sat, 25 Oct 2025 18:09:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I started investing personal time in learning and personal research in 1977 when I read &#8220;Mind Inside Matter&#8221; and started using Lisp languages. While I believe that adequate random exploration followed by drilling down into interesting topics is useful for career development and enjoyment, every few years I evaluate how I spend time and make adjustments. I am performing one of these digital diet phases this week. Hopefully my notes may help you too:</p><ol><li><p>One of my largest time sinks is switching between privacy-preserving tools and sometimes more useful tools. I love ProtonMail but I stopped using it yesterday because of the time sink of periodically switching my domain between Gmail and ProtonMail and the other overhead of using two email providers. While I believe privacy is important in general, in my particular case I spend all my time writing open source software and writing open content books: everything I do is non-private. If I ran a company, we would probably use Proton products, but for my independent open work having just a single simple email system like Gmail is a time saver. (I configure Google services for maximum privacy and security and minimum data sharing.)</p></li><li><p>I spend too much time trying to do too much using local LLM models. This week I created a short list of use cases where local models run using Ollama and LM Studio make sense for my workflows (mostly using small models embedded in specific applications). Everything else is now handled by paying for Gemini and GPT-5 (all variants), and I am trying to spend much less time experimenting with all available models (except Anthropic: I don&#8217;t like their business model, but that is just me).</p></li><li><p>OK, now that I have the smaller stuff out of the way, I will address a larger time sink: spending too much time configuring Emacs and VSCode to use many different local and remote LLM-powered AI coding agents, and then interrupting my editing workflow by switching to AI tools inside an editor or IDE. While I do have many useful packages added to both editors I use, for my workflow I am much more time efficient just using Emacs (mostly) and VSCode (sometimes useful for projects with many files) for editing. I am now just using command line coding agents: I am standardizing on just using Google&#8217;s gemini-cli and OpenAI&#8217;s codex when I specifically want help in writing code, tweaking documentation, or brainstorming possible code changes. Splitting AI use out to just command line interactions saves me time and effort.
Consider this attitude as &#8216;anti-IDE.&#8217;</p></li></ol>]]></content:encoded></item><item><title><![CDATA[Using LLMs vs. 'The Environment']]></title><description><![CDATA[A short rant about people who overuse LLM AIs]]></description><link>https://marklwatson.substack.com/p/using-llms-vs-the-environment</link><guid isPermaLink="false">https://marklwatson.substack.com/p/using-llms-vs-the-environment</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Thu, 09 Oct 2025 19:03:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There&#8217;s a strange irony in watching people talk about global warming and other environmental problems while simultaneously hammering away at chatbots like they&#8217;re infinite-energy oracles. Every throwaway prompt like &#8220;write my tweet,&#8221; &#8220;summarize this article,&#8221; &#8220;draft an email to my cat,&#8221; etc. runs on megawatts and datacenter water cooling. It&#8217;s not that using LLMs is evil; it&#8217;s that <em>overusing</em> them as a replacement for thinking has become its own form of digital pollution. While there is a real environmental cost, perhaps the erosion of intellectual self-sufficiency is even more damaging.</p><p>Agentic programming tools take this wastefulness to another level. These &#8220;autonomous&#8221; agents chain LLM calls together, recursively prompting other models, often without meaningful human oversight. Each step burns compute cycles. The result isn&#8217;t intelligence&#8212;it&#8217;s a Rube Goldberg machine of API calls, an illusion of progress paid for in electricity, latency, and cognitive outsourcing. If overusing LLM chat is leaving the lights on, then agentic systems are the AI equivalent of lighting up Las Vegas to toast a single slice of bread.</p><p>To be clear, dear reader, I use gemini-cli and codex a few times a week to good effect. What I am complaining about here is devs on social media bragging about how many agentic AI coding sessions they keep running at once.</p><p>The energy and other environmental externalities of overusing or misusing AI are extreme, dumping costs on our environment and our future economy.</p>]]></content:encoded></item><item><title><![CDATA[Gerbil Scheme OpenAI, Google Gemini, and Ollama Local Model Clients]]></title><description><![CDATA[Three short examples in the Lisp language Gerbil Scheme]]></description><link>https://marklwatson.substack.com/p/gerbil-scheme-openai-google-gemini</link><guid isPermaLink="false">https://marklwatson.substack.com/p/gerbil-scheme-openai-google-gemini</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Sun, 24 Aug 2025 20:05:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://cons.io">Gerbil Scheme</a> is a modern Lisp with a &#8220;batteries included&#8221; library.
Here we look at three simple examples that demonstrate network IO and handling JSON responses.</p><p>September 19, 2025 update: I have released a book about Gerbil Scheme: <a href="https://leanpub.com/Gerbil-Scheme/read">Gerbil Scheme in Action - AI Applications, Network Programming, and Utilities</a></p><p>Let&#8217;s start with a local model (Ollama must be installed and &#8216;ollama serve&#8217; should be running):</p><pre><code>(import :std/net/request :std/text/json)
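;; ollama.ss: minimal client for a local Ollama server's /api/generate
;; endpoint; assumes "ollama serve" is running on localhost:11434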
(export ollama)

(def (ollama prompt
             model: (model "gemma3:latest")) ;; "gpt-oss:20b")) ;; "qwen3:0.6b"))
  (let* ((endpoint "http://localhost:11434/api/generate")
         (headers '(("Content-Type". "application/json")))
         (body-data 
           (list-&gt;hash-table
             `(("model". ,model) ("prompt". ,prompt) ("stream". #f))))
         (body-string (json-object-&gt;string body-data)))

    (let ((response
            (http-post endpoint headers: headers data: body-string)))
      (if (= (request-status response) 200)
          (let* ((response-json (request-json response)))
            ;;(displayln (hash-keys response-json))
            (hash-ref response-json 'response))
          (error "Ollama API request failed"
                 status: (request-status response)
                 body: (request-text response))))))

;;  (ollama "why is the sky blue? Be very concise.")</code></pre><p>You would run this example using:</p><pre><code>$ gcx ollama.ss
$ gxi 
Gerbil v0.18.1-78-gc5546da0 on Gambit v4.9.5-124-g6d1a9a9b
&gt; (import :ollama/ollama)
&gt; (ollama "why is the sky blue?")
"The blue color of the sky is a...."</code></pre><p>The Gemini example is similar except the web service call must be authenticated:</p><pre><code>;; File gemini.ss
(import :std/net/request
        :std/text/json)

(export gemini)

(def uri2 "https://generativelanguage.googleapis.com/v1beta/models/")

(def (gemini prompt
             model: (model "gemini-2.5-flash")
             system-prompt:
             (system-prompt "You are a helpful assistant."))
     (let ((api-key (get-environment-variable "GOOGLE_API_KEY")))
       (unless api-key
         (error "GEMINI_API_KEY environment variable not set."))

       (let* ((headers `(("Content-Type". "application/json")
                         ("x-goog-api-key". ,api-key)))
              (body-data
               (list-&gt;hash-table
                `(("contents".
                   ,(list
                      (list-&gt;hash-table
                         `(("role". "user")
                           ("parts".
                             ,(list
                               (list-&gt;hash-table
                                `(("text". ,prompt))))))))))))
              (body-string (json-object-&gt;string body-data))
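              ;; the API key is passed both as the x-goog-api-key header
              ;; and as a ?key= query parameter; either one alone suffices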
              (endpoint
               (string-append
                 uri2
                 model ":generateContent?key=" api-key)))

         (let ((response
                (http-post
                  endpoint headers: headers data: body-string)))
           (if (= (request-status response) 200)
               (let* ((response-json (request-json response))
                      (candidate
                        (car (hash-ref response-json 'candidates)))
                      (content (hash-ref candidate 'content))
                      (p1 (car (hash-ref content 'parts))))
                 (hash-ref p1 'text))
               ;; fail loudly on non-200 responses, matching the other clients
               (error "Gemini API request failed"
                      status: (request-status response)
                      body: (request-text response)))))))

;;   (gemini "why is the sky blue? be very concise")</code></pre><p>The OpenAI example is similar:</p><pre><code>(import :std/net/request
        :std/text/json)

(export openai)
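;; Minimal OpenAI chat completions client; assumes a valid API key is
;; available in the OPENAI_API_KEY environment variable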

(def (openai
       prompt
       model: (model "gpt-5-mini")
       system-prompt: (system-prompt "You are a helpful assistant."))
     (let ((api-key (get-environment-variable "OPENAI_API_KEY")))
    (unless api-key
      (error "OPENAI_API_KEY environment variable not set."))

    (let* ((headers `(("Content-Type". "application/json")
                      ("Authorization".
                       ,(string-append "Bearer " api-key))))
           (body-data
            (list-&gt;hash-table
             `(("model". ,model)
               ("messages".
                ,(list
                   (list-&gt;hash-table
                    `(("role". "system")
                      ("content". ,system-prompt)))
                   (list-&gt;hash-table `(("role". "user")
                                       ("content". ,prompt))))))))
           (body-string (json-object-&gt;string body-data))
           (endpoint "https://api.openai.com/v1/chat/completions"))

      (let ((response
              (http-post
                endpoint headers: headers data: body-string)))
        (if (= (request-status response) 200)
            (let* ((response-json (request-json response))
                   (choices (hash-ref response-json 'choices))
                   (first-choice (and (pair? choices) (car choices)))
                   (message (hash-ref first-choice 'message))
                   (content (hash-ref message 'content)))
              content)
            (error "OpenAI API request failed"
                   status: (request-status response)
                   body: (request-text response)))))))

;;  (openai "why is the sky blue? be very concise")
</code></pre><p>Gerbil Scheme is useful for general network and concurrent programming tasks.</p><p></p>]]></content:encoded></item><item><title><![CDATA[How do you manage your time evaluating new AI tools?]]></title><description><![CDATA[I used to try everything - I don&#8217;t do that anymore]]></description><link>https://marklwatson.substack.com/p/how-do-you-manage-your-time-evaluating</link><guid isPermaLink="false">https://marklwatson.substack.com/p/how-do-you-manage-your-time-evaluating</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Fri, 01 Aug 2025 14:33:40 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello dear readers, do you ever get frustrated at the end of a week when you realize how many hours you spent reading about new AI tools (most frequently LLM based, sometimes RL and good old fashioned ML) and trying them? I do!</p><p>As a form of &#8220;digital diet&#8221; I now only pay a yearly subscription for Google Gemini Plus, ignoring OpenAI and Claude. Do I have fear of missing out (FOMO) when, for example, I hear people raving on Hacker News that Claude Code is the &#8220;bat&#8217;s ass&#8221; best coding tool? Nope. I just happily switch between using Google&#8217;s gemini-cli in either free mode or paid API mode and usually have a pleasant and productive coding session - but after some soul searching I have changed my habits to often just coding from scratch and not by default start using a coding agent.</p><p>I really believe that the big win using AI is in improving ourselves, as a learning tool, and for research. Dear readers, obviously you get to decide for yourselves how much you use AI vs. &#8216;doing it all yourself.&#8217;</p><p><strong>One other thing, some news: I am writing a new book based on using LM Studio to work using AI offline.</strong></p><p>I love using Ollama for personal research, and I enjoy maintaining my book on Ollama by adding examples and updating it. However, I find that I use Ollama and LM Studio for different local AI use cases so it is worth writing down what I am learning and my workflows (with plenty of example code) in a separate book. 
I only have three chapters written so far; in a few weeks I will publish the first version and post on Substack a link to read the new book online.</p>]]></content:encoded></item><item><title><![CDATA[Added material for using Moonshot AI's Kimi K2 model to my Common Lisp book]]></title><description><![CDATA[Kimi K2 is cost effective and is excellent for reasoning and tool use]]></description><link>https://marklwatson.substack.com/p/added-material-for-using-moonshot</link><guid isPermaLink="false">https://marklwatson.substack.com/p/added-material-for-using-moonshot</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Tue, 29 Jul 2025 18:09:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You can read the new material online at <a href="https://leanpub.com/lovinglisp/read#leanpub-auto-moonshots-kimi-k2-model">https://leanpub.com/lovinglisp/read#leanpub-auto-moonshots-kimi-k2-model</a></p><p>Enjoy!</p><p>I now keep both the example programs and the book&#8217;s manuscript files in one public repository: <a href="https://github.com/mark-watson/loving-common-lisp">https://github.com/mark-watson/loving-common-lisp</a></p>]]></content:encoded></item><item><title><![CDATA[I am taking a break from LLMs, updating my Lisp books instead]]></title><description><![CDATA[For over two years I have spent most of my personal development time implementing LLM-based RAG systems, experimenting with local models, and trying to build my own LLM-based tools from the ground up.]]></description><link>https://marklwatson.substack.com/p/i-am-taking-a-break-from-llms-updating</link><guid isPermaLink="false">https://marklwatson.substack.com/p/i-am-taking-a-break-from-llms-updating</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Thu, 24 Jul 2025 23:40:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For over two years I have spent most of my personal development time implementing LLM-based RAG systems, experimenting with local models, and trying to build my own LLM-based tools from the ground up. I now feel stuck with LLM tool calling and building agentic systems: it is simple enough to build demo or example systems, but I find the commercial products like gemini-cli and OpenAI agent mode to be so much more compelling than what I prototype for my own use (especially when using weaker local models running on Ollama) that I feel like taking a break for a few months.</p><p>I thought about what I felt like doing for a few months, and it is an easy call: do some Lisp hacking!</p><p>I plan on concentrating on Clojure and Common Lisp, but will probably spend some time with Racket and Hy (hylang).
Anything interesting I work on will probably be added to one of my existing Lisp books: <a href="https://leanpub.com/u/markwatson">https://leanpub.com/u/markwatson</a></p><p>A few days ago I did some code maintenance on my 12-year-old web site <a href="https://cookingspace.com/">https://cookingspace.com/</a> that is written in Clojure. If I get that code sufficiently cleaned up I would like to open source it and add a new chapter to my Clojure book.</p>]]></content:encoded></item><item><title><![CDATA[AI bubbles]]></title><description><![CDATA[I have lived and worked through two previous &#8216;AI winters&#8217; and I expect the current bubble to eventually pop in a dramatic way.]]></description><link>https://marklwatson.substack.com/p/ai-bubbles</link><guid isPermaLink="false">https://marklwatson.substack.com/p/ai-bubbles</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Tue, 15 Jul 2025 14:13:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have lived and worked through two previous &#8216;AI winters&#8217; and I expect the current bubble to eventually pop in a dramatic way. There will be good things produced by AI, but I am skeptical of the panicked FOMO rush to AGI or super intelligence.</p><p>I think technology improvements that drastically lower the costs of inference will hollow out the anticipated profits from xAI, Google, Anthropic, OpenAI, etc.</p><p>I was experimenting with Kimi K2 APIs yesterday - very effective (especially for tool use), and so incredibly inexpensive. I am retired, now doing independent research, so my requirements are very different from those of most tech people who are still in the job market or growing their own business. I find that a combination of local Ollama models, very inexpensive APIs like Moonshot&#8217;s Kimi, occasional Gemini 2.5 Pro use, and occasional gemini-cli sessions provides extraordinary value. Am I missing out by not using one or more $200-$300 a month subscriptions? Probably, but I don&#8217;t care.</p><p><strong>Thoughts on NVidia&#8217;s $4T valuation</strong></p><p>There is no doubt that NVidia is a very well run company, with great research and great products.</p><p>How large is NVidia&#8217;s moat?</p><p>NVidia is being propped up by the US government. Huawei&#8217;s new chips are lower tech but would probably hit a good &#8216;practical sweet spot&#8217; for AI data centers in many countries around the world, but our current administration is threatening economic violence against any countries who choose to use more cost effective Huawei AI chips.
Given protectionism by the US government and the quality of NVidia&#8217;s products, I think NVidia&#8217;s stock value is secure for now, but it will probably eventually get disrupted.</p>]]></content:encoded></item><item><title><![CDATA[AI needs highly effective continuous learning]]></title><description><![CDATA[A hallmark of human cognition is the ability for continual, data-efficient learning and knowledge consolidation.]]></description><link>https://marklwatson.substack.com/p/ai-needs-highly-effective-continuous</link><guid isPermaLink="false">https://marklwatson.substack.com/p/ai-needs-highly-effective-continuous</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Sat, 05 Jul 2025 17:34:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A hallmark of human cognition is the ability for continual, data-efficient learning and knowledge consolidation. Current techniques like RAG, which treat past interactions as a static corpus for retrieval, fail to replicate this dynamic process, limiting agents to a non-evolving, superficial memory. Knowledge graphs, drawing from Semantic Web technologies, provide a robust formalism for representing episodic memory for rich, contextual retrieval. The critical gap, however, is not just in memory representation, but in the autonomous mechanisms for lifelong learning&#8212;enabling an agent to generalize and evolve its internal models from that stored experience.</p><p>I would love to see more engineering and research (or Common Lisp hacking if you roll that way &#128515;) directed towards algorithms for building, maintaining, accessing, and using episodic memories of agents&#8217; experiences.</p>]]></content:encoded></item><item><title><![CDATA[My new book "Practical AI with Google: A Solo Knowledge Worker's Guide to Gemini, AI Studio, and LLM APIs"]]></title><description><![CDATA[You can read my new book online at Practical AI with Google: A Solo Knowledge Worker's Guide to Gemini, AI Studio, and LLM APIs]]></description><link>https://marklwatson.substack.com/p/my-new-book-practical-ai-with-google</link><guid isPermaLink="false">https://marklwatson.substack.com/p/my-new-book-practical-ai-with-google</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Mon, 05 May 2025 16:32:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You can read my new book online at <a href="https://leanpub.com/solo-ai/read">Practical AI with Google: A Solo Knowledge Worker's Guide to Gemini, AI Studio, and LLM APIs</a></p><p>Most books I have written are deeply technical - this is not one of them.
Here, I attempt to make state of the art AI approachable and usable for a wider, not necessarily technical, audience.</p><h1>A Decision Surface</h1><p>My previous book <a href="https://leanpub.com/ollama/read">Ollama in Action: Building Safe, Private AI with LLMs, Function Calling and Agents</a> concerned running local LLMs using Ollama - an idea I like for security and privacy (and fun!) reasons. Recent reasoning models like qwen3:30b and gemma3:27b-it-qat can be run very well on a high end home computer, but they are very slow compared to using APIs from Google, OpenAI, etc., and that slows down my development process.</p><p>For a few years I wrote my own utilities using LLMs (both local and commercial APIs) to get stuff done; now I do much less coding, instead using products built around Gemini (and less often ChatGPT) because the product offerings mostly do what I want with no custom coding on my part.</p><p>I have been retired for two years but I still spend 3 to 4 hours a day performing what I call &#8220;personal research&#8221; so my developer use cases are probably different from those of most people reading this. Something we all share, however, is the experience of living through exponential growth of AI capabilities.</p><p>What I can run on my home system (a Mac mini M2Pro with 32G) seems to lag the capabilities of commercial APIs and end user AI products by about 6 months. For now I play in both worlds: state of the art commercial AI and what I can run locally.</p>]]></content:encoded></item><item><title><![CDATA[LLM Based Agents: Comparison Overview between LlamaIndex AgentWorkFlow and Microsoft AutoGen]]></title><description><![CDATA[Google Colab notebook for a talk I gave today]]></description><link>https://marklwatson.substack.com/p/llm-based-agents-comparison-overview</link><guid isPermaLink="false">https://marklwatson.substack.com/p/llm-based-agents-comparison-overview</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Wed, 05 Feb 2025 21:09:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have been experimenting with writing agents using small local LLMs, with mixed results. Experiments that perform poorly with small models running locally with Ollama usually work much better with larger models like OpenAI O1, etc.
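</p><p>A minimal sketch of this kind of side-by-side comparison (assuming the ollama and openai Python packages are installed, a local Ollama server is running, and OPENAI_API_KEY is set; the model names below are placeholders):</p><pre><code>import os

import ollama
from openai import OpenAI

PROMPT = "List the city names in: 'Fly from Boston to Austin via Newark.'"

# run the prompt on a small model served by a local Ollama instance
local_reply = ollama.chat(
    model="qwen2.5:7b",  # placeholder: any locally pulled model
    messages=[{"role": "user", "content": PROMPT}],
)["message"]["content"]

# run the same prompt on a larger hosted model for comparison
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
hosted_reply = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder hosted model name
    messages=[{"role": "user", "content": PROMPT}],
).choices[0].message.content

print("local: ", local_reply)
print("hosted:", hosted_reply)</code></pre><p>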
For my talk today I mostly used gpt-o1-mini, and ended with a simple example using Groq to run a distilled Deepseek R1 model.</p><p>Everything is in this notebook: <a href="https://colab.research.google.com/drive/1hIa7Jgxe75AHYt2cvGCdcHSCGw8hGvd-?usp=sharing">My talk Colab Notebook</a></p>]]></content:encoded></item><item><title><![CDATA[Incredible capability advances of LLMs that I can run on my Mac: Deepseek-R1]]></title><description><![CDATA[Deepseek uses Reinforcement Learning as an alternative to using human supervised fine tuning]]></description><link>https://marklwatson.substack.com/p/incredible-capability-advances-of</link><guid isPermaLink="false">https://marklwatson.substack.com/p/incredible-capability-advances-of</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Tue, 21 Jan 2025 00:39:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Wow! I have been very pleased with the qwen2.5 and qwen2.5-coder models that easily run on my M2Pro 32G Mac. For reasoning I have been going back to using OpenAI O1 and Claude Sonnet, but after my preliminary tests with Deepseek-R1, I feel like I can do most everything now on my personal computer.</p><p>I am using: <strong>ollama run deepseek-r1:32b</strong></p><h3>A few resources:</h3><p>Download the research paper &#8220;DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning&#8221; from <a href="https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf">https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf</a></p><p>I recently published my new book &#8220;<strong>Ollama in Action: Building Safe, Private AI with LLMs, Function Calling and Agents</strong>&#8221; that can be read free online at <a href="https://leanpub.com/ollama/read">https://leanpub.com/ollama/read</a></p><p>While I sometimes enjoy tight integration of LLM tooling with editors like Emacs, VSCode, and PyCharm, I also enjoy &#8216;pure&#8217; editing experiences and using a small script that you can run in any source directory, passing a prompt on the command line: <a href="https://gist.github.com/mark-watson/8845cea9a38e2655a4b45a91400349c9">https://gist.github.com/mark-watson/8845cea9a38e2655a4b45a91400349c9</a></p><p>Fascinating interview from last year with the Deepseek CEO talking about his motivation for pure research towards AGI and his company&#8217;s open source strategy: <a href="https://www.chinatalk.media/p/deepseek-ceo-interview-with-chinas">Deepseek: The Quiet Giant Leading China&#8217;s AI Race</a></p>]]></content:encoded></item><item><title><![CDATA[Wow, Google's NotebookLM rocks for creating podcasts from text]]></title><description><![CDATA[I experimented with NotebookLM using the PDF from my Clojure AI book]]></description><link>https://marklwatson.substack.com/p/wow-googles-notebooklm-rocks-for</link><guid isPermaLink="false">https://marklwatson.substack.com/p/wow-googles-notebooklm-rocks-for</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Wed, 18 Sep 2024 21:22:05 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/149075296/739a571e3a4ae5b05c81af358f2573b5.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Today I took the PDF for my book "Practical Artificial
Intelligence Programming With Clojure" (<a href="https://leanpub.com/clojureai/read">you can read it free online here</a>) and used it to create a notebook in Google's&nbsp;<a href="https://notebooklm.google.com">NotebookLM</a>&nbsp;and asked for a generated 8-minute podcast. This experimental app created a podcast with two people discussing my book accurately and showing wonderful knowledge of technology. If you want to listen to the audio track that Google's NotebookLM created, listen to the audio file attached to this blog article.</p>]]></content:encoded></item><item><title><![CDATA[New OpenAI gpt-o1-preview and gpt-o1-mini and one week experience with Replit.com AI Coding Agent]]></title><description><![CDATA[Too much fun...]]></description><link>https://marklwatson.substack.com/p/new-openai-gpt-o1-preview-and-gpt</link><guid isPermaLink="false">https://marklwatson.substack.com/p/new-openai-gpt-o1-preview-and-gpt</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Fri, 13 Sep 2024 17:27:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have only spent a short while experimenting with the new gpt-o1 models: so far very impressive for science, math, and instruction following. You need a ChatGPT Plus account to try it, or you can perform rate-limited queries for half the monthly cost using&nbsp;<a href="https://apps.abacus.ai">Abacus AI</a></p><p>The thing I am most impressed with (this week!) is the&nbsp;<a href="https://replit.com">Replit.com AI coding agent</a>; after briefly trying it I pre-paid for a one-year subscription. I quickly rewrote a complex Clojure web app in JavaScript, making it much less expensive to host&nbsp;<a href="https://cookingspace.com/">CookingSpace.com</a></p><p>I gave a live demo of Replit AI in my weekly AI demo and group chat. Please join:&nbsp;<a href="https://www.meetup.com/mark-watsons-informal-ai-presentations-and-group-chats">Mark Watson's informal AI presentations and group chats</a></p>]]></content:encoded></item><item><title><![CDATA[Code and notes from my talk today: Exploring the Future of AI: Introduction to using LLMs using Python]]></title><description><![CDATA[Topics: Large context prompts with LLMs vs. RAGs using embeddings vector stores. How to avoid LLM hallucination. 3 code demos.]]></description><link>https://marklwatson.substack.com/p/code-and-notes-from-my-talk-today</link><guid isPermaLink="false">https://marklwatson.substack.com/p/code-and-notes-from-my-talk-today</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Thu, 15 Aug 2024 20:36:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I started an informal code demo and group conversation <a href="https://www.meetup.com/mark-watsons-informal-ai-presentations-and-group-chats">Meetup group (link)</a> and today I gave a fifteen-minute code demo followed by a conversation with the attendees.
Here is a GitHub repo with the code examples: <a href="https://github.com/mark-watson/talk2_LLM_Python_intro">https://github.com/mark-watson/talk2_LLM_Python_intro</a></p><p>Here are my talk notes:<br><br><strong>Exploring the Future of AI: Introduction to using LLMs using Python</strong></p><p><strong>Riff on &#8216;AI grounding&#8217; and how LLMs help:</strong> LLMs, trained on vast amounts of text, excel at recognizing patterns and providing contextually relevant responses. They mimic grounded understanding by referencing large datasets that encompass a variety of real-world scenarios. For example, they can infer meanings from complex contexts by drawing on their training data. When LLMs are integrated with other modalities, such as vision or audio (e.g., vision-language models), the grounding improves. These models can associate text with images or sounds, making the connections more robust and closer to a human-like understanding of concepts.</p><p><strong>Tradeoffs between using large context LLMs</strong>, where a large body of text is added to a query prompt, and the alternative approach of breaking multiple documents into many separate chunks of text, calculating an embedding vector for each chunk, and then storing the chunks and their associated embedding vectors in a vector data store.</p><p><strong>Long-Context LLMs:</strong> designed to support processing large blocks of text, often an entire book, within a single prompt. These models can accommodate extended sequences of text, enabling them to consider more context at once. This is particularly useful for tasks that require maintaining continuity over long narratives or documents. However, long-context LLMs have limitations, such as performance degradation when the context becomes too long, which can lead to reduced accuracy in generating or retrieving relevant information. These models are also computationally expensive, as handling extensive sequences demands significant resources.</p><p>On the other hand, <strong>vector stores</strong> (or vector databases) work by converting text or other unstructured data into high-dimensional vectors using embeddings. These vectors are stored and can be retrieved based on their similarity to a query vector, allowing for efficient semantic search across vast datasets. This approach provides a form of &#8220;long-term memory&#8221; for LLMs, enabling them to access and retrieve relevant information from large collections of documents without needing to process the entire context at once. Vector stores are particularly useful in retrieval-augmented generation (RAG) systems, where they help the model to find and focus on the most relevant information, improving both efficiency and accuracy.</p><p>In essence, while long-context LLMs attempt to handle extensive information within the model&#8217;s processing window, vector stores offer an external memory solution that complements LLMs by efficiently managing and retrieving relevant information from larger datasets.</p><p><em><strong>&lt;Explain the code and run the three code demos now&gt;</strong></em></p><p><strong>What about LLM hallucinations?</strong></p><p>Long context windows and retrieval-augmented generation (RAG) data stores significantly reduce LLM hallucinations by improving the model's access to relevant and accurate information during the generation process.</p><p>1. Long Context Windows: When LLMs are equipped with long context windows, they can process and retain more information within a single session.
This allows the model to maintain continuity and consistency over extended text, reducing the chances of fabricating information that doesn't align with the given context or user query. By having access to more surrounding context, the model can generate more coherent and accurate responses that are anchored in the actual input data.</p><p>2. RAG Embedding Vector Data Stores: In a RAG setup, an LLM is paired with a vector store that holds a vast amount of pre-processed, structured information. When a query is posed, the model retrieves relevant documents or data snippets from this store, which then informs the generation process. This retrieval step grounds the model's output in factual data, effectively reducing the likelihood of hallucinations. Since the model can rely on precise and contextually relevant information, it is less prone to generating plausible-sounding but incorrect or nonsensical content.</p><p>Together, these approaches enhance the reliability of LLM outputs. Long context windows allow the model to consider more of the input in a single pass, while RAG ensures that the model has access to verified information, leading to fewer instances of hallucination and more trustworthy results.</p>]]></content:encoded></item><item><title><![CDATA[Tools I am using: Aider+Anthropic and Rye]]></title><description><![CDATA[I have radically changed my workflows in the last few months.]]></description><link>https://marklwatson.substack.com/p/tools-i-am-using-aideranthropic-and</link><guid isPermaLink="false">https://marklwatson.substack.com/p/tools-i-am-using-aideranthropic-and</guid><dc:creator><![CDATA[Mark Watson]]></dc:creator><pubDate>Wed, 10 Jul 2024 22:43:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KZuU!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fe482dc9f-5835-4907-9479-fc064854e367_4032x3024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Aider is AI pair programming in your terminal and Rye is my new driver for Python development.</p><p>I have a keen interest in the future of work and the automation of what I think of as &#8220;knowledge work.&#8221; Here I share my latest work setup for Python. Please add your own favorite tools in the comments. I am still experimenting with LLM-based tooling for Lisp languages.</p><p><strong>Aider</strong></p><p>Aider is a command-line tool for code refactoring and bug fixing, leveraging AI capabilities to improve code quality and development efficiency. It integrates with version control systems to understand the project&#8217;s history and provide contextual suggestions. Aider can analyze the codebase, identify potential issues, and recommend or automatically apply fixes, streamlining the development process and enhancing code maintainability. <a href="https://aider.chat">Aider documentation</a></p><p><strong>Rye</strong></p><p>Rye is a Python package manager and resolver designed to simplify dependency management and streamline the Python development workflow. 
It offers features like:</p><blockquote><p>&#8226; <strong>Dependency Resolution</strong>: Efficiently resolves and installs package dependencies, ensuring compatibility and minimizing conflicts.</p><p>&#8226; <strong>Environment Management</strong>: Supports the creation and management of isolated environments, similar to tools like virtualenv and pipenv.</p><p>&#8226; <strong>User-Friendly CLI</strong>: Provides a straightforward command-line interface for managing packages and environments.</p><p>&#8226; <strong>Project Configuration</strong>: Facilitates the definition and maintenance of project-specific configurations and dependencies.</p></blockquote><p>Rye is particularly useful for maintaining clean and reproducible development environments, making it easier to manage Python projects. <a href="https://rye.astral.sh/guide/">Rye documentation</a></p><p><strong>Workflow</strong></p><p>I have Emacs and VSCode configured to run Rye project targets and unit tests.</p><p>After you install Aider:</p><p><code>pip install aider-chat</code></p><p>You can just run <code>aider</code> on the command line. By default it looks for environment variables for commercial LLM hosts or a local Ollama server, for example:</p><pre><code>export ANTHROPIC_API_KEY=&lt;key&gt;
export OPENAI_API_KEY=&lt;key&gt;
export OLLAMA_API_BASE=http://127.0.0.1:11434</code></pre><p>It handles use cases like naming an existing function and asking for something similar with a new argument, new functionality, different functionality, etc. It shows git diffs of any changes it wants to make. The first time I used Aider, in ten minutes I added two new features to my project and generated test data and unit tests.</p><p>I find that both Rye and Aider make development very fast and pleasant.</p>]]></content:encoded></item></channel></rss>