Analog Optical Computer for Inference and Combinatorial Optimization
https://hackaday.com/2025/09/11/analog-optical-computer-for-inference-and-combinatorial-optimization/
Fri, 12 Sep 2025 02:00:06 +0000

Although computers are overwhelmingly digital today, there’s a good case to be made that analog computers are the more efficient approach for specific applications. The authors of a recent paper in Nature argue that inference – essential for LLMs – can be performed significantly more efficiently using an analog optical computer (AOC).

As the authors describe it, the AOC performs a fixed-point search using only optical and analog electronic components. The optics handle the matrix-vector multiplications, while the analog electronics handle the non-linear operations, subtraction, and annealing. The iteration runs in 20 ns cycles until the noise has been reduced to an acceptable level, given the analog nature of the computer. A big advantage here is that no analog-to-digital conversions are required, unlike in other hybrid systems.
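As a rough digital sketch of that loop – a toy model, not the paper’s actual system: the weight matrix, the tanh non-linearity, and the geometric annealing schedule below are all illustrative assumptions – the fixed-point search might look like:

```python
import numpy as np

def aoc_fixed_point(W, b, steps=200, anneal=0.9):
    """Toy digital model of the AOC loop: the optics would compute W @ x,
    the analog electronics the tanh, subtraction, and noise annealing."""
    x = np.random.randn(len(b))
    noise = 1.0
    for _ in range(steps):                 # each pass would be ~20 ns in hardware
        x = np.tanh(W @ x - b) + noise * np.random.randn(len(b))
        noise *= anneal                    # anneal the injected noise away
    return x

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)) * 0.1      # small weights keep the map a contraction
b = rng.standard_normal(8)
x_star = aoc_fixed_point(W, b)
# x_star now approximately satisfies x = tanh(W @ x - b)
```

Because the map is a contraction here, the iterate settles at a fixed point once the annealed noise has decayed; the point of the hardware is that each pass takes nanoseconds of light and analog electronics rather than a Python loop iteration.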

So far a small-scale AOC has been constructed for tasks like image classification and non-linear regression, with the authors claiming it is over a hundred times more efficient than current GPU-based vector processors.

Why LLaMa Is A Big Deal
https://hackaday.com/2023/03/22/why-llama-is-a-big-deal/
Wed, 22 Mar 2023 14:00:27 +0000

You might have heard about LLaMa or maybe you haven’t. Either way, what’s the big deal? It’s just some AI thing. In a nutshell, LLaMa is important because it allows you to run large language models (LLMs) like GPT-3 on commodity hardware. In many ways, this is a bit like Stable Diffusion, which similarly allowed normal folks to run image generation models on their own hardware with access to the underlying source code. We’ve discussed why Stable Diffusion matters and even talked about how it works.

LLaMa is a family of transformer language models from Facebook/Meta research, spanning 7 billion to 65 billion parameters and trained on publicly available datasets. Their research paper showed that the 13B version outperformed GPT-3 on most benchmarks, and LLaMa-65B is right up there with the best of them. LLaMa was unique in that inference could be run on a single GPU, thanks to some optimizations made to the transformer itself and the model being about 10x smaller. While Meta recommended that users have at least 10 GB of VRAM to run inference on the larger models, that’s a huge step down from the 80 GB A100 cards that often run these models.
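Those VRAM figures follow from simple arithmetic: parameter count times bytes per weight. A quick back-of-the-envelope helper (the function name is ours, and it counts weight storage only, ignoring activations and other overhead):

```python
def inference_memory_gb(n_params_billion, bytes_per_weight):
    """Weight storage only; activations and KV cache add more on top."""
    return n_params_billion * 1e9 * bytes_per_weight / 2**30

print(round(inference_memory_gb(65, 2), 1))   # 65B at fp16: 121.1 GB
print(round(inference_memory_gb(13, 2), 1))   # 13B at fp16: 24.2 GB
print(round(inference_memory_gb(7, 2), 1))    # 7B at fp16: 13.0 GB
print(round(inference_memory_gb(7, 0.5), 1))  # 7B at 4 bits: 3.3 GB
```

At half a byte per weight, the 7B model drops to roughly 3.3 GB, which is the arithmetic behind the “more than 4 GB of RAM” rule of thumb for running quantized LLaMa.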

While this was an important step forward for the research community, it became a huge one for the hacker community when [Georgi Gerganov] rolled in. He released llama.cpp on GitHub, which runs inference on a LLaMa model with 4-bit quantization. His code was focused on running LLaMa-7B on your MacBook, but we’ve seen versions running on smartphones and Raspberry Pis. There’s even a version written in Rust! A rough rule of thumb is that anything with more than 4 GB of RAM can run LLaMa. Model weights are available through Meta under some rather strict terms, but they’ve been leaked online and can even be found in a pull request on the GitHub repo itself.
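To see why 4-bit quantization shrinks things so dramatically, here is a minimal absmax quantizer in the same spirit; llama.cpp’s actual block formats (block sizes, scale layouts) differ, so treat this purely as a sketch:

```python
import numpy as np

def quantize_4bit(block):
    """Map float weights to 4-bit integers in [-8, 7] with one shared scale."""
    scale = np.abs(block).max() / 7          # absmax scaling
    q = np.clip(np.round(block / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

weights = np.random.randn(32).astype(np.float32)
q, scale = quantize_4bit(weights)
error = float(np.abs(weights - dequantize(q, scale)).max())
# reconstruction error is at most half a quantization step, scale / 2
```

Each weight now costs 4 bits plus a shared per-block scale instead of 16 or 32 bits – a 4-8x reduction, which is what lets a 7B-parameter model fit in a few gigabytes.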

Aside from occasionally funny and quirky projects, how does having a local GPT-3-like chatbot impact us? The simple fact is that it is accessible to hackers. Not only can you run it, but the code is available and the models are trained on publicly available data, so you could train your own (though the original training run took 21 days on 2048 A100 GPUs), and it’s useful enough to provide reasonable output. Stanford even released a version called Alpaca, which is LLaMa-7B fine-tuned for instruction following, elevating it from a simple chatbot to a bot able to follow instructions. There is even a guide on how to replicate Alpaca yourself for less than $100 of cloud computing.

Of course, like most current LLMs, LLaMa suffers from the same problems of hallucination, bias, and stereotypes. When asked to generate code, it can try to request endpoints that don’t exist. When asked what the capital of Tanzania is, it will reply Dar es Salaam instead of Dodoma. Researchers haven’t solved the problem of trying to secure a black box, as it is still far too easy to get the model to do something its creators tried hard to prevent.

While it is incredible to think that just a few weeks ago it would have been ridiculous to think you could run a GPT-3-level model on your personal laptop, this ultimately raises the question: what will we do with this? The easy answer is sophisticated spam. Long term, there are concerns that large language models could replace programmers and writers. For writing or tweaking small programs it is already quite good, as [Simon Willison] demonstrated by asking it to generate some AppleScript. However, that is still up for debate. Being able to spit out an accurate answer to a question does not a human make. What do you do with the raw, sort of bias-confused amorphous intelligence that is ChatGPT and other LLMs now running locally?

Rather than connecting to an API, the Raspberry Pi inside of this old typewriter can run it entirely locally, with no internet connection required. Because the model is smaller, it becomes much easier to fine-tune for your use case. By taking a bunch of dialog from a TV show (let’s say The Simpsons), you could fine-tune the model to respond like a character from the show. Looking further into the future, there is an excellent paper called ReAct that tries to put something like an internal dialog into ChatGPT by asking it to output questions, thoughts, and actions. A good example might be this:

Question: How much bigger is the land in Seattle, Washington versus the water?
Thought: I need to use Wikipedia to look up the square footage of the city area and the water
Action: search_wikipedia: Seattle, WA
Observation:
• City 142.07 sq mi (367.97 km2)
• Land 83.99 sq mi (217.54 km2)
• Water 58.08 sq mi (150.43 km2)
• Metro 8,186 sq mi (21,202 km2)
Thought: The land is 83.99 square miles and the water is 58.08 square miles, I should calculate a ratio.
Action: calculate: 83.99 / 58.08
Observation: 1.4461
Answer: The land is about 1.4x the size of the water, or 25.91 square miles bigger

You can see how this forms a loop where complex actions can be broken down and performed by simplified helpers, like searching Wikipedia, calling APIs, controlling smart appliances, or actuating motors. Google has been experimenting with the concept in its PaLM-SayCan system, which uses an LLM (PaLM) to break high-level instructions down into smaller tasks for a robot.
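The plumbing for such a loop is mostly text parsing and dispatch. A minimal sketch, in which the tool registry and the `calculate` helper are our own illustrative stand-ins (a real agent would add a Wikipedia search, API calls, and so on):

```python
import re

def calculate(expr):
    # Demo-only: eval is acceptable here because we control the input
    return round(eval(expr, {"__builtins__": {}}), 4)

TOOLS = {"calculate": calculate}

def react_step(model_output):
    """Find an 'Action: tool: argument' line, run the matching helper,
    and return an Observation line to append to the model's context."""
    m = re.search(r"Action:\s*(\w+):\s*(.+)", model_output)
    if m is None:
        return None                      # no action requested this turn
    tool, arg = m.group(1), m.group(2).strip()
    return f"Observation: {TOOLS[tool](arg)}"

print(react_step("Action: calculate: 83.99 / 58.08"))
# Observation: 1.4461
```

The outer loop then simply alternates between sampling model output and feeding back the resulting Observation until an Answer line appears.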

We can see LLaMa powering NPCs in video games, optimizing blog titles, and controlling robots. So understandably, we’re quite curious to see what you all do with it. One thing is for sure, though. Putting this in the hands of creative hackers is going to be fun.

Spatial AI and CV Hack Chat
https://hackaday.com/2021/11/29/opencv-hack-chat/
Mon, 29 Nov 2021 20:01:47 +0000

Join us on Wednesday, December 1 at noon Pacific for the Spatial AI and CV Hack Chat with Erik Kokalj!

A lot of what we take for granted these days existed only in the realm of science fiction not all that long ago. And perhaps nowhere is this more true than in the field of machine vision. The little bounding box that pops up around everyone’s face when you go to take a picture with your cell phone is a perfect example; it seems so trivial now, but just think about what’s involved in putting that little yellow box on the screen, and how it would not have been plausible just 20 years ago.


Perhaps even more exciting than the development of computer vision systems is their accessibility to anyone, as well as their move into the third dimension. No longer confined to flat images, spatial AI and CV systems seek to extract information from the position of objects relative to others in the scene. It’s a huge leap forward in making machines see like we see and make decisions based on that information.

To help us along the road to incorporating spatial AI into our projects, Erik Kokalj will stop by the Hack Chat. Erik does technical documentation and support at Luxonis, a company working on the edge of spatial AI and computer vision. Join us as we explore the depths of spatial AI.

Our Hack Chats are live community events in the Hackaday.io Hack Chat group messaging. This week we’ll be sitting down on Wednesday, December 1st at 12:00 PM Pacific time. If time zones have you tied up, we have a handy time zone converter.
