IdeenTech Global https://ideentech.com
Understanding Short-Term Memory in LangGraph: A Hands-On Guide https://ideentech.com/understanding-short-term-memory-in-langgraph-a-hands-on-guide/ Thu, 13 Mar 2025

If you’ve ever built a chatbot, you’ve probably noticed how quickly things fall apart when it forgets what you just said. Short-term memory is the key to keeping conversations flowing naturally, and LangGraph makes it straightforward to implement. In this post, we’ll explore how LangGraph handles short-term memory using thread-scoped checkpoints, and I’ll walk you through practical examples to show it in action.

By the end, you’ll see how to build a simple chatbot, add memory to it, retrieve conversation states, and even manage long histories with trimming and summarization techniques. Let’s dive in!

What Is Short-Term Memory in LangGraph?

LangGraph manages short-term memory as part of an agent’s state, persisting it through thread-scoped checkpoints. This state typically includes the conversation history — human inputs and AI responses — along with any other relevant data, like uploaded files or generated outputs. By storing this state in the graph, the agent can maintain full context within a single conversation thread, while keeping different threads isolated from each other.

Let’s start with a basic example and build from there.

A Simple Chatbot Without Memory

First, let’s create a chatbot that doesn’t remember anything. This will help us see the problem memory solves.

				
from langchain_groq import ChatGroq
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain_core.messages import AnyMessage
from typing import Annotated, List
from typing_extensions import TypedDict

class State(TypedDict):
    messages: Annotated[List[AnyMessage], add_messages]

llm = ChatGroq(model="llama-3.3-70b-versatile", api_key="YOUR_GROQ_API_KEY")  # use your own key; never commit real keys
prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", "{system_message}"),
        MessagesPlaceholder("messages")
    ]
)
llm_model = prompt_template | llm

graph_builder = StateGraph(State)

def ChatNode(state: State) -> State:
    system_message = "You are an assistant"
    state["messages"] = llm_model.invoke({"system_message": system_message, "messages": state["messages"]})
    return state

graph_builder.add_node("chatnode", ChatNode)
graph_builder.add_edge(START, "chatnode")
graph_builder.add_edge("chatnode", END)
graph = graph_builder.compile()

# First input
input_state = {"messages": ["My name is sajith"]}
response_state = graph.invoke(input_state)
for message in response_state["messages"]:
    message.pretty_print()

# Second input
input_state = {"messages": ["Who am i?"]}
response_state = graph.invoke(input_state)
for message in response_state["messages"]:
    message.pretty_print()
				
			

Output:

				
					================================ Human Message =================================
My name is Sajith
================================== Ai Message ==================================
Hello Sajith, it's nice to meet you. Is there something I can help you with or would you like to chat?
				
			
				
					================================ Human Message =================================
Who am I?
================================== Ai Message ==================================
Unfortunately, I don't have any information about you, so I'm not sure who you are...
				
			

Notice how the bot forgets my name between the two inputs. Each invocation starts fresh, with no memory of prior messages. That’s fine for one-off responses, but terrible for a conversation. Let’s fix that.

Adding Short-Term Memory with Checkpoints

To give our chatbot memory, we’ll use LangGraph’s MemorySaver checkpointer and a thread_id to persist state across invocations. Here’s the updated code:

				
					from langgraph.checkpoint.memory import MemorySaver

# Same imports and setup as before...

graph_builder = StateGraph(State)

def ChatNode(state: State) -> State:
    system_message = "You are an assistant"
    state["messages"] = llm_model.invoke({"system_message": system_message, "messages": state["messages"]})
    return state

graph_builder.add_node("chatnode", ChatNode)
graph_builder.add_edge(START, "chatnode")
graph_builder.add_edge("chatnode", END)
graph = graph_builder.compile(checkpointer=MemorySaver())

config = {"configurable": {"thread_id": 1}}

# First input
input_state = {"messages": ["My name is sajith"]}
response_state = graph.invoke(input_state, config=config)
for message in response_state["messages"]:
    message.pretty_print()

# Second input
input_state = {"messages": ["Who am i?"]}
response_state = graph.invoke(input_state, config=config)
for message in response_state["messages"]:
    message.pretty_print()
				
			

Output:

				
					================================ Human Message =================================
My name is Sajith
================================== Ai Message ==================================
Hello Sajith, it's nice to meet you. Is there something I can help you with or would you like to chat?
				
			
				
					================================ Human Message =================================
My name is Sajith
================================== Ai Message ==================================
Hello Sajith, it's nice to meet you. Is there something I can help you with or would you like to chat?
================================ Human Message =================================
Who am I?
================================== Ai Message ==================================
You are Sajith, the person I'm having a conversation with. If you'd like to share more about yourself, I'm all ears! What do you do, or what are your interests, Sajith?
				
			

Now the bot remembers my name! The MemorySaver persists the state, and the thread_id ensures all interactions stay within the same conversation thread. Each new input appends to the existing message history, giving the LLM full context.
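To see why the thread_id matters, here is a plain-Python sketch of what a checkpointer conceptually does: it keys saved state by thread, so separate threads never share history. This is an illustration only, not LangGraph's actual implementation, and all names here are made up.

```python
from collections import defaultdict

class TinyThreadStore:
    """Toy stand-in for a checkpointer: one message history per thread_id."""
    def __init__(self):
        self.store = defaultdict(list)

    def invoke(self, thread_id, text):
        history = self.store[thread_id]
        history.append(("human", text))
        # A real graph would call the LLM with the full history here
        reply = f"seen {len(history)} message(s) on this thread"
        history.append(("ai", reply))
        return reply

chat = TinyThreadStore()
chat.invoke("thread-1", "My name is sajith")
first = chat.invoke("thread-1", "Who am i?")   # same thread: has prior context
other = chat.invoke("thread-2", "Who am i?")   # new thread: starts fresh
print(first)  # seen 3 message(s) on this thread
print(other)  # seen 1 message(s) on this thread
```

Because each thread_id maps to its own history, "thread-2" has no idea what happened on "thread-1", which is exactly the isolation LangGraph's checkpointer provides.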

Peeking Into the State

LangGraph lets you inspect the conversation state at any point using the thread_id. Here’s how to get the current state snapshot:

				
					config = {"configurable": {"thread_id": 1}}
state = graph.get_state(config=config)
print(state)
				
			

Or fetch a specific state using a checkpoint_id:

				
config = {"configurable": {"thread_id": 1, "checkpoint_id": "1eff1b93-8255-698c-8004-bc52982a4a1d"}}
state = graph.get_state(config)
print(state)
				
			

Output:

				
					StateSnapshot(
    values={'messages': 
    [HumanMessage(content='My name is sajith', additional_kwargs={}, response_metadata={}, id='a504dce7-b478-4c1c-a6cd-426d0447b6c1'), 
    AIMessage(content="Hello Sajith, it's nice to meet you. Is there something I can help you with or would you like to chat?", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 28, 'prompt_tokens': 45, 'total_tokens': 73, 'completion_time': 0.101818182, 'prompt_time': 0.0047757, 'queue_time': 0.233790742, 'total_time': 0.106593882}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_2ca0059abb', 'finish_reason': 'stop', 'logprobs': None}, id='run-4ee5966e-8d52-43b7-893c-05f5162181ca-0', usage_metadata={'input_tokens': 45, 'output_tokens': 28, 'total_tokens': 73}), 
    HumanMessage(content='Who am i?', additional_kwargs={}, response_metadata={}, id='22804775-6822-420a-bf33-e2a1b5a8e80d'), 
    AIMessage(content="You are Sajith, the person I'm having a conversation with. If you'd like to share more about yourself, I'm all ears! What do you do, or what are your interests, Sajith?", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 46, 'prompt_tokens': 86, 'total_tokens': 132, 'completion_time': 0.167272727, 'prompt_time': 0.00620371, 'queue_time': 0.233534108, 'total_time': 0.173476437}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_2ca0059abb', 'finish_reason': 'stop', 'logprobs': None}, id='run-619f4f89-25d0-4641-b86d-3f2b26269cbf-0', usage_metadata={'input_tokens': 86, 'output_tokens': 46, 'total_tokens': 132})
    ]}, 
    next=(), 
    config={'configurable': {'thread_id': 1, 'checkpoint_ns': '', 'checkpoint_id': '1eff1b93-8255-698c-8004-bc52982a4a1d'}},
    metadata={'source': 'loop', 'writes': {'chatnode': {'messages': AIMessage(content="You are Sajith, the person I'm having a conversation with. If you'd like to share more about yourself, I'm all ears! What do you do, or what are your interests, Sajith?", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 46, 'prompt_tokens': 86, 'total_tokens': 132, 'completion_time': 0.167272727, 'prompt_time': 0.00620371, 'queue_time': 0.233534108, 'total_time': 0.173476437}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_2ca0059abb', 'finish_reason': 'stop', 'logprobs': None}, id='run-619f4f89-25d0-4641-b86d-3f2b26269cbf-0', usage_metadata={'input_tokens': 86, 'output_tokens': 46, 'total_tokens': 132})}}, 'thread_id': 1, 'step': 4, 'parents': {}}, 
    created_at='2025-02-23T07:38:48.499019+00:00', 
    parent_config={'configurable': {'thread_id': 1, 'checkpoint_ns': '', 'checkpoint_id': '1eff1b93-7a2a-6554-8003-2b5dfd35edf7'}}, 
    tasks=())
				
			

You can also retrieve the entire state history for a thread_id. This returns every snapshot recorded so far (six at this point), each with its own checkpoint_id:

				
					config = {"configurable": {"thread_id": 1}}
state_history = graph.get_state_history(config)
for state in state_history:
    print(state)
				
			

Output:

				
					StateSnapshot(
    values={'messages': 
    [HumanMessage(content='My name is sajith', additional_kwargs={}, response_metadata={}, id='a504dce7-b478-4c1c-a6cd-426d0447b6c1'), 
    AIMessage(content="Hello Sajith, it's nice to meet you. Is there something I can help you with or would you like to chat?", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 28, 'prompt_tokens': 45, 'total_tokens': 73, 'completion_time': 0.101818182, 'prompt_time': 0.0047757, 'queue_time': 0.233790742, 'total_time': 0.106593882}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_2ca0059abb', 'finish_reason': 'stop', 'logprobs': None}, id='run-4ee5966e-8d52-43b7-893c-05f5162181ca-0', usage_metadata={'input_tokens': 45, 'output_tokens': 28, 'total_tokens': 73}), 
    HumanMessage(content='Who am i?', additional_kwargs={}, response_metadata={}, id='22804775-6822-420a-bf33-e2a1b5a8e80d'), 
    AIMessage(content="You are Sajith, the person I'm having a conversation with. If you'd like to share more about yourself, I'm all ears! What do you do, or what are your interests, Sajith?", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 46, 'prompt_tokens': 86, 'total_tokens': 132, 'completion_time': 0.167272727, 'prompt_time': 0.00620371, 'queue_time': 0.233534108, 'total_time': 0.173476437}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_2ca0059abb', 'finish_reason': 'stop', 'logprobs': None}, id='run-619f4f89-25d0-4641-b86d-3f2b26269cbf-0', usage_metadata={'input_tokens': 86, 'output_tokens': 46, 'total_tokens': 132})
    ]}, 
    next=(), 
    config={'configurable': {'thread_id': 1, 'checkpoint_ns': '', 'checkpoint_id': '1eff1b93-8255-698c-8004-bc52982a4a1d'}},
    metadata={'source': 'loop', 'writes': {'chatnode': {'messages': AIMessage(content="You are Sajith, the person I'm having a conversation with. If you'd like to share more about yourself, I'm all ears! What do you do, or what are your interests, Sajith?", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 46, 'prompt_tokens': 86, 'total_tokens': 132, 'completion_time': 0.167272727, 'prompt_time': 0.00620371, 'queue_time': 0.233534108, 'total_time': 0.173476437}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_2ca0059abb', 'finish_reason': 'stop', 'logprobs': None}, id='run-619f4f89-25d0-4641-b86d-3f2b26269cbf-0', usage_metadata={'input_tokens': 86, 'output_tokens': 46, 'total_tokens': 132})}}, 'thread_id': 1, 'step': 4, 'parents': {}}, 
    created_at='2025-02-23T07:38:48.499019+00:00', 
    parent_config={'configurable': {'thread_id': 1, 'checkpoint_ns': '', 'checkpoint_id': '1eff1b93-7a2a-6554-8003-2b5dfd35edf7'}}, 
    tasks=())
StateSnapshot(
    values={'messages': 
    [HumanMessage(content='My name is sajith', additional_kwargs={}, response_metadata={}, id='fb2530a6-23a9-49b9-aa05-05ee3d6033b5'), 
    AIMessage(content="Hello Sajith, it's nice to meet you. Is there something I can help you with or would you like to chat?", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 28, 'prompt_tokens': 45, 'total_tokens': 73, 'completion_time': 0.101818182, 'prompt_time': 0.005301916, 'queue_time': 0.23472177200000002, 'total_time': 0.107120098}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_0a4b7a8df3', 'finish_reason': 'stop', 'logprobs': None}, id='run-9bd34581-f421-4ba4-8e4f-4a8ebe7dc0b2-0', usage_metadata={'input_tokens': 45, 'output_tokens': 28, 'total_tokens': 73}), 
    HumanMessage(content='Who am i?', additional_kwargs={}, response_metadata={}, id='609a20d0-f3b6-45b5-8e41-ae3c0d5ef87f')
    ]}, 
    next=('chatnode',), 
    config={'configurable': {'thread_id': 1, 'checkpoint_ns': '', 'checkpoint_id': '1eff3ff0-f18a-60bb-8003-0206ca59dd82'}},
    metadata={'source': 'loop', 'writes': None, 'thread_id': 1, 'step': 3, 'parents': {}}, 
    created_at='2025-02-26T05:03:46.725660+00:00', 
    parent_config={'configurable': {'thread_id': 1, 'checkpoint_ns': '', 'checkpoint_id': '1eff3ff0-f17a-6dec-8002-a34664ba3775'}}, 
    tasks=(PregelTask(id='c674df80-862b-a0af-9be3-0e7b687fab82', name='chatnode', path=('__pregel_pull', 'chatnode'), error=None, interrupts=(), state=None, result={'messages': AIMessage(content="You are Sajith, the person I'm currently chatting with. If you'd like to share more about yourself, I'm all ears! What do you do, or what are your interests?", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 41, 'prompt_tokens': 86, 'total_tokens': 127, 'completion_time': 0.149090909, 'prompt_time': 0.006059721, 'queue_time': 0.232343247, 'total_time': 0.15515063}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_5f849c5a0b', 'finish_reason': 'stop', 'logprobs': None}, id='run-1c6fa3f7-cc6a-4481-83de-98423bdbdb49-0', usage_metadata={'input_tokens': 86, 'output_tokens': 41, 'total_tokens': 127})}),))

....

StateSnapshot(
    values={'messages': 
    [HumanMessage(content='My name is sajith', additional_kwargs={}, response_metadata={}, id='fb2530a6-23a9-49b9-aa05-05ee3d6033b5')
    ]}, 
    next=('chatnode',), 
    config={'configurable': {'thread_id': 1, 'checkpoint_ns': '', 'checkpoint_id': '1eff3ff0-e021-6e1b-8000-f334729d7d33'}},
    metadata={'source': 'loop', 'writes': None, 'thread_id': 1, 'step': 0, 'parents': {}}, 
    created_at='2025-02-26T05:03:44.900445+00:00', 
    parent_config={'configurable': {'thread_id': 1, 'checkpoint_ns': '', 'checkpoint_id': '1eff3ff0-e01d-6340-bfff-9f60f86089c9'}}, 
    tasks=(PregelTask(id='7643d589-07fc-af50-26da-056cf318eca4', name='chatnode', path=('__pregel_pull', 'chatnode'), error=None, interrupts=(), state=None, result={'messages': AIMessage(content="Hello Sajith, it's nice to meet you. Is there something I can help you with or would you like to chat?", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 28, 'prompt_tokens': 45, 'total_tokens': 73, 'completion_time': 0.101818182, 'prompt_time': 0.005301916, 'queue_time': 0.23472177200000002, 'total_time': 0.107120098}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_0a4b7a8df3', 'finish_reason': 'stop', 'logprobs': None}, id='run-9bd34581-f421-4ba4-8e4f-4a8ebe7dc0b2-0', usage_metadata={'input_tokens': 45, 'output_tokens': 28, 'total_tokens': 73})}),))

StateSnapshot(
    values={'messages': 
    []}, 
    next=('__start__',), 
    config={'configurable': {'thread_id': 1, 'checkpoint_ns': '', 'checkpoint_id': '1eff3ff0-e01d-6340-bfff-9f60f86089c9'}},
    metadata={'source': 'input', 'writes': {'__start__': {'messages': ['My name is sajith']}}, 'thread_id': 1, 'step': -1, 'parents': {}}, 
    created_at='2025-02-26T05:03:44.898532+00:00', 
    parent_config=None, 
    tasks=(PregelTask(id='09ab64cc-a528-2dab-e307-f2fdba458043', name='__start__', path=('__pregel_pull', '__start__'), error=None, interrupts=(), state=None, result={'messages': ['My name is sajith']}),))
				
			

Updating the State

But what if you want to tweak the state — say, to add or correct information? LangGraph’s update_state method lets you modify the state directly. Here’s an example where we update the state to include a new piece of info:

				
					from langchain_core.messages import HumanMessage

# After initial invokes
config = {"configurable": {"thread_id": 1}}
graph.update_state(config, {"messages": [HumanMessage(content="I am a software engineer")]})

# Test the updated state
input_state = {"messages": ["What do I do?"]}
response_state = graph.invoke(input_state, config=config)
for message in response_state["messages"]:
    message.pretty_print()
				
			

Output:

				
					================================ Human Message =================================
My name is Sajith
================================== Ai Message ==================================
Hello Sajith, it's nice to meet you. Is there something I can help you with or would you like to chat?
================================ Human Message =================================
Who am I?
================================== Ai Message ==================================
You are Sajith, the person I'm having a conversation with. If you'd like to share more about yourself, I'm all ears! What do you do, or what are your interests, Sajith?
================================ Human Message =================================
I am a software engineer
================================ Human Message =================================
What do I do?
================================== Ai Message ==================================
You’re a software engineer, Sajith! That’s what you told me. Do you work on anything specific, like web development or AI?
				
			

With graph.update_state, we injected “I am a software engineer” into the conversation history without running a full node. The next invocation picks up this updated state, and the bot responds with the new context in mind.

Modifying State at a Checkpoint

You can also update the state at a specific checkpoint using its checkpoint_id and continue execution from there. Let’s walk through an example that replaces the message “I am a software engineer” with “I live in Kerala”.

				
config = {"configurable": {"thread_id": 1}}

# Collect all checkpoints (newest first)
all_checkpoints = list(graph.get_state_history(config))

# Find the snapshot whose last message is the one we want to replace;
# the next (older) snapshot is the checkpoint to restore to
selected_index = 0
for index, state in enumerate(all_checkpoints, start=1):
    if state.values["messages"] and state.values["messages"][-1].content == "I am a software engineer":
        selected_index = index

# Update the message at the selected checkpoint
old_config = all_checkpoints[selected_index].config
graph.update_state(old_config, {"messages": [HumanMessage(content="I live in kerala")]})

# Test the updated state
input_state = {"messages": ["Where do i live?"]}
response_state = graph.invoke(input_state, config=config)
for message in response_state["messages"]:
    message.pretty_print()
				
			

Output:

				
					================================ Human Message =================================
My name is Sajith
================================== Ai Message ==================================
Hello Sajith, it's nice to meet you. Is there something I can help you with or would you like to chat?
================================ Human Message =================================
Who am I?
================================== Ai Message ==================================
You are Sajith, the person I'm having a conversation with. If you'd like to share more about yourself, I'm all ears! What do you do, or what are your interests, Sajith?
================================ Human Message =================================
I live in kerala
================================ Human Message =================================
Where do i live?
================================== Ai Message ==================================
You live in Kerala, a beautiful state in India known for its stunning natural landscapes, rich culture, and delicious cuisine. Is that correct, Sajith?
				
			

We replaced “I am a software engineer” with “I live in kerala” by calling graph.update_state with the config of the earlier checkpoint, which carries its checkpoint_id.
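Under the hood, updating at an older checkpoint does not overwrite history: checkpoints form a chain of parent pointers, and an update forks a new branch from that point. A stdlib-only sketch of the idea (all names here are illustrative, not LangGraph APIs):

```python
import uuid

checkpoints = {}  # checkpoint_id -> {"messages": [...], "parent": parent_id}

def save_checkpoint(messages, parent=None):
    cid = str(uuid.uuid4())
    checkpoints[cid] = {"messages": list(messages), "parent": parent}
    return cid

c1 = save_checkpoint(["My name is sajith"])
c2 = save_checkpoint(["My name is sajith", "I am a software engineer"], parent=c1)

# "update_state" at c1 creates a sibling branch instead of editing c2
c3 = save_checkpoint(["My name is sajith", "I live in kerala"], parent=c1)

assert checkpoints[c2]["parent"] == c1 and checkpoints[c3]["parent"] == c1
assert "I am a software engineer" in checkpoints[c2]["messages"]  # old branch survives
```

This is why get_state_history can still show the original snapshots after an update: the old branch is never destroyed, only superseded on the new branch.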

Managing Long Conversations

As conversations grow, the message history can balloon beyond an LLM’s context window, causing errors. LangGraph offers two main strategies to handle this: trimming messages and summarizing conversations.

Trimming Messages

Let’s trim the message list to keep only the most recent entries. You can write a custom reducer that filters messages based on state, or, since we’re using add_messages as the reducer, return RemoveMessage objects to delete specific messages from the state.
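A rough stdlib model of how the add_messages reducer handles removals may help: ordinary messages are appended (or replace an existing message with the same id), while a RemoveMessage-style entry deletes the message with that id. Dicts stand in for message objects here; this is not the real implementation.

```python
def add_messages_sketch(existing, updates):
    # Keep insertion order; ids identify messages
    by_id = {m["id"]: m for m in existing}
    for m in updates:
        if m.get("remove"):           # stands in for a RemoveMessage
            by_id.pop(m["id"], None)  # delete by id
        else:
            by_id[m["id"]] = m        # append or replace by id
    return list(by_id.values())

state = [{"id": "a", "content": "My name is sajith"},
         {"id": "b", "content": "Who am i?"}]
state = add_messages_sketch(state, [{"id": "a", "remove": True}])
print([m["id"] for m in state])  # ['b']
```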

				
from langchain_core.messages import RemoveMessage

# Same imports and setup as before...

def filter_node(state: State) -> State:
    # Emit RemoveMessage for everything except the last 2 messages;
    # the add_messages reducer deletes them by id
    state["messages"] = [RemoveMessage(id=m.id) for m in state["messages"][:-2]]
    return state

graph_builder = StateGraph(State)
graph_builder.add_node("filternode", filter_node)
graph_builder.add_node("chatnode", ChatNode)
graph_builder.add_edge(START, "filternode")
graph_builder.add_edge("filternode", "chatnode")
graph_builder.add_edge("chatnode", END)
graph = graph_builder.compile(checkpointer=MemorySaver())
				
			

Now, older messages are pruned, keeping the context manageable. Alternatively, you can use trim_messages from LangChain to trim based on token count:

				
					from langchain_core.messages import trim_messages

# Same imports and setup as before...

graph_builder = StateGraph(State)

def chat_node(state: State) -> State:
    system_message = "You are an assistant"
    trimmed_messages = trim_messages(
        state["messages"],
        strategy="last",
        token_counter=llm,
        max_tokens=100,
        start_on="human",
        end_on=("human", "tool"),
        include_system=True
    )
    state["messages"] = llm_model.invoke({"system_message": system_message, "messages": trimmed_messages})
    return state

graph_builder.add_node("chatnode", chat_node)
graph_builder.add_edge(START, "chatnode")
graph_builder.add_edge("chatnode", END)
graph = graph_builder.compile(checkpointer=MemorySaver())
				
			

This ensures the context stays within a token limit, preserving the most relevant parts.
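The idea behind strategy="last" can be shown in a few lines of plain Python: walk backwards from the newest message and keep as many as fit the budget. This is a simplified illustration, not LangChain's implementation, and a crude word count stands in for a real token counter.

```python
def trim_last(messages, max_tokens, count=lambda m: len(m.split())):
    kept, total = [], 0
    for msg in reversed(messages):          # newest first
        cost = count(msg)
        if total + cost > max_tokens:
            break                           # budget exhausted: drop the rest
        kept.append(msg)
        total += cost
    return list(reversed(kept))             # restore chronological order

history = [
    "hello there",
    "hi how are you doing today",
    "tell me a long story please",
    "ok",
]
print(trim_last(history, max_tokens=8))  # ['tell me a long story please', 'ok']
```

The real trim_messages adds refinements on top of this, such as starting on a human message and optionally preserving the system message, so the trimmed window is always a valid conversation for the model.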

Summarizing Conversations

For a smarter approach, you can summarize the conversation and use that summary as context. Here’s an example:

				
from langchain_groq import ChatGroq
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain_core.messages import AnyMessage, RemoveMessage, HumanMessage
from langgraph.checkpoint.memory import MemorySaver
from typing import Annotated, List, Literal
from typing_extensions import TypedDict

class State(TypedDict):
    messages: Annotated[List[AnyMessage], add_messages]
    summary: str

llm = ChatGroq(model="llama-3.3-70b-versatile", api_key="YOUR_GROQ_API_KEY")  # use your own key; never commit real keys
prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", "{system_message}"),
        MessagesPlaceholder("messages")
    ]
)
llm_model = prompt_template | llm

graph_builder = StateGraph(State)

def chat_node(state: State) -> State:
    system_message = "You are an assistant"
    summary = state.get("summary", "")
    if summary:
        system_message += f" Summary of the conversation earlier: {summary}"
    state["messages"] = llm_model.invoke({"system_message": system_message, "messages": state["messages"]})
    return state

def summarize_conversation(state: State):
    system_message = "You are a chat summarizer"
    summary = state.get("summary", "")
    if summary:
        summary_message = (
            f"This is the summary of the conversation to date: {summary}\n\n"
            "Extend the summary by taking into account the new messages above:"
        )
    else:
        summary_message = "Create a summary of the conversation above:"
    response = llm_model.invoke({"system_message": system_message, "messages": state["messages"] + [HumanMessage(content=summary_message)]})
    # Delete all messages except the last 2 so the history stays small
    delete_messages = [RemoveMessage(id=m.id) for m in state["messages"][:-2]]
    return {"summary": response.content, "messages": delete_messages}

def should_continue(state: State) -> Literal["summarize", END]:
    """Return the next node to execute."""
    messages = state["messages"]
    # If there are more than six messages, then we summarize the conversation
    if len(messages) > 6:
        return "summarize"
    return END

graph_builder.add_node("chatnode", chat_node)
graph_builder.add_node("summarize", summarize_conversation)
graph_builder.add_edge(START, "chatnode")
graph_builder.add_conditional_edges("chatnode", should_continue)
graph_builder.add_edge("summarize", END)
graph = graph_builder.compile(checkpointer=MemorySaver())

config = {"configurable": {"thread_id": 1}}

input_state = {"messages": ["My name is sajith"]}
response_state = graph.invoke(input_state, config=config)
for message in response_state["messages"]:
    message.pretty_print()
				
			

In this setup, the chatbot summarizes the conversation once it exceeds six messages, storing the summary and removing older messages. The summary is then injected into the system prompt, keeping the agent informed without overwhelming the context window.
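The summarize-then-prune control flow is easy to see in isolation. Here is a stdlib sketch with strings in place of messages and plain concatenation in place of the LLM call; the function name and parameters are illustrative, not LangGraph APIs.

```python
def maybe_summarize(messages, summary="", threshold=6, keep=2):
    if len(messages) <= threshold:
        return messages, summary            # below the threshold: nothing to do
    older, recent = messages[:-keep], messages[-keep:]
    # A real implementation would ask the LLM to extend the summary here
    folded = " | ".join(older)
    summary = f"{summary} | {folded}" if summary else folded
    return recent, summary

msgs = [f"msg-{i}" for i in range(8)]
recent, summary = maybe_summarize(msgs)
print(recent)    # only the last two messages survive
print(summary)   # everything older is folded into the summary
```

Passing the returned summary back in on the next call extends it incrementally, which mirrors how the graph above threads the summary field through the state.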

Example Interaction:

				
					Input: "My name is Sajith"
Output: "Hello Sajith, it's nice to meet you..."
Input: "I am a software engineer"
Output: "That’s great, Sajith! What type of projects do you work on?"
...
[After several messages]
Summary: "Sajith is a software engineer living in Kerala, working at Ideenkreise Tech..."
				
			

Wrapping Up

LangGraph’s approach to short-term memory is both elegant and practical. With thread-scoped checkpoints you can maintain conversation context effortlessly, and with trimming or summarization you can keep even lengthy chats within the model’s context window.

Deploying LangGraph with FastAPI: A Step-by-Step Tutorial https://ideentech.com/deploying-langgraph-with-fastapi-a-step-by-step-tutorial/ Wed, 12 Mar 2025

In this tutorial, we’ll build a simple chatbot using FastAPI and LangGraph. We’ll use LangChain’s integration with Groq to power our language model, and we’ll manage our conversation context with a helper function that trims messages to fit within token limits.

Overview

Our project consists of several key components:

  1. Dependencies (requirements.txt): Lists all required packages.
  2. Environment Variables (.env): Contains the GROQ_API_KEY.
  3. LLM Configuration (llm.py): Sets up our language model using Groq and a prompt template.
  4. Context Management (context_manager.py): Handles token counting and trimming of conversation messages.
  5. LangGraph Conversation Graph (graph.py): Creates a state graph that processes incoming messages.
  6. FastAPI Application (main.py): Defines the API endpoint to handle chat requests.

Let’s dive into each piece.
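Putting the pieces together, the project layout looks like this (the directory name is arbitrary; the filenames come from the list above):

```
langgraph-fastapi-chatbot/
├── requirements.txt
├── .env
├── llm.py
├── context_manager.py
├── graph.py
└── main.py
```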

1. Setting Up the Environment

Create a new project directory and add a requirements.txt file with the following dependencies:

				
					fastapi
uvicorn
python-dotenv
langgraph
langchain
langchain_groq
tiktoken
				
			

Install them using pip:

				
					pip install -r requirements.txt
				
			

Add a .env file with your GROQ_API_KEY:

				
					GROQ_API_KEY="paste your groq api key"
				
			

2. Configuring the Language Model

In llm.py, we integrate with LangChain and Groq to configure our language model. We load environment variables (like the GROQ_API_KEY), set up our ChatGroq instance, and prepare a prompt template that includes a system instruction and user messages.

				
					from langchain_groq import ChatGroq
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

import os
from dotenv import load_dotenv
load_dotenv()

os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY")
llm = ChatGroq(model="llama-3.3-70b-versatile")

prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", "{system_message}"),
        MessagesPlaceholder("messages")
    ]
)

llm_model = prompt_template | llm
				
			

3. Managing Conversation Context

To ensure our language model receives a prompt within its token limits, we define helper functions in context_manager.py. This file includes a function to count tokens in our messages and another to trim older messages while preserving the most recent context.

				
from tiktoken import encoding_for_model
from langchain_core.messages import BaseMessage

def count_tokens(messages: list[BaseMessage]) -> int:
    encoding = encoding_for_model("gpt-3.5-turbo")  # Use as approximation for Llama
    num_tokens = 0
    for message in messages:
        num_tokens += len(encoding.encode(message.content))
        num_tokens += 4  # Approximate overhead per message
    return num_tokens

def trim_messages(messages: list[BaseMessage], max_tokens: int = 4000) -> list[BaseMessage]:
    if not messages:
        return messages

    # Always keep the system message if it exists
    system_message = None
    chat_messages = messages.copy()

    if messages[0].type == "system":
        system_message = chat_messages.pop(0)

    current_tokens = count_tokens(chat_messages)

    while current_tokens > max_tokens and len(chat_messages) > 1:
        chat_messages.pop(0)
        current_tokens = count_tokens(chat_messages)

    if system_message:
        chat_messages.insert(0, system_message)

    # Debug: print which messages survived trimming
    for message in chat_messages:
        print(message)

    return chat_messages
				
			
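To see the trimming policy in isolation (drop the oldest chat messages until the budget fits, while always keeping the system message), here is a dependency-free sketch of the same logic. The word-count "tokenizer" below is a stand-in assumption for illustration, not the real tiktoken encoding:

```python
# Framework-free sketch of the trimming policy above.
# count_tokens_approx is a stand-in for the tiktoken-based counter.
def count_tokens_approx(messages: list[dict]) -> int:
    # Rough proxy: one "token" per word, plus per-message overhead.
    return sum(len(m["content"].split()) + 4 for m in messages)

def trim_messages_sketch(messages: list[dict], max_tokens: int = 20) -> list[dict]:
    if not messages:
        return messages
    chat = messages.copy()
    # Always set aside the system message so it is never dropped
    system = chat.pop(0) if chat and chat[0]["role"] == "system" else None
    # Drop the oldest chat messages until we fit the budget,
    # always keeping at least the most recent one.
    while count_tokens_approx(chat) > max_tokens and len(chat) > 1:
        chat.pop(0)
    if system:
        chat.insert(0, system)
    return chat

history = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "first message with several words here"},
    {"role": "assistant", "content": "an older reply"},
    {"role": "user", "content": "the latest question"},
]
trimmed = trim_messages_sketch(history, max_tokens=15)
# The oldest user message is dropped; system message and recent turns survive.
```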

4. Building the Conversation Graph

The heart of our application lies in graph.py, where we use LangGraph to define a state graph. The graph contains a single node chatbot which processes the conversation state. It trims the messages (using our context manager) to ensure we don’t exceed the token limit and then invokes the language model with a helpful system instruction.

				
					from typing import Annotated
from typing_extensions import TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.checkpoint.memory import MemorySaver

from llm import llm_model
from context_manager import trim_messages

class State(TypedDict):
    messages: Annotated[list, add_messages]

def chatbot(state: State):
    # Trim messages to fit context window
    state["messages"] = trim_messages(state["messages"], max_tokens=4000)
    # Invoke LLM Model
    system_message = "You are a helpful assistant. You are a human being. Talk like a human."
    response = llm_model.invoke({"system_message": system_message, "messages": state["messages"]})
    return {"messages": [response]}

graph_builder = StateGraph(State)
graph_builder.add_node("chatbot", chatbot)
graph_builder.add_edge(START, "chatbot")
graph_builder.add_edge("chatbot", END)

graph = graph_builder.compile(checkpointer=MemorySaver())
				
			

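Conceptually, compiling with MemorySaver gives the graph thread-scoped memory: saved state is keyed by thread_id, so each conversation accumulates its own history while other threads stay isolated. A framework-free sketch of that idea (the ThreadCheckpointer class below is a made-up illustration, not LangGraph's API):

```python
# Illustrative stand-in for thread-scoped checkpointing: state is
# saved per thread_id, so conversations do not leak into each other.
class ThreadCheckpointer:
    def __init__(self):
        self._store: dict[str, list[str]] = {}

    def invoke(self, new_messages: list[str], thread_id: str) -> list[str]:
        history = self._store.setdefault(thread_id, [])
        history.extend(new_messages)                  # add_messages-style append
        history.append(f"echo: {new_messages[-1]}")   # fake model reply
        return history

saver = ThreadCheckpointer()
saver.invoke(["Hello"], thread_id="a")
thread_a = saver.invoke(["My name is Sam"], thread_id="a")  # sees earlier turns
thread_b = saver.invoke(["Hi there"], thread_id="b")        # fresh history
```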
5. Creating the FastAPI Application

We start by defining our FastAPI app in main.py. This file sets up a single /chat endpoint that receives a POST request with chat messages and a thread identifier. The endpoint then hands off the messages to our LangGraph graph for processing.

				
					from fastapi import FastAPI
from graph import graph
from pydantic import BaseModel

app = FastAPI()

class ChatInput(BaseModel):
    messages: list[str]
    thread_id: str

@app.post("/chat")
async def chat(input: ChatInput):
    config = {"configurable": {"thread_id": input.thread_id}}
    response = await graph.ainvoke({"messages": input.messages}, config=config)
    return response["messages"][-1].content
				
			

6. Running the Application Locally

To run your FastAPI server, simply use Uvicorn:

				
					uvicorn main:app --reload
				
			

This command starts the server with hot-reloading enabled. You can now send POST requests to the /chat endpoint. For example, using curl:

				
					curl -X POST "http://127.0.0.1:8000/chat" \
     -H "Content-Type: application/json" \
     -d '{
           "messages": ["Hello, how are you?"],
           "thread_id": "example_thread"
         }'
				
			

This request sends a chat message to your deployed LangGraph graph and returns the chatbot’s response.

Output:

				
					"I'm doing great, thanks for asking. It's nice to finally have someone to chat with. I've been sitting here waiting for a conversation to start, so I'm excited to talk to you. How about you? How's your day going so far?"
				
			

7. Deploying Your Application

When you’re ready to share your chatbot with the world, consider deploying your FastAPI app using platforms such as Heroku, AWS, or DigitalOcean.

8. Adding Real-Time Streaming with LangGraph astream and WebSockets

In this section, we’ll extend our chatbot to support real-time streaming using LangGraph’s astream functionality and FastAPI’s WebSocket support. This allows users to see the chatbot’s responses as they’re generated, creating a more interactive experience. We’ll also add a simple HTML-based chat interface to interact with the WebSocket endpoint.

Step 1: Create the HTML Template

Create a new file called template.py with the following content:

				
					html = """
<!DOCTYPE html>
<html>
    <head>
        <title>Chat</title>
    </head>
    <body>
        <h1>WebSocket Chat</h1>
        <form action="" onsubmit="sendMessage(event)">
            <input type="text" id="messageText" autocomplete="off"/>
            <button>Send</button>
        </form>
        <p id='messages'></p>
        <script>
            var ws = new WebSocket("ws://localhost:8000/ws/123");
            ws.onmessage = function(event) {
                var messages = document.getElementById('messages')
                messages.innerText+=event.data
            };
            function sendMessage(event) {
                var input = document.getElementById("messageText")
                ws.send(input.value)
                input.value = ''
                event.preventDefault()
            }
        </script>
    </body>
</html>
"""
				
			

This HTML provides a basic chat interface that connects to a WebSocket endpoint at ws://localhost:8000/ws/123. It sends user input to the server and displays streamed responses in real time.

Step 2: Update main.py with WebSocket Support

Modify your existing main.py to include WebSocket functionality and streaming support:

				
					from fastapi import FastAPI,WebSocket
from fastapi.responses import HTMLResponse
from graph import graph
from pydantic import BaseModel
from template import html

app = FastAPI()

class ChatInput(BaseModel):
    messages: list[str]
    thread_id: str

@app.post("/chat")
async def chat(input: ChatInput):
    config = {"configurable": {"thread_id": input.thread_id}}
    response = await graph.ainvoke({"messages": input.messages}, config=config)
    return response["messages"][-1].content


# Streaming
# Serve the HTML chat interface
@app.get("/")
async def get():
    return HTMLResponse(html)

# WebSocket endpoint for real-time streaming
@app.websocket("/ws/{thread_id}")     
async def websocket_endpoint(websocket: WebSocket, thread_id: str):
    config = {"configurable": {"thread_id": thread_id}}
    await websocket.accept()
    while True:
        data = await websocket.receive_text()
        async for event in graph.astream({"messages": [data]}, config=config, stream_mode="messages"):
            await websocket.send_text(event[0].content)
				
			
  • HTML Endpoint (/): Serves the chat interface defined in template.py.
  • WebSocket Endpoint (/ws/{thread_id}): Establishes a WebSocket connection for a given thread_id. When a message is received, it uses graph.astream to stream the chatbot’s response in real-time.
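Under stream_mode="messages", the graph yields message chunks as the model produces them, rather than one final response. A framework-free sketch of that consumption pattern (fake_astream below is a stand-in for graph.astream, not the real API):

```python
import asyncio

# Stand-in for graph.astream(..., stream_mode="messages"): yields
# token-sized chunks as they become available.
async def fake_astream(reply: str):
    for word in reply.split():
        await asyncio.sleep(0)  # simulate waiting for the next token
        yield word + " "

async def consume() -> str:
    received = []
    async for chunk in fake_astream("streamed reply arriving in pieces"):
        received.append(chunk)  # in the app: await websocket.send_text(chunk)
    return "".join(received)

streamed = asyncio.run(consume())
```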

Step 3: Test the WebSocket Chat

  1. Add websockets to requirements.txt
  2. Ensure your FastAPI app is running
  3. Open your browser and navigate to http://localhost:8000/ .
  4. Type a message (e.g., “Hey, what’s up?”) in the input field and press “Send”.
  5. Watch the response stream in real-time below the input field.

Notes:

The WebSocket example uses a hardcoded thread_id of “123” in the HTML. For a more dynamic setup, you could pass the thread_id via URL parameters or user input in the HTML.

9. Conclusion

In this tutorial, we walked through building a FastAPI-powered chatbot that leverages LangGraph to manage conversational state. We integrated a language model using LangChain’s Groq module and ensured our conversation context stayed within token limits using our custom context manager.

10. Full Code

]]>
Implementing a Prompt Generator Using LangChain, LangGraph, and Groq https://ideentech.com/implementing-a-prompt-generator-using-langchain-langgraph-and-groq/ Thu, 06 Feb 2025 09:46:22 +0000

In this tutorial, we will implement a prompt generator that gathers requirements from the user. A human-in-the-loop mechanism is incorporated using the interrupt function, which temporarily halts the graph to collect human input; execution is then resumed with the Command class. This example builds upon the code in LangGraph’s prompt-generator use case, introducing human involvement to enhance interactivity and control.

Overview

We’ll use LangGraph for workflow management, LangChain for LLM interactions, and Groq’s LLama-3 model as our language model.

  1. The system starts by gathering information about the prompt requirements through a series of questions.
  2. It uses a human-in-the-loop approach, allowing for interactive refinement of requirements.
  3. Once all necessary information is collected, it generates a prompt template based on the gathered requirements.
  4. The system can handle interruptions and resume where it left off, making it robust for real-world use.

Let’s dive into the implementation details!

Prerequisites

Before we begin, make sure you have the following:

  1. Python 3.12+
  2. Required packages: langchain, langchain_core, langchain_groq, langgraph, langsmith, python-dotenv
  3. API keys for Langchain and Groq

Setting Up the Environment

Create a .env file with the following keys:

				
					LANGCHAIN_API_KEY=<your_langchain_api_key>
GROQ_API_KEY=<your_groq_api_key>
LANGCHAIN_TRACING_V2=<tracing_key_if_any>
				
			

Then, load these variables in your script:

				
					import os
from dotenv import load_dotenv

load_dotenv()
os.environ["LANGCHAIN_API_KEY"] = os.getenv("LANGCHAIN_API_KEY")
os.environ["LANGCHAIN_PROJECT"] = "LangGraph Prompt Generator"
os.environ["LANGCHAIN_TRACING_V2"] = os.getenv("LANGCHAIN_TRACING_V2")
os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY")
				
			

Step 1: Building the Prompt Generator LLMs using Langchain

Setting up the Language Model

We’ll use Groq’s LLama-3 model for this task:

				
					from langchain_groq import ChatGroq

llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0)
				
			

Gathering Information from the User

We start by creating a model to gather necessary details from the user:

				
					from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from pydantic import BaseModel
from typing import List

class PromptInstructions(BaseModel):
    objective: str
    variables: List[str]
    constraints: List[str]
    requirements: List[str]

llm_with_tool = llm.bind_tools([PromptInstructions])

gather_info_prompt = """Your job is to get information from the user about what type of prompt template they want to create.
You must get the following information from them:
- What the objective of the prompt is
- What variables will be passed into the prompt template
- Any constraints for what the output should NOT do
- Any requirements that the output MUST adhere to
Ask user to give all the information.You must get all the above information from the user, if not clear ask the user again. 
Do not assume any answer by yourself, get it from the user.
After getting all the information from user, call the relevant tool.
Don't call the tool until you get all the information from user.
If user is not answering your question ask again"""

gather_info_prompt_template = ChatPromptTemplate.from_messages([
    ("system", gather_info_prompt),
    MessagesPlaceholder("messages")
])

gather_info_llm = gather_info_prompt_template | llm_with_tool
				
			

  • PromptInstructions is a Pydantic model that enforces data validation.
  • ChatPromptTemplate creates a template for the conversation.
  • MessagesPlaceholder maintains the conversation history.

Generating the Prompt

Next, we define a model to create the prompt based on the gathered information:

				
					generate_prompt = """Based on the following requirements, write a good prompt template:{reqs}"""
generate_prompt_template = ChatPromptTemplate.from_messages([
    ("system", generate_prompt)
])
generate_prompt_llm = generate_prompt_template | llm
				
			

Step 2: State Management

We define the graph’s state to manage messages and interruptions:

				
					from langgraph.graph.message import AnyMessage, add_messages
from typing_extensions import TypedDict
from typing import Annotated,List

class State(TypedDict):
    messages: Annotated[List[AnyMessage], add_messages]
    is_interrupted: bool
				
			
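The Annotated[..., add_messages] declaration makes add_messages a reducer for the messages key: messages returned by a node are appended to the existing list rather than replacing it, while plain keys like is_interrupted are simply overwritten. A simplified, framework-free stand-in for that merge behavior (not LangGraph's actual implementation):

```python
# Simplified stand-in for LangGraph's add_messages reducer:
# a node's returned messages are appended, not substituted.
def add_messages_sketch(existing: list[str], update: list[str]) -> list[str]:
    return existing + update

def apply_node_update(state: dict, update: dict) -> dict:
    merged = dict(state)
    merged["messages"] = add_messages_sketch(state["messages"], update["messages"])
    # Non-reducer keys (like is_interrupted) are simply overwritten.
    for key, value in update.items():
        if key != "messages":
            merged[key] = value
    return merged

state = {"messages": ["user: hi"], "is_interrupted": False}
state = apply_node_update(state, {"messages": ["ai: hello!"], "is_interrupted": True})
# state["messages"] now holds both turns; is_interrupted was overwritten.
```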

Step 3: Creating Nodes

Here are the nodes that make up our graph:

Information Gathering Node:

				
					from langgraph.types import Command
from langchain_core.messages import ToolMessage

def gather_info_node(state: State) -> State:
    state["messages"] = gather_info_llm.invoke({"messages": state["messages"]})
    
    if state["messages"].tool_calls:
        return Command(update=state, goto="tool_node")
    
    state["is_interrupted"] = True
    return Command(update=state, goto="human_in_loop_node")
				
			

This node checks whether gather_info_llm emitted any tool calls and uses Command objects to control the graph flow: it either moves to the tool node or interrupts for human input.
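The decision in gather_info_node is plain conditional routing: a tool call means the model has gathered everything, otherwise we ask the human. A framework-free sketch of that branch (the message shape below is an assumption for illustration):

```python
# Illustrative routing decision mirroring gather_info_node:
# tool call present -> "tool_node", otherwise -> "human_in_loop_node".
def route_after_gather(ai_message: dict) -> str:
    if ai_message.get("tool_calls"):
        return "tool_node"
    return "human_in_loop_node"

with_tool = {"content": "", "tool_calls": [{"name": "PromptInstructions"}]}
without_tool = {"content": "What is the objective?", "tool_calls": []}
```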

Tool Node:

				
					def tool_node(state: State):
    return {
        "messages": [
            ToolMessage(
                content=str(state["messages"][-1].tool_calls[0]["args"]),
                tool_call_id=state["messages"][-1].tool_calls[0]["id"],
            )
        ]
    }
				
			

Processes tool calls from gather_info_llm and creates a ToolMessage containing the tool’s output.

Prompt Generation Node:

				
					def generate_prompt_node(state):
    state["messages"] = generate_prompt_llm.invoke({"reqs": state["messages"][-1].content})
    return state
				
			

Takes the gathered requirements from tool_node and invokes the generate_prompt_llm to generate the actual prompt.

Human-in-Loop Node:

				
					from langgraph.types import interrupt

def human_in_loop_node(state):
    input = interrupt("please give the requested data")
    state["is_interrupted"] = False
    state["messages"] = input
    return state
				
			

This node interrupts graph execution with the interrupt function to request input from the user. After receiving the input, it resets the interrupted flag and updates the state with the user’s message.
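The interrupt/resume cycle can be pictured as a paused computation: execution stops at interrupt(), the checkpoint preserves where we were, and Command(resume=...) feeds the human's answer back in. A framework-free sketch of that shape using a Python generator (illustrative only, not how LangGraph is implemented):

```python
# Generator-based sketch of interrupt/resume: yield plays the role of
# interrupt(), and gen.send(...) plays the role of Command(resume=...).
def human_in_loop_sketch():
    answer = yield "please give the requested data"  # pause: interrupt(...)
    return f"resumed with: {answer}"

gen = human_in_loop_sketch()
prompt = next(gen)            # run until the interrupt point
try:
    gen.send("user's reply")  # resume: Command(resume=...)
except StopIteration as done:
    result = done.value       # value returned after resuming
```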

Step 4: Building the Graph

				
					from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

graph_builder = StateGraph(State)

graph_builder.add_node("gather_info_node", gather_info_node)
graph_builder.add_node("generate_prompt_node", generate_prompt_node)
graph_builder.add_node("tool_node", tool_node)
graph_builder.add_node("human_in_loop_node", human_in_loop_node)

graph_builder.add_edge(START, "gather_info_node")
graph_builder.add_edge("human_in_loop_node", "gather_info_node")
graph_builder.add_edge("tool_node", "generate_prompt_node")
graph_builder.add_edge("generate_prompt_node", END)

graph = graph_builder.compile(checkpointer=MemorySaver())
				
			

We create a new StateGraph with our State type, add all the nodes, define the flow between them with edges, and compile the graph for execution, using MemorySaver for checkpointing.

Step 5: Implementing Graph Interaction

Finally, we create helper functions to interact with our graph:

				
					from langchain_core.messages import HumanMessage

def invoke_graph(message, config, graph):
    human_message = HumanMessage(content=message)
    response = graph.invoke({"messages": [human_message]}, config=config)
    return response
def resume_graph(message, config, graph):
    human_message = HumanMessage(content=message)
    response = graph.invoke(Command(resume=human_message), config=config)
    return response
def display(response):
    for message in response["messages"]:
        message.pretty_print()
				
			

invoke_graph: Starts a new graph execution

resume_graph: Continues an interrupted execution using Command(resume=...)

display: Displays the graph response

Step 6: Executing the Graph

The graph can now be invoked, and resumed as follows:

				
					config = {"configurable": {"thread_id": 1234}}
message = input("Enter input message: ")

graph_values = graph.get_state(config).values
if "is_interrupted" in graph_values and graph_values["is_interrupted"]:
    response = resume_graph(message, config, graph)
else:
    response = invoke_graph(message, config, graph)

display(response)
				
			
  • config: Uses configuration for thread management
  • message: Gets user input
  • Checks if the graph is interrupted
  • Either resumes using resume_graph or starts new execution using invoke_graph
  • Displays the response

Complete Code:

				
					import os
from dotenv import load_dotenv

from pydantic import BaseModel
from typing import List
from typing_extensions import TypedDict
from typing import Annotated,List

from langchain_groq import ChatGroq
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import ToolMessage,HumanMessage

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import AnyMessage, add_messages
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import Command,interrupt

load_dotenv()
os.environ["LANGCHAIN_API_KEY"] = os.getenv("LANGCHAIN_API_KEY")
os.environ["LANGCHAIN_PROJECT"] = "LangGraph Prompt Generator"
os.environ["LANGCHAIN_TRACING_V2"] = os.getenv("LANGCHAIN_TRACING_V2")
os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY")

# Setting up the Language Model
def get_llm():
    llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0)
    return llm

# Gathering Information from the User
def get_gather_info_llm(llm):
    class PromptInstructions(BaseModel):
        objective: str
        variables: List[str]
        constraints: List[str]
        requirements: List[str]

    llm_with_tool = llm.bind_tools([PromptInstructions])

    gather_info_prompt = """Your job is to get information from the user about what type of prompt template they want to create.
    You must get the following information from them:
    - What the objective of the prompt is
    - What variables will be passed into the prompt template
    - Any constraints for what the output should NOT do
    - Any requirements that the output MUST adhere to
    Ask user to give all the information.You must get all the above information from the user, if not clear ask the user again. 
    Do not assume any answer by yourself, get it from the user.
    After getting all the information from user, call the relevant tool.
    Don't call the tool until you get all the information from user.
    If user is not answering your question ask again"""
    gather_info_prompt_template = ChatPromptTemplate.from_messages([
        ("system", gather_info_prompt),
        MessagesPlaceholder("messages")
    ])
    gather_info_llm = gather_info_prompt_template | llm_with_tool

    return gather_info_llm

# Generating the Prompt
def get_generate_prompt_llm(llm):
    generate_prompt = """Based on the following requirements, write a good prompt template:{reqs}"""
    generate_prompt_template = ChatPromptTemplate.from_messages([
        ("system", generate_prompt)
    ])
    generate_prompt_llm = generate_prompt_template | llm

    return generate_prompt_llm

def create_graph(gather_info_llm,generate_prompt_llm):
    # State Management
    class State(TypedDict):
        messages: Annotated[List[AnyMessage], add_messages]
        is_interrupted: bool

    # Nodes
    def gather_info_node(state: State) -> State:
        state["messages"] = gather_info_llm.invoke({"messages": state["messages"]})
        
        if state["messages"].tool_calls:
            return Command(update=state, goto="tool_node")
        
        state["is_interrupted"] = True
        return Command(update=state, goto="human_in_loop_node")
    def tool_node(state: State):
        return {
            "messages": [
                ToolMessage(
                    content=str(state["messages"][-1].tool_calls[0]["args"]),
                    tool_call_id=state["messages"][-1].tool_calls[0]["id"],
                )
            ]
        }
    def generate_prompt_node(state):
        state["messages"] = generate_prompt_llm.invoke({"reqs": state["messages"][-1].content})
        return state
    def human_in_loop_node(state):
        input = interrupt("please give the requested data")
        state["is_interrupted"] = False
        state["messages"] = input
        return state

    # Building the graph
    graph_builder = StateGraph(State)

    graph_builder.add_node("gather_info_node", gather_info_node)
    graph_builder.add_node("generate_prompt_node", generate_prompt_node)
    graph_builder.add_node("tool_node", tool_node)
    graph_builder.add_node("human_in_loop_node", human_in_loop_node)

    graph_builder.add_edge(START, "gather_info_node")
    graph_builder.add_edge("human_in_loop_node", "gather_info_node")
    graph_builder.add_edge("tool_node", "generate_prompt_node")
    graph_builder.add_edge("generate_prompt_node", END)

    graph = graph_builder.compile(checkpointer=MemorySaver())

    return graph

# Implementing Graph Interaction
def invoke_graph(message, config, graph):
    human_message = HumanMessage(content=message)
    response = graph.invoke({"messages": [human_message]}, config=config)
    return response
def resume_graph(message, config, graph):
    human_message = HumanMessage(content=message)
    response = graph.invoke(Command(resume=human_message), config=config)
    return response
def display(response):
    for message in response["messages"]:
        message.pretty_print()

# Executing the Graph
llm = get_llm()
gather_info_llm= get_gather_info_llm(llm)
generate_prompt_llm= get_generate_prompt_llm(llm)
graph = create_graph(gather_info_llm,generate_prompt_llm)

config = {"configurable": {"thread_id": 1234}}

while True:
    message = input("Enter input message: ")

    graph_values = graph.get_state(config).values
    if "is_interrupted" in graph_values and graph_values["is_interrupted"]:
        response = resume_graph(message, config, graph)
    else:
        response = invoke_graph(message, config, graph)

    display(response)
				
			

Example Response

Initial Input:

				
					find largest of 10 numbers
				
			

Information gathered by gather_info_node:

				
					{'objective': 'Find the largest even number in a list of 10 numbers', 'variables': ['list of 10 numbers'], 'constraints': ['only consider even numbers'], 'requirements': ['output should be a float value']}
				
			

Final generated prompt by generate_prompt_node:

				
					Here's a good prompt template based on the given requirements:

"Write a function to find the largest even number in a given list of 10 numbers. The function should only consider even numbers from the list and return the result as a float value. The input list will contain 10 numbers, and the function should handle this list to produce the desired output."

Alternatively, if you want a more formal template:

"Given a list of 10 numbers, develop an algorithm to identify the largest even number within this list, ignoring any odd numbers. Ensure the output is returned as a float value, and provide a clear explanation of your approach."

Example input: 
"Find the largest even number in the list [12, 45, 7, 23, 56, 89, 34, 6, 98, 11] and return the result as a float."

Expected output: 
"The largest even number in the list is 98.0"
				
			

Complete Interaction Output:

				
					================================ Human Message =================================
Hi
================================== Ai Message ==================================
To create a prompt template, I need some information from you. Can you please tell me what the objective of the prompt is? What do you want the prompt to achieve?
================================ Human Message =================================
find largest of 10 numbers
================================== Ai Message ==================================
So the objective of the prompt is to find the largest of 10 numbers.
Next, I need to know what variables will be passed into the prompt template. In this case, I'm assuming the variables are the 10 numbers. Can you confirm that? Are there any other variables that will be passed into the prompt template?
Also, are there any constraints for what the output should NOT do? For example, should the prompt avoid using certain words or phrases?
Lastly, are there any requirements that the output MUST adhere to? For example, should the output be in a specific format or should it provide explanations for the answer?
================================ Human Message =================================
it will be a list of 10 numbers
================================== Ai Message ==================================
So, to confirm:
* The objective of the prompt is to find the largest of 10 numbers.
* The variable that will be passed into the prompt template is a list of 10 numbers.
Next, I need to know about any constraints for what the output should NOT do. For example, should the prompt avoid using certain words or phrases, or should it avoid providing multiple answers?
Also, are there any requirements that the output MUST adhere to? For example, should the output be in a specific format, such as "The largest number is X", or should it provide explanations for why a particular number is the largest?
Please let me know about any constraints or requirements you have in mind.
(And if you're not sure, that's okay! I can ask follow-up questions to help clarify things.)
================================ Human Message =================================
only find even numbers
================================== Ai Message ==================================
So, to confirm:
* The objective of the prompt is to find the largest of 10 numbers, but only considering the even numbers in the list.
* The variable that will be passed into the prompt template is a list of 10 numbers.
As for constraints, I understand that the prompt should:
* Only consider even numbers when finding the largest number
Is that correct?
As for requirements, are there any specific requirements for the output? For example, should the output be in a specific format, such as "The largest even number is X", or should it provide explanations for why a particular number is the largest even number?
Please let me know if there's anything else I should know.
(And if you're ready, I can call the PromptInstructions function with the information we've gathered so far!)
================================ Human Message =================================
output should be float
================================== Ai Message ==================================
Tool Calls:
  PromptInstructions (call_n8ce)
 Call ID: call_n8ce
  Args:
    objective: Find the largest even number in a list of 10 numbers
    variables: ['list of 10 numbers']
    constraints: ['only consider even numbers']
    requirements: ['output should be a float value']
================================= Tool Message =================================
{'objective': 'Find the largest even number in a list of 10 numbers', 'variables': ['list of 10 numbers'], 'constraints': ['only consider even numbers'], 'requirements': ['output should be a float value']}
================================== Ai Message ==================================
Here's a good prompt template based on the given requirements:
"Write a function to find the largest even number in a given list of 10 numbers. The function should only consider even numbers from the list and return the result as a float value. The input list will contain 10 numbers, and the function should handle this list to produce the desired output."
Alternatively, if you want a more formal template:
"Given a list of 10 numbers, develop an algorithm to identify the largest even number within this list, ignoring any odd numbers. Ensure the output is returned as a float value, and provide a clear explanation of your approach."
Example input: 
"Find the largest even number in the list [12, 45, 7, 23, 56, 89, 34, 6, 98, 11] and return the result as a float."
Expected output: 
"The largest even number in the list is 98.0"
				
			

Conclusion

By combining LangChain, LangGraph, and Groq, we’ve built a robust and customizable prompt generation workflow. Human-in-the-loop mechanisms ensure flexibility, making it ideal for real-world applications.

]]>
Harnessing the Power of LLM Agents in Software Development https://ideentech.com/harnessing-the-power-of-llm-agents-in-software-development/ Thu, 06 Feb 2025 09:45:01 +0000

My journey began in December 2024, when I started exploring the potential of LLMs, LangChain, and LangGraph to harness the power of AI-driven agents. The goal was clear: leverage the strengths of LLM agents to elevate the capabilities of my software solutions. This quest opened the door to understanding how these tools could enable complex workflows, integrate with external systems, and create dynamic, adaptable software architectures.

In this post, I’ll introduce you to LangGraph, an orchestration framework I’ve been working with, and showcase how it can be used to build agent workflows with powerful state management and fine-tuned control. Whether you’re a developer seeking to incorporate LLM agents into your projects, or simply curious about the potential of AI-driven software, I hope this post offers valuable insights to fuel your own journey.

Introduction to LangGraph: Basics and Examples

LangGraph is a stateful orchestration framework for building multi-actor applications with Large Language Models (LLMs). It enables creating complex agent workflows with cycles and branching, essential for agentic architectures. LangGraph provides fine-grained control over application flow and state for building reliable agents, while also offering built-in persistence for advanced memory and human-in-the-loop workflows.

Key features include:

  • Cycles and Branching: Enable loops and conditionals in workflows.
  • Persistence: Save state after each graph step, allowing workflows to pause and resume.
  • Human-in-the-Loop: Pause execution for approvals or edits of planned actions.
  • Streaming Support: Stream outputs in real time, including token-level outputs.
  • LangChain Integration: Works seamlessly with LangChain and LangSmith or as a standalone tool.

It models agent workflows as graphs, where you define the behavior of your agents using three key components:

  1. State: A shared data structure that represents the current snapshot of your application. It can be any Python type, but is typically a TypedDict or Pydantic BaseModel.
  2. Nodes: Python functions that encode the logic of your agents. They receive the current State as input, perform some computation or side-effect, and return an updated State.
  3. Edges: Python functions that determine which Node to execute next based on the current State. They can be conditional branches or fixed transitions.

Example 1: Graph With Single Node

LangGraph’s state management system is a core feature that ensures data flows seamlessly between nodes. Let’s explore how state is handled in a simple graph using a single node.

Code Example:

				
					from langgraph.graph import StateGraph, START, END
from typing import List
from typing_extensions import TypedDict

# Define the structure of the state
class State(TypedDict):
    messages: List[str]
    count: int

# Create the graph
def create_graph():
    graph_builder = StateGraph(State)

    # Define a single node
    def node1(state: State) -> State:
        state["messages"] = state["messages"] + ["Hello from node 1"]
        state["count"] += 1
        return state

    # Add the node to the graph
    graph_builder.add_node("node1", node1)

    # Define the flow of the graph
    graph_builder.add_edge(START, "node1")
    graph_builder.add_edge("node1", END)

    # Compile the graph
    graph = graph_builder.compile()
    return graph

# Initialize the graph and input state
graph = create_graph()
input_state = {"messages": ["Hi from user"], "count": 0}

# Invoke the graph
response_state = graph.invoke(input_state)
print(response_state)
				
			

Output:

				
					{
'messages': ['Hi from user', 'Hello from node 1'], 
'count': 1
}
				
			

Explanation:

  1. State Definition: The State class defines the structure of the state, including messages and count.
  2. Node Logic: The node1 function takes the input state, appends a message, increments the count, and returns the updated state.
  3. Graph Flow: The graph starts at the START node, processes through node1, and ends at the END node.
  4. Result: The graph modifies the input state as it flows through node1.

Example 2: Graph With Two Nodes Connected Sequentially

In this example, we expand on the first one by adding a second node that sequentially processes the state. This demonstrates how multiple nodes work together in a flow, passing the updated state from one to the next.

Code Example:

				
from langgraph.graph import StateGraph, START, END
from typing import List
from typing_extensions import TypedDict

# Define the structure of the state
class State(TypedDict):
    messages: List[str]
    count: int

# Create the graph
def create_graph():
    graph_builder = StateGraph(State)

    # Define the first node
    def node1(state: State) -> State:
        state["messages"] = state["messages"] + ["Hello from node 1"]
        state["count"] += 1
        return state

    # Define the second node
    def node2(state: State) -> State:
        state["messages"] = state["messages"] + ["Hello from node 2"]
        state["count"] += 1
        return state

    # Add the nodes to the graph
    graph_builder.add_node("node1", node1)
    graph_builder.add_node("node2", node2)

    # Define the flow of the graph
    graph_builder.add_edge(START, "node1")
    graph_builder.add_edge("node1", "node2")
    graph_builder.add_edge("node2", END)

    # Compile the graph
    graph = graph_builder.compile()
    return graph

# Initialize the graph and input state
graph = create_graph()
input_state = {"messages": ["Hi from user"], "count": 0}

# Invoke the graph
response_state = graph.invoke(input_state)
print(response_state)
				
			

Output:

				
					{
'messages': ['Hi from user', 'Hello from node 1', 'Hello from node 2'], 
'count': 2
}
				
			

Explanation:

  1. Multiple Nodes: The graph now includes two nodes (node1 and node2), each modifying the state.
  2. State Flow: After node1 updates the state, node2 processes it further. The flow is sequential.
  3. Graph Flow: The graph starts at START, processes through node1, then node2, and ends at END.
  4. Result: The final state includes messages from both nodes, and the count is incremented twice, showing how state flows across multiple nodes.

Example 3: Graph with Two Nodes Selected Conditionally

In this example, we explore how to use conditional logic to select between multiple nodes based on the state. This is a useful technique when the next step in the workflow depends on a dynamic condition, such as a user input or a previously computed result.

Here, we have a graph with two nodes (node1 and node2). The choice of which node to execute next is determined by the value of the use_node field in the state.

Code Example:

				
from langgraph.graph import StateGraph, START, END
from typing import List
from typing_extensions import TypedDict

# Define the structure of the state
class State(TypedDict):
    messages: List[str]
    use_node: str
    count: int

# Create the graph
def create_graph():
    graph_builder = StateGraph(State)

    # Define the first node
    def node1(state: State) -> State:
        state["messages"] = state["messages"] + ["Hello from node 1"]
        state["count"] += 1
        return state

    # Define the second node
    def node2(state: State) -> State:
        state["messages"] = state["messages"] + ["Hello from node 2"]
        state["count"] += 1
        return state

    # Conditional router to decide which node to use
    def conditional_router(state: State) -> str:
        if state["use_node"] == "node1":
            return "node1"
        else:
            return "node2"

    # Add the nodes to the graph
    graph_builder.add_node("node1", node1)
    graph_builder.add_node("node2", node2)

    # Add a conditional edge based on the `use_node` value
    graph_builder.add_conditional_edges(START, conditional_router, {"node1": "node1", "node2": "node2"})
    graph_builder.add_edge("node1", END)
    graph_builder.add_edge("node2", END)

    # Compile the graph
    graph = graph_builder.compile()
    return graph

# Initialize the graph and input state
graph = create_graph()
input_state = {"messages": ["Hi from user"], "use_node": "node2", "count": 0}

# Invoke the graph
response_state = graph.invoke(input_state)
print(response_state)
				
			

Output:

				
					{
'messages': ['Hi from user', 'Hello from node 2'], 
'use_node': 'node2', 
'count': 1
}
				
			

Explanation:

  1. Conditional Node Selection: The graph now includes a conditional_router function that checks the value of use_node in the state. If use_node is “node1”, the flow proceeds to node1; otherwise, it moves to node2.
  2. Node Logic: node1 appends a message to the messages list and increments the count; node2 does the same but appends a different message.
  3. Conditional Flow: The START node’s edge is now conditional, determined by the conditional_router. Depending on the use_node value, either node1 or node2 is selected, and the flow continues to END.
  4. Result: Since the input state specifies “use_node”: “node2”, the graph processes through node2, adding its message and incrementing the count.
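
The mapping passed to add_conditional_edges is essentially dictionary dispatch: the router returns a key, and the mapping resolves that key to a node name. A simplified plain-Python sketch of the same decision (not LangGraph internals):

```python
# Simplified sketch of conditional-edge dispatch -- not LangGraph internals.
def conditional_router(state):
    return "node1" if state["use_node"] == "node1" else "node2"

# The mapping given to add_conditional_edges: router key -> node name.
edge_map = {"node1": "node1", "node2": "node2"}

state = {"use_node": "node2"}
next_node = edge_map[conditional_router(state)]
print(next_node)
```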

Example 4: Graph with Nodes Connected in Parallel

In this example, we explore how to execute multiple nodes in parallel. This technique is useful when different parts of your workflow can run simultaneously without affecting each other’s results. By connecting multiple nodes in parallel, you can increase the efficiency and flexibility of your agent workflows.

Here, we have four nodes (node1, node2, node3, and node4). The graph is structured such that node1 runs first, followed by both node2 and node3 in parallel, and finally node4, which collects the results from both parallel nodes.

Code Example:

				
from langgraph.graph import StateGraph, START, END
from typing import List, Annotated
from typing_extensions import TypedDict

# Helper functions to demonstrate state manipulation
def add_messages(left, right):
    return left + right
def add_count(left, right):
    return left + right

# Define the structure of the state
class State(TypedDict):
    messages: Annotated[List[str], add_messages]
    count: Annotated[int, add_count]

# Create the graph
def create_graph():
    graph_builder = StateGraph(State)
   
    # Define the first node
    def node1(state: State) -> State:
        print("Entered Node 1 with State: " + str(state))
        state["messages"] = ["Hello from node 1"]
        state["count"] = 1
        return state
    
    # Define the second node
    def node2(state: State) -> State:
        print("Entered Node 2 with State: " + str(state))
        state["messages"] = ["Hello from node 2"]
        state["count"] = 1
        return state
    
    # Define the third node
    def node3(state: State) -> State:
        print("Entered Node 3 with State: " + str(state))
        state["messages"] = ["Hello from node 3"]
        state["count"] = 1
        return state

    # Define the fourth node
    def node4(state: State) -> State:
        print("Entered Node 4 with State: " + str(state))
        state["messages"] = ["Hello from node 4"]
        state["count"] = 1
        return state
    
    # Add the nodes to the graph
    graph_builder.add_node("node1", node1)
    graph_builder.add_node("node2", node2)
    graph_builder.add_node("node3", node3)
    graph_builder.add_node("node4", node4)
    
    # Define parallel edges
    graph_builder.add_edge(START, "node1")
    graph_builder.add_edge("node1", "node2")
    graph_builder.add_edge("node1", "node3")
    graph_builder.add_edge("node2", "node4")
    graph_builder.add_edge("node3", "node4")
    graph_builder.add_edge("node4", END)
    
    # Compile the graph
    graph = graph_builder.compile()
    return graph

# Initialize the graph and input state
graph = create_graph()
input_state = {"messages": ["Hi from user"], "count": 0}

# Invoke the graph
response_state = graph.invoke(input_state)
print("Output State: " + str(response_state))
				
			

Output:

				
					Entered Node 1 with State: {'messages': ['Hi from user'], 'count': 0}
Entered Node 2 with State: {'messages': ['Hi from user', 'Hello from node 1'], 'count': 1}
Entered Node 3 with State: {'messages': ['Hi from user', 'Hello from node 1'], 'count': 1}
Entered Node 4 with State: {'messages': ['Hi from user', 'Hello from node 1', 'Hello from node 2', 'Hello from node 3'], 'count': 3}
Output State: {'messages': ['Hi from user', 'Hello from node 1', 'Hello from node 2', 'Hello from node 3', 'Hello from node 4'], 'count': 4}
				
			

Explanation:

1. Parallel Execution:

  • The graph starts with node1, which processes the initial state.
  • After node1 completes, the flow splits into two parallel paths: node2 and node3. These nodes execute independently and concurrently.
  • Once both node2 and node3 finish, the flow merges into node4, where all the results are consolidated.

2. Node Logic:

  • Each node contributes one message and adds 1 to the count; the Annotated reducers merge these updates into the accumulated state. The print statements help track the state as it flows through the graph.

3. Graph Flow: The graph follows this path:

  •  START → node1 → node2 and node3 (in parallel) → node4 → END

4. Result:

  • The final state accumulates the messages from all four nodes, and the count is 4, since node1, node2, node3, and node4 each contribute 1.
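
What makes this fan-in safe are the Annotated reducers (add_messages and add_count): when node2 and node3 both return updates for the same key, LangGraph combines the updates with the reducer instead of letting one overwrite the other. A simplified plain-Python sketch of that merge (actual LangGraph internals differ):

```python
# Simplified sketch of reducer-based fan-in -- not LangGraph internals.
def add_messages(left, right):
    return left + right

def add_count(left, right):
    return left + right

reducers = {"messages": add_messages, "count": add_count}

# State after node1 has run.
state = {"messages": ["Hi from user", "Hello from node 1"], "count": 1}

# Updates returned by the two parallel nodes.
updates = [
    {"messages": ["Hello from node 2"], "count": 1},  # from node2
    {"messages": ["Hello from node 3"], "count": 1},  # from node3
]

# Merge each update into the state via the per-key reducer.
for update in updates:
    for key, value in update.items():
        state[key] = reducers[key](state[key], value)

print(state)
```

This reproduces the state node4 receives in the trace above; without reducers, concurrent writes to the same key would conflict.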

Example 5: Graph with a Single LLM Node

This example demonstrates how to integrate a Large Language Model (LLM) into a LangGraph workflow. By adding a single LLM node, we can process user queries and generate AI responses.

By leveraging LangChain’s prompt template and LangGraph’s state management, developers can build powerful applications with minimal effort.

Code Example:

				
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import AnyMessage, add_messages
from typing import Annotated, List
from typing_extensions import TypedDict
from langchain_groq import ChatGroq
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

# Define the structure of the state
class State(TypedDict):
    messages: Annotated[List[AnyMessage], add_messages]

# Initialize the LLM with a prompt
def get_llm():
    prompt_template = ChatPromptTemplate.from_messages([
        ("system", "You are an AI Assistant, respond to user question in 10 words"),
        MessagesPlaceholder("message_history"),
        ("user", "{input}")
    ])
    llm = ChatGroq(model="llama3-70b-8192", temperature=1, api_key="Paste your Groq API key")
    llm_with_prompt = prompt_template | llm
    return llm_with_prompt

# Create the graph
def create_graph():
    graph_builder = StateGraph(State)
    
    # Define the LLM node
    def ChatNode(state: State) -> State:
        last_message = state["messages"][-1]
        message_history = state["messages"][:-1]
        llm = get_llm()
        # The add_messages reducer appends the returned AIMessage to the history
        state["messages"] = llm.invoke({"input": last_message.content, "message_history": message_history})
        return state

    # Add the LLM node to the graph
    graph_builder.add_node("chatnode", ChatNode)
    graph_builder.add_edge(START, "chatnode")
    graph_builder.add_edge("chatnode", END)

    # Compile the graph
    graph = graph_builder.compile()
    return graph

# Initialize the graph and input state
graph = create_graph()
input_state = {"messages": ["Who are you?"]}

# Invoke the graph
response_state = graph.invoke(input_state)
print(response_state)
				
			

Output:

				
					{
    'messages': 
    [
        HumanMessage(content='Who are you?', additional_kwargs={}, response_metadata={}, id='...'), 
        AIMessage(content='I am LLaMA, an AI assistant, here to help you.', additional_kwargs={}, response_metadata={...},id='...', usage_metadata={...})
    ]
}
				
			

Explanation:

  • LLM Initialization: The get_llm function sets up an LLM with a predefined prompt and integrates it into LangGraph using ChatPromptTemplate.
  • LLM Node Logic: The ChatNode function extracts the latest user message and message history from the state, invokes the LLM, and updates the state with the AI-generated response.
  • Graph Flow: The graph starts at START, processes through the ChatNode, and ends at END.
  • State Updates: The final state includes the user query (HumanMessage) and the AI’s response (AIMessage), showcasing seamless integration of an LLM.

Example 6: Graph with Single LLM Node Having Access to Tools

In this example, the LLM node has access to a custom tool (multiply_tool) for computing multiplication, demonstrating how LangGraph can orchestrate agent workflows with external tools.

By incorporating tools into the workflow, this example shows how LangGraph combines LLMs and custom tools for more dynamic and intelligent agent behavior.

Code Example:

				
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import AnyMessage, add_messages
from typing import Annotated, List
from typing_extensions import TypedDict
from langchain_groq import ChatGroq
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool
from langgraph.prebuilt import ToolNode, tools_condition

# Define the structure of the state
class State(TypedDict):
    messages: Annotated[List[AnyMessage], add_messages]

# Define the multiply tool
@tool
def multiply_tool(a: float, b: float) -> float:
    """Use this tool to multiply 2 numbers"""
    print("<<<<<<<<<using multiply tool>>>>>>>>>>>")
    return a * b

# Get LLM with tool integration
def get_llm():
    prompt_template = ChatPromptTemplate.from_messages([
        ("system", """You are an AI Assistant, respond to user questions in 10 words.
        Use multiply tool to compute multiplication of numbers.
        For all other questions, answer directly."""),
        MessagesPlaceholder("messages")
    ])
    llm = ChatGroq(model="llama3-70b-8192", temperature=1, api_key="paste your groq api key here")
    llm_with_tool = llm.bind_tools(tools=[multiply_tool])
    llm_with_prompt = prompt_template | llm_with_tool
    return llm_with_prompt

# Create the LangGraph workflow
def create_graph():
    graph_builder = StateGraph(State)
    
    # Define the LLM node
    def ChatNode(state: State) -> State:
        llm = get_llm()
        state["messages"] = llm.invoke({"messages":state["messages"]})
        return state
    
    # Add nodes and edges to the graph
    graph_builder.add_node("chatnode", ChatNode)
    graph_builder.add_node("toolnode", ToolNode(tools=[multiply_tool]))
    graph_builder.add_edge(START, "chatnode")
    graph_builder.add_conditional_edges("chatnode", tools_condition, {"tools": "toolnode", "__end__": END})
    graph_builder.add_edge("toolnode", "chatnode")

    # Compile the graph
    graph = graph_builder.compile()
    return graph

# Initialize the graph
graph = create_graph()

# Test cases
print("======================Not using tool=========================")
input_state1 = {"messages": ["Who are you?"]}
response_state1 = graph.invoke(input_state1)
print(response_state1)

print("======================Using tool=========================")
input_state2 = {"messages": ["What is 2*3?"]}
response_state2 = graph.invoke(input_state2)
print(response_state2)
				
			

Output:

				
					======================Not using tool=========================
{
    'messages': [
        HumanMessage(content='Who are you?', additional_kwargs={}, response_metadata={}, id='...'), 
        AIMessage(content='I am an AI Assistant.', additional_kwargs={}, response_metadata={...}, id='...', usage_metadata={...})
    ]
}
======================Using tool=========================
<<<<<<<<<using multiply tool>>>>>>>>>>>
{
    'messages': [
        HumanMessage(content='What is 2*3?', additional_kwargs={}, response_metadata={}, id='...'), 
        AIMessage(content='', tool_calls=[{'name': 'multiply_tool', 'args': {'a': 2, 'b': 3}, 'id': 'call_7c9t', 'type': 'tool_call'}], additional_kwargs={...}, response_metadata={...}, id='...', usage_metadata={...}), 
        ToolMessage(content='6.0', name='multiply_tool', id='...', tool_call_id='call_7c9t'), 
        AIMessage(content='The answer is 6.0.', additional_kwargs={}, response_metadata={...}, id='...', usage_metadata={...})
    ]
}
				
			

Explanation:

  • Tool Integration: We use the multiply_tool to perform calculations within the workflow. The tool is invoked only when the LLM determines the user’s question is related to multiplication.
  • LLM Configuration: The get_llm function configures an LLM and binds the tool to it. The model uses a custom prompt template to handle user queries.
  • Graph Workflow: The graph begins with the ChatNode that processes user input through the LLM. If the user’s query is related to multiplication (like “What is 2*3?”), the ToolNode invokes the multiply_tool. The graph then loops back to process further messages, showing the dynamic interaction between nodes and tools.
  • Output: When no tool is needed, the LLM responds directly. When the tool is required, the multiply_tool performs the calculation, and the LLM uses the tool’s result in the response.
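
The routing performed by the prebuilt tools_condition boils down to one check: does the latest AI message contain tool calls? Here is a simplified stand-in, using an invented FakeAIMessage class for illustration (LangGraph’s real implementation handles more cases):

```python
# Simplified stand-in for LangGraph's tools_condition -- illustration only.
class FakeAIMessage:
    def __init__(self, content, tool_calls=None):
        self.content = content
        self.tool_calls = tool_calls or []

def tools_condition_sketch(state):
    last = state["messages"][-1]
    # Route to the tool node if the model requested a tool, otherwise finish.
    return "tools" if getattr(last, "tool_calls", []) else "__end__"

# Direct answer: no tool calls, so the graph ends.
direct = {"messages": [FakeAIMessage("I am an AI Assistant.")]}

# Tool request: the AI message carries a tool call, so we route to the tool node.
tool_call = {"messages": [FakeAIMessage(
    "", tool_calls=[{"name": "multiply_tool", "args": {"a": 2, "b": 3}}])]}

print(tools_condition_sketch(direct))
print(tools_condition_sketch(tool_call))
```

This mirrors the two test cases above: “Who are you?” ends the graph directly, while “What is 2*3?” routes through the tool node and back to the chat node.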
]]>