Commit 1c7cdbe

xingyaoww and rbren authored
feat(CodeActAgent): Support Agent-User Interaction during Task Execution and the Full Integration of CodeActAgent (OpenHands#1290)
* initialize plugin definition
* initialize plugin definition
* simplify mixin
* further improve plugin mixin
* add cache dir for pip
* support clean up cache
* add script for setup jupyter and execution server
* integrate JupyterRequirement to ssh_box
* source bashrc at the end of plugin load
* add execute_cli that accept code via stdin
* make JUPYTER_EXEC_SERVER_PORT configurable via env var
* increase background cmd sleep time
* Update opendevin/sandbox/plugins/mixin.py
  Co-authored-by: Robert Brennan <[email protected]>
* add mixin to base class
* make jupyter requirement a dataclass
* source plugins only when >0 requirements
* add `sandbox_plugins` for each agent & have controller take care of it
* update build.sh to make logs available in /opendevin/logs
* switch to use config for lib and cache dir
* Add SANDBOX_WORKSPACE_DIR into config
* Add SANDBOX_WORKSPACE_DIR into config
* fix occurrence of /workspace
* fix permission issue with /workspace
* use python to implement execute_cli to avoid stdin escape issue
* add IPythonRunCellAction and get it working
* wait until jupyter is available
* support plugin via copying instead of mounting
* add agent talk action
* support follow-up user language feedback
* add __str__ for action to be printed better
* only print PLAN at the beginning
* wip: update codeact agent
* get rid the initial message
* update codeact agent to handle null action; add thought to bash
* dispatch thought for RUN action as well
* fix weird behavior of pxssh where the output would not flush correctly
* make ssh box can handle exit_code properly as well
* add initial version of swe-agent plugin;
* rename swe cursors
* split setup script into two and create two requirements
* print SWE-agent command documentation
* update swe-agent to default to no custom docs
* add initial version of swe-agent plugin;
* rename swe cursors
* split setup script into two and create two requirements
* print SWE-agent command documentation
* update swe-agent to default to no custom docs
* update dockerfile with dependency from swe-agent
* make env setup a separate script for .bashrc source
* add wip prompt
* fix mount_dir for ssh_box
* update prompt
* fix mount_dir for ssh_box
* default to use host network
* default to use host network
* move prompt to a separate file
* fix swe-tool plugins; add missing _split_string
* remove hostname from sshbox
* update the prompt with edit functionality
* fix swe-tool plugins; add missing _split_string
* add awaiting into status bar
* fix the bug of additional send event
* remove some print action
* move logic to config.py
* remove debugging comments
* make host network as default
* make WORKSPACE_MOUNT_PATH as abspath
* implement execute_cli via file cp
* Revert "implement execute_cli via file cp" This reverts commit 06f0155.
* add codeact dependencies to default container
* add IPythonRunCellObservation
* add back cache dir and default to /tmp
* make USE_HOST_NETWORK a bool
* revert use host network to false
* add temporarily fix for IPython RUN action
* update prompt
* revert USE_HOST_NETWORK to true since it is not affecting anything
* attempt to fix lint
* remove newline
* fix jupyter execution server
* add `thought` to most action class
* fix unit tests for current action abstraction
* support user exit
* update test cases with the latest action format (added 'thought')
* fix integration test for CodeActAgent by mocking stdin
* only mock stdin for tests with user_responses.log
* remove -exec integration test for CodeActAgent since it is not supported
* remove specific stop word
* fix comments
* improve clarity of prompt
* fix py lint
* fix integration tests
* sandbox might failed in chown due to mounting, but it won't be fatal
* update debug instruction for sshbox
* fix typo
* get RUN_AS_DEVIN and network=host working with app sandbox
* get RUN_AS_DEVIN and network=host working with app sandbox
* attempt to fix the workspace base permission
* sandbox might failed in chown due to mounting, but it won't be fatal
* update sshbox instruction
* remove default user id since it will be passed in the instruction
* revert permission fix since it should be resolved by correct SANDBOX_USER_ID
* the permission issue can be fixed by simply provide correct env var
* remove log
* set sandbox user id to getuid by default
* move logging to initializer
* make the uid consistent across host, app container, and sandbox
* remove hostname as it causes sudo issue
* fix permission of entrypoint script
* make the uvicorn app run as host user uid for jupyter plugin
* add warning message
* update dev md for instruction of running unit tests
* add back unit tests
* revert back to the original sandbox implementation to fix testcases
* revert use host network
* get docker socket gid and usermod instead of chmod 777
* allow unit test workflow to find docker.sock
* make sandbox test working via patch
* fix arg parser that's broken for some reason
* try to fix app build disk space issue
* fix integration test
* Revert "fix arg parser that's broken for some reason" This reverts commit 6cc8961.
* update Development.md
* cleanup integration tests & add exception for CodeAct+execbox
* fix config
* implement user_message action
* fix doc
* fix event dict error
* fix frontend lint
* revert accidentally changes to integration tests
* revert accidentally changes to integration tests

---------

Co-authored-by: Robert Brennan <[email protected]>
Co-authored-by: Robert Brennan <[email protected]>
1 parent ea214d1 commit 1c7cdbe

81 files changed

Lines changed: 2698 additions & 433 deletions

agenthub/README.md

Lines changed: 2 additions & 0 deletions
@@ -26,13 +26,15 @@ The `state` contains:
 Here is a list of available Actions, which can be returned by `agent.step()`:
 - [`CmdRunAction`](../opendevin/action/bash.py) - Runs a command inside a sandboxed terminal
 - [`CmdKillAction`](../opendevin/action/bash.py) - Kills a background command
+- [`IPythonRunCellAction`](../opendevin/action/bash.py) - Execute a block of Python code interactively (in Jupyter notebook) and receives `CmdOutputObservation`. Requires setting up `jupyter` [plugin](../opendevin/sandbox/plugins) as a requirement.
 - [`FileReadAction`](../opendevin/action/fileop.py) - Reads the content of a file
 - [`FileWriteAction`](../opendevin/action/fileop.py) - Writes new content to a file
 - [`BrowseURLAction`](../opendevin/action/browse.py) - Gets the content of a URL
 - [`AgentRecallAction`](../opendevin/action/agent.py) - Searches memory (e.g. a vector database)
 - [`AddTaskAction`](../opendevin/action/tasks.py) - Adds a subtask to the plan
 - [`ModifyTaskAction`](../opendevin/action/tasks.py) - Changes the state of a subtask
 - [`AgentThinkAction`](../opendevin/action/agent.py) - A no-op that allows the agent to add plaintext to the history (as well as the chat log)
+- [`AgentTalkAction`](../opendevin/action/agent.py) - A no-op that allows the agent to add plaintext to the history and talk to the user.
 - [`AgentFinishAction`](../opendevin/action/agent.py) - Stops the control loop, allowing the user to enter a new task

 You can use `action.to_dict()` and `action_from_dict` to serialize and deserialize actions.
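The README's last context line mentions `action.to_dict()` and `action_from_dict` without showing the round trip. The sketch below illustrates the idea only. The class here is a hypothetical stand-in, and the `{'action': ..., 'args': ...}` layout, the `'talk'` tag and the `ACTION_TYPES` registry are assumptions made for illustration, not the project's confirmed format.

```python
from dataclasses import asdict, dataclass


@dataclass
class AgentTalkAction:
    """Hypothetical stand-in for the real action class."""
    content: str
    action: str = 'talk'  # assumed type tag used to pick the class on load

    def to_dict(self) -> dict:
        # Separate the type tag from the remaining fields (assumed layout).
        fields = asdict(self)
        return {'action': fields.pop('action'), 'args': fields}


# Assumed registry mapping a type tag back to its class.
ACTION_TYPES = {'talk': AgentTalkAction}


def action_from_dict(data: dict):
    """Rebuild an action from the dictionary produced by to_dict()."""
    cls = ACTION_TYPES[data['action']]
    return cls(**data.get('args', {}))


original = AgentTalkAction(content='Which file should I edit?')
restored = action_from_dict(original.to_dict())
print(restored == original)
```

Keeping the type tag outside `args` means the loader can choose the class before it looks at any field values.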
Lines changed: 72 additions & 56 deletions
@@ -1,54 +1,37 @@
 import re
 from typing import List, Mapping

+from agenthub.codeact_agent.prompt import EXAMPLES, SYSTEM_MESSAGE
 from opendevin.action import (
     Action,
     AgentEchoAction,
     AgentFinishAction,
+    AgentTalkAction,
     CmdRunAction,
+    IPythonRunCellAction,
+    NullAction,
 )
 from opendevin.agent import Agent
 from opendevin.llm.llm import LLM
 from opendevin.observation import (
     AgentMessageObservation,
     CmdOutputObservation,
+    IPythonRunCellObservation,
+    UserMessageObservation,
 )
-from opendevin.sandbox.plugins import JupyterRequirement, PluginRequirement
-from opendevin.state import State
-
-SYSTEM_MESSAGE = """You are a helpful assistant. You will be provided access (as root) to a bash shell to complete user-provided tasks.
-You will be able to execute commands in the bash shell, interact with the file system, install packages, and receive the output of your commands.
-
-DO NOT provide code in ```triple backticks```. Instead, you should execute bash command on behalf of the user by wrapping them with <execute> and </execute>.
-For example:
-
-You can list the files in the current directory by executing the following command:
-<execute>ls</execute>
-
-You can also install packages using pip:
-<execute> pip install numpy </execute>
-
-You can also write a block of code to a file:
-<execute>
-echo "import math
-print(math.pi)" > math.py
-</execute>
-
-When you are done, execute the following to close the shell and end the conversation:
-<execute>exit</execute>
-"""
-
-INVALID_INPUT_MESSAGE = (
-    "I don't understand your input. \n"
-    'If you want to execute command, please use <execute> YOUR_COMMAND_HERE </execute>.\n'
-    'If you already completed the task, please exit the shell by generating: <execute> exit </execute>.'
+from opendevin.sandbox.plugins import (
+    JupyterRequirement,
+    PluginRequirement,
+    SWEAgentCommandsRequirement,
 )
+from opendevin.state import State


 def parse_response(response) -> str:
     action = response.choices[0].message.content
-    if '<execute>' in action and '</execute>' not in action:
-        action += '</execute>'
+    for lang in ['bash', 'ipython']:
+        if f'<execute_{lang}>' in action and f'</execute_{lang}>' not in action:
+            action += f'</execute_{lang}>'
     return action


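The new `parse_response` appends a closing tag when an opening tag has no match. Later in this diff the completion is requested with `</execute_bash>` and `</execute_ipython>` as stop sequences, so the reply presumably arrives cut off just before the closing tag. The sketch below exercises the function on such a reply. `fake_response` is a hypothetical helper that only mimics the `choices[0].message.content` attribute path; it is not a real client API.

```python
from types import SimpleNamespace


def parse_response(response) -> str:
    # Same logic as the added lines in the diff.
    action = response.choices[0].message.content
    for lang in ['bash', 'ipython']:
        if f'<execute_{lang}>' in action and f'</execute_{lang}>' not in action:
            action += f'</execute_{lang}>'
    return action


def fake_response(text: str):
    """Hypothetical stand-in exposing only choices[0].message.content."""
    message = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


# A reply that was cut off at the stop sequence.
truncated = fake_response('I will list the files.\n<execute_bash>\nls\n')
print(parse_response(truncated))
```

A reply that contains no execute tag, or one that is already closed, passes through unchanged.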
@@ -58,7 +41,20 @@ class CodeActAgent(Agent):
     The agent works by passing the model a list of action-observation pairs and prompting the model to take the next step.
     """

-    sandbox_plugins: List[PluginRequirement] = [JupyterRequirement()]
+    sandbox_plugins: List[PluginRequirement] = [JupyterRequirement(), SWEAgentCommandsRequirement()]
+    SUPPORTED_ACTIONS = (
+        CmdRunAction,
+        IPythonRunCellAction,
+        AgentEchoAction,
+        AgentTalkAction,
+        NullAction
+    )
+    SUPPORTED_OBSERVATIONS = (
+        AgentMessageObservation,
+        UserMessageObservation,
+        CmdOutputObservation,
+        IPythonRunCellObservation
+    )

     def __init__(
         self,
@@ -93,56 +89,76 @@ def step(self, state: State) -> Action:
             assert state.plan.main_goal, 'Expecting instruction to be set'
             self.messages = [
                 {'role': 'system', 'content': SYSTEM_MESSAGE},
-                {'role': 'user', 'content': state.plan.main_goal},
+                {
+                    'role': 'user',
+                    'content': (
+                        f'Here is an example of how you can interact with the environment for task solving:\n{EXAMPLES}\n\n'
+                        f"NOW, LET'S START!\n\n{state.plan.main_goal}"
+                    )
+                },
             ]
         updated_info = state.updated_info
         if updated_info:
             for prev_action, obs in updated_info:
                 assert isinstance(
-                    prev_action, (CmdRunAction, AgentEchoAction)
-                ), 'Expecting CmdRunAction or AgentEchoAction for Action'
-                if isinstance(
-                    obs, AgentMessageObservation
-                ):  # warning message from itself
+                    prev_action, self.SUPPORTED_ACTIONS
+                ), f'{prev_action.__class__} is not supported (supported: {self.SUPPORTED_ACTIONS})'
+                # prev_action is already added to self.messages when returned
+
+                # handle observations
+                assert isinstance(
+                    obs, self.SUPPORTED_OBSERVATIONS
+                ), f'{obs.__class__} is not supported (supported: {self.SUPPORTED_OBSERVATIONS})'
+                if isinstance(obs, (AgentMessageObservation, UserMessageObservation)):
                     self.messages.append(
                         {'role': 'user', 'content': obs.content})
+
+                    # User wants to exit
+                    if obs.content.strip() == '/exit':
+                        return AgentFinishAction()
                 elif isinstance(obs, CmdOutputObservation):
                     content = 'OBSERVATION:\n' + obs.content
                     content += f'\n[Command {obs.command_id} finished with exit code {obs.exit_code}]]'
                     self.messages.append({'role': 'user', 'content': content})
+                elif isinstance(obs, IPythonRunCellObservation):
+                    content = 'OBSERVATION:\n' + obs.content
+                    self.messages.append({'role': 'user', 'content': content})
                 else:
                     raise NotImplementedError(
                         f'Unknown observation type: {obs.__class__}'
                     )
+
         response = self.llm.completion(
             messages=self.messages,
-            stop=['</execute>'],
+            stop=[
+                '</execute_ipython>',
+                '</execute_bash>',
+            ],
             temperature=0.0
         )
         action_str: str = parse_response(response)
-        state.num_of_chars += sum(len(message['content'])
-                                  for message in self.messages) + len(action_str)
+        state.num_of_chars += sum(
+            len(message['content']) for message in self.messages
+        ) + len(action_str)
         self.messages.append({'role': 'assistant', 'content': action_str})

-        command = re.search(r'<execute>(.*)</execute>', action_str, re.DOTALL)
-        if command is not None:
+        if bash_command := re.search(r'<execute_bash>(.*)</execute_bash>', action_str, re.DOTALL):
+            # remove the command from the action string to get thought
+            thought = action_str.replace(bash_command.group(0), '').strip()
             # a command was found
-            command_group = command.group(1)
+            command_group = bash_command.group(1).strip()
             if command_group.strip() == 'exit':
                 return AgentFinishAction()
-            return CmdRunAction(command=command_group)
-            # # execute the code
-            # # TODO: does exit_code get loaded into Message?
-            # exit_code, observation = self.env.execute(command_group)
-            # self._history.append(Message(Role.ASSISTANT, observation))
+            return CmdRunAction(command=command_group, thought=thought)
+        elif python_code := re.search(r'<execute_ipython>(.*)</execute_ipython>', action_str, re.DOTALL):
+            # a code block was found
+            code_group = python_code.group(1).strip()
+            thought = action_str.replace(python_code.group(0), '').strip()
+            return IPythonRunCellAction(code=code_group, thought=thought)
         else:
-            # we could provide a error message for the model to continue similar to
-            # https://github.com/xingyaoww/mint-bench/blob/main/mint/envs/general_env.py#L18-L23
-            # observation = INVALID_INPUT_MESSAGE
-            # self._history.append(Message(Role.ASSISTANT, observation))
-            return AgentEchoAction(
-                content=INVALID_INPUT_MESSAGE
-            )  # warning message to itself
+            # We assume the LLM is GOOD enough that when it returns pure natural language
+            # it want to talk to the user
+            return AgentTalkAction(content=action_str)

     def search_memory(self, query: str) -> List[str]:
         raise NotImplementedError('Implement this abstract method')
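The end of `step()` turns the model's reply into one of four actions: a bash command, an IPython cell, a finish signal, or a plain message to the user. The sketch below reproduces that dispatch in isolation. The dataclasses are hypothetical stand-ins that model only the fields the diff passes (`command`, `code`, `content`, `thought`), and `to_action` is a name introduced here for illustration.

```python
import re
from dataclasses import dataclass


# Hypothetical stand-ins for the opendevin action classes.
@dataclass
class CmdRunAction:
    command: str
    thought: str = ''


@dataclass
class IPythonRunCellAction:
    code: str
    thought: str = ''


@dataclass
class AgentTalkAction:
    content: str


@dataclass
class AgentFinishAction:
    pass


def to_action(action_str: str):
    """Split a reply into an action plus the free text around it (the thought)."""
    if m := re.search(r'<execute_bash>(.*)</execute_bash>', action_str, re.DOTALL):
        # Everything outside the tagged block is kept as the thought.
        thought = action_str.replace(m.group(0), '').strip()
        command = m.group(1).strip()
        if command == 'exit':
            return AgentFinishAction()
        return CmdRunAction(command=command, thought=thought)
    if m := re.search(r'<execute_ipython>(.*)</execute_ipython>', action_str, re.DOTALL):
        thought = action_str.replace(m.group(0), '').strip()
        return IPythonRunCellAction(code=m.group(1).strip(), thought=thought)
    # No tagged block: treat the whole reply as a message to the user.
    return AgentTalkAction(content=action_str)


reply = 'Let me list the files.\n<execute_bash>\nls -la\n</execute_bash>'
print(to_action(reply))
```

`re.DOTALL` lets `.` match newlines, so multi-line commands and cells are captured whole. Because the bash pattern is tried first, a reply containing both kinds of block is treated as a bash command.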

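Before calling the model, `step()` converts each new observation into a `user` chat message and ends the session when the user sends `/exit`. A minimal sketch of that conversion follows. The observation classes are hypothetical stand-ins, and `observation_to_message` is an illustrative helper that does not appear in the commit. The status line here closes with a single bracket, whereas the diff's context line ends with `]]`.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the opendevin observation classes.
@dataclass
class UserMessageObservation:
    content: str


@dataclass
class CmdOutputObservation:
    content: str
    command_id: int
    exit_code: int


@dataclass
class IPythonRunCellObservation:
    content: str


def observation_to_message(obs):
    """Return (chat_message, user_wants_exit) for a single observation."""
    if isinstance(obs, UserMessageObservation):
        wants_exit = obs.content.strip() == '/exit'
        return {'role': 'user', 'content': obs.content}, wants_exit
    if isinstance(obs, CmdOutputObservation):
        content = 'OBSERVATION:\n' + obs.content
        content += f'\n[Command {obs.command_id} finished with exit code {obs.exit_code}]'
        return {'role': 'user', 'content': content}, False
    if isinstance(obs, IPythonRunCellObservation):
        return {'role': 'user', 'content': 'OBSERVATION:\n' + obs.content}, False
    raise NotImplementedError(f'Unknown observation type: {obs.__class__}')


message, done = observation_to_message(CmdOutputObservation('a.txt', 1, 0))
print(message['content'])
```

Every observation is sent back under the `user` role, so the model sees tool output and human feedback in the same channel and tells them apart only by the `OBSERVATION:` prefix.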