This guide is intended for non-developers and walks you through installing, setting up, and running the Haru system.
Before you begin, make sure you're installing the system on a machine running either Ubuntu 20.04 or 24.04.
Please note that the Haru system has not been tested on macOS or Windows, and we cannot guarantee compatibility with those platforms.
The Haru system and its applications are packaged using Docker and distributed via the GitHub Container Registry (GHCR).
Most applications require access to GPU resources, so ensure that your system has an NVIDIA GPU and the appropriate drivers installed.
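If you are unsure whether the driver is already set up, a quick sanity check (using `nvidia-smi`, which ships with the NVIDIA driver) is:

```shell
# If nvidia-smi is present the driver is installed; it also reports the GPU
# model and driver version, which is useful when reporting issues.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
else
  echo "nvidia-smi not found - install the NVIDIA driver first"
fi
```

On a correctly configured machine this prints one line per GPU with the model name and driver version.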
-
Install Docker engine
To install Docker on Ubuntu, follow the official guides here:
We strongly recommend following the official documentation, as it is regularly updated.
If you're short on time, you can also install Docker by copying and pasting the following commands into your terminal:
```bash
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

# Install the Docker packages
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Create the docker group
sudo groupadd docker

# Add your user to the docker group
sudo usermod -aG docker $USER
newgrp docker

# Configure Docker to start on boot with systemd
sudo systemctl enable docker.service
sudo systemctl enable containerd.service

# Run hello-world
docker run hello-world
```
-
Install the NVIDIA Container toolkit
To enable GPU support in Docker containers, you’ll need to install the NVIDIA Container Toolkit.
Follow the official guide here:
We highly recommend using the official documentation, as it is regularly updated and includes troubleshooting steps.
If you prefer a quicker setup, you can also run the following commands in your terminal:
```bash
# Configure the production repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Update the packages list from the repository
sudo apt-get update

# Install the NVIDIA Container Toolkit packages
export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
sudo apt-get install -y \
  nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
  nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
  libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
  libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}

# Configure the container runtime by using the nvidia-ctk command
sudo nvidia-ctk runtime configure --runtime=docker

# Restart the Docker daemon
sudo systemctl restart docker
```
-
Verify the installation
```bash
docker ps
```
You should see an empty list, indicating that no containers are currently running.
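To also confirm that containers can reach the GPU, you can run `nvidia-smi` inside a throwaway container. This is a sketch that assumes the NVIDIA runtime configured above; the toolkit injects `nvidia-smi` into the container, so even the plain `ubuntu` image can run it:

```shell
# --gpus all requests GPU access from Docker; if the NVIDIA Container Toolkit
# is configured correctly, the usual nvidia-smi table is printed.
if command -v docker >/dev/null 2>&1; then
  docker run --rm --gpus all ubuntu nvidia-smi || echo "GPU not reachable from Docker"
else
  echo "docker is not on PATH yet"
fi
```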
With Docker and the NVIDIA Container Toolkit installed, you're ready to start downloading and running Haru applications.
-
Authenticate with the Private Registry
Haru applications are hosted in a private registry on GitHub. To authorize your machine to access it, run the following in your terminal:
```bash
export PAT=<your-pat>
echo $PAT | docker login ghcr.io -u <your-github-username> --password-stdin
```
-
Download the Base Image
Pull the base image used by all Haru applications:
```bash
docker pull ghcr.io/haru-project/haru-os:latest
```
-
Confirm the Image Download
Verify that the `haru-os` image is available locally:

```bash
docker images
```

You should see `ghcr.io/haru-project/haru-os` listed in the output.
As mentioned earlier, each Haru application is packaged as a Docker image. To install an application, simply pull its corresponding image from the registry.
Most application images follow this format: `ghcr.io/haru-project/<application-name>:<tag>`, where:

- `<application-name>` is the application name
- `<tag>` is the application version (e.g., `latest`)
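To illustrate the naming scheme, here is how an image reference splits into its parts using plain shell string handling (the `haru-speech:ros2` reference is one of the images used later in this guide; this assumes the registry host itself carries no port):

```shell
image="ghcr.io/haru-project/haru-speech:ros2"
name="${image##*/}"   # drop registry + namespace -> "haru-speech:ros2"
app="${name%%:*}"     # <application-name>        -> "haru-speech"
tag="${name##*:}"     # <tag>                     -> "ros2"
echo "$app $tag"      # prints: haru-speech ros2
```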
The Haru Simulator runs a virtual Haru. It’s perfect for testing and development when you don’t have the physical robot.
To install the simulator, run:
```bash
docker pull ghcr.io/haru-project/hve-simulator@sha256:fb89b358b9c69ea34fedda4781d42158ef392af2ca96debb12d4344a8b81031d
```

The Haru Communication App is Haru’s main application. It’s made up of several Docker images that work together.
To install it, run:
```bash
docker pull ghcr.io/haru-project/strawberry-ros-azure-kinect:latest
docker pull ghcr.io/haru-project/strawberry-ros-faces-module:latest
docker pull ghcr.io/haru-project/strawberry-ros-hands:latest
docker pull ghcr.io/haru-project/strawberry-ros-people:latest
docker pull ghcr.io/haru-project/strawberry-ros-visualization:latest
docker pull ghcr.io/haru-project/strawberry-resource-monitor:latest
docker pull ghcr.io/haru-project/haru-speech:ros2
docker pull ghcr.io/haru-project/haru-llm:feature-eval-test
docker pull ghcr.io/haru-project/haru-agent-reasoner:feature-web-projector
docker pull ghcr.io/haru-project/strawberry-tts-api:latest
docker pull ghcr.io/haru-project/strawberry-tts:ros2
docker pull ghcr.io/haru-project/haru-ipad-action-server:ros2
docker pull ghcr.io/haru-project/haru-web-projector:latest
```

Each application has a download data step (e.g., `bash scripts/download_*_data.sh`). These scripts extract default configuration files from the Docker images onto your host filesystem (into the `data/` directory). This allows you to review and edit configuration files before launching the containers — for example, changing microphone settings, LLM model endpoints, or ROS parameters. You should run these scripts at least once before starting each application for the first time.
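After pulling, you can confirm the images are actually available locally. A minimal sketch, checking just two of the references from the lists above (extend the list to cover the rest):

```shell
# docker image inspect exits non-zero when the image is not present locally.
for ref in ghcr.io/haru-project/haru-os:latest \
           ghcr.io/haru-project/haru-speech:ros2; do
  if docker image inspect "$ref" >/dev/null 2>&1; then
    echo "OK       $ref"
  else
    echo "MISSING  $ref"
  fi
done
```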
If you want to refresh every bundle in one shot, run `bash scripts/download_all_data.sh`. That wrapper runs each download script in sequence, removes the existing `data/` tree before copying, and leaves the final permissions in the state that the downstream services expect.
The Kinect `99-k4a.rules` udev rule still needs to be installed manually; follow the Perception troubleshooting instructions (search for “udev” below) when you first set up the Azure Kinect so you only have to run `sudo` once.
We recommend using the helper script `scripts/compose.sh` for all stacks. It automatically includes the shared `apps/compose.common.yaml` file and the correct `envs/*.env`.
To quickly validate all compose files, run:
```bash
bash scripts/validate_compose.sh
```

If you want to launch all layers from a single compose file, use:

```bash
bash scripts/compose.sh all up --force-recreate -d
```

This uses `envs/all.env` for compose-time variables. Optional services still respect profiles:

```bash
bash scripts/compose.sh all --profile tts --profile webui up --force-recreate -d
```

The Haru Simulator uses a graphical interface, so you need to allow Docker to show windows on your screen. Run the following command in your terminal before starting the simulator:

```bash
xhost +local:docker
```

This gives Docker permission to display graphical applications on your desktop. It is required because Docker containers need access to the host's X11 display server to render GUI windows (e.g., the Unity simulator, RViz, Groot).
Note: You only need to do this once per session, or each time you restart your computer.
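`xhost +local:docker` loosens X11 access control for all local connections, so you may want to undo it when you are done with the simulator (standard `xhost` syntax; the guard just avoids an error in shells without a display):

```shell
# Revoke the local-access permission granted above.
if command -v xhost >/dev/null 2>&1 && [ -n "${DISPLAY:-}" ]; then
  xhost -local:docker || echo "xhost could not reach the X server"
else
  echo "no X display in this shell"
fi
```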
Download data:
```bash
bash scripts/download_simulator_data.sh
```

Start command:

```bash
bash scripts/compose.sh simulator up --force-recreate -d
```

Expected output:
- A Unity Application window appears
Once the software is launched, follow these steps on the Unity Application window:
-
Set the ROS_IP
Click the red button next to the text box to automatically set your ROS_IP.
-
Select the Scene: "Haru Virtual Avatar"
Use the green or yellow buttons next to the scene preview to browse and select "Haru Virtual Avatar".
Tick the "Set as default" box on the bottom left to set the scene selection as default.
-
Start the Scene
Click the blue "Start" button to begin loading the scene.
-
Open Options
Click the orange "Options" button to access the settings.
-
Adjust Scene Configuration
In the "Scene Configuration" tab:
- Enable the "Autoplay scene" checkbox (make sure it is checked).
- Disable the "Enable py_env" checkbox (make sure it is unchecked).
- Enable the "Launch RVIZ" checkbox (make sure it is checked).
Note: RViz may also be started by the perception layer. If you are running both the simulator and the perception stack, you may see two RViz windows. This is expected and will be consolidated in a future release.
-
Adjust Robot Configuration
In the "Haru Configuration" tab:
- Set "TTS Language" to your preferred language.
- Enable the "Publish Haru TFs" checkbox (make sure it is checked).
- (Optional) You can also:
- Adjust the robot’s position in space via the "Haru Base Pose" settings.
-
Apply and Restart
Click the grey "Apply and Restart" button to save changes and reload the scene.
-
Play the Scene
Once the scene reloads, click the green "Play" button (if "Autoplay scene" was checked it starts automatically).
-
Confirm Scene is Active
You should now see the robot’s eyes and mouth appear. Additionally, a new window named RViz should open.
-
Visualize Robot in RViz
You should now see a 3D model of the robot appear in the RViz window.
-
Use the Haru Web Interface
Open your web browser and go to: http://0.0.0.0:7000/haru_web
- Click on the "Haru control" tab.
- From here, you can:
- Control the robot’s motors manually.
- Use Text-To-Speech (TTS) to make the robot speak.
- Trigger Routines, which are pre-programmed movements or actions.
Note: This web interface is still experimental, so you may encounter some limitations or bugs.
To shut down the simulator, run:
```bash
bash scripts/compose.sh simulator down
```

The Haru Communication App uses a graphical interface, so you need to allow Docker to show windows on your screen. Run the following command in your terminal before starting the application:

```bash
xhost +local:docker
```

This gives Docker permission to display graphical applications on your desktop. It is required because Docker containers need access to the host's X11 display server to render GUI windows (e.g., RViz, Groot).
Note: You only need to do this once per session, or each time you restart your computer.
HCA is made up of several layers that work together. We recommend starting them one at a time so you can confirm each one runs correctly before moving on.
-
Perception layer
Handles Haru’s vision and sensory input.
Configuration note: You can change the container configuration in `envs/perception.env`.

Start command:

```bash
bash scripts/compose.sh perception up azure-kinect faces hands people visualization --force-recreate -d
```
Expected output:
- An RViz window appears showing:
- Live camera feed
- Detected skeletons and tracking markers
Related repositories for debug: strawberry-ros-people
-
Speech layer
Enables Haru’s speech recognition and speech input.
Download data:
```bash
bash scripts/download_speech_data.sh
```
Download/Clear models (mandatory before first start):
Important: You must download the speech models before starting the speech layer for the first time. Without this step, the `recognition` container will crash on startup.

```bash
bash scripts/compose.sh speech --profile setup up download-models --force-recreate
```
To clear and re-download models:
```bash
bash scripts/compose.sh speech --profile setup up clear-models --force-recreate
```
Configuration note: You can change the container configuration in `envs/speech.env`. You can change the ROS nodes configuration in `data/configs/haru_speech.yaml`.

Microphone selection and setup: By default, the audio node auto-detects an available microphone (e.g., Azure Kinect Microphone Array), which may not be the device you intend to use. To select a specific microphone, run `arecord -l` on your host to list available capture devices, then update the `audio.device` parameter in `data/speech/configs/haru_speech.yaml` to match the desired device name (e.g., `ZOOM H8`). If you are using an H6/H8/H12 recorder as your microphone input, make sure to set it to Multi Track mode on the device itself before connecting it to your computer. This ensures all input channels are available to the system.

Start command:
```bash
bash scripts/compose.sh speech up audio configure recognition verification --force-recreate -d
```
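To see candidate values for the `audio.device` parameter mentioned above, you can list the ALSA capture devices on the host (a sketch, assuming `alsa-utils` is installed; the `card` lines carry the device names):

```shell
# Each "card N: NAME [...]" line identifies a capture device ALSA can see.
if command -v arecord >/dev/null 2>&1; then
  arecord -l 2>/dev/null | grep '^card' || echo "no capture devices detected"
else
  echo "arecord missing - install it with: sudo apt-get install alsa-utils"
fi
```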
LifeCycle commands: Currently, the Speech layers are started (configure + activate) automatically by setting the `dev_autostart:=true` parameter.

Expected output:
- Container logs on the `recognition` service display:
  - VAD (Voice Activity Detection) status
  - ASR (Automatic Speech Recognition) results for detected speech
Related repositories for debug: haru-speech
-
LLM layer
Provides Haru’s large language model capabilities.
Download data:
```bash
bash scripts/download_llm_data.sh
```
Configuration note: `envs/llm.env` is the non-secret source of truth for config. Secrets (API keys, tokens) live in `envs/llm.secrets.env` (untracked). You can change the LLM server configuration in `data/llm/configs/litellm_server.yaml`. You can change the ROS nodes configuration in `data/llm/configs/haru_llm.yaml`. You can change agent configs (prompts, settings) in `data/llm/agents/`.

Setting up API keys (required for cloud models): The default configuration uses cloud-hosted models. To use them, you need to provide your API keys:

- Copy `envs/llm.secrets.env.example` to `envs/llm.secrets.env`
- Fill in your API keys (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `HF_TOKEN`)

You can change which model each agent uses by editing the `*_MODEL_ID` variables in `envs/llm.env`. The model names must match entries defined in `data/llm/configs/litellm_server.yaml`.

Using local/self-hosted models: If you want to run your own model server (e.g., vLLM, Ollama), add a new model entry to `data/llm/configs/litellm_server.yaml`:

```yaml
- model_name: custom-model
  litellm_params:
    model: <provider>/<model-name>
    api_base: http://<server-host>:<server-port>/v1
```

Then set the corresponding `*_MODEL_ID` in `envs/llm.env` to `custom-model`. For a full list of supported providers and configuration options, see the LiteLLM Providers documentation.

Start command:
```bash
bash scripts/compose.sh llm up action-args dashboard --force-recreate -d
```
Optional profiles:

- Web UI: `bash scripts/compose.sh llm --profile webui up webui --force-recreate -d`
- vLLM: `bash scripts/compose.sh llm --profile vllm up vllm --force-recreate -d`
LifeCycle commands: Currently, the LLM layers are started (configure + activate) automatically by setting the `dev_autostart:=true` parameter.

Expected output:
- Container logs on the `action-args` service confirm:
  - LLM agents are initialized
  - Models are successfully loaded from the server
- LLM Dashboard is running at: http://127.0.0.1:8501
- LLM server is running at: http://127.0.0.1:4050
- LLM Web UI is running at: http://127.0.0.1:8080 (only if the `webui` profile is started)
Related repositories for debug: haru-llm
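Once the layer is up, you can verify the LLM server from the host with a plain HTTP request. This is a sketch: it assumes the LiteLLM server on port 4050 above exposes the OpenAI-compatible `/v1/models` route:

```shell
# Lists the models the server is configured to serve; if the service is
# down, the fallback message is printed instead.
curl -s http://127.0.0.1:4050/v1/models || echo "LLM server not reachable on 127.0.0.1:4050"
```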
-
Reasoner layer
Manages decision-making and task execution.
Download data:
```bash
bash scripts/download_reasoner_data.sh
```
Configuration note: You can change the container configuration in `envs/reasoner.env`.

Microphone and iPad mapping (important for multi-mic setups): Edit `data/reasoner/configs/params/postprocessors_params.yaml` to match your physical setup.

- `mic_id_to_position` — set the position (in meters) of each microphone relative to the robot's position
- `ipad_id_to_mic_id` — map each iPad device ID to a microphone channel ID (enables dynamic naming from iPads)
- `mic_id_to_person_name` — map each microphone channel ID to a default person name (used as fallback)

When iPads are connected and participants set their names on the iPad app, the system automatically uses those names instead of the static `mic_id_to_person_name` values. The mapping flows through the `ipad_id_to_mic_id` configuration: each iPad ID is linked to a mic channel, and the name set on the iPad is used for that channel's participant.

Example (x = front/back, y = left/right, z = up/down):

```yaml
mic_id_to_position: [
  '0: {x: 1.0, y: 0.0, z: 0.0}',   # 1m in front of robot
  '1: {x: 0.0, y: -1.0, z: 0.0}',  # 1m to the right
  '2: {x: 0.0, y: 1.0, z: 0.0}',   # 1m to the left
  '3: {x: -1.0, y: 0.0, z: 0.0}',  # 1m behind
  '4: {x: , y: , z: }'             # unused channel (ignored)
]

ipad_id_to_mic_id: [
  '1: 0',  # iPad 1 linked to mic channel 0
  '2: 1',  # iPad 2 linked to mic channel 1
  '3: 2',  # iPad 3 linked to mic channel 2
]

mic_id_to_person_name: [
  '0: {name: alice}',    # channel 0 assigned to alice
  '1: {name: bob}',      # channel 1 assigned to bob
  '2: {name: charlie}',  # channel 2 assigned to charlie
  '3: {name: dana}',     # channel 3 assigned to dana
  '4: {name: }'          # unused channel (ignored)
]
```
Start command:
```bash
bash scripts/compose.sh reasoner up bt-forest --force-recreate -d
```
LifeCycle commands: Currently, the Reasoner layers are started (configure + activate) automatically by setting the `dev_autostart:=true` parameter.

Expected output:
- Multiple Groot windows should open (one per behavior tree controller):
- Expressivity controller — manages TTS/Routine-driven expressions
- Gaze controller — manages gaze behavior
- iPad students controller — manages requests/responses to the students iPad
- iPad teacher controller — manages requests/responses to the teacher iPad
- Unity controller — manages the projection of photos/videos to the Unity Projector
- Each window displays its behavior tree and its current execution status
Note: The behavior tree controllers depend on action servers run by different services (robot, iPad, projector, ...). If those services are not running, some controllers may fail to load (timeout after ~10s) and fewer Groot windows will appear than expected. Make sure the services you wish to use are running before starting the reasoner.
Related repositories for debug: agent_reasoner
-
Expressive TTS layer (optional)
Enhances Haru with a more expressive and natural-sounding voice.
Download data:
```bash
bash scripts/download_tts_data.sh
```
Start command:
```bash
bash scripts/compose.sh tts --profile tts up gpt-sovits cerevoice-api tts-client --force-recreate -d
```
Optional ROS bridge:
```bash
bash scripts/compose.sh tts --profile tts --profile ros up ros-node --force-recreate -d
```
Configuration note: You can change the container configuration in `envs/tts.env`.

Expected output:
- GPT Sovits API is running at: http://127.0.0.1:9880
- Cerevoice API is running at: http://127.0.0.1:8015
- TTS API is running at: http://127.0.0.1:8022
Related repositories for debug: strawberry-tts
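You can probe the three endpoints above in one go; `%{http_code}` prints the HTTP status for each URL, and `000` means the service is not reachable. A sketch, assuming `curl` is available:

```shell
command -v curl >/dev/null 2>&1 || { echo "curl not installed"; exit 1; }
# One status line per service: GPT-SoVITS, Cerevoice, TTS API.
for url in http://127.0.0.1:9880 http://127.0.0.1:8015 http://127.0.0.1:8022; do
  curl -s -o /dev/null -w "%{http_code} $url\n" "$url" || true
done
```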
-
iPad layer (optional)
Provides an action server for controlling iPads connected to Haru. The iPads can be used as displays for students and teachers during interaction scenarios.
Prerequisites: You must have the teacher and student iPad apps installed via TestFlight (provided by 4i). Make sure the iPads are connected to the same network as the host machine.
Configuration note: You can change the container configuration in `envs/ipad.env`. The `NUM_IPADS` variable controls how many student iPads the action server expects. Set this to the number of iPads running the student application that you have connected to the network. The teacher application wrapper is started automatically — no additional configuration is needed for it.

iPad app settings: On each iPad, open Settings and find the Encouraging Mediator app entries (both teacher and student apps). Configure the following:

- ROS IP — set to the IP address of this machine (the one running the HCA stack)
- Port — set to `9091` (default)
When the connection is successful, the connection icon in the app turns green and you should see connection logs appear in the console.
Start command:
```bash
bash scripts/compose.sh ipad up server --force-recreate -d
```
Expected output:
- Container logs on the `server` service confirm:
  - iPad action server is initialized
  - Connected to iPads
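If the connection icon stays red, first check from the host that the bridge port is actually open (9091, as configured above; `ss` ships with iproute2 on Ubuntu):

```shell
# A listening socket on :9091 means the server side is up; otherwise the
# iPad stack is probably not running yet.
if command -v ss >/dev/null 2>&1; then
  if ss -ltn 2>/dev/null | grep -q ':9091'; then
    echo "port 9091 is listening"
  else
    echo "nothing is listening on port 9091 - is the ipad stack running?"
  fi
else
  echo "ss not available (sudo apt-get install iproute2)"
fi
```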
-
Projector layer (optional)
Enables a web-based projector display, allowing Haru to project images and videos onto a surface during interactions.
Projector resources are downloaded as part of the reasoner data and mounted into the projector service.
Configuration note: You can change the container configuration in `envs/projector.env`. Projector resources are managed through the behavior tree system. You can update or replace the resources in `data/reasoner/projector/` to change what content is available for projection.
```bash
bash scripts/compose.sh projector up server --force-recreate -d
```
Expected output:
- Projector web server is running at: http://127.0.0.1:8081
The User Application provides the Episode Builder, a web interface for creating and managing task episodes.
Configuration note: You can change the container configuration in `envs/user.env`.
Start command:
```bash
bash scripts/compose.sh user up episode-builder --force-recreate -d
```

Expected output:
- Episode Builder web UI is running at: http://127.0.0.1:8551
Related repositories for debug: simple-haru-episode-builder
Once all layers are running, start a test task with:
```bash
bash scripts/compose.sh reasoner up reasoner context-manager execute-task-test
```

Once all layers are running, start a scenario task with:

```bash
bash scripts/compose.sh reasoner up reasoner context-manager execute-task-scenario
```

In the simulator or on the real robot, Haru begins carrying out the assigned task.
To shut down all layers and running task, run:
```bash
bash scripts/compose.sh perception down
bash scripts/compose.sh speech down
bash scripts/compose.sh llm down
bash scripts/compose.sh reasoner down
bash scripts/compose.sh tts down
bash scripts/compose.sh ipad down
bash scripts/compose.sh projector down
bash scripts/compose.sh user down
```

Sometimes, you may need to adjust your settings if things don’t work as expected. Here are a couple of common issues and how to fix them:
-
Unity Application Won’t Start
If the Unity interface fails to open:
- In both your host system and the Haru Simulator container, run:
  ```bash
  env | grep DISPLAY
  ```

- Compare the values. They must be identical.
- If they are different, edit the file `envs/simulator.env` and replace `DISPLAY=${DISPLAY:-=:0}` with the actual value from your host system.
-
No Sound in Simulation
If you can’t hear the sound of clicks or the robot in the simulation:
- Run `aplay -l` on your host machine to see the available audio devices.
- Set the `AUDIO_CARD` environment variable in the `envs/simulator.env` file to the device name or ID (e.g., `1`, `2`, ...).
-
LLMs Don't Connect
If you cannot access the LLMs:
- If you are using LLMs that need an API key, make sure to provide it in the env file.
- Make sure you can chat with the LLMs you want to use. You can use the WebUI running at http://0.0.0.0:8080.
- For each LLM agent, you can set its LLM model by setting the following env variable: `{AGENT_NAME}_MODEL_ID`.
-
Face is Not Recognized
If your face is not being recognized by the system:
- Set the following environment variables in the `envs/perception.env` file:

  ```bash
  LEARNING_PERSON_NAME=<your-name>  # the face will be registered as this name
  LEARNING_PERSON_ID=<your-id>      # the person id to register (visible from RVIZ)
  ```
- Run the face registration process:
  ```bash
  bash scripts/compose.sh perception up register-face --force-recreate
  ```
- The system will begin collecting frames of your face for training.
- Once training is complete (typically takes about 5 minutes), your face should be recognized correctly.
-
View Logs for More Information
If the problem persists, checking the logs can help identify errors:
```bash
docker logs -f <container_name>
```
Replace `<container_name>` with the name of your running container (you can check running containers with `docker ps`). The logs will often include warnings or error messages you can use for troubleshooting.
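To get the exact container names, you can print just the names column (standard `docker ps` formatting):

```shell
# One running container name per line; the fallback message covers machines
# where Docker is missing or the daemon is not running.
docker ps --format '{{.Names}}' 2>/dev/null || echo "docker unavailable - is the daemon running?"
```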