
Adding scripts & Readme steps for vLLM based workloads over IBM LSF #23

Open

arshabbir wants to merge 2 commits into IBMSpectrumComputing:master from arshabbir:LSF-vLLM

Conversation

@arshabbir

No description provided.

Comment thread LSF-vLLM/README.md
This repository shows how to run a long-running vLLM inference service under IBM LSF,
validate it through a standard OpenAI-compatible API, access it from a Jupyter notebook,
and reuse the same service from a downstream batch job.


A little wordsmithing:

In this repository we demonstrate how to deploy a large-language model inference service on an LSF cluster using vLLM. The service exposes an OpenAI-compatible API. We show how various clients can use the model for interactive or batch inference.

Author


Addressed in the latest commit
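To make the "OpenAI-compatible API" claim in this intro concrete: a validation call could be built as in the sketch below. The port 8001, model `Qwen/Qwen3-0.6B`, and key `local-vllm-key` are the demo defaults discussed elsewhere in this PR; the helper name is ours, not from the repo's scripts.

```python
import json

# Build an OpenAI-compatible /v1/chat/completions request for the
# LSF-hosted vLLM service. Endpoint, model, and key are the demo
# defaults from this PR; adjust them to your deployment.
def build_chat_request(prompt,
                       base_url="http://127.0.0.1:8001/v1",
                       model="Qwen/Qwen3-0.6B",
                       api_key="local-vllm-key"):
    url = f"{base_url}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, json.dumps(payload)

url, headers, body = build_chat_request("Say hello in one word.")
print(url)
```

Sending this with `curl` or `requests.post(url, headers=headers, data=body)` should return a standard chat-completion JSON response if the service is up.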

Comment thread LSF-vLLM/README.md
- python3 installed
- curl installed
- network access from the execution host to pull the vLLM image and model
- a single-node IBM LSF setup is sufficient for this implementation

@michaelspriggs Apr 20, 2026


I think we also require a shared $HOME directory, correct? That is not strictly necessary for LSF, but is a common deployment.

Author


Added this in the latest README
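A quick local sanity check for the tool prerequisites listed in this hunk could look like the sketch below. It only checks the execution host's PATH; the shared `$HOME` requirement raised above must be verified across hosts and is not covered here.

```python
import shutil

# Check that the README's tool prerequisites (python3, curl) are on
# PATH on this host. shutil.which returns None when a tool is missing.
def check_prereqs(tools=("python3", "curl")):
    return {tool: shutil.which(tool) is not None for tool in tools}

status = check_prereqs()
for tool, found in status.items():
    print(f"{tool}: {'ok' if found else 'MISSING'}")
```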

Comment thread LSF-vLLM/README.md
- scripts/batch_client.py
Reads a prompt corpus and sends requests to the registered vLLM service.
- notebook/LSF_vLLM_Client.ipynb
Jupyter notebook for interactive validation against the IBM LSF-managed runtime.


There is no notebook subdirectory

Author


Addressed in the latest commit
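The `batch_client.py` flow described in this hunk (read a prompt corpus, send each prompt to the registered service) could be sketched roughly as below. The actual script in this PR may differ; here the network call is left out and only the corpus-to-payload step is shown, with illustrative prompts.

```python
import tempfile
from pathlib import Path

# Sketch of the batch-client input stage: one prompt per line in the
# corpus file; blank lines are skipped. The real scripts/batch_client.py
# in this PR may differ in detail.
def load_prompts(path):
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]

def to_payload(prompt, model="Qwen/Qwen3-0.6B"):
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}]}

corpus = Path(tempfile.mkdtemp()) / "prompts.txt"
corpus.write_text("What is LSF?\n\nWhat is vLLM?\n")
payloads = [to_payload(p) for p in load_prompts(corpus)]
print(len(payloads))  # blank lines are skipped
```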

Comment thread LSF-vLLM/README.md
- notebook/LSF_vLLM_Client.ipynb
Jupyter notebook for interactive validation against the IBM LSF-managed runtime.
- corpus/prompts.txt
Sample prompt corpus for downstream batch validation.


There is no corpus subdirectory

Author


Addressed in the latest commit

Comment thread LSF-vLLM/README.md
Prerequisites
-------------
- IBM LSF installed and operational
- podman installed


I guess this must be installed on all compute nodes of the cluster, right? Not sure whether we need to use the LSF podman integration? I guess likely not (which is fine).

Author


Yeah, we don't need the LSF podman integration.

Comment thread LSF-vLLM/README.md

```bash
cp corpus/prompts.txt ~/lsf_vllm_poc/corpus/prompts.txt
```


I suggest including some lines telling readers to clone this repo and cd into the base directory. Just make it easy for people to cut and paste lines so that they can reproduce this without having to think too much.

Also, need to update corpus -> scripts

Author


I have addressed it and added the below. Hope this is fine; please verify:

```bash
git clone https://github.com/IBMSpectrumComputing/lsf-integrations.git
cd lsf-integrations/LSF-vLLM
```

After this, follow the step-by-step instructions given below.

Comment thread LSF-vLLM/README.md

```bash
MODEL=Qwen/Qwen3-0.6B PORT=8001 API_KEY=local-vllm-key
```

@michaelspriggs Apr 20, 2026


How do you do this? Grep a line in one of the config files?

Sounds like the step should be to update API_KEY. Where do users get this key from? Should this be a prerequisite?

Author


I have added the note below in the updated README.

NOTE:
Default demo API key: local-vllm-key

The service script uses this value unless API_KEY is explicitly set before submission.
If you choose a different value, update the curl commands, notebook cells, and batch client inputs accordingly.
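The fallback rule this note describes (use the demo key unless `API_KEY` is set before submission) could be mirrored on the client side as sketched below; the helper name is ours.

```python
import os

# The service script falls back to the demo key unless API_KEY is
# explicitly set in the environment before submission. A client can
# resolve the key with the same rule.
def resolve_api_key(env=os.environ):
    return env.get("API_KEY", "local-vllm-key")

print(resolve_api_key({}))                        # unset -> demo default
print(resolve_api_key({"API_KEY": "my-secret"}))  # explicit value wins
```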

Comment thread LSF-vLLM/README.md

```
http://127.0.0.1:8001/v1
```

@michaelspriggs Apr 20, 2026


For this one, it looks like you are starting the notebook on the cluster node and then connecting from the laptop through an SSH tunnel.

You should mention which host each command is run on (laptop vs. LSF compute host), and note that the URL above is the one to open in the web browser on the laptop.

Also mention that a prerequisite for this is SSH access to a cluster node.

Author


Updated the README with these steps explaining where to run the commands

Comment thread LSF-vLLM/README.md
bjobs
bpeek ${BATCH_JOBID}
cat ~/lsf_vllm_poc/results/batch_${JOBID}.jsonl
```


Overall, I suggest breaking this into a few sections:

(1) Deploy the LLM
  • deploy
  • monitor
  • kill

(2) Use the LLM
  • curl
  • Jupyter
  • LSF job
Author


Please review the newly restructured README file.
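On the results file shown in this hunk (`~/lsf_vllm_poc/results/batch_${JOBID}.jsonl`): JSONL means one JSON object per line, so it can be inspected as sketched below. The field names `prompt` and `completion` are illustrative only, not taken from the PR's scripts.

```python
import io
import json

# Read a JSONL batch-results stream: one JSON object per non-blank
# line. Field names in the sample are illustrative, not from the PR.
def read_jsonl(stream):
    return [json.loads(line) for line in stream if line.strip()]

sample = io.StringIO(
    '{"prompt": "What is LSF?", "completion": "A workload scheduler."}\n'
    '{"prompt": "What is vLLM?", "completion": "An inference engine."}\n'
)
records = read_jsonl(sample)
print(len(records))
```

The same function works on an open file handle for the real results path.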


@michaelspriggs left a comment


See my inline comments

@arshabbir force-pushed the LSF-vLLM branch 5 times, most recently from 76cc627 to cd5baca on April 23, 2026 14:23
