starcoder server for huggingface-vscode custom endpoint.
Can't handle distributed inference very well yet.
This fork:
- Refactors the generator code into separate classes
- Adds support for starcoder under ct2fast conversion for faster inference on consumer hardware
- Has a supporting VS Code extension for triggered code completion; see vstarcoder
PS: The rationale for not using huggingface-vscode is explained in the vstarcoder extension README.
pip install -r requirements.txt
python main.py

Fill http://localhost:8000/api/generate/ into Hugging Face Code > Model ID or Endpoint in VSCode.
curl -X POST http://localhost:8000/api/generate/ -d '{"inputs": "", "parameters": {"max_new_tokens": 64}}'
# response = {"generated_text": ""}
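The curl example above can also be driven from Python. This is a minimal client sketch, assuming the server is running locally on port 8000 and returns the `{"generated_text": ...}` shape shown above; the function names (`build_payload`, `generate`) are illustrative, not part of this repo.

```python
import json
from urllib import request

API_URL = "http://localhost:8000/api/generate/"  # endpoint from the curl example


def build_payload(prompt: str, max_new_tokens: int = 64) -> bytes:
    # Request body matches the curl example: "inputs" plus "parameters".
    return json.dumps(
        {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    ).encode("utf-8")


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    req = request.Request(
        API_URL,
        data=build_payload(prompt, max_new_tokens),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        # Response shape from the comment above: {"generated_text": "..."}
        return json.loads(resp.read())["generated_text"]


if __name__ == "__main__":
    print(generate("def fibonacci(n):"))
```

`generate` blocks until the server responds, so long completions may need a `timeout` argument to `urlopen`.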