tao3k/wendao-datascience-py

wendao-datascience

wendao-datascience is an LLM-oriented data science facade over Wendao query and get outputs.

It exists to keep one workflow simple:

  1. normalize Wendao payloads, rows, or Arrow tables into one stable dataset object
  2. expose Arrow and Polars views over that dataset
  3. summarize the dataset in a form that an LLM can consume quickly
  4. help an LLM write one or more Python scripts that together form a strong analyzer
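
As a rough illustration of steps 1 to 3, the sketch below normalizes a payload into rows, pivots the rows into columns, and renders a compact Markdown profile. It uses only the standard library. The helper names (normalize_rows, to_columns, profile_markdown) are hypothetical and are not part of the wendao-datascience API, which works with Arrow and Polars rather than plain dicts.

```python
from typing import Any


def normalize_rows(payload: Any) -> list[dict[str, Any]]:
    """Accept either {"rows": [...]} or a bare list of row dicts (assumed shapes)."""
    rows = payload["rows"] if isinstance(payload, dict) else payload
    return [dict(row) for row in rows]


def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot rows into a columnar mapping; keys missing from a row become None."""
    names: list[str] = []
    for row in rows:
        for key in row:
            if key not in names:
                names.append(key)
    return {name: [row.get(name) for row in rows] for name in names}


def profile_markdown(columns: dict[str, list[Any]]) -> str:
    """Summarize each column as one Markdown table row: name, type, null count."""
    lines = ["| column | type | nulls |", "| --- | --- | --- |"]
    for name, values in columns.items():
        kinds = sorted({type(v).__name__ for v in values if v is not None})
        nulls = sum(v is None for v in values)
        lines.append(f"| {name} | {'/'.join(kinds) or 'unknown'} | {nulls} |")
    return "\n".join(lines)


payload = {
    "rows": [
        {"doc_id": "doc-1", "language": "python", "score": 0.91},
        {"doc_id": "doc-2", "language": "rust", "score": 0.77},
    ]
}
columns = to_columns(normalize_rows(payload))
print(profile_markdown(columns))
```

A short, fixed-width summary like this is cheap for an LLM to read, because it gives the model the schema without the full data.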

This package does not own Wendao transport. wendao-core-lib and wendao-arrow-interface remain the transport and session-facing layers. This package starts after the Wendao data is already materialized.

Upstream Pins

The package pins wendao-core-lib and wendao-arrow-interface through [tool.uv.sources] so both packages resolve from the same upstream Git revision over https://github.com/tao3k/xiuxian-artisan-workshop.

Quick Start

uv sync
uv run pytest
uv run python examples/scripted_repo_search_first_implementation.py

from wendao_datascience import WendaoDataset

payload = {
    "rows": [
        {"doc_id": "doc-1", "language": "python", "score": 0.91},
        {"doc_id": "doc-2", "language": "rust", "score": 0.77},
    ]
}

dataset = WendaoDataset.from_query_payload(payload, route="/search/repos/main")
frame = dataset.to_polars()
profile = dataset.profile()
request = dataset.build_script_request("Summarize score distribution by language")

print(frame)
print(profile.to_markdown())
print(request.prompt)
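
For orientation, here is a minimal sketch of the kind of analyzer script an LLM might return for the request "Summarize score distribution by language". It works on the same sample rows using only the standard library. The function name score_summary_by_language is hypothetical, and a script generated against the real package would more likely operate on the Polars frame.

```python
from collections import defaultdict
from statistics import mean

# Same sample rows as the Quick Start payload.
rows = [
    {"doc_id": "doc-1", "language": "python", "score": 0.91},
    {"doc_id": "doc-2", "language": "rust", "score": 0.77},
]


def score_summary_by_language(rows):
    """Group scores by language and report count, min, mean, and max."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["language"]].append(row["score"])
    return {
        language: {
            "count": len(scores),
            "min": min(scores),
            "mean": mean(scores),
            "max": max(scores),
        }
        for language, scores in sorted(grouped.items())
    }


summary = score_summary_by_language(rows)
for language, stats in summary.items():
    print(language, stats)
```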

Documentation

Primary package positioning and the LLM-facing goal are documented in docs/llm_analyzer_mission.md.

The first concrete implementation is examples/scripted_repo_search_first_implementation.py, which turns one WendaoArrowSession repo-search result into:

  1. one WendaoDataset
  2. one repo-search overview
  3. one LLM-ready script request
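
To make "LLM-ready script request" concrete, the sketch below assembles a prompt from a goal, a source route, and a column schema. This is an assumption about what such a request could contain, not the prompt format that build_script_request actually produces. The function build_prompt and its parameters are hypothetical.

```python
def build_prompt(goal: str, route: str, schema: dict[str, str]) -> str:
    """Combine the analysis goal with dataset context into one prompt string."""
    columns = "\n".join(f"- {name}: {dtype}" for name, dtype in schema.items())
    return (
        f"Goal: {goal}\n"
        f"Source route: {route}\n"
        f"Columns:\n{columns}\n"
        "Write one Python script that loads the dataset and answers the goal."
    )


prompt = build_prompt(
    "Summarize score distribution by language",
    "/search/repos/main",
    {"doc_id": "str", "language": "str", "score": "float"},
)
print(prompt)
```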
