wendao-datascience is an LLM-oriented data science facade over the outputs of
Wendao query and get calls.
It exists to keep one workflow simple:
- normalize Wendao payloads, rows, or Arrow tables into one stable dataset object
- expose Arrow and Polars views over that dataset
- summarize the dataset in a form that an LLM can consume quickly
- help an LLM write one or more Python scripts that together form a strong analyzer
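
The first and third steps of that workflow can be sketched without the package itself. The following is a minimal stand-in, assuming a payload shaped like `{"rows": [...]}`. `normalize_rows` and `profile_rows` are hypothetical helpers written for illustration; they are not part of the wendao-datascience API, and the real `WendaoDataset` may behave differently.

```python
from collections import Counter
from typing import Any


def normalize_rows(source: Any) -> list[dict[str, Any]]:
    """Accept a {"rows": [...]} payload or a bare list of row dicts (hypothetical helper)."""
    rows = source["rows"] if isinstance(source, dict) else source
    # Copy each row so later edits do not mutate the caller's data.
    return [dict(row) for row in rows]


def profile_rows(rows: list[dict[str, Any]]) -> str:
    """Build a short, LLM-readable summary: row count plus the value types seen per column."""
    column_types: dict[str, Counter] = {}
    for row in rows:
        for name, value in row.items():
            column_types.setdefault(name, Counter())[type(value).__name__] += 1
    lines = [f"rows: {len(rows)}"]
    for name, kinds in column_types.items():
        lines.append(f"- {name}: {', '.join(sorted(kinds))}")
    return "\n".join(lines)


payload = {"rows": [{"doc_id": "doc-1", "score": 0.91}, {"doc_id": "doc-2", "score": 0.77}]}
rows = normalize_rows(payload)
summary = profile_rows(rows)
print(summary)
```

The point of the sketch is the shape of the contract: whatever the input form, downstream code sees one row layout and one compact text summary.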
This package does not own Wendao transport. wendao-core-lib and
wendao-arrow-interface remain the transport and session-facing layers. This
package starts after the Wendao data is already materialized.
The package pins wendao-core-lib and wendao-arrow-interface through
[tool.uv.sources] so both packages resolve from the same upstream Git
revision over https://github.com/tao3k/xiuxian-artisan-workshop.
uv sync
uv run pytest
uv run python examples/scripted_repo_search_first_implementation.py

from wendao_datascience import WendaoDataset
payload = {
    "rows": [
        {"doc_id": "doc-1", "language": "python", "score": 0.91},
        {"doc_id": "doc-2", "language": "rust", "score": 0.77},
    ]
}
dataset = WendaoDataset.from_query_payload(payload, route="/search/repos/main")
frame = dataset.to_polars()
profile = dataset.profile()
request = dataset.build_script_request("Summarize score distribution by language")
print(frame)
print(profile.to_markdown())
print(request.prompt)

Primary package positioning and the LLM-facing goal are documented in
docs/llm_analyzer_mission.md.
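
The quick-start request above asks for "Summarize score distribution by language". As a rough illustration of the kind of analyzer script an LLM might write in response, here is a standard-library sketch over the same two rows. `score_distribution` is a hypothetical name, and the script does not depend on or reflect the package's own API.

```python
from collections import defaultdict
from statistics import mean

# Same sample rows as the quick-start payload.
rows = [
    {"doc_id": "doc-1", "language": "python", "score": 0.91},
    {"doc_id": "doc-2", "language": "rust", "score": 0.77},
]


def score_distribution(rows):
    """Group scores by language and report count, min, mean, and max per group."""
    scores = defaultdict(list)
    for row in rows:
        scores[row["language"]].append(row["score"])
    return {
        language: {
            "count": len(values),
            "min": min(values),
            "mean": mean(values),
            "max": max(values),
        }
        for language, values in sorted(scores.items())
    }


distribution = score_distribution(rows)
for language, stats in distribution.items():
    print(language, stats)
```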
The first concrete implementation is
examples/scripted_repo_search_first_implementation.py,
which turns one WendaoArrowSession repo-search result into:
- one WendaoDataset
- one repo-search overview
- one LLM-ready script request