This is an experimental project and a work in progress. It is a custom MCP (Model Context Protocol) server that runs locally and lets a tool like Claude Desktop search the public Borealis Dataverse, Canada's research data repository, directly from conversations.
Note: This MCP server was built with help from Claude for the Python parts.
This connector allows:
- Searching through research datasets from Canadian institutions
- Filtering results by specific institutions (70+ Canadian institutions supported)
- Filtering by geographic coverage (country, province/state, city that the data is about)
- Returning formatted results with titles, descriptions, DOI links, and authors
- Accessing both published and unpublished datasets (unpublished only with API key and access permissions)
- Retrieving dataset metadata when asking for more information about specific datasets
- Listing files within datasets with filtering and pagination
- Retrieving and viewing text-based dataset files directly in chat (under 5MB, configurable line limit)
- Using boolean operators (AND/OR/NOT) in searches (case-insensitive)
- "Search Borealis for datasets about pollination"
- "Use my Borealis search tool to find datasets from UofT about bees"
- "Show me datasets from the last 5 years from UBC about forestry"
- "Find datasets about healthcare in Saskatchewan"
- "Tell me more about the SynPAIN dataset"
- "What files are in this dataset?"
- "Can we look at the readme file together?"
- Claude Desktop (used in these instructions as an example MCP host)
- Python 3.10+ (the mcp package does not install on older versions)
- A Borealis account and API key (optional - public searches work without authentication)
pip install mcp httpx python-docx
python-docx is required for Word document (.docx) extraction. If you skip it, the server will still work: .docx files will return a download link with instructions to install the library.
git clone https://github.com/jesswhyte/borealis_dataverse_mcp.git
cd borealis_dataverse_mcp
chmod +x borealis_server.py
- Go to https://borealisdata.ca
- Log in or create an account
- Navigate to your account settings
- Generate an API token
- Copy the token (format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
Edit your Claude Desktop configuration file, e.g. if you're on a Mac:
nano ~/Library/Application\ Support/Claude/claude_desktop_config.json
Add the following configuration (you will need to change the path to borealis_server.py and fill in your API key):
{
  "mcpServers": {
    "borealis-dataverse": {
      "command": "python3",
      "args": [
        "/absolute/path/to/borealis_server.py"
      ],
      "env": {
        "BOREALIS_API_KEY": "your-api-key-here"
      }
    }
  }
}
Important:
- Replace /absolute/path/to/borealis_server.py with the full path to borealis_server.py in your cloned repository
- Replace your-api-key-here with your actual Borealis API key
- If you don't have an API key, omit the env section entirely; public searches will still work
- Quit Claude Desktop completely (⌘+Q)
- Reopen Claude Desktop
Check the logs to confirm the server started successfully:
cat ~/Library/Logs/Claude/mcp*.log | grep borealis
You should see messages indicating the server started and connected.
Open a new conversation in Claude Desktop and try:
Search Borealis for datasets about pelagic species from UBC
Search Borealis for datasets about [your topic]
Find datasets from [Institution Name] about [topic]
Supported institution formats:
- Full names: "University of Toronto", "McGill University"
- Some short names: "UBC", "U of T"
- Common abbreviations: "UAlberta", "UWaterloo"
Find datasets about [city/province/country]
Note: Geographic filters indicate what region the data is about (e.g., "datasets about Halifax"), not where the researchers are located.
After viewing search results, you can ask for detailed metadata:
Tell me more about the [dataset name]
This retrieves metadata including descriptions, all authors with affiliations, keywords, license details, and file information.
The tool supports:
- Number of results: Request more or fewer results (max 100) in your prompt
- Sorting: Sort by relevance (default), date (newest first), or name (alphabetical)
- Type filtering: Filter by dataset, dataverse, or file
- Combined filters: Mix university, geographic, and keyword filters in a single query
Includes mappings for 70+ Canadian institutions. See borealis_server.py for the complete list.
The MCP server has four tools:
Search for datasets. Supports boolean operators (AND/OR/NOT), which are case-insensitive and automatically normalized.
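As a rough illustration of the operator normalization described above, here is a minimal sketch. It assumes that "normalized" means uppercasing and/or/not before the query is sent, since Solr-style query parsers only treat the uppercase forms as operators. The function name is hypothetical and is not taken from borealis_server.py.

```python
import re

# Matches the words and/or/not in any letter case, as whole words only.
_BOOLEAN_WORD = re.compile(r"\b(and|or|not)\b", re.IGNORECASE)


def normalize_boolean_operators(query: str) -> str:
    """Uppercase boolean operators, leaving double-quoted phrases untouched.

    Hypothetical helper: the real server may normalize differently.
    """
    # Splitting with a capture group keeps the quoted phrases in the result.
    parts = re.split(r'("[^"]*")', query)
    normalized = []
    for part in parts:
        if len(part) >= 2 and part.startswith('"') and part.endswith('"'):
            normalized.append(part)  # quoted phrase: keep verbatim
        else:
            normalized.append(
                _BOOLEAN_WORD.sub(lambda m: m.group(1).upper(), part)
            )
    return "".join(normalized)
```

Quoted phrases are skipped so that an exact-phrase search such as "salt and pepper" is not turned into a boolean expression.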
Retrieve metadata for a specific dataset
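To make the metadata step more concrete, the sketch below pulls display fields out of a schema.org-style JSON-LD record. The key names (name, description, author, keywords, license) are assumptions based on common schema.org usage, and the function name is hypothetical; check an actual Borealis export before relying on them.

```python
from typing import Any


def _as_list(value: Any) -> list:
    """JSON-LD values may be a single item or a list; always return a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def summarize_jsonld(record: dict) -> dict:
    """Extract display fields from a JSON-LD dataset record (assumed layout)."""
    authors = []
    for author in _as_list(record.get("author")):
        if not isinstance(author, dict):
            authors.append({"name": str(author), "affiliation": None})
            continue
        affiliation = author.get("affiliation")
        if isinstance(affiliation, dict):  # nested Organization object
            affiliation = affiliation.get("name")
        authors.append({"name": author.get("name"), "affiliation": affiliation})

    license_info = record.get("license")
    if isinstance(license_info, dict):
        license_info = license_info.get("url") or license_info.get("name")

    return {
        "title": record.get("name"),
        "description": " ".join(str(d) for d in _as_list(record.get("description"))),
        "authors": authors,
        "keywords": [str(k) for k in _as_list(record.get("keywords"))],
        "license": license_info,
    }
```

The defensive handling of single values versus lists is deliberate: JSON-LD producers are free to emit either form.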
List all files in a specific dataset with support for:
- Pagination (limit and offset parameters)
- File type filtering (search by extension or filename)
- File metadata (size, type, access restrictions)
- MD5 checksums for verification
- File IDs for retrieval
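The filtering and pagination behaviour listed above could be sketched as follows. This is an illustration only: it works on an in-memory list of file records, and the record key filename and the function name are hypothetical rather than taken from the server code.

```python
from typing import Optional


def list_files_page(
    files: list,
    name_filter: Optional[str] = None,
    limit: int = 20,
    offset: int = 0,
) -> dict:
    """Filter file records by filename or extension, then return one page."""
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")

    if name_filter:
        needle = name_filter.lower()
        # Case-insensitive substring match covers both ".csv" and "readme".
        files = [f for f in files if needle in f["filename"].lower()]

    page = files[offset : offset + limit]
    return {
        "total": len(files),          # matches after filtering
        "offset": offset,
        "returned": len(page),
        "has_more": offset + len(page) < len(files),
        "files": page,
    }
```

Returning total and has_more alongside the page lets the caller decide whether to request the next offset.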
Download and retrieve file content with intelligent handling:
- Text-based files (CSV, TXT, DAT, R, Python, etc.) displayed directly in chat
- Word documents (.docx) extracted as plain text with heading structure preserved (requires python-docx)
- 5MB maximum file size for chat display
- Configurable line limit (max_lines, default 100, max 2000); when truncated, the response explains the limit and offers to re-fetch with a higher value
- Pass doi to include a direct download link in truncation messages
- PDFs return a direct download URL and a Claude Desktop drag-and-drop tip
- Other binary files return a direct download URL
- File format detection and validation
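The size and line limits above can be illustrated with a small sketch. The numbers (100 default, 2000 maximum, 5MB) come from the list above; the function names, the notice wording, and the reading of 5MB as 5 * 1024 * 1024 bytes are assumptions.

```python
DEFAULT_MAX_LINES = 100
HARD_MAX_LINES = 2000
MAX_DISPLAY_BYTES = 5 * 1024 * 1024  # assumed interpretation of "5MB"


def fits_in_chat(size_bytes: int) -> bool:
    """True if a file is small enough to display directly in chat."""
    return size_bytes <= MAX_DISPLAY_BYTES


def truncate_for_chat(text: str, max_lines: int = DEFAULT_MAX_LINES):
    """Return (text_to_show, notice); notice is None when nothing was cut."""
    # Clamp the requested limit into the supported range.
    max_lines = max(1, min(max_lines, HARD_MAX_LINES))
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text, None

    shown = "\n".join(lines[:max_lines])
    notice = (
        f"Showing the first {max_lines} of {len(lines)} lines. "
        f"Re-fetch with a higher max_lines (up to {HARD_MAX_LINES}) to see more."
    )
    return shown, notice
```

Keeping the notice separate from the text lets the caller decide where to place it, for example next to a download link.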
- Language: Python 3
- MCP SDK: Official Python MCP library from Anthropic
- HTTP Library: httpx for async API calls
- Configuration: Environment variables and JSON config file
- Receives a search or metadata request from the user
- The MCP server translates institution names to dataverse identifiers and formats queries
- The server queries the Borealis API with appropriate filters
- Results are parsed and formatted
- Returns structured results with DOI links and metadata
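The translation and query-building steps above might look roughly like the sketch below. The parameter names q, type, subtree, fq, and per_page are Dataverse Search API parameters, and the base URL is the one used elsewhere in this README. Everything else is an assumption for illustration: the alias table is a made-up excerpt (the real mappings live in borealis_server.py), the country field name in the filter query is a guess, and the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

SEARCH_URL = "https://borealisdata.ca/api/search"
MAX_PER_PAGE = 100

# Hypothetical excerpt of an institution-name-to-dataverse mapping.
INSTITUTION_ALIASES = {
    "university of toronto": "toronto",
    "u of t": "toronto",
    "university of british columbia": "ubc",
    "ubc": "ubc",
}


def build_search_params(
    query: str,
    institution: Optional[str] = None,
    country: Optional[str] = None,
    per_page: int = 10,
) -> dict:
    """Translate user-facing filters into search parameters."""
    params = {
        "q": query,
        "type": "dataset",
        "per_page": max(1, min(per_page, MAX_PER_PAGE)),  # clamp to the limit
    }
    if institution:
        # Case-insensitive lookup, as described for institution matching.
        alias = INSTITUTION_ALIASES.get(institution.strip().lower())
        if alias is None:
            raise ValueError(f"Unknown institution: {institution}")
        params["subtree"] = alias
    if country:
        # Assumed field name; verify against the Borealis index.
        params["fq"] = f'country:"{country}"'
    return params


def search_url(params: dict) -> str:
    """Render the final request URL for inspection or logging."""
    return f"{SEARCH_URL}?{urlencode(params)}"
```

For example, build_search_params("forestry", institution="UBC") yields a subtree filter for the UBC dataverse, and search_url() shows the URL that would be requested.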
Give it a second or restart Claude: Sometimes the MCP server doesn't start up fast enough. Try restarting Claude and wait a moment before trying again.
Check the logs: For example...
cat ~/Library/Logs/Claude/mcp*.log | tail -50
Common issues:
- Verify the file path in your config is correct and absolute
- Ensure the Python file is executable (chmod +x borealis_server.py)
- Check that Python 3 is available:
which python3
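The path check from the list above can also be automated. The following is a small sketch, not part of the project: it reads a Claude Desktop config file and reports problems with the borealis-dataverse entry. The function name is hypothetical; the config keys are the ones shown in the configuration example earlier.

```python
import json
from pathlib import Path


def check_mcp_config(config_path: str, server_name: str = "borealis-dataverse") -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    try:
        config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        return [f"Cannot read config: {exc}"]

    if not isinstance(config, dict):
        return ["Config file does not contain a JSON object"]

    server = (config.get("mcpServers") or {}).get(server_name)
    if not isinstance(server, dict):
        return [f"No '{server_name}' entry under mcpServers"]

    args = server.get("args") or []
    if not args:
        return ["args is empty; expected the path to borealis_server.py"]

    script = Path(args[0])
    if not script.is_absolute():
        return [f"Script path is not absolute: {script}"]
    if not script.is_file():
        return [f"Script not found: {script}"]
    return []
```

On a Mac, it could be pointed at the config file path used in the setup instructions above.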
The server automatically falls back to public search if authentication fails. To verify your API key:
curl -H "X-Dataverse-key: YOUR_KEY" "https://borealisdata.ca/api/search?q=test"
- Ensure you started a new conversation after restarting Claude Desktop
- MCP tools only load into conversations created after the server connects
- Check that the server shows as connected in Claude Desktop
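On the API key side, the automatic fallback to public search mentioned above could be sketched like this. It is an illustration under assumptions: fetch is a hypothetical callable standing in for the real HTTP request, and treating 401 and 403 as authentication failures is a guess about how the server decides to fall back. The X-Dataverse-key header name is the one used in the curl command above.

```python
from typing import Callable, Optional, Tuple

AUTH_FAILURE_STATUSES = {401, 403}  # assumed to signal a bad or unauthorized key


def search_with_fallback(
    fetch: Callable[[dict, dict], Tuple[int, dict]],
    params: dict,
    api_key: Optional[str] = None,
) -> Tuple[int, dict, bool]:
    """Try an authenticated search first, then retry without the key.

    fetch(params, headers) must return (status_code, payload).
    Returns (status_code, payload, used_authentication).
    """
    if api_key:
        status, payload = fetch(params, {"X-Dataverse-key": api_key})
        if status not in AUTH_FAILURE_STATUSES:
            return status, payload, True
    # No key, or the key was rejected: fall back to a public search.
    status, payload = fetch(params, {})
    return status, payload, False
```

Returning whether authentication was used makes it possible to tell the user that only public results are shown.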
To add additional tools:
- Add the tool definition to list_tools()
- Implement the handler function
- Add the handler to call_tool()
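One way to keep call_tool() manageable as tools are added is a name-to-handler dispatch table. The sketch below is library-free and only illustrates that pattern: it does not import the MCP SDK, and the tool names and handlers are hypothetical, not the ones defined in borealis_server.py.

```python
import asyncio


async def handle_search(arguments: dict) -> str:
    """Hypothetical handler: a real one would query the Borealis API."""
    return f"searching for {arguments['query']}"


async def handle_list_files(arguments: dict) -> str:
    """Hypothetical handler for a second tool."""
    return f"listing files for {arguments['doi']}"


# Adding a tool means writing a handler and registering it here.
TOOL_HANDLERS = {
    "search_datasets": handle_search,
    "list_files": handle_list_files,
}


async def dispatch_tool(name: str, arguments: dict) -> str:
    """Look up the handler for a tool name and await its result."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"Unknown tool: {name}")
    return await handler(arguments)


if __name__ == "__main__":
    print(asyncio.run(dispatch_tool("search_datasets", {"query": "bees"})))
```

In the actual server, the body of call_tool() could delegate to a function like dispatch_tool() while list_tools() continues to advertise the tool definitions.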
See the Borealis API documentation for available endpoints: https://borealisdata.ca/guides/en/latest/api/
Test the server directly:
cd /path/to/borealis-mcp-server
python3 borealis_server.py
The server should start and wait for input without errors.
- The server uses async/await for non-blocking API calls
- Authentication is optional; public searches work without an API key
- Institution name matching is case-insensitive
- The subtree parameter filters results to specific dataverses
- Geographic filters use the fq (filter query) parameter
- Results are limited to 100 per request (Borealis API limit)
- Metadata is retrieved in JSON-LD format and parsed for display
- It uses MCP’s stdio transport, so it talks over standard input and output instead of using HTTP. It must be started by an MCP-compatible host (for example, Claude Desktop) and does not run as an HTTP server, so it cannot be started with uvicorn or opened in a browser.
- Geographic filters indicate dataset subject matter, not researcher location
- Regional groupings (e.g., "Toronto-based research" spanning multiple institutions) not yet implemented
- Some MCP server startup timing issues may require restarting Claude Desktop
Areas for potential enhancement:
- Code refactoring (split this up into config, tools). It's a little unwieldy.
- Regional institution groupings (e.g., all Toronto institutions)
- Broader file download capabilities
- Better error handling and user feedback
- Expanded geographic mappings
- Date range filtering
- An HTTP-based MCP interface, so it works with OpenAI-style connectors
GPLv3
- Built using the Model Context Protocol
- Developed with a lot of assistance from Claude
- Powered by Borealis Dataverse. Borealis has high quality metadata, a well-structured API, and clear documentation. The connector depends on that foundation.
For issues related to:
- This MCP server: Open an issue in this repository
- Borealis API: Visit https://borealisdata.ca/guides/
- Claude Desktop: Contact Anthropic support
- MCP: See https://modelcontextprotocol.io/


