The data is just a JSON file, so you can use it in any environment. It is sourced from GitHub's Linguist project, which defines all 145 data languages known to GitHub. The data is updated via script and released in a new package version.
pip install data-languages

import data_languages

json_lang_data = data_languages['JSON']
print(json_lang_data['extensions']) # => ['.4DForm', '.4DProject', '.avsc', ...]

Note: Most type checkers will falsely warn that data_languages is not subscriptable, because they cannot analyze runtime behavior (the module is replaced with a dictionary at runtime for cleaner, direct access). You can safely suppress such warnings using # type: ignore.
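The note above describes a module being replaced with a dictionary at runtime. One common way to get that effect in Python is to overwrite the module's entry in sys.modules. Whether data-languages does exactly this is an assumption, and the module name fake_langs and the sample data below are hypothetical. A minimal, self-contained sketch:

```python
import sys

# Hypothetical stand-in data; the real package ships far more entries.
SAMPLE_DATA = {'JSON': {'extensions': ['.4DForm', '.4DProject', '.avsc']}}

# Register a plain dict under a made-up module name. A package could do the
# same from its own __init__.py with: sys.modules[__name__] = data
sys.modules['fake_langs'] = SAMPLE_DATA

# The import statement checks sys.modules first, so this binds the dict.
import fake_langs  # type: ignore

print(type(fake_langs).__name__)            # dict
print(fake_langs['JSON']['extensions'][0])  # .4DForm
```

A static type checker only sees the import statement and assumes the name is a module, which is why it would report the subscript as an error even though the code runs.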
Get language(s) from an extension:

def get_lang(file_ext):
    lang_matches = [
        lang for lang, data in data_languages.items()
        if file_ext in data['extensions']
    ]
    return lang_matches[0] if len(lang_matches) == 1 else lang_matches

print(get_lang('.ical')) # => iCalendar

Get language(s) from a file path:
from pathlib import Path

def get_lang_from_path(filepath):
    file_ext = Path(filepath).suffix # e.g. '.vdf' for 'steam.vdf'
    lang_matches = [
        lang for lang, data in data_languages.items()
        if file_ext in data['extensions']
    ]
    # A single match returns the name; otherwise the (possibly empty) list
    return lang_matches[0] if len(lang_matches) == 1 else lang_matches

print(get_lang_from_path('steam.vdf')) # => Valve Data Format
print(get_lang_from_path('Sublime.sublime-snippet')) # => XML
print(get_lang_from_path('README.md')) # => [] (use prose-languages pkg)

Copyright © 2026 Adam Lui
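The lookup examples in this README scan every language on each call. If you look up many files, one option is to build an extension-to-languages index once and reuse it. The sketch below is not part of the package: build_ext_index, langs_for_path, and the tiny sample_languages dict (limited to extensions that appear in the examples) are hypothetical names used for illustration.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical stand-in for data_languages, limited to extensions that
# appear in the examples; the real data has many more entries.
sample_languages = {
    'iCalendar': {'extensions': ['.ical']},
    'Valve Data Format': {'extensions': ['.vdf']},
    'XML': {'extensions': ['.sublime-snippet']},
}

def build_ext_index(languages):
    """Map each extension to the list of languages that declare it."""
    index = defaultdict(list)
    for lang, data in languages.items():
        for ext in data.get('extensions', []):
            index[ext].append(lang)
    return dict(index)

EXT_INDEX = build_ext_index(sample_languages)

def langs_for_path(filepath):
    """Return the matching languages as a list (empty if none match)."""
    return EXT_INDEX.get(Path(filepath).suffix, [])

print(langs_for_path('steam.vdf'))  # ['Valve Data Format']
print(langs_for_path('README.md'))  # []
```

Unlike the functions in the examples, this sketch always returns a list, so callers do not need to check whether they received a string or a list. To use it with the real data, pass data_languages to build_ext_index instead of the sample dict.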
</> markup-languages - File extensions for markup languages.
🇨🇳 non-latin-locales - ISO 639-1 (2-letter) codes for non-Latin locales.
#! programming-languages - File extensions for programming languages.


