1117 Russian cities with geographic coordinates, identifiers and 2020 population estimate.
from pathlib import Path
import requests
import pandas as pd
url = ("https://raw.githubusercontent.com/"
"epogrebnyak/ru-cities/main/assets/towns.csv")
# save file locally
p = Path("towns.csv")
if not p.exists():
    content = requests.get(url).text
    p.write_text(content, encoding="utf-8")
# read as dataframe
df = pd.read_csv("towns.csv")
print(df.sample(5))
- towns.csv - city information
- regions.csv - list of Russian Federation regions
- alt_city_names.json - alternative city names
Basic info:
- city - city name (several cities have alternative names, listed in alt_city_names.json)
- population - city population, thousand people, Rosstat estimate as of 1.1.2020
- lat, lon - city geographic coordinates
Region:
- region_name - subnational region name: oblast, republic, krai or one AO (Chukotka)
- region_name_ao - autonomous okrug (AO) name, if the AO is a part of a larger region (applies to 3 AO)
- region_iso_code - ISO 3166 code, e.g. RU-VLD
- federal_district, e.g. Центральный
City codes:
- okato
- oktmo
- fias_id
- kladr_id
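The code columns are identifiers rather than quantities, so it may be safer to read them as text. The sketch below assumes that some codes can start with a zero, in which case pandas would otherwise parse them as integers and drop the leading digit. The two sample rows and their code values are made-up placeholders, not real identifiers from towns.csv.

```python
import io

import pandas as pd

# Made-up sample rows: the code values are placeholders, not real identifiers.
sample_csv = (
    "city,okato,oktmo\n"
    "A,01401000000,01701000001\n"
    "B,79401000000,79701000001\n"
)
code_columns = ["okato", "oktmo"]

# Default parsing treats the codes as integers and loses the leading zero.
as_numbers = pd.read_csv(io.StringIO(sample_csv))

# Reading the code columns as strings keeps them exactly as written.
as_text = pd.read_csv(
    io.StringIO(sample_csv),
    dtype={column: str for column in code_columns},
)

print(as_numbers["okato"].tolist())
print(as_text["okato"].tolist())
```

The same `dtype` mapping can be passed when reading towns.csv, extended with fias_id and kladr_id as needed.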
- City list and city population were collected from the Rosstat publication Регионы России. Основные социально-экономические показатели городов (Regions of Russia. Main socio-economic indicators of cities) and parsed from the publication's Microsoft Word files.
- City list corresponds to this Wikipedia article.
- Alternative dataset is wiki-based Dadata city dataset (no population data).
There are four autonomous okrugs (AO) in Russia:
- Ненецкий автономный округ
- Ханты-Мансийский автономный округ - Югра
- Чукотский автономный округ
- Ямало-Ненецкий автономный округ
Ханты-Мансийский AO and Ямало-Ненецкий AO are inner parts of Тюменская область.
Ненецкий AO is an inner part of Архангельская область.
The names of these three AO are listed in region_name_ao.
Чукотский AO is a stand-alone region; it is not an inner part of any other region.
Чукотский автономный округ is listed in region_name only.
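To get the most specific region for each city, one option is to prefer region_name_ao and fall back to region_name. The sketch below assumes that region_name_ao is empty for cities outside the three inner AO; the sample rows are illustrative and carry only the columns used here, and `most_specific_region` is a hypothetical helper name.

```python
import csv
import io

# Illustrative rows, not copied from towns.csv.
sample_csv = io.StringIO(
    "city,region_name,region_name_ao\n"
    "Салехард,Тюменская область,Ямало-Ненецкий автономный округ\n"
    "Тюмень,Тюменская область,\n"
    "Анадырь,Чукотский автономный округ,\n"
)


def most_specific_region(row):
    """Return the AO name when it is filled in, otherwise the region name."""
    return row["region_name_ao"] or row["region_name"]


rows = list(csv.DictReader(sample_csv))
resolved = {row["city"]: most_specific_region(row) for row in rows}

for city, region in resolved.items():
    print(city, "->", region)
```

With pandas, missing values are read as NaN rather than empty strings, so the equivalent would be `df["region_name_ao"].fillna(df["region_name"])`.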
- Several notable towns are classified as administrative parts of larger cities (Сестрорецк is a municipality in Saint Petersburg, Щербинка is a part of Moscow). They are not reported in this dataset.
- Белоозерский was not found in the Rosstat publication, but should be considered a city as of January 1, 2020. We included it in the dataset.
- Дмитриев and Дмитриев-Льговский are the same city.
- We suppressed the letter "ё" in the city column of towns.csv - we have Орел, but not Орёл. This affected, for example: Белоозёрский, Королёв, Ликино-Дулёво, Озёры, Щёлково, Орёл.
assets/alt_city_names.json contains the alternative name pairing.
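Because "ё" is suppressed in towns.csv, a city name that comes from another source may need the same treatment before it is matched against the city column. A minimal sketch follows; `suppress_yo` is a hypothetical helper name, not a function from this repo.

```python
def suppress_yo(name: str) -> str:
    """Replace the letter "ё" with "е", in both lower and upper case."""
    return name.replace("ё", "е").replace("Ё", "Е")


external_names = ["Орёл", "Королёв", "Щёлково"]
normalized = [suppress_yo(name) for name in external_names]
print(normalized)
```

The normalized names can then be compared with the city column directly.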
poetry install
poetry run python -m pytest
Run:
- download data from Rosstat using rar/get.sh
- convert Саратовская область.doc to docx
- run make.py
Creates:
- _towns.csv
- assets/regions.csv
Note: do not attempt this stage unless you have to - these scripts take a while and use third-party API access. The resulting files are already in the repo, so you can probably skip running these scripts.
Run:
- cd geocoding
- run coord_dadata.py (needs token)
- run coord_osm.py
Creates:
- geocoding/coord_dadata.csv
- geocoding/coord_osm.csv
Run:
- run merge.py
Creates:
- assets/towns.csv