Goal: Match individual Russian elections with their corresponding OKTMO code
Problem: The Russian election site does not provide unique identifiers for matching electoral data to other datasets covering the same geographic unit. For example, here is the landing page with the results of the recent election of the municipal head of 'Майское сельское поселение'.
[url removed, login to view]
The main webpage has a header that tells what the election is for (Выборы главы муниципального образования "Майское сельское поселение"). In some cases, but not all, there is some extra information about the ИК (избирательная комиссия Кошехабльского района). There is an identifying number in the URL for the region ('1' : adygei).
What I'd like is a Python script that matches information from these three fields:
- region number
- header of main election webpage
- избирательная комиссия
to a database of ОКТМО codes that I have downloaded and organized. So in this case, I would want the Python algorithm to link to the observation I've created based on information from this page: [url removed, login to view] I want to link elections to the most detailed ОКТМО possible based on the above three fields.
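One possible shape for this matching step, as a minimal sketch only: pull the quoted unit name out of the header, restrict candidates to the same region number, and score each candidate name with the standard library's `difflib.SequenceMatcher`. The row layout (`region`, `name`, `code`), the function names, and the placeholder codes are all assumptions made for illustration; they are not taken from my actual databases.

```python
import re
from difflib import SequenceMatcher

# Hypothetical sample rows; real rows would come from the scraped ОКТМО database.
# The codes below are placeholders, not real ОКТМО codes.
OKTMO_ROWS = [
    {"region": 1, "name": "Майское сельское поселение", "code": "00000000001"},
    {"region": 1, "name": "Кошехабльский район", "code": "00000000002"},
    {"region": 2, "name": "Майское сельское поселение", "code": "00000000003"},
]


def normalize(text):
    """Lowercase, unify ё/е, replace punctuation with spaces, collapse whitespace."""
    text = text.lower().replace("ё", "е")
    text = re.sub(r"[^\w\s-]", " ", text)
    return " ".join(text.split())


def unit_from_header(header):
    """Prefer the last quoted name in the header; fall back to the whole header."""
    quoted = re.findall(r'["«“]([^"»”]+)["»”]', header)
    return quoted[-1] if quoted else header


def best_match(region, header, rows=OKTMO_ROWS):
    """Return (row, score) for the most similar name in the region; score is in [0, 1]."""
    target = normalize(unit_from_header(header))
    scored = [
        (row, SequenceMatcher(None, target, normalize(row["name"])).ratio())
        for row in rows
        if row["region"] == region
    ]
    # An unknown region yields no candidates, so return an explicit "no match".
    return max(scored, key=lambda pair: pair[1], default=(None, 0.0))
```

The similarity ratio could serve as a first version of the 'accuracy score'; a dedicated fuzzy-matching library could be swapped in for `SequenceMatcher` without changing the surrounding logic.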
To help do this, I have created several databases to be used as inputs:
- ОКТМО codes: all of the data scraped and organized from [url removed, login to view] This has fields indicating the name of the geographic unit and its ОКТМО code.
- Region 'dictionary': This maps the 'region number' in the URL to a list of Russian regions to help matching.
- Database of over 20,000 elections with the three fields mentioned above: region number, header of main election webpage, and избирательная комиссия.
In addition, I have another dataset that links the first 8 digits of the VRN field from the URL (VRN-8) to the муниципальный район/образование. It can be used as a robustness check or to help match when the algorithm cannot decide on a final ОКТМО candidate. The logic here is that the VRN-8 is a unique identifier for the higher-level муниципальный район/образование in which the election was held. In the case above, '40140031' links to Кошехабльский район, which is the higher administrative unit for the Майское сельское поселение. However, the rest of the VRN number does not uniquely identify the lower subunits. Moreover, this extra dataset has messy address data (I can send a sample if it's useful) and may not cover all elections in the dataset.
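The VRN-8 tie-break could look roughly like the sketch below. Only the '40140031' to Кошехабльский район pair comes from the description above; the `parent` field on each candidate, the function names, and the full VRN value in the usage note are hypothetical. Because the extra dataset may not cover every election, the sketch leaves the candidates unchanged whenever the VRN-8 is unknown or filters everything out.

```python
# Hypothetical lookup from VRN-8 to the parent муниципальный район/образование.
VRN8_TO_PARENT = {"40140031": "Кошехабльский район"}


def vrn8(vrn):
    """Return the first 8 digits of a VRN value, or None if it has fewer than 8."""
    digits = "".join(ch for ch in str(vrn) if ch.isdigit())
    return digits[:8] if len(digits) >= 8 else None


def break_tie(candidates, vrn, lookup=VRN8_TO_PARENT):
    """Keep candidates whose parent unit matches the VRN-8 parent.

    Each candidate is assumed to be a dict with a 'parent' key (hypothetical).
    If the VRN-8 is not covered, or no candidate survives, return them all.
    """
    parent = lookup.get(vrn8(vrn))
    if parent is None:
        return candidates  # VRN-8 missing from the extra dataset
    kept = [c for c in candidates if c["parent"] == parent]
    return kept or candidates
```

For example, given two same-named candidates in different districts, `break_tie(candidates, "4014003100001")` would keep only the one whose `parent` is Кошехабльский район.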
Deliverables:
1) For each election, an assigned ОКТМО code that corresponds to the geographic unit where the election was held. Each assigned code should have some kind of 'accuracy score' attached to it.
2) I would also need a Python script that takes a list of elections as input and, using the dictionaries, matches future elections to their correct ОКТМО codes. The code that implements the fuzzy string matching algorithm should be reusable on different projects in the future.
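To make the second deliverable more concrete, here is a rough sketch of the kind of batch wrapper I have in mind. It assumes the elections arrive as a UTF-8 CSV with hypothetical columns `region`, `header`, and `commission`, and that the matching logic is passed in as a function, so the fuzzy-matching code stays separate and reusable. The column names, the threshold value, and the `needs_review` flag are illustrative assumptions, not requirements.

```python
import csv

# Hypothetical output columns.
FIELDS = ["region", "header", "commission", "oktmo", "score", "needs_review"]


def match_file(in_path, out_path, matcher, threshold=0.8):
    """Read elections from a CSV, apply `matcher`, and write results to a new CSV.

    `matcher(region, header, commission)` must return (oktmo_code_or_None, score).
    Returns the number of rows flagged for manual review.
    """
    flagged = 0
    with open(in_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=FIELDS)
        writer.writeheader()
        for row in csv.DictReader(src):
            commission = row.get("commission", "")
            code, score = matcher(row["region"], row["header"], commission)
            # Flag rows with no candidate or a score below the threshold.
            review = code is None or score < threshold
            flagged += int(review)
            writer.writerow({
                "region": row["region"],
                "header": row["header"],
                "commission": commission,
                "oktmo": code or "",
                "score": f"{score:.3f}",
                "needs_review": int(review),
            })
    return flagged
```

Keeping the matcher as a parameter means the same wrapper can be reused with a different scoring method or on a different dataset.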