iso3c_kr Vignette
Kadir Jun Ayhan
2024-04-24
Source: vignettes/iso3c_kr-vignette.Rmd
In my research, I often work with country-year data from Korean sources, including data on diplomatic visits, trade, and aid. A fundamental difficulty is that these datasets do not share universal country codes. Further complicating matters, country names are inconsistent across them. For example, I found the Democratic Republic of the Congo written five different ways across official sources: 콩고 민주공화국, 자이르, 콩고민주공화국, 콩고 민주 공화국, 콩고민주공화국(DR콩고).
To address this issue, I have created a function in my `kdiplo` package that converts Korean country names into ISO 3166-1 alpha-3 (iso3c) country codes. This function, `iso3c_kr`, assigns universal iso3c country codes to Korean-language country names, which makes it easier to join different kinds of data.
One still needs to check that the output is correct, especially for countries that have gone through political transitions, such as Germany, Yugoslavia, Russia, Vietnam, and Yemen.
Korean government sources sometimes have overlapping data for a predecessor and a successor state, for example Yugoslavia and Serbia. In such cases, one needs to inspect the affected rows and make sure each observation is attributed to the right country.
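One way to spot such overlaps after conversion is to look for country-years in which more than one Korean name maps to the same iso3c code. The following is a minimal sketch, not part of `kdiplo`: the data frame `converted` and its values are hypothetical, and the iso3c column is filled in by hand to stand in for the function's output.

```r
library(dplyr)

# Hypothetical converted data: two Korean names share one iso3c code in 2019.
converted <- tibble::tibble(
  country_kr   = c("자이르", "콩고민주공화국", "잠비아"),
  iso3c        = c("COD", "COD", "ZMB"),
  year         = c(2019, 2019, 2019),
  export_kosis = c(8, 100, 16087)
)

# Keep only the country-years that are covered by more than one Korean name.
overlaps <- converted %>%
  filter(!is.na(iso3c)) %>%
  group_by(iso3c, year) %>%
  summarise(
    n_names  = n_distinct(country_kr),
    names_kr = paste(unique(country_kr), collapse = ", "),
    .groups  = "drop"
  ) %>%
  filter(n_names > 1)

overlaps
```

Any row in `overlaps` is a country-year that needs a manual decision before the data is joined to another dataset.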
The following is sample Korean trade data from the Korean Statistical Information Service (KOSIS):
국가별 | 2018 년 | 2019 년 | 2020 년 | 2021 년 | 2022 년 | 2023 년 |
---|---|---|---|---|---|---|
잠비아 | 26241 | 16087 | 17619 | 28356 | 14068 | 15459 |
잠비아 | 108344 | 54542 | 15164 | 100606 | 82198 | 53867 |
자이르 | NA | NA | NA | NA | NA | NA |
자이르 | 618 | 8 | 113 | 4 | NA | NA |
짐바브웨 | 25964 | 14088 | 15514 | 20404 | 16083 | 19563 |
짐바브웨 | 4909 | 13098 | 11377 | 9627 | 10415 | 20862 |
The following is sample Korean aid data from Korea’s ODA portal:
country_kr | sector | no_of_projects | aid_usd | aid_krw | year |
---|---|---|---|---|---|
베트남 | 통신정책, 계획 및 행정(voluntary code) | 2 | 232334 | 270736486 | 2019 |
캄보디아 | 11321 | 1 | 85815 | 99999361 | 2019 |
미얀마 | 사회보호/보장 | 1 | 103460 | 120560903 | 2019 |
라오스 | 비정규 농업훈련 | 1 | 107958 | 125802378 | 2019 |
몽골 | 의료서비스 | 5 | 511824 | 596423389 | 2019 |
Wide format is quite common in official Korean data sources, and the trade data above is in wide format. Before using the `iso3c_kr` function, let’s first transform the trade data into a long (country-year) format so that it has the same format as the aid data. This will make joining the two datasets easier.
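The reshaping step can be sketched with `tidyr::pivot_longer()`. This is an illustration under assumptions rather than the exact code used for this vignette: `trade_wide` is a hypothetical two-country excerpt with one row per country, and the value column is named `export_kosis` only to match the output shown later. The real KOSIS file has two rows per country, which would need to be labelled before pivoting.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-format excerpt mimicking the KOSIS layout.
trade_wide <- tibble::tribble(
  ~`국가별`, ~`2018 년`, ~`2019 년`,
  "잠비아",   26241,      16087,
  "짐바브웨", 25964,      14088
)

trade_long <- trade_wide %>%
  rename(country_kr = `국가별`) %>%
  pivot_longer(
    cols      = -country_kr,      # every column except the country name is a year
    names_to  = "year",
    values_to = "export_kosis"
  ) %>%
  # "2018 년" -> 2018: drop everything that is not a digit.
  mutate(year = as.integer(gsub("[^0-9]", "", year)))

trade_long
```

The result has one row per country and year, which is the shape the aid data already has.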
Using the `iso3c_kr` function, we can simply convert Korean country names into iso3c country codes. For example, the following is the output of the `iso3c_kr` function for the Korean trade data:
library(kdiplo)  # provides iso3c_kr()
# The second argument is the name of the column that holds the Korean country names.
trade <- iso3c_kr(trade, "country_kr")
trade[c(50, 150, 250, 350, 450, 550), c(1, 5, 2:4)] %>% gt::gt()
country_kr | iso3c | year | export_kosis | import_kosis |
---|---|---|---|---|
계 | NA | 2014 | 572664607000 | 525514506000 |
아랍에미리트 연합 | ARE | 1996 | 1377933000 | 2259205000 |
앤티가바부다 | ATG | 1978 | NA | NA |
앵귈라 | AIA | 2019 | 817000 | 1000 |
아르메니아 | ARM | 2001 | 1255000 | 43000 |
앙골라 | AGO | 1983 | 235000 | NA |
We see that in this example, “계” (gye) did not get an iso3c country code. This is because it is not a country name: “계” means “total”, so the `iso3c_kr` function has no code to assign to it. It is best to check the data to see which entries did not get an iso3c code.
## [1] "계, 국제통화기금, 기타, 기타국"
These entries mean “total”, “IMF”, “other”, and “other countries” in Korean. In other words, we are not missing any countries, which is good.
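A check like this can be sketched as follows. The data frame `converted` is a hypothetical stand-in for the output of `iso3c_kr`, with the iso3c column filled in by hand. It is not the code that produced the output above.

```r
library(dplyr)

# Hypothetical converted data: aggregate rows carry no iso3c code.
converted <- tibble::tibble(
  country_kr = c("계", "앙골라", "기타", "앙골라", "계"),
  iso3c      = c(NA, "AGO", NA, "AGO", NA),
  year       = c(2014, 1983, 2014, 1984, 2015)
)

# Collect the distinct Korean names that did not receive a code.
unmatched <- converted %>%
  filter(is.na(iso3c)) %>%
  distinct(country_kr) %>%
  pull(country_kr)

paste(unmatched, collapse = ", ")
```

If a real country name shows up in this list, it is missing from the lookup data and the conversion is incomplete for that source.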
Now let’s convert the Korean country names in the aid data into iso3c country codes:
# The second argument is the name of the column that holds the Korean country names.
aid <- iso3c_kr(aid, "country_kr")
aid[c(50, 150, 250, 350, 450, 550), c(1, 6, 2:5)] %>% gt::gt()
country_kr | year | sector | no_of_projects | aid_usd | aid_krw |
---|---|---|---|---|---|
베트남 | 2019 | 통신정책, 계획 및 행정(voluntary code) | 2 | 232334 | 270736486 |
캄보디아 | 2019 | 11321 | 1 | 85815 | 99999361 |
미얀마 | 2019 | 사회보호/보장 | 1 | 103460 | 120560903 |
라오스 | 2019 | 비정규 농업훈련 | 1 | 107958 | 125802378 |
몽골 | 2019 | 의료서비스 | 5 | 511824 | 596423389 |
필리핀 | 2019 | 농업용수자원 | 2 | 0 | 0 |
Once you know the iso3c country codes, you can get the English country names, or other country codes (such as Correlates of War country codes), using the `countrycode` package, for example.
country_kr | iso3c | country_name | year | export_kosis | import_kosis |
---|---|---|---|---|---|
계 | NA | NA | 2014 | 572664607000 | 525514506000 |
아랍에미리트 연합 | ARE | United Arab Emirates | 1996 | 1377933000 | 2259205000 |
앤티가바부다 | ATG | Antigua & Barbuda | 1978 | NA | NA |
앵귈라 | AIA | Anguilla | 2019 | 817000 | 1000 |
아르메니아 | ARM | Armenia | 2001 | 1255000 | 43000 |
앙골라 | AGO | Angola | 1983 | 235000 | NA |
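A lookup of this kind can be sketched with `countrycode::countrycode()`, using `"iso3c"` as the origin scheme. The small data frame `codes` is a hypothetical example, and `"cown"` (the Correlates of War numeric code) is shown only as one possible alternative destination.

```r
library(countrycode)

# Hypothetical data frame holding a few iso3c codes from the table above.
codes <- data.frame(iso3c = c("ARE", "ATG", "AGO"))

# English country names from iso3c codes.
codes$country_name <- countrycode(codes$iso3c,
                                  origin = "iso3c",
                                  destination = "country.name")

# Correlates of War numeric codes from the same iso3c codes.
codes$cown <- countrycode(codes$iso3c,
                          origin = "iso3c",
                          destination = "cown")

codes
```

Rows whose iso3c is `NA`, such as the “계” total row, simply get `NA` in the new columns.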
More importantly, this function allows users to join different datasets that have Korean country names. In this example, I will join the trade data with the aid data using the iso3c country codes and the year.
trade_aid <- trade %>%
  left_join(aid, by = c("iso3c", "year"), suffix = c("", "_aid"))
trade_aid %>%
  filter(year == 2019 & !is.na(iso3c)) %>%
  slice(c(50, 150, 250, 350, 450, 550)) %>%
  select(c(1, 5, 6, 2:4, 8, 10)) %>%
  gt::gt()
country_kr | iso3c | country_name | year | export_kosis | import_kosis | sector | aid_usd |
---|---|---|---|---|---|---|---|
아르메니아 | ARM | Armenia | 2019 | 12729000 | 16743000 | 전문대,대학(원) 교육 | 119069 |
방글라데시 | BGD | Bangladesh | 2019 | 1282342000 | 404703000 | 건설정책 및 행정관리 | 46251 |
볼리비아 | BOL | Bolivia | 2019 | 30434000 | 450576000 | 환경정책 및 행정관리 | 80969 |
코트디부아르 | CIV | Côte d’Ivoire | 2019 | 136494000 | 5264000 | 교육정책 및 행정관리 | 30096 |
콜롬비아 | COL | Colombia | 2019 | 1143075000 | 718214000 | 성인 기초생활교육 | 62976 |
알제리 | DZA | Algeria | 2019 | 700918000 | 1746239000 | 레크리에이션 및 스포츠(voluntary code) | 22312 |
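To see why the join works even when the Korean names differ between sources, here is a self-contained sketch. Both toy data frames and all their values are hypothetical, and the iso3c columns are filled in by hand to stand in for the output of `iso3c_kr`.

```r
library(dplyr)

# Hypothetical trade data: one row per country-year.
trade_toy <- tibble::tibble(
  country_kr   = c("콩고민주공화국", "잠비아"),
  iso3c        = c("COD", "ZMB"),
  year         = c(2019, 2019),
  export_kosis = c(100, 16087)
)

# Hypothetical aid data: one row per country-year-sector, different spelling.
aid_toy <- tibble::tibble(
  country_kr = c("콩고 민주공화국", "콩고 민주공화국"),
  iso3c      = c("COD", "COD"),
  year       = c(2019, 2019),
  sector     = c("의료서비스", "사회보호/보장"),
  aid_usd    = c(5000, 7000)
)

# The names do not match, but the codes do.
trade_aid_toy <- trade_toy %>%
  left_join(aid_toy, by = c("iso3c", "year"), suffix = c("", "_aid"))

trade_aid_toy
```

Because the aid data has one row per sector, each trade row is repeated once for every matching aid row, and countries without aid records keep a single row with `NA` in the aid columns. Keep this in mind before summing trade values in the joined data.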
Voilà! Now we have a dataset that has both trade and aid data, neither of which originally had consistent country names or country codes. I plan to add warning messages to the `iso3c_kr` function to make it easier to spot potential issues with the conversion of Korean country names. I will continue to update the Korean country name dataset in the `kdiplo` package as I come across new data sources. Feel free to report country names that are missing from the `iso3c_kr` function using the issue tracker on GitHub.