VI. Geography

In this section, we use the Geographic Data for Ancient Near Eastern Archaeological Sites compiled by Professor Olof Pedersén at Uppsala University to locate some of the geographic names that appear in the Drehem texts.

We will use fuzzy search to find matches between the ANE files and our texts, and then use the geographic coordinates in the ANE data to plot the places that appear in our texts.

The following installs and imports the relevant libraries.

from google.colab import drive
drive.mount('/content/drive')
%cd "drive/My Drive/SumerianNetworks"
/content/drive/My Drive/SumerianNetworks
!pip install geopandas
!pip install descartes
!pip install pysal
!pip install fuzzywuzzy
!pip install python-Levenshtein
!pip install networkx
import pandas as pd
import numpy as np
import geopandas as gpd
from fuzzywuzzy import fuzz 
from fuzzywuzzy import process 
import fiona
fiona.drvsupport.supported_drivers['kml'] = 'rw' # enable KML support, which is disabled by default
fiona.drvsupport.supported_drivers['KML'] = 'rw'
import networkx as nx

We first import the filtered Drehem data and extract all of the geographic names, identified by their two-letter suffixes: SN (settlement name), TN (temple name), AN, FN (field name), QN (quarter name), and WN (water-course name).

drehem = pd.read_csv("JupyterBook/words_df.csv")
drehem.head()
Unnamed: 0 lemma id_text id_line id_word label
0 0 1(geš₂)[]NU P113959 3 P113959.3.1 o 1
1 1 1(u)[]NU P113959 3 P113959.3.2 o 1
2 2 6(aš)[]NU P113959 3 P113959.3.3 o 1
3 3 4(barig)[]NU P113959 3 P113959.3.4 o 1
4 4 3(ban₂)[]NU P113959 3 P113959.3.5 o 1
drehem.shape
(3175331, 6)
geo_names = {}

# Geographic name types: SN (settlement), TN (temple), AN, FN (field),
# QN (quarter), WN (water course); year names are excluded.
types = ["SN", "TN", "AN", "FN", "QN", "WN"]
for index, row in drehem.iterrows():
    w = row['lemma']
    if w[-2:] in types:
        # Note: this keeps only the last geographic name seen per text.
        geo_names[row['id_text']] = w

d = {'ip': list(geo_names.keys()), 'geo_name': list(geo_names.values())}
geo_df = pd.DataFrame.from_dict(data = d)
geo_df.head()
ip geo_name
0 P125538 Nibru[1]SN
1 P379198 hub₂-e[]FN
2 P110676 e-su-dar[]FN
3 P388027 Irisaŋrig[1]SN
4 P118246 Gaeš[1]SN
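Row-by-row iterrows() is slow on a table of over three million rows. As an optional alternative (a sketch on toy stand-in data, not part of the original workflow), the same suffix filter can be vectorized with pandas' .str accessor:

```python
import pandas as pd

# Toy stand-in for the Drehem words table (three hypothetical rows).
drehem_toy = pd.DataFrame({
    "id_text": ["P125538", "P113959", "P379198"],
    "lemma":   ["Nibru[1]SN", "1(geš₂)[]NU", "hub₂-e[]FN"],
})

types = ["SN", "TN", "AN", "FN", "QN", "WN"]
is_geo = drehem_toy["lemma"].str[-2:].isin(types)  # vectorized suffix test
geo_toy = drehem_toy.loc[is_geo].rename(
    columns={"id_text": "ip", "lemma": "geo_name"})
print(geo_toy["geo_name"].tolist())  # ['Nibru[1]SN', 'hub₂-e[]FN']
```

This avoids the Python-level loop entirely; on the full table the speed-up is substantial.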

Now we remove the suffixes (e.g. “[1]SN”) and keep just the geographical names.

def remove_tail(string):
  return string.split("[")[0]

non_emp_geo = geo_df.copy()
non_emp_geo["cleaned_geo_name"] = non_emp_geo["geo_name"].map(remove_tail)
non_emp_geo.head()
ip geo_name cleaned_geo_name
0 P125538 Nibru[1]SN Nibru
1 P379198 hub₂-e[]FN hub₂-e
2 P110676 e-su-dar[]FN e-su-dar
3 P388027 Irisaŋrig[1]SN Irisaŋrig
4 P118246 Gaeš[1]SN Gaeš

We now import the ANE KMZ files (read below as extracted KML). Make sure you have the latest files from Professor Pedersén's ANE dataset and change the file path below accordingly.

location = gpd.read_file('JupyterBook/jupyter_collection/04_Geographic_Names/geopandas_intro/data/sites.kml', driver='KML',layer=2)
print(location.shape)
location.head()
(2857, 3)
Name Description geometry
0 Adab (Bismaya) POINT Z (45.62386 31.95071 0.00000)
1 Abydos POINT Z (31.91906 26.18509 0.00000)
2 Adamdun? (Teppe Surkhehgan) POINT Z (48.79814 32.02168 0.00000)
3 Admannu / Natmane (Tell 'Ali) POINT Z (43.68200 35.38198 0.00000)
4 Adumatu? (Dumat el-Jandal) POINT Z (39.86723 29.81135 0.00000)

Since the Name column contains both the modern and ancient names of the same place, we convert each entry into a list of names for matching purposes.

def parse_names(string):
    """Take a string such as 'Adamdun? (Teppe Surkhehgan)' and
    return a list of the names it contains, with any '?' removed.
    Slashes separate alternative names; parentheses enclose the
    modern name."""
    names = []
    ancient_names = ""
    modern_names = ""
    if '?' in string:
        string = string.replace('?', '')
    if '(' in string:
        ancient_names = string.split('(')[0]
        modern_names = string.split('(')[1].split(')')[0]
        if '/' in modern_names:
            for name in modern_names.split('/'):
                names.append(name)
        else:
            names.append(modern_names)
    else:
        ancient_names = string

    if '/' in ancient_names:
        for name in ancient_names.split('/'):
            names.append(name)
    else:
        names.append(ancient_names)
    return names
location["Name (in list)"] = location["Name"].map(parse_names)
location.head()
Name Description geometry Name (in list)
0 Adab (Bismaya) POINT Z (45.62386 31.95071 0.00000) [Bismaya, Adab ]
1 Abydos POINT Z (31.91906 26.18509 0.00000) [Abydos]
2 Adamdun? (Teppe Surkhehgan) POINT Z (48.79814 32.02168 0.00000) [Teppe Surkhehgan, Adamdun ]
3 Admannu / Natmane (Tell 'Ali) POINT Z (43.68200 35.38198 0.00000) [Tell 'Ali, Admannu , Natmane ]
4 Adumatu? (Dumat el-Jandal) POINT Z (39.86723 29.81135 0.00000) [Dumat el-Jandal, Adumatu ]

Some names contain a question mark, indicating doubt about the ancient/modern name equivalence; we add a new Doubt? column to record that information.

location["Doubt?"] = location["Name"].map(lambda string: '?' in string)
location.head()
Name Description geometry Name (in list) Doubt?
0 Adab (Bismaya) POINT Z (45.62386 31.95071 0.00000) [Bismaya, Adab ] False
1 Abydos POINT Z (31.91906 26.18509 0.00000) [Abydos] False
2 Adamdun? (Teppe Surkhehgan) POINT Z (48.79814 32.02168 0.00000) [Teppe Surkhehgan, Adamdun ] True
3 Admannu / Natmane (Tell 'Ali) POINT Z (43.68200 35.38198 0.00000) [Tell 'Ali, Admannu , Natmane ] False
4 Adumatu? (Dumat el-Jandal) POINT Z (39.86723 29.81135 0.00000) [Dumat el-Jandal, Adumatu ] True

Now we expand location so that each row holds a single name.

location_expanded = location.explode("Name (in list)")
location_expanded.head()
Name Description geometry Name (in list) Doubt?
0 Adab (Bismaya) POINT Z (45.62386 31.95071 0.00000) Bismaya False
0 Adab (Bismaya) POINT Z (45.62386 31.95071 0.00000) Adab False
1 Abydos POINT Z (31.91906 26.18509 0.00000) Abydos False
2 Adamdun? (Teppe Surkhehgan) POINT Z (48.79814 32.02168 0.00000) Teppe Surkhehgan True
2 Adamdun? (Teppe Surkhehgan) POINT Z (48.79814 32.02168 0.00000) Adamdun True

After expansion, we have 5,856 name rows.

location_expanded.shape
(5856, 5)

Now that both files (the ANE file and our Drehem file) are ready, we match the geographic names between them. We will use the fuzzywuzzy library’s WRatio scorer, which is tolerant of small differences like capitalization and punctuation. You can try different scorers.

Note that the matching takes a fairly long time (> 30 mins) to run.
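To get some intuition for what a fuzzy score measures before committing to the long run, here is a rough stdlib sketch (using difflib, which captures the same edit-similarity idea; fuzzywuzzy's WRatio layers extra heuristics on top):

```python
from difflib import SequenceMatcher

def rough_ratio(a: str, b: str) -> float:
    """Similarity in [0, 100]; lowercasing makes it case-insensitive."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

print(round(rough_ratio("Eridug", "Eridu")))  # 91 -- spelling variants score high
print(round(rough_ratio("umma", "Umma")))     # 100 -- case differences vanish
print(round(rough_ratio("Umma", "Lagaš")))    # low -- unrelated names
```

Scores near 100 mean near-identical strings, which is why a threshold like 90 keeps close spelling variants while discarding unrelated names.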

def fuzzy_search(geo_name):
  scores = location_expanded['Name (in list)'].map(lambda name: fuzz.WRatio(geo_name, name))
  
  return list(location_expanded[scores>90]['Name'])
non_emp_geo['fuzzy_match'] = non_emp_geo['cleaned_geo_name'].map(fuzzy_search)
non_emp_geo[non_emp_geo['fuzzy_match'].map(len)>0]
ip geo_name cleaned_geo_name fuzzy_match
9 P121982 Umma[1]SN Umma [Umma (Tell Jokha)]
16 P465780 Umma[1]SN Umma [Umma (Tell Jokha)]
19 P141679 Umma[1]SN Umma [Umma (Tell Jokha)]
27 P100676 Umma[1]SN Umma [Umma (Tell Jokha)]
56 P131594 Umma[1]SN Umma [Umma (Tell Jokha)]
... ... ... ... ...
20679 P136415 Lagaš[1]SN Lagaš [Lagaš (El-Hiba)]
20684 P122006 Umma[1]SN Umma [Umma (Tell Jokha)]
20686 P218142 Umma[1]SN Umma [Umma (Tell Jokha)]
20690 P322761 Zabalam[1]SN Zabalam [Zabalam (Tulul Ibzaikh)]
20697 P378821 Isin[1]SN Isin [Isin (Ishan Bahriyat)]

3802 rows × 4 columns
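Since the same place name recurs across thousands of texts, one way to cut the long runtime (an optional sketch with toy data; the scorer and names here are stand-ins, not the notebook's actual data) is to fuzzy-match each distinct name once and broadcast the result with .map():

```python
import pandas as pd
from difflib import SequenceMatcher

def score(a, b):
    # Stand-in scorer; the notebook uses fuzz.WRatio instead.
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

gazetteer = ["Umma (Tell Jokha)", "Eridu (Tell Abu Shahrain)"]

def fuzzy_search_toy(geo_name):
    return [g for g in gazetteer if score(geo_name, g.split(" (")[0]) > 90]

names = pd.Series(["Umma", "Umma", "Eridug", "Umma"])
match_for = {n: fuzzy_search_toy(n) for n in names.unique()}  # 2 lookups, not 4
matched = names.map(match_for)
print(matched.tolist())
```

On the real data this reduces the number of expensive fuzzy searches from one per row to one per distinct name.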

With the matching score threshold set to 90, we get 37 unique matches.

matches = non_emp_geo[non_emp_geo['fuzzy_match'].map(len)>0]
unique_matches = matches.drop_duplicates('geo_name').reset_index(drop=True)
# unique_matches = matched['cleaned_geo_name'].unique()
unique_matches.head()
ip geo_name cleaned_geo_name fuzzy_match
0 P121982 Umma[1]SN Umma [Umma (Tell Jokha)]
1 P136601 Isin[1]SN Isin [Isin (Ishan Bahriyat)]
2 P110363 Anšan[1]SN Anšan [Anšan (Tell Malyan)]
3 P324788 Zabalam[1]SN Zabalam [Zabalam (Tulul Ibzaikh)]
4 P118180 Eridug[1]SN Eridug [Eridu (Tell Abu Shahrain)]
unique_matches['fuzzy_match'] = unique_matches['fuzzy_match'].map(lambda l: l[0])
unique_matches.head()
ip geo_name cleaned_geo_name fuzzy_match
0 P121982 Umma[1]SN Umma Umma (Tell Jokha)
1 P136601 Isin[1]SN Isin Isin (Ishan Bahriyat)
2 P110363 Anšan[1]SN Anšan Anšan (Tell Malyan)
3 P324788 Zabalam[1]SN Zabalam Zabalam (Tulul Ibzaikh)
4 P118180 Eridug[1]SN Eridug Eridu (Tell Abu Shahrain)

Here we save the matches to Drive so we can reuse them later.

non_emp_geo.to_csv("JupyterBook/jupyter_collection/04_Geographic_Names/working files/non_emp_geo.csv")

Now we lower the score threshold to 87.5. Note that the following cell also takes a fairly long time to run.

def fuzzy_search_875(geo_name):
  scores = location_expanded['Name (in list)'].map(lambda name: fuzz.WRatio(geo_name, name))
  
  return list(location_expanded[scores>87.5]['Name'])
non_emp_geo = pd.read_csv("JupyterBook/jupyter_collection/04_Geographic_Names/working files/non_emp_geo.csv")
non_emp_geo['fuzzy_match_875'] = non_emp_geo['cleaned_geo_name'].map(fuzzy_search_875)
non_emp_geo[non_emp_geo['fuzzy_match_875'].map(len)>0]

We get more matches, but some of them are not what we want.

non_emp_geo[non_emp_geo['fuzzy_match_875'].map(len)>0]["cleaned_geo_name"].unique()

Below is the subset of places that are not among the 33 matches scoring above 90 but do match at the 87.5 threshold. Some of these are false positives, but others are places the stricter threshold missed.

# After the CSV round-trip, 'fuzzy_match' holds strings, so an empty list
# reads back as the two-character string "[]" -- hence len == 2 means "no match at 90".
in875_notin_90 = non_emp_geo[(non_emp_geo['fuzzy_match'].map(len)==2)&(non_emp_geo['fuzzy_match_875'].map(len)>0)]
in875_notin_90.head(50)
in875_notin_90[in875_notin_90["fuzzy_match_875"].map(lambda lst: any([s.find('Adamdun') != -1 for s in lst]))]
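If you need the fuzzy_match column back as real lists after reading the CSV (it comes back as strings), ast.literal_eval can restore them. A minimal sketch:

```python
import ast
import io
import pandas as pd

# Round-trip a list column through CSV.
df = pd.DataFrame({"fuzzy_match": [[], ["Umma (Tell Jokha)"]]})
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)

back = pd.read_csv(buf)
print(type(back["fuzzy_match"].iloc[0]).__name__)  # str, not list

# Parse the string representations back into Python lists.
back["fuzzy_match"] = back["fuzzy_match"].map(ast.literal_eval)
print(back["fuzzy_match"].map(len).tolist())  # [0, 1]
```

After the literal_eval step, length tests like map(len) > 0 behave as they did before saving.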

Since manually reviewing the extra names would take a lot of time and labor, we focus on the 33 good matches for now.

Here we will add the coordinate information to the matched locations.

def add_coordinates(name):
    s = location[location["Name"] == name]['geometry']
    if len(s) > 0:
        return s.iloc[0]
    else:
        return None

unique_matches['coordinates'] = unique_matches["fuzzy_match"].map(add_coordinates)
unique_matches = unique_matches.dropna().reset_index(drop=True)
unique_matches.head()
ip geo_name cleaned_geo_name fuzzy_match coordinates
0 P121982 Umma[1]SN Umma Umma (Tell Jokha) POINT Z (45.88767609164621 31.66743001625768 0)
1 P136601 Isin[1]SN Isin Isin (Ishan Bahriyat) POINT Z (45.26937063450065 31.88437701508951 0)
2 P110363 Anšan[1]SN Anšan Anšan (Tell Malyan) POINT Z (52.41066812180781 30.01111323889061 0)
3 P324788 Zabalam[1]SN Zabalam Zabalam (Tulul Ibzaikh) POINT Z (45.87575461093311 31.74472924832169 0)
4 P118180 Eridug[1]SN Eridug Eridu (Tell Abu Shahrain) POINT Z (45.99672447851623 30.81686971277664 0)

Now we will use NetworkX to plot the locations in a graph. We have a few additional libraries to install and import.

!pip install contextily
!pip install cartopy
!pip uninstall shapely
!pip install shapely --no-binary shapely
from libpysal import weights, examples
from contextily import add_basemap
import matplotlib.pyplot as plt
import networkx as nx
import cartopy
import cartopy.crs as ccrs

A GeoPandas GeoDataFrame requires a geometry column, so we construct geo_um.

coordinates = np.column_stack((unique_matches["coordinates"].map(lambda p: p.x), unique_matches["coordinates"].map(lambda p: p.y)))
geo_um = gpd.GeoDataFrame(unique_matches, geometry = list(unique_matches['coordinates']))
geo_um.head()
ip geo_name cleaned_geo_name fuzzy_match coordinates geometry
0 P121982 Umma[1]SN Umma Umma (Tell Jokha) POINT Z (45.88767609164621 31.66743001625768 0) POINT Z (45.88768 31.66743 0.00000)
1 P136601 Isin[1]SN Isin Isin (Ishan Bahriyat) POINT Z (45.26937063450065 31.88437701508951 0) POINT Z (45.26937 31.88438 0.00000)
2 P110363 Anšan[1]SN Anšan Anšan (Tell Malyan) POINT Z (52.41066812180781 30.01111323889061 0) POINT Z (52.41067 30.01111 0.00000)
3 P324788 Zabalam[1]SN Zabalam Zabalam (Tulul Ibzaikh) POINT Z (45.87575461093311 31.74472924832169 0) POINT Z (45.87575 31.74473 0.00000)
4 P118180 Eridug[1]SN Eridug Eridu (Tell Abu Shahrain) POINT Z (45.99672447851623 30.81686971277664 0) POINT Z (45.99672 30.81687 0.00000)

We try to build connections using p-numbers: since the same geographic name can appear in multiple p-numbers, we add an edge whenever two places appear in the same p-number. Unfortunately, no two places turn out to share a p-number, so instead we plot a fully connected graph.

matches.head()
ip geo_name cleaned_geo_name fuzzy_match
9 P121982 Umma[1]SN Umma [Umma (Tell Jokha)]
16 P465780 Umma[1]SN Umma [Umma (Tell Jokha)]
19 P141679 Umma[1]SN Umma [Umma (Tell Jokha)]
27 P100676 Umma[1]SN Umma [Umma (Tell Jokha)]
56 P131594 Umma[1]SN Umma [Umma (Tell Jokha)]
name_pnum = matches.groupby(['cleaned_geo_name'])['ip'].apply(list).reset_index(name = 'pnum')
name_pnum.head()
cleaned_geo_name pnum
0 Adab [P325875, P126986, P458586, P123317, P201858, ...
1 Anšan [P110363, P356008, P113518, P406257, P131214, ...
2 Awal [P107255, P145800, P126531, P107256, P112023]
3 Aššur [P117479, P143751, P134727, P248907, P126176]
4 Babila [P101457, P129473, P119451, P103818]
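The intended co-occurrence logic (an edge whenever two places share a p-number, weighted by the number of shared texts) can be sketched with sets and itertools.combinations; the place-to-p-number lists below are hypothetical stand-ins for name_pnum:

```python
import networkx as nx
from itertools import combinations

# Hypothetical place -> p-number sets standing in for name_pnum.
pnums = {
    "Umma": {"P001", "P002", "P003"},
    "Isin": {"P002", "P004"},
    "Adab": {"P005"},
}

g_toy = nx.Graph()
g_toy.add_nodes_from(pnums)
for a, b in combinations(pnums, 2):
    shared = pnums[a] & pnums[b]
    if shared:  # edge only when the two places co-occur in some text
        g_toy.add_edge(a, b, weight=len(shared))

print(list(g_toy.edges(data="weight")))  # only Umma and Isin co-occur
```

With the real data this loop produces no edges (no two matched places share a p-number), which is why the cell below falls back to connecting every pair.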
g = nx.Graph()

#Add nodes to the graph
for place in name_pnum['cleaned_geo_name']:
  g.add_node(place)

#Add edges: no two places share a p-number in this data,
#so we simply connect every pair of places.
for place1 in name_pnum['cleaned_geo_name'].values:
  for place2 in name_pnum['cleaned_geo_name'].values:
    if place1 != place2:
      g.add_edge(place1, place2)

pos = {}
for place in name_pnum['cleaned_geo_name'].values:
  loc = unique_matches[unique_matches['cleaned_geo_name']==place]["coordinates"].values[0]
  pos[place] = (loc.x, loc.y)
crs = ccrs.PlateCarree()
fig, ax = plt.subplots(
    1, 1, figsize=(12, 8), subplot_kw=dict(projection=crs))
ax.add_feature(cartopy.feature.OCEAN)
ax.add_feature(cartopy.feature.LAND, edgecolor='black')
ax.add_feature(cartopy.feature.LAKES, edgecolor='black')
ax.add_feature(cartopy.feature.RIVERS)
ax.gridlines()
# Map extent: [lon_min, lon_max, lat_min, lat_max].
ax.set_extent([30, 55, 25, 40])
nx.draw_networkx(g, ax=ax,
                 font_size=10,
                 alpha=.8,
                 width=.075,
                 pos=pos,
                 cmap=plt.cm.autumn)

We can zoom in on the area around Drehem, where many of the known geographic names cluster.

crs = ccrs.PlateCarree()
fig, ax = plt.subplots(
    1, 1, figsize=(12, 8), subplot_kw=dict(projection=crs))
ax.add_feature(cartopy.feature.OCEAN)
ax.add_feature(cartopy.feature.LAND, edgecolor='black')
ax.add_feature(cartopy.feature.LAKES, edgecolor='black')
ax.add_feature(cartopy.feature.RIVERS)
ax.gridlines()
# Map extent: [lon_min, lon_max, lat_min, lat_max].
ax.set_extent([42, 48, 30, 35])
nx.draw_networkx(g, ax=ax,
                 font_size=10,
                 alpha=.8,
                 width=.075,
                 pos=pos,
                 cmap=plt.cm.autumn)

This is by no means complete. Potential next steps include interactive graphs like those produced in Gephi (perhaps using Bokeh), as well as exploring other ways to define edges and edge weights (currently no two places share a p-number, so we simply connected every pair).