VII. Commodities Counting¶

1 Adding Quantity/Number Information to Lemmas¶

The first step is to extract lemmas representing numbers from the rest of our data. To do this, we simply look for all words whose lemma ends in ‘NU’, storing these in the variable numbers_df. Note that the indexes are not re-assigned from words_df, so we can still use numbers_df to determine the index of a number-lemma in the words_df frame.

# note that 'NU$' is a regular expression, so the $ indicates that the 'NU'
# must be at the end of the string.
numbers_df = words_df[words_df['lemma'].str.contains('NU$')].copy()

With that out of the way, we next need to do the following:

Identify all consecutive sequences of number-lemmas (‘number blocks’) on a line.
Develop a method to convert the number blocks from (1) into integer or decimal format.
Associate the lemma that immediately follows a number block from (1) with the converted integer from (2), and add this information back into the words_df frame.

1.1 Identifying Number Blocks¶

def search_for_number_blocks(numb_df):
  number_blocks = [] # this will be a list-of-lists, which each sub-list 
                     # representing the indexes of one number block   
  
  current_block = [] # the current number block we are building
  
  # loop through all number lemmas
  for i in range(0, len(numb_df.index) - 2):
    # note that these represent indexes in the words_df frame as whole
    this_index = numb_df.index[i]
    next_index = numb_df.index[i+1]
    current_block.append(numb_df.index[i])
    
    # there are 3 cases where we have reached the end of a block. First, if the
    # index of the next lemma isn't immediately after the current one. Second,
    # if the next lemma is on a different line. Third, if the next lemma is in a
    # different tablet (to deal with the case of two one-line tablets in a row).
    if next_index != this_index + 1:
      number_blocks.append(current_block)
      current_block = []
    elif numb_df.at[this_index, 'id_line'] != numb_df.at[next_index, 'id_line']:
      number_blocks.append(current_block)
      current_block = []
    elif numb_df.at[this_index, 'id_text'] != numb_df.at[next_index, 'id_text']:     
      number_blocks.append(current_block)
      current_block = []
  return number_blocks

1.2 Converting Number Blocks to a Numerical Format¶

The second part is a bit more involved. We break it up into three separate functions, with the last function using the first two.

In the first of these, add_number_block_column, we both check that the words following a number block are indeed on the same line, and if so add the block to a new dataframe indexed by the location of the word that follows.

Second, we have convert_single_counting_number, which converts the (Arabic) numerals given in the lemmatization into a proper number.

Third, we build on convert_single_counting_number in get_comm_count. This uses the previous function to convert the Arabic numerals in a number block, then multiplies that by the approporiate weight of the unit used in the Sumerian number system.

def add_number_block_column(number_blocks, words_frame):
  out_words_frame = words_frame.copy() 
  item_locs = []
  good_blocks = []
  for block in number_blocks:
    if words_frame.at[block[-1], 'id_line'] != words_frame.at[block[-1] + 1, 'id_line']:
      pass
    elif words_frame.at[block[-1], 'id_text'] != words_frame.at[block[-1] + 1, 'id_text']:
      pass
    else:
      item_locs.append(block[-1] + 1)
      good_blocks.append(block)
  number_indexes = pd.Series(data=good_blocks, index=item_locs)
  return number_indexes

def convert_single_counting_number(num_str):
  count = 0
  grouping_name = ''
  bracket_string = re.search('\[.*\]', num_str).group()
  if bracket_string == '[]':
    count = float(Fraction(re.match('^[0-9/]*', num_str).group()))
    grouping_name = re.search('\(.*\)', num_str).group()[1:][:-1]
  else:
    count = 1
    grouping_name = num_str[0:num_str.index('[')]
  return [count, grouping_name]

# A dictionary for converting terms to the quantity the represent.
grouping_dict = {'diš@t':1, 'diš': 1, 'min':2,'eš':3, 'limmu':4, 'ja':5, 'aš':6, 
            'aš@c':6, 'umun':7,'ussu':8,'ilimmu':9,'u':10, 
            'niš':20, 'ušu':30, 'nimin': 40, 'ninnu':50, 
            'geš₂':60, 'ĝeš-min':120, 'gešʾu':600, 
            'ĝes-u-min':1200, 'šar₂':3600,'sila': 1, 'barig':60, 'ban₂':10, 
            'gur':300}

def get_comm_count(word_index, comm_series, numb_df):
  number_block = comm_series.at[word_index]
  numb_df.loc[number_block, 'lemma']
  converted_nums = numb_df.loc[number_block, 'lemma'].apply(convert_single_counting_number)
  try:
    return sum(converted_nums.apply(lambda x: (x[0] * grouping_dict[x[1]]))), list(converted_nums.apply(lambda x: x[1]))
  except:
    return None, list(converted_nums.apply(lambda x: x[1]))

1.3 Adding Number Values to words_df¶

Given that the previous steps have been performed, the final procedure is to associated the calculated number with a lemma in words_df. In particular, we attach it to the lemma that immediately follows the number block, but in the future, a more sophisticated method may be used to ensure it is attached to a commodity.

def add_comm_counts_to_words_df(words_frame, comm_idx, numb_df):
  counts = []
  units = []
  for index in comm_idx.index:
    count, unit = get_comm_count(index, comm_idx, numb_df)
    counts.append(count)
    units.append(unit)
  comm_count = pd.Series(data = counts, index = comm_idx.index)
  units_found = pd.Series(data = units, index = comm_idx.index)
  new_words_frame = words_frame.copy()
  new_words_frame['comm_count'] = comm_count
  new_words_frame['units_found'] = units_found
  return new_words_frame

# Ensure the lemma string is in one of the correct formats.
numbers_df = numbers_df.loc[( numbers_df['lemma'].str.match('[0-9/]*\(.*\)\[\]NU') | numbers_df['lemma'].str.match('.*\[.+\]NU'))]
comm_idx = add_number_block_column(search_for_number_blocks(numbers_df), words_df)

out = add_comm_counts_to_words_df(words_df, comm_idx, numbers_df)

2 Cleaning Our Data¶

There are several uncertainties and inaccuracies that remain in our data. For one, much of our future analysis will look at changes in commodities over time, but many of the tablets have near-illegible or multiple dates. Before we continue, we must filter these out.

Secondly, lemmas with the same transliteration are occasionally mistaken for one another. This can be quite problematic since oftentimes the typical quantity of one is much greater than the other. To attempt to detect when this has occurred, we eliminate commodities whose largest count is over five times that of its second largest.

Another potential issue is that many lemmas were only used in a few tablets from a limited selection of years. In such a case they tell us very little about overall changes to the Ur III economy throughout its existence. Therefore we exclude commodities that occur in fewer than 10 years.

valid_counts = out.dropna()
valid_counts = valid_counts[(valid_counts['questionable'] == False) & (valid_counts['metadata_source_x'] == 'BDTNS') & (valid_counts['multiple_dates'] == False) & (valid_counts['ftype'] != 'yn')]

unreliable_lemmas = []
for lemma in list(set(valid_counts['lemma'])):
  ordered_vals = list(valid_counts[valid_counts['lemma'] == lemma]['comm_count'].sort_values(ascending = False))
  years = list(valid_counts[valid_counts['lemma'] == lemma]['year'])
  if len(ordered_vals) > 1 and ordered_vals[0] > ordered_vals[1]*5: # filter out data with likely mislemmatization
    unreliable_lemmas.append(lemma)
  elif '[unit]' in lemma: # if this is a unit of measurement, for now we ignore it
    unreliable_lemmas.append(lemma)
  elif len(years) < 10: # if a lemma only appears in a few years throw it out
    unreliable_lemmas.append(lemma)

valid_counts = valid_counts.drop(valid_counts['lemma'][valid_counts['lemma'].isin(unreliable_lemmas)].index)

3 Refining Our Commodity Search¶

class Lemma:
  def __init__(self, transliteration = '.*', definition = '.*', pos = '.*', plaintext_alias = None):
    self.transliteration = transliteration
    self.definition = definition
    self.pos = pos
    self.regex_seed = f'(({transliteration})\[({definition})\]({pos}))'
    self.plaintext_alias = plaintext_alias
  
  def lookup(self, words_frame):
    return words_frame['lemma'].str.contains(self.regex_seed)
  
  def merge_matches(self, words_frame):
    out_frame = words_frame.copy()
    locations = self.lookup(out_frame)
    if ~locations.any():
      return out_frame
    replacement_string = self.plaintext_alias if self.plaintext_alias is not None else re.escape(self.regex_seed)
    out_frame['lemma'][locations] = replacement_string
    return out_frame

def lookup_lines_of_lemma(lemma, words_frame):
  filtered_frame = words_frame.copy()
  filtered_frame = filtered_frame[filtered_frame['lemma'].str.match(lemma.regex_seed)]
  return filtered_frame[['id_text', 'id_line', 'lemma']].groupby(by=['id_text', 'id_line']).indices

def lookup_forward_neighbors(lemma, words_frame, leap_size = 1):
  return words_frame.loc[(words_frame.loc[words_frame['lemma'].str.fullmatch(lemma.regex_seed)].index + 1)[:(-leap_size)]]

def lookup_backwards_neighbors(lemma, words_frame, leap_size = 1):
  return words_frame.loc[(words_frame.loc[words_frame['lemma'].str.fullmatch(lemma.regex_seed)].index - leap_size)[leap_size:]]

reduced_words_df = words_df[['lemma', 'id_text', 'id_line', 'id_word', 'label']]

delivery_texts = [key[0] for key in lookup_lines_of_lemma(Lemma('mu.DU','delivery'), reduced_words_df).keys()]
expenditure_texts = [key[0] for key in lookup_lines_of_lemma(Lemma('ziga', 'expenditure'), reduced_words_df).keys()]
expenditure_texts.extend([key[0] for key in lookup_lines_of_lemma(Lemma('zig', 'rise'), reduced_words_df).keys()])

# If a text only has mu.DU, then add it to delivery
valid_counts['delivery'] = valid_counts['id_text'].isin(delivery_texts) & ~valid_counts['id_text'].isin(expenditure_texts)

# If a text has zig or ziga, regardless of whether it has mu.DU, add it to expenditure
valid_counts['expenditure'] = valid_counts['id_text'].isin(expenditure_texts)

# Lastly, create a category that contains both sets of texts
valid_counts['delivery_or_expenditure'] = valid_counts['id_text'].isin(expenditure_texts) | valid_counts['id_text'].isin(delivery_texts)

lookup_backwards_neighbors(Lemma('mu.DU', 'delivery', 'N'), reduced_words_df, 2)

	lemma	id_text	id_line	id_word	label
220	Urum[1]SN	P100217	12	P100217.12.4	r 2
260	ensik[ruler]N	P100219	8	P100219.8.1	o 6
303	1(diš)[]NU	P100221	4	P100221.4.1	o 2
366	ud[sun]N	P100222	16	P100222.16.1	r 6
505	sila[lamb]N	P100228	4	P100228.4.2	o 2
...	...	...	...	...	...
579201	sila[lamb]N	P517184	4	P517184.4.2	o 2
579208	sila[lamb]N	P517184	6	P517184.6.2	o 4
579217	sila[lamb]N	P517184	9	P517184.9.2	o 7
579226	niga[fattened]V/i	P517184	12	P517184.12.3	o 10
579474	ur-{d}ištaran[]PN	P519792	29	P519792.29.1	r 10

4793 rows × 5 columns

4 Exploring all the Texts¶

generic_goat_lemma = Lemma(definition='~?goat', plaintext_alias='maš[goat]N')
generic_sheep_lemma = Lemma(definition='~?sheep', plaintext_alias='udu[sheep]N')

comm_counts = valid_counts.copy()
comm_counts = comm_counts.loc[(comm_counts['lemma'].str.contains(']N$'))]
comm_counts = comm_counts.loc[~(comm_counts['lemma'].str.contains('year$'))]

sheep_n_goats = generic_goat_lemma.merge_matches(comm_counts)
sheep_n_goats = generic_sheep_lemma.merge_matches(sheep_n_goats)

/usr/local/lib/python3.7/dist-packages/pandas/core/strings.py:2001: UserWarning: This pattern has match groups. To actually get the groups, use str.extract.
  return func(self, *args, **kwargs)
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:18: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

commodities_by_year = valid_counts[['lemma', 'min_year', 'comm_count']].groupby(by=['lemma', 'min_year']).sum()

sheep_n_goats_by_year = sheep_n_goats[['lemma', 'min_year', 'comm_count']].groupby(by=['lemma', 'min_year']).sum()

commodities_by_year.sort_values(by = 'comm_count', ascending=False)

		comm_count
lemma	min_year
udu[sheep]N	71.0	91183.000000
udu[sheep]N	70.0	87813.000000
x[NA]NA	70.0	85703.000000
gun[load]N	85.0	82026.000000
itud[moon]N	83.0	55220.000000
...	...	...
ma-na[]PN	63.0	0.500000
a-geš-IGI-ri[]PN	83.0	0.500000
niŋlalarum[dessert]N	84.0	0.500000
išah[lard]N	70.0	0.500000
šu-{d}iškur[]PN	58.0	0.333333

2688 rows × 1 columns

tablets_by_time = words_df[(words_df['questionable'] == False) & (words_df['metadata_source_x'] == 'BDTNS') & (words_df['multiple_dates'] == False)][['min_year', 'id_text']].groupby(by = 'min_year').nunique()

words_by_time = words_df[(words_df['questionable'] == False) & (words_df['metadata_source_x'] == 'BDTNS') & (words_df['multiple_dates'] == False)][['min_year', 'id_text']].groupby(by = 'min_year').count()

import matplotlib.pyplot as plt
plt.plot((commodities_by_year.loc['udu[sheep]N']['comm_count']/tablets_by_time['id_text']).index, commodities_by_year.loc['udu[sheep]N']['comm_count']/tablets_by_time['id_text'])
plt.plot((commodities_by_year.loc['maš[goat]N']['comm_count']/tablets_by_time['id_text']).index, commodities_by_year.loc['maš[goat]N']['comm_count']/tablets_by_time['id_text'])
#plt.plot(commodities_by_year.loc['maš[goat]N'].index,  commodities_by_year.loc['maš[goat]N']['comm_count']/tablets_by_time['id_text'])
plt.plot(tablets_by_time.index, tablets_by_time/10)

[<matplotlib.lines.Line2D at 0x7f3698f827d0>]

_images/PartVII-CommoditiesCounting_40_1.png

plt.plot((sheep_n_goats_by_year.loc['udu[sheep]N']['comm_count']/tablets_by_time['id_text']).index, sheep_n_goats_by_year.loc['udu[sheep]N']['comm_count']/tablets_by_time['id_text'])
plt.plot((sheep_n_goats_by_year.loc['maš[goat]N']['comm_count']/tablets_by_time['id_text']).index, sheep_n_goats_by_year.loc['maš[goat]N']['comm_count']/tablets_by_time['id_text'])
plt.plot(tablets_by_time.index, tablets_by_time/10)

[<matplotlib.lines.Line2D at 0x7f3698f631d0>]

_images/PartVII-CommoditiesCounting_41_1.png

plt.plot((commodities_by_year.loc['udu[sheep]N']['comm_count']/words_by_time['id_text']).index, commodities_by_year.loc['udu[sheep]N']['comm_count']/words_by_time['id_text'])

[<matplotlib.lines.Line2D at 0x7f3698f3d950>]

_images/PartVII-CommoditiesCounting_42_1.png

relative_spike = (commodities_by_year.loc['udu[sheep]N']['comm_count']/words_by_time['id_text']).dropna()

relative_spike.sort_values(ascending=False)

min_year
0    3.415813
0    2.996315
0    2.518033
0    2.232092
0    2.089681
0    2.009363
0    1.976220
0    1.852672
0    1.787211
0    1.785718
0    1.743820
0    1.717206
0    1.666760
0    1.517521
0    1.450986
0    1.407816
0    1.250905
0    1.245296
0    1.225937
0    1.174519
0    1.074880
0    1.038360
0    0.981723
0    0.713147
0    0.710645
0    0.697853
0    0.678775
0    0.648679
0    0.575142
0    0.554688
0    0.550728
0    0.488782
0    0.238155
0    0.235935
0    0.151741
0    0.151239
0    0.118158
0    0.113158
0    0.057937
0    0.050339
0    0.037833
dtype: float64

plt.plot(commodities_by_year.loc['u[ewe]N'].index, commodities_by_year.loc['u[ewe]N'])

plt.xlim(30, 90)

(30.0, 90.0)

_images/PartVII-CommoditiesCounting_45_1.png

5 Exploring Expenditure Texts¶

delivery_by_year = valid_counts.loc[valid_counts['delivery'], ['lemma', 'min_year', 'comm_count']].groupby(by=['lemma', 'min_year']).sum()
expenditure_by_year = valid_counts.loc[valid_counts['expenditure'], ['lemma', 'min_year', 'comm_count']].groupby(by=['lemma', 'min_year']).sum()
both_by_year =  valid_counts.loc[valid_counts['delivery_or_expenditure'] , ['lemma', 'min_year', 'comm_count']].groupby(by=['lemma', 'min_year']).sum()

A comparison of udu and mash (per tablet) over time in expenditure texts.

expenditure_by_year.loc['ninda[bread]N']['comm_count']

min_year
58.0    120.0
61.0     60.0
65.0    140.0
75.0    250.0
Name: comm_count, dtype: float64

plt.plot((expenditure_by_year.loc['udu[sheep]N']['comm_count']/tablets_by_time['id_text']).index, expenditure_by_year.loc['udu[sheep]N']['comm_count']/tablets_by_time['id_text'])
plt.plot((expenditure_by_year.loc['maš[goat]N']['comm_count']/tablets_by_time['id_text']).index, expenditure_by_year.loc['maš[goat]N']['comm_count']/tablets_by_time['id_text'])
#plt.plot((expenditure_by_year.loc['gun[load]N']['comm_count']/tablets_by_time['id_text']).index, expenditure_by_year.loc['gun[load]N']['comm_count']/tablets_by_time['id_text'])
plt.plot((expenditure_by_year.loc['gud[ox]N']['comm_count']/tablets_by_time['id_text']).index, expenditure_by_year.loc['gud[ox]N']['comm_count']/tablets_by_time['id_text'])
plt.plot((expenditure_by_year.loc['ninda[bread]N']['comm_count']/tablets_by_time['id_text']).index, expenditure_by_year.loc['ninda[bread]N']['comm_count']/tablets_by_time['id_text'])


#plt.plot(commodities_by_year.loc['maš[goat]N'].index,  commodities_by_year.loc['maš[goat]N']['comm_count']/tablets_by_time['id_text'])
#plt.plot(tablets_by_time.index, tablets_by_time/10)

[<matplotlib.lines.Line2D at 0x7f3698e1a650>]

_images/PartVII-CommoditiesCounting_50_1.png

A comparison of udu in expediture and delivery texts.

plt.plot((expenditure_by_year.loc['udu[sheep]N']['comm_count']/tablets_by_time['id_text']).index, expenditure_by_year.loc['udu[sheep]N']['comm_count']/tablets_by_time['id_text'])
plt.plot((delivery_by_year.loc['udu[sheep]N']['comm_count']/tablets_by_time['id_text']).index, delivery_by_year.loc['udu[sheep]N']['comm_count']/tablets_by_time['id_text'])

[<matplotlib.lines.Line2D at 0x7f3698d892d0>]

_images/PartVII-CommoditiesCounting_52_1.png

A raw comparison of udu and mash in expenditure texts (not adjusted for number of tablets).

plt.plot((expenditure_by_year.loc['udu[sheep]N']['comm_count']).index, expenditure_by_year.loc['udu[sheep]N']['comm_count'])
plt.plot((expenditure_by_year.loc['maš[goat]N']['comm_count']).index, expenditure_by_year.loc['maš[goat]N']['comm_count'])

[<matplotlib.lines.Line2D at 0x7f369a23a250>]

_images/PartVII-CommoditiesCounting_54_1.png

A comparison of udu and mash in delivery (also adjusted).

plt.plot((delivery_by_year.loc['udu[sheep]N']['comm_count']/tablets_by_time['id_text']).index, delivery_by_year.loc['udu[sheep]N']['comm_count']/tablets_by_time['id_text'])
plt.plot((delivery_by_year.loc['maš[goat]N']['comm_count']/tablets_by_time['id_text']).index, delivery_by_year.loc['maš[goat]N']['comm_count']/tablets_by_time['id_text'])

[<matplotlib.lines.Line2D at 0x7f3699b63690>]

_images/PartVII-CommoditiesCounting_56_1.png

Sumerian Networks