{ "cells": [ { "cell_type": "markdown", "id": "9ecfcfe3", "metadata": { "id": "C6xk0NIh0VlU" }, "source": [ "# IV. Archive Classification\n", "\n", "The primary question which we pursue in this section is how one can use reproducible and replicable workflows for discovering the optimal classifications of the text groups from the Drehem texts, found in an unprovenanced archival context. We describe how we leverage existing classification models to help validate our findings. " ] }, { "cell_type": "code", "execution_count": null, "id": "a08cdc06", "metadata": { "id": "lLPJzR-ravyd" }, "outputs": [], "source": [ "# import necessary libraries\n", "import pandas as pd\n", "from tqdm.auto import tqdm\n", "\n", "# import libraries for this section\n", "import re\n", "import matplotlib.pyplot as plt\n", "\n", "# import ML models from sklearn\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn.neighbors import KNeighborsClassifier\n", "from sklearn.decomposition import PCA\n", "from sklearn.naive_bayes import BernoulliNB\n", "from sklearn.naive_bayes import GaussianNB\n", "from sklearn import svm\n", "# import train_test_split function\n", "from sklearn.model_selection import train_test_split\n", "from sklearn import metrics" ] }, { "cell_type": "markdown", "id": "5ff76687", "metadata": { "id": "-hkY_R8ia6WJ" }, "source": [ "## 1 Set up Data\n", "Create a dictionary of archive categories" ] }, { "cell_type": "markdown", "id": "4c13bbbd", "metadata": { "id": "qS7sRldSHRMk" }, "source": [ "### 1.1 Labeling the Training Data\n", "\n", "We will be labeling the data according to what words show up in it." 
] }, { "cell_type": "code", "execution_count": null, "id": "5a4b368d", "metadata": { "id": "fK6LfbUHbDfu" }, "outputs": [], "source": [ "labels = dict()\n", "labels['domesticated_animal'] = ['ox', 'cow', 'sheep', 'goat', 'lamb', '~sheep', 'equid'] # account for plural\n", "# possible refinement: split domesticated into large and small; sheep, goat, lamb, ~sheep would be small domesticated animals\n", "labels['wild_animal'] = ['bear', 'gazelle', 'mountain'] # account for 'mountain animal' and plural\n", "labels['dead_animal'] = ['die'] # 'die' takes precedence over domesticated or wild\n", "labels['leather_object'] = ['boots', 'sandals']\n", "labels['precious_object'] = ['copper', 'bronze', 'silver', 'gold']\n", "labels['wool'] = ['wool', '~wool']\n", "# labels['queens_archive'] = []" ] }, { "cell_type": "markdown", "id": "340ead64", "metadata": { "id": "wuSA3G_ibFOJ" }, "source": [ "Using `filtered_with_neighbors.csv` generated above, make the P-number and `id_line` the new indices.\n", "\n", "Then separate each lemma into its components." ] }, { "cell_type": "code", "execution_count": null, "id": "ffaf43ae", "metadata": { "id": "vWjeplv-fxqL" }, "outputs": [], "source": [ "words_df = pd.read_pickle('https://gitlab.com/yashila.bordag/sumnet-data/-/raw/main/part_3_words_output.p') # read from the online file\n", "#words_df = pd.read_pickle('output/part_3_output.p') # uncomment to read from a local file instead" ] }, { "cell_type": "code", "execution_count": null, "id": "42d93947", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 506 }, "id": "_J7MkHrVbHy9", "outputId": "72f3f9a0-f47b-4ace-8767-d7dcfa80b8f7" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " lemma id_text id_word ... 0 1 2\n", "pn id_line ... \n", "100041 3 6(diš)[]NU P100041 P100041.3.1 ... 6(diš) NU\n", " 3 udu[sheep]N P100041 P100041.3.2 ... udu sheep N\n", " 4 kišib[seal]N P100041 P100041.4.1 ... kišib seal N\n", " 4 Lusuen[0]PN P100041 P100041.4.2 ... Lusuen 0 PN\n", " 5 ki[place]N P100041 P100041.5.1 ... ki place N\n", "\n", "[5 rows x 31 columns]" ] }, "execution_count": 41, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "data = words_df.copy()\n", "data.loc[:, 'pn'] = data.loc[:, 'id_text'].str[-6:].astype(int)\n", "data = data.set_index(['pn', 'id_line']).sort_index()\n", "extracted = data.loc[:, 'lemma'].str.extract(r'(\\S+)\\[(.*)\\](\\S+)')\n", "data = pd.concat([data, extracted], axis=1)\n", "data = data.fillna('') #.dropna() ????\n", "data.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "9d484fe9", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "UKkYphJffS7_", "outputId": "2886b384-39c3-42a2-a8ca-eaca58cd3464" }, "outputs": [ { "data": { "text/plain": [ "o 1 46955\n", "o 2 37850\n", "o 3 36878\n", "r 3 33824\n", "o 4 32289\n", " ... \n", "env o 10 1\n", "o vii' 4 1\n", "o ii 50 1\n", "a i 9 1\n", "seal S000081 ii 4 1\n", "Name: label, Length: 2032, dtype: int64" ] }, "execution_count": 42, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "data['label'].value_counts()" ] }, { "cell_type": "code", "execution_count": null, "id": "b8359832", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 506 }, "id": "shc_Pu4IbJ67", "outputId": "74a76380-207e-4e63-e384-4fc936d0a32d" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " lemma id_text ... 2 archive\n", "pn id_line ... \n", "100041 3 6(diš)[]NU P100041 ... NU \n", " 3 udu[sheep]N P100041 ... N domesticated_animal\n", " 4 kišib[seal]N P100041 ... N \n", " 4 Lusuen[0]PN P100041 ... PN \n", " 5 ki[place]N P100041 ... N \n", "\n", "[5 rows x 32 columns]" ] }, "execution_count": 43, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "for archive in labels.keys():\n", " data.loc[data.loc[:, 1].str.contains('|'.join([re.escape(x) for x in labels[archive]])), 'archive'] = archive\n", "\n", "data.loc[:, 'archive'] = data.loc[:, 'archive'].fillna('')\n", "\n", "data.head()" ] }, { "cell_type": "markdown", "id": "bd6d513c", "metadata": { "id": "-5so2KJ5bLin" }, "source": [ "The function get_set has a dataframe row as an input and returns a dictionary where each key is a word type like NU and PN. The values are its corresponding lemmas." ] }, { "cell_type": "markdown", "id": "5e180b98", "metadata": { "id": "IFeAuQmDInDY" }, "source": [ "### 1.2 Data Structuring" ] }, { "cell_type": "code", "execution_count": null, "id": "63afc925", "metadata": { "id": "5kTJp996bNaz" }, "outputs": [], "source": [ "def get_set(df):\n", " \n", " d = {}\n", "\n", " seals = df[df['label'].str.contains('seal')]\n", " df = df[~df['label'].str.contains('seal')]\n", "\n", " for x in df[2].unique():\n", " d[x] = set(df.loc[df[2] == x, 0])\n", "\n", " d['SEALS'] = {}\n", " for x in seals[2].unique():\n", " d['SEALS'][x] = set(seals.loc[seals[2] == x, 0])\n", "\n", " return d" ] }, { "cell_type": "code", "execution_count": null, "id": "bff4408f", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "dgLTwrRebO-7", "outputId": "dd3479a1-5d36-4235-8864-e6c92daa070e" }, "outputs": [ { "data": { "text/plain": [ "{'': {''},\n", " 'MN': {'Šueša'},\n", " 'N': {'itud', 'maš', 'mu', 'mu.DU', 'udu'},\n", " 'NU': {'1(diš)', '2(diš)'},\n", " 'PN': {'Apilatum', 'Ku.ru.ub.er₃', 'Šulgisimti'},\n", " 'SEALS': {},\n", " 
'SN': {'Šašrum'},\n", " 'V/i': {'hulu'},\n", " 'V/t': {'dab'}}" ] }, "execution_count": 45, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "get_set(data.loc[100271])" ] }, { "cell_type": "code", "execution_count": null, "id": "eb30aa35", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 230 }, "id": "rNidxpgcbQVq", "outputId": "0abc700c-248e-42b6-99c6-e989458c3dbe" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " archive set\n", "pn \n", "100041 {domesticated_animal} {'NU': {'6(diš)'}, 'N': {'ki', 'kišib', 'udu'}...\n", "100189 {dead_animal} {'NU': {'1(diš)', '5(diš)-kam', '2(diš)'}, 'N'...\n", "100190 {dead_animal} {'NU': {'3(u)', '1(diš)', '5(diš)', '1(diš)-ka...\n", "100191 {dead_animal} {'NU': {'1(diš)', '4(diš)', '4(diš)-kam', '2(u...\n", "100211 {dead_animal} {'NU': {'1(diš)', '1(u)', '1(diš)-kam', '2(diš..." ] }, "execution_count": 46, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "archives = pd.DataFrame(data.groupby('pn').apply(lambda x: set(x['archive'].unique()) - set(['']))).rename(columns={0: 'archive'})\n", "archives.loc[:, 'set'] = data.reset_index().groupby('pn').apply(get_set)\n", "archives.loc[:, 'archive'] = archives.loc[:, 'archive'].apply(lambda x: {'dead_animal'} if 'dead_animal' in x else x)\n", "archives.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "08da24cb", "metadata": { "id": "rOKf3s6qbSrG" }, "outputs": [], "source": [ "def get_line(row, pos_lst=['N']):\n", " words = {'pn' : [row.name]} #set p_number\n", " for pos in pos_lst:\n", " if pos in row['set']:\n", " #add word entries for all words of the selected part of speech\n", " words.update({word: [1] for word in row['set'][pos]})\n", " return pd.DataFrame(words)" ] }, { "cell_type": "markdown", "id": "c844a607", "metadata": { "id": "D-fon0TCLhxA" }, "source": [ "Each row represents a unique P-number, so the matrix indicates which word are present in each text." 
] }, { "cell_type": "code", "execution_count": null, "id": "27c47c98", "metadata": { "id": "wgZkM0MCQzu3" }, "outputs": [], "source": [ "sparse = words_df.groupby(by=['id_text', 'lemma']).count()\n", "sparse = sparse['id_word'].unstack('lemma')\n", "sparse = sparse.fillna(0)" ] }, { "cell_type": "code", "execution_count": null, "id": "978fdb70", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 461 }, "id": "yHn9mTTVMOsy", "outputId": "6ba8c00f-d3ef-4073-f98f-f26a176aba22" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " ki kišib udu itud mu ... balla šembulug li niŋsaha ensi\n", "pn ... \n", "100041 1.0 1.0 1.0 NaN NaN ... NaN NaN NaN NaN NaN\n", "100189 1.0 NaN 1.0 1.0 1.0 ... NaN NaN NaN NaN NaN\n", "100190 1.0 NaN 1.0 1.0 1.0 ... NaN NaN NaN NaN NaN\n", "100191 1.0 NaN 1.0 1.0 1.0 ... NaN NaN NaN NaN NaN\n", "100211 1.0 NaN 1.0 1.0 1.0 ... NaN NaN NaN NaN NaN\n", "... ... ... ... ... ... ... ... ... .. ... ...\n", "519650 NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN\n", "519658 NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN\n", "519792 NaN NaN 1.0 1.0 1.0 ... NaN NaN NaN NaN NaN\n", "519957 NaN NaN NaN 1.0 NaN ... NaN NaN NaN NaN NaN\n", "519959 1.0 1.0 NaN 1.0 NaN ... NaN NaN NaN NaN NaN\n", "\n", "[15139 rows x 1076 columns]" ] }, "execution_count": 49, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "sparse = pd.concat(archives.apply(get_line, axis=1).values).set_index('pn')\n", "\n", "sparse" ] }, { "cell_type": "code", "execution_count": null, "id": "aeb77970", "metadata": { "id": "Cps3vRT8Xc7f" }, "outputs": [], "source": [ "sparse = sparse.fillna(0)\n", "sparse = sparse.join(archives.loc[:, 'archive'])" ] }, { "cell_type": "code", "execution_count": null, "id": "95e8af2d", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 279 }, "id": "j43addrEfqrB", "outputId": "d73bcf04-e30d-4fb0-dd19-c8f83425eed3" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
<table>\n",
"<tr><th>pn</th><th>ki</th><th>kišib</th><th>udu</th><th>...</th><th>archive</th><th>domesticated_animal</th><th>wild_animal</th><th>dead_animal</th><th>leather_object</th><th>precious_object</th><th>wool</th></tr>\n",
"<tr><th>100041</th><td>1.0</td><td>1.0</td><td>1.0</td><td>...</td><td>{domesticated_animal}</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>100189</th><td>1.0</td><td>0.0</td><td>1.0</td><td>...</td><td>{dead_animal}</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>100190</th><td>1.0</td><td>0.0</td><td>1.0</td><td>...</td><td>{dead_animal}</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>100191</th><td>1.0</td><td>0.0</td><td>1.0</td><td>...</td><td>{dead_animal}</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>100211</th><td>1.0</td><td>0.0</td><td>1.0</td><td>...</td><td>{dead_animal}</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"</table>\n",
"<p>5 rows × 1083 columns</p>
" ], "text/plain": [ " ki kišib udu ... leather_object precious_object wool\n", "pn ... \n", "100041 1.0 1.0 1.0 ... 0.0 0.0 0.0\n", "100189 1.0 0.0 1.0 ... 0.0 0.0 0.0\n", "100190 1.0 0.0 1.0 ... 0.0 0.0 0.0\n", "100191 1.0 0.0 1.0 ... 0.0 0.0 0.0\n", "100211 1.0 0.0 1.0 ... 0.0 0.0 0.0\n", "\n", "[5 rows x 1083 columns]" ] }, "execution_count": 51, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "# Add one indicator column per archive label\n", "for label in ['domesticated_animal', 'wild_animal', 'dead_animal', 'leather_object', 'precious_object', 'wool']:\n", "    # 1.0 where the label is in the text's archive set, 0.0 otherwise\n", "    sparse[label] = sparse['archive'].apply(lambda x: label in x).astype(float)\n", "sparse.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "e803dbb5", "metadata": { "id": "ZF3ECsWlgfcU" }, "outputs": [], "source": [ "# known: texts with exactly one archive label; unknown: texts with no label or with several\n", "known = sparse.loc[sparse['archive'].apply(len) == 1, :]\n", "unknown = sparse.loc[(sparse['archive'].apply(len) == 0) | (sparse['archive'].apply(len) > 1), :]" ] }, { "cell_type": "code", "execution_count": null, "id": "8b2faae2", "metadata": { "id": "eu8Hr19zbcKH" }, "outputs": [], "source": [ "# texts that matched no archive label at all\n", "unknown_0 = sparse.loc[(sparse['archive'].apply(len) == 0), :]" ] }, { "cell_type": "code", "execution_count": null, "id": "5f3c4d1e", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Ssc8HVpVb10Y", "outputId": "e4080241-0a04-44b6-dfbb-3a5e2bdf0b66" }, "outputs": [ { "data": { "text/plain": [ "(3243, 1083)" ] }, "execution_count": 54, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "unknown.shape" ] }, { "cell_type": "markdown", "id": "4dd468f1", "metadata": { "id": "PwbgjCVeLXpx" }, "source": [ "### 1.3 Data Exploration" ] }, { "cell_type": "code", "execution_count": null, "id": "8151330e", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 615 }, "id": "r96qBX7qj1hq", "outputId": "047456a3-4e48-4ecf-871e-2f503befe6ae" }, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<tr><th>pn</th><th>ki</th><th>kišib</th><th>udu</th><th>...</th><th>archive</th><th>domesticated_animal</th><th>wild_animal</th><th>dead_animal</th><th>leather_object</th><th>precious_object</th><th>wool</th></tr>\n",
"<tr><th>100217</th><td>0.0</td><td>0.0</td><td>0.0</td><td>...</td><td>{domesticated_animal, wild_animal}</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>100229</th><td>1.0</td><td>0.0</td><td>1.0</td><td>...</td><td>{domesticated_animal, wild_animal}</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>100284</th><td>0.0</td><td>0.0</td><td>1.0</td><td>...</td><td>{domesticated_animal, wild_animal}</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>100330</th><td>1.0</td><td>0.0</td><td>1.0</td><td>...</td><td>{domesticated_animal, wild_animal}</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>100749</th><td>1.0</td><td>0.0</td><td>1.0</td><td>...</td><td>{domesticated_animal, wild_animal}</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>500210</th><td>1.0</td><td>0.0</td><td>0.0</td><td>...</td><td>{domesticated_animal, wild_animal}</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>507968</th><td>0.0</td><td>0.0</td><td>0.0</td><td>...</td><td>{domesticated_animal, wild_animal}</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>509325</th><td>1.0</td><td>0.0</td><td>1.0</td><td>...</td><td>{domesticated_animal, wild_animal}</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>512780</th><td>0.0</td><td>0.0</td><td>0.0</td><td>...</td><td>{leather_object, wool}</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td></tr>\n",
"<tr><th>512812</th><td>1.0</td><td>1.0</td><td>1.0</td><td>...</td><td>{domesticated_animal, wild_animal}</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"</table>\n",
"<p>1195 rows × 1083 columns</p>
" ], "text/plain": [ " ki kišib udu ... leather_object precious_object wool\n", "pn ... \n", "100217 0.0 0.0 0.0 ... 0.0 0.0 0.0\n", "100229 1.0 0.0 1.0 ... 0.0 0.0 0.0\n", "100284 0.0 0.0 1.0 ... 0.0 0.0 0.0\n", "100330 1.0 0.0 1.0 ... 0.0 0.0 0.0\n", "100749 1.0 0.0 1.0 ... 0.0 0.0 0.0\n", "... ... ... ... ... ... ... ...\n", "500210 1.0 0.0 0.0 ... 0.0 0.0 0.0\n", "507968 0.0 0.0 0.0 ... 0.0 0.0 0.0\n", "509325 1.0 0.0 1.0 ... 0.0 0.0 0.0\n", "512780 0.0 0.0 0.0 ... 1.0 0.0 1.0\n", "512812 1.0 1.0 1.0 ... 0.0 0.0 0.0\n", "\n", "[1195 rows x 1083 columns]" ] }, "execution_count": 127, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "# texts that matched more than one archive label (mask built from unknown itself so the indices align)\n", "unknown.loc[unknown['archive'].apply(len) > 1, :]" ] }, { "cell_type": "code", "execution_count": null, "id": "8742909c", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 461 }, "id": "-xukYCAib3kS", "outputId": "09bb169d-704e-4dbb-987a-6f4e64961084" }, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<tr><th>pn</th><th>ki</th><th>kišib</th><th>udu</th><th>...</th><th>archive</th><th>domesticated_animal</th><th>wild_animal</th><th>dead_animal</th><th>leather_object</th><th>precious_object</th><th>wool</th></tr>\n",
"<tr><th>100292</th><td>1.0</td><td>0.0</td><td>0.0</td><td>...</td><td>{}</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>100301</th><td>0.0</td><td>0.0</td><td>0.0</td><td>...</td><td>{}</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>100375</th><td>0.0</td><td>0.0</td><td>0.0</td><td>...</td><td>{}</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>100376</th><td>0.0</td><td>0.0</td><td>0.0</td><td>...</td><td>{}</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>100377</th><td>0.0</td><td>0.0</td><td>0.0</td><td>...</td><td>{}</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>519647</th><td>0.0</td><td>0.0</td><td>0.0</td><td>...</td><td>{}</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>519650</th><td>0.0</td><td>0.0</td><td>0.0</td><td>...</td><td>{}</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>519658</th><td>0.0</td><td>0.0</td><td>0.0</td><td>...</td><td>{}</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>519957</th><td>0.0</td><td>0.0</td><td>0.0</td><td>...</td><td>{}</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>519959</th><td>1.0</td><td>1.0</td><td>0.0</td><td>...</td><td>{}</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
"</table>\n",
"<p>2048 rows × 1083 columns</p>
" ], "text/plain": [ " ki kišib udu ... leather_object precious_object wool\n", "pn ... \n", "100292 1.0 0.0 0.0 ... 0.0 0.0 0.0\n", "100301 0.0 0.0 0.0 ... 0.0 0.0 0.0\n", "100375 0.0 0.0 0.0 ... 0.0 0.0 0.0\n", "100376 0.0 0.0 0.0 ... 0.0 0.0 0.0\n", "100377 0.0 0.0 0.0 ... 0.0 0.0 0.0\n", "... ... ... ... ... ... ... ...\n", "519647 0.0 0.0 0.0 ... 0.0 0.0 0.0\n", "519650 0.0 0.0 0.0 ... 0.0 0.0 0.0\n", "519658 0.0 0.0 0.0 ... 0.0 0.0 0.0\n", "519957 0.0 0.0 0.0 ... 0.0 0.0 0.0\n", "519959 1.0 1.0 0.0 ... 0.0 0.0 0.0\n", "\n", "[2048 rows x 1083 columns]" ] }, "execution_count": 55, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "#find rows where archive has empty set\n", "unknown[unknown['archive'] == set()]" ] }, { "cell_type": "code", "execution_count": null, "id": "6b79b32c", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "x-2uy3iEb5x9", "outputId": "20f0a326-36bc-49b8-818b-46555e38ce7c" }, "outputs": [ { "data": { "text/plain": [ "['domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 
'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'precious_object',\n", " 'precious_object',\n", " 'precious_object',\n", " 'precious_object',\n", " 'wool',\n", " 'wool',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'leather_object',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 
'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 
'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wool',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 
'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 
'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 
'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'leather_object',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'wild_animal',\n", " 'wild_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'wool',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 
'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 
'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 
'domesticated_animal',\n", " 'precious_object',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 
'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'precious_object',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 
'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'dead_animal',\n", " 
'domesticated_animal',\n", " 'dead_animal',\n", " 'precious_object',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'precious_object',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 
'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'domesticated_animal',\n", " 'precious_object',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'wild_animal',\n", " 'precious_object',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 'dead_animal',\n", " 
'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'precious_object',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " 'dead_animal',\n", " 'wild_animal',\n", " 'domesticated_animal',\n", " 'domesticated_animal',\n", " ...]" ] }, "execution_count": 56, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "# known_copy is an alias of known, not an independent copy.\n", "known_copy = known\n", "# Take one archive label per text. pop() removes that label from the container\n", "# stored in known['archive'], so run this cell only once per data load.\n", "known_archives = [archives.pop() for archives in known_copy['archive'].to_list()]\n", "known_archives" ] }, { "cell_type": "code", "execution_count": null, "id": "87d76471", "metadata": { "id": "VbFoEDiLb8KR" }, "outputs": [], "source": [ "known['archive_class'] = known_archives" ] }, { "cell_type": "code", "execution_count": null, "id": "ca1129bd", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 386 }, "id": "16hd_Pcy-WGi", "outputId": "ba618b28-2f00-4960-a87b-34912f8f38b5" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Percent of texts in Domesticated Animal Archive: 0.6905682582380632\n" ] }, { "data": { "image/png":
"iVBORw0KGgoAAAANSUhEUgAAAbUAAAFgCAYAAAA8WedBAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdedxmc/3H8debsS8zwwwxgxFDPy0GYyuVfc8IhcSQTPqpEEmliIhKihAh/MgamRDGFso2hLJl7MY2zFiyxPD5/fH5Xs1xd9+zmPu+rvs+1/v5eNyP+7rOcp3vOde5zuec76qIwMzMrA7maHUCzMzMuouDmpmZ1YaDmpmZ1YaDmpmZ1YaDmpmZ1YaDmpmZ1YaDmvVqkpaW9C9JczZxmytKukvSq5K+0cPbWlfSU9OZ/2tJ3+/JNHSx3UMkndXs7c4KSZ+U9GCr09EXSXpM0oZdzOvTx9VBrSbKSfpGCQCNvyVbna7ZFRFPRMSCEfFOEzd7AHBdRCwUEcd2tZCk0yVNlbRETyUkIvaMiMN64rMlfUHS+HKuPCPpT5LW6Ylt9YSIuDEiVnw/60raVdJNlfcLS/qLpN9Lmrv7Utm9yk1QSPp2T21jdo5rb+CgVi+fKQGg8fd0daakfq1KWB+zDHDv9BaQtACwLfAy8MUZLNvrjrukbwK/AI4AFgeWBk4ARrUyXa0gaSBwDfA4sH1EvNXiJE3PaGAysMv0FuqN51yzOKjVXLmr20vSQ8BDZdqWJXvtJUl/lfSxyvKrSLqzZL2dJ+lcST8q895zd1v5/OXL63kk/UzSE5KeK1ln85V560p6StJ+kp4vTwa7VT5nPklHS3pc0suSbirThpVt9CvL9Zd0all/oqQfNbImJS0v6c9l/RcknTed47KVpHvLMbhe0v+U6dcC6wG/Kk8wK3TxEdsCLwGHkhea6mcfIulCSWdJegXYVdIikn4r6WlJUyT9ocM6XR2X0yvH/35JW1bm9ZM0SdKq5f1a5ft8SdLdktbtYt/7l3TvFREXRcRrEfF2RPwxIr7VxToXSHq2HNsbJH24Mm9zSfeVc2aipP3L9EGSLi3pmSzpRklzlHlLlqeiSZIeVSWbV9Ia5QnylXIe/byLNL0n61aZW7G/pHtKOs+TNG9n61bWGQxcB/wD+GJETK0c9+MlXVb261ZJy1XW+7ik28t2bpf08TJ9PUl/ryw3TtLtlfc3Str6/aRXeSO1HbAXMFzSyMq8xu9kd0lPANeW6XuU8+bV8h2tWvnIEZ1tu3pcJX1b0oUd0vFLSceW193ye+xWEeG/GvwBjwEbdjI9gHHAIsB8wCrA88CawJzkBfkxYB5gbvJudV9gLvIH9Dbwo/JZuwI3dfL5y5fXxwBjy7YWAv4I/LjMWxeYSl5M5wI2B14HBpb5xwPXA0NKuj5e0jSsbKNfWe5i4CRgAWAx4DbgK2XeOcD3yJu1eYF1ujhWKwCvARuVtBwATADmLvOvB748g+N9DfAT8ilnKrBaZd4h5bhtXdIyH3AZcB4wsGzz0zN5XE6vHP8fAGdXtrMFcH95PQR4saw/R9m3F4HBnaR907LNftPZv0OAsyrvv1S+03nIJ7y7KvOeAT5ZXg8EVi2vfwz8uuzXXMAnAZX03VH2Z27gg8AjwCZlvZuBncvrBYG1ukjjusBTHX4DtwFLkufg/cCeXay7K3Af+UR+IqAO808vx28NoB9wNnBumbcIMAXYuczbsbxftHzXbwKDyj4/B0wsx24+4A1g0VlNb1l+53Ks5yR/W8dV5g0jfydnkr+N+YDPlW2vXo778sAyM9p29biSuRavAwuV93OWNKzVXb/Hbr8WNmMj/mvCF5kn6b/Ip4eXgD+U6QGsX1nuROCwDus+CHwa+BTwdPUHDvyVmQhq5UfzGrBcZd7awKPl9brlB92vMv95YK1y0r8BrNzJfjV+rP3IAPJvYL7K/B3J8i/KD/pkYOgMjtX3gfMr7+c
oP/51y/vrmU5QI7Pq3gVGlPdXAr+szD8EuKHyfomy/MBOPqvL41Jen145/ssDrwLzl/dnAz8or78N/F+Hz74SGN3JNncCnp3BMTqESlDrMG9A+U76l/dPAF8BFu6w3KHAJZSbnsr0NYEnOkz7DvDb8voG4IfAoBmkcV3+O6h9sfL+J8Cvu1h313Is3wbW7GT+6cAplfebAw+U1zsDt3VY/mZg1/L6RmAb8ty+CjifvJFYD7jn/aS3zL8a+EXlvJ8EzNXhd/LBDt//3l18Vpfb7uS43gTsUl5vBDxcXnfL77G7/5z9WC9bR8SA8rd1ZfqTldfLAPuVLKGXJL0ELEXesS0JTIxyRhaPz+S2BwPzA3dUPveKMr3hxSjZO8Xr5J34IPJO7uEZbGMZ8u73mco2TiLvECGfuATcpsxa/FIXn7Nkdb8i4l3yGA2Z8W4CeVG7PyLuKu/PBr4gaa7KMtVjvhQwOSKmdPF5XR2X94iICeQd9WckzQ9sBfyuzF4G+FyH73UdMqD+1/aAQZrJchdJc0o6UtLDyuzUx8qsQeX/tuRF//GS3bR2mf5T8gn4KkmPSDqwktYlO6T1u+RFEmB38mn6gZK1958s15nwbOV1p8ex4m5gf+BPklaZhc96z/lTPM608+fPZGD4VHl9PXnT+OnyfpbTK2kpMiieXSZdQv5mtuiwaMfzbnq/qZk9Vr8jgxXAF3jvOdcdv8du1baFiW2mGqSeBA6PiMM7LiTp08AQSaoEtqWZ9sN4jQxcjeU/UFn9BfKJ48MRMXEW0/cCmWWzHHmh6cqT5J3hoA5BAICIeBbYo6RtHeBqSTeUYFD1NPDRyn6IvADMbLp3AZaW1Lgo9COznjYnLzbw38d8EUkDIuKlmdxGV84hLzBzAPdV9u1J8kltj5n4jJvJ47g1cOEMloW8kI0CNiQDWn8yu00AEXE7MKoE9a+RTyZLRcSrwH7kTdRHgGtL+dKT5BP88M42FhEPATuW8rdtgAslLRoRr81EWmdJRPxS0jzAOEnrRsQ/ZmK1p8kLetXS5E0cZOA6mnyCPZI8Vr8hj/nx7zOpO5Pf+R/zdAUyqI0GquWzHc+75Zh9FwBHSxoKfJbMgWl8fnf8HruVn9Taz2+APSWtqbSApC0kLURe7KYC35A0l6RtyDKFhruBD0saUQqVD2nMKE87vwGOkbQYgKQhkjaZUYLKuqcBP1dWIJhT0trlYlNd7hkyO+doZRXsOSQtV4Ixkj5XfniQF5Igs/06Oh/YQtIG5UK8H/nj/OuM0lqeQpYrx2VE+fsIeffaaY20ku4/ASdIGliO7admtK0unAtsDHyVaXfMAGeRT3CblOM3bynwH9rxAyLiZbI863hJW0uav6RpM0k/6WSbC5HH50XypuaIxgxJc0vaSVL/iHgbeIVyzJUVkpYvNw0vA++UebcBr5ZKCPOV9H5E0uplvS9KGlzOi8ZNQGffY7eIiJ8AvyQvujNTlf1yYAVlk4h+krYHVgIuLfP/CqxIniO3RcS9ZBBck8xafT9Gk1myIyp/2wKbS1q0i3VOAfaXtFr5rS8vqWMwnqGImEQ+bf6WvBm5v0zvrt9jt3JQazMRMZ68e/oVeaJNIMsXiKzKvE15PxnYHriosu4/yXKSq8malO+pCUmW60wAbinZVFeTP+6ZsT/wd+D2su2j6Pz83IWsXHBfSf+FTMtiWx24VdK/yAore0fEI50cgwfJavjHkU+JnyGbQ8xMVe7RwCUR8feIeLbxR14Ut5S0SBfr7UyW3zxAlpntMxPb+i/lQnIzWZHmvMr0J8mnqe+SZS1PAt+ii994RBwNfBM4qLL813jvXX/DmWT22kTyuN/Syb49Vr7zPckyO4Dh5Dnwr5LmEyLiusg2h1uSF+ZHye/gFPIJELL86d7yPf4S2CEi3pjBoZktkW0BTwGuUaWWYxfLvkimfz8y0B8AbBkRL5T5rwF3AvdWzqmbgccj4vlZTZuktcigeHz1nIuIseTvbcf
O1ouIC4DDyZufV8nvtqvzc0Z+Rz6p/67D9Nn+PXY3vbf4xOy9JJ1OFhof1Oq0mJnNiJ/UzMysNhzUzMysNpz9aGZmteEnNTMzqw0HNTMzqw03vm6hQYMGxbBhw1qdDDOzPuWOO+54ISIGdzbPQa2Fhg0bxvjx41udDDOzPkVSl933OfvRzMxqw0HNzMxqw0HNzMxqw0HNzMxqw0HNzMxqw0HNzMxqw0HNzMxqw0HNzMxqw42v+6hhB17W6iTMlMeO3KLVSTCzNuInNTMzqw0HNUDSvpLulfQPSedImlfSspJulTRB0nmS5i7LzlPeTyjzh1U+5ztl+oOSNmnV/piZtau2D2qShgDfAEZGxEeAOYEdgKOAYyJieWAKsHtZZXdgSpl+TFkOSSuV9T4MbAqcIGnOZu6LmVm7a/ugVvQD5pPUD5gfeAZYH7iwzD8D2Lq8HlXeU+ZvIEll+rkR8e+IeBSYAKzRpPSbmRkOakTEROBnwBNkMHsZuAN4KSKmlsWeAoaU10OAJ8u6U8vyi1and7KOmZk1QdsHNUkDyaesZYElgQXI7MOe2t4YSeMljZ80aVJPbcbMrC21fVADNgQejYhJEfE2cBHwCWBAyY4EGApMLK8nAksBlPn9gRer0ztZ5z8i4uSIGBkRIwcP7nSMOzMze58c1DLbcS1J85eysQ2A+4DrgO3KMqOBS8rrseU9Zf61ERFl+g6lduSywHDgtibtg5mZ4cbXRMStki4E7gSmAn8DTgYuA86V9KMy7dSyyqnA/0maAEwmazwSEfdKOp8MiFOBvSLinabujJlZm2v7oAYQEQcDB3eY/Aid1F6MiDeBz3XxOYcDh3d7As3MbKY4+9HMzGrDQc3MzGrDQc3MzGrDQc3MzGrDQc3MzGrDQc3MzGrDQc3MzGrDQc3MzGrDQc3MzGrDQc3MzGrDQc3MzGrDQc3MzGrDQc3MzGrDQc3MzGrDQc3MzGrDQc3MzGqj7YOapBUl3VX5e0XSPpIWkTRO0kPl/8CyvCQdK2mCpHskrVr5rNFl+YckjW7dXpmZtae2D2oR8WBEjIiIEcBqwOvAxcCBwDURMRy4prwH2AwYXv7GACcCSFqEHD17TXLE7IMbgdDMzJqj7YNaBxsAD0fE48Ao4Iwy/Qxg6/J6FHBmpFuAAZKWADYBxkXE5IiYAowDNm1u8s3M2puD2nvtAJxTXi8eEc+U188Ci5fXQ4AnK+s8VaZ1Nd3MzJrEQa2QNDewFXBBx3kREUB003bGSBovafykSZO64yPNzKxwUJtmM+DOiHiuvH+uZCtS/j9fpk8ElqqsN7RM62r6e0TEyRExMiJGDh48uJt3wcysvTmoTbMj07IeAcYCjRqMo4FLKtN3KbUg1wJeLtmUVwIbSxpYKohsXKaZmVmT9Gt1AnoDSQsAGwFfqUw+Ejhf0u7A48Dny/TLgc2BCWRNyd0AImKypMOA28tyh0bE5CYk38zMCgc1ICJeAxbtMO1FsjZkx2UD2KuLzzkNOK0n0mhmZjPm7EczM6sNBzUzM6sNBzUzM6sNBzUzM6sNBzUzM6sNBzUzM6sNBzUzM6sNBzUzM6sNBzUzM6sNBzUzM6sNBzUzM6sNBzUzM6sNBzUzM6sNBzUzM6sNBzUzM6sNBzUzM6sNBzVA0gBJF0p6QNL9ktaWtIikcZIeKv8HlmUl6VhJEyTdI2nVyueMLss/JGl06/bIzKw9OailXwJXRMSHgJWB+4EDgWsiYjhwTXkPsBkwvPyNAU4EkLQIcDCwJrAGcHAjEJqZWXO0fVCT1B/4FHAqQES8FREvAaOAM8piZwBbl9ejgDMj3QIMkLQEsAkwLiImR8QUYBywaRN3xcys7bV9UAOWBSYBv5X0N0mnSFoAWDwininLPAssXl4PAZ6srP9UmdbVdDMzaxIHNegHrAqcGBGrAK8xLasRgIgIILpjY5LGSBovafykSZO64yPNzKxwUMsnqqci4tby/kIyyD1XshUp/58v8yc
CS1XWH1qmdTX9PSLi5IgYGREjBw8e3K07YmbW7to+qEXEs8CTklYskzYA7gPGAo0ajKOBS8rrscAupRbkWsDLJZvySmBjSQNLBZGNyzQzM2uSfq1OQC/xdeBsSXMDjwC7kQH/fEm7A48Dny/LXg5sDkwAXi/LEhGTJR0G3F6WOzQiJjdvF8zMzEENiIi7gJGdzNqgk2UD2KuLzzkNOK17U2dmZjOr7bMfzcysPhzUzMysNhzUzMysNhzUzMysNhzUzMysNhzUzMysNhzUzMysNhzUzMysNhzUzMysNhzUzMysNhzUzMysNhzUzMysNhzUzMysNhzUzMysNhzUzMysNhzUzMysNhzUzMysNhzUAEmPSfq7pLskjS/TFpE0TtJD5f/AMl2SjpU0QdI9klatfM7osvxDkka3an/MzNqVg9o060XEiIgYWd4fCFwTEcOBa8p7gM2A4eVvDHAiZBAEDgbWBNYADm4EQjMzaw4Hta6NAs4or88Atq5MPzPSLcAASUsAmwDjImJyREwBxgGbNjvRZmbtzEEtBXCVpDskjSnTFo+IZ8rrZ4HFy+shwJOVdZ8q07qa/h6SxkgaL2n8pEmTunMfzMzaXr9WJ6CXWCciJkpaDBgn6YHqzIgISdEdG4qIk4GTAUaOHNktn2lmZslPakBETCz/nwcuJsvEnivZipT/z5fFJwJLVVYfWqZ1Nd3MzJqk7YOapAUkLdR4DWwM/AMYCzRqMI4GLimvxwK7lFqQawEvl2zKK4GNJQ0sFUQ2LtPMzKxJnP2YZWUXS4I8Hr+LiCsk3Q6cL2l34HHg82X5y4HNgQnA68BuABExWdJhwO1luUMjYnLzdsPMzNo+qEXEI8DKnUx/Edigk+kB7NXFZ50GnNbdaTQzs5nT9tmPZmZWHw5qZmZWGzPMfpT0DvD3yqStI+KxHkuRmZnZ+zQzZWpvRMSIzmYoa1coIt7t3mSZmZnNulnOfpQ0TNKDks4kq74vJelbkm4vHfz+sLLs9yT9U9JNks6RtH+Zfr2kkeX1IEmPlddzSvpp5bO+UqavW9a5UNIDks4uARVJq0v6q6S7Jd0maSFJN0gaUUnHTZL+qzKImZnVy8w8qc0n6a7y+lFgX7Iz39ERcYukjcv7NQABYyV9CngN2AEYUbZzJ3DHDLa1O9nua3VJ8wB/kXRVmbcK8GHgaeAvwCck3QacB2wfEbdLWhh4AzgV2BXYR9IKwLwRcfdM7KuZmfVhs5z9KGkY8HjpzBeykfHGwN/K+wXJILcQcHFEvF7WGzsT29oY+Jik7cr7/uWz3gJui4inymfdBQwDXgaeiYjbASLilTL/AuD7kr4FfAk4fSa2bWZmfdz7baf2WuW1gB9HxEnVBSTtM531pzIt63PeDp/19Yh4T08cktYF/l2Z9A7TSXtEvC5pHNmj/ueB1aaTFjMzq4nuqNJ/JfAlSQsCSBpSOga+Adha0nylG6rPVNZ5jGmBZrsOn/VVSXOVz1qhdF3VlQeBJSStXpZfSFIj2J0CHAvcXoaCMTOzmpvtHkUi4ipJ/wPcXOpu/Av4YkTcKek84G6yM+DbK6v9jOyCagxwWWX6KWS24p2lIsgkpo1j1tm235K0PXCcpPnI8rQNgX9FxB2SXgF+O7v7aGZmfYOy16cmbEg6hAw2P2vS9pYErgc+1FubHIwcOTLGjx//vtYdduBlM16oF3jsyC1anQQzqxlJd0TEyM7m1bJHEUm7ALcC3+utAc3MzLpf0zo0johDmritM4Ezm7U9MzPrHWr5pGZmZu3JQc3MzGrDQc3MzGrDQa0o/U7+TdKl5f2ykm6VNEHSeZLmLtPnKe8nlPnDKp/xnTL9QUmbtGZPzMzal4PaNHsD91feHwUcExHLA1PIfikp/6eU6ceU5ZC0EtnX5YeBTYETJM3ZpLSbmRkOagBIGgpsQTb+bgypsz5wYVnkDKY1Ah9V3lPmb1CWHwWcGxH/johHgQlkJ89mZtYkDmrpF8ABQKN
N26LASxExtbx/ChhSXg8BngQo818uy/9neifrmJlZE7R9UJO0JfB8RMxoWJzu2t4YSeMljZ80aVIzNmlm1jbaPqgBnwC2KgOVnktmO/4SGFDpHHkoMLG8nggsBVDm9wderE7vZJ3/iIiTI2JkRIwcPHhw9++NmVkba/ugFhHfiYihETGMrOhxbUTsBFzHtBEERgOXlNdjy3vK/GsjO9AcC+xQakcuS44Dd1uTdsPMzGhiN1l90LeBcyX9iBwA9dQy/VTg/yRNACaTgZCIuFfS+cB95Hhxe0XEO81PtplZ+3JQq4iI68me/YmIR+ik9mJEvAl8rov1DwcO77kUmpnZ9LR99qOZmdWHg5qZmdWGg5qZmdWGg5qZmdWGg5qZmdWGg5qZmdWGg5qZmdWGg5qZmdWGg5qZmdWGg5qZmdWGg5qZmdWGg5qZmdWGg5qZmdWGg5qZmdWGg5qZmdWGg5qZmdVG2wc1SfNKuk3S3ZLulfTDMn1ZSbdKmiDpPElzl+nzlPcTyvxhlc/6Tpn+oKRNWrNHZmbtq+2DGvBvYP2IWBkYAWwqaS3gKOCYiFgemALsXpbfHZhSph9TlkPSSsAOwIeBTYETJM3Z1D0xM2tz/VqdgFaLiAD+Vd7OVf4CWB/4Qpl+BnAIcCIwqrwGuBD4lSSV6edGxL+BRyVNANYAbu75vaiHYQde1uokzJTHjtyi1Ukwsy74SQ2QNKeku4DngXHAw8BLETG1LPIUMKS8HgI8CVDmvwwsWp3eyTpmZtYEDmpARLwTESOAoeTT1Yd6aluSxkgaL2n8pEmTemozZmZtyUGtIiJeAq4D1gYGSGpkzw4FJpbXE4GlAMr8/sCL1emdrFPdxskRMTIiRg4ePLhH9sPMrF21fVCTNFjSgPJ6PmAj4H4yuG1XFhsNXFJejy3vKfOvLeVyY4EdSu3IZYHhwG3N2QszMwNXFAFYAjij1FScAzg/Ii6VdB9wrqQfAX8DTi3Lnwr8X6kIMpms8UhE3CvpfOA+YCqwV0S80+R9MTNra20f1CLiHmCVTqY/QpavdZz+JvC5Lj7rcODw7k6jmZnNnLbPfjQzs/pwUDMzs9pwUDMzs9pwUDMzs9pwUDMzs9pwUDMzs9pwUDMzs9pwUDMzs9pwUDMzs9pwUDMzs9pwUDMzs9pwUDMzs9pwUDMzs9pwUDMzs9pwUDMzs9pwUDMzs9po+6AmaSlJ10m6T9K9kvYu0xeRNE7SQ+X/wDJdko6VNEHSPZJWrXzW6LL8Q5JGt2qfzMzaVdsHNWAqsF9ErASsBewlaSXgQOCaiBgOXFPeA2wGDC9/Y4ATIYMgcDCwJjli9sGNQGhmZs3R9kEtIp6JiDvL61eB+4EhwCjgjLLYGcDW5fUo4MxItwADJC0BbAKMi4jJETEFGAds2sRdMTNre20f1KokDQNWAW4FFo+IZ8qsZ4HFy+shwJOV1Z4q07qabmZmTeKgVkhaEPg9sE9EvFKdFxEBRDdtZ4yk8ZLGT5o0qTs+0szMCgc1QNJcZEA7OyIuKpOfK9mKlP/Pl+kTgaUqqw8t07qa/h4RcXJEjIyIkYMHD+7eHTEza3NtH9QkCTgVuD8ifl6ZNRZo1GAcDVxSmb5LqQW5FvByyaa8EthY0sBSQWTjMs3MzJqkX6sT0At8AtgZ+Luku8q07wJHAudL2h14HPh8mXc5sDkwAXgd2A0gIiZLOgy4vSx3aERMbs4umJkZOKgRETcB6mL2Bp0sH8BeXXzWacBp3Zc6MzObFW2f/WhmZvXhoGZmZrXhoGZmZrXhoGZmZrXhoGZmZrXhoGZmZrXhoGZmZrXhoGZmZrXhoGZmZrXhoGZmZrXhoGZmZrXhoGZmZrXhoGZmZrXhoGZmZrXhoGZmZrXhoGZmZrXR9kFN0mmSnpf0j8q0RSSNk/RQ+T+wTJekYyVNkHSPpFUr64wuyz8kaXQr9sXMrN21fVADTgc27TDtQOCaiBgOXFPeA2wGDC9/Y4ATIYMgcDC
wJrAGcHAjEJqZWfO0fVCLiBuAyR0mjwLOKK/PALauTD8z0i3AAElLAJsA4yJickRMAcbx34HSzMx6WNsHtS4sHhHPlNfPAouX10OAJyvLPVWmdTXdzMyayEFtBiIigOiuz5M0RtJ4SeMnTZrUXR9rZmY4qHXluZKtSPn/fJk+EViqstzQMq2r6f8lIk6OiJERMXLw4MHdnnAzs3bmoNa5sUCjBuNo4JLK9F1KLci1gJdLNuWVwMaSBpYKIhuXaWZm1kT9Wp2AVpN0DrAuMEjSU2QtxiOB8yXtDjwOfL4sfjmwOTABeB3YDSAiJks6DLi9LHdoRHSsfGJmZj2s7YNaROzYxawNOlk2gL26+JzTgNO6MWlmZjaLnP1oZma14aBmZma14aBmZma14aBmZma14aBmZma14aBmZma14aBmZma14aBmZma14aBmZma14aBmZma14aBmZma14aBmZma10fYdGpv1lGEHXtbqJMyUx47cotVJMOs2flIzM7PacFAzM7PacFAzM7PacJmamc00lxNab+cntW4maVNJD0qaIOnAVqfHzKydOKh1I0lzAscDmwErATtKWqm1qTIzax/OfuxeawATIuIRAEnnAqOA+1qaKjPrVB2zU+u4T7NCEdEjH9yOJG0HbBoRXy7vdwbWjIivVZYZA4wpb1cEHmx6Qrs2CHih1YnoZnXbp7rtD9Rvn+q2P9D79mmZiBjc2Qw/qTVZRJwMnNzqdHRG0viIGNnqdHSnuu1T3fYH6rdPddsf6Fv75DK17jURWKryfmiZZmZmTeCg1r1uB4ZLWlbS3MAOwNgWp8nMrG04+7EbRcRUSV8DrgTmBE6LiHtbnKxZ0SuzRWdT3fapbvsD9dunuu0P9KF9ckURMzOrDWc/mplZbTiomZlZbTiomZlZbTioWa8iaX5JrsDUQyQNkDS01eloF5IWb3Ua2o2DmvUakhYGzga27E2BTdIHJfX5bt8lzQscAewsaelWp6cnlZujj5fXIySt2uTtS9Ig4DZJX2jmtns7SXNIUk99voOa9RoR8QpwBfBVYONeFNiGA7+TtHWrEzI7IuJN4DxgBeCzkpZpcZJ60jzAHpIuBk4AXm/mxiO9AHwNONlQGdwAACAASURBVETS55q5/d6q3FhtBMwnaStJ/9vd2+gtFw1rc5LmiIh3I+IkSW8D+5XpV0XE1BamSxFxZbnb/lVJ50WtSs/71bgzjog/S3oX2AOYQ9JFEfF4a1PX/SJiiqQrgBOByyPiAciRNCLinZ7cdjlnopwrf5T0DnB8eX9eT267t4uINyWtAvwQ6A/s1d3b8JOatVy5CLwraVGAiDgNOA3YnxY/sZWLkyLiMmAf4GhJ27QqPe9H4yILLC1p7oi4ETgGWBXYpk5PbI3gLWlZYDKwK7C4pB8DRMQ7kgb05PZjWuPf5SUNiojLgZ2BIyVt31Pb7u0kNeLNGcBUsgvBOxu/7+7KknRQs5YrgeMzwEmSfi1pnYg4G/g1GUi2aHZgq1wcVwO2kjQ0Ii4h7yyPlvTZZqZndpTjuxVwFvm0+XXgIeAoYGVgB0nDWpfC7lP2dQvgcuCpiBgLfB1YW9IhZXzDo3oqsDUCmqRvAicBF5SAOhHYETisjN7RVio3rp8EPlv+7ibHnxxeFvtAt2wsIvznv5b+AesDfwOWIbsYuwnYsczbGbgBGNyCdG1ODg30I+BhYMsyfVNgErBdq4/dTO7Hx4E7gCXI7Li/lX1aGBgB/A4Y1up0dtO+fojsg3XNDtOXBa4F7ml8jz2YhtWAO4H5gbXJMuITyeKeLYG7gIVafaxa8N1sQY4tuX5l2q/IJ7evkoF/+Oxux2Vq1jKV7Ig1gC8DHwYGAJcAXyvlH/8n6eqImNTktH0I+AGwCbA88CVgX0nzRMTvJe0CvNnMNM2qRjklOVrEHmQAWx34MbA7Wa7xI2BMRPyrZQntXm8D90TEraVSwtTIMtnHyQoKS0bEk925wUoZWiPrsT/wYkS
8DtwsaQp5Hn08Ii6VdH2NjvdMKTWbRwN7RMRfSjb4WxHxNUn7kJWX9oyIh2Z3W85+tKar5J3PVS66PyWzw8YAoyLiCOAdsjxtyYh4ptlpjKxY8CVgSeBI8k7/CuAUSVtHxJ8i4rqerJr8flXStDBARJwfEXeSWT6jI+J88q54EWDRvnyBrWQTq7x+A1hL0soR8WZkJ+PrUCoe9UBAm6MEMoCFyv9bgCnKzs0b59KzTMtma2pNzF7iVWAuMjcG8uYDSUtHxC+A/SMr1cz278lBzZqu3NVuDPxW0u7kxfUNYEFgR0kfJguSj42Ip5uRpsrFcQVJa5Z03gd8EPhHRPwb+AtZDvBIdV+akb5ZUY7vZsCfJB0qad0ya2GyevkawEeB4yLin61K5+yqPCFtQVYsOp4sl/khcJWk3cr59RvggeiBWo/lpqwxov2pko4HtgUuAJaTdI6kPYD1yOzP/6xTZ5Xf05KShpXfybXAEEkrle9tDbKMd1jju+mO35N76bemK1l7ZwKnA1uRweJsMpvsu+TT0fcjK2Y0M11bkE+N7wDXkdmPS5NPas+QWXffiIjrm5muWaXsMeQE8viuTrbZugy4jbzALwqcEBEXtyqN3UXShmR26hiytuzQiPi0pM8DI8lAflFEXNWhZmJ3puEz5HmzE7Ah+cT2FnlOjyGfUi6OvjUM1WxTtuvcj6yFei/5O98EGAJMAT5JeULr1u06qFkzVO6qVyTvpleKiBPLU9k+wKPAxcAEYImIeKKnLkJdpEvAQcCFZFbo+WQlkZOBxciL1W0RMa4n0/N+VfZjJDn6+poRcaCkxYAvlmnjIuJySf0j4uVmHN/uVvbnQxFxQ3k/hqwYMhT4DrBTRDwqaf5SptXT6fkEWTHk7XI+L0BWfNoW2CsiXuuLx3l2SfoI+eS8BfC/wPYRsZqkD5BBbTng4Yi4o7uPj7Mfrcc1yh0kbUTerX0LOKJkO9wL/JysJPIFYI6IeAKak7VX0rUVeVe9IbBUqViwJ1l4/XXgwYg4PCLG9dIytDnLfqwL/BHYDDhA0qci4nmydtkkYHNJgyPiZeidWafTUyoWbQo8UyoeQNYwPBXYF9i2BLQtgG8ru8rq1mtc9fsvadgJWBPYVdLHIuK18uSxJHn+9Lnj/H5I+oCk46rl5WSzih2ArYHPl+mDIuKOUs57B3T/8XFQsx4jaUHIMgRl+6AdyIogWwLHAedKWjYi7gcOB84vZVfNTOOHgG+S2Y13Al+XtGYJBv9LFu4v0Vi+N12gNK2x+jtlP8YAO0fEGLI26R8lrRsRL5JPnEdHk2uRdqdSFvU7MuvqMEmbkNmsLwHPR8QzJTvyaOAvEfF6d5dfNb7/Ui70CtlU4tGSrn0lbVyy3fqTWdbtYgpZPX+YstbpU2RTkq8Au0TEw5I2Jcsdh/TkzaGDmvUIZePWfSQtJmlO8uQeQWlgGRE/IO/k/ijpgxFxX0T8vQnpWqxkGVEC7XHADRHxGzKwjgMOlPSJiHgO2KYE3V5F0nzA/prWaHptMktnXUnzRvbK8g3gWknrRcQLEfFoa1I7e1QAlKfod4HnyDZfnwC2ARaRdAFwCPDNiLiqB9OzNlkJ5ytkG8pPkef1pWQuxM5kM4lneyoNvU1E/DsiHiSz8K8gg9wlwGPARpJGkzcbh0XExJ68OXSZmvWI8hQxD3njNJys5nwoEMDvIuKustxhwJ8i4q9NSNMcwC5k4+6nyGrFJ5FZRXuWcrxFyTZc65LZoa/2RK252aXsYWUB8olg24g4Rtk/5ceBvwIXRsRbkr5E9qzRYxf5niSpXwlkKHvdXxh4KSJukbQ3mcV3bkTcKGluYJGeDCZlG4uRZa7zkjUt1yErQGxDNtKfu9k5Dq1SKcvtF9l8oh95o7gEeTy2INuhLgxc2sjCd1CzPqmc4F8mqzOfQGbVHE620/l9RIxvUZoGkjUbLwRuJGsETiXvIp8qgW2hiHis2embVcpuhw4iLxjHlSC2Mtl
rxdkR8VZZrs9VVpA0mMzW24Fs3zQW+D15w3FZRHxX2XB3OeCq7q5F10l61ibL9M4nm6D8ArgImJvs0u0Q4PDeeBPU3cpvZHBEPKBsPrIpecP6c7K25+HkzeLny83VvJGjRPQ4Zz9at2pkE0maq9xhnw1cTXYsuxrwPbJK+Q6SFurqc3oqXSVNcwAvkI2R1yLLogT8WNJSEfFibw1olePbX9m7yY3k08J6kvYu2Y73kXfHgxrr9bWABlDK/x4hs7N2AnaLiL3JJ6NtJO1Hdj/1OJntumwPJ+nJ8ncGJbACr0TEyWSPLWe3SUCbh6xAtYtySJ2fANeTQewAsqPs/cjf2CXNDGjgJzXrAcp2O2PIfvb+CNxKlqmNBM4p75eObNzczHStTgavx8nKBd8g26GdTVYLP5WsTNHjZXuzo9TW/B55wb86Ik4t5YR7A7dHxE+VHTA/1dKEzgZVhoiR9CMyaOwXEWeVaWsA/xsRu0qaC/gD8NuIuLAJaVuZbBu3EPm08qGe3mZvU3IINicD2aMRcUiZ/m1gjYjYtmTVHg08GxGHNyttflKzbiVpeXJgxLFkYf7hZCPLk8jeOEaTZQ5NCWiVJ5t1yAvft8jG1OsAvyQLsncH1oqIXftAQGv0Q3kImW26n6S9IuIvZO2zdZRNJfpyQFOp0bk4QEQcRF4c96s8jQ0me+wYAMxHdvt1VzPSFxF3kzkPJwAvVSrr1F6jiUTJIbiIbGT+aeUYaUTEUeRQP6uWrO8JwLLq5qYV0+MOja3blBP7FODMiPhNqdI/hSy/Opy86C4VEZOblaZSiL0e2ZntZmQQ24YMDJQ07Us+ufVqkoaTx/feiPhTmbYLcLqyg9hjJN0TEb1+X6anfGebAt+U9Bx5YfwxMCdwpaQrySeEnzf2VdI+0YTG1pU0Pg+cI+nCiHi7WdttpXKz8a6yYfVUMiv2+2SW42cl9QeeJm84Gv2J3g9c2d1NK6abTmc/WneSdBmZLbNeuduenxxHajfgMxExpQVpOrVsf6VSsD2EbGg9iiyTubqvlDlJOoSsePMNsk/Kd5R9Vf6O7Mniib6yL11R9jIzlvzOFia7+hoWEaMl/bBM3ygiHlQTRrK2aZRtA48F/kzeJO5ANuzfh7xx/CdwUuQ4dq1JYx8//62FKtV5VwAWjOwJHkljyWFZdqwEtoERMbHJ6Zq7UvvvLLKd3EfLvKHAxsCdjeYFvU1lP0aS7aDGR8Szkn4ArAIcTD61vSNp4cjGwH1eKS8bExFfVrZxXJIcIufkyGFLVixtoqxJSjb+IDLL8QeRI1RsR/Z5uR1ZKWRfMqDd38ratg5qNltKpYUfkJUvXgB+GhETJF1EtlP7TFOzHqYFgs3JGmpvAD+KiLclnQl8BFi9BIJ5ent7olLp5jCy6cFQ4KzI8dy+C3waOKCU8dSGpGXI2nQHRMQFZdpJZN+bp/bF5gl1UG4wfk22Q2vcTH2DrBjyRUkDW5ET05Eritj7VmoTHkA2PB0HfIbsKmiFiNimLDaimWmqlMccSda03Bk4W9lryS5kjcFGZZC3mpm2WaXs+mov8vj+BfgY2X/j9pFjzv2VmpWLK/sJfZzMztpD0t6lfdhaZFOFPtk8oSbmJNvkfYFskwbZ2LxRXb9XlOX6Sc3eN2U3U/OTjZmPIIdkP5g88Zv+BFGySOYha1r+guz54TDyhzcQ+Gpkh7cr94WnG2WP9IuT5UrHkWWTu5C9NBwTEWe0MHk9qlTTX4sciuhp4I8R8YfWpqp9VXJAFiEbwD9ClqVtSg4T1aMN32eFg5rNlhJIDiYrKJwmaU/yiW2f6Iah2d9nmuYna2CdRVYoeLPUoruIbOvUp0YeVvYSslRE/FDZWe72wKHRC/uknFmVi+R0G+Yqe4B5t9S6c7ZjE6nSRVl5P2ej/JZso7Yg8M+IuKE3fTe1yrqw5isXpseAg0pjyz2AfVsV0EqaXpf0NlnteEVJbwE3k41ze11AK1lu0yt3/Cdwcin
T2JHsp7IOAW1jYCVJp0+nGcK7jWPTWy6a7UA5TNQqko6OaaNSv1PO1VeAcyvL9qrhmFymZtOlHCZitRmcuL8DjiIrZnw/ygCOPZyuecr/rtI1BfgT2YXPpWStrNt6Ol2zSjlo4vYqDY07ma+IuImsLh1kQLummWnsbpWAdixZo/M9Aa3xnZYng3clLSDpo61IaztSDuR7INmf6HuaSzRuMMoTdGNa9KYbDmc/2nRJ2pessrsPWf29yzZBjdqEjYtST53oyl4kriCHGPlrV1kfpTHoYLK5QW+ttr85OSDphWQj1ec6WabxZFPtOmpGT3e9UqXc83zg9Ii4qNTwHAncFxHnleUaWV2N73qP6OW9vdSBpKWAY8hy3M9GxGudLNP4bvqTFZl+1mg60xv4Sc06VQlMx5BZd98mG8F2tXy/RvX4nr5zK3f2ZwEnSFqlXPA7e2J7JSImxLRhbnrd+R4Rl5MdPm8JbF0u4h2XaQzt8Y6mdVPU5wJa0b+UoV1O1pS9lCyfmRPYpDyVVQPaBWSlIwe0HlL97UTEk+QNVpDfx4Idlq1+N5cA1/emgAYOataFRlBSjiS8HLAUcKakNTsGkHKiT5U0QNKu1ayJ7lbKlSB7SH8VuELSWl0EtjnKOvOWfeo1gaCxH8phO7Yly//2I3ufH9xx2cbxBb7f8ULTF0iaQ9LSwPXKcdEuJLOGvxsRXyWfxj4INIJ3f7Jiz6HNyM5uV5VcgE0lHSTpO2QfqWeSN1rrN863smz1ZuP70YRxEGdZRPjPf53+kT3Y3wWsVt4fTLZHW7OyzJzlf39y8M1PNyFdG5FtzTYjO5Wd2EgT07LUG+kaAIwHPtjq41nSM6jyejHyKe0T5f12wHlkZZuBnRzf64F1W70Ps7n/3yKf/D9RmbYB8A+yoX5j2ijg461Obzv8ld/ReHIE7zuB48v0Pcv5uDUwR5k2L3lDuW6r093l/rQ6Af7rvX/kyMq/B0ZUpv2W7Mh0zQ6B42pgnSal69vAEZX3Xyd7M/l4ed+v/O9fgvCnesGxFJnFdjfwq8r084BdK+/3Idv/fAmYr3J8r2nW8e2BfV8R2LrD93U3OVLCguSQOZs3jlOr09tuf+QT8/ByI3ET2c9mY96XgZUr74f2lhvErv5cpd/+o5IVMZAMDJMkPQ+MlDQxctDGM4CVgNdjWr+O15Dt0m5qUlKfJp8iGxUmjpO0A3CKpDUj4tWyDxeRWSTNStf0LBwRL5fsxhsl/Swi9geuJYdQGRk5Evg1ZDu/2yLijVLL81zgkF6yHzOtZAf3I+/0Pyjp3YgYW76vIWS52gbAiVHKZaJcOa1ndKxUVb6jucmBZhcjb7AeU/brOFdEnFJdP/rAkEau/WjvIWkUeSf9NlnOcR/ZI/zDwOtkVsU3IsdTaqyzUvTQ+GiVQLsWWUb2KtnP5BVkT+7nAEPIMdF+ExG3lPX2Be6OiGt7Il2zQtICZGDaKyKekLQEOSjpb8iOeo8kyyynkqOD7x0RV1XWXzoinmh+yt+fyne2YET8q5Sx7gUMA/4cEX+QtBo5nt03oxc2tag75fAxr0bE48oho64DDoyIXysHnD2VbD5yfSvT+X44qNl/KPtyPAbYigxkW0fEiNJuZWWyM+A/R2knpQ49DvRgurYkx2M7h7yzP5oMCseT/TeuDnwrIi7t6bS8X+WOeEVylIALSmAbT2ZF/rhcZFYHHoiIm8s6fXZYFUlbAPuT42rdDvyMHA19BFmzblXgK42bEOtZ5Xw7PCK+pOxL8xzgIeA24DTyxvAMMriNIHvi77W/p+lxUGtjHYOScoiTj5JPaXsBO0XEI8qRlB9rURqXJ390XyD7mdsLeJkszL5A2UfgkuWOs9d01dORcoDPs8iyi69HxNnlQnMzcFFEfLOlCZxNHdrQjSDLXvcmRz8/G7giIg5SDiuzAdnoelzLEtxmSm3S88ibwGfJm9c5yJq3C5NPzW+QfbnOFzlWXa/9PU2Pg1qbUnZp9Un
gKbJCxQpkBYVvk3nsO0bEk8rGwd8i+xt8IZpcLV7SkmRFiUXIUaq3Jgu09wZ+ERHH9tYfXzVd5UK/KZl9uh9513xq2b87yAE+H2z28e0OpQnCvsDBkUP8rE2Oh7Zbmb8guY/fi4gLK+v1yu+tTjrcbAwmf9/bAytGdic3gizD/QA5Yv2trUtt93A7tfY1DzAfOT7ShWTFhCuBe8gupoZL2p6sGXV0RDzfjAtuo62ZpA8pB/J8q5TXLQecUp4YJ5HlabdC761cUMqV1pT0lcgG4IsBL5JB+QBJe0bE08AyEXF/XwxoRT/gFGCJEqSfBYZKGgYQEf8in9zerq7UW7+3uii5GHtIWlrSyuSNxxHAA+TTM+W8vIysPfxyq9LanRzU2lREvAo8T+af30M+DRER+5BlPZuRd3D7R8SlnTRs7ql0RakheD6wK3Crsn/Ed4Exkr5GVqw4v7ffVZaLymjgp5J2JKtL/4xs6zMa+G4J3H2y3KwhIp4BHgO+RrYbfJasyHOGpO1KTbpdyZsla5KIeJscIuYxsibw6RExGfgc8Lqk88pyd5JdXT3QqrR2J2c/tplKzbTG/2HkCMqfAi4uAWwwWdPxrfLD6Ok0LQZsSHa7s3D5vyPZFu5A4JOlOvzOwJLA3yO7l+q1lH3ovQksRJZl/LP8P4jMgtwUWCC67p2+16ucQ6sAD5L9bI4hKx2MIS+eqwDLkjVTr2hZYttQuRGdg2wmsho54vsDyq7WFiRrOM4bEZ+pU1awg1obqVyEtiB7r3ierAV1PznA50fJMrY1gP+NiEebkSayEsj6wJ+BG8mRnieTtee+EBETlENh/CXK0DG9+UcoaT7ge2Q1/d8Ar5AjcJ8CfJasWfrpaOHwPN1FOcr4icDnI+J2ScuQ59Li5FP+i5rBmGnWvSq/8wWidEgs6XPk+bdVRPy5VMB6k+yL895Wpre7Oai1mVLx4zCyosWeZO/ouwO3kIHui+Rd9dgmp2s/YHkyv/8bwFxkzwVTSxu1HwNfjoiHm5mu96vUNludzI67mBx5+5iIuL+vtTvrirIvx7Fkbc5qu8Vh5He4FBnM3+6rTRP6mkpAGwXsQGbb/ywi/ibpi+So8N8mn6R3jj4wAvysclBrI6WM57tkedVwslbjxWRw+1JE3CRp7oh4q5lPQpI2Kemag+zH8QmyDOYn5N3k7mSPGpc0Iz3dSdIK5JPLF4DHI2KNvtz+DN5z4VyGrES0XZneGHpoLrK26oCIeLCliW1D5fd0BFkh6VSygtL3IuJy5cjpGwJjo9LAv04c1GqucgH6n/KU0J9si3IB2fj1Xkk3AcuQlUZeauYFt5SnXURWAb+vVAQZRPaVuBjZQPTvETGuN2c5To+yK7EPk+1/+myP85VzqdFTyFxklvHvI+LossxGZO/u+/bh2px9ksoYe5K+T/Z5Opgsk74a+DzwncjeXOZqRll5q7j2Y41VLkKbAr+XtHJEvExWrf4n8IakT1FqO0bEiy14gnibrBI+qLw/iawMsg5we0T8PEoj3b4Y0AAi4vWIuD0ibmhWLdKeUM6lTYCzJR0E7ERmM64v6fhSy/GnwLUOaM1TOafmAoiIw8hy8r2AHSLiYLIpyW6SBtU5oIGDWi01TvJyEfoQ2XvAHpX88ynkd38w2SfhtRHxj1akNSKmkNmh60r6SPnB/Z6sIdj7xmqaTX01MANIWgf4OZm1tSI5uGfj4jkX8DHyaeCSvhy8+5ryO9+YbDryNUmrAq+RXZRtpewp6EXgxxHxQivT2gzOfqyZUni/NnBByYpYlew09otlfqPcYw7y6WhgtLhLnNJWa0+y1uXtZIWVvSLi6lakx6apPO0vRJbFvEI29zgO2Daye7LBkSM4vGedFiW5bah0cydpXbIG6h5kZwpXA98h25p+jqzO/83oo305zioHtZqRtBLZuPcJ8k5tAPAn4IcR8YeyzObkGGlHtCyhHZSL5tpkp8l3RMSfW5wkK8pTwMeBCWTD9xe
A9SNicsmO/ARwVKP6uPUsScsCk0vbzXnIbtduJm84TgI+G9nF3eJkrszQyD5c2+Jmw+Op1UR58lKpbDEXWVD8J7K/xKPJbIgPkb1yH0G2o+o1Ins4uar8WS9RnvS3As6LiBtL84oFgZD0SfLcOtABramWA+6U9MGImCLpCeBQsgLYVhExsbRLWywijid7FenTWd+zwk9qNSBpXvJu+c9klsObwEvkcC0XkD0KLEU2Zp4IXBYRf2yXOzebNZUsR5EdEb9N9vDyKFlDdltgI/IcOz4ixvpcaq5S+esEcgif/sDJZE88J5Flm2cCB0TEn1qWyBZxUKuBcvH5IdnV1RKU8ijlMB+NfhJ/3aECiS9C1qVSKWQhsvf275Ljvv2yMr8/8E6p2u9zqQVKMcIxZGD7JNlX6wiyL9Gj+2K7zu7goNbHVdqmDAP+QD6JfQ6YWhpRr0mOlXQ+OVSLq1pbpypPaB8nG+3eSXab9kmyt5fDIuK4VqbR3kvZ5d1PgTXKDcaywL8j4ul2vdlwUOvDKhehNYCh5FAsPwDmJKvvPlyW+x9g4ejlvdpb65Vz6Siyav4tyj4CNyEriqxHdqF2cCvTaO+lHNXidOB/Invhb2tup9aHlYC2FXlCT4qIiWSboanAgZLGlELklxzQbCb1J7Ox1y/vHyef1h4my209WnUvU8rNvgSs3Oq09AZ+UuvDSpbjOcBOpcrux4AlI+IKZQfBywFXNarym80MZWe4RwPfj4hzJH2aLLtZr1Qjb8tsrb7A342DWp8maUFyOImXyTZpy5JdTF0eEYdWGlq3/Ylus0bSZ8jRka8ie3o/K5o8coPZ++Hsxz6kUXtR0kcljSB7BPkZ2W7ocnKYj4OAeQAi4t/lvwOazZKI+CM5DNHyZB+cY1W0OGlm0+XG131IKUMbRXaBcxs5EONPI2InAEnrkUHuoNal0uqiBLI3gdMkPRwRF7U6TWYz4ie1PkTSksDXgQ3IIVmWBh6VNF8pX9sHODgiLvUdtXWHyDG3dgPuanVazGaGy9T6EOWgjN8B/k4OOjk6IiZIWpvs7eGt0h+fy9DMrC35Sa0Xq5ShDQCIiMfJQvv9gK+WgLYB2V3OAo02Kg5oZtau/KTWy5VaaF8F/g3sTfZivyEwBLiOHKTxgHYZVsLMbHoc1HqhSk8hiwAXk0Oy7wQsSnaJ8wbZOFbAvRFxnbMczcwc1HotSZ8CFibHrfpmmXYE2aD6pxExvpXpMzPrjVym1otUytDWJhu+bgNsK+nrABHxXeBJ4BBJC7csoWZmvZSf1HoZSauTjaivjIjLyrhJY4CrI+KEsswKEfHPVqbTzKw3cuPrXqJSJrYmsCnwcHlyuw4IYD9J/SLiWAc0M7POOai1UHXQTmBJSc9GxK8kPQd8BbgjIm6SdB2ZVfx8C5NrZtbrOfuxFyhZjAcDE8jg9Q1yDKtdgKNcu9HMbOb4Sa0FJA0GNiJHqh4IHAvsDjwHfBa4hAxqg4DvSborIqa0KLlmZn2Gg1qTlSzHjcl2Zv2AvwHXRMSNkuaIiKMkLQ2MiohjJV3qgGZmNnMc1JqsZCGeLekDwFpkg+pRkm6LiN+WxV4kewyB7NPRzMxmgoNaC0jaBNgKmBMYAJwPHCppCeCBMm8fcD+OZmazwhVFmkzSYsBFwJiIuE/SXuS4aJADMj4C3OK+HM3MZp2f1JrvbfK4DyrvTwaOB5YFzgNOLf0+urajmdkscjdZTVYqfZwPrCv9f3v3E2JVGYdx/PsYbjIJok2QKEoSOak4VhRFEkFEYS60nHQhhDkQQy36R5C0KQJrYUhitJCicogxGiKMQqisGO3WzDj+oUX2D1r0h4okouRpcd4Ld+ySM83AXI/PZ3Muv3nPe953YObhPedwX3XZ/gsYAE4CB5pBlkCLiJi83H6cAZIuBXqBq4FDwFrgPtvvzejAIiLOcgm1GSJpLnAt1f5oDdvvz/CQIiLOegm1iIiojTxTi4iI2kioRUREbSTUIiK
iNhJqERFRGwm1iIiojYRaRIeStEaSJV3+P879StLFbeqrJT06TeO7VdKnhAlBMQAAAs9JREFUko5K+lzSs6X+hKQHp+MaEZOVUIvoXD3AgXL8F0mT/po724O2n57qwCR1ATuAjbavAFZSbXIbMaMSahEdSNIFwPVUm8eub6mvkvShpEHgqKTzJD0jaUzSqKS+lm76JH0m6XBztSdpk6Qdki6U9LWkWaU+R9K3kmZLWiRpn6RGuVa7leLDwJO2jwPYPmV7Z5t5bJZ0SNKIpAFJ55f6ujLmEUkflNoSSQclDZe5XDYtv8w4pyTUIjrTHcA+218AP0nqbvnZCuB+24uBe4EFwHLbS4FXWtr9aHsFsBMYdzvQ9q/AMHBjKd0OvFO+i/QFoM92dznv+Tbj6wIaE5jHXttX2V4GHKMKaYCtwC2lvrrUeoHttpdTrfy+m0D/EeMk1CI6Uw+wp3zew/hbkAdtNzePvRnYZftvANs/t7TbW44NquA7XT9wV/m8HugvK8TrgNclDQO7gEumMI+usto7DGwAlpT6R8BuSZup9hUE+AR4TNIjwHzbf0zhunGOytYzER1G0kXATcCVkkz1T9+SHipNTk6wqz/L8RTt/9YHgafK9bqB/cAc4JeyWvovR8o5I2dotxtYY3tE0iZgFYDtXknXALcBDUndtl+VNFRqb0vaYnv/GfqPGCcrtYjOsxZ42fZ82wtszwNOADe0afsusKX50kgJqAmx/TvVLhHbgbfKc7HfgBOS1pX+JGlZm9O3Ua2qFpd2syT1tmk3F/he0myqlRql/SLbQ7a3Aj8A8yQtBL60/RzwJrB0onOJaEqoRXSeHuCN02oDtH8L8kXgG2BU0ghw9ySv1Q9sLMemDcA9pb8jVM/3xrE9CjwAvCbpGDAGLGzT/+PAENXtxuMt9W3lBZYx4GOqFd+dwFi57dkFvDTJuUTkW/ojIqI+slKLiIjaSKhFRERtJNQiIqI2EmoREVEbCbWIiKiNhFpERNRGQi0iImojoRYREbXxD/Irk7AzZ//QAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light", "tags": [] }, "output_type": "display_data" } ], "source": [ "archive_counts = known['archive_class'].value_counts()\n", "\n", "\n", "plt.xlabel('Archive Class')\n", "plt.ylabel('Frequency', rotation=0, labelpad=30)\n", "plt.title('Frequencies of Archive Classes in Known Archives')\n", "plt.xticks(rotation=45)\n", "plt.bar(archive_counts.index, archive_counts);\n", "\n", "percent_domesticated_animal = archive_counts['domesticated_animal'] / sum(archive_counts)\n", "\n", "print('Percent of texts in Domesticated Animal Archive:', percent_domesticated_animal)" ] }, { "cell_type": "code", "execution_count": null, "id": "bb853849", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "yBWVV59_5Z3J", "outputId": "a81e37e2-4841-4680-baad-7775da37476c" }, "outputs": [ { "data": { "text/plain": [ "(11896, 1084)" ] }, "execution_count": 61, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "known.shape" ] }, { "cell_type": "code", "execution_count": null, "id": "9acb4f23", "metadata": { "id": "ukAq-vDUMfM-" }, "outputs": [], "source": [ "words_df_copy = words_df.copy()\n", "words_df_copy['id_text'] = [int(pn[1:]) for pn in words_df_copy['id_text']]\n", "\n", "grouped = words_df_copy.groupby(by=['id_text']).first()\n", "grouped = grouped.fillna(0)\n", "\n", "known_copy = known.copy()\n", "known_copy['year'] = grouped.loc[grouped.index.isin(known.index),:]['min_year']\n", "\n", "year_counts = known_copy.groupby(by=['year', 'archive_class'], as_index=False).count().set_index('year').loc[:, 'archive_class':'ki']\n", "year_counts_pivoted = year_counts.pivot(columns='archive_class', values='ki').fillna(0)" ] }, { "cell_type": "code", "execution_count": null, "id": "24475256", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 295 }, "id": "Rd1bg6MvdkMv", "outputId": "ee0d42bd-b588-4a1e-bff1-9c215b428762" }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAbIAAAEWCAYAAAAD/hLkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOydd3xUxfbAvycFUughoBQFkRpIQgepigoKUhTkISpgQdRnfRb091T0oc/C0yeiYgEBHypNxYKNJiA1QUBAqgRIaEmAJIQkpMzvj7m72ZRNNmUTdjPfz2c/e+/M3Jmzd++9556ZM3NEKYXBYDAYDJ6KT2ULYDAYDAZDWTCKzGAwGAwejVFkBoPBYPBojCIzGAwGg0djFJnBYDAYPBqjyAwGg8Hg0RhF5qWIyGUick5EfCuwzdYisk1EUkTkYTe31V9EYovInykiz7lTBiftThGR/1V0uyVBRPqIyN7KlsMTEZEYEbnWSZ45r4CIPCsiH1dkm1VekVkXZpr10Ld9GlW2XGVFKXVEKVVDKZVdgc0+BaxSStVUSk13VkhE5ohIlohc6i5BlFKTlFL/ckfdInKbiERZ18pxEflBRHq7oy13oJRaq5RqXZpjRWS8iKxz2K8lIr+JyBIRqVZ+UpYv1ouPEpGn3dVGWc6rJ2G9JNqelRdEJNNh/wel1CtKqXsqUqYqr8gsbrIe+rbPMcdMEfGrLME8jMuBXUUVEJFg4BYgCbi9mLIX3XkXkceB/wKvAA2By4D3gGGVKVdlICJ1gRXAYWC0UupCJYtUFOOA08CdRRW6GK+5yib/ObFeEmsopWqg74MFDs/OGypDRqPInGC9vT0oIvuB/VbaEKvr7KyIrBeRcIfyHUVkq9WttkBEvhCRqVZenrdYh/qvtLari8g0ETkiIietN55AK6+/iMSKyD9E5JRlAUxwqCdQRP4jIodFJElE1llpzaw2/KxytUVklnV8nIhMtXU7isiVIvKrdXyCiCwo4rwMFZFd1jlYLSJtrfSVwNXADOvNrJWTKm4BzgIvoR8ujnVPEZHFIvI/EUkGxotIPRH5RESOicgZEfk63zHOzssch/P/p4gMccjzE5F4Eelk7few/s+zIrJdRPo7+e21LbkfVEp9qZRKVUplKqW+VUo96eSYRSJywjq3a0QkzCHvRhHZbV0zcSLyhJVeX0S+s+Q5LSJrRcTHymtkWT/xInJIHLpwRaSbaEsx2bqO3nQiU55uWdG9Ek+IyA5LzgUiElDYsQ7HhAKrgJ3A7UqpLIfz/q6IfG/9rk0i0sLhuKtEZIvVzhYRucpKv1pE/nAo94uIbHHYXysiw0sjr+iXp5HAg0BLEenikGe7T+4WkSPASiv9Xuu6SbH+o04OVUYW1rbjeRWRp0VkcT453haR6da2u+/HsrQ/XrSV/ZaIJAJTnLXvRCZ797rD+Z0gIkdF38OTRKSrdQ7PisiMfMffZZ37MyLyk4hcXmyjSqkq/QFigGsLSVfAL0A9IBDoCJwCugO+6IdwDFAdqIZ+K30M8EffNJnAVKuu8cC6Quq/0tp+C/jGaqsm8C3wbyuvP5CFfoD6AzcC54G6Vv67wGqgsSXXVZZMzaw2/KxyXwEfAMFAA2AzcJ+V9znwf+gXmwCgt5Nz1QpIBa6zZHkKOABUs/JXA/cUc75XAK+jrZksoLND3hTrvA23ZAkEvgcWAHWtNvu5eF7mOJz/54H5Du0MBv60thsDidbxPtZvSwRCC5F9kNWmXxG/bwrwP4f9u6z/tDraktvmkHcc6GNt1wU6Wdv/BmZav8sf6AOIJV+09XuqAVcAfwEDreM2AHdY2zWAHk5k7A/E5rsHNgON0Nfgn8AkJ8eOB3ajLe/3AcmXP8c6f90AP2A+8IWVVw84A9xh5Y2x9kOs/zodqG/95pNAnHXuAoE0IKSk8lrl77DOtS/63nrHIa8Z+j6Zh743AoFRVttdrfN+JXB5cW07nld078R
5oKa172vJ0KMi7scytj8efZ0/ZP1Pga5e7/nTHM7vTOu3XG/9z19b7TZGP1dt9/Uw6ze0tdr+J7C+2Od4WZSAN3ysC/Mc2ko4C3xtpSvgGody7wP/ynfsXqAf0Bc4hsNNDazHBUVm3SipQAuHvJ7AIYebIw2Hh6f1x/ewLvQ0IKKQ32W7gPzQSiPD8YJEP0RWWdvzgA+BJsWcq+eAhQ77Pugbvr+1v5oiFBm6Gy4HiLT2fwLezncDrHHYv9QqX7eQupyeF2t7jsP5vxJIAYKs/fnA89b208Cn+er+CRhXSJtjgRPFnKMp5LuxHfLqWP9JbWv/CHAfUCtfuZeApVgvOg7p3YEj+dKeAT6xttcALwL1i5GxPwUV2e0O+68DM50cO946l5lA90Ly5wAfO+zfCOyxtu8ANucrvwEYb22vBW5GX9s/AwvRLw9XAztKI6+Vvxz4r8N1Hw/457tPrsj3/z/ipC6nbRdyXtcBd1rb1wEHre2Kuh9L2/74/NdZSa53CldkjR3yE9Fd0bb9JcCj1vYPwN35ftN5rBcJZx/TtagZrpSqY32GO6Qfddi+HPiHZQqfFZGzQFP0m1kjIE5ZZ97isItthwJBQLRDvT9a6TYSldV1Y3Ee/cZdH/2Wc7CYNi5Hv7Edd2jjA/QbEeg3OQE2W90Udzmpp5Hj71JK5aDPUePifyagH2R/KqW2WfvzgdtExN+hjOM5bwqcVkqdcVKfs/OSB6XUAfSb800iEgQMBT6zsi8HRuX7X3ujlWiB9oD64uI4ioj4isirInJQdFdpjJVV3/q+Bf2gP2x1JfW00t9Av5X+LCJ/ichkB1kb5ZP1WfSDCeBu9Fv6Hqvbzt6d6gInHLYLPY8ObAeeAH4QkY4lqCvP9WNxmNzr51e0Muhrba9Gvyj2s/ZLLK+INEUrwvlW0lL0PTM4X9H8111R95Sr5+oztIIAuI2811xF3I+lbR/yno/y4KTDdloh+7ZzeDnwtoNcp9HnoshnjBnYLBpHxXQUeFkp9XL+QiLSD2gsIuKgzC4j92ZIRSsrW/lLHA5PQP+RYUqpuBLKl4A201ugHy7OOIp+A6uf78EPgFLqBHCvJVtvYLmIrLEUgCPHgA4Ov0PQN72rct8JXCYitgeBH7pb6Ub0AwYKnvN6IlJHKXXWxTac8Tn6pvYBdjv8tqNoi+xeF+rYgD6Pw4HFxZQF/fAYBlyLVmK10V1pAqCU2gIMsxT539EWSFOlVArwD/SLU3tgpejxoqNoS71lYY0ppfYDY0SPp90MLBaREKVUqguylgil1NsiUh34RUT6K6V2unDYMfSDypHL0C9uoJXVf9CW6qvoc/UR+py/W0pR70D/59/qyxXQimwcunvLRv7rrgVlZxHwHxFpAoxA97TY6q+I+7FU7dvEKMkPLUdsz9n5xZZ0wFhkrvMRMElEuosmWEQGi0hN9AMuC3hYRPxF5Gb0GIGN7UCYiESKHhieYsuw3qI+At4SkQYAItJYRAYWJ5B17GzgTdFOAL4i0tN6wDiWO47uqvmPaHdpHxFpYSlgRGSUdbGDfngodJdefhYCg0VkgPXw/Qf6hlhfnKyWtdHCOi+R1qc9+i2xUE8yS+4fgPdEpK51bvsW15YTvkD3z99P7pspwP/QltpA6/wFiB60b5K/AqVUEnp86l0RGS4iQZZMN4jI64W0WRN9fhLRLzKv2DJEpJqIjBWR2kqpTCAZ65yLdiq60nowJQHZVt5mIEX0QH6gJW97EelqHXe7iIRa14VN8Rf2P5YLSqnXgbfRD1pX3M6XAa1ET1/wE5HRQDvgOyt/PdAafY1sVkrtQiu+7uhu09IwDt3dGunwuQW4UURCnBzzMfCEiHS27vUrxRWHg3wopeLRVuUn6BeQP630CrkfS9t+JTMTeEYspyjRTimjijvIKDIXUUpFod+SZqAvrgPovmSUdju+2do/DYwGvnQ4dh963GM52gMyjwcjepzmALDR6oJajr6hXeEJ4A9gi9X
2axT+v96JHgTebcm/mNzus67AJhE5h3Y6eUQp9Vch52Av2mX+HbQ1eBN66oIrbtfjgKVKqT+UUidsH/SDcIiI1HNy3B3o8Zg96DGwR11oqwDWzbsB7QyzwCH9KNpqehY9dnIUeBIn94ZS6j/A4+hBaFv5v5P37d7GPHTXTxz6vG8s5LfFWP/5JPQYHEBL9DVwzpL5PaXUKqXnBA5BP4wPof+Dj9GWHujxpF3W//g28DelVFoxp6ZMKD1X72NghTh4Jzopm4iW/x9o5f4UMEQplWDlpwJbgV0O19QG4LBS6lRJZRORHmhF+K7jNaeU+gZ9v40p7Dil1CLgZfQLTwr6v3V2fRbHZ2iL/LN86RV1P5am/UpDKfUV+hn2hXVf7ASKdemXvMM6hvJCROagB37/WdmyGAwGgzdjLDKDwWAweDRGkRkMBoPBozFdiwaDwWDwaIxFZjAYDAaPxswjKyX169dXzZo1q2wxDAaDwaOIjo5OUEqFFl/SdYwiKyXNmjUjKiqqssUwGAwGj0JEXF31yGVM16LBYDAYPBqjyAwGg8Hg0RhFZjAYDAaPxoyRlSOZmZnExsaSnp5e2aIYPIiAgACaNGmCv79/8YUNBkMBjCIrR2JjY6lZsybNmjXDYaVtg8EpSikSExOJjY2lefPmlS2OweCRmK7FciQ9PZ2QkBCjxAwuIyKEhIQYK95gKANGkZUzRokZSoq5ZgyGsmEUmcFg0CQcgIMrK1sKg6HEGEVmMBg0a16HL++rbCkMhhJjFJmhWMaPH8/ixYsLpB87doyRI0e6vf3+/fubVVQqgrNHIe00mIXEDR6GUWQGO9nZ2SUq36hRo0IVnMFDSYqFnCy4cK6yJTEYSoRRZFWI4cOH07lzZ8LCwvjwww8BqFGjBv/4xz+IiIhgw4YNzJs3j/DwcCIiIrjjjjvsx65Zs4arrrqKK664wq68YmJiaN++PQA9evRg165d9vI2Kyo1NZW77rqLbt260bFjR5YuXepUvuzsbJ544gnat29PeHg477zzToEy999/P126dCEsLIwXXnjBnj558mTatWtHeHg4TzzxBACLFi2iffv2RERE0Ldv3zKcuSpATjakHNPbaWcrVxaDoaQopcynFJ/OnTur/OzevbtA2sVEYmKiUkqp8+fPq7CwMJWQkKAAtWDBAqWUUjt37lQtW7ZU8fHxecqPGzdOjRw5UmVnZ6tdu3apFi1aKKWUOnTokAoLC1NKKfXmm2+q559/Ximl1LFjx1SrVq2UUko988wz6tNPP1VKKXXmzBnVsmVLde7cuULle++999Qtt9yiMjMz87Tfr18/tWXLljxpWVlZql+/fmr79u0qISFBtWrVSuXk5NjbUUqp9u3bq9jY2DxpFyuVfu0kxSn1Qi39Ob6jcmUxeDVAlCrn57GxyKoQ06dPJyIigh49enD06FH279+Pr68vt9xyCwArV65k1KhR1K9fH4B69erZjx0+fDg+Pj60a9eOkydPFqj71ltvtVtqCxcutI+d/fzzz7z66qtERkbSv39/0tPTOXLkSKHyLV++nPvuuw8/P78C7dtYuHAhnTp1omPHjuzatYvdu3dTu3ZtAgICuPvuu/nyyy8JCgoCoFevXowfP56PPvqoxN2mVY6kuNxtY5EZPAyzskcVYfXq1SxfvpwNGzYQFBRkVyoBAQH4+voWe3z16tXt26oQZ4DGjRsTEhLCjh07WLBgATNnzrSXXbJkCa1bty7zbzh06BDTpk1jy5Yt1K1bl/Hjx5Oeno6fnx+bN29mxYoVLF68mBkzZrBy5UpmzpzJpk2b+P777+ncuTPR0dGEhISUWQ6vJOlo7na6UWQGz8JYZFWEpKQk6tatS1BQEHv27GHjxo0FylxzzTUsWrSIxMREAE6fPl2iNkaPHs3rr79OUlIS4eHhAAwcOJB33nnHrvx+//13p8dfd911fPDBB2RlZRXafnJyMsHBwdSuXZuTJ0/yww8/AHDu3DmSkpK48cY
beeutt9i+fTsABw8epHv37rz00kuEhoZy9OhRDE5IdrTIzlSeHAZDKTCKrIowaNAgsrKyaNu2LZMnT6ZHjx4FyoSFhfF///d/9OvXj4iICB5//PEStTFy5Ei++OILbr31Vnvac889R2ZmJuHh4YSFhfHcc885Pf6ee+7hsssuszubfPbZZ3nyIyIi6NixI23atOG2226jV69eAKSkpDBkyBDCw8Pp3bs3b775JgBPPvkkHTp0oH379lx11VVERESU6PdUKZJiwcfqoDFdiwYPQwrrJjIUT5cuXVT+uU1//vknbdu2rSSJDJ5MpV87X4yFhP2QeAB6PwoDnq88WQxejYhEK6W6lGedxiIzGAzaIqvdBAJqG4vM4HF4rSITkcdEZJeI7BSRz0UkQESai8gmETkgIgtEpJpVtrq1f8DKb1a50ns3P/30E5GRkXk+I0aMqGyxqjbJcVqRBdY1Y2QGj8MrvRZFpDHwMNBOKZUmIguBvwE3Am8ppb4QkZnA3cD71vcZpdSVIvI34DVgdCWJ7/UMHDiQgQMHVrYYBhuZ6ZAabymyOsZr0eBxeK1FhlbSgSLiBwQBx4FrANuaSnOB4db2MGsfK3+AmNgahqqCzWOxdhMIqGO6Fg0eh1cqMqVUHDANOIJWYElANHBWKZVlFYsFGlvbjYGj1rFZVvkCE45EZKKIRIlIVHx8vHt/hMFQUSTF6u9ajY1FZvBIvFKRiUhdtJXVHGgEBAODylqvUupDpVQXpVSX0NDQslZnMFwcOFpkgXWNRWbwOLxSkQHXAoeUUvFKqUzgS6AXUMfqagRoAthmgcYBTQGs/NpAYsWKbDBUEo4WWYBlkeXkVK5MBkMJ8FZFdgToISJB1ljXAGA3sAqwBdAaB9iWYv/G2sfKX6m8YILdlClTmDZtWpnradasGQkJCeUgkWbmzJnMmzevXOoqb9mqJEmxEBwK/gG6a1HlwIWUypbKYHAZr/RaVEptEpHFwFYgC/gd+BD4HvhCRKZaabOsQ2YBn4rIAeA02sPR4CYmTZpU2SIYHEmO09YYaIsMdPdiQO3Kk8lgKAFeqcgAlFIvAC/kS/4L6FZI2XRgVHm2/+K3u9h9LLk8q6Rdo1q8cFNYkWVefvll5s6dS4MGDWjatCmdO3fm4MGDPPjgg8THxxMUFMRHH31EmzZt+Pbbb5k6dSoXLlwgJCSE+fPn07BhQxITExkzZgxxcXH07Nmz0EWCHRk+fDhHjx4lPT2dRx55hIkTJwI61tkjjzzCd999R2BgIEuXLqVhw4ZMmTKFGjVq8MQTT9C/f386duzI2rVrSU1NZd68efz73//mjz/+YPTo0UydOrXINgzlQFIshFyptwMtRZZ+Fri80kQyGEqCt3YtVkmio6P54osv2LZtG8uWLWPLli0ATJw4kXfeeYfo6GimTZvGAw88AEDv3r3ZuHEjv//+O3/72994/fXXAXjxxRfp3bs3u3btYsSIEU7DrtiYPXs20dHRREVFMX36dPuiw6mpqfTo0YPt27fTt29fPvroo0KPr1atGlFRUUyaNIlhw4bx7rvvsnPnTubMmWOvy1kbhjKiVO6qHqCdPcA4fBg8Cq+1yCqb4iwnd7B27VpGjBhhj8c1dOhQ0tPTWb9+PaNG5RqcGRkZAMTGxjJ69GiOHz/OhQsXaN68OaCjQX/55ZcADB48mLp16xbZ7vTp0/nqq68A7HHOQkJCqFatGkOGDAGgc+fO/PLLL4UeP3ToUAA6dOhAWFgYl156KQBXXHEFR48eJSQkxGkbhjKSngQXzuUqMnvXolndw+A5GEXm5eTk5FCnTh22bdtWIO+hhx7i8ccfZ+jQoaxevZopU6aUuH5ncc4A/P39sc0r9/X1tYdnyY8t1pmPj0+euGc+Pj5kZWUV2YahjNhc721jZHm6Fg0Gz8B0LXoRffv25euvvyYtLY2UlBS+/fZbgoKCaN68OYsWLQJ0oEtbvK6kpCQaN9YPsLlz5+apxxZC5Yc
ffuDMGedv567EOSsrFdFGlcXmel/AIjOKzOA5GEXmRXTq1InRo0cTERHBDTfcQNeuXQGYP38+s2bNIiIigrCwMJYu1bMOpkyZwqhRo+jcuTP169e31/PCCy+wZs0awsLC+PLLL7nsssuctulKnLOyUhFtVFnyK7JqwToumbHIDB6EiUdWSkw8MkN5UmnXzvIXYf10+Ocp8PHVaW9cCW0Gw01vV7w8Bq/HxCMzGAzlS3Ic1GyUq8TALBxs8DiMs4fBJRITExkwYECB9BUrVhjvQU8mKRZqN86bZhYONngYRpEZXCIkJKRQz0eDh5MUC03zrREQUEfHJzMYPATTtWgwVFVyciD5WK7rvQ1jkRk8DKPIDIaqSuopyMnM9Vi0EVjXTIg2eBRGkRkMVZX8rvc2AupAerIJ5WLwGIwiMxiqKs4UWWAdQEFGUoWLZDCUBqPIvJjyikdWEubMmcOxY8fs+/fccw+7d+8ucT22hY9LSv/+/ck/v68sfPPNN7z66qvlUld5y1ZmHANqOmJW9zB4GEaRGcqV/Irs448/pl27diWup7SKrLwZOnQokydPrmwx3ENyHPgH5654b8Ost2jwMIz7vbv4YTKc+KN867ykA9xQtHVQWDyybdu2MWnSJM6fP0+LFi2YPXs2devWdTkW2P/+9z+mT5/OhQsX6N69O++99x4Ad999N1FRUYgId911F02bNiUqKoqxY8cSGBjIhg0buOGGG5g2bRpdunThxx9/5NlnnyU7O5v69euzYsUKNm/ezCOPPEJ6ejqBgYF88sknNG/enOeff560tDTWrVvHM888w5AhQ3jooYfYuXMnmZmZTJkyhWHDhpGWlsaECRPYvn07bdq0IS0trcjzc//997NlyxbS0tIYOXIkL774IqAjTY8bN45vv/2WzMxMFi1aRJs2bZgzZw5RUVHMmDGD8ePHExgYyO+//86pU6eYPXs28+bNY8OGDXTv3p05c+YU2cZFR9JRPYfMWtjZjj2Ui3H4MHgGRpF5EY7xyLKysujUqROdO3fmzjvv5J133qFfv348//zzvPjii/z3v/8FcmOBvf322wwbNozo6Gjq1atHixYteOyxxzh16hQLFizgt99+w9/fnwceeID58+cTFhZGXFwcO3fuBODs2bPUqVOHGTNm2BWXI/Hx8dx7772sWbOG5s2bc/r0aQDatGnD2rVr8fPzY/ny5Tz77LMsWbKEl156ya5AAJ599lmuueYaZs+ezdmzZ+nWrRvXXnstH3zwAUFBQfz555/s2LGDTp06FXmOXn75ZerVq0d2djYDBgxgx44dhIeHA1C/fn22bt3Ke++9x7Rp0/j4448LHH/mzBk2bNjAN998w9ChQ/ntt9/4+OOP6dq1K9u2bSMyMrLINi4qkuIKjo+B6Vo0eBxGkbmLYiwnd1BYPLLU1FTOnj1Lv379ABg3blye2GTFxQJbt24d0dHR9gWI09LSaNCgATfddBN//fUXDz30EIMHD+b6668vUraNGzfSt29fe8yzevXqAXpl+3HjxrF//35EhMzMzEKP//nnn/nmm2/sY37p6ekcOXKENWvW8PDDDwMQHh5erMJYuHAhH374IVlZWRw/fpzdu3fbj7n55psBHTvNFo8tPzfddBMiQocOHWjYsCEdOnQAICwsjJiYGCIjI4ts46IiOQ4aFhI3z3QtGjwMo8iqOMXFAlNKMW7cOP79738XOHb79u389NNPzJw5k4ULFzJ79uwSt//cc89x9dVX89VXXxETE0P//v0LLaeUYsmSJbRu3brEbdg4dOgQ06ZNY8uWLdStW5fx48fniWtm+/1liZ1WXBsXDVkZcO4k1G5aMM9YZAYPwzh7eBGFxSMLDg6mbt26rF27FoBPP/3Ubp25woABA1i8eDGnTp0C4PTp0xw+fJiEhARycnK45ZZbmDp1Klu3bgWgZs2apKSkFKinR48erFmzhkOHDtnrgbwx0WxjTIXVM3DgQN555x1s0Rp+//1
3+2+2xU7buXMnO3bscPpbkpOTCQ4Opnbt2pw8eZIffvjB5fPgKhXRRrmQbDnk5F9nEcA/EHyrmzEyg8dgLDIvwjEeWYMGDezdgXPnzrU7e1xxxRV88sknLtfZrl07pk6dyvXXX09OTg7+/v68++67BAYGMmHCBHKsSbM2i238+PFMmjTJ7uxhIzQ0lA8//JCbb76ZnJwcGjRowC+//MJTTz3FuHHjmDp1KoMHD7aXv/rqq3n11VeJjIzkmWee4bnnnuPRRx8lPDycnJwcmjdvznfffcf999/PhAkTaNu2LW3btqVz585Of0tERAQdO3akTZs2NG3alF69epXo/LpCRbRRLjibQwba+cMsU2XwIEw8slJi4pEZypMKv3a2fwFf3Qd/j4b6VxbMn9ENQlvD6E8rTiZDlcDEIzMYDOVD0lH9XVjXIhiLzOBRmK5Fg1fSvXt3MjIy8qR9+umndi/DKk9SHASF6PGwwgioAynHK1Ymg6GUGEVm8Eo2bdpU2SJc3CTFFj4+ZiOwLpz6s+LkMRjKgOlaNBiqIslxUKsoRWa6Fg2eg1FkBkNVpDiLLKAOZCRDTnbFyWQwlBKjyAyGqkZ6klZSzhw9wGF1DxPKxXDxYxSZwVDVSIrT38VZZGAmRRs8AqPIvIwaNWqU+thXXnnFvh0TE0P79u3LQySXcBY77dixY4wcObJUdeYPKWOwSLYUWZFjZLYV8M04meHixygygx1HRVZWnK1VWFIaNWrE4sWLS3WsUWROsM8hK8bZAyDdWGSGix/jfu8mXtv8GntO7ynXOtvUa8PT3Z52ufwbb7zBwoULycjIYMSIEfa4WMOHD+fo0aOkp6fzyCOPMHHiRCZPnkxaWhqRkZGEhYXx8ssvk52dzb333sv69etp3LgxS5cuJTAwkIMHD/Lggw8SHx9PUFAQH330EW3atGH8+PEEBATw+++/06tXL958880CMp0+fZq77rqLv/76i6CgID788EP7yvDbt2+nZ8+eJCQk8NRTT3HvvfcSExPDkCFD2LlzJ9nZ2UyePJnVq1eTkZHBgw8+yH333QfAa6+9xv/+9z98fHy44YYb6NKlS4HYaIGBTuZMVTWS4kB8oeYlzsuYhS2or0YAACAASURBVIMNHoRRZF7Kzz//zP79+9m8eTNKKYYOHcqaNWvo27cvs2fPpl69eqSlpdG1a1duueUWXn31VWbMmMG2bdsA3bW4f/9+Pv/8cz766CNuvfVWlixZwu23387EiROZOXMmLVu2ZNOmTTzwwAOsXLkSgNjYWNavX4+vr2+hcr3wwgt07NiRr7/+mpUrV3LnnXfa29yxYwcbN24kNTWVjh075ll7EWDWrFnUrl2bLVu2kJGRQa9evbj++uvZs2cPS5cuZdOmTQQFBXH69Gnq1avnNDZalSflBNRoAD6F/0eACeVi8CiMInMTJbGc3MHPP//Mzz//TMeOHQE4d+4c+/fvp2/fvkyfPp2vvvoKgKNHj7J//35CQkIK1NG8eXMiIyMBHaMrJiaGc+fOsX79+jwxzRxX0Bg1apRTJQawbt06lixZAsA111xDYmIiycnJAAwbNozAwEACAwO5+uqr2bx5s71922/asWOHvasxKSmJ/fv3s3z5ciZMmGCPw2aLdWZwwvkECA4tuoxx9jB4EEaReSlKKZ555hl715uN1atXs3z5cjZs2EBQUBD9+/d3Gi/LMd6Wr68vaWlp5OTkUKdOHbsVlZ/g4OBSyywiRe4rpXjnnXcYOHBgnvSffvqp1G1WSVLjIbh+0WX8A8Av0HQtGjwC4+zhpQwcOJDZs2dz7tw5AOLi4jh16hRJSUnUrVuXoKAg9uzZw8aNG+3H+Pv7O43QbKNWrVo0b96cRYsWAVq5bN++3WW5+vTpw/z58wGtVOvXr0+tWrUAWLp0Kenp6SQmJrJ69Wp7GBrH3/T+++/bZdy3bx+pqalcd911fPLJJ5w/fx7IjXXmLDZalSc1vniLDMzqHgaPwSg
yL+X666/ntttuo2fPnnTo0IGRI0eSkpLCoEGDyMrKom3btkyePJkePXrYj5k4cSLh4eGMHTu2yLrnz5/PrFmziIiIICwsjKVLl7os15QpU4iOjiY8PJzJkyczd+5ce154eDhXX301PXr04LnnnqNRo0ZArmV2zz330K5dOzp16kT79u257777yMrKYtCgQQwdOpQuXboQGRlpd+O3xUaLjIwkLS3NZRm9ntRE1xRZQB1jkRk8Aq+NRyYidYCPgfaAAu4C9gILgGZADHCrUuqM6Cfl28CNwHlgvFJqa1H1m3hkFUN0dDSPP/44v/76a2WL4lYq7Nq5kAqvNIJrp0Dvx4ouO3sQ+PjB+O/cL5ehymDikZWMt4EflVJtgAjgT2AysEIp1RJYYe0D3AC0tD4TgfcrXlxDfqKiohgzZgyPPPJIZYviPaQm6O+gYsbIQE+KNs4eBg/AK509RKQ20BcYD6CUugBcEJFhQH+r2FxgNfA0MAyYp7R5ulFE6ojIpUopE5CplHzyySe8/fbbedJ69erFu+++63IdXbp0Yd++feUtWtXGpshc7lrc4V55DIZywCsVGdAciAc+EZEIIBp4BGjooJxOAA2t7cbAUYfjY620PIpMRCaiLTYuu+wytwnvDUyYMIEJEyZUthiG/JwvgSIzzh4GD8Fbuxb9gE7A+0qpjkAqud2IAFjWV4kGCJVSHyqluiiluoSGuvAgMBguNlLj9XdwwXmDBQioAxfOQXbRnqwGQ2XjrYosFohVStnCBC9GK7aTInIpgPV9ysqPA5o6HN/ESjMYvAu7InPRIgMTysVw0eOVikwpdQI4KiKtraQBwG7gG2CclTYOsPmNfwPcKZoeQJIZHzN4JakJ4B8E1VyYuG5fAd84fBgubrx1jAzgIWC+iFQD/gImoBX3QhG5GzgM3GqVXYZ2vT+Adr83gzsG7yQ1wTWPRTALBxs8Bq+0yACUUtus8axwpdRwpdQZpVSiUmqAUqqlUupapdRpq6xSSj2olGqhlOqglIoqrv6qRFRUFA8//HCltd+sWTMSEhIKpM+cOZN58+aVuL6zZ8/y3nvvlYdonocry1PZMAsHGzwEb7bIKpUTr7xCxp/lG8alets2XPLss2WuJzs7u8iFffPTpUuXi3IF+UmTJpXqOJsie+CBB8pZIg/gfALUKCJ8iyPGIjN4CF5rkVVVYmJiaNOmDWPHjqVt27aMHDmS8+fP06xZM55++mk6derEokWL+Pnnn+nZsyedOnVi1KhR9jUZt2zZwlVXXUVERATdunUjJSWF1atXM2TIEECvYzh8+HDCw8Pp0aMHO3boeUb5Izy3b9+emJgYUlNTGTx4MBEREbRv354FCxY4lX3FihV07NiRDh06cNddd+VZVf/111+nQ4cOdOvWjQMHDhRo8+DBgwwaNIjOnTvTp08f9uzRLxEnT55kxIgRREREEBERwfr165k8eTIHDx4kMjKSJ598shzPvgeQmlByi8yMkRkucoxF5ibKw3IqLXv37mXWrFn06tWLu+66y96NFhISwtatW0lISODmm29m+fLlBAcH89prr/Hmm28yefJkRo8ezYIFC+jatSvJyckFglEWFU+sMH788UcaNWrE999/D+jQK4WRnp7O+PHjWbFiBa1ateLOO+/k/fff59FHHwWgdu3a/PHHH8ybN49HH32U777Lu2ySsxhpDz/8MP369eOrr74iOzubc+fO8eqrr7Jz584i5fZKlCpZ12KA6Vo0eAbGIvNCmjZtSq9evQC4/fbbWbduHQCjR48GYOPGjezevZtevXoRGRnJ3LlzOXz4MHv37uXSSy+1rzpfq1Yt/PzyvuusW7eOO+64AygYT6wwOnTowC+//MLTTz/N2rVrqV27dqHl9u7dS/PmzWnVqhUA48aNY82aNfb8MWPG2L83bNiQ51jHGGmRkZHcd999HD+unU5XrlzJ/fffD+hQNM7arxJkJEP2Bddc7wH8qoF/sOlaNFz0GIv
MC3EW18sWK0wpxXXXXcfnn3+ep9wff/xR6jb9/PzIycmx79tinLVq1YqtW7eybNky/vnPfzJgwACef/75Etfv+Jvy/77iYqQZLEqyPJUNs7qHwQMwFpkXcuTIEbvV8tlnn9G7d+88+T169OC3336zjzWlpqayb98+WrduzfHjx9myZQsAKSkpZGVl5TnWWTyxZs2asXWrDhiwdetWDh06BMCxY8cICgri9ttv58knn7SXyU/r1q2JiYmxy/Tpp5/Sr18/e75tbG3BggX07Nkzz7FFxUgbMGAA77+v14DOzs4mKSmp6sYpK8mCwTZMKBeDB2AUmRfSunVr3n33Xdq2bcuZM2fsXWs2QkNDmTNnDmPGjCE8PJyePXuyZ88eqlWrxoIFC3jooYeIiIjguuuuKxA92lk8sVtuuYXTp08TFhbGjBkz7F2Ef/zxB926dSMyMpIXX3yRf/7zn4XKHBAQwCeffMKoUaPo0KEDPj4+ebwSz5w5Q3h4OG+//TZvvfWWPd1mnTmLkfb222+zatUqOnToQOfOndm9ezchISH06tWL9u3bVy1nD/uqHiVQZIF1jLOH4aLHa+ORuZuLNR5ZTEwMQ4YMYefOnZUqR0Xw0EMP0alTJ69YnLhCrp3oOfDtI/DYbqjd2LVjvhgLp/+CBzYUX9ZgcAETj8xgsHjuuefYtGkTQ4cOrWxRPIfSWGSma9HgARhF5mU0a9bsorfGRowYQWRkZJ7PTz/9VKI6/vWvf7F582ZCQlxYxd2gSU2A6rXAr7rrxxhnD4MHYLwWDRXOV199VdkiVE1KMhnaRkAdyDwPWRklU4AGQwViLDKDoaqQGl8yj0VwWN3DWGWGixejyAyGqkJqQsnmkEFuKBfTvWi4iDGKzGCoKpRkeSobZuFggwdgFJnBUBXIyYHziSVXZCaUi8EDMIqsCnHjjTdy9qx+INWoUaPQMuPHj2fx4sXl1mZ5xjIrb9mqFOlnQWWXvGsxwKyAb7j4MV6LbmLtwn0kHD1XrnXWb1qDPre2KvXxy5YtK0dpXONijWVW5bDPISvlGJnpWjRcxBiLzIt44403mD59OgCPPfYY11xzDaBXgB87dmyhkZaVUvz973+ndevWXHvttZw6darINl566SW6du1K+/btmThxIraVYfr378/TTz9Nt27daNWqFWvXrgXIE8tsypQpjBs3jj59+nD55Zfz5Zdf8tRTT9GhQwcGDRpEZmZmkW0YyoB9weCSjpFZ0QJM16LhIsZYZG6iLJZTqdvs04f//Oc/PPzww0RFRZGRkUFmZiZr166lb9++/PbbbwWO+eqrr9i7dy+7d+/m5MmTtGvXjrvuustpG3//+9/tq9ffcccdfPfdd9x0000AZGVlsXnzZpYtW8aLL77I8uXLCxx/8OBBVq1axe7du+nZsydLlizh9ddfZ8SIEXz//fcMHz68yDYMpcRmkZXU/d7XD6rVNBaZ4aLGWGReROfOnYmOjiY5OZnq1avTs2dPoqKiWLt2LX369Cn0mDVr1jBmzBh8fX1p1KiR3YpzxqpVq+jevTsdOnRg5cqV7Nq1y55388032+WIiYkp9PgbbrgBf39/OnToQHZ2NoMGDQJ03DLbMUW1YSglpe1aBLO6h+Gix1hkXoS/vz/Nmzdnzpw5XHXVVYSHh7Nq1SoOHDhQLgvSpqen88ADDxAVFUXTpk2ZMmVKntXxq1fXKz/4+voWCP+Sv4yPjw/+/v721et9fHzIysoqtg1DKbGHcCnFkl4BJVwB/69fITMNWg8qeVsGQykwFpmX0adPH6ZNm0bfvn3p06cPM2fOpGPHjgWCUdro27cvCxYsIDs7m+PHj7Nq1SqnddsUSv369Tl37pxbPAgroo0qyfkE7bjhW4p318ASLhz8/eOwfErJ2zEYSomxyLyMPn368PLLL9OzZ0+Cg4MJCAhw2q0IegHflStX0q5dOy677LICQSsdqVOnDvfeey/t27fnkksuoWvXruUuf0W0USV
JjS9dtyJoRRa/z7Wy8fsg8UDJx+IMhjJg4pGVkos1HpnBM3H7tfPJjfp7QimmYKx6Bda8AY/vgZoNiy677i3LGhN4PhF8fEvensGrqbR4ZCJyiYh8ISIHRSRaRJaJSLm55YlIfxG5qrzqMxgM+UhNKN34GEDYCFA5sHtp8WX3fG9tKDh/unTtGQwlpFhFJnpw5StgtVKqhVKqM/AMUMyrWYnoDxhFdhFRHjHDDBcRZelabNAWGrSDnUuKLpdyEmKjINSyLM8nFF3eYCgnXBkjuxrIVErNtCUopbaL5g3gBkABU5VSC0SkP/CEUmoIgIjMAKKUUnNEJAaYC9wE+AOjgHRgEpAtIrcDDyml1pbbLzSUChMzzIvIzoK006VXZADtb4aVUyEpFmo3KbzMvh8ABZ3Hw49PWy7/pqvd4H5c6VpsD0QXkn4zEAlEANcCb4jIpS7Ul6CU6gS8j1Z4McBM4C2lVKRRYgZDOZNmdfGVdFUPR8L0HEF2FfGCs2cZ1LkcmvfV+7a5awaDmymL+31v4HOlVLZS6iTwK+CKi9mX1nc00KwM7RsMBlewT4YugyILaQGXRjrvXsw4B3+thjaDcy2/1MTSt2cwlABXFNkuoHMJ6szKV29AvvwM6zsb4/5vMLifsqzq4Uj7W+DY75B4sGDewRWQnaEVWVA9QIxFZqgwXFFkK4HqIjLRliAi4cBZYLSI+IpIKNAX2AwcBtqJSHURqQMMcKGNFKBmiaU3uJXCFhk2eCD2BYPLqMjCRujvwroX9yzTE66b9tAu90EhRpEZKoxiFZnSE81GANda7ve7gH8DnwE7gO1oZfeUUuqEUuoosBDYaX3/7oIc3wIjRGSbiDifvWswGEqOfXmqMk5SrtMUmnaHnV/mTc/OhH0/QqtBuSuHBNc3XouGCsOlrj2l1DHg1kKynrQ++cs/BTxVSHozh+0otNs9Sql9QLgrsngKq+Z8yKnDf5VrnQ0uv4Krx090mv/GG29QvXp1Hn74YR577DG2b9/OypUrWblyJbNmzWLIkCG88sorKKUYPHgwr732GgCff/55oekGLyE1HsQnN7ZYWWh/C/zwFJzaAw3a6LQjG/Siwm0G55YLDs1VoAaDmzFrLXoRffr0sccBi4qK4ty5c/YwLq1ateLpp59m5cqVbNu2jS1btvD1119z7NixQtMNXkRqvLbGfMrhdm83DBDY5WCV7VkGfgHQwiFyQnB907VoqDCMs4WbKMpychf5w7h06tTJHsblpptuon///oSG6nGSsWPHsmbNGkSk0PThw4dXuPwGN3E+sWwei47UvASa9dbei/2f0Wl7v4cr+kO14NxyQfWNRWaoMIxF5kXkD+PSp08fexiXZs2aVbZ4VZucbLhwXodDSTkOp2Mgfo/+5OS4t+3U+PJTZKC7FxMPwIkdcHInnD0CrW/MWyY4VHc3Zl0ov3YNBicYReZlOAvj0q1bN3799VcSEhLIzs7m888/p1+/fk7TDeVIRop+6CfshTMxkHICMlNBKR23Kzuj2CrKRFmWpyqMtkPBx087fexZBgi0viFvGZviPG/mkhncj1crMmtqwO8i8p2131xENonIARFZICLVrPTq1v4BK79ZZcpdFvr06cPx48fp2bMnDRs2tIdxufTSS3n11Ve5+uqriYiIoHPnzgwbNsxpuqEcuZCqv+tcDqFt4JIIaBgGdS7T6dmZ7m0/NbF8w6oEh+iuxJ1fwp7voGk3qNEgXxnbpGgzTmZwP94+RvYI8CdQy9p/Db0U1hciMhO4G71U1t3AGaXUlSLyN6vc6MoQuKwMGDCAzMzcB+O+fblxpMaMGcOYMWMKHOMsPSYmxi0yVjlyMkF8rYnCDvj66+9sN3a/ZWVARlL5WmSguxe/vh+SjsC1LxbMt1tkZpzM4H681iITkSbAYOBja1+AawBbyOG5gM2jYZi1j5U/QJyFVDYYSkp2Zq7ScsTHHxD3KjL7ZOhyDnTZZjD4VrO2hxTMt1tkRpE
Z3I/XKjLgv+i5bLaR9BDgrFIqy9qPBRpb242BowBWfpJVPg8iMlFEokQkKj7edJkYXMSZIhPR6W5VZOWwzmJhBNSGtjfBJeFQ/8qC+bb2TNeioQLwSkUmIkOAU0qpwlbtLzVKqQ+VUl2UUl1s7uqFlCnPJg3eQPaFXOslP77VUGX17Mu6oMOrFMb5clqeqjCGvQcTfig8L6COdggxFpmhAvBKRQb0AoZa8c++QHcpvg3UERHbuGATIM7ajgOaAlj5tYESu1sFBASQmJholJkhF5UDOVlWN2Ih2T5+JCadIyAg/9raLpKdBZ+Ngne6wLlTBfPLa53FwvAPgOo1Cs8TseaSGYvM4H680tlDKfUMOoo1DoE+x4rIImAkWrmNA2yx27+x9jdY+StVKbRRkyZNiI2NxXQ7GuzkZEHyKQjKgmpnC+annSUgfhtN+k8oXf2/PK/DpwBEz4F++VaGc1fXoiuYZaoMFYRXKrIieBr4QkSmohcznmWlzwI+FZEDwGngb6Wp3DYh2WCwc2QTLLoVxi6GloWE64uaDRsmQ89hUL1Ryere9hlsfBe63w+J+2HLLOj9WN7xuNQEbQ1Wr+W8HncRHGK8Fg0VgtcrMqXUamC1tf0X0K2QMunAqAoVzFA1SLZ6r2s5UVK1m+rvpFjnZQojNgq+fVRHY75+KhxcqbsYdy+FDiNzy6UmaMuoMpxwg0P1BHCDwc146xiZwXBxkHxMfztVZE30d9LREtR5HL4Yq9c9HDVXh0658lqodwVs/jBv2fJenqokmK5FQwVhFJnB4E5SjoN/kPbiK4xa1gwQZ16H+clMhwW362WvxnyeO8naxwe63gtHN8GxbbnlK1WR1YcL5/QyXAaDGzGKzGBwJ8lxUPNS5117AbX0nCxXFJlS8N1jEBcFN3+gl7lypONY8A/Oa5WdT3CPx6Ir2JbFMlaZwc0YRWYwuJPkY8WPfdVu6poi27kEtn+mw6e0valgfkBtiBwDfyzOVR6plajIzHqLhgrCKDKDwZ0kH8vtPnRG7SaujZEdWqOjPPctEHw9l24T9Wr60XP0YsWZ5yGowCI1FYNZpspQQRhFZjC4i5wcPUZWrEXWxDWLLGEfhLYtOtJzaGu9Mn3UbB0uBirRIrMUqHHBN7gZo8gMBneRGq8nRLuiyNLOQMY552WU0kE4Q1sV3263+/TYXPQcvW+6Fg1ejlFkBoO7KG4OmQ3bXDJb+cJITdDKrn7r4tttNVDHOtv8kd6vLEVWrQb4BRhFZnA7RpEZDO6iuDlkNlyZS5awV3+HuqDIfHz1WFmW5fYeXEljZCLWXDITJdrgXowiMxjchV2RueDsAUWPk8Xv0d+uKDKAjrfr+WtQeRYZaEcTY5EZ3IxRZAaDu0g5ptc5DCpmQnKNS3QE6SIV2T7dVVecUrQRWBcix0JwA6gW7LrM5U1wqFFkBrdjFJnB4C6Sj+nJ0EV5GYJeYqpWo+ItsvqtSrZm4sBXYNI618u7g+BQOG+6Fg3uxSgyg8FduDIZ2kZxLvgJ+1zvVrThVw1qNizZMeVNsNW1aGL0GdyIUWQGg7tIjiuhInPi7JGepOej1XfB9f5iIzgUstL1mosGg5swisxgcAdK6VXqS6TI4vQk6vzE79PfoW3KT76KwqzuYagAjCIzGNxB2hnt/l4SRZaTCamnCuaVxPX+YsMsHGyoAIwiMxjcgatzyGw4BtjMT/we8K0OdS4vH9kqElsIGeO5aHAjRpEZDO7A1TlkNoqaFB2/D0Ku1N6NnoZZpspQARhFZjC4A1eXp7JR1KTohL2urbF4MWKzyMzCwQY3YhSZweAOUo4DAjVcdH8PqA3VaxVUZJlpcOawZzp6APgH6oncZozM4EaMIjMY3EFynFZivv6uH1PYXLKE/YDyTNd7G8H1Tdeiwa0YRWYwuIOSTIa2UdhcsgSb670HeizaCA41FpnBrRhFZjC4AwdFduxsGlO+2cXOuKSijynMIovfA+KjnT08laD6RpE
Z3IpRZAaDO3CYDP31tjjmrI9hyDvruGdulHOFVruJXpfwwvnctPi9ULc5+FWvAKHdhOlaNLgZo8gMhvImIwUykuyKbM/xFC6pFcDj17Vi86FE5wqtsACb8Xs9u1sRrIWDE8x6iwa3YRSZwVDeJB/X39Ycsj0nkmnfuBYPD2jJusnX5FFoD3/+O9k51gM+/1yy7Ew4fdA7FFlOFqSfrWxJDF6KUWQGQ3njMIcsIyubg/GptLmklk4K8LcrtLHdL+Ob7cc4cMpaUDf/XLLTh7QCqO/piswsU2VwL0aRGQzlTYplkdW8lAOnzpGdo2hzac08RWoF+DO2u15yat/JFHt5xCdXkZU0KvTFSmkV2fIpsOrf5S6OwfvwwDVvDIaLHAeLbM8OHVTSZpE5ckVoMD7ioMh8/bUysyky22LBnjyHDEq3TFXWBdj0AWSeh6Zd4cpr3SObwSswFpnBUN4kH4PAeuAfyJ4TyVT386FZSFCBYgH+vjQLCc5VZJB3Lln8PqjVBKrXqCDB3URQKRYOPrZVKzH/IFj6dx1NwGBwglFkBkN5k3zMwdEjhVYNa+LnW/it1rJhDfafdAg66TiXLH6P53crAgSF6O+SdC0eWgMI/O0zrQCXPeUW0QzegVFkBkN54xAZ+s/jKbS5pKbToq0b1iQmMZX0zGydYA+wma2Xp/IGReZXDQLqlGzh4ENr4JIO0OJq6PsU/LEQdn3tPhkNHo1RZAZDeZN8HGpdSsK5DBLOZdDm0oLjYzZaNqxJjoKD8TbPxaaQnQFxW3VgTk8fH7MRHOp612JmOhzdDM376v0+j0OjjvDdY5By0n0yGjwWo8gMhtKQmlh4ema6tjxqNWbvCT321bYIi6xVQ51n7160ueAfXKG/PXXV+/wEl2CZqtjNWpk366P3ff1hhOX48e0jZmK1oQBGkRkMJWXVKzDtSmscJx821/tajfjzeDIArYtQZM3rB+PnI7kOHzZFdsCmyLygaxFKpsgOrdXTEC7vmZsW2hoGvAD7foDf/+ceGQ0ei1FkBkNJ2LEIfn1NWwW/vFDQOrBHhm7EnhMphNasTkgN5+skVvPzoXn9YPblt8jiorS3X1A9N/yISqAkXYuH1uiuxIDaedO7T9JW2o/P6BhtBoOFUWQGg6sc3QxLH4TLe8Pg/2gX8T+/zVvGPhm6EXtOJBfp6GGjVcOauRZZQB0diFLleE+3ImilfD5RO7EUxYVUiIvO7VZ0xMcHhr+nt1dOLX8ZDR6LVyoyEWkqIqtEZLeI7BKRR6z0eiLyi4jst77rWukiItNF5ICI7BCRTpX7CwwXHWePwBe3aW/E0Z9Cp3HaEWPlvyA7K7ecNRk6q8Yl7Dt5jrZFOHrYaNmwBkfPnCftQjaI5FploV7i6AHWpGhV/HywIxshJxOaF6LIAOpcBq0GQsxaM1ZmsOOVigzIAv6hlGoH9AAeFJF2wGRghVKqJbDC2ge4AWhpfSYC71e8yIaLlowU+OxverWJ2xbq7j5fP7jmOR34cscXuWWTj0G1msSc8+VCVo5LFlnrhjVRioJrLnqTRRbs4qToQ2vAxw8u6+m8zGU9tOWbPwhpZZGelPdlxlDheKUiU0odV0pttbZTgD+BxsAwYK5VbC4w3NoeBsxTmo1AHRG5tILFNlyM5GTDknv05ORb5+S1ktreBI066fUAM9N1mjWH7M/juquwsKWp8tPS8lws4PBxkbjer9p7ig9+PYgqiwXkqiKLWQuNu0C1YOdlmnbT30c3l16e8iIjBd7qAG+102OmCQcqW6IqiVcqMkdEpBnQEdgENFRKWYMYnAAaWtuNAcfXu1grLX9dE0UkSkSi4uNNoMAqwS/Pw74f4YbXoMU1efNE4NopkBwLUbN0mjWHbM+JZPx8hBYNinggWzQLCaKar0+uIqtzmf6+CCyy9Mxsnlq8g3//sIdPN5bBwcK+3mIRnovpSXDsd+fdijYahOlxxCMbSy9PefHXah17rm4zWP8OzOgMswd
pz8qMc8UdbSgnvFqRiUgNYAnwqFIq2TFP6dfLEr1iKqU+VEp1UUp1CQ0N0YCw2gAAIABJREFULUdJDRclhzfAhhnQ9V7odm/hZa7oB1dcDWumQXqyfXmqPcdTaBFag+p+vsU24+frwxWhDmsudhoPt86DWpXfKbA4Opb4lAxaN6zJS9/uJirmdOkqckWRHd6gnVxsE6Gd4esHjTvD0U2lk6U82f8zVK8F47+Hx3fDtS9qq3Ppg/Cf1hDzW2VLWCXwWkUmIv5oJTZfKfWllXzS1mVofZ+y0uOApg6HN7HSDFWZmLWAwIDnii434HlIOw2//RfOnbC73ucP3VIU2nPReoMPDoF2w0ovdzmRlZ3DB2sOEtG0Dgsn9aRJ3UDun7+VU8npJa8ssK6eG1ZU12LMWvCtDk26FV9f0+5wcqfu2qsslIL9v+hltHz9oeYl0PtR+HsU3PUT+FWHqNmVJ18VwisVmYgIMAv4Uyn1pkPWN8A4a3scsNQh/U7Le7EHkOTQBWmoqsRG6XGq/POZ8tO4E7QbDr9NB5VDWmBD4s6muTQ+ZqNVwxrEnU0jNePicRr4/o/jHD2dxgP9W1A70J8P7ujCufQsHpi/lQtZOSWrzMdXRwQoSpEdWqPHv/wDiq/vsu7aeouLLpkc5cnJndrppOX1edNFtEPKldfBwZXFTzkwlBmvVGRAL+AO4BoR2WZ9bgReBa4Tkf3AtdY+wDLgL+AA8BHwQCXIbLiYUEpPSm7SxbXy1/xTP1iBo5l1AEpkkdkcPvafcv+4ypdbY5n0aXTuQsWFkJOjeG/VQVo2qMF1bfVQcutLavL6yHCiDp/h5e93l7zh4FDnCwefPw0n/ii+W9FGk66AVK7Dx/6f9bezWGktr9OW+rHfK06mKopXKjKl1DqllCilwpVSkdZnmVIqUSk1QCnVUil1rVLqtFVeKaUeVEq1UEp1UEpFVfZvMFQyZ2L0BN7GnV0rX78ldLwdgL1pWim1LYFF1trmuXii5F1lSimyc1wb7k3PzOaVZXv4cdcJXvx2l9NyK/ecYu/JFCb1a4GPj9jTb4poxD29mzN3w2GWRMfmOSYnR7HnRDLzNx1m3f5CFFbtxtpBI2F/wbzDvwGq8InQhRFQGxq0q1yHj/2/wKURukuxMK64GhA4sLxCxaqKmAjRBkNh2LqsXLXIQHswXtKB9UcbUSfoBA1rOV+aKj9N6wVR3c8nb5DNfKRnZvPNtmPEnjlP3Nl0jielcexsGseS0qkb5M8vj/ejVoB/ke18s/0YCecy6NOyPp9vPkrny+sxsnOTPGWUUry3+gCN6wQyNLJRgTom39CGnceSeParP/D1EWLPnCfq8BmiD58hJV13jQb6+7L6yf40rOXQTXjdSzBvGHxyA9y5FBqG5eYdWqODaLr64gC6G3LnEsjJ0at+VCRpZ7SzSZ9/OC8THKK7nQ8sh/6TnZczlBmvtMgMhjITGwV+gdrV21WC6kG3e9lzUscg00O1ruHrI1zZoAb7iuha/Nd3u3lqyQ5mrDrA+oMJpGdm075xbUZ3acrJ5AxmrT1UZBtKKWatPUSbS2ryyfiu9LwihH9+/Qd7TuRx6GXTodNsPXKW+/pdgX8hAUH9fH2YcVsn6gZV49EF25j28z7izqQxJPxSpo2K4PN7e5Cdo3jjp715D2wYBuOX6QnPcwbn7XI7tFY7cPhVK/5k2WjaHTKSIf5P148pLw6u1F3J+cfH8nPldfpaOl9Kb0+DSxiLzGAojLgoaBSpXb1LQE6OYu+JFG7t0rT4wvlo1bAmG/8qPDxM0vlMvtwax80dG/PayPACCiY+JYOP1/7FuKuaUS+4cGWwZn8Ce0+mMG1UBH6+Prw9JpIh0/+/vfsOj+uqEz7+PffeqZpRt1UsybbcC+6JYzuk2imkERKypLAhhWSXpSSwy7LP8r6EZeGF3Sw1SwmBkE4KBAJJCMQpONhJXGI7rrHlIsmSLMmqo+n3nvePM7JlW7KKR5Y1Ph8
/eu6UO3fO1cj3N6f9zlv84+MbeOGzywimanM/fqOKwoD7hOdQGPDwzN1LqGoOMb88l1z/0e9527IJPLhqD59aOoHZ43oMlhkzFW57CR65Bh65Gm5+DvInqmA054aB/JqOqFistjXvHF27OxV2/UWNxOyvBjl5Obz5bRX4PnT9qSnbGUjXyDTtWMk41G8eXDNXSk1rmHDcZsYgBnp0m1oUpL49Snskcdxzz66vIZKwuf3cib3Wkr50yVTCCZufvlnV5/EfWrWHsUEPV89VzYVjg15+dON8qlvCfOU37yOlZMuBdv76QRO3LZuI13XiOXAVBX4unDb2uCAG8E8XTSbP7+Y//rjt+Iwg+ZVw+8sq28dj18Kb/6UeH+hAj255E9UAkupTPJ/McVQgm7xcjcY8kXELVMDT/WTDSgcyTTvWwffVwo6D6R9L6U5NNW0QAz26TS0KALC78eh+MtuRPLJmH2dNyDu6dtPDlKIg184bxyOr93Gwl3le2+s7WLWrmVuXTsBtHflvv7iygC9fOo0X36/nV6v38ZM3qgh6LD65ZPygy99TttfFvSum8u7eFl7Z2nD8DjllcNvLkFsOa38O7iCUzBvcmwihmhdP9cTouvfU6Mv+mhVBBbpJF6n15ZxBTlnQBkwHMk07Vm1qoMe4wQeyHQ0dCHEkKA3G1MM5F4/uJ3t9RyM1LRFuXTrhhK+/Z/lUbEfywGvH5/t7aNVefC6TmxdXHPfcXedVsmJmEd98cTsvbannliXj+x00MhA3nlXO1KIA33ppB7FkL0P9g8Wqz6z8HJh97aCbcQEVyFr3Qqix/33TZdefAdH3sPtjTV4BXY3QsHlYi3Um04FM0451YB0Eio4k7x2EHfWdTCjIwu8e/EV5XK4Pn8s8buTiI2v2UZzt5dJZfQzzTqko8HPDWeU89W41NS3hw483dkR5YdMBblhU1mszoBCC+z8+l5JcL27T4PZlEwdd9t5YpsG/XzGT6pYwj67uI09jVgHc8Qpc9cOhvUnFOWp7Kmtlu/6s5rENdNHTyRerrW5eHDY6kGnasWrXqQvVIEYddhvoYpq9MQzB1KLAUYFsd2OIVbuauXlxRa99Y8f6/EVTMAzB9189MlfrkTX7SDqS204QoHJ8Lp69eynP/cNSxgQHPm2gP+dPHcMF08bww9d2cSgU63vHIfyuATWPy3SnN5DtXgkte3p/LtSoFlQdSLNit8BYVc7dK9NTPu04OpBpWk/hFmipGtJAj65Ykv0t4UGlpjrWlJ45F4FH1+zDbRrc2EuTYG+Kc7z8/Tnjef69WnY3dhKOJ3n87WoumVnEhMITZ+IvzvHyobJ+0nENwVevmEE4bh8VXNPG8kDp/PQN+Gg/AI9fBw+tgIO9ZC/pDkZTVgzuuJOXq2AbaTv5MmrH0YFM03o6sEFthzDQ44ODnUg5uNRUx5paFKCpM0ZbOE5HNMFz62u5cm4JhYGB15L+8YJJ+Fwm3/vLLp5bX0t7JMGnP1w55DKdrMljg9y8uIIn361m1wkmfA9Z+WKo33hkTbiTsflpQKoa4iNXHR/Mdv1ZNTsXzxnccSevAGnD3jdPvozacXQg07SeDqwDhPqWPwgJ2+G7f/kAyxDMLcsd8tv3HPDx3LpawnGbT/UzyONYBQEPd5w7kRffr+eHK3cxrzyXhePzhlymdLhn+VT8bpP/fHEYJi+XLwY7roLZyZASNj2lBp/c/orKaN8zmNlJqFqpgtJgM4mUnQWeHDVsX0s7Hcg0rafadTB2BngGXquSUvJ/freFVbua+da1H6I4ZwDZ2/vQHch2NHTw6Jp9zK/IZc4QAuOd51WS43PRHIrz6Q9XDirLyHDIz3Lz+Yum8OYHTazaleZFact7TIw+GQc2QPMHMO8mKJik1hjrGcxq16rFPwfbrAhqRGbl+app8mRW2tZ6pQOZpnWTUuVYHGT/2I/fqOLXa2v43EWTueGswWf06Kkkx0vQY/HI6n3sOxQedG2sW7bXxb9
eNp2lkwq4dFZR/y84Bf5+6XjK8nx866UdA05yPCCBMWqS9cn2k218AiwvzPqoun9sMHv7xyBMtf7YUExZAZ110DgCKbUynA5kmtatZY9admMQ/WO/33iA/35lJ9fMK+WLK6aedBGEEEwuClDV1MWYoIfLZw99leibFlfw5KfPwRrAaMdTwWOZfPmy6Wyv7+D599K8bm33xOih1naSMZWAePqVR68/1zOYbX8BKpb0vz5dXyZ1D8PXzYvpdnr8hWva6aA2tXrPACdCv7u3hX95djNnT8znv66fk7bmu+4lXW5eXHFUFo5McNWcEuaW53L/KzuJxNO44GT5YpVto69h8/3Z+TJE22Dejcc/1x3MimbDwk8NvYw541QSaj2fLO0y63+Jpp2MA+vAlaX6yPpR1RTirsfWUZbv48FPLsRj9ZNzbxAWjs8j6LG4aYBD7kcTIQT//pEZNHRE+cVbQww6vTnZfrJNT0GwJLWGWC8KJsE//g3mfHxox+82+WLYvwZiwzB68wymA5mmdatdp0Yr9pMItrUrzm0Pr8UUgl996uxes2WcjOsXlvHuvy9nbHDog0ZOZ2dPzOeSmUX85I0qmjpPMEl6MMZMV01+m55SzYSDEWpUownn3NB/EuCTNWUFOAm1/pqWNjqQaRqoOUgN70NZ/wM9Hly1h5rWMD+/dREVBf60F0UIgc89zBfUEfaVy6cTSzr8YOUH6TmgYcDFX1MB4tc3QSIy8Ne+/6ya4zX3pvSU5UTKz1G1/qrXh/+9ziA6kGkaqCDmJPrtHwvFkjz+9n4um1XMgoqRnZs1mlWOCXDT4gqeerfmuGz/Q3bWHSpn4+6V8MTHIdb3IqVH2fgklC6AsdPTU44TsdwwYRns0YEsnXQg0zRITYSm3xGLT6+toTOa5K7zRi5TRqb4wsVT8LlMvv3yjvQddOGt8LEHYf9qePxjat7XidRvhoNb1NyxU6XyAji0G9pqTt17ZjgdyDQNVP9YsBSyS/vcJWE7/PKtvZw9IZ/5ujZ20goCHj5z4SRe3d7ImqreV8Yekjk3wMcfVhOcH7la5c/sy6anwHDB7OvS9/796R5QsueNU/eeGU4HMk0DVSPrp3/spffrOdAW0bWxNLp92URKc7x8/Q9bSdhpXHhy5jXwiSfV5ONfXQGtvSwjYydg8zMw7bKBL8mSDmNnqHyNunkxbXQg07SuZmjdp/Lh9UFKyc/e3MOkMVlcNH3sqStbhvO6TO67ehY7Gjr56RtV6T341Evg5mfUZ/uDOfDgBfDmf0PDFjVxeverau7ZvJvT+779EUI1L+55U68anSY6kGlnNjsJL/+ruj1+WZ+7/W33IbbVd3DXeZUYxsjmLcw0l8wq5oo5Jfzotd3pz45feQF8Zo0a0WhY8Po34afLVGD781fBXzjwlZ7TWq4LVRA9uOXUv3cG0oFMO3PZSXj+btjyHCy/74QDPX721yoKAx6umTfulBXvTPL1q2fh95h8+Teb05uHESBvAnz4i3Dnq/ClnWpk49iZ0F6rMnWYrvS+30BUXqC2unkxLQa/HrumZQI7Cc/fpfLrLb8Pzr23z1231XWwalcz/3LpNLyuzJ7fNVIKAx6+dtVM7n16E79avY87zu19NevOaIKvvbCVbXUdvT4/vTjIDYvKOaeyoPeac7BIjWxceKv6GxjuCdB9yS5Rk7j3vAHLvjAyZcggOpBpZ56jgtjX4dx7Trj7z1ftwe82uWXx+FNUwDPTR+eN44WNddz/yk5WzCg6brJ5VVOIux5dx75DYS6cNgbzmEBlO7ByRyO/21hHWZ6Pjy8s5/pFZYzL9fX+huYIX/4qL4T1D6vJ+K7MzOJyquhApp1ZBhnE6toi/GFTHZ9cMp4c/wg0QZ1BhBB889oPccn3/sq/Pb+Zx+9YfDgR8+s7Gvn8U+/hsgwev2MxSyYV9HqMaMLmla0NPLOuhu+9+gHfX/kB504u5F8unTakdd2G1aQL4Z2fQM3bR5oatSHRfWR
a5pNSjUzcvwZ+c8eAgxjAw3/bi4Q+m7q09CrN9fGVy6fzt92HeGZdDVJK/vf13dz+yFoqCvy88NllfQYxUKMgr5k3jifuPIdVX76Qz100hR0NnXzq4bXUtQ0ibdWpMH6pGoCi55OdNCH1aqVDsmjRIrlu3bqRLoZ2LMdWS97vewuadkLzLrXqb7TtyD4DDGLtkQTLvv0aF00fyw9vnD+MhdZ6chzJjT9/m231HSydVMArWw9y9dxSvnPdnCHloKxqCnHNA39j0pgsnr57yenVz/nLyyERhrvfHOmSnDJCiPVSyoEv+jcAumlRGz5SqqASLFWd7MOlZY/6Vlv1ukoa2x20AsVQOAVmfwwKpqjbY6ZDbv+rOCdth3uf3kg4nuTu8/UE6MEItUaxkw45Y4aWUNkwBN+5bg6Xfv+v/GXbQf7t8uncdV7lkNd7mzQmwP/cMJe7H1vPfS9s5dvXzRnScYbFpAvh9W+p7COnclJ2htGBTBserfvgpS/DrlfU8vBTL4X5t8CUS9Iz3DkWgvceg3d/Di2pibTZZTDjStWJPvF8CIwZ0qGllNz3h628tqORb147m1mlQ1wR+Ay0b3Mzf/nlVqSEq++ZR/HEof3uJhRm8avbzsZlChZNOPkL/KWzivnshZN54PXdzC3P5caz+1/rTUpJTUuEdftbWLe/lU01bbhMg9JcL6U5PkpyfYzL9VKa62NmSfbQVuKuvEDNbdv7Jsy6dvCv1wAdyLTBklJlJuhLMg5rfqQyKIjU0hrRNtj4FOx8CbLGwtxPwPxPwpipg3//UBO8+zMVwKJtaun5xf+gvtkWTD5x2QbooVV7efztau4+v5Kb9UjFAZFSsv7l/bzzhz2MKQ8SCyf44482cc098xlTERzSMU/UFzYU966YyuYD7Xzt91uZXhzsNV9mZzTB7zbWsXp3M+v2tx5eLy3osZhXoQaL7Gjo5PUdTUQSR1a4nloU4P9eOYtzpxT2Ww4p5ZHaZekC8OSo1gQdyIZM95EN0RnXR9YdQNb+Alw+qDhHra1UsVgt325aql/qj1+E5p0w4yq47NuQU6ZebydUSqANj8EHf1LrP+VNOHKM8nNUs5/Ry7faZAxa9sK7D8LGJ9T96Veo+TflZ6f1NF96v57PPLGBKz5Uwo9unJ+RWTwcR9LRFKGzJUpnS5RQa4xQS5RQaxTTMqicP5bK+WPw+Ab2PTceTfLao9up2tDElLOKuPCT04l0xnn+/g0kEw4f/eJ8CkoDw3xWA9MWjnPVA2+RSEr+8LlzGRP0ALDrYCePrtnPbzfU0hW3KcvzsWh8Hgsn5LNofB5Ti4JHDfeXUtIWTnCgLcLOhk5+sHIX1S1hls8o4qtXzGBCYdZR7+s4knf2tvDMuhpe2drAnedO5N4VU1VA+/XN0LAZvrA5LV/ETnfD0UemA9kQnTGB7FAVrHlArdmUjMG0j4Dlgeq3obNO7eMOQOFUqNsAuRXwkftVU2JfOg+qkYP7/6aOE25Wj3tzYNxCQED4EERaVN9BPLWulOmGuTfC0s9B4RSklOxuDLHvUJj69ggH2iLUt0Wpa4twqCtORb6fWaXZzCrNYfa4bMrz/CcMTOv3t3LTz99m9rgcnrhz8ek1KCANpJTs3dTM27+rorUhfOQJAf5sN8F8L+GOOJ2HVEAbP7uAyYvGMmFOIa4+Blm0N0V4+aebaanrYsnHJjNvefnh2kZbY5jn/2cDSLj2SwvILUr/IqRDseVAO9f9ZDXzynO5bdkEHl2zn9VVh3BbBlfNKeXWpeMHPVQ/lrT55Vv7eOC1XcRth9vPnchnL5xMKJbkN+treWZdLdUtYYIei2nFQdbtb+WOcyfy1StmINY+BC/9M3z+PcjP/P5YHchOIxkdyOwk1K6Fd34K219QQ4R7BBBANTG210D1O2oeTN1G1d7/4S+BexAXLCnVYI3qt1PHeU8FLF8++AtUB7g/X+XEm3Y5BIuJJmz+sKmOR9fs5/0DR9abcpsGxTl
eSnO95Ge52dscZtfBTpKplEdBj8WM0uzDwW1WaTaTxwZwmQb7D3Vx7Y9Xk+21+O1nlpGf5U7f73OY2LZDa30XTdWdxCM2pVNyKSwLIHoJ1nW72ljz/G4a9nSQW+Rn3vJy8or9BPK8ZOV6MC1VE5ZScnBfB7vWHmT3+kbC7XEsj0nxxGzcXgvLbWC5DCy3iWkZbFtdBxIuvXM25TOP78tqqe/id9/dgGkZXPulBWQXHj852XEkXW2xwzXDztYooZYYodYogTwvM5aWDLl5si+/3VDLF5/ZBEBpjpdblozn7xaVUxDwnNRxGzui/PcrO3l2fS3ZXotQLIkjYUllATecVcZls0rwWAb/8cdt/Gr1Pm5eXME3zvVi/O8iuOK7anHQDKcD2WkkowJZdzDZ83pq5N8qiLWrtvuz7lB9UMM56nCAalvDPPFONb9+t5rWcIIpYwPccs545pbnUprrpTDLc1yNK5a0+aAhxNa6drbWdbClrp3t9R1EEyrruNsymF4c5FAoTjie5LefWcbEY5qFRpqUknBHnLaGMC31XTTVdNJcE+JQXQgnefT/X2/ARdn0PMpn5FM+I594JMnbv9/Dvs3NZOW4OevKicxYWoLRx8AEKSVOZyeJ+gZiBw5Qv7OFPfsc2mM+pMePbXmwk5JkwiERtykoDXDpp2edcIRic20nv/vue7g9BudfP4FIwqStIUzrwTBtB8O0NYaPOw+31yQrz0tHUwQ76VBYHmDG0hKmnl2MNys9E9N/u6EWv9ti+YyxhwdqtB0Ms3v9QbILfUxaMPZwgAew29owAgGE1XuTayJuc6g2RHNNJzt3HGJPVRvuUj9XXD+VyWVHD3qRUvKdP+3kp29Wcd38cdxfdzOidD783eNpObfTmQ5kw0gIcRnwA8AEHpJSfvtE+592gaytBrY+r366mlQ/lssHLr/aWr7eU/I4tlolt70aAJlTTkfph9nqW8BW/9l0Sh/RhE0kbhNJ2EQTNlOLgqyYWcT04uCQh0QPVE1LmLd2N7Ny+0Fe29EIwIqZRdy6ZAJLJhUM6P2llIQSIdqibYSTYUqzyjjY7qjAdkAFuJauOP/50dn9jpBL1NXR+drrRDZvwsrLx1VaglVcgqu0BFdJCUZuHvGYQzSUINqVIBJKEA3FsZOS/JIsCkr9iOZ6otu3E926jWRjI+7Jk/DOnIlnxgxiwk9TdSeHDnTRerBLXfDrQyTiR/6fupwo2bEGgh3VZLVUEeioxrJjdM44n7ayhTQ5hUR6zP11+ywWrChj2vg49u6dRLdtJ36gFhmJ4kSjyGhqG4lgt7fjdHUdfdJmqlnRVoMbPFMm45u/AN+C+bgrKg6/3olE1O1IFLu1lURDPcn6ehL1DRzqsNgw425sS9XIhHTIsqLkBCG32E9uRQE54wvJLs4hkO893D8X7Uqwa+1Btq+up6m6E8MSVM4bQ9GE7OM+eykliZhNNKR+77Hu339XgmC+l/IZeZTNyGfs+GyEdIjv3w+Og+PPZt+eONvfPkj97iM1fJ/fYNKYTspa1iI3vk2ipgYjEMC/eDFZS5bAh87mYDjAgZ2tNO7vpK2hi+7LqSfLIrvAR1N1J26fxdyLyph7cTmeHtlhpJT8cOVuvvfqBzw59jGWxFcjvrx35PI/niI6kA0TIYQJfACsAGqBtcCNUsptfb1myIGsdp0abdcdYFx+pHAhHRMpLITbi/D4wO3FQagg0h1I4knCsTiRaIJwLI4ZbaW44VWKq18kt3kDAO35cwhnT8Kwo5h2FCMZQSSjyHgUAxu3CYZQ6YCElGALOn3lbPPM5dXIFF6t99HRFcUnYpgygiUjBI0YWSJGFnE8MkZj3KHT48GVF6Ry4hhml+UzMS+LHH+AXH8OOZ5ccjzZOHGHSKgLt5R4hYRY7PCFTkYjOJEoTjSCjMZwohHiXSFCnSFqYjY7ow5buhLsS0oiHhNv0MuysTlcVJBNcSyB1dyO3diI3dyM4XGRCLjo9Ds0ueI
0mCHqRCetIkyrDNMiQ0RMh4RlgnDhj7uoFKVUilJKZQH5iSBBx0uwII/guLEExheTNb4UM1d9iw5v2Ub9n96ibt12WtuihHy5xLKCJKXEQeIIiWOQuk0fHfYCIbwI4cMTj5EVbSMYbsFv2nTgo9OfT5c3m6QJyAjSiWHKMFa8AyPZhWGHETKO5RIE3BYBXxaBYDbBvHyCY4pwJW3a1q2lY+cOYki6AoV0lkzC8Xgo6KzBXV+LKxLDbdt4TBfekhIcvxfH7UG63dgeN45lYXvcJLP8JLwe4pZJHEk0HseOxzCiMQh14rS2Ig61YsbiuGwbXyKJN57El0jiSSQxJUghkPk5OEWF2GMLcQoKCPsKiEQtAqFmshr2Ye2vRrS2Hf1rys3BKS7GHlOIk5eDLQROKIQTChGKmDSKIlq8FdiWHyFcgAXChRCWuo2BSQJhxrFdUcJWiDarnWC4kEBcrf5tOBFy23aT31ZFKKuUpsKzcFw+fNEmyrq2UC5qaD2UpDrvLA4VzEIgKRYHmFTpEG9qo746Tou7nK6sEgDcIk5hIEZeMEl+jqSgwCQr143p99EWdrNpi2T/vgRuj8Hsc/KZc34J3oCHcCJMQ1cDT6zbzK4tL3KeZw0deTPA5ccQJpZhYAgLyzQxDAvLsDAMA8t0YZomLtOFKVxYhollWJiGhdu01POGC5dh4TJdWKYL1+Ef9+HbVuq2aViYhhvTdCEMSwVSYagfw1TTZ7q3wlADsjzZQw64OpANEyHEEuA+KeWlqfv/BiCl/H99vWaogeyBm24n4USPvHeP59RnIUldEgEHkEfd7p0BWAhMhDDVXtJGkgTs1Gt7MtW+mKgLgURKG4kNJHvZP11MRKqcYPZam5ISEA4SGymTqfIce97q4oWwEFjQva9M9LF/b0Qvxznq00DggHRwSICMDvC4Rx0idZzUzUHUXoWj3t8xIGk5RF2SiNsBBP6oiT9mIhje2nDSgqQHEh5IGhISNiID7/VXAAAIqElEQVQpsWxx5Mc5voky4k5imxJP3MRln3huVdJwSJoOpgOWbSBGKGueBAQ2QkpMKXEMg6QpkBgYtkAc89lLoS7ipmPjSSQw+1l6xhEWcXcA2/QBDoa0T7j/yRruq/rcKy/moluG1p+nM3sMn3FATY/7tcDiY3cSQtwF3AVQUdH/hMreSI8XIwoydQ3quVUXJpH6Zxy+xeGtkbpnIFIvlEKmLvoO6oJuq/2FAanAJkh9kwIcbCQOjnBA2jjYCCGwDBPDNDAM9WMahvp2ZrpAmIe/8UpMBA6GYyOcJCKZIBGLYcdiKtwaBtIwcYRAGgYOQpVRquBkY6eCrN39SwUECFVmIQSmZeGxLEzDxBQGLgSmBMeEuOEQN2wSJEgm4ySTCVymB6/lx2Nl4RY+DMdC2gIDiboU2Rg4CGkjpINhAS4BpiBpJAnZXXQlunBiCYjZiLij4mHSQkiB7c0iWmARy4NEjo0dEEiPicvnw+vx4fH48Xj9eH1+PJa3jwAtkdEETiSGDMdwwnGc9jjJUBInV5L0CmJuh5jbJuKySZoOHtOD1/Lis3wUWl68phfLsIgmo0TiYWJtHcTbOrHbQzixBJbfhyvLjzsrC28wgCcQxHK5iYY6iIZCxEIhEqEukuEIdjyBtAykJY7aOi4Dx2vgeE3o0T/kNt3kefLI9eSS51XbXE8ubsci1t5BtKWNaGsbkVa1tRNJDL8HfC6kz8L2GtgeA1vaOOEYMhLHCcfU7XBMfXXzmiS9gqRbEPdIopYNLhOP6cFjuXEZbtymG4/pxpQGMpFEJuzUTxIZT+IzfRT48skTAcyuKHZHB05HB8Lnw1VUhJmbq/7ekES7EhimJJmM0tLZTGvXITq6WgmFO7Cxj/79GAaurhykmcD2dIHtIJIOIikJJR2ETH35cCRGanvsbeHEMJIOJHMQmBiov3dDHAngUnYHIXn4i5DsfiL1+JG6x5HHum/KY0PYcRUVeczN415x9L59POn
PP71WSdeBbBCklA8CD4KqkQ3lGJ97+MdpLZOmjTi91qg2wnT2e+UA0DMBX1nqMU3TNO00pwOZshaYIoSYKIRwA58AXhjhMmmapmkDoJsWASllUgjxWeAV1PD7X0opt45wsTRN07QB0IEsRUr5EvDSSJdD0zRNGxzdtKhpmqaNajqQaZqmaaOaDmSapmnaqKYDmaZpmjaq6RRVQySEaAL2j3Q5hlkh0DzShThFzqRzBX2+me50Pt/xUsox6TygDmRan4QQ69KdE+10dSadK+jzzXRn2vnqpkVN0zRtVNOBTNM0TRvVdCDTTuTBkS7AKXQmnSvo8810Z9T56j4yTdM0bVTTNTJN0zRtVNOBTNM0TRvVdCDTABBC7BNCvC+E2CiEWJd6LF8I8RchxK7UNm+ky5kuQohcIcRzQogdQojtQoglmXq+Qohpqc+1+6dDCHFPBp/vvUKIrUKILUKIp4QQ3tQSTe8IIXYLIZ5OLdeUEYQQX0id61YhxD2pxzLys+2LDmRaTxdKKef1mH/yFWCllHIKsDJ1P1P8APiTlHI6MBfYToaer5RyZ+pznQcsBMLA82Tg+QohxgGfBxZJKWejlmX6BPAd4HtSyslAK3DHyJUyfYQQs4FPA2ej/o6vFEJMJgM/2xPRgUw7kWuAR1K3HwE+OoJlSRshRA5wHvALACllXErZRoae7zEuBqqklPvJ3PO1AJ8QwgL8QD1wEfBc6vlMOtcZwDtSyrCUMgm8CXyMzP1se6UDmdZNAn8WQqwXQtyVeqxISlmfut0AFI1M0dJuItAEPCyEeE8I8ZAQIovMPd+ePgE8lbqdcecrpTwA3A9UowJYO7AeaEtd6AFqgXEjU8K02wJ8WAhRIITwAx8BysnAz/ZEdCDTup0rpVwAXA78kxDivJ5PSjVPI1PmaljAAuAnUsr5QBfHNL1k2PkCkOoXuhp49tjnMuV8U31B16C+rJQCWcBlI1qoYSSl3I5qNv0z8CdgI2Afs09GfLYnogOZBhz+JouUshHVf3I2cFAIUQKQ2jaOXAnTqhaolVK+k7r/HCqwZer5drsc2CClPJi6n4nnuxzYK6VsklImgN8Cy4DcVFMjQBlwYKQKmG5Syl9IKRdKKc9D9f99QGZ+tn3SgUxDCJElhAh23wYuQTVZvADcmtrtVuD3I1PC9JJSNgA1QohpqYcuBraRoefbw40caVaEzDzfauAcIYRfCCE48tm+Dlyf2idTzhUAIcTY1LYC1T/2JJn52fZJZ/bQEEJUomphoJrdnpRSflMIUQA8A1Sglqy5QUrZMkLFTCshxDzgIcAN7AFuQ32xy9TzzUJd5CullO2pxzLy8xVCfB34OyAJvAfcieoT+zWQn3rsFillbMQKmUZCiFVAAZAAviilXJmpn21fdCDTNE3TRjXdtKhpmqaNajqQaZqmaaOaDmSapmnaqKYDmaZpmjaq6UCmaZqmjWo6kGnaKCGUt4QQl/d47ONCiD+NZLk0baTp4feaNoqksp0/C8xHzfl7D7hMSlk1hGNZPfIPatqopQOZpo0yQoj/QuWHzEptxwOzARdwn5Ty90KICcBjqX0APiulXC2EuAD4BiqV0XQp5dRTW3pNSz8dyDRtlEll6dgAxIE/AlullI8LIXKBd1G1NQk4UsqoEGIK8JSUclEqkL0IzJZS7h2ZM9C09LL630XTtNOJlLJLCPE0EAJuAK4SQvxz6mkvKi1RHfBAKhWXDfSseb2rg5iWSXQg07TRyUn9COA6KeXOnk8KIe4DDqJWDTaAaI+nu05RGTXtlNCjFjVtdHsF+Fwq0ztCiPmpx3OAeimlA3wSMEeofJo27HQg07TR7RuoQR6bhRBbU/cBfgzcKoTYBExH18K0DKYHe2iapmmjmq6RaZqmaaOaDmSapmnaqKYDmaZpmjaq6UCmaZqmjWo6kGmapmmjmg5kmqZp2qimA5mmaZo2qv1/HsGTrQdZakAAAAAASUVORK5CYII=\n", 
"text/plain": [ "
" ] }, "metadata": { "needs_background": "light", "tags": [] }, "output_type": "display_data" } ], "source": [ "year_counts_pivoted.drop(index=0).plot();\n", "plt.xlabel('Year')\n", "plt.ylabel('Count', rotation=0, labelpad=30)\n", "plt.title('Frequencies of Archive Classes in Known Archives over Time');" ] }, { "cell_type": "code", "execution_count": null, "id": "20a48355", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 461 }, "id": "BAkOQ8Ka5bnE", "outputId": "9e84aa69-b58b-421c-ad19-c3c4ec3c27da" }, "outputs": [ { "data": { "text/html": [ "
[HTML preview of the known DataFrame, indexed by pn: 0/1 word columns (ki, kišib, udu, ...) followed by archive, domesticated_animal, wild_animal, dead_animal, leather_object, precious_object, wool, and archive_class; 11896 rows × 1084 columns. The text/plain output that follows shows the same preview.]
" ], "text/plain": [ " ki kišib udu ... precious_object wool archive_class\n", "pn ... \n", "100041 1.0 1.0 1.0 ... 0.0 0.0 domesticated_animal\n", "100189 1.0 0.0 1.0 ... 0.0 0.0 dead_animal\n", "100190 1.0 0.0 1.0 ... 0.0 0.0 dead_animal\n", "100191 1.0 0.0 1.0 ... 0.0 0.0 dead_animal\n", "100211 1.0 0.0 1.0 ... 0.0 0.0 dead_animal\n", "... ... ... ... ... ... ... ...\n", "514376 0.0 0.0 1.0 ... 0.0 0.0 domesticated_animal\n", "517184 0.0 0.0 1.0 ... 0.0 0.0 dead_animal\n", "519457 1.0 0.0 0.0 ... 0.0 0.0 domesticated_animal\n", "519521 0.0 0.0 0.0 ... 0.0 0.0 wild_animal\n", "519792 0.0 0.0 1.0 ... 0.0 0.0 domesticated_animal\n", "\n", "[11896 rows x 1084 columns]" ] }, "execution_count": 62, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "known" ] }, { "cell_type": "code", "execution_count": null, "id": "715617d6", "metadata": { "id": "6A7coo1s6_Ha" }, "outputs": [], "source": [ "#run to save the prepared data\n", "known.to_csv('output/part_4_known.csv')\n", "known.to_pickle('output/part_4_known.p')\n", "unknown.to_csv('output/part_4_unknown.csv')\n", "unknown.to_pickle('output/part_4_unknown.p')\n", "unknown_0.to_csv('output/part_4_unknown_0.csv')\n", "unknown_0.to_pickle('output/part_4_unknown_0.p')" ] }, { "cell_type": "code", "execution_count": null, "id": "684ef69e", "metadata": { "id": "NvCdznPg8nwA" }, "outputs": [], "source": [ "#known = pd.read_pickle('https://gitlab.com/yashila.bordag/sumnet-data/-/raw/main/part_4_known.p')\n", "#unknown = pd.read_pickle('https://gitlab.com/yashila.bordag/sumnet-data/-/raw/main/part_4_unknown.p')\n", "#unknown_0 = pd.read_pickle('https://gitlab.com/yashila.bordag/sumnet-data/-/raw/main/part_4_unknown_0.p')\n", "\n", "model_weights = {}" ] }, { "cell_type": "markdown", "id": "4ff76978", "metadata": { "id": "kYrD-yBJcMRJ" }, "source": [ "#### 1.3.1 PCA/Dimensionality Reduction\n", "\n", "Here we perform PCA to find out more about the underlying structure of the dataset. 
We will analyze the two most important principal components and explore how much of the variation in the known set they explain." ] }, { "cell_type": "code", "execution_count": null, "id": "362f7696", "metadata": { "id": "lT3zo60ldMi9" }, "outputs": [], "source": [ "#PCA\n", "pca_archive = PCA()\n", "principalComponents_archive = pca_archive.fit_transform(known.loc[:, 'AN.bu.um':'šuʾura'])" ] }, { "cell_type": "code", "execution_count": null, "id": "c314d3fc", "metadata": { "id": "zl4Mq4OOduCq" }, "outputs": [], "source": [ "principal_archive_Df = pd.DataFrame(data = principalComponents_archive\n", " , columns = ['principal component ' + str(i) for i in range(1, 1 + len(principalComponents_archive[0]))])" ] }, { "cell_type": "code", "execution_count": null, "id": "92ad6d1e", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "9X7Ps200_TZY", "outputId": "846be970-c0ec-4615-c248-2abb1fc5c8d6" }, "outputs": [ { "data": { "text/plain": [ "69" ] }, "execution_count": 17, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "len(known.loc[:, 'AN.bu.um':'šuʾura'].columns)" ] }, { "cell_type": "code", "execution_count": null, "id": "23f16a1b", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 635 }, "id": "dmz8Tv8hduoE", "outputId": "408f1360-3de7-4c02-d2ae-2d76cdfe9eef" }, "outputs": [ { "data": { "text/html": [ "
[Table: principal component 1 through principal component 69, rows 0 to 11883; the values are shown in the text/plain output of this cell]\n", "11884 rows × 69 columns
" ], "text/plain": [ " principal component 1 ... principal component 69\n", "0 -0.001153 ... -3.962093e-21\n", "1 -0.001153 ... 6.692983e-20\n", "2 -0.001153 ... 1.115431e-19\n", "3 -0.001153 ... -3.414129e-20\n", "4 -0.001153 ... 9.188216e-19\n", "... ... ... ...\n", "11879 -0.001153 ... 3.412558e-22\n", "11880 -0.001153 ... -1.716732e-21\n", "11881 -0.001153 ... -6.187519e-22\n", "11882 -0.001153 ... -6.187519e-22\n", "11883 -0.001153 ... -6.187519e-22\n", "\n", "[11884 rows x 69 columns]" ] }, "execution_count": 18, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "principal_archive_Df" ] }, { "cell_type": "code", "execution_count": null, "id": "80e10958", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "XxH7E-bIdxDA", "outputId": "cf5dd11d-d784-4fb5-eaa6-058f214c52b9" }, "outputs": [ { "data": { "text/plain": [ "(11884, 69)" ] }, "execution_count": 19, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "principal_archive_Df.shape" ] }, { "cell_type": "code", "execution_count": null, "id": "46c2aacf", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "2ROufIJtdzQF", "outputId": "20cc5234-7805-4a7d-ca58-4773f09cc533" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Explained variation per principal component: [1.48909600e-01 7.68896157e-02 5.44580086e-02 3.84859466e-02\n", " 3.73441272e-02 3.55736929e-02 3.34908306e-02 3.23180930e-02\n", " 3.08607958e-02 2.72465476e-02 2.62104855e-02 2.46991069e-02\n", " 2.46991069e-02 2.46822947e-02 2.23337377e-02 1.90080231e-02\n", " 1.85243302e-02 1.85243302e-02 1.85243302e-02 1.85243302e-02\n", " 1.85243302e-02 1.84959965e-02 1.23495535e-02 1.23495535e-02\n", " 1.23495535e-02 1.23495535e-02 1.23495535e-02 1.23495535e-02\n", " 1.23495535e-02 1.23495535e-02 1.23495535e-02 1.23306530e-02\n", " 1.17690579e-02 1.08917149e-02 1.05830913e-02 9.79182442e-03\n", " 9.66057258e-03 8.53332346e-03 8.13466952e-03 
7.97584503e-03\n", " 6.17477673e-03 6.17425099e-03 4.86154090e-03 4.81603581e-03\n", " 4.71709331e-03 2.85814508e-03 1.25376384e-03 1.15905668e-30\n", " 3.84394984e-31 3.83344024e-32 3.01169726e-32 4.83032948e-33\n", " 7.20704280e-34 7.20704280e-34 7.20704280e-34 7.20704280e-34\n", " 7.20704280e-34 7.20704280e-34 7.20704280e-34 7.20704280e-34\n", " 7.20704280e-34 7.20704280e-34 7.20704280e-34 7.20704280e-34\n", " 7.20704280e-34 7.20704280e-34 7.20704280e-34 7.20704280e-34\n", " 1.58880404e-34]\n" ] } ], "source": [ "print('Explained variation per principal component: {}'.format(pca_archive.explained_variance_ratio_))" ] }, { "cell_type": "code", "execution_count": null, "id": "01d5af13", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 282 }, "id": "UBAter-T_khC", "outputId": "d05783cf-c1e8-439a-92e9-8ccf92573132" }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 26, "metadata": { "tags": [] }, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXoAAAD4CAYAAADiry33AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAesUlEQVR4nO3df5RU5Z3n8fenq6iGrg600K0iDdIohpAoqATzQ02iUTEzo5sdTSCZRBN3SCbxbGYzmYyeOetkzNnsMZPRJBN3TsiYX2YTdZzNhElI0Kgz+TFGQYMiCtKiQqNII79poOnu7/5Rt7FoGyjoH1V96/M6p0/Xvfepqm9xik89/dyn7qOIwMzM0qum3AWYmdnQctCbmaWcg97MLOUc9GZmKeegNzNLuWy5C+irsbExpk6dWu4yzMxGlMcee2xLRDT1d6zign7q1KksX7683GWYmY0okl483DEP3ZiZpZyD3sws5Rz0ZmYp56A3M0s5B72ZWco56M3MUs5Bb2aWcqkJ+l37DnDb/c+yYsP2cpdiZlZRUhP0Xd3B1x5Yy+/Xbyt3KWZmFSU1QV9XmwGgo7O7zJWYmVWW1AR9LlNDtkbs2d9V7lLMzCpKaoJeEnW5jHv0ZmZ9pCboAfK1Wffozcz6SFXQu0dvZvZ6qQr6fG2WPZ3u0ZuZFUtV0NflMnTsd4/ezKxYqoI+n3OP3sysr5KCXtI8SWsktUq6oZ/jF0p6XFKXpKv6OT5WUpukbwxG0YdTV5v1GL2ZWR9HDXpJGeB24HJgJrBA0sw+zdYD1wI/PMzDfBH41fGXWZp8LuNZN2ZmfZTSo58LtEbEuojoBO4CrixuEBEvRMSTQE/fO0s6FzgJuG8Q6j2iupx79GZmfZUS9JOADUXbbcm+o5JUA/w98LmjtFsoabmk5e3t7aU8dL/ytRn2dHYREcf9GGZmaTPUJ2M/BSyJiLYjNYqIRRExJyLmNDU1HfeT1eWyRMC+A6/7w8LMrGplS2izEZhctN2c7CvF24ELJH0KqAdyknZHxOtO6A6GfHJhsz2dXYzJZYbiKczMRpxSgn4ZMF1SC4WAnw98qJQHj4gP996WdC0wZ6hCHgo9eqAwl75+qJ7FzGxkOerQTUR0AdcDS4FngHsiYpWkmyVdASDprZLagKuBb0paNZRFH04+91qP3szMCkrp0RMRS4AlffbdVHR7GYUhnSM9xneB7x5zhcegrjbp0TvozcwOStk3Y5MevS+DYGZ2UKqC/uAYvXv0ZmYHpSroD866cY/ezOygVAW9e/RmZq+XqqB/bR69e/RmZr1SFfSjsxkk6PCFzczMDkpV0NfUiLpRGffozcyKpCroofea9O7Rm5n1Sl3Q571AuJnZIVIX9HW5rKdXmpkVSV3Q52szHroxMyuSuqCvy2V9MtbMrEjqgj5fm/H0SjOzIqkLeq8ba2Z2qNQFfT6X8fXozcyKpC7o62qzhRWmzMwMSGHQ53MZOrt76OzyAuFmZlBi0EuaJ2mNpFZJr1vzVdKFkh6X1CXpqqL9syU9LGmVpCclfXAwi+9P7xUs93qc3swMKCHoJWWA24HLgZnAAkkz+zRbD1wL/LDP/g7goxHxZmAe8FVJDQMt+kheu4Klx+nNzKC0NWPnAq0RsQ5A0l3AlcDTvQ0i4oXk2CHjJRHxbNHtlyRtBpqA7QOu/DB8TXozs0OVMnQzCdhQtN2W7DsmkuYCOeC5Y73vsfAqU2ZmhxqWk7GSJgJ3Ah+LiNedJZW0UNJyScvb29sH9Fy9PXoP3ZiZFZQS9BuByUXbzcm+kkgaC/wM+OuI+F1/bSJiUUTMiYg5TU1NpT50v/K9Qzfu0ZuZAaUF/TJguqQWSTlgPrC4lAdP2v8Y+H5E3Hv8ZZauzidjzcwOcdSgj4gu4HpgKfAMcE9ErJJ0s6QrACS9VVIbcDXwTUmrkrt/ALgQuFbSiuRn9pC8ksTBHr2nV5qZAaXNuiEilgBL+uy
7qej2MgpDOn3v9wPgBwOs8Zgc7NH7wmZmZkAKvxlbN6oQ9O7Rm5kVpC7os5kaarM1HqM3M0ukLugB8r6wmZnZQakM+jpfqtjM7KDUBr179GZmBSkN+qx79GZmiVQGfb4241k3ZmaJVAZ9XS7refRmZolUBn0+5x69mVmvVAZ9XW3W16M3M0ukMujzuYyvR29mlkhl0Nflsuw90E13T5S7FDOzsktl0PeuMrX3gHv1ZmapDPqD68Z65o2ZWTqD/uC6sZ55Y2aWzqA/uG6se/RmZukMeq8yZWb2mlQGvdeNNTN7TUlBL2mepDWSWiXd0M/xCyU9LqlL0lV9jl0jaW3yc81gFX4kB3v0nktvZnb0oJeUAW4HLgdmAgskzezTbD1wLfDDPvcdD/wNcB4wF/gbSScMvOwjq8u5R29m1quUHv1coDUi1kVEJ3AXcGVxg4h4ISKeBHr63Pcy4P6I2BoR24D7gXmDUPcR5Ws9vdLMrFcpQT8J2FC03ZbsK0VJ95W0UNJyScvb29tLfOjDe61H76EbM7OKOBkbEYsiYk5EzGlqahrw49Vma8jUyBc2MzOjtKDfCEwu2m5O9pViIPc9bpIK68b6ZKyZWUlBvwyYLqlFUg6YDywu8fGXApdKOiE5CXtpsm/I5XO+VLGZGZQQ9BHRBVxPIaCfAe6JiFWSbpZ0BYCkt0pqA64GvilpVXLfrcAXKXxYLANuTvYNubrajMfozcyAbCmNImIJsKTPvpuKbi+jMCzT332/DXx7ADUel3wu61k3ZmZUyMnYoVCXc4/ezAxSHPR5LydoZgakOOjrchlfAsHMjBQHfT6X9SUQzMxIcdDX1bpHb2YGKQ763h59hBcIN7Pqltqgr6vN0BOwv6vvddbMzKpLaoM+7+UEzcyAFAd97xUsvZygmVW71AZ97zXpPfPGzKpdaoPePXozs4LUBv1rq0w56M2suqU26L1urJlZQWqDvnfWja93Y2bVLrVBX1eb9Og9dGNmVS61Qe8evZlZQWqDfswo9+jNzCDFQV9TU1gg3D16M6t2JQW9pHmS1khqlXRDP8drJd2dHH9E0tRk/yhJ35O0UtIzkm4c3PKPrC6X9SpTZlb1jhr0kjLA7cDlwExggaSZfZpdB2yLiNOB24Bbkv1XA7URcSZwLvCJ3g+B4ZCvzXjdWDOreqX06OcCrRGxLiI6gbuAK/u0uRL4XnL7XuBiSQICyEvKAmOATmDnoFReAvfozcxKC/pJwIai7bZkX79tIqIL2AFMoBD6e4CXgfXAVyJia98nkLRQ0nJJy9vb24/5RRxO3mP0ZmZDfjJ2LtANnAK0AH8haVrfRhGxKCLmRMScpqamQXvyutqsZ92YWdUrJeg3ApOLtpuTff22SYZpxgGvAh8CfhERByJiM/BbYM5Aiy6Ve/RmZqUF/TJguqQWSTlgPrC4T5vFwDXJ7auAB6Owht964CIASXngbcDqwSi8FHW5LLv3OejNrLodNeiTMffrgaXAM8A9EbFK0s2Srkia3QFMkNQKfBbonYJ5O1AvaRWFD4zvRMSTg/0iDufkcbVs3rWfA91eTtDMqle2lEYRsQRY0mffTUW391GYStn3frv72z9cWhrr6eoJNmztYFpTfbnKMDMrq9R+MxZgWlMegHXte8pciZlZ+aQ76BsLQf/8Fge9mVWvVAd9Q12O8fkc67bsLncpZmZlk+qgB2hpzHvoxsyqWuqDflpjnnUeujGzKpb6oG9pytO+az+79h0odylmZmWR+qCf1liYVukTsmZWrdIf9E2eeWNm1S31QX/qhDokeM4nZM2sSqU+6GuzGZpPGOMevZlVrdQHPRTG6de1ey69mVWnqgj6lsY8z2/ZQ+GCmmZm1aUqgv60pjwdnd28snN/uUsxMxt2VRH0LckUS18KwcyqUVUEva9iaWbVrCqC/uSxoxk9qsYzb8ysKlVF0NfUiBbPvDGzKlVS0EuaJ2mNpFZJN/RzvFbS3cnxRyRNLTp2lqS
HJa2StFLS6MErv3TTkpk3ZmbV5qhBLylDYe3Xy4GZwAJJM/s0uw7YFhGnA7cBtyT3zQI/AD4ZEW8G3g2U5epi05rybNi2l84urx9rZtWllB79XKA1ItZFRCdwF3BlnzZXAt9Lbt8LXCxJwKXAkxHxBEBEvBoR3YNT+rFpaczT3ROs39pRjqc3MyubUoJ+ErChaLst2ddvm4joAnYAE4AzgJC0VNLjkj7f3xNIWihpuaTl7e3tx/oaStK7OLjH6c2s2gz1ydgscD7w4eT3+yVd3LdRRCyKiDkRMaepqWlICmnx+rFmVqVKCfqNwOSi7eZkX79tknH5ccCrFHr/v4qILRHRASwBzhlo0cdj3JhRNNbnPJfezKpOKUG/DJguqUVSDpgPLO7TZjFwTXL7KuDBKFxYZilwpqS65APgXcDTg1P6sWvxzBszq0JHDfpkzP16CqH9DHBPRKySdLOkK5JmdwATJLUCnwVuSO67DbiVwofFCuDxiPjZ4L+M0kxrrPdlEMys6mRLaRQRSygMuxTvu6no9j7g6sPc9wcUpliWXUtTni3LO9mx9wDjxowqdzlmZsOiKr4Z22uaT8iaWRWqrqBPpli2bvbwjZlVj6oK+pbGPHW5DE9t3FHuUszMhk1VBX2mRpw5aRwrNmwvdylmZsOmqoIeYPbkBp5+aaeveWNmVaPqgn7W5AY6u3tYvWlnuUsxMxsWVRn0AE94+MbMqkTVBf0p40bTWF/Lig0+IWtm1aHqgl4SsyeP44k29+jNrDpUXdADzGpu4Ln23ezcV5Y1UMzMhlV1Bv3kBiLgqTYP35hZ+lVl0J/VPA6AFR6+MbMqUJVB31CXo6Ux75k3ZlYVqjLoAWY1j+MJz7wxsypQvUE/uYFNO/exace+cpdiZjakqjroAV/3xsxSr2qDfubEsWRr5Pn0ZpZ6VRv0o0dleNPEsT4ha2apV1LQS5onaY2kVkk39HO8VtLdyfFHJE3tc3yKpN2SPjc4ZQ+OWZPH8WTbDnp6otylmJkNmaMGvaQMcDtwOTATWCBpZp9m1wHbIuJ04Dbglj7HbwV+PvByB9es5gZ27+/yguFmlmql9OjnAq0RsS4iOoG7gCv7tLkS+F5y+17gYkkCkPRfgOeBVYNT8uCZffCErKdZmll6lRL0k4ANRdttyb5+20REF7ADmCCpHvgr4G+P9ASSFkpaLml5e3t7qbUP2LSmeuprsx6nN7NUG+qTsV8AbouII46NRMSiiJgTEXOampqGuKTXZGrEWc3j+NnKl/naL9fyXLuHcMwsfbIltNkITC7abk729demTVIWGAe8CpwHXCXpy0AD0CNpX0R8Y8CVD5LPz5vBl372DF994Flu++WzvGniWP7wrIksmDuF8flcucszMxswRRx5xkkS3M8CF1MI9GXAhyJiVVGbTwNnRsQnJc0H/mtEfKDP43wB2B0RXznS882ZMyeWL19+PK9lQDbt2MeSlS/z0ydf4vH126mvzfKnF0zjugtaqK8t5fPQzKx8JD0WEXP6O3bUBIuILknXA0uBDPDtiFgl6WZgeUQsBu4A7pTUCmwF5g9e+cPj5HGj+fj5LXz8/BaefWUXt95X6OF/7+EX+NS7T+NP3nYqo0dlyl2mmdkxO2qPfriVq0ffnyc2bOcr963h12u30HzCGL545Vt4z4wTy12WmdnrHKlHX7XfjC3FrMkN3Hndefzwv53H6FEZPvbdZfzZDx7j5R17y12amVnJHPQleMfpjSz57xfwl5e9kQdXb+a9f/8f/NOv19HR2VXu0szMjspDN8do/asd3LT4Kf59TTtjRmV478yTuGLWKbzrjCZyWX9umll5DOhkrB1qyoQ6vnPtW3n0+a385ImX+PnKl/m3J15i7OgsV8w+hWvfMZXTT3xDucs0MzvIPfoBOtDdw2/WbuEnKzay5KlNdHb1cMH0Rj72zqm8+4wTqalRuUs0sypwpB69g34Qvbp7Pz96dD13/u5FXtm5n2mNeb7
5kXOZfpJ7+GY2tDzrZphMqK/l+oum85u/uoivLzibXfu7+Mgdj7Jha0e5SzOzKuagHwKjMjVcMesU7rxuLnsPdPOROx5h8y6vTWtm5eGgH0IzTh7Ldz72Vjbv2s9H73iUHR0Hyl2SmVUhB/0QO2fKCSz6yBzWte/hY9991HPvzWzYOeiHwfnTG/n6gtms2LCdS279FX/7b6v4z9YtHOjuKXdpZlYFPOtmGD20ejM/+N2L/Lp1C51dPYwdneXiN53En7ztVM499YRyl2dmI5i/MFUh3jPjRN4z40Q6Orv49dot3P/0K9y3ahM//v1GzpnSwJ9eMI1L33wyGc+9N7NB5B59me3Z38W9j7Vxx2+eZ/3WDiaPH8PcqRNQn6wXIIEQNTVw8YyTeO/Mk8pSs5lVHn9hagTo7gnuf3oT3/ntC7Rte/3VMSOCAHoi2Heghx17D3D1uc3c9EczecPoUcNfsJlVFA/djACZGjHvLROZ95aJR23b2dXD1x9Yy//591YeXvcqt35gNnNbxg9DlWY2EnnWzQiUy9bwucveyD9/8u3USHxw0cPc8ovV9PRU1l9nZlYZSgp6SfMkrZHUKumGfo7XSro7Of6IpKnJ/kskPSZpZfL7osEtv7qde+p4fv6ZC/jAuZP5x39/jr+890m6HfZm1sdRh24kZYDbgUuANmCZpMUR8XRRs+uAbRFxerI4+C3AB4EtwB9FxEuS3kJh3dlJg/0iqlm+NsstV53FKQ1juO2Xz9ITwVeunuWZO2Z2UClj9HOB1ohYByDpLuBKoDjorwS+kNy+F/iGJEXE74varALGSKqNiP0DrtwO8Zn3TidTA1+571m6e4JbPzCLbMYjc2ZWWtBPAjYUbbcB5x2uTUR0SdoBTKDQo+/1x8Dj/YW8pIXAQoApU6aUXLwd6vqLppOpqeGWX6ymO4KvfnA2oxz2ZlVvWGbdSHozheGcS/s7HhGLgEVQmF45HDWl1Z+9+zQyNfClJau5/+lXOHV8HadOyNPSWMfJ48aQGaQRnZoaUaPCz5hcDZe/ZSKjR2UG58HNbFCVEvQbgclF283Jvv7atEnKAuOAVwEkNQM/Bj4aEc8NuGI7qoUXnsZpTfU8+vxWnt+yhxdf7eDXa9vZ3zV019ZZ9sI2vvT+M4fs8c3s+JUS9MuA6ZJaKAT6fOBDfdosBq4BHgauAh6MiJDUAPwMuCEifjt4ZdvRXPymk7j4Ta99c7anJ9i57wCD8f243i9u9UQQAV97YC13L9vAx9/Zwukn1g/8CcxsUB016JMx9+spzJjJAN+OiFWSbgaWR8Ri4A7gTkmtwFYKHwYA1wOnAzdJuinZd2lEbB7sF2JHVlMjGupyQ/LYf3HJGSxe8RK3/GI13/pov1/MM7My8iUQbFDc/lArf7d0Dfd84u3+lq5ZGXjNWBtyH39nCyePHc3/WvIMldZ5MKt2DnobFGNyGT576Rk8sWE7S1ZuKnc5ZlbEQW+D5o/PaWbGyW/gy0tX0zmEM3zM7Nj46pU2aDI14obLZ3Dtd5bx9QfW8p4ZTeUuqQ9RI5BELlPD1MY66nL+L2Dp53e5Dap3ndHEBdMb+cZDrXzjodZyl3NEEjSfMIbpJ76B6SfVM6u5gbdNm8D4/NDMTjIrFwe9DSpJfOujc1j2wtZBmbM/WILXFm+JCPZ29vBc+27Wbt7N2ld28Zu1W+hMFmufOXEs7zhtAnNbxnNKwxga62uZUJ/z5SRsxHLQ26AbPSrDBdMrbdjmyA509/Bk2w7+s3ULv31uC99/+EX+6TfPH9JmfD7H6SfWM3fqeOZMPYFzTj2BsV7dy0YAz6M368e+A908/fJO2nftZ8vu/bTv2s/mXftZtXEHT720k+6eQIKWCXneMDrL6FEZxuQyjBmV4azmBv7gzIlMmVBX7pdhVcRrxpoNoo7OLlas386yF7axetNO9h7oZm9nN/sOdLNrXxfrtuwB4MxJ43jfmRN575tOZPL4Ol/0zYaUg95sGLV
t6+DnKzfx05Uv88SG7Qf3N9bXMqlhNKc0jKGhLkc+l6Eul6GuNsuU8XVc9uaTvWCMHTcHvVmZbNjawaPPb2Xj9r28tH3vwd8793XRsb+LPZ3dB9tOP7Gez132Ri6deRKSA9+OzZGC3idjzYbQ5PF1TB5/+LH6np5gX1c3/7Gmnb+7bw2fuPMxzp7SwOcvm8HZUxoOaTsqU+Mevx0X9+jNKkRXdw///FgbX/3ls7yy8/WrbY4dneX9Z0/ig2+dwsxTxpahQqtkHroxG0H2dnbzkxUb2drRecj+1S/v4herNtHZ1cOZk8bxgTnNTJmQJyORqSn8ZDOiNluT/BRmAjXW15bpldhw8tCN2QgyJpdh/tz+107e3tHJv/5+I3ct28D//Mmqkh7vXWc0ceP7ZjDjZP8VUK3cozcbgSKC59r3sGPvAXoi6OoOunuCAz09dHb1sL+r8Hvjtr3c8Zt17N7fxVXnNvPZS97IyeNGl7t8GwLu0ZuljKSSl2285h2n8g8PtvL9h19g8RMv8fF3tnDd+S1M8JBO1XCP3qxKrH+1gy8vXc3PVr5MbbaGBXOnsPDCaUwcN6bcpdkgGPAKU5LmSVojqVXSDf0cr5V0d3L8EUlTi47dmOxfI+my430RZjYwUybU8Y0PncP9/+NC/uDMU/j+wy9y4Zcf4q/ufZKVbTu8MliKHbVHLykDPAtcArQBy4AFEfF0UZtPAWdFxCclzQfeHxEflDQT+BEwFzgF+CVwRkR0932eXu7Rmw2PDVs7+Nav13H3sg3s7+rhjJPq+eNzmnn/2ZM4cazH8UeaAU2vlPR24AsRcVmyfSNARPzvojZLkzYPS8oCm4Am4IbitsXtDvd8Dnqz4bWj4wA/XfkS//JYG4+v306NoKUxT42/nTvsZkwcyz8sOPu47jvQk7GTgA1F223AeYdrExFdknYAE5L9v+tz30n9FLgQWAgwZUr/08rMbGiMqxvFh887lQ+fdyrPte/mx49vZN2W3eUuqypNPmFozpdUxKybiFgELIJCj77M5ZhVrdOaCtfbsXQp5WTsRmBy0XZzsq/fNsnQzTjg1RLva2ZmQ6iUoF8GTJfUIikHzAcW92mzGLgmuX0V8GAUBv8XA/OTWTktwHTg0cEp3czMSnHUoZtkzP16YCmQAb4dEask3Qwsj4jFwB3AnZJaga0UPgxI2t0DPA10AZ8+0owbMzMbfP7ClJlZCgz4C1NmZjZyOejNzFLOQW9mlnIOejOzlKu4k7GS2oEXB/AQjcCWQSpnOIy0esE1D5eRVvNIqxfSVfOpEdHU3x0qLugHStLyw515rkQjrV5wzcNlpNU80uqF6qnZQzdmZinnoDczS7k0Bv2ichdwjEZaveCah8tIq3mk1QtVUnPqxujNzOxQaezRm5lZEQe9mVnKpSboj7aAeSWQ9G1JmyU9VbRvvKT7Ja1Nfp9Qzhr7kjRZ0kOSnpa0StJnkv0VWbek0ZIelfREUu/fJvtbkoXrW5OF7HPlrrUvSRlJv5f002S7omuW9IKklZJWSFqe7KvI90UvSQ2S7pW0WtIzkt5eqTVLemPyb9v7s1PSnx9PvakI+mQB89uBy4GZwIJkYfJK811gXp99NwAPRMR04IFku5J0AX8RETOBtwGfTv5tK7Xu/cBFETELmA3Mk/Q24Bbgtog4HdgGXFfGGg/nM8AzRdsjoeb3RMTsonndlfq+6PU14BcRMQOYReHfuyJrjog1yb/tbOBcoAP4McdTb0SM+B/g7cDSou0bgRvLXddhap0KPFW0vQaYmNyeCKwpd41Hqf8nwCUjoW6gDnicwhrHW4Bsf++XSvihsPraA8BFwE8BjYCaXwAa++yr2PcFhZXvnieZhDISai6q8VLgt8dbbyp69PS/gPnrFiGvUCdFxMvJ7U3ASeUs5kgkTQXOBh6hgutOhkBWAJuB+4HngO0R0ZU0qcT3x1eBzwM9yfYEKr/mAO6T9Jikhcm+in1fAC1AO/CdZIjsnyT
lqeyae80HfpTcPuZ60xL0qRCFj+iKnO8qqR74F+DPI2Jn8bFKqzsiuqPw524zMBeYUeaSjkjSHwKbI+KxctdyjM6PiHMoDJl+WtKFxQcr7X1BYUW9c4B/jIizgT30GfaowJpJzs1cAfxz32Ol1puWoB/Ji5C/ImkiQPJ7c5nreR1JoyiE/P+NiP+X7K74uiNiO/AQhWGPhmTheqi898c7gSskvQDcRWH45mtUds1ExMbk92YKY8dzqez3RRvQFhGPJNv3Ugj+Sq4ZCh+kj0fEK8n2MdeblqAvZQHzSlW8sPo1FMbAK4YkUVgT+JmIuLXoUEXWLalJUkNyewyF8wnPUAj8q5JmFVMvQETcGBHNETGVwnv3wYj4MBVcs6S8pDf03qYwhvwUFfq+AIiITcAGSW9Mdl1MYT3riq05sYDXhm3geOot90mGQTxZ8T7gWQrjsX9d7noOU+OPgJeBAxR6F9dRGIt9AFgL/BIYX+46+9R8PoU/DZ8EViQ/76vUuoGzgN8n9T4F3JTsnwY8CrRS+BO4tty1Hqb+dwM/rfSak9qeSH5W9f6fq9T3RVHds4HlyfvjX4ETKrlmIA+8Cowr2nfM9foSCGZmKZeWoRszMzsMB72ZWco56M3MUs5Bb2aWcg56M7OUc9CbmaWcg97MLOX+P37lVT09CtH7AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light", "tags": [] }, "output_type": "display_data" } ], "source": [ "plt.plot(pca_archive.explained_variance_ratio_)" ] }, { "cell_type": "code", "execution_count": null, "id": "107b64d4", "metadata": { "id": "5Ey9jWsOd3LL" }, "outputs": [], "source": [ "known_reindexed = known.reset_index()\n", "known_reindexed" ] }, { "cell_type": "code", "execution_count": null, "id": "632abd1e", "metadata": { "id": "vojBl0tgd53L" }, "outputs": [], "source": [ "plt.figure()\n", "plt.figure(figsize=(10,10))\n", "plt.xticks(fontsize=12)\n", "plt.yticks(fontsize=14)\n", "plt.xlabel('Principal Component 1',fontsize=20)\n", "plt.ylabel('Principal Component 2',fontsize=20)\n", "plt.title(\"Principal Component Analysis of Archives\",fontsize=20)\n", "targets = ['domesticated_animal', 'wild_animal', 'dead_animal', 'leather_object', 'precious_object', 'wool']\n", "colors = ['red', 'orange', 'yellow', 'green', 'blue', 'violet']\n", "for target, color in zip(targets,colors):\n", " indicesToKeep = known_reindexed.index[known_reindexed['archive_class'] == target].tolist()\n", " plt.scatter(principal_archive_Df.loc[indicesToKeep, 'principal component 1']\n", " , principal_archive_Df.loc[indicesToKeep, 'principal component 2'], c = color, s = 50)\n", "\n", "plt.legend(targets,prop={'size': 15})\n", "\n", "# import seaborn as sns\n", "# plt.figure(figsize=(16,10))\n", "# sns.scatterplot(\n", "# x=\"principal component 1\", y=\"principal component 2\",\n", "# hue=\"y\",\n", "# palette=sns.color_palette(\"hls\", 10),\n", "# data=principal_cifar_Df,\n", "# legend=\"full\",\n", "# alpha=0.3\n", "# )" ] }, { "cell_type": "markdown", "id": "176c95eb", "metadata": { "id": "pXxYqfyRLilb" }, "source": [ "## 2 Simple Modeling Methods" ] }, { "cell_type": "markdown", "id": "36cb6ba4", "metadata": { "id": "0NfR51-5M29w" }, "source": [ "### 2.1 Logistic Regression\n", "\n", "Here we will train our model using logistic regression to predict archives based on the 
features made in the previous subsection." ] }, { "cell_type": "markdown", "id": "1614eadb", "metadata": { "id": "pwuS7jLwcReG" }, "source": [ "#### 2.1.1 Logistic Regression by Archive\n", "\n", "Here we will train a set of one-vs-all logistic regression classifiers, one per archive, each of which attempts to classify a tablet as either part of that archive or not. Each score is computed on the same tablets used for fitting, so it reports training accuracy." ] }, { "cell_type": "code", "execution_count": null, "id": "c516db73", "metadata": { "id": "rGmHTOlvd-Nl" }, "outputs": [], "source": [ "clf_da = LogisticRegression(random_state=42, solver='lbfgs', max_iter=200)\n", "clf_da.fit(known.loc[:, 'AN.bu.um':'šuʾura'], known.loc[:, 'domesticated_animal'])\n", "clf_da.score(known.loc[:, 'AN.bu.um':'šuʾura'], known.loc[:, 'domesticated_animal'])" ] }, { "cell_type": "code", "execution_count": null, "id": "9536cdff", "metadata": { "id": "T1tHDbt8eBFZ" }, "outputs": [], "source": [ "clf_wa = LogisticRegression(random_state=42, solver='lbfgs', max_iter=200)\n", "clf_wa.fit(known.loc[:, 'AN.bu.um':'šuʾura'], known.loc[:, 'wild_animal'])\n", "clf_wa.score(known.loc[:, 'AN.bu.um':'šuʾura'], known.loc[:, 'wild_animal'])" ] }, { "cell_type": "code", "execution_count": null, "id": "66f815ff", "metadata": { "id": "eyr-ZmqReDZ_" }, "outputs": [], "source": [ "clf_dea = LogisticRegression(random_state=42, solver='lbfgs', max_iter=200)\n", "clf_dea.fit(known.loc[:, 'AN.bu.um':'šuʾura'], known.loc[:, 'dead_animal'])\n", "clf_dea.score(known.loc[:, 'AN.bu.um':'šuʾura'], known.loc[:, 'dead_animal'])" ] }, { "cell_type": "code", "execution_count": null, "id": "362a3834", "metadata": { "id": "ibPqJEyjeEdJ" }, "outputs": [], "source": [ "clf_lo = LogisticRegression(random_state=42, solver='lbfgs', max_iter=200)\n", "clf_lo.fit(known.loc[:, 'AN.bu.um':'šuʾura'], known.loc[:, 'leather_object'])\n", "clf_lo.score(known.loc[:, 'AN.bu.um':'šuʾura'], known.loc[:, 'leather_object'])" ] }, { "cell_type": "code", "execution_count": null, "id": "59486eec", "metadata": { "id": "P4VI2NUXeFnL" 
}, "outputs": [], "source": [ "clf_po = LogisticRegression(random_state=42, solver='lbfgs', max_iter=200)\n", "clf_po.fit(known.loc[:, 'AN.bu.um':'šuʾura'], known.loc[:, 'precious_object'])\n", "clf_po.score(known.loc[:, 'AN.bu.um':'šuʾura'], known.loc[:, 'precious_object'])" ] }, { "cell_type": "code", "execution_count": null, "id": "2a87a9db", "metadata": { "id": "RVStvp5aeGtD" }, "outputs": [], "source": [ "clf_w = LogisticRegression(random_state=42, solver='lbfgs', max_iter=200)\n", "clf_w.fit(known.loc[:, 'AN.bu.um':'šuʾura'], known.loc[:, 'wool'])\n", "clf_w.score(known.loc[:, 'AN.bu.um':'šuʾura'], known.loc[:, 'wool'])" ] }, { "cell_type": "code", "execution_count": null, "id": "0cc28ad5", "metadata": { "id": "dL56mGZUTCTx" }, "outputs": [], "source": [ "known.loc[:, 'AN.bu.um':'šuʾura']" ] }, { "cell_type": "markdown", "id": "b0929509", "metadata": { "id": "bFvtNRjlGtsS" }, "source": [ "As we can see, the domesticated animal model has the lowest accuracy, while the leather object, precious object, and wool classifiers work fairly well." ] }, { "cell_type": "markdown", "id": "5616d480", "metadata": { "id": "YopZHGeAb9LB" }, "source": [ "#### 2.1.2 Multinomial Logistic Regression\n", "\n", "Here we use multinomial logistic regression, since there are multiple archives into which each text could be classified. We fit the model on the tablets with known archives and then check its score on those same tablets to see how accurate it is.\n", "\n", "Finally, we append the Logistic Regression prediction as an archive prediction for the tablets without known archives." 
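, "\n", "\n", "As a sketch of what the multinomial model computes (this is the standard softmax form, assuming a recent scikit-learn version in which the `lbfgs` solver fits a multinomial model by default; the weight vectors $w_k$ and intercepts $b_k$ are notation introduced here and do not appear in the notebook), the probability that a tablet with feature vector $x$ belongs to archive $k$ is:\n", "\n", "$$P(y = k | x) = \\frac{\\exp(w_k^\\top x + b_k)}{\\sum_{j} \\exp(w_j^\\top x + b_j)}$$\n", "\n", "The predicted archive is the $k$ with the largest probability."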
] }, { "cell_type": "code", "execution_count": null, "id": "2d55e23b", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "G_Fx9IEBcqof", "outputId": "56380118-5d77-4c2a-946d-d66d5e152273" }, "outputs": [ { "data": { "text/plain": [ "0.6918291862811029" ] }, "execution_count": 131, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "clf_archive = LogisticRegression(random_state=42, solver='lbfgs', max_iter=300)\n", "clf_archive.fit(known.loc[:, 'AN.bu.um':'šuʾura'], known.loc[:, 'archive_class'])\n", "log_reg_score = clf_archive.score(known.loc[:, 'AN.bu.um':'šuʾura'], known.loc[:, 'archive_class'])\n", "model_weights['LogReg'] = log_reg_score\n", "log_reg_score" ] }, { "cell_type": "code", "execution_count": null, "id": "6416122f", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 529 }, "id": "BQtotAI5csuL", "outputId": "64c9ab17-0f5b-4aa2-b599-f0a3a601f716" }, "outputs": [ { "data": { "text/html": [ "
3243 rows × 1084 columns
" ], "text/plain": [ " ki kišib udu ... precious_object wool LogReg Predicted Archive\n", "pn ... \n", "100217 0.0 0.0 0.0 ... 0.0 0.0 domesticated_animal\n", "100229 1.0 0.0 1.0 ... 0.0 0.0 domesticated_animal\n", "100284 0.0 0.0 1.0 ... 0.0 0.0 domesticated_animal\n", "100292 1.0 0.0 0.0 ... 0.0 0.0 domesticated_animal\n", "100301 0.0 0.0 0.0 ... 0.0 0.0 domesticated_animal\n", "... ... ... ... ... ... ... ...\n", "519647 0.0 0.0 0.0 ... 0.0 0.0 domesticated_animal\n", "519650 0.0 0.0 0.0 ... 0.0 0.0 domesticated_animal\n", "519658 0.0 0.0 0.0 ... 0.0 0.0 domesticated_animal\n", "519957 0.0 0.0 0.0 ... 0.0 0.0 domesticated_animal\n", "519959 1.0 1.0 0.0 ... 0.0 0.0 domesticated_animal\n", "\n", "[3243 rows x 1084 columns]" ] }, "execution_count": 129, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "#Predictions for Unknown\n", "unknown[\"LogReg Predicted Archive\"] = clf_archive.predict(unknown.loc[:, 'AN.bu.um':'šuʾura'])\n", "unknown" ] }, { "cell_type": "code", "execution_count": null, "id": "fe64bfed", "metadata": { "id": "D2fSDs6AcvWs" }, "outputs": [], "source": [ "known['archive_class'].unique()" ] }, { "cell_type": "markdown", "id": "1d3769c3", "metadata": { "id": "gNWP2GzecAim" }, "source": [ "### 2.2 K Nearest Neighbors\n", "\n", "Here we will train our model using k nearest neighbors to predict archives based on the features made in the previous subsection. We are fitting our data onto the tablets with known archives and then checking the score to see how accurate the model is.\n", "\n", "We then append the KNN prediction as an archive prediction for the tablets without known archives.\n", "\n", "Then, we use different values for K (the number of neighbors we take into consideration when predicting for a tablet) to see how the accuracy changes for different values of K. This can be seen as a form of hyperparameter tuning because we are trying to see which K we should choose to get the highest training accuracy." 
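, "\n", "\n", "As a sketch of the decision rule (assuming the default uniform weighting of `KNeighborsClassifier`; $N_K(x)$, the set of the $K$ training tablets closest to $x$, is notation introduced here), the predicted archive is the majority class among the neighbors:\n", "\n", "$$\\hat{y}(x) = \\underset{c}{\\operatorname{argmax}} \\sum_{i \\in N_K(x)} \\mathbb{1}[y_i = c]$$"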
] }, { "cell_type": "code", "execution_count": null, "id": "8de92c39", "metadata": { "id": "5iHBQzoQc3_i" }, "outputs": [], "source": [ "# this cell takes a long time to run, so avoid re-running it unnecessarily\n", "list_k = [3, 5, 7, 9, 11, 13]\n", "max_k, max_score = 0, 0\n", "for k in list_k:\n", "    knn = KNeighborsClassifier(n_neighbors=k)\n", "    knn.fit(known.loc[:, 'AN.bu.um':'šuʾura'], known.loc[:, 'archive_class'])\n", "    knn_score = knn.score(known.loc[:, 'AN.bu.um':'šuʾura'], known.loc[:, 'archive_class'])\n", "    print(\"Accuracy for k = %s: \" %(k), knn_score)\n", "    # keep the k with the highest training accuracy (ties go to the larger k)\n", "    if max_score <= knn_score:\n", "        max_score = knn_score\n", "        max_k = k" ] }, { "cell_type": "markdown", "id": "a7c1116b", "metadata": { "id": "FeBx_TVR-Aww" }, "source": [ "As we can see here, k = 5 and k = 9 have the best training accuracy, which is roughly in line with the training accuracy of the Logistic Regression classifier." ] }, { "cell_type": "code", "execution_count": null, "id": "3478dc3b", "metadata": { "id": "D5m3ZZIrcxun" }, "outputs": [], "source": [ "knn = KNeighborsClassifier(n_neighbors=max_k)\n", "knn.fit(known.loc[:, 'AN.bu.um':'šuʾura'], known.loc[:, 'archive_class'])\n", "knn_score = knn.score(known.loc[:, 'AN.bu.um':'šuʾura'], known.loc[:, 'archive_class'])\n", "model_weights['KNN'] = knn_score" ] }, { "cell_type": "code", "execution_count": null, "id": "239e3978", "metadata": { "id": "DXC8SgIyc09u" }, "outputs": [], "source": [ "#Predictions for Unknown\n", "unknown[\"KNN Predicted Archive\"] = knn.predict(unknown.loc[:, 'AN.bu.um':'šuʾura'])\n", "unknown" ] }, { "cell_type": "markdown", "id": "33e11888", "metadata": { "id": "Pmiq9-8N9Zys" }, "source": [ "As we can see in the output from the previous cell, we can get different predictions depending on the classifier we choose." 
] }, { "cell_type": "markdown", "id": "4c58e4cb", "metadata": { "id": "JTPzHB1U-cXx" }, "source": [ "Next we split the tablets with known archives into a training set and a test set, so that accuracy can be measured on data the model has not seen. For the next two sections, we use `X_train` and `y_train` to train the models and `X_test` and `y_test` to test them. As the known set was split randomly, we presume that both the training and test set are representative of the whole known set, so the two sets are reasonably comparable." ] }, { "cell_type": "code", "execution_count": null, "id": "db472f06", "metadata": { "id": "-iAwXCVkc548" }, "outputs": [], "source": [ "#Split known into train and test, eventually predict with unknown \n", "X_train, X_test, y_train, y_test = train_test_split(known.loc[:, 'AN.bu.um':'šuʾura'], \n", "                                                    known.loc[:, 'archive_class'], \n", "                                                    test_size=0.2,random_state=0) " ] }, { "cell_type": "markdown", "id": "af0f7284", "metadata": { "id": "kWzkxLtOcFNv" }, "source": [ "### 2.3 Naive Bayes\n", "\n", "Here we will train our model using a Naive Bayes Model to predict archives based on the features made in the previous subsection. We assume that the features are independent of each other given the archive, which is where the descriptor _naive_ comes from. So:\n", "\n", "$$P(x_i|y; x_1, x_2, ..., x_{i-1}, x_{i+1}, ..., x_n) = P(x_i| y)$$\n", "\n", "and:\n", "\n", "$$P(x_1, x_2, ..., x_n | y) = \\prod_{i=1}^{n} P(x_i | y)$$\n", "\n", "Moreover, we will be using Bayes' Law, which in this case states:\n", "\n", "$$P(y|x_1, x_2, ..., x_n) = \\frac{P(y)P(x_1, x_2, ..., x_n | y)}{P(x_1, x_2, ..., x_n)}$$\n", "\n", "That is, 
the probability that a particular tablet (defined by features $x_1, x_2, ..., x_n$) is in archive $y$ is equal to the probability of getting a tablet from archive $y$, times the probability of observing the features $x_1, x_2, ..., x_n$ given archive $y$, divided by the probability of observing the features $x_1, x_2, ..., x_n$ at all.\n", "\n", "Applying our assumption of independence from before, we can simplify this to:\n", "\n", "$$P(y|x_1, x_2, ..., x_n) = \\frac{P(y)\\prod_{i=1}^{n} P(x_i | y)}{P(x_1, x_2, ..., x_n)}$$\n", "\n", "Since the denominator does not depend on $y$, the probability that a particular tablet (defined by features $x_1, x_2, ..., x_n$) is in archive $y$ is _proportional_ to the probability of getting a tablet from archive $y$ times the product of the probabilities of observing each feature $x_i$ given archive $y$:\n", "\n", "$$P(y|x_1, x_2, ..., x_n) \\propto P(y)\\prod_{i=1}^{n} P(x_i | y)$$\n", "\n", "We can then use this to calculate the maximizing archive.\n", "\n", "$$\\hat{y} = \\underset{y}{\\operatorname{argmax}} \\; P(y)\\prod_{i=1}^{n} P(x_i | y)$$\n", "\n", "We are training two models where the first assumes the features are Gaussian random variables and the second assumes the features are Bernoulli random variables.\n", "\n", "We fit each model on the training split of the tablets with known archives and then check its score on the test split to see how accurate it is.\n", "\n", "Finally, we append the two Naive Bayes predictions as archive predictions for the tablets without known archives." ] }, { "cell_type": "code", "execution_count": null, "id": "9cea5277", "metadata": { "id": "tCIlLq7Jc7Hs" }, "outputs": [], "source": [ "#Gaussian\n", "gauss = GaussianNB()\n", "gauss.fit(X_train, y_train)\n", "gauss_nb_score = gauss.score(X_test, y_test)\n", "model_weights['GaussNB'] = gauss_nb_score\n", "gauss_nb_score" ] }, { "cell_type": "markdown", "id": "a4204922", "metadata": { "id": "BkK71TVzErST" }, "source": [ "We can see that the Gaussian assumption does quite poorly." 
] }, { "cell_type": "code", "execution_count": null, "id": "cbf74cda", "metadata": { "id": "Ei_I_lWMc9Ar" }, "outputs": [], "source": [ "#Predictions for Unknown\n", "unknown[\"GaussNB Predicted Archive\"] = gauss.predict(unknown.loc[:, 'AN.bu.um':'šuʾura'])\n", "unknown" ] }, { "cell_type": "code", "execution_count": null, "id": "00ac7360", "metadata": { "id": "DtJQjvWSc_Ym" }, "outputs": [], "source": [ "#Bernoulli\n", "bern = BernoulliNB()\n", "bern.fit(X_train, y_train)\n", "bern_nb_score = bern.score(X_test, y_test)\n", "model_weights['BernoulliNB'] = bern_nb_score\n", "bern_nb_score" ] }, { "cell_type": "markdown", "id": "8851e94c", "metadata": { "id": "k0ZneRNOEwcg" }, "source": [ "However, the Bernoulli assumption does quite well." ] }, { "cell_type": "code", "execution_count": null, "id": "f8fae852", "metadata": { "id": "91Gk1c5EdAka" }, "outputs": [], "source": [ "#Predictions for Unknown\n", "unknown[\"BernoulliNB Predicted Archive\"] = bern.predict(unknown.loc[:, 'AN.bu.um':'šuʾura'])\n", "unknown" ] }, { "cell_type": "markdown", "id": "e5062b38", "metadata": { "id": "rlwzgIeicIB3" }, "source": [ "### 2.4 SVM\n", "\n", "Here we train a Support Vector Machine with a linear kernel to predict archives based on the features made earlier in this section. We fit the model on the training split of the tablets with known archives and then check its accuracy on the test split.\n", "\n", "Finally, we append the SVM prediction as an archive prediction for the tablets without known archives." 
] }, { "cell_type": "code", "execution_count": null, "id": "95e9a98d", "metadata": { "id": "3EZL1pkKdCMq" }, "outputs": [], "source": [ "svm_archive = svm.SVC(kernel='linear')\n", "svm_archive.fit(X_train, y_train)\n", "y_pred = svm_archive.predict(X_test)\n", "svm_score = metrics.accuracy_score(y_test, y_pred)\n", "model_weights['SVM'] = svm_score\n", "print(\"Accuracy:\", svm_score)" ] }, { "cell_type": "code", "execution_count": null, "id": "bbb8faf0", "metadata": { "id": "OBXM8WfldEMn" }, "outputs": [], "source": [ "unknown[\"SVM Predicted Archive\"] = svm_archive.predict(unknown.loc[:, 'AN.bu.um':'šuʾura'])\n", "unknown" ] }, { "cell_type": "markdown", "id": "e806b8b7", "metadata": { "id": "ssrCDxoMOhZm" }, "source": [ "## 3 Complex Modeling Methods" ] }, { "cell_type": "markdown", "id": "f632803b", "metadata": { "id": "PbG6QBz36R5R" }, "source": [ "## 4 Voting Mechanism Between Models\n", "\n", "Here we combine the models' predictions to determine which archive to assign to each tablet with an unknown archive.\n", "\n", "We then augment `words_df` with these archives." 
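, "\n", "\n", "As a sketch of the voting rule implemented by `weighted_voting` below (the symbols are notation introduced here): if $\\hat{y}_m$ is the archive predicted by model $m$ and $w_m$ is that model's accuracy score stored in `model_weights`, the assigned archive is\n", "\n", "$$\\hat{y} = \\underset{c}{\\operatorname{argmax}} \\sum_{m} w_m \\, \\mathbb{1}[\\hat{y}_m = c]$$\n", "\n", "For example, with hypothetical weights of 0.7, 0.6, and 0.5, two weaker models that agree (0.6 + 0.5 = 1.1) outvote the single strongest model (0.7)."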
] }, { "cell_type": "code", "execution_count": null, "id": "06649d31", "metadata": { "id": "A_Z48QeD93Jd" }, "outputs": [], "source": [ "model_weights" ] }, { "cell_type": "code", "execution_count": null, "id": "b64cfde5", "metadata": { "id": "RMhkJmIJS1CL" }, "outputs": [], "source": [ "def visualize_archives(data, prediction_name):\n", " archive_counts = data.value_counts()\n", "\n", "\n", " plt.xlabel('Archive Class')\n", " plt.ylabel('Frequency', rotation=0, labelpad=30)\n", " plt.title('Frequencies of ' + prediction_name + ' Predicted Archives')\n", " plt.xticks(rotation=45)\n", " plt.bar(archive_counts.index, archive_counts);\n", "\n", " percent_domesticated_animal = archive_counts['domesticated_animal'] / sum(archive_counts)\n", "\n", " print('Percent of texts in Domesticated Animal Archive:', percent_domesticated_animal)" ] }, { "cell_type": "code", "execution_count": null, "id": "2052a8cf", "metadata": { "id": "7-0f5WJTSJqO" }, "outputs": [], "source": [ "#Log Reg Predictions\n", "visualize_archives(unknown['LogReg Predicted Archive'], 'Logistic Regression')" ] }, { "cell_type": "code", "execution_count": null, "id": "bf1b0746", "metadata": { "id": "Qgsc2nTVSf4O" }, "outputs": [], "source": [ "#KNN Predictions\n", "visualize_archives(unknown['KNN Predicted Archive'], 'K Nearest Neighbors')" ] }, { "cell_type": "code", "execution_count": null, "id": "f0e301dd", "metadata": { "id": "CECF4iExSmjP" }, "outputs": [], "source": [ "#Gaussian Naive Bayes Predictions\n", "visualize_archives(unknown['GaussNB Predicted Archive'], 'Gaussian Naive Bayes')" ] }, { "cell_type": "code", "execution_count": null, "id": "50ed6f66", "metadata": { "id": "pNSBTgPBSwhj" }, "outputs": [], "source": [ "#Bernoulli Naive Bayes Predictions\n", "visualize_archives(unknown['BernoulliNB Predicted Archive'], 'Bernoulli Naive Bayes')" ] }, { "cell_type": "code", "execution_count": null, "id": "3f9a8b7a", "metadata": { "id": "27WesTVUUfDG" }, "outputs": [], "source": [ "#SVM 
Predictions\n", "visualize_archives(unknown['SVM Predicted Archive'], 'Support Vector Machines')" ] }, { "cell_type": "code", "execution_count": null, "id": "e3783714", "metadata": { "id": "0Xz6gARR-wq_" }, "outputs": [], "source": [ "def weighted_voting(row):\n", "    votes = {} # create empty voting dictionary\n", "    # tally votes\n", "    for model in row.index:\n", "        model_name = model[:-18] # remove ' Predicted Archive' from column name\n", "        prediction = row[model]\n", "        if prediction not in votes.keys():\n", "            votes[prediction] = model_weights[model_name] # if the prediction isn't in the list of voting categories, add it with a weight equal to the current model weight\n", "        else:\n", "            votes[prediction] += model_weights[model_name] # else, add model weight to the prediction\n", "    return max(votes, key=votes.get) # use the values to get the prediction with the greatest weight" ] }, { "cell_type": "code", "execution_count": null, "id": "bc0655b8", "metadata": { "id": "m6MQH0da-Aq-" }, "outputs": [], "source": [ "predicted_archives = unknown.loc[:, 'LogReg Predicted Archive':\n", "                                 'SVM Predicted Archive'].copy() # get predictions\n", "weighted_prediction = predicted_archives.apply(weighted_voting, axis=1) #apply voting mechanism on each row and return 'winning' prediction" ] }, { "cell_type": "code", "execution_count": null, "id": "b7eece68", "metadata": { "id": "Q38uV36MRHmJ" }, "outputs": [], "source": [ "weighted_prediction[weighted_prediction != 'domesticated_animal']" ] }, { "cell_type": "code", "execution_count": null, "id": "872b24af", "metadata": { "id": "PYuISOe9Gstd" }, "outputs": [], "source": [ "words_df" ] }, { "cell_type": "code", "execution_count": null, "id": "318fdedb", "metadata": { "id": "Pi4VugZGEsyZ" }, "outputs": [], "source": [ "# Series.append was removed in pandas 2.0, so combine the two Series with pd.concat\n", "archive_class = pd.concat([known['archive_class'], weighted_prediction])\n", "words_df['archive_class'] = words_df.apply(lambda row: archive_class[int(row['id_text'][1:])], axis=1)" ] }, { "cell_type": 
"code", "execution_count": null, "id": "a9e491b4", "metadata": { "id": "z22PcVk-Mw2o" }, "outputs": [], "source": [ "words_df" ] }, { "cell_type": "markdown", "id": "9db15866", "metadata": { "id": "lPjcplQAQ8LX" }, "source": [ "## 5 Sophisticated Naive Bayes" ] }, { "cell_type": "code", "execution_count": null, "id": "6d6f9714", "metadata": { "id": "K60UB8wYGM6h" }, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "id": "0f844643", "metadata": { "id": "3RvDKrY0UmLb" }, "source": [ "### 5.1 Feature and Model Creation" ] }, { "cell_type": "markdown", "id": "6fae866b", "metadata": { "id": "Q-V9P376rbgB" }, "source": [ "There are some nouns that are so closely associated with a specific archive that their presence in a text virtually guarantees that the text belongs to that archive. We will use this fact to create a training set for our classification model.\n", "\n", "The `labels` dictionary below contains the different archives along with their possible associated nouns." 
] }, { "cell_type": "code", "execution_count": null, "id": "bc95e73d", "metadata": { "id": "MdQv15BnSJVy" }, "outputs": [], "source": [ "labels = dict()\n", "labels['domesticated_animal'] = ['ox', 'cow', 'sheep', 'goat', 'lamb', '~sheep', 'equid']\n", "dom = '(' + '|'.join(labels['domesticated_animal']) + ')'\n", "#split domesticated into large and small - sheep, goat, lamb, ~sheep would be small domesticated animals\n", "labels['wild_animal'] = ['bear', 'gazelle', 'mountain', 'lion'] # account for 'mountain animal' and plural\n", "wild = '(' + '|'.join(labels['wild_animal']) + ')'\n", "labels['dead_animal'] = ['die'] # find 'die' before finding domesticated or wild\n", "dead = '(' + '|'.join(labels['dead_animal']) + ')'\n", "labels['leather_object'] = ['boots', 'sandals']\n", "leath = '(' + '|'.join(labels['leather_object']) + ')'\n", "labels['precious_object'] = ['copper', 'bronze', 'silver', 'gold']\n", "prec = '(' + '|'.join(labels['precious_object']) + ')'\n", "labels['wool'] = ['wool', '~wool', 'hair']\n", "wool = '(' + '|'.join(labels['wool']) + ')'\n", "complete = []\n", "for lemma_list in labels.values():\n", " complete = complete + lemma_list\n", "tot = '(' + '|'.join(complete) + ')'\n", "# labels['queens_archive'] = []" ] }, { "cell_type": "code", "execution_count": null, "id": "df320e78", "metadata": { "id": "kNHCtQr8XY9v" }, "outputs": [], "source": [ "dom_tabs = set(words_df.loc[words_df['lemma'].str.match('.*\\[.*' + dom + '.*\\]')]['id_text'])\n", "wild_tabs = set(words_df.loc[words_df['lemma'].str.match('.*\\[.*' + wild + '.*\\]')]['id_text'])\n", "dead_tabs = set(words_df.loc[words_df['lemma'].str.match('.*\\[.*' + dead + '.*\\]')]['id_text'])\n", "leath_tabs = set(words_df.loc[words_df['lemma'].str.match('.*\\[.*' + leath + '.*\\]')]['id_text'])\n", "prec_tabs = set(words_df.loc[words_df['lemma'].str.match('.*\\[.*' + prec + '.*\\]')]['id_text'])\n", "wool_tabs = set(words_df.loc[words_df['lemma'].str.match('.*\\[.*' + wool + 
'.*\\]')]['id_text'])" ] }, { "cell_type": "markdown", "id": "113a4957", "metadata": { "id": "eQRfFFosCTQr" }, "source": [ "Each row of the `sparse` table below corresponds to one text, and the columns of the table correspond to the words that appear in the texts. Every cell contains the number of times a specific word appears in a certain text." ] }, { "cell_type": "code", "execution_count": null, "id": "c50679ba", "metadata": { "id": "GHyY6IbaRHYX" }, "outputs": [], "source": [ "# remove lemmas that are a part of a seal as well as words that are being used to determine training classes\n", "filter = (~words_df['label'].str.contains('s')) | words_df['lemma'].str.match('.*\\[.*' + tot + '.*\\]')\n", "sparse = words_df[filter].groupby(by=['id_text', 'lemma']).count()\n", "sparse = sparse['id_word'].unstack('lemma')\n", "sparse = sparse.fillna(0)\n", "\n", "#cleaning\n", "del filter" ] }, { "cell_type": "code", "execution_count": null, "id": "fa52d78e", "metadata": { "id": "4etDIDVgS93i" }, "outputs": [], "source": [ "text_length = sparse.sum(axis=1)" ] }, { "cell_type": "markdown", "id": "919684c2", "metadata": { "id": "B7_coDiFCiHu" }, "source": [ "If a text contains a word that is one of the designated nouns in `labels`, it is added to the set to be used for our ML model. Texts that do not contain any of these words or that contain words corresponding to more than one archive are ignored." 
] }, { "cell_type": "code", "execution_count": null, "id": "cf620de3", "metadata": { "id": "e85qDgdYbejR" }, "outputs": [], "source": [ "class_array = []\n", "\n", "# label a text only if it matches exactly one archive class; otherwise leave it unlabeled (None)\n", "for id_text in sparse.index:\n", "    cat = None\n", "    number = 0\n", "    if id_text in dom_tabs:\n", "        number += 1\n", "        cat = 'dom'\n", "    if id_text in wild_tabs:\n", "        number += 1\n", "        cat = 'wild'\n", "    if id_text in dead_tabs:\n", "        number += 1\n", "        cat = 'dead'\n", "    if id_text in prec_tabs:\n", "        number += 1\n", "        cat = 'prec'\n", "    if id_text in wool_tabs:\n", "        number += 1\n", "        cat = 'wool'\n", "    if number == 1:\n", "        class_array.append(cat)\n", "    else:\n", "        class_array.append(None)\n", "\n", "class_series = pd.Series(class_array, sparse.index)" ] }, { "cell_type": "markdown", "id": "1e443f84", "metadata": { "id": "ExKiUVAmDhB0" }, "source": [ "Next we remove from `sparse` the columns of the keyword lemmas that we used for labeling in the previous cell, so that the labels cannot be read directly off the features." ] }, { "cell_type": "code", "execution_count": null, "id": "d196841f", "metadata": { "id": "Hj8bgltlEdzM" }, "outputs": [], "source": [ "used_cols = []\n", "\n", "for col in sparse.columns:\n", "    if re.match(r'.*\\[.*' + tot + r'.*\\]', col):\n", "        used_cols.append(col)\n", "    #elif re.match('.*PN$', col) is None:\n", "    #    used_cols.append(col)\n", "\n", "sparse = sparse.drop(used_cols, axis=1)" ] }, { "cell_type": "markdown", "id": "53e5f548", "metadata": { "id": "ZFhS09xHCskW" }, "source": [ "Now the `sparse` table will be updated to contain relative frequencies (occurrences per 1,000 lemmas in the text) rather than the raw number of occurrences. This will allow us to better compare frequencies across texts of different lengths." 
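, "\n", "\n", "As a sketch of what the next cell computes (the symbols $c_{t,w}$, $n_t$ and $f_{t,w}$ are our own notation, not names used in the code): if $c_{t,w}$ is the raw count of lemma $w$ in text $t$ and $n_t$ is the value of `text_length` for that text, then\n", "\n", "$$f_{t,w} = 1000 \\cdot \\frac{c_{t,w}}{n_t}$$\n", "\n", "The factor of 1000 means the values are occurrences per thousand lemmas. For example, a lemma that occurs 3 times in a text with $n_t = 60$ gets $f_{t,w} = 1000 \\cdot 3/60 = 50$. Because `text_length` was computed before the keyword columns were dropped, $n_t$ still counts those lemmas."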
] }, { "cell_type": "code", "execution_count": null, "id": "7745e6fb", "metadata": { "id": "ArDjsWRSYqX1" }, "outputs": [], "source": [ "# rescale each lemma count to occurrences per 1,000 lemmas of its text\n", "for col in sparse.columns:\n", "    if col != 'text_length':\n", "        sparse[col] = sparse[col]/text_length*1000" ] }, { "cell_type": "markdown", "id": "3960c041", "metadata": { "id": "Vp0R9mq1DN5F" }, "source": [ "We round the relative frequencies from the previous cell to integers so that they resemble the count data that the multinomial naive Bayes model used below is designed for." ] }, { "cell_type": "code", "execution_count": null, "id": "c2d7b552", "metadata": { "id": "JnzN2SXG5QXJ" }, "outputs": [], "source": [ "sparse = sparse.round()\n", "sparse = sparse.astype(int)" ] }, { "cell_type": "markdown", "id": "2c41d431", "metadata": { "id": "doDVY-klDS4f" }, "source": [ "To form `X`, we reduce the `sparse` table to the texts that received a label in `class_series` and drop any text whose remaining features are all zero. `y` holds the archive label of each of those texts." ] }, { "cell_type": "code", "execution_count": null, "id": "2f34c355", "metadata": { "id": "l8TINhIO9ZuT" }, "outputs": [], "source": [ "X = sparse.loc[class_series.dropna().index]\n", "X = X.drop(X.loc[X.sum(axis=1) == 0, :].index, axis=0)\n", "y = class_series[X.index]" ] }, { "cell_type": "markdown", "id": "140979dc", "metadata": { "id": "LKvePiz_Dus9" }, "source": [ "Our data is split into a training set (60%) and a test set (40%). The ML model first uses the training set to learn how to predict the archives for the texts. Afterwards, the test set is used to verify how well our ML model works." 
] }, { "cell_type": "code", "execution_count": null, "id": "c2211eb3", "metadata": { "id": "o2VLiYkC1JoH" }, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, \n", "                                                    random_state = 9)" ] }, { "cell_type": "code", "execution_count": null, "id": "b6732b73", "metadata": { "id": "NrEWHgdxfyP_" }, "outputs": [], "source": [ "# imports needed by this cell (harmless if they were already imported earlier)\n", "from sklearn.pipeline import Pipeline\n", "from sklearn.feature_selection import SelectPercentile, f_classif\n", "from sklearn.naive_bayes import MultinomialNB\n", "\n", "# keep the top-scoring features by ANOVA F-value, then fit multinomial naive Bayes\n", "pipe = Pipeline([\n", "    ('feature_reduction', SelectPercentile(score_func = f_classif)), \n", "    ('weighted_multi_nb', MultinomialNB())\n", "])" ] }, { "cell_type": "code", "execution_count": null, "id": "ed7a3756", "metadata": { "id": "TwK16rD1obnS" }, "outputs": [], "source": [ "from sklearn.model_selection import GridSearchCV\n", "# search over the percentile of features kept and the smoothing parameter alpha\n", "f = GridSearchCV(pipe, {\n", "    'feature_reduction__percentile' : [i*10 for i in range(1, 10)],\n", "    'weighted_multi_nb__alpha' : [i/10 for i in range(1, 10)]\n", "}, verbose = 0, n_jobs = -1)" ] }, { "cell_type": "code", "execution_count": null, "id": "5bfbfc18", "metadata": { "id": "AxpnC26SiNS9" }, "outputs": [], "source": [ "f.fit(X_train, y_train);" ] }, { "cell_type": "code", "execution_count": null, "id": "3cfe2cfa", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "uShD8bbwkezW", "outputId": "e47c8212-be72-45f4-ebe0-66a217597586" }, "outputs": [ { "data": { "text/plain": [ "{'feature_reduction__percentile': 70, 'weighted_multi_nb__alpha': 0.1}" ] }, "execution_count": 117, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "f.best_params_" ] }, { "cell_type": "markdown", "id": "bebb464e", "metadata": { "id": "-fjlG0-vD5nC" }, "source": [ "Our best mean cross-validated accuracy on the training set is about 93.6%." 
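, "\n", "\n", "More precisely, `best_score_` is a cross-validated score rather than a score on the full training set. As a sketch, assuming `GridSearchCV` is left at its defaults (the call above passes neither `cv` nor `scoring`, so the classifier's plain accuracy is used over $K$ folds, with $K = 5$ in recent scikit-learn releases), the value reported for the winning parameter combination is\n", "\n", "$$\\bar{a} = \\frac{1}{K} \\sum_{k=1}^{K} \\mathrm{acc}_k$$\n", "\n", "where $\\mathrm{acc}_k$ is the accuracy on the $k$-th held-out fold of `X_train` after fitting on the remaining folds. The symbols $\\bar{a}$ and $\\mathrm{acc}_k$ are our own notation."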
] }, { "cell_type": "code", "execution_count": null, "id": "f7cc8bf2", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "VDDiXkBinXcM", "outputId": "98476187-bed2-42a6-9782-1948d59fbfd4" }, "outputs": [ { "data": { "text/plain": [ "0.9359404096834265" ] }, "execution_count": 118, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "f.best_score_" ] }, { "cell_type": "markdown", "id": "881e2374", "metadata": { "id": "OyCwMd9SD_I6" }, "source": [ "The accuracy on the held-out test set, 93.2%, is very close to the cross-validated score above, which suggests that our model is not overfitted to the training set." ] }, { "cell_type": "code", "execution_count": null, "id": "11588239", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "N3hcN7yvSIo-", "outputId": "9dac856d-e097-455b-b020-90aec50ede0e" }, "outputs": [ { "data": { "text/plain": [ "0.9321229050279329" ] }, "execution_count": 119, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "f.score(X_test, y_test)" ] }, { "cell_type": "code", "execution_count": null, "id": "67a80b76", "metadata": { "id": "1AYleoeySIR2" }, "outputs": [], "source": [ "# predict an archive for every text, including the training texts and the texts without a keyword label\n", "predicted = f.predict(sparse)" ] }, { "cell_type": "markdown", "id": "0f2501f9", "metadata": { "id": "HdQk5tjvEJJh" }, "source": [ "The `predicted_df` table is the same as the `sparse` table from above, except that we have added an extra column at the end named `prediction`. `prediction` contains our ML model's classification of which archive each text belongs to, based on the frequencies of the words that appear in it." 
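, "\n", "\n", "As a rough sketch of how those frequencies turn into a prediction (this is the standard multinomial naive Bayes rule, written in our own notation): for a text with feature values $x_w$, the model predicts\n", "\n", "$$\\hat{y} = \\arg\\max_{c} \\Big[ \\log P(c) + \\sum_{w} x_w \\log \\theta_{c,w} \\Big], \\qquad \\theta_{c,w} = \\frac{N_{c,w} + \\alpha}{N_c + \\alpha\\,|V|}$$\n", "\n", "where $P(c)$ is the share of training texts in archive $c$, $N_{c,w}$ is the total of feature $w$ over the training texts of archive $c$, $N_c = \\sum_w N_{c,w}$, $|V|$ is the number of lemmas kept by the feature-selection step, and $\\alpha$ is the smoothing parameter chosen by the grid search ($\\alpha = 0.1$ in the run shown above)."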
] }, { "cell_type": "code", "execution_count": null, "id": "d89cb636", "metadata": { "id": "TKu-QXULG6Gm" }, "outputs": [], "source": [ "predicted_df = sparse.copy()\n", "predicted_df['prediction'] = predicted" ] }, { "cell_type": "code", "execution_count": null, "id": "9fc2c716", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 504 }, "id": "4hNchgn0HFrJ", "outputId": "d2cd5ef7-7d84-49a2-bef1-efefd812dc9d" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "lemma $AN[NA]NA $GIR[NA]NA $HI[NA]NA ... Ṭabši[0]SN Ṭahili[0]PN prediction\n", "id_text ... \n", "P100041 0 0 0 ... 0 0 dom\n", "P100189 0 0 0 ... 0 0 dead\n", "P100190 0 0 0 ... 0 0 dom\n", "P100191 0 0 0 ... 0 0 dead\n", "P100211 0 0 0 ... 0 0 dead\n", "... ... ... ... ... ... ... ...\n", "P519650 0 0 0 ... 0 0 dom\n", "P519658 0 0 0 ... 0 0 wild\n", "P519792 0 0 0 ... 0 0 dom\n", "P519957 0 0 0 ... 0 0 dom\n", "P519959 0 0 0 ... 0 0 dead\n", "\n", "[15132 rows x 9174 columns]" ] }, "execution_count": 124, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "predicted_df" ] }, { "cell_type": "code", "execution_count": null, "id": "a31a9350", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "i7ImpZnL3U5C", "outputId": "6da08322-0707-491a-c4b4-dd067b56a4fc" }, "outputs": [ { "data": { "text/plain": [ "Index(['P100041', 'P100189', 'P100190', 'P100191', 'P100211', 'P100214',\n", " 'P100215', 'P100217', 'P100218', 'P100219',\n", " ...\n", " 'P519534', 'P519613', 'P519623', 'P519624', 'P519647', 'P519650',\n", " 'P519658', 'P519792', 'P519957', 'P519959'],\n", " dtype='object', name='id_text', length=15132)" ] }, "execution_count": 125, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "predicted_df.index" ] }, { "cell_type": "markdown", "id": "6e77a951", "metadata": { "id": "p5U8vZ2iOTlZ" }, "source": [ "### 5.4 Testing the Model on Hand-Classified Data\n", "\n", "Here we first use our same ML model from before on Niek's hand-classified texts from the wool archive. Testing our ML model on these tablets gives us 82.5% accuracy." 
] }, { "cell_type": "code", "execution_count": null, "id": "6d20c460", "metadata": { "id": "iCJi2jZZzZeS" }, "outputs": [], "source": [ "wool_hand_tabs = set(pd.read_csv('drive/MyDrive/SumerianNetworks/JupyterBook/Outputs/wool_pid.txt',header=None)[0])" ] }, { "cell_type": "code", "execution_count": null, "id": "32aebe60", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "5YioGFV12lao", "outputId": "f9ea6d73-5aa9-4f35-ccba-5cfe411b0d48" }, "outputs": [ { "data": { "text/plain": [ "0.8253968253968254" ] }, "execution_count": 126, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "# score only the hand-classified wool tablets that received no keyword label (so none were used for training);\n", "# recent pandas versions do not accept a set as an indexer, hence the list()\n", "hand_wool_frame = sparse.loc[list(wool_hand_tabs)].loc[class_series.isna()]\n", "\n", "f.score(X = hand_wool_frame, \n", "        y = pd.Series(\n", "            index = hand_wool_frame.index, \n", "            data = ['wool'] * hand_wool_frame.shape[0]))" ] }, { "cell_type": "markdown", "id": "fc053bed", "metadata": { "id": "1lgQ_iMjG4Jt" }, "source": [ "Testing our ML model on 100 random hand-classified tablets selected from among all the texts gives us 87.2% accuracy." 
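, "\n", "\n", "For reference, accuracy is the share of tablets whose predicted archive matches the hand-assigned one. The output of `random_frame['result'].array` further down reports a length of 86, so 86 tablets are actually scored, and the score of 0.872 reported below implies 75 correct predictions:\n", "\n", "$$\\mathrm{accuracy} = \\frac{\\mathrm{correct}}{\\mathrm{total}} = \\frac{75}{86} \\approx 0.872$$"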
] }, { "cell_type": "code", "execution_count": null, "id": "4088464f", "metadata": { "id": "I4xccyU3VDuQ" }, "outputs": [], "source": [ "niek_100_random_tabs = pd.read_pickle('/content/drive/MyDrive/niek_cats').dropna()\n", "niek_100_random_tabs = niek_100_random_tabs.set_index('pnum')['category_text']" ] }, { "cell_type": "code", "execution_count": null, "id": "eba4db0a", "metadata": { "id": "5skhfKRDkrCB" }, "outputs": [], "source": [ "# recent pandas versions do not accept a set as an indexer, hence the list(); copy() avoids writing into a view\n", "random_frame = sparse.loc[list(set(niek_100_random_tabs.index))].copy()\n", "random_frame['result'] = niek_100_random_tabs[random_frame.index]" ] }, { "cell_type": "code", "execution_count": null, "id": "64e86851", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "TRl1Pq1BlAhY", "outputId": "a18af6f2-6853-4024-a9e9-095ed998cf5e" }, "outputs": [ { "data": { "text/plain": [ "0.872093023255814" ] }, "execution_count": 188, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "f.score(X=random_frame.drop(labels='result', axis=1), y = random_frame['result'])" ] }, { "cell_type": "markdown", "id": "f932b71f", "metadata": { "id": "viioWwWRG9Wq" }, "source": [ "A large majority of the hand-classified tablets belong to the domesticated-animal archive (`dom`), and the model classifies most of them as such." 
] }, { "cell_type": "code", "execution_count": null, "id": "73d4b784", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "tzDgn3nbl-VH", "outputId": "b86adeed-15ed-4555-c3ee-919c6726acaa" }, "outputs": [ { "data": { "text/plain": [ "\n", "['dead', 'dom', 'dead', 'dom', 'dom', 'dom', 'dead', 'dom', 'dead',\n", " 'dom', 'dead', 'dead', 'dom', 'dom', 'dom', 'dom', 'dom', 'dom',\n", " 'dom', 'dom', 'wild', 'dom', 'dom', 'dom', 'dom', 'dom', 'dom',\n", " 'dead', 'dom', 'dom', 'dom', 'dom', 'dom', 'dom', 'dead', 'dom',\n", " 'dom', 'dom', 'dom', 'dom', 'dom', 'dead', 'dom', 'dom', 'dom',\n", " 'dom', 'dom', 'dom', 'dead', 'dom', 'dom', 'dom', 'dom', 'wild',\n", " 'dom', 'dom', 'dom', 'dom', 'dom', 'dom', 'dead', 'wild', 'dom',\n", " 'dom', 'dom', 'dom', 'wild', 'dom', 'dead', 'dom', 'dead', 'dead',\n", " 'dom', 'dead', 'dead', 'dom', 'dom', 'dom', 'dom', 'dom', 'dom',\n", " 'dom', 'dom', 'dom', 'dom', 'dom']\n", "Length: 86, dtype: object" ] }, "execution_count": 190, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "random_frame['result'].array" ] }, { "cell_type": "code", "execution_count": null, "id": "40dbac07", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "_p29Uif4lpja", "outputId": "195b73b0-ce3d-4ca0-81ac-9102fdb75830" }, "outputs": [ { "data": { "text/plain": [ "array(['dead', 'dom', 'dead', 'dom', 'dom', 'dom', 'dom', 'dom', 'dom',\n", " 'dom', 'dead', 'dead', 'dom', 'dom', 'dom', 'dom', 'dom', 'dom',\n", " 'dom', 'dom', 'wild', 'dom', 'dom', 'dom', 'dom', 'dom', 'dom',\n", " 'dead', 'dom', 'dom', 'dom', 'dom', 'dom', 'dom', 'dead', 'dom',\n", " 'dom', 'dom', 'dom', 'dom', 'dom', 'dead', 'dom', 'dom', 'dom',\n", " 'wild', 'dom', 'dom', 'dom', 'wild', 'dom', 'dom', 'dom', 'dom',\n", " 'dom', 'dom', 'dom', 'dom', 'dom', 'dom', 'dom', 'dead', 'dom',\n", " 'dom', 'dom', 'dom', 'dom', 'dom', 'dead', 'dom', 'dead', 'dom',\n", " 'dom', 'dom', 'dead', 'dom', 'dom', 'dom', 'dom', 'dom', 
'dom',\n", " 'dom', 'dom', 'dom', 'dom', 'dom'], dtype='" ] }, "metadata": { "needs_background": "light", "tags": [] }, "output_type": "display_data" } ], "source": [ "plt.xlabel('Archive Class')\n", "plt.ylabel('Frequency', rotation=0, labelpad=30)\n", "plt.title('Frequencies of Predicted Archive Classes in All Tablets')\n", "plt.xticks(rotation=45)\n", "labels = list(set(predicted_df['prediction']))\n", "counts = [predicted_df.loc[predicted_df['prediction'] == label].shape[0] for label in labels]\n", "plt.bar(labels, counts);" ] }, { "cell_type": "markdown", "id": "5f7604c7", "metadata": { "id": "rG726vEEHQ3U" }, "source": [ "The below chart displays the actual frequencies of the different archives in the test set. As mentioned previously, it is visually obvious that there are many texts in the domestic archive, with comparatively very few texts in all of the other archives." ] }, { "cell_type": "code", "execution_count": null, "id": "85909ed0", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 310 }, "id": "-tiqpepZG7Og", "outputId": "13c2afdc-8886-4e38-949e-8124a355c137" }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAbUAAAElCAYAAABjzHyeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de7xc873/8ddbxKURt5PUIVJRQpsoQVxaTpuiBCWpu1JRKpwfLefQ1qUtWlpK61RdispBqy4tjhStBlVVJReNS1wqJZpESEjEnSY+vz++32HZnb2zUzsz2d95Px+Peew137Vmre+amT3v+X7Xd9ZSRGBmZlaCZZpdATMzs67iUDMzs2I41MzMrBgONTMzK4ZDzczMiuFQMzOzYjjUzLqIpA9JekVSjwZuc0NJkyW9LOkrjdpuI0g6RdLPO5g/RdKwBlaptt07JX2p0du1znGoWcNJmibp9RwAtdtaza7X+xURf4+IlSJiYQM3+zXg9xHROyLOrc7IH/q153ehpDcq909c3A1JukzSaZ1YTpKelPTI4m5jcUTE4Ii4s6vXK2m5HKhPSHo1v1/HSBrQ1duyrudQs2bZLQdA7fZMdaakZZtVsW5mHWBKvRn5Q3+liFgJ+CNwVOX5/u4SrNMngQ8CH5a0RXsL5fBbGj+DfgXsDnweWAXYBJgEbN/MSlnnLI1vKGtRkkLSkZKeAJ7IZZ/N3WsvSrpH0saV5TeVdH/uertG0tW1loSkgyXdXWf96+fp5SWdLenvkp6T9BNJK+Z5wyTNkHSspNmSZkn6YmU9K0r6gaSnJc2XdHcuG5C3sWxebhVJl+bHz5R0Wq1rUtL6kv6QH/+8pGs6eF52z62uF3PX10dz+R3Ap4Hzcutrg8V4rg+R9KikeZJulbROLpekc/J+vyTpIUkbSRoNHAB8LW/r1x2sfhRwI3BLnq5u905Jp0v6E/AaKfgGSxonaW5+LaqtyOUkXZFf4ymShlbWNU3SDpLWyi3/1SvzNs3Pa8+O9rfO87ID8BlgRERMiIgFETE/Is6PiEvrLL+epDskvZC3d6WkVSvzv55f+5clPS5p+1y+paSJ+Tl+TtIPK4/ZOr/XX5T0gCpdrPl9/WRe31OSDujgdWhNEeGbbw29AdOAHeqUBzAOWB1YEdgUmA1sBfQgfUBOA5YHlgOeBv4L6AnsBfwDOC2v62Dg7jrrXz9PnwOMzdvqDfwa+F6eNwxYAHw7r3sX0gfwann++cCdQL9cr0/kOg3I21g2L3cDcBHQi9RyGQ8cnuddBZxE+mK5ArBtO8/VBsCrpA/anqTuxqnAcnn+ncCXOvGcv7McMCKv46PAssA3gHvyvJ1IrZJVAeVl1szzLqs9vx1s5wPAS/k52xN4vlbXSj3+DgzO2+4NzAKOzc9Db2CrvOwpwBt5XT2A7wH31nsfAXcAh1XmnQX8ZFH7W6f+ZwB/WIzncv382iwP9AXuAv4nz9sQmA6sle8PANbL038GvpCnVwK2ztP9gBfyPi+T1/1CXnev/NxumJddExjc7P/npe3W9Ar41nq3/GH0CvBivv1fLg9gu8pyFwLfafPYx4FPkbq4ngFUmXcPnQi1/GH9au0DJs/7OPBUnh4GvE4Op1w2G9g6f9C8DmxSZ78G5G0sC6wBvAmsWJm/P+n4F8AVwMXA2ot4rr4JXFu5vwwwExiW77/zAbuI9VQ/iH8DHNpmna+RujK3A/5a29c267iMRYfagcCc/BysAMwHPtemHt9u85z8pZ11nQLcVrk/CHi9zfuoFmpfAu7I0yKFyScXtb91tnkJcHVnn8s680bW9ie/12YDOwA92yx3F3Aq0KdN+deBn7Upu5X0ha4X6f9lz+r7yrf33tz9aM0yMiJWzbeRlfLplel1gGNzN8yLkl4E+gNr5dvMyP/12dOd3HZfUotiUmW9v83lNS9ExILK/ddI36j7kD6s/7aIbaxDalnNqmzjIlKLDVKLS8D
43K12SDvrWau6XxHxNuk56rfo3eywbj+q1Gturku/iLgDOI/UGp0t6WJJKy/GukeRQnhBRLwBXEebLkje+xr3p+Pn8tnK9GvACqp/vPU64OOS1iR94XmbdBwROtjfOut5gdQC6hRJayh1e8+U9BLwc9J7hIiYChxDCufZebnagKhDSa3wxyRNkPTZSl33bvOe35bUWn4V2Bc4gvS+ulnSRzpb11bhULOlTTWkpgOnV8Jv1Yj4QERcReqy6idJleU/VJl+lRRcAEj698q850mtrcGV9a4SaUDFojxP6hJbbxHLTSe11PpUtrFyRAwGiIhnI+KwiFgLOBy4QPl4XxvPkD7oavshUhDM7ERdO6rb4W2e1xUj4p5ct3MjYnNSy2gD4Kv5cR1e0kPS2qSW3oGSnpX0LKlbeBdJfSqLtn2NP/w+9oVc53nA70gf+p8ntbZq2+lwf9u4Ddgy70tnfJe0Px+LiJVJLdV33pMR8YuI2Jb0GgZwZi5/IiL2J33JORP4laReua4/a1PXXhFxRn7crRHxGVLwPkZqWVqFQ82WZpcAR0jaKg9g6CVpV0m9ScckFgBfkdRT0h7AlpXHPgAMljRE0gqkb8vAO62dS4BzJH0QQFI/STstqkL5sWOAH+YBCj0kfVzS8m2Wm0X6kP2BpJUlLZMHFXwqb2/vygfnPNIH3tt1NnktsKuk7fOgh2NJYVnvA7mzfgKcIGlwrssqkvbO01vk57sn6YvBG5V6PUfHAfQFUtflhsCQfNsAmEHqZqznJmBNSccoDd7pLWmrf3G/fgEcRArSX1TK293ftiLiNtJx3RskbS5p2VynI9ppTfcmdaXPl9SPd78A1H5DuF1+b7xB+iL1dp53oKS++f30Yn7I26SW3m6SdsrvrRWUBi6tnVuFI3L4vZm3W+8909IcarbUioiJwGGk7rB5pIP9B+d5bwF75PtzSd/Qr6889q+kgR63kUZSvmckJOnYxVTg3txtdBvpw7gzjgMeAibkbZ9J/f+lg0gDWh7J9f8V73ZtbQHcJ+kV0oCVoyPiyTrPweOkb/8/JrUSdyP9HOKtTtb1n0TEDbnOV+d9fxjYOc9emRT480jdni+QBl0AXAoMyt1i/1dn1aOAC3Ir9J0bKVTadkHW6vIyaTDEbqSuxidIIzr/FWOBgcCzEfFAJ/e3nr1IIzevIR0TfBgYSnqPtHUqsFle7mYq70HS4JEzSK/bs6RW2Ql53nBgSn79fwTsFxGvR8R00sCWE0nHJqeTgnKZfPtvUut9LunY8n8u6klpNXrvIQmz7kvSZcCMiPhGs+tiZs3hlpqZmRXDoWZmZsVw96OZmRXDLTUzMyuGQ83MzIrhM6E3UZ8+fWLAgAHNroaZWbcyadKk5yOib715DrUmGjBgABMnTmx2NczMuhVJ7Z4Sz92PZmZWDIeamZkVw6FmZmbFcKiZmVkxHGpmZlYMh5qZmRXDoWZmZsVwqJmZWTH842sz61YGHH9zs6vQJaadsWuzq1Akt9TMzKwYDjUzMyuGQ83MzIrhUDMzs2I41MzMrBgONTMzK4ZDzczMiuFQMzOzYjjUzMysGA41MzMrhkPNzMyK4VAzM7NiONTMzKwYDjUzMytGy4eapBUkjZf0gKQpkk7N5etKuk/SVEnXSFouly+f70/N8wdU1nVCLn9c0k7N2SMzs9bV8qEGvAlsFxGbAEOA4ZK2Bs4EzomI9YF5wKF5+UOBebn8nLwckgYB+wGDgeHABZJ6NHRPzMxaXMuHWiSv5Ls98y2A7YBf5fLLgZF5ekS+T56/vSTl8qsj4s2IeAqYCmzZgF0wM7Os5UMNQFIPSZOB2cA44G/AixGxIC8yA+iXp/sB0wHy/PnAv1XL6zymuq3RkiZKmjhnzpwlsTtmZi3LoQZExMKIGAKsTWpdfWQJbuviiBgaEUP79u27pDZjZtaSHGoVEfEi8Hvg48CqkpbNs9YGZubpmUB/gDx/FeCFanmdx5iZWQO0fKhJ6it
p1Ty9IvAZ4FFSuO2VFxsF3Jinx+b75Pl3RETk8v3y6Mh1gYHA+MbshZmZASy76EWKtyZweR6puAxwbUTcJOkR4GpJpwF/AS7Ny18K/EzSVGAuacQjETFF0rXAI8AC4MiIWNjgfTEza2ktH2oR8SCwaZ3yJ6kzejEi3gD2bmddpwOnd3Udzcysc1q++9HMzMrhUDMzs2I41MzMrBgONTMzK4ZDzczMiuFQMzOzYjjUzMysGA41MzMrhkPNzMyK4VAzM7NiONTMzKwYDjUzMyuGQ83MzIrhUDMzs2I41MzMrBgONTMzK4ZDzczMiuFQMzOzYjjUzMysGA41MzMrhkPNzMyK4VAzM7NitHyoSeov6feSHpE0RdLRufwUSTMlTc63XSqPOUHSVEmPS9qpUj48l02VdHwz9sfMrJUt2+wKLAUWAMdGxP2SegOTJI3L886JiLOrC0saBOwHDAbWAm6TtEGefT7wGWAGMEHS2Ih4pCF7YWZmDrWImAXMytMvS3oU6NfBQ0YAV0fEm8BTkqYCW+Z5UyPiSQBJV+dlHWpmZg3S8t2PVZIGAJsC9+WioyQ9KGmMpNVyWT9geuVhM3JZe+VttzFa0kRJE+fMmdPFe2Bm1tocapmklYDrgGMi4iXgQmA9YAipJfeDrthORFwcEUMjYmjfvn27YpVmZpa1fPcjgKSepEC7MiKuB4iI5yrzLwFuyndnAv0rD187l9FBuZmZNUDLt9QkCbgUeDQiflgpX7Oy2OeAh/P0WGA/SctLWhcYCIwHJgADJa0raTnSYJKxjdgHMzNL3FKDbYAvAA9JmpzLTgT2lzQECGAacDhAREyRdC1pAMgC4MiIWAgg6SjgVqAHMCYipjRyR8zMWl3Lh1pE3A2ozqxbOnjM6cDpdcpv6ehxZma2ZLV896OZmZXDoWZmZsVwqJmZWTEcamZmVgyHmpmZFcOhZmZmxXComZlZMRxqZmZWDIeamZkVw6FmZmbFcKiZmVkxHGpmZlYMh5qZmRXDoWZmZsVwqJmZWTEcamZmVgyHmpmZFcOhZmZmxXComZlZMRxqZmZWDIeamZkVw6FmZmbFcKiZmVkxWj7UJPWX9HtJj0iaIunoXL66pHGSnsh/V8vlknSupKmSHpS0WWVdo/LyT0ga1ax9MjNrVS0fasAC4NiIGARsDRwpaRBwPHB7RAwEbs/3AXYGBubbaOBCSCEInAxsBWwJnFwLQjMza4yWD7WImBUR9+fpl4FHgX7ACODyvNjlwMg8PQK4IpJ7gVUlrQnsBIyLiLkRMQ8YBwxv4K6YmbW8lg+1KkkDgE2B+4A1ImJWnvUssEae7gdMrzxsRi5rr7ztNkZLmihp4pw5c7q0/mZmrc6hlklaCbgOOCYiXqrOi4gAoiu2ExEXR8TQiBjat2/frlilmZllDjVAUk9SoF0ZEdfn4udytyL57+xcPhPoX3n42rmsvXIzM2uQlg81SQIuBR6NiB9WZo0FaiMYRwE3VsoPyqMgtwbm527KW4EdJa2WB4jsmMvMzKxBlm12BZYC2wBfAB6SNDmXnQicAVwr6VDgaWCfPO8WYBdgKvAa8EWAiJgr6TvAhLzctyNibmN2wczMwKFGRNwNqJ3Z29dZPoAj21nXGGBM19XOzMwWR8t3P5qZWTkcamZmVoxFdj9KWgg8VCkaGRHTlliNzMzM/kWdOab2ekQMqTcjjxxURLzdtdUyMzNbfIvd/ShpgKTHJV0BPAz0l/RVSRPyCX5PrSx7kqS/Srpb0lWSjsvld0oamqf7SJqWp3tIOquyrsNz+bD8mF9JekzSlTlQkbSFpHskPSBpvKTeku6SNKRSj7slbfI+niczM+sGOtNSW7Ey1P0p4L9IJ/MdFRH3Stox39+SNIpwrKRPAq8C+wFD8nbuByYtYluHkn73tYWk5YE/SfpdnrcpMBh4BvgTsI2k8cA1wL4RMUHSysDrpN+dHQwcI2kDYIWIeKAT+2pmZt3YYnc/5vMjPp1P5gv
pR8Y7An/J91cihVxv4IaIeC0/bmwntrUjsLGkvfL9VfK63gLGR8SMvK7JwABgPjArIiYA1E5vJemXwDclfRU4BLisE9s2M7Nu7l/9ndqrlWkB34uIi6oLSDqmg8cv4N2uzxXarOvLEfGeM3FIGga8WSlaSAd1j4jXJI0jnVF/H2DzDupiZmaF6Ioh/bcCh+QTAiOpn6QPAncBIyWtKKk3sFvlMdN4N2j2arOu/8znYkTSBpJ6dbDtx4E1JW2Rl+8tqRZ2PwXOBSbkS8GYmVnh3vcZRSLid5I+Cvw5j914BTgwIu6XdA3wAOlkwBMqDzubdAqq0cDNlfKfkroV788DQebw7nXM6m37LUn7Aj+WtCLpeNoOwCsRMUnSS8D/vt99NDOz7kHprE8N2JB0Cilszm7Q9tYC7gQ+srT+5GDo0KExceLEZlfDrFsZcPzNi16oG5h2xq7NrkK3JWlSRAytN6/IM4pIOoh0oc+TltZAMzOzrtewExpHxCkN3NYVwBWN2p6ZmS0dimypmZlZa3KomZlZMRxqZmZWDIeamZkVw6FmZmbFcKiZmVkxHGpmZlYMh5qZmRXDoWZmZsVo+VCTNEbSbEkPV8pOkTRT0uR826Uy7wRJU/PVv3eqlA/PZVMlHd/o/TAzM4capAuIDq9Tfk5EDMm3WwAkDSJdzXtwfswFknpI6gGcD+wMDAL2z8uamVkDNezcj0uriLgrX827M0YAV0fEm8BTkqYCW+Z5UyPiSQBJV+dlH+ni6pqZWQfcUmvfUZIezN2Tq+WyfsD0yjIzcll75f9E0mhJEyVNnDNnzpKot5lZy3Ko1XchsB4wBJgF/KCrVhwRF0fE0IgY2rdv365arZmZ4e7HuiLiudq0pEuAm/LdmUD/yqJr5zI6KDczswZxS60OSWtW7n4OqI2MHAvsJ2l5SesCA4HxwARgoKR1JS1HGkwytpF1NjMzt9SQdBUwDOgjaQZwMjBM0hAggGnA4QARMUXStaQBIAuAIyNiYV7PUcCtQA9gTERMafCumJm1vJYPtYjYv07xpR0sfzpwep3yW4BburBqZma2mNz9aGZmxXComZlZMRxqZmZWDIeamZkVw6FmZmbFcKiZmVkxHGpmZlYMh5qZmRXDoWZmZsVwqJmZWTEcamZmVgyHmpmZFcOhZmZmxXComZlZMRxqZmZWDIeamZkVw6FmZmbFcKiZmVkxHGpmZlYMh5qZmRXDoWZmZsVwqJmZWTEcamZmVoyWDzVJYyTNlvRwpWx1SeMkPZH/rpbLJelcSVMlPShps8pjRuXln5A0qhn7YmbW6lo+1IDLgOFtyo4Hbo+IgcDt+T7AzsDAfBsNXAgpBIGTga2ALYGTa0FoZmaN0/KhFhF3AXPbFI8ALs/TlwMjK+VXRHIvsKqkNYGdgHERMTci5gHj+OegNDOzJazlQ60da0TErDz9LLBGnu4HTK8sNyOXtVf+TySNljRR0sQ5c+Z0ba3NzFqcQ20RIiKA6ML1XRwRQyNiaN++fbtqtWZmhkOtPc/lbkXy39m5fCbQv7Lc2rmsvXIzM2sgh1p9Y4HaCMZRwI2V8oPyKMitgfm5m/JWYEdJq+UBIjvmMjMza6Blm12BZpN0FTAM6CNpBmkU4xnAtZIOBZ4G9smL3wLsAkwFXgO+CBARcyV9B5iQl/t2RLQdfGJmZktYy4daROzfzqzt6ywbwJHtrGcMMKYLq2ZmZovJ3Y9mZlYMh5qZmRXDoWZmZsVwqJmZWTEcamZmVgyHmpmZFcOhZmZmxXComZlZMRxqZmZWDIeamZkVw6FmZmbFcKiZmVkxHGpmZlYMh5qZmRXDoWZmZsVwqJmZWTEcamZmVgyHmpmZFcOhZmZmxXComZlZMRxqZmZWDIeamZkVw6HWAUnTJD0kabKkiblsdUnjJD2R/66WyyXpXElTJT0oabPm1t7MrPU41Bbt0xExJCKG5vvHA7dHxEDg9nwfYGdgYL6NBi5seE3NzFqcQ23xjQA
uz9OXAyMr5VdEci+wqqQ1m1FBM7NW5VDrWAC/kzRJ0uhctkZEzMrTzwJr5Ol+wPTKY2fkMjMza5Blm12Bpdy2ETFT0geBcZIeq86MiJAUi7PCHI6jAT70oQ91XU3NzMwttY5ExMz8dzZwA7Al8FytWzH/nZ0Xnwn0rzx87VzWdp0XR8TQiBjat2/fJVl9M7OW41Brh6ReknrXpoEdgYeBscCovNgo4MY8PRY4KI+C3BqYX+mmNDOzBnD3Y/vWAG6QBOl5+kVE/FbSBOBaSYcCTwP75OVvAXYBpgKvAV9sfJXNzFqbQ60dEfEksEmd8heA7euUB3BkA6pmZmbtcPejmZkVw6FmZmbFcKiZmVkxHGpmZlYMh5qZmRXDoWZmZsVwqJmZWTEcamZmVgyHmpmZFcOhZmZmxXComZlZMRxqZmZWDIeamZkVw6FmZmbFcKiZmVkxHGpmZlYMh5qZmRXDV74262YGHH9zs6vQZaadsWuzq2CFcUvNzMyK4VAzM7NiONTMzKwYDjUzMyuGB4p0U60+WKDV99/M6nNLrYtJGi7pcUlTJR3f7PqYmbUSh1oXktQDOB/YGRgE7C9pUHNrZWbWOtz92LW2BKZGxJMAkq4GRgCPNLVWZlYEd7svmiJiiay4FUnaCxgeEV/K978AbBURR1WWGQ2Mznc3BB5veEUXTx/g+WZXoklaed+htfff+750Wyci+tab4ZZag0XExcDFza5HZ0maGBFDm12PZmjlfYfW3n/ve/fddx9T61ozgf6V+2vnMjMzawCHWteaAAyUtK6k5YD9gLFNrpOZWctw92MXiogFko4CbgV6AGMiYkqTq/V+dZuu0iWglfcdWnv/ve/dlAeKmJlZMdz9aGZmxXComZlZMRxqZotBkppdBzNrn0PN/iWt9uEuaVtJW0aLHYSWtJakDza7HtYcktZodh0Wl0PNFoukzSR9oNU+3IHNgBslbQqtEer5A+0s4HPd8cPt/ar3GrfC6w5pPyX1AcZL+nyz67M4HGrWaZKGA78EPtbsujSKpGUAIuJc4GrgZ7UWW8kfcJIUEc8BPwOGArtIWqXJ1Wqo2hc3SZ+VdLiklVrly1wkzwNHAadI2rvZdeosh5p1iqQPAWcAB0fEfc2uT6NExNsAko4EVgaeBX4raatSgy0HWu3De3VgfeB7wH6t0BVZfU0lHQJ8F9gJuEnSpiW+5lW1/ZO0TET8GjgG+L6kfZtbs85xqFmHKv/AIl2B4I+5fLn8t2ez6tYokjYifWM9NSJ2AL4B3CBp6xK/uVdaKHsAxwF7AacBnwR2K7nFVg30vJ8CdoyIPYA/At8CNik12Np8oVlfUp+IuAX4AnBGdwg2h5otSq/89xlgLUnHAkTEW5K2B35Y66IrRZ0PrFnAROAtST0j4gLg18CdkjZueAWXEEmfknRWpWhtYEJEvBAR55H2+URgVIkttjaBdizwJ+BrwJcBIuKbwEPADym0C76y//8NXAT8UtL3SOew3R/4Tr76yFKrqA8j61qSdgKulPQN4ADgK8B2ks7Pl9n5AXBbrYuuBG2/qecLv74ErAQcBNT29Q/AbXleKf4KfEnSGfn+fcAHaoNjIuJq0rUBPwa80ZwqLjmV1/0TwBbAvqQQ/0g+/R0R8S3gDmBus+q5pEnaHDgQ2JW0/38nhftE4L+BYyX1bl4NO+ZzP1pdkrYlfSM9hNT1tjFpkMiRpDf6xsAJEfGbNl0W3Vrlg+1I0hXMHyGF1+HAdcB6uWU6FBgREX9vVl27iqRPA70jYqykjwATJS2MiJMk7QPsI2kr4GXgA8BpEVFSmAPvtNA/Rmqh3BcRUyQ9CbwCHCZpxYg4KyJOa2pFu1jt/7fyf7wK8EJEvAb8WdI80jHFT0TETZLujIhXmlrpDjjU7J/kb2F9SS2zZYCPAntGxKt5OP/oyrLFBFqN0oVc9wEOA84Edsx/dwI+DawHnF1CoGWvAE9KWjsiZkjakhRsrwBfJR1P+TSwKnB
0RDzdxLp2qer7N/99MHfBHiZpm4j4k6TfA8uTwn014MVS3vN5MEit96E3qefhXmCepKMi4ryIeEzSs8BA4C7gtSZVt1N8QmN7D0k7Ap8AppJGOz4PbBcRc3N35DbAmRHxahOrucTkQD8AuIZ3u2B+DJwMXBoRFzWxel2u8i19JHA5cGRE/FzSmqTupp9ExHfysr0Kft0PIH1ozwZ+TnrdDyUNDvpjHhC1XMH7Pxr4DGn/7yV1L38C+HdSd+txwPCIeKpplewkH1Ozd0jaDNgduD0ifg78HzAZCEn/QTqGNr6kf+y2g0Ii4uWI+Alp+P7OwH4RcTMwB9hd0uoljHyr7UMOtJWBIaTu5W9J+mJEzCJ1sZ4g6Tv5YUv1N/TFoXSmlBXz9JdJg0HmARuSLh11K3AZcLakj0fEP0p631dJ2o10rOwM0vGzDUm9M+fl+2sAe3SHQAN3P1qWP+R+CvyDNKJRwKXAnsBvgReBE3OfejFdjpVjaEcB6wKrkf65nwOWA/5d0mdJ+390RBQxQKCy34Mi4hFJM4BpwFXAJZKIiP+VtC4p4CnlNZfUDzgeeFjSFcAA0mt7X55/IvD9iPhSHtZf7NXrJW1Det//KCImSXoM2I70f/9sRHy9u/2/u6VmtUEhw0ndbKsDu+UzCvwlIr5BOqa0Zx5I0K3e4O3J39Q/kKePBEYC5wObAF+OiPnAeFLAfZX0Ifd8s+q7JEj6OPAbSYeTfoP1n6Qg3x04R9IBEfFcRDzRzHouAc8Ak0jdjQcAg4FPVebfRP5sjIjzCzp22vaH5SuT9n8r4GBJG0fEq/kH12sBG0D3+zLjllqLy8OXLwHuB2aQutlOkvR2RPwYIH/Ak6e71Ru8njbf1McAK5J+g3MQ6Ywhx+UD6CfmLqrlqs9BCZR+PD+d1Ao5nPRbvHtIXcyfIh1fmde0Ci4hlWOIywCD8u1+4MuS5kbET0kjIAdIWhWYX8J7vqZOC30SqaU2HvgvSVeRRriuQnpPdDseKNLC8ii3M0lD8++VtD556C5ptNslEXFyM+u4JORvq6OAjUhD9rcD+pP+iQ+MiAX5OMs/gItK+lCDd1pow4FrgdeB/wGuB3qShrOfGhGnNq+GS/KMDW0AAAV2SURBVFYeFHIc8EXSYJDnSSM79yS10j4J7BsRU5pWySUov/5Xk07/dQfpMMOfSAODjiCNgPx2RDzQtEq+D+5+bG2rkP6Bt8v3nya11v5GGuU4rkn1WmIq3ae1b+p7k/Z3MHBXDrSDSV1xt5cWaNn0fLscGAbcDLwUEZeQfsbw8+ZVrSE2BH4REZOBY4H5pO6284DvA8MKDrS2LfRBpB/Z7wo8QDrU8PnuGmjgUGtpETEO2AM4RNL+EfEP0oCIzwJzI+LuEkb6VeWupwNIo91OJAXaQuAK4BhJF5I+2Pcq8FgSABExI3ezHUo6r+OBpG/tRMSlEfG3ZtavAe4HtpE0OCLeioj/AT4M/BvpfV/UsdOa3EI7ifRl9kDSF9jVgCdIvRafB5aJiDebVsku4GNqLS4ibpT0Nul0WHuSTgN1Su0YUqEtlXe+qSud4+7/kf6pLyK1XhZExIvNrGAjRMQDuVW6PXC0pAERMa25tWqIO0mnwfq8pDtIx1TnA+dGgWdKqai20C/g3Rb69ZIWAndGxMJmVrAr+JiaASBpd+DbwJURcVb1d0zNrVnXyz80Phg4qdbNJGkC8HsKPQXUoiidqPkfza5Ho0hai9RLsQewADguIh5sbq0aQ9ImpEsJ9Qb6RsRHmlylLuVQs3fks4mMAb4SEdc3uz5LSh7V9tV8t/ZN/RjgoIh4pmkVs4aT1Iv0ObjUnstwSVC6ysL2wNGkEwxMa26Nuo5Dzd5D0meAv0XEk82uy5LUyt/UzWpKbKE71Kylteo3dbNSOdTMzKwYHtJvZmbFcKiZmVkxHGpmZlYMh5qZmRXDoWa2lJI0UlJIWuwfx0qaJqlPnfLdJR3fRfXbWdJESY9I+oukH+T
yUyQd1xXbMFtcDjWzpdf+wN357z+RtNinuYuIsRFxxvutmKSNSCcAPjAiBpGukj31/a7X7P1yqJkthSStBGxLOunwfpXyYZL+KGks8IikHpLOlvSwpAfzJXNqvizpfkkP1Vp7kg6WdJ6kVSQ9na8rhqRekqZL6ilpPUm/lTQpb6teS/FrwOkR8RhARCyMiAvr7MdhkiZIekDSdZULs+6d6/yApLty2WBJ4yVNzvsysEueTGspDjWzpdMI4LcR8VfgBUmbV+ZtBhwdERsAo4EBwJCI2Bi4srLc8xGxGXAh6fph78gnrJ7Mu1d8/ixwaz67xMWkq39vnh93QZ36bUS6evSiXB8RW0TEJsCjpJAG+BawUy7fPZcdAfwoIoaQWn4zOrF+s/dwqJktnfYnXciR/LfaBTk+Ip7K0zuQLmS6ACAi5laWq52/cxIp+Nq6Btg3T+8HXJNbiJ8AfilpMunKBWu+j/3YKLf2HgIOIF23DtJFKS+TdBjQI5f9GThR0teBdSLi9fexXWtRvvSM2VJG0uqkC7d+TFKQPvRDUu0kzK92clW162ItpP7/+ljgu3l7m5NO7twLeDG3ljoyJT9mUReTvAwYWbnMzTCAiDhC0laki1NOkrR5RPxCUu2ClbdIOjwi7ljE+s3ewy01s6XPXsDPImKdiBgQEf2Bp4D/qLPsOODw2qCRHFCdks93OQH4EXBTPi72EvCUpL3z+pQvVdLWWaRW1QZ5uWUkHVFnud7ALEk9SS018vLrRcR9EfEtYA7QX9KHgScj4lzgRmDjzu6LWY1DzWzpsz9wQ5uy66g/CvKnwN+BByU9QLp68eK4hnQV5GsqZQcAh+b1TSEd33uPfEWDY4CrJD0KPEy6enRb3wTuI3U3PlYpPysPYHkYuIfU4tsHeDh3e25Euhq52WLxCY3NzKwYbqmZmVkxHGpmZlYMh5qZmRXDoWZmZsVwqJmZWTEcamZmVgyHmpmZFcOhZmZmxfj/g3lPj2JMgb8AAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light", "tags": [] }, "output_type": "display_data" } ], "source": [ "plt.xlabel('Archive Class')\n", "plt.ylabel('Frequency', rotation=0, labelpad=30)\n", "plt.title('Frequencies of Test Archive Classes')\n", "plt.xticks(rotation=45)\n", "test_counts = [(class_series[X_test.index])[class_series == label].count() for label in labels]\n", "plt.bar(labels, np.asarray(test_counts));" ] }, { "cell_type": "markdown", "id": "8d0f0054", "metadata": { "id": "kYdnsxgYHW8a" }, "source": [ "Below is a chart of the predicted frequency of the different archives by our ML model in the test set. Our predicted frequency looks very similar to the actual frequency above, which is good." ] }, { "cell_type": "code", "execution_count": null, "id": "56ad4019", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 310 }, "id": "v6b1AStxRIW7", "outputId": "59bc33f3-94cd-47b7-8f83-ef8a76fb7d03" }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAbUAAAElCAYAAABjzHyeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3debxd093H8c9XxBRBVKoyVDwERQliaGmbUgQlqVmDGCr0iaKlrdKWtjzV0nqqhqJS1Kyl8qA0piotGTRBDHURkggSScxF4vf8sdZhu869ube595zcfb7v1+u8zj5rT2vts8/+7bX2OnsrIjAzMyuDpeqdATMzs47ioGZmZqXhoGZmZqXhoGZmZqXhoGZmZqXhoGZmZqXhoGalI+mTkl6X1K2G61xP0mRJr0k6ulbrzes+RdLlebhmZZc0TdKXOns9tSbpYEn3tjL+z5JG1jJPeb2XSDq11uvtahzUGlg+KL2VD4KVV59652txRcRzEbFiRCys4Wq/A9wVET0j4uzmIyXdLenfeRvPkXS9pDU6OhNtLbukIZJmdPT687L/XNif3pX0TuHzb/6D5b0ftNsw7d2S5klatv05b5uI2DkiLu3o5So5WtIjkt6QNEPSdZI+3dHrKjMHNdstHwQrr+eLIyUtXa+MdTFrAlMXMc1REbEisC6wCnBW8wnKsL3zQX/FXNYrgJ8X9q8jO2u9kgYAnwMC2H0R09asFt8OvwKOAY4GViXtJ38Cdq1nproaBzX7CEkhabSkJ4Enc9qXc/PafEl/l7RxYfpNJT2Ym96ukXR1pZmkWlNOXv46eXhZSWdKek7Si5J+I2n5PG5IPls9TtJLkmZJOqSwnOUl/ULSs5JekXRvThuQ17F0nm5lSRfn+WdKOrVyUJO0jqS/5vnnSLqmle2yu6SpeRvcLelTOf1O4IvAObk2sm5r2zci5gJ/BDbK80+T9F1JDwFvSFpa0tZ5O8+XNEXSkEI+1sp5fk3SO
GC1wrjmZV9V0u8kPZ9rMH+S1AP4M9CnWEOXtJSkEyQ9JellSddKWrWw7APztn5Z0kmtlbGVbdjafvTd/P28JukJSdtLGgqcCOyb8zmllcUfBNwPXAJ8qHlQqenufEm3SHoD+KKk/ko15tm5TOc0m+fMvM2ekbRzIf1uSV/L++58SRsVxvVWav34+KLK22xdA4HRwP4RcWdEvB0Rb0bEFRFxepXpe0m6Ked9Xh7uVxh/sKSn87Z8RtKInN7i/i5pfUnjJM3N23+fwrhdJD2alzdT0vGtfA/1FRF+NegLmAZ8qUp6AONIZ4vLA5sCLwFbAd1IB4xpwLLAMsCzwDeB7sBewLvAqXlZBwP3Vln+Onn4LGBsXldP4P+An+ZxQ4AFwI/zsncB3gR65fHnAncDfXO+PpvzNCCvY+k83Q3ABUAP4OPAeOCIPO4q4CTSCd5ywLYtbKt1gTeAHXJevgM0Acvk8XcDX2tlW78/nhSE7gR+X/geJgP98/buC7ycy7tUXufLQO88/T+AX+ayfh54Dbg8j2te9puBa4BeOd9fKGzbGc3yeAwpKPTLy74AuCqP2wB4Pa9v2bz+BVTZf5ot85LCvtDafrQeMB3oUyjH2nn4lEr5FrGuJuC/gc1J++DqzfLxCrBN3qY9gCmk/a9H8bsn7bPvAofnfH4deB5Qle9yDHBaYT2jgVsXVd4qeT8SeLYd2/JjwJ7ACqTfzXXAn/K4HsCrwHr58xrAhq3t73me6cAhwNI573OADfL4WcDn8nAvYLN6H79a3E71zoBfdfzy0w/sdWB+flV+FAFsV5jufOAnzeZ9AvgC6SD3/g8+j/s7bQhqgEiBYu3CuM8Az+ThIcBb5AN0TnsJ2Dr/KN8CNqlSrgF5HUsDqwNvA8sXxu9Puv4FcBlwIdBvEdvqB8C1hc9LATOBIfnz+we6Fua/mxSQ5+f5ruCDIDUNOLQw7XfJAa+QdhvpoPhJUjDpURh3JVWCWj6YvUc+CWi2vCF8NKg9Bmxf+LwG6eC+NPBD4OrCuB7AO7QvqLW2H62Tv9svAd2bTXMKiwhqwLY5r6vlz48D32yWj8ua7Wezi/tWYdzBQFPh8wp5m36i+Xed8/tUYdr7gIMWVd4q6zwJuL+t27LKuEHAvMJ3M58U9JZvNl3V/R3YF/hbs7QLgJPz8HPAEcBKreVxSXi5+dGGR8Qq+TW8kD69MLwmcFxuQpkvaT6pVtEnv2ZG3vOzZ9u47t6kA8akwnJvzekVL0fEgsLnN4EVSbWd5YCnFrGONUk1lFmFdVxAqrFBqnEJGJ+bFg9tYTl9iuWKiPdI26jvoov5vqPzdu4bESMiYnZhXPPtvXez7b0tKcj0IR283ihM39L27g/MjYh5bczfmsANhXU+BiwknRj0KeYxr//lNi63uPyq+1FENAHHkgLYS0pN2O3ptDQS+EtEzMmfr6RZEyQf3sb9STWjBVT3QmUgIt7MgytWme4uYAVJWyld0xtEahmA1n83zb1M+n7bRNIKki7IzcGvAvcAq0jqlr+bfUm1v1mSbpa0fp61pf19TWCrZnkdAXwij9+T1HLwbG6+/Exb81prXf6itHWaYpCaTmpiOa35RJK+APSVpEJg+yQfBJs3SIGrMv0nCrPPIdW2NoyIme3M3xzg38DapGaklkwn1dRWq3YAi4gXSM1MSNoWuF3SPfkgW/Q88H4vNEkiHaDam++WNN/ev4+Iw5tPJGlNoJekHoXA9slm8xeXs6qkVSJifivrK05/aETcV2W9s4BPFT6vQGoCa48W9yOAiLgSuFLSSqQTj58BB7aQ12Lelgf2AbpJqgSjZUkH+U0iorJ/NN/Gn5S0dCuBbZEiYqGka0m1/xeBmyLitcI6WixvM3cA50oaHBET2zD9caQm260i4gVJg4B/kgIWEXEbcFveNqcCF5GaD6vu7zmvf42IHVoo5wRgmKTuwFHAtaT9f4njmpq1xUXAkflsVJJ6SNpVUk/S9Z0FwNGSukvaA
9iyMO8UYENJgyQtRzoTB96v7VwEnFW4sN5X0k6LylCedwzwS6VODt0kfUbNunJHxCzgL8AvJK2k1Bli7RyMkbR34QL7PNKB770qq7wW2FWp80J30kHlbVJTa0e7HNhN0k65XMspdZrpFxHPAhOBH0laJh+Ydqu2kFz2PwPn5Y4F3SV9Po9+EfiYpJULs/wGOC0Hzkqnh2F53B+AL0vaVtIypOuc7T1+tLgfKf3Pb7v8/f2bdLJT+R5eBAZIaml9w0k1yg1INaVBpAD8N1LnkWrGk64TnZ7zsZykbdpZnoorSTWjEXl4keVtvoCIeBI4D7gqf9fL5DztJ+mEKuvsSdpG85U685xcGSFpdUnDlDoEvU26xPBeHtfS/n4TsK5SZ6Du+bWFpE/lvIyQtHJEvEu6XlftN7JEcFCzRcpnjocD55B+CE2k6w5ExDvAHvnzXNKP+/rCvP8iHQBvJ/WkbP6n1u/m5d2fm1FuJ52BtsXxwMPAhLzun1F9nz6I1KHl0Zz/P/BBU88WwAOSXid1WDkmIp6usg2eAA4Afk2qJe5G+jvEO23Ma5tFxHRgGKnX32zSWfS3+aBsXyV1PphLOphd1sriDiRda3qcdM3q2LyOx0mdBp7OzU19SF3KxwJ/kfQaqdPIVnn6qaROEFeSgsE8oF3/c2ttPyLVrE4nbdsXSM3D38vjrsvvL0t6sMqiRwK/i/QfvRcqr7yeEaryN4lI/+PbjXQt77lcln3bU57Csh4gtUj0IZ1EtKW81Rydpz2XdE3sKeArpM5Tzf0vqVPRHNL3dGth3FLAt0itC3NJ1yy/nsdV3d9z7XJHYL883wuk31PlJPFAYFr+jR5JCuBLpEpvHrMOI+kSUieE79c7L2bWWFxTMzOz0nBQMzOz0nDzo5mZlYZramZmVhoOamZmVhr+83UdrbbaajFgwIB6Z8PMrEuZNGnSnIjoXW2cg1odDRgwgIkT23LzADMzq5DU4q343PxoZmal4aBmZmal4aBmZmal4aBmZmal4aBmZmal4aBmZmal4aBmZmal4aBmZmal4T9fm1mXMuCEm+udhQ4x7fRd652FUnJNzczMSsNBzczMSsNBzczMSsNBzczMSsNBzczMSqPhg5qk5SSNlzRF0lRJP8rpa0l6QFKTpGskLZPTl82fm/L4AYVlfS+nPyFpp/qUyMyscTV8UAPeBraLiE2AQcBQSVsDPwPOioh1gHnAYXn6w4B5Of2sPB2SNgD2AzYEhgLnSepW05KYmTW4hg9qkbyeP3bPrwC2A/6Q0y8FhufhYfkzefz2kpTTr46ItyPiGaAJ2LIGRTAzs6zhgxqApG6SJgMvAeOAp4D5EbEgTzID6JuH+wLTAfL4V4CPFdOrzGNmZjXgoAZExMKIGAT0I9Wu1u+sdUkaJWmipImzZ8/urNWYmTUkB7WCiJgP3AV8BlhFUuU2Yv2AmXl4JtAfII9fGXi5mF5lnuI6LoyIwRExuHfv3p1SDjOzRtXwQU1Sb0mr5OHlgR2Ax0jBba882Ujgxjw8Nn8mj78zIiKn75d7R64FDATG16YUZmYGvqExwBrApbmn4lLAtRFxk6RHgaslnQr8E7g4T38x8HtJTcBcUo9HImKqpGuBR4EFwOiIWFjjspiZNbSGD2oR8RCwaZX0p6nSezEi/g3s3cKyTgNO6+g8mplZ2zR886OZmZWHg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZWGg5qZmZVGwwc1Sf0l3SXpUUlTJR2T00+RNFPS5PzapTDP9yQ1SXpC0k6F9KE5rUnSCfUoj5lZI1u63hlYAiwAjouIByX1BCZJGpfHnRURZxYnlrQBsB+wIdAHuF3Sunn0u
cAOwAxggqSxEfFoTUphZmYOahExC5iVh1+T9BjQt5VZhgFXR8TbwDOSmoAt87imiHgaQNLVeVoHNTOzGmn45sciSQOATYEHctJRkh6SNEZSr5zWF5hemG1GTmsp3czMasRBLZO0IvBH4NiIeBU4H1gbGESqyf2ig9YzStJESRNnz57dEYs0M7PMQQ2Q1J0U0K6IiOsBIuLFiFgYEe8BF/FBE+NMoH9h9n45raX0D4mICyNicEQM7t27d8cXxsysgTV8UJMk4GLgsYj4ZSF9jcJkXwEeycNjgf0kLStpLWAgMB6YAAyUtJakZUidScbWogxmZpY0fEcRYBvgQOBhSZNz2onA/pIGAQFMA44AiIipkq4ldQBZAIyOiIUAko4CbgO6AWMiYmotC2Jm1ugaPqhFxL2Aqoy6pZV5TgNOq5J+S2vzmZlZ52r45kczMysPBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMysNBzUzMyuNhg9qkvpLukvSo5KmSjomp68qaZykJ/N7r5wuSWdLapL0kKTNCssamad/UtLIepXJzKxRNXxQAxYAx0XEBsDWwGhJGwAnAHdExEDgjvwZYGdgYH6NAs6HFASBk4GtgC2BkyuB0MzMaqPhg1pEzIqIB/Pwa8BjQF9gGHBpnuxSYHgeHgZcFsn9wCqS1gB2AsZFxNyImAeMA4bWsChmZg2v4YNakaQBwKbAA8DqETErj3oBWD0P9wWmF2abkdNaSjczsxpxUMskrQj8ETg2Il4tjouIAKKD1jNK0kRJE2fPnt0RizQzs8xBDZDUnRTQroiI63Pyi7lZkfz+Uk6fCfQvzN4vp7WU/iERcWFEDI6Iwb179+7YgpiZNbiGD2qSBFwMPBYRvyyMGgtUejCOBG4spB+Ue0FuDbySmylvA3aU1Ct3ENkxp5mZWY0sXe8MLAG2AQ4EHpY0OaedCJwOXCvpMOBZYJ887hZgF6AJeBM4BCAi5kr6CTAhT/fjiJhbmyKYmRk4qBER9wJqYfT2VaYPYHQLyxoDjOm43JmZWXs0fPOjmZmVh4OamZmVhoOamZmVhoOamZmVhoOamZmVhoOamZmVhoOamZmVhoOamZmVxiL/fC1pIfBwIWl4REzrtByZmZn9h9pyR5G3ImJQtRH5vomKiPc6NltmZmbt1+7mR0kDJD0h6TLgEaC/pG9LmiDpIUk/Kkx7kqR/SbpX0lWSjs/pd0sanIdXkzQtD3eTdEZhWUfk9CF5nj9IelzSFTmgImkLSX+XNEXSeEk9Jd0jaVAhH/dK2mQxtpOZmXUBbampLV+40e8zwDeBgcDIiLhf0o7585akeyiOlfR54A1gP2BQXs+DwKRFrOsw0l3vt5C0LHCfpL/kcZsCGwLPA/cB20gaD1wD7BsREyStBLxFuuv+wcCxktYFlouIKW0oq5mZdWHtbn7MT4d+NiLuz0k75tc/8+cVSUGuJ3BDRLyZ5xvbhnXtCGwsaa/8eeW8rHeA8RExIy9rMjAAeAWYFRETACoP95R0HfADSd8GDgUuacO6zcysi/tP79L/RmFYwE8j4oLiBJKObWX+BXzQ9Llcs2V9IyI+9BwySUOAtwtJC2kl7xHxpqRxwDDSI2M2byUvZmZWEh3Rpf824FBJKwJI6ivp48A9wHBJy0vqCexWmGcaHwSavZot6+v5SdRIWldSj1bW/QSwhqQt8vQ9JVWC3W+Bs4EJETFvsUpoZmZdwmI/Ty0i/iLpU8A/ct+N14EDIuJBSdcAU4CX+ODhmQBnkh7AOQq4uZD+W1Kz4oO5I8hsYHgr635H0r7AryUtT7qe9iXg9YiYJOlV4HeLW0YzM+salJ55WYMVSaeQgs2ZNVpfH
+BuYP0l9S8HgwcPjokTJ9Y7G2ZdyoATbl70RF3AtNN3rXcWuixJkyJicLVxpbyjiKSDgAeAk5bUgGZmZh1vsZsf2yoiTqnhui4DLqvV+szMbMlQypqamZk1Jgc1MzMrDQc1MzMrDQc1MzMrDQc1MzMrjYYPapLGSHpJ0iOFtFMkzZQ0Ob92KYz7nqSm/KSCnQrpQ3Nak6QTal0OMzNzUIN0s+OhVdLPiohB+XULgKQNSE8e2DDPc15+XE434FxgZ2ADYP88rZmZ1VDN/qe2pIqIe/KTB9piGHB1RLwNPCOpifTIHYCmiHgaQNLVedpHOzi7ZmbWCtfUWnZUflDpGEm9clpfYHphmhk5raX0j5A0StJESRNnz57dGfk2M2tYDmrVnQ+sTXrA6SzgFx214Ii4MCIGR8Tg3r17d9RizcwMNz9WFREvVoYlXQTclD/OBPoXJu2X02gl3czMasQ1tSokrVH4+BWg0jNyLLCfpGUlrUV6Kvd40mN1BkpaS9IypM4kbXnSt5mZdaCGr6lJugoYAqwmaQZwMjBE0iAgSA80PQIgIqZKupbUAWQBMDoiFublHEV6yGk3YExETK1xUczMGl7DB7WI2L9K8sWtTH8acFqV9FuAWzowa2Zm1k5ufjQzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9JwUDMzs9Jo+KAmaYyklyQ9UkhbVdI4SU/m9145XZLOltQk6SFJmxXmGZmnf1LSyHqUxcys0TV8UAMuAYY2SzsBuCMiBgJ35M8AOwMD82sUcD6kIAicDGwFbAmcXAmEZmZWOw0f1CLiHmBus+RhwKV5+FJgeCH9skjuB1aRtAawEzAuIuZGxDxgHB8NlGZm1skaPqi1YPWImJWHXwBWz8N9gemF6WbktJbSP0LSKEkTJU2cPXt2x+bazKzBOagtQkQEEB24vAsjYnBEDO7du3dHLdbMzHBQa8mLuVmR/P5STp8J9C9M1y+ntZRuZmY15KBW3Vig0oNxJHBjIf2g3Atya+CV3Ex5G7CjpF65g8iOOc3MzGpo6XpnoN4kXQUMAVaTNIPUi/F04FpJhwHPAvvkyW8BdgGagDeBQwAiYq6knwAT8nQ/jojmnU/MzKyTNXxQi4j9Wxi1fZVpAxjdwnLGAGM6MGtmZtZObn40M7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFAzM7PScFBrhaRpkh6WNFnSxJy2qqRxkp7M771yuiSdLalJ0kOSNqtv7s3MGo+D2qJ9MSIGRcTg/PkE4I6IGAjckT8D7AwMzK9RwPk1z6mZWYNzUGu/YcClefhSYHgh/bJI7gdWkbRGPTJoZtaoHNRaF8BfJE2SNCqnrR4Rs/LwC8DqebgvML0w74yc9iGSRkmaKGni7NmzOyvfZmYNael6Z2AJt21EzJT0cWCcpMeLIyMiJEV7FhgRFwIXAgwePLhd85qZWetcU2tFRMzM7y8BNwBbAi9WmhXz+0t58plA/8Ls/XKamZnViINaCyT1kNSzMgzsCDwCjAVG5slGAjfm4bHAQbkX5NbAK4VmSjMzqwE3P7ZsdeAGSZC205URcaukCcC1kg4DngX2ydPfAuwCNAFvAofUPstmZo3NQa0FEfE0sEmV9JeB7aukBzC6BlkzM7MWuPnRzMxKw0HNzMxKw0HNzMxKw0HNzMxKw0HNzMxKw0HNzMxKw0HNzMxKw
0HNzMxKw0HNzMxKw0HNzMxKw0HNzMxKw0HNzMxKwzc0NutiBpxwc72z0GGmnb5rvbNgJeOampmZlYaDmpmZlYaDmpmZlYaDmpmZlYaDmpmZlYaDmpmZlYaDmpmZlYaDmpmZlYb/fN1FNfofcBu9/GZWnWtqZmZWGg5qZmZWGg5qHUzSUElPSGqSdEK982Nm1kh8Ta0DSeoGnAvsAMwAJkgaGxGP1jdnZlYGvpa8aK6pdawtgaaIeDoi3gGuBobVOU9mZg1DEVHvPJSGpL2AoRHxtfz5QGCriDiqMM0oYFT+uB7wRM0z2j6rAXPqnYk6aeSyQ2OX32Vfsq0ZEb2rjXDzY41FxIXAhfXOR1tJmhgRg+udj3po5LJDY5ffZe+6ZXfzY8eaCfQvfO6X08zMrAYc1DrWBGCgpLUkLQPsB4ytc57MzBqGmx87UEQskHQUcBvQDRgTEVPrnK3F1WWaSjtBI5cdGrv8LnsX5Y4iZmZWGm5+NDOz0nBQMzOz0nBQM2sHSap3HsysZQ5q9h9ptIO7pG0lbRkNdhFaUh9JH693Pqw+JK1e7zy0l4OatYukzSSt0GgHd2Az4EZJm0JjBPV8QDsD+EpXPLgtrmrfcSN875DKKWk1YLykr9Y7P+3hoGZtJmkocB3w6XrnpVYkLQUQEWeT7uX5+0qNrcwHOEmKiBeB3wODgV0krVznbNVU5cRN0pclHSFpxUY5mYtkDnAUcIqkveudp7ZyULM2kfRJ4HTg4Ih4oN75qZWIeA9A0mhgJeAF4FZJW5U1sOWAVjl4rwqsA/wU2K8RmiKL36mkQ4H/AXYCbpK0aRm/86JK+SQtFRH/BxwL/FzSvvXNWds4qFmrCj9gkZ5A8Lecvkx+716vvNWKpI1IZ6w/iogvAd8HbpC0dRnP3As1lD2A44G9gFOBzwO7lbnGVgzouZwCdoyIPYC/AT8ENilrYGt2QrOOpNUi4hbgQOD0rhDYHNRsUXrk9+eBPpKOA4iIdyRtD/yy0kRXFlUOWLOAicA7krpHxHnA/wF3S9q45hnsJJK+IOmMQlI/YEJEvBwR55DKfCIwsow1tmYB7TjgPuA7wDcAIuIHwMPALylpE3yh/N8CLgCuk/RT0j1s9wd+kp8+ssQq1cHIOpaknYArJH0fGAEcDWwn6dz8mJ1fALdXmujKoPmZen7w66vAisBBQKWsfwVuz+PK4l/A1ySdnj8/AKxQ6RwTEVcDj5IO6P+uTxY7T+F7/yywBbAvKYivn29/R0T8ELgTmFuvfHY2SZsDBwC7ksr/HCm4TwS+BRwnqWf9ctg63/vRqpK0LemM9FBS09vGpE4io0k7+sbA9yLiz82aLLq0woFtNLAz6SB+O3AE8Edg7VwzHQwMi4jn6pXXjiLpi0DPiBgraX1goqSFEXGSpH2AfSRtBbwGrACcGhFlCubA+zX0T5NqKA9ExFRJTwOvA4dLWj4izoiIU+ua0Q5W+f0WfscrAy9HxJvAPyTNI11T/GxE3CTp7oh4va6ZboWDmn1EPgvrTaqZLQV8CtgzIt7I3flHFaYtTUCrUHqQ6z7A4cDPgB3z+07AF4G1gTPLENCy14GnJfWLiBmStiQFtteBb5Oup3wRWAU4JiKerWNeO1Rx/83vD+Um2MMlbRMR90m6C1iWFNx7AfPLss/nziCV1oeepJaH+4F5ko6KiHMi4nFJLwADgXuAN+uU3TbxDY3tQyTtCHwWaCL1dpwDbBcRc3Nz5DbAzyLijTpms9PkgD4CuIYPmmB+DZwMXBwRF9Qxex2ucJY+HLgUGB0Rl0tag9Tc9JuI+EmetkeJv/cRpIP2S8DlpO/9MFLnoL/lDlHLlLj8o4AdSOW/n9S8/FngE6Tm1uOBoRHxTN0y2Ua+pmbvk7QZsDtwR0RcDvwJmAyEpM+RrqGNL9MPu3mnkIh4LSJ+Q+q+vzOwX0TcDMwGdpe0ahl6vlXKkAPaSsAgUvPyDyUdEhGzSE2s35P0kzzbEn2G3h5Kd
0pZPg9/g9QZZB6wHunRUbcBlwBnSvpMRLxbpv2+SNJupGtlp5Oun61Hap05J39eHdijKwQ0cPOjZfkg91vgXVKPRgEXA3sCtwLzgRNzm5N9Qz0AAAdLSURBVHppmhwL19COAtYCepF+3C8CywCfkPRlUvmPiYhSdBAolHuDiHhU0gxgGnAVcJEkIuJ3ktYiBXjK8p1L6gucADwi6TJgAOm7fSCPPxH4eUR8LXfrL+3T6yVtQ9rvfxURkyQ9DmxH+t2/EBHf7Wq/d9fUrNIpZCipmW1VYLd8R4F/RsT3SdeU9swdCbrUDt6SfKa+Qh4eDQwHzgU2Ab4REa8A40kB7tukg9yceuW3M0j6DPBnSUeQ/oP1dVIg3x04S9KIiHgxIp6sZz47wfPAJFJz4whgQ+ALhfE3kY+NEXFuia6dNv9j+Uqk8m8FHCxp44h4I//hug+wLnS9kxnX1Bpc7r58EfAgMIPUzHaSpPci4tcA+QBPHu5SO3g1zc7UxwDLk/6DcxDpjiHH5wvoJ+YmqmWK26AMlP48P51UCzmC9F+8v5OamL9Aur4yr24Z7CSFa4hLARvk14PANyTNjYjfknpADpC0CvBKGfb5iio19Emkmtp44JuSriL1cF2ZtE90Oe4o0sByL7efkbrm3y9pHXLXXVJvt4si4uR65rEz5LPVkcBGpC772wH9ST/iAyJiQb7O8i5wQZkOavB+DW0ocC3wFvC/wPVAd1J39h9FxI/ql8POlTuFHA8cQuoMMofUs3NPUi3t88C+ETG1bpnsRPn7v5p0+687SZcZ7iN1DDqS1APyxxExpW6ZXAxufmxsK5N+wNvlz8+SamtPkXo5jqtTvjpNofm0cqa+N6m8GwL35IB2MKkp7o6yBbRsen5dCgwBbgZejYiLSH9juLx+WauJ9YArI2IycBzwCqm57Rzg58CQEge05jX0DUh/st8VmEK61PDVrhrQwEGtoUXEOGAP4FBJ+0fEu6QOEV8G5kbEvWXo6VeUm55GkHq7nUgKaAuBy4BjJZ1POrDvVcJrSQBExIzczHYY6b6OB5DO2omIiyPiqXrmrwYeBLaRtGFEvBMR/wv8F/Ax0n5fqmunFbmGdhLpZPYA0glsL+BJUqvFV4GlIuLtumWyA/iaWoOLiBslvUe6HdaepNtAnVK5hlTSmsr7Z+pK97j7b9KP+gJS7WVBRMyvZwZrISKm5Frp9sAxkgZExLT65qom7ibdBuurku4kXVN9BTg7SninlIJiDf08PqihXy9pIXB3RCysZwY7gq+pGQCSdgd+DFwREWcU/8dU35x1vPxH44OBkyrNTJImAHdR0ltALYrSjZrfrXc+akVSH1IrxR7AAuD4iHiovrmqDUmbkB4l1BPoHRHr1zlLHcpBzd6X7yYyBjg6Iq6vd346S+7V9u38sXKmfixwUEQ8X7eMWc1J6kE6Di6x9zLsDEpPWdgeOIZ0g4Fp9c1Rx3FQsw+RtAPwVEQ8Xe+8dKZGPlM3qyhjDd1BzRpao56pm5WVg5qZmZWGu/SbmVlpOKiZmVlpOKiZmVlpOKiZmVlpOKiZLaEkDZcUktr951hJ0yStViV9d0kndFD+dpY0UdKjkv4p6Rc5/RRJx3fEOszay0HNbMm1P3Bvfv8ISe2+zV1EjI2I0xc3Y5I2It0A+ICI2ID0lOymxV2u2eJyUDNbAklaEdiWdNPh/QrpQyT9TdJY4FFJ3SSdKekRSQ/lR+ZUfEPSg5IertT2JB0s6RxJK0t6Nj9XDEk9JE2X1F3S2pJulTQpr6taTfE7wGkR8ThARCyMiPOrlONwSRMkTZH0x8KDWffOeZ4i6Z6ctqGk8ZIm57IM7JCNaQ3FQc1syTQMuDUi/gW8LGnzwrjNgGMiYl1gFDAAGBQRGwNXFKabExGbAeeTnh/2vnzD6sl88MTnLwO35btLXEh6+vfmeb7zquRvI9LToxfl+ojYIiI2AR4jBWmAHwI75fTdc9qRwK8iYhCp5jejDcs3+
xAHNbMl0/6kBzmS34tNkOMj4pk8/CXSg0wXAETE3MJ0lft3TiIFvuauAfbNw/sB1+Qa4meB6yRNJj25YI3FKMdGubb3MDCC9Nw6SA+lvETS4UC3nPYP4ERJ3wXWjIi3FmO91qD86BmzJYykVUkPbv20pCAd9ENS5SbMb7RxUZXnYi2k+m99LPA/eX2bk27u3AOYn2tLrZma51nUwyQvAYYXHnMzBCAijpS0FenhlJMkbR4RV0qqPLDyFklHRMSdi1i+2Ye4pma25NkL+H1ErBkRAyKiP/AM8Lkq044Djqh0GskBqk3y/S4nAL8CbsrXxV4FnpG0d16e8qNKmjuDVKtaN0+3lKQjq0zXE5glqTuppkaefu2IeCAifgjMBvpL+i/g6Yg4G7gR2LitZTGrcFAzW/LsD9zQLO2PVO8F+VvgOeAhSVNITy9uj2tIT0G+ppA2AjgsL28q6freh+QnGhwLXCXpMeAR0tOjm/sB8ACpufHxQvoZuQPLI8DfSTW+fYBHcrPnRqSnkZu1i29obGZmpeGampmZlYaDmpmZlYaDmpmZlYaDmpmZlYaDmpmZlYaDmpmZlYaDmpmZlYaDmpmZlcb/A62wAZjmPDyCAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light", "tags": [] }, "output_type": "display_data" } ], "source": [ "plt.xlabel('Archive Class')\n", "plt.ylabel('Frequency', rotation=0, labelpad=30)\n", "plt.title('Frequencies of Predicted Test Archive Classes')\n", "plt.xticks(rotation=45)\n", "# count how many test-set texts the model assigned to each archive label\n", "test_pred_counts = [predicted_df.loc[X_test.index].loc[predicted_df['prediction'] == label].shape[0] for label in labels]\n", "plt.bar(labels, np.asarray(test_pred_counts));" ] }, { "cell_type": "markdown", "id": "7b51ad25", "metadata": { "id": "GeVOwslAH1pP" }, "source": [ "Because our texts skew so heavily toward the domestic archive, most of the other archives end up overpredicted (i.e., the model assigns a text to an archive it does not actually belong to). The chart below plots, for each archive, its share of the test-set predictions divided by its share of the true test-set labels, so a value above 1 means the archive is overpredicted. The domestic archive is the only archive that is not overpredicted." ] }, { "cell_type": "code", "execution_count": null, "id": "739ae7ef", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 310 }, "id": "RRCs5wEESDVZ", "outputId": "8db2a17f-b490-4af8-d257-8944be46d747" }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAZ0AAAElCAYAAAA/Rj+6AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deZgcVdn+8e9NCLJFEDP6CkkIsqhhhxFQVCKLJi4BQYUAIgoEfxLEV0AQERBUNjdQkEV5EZBVUYMgqCyC7GGLhIiGNWGRsMqmEHh+f5zTUGlnhZnTmer7c119Tdcy3U91z9Td59TpKkUEZmZmJSzS6gLMzKx9OHTMzKwYh46ZmRXj0DEzs2IcOmZmVoxDx8zMinHo2JAn6ROS5kh6RtK6ra5nsEkaKykkLZqnfy/ps6/hccbk12zYINQYklYZ6Md9vSTdK2nzbpa9X9KdpWtqNw4d65P8z/p83kk9LOlUSUv38Xd3lvSXQSzvu8DUiFg6Im7p4vklaV9J/8jbcL+kwyW9YRBrKiYiJkbEz3tbr3mHGxH359fspcGt8PWTND4H2X6D9RwRcVVEvGOwHt8Sh471x8cjYmlgHWBd4GstrqdhRWBmD8uPBaYAOwEjgInAZsC5A11Io/Ux2L/Thj4LPE56D7vl13IIiAjffOv1BtwLbF6ZPgq4sDK9P3AX8DRwB/CJPP9dwL+Bl4BngCfz/DeQWij3A/8ETgCW6Oa5FwEOBO4DHgFOA5bJj/EMEMCzwF1d/O6q+bk3aJo/GvgPsCmwIfAwMKyy/BPAjMrzN7bvMVJYLZeXjc3Pv0velisr86YADwIPAftUHvsQ4JfAGcC/gF3z9vwsr/sA8K1GPcCw/Fo9CtwN7JEff9G8/Apg18rj7wbMqrwX6wGnAy8Dz+fX7KuVOhuPszwwjbRznw3s1lTzufm1f5oU8p09/L0E8KVc76PA0fl1XCw//pqVdd8CPAd0dPNYS+Xn3A54ofq8Xb3+3b0Glb/jfYAZwFPAOcDiedl4YG6+vx/wy6Y6jgGOzfe7fb9862Vf0uoCfBsaNyqhA4wC/gocU1n+qbzTWgTYlhQCb8vLdgb+0vR4P8g7uOVIrY8LgMO7ee7P553g24GlgfOB0yvLA1ilm9/9AnBfN8v+3HhOUqBsUVl2HrB/vr8XcF3e7jcAJwJn5WWNnd5peee4RGXeWXnemsC8yut3CPAisFV+vZYAfp0fd6m8E74B2L2yDX8jBeVywOV0Ezr5fXgAeDcgYBVgxeb3sKn2xuNcCRwPLE5qzc4DNq3U/G/gI6QQPBy4roe/l8h1LgeMAf5eqfF44MjKunsBF/TwWJ8h7dyH5b+TH3WxDdXXv7fX4AbS3+pypGD6Ql42nldDZ0VSEI7I08NyDRvl6W7fL9962Ze0ugDfhsYt/7M+Q/rkGMClwLI9rH8rsGW+vzOV0Mk7gmeBlSvz3gPc081jXQp8sTL9DtJOu7Gz7Cl0Duxu5wicDZyc738LOCXfH5HrWzFPzwI2q/ze2xrPX9npvb2yvDHvnZV5RwE/y/cPIX8iz9NvJbW6lqjMmwxcnu9f1tgx5ukP0X3oXALs1cN72GXokALtpcZONi8/HDi1UvOfKsvGAc/38P4HMKEy/UXg0nx/Q1KrRHl6OvDpHh7rT8APK6/LPGB40zZUX//eXoMdm96XE/L98eTQydN/AXbK97cgt6R7e7986/nmYzrWH1tFxAjSP+c7gZGNBZJ2knSrpCclPQmsUV3epANYEripsv7FeX5Xlid1rTXcR9pRvrUPNT9KComuvC0vBzgT2DoPLtgauDkiGs+5IvDrSq2zSDvo6vPP6eLxq/Puy9vR1bIVgeHAQ5XnOJH0CZr8e82P1Z3RpFZbfy0PPB4RTzc9zwqV6Ycr958DFu/lGEqX2x8R1+ffHy/pnaSWyLSuHkDSaOCDwC/yrN+SWmIf7eG5ensNmrejuwExZ5LCBGD7PA2
9v1/WA4eO9VtE/Bk4lXScAUkrAicDU4E3R8SywO2kFg2kT6JVj5KOLaweEcvm2zKRBil05UHSP3rDGGA+6VhQby4DRkvaoDoz78w2IrWiiIg7SDvGiSy4g4G0Q5tYqXXZiFg8Ih6orNO8jZB2ftWaH+xm/TmkT84jK4//xohYPS9/qIvH6s4cYOVulnVVY8ODwHKSRjQ9zwPdrN8XPW3/z4EdSV1nv4yIf3fzGJ8h7acukPQw6RjR4qSBBVXNr2d3r0F/nEcKxlGkY3yNv4ne3i/rgUPHXqsfAltIWpvUrx2kbg8kfY7U0mn4JzBK0mIAEfEyKaR+IOkt+XdWkPThbp7rLOB/Ja2Uh2l/BzgnIub3VmRE/J00SOEXkjaSNEzS6sCvSN1Ff6qsfibp+MIHSDuchhOAb+dwRVKHpC17e27gG5KWzM/3OdJB665qfAj4A/A9SW+UtIiklSVtklc5F/iSpFGS3kQa1NCdnwL7SFo/DxVfpVE36X14ezc1zAGuAQ6XtLiktUgH58/ow3Z2Z19Jb8oBvxcLbv8ZpB35jqTjMd35LPBN0jGmxm0b4COS3tzN7/T0GvRZRMwjdV3+H6nrd1ae39v7ZT1w6Nhrkv8hTwMOyq2E7wHXknZsawJXV1a/jDTa6WFJje6s/UiDA66T9C9Sv31335E4hTT66krgHtIB7T37Ue5U0o7oDNJxqYtJO5NtmtY7C9gEuCwiHq3MP4bU/fMHSU+TBhVs2Ifn/TNpGy8FvhsRf+hh3Z1II7vuAJ4gjW5rdAueTDpOcRtwM2kgRZci4jzg26QAfRr4DemAOaRjNAfmLqF9uvj1yaRjJA+SDpQf3BTK/fVb4CbS8b0LSaO9GnXOydsSwFVd/bKkjUgt3OMi4uHKbRrpdZ3c1e/18hr015nA5izY8oWe3y/rQeNAnpkNEEljSeE4vC+tsXYl6RTgwYg4sNW1WDn+IpWZFZeDeWvSl4ytjbh7zcyKknQYaaDJ0RFxT6vrsbLcvWZmZsW4pWNmZsU4dMzMrBgPJOjFyJEjY+zYsa0uw8xsyLjpppsejYguzzDi0OnF2LFjmT59eqvLMDMbMiR1e6omd6+ZmVkxDh0zMyvGoWNmZsU4dMzMrBiHjpmZFVOb0JF0iqRHJN3ewzrj84XGZkr6c8n6zMysRqFDuqjYhO4WSlqWdG32SfliS58qVJeZmWW1CZ2IuBJ4vIdVtgfOj4j78/qPFCnMzMxe0U5fDl0NGC7pCmAEcExE9HTFQrPXbOz+F7a6hAFx7xEfbXUJVjPtFDqLAusDmwFLANdKui5fzngBkqYAUwDGjOnpcvRmZtYftele64O5wCUR8Wy+FPGVwNpdrRgRJ0VEZ0R0dnR0efogMzN7DdopdH4LvE/SopKWJF3jflaLazIzayu16V6TdBYwHhgpaS5wMDAcICJOiIhZki4GZgAvAz+NiG6HV5uZ2cCrTehExOQ+rHM0cHSBcszMrAvt1L1mZmYt5tAxM7NiHDpmZlaMQ8fMzIpx6JiZWTEOHTMzK8ahY2ZmxTh0zMysGIeOmZkV49AxM7NiHDpmZlaMQ8fMzIpx6JiZWTG1Ocu0LVzqcrlm8CWbzQaSWzpmZlaMQ8fMzIpx6JiZWTG1CR1Jp0h6RFKPl6CW9G5J8yV9slRtZmaW1CZ0gFOBCT2tIGkYcCTwhxIFmZnZgmoTOhFxJfB4L6vtCfwKeGTwKzIzs2a1CZ3eSFoB+ATwkz6sO0XSdEnT582bN/jFmZm1ibYJHeCHwH4R8XJvK0bESRHRGRGdHR0dBUozM2sP7fTl0E7gbEkAI4GPSJofEb9pbVlmZu2jbUInIlZq3Jd0KvA7B46ZWVm1CR1JZwHjgZGS5gIHA8MBIuKEFpZmZmZZbUInIib3Y92dB7EUMzPrRjsNJDAzsxZz6JiZWTEOHTMzK8ahY2ZmxTh0zMysGIeOmZkV49AxM7NiHDpmZlaMQ8fMzIpx6JiZWTEOHTMzK8ahY2Z
mxTh0zMysGIeOmZkVU5tLGyyMxu5/YatLGDD3HvHRVpdgZjXglo6ZmRXj0DEzs2JqEzqSTpH0iKTbu1m+g6QZkv4q6RpJa5eu0cys3dUmdIBTgQk9LL8H2CQi1gQOA04qUZSZmb2qNgMJIuJKSWN7WH5NZfI6YNRg12RmZguqU0unP3YBft/dQklTJE2XNH3evHkFyzIzq7e2Cx1JHySFzn7drRMRJ0VEZ0R0dnR0lCvOzKzmatO91heS1gJ+CkyMiMdaXY+ZWbtpm5aOpDHA+cBnIuLvra7HzKwd1aalI+ksYDwwUtJc4GBgOEBEnAAcBLwZOF4SwPyI6GxNtWZWRz4LSe9qEzoRMbmX5bsCuxYqx8zMutA23WtmZtZ6Dh0zMyvGoWNmZsU4dMzMrBiHjpmZFePQMTOzYhw6ZmZWjEPHzMyKceiYmVkxDh0zMyvGoWNmZsU4dMzMrBiHjpmZFePQMTOzYhw6ZmZWjEPHzMyKceiYmVkxtQkdSadIekTS7d0sl6RjJc2WNEPSeqVrNDNrd7UJHeBUYEIPyycCq+bbFOAnBWoyM7OK2oRORFwJPN7DKlsCp0VyHbCspLeVqc7MzKBGodMHKwBzKtNz87z/ImmKpOmSps+bN69IcWZm7aCdQqfPIuKkiOiMiM6Ojo5Wl2NmVhvtFDoPAKMr06PyPDMzK6SdQmcasFMexbYR8FREPNTqoszM2smirS5goEg6CxgPjJQ0FzgYGA4QEScAFwEfAWYDzwGfa02lZmbtqzahExGTe1kewB6FyjFrW2P3v7DVJQyYe4/4aKtLqJ126l4zM7MWc+iYmVkxDh0zMyvGoWNmZsU4dMzMrJheQ0fSS5JulXS7pAskLdvL+utI+sjAlWhmZnXRl5bO8xGxTkSsQTqhZm/DjtchfR/GzMxsAf3tXruWfJJMSRtIulbSLZKukfQOSYsBhwLb5tbRtpKWyte6uSGvu+VAb4SZmQ0Nff5yqKRhwGbAz/KsvwHvj4j5kjYHvhMR20g6COiMiKn5974DXBYRn89dczdI+lNEPDuwm2JmZgu7voTOEpJuJbVwZgF/zPOXAX4uaVUgyKec6cKHgEmS9snTiwNj8mOZmVkb6fMxHWBFQLx6TOcw4PJ8rOfjpDDpioBt8nGhdSJiTEQ4cMzM2lCfj+lExHPAl4C9JS1Kauk0Lg2wc2XVp4ERlelLgD0lCUDSuq+nYDMzG7r6NZAgIm4BZgCTgaOAwyXdwoLddJcD4xoDCUgtouHADEkz87SZmbWhXo/pRMTSTdMfr0yuVrl/YF7+OPDupofZ/bUWaGZm9eEzEpiZWTEOHTMzK6Y2oSNpgqQ7Jc2WtH8Xy8dIujx/QXWGT9VjZlZeLUInf3H1OGAiMA6YLGlc02oHAudGxLrAdsDxZas0M7NahA6wATA7Iu6OiBeAs4Hm0+0E8MZ8fxngwYL1mZkZ/TgNzkJuBWBOZXousGHTOocAf5C0J7AUsHmZ0szMrKEuLZ2+mAycGhGjSGfBPl1Sl9svaYqk6ZKmz5s3r2iRZmZ1VpfQeQAYXZkexatnS2jYBTgXICKuJZ22Z2RXDxYRJ0VEZ0R0dnR0DEK5ZmbtqS6hcyOwqqSV8uUVtgOmNa1zP+ks2Uh6Fyl03IwxMyuoFqETEfOBqaTzvM0ijVKbKelQSZPyansDu0m6DTgL2DkiojUVm5m1p7oMJCAiLgIuapp3UOX+HcDGpesyM7NX1aKlY2ZmQ4NDx8zMinHomJlZMQ4dMzMrxqFjZmbFOHTMzKwYh46ZmRXj0DEzs2IcOmZmVoxDx8zMinHomJlZMQ4dMzMrxqFjZmbFOHTMzKwYh46ZmRXj0DEzs2IcOmZmVkxtQkfSBEl3Spotaf9u1vm0pDskzZR0ZukazczaXS0uVy1pGHAcsAUwF7hR0rR8ierGOqsCXwM2jognJL2lNdWambWvurR0NgBmR8TdEfECcDawZdM6uwHHRcQTABHxSOEazczaXl1
CZwVgTmV6bp5XtRqwmqSrJV0naUKx6szMDKhJ91ofLQqsCowHRgFXSlozIp5sXlHSFGAKwJgxY0rWaGZWa3Vp6TwAjK5Mj8rzquYC0yLixYi4B/g7KYT+S0ScFBGdEdHZ0dExKAWbmbWjuoTOjcCqklaStBiwHTCtaZ3fkFo5SBpJ6m67u2SRZmbtrhahExHzganAJcAs4NyImCnpUEmT8mqXAI9JugO4HNg3Ih5rTcVmZu2pNsd0IuIi4KKmeQdV7gfwlXwzM7MWqEVLx8zMhgaHjpmZFePQMTOzYhw6ZmZWjEPHzMyKceiYmVkxDh0zMyvGoWNmZsU4dMzMrBiHjpmZFePQMTOzYhw6ZmZWjEPHzMyKceiYmVkxDh0zMyvGoWNmZsU4dMzMrJjahI6kCZLulDRb0v49rLeNpJDUWbI+MzOrSehIGgYcB0wExgGTJY3rYr0RwF7A9WUrNDMzqEnoABsAsyPi7oh4ATgb2LKL9Q4DjgT+XbI4MzNL6hI6KwBzKtNz87xXSFoPGB0RF/b2YJKmSJouafq8efMGtlIzszZWl9DpkaRFgO8De/dl/Yg4KSI6I6Kzo6NjcIszM2sjdQmdB4DRlelReV7DCGAN4ApJ9wIbAdM8mMDMrKy6hM6NwKqSVpK0GLAdMK2xMCKeioiRETE2IsYC1wGTImJ6a8o1M2tPtQidiJgPTAUuAWYB50bETEmHSprU2urMzKxh0VYXMFAi4iLgoqZ5B3Wz7vgSNZmZ2YJq0dIxM7OhwaFjZmbFOHTMzKwYh46ZmRXj0DEzs2IcOmZmVoxDx8zMinHomJlZMQ4dMzMrxqFjZmbFOHTMzKwYh46ZmRXj0DEzs2IcOmZmVoxDx8zMinHomJlZMQ4dMzMrpjahI2mCpDslzZa0fxfLvyLpDkkzJF0qacVW1Glm1s5qETqShgHHAROBccBkSeOaVrsF6IyItYBfAkeVrdLMzGoROsAGwOyIuDsiXgDOBrasrhARl0fEc3nyOmBU4RrNzNpeXUJnBWBOZXpuntedXYDfd7dQ0hRJ0yVNnzdv3gCVaGZmdQmdPpO0I9AJHN3dOhFxUkR0RkRnR0dHueLMzGpu0VYXMEAeAEZXpkfleQuQtDnwdWCTiPhPodrMzCyrS0vnRmBVSStJWgzYDphWXUHSusCJwKSIeKQFNZqZtb1ahE5EzAemApcAs4BzI2KmpEMlTcqrHQ0sDZwn6VZJ07p5ODMzGyR16V4jIi4CLmqad1Dl/ubFizIzswXUoqVjZmZDg0PHzMyKceiYmVkxDh0zMyvGoWNmZsU4dMzMrBiHjpmZFePQMTOzYhw6ZmZWjEPHzMyKceiYmVkxDh0zMyvGoWNmZsU4dMzMrBiHjpmZFePQMTOzYhw6ZmZWTG1CR9IESXdKmi1p/y6Wv0HSOXn59ZLGlq/SzKy91SJ0JA0DjgMmAuOAyZLGNa22C/BERKwC/AA4smyVZmZWi9ABNgBmR8TdEfECcDawZdM6WwI/z/d/CWwmSQVrNDNre4qIVtfwukn6JDAhInbN058BNoyIqZV1bs/rzM3Td+V1Hu3i8aYAU/LkO4A7B3kTXo+RwH9tQxtp5+33trevhX37V4yIjq4WLFq6kqEgIk4CTmp1HX0haXpEdLa6jlZp5+33trfntsPQ3v66dK89AIyuTI/K87pcR9KiwDLAY0WqMzMzoD6hcyOwqqSVJC0GbAdMa1pnGvDZfP+TwGVRh75FM7MhpBbdaxExX9JU4BJgGHBKRMyUdCgwPSKmAT8DTpc0G3icFEx1MCS6AQdRO2+/t719Ddntr8VAAjMzGxrq0r1mZmZDgEPHzMyKcehYrfgLv2YLN4dOTbXbzlfS+yRt0G4jEiUtL+ktra7DWkPSW1tdQ385dGpG0nqSlmy3nS+wHvBbSetCe4Ru3uEcDXxiKO58Xq+u3uN2eN8hbaekkcANkrZvdT394dCpEUkTgPOANVtdSymSFgGIiGNJ59w7vdH
iqfMOSJIi4p/A6UAn8BFJy7S4rKIaH6wkfUzS7pKWbpcPW5E8CkwFDpH0qVbX1FcOnZqQNAY4Atg5Iq5vdT2lRMTLAJL2AN4IPAxcLGnDugZPDpzGznU5YBXgcGC7duhqq76nkj4PfAf4MPA7SevW8T2vamyfpEUi4gLgy8BRkrZtbWV949AZ4ir/YCKdafuqPH+x/HN4q2orRdIapE9834yIzYEDgV9L2qiOn3wrn/C3BvYhnWHjW8AHgI/XucVTDdy8nQI+FBFbA1cBBwFr1zV4mj5wrCJpZERcBHwGOGIoBI9DZ+hbKv98EFhe0t4AEfGCpM2A7ze6oOqiix3KQ8B04AVJwyPieOAC4ApJaxUvcJBI2kTS0ZVZo4AbI+KxiPgxaZsPAD5bxxZPU+DsDVwNfBXYEyAivgH8Ffg+Ne1irmz/V4ATgfMkHU46t+Rk4LB8lv2FVq12Ru1G0oeBX0g6ENgB+BKwqaTj8uUevgf8qdEFVQfNn3TzBfz+BSwN7AQ0tvXPwJ/ysrr4O7CrpCPy9PXAko3BExFxNnAHaYf779aUOHgq7/t7gXcD25JC9p35NFhExEHAZaRTXdWSpPWBHYGPkrb/flL4Tge+AuwtaUTrKuxZLc691o4kvY/0ie7zpK6ltUiDCPYg/SGuBXwtIn7f1CQf0io7nj1IV4q9gxQuuwO/AlbOLbtOYMuIuL9VtQ4USR8ERkTENEnvBKZLeikivi7p08CnJW0IPA0sCXwrIuoUtsArLdw1SZ/wr8/nV7wbeAbYTdISEXF0RHyrpYUOsMb/b+X/eBngsYh4DrhW0hOkY1rvjYjfSboiIp5padE9cOgMQflTTAepZbMI8C5gm4h4Ng+XnlJZtzaB06B0kb1PA7uRLjv+ofzzw8AHgZWB79YhcLJngLsljYqIuZI2IAXPM8C+pP78DwLLAntFxH0trHVAVf9+888ZuYtxN0kbR8TVki4H3kAK3zcBT9blbz4PFmi03keQWu7XAU9ImhoRP46Iv0l6GFgVuBJ4rkXl9olP+DnESPoQ8F5gNmm02qPAphHxeO5u2xg4MiKebWGZgyYH7g7AObzaxfAj4GDgZxFxYgvLG3CVT7lbkS63vkdEnCHpbaTulBMi4rC87lI1ft93IO1UHwHOIL3vu5AGj1yVB8wsVuPtnwJsQdr+60jdp+8F/ofUnbgP6crI97SsyD7yMZ0hRNJ6wCTg0og4A/gNcCsQkt5POoZzQ53+8ZoHDUTE0xFxAml49ERgu4i4EJgHTJK0XB1GLjW2IQfOG4F1SN2nB0n6XEQ8ROpC/Jqkw/KvLdSfcPtD6UwLS+T7e5IGCzxBunz8Jfl2KvBdSe+JiBfr9HdfJenjpGM1R5CO37yD1Lvx4zz9VmDroRA44O61ISPvhH4KvEgakSbSNYK2AS4GngQOyH26telSqxzDmQqsBLyJ9M/3T2Ax4H8kfYy0/XtFRC0OIFe2e1xE3CFpLnAvcBZwsiQi4v8krUQKYOrynktaAdgfuF3SacBY0nt7fV5+AHBUROyah003XyW4NiRtTPq7PyYibpL0N2BT0v/9wxGx31D7f3dLZwjIgwYmkLqRlgM+nr+RfEtEHEg6prFNPtA8pP4Au5M/6S6Z7+8BbAUcB6wN7BkRTwE3kAJoX9JO6NFW1TsYJL0H+L2k3UnfQfl/pKCdBPxA0g4R8c+I+Ecr6xwEDwI3kbrTdgBWBzapLP8ded8VEcfV6Nhd8xdf30ja/g2BnSWtFRHP5i+ELg+sBkPvw4ZbOgu5PDz0ZOBmYC6pG+nrkl6OiB8B5B0w+f6Q+gPsStMn3VOAJUjfQdiJdMaBffIB1gNyF8xi1degDpS+3DuH9Cl+d9J3ka4hdaFuQurff6JlBQ6SyjGsRYBx+XYzsKekxyPip6QRbGMlLQs8VYe/+YYuWrg3kVo6NwD/K+ks0gjFZUh/E0OOBxIsxPIopSNJQ5+vk7QKeWgkabTSyRFxcCtrHAz5095
ngTVIQ6I3BUaT/sl2jHR58j1JXY0n1mmnA6+0cCYA5wLPAz8EzgeGk4YLfzMivtm6CgdXHjSwD/A50mCBR0kj87YhtXI+AGwbETNbVuQgyu//2aTT+1xG6ka/mjRw5AukEWyHRsRtLSvydXD32sJtGdI/2KZ5+j5Sa+cu0ii1P7aorkFT6R5sfNL9FGl7VweuzIGzM6mr6dK6BU42J99+DowHLgT+FREnk4aJn9G60op4B3BmRNwK7A08RepO+jFwFDC+xoHT3MIdR/oS8EeB20hd6dsP1cABh85CLSL+CGwNfF7S5Ih4kXTA/GPA4xHxlzqM1KrKXSs7kEYrHUAKnJeA04AvS/oJacf7yRoeywAgIubmbqRdSOdV25H0qZeI+FlE3NXK+gq4GdhY0uoR8UJE/BB4O/Bm0t99rY7dNeQWztdJHzZ3JH3AfBPwD1Krf3tgkYj4T8uKHAA+prOQi4jfSnqZdLqbbUineTmkcQyjpp/0X/mkq3SOqS+S/ulOJH36nx8RT7aywBIi4rbcqtsM2EvS2Ii4t7VVFXEF6TQ320u6jHRM7yng2KjhmRYqqi3c43m1hXu+pJeAKyLipVYWOBB8TGeIkDQJOBT4RUQcXf0eR2srG3j5i5A7A19vdKNIuhG4nJqe4qU3SicyfbHVdZQiaXlSK39rYD6wT0TMaG1VZUham3SpihFAR0S8s8UlDSiHzhCSz0ZwCvCliDi/1fUMljwqad882fik+2Vgp4h4sGWFWXGSliLtpxbac4kNBqWzhG8G7EX6AvS9ra1o4Dh0hhhJWwB3RcTdra5lMLXzJ12zhjq2cB06tlBr10+6ZnXl0DEzs2I8ZNrMzIpx6JiZWTEOHTMzK8ahY2ZmxTh0zF4jSVtJCkn9/vKepHsljexi/iRJ+w9QfRMlTZd0h6RbJH0vzz9E0j4D8Rxm/eXQMXvtJgN/yT//i6R+n2YqIqZFxBGvtzBJa5BOkLljRIwjXWV09ut9XLPXy6Fj9hpIWhp4H+mknNtV5o+XdJWkacAdkoZJ+q6k2yXNyJdkaNhT0s2S/tpoLUnaWdKPJS0j6b58XRkkLSVpjqThklaWdLGkm/JzddXS+irw7Yj4G0BEvE2RRZYAAAJhSURBVBQRP+liO3aTdKOk2yT9qnLhvE/lmm+TdGWet7qkGyTdmrdl1QF5Ma2tOHTMXpstgYsj4u/AY5LWryxbj3R55dWAKaTLLa8TEWsBv6is92hErAf8hHT9mFfkE7reyqtXzPwYcEn+dvpJpKunrp9/7/gu6luDdPXN3pwfEe+OiLWBWaQQBTgI+HCePynP+wLpssnrkFpOc/vw+GYLcOiYvTaTSRfaIv+sdrHdEBH35Pubky40Nx8gIh6vrNc4f95NpGBqdg6wbb6/HXBObmG9FzhP0q2kM2+/7XVsxxq5tfRXXr00NKSLhp0qaTdgWJ53LXCApP2AFSPi+dfxvNamfGkDs36StBzpwnprSgrSTjkkNU5S+mwfH6pxXZSX6Pp/cRrwnfx865NOfroU8GRubfRkZv6d3i72dSqwVeUyCuMBIuILkjYkXTzsJknrR8SZkhoXFLtI0u4RcVkvj2+2ALd0zPrvk8DpEbFiRIyNiNHAPcD7u1j3j8DujUEFOUD6JJ9v7kbgGOB3+bjMv4B7JH0qP57yqfCbHU1qlayW11tE0he6WG8E8JCk4aSWDnn9lSPi+og4CJgHjJb0duDuiDgW+C2wVl+3xazBoWPWf5OBXzfN+xVdj2L7KXA/MEPSbaSrP/bHOaSrSJ5TmbcDsEt+vJmk40sLyGfk/jJwlqRZwO2kq282+wbpcshXA3+rzD86D3C4HbiG1GL6NHB77tZbg3Q1V7N+8Qk/zcysGLd0zMysGIeOmZkV49AxM7NiHDpmZlaMQ8fMzIpx6JiZWTEOHTMzK8ahY2Zmxfx/2a+bi/2hMLMAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light", "tags": [] }, "output_type": "display_data" } ], "source": [ "plt.xlabel('Archive Class')\n", "plt.ylabel('Rate', rotation=0, labelpad=30)\n", "plt.title('Rate of Overprediction by Archive')\n", "plt.xticks(rotation=45)\n", "rate = np.asarray(test_pred_counts)/np.asarray(test_counts)*sum(test_counts)/sum(test_pred_counts)\n", "plt.bar(labels, rate);" ] }, { "cell_type": "markdown", "id": "c083e610", "metadata": { "id": "VCkyV5guUAVw" }, "source": [ "### 5.3 Accuracy By Archive\n", "\n", "The accuracies for the dead and wild archives are relatively low. This is likely because those texts are being misclassified into the domestic archive, our largest archive, since all three of these archives deal with animals. The wool and precious archives have decent accuracies." ] }, { "cell_type": "code", "execution_count": null, "id": "19a813e6", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "OZmI6aR3G58T", "outputId": "b25b85b6-b6a3-47a0-d741-a98c128283a8" }, "outputs": [ { "data": { "text/plain": [ "0.734375" ] }, "execution_count": 169, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "f.score(X_test[class_series == 'dead'], y_test[class_series == 'dead'])" ] }, { "cell_type": "code", "execution_count": null, "id": "266a4a78", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "HbRZc7FSTt8B", "outputId": "20d98cb3-8354-4c33-eda9-c66f6d487af4" }, "outputs": [ { "data": { "text/plain": [ "0.9449010654490106" ] }, "execution_count": 170, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "f.score(X_test[class_series == 'dom'], y_test[class_series == 'dom'])" ] }, { "cell_type": "code", "execution_count": null, "id": "2dc2a622", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "23aREE8sTxKt", "outputId": "3eee05bc-0955-483f-ca5b-ca2094f3720c" }, "outputs": [ { "data": { "text/plain": [ "0.7410071942446043" ] 
}, "execution_count": 171, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "f.score(X_test[class_series == 'wild'], y_test[class_series == 'wild'])" ] }, { "cell_type": "code", "execution_count": null, "id": "9378b0e5", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "3pE1BBM5T2n_", "outputId": "04fb818b-8814-4fd1-9584-2579f183e476" }, "outputs": [ { "data": { "text/plain": [ "0.8333333333333334" ] }, "execution_count": 172, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "f.score(X_test[class_series == 'wool'], y_test[class_series == 'wool'])" ] }, { "cell_type": "code", "execution_count": null, "id": "59399459", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "LiVNsk8lT5qw", "outputId": "6a8e0e52-f860-4a35-90e3-820d71927a9b" }, "outputs": [ { "data": { "text/plain": [ "0.9264705882352942" ] }, "execution_count": 173, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "f.score(X_test[class_series == 'prec'], y_test[class_series == 'prec'])" ] }, { "cell_type": "markdown", "id": "6f0be9f0", "metadata": { "id": "NCBvhrQkBawe" }, "source": [ "We can also look at the confusion matrix. A confusion matrix is used to evaluate the accuracy of a classification. The rows denote the actual archive, while the columns denote the predicted archive. 
\n", "\n", "Looking at the first column (rows and columns are ordered dead, domestic, precious, wild, wool, which matches the per-archive accuracies on the diagonal):\n", "- 73.44% of the dead archive texts are predicted correctly\n", "- 1.31% of the domestic archive texts are predicted to be part of the dead archive\n", "- 1.47% of the precious archive texts are predicted to be part of the dead archive\n", "- 1.44% of the wild archive texts are predicted to be part of the dead archive\n", "- none of the wool archive texts are predicted to be part of the dead archive" ] }, { "cell_type": "code", "execution_count": null, "id": "4241129f", "metadata": { "id": "FMszAvyUBaR5" }, "outputs": [], "source": [ "from sklearn.metrics import confusion_matrix\n", "archive_confusion = confusion_matrix(y_test, f.predict(X_test), normalize='true')" ] }, { "cell_type": "code", "execution_count": null, "id": "f03d0b21", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "50GlA0txCLgq", "outputId": "f055525f-a3ec-41e5-c650-dabc6061b778" }, "outputs": [ { "data": { "text/plain": [ "array([[0.734375 , 0.171875 , 0.015625 , 0.0625 , 0.015625 ],\n", " [0.0130898 , 0.94490107, 0.00487062, 0.03531202, 0.00182648],\n", " [0.01470588, 0.05882353, 0.92647059, 0. , 0. ],\n", " [0.01438849, 0.21582734, 0.02158273, 0.74100719, 0.00719424],\n", " [0. , 0.08333333, 0.08333333, 0. , 0.83333333]])" ] }, "execution_count": 175, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "archive_confusion" ] }, { "cell_type": "markdown", "id": "b21d3325", "metadata": { "id": "Vm1vDZPGIr1f" }, "source": [ "This is the same confusion matrix expressed as raw counts of texts. Since the number of domestic archive texts is so high, even a small misclassification rate among the domestic archive texts can overwhelm the other archives.\n", "\n", "For example, even though only 1.3% of the domestic archive texts are predicted to be part of the dead archive, that corresponds to 43 texts, while the 73% of the dead archive texts that were predicted correctly correspond to just 47 texts. 
As a result, about half of the texts that were predicted to be part of the dead archive are incorrectly classified." ] }, { "cell_type": "code", "execution_count": null, "id": "54b5662d", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "HSDHsdWRDy_e", "outputId": "cb72fa2a-8899-4c4e-95ac-32df77871d06" }, "outputs": [ { "data": { "text/plain": [ "array([[ 47, 11, 1, 4, 1],\n", " [ 43, 3104, 16, 116, 6],\n", " [ 1, 4, 63, 0, 0],\n", " [ 2, 30, 3, 103, 1],\n", " [ 0, 2, 2, 0, 20]])" ] }, "execution_count": 176, "metadata": { "tags": [] }, "output_type": "execute_result" } ], "source": [ "confusion_matrix(y_test, f.predict(X_test), normalize=None)" ] }, { "cell_type": "markdown", "id": "84759158", "metadata": { "id": "G5YG_CKScVNB" }, "source": [ "## 6 Save Results in CSV file & Pickle" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 5 }