CORPUS MAPS
After performing a lexical extraction on the corpus, we used CorText Manager ([doc]) to produce corpus maps, applying the tool's clustering and layout algorithms.
We produced several maps, each with either approx. 150 or approx. 250 nodes.
The maps are accessible from the Corpus maps menu.
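The page does not spell out how CorText turns extracted terms into a map. One common setup is a term co-occurrence network: keep the N most frequent terms as nodes (N of about 150 or 250 here), and weight each edge by the number of documents in which both terms appear. The sketch below illustrates only that idea under stated assumptions. It is not CorText's actual procedure, which may use different node-selection and proximity measures. The function name `build_term_network`, its input format (one list of terms per document), and the toy terms are hypothetical. Clustering (e.g. with a community detection algorithm) and layout would follow as separate steps and are not shown.

```python
from collections import Counter
from itertools import combinations


def build_term_network(docs, max_nodes=150):
    """Build a weighted term co-occurrence network (illustrative sketch).

    docs: list of term lists, one per document (hypothetical input format).
    max_nodes: number of terms kept as nodes (the maps here have ~150 or ~250).
    Returns (nodes, edges): edges maps a sorted term pair to the number of
    documents in which both terms occur.
    """
    # Document frequency: count each term at most once per document.
    doc_freq = Counter()
    for terms in docs:
        doc_freq.update(set(terms))

    # Keep the most frequent terms as nodes (ties broken alphabetically).
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    nodes = [term for term, _ in ranked[:max_nodes]]
    keep = set(nodes)

    # Edge weight = number of documents in which both terms co-occur.
    edges = Counter()
    for terms in docs:
        present = sorted(set(terms) & keep)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return nodes, dict(edges)


# Toy data with placeholder terms, not taken from the corpus.
docs = [
    ["term_a", "term_b", "term_c"],
    ["term_a", "term_c"],
    ["term_d", "term_a"],
]
nodes, edges = build_term_network(docs, max_nodes=3)
print(nodes)  # ['term_a', 'term_c', 'term_b']
print(edges)
```

Here `term_d` is dropped because only the three most frequent terms are kept, which mirrors (in miniature) capping a map at a fixed number of nodes.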
Depending on the lexical items used to create them, the maps fall into two general types:
- Maps with the word "terms" in their name (e.g. "150 terms") are based on DBpedia concept mentions, identified with the procedure described here.
- Maps with the word "multiword" in their name (e.g. "250 multiword") are based on keyphrases, extracted as described here.
Regarding layout and navigation functions, the following types of maps were created:
- Static maps
The complete corpus is considered at once, leaving aside the temporal dimension (i.e. each document's date).
- Tagged clusters maps:
Each cluster is tagged with two terms that represent its content.
CorText selects these terms automatically, but for some clusters we edited the tags manually, starting from the terms suggested by CorText.
The map is based on a GEXF file produced by CorText, edited with Gephi, and post-processed with Inkscape to add the cluster tags.
- Navigable maps:
These maps have the same nodes and layout as the tagged clusters maps, but they were produced with two tools that create searchable, zoomable maps:
- Dynamic maps
These maps are "dynamic" in the sense that they
represent how terms found in the corpus evolve over time.
The visualizations have been created with CorText Manager's
temporal analysis functions.
The visualization format is a Sankey diagram, or river chart, representing the evolution of term clusters in the corpus.
For each cluster in each period, the tool selects two representative terms, which are used to label the nodes of the chart.
The tool also assesses whether a cluster is stable, splits into other clusters, or merges with other clusters, based on the terms that clusters in different periods share.
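The exact criteria CorText uses to decide that a cluster is stable, splits, or merges are not given here. The following is a simplified sketch of the idea, assuming each cluster is a set of terms and that two clusters in consecutive periods are linked when they share at least `min_shared` terms (a hypothetical threshold). A cluster with several successors is reported as a split; a cluster whose single successor also has other predecessors is reported as a merge. The function name `classify_transitions` and the toy clusters are illustrative only.

```python
def classify_transitions(before, after, min_shared=2):
    """Classify how clusters in one period relate to clusters in the next.

    before, after: dicts mapping a cluster label to its set of terms.
    min_shared: hypothetical threshold; two clusters are linked when they
    share at least this many terms.
    Returns a dict mapping each label in `before` to
    "stable", "split", "merge" or "ends" (no successor).
    """
    # Link every pair of clusters (one per period) that share enough terms.
    links = {
        (a, b)
        for a, terms_a in before.items()
        for b, terms_b in after.items()
        if len(terms_a & terms_b) >= min_shared
    }
    successors = {a: {b for (x, b) in links if x == a} for a in before}
    predecessors = {b: {a for (a, y) in links if y == b} for b in after}

    result = {}
    for a, succ in successors.items():
        if not succ:
            result[a] = "ends"
        elif len(succ) > 1:
            result[a] = "split"  # terms spread over several later clusters
        elif len(predecessors[next(iter(succ))]) > 1:
            result[a] = "merge"  # its only successor also absorbs other clusters
        else:
            result[a] = "stable"
    return result


# Toy clusters for two consecutive periods (placeholder terms).
before = {
    "A": {"a1", "a2", "a3"},
    "B": {"b1", "b2", "b3", "b4"},
    "C": {"c1", "c2", "c3"},
    "D": {"d1", "d2", "d3"},
    "E": {"e1", "e2"},
}
after = {
    "A2": {"a1", "a2", "a3", "new"},
    "B1": {"b1", "b2"},
    "B2": {"b3", "b4"},
    "CD": {"c1", "c2", "d1", "d2"},
}
print(classify_transitions(before, after))
# {'A': 'stable', 'B': 'split', 'C': 'merge', 'D': 'merge', 'E': 'ends'}
```

This simplification reports a cluster that both splits and feeds a merge as a split; a real tool may track such mixed cases separately.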
How to read the maps
- Hovering over a node displays the list of terms for that node.
- Hovering over an arc connecting two nodes displays the list of terms shared by both nodes.
- Heatmaps
The heatmaps show which clusters a given subset of the corpus' documents focuses on. Such subsets can be defined by criteria like document type, author, or geographical origin.
We created our heatmaps by dividing the corpus into decades, so each heatmap shows the clusters (i.e. areas of the corpus map) that are salient in a given decade.
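How CorText computes heatmap intensities is not described on this page. As a rough illustration of what "salient in a decade" can mean, the sketch below groups documents by decade and compares each cluster's share of term mentions in that decade with its share in the whole corpus; a ratio above 1 means the cluster is over-represented in that decade. This ratio is our own assumption, not CorText's method. The input format (a year and a list of mapped terms per document), the function names, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def decade(year):
    """Map a year to the first year of its decade, e.g. 1987 -> 1980."""
    return year // 10 * 10


def decade_salience(docs, term_cluster):
    """Score how strongly each decade focuses on each cluster (sketch).

    docs: list of (year, terms) pairs (hypothetical input format).
    term_cluster: dict mapping a term to its cluster label.
    Score = cluster share of mentions in the decade / share in the corpus.
    """
    per_decade = defaultdict(Counter)
    overall = Counter()
    for year, terms in docs:
        for term in terms:
            cluster = term_cluster.get(term)
            if cluster is None:
                continue  # term is not a node of the map
            per_decade[decade(year)][cluster] += 1
            overall[cluster] += 1

    total = sum(overall.values())
    scores = {}
    for dec, counts in sorted(per_decade.items()):
        dec_total = sum(counts.values())
        scores[dec] = {
            # Counter returns 0 for clusters absent from this decade.
            c: (counts[c] / dec_total) / (overall[c] / total)
            for c in overall
        }
    return scores


# Toy data: terms t1, t2 belong to cluster X; t3 to cluster Y.
term_cluster = {"t1": "X", "t2": "X", "t3": "Y"}
docs = [(1985, ["t1", "t2"]), (1992, ["t3"]), (1998, ["t3", "t1"])]
scores = decade_salience(docs, term_cluster)
for dec, row in scores.items():
    print(dec, {c: round(s, 2) for c, s in row.items()})
```

In this toy example cluster X dominates the 1980s and cluster Y the 1990s, which is the kind of contrast each decade tab is meant to make visible.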
The heatmaps differ from the dynamic maps, also available on this site, in that the dynamic maps trace the evolution of each lexical cluster: whether it is stable over time, whether it splits into other clusters, or whether its terms merge with those of other clusters.
How to read the maps
- The Basemap tab represents the clusters in the corpus regardless of decade.
- The remaining tabs display the heatmap for the decade starting with the year shown on the tab.
The networks in the heatmaps differ slightly from those in the other maps on this site ([a], [b]) because they were built from a slightly different set of terms. The same clusters are present in all types of networks.
The heatmaps were created with CorText Manager; the cluster names were then added to CorText's PDF export with Inkscape.
Nodes in the heatmaps are drawn as triangles, whereas all other maps use circles; this difference has no special meaning.
The way the heatmaps are displayed on this site is inspired by this work.