What is "humanities data" as a concept and framework? What are some examples of humanities datasets? What research questions do they help answer, and what is the potential for their visualization?
To begin to answer these questions, we asked workshop participants to provide a response to one of three more bounded questions related to their particular expertise. For humanities scholars bringing their data to the workshop, we asked: "What are your data and why do they matter?" For humanities scholars and computer scientists whose research concerns visualization, we asked: "What does your visualization research entail?" And for visualization designers, we asked: "What are the initial questions or activities you employ when beginning a new visualization project?"
Taken together, these responses begin to trace the contours of the field we might call humanities data visualization. They allow us to gain a better sense of the range of data that humanities scholars employ in their work, the challenges of their content and structure, and the range of research questions they can help to explore. These responses allow us to better understand the process of visualization, and how visualization designers begin to think through the possible design outcomes associated with a particular dataset. And finally, they show us the range of research related to visualization, from both humanistic and computational perspectives, and what an expanded conversation about visualization, its possibilities, and its limits could become.
What are your data and why do they matter?
Kim Gallon
My data set comes from the digitized Baltimore Afro-American newspaper, which is included in the ProQuest Historical Newspapers database collection. It is stored in an HNP XML text file. Marked with clear fields, this “BIG Data” set consists of close to a million pages of newspaper coverage published between 1893 and 1988. This data offers a powerful opportunity to conduct text mining to identify the patterns and trends in language used to designate gender and sexual identity among African Americans over time. Yet, even as this data set holds the potential to transform what we know about African American sexuality, a broader understanding of the history of sexuality and how race came to bear on sexual discourses constructed by the medical and scientific establishment is necessary to fully understand the text that is mined. Moreover, a significant understanding of the role the black press played in constructing and shaping African American individual and collective identity is vital to analyzing statistical outputs. The following humanistic questions serve as a starting point for discovery and exploration of this data: 1) How did terms and definitions about sexuality among African Americans change over time? 2) What relationship did sexual language in the African American community have to broader historical events? 3) What connection might have existed between sexual language and racial signifiers?
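As a rough illustration of what tracking language "over time" could involve computationally, the sketch below counts occurrences of a small vocabulary per decade. Everything in it is an assumption made for illustration: the `articles` records, the placeholder terms `alpha` and `beta`, and the `counts_by_decade` helper are hypothetical, and the step of parsing the HNP XML file is omitted because its field names are not described here. Selecting the actual vocabulary remains the interpretive, historical work described above.

```python
import re
from collections import Counter, defaultdict

# Hypothetical records: (publication year, article text). Real input would be
# parsed from the HNP XML file, whose field names are not shown in this text.
articles = [
    (1925, "alpha appears once here"),
    (1931, "alpha and alpha again, with beta"),
    (1968, "beta only"),
]

# Placeholder vocabulary; the real term list is a scholarly decision.
terms = {"alpha", "beta"}


def counts_by_decade(records, vocabulary):
    """Count occurrences of each vocabulary term, grouped by decade."""
    by_decade = defaultdict(Counter)
    for year, text in records:
        decade = (year // 10) * 10
        # Lowercase the text and split it into simple alphabetic tokens.
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in vocabulary:
                by_decade[decade][token] += 1
    return by_decade


trends = counts_by_decade(articles, terms)
for decade in sorted(trends):
    print(decade, dict(trends[decade]))
```

Raw counts like these would still need normalizing by the volume of text published in each decade before any trend could be read from them.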
Yanna Yannakakis
My digital project “Power of Attorney: Law, Native People, and Social Networks in Southern Mexico, 1700-1852” analyzes the relationship between legal representation and the exercise of local power in Oaxaca, Mexico’s most indigenous and linguistically diverse region. Due to the linguistic skills and specialized knowledge required to bring a case to court, many native litigants hired representatives who were well versed in legal genres, discursive forms, and procedures, and could interact with officials in Spanish courts and shepherd cases through proper channels and stages of appeal. I am interested in expressing the relationship among native communities, their legal representatives, and the courts in order to better understand indigenous engagement with the law.
I derive my data from the notarial documents that recorded the process of granting power of attorney. I am in the process of copying or digitizing approximately one thousand “letters of attorney” from the Archivo Histórico Judicial de Oaxaca and the Archivo Histórico de Notarias de Oaxaca. Although highly formulaic, these documents record important information: who sought to grant power of attorney, to whom, and for what purpose. I have begun a pilot in cooperation with Emory’s Center for Digital Studies (ECDS) that includes approximately 250 of these records. I culled data from the documents across many categories into an Excel spreadsheet and then, with Sara Palmer from ECDS, simplified the data into a relational database that records the names and places of origin of the grantors and grantees of power of attorney, as well as the date and place of each document’s production (in short, the categories are: places, people, events).
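A minimal sketch of how the three categories (places, people, events) could be laid out as relational tables follows. It is not the project's actual schema: the table and column names, the use of SQLite, and the placeholder rows are all assumptions, and it simplifies each event to a single grantor and a single grantee.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE places (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE people (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    origin_id INTEGER REFERENCES places(id)   -- place of origin
);
CREATE TABLE events (
    id         INTEGER PRIMARY KEY,
    date       TEXT,                           -- date the document was produced
    place_id   INTEGER REFERENCES places(id),  -- where it was produced
    grantor_id INTEGER REFERENCES people(id),
    grantee_id INTEGER REFERENCES people(id)
);
""")

# Placeholder rows, not real archival records.
conn.executemany("INSERT INTO places VALUES (?, ?)",
                 [(1, "Place A"), (2, "Place B")])
conn.executemany("INSERT INTO people VALUES (?, ?, ?)",
                 [(1, "Grantor A", 1), (2, "Grantee B", 2)])
conn.execute("INSERT INTO events VALUES (1, '1750-01-01', 2, 1, 2)")

# Who granted power of attorney to whom, and where was it recorded?
rows = conn.execute("""
    SELECT g.name, r.name, p.name
    FROM events e
    JOIN people g ON g.id = e.grantor_id
    JOIN people r ON r.id = e.grantee_id
    JOIN places p ON p.id = e.place_id
""").fetchall()
print(rows)
```

Keeping people and places in their own tables means one legal representative who appears in many documents is stored once, which is what makes the social-network questions answerable by joins.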
Nicholas Shapiro
My collaborator, Bill McKenna, and I are bringing atmospheric data from the Global Forecasting System. The slice of this large data set that we will be focusing on relates to wind direction and wind velocity from sea level up to 30,000 feet. The data is publicly accessible and updated four times a day, yet understanding the data, as Bill knows, takes years of intense and intimate work and experimentation. On an instrumental level this information is important to us because we are working on a project led by the artist Tomas Saraceno to develop solar balloon travel as a viable means of transportation. It is the only hydrocarbon-free means of flight. The wind travels in different directions at different altitudes. So, with a flight path visualization, solar aeronauts could alter their altitude to catch winds of different directions and home in on a specific destination. Think about how this simple visualization could humanize the sea of air above us, enabling the viewer to imagine where she could go with simply the power of the sun and wind. Think about how this calculated submission to elemental forces could provide alternatives to the prevailing ethos of engineering that aims to overcome nature. Imagine that you could trace backwards the breath you just inhaled to see where it was before it entered your body. How might that knowledge re-situate your place in the world and your politics?
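The idea of changing altitude to catch a favorable wind can be sketched in a few lines. The example below is only a toy under stated assumptions: the `wind_toward` table is invented rather than drawn from the forecast data, it records the bearing each wind layer blows toward (not the meteorological "from" direction), and the `angular_difference` and `best_altitude` helpers are hypothetical names. It ignores wind speed, time, and the balloon's ability to reach a given layer.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two bearings, in degrees."""
    diff = abs(a - b) % 360
    return min(diff, 360 - diff)


# Hypothetical layers: altitude in feet -> bearing (degrees) the wind blows toward.
wind_toward = {0: 90, 10000: 180, 20000: 270, 30000: 350}


def best_altitude(layers, desired_bearing):
    """Return the altitude whose wind bearing is closest to the desired bearing."""
    return min(
        layers,
        key=lambda alt: angular_difference(layers[alt], desired_bearing),
    )


# An aeronaut who wants to travel roughly north-northeast (bearing 10 degrees).
print(best_altitude(wind_toward, 10))
```

A flight path visualization would repeat a choice like this along the route as the forecast changes, which is where the difficulty of interpreting the underlying data comes in.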
David J. Kim, Jim Casey, and Labanya Mookerjee
Our data comes from the Colored Conventions Project, a collaborative group that brings new digital life to the buried records and extraordinary history of 19th-century African American conventions. National and state conventions spanned sixty years, engaging thousands of Black leaders and participants in debates and calls for racial justice. Though they speak to contemporary issues of equality, access, and activism, they have received little scholarly or public attention.
We aim to change that partly through three areas of data. First, we have a Convention Name Index (csv file) that lists the names, residences, and attendances of 3,000-4,000 convention delegates. Second, the CCP Corpus (plain text) contains 1,400 transcribed pages of minutes from 40 conventions. Third, we have the CCP Catalog (csv file) that tracks 142 conventions as events (place, date, type, host religious affiliation, etc.) and as publications (publisher, place, date). All of these are in progress. Eventually we expect to have more than 5,000 names and 200 conventions in 31 states.
The data help us raise important and challenging questions. What activist communities emerged in the conventions? Do certain communities correspond with certain topics at different meetings? Did those networks develop by area or can we visualize any patterns of mobility for Black leaders across periods of slavery and Reconstruction-era racial oppression? Early visualizations are promising, but deriving data from the minutes risks reproducing the erasure of women’s crucial roles in the conventions. Women are named rarely or just in the aggregate, as in “the Ladies of Sacramento”. How do we account for these silences that resound in our data?
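One way to begin exploring questions about communities is a co-attendance network, in which two delegates are linked each time they attend the same convention. The sketch below assumes a simplified csv layout with hypothetical `name` and `convention` columns and placeholder rows; the real Convention Name Index may be organized differently. Because it links only people who are named individually, a network built this way would carry the same silences noted above.

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import combinations

# Placeholder rows standing in for the name index; column names are assumed.
sample = """name,convention
Delegate A,Convention 1
Delegate B,Convention 1
Delegate C,Convention 1
Delegate A,Convention 2
Delegate B,Convention 2
"""

# Group delegate names by the convention they attended.
attendees = defaultdict(set)
for row in csv.DictReader(io.StringIO(sample)):
    attendees[row["convention"]].add(row["name"])

# Each pair of delegates at the same convention adds 1 to that pair's weight.
edge_weights = Counter()
for names in attendees.values():
    for pair in combinations(sorted(names), 2):
        edge_weights[pair] += 1

for (a, b), weight in sorted(edge_weights.items()):
    print(a, b, weight)
```

The weighted pairs can be handed to a network layout tool, with heavier edges marking delegates who repeatedly appear together.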
Miriam Posner
New Zealand’s libraries and archives (like those of many countries) primarily use Library of Congress Subject Headings (LCSH) to organize their materials. Since these subject headings are widely used, they allow institutions to share records and organize information uniformly. However, LCSH also impose a Western (and specifically American) worldview on the materials they categorize. For New Zealand’s Māori community, the LCSH proved unacceptable for describing cultural materials.
In response, a group composed of library and information science professionals and Māori experts began in 2000 to develop an alternate classification scheme: the Ngā Ūpoko Tukutuku, or Māori subject headings, which comprise the dataset that I’m bringing to the workshop. Where LCSH begin from Western categories of knowledge (philosophy, world history, history of the Americas, etc.), the Māori subject headings encode Māori ways of relating to the universe. For example, the most general headings are Wairua (the spiritual), Tinana (the physical), and Hinengaro (the psychological/mental).
What would a visualization of the Māori subject headings look like? The headings, encoded in XML, form a hierarchy that seems amenable to a network visualization, similar to the LCSH Galaxy (http://cads.stanford.edu/lcshgalaxy/) produced by Stanford University and the Library of Congress. If the LCSH terms form a galaxy, what alternate galaxy might we see when we visualize the Māori subject headings?
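A first step toward such a network would be turning the XML hierarchy into a list of parent-child edges. The sketch below assumes a hypothetical nesting of `<heading label="...">` elements, which is almost certainly not the actual encoding of the headings; only the three top-level labels come from the description above, and the narrower term is a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure: nested <heading> elements with a "label" attribute.
sample_xml = """
<headings>
  <heading label="Wairua">
    <heading label="Narrower term 1"/>
  </heading>
  <heading label="Tinana"/>
  <heading label="Hinengaro"/>
</headings>
"""


def parent_child_edges(element, parent_label=None):
    """Walk the hierarchy and return (broader, narrower) label pairs."""
    edges = []
    for child in element.findall("heading"):
        label = child.get("label")
        if parent_label is not None:
            edges.append((parent_label, label))
        # Recurse so that deeper levels of the hierarchy are included.
        edges.extend(parent_child_edges(child, label))
    return edges


root = ET.fromstring(sample_xml)
top_level = [h.get("label") for h in root.findall("heading")]
edges = parent_child_edges(root)
print(top_level)
print(edges)
```

An edge list in this form is the usual input for network layout tools, so the same walk would apply once the real element names are substituted.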