What is Humanities Data Visualization?

What is "humanities data" as a concept and framework? What are some examples of humanities datasets? What research questions do they help answer, and what is the potential for their visualization?

To begin to answer these questions, we asked workshop participants to provide a response to one of three more bounded questions related to their particular expertise. For humanities scholars bringing their data to the workshop, we asked: "What are your data and why do they matter?" For humanities scholars and computer scientists whose research concerns visualization, we asked: "What does your visualization research entail?" And for visualization designers, we asked: "What are the initial questions or activities you employ when beginning a new visualization project?"

Taken together, these responses begin to trace the contours of the field we might call humanities data visualization. They allow us to gain a better sense of the range of data that humanities scholars employ in their work, the challenges of their content and structure, and the range of research questions they can help to explore. These responses allow us to better understand the process of visualization, and how visualization designers begin to think through the possible design outcomes associated with a particular dataset. And finally, they show us the range of research related to visualization, from both humanistic and computational perspectives, and what an expanded conversation about visualization, its possibilities, and its limits could become.

What are your data and why do they matter?

Kim Gallon
My data set is from the digitized Baltimore Afro-American newspaper that is included in the ProQuest Historical Newspapers database collection. It is located in an HNP XML text file. Marked with clear fields, this “BIG Data” set consists of close to a million pages of newspaper coverage published between 1893 and 1988. This data offers a powerful opportunity to conduct text mining to identify the patterns and trends in language used to designate gender and sexual identity among African Americans over time. Yet, even as this data set holds the potential to transform what we know about African American sexuality, a broader understanding of the history of sexuality and how race came to bear on sexual discourses constructed by the medical and scientific establishment is necessary to fully understand the text that is mined. Moreover, a significant understanding of the role the black press played in constructing and shaping African American individual and collective identity is vital to analyzing statistical outputs. The following humanistic questions serve as a starting point for discovery and exploration of this data: 1) How did terms and definitions about sexuality among African Americans change over time? 2) What relationship did sexual language in the African American community have to broader historical events? 3) What connection might have existed between sexual language and racial signifiers?

Yanna Yannakakis
My digital project “Power of Attorney: Law, Native People, and Social Networks in Southern Mexico, 1700-1852” analyzes the relationship between legal representation and the exercise of local power in Oaxaca, Mexico’s most indigenous and linguistically diverse region. Due to the linguistic skills and specialized knowledge required to bring a case to court, many native litigants hired representatives who were well versed in legal genres, discursive forms, and procedures, and could interact with officials in Spanish courts and shepherd cases through proper channels and stages of appeal. I am interested in expressing the relationship among native communities, their legal representatives, and the courts in order to better understand indigenous engagement with the law.

I derive my data from the notarial documents that recorded the process of granting power of attorney. I am in the process of copying or digitizing approximately one thousand “letters of attorney” from the Archivo Histórico Judicial de Oaxaca and the Archivo Histórico de Notarias de Oaxaca. Although highly formulaic, these documents record important information: who sought to grant power of attorney, to whom, and for what purpose. I have begun a pilot in cooperation with Emory’s Center for Digital Scholarship (ECDS) that includes approximately 250 of these records. I culled data from the documents across many categories in an Excel sheet, and then, with Sara Palmer from ECDS, simplified the data into a relational database that expresses the names and places of origin of the grantors and grantees of power of attorney, as well as the date and place of the document’s production (in short, the categories are: places, people, events).
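
The places, people, and events structure described above can be sketched as a small relational schema. This is a minimal illustration only: the table names, column names, and sample rows are hypothetical placeholders, not the project's actual database design or archival records.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the project's actual database design.
SCHEMA = """
CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE people (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    origin_id INTEGER REFERENCES places(id)  -- place of origin
);
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    grantor_id INTEGER NOT NULL REFERENCES people(id),
    grantee_id INTEGER NOT NULL REFERENCES people(id),
    place_id INTEGER REFERENCES places(id),  -- where the document was produced
    event_date TEXT                           -- ISO date of the document
);
"""


def build_demo_db():
    """Create an in-memory database with placeholder (non-archival) rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO places VALUES (?, ?)",
                     [(1, "Place A"), (2, "Place B")])
    conn.executemany("INSERT INTO people VALUES (?, ?, ?)",
                     [(1, "Grantor X", 1), (2, "Grantee Y", 2)])
    conn.execute("INSERT INTO events VALUES (1, 1, 2, 2, '1750-01-01')")
    return conn


def grant_edges(conn):
    """Return (grantor, grantee, place, date) rows, one per grant of power."""
    rows = conn.execute("""
        SELECT g.name, r.name, p.name, e.event_date
        FROM events e
        JOIN people g ON g.id = e.grantor_id
        JOIN people r ON r.id = e.grantee_id
        LEFT JOIN places p ON p.id = e.place_id
        ORDER BY e.event_date
    """)
    return rows.fetchall()
```

Each row returned by `grant_edges` is one grantor-to-grantee link, which is the form most network visualization tools expect as an edge list.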

Nicholas Shapiro
My collaborator, Bill McKenna, and I are bringing atmospheric data from the Global Forecast System. The slice of this large data set that we will be focusing on relates to wind direction and wind velocity from sea level up to 30,000 feet. The data is publicly accessible and updated four times a day, yet understanding the data, as Bill knows, takes years of intense and intimate work and experimentation. On an instrumental level this information is important to us because we are working on a project led by the artist Tomas Saraceno to develop solar balloon travel as a viable means of transportation. It is the only hydrocarbon-free means of flight. The wind travels in different directions at different altitudes. So, with a flight path visualization, solar aeronauts could alter their altitude to catch winds of different directions and home in on a specific destination. Think about how this simple visualization could humanize the sea of air above us, enabling the viewer to imagine where she could go with simply the power of the sun and wind. Think about how this calculated submission to elemental forces could provide alternatives to the prevailing ethos of engineering that aims to overcome nature. Imagine that you could trace backwards the breath you just inhaled to see where it was before it entered your body. How might that knowledge re-situate your place in the world and your politics?

David J. Kim, Jim Casey, and Labanya Mookerjee
Our data comes from the Colored Conventions Project, a collaborative group that brings new digital life to the buried records and extraordinary history of 19th-century African American conventions. National and state conventions spanned sixty years, engaging thousands of Black leaders and participants in debates and calls for racial justice. Though they speak to contemporary issues of equality, access, and activism, they have received little scholarly or public attention.

We aim to change that partly through three areas of data. First, we have a Convention Name Index (csv file) that lists the names, residences, and attendances of 3,000-4,000 convention delegates. Second, the CCP Corpus (plain text) contains 1,400 transcribed pages of minutes from 40 conventions. Third, we have the CCP Catalog (csv file) that tracks 142 conventions as events (place, date, type, host religious affiliation, etc.) and as publications (publisher, place, date). All of these are in progress. Eventually we expect to have more than 5,000 names and 200 conventions in 31 states.

The data help us raise important and challenging questions. What activist communities emerged in the conventions? Do certain communities correspond with certain topics at different meetings? Did those networks develop by area or can we visualize any patterns of mobility for Black leaders across periods of slavery and Reconstruction-era racial oppression? Early visualizations are promising, but deriving data from the minutes risks reproducing the erasure of women’s crucial roles in the conventions. Women are named rarely or just in the aggregate, as in “the Ladies of Sacramento”. How do we account for these silences that resound in our data?

Miriam Posner
New Zealand’s libraries and archives (like those of many countries) primarily use Library of Congress Subject Headings (LCSH) to organize their materials. Since these subject headings are widely used, they allow institutions to share records and organize information uniformly. However, LCSH also impose a Western (and specifically American) worldview on the materials they categorize. For New Zealand’s Māori community, the LCSH proved unacceptable for describing cultural materials. In response, a group composed of library and information science professionals and Māori experts began in 2000 to develop an alternate classification scheme: the Ngā Ūpoko Tukutuku, or Māori subject headings, which comprise the dataset that I’m bringing to the workshop. Where LCSH begin from Western categories of knowledge (philosophy, world history, history of the Americas, etc.), the Māori subject headings encode Māori ways of relating to the universe. For example, the most general headings are Wairua (the spiritual), Tinana (the physical), and Hinengaro (the psychological/mental). What would a visualization of the Māori subject headings look like? The headings, encoded in XML, form a hierarchy that seems amenable to a network visualization, similar to the LCSH Galaxy (http://cads.stanford.edu/lcshgalaxy/) produced by Stanford University and the Library of Congress. If the LCSH terms form a galaxy, what alternate galaxy might we see when we visualize the Māori subject headings?
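
One way to move from an XML hierarchy to a network visualization is to flatten it into parent-child edges. The sketch below assumes a hypothetical encoding (nested `heading` elements with a `label` attribute); the actual XML structure of the Ngā Ūpoko Tukutuku file is not specified here and may well differ. The narrower term in the sample is a placeholder, not a real heading.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the element and attribute names are assumptions.
# Only the three top-level labels come from the text above.
SAMPLE = """
<headings>
  <heading label="Wairua">
    <heading label="narrower term A"/>
  </heading>
  <heading label="Tinana"/>
  <heading label="Hinengaro"/>
</headings>
"""


def hierarchy_edges(xml_text, root_label="ROOT"):
    """Flatten nested <heading> elements into (parent, child) label pairs."""
    root = ET.fromstring(xml_text)
    edges = []

    def walk(element, parent_label):
        # Visit direct children only, then recurse, so each edge
        # links a heading to its immediate broader term.
        for child in element.findall("heading"):
            label = child.get("label")
            edges.append((parent_label, label))
            walk(child, label)

    walk(root, root_label)
    return edges
```

The resulting list of pairs can be loaded into most graph-drawing tools as an edge list; the artificial `ROOT` node simply ties the most general headings together and can be dropped if it is not wanted.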

What does your visualization research entail?

John Stasko
My research primarily involves the design, development, and evaluation of new information visualization techniques and systems. Because of that, I am always looking for people from interesting new domains who have large collections of data and are seeking to gain insight from them. I have not worked extensively with data from the digital humanities, but have encountered some related projects and issues when people have used my group's Jigsaw visual analytics system to explore document collections. In the workshop, I hope to learn more about the digital humanities, the types of data found in this area, and how visualization might help digital humanities researchers in their work. In my own research these days, I am exploring ways to better articulate the value of visualization as an exploratory, analytic, and communicative tool. I look forward to discussing with humanities researchers how visualization could potentially benefit them, and how we might be able to articulate, explain, and perhaps even quantify that value.

Chris Weaver
The main theme of my research is interaction between humans and computers in processes of visual analysis. I study richly interactive graphical representations of information as contexts for exploration and analysis of humanistic, social, natural, and built systems. My central research questions revolve around the issues of expressing queries and interpreting computed results visually: How can people express complex human questions through interactive manipulation of visualized data and the graphical spaces in which they are drawn? How do people interpret visual representations of data computed in response to such manipulation, and how can they record their observations through interaction? How can simple representation and interaction building blocks be composed for depiction and querying of complex information structures and relationships? What patterns of composition are useful and usable for exploration and analysis of data sets from particular domains, and in general? How can data processing languages, architectures, and systems be designed to support composition? How can and should human intentions, actions, interpretations, and conclusions about data be captured as data and reflected visually? How can data-capturing visualization support scholarly processes of observation, interpretation, and narrative? I believe that answering these questions will lay theoretical and practical foundations that will help to evolve development of richly interactive visualization tools from a flourishing craft to a science of synthesis. Working primarily in the fields of information visualization and visual analytics, I draw from human-computer interaction, database management systems, programming languages, computational linguistics, software engineering, cognitive and perceptual psychology, and graphical design.

Lauren Klein
My research is concerned, most generally, with the cultural and critical dimensions of data visualization. I’m at work on a book about the history of data visualization that emphasizes how the modern visualizing impulse emerged from a network of complex intellectual and politically charged contexts in the eighteenth and nineteenth centuries, contexts that still have broad impact today. In other recent work, I’ve attempted to theorize the function of visualization for the humanities, both in terms of its ability to reframe historical data in new ways, and in terms of its ability to call attention to the process of scholarly knowledge production. In other words, I’m interested in how data visualization, and digital methods considered more generally, help to demonstrate not only what knowledge we, as scholars, can produce, but also how we come to produce it. The third component of my research involves the practice of visualization. I’m interested in designing data visualizations that present concepts, advance arguments, and perform critique. Most recently, I’ve been exploring how failed visualization schemes of the past, when recreated in digital or physical form, allow us to imagine alternative visual futures.

Gabby Resch
My research grapples with epistemic biases related to ocularcentric interaction. Specifically, I am interested in how epistemic commitments carried from diverse fields, including Human-Computer Interaction, Design, and Museology, to name a few, become entangled in projects that use emergent computational methods or digital tools to promote alternatives to ocularcentric interaction. Along which vectors are these biases and commitments translated into software, hardware, or practices that constitute an unfolding environment of interaction? How are these commitments illuminated, articulated, and negotiated? Are epistemic gaps that emerge in this context always commensurable - or do they always have to be - in successful interdisciplinary projects? Additionally, I investigate how epistemological concepts like consistency, validation, truth, meaning, and knowledge are brought to bear on the question of "post-ocular" interaction. I explore these questions through an experimental wedding of 3D printing and DIY haptics in two related contexts: tactile interaction with museum artifacts caught in a performative state of representation; and the negotiation of subjective data experience through 3D manipulation (in both digital and material arrangements) of what is generally thought of as "flat" data.

Data experience, in both informal and formal academic contexts, has traditionally relied on engagement with standardized data sets, normative representational templates, and software applications that span a spectrum of complexity and sophistication. Rarely, however, does it encourage dynamic and participatory data collection from unwieldy or 'messy' sources; creative expression and reinterpretation that augments existing ocularcentric practices (such as physical data sculpture or narrativization); or critical inquiry into assumptions made by data collectors, about the source of the data (i.e. what gets obscured or occluded), or about the means of representing and interpreting it. I believe that exploring these data-specific themes in an interdisciplinary scholarly setting will illuminate parallel challenges in cultural institutions troubled by a potential shift away from ocularcentric interaction.

Katie Rawson
I am interested in ways that we maintain the variety and provenance of data while making it usable. How do we design systems of data work that allow us to find and explore patterns without erasing difference? The goal is not an idealistic atomization, but being deliberate in and transparent about what we smooth out — and really attending to the decisions we make as we manipulate data and trying to find new ways to do that manipulation and recording. I am currently exploring these questions through Curating Menus, a research and data curation project that uses the New York Public Library’s What’s On the Menu? data set. Our aim is to both conduct research on food culture and to develop a framework that allows the data to be used and reused in a variety of ways and extend to other sets of menu data. Our approach involves making indexes and linkable data; however, I am excited to explore other paradigms and approaches.

Matt Ratto
I was at a ‘big data’ talk recently where the main speaker started his presentation by saying ‘Imagine if we knew where all the sick people were?’ His follow-up statement was ‘We could route the buses around them!’ His overall argument was that, with access to data from real-time wearables, health records, and other sources, it becomes possible to figure out where in a city people are becoming ill and, by changing the routing of public transportation, to reduce the potential for contagious diseases to spread. What was missing from both the speaker’s talk and the wider discussion that followed was any sense of the problematic ‘view from above’ that was intrinsic to the big data future that was being described. My sense is that there is a need for broader critical engagement with the socio-technical modes being described under the rubric of big data. My current interest is therefore in the development of ‘critical making’ modes and pedagogies that encourage reflexive, critical, and humanistic stances in conventional data science and information visualization curricula.

Dawn Nafus
In my design work, I am less interested in the one fabulous visualization optimized for a particular narrative than I am in building and using visual tools to explore data. The people I am working with (Quantified Self folks) have a deep understanding of the context in which the data was generated, and so data acts as a heuristic for them, not hermeneutic. Therefore I emphasize interaction over visual form—the “story” is a malleable one for them. As part of my participation in that community, I began a side project called Atlas of Caregiving, in which we gave family caregivers the sorts of sensors that Quantified Self people often use, and interviewed them before and after. We were exploring what we could learn about stress and time use from the approach, and whether participants themselves got any value out of the “self” enquiry that went beyond the constrained ways that sensing devices serve up data to people. The answer turned out to be yes, there is indeed value for them as well as us researchers, which for us raises interesting questions about what participatory research might look like with respect to sensor data beyond neoliberal “citizen science.” However, we hit a number of constraints that meant that the particular data types/data rates we were working with cannot be easily packaged up in ways that make it implementable by caregivers themselves, without researchers’ help, and parsing it ourselves proved more difficult than we would like.

Roderic N. Crooks
I am a critical scholar who engages with cultural aspects of new media and information technology, primarily through ethnographic research in minority communities. I am very interested in drawing out the subjectivity presumed and shaped by digital artifacts and systems. The central object waiting for analysis in digital humanities is the digital humanistic subject. What is the subjectivity assumed and shaped by the digital? In DH work, this question can be brushed aside in favor of an emphasis on output and intellectual product (visualizations, maps, charts, graphics, and so forth), but in cognate fields that study digital media, this question is increasingly gaining scholarly attention. Because digital humanities invites scholars from various disciplines to work together, these questions are difficult to frame and address, but ignoring them produces wan, superficial scholarship. The idea should be to get more work in DH to engage with this central theoretical void and to understand what kind of a persona the DH approach imagines and brings into being.

Paolo Ciuccarelli
Our research entails the design of novel interfaces that enable peculiar inquiry processes on complex sets of data. Humanists, in our experience, seek a peculiar (constructive and interpretative) relationship with data, and that's where our approach (human-centered, data-agnostic, and context-sensitive) can help. We also aim to cope with complex issues by researching multi-perspective interfaces, visualization of uncertain data and processes, and visual reduction of complex patterns (i.e., graphs).

Duen Horng (Polo) Chau
My research bridges data mining and human-computer interaction (HCI) to synthesize scalable, interactive tools that help people understand and interact with big data, e.g., massive networks with billions of nodes and edges. I blend techniques from machine learning (Belief Propagation), data mining (anomaly detection), visualization, and user interaction.

Jennifer S. Singh
My current research is investigating systems of care in relation to barriers in autism diagnosis and services in metro Atlanta in the Latino and African American communities. I am conducting in-depth interviews with clinicians, service providers, educational specialists, social and cultural brokers, and parents who have a child with autism. I am also conducting participant observation at the Autism Clinic, which provides diagnosis and follow-up services to underserved communities in Atlanta, as well as at Parent-to-Parent of Georgia, which offers autism support group meetings and workshops for parents. I am interested in transforming this qualitative work (interviews and participant observation) into a visual artifact that can help students and policy makers better contextualize the challenges parents face while navigating autism services, especially for underserved communities. My research brings into focus how structural and institutional barriers shape inequalities in diagnosis and services, as well as the labyrinth of coordination and care parents must negotiate in order to get services for their autistic child. I also have a forthcoming paper, titled “Parenting Work and Autism Trajectories of Care,” that empirically and theoretically articulates the range of work parents must engage in to get a diagnosis, access educational services, and provide opportunities that enable their children to reach their fullest potential.

Marisa Parham
Right now my workgroup is mostly thinking about visualizing social media, particularly in terms of spatially mapping Twitter activity, as well as the question of what would or could constitute sentiment analysis and visualization in a world of likes, favorites, and retweets. We are also interested in the particularities of archiving streamed data sets, as well as working harder to problematize how the techniques that might produce better or more explanatory social media data, and its visualization, share uncomfortable relationships to state surveillance and privacy concern-addled marketing techniques. Underlying all of this work is an interest in how social media makes state violence towards black life constantly visible and explicable in ways that are both necessary and deleterious. Where are the limits of visualization, insofar as acts of successfully visualizing data also, potentially, visualize things outside of a proper set?

Jennifer Sterling
While sports occupy a largely quantified presence in society, and approaches to the study of sporting bodies are often scientific in nature, their cultural significance is widespread and well-researched. My research sits at these intersections and examines how scientific and quantified ways of knowing the active human body are connected to larger social, political and economic forces, provide inequitable representations, and shape understandings of race, ethnicity, gender, class, age and ability. As the sports industry (from journalism to player management) and sporting research (from kinesiology to sport management) become more data, data analytic, and data visualization driven in a digital era, I am interested in understanding what and who is being left out (of both representation and usage) and how similar methods could be harnessed to process and highlight social issues in sport and their complexity and nuance. I think this final question aligns with wider humanities visualization discussions that are attempting to understand the possibilities and limitations of visualization for cultural and civic data, and that the duality of sports as a field of research offers unique contributions and perspectives to these conversations and presents a need for their inclusion in them.

Christopher Le Dantec
My research entails integrating community participation into the design and creation of digital tools for civic engagement. This often means working to synthesize lots of different kinds of data into meaningful and persuasive interfaces that help communities advocate for their needs. Projects may work entirely with oral history data and need to provide ways for community members to construct arguments about their neighborhoods piecemeal from the diverse voices contained in that oral history. Other projects may rely on instrumented data where policy makers and city officials need to make sense of community needs through crowdsourced data production. Across the different kinds of projects I am involved in, I am interested in how different visualization techniques can be brought to bear or created to convey specific arguments, whether those arguments are expressly created by a concerned public or whether they emerge through sense-making tools for professionals working with publicly produced data.

Nihad M. Farooq
How do we collect data and trace associations from narratives of abduction, separation, dislocation, and omission from the public record? How did slave networks operate across these varied temporalities, affiliations, and experiences to create unified structures of organization and revolt in the Atlantic world? My larger work examines how news and information traveled, and how networks of communication forged disparate yet powerful communities of resistance in the Black Atlantic. I suggest an expansion of the recent concept of diasporic dispersal and the resulting “underground” enabled by it, as a vast network of fleeting encounters that drew its very strength from transience and invisibility, and that relied on the willful occlusion of intent, evidence, and tracking as a deliberate strategy of resistance. As such, this is a project that is organized around the resistance to data, and the politics, potentialities, and challenges of constructing narratives from elision, and of attempting to render the ephemeral.

Sara Palmer
I’m interested in exploring how to organize humanities data in ways that can support a variety of visualizations. In designing relational databases, I have encountered the tension between immediate and future research goals as well as the challenges of constructing complex queries to meet the needs of different visualization tools.

What are the initial questions or activities you use to begin a new visualization project?

Jessica Yurkofsky
One of the things I find most exciting about visualizing unfamiliar data is opening myself up to unexpected discoveries and serendipity. In the early stages of projects, I find it valuable to rapidly iterate on which aspects of the data are the focus, which fields are turned on or off, and look for the occasions where a rich and useful visual noise is generated, patterns that ask their own questions and provoke curiosity; surprising balances of light and dark, or unexpected ripples that emerge when overlaying large amounts of text. Moving back and forth between this exploratory mode of design and one driven by research questions helps me to think laterally, to discover more, and to layer moments of "interestingness" onto the visualization.

Blacki Migliozzi
Usually during the process of cleaning, I start by trying to understand how the data was created and for what purpose. I leverage charting tools to quickly plot and map the data in as many ways as possible to get a sense of what it looks like. I sketch on paper and try to imagine how one would experience the data.

Early on I try to talk to a domain expert to get a sense of what they feel is important in the data. If possible I try to instill some value judgements about what are responsible ways to represent this data. In an effort to try to understand the target audience I try to speak with non-experts. I attempt to explain to them conversationally my understanding of the data and what I find interesting and challenging.

Song Hia
These are a few of the questions I might ask when approaching a new visualization project.

Choosing A Dataset: Why am I interested in a particular dataset? Who does it hold value for and why?

Examination: What is the origin of the dataset? What are the elements of the data? What attributes and patterns are available to potentially express? What’s missing? How can I learn more about the relationship people currently have with the information at hand?

Sketching Experiences / Prototyping: What stories can I tell? How can the data be grouped and presented? Can people interact with the data? Can I make the experience feel personal? Is it appropriate to make the data playful? Is there something interesting that can be discovered by recontextualizing the data or combining it with different datasets or forms? What do I hope people gain from the experience and how can I convey this?

Jim Foley
I have always been fascinated by the challenge and process of mapping data to visual representations. So I strive to understand underlying data models and the questions that users might ask about the data, and then imagine new visualizations, and empirically test those visualizations.

Jenna Fizel
I usually have two concerns when I start a visualization project: what are you trying to convey and what are the qualities of the data? I pretty strongly believe that any visualization must have some thesis or intention. The data may either support or contradict that intention, but without it as an organizing principle it’s really hard to make the visualization meaningful to viewers. Equally important is understanding the quality and texture of the data being used. The content, completeness, uniformity, size and reliability of the data itself should be reflected in the visualization. A good visualization should demonstrate the viewpoint of the creators while being as clear and transparent as possible about the qualities of the supporting data.

Catherine D'Ignazio
I try to learn as much as possible about the human and technical web of relations that produce the dataset. Why is it collected? Who is it for? Who are the stakeholders? Who processes it? Material things - Do people write on paper to log things, use handhelds, use sensors? What decisions are made with it? Increasingly I'm interested in the idea of producing not just data dictionaries but user guides to data sets that verge on the ethnographic. I haven't done this yet but I'd love to talk about it with folks in the workshop. I think those could hold some really interesting possibilities for visualization, too. I.e., you first collect and visualize the sociotechnical, political, contextual metadata about your data. I'm also interested in data sets with distinct and idiosyncratic human voices. So, for example, my students and I have been working with the City of Boston to "visualize" citizens' ideas for the future of transportation in Boston as animated GIFs. In all the projects and processes, the consideration of site and audience/community is paramount. Who is the visualization for? How do we architect attention? What is our theory of change?

Paolo Ciuccarelli
We don't use any specific tool; we start any (research) project by discussing the expected results with "clients", stakeholders, and/or prospective users, with recurrent questions being "what do you want to see (or to make visible)?" and "for what purpose (what do you want to enable)?"

Often we use visualization (quick drafts and sketches) as a tool to provoke feedback (especially when it doesn't come easily).

Matthew Battles
Our visualization research explores collections and the institutions that hold them as sociotechnical systems for the display, preservation, and discipline of objects of abiding interest. Our methods include participant observation, critical reflection, and expressive visualization of metadata as cultural artifacts always in the making and traces of the correspondence of multiple communities. In the art museum, where much of our work currently takes place, information networks act as cybernetic systems of aesthetic immanence, cool mediators of the methods and materialities coded as “art” by the disciplines that claim and structure the institution. In museum catalogs, interactive displays, and online presences, the design and material constraints of data systems dialogue with art history, connoisseurship, and instrumental valorizations of art in ways that act to privilege, efface, or syncretize multiple, overlapping sets of norms. Through visualization, scholarship, and multimedia installations, we encounter boundaries of practice, meaning, and value—always alive to the dark abundance of unglimpsed possibility lying beyond the walls of institutions and the norms that discipline and define them.

Alex Endert
I tend to think about the tasks of the people who will be using the tools I create to make sense of their data. This includes semi-structured interviews, observations of tasks using current processes and tools, and general requirements gathering.