Program

(T002)
The Lives and Deaths of Data
Location 122
Date and Start Time 01 September, 2016 at 14:00
Sessions 5

Convenors

  • Sabina Leonelli (University of Exeter)
  • Brian Rappert (University of Exeter)

Short Abstract

This track investigates the relational constitution of data: how stages in the life of data articulate to one another and the challenges involved in storing, moving, classifying, manipulating and interpreting them.

Long Abstract

This session explores the collectivities emerging through data collection, dissemination, assemblage and analysis. Analysing the ways in which information becomes taken as given things, the manner in which data and their varying contexts of use are co-constituted, and the means by which utility is invested and divested in them provides a platform to explore and challenge the powers attributed to "Big" and "Open" data by governments, lobby groups and institutions around the world. With its long-standing attention to the conditions of knowledge production, STS scholarship is well positioned to reflect on the value(s) attributed to data under a variety of different circumstances, how such attribution changes in time, and what this indicates about the properties of the objects being identified and used as 'data' and of the communities involved in such identification and use. Questions to be addressed include: What would it mean to speak of the birth of data? How do they develop, especially when they are used for a variety of purposes by different stakeholders? Do they ever cease to be data, and how can we conceptualize situations in which data are dismissed, forgotten, erased, lost or regarded as obsolete? This session will be organised as a set of individual presentations encompassing several different aspects and areas of data use. We aim to allocate between 15 and 20 minutes per paper, and to appoint chairs who can also work as discussants, helping to bring the content of the papers together.

SESSIONS: 5/5/5/5/4

This track is closed to new paper proposals.

Papers

Journeys and Deaths of Scientific Data

Author: Sabina Leonelli (University of Exeter)

Short Abstract

This paper discusses the idea of data journeys and its implications for the conditions under which objects can cease to be treated as scientific data. It is grounded in an ongoing empirical study of the movements of data across a variety of situations.

Long Abstract

This paper discusses the idea of data journeys and its implications for understanding how objects come to be treated as scientific data, and cease to be treated as such. I focus on cases where online databases act as crucial passage points for data travel, documenting the types of expertise, resources and conceptual scaffolding used by database curators and users to expand the evidential value of data thus propagated. In particular, I consider 'omics' data gathered on model organisms (particularly thale cress and yeast); phenomics data gathered on plants; and data about cancer mutations gathered on human and non-human organisms. From the reconstruction and qualitative analysis of such data journeys, I draw a conceptualisation of data as relational objects, whose epistemic role is defined by their use as prospective evidence for claims. In this view, data are mutable mobiles, and their integrity is a matter of lineage rather than of stability and reticence to change. Within this framework, data death is a common occurrence and happens whenever the objects that serve as data are lost, misplaced, mistreated and/or forgotten. At the same time, death is not necessarily the final stage of the life of data. The same dataset can die and resurrect in a variety of forms and for different purposes, depending on the treatment of the objects concerned and the materials, infrastructures, institutions and cultures within which such treatment is embedded.

A new life abroad: the portability of racialized data.

Authors: Andrew Smart (Bath Spa University)
Kate Weiner (University of Sheffield)
Catherine Will (University of Sussex)

Short Abstract

We examine the life of data on the effectiveness of treatments for hypertension, in particular how racialized data circulates internationally. The paper explores how data that is encoded by national-ethnic labels can be decoded and re-labeled, giving it a new life abroad.

Long Abstract

In this paper we examine the life of data on the effectiveness of treatments for hypertension, in particular how racialized data circulates internationally. The use of racialized categories in biomedical research and practice is controversial for a variety of social and scientific reasons, but it continues nonetheless. We will reflect on the portability of racially categorized data across national borders and the 'work' and logics that are necessary to enable the data to have life beyond the land of its birth. Our empirical work has focused on prescribing guidelines for hypertension in England and Wales and the data that underpins the racialized treatment pathways they recommended. We were interested to understand the controversies that might surround the production and use of prescribing guidance that required practitioners to make judgments about a patient's race/ethnicity. We undertook semi-structured interviews with experts involved in guideline development, and we traced a variety of documentary sources, including the published clinical trials data that was cited as evidence in the guidelines. One debate that emerged related to using clinical trials data that predominantly came from 'African Americans' to guide prescribing for 'Black British' patients. The paper explores how data that is encoded by national-ethnic labels can be decoded and re-labeled, giving it a new life abroad.

Key Issues in Social Studies of Disclosure Control

Authors: Andrew Turner (University of Bristol)
Madeleine Murtagh (University of Bristol)
Paul Burton (University of Bristol)

Short Abstract

We highlight the complex and changing contexts in which sensitive health data live, focusing on disclosure control methods that seek to create data, or to create specific contexts for data, that allow it to be considered low-risk, anonymous, and useful.

Long Abstract

Sharing of health data is crucial in order to reduce waste and inefficiency as well as to maximise the scientific potential of data. However, there is a well-recognised need to protect the confidentiality of participants and their data. Consequently, calls for increasing openness of health data have been accompanied by new norms and techniques to evaluate and control disclosure risk.

This presentation will address the themes of the track by highlighting the complex and changing contexts in which sensitive health data live; focusing on how and why disclosure control methods seek to create data, or to create specific contexts for data, that allow it to be considered low-risk, anonymous, and useful. We review some of the central concepts deployed in the literature about disclosure control. For example, the construction of risk in terms of the internal statistical properties of data, and in terms of the external context, or 'data environments' (Elliot et al, 2010), which data inhabit and move through. Finally we provide some suggestions for areas where STS scholarship may usefully engage with the field of Statistical Disclosure Control.

Creating Infrastructures: The Rise and Imaginary of Microfilm (1920-1950)

Author: Estelle Blaschke (University of Lausanne)

Short Abstract

The paper investigates the history of microfilm as a missing link between the materiality of paper and the immateriality of the digital.

Long Abstract

The history of microfilm ties into the earliest and deepest imaginaries present since the invention of photography: the dream of immateriality, of 'collecting everything', and of providing access to vast archives and collections. Through the investigation of seminal projects and events between the 1920s and the 1950s (Project A, World Exhibition 1937, Emergency Program, UNESCO Mobile Microfilm Unit), this paper sheds light on the formation of transnational networks of people, companies, research and governmental institutions which propelled the idea of microfilm as a future, 'global' information technology. While the modern history of microfilm is rooted in Europe, it was developed, tested and advanced in the United States in the form of large-scale copying programs for foreign manuscripts, books, newspapers, pictorial material as well as government and business data. This paper reflects on the intellectual, economic and political apparatus that was put in place to enhance the ways in which scientific and historical sources in particular were shared, diffused, preserved and appropriated through photography. But while the microform (microfilm, microfiche) was established as an information and preservation technology during the 20th century, the transition to the digital medium as a further step in the dematerialisation of storage devices puts into question the longevity of data and the sustainability of media.

Data birth, transformation and use in complex systems sciences

Author: Fabrizio Li Vigni (École des Hautes Études en Sciences Sociales)

Short Abstract

Data are supposed to talk about the study objects. What's their origin, transformation and use in complex systems sciences? Three different cases will allow us to propose a classification of the different relations to study objects in this interdisciplinary field of research.

Long Abstract

In the last 10-15 years, computing power, as well as the Big Data era, have allowed scientists to reach ambitious goals. Modeling and simulation are thus central to old and new sciences, especially in the field of complex systems sciences. But they produce a new kind of experience, which can have a different relationship to data. In particular, in many subfields of complex systems sciences, the study objects cannot be grasped directly, so they must be reconstructed through a long interpretative process. How do scientists using these tools obtain legitimacy and credibility, not only among their peers but also among policy makers? Before being able to answer this and other questions, I think we must answer another, more fundamental one: What is the epistemology inscribed in these sciences and what does it imply for the treatment of data? Refusing the normative, overhanging viewpoint of philosophy of science alone, I tried to describe the actual epistemologies of three subdomains of complex systems sciences by means of an ethnographic study of their practices. I chose a laboratory of biology, one of epidemiology, and one of geography. Data are supposed to talk about the study objects. What's their origin, transformation and use in complex systems sciences? I have used different theoretical frameworks to propose a classification of the main epistemological approaches that one can find in these fields in relation to their distance from the study object. This work can shed new light on a socio-epistemological approach to sciences.

When are data? Reflections on the making (possible) of research data

Authors: Jutta Haider (Lund University)
Sara Kjellberg (Lund University)

Short Abstract

A study of how research data is approached in the building of big science facilities. A focus on temporal aspects makes visible how the meaning of data is contingent on when they are approached, on the possibilities of support infrastructures’ materiality and on the strategic roles data are assigned.

Long Abstract

Digital research data is often advanced as a trouble-free concept and used to describe the material of research at all stages in a research process. However, such framing frequently lacks an understanding of the roles of disciplinary and organisational cultures, and of the various practices, including those of funding policy and university administration, involved in shaping what research data are, what they can do, and when. In this paper we explore data and the making of data during the construction of two big science facilities in Sweden, the European Spallation Source (ESS) and MAX IV. Specifically, we focus on the necessary infrastructure for dealing with various aspects of research data management (RDM). We transform the question "What are data?" into "When are data?" and highlight how research data are also shaped by their entanglement across different support services and professional practices aiding, preceding and succeeding the work of the researcher. This brings into relief how the making of data relates to the making possible of data, and that making possible occurs by setting up and planning for production, storage, and use of research data. Paying attention to temporal aspects in the construction of research data allows us to show some ways in which the meaning of data is contingent on when they are approached, on the possibilities of support infrastructures and their various materialities, and on the strategic roles data are assigned. This paper draws on a study carried out in 2014 and 2015. The material used was compiled in interviews and document studies.

How not to get scooped

Author: Goetz Hoeppe (University of Waterloo)

Short Abstract

This paper traces how members of an astronomical research collaboration achieved agreement on a digital astronomical dataset as they navigated the changing context of other projects and the dangers of being scooped by competing teams.

Long Abstract

My contribution to the panel considers how members of a collaboration of research astronomers have worked together in making a digital astronomical dataset using observations from diverse telescopes around the globe and in space. It traces how members achieved agreement on the dataset as they negotiated the temporalities of production of its constituent data and its value, navigating the changing context of other projects and the dangers of being scooped by competing teams.

Combining ethnography with an ethnomethodological analysis, the paper draws on fieldwork at an astronomical research institute, at two observatories, and at group and collaboration meetings, where I have documented a series of instructional and collaborative interactions. These include teacher-student and peer interactions at screen-work, group meetings and teleconferences, which I have documented with audio and video recordings.

The Limits to Data Sharing in Low-resourced Research Environments

Authors: Brian Rappert (University of Exeter)
Louise Bezuidenhout (University of Notre Dame)

Short Abstract

Based on fieldwork with (bio)chemistry laboratories in sub-Saharan Africa, this presentation examines the global inequalities in research capabilities that lead scientists to opt not to share their research data with other scientists.

Long Abstract

'Open Data' has recently emerged as a prominent label for renewed attempts to promote greater exchange in science. As part of such efforts, the release of data is often portrayed as mutually beneficial: individual scientists accrue greater prominence while at the same time fostering communal knowledge. This assumption, however, is not unproblematic. Based on fieldwork with (bio)chemistry laboratories in sub-Saharan Africa, this presentation examines a variety of reasons why scientists opt for closure over openness. We argue that the heterogeneity of research environments calls into question many of the presumptions in Open Data. Inequalities in research environments can mean that moves towards sharing and openness create binds and dilemmas. These observations suggest that if scientists in low-resourced research settings are to enjoy the increased credibility that Open Data offers, more must be done. Indeed, those promoting openness in data sharing must critically examine current research and funding structures that continue to perpetuate these inequalities. This presentation will conclude by suggesting a novel approach to facilitating openness in research by enabling scientists to address their day-to-day demands. Such a starting basis provides an alternative but vital link between the aspirations for science aired today and the everyday challenges of undertaking research in low-resourced environments.

The shaping of an e-research infrastructure: drawings as equipped data

Authors: Dominique Vinck (Lausanne University)
Pierre-Nicolas Oberhauser (University of Lausanne)

Short Abstract

The paper accounts for a research project aimed at studying children’s representations of supernatural agents. The project team is going through the set-up and analysis of thousands of drawings made by children from various regions of the world. Our study documents the process of equipping these data.

Long Abstract

Digitization of human products leads to data and databases that open new opportunities for researchers in the humanities and social sciences. However, the process is not straightforward. The path that leads from "raw" material to usable data is a tortuous and complex one. The shaping of digital data appears to be a key stage during which individuals and teams reconsider not only the objects under study and their research goals but also their competences, tools and relationships to other disciplines (e.g. IT experts).

The paper is based on a two-year participant-observation inside an interdisciplinary research project aimed at studying children's representations of supernatural agents. The project gathers scholars and practitioners from developmental and cultural psychology, social psychology, religious studies and computer science. Together, they are going through the set-up and analysis of thousands of drawings made by children from various regions of the world (Brazil, Iran, Japan, Romania, Russia, Switzerland, etc.). Our paper explores the emergence of the research collective and the way collaboration is achieved (or endangered) between team members. It documents the constitution of the data (digitized drawings and metadata), the database and the analytical tools. It shows the emergence of an e-research infrastructure and the way choices of various orders are progressively blackboxed into data, IT tools, research protocols, organization and skills. Central to our argument is the notion of data equipment, which we use to describe the process of adding various entities to data in order to enable their circulation and use.

Linkage, Exploration and Gatekeeping: The Role of Information Security In Biomedical Data Journeys

Authors: Niccolo Tempini (University of Exeter)
Sabina Leonelli (University of Exeter)

Short Abstract

The paper explores how information security strategies and solutions affect the trajectories and directions of data journeys and data-intensive discovery, on the basis of the ethnographic study of two linkage infrastructures for biomedical and environmental data based in the UK.

Long Abstract

Whether data consigned to databases are accessible and useable depends partly on the strategies employed by database developers and curators to keep data alive as potential evidence and valuable commodities. As is well documented by STS scholars and Open Science advocates, those strategies are informed and constrained by many factors, ranging from financial and human resources to available materials, skills, expertise, policies, incentives and institutional locations. In this paper, we explore one crucial factor affecting the inclusion, accessibility and re-usability of data in databases, which has received little attention within STS so far: the management of information security strategies and policies, and its embedding in the material, social and regulatory landscapes of research. We show that security concerns and solutions exert a strong influence on the trajectories and outcomes of data sharing efforts, and the results can be at odds with the emphasis on exploratory research typical of open and big data science. To this end, we build on an ethnographic study of two data linkage infrastructures in the biomedical research domain: the Secure Anonymised Information Link (SAIL), a databank based in Wales that aims to facilitate appropriate re-use of routine health data generated through public services and of otherwise unavailable datasets generated by scientific projects; and the Medical & Environmental Data Mash-up Infrastructure (MEDMI), which brings together researchers from the Universities of Exeter and Bristol, the London School of Hygiene & Tropical Medicine, the Met Office and Public Health England to link and analyse complex meteorological, environmental and epidemiological data.

Epistemic Data Cultures

Authors: Clifford Tatum (Leiden University)
Alex Rushforth (Leiden University)
Sarah de Rijcke (Leiden University)

Short Abstract

In this study we shift focus from concerns of open data to a stratified account of data sharing practices. Through in-depth case studies, the aim of this approach is to develop a better understanding of established data practices as a means to inform the challenges and opportunities of the Open Data movement.

Long Abstract

Studies of open data often focus on the status and potential of making data publicly available for reuse by academic actors situated outside of the local context in which they were produced or by public actors not directly associated with academic research (Borgman 2012). This formulation of open data imagines the widest practical range of potential (re)users and invokes significant effort to prepare data for use by unknown others. Often overlooked in this approach is the assessment of data practices that occur in fields with a tradition of data sharing that would not be considered 'open data'.

In this study, we shift the focus from concerns of public access to a stratified account of data sharing practices. We expand the conceptualization of openness to include epistemic concerns, such as: facilitating discussion about the practicalities of making data reusable, confronting concerns about transparency and validity, foregrounding concerns about globalization of research, and drawing attention to the commodification of data (Leonelli 2013:7).

To achieve this, we investigate data sharing practices within three fields: Soil Science, Human Genetics, and Digital Humanities. Empirically, we draw on interviews of key actors involved with data collection, analysis, and deposition. With this disciplinary mix, we expect to find new and emerging roles associated with data, and multiple configurations of data sharing within each of the selected cases. The aim of this approach is to develop a better understanding of established data practices as a means to inform the challenges and opportunities of the Open Data movement.

The role of samples in the "birth" of data

Author: Gregor Halfmann (University of Exeter)

Short Abstract

This paper explores the role of samples and materiality in the creation of long-term, oceanographic plankton data. In my case study, the “birth” of the plankton data crucially depends on the creation, handling, and manipulation of samples, which I aim to consider as epistemic things.

Long Abstract

In light of recent considerations of scientific data as constituting a "life cycle" or "journey", this paper zooms in on a local case of a data life cycle's first stage, the "birth" of data. The paper is based on an empirical study of the production of spatio-temporal oceanographic plankton data at the Sir Alister Hardy Foundation for Ocean Science in Plymouth, UK. The aim of the paper is to reconstruct the process of data production and focus on the relation of samples to scientific data. The samples in this case are produced by mechanical devices, which filter the ocean water through silk while being towed behind commercial ships, squashing organisms between two silk layers. The handling and analysis of these samples in order to produce data require careful manual steps of manipulation, microscope usage, and counting. I will consider a view of the silk samples as an example of epistemic things and their creation as the beginning of the history of an object, from which a data life cycle as well as a sample life cycle originate. I aim to discuss the requirements of displacing and using the samples for data creation, what exactly distinguishes samples and data, whether samples might be regarded as a specific form of data or vice versa, and whether the "birth" of data is preceded by an inevitable "birth" of samples.

Data and natural history: Do museums dream of digital insects?

Author: Tahani Nadim (Museum für Naturkunde Berlin)

Short Abstract

Based on ethnographic work at the Natural History Museum Berlin, I attend to questions of loss in the mass-digitization of natural history collections. Combining the sociology of data and infrastructural studies, I query the nature of digital specimens and the hopes and promises pinned on them.

Long Abstract

Big natural history museums have begun digitizing their collections on an industrial scale. Digital production lines are turning molluscs, pressed plants, microscopic slides and millions of insects into data objects. The National Museum of Natural History Paris has digitized most of its herbarium; the Natural History Museum London has just begun mass-digitizing its 80 million specimens; the Naturalis Biodiversity Center in Leiden has already made more than 30 million specimens digitally available; and the Natural History Museum Berlin has recently completed a pilot project mass-digitizing 10,000 insect drawers. The ambition driving these efforts and iterated in national and international roadmaps is capacious: increase accessibility of collections, rationalise collection management, aid preservation, facilitate monitoring and conservation, allow for discoveries "born from the data", address societal needs and interests. Driven by the prospect of irrecoverable loss and decay and the promises of globally coordinated data-intensive biodiversity science, the production of digital specimens has thus emerged as a key response to institutional, environmental and political pressures. Yet, production of digital specimens is characterized by a distinct register of losses and absences (What gets digitized when, by whom? How is it made intelligible, for whom?). In this presentation I wish to attend to questions of loss by examining data and digitization practices at the Museum für Naturkunde Berlin. Based on ethnographic work at the museum, my analysis combines insights from the sociology of data and infrastructural studies to problematise the nature of digital specimens and the hopes and promises pinned on them.

Data friction and the power dynamics of meteorological data infrastructures

Authors: Jo Bates (University of Sheffield)
Paula Goodale (University of Sheffield)
Yuwei Lin (University for the Creative Arts)

Short Abstract

We compare three cases in which people are engaged in efforts to reduce and/or maintain “friction” in the movement of meteorological data between different sites, and explore the role of data friction in the emerging power dynamics of meteorological data infrastructures.

Long Abstract

Whilst data can be mobile between different sites of data generation, processing and use, they do not often 'flow' easily. As data move they experience "friction" (Edwards, 2010) which slows down or blocks their movement. These frictions are significant to the ways in which data and their complex socio-material contexts are co-constituted. It is therefore important to observe sites of potential movement, blocked movement and lack of movement, and to think critically about the unseen "conflicts, disagreements, inexact or unruly processes" (Edwards et al, 2011) shaping data movements in a seemingly harmonious aggregated data infrastructure.

Through development of the "data journeys" methodology, we identified sites within the UK's meteorological data infrastructure where people are working to reduce and/or maintain data "friction" for different ends. Here, we discuss findings from three such sites. Firstly, the struggles of archivists, climate scientists and citizen scientists working on the Old Weather Project to recover 'lost' data from the logs of historical ships and move them into the ICOADS dataset. Secondly, policy developments aimed at "opening" meteorological data for commercial re-use to spur innovation in the weather derivatives industry. Thirdly, the maintenance of "data friction" through the commercialisation of data in an effort to sustain the physical infrastructure of Sheffield Weston Park weather station - one of the oldest stations in the UK - in the context of deep public spending cuts. The paper contributes to understanding about how power dynamics shape the movement of data through knowledge infrastructures, and demonstrates a new methodology for capturing such insight.

New energy data in the making: meaning, value and governance

Authors: Mette Kragh-Furbo (Lancaster University)
Gordon Walker (Lancaster University)

Short Abstract

New ‘smart’ metering technologies and associated software enable a more dense, spatially and temporally differentiated view of patterns of energy use. Yet, what does it take to make this ‘smart’ energy data meaningful? Preliminary thoughts and research findings are discussed.

Long Abstract

New 'smart' metering technologies and associated software have made it possible to know energy consumption in new spatial and temporal terms. The mundane world of metering is being transformed by organisations that are marketing hardware, software and analytical services that can enable a much more dense, spatially and temporally differentiated view of patterns of energy use. The 'smart' meters and their associated milieu of infrastructures of different forms, and devices of knowledge management, data processing and data representations can then be said to give birth to 'new' data on energy use, in which this flow of data across 'smarter' energy grids enables new ways of knowing and evaluating energy consumption as well as generating new possibilities for the active governance of energy demand. Yet, what does it take to make this data meaningful? While this kind of energy data is attributed value because of its (imagined/presumed) capacity to govern energy demand and enable participation in energy markets, how does this play out in practice? And in line with the track's focus on the birth and death of data, does 'smart' energy data ever become 'stuck' or 'dead' data? (cf. Dawn Nafus 2014). In this paper, we present preliminary thoughts and findings from a research project on the governance of energy demand in 'smarter' local grids within large organisations in the UK.

Hidden Cooperative Specialization in a High Energy Physics experiment

Authors: Emiko Adachi (RIKEN)
Yasunobu Ito (Japan Advanced Institute of Science and Technology (JAIST))

Short Abstract

In this case study, we examine the collaborative work of a high energy physics experiment and demonstrate the divisional cooperation scheme.

Long Abstract

In this case study, we examine the collaborative work of a high energy physics experiment, "The PHENIX Collaboration", at Brookhaven National Laboratory's Relativistic Heavy Ion Collider. Since all authors are treated equally, its papers list 500 authors in alphabetical order. However, roles are precisely distributed inside the collaboration, and members know who contributed to a paper and how. We conducted interviews with current and former scientists in the collaboration and, for the quantitative study, analyzed author transitions across 140 papers between 2000 and 2014. The PHENIX Collaboration has a ten-step process for the creation of scientific papers, including internal review mechanisms. Through the ten-step process, we find the divisional cooperation scheme in the collaboration. Detector constructors mainly focus on the construction or upgrade of a sub-system and on data collection. Computer specialists support data collection, data calibration, and production jobs. They contribute to steps prior to data analysis and are hardly involved in physics data analysis. Almost all detector constructors and computer specialists are senior and have tenured positions in universities or institutions. This case study demonstrates how detector constructors, computer specialists, and data analysts cooperate to produce a paper inside the collaboration.

Data Analysis and the Perceived Value of Data

Author: Jessey Wright (University of Western Ontario)

Short Abstract

Data often requires significant analysis to be used as evidence. Distinguishing between perceived and actual value, I use the interpretation of a meta-analysis of neuroimaging data to show that the intuition about an analysis technique determines the perceived value of data.

Long Abstract

Faced with 'big', or otherwise complex data sets, scientists use analysis techniques to isolate data patterns that are relevant to their research. In this paper I show how the perceived value of data is, in part, determined by the methods available for probing the content of data, and the intuitive understanding of what the patterns isolated by those methods are about. This value can change over time as new techniques are developed, and as the conceptual understanding of existing techniques changes. To demonstrate this I review a recent debate over the interpretation of meta-analyses provided by NeuroSynth, an online database that correlates brain activation coordinates and terms used in neuroimaging publications. Neuroimaging data are subject to significant processing and analysis in order to isolate patterns in the data that can be used as evidence. The specific patterns isolated, and their interpretation, depend on a conception of the phenomena under investigation and of what patterns are regarded as evidence. The claim prompting the debate is that patterns isolated by NeuroSynth's 'reverse inference' analysis can support claims about the selectivity of brain regions for cognitive functions. The disagreement is between the published authors and the database developer, and rests on differing intuitions, or understandings, of what NeuroSynth's automated analyses are about. I show that an intuitive understanding of analysis techniques determines the perceived value of data, which can be distinct from its actual value. I conclude by situating this in the context of philosophical discussions about conceptual practices in data-intensive science.

Molecular Tumor Boards: data interpretation in the age of sequencing

Authors: Alberto Cambrosio (McGill University)
Pascale Bourret (Aix-Marseille Université / SESSTIM)
Sylvain Besle (INSERM)

Short Abstract

Based on the comparative analysis of the activities of Molecular Tumor Boards in North America and Europe, the paper explores the co-production of data and their interpretation within these collective forums devoted to the discussion of the results of the genomic analysis of patient tumors.

Long Abstract

The adoption of high-throughput technologies in oncology has led, among other things, to the establishment of a new kind of institution, referred to generically as Molecular Tumor Boards (MTBs). MTBs provide a forum for clinicians, molecular biologists, and bioinformatics specialists to discuss the results of the genomic analysis of patient tumors, and make therapeutic recommendations on that basis. To analyze sequencing and gene expression data, MTBs resort to a heterogeneous set of evidential resources, including a number of genomic databases, publications, clinical trial results, previous experience with other patients, and basic knowledge about mutations and genetic pathways, all to be related to the singular clinical trajectory of individual patients. While individual MTBs share a common purpose of providing an "informed" data interpretation, the means of reaching that goal differ from one MTB to the other: the actual composition of the MTBs, the extent to which molecular results are taken for granted or questioned, whether or not they resort to prioritization algorithms, the extent to which these are followed, and pragmatic considerations such as access to specific drugs. Based on the comparative analysis of the activities of several MTBs in North America and Europe, the paper explores the co-production of data and their interpretation within these institutions, emphasizing their situated aspects, and in particular how the definition of what counts as relevant data is not only an input for data interpretation, but also the outcome of interpretative practices grounded in the definition of what may count as actionable molecular alterations.

Spurious Categories: A study of data-model symbiosis in the Human Brain Project

Authors: Christine Aicardi (King's College London)
Tara Mahfoud (King's College London)

Short Abstract

We question the data-model boundary: When does data become model and model, data? How is data made strategic in this lifecycle? How has the data-model separation in the Human Brain Project fostered a human infrastructure where power/knowledge relations are disputed across the data-model divide?

Long Abstract

The Human Brain Project is a European Commission Flagship research project in computational neuroscience, of which a primary goal is to build multi-level models for the simulation of mouse and human brains. It is presented as a data integration project rather than a data production project, producing only 'strategic data' to complement and complete existing datasets required to build the models. The project extends the rhetorical separation between data and model to the organisation of the project as well as the design of its technological infrastructure, which will provide formats and tools for data collection, classification, storage and interpretation. This paper draws on fieldwork in the Human Brain Project to look specifically at Hippocampus neuron reconstruction. Through this case study, it questions the boundary between model and data, and illustrates different aspects, present in this digital reconstruction, of what Paul Edwards calls 'data-model symbiosis' (Edwards, P. N. 2010. A vast machine: Computer models, climate data, and the politics of global warming. MIT Press). In particular, we ask: When and how does data become model and model, data? How is data made strategic in this data-model life-cycle? We argue that the data-model separation in the Human Brain Project has fostered the development of a complicated 'human infrastructure' where power/knowledge relationships are being disputed and re-negotiated between participants across the data-model boundary.

Preconditions, Procedures and Potentials: Data in Post-Genomic Cancer Research

Authors: Imme Petersen (University of Hamburg)
Regine Kollek (University of Hamburg)

Short Abstract

The variety of omics data and their large volume require bioinformatics approaches in data handling and processing. This paper discusses the impact of bioinformatics on cancer research and the consequences it may have for the process of knowledge production in biomedicine.

Long Abstract

In biomedical research, high-throughput technologies for molecular profiling are employed to produce large amounts of data on genomes, transcriptomes, proteomes and epigenomes, often summarized under the label of omics data. The variety of such data and their large volume require new approaches to data storage and processing. Bioinformatics tools have been used to resolve the challenges of systematising, integrating and sharing these large data stocks. At first glance, the application of bioinformatics simply seems to support data handling. However, a careful analysis of the acquisition and processing of such data in cancer research shows that they are systematically transformed by the supporting bioinformatics procedures. In particular, automation, standardization and quantification of omics data are treated as necessary preconditions for the data to be regarded as reliable, valid and ultimately useful for cancer research. Yet these procedures are usually not seen as an integral part of the research process itself. In contrast, we argue that data processing not only supports scientific endeavours but also fundamentally influences the nature of data, and hence the knowledge produced. This argument is based on an ongoing empirical analysis of research consortia that use omics data in post-genomic cancer research in Germany. Using qualitative ethnographic methods, we will trace how omics data are processed according to bioinformatics requirements, how scientists using such data in research perceive such processing, and what consequences for knowledge production arise from it.

The Life and Death of Big data in Education

Authors: Assunta Viteritti (University Sapienza)
Orazio Giancola (Sapienza - Università di Roma)

Short Abstract

The paper aims to investigate the constitution of data in education at the international level and to analyze how different configurations, articulations and extensions of the two macro-actors, IEA and OECD, contribute to creating living or dead data.

Long Abstract

This paper investigates, from an STS perspective, the constitution of big data in education and explores the different actor-networks emerging through data collection, dissemination, assemblage and analysis at the international level. The relevance of the international space of education has grown over the years thanks to the many devices, networks and sociomaterial infrastructures produced by Large Scale Assessments (LSAs), which have become new socio-technical actors at the European and global levels. In LSA, the two most important networks in the world are IEA and OECD. The first includes the most important LSAs, such as PIRLS (reading literacy), TIMSS (math and science) and CIVED (civicness). The OECD education network produces various recurring LSAs, among them PISA (math) and PIAAC (adult literacy and numeracy). The paper reflects on the effects produced by the two major networks in terms of the life or death of data. LSA data produce values, labels, conceptions, allies and enemies. They are neoliberal devices of government but also producers of alignments and participation. The paper analyzes how different configurations, articulations, extensions, artefacts and knowledges of the two main macro-actors (IEA and OECD) contribute to co-creating living data (open and public, visible and usable by scholars, media, governments, lobby groups and institutions around the world) or dead data (produced by narrow networks, used only by academics without becoming live, public sociomaterial objects).

Data Phantoms: The Uncanny Lives of Data Assets

Author: Mary Ebeling (Drexel University)

Short Abstract

Health data are collected and made "useful" in medical marketing. I explore how data brokers make assets out of health data. These assets are unleashed by marketers and become "data phantoms" that haunt patients, primarily through personalized health marketing.

Long Abstract

My paper focuses on how data phantoms haunt health information networks in the United States. Such information is often assumed to be "private data," as it is produced by patients under legislative privacy protection, but these data, in fact, undergo innovations and are packaged into data assets to be sold to data brokers, like IMS Health Inc. and Experian plc. I discuss how innovation transforms health information from "dead" matter into "lively" data, and in the process gives birth to both data commodities and data ownership claims made by third parties.

The bio-data asset is imbued with a "phantom-like objectivity" that takes on a power and agency of its own (Marx 1976, Vol. 1:128). The social conditions of capitalist medicine and the "Big Data" industries help to construct the data asset. In fact, a market logic suffuses the anonymization, repackaging, and abstraction of health data (Rajan 2006, 42). This is the value of the data asset and through this transmutation, the data goes on to live a life of its own in the databases of clinics, pharmaceutical companies, health informatics analysts, data brokers, credit card companies, and many other unconnected companies that directly profit from the buying and selling of private health data.

Throughout the paper I consider how these interventions raise questions about what is considered public and private information, who may lay claim to data ownership and why, and how those tensions are exploited by US capitalist medicine and the healthcare industries, especially those companies working in digital health.

(Re)making data: A case study of the Data Documentation Initiative (DDI)

Authors: Judit Gárdos (Hungarian Academy of Sciences)
Natasha Mauthner (University of Aberdeen)

Short Abstract

Using the Data Documentation Initiative as a case study, this paper explores how data archiving classification systems and standards are (re)making data, and the social sciences more generally, in historically- and culturally-specific ways.

Long Abstract

Over the past two decades the archiving of research data within the social and the natural sciences has increasingly become subject to regulation. Research funding organizations, universities, and academic journals are institutionalizing data archiving as a normative practice, while many data archives are implementing standardized classification systems for archiving and sharing data. One example is the Data Documentation Initiative (DDI), an international metadata standard for statistical and social science data. DDI comprises DDI-Codebook, used for describing data at the archiving stage of the research process, and DDI-Lifecycle, which conceptualizes the entire research lifecycle in terms of data conceptualization, collection, processing, distribution, discovery, analysis, repurposing, and archiving. DDI constitutes itself as a neutral and passive classification system, which enables comprehensive description of data for discovery and analysis, and allows effective data sharing. Drawing on STS literature which challenges both the taken-for-grantedness and the assumed innocence of classification systems (Foucault, 1970; Derrida, 1994; Ritvo, 1997; Bowker & Star, 1999; Waterton, 2002; Bowker, 2005; Sommerlund, 2006), our paper approaches the DDI as an object of study in order to explore how DDI embeds and enacts a historically- and culturally-specific conception of the nature of 'data', and of social science more generally. Following Barad (2007) and Derrida (1994), and building on our existing work in this area (Mauthner and Gardos, 2015; Mauthner, 2016), we further investigate how the DDI materializes power through a dual process of embodying a specific conceptualization of data (and social science), and naturalizing this 'privileged topology' (Derrida 1994: 3).

Beyond the deluge. Data and its invisible work.

Author: Jerome Denis (Mines ParisTech)

Short Abstract

Both advocates and detractors consider data as powerful entities. Beyond such obviousness, the history of the emergence of data in organizations and the ethnography of data work foreground the richness of such work, the conditions of its invisibilization, and the fragility of data enactments.

Long Abstract

Advocates and detractors of big or open data projects generally share the idea that data are steady and powerful entities. Whether described as a new oil whose circulation will improve transparency and innovation, as something that pours in like rain and changes the way science and politics are made, or as a technology of governance that performs unquestioned realities and reifies new inequities, data seem to be defined within the same positivist, or realist, ontology, in which their very existence is taken for granted and their agentivity is assumed to result from intrinsic properties. In this communication, I propose to question such obviousness and highlight the ecology of visible and invisible work (Star & Strauss, 1999) it performs. I will first show that the earliest investments in standardized information and the emergence of data as a valuable resource within organizations (Beniger, 1986; Yates, 1989; Agar, 2003; Gardey, 2008) are tightly linked to the mechanization and the invisibilization of information work. I will then draw on two ethnographic studies (in the back office of a bank, and in a start-up that works with French administrations) to explore some aspects of today's data work and the conditions of its invisibilization. This will allow me to foreground the fragile and uncertain process through which very different (sometimes undefined) things progressively and temporarily become data.

This track is closed to new paper proposals.