
Literature Review


D4D Code of Ethics Literature Review

If you’re working on summarizing a resource, please put your name down next to/under the title, and write your notes/summaries in this markdown page, just under the title of the resource, as you go through it.

Big Data: The Perspective of the American Statistical Association Committee on Professional Ethics

(6 pages, a summary of the document below) @lilianhj

The goal is ethical statistical practice by “all who handle and analyze data of any size or complexity”; big data raises no new ethical issues, regardless of how large or complex the dataset or analysis

It does provide new contexts (for applying the same principles):

  • New participants, many of whom are not part of a community with a strong culture of data-related ethical norms
  • Data analysts/programmers may not know the laws of probability needed to distinguish randomness from significant patterns (a quick illustration follows this list)
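
A quick, made-up illustration (not from the ASA document) of the point above: test enough random "features" against a random outcome and some will look statistically significant purely by chance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# 100 candidate "features" and one outcome, all pure noise.
n_samples, n_features = 50, 100
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)

# Naively test every feature against the outcome at alpha = 0.05.
p_values = [stats.pearsonr(X[:, j], y)[1] for j in range(n_features)]
false_hits = sum(p < 0.05 for p in p_values)
print(f"'significant' correlations found in pure noise: {false_hits}")  # roughly 5 expected
```

Transparent reporting of every test that was run, as in the SOLUTION below, is what lets a trained analyst catch this kind of mistake.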

SOLUTION:

  • Transparent reporting of assumptions, tests, and methods, and contextualization of results, so trained analysts can identify mistakes made by algorithms/untrained analysts

Further new contexts:

  • More complexity, in each aspect of big data, with the overall system often being beyond the understanding of a single individual, e.g. the merging of diverse data sources
  • Hard to track provenance and ensure that final use is consistent with the terms under which the initial datasets were collected
  • Data quality is hard to ensure under a mindset of simply collecting whatever is available, regardless of its value

HAVE TO:

  • be aware of this

Another new context: greater rewards, and greater temptation:

  • Financial profit
  • Singling out vulnerable groups for special treatment, in the name of some greater social good

HAVE TO:

  • recognize this risk exists

Audience for guidelines:

  • “those whose primary occupation is statistics and those in all other disciplines who use statistical methods in their professional work”, including anyone regardless of job title or specific field who uses and reports statistical analyses and their implications

Voluntary for all, ASA does not play a role in enforcement

GUIDELINES:

Professional integrity and accountability

  • Basically intentions and mindset
  • Use relevant and appropriate data/methodology, with no favoritism or personal prejudice
  • Work in a manner INTENDED to produce valid, interpretable, reproducible results
  • Do not knowingly accept work if unqualified
  • Be honest with the client about limitations of expertise
  • Consult other statisticians if need be
  • Understand the assumptions that are required by different methods in order to achieve interpretable results

Integrity of data and methods

  • Be aware of and upfront about intrinsic limitations, biases, and degree of reliability of the data
  • Include disclaimers and any suitable weighting when reporting non-representative data analysis
  • Do not assume that the dataset itself is what is of interest, and that existing patterns in the dataset are more important than what is representative or generalizable
Responsibilities to science, public, funder, client

  • Keep interests of public, funder, client/customer, professional colleagues, and scientific community in mind
  • Support valid inferences, transparency, and “good science in general”
  • Understand and abide by confidentiality requirements established by the data provider and/or legal requirements
  • Guard privileged information of employer/client/funder

Responsibilities to research subjects

  • Respects and protects rights and interests of human and animal subjects at all project stages
  • Protects the privacy and confidentiality of subjects and their related data
  • ANTICIPATES and seeks approval for secondary and indirect data uses (including data linkage) when obtaining subject consent
  • Gets subject consent to allow for peer review and independent replication of analysis
  • Recognizes, considers, and is sensitive to how information is framed and statistical descriptions are carried out, in order to avoid stereotypes and disproportionate harm to vulnerable groups
  • Must recognize that all decisions made in an analysis can have cascading effects beyond the publication of results, as these results are deemed ACTIONABLE

Responsibilities to (non-statistician) research colleagues

  • Ensure transparent and sufficient documentation for another trained analyst to understand and evaluate the method

Responsibilities to other statistics practitioners

  • Mutually respectful discourse, focusing on scientific principles, rigorous methods, and substantive data interpretations
  • Promote sharing of data and methods as much as is appropriate and compatible with other ethical obligations
  • Ensure transparent documentation for replication/reproducibility, metadata studies, etc.

Responsibilities regarding misconduct allegations

  • Avoid misconduct and questionable scientific practices; know how to handle each
  • “Avoids condoning or appearing to condone incompetent or unethical practices”

Responsibilities of statisticians’ employers

  • Understand and respect statisticians’ obligation of objectivity
  • Respect and rely on the expertise and judgment of the qualified statisticians they have hired

IMPLEMENTATION OF GUIDELINES:

  • Active outreach program for several years, to make resources widely available
  • Resources include guidelines, ongoing detailed discussion of the guidelines, case studies, and links to other educational resources
  • Solicit and create case studies from real practice around the world
  • Liaising with other organizations and course instructors that deal with data analysis in any field, to ensure familiarity with the guidelines
  • Understand differing cultural contexts, and how these guidelines could be interpreted/disseminated in other cultures and different ethical traditions

  • Ethical philosophical alignment
  • Overcoming linguistic barriers

Ethical Guidelines for Statistical Practice (14 pages, an expanded version of the above document) @victoria.t

There is interesting content in the Discussion section about specific cases and additional details, too!

Purpose of the Guidelines

  • Aid statisticians (and anyone in a profession involving the use of statistics) in making decisions ethically
  • Set the expectations that stakeholders of statistical results should have

Professional Integrity and Accountability The ethical statistician:

  • selects methodology based on an understanding of its limitations and ease of interpretation and reproduction
  • rejects work beyond their expertise
  • consults reputable peers for clarification
  • stands by the quality of their work

Integrity of Data and Methods The ethical statistician:

  • is transparent about any assumptions made during the statistical analysis
  • explains and shares data sampling techniques (source, creation)
  • discloses any sponsors of the analysis
  • is open to corrections if errors are found

Responsibilities to Science/Public/Funder/Client The ethical statistician:

  • presents various equally valid scenarios varying in scope, cost, or precision
  • fully outlines any consequences of failures in the sampling or the plan
  • shares knowledge to provide benefits to society

Responsibilities to Research Subjects

The ethical statistician:

  • protects the privacy and confidentiality of sensitive data
  • knows any applicable legal limitations and refuses to use unethically obtained data (as determined by the individuals who provided the information)
  • is aware of any biases and stereotypes inherent in the data

Responsibilities to Research Team Colleagues The ethical statistician:

  • aims to promote transparency in all areas of analysis
  • avoids cutting corners for the sake of delivering by a deadline

Responsibilities to Other Statisticians or Statistics Practitioners

The ethical statistician:

  • creates suitable documentation and provides clear steps to reproduce the analysis
  • contributes meaningfully to the work of others through thoughtful peer reviews
  • teaches others the practical value of statistical analysis

Responsibilities Regarding Allegations of Misconduct The ethical statistician:

  • avoids accusing others of perceived incompetence in statistical analysis, and instead engages in open discussion

Responsibilities of Employers, Including Organizations, Individuals, Attorneys, or Other Clients Employing Statistical Practitioners

Employers of statisticians are expected to:

  • recognize that the guidelines exist and are meant to protect statisticians and consumers
  • recognize that studies may not have the desired outcome, and refrain from influencing the results

Data validity checklist for data journalists (1 page) @becki

A checklist document with a series of questions for self-evaluation of data used in journalism. Topics covered:

  • Source of data
  • If and how the journalist manipulated the data
  • Who reviewed the data, and how
  • Attribution and giving cited sources, people, and orgs a chance to respond
  • Inclusion of methodology
  • Open-source considerations - publishing to Github, README

Four ethical priorities for neurotechnologies and AI (4-page journal article) @margeaux, @maria

It might take years or even decades until brain-computer interface (BCI) and other neurotechnologies are part of our daily lives. But technological developments mean that we are on a path to a world in which it will be possible to decode people's mental processes and directly manipulate the brain mechanisms underlying their intentions, emotions and decisions; where individuals could communicate with others simply by thinking; and where powerful computational systems linked directly to people's brains aid their interactions with the world such that their mental and physical abilities are greatly enhanced.

This article posits that existing ethics guidelines are not sufficient for AI

Four areas for ethical priority

  • privacy and consent
  • agency and identity
  • augmentation
  • bias

Current investments:

  • Private - Kernel, Neuralink (Elon Musk)
  • Gov’t - US BRAIN initiative (federal-level investment), DARPA
  • Academic - Duke Univ, Univ of Wash
  • Tech - IBM, Google, Apple, FB

Privacy and Consent

  • Extensive data trails already exist
  • Algorithms that are used to target advertising, calculate insurance premiums or match potential partners will be considerably more powerful if they draw on neural information
  • neural devices connected to the Internet open up the possibility of individuals or organizations (hackers, corporations or government agencies) tracking or even manipulating an individual's mental experience
  • citizens should have the ability — and right — to keep their neural data private
  • Opting out of neural data collection
  • Restricting the central processing of neural data

Agency and identity

  • As neurotechnologies develop and corporations, governments and others start striving to endow people with new capabilities, individual identity (our bodily and mental integrity) and agency (our ability to choose our actions) must be protected as basic human rights.
  • Add clauses protecting such rights (‘neurorights’) to international treaties, such as the 1948 Universal Declaration of Human Rights
  • Creation of an international convention to define prohibited actions related to neurotechnology and machine intelligence, similar to the prohibitions listed in the 2010 International Convention for the Protection of All Persons from Enforced Disappearance
  • Protect people's rights to be educated about the possible cognitive and emotional effects of neurotechnologies

Augmentation

  • The pressure to adopt enhancing neurotechnologies, such as those that allow people to radically expand their endurance or sensory or mental capacities, is likely to change societal norms, raise issues of equitable access and generate new forms of discrimination.
  • guidelines should be established at both international and national levels to set limits on the augmenting neurotechnologies that can be implemented, and to define the contexts in which they can be used — as is happening for gene editing in humans.
  • outright bans of certain technologies could simply push them underground, so efforts to establish specific laws and regulations must include organized forums that enable in-depth and open debate
  • efforts should draw on the many precedents for building international consensus and incorporating public opinion into scientific decision-making at the national level
  • recommend that the use of neural technology for military purposes be stringently regulated
  • any moratorium should be global and sponsored by a UN-led commission

Bias

  • When scientific or technological decisions are based on a narrow set of systemic, structural or social concepts and norms, the resulting technology can privilege certain groups and harm others.
  • Public discussions and debate are necessary to shape definitions of problematic biases
  • Countermeasures to combat bias must become the norm for machine learning
  • Probable user groups, including those that have been marginalized, should be involved in and have input into the design of algorithms and devices for machine learning/AI from the earliest stages of development

2016: A Year of Data-Driven Confusion (1 blog post) @emkg

This post by Zara Rahman references numerous external resources we should look into. Rahman unpacks some of the issues surrounding the two major events in 2016 that shook two of the largest economies in the world.

Both the campaigns for the US Presidential Election and the UK’s vote to leave the EU were characterized by “relentless statistical crossfire”—where it was rarely obvious if incomplete, biased, or blatantly wrong information was being cited. Sometimes data was used this way on purpose, as it was in the UK where prominent members of the Leave campaign admitted to exaggerating and making false promises to win votes. But other cases were more ambiguous, as with polling data for the US election.

Rahman advocates for building critical data literacy, and in that effort she highlights the following:

  • “thick data” — where context takes priority over simplicity. Complexity cannot be taboo; rather, it should be understood that complexity can bring nuance to light where simplified quantification can obscure the truth beyond recognition
  • diversity — diverse points of view are critical to combat bias. This means we need more diversity in the field with respect to experiences, ethnicities and identities, but also diversity of thought should be welcome. It should be seen as a sign of strength to push back on conclusions and ask tough questions.
  • responsible approaches to data (especially re: responsibledataforum.io) — where an important cornerstone is understanding power dynamics, especially how the least powerful actors are affected by the process at hand
  • the principles of Feminist Data Visualization:
  • rethink binaries
  • embrace pluralism
  • examine power and aspire to empowerment
  • consider context
  • legitimize embodiment and affect
  • make labor visible
  • applying the principles of feminist data visualization more broadly to data generally, i.e.:
  • rethinking quantification
  • prioritizing context
  • thoughtfully considering consent
  • examining power structures and dynamics etc

Questions for issues that are still unclear:

  • what are the implications if misleading, uncertain, or inaccurate poll data impacted voter behavior?
  • what can be done to build critical data literacy alongside media literacy beyond techno-solutionism and online courses, especially if we acknowledge that trust and community are vital to how people get and analyze information? Especially with communities with less access to technology in mind?
  • how should we hold each other accountable? because of the fallacy of false precision and other biases, simply correcting false figures is not broadly effective. Likewise, bad actors often see their efforts lead to victories, as in the UK’s vote to leave the EU

Using Ethical Reasoning to Amplify the Reach and Resonance of Professional Codes of Conduct in Training Big Data Scientists (22 pages) @lilianhj

  • Recommends introducing ethical reasoning (i.e. the ability to reason ethically) rather than trying to agree upon/predict all the relevant ethical issues/content areas
  • Implicit distinction between skills (in general ethical reasoning) and knowledge (in domain-specific content, particular actions, etc)
  • The idea is to use ethical reasoning skills to explore and improve one’s understanding of the specific terms/content outlined in a code of conduct, or rather, using the specific code of conduct terms as content on which to practice more general ethical reasoning skills
  • Proposes a model for training people in ETHICAL REASONING, drawing upon codes of conduct they may already be familiar with

Focus on:

  • Ethical reflection - thinking about the ethical implications of iterative decisions made throughout a project - rather than mastery of a set of facts/topics
  • Building capacity for ongoing discussion
  • A curriculum outlining desired knowledge/skills/abilities, with different performance levels (rather than specific assignments/tasks)
  • Introduce students to main domains/fundamental moral imperatives (e.g. honesty, fairness, social responsibility, honoring confidentiality, handling misconduct, respecting diversity)
  • Give them the chance to learn, practice, and get feedback on ethical reasoning skills
  • Opportunities to engage in conversation
  • Ethical reasoning skillset involves:
  • Identifying and assessing one’s prerequisite knowledge (to what extent is one competent to judge, design, and execute an analysis?)
  • Recognizing a moral issue
  • Identifying relevant decision-making frameworks (e.g. utilitarianism, social justice)
  • Identifying and evaluating alternative actions
  • Making and justifying a decision on the moral issue
  • Reflecting on the decision
  • Determining the success of an ethical training program/code of conduct can be done by evaluating its alignment with the following goals:
  • Importance/relevance to ethical or responsible conduct of research/practice
  • What this means for a code of conduct: ensure that at least one professional code of conduct is introduced and discussed, develop its main domains up to a particular target level
  • Identifying and addressing concrete deficiency
  • What this means for a code of conduct: address a lack of training in ethical reasoning
  • Targeting something that is actually, observably amenable to active intervention
  • What this means for a code of conduct: describe detailed levels of participation that are observable and assessable by participants themselves (and any instructor)
  • Achievement of intentions is documented/documentable through quantitative or qualitative outcomes
  • What this means for a code of conduct: have a self-assessment that is justified through prior work
  • Achievement of intentions results in detectable and meaningful change
  • What this means for a code of conduct: people who participate should demonstrate skills (ethical reasoning) and knowledge (professional conduct) that are currently not observed

Feasibility

  • What this means for a code of conduct: offer a complete syllabus/set of resources that can be used or adapted

Suggestions for implementation:

  • Integrate a syllabus into academic programs that train students to collect and/or use big data
  • Make resources available for people at any stage of the career trajectory
  • Differentiate levels of training and achievement depending on the professional level of the practitioner
  • Provide ways for individuals to document their higher-level achievements

Data Science Association Code of Professional Conduct (1 webpage) @becki

This is a very detailed, thorough list. Would recommend having another set of eyes review this one!

A detailed list of principles and actions for data scientists to adhere to in their work.

  • Rule 1 defines 56 terms related to data science
  • Rules 2-4 govern client interactions
  • Data scientists shall provide competent service to clients
  • Scope of services between data scientist & client
  • Data scientists shall not counsel clients to engage in criminal/fraudulent activity
  • Communication with Clients
  • Rule 5 covers confidential information
  • Defines what is confidential
  • How and why it should be protected
  • Exceptions to the protection of confidential information (preventing reasonably certain death or bodily harm, crime, or fraud)
  • Rule 6 covers conflicts of interest
  • Services for clients may not have adverse effects on other clients
  • Services for clients may not limit the data scientists’ abilities to serve other clients
  • Rule 7 covers duties to prospective clients
  • Do not use or reveal information provided by prospective clients
  • Rule 8 covers data science evidence, data quality, and evidence quality
  • Share all info and results, even if adverse
  • Rate the quality of data and disclose that rating to clients
  • Rate the quality of evidence and disclose that rating to clients
  • Take reasonable remedial measures when clients are misusing data, persuade the client to use data appropriately
  • Take reasonable measures when clients engage in criminal or fraudulent conduct related to data science services (including notifying proper authorities)
  • Also listed are 14 things data scientists must not do, related to transparency and honesty of their work and the data involved, including process adherence and transparency (scientific method, no cherry-picking)
  • Design algorithms and machine learning systems to avoid harm, fully disclose risks to the client, persuade clients to use algorithms and machine learning systems appropriately
  • Use diligence when using terms “statistically significant”, “correlation”, “causation”, “spurious correlation”
  • No cherry picking, no presenting incomplete evidence as complete
  • Question assumptions; avoid engaging in consequentially distorting assumptions
  • Recognize, disclose, and factor “agency problems” (agents may hide risk and structure relationships to their benefit)
  • Recognize, disclose, and factor risks in using data science
  • Use the data science method:
  • Careful observations of data, data sets, and relationships between data
  • Deduction of meaning from the data and different data relationships
  • Formation of hypotheses
  • Experimental or observational testing of the validity of the hypotheses
  • Rule 9 covers misconduct, including
  • Violating the Code of Conduct itself
  • Committing criminal acts
  • Engaging in dishonesty, fraud, deceit, or misrepresentation
  • Engaging in prejudice
  • Misusing results to communicate a false reality or promote an illusion of understanding

Cabinet Office Data Science Ethical Framework (17 pages) - @lilianhj

Six principles of framework:

  • Start with clear user need and public benefit
  • Think about what decisions might be made as a result of the insights obtained
  • Weigh against risk to privacy
  • Weigh against risk of mistakes and unintended negative consequences
  • Understand the probability of achieving this benefit
  • Identify metrics for assessing the benefit attained
  • Use data and tools that have minimum necessary intrusion
  • Consider: what data from what sources?
  • How sensitive is it (how much would people care)?
  • How identifiable is it?
  • Minimum data necessary to achieve the project aim

If working with sensitive personal data, safeguard privacy through the following (a minimal aggregation sketch follows this list):

  • De-identifying individuals
  • Aggregating to higher levels
  • Querying against datasets through APIs rather than accessing whole dataset
  • Using synthetic data
  • Consider training on smaller datasets about people of interest and then applying this to larger datasets
  • Consider intrusiveness in taking action as well: e.g. use social media data to spot activity trends and then alert local service providers, rather than the government taking action itself
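
A minimal sketch of the aggregation idea, using pandas with made-up records; the column names (postcode_district, age, service_used) are illustrative, not from the framework:

```python
import pandas as pd

# Hypothetical record-level data; all column names and values are illustrative.
records = pd.DataFrame({
    "postcode_district": ["AB1"] * 8 + ["AB2"] * 5 + ["AB3"] * 2,
    "age": [34, 36, 41, 28, 30, 33, 39, 44, 29, 72, 75, 68, 31, 55, 58],
    "service_used": ["housing"] * 10 + ["benefits"] * 5,
})

# De-identify by aggregating to a higher level (district x service x age band)
# instead of sharing rows about individuals.
records["age_band"] = pd.cut(records["age"], bins=[0, 35, 65, 120],
                             labels=["<35", "35-64", "65+"])
counts = (records
          .groupby(["postcode_district", "service_used", "age_band"], observed=True)
          .size()
          .reset_index(name="n"))

# Suppress small cells (here n < 3) so rare combinations cannot single anyone out.
counts["n"] = counts["n"].mask(counts["n"] < 3)
print(counts)
```

Querying aggregates through an API or generating synthetic data follows the same logic: analysts work with summaries or stand-ins rather than raw rows about identifiable people.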

Legal requirement:

  • reasonable steps must be taken to ensure individuals are not identifiable when you link data or combine it with other public-domain data
  • Create robust data science models
  • Consider quality and representativeness of data
  • Use techniques like shadow analysis to spot bias
  • Code in affirmative action to remove bias
  • Include metrics showing who the data is representative of, who is being excluded
  • Understand how decision was made, as far as algorithm allows
  • Note if algorithms are using protected characteristics (e.g. ethnicity) to make decisions
  • Have processes for detecting and fixing errors
  • Regular testing when new data is added
  • Algorithms make tradeoffs – consider false positives, false negatives, and which matters more in this situation (a minimal illustration follows this list)
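
A minimal illustration, with made-up risk scores and outcomes, of how moving a decision threshold trades false positives against false negatives; which error matters more depends on the situation:

```python
# Illustrative only: fake risk scores and true outcomes for ten cases.
scores = [0.95, 0.80, 0.75, 0.60, 0.55, 0.40, 0.35, 0.20, 0.15, 0.05]
truth  = [1,    1,    0,    1,    0,    1,    0,    0,    0,    0]

def error_rates(threshold):
    """Count false positives and false negatives at a given decision threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    fp = sum(1 for p, t in zip(preds, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, truth) if p == 0 and t == 1)
    return fp, fn

for threshold in (0.3, 0.5, 0.7):
    fp, fn = error_rates(threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

Raising the threshold cuts false positives but adds false negatives, and vice versa; the framework's point is that this choice should be made deliberately for the situation at hand.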

Quality assurance for results

  • Create an accuracy rating for findings so people can decide whether/how to use the findings
  • Ensure the accuracy rating stays with the findings throughout as they are passed on
  • Maintain clear provenance of findings as they are passed between different parties
  • Think through unintended consequences
  • An incorrect decision may cause distress or other unintended negative consequences for someone; does the risk outweigh the benefit?
  • Maybe give people choices in the final decision made, e.g. offer them different service options
  • If cannot guarantee protection from risk, then be clear upfront about the existence of the risk
  • Draw on knowledge of domain/policy/operational experts
  • Human oversight rather than solely automated decision-making
  • Combine different findings/results to inform a decision, rather than relying solely on one algorithm; e.g. independent teams corroborating each other’s findings

Be alert to public perceptions (Mostly relates to existing data sources)

  • Understand how people would reasonably expect their personal data to be used
  • “fair, proportionate, and compatible with the original purpose for which it was collected”
  • Did users give informed consent for personal data to be used for certain purposes?
  • Are users aware of all the data being used?
  • Are users aware of the storage of their data?
  • Understand the different terms and conditions and levels of consent associated with social media data, commercial data, data scraped from the web, etc
  • Be aware of changes in public perceptions
  • The public cannot easily distinguish between the ethics of data science (how an insight is produced) and the decision/intervention taken as a result of this insight – in other words, to the public, the end justifies the means to some extent, or at least, whether they approve of the intervention informs/influences whether they approve of the data usage
  • If data is freely volunteered, a badly-handled project may affect whether people choose to provide/share their data in future
  • Be as open and accountable as possible (Mostly relates to collecting new data)
  • Be transparent about the intent of the project
  • Be as open as is feasible about tools, data, and algorithms (but have to moderate this in situations where openness jeopardizes the aim of the project, e.g. in identifying illegal activity)
  • Let people know about the project benefits and its impact on collective and individual outcomes
  • Explanations in plain English
  • Inform people at the point of collection about how their data will be used, or notify people that their existing data is being used
  • Make it possible for people to view, extract, and correct their data that is being held by the government
  • Give people recourse to appeal/contest decisions they think are wrongly made
  • Build in oversight and accountability throughout the project
  • Oversight of initial purpose and method
  • Oversight of how this is implemented/executed
  • Not just a single decision-maker – consult others
  • May bring in external oversight for complex projects
  • Transparency helps to both mitigate unethical behavior and secure public acceptance
  • Keep data secure
  • Have security measures in place to keep data from being lost or stolen
  • Control who can access the data and for how long
  • Use registers and APIs to hold data separately/distributed and draw it together, rather than holding it in bulk, so it is safer from attack
  • Set how long the data is stored, and how it is deleted
  • Legislation like the Data Protection Act is crucial in providing guidance on collecting, storing, sharing, processing, and deleting data
  • Smaller organizations/departments have their own specific guidelines
  • Principles are interlinked, e.g. proving public benefit helps decide what risks are justified and so what methods/data should be used, e.g. using sensitive data

Why an ethical framework is needed:

  • Rapid changes in technology and public opinion
  • Large amounts of data, decisions being made without human oversight
  • Aim is to contextualize new technology in existing/relevant law
  • Aim is to prompt data scientists and policymakers to responsibly consider public reaction when innovating with data
  • Ultimately, it is up to the government decision-makers to decide how the public benefit of doing the project stacks up against the risks of the project

Goals:

  • Respect for privacy
  • Nobody experiences unintended negative consequences
  • Set the public at ease (through transparency and engagement)

How framework is used:

  • Guiding principles used to think through and ask appropriate questions at each stage of a data science project
  • Checklist of the principles
  • Checklist is a set of scales, decisionmakers indicate where their project falls on each scale (e.g. high vs low risk of unintended negative consequences, automated decision-making vs high level of human oversight)
  • Recognizes that some departments/projects may intrinsically fall low/high on the scales due to the nature of their work; this does not mean that the project should not go ahead, but recommends that the decision-makers be aware of these issues, consider them carefully, take extreme care and oversight, and try to bring in some positive elements if possible
  • A standardized impact assessment, i.e. a form where the decision-maker describes their project and how it aligns with the principles (what steps have been taken to maximize benefit, minimize privacy risk, etc), basically ensuring that some thought has at least gone into each of these elements, even if they can’t all be achieved
  • Detailed case studies of real examples, positive and negative
  • Practical suggestions of what can be done to act ethically

IEEE Code of Ethics (1 webpage) @victoria.t

A concise and easy-to-understand 10-point list outlining a set of principles that members of the IEEE (Institute of Electrical and Electronics Engineers) agree to follow

  • accountability to end users
  • transparency in education of limitations/assumptions
  • honesty of any personal/ulterior motives
  • improvement and refinement of technical skills
  • fairness in treatment of others, accepting criticism, and collaboration among peers
  • a few end notes describing the process by which changes are made to the IEEE code of ethics

When Data Science Destabilizes Democracy and Facilitates Genocide (1 blog post) @mo, @maria

Examples of BAD data science

  • Inflammatory Russian propaganda shown to 126 million Americans on Facebook
  • Ethnic cleansing justification in Myanmar
  • A VW engineer who doctored air pollution results
  • Parents in jail to increase prison sentences

Data science is foundational to Facebook, it determines:

  • What is shown
  • Who it is shown to

Technology is inherently about humans; when creating it, consider:

  • Human psychology
  • Sociology
  • History

Important to ask (because less than 0.5% of the population knows how to code):

  • How could trolls use your service to harass vulnerable people?
  • How could an authoritarian government use your work for surveillance? (here are some scary surveillance tools)
  • How could your work be used to spread harmful misinformation or propaganda?
  • What safeguards could be put in place to mitigate the above?

Examples of runaway feedback loops (a minimal simulation sketch follows this list):

  • Men expressing more interest than women in tech meetups (Meetup changed its algorithm so this did not get amplified through a feedback loop)
  • Facebook encourages such loops because once someone is in a group, FB suggests similar groups to join
  • Predictive policing reinforces over-policing of the neighborhoods where people are already being arrested
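
A minimal, made-up simulation of such a loop: a recommender shows a group mostly to whichever demographic already dominates its sign-ups, so a small initial gap in interest grows week after week. All numbers here are assumptions for illustration.

```python
import random

random.seed(0)

# Assumed, made-up numbers: men and women differ only slightly in
# underlying interest in a "tech meetup" group.
interest = {"men": 0.30, "women": 0.25}

# The recommender's only signal is last week's sign-ups.
last_joins = {"men": 10, "women": 9}

for week in range(1, 9):
    total = last_joins["men"] + last_joins["women"]
    joins = {"men": 0, "women": 0}
    # Show the meetup to 1,000 people, allocated in proportion to last week's
    # sign-ups, so whoever joined more gets shown the group more often.
    for _ in range(1000):
        group = "men" if random.random() < last_joins["men"] / total else "women"
        if random.random() < interest[group]:
            joins[group] += 1
    share = joins["men"] / max(1, joins["men"] + joins["women"])
    print(f"week {week}: joins={joins}, men's share of new joins = {share:.0%}")
    last_joins = joins
```

As noted above, Meetup changed its algorithm precisely to avoid letting this kind of historical signal run away.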

Myths:

  • “This is a neutral platform”
  • “How users use my tech isn’t my fault”
  • “Algorithms are impartial”

Reality: there is no neutral platform. Some claim that restricting users would restrict free speech, but dominant users of a site can themselves restrict the free speech of underrepresented groups.

Data is biased; examples include (a minimal analogy-query sketch follows this list):

  • Google Photos automatically labeling Black people as “gorillas”
  • Software that assesses criminal recidivism risk and is twice as likely to mistakenly predict that Black defendants are high risk
  • Google’s popular Word2Vec language library creating sexist analogies such as man→computer programmer :: woman→homemaker.
  • Neural networks learning that “hotness” is having light skin
  • An app to compare job candidates’ word choice, tone, and facial movements with current employees, which Princeton Professor Arvind Narayanan described as AI whose only conceivable purpose is to perpetuate societal biases
  • Google Translate converting gender neutral sentences to “He is a doctor. She is a nurse”
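
A minimal sketch of how the Word2Vec analogy in question is typically queried with gensim and the pretrained Google News vectors (the local file path is an assumption, and the model must be downloaded separately):

```python
from gensim.models import KeyedVectors

# Assumes the pretrained Google News embeddings are available locally.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# "man is to computer_programmer as woman is to ?"
# Biased training text can surface stereotyped completions such as "homemaker".
print(vectors.most_similar(positive=["woman", "computer_programmer"],
                           negative=["man"], topn=3))
```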

AI Research Is in Desperate Need of an Ethical Watchdog (1 news article) @becki

  • Regardless of researchers’ intent, impact can be negative - tools designed for good can be used for evil (intentional or otherwise)
  • Data scientists need clear ethical guidelines to prevent accidental harm
  • Review boards use rules developed 40 years ago, designed to protect participants in real-life interactions
  • If using a database for a study (no interaction with humans) it’s not clear that a review board is required
  • Review boards are not allowed to evaluate a study for potential social consequences
  • Researchers in data science are taking ethics into their own hands
  • Pervade (Pervasive Data Ethics for Computational Research) - group putting together a clearer ethical process for big data research

HOWTO design a code of conduct for your community (1 webpage) @margeaux

Examples: Django, Rust, Civic Data Alliance

  • Establishing clear lines around what is acceptable vs. not acceptable behavior for your community or event can mitigate harassment, promote inclusion, and widen participation in your project or org
  • An org or event will be more efficient and pleasant for current participants
  • A code of conduct can attract new people to your org
  • A transparent code of conduct draws on open source models

An effective Code of conduct must include 3 things

  • List specific common behaviors that are not okay
  • Include detailed directions for reporting violations
  • Have a defined and documented complaint handling process

In the process of crafting a code of conduct, your document should address the following:

  • Who adopts and enforces your community code of conduct?
  • What are the consequences for violating your code of conduct?
  • BE SPECIFIC about what is not acceptable behavior as this is what actually makes your code of conduct effective
  • Prevents mods/admin burnout
  • Encourages people to report as they know they will be taken seriously
  • Sends a signal to people joining your community that they will be heard and safe in your space in a way that a general statement will not, as general statements can give cover to harassers/abusers via the ability to engage in Tone arguments
  • Asking people to attempt resolution by discussion is used as a delaying tactic
  • Asking people to attempt resolution by discussion is used as a way to abuse people longer.

Lower Priority For Now:

General: PERVADE – Pervasive Data Ethics for Computational Research @ilanalight

(1 website, a lot of interesting publications linked. There’s a lot to dig into, so I’m prioritizing the more digestible resources in the previous section, because I don’t want us to get overwhelmed right now. But if anyone wants to have a shot at anything here, feel free!)

This group is concerned with computation and use of pervasive data, which are datasets containing deeply personal information and derived at scale from pervasive technologies such as IoT devices, wearable devices and social media platforms. They are concerned with “the growth in the scale, scope, speed, and depth of human data research”.

The team runs empirical projects to evaluate computational research ethics, including technical investigations into the need for mathematical methods to determine when the release of sensitive or “de-identified” data poses risks to subjects.

They seem to look at Kaggle competitions and evaluate the balance between information about identity shared (entropy) and the accuracy of the predictive algorithm.

I assume these two are in some kind of trade-off. Ultimately, they aim to provide metrics to help data practitioners decide how much data can be shared or used, balancing information (entropy) sharing against the need for predictive accuracy by algorithms.

Responsible Data Forum (1 website, lots of great material, looks like a pretty similar initiative but specifically oriented towards the advocacy/nonprofit community. Again, there’s so much it might be overwhelming right now, so we don’t have to summarize it for the first meeting, but this looks like a great resource to gradually work through)

Bit By Bit: Social Research in the Digital Age chapter 6 (1 chapter of an online book, about more general research ethics, including the All Our Ideas platform we’re using)

The Ivory Tower Can’t Keep Ignoring Tech (1 news article, specifically about the potential role of academia in data science ethics)

Legal Theory Lexicon: Rules, Standards, and Principles (1 blog post, general notes on rules vs standards vs principles) @chavan

Legal norms can be formulated as:

Rules

  • Most constraining and rigid
  • Application of rule to fact decides issue
  • Hard-soft rule continuum
  • Defines triggering conditions and consequences
  • Choose for predictability and certainty, guide future conduct

Standards

  • Intermediate level of constraint
  • Guides decisions (May allow for balancing factors)
  • Exhaustive set of considerations
  • Choose for fairness and sensitivity

Principles

  • Least constraining
  • Non-exhaustive set of considerations
  • Do not resolve legal issues by themselves
  • Choose for giving legal form across contexts

Example: Eligibility for being President of the United States

Rule

  • Hard rule: Should be 35 years of age
  • Soft rule : Should be an adult

Standard

  • Should be sufficiently mature

The tech industry needs a moral compass (1 blog post) @margeaux

The Designers Accord / Initiatives (1 website, on logistics of building out and sustaining a code of ethics)

We Can’t Trust Facebook to Regulate Itself (1 opinion article)

boyd, danah, and Kate Crawford. “Critical Questions for Big Data.” Information, Communication & Society 15, no. 5 (June 2012): 662–679. doi:10.1080/1369118X.2012.678878.

O’Neil, Cathy. On Being a Data Skeptic. Sebastopol, CA: O’Reilly Media, 2013.

Hardt, Moritz. “How Big Data Is Unfair: Understanding Sources of Unfairness in Data Driven Decision Making.” Medium, September 26, 2014.

Crawford, Kate. “The Hidden Biases in Big Data.” Harvard Business Review, April 1, 2013.

Barocas, Solon, and Andrew Selbst. “Big Data’s Disparate Impact.” California Law Review 104, no. 3 (June 2016).

[Peppet, Scott R. “Regulating the Internet of Things: First Steps Toward Managing Discrimination, Privacy, Security, and Consent.” Texas Law Review 93, no. 1 (2014): 85–176.](http://www.texaslrev.com/wp-content/uploads/2015/08/Peppet-93-1.pdf)

Feminist Data Visualization (5 page article) @emkg

Engaging the Ethics of Data Science in Practice (2 page article)

Roughly labeled by theme: The Brutal Fight to Mine Your Data and Sell It to Your Boss (1 news article) (data ownership and provenance) @chavan

  • “people analytics” is a new term, the concept is not
  • In 1917 psychologists evaluated young men being drafted into the U.S. Army
  • US Military and Intelligence used psychological evaluation and aptitude testing
  • Data-scraping bots are only one of the technologies forcing us to rethink privacy protections.
  • HIQ created as a defence against recruiters by identifying restless employees
  • LinkedIn complained about unfair data scraping and made it hard to scrape
  • Refers to "The Constitution in Cyberspace: Law and Liberty Beyond the Electronic Frontier" by Laurence H. Tribe
  • What were the public squares and private rooms of the web?
  • Who got to determine access?
  • Should data be protected as speech? If so, how?
  • HiQ sued LinkedIn to force LinkedIn to let HiQ use its data

HiQ’s arguments

  • Only scraped data that was publicly available (LinkedIn users could hide some data)

LinkedIn’s defense

  • Violated user agreement
  • Threatened LinkedIn users’ privacy
  • the key distinction wasn’t between public and private or visible and invisible, but between a person browsing a website and a bot brigade copying data at scale.
  • “Tomorrow’s monopolies won’t be able to be measured just by how much they sell us,” the authors wrote. “They’ll be based on how much they know about us and how much better they can predict our behavior than competitors.” (https://hbr.org/2017/07/the-next-battle-in-antitrust-will-be-about-whether-one-company-knows-everything-about-you)

HiQ won the case.

AI Can Be Made Legally Accountable for Its Decisions (1 article) (transparency and openness; responsible communications) @chavan

This article addresses the question: "How to make AI accountable for its decisions without stifling innovation?"

It summarizes the work of Finale Doshi-Velez and others at Harvard University, “Accountability of AI Under the Law: The Role of Explanation.” “Computer scientists, cognitive scientists, and legal scholars say AI systems should be able to explain their decisions without revealing all their secrets.”

  1. The problem
    • We need an explanation when AI systems deliver results that are:
      • unacceptable
      • difficult to understand
    • Challenges in making AI systems explain their decisions:
      • Requires considerable resources to build and use
      • Can reveal businesses' trade secrets
      • Human-understandable explanations of complex AI-generated results are hard to produce
  2. What is an explanation?
    • Explanations are reasons or justifications for a decision
    • Not necessarily a description of the decision-making process
  3. An explanation should answer questions such as:
    1. What were the main factors?
    2. Would changing a factor change the decision?
    3. Why do two similar cases lead to different decisions?
  4. When is an explanation needed?
    • When the benefit outweighs the cost: morally, socially, or legally
    • When the decision has an impact on a person other than the decision-maker
    • When there is reason to believe the decision was erroneous (for example, influenced by an irrelevant factor)
    • When the decision benefits one group unfairly
    • To increase trust with consumers
  5. How?
    1. The explanation system should be separate and distinct from the proprietary AI model
    2. AI systems should be held to the same standards of explanation as humans, for example in cases of strict liability, divorce, or discrimination; for administrative decisions; and for judges and juries
    3. In the future, AI systems may be held to a different standard

Can A.I. Be Taught to Explain Itself? By Cliff Kuang Nov. 21, 2017, The New York Times @chavan @emkg

This article introduces the emerging field of Explainable A.I. (X.A.I). The article provides background on AI and ML and many interesting stories.

I have summarized broad concepts below:

As machine learning becomes more powerful, the field’s researchers increasingly find themselves unable to account for what their algorithms know — or how they know it. This is sometimes called the “black box problem”.

This article gives the example of the story of asthmatics with pneumonia which eventually became a legendary allegory in the machine-learning community. One reason explanations are important for black box style algorithms is that while their predictions seem accurate, they are often based on extremely incorrect inferences. I.e., a machine will find that patients with asthma are more likely to survive pneumonia and spit that out as a fact without the context that pneumonia patients with asthma are given priority care and more attention as a rule in hospitals.

In May 2018, the European Union will begin enforcing the General Data Protection Regulation, a law requiring that any decision made by a machine be readily explainable, on penalty of fines that could cost companies like Google and Facebook billions of dollars. Some say it is too vague and unenforceable.

This quote offers a punchline to why these new laws matter with respect to the ethics of the way data is being used: “Taken together, Articles 21 and 22 introduce the principle that people are owed agency and understanding when they’re faced by machine-made decisions.”

A.I. reasoning works differently from human reasoning, but A.I. must nonetheless conform to the society we’ve built — one in which decisions require explanations, whether in a court of law, in the way a business is run, or in the advice our doctors give us.

There may have to be as many approaches to XAI as there are to machine learning.

There is a legal and ethical motivation for explainability: Even if a machine made perfect decisions, a human would still have to take responsibility for them — and if the machine’s rationale was beyond reckoning, that could never happen.

Some approaches to XAI for deep learning include a modular approach: using smaller, understandable neural nets which can be combined like Lego to take on more complex problems.

Another is to have the neural net, when it comes up with a result, look back at the data to find the best example matching the decision (similar to firefighters who classify a fire based on 12 types); a minimal sketch of this idea follows.
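
A minimal sketch of that example-based idea using scikit-learn, with made-up data: alongside a model's prediction, return the most similar training case as a human-checkable justification.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Made-up training data: two features, binary label.
X_train = np.array([[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8], [0.7, 0.6]])
y_train = np.array([0, 0, 1, 1, 1])

model = LogisticRegression().fit(X_train, y_train)
index = NearestNeighbors(n_neighbors=1).fit(X_train)

x_new = np.array([[0.75, 0.7]])
prediction = model.predict(x_new)[0]

# "Explain" the prediction by pointing to the most similar known case.
_, nearest = index.kneighbors(x_new)
i = nearest[0][0]
print(f"prediction: {prediction}")
print(f"most similar training case: {X_train[i]} with label {y_train[i]}")
```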

Using two neural nets lashed together -- one to do the task and the other to explain the result. The Hamlet Strategy

XAI research represented a new kind of science in which machines could access truths that lay beyond human intuition. The problem was reducing what a computer knew into a single conclusion that a human could grasp and consider.

Feedback from algo fairness community about this article summarized by @nkrishaswami:

  • Too much space for Kozinski's dodgy research
  • Nice summary of Caruana's case study
  • Decently, if unoriginally, lays out salient issues
  • Large and rapidly growing community of researchers: hundreds, not a few dozen
  • Side-eye at the author for managing to sample a field with many prominent, accomplished women contributors and including only quotes and research results from men