Sprint Updates

29 May to 9 June

Resolved turn order bug in Viral Spiral Cancel power (fix: turn order does not change when cancel is executed by...)
Changed Cancel player mechanic and made voting optional (feat: updated cancel to make voting an toggle-able feature...)
ISOC Grant
Creation of misinfocon microsite: Hello from Misinfocon India Docs | Misinfocon India Docs

17 May to 26 May

Creation of misinfocon working groups
Mozilla DFL proposal
Uli authentication backend work

1 May to 12 May

Click to expand

Misinfocon
Creating the Viral Spiral Demo video

17 April to 28 April

Click to expand

Factshala application
Internews Proposal
Dataset (80% of the way there)
FLAME talk \m/

Viral Spiral [Project Board]

Added turn to fake UI to frontend (feat: added turn to fake power button for users ·...)
Fixed turn to fake and cancel power related bugs (Merge pull request #108 from Bhargav-Dave/feat/minor-fixes...)

3rd April to 14th April

Click to expand

Viral Spiral [Project Board]

Uli

WIP exploration of ORY as an auth solution
- Implement Ory [WIP] by duggalsu · Pull Request #242 · tattle-made/OGBV

20th March to 31st March 2023

Click to expand

Viral Spiral [Project Board]

Continued work on latency testing
Added new logic for scoring: Maintain scores separately by dennyabrain · Pull Request...
Added logic to download images from the google sheet to the json data creator: feat: added download image logic to xls_to_json by...
Ingested new card data to the json: chore: updated cards.json with newly added cards by...
Progress updates now appear while waiting in the room creation lobby: feat: send progress updates during create room. closes #53...
Cards are being shown in the Viral spiral UI: feat: show card image on UI · tattle-made/viral-spiral-frontend@019e316

Uli

WIP on bringing the slur replacement and add slur to Uli features on the Web: Added functionality to use slur replacement on web by...

6th March to 17th March 2023

Click to expand

Viral Spiral [Project Board]

Added e2e testing files for the basic features of Viral Spiral: added script to simulate game between 4 players by...
Added documentation for e2e testing and deck generators: docs: added rundowns of xls_to_json.py and deck generators..., docs: updated documentation to include e2e testing by...
Created new deck generator logic:
- Proof of concept: feat: added new deck generator logic poc. Is related to #72...
- Adding logic for different types of cards: fix: fixed conditions for various types of cards by...
- Removed the storyline variable and kept logic to increase draw pool with increasing TGB: feat: added functionality to increase draw pool with tgb by...
Performed latency testing for various features of Viral Spiral

Uli

Using new parser after the changes in the Twitter DOM: Rewrote existing parser with a new logic to detect tweets...

---

20th February to 3rd March 2020

Click to expand

Viral Spiral playtesting open calls

6th February to 17th February 2023

Click to expand

Viral Spiral [Project Board]

FEAT : Changed the room creation flow. Room names are dynamically generated by the server now. This allows room links to be passed around and joined easily
FEAT : Added realtime notification to allow players know other player's actions
FIX : Only two affinities are active in a game now
FIX : all card entries now have encyclopedia entries
FEAT : Background images now change as per the total global bias
FIX : Player scores update via a broadcast from the server. User scores update almost instantaneously for every one now
FEAT : Added a script to simulate the basic gameplay with 4 players joining

23rd January to 3rd February 2023

Click to expand

Uli

Research Ory Hydra and Kratos as auth solution
Added research tab to Uli website (feat: added research tab on Uli website. Closes #225 by...)
Added ML research and feedback feature blogposts to Uli website
Added automated testing script for feedback feature in Chrome (feat: added code to test feedback in chrome by Bhargav-Dave...)
Research and work on updating and improving twitter parser
Updated documentation inthe OGBV repository (Updated Documentation by Bhargav-Dave · Pull Request #226 · tattle-made/OGBV)

Viral Spiral

Did a playtest with the writing team. Begun tracking the progress ==here==

General/Operations

Hosted Open Call
Applied to NLNet Foundation and Tactical Tech grant calls
Met BMGF's India Director for Financial Inclusion

16th January to 20th January 2023

Click to expand

2nd January to 13th January 2023

Click to expand

12th December to 23rd December

Click to expand

applied to mozfest
Attended Agami Summit
Second version of HR policy draft released, awaiting comments
Finalised Sumati as fourth member of ICC, reached out to folks for legal representatives

28th Nov 2022 to 9th December 2022

Click to expand

Uli

Released v0.1.12
- Users can now add a slur to their custom slur list by selecting a text and right clicking on it
  - Release Notes - Release Development Build : · tattle-made/OGBV
Fixed issues in the release
- Incorporated Github Action to Automate Release to Chrome and Firefox Stores
  1. Chrome : tattle-made/OGBV#73
  2. Firefox : tattle-made/OGBV#178
- Verified that slur replacement was case insensitive for Uli

Viral Spiral

Addressed Latency Issues on the backend Server
UI Polish
- Added icons for affinities
- Show all user scors on screen, removed the hovering feature
Added a reload button to enable easy refresh without leaving the room
Setup S3, Cloudfront and Route 53 configuration to host the game on the real domain - Viral Spiral

14 November to 25 November 2022

Click to expand

GitHub prioritization matrix
Coordinated with hackshackers and MCA on misinfocon.
Added the slur list and gitHub link to Uli website
Edit the Employee Handbook
Conduct book club meeting
Reach out to ICC members - conversation with Zainab, reach out to Sumati from MP
Created a page for Tattle on Itch
Emailed Illustrators

31 October to 11 November 2022

Click to expand

2 demos for Uli
GitHub project: finished document
Feminist ki Batti Jalao grant application
Sent out comments on Indian Telecom Bill, 2022

Uli

Investigated Twitter API to find ways to bring the Uli experience to Mobile
Investigated the annotated tweets used to train the OGBV ML model by performing EDA, tested preliminary hypothesis useful towards publishing the data
Finalized a stable way to add multilingual data and change languages on the website
- tattle-made/OGBV#162
Setup metabase on our k8 cluster and created a WIP dashboard for Uli
- Dashboard URL - Metabase

Viral Spiral

Setup automatic deploy for frontend - viral-spiral-frontend/gh-pages.yml at main · tattle-made/viral-spiral-frontend

Currently the webpage is deployed at Viral Spiral
Prototyped animations for card selection and passing/keeping card using react-spring

17 October to 28 October 2022

Click to expand

Uli

4 demos done
Added documentation for end to end testing and roadblocks on repository - PR Link

Viral Spiral

Merged Dockerfile and docker-compose to master. Its now possible to run the viral spiral backend along with database via a single docker-compose command
Made more progress on fleshing out the modules related to UI, state management and socket and creating a relatively clear separation of concerns. End to end play through of the game is possible now. This includes loading dummy data, a creating a room that multiple players can join. At present they are able to receive, pass or keep cards with one another. The user of super-powers like viral-spiral and encyclopaedia is pending.

OGBV ML Model

Performed EDA on the annotated tweets dataset in order to clean it for publication and garner relevant insights

---

In Person Sprint (10th October to 15th October 2022)

Click to expand

OGBV

The docker compose pushed to ogbv-ml-rest can now load a backend for Uli locally
Added Selenium scripts in order to do end to end testing of Uli features namely the slur replacement, local archiving and oGBV banner - Issue Link for Chrome and PR Link for Firefox
- Eliminated the roadblock of loading local copies of extension using '--load-extension' command for chrome and 'web-ext' framework and 'install_addon()' for firefox
- Eliminated the roadblock of automatically getting the extension id in an instance of chrome and firefox

---

One week sprint (3rd October to 7th October)

OGBV

Participated in Hacktoberfest
- Applied to be featured on the Github Social Impact blog. Final blog post here - Volunteer on digital public goods for Hacktoberfest 2022!
- Created issues for hacktoberfest participants - Issues · tattle-made/OGBV

19 September to 30 September 2022

Click to expand

Uli

Automated testing
- Created selenium scripts in order to test out censoring of tweets with slurs and those detected as ogbv on both Firefox and Chrome
Dashboard(todo bhargav)
- Added a "/dashboard" endpoint to uli - PR Link
- Added queries to return number of new users per week, number of posts archived per week and top archivers in a week

Viral Spiral

Architect Modules for UI Rendering (React), Communication (websocket) and State Management (recoil-js)

Gftw

Published first blog of project documentation via the Meshi Content Sharing write-up on Tattle website

---

5 September to 16 September 2022

Click to expand

Uli

Reduced the size of ogbv-ml-rest docker by 60% and the time to reload the container again by a similar amount by using docker/k8 volumes&#x20.
- feat: Remove "download model" logic out of Dockerfile....
Create k8 files for GKE autopilot.
- OGBV/ogbv-ml-rest/k8
Gathered questions for ML feedback feature and added a prototype of it.
Created a local visualization of ULI data, to be deployed as a dashboard.

GftW

Handed off final analysis data to team MP. On standby for data fidelity issues.

22nd August to 2nd September 2022

Click to expand

GftW

Handed off study analysis data to Team MP

Uli

Added a blogpost that functions as an entry point to get started with AWS co-pilot
Added a potential way to host Uli on co-pilot
Made significant progress on removing download model logic from the docker image, storing persistent data using volumes and making running the model much faster upon subsequent runs

Viral Spiral

Implemented e2e test to test test `viral-spiral-backend`
Added a scaffold for the game room

Organizational Stuff

Made progress on Quarter Planning and putting it into a Vision, Goal, Bets Initiative Framework

Set up organizational calendar
Starting off with Tattle Comms
Open Call on PoSH

7th August 2022 to 19th August 2022

Click to expand

GftW

Submitted the interim report [link](Interim Report for Using Web Monetization for Incentivizing...)
Sent out payments to all participants

Viral Spiral:

Dockerized viral-spiral-backend
- GitHub - tattle-made/viral-spiral-backend at feat/docker
Moved socket messages into models to document API between frontend and backend
- Added first set of message templates · tattle-made/viral-spiral-backend@aba9bb1

Website

Started working on adding support for blog posts to the website. Now any .md file added to the /blog folder shows up as a blog post on the website.
[WIP commit](feat: add source and trasnform plugins for people, blog and...)

Infrastructure

Explored Google Kubernetes Engine Autopilot and Cloud run as cost effective alternatves to our current k8 cluster on AWS.
- Happy with the scale down to
Explored AWS Autopilot as a cheaper alternative to host Uli

---

13th June 2022 to 24th June 2022

Click to expand

GitHub:

Focus group discussion with GitHub

Uli

Translate Usability Study feedback into actionable items and created Project Management Page for it - Build software better, together
Train a single model on for all languages for each label on concatenated dataset
Data analysis on existing datasets

GfW

Send onboarding emails to ~ 900 people
Figure out payments for study participants
Build automations via Github Actions to update dashboard, user status, email bookkeeping
- https://github.com/tattle-made/Comparative-Study-of-Various-Incentives-and-their-Effect-on-Content-Sharing-Behaviour/tree/main/actions/post-status-breakdown
- Comparative-Study-of-Various-Incentives-and-their-Effect-on-Content-Sharing-Behaviour/actions/post-status-breakdown at main

Viral Spiral

Begun Requirement Gathering for the digital prototype
Research into finding suitable UI engine for the project

---

30th May 2022 to 10th June 2022

Click to expand

GfW

Onboarded first 20 users to the study
Audit user data, post metrics and user metrics

OGBV/Uli

Plan Usability Study and Create artefacts for the Note Takers
Finish a feature complete User Guide
Document model results and error analysis
Make notes on related research papers

Viral Spiral

Conduct Playtesting and refine game mechanics

Kosh

Document Plans to improve Kosh Resiliency - Plan for Improving Kosh's Resiliency · Discussion #66 · tattle-made/kosh-v2

---

16th May to 27th May, 2022

Click to expand

GfW

Add support for environments on Frontend
feat : add environments and show on login screen ·...
feat : add environments support for frontend ·...
Seed Database with Indian Usernames as opposed to Random usernames

feat : add indian usernames ·...
feat : seeder script for indian usernames ·...

Add survey form post study

feat : add post study survey form ·...

Fix incorrect displaying of User Metrics

feat : move some code out of transaction ·...
fix : solve incorrect computation of User Metrics ·...

Simplify and refactor post metric collection

uniformize recording of post metrics ·...
feat : refactor component pages into one factory component...
chore : cleanup feed and post metrics ·...

Add Content Pages to the app

Add Consent Form : feat: add consent form ·...
Add Signup Information : feat : add sign up information on login page ·...

Add project intro for README

OGBV

Website

Setup Repository Structure
feat : init setup · tattle-made/OGBV@b15332f
Translated Landing page design to Code
feat : add landing section and misc styling things · tattle-made/OGBV@2c1c586
feat : misc changes · tattle-made/OGBV@6dbdcc2
Add CI/CD workflow :
feat : add CI/CD for uli website · tattle-made/OGBV@e679e5c
Create Content Pages for Landing Page and Privacy Policy
feat : add privacy policy · tattle-made/OGBV@cd22741
feat : add about page and style nav links · tattle-made/OGBV@e158dfa

Plugin\
Apply new visual styling\
Prepare Chrome Store Page and submit for review
Machine Learning
Compared the performance of different models and datasets.
Trained and evaluated models on our dataset
Started domain adaptation experiments

---

25th April to 13th May, 2022

Click to expand

GftW Project Platform Development

Add Authentication : tattle-made/Comparative-Study-of-Various-Incentives-and-their-Effect-on-Content-Sharing-Behaviour#12
Create Seeder Scripts for User and Allocations tattle-made/Comparative-Study-of-Various-Incentives-and-their-Effect-on-Content-Sharing-Behaviour#13
Write Seeder script to get posts from Google Sheet into the database tattle-made/Comparative-Study-of-Various-Incentives-and-their-Effect-on-Content-Sharing-Behaviour#14
Measure User Events : tattle-made/Comparative-Study-of-Various-Incentives-and-their-Effect-on-Content-Sharing-Behaviour#25
Add SQL Data Models : tattle-made/Comparative-Study-of-Various-Incentives-and-their-Effect-on-Content-Sharing-Behaviour#16
Add Feeds : tattle-made/Comparative-Study-of-Various-Incentives-and-their-Effect-on-Content-Sharing-Behaviour#26
Measure Secondary Events : tattle-made/Comparative-Study-of-Various-Incentives-and-their-Effect-on-Content-Sharing-Behaviour#29
Add triggers to advance users through study phases : tattle-made/Comparative-Study-of-Various-Incentives-and-their-Effect-on-Content-Sharing-Behaviour#15
Render Different screens based on User's Progress in the study : tattle-made/Comparative-Study-of-Various-Incentives-and-their-Effect-on-Content-Sharing-Behaviour#31
Show User metrics based on their study groups : tattle-made/Comparative-Study-of-Various-Incentives-and-their-Effect-on-Content-Sharing-Behaviour#32
Include real content on the UI : tattle-made/Comparative-Study-of-Various-Incentives-and-their-Effect-on-Content-Sharing-Behaviour#40
Add Post Study Survey : tattle-made/Comparative-Study-of-Various-Incentives-and-their-Effect-on-Content-Sharing-Behaviour#42
Build Fail Safe for Bad Network : tattle-made/Comparative-Study-of-Various-Incentives-and-their-Effect-on-Content-Sharing-Behaviour#41
Enabled and facilitated payment system to study participants using Razorpay

OGBV Project

Coordinate with the Visual Designers
- Finished Visual Design of Landing Page and User Guide
- Design Asset Handoff done
- Created gifs to illustrate various features
Built Proof of concept for using the hand-drawn patterns as border around divs in a way that is responsive to different screen widths

border-2 border-one

Machine Learning :

Completed training models on all existing datasets for 3 languages
Evaluated the performance of trained models on the test dataset for all 3 labels
Wrote script for domain adaptation

---

11th April - April 22, 2022

Click to expand

OGBV Project

Improved the core DOM parsing part of the code. With some assumptions, it should be able to quickly respond to changing DOM structure by modifying a line of code here
Added support for plugin features on search result page
Relying more on event listeners as opposed to time based solutions for running the tweet processors. Could be improved further to run reliably.
Wrote script for model benchmarking.
Setup model training pipeline to evaluate various model architectures and tested the pipeline.

Viral Spiral

Wrote a script to take data from a Google Sheet and convert it into PDF of cards for playtest.

Kosh

Added factcheck database to kosh and indexed around 3000 posts.
Identified some pattern of failures in indexing which need to be debugged before proceeding further.

Fact-checking scraper

Added appropriate error handling and logging files.

Annotator

Created Wireframes for features related to administering and managing annotation projects using the Annotator.

---

March 31, 2022 to April 9, 2022 (in person sprint)

Click to expand

Fact Checking Database:

pushed three new fact checking sites to production. Scraped historical data
Wrote tracker script to check status of fresh hits to fact checking sites data

Kosh

Tested the script to add data from factcheck database to kosh

OGBV Project

Implemented the archive tweet and invoke network features at the tweet level at opposed to the URL level
It is now possible to unblur slurs momentarily
Add support for ML model based categorizing tweets via an API call
Attempted to run approximate slur detection in the browser via pydiode but could not finish it in the time needed.
Setup Mace to measure annotator reliability

---

March 7, 2022 to March 18 2022

Click to expand

Grant for Web Project

Created preliminary mock ups for the Monetization Psychological study.

monk_monetary-metrics monk_vanity-metrics

Centralized Project Management

Since we use Github Projects to plan out sprints and milestones and track progress, anyone interested in helping us with our immediate roadmap can use this project page to find issues and features that interest them - link

Improvements in Documentation for the OGBV Plugin

[browser extension README] (https://github.com/tattle-made/OGBV/tree/main/browser%20extension)
backend README
[frontend README] (https://github.com/tattle-made/OGBV/tree/main/browser%20extension/plugin)

FastAPI Integration to HF Model

Added test script for testing user queries: https://github.com/tattle-made/OGBV/tree/main/FastAPI%20-%20OGBV

---

Feb 22 2022 to March 4 2022

Click to expand

Allocation Automation

Added script for automating allocation and uploading to spreadsheet as a backup to annotation UI
Allocation is done using following criteria - the k% posts for 3 random annotators and remaining (100-k)% posts for any one annotator.
google-spreadsheet api is used to automate writing to spreadsheets.
Link - OGBV/allocation_spreadsheet_automate.py at main · tattle-made/OGBV

Annotations

Finished pairwise annotations

FastAPI Integration to HF Model

Integrated FastAPI to one of the english models (Maha/OGBV-gender-twtrobertabase-en-trac1 · Hugging Face) on local machine to validate the results

Pooling Task

Trained binary classifiers for English,Hindi and Tamil to predict the hatefulness of tweets-pushed the models to HF hub.(OGBV/pooling at main · tattle-made/OGBV)
Predicted the hatefulness of unlabelled tweets using the trained ML models.
Performed stratified pooling to pick 8000 tweets per language for annotation## Pooling Task

Stress Tested the Annotation UI:

Added prompts to alert annotators about changes in annotation.

---

Dec 21 2021 to Jan 17 2022

Click to expand

Twitter Scraper

Updated the scraper with logging, inputs from file, until-since functionality, multi threading for scrapping simultaneously: https://github.com/tattle-made/OGBV/blob/main/Scrapers/Twitter/dev_twitterscraper_replies_until_since.py
Added script for data uploading to mongo and s3: https://github.com/tattle-made/OGBV/blob/main/Scrapers/Twitter/dev_twitter_datauploader.py

Annotator Allocation

Added script for allocating posts to annotators based on the criteria defined: https://github.com/tattle-made/OGBV/blob/main/allotment.py

Slur Replacement Methodology

Added the experiment notebook containing the workflow, comparison of different techniques (Levenstein vs Damerau Levenstein) and their results: https://github.com/tattle-made/OGBV/blob/main/slur-replacement/Slur%20Replacement%20-%20Latest.ipynb

Kosh

We've been refactoring and rebuilding parts of our archive. This is nearing completion and we'll be opening up the service to a closed beta soon. In the meanwhile here are some links to follow along on the development. We'll be updating individual project READMEs soon.

Work in progress Documentation for Feluda, our Search Service - Tattle Search Documentation
Feluda Codebase - GitHub - tattle-made/tattle-api: A configurable multimedia...
The archive software - GitHub - tattle-made/kosh-v2
Project progress can be tracked here - Build software better, together

OGBV Plugin

We've been exploring ways to use our work on the OGBV project via a chrome extension to mediate user experience on twitter. We did some early prototyping on using a chrome extension to replace slurs seen on twitter UI and to add inline UI elements to provide affordances to users to act on individual tweets.

Annotation UI;

Fixed a bug on the annotation UI (link) to ensure that all users are allocated posts in the same order. This enables easy comparison of posts across users while trying to discuss annotations post facto.

Fact checking database work

Swastika wrote scrapers for two fact checking sites that are currently under testing: https://github.com/tattle-made/factchecking-sites-scraper/tree/the-logical-indian

---

November 21 2021 to December 17 2021

Click to expand

Updated the twitter scraper to scrap all the data fields(except cashtags) fetched by twint.
Added experiment notebooks for slur replacement methodology using fuzzywuzzy library,which uses Levenstein distance under the hood.
Added script for text extraction using different tools such as py-tesseract and easy-ocr and compared the results.

Annotator UI

The app now remembers what post you were when you last used the app. So when you relogin or reopen the window, your session will resume for your last post.
Change of logic of what is completed a “complete” annotation. Now only if a user fills all of the first 3 fields in the annotation form, then it is considered complete.
Added a none of the above option to the 4th question
At the very top of the window, next to the app name you can see how many of your posts are pending completion.
I’ve made it so that if the image is larger than the window height of the post area, it will become scrollable
Seeded the database with English and Tamil posts and allocated it to corresponding annotators.

Kosh

We have on-boarded new developers and volunteers to work on Kosh. To facilitate focused development we did a tight project scoping and finalized features to be built: https://github.com/tattle-made/kosh-v2/wiki/Kosh-Sprint-Dec-21-to-Jan-22 "Kosh Sprint Dec 21 to Jan 22 · tattle-made/kosh-v2 Wiki"

The Sprint is being tracked on Github Projects. Find the open tasks here - Build software better, together

The first user story regarding token management has already been merged into the [main branch](https://github.com/tattle-made/kosh-v2/commit/31fd860c6289d36b77060deaffee7d32b23530f8] by Guru Prasad

---

November 8 to November 19 2021

Click to expand

Memebox

We participated in FossHack 2021 with a project called memebox. ( FOSS United (@FOSSUnited) | Twitter, FOSS Hack 2021 (@FOSSHack) | Twitter)
We tried to rebuild a version of our social media archiving software suited for bookmarking and cataloguing memes.
Github repo
demo video on Vimeo

OGBV Annotation

We developed a tool to annotate images for the OGBV project (Tattle - Tattle)
We vetted a few available tools (discussion here: (https://github.com/tattle-made/OGBV/discussions/2)), but went ahead with our bespoke tool mainly because of our need to have multiple users login and work on the project and support multiple languages within the software.

This would be our second iteration of a bespoke annotation tool. If anyone's working with or needs multi user annotation tools especially equipped for multimodal and multilingual data, please ping us. We should be able to help.
Added krippendorrf scores notebook implemented using simpledorf library on a sample annotators data.

OGBV Scraper

Updated the twitter scraper to upload additional fields such as whether post is reply or not, whether post is a retweet or not, language of the tweet, timestamp of scraping

---

October 25 to November 5 2021

Click to expand

OGBV project

Pushed website for OGBV project: https://tattle.co.in/products/ogbv
Made a proof of concept development for a custom multiuser multilingual annotation UI for annotating tweets to train ML model for OGBV detection.
Added Documentation for Twitter and Instagram Scrapers:https://github.com/tattle-made/OGBV
Updated the twitter scraper to scrape the tweets iterating along the dates in the given range

Search Updates

Our documentation for the search server existed as markdown files on Github. We used Gatsby to convert it into a dedicated documentation site. feluda_doc

---

October 11 to October 22 2021

Click to expand

### OGBV Added Twitter and Instagram Scrapers - it scrapes the posts, uploads the images/videos to s3 bucket and finally stores the post metadata to mongo-db: https://github.com/tattle-made/OGBV/tree/main/Scrapers

Search Updates

Added some end to end tests for the index and search end to end workflow. Changes haven't been merged to master yet but can be perused here - feat : add test for /represent image: https://github.com/tattle-made/tattle-api/commit/1e3d8d1b40c45e76afec0b299a41777b8e48d2f3

Deployed Metabase to Kubernetes

The official method for installing metabase on kubernetes does not seem to be supported anymore. We did a barebones install of metabase backed by our SQL database. The config files for this are available here: https://gist.github.com/dennyabrain/bfb01368f15fec57dd5c195ba6ecdbbb

---

Sep 6 to Sep 17 2021:

Click to expand

### Search Updates Continued work on the v 1.0 release of search engine. Refactored modules into core, features and operators. Began adding tests for the core modules. Documentation website describing the architecture and how to contribute to the search engine will be released later this month.

---

August 23 to Sep 4 2021:

Click to expand

### Fact check DB/Dashboard: * Made the number of articles in each week visible on the dashboard * Fixed a typo in the dashboard * Renamed tattle-research repository on GitHub to factchecking-sites-scraper. This was long due. Updated the documentation for the repository.

Search Improvement

Continued work on the v 1.0 release of search engine. Incorporated Gatsby to generate documentation for the project using the markdown files.

Other Tasks:

Converged on a GitHub issues conventions. Either closed or categorized old issues.
Preliminary work to collect OGBV content from Instagram at scale.

---

August 9 to August 20 2021

Click to expand

*** ### Website Changes * Created a dedicated space on our website to host updates on our Research front * We've also added links to our contributor's websites/portfolios on our community page

Search Improvement

Work on the v 1.0 release of search engine is underway. Spent the week adding unit tests to the various media operators. Updated documentation for the project too. A dedicated webpage documenting the search server in detail to come soon.

---

July 26 to August 6 2021

Click to expand

***

This sprint was slow on tech development. We focused on publishing the report: (A Case Study of the Information Chaos During India's Second Covid-19 Wave)[https://tattle.co.in/articles/covid-whatsapp-public-groups/]

---

July 12 to July 23 2021

Click to expand

***

Improvements to Search Server

Made the search server's operation modes configurable via a .yml file instead of being hardcoded in the code. Mainly, this lets us selectively enable or disable loading of heavy ML models into memory depending on whether one intends to run the server with features dependent on those models or not.

OGBV project with CIS

Pull out Tweets with slurs in hatebase, and attempt preliminary annotation. Link to Code: https://github.com/tattle-made/OGBV
Refine feature list for the tool.

WhatsApp group analysis report

Amongst other small tasks, we worked on creating a Web UI for the upcoming report that emphasizes the interactive visualizations.

June 28 to July 09 2021

Click to expand

*** ### WhatsApp Scraper/ Data *Attempted:* * cross analysis of whatsapp data with external data containing fact checked information. * word frequency analysis of text in text and image items. * number of external links

Known Issues:

Google translate issues- for example, common emoticons translated to the word 'bamboo'. Context also sometimes lost in translation.
The WhatsApp export chat with media does not export reliably.

New Issues from Analysis:

Change functionality of WhatsApp scraper to also check for variation in in-between rows in different exports of the same chat.

Building a performant interactive T-SNE visualization for web

We needed to show ~2600 images on a webpage at once in a way that encouraged users to explore the underlying dataset. We used t-sne algorithm to layout these images as nodes in 2D space

We explored possibility of html canvas and svg along with react to render these nodes. We stuck to svg for now given the ease with which we could tap into html events to add interactions to it.

Screenshots from the Work in Progress visualisations here:

T-Sne Viz

June 11 to June 25 2021

Click to expand

***

Website Tweaks

Fixed website links. Updated privacy policy. Addressed certain open issues like this, this and this.

Search Service Optimizations

Refactored the source code to separate out reusable chunks of code meant to deal with different media types - text, images and videos.

Data Collection from Messaging App

We wrapped up a small scale 2 month data collection exercise. With the goal to learn more about the kinds of conversations happening around the second wave of Corona in India, we joined 20 public messaging groups and collected media from them. We'll share observations from this exercise in a forthcoming report.
Experimented with ML based and other image processing approaches to anonymise social media content for more ethical reporting on chat apps.

May 31 to June 11 2021

Click to expand

***

Search Server

Created an API endpoint to index media by sending files directly to our search server as opposed to doing this via a S3 bucket. This substantially reduced the indexing latency. Blog post detailing this to follow soon.

Data Collection

We continued data collection from public Covid relief groups on whatsapp, that we joined in late April.

Data vizualization exploration

Building on the work we did here, we spent time exploring ways to visualize clusters of large number of image data using algorithms like t-sne and data vizualization libraries like d3.js

---

April 15 to May 28

Click to expand

***

Realtime Collaborative Annotation Engine :

We improved upon our barebones annotation web UI. This time we added support for multiple people to work on a project simultaneously and annotate media posts. Since different teams might have different metadata they want to annotate on an image, this new engine provides the flexibility to define the annotation form schema dynamically. Source code : tattle-made/collaborative-media-annotator

Supporting dra.ft members with datasets

We queried our fact check article datasets and created custom subsets containing articles of certain themes. These were made available to members of the dra.ft festival.

Search Server optimizations

As we tacked on one feature after another on our search engine, it has grown to become big in terms of its storage and memory requirements. We've been digging deep into our dependency tree and docker layers to find ways to reduce the cost of developing, iterating and running this server. We will try to document this in a blog post later on.

Data Collection from WhatsApp:

We started Collecting Data from twenty Covid Relief groups on WhatsApp. The goal is to do a small scale study on the types of conversations happening around corona on WhatsApp during the second wave of the pandemic in India.

Vaccine Hesitancy Article:

We parsed through fact checks on vaccines and created a short report on vaccine hesitancy: https://tattle.co.in/articles/vaccine-hesitancy/

Sprint March 15 2021 to April 15 2021

Click to expand

***

Kosh

Opened up the archive for a semi open beta for interested researchers to sign up and try out the platform for giving feedback
Shared early access with the folks working at https://dra-ft.site https://instagram.com/dra_ft_/. A Walkthrough is part of this webinar. Scrub to 16:45 https://youtube.com/watch?v=_Z0LT3EozQ4

Privacy and Security Audit

Signed off on privacy policy and security policy documents with our partner organisation to lay the foundation for implementing workflows and processes to prevent or mitigate any privacy or security related incidents.

Fact Check Article Dashboard

Refactored the dashboard code to make adding weekly data easier.
Added CI/CD workflow using Github actions to make production deploy automated

Scrapers

Updated scraper runs to reflect latest IFCN certification status - https://github.com/tattle-made/tattle-research/blob/master/factchecking_sites_status.md
Added scrapers for Digiteye and Newsmeter
Amended WhatsApp Scraper to accommodate lag in exported files.

Sprint Jan 1, 2021 to Jan 30 2021

Click to expand

***

Scrapers

WhatsApp Scraper Tested and pushed WhatsApp Scraper https://github.com/tattle-made/whatsapp-scraper/tree/master/python_scraper… This is a handy tool for anyone who wants to archive their WhatsApp group content. It consolidates exported WhatsApp chats into one database.
- Features tested: - deduplication across multiple exports of the same group. - manage time difference across multiple exports using correlation. - anonymize phone numbers and group names.
Restructured fact checking sites scraper, thanks to @su__deep

We discovered that our fact checking article scraper was missing articles from some domain. We took that opportunity to update the way we structure our Scrapers.
- The proposal for which can be seen here - https://github.com/tattle-made/tattle-research/issues/10… We think this structure extends to all kinds of scraping. If you are into data scraping or have improvements to suggest, please reply to the issue

DataScience

Participated in the SEMEVAL 2021 TASK 6 ON "DETECTION OF PERSUASION TECHNIQUES IN TEXTS AND IMAGES https://propaganda.math.unipd.it/semeval2021task6/https://twitter.com/proppy
Submitted results to the development and test set for the challenge for the following 3 tasks.
Given memes -
1. Classified memes with propaganda techniques given only meme text

2. Classified memes with propaganda technique and identified the matching span of text using only meme text

3. Classified propaganda in memes using both meme image and accompanying text
This collaboration happened organically between Yohan, @su__deep and @KruttikaNadig on our slack group and we are very excited to see how this plays out.
Improvement to multi-lingual search Replaced word2vec word embeddings with a pre trained Sentence Transformer Embeddings (https://sbert.net/index.html)
This pretrained model helps generate a vector representation of input text in Indian languages like Hindi, Bengali, Gujarati, Marathi, Tamil, Malayalam etc. Full list can be found here https://sbert.net/docs/pretrained_models.html… This helps us improve our multi lingual search capabilities.

Infrastructure

We've been optimizing our infrastructure costs by trying to take advantage of kubernetes features. One such optimization came in the form of the elastic search operator that helped us host elastic search in our own cluster and not use 3rd party managed offerings.

---

Sprint ending Dec 4 2020

Click to expand

***

Engaging Everyday Chat App Users:

Recruited more users for the Khoj pilot
Continued forwarding existing fact checks to their queries and shared a digest of curated fact checking articles every other day.

Search

Implemented Kubernetes volume to persistently store word2vec vectors in our infrastructure
Deploy Tattle Search and ElasticSearch cluster in our Kubernetes Cluster

Data Science

Wrote a script to enhance our weekly data analysis that will cluster all the duplicate/near-duplicate images scraped during that week - tattle-made/data-experiments
Trained an XGBoost claims detection model on our annotated social media dataset with nearly 80% cross-fold validation accuracy - tattle-made/content-relevance
Modified our 'Themes in Factchecking Articles' dashboard generation script to store the English translations of article headlines in our database as we generate them - tattle-made/data-experiments
This will allow for better data analysis as the Indian language factchecking articles in Tattle's archive will become searchable with English queries

Research

Submitted a collaborative data annotation activity proposal to Mozfest

Infrastructure

Created a helper module for our various services to connect to their respective Mongo database with one line of code - https://github.com/tattle-made/sharechat-scraper/blob/master/db_config.py
This is part of our plan to have a common set of helper modules and functions for our web scrapers, search engines and WhatsApp chat archiver
Did a proof of concept for complex data routing with RabbitMQ, which sends different data from Pandas dataframes to different queues and consumers based on routing keys for each type of data - tattle-made/pipelines

Sprint Ending Nov 8 2020

Click to expand

***

Search

Implemented multistage Docker builds for the Tattle Search API which will reduce production container sizes from 5gb each to 1.5gb each - https://github.com/tattle-made/tattle-api/tree/feature/consistent-api
Enabled video compression with ‘ffmpy’ while indexing long videos into the Tattle Search API

Data science

Deployed a Luigi data pipeline cron job to process multimedia posts from our database and flag them if they contain keywords that appear frequently in fact-checking articles

Engaging Everyday Chat App Users

Recruit 3 users (7 more pending) for the Khoj Pilot Study. All users are in the demographic of 40-60 year old women.

Created whatsapp sharing friendly images.
Created a Draft of the Guiding Document for the pilot

Sprint Ending Oct 26 2020

Click to expand

***

Search

Made the existing Tattle Search API consistent with our newer Simple Search service. This included changing the API endpoints and conventions and adding Rabbitmq queueing to make it easier to index media in bulk - https://github.com/tattle-made/tattle-api/tree/feature/consistent-api

Data science

Applied perceptual image hashing to 10k images from our database and had good results for duplicate and near-duplicate image search with pHash. We documented this in a technical blog recently - https://blog.tattle.co.in/clustering-similar-images-with-phash/
We tried out a bigrams-based approach for clustering factchecking articles and found that it was useful for identifying the top stories in a time window - https://github.com/tattle-made/data-experiments/blob/master/bigram-clustering.ipynb

Engaging Everyday Chat App Users

Brainstormed ideas with the team to understand everyone’s ideas about what the scope of the Khoj Pilot with 40-60 year old users should be. Defined categories of behavioural nudges that would be effective with our demographic to slow down the spread of misinformation

Sprint Ending Oct 11 2020

Click to expand

***

Infrastructure

Issues in developing, deploying and debugging 5 different docker containers led us down a rabbithole of Kubernetes tooling.
We evaluated a lot of them and liked K9s (CLI for cluster and resource management), Stern (for streaming logs from multiple containers), Telepresence (for debugging containers inside a cluster using standard tools like debuggers).
Tired of entering long kubectl commands repeatedly, we’ve now switched to these oh-my-zsh aliases (https://github.com/ohmyzsh/ohmyzsh/tree/master/plugins/kubectl))
Started using this github action (https://github.com/steebchen/kubectl) to apply changes to a deployment and get its status. Example here (https://github.com/tattle-made/simple-rt-search/blob/1f83c7ffcb760edbb4b599732388bd7f6113501e/.github/workflows/staging-deploy.yml#L50))

Engaging Everyday Chat App Users in Verification

We’ve been conducting weekly (soon to become biweekly starting this week) open discussion sessions to frame questions around our upcoming Khoj Pilot. Minutes of meeting will continually be updated here https://github.com/tattle-made/docs/wiki/Khoj-App-and-Community-UI-Working-Group

Search

We committed scripts for a) bulk indexing media from datasets into our simple realtime search engine based on date range or random batch selection
b) reporting the success/failure of each one back to the database via an additional Rabbitmq queue and receiver and c) retrying failures - https://github.com/tattle-made/sharechat-scraper/tree/development

Data science

We made a small but meaningful change in how we model the visualisations in our Themes in Factchecking Articles dashboard, and retrospectively applied it to past visualisations.
We are now using the article's date of publishing instead of our date of scraping as the basis for the visualisations - https://github.com/tattle-made/data-experiments/commit/82a8b9d6db3a57fd4a0a224f5edc5a09e60dffaf

FOSS contributions

Shoutout to Sudeep for helping us enforce code formatting consistency in one of our Python repositories by adding pre-commit hooks for code formatting using 'black' and PEP style using ‘flake8’ - https://github.com/tattle-made/sharechat-scraper/pull/11

Sprint Ending 27th September 2020

Click to expand

***

Infrastructure

We incorporated AWS ALB Ingress Controller (https://kubernetes-sigs.github.io/aws-alb-ingress-controller/guide/ingress/annotation/) for routing traffic to our services via a single AWS Application LoadBalancer. We were using a single LoadBalancer per service earlier and are hoping this optimization will cut costs.

Search

We created manifest files for our real time search engine so that we could deploy it to our Kubernetes cluster. We will be uploading them to Github shortly after removing the confidential fields. Installed RabbitMQ into our cluster vial Helm Charts and created a custom service to expose its Web Management UI
Committed scripts to index images and videos from our database into our simple real time search engine. This will allow exact image / video search for media that has circulated on social networks. The search will also return information about the matched media’s source - https://github.com/tattle-made/simple-rt-search/tree/development

Whatsapp Scraper

We incorporated Service Accounts into our Whatsapp Scraper. This makes it easy for others to archive whatsapp groups that they are part of. Once someone exports the chat from a whatsapp group to their google drive, they can then share that folder with our service account’s email id. This will enable our bot to scrape the data and archive it. Email us at [email protected] us if you are part of any whatsapp group whose content you wish to archive.

Data science

Launched an interactive dashboard for exploring themes in factchecking articles we scrape each week - https://services.tattle.co.in/khoj/dashboard
Began testing and tuning our multimodal content relevance classifier on a dataset of 2000 Hindi social media posts that we annotated in-house

FOSS contributions

Shoutout to @duggalsu for fixing a bug in one of our social media scrapers that was causing ‘reposted’ media to be saved incorrectly

Sprint Ending 4th September 2020

Click to expand

Build a high quality Sharechat dataset

We annotated 2000 Sharechat posts for 7 categories with moderate to excellent inter-annotator agreement.
The annotated dataset will be released later along with details about the sampling strategy, categories and methodology.

Data science

We have started sharing weekly data insights in the form of data visualisations on our public channels. These show evolving trends and viral content (including some viral misinformation) on Indian social media and chat apps. GitHub commit - https://github.com/tattle-made/data-experiments/blob/master/eda_insights_templates.ipynb
We worked on a simple realtime search engine that can be used to index audio, video and images. There are a spectrum of search needs in the misinformation domain but the need to identify exact duplicates of images, videos and audios and the ability to retrieve metadata associated with them forms the bulk of the challenge. We’ve fine tuned our simple search project as a standalone repo and added documentation for it here - https://github.com/tattle-made/simple-rt-search

Engaging Everyday Chat App Users in Verification

We finalized our illustrations for the app. A sneak peek here :

Sprint Ending 21st August 2020

Click to expand

***

Build a high quality Sharechat dataset

We expanded the Sharechat scraper’s scope to include tag creation dates, unique tag identifiers, reported/rejected post counts and verified account status. This allows deeper analysis of temporal trends, influencer generated content and the platform’s approach to content curation within News / Politics / Health related content tags.
We deployed a ‘virality scraper’ that tracks the likes, shares and views of fresh Sharechat posts from day 0 to day 5 of their lifespan and will provide insights into the life cycle of posts containing misinformation. For instance, this scraper would let us identify content that goes viral in 24 hrs.

Engaging Every-Day Chat App Users in Verification

Search plays a crucial role in reducing manual effort across different aspects of fact checking. Using Elasticsearch we are able to search across different textual fields across our app - user query, community response, metadata etc. Youtube Demo
We used Appbaseio to prototype quickly and efficiently.

Sprint Ending 7th August 2020

Click to expand

Stabilize Infrastructure

Archive server ReplicaSet has been deployed successfully on the k8s dev cluster, with a single redis pod
With this, all the primary PoCs for Kubernetes are completed, and the basic streamlined deployment pipeline is in place
We stumbled upon a strange bug wrt redis deployment that hasn’t been fixed yet. Details here.
We’ve setup Sematex to monitor our app logs and infrastructure health. This enables us to debug deployed software in real time and builds the foundation for optimizing the resources taken up by our software.
Monitoring of the ShareChat Scraper has been implemented in the Sematext app

Sematext

Engaging Every-Day Chat App Users in Verification

Further progress was made on the Community Response section of the app. We have great support for showing responses of the following type - text, image, url.

The query submission page underwent a visual design revamp. Usability concerns regarding the “Choose from screenshot” and “Use Recently Copied Text” persist and are up for feedback.

Corresponding to this is the Community UI that lets any verified member of the community answer user queries coming from khoj users. Youtube Demo The section of interest is the Response section that lets us respond to a user with text, image and a url of an article. We are opening up the community UI for a closed pilot. If you would like to join and be part of a community which responds to queries regarding whatsapp messages/misinformation for uncles and aunties around the country, ping us.
We have added the following illustrations in the onboarding screens for the Khoj App. They are still undergoing modifications

onboarding_1 onboarding_2

Create a High Quality WhatsApp Dataset

The WhatsApp data dump contains user phone numbers that we dont want to store in our database. We have implemented a technique to obfuscate phone numbers in a way that retains privacy but also allows so basic analysis

Build a high quality Sharechat dataset

We have integrated duplicate detection into our scraping cron jobs and are now archiving 5k unique posts per day. We have started tracking the sources of the duplicates to help improve our scraper targeting.
We are storing HTML previews of the daily scraped posts for the convenience of journalists, researchers and anyone else who’d like to work with our data. These include media thumbnails and metadata in a tabular format.

Sprint Ending 24th July 2020

Click to expand

Stabilize Infrastructure

Kubernetes was configured to trigger from Github actions
Sharechat Scraper Service now gets built, uploaded, and deployed to 2 k8s replica pods on commit
SCS cron job is deployed and tested (awaiting testing for re-deployment with static Docker image tag)
SCS REST server CICD on k8s is implemented and tested
Khoj API is deployed and CICD on k8s is implemented and tested
You can see the Github workflow files for the ShareChat scraper REST API and Khoj API here and here

Engaging Every-Day Chat App Users in Verification

We’re exploring different ways to engage users of the Khoj android app. The app now supports showing multiple type of community response. Here in the image you see a red box that we are calling the “summary_card”, that summarizes the query and response in a shareable byte. You also see a feedback section in the image that lets a user tell us if they are happy with our response.

Introducing Ruhi Maasi We have been using this stakeholder internally as a stand in for the WhatsApp user we want to get to eventually. We now have a visual representation of her that we plan to use in our product illustration

Create a High Quality WhatsApp Dataset

The WhatsApp scraper community UI now enables users to moderate scraped WhatsApp messages. This include deleting useless messages, tagging them. Youtube Demo

Build a high quality ShareChat dataset

We now have over 200k unique ShareChat posts along with their metadata in our database. This includes images, videos and text from tags related to Politics, Health and Patriotism in Hindi, Marathi, Bengali and Rajasthani.

Content Relevance Pipeline

We have completed building a multimodal machine learning pipeline that can classify if a piece of multilingual multimedia is relevant to the Tattle archive with ~80% accuracy. Relevant is defined as that which could be potential misinformation or is of historical value. This will help us triage new data we collect from WhatsApp and other sources.