A Python tool to bulk download Gong.io call transcripts for analysis and content creation, with enhanced participant tracking and filtering capabilities.
- Bulk Download: Download transcripts for multiple calls in a date range
- Resume Capability: Resume interrupted downloads
- Multiple Formats: Save data in JSON, formatted text, and CSV formats
- Year-based Organization: Automatically organize files by year
- Enhanced Participant Tracking: Detailed participant information and statistics
- Participant Filtering: Easy filtering and analysis by specific participants
- Multiple Organization Methods: By date, speaker, and participant
Perfect for extracting insights from your sales conversations:
- Content Creation: Mine transcripts for LinkedIn posts, blog content, case studies
- Sales Training: Identify best practices and common objections
- Product Insights: Discover feature requests and customer pain points
- Competitive Intelligence: Understand competitor mentions and positioning
- Sales Performance: Analyze team performance and coaching opportunities
- Python 3.8 or higher
- Gong.io account with API access
- Technical administrator permissions in Gong (to access API settings)
- Clone or download this project:
git clone <your-repo-url>
cd gongtranscripts-downloader
- Install dependencies:
pip install -r requirements.txt
- Get your Gong API credentials:
  - Log in to your Gong account
  - Go to Settings > API
  - Click "Create" to generate:
    - Access Key
    - Access Key Secret
  - Note your Gong subdomain (e.g., if your URL is https://acme.gong.io, the subdomain is acme)
- Create a .env file in the project directory:
# Required - Gong API Credentials
GONG_ACCESS_KEY=your_access_key_here
GONG_ACCESS_KEY_SECRET=your_access_key_secret_here
GONG_SUBDOMAIN=your_subdomain_here
# Optional - Download Configuration
DOWNLOAD_START_DATE=2022-01-01
DOWNLOAD_END_DATE=2024-12-31
OUTPUT_DIRECTORY=./transcripts
MAX_CONCURRENT_DOWNLOADS=3
API_RATE_LIMIT=2.5
- Test your setup:
python main.py test
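For context, Gong's public API documentation describes HTTP Basic authentication built from the Access Key and Access Key Secret. The sketch below shows how such a header could be assembled from the two .env values. The function name build_auth_header is hypothetical, and this is not necessarily how main.py builds its requests.

```python
import base64


def build_auth_header(access_key: str, access_key_secret: str) -> dict:
    """Return an HTTP Basic Authorization header for the given key pair.

    Hypothetical helper for illustration; the tool's own code may differ.
    """
    # Basic auth joins the two values with a colon and base64-encodes the result.
    token = f"{access_key}:{access_key_secret}".encode("utf-8")
    encoded = base64.b64encode(token).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


if __name__ == "__main__":
    header = build_auth_header("your_access_key_here", "your_access_key_secret_here")
    # Print only the scheme so the credential never lands in a terminal log.
    print(header["Authorization"].split(" ")[0])
```

If the connection test fails, checking that both values were copied without trailing whitespace is a cheap first step, since any extra character changes the encoded token.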
# First: Estimate how many calls and download time
python main.py estimate
# Test API connection
python main.py test
# Download all transcripts in configured date range
python main.py download
# Show setup instructions
python main.py setup
# Show current configuration
python main.py info
# Download with custom date range
python main.py download --start-date 2023-01-01 --end-date 2023-12-31
# Dry run to see what would be downloaded
python main.py download --dry-run
# List calls without downloading transcripts
python main.py list-calls --format csv
# Custom output directory
python main.py download --output-dir /path/to/custom/directory
The tool creates an organized directory structure:
transcripts/
├── raw_json/ # Raw API responses
│ ├── call_123.json
│ ├── call_456.json
│ └── all_data.json # Consolidated file
├── transcripts/ # Formatted text transcripts
│ ├── transcript_123_2023-01-15.txt
│ └── transcript_456_2023-01-16.txt
├── by_date/ # Organized by date
│ ├── 2023-01-15/
│ └── 2023-01-16/
├── calls_metadata.csv # Summary of all calls
├── summary_statistics.csv
└── logs/ # Download logs
    └── download_20240101_120000.log
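If you post-process the output with your own scripts, it can help to derive file locations from a call ID and date instead of hard-coding them. The sketch below mirrors the layout shown above. The helper name transcript_paths is hypothetical, and the naming pattern is inferred from the example tree, so verify it against your actual output.

```python
from pathlib import Path


def transcript_paths(output_dir: str, call_id: str, call_date: str) -> dict:
    """Map a call ID and ISO date to the paths shown in the example tree.

    Assumption: file names follow the call_<id>.json and
    transcript_<id>_<date>.txt patterns from the directory listing.
    """
    root = Path(output_dir)
    return {
        "raw_json": root / "raw_json" / f"call_{call_id}.json",
        "text": root / "transcripts" / f"transcript_{call_id}_{call_date}.txt",
        "by_date": root / "by_date" / call_date,
    }


if __name__ == "__main__":
    for label, path in transcript_paths("./transcripts", "123", "2023-01-15").items():
        print(f"{label}: {path}")
```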
Raw JSON files (raw_json/): complete API responses with all metadata and transcript data.
Formatted text files (transcripts/): human-readable transcripts with:
- Call metadata (date, time, participants)
- Timestamped conversation
- Speaker identification
Example:
================================================================================
CALL TRANSCRIPT
================================================================================
Call ID: 123456789
Date: 2023-01-15
Time: 14:30
Duration: 45 minutes
Title: Discovery Call - Acme Corp
Participants: John Sales, Jane Prospect
--------------------------------------------------------------------------------
[00:01] John Sales: Hi Jane, thanks for taking the time today...
[00:45] Jane Prospect: Thanks John, excited to learn more about your solution...
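Because each utterance in the formatted text follows a "[timestamp] Speaker: text" pattern, the files are easy to parse with a regular expression. The sketch below splits lines into timestamp, speaker, and text, then searches for a keyword such as a competitor name. The names parse_transcript and find_mentions are hypothetical, and the pattern assumes speaker names contain no colon.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Matches lines such as "[00:45] Jane Prospect: Thanks John, ..."
# Assumption: timestamps are MM:SS or HH:MM:SS and speaker names have no colon.
LINE_RE = re.compile(r"^\[(\d{1,2}:\d{2}(?::\d{2})?)\]\s+([^:]+):\s*(.*)$")


class Utterance(NamedTuple):
    timestamp: str
    speaker: str
    text: str


def parse_transcript(lines: Iterable[str]) -> Iterator[Utterance]:
    """Yield one Utterance per conversation line, skipping header lines."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            yield Utterance(*match.groups())


def find_mentions(lines: Iterable[str], keyword: str) -> List[Utterance]:
    """Return utterances whose text contains the keyword, ignoring case."""
    needle = keyword.lower()
    return [u for u in parse_transcript(lines) if needle in u.text.lower()]


sample = [
    "Call ID: 123456789",
    "[00:01] John Sales: Hi Jane, thanks for taking the time today...",
    "[00:45] Jane Prospect: Thanks John, excited to learn more about your solution...",
]

if __name__ == "__main__":
    for hit in find_mentions(sample, "solution"):
        print(hit.timestamp, hit.speaker)
```

To scan a real file, pass an open file handle instead of the sample list, since iterating over a file yields its lines.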
CSV files (calls_metadata.csv, summary_statistics.csv): structured data for analysis:
- Call IDs, dates, durations
- Participant lists (internal/external)
- CRM associations
- Transcript availability status
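As a quick check on transcript availability, the metadata CSV can be scanned with the standard library alone. This sketch assumes a has_transcript column (the name used in the troubleshooting notes of this README) and a call_id column, which is a guess; adjust both to match the header row of your file.

```python
import csv
import io


def calls_without_transcript(csv_file) -> list:
    """Return the IDs of calls whose has_transcript value is not truthy.

    Assumption: columns are named call_id and has_transcript.
    """
    truthy = ("true", "1", "yes")
    missing = []
    for row in csv.DictReader(csv_file):
        value = (row.get("has_transcript") or "").strip().lower()
        if value not in truthy:
            missing.append(row.get("call_id") or "")
    return missing


# In-memory sample standing in for transcripts/calls_metadata.csv
sample_csv = io.StringIO(
    "call_id,date,has_transcript\n"
    "123,2023-01-15,True\n"
    "456,2023-01-16,False\n"
)

if __name__ == "__main__":
    print(calls_without_transcript(sample_csv))
```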
| Variable | Required | Default | Description |
|---|---|---|---|
| GONG_ACCESS_KEY | ✅ | - | Your Gong API Access Key |
| GONG_ACCESS_KEY_SECRET | ✅ | - | Your Gong API Access Key Secret |
| GONG_SUBDOMAIN | ✅ | - | Your Gong subdomain |
| DOWNLOAD_START_DATE | ❌ | 2022-01-01 | Start date (YYYY-MM-DD) |
| DOWNLOAD_END_DATE | ❌ | 2024-12-31 | End date (YYYY-MM-DD) |
| OUTPUT_DIRECTORY | ❌ | ./transcripts | Output directory |
| MAX_CONCURRENT_DOWNLOADS | ❌ | 3 | Concurrent API calls |
| API_RATE_LIMIT | ❌ | 2.5 | API calls per second |
| SAVE_RAW_JSON | ❌ | True | Save raw JSON files |
| SAVE_FORMATTED_TEXT | ❌ | True | Save formatted transcripts |
| SAVE_METADATA_CSV | ❌ | True | Save metadata CSV |
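To make the required/default split concrete, here is a minimal sketch of how these variables could be read from the environment with the defaults from the table. The Settings class and load_settings function are hypothetical and cover only a subset of the variables; the tool's own loader may work differently.

```python
import os
from dataclasses import dataclass

REQUIRED = ("GONG_ACCESS_KEY", "GONG_ACCESS_KEY_SECRET", "GONG_SUBDOMAIN")


@dataclass
class Settings:
    access_key: str
    access_key_secret: str
    subdomain: str
    output_directory: str
    max_concurrent_downloads: int
    api_rate_limit: float
    save_raw_json: bool


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else counts as False."""
    return value.strip().lower() in ("true", "1", "yes")


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (os.environ by default).

    Raises ValueError naming every missing required variable.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("Missing required variables: " + ", ".join(missing))
    return Settings(
        access_key=env["GONG_ACCESS_KEY"],
        access_key_secret=env["GONG_ACCESS_KEY_SECRET"],
        subdomain=env["GONG_SUBDOMAIN"],
        # Defaults below are the ones listed in the table above.
        output_directory=env.get("OUTPUT_DIRECTORY", "./transcripts"),
        max_concurrent_downloads=int(env.get("MAX_CONCURRENT_DOWNLOADS", "3")),
        api_rate_limit=float(env.get("API_RATE_LIMIT", "2.5")),
        save_raw_json=_as_bool(env.get("SAVE_RAW_JSON", "True")),
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise without touching your real credentials.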
download: Downloads transcripts from Gong.
Options:
- --start-date YYYY-MM-DD: Override start date
- --end-date YYYY-MM-DD: Override end date
- --output-dir PATH: Override output directory
- --dry-run: Show what would be downloaded without downloading
test: Tests API connection and credentials.
setup: Shows detailed setup instructions and checks current configuration.
list-calls: Lists calls in date range without downloading transcripts.
Options:
- --format [json|csv|txt]: Output format (default: csv)
info: Shows current configuration and output directory status.
The Gong API has default limits:
- 3 calls per second
- 10,000 calls per day
The tool respects these limits with:
- Configurable rate limiting (API_RATE_LIMIT)
- Exponential backoff on rate limit errors
- Resume capability for large downloads
For higher limits, contact Gong support.
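To illustrate the two techniques named above, here is a minimal sketch of a spacing-based rate limiter combined with exponential backoff. All names (RateLimiter, RateLimited, call_with_retry) are hypothetical, and the tool's actual retry logic may differ in delays, retry counts, and error handling.

```python
import time


class RateLimiter:
    """Space calls so that at most `rate` calls start per second."""

    def __init__(self, rate: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the next slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


class RateLimited(Exception):
    """Hypothetical error standing in for a rate-limit response from the API."""


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 60.0):
    """Yield delays that double each attempt (base, 2*base, 4*base, ...), capped."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))


def call_with_retry(func, limiter: RateLimiter, max_retries: int = 5, sleep=time.sleep):
    """Call func under the limiter, backing off after each RateLimited error."""
    for delay in backoff_delays(max_retries):
        limiter.wait()
        try:
            return func()
        except RateLimited:
            sleep(delay)
    # Final attempt: let any remaining error propagate to the caller.
    limiter.wait()
    return func()
```

With API_RATE_LIMIT=2.5, a limiter like this would leave 0.4 seconds between calls, which stays under the 3 calls per second default.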
If a download is interrupted:
- The tool saves progress automatically
- Run python main.py download again
- It will resume from where it left off
- Progress file is cleaned up after successful completion
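The resume behavior described above can be sketched with a small progress file that records completed call IDs. This is an illustration under assumptions: the file name, its JSON format, and the function names are hypothetical, not the tool's actual implementation.

```python
import json
import tempfile
from pathlib import Path


def load_done(progress_file: Path) -> set:
    """Return the set of call IDs recorded as finished, or an empty set."""
    if progress_file.exists():
        return set(json.loads(progress_file.read_text(encoding="utf-8")))
    return set()


def download_all(call_ids, fetch, progress_file: Path) -> int:
    """Fetch every call not yet recorded, saving progress after each one.

    Returns the number of calls fetched in this run. If fetch raises, the
    progress file stays on disk so the next run can skip finished calls.
    """
    done = load_done(progress_file)
    fetched = 0
    for call_id in call_ids:
        if call_id in done:
            continue
        fetch(call_id)
        done.add(call_id)
        progress_file.write_text(json.dumps(sorted(done)), encoding="utf-8")
        fetched += 1
    # Everything succeeded, so the progress file is no longer needed.
    if progress_file.exists():
        progress_file.unlink()
    return fetched


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        progress = Path(tmp) / "progress.json"
        count = download_all(["a", "b"], lambda call_id: None, progress)
        print(f"fetched {count} calls")
```

Writing the file after every call trades a little I/O for the guarantee that an interruption loses at most the call in flight.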
For large datasets (thousands of calls):
- Adjust rate limiting:
export API_RATE_LIMIT=2.8 # Just under the 3/second limit
- Use concurrent downloads:
export MAX_CONCURRENT_DOWNLOADS=5
- Split date ranges for very large datasets:
python main.py download --start-date 2022-01-01 --end-date 2022-06-30
python main.py download --start-date 2022-07-01 --end-date 2022-12-31
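For many chunks, typing the date ranges by hand gets tedious. The sketch below generates non-overlapping ranges and prints one download command per chunk, using only the --start-date and --end-date options documented in this README. The helper name split_date_range and the 90-day chunk size are arbitrary choices for illustration.

```python
from datetime import date, timedelta


def split_date_range(start: date, end: date, days_per_chunk: int) -> list:
    """Split [start, end] into consecutive inclusive ranges of at most N days."""
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    if start > end:
        raise ValueError("start must not be after end")
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(end, chunk_start + timedelta(days=days_per_chunk - 1))
        chunks.append((chunk_start, chunk_end))
        # The next chunk begins the day after this one ends, so nothing overlaps.
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


if __name__ == "__main__":
    for chunk_start, chunk_end in split_date_range(date(2022, 1, 1), date(2022, 12, 31), 90):
        # str() on a date gives YYYY-MM-DD, the format the CLI options expect.
        print(f"python main.py download --start-date {chunk_start} --end-date {chunk_end}")
```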
1. "Configuration error" when running commands
- Check that your .env file exists and has the required variables
- Run python main.py setup to see what's missing
2. "API connection failed"
- Verify your credentials in Gong Settings > API
- Check that your subdomain is correct
- Ensure you have technical administrator permissions
3. "Rate limited" errors
- Reduce API_RATE_LIMIT in your .env file
- Contact Gong to increase your rate limits
4. Transcripts missing for some calls
- Some calls may not have transcripts available
- Check the calls_metadata.csv file for has_transcript status
- Gong only transcribes calls that meet certain criteria
5. Large downloads timing out
- Use the resume capability - restart the download
- Split into smaller date ranges
- Increase API_TIMEOUT if needed

- Check the logs in transcripts/logs/ for detailed error information
- Run python main.py info to check your configuration
- Use --dry-run to test without downloading
Once you have your transcripts, here are some analysis ideas:
# Find common customer pain points
import pandas as pd
df = pd.read_csv('transcripts/calls_metadata.csv')
# Analyze external participant questions

# Analyze call duration vs success metrics
# Compare internal vs external talk time
# Identify top-performing sales reps

# Search transcripts for feature requests
# Find competitor mentions
# Analyze pricing discussions

- Credentials: Store API credentials securely in a .env file (not in code)
- Data: Downloaded transcripts contain sensitive customer conversations
- Access: Ensure appropriate access controls on output directories
- Compliance: Follow your company's data retention and privacy policies
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT License - see LICENSE file for details.
Happy transcript mining! 🎉
Transform your sales conversations into actionable insights.
The tool now collects comprehensive participant information for future filtering and analysis:
- Basic Info: Name, email, context (Internal/External)
- Professional Info: Role, company, title
- Call Statistics: Total calls, duration, host/organizer counts
- Temporal Data: First/last seen dates, call history
- Speaker Mapping: Speaker IDs for transcript correlation
- participants.csv: Complete participant profiles with statistics
- participant_summary.csv: Summary statistics across all participants
- Enhanced calls_metadata.csv: Individual participant columns for easy filtering
- by_participant/: Directory structure with transcripts organized by participant
With the enhanced participant data, you can easily:
- Filter by Sales Rep: Find all calls by specific sales representatives
- Performance Analysis: Compare call statistics across team members
- Customer Analysis: Track interactions with specific customers
- Team Collaboration: Analyze internal vs external participation
- Content Creation: Extract insights from top performers' calls
- Training: Use successful calls as training materials
# Get Greg's call statistics
python participant_filter.py --year 2025 --participant "Greg"
# Compare with other sales reps
python participant_filter.py --year 2025 --list-participants --context Internal

# Find calls with specific customers
python participant_filter.py --year 2025 --search "Acme Corp"

# See who works together most
python participant_filter.py --year 2025 --list-participants

- API Rate Limits: The tool includes built-in rate limiting and retry logic
- Large Date Ranges: Consider downloading in smaller chunks for very large ranges
- Missing Transcripts: Some calls may not have transcripts available
- Participant Matching: Use email addresses for more precise participant filtering
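The participant matching tip is easiest to see with an example: a name fragment such as "Greg" can match several people, while an email address identifies exactly one. The sketch below is a hypothetical illustration and not the logic inside participant_filter.py. The field names name, email, and context follow the participant fields listed earlier, and the sample data is invented.

```python
def match_participants(participants, query: str, context=None) -> list:
    """Return participants matching the query.

    Queries containing "@" are compared against the email exactly
    (ignoring case); other queries are substring matches on the name.
    """
    needle = query.strip().lower()
    by_email = "@" in needle
    results = []
    for person in participants:
        if context and person.get("context") != context:
            continue
        if by_email:
            matched = person.get("email", "").lower() == needle
        else:
            matched = needle in person.get("name", "").lower()
        if matched:
            results.append(person)
    return results


# Invented sample records for illustration only
people = [
    {"name": "Greg Smith", "email": "greg.smith@example.com", "context": "Internal"},
    {"name": "Greg Jones", "email": "greg.jones@example.org", "context": "External"},
]

if __name__ == "__main__":
    print(len(match_participants(people, "Greg")))
    print(len(match_participants(people, "greg.smith@example.com")))
```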
If a download is interrupted, simply run the tool again. It will automatically resume from where it left off.
- Python 3.8+
- Gong API access
- Required packages (see requirements.txt)