**Important:** This repository has moved to gitlab.wikimedia.org/repos/data-engineering/wmfdata-python. Unless you are developing Wmfdata, you just need to keep upgrading whenever Wmfdata prints a notice, using the pip command included in the notice.
Wmfdata is a Python package for analyzing Wikimedia data on Wikimedia's non-public analytics clients.
Wmfdata's most popular feature is SQL data access. The `hive.run`, `spark.run`, `presto.run`, and `mariadb.run` functions allow you to run commands using these different query engines and receive the results as a Pandas dataframe, with just a single line of code.
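As a sketch of what that looks like in practice (the query and table below are illustrative examples, not from this README, and running this requires a Wikimedia analytics client):

```python
import wmfdata as wmf

# Run a query on the Spark engine; the result comes back as a
# Pandas dataframe. The table and columns here are illustrative --
# substitute whichever dataset you are analyzing.
df = wmf.spark.run("""
    SELECT
        project,
        SUM(view_count) AS views
    FROM wmf.pageview_hourly
    WHERE year = 2023 AND month = 1 AND day = 1
    GROUP BY project
    ORDER BY views DESC
    LIMIT 10
""")

# df is an ordinary Pandas dataframe, so the usual tools apply.
print(df.head())
```

The same pattern works with `hive.run`, `presto.run`, and `mariadb.run`; only the engine (and therefore the SQL dialect) changes.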
Other features include:
- Easy generation of Spark sessions using `spark.create_session` (or `spark.create_custom_session` if you want to fine-tune the settings)
- Loading CSV or TSV files into Hive using `hive.load_csv`
- Turning cryptic Kerberos-related errors into clear reminders to renew your Kerberos credentials
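A brief sketch of the first two features (runnable only on an analytics client; the argument names shown are assumptions, so check each function's docstring for the exact signature):

```python
import wmfdata as wmf

# Create a Spark session with Wmfdata's default settings.
spark = wmf.spark.create_session()

# For fine-grained control, create a custom session instead.
# The spark_config parameter name is an assumption; see the
# function's docstring for the actual options.
# spark = wmf.spark.create_custom_session(
#     spark_config={"spark.executor.memory": "8g"}
# )

# Load a local CSV file into a Hive table. Again, the keyword
# names here are assumptions for illustration.
wmf.hive.load_csv(
    "my_data.csv",
    field_spec="page_id int, views bigint",
    db_name="my_database",
    table_name="my_table",
)
```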
For an introduction to using Wmfdata, see the quickstart notebook.
Wmfdata comes preinstalled in the Conda environments used on the analytics clients.
To upgrade to a newer version, use:

```
pip install --upgrade git+https://gitlab.wikimedia.org/repos/data-engineering/wmfdata-python.git@release
```
Tasks related to Wmfdata are tracked in Wikimedia Phabricator in the Wmfdata-Python project. The best starting place is the backlog in priority order.
The Wikimedia Foundation's Movement Insights and Data Products teams are joint code stewards of Wmfdata. Data Products is the ultimate steward of the data access and analytics infrastructure interface portions, while Movement Insights is the ultimate steward of the analyst ergonomics portions.
The current maintainers of Wmfdata are:
- @nshahquinn-wmf
- @xcollazo
If you're a hero who would like to contribute code, we welcome merge requests!