Commit

Updated the workflows documentation
dmatekenya committed Nov 1, 2024
1 parent dbc3b75 commit 72b8c34
Showing 3 changed files with 30 additions and 6 deletions.
7 changes: 7 additions & 0 deletions docs/general-guide.md → docs/folders-and-naming.md
@@ -17,6 +17,13 @@ since there are many things which can be named, here provide general guidelines.
- **Underscore vs. hyphen.** Except where hyphens are not allowed (e.g., Python script names), separate the words in all folder and file names with hyphens. For example, ```damage-assessment``` as opposed to ```DamageAssessment``` or ```Damage-Assessment```.
- **Theme-based naming.** As much as possible, ensure names are informative and match the topic/theme. For example, in the data folder, one can have a directory for ```admin-boundaries```.
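
As an illustration of the hyphen rule, here is a minimal sketch of a name check. The helper `is_hyphen_name` and its regular expression are hypothetical and not part of the template; the sketch assumes names should be lowercase, which the ```damage-assessment``` example suggests but the guideline does not state outright.

```python
import re

# Hypothetical pattern (an assumption, not a template rule):
# lowercase letters or digits, with words separated by single hyphens.
HYPHEN_NAME = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def is_hyphen_name(name: str) -> bool:
    """Return True if `name` (without any file extension) is lowercase and hyphen-separated."""
    return bool(HYPHEN_NAME.match(name))


for candidate in ["damage-assessment", "DamageAssessment", "Damage-Assessment", "admin-boundaries"]:
    print(candidate, is_hyphen_name(candidate))
```

Python script names would need a separate rule, since hyphens are not allowed there.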

## Adding Data to Project SharePoint
We recognize that this approach may create some duplication and additional effort. However, wherever possible (if datasets aren't too large), we require that datasets (both raw and derived) be uploaded to the project’s SharePoint. This enables other Bank staff, who are often our clients on the project, to access the data as needed. In summary, you will maintain copies of the data in the data folder on your local machine for your analysis. As outlined in the [Git workflows](/docs/git-workflows.md), this data will not be uploaded to GitHub and will remain locally stored.

## Programming Environments
- **Python virtual environments.** We recommend using ```.venv``` for virtual environments. This allows for automatic detection by tools and editors like VS Code, simplifies setup, and keeps the folder hidden by default on macOS and Linux, which reduces clutter in the folder tree. It also promotes consistency across projects, making it easier for others to understand and navigate your setup.
- **Environment file for secrets and credentials**. In the project folder, you will find a file ```.env.example```; rename that file to ```.env```. This is the file you will use to keep API keys and other secrets. Again, refer to [this part](https://worldbank.github.io/template/README.html) of the documentation for details.
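
To show how a script might read secrets from ```.env```, here is a minimal sketch using only the standard library. The helper `read_env_file` and the key `EXAMPLE_API_KEY` are hypothetical, and the sketch assumes the file holds plain `KEY=VALUE` lines; the template may recommend a different loader (the `python-dotenv` package is a common choice).

```python
import os
import tempfile
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demonstration with a throwaway file; in a project you would point at ".env".
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("# example only\nEXAMPLE_API_KEY=not-a-real-key\n", encoding="utf-8")
    secrets = read_env_file(env_path)
    os.environ.update(secrets)  # make the values available to the rest of the script
```

Reading the values at run time keeps them out of the code itself, so the scripts and notebooks can be committed while ```.env``` stays local.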




19 changes: 13 additions & 6 deletions docs/git-workflows.md
@@ -1,14 +1,21 @@
# Guidelines for Git and GitHub Workflows
In this series of documents, we present what we consider best practices for executing data science projects. It’s important to note that these practices are tailored specifically to the work of the <span style="color:#3EACAD">Data Lab</span>. While they may not be universally applicable to all data science projects, we believe they remain highly valuable.
This section provides essential guidelines for using Git and GitHub effectively, ensuring a structured and collaborative workflow for all team members in a project. By following these practices—such as consistently ignoring the "data" folder to protect sensitive information, avoiding direct pushes to the main branch, creating descriptive branch names, and submitting pull requests once work on a branch is complete—we can maintain a clean, organized codebase and promote efficient collaboration. These guidelines help uphold version control best practices, streamline teamwork, and reduce the potential for errors in project repositories.

These documents will cover the following topics:

## Branch Names and Other General Practices
- **Branch names**. After joining the project and cloning the repository, create a concise, descriptive branch name for your work and ensure you switch to that branch before beginning any work on your machine.
- **Update branches**. Avoid creating new update branches; instead, push your changes and resolve any conflicts directly. For instance, if bots in the repository modify your code (e.g., adjusting indentations), simply pull these changes before pushing your own updates.
- **Pull requests (PR)**. When you believe your changes are final, create a pull request and assign the project lead as the reviewer.

## Folders and Files to Ignore
As all data science repos in the Data Lab use this template, the project repo will come with a ```.gitignore``` file prepopulated with most files and folders that need to be ignored. However, once you join the project and create your own branch, you will have to make sure that the following folders and files are being ignored:
- Data folder
- Virtual environments (```.venv```)
- Environment file (```.env```)

Feel free to add any other files (e.g., system files specific to your OS) to the ```.gitignore```.
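
One way to confirm the entries above are present is a quick text check of the ```.gitignore```. The sketch below is illustrative only: the helper `missing_ignore_entries` is hypothetical, it assumes the data folder is named `data`, and it compares literal lines rather than applying Git's full pattern-matching rules.

```python
def missing_ignore_entries(gitignore_text, required=("data", ".venv", ".env")):
    """Return the required entries that have no literal matching line in the .gitignore text."""
    present = set()
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        present.add(line.strip("/"))  # treat "data/" and "/data" like "data"
    return [entry for entry in required if entry not in present]


# Sample content; in a project you would read the repo's .gitignore file instead.
sample = "# local only\ndata/\n.venv/\n"
print(missing_ignore_entries(sample))  # → ['.env']
```

An empty list means every required entry has a matching line.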


1. **Folder Structure and Naming Conventions for Project Setup**

2. **Git and GitHub Workflow Standards and Guidelines**

3. **Standards for Documenting and Styling Analytical Notebooks**

4. **Guidelines for Communicating and Presenting Data Outputs**

10 changes: 10 additions & 0 deletions docs/notebooks-workflows.md
@@ -0,0 +1,10 @@
# Guidelines for Documenting and Styling Analytical Notebooks
This section provides best practices for structuring analytical notebooks to enhance readability. The guidelines include recommendations for hiding code cells to maintain a clean appearance in Jupyter Book, incorporating references where relevant, and organizing content logically to ensure clarity for readers.

- **Structure**. In all the Data Lab projects, please follow [this analytics structure](https://github.com/worldbank/sudan-poverty-monitoring/blob/main/docs/2-analytics.md).
- **Editing _toc.yml**
- **Removing/hiding cell blocks**. All notebooks will be rendered in Jupyter Book. To enhance readability, ensure code cells are hidden or removed using cell tags. In some cases, you may use the hide-input cell tag.
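
Cell tags are stored in each cell's `metadata.tags` list inside the notebook's JSON. As a rough sketch of what tagging looks like, the hypothetical helper `tag_code_cells` below adds the hide-input tag to every code cell of a notebook held in memory as a dictionary. Tags can also be set by hand in the Jupyter interface, and whether every code cell should be tagged is a project decision.

```python
import json


def tag_code_cells(notebook, tag="hide-input"):
    """Add `tag` to each code cell's metadata.tags list (in place); return how many cells changed."""
    changed = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # leave markdown and raw cells untouched
        tags = cell.setdefault("metadata", {}).setdefault("tags", [])
        if tag not in tags:
            tags.append(tag)
            changed += 1
    return changed


# A tiny in-memory notebook; a real one would be loaded from an .ipynb file with json.load.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "source": ["print('hi')"],
         "outputs": [], "execution_count": None},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

count = tag_code_cells(notebook)
print(json.dumps(notebook["cells"][1]["metadata"]))  # → {"tags": ["hide-input"]}
```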



