Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(provider): add gcp provider with data #372

Merged
merged 1 commit into from
Mar 22, 2025

Conversation

jonperron
Copy link
Contributor

@jonperron jonperron commented Feb 4, 2025

Changelog

  • Update providers.csv
  • Add data from gcp with documentation links

Fix #145

@jonperron
Copy link
Contributor Author

Two remaining questions:

@jonperron jonperron force-pushed the feat/boaviztapi-145 branch from 771b7de to 3b6c2c8 Compare February 4, 2025 16:55
@da-ekchajzer
Copy link
Collaborator

What "platform" means in this context ?

The platform is the bare metal server that hosts the instance. Since we are calculating the impact of the instance as part of the bare metal server, we need to know its architecture. The platform field must match one of the id's of the available server archetypes. So you should add a platform name that matches a platform you are going to create in boaviztapi/data/archetypes/server.csv. This documentation may help you:

It is always difficult to get all the data, so sometimes you will have to guess. If so, you should mention it in the warning column. I recommend that you set the CPU name so that the other values are automatically filled in at runtime. If you do this, make sure the CPU is listed in the CPU data file. If it is not listed, you should add it so that everyone can benefit from it: https://doc.api.boavizta.org/contributing/cpu/.

The most important values are :

  • CPU.vcpu
  • case_type (always rack in cloud contexte)
  • CPU.units
  • SSD.units
  • SSD.capacity
  • HDD.units
  • HDD.capacity

For POWER_SUPPLY and USAGE feel free to use the values from the other providers.

GCP has a couple of GPU machine types (https://cloud.google.com/compute/docs/gpus?hl=en), with a range of vcpus and ram. How should they be taken into account ?

We believe that GPUs are never shared between instances, so there is no need to include the resources in the instance file. However, you can add the data to the server.csv. For know we are not calculating the impact of GPU but we will be soon so I recommend you to add it.

Hope this is clear, if not ping me on mattermost.

@jonperron jonperron marked this pull request as draft February 5, 2025 08:03
@jonperron jonperron force-pushed the feat/boaviztapi-145 branch 2 times, most recently from 4a066ef to 098e421 Compare February 9, 2025 16:30
@da-ekchajzer
Copy link
Collaborator

I dont know if you are aware but we use the ";" as a separator in the CSV to distinguish between average;min;max value of a parameter. I just saw that you replaced them with "," in server.csv file. Just to confirm, it is normal to have ;. It can be useful if you don't know an exact value but prefer to give a range. The API will calculate the min and max values.

@jonperron
Copy link
Contributor Author

Thanks for the quick review @da-ekchajzer. I am filling the csv file using open office, that's why it's filled with ",". I will make a fix once I'm done 👍

@jonperron jonperron force-pushed the feat/boaviztapi-145 branch 7 times, most recently from 0165bc2 to ae90d6f Compare March 1, 2025 10:05
@jonperron jonperron marked this pull request as ready for review March 1, 2025 10:06
@jonperron
Copy link
Contributor Author

PR ready for review @da-ekchajzer 👍

@jonperron jonperron force-pushed the feat/boaviztapi-145 branch from ae90d6f to fbbfca7 Compare March 1, 2025 12:38
@da-ekchajzer
Copy link
Collaborator

Well done! Were you able to automate it or did you do it manually?

At first glance, I see strange numbers:

302x4-megamem-1440-metal 1,4424,576000x4-megamem-1440-metalhttps://cloud.google.com/compute/docs/memory-optimized-machines?hl=en#x4_machine_types

I think the "," should be removed as it seems to be a thousand separator.

Other remark, are these only instances with SSD storage? No HDD?

@da-ekchajzer
Copy link
Collaborator

Ideally we should implement a test that computes all instances. But I should open a topic dedicated to this.

@jonperron
Copy link
Contributor Author

jonperron commented Mar 2, 2025

Were you able to automate it or did you do it manually?

I did it manually, but I could add a script as it seems that most of the information are stored in HTML tables.

I think the "," should be removed as it seems to be a thousand separator.

Indeed, I have double-checked and removed the thousand separators.

are these only instances with SSD storage? No HDD?

As I understood the documentation, most of the machines does not come SSD nor HDD. One needs to rent a hard drive when creating a machine. In addition, some of the types such as n4 does not support persistent or local SSD, whereas others support a lot (see https://cloud.google.com/compute/docs/general-purpose-machines?hl=en#c3_disks for example). That's why I only added hard drive capabilities when it is included in a machine such as c4a machines.

Would you do it otherwise ?

Changelog
* Update providers.csv
* Add data from gcp with documentation links
* Update cpu_specs.csv
* Update tests
@jonperron jonperron force-pushed the feat/boaviztapi-145 branch from fbbfca7 to 1673ddd Compare March 2, 2025 14:26
@da-ekchajzer
Copy link
Collaborator

As I understood the documentation, most of the machines does not come SSD nor HDD. One needs to rent a hard drive when creating a machine. In addition, some of the types such as n4 does not support persistent or local SSD, whereas others support a lot (see https://cloud.google.com/compute/docs/general-purpose-machines?hl=en#c3_disks for example). That's why I only added hard drive capabilities when it is included in a machine such as c4a machines.

So you include ssd when it is explicitly a local SSD ? As I understand not all C4a have SSD.

image

We are planning to add storage as a service to get the impacts of additional storage

@da-ekchajzer
Copy link
Collaborator

da-ekchajzer commented Mar 4, 2025

Have you been able to test all instances through a script for instance ?

@jonperron
Copy link
Contributor Author

jonperron commented Mar 4, 2025

So you include ssd when it is explicitly a local SSD ? As I understand not all C4a have SSD.

Yes exactly 👍 That's why there are two archetypes for a "standard c4a" server: c4a-standard and c4a-standard-lssd. The latest corresponds to instances with a SSD included.

Have you been able to test all instances through a script for instance ?

I will add a test 👍 I only added a single instance in the tests/data/archetype at the moment.

@jonperron
Copy link
Contributor Author

I thought about this

@pytest.fixture
def valid_cpu_specs():
    with open(cpu_specs, 'r') as f:
        reader = csv.DictReader(f)
        return {row['name'] for row in reader}


SERVER_ARCHETYPE_MANDATORY_FIELDS = [
    "id",
    "case_type",
    "CPU.units", 
    "RAM.units",
    "SSD.units",
    "HDD.units",
    "GPU.units", 
    "POWER_SUPPLY.units",
    "POWER_SUPPLY.unit_weight",
    "USAGE.time_workload",
    "USAGE.use_time_ratio",
    "USAGE.hours_life_time",
    "USAGE.other_consumption_ratio"
]

def test_archetypes_properly_filled(providers, valid_platforms, valid_cpu_specs):
    for provider_name in providers:
        provider_csv_path = f"{cloud_path}/{provider_name}.csv"

        try:
            with open(provider_csv_path, 'r') as f:
                reader = csv.DictReader(f)
                for row in reader:
                    for field in SERVER_ARCHETYPE_MANDATORY_FIELDS:
                        if not row.get(field):
                            pytest.fail(f"Empty value for field '{field}' in provider '{provider_name}'")
                    if cpu:= row.get('CPU.name'):
                        if cpu not in valid_cpu_specs:
                            pytest.fail(f"CPU '{cpu}' for provider '{provider_name}' not found in cpu_specs.csv")

        except FileNotFoundError:
            pytest.fail(f"CSV file for provider '{provider_name}' not found: {provider_csv_path}")

(It's basically an update of the test test_platform_exists_in_server_csv with explicits checks on mandatory fields)
Is it what you're thinking about ? If not, I'llt catch you on mattermost so we can exchange about what you have in mind.

@da-ekchajzer da-ekchajzer merged commit 88aacba into Boavizta:main Mar 22, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GCP cloud instances
2 participants