[BACKEND] Fix create_update_segments_excursions: case of a single position per MMSI in a batch is not handled #349

Open
marthevienne opened this issue Dec 11, 2024 · 13 comments · Fixed by #371 · May be fixed by #400
Assignees
Labels
backend bug Something isn't working P1

Comments

@marthevienne
Collaborator

(.venv) (base) ➜ backend git:(main) ✗ python3 bloom/tasks/create_update_excursions_segments.py
/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pydantic/_migration.py:283: UserWarning: pydantic.generics:GenericModel has been moved to pydantic.BaseModel.
warnings.warn(f'{import_path} has been moved to {new_location}.')
[bloom INFO @ 21:20:17] DEBUT - Création / mise à jour des excursions et des segments
[bloom INFO @ 21:20:18] Lecture des nouvelles positions depuis le 2024-11-11 20:14:57.333136+00:00
[bloom INFO @ 21:20:18] 1028 nouvelles positions
[bloom INFO @ 21:20:29] Création des excursions
[bloom ERROR @ 21:20:29] Session rollback because of exception
Traceback (most recent call last):
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/indexes/range.py", line 413, in get_loc
return self._range.index(new_key)
^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: 0 is not in range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/marthevienne/12_bloom/backend/bloom/infra/database/database_manager.py", line 32, in session
yield session
File "/Users/marthevienne/12_bloom/backend/bloom/tasks/create_update_excursions_segments.py", line 160, in run
df_start.loc[-1] = df_start.loc[0]

File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/indexing.py", line 1191, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/indexing.py", line 1431, in _getitem_axis
return self._get_label(key, axis=axis)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/indexing.py", line 1381, in _get_label
return self.obj.xs(label, axis=axis)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/generic.py", line 4301, in xs
loc = index.get_loc(key)
^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/indexes/range.py", line 415, in get_loc
raise KeyError(key) from err
KeyError: 0
Traceback (most recent call last):
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/indexes/range.py", line 413, in get_loc
return self._range.index(new_key)
^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: 0 is not in range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/marthevienne/12_bloom/backend/bloom/tasks/create_update_excursions_segments.py", line 375, in <module>
run()
File "/Users/marthevienne/12_bloom/backend/bloom/tasks/create_update_excursions_segments.py", line 160, in run
df_start.loc[-1] = df_start.loc[0]
~~~~~~~~~~~~^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/indexing.py", line 1191, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/indexing.py", line 1431, in _getitem_axis
return self._get_label(key, axis=axis)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/indexing.py", line 1381, in _get_label
return self.obj.xs(label, axis=axis)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/generic.py", line 4301, in xs
loc = index.get_loc(key)
^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/indexes/range.py", line 415, in get_loc
raise KeyError(key) from err
KeyError: 0
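
For context on the `KeyError: 0` above: `df_start.loc[0]` looks up the row *labelled* 0, so it fails whenever that label is absent from the index, for example when `df_start` is empty. The sketch below is a hypothetical illustration, not the project's code: `prepend_first_row` is a made-up helper and the column names are assumptions. It duplicates the first row by position and leaves an empty frame alone.

```python
import pandas as pd


def prepend_first_row(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with its first row duplicated at the top.

    Hypothetical helper: the first row is selected by position (iloc),
    so it does not depend on the index containing the label 0.
    """
    if df.empty:
        # Nothing to duplicate; df.loc[0] would raise KeyError here.
        return df.copy()
    # df.iloc[[0]] is a one-row DataFrame; concat renumbers the index.
    return pd.concat([df.iloc[[0]], df], ignore_index=True)


# Assumed column names, for illustration only.
empty = pd.DataFrame({"mmsi": [], "latitude": []})
filtered = pd.DataFrame({"mmsi": [1, 2], "latitude": [10.0, 20.0]}, index=[5, 6])

print(len(prepend_first_row(empty)))
print(prepend_first_row(filtered)["mmsi"].tolist())
```

Selecting by position avoids the label lookup entirely. Whether an empty `df_start` should be skipped or treated differently is a decision for the script's logic.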
@marthevienne
Collaborator Author

We have another, related error:
(.venv) (3.12.5) (base) ➜ backend git:(minor-changes-str) python3 -m bloom.tasks.create_update_excursions_segments
[bloom INFO @ 14:56:22] DEBUT - Création / mise à jour des excursions et des segments
[bloom INFO @ 14:56:22] Lecture des nouvelles positions depuis le 2024-12-16 15:04:42.767117+00:00
[bloom INFO @ 14:56:37] 313205 nouvelles positions
[bloom INFO @ 14:56:51] Création des excursions
[bloom ERROR @ 15:02:44] Session rollback because of exception
Traceback (most recent call last):
File "/Users/marthevienne/12_bloom/backend/bloom/infra/database/database_manager.py", line 32, in session
yield session
File "/Users/marthevienne/12_bloom/backend/bloom/tasks/create_update_excursions_segments.py", line 192, in run
df["distance"] = df.apply(get_distance_in_miles, axis=1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/frame.py", line 10374, in apply
return op.apply().finalize(self, method="apply")
^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/apply.py", line 916, in apply
return self.apply_standard()
^^^^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/apply.py", line 1063, in apply_standard
results, res_index = self.apply_series_generator()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/apply.py", line 1081, in apply_series_generator
results[i] = self.func(v, *self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/bloom/tasks/create_update_excursions_segments.py", line 190, in get_distance_in_miles
return distance.distance(p1, p2).miles
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/geopy/distance.py", line 540, in __init__
super().__init__(*args, **kwargs)
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/geopy/distance.py", line 276, in __init__
kilometers += self.measure(a, b)
^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/geopy/distance.py", line 556, in measure
a, b = Point(a), Point(b)
^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/geopy/point.py", line 175, in __new__
return cls.from_sequence(seq)
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/geopy/point.py", line 472, in from_sequence
return cls(*args)
^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/geopy/point.py", line 188, in __new__
_normalize_coordinates(latitude, longitude, altitude)
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/geopy/point.py", line 63, in _normalize_coordinates
raise ValueError('Point coordinates must be finite. %r has been passed '
ValueError: Point coordinates must be finite. (nan, nan, 0.0) has been passed as coordinates.
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/marthevienne/12_bloom/backend/bloom/tasks/create_update_excursions_segments.py", line 432, in <module>
run()
File "/Users/marthevienne/12_bloom/backend/bloom/tasks/create_update_excursions_segments.py", line 192, in run
df["distance"] = df.apply(get_distance_in_miles, axis=1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/frame.py", line 10374, in apply
return op.apply().finalize(self, method="apply")
^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/apply.py", line 916, in apply
return self.apply_standard()
^^^^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/apply.py", line 1063, in apply_standard
results, res_index = self.apply_series_generator()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/pandas/core/apply.py", line 1081, in apply_series_generator
results[i] = self.func(v, *self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/bloom/tasks/create_update_excursions_segments.py", line 190, in get_distance_in_miles
return distance.distance(p1, p2).miles
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/geopy/distance.py", line 540, in __init__
super().__init__(*args, **kwargs)
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/geopy/distance.py", line 276, in __init__
kilometers += self.measure(a, b)
^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/geopy/distance.py", line 556, in measure
a, b = Point(a), Point(b)
^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/geopy/point.py", line 175, in __new__
return cls.from_sequence(seq)
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/geopy/point.py", line 472, in from_sequence
return cls(*args)
^^^^^^^^^^
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/geopy/point.py", line 188, in __new__
_normalize_coordinates(latitude, longitude, altitude)
File "/Users/marthevienne/12_bloom/backend/.venv/lib/python3.11/site-packages/geopy/point.py", line 63, in _normalize_coordinates
raise ValueError('Point coordinates must be finite. %r has been passed '
ValueError: Point coordinates must be finite. (nan, nan, 0.0) has been passed as coordinates.
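
The `ValueError` above is geopy refusing NaN coordinates. A minimal sketch of guarding the distance computation follows. To stay self-contained it uses a haversine great-circle formula from the standard library instead of `geopy.distance.distance`, so its values differ slightly from geopy's geodesic distance. The function name and the choice to return `None` for unusable rows are assumptions.

```python
import math
from typing import Optional

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius


def safe_distance_in_miles(
    start_lat: Optional[float],
    start_lon: Optional[float],
    end_lat: Optional[float],
    end_lon: Optional[float],
) -> Optional[float]:
    """Great-circle distance in miles, or None if a coordinate is unusable."""
    coords = (start_lat, start_lon, end_lat, end_lon)
    # Reject missing, NaN and infinite values before doing any maths.
    if any(c is None or not math.isfinite(c) for c in coords):
        return None
    phi1, phi2 = math.radians(start_lat), math.radians(end_lat)
    dphi = phi2 - phi1
    dlambda = math.radians(end_lon - start_lon)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # min() protects asin against rounding slightly above 1.0.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


print(safe_distance_in_miles(float("nan"), float("nan"), 48.0, -4.5))
print(safe_distance_in_miles(0.0, 0.0, 0.0, 1.0))
```

Returning `None` only hides the symptom. The rows that produce NaN coordinates still need to be explained.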

@ejamet73
Collaborator

ejamet73 commented Dec 19, 2024

Damn.
I'll look at it tomorrow.

@rv2931
Collaborator

rv2931 commented Dec 22, 2024

@ejamet73 did you get a chance to take a look?

@ejamet73
Collaborator

ejamet73 commented Dec 22, 2024

Yes, but only briefly, and I haven't solved it. The error comes from geopy.distance being passed invalid values:
    if (df.shape[0] > 0):
        # calculate distance
        def get_distance_in_miles(x) -> float:
            p1 = (x.start_latitude, x.start_longitude)
            p2 = (x.end_latitude, x.end_longitude)
            return distance.distance(p1, p2).miles
        df["distance"] = df.apply(get_distance_in_miles, axis=1)

So there are NaN values (I assume) in one of start_latitude, start_longitude, end_longitude or end_latitude.
I'm trying to find out which one it is,
but I would need to know in which case we get the error: a single position for an MMSI in the batch? Do we retrieve a previous position for the vessel, or was there none?
[edit] I think the error occurs when the batch does not contain the MMSI in question but there is a last position for that MMSI in the database. I'm checking.
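
To find out which of the four columns carries the NaN, something like the following could be run on the frame just before the distance computation. It is a sketch: the coordinate column names come from the snippet quoted above, while `vessel_id`, the helper name and the sample values are invented for the example.

```python
import numpy as np
import pandas as pd

COORD_COLUMNS = ["start_latitude", "start_longitude", "end_latitude", "end_longitude"]


def rows_with_invalid_coordinates(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows where at least one coordinate is NaN or infinite."""
    values = df[COORD_COLUMNS].to_numpy(dtype=float)
    all_finite = np.isfinite(values).all(axis=1)
    return df[~all_finite]


# Made-up data: the second row has no start position.
df = pd.DataFrame(
    {
        "vessel_id": [1, 2, 3],
        "start_latitude": [48.0, np.nan, 43.7],
        "start_longitude": [-4.5, np.nan, -7.8],
        "end_latitude": [48.1, 47.2, 43.8],
        "end_longitude": [-4.4, -3.1, -7.9],
    }
)

bad = rows_with_invalid_coordinates(df)
print(bad["vessel_id"].tolist())
# NaN count per coordinate column.
print(df[COORD_COLUMNS].isna().sum().to_dict())
```

Knowing whether the start or the end columns are empty should indicate whether the previous position or the new one is missing.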

@rv2931 rv2931 removed the Done! label Dec 22, 2024
@rv2931
Collaborator

rv2931 commented Dec 22, 2024

I'll take a look tomorrow if I have time. I'm less familiar with this part, but I know the NaN/Inf problem because I've already run into it on the API side. It is strange, though, that it didn't show up before the primary key change.

@ejamet73
Collaborator

The error comes from the case where we receive a single position for an MMSI in a batch and it is the vessel's very first position (last_segment is empty). I'm fixing the code, but it affects a large part of the script, so it's taking a while ^^
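
To make the failing case concrete, here is a small sketch of pairing consecutive positions into segments for one vessel. `Position`, `build_segments` and `last_position` are hypothetical names, not the script's. The point is only that one new position with no previous position yields zero segments, which the downstream code has to tolerate.

```python
from typing import List, NamedTuple, Optional, Tuple


class Position(NamedTuple):
    latitude: float
    longitude: float


def build_segments(
    last_position: Optional[Position],
    new_positions: List[Position],
) -> List[Tuple[Position, Position]]:
    """Pair consecutive positions into (start, end) segments.

    last_position is the vessel's last known position before the batch,
    or None when the vessel has never been seen.
    """
    points = ([last_position] if last_position is not None else []) + list(new_positions)
    # zip stops at the shorter input, so 0 or 1 points give no segment.
    return list(zip(points, points[1:]))


first_ever = Position(48.0, -4.5)
previous = Position(47.9, -4.6)

print(build_segments(None, [first_ever]))      # first position ever: no segment
print(build_segments(previous, [first_ever]))  # one segment from previous to new
```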

@marthevienne
Collaborator Author

Hi!

While digging into the script, I found another error that has a significant impact. I took the whole thing apart and reworked it. The error is this: when we receive positions for a vessel that has no segment in the database, its positions are not taken into account and are not saved (for example, Ortegal dos).

I thought you wouldn't be working on it, so I just started on it myself.

@marthevienne
Collaborator Author

@ejamet73, let's just work on it together. I really reworked everything because the logic of the code was not clear at all, hence these two errors: the one-row df and the segments never saved to the database.

@rv2931
Collaborator

rv2931 commented Dec 23, 2024 via email

@rv2931
Collaborator

rv2931 commented Dec 23, 2024

We could well extend the Segment class with a method along the lines of segment.isValid() that would test whether the two points are identical or null.
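
A rough sketch of what such a method could look like, under assumptions: this `Segment` is a stand-in dataclass, not the project's model, its field names are borrowed from the columns quoted earlier in the thread, and the method is spelled `is_valid` to follow Python naming. Treating identical endpoints as invalid is one reading of the suggestion.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Stand-in for the real Segment class (hypothetical fields)."""

    start_latitude: Optional[float]
    start_longitude: Optional[float]
    end_latitude: Optional[float]
    end_longitude: Optional[float]

    def is_valid(self) -> bool:
        coords = (
            self.start_latitude,
            self.start_longitude,
            self.end_latitude,
            self.end_longitude,
        )
        # Missing, NaN or infinite coordinates make the segment unusable.
        if any(c is None or not math.isfinite(c) for c in coords):
            return False
        # A segment whose two points coincide has no length.
        start = (self.start_latitude, self.start_longitude)
        end = (self.end_latitude, self.end_longitude)
        return start != end


print(Segment(48.0, -4.5, 48.1, -4.4).is_valid())
print(Segment(float("nan"), float("nan"), 48.1, -4.4).is_valid())
print(Segment(48.0, -4.5, 48.0, -4.5).is_valid())
```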

@ejamet73
Collaborator

@ejamet73, let's just work on it together. I really reworked everything because the logic of the code was not clear at all, hence these two errors: the one-row df and the segments never saved to the database.

Where are you working? I created a new PR from the same working branch, 373-fix-checking-of-the-excursion-of-the-last-segment.

@marthevienne
Collaborator Author

For now, I've made a notebook to take the code apart, on a local branch.

@marthevienne
Collaborator Author

We could well extend the Segment class with a method along the lines of segment.isValid() that would test whether the two points are identical or null.

That is indeed the case.

3 participants