AtumPartition.subPartitionContext orders partition keys differently in Scala 2.12 vs Scala 2.13 #261

Closed
kevinwallimann opened this issue Sep 4, 2024 · 3 comments · Fixed by #266

kevinwallimann commented Sep 4, 2024

Describe the bug

When I create a sub-partition, the order of the partition keys differs between Scala 2.12 and 2.13 once they are combined with the parent partition's keys. This is probably due to a change in the behaviour of the ++ operator of ListMap.

It's not clear to me whether this is a bug or whether the ordering of the partition keys is simply not relevant.

To Reproduce

Using AtumAgent v0.2.0, execute the following commands:

    val rawCtx = AtumAgent.getOrCreateAtumContext(AtumPartitions(
      List[(String, String)](
        "source_system" -> "test_system",
        "dataset_name" -> "test_pipeline",
        "info_date" -> "2024-08-30"
      )
    ))
    val publishCtx = rawCtx.subPartitionContext(AtumPartitions(
      List[(String, String)](
        "catalog_path" -> "test_db.test_table",
        "info_date" -> "2024-08-30"
      )
    ))

In Scala 2.12 publishCtx.atumPartitions is

List(
  PartitionDTO("source_system", "test_system"),
  PartitionDTO("dataset_name", "test_pipeline"),
  PartitionDTO("catalog_path", "test_db.test_table"),
  PartitionDTO("info_date", "2024-08-30")
)

However, in Scala 2.13, it is

List(
  PartitionDTO("source_system", "test_system"),
  PartitionDTO("dataset_name", "test_pipeline"),
  PartitionDTO("info_date", "2024-08-30"),
  PartitionDTO("catalog_path", "test_db.test_table")
)

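The cause seems to be that the ++ operator of ListMap behaves differently in Scala 2.12 and 2.13 when a key already exists in the left-hand map: 2.12 moves that key to the end, whereas 2.13 keeps it in its original position.

The following standalone example shows the same effect: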

import scala.collection.immutable.ListMap

val countryCapitals1: ListMap[String, String] = ListMap(
  "Canada" -> "Ottawa",
  "India" -> "Delhi"
)

val countryCapitals2: ListMap[String, String] = ListMap(
  "Cuba" -> "Havana",
  "India" -> "New Delhi"
)

// "India" is present in both maps: the right-hand value wins,
// but the position of the key in the result depends on the Scala version.
val combinedCountryCapitals = countryCapitals1 ++ countryCapitals2

println("combinedCountryCapitals:")
combinedCountryCapitals.foreach { case (country, capital) =>
  println(s"country: $country, capital: $capital")
}

When executed in Scala 2.12.19, the output is

combinedCountryCapitals:
country: Canada, capital: Ottawa
country: Cuba, capital: Havana
country: India, capital: New Delhi

However, in 2.13.14, it is

combinedCountryCapitals:
country: Canada, capital: Ottawa
country: India, capital: New Delhi
country: Cuba, capital: Havana
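
For a quicker check than reading the printed lines, the key order can be extracted as a list and printed next to the running Scala version. This is only a sketch: the names `capitals1`, `capitals2` and `keyOrder` are illustrative, and the orders mentioned in the comment are the ones reported above, not re-verified here.

```scala
import scala.collection.immutable.ListMap
import scala.util.Properties

val capitals1 = ListMap("Canada" -> "Ottawa", "India" -> "Delhi")
val capitals2 = ListMap("Cuba" -> "Havana", "India" -> "New Delhi")

val combined = capitals1 ++ capitals2
val keyOrder: List[String] = combined.keys.toList

// Reported above: Canada, Cuba, India on 2.12.19 and Canada, India, Cuba on 2.13.14.
println(s"Scala ${Properties.versionNumberString}: ${keyOrder.mkString(", ")}")
```

The value for "India" is "New Delhi" on both versions; only the position of the key changes.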

Expected behavior

The order should be the same across Scala 2.12 and 2.13 (unless the ordering of partition keys is no longer relevant).
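
One way to get the same order on both versions is to avoid relying on ++ for keys that overlap and to build the merged map explicitly. The sketch below assumes the desired rule is "parent keys keep their position, new child keys are appended"; that rule is an assumption, and `mergeKeepingParentOrder` is a hypothetical helper, not part of Atum and not necessarily what the eventual fix does.

```scala
import scala.collection.immutable.ListMap

// Hypothetical helper (not part of Atum): merges `child` into `parent` so that
// keys already present in `parent` keep their position (values from `child` win)
// and keys only present in `child` are appended in `child` order.
def mergeKeepingParentOrder[K, V](parent: ListMap[K, V], child: ListMap[K, V]): ListMap[K, V] = {
  val updatedParent = parent.toList.map { case (k, v) => k -> child.getOrElse(k, v) }
  val newEntries = child.toList.filterNot { case (k, _) => parent.contains(k) }
  // All keys are distinct at this point, so the result does not depend on
  // how ListMap treats updates of existing keys.
  ListMap(updatedParent ++ newEntries: _*)
}

val parent = ListMap(
  "source_system" -> "test_system",
  "dataset_name" -> "test_pipeline",
  "info_date" -> "2024-08-30"
)
val child = ListMap(
  "catalog_path" -> "test_db.test_table",
  "info_date" -> "2024-08-30"
)

val merged = mergeKeepingParentOrder(parent, child)
println(merged.keys.mkString(", "))
// source_system, dataset_name, info_date, catalog_path
```

If the opposite rule is wanted (overlapping keys move to the end), the same approach works by first dropping the overlapping keys from `parent` and then appending all of `child`.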

kevinwallimann added the bug label Sep 4, 2024
Contributor

benedeki commented Sep 4, 2024

That's very interesting and a little disturbing. We will check it out and find a solution. Almost a topic for CQC.

Collaborator

lsulak commented Sep 4, 2024

I found this ticket in Scala: scala/bug#11719

salamonpavel self-assigned this Sep 9, 2024
github-project-automation bot moved this from 🆕 To groom to ✅ Done in CPS small repos project Sep 10, 2024

Release notes:

  • Fixes sub partitions context creation
