Commit ea2d549

Fixed floorDiv(long,int) to floorDiv(long,long) for Java8 compatibility.
1 parent 6c3119e commit ea2d549
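
The file containing the `floorDiv` change is not shown in this excerpt, so the following is only a sketch of the kind of fix the commit message describes. The `Math.floorDiv(long, int)` overload was added in Java 9; code compiled on a newer JDK without `--release 8` can bind to it and then fail with `NoSuchMethodError` on a Java 8 runtime. Widening the divisor to `long` makes the call bind to `Math.floorDiv(long, long)`, which exists in Java 8. The class and method names below are hypothetical.

```java
public class FloorDivCompat {

    // Hypothetical helper, not taken from the commit.
    // The explicit (long) cast forces overload resolution to pick
    // Math.floorDiv(long, long), which is available since Java 8.
    static long bucketOf(long value, int bucketSize) {
        return Math.floorDiv(value, (long) bucketSize);
    }

    public static void main(String[] args) {
        // floorDiv rounds toward negative infinity, unlike the '/' operator.
        System.out.println(bucketOf(7L, 2));
        System.out.println(bucketOf(-7L, 2));
    }
}
```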

27 files changed

+165
-59
lines changed

Writerside/topics/Warehouse-Plans.md

Lines changed: 111 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,6 @@
11
# Warehouse Plans
22

3-
Warehouse Plans are the way you can control how 'each' Database is translated between clusters or storage environments.
4-
Warehouse plans allow you to control where the data will be translated to.
3+
Warehouse Plans in "hms-mirror" are a database-level configuration mechanism designed to manage and map storage locations within a Hive database during metadata migration. They involve reviewing the database metadata to identify all storage locations (e.g., table and partition locations for both External and Managed tables) and mapping these to predefined Warehouse Plan locations. The intersection of this metadata and the Warehouse Plan is then used to construct **Global Location Maps**, which "hms-mirror" uses internally to provide a consistent mapping between storage systems across clusters or within a single cluster.
54

65
There are three types of 'Warehouse Plans'.
76

@@ -30,3 +29,113 @@ When you choose to use non-standard locations for 'partition specs', we can't bu
3029
we will throw an 'error' for the offending table and describe the imbalance. You can either fix/adjust the dataset OR choose
3130
to use the `SQL` data movement strategy.
3231

32+
## Warehouse Plans Explained
33+
34+
### What Are Warehouse Plans For?
35+
36+
- **Purpose**: At the database level, Warehouse Plans define how storage locations in the source database metadata are translated to target locations. This process ensures that the migrated metadata aligns with the desired storage structure on the target system. Unlike the earlier [`globalLocationMap`](Release-Notes.md#global-location-maps), which offered a simpler, global key-value mapping, Warehouse Plans operate specifically at the database scope, providing a more targeted and detailed approach.
37+
- [**Database Warehouse Plans**](Database-Warehouse-Plans.md) : These are the core of the feature, specifying location mappings for a single database. The documentation implies that ["Global Warehouse Plans"](Global-Warehouse-Plans.md) may extend this concept across multiple databases, but the primary focus is on individual database-level planning.
38+
39+
The resulting Global Location Maps, derived from Warehouse Plans, serve as an internal framework for "hms-mirror" to handle location translations during migration, ensuring consistency between source and target storage systems.
40+
41+
### What to Use Warehouse Plans For?
42+
43+
Warehouse Plans are primarily designed for scenarios where storage locations within a database need to be systematically mapped or reorganized during a migration. Here’s an explanation of their use cases, reflecting their database-level scope and original intent for `STORAGE_MIGRATION`, as well as their broader applicability:
44+
45+
1. **Primary Use Case: Storage Migration Within a Cluster (STORAGE_MIGRATION)**:
46+
- **Scenario**: You’re migrating the storage layer behind a database’s metadata from HDFS to Ozone (or another system like an encrypted zone), as highlighted in the [`STORAGE_MIGRATION` strategy](storage_migration.md).
47+
- **Use**: Define a Warehouse Plan for the database to map all existing locations (e.g., `/apps/hive/warehouse/my_db`) to a new target (e.g., `ofs://ozone1/vol1/bucket1/my_db`). The metadata is reviewed, and all table and partition locations are updated accordingly. The resulting Global Location Maps ensure the migration reflects the new storage system.
48+
- **Example**: For a database `finance_db`:
49+
```yaml
50+
databaseWarehousePlans:
51+
finance_db:
52+
EXTERNAL_TABLE: "/finance_db/ext"
53+
MANAGED_TABLE: "/finance_db/managed"
54+
```
55+
This maps all External and Managed table locations within `finance_db` to the specified Ozone paths.
56+
- The Ozone Namespace is pulled from the `targetNamespace` which can be set in the config or via the Web UI.
57+
```yaml
58+
transfer:
59+
targetNamespace: ofs://myOzone
60+
```
61+
62+
2. **Reorganizing Storage During Schema Migration (SCHEMA_ONLY)**:
63+
- **Scenario**: You’re migrating a database’s schema between clusters (e.g., on-premises to cloud) using `SCHEMA_ONLY`, and you want to reorganize storage locations simultaneously.
64+
- **Use**: A Warehouse Plan can redefine the database’s storage locations (e.g., from `/data/my_db` to `s3a://mybucket/my_db`). Since `SCHEMA_ONLY` doesn’t move data, pair it with the [`-dc|--distcp` option](cli-options.md) to generate `distcp` plans for separately migrating the data to the new locations.
65+
- **Example**:
66+
```yaml
67+
databaseWarehousePlans:
68+
my_db:
69+
EXTERNAL_TABLE: "/my_db/ext"
70+
MANAGED_TABLE: "/my_db/managed"
71+
```
72+
Run: `hms-mirror -d SCHEMA_ONLY -db my_db -dc` to get the schema migration and a `distcp` plan. As before, set `transfer.targetNamespace` to the `s3` location, e.g., `s3a://mybucket`.
73+
74+
3. **Migrating and Moving Data with SQL Strategy (SQL)**:
75+
- **Scenario**: You’re using the [`SQL` data strategy](SQL.md) to migrate both metadata and data between linked clusters, and you need to adjust storage locations.
76+
- **Use**: Define a Warehouse Plan to map the database’s locations to the target cluster’s storage (e.g., from `hdfs://source/my_db` to `hdfs://target/my_db`). The SQL strategy will move the data to the new locations as part of the migration.
77+
- **Example**:
78+
```yaml
79+
databaseWarehousePlans:
80+
my_db:
81+
EXTERNAL_TABLE: "/my_db/ext"
82+
MANAGED_TABLE: "/my_db/managed"
83+
```
84+
Run: `hms-mirror -d SQL -db my_db`.
85+
86+
4. **Supporting Data Strategies Without Data Movement**:
87+
- **Scenario**: You’re using a strategy like [`SCHEMA_ONLY`](SCHEMA_ONLY.md) that doesn’t move data, but you need a plan to migrate the data later.
88+
- **Use**: Warehouse Plans map the database’s locations, and the `-dc|--distcp` option generates `distcp` scripts to handle the data migration separately, aligning with the mapped locations.
89+
- **Example**: For `SCHEMA_ONLY`:
90+
```bash
91+
hms-mirror -d SCHEMA_ONLY -db my_db -dc -wps my_db=/my_db/ext:/my_db/managed
92+
```
93+
The Warehouse Plan ensures the dump reflects the intended target locations, and `distcp` plans are provided.
94+
95+
5. **Consistency Within a Database Across Table Types**:
96+
- **Scenario**: A database contains both External and Managed tables with various locations, and you want a unified storage structure on the target.
97+
- **Use**: A Warehouse Plan reviews all locations in the database metadata and maps them to consistent target locations, regardless of table type. This is more precise than global mappings, as it’s tailored to the database.
98+
- **Example**: Mapping all tables in `sales_db` to a single storage layer:
99+
```yaml
100+
databaseWarehousePlans:
101+
sales_db:
102+
EXTERNAL_TABLE: "/new_storage/division_ext"
103+
MANAGED_TABLE: "/new_storage/division_mngd"
104+
```
105+
106+
### How to Use Warehouse Plans
107+
108+
To implement Warehouse Plans, configure them in the `default.yaml` file under the `databaseWarehousePlans` section at the database level.
109+
110+
Here’s how:
111+
112+
- **Syntax**:
113+
```yaml
114+
databaseWarehousePlans:
115+
<database_name>:
116+
EXTERNAL_TABLE: "<target_location_for_external>"
117+
MANAGED_TABLE: "<target_location_for_managed>"
118+
```
119+
- The metadata for `<database_name>` is analyzed, and all locations (for External and Managed tables) are mapped to the specified targets. These mappings feed into the Global Location Maps used internally.
120+
- NOTE: The database name is appended to the location you specify, so do NOT include it in the location. This lets multiple databases share the same root location.
121+
122+
- **Command-Line Integration**: While Warehouse Plans are configuration-driven, they work with strategies like `STORAGE_MIGRATION`, `SCHEMA_ONLY`, or `SQL`. Use `-dc|--distcp` for strategies without data movement.
123+
- The Warehouse Plans can be set in the config, via the Web UI, or via the commandline option `-wps <db=ext-dir:mngd-dir[,db=ext-dir:mngd-dir]...>`
124+
125+
```
126+
hms-mirror -d STORAGE_MIGRATION -db my_db -dc -wps my_db=/my_db/ext:/my_db/managed
127+
```
128+
or
129+
```
130+
hms-mirror -d SCHEMA_ONLY -db my_db -dc
131+
```
132+
133+
- **Execution**: The Warehouse Plan drives the location translation process. For `STORAGE_MIGRATION`, data is moved directly (if using `DISTCP` or `SQL` as the data movement strategy). For other strategies, `-dc` ensures data migration plans are provided.
134+
135+
### Practical Tips
136+
- **Database Scope**: Define Warehouse Plans per database to ensure precise mapping. Unlike the older `globalLocationMap`, they don’t apply globally unless explicitly extended via "Global Warehouse Plans."
137+
- **Dry-Run**: Test with a dry-run (`hms-mirror -db <db_name>`) to review the mappings in the output reports before executing (`-e`).
138+
- **Storage Access**: Verify permissions for the mapped locations, especially for cross-cluster scenarios (see "Linking Cluster Storage Layers").
139+
- **Original Intent**: While designed for `STORAGE_MIGRATION` (e.g., HDFS to Ozone), their flexibility supports broader reorganization tasks.
140+
141+
In summary, Warehouse Plans are a database-level tool in "hms-mirror" for mapping storage locations within a database’s metadata, originally for `STORAGE_MIGRATION` (e.g., HDFS to Ozone), but also valuable for `SCHEMA_ONLY` and `SQL` strategies. They construct Global Location Maps internally, ensuring accurate location translations, and can be paired with `-dc|--distcp` for separate data movement when needed.
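
To make the `-wps <db=ext-dir:mngd-dir[,db=ext-dir:mngd-dir]...>` format concrete, here is a minimal Java sketch of how such a value could be split into per-database plans. It is not the parser from "hms-mirror": the class `WarehousePlanSpec`, the `parse` method, and the error handling are assumptions made for illustration. Only the splitting on `,` and `=` mirrors what the `configWarehousePlans` hunk in this commit shows.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WarehousePlanSpec {

    /**
     * Parses "db=ext-dir:mngd-dir[,db=ext-dir:mngd-dir]..." into a map of
     * database name -> {externalDir, managedDir}. Hypothetical helper.
     */
    static Map<String, String[]> parse(String value) {
        Map<String, String[]> plans = new LinkedHashMap<>();
        // Each comma-separated entry describes one database.
        for (String plan : value.split(",")) {
            String[] planParts = plan.split("=");
            if (planParts.length != 2) {
                throw new IllegalArgumentException(
                        "Expected db=ext-dir:mngd-dir but got: " + plan);
            }
            // The directories are separated by ':' (external first, then managed).
            String[] dirs = planParts[1].split(":");
            if (dirs.length != 2) {
                throw new IllegalArgumentException(
                        "Expected ext-dir:mngd-dir but got: " + planParts[1]);
            }
            plans.put(planParts[0].trim(),
                    new String[]{dirs[0].trim(), dirs[1].trim()});
        }
        return plans;
    }

    public static void main(String[] args) {
        Map<String, String[]> plans = parse("my_db=/my_db/ext:/my_db/managed");
        for (Map.Entry<String, String[]> e : plans.entrySet()) {
            System.out.println(e.getKey() + " external=" + e.getValue()[0]
                    + " managed=" + e.getValue()[1]);
        }
    }
}
```

Because `:` separates the two directories, this sketch only accepts paths without a scheme. That matches the text above, where the namespace (for example `ofs://myOzone`) comes from `transfer.targetNamespace` rather than from the plan itself.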

Writerside/topics/hms-mirror-Default-Configuration-Template.md

Lines changed: 8 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ Use this as a template for the `default.yaml` configuration file used by the `cl
44

55

66
```yaml
7-
# Copyright 2024 Cloudera, Inc. All Rights Reserved.
7+
# Copyright 2021 Cloudera, Inc. All Rights Reserved.
88
#
99
# Licensed under the Apache License, Version 2.0 (the "License");
1010
# you may not use this file except in compliance with the License.
@@ -29,7 +29,9 @@ transfer:
2929
clusters:
3030
LEFT:
3131
# Set for Hive 1/2 environments
32-
legacyHive: true
32+
# Remove, see 'platformType' legacyHive: true
33+
# platform type
34+
platformType: 'HDP2'
3335
# Is the 'Hadoop COMPATIBLE File System' used to prefix data locations for this cluster.
3436
# It is mainly used as the transfer location for metadata (export)
3537
# If the primary storage for this cluster is 'hdfs' than use 'hdfs://...'
@@ -56,15 +58,16 @@ clusters:
5658
# This will require the user to install the jdbc driver for the metastoreDirect in $HOME/.hms-mirror/aux_libs
5759
metastore_direct:
5860
uri: "<jdbc_url_to_metastore_db_including_db>"
59-
type: MYSQL|POSTGRES|ORACLE
61+
type: MYSQL|POSTGRES|ORACLE
6062
connectionProperties:
6163
user: "<db_user>"
6264
password: "<db_password>"
6365
connectionPool:
6466
min: 3
6567
max: 5
6668
RIGHT:
67-
legacyHive: false
69+
# Removed, see platformType. ...legacyHive: false
70+
platformType: 'CDP7_1'
6871
# Is the 'Hadoop COMPATIBLE File System' used to prefix data locations for this cluster.
6972
# It is mainly used to as a baseline for where "DATA" will be transfered in the
7073
# STORAGE stage. The data location in the source location will be move to this
@@ -93,4 +96,5 @@ clusters:
9396
auto: true
9497
# When a table is created, run MSCK when there are partitions.
9598
initMSCK: true
99+
96100
```

pom.xml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -28,7 +28,7 @@
2828

2929
<groupId>com.cloudera.utils.hadoop</groupId>
3030
<artifactId>hms-mirror</artifactId>
31-
<version>2.3.1.4</version>
31+
<version>2.3.1.5</version>
3232
<packaging>jar</packaging>
3333

3434
<name>hms-mirror</name>

src/main/java/com/cloudera/utils/hms/mirror/cli/HmsMirrorCommandLineOptions.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1487,7 +1487,7 @@ CommandLineRunner configWarehousePlans(HmsMirrorConfig config, @Value("${hms-mir
14871487
log.info("warehouse-plan: {}", value);
14881488
String[] warehouseplan = value.split(",");
14891489

1490-
if (nonNull(warehouseplan) && warehouseplan.length > 0) {
1490+
if (nonNull(warehouseplan)) {
14911491
// for each plan entry, split on '=' for db=ext_dir:mngd_dir
14921492
for (String plan : warehouseplan) {
14931493
String[] planParts = plan.split("=");
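
The removed `warehouseplan.length > 0` check appears to be redundant, and the small sketch below shows why. `String.split` never returns `null`, and when it returns an empty array the enhanced `for` loop simply runs zero times. The class and method names are hypothetical; only the `split(",")` call mirrors the hunk above.

```java
public class SplitBehavior {

    // Counts how many loop iterations a given -wps style value would produce.
    static int planCount(String value) {
        String[] warehouseplan = value.split(",");
        int iterations = 0;
        for (String plan : warehouseplan) {
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        // Two comma-separated entries give two iterations.
        System.out.println(planCount("db1=/ext:/mngd,db2=/ext:/mngd"));
        // Only separators: trailing empty strings are dropped, so the array is empty.
        System.out.println(planCount(","));
        // No separator at all: the array holds the original (empty) string.
        System.out.println(planCount(""));
    }
}
```

Note that an empty input still yields one empty entry, so guarding against blank plan strings has to happen inside the loop, not through the array length.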

src/main/java/com/cloudera/utils/hms/mirror/connections/ConnectionState.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,5 +22,5 @@ public enum ConnectionState {
2222
INITIALIZED,
2323
CONNECTED,
2424
DISCONNECTED,
25-
ERROR;
25+
ERROR
2626
}

src/main/java/com/cloudera/utils/hms/mirror/domain/Messages.java

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -69,11 +69,11 @@ public List<String> getMessages() {
6969
List<String> messageList = new ArrayList<String>();
7070
for (MessageCode messageCode : MessageCode.getCodes(bitSet)) {
7171
if (!argMap.containsKey(messageCode.ordinal())) {
72-
messageList.add(messageCode.toString() + "-->" + messageCode.getDesc());
72+
messageList.add(messageCode + "-->" + messageCode.getDesc());
7373
} else {
7474
Object[] vMap = argMap.get(messageCode.ordinal());
7575
String m = MessageFormat.format(messageCode.getDesc(), vMap);
76-
messageList.add(messageCode.toString() + "-->" + m);
76+
messageList.add(messageCode + "-->" + m);
7777
}
7878
}
7979
// String[] rtn = messageList.toArray(new String[0]);

src/main/java/com/cloudera/utils/hms/mirror/domain/WarehouseMapBuilder.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -180,7 +180,7 @@ protected Object clone() throws CloneNotSupportedException {
180180

181181
Map<String, Warehouse> newWarehousePlan = new HashMap<>();
182182
for (Map.Entry<String, Warehouse> entry : warehousePlans.entrySet()) {
183-
newWarehousePlan.put(entry.getKey(), (Warehouse) entry.getValue().clone());
183+
newWarehousePlan.put(entry.getKey(), entry.getValue().clone());
184184
}
185185
clone.setWarehousePlans(newWarehousePlan);
186186

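
Dropping the `(Warehouse)` cast only compiles if `Warehouse.clone()` is declared with a covariant return type. The `Warehouse` class is not part of this excerpt, so the version below is a hypothetical stand-in with assumed field names; it sketches the pattern rather than the project's actual class.

```java
import java.util.HashMap;
import java.util.Map;

public class CovariantCloneSketch {

    // Hypothetical stand-in for the project's Warehouse class.
    static class Warehouse implements Cloneable {
        final String externalDirectory;
        final String managedDirectory;

        Warehouse(String externalDirectory, String managedDirectory) {
            this.externalDirectory = externalDirectory;
            this.managedDirectory = managedDirectory;
        }

        // Covariant return type: callers get a Warehouse without casting.
        @Override
        public Warehouse clone() {
            try {
                return (Warehouse) super.clone();
            } catch (CloneNotSupportedException e) {
                // Cannot happen because this class implements Cloneable.
                throw new AssertionError(e);
            }
        }
    }

    // Copies the map and clones every value, as in the hunk above.
    static Map<String, Warehouse> deepCopy(Map<String, Warehouse> warehousePlans) {
        Map<String, Warehouse> newWarehousePlan = new HashMap<>();
        for (Map.Entry<String, Warehouse> entry : warehousePlans.entrySet()) {
            newWarehousePlan.put(entry.getKey(), entry.getValue().clone());
        }
        return newWarehousePlan;
    }

    public static void main(String[] args) {
        Map<String, Warehouse> plans = new HashMap<>();
        plans.put("my_db", new Warehouse("/my_db/ext", "/my_db/managed"));
        Map<String, Warehouse> copy = deepCopy(plans);
        // The copy holds a distinct object with the same field values.
        System.out.println(copy.get("my_db") != plans.get("my_db"));
    }
}
```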
src/main/java/com/cloudera/utils/hms/mirror/domain/support/CollectionEnum.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,5 +22,5 @@ public enum CollectionEnum {
2222
IN_PROGRESS,
2323
COMPLETED,
2424
ERRORED,
25-
SKIPPED;
25+
SKIPPED
2626
}

src/main/java/com/cloudera/utils/hms/mirror/domain/support/ConfigSection.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,5 +24,5 @@ public enum ConfigSection {
2424
LEFT_CLUSTER,
2525
RIGHT_CLUSTER,
2626
TRANSLATOR,
27-
TRANSFER;
27+
TRANSFER
2828
}

src/main/java/com/cloudera/utils/hms/mirror/domain/support/ConnectionStatus.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,5 +24,5 @@ public enum ConnectionStatus {
2424
CHECK_CONFIGURATION,
2525
SKIPPED,
2626
DISABLED,
27-
UNKNOWN;
27+
UNKNOWN
2828
}
