Read throughput of table is not getting set correctly in 4.16.0 version #158

Open
ganeshashree opened this issue Dec 4, 2021 · 0 comments

Comments

@ganeshashree

When Hive reads from a DynamoDB-backed Hive table through DynamoDBStorageHandler, the read throughput is set to null even though ReadCapacityUnits is set in the table's ProvisionedThroughput. As a result, the number of mappers is calculated incorrectly during split generation. I found this issue in version 4.16.0 of the DynamoDB connector; it does not occur in version 4.9.0.

The following are the relevant Hive logs:

2021-11-25T18:04:12,287 INFO  [f2f33a58-71b4-4bd0-b0e5-6a38d5fe410c main([])]: dynamodb.DynamoDBClient (DynamoDBClient.java:call(136)) - Describe table output: {Table: {AttributeDefinitions: [{AttributeName: id,AttributeType: N}, {AttributeName: version,AttributeType: N}],TableName: volume,KeySchema: [{AttributeName: id,KeyType: HASH}, {AttributeName: version,KeyType: RANGE}],TableStatus: ACTIVE,CreationDateTime: *** ProvisionedThroughput: {LastIncreaseDateTime: ***,NumberOfDecreasesToday: 0,ReadCapacityUnits: 31104,WriteCapacityUnits: 960},TableSizeBytes: 149776006681,ItemCount: 140893823,TableArn: ****,TableId: ******,}}
2021-11-25T18:04:12,287 INFO  [f2f33a58-71b4-4bd0-b0e5-6a38d5fe410c main([])]: dynamodb.DynamoDBStorageHandler (DynamoDBStorageHandler.java:configureTableJobProperties(127)) - Average item size: 1063.04168267902
2021-11-25T18:04:12,287 INFO  [f2f33a58-71b4-4bd0-b0e5-6a38d5fe410c main([])]: dynamodb.DynamoDBStorageHandler (DynamoDBStorageHandler.java:configureTableJobProperties(203)) - Average item size: 1063.04168267902
2021-11-25T18:04:12,287 INFO  [f2f33a58-71b4-4bd0-b0e5-6a38d5fe410c main([])]: dynamodb.DynamoDBStorageHandler (DynamoDBStorageHandler.java:configureTableJobProperties(204)) - Item count: 140893823
2021-11-25T18:04:12,287 INFO  [f2f33a58-71b4-4bd0-b0e5-6a38d5fe410c main([])]: dynamodb.DynamoDBStorageHandler (DynamoDBStorageHandler.java:configureTableJobProperties(205)) - Table size: 149776006681
2021-11-25T18:04:12,287 INFO  [f2f33a58-71b4-4bd0-b0e5-6a38d5fe410c main([])]: dynamodb.DynamoDBStorageHandler (DynamoDBStorageHandler.java:configureTableJobProperties(206)) - Read throughput: null
2021-11-25T18:04:12,287 INFO  [f2f33a58-71b4-4bd0-b0e5-6a38d5fe410c main([])]: dynamodb.DynamoDBStorageHandler (DynamoDBStorageHandler.java:configureTableJobProperties(207)) - Write throughput: null
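
To make the mismatch in these logs easy to check on other runs, here is a minimal diagnostic sketch in Python. It is not connector code: the function names are hypothetical, and it only assumes the two log formats quoted above (a `ReadCapacityUnits: <n>` field in the Describe table output and a `Read throughput: <value>` line from DynamoDBStorageHandler).

```python
import re


def read_capacity_from_describe(line):
    """Return ReadCapacityUnits from a 'Describe table output' log line, or None."""
    match = re.search(r"ReadCapacityUnits:\s*(\d+)", line)
    return int(match.group(1)) if match else None


def logged_read_throughput(line):
    """Return the value logged after 'Read throughput:', or None if it is 'null'."""
    match = re.search(r"Read throughput:\s*(\S+)", line)
    if match is None or match.group(1) == "null":
        return None
    return float(match.group(1))


def throughput_was_dropped(describe_line, handler_line):
    """True when DescribeTable reported a capacity but the handler logged null."""
    return (read_capacity_from_describe(describe_line) is not None
            and logged_read_throughput(handler_line) is None)


# Abbreviated copies of the two log lines quoted above.
describe_line = ("Describe table output: {Table: {ProvisionedThroughput: "
                 "{NumberOfDecreasesToday: 0,ReadCapacityUnits: 31104,"
                 "WriteCapacityUnits: 960},TableSizeBytes: 149776006681}}")
handler_line = ("(DynamoDBStorageHandler.java:configureTableJobProperties(206)) "
                "- Read throughput: null")

print(read_capacity_from_describe(describe_line))           # 31104
print(throughput_was_dropped(describe_line, handler_line))  # True
```

For the logs in this report the check returns True: DynamoDB reports 31104 read capacity units, but the storage handler logs null.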

Split generation log:

2021-11-25 18:04:13,244 [INFO] [InputInitializer {Map 1} #0] |read.AbstractDynamoDBInputFormat|: Read percentage: 0.2
2021-11-25 18:04:13,809 [INFO] [InputInitializer {Map 1} #0] |read.AbstractDynamoDBInputFormat|: Would use 139 segments for size
2021-11-25 18:04:13,809 [INFO] [InputInitializer {Map 1} #0] |read.AbstractDynamoDBInputFormat|: Would use 0 segments for throughput
2021-11-25 18:04:13,809 [INFO] [InputInitializer {Map 1} #0] |read.AbstractDynamoDBInputFormat|: Using computed number of segments: 139
2021-11-25 18:04:13,812 [INFO] [InputInitializer {Map 1} #0] |read.AbstractDynamoDBInputFormat|: Max number of cluster map tasks: 62
2021-11-25 18:04:13,812 [INFO] [InputInitializer {Map 1} #0] |read.AbstractDynamoDBInputFormat|: Configured read throughput: 1
2021-11-25 18:04:13,812 [INFO] [InputInitializer {Map 1} #0] |read.AbstractDynamoDBInputFormat|: Calculated to use 1 mappers
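
The numbers in this log can be reproduced with a rough arithmetic model, sketched below in Python. Both constants are assumptions and are not taken from the connector source: 1 GiB per segment is chosen only because it reproduces "139 segments for size" for the logged table size, and 100 read units per segment is a hypothetical minimum used to illustrate how a very small throughput value rounds down to 0 segments.

```python
# Assumed constants, NOT taken from the connector source.
ASSUMED_BYTES_PER_SEGMENT = 1024 ** 3      # 1 GiB; reproduces "139 segments for size"
ASSUMED_MIN_READ_UNITS_PER_SEGMENT = 100   # hypothetical per-segment minimum


def segments_for_size(table_size_bytes):
    """Size-based segment count under the assumed bytes-per-segment value."""
    return table_size_bytes // ASSUMED_BYTES_PER_SEGMENT


def segments_for_throughput(read_units):
    """Throughput-based segment count under the assumed per-segment minimum."""
    return int(read_units) // ASSUMED_MIN_READ_UNITS_PER_SEGMENT


table_size_bytes = 149776006681  # "Table size" from the Hive log

print(segments_for_size(table_size_bytes))  # 139
print(segments_for_throughput(1))           # 0
print(segments_for_throughput(31104))       # 311
```

Under these assumptions, any read throughput below the per-segment minimum, such as the "Configured read throughput: 1" logged above, gives 0 segments for throughput, while the table's actual 31104 read capacity units would give a nonzero count.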

I suspect this commit, added in version 4.11.0, introduced the problem, but I don't have much context on the change, so I need some help fixing this issue.
