AWS Billing partitionSpec mismatch #1002
The CRD validation was added much later, so it's entirely possible we messed that up. We also haven't had access to an AWS account with billing data in it until recently, and we've been hitting issues with that account (unrelated to any of this, though), so it's been hard for us to test this lately.

The thing to know is that columns and partition columns are separate in Hive, and you can't use regular columns as partition columns. So when you define the partitions, it's like defining another column, except it's a partition column, hence the separate partition field.

It would be easier to debug if you could get the hive metastore and hive server logs (just the metastore and hiveserver2 container logs is fine). That said, the issues you've pointed out look like the most likely candidates for some of these problems. If I built a custom reporting-operator image, would you be able to try it out to see if it resolves the issue for you? As I mentioned, our AWS billing reports are giving us issues in other ways (preventing us from even scanning the bucket for reports) due to how they were set up, so I've got no real way to test this right now.
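For illustration (a generic Hive example, not the project's actual schema), partition columns live only in the PARTITIONED BY clause and must not repeat in the regular column list:

```sql
-- Data columns and partition columns are disjoint sets: a column may
-- appear in one list or the other, never both.
CREATE TABLE billing_data (
  line_item_id string,
  cost double
)
PARTITIONED BY (
  billing_period_start string,
  billing_period_end string
);
```

---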
I've nearly made this work, with the following changes:

1. PartitionSpec field names need to match the partition column names (billing_period_start / billing_period_end rather than the hardcoded start / end).
2. The HiveTable CRD needs to accept an object/map, not a string array, for partitionSpec; this means changing hive.crd.yaml (see the CRD sketch further down the thread).
3. Literals need to be escaped with single quotes, not backticks (pkg/operator/reporting/db.go, line 73).
4. LOCATION needs to be quoted (db.go, line 55; see the quoting sketch after this list).
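For changes 3 and 4, the relevant rule is Hive's quoting: backticks quote identifiers, while string literals, including the LOCATION path, take single quotes. A hypothetical Go helper (not the actual db.go code) showing the intended output:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// partitionClause builds a PARTITION (...) LOCATION '...' fragment using
// Hive's quoting rules: backticks around column identifiers, single quotes
// around values and the location path.
func partitionClause(spec map[string]string, location string) string {
	cols := make([]string, 0, len(spec))
	for col := range spec {
		cols = append(cols, col)
	}
	sort.Strings(cols) // deterministic output for the sketch

	pairs := make([]string, 0, len(cols))
	for _, col := range cols {
		pairs = append(pairs, fmt.Sprintf("`%s`='%s'", col, spec[col]))
	}
	return fmt.Sprintf("PARTITION (%s) LOCATION '%s'",
		strings.Join(pairs, ", "), location)
}

func main() {
	fmt.Println(partitionClause(
		map[string]string{
			"billing_period_start": "20191101",
			"billing_period_end":   "20191201",
		},
		"s3a://my-bucket/report/20191101-20191201/",
	))
}
```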
With all of those changes, the partition statement looks plausible. It still fails, though, with this:
---
Yep, these are the same changes I was looking at, minus the quote changes, but those seem necessary too, as we used the wrong quotes. That last one is probably caused by #993, which looks like it's the other issue you filed. I may try to combine these PRs and bugs to make it easier to fix.

---
Yeah, I'm happy to help test with a custom image. You'll need to get the SDK updated like I have in my private fork; that's the only change I'm running with in that image other than the ones I just mentioned.

---
@SapientGuardian I'll make a separate PR for just that, which we can get merged sooner rather than later, since I actually need it for the other bug I was hitting that I mentioned.

---
PR #1004 is to update the SDK.

---
I have a reporting-operator image which contains the SDK update, datasource + query name fixes, and partitionSpec handling fixes. The image is the one tagged fix-aws-datasources1.

---
I also found another environment with billing data (not for the clusters I'm using, but it should work for testing these fixes), so I'm now working on verifying this myself as well.

---
Oh, you'll also need to run the modified metering-ansible-operator, since it has the updates to the templates. Follow https://github.com/operator-framework/operator-metering/blob/master/Documentation/manual-install.md#install-with-a-custom-metering-operator-image for how to install with a custom metering-operator image, and use the metering-ansible-operator image tagged fix-aws-datasources1.

---
Well, I found a panic as part of this. Making progress, though.

---
I fired up the fix-aws-datasources1 reporting-operator image. As expected (since I made the same changes), it behaves just like mine does, getting far enough to fail on creating the partition with the Table not found datasource_metering_aws_billing error.

I haven't tested your ansible image yet, though I'm pretty sure I have all the same changes in mine. Assuming it has all of the active PRs merged in (does it?), the one necessary change I have in my fork that I haven't seen a PR for is setting fsGroup in reporting-operator-deployment.yaml so the non-root user can read the projected token:
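The change is small; in sketch form (the GID value here is illustrative):

```yaml
# reporting-operator-deployment.yaml (excerpt): a pod-level fsGroup makes
# projected volumes, including the service account token, group-readable
# by the non-root container user.
spec:
  template:
    spec:
      securityContext:
        fsGroup: 3001  # illustrative GID
```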
Is the panic you ran into the one about dbName being empty, coinciding with errors saying the metering database doesn't exist yet and errors saying it already exists, at the same time? I was holding off on filing an issue for that until the other things got resolved. I'm not actually sure how to fix that one; I couldn't figure out what the real problem was. But I did figure out a workaround: don't enable the awsBillingReportDataSource on a fresh install; wait until everything else is created, then enable it.

---
I used the hive cli (beeline) to manually run the generated ADD PARTITION statement.
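The statement is roughly of this shape (table name, partition values, and location are illustrative placeholders):

```sql
ALTER TABLE metering.datasource_metering_aws_billing
ADD IF NOT EXISTS PARTITION (
  `billing_period_start`='20191101',
  `billing_period_end`='20191201'
)
LOCATION 's3a://my-bucket/report/20191101-20191201/';
```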
It worked, so I'm not sure why the reporting operator is getting "table not found". Wondering if somehow the table isn't in the metering database when it tries to run it?

---
Table not found datasource_metering_aws_billing is, I believe, because the datasource is named incorrectly, which is what requires the ansible operator change. That said, I don't understand why adding the partition works, since I'd expect the table name to be off.

I did forget to build with the fsGroup option in the ansible operator, though. Just fixed that.

The panic is that issue, yes. It happens because we check the spec, not the status, of the datasource for the database name, which causes a few issues. I'm continuing to test this this week and next, so I hope to have verified it myself soon. I'm also looking into how we can mock this out in some tests.

---
If it's a clue: while the table did exist in hive, it appeared to be empty (but did have a ton of columns). Did you push the fsGroup change to the fix-aws-datasources1 tag on the ansible image? I just tried using that, and it seems like it doesn't have the change.

---
How did you attempt to use it? Can I see your MeteringConfig? It should be updated.

---
Oh, the table-not-existing issue is the database schema not being properly selected when we add partitions, I think.

---
I used the env vars; this is included in the output of ./hack/upstream-install.sh:
Here's my MeteringConfig:
With your ansible image, the reporting-operator keeps spitting out this error:
Here's the describe of the ansible pod:
---
You need to specify the fsGroup in your MeteringConfig:
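Something along these lines (the nesting under reporting-operator is the general MeteringConfig shape; the exact securityContext field here is a sketch, not verified against the schema):

```yaml
apiVersion: metering.openshift.io/v1
kind: MeteringConfig
metadata:
  name: operator-metering
spec:
  reporting-operator:
    spec:
      securityContext:   # sketched field path; match whatever the deployment template expects
        fsGroup: 3001    # illustrative GID
```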
---
I believe I've got everything working; I'm just verifying it all and making sure some of the changes are still necessary. I'll post here when I republish the images.

---
Ok, I was finally able to generate a report. I pushed the updated images just now, and I expect it should all work. I'll begin working on opening a PR today and Monday, and on getting the bugs sorted, so we can get this backported into the next 4.2.z release as well.

---
Seems like we're super close. I ran into one easily fixed issue with the reporting operator:
Edit: Never mind, I figured out what's going on here.

---
I'm trying to get my AWS billing report loaded in, and am running into errors like this:
```
app=metering component=reportDataSourceWorker error="HiveTable.metering.openshift.io \"reportdatasource-metering-aws-billing\" is invalid: []: Invalid value: map[string]...: validation failure list:
spec.partitions.partitionSpec in body must be of type array: \"object\""
```
followed by a dump of the HiveTable that it's trying to apply; the relevant part is the partitions stanza (table name redacted).
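Its shape is roughly this (values here are placeholders; note the start and end keys, which match the hardcoded names described below):

```yaml
partitions:
- location: "s3a://<bucket>/<report-path>/20191101-20191201/"
  partitionSpec:
    start: "20191101"
    end: "20191201"
```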
Looking through the Go source code, it seemed like partitionSpec really should be an object, not a string array, so I modified hive.crd.yaml accordingly.
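In sketch form (assuming the usual OpenAPI v3 schema block in the CRD), the change is along these lines:

```yaml
# before: partitionSpec validated as an array of strings
partitionSpec:
  type: array
  items:
    type: string

# after: partitionSpec validated as a map of column name to value
partitionSpec:
  type: object
  additionalProperties:
    type: string
```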
That allowed the HiveTable update to actually go through, but then I started getting errors like this:
I couldn't reconcile this behavior with the Go code. aws_usage_hive.go seems to hard-code the partitionSpec keys as "start" and "end", but db.go seems to look the values up by keys matching the partition column names, which I guess are billing_period_start and billing_period_end in this case.
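To illustrate the mismatch (this is a toy sketch, not the project's actual source):

```go
package main

import "fmt"

func main() {
	// What aws_usage_hive.go appears to produce: hardcoded "start"/"end" keys.
	partitionSpec := map[string]string{"start": "20191101", "end": "20191201"}

	// What db.go appears to expect: keys matching the partition column names.
	partitionColumns := []string{"billing_period_start", "billing_period_end"}

	for _, col := range partitionColumns {
		if _, ok := partitionSpec[col]; !ok {
			fmt.Printf("no value for partition column %q\n", col) // always hit
		}
	}
}
```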
What's going on here? It seems like there's a mismatch between the HiveTable spec produced by aws_usage_hive.go, the Hive CRD validation, and db.go.