Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28505: OTEL: Implement OTEL Exporter to expose query details fro… #5451

Merged
merged 6 commits into from
Sep 27, 2024

Conversation

tanishq-chugh
Copy link
Contributor

…m HiveServer2

What changes were proposed in this pull request?

Implement OTEL Exporter to expose basic query details from HiveServer2

Why are the changes needed?

Better Usability & introduce various OTEL Clients to fetch query data

Does this PR introduce any user-facing change?

OTEL Clients can be configured to fetch data from HS2

Is the change a dependency upgrade?

No

How was this patch tested?

Manual Testing by integrating it with OTEL collector & various receivers such as Jaeger, Zipkin.

Copy link
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanx @tanishq-chugh, have dropped some comments, please check & I think your code ain't properly formatted

Comment on lines 215 to 219
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-exporter-otlp</artifactId>
<version>${otel.version}</version>
</dependency>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we adding in itest? We are only having changes in the prod code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required in case of users running OTEL Service for HS2 which is running locally built from source code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't package itests as part of the distribution, this is integration test module. HS2 should run locally without this itests, it ain't there in the class path either


while (lQuery.getQueryDisplay() == null) {
try {
Thread.sleep(500);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there are sleep here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some cases, OTEL service was capturing the liveQueries in a state where queryDisplay was not initialised and leading to Null Pointer Exception. This is changed to drop such queries for the current OTEL run and capture it in next run.

Comment on lines 94 to 95
if(queryIdToSpanMap.containsKey(queryID)){
Span rootspan = queryIdToSpanMap.get(queryID);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

      Span rootspan = queryIdToSpanMap.get(queryID);
      if (rootspan != null) {

Comment on lines 99 to 108
queryIdToTasksMap.get(queryID).add(task.getTaskId());
Context parentContext = Context.current().with(rootspan);
Span currSpan = tracer.spanBuilder(queryID+ " - " + task.getTaskId() + " - live").setParent(parentContext).setAllAttributes(addTaskAttributes(task))
.setStartTimestamp(task.getBeginTime(), TimeUnit.MILLISECONDS).startSpan();
while (task.getEndTime() == null){
try {
Thread.sleep(500);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is same as below, refactor into a method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have refactored the code to remove the while loop of thread.sleep and usage of currSpan variable as well leading to single line of code for this block.


while (task.getEndTime() == null) {
try {
Thread.sleep(500);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't have sleep in the prod code like this, that too when the code is sequential, if it is null, just drop this task in this iteration and we will catch this in next iteration

String hQueryId = hQuery.getQueryDisplay().getQueryId();
Span rootspan = queryIdToSpanMap.get(hQueryId);
for (QueryDisplay.TaskDisplay task : hQuery.getQueryDisplay().getTaskDisplays()) {
if(queryIdToTasksMap.get(hQueryId) == null || !queryIdToTasksMap.get(hQueryId).contains(task.getTaskId())){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use rootspan instead of queryIdToTasksMap.get(hQueryId)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

queryIdToTasksMap.get(hQueryId) == null is not required anymore as this will never be null in case rootspan is not null. Have removed this condition.

Comment on lines 155 to 157
Span currSpan = tracer.spanBuilder(hQueryId+ " - " + task.getTaskId() + " - completed").setParent(parentContext).setAllAttributes(addTaskAttributes(task))
.setStartTimestamp(task.getBeginTime(), TimeUnit.MILLISECONDS).startSpan();
currSpan.end(task.getEndTime(), TimeUnit.MILLISECONDS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to store currSpan we can chain the calls

            tracer.spanBuilder(hQueryId + " - " + task.getTaskId() + " - completed")
                .setParent(parentContext).setAllAttributes(addTaskAttributes(task))
                .setStartTimestamp(task.getBeginTime(), TimeUnit.MILLISECONDS).startSpan()
                .end(task.getEndTime(), TimeUnit.MILLISECONDS);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change at all the other points as well.

Comment on lines 161 to 164
//Update the rootSpan name & attributes before ending it
queryIdToSpanMap.get(hQueryId).updateName(hQueryId + " - completed");
queryIdToSpanMap.get(hQueryId).setAllAttributes(addQueryAttributes(hQuery));
queryIdToSpanMap.get(hQueryId).end();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work instead

        //Update the rootSpan name & attributes before ending it
        rootspan.updateName(hQueryId + " - completed").setAllAttributes(addQueryAttributes(hQuery)).end();

Comment on lines 173 to 174
if (!historicalQueryId.contains(hQuery.getQueryDisplay().getQueryId())) {
historicalQueryId.add(hQuery.getQueryDisplay().getQueryId());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a Set, you don't need to do contains then add, you can directly add and check the return value, if it is true, means the element wasn't there, if false, means the element was already there

}
rootSpan.setAllAttributes(addQueryAttributes(hQuery)).end();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your historicalQueryId will grow infinitely, you need to reset it the only ones returned by List<QueryInfo> historicalQueries = operationManager.getHistoricalQueryInfos();

This would return say only 25 (by default) but you would be storing it infinitely which will ultimately lead to OOM on the HS2 side

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, i have refactored the code such that at the end of each OTEL loop, all the queries which have been removed from historicalQueries will also be removed from our historicalQueryId.

Comment on lines 215 to 219
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-exporter-otlp</artifactId>
<version>${otel.version}</version>
</dependency>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't package itests as part of the distribution, this is integration test module. HS2 should run locally without this itests, it ain't there in the class path either

private static final String INSTRUMENTATION_NAME = OTELExporter.class.getName();
private static final Logger LOG = LoggerFactory.getLogger(OTELExporter.class);
private final OperationManager operationManager;
private Set<String> historicalQueryId;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be final

Comment on lines 51 to 52
private Map<String, Span> queryIdToSpanMap;
private Map<String, List<String>> queryIdToTasksMap;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be final

//In case of live query previously encountered in past loops
if (rootspan != null) {
for (QueryDisplay.TaskDisplay task : lQuery.getQueryDisplay().getTaskDisplays()) {
if (task.getReturnValue() != null && task.getEndTime() != null && !queryIdToTasksMap.get(queryID).contains(task.getTaskId())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the line length is more than the allowed limit of 120

Comment on lines 102 to 103
rootspan = tracer.spanBuilder(queryID + " - live")
.startSpan();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

        rootspan = tracer.spanBuilder(queryID + " - live").startSpan();

.startSpan();
Context parentContext = Context.current().with(rootspan);

Span initSpan = tracer.spanBuilder(hQuery.getQueryDisplay().getQueryId() + " - completed").setParent(parentContext).startSpan()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit
line length is more than 120

attributes.put(AttributeKey.stringKey("Error Message"), query.getQueryDisplay().getErrorMessage());
attributes.put(AttributeKey.stringKey("explainPlan"), query.getQueryDisplay().getExplainPlan());
attributes.put(AttributeKey.stringKey("fullLogLocation"), query.getQueryDisplay().getFullLogLocation());
attributes.put(AttributeKey.stringKey("taskDisplays"), Joiner.on("\t").join(query.getQueryDisplay().getTaskDisplays()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this ain't required IMO, we are having spans for each task now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, removed taskDisplays attribute.


private AttributesMap addQueryAttributes(QueryInfo query){
AttributesMap attributes = AttributesMap.create(Long.MAX_VALUE, Integer.MAX_VALUE);
attributes.put(AttributeKey.stringKey("queryId"), query.getQueryDisplay().getQueryId());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think queryId is already there right? if not q -> Q

Comment on lines 191 to 194
attributes.put(AttributeKey.longKey("End Time"), query.getEndTime());
attributes.put(AttributeKey.stringKey("Operation Id"), query.getOperationId());
attributes.put(AttributeKey.stringKey("Operation Log Location"), query.getOperationLogLocation());
attributes.put(AttributeKey.stringKey("Error Message"), query.getQueryDisplay().getErrorMessage());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove space

Comment on lines 165 to 167
.setAttribute("queryID", hQuery.getQueryDisplay().getQueryId())
.setAttribute("queryString", hQuery.getQueryDisplay().getQueryString())
.setAttribute("Begin Time", hQuery.getBeginTime());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we instead call addQueryAttributes here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would make initSpan and rootSpan storing the same content in case of historical queries. Would be better if we keep only these minimal attributes in initSpan so that there's consistency between live and historical traces.

Comment on lines 124 to 125
queryIdToSpanMap.put(queryID,rootspan);
queryIdToTasksMap.put(queryID,completedTasks);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't formatted, can you format the code properly here & the other similar places

}
}

List<String> historicalQueryIDs = new ArrayList<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a set

Span initSpan = tracer.spanBuilder(queryID + " - live").setParent(parentContext).startSpan()
.setAttribute("QueryId", queryID)
.setAttribute("QueryString", lQuery.getQueryDisplay().getQueryString())
.setAttribute("BeginTime", lQuery.getBeginTime());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should set setStartTimestamp for the intiSpan with this value rather than the attribute

Context parentContext = Context.current().with(rootspan);

Span initSpan = tracer.spanBuilder(queryID + " - live").setParent(parentContext).startSpan()
.setAttribute("QueryId", queryID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add user as well


Span initSpan = tracer.spanBuilder(queryID + " - live").setParent(parentContext).startSpan()
.setAttribute("QueryId", queryID)
.setAttribute("QueryString", lQuery.getQueryDisplay().getQueryString())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can let queryString present only in rootSpan, some queries are huge to be put twice and are more eye catching as compared to just query id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added queryString into initSpan and removed it from rootSpan

if (lQuery.getQueryDisplay().getErrorMessage() != null) {
initSpan.setAttribute("ErrorMessage", lQuery.getQueryDisplay().getErrorMessage());
}
initSpan.end();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this I believe should have the begin time as the end time here

Copy link
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


package org.apache.hive.service.servlet;

import java.util.ArrayList;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused import

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my bad, removing it.

Copy link

sonarcloud bot commented Sep 25, 2024

@ayushtkn ayushtkn merged commit 9d97e4a into apache:master Sep 27, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants