-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-28505: OTEL: Implement OTEL Exporter to expose query details fro… #5451
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanx @tanishq-chugh, have dropped some comments, please check & I think your code ain't properly formatted
itests/hive-unit/pom.xml
Outdated
<dependency> | ||
<groupId>io.opentelemetry</groupId> | ||
<artifactId>opentelemetry-exporter-otlp</artifactId> | ||
<version>${otel.version}</version> | ||
</dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we adding in itest? We are only having changes in the prod code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is required in case of users running OTEL Service for HS2 which is running locally built from source code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't package itests as part of the distribution, this is integration test module. HS2 should run locally without this itests, it ain't there in the class path either
|
||
while (lQuery.getQueryDisplay() == null) { | ||
try { | ||
Thread.sleep(500); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there are sleep here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some cases, OTEL service was capturing the liveQueries in a state where queryDisplay was not initialised and leading to Null Pointer Exception. This is changed to drop such queries for the current OTEL run and capture it in next run.
if(queryIdToSpanMap.containsKey(queryID)){ | ||
Span rootspan = queryIdToSpanMap.get(queryID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Span rootspan = queryIdToSpanMap.get(queryID);
if (rootspan != null) {
queryIdToTasksMap.get(queryID).add(task.getTaskId()); | ||
Context parentContext = Context.current().with(rootspan); | ||
Span currSpan = tracer.spanBuilder(queryID+ " - " + task.getTaskId() + " - live").setParent(parentContext).setAllAttributes(addTaskAttributes(task)) | ||
.setStartTimestamp(task.getBeginTime(), TimeUnit.MILLISECONDS).startSpan(); | ||
while (task.getEndTime() == null){ | ||
try { | ||
Thread.sleep(500); | ||
} catch (InterruptedException e) { | ||
throw new RuntimeException(e); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is same as below, refactor into a method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have refactored the code to remove the while loop of thread.sleep and usage of currSpan variable as well leading to single line of code for this block.
|
||
while (task.getEndTime() == null) { | ||
try { | ||
Thread.sleep(500); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't have sleep in the prod code like this, that too when the code is sequential, if it is null, just drop this task in this iteration and we will catch this in next iteration
String hQueryId = hQuery.getQueryDisplay().getQueryId(); | ||
Span rootspan = queryIdToSpanMap.get(hQueryId); | ||
for (QueryDisplay.TaskDisplay task : hQuery.getQueryDisplay().getTaskDisplays()) { | ||
if(queryIdToTasksMap.get(hQueryId) == null || !queryIdToTasksMap.get(hQueryId).contains(task.getTaskId())){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use rootspan
instead of queryIdToTasksMap.get(hQueryId)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
queryIdToTasksMap.get(hQueryId) == null
is not required anymore as this will never be null in case rootspan is not null. Have removed this condition.
Span currSpan = tracer.spanBuilder(hQueryId+ " - " + task.getTaskId() + " - completed").setParent(parentContext).setAllAttributes(addTaskAttributes(task)) | ||
.setStartTimestamp(task.getBeginTime(), TimeUnit.MILLISECONDS).startSpan(); | ||
currSpan.end(task.getEndTime(), TimeUnit.MILLISECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to store currSpan
we can chain the calls
tracer.spanBuilder(hQueryId + " - " + task.getTaskId() + " - completed")
.setParent(parentContext).setAllAttributes(addTaskAttributes(task))
.setStartTimestamp(task.getBeginTime(), TimeUnit.MILLISECONDS).startSpan()
.end(task.getEndTime(), TimeUnit.MILLISECONDS);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made this change at all the other points as well.
//Update the rootSpan name & attributes before ending it | ||
queryIdToSpanMap.get(hQueryId).updateName(hQueryId + " - completed"); | ||
queryIdToSpanMap.get(hQueryId).setAllAttributes(addQueryAttributes(hQuery)); | ||
queryIdToSpanMap.get(hQueryId).end(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work instead
//Update the rootSpan name & attributes before ending it
rootspan.updateName(hQueryId + " - completed").setAllAttributes(addQueryAttributes(hQuery)).end();
if (!historicalQueryId.contains(hQuery.getQueryDisplay().getQueryId())) { | ||
historicalQueryId.add(hQuery.getQueryDisplay().getQueryId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a Set, you don't need to do contains then add, you can directly add and check the return value, if it is true, means the element wasn't there, if false, means the element was already there
} | ||
rootSpan.setAllAttributes(addQueryAttributes(hQuery)).end(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your historicalQueryId
will grow infinitely, you need to reset it the only ones returned by List<QueryInfo> historicalQueries = operationManager.getHistoricalQueryInfos();
This would return say only 25 (by default) but you would be storing it infinitely which will ultimately lead to OOM on the HS2 side
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, i have refactored the code such that at the end of each OTEL loop, all the queries which have been removed from historicalQueries
will also be removed from our historicalQueryId
.
itests/hive-unit/pom.xml
Outdated
<dependency> | ||
<groupId>io.opentelemetry</groupId> | ||
<artifactId>opentelemetry-exporter-otlp</artifactId> | ||
<version>${otel.version}</version> | ||
</dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't package itests as part of the distribution, this is integration test module. HS2 should run locally without this itests, it ain't there in the class path either
private static final String INSTRUMENTATION_NAME = OTELExporter.class.getName(); | ||
private static final Logger LOG = LoggerFactory.getLogger(OTELExporter.class); | ||
private final OperationManager operationManager; | ||
private Set<String> historicalQueryId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be final
private Map<String, Span> queryIdToSpanMap; | ||
private Map<String, List<String>> queryIdToTasksMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be final
//In case of live query previously encountered in past loops | ||
if (rootspan != null) { | ||
for (QueryDisplay.TaskDisplay task : lQuery.getQueryDisplay().getTaskDisplays()) { | ||
if (task.getReturnValue() != null && task.getEndTime() != null && !queryIdToTasksMap.get(queryID).contains(task.getTaskId())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the line length is more than the allowed limit of 120
rootspan = tracer.spanBuilder(queryID + " - live") | ||
.startSpan(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
rootspan = tracer.spanBuilder(queryID + " - live").startSpan();
.startSpan(); | ||
Context parentContext = Context.current().with(rootspan); | ||
|
||
Span initSpan = tracer.spanBuilder(hQuery.getQueryDisplay().getQueryId() + " - completed").setParent(parentContext).startSpan() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
line length is more than 120
attributes.put(AttributeKey.stringKey("Error Message"), query.getQueryDisplay().getErrorMessage()); | ||
attributes.put(AttributeKey.stringKey("explainPlan"), query.getQueryDisplay().getExplainPlan()); | ||
attributes.put(AttributeKey.stringKey("fullLogLocation"), query.getQueryDisplay().getFullLogLocation()); | ||
attributes.put(AttributeKey.stringKey("taskDisplays"), Joiner.on("\t").join(query.getQueryDisplay().getTaskDisplays())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this ain't required IMO, we are having spans for each task now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, removed taskDisplays attribute.
|
||
private AttributesMap addQueryAttributes(QueryInfo query){ | ||
AttributesMap attributes = AttributesMap.create(Long.MAX_VALUE, Integer.MAX_VALUE); | ||
attributes.put(AttributeKey.stringKey("queryId"), query.getQueryDisplay().getQueryId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think queryId is already there right? if not q -> Q
attributes.put(AttributeKey.longKey("End Time"), query.getEndTime()); | ||
attributes.put(AttributeKey.stringKey("Operation Id"), query.getOperationId()); | ||
attributes.put(AttributeKey.stringKey("Operation Log Location"), query.getOperationLogLocation()); | ||
attributes.put(AttributeKey.stringKey("Error Message"), query.getQueryDisplay().getErrorMessage()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove space
.setAttribute("queryID", hQuery.getQueryDisplay().getQueryId()) | ||
.setAttribute("queryString", hQuery.getQueryDisplay().getQueryString()) | ||
.setAttribute("Begin Time", hQuery.getBeginTime()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we instead call addQueryAttributes
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would make initSpan and rootSpan storing the same content in case of historical queries. Would be better if we keep only these minimal attributes in initSpan so that there's consistency between live and historical traces.
queryIdToSpanMap.put(queryID,rootspan); | ||
queryIdToTasksMap.put(queryID,completedTasks); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't formatted, can you format the code properly here & the other similar places
} | ||
} | ||
|
||
List<String> historicalQueryIDs = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be a set
Span initSpan = tracer.spanBuilder(queryID + " - live").setParent(parentContext).startSpan() | ||
.setAttribute("QueryId", queryID) | ||
.setAttribute("QueryString", lQuery.getQueryDisplay().getQueryString()) | ||
.setAttribute("BeginTime", lQuery.getBeginTime()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should set setStartTimestamp
for the intiSpan with this value rather than the attribute
Context parentContext = Context.current().with(rootspan); | ||
|
||
Span initSpan = tracer.spanBuilder(queryID + " - live").setParent(parentContext).startSpan() | ||
.setAttribute("QueryId", queryID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add user as well
|
||
Span initSpan = tracer.spanBuilder(queryID + " - live").setParent(parentContext).startSpan() | ||
.setAttribute("QueryId", queryID) | ||
.setAttribute("QueryString", lQuery.getQueryDisplay().getQueryString()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can let queryString present only in rootSpan, some queries are huge to be put twice and are more eye catching as compared to just query id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added queryString into initSpan and removed it from rootSpan
if (lQuery.getQueryDisplay().getErrorMessage() != null) { | ||
initSpan.setAttribute("ErrorMessage", lQuery.getQueryDisplay().getErrorMessage()); | ||
} | ||
initSpan.end(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this I believe should have the begin time as the end time here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
||
package org.apache.hive.service.servlet; | ||
|
||
import java.util.ArrayList; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my bad, removing it.
Quality Gate passedIssues Measures |
…m HiveServer2
What changes were proposed in this pull request?
Implement OTEL Exporter to expose basic query details from HiveServer2
Why are the changes needed?
Better Usability & introduce various OTEL Clients to fetch query data
Does this PR introduce any user-facing change?
OTEL Clients can be configured to fetch data from HS2
Is the change a dependency upgrade?
No
How was this patch tested?
Manual Testing by integrating it with OTEL collector & various receivers such as Jaeger, Zipkin.