Doubts on foxtrot APIs and internal architecture #259

Open
sumannewton opened this issue Jun 5, 2019 · 4 comments

Comments

sumannewton commented Jun 5, 2019

I started using Foxtrot a few days ago and am running it locally. I would like to understand it better.
We have a few questions and concerns:

  • Does Foxtrot support UPDATE of documents with the same id?
    If yes, how? Right now a document with the same id and the same timestamp gets updated, because the row key is built from that combination. But a document with the same id and a different (or no) timestamp creates a new entry (a new row key).
  • In ES, after inserting the document, we are unable to see the _source field and its entries. Only the meta info is visible like below.
    Data inserted is:
{
  "id": "1",
  "data": {
    "name": "newton",
    "age": 123,
    "work": 1,
    "score": {
      "phy": 93,
      "che": 90,
      "mat": 90
    }
  }
}

ES output:

curl "localhost:9200/foxtrot-testapp3-table-05-6-2019/_search?pretty"
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "foxtrot-testapp3-table-05-6-2019",
        "_type" : "document",
        "_id" : "\rtestapp3:00000001559717884149:1:__RAW_KEY_VERSION_2__",
        "_score" : 1.0
      }
    ]
  }
}

Is there anything I am missing here?

  • Why is the ES document id prefixed with characters such as \r, \u0000 or \t?
    This makes it difficult to fetch the actual ES document using the ES APIs.
  • The TTL of the HBase table created through the Foxtrot API is always FOREVER.
    Example: a table created with seggregatedBackend=true, ttl=3 and forceCreate=true still has a TTL of FOREVER in HBase.
  • Does Foxtrot store the actual/raw data in ES as well, or just the IDs of the data?
    The documentation says:

We keep max 90 days of data on the Elasticsearch cluster. We store only document IDs on this layer. For the Query analytics, we apply the filtering on Elasticsearch and get the raw data from the Key-Value store.
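
The flow in that excerpt (filter on the index, then read raw data from the key-value store) can be pictured with a toy model. This is only a minimal sketch with in-memory stand-ins, not Foxtrot code: `DocIndex`, `RawStore`, `flatten`, `save` and `select` are hypothetical names, and the row key `row-1` is made up for illustration.

```python
from typing import Any, Dict, List, Set, Tuple


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested objects into dotted field paths, e.g. data.score.phy."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


class DocIndex:
    """Stand-in for the ES layer: maps (field, value) to row keys only."""

    def __init__(self) -> None:
        self._postings: Dict[Tuple[str, Any], Set[str]] = {}

    def add(self, row_key: str, doc: Dict[str, Any]) -> None:
        for field, value in flatten(doc).items():
            self._postings.setdefault((field, value), set()).add(row_key)

    def filter_equals(self, field: str, value: Any) -> List[str]:
        return sorted(self._postings.get((field, value), set()))


class RawStore:
    """Stand-in for the key-value layer: maps row key to the full document."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Any]] = {}

    def put(self, row_key: str, doc: Dict[str, Any]) -> None:
        self._rows[row_key] = doc

    def get(self, row_key: str) -> Dict[str, Any]:
        return self._rows[row_key]


def save(index: DocIndex, store: RawStore, row_key: str, doc: Dict[str, Any]) -> None:
    store.put(row_key, doc)   # raw document goes to the key-value store
    index.add(row_key, doc)   # the index keeps no copy of the document


def select(index: DocIndex, store: RawStore, field: str, value: Any) -> List[Dict[str, Any]]:
    row_keys = index.filter_equals(field, value)   # step 1: filter, get keys
    return [store.get(key) for key in row_keys]    # step 2: fetch raw docs


doc = {
    "id": "1",
    "data": {"name": "newton", "age": 123, "work": 1,
             "score": {"phy": 93, "che": 90, "mat": 90}},
}
index, store = DocIndex(), RawStore()
save(index, store, "row-1", doc)
matches = select(index, store, "data.score.phy", 93)
```

In this model the index alone can only answer "which keys match", which is consistent with a search hit that carries an `_id` but no document body.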

Does every query always contact HBase to get the recent data as well, or is it served directly from ES?

  • Is there any documentation on the internal architecture of foxtrot?
    If yes, can you please update it in the wiki.
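
On the document id question above: one way to work with such an id from a script is to percent-encode it before placing it in a URL. The sketch below assumes the `index/type/id` path form of the ES document GET API (the search output shows a `_type`, so a version with mapping types is assumed); `es_doc_url` is a hypothetical helper, and fetching by id does not by itself bring back `_source`.

```python
from urllib.parse import quote


def es_doc_url(host: str, index: str, doc_type: str, doc_id: str) -> str:
    """Build a document GET URL, percent-encoding every path segment."""
    segments = (index, doc_type, doc_id)
    # safe="" makes quote() escape ':' and '/' as well as control characters.
    return "http://" + host + "/" + "/".join(quote(s, safe="") for s in segments)


url = es_doc_url(
    "localhost:9200",
    "foxtrot-testapp3-table-05-6-2019",
    "document",
    "\rtestapp3:00000001559717884149:1:__RAW_KEY_VERSION_2__",
)
print(url)
```

The leading carriage return becomes `%0D` and each colon becomes `%3A`, so the URL can be passed to curl without shell escaping problems.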

@santanusinha @r0goyal

Thanks

@santanusinha
Collaborator

  • An update happens if you post a document with the same id.
  • Foxtrot uses ES in pure doc index mode. Use the get APIs with your id to fetch the raw document, or use a select query.
  • The prefix has nothing to do with ES; it is a one-byte prefix for better region distribution on HBase.
  • The HBase store is meant for long-term event storage, and events are not meant to be deleted from it. That said, there is a PR that enforces TTL on the backend as well (this will, however, increase compaction time on HBase somewhat).
  • ES is used as a pure doc index. Mostly, inverted indexes are kept on the data (all events within the stipulated TTL). The actual source is stored on HBase for long-term storage.
  • We do not support nested types and parent-child relationships.
  • There is no documentation other than what is available on GitHub. There is, however, a presentation that you can refer to for more insight into why this was built.
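
To make the row key discussion concrete, the `_id` from the search output earlier in the thread can be taken apart. The layout used here (one prefix character, then table, zero-padded timestamp, document id and a version marker separated by colons) is inferred only from that single example, not from the Foxtrot source; `ParsedKey` and `parse_key` are hypothetical names.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ParsedKey(NamedTuple):
    prefix: str        # single leading character (the one-byte prefix)
    table: str
    timestamp_ms: int  # epoch milliseconds, zero-padded in the key
    doc_id: str
    version: str


def parse_key(raw: str) -> ParsedKey:
    prefix, body = raw[0], raw[1:]
    # Split from both ends so a document id containing ':' stays intact.
    table, timestamp, rest = body.split(":", 2)
    doc_id, version = rest.rsplit(":", 1)
    return ParsedKey(prefix, table, int(timestamp), doc_id, version)


key = parse_key("\rtestapp3:00000001559717884149:1:__RAW_KEY_VERSION_2__")
day = datetime.fromtimestamp(key.timestamp_ms / 1000, tz=timezone.utc).date()
print(key.table, key.doc_id, day.isoformat())
```

Because the timestamp sits inside the key, posting the same id with a different timestamp yields a different key under this layout, which matches the behaviour described in the first question.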

There are actually a whole bunch of things that are not yet documented. For example:

  • We have a great new console with a variety of new and configurable widgets. You can find it at http://host:port/echo/index.htm. There is a new console for FQL as well, which can be found at http://host:port/echo/fql/index.htm.
  • Internal heuristics block grouping queries on high-cardinality fields to protect ES against exploding caches.

It is used at massive scale (1B+ events/day) in places. We are working on cleaning up some core parts of it. Feel free to take it for a spin, and get back to us on the Gitter channel or in the issues.
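
The thread does not describe how the grouping-query heuristic mentioned in the list above works, so the following is purely illustrative: a guess at the general shape of such a guard, not Foxtrot's implementation. The function name, the sampling approach and both thresholds are assumptions.

```python
from typing import Hashable, Iterable


class HighCardinalityError(ValueError):
    """Raised when a group-by field looks too close to unique per event."""


def check_group_by(field: str,
                   sample_values: Iterable[Hashable],
                   max_distinct_ratio: float = 0.5,
                   min_sample: int = 4) -> None:
    values = list(sample_values)
    if len(values) < min_sample:
        return  # too little data to judge; let the query through
    ratio = len(set(values)) / len(values)
    if ratio > max_distinct_ratio:
        raise HighCardinalityError(
            f"refusing to group on '{field}': "
            f"{ratio:.0%} of sampled values are distinct"
        )


# A low-cardinality field passes; a near-unique field such as an id is blocked.
check_group_by("data.work", [1, 1, 1, 2, 1, 1])
try:
    check_group_by("id", ["a", "b", "c", "d"])
    blocked = False
except HighCardinalityError:
    blocked = True
```

Rejecting the query up front is cheaper than letting an aggregation build one bucket per event and then failing under memory pressure.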

Feel free to close the issue if your queries have been answered.

@sumannewton
Author

Thanks. That pretty much answers most of the doubts. Will reopen this if required.

@sumannewton
Author

@santanusinha I have patched Foxtrot locally to support the following features:

  • Support for the nested datatype and query.
  • Support for boolean filters - AND/OR.

Let me know if there are any concerns about supporting these features. If not, I will raise a pull request for review.

@sumannewton sumannewton reopened this Sep 23, 2019
@santanusinha
Collaborator

santanusinha commented Sep 24, 2019 via email
