bug: panic when worker actor start picking up jobs #408

Closed
Tracked by #8
mudler opened this issue Jul 11, 2024 · 6 comments · Fixed by #412
Labels
bug Something isn't working prio: high

Comments

@mudler
Contributor

mudler commented Jul 11, 2024

Masa-Oracle version:
N/a

Environment, CPU architecture, OS, and Version:

Describe the bug

To Reproduce

Expected behavior

Logs

Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: time="2024-07-11T06:02:52Z" level=info msg="[+] Worker Address: 185.244.181.123"
Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: time="2024-07-11T06:02:52Z" level=info msg="[+] Worker Address: 92.246.142.195"
Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: time="2024-07-11T06:02:52Z" level=info msg="[+] Actor started"
Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: message repeated 2 times: [ time="2024-07-11T06:02:52Z" level=info msg="[+] Actor started"]
Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: panic: runtime error: invalid memory address or nil pointer dereference
Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x8bb1dc]
Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: goroutine 5996 [running]:
Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: github.com/asynkron/protoactor-go/actor.(*PID).ref(0x0, 0xc000edb200)
Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: 	/home/ubuntu/go/pkg/mod/github.com/asynkron/[email protected]/actor/pid.go:28 +0x1c
Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: github.com/asynkron/protoactor-go/actor.(*PID).sendUserMessage(0x0, 0x30?, {0x17ab420, 0xc002239a40})
Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: 	/home/ubuntu/go/pkg/mod/github.com/asynkron/[email protected]/actor/pid.go:49 +0x25
Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: github.com/asynkron/protoactor-go/actor.(*RootContext).sendUserMessage(0xc0012e8ae0?, 0xc00156fc00?, {0x17ab420?, 0xc002239a40?})
Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: 	/home/ubuntu/go/pkg/mod/github.com/asynkron/[email protected]/actor/root_context.go:148 +0xb8
Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: github.com/asynkron/protoactor-go/actor.(*RootContext).Send(...)
Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: 	/home/ubuntu/go/pkg/mod/github.com/asynkron/[email protected]/actor/root_context.go:113
Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: github.com/masa-finance/masa-oracle/pkg/workers.SendWork.func2()
Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: 	/home/ubuntu/masa-oracle/pkg/workers/workers.go:322 +0x1dd
Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: created by github.com/masa-finance/masa-oracle/pkg/workers.SendWork in goroutine 5973
Jul 11 06:02:52 ip-10-1-1-112 masa-node[1282]: 	/home/ubuntu/masa-oracle/pkg/workers/workers.go:314 +0x46d
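Reading the trace: `(*PID).ref(0x0, ...)` shows the method being called on a nil `*PID` receiver, reached through `RootContext.Send` from a goroutine that `SendWork` spawns (workers.go:314). Because an unrecovered panic in any goroutine terminates the whole Go process, one bad PID takes the node down. The sketch below is a minimal, hypothetical model of that fan-out shape, not the actual `SendWork` code: `worker`, `handle`, and `dispatch` are made-up names, and a nil `*worker` plays the role of the nil PID. Each goroutine recovers its own panic and reports it as an error.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical stand-in for a remote worker's actor PID.
type worker struct {
	Addr string
}

// handle reads w.Addr, so calling it on a nil *worker panics with a nil
// pointer dereference, much like (*PID).ref on a nil *PID in the trace.
func (w *worker) handle(job string) string {
	return w.Addr + " processed " + job
}

// dispatch sends job to every worker in its own goroutine. Each goroutine
// recovers its own panic and records it as an error, because a panic that
// is not recovered in the goroutine where it happens kills the process.
func dispatch(workers []*worker, job string) ([]string, []error) {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results []string
		errs    []error
	)
	for i, w := range workers {
		wg.Add(1)
		go func(i int, w *worker) {
			defer wg.Done()
			defer func() {
				if r := recover(); r != nil {
					mu.Lock()
					errs = append(errs, fmt.Errorf("worker %d: %v", i, r))
					mu.Unlock()
				}
			}()
			out := w.handle(job)
			mu.Lock()
			results = append(results, out)
			mu.Unlock()
		}(i, w)
	}
	wg.Wait()
	return results, errs
}

func main() {
	// The nil entry simulates a worker whose PID was never set.
	workers := []*worker{{Addr: "185.244.181.123"}, nil, {Addr: "92.246.142.195"}}
	results, errs := dispatch(workers, "job-1")
	fmt.Println(len(results), "succeeded,", len(errs), "failed")
	for _, err := range errs {
		fmt.Println(err)
	}
}
```

Running it reports two successes and one recovered failure for the nil worker. A recover like this is only a safety net; the underlying fix is still to avoid sending to a nil PID at all.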

Additional context

if !isBootnode(ipAddr) && p.IsTwitterScraper || p.IsWebScraper || p.IsDiscordScraper {
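On the quoted condition: in Go, `&&` binds more tightly than `||`, so it parses as `(!isBootnode(ipAddr) && p.IsTwitterScraper) || p.IsWebScraper || p.IsDiscordScraper`, and the bootnode check only guards the Twitter case. Whether this is connected to the panic is not established in this thread. The sketch below uses hypothetical function names, with plain booleans standing in for `isBootnode(ipAddr)` and the three `p.Is*Scraper` flags, and compares that grouping with one that applies the bootnode exclusion to every scraper type (assuming that is the intent).

```go
package main

import "fmt"

// currentCond mirrors the quoted condition. Because && binds tighter than ||,
// it means (!bootnode && twitter) || web || discord.
func currentCond(bootnode, twitter, web, discord bool) bool {
	return !bootnode && twitter || web || discord
}

// groupedCond excludes bootnodes for every scraper type
// (assumed intent, for illustration only).
func groupedCond(bootnode, twitter, web, discord bool) bool {
	return !bootnode && (twitter || web || discord)
}

func main() {
	// A bootnode that is only a web scraper.
	fmt.Println(currentCond(true, false, true, false)) // true: bootnode is admitted
	fmt.Println(groupedCond(true, false, true, false)) // false: bootnode is excluded
}
```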

  • We don't know whether it's actually the Bittensor layer that generates invalid requests; however, the oracle shouldn't panic. It should return an error message saying what went wrong instead.
@mudler mudler added bug Something isn't working prio: high labels Jul 11, 2024
@juanmanso
Contributor

🟢 [UPDATE] Tried a few endpoints with my setup on devnet, and there was no panic.

My setup

  • Chain endpoint Devnet (ws://54.205.45.3:9945), netuid 1
  • local validator (make run-validator, uid 5)
  • local miner (make run-miner, uid 6)
  • local oracle node (make run in the masa-oracle repo, staked account)
    • Twitter credentials ✅
    • Discord bot token ✅
    • Scraper booleans -> All true

Tested endpoints

  • ✅ GET /data/twitter/followers/juanmanso_
  • ✅ GET /data/twitter/profile/juanmanso_
  • ✅ POST /data/twitter/tweets/recent -> query "internet", count 1
  • POST /data/web -> url "https://www.github.com", depth 1
    • Returns an empty array; I assume this is because the oracle takes too long to respond
    • Pinging the oracle directly took some time, but it eventually responded correctly
  • ✅ GET /data/discord/profile/691473028525195315

Next step

  • Try with Docker Compose (which is similar to the deployed environment)

@juanmanso
Contributor

Link to the whole log:

From the miner's side of the log, I was able to reproduce the issue with the following steps:

  • Run a local miner (in my case on devnet, since I have no access to testnet)
  • Run a local oracle node
  • Send a request to the validator so that it reaches the miner
  • When the miner starts processing it and the oracle shows it has received the request, shut down the oracle manually (Ctrl + C)
  • Check the miner's output
[Images: log of the original issue | log of my reproduced issue]

From this, it is reasonable to assume that the miner is failing because of the oracle.

@juanmanso
Contributor

Inspecting the oracle's side of the log, the only notable thing there is the invalid memory access attempt.

[Image: screenshot of the oracle's log]

So I'm guessing something on the deployed oracle's side is making it break. Not enough resources on the server? Older code?

I think that, as far as the @masa-finance/subnet team is concerned, we are done here. I'll leave it to the @masa-finance/protocol team to debug this further, since they have more context.

@juanmanso juanmanso assigned juanmanso and unassigned juanmanso Jul 11, 2024
@juanmanso
Contributor

@mudler moved it to In Progress since work has been done here. However, since there's no assignee, it might be counterproductive. Should I move it back to Ready?

DEFAULT ACTION

  • Leave it In Progress, but assign the team as a whole until someone picks it up

@juanmanso
Contributor

Okay, I cannot assign @masa-finance/protocol for some reason 🫠

@teslashibe teslashibe self-assigned this Jul 11, 2024
@teslashibe teslashibe changed the title panic when worker actor start picking up jobs bug: panic when worker actor start picking up jobs Jul 11, 2024
Projects
None yet

3 participants