Random ServerException #465

Open
Masadow opened this issue Jun 12, 2024 · 4 comments

Comments

@Masadow

Masadow commented Jun 12, 2024

In production, I get random errors on both Android and web (I haven't tested other platforms).

On web, it looks like this:

ServerException(originalException: ClientException: XMLHttpRequest error., uri=https://api.v2.medami.fr, originalStackTrace: Error
    at Object.ccc (https://patient.v2.medami.fr/main.dart.js:8636:19)
    at bJ9.$1 (https://patient.v2.medami.fr/main.dart.js:124760:63)
    at Object.cPc (https://patient.v2.medami.fr/main.dart.js:7401:19)
    at b4v.<anonymous> (https://patient.v2.medami.fr/main.dart.js:318907:10)
    at bxv.q7 (https://patient.v2.medami.fr/main.dart.js:70756:12)
    at cGj.$0 (https://patient.v2.medami.fr/main.dart.js:70021:11)
    at Object.atE (https://patient.v2.medami.fr/main.dart.js:7263:40)
    at aS.ld (https://patient.v2.medami.fr/main.dart.js:69941:3)
    at cOb.$0 (https://patient.v2.medami.fr/main.dart.js:70546:20)
    at Object.cPa (https://patient.v2.medami.fr/main.dart.js:7395:19), parsedResponse: null)

On Android:

ServerException(originalException: ClientException with SocketException: Software caused connection abort (OS Error: Software caused connection abort, errno = 103), address = api.v2.medami.fr, port = 44844, uri=https://api.v2.medami.fr, originalStackTrace: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
pid: 18642, tid: 18715, name 1.ui
os: android arch: arm64 comp: yes sim: no
build_id: 'b866ace19d6955e3f7dae5b7ee781c6c'
isolate_dso_base: 7036011000, vm_dso_base: 7036011000
isolate_instructions: 7036377580, vm_instructions: 7036361000
    #00 abs 000000703712b433 virt 000000000111a433 _kDartIsolateSnapshotInstructions+0xdb3eb3
<asynchronous suspension>
    #01 abs 00000070371179f3 virt 00000000011069f3 _kDartIsolateSnapshotInstructions+0xda0473
<asynchronous suspension>
    #02 abs 00000070371174cb virt 00000000011064cb _kDartIsolateSnapshotInstructions+0xd9ff4b
<asynchronous suspension>
    #03 abs 0000007037117133 virt 0000000001106133 _kDartIsolateSnapshotInstructions+0xd9fbb3
<asynchronous suspension>
, parsedResponse: null)

I've seen another issue (#358), but I doubt this one is related to CORS: my error is random rather than systematic, and it also occurs on Android.

Relevant code:

await gql.client.request(GGetStepReq((b) => b..vars.id = stepId)).first

where gql is an instance of this class:

import 'package:flutter/widgets.dart';
import 'package:ferry/ferry.dart';
import 'package:ferry_hive_store/ferry_hive_store.dart';
import 'package:hive/hive.dart';
import 'package:gql_exec/src/request.dart';
import 'package:medami_utils/services/auth.dart';
import 'package:medami_utils/services/graphql/log.dart';
import 'package:provider/provider.dart';

class GraphQLClient extends ChangeNotifier {
  static late final Cache cache;
  static late final String endpoint;
  static late final Function(Request request, LinkException e) logError;
  late Client client;
  String? token;

  static init(String endpoint, Function(Request request, LinkException e) logError) async {
    GraphQLClient.endpoint = endpoint;
    GraphQLClient.logError = logError;

    final box = await Hive.openBox("graphql");

    final store = HiveStore(box);

    cache = Cache(store: store);
  }

  void build(String? token) {
    this.token = token;
    
    client = Client(
      link: HttpLinkWithLog(endpoint, token, logError),
      cache: cache,
      defaultFetchPolicies: {
        OperationType.query: FetchPolicy.NetworkOnly,
      }
    );

    notifyListeners();
  }
}

class GraphQL extends StatelessWidget {
  GraphQL({super.key, required this.child});

  final GraphQLClient _graphql = GraphQLClient();
  final AuthToken _auth = AuthToken();

  final Widget Function(GraphQLClient) child;

  @override
  Widget build(BuildContext context) {
    return ChangeNotifierProvider(
      create: (context) => _auth,
      child: Consumer<AuthToken>(
        builder: (context, authToken, c) {
          print('rebuild graphql client with token: ${authToken.value}');
          _graphql.build(authToken.value);
          return c!;
        },
        child: ChangeNotifierProvider(
          create: (context) => _graphql,
          child: child(_graphql),
        ),
      ),
    );
  }
}

Custom HttpLink to log link errors globally:

import 'dart:async';

import 'package:ferry/ferry.dart';
import 'package:gql_exec/src/request.dart';
import 'package:gql_exec/src/response.dart';
import 'package:gql_http_link/gql_http_link.dart';

class HttpLinkWithLog extends HttpLink {
  HttpLinkWithLog(endpoint, token, this.logError) : super(
    endpoint,
    defaultHeaders: {
      if (token !=  null) 'Authorization': token,
    },
  );

  Function(Request request, LinkException e) logError;

  @override
  Stream<Response> request(
    Request request, [
    NextLink? forward,
  ]) async* {
    final controller = StreamController<Response>();

    Future<void>(() async {
      try {
        await for (final response in super.request(request)) {
          controller.add(response);
        }
      } on LinkException catch (e) {
        logError(request, e);
        controller.addError(e);
      } finally {
        await controller.close();
      }
    });

    yield* controller.stream;
  }
}

The GraphQL client is not being rebuilt when the error occurs, so that can't be the cause either.

At this point, I have no idea where to look to understand and fix the issue.

Since it's very random (it happens maybe 1 time in 50), I can't build a small repro either.

Masadow changed the title from "Random errors" to "Random ServerException" on Jun 12, 2024
@knaeckeKami
Collaborator

"Software caused connection abort" on mobile devices typically happens when a request is in flight, the app is backgrounded, and the OS kills any open sockets to save resources.

There's nothing gql can do about that; it's just how mobile OSes work.

But in my experience, the issue is less of a problem when using native HTTP implementations.
See https://pub.dev/packages/http#2-configure-the-http-client
or https://pub.dev/packages/native_dio_adapter,
depending on which HTTP implementation you use.

@Masadow
Author

Masadow commented Jun 13, 2024

I'll look into both links for Android, thanks. However, the application runs in kiosk mode, so it is never put in the background. The issue occurs while the user is interacting with the app, so I'm quite certain the app is in the foreground.

Would you suggest having a retry strategy for every request?

What about the error occurring on web?

@knaeckeKami
Collaborator

knaeckeKami commented Jun 16, 2024

Likely nothing related to gql, but rather issues in the underlying network, proxy, firewall, etc.

Though you could try ErrorLink as a workaround.
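
For reference, a per-request retry strategy like the one asked about above could be sketched roughly as follows. This is a minimal sketch in plain Dart, not part of gql or ferry: the names `retryOnTransientError` and `isTransientNetworkError` are hypothetical, and the choice of which errors count as transient is an assumption.

```dart
import 'dart:async';
import 'dart:io';

/// Hypothetical predicate: treats transport-level failures as retryable.
/// Assumption: anything else (GraphQL errors, auth failures) is not retried.
bool isTransientNetworkError(Object error) =>
    error is SocketException ||
    error is HttpException ||
    error is TimeoutException;

/// Hypothetical helper: runs [action] and retries it with exponential
/// backoff while it throws an error accepted by [shouldRetry].
Future<T> retryOnTransientError<T>(
  Future<T> Function() action, {
  int maxAttempts = 3,
  Duration initialDelay = const Duration(milliseconds: 200),
  bool Function(Object error) shouldRetry = isTransientNetworkError,
}) async {
  var attempt = 0;
  var delay = initialDelay;
  while (true) {
    attempt++;
    try {
      return await action();
    } catch (error) {
      // Give up once the attempt budget is spent or the error is not transient.
      if (attempt >= maxAttempts || !shouldRetry(error)) rethrow;
      await Future<void>.delayed(delay);
      delay *= 2; // Back off before the next attempt.
    }
  }
}
```

Two caveats apply. In the code from this issue, the failure is reported as a ServerException whose originalException holds the underlying error, so a caller would presumably have to surface that error (for example by throwing it) before a helper like this could act on it. Blindly retrying is also only safe for idempotent operations such as queries; retrying a mutation could apply it twice.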

@Masadow
Author

Masadow commented Nov 26, 2024

Hi,

I made some progress on this lately by analyzing packets on my server.

The first thing I noticed was a race condition with keep-alive. The keep-alive timeout in effect was 5 seconds (as far as I can tell this value comes from the server, since HTTP/1.1 itself does not define a default). If a request was made right after 5 seconds of inactivity, the server would close the TCP socket while the gql HTTP link was sending a request on it at the same time, and that request would fail. There should be a retry mechanism at the lowest level of the gql link for such cases, because this is a common problem with HTTP/1.1 keep-alive.
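
One way to narrow this race on Android and other dart:io platforms (not on web, where the browser manages connections) might be to make the client drop idle connections before the server does. The dart:io `HttpClient` keeps idle persistent connections for 15 seconds by default, which is longer than the 5 seconds observed here. A minimal sketch, where `buildShortIdleHttpClient` and its parameters are hypothetical names and the 5-second server value is taken from the observation above:

```dart
import 'dart:io';

/// Hypothetical helper: builds a dart:io HttpClient whose idle keep-alive
/// connections are closed [safetyMargin] earlier than the server's timeout,
/// so the client never reuses a socket the server is about to close.
HttpClient buildShortIdleHttpClient({
  // Assumption: the server closes idle connections after 5 seconds.
  Duration serverKeepAlive = const Duration(seconds: 5),
  Duration safetyMargin = const Duration(seconds: 2),
}) {
  final idle = serverKeepAlive - safetyMargin;
  return HttpClient()
    // Clamp to zero so a large margin never yields a negative timeout.
    ..idleTimeout = idle.isNegative ? Duration.zero : idle;
}
```

To use it with the link from this issue, the client would presumably be wrapped in `IOClient` from package:http and passed to `HttpLink` through its `httpClient` parameter; that wiring is an assumption and is not shown here. It shrinks the window for the race but does not remove it, so it complements a retry rather than replacing one.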

Anyway, I decided to disable keep-alive altogether on the server side by returning Connection: Close instead of Connection: Keep-Alive, but the client does not appear to honor it, as shown in the Wireshark capture below.

Screenshot 2024-11-24 at 17 47 13

Screenshot 2024-11-26 at 14 49 41

As you can see at the beginning, everything is fine until packet 364, where the server signals that it is closing the TCP socket, which the gql client acknowledges with an ACK.

You can also see that the server responded in packet 363 with a Connection: Close header, meaning the client should close the TCP socket and not reuse it.

However, the client sends another HTTP request on the same socket at packet 431, two seconds later, which obviously fails and triggers an RST from the server.

To conclude, it looks like gql is missing some connection-handling logic.
