Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround causes serial port discovery issues #1332

Open
wborn opened this issue May 13, 2020 · 22 comments
Open

Workaround causes serial port discovery issues #1332

wborn opened this issue May 13, 2020 · 22 comments

Comments

@wborn
Copy link
Member

wborn commented May 13, 2020

The binding currently uses serialPortManager.getIdentifiers() to get the SerialPortIdentifier.

This workaround was added in #1222 by @bodiroga.

The workaround causes serial port detection issues because the getIdentifiers() method will only return discovered ports.

As a result:

  • Undiscovered (non-standard) RXTX ports will not work this way (e.g ports defined in udev rules), the transport adds undiscovered ports to gnu.io.rxtx.SerialPorts so they can be used without users having to configure this
  • RFC2217 ports cannot be used because there is no discovery logic for these ports.

Perhaps the workaround can be removed by instead detecting that a controller is disconnected by just opening the port and handling any resulting exceptions?

Related to:

@RafalLukawiecki
Copy link

I think calling getIdentifiers does not make sense for RFC2217, as I have alluded to in this comment.

Interestingly, there is code marked "TODO" that supposedly was going to do that, but I cannot believe it would be possible to ever implement it, see RFC2217PortProvider.java:

@Override
public Stream<SerialPortIdentifier> getSerialPortIdentifiers() {
    // TODO implement discovery here. https://github.com/eclipse/smarthome/pull/5560
    return Stream.empty();
}

@wborn
Copy link
Member Author

wborn commented May 13, 2020

I also think it's unlikely RFC2217 discovery will ever be implemented.

It would be interesting to know if ZWave with RFC2217 works for you without:

  • this workaround
  • the calls to methods that throw unsupported operation exceptions

@RafalLukawiecki
Copy link

In my testing I have commented out this workaround, last weekend, and unfortunately it did not work. Binding was still showing "No such port exists" messages. I suppose it should be all re-tested without calls to the unsupported methods. I just don't know if I can invest more time into it—let me explain (perhaps I should make a more generic post about the difficulties on the OH forum).

Debugging OpenHAB core is too hard: it is very hard to know how the bits fit together whilst replacing and recompiling them. Maven and Java only make it harder by creating a huge barrier that could only be removed by better "welcome to a new developer" documentation on OH or more focused onboarding support for newcomers. I gave up after 35 hours invested into the ZWave binding and nrjavaserial. I am glad that my efforts yielded an important fix to nrjavaserial, but I was very disheartened that the ZWave binding maintainers were not as able to help me debug the issue as the developer of nrjavaserial was in fixing theirs.

I realise everyone is a volunteer and their work has been excellent. Probably due to other factors, unfortunately, the number of times I was told that the maintainer did not read or know the code that was closely related to the issue affecting their binding, or that it was the responsibility of some other openhab component to do things properly was too much for me. For a newcomer it is not encouraging to feel that I should be more committed to fixing an issue with a binding that its own maintainer wants to. It made me concerned that I would be on my own, going forward, without stronger support in the (hard) debugging effort. If this was not Java/Maven/Eclipse but something more discoverable and less byzantine, or even just a bit better documented, I would have persisted. But with the high barrier to entry for even a fairly experienced developer—admittedly, in many other languages and platforms—but with lack of stronger support, I am sorry to say, I had to give up.

I really wanted to be a helpful part of it. I am very sorry.

@wborn
Copy link
Member Author

wborn commented May 13, 2020

I think you did an excellent job to try to get it working @RafalLukawiecki as newcomer. Back when I started working on openHAB I also found it very difficult to understand how everything works together (even as a seasoned Java developer). What worked for me was fixing small issues in add-ons and learning my way around everything eventually.

For me the RFC2217 part is new and looks like you've uncovered quite a few issues with it. It probably has to do with that it's not used by many. Which is reinforced by the same issues of course.

Hopefully we do can get it fixed/improved eventually. After spending 35 hours and without a positive outlook to get it fixed it's OK to give up. Thanks for recompiling the FreeBSD nrjavaserial libraries and triggering me to have a look at the RFC2217 part. 👍 I'll continue to make some improvements for it.

@bodiroga
Copy link
Contributor

Hi guys!

Sorry for being a little bit late to the party, but I will try to help fixing this bug and testing the proposed solution 👍

As far as I remember, that workaround was introduced because the function serialPortManager.getIdentifier(portId); didn't return null when the USB controller was unplugged and, thus, making the binding believe everything was 'ok'. I know that it doesn't make any sense, but that was what happened on my computer and I explain it in the PR (see #1222).

I will start up the Eclipse IDE, replace my function with the original one and test it again, let's see if things have changed (I hope so). I will also get a spare Raspberry Pi and test the RFC2217 functionality 🤞

Many thanks to @RafalLukawiecki for his awesome work during this week (your job with the nrjavaserial library has been amazing) and @wborn for having a look at this topic 🎆

Best regards,

Aitor

@wborn
Copy link
Member Author

wborn commented May 14, 2020

That would be awesome! It seems RFC2217 is working better with these changes.

@bodiroga
Copy link
Contributor

That would be awesome! It seems RFC2217 is working better with these changes.

Hi @wborn!

I'm sorry, but I'm still seeing the erroneous behavior after replacing getSerialPortIdentifier(portId) with serialPortManager.getIdentifier(portId): the function doesn't return null when the USB device is unplugged. As a consequence, the reconnection logic is broken 😢 I have waited 10 minutes and the function is still returning a non null value, so if it is cache problem, it keeps the value for a long time.

Which one is the function being executed when serialPortManager.getIdentifier(portId) is called? This one? Or is the function overwritten by the nrjavaserial library? That would explain why my workaround works 🤔

Any idea?

Thank you guys!

@bodiroga
Copy link
Contributor

Is this the function being executed?

@wborn
Copy link
Member Author

wborn commented May 14, 2020

Yes that's the one. Doesn't it throw exceptions that you can properly handle when you open the port when the device is unplugged?

@bodiroga
Copy link
Contributor

Yes that's the one. Doesn't it throw exceptions that you can properly handle when you open the port when the device is unplugged?

No sir, openHAB crashes on:

SerialPortIdentifier commPort = portIdentifier.open("org.openhab.binding.zwave", 2000);

without throwing any exception. The only message I get in the Eclipse IDE console is /dev/ttyACM0: No existe el archivo o el directorio (/dev/ttyACM0: file or directory does not exist), but that seems to be a JVM message and not an openHAB one.

This is the reason why the workaround was added 😢

@wborn
Copy link
Member Author

wborn commented May 14, 2020

Does it generate a hs_err_pid*.log file of the crash? If it is an issue with nrjavaserial we should report it or test if it is still an issue with 5.1.0 or newer which has some fixes for device removal.

It can also be related to NeuronRobotics/nrjavaserial#144.

There's a similar issue in the zigbee code openhab/org.openhab.binding.zigbee#577.

@bodiroga
Copy link
Contributor

bodiroga commented May 14, 2020

Does it generate a hs_err_pid*.log file of the crash?

No hs_err_pid*.log file found after executing the command: sudo find / -name hs_err_pid*. Fu** 😞

If it is an issue with nrjavaserial we should report it or test if it is still an issue with 5.1.0 or newer which has some fixes for device removal.

Yeah, I have been reading that issue in the nrjavaserial repo, but I have no clue how could I try the latest version of the library (5.1.1) with the Eclipse IDE app.bndrun launcher. Is it possible to add it editing the pom.xml file? I'm going to try to add this https://mvnrepository.com/artifact/com.neuronrobotics/nrjavaserial/5.1.1 to the pom file, let's see if I am lucky 😄

Many thanks for your help @wborn, I really appreciate it 😉

PS: No, I can not make it work, the core.io.transport.serial.* bundles depend on 3.x version of nrjavaserial library 🤔

@wborn
Copy link
Member Author

wborn commented May 15, 2020

Yes that was to be expected. The easiest way to test it without updating/recompiling core is by manually updating the bundle manifest and replacing:

~/.m2/repository/org/openhab/nrjavaserial/3.15.0.OH2/nrjavaserial-3.15.0.OH2.jar

with this nrjavaserial-3.15.0.OH2.jar.

It's a hack, but it saves a lot of work. 😉

After resolving the Demo App bundles, it should add nrjavaserial;version='[3.99.9,3.99.10)',\

You can also check the Felix Web Console that it is using that bundle:

http://localhost:8080/system/console/bundles

username/password: admin/admin

It should show:

reversioned

@bodiroga
Copy link
Contributor

Hi @wborn!

It's a hack, but it saves a lot of work. 😉

Great idea, I haven't thought about it 👍

Unfortunately... openHAB keeps crashing after removing the Z-Wave USB stick, and the worst thing is that I fear that I might be doing something wrong, so I'm not sure if we should rise the error to the nrjavaserial repository 😢 I don't know if @RafalLukawiecki (or @cdjackson) has still enough energy to makes some tests with the USB device connected to the host PC instead of using the RFC2217 port 😉

On the other hand, the feature introduced in the 5.1.0 release works great. After changing this to true and removing the long sleep here, I see the HARDWARE_ERROR event and I can use it to trigger the onSerialPortError function. But, on the next watchSerialPort execution, the binding tries to open the portIdentifier and openHAB crashes. I must be making some stupid mistake...

Anyway, I have not given up yet, so I will keep trying!

Thanks for your help!

PS: do I have to activate anything in the Eclipse IDE or the Demo App in order to get a .pid file for the JVM crash?

@bodiroga
Copy link
Contributor

It can also be related to NeuronRobotics/nrjavaserial#144.

Just to make things more clear, this issue seems to be related to Windows, isn't it? (The file @Ranjith619 is suggesting to change is "src/main/c/src/windows/termios.c") In my case, my tests are being made in a Linux computer (Ubuntu 18.04 in a Dell XPS 13 developer edition) and, for the sake of completeness, the java -version command outputs:

openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1~18.04-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)

Should I try a different JDK or is the used one correct?

@wborn
Copy link
Member Author

wborn commented May 16, 2020

Your JDK should be fine and I'm also on Ubuntu 18.04 and was also able to reproduce it with another USB stick.

It might be that the library encounters some error and just exits the whole process. If you look at fuserImp.c there are several exit(0) and exit(1) calls that may cause this.

@wborn
Copy link
Member Author

wborn commented May 16, 2020

OK I did some more testing and it seems that if you just discover the identifiers and then get the identifier it doesn't crash. Can you test if that also solves the issue for you @bodiroga?

serialPortManager.getIdentifiers();
serialPortManager.getIdentifier(...);

In a very simple example it solves the issue for me.

@wborn
Copy link
Member Author

wborn commented May 16, 2020

I've created NeuronRobotics/nrjavaserial#180 for this.

@bodiroga
Copy link
Contributor

bodiroga commented May 16, 2020

Ummmmm... two strange findings from my side:

  • As you have posted, I have added serialPortManager.getIdentifiers(); before the SerialPortIdentifier portIdentifier = serialPortManager.getIdentifier(portId); line, but I don't see any behavioural change. openHAB continues crashing when trying to connect to the device after replugin the Z-Wave stick.

  • Instead of only adding the getIdentifiers() line, I also tried with the Stream<SerialPortIdentifier> idents = serialPortManager.getIdentifiers(); line, but doing so makes the serialPortManager.getIdentifier(...) function return null in each cycle (30 seconds). Perhaps the ESH serial transport does something different than the direct nrjavaserial library? That is the only difference I can spot 🤔

Thanks again for your work, we are closer to the final solution 😉

PS: Wait, the second case only occurs if I print the idents object size. Here you can see what I have added:

Stream<SerialPortIdentifier> idents = serialPortManager.getIdentifiers();
logger.error("identifiers size: {}", idents.count()); // Delete
SerialPortIdentifier portIdentifier = serialPortManager.getIdentifier(portId);
logger.error("Port identifier: {}", portIdentifier); // Delete

With just this:

Stream<SerialPortIdentifier> idents = serialPortManager.getIdentifiers();
SerialPortIdentifier portIdentifier = serialPortManager.getIdentifier(portId);
logger.error("Port identifier: {}", portIdentifier); // Delete

openHAB keeps crashing, just as before.

@wborn
Copy link
Member Author

wborn commented May 22, 2020

Can you try this reversioned nrjavaserial 5.2.1 in this nrjavaserial-3.15.0.OH2.jar?
It has the latest fixes for unplugging/reinserting USB sticks.

@bodiroga
Copy link
Contributor

Hi @wborn!

I'm not sure if things have improved, this is the current situation:

Every 30 seconds (when the watchSerialPort is executed), the serialPort.getIdentifier(portId) function returns the correct identifier and null in turns: correct, null, correct, null, correct, null... Perhaps I could change the watchSerialPort logic and only call it if after a disconnect is detected, but I didn't want to change the reconnect strategy so much. I will make some tests first, but I'm not sure how to proceed when the behaviour changes in every new vesion 😄

Many thank for your work!

@wborn
Copy link
Member Author

wborn commented May 24, 2020

When the port is opened with the newer nrjavaserial versions, it will no longer be part of the discovered ports. I noticed that change in behavior as well, see NeuronRobotics/nrjavaserial#166 (comment). That change is due to the fact that the OH nrjavaserial fork didn't use lockfiles. So you cannot use the discovered ports to determine if a device is still connected.

wborn added a commit to wborn/openhab-core that referenced this issue May 27, 2020
With this version it will no longer completely exit the JVM in certain scenarios when ports are closed/reopened.

Related to:

* openhab/org.openhab.binding.zigbee#577
* openhab/org.openhab.binding.zwave#1332

Signed-off-by: Wouter Born <[email protected]>
cweitkamp pushed a commit to openhab/openhab-core that referenced this issue May 27, 2020
With this version it will no longer completely exit the JVM in certain scenarios when ports are closed/reopened.

Related to:

* openhab/org.openhab.binding.zigbee#577
* openhab/org.openhab.binding.zwave#1332

Signed-off-by: Wouter Born <[email protected]>
splatch pushed a commit to ConnectorIO/copybara-hab-core that referenced this issue Jul 11, 2023
With this version it will no longer completely exit the JVM in certain scenarios when ports are closed/reopened.

Related to:

* openhab/org.openhab.binding.zigbee#577
* openhab/org.openhab.binding.zwave#1332

Signed-off-by: Wouter Born <[email protected]>
GitOrigin-RevId: 5b325aa
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants