Getting no value on motor signals together with Betaflight SITL #241

Open
Honungsburken opened this issue Oct 1, 2024 · 14 comments
Labels: bug (Something isn't working), enhancement (New feature or request), question (Further information is requested)

Comments

@Honungsburken

Hello,

I don’t get values on the motor signals together with Betaflight SITL when running the ReadMe example.

I tried downloading Betaflight Configurator as that was previously in the instructions. But for the latest version, my understanding is that the eeprom.bin file replaces the Configurator. (Though, I had not noticed “Initialized motor count 4” when running the betaflight_SITL.elf file before I set the motors to PWM in the configurator, but that can just be me)

PWMOut seems to be set to port 9001/9002, but when running the simulation it opens a new terminal running betaflight_SITL.elf from the betaflights/bf0/obj/main/ folder. Binding port 5761 for UART1 subsequently fails because I am already running betaflight_SITL.elf from betaflight/obj/main/ as per the PyBullet gym ReadMe example. The signals over 9001 and 9002 seem to be sent, but I can only see them in Wireshark if I close the terminal with the failed bind message. When I close it, I get the “catch up” messages and then it continues running. If I close the terminal immediately, I just get continuous messages over the whole simulation. The problem is that all the messages are empty, so the drone does not fly at all.

I am confused about which betaflight_SITL.elf file should be running: the one in betaflight/obj/main/ as per the PyBullet gym ReadMe, or the one in the betaflights/bf0/obj/main/ folder. I have also tried modifying the code in BetaAviary.py that opens the terminal so that only one runs at a time. I get messages on 9001/9002 regardless of which betaflight_SITL.elf is running, but the data is always just zeros. One difference, though, is that I only get “Initialized motor count 4” when running the file from betaflight/obj/main/ as per the ReadMe.

I also modified the clone_bfs.sh script a little to get the right version of Betaflight, I added “git checkout c155f58 #v4.5.0” to get version 4.5.0 instead of 4.5.1.

I’ll attach the outputs from the betaflight_SITL.elf terminals when running the example. What can be the reason for not getting values for the motor signals?
[Image: run_as_example]

@Honungsburken Honungsburken closed this as not planned Oct 7, 2024
@Honungsburken
Author

Update: I get some motor signals with values but then they quickly go back to zero.

I have tried again with the exact versions of both gym-pybullet and Betaflight that were used when the ReadMe was updated: the gym-pybullet commit 9a9ca8a from Nov 27, 2023 and the Betaflight commit cafe727 from Nov 23, 2023. I usually get the same result, with no movement and all motor signals at zero, but now the drone sometimes starts following the trajectory and does so for varying periods of time before the motor signals go back to zero and the drone drops. So far it seems to be random whether the drone moves at all and for how long; it has never completed a simulation.
I'll attach the plots from the longest run I have had so far.

[Image: longest]

@Honungsburken Honungsburken reopened this Oct 7, 2024
@JacopoPan JacopoPan added the question Further information is requested label Oct 10, 2024
@JacopoPan
Member

Hello @Honungsburken

Note that the simulation runs in its own Python interpreter process at the rate set by DEFAULT_CONTROL_FREQ_HZ (which should be at least 500) and is very naively aligned to the wall-clock here:

def sync(i, start_time, timestep):
    """Syncs the stepped simulation with the wall-clock.

while the Betaflight binary runs as its own process, and bidirectional communication happens over UDP (see the lines linked below):
def step(self, action, i):
    obs, reward, terminated, truncated, info = super().step(self.beta_action)
    t = i/self.CTRL_FREQ
    for j in range(self.NUM_DRONES): #TODO: add multi-drone support
        #### State message to Betaflight ###########################
        o = obs[j,:] # p, q, euler, v, w, rpm (all in world frame)
        p = o[:3]
        q = np.array([o[6], o[3], o[4], o[5]]) # w, x, y, z
        v = o[10:13]
        w = o[13:16] # world frame
        w_body = rotate_vector(w, qconjugate(q)) # local frame
        fdm_packet = struct.pack(
            '@dddddddddddddddddd', # t, w, a, q, v, p, pressure

The first thing I would investigate is whether you did anything that might have affected the time-alignment of sim and bf (like not syncing to the wall-clock or changing the sim freq).
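To make the state message in the snippet above easier to inspect, here is a minimal sketch of packing and unpacking the same 18-double layout. The field order is taken from the inline comment (t, w, a, q, v, p, pressure); the helper name pack_fdm and the example values are hypothetical and not part of the repository.

```python
import struct

# Same layout as the format string in the snippet: 18 native doubles.
FDM_FORMAT = '@' + 'd' * 18


def pack_fdm(t, w_body, accel, quat_wxyz, vel, pos, pressure):
    """Pack one state message: t, w (3), a (3), q (4), v (3), p (3), pressure."""
    fields = [t, *w_body, *accel, *quat_wxyz, *vel, *pos, pressure]
    if len(fields) != 18:
        raise ValueError(f'expected 18 values, got {len(fields)}')
    return struct.pack(FDM_FORMAT, *fields)


# Hypothetical example: drone at rest, 0.1 m above the ground.
packet = pack_fdm(
    0.002,           # t [s]
    (0.0, 0.0, 0.0),  # angular velocity, body frame
    (0.0, 0.0, 0.0),  # linear acceleration
    (1.0, 0.0, 0.0, 0.0),  # quaternion as w, x, y, z
    (0.0, 0.0, 0.0),  # velocity
    (0.0, 0.0, 0.1),  # position
    1.0,             # pressure
)
fields = struct.unpack(FDM_FORMAT, packet)
```

Unpacking a captured payload with the same format string is a quick way to confirm which fields actually change between packets.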

@Honungsburken
Author

Honungsburken commented Oct 10, 2024

Hi @JacopoPan, Thanks for your answer!

I have looked into your comments, but I cannot see that I have done anything that might affect the time-alignment. However, that doesn't necessarily mean it's not the problem.

I have changed some frequencies in both beta.py and Base/BetaAviary (to both higher and lower values). But the ones set in beta.py override the default values of 240 in Base/BetaAviary, so there is no point in changing those. I tried setting the sim/ctrl frequency back to 500, which was the original value, and to my surprise the whole simulation ran without problems. I can’t recall trying anything that I hadn’t done before, but the whole simulation ran. Then the drone crashed again after some time, and sometimes it succeeded again. It still seems to be very random. I have not changed anything regarding the sync.

The only trend I might see (but am not sure) is that the simulation seems to complete/run for longer if I restart the terminals. When checking CPU load it seems to be utilizing only 50%. I am running a laptop with Ryzen 5 4500U. Should it be no problem or could it be that it lacks single core performance?

When I look at the render output in the terminal during the sim, the sim time vs the wall clock time always starts low and ramps up until 0.89x. Is this normal or should it be closer to 1?

The only other clock related thing I can think of is that I am dual booting Windows and Ubuntu 22.04, which gives the wrong time for Windows, but I don’t think it should affect Ubuntu since I have not changed any time settings.

@JacopoPan
Member

JacopoPan commented Oct 11, 2024

I would try increasing the ctrl (not necessarily the physics) freq to > 500 Hz and removing the sync() from the loop running the sim.
I don't think dual booting is a problem. The main limitation is that syncing PyBullet and Betaflight purely through sleep() is very empirical; a better way would require modifying the Betaflight firmware to run in lockstep with the sim (but that can be quite a bit of work).
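For illustration, sleep-based alignment can be sketched as below. This is not the repository's sync() implementation; the function name pace_to_wall_clock and the injectable now/sleep parameters are hypothetical, added only so the behavior can be checked without real waiting.

```python
import time


def pace_to_wall_clock(step_index, start_time, timestep,
                       now=time.time, sleep=time.sleep):
    """Sleep until wall-clock time catches up with simulated time.

    Returns the number of seconds slept (0.0 if the sim is already late).
    """
    sim_elapsed = step_index * timestep   # simulated seconds so far
    wall_elapsed = now() - start_time     # real seconds so far
    lag = sim_elapsed - wall_elapsed
    if lag > 0:
        sleep(lag)
        return lag
    # The sim is running slower than real time; sleeping cannot recover that.
    return 0.0
```

The second branch is the weak point of this approach: if the loop itself runs below real time (for example the 0.89x figure reported earlier in the thread), no amount of sleeping brings the two processes back into step.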

@Honungsburken
Author

Hi again,

There is a check in BaseAviary

if self.PYB_FREQ % self.CTRL_FREQ != 0:
    raise ValueError(f'[ERROR] in BaseAviary.__init__(), pyb_freq {pyb_freq} is not divisible by env_freq {ctrl_freq}.')

That makes sure that the physics freq is always a multiple of the control, therefore I can't only increase the ctrl frequency. I tried removing that and modified the code to make it run anyway, but I did not get it to work properly. Do you think it is worth spending more time on that?

I tried raising both the ctrl and physics frequencies to values between 500 and 1500; the higher values make the drone run its trajectory faster, but not more robustly. But I've found (and this was the reason it worked better after my second message) that commenting out the env.render() call, which produces the high-frequency terminal output, makes the simulation complete in about 75% of runs. Do you have any other ideas for improving this further, or is this expected with the "naive" sync?

@JacopoPan
Member

No no, of course PYB_FREQ must be an integer multiple of CTRL_FREQ, because the PyBullet engine steps N times between every control input.
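As a small sketch of that relationship (the function name physics_steps_per_control is hypothetical), N is simply the ratio of the two frequencies, and the divisibility check guarantees it is a whole number:

```python
def physics_steps_per_control(pyb_freq, ctrl_freq):
    """Return N, the number of physics steps taken per control input."""
    if pyb_freq % ctrl_freq != 0:
        raise ValueError(
            f'pyb_freq {pyb_freq} is not divisible by ctrl_freq {ctrl_freq}'
        )
    return pyb_freq // ctrl_freq


steps_equal = physics_steps_per_control(500, 500)     # one physics step per control input
steps_double = physics_steps_per_control(1000, 500)   # two physics steps per control input
```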

I'm not sure why the drone would go faster over the trajectory when you change any of those freqs (it shouldn't happen). Check the simple trajectory generation in the example: you probably need to scale it to cover the greater number of steps per second you now take.

Removing all the printouts does make sense, have you removed the sync() sleep as well?
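On the trajectory scaling mentioned above: a minimal sketch, assuming a trajectory that is indexed by control step. If the waypoints are generated from elapsed time (t = i / ctrl_freq) rather than from the raw step index, the flight takes the same number of seconds at any control frequency. The circle, its radius, and the function name circle_waypoints are hypothetical and not taken from the example script.

```python
import math


def circle_waypoints(ctrl_freq, period_s=10.0, radius=0.5, duration_s=10.0):
    """Sample a circular path by elapsed time so speed is independent of ctrl_freq."""
    num_steps = int(round(duration_s * ctrl_freq))
    waypoints = []
    for i in range(num_steps):
        t = i / ctrl_freq                      # seconds since start
        angle = 2 * math.pi * t / period_s     # one lap every period_s seconds
        waypoints.append((radius * math.cos(angle), radius * math.sin(angle)))
    return waypoints


slow = circle_waypoints(500)    # 5000 waypoints for 10 s
fast = circle_waypoints(1000)   # 10000 waypoints for the same 10 s
```

With a fixed-length waypoint array, by contrast, doubling the control frequency would consume the array in half the time, which matches the "trajectory runs faster" symptom described earlier.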

@Honungsburken
Author

Thanks, I'll look into the scaling. I have tried removing the sync() sleep several times but then the drone does not move at all for some reason

@kovalishinilya

Hello,
I'm facing the exact same problem with Betaflight SITL. With a fresh clone of your repo, and after doing all the listed steps to set up the environment, SITL does not seem to interact with ctrl at all.

@JacopoPan
Member

@kovalishinilya can you check the socket info with sudo ss -tulpn while running the example? Also, have you tried including this delay: https://github.com/betaflight/betaflight/blob/master/src/main/main.c#L52 ? (For me it was working better without it, but it is intended for the SITL build.)
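As a rough complement to ss, a UDP port can also be probed from Python before starting the example, to spot a leftover process that still holds it. This is a sketch only; the helper name udp_port_in_use is hypothetical, and which ports to probe depends on your setup.

```python
import socket


def udp_port_in_use(port, host='127.0.0.1'):
    """Return True if a UDP socket cannot be bound to (host, port).

    Any OSError is treated as "in use", which also covers permission
    errors on privileged ports.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
    return False
```

Run it before launching the simulation rather than during it, since the probe briefly binds the port itself, for example `print({p: udp_port_in_use(p) for p in (9002, 9003, 9004)})`.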

@Honungsburken
Author

Honungsburken commented Dec 9, 2024

Update: Found a solution #241 (comment)

Hello @JacopoPan. When running the example, the output of sudo ss -tulpn looks like this:

Netid State  Recv-Q Send-Q Local Address:Port                     Peer Address:Port Process
udp   UNCONN 0      0      127.0.0.53%lo:53                       0.0.0.0:*         users:(("systemd-resolve",pid=498,fd=13))
udp   UNCONN 0      0      0.0.0.0:33128                          0.0.0.0:*         users:(("avahi-daemon",pid=584,fd=14))
udp   UNCONN 896    0      127.0.0.1:9002                         0.0.0.0:*         users:(("python",pid=28667,fd=9))
udp   UNCONN 0      0      0.0.0.0:9003                           0.0.0.0:*         users:(("betaflight_SITL",pid=28713,fd=5))
udp   UNCONN 0      0      0.0.0.0:9004                           0.0.0.0:*         users:(("betaflight_SITL",pid=28713,fd=6))
udp   UNCONN 0      0      0.0.0.0:42687                          0.0.0.0:*         users:(("betaflight_SITL",pid=28713,fd=4))
udp   UNCONN 0      0      0.0.0.0:42843                          0.0.0.0:*         users:(("betaflight_SITL",pid=28713,fd=3))
udp   UNCONN 0      0      0.0.0.0:34986                          0.0.0.0:*         users:(("python",pid=28667,fd=8))
udp   UNCONN 0      0      0.0.0.0:5353                           0.0.0.0:*         users:(("avahi-daemon",pid=584,fd=12))
udp   UNCONN 0      0      [fe80::de57:335a:81f4:f52a]%wlp1s0:546 [::]:*            users:(("NetworkManager",pid=590,fd=27))
udp   UNCONN 0      0      [::]:5353                              [::]:*            users:(("avahi-daemon",pid=584,fd=13))
udp   UNCONN 0      0      [::]:38505                             [::]:*            users:(("avahi-daemon",pid=584,fd=15))
tcp   LISTEN 0      4096   127.0.0.53%lo:53                       0.0.0.0:*         users:(("systemd-resolve",pid=498,fd=14))
tcp   LISTEN 0      128    127.0.0.1:631                          0.0.0.0:*         users:(("cupsd",pid=932,fd=7))
tcp   LISTEN 0      10     0.0.0.0:5761                           0.0.0.0:*         users:(("betaflight_SITL",pid=28713,fd=8))
tcp   LISTEN 0      511    *:80                                   *:*               users:(("apache2",pid=946,fd=4),("apache2",pid=945,fd=4),("apache2",pid=835,fd=4))
tcp   LISTEN 0      128    [::1]:631                              [::]:*            users:(("cupsd",pid=932,fd=6))

It does not seem to work better with the delay on line 52 in main.

I have also extracted the packet data from Wireshark while running the Betaflight example and plotted it in the following graphs. I am running a standard install of gym-pybullet-drones and Betaflight commit cafe727.

  • As we can see, the motor outputs suddenly spike after a while (around t = 10; t is just the Wireshark timestamp here, not seconds).
    [Image: motor_outputs_example]

  • This induces a large spike in the x angular velocity. It is important to note that the motors seem to change before the state, and not the other way around. (The spikes at the beginning are just from when the drone falls to the ground from its spawning point 0.1 m up.)
    [Image: angular_vel_example]

  • Here are the RC inputs as well.
    [Image: rc_inputs_example]

Does this help to draw any conclusions, and is there any other data that could be useful?

@Honungsburken
Author

I have also done similar tests with my own controller, with similar but slightly different results:
(Here using gym-pybullet-drones commit 9a9ca8a.)
I am currently running the simulation at 500 Hz and without the GUI (instead using a getCameraImage from the drone, passed to cv2), and hence with no sync. I am not running the trajectory example, but a vision-based PID controller that rises to a setpoint (in the following example I only have throttle control, not roll/pitch/yaw). I have the same issues with crashes nonetheless. I still have about a 75% success rate for short sims, but when it fails I have found the following, using Wireshark and plotting all the data in the state, RC, and motor messages:

  • The motor packets rise when the trajectory is initiated after 1.5 seconds (the drone is armed after 0.3). Then, after a short time, the motor values drop to zero, without an additional spike.

[Image: motor_outputs]

  • The whole motor spike, including the packets that drop to zero, is sent before we get a big spike in the IMU angular velocity for both X and Y.

[Image: angular_velocity_drop]

  • As shown here, there is only one strongly deviating packet, and from what I can see nothing happens before the motor spike.

[Image: scatter_angular_velocity]

  • All of the other state values are hard-coded to zero (or 1 for pressure and quaternion w) in BetaAviary as per the example, which I've verified with graphs. The drone is also always armed (AUX 1 = 1500) after arm.time has passed.

  • RC packets. Except for controller performance I don't see anything odd.

[Image: RC_control_inputs]
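The "motors drop to zero" event described in the list above can also be located programmatically in the decoded capture. A minimal sketch, assuming the motor packets have already been decoded into (timestamp, motor values) pairs; the function name first_motor_dropout and the sample data are hypothetical.

```python
def first_motor_dropout(samples, eps=1e-6):
    """Return the timestamp of the first all-zero motor sample after a nonzero one.

    samples is an iterable of (timestamp, motors) pairs, where motors is a
    sequence of per-motor values. Returns None if no dropout is found.
    """
    seen_nonzero = False
    for timestamp, motors in samples:
        all_zero = all(abs(m) <= eps for m in motors)
        if not all_zero:
            seen_nonzero = True
        elif seen_nonzero:
            return timestamp
    return None


# Hypothetical decoded capture: idle, ramp up, then a sudden drop to zero.
capture = [
    (0.0, (0.0, 0.0, 0.0, 0.0)),
    (1.5, (0.3, 0.3, 0.3, 0.3)),
    (2.0, (0.6, 0.6, 0.6, 0.6)),
    (2.1, (0.0, 0.0, 0.0, 0.0)),
]
dropout_time = first_motor_dropout(capture)
```

Comparing that timestamp with the time of the last RC packet sent before it would show whether the dropout follows a gap in the RC stream.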

@Honungsburken
Author

Update: I have done some testing/debugging in Betaflight and I noticed that it seems to be crashing due to activation of the failsafe: FAILSAFE_RX_LOSS_DETECTED. Hopefully someone with greater knowledge can dive into the root cause of the problem, but I have found a simple solution to get around the problem:

In the Betaflight code, in failsafe.c, change the failsafeState monitoring to false and rebuild Betaflight with make clean && make TARGET=SITL. The drone will not go into failsafe, and the simulation will keep running without crashing.

void failsafeStartMonitoring(void)
{
    failsafeState.monitoring = false; // was: true
}
https://github.com/betaflight/betaflight/blob/58e3d2a817044699cbb1c397f2aa4aeda7678fb3/src/main/flight/failsafe.c#L139-L142

@JacopoPan
Member

Thanks for the finding @Honungsburken
For a less-invasive fix, the failsafe should be configurable with the Configurator as well: https://betaflight.com/docs/wiki/configurator/failsafe-tab
If you PR a new version of the configuration replacing /gym-pybullet-drones/gym_pybullet_drones/assets/eeprom.bin, I'd be happy to test and merge

@Honungsburken
Author

I have now made a PR with a fixed eeprom file. But as I’ve stated before, the newest version of Betaflight does not seem to work with the eeprom file. The reason seems to be that the newest version creates a hex file with version 4.6.0, which does not run properly (and cannot be opened with the Configurator).

The older cafe727 commit creates a hex file with Betaflight version 4.5.0 (betaflight_4.5.0_SITL.hex), so I have been using Betaflight commit cafe727. Hence, I have also added a git checkout cafe727 to the clone_bfs script.

For the eeprom file, I couldn’t find a setting that disables failsafe altogether, but after increasing the time-to-failsafe delays to their maximum values using the CLI, the simulation ran without problems:

`
set failsafe_delay = 200
set failsafe_off_delay = 200
set failsafe_throttle_low_delay = 300

save
`
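To put those values in perspective, here is a small sketch converting them to seconds. It assumes all three settings are expressed in steps of 0.1 s, which is not stated in this thread and should be checked against the Betaflight CLI documentation; the names FAILSAFE_SETTINGS and deciseconds_to_seconds are hypothetical.

```python
# Values from the CLI commands above.
FAILSAFE_SETTINGS = {
    'failsafe_delay': 200,
    'failsafe_off_delay': 200,
    'failsafe_throttle_low_delay': 300,
}


def deciseconds_to_seconds(value):
    """Convert a setting given in 0.1 s steps (assumed unit) to seconds."""
    return value / 10.0


delays_s = {name: deciseconds_to_seconds(v) for name, v in FAILSAFE_SETTINGS.items()}
```

Under that assumption the settings delay failsafe rather than disable it, so a simulation that runs longer than these delays could still end up in failsafe.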

I also fixed a small artifact in the beta.py example file, which referenced an old trajectory file instead of the new one.

Hopefully this works for you as well :)

@JacopoPan JacopoPan added bug Something isn't working enhancement New feature or request labels Dec 15, 2024