Linux, multiple GPUs - segfault in OpenGL


Derron(Posted 2016) [#1]
Hi,

I recently got this issue posted on github:
https://github.com/GWRon/TVTower/issues/76


Right on trying to access OpenGLDriver(), it segfaults as soon as the user utilizes his second GPU instead of the built-in one.

Has anyone of you had similar issues - and if so, how did you solve them?


I do not have this kind of configuration here, so I cannot easily debug the issue. Sending the user a dozen debug binaries is only a last resort.


bye
Ron


Brucey(Posted 2016) [#2]
You could try it with an SDL backend ? ...


Derron(Posted 2016) [#3]
Thought about that too ... at least for the NG build.

Just hoped someone had similar trouble and solved it "differently". I want to avoid ending up with a borked "OpenGL implementation" in vanilla Max2D.


bye
Ron


dawlane(Posted 2016) [#4]
The user hasn't given you a lot to go on regarding the hardware, but it sounds like they are using a machine with a hybrid graphics system - not the best supported under Linux.

Make sure that they have the latest nvidia binary driver (nvidia-361) from the repository and the nvidia-prime package installed.

General guide for Trusty, but easily adapted for Xenial.


Derron(Posted 2016) [#5]
I cited you in that issue.

Let's wait and see what the user reports about the SDL binary. Brucey mentioned many moons ago that SDL improves the handling of multiple screens on Linux (dual/triple-screen setups ...) - maybe this includes some OpenGL stuff too (fixed-pipeline stuff - excuse me, I am not familiar with all that).


@ hybrid
I knew the constellation, but the technical term just did not come to mind at that moment.

bye
Ron


dawlane(Posted 2016) [#6]
This guide seems to be a little more up to date, but I would still install the nvidia-prime package, as it's marked as a "recommends" and not as a dependency.

You can switch manually by setting the environment variable DRI_PRIME to 1 before running the program, e.g.:
DRI_PRIME=1 ./executable

For a Steam game you would add DRI_PRIME=1 %command% to the game's launch options in Steam.
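You can also set the variable from inside the program itself - a minimal C sketch (an assumption on my part: Mesa reads DRI_PRIME when the driver is loaded, so it must be set before any GL context exists):

#include <stdlib.h>

int main(int argc, char **argv)
{
    /* Same effect as `DRI_PRIME=1 ./executable`: Mesa picks the
     * offload GPU if the variable is set before the driver loads. */
    setenv("DRI_PRIME", "1", 1);

    /* ... create the window and GL context as usual ... */
    return 0;
}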

Note that nVidia does not support hybrid systems, and the drivers that you get directly from the nVidia download site will not work. Systems with hybrid graphics tend to get custom drivers supplied by the manufacturers.
Your user will have to go searching the internet for solutions when dealing with these systems and Linux. A good place to start would be here.


If the drivers are installed correctly and there is still a problem, then I would suggest building a 32-bit test application as well as a 64-bit test application, to check that the 32-bit nVidia kernel modules are actually working.
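A minimal GLX smoke test along these lines (a sketch; compile once with -m32 and once with -m64, linking -lX11 -lGL) would also print which driver actually answers:

#include <stdio.h>
#include <X11/Xlib.h>
#include <GL/glx.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy) { fprintf(stderr, "cannot open display\n"); return 1; }

    int attribs[] = { GLX_RGBA, GLX_DOUBLEBUFFER, None };
    XVisualInfo *vi = glXChooseVisual(dpy, DefaultScreen(dpy), attribs);
    if (!vi) { fprintf(stderr, "no GLX visual\n"); return 1; }

    /* The window must use the GLX visual's colormap, otherwise
     * glXMakeCurrent can fail with BadMatch. */
    XSetWindowAttributes swa;
    swa.colormap = XCreateColormap(dpy, RootWindow(dpy, vi->screen),
                                   vi->visual, AllocNone);
    Window win = XCreateWindow(dpy, RootWindow(dpy, vi->screen), 0, 0,
                               64, 64, 0, vi->depth, InputOutput,
                               vi->visual, CWColormap, &swa);

    GLXContext ctx = glXCreateContext(dpy, vi, NULL, True);
    if (!ctx) { fprintf(stderr, "no GL context\n"); return 1; }
    glXMakeCurrent(dpy, win, ctx);

    /* On a hybrid system this shows which GPU really renders. */
    printf("GL_VENDOR:   %s\n", (const char *)glGetString(GL_VENDOR));
    printf("GL_RENDERER: %s\n", (const char *)glGetString(GL_RENDERER));

    glXMakeCurrent(dpy, None, NULL);
    glXDestroyContext(dpy, ctx);
    XCloseDisplay(dpy);
    return 0;
}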


Derron(Posted 2016) [#7]
The user stated that both the 32-bit and the 64-bit executable (32-bit with vanilla, 64-bit with NG) failed.

Still waiting for results with the "NG + SDL" build.



PS: updated my citation of you ... hope it helps that user somehow.


bye
Ron


Derron(Posted 2016) [#8]
Ok, user replied (see issue link in the OP).


Seems he ran the 64-bit NG build (using SDL), but it did not restore the original desktop resolution once he exited my app.
Dunno if this is an issue with his setup and SDL ... or if SDL apps need some additional commands run at the end of the app.


bye
Ron


dawlane(Posted 2016) [#9]
This could be down to any number of problems, from the drivers, to the xorg.conf setup (or the auto-detection), to the application.
You shouldn't be relying on the OS to reset the screen resolution. You should be storing the current desktop resolution and restoring it yourself when the application closes.
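For reference, a minimal sketch of that store/restore pattern using the old RandR 1.0 screen-config calls (one possible way - not necessarily what BlitzMax itself does; link with -lX11 -lXrandr):

#include <X11/Xlib.h>
#include <X11/extensions/Xrandr.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy) return 1;
    Window root = RootWindow(dpy, DefaultScreen(dpy));

    /* Remember the desktop mode before touching anything. */
    XRRScreenConfiguration *conf = XRRGetScreenInfo(dpy, root);
    Rotation saved_rotation;
    SizeID saved_size = XRRConfigCurrentConfiguration(conf, &saved_rotation);

    /* ... switch to a fullscreen mode and run the game here ... */

    /* Put the desktop back ourselves instead of hoping the OS does. */
    XRRSetScreenConfig(dpy, conf, root, saved_size, saved_rotation,
                       CurrentTime);
    XRRFreeScreenConfigInfo(conf);
    XCloseDisplay(dpy);
    return 0;
}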

These problems are why the Linux desktop will never be suitable for the masses and will always be considered an operating system for geeks.
These types of issues have been known about for a long time and still have never been fixed, or fixed correctly. Without a complete rethink of how the whole lot is put together, they never will be.


Derron(Posted 2016) [#10]
> You shouldn't be relying on the OS to reset the screen resolution. You should be storing the current desktop resolution and restore it yourself when the application closes.


I will happily accept solutions on how to store/restore such things (especially as setting that via an application is surely something, hmm, "tricky", looking at the multiple DEs available for Linux).
Do not tell me I just have to do something like "EndGraphics()" when exiting the application?!


I just assumed an app creates a graphics context, and when that context is destroyed, the OS / driver handles the rest.



bye
Ron


grable(Posted 2016) [#11]
I just assumed an app creates a graphics context, and when that context is destroyed, the OS / driver handles the rest.

Looking at /brl.mod/glgraphics.mod/glgraphics.linux.c, it's the same situation as on Windows: GL has no API for screen resolutions, so it has to rely on the window system (X in this case).
And it looks like it is supposed to reset to the previous resolution when it's done; why it's not doing that, I dunno.
To make sure, you can try to do what bbGLGraphicsClose does, lines 412-415.
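Assuming that file goes through the old XF86VidMode extension, the restore is roughly this shape (an illustrative sketch, not the literal lines; link with -lX11 -lXxf86vm):

#include <X11/Xlib.h>
#include <X11/extensions/xf86vmode.h>

/* desk_mode would be captured at startup - by convention the first
 * entry returned by XF86VidModeGetAllModeLines() is the current mode. */
static XF86VidModeModeInfo desk_mode;

void restore_desktop_mode(Display *dpy, int screen)
{
    XF86VidModeSwitchToMode(dpy, screen, &desk_mode);
    XF86VidModeSetViewPort(dpy, screen, 0, 0);
    XFlush(dpy);
}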


dawlane(Posted 2016) [#12]
The problem with BlitzMax is that it's using a lot of old, deprecated API libraries. Having to use third-party APIs such as SDL can actually cause more problems, as I myself have discovered. SDL2 has a little problem correctly handling the way my dual monitors are configured when I first initialise SDL.

@Derron: I would suggest wrapping the functionality of the Xrandr library and using that to get/set the display.
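Roughly, such a wrapper would sit on top of calls like these (RandR 1.2+; a sketch without error handling; link with -lX11 -lXrandr):

#include <stdio.h>
#include <X11/Xlib.h>
#include <X11/extensions/Xrandr.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy) return 1;
    Window root = RootWindow(dpy, DefaultScreen(dpy));

    XRRScreenResources *res = XRRGetScreenResources(dpy, root);
    for (int i = 0; i < res->noutput; i++) {
        XRROutputInfo *out = XRRGetOutputInfo(dpy, res, res->outputs[i]);
        if (out->connection == RR_Connected && out->crtc) {
            XRRCrtcInfo *crtc = XRRGetCrtcInfo(dpy, res, out->crtc);
            /* mm_width/mm_height come from the monitor's EDID - this
             * is where bogus values like 0mm can show up. */
            printf("%s: %ux%u at %d,%d - %lumm x %lumm\n",
                   out->name, crtc->width, crtc->height, crtc->x, crtc->y,
                   out->mm_width, out->mm_height);
            XRRFreeCrtcInfo(crtc);
        }
        XRRFreeOutputInfo(out);
    }
    XRRFreeScreenResources(res);
    XCloseDisplay(dpy);
    return 0;
}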


Derron(Posted 2016) [#13]
Hmm, Brucey once wrote to me that SDL solves such issues with dual-monitor setups for him (if I remember correctly).

For you it is doing the opposite?


For me, I once had a game running in the center of my two screens:
(I think it was a Digesteroids compile by Brucey ... maybe with NG).

OK when windowed

.------------------.  .------------------.
|                  |  |                  |
|          .-------|  |-------.          |
|          |       |  |       |          |
|          |    PRO|  |GRAM   |          |
|          |       |  |       |          |
|          '-------|  |-------'          |
'------------------'  '------------------'
       |    |                |    |
      ========              ========    

FAILED with fullscreen
.------------------.  .------------------.
|                  |  |                  |
|                  |  |                  |
|                  |  |                  |
|     P   R   O   G|  |  R   A   M       |
|                  |  |                  |
|                  |  |                  |
'------------------'  '------------------'
       |    |                |    |
      ========              ========


Correct would be

.------------------.  .------------------.
|                  |  |                  |
|                  |  |                  |
|                  |  |                  |
|  P R O G R A M   |  |  P R O G R A M   |
|                  |  |                  |
|                  |  |                  |
'------------------'  '------------------'
       |    |                |    |
      ========              ========


BEST would be
(when doing a "windowed fullscreen")
.------------------.  .------------------.
|                  |  |                  |
|                  |  |                  |
|                  |  |                  |
|  P R O G R A M   |  |  D E S K T O P   |
|                  |  |                  |
|                  |  |                  |
'------------------'  '------------------'
       |    |                |    |
      ========              ========


(Yes, I am using old 19" Screens with 5:4 ratio)

@ Wrapping Xrandr
Ohhh, I am glad to be "the pro" when it comes to such tasks ;-).
You know that I am using BlitzMax because I (at that time) hoped the tool would do that kind of work for me.

If you have the time and are in the mood to tinker with Xrandr, feel free to help "all" by pushing your enhancements to github.com/maxmods. I am sure that I won't be able to do that without headache, trouble and tons of bugs.


Edit: Make sure to use the latest brl-thingies from maxmods - I fixed an issue with the Linux variants not reacting to focus-out events from the system. With the fix you can react to appsuspend, which you otherwise would not be able to do.
(Dunno if this is needed somehow for "windowed fullscreen" - as you are then allowed to do things on your other screens simultaneously.)


dawlane(Posted 2016) [#14]
Derron, you could just cheat and write a little bash script around the xrandr application: store the resolution, call your program, and then use xrandr to put it all back ;-).
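The same cheat works as a tiny C launcher, too (an untested sketch; it assumes xrandr is on the PATH and that size index 0 is the normal desktop mode):

#include <stdlib.h>

int main(void)
{
    /* Run the game; the path here is just an example. */
    int rc = system("./executable");

    /* `xrandr -s 0` switches back to size index 0 - normally the
     * desktop's default mode - even if the game crashed without
     * cleaning up after itself. */
    system("xrandr -s 0");
    return rc;
}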

As for dual monitors, you have to consider a number of things:
1) What type of graphics card port each monitor is connected to. Mine is a bit weird, as the one card I have installed has a DVI port (connected to a HP w2448hc) and an HDMI port (connected to a Belinea 19"). The latter seems to cause a few problems with SDL2 and resolution setup.
2) The number of available GPUs or graphics cards. Having a GPU for each monitor makes life a little easier when constructing an xorg.conf.
3) The favoured output port the graphics card uses for the primary display when the system is first turned on.
4) The xorg.conf server layout and screen sections. Which options you put in those sections can make the difference.
5) Conflicting APIs, e.g. Xinerama and TwinView.
6) Issues with tools to set resolutions, etc.

I haven't used SDL with BlitzMax. I was using FreePascal (I will have to knock up a C/C++ test later), but the problem I get has to do with the value of display_mm_width evaluating to zero for the second monitor, and the mode width then being divided by it for the hdpi data member of SDL_DisplayData (divide-by-zero exception). This could be down to how the shared SDL2 was originally built, or I've come across a bug in the SDL code, or there is a problem with the desktop manager. The strange thing is that there are no problems after I disable the second monitor, build and run; when I then enable the second monitor, it works as normal.
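For the curious, the shape of that computation (a paraphrase, not the actual SDL source) - and a possible reason why the Pascal build traps where a C build would not:

#include <stdio.h>

int main(void)
{
    int   mode_w           = 1920; /* example mode width in pixels */
    float display_mm_width = 0.0f; /* what the broken monitor reports */

    /* dpi = pixels / (mm / 25.4). With 0mm this yields inf under
     * IEEE maths in C, but Free Pascal unmasks FPU exceptions by
     * default, so the same division raises a runtime error there. */
    float hdpi = ((float)mode_w) * 25.4f / display_mm_width;
    printf("hdpi = %f\n", hdpi);
    return 0;
}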


Derron(Posted 2016) [#15]
@ 1) - 6)

Of course you can "configure your system to death", but it should work for the "average Joe Linux distro user" (plugging in the second screen, opening up their driver's config tool and clicking "extended / cloned / ...").

So the best thing would be to just use the most basic approach (whatever that is) and, if that fails, allow the app user to override the behaviour somehow (e.g. customizing the params given to a tool like xrandr).


Also: as said, the "easiest" way is "windowed fullscreen", as you do not switch to a fullscreen mode (and e.g. activate "cloning" for such cases) but behave like a maximized window.



@ SDL Bugs
uhhmm ya...
div/0 should not happen - except if they could guarantee that "0" never ends up in SDL_DisplayData. So this sounds as if SDL contains a bug (not handling corrupt/broken data).

The name "display_mm_width" sounds as if they want to derive some kind of "DPI" from it ... and the HDMI one does not report its measurements (mm width * mm height) to the system.


Maybe "SDL_GetDisplayDPI" returns something useful instead?
https://wiki.libsdl.org/SDL_GetDisplayDPI
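A quick sketch of that query (needs SDL 2.0.4+; note the video subsystem must be up before asking - compile with `sdl2-config --cflags --libs`):

#include <stdio.h>
#include <SDL2/SDL.h>

int main(void)
{
    /* SDL_GetDisplayDPI only works once the video subsystem is up. */
    if (SDL_Init(SDL_INIT_VIDEO) != 0) {
        fprintf(stderr, "SDL_Init: %s\n", SDL_GetError());
        return 1;
    }
    int n = SDL_GetNumVideoDisplays();
    for (int i = 0; i < n; i++) {
        float ddpi, hdpi, vdpi;
        if (SDL_GetDisplayDPI(i, &ddpi, &hdpi, &vdpi) == 0)
            printf("display %d: %.1f dpi (h %.1f, v %.1f)\n",
                   i, ddpi, hdpi, vdpi);
        else
            printf("display %d: no DPI info (%s)\n", i, SDL_GetError());
    }
    SDL_Quit();
    return 0;
}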


bye
Ron


dawlane(Posted 2016) [#16]
Well, the SDL2 problem with the Pascal headers has got me stumped.
The current version that comes with Kubuntu 16.04 is SDL 2.0.4. I've tested the Pascal version on Kubuntu 14.04 and there are no problems with it.
I will have to install a few other distributions to test.

This is one of the reasons why Linux distributions tend to get on my nerves. Too many distributions with too many API systems built on top of each other. It makes trying to solve such problems impossible.

@SDL_GetDisplayDPI: I will have to write a bit of test code in C/C++ and Pascal and compare the output.


Derron(Posted 2016) [#17]
Systems should work regardless of their "DE".

So "*ubuntu" should work all the same ... in theory ;-)

They also throw away so much manpower with those dozens of distributions. On the other hand, this increases diversity, options and "freedom of choice".

If they had some common guidelines, like for "APIs", distributions would be exchangeable without much fuss (only things like "additions/extensions" could create trouble then ... when some functionality "crosses" other addons ... and hmm, that again makes the "API"-like approach useless).



But this leads to off topic.


@ SDL
Reply here if you get results.


bye
Ron


dawlane(Posted 2016) [#18]
Systems should work regardless of their "DE".

So "*ubuntu" should work all the same ... in theory ;-)

They also throw away so much manpower with those dozens of distributions. On the other hand, this increases diversity, options and "freedom of choice".
Not quite; there is always some discrepancy between releases. It's just a question of what's changed and what it breaks. And with increased diversity, etc. come increased problems for the developer and hassle for the user when things go wrong.

Take for instance Kubuntu 16.04 LTS with the Plasma 5 desktop. You wouldn't believe the amount of bugs I've come across. The best one: if you change the theme, kwin has a habit of removing the window decoration at random. It took me a few hours and a fair bit of searching the bug reports to find out that a setting in the resource file was loading the wrong K-library module. As a result of the bugs I've come across, this release should never have been given LTS status.

As for the SDL2+Pascal problem: executing the command-line xrandr shows a physical size of 0mm x 0mm for the second monitor. So it looks like I will have to do a bit of digging in the X11 settings. If that fails, then I will have to have a look at the X language bindings in the Free Pascal units.

One problem with getting information from SDL_GetDisplayDPI is that it requires the SDL video subsystem to be initialised, which is the problem to start with.

In my search for this I've come across something on the ArchLinux forums that may be of some help in your quest.


Derron(Posted 2016) [#19]
dawlane:
One problem with getting information from SDL_GetDisplayDPI is that it requires the SDL video subsystem to be initialised, which is the problem to start with.


Ahh, sorry, I did not check that before. So it surely relies on other functions (returning 0) to calculate its result.

@x11
So if that external command is already failing / returning borked data - why should the language binding be the erroneous part of the chain?


I just checked that command (I did not use it before, as all my trouble with xorg.conf was on my HTPC, getting rid of overscan on the old Sony TV).
Mine does what it needs to:
DFP1 connected 1280x1024+0+0 (normal left inverted right x axis y axis) 376mm x 301mm
CRT1 connected 1280x1024+1280+0 (normal left inverted right x axis y axis) 376mm
(yes ... no FHD ... that one is placed 2m to the left of my working place ;-) ... two smaller screens save on energy consumption)


bye
Ron


dawlane(Posted 2016) [#20]
So it surely relies on other functions (returning 0) to calculate its result.
You would only get such a result if the function could handle the CPU exception and fail gracefully. SDL2 needs a subsystem initialised before you can make any queries.

@x11
So if that external command is already failing / returning borked data - why should the language binding be the erroneous part of the chain?
Well, something must be wrong with the Free Pascal bindings or the SDL2 bindings, as they use the X and Xlib Pascal units.
The C/C++ version handled the odd way that I had the monitors set up correctly, but I suspect that some blame has to be placed on the SDL2 library as well, for not checking invalid data prior to use.

I've got round the problem by doing a little rearrangement of how I had the monitors connected.

Previously connected
Hewlett Packard HP w2448hc, connected via VGA cable using VGA-to-DVI adaptors at both the input and the output port.
Belinea 10 19 27, connected via HDMI cable using an HDMI-to-DVI adaptor at the input port on the monitor, it being DVI.

Doing the connections that way made the HP monitor primary during system boot. But the Belinea was reporting display dimensions of 0mm x 0mm via xrandr.

Now connected
Hewlett Packard HP w2448hc, connected via HDMI cable; no need for adaptors, as the monitor is HDMI-ready.
Belinea 10 19 27, connected via VGA cable using VGA-to-DVI adaptors at both the input and the output port.

The downside is that the boot now goes to the second monitor, and I may have a few BlitzMax fullscreen issues forcing me to mess around with the xorg.conf file. The BlitzMax code, as stated before, is well out of date for connecting to the X server: XFree86 has been deprecated for quite a few years now.

Which brings you back to:
1) What type of graphics card port each monitor is connected to, and how.


Derron(Posted 2016) [#21]
"I've got round the problem"

Hmm, it would be better to have something fixed "for all".


At least _I_ would prefer to have things work how _I_ want, and not how the system "allows". Of course that is not possible for everything.



@ OP
The user has not replied up to now ... so I still do not know if your web link helped.


bye
Ron


dawlane(Posted 2016) [#22]
Hmm, it would be better to have something fixed "for all".
This is the problem: a fix for one system isn't guaranteed to work elsewhere. Connecting monitors to graphics cards tends to be a bit of a hit-and-miss affair, depending on how they are physically connected and on the configuration file.

So the problem is part hardware and part software. You should read the comments in src/video/x11/SDL_x11modes.c about what the programmer thinks of XRandR; I would agree with him.

As the older Belinea monitor's DVI input is DVI-D and not DVI-I, I would guess that connecting it via HDMI is the issue causing the incorrect physical screen dimensions. I could get round this problem with an active display adaptor, but for what these cost, it works out cheaper to buy a new HDMI/DisplayPort monitor.

Oh, and one more thing that would cause a display not to return to its original state: a segfault having occurred. If it's an SDL/BlitzMax issue, then the ball would be in Brucey's court.


dawlane(Posted 2016) [#23]
Here's a little script that gets some details of the connected monitors. With a few modifications it can be used to save and then restore the monitor displays.


Derron(Posted 2016) [#24]
@ Brucey's court

The user did not say whether it happened with 32-bit (vanilla) or 64-bit (NG).


@ DVI-D
The culprit might be the CABLE ... as there are variants with 17+1, 18+1, 24+1 or 12+1 pins ...
If you got such a 12+1 cable, then no DDC information gets transported to the computer.

If you got 12+5 pins, then it is a DVI-A cable, usable for DVI-D too (but then also without DDC).


Edit:
@ script
Thanks, but I will wait for the user's reply first ... as otherwise everything is like groping in the dark.


bye
Ron


dawlane(Posted 2016) [#25]
@ DVI-D:
I suspect the real culprit is dodgy coding in xrandr, the Pascal language bindings and SDL2 2.0.4, along with the use of an HDMI-to-DVI adaptor and the fact that the monitor in question would be considered old tech.

Using the --listactivemonitors option of xrandr shows that the display size in millimetres is being reported correctly, but querying the xrandr modes shows that something is broken, with it returning "HDMI-0 connected 1920x1080+1920+0 (normal left inverted right x axis y axis) 0mm x 0mm".

It's a question of where the bug is and who needs the bug information to fix it.


Derron(Posted 2016) [#26]
Ah OK, so if the system recognizes it correctly, then I am pretty sure it isn't the cable.
(The exception would be some kind of "default dimension" for given resolutions ... which I really doubt.)


Building xrandr on your own and adding debug messages to find out where it fails?

Maybe just create an issue for the "xrandr" developers and let them figure out how to tackle it?



bye
Ron


Derron(Posted 2016) [#27]
Hmm, the user responded ... without giving further clues.


bye
Ron