Directory Scan validates bin/cue compressed in 7zip, but rejects them after extraction #1513

Open
shchukin opened this issue Jan 17, 2025 · 11 comments

Comments

@shchukin

I have a working Road Rash ROM image, and the MD5 of both tracks matches the Redump database.

Image

But using Import Content -> Scan Directory I can't add this rom to the playlist.

All of the other CDs from the same source can be added normally. I also tried redownloading Road Rash from other sources and hit the same problem.

The strange thing I noticed is that it can be added normally while it is still inside the 7z archive, but not once it has been extracted.

Using RetroArch 1.19.1 on an M1 Mac.
The database is updated via Main Menu -> Online Updater -> Update Databases

Log:

[INFO] [Scanner]: Scanning content: "/Users/tony/Roms/new sega cd/"..
[INFO] [Scanner]: 1/3: Road Rash (USA).cue..
[INFO] [Scanner]: Scanning of directory finished.

Contents of Road Rash (USA).cue:

FILE "Road Rash (USA) (Track 1).bin" BINARY
  TRACK 01 MODE1/2352
    INDEX 01 00:00:00
FILE "Road Rash (USA) (Track 2).bin" BINARY
  TRACK 02 AUDIO
    INDEX 00 00:00:00
    INDEX 01 00:02:00

@OctopusButtons
Contributor

OctopusButtons commented Jan 20, 2025

Note the expected relevant redump data is here.

In some cases Redump data can be superseded/overruled by something earlier in the build (higher in the repository) when the final database is built, though I don't see any other Mega CD .dat that could conflict. (If you're searching the repository for a possibly conflicting .dat and aren't sure whether you've missed one, you can verify the serial in the .rdb with a hex editor.)

RetroArch should be matching Mega CD and other disc-based games against the databases by serial, which it scans from inside the game's binary (NOT metadata). Could loading from 7zip change how the file is identified? For example, is it using the checksum if 7zip has one pre-computed and accessible, when RA normally wouldn't use a checksum (see above) when scanning a disc? I searched PRs for info about 7z scanning behavior but didn't see anything relevant.

@shchukin
Author

shchukin commented Jan 20, 2025

@OctopusButtons thanks for the response. I am trying to understand how it works. First I checked the serial number inside the bin file using a hex editor:
Image
and it matches the expected relevant Redump data and the serial in the .rdb you mentioned above. I also checked the ROM size in bytes, MD5, CRC and SHA1, and they match as well. This confirms that this Road Rash image should be recognized by the scan, right?
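
For anyone who wants to repeat the size/CRC/MD5/SHA1 comparison without separate hashing tools, here is a minimal Python sketch. It is not part of RetroArch, and the function names `file_fingerprint` and `matches_dat` are made up for illustration. It only reproduces the manual check described above and says nothing about how the scanner itself matches files.

```python
import hashlib
import zlib


def file_fingerprint(path, chunk_size=1 << 20):
    """Return size, CRC32, MD5 and SHA-1 of a file, read in chunks."""
    crc = 0
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    size = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            size += len(chunk)
            crc = zlib.crc32(chunk, crc)  # running CRC over all chunks
            md5.update(chunk)
            sha1.update(chunk)
    return {
        "size": size,
        "crc": f"{crc & 0xFFFFFFFF:08X}",  # upper-case hex, as in the DAT
        "md5": md5.hexdigest().upper(),
        "sha1": sha1.hexdigest().upper(),
    }


def matches_dat(path, expected):
    """Compare a file's fingerprint with values taken from a DAT rom entry.

    `expected` is a dict with any of the keys size/crc/md5/sha1.
    Returns a dict of key -> True/False.
    """
    actual = file_fingerprint(path)
    return {key: actual[key] == expected[key] for key in expected}
```

Here `expected` would hold the values from the DAT entry for the track, with `size` as an integer and the checksums as upper-case hex strings. A result where every value is `True` only confirms that the file matches Redump; it does not by itself explain the scanner's behavior.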

So something else went wrong, like the following, right?
In some cases Redump data can be superseded/overruled by something earlier in the build (higher in the repository) when the final database is built

P.S. I thought the problem was on my end, so I also tried scanning this copy of Road Rash on another computer with a freshly installed RetroArch, and the problem persists.

P.P.S. If it is worth checking and would be helpful, I can make a screencast of what I am doing to add this game to the playlist.

@OctopusButtons
Copy link
Contributor

OctopusButtons commented Jan 20, 2025

Not a screencast, but do logs show the same thing happening when you scan 7zip versus unzipped?

The fallback is Manual Scan, which will get the game into the playlist, though the intriguing mystery remains.

The .rdb file is the final data actually used by RetroArch, so if your check showed the correct serial in your local .rdb, there should be no precedence/overrule issue. That would only apply at the level of multiple conflicting .dat files in the repository, which get amalgamated and compiled into the final .rdb, and I see only one Mega CD .dat in the entire repository. I linked to the repository .rdb, but verify your local one too, just to rule out issues with downloading/serving. The Sources doc says that No Intro and TOSEC are part of the Sega CD data (with No Intro having precedence), but I don't see any TOSEC data for Sega CD in the repository, so it's moot. In other cases you would search all present .dat sources to find the wrong/conflicting data (though earlier/higher in the repository has precedence in the final .rdb build).

It is interesting that the 7zip scan succeeds while the uncompressed scan fails. I searched but didn't find any info about 7zip scans matching differently from normal (see below). I suppose it's worth trying the new RA version 1.20, unless your second computer test was already on 1.20.

About database matching and file ID: note that while the SHA1/MD5 etc. convince us that your file matches Redump's, RetroArch database matching only uses the serial number in a disc game's binary data (and uses the CRC checksum for database matching of smaller, non-disc-based games). I can't imagine how the serial number would differ if the MD5 is the same, which points back to RA's serial-scanning behavior going wrong, so I'm stumped.
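
To make "the serial is inside the binary data" concrete, here is a rough Python sketch of the hex-editor check: it looks for a known serial string near the start of a .bin. This is an illustration only, not RetroArch's detection code. The helper name `find_serial` and the 64 KiB search window are assumptions picked for the example.

```python
def find_serial(bin_path, serial, window=64 * 1024):
    """Return the byte offset of `serial` within the first `window` bytes.

    Returns -1 if the serial does not appear in that window.
    The window size is an arbitrary assumption for this sketch.
    """
    with open(bin_path, "rb") as fh:
        head = fh.read(window)
    return head.find(serial.encode("ascii"))
```

Calling `find_serial("Road Rash (USA) (Track 1).bin", "T-50085")` and getting a non-negative offset would confirm what the hex editor showed. It would not show whether the scanner looks at the same offset or expects the same formatting.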

Scan settings. I know very little about the ins/outs of scanning (I usually do Manual Scan myself), especially with cue/bin.

I'm interested in this case because it will help my research for a documentation update I'm working on.

@RobLoach
Member

I've found that both running and scanning in 7z archives can be hit or miss. While your mileage may vary, I'd recommend trying without using 7z.

For reference, here's the entry in the dat:

game (
	name "Road Rash (USA)"
	description "Road Rash (USA)"
	region "USA"
	serial "T-50085"
	rom ( name "Road Rash (USA) (Track 1).bin" size 624799392 crc 2B9A3535 md5 2DED7861E91654E30C23C616E647B6C3 sha1 84E80E20EBF9AFC7C381E341FAEFD3B4F5595089 serial "T-50085" )
)
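
If it helps to compare such entries against local files programmatically, the `rom ( ... )` line can be split into fields with a short Python sketch. This assumes the clrmamepro-style text layout shown above; the helper name `parse_rom_line` is made up for illustration and is not taken from the libretro tooling.

```python
import re

# A key followed by either a quoted string or a bare token.
ROM_FIELDS = re.compile(r'(\w+)\s+("([^"]*)"|\S+)')


def parse_rom_line(line):
    """Extract key/value pairs from a `rom ( ... )` line of a DAT entry."""
    match = re.search(r"rom\s*\((.*)\)", line)
    if match is None:
        return {}
    fields = {}
    for key, raw, quoted in ROM_FIELDS.findall(match.group(1)):
        # Quoted values keep their inner text; bare tokens are used as-is.
        fields[key] = quoted if raw.startswith('"') else raw
    return fields
```

All values come back as strings, so `size` needs an `int(...)` before comparing it with a file size.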

@OctopusButtons
Contributor

OctopusButtons commented Jan 20, 2025

I've found that both running and scanning in 7z archives can be hit or miss

The telling thing in this case is that it's scanning/validating correctly when in the 7z, but failing when outside/uncompressed. I thought RA could be doing non-normal behavior because of the 7z, like somehow grabbing and matching checksum that was stored in the 7zip metadata, somehow not doing the expected serial scan. But I don't see any documentation of anything like that, and if that feature exists I still don't get the anomaly of the failed uncompressed scan.

@shchukin
Author

Yes, this is happening after extraction. Just to make sure, here is a quick screencast of what is going on (70 MB MKV file on Dropbox):
https://www.dropbox.com/scl/fi/1c0m0w1m1xjyql1c16zwu/2025-01-21_00-37-10.mkv?rlkey=05ilahmn60wb56ipqwop9pjoc&dl=0

@shchukin
Author

shchukin commented Jan 20, 2025

I may have a lead, actually. I forgot to check the log file when I was making the video, so I redid the same test, and this time I noticed that when it scans the 7z file, it starts with the BIN file, checks it, and adds the game to the playlist right after that:

Image

But when scanning the extracted version it only checks the CUE file:

Image

So this may be the difference between the 7z'ed and extracted files.
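
To compare the two runs as text rather than screenshots, the file names the scanner reports can be pulled out of a saved log with a small Python sketch. It assumes log lines of the form `[INFO] [Scanner]: 1/3: <name>..` as in the log at the top of this issue; the helper name `scanned_files` is hypothetical.

```python
import re

# Matches entries such as "[INFO] [Scanner]: 1/3: Road Rash (USA).cue.."
SCAN_ENTRY = re.compile(r"\[Scanner\]: (\d+)/(\d+): (.+?)\.\.$")


def scanned_files(log_text):
    """Return the file names the scanner reported visiting, in order."""
    names = []
    for line in log_text.splitlines():
        match = SCAN_ENTRY.search(line.strip())
        if match:
            names.append(match.group(3))
    return names
```

Running this on the log of the archive scan and on the log of the extracted scan should show directly whether one run visits the .bin while the other visits only the .cue.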


I thought it would be a good idea to check Road Rash's CUE file. Maybe there are wrong paths? But no, everything seems fine:

FILE "Road Rash (USA) (Track 1).bin" BINARY
  TRACK 01 MODE1/2352
    INDEX 01 00:00:00
FILE "Road Rash (USA) (Track 2).bin" BINARY
  TRACK 02 AUDIO
    INDEX 00 00:00:00
    INDEX 01 00:02:00

The cue files from the other games differ only by the first line:

CATALOG 0000000000000
FILE "Earnest Evans (Japan) (Track 01).bin" BINARY
  TRACK 01 MODE1/2352
    INDEX 01 00:00:00
FILE "Earnest Evans (Japan) (Track 02).bin" BINARY
  TRACK 02 AUDIO
    INDEX 00 00:00:00
    INDEX 01 00:02:00

But adding this line to Road Rash's cue doesn't help.
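
The path check can also be done mechanically. Below is a minimal Python sketch that lists every `FILE` entry in a cue sheet and reports whether the referenced file exists next to the cue. It is only a sanity check of the cue contents, not a model of how RetroArch reads cue sheets, and the helper name `referenced_files` is made up for illustration.

```python
import re
from pathlib import Path

# Matches lines such as: FILE "Road Rash (USA) (Track 1).bin" BINARY
FILE_LINE = re.compile(r'^\s*FILE\s+"([^"]+)"\s+(\S+)', re.IGNORECASE)


def referenced_files(cue_path):
    """Yield (name, exists) for every FILE entry in a cue sheet."""
    cue_path = Path(cue_path)
    # utf-8-sig drops a byte-order mark if the cue happens to start with one
    text = cue_path.read_text(encoding="utf-8-sig", errors="replace")
    for line in text.splitlines():
        match = FILE_LINE.match(line)
        if match:
            name = match.group(1)
            # Paths are resolved relative to the cue's own directory.
            yield name, (cue_path.parent / name).is_file()
```

If every entry comes back with `True`, the names in the cue match the files on disk exactly, including case and spacing.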

@RobLoach
Member

Looked a bit further. Sega CD database indexes the serial: https://github.com/libretro/libretro-super/blob/9f8ebb51b21c81802050da9670771c271ebf5b2a/libretro-build-database.sh#L334C2-L334C65

build_libretro_database "Sega - Mega-CD - Sega CD" "rom.serial"

In task_database.c, it does attempt to find the serial:
https://github.com/libretro/RetroArch/blob/391ba55b810b21ca427573982ad246584fc98074/tasks/task_database.c#L248-L253

This calls detect_scd_game().

@OctopusButtons
Contributor

OctopusButtons commented Jan 20, 2025

I recommend now changing the issue title to something like: "Directory Scan successfully validates bin/cue compressed in 7zip, but ignores the bins and scans/rejects the cue if files are extracted", or otherwise specifying that it validates and scans fine in 7z but not outside. That might help get more attention. (I know that's similar in essence to your current title though.) It may be correct to move the issue to RetroArch issues rather than database. And given that the 7zip scan validates fine (I think?), it's not really a Road Rash issue now.

I'm really interested now because bin/cue scanning is a mystery to me and I don't see documentation. The .cue is normally used as the operative game file (for example for loading and playlisting) and it refers to the bin paths, but the .bin is the item in database and with the serial (key field for matching) inside it. Mysterious.

To maybe learn more:

  • Remove Road Rash from playlist, then try several Scan File, not Scan Directory:
    • For first Scan File, target only the cue file (everything uncompressed, not in 7z) See if it adds it.
    • If it added it, then the issue is specific to directory scanner (possibly something with file sorting/order, or something deeper with the code).
    • Remove it from playlist again, and now Scan File on the .bin. See if it adds it. (However I honestly don't understand how cue/bin scanning is supposed to work, I only know that I only target cue when doing anything in RA, though I also know the database/matching is dependent seemingly on bin not cue.)
    • This might tell us if there's a bug/discrepancy between Directory and File Scan, or if both are doing the wrong behavior.
  • Zip the cue and bin into a .zip instead of .7z and see if Scanning is equally successful as the 7z or not.
  • Verify that your successful scan games have expected data in the .dat files. It might expose something obvious we missed. (E.g. I'm not even sure whether the .dat should be only Bin 1?)
  • A big question is whether the issue is unique to Road Rash, or if the same problem will apply to games that share a particular property.

My tips are like blackbox-testing, because I have no capability to read the scanner code on github to understand directly what it's doing. The answers might shed some light.

Could even possibly be some OS-related quirk happening with RA / 7z. Can you make an edit to specify which OS's the 2x test computers were?

If trying to figure out how databases work in order to go deeper on troubleshooting, I have pending/draft additional clarifying research here. Though this issue is looking like RetroArch rather than databases now.

Also this issue seems similar: their scan/validation worked with the zip, but not with extracted files. (Note that when you see reference to hash/checksum match or check, in reality now disc-based games use a serial NOT a crc/hash for matching, and RA gets the serial by scanning game's binary data.)

@shchukin changed the title from "Sega CD: Road Rash can't be recognized after extraction from 7z" to "Directory Scan validates bin/cue compressed in 7zip, but rejects them after extraction" on Jan 22, 2025
@shchukin
Author

>>It may be correct to move the issue to RetroArch issues rather than database.
I am afraid I don't have enough permissions. There should be an option in the sidebar, but I can't see it for this repo:

Image

@RobLoach can you kindly take a look? I see you are being added to the organization.

>>For first Scan File, target only the cue file (everything uncompressed, not in 7z) See if it adds it.
No, same issue:

[INFO] [Scanner]: Scanning content: "/Users/tony/Roms/Sega - Mega CD - Sega CD/Road Rash (USA)/Road Rash (USA).cue"..
[INFO] [Scanner]: 1/1: Road Rash (USA).cue..
[INFO] [Scanner]: Scanning of file finished.

Both scanners seem to work the same way.

>>Remove it from playlist again, and now Scan File on the .bin.
This actually worked. Scanning the bin file directly adds the game to the default system playlist. But the game is unplayable: there is just a black screen when I try to run it from the playlist, and not even the BIOS loads. It is the same if I use Main Menu -> Load Content: the cue file runs the game, while the bin file gives me the same black screen.

It makes me think there is something wrong with the cue file, like a wrong path or a stray character in the file. But I keep comparing it character by character with the cue of another working game, and it seems perfectly fine:

Image

>>Zip the cue and bin into a .zip instead of .7z and see if Scanning is equally successful as the 7z or not.
Everything seems the same. Whether it is 7zip or zip, the game is added by reading the bin file:

Image

Image

This may actually be the reason why Sega CD games can't be run from 7z / zip archives: archived games are added via their bin files, and a bin file alone can't run the game:

Image

May I ask if you guys have any thoughts on that? Has this always been an issue?

>>Verify that your successful scan games have expected data in the .dat files. It might expose something obvious we missed. (E.g. I'm not even sure whether the .dat should be only Bin 1?
A big question is whether the issue is unique to Road Rash, or if the same problem will apply to games that share a particular property.

If we are talking about this DAT file, then each of the 16 games I have adds to the playlist just fine. I checked the first bin track for a couple of them and it matches the DAT perfectly: serial, file size, MD5, CRC, SHA1. For some reason the scanner struggles with Road Rash only. I will test more games later.

>>Could even possibly be some OS-related quirk happening with RA / 7z. Can you make an edit to specify which OS's the 2x test computers were?
The second one was Windows 11. I will do more tests there as well.

@OctopusButtons
Contributor

OctopusButtons commented Jan 23, 2025

Sorry, I should have said open on RetroArch issues separately, rather than transfer per se. Because it looks like a scanner/import program issue, rather than database. (But leave the database issue open until final word.)

This actually worked. Reading bin file directly adds the game to the default system playlist. But the game is unplayable.

In my testing of my games, the Genesis Plus GX core runs fine if a .bin is selected. So the failure you mentioned may be informative (unless it's a PicoDrive problem). Still, the issue seems bigger than Road Rash, because the game validated correctly for you and the database works perfectly fine when it's compressed. I'm not able to read (or even find) the Scanner/Importer code on GitHub, which would probably give hints as to why it scans when compressed but not when extracted. But I agree the Road Rash .cue must be a problem, because the scanner (when extracted) doesn't proceed to the bins; it only checks the cue and rejects it.

I found and read this pull request from years ago. And from your info, is the 7zip/zip scanner more "diligent", checking every file, while it ignores files in folders when extracted? As you saw in your logs, the scan of the extracted files (apparently) works based on file extension (unlike zips): it ignored the bins and only scanned the cue. But frankly I don't understand that, since if it's scanning the cue it should be looking at the bins referenced by the cue file, because the bin is what's defined in the database.


3 participants