Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change scanner behavior for known file extensions #534

Closed
igorcafe opened this issue Mar 8, 2024 · 10 comments
Closed

Change scanner behavior for known file extensions #534

igorcafe opened this issue Mar 8, 2024 · 10 comments
Labels
enhancement New feature or request

Comments

@igorcafe
Copy link
Contributor

igorcafe commented Mar 8, 2024

Currently the scanner doesn't use the file extension to match contents against a single .dat, but instead it searches inside all .dat files based on the CRC for most extensions.
For ".cue", ".pbp", ".m3u", the scanner scans based on the exact rom name.

This is what I thought:
image

@kivutar
Copy link
Member

kivutar commented Mar 9, 2024

I like the idea.

But we need to add the arcade roms use case in this diagram first.

Iirc, for the arcade roms we check the crc of the full zip itself instead of iterating over the zip contents.

@igorcafe
Copy link
Contributor Author

Something like this?

image

@kivutar
Copy link
Member

kivutar commented Mar 10, 2024

Way better, thanks.

We also need to care about use cases where the 1:1 mapping between extension and DAT doesn't work:

For example, .bin extension is sometimes used for Genesis games instead of the usual .md. But .bin is also used by a lot of other consoles, for example Atari 5200, Nintendo DS, etc.

.bin is not the only case, there is also .exe that can be used by NXEngine and DOSBox, and maybe a few more extensions like this.

@kivutar kivutar added the enhancement New feature or request label Mar 10, 2024
@igorcafe
Copy link
Contributor Author

igorcafe commented Mar 10, 2024

I think we could map a extension to multiple dats, like:

var datsByExtension = map[string][]string{
  ".gba": {"Nintendo - Game Boy Advance"},
  ".bin": {"Sega - Genesis", "Atari 5200", ...},
}

Otherwise if we don't know all consoles that an extension can be mapped to, we could change the "Ignore" part to match by MD5?
I'm trying to think a better way than matching by only CRC in all dats... We could match by CRC and file size, by I don't think we have the file size currently.

Edit: actually some roms have CRC and size. What about using these two infos when available?

Edit 2: actually all roms that have CRC have size

@igorcafe
Copy link
Contributor Author

For unzipped games from known extensions we can match by file size + CRC and for zipped games (excluding arcade) by uncompressed file size + CRC

@igorcafe
Copy link
Contributor Author

Investigating even further I noticed a bigger problem that is simpler to solve.
If you have any empty file in your roms, it will match any game that has no CRC, since the zero value is 0.
In my case a single empty file in my directory tree was matching 11 thousands games that I don't even have.

By ignoring CRCs = 0 and also matching the file size, I think most of the problems are solved...

@igorcafe
Copy link
Contributor Author

I still wonder if FindByCRC should continue to search after matching a content.
I tested ignoring CRC = 0 and matching the filesize, but there are many duplicate results...

image

@igorcafe
Copy link
Contributor Author

igorcafe commented Mar 12, 2024

So wrapping up, I think the real problems are:

  • .cue, .pbp and .m3u inside zips aren't matched by name
  • Ludo does not search arcade games based on the zip checksum, only of its contents
  • Ludo doesn't ignore zero CRCs (I've already opened a PR for that)

I don't think we need to map rom extension to dat anymore. In my testings with 8K roms I didn't have that problem, and the cases of "collisions" I noticed weren't real collisions, but instead the same title repeated and some games with unknown CRC.

We can keep the algorithm simple by looking inside all dats like we already do, but with some small changes to be like:

  • Check CRC for all files, including zips and unknown extensions
  • For both regular files and files inside zip: if it is .cue, .pbp or m3u, find by name

@igorcafe
Copy link
Contributor Author

Related to #453

@igorcafe
Copy link
Contributor Author

Closing this. I will continue in #453

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants