Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading entire plugins #14

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

benjaminwinger
Copy link

I know that you say in the README that esplugin focuses on providing an API to libloadorder and LOOT, rather than being a general-purpose plugin parser, but would you be interested in contributions that would make it more general purpose?

One thing worth noting is that I'm doing this from the perspective of Morrowind/OpenMW, and while it would be great to implement general purpose reading and writing of plugins for all formats supported by esplugin, I also quickly realized that writing in support for the other formats would require significantly more work, and I'm not sure that I'm the best person to be writing the code for the other formats, as I likely won't be actually using it.

It's also worth noting that I plan to eventually add support for modifying and writing plugins.

Basically there were two options I was considering:

  1. Implement and merge general purpose code for Morrowind/OpenMW plugins only, with the possibility of extending it to the other formats in future.
  2. Create a derivative library that focuses on being a general purpose Morrowind/OpenMW plugin library.

I prefer the first option, as keeping code together would put more eyes on it, but if you prefer I can move this work to a derivative project.

Note that the code included in the merge request was fairly hastily thrown together and will need some changes before it can be merged (e.g. read_plugin is currently broken for games other than Morrowind, and I haven't written any tests for this). This pull request is mostly just to get your input, but I thought I should include the changes I've made so far.

@Ortham
Copy link
Owner

Ortham commented Aug 25, 2019

Have you considered using xEdit, e.g. via Mator's xedit-lib? I think its TES3 support lags behind the other games, but it's the elephant in the room when it comes to parsing plugins, and it may be less effort to patch up the TES3 support in that than build out everything needed for supporting full reading and writing in esplugin.

If you do want to extend esplugin, I'm not going to say no to the idea. If we do end up disagreeing on some aspect of implementation (e.g. I think what you change makes things too much harder to maintain), then you can fall back to having a fork that's specific to your needs.

If you don't want to get into supporting all games in new functionality, you could:

  • Implement the functionality in a separate module (e.g. esplugin::morrowind) so it's clear that the code isn't generic across games.
  • Implement the functionality as generic functions, but handle non-Morrowind plugins gracefully (i.e. not by panicking). E.g. Plugin::is_valid_as_light_master() always returns false for all games apart from Fallout 4 and Skyrim SE. Another option would be to add an Unsupported error variant to use with functions that return Result.

I'd probably go with putting things in a separate module to start with - I'm not very happy with how I implemented extending record ID support for Morrowind, it's made the code a lot messier than I think it was worth. I'd like to go back and experiment with changing it at some point.

@benjaminwinger
Copy link
Author

I hadn’t actually looked at xedit-lib previously, but I’m not sure that it’s really what I’m looking for. For one thing I’m using this in a command line utility written in rust (omwcmd), so it integrates well with esplugin, but somewhat less so with xedit-lib as I’d also have to write a wrapper and deal with the build system somehow (given that xedit-lib only includes Windows binaries and build files). Elephant is also a quite apt term for xedit-lib, as the code base is massive, and I think I’d prefer something a little more lightweight, as unless I’m missing something big (I’m basically going off this for my knowledge of the non-morrowind formats, as I gather they haven’t changed much since oblivion), even full read and write support for all platforms shouldn’t be an excessive amount of work, while diving into xedit-lib might be.

One alternative that I had considered was creating a library from the OpenMW plugin code, given that OpenMW contains complete reader and writer code. One huge benefit over something like xedit-lib, or even esplugin, is that this would make it easy to keep feature parity with OpenMW once they start extending the plugin format post 1.0. The wrapper is still an issue, but revisiting this I eventually managed to get rust-bindgen to generate rust bindings for the openmw plugin code. Basically it would work, and OpenMW even builds (but does not distribute) a components library that contains, among other things, their plugin code, but the unfortunate reality is that it would significantly complicate the build process to use this rather than something like esplugin, particularly for cross-compiling. This would mean that the distribution of omwcmd, and thus Portmod, the system that depends on it, would be a lot more difficult than it currently is, not to mention that this sort of C++ interface is riddled with unsafe code out of necessity.

I also recognise that I’m glossing over the importance of ensuring that records are properly structured, as a large part of xedit-lib, in addition to a large part of OpenMW’s plugin code, is dedicated to handling the structure of the various record definitions. I’d be fine with a more lightweight tool that puts this responsability on the developer using the library and just uses a generic record structure like esplugin already does. That being said, it would also be possible to repurpose the bindings generated from the OpenMW code to incorporate the record structures and various constants without too much work.

I’m not entirely sure what the best way forward is at the moment, but below are my thoughts as to how the changes to esplugin could be done:

Looking at the format for non-morrowind plugins, I think that the plugin content should be able to be encoded using an enum like this, wrapped in a vector:

enum PluginEntry {
    Record(Record),
    Group(Group),
}

Then we can handle Morrowind and other games more or less in the same manner, as at the top level the plugin contents (excluding the header) would be a Vec<PluginEntry>, regardless of the plugin type. There would need to be checks when reading (and eventually writing) to ensure that the format matches what is expected for the GameId, e.g. for Morrowind it would be necessary to ensure that no groups are present in the list, but structurally everything that handles plugin data could be the same.

This would also include record id parsing, which could be done in a more GameId-independent manner as it would only be necessary to have a single function that parses Ids from a Vec<PluginEntry>.

In the short term, my thought would be to implement the above and omit the code that handles Groups for now, just having a conditional to choose between the new-style functionality for morrowind and the old-style functionality for the other games at the plugin parsing level.

I.e.:

fn parse_plugin(...) {
    ...
    if game_id == GameId::Morrowind {
        /* parse records */
        /* parse record ids from records */
        ...
    } else {
        /* parse record ids as in original*/
        /* leave PluginEntry vector empty */
        ...
    }
    ...
}

For failing gracefully, the Plugin class could have a get_entries function (returning the Vec<PluginEntry>) that returns an unsupported error for plugins other than morrowind plugins. Honestly the idea of doing this seems a little weird to me, since it won’t be necessary to return a Result once (or if) other engines are supported, but it’s a little more meaningful than just returning an empty vector, or panicking, given that this would be a public function.

Alternatively, as you suggested, wrapping the Plugin struct in a MorrowindPlugin struct in a separate module would avoid this odd error reporting, though at the cost of having to write a bunch of code that is only necessary as long as parsing support for the other engines is unimplemented, which admittedly could be a long time or even indefinitely.

@Ortham
Copy link
Owner

Ortham commented Aug 26, 2019

Ah, if you're already in Rust then preferring a language-native approach makes sense. Your implementation ideas make sense to me, though I'd like to make it clear that esplugin currently prioritises:

  • RecordID comparison speed (because LOOT does this for many pairs of plugins in a load order)
  • Plugin parsing speed (where LOOT is only interested in the TES3/TES4 record's content and just the RecordIDs of all other records, and libloadorder is only interested in the TES3/TES4 record's content)

I think esplugin is pretty fast at those two things, it's certainly fast enough for libloadorder and not bottlenecking LOOT, so I'm OK with swapping some speed for functionality, but performance is still an important factor (which is why the RecordID bits for Morrowind in particular are a bit convoluted).

Other than that, feel free to go ahead, I'm interested to see what you come up with.

@benjaminwinger
Copy link
Author

Okay, I'll bear that in mind.

I suspect that the runtime at the moment is dominated by file I/O, so reading the records into structs before parsing them for IDs is unlikely to be noticeably slower than the current method of parsing the IDs as the file is read. I'll do a side-by-side comparison to make sure I don't introduce any significant slowdown though.

It also occurs to me that memory usage could also be an issue when handling plugins in bulk. For my purposes the potential of high memory usage is fine and to be expected, but I'm guessing it's probably an issue you'd rather avoid entirely in LOOT. I think it would make sense to include a flag similar to load_header_only that controls whether or not the record stores the entire contents of the plugin. The parsed records could still be used to get the ids so that there aren't separate implementations; the flag would just determine if the records are also stored in the PluginData after being used to find the ids.

@benjaminwinger
Copy link
Author

Returning to this finally; I was rather busy for some time, and my motivations changed.

I've added a ParseMode enum for determining how parsing behaves. The previous functionality is preserved using the ParseMode::HeaderOnly (previously header_only = true) and ParseMode::RecordIds (previously header_only = false), while the new functionality is included using ParseMode::All (but only for GameId::Morrowind; other gamemodes fall back to ParseMode::RecordIds). This seemed cleaner and more verbose than adding another flag, which I briefly experimented with.

Performance seems decent, but there is a noticeable slowdown on my machine when benchmarking against Wyrmhaven.esp (chosen just for its size. HearthFires.esm obviously doesn't work for a Morrowind-only feature). The full parser takes about ~20ms per iteration, compared to ~6ms just parsing record ids, which I'm guessing is due to the additional memory allocation. While this still likely isn't too significant in real-world tests due to the file i/o bottleneck, I figured that between this and the higher memory usage when parsing large numbers of plugins it would be worth separating the functionality.

For the moment, I've left the ffi interface the same (i.e it doesn't support ParseMode::All), but it could use integer flags for specifying the mode.

Also, any preference with respect to the commits? I could clean them up, if you want, but if you plan on squashing them later I won't bother.

Copy link
Owner

@Ortham Ortham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general comment, there's a lack of tests that exercise your new functionality, I wouldn't merge this without some. Other than that, the changes generally seem fine, and I've left a few comments for some specific things.

As for the commit history, it would be good if they could be cleaned up, I'm not a fan of squashing whole pull requests into one commit (unless there's only one commit's worth of changes, of course).

@@ -46,7 +46,11 @@ pub unsafe extern "C" fn esp_plugin_parse(plugin_ptr: *mut Plugin, load_header_o
ESP_ERROR_NULL_POINTER
} else {
let plugin = &mut *plugin_ptr;
match plugin.parse_file(load_header_only) {
match plugin.parse_file(if load_header_only {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd extract the if condition into a separate variable, I think it makes the function call easier to read.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly

*is_valid = Plugin::is_valid(
mapped_game_id,
rust_path,
if load_header_only {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, as above, I'd extract the if condition into a separate variable, I think it makes the function call easier to read.

src/plugin.rs Outdated
@@ -64,7 +77,7 @@ fn is_ghost_file_extension(extension: &std::borrow::Cow<'_, str>) -> bool {
}

#[derive(Clone, PartialEq, Eq, Debug, Hash)]
enum RecordIds {
pub enum RecordIds {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This enum is part of my implementation that I'm not very happy with, it makes dealing with record IDs unergonomic, so I'm not keen on making it part of the public API.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd set it to be public as I was initially making use of it externally, but I don't need it any more.

src/plugin.rs Outdated
&self.data.entries
}

pub fn get_recordids(&self) -> &RecordIds {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need to get a reference to the RecordIds enum? Also, this should probably be named get_record_ids.

src/record.rs Outdated
@@ -197,12 +226,16 @@ impl Record {
self.header.record_type
}

pub fn header_type_str(&self) -> &str {
std::str::from_utf8(&self.header.record_type).unwrap()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's generally a good idea to avoid panicking in library code, especially on user input. There are a couple of places I've unwrapped, but that's only been where the function is not expected to fail because the validity of the input has already been verified, and I use expect() to describe the unexpected failure should it ever occur.

In this case, we know valid record types only use ASCII characters, but this isn't enforced during parsing, so calling this function on an invalid record would cause a panic, probably crashing the process. I'd either return the Result<T,E> here, or enforce that ASCII check during parsing and use expect() here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is basically just a utility function to allow comparisons against strings rather than slices, would it be preferable to use String::from_utf8_lossy? We know that any valid type will be parsed correctly, and invalid types won't match properly due to the error character that replaces the non-utf8 value. You'd still be able to access the exact value via the header_type function if it's important to display the exact underlying data.

That being said, that would require unnecessary copying (to avoid the Cow enum wrapping the result, anyway), compared to simply matching against the Result of from_utf8.

I might try playing around with getting the Utf8Errors to nicely propagate when parsing the records, but, on the other hand, it would be a lot simpler just to return the Result.

src/subrecord.rs Outdated
@@ -93,6 +93,10 @@ impl Subrecord {
pub fn data(&self) -> &[u8] {
&self.data
}

pub fn type_str(&self) -> &str {
std::str::from_utf8(&self.subrecord_type).unwrap()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, it's generally a good idea to avoid panicking in library code, especially on user input. There are a couple of places I've unwrapped, but that's only been where the function is not expected to fail because the validity of the input has already been verified, and I use expect() to describe the unexpected failure should it ever occur.

In this case, we know valid subrecord types only use ASCII characters, but this isn't enforced during parsing, so calling this function on an invalid subrecord would cause a panic, probably crashing the process. I'd either return the Result<T,E> here, or enforce that ASCII check during parsing and use expect() here.

src/record.rs Outdated
@@ -50,6 +50,10 @@ impl RecordHeader {
pub fn flags(&self) -> u32 {
self.flags
}

pub fn typ(&self) -> String {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to err on the side of descriptiveness, so would call this record_type().

src/plugin.rs Outdated
);
assert!(plugin.parse_file(false).is_ok());
assert!(!plugin.is_valid_as_light_master());
for mode in vec![ParseMode::RecordIds, ParseMode::All] {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem with this pattern is that if one mode fails, the panic message won't be able to tell you which mode it was. Since there's only two values and only three statements in the loop, I'd consider just unrolling the loop, though it's a trade-off. At some point some sort of parameterised testing framework would become useful.

Also, it doesn't matter, but you could use &[ParseMode::RecordIds, ParseMode::All] instead of vec![ParseMode::RecordIds, ParseMode::All].

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that hadn't occurred to me.

There's a crate called parameterized, but for some reason it doesn't seem to be working for me.

test-case, on the other hand does, though it requires enabling the custom_test_frameworks unstable feature. It's fairly concise: e.g.:

#[test_case(ParseMode::RecordIds; "when mode is RecordIds")]
#[test_case(ParseMode::All; "when mode is All")]
fn parse_file_should_succeed(mode: ParseMode) {
    let mut plugin = Plugin::new(
         GameId::Morrowind,
        Path::new("testing-plugins/Morrowind/Data Files/Blank.esm"),
    );
 
     assert!(plugin.parse_file(mode).is_ok());
 
     match plugin.data.record_ids {
     RecordIds::NamespacedIds(ids) => assert_eq!(10, ids.len()),
         _ => panic!("Expected namespaced record IDs"),
    }
}

produces the test output:

test plugin::tests::morrowind::parse_file_should_succeed::when_mode_is_all ... ok                                                                              
test plugin::tests::morrowind::parse_file_should_succeed::when_mode_is_recordids ... ok

@benjaminwinger
Copy link
Author

I hadn't added new tests, partially because I thought I'd wait for feedback first, and I also because I noticed that the test plugins don't have that much in terms of non-header records. I'll see if I can find some small morrowind plugins which are released under permissive licenses that could be included in the testing plugins repo (assuming you're okay with that of course). Not that the tests really need a lot of variety in the records, but I think the records in the testing plugins are a little too homogeneous.

@benjaminwinger benjaminwinger changed the title WIP: Reading entire plugins Reading entire plugins Feb 10, 2021
@benjaminwinger
Copy link
Author

I keep forgetting about this. I've updated it to fix the conflicts. I'm not sure if it really needs any more complicated plugin, as outside of the contents, the records only really vary in their size. I think all the records other than the headers in the test plugins have precisely two subrecords and identical sizes, but the headers do provide some variety there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants