Reading entire plugins #14

benjaminwinger · 2019-08-25T02:40:12Z

I know that you say in the README that esplugin focuses on providing an API to libloadorder and LOOT, rather than being a general-purpose plugin parser, but would you be interested in contributions that would make it more general purpose?

One thing worth noting is that I'm doing this from the perspective of Morrowind/OpenMW, and while it would be great to implement general purpose reading and writing of plugins for all formats supported by esplugin, I also quickly realized that writing in support for the other formats would require significantly more work, and I'm not sure that I'm the best person to be writing the code for the other formats, as I likely won't be actually using it.

It's also worth noting that I plan to eventually add support for modifying and writing plugins.

Basically there were two options I was considering:

Implement and merge general purpose code for Morrowind/OpenMW plugins only, with the possibility of extending it to the other formats in future.
Create a derivative library that focuses on being a general purpose Morrowind/OpenMW plugin library.

I prefer the first option, as keeping code together would put more eyes on it, but if you prefer I can move this work to a derivative project.

Note that the code included in the merge request was fairly hastily thrown together and will need some changes before it can be merged (e.g. read_plugin is currently broken for games other than Morrowind, and I haven't written any tests for this). This pull request is mostly just to get your input, but I thought I should include the changes I've made so far.

Ortham · 2019-08-25T07:32:05Z

Have you considered using xEdit, e.g. via Mator's xedit-lib? I think its TES3 support lags behind the other games, but it's the elephant in the room when it comes to parsing plugins, and it may be less effort to patch up the TES3 support in that than build out everything needed for supporting full reading and writing in esplugin.

If you do want to extend esplugin, I'm not going to say no to the idea. If we do end up disagreeing on some aspect of implementation (e.g. I think what you change makes things too much harder to maintain), then you can fall back to having a fork that's specific to your needs.

If you don't want to get into supporting all games in new functionality, you could:

Implement the functionality in a separate module (e.g. esplugin::morrowind) so it's clear that the code isn't generic across games.
Implement the functionality as generic functions, but handle non-Morrowind plugins gracefully (i.e. not by panicking). E.g. Plugin::is_valid_as_light_master() always returns false for all games apart from Fallout 4 and Skyrim SE. Another option would be to add an Unsupported error variant to use with functions that return Result.

I'd probably go with putting things in a separate module to start with - I'm not very happy with how I implemented extending record ID support for Morrowind, it's made the code a lot messier than I think it was worth. I'd like to go back and experiment with changing it at some point.

benjaminwinger · 2019-08-26T19:36:43Z

I hadn’t actually looked at xedit-lib previously, but I’m not sure that it’s really what I’m looking for. For one thing I’m using this in a command line utility written in rust (omwcmd), so it integrates well with esplugin, but somewhat less so with xedit-lib as I’d also have to write a wrapper and deal with the build system somehow (given that xedit-lib only includes Windows binaries and build files). Elephant is also a quite apt term for xedit-lib, as the code base is massive, and I think I’d prefer something a little more lightweight, as unless I’m missing something big (I’m basically going off this for my knowledge of the non-morrowind formats, as I gather they haven’t changed much since oblivion), even full read and write support for all platforms shouldn’t be an excessive amount of work, while diving into xedit-lib might be.

One alternative that I had considered was creating a library from the OpenMW plugin code, given that OpenMW contains complete reader and writer code. One huge benefit over something like xedit-lib, or even esplugin, is that this would make it easy to keep feature parity with OpenMW once they start extending the plugin format post 1.0. The wrapper is still an issue, but revisiting this I eventually managed to get rust-bindgen to generate rust bindings for the openmw plugin code. Basically it would work, and OpenMW even builds (but does not distribute) a components library that contains, among other things, their plugin code, but the unfortunate reality is that it would significantly complicate the build process to use this rather than something like esplugin, particularly for cross-compiling. This would mean that the distribution of omwcmd, and thus Portmod, the system that depends on it, would be a lot more difficult than it currently is, not to mention that this sort of C++ interface is riddled with unsafe code out of necessity.

I also recognise that I’m glossing over the importance of ensuring that records are properly structured, as a large part of xedit-lib, in addition to a large part of OpenMW’s plugin code, is dedicated to handling the structure of the various record definitions. I’d be fine with a more lightweight tool that puts this responsability on the developer using the library and just uses a generic record structure like esplugin already does. That being said, it would also be possible to repurpose the bindings generated from the OpenMW code to incorporate the record structures and various constants without too much work.

I’m not entirely sure what the best way forward is at the moment, but below are my thoughts as to how the changes to esplugin could be done:

Looking at the format for non-morrowind plugins, I think that the plugin content should be able to be encoded using an enum like this, wrapped in a vector:

enum PluginEntry {
    Record(Record),
    Group(Group),
}

Then we can handle Morrowind and other games more or less in the same manner, as at the top level the plugin contents (excluding the header) would be a Vec<PluginEntry>, regardless of the plugin type. There would need to be checks when reading (and eventually writing) to ensure that the format matches what is expected for the GameId, e.g. for Morrowind it would be necessary to ensure that no groups are present in the list, but structurally everything that handles plugin data could be the same.

This would also include record id parsing, which could be done in a more GameId-independent manner as it would only be necessary to have a single function that parses Ids from a Vec<PluginEntry>.

In the short term, my thought would be to implement the above and omit the code that handles Groups for now, just having a conditional to choose between the new-style functionality for morrowind and the old-style functionality for the other games at the plugin parsing level.

I.e.:

fn parse_plugin(...) {
    ...
    if game_id == GameId::Morrowind {
        /* parse records */
        /* parse record ids from records */
        ...
    } else {
        /* parse record ids as in original*/
        /* leave PluginEntry vector empty */
        ...
    }
    ...
}

For failing gracefully, the Plugin class could have a get_entries function (returning the Vec<PluginEntry>) that returns an unsupported error for plugins other than morrowind plugins. Honestly the idea of doing this seems a little weird to me, since it won’t be necessary to return a Result once (or if) other engines are supported, but it’s a little more meaningful than just returning an empty vector, or panicking, given that this would be a public function.

Alternatively, as you suggested, wrapping the Plugin struct in a MorrowindPlugin struct in a separate module would avoid this odd error reporting, though at the cost of having to write a bunch of code that is only necessary as long as parsing support for the other engines is unimplemented, which admittedly could be a long time or even indefinitely.

Ortham · 2019-08-26T20:37:13Z

Ah, if you're already in Rust then preferring a language-native approach makes sense. Your implementation ideas make sense to me, though I'd like to make it clear that esplugin currently prioritises:

RecordID comparison speed (because LOOT does this for many pairs of plugins in a load order)
Plugin parsing speed (where LOOT is only interested in the TES3/TES4 record's content and just the RecordIDs of all other records, and libloadorder is only interested in the TES3/TES4 record's content)

I think esplugin is pretty fast at those two things, it's certainly fast enough for libloadorder and not bottlenecking LOOT, so I'm OK with swapping some speed for functionality, but performance is still an important factor (which is why the RecordID bits for Morrowind in particular are a bit convoluted).

Other than that, feel free to go ahead, I'm interested to see what you come up with.

benjaminwinger · 2019-08-27T02:27:51Z

Okay, I'll bear that in mind.

I suspect that the runtime at the moment is dominated by file I/O, so reading the records into structs before parsing them for IDs is unlikely to be noticeably slower than the current method of parsing the IDs as the file is read. I'll do a side-by-side comparison to make sure I don't introduce any significant slowdown though.

It also occurs to me that memory usage could also be an issue when handling plugins in bulk. For my purposes the potential of high memory usage is fine and to be expected, but I'm guessing it's probably an issue you'd rather avoid entirely in LOOT. I think it would make sense to include a flag similar to load_header_only that controls whether or not the record stores the entire contents of the plugin. The parsed records could still be used to get the ids so that there aren't separate implementations; the flag would just determine if the records are also stored in the PluginData after being used to find the ids.

benjaminwinger · 2020-04-23T18:37:58Z

Returning to this finally; I was rather busy for some time, and my motivations changed.

I've added a ParseMode enum for determining how parsing behaves. The previous functionality is preserved using the ParseMode::HeaderOnly (previously header_only = true) and ParseMode::RecordIds (previously header_only = false), while the new functionality is included using ParseMode::All (but only for GameId::Morrowind; other gamemodes fall back to ParseMode::RecordIds). This seemed cleaner and more verbose than adding another flag, which I briefly experimented with.

Performance seems decent, but there is a noticeable slowdown on my machine when benchmarking against Wyrmhaven.esp (chosen just for its size. HearthFires.esm obviously doesn't work for a Morrowind-only feature). The full parser takes about ~20ms per iteration, compared to ~6ms just parsing record ids, which I'm guessing is due to the additional memory allocation. While this still likely isn't too significant in real-world tests due to the file i/o bottleneck, I figured that between this and the higher memory usage when parsing large numbers of plugins it would be worth separating the functionality.

For the moment, I've left the ffi interface the same (i.e it doesn't support ParseMode::All), but it could use integer flags for specifying the mode.

Also, any preference with respect to the commits? I could clean them up, if you want, but if you plan on squashing them later I won't bother.

Ortham

As a general comment, there's a lack of tests that exercise your new functionality, I wouldn't merge this without some. Other than that, the changes generally seem fine, and I've left a few comments for some specific things.

As for the commit history, it would be good if they could be cleaned up, I'm not a fan of squashing whole pull requests into one commit (unless there's only one commit's worth of changes, of course).

Ortham · 2020-04-26T07:31:36Z

ffi/src/plugin.rs

@@ -46,7 +46,11 @@ pub unsafe extern "C" fn esp_plugin_parse(plugin_ptr: *mut Plugin, load_header_o
            ESP_ERROR_NULL_POINTER
        } else {
            let plugin = &mut *plugin_ptr;
-            match plugin.parse_file(load_header_only) {
+            match plugin.parse_file(if load_header_only {


I'd extract the if condition into a separate variable, I think it makes the function call easier to read.

Ortham · 2020-04-26T07:31:49Z

ffi/src/plugin.rs

+            *is_valid = Plugin::is_valid(
+                mapped_game_id,
+                rust_path,
+                if load_header_only {


Same, as above, I'd extract the if condition into a separate variable, I think it makes the function call easier to read.

Ortham · 2020-04-26T07:34:05Z

src/plugin.rs

@@ -64,7 +77,7 @@ fn is_ghost_file_extension(extension: &std::borrow::Cow<'_, str>) -> bool {
 }

 #[derive(Clone, PartialEq, Eq, Debug, Hash)]
-enum RecordIds {
+pub enum RecordIds {


This enum is part of my implementation that I'm not very happy with, it makes dealing with record IDs unergonomic, so I'm not keen on making it part of the public API.

I think I'd set it to be public as I was initially making use of it externally, but I don't need it any more.

Ortham · 2020-04-26T07:35:11Z

src/plugin.rs

+        &self.data.entries
+    }
+
+    pub fn get_recordids(&self) -> &RecordIds {


Do you really need to get a reference to the RecordIds enum? Also, this should probably be named get_record_ids.

Ortham · 2020-04-26T07:49:43Z

src/record.rs

@@ -197,12 +226,16 @@ impl Record {
        self.header.record_type
    }

+    pub fn header_type_str(&self) -> &str {
+        std::str::from_utf8(&self.header.record_type).unwrap()


It's generally a good idea to avoid panicking in library code, especially on user input. There are a couple of places I've unwrapped, but that's only been where the function is not expected to fail because the validity of the input has already been verified, and I use expect() to describe the unexpected failure should it ever occur.

In this case, we know valid record types only use ASCII characters, but this isn't enforced during parsing, so calling this function on an invalid record would cause a panic, probably crashing the process. I'd either return the Result<T,E> here, or enforce that ASCII check during parsing and use expect() here.

Given that this is basically just a utility function to allow comparisons against strings rather than slices, would it be preferable to use String::from_utf8_lossy? We know that any valid type will be parsed correctly, and invalid types won't match properly due to the error character that replaces the non-utf8 value. You'd still be able to access the exact value via the header_type function if it's important to display the exact underlying data.

That being said, that would require unnecessary copying (to avoid the Cow enum wrapping the result, anyway), compared to simply matching against the Result of from_utf8.

I might try playing around with getting the Utf8Errors to nicely propagate when parsing the records, but, on the other hand, it would be a lot simpler just to return the Result.

Ortham · 2020-04-26T07:50:27Z

src/subrecord.rs

@@ -93,6 +93,10 @@ impl Subrecord {
    pub fn data(&self) -> &[u8] {
        &self.data
    }
+
+    pub fn type_str(&self) -> &str {
+        std::str::from_utf8(&self.subrecord_type).unwrap()


As above, it's generally a good idea to avoid panicking in library code, especially on user input. There are a couple of places I've unwrapped, but that's only been where the function is not expected to fail because the validity of the input has already been verified, and I use expect() to describe the unexpected failure should it ever occur.

In this case, we know valid subrecord types only use ASCII characters, but this isn't enforced during parsing, so calling this function on an invalid subrecord would cause a panic, probably crashing the process. I'd either return the Result<T,E> here, or enforce that ASCII check during parsing and use expect() here.

Ortham · 2020-04-26T07:52:40Z

src/record.rs

@@ -50,6 +50,10 @@ impl RecordHeader {
    pub fn flags(&self) -> u32 {
        self.flags
    }
+
+    pub fn typ(&self) -> String {


I prefer to err on the side of descriptiveness, so would call this record_type().

Ortham · 2020-04-26T08:15:38Z

src/plugin.rs

-            );
-            assert!(plugin.parse_file(false).is_ok());
-            assert!(!plugin.is_valid_as_light_master());
+            for mode in vec![ParseMode::RecordIds, ParseMode::All] {


One problem with this pattern is that if one mode fails, the panic message won't be able to tell you which mode it was. Since there's only two values and only three statements in the loop, I'd consider just unrolling the loop, though it's a trade-off. At some point some sort of parameterised testing framework would become useful.

Also, it doesn't matter, but you could use &[ParseMode::RecordIds, ParseMode::All] instead of vec![ParseMode::RecordIds, ParseMode::All].

Hmm, that hadn't occurred to me.

There's a crate called parameterized, but for some reason it doesn't seem to be working for me.

test-case, on the other hand does, though it requires enabling the custom_test_frameworks unstable feature. It's fairly concise: e.g.:

#[test_case(ParseMode::RecordIds; "when mode is RecordIds")] #[test_case(ParseMode::All; "when mode is All")] fn parse_file_should_succeed(mode: ParseMode) { let mut plugin = Plugin::new( GameId::Morrowind, Path::new("testing-plugins/Morrowind/Data Files/Blank.esm"), ); assert!(plugin.parse_file(mode).is_ok()); match plugin.data.record_ids { RecordIds::NamespacedIds(ids) => assert_eq!(10, ids.len()), _ => panic!("Expected namespaced record IDs"), } }

produces the test output:

test plugin::tests::morrowind::parse_file_should_succeed::when_mode_is_all ... ok test plugin::tests::morrowind::parse_file_should_succeed::when_mode_is_recordids ... ok

benjaminwinger · 2020-04-27T01:58:32Z

I hadn't added new tests, partially because I thought I'd wait for feedback first, and I also because I noticed that the test plugins don't have that much in terms of non-header records. I'll see if I can find some small morrowind plugins which are released under permissive licenses that could be included in the testing plugins repo (assuming you're okay with that of course). Not that the tests really need a lot of variety in the records, but I think the records in the testing plugins are a little too homogeneous.

Currently only does anything with the Morrowind GameId

benjaminwinger · 2021-11-24T16:44:01Z

I keep forgetting about this. I've updated it to fix the conflicts. I'm not sure if it really needs any more complicated plugin, as outside of the contents, the records only really vary in their size. I think all the records other than the headers in the test plugins have precisely two subrecords and identical sizes, but the headers do provide some variety there.

Ortham reviewed Apr 26, 2020

View reviewed changes

benjaminwinger force-pushed the read_records branch from 1c043ac to 80c40aa Compare April 29, 2020 15:57

benjaminwinger force-pushed the read_records branch from 80c40aa to 45fdda7 Compare February 10, 2021 17:53

benjaminwinger changed the title ~~WIP: Reading entire plugins~~ Reading entire plugins Feb 10, 2021

Benjamin Winger and others added 9 commits November 23, 2021 12:15

Initial support for reading entire Morrowind plugins

9c8fdc9

Added option to store all records

3ec2c0e

Currently only does anything with the Morrowind GameId

Added public methods to allow inspection of records

b588ce7

Replaced boolean flags with a ParseMode enum

5236dba

Updated name of full parsing benchmark to better reflect mode

f1477bf

Updated arguments to new tests

ec1b8af

Updated ffi interface to use ParseModes

459318d

Sort record ids in parse_record_ids_from_entries

2257775

Added tests for parsing plugin contents

d57652b

benjaminwinger force-pushed the read_records branch from 45fdda7 to d57652b Compare November 24, 2021 16:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reading entire plugins #14

Reading entire plugins #14

benjaminwinger commented Aug 25, 2019

Ortham commented Aug 25, 2019 •

edited

Loading

benjaminwinger commented Aug 26, 2019

Ortham commented Aug 26, 2019

benjaminwinger commented Aug 27, 2019

benjaminwinger commented Apr 23, 2020

Ortham left a comment

Ortham Apr 26, 2020

benjaminwinger Apr 27, 2020

Ortham Apr 26, 2020

Ortham Apr 26, 2020

benjaminwinger Apr 27, 2020

Ortham Apr 26, 2020

Ortham Apr 26, 2020

benjaminwinger Apr 27, 2020

Ortham Apr 26, 2020

Ortham Apr 26, 2020

Ortham Apr 26, 2020

benjaminwinger Apr 27, 2020

benjaminwinger commented Apr 27, 2020

benjaminwinger commented Nov 24, 2021

Reading entire plugins #14

Are you sure you want to change the base?

Reading entire plugins #14

Conversation

benjaminwinger commented Aug 25, 2019

Ortham commented Aug 25, 2019 • edited Loading

benjaminwinger commented Aug 26, 2019

Ortham commented Aug 26, 2019

benjaminwinger commented Aug 27, 2019

benjaminwinger commented Apr 23, 2020

Ortham left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

benjaminwinger commented Apr 27, 2020

benjaminwinger commented Nov 24, 2021

Ortham commented Aug 25, 2019 •

edited

Loading