-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reading entire plugins #14
base: master
Are you sure you want to change the base?
Conversation
Have you considered using xEdit, e.g. via Mator's xedit-lib? I think its TES3 support lags behind the other games, but it's the elephant in the room when it comes to parsing plugins, and it may be less effort to patch up the TES3 support in that than build out everything needed for supporting full reading and writing in esplugin. If you do want to extend esplugin, I'm not going to say no to the idea. If we do end up disagreeing on some aspect of implementation (e.g. I think what you change makes things too much harder to maintain), then you can fall back to having a fork that's specific to your needs. If you don't want to get into supporting all games in new functionality, you could:
I'd probably go with putting things in a separate module to start with - I'm not very happy with how I implemented extending record ID support for Morrowind, it's made the code a lot messier than I think it was worth. I'd like to go back and experiment with changing it at some point. |
I hadn’t actually looked at xedit-lib previously, but I’m not sure that it’s really what I’m looking for. For one thing I’m using this in a command line utility written in rust (omwcmd), so it integrates well with esplugin, but somewhat less so with xedit-lib as I’d also have to write a wrapper and deal with the build system somehow (given that xedit-lib only includes Windows binaries and build files). Elephant is also a quite apt term for xedit-lib, as the code base is massive, and I think I’d prefer something a little more lightweight, as unless I’m missing something big (I’m basically going off this for my knowledge of the non-morrowind formats, as I gather they haven’t changed much since oblivion), even full read and write support for all platforms shouldn’t be an excessive amount of work, while diving into xedit-lib might be. One alternative that I had considered was creating a library from the OpenMW plugin code, given that OpenMW contains complete reader and writer code. One huge benefit over something like xedit-lib, or even esplugin, is that this would make it easy to keep feature parity with OpenMW once they start extending the plugin format post 1.0. The wrapper is still an issue, but revisiting this I eventually managed to get rust-bindgen to generate rust bindings for the openmw plugin code. Basically it would work, and OpenMW even builds (but does not distribute) a components library that contains, among other things, their plugin code, but the unfortunate reality is that it would significantly complicate the build process to use this rather than something like esplugin, particularly for cross-compiling. This would mean that the distribution of omwcmd, and thus Portmod, the system that depends on it, would be a lot more difficult than it currently is, not to mention that this sort of C++ interface is riddled with unsafe code out of necessity. I also recognise that I’m glossing over the importance of ensuring that records are properly structured, as a large part of xedit-lib, in addition to a large part of OpenMW’s plugin code, is dedicated to handling the structure of the various record definitions. I’d be fine with a more lightweight tool that puts this responsability on the developer using the library and just uses a generic record structure like esplugin already does. That being said, it would also be possible to repurpose the bindings generated from the OpenMW code to incorporate the record structures and various constants without too much work. I’m not entirely sure what the best way forward is at the moment, but below are my thoughts as to how the changes to esplugin could be done: Looking at the format for non-morrowind plugins, I think that the plugin content should be able to be encoded using an enum like this, wrapped in a vector: enum PluginEntry {
Record(Record),
Group(Group),
} Then we can handle Morrowind and other games more or less in the same manner, as at the top level the plugin contents (excluding the header) would be a This would also include record id parsing, which could be done in a more GameId-independent manner as it would only be necessary to have a single function that parses Ids from a In the short term, my thought would be to implement the above and omit the code that handles Groups for now, just having a conditional to choose between the new-style functionality for morrowind and the old-style functionality for the other games at the plugin parsing level. I.e.: fn parse_plugin(...) {
...
if game_id == GameId::Morrowind {
/* parse records */
/* parse record ids from records */
...
} else {
/* parse record ids as in original*/
/* leave PluginEntry vector empty */
...
}
...
} For failing gracefully, the Plugin class could have a get_entries function (returning the Alternatively, as you suggested, wrapping the Plugin struct in a MorrowindPlugin struct in a separate module would avoid this odd error reporting, though at the cost of having to write a bunch of code that is only necessary as long as parsing support for the other engines is unimplemented, which admittedly could be a long time or even indefinitely. |
Ah, if you're already in Rust then preferring a language-native approach makes sense. Your implementation ideas make sense to me, though I'd like to make it clear that esplugin currently prioritises:
I think esplugin is pretty fast at those two things, it's certainly fast enough for libloadorder and not bottlenecking LOOT, so I'm OK with swapping some speed for functionality, but performance is still an important factor (which is why the RecordID bits for Morrowind in particular are a bit convoluted). Other than that, feel free to go ahead, I'm interested to see what you come up with. |
Okay, I'll bear that in mind. I suspect that the runtime at the moment is dominated by file I/O, so reading the records into structs before parsing them for IDs is unlikely to be noticeably slower than the current method of parsing the IDs as the file is read. I'll do a side-by-side comparison to make sure I don't introduce any significant slowdown though. It also occurs to me that memory usage could also be an issue when handling plugins in bulk. For my purposes the potential of high memory usage is fine and to be expected, but I'm guessing it's probably an issue you'd rather avoid entirely in LOOT. I think it would make sense to include a flag similar to |
Returning to this finally; I was rather busy for some time, and my motivations changed. I've added a ParseMode enum for determining how parsing behaves. The previous functionality is preserved using the ParseMode::HeaderOnly (previously header_only = true) and ParseMode::RecordIds (previously header_only = false), while the new functionality is included using ParseMode::All (but only for GameId::Morrowind; other gamemodes fall back to ParseMode::RecordIds). This seemed cleaner and more verbose than adding another flag, which I briefly experimented with. Performance seems decent, but there is a noticeable slowdown on my machine when benchmarking against For the moment, I've left the ffi interface the same (i.e it doesn't support ParseMode::All), but it could use integer flags for specifying the mode. Also, any preference with respect to the commits? I could clean them up, if you want, but if you plan on squashing them later I won't bother. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general comment, there's a lack of tests that exercise your new functionality, I wouldn't merge this without some. Other than that, the changes generally seem fine, and I've left a few comments for some specific things.
As for the commit history, it would be good if they could be cleaned up, I'm not a fan of squashing whole pull requests into one commit (unless there's only one commit's worth of changes, of course).
ffi/src/plugin.rs
Outdated
@@ -46,7 +46,11 @@ pub unsafe extern "C" fn esp_plugin_parse(plugin_ptr: *mut Plugin, load_header_o | |||
ESP_ERROR_NULL_POINTER | |||
} else { | |||
let plugin = &mut *plugin_ptr; | |||
match plugin.parse_file(load_header_only) { | |||
match plugin.parse_file(if load_header_only { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd extract the if condition into a separate variable, I think it makes the function call easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly
ffi/src/plugin.rs
Outdated
*is_valid = Plugin::is_valid( | ||
mapped_game_id, | ||
rust_path, | ||
if load_header_only { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, as above, I'd extract the if condition into a separate variable, I think it makes the function call easier to read.
src/plugin.rs
Outdated
@@ -64,7 +77,7 @@ fn is_ghost_file_extension(extension: &std::borrow::Cow<'_, str>) -> bool { | |||
} | |||
|
|||
#[derive(Clone, PartialEq, Eq, Debug, Hash)] | |||
enum RecordIds { | |||
pub enum RecordIds { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This enum is part of my implementation that I'm not very happy with, it makes dealing with record IDs unergonomic, so I'm not keen on making it part of the public API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd set it to be public as I was initially making use of it externally, but I don't need it any more.
src/plugin.rs
Outdated
&self.data.entries | ||
} | ||
|
||
pub fn get_recordids(&self) -> &RecordIds { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really need to get a reference to the RecordIds
enum? Also, this should probably be named get_record_ids
.
src/record.rs
Outdated
@@ -197,12 +226,16 @@ impl Record { | |||
self.header.record_type | |||
} | |||
|
|||
pub fn header_type_str(&self) -> &str { | |||
std::str::from_utf8(&self.header.record_type).unwrap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's generally a good idea to avoid panicking in library code, especially on user input. There are a couple of places I've unwrapped, but that's only been where the function is not expected to fail because the validity of the input has already been verified, and I use expect()
to describe the unexpected failure should it ever occur.
In this case, we know valid record types only use ASCII characters, but this isn't enforced during parsing, so calling this function on an invalid record would cause a panic, probably crashing the process. I'd either return the Result<T,E>
here, or enforce that ASCII check during parsing and use expect()
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is basically just a utility function to allow comparisons against strings rather than slices, would it be preferable to use String::from_utf8_lossy
? We know that any valid type will be parsed correctly, and invalid types won't match properly due to the error character that replaces the non-utf8 value. You'd still be able to access the exact value via the header_type
function if it's important to display the exact underlying data.
That being said, that would require unnecessary copying (to avoid the Cow enum wrapping the result, anyway), compared to simply matching against the Result of from_utf8
.
I might try playing around with getting the Utf8Errors to nicely propagate when parsing the records, but, on the other hand, it would be a lot simpler just to return the Result.
src/subrecord.rs
Outdated
@@ -93,6 +93,10 @@ impl Subrecord { | |||
pub fn data(&self) -> &[u8] { | |||
&self.data | |||
} | |||
|
|||
pub fn type_str(&self) -> &str { | |||
std::str::from_utf8(&self.subrecord_type).unwrap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, it's generally a good idea to avoid panicking in library code, especially on user input. There are a couple of places I've unwrapped, but that's only been where the function is not expected to fail because the validity of the input has already been verified, and I use expect()
to describe the unexpected failure should it ever occur.
In this case, we know valid subrecord types only use ASCII characters, but this isn't enforced during parsing, so calling this function on an invalid subrecord would cause a panic, probably crashing the process. I'd either return the Result<T,E>
here, or enforce that ASCII check during parsing and use expect()
here.
src/record.rs
Outdated
@@ -50,6 +50,10 @@ impl RecordHeader { | |||
pub fn flags(&self) -> u32 { | |||
self.flags | |||
} | |||
|
|||
pub fn typ(&self) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to err on the side of descriptiveness, so would call this record_type()
.
src/plugin.rs
Outdated
); | ||
assert!(plugin.parse_file(false).is_ok()); | ||
assert!(!plugin.is_valid_as_light_master()); | ||
for mode in vec![ParseMode::RecordIds, ParseMode::All] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One problem with this pattern is that if one mode fails, the panic message won't be able to tell you which mode it was. Since there's only two values and only three statements in the loop, I'd consider just unrolling the loop, though it's a trade-off. At some point some sort of parameterised testing framework would become useful.
Also, it doesn't matter, but you could use &[ParseMode::RecordIds, ParseMode::All]
instead of vec![ParseMode::RecordIds, ParseMode::All]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that hadn't occurred to me.
There's a crate called parameterized, but for some reason it doesn't seem to be working for me.
test-case, on the other hand does, though it requires enabling the custom_test_frameworks unstable feature. It's fairly concise: e.g.:
#[test_case(ParseMode::RecordIds; "when mode is RecordIds")]
#[test_case(ParseMode::All; "when mode is All")]
fn parse_file_should_succeed(mode: ParseMode) {
let mut plugin = Plugin::new(
GameId::Morrowind,
Path::new("testing-plugins/Morrowind/Data Files/Blank.esm"),
);
assert!(plugin.parse_file(mode).is_ok());
match plugin.data.record_ids {
RecordIds::NamespacedIds(ids) => assert_eq!(10, ids.len()),
_ => panic!("Expected namespaced record IDs"),
}
}
produces the test output:
test plugin::tests::morrowind::parse_file_should_succeed::when_mode_is_all ... ok
test plugin::tests::morrowind::parse_file_should_succeed::when_mode_is_recordids ... ok
I hadn't added new tests, partially because I thought I'd wait for feedback first, and I also because I noticed that the test plugins don't have that much in terms of non-header records. I'll see if I can find some small morrowind plugins which are released under permissive licenses that could be included in the testing plugins repo (assuming you're okay with that of course). Not that the tests really need a lot of variety in the records, but I think the records in the testing plugins are a little too homogeneous. |
1c043ac
to
80c40aa
Compare
80c40aa
to
45fdda7
Compare
Currently only does anything with the Morrowind GameId
45fdda7
to
d57652b
Compare
I keep forgetting about this. I've updated it to fix the conflicts. I'm not sure if it really needs any more complicated plugin, as outside of the contents, the records only really vary in their size. I think all the records other than the headers in the test plugins have precisely two subrecords and identical sizes, but the headers do provide some variety there. |
I know that you say in the README that esplugin focuses on providing an API to libloadorder and LOOT, rather than being a general-purpose plugin parser, but would you be interested in contributions that would make it more general purpose?
One thing worth noting is that I'm doing this from the perspective of Morrowind/OpenMW, and while it would be great to implement general purpose reading and writing of plugins for all formats supported by esplugin, I also quickly realized that writing in support for the other formats would require significantly more work, and I'm not sure that I'm the best person to be writing the code for the other formats, as I likely won't be actually using it.
It's also worth noting that I plan to eventually add support for modifying and writing plugins.
Basically there were two options I was considering:
I prefer the first option, as keeping code together would put more eyes on it, but if you prefer I can move this work to a derivative project.
Note that the code included in the merge request was fairly hastily thrown together and will need some changes before it can be merged (e.g. read_plugin is currently broken for games other than Morrowind, and I haven't written any tests for this). This pull request is mostly just to get your input, but I thought I should include the changes I've made so far.