Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: auto-generate type alignment (#[pgrx(alignment = "on")]) #1942

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

JamesGuthrie
Copy link
Contributor

@JamesGuthrie JamesGuthrie commented Nov 12, 2024

Postgres allows for types to specify their alignment in the CREATE TYPE statement. This change adds the ability to derive Postgres' alignment configuration from the type's Rust alignment (std::mem::align_of::<T>()). This functionality is opt-in through the #[pgrx(alignment = "on")] attribute:

#[derive(PostgresType)]
#[pgrx(alignment = "on")]
struct AlignedTo8Bytes {
  v1: u64,
  v2: [u64; 3]
}

@@ -94,6 +94,8 @@ pub trait SqlGraphIdentifier {
fn line(&self) -> Option<u32>;
}

pub use postgres_type::Alignment;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this is the right way to expose the Alignment enum (no other modules do it this way). I didn't want to change the visibility of the postgres_type module to global, for fear of leaking other types which we don't want to.

Comment on lines 32 to 94
pub enum Alignment {
Char,
Int2,
Int4,
Double
}

const INVALID_ATTR_CONTENT: &str =
r#"expected `#[pgrx(alignment = align)]`, where `align` is "char", "int2", "int4", or "double""#;

impl ToTokens for Alignment {
fn to_tokens(&self, tokens: &mut TokenStream) {
let value = match self {
Alignment::Char => format_ident!("Char"),
Alignment::Int2 => format_ident!("Int2"),
Alignment::Int4 => format_ident!("Int4"),
Alignment::Double => format_ident!("Double"),
};
let quoted = quote! {
::pgrx::pgrx_sql_entity_graph::Alignment::#value
};
tokens.append_all(quoted);
}
}

impl Alignment {

pub fn from_attribute(attr: &Attribute) -> Result<Option<Self>, syn::Error> {
if attr.style != AttrStyle::Outer {
return Err(syn::Error::new(
attr.span(),
"#[pgrx(alignment = ..)] is only valid in an outer context",
));
}

let attr = attr.parse_args::<PgrxAttribute>()?;
for arg in attr.args.iter() {
let PgrxArg::NameValue(ref nv) = arg;
if !nv.path.is_ident("alignment") {
continue;
}

return match nv.value {
ArgValue::Lit(Lit::Str(ref s)) => {
match s.value().as_ref() {
"char" => Ok(Some(Self::Char)),
"int2" => Ok(Some(Self::Int2)),
"int4" => Ok(Some(Self::Int4)),
"double" => Ok(Some(Self::Double)),
_ => Err(syn::Error::new(s.span(), INVALID_ATTR_CONTENT))
}
}
ArgValue::Path(ref p) => {
Err(syn::Error::new(p.span(), INVALID_ATTR_CONTENT))
}
ArgValue::Lit(ref l) => {
Err(syn::Error::new(l.span(), INVALID_ATTR_CONTENT))
}
};
}

Ok(None)
}

/// Used to parse a generator config from a set of item attributes
pub fn from_attributes(attrs: &[Attribute]) -> Result<Option<Self>, syn::Error> {
if let Some(attr) = attrs.iter().find(|attr| attr.path().is_ident("pgrx")) {
Self::from_attribute(attr)
} else {
Ok(None)
}
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this should live here, or elsewhere?

@@ -163,7 +256,7 @@ impl ToSql for PostgresTypeEntity {
\tINTERNALLENGTH = variable,\n\
\tINPUT = {schema_prefix_in_fn}{in_fn}, /* {in_fn_path} */\n\
\tOUTPUT = {schema_prefix_out_fn}{out_fn}, /* {out_fn_path} */\n\
\tSTORAGE = extended\n\
\tSTORAGE = extended{alignment}\n\
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kinda icky, but it works.

@JamesGuthrie JamesGuthrie force-pushed the jg/postgrestype-derive-alignment branch from 9ec5953 to 46c90bb Compare November 12, 2024 14:00
@@ -781,6 +781,7 @@ Optionally accepts the following attributes:

* `inoutfuncs(some_in_fn, some_out_fn)`: Define custom in/out functions for the type.
* `pgvarlena_inoutfuncs(some_in_fn, some_out_fn)`: Define custom in/out functions for the `PgVarlena` of this type.
* `pgrx(alignment = "<align>")`: Define alignment for this type. One of `char`, `int2`, `int4`, or `double` (see https://www.postgresql.org/docs/current/sql-createtype.html).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this should go here. Seems like the best place.

@JamesGuthrie JamesGuthrie force-pushed the jg/postgrestype-derive-alignment branch 2 times, most recently from 22e25c2 to 84e2e70 Compare November 12, 2024 15:32
@workingjubilee
Copy link
Member

could you just make this a property of repr(align)?

@JamesGuthrie
Copy link
Contributor Author

could you just make this a property of repr(align)?

Sorry, I'm not sure I understand the suggestion. Are you saying that instead of adding the (custom) attribute #[pgrx(alignment = "...")], we should piggy-back on the presence of a #[repr(align(x))], and use that to generate the ALIGNMENT field of the generated SQL?

@workingjubilee
Copy link
Member

exactly.

@workingjubilee
Copy link
Member

workingjubilee commented Nov 13, 2024

if the Rust align and the Postgres align disagree, UB is probably about to occur anyways.

@JamesGuthrie
Copy link
Contributor Author

On second thought: what I would really want is for pgrx to auto-generate the correct alignment for my type T based off the the value of std::mem::align_of::<T>().

What do you think about that idea?

I'm not sure if that is something that we'd want to "turn on" by default. If not, then I'll strawman this:

#[derive(PostgresType)]
#[pgrx(alignment = "on")]
struct AlignedTo8Bytes {
  v1: u64,
  v2: [u64; 3]
}

@workingjubilee
Copy link
Member

On second thought: what I would really want is for pgrx to auto-generate the correct alignment for my type T based off the the value of std::mem::align_of::().

What do you think about that idea?

Yes, that sounds fine if you can make it happen.

@JamesGuthrie JamesGuthrie force-pushed the jg/postgrestype-derive-alignment branch from 84e2e70 to 8607b6a Compare November 15, 2024 07:53
@JamesGuthrie JamesGuthrie changed the title feat: allow specifying type alignment through #[pgrx(alignment = ...)] feat: auto-generate type alignment (#[pgrx(alignment = "on")]) Nov 15, 2024
@JamesGuthrie JamesGuthrie force-pushed the jg/postgrestype-derive-alignment branch 3 times, most recently from 71b591e to 1fabcbe Compare November 15, 2024 07:58
@JamesGuthrie
Copy link
Contributor Author

JamesGuthrie commented Nov 15, 2024

On second thought: what I would really want is for pgrx to auto-generate the correct alignment for my type T based off the the value of std::mem::align_of::().
What do you think about that idea?

Yes, that sounds fine if you can make it happen.

I can! I've pushed my changes.

Postgres allows for types to specify their alignment in the `CREATE
TYPE` statement. This change adds the ability to derive Postgres'
alignment configuration from the type's Rust alignment
(`std::mem::align_of::<T>()`). This functionality is opt-in through
the `#[pgrx(alignment = "on")]` attribute:

```
#[derive(PostgresType)]
#[pgrx(alignment = "on")]
struct AlignedTo8Bytes {
  v1: u64,
  v2: [u64; 3]
}
```
@JamesGuthrie JamesGuthrie force-pushed the jg/postgrestype-derive-alignment branch from 1fabcbe to e936e5d Compare November 15, 2024 08:07
@workingjubilee
Copy link
Member

Musing on design thoughts:

  • Perhaps we should do this fully-implicitly and bail silently for types that request an invalid alignment, but then use the attribute as a sort of "I require this alignment, fail if I can't get it"?
  • Alternatively, we can view this as an early opt-in.

@JamesGuthrie
Copy link
Contributor Author

I've realized that this alignment functionality is only really relevant to people who are implementing IntoDatum and FromDatum manually (activated through the #[bikeshed_postgres_type_manually_impl_from_into_datum] attribute). (Note: I believe that the default CBOR-based FromDatum/IntoDatum implementations are not alignment-sensitive).

Musing on design thoughts:

  • Perhaps we should do this fully-implicitly and bail silently for types that request an invalid alignment, but then use the attribute as a sort of "I require this alignment, fail if I can't get it"?
  • Alternatively, we can view this as an early opt-in.

My initial motivation for making this opt-in is backwards-compatibility. Extensions that have a custom type's data stored are in a difficult situation, because the generated sql is not backwards-compatible with the previously-generated sql, and there's no real path out of this situation.

To clarify: if you have a type which is 8-byte aligned in Rust, but 4-byte aligned in postgres (the default for varlena), then (going forward) the SQL we would generate for the type would have the 8-byte alignment, but all instances of custom types which are stored on disk are already "locked-in" at 4 byte alignment. It's not possible to change alignment via ALTER TYPE; the only way to "upgrade" a type to the correct alignment is to drop and re-create it.

In our extension we intend to support the incorrectly-aligned types that already exist in the wild, and generate the 8-byte alignment for all new installations.

On the other hand one could argue that by always emitting the alignment, the extension developer is in no worse a situation than today. Developers of existing extensions may already unwittingly face the "alignment problem". By emitting the correct alignment for their types, we fix all new installations. The developer then still needs to support incorrectly-aligned types for extensions which have already been released.

Specifically on this:

use the attribute as a sort of "I require this alignment, fail if I can't get it"?

I could imagine that there is somebody who still wants 4-byte alignment for a type which Rust would 8-byte align, mostly for ultimate compactness (maybe the type is mostly 12 bytes, and the additional 4 bytes of overhead is too much to stomach). Obviously in this case the onus is on the developer to ensure that they implement to/from datum correctly, to account for potentially un-aligned data (do-able with read_unaligned). Note: there may be more gotchas here that I haven't considered.

In summary:

  • I think it would make sense to emit the alignment by default, but only when #[bikeshed_postgres_type_manually_impl_from_into_datum] is used.
  • I think that there is a use-case around being able to specify 4-byte alignment in Postgres, even when the type is 8-byte aligned. I would propose that we kick this can down the road though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants