Implement Saxy.stream_state/5 #132

Open, wants to merge 1 commit into base: master

tanguykurylo

Add a new function Saxy.stream_state/5 to have more flexibility over parsing.

The existing function Saxy.stream_events/2 neither discards events nor empties the state while parsing. This caused memory and performance issues when we handled files larger than 50 MB. Our current workaround uses Stream.transform/4 and Saxy.Partial, controlling emission and cleanup manually along the stream.

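We propose Saxy.stream_state/5 as a generic solution. Like Saxy.parse_stream/4, it accepts a Saxy.Handler module and an initial state. It also requires an emit function that controls state emission and cleanup while streaming: it receives the current handler state and returns a tuple of the items to emit and the state to continue with.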
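
To make the emit contract concrete, here is a minimal sketch that does not call Saxy at all. It assumes, based on the proposed example further down, that the emit function takes the handler state and returns `{items_to_emit, new_state}`. `EmitSketch`, `run/4`, and the `step` function standing in for the handler are hypothetical names used only for illustration, not part of Saxy or of this PR.

```elixir
defmodule EmitSketch do
  # Hypothetical stand-in for the streaming loop. `step` plays the role of the
  # handler (it folds one chunk into the state); `emit` decides what leaves the
  # stream and what state is kept. Assumes emit returns {items_to_emit, new_state}.
  def run(chunks, initial_state, step, emit) do
    Stream.transform(chunks, initial_state, fn chunk, state ->
      chunk
      |> step.(state)
      |> emit.()
    end)
  end
end

# The stand-in "handler": every chunk is recorded as one parsed item.
step = fn chunk, state -> %{state | parsed: state.parsed ++ [chunk]} end

# Same shape as the emit function in the proposed example: emit everything
# parsed so far and reset the list so the state stays small.
emit = fn %{parsed: parsed} = state -> {parsed, Map.put(state, :parsed, [])} end

result =
  ["a", "b", "c"]
  |> EmitSketch.run(%{parsed: [], current: nil}, step, emit)
  |> Enum.to_list()
```

Because `emit` clears `parsed` after every chunk, the accumulator never holds more than the items produced by the most recent chunk, which is the memory behaviour the proposal is after.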

Example of our current manual solution for comparison

defmodule CustomXMLParser do
  def parse_to_stream(xml_stream) do
    Stream.transform(xml_stream, &new_partial/0, &emit_elements/2, &close_stream/1)
  end

  defp new_partial do
    # `parsed` collects finished elements; `current` holds the one in progress.
    initial_state = %{parsed: [], current: nil}
    {:ok, partial} = Saxy.Partial.new(CustomXMLParser.Handler, initial_state)
    partial
  end

  defp emit_elements(_, {:stop, partial}), do: {:halt, partial}

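  defp emit_elements(xml, partial) do
    # Drop what the previous chunk already emitted, parse the next chunk with
    # the cleaned state, then emit whatever the handler collected.
    with state = cleanup_previously_emitted(partial),
         {:cont, partial} <- Saxy.Partial.parse(partial, xml, state),
         emitted = get_parsed(partial) do
      {emitted, partial}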
    else
      {:error, exception} ->
        emitted = [
          {:error, {:parse_error, Saxy.ParseError.message(exception)}}
        ]

        {emitted, {:stop, partial}}
    end
  end

  defp close_stream({:stop, partial}), do: Saxy.Partial.terminate(partial)
  defp close_stream(partial), do: Saxy.Partial.terminate(partial)

  defp cleanup_previously_emitted(partial), do: %{Saxy.Partial.get_state(partial) | parsed: []}
  defp get_parsed(partial), do: Saxy.Partial.get_state(partial)[:parsed]
end

Example of proposed solution

defmodule CustomXMLParser do
  def parse_to_stream(xml_stream) do
    Saxy.stream_state(
      xml_stream,
      CustomXMLParser.Handler,
      %{parsed: [], current: nil},
      fn %{parsed: parsed} = state -> {parsed, Map.put(state, :parsed, [])} end
    )
  end
end
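
Both examples reference `CustomXMLParser.Handler`, which this description does not show. The following is a hypothetical sketch of what such a handler could look like, assuming each record is an `<item>` element that contains only text. The module name `SketchHandler` and the hand-built event list are illustrative assumptions; in a project that depends on Saxy the module would declare `@behaviour Saxy.Handler` and receive these events from the parser.

```elixir
defmodule SketchHandler do
  # Hypothetical handler, not the one used in this PR.
  # In a project that depends on Saxy, add `@behaviour Saxy.Handler` here.

  # Opening <item>: start collecting text for a new record.
  def handle_event(:start_element, {"item", _attributes}, state) do
    {:ok, %{state | current: ""}}
  end

  # Text inside an open <item>: append it to the record in progress.
  def handle_event(:characters, chars, %{current: current} = state)
      when is_binary(current) do
    {:ok, %{state | current: current <> chars}}
  end

  # Closing </item>: move the finished record into `parsed`.
  def handle_event(:end_element, "item", %{current: current, parsed: parsed} = state)
      when is_binary(current) do
    {:ok, %{state | parsed: parsed ++ [current], current: nil}}
  end

  # Everything else (document start/end, other elements) leaves the state as is.
  def handle_event(_event, _data, state), do: {:ok, state}
end

# Hand-built events for illustration, in the order a SAX parser would emit
# them for <items><item>first</item><item>second</item></items>.
events = [
  {:start_document, []},
  {:start_element, {"items", []}},
  {:start_element, {"item", []}},
  {:characters, "first"},
  {:end_element, "item"},
  {:start_element, {"item", []}},
  {:characters, "second"},
  {:end_element, "item"},
  {:end_element, "items"},
  {:end_document, {}}
]

final_state =
  Enum.reduce(events, %{parsed: [], current: nil}, fn {event, data}, state ->
    {:ok, new_state} = SketchHandler.handle_event(event, data, state)
    new_state
  end)
```

Without an emit step, `parsed` keeps growing for the whole document, which is why the emit function above resets it after each chunk.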

tanguykurylo changed the title from "Implement Saxy.stream_state" to "Implement Saxy.stream_state/5" on Jul 26, 2024
qcam (Owner) commented Oct 22, 2024

@tanguykurylo thanks for the PR 🙏. I love the idea and will take a deeper look into it later this week.
