Introduce Oppia proto API v1: Android #1

BenHenning · 2021-08-06T07:33:45Z

Note that this PR was written largely in collaboration with @anandwana001 & @seanlip (see commit history for specific attribution if necessary),

Overview

This introduces the initial version of Oppia's backend proto API, which is currently only spec'd for the Android client.

Design details

Some things in particular to note:

This is introducing the request & response-based structures for both the topic list & individual topic content, based on the latest defined backend specification for Android support
The TopicSummary & related structures include some extra data that the client technically doesn't need for the preview experience: descriptions & thumbnails for stories, and skill IDs. This is primarily to (1) provide sufficient details to the client so that it can correctly download the content it needs, and (2) avoid the need for non-summary structures to be downloaded (all of the data needed for topics & stories are fully captured in these summary structures which simplifies the content download portions a bit).
Compatibility for matching against versions is designed based on the versioning scheme introduced here. See the next section in this comment for more specifics.
The content download API has pagination inherently built in. The client can request multiple structures (of the same or different types), and the backend has the choice of what to respond with. If, for example, the client requests 10 explorations, the backend may choose to only respond with 2 which will prompt the client to request the other 8 in a follow-up request.
Questions are being introduced to ensure they are at least defined, but they won't be used by the GA version of the Android app (per the current roadmap). That being said, they are worth noting because they behave a bit differently from the other structures. Since it's not known how many questions can be associated with a given topic, and that list could eventually be quite large (consider 20-100 skills per topic and dozens of questions per skill), the API is designed to provide the full list of questions separately based on a specific skill ID. This is done in three steps:
1. First, by receiving the list of skills associated with a topic
2. Second, by requesting the list of question IDs associated with a given skill (the client expects to receive the full list)
3. Third, by requesting the questions themselves. This approach does not require pagination since, per the previous design point, the backend has full flexibility over how many questions it decides to provide.
The proto design here favor strong-typing aggressively (hence substantial duplication in the state interaction definitions) to reduce assumptions or validity checks being needed for clients that consume this API (at the expensive of having less generic interoperable code)
The repository is initially being set up to be Bazel compatible (the repository has been confirmed to build) for Java
The layout of the repository facilitates potentially supporting more than just the Android client in the future (or future versions of the Android client with evolved APIs)
The initial proto structure is very focused on the exact needs of the Android client, hence the structures are substantially reduced compared with the total data available for each structure in the backend

Data flow expectations

All data that the Android client needs to fetch is divided into two categories:

The metadata necessary to:
- Show the topic list
- Provide the user with enough context (topic details) to know whether they want to download/update a topic
- Know whether a downloaded topic has an update available
- Have sufficient technical context to ensure that topic downloads/updates are fully compatible with the local client
An explicit download/update as initiated by a user

The metadata is expected to always be automatically downloaded, and is a prerequisite to downloading or updating topics since it provides the necessary context for the client to make this availability determination.

Downloading/updating topic list metadata

Topic metadata provides all the context necessary for users to decide to download/update topics, and for the client to actually make this happen. At a high-level, this is:

Lists of topics, stories, chapters, and subtopics/revision cards
Versioning information needed to request specific topic content when downloading/updating (see below sections for specifics on versioning & content downloading)

Metadata is always automatically downloaded by the client when connectivity is available to ensure users have the most up-to-date context. However, there's one key challenge that needs to be accounted for in this design: some of the metadata is tied to versioned structures that can change in incompatible ways. For example, if a new incompatible chapter is added to a story, we don't want to display that chapter since the user wouldn't be able to download it, and this would lead to an inconsistency. Another such scenario is if a story's title is updated without the corresponding topic being downloaded.

To properly account for this, metadata is requested with the client's compatibility context so that the server can ensure that it's only providing compatible results. Further, the response for this metadata will distinguish between metadata for undownloaded topics vs. downloaded ones (so that titles & thumbnails aren't updated for topics that are downloaded until the topic itself is updated).

One other issue that crops up here is the time that the user may need to wait in the client for metadata to be available to display the homescreen. While this design is not stipulating frontend design, it is expected that the client may make use of several techniques to improve this experience, including:

Embedding an initial topic list that's up-to-date as of the time of that app version's launch
Downloading the latest topic list during sign-in to avoid needing to have a loading experience in the home screen
Never updating the topic list after the home screen is loaded until app restart (though this might not actually need to happen if the app only ever exposes update/download availability and never changes metadata for already-downloaded topics)

Downloading/updating topics (content downloading)

While directly user initiated, topics are downloaded/updated piecemeal. Topics consist of potentially thousands of files (dozens to ~hundreds of structures, and hundreds to ~thousands of images). An active design decision is to keep topics fragmented and put the onus for downloading each of these to the client. While this increases complexity, it has the key advantage of making topic downloads highly pausable/restartable by introducing more granularity when downloading the constituent pieces of topics.

The image part of topics are naturally piecemeal since they are separate statically GCS-hosted assets that must be individually downloaded. Structures, on the other hand, aren't quite as composed. To make that more viable, pagination & selection were key design choices in the proto API proposed here. In particular, the API is designed to:

Allow the client to limit how much it can download (based on actual data sizes rather than arbitrary limits)
Allow the client to select exactly the structures that it needs from a single endpoint rather than needing to bounce between multiple endpoints

It's expected that topic content endpoints provide exactly what's requested based on content & structure versions. See the versioning section below for specifics on how versioning is expected to work. Keep in mind that versioning information for topic content is retrieved via metadata (see the section above for more specifics).

Versioning

Versioning starts getting a bit complicated when considering the backend VersionedModels, the model structures themselves, schema structures, proto structures, the API, snapshots of the API, and other things. To try and keep things simple, this design intends to define a separate set of versions that are specific to this API. How these map to Oppia backend's many versions is an implementation detail not actually relevant to this design.

API version

The API itself is versioned and has separate releases. To keep things simple, it has a major & minor version (e.g. 1.0). The major version is represented directly in the directory structure (e.g. 'v1') since the repository could, in the future, house multiple major versions (which is necessary for the backend to maintain long-term backward compatibility with older API versions). The major version is only incremented if a breaking change must be introduced into the API (generally this will only come when we want to make substantial changes to the API/redesign it).

The API's minor version essentially counts the number of releases that major version has been changed. This means we increment the minor version anytime we make a compatible change to the API, and then release it. The main benefit of having distinct releases is it allows the backend & frontends to opt into taking on a particular version of the API. Due to the backward/forward interoperability nature of protobuf, it's unlikely there will ever be an incompatibility so long as structure versions are respected (see below), and even in those cases there should be potential for graceful failures. Note that the minor version is only ever represented in the release numbers and never in code form since it only serves the purpose of cataloging.

Content versions

Content versions are meant to be the version of the entity itself corresponding to a particular structure (such as a topic, skill, or exploration). These are used to track whether there are content updates for a particular structure. These are expected to map exactly to the versions stored in the corresponding structure's VersionedModel in the backend, but how this versioning behaves is up to the backend so long as it's monotonically increasing and positive.

Structure versions

Structure versions correspond to groups of proto structures themselves. These are unique versions to the proto structures and are not shared with schema structure versions in the backend. Generally, these only need to be incremented when a proto structure is updated in such a way that the default data will break existing logical assumptions (thus requiring a data migration). Proto structures should never be updated in an incompatible way (without incrementing the major version--see the API version section above), but sometimes data migrations are necessary and these versions help track when those need to occur. Unlike schema versions in the backend, these are likely to not increment as often since most proto changes can be safely ignored by older clients.

Note that one aspect of this is a bit complicated: some of the substructures are shared across high-level structures (such as SubtitledHtml, or other language-based substructures). To ensure these are properly tracked, some substructures are grouped into their own combined version to track changes. The following are the list of structure types whose versions are tracked:

TopicSummaryStructureVersion
RevisionCardStructureVersion
ConceptCardStructureVersion
ExplorationStructureVersion
QuestionStructureVersion
StateStructureVersion
LanguageStructuresVersion
ImageStructureVersion

A note on interaction versioning

I did want to briefly note that it was discovered during the creation of this PR & structure versions that separate versioning for interactions is not actually needed since, per the strong typing of State's substructures, the structures of individual interactions & how they relate to the exploration/question experience is implied as part of State's structure version. This does not mean the backend won't need additional version tracking for interactions in order to compute compatibility with specific proto State structure versions, but it does serendipitously simplify version tracking at the sub-State level a bit.

Open tasks

This includes the initial proto definitions for v1 of Oppia backend's proto-based API. v1 only includes Android support.

BenHenning · 2021-08-06T07:35:30Z

@anandwana001 @seanlip PTAL and let me know your initial thoughts. This isn't done yet (a lot of documentation still needs to be added for both the protos and the README), but this is the initial structure. I think it can potentially unblock the immediate next work until the PR can be checked in & an official v1.0 released.

seanlip

Thanks @BenHenning for doing this! I took a first pass, and had some questions. PTAL when you get a chance.

proto/v1/api/android.proto

seanlip · 2021-08-08T04:51:17Z

proto/v1/structure/concept_card.proto

+  repeated WorkedExample worked_examples = 5;
+  RecordedVoiceovers recorded_voiceovers = 6;
+  WrittenTranslations written_translations = 7;
+  ReferencedImageList referenced_image_list = 8;


Just wondering, why ReferencedImageList rather than repeated ReferencedImage (here and elsewhere)?

Because the definition of an image list is a shared substructure, so its version is actually governed as part of ImageStructureVersion. My intent is to design for future iterations on how we represent images (which seems likely to change in coming years as we add more complex image editing in the exploration editor, etc.)

I'm not quite sure I follow, sorry -- wouldn't the same thing still work if you had repeated ReferencedImage and ReferencedImage still includes an ImageProtoVersion? I.e. what's an example of a future change that would affect a list of images that wouldn't just affect each image individually?

In your example the version would only handle how ReferencedImage is represented, not the fact that it's represented using a linear list. If we wanted to represent referenced images differently (for example, by mapping them to content or image IDs) then it would change ReferencedImageList but not the structures that contain it.

[Update: Chatted with Ben, I think the idea is that this might possibly change to e.g. a dict in the future. Note that, if this is necessary, the message name can be changed without issue, so it's not a problem that it's currently named "*List".]

Though, changing the type will require a new field (and a migration by updating the proto version & having a routine to migrate lists to a map). We would deprecate the old field & add a new one.

Can this now be considered resolved @seanlip?

proto/v1/structure/languages.proto

seanlip · 2021-08-08T04:54:06Z

proto/v1/structure/languages.proto

+message Voiceover {
+  string filename = 1;
+  int32 file_size_bytes = 2;
+  reserved 3; // needs_update


You probably do want needs_update; any reason not to add it here?

(OK with keeping it reserved too, just wondered if you had any questions.)

When would a client care about needs_update?

I actually meant to remove all reserved lines, I had just missed this file earlier.

needs_update is for when the voiceover is a bit out of sync with the text, because the text has changed and the voiceover isn't updated yet.

(Up to you if you want to keep it in for this iteration or not. I don't have a strong preference.)

We don't have a clear UX flow for when that condition is hit. If anything, we'd consider not pushing updates that point to out-of-date voiceovers/text translations to the client in the first place if we're concerned with consistency (which is something we need to figure out as part of the versioning project). The fewer decisions the client has to make, the better I think.

I had updated it. Removed all the reserved fields as per the Open task list.

Is this addressed now @seanlip?

proto/v1/structure/languages.proto

anandwana001

Thanks @BenHenning
Added few questions which I don't understand.

proto/v1/api/android.proto

proto/v1/structure/state.proto

BenHenning

Thanks @seanlip & @anandwana001. Just followed up.

Note that the changes aren't yet applied; I've just collected the remaining work items & added them to the opening comment.

proto/v1/api/android.proto

BenHenning · 2021-08-09T22:41:32Z

proto/v1/structure/concept_card.proto

+  repeated WorkedExample worked_examples = 5;
+  RecordedVoiceovers recorded_voiceovers = 6;
+  WrittenTranslations written_translations = 7;
+  ReferencedImageList referenced_image_list = 8;


Because the definition of an image list is a shared substructure, so its version is actually governed as part of ImageStructureVersion. My intent is to design for future iterations on how we represent images (which seems likely to change in coming years as we add more complex image editing in the exploration editor, etc.)

proto/v1/structure/state.proto

proto/v1/api/android.proto

seanlip · 2021-08-10T04:04:24Z

proto/v1/structure/concept_card.proto

+  repeated WorkedExample worked_examples = 5;
+  RecordedVoiceovers recorded_voiceovers = 6;
+  WrittenTranslations written_translations = 7;
+  ReferencedImageList referenced_image_list = 8;


I'm not quite sure I follow, sorry -- wouldn't the same thing still work if you had repeated ReferencedImage and ReferencedImage still includes an ImageProtoVersion? I.e. what's an example of a future change that would affect a list of images that wouldn't just affect each image individually?

seanlip · 2021-08-10T04:05:16Z

proto/v1/structure/languages.proto

+message Voiceover {
+  string filename = 1;
+  int32 file_size_bytes = 2;
+  reserved 3; // needs_update


needs_update is for when the voiceover is a bit out of sync with the text, because the text has changed and the voiceover isn't updated yet.

(Up to you if you want to keep it in for this iteration or not. I don't have a strong preference.)

BenHenning · 2021-08-10T22:14:02Z

proto/v1/structure/topic_summary.proto

+option java_package = "org.oppia.v1.structure";
+option java_multiple_files = true;
+
+message TopicSummary {


Note from discussion with Sean: this can't include things that can update immediately without a download.

update immediately without a download.

So, if we are removing those items which are not dependent on whether a topic is downloaded or not, then how could we show the topic preview?

A darn. I forgot part of the context here.

I think the issue is that changes to the stories in the topic shouldn't result in changes shown in the client (necessarily). This is actual a slight gap in our current product plan that needs more thought. We might need to change how we're approaching this design such that we download a very lightweight topic list by default, then download version-sensitive data separately. This will require us to block the home screen on a longer download operation (probably with a loading interstitial), or embed an initial topic list with all relevant data already included directly in the client (I really like this idea since it'll make startup much snappier & work even for older clients).

Latest changes address this by updating the API to ensure the backend does not resend topic summary info for downloaded topics (though a failsafe needed to be added in case something goes wrong and the client still needs to fetch the topic summary). I think the latest version is more robust than the past implementation. I'm not 100% confident it covers all cases, but I think we cover about 95% of the cases (it'll become clearer as we dig into both the server & client implementations).

proto/v1/api/android.proto

BenHenning · 2021-08-11T06:53:14Z

FYI that I will need to follow up on this tomorrow.

anandwana001 · 2021-12-07T08:37:51Z

Hi @BenHenning
Could you please check my last 2 commits:

I had added the Dto suffix and added 4 explorations in state.proto:

    AlgebraicExpressionInputInstance algebraic_expression_input = 10;
    MathEquationInputInstance MathEquationInput = 11;
    NumericExpressionInputInstance numeric_expression_input = 12;
    NumberWithUnitsInstance number_with_units = 13;

Let me know if these changes are correct.

anandwana001 · 2021-12-07T08:51:02Z

I am guessing, the rules should only be *Spec and interaction name only be *Instance not the *Dto.
Let me know if this is correct thing I am thinking, I will update in next commit.

BenHenning · 2021-12-07T19:46:38Z

Thanks @anandwana001. To correct one thing: I think you mean 'interactions' not 'explorations.

Regarding the changes:

The 'DTO' change looks correct, but we should probably add a follow-up CI check (after this PR) to ensure all messages always end in DTO. Can you please update Introduce GitHub Actions #2 to include details for such a check? Also, if there are any other messages that aren't ending with 'Dto' then please make sure those are also updated.
We shouldn't be including number with units--that isn't supported by the app, and should be actually be removed from android_validation_constants
The new interactions & objects are missing doc strings--please add these (I suggest referencing the other similar fields in the proto file for reference)
I think your math interactions are missing some of the customization arguments (specifically, numeric expression has 'useFractionForDivision', math equation & algebraic expression inputs use both 'useFractionForDivision' and 'customOskLetters'); also, math equation & algebraic expression have an incorrect customization argument 'value'. If it helps, you should reference the interaction's definition files: https://github.com/oppia/oppia/tree/develop/extensions/interactions (e.g. https://github.com/oppia/oppia/blob/develop/extensions/interactions/MathEquationInput/MathEquationInput.py).
Please make sure field names always use snake_case; some don't
The RuleSpec input names could probably be clearer. Suggest 'math_expression' given the type used in the rule templates: https://github.com/oppia/oppia/blob/develop/extensions/interactions/rule_templates.json.
The actual rule specs that we define should correspond to the new PRD not the existing templates since we're changing all three interactions; there's no planned legacy support here so we should design against the final solution on the outset for the proto API

I am guessing, the rules should only be *Spec and interaction name only be *Instance not the *Dto. Let me know if this is correct thing I am thinking, I will update in next commit.

All messages should end with 'Dto' for consistency. I think this will help avoid any sorts of potential naming conflicts in both web & Android when they use the objects generated from these protos.

org/oppia/proto/v1/structure/state.proto

…sSpecDto

seanlip

Hi @BenHenning -- per our chat on Wednesday, I've done a full review pass on android.proto. PTAL and feel free to let me know if anything I wrote is unclear -- thanks!

seanlip · 2022-01-30T10:52:48Z