Initial Research #149

Open
3 tasks
MirandaEcho opened this issue Mar 2, 2020 · 3 comments

Comments

@MirandaEcho
Collaborator

MirandaEcho commented Mar 2, 2020

Initial thread: https://secure.helpscout.net/conversation/1067300996/4949?folderId=1211654

"If we're going to go this route of updating the RSS output feed in Largo, we'd need to estimate the work and get a contract in place, the elements to be estimated are below. Before we proceed, though, we should probably connect on this approach and make sure we're all aligned. We're working on a project intake form which will help with this. Copying in Miranda and Jonathan for visibility.

To estimate:

  • Updating Largo's RSS feed
  • Packaging and releasing an update of Largo
  • Identifying which members are on Largo and assisting with upgrading to latest version
  • Documentation for individual sites to update their non-Largo RSS feed outputs"

Additional project information from Sarah with requirements, use cases, etc.: https://docs.google.com/forms/d/1GX20803VK9bAn7o6VJb-4npnL7R813TnCJlCcQrgMHQ/edit?ts=5e59472c#responses

End goal: Users can sign up for content-vertical newsletters from the sign-up page on the INN site and reliably receive formatted content newsletters from INN.

To be completed in research phase:

  • Possible approaches
  • Concerns/caveats
  • An estimate for the approach(es)
@benlk
Collaborator

benlk commented Mar 3, 2020

"Me and Jonathan are thinking it might be best to set up some sort of standardization campaign in members' CMS so, for example, they declare a source so we can identify articles by outlet name or they know to put one to two sentences in the summary section, not the entire article. It'd be helpful not only for this project, but for any sort of future content sharing we do on a systematic level. Do you have any advice on how hard it might be to do this / any suggestions for getting started? Do you know if a project like this is already in the works?"

Before we even start work on updating Largo's RSS feed, or identifying how to patch non-Largo INN member sites' RSS feeds, we're going to have to:

  • Specify what "source" information Sarah et al. are trying to capture.
    • Based on the Helpscout thread, it looks like they're trying to capture the newsroom's name, which, for the Bridge Magazine feed at https://www.bridgemi.com/feed, would be the contents of the "title" element within the "channel" element.
    • If it's not something already provided in the feeds, then we're going to have to determine:
      • what that data is
      • whether there's already a standard way of encoding that information within RSS, because we want to avoid creating yet another competing standard (https://xkcd.com/927/)
      • whether that information can be provided using the features already present in existing CMS implementations
      • whether we'll need to modify just Largo's RSS feeds, or all WordPress-using sites' RSS feeds, or the RSS feeds of all INN members. How many person-years are we willing to invest on this project? At what point does it become easier to write a custom scraper that outputs the feeds that we want instead of asking the whole world to change their websites?

So the immediate questions are:

  • what is the "source" of a feed, and is it meaningfully distinct from the feed channel title?
  • can Mailchimp/Feedly get at the feed channel title in its merge tags?
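To make the first question concrete, here is a minimal sketch using only Python's standard library that reads the channel title out of an RSS 2.0 feed, which is the value we'd treat as the "source" if the two turn out to be the same thing. The sample feed and the name `Example Newsroom` are hypothetical stand-ins, not the real Bridge Magazine feed.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, trimmed RSS 2.0 feed; a real WordPress feed has many more elements.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Newsroom</title>
    <link>https://news.example.org</link>
    <item>
      <title>Example story</title>
      <link>https://news.example.org/example-story</link>
    </item>
  </channel>
</rss>"""


def channel_title(rss_xml: str) -> Optional[str]:
    """Return the text of <channel><title>, or None if it is missing or empty."""
    root = ET.fromstring(rss_xml)
    # The explicit path skips <item><title>, which holds article headlines.
    title = root.findtext("channel/title")
    return title.strip() if title and title.strip() else None


print(channel_title(SAMPLE_RSS))  # prints "Example Newsroom"
```

If every member feed already carries a meaningful channel title, no CMS changes would be needed to get at the outlet name; the open question is whether it survives the trip through Feedly and Mailchimp.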

@benlk
Collaborator

benlk commented Mar 3, 2020

This project's workflow:

  1. Members publish items to their individual RSS feeds.
  2. Sarah aggregates those 200+ feeds into a private feed on Feedly.
  3. Feedly publishes that private feed as an Atom feed at https://feedly.com/f/IAit3cvWmGwmqOAru9z88oXi.atom.
  4. Mailchimp ingests that feed for an RSS-powered campaign.
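Step 2 is where per-entry source information can be attached. The sketch below is not Feedly's implementation; it is a hypothetical illustration, assuming plain RSS 2.0 input, of how an aggregator can wrap each member item in an Atom `entry` carrying an `atom:source` element (RFC 4287, section 4.2.11) that records the originating feed's title and link. The function names and sample URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def atom(tag: str) -> str:
    """Qualify a tag name with the Atom namespace (Clark notation)."""
    return "{%s}%s" % (ATOM_NS, tag)


def rss_items_to_atom_entries(rss_xml: str) -> List[ET.Element]:
    """Wrap each RSS <item> in an Atom <entry> whose <source> child records
    the title and link of the channel the item came from."""
    channel = ET.fromstring(rss_xml).find("channel")
    if channel is None:
        return []
    feed_title = channel.findtext("title", default="").strip()
    feed_link = channel.findtext("link", default="").strip()
    entries = []
    for item in channel.findall("item"):
        entry = ET.Element(atom("entry"))
        ET.SubElement(entry, atom("title")).text = item.findtext("title", default="")
        ET.SubElement(entry, atom("link"), rel="alternate",
                      href=item.findtext("link", default="").strip())
        # Per-entry provenance: the piece this project needs downstream.
        source = ET.SubElement(entry, atom("source"))
        ET.SubElement(source, atom("title")).text = feed_title
        ET.SubElement(source, atom("link"), rel="alternate", href=feed_link)
        entries.append(entry)
    return entries


rss = """<rss version="2.0"><channel>
<title>Example Newsroom</title><link>https://news.example.org</link>
<item><title>Story A</title><link>https://news.example.org/a</link></item>
</channel></rss>"""
for entry in rss_items_to_atom_entries(rss):
    print(ET.tostring(entry, encoding="unicode"))
```

Because the source title is copied from the channel title on every item, an aggregator built this way would never emit an entry without a `source` element, which is the behavior we'd want from Feedly.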

I've created a test Mailchimp campaign that replicates the workflow: https://us1.admin.mailchimp.com/campaigns/show?id=2908759. You can edit its design and send a test to yourself to see how this works.

There are two problems here:

  1. Not all entries in the Feedly feed have source elements.
  2. On feed items that do have source elements, Mailchimp's merge tags output the entire contents of the source element instead of just its title.

The Feedly problem

When I looked through https://feedly.com/f/IAit3cvWmGwmqOAru9z88oXi.atom today, most sites that I know to be running WordPress, with or without Largo, had source elements in most of their feed entries. However, entries from the same site sometimes differed in whether they had a source element.

For example, of these two Chicago Reporter entries, only the first has a source element:

<entry>
	<id>
		tag:feedly.com,2013:cloud/entry/KCNU8io3K9TtGXp87GBRzwX15HJKpviJ7M/o6RI4jDU=_1708d40dcbc:11087f75:fd9c96c2
	</id>
	<title type="html">
	Is Lightfoot’s war on poverty too late to stop Chicago’s black exodus?
	</title>
	<published>2020-03-02T17:36:05Z</published>
	<updated>2020-03-02T17:36:05Z</updated>
	<category term="Government and Politics"/>
	<category term="Race and Culture"/>
	<category term="black exodus"/>
	<category term="chicago"/>
	<category term="community benefits agreement"/>
	<category term="Eve Ewing"/>
	<category term="gentrification"/>
	<category term="Great migration"/>
	<category term="homeless"/>
	<category term="homelessness"/>
	<category term="Lori Lightfoot"/>
	<category term="Obama Presidential Center"/>
	<category term="Poverty"/>
	<category term="UIC"/>
	<link rel="alternate" href="https://www.chicagoreporter.com/is-lightfoots-war-on-poverty-too-late-to-stop-chicagos-black-exodus/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=is-lightfoots-war-on-poverty-too-late-to-stop-chicagos-black-exodus" type="text/html"/>
	<summary type="html">
		<p> A new study says Chicago’s black population decline is due to decades of racial inequality.</p> <p>The post <a rel="nofollow" href="https://www.chicagoreporter.com/is-lightfoots-war-on-poverty-too-late-to-stop-chicagos-black-exodus/">Is Lightfoot’s war on poverty too late to stop Chicago’s black exodus?</a> appeared first on <a rel="nofollow" href="https://www.chicagoreporter.com">Chicago Reporter</a>.</p>
	</summary>
	<content type="html"></content>
	<author>
		<name>Josh McGhee</name>
	</author>
	<media:content medium="image" url="https://www.chicagoreporter.com/wp-content/uploads/2015/03/Elvtrsmthrs21967-e1426094396489.jpg" width="1000" height="800"/>
	<source>
		<id>
			tag:feedly.com,2013:cloud/feed/http://chicagoreporter.com/feed/
		</id>
		<title type="html">Chicago Reporter</title>
		<link rel="alternate" type="text/html" href="https://www.chicagoreporter.com"/>
		<updated>2020-03-02T17:36:05Z</updated>
	</source>
</entry>
<entry>
	<id>
		tag:feedly.com,2013:cloud/entry/dtQfP/jExOmox7hhyUBblbLw6ZSoJutJLG9Q/6SSD9o=_1708cbf814f:8e3518:ce54b40a
	</id>
	<title type="html">
		Federal civil rights agency: Unequal discipline of women in prison must be addressed
	</title>
	<published>2020-02-28T17:02:08Z</published>
	<updated>2020-02-28T17:02:08Z</updated>
	<link rel="alternate" href="https://www.chicagoreporter.com/doj-report-unequal-discipline-of-women-in-prison-must-be-addressed/" type="text/html"/>
	<summary type="html">
		Federal agency cites collaborative investigation by the Reporter, NPR and the Medill School of Journalism at Northwestern University which exposed disparities at prisons across the country.
	</summary>
	<content type="html"></content>
	<author>
		<name/>
	</author>
	<media:content medium="image" url="https://www.chicagoreporter.com/wp-content/uploads/2018/09/web_180315-Logan-Prison-098-By-Bill-Healy-for-SJNN-1170x778.jpg" width="1170" height="778"/>
</entry>

So we need to ask Feedly why some items from a given feed have source elements while other items from the same feed do not. I have not yet filed a support request with them about this.
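Before filing that request, it may help to quantify the problem. Here is a minimal sketch, assuming a complete, saved copy of the Feedly Atom feed (the excerpt above omits the `media:` namespace declaration, so it won't parse on its own), that counts, per site, how many entries have a `source` element and how many don't. Grouping by the host of each entry's alternate link is an assumption chosen for illustration; the sample feed is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Tuple
from urllib.parse import urlparse

NS = {"atom": "http://www.w3.org/2005/Atom"}


def source_coverage(atom_xml: str) -> Dict[str, Tuple[int, int]]:
    """Map each site (host of the entry's alternate link) to
    (entries with <source>, entries without <source>)."""
    root = ET.fromstring(atom_xml)
    with_src: Counter = Counter()
    without_src: Counter = Counter()
    for entry in root.findall("atom:entry", NS):
        link = entry.find("atom:link[@rel='alternate']", NS)
        host = urlparse(link.get("href", "")).netloc if link is not None else "(no link)"
        if entry.find("atom:source", NS) is not None:
            with_src[host] += 1
        else:
            without_src[host] += 1
    return {h: (with_src[h], without_src[h]) for h in set(with_src) | set(without_src)}


# Hypothetical two-entry feed mirroring the Chicago Reporter pattern above.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><link rel="alternate" href="https://news.example.org/a"/>
    <source><title>Example Newsroom</title></source></entry>
  <entry><link rel="alternate" href="https://news.example.org/b"/></entry>
</feed>"""
print(source_coverage(FEED))  # {'news.example.org': (1, 1)}
```

Any host with non-zero counts in both columns is a concrete example of the inconsistency to send to Feedly.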

The Mailchimp problem

Here are Mailchimp's docs on RSS merge tags: https://mailchimp.com/help/rss-merge-tags/

Given a campaign with the following labeled merge tags inside an RSS item block:

RSSITEM:SOURCE: *|RSSITEM:SOURCE|*
RSSITEM:SOURCE_TITLE: *|RSSITEM:SOURCE_TITLE|*
RSSITEM:SOURCE:TITLE: *|RSSITEM:SOURCE:TITLE|*

And the following source element in an entry in the feed:

<source>
	<id>
		tag:feedly.com,2013:cloud/feed/http://www.injusticewatch.org/feed/
	</id>
	<title type="html">Injustice Watch</title>
	<link rel="alternate" type="text/html" href="https://www.injusticewatch.org"/>
	<updated>2020-03-02T17:38:01Z</updated>
</source>

The following is output in the campaign:

RSSITEM:SOURCE: tag:feedly.com,2013:cloud/feed/http://www.injusticewatch.org/feed/ Injustice Watch 2020-03-02T17:38:01Z
RSSITEM:SOURCE_TITLE: tag:feedly.com,2013:cloud/feed/http://www.injusticewatch.org/feed/ Injustice Watch 2020-03-02T17:38:01Z
RSSITEM:SOURCE:TITLE: tag:feedly.com,2013:cloud/feed/http://www.injusticewatch.org/feed/ Injustice Watch 2020-03-02T17:38:01Z

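This looks like a bug to me: all three tags produce the same output, which is the text of every child of the source element run together. Including TITLE in the tag name suggests the output should be only the title element within the source element, "Injustice Watch" in this case.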
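The difference is easy to reproduce outside Mailchimp. In this sketch, `all_text` joins every text node under the `source` element with spaces, which matches the merge-tag output above; that this is how Mailchimp builds it is an inference from the output, not documented behavior. `source_title` returns what a SOURCE:TITLE tag would plausibly be expected to produce. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}

# The source element from the entry above, with the Atom namespace declared.
SOURCE_XML = """<source xmlns="http://www.w3.org/2005/Atom">
  <id>tag:feedly.com,2013:cloud/feed/http://www.injusticewatch.org/feed/</id>
  <title type="html">Injustice Watch</title>
  <link rel="alternate" type="text/html" href="https://www.injusticewatch.org"/>
  <updated>2020-03-02T17:38:01Z</updated>
</source>"""


def all_text(elem: ET.Element) -> str:
    """Join every non-blank text node under elem (what the merge tags appear to do)."""
    return " ".join(t.strip() for t in elem.itertext() if t.strip())


def source_title(elem: ET.Element) -> str:
    """Return only the <title> child of <source> (what SOURCE:TITLE suggests)."""
    return elem.findtext("atom:title", default="", namespaces=NS).strip()


source = ET.fromstring(SOURCE_XML)
print(all_text(source))      # the id, title, and updated values, space-separated
print(source_title(source))  # Injustice Watch
```

Note that the link element contributes nothing to the concatenated output because its data lives in attributes, which is consistent with the URL of the Injustice Watch homepage not appearing in the campaign.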

I have filed a support request with Mailchimp regarding this.

@benlk
Collaborator

benlk commented Mar 3, 2020

To estimate:

  • Updating Largo's RSS feed
  • Packaging and releasing an update of Largo
  • Identifying which members are on Largo and assisting with upgrading to latest version
  • Documentation for individual sites to update their non-Largo RSS feed outputs

None of this is presently needed.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants