This repository has been archived by the owner on Sep 16, 2021. It is now read-only.

Create XML Sitemap #132

Closed
ElectricMaxxx opened this issue Apr 10, 2014 · 25 comments

@ElectricMaxxx
Member

Since we have access to the routes stored in PHPCR, and the CMF handles routing, it would be easy to create an XML representation of their structure.

But I would suggest not only an XML sitemap; an HTML representation would be doable too. We could add a configuration value for a template that renders nicely formatted lists of routes, maybe with some previews of the content.

The XML generation could be done by SonataSeoBundle. The rest would need to be done by us.

@dbu
Member

dbu commented Apr 14, 2014

the xml sitemap sounds very useful indeed. for an html sitemap, i am less sure. maybe we can just provide the component that walks the phpcr tree, in a way that somebody wanting to build an html sitemap can reuse it. how to actually do that is probably very site specific (what to include / exclude in the sitemap, for example).

@benglass
Member

In this case we would want to add sitemap priority and a boolean exclude from sitemap to the seo metadata.

For an html sitemap I think that knp menu bundle is a better choice for the task (in other words I would advocate for sticking to just XML since that is directly relevant to SEO but a sitemap page on your site is less so and is already handled by knp menu).

@benglass
Member

I would also consider just giving users instructions on possible sitemap solutions, as opposed to trying to implement one. The reason I say that is that there are already two bundles out there that provide sitemap functionality that I can think of: SonataSeo and PrestaSitemap.

SonataSeo is somewhat limited because it is based on SQL queries (unless they have updated it). This makes it a poor fit for CMF documents, because writing a raw SQL query that generates the URL for a CMF document is difficult or not feasible.

PrestaSitemap is the bundle we chose because it is more flexible in the ways you can populate the sitemap, and it provides the concept of "sections", which we use to implement multiple sitemaps for different websites in a multi-host solution.

This bundle could definitely provide the standardized ability to store sitemap related information for dynamic objects like sitemap priority and whether an object should be excluded from the sitemap.

@benglass
Member

@ElectricMaxxx
Member Author

Not only the admin bundle; the sonata-seo-bundle should also provide some functions to show sitemaps.

@dbu
Member

dbu commented Jul 10, 2014

while doing this, we could also look into https://support.google.com/webmasters/answer/2620865?hl=en and #166 to provide language alternatives in the xml sitemap.
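
To make the idea of language alternatives concrete, here is a minimal sketch of a sitemap whose url entries carry xhtml:link alternates with hreflang attributes, which is the markup the linked Google help page describes. It is written in Python purely as a language-neutral illustration; the function name build_sitemap, the entry dictionaries and the example.com URLs are assumptions for this sketch and not part of any CMF or Sonata API.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize the sitemap namespace as the default one and xhtml with a prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(entries):
    """Build a sitemap string from dicts with 'loc' and optional 'alternates'.

    'alternates' maps a language code to the URL of that translation
    (hypothetical input shape, chosen for this sketch).
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["loc"]
        # One xhtml:link per language version of the page.
        for lang, href in entry.get("alternates", {}).items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": lang, "href": href},
            )
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        {
            "loc": "https://example.com/en/about",
            "alternates": {
                "en": "https://example.com/en/about",
                "de": "https://example.com/de/ueber-uns",
            },
        },
    ]))
```

In this layout each language version would get its own url entry listing all alternates, including itself.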

@ElectricMaxxx
Member Author

👍


@ElectricMaxxx
Member Author

Just some more questions:

  1. What should be configurable?
  2. Include/exclude some routes
  3. A global value for priority/changefreq
  4. How to generate lastmod? Do our documents provide any information about it?
  5. Extract/generate the priority from the route/content?

We could do that with extractors, but since @wouterj wanted to deprecate them, how else should we handle it?

From the technical point of view, I wanted to use the sitemap stuff from SeoBundle, but that only provides an abstraction for querying Doctrine. So I will loop through all routes and create the XML using JmsSerializer.

@ElectricMaxxx
Member Author

Btw, using extractors makes no sense when looping through a list of routes. So we will need some mapping/configuration to get the information from each route's content.

@ElectricMaxxx
Member Author

I came to the conclusion that it will be better to start with the content and create the URL from it with the help of the URL generator. That way we can also generate the alternate URLs, as I did in #175.
I also think we should implement a provider mechanism to cover all possible databases or content sources. I would suggest writing a default provider that creates a collection of sitemap entries. It is then up to each provider implementation how to fill the properties of those entries. With this approach, one of the other providers could simply use the sonata-seo-bundle way.
The output of that list should depend on the content type requested in the header. Usually google would request application/json, but why not serve a rendered template when somebody requests text/html?
Just one question: I would provide a configuration option for the URL of the sitemap. Any hints on how to generate a route from it directly in the bundle's extension? Is there something like $container->addRoute()? (I think so, right?)
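
As an illustration of the provider idea, the following sketch shows one possible shape: every provider yields sitemap entries, and a small collector merges them. It is Python used as a language-neutral sketch, not the bundle's design; SitemapEntry, SitemapProvider, DocumentProvider, collect_entries and the url_for callback are all hypothetical names, with url_for standing in for the URL generator mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Protocol


@dataclass
class SitemapEntry:
    """One <url> element of the sitemap (hypothetical shape)."""
    loc: str
    lastmod: Optional[str] = None
    changefreq: Optional[str] = None
    priority: Optional[float] = None


class SitemapProvider(Protocol):
    """Anything that can produce sitemap entries from some content source."""
    def entries(self) -> Iterable[SitemapEntry]: ...


class DocumentProvider:
    """Default-style provider: starts from content and asks a URL generator."""

    def __init__(self, documents: List[dict], url_for: Callable[[dict], str]):
        self.documents = documents
        self.url_for = url_for  # stand-in for the URL generator

    def entries(self) -> Iterable[SitemapEntry]:
        for doc in self.documents:
            yield SitemapEntry(loc=self.url_for(doc), lastmod=doc.get("updated"))


def collect_entries(providers: Iterable[SitemapProvider]) -> List[SitemapEntry]:
    """Merge all providers; the first entry seen for a URL wins."""
    by_loc: Dict[str, SitemapEntry] = {}
    for provider in providers:
        for entry in provider.entries():
            by_loc.setdefault(entry.loc, entry)
    return list(by_loc.values())


if __name__ == "__main__":
    docs = [{"slug": "home", "updated": "2014-07-14"}, {"slug": "news"}]
    provider = DocumentProvider(docs, lambda d: "https://example.com/" + d["slug"])
    for entry in collect_entries([provider]):
        print(entry)
```

Rendering (XML, or a template for text/html) can then work on the collected list without knowing which source each entry came from.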

@dbu
Member

dbu commented Jul 14, 2014

asking users to explicitly register the routing.xml file and define the route in there has the benefit that it's more visible and one could customize the url - though that would be a bad idea with google.

i think the provider idea makes most sense. there can be so many different kinds of logic. maybe we need a visitor pattern or something? the visitor would be passed routes and metadata. then the metadata can be inferred from the content of the route (e.g. a news item does not change once it's published, the homepage updates often, ...) or from some manual data stored somewhere.

@ElectricMaxxx
Member Author

I have one little performance concern:
creating the sitemap by looping through the content and doing one UrlGenerator::generate($content) call per document would cause N+1 queries, right?
Should I mitigate that by querying the routes collection manually?

@dbu
Member

dbu commented Jul 14, 2014

it's going to be slow either way, i fear. i guess the thing to do is provide a command to optionally dump the sitemap to the filesystem, to take load off the db.

but you can experiment with prefetching data - sometimes it helps. sometimes it also hurts more, so definitely try it first. and make it optional.
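
A dump command along these lines could write the generated XML to a temporary file and then swap it into place, so a crawler never reads a half-written sitemap. The sketch below is Python as a language-neutral illustration; dump_sitemap and its parameters are hypothetical names, and how the XML string is produced is left out.

```python
import os
import tempfile


def dump_sitemap(xml: str, target_path: str) -> None:
    """Write the sitemap to target_path by replacing the file in one step."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # Create the temp file next to the target so the final rename stays
    # on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(xml)
        # Swap the finished file into place, overwriting any old dump.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Clean up the temp file if writing or renaming failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as web_dir:
        path = os.path.join(web_dir, "sitemap.xml")
        dump_sitemap("<urlset></urlset>", path)
        print(open(path, encoding="utf-8").read())
```

The web server can then serve the dumped file directly, and the database is only hit when the command runs.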

@ElectricMaxxx ElectricMaxxx modified the milestones: 1.2, 1.1 Oct 7, 2014
@dbu
Member

dbu commented Oct 9, 2014

@ElectricMaxxx did you start any work on this? if not i will probably tackle this soon.

@ElectricMaxxx
Member Author

Not yet; I just prepared the "many-function" for the alternate locales in the other PR.

@ElectricMaxxx
Member Author

@wouterj I had a look into the KunstmaanSitemapBundle (https://github.com/Kunstmaan/KunstmaanSitemapBundle). Because of its strong coupling to the ORM, we can't use the controller. Btw: there is no way to hook into it. So the only things we could use would be the templates and the Twig extensions. Is that enough to justify depending on that bundle?

@wouterj
Member

wouterj commented Nov 3, 2014

Unless the twig extension is very complex, -1 :)

@dbu
Member

dbu commented Nov 3, 2014

or try to refactor the kunstmaan bundle to the point where it can do
what we want. kunstmaan is not using the CmfRoutingBundle, or is it? if
not, we would need to replace the whole part about url generation too.
but if we can refactor and use a substantial part of their bundle then,
i think it would be worth it.

@dbu
Member

dbu commented Nov 3, 2014

oh, actually i am -1 now. looked at composer.json and they not only
require their admin and "node" bundle but also fosuserbundle, which imo
has NOTHING to do with a sitemap. unless they can provide a lot of
valuable things and we find a way to fix (=remove) those dependencies,
i doubt it's worth it. maybe we can steal the concept. or make their
bundle depend on the cmf bundle to eliminate their general code and only
keep the integration with all the other stuff they seem to do :-)

@ElectricMaxxx
Member Author

... and the Twig extensions aren't as powerful as they seemed to be. There are more extensions for hiding a node than for displaying one :-)

@ElectricMaxxx
Member Author

Summary of what we planned in the comments above:

  • create a node visitor that can get the special metadata (lastMod, changeFreq) from the content of a route, e.g. via interfaces
  • a controller that relies on somebody creating a route to its action
  • create a TwigExtension that can call the controller and display the list (maybe I can implement a route root mechanism for that)

@wouterj
Member

wouterj commented Nov 3, 2014

I think you're missing some things:

  • We need a voting system which determines whether a page should be included in the sitemap. E.g. we need a publish workflow voter. Also, pages behind a non-anonymous firewall should not be included. Apart from that, the user might have pages that they want to make accessible only to people who have the link, etc.
  • We need a listener which hooks in before the router and assigns the route to the sitemap controller if the feature is enabled (we need to make this configurable, as someone might want to generate their own sitemap)
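
The voting idea from the first point can be sketched as a list of small checks that must all agree before a page goes into the sitemap. This is Python used as a language-neutral illustration, assuming a unanimous strategy; the voter functions and the page fields (published, requires_login, unlisted) are hypothetical and do not mirror the actual publish workflow API.

```python
from typing import Callable, Iterable, List

Voter = Callable[[dict], bool]


def is_published(page: dict) -> bool:
    """Stand-in for a publish workflow voter."""
    return page.get("published", False)


def is_anonymously_accessible(page: dict) -> bool:
    """Reject pages that sit behind a login."""
    return not page.get("requires_login", False)


def is_listed(page: dict) -> bool:
    """Reject pages that should only be reachable via a direct link."""
    return not page.get("unlisted", False)


def filter_for_sitemap(pages: Iterable[dict], voters: List[Voter]) -> List[dict]:
    """Keep a page only if every voter approves it (unanimous decision)."""
    return [page for page in pages if all(vote(page) for vote in voters)]


if __name__ == "__main__":
    pages = [
        {"path": "/", "published": True},
        {"path": "/draft", "published": False},
        {"path": "/account", "published": True, "requires_login": True},
        {"path": "/secret-offer", "published": True, "unlisted": True},
    ]
    voters = [is_published, is_anonymously_accessible, is_listed]
    print([p["path"] for p in filter_for_sitemap(pages, voters)])
```

Projects with extra rules could append their own voter to the list without touching the existing ones.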

@ElectricMaxxx
Member Author

thanks @wouterj

Why the listener? Would it be enough if somebody creates a custom route and maps it to the controller/action we provide?

@wouterj
Member

wouterj commented Nov 3, 2014

Why the listener? Would it be enough if somebody creates a custom route and maps it to the controller/action we provide?

Yeah, but I don't like using routes to configure a feature. I use a listener for that most of the time.

@ElectricMaxxx ElectricMaxxx mentioned this issue Nov 7, 2014
4 tasks
@ElectricMaxxx
Member Author

solved by #196

5 participants