Testing performance of external graph usage - performance patch #377

Closed
michel-heon wants to merge 27 commits


michel-heon
Member

Testing performance of external graph usage:

What does this pull request do?

This PR partially addresses the network latency problem in communication between a VIVO instance running in server-less mode and its remote triple store.

What's new?

Four classes are affected by this PR:
- RDFServiceSparql - Make the connection test more generic
- JsonServlet - Decrease the number of individuals requested per page
- IndividualListController - Align the number of individuals processed per page with the value defined in JsonServlet
- GetRenderedSearchIndividualsByVClass - Run the addShortViewRenderings method in parallel processing mode
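
To illustrate why parallel processing helps when every rendering costs a network round trip, here is a minimal sketch. It is not the code from this PR: `ParallelShortViews`, `renderShortView` and the simulated latency are hypothetical stand-ins for the per-individual work done in addShortViewRenderings.

```java
import java.util.List;
import java.util.stream.Collectors;

class ParallelShortViews {

    // Hypothetical stand-in for rendering one individual's short view.
    // The sleep simulates one round trip to a remote triple store.
    static String renderShortView(String individualUri, long latencyMillis) {
        try {
            Thread.sleep(latencyMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while rendering", e);
        }
        return "<li>" + individualUri + "</li>";
    }

    // Sequential version: total time grows roughly as count * latency.
    static List<String> renderSequential(List<String> uris, long latencyMillis) {
        return uris.stream()
                .map(uri -> renderShortView(uri, latencyMillis))
                .collect(Collectors.toList());
    }

    // Parallel version: round trips overlap, and collect() keeps the
    // results in the encounter order of the source list.
    static List<String> renderParallel(List<String> uris, long latencyMillis) {
        return uris.parallelStream()
                .map(uri -> renderShortView(uri, latencyMillis))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> uris = List.of("n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8");
        long t0 = System.nanoTime();
        renderSequential(uris, 50);
        long t1 = System.nanoTime();
        renderParallel(uris, 50);
        long t2 = System.nanoTime();
        System.out.printf("sequential: %d ms, parallel: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

Note that `parallelStream()` runs on the common fork-join pool, whose size follows the number of CPU cores, so the gain for blocking I/O is bounded by that pool size. A dedicated executor may be a better fit for a real implementation.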

How should this be tested?

Prerequisites

  • A compute instance (e.g. AWS EC2) hosting a triple store (e.g. Apache Jena Fuseki or Amazon Neptune)
  • A compute instance hosting VIVO in server-less mode
    Note: it is important that the two servers (VIVO and the triple store) each run in their own compute instance (VM), so that network latency can be observed during SPARQL calls

Configuration

  • Ensure that the triple store's communication ports are open and that the network route gives the VIVO instance access to the SPARQL endpoint
  • In VIVO, properly configure the applicationSetup.ttl file, including the triple `:application :hasContentTripleSource :sparqlContentTripleSource ;` and the definition of `:sparqlContentTripleSource`

Here is a sample configuration for Amazon Neptune, taken from our development environment:

# ------------------------------------------------------------------------------
#
# This file specifies the structure of the Vitro application: which modules
# are used, and what parameters they require.
#
# Most Vitro installations will not need to modify this file.
#
# For most installations, only the settings in the runtime.properties file will
# be changed.
#
# ------------------------------------------------------------------------------

@prefix : <http://vitro.mannlib.cornell.edu/ns/vitro/ApplicationSetup#> .
@prefix vitroWebapp: <java:edu.cornell.mannlib.vitro.webapp#> .

# ----------------------------
#
# Describe the application by its implementing class and by references to the
# modules it uses.
#

:application
    a   vitroWebapp:application.ApplicationImpl ,
        vitroWebapp:modules.Application ;
    :hasSearchEngine              :instrumentedSearchEngineWrapper ;
    :hasSearchIndexer             :basicSearchIndexer ;
    :hasImageProcessor            :iioImageProcessor ;
    :hasFileStorage               :ptiFileStorage ;
    :hasContentTripleSource       :sparqlContentTripleSource ;
    :hasTBoxReasonerModule        :jfactTBoxReasonerModule ;
    :hasConfigurationTripleSource :tdbConfigurationTripleSource .
    
# ----------------------------
#
# Image processor module:
#

:iioImageProcessor
    a   vitroWebapp:imageprocessor.imageio.IIOImageProcessor ,
        vitroWebapp:modules.imageProcessor.ImageProcessor .

# ----------------------------
#
# File storage module:
#    The PairTree-inspired implementation is the only standard option.
#    It requires no parameters.
#

:ptiFileStorage
    a   vitroWebapp:filestorage.impl.FileStorageImplWrapper ,
        vitroWebapp:modules.fileStorage.FileStorage .

# ----------------------------
#
# Search engine module:
#    The Solr-based implementation is the only standard option, but it can be
#    wrapped in an "instrumented" wrapper, which provides additional logging
#    and more rigorous life-cycle checking.
#

:instrumentedSearchEngineWrapper
    a   vitroWebapp:searchengine.InstrumentedSearchEngineWrapper ,
        vitroWebapp:modules.searchEngine.SearchEngine ;
    :wraps :solrSearchEngine .

:solrSearchEngine
    a   vitroWebapp:searchengine.solr.SolrSearchEngine ,
        vitroWebapp:modules.searchEngine.SearchEngine .

# ----------------------------
#
# Search indexer module:
#    There is only one standard implementation. You must specify the number of
#    worker threads in the thread pool.
#

:basicSearchIndexer
    a   vitroWebapp:searchindex.SearchIndexerImpl ,
        vitroWebapp:modules.searchIndexer.SearchIndexer ;
    :threadPoolSize "10" .

# ----------------------------
#
# Content triples source module: holds data contents
#    The SDB-based implementation is the default option. It reads its parameters
#    from the runtime.properties file, for backward compatibility.
#
#    Other implementations are based on a local TDB instance, a "standard" SPARQL
#    endpoint, or a Virtuoso endpoint, with parameters as shown.
#

#:sdbContentTripleSource
#    a   vitroWebapp:triplesource.impl.sdb.ContentTripleSourceSDB ,
#        vitroWebapp:modules.tripleSource.ContentTripleSource .

#:tdbContentTripleSource
#    a   vitroWebapp:triplesource.impl.tdb.ContentTripleSourceTDB ,
#        vitroWebapp:modules.tripleSource.ContentTripleSource ;
    # May be an absolute path, or relative to the Vitro home directory.
#    :hasTdbDirectory "tdbContentModels" .

:sparqlContentTripleSource
    a   vitroWebapp:triplesource.impl.sparql.ContentTripleSourceSPARQL ,
        vitroWebapp:modules.tripleSource.ContentTripleSource ;
    # The URI of the SPARQL endpoint for your triple-store.
    :hasEndpointURI "https://vivo-studio-neptune-cluster.cluster-ro-c2o1sdzzfasi.ca-central-1.neptune.amazonaws.com:8182/sparql" ;
    # The URI to use for SPARQL UPDATE calls against your triple-store.
    :hasUpdateEndpointURI "https://vivo-studio-neptune-cluster.cluster-c2o1sdzzfasi.ca-central-1.neptune.amazonaws.com:8182/sparql" .

#:virtuosoContentTripleSource
#    a   vitroWebapp:triplesource.impl.virtuoso.ContentTripleSourceVirtuoso ,
#        vitroWebapp:modules.tripleSource.ContentTripleSource ;
#    # The URI where Virtuoso can be accessed: don't include the /sparql path.
#    :hasBaseURI "http://localhost:8890" ;
#    # The name and password of a Virtuoso account that has the SPARQL_UPDATE role.
#    :hasUsername "USERNAME" ;
#    :hasPassword "PASSWORD" .


# ----------------------------
#
# Configuration triples source module: holds configuration data and user accounts
#    The TDB-based implementation is the only standard option.
#    It requires no parameters.
#

:tdbConfigurationTripleSource
    a   vitroWebapp:triplesource.impl.tdb.ConfigurationTripleSourceTDB ,
        vitroWebapp:modules.tripleSource.ConfigurationTripleSource .

# ----------------------------
#
# TBox reasoner module:
#    The JFact-based implementation is the only standard option.
#    It requires no parameters.
#

:jfactTBoxReasonerModule
    a   vitroWebapp:tboxreasoner.impl.jfact.JFactTBoxReasonerModule ,
        vitroWebapp:modules.tboxreasoner.TBoxReasonerModule .
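
Before starting VIVO, it can help to confirm from the VIVO host that the configured endpoint answers and to get a first impression of the round-trip time. The sketch below is an illustration only, not the connection test implemented in RDFServiceSparql. It assumes Java 11 or later, an endpoint that follows the SPARQL 1.1 Protocol and accepts unauthenticated requests, and a placeholder default URL (`http://localhost:3030/ds/sparql`) that must be replaced with the value of `:hasEndpointURI`.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

class SparqlEndpointCheck {

    // Generic probe query: it does not depend on any particular dataset.
    static final String PROBE_QUERY = "ASK { ?s ?p ?o }";

    // Builds the form-encoded body for a SPARQL 1.1 Protocol query via POST.
    static String formBody(String sparql) {
        return "query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
    }

    // Builds the probe request for the given endpoint URI.
    static HttpRequest buildRequest(String endpointUri) {
        return HttpRequest.newBuilder(URI.create(endpointUri))
                .timeout(Duration.ofSeconds(10))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Accept", "application/sparql-results+json")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(PROBE_QUERY)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder endpoint (assumption); pass the real one as the first argument.
        String endpoint = args.length > 0 ? args[0] : "http://localhost:3030/ds/sparql";
        HttpClient client = HttpClient.newHttpClient();
        long start = System.nanoTime();
        HttpResponse<String> response =
                client.send(buildRequest(endpoint), HttpResponse.BodyHandlers.ofString());
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("HTTP %d in %d ms: %s%n",
                response.statusCode(), elapsedMs, response.body());
    }
}
```

The first request also pays for connection and TLS setup, so running the check a few times gives a more representative figure for the per-query latency.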

Compilation and execution

  1. Start the triple store and empty it of its contents
  2. Load a large set of triples
  3. Compile VIVO without this PR and start it
  4. Observe the slow refresh of the person page
  5. Apply the PR, recompile VIVO and start it
  6. Observe the improvement in the refresh of the person page
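
To make the before/after comparison in steps 4 and 6 less subjective, the page can be requested several times and the median response time compared. The following is a rough sketch under stated assumptions: `PageTiming` is a hypothetical helper, the default URL `http://localhost:8080/vivo/people` is a placeholder for the page under test, and Java 11 or later is required. A plain GET measures only the server response, not browser rendering or follow-up AJAX calls.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Arrays;

class PageTiming {

    // Median of the samples; for an even count, the mean of the two middle values.
    static long median(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    // Runs the action the given number of times and returns elapsed milliseconds per run.
    static long[] timeRuns(int runs, Runnable action) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            action.run();
            elapsed[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Placeholder URL (assumption); pass the real page URL as the first argument.
        String pageUrl = args.length > 0 ? args[0] : "http://localhost:8080/vivo/people";
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(pageUrl)).GET().build();
        long[] samples = timeRuns(5, () -> {
            try {
                client.send(request, HttpResponse.BodyHandlers.discarding());
            } catch (IOException e) {
                throw new IllegalStateException("request failed", e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted", e);
            }
        });
        System.out.println("samples (ms): " + Arrays.toString(samples));
        System.out.println("median (ms): " + median(samples));
    }
}
```

Running it once against the build without the PR and once against the build with it, on the same data, gives two medians that can be compared directly.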

Additional Notes:

For more details on how to perform more formal testing, please refer to the outcome definition

Interested parties

@chenejac

michel-heon and others added 27 commits February 22, 2023 14:51
java 8. It will abort the compilation if the Java context is version 8
                <configuration>
                    <source>JAVA_RELEASE</source>
                    <target>JAVA_RELEASE</target>
                </configuration>
by
<maven.compiler.release>JAVA_RELEASE</maven.compiler.release> in top pom
file
return to 1.13.1
Return to 1.13.1
Return to 1.13.1
Return 1.13.1
Return 1.13.1
Return 1.13.1
fixed message printing bugs on previous version
michel-heon changed the title from "Semantic web" to "Testing performance of external graph usage - performance patch" on Mar 6, 2023
Contributor

litvinovg left a comment

I added some comments, but in general I suggest cleaning up the PR first.

<build>
<plugins>
<plugin>

General note: please remove changes that are not specific to this PR. There are a lot of modifications from other PRs that aren't necessary here.

* I suggest to put the assignment of this variable in runtime.properties
*
*/
private static final int INDIVIDUALS_PER_PAGE = 15;

I'm not sure whether it should be in runtime.properties or whether there is a better place to configure it, but changing the default value for all Vitro and VIVO users doesn't look good to me.
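
One possible shape for such a configurable value is sketched below, purely as an illustration of the suggestion above. The class `PageSizeConfig` and the property name `individualList.individualsPerPage` are hypothetical and do not exist in Vitro or VIVO; the sketch uses a plain `java.util.Properties` object rather than any Vitro configuration API. The existing default stays in effect unless an installation sets the property explicitly.

```java
import java.util.Properties;

class PageSizeConfig {

    // Hypothetical property name; not an existing Vitro/VIVO setting.
    static final String KEY = "individualList.individualsPerPage";

    // Returns the configured page size, or defaultValue when the property
    // is missing, blank, not a number, or not strictly positive.
    static int individualsPerPage(Properties props, int defaultValue) {
        String raw = props.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return defaultValue;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : defaultValue;
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(KEY, "15");
        System.out.println(individualsPerPage(props, 30));
    }
}
```

With this approach, installations that talk to a remote triple store can lower the page size, while everyone else keeps the current behavior.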

@@ -79,864 +79,863 @@
*/
public class RDFServiceSparql extends RDFServiceImpl implements RDFService {

private static final Log log = LogFactory.getLog(RDFServiceImpl.class);

One more general note: please revert all unnecessary formatting changes in this PR. Not only do they make the review much more difficult, but they could also introduce unnecessary merge conflicts with other PRs.

Contributor

chenejac commented Mar 7, 2023

superseded by #378

chenejac closed this Mar 7, 2023