This guide provides the basic steps to set up the LASSO platform in standalone mode on a single, local machine.
We use the running example of a test-driven code search for Base64 implementations by ingesting the popular library Apache Commons Codec into LASSO's executable corpus.
Important: Possible security risks are not taken into consideration, so do not expose your instances.
- Linux (tested using Ubuntu 22.04 LTS); macOS and Windows are untested
- a working Docker installation, preferably running as a non-root user (e.g., https://docs.docker.com/engine/install/ubuntu/)
- Java JDK >= 11 (any free JDK distribution should work)
- the frontend modules (webapps using Angular) are built with Node.js/npm, which is retrieved automatically by a Maven plugin (https://github.com/eirslett/frontend-maven-plugin/)
- LASSO uses Apache Ignite (https://ignite.apache.org/), so several ports are opened automatically to enable cluster (grid) communication
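The Java requirement can be checked before building with a small shell sketch like the one below. The helper names java_major and installed_java_version are hypothetical and not part of LASSO, and the parsing assumes the usual output format of java -version.

```shell
# Hypothetical helpers (not part of LASSO) to check the JDK requirement.

# Print the major version for strings such as "1.8.0_292", "11.0.19" or "17".
java_major() {
  v=${1#1.}         # drop the legacy "1." prefix used up to Java 8
  v=${v%%[!0-9]*}   # keep only the leading digits
  printf '%s\n' "$v"
}

# Read the version string of the java binary found on the PATH.
installed_java_version() {
  java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}

# Usage (requires a JDK on the PATH):
#   [ "$(java_major "$(installed_java_version)")" -ge 11 ] && echo "JDK is recent enough"
```

The version parsing is kept separate from the call to java so that it can be tried on arbitrary version strings.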
The project is managed using Maven by relying on Maven Wrapper (https://maven.apache.org/wrapper/) for building all required modules (your local Maven may work as well).
The following command needs to be executed in the root directory of the repository:
./mvnw -DskipTests \
-Dfrontend.build=embedded \
clean install
The chosen profile (i.e., embedded) for the webapps assumes that LASSO's RESTful webservice will be running on localhost:10222.
For each module, the build artifacts are available as target/*.jar in the corresponding module directory.
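To confirm that the build produced the jars used in the later steps, a sketch along the following lines may help. The helper check_jars is hypothetical and not part of LASSO; the jar names in the usage comment are the ones referenced elsewhere in this guide.

```shell
# Hypothetical helper (not part of LASSO): report which build artifacts exist.
check_jars() {
  missing=0
  for jar in "$@"; do
    if [ -f "$jar" ]; then
      echo "ok      $jar"
    else
      echo "missing $jar"
      missing=1
    fi
  done
  # Non-zero exit status if at least one jar is missing.
  return "$missing"
}

# Usage from the repository root, with the jar names used later in this guide:
#   check_jars crawler/target/crawler-1.0.0-SNAPSHOT.jar \
#              analyzer/target/analyzer-1.0.0-SNAPSHOT-exec.jar \
#              arena/target/arena-1.0.0-SNAPSHOT-exec.jar \
#              service/target/service-1.0.0-SNAPSHOT.jar
```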
In the next step, we set up a new executable corpus, which consists of two components:
- code search index using Solr/Lucene,
- code repository using Sonatype Nexus OSS.
Why? The code search index is populated by LASSO to enable interface-driven code searches.
See detailed instructions in solr.md.
Why? The code repository stores executable artifacts and acts as a proxy for existing artifacts (including Maven Central by default).
See detailed instructions in nexus.md.
Why? We want to demonstrate the ingestion (import) of new artifacts into LASSO's executable corpus.
This uses functionality provided by the crawler module.
In this example, we aim to ingest Apache Commons Codec 1.15 (sources and bytecode) and make it available in the executable corpus.
# set your path to LASSO's repository
export LASSO_REPO=/my/path/lasso/repository
# create working directory (where artifacts are stored)
mkdir lasso_crawler
# run crawler to download commons-codec
java -Dartifacts=commons-codec:commons-codec:1.15:sources \
-Dindexer.work.path=lasso_crawler \
-Dbatch.maven.repo.url=http://localhost:8081/repository/maven-public/ \
-Dlasso.indexer.worker.threads=1 \
-jar $LASSO_REPO/crawler/target/crawler-1.0.0-SNAPSHOT.jar
where
- artifacts takes a '|'-separated list of Maven coordinates (format: groupId:artifactId:version:classifier),
- indexer.work.path=lasso_crawler points to your working directory,
- batch.maven.repo.url points to your Nexus repository,
- lasso.indexer.worker.threads sets the number of worker threads for crawling artifacts.
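Since the artifacts parameter takes a '|'-separated list, passing more than one coordinate requires quoting, because an unquoted '|' is interpreted by the shell as a pipe. The following sketch shows one way to build such a list. The helper join_artifacts is hypothetical and not part of LASSO, and the second coordinate (version 1.14) is only an illustration, not part of the running example.

```shell
# Hypothetical helper (not part of LASSO): join Maven coordinates with '|'.
# The function body runs in a subshell, so changing IFS does not leak out.
join_artifacts() (
  IFS='|'
  printf '%s\n' "$*"
)

# Illustrative list; only the 1.15 coordinate is used in this guide.
ARTIFACTS=$(join_artifacts \
  "commons-codec:commons-codec:1.15:sources" \
  "commons-codec:commons-codec:1.14:sources")

echo "$ARTIFACTS"
# prints commons-codec:commons-codec:1.15:sources|commons-codec:commons-codec:1.14:sources
```

The resulting value can then be passed to the crawler as -Dartifacts="$ARTIFACTS", with the double quotes kept in place.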
Note that the crawler module can also be used to index entire Maven-compatible repositories, including Maven Central, based on Nexus indices (https://maven.apache.org/repository/central-index.html).
It is also possible to ingest git repositories (see examples in pipelines.md).
Why? We aim to analyze (code analytics) and index the previously downloaded artifact(s) to enable code searches.
This uses functionality provided by the analyzer module.
The following command first conducts static code analysis and then writes the results to the Solr index lasso_quickstart to enable code search:
# set your path to LASSO's repository
export LASSO_REPO=/my/path/lasso/repository
# run analyzer (points to directory of crawler above)
java -Xms2g -Xmx2g \
-Dindexer.work.path=lasso_crawler/ \
-Dlasso.indexer.worker.threads=4 \
-Dbatch.job.writer.threads=-1 \
-Dbatch.job.commit.interval=1 \
-Dbatch.solr.url=http://localhost:8983/solr \
-Dbatch.solr.core.candidates=lasso_quickstart \
-jar $LASSO_REPO/analyzer/target/analyzer-1.0.0-SNAPSHOT-exec.jar
where
- indexer.work.path=lasso_crawler/ points to your crawler working directory,
- lasso.indexer.worker.threads sets the number of worker threads for generating Solr documents,
- batch.job.writer.threads sets the number of writer threads for Solr,
- batch.job.commit.interval sets the commit interval for committing Solr documents (batching),
- batch.solr.url=http://localhost:8983/solr sets the Solr URL,
- batch.solr.core.candidates=lasso_quickstart sets the Solr core (i.e., the code search index).
You can now open your web browser and go to http://localhost:8983/solr/#/lasso_quickstart/query to see the results.
When you hit Execute Query, hundreds of documents should appear that describe the code that has been indexed. Two types of documents are present: class documents and method documents.
You can try simple keyword queries using Solr's query syntax, such as the query (i.e., q) name_fq:"Base64" to retrieve all classes similar to Base64. You can also add a filter query (i.e., fq) such as doctype_s:"method" to return only the method documents of the classes found.
See https://solr.apache.org/guide/solr/latest/query-guide/query-syntax-and-parsers.html for Solr's query syntax.
A description of LASSO's index schema is in index.md.
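The same queries can also be sent from the command line. The sketch below assumes that curl is installed and that the core is reachable through Solr's standard select endpoint; the helper solr_select and the variable SOLR_CORE_URL are hypothetical names introduced for illustration.

```shell
# Hypothetical helper (not part of LASSO): query the quickstart core with curl.
SOLR_CORE_URL="http://localhost:8983/solr/lasso_quickstart"

solr_select() {
  # $1 = main query (q), $2 = filter query (fq), $3 = number of rows (default 10)
  # -G sends the URL-encoded parameters as a GET query string.
  curl -sS -G "$SOLR_CORE_URL/select" \
    --data-urlencode "q=$1" \
    --data-urlencode "fq=$2" \
    --data-urlencode "rows=${3:-10}"
}

# Usage (requires the Solr instance from this guide to be running):
#   solr_select 'name_fq:"Base64"' 'doctype_s:"method"' 5
```

Letting curl encode the parameters avoids having to escape the quotes and colons of the query by hand.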
Next, we set up the LASSO platform to run on a single machine. In this case, the platform runs in embedded mode, so the manager node and one worker node both run on the same machine.
The simplest way to get started is to run LASSO's service in Docker.
See detailed instructions in docker.md.
The Dockerfile is located in Dockerfile
LASSO's service can also be run as a local Java application (Spring Boot).
The following commands first set up
- the configuration (executable corpus and users), and
- the working directory in which pipeline script executions and traces are stored.
# create LASSO work directory
mkdir lasso_work
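# create config (copies the example users.json and corpus.json;
# adjust the source path to where the example lasso_config directory lives in your checkout)
mkdir lasso_config
cp lasso_config/users.json lasso_config/
cp lasso_config/corpus.json lasso_config/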
# copy over arena jar
mkdir -p lasso_work/repository/support/
cp arena/target/arena-1.0.0-SNAPSHOT-exec.jar lasso_work/repository/support/arena-1.0.0-SNAPSHOT.jar
# start LASSO in embedded mode (--add-opens arguments are required for Java > 11)
java --add-opens=java.base/jdk.internal.access=ALL-UNNAMED \
--add-opens=java.base/jdk.internal.misc=ALL-UNNAMED \
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED \
--add-opens=java.base/sun.util.calendar=ALL-UNNAMED \
--add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED \
--add-opens=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED \
--add-opens=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED \
--add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED \
--add-opens=java.base/java.io=ALL-UNNAMED \
--add-opens=java.base/java.nio=ALL-UNNAMED \
--add-opens=java.base/java.net=ALL-UNNAMED \
--add-opens=java.base/java.util=ALL-UNNAMED \
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED \
--add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED \
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED \
--add-opens=java.base/java.lang=ALL-UNNAMED \
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED \
--add-opens=java.base/java.math=ALL-UNNAMED \
--add-opens=java.sql/java.sql=ALL-UNNAMED \
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED \
--add-opens=java.base/java.time=ALL-UNNAMED \
--add-opens=java.base/java.text=ALL-UNNAMED \
--add-opens=java.management/sun.management=ALL-UNNAMED \
--add-opens=java.desktop/java.awt.font=ALL-UNNAMED \
-server -ea -Xms2G -Xmx4G \
-Dserver.port=10222 -Djava.net.preferIPv4Stack=true -Dcluster.nodeId=lasso-quickstart -Dcluster.embedded=true \
-Dthirdparty.docker.uid=$(id -u) -Dthirdparty.docker.gid=$(id -g) \
-Dlasso.workspace.root="$PWD/lasso_work/" \
-Dusers="file:$PWD/lasso_config/users.json" \
-Dcorpus="file:$PWD/lasso_config/corpus.json" \
-jar service/target/service-1.0.0-SNAPSHOT.jar
Example configuration files (JSON) for the corpus and the users used in this quickstart guide are located in lasso_config.
Apart from the verbose add-opens arguments needed to run the platform on Java >= 11, here is a quick description of the remaining arguments:
# allocate sufficient memory for LASSO's Ignite cluster (depends on specific workload)
-server -ea -Xms2G -Xmx4G
# configures the Ignite cluster and tells the engine to run in embedded mode
-Dserver.port=10222 -Djava.net.preferIPv4Stack=true -Dcluster.nodeId=lasso-quickstart -Dcluster.embedded=true
# some docker images require the correct user/group ids of the host to avoid access problems
-Dthirdparty.docker.uid=$(id -u) -Dthirdparty.docker.gid=$(id -g)
# set working directory for LASSO in which executions/traces are stored
-Dlasso.workspace.root="$PWD/lasso_work/"
# sets the users required to access the webapps / RESTful API
-Dusers="file:$PWD/lasso_config/users.json"
# sets the corpus configuration
-Dcorpus="file:$PWD/lasso_config/corpus.json"
The platform comes with a dashboard to manage and monitor pipeline scripts and to view the results of their execution. It also allows users to search code.
At the time of writing, two webapps are available:
- the new angular GUI based on material design is available at http://localhost:10222/webui/
- the old angular GUI based on bootstrap is available at http://localhost:10222/lasso/
To submit a new script, follow these steps:
- Log in with one of the users defined in users.json.
- Submit a new LSL script pipeline.
To exemplify, the following LSL script pipeline realizes a test-driven code search using the LASSO platform for classes (methods) that realize Base64 encoding. Copy and paste the following script into the LSL script editor in the dashboard and submit it for execution.
dataSource 'lasso_quickstart'
def totalRows = 10
def noOfAdapters = 100
// interface in LQL notation
def interfaceSpec = """Base64{encode(byte[])->byte[]}"""
study(name: 'Base64encode') {
/* select class candidates using interface-driven code search */
action(name: 'select', type: 'Select') {
abstraction('Base64') {
queryForClasses interfaceSpec, 'class-simple'
rows = totalRows
excludeClassesByKeywords(['private', 'abstract'])
excludeTestClasses()
excludeInternalPkgs()
}
}
/* filter candidates by two tests (test-driven code filtering) */
action(name: 'filter', type: 'ArenaExecute') { // filter by tests
containerTimeout = 10 * 60 * 1000L // 10 minutes
specification = interfaceSpec
sequences = [
// parameterised sheet (SSN) with default input parameter values
// expected values are given in first row (oracle)
'testEncode': sheet(base64:'Base64', p2:"user:pass".getBytes()) {
row '', 'create', '?base64'
row 'dXNlcjpwYXNz'.getBytes(), 'encode', 'A1', '?p2'
},
'testEncode_padding': sheet(base64:'Base64', p2:"Hello World".getBytes()) {
row '', 'create', '?base64'
row 'SGVsbG8gV29ybGQ='.getBytes(), 'encode', 'A1', '?p2'
}
]
features = ['cc'] // enable code coverage measurement (class scope)
maxAdaptations = noOfAdapters // how many adaptations to try
dependsOn 'select'
includeAbstractions 'Base64'
profile('myTdsProfile') {
scope('class') { type = 'class' }
environment('java11') {
image = 'maven:3.6.3-openjdk-17' // Java 17
}
}
// match implementations (note no candidates are dropped)
whenAbstractionsReady() {
def base64 = abstractions['Base64']
// define oracle based on expected responses in sequences
def expectedBehaviour = toOracle(srm(abstraction: base64).sequences)
// returns a filtered SRM
def matchesSrm = srm(abstraction: base64)
.systems // select all systems
.equalTo(expectedBehaviour) // functionally equivalent
}
}
/* rank candidates based on functional correctness */
action(name:'rank', type:'Rank') {
// sort by functional similarity (passing tests/total tests) descending
criteria = ['FunctionalSimilarityReport.score:MAX:1'] // more criteria possible
dependsOn 'filter'
includeAbstractions '*'
}
}
The pipeline defines a study block in which three actions are executed:
- select action: The first action selects candidates textually using an interface-driven code search on the index we created earlier in this guide (i.e., lasso_quickstart), see datasources.json.
- filter action: The second action defines two tests (including expected values) using the sequence sheet notation and runs them on the textually selected candidates. The action returns only those candidates that match the expected behaviour defined in the first column of the test sequences.
- rank action: Finally, the third action sorts all candidates in descending order based on their passing rate (passed tests/total tests).
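The expected values used as oracles in the two sheets can be double-checked from the command line. This is a minimal sketch, assuming the base64 command-line tool (available on most Linux distributions and macOS) is installed; the helper b64 is a hypothetical name.

```shell
# Hypothetical helper: Base64-encode a string without a trailing newline.
b64() {
  printf '%s' "$1" | base64
}

b64 'user:pass'     # prints dXNlcjpwYXNz (oracle of testEncode)
b64 'Hello World'   # prints SGVsbG8gV29ybGQ= (oracle of testEncode_padding)
```

The second input is 11 bytes long, which is not a multiple of 3, so its encoding ends with a padding character. This is why the sheet is named testEncode_padding.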
Once the execution has ended, the overview page of all executed scripts in the dashboard offers various ways to obtain the results (e.g., viewing them in a classic search results view, Results, or analyzing the data stored in LASSO's database).
You can filter implementations by selecting a reference implementation from the results (i.e., select an implementation from the Responses tab).
Use our JupyterLab playground to explore and manipulate the SRM data obtained for the running example with Python pandas (https://pandas.pydata.org/).
In addition to vertical scaling of code analysis, the LASSO platform also scales horizontally, that is, it is designed to run on more than one node. See distributed.md for more information.
Here is a quick summary of the three services that were created as part of this guide.
- The code search index (Solr) is located here: http://localhost:8983/solr/lasso_quickstart (dashboard is available here: http://localhost:8983/solr/#/lasso_quickstart/query)
- The artifact repository (Nexus) is located here: http://localhost:8081/repository/maven-public/ (dashboard is available here: http://localhost:8081/)
- The LASSO instance is located here: http://localhost:10222
  - New GUI: http://localhost:10222/webui/
  - Old GUI: http://localhost:10222/lasso/
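To see at a glance whether the three services respond, a sketch like the following can be used. It assumes curl is installed; the helper check_services is a hypothetical name, and probing Solr through a zero-row select query is an assumption made for illustration rather than a LASSO convention.

```shell
# Hypothetical helper (not part of LASSO): print the HTTP status of each service.
check_services() {
  for url in \
    "http://localhost:8983/solr/lasso_quickstart/select?q=*:*&rows=0" \
    "http://localhost:8081/" \
    "http://localhost:10222/webui/"
  do
    # -w prints the status code; a failed connection makes curl exit non-zero.
    status=$(curl -s -o /dev/null -w '%{http_code}' "$url") || status="unreachable"
    printf '%s -> %s\n' "$url" "$status"
  done
}

# Usage (after all services have been started):
#   check_services
```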
LASSO is extensible via its Actions API. See actions.md for more information.