This repository has been archived by the owner on Dec 13, 2021. It is now read-only.

Trying to make the link between Csniper and CQP #3

Open
simonmeoni opened this issue Oct 6, 2017 · 10 comments

Comments

@simonmeoni

I installed everything for CSniper up to the CQP installation:
I downloaded CWB and installed it with the command: ./install-cwb.sh
I modified the CSniper properties and added this line: engine.cqp.executable=path/to/cqp/file
I started Tomcat and reached the website successfully, but I cannot select the CQP engine. There is nothing about it in the logs, so I suspect the mistake is mine. Does anyone have an idea?
Thanks in advance
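Since the configuration in question is a single property, one quick way to rule it out is to check that engine.cqp.executable really resolves to an executable file. A minimal sketch, assuming the CSniper configuration is a standard Java .properties file whose path is passed on the command line; CqpConfigCheck is a hypothetical helper, not part of CSniper.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper (not part of CSniper): sanity-checks the CQP setting
// in a csniper.properties file.
public class CqpConfigCheck {
    static final String CQP_KEY = "engine.cqp.executable";

    // Returns null if the setting looks usable, otherwise a description of the problem.
    public static String check(Path propertiesFile) throws IOException {
        Properties props = new Properties();
        // .properties files are ISO-8859-1 by convention
        try (Reader in = Files.newBufferedReader(propertiesFile, StandardCharsets.ISO_8859_1)) {
            props.load(in);
        }
        String value = props.getProperty(CQP_KEY);
        if (value == null || value.trim().isEmpty()) {
            return CQP_KEY + " is not set";
        }
        Path executable = Paths.get(value.trim());
        if (!Files.isRegularFile(executable)) {
            return CQP_KEY + " does not point to a file: " + executable;
        }
        if (!Files.isExecutable(executable)) {
            return CQP_KEY + " is not executable: " + executable;
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CqpConfigCheck <path/to/csniper.properties>");
            return;
        }
        String problem = check(Paths.get(args[0]));
        System.out.println(problem == null ? "CQP setting looks OK" : problem);
    }
}
```

If this reports no problem, the cause lies elsewhere (for example, in how the corpus was prepared).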

@dodinh
Member

dodinh commented Oct 6, 2017

This sounds like Tomcat could not find the CSniper configuration file.
Are you sure there is no log? Usually, the logs should be at /var/log/tomcat6/catalina.out (Debian-based).

@simonmeoni
Author

I checked the log and there are no errors about CQP. I think my configuration is detected, because the database configuration works (otherwise, I suppose, the CSniper server would crash).
I have attached my Dockerfile and my properties file in case they help.
PS: I am using Tomcat 8 :s. I will try Tomcat 6 and get back to you.
csniper.tar.gz

@simonmeoni
Author

simonmeoni commented Oct 9, 2017

I modified my Dockerfile to use Tomcat 6, and I get this warning during the deployment phase:
csniper_1 | 2017-10-09 08:12:09 WARN [main] (PropertiesLoaderSupport:198) - Could not load properties from class path resource [META-INF/csniper.properties]: class path resource [META-INF/csniper.properties] cannot be opened because it does not exist
I still have the same problem.

(My configuration is attached to this comment.)
csniper.tar.gz

@reckart
Member

reckart commented Oct 9, 2017

That is just a warning that you should be able to safely ignore. CSniper tries to load the properties from the classpath in addition to looking for them in its home directory.
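The pattern described here (an optional classpath resource plus a file in the home directory) can be illustrated with plain java.util.Properties. This is only a sketch of the general idea: CSniper itself loads its configuration through Spring (PropertiesLoaderSupport in the warning above), and the class name LayeredProperties, the home-file path passed in, and the rule that the home file wins on duplicate keys are all assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical illustration, not CSniper's code: merge an optional classpath
// resource with an optional file from a home directory.
public class LayeredProperties {
    public static Properties load(ClassLoader loader, String resource, Path homeFile) throws IOException {
        Properties props = new Properties();
        // getResourceAsStream returns null if the resource is absent;
        // try-with-resources simply skips close() for a null resource.
        try (InputStream in = loader.getResourceAsStream(resource)) {
            if (in == null) {
                // Comparable to the harmless WARN line: the source is just skipped
                System.err.println("WARN: classpath resource " + resource + " not found, skipping");
            } else {
                props.load(in);
            }
        }
        if (Files.isRegularFile(homeFile)) {
            try (InputStream in = Files.newInputStream(homeFile)) {
                props.load(in); // keys from the home file override classpath values
            }
        }
        return props;
    }
}
```

With no META-INF/csniper.properties on the classpath, everything comes from the home-directory file, which is why the warning alone does not break the configuration.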

@simonmeoni
Author

OK, thanks Richard. Can you tell me more about configuring CWB and CQP? Where can I find a test corpus in the CSniper format?

@reckart
Member

reckart commented Oct 9, 2017

We do not have a pre-converted corpus at the moment (that would be a good idea), but here is a page that describes the conversion process and lists several corpora that have been tested:

https://dkpro.github.io/dkpro-csniper/documentation/conversion/

@reckart
Member

reckart commented Oct 9, 2017

It has been a while...

Looking at the code, it seems that a given engine is only offered if a corpus prepared for that engine is actually present:

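// Returns only those engines for which the corpus folder contains an entry
// named after the engine, i.e. <repositoryPath>/<aCorpusId>/<engine.getName()>
public List<SearchEngine> listEngines(String aCorpusId)
{
    File corpusPath = new File(repositoryPath, aCorpusId);
    List<SearchEngine> filteredEngines = new ArrayList<SearchEngine>();
    for (SearchEngine engine : engines) {
        // Skip engines for which no prepared data exists in this corpus
        if (new File(corpusPath, engine.getName()).exists()) {
            filteredEngines.add(engine);
        }
    }
    return filteredEngines;
}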
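To see why no engine shows up for a corpus, the filter above can be reproduced in isolation. A minimal sketch, assuming engines are identified only by their names: "cqp" and "tgrep" are assumed engine names, and the corpus ID SAMPLE and the temporary repository are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone re-implementation of the listEngines() filter, using plain
// engine names instead of SearchEngine objects.
public class EngineFilter {
    public static List<String> listEngines(File repositoryPath, String corpusId, List<String> engineNames) {
        File corpusPath = new File(repositoryPath, corpusId);
        List<String> filtered = new ArrayList<>();
        for (String name : engineNames) {
            // Same test as in CSniper: <repository>/<corpus>/<engine name> must exist
            if (new File(corpusPath, name).exists()) {
                filtered.add(name);
            }
        }
        return filtered;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical repository with one corpus "SAMPLE", prepared only for "cqp"
        Path repo = Files.createTempDirectory("csniper-repo");
        Files.createDirectories(repo.resolve("SAMPLE").resolve("cqp"));
        List<String> engines = Arrays.asList("cqp", "tgrep"); // assumed engine names
        System.out.println(listEngines(repo.toFile(), "SAMPLE", engines)); // prints [cqp]
    }
}
```

In other words, an engine only becomes selectable once something named after it exists inside the corpus folder of the repository; the conversion documentation describes what that folder must contain.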

@simonmeoni
Author

simonmeoni commented Oct 9, 2017

OK, thank you, I will prepare a corpus!

@simonmeoni
Author

I tried to prepare data using this sample: http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/corpus-sample.export.

This is my script:

import de.tudarmstadt.ukp.dkpro.core.api.resources.CompressionMethod;
import de.tudarmstadt.ukp.dkpro.core.io.bincas.SerializedCasWriter;
import de.tudarmstadt.ukp.dkpro.core.io.imscwb.ImsCwbWriter;
import de.tudarmstadt.ukp.dkpro.core.io.negra.NegraExportReader;
import org.apache.uima.UIMAException;
import org.apache.uima.fit.pipeline.SimplePipeline;
import java.io.IOException;

import static org.apache.uima.fit.factory.AnalysisEngineFactory.createPrimitiveDescription;
import static org.apache.uima.fit.factory.CollectionReaderFactory.createDescription;

public class Converter {
    public static void main(String[] args) throws UIMAException, IOException {
        // Collection ID
        String id = args[0];

        // Source file (e.g. tuebadz-5.0.anaphora.export.bz2)
        String source = args[1];

        // Target folder
        String target = args[2];

        SimplePipeline.runPipeline(
                createDescription(NegraExportReader.class,
                        NegraExportReader.PARAM_SOURCE_LOCATION, source,
                        NegraExportReader.PARAM_COLLECTION_ID, id,
                        NegraExportReader.PARAM_LANGUAGE, "de",
                        NegraExportReader.PARAM_ENCODING, "ISO-8859-15",
                        NegraExportReader.PARAM_READ_PENN_TREE, true),

                createPrimitiveDescription(SerializedCasWriter.class,
                        SerializedCasWriter.PARAM_TARGET_LOCATION, target + "/bin",
                        SerializedCasWriter.PARAM_USE_DOCUMENT_ID, true,
                        SerializedCasWriter.PARAM_COMPRESSION, CompressionMethod.XZ),

                createPrimitiveDescription(ImsCwbWriter.class,
                        ImsCwbWriter.PARAM_TARGET_ENCODING, "UTF-8",
                        ImsCwbWriter.PARAM_TARGET_LOCATION, target + "/cqp/",
                        ImsCwbWriter.PARAM_WRITE_TEXT_TAG, true,
                        ImsCwbWriter.PARAM_WRITE_DOCUMENT_TAG, true,
                        ImsCwbWriter.PARAM_WRITE_OFFSETS, true,
                        ImsCwbWriter.PARAM_WRITE_LEMMA, true,
                        ImsCwbWriter.PARAM_WRITE_DOC_ID, false));
    }
}

I used Gradle to resolve the dependencies. However, I have a problem: after executing my jar, I obtain two folders:
SAMPLE
|-- cqp   (UIMA CoNLL files? I don't know)
'-- bin
    '-- 1.ser.xz   (binary files)

(I have attached the output tree to this post: SAMPLE.tar.gz)
The CSniper logs show an undefined error for the corpus I generated.

Can you tell me more about the expected input format?

@dodinh
Member

dodinh commented Oct 11, 2017

What ImsCwbWriter produces here is the input format used by the CWB tools [1].
Your cqp files have to be converted using the tools described at that link.

The easier alternative (documented only in the code, not on the CSniper website... mea culpa) is to set ImsCwbWriter.PARAM_CQP_HOME to the directory in which the CWB tools (cwb-make, cwb-encode) reside. Running the pipeline then automatically creates the encoded CQP files, which have to be moved into the correct directories as outlined at the end of [2].

[1] http://cwb.sourceforge.net/files/CWB_Encoding_Tutorial/node3.html
[2] https://dkpro.github.io/dkpro-csniper/documentation/conversion/#example-conversion
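For the manual route via [1], the two CWB calls can be assembled in Java and started with ProcessBuilder. A minimal sketch: CwbCommands is a hypothetical helper, all paths are placeholders, and the -P/-S attribute names must match the columns and tags that ImsCwbWriter actually wrote, which is assumed here rather than known. Check the flags against your CWB version's help output before relying on them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the cwb-encode and cwb-make command lines.
public class CwbCommands {
    public static List<String> encode(String cwbBin, String dataDir, String inputFile,
            String registryFile, List<String> pAttributes, List<String> sAttributes) {
        List<String> cmd = new ArrayList<>();
        cmd.add(cwbBin + "/cwb-encode");
        cmd.add("-d"); cmd.add(dataDir);       // directory for the encoded corpus data
        cmd.add("-f"); cmd.add(inputFile);     // one-token-per-line input from ImsCwbWriter
        cmd.add("-R"); cmd.add(registryFile);  // registry entry to create
        cmd.add("-c"); cmd.add("utf8");        // matches PARAM_TARGET_ENCODING = "UTF-8"
        for (String p : pAttributes) {         // token columns after the default "word"
            cmd.add("-P"); cmd.add(p);
        }
        for (String s : sAttributes) {         // XML-style structural tags, e.g. <text>
            cmd.add("-S"); cmd.add(s);
        }
        return cmd;
    }

    public static List<String> make(String cwbBin, String registryDir, String corpusId) {
        List<String> cmd = new ArrayList<>();
        cmd.add(cwbBin + "/cwb-make");
        cmd.add("-r"); cmd.add(registryDir);   // registry directory containing the new entry
        cmd.add("-V");                         // validate the generated index files
        cmd.add(corpusId);
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder paths and attribute names, for illustration only
        List<String> enc = encode("/usr/local/bin", "/corpora/data/sample",
                "/out/SAMPLE/cqp/sample.vrt", "/corpora/registry/sample",
                Arrays.asList("pos", "lemma"), Arrays.asList("text"));
        System.out.println(String.join(" ", enc));
        // To run for real: new ProcessBuilder(enc).inheritIO().start().waitFor();
    }
}
```

Building the argument list separately from running it makes it easy to print and inspect the exact commands before anything touches the registry.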


3 participants