Table clipper functionality (norma-0.7.0-alpha) and supporting analysis code #7

Open
wants to merge 69 commits into
base: master
384d5d5
FunnelTest
petermr Jun 8, 2017
b5e12b4
Tidy demos
petermr Jun 8, 2017
6daa21c
SNAPSHOT pom
petermr Jun 10, 2017
aad30f2
Correct apparently changed signature for getRotatedTexts(characterLis…
petermr Jun 10, 2017
9933edd
added minor changes due to upstream SVG changes
petermr Aug 1, 2017
8fe083b
fixed spaces in StyleSpan without fontSize
petermr Aug 9, 2017
a326fe5
transferred table stuff from norma
petermr Aug 9, 2017
9bd59de
added edge cases tables as tests
petermr Aug 9, 2017
bc92d3c
new table code and bypass PDF2SVG
petermr Aug 9, 2017
43ad529
minor tweaks reacting to compile problems
petermr Aug 9, 2017
2658bdd
tests supporting table extraction, funnels,
petermr Aug 9, 2017
162c82e
updated pom to bypass PDF2SVG
petermr Aug 9, 2017
ff85a9f
missing test data
petermr Aug 9, 2017
732153e
added code for fill, panel, rotate; and removed much cruft in tests, …
petermr Aug 11, 2017
f7ce154
summarisation of graphics and text components
petermr Aug 12, 2017
a864ab7
adding cell analysis in tables
petermr Aug 21, 2017
c847093
table row analysis
petermr Aug 21, 2017
c5a7adb
fix or comment out regressing tests
petermr Aug 21, 2017
d9ff472
add ContentBox functionality
petermr Sep 4, 2017
a0f6453
add ContentBox functionality
petermr Sep 4, 2017
fd6eb7d
phrase support for table analysis
petermr Sep 4, 2017
0eea3d3
more table section analysis
petermr Sep 4, 2017
ee631b8
added final changes
petermr Sep 4, 2017
4371435
added table tests
petermr Sep 4, 2017
1674c2d
added GenericRow and RowManager
petermr Sep 10, 2017
a6c6f3b
renamed Ruler => Rule
petermr Sep 10, 2017
7b8e642
renamed Rulers
petermr Sep 10, 2017
189d37f
fixed tests
petermr Sep 10, 2017
298b0aa
fixed tests
petermr Sep 10, 2017
b41df27
moved ContentBox to new package
petermr Sep 22, 2017
184844c
changing PhraseListList to TextChunk and various other renames
petermr Sep 22, 2017
17b0258
added sectioning code and signals for table section separation
petermr Sep 22, 2017
0411831
knock on changes from PhraseLists
petermr Sep 22, 2017
8b62285
fixed bugs in ContentBox
petermr Sep 22, 2017
68872fe
added some package info
petermr Sep 27, 2017
a482fdb
refactored move to svg
petermr Sep 27, 2017
e9ebe71
more changes due to refactoring to svg
petermr Sep 27, 2017
502cf68
MAJOR refactor; moved text functionality to svg and any caches (Conte…
petermr Oct 2, 2017
02ee9c6
MAJOR refactor; text and cache stuff moved to svg
petermr Oct 2, 2017
f7ec7cb
removed indexer/ as never used; Ignored long tests
petermr Oct 3, 2017
03377f2
fixed or ignored tests so they run; after some major deletions
petermr Oct 3, 2017
d92713a
removed most of the OLD stuff from svg2xml and successfully ran tests
petermr Oct 3, 2017
6784e30
post-refactor tweaks
petermr Oct 6, 2017
fac8e39
cleaned pom and ignored long tests
petermr Oct 31, 2017
b70c4e6
ignored long test
petermr Oct 31, 2017
3a87e0e
added PageCropper and tested
petermr Nov 6, 2017
6cc4a12
more tweaks to PageCropper
petermr Nov 8, 2017
25a54b7
added test output for bar plots
petermr Nov 9, 2017
d020917
adjustments (mainly to tests) because of rafactor
petermr Nov 23, 2017
42299c7
fixed failing tests due to upstream changes and missing files
petermr Dec 12, 2017
cb43bd2
fixed merge so it compiles and passes all tests except those which de…
petermr Dec 14, 2017
29e6349
added TableTextStructurer to bridge svghtml text to svg2xml table. A …
petermr Dec 16, 2017
8fd47f7
PageCropper working on TableClipper project example
jkbcm Jan 12, 2018
0b2e8e6
Make pom cross-platform using UTF-8 encoding
jkbcm Jan 15, 2018
16e6df2
Remove trailing space on directory name (NTFS compatibility)
jkbcm Jan 15, 2018
8876071
Add example PDF.js document
jkbcm Jan 19, 2018
6dbb883
added TableTextStructurer to bridge svghtml text to svg2xml table. A …
petermr Dec 16, 2017
e2659a6
Added TableTextStructurer to bridge svghtml text to svg2xml table (pmr)
jkbcm Jan 29, 2018
9100006
hanging changes
petermr Jan 16, 2018
e0ece00
changed dependencies; upstream pdf2svg; downstream cproject;
petermr Jan 17, 2018
d9141f1
added page cropping functionality
petermr Jan 24, 2018
6817302
added Demos in test
petermr Jan 24, 2018
921eb21
reinstated pdf2svg conversion
petermr Jan 24, 2018
46f8274
added cropping
petermr Jan 25, 2018
257d60d
Ensure build is cross-platform copy resources using UTF-8
jkbcm Feb 6, 2018
017b285
Remove trailing space on directory name (NTFS compatibility)
jkbcm Jan 15, 2018
6deff5f
Subtables from indent and split compound columns
jkbcm Feb 19, 2018
191f4f0
Temporary fix for handling for Universal Greek with MathPi font
jkbcm Feb 19, 2018
df76a5c
Default subtable recognition by identation to false
jkbcm Mar 1, 2018
Binary file added demos/plot/101016cell201601002/fulltext.pdf
Binary file not shown.
776 changes: 776 additions & 0 deletions demos/plot/101016cell201601002/svg/fulltext-page1.svg
4,557 changes: 4,557 additions & 0 deletions demos/plot/101016cell201601002/svg/fulltext-page10.svg
5,037 changes: 5,037 additions & 0 deletions demos/plot/101016cell201601002/svg/fulltext-page11.svg
5,918 changes: 5,918 additions & 0 deletions demos/plot/101016cell201601002/svg/fulltext-page12.svg
7,080 changes: 7,080 additions & 0 deletions demos/plot/101016cell201601002/svg/fulltext-page13.svg
4,854 changes: 4,854 additions & 0 deletions demos/plot/101016cell201601002/svg/fulltext-page14.svg
4,646 changes: 4,646 additions & 0 deletions demos/plot/101016cell201601002/svg/fulltext-page2.svg
5,704 changes: 5,704 additions & 0 deletions demos/plot/101016cell201601002/svg/fulltext-page3.svg
4,595 changes: 4,595 additions & 0 deletions demos/plot/101016cell201601002/svg/fulltext-page4.svg
7,084 changes: 7,084 additions & 0 deletions demos/plot/101016cell201601002/svg/fulltext-page5.svg
6,719 changes: 6,719 additions & 0 deletions demos/plot/101016cell201601002/svg/fulltext-page6.svg
5,482 changes: 5,482 additions & 0 deletions demos/plot/101016cell201601002/svg/fulltext-page7.svg
5,930 changes: 5,930 additions & 0 deletions demos/plot/101016cell201601002/svg/fulltext-page8.svg
5,362 changes: 5,362 additions & 0 deletions demos/plot/101016cell201601002/svg/fulltext-page9.svg
8,000 changes: 8,000 additions & 0 deletions demos/plot/101016cell201601002/svg/page4diag.svg

Large diffs are not rendered by default.

Binary file added demos/plot/101016cell201601002a.zip
Binary file not shown.
Binary file added demos/plot/101016cell201601002a/fulltext.pdf
Binary file not shown.
1,305 changes: 1,305 additions & 0 deletions demos/plot/101016cell201601002a/svg/barchart1.10.svg

Large diffs are not rendered by default.

776 changes: 776 additions & 0 deletions demos/plot/101016cell201601002a/svg/fulltext-page1.svg

Large diffs are not rendered by default.

4,557 changes: 4,557 additions & 0 deletions demos/plot/101016cell201601002a/svg/fulltext-page10.svg

Large diffs are not rendered by default.

7,080 changes: 7,080 additions & 0 deletions demos/plot/101016cell201601002a/svg/fulltext-page13.svg

Large diffs are not rendered by default.

4,646 changes: 4,646 additions & 0 deletions demos/plot/101016cell201601002a/svg/fulltext-page2.svg

Large diffs are not rendered by default.

4,595 changes: 4,595 additions & 0 deletions demos/plot/101016cell201601002a/svg/fulltext-page4.svg

Large diffs are not rendered by default.

7,084 changes: 7,084 additions & 0 deletions demos/plot/101016cell201601002a/svg/fulltext-page5.svg

Large diffs are not rendered by default.

4,288 changes: 4,288 additions & 0 deletions demos/plot/22649_Sada_2012-1/svg/fulltext-page1.svg

Large diffs are not rendered by default.

5,524 changes: 5,524 additions & 0 deletions demos/plot/22649_Sada_2012-1/svg/fulltext-page2.svg

Large diffs are not rendered by default.

2,901 changes: 2,901 additions & 0 deletions demos/plot/22649_Sada_2012-1/svg/fulltext-page3.svg

Large diffs are not rendered by default.

3,766 changes: 3,766 additions & 0 deletions demos/plot/22649_Sada_2012-1/svg/fulltext-page4.svg

Large diffs are not rendered by default.

1,648 changes: 1,648 additions & 0 deletions demos/plot/22649_Sada_2012-1/svg/fulltext-page5.svg

Large diffs are not rendered by default.

22 changes: 16 additions & 6 deletions pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -5,22 +5,27 @@
<modelVersion>4.0.0</modelVersion>

<properties>
<svg2xml.version>2.1.0</svg2xml.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<!-- upstream -->
<pdf2svg.version>2.1.0</pdf2svg.version>
<!-- <svghtml.version>1.0.0-SNAPSHOT</svghtml.version> --> <!-- not used -->

<pdf2svg.version>2.4.0-SNAPSHOT</pdf2svg.version>

</properties>

<groupId>org.contentmine</groupId>
<groupId>org.xmlcml</groupId>
<artifactId>svg2xml</artifactId>
<version>${svg2xml.version}</version>
<version>2.4.0-SNAPSHOT</version>
<description>Takes SVG from PDF2SVG and analyzes the domain-independent semantics</description>

<dependencies>
<!-- uses new combined library through pdf2svg -->
<dependency>
<groupId>org.contentmine</groupId>
<groupId>org.xmlcml</groupId>
<artifactId>pdf2svg</artifactId>
<version>${pdf2svg.version}</version>
</dependency>
<!-- org.xmlcml:pdf2svg:jar:2.4.0-SNAPSHOT -->
</dependencies>


Expand All @@ -33,11 +38,12 @@
</resources>

<plugins>
<!--
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
</plugin>

-->
<!-- giant jar -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
Expand Down Expand Up @@ -76,6 +82,10 @@
<mainClass>org.xmlcml.svg2xml.pdf.PDFAnalayzer</mainClass>
<id>svg2xml</id>
</program>
<program>
<mainClass>org.xmlcml.svg2xml.page.PageCropper</mainClass>
<id>cropper</id>
</program>
</programs>
</configuration>
</plugin>
Expand Down
104 changes: 104 additions & 0 deletions src/main/java/org/xmlcml/svg2xml/PDF2SVGConverterWrapper.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
package org.xmlcml.svg2xml;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.io.FileExistsException;
import org.apache.commons.io.FileUtils;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.xmlcml.graphics.svg.SVGSVG;
import org.xmlcml.pdf2svg.PDF2SVGConverter;

/** Thin wrapper for PDF2SVG, so that a new PDF2SVG can be slotted in.
 *
 * @author pm286
 */
public class PDF2SVGConverterWrapper {
    private static final Logger LOG = Logger.getLogger(PDF2SVGConverterWrapper.class);
    private PDF2SVGConverter converter;

    static {
        LOG.setLevel(Level.DEBUG);
    }

    public PDF2SVGConverterWrapper() {
        converter = new PDF2SVGConverter();
    }

    /** Forwards the argument tokens to PDF2SVGConverter exactly as given. */
    public void run(String... pdf2svgArgs) {
        converter.run(pdf2svgArgs);
    }

    public List<SVGSVG> getPageList() {
        return converter.getPageList();
    }

    /** For each child of pdfOrigDir, converts its fulltext.pdf into an svg/
     * directory under the matching child of projectDir, then moves any PNG
     * files produced into a sibling png/ directory.
     */
    public void createSVGFromPDF(File pdfOrigDir, File projectDir) {
        projectDir.mkdirs();
        File[] pdfDirs = pdfOrigDir.listFiles();
        if (pdfDirs == null) {
            // listFiles() returns null when pdfOrigDir is missing or not a directory
            LOG.error("cannot list directory: " + pdfOrigDir);
            return;
        }
        for (File pdfDir : pdfDirs) {
            File projectPdfDir = new File(projectDir, pdfDir.getName());
            projectPdfDir.mkdirs();
            File pdfFile = new File(pdfDir, "fulltext.pdf");
            File svgDir = new File(projectPdfDir, "svg/");
            svgDir.mkdirs();

            // same tokens, in the same order, as before; "-compact" comes last
            String[] pdf2svgArgs = {"-logger", "-infofiles", "-logglyphs",
                    "-outdir", svgDir.toString(), pdfFile.toString(), "-compact"};
            run(pdf2svgArgs);

            File pngDir = new File(projectPdfDir, "png/");
            List<File> pngFiles = new ArrayList<File>(FileUtils.listFiles(svgDir, new String[] {"png"}, false));
            for (File pngFile : pngFiles) {
                movePngFileToDirectory(pngDir, pngFile);
            }
        }
    }

    /** Deletes empty PNG files; moves the rest into pngDir, replacing any file of the same name. */
    private void movePngFileToDirectory(File pngDir, File pngFile) {
        if (FileUtils.sizeOf(pngFile) == 0) {
            pngFile.delete();
            return;
        }
        try {
            FileUtils.moveToDirectory(pngFile, pngDir, true);
        } catch (FileExistsException fee) {
            // a file of that name is already in pngDir: delete it and retry once
            File oldFile = new File(pngDir, pngFile.getName());
            try {
                oldFile.delete();
                FileUtils.moveToDirectory(pngFile, pngDir, true);
            } catch (IOException e) {
                LOG.error("cannot replace file: " + oldFile, e);
            }
        } catch (Exception e) {
            LOG.error("cannot move file: " + pngFile, e);
        }
    }

    /**
     * Converts a PDF and returns one of its pages. Pages are numbered from ONE.
     *
     * @param file the PDF file to convert
     * @param page page number, counted from 1
     * @return the SVG page, or null if page is out of range
     */
    public static SVGSVG getSVGPageFromPDF(File file, int page) {
        PDF2SVGConverterWrapper converter = new PDF2SVGConverterWrapper();
        // pass each option as its own token; a single concatenated string
        // would reach run(String...) as one argument and break on paths with spaces
        converter.run("-outdir", "target", file.toString());
        List<SVGSVG> pages = converter.getPageList();
        return (page < 1 || page > pages.size()) ? null : pages.get(page - 1);
    }

}
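
Two details of `getSVGPageFromPDF` are easy to get wrong: how arguments reach a varargs method such as `run(String...)`, and how a page number counted from one maps onto a zero-based list. The sketch below illustrates both with plain JDK types only. `WrapperSketch`, `receivedArgs` and `pageOrNull` are hypothetical names made up for this illustration; the sketch does not call PDF2SVG, and whether `PDF2SVGConverter` itself splits a single string on whitespace is not visible in this diff.

```java
import java.util.Arrays;
import java.util.List;

class WrapperSketch {

    /** Stand-in for a varargs entry point like run(String...): returns the tokens it was given. */
    static String[] receivedArgs(String... args) {
        return args;
    }

    /** Looks up a page numbered from ONE; returns null when the number is out of range. */
    static <T> T pageOrNull(List<T> pages, int page) {
        return (page < 1 || page > pages.size()) ? null : pages.get(page - 1);
    }

    public static void main(String[] args) {
        String file = "demo/fulltext.pdf"; // hypothetical path

        // One concatenated string arrives as a single array element ...
        System.out.println(receivedArgs("-outdir target " + file).length); // prints 1
        // ... while separate tokens arrive as separate elements.
        System.out.println(receivedArgs("-outdir", "target", file).length); // prints 3

        List<String> pages = Arrays.asList("page1", "page2", "page3");
        System.out.println(pageOrNull(pages, 1)); // prints page1
        System.out.println(pageOrNull(pages, 0)); // prints null
        System.out.println(pageOrNull(pages, 4)); // prints null
    }
}
```

Passing separate tokens also keeps a path containing spaces in one piece, which a whitespace split of a concatenated string would not.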
165 changes: 0 additions & 165 deletions src/main/java/org/xmlcml/svg2xml/builder/GeometryBuilder.java

This file was deleted.
