diff --git a/README b/README index 0e00890..5eef31a 100644 --- a/README +++ b/README @@ -1,8 +1,50 @@ +PURPOSE +The File Analyzer and Metadata Harvester is designed to automate simple, file-based operations performed by cultural heritage institutions. +- Example: count files by type +- Example: test file naming conventions + +The application assembles a toolkit of tasks a user can perform +- Present tasks in a simple User Interface +- Reduce learning curve when deploying a new task +- New tasks are easy for a user to locate + +The modular code makes it easy to deploy new code and to share code. + +APPLICATION DESCRIPTION +- Desktop Application written in Java +- Scans a file system and performs an action on files +- Imports a file, and performs an action on records +- Presents summary results +- Results Merging +- Customizable +- Easy to add additional rules to the application to implement local business rules +- Same UI for all tasks + +OVERVIEW PRESENTATIONS +FILE ANALYZER WIKI +- https://github.com/Georgetown-University-Libraries/File-Analyzer/wiki +WRLC (Washington Research Library Consortium) Forum Presentation - Sept 2013 +- https://docs.google.com/presentation/d/1hqbusyuNcsphFeAmsNuI3120c4iQqBycB9n7RQ6txdA/pub?start=false&loop=false&delayms=3000 +OpenRepositories 2013 Presentation +- http://or2013.net/sessions/focus-your-content-not-ingesting-your-content +- http://or2013.net/sites/or2013.net/files/slides/OR2013%20Focus%20on%20Your%20Content.ppt +Code4Lib Presentation +- https://docs.google.com/presentation/d/1Sq2gqGm58DeuzSTigEofGGci_szqo5rwctgqER2HOKo/pub?start=false&loop=false&delayms=3000 +- https://archive.org/details/Code4LibLightningTalksApr2013TerryBrady +DSpace Tools Overview +- http://prezi.com/wridwe2h0d4a/automating-dspace-folder-creation/?kw=view-wridwe2h0d4a&rc=ref-16497460 +- http://prezi.com/cigoderbzi2g/dspace-ingest-via-file-analyzer/?kw=view-cigoderbzi2g&rc=ref-16497460 +Digitization Tools +- 
https://docs.google.com/document/d/1KopsLHZCN2dxmZoWAV4QUpFETBRnpOID8lpq65I6Ne4/pub + +================================================================================================== + This code has been derived from the NARA File Analyzer and Metadata Harvester which is available at https://github.com/usnationalarchives/File-Analyzer. PREREQUISITES -- JDK 1.6 or higher +- JDK 1.6 or higher (for build) +- JRE 1.6 or higher (for runtime) - Maven (or you will need to compile the modules manually) INSTALLATION @@ -19,11 +61,24 @@ This code will build 3 flavors of the File Analyzer. (3) Demo File Analyzer This version contains extensions illustrating various capabilities of the File Analyzer. This version of the file analyzer is a self-extracting jar file that references both the core and dspace file analyzer jar files. - Sample data will also be added. + This version of the application uses features of Apache Tika and BagIt + +================================================================================================= +License information is contained below. +------------------------------------------------------------------------- + +Copyright (c) 2013, Georgetown University Libraries +All rights reserved. + +Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: -The original license information is contained below. +Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. +Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ------------------------------ +================================================================================================= +The NARA license for the base code is contained below. +------------------------------------------------------------------------- NARA OPEN SOURCE AGREEMENT VERSION 1.3, Based on NASA Open Source Agreement for Government Agencies, as approved by the Open Source Initiative. 
diff --git a/core/src/main/gov/nara/nwts/ftapp/FTDriver.java b/core/src/main/gov/nara/nwts/ftapp/FTDriver.java index 95630dd..08f5c82 100644 --- a/core/src/main/gov/nara/nwts/ftapp/FTDriver.java +++ b/core/src/main/gov/nara/nwts/ftapp/FTDriver.java @@ -2,7 +2,7 @@ import gov.nara.nwts.ftapp.filetest.FileTest; import gov.nara.nwts.ftapp.filter.FileTestFilter; -import gov.nara.nwts.ftapp.importer.DelimitedFileImporter; +import gov.nara.nwts.ftapp.importer.DelimitedFileReader; import gov.nara.nwts.ftapp.importer.Importer; import gov.nara.nwts.ftapp.stats.Stats; import gov.nara.nwts.ftapp.stats.StatsItemConfig; @@ -68,7 +68,9 @@ public Preferences getPreferences() { public boolean hasPreferences() { return getPreferences()!=null; } - + + public void setPreference(String path, String value){ + } protected Vector>batchItems; public static void dumpNode(Node n) { @@ -264,7 +266,7 @@ public void initiateFileTest(File input, File output) { fileTraversal.traverseFile(); } public void loadBatch(File f) throws IOException { - batchItems = DelimitedFileImporter.parseFile(f, "\t", false); + batchItems = DelimitedFileReader.parseFile(f, "\t", false); batchLoaded(); } diff --git a/core/src/main/gov/nara/nwts/ftapp/counter/Cell.java b/core/src/main/gov/nara/nwts/ftapp/counter/Cell.java new file mode 100644 index 0000000..bf9dc47 --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/counter/Cell.java @@ -0,0 +1,115 @@ +package gov.nara.nwts.ftapp.counter; + +import java.text.NumberFormat; +import java.util.ArrayList; +import java.util.List; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +public class Cell { + int row = 0; + int col = 0; + boolean valid = false; + + static Pattern pCell = Pattern.compile("^([A-Z]+)(\\d+)$"); + + Cell(int row, int col) { + this.row = row; + this.col = col; + valid = true; + } + + Cell(String cellname) { + Matcher m = pCell.matcher(cellname); + if (!m.matches()) return; + String colstr=m.group(1); + int row = 
Integer.parseInt(m.group(2)) - 1; + int col = -1; //offset -1 for vector ref + for(int i=0;i 0; val = val / 26) { + val--; //A=1 if not last digit + rem = val % 26; + colstr.append((char)('A'+rem)); + } + return colstr.reverse().toString() + rowstr; + } + + public static final NumberFormat nf = NumberFormat.getIntegerInstance(); + static { + nf.setMinimumIntegerDigits(6); + nf.setGroupingUsed(false); + } + + public String getCellSort() { + return nf.format(row) + "," + nf.format(col); + } + + public static List makeRange(int srow, int scol, int endrow, int endcol) { + ArrayList cells = new ArrayList(); + for(int r=srow; r<=endrow; r++) { + for(int c=scol; c<=endcol; c++) { + cells.add(new Cell(r, c)); + } + } + return cells; + } + + public static void test(String s) { + Cell c = new Cell(s); + System.out.println(s+" "+c.getCellname()+" "+c.row+","+c.col); + } + + public static void main(String[] argv) { + test("A1"); + test("A2"); + test("B1"); + test("B10"); + test("Z1"); + test("Z99"); + test("AA1"); + test("AA1000"); + test("AZ1"); + test("AZ2"); + test("BA1"); + test("BA2"); + test("ZA1"); + test("ZA2"); + test("ZZ1"); + test("ZZ2"); + test("AAA1"); + test("AAA2"); + test("YYY1"); + test("YYY2"); + test("YZZ1"); + test("YZZ2"); + test("ZYZ1"); + test("ZYZ2"); + test("ZZZ1"); + test("ZZZ2"); + test("AAAA1"); + test("AAAA2"); + } +} \ No newline at end of file diff --git a/core/src/main/gov/nara/nwts/ftapp/counter/CellCheck.java b/core/src/main/gov/nara/nwts/ftapp/counter/CellCheck.java new file mode 100644 index 0000000..23f1cc3 --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/counter/CellCheck.java @@ -0,0 +1,35 @@ +package gov.nara.nwts.ftapp.counter; + +import java.util.ArrayList; +import java.util.List; + +public class CellCheck { + ArrayList cells = new ArrayList(); + + CounterCheck check; + CellCheck(CounterCheck check) { + this.check = check; + } + CellCheck(CounterCheck check, Cell cell) { + this.check = check; + cells.add(cell); + } + 
CellCheck(CounterCheck check, List acell) { + this.check = check; + for(Cell cell: acell){ + cells.add(cell); + } + } + List performCheck(CounterData cd) { + ArrayList results = new ArrayList(); + for(Cell cell: cells) { + CheckResult res = check.performCheck(cd, cell, cd.getCellValue(cell)); + results.add(res); + if (res.stat.ordinal() >= CounterStat.ERROR.ordinal()) { + break; + } + } + return results; + } + +} diff --git a/core/src/main/gov/nara/nwts/ftapp/counter/CheckResult.java b/core/src/main/gov/nara/nwts/ftapp/counter/CheckResult.java new file mode 100644 index 0000000..1f4bc58 --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/counter/CheckResult.java @@ -0,0 +1,65 @@ +package gov.nara.nwts.ftapp.counter; + +public class CheckResult { + CounterRec rec; + public Cell cell; + public CounterStat stat; + public String message = ""; + public String newVal; + + CheckResult(Cell cell, CounterStat stat) { + this.rec = CounterRec.CELL; + this.cell = cell; + this.stat = stat; + } + + CheckResult(CounterStat stat) { + this.rec = CounterRec.FILE; + this.stat = stat; + } + + CheckResult setMessage(String message) { + this.message = message; + return this; + } + + CheckResult setNewVal(String newVal) { + this.newVal = newVal; + return this; + } + + static CheckResult createFileStatus(CounterStat stat) { + return new CheckResult(stat); + } + static CheckResult createCellStatus(Cell cell, CounterStat stat) { + return new CheckResult(cell, stat); + } + static CheckResult createCellValid(Cell cell) { + return createCellStatus(cell, CounterStat.VALID); + } + static CheckResult createCellWarning(Cell cell, String message) { + return createCellStatus(cell, CounterStat.WARNING).setMessage(message); + } + static CheckResult createCellInvalid(Cell cell, String message) { + return createCellStatus(cell, CounterStat.INVALID).setMessage(message); + } + static CheckResult createCellInvalidCase(Cell cell, String message) { + return createCellStatus(cell, 
CounterStat.WARNING_CASE).setMessage(message); + } + static CheckResult createCellInvalidPunct(Cell cell, String message) { + return createCellStatus(cell, CounterStat.WARNING_PUNCT).setMessage(message); + } + static CheckResult createCellInvalidTrim(Cell cell, String message) { + return createCellStatus(cell, CounterStat.WARNING_TRIM).setMessage(message); + } + static CheckResult createCellInvalidDate(Cell cell, String message) { + return createCellStatus(cell, CounterStat.WARNING_DATE).setMessage(message); + } + static CheckResult createCellInvalidSum(Cell cell, String message) { + return createCellStatus(cell, CounterStat.INVALID_SUM).setMessage(message); + } + static CheckResult createCellError(Cell cell, String message) { + return createCellStatus(cell, CounterStat.ERROR).setMessage(message); + } + +} \ No newline at end of file diff --git a/core/src/main/gov/nara/nwts/ftapp/counter/ColSumCounterCheck.java b/core/src/main/gov/nara/nwts/ftapp/counter/ColSumCounterCheck.java new file mode 100644 index 0000000..7ad03b2 --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/counter/ColSumCounterCheck.java @@ -0,0 +1,22 @@ +package gov.nara.nwts.ftapp.counter; + +public class ColSumCounterCheck extends SumCounterCheck { + String val; + int sr; + int er; + public ColSumCounterCheck(int sr, int er, String message) { + super(message); + this.sr = sr; + this.er = er; + } + + public int getRangeSum(CounterData cd, Cell cell) { + int rangesum = 0; + for(int r=sr; r<=er; r++) { + rangesum += getIntValue(cd.getCellValue(new Cell(r, cell.col)), 0); + } + return rangesum; + } + +} + diff --git a/core/src/main/gov/nara/nwts/ftapp/counter/CounterCheck.java b/core/src/main/gov/nara/nwts/ftapp/counter/CounterCheck.java new file mode 100644 index 0000000..635ef1a --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/counter/CounterCheck.java @@ -0,0 +1,27 @@ +package gov.nara.nwts.ftapp.counter; + +public class CounterCheck { + String message = "Cell does not match 
specifications"; + CounterStat stat = CounterStat.INVALID; + boolean allowNull = false; + + public CheckResult performCheck(CounterData cd, Cell cell, String cellval) { + return CheckResult.createCellValid(cell); + } + + public CounterCheck setMessage(String message) { + this.message = message; + return this; + } + + public CounterCheck setCounterStat(CounterStat stat) { + this.stat = stat; + return this; + } + + public CounterCheck setAllowNull(boolean b) { + allowNull = b; + return this; + } + +} diff --git a/core/src/main/gov/nara/nwts/ftapp/counter/CounterData.java b/core/src/main/gov/nara/nwts/ftapp/counter/CounterData.java new file mode 100644 index 0000000..b7ecbc2 --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/counter/CounterData.java @@ -0,0 +1,152 @@ +package gov.nara.nwts.ftapp.counter; + +import java.util.ArrayList; +import java.util.TreeMap; +import java.util.Vector; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +public class CounterData { + Pattern pRptType = Pattern.compile("^(.*?)\\(([^\\)]+)\\)( - )?(.*?)$"); + Vector> data; + public ArrayList results = new ArrayList(); + + public CheckResult fileStat = CheckResult.createFileStatus(CounterStat.VALID); + + public String report = ""; + public String version = ""; + public RPT rpt; + public String title = ""; + + public int getMaxRow() { + return data.size(); + } + + public int getMaxCol(int row) { + if (data.size() > row) { + return data.get(row).size()-1; + } + return 0; + } + + public int getLastCol(int row) { + if (data.size() > row) { + Vector drow = data.get(row); + return getLastCol(row, drow.size() - 1); + } + return 0; + } + + public int getLastCol(int row, int mcol) { + if (data.size() > row) { + Vector drow = data.get(row); + if (mcol > drow.size()-1) mcol = drow.size() - 1; + for(int c=mcol; c>=0; c--) { + String s = drow.get(c); + if (s == null) continue; + if (!s.isEmpty()) return c; + } + } + return 0; + } + + public int getLastRow() { + for(int 
r=data.size()-1; r>=0; r--) { + Vector row = data.get(r); + if (row == null) continue; + for(String s: row) { + if (s == null) continue; + if (!s.isEmpty()) return r; + } + } + return 0; + } + + public CounterStat getStat() { + return fileStat.stat; + } + + public String getMessage() { + return fileStat.message; + } + + public CounterData(Vector> data) { + this.data = data; + } + + + public void validate() { + ReportType reportType = identifyReportType(); + if (reportType == null) return; + if (fileStat.stat != CounterStat.VALID && fileStat.stat != CounterStat.UNSUPPORTED_REPORT) return; + results.addAll(reportType.validate(this)); + TreeMap resultCount = new TreeMap(); + CounterStat overall = fileStat.stat; + for(CheckResult cr: results) { + overall = (cr.stat.ordinal() > overall.ordinal()) ? cr.stat : overall; + Integer x = resultCount.get(cr.stat); + x = (x == null) ? 1 : x.intValue() + 1; + resultCount.put(cr.stat, x); + } + StringBuffer buf = new StringBuffer(); + Vector cstats = new Vector(); + for(CounterStat st: resultCount.keySet()) { + cstats.add(0, st); + } + for(CounterStat st: cstats) { + buf.append(st.name()); + buf.append(": "); + buf.append(resultCount.get(st)); + buf.append(" cells; "); + } + fileStat = CheckResult.createFileStatus(overall).setMessage(buf.toString()); + } + + + public ReportType identifyReportType() { + String A1 = getCellValue(Cell.at("A1")); + if (A1 == null) { + fileStat = CheckResult.createFileStatus(CounterStat.INVALID).setMessage("Empty Cell A1"); + return null; + } + + Matcher m = pRptType.matcher(A1); + if (m.matches()) { + report = m.group(1).trim(); + version = m.group(2); + rpt = RPT.createRPT(report, version); + if (rpt == null) { + fileStat = CheckResult.createFileStatus(CounterStat.UNKNOWN_REPORT_TYPE).setMessage("Unrecognized version: "+version); + return null; + } + + title = m.group(4); + } else { + report = A1; + fileStat = CheckResult.createFileStatus(CounterStat.UNKNOWN_REPORT_TYPE).setMessage("No version 
specified"); + return null; + } + + String B1 = getCellValue(Cell.at("B1")); + title += B1; + + ReportType reportType = rpt.reportType; + if (reportType == null) { + fileStat = CheckResult.createFileStatus(CounterStat.UNKNOWN_REPORT_TYPE); + return null; + } + + reportType.init(this); + + if (!reportType.isSupported()) { + fileStat = CheckResult.createFileStatus(CounterStat.UNSUPPORTED_REPORT); + } + return reportType; + } + + public String getCellValue(Cell cell) { + if (cell.row >= data.size() || cell.row < 0) return null; + if (cell.col >= data.get(cell.row).size() || cell.col < 0) return null; + return data.get(cell.row).get(cell.col); + } +} diff --git a/core/src/main/gov/nara/nwts/ftapp/counter/CounterRec.java b/core/src/main/gov/nara/nwts/ftapp/counter/CounterRec.java new file mode 100644 index 0000000..4553618 --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/counter/CounterRec.java @@ -0,0 +1,6 @@ +package gov.nara.nwts.ftapp.counter; + +public enum CounterRec { + FILE, + CELL +} diff --git a/core/src/main/gov/nara/nwts/ftapp/counter/CounterStat.java b/core/src/main/gov/nara/nwts/ftapp/counter/CounterStat.java new file mode 100644 index 0000000..ed744ea --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/counter/CounterStat.java @@ -0,0 +1,17 @@ +package gov.nara.nwts.ftapp.counter; + +public enum CounterStat { + VALID, //Cell conforms to rule + WARNING, //Cell does not conform to spec, but the issue occurs so frequently that it is being passed + WARNING_CASE, //Cell has case mismatch + WARNING_PUNCT, //Cell has trailing punctuation + WARNING_TRIM, //Cell has extra whitespace + WARNING_DATE, //Cell date is not in proper format + INVALID_BLANK, //Non-blank value expected + INVALID_SUM, //Invalid sum found + INVALID, //Cell does not conform to rule + ERROR, //Processing error interpreting rule + UNSUPPORTED_REPORT, //Code not yet implemented for this report type + UNKNOWN_REPORT_TYPE, //Report type does not match known types + UNSUPPORTED_FILE, //File cannot be broken into 
cells for processing +} diff --git a/core/src/main/gov/nara/nwts/ftapp/counter/DateCounterCheck.java b/core/src/main/gov/nara/nwts/ftapp/counter/DateCounterCheck.java new file mode 100644 index 0000000..7599756 --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/counter/DateCounterCheck.java @@ -0,0 +1,77 @@ +package gov.nara.nwts.ftapp.counter; + +import java.text.ParseException; +import java.text.SimpleDateFormat; +import java.util.Date; +import java.util.GregorianCalendar; + +class DateCounterCheck extends CounterCheck { + String val; + String fmtDisp; + String fmtParse; + SimpleDateFormat dfParse; + SimpleDateFormat dfDisp; + static SimpleDateFormat defdf = new SimpleDateFormat("MM/dd/yyyy"); + static SimpleDateFormat def2df = new SimpleDateFormat("yy-MMM"); + static Date y2000 = new GregorianCalendar(2000,0,1).getTime(); //month is 0-based, so 0 = January + static { + defdf.set2DigitYearStart(y2000); + def2df.set2DigitYearStart(y2000); + } + + public DateCounterCheck(String fmtDisp, String fmtParse, String message) { + this.fmtDisp = fmtDisp; + this.fmtParse = fmtParse; + this.message = message; + this.dfParse = new SimpleDateFormat(fmtParse); + this.dfParse.set2DigitYearStart(y2000); + this.dfDisp = new SimpleDateFormat(fmtDisp); + } + + public DateCounterCheck(String fmtDisp, String message) { + this(fmtDisp, fmtDisp, message); + } + public Date getDate(String cellval, SimpleDateFormat cdf) { + Date date = null; + try { + date = cdf.parse(cellval); + } catch (ParseException e) { + } + return date; + } + + public Date getDate(String cellval) { + Date date = getDate(cellval, dfParse); + if (date == null) { + date = getDate(cellval, defdf); + } + if (date == null) { + date = getDate(cellval, def2df); + } + return date; + } + + @Override + public CheckResult performCheck(CounterData cd, Cell cell, String cellval) { + if (cellval == null) { + return CheckResult.createCellInvalid(cell, message + ". 
Null cell value"); + } + if (cellval.isEmpty()) { + return CheckResult.createCellInvalid(cell, message + ". Empty cell value"); + } + Date date = getDate(cellval); + + if (date == null) { + return CheckResult.createCellInvalid(cell, "Date parse error"); + } + + String s = dfDisp.format(date); + if (s.equals(cellval)) { + return CheckResult.createCellValid(cell); + } + + return CheckResult.createCellInvalidDate(cell, "Date in improper format").setNewVal(s); + } + +} + diff --git a/core/src/main/gov/nara/nwts/ftapp/counter/IntCounterCheck.java b/core/src/main/gov/nara/nwts/ftapp/counter/IntCounterCheck.java new file mode 100644 index 0000000..8af1311 --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/counter/IntCounterCheck.java @@ -0,0 +1,37 @@ +package gov.nara.nwts.ftapp.counter; + +public class IntCounterCheck extends CounterCheck { + String val; + public IntCounterCheck(String message) { + this.message = message; + } + + public Integer getIntValue(String cellval) { + cellval = (cellval == null) ? "" : cellval.replaceAll("\\.\\d+$", ""); + try { + return Integer.parseInt(cellval); + } catch(NumberFormatException e) { + return null; + } + } + + public Integer getIntValue(String cellval, int def) { + cellval = (cellval == null) ? 
"" : cellval.replaceAll("\\.\\d+$", ""); + try { + return Integer.parseInt(cellval); + } catch(NumberFormatException e) { + return def; + } + } + + @Override + public CheckResult performCheck(CounterData cd, Cell cell, String cellval) { + if (getIntValue(cellval) == null) { + return CheckResult.createCellInvalid(cell, "Cells should be numeric"); + } + + return CheckResult.createCellValid(cell); + } + +} + diff --git a/core/src/main/gov/nara/nwts/ftapp/counter/IntOptCounterCheck.java b/core/src/main/gov/nara/nwts/ftapp/counter/IntOptCounterCheck.java new file mode 100644 index 0000000..0519e9c --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/counter/IntOptCounterCheck.java @@ -0,0 +1,14 @@ +package gov.nara.nwts.ftapp.counter; + +public class IntOptCounterCheck extends IntCounterCheck { + + public IntOptCounterCheck(String message) { + super(message); + } + + public CheckResult performCheck(CounterData cd, Cell cell, String cellval) { + if (cellval.isEmpty()) return CheckResult.createCellValid(cell); + return super.performCheck(cd, cell, cellval); + } + +} diff --git a/core/src/main/gov/nara/nwts/ftapp/counter/PatternCounterCheck.java b/core/src/main/gov/nara/nwts/ftapp/counter/PatternCounterCheck.java new file mode 100644 index 0000000..c77012a --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/counter/PatternCounterCheck.java @@ -0,0 +1,43 @@ +package gov.nara.nwts.ftapp.counter; + +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +class PatternCounterCheck extends CounterCheck { + Pattern patt; + Pattern fixmatch; + String rep; + public PatternCounterCheck(Pattern patt) { + this.patt = patt; + } + public PatternCounterCheck(Pattern patt, Pattern fixmatch, String rep) { + this.patt = patt; + this.rep = rep; + this.fixmatch = fixmatch; + } + @Override + public CheckResult performCheck(CounterData cd, Cell cell, String cellval) { + if (patt == null || cellval == null) { + return CheckResult.createCellInvalid(cell, message); + } + Matcher m 
= patt.matcher(cellval); + if (m.matches()) { + return CheckResult.createCellValid(cell); + } + + if (fixmatch != null && rep != null) { + m = fixmatch.matcher(cellval); + String newVal = m.replaceAll(rep); + m = patt.matcher(newVal); + if (m.matches()) { + return CheckResult.createCellStatus(cell, CounterStat.WARNING).setMessage(message).setNewVal(newVal); + } else { + return CheckResult.createCellInvalid(cell, message); + } + } + + CheckResult res = CheckResult.createCellStatus(cell, stat).setMessage(message); + return res; + } + +} diff --git a/core/src/main/gov/nara/nwts/ftapp/counter/REV.java b/core/src/main/gov/nara/nwts/ftapp/counter/REV.java new file mode 100644 index 0000000..28c308d --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/counter/REV.java @@ -0,0 +1,12 @@ +package gov.nara.nwts.ftapp.counter; + +public enum REV { + NA,R1,R3,R4; + + public static REV find(String s) { + for(REV rev: REV.values()) { + if (rev.name().equals(s)) return rev; + } + return REV.NA; + } +} diff --git a/core/src/main/gov/nara/nwts/ftapp/counter/RPT.java b/core/src/main/gov/nara/nwts/ftapp/counter/RPT.java new file mode 100644 index 0000000..f7951d7 --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/counter/RPT.java @@ -0,0 +1,82 @@ +package gov.nara.nwts.ftapp.counter; + +import gov.nara.nwts.ftapp.counterReport.BookReport1R4; +import gov.nara.nwts.ftapp.counterReport.BookReport2; +import gov.nara.nwts.ftapp.counterReport.BookReport2R4; +import gov.nara.nwts.ftapp.counterReport.DatabaseReport1; +import gov.nara.nwts.ftapp.counterReport.DatabaseReport1R4; +import gov.nara.nwts.ftapp.counterReport.DatabaseReport3; +import gov.nara.nwts.ftapp.counterReport.JournalReport1; +import gov.nara.nwts.ftapp.counterReport.JournalReport1R4; +import gov.nara.nwts.ftapp.counterReport.UnknownReport; + +/** + * R3: http://www.projectcounter.org/r3/Release3D9.pdf + * R4: http://www.projectcounter.org/r4/COPR4.pdf + * JournalReport1 + * JournalReport1a + * JournalReport1GOA (R4) + * 
JournalReport2 + * JournalReport3 + * JournalReport3Mobile + * JournalReport4 + * JournalReport5 + * DatabaseReport1 + * DatabaseReport2 + * DatabaseReport3 (R3) + * PlatformReport1 (R4) + * BookReport1 + * BookReport2 + * BookReport3 + * BookReport4 + * BookReport5 + * MultimediaReport1 + * MultimediaReport2 + * TitleReport1 + * TitleReport2 + * TitleReport3 + * TitleReport3Mobile + * + * ConsortiumReport1 + * ConsortiumReport2 + * ConsortiumReport3 + * + * @author twb27 + * + */ +public enum RPT { + UNKNOWN("",REV.NA, new UnknownReport()), + JR1_R3(JournalReport1.NAME, REV.R3, new JournalReport1()), + JR1_R4(JournalReport1.NAME, REV.R4, new JournalReport1R4()), + DR1_R3(DatabaseReport1.NAME, REV.R3, new DatabaseReport1()), + DR1_R4(DatabaseReport1.NAME, REV.R4, new DatabaseReport1R4()), + DR3_R3(DatabaseReport3.NAME, REV.R3, new DatabaseReport3()), + BR2_R1(BookReport2.NAME, REV.R1, new BookReport2()), + BR1_R4(BookReport1R4.NAME, REV.R4, new BookReport1R4()), + BR2_R4(BookReport2R4.NAME, REV.R4, new BookReport2R4()), + ; + + public String name; + public REV rev; + public ReportType reportType; + + RPT(String name, REV rev, ReportType reportType) { + this.name = name; + this.rev = rev; + this.reportType = reportType; + } + + static RPT createRPT(String name, REV rev) { + for(RPT rpt: RPT.values()) { + if (rpt.name.equals(name) && rpt.rev == rev) return rpt; + } + return null; + } + + static RPT createRPT(String name, String rev) { + for(RPT rpt: RPT.values()) { + if (rpt.name.equals(name) && rpt.rev.name().equals(rev)) return rpt; + } + return null; + } +} diff --git a/core/src/main/gov/nara/nwts/ftapp/counter/ReportType.java b/core/src/main/gov/nara/nwts/ftapp/counter/ReportType.java new file mode 100644 index 0000000..ac86884 --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/counter/ReportType.java @@ -0,0 +1,172 @@ +package gov.nara.nwts.ftapp.counter; + +import java.util.ArrayList; +import java.util.List; +import java.util.regex.Pattern; + +public 
abstract class ReportType { + + ArrayList checks = new ArrayList(); + + String name; + REV rev; + String title; + + public ReportType(String name, REV rev) { + this(name, rev, null); + } + public ReportType(String name, REV rev, String title) { + this.name = name; + this.rev = rev; + this.title = title; + } + + public final void init(CounterData data) { + initBase(data); + initCustom(data); + } + public void initCustom(CounterData data) { + + } + public final void initBase(CounterData data) { + checks.clear(); + addCheck("A1",new StaticCounterCheck(name + " (" + rev.name() + ")")); + if (title != null) { + addCheck("B1",new StaticCounterCheck(title).setCounterStat(CounterStat.WARNING)); + } + addCheckStandard(data); + } + + protected void addCheck(String cell, CounterCheck ccheck) { + checks.add(new CellCheck(ccheck, new Cell(cell))); + } + + protected void addCheck(int row, int col, CounterCheck ccheck) { + checks.add(new CellCheck(ccheck, new Cell(row, col))); + } + + protected void addCheckRange(CounterCheck ccheck, int srow, int scol, int endrow, int endcol) { + checks.add(new CellCheck(ccheck, Cell.makeRange(srow, scol, endrow, endcol))); + } + + public boolean isSupported() { + return false; + } + List validate(CounterData cd) { + ArrayList results = new ArrayList(); + for(CellCheck check: checks) { + results.addAll(check.performCheck(cd)); + if (results.size() > 0) { + CheckResult last = results.get(results.size()-1); + if (last.stat.ordinal() >= CounterStat.ERROR.ordinal()){ + break; + } + } + } + return results; + } + + public void checkColHeader(CounterData data) { + for(int c=getFirstDataCol(); c <= getLastDataCol(data); c++) { + addCheck(getHeadRow(), c, ReportType.MMMYYYY); + } + } + + public void checkFields(CounterData data, int dr, int fieldCol, String[] fields) { + for(int datarow = dr; data.getLastCol(datarow) > 0; datarow += fields.length){ + addCheckRange(ReportType.NONBLANK, datarow, 0, datarow+fields.length-1, 2); + for(int i=0; i 1) { + 
addCheckRange(new IntCounterCheck("Total must be a number"), getDataRow(), getTotalCol(data) + 1, data.getLastRow(), getTotalCol(data) + getTotalCols().length - 1); + } + } + + public void addCheckStandard(CounterData data) { + if (rev == REV.R3 || rev == REV.R1) { + addCheck("A2", ReportType.NONBLANK); + addCheck("A3", new StaticCounterCheck("Date run:")); + addCheck("A4", ReportType.YYYYMMDD); + } else if (rev == REV.R4) { + addCheck("A2", ReportType.NONBLANK); + //Institutional Identifier, A3 may be blank + addCheck("A4", new StaticCounterCheck("Period Covered by Report")); + addCheck("A5", ReportType.YYYYMMDD_to_YYYYMMDD); + addCheck("A6", new StaticCounterCheck("Date run")); + addCheck("A7", ReportType.YYYYMMDD); + } + + for(int col=0; col c) { + for(FileTest ft: this) { + if (c.isInstance(ft)) { + this.remove(ft); + break; + } + } + } } diff --git a/core/src/main/gov/nara/nwts/ftapp/filetest/BaseNameMatch.java b/core/src/main/gov/nara/nwts/ftapp/filetest/BaseNameMatch.java index c3d3245..98a24eb 100644 --- a/core/src/main/gov/nara/nwts/ftapp/filetest/BaseNameMatch.java +++ b/core/src/main/gov/nara/nwts/ftapp/filetest/BaseNameMatch.java @@ -43,7 +43,7 @@ public void initFilters() { } public String getDescription() { - return "This test reports on file size by base name (no extension) regardless of the directory in a file name is found."; + return "This test reports on file size by base name (no extension) regardless of the directory in which a file name is found."; } } diff --git a/core/src/main/gov/nara/nwts/ftapp/filetest/CountByType.java b/core/src/main/gov/nara/nwts/ftapp/filetest/CountByType.java index 821b611..caffa76 100644 --- a/core/src/main/gov/nara/nwts/ftapp/filetest/CountByType.java +++ b/core/src/main/gov/nara/nwts/ftapp/filetest/CountByType.java @@ -37,6 +37,8 @@ public void initFilters() { } public String getDescription() { - return "This test counts the number of files found by file extension."; + return "This test counts the number of files 
found by file extension.\n" + + "A report will be generated listing the number of files found for each extension" + + " as well as a cumulative number of bytes for files of each type."; } } diff --git a/core/src/main/gov/nara/nwts/ftapp/filetest/CounterValidation.java b/core/src/main/gov/nara/nwts/ftapp/filetest/CounterValidation.java new file mode 100644 index 0000000..76d8e56 --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/filetest/CounterValidation.java @@ -0,0 +1,247 @@ +package gov.nara.nwts.ftapp.filetest; + +import gov.nara.nwts.ftapp.counter.CheckResult; +import gov.nara.nwts.ftapp.counter.CounterData; +import gov.nara.nwts.ftapp.counter.CounterRec; +import gov.nara.nwts.ftapp.counter.CounterStat; +import gov.nara.nwts.ftapp.counter.REV; +import gov.nara.nwts.ftapp.ActionResult; +import gov.nara.nwts.ftapp.FTDriver; +import gov.nara.nwts.ftapp.Timer; +import gov.nara.nwts.ftapp.filetest.DefaultFileTest; +import gov.nara.nwts.ftapp.filter.CSVFilter; +import gov.nara.nwts.ftapp.filter.CounterFilter; +import gov.nara.nwts.ftapp.filter.TxtFilter; +import gov.nara.nwts.ftapp.ftprop.FTPropEnum; +import gov.nara.nwts.ftapp.importer.DelimitedFileReader; +import gov.nara.nwts.ftapp.importer.Importer; +import gov.nara.nwts.ftapp.stats.Stats; +import gov.nara.nwts.ftapp.stats.StatsGenerator; +import gov.nara.nwts.ftapp.stats.StatsItem; +import gov.nara.nwts.ftapp.stats.StatsItemConfig; +import gov.nara.nwts.ftapp.stats.StatsItemEnum; + +import java.io.File; +import java.io.IOException; +import java.util.HashSet; +import java.util.Set; +import java.util.Vector; + + +/** + * Validate COUNTER usage reports in delimited (CSV or TXT) files, reporting file-level and cell-level compliance. 
+ * @author TBrady + * + */ +public class CounterValidation extends DefaultFileTest implements Importer { + public static enum Separator{ + DefaultForType(""), + Comma(","), + Tab("\t"), + Semicolon(";"), + Pipe("|"); + + public String separator; + Separator(String s) {separator = s;} + } + public static final String DELIM = "Delimiter"; + public static enum FIXABLE { + NA,YES,NO; + } + + public static enum CounterStatsItems implements StatsItemEnum { + File_Cell(StatsItem.makeStringStatsItem("File [cell row, cell col]", 100)), + Filename(StatsItem.makeStringStatsItem("File", 150)), + Cell(StatsItem.makeStringStatsItem("Cell", 50)), + Rec(StatsItem.makeEnumStatsItem(CounterRec.class, "Record").setWidth(60)), + Stat(StatsItem.makeEnumStatsItem(CounterStat.class, "Compliance").setWidth(170)), + Fixable(StatsItem.makeEnumStatsItem(FIXABLE.class, "Fixable").setWidth(40)), + Report(StatsItem.makeStringStatsItem("Counter Report", 150)), + Version(StatsItem.makeEnumStatsItem(REV.class, "Rev").setWidth(40)), + Message(StatsItem.makeStringStatsItem("Message", 400)), + CellValue(StatsItem.makeStringStatsItem("Cell Value", 250)), + Replacement(StatsItem.makeStringStatsItem("Replacement", 250)), + ; + StatsItem si; + CounterStatsItems(StatsItem si) {this.si=si;} + public StatsItem si() {return si;} + } + + public static enum Generator implements StatsGenerator { + INSTANCE; + class CounterStats extends Stats { + public CounterStats(String key) { + super(details, key); + this.setVal(CounterStatsItems.Rec, CounterRec.FILE); + } + public CounterStats(File f, String cellname) { + super(details, getKey(f, cellname)); + this.setVal(CounterStatsItems.Rec, CounterRec.CELL); + } + + } + public CounterStats create(String key) {return new CounterStats(key);} + public CounterStats create(File f, String cellname) {return new CounterStats(f, cellname);} + } + public static StatsItemConfig details = StatsItemConfig.create(CounterStatsItems.class); + + long counter = 1000000; + boolean 
showValid = false; + + Vector<String> files = new Vector<String>(); + Set<String> reportName = new HashSet<String>(); + + public CounterValidation(FTDriver dt) { + super(dt); + this.ftprops.add(new FTPropEnum(dt, this.getClass().getName(), DELIM, "delim", + "Delimiter character separating fields - if not default", Separator.values(), Separator.Comma)); + } + public String toString() { + return "Counter Compliance - CSV"; + } + public String getKey(File f) { + return f.getName(); + } + + + public static String getKey(File f, String cellname) { + return f.getName() + " [" + cellname + "]"; + } + + public String getShortName(){return "Counter";} + + + public String getSeparator(File f, String ext) { + Separator sep = (Separator)this.getProperty(DELIM, Separator.DefaultForType); + if (sep != Separator.DefaultForType) return sep.separator; + if (ext.equals("CSV")) return ","; + if (ext.equals("TXT")) return "\t"; + return "\t"; + } + + public Vector<Vector<String>> getDataFromFile(File f, Stats s) { + String ext = getExt(f); + String sep = getSeparator(f, ext); + try { + Vector<Vector<String>> data = new Vector<Vector<String>>(); + if (ext.equals("CSV") || ext.endsWith("TXT")) { + data = DelimitedFileReader.parseFile(f, sep); + } + return data; + } catch (IOException e) { + s.setVal(CounterStatsItems.Stat, CounterStat.UNSUPPORTED_FILE); + s.setVal(CounterStatsItems.Message, e.toString()); + } catch (Exception e) { + e.printStackTrace(); + s.setVal(CounterStatsItems.Stat, CounterStat.UNSUPPORTED_FILE); + s.setVal(CounterStatsItems.Message, e.toString()); + } + return null; + } + + + public Object fileTest(File f) { + Stats s = getStats(f); + + Vector<Vector<String>> data = getDataFromFile(f, s); + if (data == null) return s.getVal(CounterStatsItems.Stat); + + CounterData cd = new CounterData(data); + cd.validate(); + + if (cd.report != null) { + s.setVal(CounterStatsItems.Report, cd.report); + reportName.add(cd.report); + } + if (cd.version != null) s.setVal(CounterStatsItems.Version, REV.find(cd.version)); + s.setVal(CounterStatsItems.Filename, f.getName()); + 
files.add(f.getName()); + + if (cd.rpt == null) { + s.setVal(CounterStatsItems.Stat, CounterStat.UNSUPPORTED_REPORT); + s.setVal(CounterStatsItems.Message, "Report Type could not be identified"); + } else { + setCellStats(f, cd); + + s.setVal(CounterStatsItems.Stat, cd.getStat()); + s.setVal(CounterStatsItems.CellValue, cd.title); + s.setVal(CounterStatsItems.Message, cd.getMessage()); + } + + return s.getVal(CounterStatsItems.Stat); + } + + void setCellStats(File f, CounterData cd) { + for(CheckResult result: cd.results) { + if (result.stat == CounterStat.VALID && !showValid) continue; + Stats stat = Generator.INSTANCE.create(f, result.cell.getCellSort()); + this.dt.types.put(stat.key, stat); + stat.setVal(CounterStatsItems.Stat, result.stat); + stat.setVal(CounterStatsItems.Report, cd.rpt.name); + stat.setVal(CounterStatsItems.Version, cd.rpt.rev); + stat.setVal(CounterStatsItems.Filename, f.getName()); + stat.setVal(CounterStatsItems.Cell, result.cell.getCellname()); + stat.setVal(CounterStatsItems.Message, result.message); + + stat.setVal(CounterStatsItems.Replacement, result.newVal == null ? "" : result.newVal); + + String cellval = ""; + if (result.cell != null) { + String s = cd.getCellValue(result.cell); + cellval = (s == null) ? "" : s; + } + stat.setVal(CounterStatsItems.CellValue, cellval); + + FIXABLE fix = FIXABLE.NA; + if (result.stat != CounterStat.VALID) { + fix = result.newVal == null ? 
FIXABLE.NO : FIXABLE.YES; + } + stat.setVal(CounterStatsItems.Fixable, fix); + } + } + + + public void refineResults() { + getStatsDetails().get(CounterStatsItems.Filename.ordinal()).values = files.toArray(); + getStatsDetails().get(CounterStatsItems.Report.ordinal()).values = reportName.toArray(); + showValid = false; + } + + public void init() { + files.clear(); + } + + public Stats createStats(String key){ + return Generator.INSTANCE.create(key); + } + public StatsItemConfig getStatsDetails() { + return details; + } + + public String getDescription() { + return "Verify Counter Compliance (V3 or V4) of a supported report"; + } + + public void initFilters() { + filters.add(new CounterFilter()); + filters.add(new CSVFilter()); + filters.add(new TxtFilter()); + } + @Override + public ActionResult importFile(File selectedFile) throws IOException { + showValid = true; + Timer timer = new Timer(); + dt.types.clear(); + init(); + fileTest(selectedFile); + refineResults(); + return new ActionResult(selectedFile, selectedFile.getName(), this.toString(), details, dt.types, true, timer.getDuration()); + } + @Override + public boolean allowForceKey() { + return false; + } + + + +} diff --git a/core/src/main/gov/nara/nwts/ftapp/filetest/DigitalDerivatives.java b/core/src/main/gov/nara/nwts/ftapp/filetest/DigitalDerivatives.java new file mode 100644 index 0000000..24b6334 --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/filetest/DigitalDerivatives.java @@ -0,0 +1,217 @@ +package gov.nara.nwts.ftapp.filetest; + +import gov.nara.nwts.ftapp.FTDriver; +import gov.nara.nwts.ftapp.filetest.DefaultFileTest; +import gov.nara.nwts.ftapp.filter.DefaultFileTestFilter; +import gov.nara.nwts.ftapp.ftprop.FTPropString; +import gov.nara.nwts.ftapp.stats.Stats; +import gov.nara.nwts.ftapp.stats.StatsGenerator; +import gov.nara.nwts.ftapp.stats.StatsItem; +import gov.nara.nwts.ftapp.stats.StatsItemConfig; +import gov.nara.nwts.ftapp.stats.StatsItemEnum; + +import java.io.File; +import 
java.util.Vector; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +/** + * Report, for each file base name, which required and optional derivative file extensions were found. + * @author TBrady + * + */ +class DigitalDerivatives extends DefaultFileTest { + public static enum FOUND { + NOT_FOUND, + FOUND, + FOUND_DUP + ; + } + + public static enum STAT { + INCOMPLETE, + COMPLETE, + EXTRA + } + + private static enum DerivStatsItems implements StatsItemEnum { + Basename(StatsItem.makeStringStatsItem("Basename", 200)), + Stat(StatsItem.makeEnumStatsItem(STAT.class,"Status")), + Count(StatsItem.makeIntStatsItem("Count")), + CountExtra(StatsItem.makeIntStatsItem("Count Extra")), + Extras(StatsItem.makeStringStatsItem("Extra Items")), + ; + + StatsItem si; + DerivStatsItems(StatsItem si) {this.si=si;} + public StatsItem si() {return si;} + } + + public static enum Generator implements StatsGenerator { + INSTANCE; + class DerivStats extends Stats { + public DerivStats(String key) { + super(details, key); + } + + } + public DerivStats create(String key) {return new DerivStats(key);} + } + public static StatsItemConfig details = StatsItemConfig.create(DerivStatsItems.class); + + long counter = 1000000; + + public static final String REGX_MATCH = "regex-match"; + public static final String REGX_REPLACE = "regex-replace"; + public static final String EXT_REQ = "file-extensions-req"; + public static final String EXT_OPT = "file-extensions-opt"; + + public static final String DEF_MATCH = "^(.*)\\.[^\\.]+$"; + public static final String DEF_REPLACE = "$1"; + + public DigitalDerivatives(FTDriver dt) { + super(dt); + ftprops.add(new FTPropString(dt, this.getClass().getSimpleName(), REGX_MATCH, REGX_MATCH, + "Regex pattern to compute basename", DEF_MATCH)); + ftprops.add(new FTPropString(dt, this.getClass().getSimpleName(), REGX_REPLACE, REGX_REPLACE, + "Regex replacement for basename", DEF_REPLACE)); + ftprops.add(new FTPropString(dt, this.getClass().getSimpleName(), EXT_REQ, 
EXT_REQ, + "Required file extensions to report (lower-case, comma separated list)", "tif,jpg,xml")); + ftprops.add(new FTPropString(dt, this.getClass().getSimpleName(), EXT_OPT, EXT_OPT, + "Optional file extensions to report (lower-case, comma separated list)", "")); + extensions = new Vector<String>(); + extensionsReq = new Vector<String>(); + } + + public String toString() { + return "Digital Derivatives"; + } + + String match; + Pattern pMatch; + String replace; + Vector<String> extensions; + Vector<String> extensionsReq; + + public void init() { + + match = getProperty(REGX_MATCH, DEF_MATCH).toString(); + try { + pMatch = Pattern.compile(match); + } catch (Exception e) { + } + replace = getProperty(REGX_REPLACE, DEF_REPLACE).toString(); + + details = StatsItemConfig.create(DerivStatsItems.class); + extensions.clear(); + extensionsReq.clear(); + + String ext = getProperty(EXT_REQ, "").toString(); + for(String s:ext.split(",")) { + if (s.isEmpty()) continue; + s = s.trim().toLowerCase(); + extensions.add(s); + extensionsReq.add(s); + details.addStatsItem(s, StatsItem.makeEnumStatsItem(FOUND.class, s+"*")); + } + ext = getProperty(EXT_OPT, "").toString(); + for(String s:ext.split(",")) { + if (s.isEmpty()) continue; + s = s.trim().toLowerCase(); + extensions.add(s); + details.addStatsItem(s, StatsItem.makeEnumStatsItem(FOUND.class, s)); + } + } + + public String getExt(String test) { + test = test.toLowerCase(); + for(String ext: extensions) { + if (test.endsWith(ext)) return ext; + } + return null; + } + + //public boolean isTestable(File f) { + // return getExt(f.getName()) != null; + //} + + public String getKey(File f) { + String s = f.getName().toLowerCase(); + if (pMatch == null) return s; + Matcher m = pMatch.matcher(s); + if (m.matches()) { + s = m.replaceFirst(replace); + } + return s; + } + + public String getShortName(){return "Deriv";} + + + public Object fileTest(File f) { + Stats s = getStats(f); + String suff = f.getName().toLowerCase().substring(s.key.length()); + 
s.sumVal(DerivStatsItems.Count, 1); + + String ext = getExt(f.getName().toLowerCase()); + if (ext == null) { + s.sumVal(DerivStatsItems.CountExtra, 1); + s.appendVal(DerivStatsItems.Extras, suff+" "); + } else { + StatsItem si = details.getByKey(ext); + if (si == null) return null; + FOUND found = (FOUND) s.getKeyVal(si, FOUND.NOT_FOUND); + if (found == FOUND.NOT_FOUND) { + found = FOUND.FOUND; + } else if (found == FOUND.FOUND) { + found = FOUND.FOUND_DUP; + } + s.setKeyVal(si, found); + } + return s.key; + } + + @Override public void refineResults() { + for(Stats curr: dt.types.values()) { + curr.setVal(DerivStatsItems.Stat, STAT.COMPLETE); + + for(String s: extensionsReq) { + StatsItem si = details.getByKey(s); + if (si == null) continue; + FOUND found = (FOUND)curr.getKeyVal(si, FOUND.NOT_FOUND); + if (found == FOUND.NOT_FOUND) { + curr.setVal(DerivStatsItems.Stat, STAT.INCOMPLETE); + } + } + if (curr.getVal(DerivStatsItems.Stat) == STAT.INCOMPLETE) continue; + if (curr.getIntVal(DerivStatsItems.CountExtra) > 0) { + curr.setVal(DerivStatsItems.Stat, STAT.EXTRA); + continue; + } + for(String s: extensions) { + StatsItem si = details.getByKey(s); + if (si == null) continue; + FOUND found = (FOUND)curr.getKeyVal(si, FOUND.NOT_FOUND); + if (found == FOUND.FOUND_DUP) { + curr.setVal(DerivStatsItems.Stat, STAT.EXTRA); + } + } + } + } + + public Stats createStats(String key){ + return Generator.INSTANCE.create(key); + } + public StatsItemConfig getStatsDetails() { + return details; + } + + public void initFilters() { + filters.add(new DefaultFileTestFilter()); + } + + public String getDescription() { + return ""; + } + +} diff --git a/core/src/main/gov/nara/nwts/ftapp/filetest/ListDirectories.java b/core/src/main/gov/nara/nwts/ftapp/filetest/ListDirectories.java index 1ae27ed..2837564 100644 --- a/core/src/main/gov/nara/nwts/ftapp/filetest/ListDirectories.java +++ b/core/src/main/gov/nara/nwts/ftapp/filetest/ListDirectories.java @@ -70,7 +70,8 @@ public String 
getShortName() { public String getDescription() { - return "Generate a list of directories"; + return "This rule will generate a listing of the unique directory names found within a specific directory.\n" + + "The purpose of this rule is to generate a tracking list when performing a similar batch process on a collection of directories."; } public boolean isTestDirectory() { diff --git a/core/src/main/gov/nara/nwts/ftapp/filetest/ListFiles.java b/core/src/main/gov/nara/nwts/ftapp/filetest/ListFiles.java index a7ea8fe..7de59d3 100644 --- a/core/src/main/gov/nara/nwts/ftapp/filetest/ListFiles.java +++ b/core/src/main/gov/nara/nwts/ftapp/filetest/ListFiles.java @@ -69,7 +69,8 @@ public String getShortName() { public String getDescription() { - return "Generate a list of files"; + return "This rule will generate a listing of the full path to every file it finds.\n" + + "The purpose of this tool is to generate a file list for use in other applications."; } } diff --git a/core/src/main/gov/nara/nwts/ftapp/filetest/LowercaseTest.java b/core/src/main/gov/nara/nwts/ftapp/filetest/LowercaseTest.java index 859fed1..c3c8b83 100644 --- a/core/src/main/gov/nara/nwts/ftapp/filetest/LowercaseTest.java +++ b/core/src/main/gov/nara/nwts/ftapp/filetest/LowercaseTest.java @@ -27,5 +27,9 @@ public File getNewFile(File f, Matcher m) { }); } + public String getDescription() { + return "This test will check that all files are named with only lowercase characters." 
+ + getNameValidationDisclaimer(); + } } diff --git a/core/src/main/gov/nara/nwts/ftapp/filetest/NameChecksum.java b/core/src/main/gov/nara/nwts/ftapp/filetest/NameChecksum.java index c2ca3b0..b6d6a19 100644 --- a/core/src/main/gov/nara/nwts/ftapp/filetest/NameChecksum.java +++ b/core/src/main/gov/nara/nwts/ftapp/filetest/NameChecksum.java @@ -2,6 +2,7 @@ import gov.nara.nwts.ftapp.FTDriver; import gov.nara.nwts.ftapp.YN; +import gov.nara.nwts.ftapp.ftprop.FTPropEnum; import gov.nara.nwts.ftapp.stats.ChecksumStats; import gov.nara.nwts.ftapp.stats.ChecksumStats.ChecksumStatsItems; import gov.nara.nwts.ftapp.stats.Stats; @@ -22,13 +23,28 @@ * @author TBrady * */ -public abstract class NameChecksum extends DefaultFileTest { +public class NameChecksum extends DefaultFileTest { HashMap> keymap; + public static final String ALGORITHM = "Algorithm"; + static enum Algorithm { + MD5("MD5"), + SHA1("SHA-1"), + SHA256("SHA-256"), + SHA384("SHA-384"), + SHA512("SHA-512"); + String algorithm; + Algorithm(String s) {algorithm = s;} + MessageDigest getInstance() throws NoSuchAlgorithmException { + return MessageDigest.getInstance(algorithm); + } + } public NameChecksum(FTDriver dt) { super(dt); keymap = new HashMap>(); + this.ftprops.add(new FTPropEnum(dt, this.getClass().getName(), ALGORITHM, "algorithm", + "Checksum Algorithm", Algorithm.values(), Algorithm.MD5)); } public String toString() { @@ -70,12 +86,11 @@ public void setChecksumKey(String s, ChecksumStats stat) { public String getShortName(){return "Checksum";} - abstract public MessageDigest getMessageDigest() throws NoSuchAlgorithmException; - public String getChecksum(File f) { + Algorithm algorithm = (Algorithm)getProperty(ALGORITHM); FileInputStream fis = null; try { - MessageDigest md = getMessageDigest(); + MessageDigest md = algorithm.getInstance(); fis = new FileInputStream(f); byte[] dataBytes = new byte[1204]; int nread = 0; @@ -120,7 +135,9 @@ public void initFilters() { } public String getDescription() { - 
return "This test reports the checksum for a given filename.\nNote, the checksum will be overwritten if the file is found more than once."; + return "This test reports the checksum for a given filename.\n" + + "The summary report will identify files with the same checksum value.\n" + + "You may select from a number of standard checksum algorithms."; } public void progress(int count) { if (count % 5000 == 0) { diff --git a/core/src/main/gov/nara/nwts/ftapp/filetest/NameMD5Checksum.java b/core/src/main/gov/nara/nwts/ftapp/filetest/NameMD5Checksum.java deleted file mode 100644 index c2a1324..0000000 --- a/core/src/main/gov/nara/nwts/ftapp/filetest/NameMD5Checksum.java +++ /dev/null @@ -1,31 +0,0 @@ -package gov.nara.nwts.ftapp.filetest; - -import gov.nara.nwts.ftapp.FTDriver; - -import java.security.MessageDigest; -import java.security.NoSuchAlgorithmException; - -/** - * Generate MD5 checksums for a set of files. - * @author TBrady - * - */ -class NameMD5Checksum extends NameChecksum { - - public NameMD5Checksum(FTDriver dt) { - super(dt); - } - - public String toString() { - return "Get MD5 Checksum By Name"; - } - public String getShortName(){return "MD5 ";} - - public MessageDigest getMessageDigest() throws NoSuchAlgorithmException { - return MessageDigest.getInstance("MD5"); - } - - public String getDescription() { - return "This test reports the MD5 checksum for a given filename.\nNote, the checksum will be overwritten if the file is found more than once."; - } -} diff --git a/core/src/main/gov/nara/nwts/ftapp/filetest/NameMatch.java b/core/src/main/gov/nara/nwts/ftapp/filetest/NameMatch.java index 4c10ef7..beb9d1b 100644 --- a/core/src/main/gov/nara/nwts/ftapp/filetest/NameMatch.java +++ b/core/src/main/gov/nara/nwts/ftapp/filetest/NameMatch.java @@ -40,7 +40,7 @@ public void initFilters() { } public String getDescription() { - return "This test reports on file size by name regardless of the directory in a file name is found."; + return "This test reports on 
file size by name regardless of the directory in which a file name is found."; } } diff --git a/core/src/main/gov/nara/nwts/ftapp/filetest/NameSha1Checksum.java b/core/src/main/gov/nara/nwts/ftapp/filetest/NameSha1Checksum.java deleted file mode 100644 index 884d2b8..0000000 --- a/core/src/main/gov/nara/nwts/ftapp/filetest/NameSha1Checksum.java +++ /dev/null @@ -1,32 +0,0 @@ -package gov.nara.nwts.ftapp.filetest; - -import gov.nara.nwts.ftapp.FTDriver; -import java.security.MessageDigest; -import java.security.NoSuchAlgorithmException; - -/** - * Generate SHA1 checksums for a set of files. - * @author TBrady - * - */ -class NameSha1Checksum extends NameChecksum { - - public NameSha1Checksum(FTDriver dt) { - super(dt); - } - - public String toString() { - return "Get SHA1 Checksum By Name"; - } - - public String getShortName(){return "SHA1";} - - public MessageDigest getMessageDigest() throws NoSuchAlgorithmException { - return MessageDigest.getInstance("SHA1"); - } - - public String getDescription() { - return "This test reports the SHA1 checksum for a given filename.\nNote, the checksum will be overwritten if the file is found more than once."; - } - -} diff --git a/core/src/main/gov/nara/nwts/ftapp/filetest/NameSha256Checksum.java b/core/src/main/gov/nara/nwts/ftapp/filetest/NameSha256Checksum.java deleted file mode 100644 index 4d1c2bf..0000000 --- a/core/src/main/gov/nara/nwts/ftapp/filetest/NameSha256Checksum.java +++ /dev/null @@ -1,32 +0,0 @@ -package gov.nara.nwts.ftapp.filetest; - -import gov.nara.nwts.ftapp.FTDriver; -import java.security.MessageDigest; -import java.security.NoSuchAlgorithmException; - -/** - * Generate SHA256 checksums for a set of files. 
- * @author TBrady - * - */ -class NameSha256Checksum extends NameChecksum { - - public NameSha256Checksum(FTDriver dt) { - super(dt); - } - - public String toString() { - return "Get SHA-256 Checksum By Name"; - } - - public String getShortName(){return "SHA-256";} - - public MessageDigest getMessageDigest() throws NoSuchAlgorithmException { - return MessageDigest.getInstance("SHA-256"); - } - - public String getDescription() { - return "This test reports the SHA1 checksum for a given filename.\nNote, the checksum will be overwritten if the file is found more than once."; - } - -} diff --git a/core/src/main/gov/nara/nwts/ftapp/filetest/NameValidationTest.java b/core/src/main/gov/nara/nwts/ftapp/filetest/NameValidationTest.java index 6ae665c..b011ff8 100644 --- a/core/src/main/gov/nara/nwts/ftapp/filetest/NameValidationTest.java +++ b/core/src/main/gov/nara/nwts/ftapp/filetest/NameValidationTest.java @@ -185,6 +185,7 @@ public Object fileTest(File f) { for(NameValidationPattern nvp: testPatterns) { RenameDetails det = nvp.checkFile(f); if (det.status == RenameStatus.NEXT) continue; + if (det.status == RenameStatus.SKIP) return det; File newFile = det.getFile(); if (newFile != null) { if (valPatt.checkFile(newFile).status != RenameStatus.VALID) { @@ -222,9 +223,15 @@ public String getShortName() { return shortname + (allowRename ? " Rename" : ""); } + + public String getNameValidationDisclaimer() { + return ((allowRename) ? "\nFILES WILL BE RENAMED IF POSSIBLE." : "") + + "\n\nThe File Analyzer can be re-compiled to allow the actual re-name to take place."; + } + public String getDescription() { - return "This test will check filename case." - + ((allowRename) ? "\nFILES WILL BE RENAMED IF POSSIBLE." : ""); + return "This test will check that a filename conforms to an established set of rules." 
+ getNameValidationDisclaimer() ; } public boolean isTestFiles() { diff --git a/core/src/main/gov/nara/nwts/ftapp/filetest/RandomFileTest.java b/core/src/main/gov/nara/nwts/ftapp/filetest/RandomFileTest.java index 91c4919..4e70d7d 100644 --- a/core/src/main/gov/nara/nwts/ftapp/filetest/RandomFileTest.java +++ b/core/src/main/gov/nara/nwts/ftapp/filetest/RandomFileTest.java @@ -26,7 +26,7 @@ class RandomFileTest extends DefaultFileTest implements Randomizer { public RandomFileTest(FTDriver dt) { super(dt); keySet = new TreeMap(); - ftprops.add(new FTPropEnum(this, LAQL,"AQL","Set the acceptible quality level",AQL.values(),AQL.AQLp04)); + ftprops.add(new FTPropEnum(dt, this.getClass().getName(), LAQL,"AQL","Set the acceptable quality level",AQL.values(),AQL.AQLp04)); } public String toString() { @@ -50,7 +50,10 @@ public void initFilters() { } public String getDescription() { - return "This test will return a list of files in random order for QC processing"; + return "This test will return a list of files in random order for QC processing.\n" + + "Select the AQL (acceptable quality level) target for your test. 
\n" + + "This rule will generate a random sample of the appropriate size based on the number of files found.\n" + + "See http://en.wikipedia.org/wiki/MIL-STD-105 for an explanation."; } public Object fileTest(File f) { diff --git a/core/src/main/gov/nara/nwts/ftapp/filetest/ReadChecksum.java b/core/src/main/gov/nara/nwts/ftapp/filetest/ReadChecksum.java new file mode 100644 index 0000000..d82a277 --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/filetest/ReadChecksum.java @@ -0,0 +1,120 @@ +package gov.nara.nwts.ftapp.filetest; + +import gov.nara.nwts.ftapp.FTDriver; +import gov.nara.nwts.ftapp.filter.DefaultFileTestFilter; +import gov.nara.nwts.ftapp.importer.DelimitedFileReader; +import gov.nara.nwts.ftapp.stats.Stats; +import gov.nara.nwts.ftapp.stats.StatsGenerator; +import gov.nara.nwts.ftapp.stats.StatsItem; +import gov.nara.nwts.ftapp.stats.StatsItemConfig; +import gov.nara.nwts.ftapp.stats.StatsItemEnum; + +import java.io.File; +import java.io.IOException; +import java.util.Vector; + +/** + * Read checksum values from existing checksum files (.md5 or .sha1) and report them by the path of the file they describe. 
+ * @author TBrady + * + */ +public class ReadChecksum extends DefaultFileTest { + + private static enum ChecksumStatsItems implements StatsItemEnum { + File(StatsItem.makeStringStatsItem("File", 200)), + Checksum(StatsItem.makeStringStatsItem("Checksum",300)), + ; + + StatsItem si; + ChecksumStatsItems(StatsItem si) {this.si=si;} + public StatsItem si() {return si;} + } + public static enum Generator implements StatsGenerator { + INSTANCE; + class ChecksumStats extends Stats { + public ChecksumStats(String key) { + super(details, key); + } + + } + public ChecksumStats create(String key) {return new ChecksumStats(key);} + + } + public static StatsItemConfig details = StatsItemConfig.create(ChecksumStatsItems.class); + + class Cache { + File f; + String key; + String checksum; + + void cache(File f) { + if (!f.equals(this.f)) { + this.f = f; + try { + Vector<Vector<String>> data = DelimitedFileReader.parseFile(f," *"); + if (data.size() > 0) { + if (data.get(0).size() > 1) { + checksum = data.get(0).get(0); + key = getRelPath(new File(f.getParentFile(),data.get(0).get(1))); + } + } + } catch (IOException e) { + } + } + } + } + + Cache cache; + + public ReadChecksum(FTDriver dt) { + super(dt); + cache = new Cache(); + } + + public String toString() { + return "Read Checksum"; + } + public String getKey(File f) { + cache.cache(f); + return cache.key; + } + + public String getShortName(){return "Read Checksum";} + + public Object fileTest(File f) { + Stats stat = getStats(f); + cache.cache(f); + stat.setVal(ChecksumStatsItems.Checksum, cache.checksum); + return cache.checksum; + } + public Stats createStats(String key){ + return Generator.INSTANCE.create(key); + } + public StatsItemConfig getStatsDetails() { + return details; + + } + + class ChecksumFilter extends DefaultFileTestFilter { + public String getSuffix() { + return ".*\\.(md5|sha1)$"; + } + public boolean isReSuffix() { + return true; + } + public String getName(){return "Checksum";} + + + } + + public void initFilters() { 
+ filters.add(new ChecksumFilter()); + } + + public String getDescription() { + return "Read and import checksums from checksum files. Assumes checksum files are formatted as follows:\n" + + "checksum *filename\n" + + "where * is the delimiter"; + } + +} diff --git a/core/src/main/gov/nara/nwts/ftapp/filter/CounterFilter.java b/core/src/main/gov/nara/nwts/ftapp/filter/CounterFilter.java new file mode 100644 index 0000000..7b81196 --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/filter/CounterFilter.java @@ -0,0 +1,17 @@ +package gov.nara.nwts.ftapp.filter; + +/** + * Filter for Counter report files (csv or txt) + * @author TBrady + * + */ +public class CounterFilter extends DefaultFileTestFilter { + public String getSuffix() { + return ".*\\.(csv|txt)$"; + } + public boolean isReSuffix() { + return true; + } + public String getName(){return "Counter";} + +} diff --git a/core/src/main/gov/nara/nwts/ftapp/filter/CounterFilterXls.java b/core/src/main/gov/nara/nwts/ftapp/filter/CounterFilterXls.java new file mode 100644 index 0000000..5d080e6 --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/filter/CounterFilterXls.java @@ -0,0 +1,17 @@ +package gov.nara.nwts.ftapp.filter; + +/** + * Filter for Counter report files (xls, xlsx, csv or txt) + * @author TBrady + * + */ +public class CounterFilterXls extends DefaultFileTestFilter { + public String getSuffix() { + return ".*\\.(xls|xlsx|csv|txt)$"; + } + public boolean isReSuffix() { + return true; + } + public String getName(){return "Counter";} + +} diff --git a/core/src/main/gov/nara/nwts/ftapp/filter/ExcelFilter.java b/core/src/main/gov/nara/nwts/ftapp/filter/ExcelFilter.java index f0a1f16..1c5aab4 100644 --- a/core/src/main/gov/nara/nwts/ftapp/filter/ExcelFilter.java +++ b/core/src/main/gov/nara/nwts/ftapp/filter/ExcelFilter.java @@ -6,8 +6,11 @@ * */ public class ExcelFilter extends DefaultFileTestFilter { - public String getSuffix() { - return ".xls"; + public String getSuffix() { + return ".*\\.(xls|xlsx)$"; + } + public boolean isReSuffix() { + return true; } 
public String getName(){return "Excel";} diff --git a/core/src/main/gov/nara/nwts/ftapp/filter/ZipFilter.java b/core/src/main/gov/nara/nwts/ftapp/filter/ZipFilter.java new file mode 100644 index 0000000..b5d679a --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/filter/ZipFilter.java @@ -0,0 +1,14 @@ +package gov.nara.nwts.ftapp.filter; + +/** + * Filter for ZIP files + * @author TBrady + * + */ +public class ZipFilter extends DefaultFileTestFilter { + public String getSuffix() { + return ".zip"; + } + public String getName(){return "ZIP";} + +} diff --git a/core/src/main/gov/nara/nwts/ftapp/ftprop/DefaultFTProp.java b/core/src/main/gov/nara/nwts/ftapp/ftprop/DefaultFTProp.java index 0f71bac..b2be38f 100644 --- a/core/src/main/gov/nara/nwts/ftapp/ftprop/DefaultFTProp.java +++ b/core/src/main/gov/nara/nwts/ftapp/ftprop/DefaultFTProp.java @@ -1,6 +1,6 @@ package gov.nara.nwts.ftapp.ftprop; -import gov.nara.nwts.ftapp.filetest.FileTest; +import gov.nara.nwts.ftapp.FTDriver; /** * Abstract base class for File Test Properties @@ -13,29 +13,31 @@ public abstract class DefaultFTProp implements FTProp { String shortname; String description; Object def; - FileTest ft; + FTDriver ft; + String prefix = ""; public enum RUNMODE { TEST, PROD; } - public DefaultFTProp(FileTest ft, String name, String shortname, String description, Object def) { + public DefaultFTProp(FTDriver ft, String prefix, String name, String shortname, String description, Object def) { this.name = name; this.shortname = shortname; this.description = description; this.ft = ft; this.def = def; + this.prefix = prefix; } public void init() { - if (ft.getFTDriver().hasPreferences()) { - def = ft.getFTDriver().getPreferences().get(getPrefString(), def.toString()); + if (ft.hasPreferences()) { + def = ft.getPreferences().get(getPrefString(), def.toString()); } } public void init(Object[] vals) { - if (ft.getFTDriver().hasPreferences()) { - String s = ft.getFTDriver().getPreferences().get(getPrefString(), 
def.toString()); + if (ft.hasPreferences()) { + String s = ft.getPreferences().get(getPrefString(), def.toString()); if (s == null) return; for(Object obj: vals) { if (s.equals(obj.toString())) { @@ -47,7 +49,7 @@ public void init(Object[] vals) { } public String getPrefString() { - return ft.toString()+"--"+name; + return "ftprop--"+prefix+"--"+shortname; } public String describe() { diff --git a/core/src/main/gov/nara/nwts/ftapp/ftprop/FTPropEnum.java b/core/src/main/gov/nara/nwts/ftapp/ftprop/FTPropEnum.java index d0fb731..238c8d5 100644 --- a/core/src/main/gov/nara/nwts/ftapp/ftprop/FTPropEnum.java +++ b/core/src/main/gov/nara/nwts/ftapp/ftprop/FTPropEnum.java @@ -3,7 +3,7 @@ import java.awt.event.ItemEvent; import java.awt.event.ItemListener; -import gov.nara.nwts.ftapp.filetest.FileTest; +import gov.nara.nwts.ftapp.FTDriver; import javax.swing.JComboBox; import javax.swing.JComponent; @@ -17,8 +17,8 @@ public class FTPropEnum extends DefaultFTProp { JComboBox combo; - public FTPropEnum(FileTest ft, String name, String shortname, String description, Object[]vals, Object def) { - super(ft, name, shortname, description, def); + public FTPropEnum(FTDriver ft, String prefix, String name, String shortname, String description, Object[]vals, Object def) { + super(ft, prefix, name, shortname, description, def); init(vals); combo = new JComboBox(); initCombo(vals); @@ -26,8 +26,8 @@ public FTPropEnum(FileTest ft, String name, String shortname, String description public void itemStateChanged(ItemEvent arg0) { Object obj = combo.getSelectedItem(); if (obj == null) return; - if (FTPropEnum.this.ft.getFTDriver().hasPreferences()){ - FTPropEnum.this.ft.getFTDriver().getPreferences().put(getPrefString(), combo.getSelectedItem().toString()); + if (FTPropEnum.this.ft.hasPreferences()){ + FTPropEnum.this.ft.setPreference(getPrefString(), combo.getSelectedItem().toString()); } } }); diff --git a/core/src/main/gov/nara/nwts/ftapp/ftprop/FTPropString.java 
b/core/src/main/gov/nara/nwts/ftapp/ftprop/FTPropString.java index d6fb6a7..924f152 100644 --- a/core/src/main/gov/nara/nwts/ftapp/ftprop/FTPropString.java +++ b/core/src/main/gov/nara/nwts/ftapp/ftprop/FTPropString.java @@ -1,6 +1,6 @@ package gov.nara.nwts.ftapp.ftprop; -import gov.nara.nwts.ftapp.filetest.FileTest; +import gov.nara.nwts.ftapp.FTDriver; import javax.swing.JComponent; import javax.swing.JTextField; @@ -15,26 +15,26 @@ public class FTPropString extends DefaultFTProp { JTextField tf; - public FTPropString(FileTest ft, String name, String shortname, String description, Object def) { - super(ft, name, shortname, description, def); + public FTPropString(FTDriver ft, String prefix, String name, String shortname, String description, Object def) { + super(ft, prefix, name, shortname, description, def); init(); tf = new JTextField(this.def.toString()); tf.getDocument().addDocumentListener(new DocumentListener(){ public void changedUpdate(DocumentEvent arg0) { - if (FTPropString.this.ft.getFTDriver().hasPreferences()) { - FTPropString.this.ft.getFTDriver().getPreferences().put(getPrefString(), tf.getText()); + if (FTPropString.this.ft.hasPreferences()) { + FTPropString.this.ft.setPreference(getPrefString(), tf.getText()); } } public void insertUpdate(DocumentEvent arg0) { - if (FTPropString.this.ft.getFTDriver().hasPreferences()) { - FTPropString.this.ft.getFTDriver().getPreferences().put(getPrefString(), tf.getText()); + if (FTPropString.this.ft.hasPreferences()) { + FTPropString.this.ft.setPreference(getPrefString(), tf.getText()); } } public void removeUpdate(DocumentEvent arg0) { - if (FTPropString.this.ft.getFTDriver().hasPreferences()) { - FTPropString.this.ft.getFTDriver().getPreferences().put(getPrefString(), tf.getText()); + if (FTPropString.this.ft.hasPreferences()) { + FTPropString.this.ft.setPreference(getPrefString(), tf.getText()); } } }); diff --git a/core/src/main/gov/nara/nwts/ftapp/gui/CriteriaPanel.java 
b/core/src/main/gov/nara/nwts/ftapp/gui/CriteriaPanel.java index fde995b..77e2c24 100644 --- a/core/src/main/gov/nara/nwts/ftapp/gui/CriteriaPanel.java +++ b/core/src/main/gov/nara/nwts/ftapp/gui/CriteriaPanel.java @@ -174,6 +174,7 @@ public void actionPerformed(ActionEvent arg0) { //description.setBorder(BorderFactory.createEmptyBorder()); description.setEditable(false); description.setLineWrap(true); + description.setWrapStyleWord(true); description.setBackground(parent.frame.getBackground()); description.setFont(description.getFont().deriveFont(Font.ITALIC)); descp.add(new JScrollPane(description), BorderLayout.CENTER); diff --git a/core/src/main/gov/nara/nwts/ftapp/gui/DirectoryTable.java b/core/src/main/gov/nara/nwts/ftapp/gui/DirectoryTable.java index a411d20..c8c7720 100644 --- a/core/src/main/gov/nara/nwts/ftapp/gui/DirectoryTable.java +++ b/core/src/main/gov/nara/nwts/ftapp/gui/DirectoryTable.java @@ -105,7 +105,7 @@ public DirectoryTable(File root, boolean modifyAllowed) { super(root); this.modifyAllowed = modifyAllowed; frame = new JFrame(title); - preferences = Preferences.userNodeForPackage(getClass()); + preferences = Preferences.userNodeForPackage(DirectoryTable.class); recent = new ArrayList(); for(int i=20-1; i>=0;i--){ String s = preferences.get("recent"+i,""); diff --git a/core/src/main/gov/nara/nwts/ftapp/gui/ImportPanel.java b/core/src/main/gov/nara/nwts/ftapp/gui/ImportPanel.java index 522d764..d47e9c8 100644 --- a/core/src/main/gov/nara/nwts/ftapp/gui/ImportPanel.java +++ b/core/src/main/gov/nara/nwts/ftapp/gui/ImportPanel.java @@ -1,9 +1,11 @@ package gov.nara.nwts.ftapp.gui; +import gov.nara.nwts.ftapp.ftprop.FTProp; import gov.nara.nwts.ftapp.importer.Importer; import gov.nara.nwts.ftapp.stats.Stats; import java.awt.BorderLayout; +import java.awt.Dimension; import java.awt.FlowLayout; import java.awt.GridLayout; import java.awt.Insets; @@ -14,6 +16,7 @@ import java.text.NumberFormat; import java.util.ArrayList; import 
java.util.Collections; +import java.util.List; import java.util.TreeMap; import javax.swing.BorderFactory; @@ -22,13 +25,16 @@ import javax.swing.JButton; import javax.swing.JCheckBox; import javax.swing.JComboBox; +import javax.swing.JComponent; import javax.swing.JFormattedTextField; import javax.swing.JLabel; import javax.swing.JOptionPane; import javax.swing.JPanel; +import javax.swing.JScrollPane; import javax.swing.JTabbedPane; import javax.swing.JTextArea; import javax.swing.JTextField; +import javax.swing.ScrollPaneConstants; /** * User interface component presenting file import options and auto-sequencing options. @@ -48,6 +54,8 @@ class ImportPanel extends MyPanel { JCheckBox forceKey; DirectoryTable parent; FileSelectChooser fsc; + JPanel propPanel; + JTabbedPane dptab; JPanel ipGen; @@ -55,6 +63,20 @@ void updateDesc() { Importer i = (Importer)importers.getSelectedItem(); importerDesc.setText(i.getDescription()); forceKey.setEnabled(i.allowForceKey()); + + propPanel.removeAll(); + Box b = new Box(BoxLayout.Y_AXIS); + b.add(Box.createHorizontalStrut(500)); + propPanel.add(b); + List myprops = i.getPropertyList(); + dptab.setEnabledAt(1, false); + for(FTProp myprop: myprops) { + JComponent c = myprop.getEditor(); + c.setBorder(BorderFactory.createTitledBorder(myprop.getName())); + c.setToolTipText(myprop.describe()); + b.add(c); + dptab.setEnabledAt(1, true); + } } ImportPanel(DirectoryTable dt) { @@ -127,7 +149,7 @@ public void actionPerformed(ActionEvent arg0) { fileBox.add(p1, BorderLayout.SOUTH); JPanel dp = new JPanel(); fileBox.add(dp, BorderLayout.SOUTH); - JTabbedPane dptab = new JTabbedPane(); + dptab = new JTabbedPane(); dp.add(dptab); importerDesc = new JTextArea(9,60); importerDesc.setMargin(new Insets(10,10,10,10)); @@ -136,6 +158,10 @@ public void actionPerformed(ActionEvent arg0) { importerDesc.setBackground(this.getBackground()); importerDesc.setEditable(false); dptab.add(importerDesc, "Importer Description"); + propPanel = new JPanel(); + 
JScrollPane sp = new JScrollPane(propPanel, ScrollPaneConstants.VERTICAL_SCROLLBAR_ALWAYS, ScrollPaneConstants.HORIZONTAL_SCROLLBAR_AS_NEEDED); + sp.setPreferredSize(new Dimension(parent.criteriaPanel.propFilter.getWidth()-20, 220)); + dptab.add(sp,"Importer Properties"); JPanel dpadv = new JPanel(); dptab.add(dpadv, "Import Options"); forceKey = new JCheckBox("Force Unique Keys"); diff --git a/core/src/main/gov/nara/nwts/ftapp/gui/TableSaver.java b/core/src/main/gov/nara/nwts/ftapp/gui/TableSaver.java index efef640..bfdafa1 100644 --- a/core/src/main/gov/nara/nwts/ftapp/gui/TableSaver.java +++ b/core/src/main/gov/nara/nwts/ftapp/gui/TableSaver.java @@ -1,15 +1,20 @@ package gov.nara.nwts.ftapp.gui; +import gov.nara.nwts.ftapp.importer.DelimitedFileImporter.Separator; + import java.awt.BorderLayout; import java.awt.GridLayout; import java.io.BufferedWriter; import java.io.File; -import java.io.FileWriter; +import java.io.FileOutputStream; import java.io.IOException; +import java.io.OutputStreamWriter; import java.util.ArrayList; import java.util.List; +import javax.swing.BorderFactory; import javax.swing.JCheckBox; +import javax.swing.JComboBox; import javax.swing.JDialog; import javax.swing.JFileChooser; import javax.swing.JFrame; @@ -32,7 +37,8 @@ class TableSaver extends JDialog { JTable jt; JPanel colPanel; ArrayList checks; - + JComboBox delim; + DirectoryTable parent; TableSaver(DirectoryTable parent, DefaultTableModel tm, JTable jt, String fname) { this(parent, tm, jt, fname, new ArrayList()); @@ -47,6 +53,9 @@ class TableSaver extends JDialog { setLayout(new BorderLayout()); JFileChooser jfc = new JFileChooser(){ private static final long serialVersionUID = 1L; + public int getDialogType() { + return JFileChooser.SAVE_DIALOG; + } public String getApproveButtonText() { return "Export"; } @@ -90,19 +99,28 @@ public void approveSelection() { colPanel.add(cb); checks.add(cb); } + delim = new JComboBox(Separator.values()); + 
delim.setBorder(BorderFactory.createTitledBorder("Column Delimiter")); + delim.setSelectedItem(Separator.Tab); + add(delim, BorderLayout.SOUTH); + pack(); setVisible(true); } + public String getSeparator() { + return ((Separator)delim.getSelectedItem()).separator; + } + public void save(File f) throws IOException { - BufferedWriter bw = new BufferedWriter(new FileWriter(f)); + BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(f),"UTF-8")); boolean first = true; for(int c=0; c types = new TreeMap(); + + DelimitedFileReader dfr = new DelimitedFileReader(selectedFile, fileSeparator.separator); + boolean firstRow = (YN)getProperty(HEADROW) == YN.Y; + + firstRow = (YN)getProperty(HEADROW) == YN.Y; + dfr = new DelimitedFileReader(selectedFile, fileSeparator.separator); + for(Vector cols = dfr.getRow(); cols != null; cols = dfr.getRow()){ + if (firstRow) { + firstRow = false; + continue; + } + String key = cols.get(col < cols.size() ? col : 0); + Stats stats = types.get(key); + if (stats == null) { + stats = Generator.INSTANCE.create(key); + stats.setVal(CountStatsItems.Count, 1); + stats.setVal(CountStatsItems.Stat, MULT.ONE); + types.put(key, stats); + } else { + stats.sumVal(CountStatsItems.Count, 1); + stats.setVal(CountStatsItems.Stat, MULT.MANY); + } + } + + return new ActionResult(selectedFile, selectedFile.getName(), this.toString(), details, types, true, timer.getDuration()); + } + + public String toString() { + return "Count Key"; + } + public String getDescription() { + return "Count the number of times a key appears in a file."; + } + public String getShortName() { + return "Key"; + } +} diff --git a/core/src/main/gov/nara/nwts/ftapp/importer/CsvFileImporter.java b/core/src/main/gov/nara/nwts/ftapp/importer/CsvFileImporter.java deleted file mode 100644 index 0a70356..0000000 --- a/core/src/main/gov/nara/nwts/ftapp/importer/CsvFileImporter.java +++ /dev/null @@ -1,29 +0,0 @@ -package gov.nara.nwts.ftapp.importer; - -import 
gov.nara.nwts.ftapp.FTDriver; - -/** - * Importer for comma delimited files - * @author TBrady - * - */ -public class CsvFileImporter extends DelimitedFileImporter { - - public CsvFileImporter(FTDriver dt) { - super(dt); - } - - public String getSeparator() { - return ","; - } - - public String toString() { - return "Import Comma-Separated File"; - } - public String getDescription() { - return "This rule will import a comma separated file and use the first column as a unique key"; - } - public String getShortName() { - return "CSV"; - } -} diff --git a/core/src/main/gov/nara/nwts/ftapp/importer/DefaultImporter.java b/core/src/main/gov/nara/nwts/ftapp/importer/DefaultImporter.java index c8277a9..1a6c615 100644 --- a/core/src/main/gov/nara/nwts/ftapp/importer/DefaultImporter.java +++ b/core/src/main/gov/nara/nwts/ftapp/importer/DefaultImporter.java @@ -2,9 +2,12 @@ import gov.nara.nwts.ftapp.ActionResult; import gov.nara.nwts.ftapp.FTDriver; +import gov.nara.nwts.ftapp.ftprop.FTProp; import java.io.File; import java.io.IOException; +import java.util.ArrayList; +import java.util.List; /** * Abstract base class for importer behaviors. 
@@ -13,8 +16,10 @@ */ public abstract class DefaultImporter implements Importer { protected FTDriver dt; + protected ArrayList<FTProp> ftprops; public DefaultImporter(FTDriver dt) { this.dt = dt; + ftprops = new ArrayList<FTProp>(); } public abstract String toString(); @@ -31,4 +36,27 @@ public String getShortNameFormatted() { buf.append(" "); return buf.substring(0,20); } + public List<FTProp> getPropertyList() { + return ftprops; + } + + public Object getProperty(String name) { + return getProperty(name, null); + } + public Object getProperty(String name, Object def) { + for(FTProp ftprop: ftprops) { + if (ftprop.getName().equals(name)) { + return ftprop.getValue(); + } + } + return def; + } + public void setProperty(String name, String s) { + for(FTProp ftprop: ftprops) { + if (ftprop.getName().equals(name)) { + ftprop.setValue(ftprop.validate(s)); + return; + } + } + } } diff --git a/core/src/main/gov/nara/nwts/ftapp/importer/DelimitedFileImporter.java b/core/src/main/gov/nara/nwts/ftapp/importer/DelimitedFileImporter.java index 838b6f3..ee67139 100644 --- a/core/src/main/gov/nara/nwts/ftapp/importer/DelimitedFileImporter.java +++ b/core/src/main/gov/nara/nwts/ftapp/importer/DelimitedFileImporter.java @@ -3,88 +3,52 @@ import gov.nara.nwts.ftapp.ActionResult; import gov.nara.nwts.ftapp.FTDriver; import gov.nara.nwts.ftapp.Timer; +import gov.nara.nwts.ftapp.ftprop.FTPropEnum; import gov.nara.nwts.ftapp.stats.Stats; import gov.nara.nwts.ftapp.stats.StatsItem; import gov.nara.nwts.ftapp.stats.StatsItemConfig; -import java.io.BufferedReader; import java.io.File; -import java.io.FileReader; import java.io.IOException; import java.util.TreeMap; import java.util.Vector; +import gov.nara.nwts.ftapp.YN; + /** * Abstract class handling the import of a character-delimited text file allowing for individual values to be wrapped by quotation marks. 
* @author TBrady * */ -public abstract class DelimitedFileImporter extends DefaultImporter { +public class DelimitedFileImporter extends DefaultImporter { + public static enum Separator{ + Comma(","), + Tab("\t"), + Semicolon(";"), + Pipe("|"); + + public String separator; + Separator(String s) {separator = s;} + } + public static final String DELIM = "Delimiter"; + public static final String HEADROW = "HeadRow"; public DelimitedFileImporter(FTDriver dt) { super(dt); + this.ftprops.add(new FTPropEnum(dt, this.getClass().getName(), DELIM, "delim", + "Delimiter character separating fields", Separator.values(), Separator.Comma)); + this.ftprops.add(new FTPropEnum(dt, this.getClass().getName(), HEADROW, HEADROW, + "Treat first row as header", YN.values(), YN.Y)); } boolean forceKey; int rowKey = 1000000; - public abstract String getSeparator(); - protected static String getNextString(String in, String sep) { - return getNextString(in,sep,0); - } - protected static String getNextString(String in, String sep, int start) { - int pos = -1; - if (in.startsWith("\"")) { - int qpos = in.indexOf("\"", (start==0) ? 1 : start); - int qqpos = in.indexOf("\"\"", (start==0) ? 
1 : start); - if ((qpos==qqpos)&&(qqpos >= 0)) { - return getNextString(in,sep,qqpos+2); - } - if (qpos == in.length()) { - return in; - } - if (qpos == -1) { - qpos = 0; - } - pos = in.indexOf(sep,qpos+1); - } else { - pos = in.indexOf(sep, 0); - } - if (pos == -1) return in; - return in.substring(0,pos); - } - - public static Vector parseLine(String line, String sep){ - String pline = line; - Vector cols = new Vector(); - while(pline!=null){ - if (!sep.trim().equals("")) pline = pline.trim(); - String tpline = getNextString(pline,sep); - cols.add(normalize(tpline)); - if (pline.length() == tpline.length()) break; - pline = pline.substring(tpline.length()+1); - } - return cols; - } - - public static Vector> parseFile(File f, String sep) throws IOException{ - return parseFile(f,sep,false); - } - public static Vector> parseFile(File f, String sep,boolean skipFirstLine) throws IOException{ - BufferedReader br = new BufferedReader(new FileReader(f)); - if (skipFirstLine) br.readLine(); - Vector> rows = new Vector>(); - for(String line=br.readLine(); line!=null; line=br.readLine()){ - rows.add(parseLine(line, sep)); - } - br.close(); - return rows; - } - public static String KEY = "key"; public ActionResult importFile(File selectedFile) throws IOException { + Separator fileSeparator = (Separator)getProperty(DELIM); Timer timer = new Timer(); forceKey = dt.getImporterForceKey(); - BufferedReader br = new BufferedReader(new FileReader(selectedFile)); + int colcount = 0; TreeMap types = new TreeMap(); StatsItemConfig details = new StatsItemConfig(); @@ -93,30 +57,39 @@ public ActionResult importFile(File selectedFile) throws IOException { } int colset = 0; + + DelimitedFileReader dfr = new DelimitedFileReader(selectedFile, fileSeparator.separator); + boolean firstRow = (YN)getProperty(HEADROW) == YN.Y; - for(String line=br.readLine(); line!=null; line=br.readLine()){ - Vector cols = parseLine(line, getSeparator()); + for(Vector cols = dfr.getRow(); cols != null; cols = 
dfr.getRow()){ colcount = Math.max(colcount,cols.size()); for(int i=colset; i cols = parseLine(line, getSeparator()); + + firstRow = (YN)getProperty(HEADROW) == YN.Y; + dfr = new DelimitedFileReader(selectedFile, fileSeparator.separator); + for(Vector cols = dfr.getRow(); cols != null; cols = dfr.getRow()){ + if (firstRow) { + firstRow = false; + continue; + } String key = cols.get(0); if (forceKey) { key = "" + (rowKey++); } Stats stats = Stats.Generator.INSTANCE.create(key); stats.init(details); + int start = 1; if (forceKey) { stats.setKeyVal(details.getByKey(KEY), cols.get(0)); + start = 0; } - for(int i=1; i getRow() throws IOException { + pline = br.readLine(); + if (pline == null) { + br.close(); + return null; + } + if (!sep.equals("")) pline = pline.trim(); + Vector cols = new Vector(); + while(pline!=null){ + if (!sep.trim().equals("")) pline = pline.trim(); + String tpline = getNextString(pline); + cols.add(normalize(tpline)); + if (pline.length() == tpline.length()) break; + pline = pline.substring(tpline.length()+sep.length()); + } + return cols; + } + + protected String getNextString(String in) throws IOException { + return getNextString(in,0); + } + protected String getNextString(String in, int start) throws IOException { + int pos = -1; + if (in.startsWith("\"")) { + int qpos = in.indexOf("\"", (start==0) ? 1 : start); + int qqpos = in.indexOf("\"\"", (start==0) ? 
1 : start); + if ((qpos==qqpos)&&(qqpos >= 0)) { + return getNextString(in,qqpos+2); + } + if (qpos == in.length()) { + return in; + } + if (qpos == -1) { + String s = br.readLine(); + if (s == null) return pline; + pline += s; + return getNextString(pline, start); + } + pos = in.indexOf(sep,qpos+1); + } else { + pos = in.indexOf(sep, 0); + } + if (pos == -1) return in; + return in.substring(0,pos); + } + + public static Vector> parseFile(File f, String sep) throws IOException{ + return parseFile(f,sep,false); + } + public static Vector> parseFile(File f, String sep, boolean skipFirstLine) throws IOException{ + DelimitedFileReader dfr = new DelimitedFileReader(f, sep); + Vector> rows = new Vector>(); + Vector hrow = new Vector(); + if (skipFirstLine) hrow = dfr.getRow(); + if (hrow != null) { + for(Vector row = dfr.getRow(); row!=null; row = dfr.getRow()){ + rows.add(row); + } + } + return rows; + } + + public static Vector> parseStream(InputStream is, String sep, boolean skipFirstLine) throws IOException{ + DelimitedFileReader dfr = new DelimitedFileReader(is, sep); + Vector> rows = new Vector>(); + Vector hrow = new Vector(); + if (skipFirstLine) hrow = dfr.getRow(); + if (hrow != null) { + for(Vector row = dfr.getRow(); row!=null; row = dfr.getRow()){ + rows.add(row); + } + } + return rows; + } + + protected static String normalize(String val) { + val = val.trim(); + if (val.startsWith("\"")) { + val = val.substring(1); + } else if (val.startsWith("'")) { + val = val.substring(1); + } + if (val.endsWith("\"")) { + val = val.substring(0,val.length()-1); + } else if (val.endsWith("'")) { + val = val.substring(0,val.length()-1); + } + + val = val.replaceAll("\"\"", "\""); + return val; + } +} diff --git a/core/src/main/gov/nara/nwts/ftapp/importer/Importer.java b/core/src/main/gov/nara/nwts/ftapp/importer/Importer.java index 7da4666..4407b04 100644 --- a/core/src/main/gov/nara/nwts/ftapp/importer/Importer.java +++ 
b/core/src/main/gov/nara/nwts/ftapp/importer/Importer.java @@ -1,9 +1,11 @@ package gov.nara.nwts.ftapp.importer; import gov.nara.nwts.ftapp.ActionResult; +import gov.nara.nwts.ftapp.ftprop.FTProp; import java.io.File; import java.io.IOException; +import java.util.List; /** * Contract that Importers will fullfill. @@ -17,4 +19,8 @@ public interface Importer { public String getShortName(); public String getShortNameFormatted(); public String getShortNameNormalized(); + + public List getPropertyList(); + public Object getProperty(String name); + public void setProperty(String name, String str); } diff --git a/core/src/main/gov/nara/nwts/ftapp/importer/ImporterRegistry.java b/core/src/main/gov/nara/nwts/ftapp/importer/ImporterRegistry.java index db39fca..bc2b108 100644 --- a/core/src/main/gov/nara/nwts/ftapp/importer/ImporterRegistry.java +++ b/core/src/main/gov/nara/nwts/ftapp/importer/ImporterRegistry.java @@ -1,6 +1,7 @@ package gov.nara.nwts.ftapp.importer; import gov.nara.nwts.ftapp.FTDriver; +import gov.nara.nwts.ftapp.filetest.CounterValidation; import java.util.Vector; @@ -14,12 +15,19 @@ public class ImporterRegistry extends Vector { private static final long serialVersionUID = 1L; public ImporterRegistry(FTDriver dt) { - add(new TabSepFileImporter(dt)); - add(new CsvFileImporter(dt)); - add(new SemicolonFileImporter(dt)); - add(new PipeFileImporter(dt)); + add(new DelimitedFileImporter(dt)); add(new Parser(dt)); + add(new CountKey(dt)); + add(new CounterValidation(dt)); } + public void removeImporter(Class c) { + for(Importer ft: this) { + if (c.isInstance(ft)) { + this.remove(ft); + break; + } + } + } } diff --git a/core/src/main/gov/nara/nwts/ftapp/importer/Parser.java b/core/src/main/gov/nara/nwts/ftapp/importer/Parser.java index b16f59a..d076776 100644 --- a/core/src/main/gov/nara/nwts/ftapp/importer/Parser.java +++ b/core/src/main/gov/nara/nwts/ftapp/importer/Parser.java @@ -3,6 +3,8 @@ import gov.nara.nwts.ftapp.ActionResult; import 
gov.nara.nwts.ftapp.FTDriver; import gov.nara.nwts.ftapp.Timer; +import gov.nara.nwts.ftapp.ftprop.FTPropEnum; +import gov.nara.nwts.ftapp.ftprop.FTPropString; import gov.nara.nwts.ftapp.stats.Stats; import gov.nara.nwts.ftapp.stats.StatsItem; import gov.nara.nwts.ftapp.stats.StatsItemConfig; @@ -24,8 +26,16 @@ * */ public class Parser extends DefaultImporter { - public enum status {PASS,FAIL} - + public static enum status {PASS,FAIL} + public static enum Fields { + NA(0),F1(1),F2(2),F3(3),F4(4),F5(5),F6(6),F7(7),F8(8),F9(9),F10(10); + int index; + Fields(int i) {index = i;} + public String toString() { + if (this == NA) return "Not Applicable"; + return "Regex Field "+ index; + } + } public static enum ParserStatsItems implements StatsItemEnum { Row(StatsItem.makeStringStatsItem("Row",60)), PassFail(StatsItem.makeEnumStatsItem(status.class, "Pass/Fail").setInitVal(status.PASS)), @@ -39,20 +49,21 @@ public static enum ParserStatsItems implements StatsItemEnum { public static StatsItemConfig details = StatsItemConfig.create(ParserStatsItems.class); Pattern p; int cols; - StatsItemConfig mydetails; - public StatsItemConfig getDetails() { return details; } - public Pattern getPattern() { - return Pattern.compile("^(.*)$"); - } - + public static final String REGEX = "Regular Expression"; + public static final int FIELDS = 10; + public static final String FIELD_PRE = "Optional Output "; + public static final String getOptField(int i) {return FIELD_PRE + i;} public Parser(FTDriver dt) { super(dt); - mydetails = getDetails(); - cols = mydetails.size() - 1; - p = getPattern(); + this.ftprops.add(new FTPropString(dt, this.getClass().getName(), REGEX, "regex", + "Regular expression to test against each line of the file", "^(.*)$")); + for(int i=1; i<=FIELDS; i++) { + this.ftprops.add(new FTPropEnum(dt, this.getClass().getName(), getOptField(i), "f"+i, + getOptField(i), Fields.values(), Fields.NA)); + } } public Matcher test(String line) { @@ -73,21 +84,35 @@ public Object 
getDefVal(Matcher m, int i, String line) { return ""; } + public Fields getFieldForCol(int i) { + Fields f = (Fields)getProperty(getOptField(i)); + if (f == null) return Fields.NA; + return f; + } public void setVals(Matcher m, Stats stats, String line) { if (m.matches()) { for(int i=1;i<=cols;i++) { - stats.setKeyVal(details.getByKey(i), getVal(m,i)); + stats.setKeyVal(details.getByKey(i), getVal(m,getFieldForCol(i).index)); } } else { stats.setVal(ParserStatsItems.PassFail, status.FAIL); for(int i=1;i<=cols;i++) { - stats.setKeyVal(details.getByKey(i), getDefVal(m,i,line)); + stats.setKeyVal(details.getByKey(i), getDefVal(m,getFieldForCol(i).index,line)); } } } public ActionResult importFile(File selectedFile) throws IOException { + String patt = (String)this.getProperty(REGEX); + details = StatsItemConfig.create(ParserStatsItems.class); + for(int i=1; i<=FIELDS; i++) { + Fields f = (Fields)this.getProperty(getOptField(i)); + if (f == Fields.NA) continue; + details.addStatsItem(i, StatsItem.makeStringStatsItem(f.toString())); + cols++; + } + p = Pattern.compile(patt); Timer timer = new Timer(); TreeMap types = new TreeMap(); try { @@ -98,26 +123,28 @@ public ActionResult importFile(File selectedFile) throws IOException { Stats stats = Stats.Generator.INSTANCE.create(Parser.details, key); types.put(key, stats); Matcher m = test(line); + stats.setVal(ParserStatsItems.Data, line); setVals(m, stats, line); } br.close(); - return new ActionResult(selectedFile, selectedFile.getName(), this.toString(), mydetails, types, true, timer.getDuration()); + return new ActionResult(selectedFile, selectedFile.getName(), this.toString(), details, types, true, timer.getDuration()); } catch (FileNotFoundException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } - return new ActionResult(selectedFile, selectedFile.getName(), this.toString(), mydetails, types, false, timer.getDuration()); + return new ActionResult(selectedFile, selectedFile.getName(), 
this.toString(), details, types, false, timer.getDuration()); } public String toString() { - return "Parser"; + return "Regular Expression Parser"; } public String getDescription() { - return "This rule will parse each line of a file and add it to the results table."; + return "This rule will parse each line of a file and add it to the results table.\n" + + "This rule requires an understanding of regular expressions."; } public String getShortName() { - return "Parse"; + return "Regex"; } } diff --git a/core/src/main/gov/nara/nwts/ftapp/importer/PipeFileImporter.java b/core/src/main/gov/nara/nwts/ftapp/importer/PipeFileImporter.java deleted file mode 100644 index af1ac36..0000000 --- a/core/src/main/gov/nara/nwts/ftapp/importer/PipeFileImporter.java +++ /dev/null @@ -1,31 +0,0 @@ -package gov.nara.nwts.ftapp.importer; - -import gov.nara.nwts.ftapp.FTDriver; -import gov.nara.nwts.ftapp.importer.DelimitedFileImporter; - -/** - * Importer for semicolon delimited files - * @author TBrady - * - */ -public class PipeFileImporter extends DelimitedFileImporter { - - public PipeFileImporter(FTDriver dt) { - super(dt); - } - - public String getSeparator() { - return "|"; - } - - public String toString() { - return "Import Pipe-Separated File"; - } - public String getDescription() { - return "This rule will import a pipe separated file and use the first column as a unique key"; - } - public String getShortName() { - return "PIPE"; - } - -} diff --git a/core/src/main/gov/nara/nwts/ftapp/importer/SemicolonFileImporter.java b/core/src/main/gov/nara/nwts/ftapp/importer/SemicolonFileImporter.java deleted file mode 100644 index c78c9ff..0000000 --- a/core/src/main/gov/nara/nwts/ftapp/importer/SemicolonFileImporter.java +++ /dev/null @@ -1,30 +0,0 @@ -package gov.nara.nwts.ftapp.importer; - -import gov.nara.nwts.ftapp.FTDriver; - -/** - * Importer for semicolon delimited files - * @author TBrady - * - */ -public class SemicolonFileImporter extends DelimitedFileImporter { - - public 
SemicolonFileImporter(FTDriver dt) { - super(dt); - } - - public String getSeparator() { - return ";"; - } - - public String toString() { - return "Import Semicolon-Separated File"; - } - public String getDescription() { - return "This rule will import a semicolon separated file and use the first column as a unique key"; - } - public String getShortName() { - return "SEMI"; - } - -} diff --git a/core/src/main/gov/nara/nwts/ftapp/importer/TabSepFileImporter.java b/core/src/main/gov/nara/nwts/ftapp/importer/TabSepFileImporter.java deleted file mode 100644 index 95d1499..0000000 --- a/core/src/main/gov/nara/nwts/ftapp/importer/TabSepFileImporter.java +++ /dev/null @@ -1,29 +0,0 @@ -package gov.nara.nwts.ftapp.importer; - -import gov.nara.nwts.ftapp.FTDriver; - -/** - * Importer for tab delimited files - * @author TBrady - * - */ -public class TabSepFileImporter extends DelimitedFileImporter { - - public TabSepFileImporter(FTDriver dt) { - super(dt); - } - - public String getSeparator() { - return "\t"; - } - public String toString() { - return "Import Tab-Separated File"; - } - public String getDescription() { - return "This rule will import a tab separated file and use the first column as a unique key"; - } - public String getShortName() { - return "TAB"; - } - -} diff --git a/core/src/main/gov/nara/nwts/ftapp/nameValidation/DirAnalysis.java b/core/src/main/gov/nara/nwts/ftapp/nameValidation/DirAnalysis.java index 60a1169..d534196 100644 --- a/core/src/main/gov/nara/nwts/ftapp/nameValidation/DirAnalysis.java +++ b/core/src/main/gov/nara/nwts/ftapp/nameValidation/DirAnalysis.java @@ -16,22 +16,20 @@ */ public class DirAnalysis{ - public static RenameDetails analyze(File dir, Pattern p, int group, Pattern pIgnore){ - return analyze(dir,p,group,pIgnore,true); - } - public static RenameDetails analyze(File dir, Pattern p, int group, Pattern pIgnore, boolean rptIgnore){ + public static RenameDetails analyze(File dir, Pattern p, int group, Pattern pIgnore, boolean 
rptIgnore, boolean startAtOne){ ArrayList names = new ArrayList(); for(String s: dir.list()){ names.add(s); } - return analyze(names, dir,p,group,pIgnore,rptIgnore); + return analyze(names, dir,p,group,pIgnore,rptIgnore, startAtOne); } - public static RenameDetails recursiveAnalyze(File dir, Pattern p, int group, Pattern pIgnore, boolean rptIgnore){ + + public static RenameDetails recursiveAnalyze(File dir, Pattern p, int group, Pattern pIgnore, boolean rptIgnore, boolean startAtOne){ ArrayList names = new ArrayList(); gatherFileNames(names, dir); - return analyze(names, dir,p,group,pIgnore,rptIgnore); + return analyze(names, dir,p,group,pIgnore,rptIgnore, startAtOne); } - + public static void gatherFileNames(Listnames, File f) { if (f.isDirectory()) { for(File cf: f.listFiles()){ @@ -42,7 +40,7 @@ public static void gatherFileNames(Listnames, File f) { } } - public static RenameDetails analyze(List names, File dir, Pattern p, int group, Pattern pIgnore, boolean rptIgnore){ + public static RenameDetails analyze(List names, File dir, Pattern p, int group, Pattern pIgnore, boolean rptIgnore, boolean startAtOne){ TreeMap seq = new TreeMap(); StringBuffer buf = new StringBuffer(); buf.append(names.size()); @@ -67,7 +65,9 @@ public static RenameDetails analyze(List names, File dir, Pattern p, int try { int cur = Integer.parseInt(s); Integer x = seq.get(cur); - if (x == null) x = 0; + if (x == null) { + x = 0; + } //handle dup? 
seq.put(cur,x++); } catch(NumberFormatException e) { diff --git a/core/src/main/gov/nara/nwts/ftapp/nameValidation/RenameStatus.java b/core/src/main/gov/nara/nwts/ftapp/nameValidation/RenameStatus.java index ec0757e..f1f95c5 100644 --- a/core/src/main/gov/nara/nwts/ftapp/nameValidation/RenameStatus.java +++ b/core/src/main/gov/nara/nwts/ftapp/nameValidation/RenameStatus.java @@ -6,6 +6,7 @@ * */ public enum RenameStatus { + SKIP("Skip", RenamePassFail.PASS), NEXT("Test inconclusive", RenamePassFail.PASS), VALID("Original filename is valid, rename not required", RenamePassFail.PASS), INVALID_MANUAL("Original filename must be manually renamed",RenamePassFail.FAIL), diff --git a/core/src/main/gov/nara/nwts/ftapp/stats/Stats.java b/core/src/main/gov/nara/nwts/ftapp/stats/Stats.java index 9a864cd..76d6422 100644 --- a/core/src/main/gov/nara/nwts/ftapp/stats/Stats.java +++ b/core/src/main/gov/nara/nwts/ftapp/stats/Stats.java @@ -50,6 +50,14 @@ public void setKeyVal(StatsItem si, Object val) { } } + public Object getKeyVal(StatsItem si, Object val) { + if (si == null) return val; + int index = si.getIndex(); + if ((index >= 0) && (vals.size() > index)) { + return vals.get(index); + } + return val; + } public void setVal(StatsItemEnum eitem, Object val) { int index = eitem.si().getIndex(); if ((index >= 0) && (vals.size() > index)) { @@ -120,7 +128,23 @@ public void appendVal(StatsItemEnum eitem, String val) { } } } - + + public void appendKeyVal(StatsItem si, Object val) { + if (si == null) return; + int index = si.getIndex(); + if ((index >= 0) && (vals.size() > index)) { + Object obj = vals.get(index); + if (obj == null) { + vals.set(index,val); + } else if (obj instanceof String) { + String s = (String)obj; + s += val; + vals.set(index, s); + } + } + } + + public Vector getVals() { return vals; } @@ -133,6 +157,10 @@ public Long getLongVal(StatsItemEnum eitem) { return (Long)getVal(eitem, null); } + public Integer getIntVal(StatsItemEnum eitem) { + return 
(Integer)getVal(eitem, null); + } + public Object getVal(StatsItemEnum eitem, Object def) { int index = eitem.si().getIndex(); if ((index >= 0) && (vals.size() > index)) { @@ -141,7 +169,7 @@ public Object getVal(StatsItemEnum eitem, Object def) { return def; } - public & StatsItemEnum> void init(StatsItemConfig config) { + public void init(StatsItemConfig config) { boolean first = true; vals.clear(); for(StatsItem item: config) { diff --git a/core/src/main/gov/nara/nwts/ftapp/util/FileAnalyzerURIResolver.java b/core/src/main/gov/nara/nwts/ftapp/util/FileAnalyzerURIResolver.java new file mode 100644 index 0000000..81d9726 --- /dev/null +++ b/core/src/main/gov/nara/nwts/ftapp/util/FileAnalyzerURIResolver.java @@ -0,0 +1,23 @@ +package gov.nara.nwts.ftapp.util; + +import java.io.InputStream; + +import javax.xml.transform.Source; +import javax.xml.transform.TransformerException; +import javax.xml.transform.URIResolver; +import javax.xml.transform.stream.StreamSource; + +public class FileAnalyzerURIResolver implements URIResolver { + public Source resolve(String href, String base) throws TransformerException { + InputStream is = FileAnalyzerURIResolver.class.getClassLoader().getResourceAsStream(href); + if (is == null) is = FileAnalyzerURIResolver.class.getClassLoader().getResourceAsStream("resources" + href); + return new StreamSource(is); + } + public InputStream getInputStream(String href, String base) throws TransformerException { + InputStream is = FileAnalyzerURIResolver.class.getClassLoader().getResourceAsStream(href); + if (is == null) is = FileAnalyzerURIResolver.class.getClassLoader().getResourceAsStream("resources/" + href); + return is; + } + public static FileAnalyzerURIResolver INSTANCE = new FileAnalyzerURIResolver(); + +} diff --git a/core/src/main/gov/nara/nwts/ftapp/util/FileUtil.java b/core/src/main/gov/nara/nwts/ftapp/util/FileUtil.java new file mode 100644 index 0000000..cefdfe0 --- /dev/null +++ 
b/core/src/main/gov/nara/nwts/ftapp/util/FileUtil.java @@ -0,0 +1,32 @@ +package gov.nara.nwts.ftapp.util; + +import java.io.File; +import java.io.FileInputStream; +import java.io.FileOutputStream; +import java.io.IOException; +import java.nio.channels.FileChannel; + +public class FileUtil { + public static void copyFile(File sourceFile, File destFile) throws IOException { + if(!destFile.exists()) { + destFile.createNewFile(); + } + + FileChannel source = null; + FileChannel destination = null; + + try { + source = new FileInputStream(sourceFile).getChannel(); + destination = new FileOutputStream(destFile).getChannel(); + destination.transferFrom(source, 0, source.size()); + } + finally { + if(source != null) { + source.close(); + } + if(destination != null) { + destination.close(); + } + } + } +} diff --git a/core/src/main/gov/nara/nwts/ftapp/util/XMLUtil.java b/core/src/main/gov/nara/nwts/ftapp/util/XMLUtil.java index e3fe06c..de489f2 100644 --- a/core/src/main/gov/nara/nwts/ftapp/util/XMLUtil.java +++ b/core/src/main/gov/nara/nwts/ftapp/util/XMLUtil.java @@ -1,31 +1,46 @@ package gov.nara.nwts.ftapp.util; import java.io.File; -import java.io.FileNotFoundException; import java.io.FileOutputStream; import java.io.IOException; +import java.io.InputStream; +import java.util.HashMap; import javax.xml.parsers.DocumentBuilder; import javax.xml.parsers.DocumentBuilderFactory; import javax.xml.transform.Transformer; -import javax.xml.transform.TransformerConfigurationException; import javax.xml.transform.TransformerException; import javax.xml.transform.TransformerFactory; +import javax.xml.transform.URIResolver; +import javax.xml.transform.dom.DOMResult; import javax.xml.transform.dom.DOMSource; import javax.xml.transform.stream.StreamResult; +import javax.xml.transform.stream.StreamSource; +import javax.xml.xpath.XPath; +import javax.xml.xpath.XPathFactory; import org.w3c.dom.Document; +import org.w3c.dom.Node; public class XMLUtil { public static DocumentBuilderFactory 
dbf; public static DocumentBuilder db; + public static DocumentBuilderFactory dbf_ns; + public static DocumentBuilder db_ns; public static TransformerFactory tf; + public static XPathFactory xf; + public static XPath xp; static { try { dbf = DocumentBuilderFactory.newInstance(); db = dbf.newDocumentBuilder(); + dbf_ns = DocumentBuilderFactory.newInstance(); + dbf_ns.setNamespaceAware(true); + db_ns = dbf_ns.newDocumentBuilder(); tf = TransformerFactory.newInstance(); + xf = XPathFactory.newInstance(); + xp = xf.newXPath(); } catch (Exception e) { e.printStackTrace(); } @@ -33,19 +48,75 @@ public class XMLUtil { public static void serialize(Document d, File f) { try { - Transformer t = tf.newTransformer(); - FileOutputStream fos = new FileOutputStream(f); - StreamResult sr = new StreamResult(fos); - t.transform(new DOMSource(d), sr); - fos.close(); - } catch (TransformerConfigurationException e) { - e.printStackTrace(); - } catch (FileNotFoundException e) { - e.printStackTrace(); + doSerialize(d, f); } catch (TransformerException e) { e.printStackTrace(); } catch (IOException e) { e.printStackTrace(); } } + + public static void doSerialize(Document d, File f) throws TransformerException, IOException { + Transformer t = tf.newTransformer(); + FileOutputStream fos = new FileOutputStream(f); + StreamResult sr = new StreamResult(fos); + t.transform(new DOMSource(d), sr); + fos.close(); + } + + public static void doTransform(Document d, File f, InputStream is) throws TransformerException, IOException { + doTransform(d,f,is,new HashMap()); + } + public static void doTransform(Document d, File f, InputStream is, HashMap pmap) throws TransformerException, IOException { + Transformer t = tf.newTransformer(new StreamSource(is)); + FileOutputStream fos = new FileOutputStream(f); + StreamResult sr = new StreamResult(fos); + for(String s:pmap.keySet()) { + t.setParameter(s, pmap.get(s)); + } + t.transform(new DOMSource(d), sr); + fos.close(); + } + + public static void 
doTransform(Document d, File f, String xsl) throws TransformerException, IOException { + doTransform(d, f, FileAnalyzerURIResolver.INSTANCE, xsl, new HashMap()); + } + public static void doTransform(Document d, File f, String xsl, HashMap pmap) throws TransformerException, IOException { + doTransform(d, f, FileAnalyzerURIResolver.INSTANCE, xsl, pmap); + } + public static Node doTransformToDom(Document d, String xsl) throws TransformerException, IOException { + return doTransformToDom(d, xsl, new HashMap()); + } + public static Node doTransformToDom(Document d, String xsl, HashMap pmap) throws TransformerException, IOException { + TransformerFactory tfres = TransformerFactory.newInstance(); + + tfres.setURIResolver(FileAnalyzerURIResolver.INSTANCE); + + Transformer t = tfres.newTransformer(FileAnalyzerURIResolver.INSTANCE.resolve(xsl, "")); + + DOMResult dr = new DOMResult(); + for(String s:pmap.keySet()) { + t.setParameter(s, pmap.get(s)); + } + t.transform(new DOMSource(d), dr); + return dr.getNode(); + } + + public static void doTransform(Document d, File f, URIResolver urir, String xsl) throws TransformerException, IOException { + doTransform(d, f, urir, xsl, new HashMap()); + } + public static void doTransform(Document d, File f, URIResolver urir, String xsl, HashMap pmap) throws TransformerException, IOException { + TransformerFactory tfres = TransformerFactory.newInstance(); + + tfres.setURIResolver(urir); + + Transformer t = tfres.newTransformer(urir.resolve(xsl, "")); + FileOutputStream fos = new FileOutputStream(f); + StreamResult sr = new StreamResult(fos); + for(String s:pmap.keySet()) { + t.setParameter(s, pmap.get(s)); + } + t.transform(new DOMSource(d), sr); + fos.close(); + } } diff --git a/demo/pom.xml b/demo/pom.xml index 6526c4e..ffaa95e 100644 --- a/demo/pom.xml +++ b/demo/pom.xml @@ -32,8 +32,23 @@ DSpaceFileAnalyzer 2.0 + + org.apache.tika + tika-app + 1.4 + + + gov.loc + bagit + 4.8 + + + + src/main/resources + + org.apache.maven.plugins diff 
--git a/demo/src/main/edu/georgetown/library/fileAnalyzer/filetest/CounterValidationXls.java b/demo/src/main/edu/georgetown/library/fileAnalyzer/filetest/CounterValidationXls.java new file mode 100644 index 0000000..6ca9ee2 --- /dev/null +++ b/demo/src/main/edu/georgetown/library/fileAnalyzer/filetest/CounterValidationXls.java @@ -0,0 +1,102 @@ +package edu.georgetown.library.fileAnalyzer.filetest; + +import gov.nara.nwts.ftapp.counter.CounterStat; +import gov.nara.nwts.ftapp.FTDriver; +import gov.nara.nwts.ftapp.filetest.CounterValidation; +import gov.nara.nwts.ftapp.filter.CSVFilter; +import gov.nara.nwts.ftapp.filter.CounterFilterXls; +import gov.nara.nwts.ftapp.filter.ExcelFilter; +import gov.nara.nwts.ftapp.filter.TxtFilter; +import gov.nara.nwts.ftapp.importer.DelimitedFileReader; +import gov.nara.nwts.ftapp.stats.Stats; + +import java.io.File; +import java.io.FileInputStream; +import java.io.IOException; +import java.util.Vector; + +import org.apache.poi.openxml4j.exceptions.InvalidFormatException; +import org.apache.poi.ss.usermodel.Cell; +import org.apache.poi.ss.usermodel.Row; +import org.apache.poi.ss.usermodel.Sheet; +import org.apache.poi.ss.usermodel.Workbook; +import org.apache.poi.ss.usermodel.WorkbookFactory; + +/** + * Counter validation rule that reads Excel workbooks in addition to CSV and TXT files. 
+ * @author TBrady + * + */ +public class CounterValidationXls extends CounterValidation { + public CounterValidationXls(FTDriver dt) { + super(dt); + } + @Override public Vector> getDataFromFile(File f, Stats s) { + String ext = getExt(f); + String sep = getSeparator(f, ext); + try { + Vector> data = new Vector>(); + if (ext.equals("CSV") || ext.endsWith("TXT")) { + data = DelimitedFileReader.parseFile(f, sep); + } else { + data = readExcel(f); + } + return data; + } catch (InvalidFormatException e) { + s.setVal(CounterStatsItems.Stat, CounterStat.UNSUPPORTED_FILE); + s.setVal(CounterStatsItems.Message, e.toString()); + } catch (IOException e) { + s.setVal(CounterStatsItems.Stat, CounterStat.UNSUPPORTED_FILE); + s.setVal(CounterStatsItems.Message, e.toString()); + } catch (org.apache.poi.hssf.OldExcelFormatException e) { + s.setVal(CounterStatsItems.Stat, CounterStat.UNSUPPORTED_FILE); + s.setVal(CounterStatsItems.Message, e.toString()); + } catch (Exception e) { + e.printStackTrace(); + s.setVal(CounterStatsItems.Stat, CounterStat.UNSUPPORTED_FILE); + s.setVal(CounterStatsItems.Message, e.toString()); + } + return null; + } + + Vector> readExcel(File f) throws InvalidFormatException, IOException { + Vector> data = new Vector>(); + + FileInputStream fs = new FileInputStream(f); + try { + Workbook w = WorkbookFactory.create(fs); + if (w.getNumberOfSheets() > 1) throw new InvalidFormatException("Workbook has more than one worksheet"); + if (w.getNumberOfSheets() != 1) throw new InvalidFormatException("Workbook does not have a worksheet"); + Sheet sheet = w.getSheetAt(0); + for(int r=0; r row = new Vector(); + data.add(row); + Row srow = sheet.getRow(r); + for(int c=0; c imax) { + stat.setVal(TiffStatsItems.Spec, SPEC.NOT_IN_SPEC); + } + } catch (NumberFormatException e) { + } + } + if (!min.isEmpty()) { + try { + int imin = Integer.parseInt(min); + if (val < imin) { + stat.setVal(TiffStatsItems.Spec, SPEC.NOT_IN_SPEC); + } + } catch (NumberFormatException e) { + } + 
} + + } + + public Stats createStats(String key){ + return Generator.INSTANCE.create(key); + } + public StatsItemConfig getStatsDetails() { + return details; + } + + public void initFilters() { + filters.add(new TiffJpegFileTestFilter()); + filters.add(new TiffFileTestFilter()); + filters.add(new JpegFileTestFilter()); + } + + public String getDescription() { + return ""; + } + +} diff --git a/demo/src/main/edu/georgetown/library/fileAnalyzer/filetest/PageCount.java b/demo/src/main/edu/georgetown/library/fileAnalyzer/filetest/PageCount.java new file mode 100644 index 0000000..78c62ca --- /dev/null +++ b/demo/src/main/edu/georgetown/library/fileAnalyzer/filetest/PageCount.java @@ -0,0 +1,126 @@ +package edu.georgetown.library.fileAnalyzer.filetest; + +import gov.nara.nwts.ftapp.FTDriver; +import gov.nara.nwts.ftapp.filetest.DefaultFileTest; +import gov.nara.nwts.ftapp.filter.PdfFileTestFilter; +import gov.nara.nwts.ftapp.ftprop.FTPropString; +import gov.nara.nwts.ftapp.stats.Stats; +import gov.nara.nwts.ftapp.stats.StatsGenerator; +import gov.nara.nwts.ftapp.stats.StatsItem; +import gov.nara.nwts.ftapp.stats.StatsItemConfig; +import gov.nara.nwts.ftapp.stats.StatsItemEnum; + +import java.io.File; +import java.io.FileInputStream; +import java.io.FileNotFoundException; +import java.io.IOException; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +import org.apache.pdfbox.pdfparser.PDFParser; +import org.apache.pdfbox.pdmodel.PDDocument; + +/** + * Count the number of pages in each PDF file. 
+ * @author TBrady + * + */ +class PageCount extends DefaultFileTest { + private static enum PagesStatsItems implements StatsItemEnum { + Key(StatsItem.makeStringStatsItem("Path", 200)), + Pages(StatsItem.makeIntStatsItem("Pages")) + ; + + StatsItem si; + PagesStatsItems(StatsItem si) {this.si=si;} + public StatsItem si() {return si;} + } + + public static enum Generator implements StatsGenerator { + INSTANCE; + class PagesStats extends Stats { + public PagesStats(String key) { + super(details, key); + } + + } + public PagesStats create(String key) {return new PagesStats(key);} + } + public static StatsItemConfig details = StatsItemConfig.create(PagesStatsItems.class); + + long counter = 1000000; + public static final String KEYPATT = "key-regex"; + public static final String KEYGROUP = "regex-group-num"; + public PageCount(FTDriver dt) { + super(dt); + ftprops.add(new FTPropString(dt, this.getClass().getSimpleName(), KEYPATT, KEYPATT, + "Regular expression for setting a key: '.*(\\d\\d\\d).pdf '. If not set, the full path name will be used", "")); + ftprops.add(new FTPropString(dt, this.getClass().getSimpleName(), KEYGROUP, KEYGROUP, + "Regular expression match group number for key. 
Only applicable if a regular expression is set.", "0")); + } + + public String toString() { + return "Page Count"; + } + public String getKey(File f) { + String key = f.getPath(); + + String patt = getProperty(KEYPATT, "").toString(); + String group = getProperty(KEYGROUP, "").toString(); + if (patt.isEmpty()) return key; + if (group.isEmpty()) return key; + + try { + Pattern p = Pattern.compile(patt); + Matcher m = p.matcher(key); + if (m.matches()) { + int pos = Integer.parseInt(group); + if (pos >= 0 && pos <= m.groupCount()) { + return m.group(pos); + } + } + } catch (Exception e) { + } + + return key; + } + + public String getShortName(){return "Pg";} + + + public Object fileTest(File f) { + Stats s = getStats(f); + int x = 0; + try { + FileInputStream fis = new FileInputStream(f); + PDFParser pp = new PDFParser(fis); + pp.parse(); + PDDocument pd = pp.getPDDocument(); + x = pd.getNumberOfPages(); + s.setVal(PagesStatsItems.Pages, x); + pd.close(); + fis.close(); + } catch (FileNotFoundException e) { + e.printStackTrace(); + } catch (IOException e) { + //e.printStackTrace(); + } + + return x; + } + public Stats createStats(String key){ + return Generator.INSTANCE.create(key); + } + public StatsItemConfig getStatsDetails() { + return details; + } + + public void initFilters() { + filters.add(new PdfFileTestFilter()); + } + + public String getDescription() { + return ""; + } + +} diff --git a/demo/src/main/edu/georgetown/library/fileAnalyzer/filetest/VerifyBag.java b/demo/src/main/edu/georgetown/library/fileAnalyzer/filetest/VerifyBag.java new file mode 100644 index 0000000..d395af7 --- /dev/null +++ b/demo/src/main/edu/georgetown/library/fileAnalyzer/filetest/VerifyBag.java @@ -0,0 +1,106 @@ +package edu.georgetown.library.fileAnalyzer.filetest; + +import gov.loc.repository.bagit.Bag; +import gov.loc.repository.bagit.BagFactory; +import gov.loc.repository.bagit.utilities.SimpleResult; +import gov.nara.nwts.ftapp.FTDriver; +import 
gov.nara.nwts.ftapp.filetest.DefaultFileTest; +import gov.nara.nwts.ftapp.stats.Stats; +import gov.nara.nwts.ftapp.stats.StatsGenerator; +import gov.nara.nwts.ftapp.stats.StatsItem; +import gov.nara.nwts.ftapp.stats.StatsItemConfig; +import gov.nara.nwts.ftapp.stats.StatsItemEnum; + +import java.io.File; + +/** + * Run BagIt verifyValid on directories whose names end with '_bag'. + * @author TBrady + * + */ +class VerifyBag extends DefaultFileTest { + public enum STAT { + VALID, + INVALID, + ERROR + } + + private static enum BagStatsItems implements StatsItemEnum { + Key(StatsItem.makeStringStatsItem("Bag Path", 200)), + Stat(StatsItem.makeEnumStatsItem(STAT.class, "Bag Status")), + Count(StatsItem.makeIntStatsItem("Item Count")), + Message(StatsItem.makeStringStatsItem("Message",400)), + ; + StatsItem si; + BagStatsItems(StatsItem si) {this.si=si;} + public StatsItem si() {return si;} + } + + public static enum Generator implements StatsGenerator { + INSTANCE; + class BagStats extends Stats { + public BagStats(String key) { + super(details, key); + } + + } + public BagStats create(String key) {return new BagStats(key);} + } + public static StatsItemConfig details = StatsItemConfig.create(BagStatsItems.class); + + long counter = 1000000; + public VerifyBag(FTDriver dt) { + super(dt); + } + + public String toString() { + return "Verify Bag"; + } + public String getKey(File f) { + return this.getRelPath(f); + } + + public String getShortName(){return "Bag";} + + + public Object fileTest(File f) { + Stats s = getStats(f); + BagFactory bf = new BagFactory(); + Bag bag = bf.createBag(f); + s.setVal(BagStatsItems.Count, bag.getPayload().size()); + SimpleResult result = bag.verifyValid(); + if (result.isSuccess()) { + s.setVal(BagStatsItems.Stat, STAT.VALID); + } else { + s.setVal(BagStatsItems.Stat, STAT.INVALID); + for(String m: result.getMessages()) { + s.appendVal(BagStatsItems.Message, m +" "); + } + } + return s.getVal(BagStatsItems.Count); + } + 
public Stats createStats(String key){ + return Generator.INSTANCE.create(key); + } + public StatsItemConfig getStatsDetails() { + return details; + } + + public String getDescription() { + return "This rule will scan for directories with names ending with '_bag'. BagIt verifyvalid will be run on the bag."; + } + + @Override public boolean isTestDirectory() { + return true; + } + + @Override public boolean isTestFiles() { + return false; + } + + @Override public boolean isTestable(File f) { + //if (!f.getParentFile().equals(dt.root)) return false; + return (f.getName().endsWith("_bag")); + } + +} diff --git a/demo/src/main/edu/georgetown/library/fileAnalyzer/filetest/YearbookNameValidationTest.java b/demo/src/main/edu/georgetown/library/fileAnalyzer/filetest/YearbookNameValidationTest.java new file mode 100644 index 0000000..1d30785 --- /dev/null +++ b/demo/src/main/edu/georgetown/library/fileAnalyzer/filetest/YearbookNameValidationTest.java @@ -0,0 +1,56 @@ +package edu.georgetown.library.fileAnalyzer.filetest; + + +import java.io.File; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +import gov.nara.nwts.ftapp.FTDriver; +import gov.nara.nwts.ftapp.filetest.NameValidationTest; +import gov.nara.nwts.ftapp.nameValidation.CustomPattern; +import gov.nara.nwts.ftapp.nameValidation.DirAnalysis; +import gov.nara.nwts.ftapp.nameValidation.RenameDetails; +import gov.nara.nwts.ftapp.nameValidation.RenameStatus; +import gov.nara.nwts.ftapp.nameValidation.ValidPattern; +/** + * Filename validation rule for the Georgetown yearbook digitization project. 
+ * @author TBrady + * + */ +class YearbookNameValidationTest extends NameValidationTest { + + Pattern dirPatternFolder = Pattern.compile("^\\d\\d\\d\\d*$"); + Pattern dirPatternImage = Pattern.compile("gt_yearbooks_\\d\\d\\d\\d_(\\d\\d\\d).tif"); + Pattern dirPatternIgnore = Pattern.compile("^(Thumbs.db|.*_st.pdf)$"); + + public YearbookNameValidationTest(FTDriver dt) { + super(dt, new ValidPattern("^gt_yearbooks_\\d\\d\\d\\d_(\\d\\d\\d.tif|st.pdf)*$", false),null, "Yearbook","Yearbook"); + + testPatterns.add(new CustomPattern("^Thumbs.db$", false) { + public RenameDetails report(File f, Matcher m) { + return new RenameDetails(RenameStatus.SKIP); + } + + }); + + dirTestPatterns.add(new CustomPattern(dirPatternFolder, false) { + public RenameDetails report(File f, Matcher m) { + RenameDetails det = DirAnalysis.analyze(f, dirPatternImage, 1, dirPatternIgnore, false, false); + return det; + } + + }); + dirTestPatterns.add(new CustomPattern("^.*$", false) { + public RenameDetails report(File f, Matcher m) { + return new RenameDetails(RenameStatus.SKIP); + } + + }); + } + + public String getDescription() { + return "This test will check conformance with the Georgetown Digitization Yearbook project." 
+ + getNameValidationDisclaimer(); + } + +} diff --git a/demo/src/main/edu/georgetown/library/fileAnalyzer/importer/DemoImporterRegistry.java b/demo/src/main/edu/georgetown/library/fileAnalyzer/importer/DemoImporterRegistry.java index 64c5f36..c8d597a 100644 --- a/demo/src/main/edu/georgetown/library/fileAnalyzer/importer/DemoImporterRegistry.java +++ b/demo/src/main/edu/georgetown/library/fileAnalyzer/importer/DemoImporterRegistry.java @@ -1,29 +1,26 @@ -package edu.georgetown.library.fileAnalyzer.importer; - -import edu.georgetown.library.fileAnalyzer.importer.demo.CreateDateParser; -import edu.georgetown.library.fileAnalyzer.importer.demo.MarcImporter; -import edu.georgetown.library.fileAnalyzer.importer.demo.TabSepToDC; -import gov.nara.nwts.ftapp.FTDriver; -import gov.nara.nwts.ftapp.importer.ImporterRegistry; - - -/** - * Activates the Importers that will be presented on the Import tab. - * @author TBrady - * - */ -public class DemoImporterRegistry extends ImporterRegistry { - - private static final long serialVersionUID = 1L; - - public DemoImporterRegistry(FTDriver dt) { - super(dt); - add(new IngestFolderCreate(dt)); - - add(new TabSepToDC(dt)); - add(new MarcImporter(dt)); - add(new CreateDateParser(dt)); - } - - -} +package edu.georgetown.library.fileAnalyzer.importer; + +import edu.georgetown.library.fileAnalyzer.filetest.CounterValidationXls; +import gov.nara.nwts.ftapp.FTDriver; +import gov.nara.nwts.ftapp.filetest.CounterValidation; + + +/** + * Activates the Importers that will be presented on the Import tab. 
+ * @author TBrady + * + */ +public class DemoImporterRegistry extends DSpaceImporterRegistry { + + private static final long serialVersionUID = 1L; + + public DemoImporterRegistry(FTDriver dt) { + super(dt); + add(new MarcValidator(dt)); + + removeImporter(CounterValidation.class); + add(new CounterValidationXls(dt)); + } + + +} diff --git a/demo/src/main/edu/georgetown/library/fileAnalyzer/importer/MarcValidator.java b/demo/src/main/edu/georgetown/library/fileAnalyzer/importer/MarcValidator.java new file mode 100644 index 0000000..6474d6f --- /dev/null +++ b/demo/src/main/edu/georgetown/library/fileAnalyzer/importer/MarcValidator.java @@ -0,0 +1,199 @@ +package edu.georgetown.library.fileAnalyzer.importer; + +import java.io.File; +import java.io.IOException; +import java.text.NumberFormat; +import java.util.TreeMap; + +import javax.xml.transform.TransformerException; + +import org.w3c.dom.Document; +import org.w3c.dom.Element; +import org.w3c.dom.NodeList; +import org.xml.sax.SAXException; + + +import gov.nara.nwts.ftapp.ActionResult; +import gov.nara.nwts.ftapp.FTDriver; +import gov.nara.nwts.ftapp.Timer; +import gov.nara.nwts.ftapp.importer.DefaultImporter; +import gov.nara.nwts.ftapp.stats.Stats; +import gov.nara.nwts.ftapp.stats.StatsGenerator; +import gov.nara.nwts.ftapp.stats.StatsItem; +import gov.nara.nwts.ftapp.stats.StatsItemConfig; +import gov.nara.nwts.ftapp.stats.StatsItemEnum; +import gov.nara.nwts.ftapp.util.XMLUtil; + +/** + * Importer that checks MARC XML records for common issues + * + * @author TBrady + * + */ +public class MarcValidator extends DefaultImporter { + + public static NumberFormat nf = NumberFormat.getNumberInstance(); + static { + nf.setMinimumIntegerDigits(5); + nf.setGroupingUsed(false); + } + + public static enum STAT { + VALID, + INVALID, + } + + public static enum COUNT { + MISSING, + NOT_PRESENT, + PRESENT_ONCE, + MULTIPLE_COPIES, + CONTENT_ERR + } + + public static enum MarcStatsItems implements StatsItemEnum { + 
ItemNum(StatsItem.makeStringStatsItem("ItemNum", 100)), + Stat(StatsItem.makeEnumStatsItem(STAT.class, "Status").setWidth(150)), + Author(StatsItem.makeStringStatsItem("Author", 250)), + Title(StatsItem.makeStringStatsItem("Title", 250)), + + f949i(StatsItem.makeStringStatsItem("949 i").setWidth(150)), + f949l(StatsItem.makeStringStatsItem("949 l").setWidth(50)), + f949s(StatsItem.makeStringStatsItem("949 s").setWidth(50)), + f949t(StatsItem.makeStringStatsItem("949 t").setWidth(50)), + f949z(StatsItem.makeStringStatsItem("949 z").setWidth(50)), + f949a(StatsItem.makeStringStatsItem("949 a").setWidth(100)), + f949b(StatsItem.makeStringStatsItem("949 b").setWidth(100)), + f949(StatsItem.makeEnumStatsItem(COUNT.class, "949").setWidth(120)), + Stat949(StatsItem.makeStringStatsItem("Status 949").setWidth(350)), + + f980(StatsItem.makeEnumStatsItem(COUNT.class, "980").setWidth(120)), + Stat980(StatsItem.makeStringStatsItem("Status 980").setWidth(350)), + + f981(StatsItem.makeEnumStatsItem(COUNT.class, "981").setWidth(120)), + f981b(StatsItem.makeStringStatsItem("981 b").setWidth(50)), + Stat981(StatsItem.makeStringStatsItem("Status 981").setWidth(350)), + + f935(StatsItem.makeEnumStatsItem(COUNT.class, "935").setWidth(120)), + f935a(StatsItem.makeStringStatsItem("935 a").setWidth(50)), + Stat935(StatsItem.makeStringStatsItem("Status 935").setWidth(350)), + ; + + StatsItem si; + + MarcStatsItems(StatsItem si) { + this.si = si; + } + + public StatsItem si() { + return si; + } + } + + public static enum Generator implements StatsGenerator { + INSTANCE; + public Stats create(String key) { + return new Stats(details, key); + } + } + + public static StatsItemConfig details = StatsItemConfig + .create(MarcStatsItems.class); + + public MarcValidator(FTDriver dt) { + super(dt); + } + + public String toString() { + return "MARC Validator"; + } + + public String getDescription() { + return "Look for common issues when importing Marc Records."; + } + + public String getShortName() 
{ + return "Marc"; + } + + public void setFieldCount(Stats stat, String s, MarcStatsItems msi, boolean required) { + if (s.equals("1")) { + stat.setVal(msi, COUNT.PRESENT_ONCE); + } else { + if (s.equals("0")) { + if (required) { + stat.setVal(msi, COUNT.MISSING); + } else { + stat.setVal(msi, COUNT.NOT_PRESENT); + return; + } + } else { + stat.setVal(msi, COUNT.MULTIPLE_COPIES); + } + stat.setVal(MarcStatsItems.Stat, STAT.INVALID); + } + + } + + public void setFieldError(Stats stat, String s, MarcStatsItems msiMsg, MarcStatsItems msi) { + stat.setVal(msiMsg, s); + if (!s.isEmpty()) { + stat.setVal(MarcStatsItems.Stat, STAT.INVALID); + stat.setVal(msi, COUNT.CONTENT_ERR); + } + } + + + public ActionResult importFile(File selectedFile) throws IOException { + Timer timer = new Timer(); + TreeMap types = new TreeMap(); + + try { + Document d = XMLUtil.db_ns.parse(selectedFile); + Document dout = (Document)XMLUtil.doTransformToDom(d, "edu/georgetown/library/fileAnalyzer/importer/marc.xsl"); + NodeList nl = dout.getElementsByTagName("result"); + + for(int i=0; i < nl.getLength(); i++) { + String key = nf.format(i+1); + Stats stat = Generator.INSTANCE.create(key); + stat.setVal(MarcStatsItems.Stat, STAT.VALID); + types.put(key, stat); + Element el = (Element)nl.item(i); + stat.setVal(MarcStatsItems.Author, el.getAttribute("author")); + stat.setVal(MarcStatsItems.Title, el.getAttribute("title")); + stat.setVal(MarcStatsItems.f949i, el.getAttribute("f949i")); + stat.setVal(MarcStatsItems.f949l, el.getAttribute("f949l")); + stat.setVal(MarcStatsItems.f949s, el.getAttribute("f949s")); + stat.setVal(MarcStatsItems.f949t, el.getAttribute("f949t")); + stat.setVal(MarcStatsItems.f949z, el.getAttribute("f949z")); + stat.setVal(MarcStatsItems.f949a, el.getAttribute("f949a")); + stat.setVal(MarcStatsItems.f949b, el.getAttribute("f949b")); + setFieldCount(stat, el.getAttribute("f949"), MarcStatsItems.f949, true); + setFieldError(stat, el.getAttribute("f949err"), 
MarcStatsItems.Stat949, MarcStatsItems.f949); + + setFieldCount(stat, el.getAttribute("f980"), MarcStatsItems.f980, true); + setFieldError(stat, el.getAttribute("f980err"), MarcStatsItems.Stat980, MarcStatsItems.f980); + + setFieldCount(stat, el.getAttribute("f981"), MarcStatsItems.f981, true); + stat.setVal(MarcStatsItems.f981b, el.getAttribute("f981b")); + setFieldError(stat, el.getAttribute("f981err"), MarcStatsItems.Stat981, MarcStatsItems.f981); + + setFieldCount(stat, el.getAttribute("f935"), MarcStatsItems.f935, false); + stat.setVal(MarcStatsItems.f935a, el.getAttribute("f935a")); + setFieldError(stat, el.getAttribute("f350err"), MarcStatsItems.Stat935, MarcStatsItems.f935); + + } + } catch (SAXException e) { + e.printStackTrace(); + } catch (TransformerException e) { + e.printStackTrace(); + } + return new ActionResult(selectedFile, selectedFile.getName(), + this.toString(), details, types, true, timer.getDuration()); + } + + public void doLine(String line) { + + } + +} diff --git a/demo/src/main/edu/georgetown/library/fileAnalyzer/importer/demo/CreateDateParser.java b/demo/src/main/edu/georgetown/library/fileAnalyzer/importer/demo/CreateDateParser.java deleted file mode 100644 index 124b881..0000000 --- a/demo/src/main/edu/georgetown/library/fileAnalyzer/importer/demo/CreateDateParser.java +++ /dev/null @@ -1,291 +0,0 @@ -package edu.georgetown.library.fileAnalyzer.importer.demo; - -import java.io.File; -import java.io.IOException; -import java.util.ArrayList; -import java.util.EnumSet; -import java.util.List; -import java.util.TreeMap; -import java.util.Vector; -import java.util.regex.Matcher; - -import edu.georgetown.library.fileAnalyzer.dateValidation.DateValidationPattern; -import edu.georgetown.library.fileAnalyzer.dateValidation.DateValidationResult; -import edu.georgetown.library.fileAnalyzer.dateValidation.DateValidationStatus; -import edu.georgetown.library.fileAnalyzer.dateValidation.DateValidator; - -import gov.nara.nwts.ftapp.ActionResult; 
-import gov.nara.nwts.ftapp.FTDriver; -import gov.nara.nwts.ftapp.Timer; -import gov.nara.nwts.ftapp.YN; -import gov.nara.nwts.ftapp.importer.DefaultImporter; -import gov.nara.nwts.ftapp.importer.DelimitedFileImporter; -import gov.nara.nwts.ftapp.stats.Stats; -import gov.nara.nwts.ftapp.stats.StatsGenerator; -import gov.nara.nwts.ftapp.stats.StatsItem; -import gov.nara.nwts.ftapp.stats.StatsItemConfig; -import gov.nara.nwts.ftapp.stats.StatsItemEnum; - -public class CreateDateParser extends DefaultImporter { - public static String months = "January|Jan|February|Feb|March|Mar|April|Apr|May|June|Jun|July|Jul|August|Aug|September|Sep|October|Oct|November|Nov|December|Dec"; - - DateValidator dv; - - public CreateDateParser(FTDriver dt) { - super(dt); - dv = new DateValidator("yyyy-MM-dd"); - dv.addValidationPattern("^\\d\\d\\d\\d-\\d\\d-\\d\\dT\\d\\d:\\d\\d:\\d\\dZ$", "yyyy-MM-dd'T'HH:mm:ss'Z'"); - dv.addValidationPattern("^\\d\\d\\d\\d-\\d\\d?-\\d\\d?$", "yyyy-MM-dd"); - dv.addValidationPattern("^\\d\\d\\d\\d-\\d\\d?$", "yyyy-MM", "yyyy-MM"); - dv.addValidationPattern("^("+months+") \\d\\d?, \\d\\d\\d\\d$", "MMM dd, yyyy"); - dv.addValidationPattern("^("+months+") \\d\\d\\d\\d$", "MMM yyyy", "yyyy-MM"); - dv.addValidationPattern("^\\d\\d?/\\d\\d?/\\d\\d\\d\\d$", "MM/dd/yyyy"); - dv.addValidationPattern( - new DateValidationPattern("^0000(0000)?$"){ - public DateValidationResult makeResult(String s, Matcher m) { - return DateValidationResult.invalid(); - } - } - ); - dv.addValidationPattern( - new DateValidationPattern("^(\\d\\d\\d\\d)$"){ - public DateValidationResult makeResult(String s, Matcher m) { - return DateValidationResult.valid(m.group(1)); - } - } - ); - dv.addValidationPattern( - new DateValidationPattern("^(\\d\\d\\d\\d)0000$"){ - public DateValidationResult makeResult(String s, Matcher m) { - return DateValidationResult.parseable(m.group(1)); - } - } - ); - dv.addValidationPattern( - new DateValidationPattern("^(\\d\\d\\d\\d)(\\d\\d)00$"){ - public 
DateValidationResult makeResult(String s, Matcher m) { - return DateValidationResult.parseable(m.group(1)+"-"+m.group(2)); - } - } - ); - dv.addValidationPattern("^\\d\\d\\d\\d\\d\\d\\d\\d$", "yyyyMMdd"); - } - - private enum TY {C,I,D,A,V} - private static List> getDateCombos(){ - ArrayList> list = new ArrayList>(); - for(int ci=0; ci<2; ci++) { - for(int ii=0; ii<2; ii++) { - for(int di=0; di<2; di++) { - for(int ai=0; ai<2; ai++) { - for(int vi=0; vi<2; vi++) { - ArrayList ilist = new ArrayList(); - if (ci == 0) ilist.add(TY.C); - if (ii == 0) ilist.add(TY.I); - if (di == 0) ilist.add(TY.D); - if (ai == 0) ilist.add(TY.A); - if (vi == 0) ilist.add(TY.V); - EnumSet set = (ilist.isEmpty()) ? EnumSet.noneOf(TY.class) : EnumSet.copyOf(ilist); - list.add(set); - } - } - } - } - } - return list; - } - private static String getDateMix() { - return "NA"; - } - private static String getDateMix(EnumSet set) { - if (set == null) return getDateMix(); - String s = ""; - for(TY t: TY.values()) { - if (set.contains(t)) { - s += t.name(); - } else { - s += "_"; - } - } - return s; - } - - private static String getDateMix(DateValidationStatus[] dstats) { - ArrayList list = new ArrayList(); - for(TY t: TY.values()) { - if (dstats[t.ordinal()].exists()) list.add(t); - } - return getDateMix(EnumSet.copyOf(list)); - } - - private static List getDateMixNames() { - ArrayList list = new ArrayList(); - for(EnumSet set: getDateCombos()) { - list.add(getDateMix(set)); - } - list.add(getDateMix()); - return list; - } - - private static enum DSpaceDateStatsItems implements StatsItemEnum { - ItemId(StatsItem.makeStringStatsItem("Item Id", 80)), - OverallStat(StatsItem.makeEnumStatsItem(DateValidationStatus.class, "Pass/Fail",DateValidationStatus.MISSING).setWidth(80)), - ResolvedDate(StatsItem.makeStringStatsItem("Res Date", 130)), - Title(StatsItem.makeStringStatsItem("Title", 300)), - Handle(StatsItem.makeStringStatsItem("Handle", 100)), - - DateMix(StatsItem.makeStringStatsItem("DateMix", 
80).setValues(getDateMixNames().toArray()).setInitVal(getDateMix())), - - CreateYN(StatsItem.makeEnumStatsItem(YN.class, "Has Create Date?", YN.N).setWidth(60)), - IssueYN(StatsItem.makeEnumStatsItem(YN.class, "Has Issue Date?", YN.N).setWidth(60)), - UnqualDateYN(StatsItem.makeEnumStatsItem(YN.class, "Has Unqualified Date?", YN.N).setWidth(60)), - AccessnYN(StatsItem.makeEnumStatsItem(YN.class, "Has Accession Date?", YN.N).setWidth(60)), - AvailYN(StatsItem.makeEnumStatsItem(YN.class, "Has Avail Date?", YN.N).setWidth(60)), - - CreateDate(StatsItem.makeStringStatsItem("Creation Date", 130)), - CreateDateStatus(StatsItem.makeEnumStatsItem(DateValidationStatus.class, "Create Date Status", DateValidationStatus.MISSING)), - CreateDateNormalized(StatsItem.makeStringStatsItem("Creation Date Normalized", 130)), - - IssueDate(StatsItem.makeStringStatsItem("Issue Date", 130)), - IssueDateStatus(StatsItem.makeEnumStatsItem(DateValidationStatus.class, "Issue Date Status", DateValidationStatus.MISSING)), - IssueDateNormalized(StatsItem.makeStringStatsItem("Issue Date Normalized", 130)), - - UnqualDate(StatsItem.makeStringStatsItem("Unqualified Date", 130)), - UnqualDateStatus(StatsItem.makeEnumStatsItem(DateValidationStatus.class, "Unqualified Date Status", DateValidationStatus.MISSING)), - UnqualDateNormalized(StatsItem.makeStringStatsItem("Unqualified Date Normalized", 130)), - - AccessnDate(StatsItem.makeStringStatsItem("Accessn Date", 130)), - AccessnDateStatus(StatsItem.makeEnumStatsItem(DateValidationStatus.class, "Accessn Date Status", DateValidationStatus.MISSING)), - AccessnDateNormalized(StatsItem.makeStringStatsItem("Accessn Date Normalized", 130)), - - AvailDate(StatsItem.makeStringStatsItem("Avail Date", 130)), - AvailDateStatus(StatsItem.makeEnumStatsItem(DateValidationStatus.class, "Avail Date Status", DateValidationStatus.MISSING)), - AvailDateNormalized(StatsItem.makeStringStatsItem("Avail Date Normalized", 130)), - - 
CopyrightDate(StatsItem.makeStringStatsItem("Copyright Date", 100)), - SubmissionDate(StatsItem.makeStringStatsItem("Submission Date", 100)), - ; - - StatsItem si; - DSpaceDateStatsItems(StatsItem si) {this.si=si;} - public StatsItem si() {return si;} - } - - public static enum Generator implements StatsGenerator { - INSTANCE; - public Stats create(String key) {return new Stats(details, key);} - } - - int cols = 8; - Object[][]mydetails; - - static StatsItemConfig details = StatsItemConfig.create(DSpaceDateStatsItems.class); - public StatsItemConfig getDetails() { - return details; - } - - public String toString() { - return "Demo: Creation Date Analysis"; - } - public String getDescription() { - return "Prototype: This rule will parse a creation date analysis record."; - } - public String getShortName() { - return "Create"; - } - public ActionResult importFile(File selectedFile) throws IOException { - Timer timer = new Timer(); - TreeMap types = new TreeMap(); - Vector> data = DelimitedFileImporter.parseFile(selectedFile, "|", true); - for (Vector v : data) { - String item = v.get(0); - Stats stats = Generator.INSTANCE.create(item); - - DateValidationStatus[] dstats = new DateValidationStatus[TY.values().length]; - DateValidationStatus itemstatus = DateValidationStatus.MISSING; - String resdate = ""; - - for(int i=0; i 3) stats.setVal(DSpaceDateStatsItems.Title, v.get(3)); /*Title*/ - if (v.size() > 4) stats.setVal(DSpaceDateStatsItems.Handle, v.get(4).replace("http://hdl.handle.net/", "")); /*Handle*/ - - if (v.size() > 5) { - String createdate = v.get(5); - DateValidationResult dvr = dv.test(createdate); - dstats[0] = dvr.status; - itemstatus = dvr.status; - resdate = dvr.result; - - stats.setVal(DSpaceDateStatsItems.CreateDate, createdate); - stats.setVal(DSpaceDateStatsItems.CreateYN, dvr.exists()); - stats.setVal(DSpaceDateStatsItems.CreateDateStatus, dvr.status); - stats.setVal(DSpaceDateStatsItems.CreateDateNormalized, dvr.result); - } - - if (v.size() > 6) 
{ - String issuedate = v.get(6); - DateValidationResult dvr = dv.test(issuedate); - dstats[1] = dvr.status; - - stats.setVal(DSpaceDateStatsItems.IssueDate, issuedate); - stats.setVal(DSpaceDateStatsItems.IssueYN, dvr.exists()); - stats.setVal(DSpaceDateStatsItems.IssueDateStatus, dvr.status); - stats.setVal(DSpaceDateStatsItems.IssueDateNormalized, dvr.result); - } - - if (v.size() > 7) { - String date = v.get(7); - DateValidationResult dvr = dv.test(date); - dstats[2] = dvr.status; - if (!itemstatus.exists()) { - itemstatus = dvr.status; - resdate = dvr.result; - } - - stats.setVal(DSpaceDateStatsItems.UnqualDate, date); - stats.setVal(DSpaceDateStatsItems.UnqualDateYN, dvr.exists()); - stats.setVal(DSpaceDateStatsItems.UnqualDateStatus, dvr.status); - stats.setVal(DSpaceDateStatsItems.UnqualDateNormalized, dvr.result); - } - - if (v.size() > 8) { - String accdate = v.get(8); - DateValidationResult dvr = dv.test(accdate); - dstats[3] = dvr.status; - - stats.setVal(DSpaceDateStatsItems.AccessnDate, accdate); - stats.setVal(DSpaceDateStatsItems.AccessnYN, dvr.exists()); - stats.setVal(DSpaceDateStatsItems.AccessnDateStatus,dvr.status); - stats.setVal(DSpaceDateStatsItems.AccessnDateNormalized, dvr.result); - } - - if (v.size() > 9) { - String avdate = v.get(9); - DateValidationResult dvr = dv.test(avdate); - dstats[4] = dvr.status; - - stats.setVal(DSpaceDateStatsItems.AvailDate, avdate); - stats.setVal(DSpaceDateStatsItems.AvailYN, dvr.exists()); - stats.setVal(DSpaceDateStatsItems.AvailDateStatus, dvr.status); - stats.setVal(DSpaceDateStatsItems.AvailDateNormalized, dvr.result); - } - - if (v.size() > 10) stats.setVal(DSpaceDateStatsItems.CopyrightDate, v.get(10)); - if (v.size() > 11) stats.setVal(DSpaceDateStatsItems.SubmissionDate, v.get(11)); - - if (v.size() != 12) { - itemstatus = DateValidationStatus.UNTESTABLE; - } - stats.setVal(DSpaceDateStatsItems.DateMix, getDateMix(dstats)); - stats.setVal(DSpaceDateStatsItems.OverallStat, itemstatus); - 
stats.setVal(DSpaceDateStatsItems.ResolvedDate, resdate); - types.put(item, stats); - } - - return new ActionResult(selectedFile, selectedFile.getName(), this.toString(), getDetails(), types, true, timer.getDuration()); - } - -} diff --git a/demo/src/main/edu/georgetown/library/fileAnalyzer/importer/demo/MarcImporter.java b/demo/src/main/edu/georgetown/library/fileAnalyzer/importer/demo/MarcImporter.java deleted file mode 100644 index eaa8a64..0000000 --- a/demo/src/main/edu/georgetown/library/fileAnalyzer/importer/demo/MarcImporter.java +++ /dev/null @@ -1,116 +0,0 @@ -package edu.georgetown.library.fileAnalyzer.importer.demo; - -import java.io.BufferedReader; -import java.io.File; -import java.io.FileReader; -import java.io.IOException; -import java.util.TreeMap; -import java.util.regex.Matcher; -import java.util.regex.Pattern; - -import gov.nara.nwts.ftapp.ActionResult; -import gov.nara.nwts.ftapp.FTDriver; -import gov.nara.nwts.ftapp.Timer; -import gov.nara.nwts.ftapp.importer.DefaultImporter; -import gov.nara.nwts.ftapp.stats.Stats; -import gov.nara.nwts.ftapp.stats.StatsGenerator; -import gov.nara.nwts.ftapp.stats.StatsItem; -import gov.nara.nwts.ftapp.stats.StatsItemConfig; -import gov.nara.nwts.ftapp.stats.StatsItemEnum; - -/** - * Importer for tab delimited files - * @author TBrady - * - */ -public class MarcImporter extends DefaultImporter { - - private static enum MarcStatsItems implements StatsItemEnum { - Leader(StatsItem.makeStringStatsItem("Leader").setExport(false)), - Author(StatsItem.makeStringStatsItem("Author 100", 300)), - Dates(StatsItem.makeStringStatsItem("Dates", 160)), - Title(StatsItem.makeStringStatsItem("Title 245", 300)), - Title_h(StatsItem.makeStringStatsItem("Title $h", 200)), - Title_b(StatsItem.makeStringStatsItem("Title $b", 200)), - ; - - StatsItem si; - MarcStatsItems(StatsItem si) {this.si=si;} - public StatsItem si() {return si;} - } - - public static enum Generator implements StatsGenerator { - INSTANCE; - public Stats 
create(String key) {return new Stats(details, key);} - } - static StatsItemConfig details = StatsItemConfig.create(MarcStatsItems.class); - - public MarcImporter(FTDriver dt) { - super(dt); - } - - public String toString() { - return "Demo: Import Key Marc Info"; - } - public String getDescription() { - return "Prototype: This rule will perform a simple parse of a MARC record"; - } - public String getShortName() { - return "Marc"; - } - - class Split { - String before =""; - String after =""; - String otherwise =""; - Split(String s, String sep) { - before = s; - int i = s.indexOf(sep); - if (i!=-1) { - before = s.substring(0, i); - after = s.substring(i + sep.length()); - otherwise = after; - } else { - otherwise = s; - } - } - } - - public ActionResult importFile(File selectedFile) throws IOException { - Pattern p = Pattern.compile("^=(LDR|\\d\\d\\d)\\s+....(.*)$"); - Timer timer = new Timer(); - FileReader fr = new FileReader(selectedFile); - BufferedReader br = new BufferedReader(fr); - TreeMap types = new TreeMap(); - Stats rec = null; - for(String line=br.readLine(); line!=null; line=br.readLine()){ - Matcher m = p.matcher(line); - if (m.matches()) { - String field = m.group(1); - String val = m.group(2); - - if (field.equals("LDR")) { - if (rec != null) { - types.put(rec.key, rec); - } - rec = Generator.INSTANCE.create(val); - } else if (rec == null) { - continue; - } else if (field.equals("100")) { - Split f100 = new Split(val, "$d"); - rec.setVal(MarcStatsItems.Author, f100.before); - rec.setVal(MarcStatsItems.Dates, f100.after); - } else if (field.equals("245")) { - Split f245 = new Split(val, "$h"); - rec.setVal(MarcStatsItems.Title,f245.before); - f245 = new Split(f245.otherwise, "$b"); - rec.setVal(MarcStatsItems.Title_h, f245.before); - rec.setVal(MarcStatsItems.Title_b, f245.after); - } - } - } - fr.close(); - - return new ActionResult(selectedFile, selectedFile.getName(), this.toString(), details, types, true, timer.getDuration()); - } -} diff --git 
a/demo/src/main/edu/georgetown/library/fileAnalyzer/importer/demo/TabSepToDC.java b/demo/src/main/edu/georgetown/library/fileAnalyzer/importer/demo/TabSepToDC.java deleted file mode 100644 index 2f44679..0000000 --- a/demo/src/main/edu/georgetown/library/fileAnalyzer/importer/demo/TabSepToDC.java +++ /dev/null @@ -1,152 +0,0 @@ -package edu.georgetown.library.fileAnalyzer.importer.demo; - -import java.io.BufferedReader; -import java.io.File; -import java.io.FileReader; -import java.io.IOException; -import java.text.NumberFormat; -import java.util.TreeMap; -import java.util.Vector; - -import org.w3c.dom.Document; -import org.w3c.dom.Element; - -import gov.nara.nwts.ftapp.ActionResult; -import gov.nara.nwts.ftapp.FTDriver; -import gov.nara.nwts.ftapp.Timer; -import gov.nara.nwts.ftapp.importer.DelimitedFileImporter; -import gov.nara.nwts.ftapp.stats.Stats; -import gov.nara.nwts.ftapp.stats.StatsGenerator; -import gov.nara.nwts.ftapp.stats.StatsItem; -import gov.nara.nwts.ftapp.stats.StatsItemConfig; -import gov.nara.nwts.ftapp.stats.StatsItemEnum; -import gov.nara.nwts.ftapp.util.XMLUtil; - -import edu.georgetown.library.fileAnalyzer.importer.demo.TabSepToDC.Generator.DCStats; - - -/** - * Importer for tab delimited files - * @author TBrady - * - */ -public class TabSepToDC extends DelimitedFileImporter { - - public enum status {PASS,FAIL} - NumberFormat nf; - - private static enum DCStatsItems implements StatsItemEnum { - LineNo(StatsItem.makeStringStatsItem("LineNo").setExport(false)), - Status(StatsItem.makeEnumStatsItem(status.class, "Status")), - ItemFolder(0, StatsItem.makeStringStatsItem("Item Folder", 200)), - ItemTitle(1, StatsItem.makeStringStatsItem("Item Title", 150)), - Author(2, StatsItem.makeStringStatsItem("Author", 150)), - Date(3, StatsItem.makeStringStatsItem("Date", 80)), - Language(4, StatsItem.makeStringStatsItem("Language", 80)), - Subject(5, StatsItem.makeStringStatsItem("Subject", 150)), - Format(6, StatsItem.makeStringStatsItem("Format", 
150)), - Publsiher(7, StatsItem.makeStringStatsItem("Publisher", 150)), - ; - - StatsItem si; - Integer col; - DCStatsItems(StatsItem si) {this.si=si;} - DCStatsItems(int col, StatsItem si) {this.si=si; this.col = col;} - public StatsItem si() {return si;} - } - - public static enum Generator implements StatsGenerator { - INSTANCE; - class DCStats extends Stats { - - public DCStats(String key) { - super(details, key); - } - - public void setColumnVals(Vector cols) { - for(DCStatsItems dc: DCStatsItems.values()) { - setColumnVal(dc, cols); - } - } - - public void setColumnVal(DCStatsItems dc, Vector cols) { - if (dc.col == null) return; - if (cols.size() > dc.col) { - setVal(dc, cols.get(dc.col)); - } - } - } - public DCStats create(String key) {return new DCStats(key);} - } - static StatsItemConfig details = StatsItemConfig.create(DCStatsItems.class); - - public TabSepToDC(FTDriver dt) { - super(dt); - nf = NumberFormat.getNumberInstance(); - nf.setMinimumIntegerDigits(8); - nf.setGroupingUsed(false); - } - - public String getSeparator() { - return "\t"; - } - public String toString() { - return "Demo: Convert Tab Separated File to Dublin Core"; - } - public String getDescription() { - return "Prototype: This rule will import a tab separated file and create a Dublin Core File from it"; - } - public String getShortName() { - return "Tab2DC"; - } - - public void createItems(Document d, Vector cols) { - Element e = d.createElement("item"); - e.setAttribute("dir", cols.get(0)); - d.getDocumentElement().appendChild(e); - addElement(e, "creator", cols.get(2)); - addElement(e, "title", cols.get(1)); - addElement(e, "date", cols.get(3)); - addElement(e, "language", cols.get(4)); - addElement(e, "subject", cols.get(5)); - addElement(e, "format", cols.get(6)); - addElement(e, "publisher", cols.get(7)); - } - - public void addElement(Element e, String name, String val) { - Element el = e.getOwnerDocument().createElement("dcvalue"); - e.appendChild(el); - 
el.setAttribute("element",name); - el.setAttribute("qualifier", "none"); - el.appendChild(e.getOwnerDocument().createTextNode(val)); - } - - public ActionResult importFile(File selectedFile) throws IOException { - Document d = XMLUtil.db.newDocument(); - d.appendChild(d.createElement("items")); - Timer timer = new Timer(); - FileReader fr = new FileReader(selectedFile); - BufferedReader br = new BufferedReader(fr); - TreeMap types = new TreeMap(); - int rowKey = 0; - for(String line=br.readLine(); line!=null; line=br.readLine()){ - Vector cols = parseLine(line, getSeparator()); - String key = nf.format(rowKey++); - DCStats stats = Generator.INSTANCE.create(key); - if (cols.size() == 8) { - stats.setVal(DCStatsItems.Status, status.PASS); - createItems(d, cols); - } else { - stats.setVal(DCStatsItems.Status, status.FAIL); - } - - stats.setColumnVals(cols); - types.put(key, stats); - } - fr.close(); - File f = new File(selectedFile.getParentFile(), selectedFile.getName() + ".xml"); - XMLUtil.serialize(d, f); - - return new ActionResult(selectedFile, selectedFile.getName(), this.toString(), details, types, true, timer.getDuration()); - } -} diff --git a/demo/src/main/resources/edu/georgetown/library/fileAnalyzer/importer/marc.xsl b/demo/src/main/resources/edu/georgetown/library/fileAnalyzer/importer/marc.xsl new file mode 100644 index 0000000..73e92ec --- /dev/null +++ b/demo/src/main/resources/edu/georgetown/library/fileAnalyzer/importer/marc.xsl @@ -0,0 +1,110 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 949i Must be 14 characters. + + + 949i must start with 39020. + + + 949z must be 090. + + + + + + + + + + + + + + + + + + + + + 981b must exist. + + + + + + + + + + 935a must exist and start with '.o'. 
+ + + + + + + + + + diff --git a/doc/WikiScreenShots/BagIt/i1.png b/doc/WikiScreenShots/BagIt/i1.png new file mode 100644 index 0000000..8693166 Binary files /dev/null and b/doc/WikiScreenShots/BagIt/i1.png differ diff --git a/doc/WikiScreenShots/BagIt/i2.png b/doc/WikiScreenShots/BagIt/i2.png new file mode 100644 index 0000000..252baea Binary files /dev/null and b/doc/WikiScreenShots/BagIt/i2.png differ diff --git a/doc/WikiScreenShots/BagIt/i3.png b/doc/WikiScreenShots/BagIt/i3.png new file mode 100644 index 0000000..9ec3d77 Binary files /dev/null and b/doc/WikiScreenShots/BagIt/i3.png differ diff --git a/doc/WikiScreenShots/BagIt/i4.png b/doc/WikiScreenShots/BagIt/i4.png new file mode 100644 index 0000000..1c7c3ae Binary files /dev/null and b/doc/WikiScreenShots/BagIt/i4.png differ diff --git a/doc/WikiScreenShots/Batch/i1.png b/doc/WikiScreenShots/Batch/i1.png new file mode 100644 index 0000000..66f58cb Binary files /dev/null and b/doc/WikiScreenShots/Batch/i1.png differ diff --git a/doc/WikiScreenShots/Batch/i2.png b/doc/WikiScreenShots/Batch/i2.png new file mode 100644 index 0000000..650b116 Binary files /dev/null and b/doc/WikiScreenShots/Batch/i2.png differ diff --git a/doc/WikiScreenShots/Counter/i1.png b/doc/WikiScreenShots/Counter/i1.png new file mode 100644 index 0000000..6e3e4f4 Binary files /dev/null and b/doc/WikiScreenShots/Counter/i1.png differ diff --git a/doc/WikiScreenShots/Counter/i2.png b/doc/WikiScreenShots/Counter/i2.png new file mode 100644 index 0000000..ba0242e Binary files /dev/null and b/doc/WikiScreenShots/Counter/i2.png differ diff --git a/doc/WikiScreenShots/Counter/i3.png b/doc/WikiScreenShots/Counter/i3.png new file mode 100644 index 0000000..0e41ce3 Binary files /dev/null and b/doc/WikiScreenShots/Counter/i3.png differ diff --git a/doc/WikiScreenShots/Counter/i4.png b/doc/WikiScreenShots/Counter/i4.png new file mode 100644 index 0000000..2ddb333 Binary files /dev/null and b/doc/WikiScreenShots/Counter/i4.png differ diff --git 
a/doc/WikiScreenShots/Derivatives/i1.png b/doc/WikiScreenShots/Derivatives/i1.png new file mode 100644 index 0000000..c73b2c8 Binary files /dev/null and b/doc/WikiScreenShots/Derivatives/i1.png differ diff --git a/doc/WikiScreenShots/Derivatives/i2.png b/doc/WikiScreenShots/Derivatives/i2.png new file mode 100644 index 0000000..5325095 Binary files /dev/null and b/doc/WikiScreenShots/Derivatives/i2.png differ diff --git a/doc/WikiScreenShots/Derivatives/i3.png b/doc/WikiScreenShots/Derivatives/i3.png new file mode 100644 index 0000000..a1be42d Binary files /dev/null and b/doc/WikiScreenShots/Derivatives/i3.png differ diff --git a/doc/WikiScreenShots/Derivatives/i4.png b/doc/WikiScreenShots/Derivatives/i4.png new file mode 100644 index 0000000..3cb0a2b Binary files /dev/null and b/doc/WikiScreenShots/Derivatives/i4.png differ diff --git a/doc/WikiScreenShots/Derivatives/i5.png b/doc/WikiScreenShots/Derivatives/i5.png new file mode 100644 index 0000000..c9a0b7f Binary files /dev/null and b/doc/WikiScreenShots/Derivatives/i5.png differ diff --git a/doc/WikiScreenShots/Derivatives/i6.png b/doc/WikiScreenShots/Derivatives/i6.png new file mode 100644 index 0000000..94a4995 Binary files /dev/null and b/doc/WikiScreenShots/Derivatives/i6.png differ diff --git a/doc/WikiScreenShots/Derivatives/i7.png b/doc/WikiScreenShots/Derivatives/i7.png new file mode 100644 index 0000000..f8571a1 Binary files /dev/null and b/doc/WikiScreenShots/Derivatives/i7.png differ diff --git a/doc/WikiScreenShots/Derivatives/i8.png b/doc/WikiScreenShots/Derivatives/i8.png new file mode 100644 index 0000000..56b09c2 Binary files /dev/null and b/doc/WikiScreenShots/Derivatives/i8.png differ diff --git a/doc/WikiScreenShots/Derivatives/i9.png b/doc/WikiScreenShots/Derivatives/i9.png new file mode 100644 index 0000000..962c499 Binary files /dev/null and b/doc/WikiScreenShots/Derivatives/i9.png differ diff --git a/doc/WikiScreenShots/VendorCheck/i1.png b/doc/WikiScreenShots/VendorCheck/i1.png new file 
mode 100644 index 0000000..660d3f9 Binary files /dev/null and b/doc/WikiScreenShots/VendorCheck/i1.png differ diff --git a/doc/WikiScreenShots/VendorCheck/i2.png b/doc/WikiScreenShots/VendorCheck/i2.png new file mode 100644 index 0000000..ca39dc2 Binary files /dev/null and b/doc/WikiScreenShots/VendorCheck/i2.png differ diff --git a/doc/WikiScreenShots/VendorCheck/i3.png b/doc/WikiScreenShots/VendorCheck/i3.png new file mode 100644 index 0000000..47fda66 Binary files /dev/null and b/doc/WikiScreenShots/VendorCheck/i3.png differ diff --git a/doc/WikiScreenShots/VendorCheck/i4.png b/doc/WikiScreenShots/VendorCheck/i4.png new file mode 100644 index 0000000..6189391 Binary files /dev/null and b/doc/WikiScreenShots/VendorCheck/i4.png differ diff --git a/doc/WikiScreenShots/VendorCheck/i5.png b/doc/WikiScreenShots/VendorCheck/i5.png new file mode 100644 index 0000000..27047da Binary files /dev/null and b/doc/WikiScreenShots/VendorCheck/i5.png differ diff --git a/doc/WikiScreenShots/VendorCheck/i6.png b/doc/WikiScreenShots/VendorCheck/i6.png new file mode 100644 index 0000000..c3db40d Binary files /dev/null and b/doc/WikiScreenShots/VendorCheck/i6.png differ diff --git a/dspace/pom.xml b/dspace/pom.xml index 4a6e373..0c8e876 100644 --- a/dspace/pom.xml +++ b/dspace/pom.xml @@ -30,6 +30,11 @@ + + + src/main/resources + + org.apache.maven.plugins diff --git a/dspace/src/main/edu/georgetown/library/fileAnalyzer/DSpaceFileAnalyzer.java b/dspace/src/main/edu/georgetown/library/fileAnalyzer/DSpaceFileAnalyzer.java index d08b76f..10df321 100644 --- a/dspace/src/main/edu/georgetown/library/fileAnalyzer/DSpaceFileAnalyzer.java +++ b/dspace/src/main/edu/georgetown/library/fileAnalyzer/DSpaceFileAnalyzer.java @@ -1,36 +1,36 @@ -package edu.georgetown.library.fileAnalyzer; - -import java.io.File; - -import edu.georgetown.library.fileAnalyzer.filetest.DSpaceActionRegistry; -import edu.georgetown.library.fileAnalyzer.importer.DSpaceImporterRegistry; - -import 
gov.nara.nwts.ftapp.filetest.ActionRegistry; -import gov.nara.nwts.ftapp.gui.DirectoryTable; -import gov.nara.nwts.ftapp.importer.ImporterRegistry; -/** - * Driver for the File Analyzer GUI loading image-specific rules but not NARA specific rules. - * @author TBrady - * - */ -public class DSpaceFileAnalyzer extends DirectoryTable { - - public DSpaceFileAnalyzer(File f, boolean modifyAllowed) { - super(f, modifyAllowed); - } - - @Override protected ActionRegistry getActionRegistry() { - return new DSpaceActionRegistry(this, true); - } - - @Override protected ImporterRegistry getImporterRegistry() { - return new DSpaceImporterRegistry(this); - } - public static void main(String[] args) { - if (args.length > 0) - new DSpaceFileAnalyzer(new File(args[0]), false); - else - new DSpaceFileAnalyzer(null, false); - } - -} +package edu.georgetown.library.fileAnalyzer; + +import java.io.File; + +import edu.georgetown.library.fileAnalyzer.filetest.DSpaceActionRegistry; +import edu.georgetown.library.fileAnalyzer.importer.DSpaceImporterRegistry; + +import gov.nara.nwts.ftapp.filetest.ActionRegistry; +import gov.nara.nwts.ftapp.gui.DirectoryTable; +import gov.nara.nwts.ftapp.importer.ImporterRegistry; +/** + * Driver for the File Analyzer GUI loading image-specific rules but not NARA specific rules. 
+ * @author TBrady + * + */ +public class DSpaceFileAnalyzer extends DirectoryTable { + + public DSpaceFileAnalyzer(File f, boolean modifyAllowed) { + super(f, modifyAllowed); + } + + @Override protected ActionRegistry getActionRegistry() { + return new DSpaceActionRegistry(this, modifyAllowed); + } + + @Override protected ImporterRegistry getImporterRegistry() { + return new DSpaceImporterRegistry(this); + } + public static void main(String[] args) { + if (args.length > 0) + new DSpaceFileAnalyzer(new File(args[0]), false); + else + new DSpaceFileAnalyzer(null, false); + } + +} diff --git a/dspace/src/main/edu/georgetown/library/fileAnalyzer/filetest/DSpaceActionRegistry.java b/dspace/src/main/edu/georgetown/library/fileAnalyzer/filetest/DSpaceActionRegistry.java index fd605cf..eb370d8 100644 --- a/dspace/src/main/edu/georgetown/library/fileAnalyzer/filetest/DSpaceActionRegistry.java +++ b/dspace/src/main/edu/georgetown/library/fileAnalyzer/filetest/DSpaceActionRegistry.java @@ -1,21 +1,24 @@ -package edu.georgetown.library.fileAnalyzer.filetest; - -import gov.nara.nwts.ftapp.FTDriver; -import gov.nara.nwts.ftapp.filetest.ActionRegistry; - -/** - * Initialize the File Analzyer with generic image processing rules (but not NARA specific business rules) - * @author TBrady - * - */ -public class DSpaceActionRegistry extends ActionRegistry { - - private static final long serialVersionUID = 1L; - - public DSpaceActionRegistry(FTDriver dt, boolean modifyAllowed) { - super(dt, modifyAllowed); - add(new IngestInventory(dt)); - add(new IngestValidate(dt)); - } - -} +package edu.georgetown.library.fileAnalyzer.filetest; + +import gov.nara.nwts.ftapp.FTDriver; +import gov.nara.nwts.ftapp.filetest.ActionRegistry; + +/** + * Initialize the File Analyzer with generic image processing rules (but not NARA specific business rules) + * @author TBrady + * + */ +public class DSpaceActionRegistry extends ActionRegistry { + + private static final long serialVersionUID = 1L; + + public 
DSpaceActionRegistry(FTDriver dt, boolean modifyAllowed) { + super(dt, modifyAllowed); + add(new IngestInventory(dt)); + add(new IngestValidate(dt)); + add(new DSpaceDAT(dt)); + add(new ProquestToIngest(dt)); + add(new ProquestQC(dt)); + } + +} diff --git a/dspace/src/main/edu/georgetown/library/fileAnalyzer/filetest/DSpaceDAT.java b/dspace/src/main/edu/georgetown/library/fileAnalyzer/filetest/DSpaceDAT.java new file mode 100644 index 0000000..6180613 --- /dev/null +++ b/dspace/src/main/edu/georgetown/library/fileAnalyzer/filetest/DSpaceDAT.java @@ -0,0 +1,134 @@ +package edu.georgetown.library.fileAnalyzer.filetest; + +import gov.nara.nwts.ftapp.FTDriver; +import gov.nara.nwts.ftapp.filetest.DefaultFileTest; +import gov.nara.nwts.ftapp.filter.DefaultFileTestFilter; +import gov.nara.nwts.ftapp.stats.Stats; +import gov.nara.nwts.ftapp.stats.StatsGenerator; +import gov.nara.nwts.ftapp.stats.StatsItem; +import gov.nara.nwts.ftapp.stats.StatsItemConfig; +import gov.nara.nwts.ftapp.stats.StatsItemEnum; + +import java.io.BufferedReader; +import java.io.File; +import java.io.FileReader; +import java.io.IOException; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +/** + * Extract all metadata fields from a TIF or JPG using categorized tag definitions. 
+ * @author TBrady + * + */ +public class DSpaceDAT extends DefaultFileTest { + + public static enum DSpaceDATStatsItems implements StatsItemEnum { + FileName(StatsItem.makeStringStatsItem("File name", 220)), + Start(StatsItem.makeStringStatsItem("Start", 100)), + End(StatsItem.makeStringStatsItem("End", 100)), + Submit(StatsItem.makeIntStatsItem("Submissions")), + AllItems(StatsItem.makeIntStatsItem("All Items")), + Search(StatsItem.makeIntStatsItem("Search")), + Browse(StatsItem.makeIntStatsItem("Browse")), + ViewHome(StatsItem.makeIntStatsItem("View Home")), + ViewItem(StatsItem.makeIntStatsItem("View Item")), + AvgViews(StatsItem.makeIntStatsItem("Avg Views")), + ; + + StatsItem si; + DSpaceDATStatsItems(StatsItem si) {this.si=si;} + public StatsItem si() {return si;} + } + + public static enum Generator implements StatsGenerator { + INSTANCE; + public Stats create(String key) {return new Stats(details, key);} + } + public static StatsItemConfig details = StatsItemConfig.create(DSpaceDATStatsItems.class); + + public DSpaceDAT(FTDriver dt) { + super(dt); + } + + public String toString() { + return "DSpaceDAT"; + } + public String getKey(File f) { + return f.getName(); + } + + public String getShortName(){return "DSpace DAT";} + + public Object fileTest(File f) { + Stats stats = getStats(f); + Pattern p1 = Pattern.compile("^(start_date|end_date)=(\\d\\d)/(\\d\\d)/(\\d\\d\\d\\d)$"); + Pattern p2 = Pattern.compile("^(archive.All Items|action.search|action.browse|action.submission_complete|action.view_community_list|action.view_item|avg_item_views)=(\\d+)$"); + try { + BufferedReader br = new BufferedReader(new FileReader(f)); + for(String line = br.readLine(); line!=null; line=br.readLine()) { + Matcher m=p1.matcher(line); + if (m.matches()) { + String lkey = m.group(1); + String val = m.group(4) + "-" + m.group(3) + "-" +m.group(2); + + if (lkey.equals("start_date")) { + stats.setVal(DSpaceDATStatsItems.Start, val); + } else if (lkey.equals("end_date")) { + 
stats.setVal(DSpaceDATStatsItems.End, val); + } + } + m=p2.matcher(line); + if (m.matches()) { + String lkey = m.group(1); + int val = Integer.parseInt(m.group(2)); + + if (lkey.equals("archive.All Items")) { + stats.setVal(DSpaceDATStatsItems.AllItems, val); + } else if (lkey.equals("action.search")) { + stats.setVal(DSpaceDATStatsItems.Search, val); + } else if (lkey.equals("action.browse")) { + stats.setVal(DSpaceDATStatsItems.Browse, val); + } else if (lkey.equals("action.submission_complete")) { + stats.setVal(DSpaceDATStatsItems.Submit, val); + } else if (lkey.equals("action.view_community_list")) { + stats.setVal(DSpaceDATStatsItems.ViewHome, val); + } else if (lkey.equals("action.view_item")) { + stats.setVal(DSpaceDATStatsItems.ViewItem, val); + } else if (lkey.equals("avg_item_views")) { + stats.setVal(DSpaceDATStatsItems.AvgViews, val); + } + } + } + br.close(); + } catch (IOException e) { + e.printStackTrace(); + } + return ""; + } + + public Stats createStats(String key){ + return Generator.INSTANCE.create(key); + } + public StatsItemConfig getStatsDetails() { + return details; + } + + public String getDescription() { + return "Analyzes DSpace DAT stats files"; + } + + public class DATFilter extends DefaultFileTestFilter { + public String getSuffix() { + return ".dat"; + } + public String getPrefix() { + return "dspace-log-monthly"; + } + + } + public void initFilters() { + filters.add(new DATFilter()); + } + +} diff --git a/dspace/src/main/edu/georgetown/library/fileAnalyzer/filetest/IngestInventory.java b/dspace/src/main/edu/georgetown/library/fileAnalyzer/filetest/IngestInventory.java index 40e7ef1..12513cc 100644 --- a/dspace/src/main/edu/georgetown/library/fileAnalyzer/filetest/IngestInventory.java +++ b/dspace/src/main/edu/georgetown/library/fileAnalyzer/filetest/IngestInventory.java @@ -8,6 +8,7 @@ import gov.nara.nwts.ftapp.FTDriver; import gov.nara.nwts.ftapp.filetest.DefaultFileTest; import gov.nara.nwts.ftapp.filetest.FileTest; +import 
gov.nara.nwts.ftapp.filter.DefaultFileTestFilter; import gov.nara.nwts.ftapp.filter.Jp2FileTestFilter; import gov.nara.nwts.ftapp.filter.JpegFileTestFilter; import gov.nara.nwts.ftapp.filter.PdfFileTestFilter; @@ -22,10 +23,11 @@ public class IngestInventory extends DefaultFileTest { - private static enum InventoryStatsItems implements StatsItemEnum { - Key(StatsItem.makeStringStatsItem("Key")), - File(StatsItem.makeStringStatsItem("File")), - ThumbFile(StatsItem.makeStringStatsItem("Thumb File")) + public static enum InventoryStatsItems implements StatsItemEnum { + Key(StatsItem.makeStringStatsItem("Key/Folder Name")), + File(StatsItem.makeStringStatsItem("Item File")), + ThumbFile(StatsItem.makeStringStatsItem("Thumb File")), + LicenseFile(StatsItem.makeStringStatsItem("License File")) ; StatsItem si; @@ -72,30 +74,30 @@ public Object compute(File f, FileTest fileTest) { "dc.relation.isreplacedby", "dc.relation.uri", "dc.rights", "dc.rights.uri", "dc.source", "dc.source.uri", "dc.subject", "dc.subject.ddc", "dc.subject.lcc", "dc.subject.lcsh", - "dc.subject.mesh", "dc.title", "dc.title.alternative", "dc.type" }; + "dc.subject.mesh", "dc.title", "dc.title.alternative", "dc.type"}; + + public String[] getMETA() {return META;} + NumberFormat nf; + public static final int COUNT = 8; public IngestInventory(FTDriver dt) { super(dt); nf = NumberFormat.getNumberInstance(); nf.setMinimumIntegerDigits(5); nf.setGroupingUsed(false); - this.ftprops.add(new FTPropEnum(this, "metadata 1", "m1", - "field to be populated for each item found", META, "dc.title")); - this.ftprops.add(new FTPropEnum(this, "metadata 2", "m2", - "field to be populated for each item found", META, "dc.date.creator")); - this.ftprops.add(new FTPropEnum(this, "metadata 3", "m3", - "field to be populated for each item found", META, "NA")); - this.ftprops.add(new FTPropEnum(this, "metadata 4", "m4", - "field to be populated for each item found", META, "NA")); - this.ftprops.add(new FTPropEnum(this, 
"metadata 5", "m5", - "field to be populated for each item found", META, "NA")); - this.ftprops.add(new FTPropEnum(this, "metadata 6", "m6", - "field to be populated for each item found", META, "NA")); - this.ftprops.add(new FTPropEnum(this, "metadata 7", "m7", - "field to be populated for each item found", META, "NA")); - this.ftprops.add(new FTPropEnum(this, "metadata 8", "m8", - "field to be populated for each item found", META, "NA")); + for(int i=1; i<=1; i++) { + this.ftprops.add(new FTPropEnum(dt, this.getClass().getName(), "metadata "+i, "m"+i, + "field to be populated for each item found", getMETA(), "dc.title")); + } + for(int i=2; i<=2; i++) { + this.ftprops.add(new FTPropEnum(dt, this.getClass().getName(), "metadata "+i, "m"+i, + "field to be populated for each item found", getMETA(), "dc.date.created")); + } + for(int i=3; i<=COUNT; i++) { + this.ftprops.add(new FTPropEnum(dt, this.getClass().getName(), "metadata "+i, "m"+i, + "field to be populated for each item found", getMETA(), "NA")); + } } public String getDescription() { @@ -136,6 +138,7 @@ public void init() { if (prop.getValue().equals("NA")) continue; details.addStatsItem(prop.getValue(), StatsItem.makeStringStatsItem(prop.getValue().toString())); } + count = 0; } public boolean isTestable(File f) { @@ -153,6 +156,7 @@ public void initFilters() { filters.add(new TiffFileTestFilter()); filters.add(new JpegFileTestFilter()); filters.add(new Jp2FileTestFilter()); + filters.add(new DefaultFileTestFilter()); } public Stats createStats(String key){ diff --git a/dspace/src/main/edu/georgetown/library/fileAnalyzer/filetest/IngestValidate.java b/dspace/src/main/edu/georgetown/library/fileAnalyzer/filetest/IngestValidate.java index ffe8a12..c671c14 100644 --- a/dspace/src/main/edu/georgetown/library/fileAnalyzer/filetest/IngestValidate.java +++ b/dspace/src/main/edu/georgetown/library/fileAnalyzer/filetest/IngestValidate.java @@ -1,298 +1,413 @@ -package edu.georgetown.library.fileAnalyzer.filetest; - 
-import gov.nara.nwts.ftapp.FTDriver; -import gov.nara.nwts.ftapp.filetest.DefaultFileTest; -import gov.nara.nwts.ftapp.stats.Stats; -import gov.nara.nwts.ftapp.stats.StatsGenerator; -import gov.nara.nwts.ftapp.stats.StatsItem; -import gov.nara.nwts.ftapp.stats.StatsItemConfig; -import gov.nara.nwts.ftapp.stats.StatsItemEnum; -import gov.nara.nwts.ftapp.util.XMLUtil; - -import edu.georgetown.library.fileAnalyzer.filetest.IngestValidate.Generator.DSpaceStats; - -import java.io.BufferedReader; -import java.io.File; -import java.io.FileNotFoundException; -import java.io.FileReader; -import java.io.IOException; -import java.util.ArrayList; -import java.util.regex.Matcher; -import java.util.regex.Pattern; - -import org.w3c.dom.Document; -import org.w3c.dom.Element; -import org.w3c.dom.NodeList; -import org.xml.sax.SAXException; - -/** - * Extract all metadata fields from a TIF or JPG using categorized tag defintions. - * @author TBrady - * - */ -public class IngestValidate extends DefaultFileTest { - - public enum OVERALL_STAT { - VALID, - INVALID, - EXTRA - } - public enum CONTENTS { - ORIGINAL, - THUMBNAIL - } - public enum CONTENTS_STAT { - MISSING, - VALID, - OTHER, - TAB_MISSING, - UNLISTED, - INVALID - } - public enum DC_STAT { - MISSING, - VALID, - INVALID - } - public enum DATE_STAT { - VALID, - INVALID - } - - public static enum DSpaceStatsItems implements StatsItemEnum { - ItemFolder(StatsItem.makeStringStatsItem("Item Folder", 200)), - OverallStat(StatsItem.makeEnumStatsItem(OVERALL_STAT.class, "Status", OVERALL_STAT.INVALID).setWidth(80)), - NumFiles(StatsItem.makeIntStatsItem("Num Items").setWidth(80)), - ContentsStat(StatsItem.makeEnumStatsItem(CONTENTS_STAT.class, "Contents File", CONTENTS_STAT.MISSING).setWidth(80)), - ContentFileCount(StatsItem.makeIntStatsItem("Num ContentF Files").setWidth(80)), - DublinCoreStat(StatsItem.makeEnumStatsItem(DC_STAT.class, "Dublin Core File", DC_STAT.MISSING).setWidth(80)), - 
ItemTitle(StatsItem.makeStringStatsItem("Item Title", 150)), - Author(StatsItem.makeStringStatsItem("Author", 150)), - Date(StatsItem.makeStringStatsItem("Date", 80)), - Language(StatsItem.makeStringStatsItem("Language", 80)), - Subject(StatsItem.makeStringStatsItem("Subject", 150)), - Format(StatsItem.makeStringStatsItem("Format", 150)), - Publisher(StatsItem.makeStringStatsItem("Publisher", 150)), - PrimaryBitstream(StatsItem.makeStringStatsItem("PrimaryBitstream", 150)), - Thumbnail(StatsItem.makeStringStatsItem("Thumbnail", 150)), - License(StatsItem.makeStringStatsItem("License", 150)), - Text(StatsItem.makeStringStatsItem("TextBitstream", 150)), - Other(StatsItem.makeStringStatsItem("OtherBitstream", 150)), - ; - - StatsItem si; - DSpaceStatsItems(StatsItem si) {this.si=si;} - public StatsItem si() {return si;} - } - - public static enum Generator implements StatsGenerator { - INSTANCE; - public class DSpaceStats extends Stats { - - public DSpaceStats(String key) { - super(details, key); - } - - public void setInfo(DSpaceInfo info) { - setVal(DSpaceStatsItems.OverallStat, info.overall_stat); - setVal(DSpaceStatsItems.NumFiles, info.files.length); - setVal(DSpaceStatsItems.ContentsStat, info.contents_stat); - setVal(DSpaceStatsItems.ContentFileCount, info.contentsList.size()); - setVal(DSpaceStatsItems.DublinCoreStat, info.dc_stat); - setVal(DSpaceStatsItems.ItemTitle, info.title); - setVal(DSpaceStatsItems.Author, info.author); - setVal(DSpaceStatsItems.Date, info.date); - setVal(DSpaceStatsItems.Language, info.language); - setVal(DSpaceStatsItems.Subject,info.subject); - setVal(DSpaceStatsItems.Format,info.format); - setVal(DSpaceStatsItems.Publisher,info.publisher); - setVal(DSpaceStatsItems.PrimaryBitstream,info.primary); - setVal(DSpaceStatsItems.Thumbnail,info.thumbnail); - setVal(DSpaceStatsItems.License,info.license); - setVal(DSpaceStatsItems.Text,info.text); - setVal(DSpaceStatsItems.Other,info.other); - - } - - } - public DSpaceStats create(String 
key) {return new DSpaceStats(key);} - } - public static StatsItemConfig details = StatsItemConfig.create(DSpaceStatsItems.class); - - public class DSpaceInfo { - public File contents = null; - public File dc = null; - public File[] files; - public ArrayList contentsList; - - public Document d; - - public CONTENTS_STAT contents_stat = CONTENTS_STAT.MISSING; - public DC_STAT dc_stat = DC_STAT.MISSING; - public OVERALL_STAT overall_stat; - - public String title =""; - public String author =""; - public String date =""; - public String language =""; - public String subject =""; - public String format =""; - public String publisher =""; - public String primary =""; - public String thumbnail =""; - public String license =""; - public String text =""; - public String other =""; - - public DSpaceInfo(File f) { - files = f.listFiles(); - contentsList = new ArrayList(); - - for(File file : files) { - if (file.getName().equals("contents")) { - contents = file; - contents_stat = CONTENTS_STAT.VALID; - readContents(file); - } else if (file.getName().equals("dublin_core.xml")) { - dc = file; - try { - d = XMLUtil.db.parse(dc); - dc_stat = DC_STAT.VALID; - Element root = d.getDocumentElement(); - title = getText(root, "title"); - author = getText(root, "creator"); - date = getText(root, "date"); - language = getText(root, "language"); - subject = getText(root, "subject"); - format = getText(root, "format"); - publisher = getText(root, "publisher"); - } catch (SAXException e) { - dc_stat = DC_STAT.INVALID; - e.printStackTrace(); - } catch (IOException e) { - dc_stat = DC_STAT.INVALID; - e.printStackTrace(); - } - } - } - for(File file : files) { - String name = file.getName(); - if (name.equals("contents")) continue; - if (name.equals("dublin_core.xml")) continue; - if (name.equals("Thumbs.db")) continue; - boolean found = false; - for(String s: contentsList) { - if (name.equals(s)) found = true; - } - if (!found) { - if (contents_stat == CONTENTS_STAT.VALID) { - contents_stat = 
CONTENTS_STAT.UNLISTED; - } - other += name + "; "; - } - } - - if (contents_stat == CONTENTS_STAT.VALID && dc_stat == DC_STAT.VALID) { - if (license.equals("") && text.equals("")) { - overall_stat = OVERALL_STAT.VALID; - } else { - overall_stat = OVERALL_STAT.EXTRA; - } - } else if (contents_stat == CONTENTS_STAT.OTHER && dc_stat == DC_STAT.VALID) { - overall_stat = OVERALL_STAT.EXTRA; - } else { - overall_stat = OVERALL_STAT.INVALID; - } - - } - public String getText(Element root, String tag) { - NodeList nl = root.getElementsByTagName("dcvalue"); - for(int i=0; ivals = info.metadata.get(tag); + String sep = ""; + for(String val: vals) { + appendKeyVal(details.getByKey(tag), sep+val); + sep=",\n"; + } + } + } + + } + public DSpaceStats create(String key) {return new DSpaceStats(key);} + } + public static StatsItemConfig details = StatsItemConfig.create(DSpaceStatsItems.class); + public void init() { + details = StatsItemConfig.create(DSpaceStatsItems.class); + for(FTProp prop: ftprops) { + if (prop.getValue().equals("NA")) continue; + details.addStatsItem(prop.getValue(), StatsItem.makeStringStatsItem(prop.getValue().toString())); + } + } + + public class DSpaceInfo { + public File contents = null; + public File dc = null; + public File[] files; + public int fileCount = 0; + public ArrayList contentsList; + public HashMap> metadata; + + public CONTENTS_STAT contents_stat = CONTENTS_STAT.MISSING; + public DC_STAT dc_stat = DC_STAT.MISSING; + public OTHER_STAT other_stat = OTHER_STAT.NA; + public String otherSchemas = ""; + public THUMBNAIL_STAT thumbnail_stat = THUMBNAIL_STAT.NA; + + public String primary =""; + public String thumbnail =""; + public String license =""; + public String text =""; + public String other =""; + + public DSpaceInfo(File f) { + files = f.listFiles(); + contentsList = new ArrayList(); + metadata = new HashMap>(); + + for(File file : files) { + if (!file.isDirectory()) fileCount++; + if (file.getName().equals("contents")) { + contents = 
file; + contents_stat = CONTENTS_STAT.VALID; + readContents(file); + } else if (file.getName().equals("dublin_core.xml")) { + dc = file; + try { + Document d = XMLUtil.db.parse(dc); + if (d!=null) { + loadMetadata(d); + } + dc_stat = DC_STAT.VALID; + } catch (SAXException e) { + dc_stat = DC_STAT.INVALID; + } catch (IOException e) { + dc_stat = DC_STAT.INVALID; + } + } else { + Matcher m = pOther.matcher(file.getName()); + if (!m.matches()) continue; + otherSchemas += m.group(1) + " "; + if (m.matches()) { + try { + Document d = XMLUtil.db.parse(file); + if (d!=null) { + loadMetadata(d); + } + if (other_stat != OTHER_STAT.INVALID) { + other_stat = OTHER_STAT.VALID; + } + } catch (SAXException e) { + other_stat = OTHER_STAT.INVALID; + } catch (IOException e) { + other_stat = OTHER_STAT.INVALID; + } + } + } + } + for(File file : files) { + if (isExpectedFile(file)) continue; + String name = file.getName(); + boolean found = false; + for(String s: contentsList) { + if (name.equals(s)) found = true; + } + if (!found) { + if (contents_stat == CONTENTS_STAT.VALID) { + contents_stat = CONTENTS_STAT.UNLISTED; + } + other += name + "; "; + } + } + + if (thumbnail.isEmpty()) { + } else if (thumbnail.equals(primary + ".jpg")) { + thumbnail_stat = THUMBNAIL_STAT.VALID; + } else { + thumbnail_stat = THUMBNAIL_STAT.INVALID_FILENAME; + } + + } + + + public OVERALL_STAT getOverallStat() { + if (fileCount == 0) + return OVERALL_STAT.SKIP; + if (dc_stat != DC_STAT.VALID) + return OVERALL_STAT.INVALID; + if (other_stat == OTHER_STAT.INVALID) + return OVERALL_STAT.INVALID; + if (thumbnail_stat == THUMBNAIL_STAT.INVALID_FILENAME) + return OVERALL_STAT.INVALID; + if (contents_stat == CONTENTS_STAT.VALID) + return OVERALL_STAT.VALID; + else if (contents_stat == CONTENTS_STAT.OTHER) + return OVERALL_STAT.EXTRA; + return OVERALL_STAT.INVALID; + } + + public void loadMetadata(Document d) { + String schema = d.getDocumentElement().getAttribute("schema"); + NodeList nl = 
d.getDocumentElement().getElementsByTagName("dcvalue"); + for(int i=0; i vals = metadata.get(tag); + if (vals == null) { + vals = new ArrayList(); + metadata.put(tag, vals); + } + vals.add(text); + } + } + + public String getMetadataKey(String s, String e, String q) { + StringBuffer buf = new StringBuffer(); + buf.append(s.isEmpty() ? "dc" : s); + buf.append("."); + buf.append(e); + + if (q == null) { + } else if (q.isEmpty()) { + } else if (q.equals("none")) { + } else { + buf.append("."); + buf.append(q); + } + return buf.toString(); + } + + public String getMetadataKey(String s, String e) { + return getMetadataKey(s, e, ""); + } + + + public String getMetadataKey(String s, Element elem) { + return getMetadataKey(s, elem.getAttribute("element"), elem.getAttribute("qualifier")); + } + + public String getTextVal(String key, String def) { + ArrayList vals = metadata.get(key); + if (vals == null) return def; + if (vals.size() == 1) return vals.get(0); + return def; + } + + public ArrayList getTextVals(String key) { + ArrayList vals = metadata.get(key); + if (vals == null) return new ArrayList(0); + return vals; + } + + public void readContents(File f) { + try { + BufferedReader br = new BufferedReader(new FileReader(f)); + for(String s=br.readLine(); s!=null;s=br.readLine()) { + if (s.trim().equals("")) continue; + Pattern p = Pattern.compile("^([^\\t]+)\\t(.+)\\s*$"); + Matcher m = p.matcher(s); + if (m.matches()) { + String fname = m.group(1); + contentsList.add(fname); + String type = m.group(2).trim(); + if (type.equals("bundle:ORIGINAL")) { + primary = fname; + } else if (type.equals("bundle:THUMBNAIL")) { + thumbnail = fname; + } else if (type.equals("bundle:LICENSE")) { + license = fname; + } else if (type.equals("bundle:TEXT")) { + text = fname; + } else { + other += type + "; "; + contents_stat = CONTENTS_STAT.OTHER; + } + } else if (s.indexOf("\t", 0) == -1) { + contents_stat = CONTENTS_STAT.TAB_MISSING; + other = "[" + s + "]"; + } else { + contents_stat = 
CONTENTS_STAT.INVALID; + other = "[" + s + "]"; + } + } + } catch (FileNotFoundException e) { + e.printStackTrace(); + } catch (IOException e) { + e.printStackTrace(); + } + + } + + } + + public String[] getMETA() {return IngestInventory.META;} + public static final int COUNT = 8; + + + long counter = 1000000; + public IngestValidate(FTDriver dt) { + super(dt); + for(int i=1; i<=1; i++) { + this.ftprops.add(new FTPropEnum(dt, this.getClass().getName(), "metadata "+i, "m"+i, + "field to display for each item found", getMETA(), "dc.title")); + } + for(int i=2; i<=2; i++) { + this.ftprops.add(new FTPropEnum(dt, this.getClass().getName(), "metadata "+i, "m"+i, + "field to display for each item found", getMETA(), "dc.date.created")); + } + for(int i=3; i<=COUNT; i++) { + this.ftprops.add(new FTPropEnum(dt, this.getClass().getName(), "metadata "+i, "m"+i, + "field to display for each item found", getMETA(), "NA")); + } + } + + public String toString() { + return "Ingest Validate"; + } + public String getKey(File f) { + return f.getName(); + } + + public String getShortName(){return "Ingest Validate";} + + public boolean isExpectedFile(File file) { + String name = file.getName(); + if (name.equals("contents")) return true; + if (name.equals("dublin_core.xml")) return true; + if (name.equals("Thumbs.db")) return true; + if (pOther.matcher(name).matches()) return true; + return false; + } + public Object fileTest(File f) { + DSpaceStats stats = (DSpaceStats)this.getStats(getKey(f)); + DSpaceInfo info = new DSpaceInfo(f); + stats.setInfo(info); + if (info.contents_stat != CONTENTS_STAT.VALID) return info.contents_stat; + if (info.dc_stat != DC_STAT.VALID) return info.dc_stat; + return info.files.length; + } + + public DSpaceStats createStats(String key){ + return Generator.INSTANCE.create(key); + } + public StatsItemConfig getStatsDetails() { + return details; + } + + public boolean isTestFiles() {return false;} + public boolean isTestDirectory() {return true;} + + public 
String getDescription() { + return "Analyzes an ingest folder to be uploaded into DSpace.\n\n" + + "This rule should be pointed at a folder of subfolders where each sub-folder conforms to DSpace's ingest folder conventions.\n" + + "- An item file should be present\n" + + "- A thumbnail file may be present. If present, it must match the item file name + '.jpg'\n" + + "- An optional license file or text file may be present\n" + + "- A dublin core metadata file 'dublin_core.xml' is required. This file will be scanned for required metadata.\n" + + "- A contents file 'contents' is required. This file must contain a list of all other required files.\n" + + ""; + } + +} diff --git a/dspace/src/main/edu/georgetown/library/fileAnalyzer/filetest/ProquestQC.java b/dspace/src/main/edu/georgetown/library/fileAnalyzer/filetest/ProquestQC.java new file mode 100644 index 0000000..a923292 --- /dev/null +++ b/dspace/src/main/edu/georgetown/library/fileAnalyzer/filetest/ProquestQC.java @@ -0,0 +1,359 @@ +package edu.georgetown.library.fileAnalyzer.filetest; + +import java.io.File; +import java.io.IOException; +import java.text.NumberFormat; +import java.util.Iterator; +import java.util.TreeSet; +import java.util.Vector; + +import javax.xml.namespace.NamespaceContext; +import javax.xml.namespace.QName; +import javax.xml.xpath.XPath; +import javax.xml.xpath.XPathConstants; +import javax.xml.xpath.XPathExpression; +import javax.xml.xpath.XPathExpressionException; + +import org.w3c.dom.Document; +import org.w3c.dom.Element; +import org.w3c.dom.Node; +import org.w3c.dom.NodeList; +import org.w3c.dom.Text; +import org.xml.sax.SAXException; + +import gov.nara.nwts.ftapp.FTDriver; +import gov.nara.nwts.ftapp.filetest.DefaultFileTest; +import gov.nara.nwts.ftapp.filter.XmlFilter; +import gov.nara.nwts.ftapp.stats.Stats; +import gov.nara.nwts.ftapp.stats.StatsGenerator; +import gov.nara.nwts.ftapp.stats.StatsItem; +import gov.nara.nwts.ftapp.stats.StatsItemConfig; +import 
gov.nara.nwts.ftapp.stats.StatsItemEnum; +import gov.nara.nwts.ftapp.util.XMLUtil; + +public class ProquestQC extends DefaultFileTest { + public static NumberFormat nf = NumberFormat.getNumberInstance(); + static { + nf.setMinimumIntegerDigits(4); + nf.setGroupingUsed(false); + } + + public enum OVERALL_STAT { + INIT, + PASS, + FAIL, + REVIEW + } + + public enum TYPE { + ProQuest, + DublinCore, + MarcXMLCollection, + MarcXMLRecord, + Other + } + + public static enum ProquestQCStatsItems implements StatsItemEnum { + Key(StatsItem.makeStringStatsItem("Path").setWidth(300)), + File(StatsItem.makeStringStatsItem("File").setWidth(200)), + OverallStat(StatsItem.makeEnumStatsItem(OVERALL_STAT.class, "Status", OVERALL_STAT.INIT).setWidth(40)), + Type(StatsItem.makeEnumStatsItem(TYPE.class, "Type", TYPE.Other).setWidth(120)), + Creator(StatsItem.makeStringStatsItem("Creator",200)), + Suffix(StatsItem.makeStringStatsItem("Suffix",50)), + Title(StatsItem.makeStringStatsItem("Title",350)), + TitleAlt(StatsItem.makeStringStatsItem("Alt Title",150)), + URL(StatsItem.makeStringStatsItem("URL",250)), + Created(StatsItem.makeStringStatsItem("Created")), + Pages(StatsItem.makeStringStatsItem("Pages")), + Degree(StatsItem.makeStringStatsItem("Degree")), + College(StatsItem.makeStringStatsItem("College")), + Dept(StatsItem.makeStringStatsItem("Dept")), + Advisor(StatsItem.makeStringStatsItem("Advisor")), + CatCode(StatsItem.makeStringStatsItem("CatCode")), + CatStr(StatsItem.makeStringStatsItem("CatString")), + CatDesc(StatsItem.makeStringStatsItem("CatDesc")), + Keyword(StatsItem.makeStringStatsItem("Keyword")), + Abstract(StatsItem.makeStringStatsItem("Abstract")), + ; + + StatsItem si; + ProquestQCStatsItems(StatsItem si) {this.si=si;} + public StatsItem si() {return si;} + } + + public static enum Generator implements StatsGenerator { + INSTANCE; + public Stats create(String key) { + return new Stats(details, key); + } + } + static StatsItemConfig details = 
StatsItemConfig.create(ProquestQCStatsItems.class); + + public enum XPATH { + pq_title("//DISS_title"), + dc_title("//dcvalue[@element='title']"), + m_title(".//marc:datafield[@tag='245']/marc:subfield[@code='a']"), + + dc_url("//dcvalue[@element='identifier'][@qualifier='uri']"), + m_url(".//marc:datafield[@tag='856']/marc:subfield[@code='u']"), + + pq_atitle("//DISS_supp_title"), + dc_atitle("//dcvalue[@element='title'][@qualifier='alternative']"), + m_atitle(".//marc:datafield[@tag='246']/marc:subfield[@code='a']"), + + pq_creator("concat(//DISS_author/DISS_name/DISS_surname/text(),', ',//DISS_author/DISS_name/DISS_fname/text(),' ',//DISS_author/DISS_name/DISS_middle/text(),' ',//DISS_author/DISS_name/DISS_suffix/text())", XPathConstants.STRING), + dc_creator("//dcvalue[@element='creator']"), + m_creator(".//marc:datafield[@tag='100']/marc:subfield[@code='a']"), + + pq_suffix("//DISS_author/DISS_name/DISS_suffix"), + m_suffix(".//marc:datafield[@tag='100']/marc:subfield[@code='c']"), + + pq_created("//DISS_comp_date"), + dc_created("//dcvalue[@element='date'][@qualifier='created']"), + m_created(".//marc:datafield[@tag='264']/marc:subfield[@code='c']"), + + pq_pages("//DISS_description/@page_count"), + dc_pages("//dcvalue[@element='format'][@qualifier='extent']"), + m_pages(".//marc:datafield[@tag='300']/marc:subfield[@code='a']"), + + pq_degree("//DISS_degree"), + dc_degree("//dcvalue[@element='description'][@qualifier='none']"), + m_degree(".//marc:datafield[@tag='502']/marc:subfield[@code='b']"), + + pq_college("//DISS_inst_name"), + dc_college("//dcvalue[@element='source'][@qualifier='none'][1]"), + m_college("substring-before(.//marc:datafield[@tag='502']/marc:subfield[@code='c'], ', ')", XPathConstants.STRING), + + pq_dept("//DISS_inst_contact"), + dc_dept("//dcvalue[@element='source'][@qualifier='none'][2]"), + m_dept("substring-after(.//marc:datafield[@tag='502']/marc:subfield[@code='c'],', ')", XPathConstants.STRING), + + 
pq_advisor("//DISS_advisor/DISS_name"), + dc_advisor("//dcvalue[@element='contributor'][@qualifier='advisor']"), + m_advisor(".//marc:datafield[@tag='700']/marc:subfield[@code='a']"), + + pq_catcode("//DISS_cat_code"), + + dc_catstr("//dcvalue[@element='subject'][@qualifier='lcsh']"), + m_catstr(".//marc:datafield[@tag='650'][@ind2='0']/marc:subfield"), + + pq_catdesc("//DISS_cat_desc"), + dc_catdesc("//dcvalue[@element='subject'][@qualifier='other']"), + m_catdesc(".//marc:datafield[@tag='650'][@ind2='4']/marc:subfield"), + + pq_catkey("//DISS_keyword"), + dc_catkey("//dcvalue[@element='subject'][@qualifier='none']"), + m_catkey(".//marc:datafield[@tag='653']/marc:subfield[@code='a']"), + + pq_abs("substring(//DISS_abstract/DISS_para,1,50)", XPathConstants.STRING), + dc_abs("substring(//dcvalue[@element='description'][@qualifier='abstract'],1,50)", XPathConstants.STRING), + m_abs("substring(.//marc:datafield[@tag='520']/marc:subfield[@code='a'],1,50)", XPathConstants.STRING), + + ; + + XPathExpression ex; + QName rt; + XPATH(String s) { + this(s, XPathConstants.NODESET); + } + XPATH(String s, QName rt) { + try { + this.rt = rt; + XPath xpath = XMLUtil.xf.newXPath(); + + xpath.setNamespaceContext(new NamespaceContext(){ + public String getNamespaceURI(String prefix) { + if (prefix.equals("marc")) return "http://www.loc.gov/MARC21/slim"; + return null; + } + + public String getPrefix(String namespaceURI) { + if (namespaceURI.equals("http://www.loc.gov/MARC21/slim")) return "marc"; + return null; + } + + @SuppressWarnings("rawtypes") + public Iterator getPrefixes(String namespaceURI) { + Vector v = new Vector(); + String s = getPrefix(namespaceURI); + if (s != null) { + v.add(s); + } + return v.iterator(); + } + }); + ex = xpath.compile(s); + } catch (XPathExpressionException e) { + e.printStackTrace(); + } + } + + public Object getObject(Node d) { + StringBuffer buf = new StringBuffer(); + try { + Object obj = ex.evaluate(d, rt); + if (rt == XPathConstants.STRING) { 
+ buf.append(obj.toString()); + } else { + NodeList ns = (NodeList)obj; + for(int i=0; i 0) buf.append("; "); + buf.append(ns.item(i).getTextContent().trim().replaceAll("\\s\\s+", " ")); + } + } + + } catch (XPathExpressionException e) { + e.printStackTrace(); + } + return buf.toString(); + } + + public Object setVal(ProquestQCStatsItems key, Stats stats, Node d) { + if (d == null) return null; + Object obj = getObject(d); + if (obj == null) return null; + obj = prepVal(obj); + if (obj == null) return null; + stats.setVal(key, obj); + return obj; + } + + public Object prepVal(Object obj) { + return obj; + } + public Object prepIntVal(Object obj) { + return ((Double)obj).intValue(); + } + public Object prepDistinctVal(Object obj) { + NodeList nl = (NodeList) obj; + TreeSet ts = new TreeSet(); + for(int i=0; i -1; i=zis.read(buf)) { + fos.write(buf, 0, i); + bytes += i; + } + fos.flush(); + fos.close(); + + if (ze.getName().toLowerCase().endsWith(".xml") && (!ze.getName().contains("/"))) { + try { + Document d = XMLUtil.db.parse(zeout); + stats.setVal(ProquestStatsItems.XmlStat, OVERALL_STAT.PASS); + NodeList nl = d.getElementsByTagName("DISS_title"); + if (nl.getLength() == 1) { + Element elem = (Element)nl.item(0); + stats.setVal(ProquestStatsItems.Title, elem.getTextContent()); + } + nl = d.getElementsByTagName("DISS_inst_contact"); + + if (nl.getLength() == 1) { + Element elem = (Element)nl.item(0); + dept = elem.getTextContent(); + stats.setVal(ProquestStatsItems.Dept, dept); + } + + File dc = new File(zout, "dublin_core.xml"); + XMLUtil.doTransform(d, dc, GUProquestURIResolver.INSTANCE, "proquest2ingest-dc.xsl", MarcUtil.getXslParm(this.ftprops)); + + if (this.getProperty(P_MARC).equals(YN.Y)) { + File marc = new File(zout, "marc.xml"); + XMLUtil.doTransform(d, marc, GUProquestURIResolver.INSTANCE, "proquest2marc.xsl", MarcUtil.getXslParm(this.ftprops)); + } + + if (d.getDocumentElement().hasAttribute("embargo_code")) { + String s = 
d.getDocumentElement().getAttribute("embargo_code"); + if (!s.equals("0")) { + File gu = new File(zout, "metadata_" + getProperty(MarcUtil.P_EMBARGO_SCHEMA, "local") + ".xml"); + XMLUtil.doTransform(d, gu, GUProquestURIResolver.INSTANCE, "proquest2ingest-local.xsl", MarcUtil.getXslParm(this.ftprops)); + + Document gd = XMLUtil.db.parse(gu); + nl = gd.getElementsByTagName("dcvalue"); + for(int i=0; i 25000000 && stats.getVal(ProquestStatsItems.OverallStat) == OVERALL_STAT.INIT) { + stats.setVal(ProquestStatsItems.OverallStat, OVERALL_STAT.REVIEW); + } + + if (zcount > 2 && stats.getVal(ProquestStatsItems.OverallStat) == OVERALL_STAT.INIT) { + stats.setVal(ProquestStatsItems.OverallStat, OVERALL_STAT.REVIEW); + } + + if (zcount == 2 && stats.getVal(ProquestStatsItems.OverallStat) == OVERALL_STAT.INIT) { + stats.setVal(ProquestStatsItems.OverallStat, OVERALL_STAT.PASS); + } + + if (stats.getVal(ProquestStatsItems.OverallStat) == OVERALL_STAT.REVIEW) { + zout.renameTo(revzout); + zout = revzout; + } + + if (this.getProperty(P_FOLDERS).equals(YN.Y)) { + dept = dept.replaceAll("[^a-zA-Z0-9]", "_").replaceAll("__+","_"); + File deptdir = new File(outdir, dept); + deptdir.mkdir(); + zout.renameTo(new File(deptdir, zout.getName())); + } + return zcount; + } + + public String getShortName() { + return "ProQuest"; + } + + public void initFilters() { + filters.add(new ZipFilter()); + } + + public Stats createStats(String key){ + return Generator.INSTANCE.create(key); + } + public StatsItemConfig getStatsDetails() { + return details; + } + + public String toString() { + return "ProQuest to DSpace Ingest Folder"; + } + +} diff --git a/dspace/src/main/edu/georgetown/library/fileAnalyzer/importer/CSVBatcher.java b/dspace/src/main/edu/georgetown/library/fileAnalyzer/importer/CSVBatcher.java new file mode 100644 index 0000000..8c5c165 --- /dev/null +++ b/dspace/src/main/edu/georgetown/library/fileAnalyzer/importer/CSVBatcher.java @@ -0,0 +1,134 @@ +package 
edu.georgetown.library.fileAnalyzer.importer; + +import java.io.BufferedReader; +import java.io.BufferedWriter; +import java.io.File; +import java.io.FileOutputStream; +import java.io.FileReader; +import java.io.IOException; +import java.io.OutputStreamWriter; +import java.text.NumberFormat; +import java.util.TreeMap; + +import gov.nara.nwts.ftapp.ActionResult; +import gov.nara.nwts.ftapp.FTDriver; +import gov.nara.nwts.ftapp.Timer; +import gov.nara.nwts.ftapp.YN; +import gov.nara.nwts.ftapp.ftprop.FTPropEnum; +import gov.nara.nwts.ftapp.importer.DefaultImporter; +import gov.nara.nwts.ftapp.stats.Stats; +import gov.nara.nwts.ftapp.stats.StatsGenerator; +import gov.nara.nwts.ftapp.stats.StatsItem; +import gov.nara.nwts.ftapp.stats.StatsItemConfig; +import gov.nara.nwts.ftapp.stats.StatsItemEnum; + +/** + * Importer that splits a CSV file into batch files of a configurable number of rows, optionally repeating the header row at the top of each batch. + * @author TBrady + * + */ +public class CSVBatcher extends DefaultImporter { + private static enum BatcherStatsItems implements StatsItemEnum { + File(StatsItem.makeStringStatsItem("File Name").setWidth(200)), + Count(StatsItem.makeIntStatsItem("Num items")) + ; + + StatsItem si; + BatcherStatsItems(StatsItem si) {this.si=si;} + public StatsItem si() {return si;} + } + + public static enum Generator implements StatsGenerator { + INSTANCE; + public Stats create(String key) {return new Stats(details, key);} + } + static StatsItemConfig details = StatsItemConfig.create(BatcherStatsItems.class); + public enum SIZE { + B5(5), + B10(10), + B25(25), + B50(50), + B100(100), + B250(250), + B500(500), + B1000(1000), + B2000(2000), + B5000(5000), + B10000(10000); + + int size; + SIZE(int x) { + this.size = x; + } + } + public static final String HEADROW = "HeadRow"; + public static final String BATCHSIZE = "BatchSize"; + public CSVBatcher(FTDriver dt) { + super(dt); + this.ftprops.add(new FTPropEnum(dt, this.getClass().getName(), HEADROW, 
HEADROW, + "Treat first row as header", YN.values(), YN.Y)); + this.ftprops.add(new FTPropEnum(dt, this.getClass().getName(), BATCHSIZE, BATCHSIZE, + "Number of rows to write to each batch file", SIZE.values(), SIZE.B1000)); + } + boolean forceKey; + + public static String KEY = "key"; + + public ActionResult importFile(File selectedFile) throws IOException { + boolean header = (YN)getProperty(HEADROW) == YN.Y; + int batchSize = ((SIZE)getProperty(BATCHSIZE)).size; + BufferedReader br = new BufferedReader(new FileReader(selectedFile)); + String headRow = header ? br.readLine() + "\n" : ""; + int outFileCount = 0; + int rec = 0; + int currec = 0; + + Timer timer = new Timer(); + TreeMap types = new TreeMap(); + + NumberFormat nf = NumberFormat.getIntegerInstance(); + nf.setMinimumIntegerDigits(4); + nf.setGroupingUsed(false); + BufferedWriter bw = null; + String outFileName = ""; + + for(String line = br.readLine(); line!=null; line=br.readLine()) { + if (rec % batchSize == 0) { + outFileCount++; + if (bw!=null) { + bw.close(); + Stats stats = Generator.INSTANCE.create(outFileName); + stats.setVal(BatcherStatsItems.Count, currec); + types.put(outFileName, stats); + } + outFileName = selectedFile.getName()+"_"+nf.format(outFileCount)+".csv"; + File outFile = new File(selectedFile.getParent(), outFileName); + currec = 0; + bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outFile),"UTF-8")); + if (header) bw.write(headRow); + } + bw.write(line); + bw.write("\n"); + rec++; + currec++; + } + if (bw!=null) { + bw.close(); + Stats stats = Generator.INSTANCE.create(outFileName); + stats.setVal(BatcherStatsItems.Count, currec); + types.put(outFileName, stats); + } + + return new ActionResult(selectedFile, selectedFile.getName(), this.toString(), details, types, true, timer.getDuration()); + } + + public String toString() { + return "Batch CSV File"; + } + public String getDescription() { + return "This rule will break a CSV file into appropriately sized batches while 
retaining the header row."; + } + public String getShortName() { + return "BatchCSV"; + } +} diff --git a/dspace/src/main/edu/georgetown/library/fileAnalyzer/importer/DSpaceImporterRegistry.java b/dspace/src/main/edu/georgetown/library/fileAnalyzer/importer/DSpaceImporterRegistry.java index 9b312a6..04e7508 100644 --- a/dspace/src/main/edu/georgetown/library/fileAnalyzer/importer/DSpaceImporterRegistry.java +++ b/dspace/src/main/edu/georgetown/library/fileAnalyzer/importer/DSpaceImporterRegistry.java @@ -1,22 +1,24 @@ -package edu.georgetown.library.fileAnalyzer.importer; - -import gov.nara.nwts.ftapp.FTDriver; -import gov.nara.nwts.ftapp.importer.ImporterRegistry; - - -/** - * Activates the Importers that will be presented on the Import tab. - * @author TBrady - * - */ -public class DSpaceImporterRegistry extends ImporterRegistry { - - private static final long serialVersionUID = 1L; - - public DSpaceImporterRegistry(FTDriver dt) { - super(dt); - add(new IngestFolderCreate(dt)); - } - - -} +package edu.georgetown.library.fileAnalyzer.importer; + +import gov.nara.nwts.ftapp.FTDriver; +import gov.nara.nwts.ftapp.importer.ImporterRegistry; + + +/** + * Activates the Importers that will be presented on the Import tab. 
+ * @author TBrady + * + */ +public class DSpaceImporterRegistry extends ImporterRegistry { + + private static final long serialVersionUID = 1L; + + public DSpaceImporterRegistry(FTDriver dt) { + super(dt); + add(new IngestFolderCreate(dt)); + add(new CSVBatcher(dt)); + add(new DSpaceMetadata2Marc(dt)); + } + + +} diff --git a/dspace/src/main/edu/georgetown/library/fileAnalyzer/importer/DSpaceMetadata2Marc.java b/dspace/src/main/edu/georgetown/library/fileAnalyzer/importer/DSpaceMetadata2Marc.java new file mode 100644 index 0000000..56d292a --- /dev/null +++ b/dspace/src/main/edu/georgetown/library/fileAnalyzer/importer/DSpaceMetadata2Marc.java @@ -0,0 +1,268 @@ +package edu.georgetown.library.fileAnalyzer.importer; + +import java.io.File; +import java.io.IOException; +import java.text.NumberFormat; +import java.util.HashMap; +import java.util.TreeMap; +import java.util.Vector; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +import javax.xml.transform.TransformerException; + +import org.w3c.dom.Document; +import org.w3c.dom.Element; +import org.xml.sax.SAXException; + +import edu.georgetown.library.fileAnalyzer.proquestXsl.GUProquestURIResolver; +import edu.georgetown.library.fileAnalyzer.proquestXsl.MarcUtil; + +import gov.nara.nwts.ftapp.ActionResult; +import gov.nara.nwts.ftapp.FTDriver; +import gov.nara.nwts.ftapp.Timer; +import gov.nara.nwts.ftapp.YN; +import gov.nara.nwts.ftapp.ftprop.FTPropEnum; +import gov.nara.nwts.ftapp.ftprop.FTPropString; +import gov.nara.nwts.ftapp.importer.DefaultImporter; +import gov.nara.nwts.ftapp.importer.DelimitedFileReader; +import gov.nara.nwts.ftapp.stats.Stats; +import gov.nara.nwts.ftapp.stats.StatsGenerator; +import gov.nara.nwts.ftapp.stats.StatsItem; +import gov.nara.nwts.ftapp.stats.StatsItemConfig; +import gov.nara.nwts.ftapp.stats.StatsItemEnum; +import gov.nara.nwts.ftapp.util.XMLUtil; + +/** + * Importer that converts a comma-delimited DSpace metadata export into MARC records + * + * @author TBrady + * + */ +public class DSpaceMetadata2Marc 
extends DefaultImporter { + + public static Pattern phead = Pattern + .compile("^dc\\.([a-z]+)(\\.([a-z]+))?(\\[[a-zA-Z_]+\\])?$"); + public static NumberFormat nf = NumberFormat.getNumberInstance(); + static { + nf.setMinimumIntegerDigits(4); + nf.setGroupingUsed(false); + } + + public static enum STAT { + CONVERTED, + SKIPPED, + COMP_DATE_FORMAT + } + + public static enum DSpace2MarcStatsItems implements StatsItemEnum { + Handle(StatsItem.makeStringStatsItem("Handle", 100)), + Stat(StatsItem.makeEnumStatsItem(STAT.class, "Status").setWidth(150)), + Url(StatsItem.makeStringStatsItem("URL", 250)), + AccessionDate(StatsItem.makeStringStatsItem("Accession", 100)), + CompDate(StatsItem.makeStringStatsItem("Comp Date", 100)), + Author(StatsItem.makeStringStatsItem("Author", 250)), + Title(StatsItem.makeStringStatsItem("Title", 250)), + EmbargoTerms(StatsItem.makeStringStatsItem("Embargo Terms", 100)), + EmbargoLift(StatsItem.makeStringStatsItem("Embargo Lift", 100)), ; + + StatsItem si; + + DSpace2MarcStatsItems(StatsItem si) { + this.si = si; + } + + public StatsItem si() { + return si; + } + } + + public static enum Generator implements StatsGenerator { + INSTANCE; + public Stats create(String key) { + return new Stats(details, key); + } + } + + public static StatsItemConfig details = StatsItemConfig + .create(DSpace2MarcStatsItems.class); + public static String P_DC = "Generate DC"; + public static String P_START = "Accession Start"; + + public DSpaceMetadata2Marc(FTDriver dt) { + super(dt); + this.ftprops.add(new FTPropEnum(dt, this.getClass().getSimpleName(), + P_DC, P_DC, + "Generate a DC record for QC/troubleshooting purposes", YN + .values(), YN.N)); + this.ftprops.add(new FTPropString(dt, this.getClass().getSimpleName(), + P_START, P_START, + "Accession Start Date in YYYY-MM-DD format", "2013-03-01")); + ftprops.add(new FTPropString(dt, this.getClass().getSimpleName(), MarcUtil.P_UNIV_NAME, MarcUtil.P_UNIV_NAME, + "University Name", "My University")); + 
ftprops.add(new FTPropString(dt, this.getClass().getSimpleName(), MarcUtil.P_UNIV_LOC, MarcUtil.P_UNIV_LOC, + "University Location", "My University Location")); + ftprops.add(new FTPropString(dt, this.getClass().getSimpleName(), MarcUtil.P_EMBARGO_SCHEMA, MarcUtil.P_EMBARGO_SCHEMA, + "Embargo Schema Prefix", "local")); + ftprops.add(new FTPropString(dt, this.getClass().getSimpleName(), MarcUtil.P_EMBARGO_ELEMENT, MarcUtil.P_EMBARGO_ELEMENT, + "Embargo Element", "embargo")); + ftprops.add(new FTPropString(dt, this.getClass().getSimpleName(), MarcUtil.P_EMBARGO_TERMS, MarcUtil.P_EMBARGO_TERMS, + "Embargo Policy Qualifier", "terms")); + ftprops.add(new FTPropString(dt, this.getClass().getSimpleName(), MarcUtil.P_EMBARGO_CUSTOM, MarcUtil.P_EMBARGO_CUSTOM, + "Embargo Custom Date Qualifier", "custom-date")); + ftprops.add(new FTPropString(dt, this.getClass().getSimpleName(), MarcUtil.P_EMBARGO_LIFT, MarcUtil.P_EMBARGO_LIFT, + "Embargo Lift Date Qualifier", "lift-date")); + } + + public String toString() { + return "Convert DSpace Metadata to MARC"; + } + + public String getDescription() { + return "This rule will take exported metadata from DSpace and generate MARC records. This rule is intended for use on items that were ingested from Proquest."; + } + + public String getShortName() { + return "Dspace2Marc"; + } + + public void makeElement(Element parent, String header, String content) { + Document d = parent.getOwnerDocument(); + Matcher m = phead.matcher(header); + if (!m.matches()) { + return; + } + String[] contarr = content.split("\\|\\|"); + for (String contline : contarr) { + Element elem = d.createElement("dcvalue"); + parent.appendChild(elem); + elem.setAttribute("element", m.group(1)); + String q = m.group(3); + elem.setAttribute("qualifier", q == null ? 
"none" : q); + elem.appendChild(d.createTextNode(contline)); + } + } + + private static Pattern p1 = Pattern.compile("^.*hdl.handle.net/(.*)$"); + private static Pattern p2 = Pattern.compile("^.*handle/(.*)$"); + private static Pattern pComp = Pattern.compile("^\\d\\d\\d\\d$"); + public String getKey(String url, int seq) { + Matcher m = p1.matcher(url); + if (m.matches()) return m.group(1); + m = p2.matcher(url); + if (m.matches()) return m.group(1); + return "" + seq; + } + + public String get(Vector<String> cols, Vector<Integer> nums) { + for(int n: nums) { + String s = cols.get(n); + if (!s.isEmpty()) return s; + } + return ""; + } + + public ActionResult importFile(File selectedFile) throws IOException { + String accStart = (String)this.getProperty(P_START); + Timer timer = new Timer(); + TreeMap<String,Stats> types = new TreeMap<String,Stats>(); + DelimitedFileReader dfr = new DelimitedFileReader(selectedFile, ","); + HashMap<Integer,DSpace2MarcStatsItems> colMap = new HashMap<Integer,DSpace2MarcStatsItems>(); + + Document collDoc = XMLUtil.db.newDocument(); + Element collDocRoot = collDoc.createElementNS("http://www.loc.gov/MARC21/slim","marc:collection"); + collDocRoot.setAttributeNS("http://www.w3.org/2001/XMLSchema-instance", "xsi:schemaLocation", "http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"); + collDoc.appendChild(collDocRoot); + + Vector<String> headers = dfr.getRow(); + Vector<Integer> urlCol = new Vector<Integer>(); + Vector<Integer> accCol = new Vector<Integer>(); + Vector<Integer> compCol = new Vector<Integer>(); + + String embargo_element = this.getProperty(MarcUtil.P_EMBARGO_SCHEMA) + "." 
+ this.getProperty(MarcUtil.P_EMBARGO_ELEMENT) + "."; + + for (int i = 0; i < headers.size(); i++) { + String head = headers.get(i); + if (head.startsWith("dc.creator")) + colMap.put(i, DSpace2MarcStatsItems.Author); + else if (head.startsWith("dc.title")) + colMap.put(i, DSpace2MarcStatsItems.Title); + else if (head.startsWith(embargo_element + this.getProperty(MarcUtil.P_EMBARGO_TERMS))) + colMap.put(i, DSpace2MarcStatsItems.EmbargoTerms); + else if (head.startsWith(embargo_element + this.getProperty(MarcUtil.P_EMBARGO_LIFT))) + colMap.put(i, DSpace2MarcStatsItems.EmbargoLift); + else if (head.startsWith("dc.identifier.uri")) { + colMap.put(i, DSpace2MarcStatsItems.Url); + urlCol.add(i); + } else if (head.startsWith("dc.date.accession")) { + accCol.add(i); + } else if (head.startsWith("dc.date.created")) { + compCol.add(i); + } + } + + for (Vector cols = dfr.getRow(); cols != null; cols = dfr + .getRow()) { + String url = get(cols, urlCol); + String accDate = get(cols, accCol); + String compDate = get(cols, compCol); + String key = getKey(url, types.size()); + Stats stats = Generator.INSTANCE.create(key); + types.put(key, stats); + stats.setVal(DSpace2MarcStatsItems.Url, url); + stats.setVal(DSpace2MarcStatsItems.AccessionDate, accDate); + stats.setVal(DSpace2MarcStatsItems.CompDate, compDate); + boolean bCompDate = pComp.matcher(compDate).matches(); + + stats.setVal(DSpace2MarcStatsItems.Stat, bCompDate ? 
STAT.CONVERTED : STAT.COMP_DATE_FORMAT); + if (accDate.compareTo(accStart) < 0) { + stats.setVal(DSpace2MarcStatsItems.Stat, STAT.SKIPPED); + continue; + } + + Document d = XMLUtil.db.newDocument(); + Element root = d.createElement("dublin_core"); + root.setAttribute("schema", "dc"); + d.appendChild(root); + + for (int i = 0; i < cols.size(); i++) { + //Commenting out, unsure why this was suppressed + //if (urlCol.contains(i) || accCol.contains(i)) continue; + String col = cols.get(i); + if (col == null) continue; + if (col.trim().isEmpty()) continue; + makeElement(root, headers.get(i), col); + DSpace2MarcStatsItems dmsi = colMap.get(i); + if (dmsi != null) { + String s = stats.getVal(dmsi).toString(); + s = (s == null) ? col : s + col; + stats.setVal(dmsi, col); + } + + } + + File f = new File(selectedFile.getParentFile(), "marc_" + + key.replace("/", "_") + ".xml"); + try { + XMLUtil.doTransform(d, f, GUProquestURIResolver.INSTANCE, + "dc2marc.xsl", MarcUtil.getXslParm(this.ftprops)); + Document md = XMLUtil.db.parse(f); + collDocRoot.appendChild(collDoc.importNode(md.getDocumentElement(), true)); + } catch (TransformerException e) { + e.printStackTrace(); + } catch (SAXException e) { + e.printStackTrace(); + } + if (this.getProperty(P_DC).equals(YN.Y)) { + File f1 = new File(selectedFile.getParentFile(), "dc_" + + key.replace("/", "_") + ".xml"); + XMLUtil.serialize(d, f1); + } + } + + File collFile = new File(selectedFile.getParentFile(), "collection.xml"); + XMLUtil.serialize(collDoc, collFile); + return new ActionResult(selectedFile, selectedFile.getName(), + this.toString(), details, types, true, timer.getDuration()); + } +} diff --git a/dspace/src/main/edu/georgetown/library/fileAnalyzer/importer/IngestFolderCreate.java b/dspace/src/main/edu/georgetown/library/fileAnalyzer/importer/IngestFolderCreate.java index 964e7fd..ca0a558 100644 --- a/dspace/src/main/edu/georgetown/library/fileAnalyzer/importer/IngestFolderCreate.java +++ 
b/dspace/src/main/edu/georgetown/library/fileAnalyzer/importer/IngestFolderCreate.java @@ -1,399 +1,570 @@ -package edu.georgetown.library.fileAnalyzer.importer; - -import java.io.BufferedWriter; -import java.io.File; -import java.io.FileWriter; -import java.io.IOException; -import java.text.NumberFormat; -import java.util.HashMap; -import java.util.TreeMap; -import java.util.Vector; -import java.util.regex.Matcher; -import java.util.regex.Pattern; - -import org.w3c.dom.Document; -import org.w3c.dom.Element; - -import gov.nara.nwts.ftapp.ActionResult; -import gov.nara.nwts.ftapp.FTDriver; -import gov.nara.nwts.ftapp.Timer; -import gov.nara.nwts.ftapp.importer.DefaultImporter; -import gov.nara.nwts.ftapp.importer.DelimitedFileImporter; -import gov.nara.nwts.ftapp.stats.Stats; -import gov.nara.nwts.ftapp.stats.StatsGenerator; -import gov.nara.nwts.ftapp.stats.StatsItem; -import gov.nara.nwts.ftapp.stats.StatsItemConfig; -import gov.nara.nwts.ftapp.stats.StatsItemEnum; -import gov.nara.nwts.ftapp.util.XMLUtil; - -/** - * @author TBrady - * - */ -public class IngestFolderCreate extends DefaultImporter { - private static enum IngestStatsItems implements StatsItemEnum { - LineNo(StatsItem.makeStringStatsItem("Line No").setExport(false)), - Status(StatsItem.makeEnumStatsItem(status.class, "Status")), - Message(StatsItem.makeStringStatsItem("Message").setExport(false)) - ; - - StatsItem si; - IngestStatsItems(StatsItem si) {this.si=si;} - public StatsItem si() {return si;} - } - - public static enum Generator implements StatsGenerator { - INSTANCE; - public Stats create(String key) {return new Stats(details, key);} - } - static StatsItemConfig details = StatsItemConfig.create(IngestStatsItems.class); - class column { - boolean valid; - boolean fixed; - int inputCol; - String element; - String qualifier; - String name; - - column(int inputCol, String header) { - this(inputCol, header, false); - } - column(int inputCol, String header, boolean fixed) { - this.fixed = fixed; - 
this.name = header; - this.inputCol = inputCol; - String[] parts = header.split("\\."); - if ((parts.length < 2) || (parts.length > 3)) { - valid = false; - } else if (parts[0].equals("dc")) { - valid = true; - element = parts[1]; - if (parts.length > 2) { - qualifier = parts[2]; - } else { - qualifier = "none"; - } - } else { - valid = false; - } - } - - } - - class createStatus { - status stat; - String message; - - createStatus(status stat, String message) { - this.stat = stat; - this.message = message; - } - - createStatus append(createStatus cs) { - if (cs.stat.ordinal() > status.PASS.ordinal()) { - return cs; - } - - String s = this.message; - if (s.equals("")) { - s = cs.message; - } else { - s = s + ". " + cs.message; - } - return new createStatus(cs.stat, s); - } - } - - Vector colHeaderDefs; - HashMap colByName; - HashMap folders; - - public StatsItemConfig getDetails() { - return details; - } - - public enum status {INIT,PASS,WARN,FAIL} - NumberFormat nf; - - public IngestFolderCreate(FTDriver dt) { - super(dt); - nf = NumberFormat.getNumberInstance(); - nf.setMinimumIntegerDigits(8); - nf.setGroupingUsed(false); - } - - public String toString() { - return "Ingest: Create Ingest Folders"; - } - public String getDescription() { - return "This will create DSpace Ingest Folders. Note: this action will MOVE files into the ingest folder structure. Please save a backup of your initial directory before running this action the first time.\n"+ - "The File Test 'Ingest Inventory' can be used to create an initial spreadsheet if one does not exist.\n"+ - "File Structure\n"+ - "\t1) Folder Name - A unique folder will be created for each item to be ingested. Names must be unique\n"+ - "\t2) Item file name - required, a file with that name must exist relative to the imported spreadsheet\n"+ - "\t3) Thumbnail file name - optional, file must exist is present\n"+ - "\tAddition columns should have a dublin core field name in their header. Columns without a 'dc.' 
header will be ignored\n" + - "A title (dc.title) and a properly formatted creation date (dc.date.created) must be present somewhere in the set of additional columns."; - } - public String getShortName() { - return "Ingest Folder"; - } - - public createStatus createItem(File selectedFile, Vector cols) { - Document d = XMLUtil.db.newDocument(); - Element e = d.createElement("dublin_core"); - e.setAttribute("schema", "dc"); - d.appendChild(e); - for(int i=0; i vals, String name, String def) { - column col = colByName.get(name); - if (col == null) return def; - if (col.inputCol > vals.size()) return def; - String val = vals.get(col.inputCol); - if (val == null) return def; - if (val.trim().isEmpty()) return def; - return val.trim(); - } - - public boolean hasColumnValue(Vector vals, String name) { - return getColumnValue(vals, name, null) != null; - } - - public ActionResult importFile(File selectedFile) throws IOException { - Timer timer = new Timer(); - TreeMap types = new TreeMap(); - int rowKey = 0; - Vector> data = DelimitedFileImporter.parseFile(selectedFile, "\t"); - folders = new HashMap(); - colHeaderDefs = new Vector(); - colByName = new HashMap(); - Vector colheads = data.get(0); - - addColumn(new column(0, "Item Folder", true)); - addColumn(new column(1, "Item File", true)); - addColumn(new column(2, "Item Thumbnail", true)); - - for(int i=3; i< colheads.size(); i++) { - String colh = colheads.get(i); - addColumn(new column(i, colh)); - } - - for(int r=1; r cols = data.get(r); - String key = nf.format(rowKey++); - Stats stats = Generator.INSTANCE.create(key); - importRow(selectedFile, cols, stats); - - types.put(key, stats); - } - - return new ActionResult(selectedFile, selectedFile.getName(), this.toString(), getDetails(), types, true, timer.getDuration()); - } - - public void importRow(File selectedFile, Vector cols,Stats stats) { - createStatus cs = new createStatus(status.INIT, ""); - stats.setVal(IngestStatsItems.Status, cs.stat); - 
stats.setVal(IngestStatsItems.Message, cs.message); - - if (cols.size() == colHeaderDefs.size()) { - String folder = ""; - String file = ""; - String thumb = ""; - - for(int i=0; i 3)) { + valid = false; + } else { + valid = true; + schema = parts[0]; + element = parts[1]; + if (parts.length > 2) { + qualifier = parts[2]; + } else { + qualifier = "none"; + } + } + } + + } + + private static enum FileStats {NA, NOT_FOUND, ERROR, ALREADY_EXISTS, MOVED_TO_INGEST, COPIED_TO_INGEST;} + private static enum MetaStats {NA, ERROR, CREATED, OVERWRITTEN;} + private static enum InputMetaStats {NA, MISSING, FORMAT_ERROR, DUPLICATE, OK;} + class CreateException extends Exception { + private static final long serialVersionUID = 5042987495219000857L; + + CreateException(String s) { + super(s); + } + } + + Vector colHeaderDefs; + HashMap colByName; + HashMap folders; + + public StatsItemConfig getDetails() { + return details; + } + + private enum status {INIT,PASS,WARN,FAIL} + + NumberFormat nf; + + public static final String REUSABLE_THUMBNAIL = "Reusable Thumbnail"; + public static final String REUSABLE_LICENSE = "Reusable License"; + public static final String P_AUTONAME = "Add User and Date to Ingest Folder"; + public static enum FIXED { + FOLDER(0), ITEM(1), THUMB(2), LICENSE(3); + int index; + FIXED(int i) {index = i;} + } + + public IngestFolderCreate(FTDriver dt) { + super(dt); + nf = NumberFormat.getNumberInstance(); + nf.setMinimumIntegerDigits(8); + nf.setGroupingUsed(false); + + this.ftprops.add(new FTPropString(dt, this.getClass().getSimpleName(), REUSABLE_THUMBNAIL, "thumb", + "Relative path to thumbnail file to be used for all items w/o thumbnail (optional)", "")); + this.ftprops.add(new FTPropString(dt, this.getClass().getSimpleName(), REUSABLE_LICENSE, "license", + "Relative path to license file to be used for all items w/o license (optional)", "")); + this.ftprops.add(new FTPropEnum(dt, this.getClass().getSimpleName(), P_AUTONAME, "autoname", + "Append user name, 
date and time to ingest folder name", YN.values(), YN.Y)); + + } + + public String toString() { + return "Ingest: Create Ingest Folders"; + } + public String getDescription() { + return "This will create DSpace Ingest Folders. Note: this action will MOVE files into the ingest folder structure. Please save a backup of your initial directory before running this action the first time.\n"+ + "The File Test 'Ingest Inventory' can be used to create an initial spreadsheet if one does not exist.\n"+ + "File Structure\n"+ + "\t1) Folder Name - A unique folder will be created for each item to be ingested. Names must be unique\n"+ + "\t2) Item file name - required, a file with that name must exist relative to the imported spreadsheet\n"+ + "\t3) Thumbnail file name - optional, file must exist if present\n"+ + "\t4) License file name - optional, file must exist if present\n"+ + "\tAdditional columns should have a Dublin Core field name in their header. Columns without a 'dc.' header will be ignored\n" + + "A title (dc.title) and a properly formatted creation date (dc.date.created) must be present somewhere in the set of additional columns."; + } + public String getShortName() { + return "Ingest Folder"; + } + + public void createMetadataFile(File dir, Stats stats, Vector cols, String schema) { + String filename = "dublin_core.xml"; + IngestStatsItems index = IngestStatsItems.DublinCoreFileStats; + + if (!schema.equals("dc")) { + filename = "metadata_" + schema + ".xml"; + index = IngestStatsItems.OtherMetadataFileStats; + stats.appendVal(IngestStatsItems.OtherSchemas, schema); + } + + Document d = XMLUtil.db.newDocument(); + Element e = d.createElement("dublin_core"); + e.setAttribute("schema", schema); + d.appendChild(e); + for(int i=0; i cols) { + File dir = new File(currentIngestDir, cols.get(0)); + dir.mkdirs(); + + HashSet schemas = new HashSet(); + for(int i=0; i vals, String name, String def) { + column col = colByName.get(name); + if (col == null) return def; + if 
(col.inputCol >= vals.size()) return def; + String val = vals.get(col.inputCol); + if (val == null) return def; + if (val.trim().isEmpty()) return def; + return val.trim(); + } + + public boolean hasColumnValue(Vector<String> vals, String name) { + return getColumnValue(vals, name, null) != null; + } + + private File currentIngestDir; + public File getCurrentIngestDir(File selectedFile){ + File parent = selectedFile.getParentFile(); + File defdir = new File(parent, "ingest"); + if (defdir.exists()) return defdir; + if (this.getProperty(P_AUTONAME) == YN.N) return defdir; + File[] list = parent.listFiles(new FilenameFilter(){ + public boolean accept(File dir, String name) { + if (name.startsWith("ingest_")) return true; + return false; + } + }); + if (list != null && list.length > 0) return list[0]; + String name = "ingest_"; + name += System.getProperty("user.name"); + name += "_"; + DateFormat df = new SimpleDateFormat("yyyyMMdd-HHmmss"); + name += df.format(new Date()); + return new File(parent, name); + } + + + public ActionResult importFile(File selectedFile) throws IOException { + currentIngestDir = getCurrentIngestDir(selectedFile); + Timer timer = new Timer(); + String globalThumb = (String)getProperty(REUSABLE_THUMBNAIL); + String globalLicense = (String)getProperty(REUSABLE_LICENSE); + TreeMap<String,Stats> types = new TreeMap<String,Stats>(); + int rowKey = 0; + Vector<Vector<String>> data = DelimitedFileReader.parseFile(selectedFile, "\t"); + folders = new HashMap(); + colHeaderDefs = new Vector(); + colByName = new HashMap(); + Vector<String> colheads = data.get(0); + + addColumn(new column(FIXED.FOLDER.index, InventoryStatsItems.Key.si().header, true)); + addColumn(new column(FIXED.ITEM.index, InventoryStatsItems.File.si().header, true)); + addColumn(new column(FIXED.THUMB.index, InventoryStatsItems.ThumbFile.si().header, true)); + addColumn(new column(FIXED.LICENSE.index, InventoryStatsItems.LicenseFile.si().header, true)); + + for(int i=colHeaderDefs.size(); i< colheads.size(); i++) { + String colh = colheads.get(i); + 
addColumn(new column(i, colh)); + } + + details = StatsItemConfig.create(IngestStatsItems.class); + for(column col: colHeaderDefs) { + if (col.valid || col.fixed) { + details.addStatsItem(col.inputCol, StatsItem.makeStringStatsItem(col.name, 250)); + } + } + + for(int r=1; r cols = data.get(r); + String key = nf.format(rowKey++); + Stats stats = Generator.INSTANCE.create(key); + importRow(selectedFile, cols, stats, globalThumb, globalLicense); + + types.put(key, stats); + } + + return new ActionResult(selectedFile, selectedFile.getName(), this.toString(), getDetails(), types, true, timer.getDuration()); + } + + public void importRow(File selectedFile, Vector cols,Stats stats, String globalThumb, String globalLicense) { + StringBuffer buf = new StringBuffer(); + try { + if (cols.size() != colHeaderDefs.size()) { + throw new CreateException(cols.size()+" columns found : "+colHeaderDefs.size() + " columns expected"); + } + String folder = ""; + String file = ""; + String thumb = ""; + String license = ""; + + for(int i=0; i getXslParm(ArrayList props) { + HashMap pmap = new HashMap(); + pmap.put("recdate", df.format(new Date())); + for(FTProp prop: props) { + String key = prop.getShortName(); + pmap.put(key, prop.getValue()); + } + return pmap; + } + +} diff --git a/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/dc2marc.xsl b/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/dc2marc.xsl new file mode 100644 index 0000000..4a4c95f --- /dev/null +++ b/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/dc2marc.xsl @@ -0,0 +1,93 @@ + + + + + + 20130225090000.10 + University Name + University Location + + + + + + + + + + + + + + + + + + + + + + + + + + , + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/lcsh.xml 
b/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/lcsh.xml new file mode 100644 index 0000000..a700212 --- /dev/null +++ b/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/lcsh.xml @@ -0,0 +1,1415 @@ + + + Accounting + + + Hearing + + + Adult education + + + Atmosphere, Upper + + + Aerospace engineering + + + Aesthetics + + + African Americans + Research + + + Africa + History + + + African literature + + + Africa + Research + + + Aging + + + Agricultural chemistry + + + Agriculture + Economic aspects + + + Agricultural education + + + Agricultural engineering + + + Agriculture + + + Agronomy + + + Dispute resolution (Law) + + + Renewable energy sources + + + Alternative medicine + + + United States + History + + + American literature + + + United States + Research + + + Chemistry, Analytic + + + History, Ancient + + + Extinct languages + + + Animal behavior + + + Animals + Diseases + + + Physiology + + + Veterinary medicine + + + Applied mathematics + + + Archaeology + + + Building + + + Architecture + + + Regional planning + + + Art criticism + + + Art + Research + + + Art + History + + + Artificial intelligence + + + Arts + Management + + + Asian Americans + Research + + + Asia + History + + + Oriental literature + + + Asia + Research + + + Astronomy + + + Astrophysics + + + Atmospheric chemistry + + + Atmospheric physics + + + Atmospheric physics + + + Nuclear Physics + + + Audiology + + + Automobiles + Design and construction + + + Baltic States + Research + + + Banks and banking + + + Psychology + + + Bible + + + Education, Bilingual + + + Biochemistry + + + Biogeochemistry + + + Biography + + + Bioinformatics + + + Marine biology + + + Biology + + + Biomechanics + + + Biomedical engineering + + + Biophysics + + + Biometry + + + Blacks + History + + + Blacks + Research + + + British literature + + + Irish literature + + + Business + + + Business Education + + + Canada + History + + + Canadian literature + + 
+ Canada + Research + + + Canon law + + + Caribbean literature + + + Caribbean Area + Research + + + Cytology + + + Chemical engineering + + + Chemical oceanography + + + Chemistry + + + Church history + + + Motion pictures + + + Cinematography + + + Civil engineering + + + Classical literature + + + Classical education + + + Religious education + + + Climatic changes + + + Clinical psychology + + + Cognitive psychology + + + Communication + + + Community colleges + Research + + + Comparative literature + + + Religions + + + Computer engineering + + + Computer science + + + Solid state physics + + + Condensed matter + + + Nature conservation + + + Geodynamics + + + Continuing Education + + + Counseling psychology + + + Criminology + + + Ethnology + + + Cultural property + Protection + + + Curriculum planning + + + Dance + + + Demography + + + Dentistry + + + Design + + + Developmental biology + + + Developmental psychology + + + Theology + + + Early childhood education + + + Europe, Eastern + Research + + + Ecology + + + Economic history + + + Economics + + + Economics + + + Commerce + + + Business + + + Labor economics + + + Education + + + Education + Finance + + + Education and state + + + School management and organization + + + Educational evaluation + + + Educational leadership + + + Educational psychology + + + Educational technology + + + Educational tests and measurements + + + Electrical engineering + + + Electromagnetism + + + Education, Elementary + + + Endocrinology + + + Power resources + + + Engineering + + + English language + Study and teaching + Foreign speakers + + + English literature + + + Entomology + + + Entrepreneurship + + + Environmental economics + + + Environmental education + + + Environmental engineering + + + Environmental geology + + + Environmental health + + + Environmental justice + + + Environmental law + + + Environmental management + + + Environmental sciences + Philosophy + + + Environmental sciences + + + Human Ecology + + + 
Epidemiology + + + Knowledge, Theory of + + + Ethics + + + Ethnology + Research + + + Europe + History + + + Europe + Research + + + Evolution (Biology) + + + Psychology, Experimental + + + Motion pictures + Research + + + Finance + + + Art + + + Fishery sciences + + + Aquatic sciences + + + Folklore + + + Food + Research + + + Language and languages + Study and teaching + + + Forensic anthropology + + + Forests and forestry + + + French-Canadians + + + French Canadian literature + + + Gender Identity + + + Genetics + + + Geobiology + + + Geochemistry + + + Geographic information systems + + + Geodesy + + + Geography + + + Engineering geology + + + Geology + + + Geomorphology + + + Geophysics + + + Geophysics + + + Engineering geology + + + Germanic literature + + + Gerontology + + + Gifted children + Education + + + Gay and lesbian studies + + + Health services administration + + + Health Education + + + Medical sciences + + + Medical sciences + Education + + + High temperatures + + + Education, Higher + + + Education, Higher + Administration + + + Hispanic Americans + Research + + + Histology + + + History + + + Education + History + + + Oceania + History + + + Science and civilization + + + Holocaust, Jewish (1939-1945) + + + Home economics + + + Home economics + Research + + + Horticulture + + + Hydrology + + + Scandinavia + Literature + + + Icelandic Literature + + + Immunology + + + Sociology + Research + + + Families + Research + + + Industrial arts + Study and teaching + + + Industrial engineering + + + Information science + + + Information technology + + + Chemistry, Inorganic + + + Instructional systems + Design + + + Intellectual property + + + International law + + + International relations + + + Islam and culture + + + Journalism + + + Jews + Research + + + Kinesiology + + + Industrial relations + + + Land use + Planning + + + Landscape architecture + + + Language and culture + + + Language arts + + + Latin America + History + + + Latin American 
literature + + + Latin America + Research + + + Law + + + Sociological jurisprudence + + + Library science + + + Limnology + + + Linguistics + + + Literature + + + Pacific Island literature + + + Logic + + + Low temperatures + + + Macroecology + + + Management + + + Submarine geology + + + Marketing + + + Mass media + + + Materials science + + + Mathematics + + + Mathematics + Study and teaching + + + Mechanical engineering + + + Mechanics + + + Medical physics + + + Medical ethics + + + Radiology + + + Diagnostic imaging + + + Medicine + + + Middle Ages + + + Literature, Medieval + + + Mental health + + + Metallurgy + + + Metaphysics + + + Meteorology + + + Microbiology + + + Middle East + History + + + Middle Eastern literature + + + Middle East + Research + + + Middle School education + + + Military history + + + Military art and science + + + Mineralogy + + + Mining engineering + + + History, Modern + + + Languages, Modern + + + Literature, Modern + + + Molecular biology + + + Molecules + + + Chemistry, Physical and theoretical + + + Molecules + + + Molecular theory + + + Anatomy + Comparative + + + Multicultural education + + + + + + Museum techniques + + + Music + + + Music + Instruction and study + + + Nanoscience + + + Nanotechnology + + + Indians of North America + Research + + + Natural resources management + + + Marine engineering + + + Middle East + Research + + + Neurobiology + + + Neurosciences + + + Africa, North + Research + + + Nuclear chemistry + + + Nuclear engineering + + + Nuclear physics + + + Nuclear Physics + + + Nursing + + + Nutrition + + + Obstetrics + + + Gynecology + + + Industrial safety + + + Industrial psychology + + + Occupational therapy + + + Ocean engineering + + + Oncology + + + Operations research + + + Ophthalmology + + + Optics + + + Chemistry, Organic + + + Organizational sociology + + + Organizational sociology + + + Osteopathic medicine + + + Pacific Area + Research + + + Packaging + + + Paleobotany + + + 
Paleoclimatology + + + Paleoecology + + + Paleontology + + + Paleontology + + + Palynology + + + Parasitology + + + Particles (Nuclear physics) + + + Pastoral counseling + + + Patent laws and legislation + + + Pathology + + + Peace + Research + + + Education + + + Performing arts + + + Performing arts + Study and teaching + + + Personality + + + Petroleum engineering + + + Petroleum geology + + + Petrology + + + Pharmaceutical chemistry + + + Pharmacology + + + Pharmacology + + + Philosophy + + + Education + Philosophy + + + Religion + Philosophy + + + Science + Philosophy + + + Physical anthropology + + + Chemistry, Physical and theoretical + + + Physical education and training + + + Physical geography + + + Oceanography + + + Physical therapy + + + Physics + + + Psychophysiology + + + Physiology + + + Planetology + + + Botany + + + Plant diseases + + + Plant physiology + + + Botany + + + Magnetohydrodynamics + + + Plastics + + + Plate tectonics + + + Political Science + + + Polymers + + + Polymerization + + + Psychobiology + + + Psychology + + + Public administration + + + Public health + + + Public health personnel + Education + + + Public policy + + + Psychometrics + + + Quantum theory + + + Radiation chemistry + + + Range management + + + Reading + + + Recreation + + + Tourism + + + Area studies + + + Religion + + + Religious education + + + Religion + History + + + Remote sensing + + + Rhetoric + + + Robotics + + + Romance literature + + + Russia + History + + + Sewerage + + + Scandinavia + Research + + + School management and organization + + + Educational counseling + + + Science + Study and teaching + + + Education, Secondary + + + Sedimentology + + + Slavic literature + + + Slavic countries + Research + + + Social psychology + + + Social sciences + Research + + + Social sciences + Study and teaching + + + Social structure + + + Social service + + + Sociolinguistics + + + Sociology + + + Educational sociology + + + Soil science + + + Solid state physics +
+ + South Africa + Research + + + South Asia + Research + + + Special education + + + Oral communication + + + Speech therapy + + + Spirituality + + + Sports administration + + + Statistics + + + Africa, Sub-Saharan + Research + + + Surgery + + + Sustainability + + + System theory + + + Biology + Classification + + + Teachers + Training of + + + Communication of technical information + + + Textile research + + + Theater + + + Theater + History + + + Theology + + + Mathematics + + + Mathematical physics + + + Physics + + + Toxicology + + + Transportation + Planning + + + Urban forestry + + + City planning + + + Veterinary medicine + + + Virology + + + Vocational education + + + Water-supply + Management + + + World Wide Web + Research + + + Wildlife conservation + + + Wildlife management + + + Women's studies + + + Wood + Research + + + World history + + + Zoology + + \ No newline at end of file diff --git a/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/marc.xsl b/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/marc.xsl new file mode 100644 index 0000000..3042cb9 --- /dev/null +++ b/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/marc.xsl @@ -0,0 +1,242 @@ + + + + + + + + + + + + + + + + + + + + + + + + An author is required + + + A completed date is required + + + Comp date must be a 4 digit year + + + A title is required + + + A degree is required + + + + http://www.loc.gov/MARC21/slim + http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd + ntm Ic + NEW + + + + + m o d + cr ||||||||||| + + ||||||s + + dcu o 000 0 eng + + + DGU + rda + DGU + + + DGUU + + + + + + + + + + + author + + aut + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + text + txt + rdacontent + + + unmediated + n + rdamedia + + + volume + nc + rdacarrier + + + + + + + + + + + + + + + + + + Advisor: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + advisor + + + + + + + + degree granting institution + + + + + + + + + CONNECT TO ONLINE RESOURCE. + + + + + + + diff --git a/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/proquest.xsl b/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/proquest.xsl new file mode 100644 index 0000000..9d95d17 --- /dev/null +++ b/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/proquest.xsl @@ -0,0 +1,70 @@ + + + + + + lfms + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/proquest2ingest-dc.xsl b/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/proquest2ingest-dc.xsl new file mode 100644 index 0000000..18d3132 --- /dev/null +++ b/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/proquest2ingest-dc.xsl @@ -0,0 +1,227 @@ + + + + + University Name + University Location + + + + + + + + + An author is required + + + Only one author is allowed + + + A completed date is required + + + Only one comp date is allowed + + + Comp date must be a 4 digit year + + + A title is required + + + Only one title is allowed + + + A degree is required + + + Only one degree is allowed + + + + + + creator + + + + + + + format + extent + + + + + title + + + + + + + + + title + alternative + + + + + + + + + date + created + + + + date + issued + + + + + date + submitted + + + + + publisher + + + + source + + + + source + + + + + contributor + advisor + + + + + + + + + + + + subject + lcsh + + + + + + + + + + + + + + + + + + + + subject + other + + + + + + + + + + + language + + + + + description + abstract + + + + + + + + + + description + + + + format + + + + + + + + + + subject + + + + + + + + + subject + + + + + + + + element + none + + + + + + + + + + + element + 
none + + + + + + + + + + + + + + diff --git a/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/proquest2ingest-local.xsl b/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/proquest2ingest-local.xsl new file mode 100644 index 0000000..b115f92 --- /dev/null +++ b/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/proquest2ingest-local.xsl @@ -0,0 +1,49 @@ + + + + + local + embargo + terms + custom-date + + + + + + + + + + + + + + + + 6-months + 1-year + 2-years + forever + custom + + + + + + + + + + + + + + + + + + + + + diff --git a/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/proquest2marc.xsl b/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/proquest2marc.xsl new file mode 100644 index 0000000..4352a04 --- /dev/null +++ b/dspace/src/main/resources/edu/georgetown/library/fileAnalyzer/proquestXsl/proquest2marc.xsl @@ -0,0 +1,64 @@ + + + + + + 20130225090000.10 + University Name + University Location + + + + + + + + lfm + + + + + s + + + + + + + + + + + + + + + + + + + + + + + + + fmls + + + + + + + + + + + + lfms + + + + diff --git a/pom.xml b/pom.xml index 5ea53c4..8d2c132 100644 --- a/pom.xml +++ b/pom.xml @@ -14,6 +14,11 @@ + + + src/main/resources + +