'stylo' news

version 0.7.5, 2024/04/02

  • size.penalize() renamed to samplesize.penalize(), for CRAN
  • bug in size.penalize() fixed
  • improved performance of dist.minmax()
  • oppose() updated to allow just one text per set
  • a solid clean-up in a few functions
  • several minor improvements here and there

version 0.7.4, 2020/12/05

  • option for shading in rolling.classify()
  • performance.measures() greatly improved
  • supervised classifiers updated, to be compliant with cross-validation
  • SVM output fixed
  • bugs in rolling.classify() fixed
  • bugs in load.corpus() causing codepage mismatches fixed
  • general code cleanup

version 0.7.3, 2020/08/11

  • perform.svm() improved to work with R >4.0.0
  • oppose() no longer requires at least 2 texts per set
  • better color management in rolling.classify()
  • CPU performance improvements

version 0.7.2, 2020/04/20

  • fixes required by CRAN to meet R >3.6.3 requirements
  • CPU performance improvements
  • improvements in performance.measures()
  • confusion matrices fixed
  • oppose() updated to allow just one text per set

version 0.7.1, 2019/11/04

  • improvements in crossv(): confusion matrix fully operational
  • new function performance.measures(), providing recall, precision, F1, etc.
  • performance measures made available via classify()
  • new function size.penalize() to assess minimal sample size
  • extension to the generic plot() function, to plot size.penalize() results

version 0.7.0, 2019/01/22

  • Unicode (UTF-8) made the default encoding, also for Windows

version 0.6.9, 2019/01/20

  • check.encoding() and change.encoding() introduced
  • GUI allows for changing the working directory with one click
  • metadata handling through a dedicated variable
  • {Steffen Pielström joins!}

version 0.6.8, 2018/06/14

  • support for CJK (Chinese-Japanese-Korean) significantly improved
  • a fix for exporting networks to Gephi ver. 0.9.2
  • support for rmarkdown: stylo(), classify(), oppose()

version 0.6.7, 2018/05/12

  • support for the following taggers: TaKIPI (for Polish), Alpino (for Dutch)
  • the Imposters method reimplemented, via the new function imposters()
  • fine-tuning the parameters of the Imposters method via imposters.optimize()

version 0.6.6, 2018/04/13

  • Cosine Delta implemented and available via the GUI
  • Min-Max distance implemented
  • Entropy distance implemented

version 0.6.5, 2017/11/03

  • support for interactive network visualisations via stylo.network()
  • corrected Spanish pronouns
  • fixes in documentation
  • countless minor fixes

version 0.6.4, 2016/09/08

  • citation hint updated; to see the changes type: citation("stylo")
  • the impostors method almost implemented, see help(perform.impostors)
  • confusion table for supervised classification via classify()
  • a separate function for cross-validation, see help(crossv)
  • a significant change in the SVM wrapper: the procedure automatically removes variables that contain only 0s in the training set
  • the file inst/CITATION updated to meet recent CRAN requirements
  • man files for perform.delta, perform.svm etc. updated: new executable examples added, so that one can perform a supervised test without any corpus
  • perform.knn(), perform.svm() etc. improved, in order to handle custom vectors of classes provided by a user
  • an improved output of the oppose() function

version 0.6.3, 2015/12/20

  • significant performance improvement in make.table.of.frequencies()
  • PCA values (rotation, explained variance, etc.) saved in final results

version 0.6.2, 2015/11/11

  • the package 'stringi' used to optimize n-gram computation
  • three datasets added to the package
    • data(novels), a collection of 9 novels by the Bronte sisters and Jane Austen (full text)
    • data(galbraith), a table of frequencies of 26 novels by 5 authors, including Galbraith's "The Cuckoo's Calling"
    • data(lee), a table of frequencies of 28 American novels by 8 authors, including the new novel by Harper Lee
  • new version of make.table.of.frequencies(), which speeds up the tasks radically
  • delete.markup(), delete.stop.words(), make.samples(), make.frequency.list(), txt.to.features(), txt.to.words.ext() remodelled so that they can be applied to single texts and/or to corpora
  • countless improvements in most of the functions

version 0.6.1, 2015/09/27

  • UTF-8 issue in txt.to.words.ext() fixed, at CRAN's request

version 0.6.0, 2015/08/17

  • support for Georgian
  • plot size in rolling.classify() improved
  • distance measure engine thoroughly restructured
  • custom distance measures allowed
  • cosine distance introduced
  • new functions: dist.cosine(), dist.delta(), dist.argamon(), dist.eder(), dist.simple()
  • extracting POS tags via the function parse.pos.tags()

version 0.5.9-3, 2015/07/02

  • support for Coptic
  • customizable graph size in rolling.classify()
  • custom graph filename
  • integration with CLARIN-PL stylometric infrastructure

version 0.5.9, 2015/01/30

  • non-ASCII chars in the source code neutralized (required by CRAN)
  • random sampling substantially improved

version 0.5.8-3, 2014/10/26

  • bug fixes: options for assign.plot.colors()

version 0.5.8-2, 2014/10/19

  • bug fixes: 'start.at' parameter in stylo()

version 0.5.8-1, 2014/09/23

  • bug fixes (mostly: colors on dendrograms)

version 0.5.8, 2014/09/03

  • new sequential methods available: rolling SVM, rolling NSC, and rolling Delta
  • bug in load.corpus.and.parse() fixed
  • bug in rolling.delta() fixed
  • network-related bug in stylo() neutralized
  • classification procedures as separate functions: perform.delta(), perform.svm(), perform.knn(), perform.naivebayes(), perform.nsc()
  • classification output enhanced
  • doc files for new functions added

version 0.5.7, 2014/08/13

  • culling implemented as a separate function
  • custom stop words deletion: delete.stop.words()
  • a thoroughly re-written oppose() to use the same tokenizing, corpus loading, sampling etc. functions as stylo() and classify()
  • zeta.chisquare(), zeta.craig(), and zeta.eder() derived as separate functions
  • gui.oppose() derived as a separate function
  • distinctive words visualization in oppose() improved
  • draw.polygons() derived as a separate function (hidden to the end user, though)
  • cross-validation in classify() improved
  • a bug in cross-validation for naivebayes fixed
  • a very unpleasant bug in oppose() fixed: the preferred and avoided words were calculated using the I set only
  • help files significantly improved

version 0.5.6, 2014/04/20

  • support for Unicode on Windows
  • support for a few non-Latin scripts
  • experimental support for CJK (Chinese-Japanese-Korean)
  • the function txt.to.words() remodelled
  • loading corpus files improved
  • printing variables on screen improved
  • better class inheritance
  • an issue with hclust and "ward", "ward.D" fixed
  • man files extended and updated

version 0.5.5, 2014/04/03

  • cross-validation in classify()
  • lots of bugs fixed

version 0.5.4, 2014/02/25

  • t-SNE implemented
  • preserve.case option
  • more flexible function for splitting input text

version 0.5.3, 2014/01/02

  • custom regular expressions to tokenize input texts
  • support for external corpora or frequencies
  • support for external set of features (e.g. frequent words)
  • class "stylo.results" for formatting final results
  • class "stylo.corpus" for formatting loaded corpora
  • class "stylo.data" for formatting tables and vectors
  • PCA coordinates piped to final results
  • optional choice between relative/raw frequencies
  • xml support improved (bug fixed)
  • codepage bug in oppose() fixed

version 0.5.2, 2013/09/07

  • CRAN-related issue with .Rbuildignore fixed
  • network analysis support significantly improved
  • improvements in man pages

version 0.5.1, 2013/08/07

  • bug fixes, minor improvements
  • different options for k-NN and SVM
  • submitted to CRAN for the first time (!)

version 0.5.0-58, 2013/08/06

  • batch mode improved
  • several clustering algorithms available

version 0.5.0-50, 2013/07/24

  • man pages revised and improved

version 0.5.0-49, 2013/07/18

  • poster presentation at DH2013 (Lincoln, NE)
  • minor improvements

version 0.5.0-48, 2013/06/26

  • namespace issues solved
  • documentation corrected (typos)

version 0.5.0-45, 2013/06/12

  • arguments can be passed from the command line
  • man pages cleaned and extended
  • global variables abandoned
  • innumerable minor improvements

version 0.5.0-43, 2013/04/31

  • thousands of changes and improvements
  • documentation improved and augmented
  • stylo R package (un)officially released

version 0.5.0-30, 2013/04/26

  • changes in names of some functions
  • code cleaning, improvements, improvements, ...

version 0.5.0-23, 2013/05/24

  • first prototype of an R package

version 0.5.0-1, 2013/04/03

  • first attempt to port the stylo script into an R package

version 0.4.9-2, 2013/05/27

  • code made OS-independent
  • minor cleaning

version 0.4.9-1, 2013/04/02

  • experimental support for network analysis (output to Gephi)
  • bugs fixed

version 0.4.9, 2013/03/06

  • added option to dump samples for closer post-analysis inspection

version 0.4.8, 2012/12/29

  • customizable plot area, font size, etc.
  • thoroughly rewritten code for margins assignment
  • scatterplots represented either by points, or by labels, or by both (customizable label offset)
  • saving the words (features) actually used
  • saving the table of actually used frequencies

version 0.4.7, 2012/11/25

  • new output/input extensions: optional custom list of files to be analyzed, saving distance table(s) to external files
  • support for the TXM Textometrie Project
  • color cluster analysis graphs (at last!)

version 0.4.6, 2012/09/09

  • code revised, cleaned, bugs fixed

version 0.4.5-4, 2012/09/03

  • added 2 new PCA visualization flavors

version 0.4.5-3, 2012/08/31

  • new GUI written

version 0.4.5-2, 2012/08/27

  • added functionality for normal sampling

version 0.4.5-1, 2012/08/22

  • support for Dutch added
  • {Mike Kestemont joins!}

version 0.4.5, 2012/07/07

  • option for choosing corpus files
  • code cleaned; bugs fixed

version 0.4.4, 2012/05/31

  • the core code rewritten
  • I/II set division abandoned
  • GUI remodeled
  • GUI tooltips added
  • different input formats supported (xml etc.)
  • config options loaded from external file
  • the code forked into (1) the Stylo script, supporting exploratory analyses (MDS, Consensus Trees, ...), and (2) the Classify script for machine-learning methods (Delta, SVM, NSC, Bayes)

version 0.4.3, 2012/04/28

  • feature selection (word and character n-grams)

version 0.4.2, 2012/02/10

  • three ways of splitting words in English
  • bugs fixed
  • GUI code rearranged and simplified

version 0.4.1, 2011/06/27

  • better output
  • better uploading of text files
  • new options for culling and ranking of candidates

version 0.4.0, 2011/06/20

  • the official world premiere, at DH2011 (Stanford, CA)

version 0.3.9b, 2011/06/01

  • the code simplified; minor cleaning

version 0.3.9, 2011/05/21

  • uploading wordlist from external source
  • thousands of improvements
  • the code simplified

version 0.3.8, 2010/11/01

  • skip top frequency words option added

version 0.3.7, 2010/11/01

  • better graphs
  • attempt at better graph layout

version 0.3.6, 2010/07/31

  • more graphic options
  • dozens of improvements

version 0.3.5, 2010/07/19

  • module for color graphs
  • module for PCA

version 0.3.4, 2010/07/12

  • module for uploading corpus files improved

version 0.3.3, 2010/06/03

  • the core code simplified and improved (faster!)

version 0.3.2, 2010/05/10

  • reordered GUI
  • minor cleaning

version 0.3.1, 2010/05/10

  • the z-scores module improved

version 0.3.0, 2009/12/26

  • better counter of "good guesses"
  • option for randomly generated samples
  • minor improvements

version 0.2.99, 2009/12/25

  • platform-independent output file saving

version 0.2.98, 2009/12/24

  • GUI thoroughly integrated with initial variables

version 0.2.10, 2009/11/28

  • corrected MFW display in graph
  • more analysis description in the output file

version 0.2.9, 2009/11/22

  • auto graphs for MDS and CA

version 0.2.8a, 2009/11/21

  • remodeled GUI

version 0.2.8, 2009/11/20

  • GUI: radiobuttons, checkbuttons

version 0.2.7, 2009/11/19

  • language-determined pronoun selection

version 0.2.6, 2009/11/18

  • dialog box (GUI)
  • {Jan Rybicki joins!}

version 0.2.5, 2009/11/16

  • module for different distance measures
  • thousands of improvements (I/O, interface, etc.)

version 0.2.2, 2009/10/25

  • numerous little improvements
  • deleting pronouns

version 0.2.1, 2009/08/23

  • module for culling
  • module for bootstrapping

version 0.2.0, 2009/08/23

  • module for uploading plain text files

version 0.1.9, 2009/08/01

  • innumerable improvements
  • the code simplified
  • {this version was completed on a train from Leipzig to Krakow (a looong trip...), after a very successful R course taught by Stefan Gries at ESU "C&T", Leipzig, Germany (26-31/08/2009)}

version 0.1.4, 2009/07/19

  • loop for different MFW settings

version 0.0.1, 2009/07/01

  • some bash and awk scripts translated into R