
Releases: lightvector/KataGo

Various bugfixes for search and training

18 Feb 05:10

This release is not the latest release, see newer release v1.13.0!

New Neural Net Architecture Support (release series v1.12.x)

As with prior releases in the v1.12.x series, this release of KataGo supports a new neural net architecture! See the release notes for v1.12.0 for details. The new neural net, "b18c384nbt", is also attached to this release for convenience; for general analysis use it should be similar in quality to recent 60-block models, but run significantly faster due to being a smaller net. For other recent trained nets, download them from https://katagotraining.org/.

What's Changed in v1.12.4

This specific release, v1.12.4, addresses a variety of small bugs and behavioral oddities in KataGo. It should improve some rare issues for analysis and improve the quality of the training data:

  • Added a crude hack to mitigate an issue where, in positions with a large misevaluation by the raw net that search could normally fix, if the search happened to try the unlikely move of passing, the opponent passing in response could prevent or greatly delay the search from converging to the correct evaluation. Controlled by a new config parameter enablePassingHacks, which defaults to true for GTP and analysis and false elsewhere (see the sketch after this list).
  • Changed the search to better account, when determining game end within variations of the search itself, for the difference between computer-like rulesets that require capturing dead stones before the game ends and human-like rulesets that don't. This and the passing hack above are intended to address a rare behavioral oddity discovered in recent KataGo versions in the week or two prior to this release.
  • Fixed a bug where komi was accidentally inverted when generating training data from existing board positions in which White moved first rather than Black.
  • Fixed a bug where, when hiding the history in the input to the neural net, the historical ladder status of stones would not get hidden, leaking information about past history.
  • Fixed a bug in parsing the komi in certain rules strings (thanks @hzyhhzy).
  • Updated the genconfig command's produced configs to match the new formatting and inline documentation for GTP configs introduced in an earlier release.
  • Minor fixes and features for the tools for generating and handling hint positions for custom training.
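
As an illustration, here is a minimal sketch of setting the new parameter per-query through the analysis engine's overrideSettings mechanism. The query fields follow the documented analysis engine format, but treating enablePassingHacks as per-query overridable is an assumption here, and the model path, config path, and moves are placeholders.

    import json, subprocess

    # Minimal sketch, assuming a working analysis engine setup.
    # Model and config paths are placeholders.
    proc = subprocess.Popen(
        ["./katago", "analysis", "-model", "model.bin.gz", "-config", "analysis.cfg"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

    query = {
        "id": "q1",
        "moves": [["B", "Q16"], ["W", "D4"]],
        "rules": "tromp-taylor",
        "komi": 7.5,
        "boardXSize": 19,
        "boardYSize": 19,
        "analyzeTurns": [2],
        # Defaults to true for the analysis engine anyway; shown here only
        # to illustrate overriding it explicitly (assumption: overridable).
        "overrideSettings": {"enablePassingHacks": True},
    }
    proc.stdin.write(json.dumps(query) + "\n")
    proc.stdin.flush()
    print(proc.stdout.readline())  # one JSON response per analyzed turn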

OpenCL and TensorRT Bugfixes

22 Jan 14:26

This release is not the latest release, see newer release v1.12.4!

New Neural Net Architecture Support (release series v1.12.x)

As with prior releases in the v1.12.x series, this release of KataGo supports a new neural net architecture! See the release notes for v1.12.0 for details. The new neural net, "b18c384nbt", is also attached to this release for convenience; for general analysis use it should be similar in quality to recent 60-block models, but run significantly faster due to being a smaller net.

What's Changed in v1.12.3

This specific release, v1.12.3, fixes a few additional bugs in KataGo:

  • Fixes a performance regression for some GPUs on TensorRT that was introduced in v1.12.x (thanks @hyln9!) (#741)
  • Mitigates a long-standing performance bug on OpenCL: on GPUs that use dynamic boost or dynamic clock speeds, the GPU tuner would not get accurate timings due to the variable clock speed, most notably on a few users' machines causing the tuner to fail to select FP16 tensor cores even when the GPU supported them and they would perform much better. The fix is to give the GPU some additional computation during tuning so that it is less likely to reduce its clock speed (see the sketch after this list). Most users will not see an improvement, but a few may see a large one. (#743)
  • Fixes an issue where, depending on settings, KataGo in GTP or analysis might fail to treat two consecutive passes as ending the game within its search tree.
  • Fixes an issue in the PyTorch training code that prevented models from being easily trained on variable tensor sizes (i.e. max board sizes) in the data.
  • The contribute command on OpenCL will now also pretune for the new b18c384nbt architecture, the same way it pretunes for all other models.
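
The idea behind the tuner fix can be shown with a generic timing harness. This is a conceptual sketch in Python, not KataGo's actual OpenCL tuner code; the two callables are hypothetical stand-ins for submitting real GPU work.

    import time

    def time_candidate(run_candidate_once, keep_gpu_busy, reps=10):
        # keep_gpu_busy submits extra throwaway work so that a GPU with
        # dynamic boost clocks ramps up and stays at a high, stable clock
        # speed while we measure. Without this, early candidates may be
        # measured at a lower clock than later ones, making the timings
        # incomparable across candidates.
        keep_gpu_busy()
        start = time.perf_counter()
        for _ in range(reps):
            run_candidate_once()
        return (time.perf_counter() - start) / reps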

New Neural Net Architecture! (a few more followup bugfixes)

11 Jan 05:09

This release is not the latest release, see newer release v1.12.3 for further bugfixes!

This is a bugfix release following a release of KataGo that supports a new neural net architecture, v1.12.0!
If you want to know more about the improvements and/or other API changes, check the release notes there!

Users of the TensorRT version upgrading to this version of KataGo will also need to upgrade from TensorRT 8.2 to TensorRT 8.5.

If you're a new user, don't forget to check out this section for getting started and basic usage! If you don't know which version to choose (OpenCL, CUDA, TensorRT, Eigen, Eigen AVX2), read this: https://github.com/lightvector/KataGo#opencl-vs-cuda-vs-tensorrt-vs-eigen

Also, KataGo is continuing to improve at https://katagotraining.org/ and if you'd like to donate your spare GPU cycles and support it, it could use your help there!

Changes

In addition to the fix for TensorRT computing incorrect values in v1.12.1, this release:

  • Fixes some major issues where the OpenCL tuner (not just TensorRT) could sometimes select extremely poorly performing or even outright failing parameters.
  • Upgrades TensorRT from 8.2 to 8.5, substantially improves loading and timing-cache initialization times for multi-GPU machines, removes TensorRT's dependency on CUDNN, and supports newer GPUs. Thanks to @hyln9 for all of this work!
  • Adds support in config parsing for specifying file paths, passwords, or other strings containing hash signs or trailing spaces.
  • Adds some better internal tests and error checking for contributing data to the public run.

New Neural Net Architecture! (and bugfix for TensorRT)

08 Jan 16:31

Particularly for OpenCL users, see v1.12.2 for a newer release that fixes various performance bugs.

This is a quick followup bugfix for a release of KataGo that supports a new neural net architecture, v1.12.0!
If you're a new user, or want to know more about the improvements and/or other API changes, check the release notes there!

This release, v1.12.1, fixes a bug in the TensorRT backend. On the prior version v1.12.0, the TensorRT backend using the new net could compute incorrect evaluations and potentially play bad moves in some positions.

New Neural Net Architecture!

08 Jan 04:43

This release is not the latest release, see newer release v1.12.4!

For TensorRT users only: this release contains a bug where the new net may compute incorrect values. Additionally, the OpenCL version may tune poorly and experience some errors or performance issues. Upgrading to the newest release is recommended.

If you're a new user, don't forget to check out this section for getting started and basic usage! If you don't know which version to choose (OpenCL, CUDA, TensorRT, Eigen, Eigen AVX2), read this: https://github.com/lightvector/KataGo#opencl-vs-cuda-vs-tensorrt-vs-eigen

Also, KataGo is continuing to improve at https://katagotraining.org/ and if you'd like to donate your spare GPU cycles and support it, it could use your help there!

As before, attached here are "bs29" versions of KataGo. These are just for fun, and don't support distributed training but DO support board sizes up to 29x29. They may also be slower and will use much more memory, even when only playing on 19x19, so you should use them only when you really want to try large boards.

The Linux executables were compiled on an old Ubuntu 18.04 machine. As with older releases, they might not work on your system, and it may be more reliable to build KataGo from source yourself, which fortunately is usually not so hard on Linux (https://github.com/lightvector/KataGo/blob/master/Compiling.md).

New Neural Net Architecture

This version of KataGo adds support for a new and improved neural net architecture!

The new neural nets use a new nested residual bottleneck structure, along with other major improvements in training. They train faster than KataGo's old nets and learn more effectively. Attached to this release is a one-off net, b18c384nbt-uec.bin.gz, that was trained for a tournament in 2022. It should be of similar strength to the 60-block nets on http://katagotraining.org/, but on many machines will run much faster: on some machines between 40-block and 60-block speed, and on some even as fast as or faster than 40-block.
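
As a rough illustration of the nested-bottleneck idea, here is a simplified PyTorch sketch: an outer residual block whose inner trunk is itself a stack of smaller residual blocks operating at reduced channel width. This is not KataGo's exact block definition (see the v1.12.0 notes and the training code for the real one), and the channel counts below are only suggestive of the "c384" naming.

    from torch import nn

    class NestedBottleneckBlock(nn.Module):
        # Simplified sketch: residual blocks nested inside a residual block,
        # with the inner blocks operating at a reduced ("bottleneck") width.
        def __init__(self, channels=384, bottleneck=192, num_inner=2):
            super().__init__()
            self.reduce = nn.Conv2d(channels, bottleneck, 1, bias=False)
            self.inner = nn.ModuleList([
                nn.Sequential(
                    nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
                    nn.Conv2d(bottleneck, bottleneck, 3, padding=1, bias=False),
                    nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
                    nn.Conv2d(bottleneck, bottleneck, 3, padding=1, bias=False),
                ) for _ in range(num_inner)
            ])
            self.expand = nn.Conv2d(bottleneck, channels, 1, bias=False)

        def forward(self, x):
            y = self.reduce(x)
            for block in self.inner:
                y = y + block(y)        # inner residual connections
            return x + self.expand(y)   # outer residual connection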

The training code has been entirely rewritten to use PyTorch instead of TensorFlow. The training scripts and self-play scripts have been updated to account for the new implementation, but feel free to open an issue if something was overlooked.

Many thanks to "lionfenfen" and YiXiaoTian, and "Medwin", for contributing ideas, discussions, and testing for improving the training, and to "inbae" for the initial work and catalyst of the new PyTorch implementation. Many thanks also to those on the discord server who helped with testing.

Once enough contributors have switched to this release, the new architecture will also be integrated into KataGo's main public run, where hopefully it can drive future improvement. If you are a contributor to http://katagotraining.org/, please upgrade if you can. Thanks again to everyone!

Other Notable Changes

Analysis Engine (doc)

  • Added "terminate_all" command. #727
  • Analysis engine now echoes errors and warnings for bad queries to stderr by default, which can be optionally disabled. #728
  • A few additional values are now reported, including "weight", and two values, rawStWrError and rawStScoreError, that measure the neural net's estimation of its own uncertainty about a position (see the sketch after this list).
  • Fixed minor weighting oddity in calculation of pruned root values.
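
A hedged sketch of the new command and fields. The exact JSON shape of terminate_all is assumed here by analogy with the existing terminate action; check the analysis engine doc for the authoritative format.

    import json

    # Assumed shape, by analogy with the existing "terminate" action:
    terminate_all = {"id": "stop_everything", "action": "terminate_all"}
    print(json.dumps(terminate_all))

    # In responses, per-move info may now also include "weight", and the
    # new uncertainty values can be read out wherever they are reported:
    def self_uncertainty(info):
        return info.get("rawStWrError"), info.get("rawStScoreError")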

Self-play training and Contribution

  • Self-play data generation and contribution to the public run now calculate and record slightly less noisy values for auxiliary value training.
  • Slightly better automated error checking was added for contributing to the public run.
  • Added some parameters to support more flexible komi initialization in selfplay. These will likely also be used for KataGo's public run soon after enough people upgrade.
  • Fixed a bug in turn number computation for SGFs that specify board positions, when generating sgfposes/hintposes for self-play training.

Algorithm and performance improvements

  • Improved the LCB implementation to be smoother and to handle lower visit counts, giving a mild strength improvement at low visits.
  • Fixed possible bug with extreme komi inputs to the neural net.
  • Improved OpenCL performance tuning logic.

Other

  • Cleaned up and clarified gtp_example.cfg. #714
  • Fixed several more bugs in the handling of non-ASCII file paths.
  • Cleanup and much greater flexibility added to KataGo's config system and logging #630
  • Fixed a bug where KataGo would not handle some SGF placements correctly if those placements edited and replaced many different stones on the board in the middle of a game, in cases where all the edits together were legal but subsets of them might be illegal.

Graph Search and Other Improvements

20 Mar 20:42

If you're a new user, don't forget to check out this section for getting started and basic usage! If you don't know which version to choose (OpenCL, CUDA, TensorRT, Eigen, Eigen AVX2), read this: https://github.com/lightvector/KataGo#opencl-vs-cuda-vs-tensorrt-vs-eigen

Also, KataGo is continuing to improve at https://katagotraining.org/ and if you'd like to donate your spare GPU cycles and support it, it could use your help there!

As before, attached here are "bs29" versions of KataGo. These are just for fun, and don't support distributed training but DO support board sizes up to 29x29. They may also be slower and will use much more memory, even when only playing on 19x19, so you should use them only when you really want to try large boards.

Changes This Release

Search Improvements

Graph Search

KataGo has a new stronger MCTS implementation that operates on a graph rather than a tree! Different move sequences that lead to the same positions are recombined and searched only once instead of separately, with careful handling of ko and superko situations to ensure provable partial guarantees on the correctness of the recombining. The amount of improvement given by graph search appears to be highly variable, depending heavily on the particular hardware, number of threads, thinking time per move or number of playouts, and balance of CPU/GPU speed, ranging from no improvement at all on some configurations, to nearly 100 Elo on some configurations.
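
Conceptually, the search moves from a tree keyed by move sequence to a graph keyed by position, plus enough ko/superko context to keep the recombination sound. A toy Python sketch of the recombination idea (graph_hash is a hypothetical helper here, and all of the careful ko handling is glossed over):

    # Toy sketch: transpositions map to one shared node, searched once.
    node_table = {}  # graph_hash(position) -> shared search node

    def get_node(position):
        key = graph_hash(position)  # hypothetical: includes ko/superko context
        if key not in node_table:
            node_table[key] = {"visits": 0, "value_sum": 0.0, "children": {}}
        return node_table[key]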

Better Parameters

A few of KataGo's default search parameters have been slightly tuned and improved. Additionally, KataGo implements a new FPU (first-play-urgency) method that implements the heuristic "if existing moves have turned out worse than expected, prefer exploring entirely new moves more, and if existing moves have turned out better than expected, prefer verifying those moves more and exploring entirely new moves less". All of these changes might be worth somewhere from 15 to 40 Elo together. Thanks to sbbdms, fuhaoda, Wing, and many others for helping test these parameters.
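
A toy sketch of that FPU heuristic, using made-up constants rather than KataGo's actual formula:

    def fpu_value(parent_value, explored_value, base_reduction=0.2, scale=0.5):
        # Toy sketch, not KataGo's actual formula or constants.
        # surprise > 0: explored moves did better than the parent's estimate,
        # so use a larger reduction (verify known moves, explore new ones less).
        # surprise < 0: explored moves did worse, so use a smaller reduction
        # (entirely new moves look relatively more attractive).
        surprise = explored_value - parent_value
        return parent_value - (base_reduction + scale * surprise)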

Interface and User Improvements

  • The katago contribute command for contributing to the public distributed run at https://katagotraining.org/ now supports pausing and resuming!

    • While KataGo is running, if you type pause and hit enter, KataGo will stop its CPU and GPU usage (this may take a few seconds to a minute), but will continue to remember its current state (note: KataGo will keep using RAM and GPU memory to do so; pausing only halts computation).
    • Use resume to resume computation.
    • You can also give the commands quit and forcequit, which correspond to pressing Ctrl-C once or twice: KataGo will exit after finishing its current contribution games, or exit as soon as possible, discarding unfinished games.
  • On Windows, KataGo should now handle non-ASCII file paths and directories. Hopefully for real this time.

Changes for Developers

  • As a result of graph search, the pvVisits array indicating the number of visits at each point of a PV may no longer be monotone, since a position with few visits may transpose to a position with many visits. A new pvEdgeVisits output is now available that distinguishes between the count of visits that reach a given position and the count of visits that make a particular move (since a given position may now be reached by more than one move along the graph). See the illustration after this list.
  • The kata-analyze command in GTP now can report the predicted ownership for each individual move (see movesOwnership in the documentation).
  • There is a new GTP extension kata-benchmark NVISITS which will run a simple benchmark from within GTP.
  • Fixed a bug in KataGo book hashing that might theoretically cause incorrect book transpositions, and greatly reduced the disk space requirements for KataGo's book file. Both the bugfix and the reduction apply only to new books generated with the newest version.
  • Added checkbook command to test the integrity of a book file.
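
To illustrate the pvVisits/pvEdgeVisits distinction, here is a made-up fragment of a move's reported info, where the third position in the PV is a transposition target also reached by many other move sequences:

    # All numbers are invented for illustration.
    move_info = {
        "pv":           ["D4", "Q16", "Q4"],
        "pvEdgeVisits": [500, 300, 200],  # visits that played this move here
        "pvVisits":     [500, 300, 950],  # visits that reached this position
    }                                     # not monotone under graph search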

Other Improvements and Bugfixes

  • Added a link to a simple/elegant KataGo-based GUI, Ogatak.
  • Added option to contribute to perform rating games only, and documented a few more minor options.
  • GTP printsgf command now records into the SGF an intelligent final score rather than a naive all-stones-are-alive score.
  • Fixed bug where the avoid moves option in GTP and the analysis engine might include avoided moves anyway if the search was performed for the player that would not normally move next.
  • Fixed bug in the computation that considered whether to suppress a pass in some cases for rules compatibility.
  • Significantly optimized ownership calculation speed for long/deep searches.
  • Selfplay now records a few additional stats about policy and search surprise and entropy into the .npz files.
  • Python neural net model training now tracks export cycle across restarts.
  • Major internal refactors and cleanups of KataGo's search code.
  • Various other documentation improvements

TensorRT Backend, Many Minor Improvements

24 Oct 19:55

If you're a new user, don't forget to check out this section for getting started and basic usage! If you don't know which version to choose (OpenCL, CUDA, TensorRT, Eigen, Eigen AVX2), read this: https://github.com/lightvector/KataGo#opencl-vs-cuda-vs-tensorrt-vs-eigen

Also, KataGo is continuing to improve at https://katagotraining.org/ and if you'd like to donate your spare GPU cycles and support it, it could use your help there!

As before, attached here are "bs29" versions of KataGo. These are just for fun, and don't support distributed training but DO support board sizes up to 29x29. They may also be slower and will use much more memory, even when only playing on 19x19, so you should use them only when you really want to try large boards.

New TensorRT Backend

There is a new TensorRT backend ("trt8.2") in this release, thanks to some excellent work by hyln9! On strong NVIDIA GPUs, this backend can often be 1.5x the speed of any other backend. It is NOT universally faster, however; sometimes the CUDA backend can still be faster than the TensorRT backend. The two backends may also prefer different numbers of threads - try running the benchmark to see. TensorRT also tends to take noticeably longer to start up.

Using TensorRT requires an NVIDIA GPU and CUDA 11.1+ and CUDNN 8.2+ and TensorRT 8.2 (precompiled executables in this release use CUDA 11.1 for linux and CUDA 11.2 for Windows) which you can download and install manually from NVIDIA: https://developer.nvidia.com/tensorrt, https://developer.nvidia.com/cuda-toolkit, https://developer.nvidia.com/cudnn.

If you want an easier out-of-the-box setup and/or are using other GPUs, then OpenCL is still recommended as the easiest to get working.

Minor Features and Improvements

  • KataGo antimirror logic for GTP is slightly improved.
  • Analysis engine and kata-analyze now support reporting the standard deviation of ownership across search ("ownershipStdev")
  • Added minor options for random high-temperature policy initialization to katago match command.
  • Very slight cross-backend performance improvement - most configurations by default will now avoid the multi-board-size GPU masking code if only one board size is used. (The analysis engine is the one major exception; you must specify requireMaxBoardSize, maxBoardXSizeForNNBuffer, and maxBoardYSizeForNNBuffer in the config and then must not query for other board sizes.)
  • Added the code used to generate the books at https://katagobooks.org/, runnable by ./katago genbook with example config at https://github.com/lightvector/KataGo/blob/master/cpp/configs/book/genbook7jp.cfg. You can generate your own books if you like, although be prepared to dive into the source code if you want to know exactly what particular parameters do.

Bugfixes

  • KataGo should now (hopefully) handle non-ASCII file paths on Windows.
  • GTP/Analysis "avoid" option now correctly applies when there is only 1-playout and moves are based on raw policy.
  • GTP/Analysis "avoid" option now correctly interacts with root symmetry pruning.
  • Fixed various bugs with the GTP command loadsgf.
  • Fixed minor issue reporting analysis values for terminal positions.
  • Fixed issue where during multithreading analysis would report zero-visit moves with weird stats.
  • Fixed a minor possible race if multiple KataGo distributed training contribute commands are started at once on the same machine.
  • More reliably tolerate and retry corrupted downloads in the contribute command for online distributed training.
  • Benchmark now respects defaultBoardSize in config.
  • Fixed issue in cmake build setup with mingw in Windows.
  • Fixed issue with swa_model namespace when loading a preexisting model for train.py for model training.

Analysis engine bugfixes

30 Jun 03:26

If you're a new user, don't forget to check out this section for getting started and basic usage! If you don't know which version to choose (OpenCL, CUDA, Eigen, Eigen AVX2), read this: https://github.com/lightvector/KataGo#opencl-vs-cuda-vs-eigen

This is a quick bugfix release. See the notes for the previous release for info about major recent improvements, including significant strength improvements, new features you can specify to configure KataGo's behavior and make it play a wider variety of moves, and various performance enhancements.

Also, KataGo is continuing to improve at https://katagotraining.org/ and if you'd like to donate your spare GPU cycles and support it, it could use your help there!

As before, attached here are "bs29" versions of KataGo. These are just for fun, and don't support distributed training but DO support board sizes up to 29x29. They may also be slower and will use much more memory, even when only playing on 19x19, so you should use them only when you really want to try large boards.

Changes In This Release

  • Fixed bug where analysis engine would crash if a query was terminated before the query began analyzing.
  • Analysis engine will now output ownership to an accuracy of 10^-6 and all other values to an accuracy of 7-8 decimal places past the most significant digit. Hopefully this is more than enough precision for all practical purposes, while noticeably reducing the response message size.
  • Parameter overrides in the analysis engine for entirely unknown parameters will now warn and still perform the query, ignoring that parameter, instead of producing an error.
  • Got rid of a harmless race in the contribute command for KataGo distributed training that could produce slightly more confusing output or error messages.

Better Search, Threads, Analysis Improvements, and More

28 Jun 04:04

If you're a new user, don't forget to check out this section for getting started and basic usage!

KataGo is continuing to improve at https://katagotraining.org/ and if you'd like to donate your spare GPU cycles and support it, it could use your help there!

If you don't know which version to choose (OpenCL, CUDA, Eigen, Eigen AVX2), read this: https://github.com/lightvector/KataGo#opencl-vs-cuda-vs-eigen

Also attached here are "bs29" versions of KataGo. These are just for fun, and don't support distributed training but DO support board sizes up to 29x29. KataGo's neural nets will probably still be very strong on large boards, but as usual they are not trained for these sizes, so no complete guarantees. The "bs29" versions may also be slower and will use much more memory, even when only playing on 19x19, so you should use them only when you really want to try large boards.

Major Changes and Improvements and New Features

  • Major improvements in KataGo's search algorithm. KataGo might be somewhere around 75 Elo stronger than v1.8.2 with the same latest neural nets. Half of this improvement might also be applicable to older/smaller nets, although not tested. Thanks to @sbbdms and @fuhaoda for extensive help testing these parameters.

  • Major improvements in multithreaded search performance on stronger GPUs, or with multiple GPUs, by implementing close-to-lockless MCTS search. (merging the long-open "highthreads" code branch). In the extreme case, performance on multiple A100s might be more than doubled.

  • New option avoidRepeatedPatternUtility to make KataGo prefer to avoid playing a joseki that it has already played in the same game in a different corner. See config for more details.

  • New set of options where you can specify your own SGF files to encourage KataGo to avoid (or play) various lines, or to avoid repeating itself and play a greater variety of moves across a series of games. See config for details.

  • It's surprisingly hard to find a nice tool for summarizing win/loss results from SGFs and computing Elos given a set of test matches between different bots/configs/players. KataGo now has a small python3 library and script that does this; run it like python summarize_sgfs.py /path/to/directory/of/sgfs, or with --help for more info.

  • KataGo now leverages symmetry for searching the first few moves of the game if the position is symmetric, and will open as black in the upper right corner by default. Thanks to @fuhaoda for helping implement this.

  • KataGo will now search a much wider variety of moves during analysis by default. (analysisWideRootNoise).

  • For OpenCL users: somewhat improved the reliability of the OpenCL tuning to find good configurations and not pick bad ones as often, on some GPUs. KataGo v1.9.0 by default will continue to use the same tuning as from earlier versions, but if you want to re-run the tuner on v1.9.0, at any time you can run or rerun it like ./katago.exe tuner -model path/to/the/neuralnetfile.bin.gz -config path/to/your_gtp_config.cfg. Thanks to @EZonGH for reporting and testing.

Dev-facing Changes (Analysis Engine)

  • Added new clear_cache command to analysis engine.

  • Analysis engine now also reports the current player to move in the root info.

  • Analysis engine and GTP kata-analyze are now updated to report isSymmetryOf when a move's stats are symmetric copies of another move's stats, for KataGo's new symmetry handling when rootSymmetryPruning is enabled. Basically, KataGo searches only one of each set of symmetrically equivalent moves, and the stats for the others are set to be copies of the original (with appropriately rotated PVs).

Bugfixes and Optimizations

  • Mostly mitigated the problem where, if there are too many search threads, some of those threads are forced to search poor moves (since all the good moves are already taken by other threads), whose poor values then bias the MCTS average values enough to cause KataGo to miss tactics or play very weird moves.

  • Reduced the lag between moves when using large numbers of playouts (hundreds of thousands or millions) by deallocating the search tree, or processing it for tree reuse, on multiple threads.

  • Fixed several rare memory access and threading bugs, including one that could rarely cause KataGo to crash outright.

  • Improved the quality of rootInfo for analysis engine (thanks to @sanderland)

  • Some small internal performance optimizations in the board code (thanks to @fuhaoda).

  • Fixed some warnings and CMake issues for compiling with clang (thanks to @TFiFiE).

Distributed Training - Selfplay Diversity Fixes

19 Apr 03:52

If you're a new user, don't forget to check out this section for getting started and basic usage!

KataGo is continuing to improve at https://katagotraining.org/ and if you'd like to donate your spare GPU cycles and support it, it could use your help there!

If you don't know which version to choose (OpenCL, CUDA, Eigen, Eigen AVX2), read this: https://github.com/lightvector/KataGo#opencl-vs-cuda-vs-eigen

What's New This Version

If you are a user who helps with distributed training on https://katagotraining.org/, it would be great if you could update to this version, which is also the new tip of the stable branch, as soon as it is convenient! And let me know in the issues if there are any new problems you encounter with it. Thanks!

This is a minor release mainly of interest to contributors to KataGo's distributed run, or to users who run KataGo self-play training on their own GPUs. This release doesn't make any changes to the KataGo engine itself, but it does fix an issue that was believed to be limiting the diversity of KataGo's self-play games. Switching to this version should, over the long term of training, improve KataGo's learning, particularly on small boards, and enable a few further parameter changes in the future, once most people have upgraded, which should further improve opening diversity.

And, hopefully coming not too long after this, will be a release with some strength and performance improvements for general users. :)

Changes

  • Separately sample komi for initializing the board versus actually playing the game
  • Rework and refactor komi initialization to make komi randomization more consistent; it now consistently applies to some cases missed before (e.g. cleanup training)
  • Polish up and improve many aspects of the logic for a few commands of interest to devs who run specialized training, such as dataminesgfs.