
Add Covering Edges #15

Open · mikulatomas wants to merge 15 commits into master
Conversation

@mikulatomas (Contributor)

First implementation of the CoveringEdges algorithm from Concept Data Analysis: Theory and Applications. I have created a separate PR for this before I try to build the lattice with it.

It accepts concept_list (which should be a ConceptList) and context, and it yields tuples ((extent, intent), (lower_extent, lower_intent)). There is a question whether a better API would yield Concept instead of the extent/intent tuple, but for consistency with the fcbo API this should be better and cleaner. That can be changed later during the lattice implementation.

There is also a dual version which yields upper neighbours; I can implement it as well if needed.

No unit tests yet; tested against the fcapsy implementation.
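
A minimal usage sketch of the interface described above; the import path, the example context, and the ConceptList construction are assumptions, not the final API:

from concepts import Context
from concepts.algorithms.covering_edges import covering_edges  # assumed module path

context = Context.fromstring('''
           |human|knight|king |mysterious|
King Arthur|  X  |  X   |  X  |          |
Sir Robin  |  X  |  X   |     |          |
holy grail |     |      |     |     X    |
''')

concept_list = ...  # placeholder: a ConceptList of (extent, intent) bitset pairs for `context`

# each yielded edge links a concept to one of its lower neighbours in the lattice
for (extent, intent), (lower_extent, lower_intent) in covering_edges(concept_list, context):
    print(extent.members(), '->', lower_extent.members())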

@codecov-commenter commented Apr 28, 2021

Codecov Report

Merging #15 (8907323) into master (b784eb5) will decrease coverage by 1.71%.
The diff coverage is 55.93%.


@@             Coverage Diff             @@
##            master      #15      +/-   ##
===========================================
- Coverage   100.00%   98.28%   -1.72%     
===========================================
  Files           23       24       +1     
  Lines         1461     1519      +58     
===========================================
+ Hits          1461     1493      +32     
- Misses           0       26      +26     
Impacted Files Coverage Δ
concepts/algorithms/covering_edges.py 54.05% <54.05%> (ø)
concepts/contexts.py 95.87% <57.14%> (-4.13%) ⬇️
concepts/algorithms/__init__.py 100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@mikulatomas (Contributor Author)

Added a doctest. Do we want to return Concept instead of a plain tuple? It would be cleaner; on the other hand, fcbo still returns plain tuples.

@xflr6 (Owner) left a comment

Thanks for the updates, some more comments below.

@xflr6 (Owner) commented May 18, 2021

Added a doctest. Do we want to return Concept instead of a plain tuple? It would be cleaner; on the other hand, fcbo still returns plain tuples.

Sorry for missing this. Concept is really a lattice member, so I think a more basic interface is better to separate concerns (in other words, we return the plans to build a lattice, not the structure itself).

P.S.: we can still have a wrapper layer for convenience

@mikulatomas (Contributor Author)

Added a doctest. Do we want to return Concept instead of a plain tuple? It would be cleaner; on the other hand, fcbo still returns plain tuples.

Sorry for missing this. Concept is really a lattice member, so I think a more basic interface is better to separate concerns (in other words, we return the plans to build a lattice, not the structure itself).

P.S.: we can still have a wrapper layer for convenience

Ah, I thought that Concept (from lattice_members.py) is the lattice member and Concept (as a NamedTuple) is for usage outside the lattice. But I am fine with using a plain tuple.

@xflr6 (Owner) commented May 18, 2021

Hehe, no, that's right of course; we might need to sort these names out (type annotations help, but maybe we should not rely on them too much).

@xflr6 (Owner) commented May 18, 2021

WDYT about returning quadruples of this kind (with bitsets for the first two)?

'lattice': [
((), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (1, 2, 3, 4, 5, 6), ()),
((0,), (0, 3, 5, 6, 9), (7, 8, 9), (0,)),
((1,), (0, 3, 5, 7, 8), (7, 10, 11), (0,)),
((2,), (1, 2, 5, 6, 9), (8, 12, 13), (0,)),
((3,), (1, 2, 5, 7, 8), (10, 12, 14), (0,)),
((4,), (1, 3, 4, 6, 9), (9, 13, 15), (0,)),
((5,), (1, 3, 4, 7, 8), (11, 14, 15), (0,)),
((0, 1), (0, 3, 5), (18, 19), (1, 2)),
((0, 2), (5, 6, 9), (16, 18), (1, 3)),
((0, 4), (3, 6, 9), (16, 19), (1, 5)),
((1, 3), (5, 7, 8), (17, 18), (2, 4)),
((1, 5), (3, 7, 8), (17, 19), (2, 6)),
((2, 3), (1, 2, 5), (18, 20), (3, 4)),
((2, 4), (1, 6, 9), (16, 20), (3, 5)),
((3, 5), (1, 7, 8), (17, 20), (4, 6)),
((4, 5), (1, 3, 4), (19, 20), (5, 6)),
((0, 2, 4), (6, 9), (21,), (8, 9, 13)),
((1, 3, 5), (7, 8), (21,), (10, 11, 14)),
((0, 1, 2, 3), (5,), (21,), (7, 8, 10, 12)),
((0, 1, 4, 5), (3,), (21,), (7, 9, 11, 15)),
((2, 3, 4, 5), (1,), (21,), (12, 13, 14, 15)),
((0, 1, 2, 3, 4, 5), (), (), (18, 19, 20, 16, 17)),
],


@xflr6 (Owner) commented May 18, 2021

Some notes:

import typing

import bitsets


class LatticeConceptInfo(typing.NamedTuple):
    extent: bitsets.bases.BitSet
    intent: bitsets.bases.BitSet
    lower_neighbors: typing.Tuple[int, ...]
    upper_neighbors: typing.Tuple[int, ...]

@mikulatomas (Contributor Author)

Some notes:

import typing

import bitsets


class LatticeConceptInfo(typing.NamedTuple):
    extent: bitsets.bases.BitSet
    intent: bitsets.bases.BitSet
    lower_neighbors: typing.Tuple[int, ...]
    upper_neighbors: typing.Tuple[int, ...]

Do you want to use this structure as the output of covering edges? My plan is to keep this function as close as possible to the pseudocode from the book and then use it for building the lattice. Where exactly, that is the question; I will first complete the function and then we can look at where the best place to use it will be.

My first idea was to have something like Lattice.fromconceptlist(list) where list would be a ConceptList, but since fcbo does not return a ConceptList, we need a better idea :) Maybe just plug fcbo + covering_edges into Lattice instead of the upper-neighbor algorithm, or keep them both and have some optional parameter.

@xflr6 (Owner) commented May 19, 2021

Do you want to use this structure as the output of covering edges?

Yep (just spelling out the earlier quadruple proposal).

My plan is to keep this function as close as possible to the pseudocode from the book and then use it for building the lattice.

Okay, I see now that this actually returns extents (instead of concept indexes): that is fine, but I think we still might avoid nesting in the result (and use a namedtuple).

Review thread on concepts/algorithms/covering_edges.py:

... ('ABD', '2')]
>>> concept_list = ConceptList.frompairs(
... map(lambda c: (context._Objects.frommembers(c[0]),
xflr6 (Owner):

use of private API

mikulatomas (Contributor Author):

Any tips how to avoid that? The problem is that this whole function is probably also for internal use only. Maybe the whole example can be defined another way; I originally wanted to avoid using fast_generate_from in the example, but what do you think? That would remove this line. The example could be context -> fast_generate_from -> ConceptList -> covering_edges.

xflr6 (Owner):

I am confused: Can't the function just use the output of a concept generator like fcbo as input?

context -> fast_generate_from -> ConceptList -> covering_edges.

See the other discussion on avoiding materialization. :)

Although, I might be missing something and need to read the paper first.

The problem is that this whole function is probably also for internal use only.

In case a test fiddles with internals or otherwise has a complex setup, state requirements, or assertions, it should probably be done in tests/. Doctests are better for simple (pure) functions, and if they are used as an example, they should only use the official API (in general, I think a good API is one that just flows naturally in the REPL). As this is a pure function, my hope is that we can have the doctest both guard the implementation and serve as a way to understand the API.

xflr6 (Owner):

Another option might be to create the input from scratch in the test (creating the two bitsets.bitsets).
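
A sketch of that from-scratch setup; the member tuples are arbitrary example data, and only bitsets.bitset, frommembers and members are the actual bitsets library API:

import bitsets

# build the two bitset classes directly, without a Context
Objects = bitsets.bitset('Objects', ('1', '2', '3'))
Properties = bitsets.bitset('Properties', ('a', 'b', 'c'))

extent = Objects.frommembers(['1', '2'])
intent = Properties.frommembers(['a'])

print(extent.members(), intent.members())  # ('1', '2') ('a',)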

@mikulatomas (Contributor Author) commented May 19, 2021

The reasons why context is needed are the prime operators and the bitset classes.

@mikulatomas (Contributor Author) commented May 19, 2021

Ah, right, that is a clever solution, thanks. I will add that and remove context.

PS: So do we want to force covering_edges to accept an iterator? Inside, it will be transformed into a dict anyway, but that would allow directly passing the fcbo output.

@mikulatomas (Contributor Author) commented May 20, 2021

I have checked the comments once again. So, do you agree that covering_edges would yield NamedTuples:

class ConceptMappingInfo(typing.NamedTuple):
    concept: concepts._common.Concept
    lower_neighbors: typing.Tuple[concepts._common.Concept, ...]

I am still confused about the purpose of concepts._common.Concept, but it would be useful here.

The other option is (hope I got the typing for the bitsets right):

class ConceptMappingInfo(typing.NamedTuple):
    extent: bitsets.bases.BitSet
    intent: bitsets.bases.BitSet
    lower_neighbors: typing.Tuple[typing.Tuple[bitsets.bases.BitSet, bitsets.bases.BitSet], ...]

As for the input to covering_edges, I would be for:

def covering_edges(extent_intent_pairs: typing.Iterator[typing.Tuple[bitsets.bases.BitSet,
                                                              bitsets.bases.BitSet]]):
    pass

In that case, fast_generate_from can be directly plugged in. Inside the function, your example (#15 (comment)) would be used. So covering_edges assumes that pairs of extent/intent will be plugged in, not concepts related to some lattice.
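
A rough sketch of the input handling discussed here; the signature follows the proposal above and the dict materialization mirrors "inside it will be transformed into a dict anyway" (none of this is the final implementation):

import typing

import bitsets


def covering_edges(concept_pairs: typing.Iterator[typing.Tuple[bitsets.bases.BitSet,
                                                                bitsets.bases.BitSet]]):
    """Yield covering edges from (extent, intent) pairs, e.g. the fast_generate_from output."""
    # materialize once so lower neighbours can be looked up by extent
    mapping = dict(concept_pairs)
    ...  # the CoveringEdges algorithm itself works on `mapping`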

mikulatomas (Contributor Author):

The idea mentioned in #15 (comment) is interesting, but I worry it would be better to build the full lattice from ConceptMappingInfo in a next step.

xflr6 (Owner):

Quick notes:

Still nesting (yield quadruples, see earlier discussion)?

class ConceptMappingInfo(typing.NamedTuple):
    concept: concepts._common.Concept

IIUC this differs from ``Yield mapping edge as ((extent, intent), (lower_extent, lower_intent))``; is it intended to include both the extent and the intent of neighbors even though one of them suffices as a reference?

lower_neighbors: typing.Tuple[concepts._common.Concept, ...]

Same here:

 lower_neighbors: typing.Tuple[typing.Tuple[bitsets.bases.BitSet, bitsets.bases.BitSet], ...]

nit: typing.Iterable (prefer generic types for arguments, e.g. Sequence, and concrete ones for returns, e.g. List; cf. Postel's law)

def covering_edges(extent_intent_pairs: typing.Iterator[typing.Tuple[bitsets.bases.BitSet,
                                                              bitsets.bases.BitSet]]):

We might need to do some more to express the requirement that the bitset pairs need to be ones that were created for prime in matrices.py.

mikulatomas (Contributor Author):

You're right that both extent and intent are not required for storing the relationship, so probably something like:

class ConceptMappingInfo(typing.NamedTuple):
    extent: bitsets.bases.BitSet
    intent: bitsets.bases.BitSet
    lower_neighbors_extents: typing.Tuple[bitsets.bases.BitSet, ...]

Agreed on typing.Iterable as input. Maybe we will get a better big picture when we try to build the whole lattice with this function? Then we can modify the API further and keep it public, or make it a fully internal function.

Do you agree that adding a lattice function into covering_edges.py in the same fashion as in lindig.py would be a good approach? That function would accept context as an argument, because internally it uses fast_generate_from to generate the concepts and covering_edges to build the mapping (that is the main difference from the one in lindig.py; the output will be the same). This lattice function should be easily pluggable into the existing data structure.

Then we can finish the whole code.
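
A sketch of the plan described above, a lattice helper in covering_edges.py that accepts a context and wires fast_generate_from into covering_edges (the names follow this PR; the return shape is an assumption):

def lattice(context):
    """Return the covering edges for all concepts of `context` (shape analogous to lindig.py)."""
    # covering_edges needs all concepts up front, so materialize the fcbo output once
    concept_pairs = list(fast_generate_from(context))   # fast_generate_from as used in this PR
    return covering_edges(concept_pairs, context)        # signature as in the current PR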

@xflr6 (Owner) commented May 19, 2021

My first idea was to have something like Lattice.fromconceptlist(list) where list would be a ConceptList, but since fcbo does not return a ConceptList, we need a better idea :)

Just in case: Let's make sure to avoid unneeded materialization into sequences when we can use iterables.

@mikulatomas (Contributor Author)

Okay, I see now that this actually returns extents (instead of concept indexes): that is fine, but I think we still might avoid nesting in the result (and use a namedtuple).

OK, agreed, I will propose some solution.

@mikulatomas (Contributor Author)

My first idea was to have something like Lattice.fromconceptlist(list) where list would be a ConceptList, but since fcbo does not return a ConceptList, we need a better idea :)

Just in case: Let's make sure to avoid unneeded materialization into sequences when we can use iterables.

Oh yeah, sorry, I forgot that there is no reason not to use yield in fast_generate_from. But since covering_edges requires a sequence of extent/intent pairs, we can assume that with fast_generate_from + covering_edges there will have to be some intermediate sequence to store the fcbo output. Then the generator from covering_edges should be enough to construct the lattice on the fly (since we will know all concepts in advance).

@mikulatomas (Contributor Author)

Did a first prototype of the lattice function which uses fcbo and covering edges. There is a Bob Ross benchmark; it looks promising (https://gist.github.com/mikulatomas/930d4e772a5548a02825a4b0eb6d71a7).

# MacBook Pro (13-inch, 2020, Four Thunderbolt 3 ports)
# 2 GHz Quad-Core Intel Core i5
# 16 GB 3733 MHz LPDDR4X

# <Context object mapping 403 objects to 67 properties [e5cfb4c5] at 0x10e2cf670>
# Covering edges: 1.3 seconds
# Lindig: 15.2 seconds

We can start thinking about the API: do we want to keep both Lindig and covering edges for building the lattice? If yes, we need some sort of switch or parameter. The current version of lattice with covering edges does not return concepts in short lexicographic order.

@mikulatomas (Contributor Author)

A few modifications; I will rebase into a smaller number of commits for better review.

@mikulatomas (Contributor Author)

New benchmark results after modifications:

Covering edges: 0.8 seconds
Lindig: 14.6 seconds

@mikulatomas (Contributor Author) commented Jul 12, 2021

It needs some tests; where is the best place to write them? If I understand correctly, you are not directly testing concepts.algorithms.lindig.lattice anywhere (that would be the best place to test concepts.algorithms.covering_edges.lattice).

EDIT: I will work on the parallel version meanwhile.

@xflr6 (Owner) commented Jul 12, 2021

Great news. :)

It needs some tests; where is the best place to write them?

Tests are mostly organized by filename; they can go here:

def test_lattice(lattice):

We can also divide into test_algorithms_lindig.py, ... (maybe that is nicer?).

FYI: I have migrated the tests to GitHub Actions: with 8e6d0bf, I think we should see test results here again.

not directly testing concepts.algorithms.lindig.lattice anywhere

We might want to change that (currently we assert on the lattice object, but that will be created by different code paths), so we probably want to re-use test items and expected results.

@mikulatomas (Contributor Author)

I have some time to write tests and modify the API. Where do you think is the best place to give the user the ability to change which algorithm will build the lattice? Since Context.lattice is a lazyproperty, there is no room for an argument with the algorithm name. Thanks!

@mikulatomas (Contributor Author)

One possibility is to modify the Context class:

def __init__(self,
             objects: typing.Iterable[str],
             properties: typing.Iterable[str],
             bools: typing.Iterable[typing.Tuple[bool, ...]],
             method: str = 'lindig',
             n_of_processes: int = 1) -> None:

But that does not seem that good to me, since the parallel version is supported only for covering_edges; also, since you are not forced to calculate the lattice during the lifetime of a Context object, it would be better to modify the Context.lattice property.

Is it possible to have Context.lattice() as a cached method instead of a property? It is definitely not that pretty, but I do not see any other way of implementing multiple lattice-building algorithms which have different APIs (for example, lindig needs infimum, covering edges accepts n_of_processes, etc.).

If you are fine with using strings as method names, we can do:

def lattice(self, method: str = 'lindig', n_of_processes: int = 1) -> 'lattices.Lattice':
    """The concept lattice of the formal context.

    Args:
        method: Name of the method used for building lattice.
        n_of_processes: Number of processes used during lattice building, supported only for covering edges.
    Returns:
         lattices.Lattice: Cached or new :class:`lattices.Lattice` instance.
    """
    pass

@xflr6 (Owner) commented Sep 18, 2021

Thanks.

Where do you think is the best place to give the user the ability to change which algorithm will build the lattice?
Since Context.lattice is a lazyproperty, there is no room for an argument with the algorithm name.

+1. My initial thought would be something along the following lines (using a class attribute for the default allows creating a trivial subclass to override it):

class Context:

    algorithm_for_lattice: str = 'lindig'

    def __init__(self, ..., *, algorithm_for_lattice: Optional[str] = None):
        ...
        if algorithm_for_lattice is not None:
            self.algorithm_for_lattice = algorithm_for_lattice

since you are not forced to calculate the lattice during the lifetime of a Context object

For that reason, I would ignore algorithm_for_lattice in __eq__. I would consider creating .lattice to be one of the key features of Context, so it seems fine to me in terms of practicality to keep that information on the class/instance, even though users might also use them only to compare or convert contexts without ever building the lattice. WDYT?

for example, lindig needs infimum

I think most users would go with the default of an empty extent as the infimum (exposing it as an optional argument allows building partial lattices). Can you maybe remind me whether/why it's not possible to have the same feature with the alternate covering edges implementation?

I tend to disprefer breaking backwards compatibility by changing .lattice into a method. However, we might end up with a slightly overloaded API where we might raise usage errors for combinations that are undefined (as I think you already have in the back of your mind).

Maybe another option would be to keep .lattice as is with defaults and provide another method (or multiple methods) to fill the .lattice property explicitly, providing finer control with all the arguments we would like to be able to specify (some naming ideas: build_lattice(), set_lattice(), make_lattice_lindig())?
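
One possible shape of the "keep .lattice as is, add an explicit builder" idea; build_lattice is just one of the naming ideas above and everything here is a sketch:

import typing


class Context:

    algorithm_for_lattice: str = 'lindig'

    @lazyproperty  # the cached-property helper already used by concepts
    def lattice(self) -> 'lattices.Lattice':
        # default path stays backwards compatible
        return self.build_lattice()

    def build_lattice(self, algorithm: typing.Optional[str] = None,
                      process_count: int = 1) -> 'lattices.Lattice':
        """Build (and cache) the lattice, overriding the default algorithm if given."""
        ...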

@mikulatomas (Contributor Author) commented Sep 20, 2021

I think most users would go with the default of an empty extent as the infimum (exposing it as an optional argument allows building partial lattices). Can you maybe remind me whether/why it's not possible to have the same feature with the alternate covering edges implementation?

I would say it should be possible in the future. If you pass to covering_edges all concepts higher than some concept with the given non-empty extent (infimum), you should be able to build a partial lattice. Since we are using fast_generate_from for concept generation (all of them), you will have to filter the concepts first and then pass them into covering_edges.

I will try to draft some API additions.
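
A sketch of the filtering described above (the function and variable names are assumptions): keep only the concepts whose extent contains the requested infimum extent before handing them to covering_edges:

def partial_edges(context, infimum_extent):
    """Yield covering edges only for concepts above the given (non-empty) infimum extent."""
    above = [(extent, intent) for extent, intent in fast_generate_from(context)
             # bitwise superset test: every object of the infimum is also in the extent
             if extent & infimum_extent == infimum_extent]
    return covering_edges(above, context)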

@mikulatomas (Contributor Author)

I have drafted some simple API additions in the last commit. All old unit tests passed, so the API should be compatible with the previous version. I have not written any new tests so far.

Not sure if raise NotImplementedError is in the right spot; I prefer it inside __init__ because you want to know that a given combination is not available ASAP (not later, at calculation time). Also, the two lists with available algorithms, _parallel_algorithms and _single_thread_algorithms, are maybe unnecessary.

Maybe algorithm_for_lattice and process_count can be @property-decorated, where we handle legal names and combinations? It can be useful to change the method on already created Context objects (I don't know what to do in case of an already calculated lattice).

What about classmethods like fromstring, what is the right approach? Shall we duplicate the parameters to allow loading a Context from a string/file and using a different algorithm than the default lindig in one step?

Docstrings and tests will be added once the API is approved.
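
A sketch of the @property validation mentioned above; the attribute names and the list of legal algorithms are assumptions:

class Context:

    _lattice_algorithms = ('lindig', 'fcbo')  # assumption: legal algorithm names

    @property
    def algorithm_for_lattice(self) -> str:
        return self._algorithm_for_lattice

    @algorithm_for_lattice.setter
    def algorithm_for_lattice(self, name: str) -> None:
        # reject unknown names as early as possible, before any lattice is built
        if name not in self._lattice_algorithms:
            raise ValueError(f'unknown lattice algorithm: {name!r}')
        self._algorithm_for_lattice = name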

@mikulatomas (Contributor Author)

I spent some time checking what would be the best way to extend the existing lattice tests.

I have tried using more lattice fixtures, for example:

@pytest.mark.parametrize('lattice', 
                        ['lattice', 'lattice_fcbo', 'lattice_fcbo_parallel'],
                        indirect=True)
def test_eq_concepts(lattice):
    other = concepts.Context(*lattice._context.definition()).lattice
    c = other[16]

    for attname in ('index', 'dindex'):
        i = getattr(c, attname)
        setattr(c, attname, -1)
        assert not other._eq(lattice)
        setattr(c, attname, i)

    for attname in ('atoms', 'properties', 'objects'):
        t = tuple(getattr(c, attname))
        if attname == 'objects':
            setattr(c, attname, ('spam',))
        else:
            setattr(c, attname, tuple(reversed(t)))
        assert not other._eq(lattice)
        setattr(c, attname, t)

But it looks like a larger refactor of the tests will be required. Maybe it's better to test the lattice builder at the level of the algorithm and not at the level of the lattice object? Currently it looks like you are testing that the lattice is properly built in test_lattice.py.
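
One possible fixture shape for the indirect parametrization above; here the parameters select the algorithm instead of naming separate fixtures (the algorithm_for_lattice attribute follows this PR, everything else is an assumption):

import pytest

import concepts

EXAMPLE = '''
           |human|knight|king |mysterious|
King Arthur|  X  |  X   |  X  |          |
Sir Robin  |  X  |  X   |     |          |
holy grail |     |      |     |     X    |
'''


@pytest.fixture
def lattice(request):
    algorithm = getattr(request, 'param', 'lindig')  # value from the indirect parametrization
    context = concepts.Context.fromstring(EXAMPLE)
    context.algorithm_for_lattice = algorithm
    return context.lattice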

@mikulatomas (Contributor Author)

Currently reconsidering this PR; I am not sure the performance/complexity ratio is worth it in the current state. I will definitely keep it open, but I will prioritise other PRs for now.
