-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Covering Edges #15
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #15 +/- ##
===========================================
- Coverage 100.00% 98.28% -1.72%
===========================================
Files 23 24 +1
Lines 1461 1519 +58
===========================================
+ Hits 1461 1493 +32
- Misses 0 26 +26
Continue to review full report at Codecov.
|
089dc43
to
e4bbc29
Compare
6a765a9
to
53b8c83
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates, some more comments below.
Sorry for missing this. P.S.: we can still have a wrapper layer for convenience |
Ah I though that Concept (from |
Hehe, no that's right of course, we might need to sort these names out (although type annotations help, but maybe we should not rely on that too much). |
WDYT about returning quadruples of this kind (with bitsets for the first two)? concepts/tests/test_serialization.py Lines 25 to 48 in 015fb73
|
Some notes: import typing
import bitsets
class LatticeConceptInfo(typing.NamedTuple):
extent: bitsets.bases.BitSet
intent: bitsets.bases.BitSet
lower_neighbors: typing.Tuple[int, ...]
upper_neighbors: typing.Tuple[int, ...] |
You wanna use this structure as output from the covering edges? My plan is to keep this function as close as possible to the pseudocode from the book and then use this for building lattice. Where, that is the question, I will first complete the function and than can look where will be the best place to use it. My first idea was to have something like |
53b8c83
to
bc6b126
Compare
Yep (just spelling out the earlier quadruple proposal).
Okay, I see now that this actually returns extents (instead of concept indexes): that is fine, but I think we still might avoid nesting in the result (and use a namedtuple). |
... ('ABD', '2')] | ||
|
||
>>> concept_list = ConceptList.frompairs( | ||
... map(lambda c: (context._Objects.frommembers(c[0]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use of private API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any tips how to avoid that? Problem is that this whole function is probably also for internal use only. Maybe the whole example can be defined other way, originally wanted to avoid using fast_generate_from
in the example, but what do you think? That would remove this line. Example could be context -> fast_generate_from
-> ConceptList
-> covering_edges
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused: Can't the function just use the output of a concept generator like fcbo
as input?
context -> fast_generate_from -> ConceptList -> covering_edges.
See the other discussion on avoiding materialization. :)
Although, I might be missing something and need to read the paper first.
Problem is that this whole function is probably also for internal use only.
In case a test fiddles with internals or otherwise has a complex setup, state requirement, or assertions, it should probably be done in tests/
. Doctests are better for simple (pure) functions and if they are used as an example, they should only use official API (in general, I think a good API is one that just flows naturally in the repl). As this is a pure function, my hope is we can have the doctest both guard the implementation and serve to understand the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option might be to create the input from scratch in the test (creating the two bitsets.bitset
s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reasons why context is needed are the prime operators and bitset classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, that is a clever solution, thanks. I will add that and remove context.
PS: So do we want to force covering_edges
to accept an iterator? Inside it will be transformed into dict anyway, but that would allow directly passing fcbo output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have checked the comments once again. So, do you agree, that covering_edges
would yield NamedTuple
's:
class ConceptMappingInfo(typing.NamedTuple):
concept: concepts._common.Concept
lower_neighbors: typing.Tuple[concepts._common.Concept, ...]
I am still confused what is the purpose of concepts._common.Concept
but it would be useful here.
Other option is (hope I got typing for bitset right):
class ConceptMappingInfo(typing.NamedTuple):
extent: bitsets.bases.BitSet
intent: bitsets.bases.BitSet
lower_neighbors: typing.Tuple[typing.Tuple[bitsets.bases.BitSet, bitsets.bases.BitSet], ...]
As for input for covering_edges
, I would be for:
def covering_edges(extent_intent_pairs: typing.Iterator[typing.Tuple[bitsets.bases.BitSet,
bitsets.bases.BitSet]]):
pass
In that case, fast_generate_from
can be directly plugged in. Inside function, your example (#15 (comment)) would be used. So covering_edges
assumes, that pairs of extent/intent will be plugged in, not concepts related to some lattice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea mentioned in #15 (comment) is interesting, but I am worried it would be better to build full lattice from ConceptMappingInfo
in next step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick notes:
Still nesting (yield quadruples, see earlier discussion)?
class ConceptMappingInfo(typing.NamedTuple):
concept: concepts._common.Concept
IIUC this differs from Yield mapping edge as ``((extent, intent), (lower_extent, lower_intent))
, is it intended to include both the extent and the intent of neighbors even though one of both suffices for a reference?
lower_neighbors: typing.Tuple[concepts._common.Concept, ...]
Same here:
lower_neighbors: typing.Tuple[typing.Tuple[bitsets.bases.BitSet, bitsets.bases.BitSet], ...]
nit: typing.Iterable
(prefer generic types for arguments e.g. Sequence
and concrete ones for returns, e.g. List
, cf. Postel's law)
def covering_edges(extent_intent_pairs: typing.Iterator[typing.Tuple[bitsets.bases.BitSet,
bitsets.bases.BitSet]]):
We might need to do some more to express the requirement that the bitset pairs need to be ones that where for prime in matrices.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You right, that both extent/intent are not required for storing the relationship, so probably something like:
class ConceptMappingInfo(typing.NamedTuple):
extent: bitsets.bases.BitSet
intent: bitsets.bases.BitSet
lower_neighbors_extents: typing.Tuple[bitsets.bases.BitSet, ...]
Agree with typing.Iterable
as input. Maybe we will get better big picture when we will try to build whole lattice with this function? Then we can modify the API more and keep it public or make it fully internal only function.
Do you agree that adding lattice
function into covering_edges.py
in the same fashion as in lindig.py
would be good approach? That function would accept context
as argument, because internally uses fast_generate_from
to generate concepts and covering_edges
to build mapping (that is main difference to the one in the lindig.py
output will be same). This lattice
function should be easily pluggable into existing data structure.
Then we can finish the whole code.
Just in case: Let's make sure to avoid unneeded materialization into sequences when we can use iterables. |
Ok, agree, I will propose some solution. |
Oh yeah, sorry, forget that there is no reason not to use yield in |
bc6b126
to
7c42833
Compare
Did some first prototype of the lattice function which use fcbo and covering edges. There is bob ross benchmark, looks promising (https://gist.github.com/mikulatomas/930d4e772a5548a02825a4b0eb6d71a7).
We can start thinking about the API, do we want to keep both Lindig and Covering edge for building lattice? If yes, we need some sort of switch or parameter. Current version of lattice with covering edges does not return concepts in short lexicographic order. |
A few modifications, i will rebase into smaller number of commits for better review. |
b4a4b2b
to
4188f17
Compare
New benchmark results after modifications:
|
It needs some tests, where is the best place to write them? If i understand correctly you are not directly testing EDIT: I will work on the parallel version meanwhile. |
Great news. :)
Tests are mostly organized by filename, can go here: concepts/tests/test_algorithms.py Line 36 in 24e6541
We can also divide into FYI: I have migrated the tests to GitHub actions: with 8e6d0bf, I think we should see test results here again.
We might want to change that (currently assert on the |
Have some time to write tests and modify API. Where do you think is the best place to give user the ability to change which algorithm will build the lattice? Since |
9ac78aa
to
7209db3
Compare
One possibility is to modify def __init__(self,
objects: typing.Iterable[str],
properties: typing.Iterable[str],
bools: typing.Iterable[typing.Tuple[bool, ...]],
method: str = 'lindig',
n_of_processes: int = 1) -> None: But it seems not that good for me, since parallel version is supported only for covering_edges and also, since you are not forced to calculate lattice during lifetime of Is it possible to have If you are fine with using strings as method names we can do: def lattice(self, method: str = 'lindig', n_of_processes: int = 1) -> 'lattices.Lattice':
"""The concept lattice of the formal context.
Args:
method: Name of the method used for building lattice.
n_of_processes: Number of processes used during lattice building, supported only for covering edges.
Returns:
lattices.Lattice: Cached or new :class:`lattices.Lattice` instance.
"""
pass |
Thanks.
+1. My initial thought would be something along the following lines (using a class attribute for the default allows to create a trivial subclass to override it): class Context:
algorithm_for_lattice: str = 'lindig'
def __init__(self, ..., *, algorithm_for_lattice: Optional[str] = None):
...
if algorithm_for_lattice is not None:
self.algorithm_for_lattice = algorithm_for_lattice
For that reason, I would ignore
I think most users would go with the default of an empty extent as the infimum (exposing it as an optional argument allows to build partial lattices). Can you maybe remind me whether/why it's not possible to have the same feature with the alternate covering edges implementation? I tend to disprefer breaking backwards compatibility by changing Maybe another option would be to keep |
I would say, it should be possible in the future. If you pass all concepts higher than some concepts with given non-empty extent (infimum) to I will try to draft some API additions. |
I have drafted some simple API addition in last commit. All old unittests passed, so API should be compatible with previous version. I have not written any new tests so far. Not sure if Maybe the What about classmethods like Docstrings and tests will be added when API will be approved. |
I spent some time by checking what will be the best way to extend existing lattice tests. I have tried more lattice fixtures for example: @pytest.mark.parametrize('lattice',
['lattice', 'lattice_fcbo', 'lattice_fcbo_parallel'],
indirect=True)
def test_eq_concepts(lattice):
other = concepts.Context(*lattice._context.definition()).lattice
c = other[16]
for attname in ('index', 'dindex'):
i = getattr(c, attname)
setattr(c, attname, -1)
assert not other._eq(lattice)
setattr(c, attname, i)
for attname in ('atoms', 'properties', 'objects'):
t = tuple(getattr(c, attname))
if attname == 'objects':
setattr(c, attname, ('spam',))
else:
setattr(c, attname, tuple(reversed(t)))
assert not other._eq(lattice)
setattr(c, attname, t) But it looks that larger refactor of tests will be required. Maybe it's better to test lattice builder on level of the algorithm and not on the level of lattice object? Currently looks like you testing that lattice is properly built in |
Currently reconsidering this PR, not sure if the performance/complexity ratio is worth it in current state, I will definitely keep it open but I will currently prioritise other PRs. |
First implementation of CoveringEdges alg from Concept data analysis: Theory and applications. I have created separated PR for this, before I will try build lattice with it.
It accepts
concept_list
which should be ConceptList andcontext
it yields tuples((extent, intent), (lower_extent, lower_intent))
. There is a question if better API would be yieldingConcept
instead of extent, intent tuple. But with respect to fcbo API, this should be better and cleaner. That can be changed later during lattice implementation.There is also dual version which yields upper neighbours, I can implement it also if needed.
No unit-test yet, tested against
fcapsy
implementation.