Add Covering Edges #15

Open · wants to merge 15 commits into base: master
concepts/algorithms/__init__.py (3 changes: 2 additions & 1 deletion)

@@ -5,10 +5,11 @@
 from .common import iterunion
 from .fcbo import fast_generate_from, fcbo_dual
 from .lindig import lattice, neighbors
+from .covering_edges import lattice_fcbo
 
 __all__ = ['iterunion',
            'fast_generate_from', 'fcbo_dual',
-           'lattice', 'neighbors'
+           'lattice', 'neighbors', 'lattice_fcbo',
            'iterconcepts', 'get_concepts']
concepts/algorithms/covering_edges.py (new file, 122 additions)

@@ -0,0 +1,122 @@
"""Covering Edges

cf. Carpineto, Claudio, and Giovanni Romano.
Concept data analysis: Theory and applications.
John Wiley & Sons, 2004.
"""

import multiprocessing
import itertools
import collections

from .fcbo import fast_generate_from


def covering_edges(concept_list, context, concept_index=None):
    """Yield mapping edges as ``((extent, intent), (lower_extent, lower_intent))``
    pairs (a concept and its lower neighbor) from ``context`` and ``concept_list``.

    Example:
        >>> from concepts import make_context, ConceptList
        >>> from concepts._common import Concept

        >>> context = make_context('''
        ...  |0|1|2|3|4|5|
        ... A|X|X|X| | | |
        ... B|X| |X|X|X|X|
        ... C|X|X| | |X| |
        ... D| |X|X| | | |''')

        >>> concepts = [('ABCD', ''),
        ...             ('ABC', '0'),
        ...             ('AC', '01'),
        ...             ('A', '012'),
        ...             ('', '012345'),
        ...             ('C', '014'),
        ...             ('AB', '02'),
        ...             ('B', '02345'),
        ...             ('BC', '04'),
        ...             ('ACD', '1'),
        ...             ('AD', '12'),
        ...             ('ABD', '2')]

        >>> concept_list = ConceptList.frompairs(
        ...     map(lambda c: (context._Objects.frommembers(c[0]),
    [Inline review thread on ``context._Objects.frommembers``:]

    Owner:
        Use of private API.

    Contributor Author (mikulatomas):
        Any tips how to avoid that? The problem is that this whole function
        is probably also for internal use only. Maybe the whole example can
        be defined another way; I originally wanted to avoid using
        fast_generate_from in the example, but what do you think? That would
        remove this line. The example could be
        context -> fast_generate_from -> ConceptList -> covering_edges.

    Owner:
        I am confused: can't the function just use the output of a concept
        generator like fcbo as input?

            context -> fast_generate_from -> ConceptList -> covering_edges

        See the other discussion on avoiding materialization. :) Although I
        might be missing something and need to read the paper first.

        > Problem is that this whole function is probably also for internal
        > use only.

        In case a test fiddles with internals or otherwise has a complex
        setup, state requirement, or assertions, it should probably be done
        in tests/. Doctests are better for simple (pure) functions, and if
        they are used as an example, they should only use the official API
        (in general, I think a good API is one that just flows naturally in
        the REPL). As this is a pure function, my hope is we can have the
        doctest both guard the implementation and serve to understand the
        API.

    Owner:
        Another option might be to create the input from scratch in the test
        (creating the two bitsets.bitsets).

    Contributor Author (mikulatomas, May 19, 2021):
        The reasons why context is needed are the prime operators and the
        bitset classes.

    Contributor Author (mikulatomas, May 19, 2021):
        Ah, right, that is a clever solution, thanks. I will add that and
        remove context.

        PS: So do we want to force covering_edges to accept an iterator?
        Inside, it will be transformed into a dict anyway, but that would
        allow directly passing fcbo output.

    Contributor Author (mikulatomas, May 20, 2021):
        I have checked the comments once again. So, do you agree that
        covering_edges would yield NamedTuples:

            class ConceptMappingInfo(typing.NamedTuple):
                concept: concepts._common.Concept
                lower_neighbors: typing.Tuple[concepts._common.Concept, ...]

        I am still confused about the purpose of concepts._common.Concept,
        but it would be useful here. The other option is (hope I got the
        typing for bitsets right):

            class ConceptMappingInfo(typing.NamedTuple):
                extent: bitsets.bases.BitSet
                intent: bitsets.bases.BitSet
                lower_neighbors: typing.Tuple[typing.Tuple[bitsets.bases.BitSet,
                                                           bitsets.bases.BitSet], ...]

        As input for covering_edges, I would be for:

            def covering_edges(extent_intent_pairs: typing.Iterator[typing.Tuple[bitsets.bases.BitSet,
                                                                                 bitsets.bases.BitSet]]):
                pass

        In that case, fast_generate_from can be directly plugged in. Inside
        the function, your example (#15 (comment)) would be used. So
        covering_edges assumes that pairs of extent/intent will be plugged
        in, not concepts related to some lattice.

    Contributor Author:
        The idea mentioned in #15 (comment) is interesting, but I am worried
        it would be better to build the full lattice from ConceptMappingInfo
        in a next step.

    Owner:
        Quick notes:

        Still nesting (yield quadruples, see earlier discussion)?

            class ConceptMappingInfo(typing.NamedTuple):
                concept: concepts._common.Concept

        IIUC this differs from ``Yield mapping edge as ((extent, intent),
        (lower_extent, lower_intent))``; is it intended to include both the
        extent and the intent of neighbors even though one of both suffices
        for a reference?

            lower_neighbors: typing.Tuple[concepts._common.Concept, ...]

        Same here:

            lower_neighbors: typing.Tuple[typing.Tuple[bitsets.bases.BitSet,
                                                       bitsets.bases.BitSet], ...]

        nit: typing.Iterable (prefer generic types for arguments, e.g.
        Sequence, and concrete ones for returns, e.g. List, cf. Postel's law):

            def covering_edges(extent_intent_pairs: typing.Iterator[typing.Tuple[bitsets.bases.BitSet,
                                                                                 bitsets.bases.BitSet]]):

        We might need to do some more to express the requirement that the
        bitset pairs need to be ones that were for prime in matrices.py.

    Contributor Author:
        You're right that both extent/intent are not required for storing the
        relationship, so probably something like:

            class ConceptMappingInfo(typing.NamedTuple):
                extent: bitsets.bases.BitSet
                intent: bitsets.bases.BitSet
                lower_neighbors_extents: typing.Tuple[bitsets.bases.BitSet, ...]

        Agreed on typing.Iterable as input. Maybe we will get a better big
        picture when we try to build the whole lattice with this function?
        Then we can modify the API more and keep it public, or make it a
        fully internal-only function.

        Do you agree that adding a lattice function into covering_edges.py in
        the same fashion as in lindig.py would be a good approach? That
        function would accept a context as argument, because internally it
        uses fast_generate_from to generate concepts and covering_edges to
        build the mapping (that is the main difference to the one in
        lindig.py; the output will be the same). This lattice function should
        be easily pluggable into the existing data structure.

        Then we can finish the whole code.
        ...                    context._Properties.frommembers(c[1])),
        ...         concepts))

        >>> edges = covering_edges(concept_list, context)

        >>> [(''.join(concept[0].members()),  # doctest: +NORMALIZE_WHITESPACE
        ...   ''.join(lower[0].members()))
        ...  for concept, lower in edges]
        [('ABCD', 'ABC'),
         ('ABCD', 'ACD'),
         ('ABCD', 'ABD'),
         ('ABC', 'AC'),
         ('ABC', 'AB'),
         ('ABC', 'BC'),
         ('AC', 'A'),
         ('AC', 'C'),
         ('A', ''),
         ('C', ''),
         ('AB', 'A'),
         ('AB', 'B'),
         ('B', ''),
         ('BC', 'C'),
         ('BC', 'B'),
         ('ACD', 'AC'),
         ('ACD', 'AD'),
         ('AD', 'A'),
         ('ABD', 'AB'),
         ('ABD', 'AD')]
    """
    Objects = context._Objects
    Properties = context._Properties

    if not concept_index:
        concept_index = dict(concept_list)

    for extent, intent in concept_list:
        candidate_counter = collections.Counter()

        property_candidates = Properties.fromint(Properties.supremum & ~intent)

        for atom in property_candidates.atoms():
            extent_candidate = Objects.fromint(extent & atom.prime())
            intent_candidate = concept_index[extent_candidate]
            candidate_counter[extent_candidate] += 1

            if (intent_candidate.count() - intent.count()) == candidate_counter[extent_candidate]:
                yield (extent, intent), (extent_candidate, intent_candidate)


def _return_edges(batch, context, concept_index):
    return list(covering_edges(batch, context, concept_index=concept_index))


def lattice_fcbo(context, process_count=1):
    """Return a tuple of ``(extent, intent, upper, lower)`` tuples in short lexicographic order."""
    concepts = list(fast_generate_from(context))
    concepts.sort(key=lambda concept: concept[0].shortlex())
    concept_index = dict(concepts)

    if process_count == 1:
        edges = covering_edges(concepts, context, concept_index=concept_index)
    else:
        batches = [concepts[i::process_count] for i in range(process_count)]

        with multiprocessing.Pool(process_count) as p:
            results = [p.apply_async(_return_edges, (batch, context, concept_index))
                       for batch in batches]
            edges = itertools.chain.from_iterable(result.get() for result in results)

    mapping = {extent: (extent, intent, [], []) for extent, intent in concepts}

    for concept, lower_neighbor in edges:
        extent, _ = concept
        lower_extent, _ = lower_neighbor

        mapping[extent][3].append(lower_extent)
        mapping[lower_extent][2].append(extent)

    return tuple(mapping.values())
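The counting argument behind ``covering_edges`` can be checked without the bitsets machinery. The sketch below re-implements it with plain frozensets over the doctest's toy context (``prime_objects`` and the set-based concept representation are illustrative stand-ins, not the library's API): every attribute outside a concept's intent produces a candidate lower extent, and a candidate is a lower neighbor exactly when the number of attributes that produced it equals the intent-size difference.

```python
from collections import Counter

# Toy context from the doctest above: which properties each object has.
CONTEXT = {'A': set('012'), 'B': set('02345'), 'C': set('014'), 'D': set('12')}
PROPERTIES = set('012345')


def prime_objects(prop):
    """All objects having property ``prop`` (stand-in for ``atom.prime()``)."""
    return frozenset(obj for obj, props in CONTEXT.items() if prop in props)


def covering_edges(concepts):
    """Yield ((extent, intent), (lower_extent, lower_intent)) edges."""
    index = dict(concepts)  # extent -> intent, like concept_index above
    for extent, intent in concepts:
        counter = Counter()
        for prop in PROPERTIES - intent:
            # Candidate lower extent: members of this concept that also have prop.
            candidate = extent & prime_objects(prop)
            counter[candidate] += 1
            # Lower neighbor iff every property the candidate gained
            # beyond ``intent`` was produced by this loop.
            if len(index[candidate]) - len(intent) == counter[candidate]:
                yield (extent, intent), (candidate, index[candidate])


CONCEPTS = [(frozenset(e), frozenset(i)) for e, i in [
    ('ABCD', ''), ('ABC', '0'), ('AC', '01'), ('A', '012'),
    ('', '012345'), ('C', '014'), ('AB', '02'), ('B', '02345'),
    ('BC', '04'), ('ACD', '1'), ('AD', '12'), ('ABD', '2')]]

edges = {(''.join(sorted(c[0])), ''.join(sorted(l[0])))
         for c, l in covering_edges(CONCEPTS)}
```

On this context the sketch yields the same 20 edges as the doctest, e.g. ``('ABCD', 'ABC')`` and ``('AB', 'B')``; the infimum ``('', '012345')`` produces none, since it has no lower neighbors.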
concepts/contexts.py (65 changes: 51 additions & 14 deletions)

@@ -465,13 +465,37 @@ def _minimize(extent, intent):
 
 
 class LatticeMixin:
+    algorithm_for_lattice: str = 'lindig'
+    process_count: int = 1
+    _parallel_algorithms: list = ['fcbo']
+    _single_thread_algorithms: list = ['lindig']
+
+    def __init__(self,
+                 algorithm_for_lattice: typing.Optional[str] = None,
+                 process_count: typing.Optional[int] = None) -> None:
+
+        if algorithm_for_lattice is not None:
+            if algorithm_for_lattice not in self._parallel_algorithms + self._single_thread_algorithms:
+                raise NotImplementedError
+            self.algorithm_for_lattice = algorithm_for_lattice
+
+        if process_count is not None:
+            if self.algorithm_for_lattice not in self._parallel_algorithms:
+                raise NotImplementedError
+            self.process_count = process_count
+
     def _lattice(self, infimum=()):
         """Yield ``(extent, intent, upper, lower)`` in short lexicographic order.
 
         cf. C. Lindig. 2000. Fast Concept Analysis.
         """
-        return algorithms.lattice(self._Objects, infimum=infimum)
+
+        if self.algorithm_for_lattice == 'lindig':
+            return algorithms.lattice(self._Objects, infimum=infimum)
+        elif self.algorithm_for_lattice == 'fcbo':
+            return algorithms.lattice_fcbo(self, process_count=self.process_count)
+        else:
+            raise NotImplementedError
 
     def _neighbors(self, objects):
         """Yield upper neighbors from extent (in colex order?).

@@ -630,23 +654,36 @@ def todict(self, ignore_lattice: bool = False
 class Context(ExportableMixin, LatticeMixin,
               MinimizeMixin, PrimeMixin,
               ComparableMixin, FormattingMixin, Data):
-    """Formal context defining a relation between objects and properties.
+    """Formal context defining a relation between objects and properties."""
 
-    Create context from ``objects``, ``properties``, and ``bools`` correspondence.
+    def __init__(self,
+                 objects: typing.Iterable[str],
+                 properties: typing.Iterable[str],
+                 bools: typing.Iterable[typing.Tuple[bool, ...]],
+                 algorithm_for_lattice: typing.Optional[str] = None,
+                 process_count: typing.Optional[int] = None):
+        """Create context from ``objects``, ``properties``, and ``bools`` correspondence.
 
-    Args:
-        objects: Iterable of object label strings.
-        properties: Iterable of property label strings.
-        bools: Iterable of ``len(objects)`` tuples of ``len(properties)`` booleans.
+        Args:
+            objects: Iterable of object label strings.
+            properties: Iterable of property label strings.
+            bools: Iterable of ``len(objects)`` tuples of ``len(properties)`` booleans.
+            algorithm_for_lattice: String specifying the name of the default algorithm
+                used to build the lattice.
 
-    Returns:
-        Context: New :class:`.Context` instance.
+        Returns:
+            Context: New :class:`.Context` instance.
 
-    Example:
-        >>> from concepts import Context
-        >>> Context(['man', 'woman'], ['male', 'female'], [(True, False), (False, True)])  # doctest: +ELLIPSIS
-        <Context object mapping 2 objects to 2 properties [47e29724] at 0x...>
-    """
+        Example:
+            >>> from concepts import Context
+            >>> Context(['man', 'woman'],
+            ...         ['male', 'female'],
+            ...         [(True, False), (False, True)])  # doctest: +ELLIPSIS
+            <Context object mapping 2 objects to 2 properties [47e29724] at 0x...>
+        """
+        Data.__init__(self, objects, properties, bools)
+        LatticeMixin.__init__(self, algorithm_for_lattice, process_count)
 
 
     @property
     def objects(self) -> typing.Tuple[str, ...]:
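The argument validation that the new ``LatticeMixin.__init__`` encodes can be exercised in isolation. A minimal standalone sketch (class and attribute names mirror the diff; the ``algorithms`` module is left out, and the ``NotImplementedError`` messages are illustrative additions) shows that unknown algorithm names are rejected and that ``process_count`` is only accepted together with a parallel algorithm:

```python
import typing


class LatticeMixin:
    # Defaults mirror the diff: Lindig's algorithm, single process.
    algorithm_for_lattice: str = 'lindig'
    process_count: int = 1
    _parallel_algorithms: typing.List[str] = ['fcbo']
    _single_thread_algorithms: typing.List[str] = ['lindig']

    def __init__(self,
                 algorithm_for_lattice: typing.Optional[str] = None,
                 process_count: typing.Optional[int] = None) -> None:
        if algorithm_for_lattice is not None:
            if algorithm_for_lattice not in (self._parallel_algorithms
                                             + self._single_thread_algorithms):
                raise NotImplementedError(algorithm_for_lattice)
            self.algorithm_for_lattice = algorithm_for_lattice

        if process_count is not None:
            # A process count only makes sense for parallelizable algorithms.
            if self.algorithm_for_lattice not in self._parallel_algorithms:
                raise NotImplementedError('process_count requires a parallel algorithm')
            self.process_count = process_count
```

With this, ``LatticeMixin('fcbo', 4)`` configures parallel FCbO, while ``LatticeMixin(process_count=4)`` raises, because the default ``'lindig'`` is single-threaded.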
tests/conftest.py (22 additions)

@@ -66,6 +66,28 @@ def lattice(context):
     return context.lattice
 
 
+@pytest.fixture(scope='session')
+def bob_ross(test_examples, filename='bob-ross.cxt'):
+    path = test_examples / filename
+
+    context = concepts.load_cxt(str(path), encoding='utf-8')
+
+    assert context.shape == (403, 67)
+
+    return context
+
+
+@pytest.fixture(scope='session')
+def mushroom(test_examples, filename='mushroom.cxt'):
+    path = test_examples / filename
+
+    context = concepts.load_cxt(str(path))
+
+    assert context.shape == (8_124, 119)
+
+    return context
+
+
 @pytest.fixture(params=['str', 'bytes', 'pathlike', 'fileobj'])
 def path_or_fileobj(request, tmp_path, filename='context.json'):
     if request.param == 'str':
tests/test_algorithms.py (22 deletions)

@@ -11,28 +11,6 @@
 ENCODING = 'utf-8'
 
 
-@pytest.fixture
-def bob_ross(test_examples, filename=BOB_ROSS):
-    path = test_examples / filename
-
-    context = concepts.load_cxt(str(path), encoding=ENCODING)
-
-    assert context.shape == (403, 67)
-
-    return context
-
-
-@pytest.fixture
-def mushroom(test_examples, filename='mushroom.cxt'):
-    path = test_examples / filename
-
-    context = concepts.load_cxt(str(path))
-
-    assert context.shape == (8_124, 119)
-
-    return context
-
-
 def test_lattice(lattice):
     pairs = [f'{x._extent.bits()} <-> {x._intent.bits()}' for x in lattice]