ValueError in get_colnames_features #3

Open
rsancheztaksoiai opened this issue Oct 17, 2023 · 3 comments

Comments

@rsancheztaksoiai

Hi,
I tried to run the example in the docs, but got this error:

colnames_features = np.array([bleu_score, edit_distance, lcs,transformer_score, one_in_one])

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (5,) + inhomogeneous part.

@fireindark707
Owner

Hello @rsancheztaksoiai

Do you mean the example that uses the Python package?

Install

pip install schema-matching

Example

from schema_matching import schema_matching

df_pred,df_pred_labels,predicted_pairs = schema_matching("Test Data/QA/Table1.json","Test Data/QA/Table2.json")
print(df_pred)
print(df_pred_labels)
for pair_tuple in predicted_pairs:
    print(pair_tuple)

I just tried that and it works well:

schema_matching|Loading sentence transformer, this will take a while...
schema_matching|Done loading sentence transformer
                               data.title  ...  paragraphs.context
questions.body                   0.002472  ...            0.001018
questions.documents              0.000888  ...            0.000574
questions.ideal_answer           0.000896  ...            0.011124
questions.concepts               0.000594  ...            0.003764
questions.type                   0.004110  ...            0.000112
questions.id                     0.000075  ...            0.000093
snippets.offsetInBeginSection    0.000063  ...            0.000174
snippets.offsetInEndSection      0.000066  ...            0.000165
snippets.text                    0.000282  ...            0.016571
snippets.beginSection            0.001831  ...            0.000513
snippets.document                0.000643  ...            0.000653
snippets.endSection              0.004702  ...            0.000530
triples.p                        0.000357  ...            0.000383
triples.s                        0.000438  ...            0.000388
triples.o                        0.000965  ...            0.002000
questions.exact_answer           0.000799  ...            0.000229

[16 rows x 9 columns]
                               data.title  ...  paragraphs.context
questions.body                          0  ...                   0
questions.documents                     0  ...                   0
questions.ideal_answer                  0  ...                   0
questions.concepts                      0  ...                   0
questions.type                          0  ...                   0
questions.id                            0  ...                   0
snippets.offsetInBeginSection           0  ...                   0
snippets.offsetInEndSection             0  ...                   0
snippets.text                           0  ...                   0
snippets.beginSection                   0  ...                   0
snippets.document                       0  ...                   0
snippets.endSection                     0  ...                   0
triples.p                               0  ...                   0
triples.s                               0  ...                   0
triples.o                               0  ...                   0
questions.exact_answer                  0  ...                   0

[16 rows x 9 columns]
('questions.body', 'qas.question', 0.86622685)
('questions.concepts', 'qas.question', 0.17055672)
('questions.id', 'qas.id', 0.5095535)
('snippets.offsetInBeginSection', 'answers.answer_start', 0.9288852)
('snippets.offsetInEndSection', 'answers.answer_start', 0.86390895)
('questions.exact_answer', 'answers.text', 0.5319033)
('questions.exact_answer', 'plausible_answers.text', 0.56676453)

@rsancheztaksoiai
Author

rsancheztaksoiai commented Oct 18, 2023

Hi @fireindark707, thanks for your quick response!

Yes, I meant that example. These are the steps I'm performing:

$ mkdir python_matching

$ cd python_matching/

$ python3 -m venv env

$ source env/bin/activate

$ pip install schema-matching

$ touch matching_test.py (then copy the example code into it)

$ cp -r 'Test Data'/ ~/python_matching/TestData (copy the Test Data directory from the repository)

$ python3 matching_test.py

schema_matching|Loading sentence transformer, this will take a while...
schema_matching|Done loading sentence transformer
Traceback (most recent call last):
  File "/python_matching/matching_test.py", line 3, in <module>
    df_pred,df_pred_labels,predicted_pairs = schema_matching("TestData/QA/Table1.json","TestData/QA/Table2.json")
  File "/python_matching/env/lib/python3.10/site-packages/schema_matching/cal_column_similarity.py", line 83, in schema_matching
    features,_ = make_data_from(table1_df, table2_df, type="test")
  File "/python_matching/env/lib/python3.10/site-packages/schema_matching/relation_features.py", line 122, in make_data_from
    colnames_features = get_colnames_features(c1_name, c2_name,column_name_embeddings)
  File "/python_matching/env/lib/python3.10/site-packages/schema_matching/relation_features.py", line 88, in get_colnames_features
    colnames_features = np.array([bleu_score, edit_distance, lcs,transformer_score, one_in_one])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (5,) + inhomogeneous part.

@hyperdigitalplatform

I was getting this error too and was able to fix it by adding the line below:
transformer_score = transformer_score.item()

in the function:
def get_colnames_features(text1,text2,column_name_embeddings):
    """
    Use BLEU, edit distance and word2vec to calculate features.
    """
    bleu_score = bleu([text1], text2, smoothing_function=smoothie)
    print(type(bleu_score))
    edit_distance = damerau.distance(text1, text2)
    lcs = metriclcs.distance(text1, text2)
    transformer_score = util.cos_sim(column_name_embeddings[text1], column_name_embeddings[text2])
    # util.cos_sim returns a 1x1 tensor; .item() converts it to a plain Python float
    transformer_score = transformer_score.item()
    one_in_one = text1 in text2 or text2 in text1
    colnames_features = np.array([bleu_score, edit_distance, lcs,transformer_score, one_in_one])
    return colnames_features
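
For anyone who wants to see why the extra line helps without installing the package, here is a minimal NumPy-only sketch. It is an illustration under assumptions, not the library's code: a (1, 1) NumPy array stands in for the tensor returned by util.cos_sim, the feature values are made up, and build_features is a hypothetical helper. Mixing plain scalars with a 2-D element gives np.array a ragged input, which recent NumPy releases (1.24 and later, as far as I know) reject with a ValueError like the one in the traceback; collapsing the element to a scalar first avoids that.

```python
import numpy as np

# Stand-in for the (1, 1) result of util.cos_sim (assumption; the value is made up).
transformer_score = np.array([[0.87]])

# Mixing scalars with a 2-D element is a ragged input. Recent NumPy versions
# are expected to raise ValueError here; older ones may only warn.
try:
    np.array([0.12, 3, 0.4, transformer_score, True])
except ValueError as exc:
    print("ragged input rejected:", exc)


def build_features(bleu_score, edit_distance, lcs, transformer_score, one_in_one):
    """Hypothetical helper: pack five scalar features into a flat float array."""
    # .item() collapses a single-element array to a plain Python scalar.
    score = np.asarray(transformer_score).item()
    return np.array([bleu_score, edit_distance, lcs, score, one_in_one], dtype=float)


features = build_features(0.12, 3, 0.4, transformer_score, True)
print(features.shape, features.dtype)
```

Passing dtype=float also turns the boolean one_in_one into 1.0 or 0.0, so every entry of the feature vector has the same type.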
