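Hello! I have recently been studying your code. In the run() function, the training results are already stored, so why not run only the evaluation part inside the loop over the different thresholds? Or does the training result change for each threshold? As far as I can see, the threshold has nothing to do with training.

I changed run() as follows. Does this look reasonable?

```python
import pandas as pd
# SpamFilter comes from this repository's own module (import not shown in the original post).

def train():
    # Train the model once and save the held-out test set to disk.
    sf = SpamFilter(initial=1)
    all_mail = sf.getEmailList(initial=1)
    split = int(len(all_mail) * 0.8)     # integer split position instead of a float slice bound
    train_mail = all_mail.iloc[:split]   # first 80% of the data as the training set
    check_mail = all_mail.iloc[split:]   # remaining 20% as the test set, no overlap
    check_mail.to_csv('test_set')
    sf.trainDict(train_mail)

# Presumably loads the stored training results (behavior of SpamFilter() assumed from the question).
sf = SpamFilter()

def run(T=0):
    # Evaluate the stored model at threshold T.
    threshold = T
    check_mail = pd.read_csv('test_set', index_col=0)
    check_mail['predict'] = check_mail.wordlist.apply(lambda x: sf.predictEmail(x, threshold))
    # predict + spam is 1 for a misclassified mail and 0 for a correctly kept ham.
    foo = check_mail.predict + check_mail.spam
    all_right = 1 - float((foo == 1).sum()) / len(foo)
    ham_right = float((foo == 0).sum()) / (check_mail.spam == 0).sum()
    print("Threshold:", threshold)
    print("Overall accuracy", all_right * 100, '%')
    print("Ham retrieval rate", ham_right * 100, '%')
    return (all_right, ham_right)
```

Finally, in main, add a single call to train() outside the for loop.

Thank you for your answer.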
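Sorry, I only just saw this. That change is perfectly fine. The threshold is used to find the optimal cut-off point, and the later code uses a line chart to show how the optimal threshold is chosen; there is no need to retrain the model while determining the threshold. I put the training and prediction code together only to make the workflow easier to explain when handing in the assignment. Thank you for taking the time to read my code.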
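The point that the threshold only affects prediction, not training, can be sketched with a self-contained toy example. This is not the repository's `SpamFilter`: the word-count model, the function names `train`, `spam_score` and `evaluate`, and the sample data are all hypothetical stand-ins chosen for illustration.

```python
import math
from collections import Counter

def train(messages):
    """Fit a toy word-count model once; returns per-class word counts."""
    counts = {0: Counter(), 1: Counter()}
    for words, is_spam in messages:
        counts[is_spam].update(words)
    return counts

def spam_score(model, words):
    """Log-odds that a message is spam, using add-one smoothing."""
    vocab = set(model[0]) | set(model[1])
    total = {c: sum(model[c].values()) + len(vocab) for c in (0, 1)}
    return sum(
        math.log((model[1][w] + 1) / total[1]) - math.log((model[0][w] + 1) / total[0])
        for w in words
    )

def evaluate(model, test_set, threshold):
    """Return (overall accuracy, fraction of ham kept) for one threshold."""
    labels = [label for _, label in test_set]
    predictions = [int(spam_score(model, words) > threshold) for words, _ in test_set]
    correct = sum(p == y for p, y in zip(predictions, labels))
    ham_kept = sum(p == 0 and y == 0 for p, y in zip(predictions, labels))
    return correct / len(labels), ham_kept / labels.count(0)

# Hypothetical data: (word list, label) with 1 = spam, 0 = ham.
train_set = [
    (["win", "cash", "now"], 1),
    (["cheap", "cash", "offer"], 1),
    (["meeting", "at", "noon"], 0),
    (["lunch", "at", "noon"], 0),
]
test_set = [(["cash", "offer"], 1), (["meeting", "lunch"], 0)]

model = train(train_set)  # training happens exactly once
# Only the evaluation is repeated for each candidate threshold.
results = {t: evaluate(model, test_set, t) for t in (-5.0, 0.0, 5.0)}
for t, (accuracy, ham_kept) in results.items():
    print("Threshold:", t, "accuracy:", accuracy, "ham kept:", ham_kept)
```

Because `model` is built before the loop, sweeping more thresholds only costs extra prediction passes over the test set.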