Is there a way to calculate response times #2

Open
nalbanders opened this issue Apr 24, 2015 · 13 comments

Comments

@nalbanders
Collaborator

Hi, I found this parser very useful. I am doing a school project to analyze chat data. Thank you for contributing.

One thing I'd like to do is see the difference in response time between the root and their contact (much in the same way you calculate messages, characters, etc.)

Is there a way you recommend to add this functionality?

@nmoya
Owner

nmoya commented Apr 24, 2015

Hello @nalbanders !

Thanks for your interest. I don't have a log file right now. Could you please refresh my memory and check if a log file contains a timestamp of the time that a message was sent/received?

If so, a first step would be to parse this string into a timestamp structure in Python. Python provides several libraries for working with dates and times.

You can get a glimpse of what can be done in another file of mine: https://github.com/nmoya/glaucobot/blob/master/glaucobot/datelib.py

My best suggestion is that you should not perform manual calculations on timestamps. Always use a well-tested library to work with dates and times.
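
As a minimal sketch of that advice, the snippet below parses two timestamp strings with the standard `datetime` module and lets the library do the subtraction. It assumes the WhatsApp export format `8/2/14, 12:59:24 PM` that the attached `main.py` in this thread uses; the helper names `parse_timestamp` and `seconds_between` are hypothetical, and the sketch is written for Python 3 rather than the Python 2 of the attachment.

```python
from datetime import datetime

# Assumed format, taken from the sample "8/2/14, 12:59:24 PM" in the attached main.py.
WHATSAPP_FORMAT = "%m/%d/%y, %I:%M:%S %p"


def parse_timestamp(text):
    """Parse one WhatsApp-style timestamp string into a datetime."""
    return datetime.strptime(text, WHATSAPP_FORMAT)


def seconds_between(earlier, later):
    """Elapsed seconds between two timestamp strings."""
    delta = parse_timestamp(later) - parse_timestamp(earlier)
    # total_seconds() includes whole days, unlike the .seconds attribute.
    return delta.total_seconds()


print(seconds_between("8/2/14, 12:59:24 PM", "8/2/14, 1:01:54 PM"))  # 150.0
```

Subtracting two `datetime` objects yields a `timedelta`, so hour rollovers, AM/PM and day boundaries are handled by the library instead of by hand.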

I am interested in working together to add this feature if you like.

Cheers,

PS. Also, if you are getting started with computing, check this video: https://www.youtube.com/watch?v=-5wpm-gesOY

@nalbanders
Collaborator Author

Hi Nikolas,

Thanks for the reply. I will try to work it in as you suggest.

It's actually a fun project, and your script is helping a lot. I am a
student at MIT. I'd be happy to have a call to discuss the project and see
whether you'd be interested in working together. I am doing two different
studies that interpret relationships based on conversation data. I am
always looking to connect with interested, skilled people like yourself.

Here is my LinkedIn profile
https://www.linkedin.com/pub/armen-nalband/13/65/656


@nmoya
Owner

nmoya commented Apr 27, 2015

Hello @nalbanders ,

Sure! Let's schedule a call to discuss the project further. Are you available on Wednesday? My Skype/Hangout is nikolasmoya.

I also sent a connection invitation on LinkedIn.

@nalbanders
Collaborator Author

Great, how about Wednesday at 5:30 EST? I am in Boston. What city are you in?

@nmoya
Owner

nmoya commented Apr 27, 2015

I am in Curitiba (BRT). Let's try a little later, after work. How about sometime between 17:00 and 21:00 EST?

@nalbanders
Collaborator Author

Sorry, I meant 5:30PM (17:30). I believe you are one hour ahead so that
would be 18:30 your time. Does that work?

If not we can do 6:30PM EST (18:30)


@nmoya
Owner

nmoya commented Apr 27, 2015

Oh, alright then. 5:30 PM EST is great for me.

@nalbanders
Collaborator Author

Ok, I will call you then tomorrow on Skype. I just sent you a contact
request.

Looking forward to it,
Armen


@nalbanders
Collaborator Author

Wednesday*


@nalbanders
Collaborator Author

Hey, some context for our call tomorrow

Attached the main.py file that I modified

My goal is to understand the user's relationships based on communication
data (to be able to predict who they care about most and least). Here are
some graphs I generated with the script. Attached is an output CSV I am
building, which I will use for regression analysis (logistic, CART, Random
Forest) in R.
[image: Inline image 3]
[image: Inline image 1]
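
One way such an output CSV could be assembled is sketched below: one row of per-sender counters per conversation, written with `csv.DictWriter`. The column names and the `ROOT`/`CONTACT` labels follow the attached `main.py`; the `feature_row` and `write_features` helpers and the sample counts are hypothetical and only illustrate the shape of the data. The sketch is written for Python 3.

```python
import csv
import io

# Column names as listed in the attached main.py.
FIELDNAMES = ['msgs_root', 'msgs_contact', 'chars_root', 'chars_contact',
              'qmarks_root', 'qmarks_contact']


def feature_row(proportions, root="ROOT", contact="CONTACT"):
    """Flatten per-sender counters into one CSV row (hypothetical helper)."""
    return {
        'msgs_root': proportions["messages"][root],
        'msgs_contact': proportions["messages"][contact],
        'chars_root': proportions["chars"][root],
        'chars_contact': proportions["chars"][contact],
        'qmarks_root': proportions["qmarks"][root],
        'qmarks_contact': proportions["qmarks"][contact],
    }


def write_features(rows, csvfile):
    """Write a header line followed by one line per conversation."""
    writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Made-up counters, shaped like the dictionary message_proportions() returns.
sample = {
    "messages": {"ROOT": 3, "CONTACT": 2},
    "chars": {"ROOT": 40, "CONTACT": 25},
    "qmarks": {"ROOT": 1, "CONTACT": 0},
}
buffer = io.StringIO()
write_features([feature_row(sample)], buffer)
print(buffer.getvalue())
```

Keeping one row per conversation means a file like this can be loaded directly as a data frame in R, with each column serving as a predictor.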

On a separate note, I have other development projects going on and am
always open to skilled people like yourself getting involved, if you are
interested.

Here is a wireframe of an app I am creating. We can chat about it
separately.
https://www.justinmind.com/usernote/tests/14265484/14740364/14740366/index.html


@nalbanders
Collaborator Author

Attachment


from __future__ import division
from datetime import datetime
import codecs
import date  # local helper module that provides date_to_weekday
import re
import operator
import sys
import json
import csv
#import numpy
from pprint import pprint


class Chat():

    def __init__(self, filename):
        self.filename = filename
        self.raw_messages = []

        self.datelist = []
        self.timelist = []
        self.senderlist = []
        self.messagelist = []
        self.chatTimeList = []             # parsed datetime of each message
        self.rootResponseTimeList = []     # seconds ROOT took to reply
        self.contactResponseTimeList = []  # seconds CONTACT took to reply
        self.rootBurstList = []            # consecutive-message counts
        self.contactBurstList = []

    def open_file(self):
        arq = codecs.open(self.filename, "r", "utf-8-sig")
        content = arq.read()
        arq.close()
        lines = content.split("\n")
        # Drop lines that hold a single character (such as a lone "\r")
        lines = [l for l in lines if len(l) != 1]
        for l in lines:
            self.raw_messages.append(l.encode("utf-8"))

    def feed_lists(self):
        for l in self.raw_messages:
            msg_date, sep, msg = l.partition(": ")
            raw_date, sep, time = msg_date.partition(" ")
            sender, sep, message = msg.partition(": ")
            raw_date = raw_date.replace(",", "")
            if message:
                self.datelist.append(raw_date)
                self.timelist.append(time)
                # Everything before the third colon is the date and time,
                # in the WhatsApp format "8/2/14, 12:59:24 PM".
                colonIndex = [x.start() for x in re.finditer(':', l)]
                chatTimeString = l[0:colonIndex[2]]
                chatTime = datetime.strptime(chatTimeString,
                                             "%m/%d/%y, %I:%M:%S %p")
                self.chatTimeList.append(chatTime)
                self.senderlist.append(sender)
                self.messagelist.append(message)
            elif self.messagelist:
                # No "date: sender: " prefix: treat the line as a continuation
                # of the previous message so all lists stay the same length.
                self.messagelist[-1] += "\n" + l

        rootName = "ROOT"
        contactName = "CONTACT"
        burstCount = 1  # messages sent in a row by the same sender

        # Response times and bursts depend on pairs of consecutive messages,
        # so compare message i with message i - 1.
        for i in range(1, len(self.chatTimeList)):
            sender = self.senderlist[i]
            previous = self.senderlist[i - 1]
            if sender != previous:
                # total_seconds() also counts whole days; .seconds drops them
                dt = self.chatTimeList[i] - self.chatTimeList[i - 1]
                response = int(dt.total_seconds())
                print "sender changed: %s" % sender
                print "response time: %d\n" % response
                if sender == rootName:
                    # ROOT replied, so the burst that just ended was CONTACT's
                    self.rootResponseTimeList.append(response)
                    self.contactBurstList.append(burstCount)
                elif sender == contactName:
                    # CONTACT replied, so the burst that just ended was ROOT's
                    self.contactResponseTimeList.append(response)
                    self.rootBurstList.append(burstCount)
                else:
                    sys.exit("ERROR CHANGE NAMES IN CHAT TO ROOT AND CONTACT\n")
                burstCount = 1
            else:
                burstCount += 1
                print "repeat sender: %d %s\n" % (burstCount, sender)


    def print_history(self, end=0):
        if end == 0:
            end = len(self.messagelist)
        for i in range(len(self.messagelist[:end])):
            print self.datelist[i], self.timelist[i],\
                self.senderlist[i], self.messagelist[i]

    def get_senders(self):
        return list(set(self.senderlist))

    def count_messages_per_weekday(self):
        counter = dict()
        for i in range(len(self.datelist)):
            # Dates are month/day/year; build the year-month-day string
            # that is passed to date_to_weekday.
            month, day, year = self.datelist[i].split("/")
            parsed_date = "%s-%s-%s" % (year, month, day)
            weekday = date.date_to_weekday(parsed_date)
            if weekday not in counter:
                counter[weekday] = 1
            else:
                counter[weekday] += 1
        return counter

    def count_messages_per_shift(self):
        shifts = {
            "latenight": 0,
            "morning": 0,
            "afternoon": 0,
            "evening": 0
        }
        # Use the parsed datetimes: timelist holds 12-hour strings such as
        # "12:59:24 PM", so their leading number is not a 24-hour value.
        for chatTime in self.chatTimeList:
            hour = chatTime.hour
            if hour <= 6:
                shifts["latenight"] += 1
            elif hour <= 11:
                shifts["morning"] += 1
            elif hour <= 17:
                shifts["afternoon"] += 1
            else:
                shifts["evening"] += 1
        return shifts

    def count_messages_pattern(self, patternlist):
        counters = dict()
        pattern_dict = dict()
        senders = self.get_senders()
        for pattern in patternlist:
            counters[pattern] = dict()
            for s in senders:
                counters[pattern][s] = 0
            # Escape the pattern and compile it once, ignoring case
            pattern_dict[pattern] = re.compile(re.escape(pattern), re.I)
        for i in range(len(self.senderlist)):
            for pattern in patternlist:
                search_result = pattern_dict[pattern].\
                    findall(self.messagelist[i])
                # Every counter starts at 0, so matches can simply be added
                counters[pattern][self.senderlist[i]] += len(search_result)
        return counters

    def print_patterns_dict(self, pattern_dict):
        for pattern in pattern_dict:
            print pattern
            for s in pattern_dict[pattern]:
                print s, ": ", pattern_dict[pattern][s]
            print ""

    def message_proportions(self):
        senders = self.get_senders()
        counter = dict()
        for i in ["messages", "words", "chars", "qmarks", "media"]:
            counter[i] = dict()
            for s in senders:
                counter[i][s] = 0
        for i in range(len(self.senderlist)):
            sender = self.senderlist[i]
            message = self.messagelist[i]
            counter["messages"][sender] += 1
            counter["words"][sender] += len(message.split(" "))
            counter["chars"][sender] += len(message)
            counter["qmarks"][sender] += message.count('?')
            counter["media"][sender] += (
                message.count('<media omitted>') +
                message.count('<image omitted>') +
                message.count('<audio omitted>'))
        counter["total_messages"] = 0
        counter["total_words"] = 0
        counter["total_chars"] = 0
        counter["total_qmarks"] = 0
        counter["total_media"] = 0

        for s in senders:
            counter["total_messages"] += counter["messages"][s]
            counter["total_words"] += counter["words"][s]
            counter["total_chars"] += counter["chars"][s]
            counter["total_qmarks"] += counter["qmarks"][s]
            counter["total_media"] += counter["media"][s]
        return counter

    def average_message_length(self):
        msg_prop = self.message_proportions()
        counter = dict()
        for s in self.get_senders():
            counter[s] = msg_prop["words"][s] / msg_prop["messages"][s]
        return counter

    def most_used_words(self, top=10, threshold=3):
        words = dict()
        for i in range(len(self.messagelist)):
            message_word = self.messagelist[i].split(" ")
            for w in message_word:
                # Only count words longer than `threshold` characters
                if len(w) > threshold:
                    w = w.decode("utf8")
                    w = w.replace("\r", "")
                    w = w.lower()
                    if w not in words:
                        words[w] = 1
                    else:
                        words[w] += 1
        sorted_words = sorted(words.iteritems(), key=operator.itemgetter(1),
                              reverse=True)
        return sorted_words[:top]

def printDict(dic, parent, depth):
    tup = sorted(dic.iteritems(), key=operator.itemgetter(1))
    # A dictionary is a leaf when none of its values is itself a dictionary
    isLeaf = True
    for key in tup:
        if isinstance(dic[key[0]], dict):
            isLeaf = False
    if isLeaf and depth != 0:
        print " " * (depth - 1) * 2, parent
    for key in tup:
        if isinstance(dic[key[0]], dict):
            printDict(dic[key[0]], key[0], depth + 1)
        else:
            print " " * depth * 2, str(key[0]), "->", dic[key[0]]

def median(values):
    # Middle value of a list (mean of the two middle values when the
    # length is even); returns None for an empty list.
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return None
    if n % 2:
        return ordered[n // 2]
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2.0


def main():
    if len(sys.argv) < 2:
        print "Run: python main.py <chat_log_file> [regex. patterns]"
        sys.exit(1)
    c = Chat(sys.argv[1])
    c.open_file()
    c.feed_lists()
    output = dict()

    print "\n--PROPORTIONS"
    output["proportions"] = c.message_proportions()
    printDict(output["proportions"], "proportions", 0)

    print "\n--SHIFTS"
    output["shifts"] = c.count_messages_per_shift()
    printDict(output["shifts"], "shifts", 0)

    print "\n--WEEKDAY"
    output["weekdays"] = c.count_messages_per_weekday()
    printDict(output["weekdays"], "weekday", 0)

    print "\n--AVERAGE MESSAGE LENGTH"
    output["lengths"] = c.average_message_length()
    printDict(output["lengths"], "lengths", 0)

    print "\n--PATTERNS"
    output["patterns"] = c.count_messages_pattern(sys.argv[2:])
    printDict(output["patterns"], "patterns", 0)

    print "\n--TOP 15 MOST USED WORDS (length > 3)"
    output["most_used_words"] = c.most_used_words(top=15, threshold=3)
    output["most_used_words"] = sorted(output["most_used_words"],
                                       key=operator.itemgetter(1),
                                       reverse=True)

    print "TIMESTAMPS\n %s\n\n" % c.chatTimeList[0:4]
    print "Root Response time sample \n %s...\n" % c.rootResponseTimeList[0:4]
    print "Contact Response time sample \n %s...\n" % c.contactResponseTimeList[0:4]
    print "Root bursts \n %s\n" % c.rootBurstList
    print "Contact bursts \n %s\n" % c.contactBurstList

    # numpy is not imported and Chat has no responseTimeList, so take the
    # median over both response-time lists with the helper above.
    print "Median response time =%s\n\n" % median(
        c.rootResponseTimeList + c.contactResponseTimeList)

    output["senders"] = c.get_senders()
    nameTest = sys.argv[1]
    arq = open("C:/Python27/" + nameTest + ".json", "w")
    arq.write(json.dumps(output))
    pprint(output)
    arq.close()

    # The Python 2 csv module expects the file to be opened in binary mode
    with open('names.csv', 'wb') as csvfile:
        fieldnames = ['msgs_root', 'msgs_contact', 'chars_root',
                      'chars_contact', 'qmarks_root', 'qmarks_contact']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        # writer.writerow({'msgs_root': c.message_proportions , 'last_name': 'Beans'})

main()
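
For comparison, the response-time and burst bookkeeping that `feed_lists` aims at can be sketched more compactly by walking over consecutive message pairs. This is an illustrative Python 3 sketch, not the script above: the function name `response_times_and_bursts`, the `(datetime, sender)` tuple layout and the four sample messages are assumptions made for the example. A response time is credited to the sender who replies, and a burst to the sender who just finished writing.

```python
from datetime import datetime
from statistics import median


def response_times_and_bursts(messages):
    """messages: chronological list of (datetime, sender) tuples.

    Returns two dicts keyed by sender: reply delays in seconds, and
    lengths of runs of consecutive messages (bursts).
    """
    responses, bursts = {}, {}
    if not messages:
        return responses, bursts
    burst = 1
    for (t0, prev), (t1, cur) in zip(messages, messages[1:]):
        if cur != prev:
            # The run that just ended belongs to the previous sender,
            # the delay belongs to the sender who is replying.
            bursts.setdefault(prev, []).append(burst)
            responses.setdefault(cur, []).append((t1 - t0).total_seconds())
            burst = 1
        else:
            burst += 1
    # Record the run that is still open when the log ends.
    bursts.setdefault(messages[-1][1], []).append(burst)
    return responses, bursts


fmt = "%m/%d/%y, %I:%M:%S %p"
chat = [
    (datetime.strptime("8/2/14, 12:59:24 PM", fmt), "ROOT"),
    (datetime.strptime("8/2/14, 12:59:40 PM", fmt), "ROOT"),
    (datetime.strptime("8/2/14, 1:01:40 PM", fmt), "CONTACT"),
    (datetime.strptime("8/2/14, 1:02:10 PM", fmt), "ROOT"),
]
responses, bursts = response_times_and_bursts(chat)
print(responses)  # {'CONTACT': [120.0], 'ROOT': [30.0]}
print(bursts)     # {'ROOT': [2, 1], 'CONTACT': [1]}
print(median(responses["ROOT"] + responses["CONTACT"]))  # 75.0
```

Using `total_seconds()` rather than the `.seconds` attribute keeps replies that arrive more than a day later from being truncated to the time-of-day difference.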

@nmoya
Owner

nmoya commented Apr 29, 2015

Hello @nalbanders !

I will add you as a contributor to the repository so that you have write access. Could you please commit your adapted main file under a different name?

Also, something came up tomorrow at 6:30 PM EST. Do you mind moving our call to 4:30 PM EST or 5 PM EST? If that is not possible, it's alright, but I will need to leave at 6:15 PM EST, and we can schedule another call if 45 minutes are not enough. I will be on Skype tomorrow afternoon, so if you arrive earlier we can start earlier; otherwise we keep the original schedule :-)

Also, your graphs did not show up. I was looking forward to seeing them! :(
Great job on the modifications in the main file!

@nalbanders
Collaborator Author

Ok, will try to call at 5 instead.

Will push to git tomorrow.

Thanks,
A
