Is there a way to calculate response times #2

nalbanders · 2015-04-24T18:47:25Z

Hi, I found this parser very useful. I am doing a school project to analyze chat data. Thank you for contributing.

One thing I'd like to do is see the difference in response time between the root and their contact (much in the same way you calculate messages, characters, etc.)

Is there a way you recommend to add this functionality?

nmoya · 2015-04-24T19:37:14Z

Hello @nalbanders !

Thanks for your interest. I don't have a log file right now. Could you please refresh my memory and check if a log file contains a timestamp of the time that a message was sent/received?

If so, a first step should be to parse this string as a timestamp structure in Python. Python provides several libraries to work with date and time.

You can get a glimpse of what can be done in another file of mine here: https://github.com/nmoya/glaucobot/blob/master/glaucobot/datelib.py

My best suggestion is that you should not perform manual calculations over timestamps. Always use a well-tested library to work with date and time.
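
A minimal sketch of that idea, assuming each log line looks like `8/2/14, 12:59:24 PM: SENDER: text` (the format quoted later in this thread; other WhatsApp exports may differ). The names `parse_line` and `response_times` are made up for illustration and are not part of the parser. It targets Python 3 and only uses the standard `datetime` module:

```python
from datetime import datetime

# Assumed line format, taken from the example quoted in this thread:
# "8/2/14, 12:59:24 PM: SENDER: message text"
STAMP_FORMAT = "%m/%d/%y, %I:%M:%S %p"


def parse_line(line):
    """Split one log line into (timestamp, sender, text), or return None."""
    stamp, sep, rest = line.partition(": ")
    sender, sep2, text = rest.partition(": ")
    if not sep or not sep2:
        return None  # continuation or system line without "sender: text"
    try:
        return datetime.strptime(stamp, STAMP_FORMAT), sender, text
    except ValueError:
        return None  # the prefix was not a timestamp in the assumed format


def response_times(lines):
    """Return {sender: [seconds, ...]}, measured at each change of sender."""
    times = {}
    previous = None  # (timestamp, sender) of the last parsed message
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, sender, _ = parsed
        if previous is not None and sender != previous[1]:
            delta = stamp - previous[0]  # a datetime.timedelta
            times.setdefault(sender, []).append(delta.total_seconds())
        previous = (stamp, sender)
    return times


log = [
    "8/2/14, 12:59:24 PM: ROOT: Hi",
    "8/2/14, 1:00:24 PM: CONTACT: Hello",
    "8/2/14, 1:00:30 PM: CONTACT: How are you?",
    "8/3/14, 1:00:30 PM: ROOT: Fine",
]
print(response_times(log))
```

The sketch uses `total_seconds()` rather than `.seconds`, because `timedelta.seconds` only holds the part of the gap below one day, so a reply sent exactly a day later would otherwise count as 0 seconds.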

I am interested in working together to add this feature if you like.

Cheers,

PS. Also, if you are getting started with computing, check this video: https://www.youtube.com/watch?v=-5wpm-gesOY

nalbanders · 2015-04-27T00:08:12Z

Hi Nikolas,

Thanks for the reply. I will try to work it in as you suggest.

It's actually a fun project I am working on, and your script is helping a
lot. I am a student at MIT. I'd be happy to have a call and discuss the
project and see if you'd have any interest in working together. I am doing two
different studies interpreting relationships based on conversation data.
I am always looking to connect with people who are interested and who are
skilled like yourself.

Here is my LinkedIn profile
https://www.linkedin.com/pub/armen-nalband/13/65/656

nmoya · 2015-04-27T18:09:04Z

Hello @nalbanders ,

Sure! Let's schedule a call and discuss more about the project. Are you available on Wednesday? My Skype/Hangout is nikolasmoya.

I also sent a connection invitation on LinkedIn.

nalbanders · 2015-04-27T20:21:49Z

Great, how about Wednesday at 5:30 EST? I am in Boston, what city are you in?
nmoya · 2015-04-27T21:22:11Z

I am in Curitiba (BRT). Let's try a little later, after work. How about any time between 17h and 21h EST?

nalbanders · 2015-04-27T21:26:07Z

Sorry, I meant 5:30PM (17:30). I believe you are one hour ahead so that
would be 18:30 your time. Does that work?

If not we can do 6:30PM EST (18:30)

nmoya · 2015-04-27T21:33:32Z

Oh, alright then. 5:30 PM EST is great for me.

nalbanders · 2015-04-27T21:36:02Z

Ok, I will call you then tomorrow on Skype. I just sent you a contact
request.

Looking forward to it,
Armen

nalbanders · 2015-04-27T21:36:22Z

Wednesday*

nalbanders · 2015-04-29T05:19:27Z

Hey, some context for our call tomorrow.

I attached the main.py file that I modified.

My goal is to be able to understand the relationships of the user based on
communication data (to be able to predict who they care about most/least). Here
are some graphs I generated with the script. Attached is an output CSV I am
building that I will use to do regression analysis (logistic, CART, Random
Forest) in R.
[image: Inline image 3]
[image: Inline image 1]

On a separate note, I have other development projects going on, and I am always
open to skilled people like yourself getting involved if you find yourself
interested.

Here is a wireframe of an app I am creating. We can chat about it
separately.
https://www.justinmind.com/usernote/tests/14265484/14740364/14740366/index.html

nalbanders · 2015-04-29T05:20:18Z

Attachment

from __future__ import division
from datetime import datetime
import codecs
import date
import re
import operator
import sys
import json
import csv
#import numpy
from pprint import pprint

class Chat():
    def __init__(self, filename):
        self.filename = filename
        self.raw_messages = []

        self.datelist = []
        self.timelist = []
        self.senderlist = []
        self.messagelist = []
        self.chatTimeList = []             # datetime of each message that has a sender
        self.rootResponseTimeList = []     # seconds ROOT took to answer CONTACT
        self.contactResponseTimeList = []  # seconds CONTACT took to answer ROOT
        self.rootBurstList = []            # lengths of ROOT's runs of consecutive messages
        self.contactBurstList = []         # lengths of CONTACT's runs of consecutive messages

    def open_file(self):
        arq = codecs.open(self.filename, "r", "utf-8-sig")
        content = arq.read()
        arq.close()
        lines = content.split("\n")
        lines = [l for l in lines if len(l) != 1]
        for l in lines:
            self.raw_messages.append(l.encode("utf-8"))

    def feed_lists(self):
        for l in self.raw_messages:
            msg_date, sep, msg = l.partition(": ")
            raw_date, sep, time = msg_date.partition(" ")
            sender, sep, message = msg.partition(": ")
            raw_date = raw_date.replace(",", "")
            if message:
                self.datelist.append(raw_date)
                self.timelist.append(time)  # raw time string, e.g. "12:59:24 PM"
                # The date and time are everything before the third colon.
                colonIndex = [x.start() for x in re.finditer(':', l)]
                chatTimeString = l[0:colonIndex[2]]
                # WhatsApp export format: 8/2/14, 12:59:24 PM
                chatTime = datetime.strptime(chatTimeString, "%m/%d/%y, %I:%M:%S %p")
                self.chatTimeList.append(chatTime)
                self.senderlist.append(sender)
                self.messagelist.append(message)
            else:
                # No "sender: text" part on this line: keep the raw line.
                self.messagelist.append(l)

        if not self.chatTimeList:
            return  # nothing to compare

        rootName = "ROOT"
        contactName = "CONTACT"

        # Walk consecutive message pairs. t0 and t1 are the timestamps of
        # messages senderIndex-1 and senderIndex, so the index starts at 1.
        t0 = self.chatTimeList[0]
        senderIndex = 1
        burstCount = 1  # messages sent in a row by the previous sender

        for t1 in self.chatTimeList[1:]:
            dt = t1 - t0
            # total_seconds() keeps whole days; dt.seconds would drop them.
            responseSeconds = int(dt.total_seconds())
            if self.senderlist[senderIndex] != self.senderlist[senderIndex-1]:
                # Sender changed: the new sender is answering, and the burst
                # that just ended belongs to the other party.
                print "sender changed: %s" % self.senderlist[senderIndex]
                print "response time: %d\n" % responseSeconds
                if self.senderlist[senderIndex] == rootName:
                    self.contactBurstList.append(burstCount)
                    self.rootResponseTimeList.append(responseSeconds)
                elif self.senderlist[senderIndex] == contactName:
                    self.rootBurstList.append(burstCount)
                    self.contactResponseTimeList.append(responseSeconds)
                else:
                    sys.exit("ERROR CHANGE NAMES IN CHAT TO ROOT AND CONTACT\n")
                burstCount = 1
            else:
                burstCount += 1  # same sender again: extend the current burst
                print "repeat sender: %d %s\n" % (burstCount, self.senderlist[senderIndex])

            t0 = t1
            senderIndex += 1

    def print_history(self, end=0):
        if end == 0:
            end = len(self.messagelist)
        for i in range(len(self.messagelist[:end])):
            print self.datelist[i], self.timelist[i],\
                self.senderlist[i], self.messagelist[i]

    def get_senders(self):
        senders_set = set(self.senderlist)
        return [e for e in senders_set]

    def count_messages_per_weekday(self):
        counter = dict()
        for i in range(len(self.datelist)):
            month, day, year = self.datelist[i].split("/")  # AN edited date order
            parsed_date = "%s-%s-%s" % (year, month, day)
            weekday = date.date_to_weekday(parsed_date)
            if weekday not in counter:
                counter[weekday] = 1
            else:
                counter[weekday] += 1
        return counter

    def count_messages_per_shift(self):
        shifts = {
            "latenight": 0,
            "morning": 0,
            "afternoon": 0,
            "evening": 0
        }
        for i in range(len(self.timelist)):
            # timelist holds 12-hour strings such as "12:59:24 PM", so take
            # the 24-hour value from the parsed datetime instead.
            hour = self.chatTimeList[i].hour
            if hour >= 0 and hour <= 6:
                shifts["latenight"] += 1

            elif hour > 6 and hour <= 11:
                shifts["morning"] += 1

            elif hour > 11 and hour <= 17:
                shifts["afternoon"] += 1

            elif hour > 17 and hour <= 23:
                shifts["evening"] += 1
        return shifts

    def count_messages_pattern(self, patternlist):
        counters = dict()
        pattern_dict = dict()
        senders = self.get_senders()
        for pattern in patternlist:
            counters[pattern] = dict()
            for s in senders:
                counters[pattern][s] = 0
            # Escape the pattern and compile it as a case-insensitive regex.
            pattern_dict[pattern] = re.compile(re.escape(pattern), re.I)
        for i in range(len(self.messagelist)):
            for pattern in patternlist:
                search_result = pattern_dict[pattern].\
                    findall(self.messagelist[i])
                length = len(search_result)
                if length > 0:
                    # Every pattern/sender counter was initialised to 0 above.
                    counters[pattern][self.senderlist[i]] += length
        return counters

    def print_patterns_dict(self, pattern_dict):
        for pattern in pattern_dict:
            print pattern
            for s in pattern_dict[pattern]:
                print s, ": ", pattern_dict[pattern][s]
            print ""

    def message_proportions(self):
        senders = self.get_senders()
        counter = dict()
        total = 0
        for i in ["messages", "words", "chars", "qmarks", "media"]:
            counter[i] = dict()
            for s in senders:
                counter[i][s] = 0
        for i in range(len(self.senderlist)):
            counter["messages"][self.senderlist[i]] += 1
            counter["words"][self.senderlist[i]] += \
                len(self.messagelist[i].split(" "))
            counter["chars"][self.senderlist[i]] += len(self.messagelist[i])
            counter["qmarks"][self.senderlist[i]] += self.messagelist[i].count('?')
            counter["media"][self.senderlist[i]] += (
                self.messagelist[i].count('<media omitted>') +
                self.messagelist[i].count('<image omitted>') +
                self.messagelist[i].count('<audio omitted>'))
            total += 1
        counter["total_messages"] = 0
        counter["total_words"] = 0
        counter["total_chars"] = 0
        counter["total_qmarks"] = 0
        counter["total_media"] = 0

        for s in senders:
            counter["total_messages"] += counter["messages"][s]
            counter["total_words"] += counter["words"][s]
            counter["total_chars"] += counter["chars"][s]
            counter["total_qmarks"] += counter["qmarks"][s]
            counter["total_media"] += counter["media"][s]
        return counter

    def average_message_length(self):
        msg_prop = self.message_proportions()
        counter = dict()
        for s in self.get_senders():
            counter[s] = msg_prop["words"][s] / msg_prop["messages"][s]
        return counter

    def most_used_words(self, top=10, threshold=3):
        words = dict()
        for i in range(len(self.messagelist)):
            message_word = self.messagelist[i].split(" ")
            for w in message_word:
                if len(w) > threshold:
                    w = w.decode("utf8")
                    w = w.replace("\r", "")
                    w = w.lower()
                    if w not in words:
                        words[w] = 1
                    else:
                        words[w] += 1
        sorted_words = sorted(words.iteritems(), key=operator.itemgetter(1),
                              reverse=True)
        output = sorted_words[:top]
        return output

def printDict(dic, parent, depth):
    tup = sorted(dic.iteritems(), key=operator.itemgetter(1))
    isLeaf = True
    for key in tup:
        if isinstance(dic[key[0]], dict):
            isLeaf = False
    if isLeaf and depth != 0:
        print " " * (depth - 1) * 2, parent
    for key in tup:
        if isinstance(dic[key[0]], dict):
            printDict(dic[key[0]], key[0], depth + 1)
        else:
            print " " * depth * 2, str(key[0]), "->", dic[key[0]]

def main():
    if len(sys.argv) < 2:
        print "Run: python main.py [regex. patterns]"
        sys.exit(1)
    c = Chat(sys.argv[1])
    c.open_file()
    c.feed_lists()
    output = dict()

    print "\n--PROPORTIONS"
    output["proportions"] = c.message_proportions()
    printDict(output["proportions"], "proportions", 0)

    print "\n--SHIFTS"
    output["shifts"] = c.count_messages_per_shift()
    printDict(output["shifts"], "shifts", 0)

    print "\n--WEEKDAY"
    output["weekdays"] = c.count_messages_per_weekday()
    printDict(output["weekdays"], "weekday", 0)

    print "\n--AVERAGE MESSAGE LENGTH"
    output["lengths"] = c.average_message_length()
    printDict(output["lengths"], "lengths", 0)

    print "\n--PATTERNS"
    output["patterns"] = c.count_messages_pattern(sys.argv[2:])
    printDict(output["patterns"], "patterns", 0)

    print "\n--TOP 15 MOST USED WORDS (length > 3)"
    output["most_used_words"] = c.most_used_words(top=15, threshold=3)
    output["most_used_words"] = sorted(output["most_used_words"], key=operator.itemgetter(1), reverse=True)
    #print output["most_used_words"]
    #for muw in output["most_used_words"]:
    #    print muw[0]

    print "TIMESTAMPS\n %s\n\n" % c.chatTimeList[0:4]
    print "Root Response time sample \n %s...\n" % c.rootResponseTimeList[0:4]
    print "Contact Response time sample \n %s...\n" % c.contactResponseTimeList[0:4]
    print "Root bursts \n %s\n" % c.rootBurstList
    print "Contact bursts \n %s\n" % c.contactBurstList

    # numpy is not imported and responseTimeList is never filled, so take the
    # (upper) median over both response-time lists directly.
    allResponseTimes = sorted(c.rootResponseTimeList + c.contactResponseTimeList)
    if allResponseTimes:
        print "Median response time =%s\n\n" % allResponseTimes[len(allResponseTimes) // 2]

    output["senders"] = c.get_senders()
    #filename = sys.argv[1].split("/")[-1]
    #arq = open("./logs/"+filename+".json", "w")
    #arq = open("filename.json", "w")
    nameTest = sys.argv[1]
    arq = open("C:/Python27/"+nameTest+".json", "w")
    arq.write(json.dumps(output))
    pprint(output)
    arq.close()

    # Only the CSV header is written so far; the data rows are still to do.
    with open('names.csv', 'w') as csvfile:
        fieldnames = ['msgs_root', 'msgs_contact', 'chars_root', 'chars_contact', 'qmarks_root', 'qmarks_contact']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        #writer.writerow({'msgs_root': c.message_proportions , 'last_name': 'Beans'})

main()
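
The CSV step at the end of the script only writes the header. One possible way to fill in a row per chat is sketched below, assuming per-sender counts shaped like the `messages`, `chars` and `qmarks` entries returned by `message_proportions()`. The numbers and the helper names `feature_row` and `write_rows` are invented for illustration, and the sketch targets Python 3 (`io.StringIO` stands in for the output file):

```python
import csv
import io

# Hypothetical counts, shaped like part of message_proportions() output.
proportions = {
    "messages": {"ROOT": 120, "CONTACT": 95},
    "chars": {"ROOT": 4300, "CONTACT": 3900},
    "qmarks": {"ROOT": 14, "CONTACT": 22},
}

# Column names copied from the script above.
FIELDNAMES = ["msgs_root", "msgs_contact", "chars_root", "chars_contact",
              "qmarks_root", "qmarks_contact"]


def feature_row(counts, root="ROOT", contact="CONTACT"):
    """Flatten the nested per-sender counts into one CSV row (a dict)."""
    return {
        "msgs_root": counts["messages"][root],
        "msgs_contact": counts["messages"][contact],
        "chars_root": counts["chars"][root],
        "chars_contact": counts["chars"][contact],
        "qmarks_root": counts["qmarks"][root],
        "qmarks_contact": counts["qmarks"][contact],
    }


def write_rows(handle, rows):
    """Write a header followed by one line per row dict."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


buffer = io.StringIO()
write_rows(buffer, [feature_row(proportions)])
print(buffer.getvalue())
```

With one row per chat, the resulting file can be loaded directly as a data frame for the regression step.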

nmoya · 2015-04-29T05:53:13Z

Hello @nalbanders !

I will add you as a contributor to the repository so that you have write access. Could you please commit your adapted main file with a different name?

Also, something came up tomorrow at 6:30 PM EST. Do you mind changing our call to 4:30 PM EST or 5 PM EST? If that is not possible, it's alright, but I will need to leave at 6:15 PM EST, and we can schedule another call if 45 minutes are not enough. I will be on Skype tomorrow afternoon, so if you arrive earlier we can start earlier; otherwise we keep the original schedule :-)

Also, your graphs did not show up. I was looking forward to seeing them! :(
Great job on the modifications in the main file!

nalbanders · 2015-04-29T05:57:43Z

Ok, will try to call at 5 instead.

Will push to git tomorrow.

Thanks,
A
