refactor and fix ParseError
jameno committed Nov 12, 2022
1 parent fd617a5 commit 777f142
Showing 1 changed file with 43 additions and 18 deletions.
61 changes: 43 additions & 18 deletions apple_health_xml_convert.py
@@ -16,34 +16,53 @@
import pandas as pd
import xml.etree.ElementTree as ET
import datetime as dt
import re
import sys


# %% Function Definitions
def pre_process():
"""Pre-processes the XML file by replacing specific bits that would
normally result in a ParseError

def pre_process(xml_string):
"""
The export.xml file is where all your data is, but Apple Health Export has
two main problems that make it difficult to parse:
1. The DTD markup syntax is exported incorrectly by Apple Health for some data types.
2. The invisible character \x0b (sometimes rendered as U+000b) likes to destroy trees. Think of the trees!
print("Pre-processing...", end="")
with open("export.xml") as f:
newText = f.read().replace("\x0b", "")
Knowing this, we can save the trees and pre-processes the XML data to avoid destruction and ParseErrors.
"""

# with open("apple_health_export_2/new_export.xml", "w") as f:
with open("processed_export.xml", "w") as f:
f.write(newText)
print("Pre-processing...", end="")
sys.stdout.flush()

xml_string = strip_dtd(xml_string)
xml_string = strip_invisible_character(xml_string)
print("done!")

return
return xml_string


def strip_invisible_character(xml_string):

return xml_string.replace("\x0b", "")

def convert_xml():

def strip_dtd(xml_string):
start_strip = re.search('<!DOCTYPE', xml_string).span()[0]
end_strip = re.search(']>', xml_string).span()[1]

return xml_string[:start_strip] + xml_string[end_strip:]


def xml_to_csv(xml_string):
"""Loops through the element tree, retrieving all objects, and then
combining them together into a dataframe
"""

print("Converting XML File...", end="")
etree = ET.parse("processed_export.xml")
print("Converting XML File to CSV...", end="")
sys.stdout.flush()

etree = ET.ElementTree(ET.fromstring(xml_string))

attribute_list = []

@@ -78,13 +97,16 @@ def convert_xml():

# Add loop specific column ordering if metadata entries exist
if 'com.loopkit.InsulinKit.MetadataKeyProgrammedTempBasalRate' in original_cols:
shifted_cols.append('com.loopkit.InsulinKit.MetadataKeyProgrammedTempBasalRate')
shifted_cols.append(
'com.loopkit.InsulinKit.MetadataKeyProgrammedTempBasalRate')

if 'com.loopkit.InsulinKit.MetadataKeyScheduledBasalRate' in original_cols:
shifted_cols.append('com.loopkit.InsulinKit.MetadataKeyScheduledBasalRate')
shifted_cols.append(
'com.loopkit.InsulinKit.MetadataKeyScheduledBasalRate')

if 'com.loudnate.CarbKit.HKMetadataKey.AbsorptionTimeMinutes' in original_cols:
shifted_cols.append('com.loudnate.CarbKit.HKMetadataKey.AbsorptionTimeMinutes')
shifted_cols.append(
'com.loudnate.CarbKit.HKMetadataKey.AbsorptionTimeMinutes')

remaining_cols = list(set(original_cols) - set(shifted_cols))
reordered_cols = shifted_cols + remaining_cols
@@ -100,6 +122,8 @@ def convert_xml():

def save_to_csv(health_df):
print("Saving CSV file...", end="")
sys.stdout.flush()

today = dt.datetime.now().strftime('%Y-%m-%d')
health_df.to_csv("apple_health_export_" + today + ".csv", index=False)
print("done!")
@@ -108,8 +132,9 @@ def save_to_csv(health_df):


def main():
pre_process()
health_df = convert_xml()
xml_string = open("export.xml").read()
xml_string = pre_process(xml_string)
health_df = xml_to_csv(xml_string)
save_to_csv(health_df)

return
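
One edge case in the new `strip_dtd`: it calls `.span()` directly on the result of `re.search`, so it raises `AttributeError` when the input has no `<!DOCTYPE` block (for example, a file that was already pre-processed). The following is a minimal sketch of a more forgiving variant, not the code in this commit. The names `DOCTYPE_BLOCK` and `strip_dtd_safe` are hypothetical, and the pattern assumes the DTD is written as a single `<!DOCTYPE ... [ ... ]>` block.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: a DOCTYPE declaration with an internal subset,
# matched non-greedily up to the first "]>" that follows it.
DOCTYPE_BLOCK = re.compile(r"<!DOCTYPE.*?\]>", re.DOTALL)


def strip_dtd_safe(xml_string):
    """Remove the first DOCTYPE block; return the input unchanged if none exists."""
    return DOCTYPE_BLOCK.sub("", xml_string, count=1)


def strip_invisible_character(xml_string):
    """Drop the vertical-tab character (\\x0b), as in the commit."""
    return xml_string.replace("\x0b", "")


def parse_export(xml_string):
    """Pre-process the raw text, then parse it into an Element."""
    cleaned = strip_invisible_character(strip_dtd_safe(xml_string))
    return ET.fromstring(cleaned)
```

With this variant, text that has no DTD passes through untouched instead of raising an exception.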

8 comments on commit 777f142

@sasmitaleela

@sasmitaleela sasmitaleela commented on 777f142 Jan 14, 2023

Hi,
could you please help me? I tried, but I am still unable to convert the file; it still shows an error. https://colab.research.google.com/drive/1xEcRQPYAh3sRbs5DnjgJyW_5uWe2R-SD?usp=sharing

@jameno
Owner Author

@jameno jameno commented on 777f142 Jan 14, 2023

> Hi, could you please help me? I tried, but I am still unable to convert the file; it still shows an error. https://colab.research.google.com/drive/1xEcRQPYAh3sRbs5DnjgJyW_5uWe2R-SD?usp=sharing

Hi @sasmitaleela!

Sure, I can see in your Colab notebook that you are trying to parse the export_cda.xml file, but your data is in the export.xml file. Try using that file instead, and let me know if that resolves your issue!

I've updated the README to make this a little clearer.

@sasmitaleela

@sasmitaleela sasmitaleela commented on 777f142 Jan 15, 2023 via email

@sasmitaleela

@sasmitaleela sasmitaleela commented on 777f142 Jan 15, 2023 via email

@jameno
Owner Author

@jameno jameno commented on 777f142 Jan 15, 2023

> I did as you said, still error. Please check.
>
> On Sun, Jan 15, 2023 at 8:22 AM sasmita panda wrote:
>
> > Hi, Thank you so much for the reply. I will try again and let you know.

Thanks for giving that a try!

It looks like the error is

ParseError: duplicate attribute: line 1057598, column 109

This means that there is some duplicate attribute on this line in your file. For example, it might look something like this: <Record type="HKSomeRecordType" sourceName="My Apple Watch" sourceVersion="7.0.3" unit="ms" value="39.8081" value="39.8081"> where the value attribute is duplicated.

You can open your export.xml file in a text editor such as Sublime Text or VS Code, which lets you jump to a specific line. Then you can find the duplication and remove it.
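
For a file with over a million lines, scanning for the problem programmatically may be easier than scrolling. The sketch below is a rough illustration, not part of the script. `duplicate_attributes` and `find_duplicate_attribute_lines` are hypothetical helper names, and the regex assumes double-quoted attributes with one element per line, as in the example above. Scanning the whole file also sidesteps a possible mismatch in line numbers: the script strips the DTD block before parsing, so the line number in the error may refer to the pre-processed text rather than to export.xml itself.

```python
import re
from collections import Counter

# Matches name="value" pairs and captures only the attribute name.
ATTRIBUTE_NAME = re.compile(r'([\w:.-]+)\s*=\s*"[^"]*"')


def duplicate_attributes(line):
    """Return the attribute names that occur more than once in one line."""
    counts = Counter(ATTRIBUTE_NAME.findall(line))
    return [name for name, n in counts.items() if n > 1]


def find_duplicate_attribute_lines(lines):
    """Yield (line_number, duplicated_names) for each offending line.

    `lines` can be any iterable of strings, including an open file.
    Line numbers are 1-indexed.
    """
    for line_number, line in enumerate(lines, start=1):
        names = duplicate_attributes(line)
        if names:
            yield line_number, names


# Possible usage against the real export (path is an assumption):
#
# with open("export.xml", encoding="utf-8") as f:
#     for line_number, names in find_duplicate_attribute_lines(f):
#         print(line_number, names)
```

A line that holds several elements could produce false positives here, so treat each reported line as a candidate to inspect by hand rather than as a confirmed error.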

@sasmitaleela

@sasmitaleela sasmitaleela commented on 777f142 Jan 16, 2023 via email

@jameno
Owner Author

@jameno jameno commented on 777f142 Jan 16, 2023

@sasmitaleela

Sorry to hear you're still having issues!

I'm happy to schedule some time to help out. GitHub will censor our email addresses, so we'll have to connect elsewhere. Perhaps try adding me on LinkedIn, and we can try to troubleshoot from there: https://www.linkedin.com/in/jason-meno/

@jameno
Owner Author

@jameno jameno commented on 777f142 Jan 16, 2023

@sasmitaleela

If you haven't tried it already, you could also look into https://www.ericwolter.com/projects/apple-health-export/ to see whether that tool works for you.
