Work Steps:

Open a new project in PyCharm.
Select the Python interpreter installed on your device, for example: C:\Users\ahmad\AppData\Local\Programs\Python\Python311
Install PySpark from the terminal: pip install pyspark, then pip install pyspark[sql]
Alternatively, install Spark on your machine as described here: https://sparkbyexamples.com/spark/apache-spark-installation-on-windows/
Install single-node Hadoop v3.3.3 as described here: https://brain-mentors.com/hadoopinstallation/
Install JDK 8 from: https://www.oracle.com/tr/java/technologies/javase/javase8-archive-downloads.html
Add JAVA_HOME and HADOOP_HOME to the environment variables, each set to its installation path.
Add the installation paths to the Path variable as well.
If you use a standalone Spark installation, also add the SPARK_HOME variable and add its path to the Path variable.
Put your .txt files in the txtFiles folder.
Run and test.
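
Before running, the environment-variable steps above can be sanity-checked with a few lines of Python. This is only a sketch and not part of the repository: the helper name `check_env` is hypothetical, and it only checks that each variable is set and points to an existing directory, not that the installation itself works.

```python
import os
from pathlib import Path

# Variables named in the steps above. SPARK_HOME is only needed for a
# standalone Spark installation, so it is left out of the default list.
REQUIRED_VARS = ("JAVA_HOME", "HADOOP_HOME")


def check_env(names=REQUIRED_VARS, env=None):
    """Return a dict mapping each variable name to a short status string."""
    env = os.environ if env is None else env
    report = {}
    for name in names:
        value = env.get(name)
        if not value:
            report[name] = "missing"
        elif not Path(value).is_dir():
            report[name] = "set, but directory not found"
        else:
            report[name] = "ok"
    return report


# Print the status of each variable for the current process.
for name, status in check_env().items():
    print(f"{name}: {status}")
```

With a standalone Spark installation, calling `check_env(REQUIRED_VARS + ("SPARK_HOME",))` would include that variable too.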

Notes:

I tried many times to read the files directly from the zip file, but I ran into a problem after reading the zip file and converting it to a DataFrame. The problem appeared to be related to the Python version, so I tried Python 2.7, 3.6, 3.9 and 3.11, but it still appeared. I therefore worked around it by reading the files from a folder, with the required conditions. The problem can be seen in the Jupyter file.
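
As an illustration of the folder-based approach, here is a minimal sketch, not the code in main.py. The required conditions are not spelled out here, so the sketch assumes the only condition is a `.txt` extension. The names `list_txt_files` and `read_txt_folder` and the `source_file` column are hypothetical.

```python
from pathlib import Path


def list_txt_files(folder):
    """Return the sorted paths of the .txt files directly inside `folder`."""
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".txt"
    )


def read_txt_folder(folder="txtFiles"):
    """Load every .txt file in `folder` into one DataFrame, one row per line."""
    # Imported lazily so list_txt_files can be used without PySpark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import input_file_name

    paths = list_txt_files(folder)
    if not paths:
        raise FileNotFoundError(f"no .txt files found in {folder!r}")

    spark = (
        SparkSession.builder
        .master("local[*]")      # assumption: single-machine local mode
        .appName("txtFiles")
        .getOrCreate()
    )
    # spark.read.text accepts a list of paths; each line lands in column "value".
    df = spark.read.text(paths)
    # Record which file each line came from.
    return df.withColumn("source_file", input_file_name())
```

Calling `read_txt_folder("txtFiles").show(truncate=False)` would then print the lines together with their source file, provided Java and PySpark are set up as in the work steps.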

I also wrote another Python script that reads the files from the zip file, so that its results can be compared with the PySpark results. It is in the file pythonCode.py.
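
For reference, reading text files straight from a zip archive in plain Python can be sketched with the standard `zipfile` module. This is an assumption about the general approach, not the contents of pythonCode.py; the function name `read_txt_from_zip` and the UTF-8 default encoding are hypothetical.

```python
import zipfile


def read_txt_from_zip(zip_path, encoding="utf-8"):
    """Return {member name: list of lines} for every .txt member of the archive."""
    contents = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything that is not a .txt file.
            if info.is_dir() or not info.filename.lower().endswith(".txt"):
                continue
            with archive.open(info) as raw:
                contents[info.filename] = raw.read().decode(encoding).splitlines()
    return contents
```

The resulting dictionary, for example from `read_txt_from_zip("txtFiles.zip")`, can be compared line by line with what the folder-based PySpark run produces.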


mad960/txtFiles_Pyspark
