Hipi on Spark #31

Open
sdikby opened this issue Nov 3, 2016 · 11 comments

Comments

@sdikby

sdikby commented Nov 3, 2016

Dear HIPI developers,

Do you plan to integrate Apache Spark in place of the old MapReduce? If so, when?
Otherwise, could you give me some hints on how to do it?
My use case is that I need to classify millions of images, and MapReduce will not be as efficient as I need it to be.
@sweeneychris @liuliu @voigtlandier @zverham @hafnium

@yangboz

yangboz commented Nov 4, 2016

@sdikby have you successfully run HIPI's hibImport.sh on millions of images?

@sdikby
Author

sdikby commented Nov 13, 2016

@yangboz sorry for the delay.
No, I haven't even started using HIPI. My use case is to process millions of images in Hadoop, but I don't think MapReduce is performant enough, or that it is even possible with HIPI, since it has not been maintained for about a year now (the last commit was on 12 April).

@yangboz

yangboz commented Nov 14, 2016

@sdikby thanks for your reply. I totally agree with your comment about the lack of updates to the HIPI source code; I also found code issue #30, which has had no response.
By the way, apart from HIPI, are there any other Hadoop sequence-file solutions for millions of image files?

@sdikby
Author

sdikby commented Nov 14, 2016

@yangboz I know of two other tools for image processing, but I haven't tried them yet (I just began my master's thesis :) ).
There is Mipr: https://github.com/sozykin/mipr
and this one: https://github.com/okstate-robotics/hipl
Both are also based on MapReduce.
Otherwise I don't know the difference between them. Feel free to test them; I would be happy to get feedback from you.

@yangboz

yangboz commented Nov 14, 2016

@sdikby thanks for the suggestions, I will try them. My ideas come from:
http://dinesh-malav.blogspot.com/2015/05/image-processing-using-opencv-on-hadoop.html
It is a great tutorial on CDH (MR1) + HIPI v1 + Ant, but nowadays HIPI uses gradlew and is at version 2+, which is why I am struggling with code base modifications.

@sdikby
Author

sdikby commented Nov 14, 2016

@yangboz it would also be great to know how the three tools/frameworks store images on HDFS (to deal with the block-size problem, for example) and the main differences between them (read/write performance from/into HDFS).
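
For readers unfamiliar with the block-size problem mentioned here: HDFS is designed around large blocks (typically 64 or 128 MB), so millions of individual image files cost NameNode metadata and, under MapReduce, tend to produce one map task per file. A common workaround is to pack many small images into one large file and keep an index of offsets. The Python sketch below illustrates only that general idea. It does not show how HIPI, Mipr, or hipl actually lay out their data; the names `pack_images` and `read_image` and the length-prefix layout are assumptions made up for this example.

```python
import io
import struct


def pack_images(images, data_stream):
    """Append each (name, payload) pair to data_stream and return an index.

    The index maps name -> (offset, length), so a reader can seek straight
    to one image without scanning the whole bundle. (Hypothetical layout.)
    """
    index = {}
    for name, payload in images:
        offset = data_stream.tell()
        # 4-byte big-endian length prefix, followed by the raw image bytes.
        data_stream.write(struct.pack(">I", len(payload)))
        data_stream.write(payload)
        index[name] = (offset, len(payload))
    return index


def read_image(data_stream, index, name):
    """Return the bytes of one image using the offset stored in the index."""
    offset, length = index[name]
    data_stream.seek(offset)
    (stored_length,) = struct.unpack(">I", data_stream.read(4))
    if stored_length != length:
        raise ValueError("index and bundle disagree for " + name)
    return data_stream.read(length)


# In-memory stand-in for one large file on HDFS; the payloads are dummy bytes.
bundle = io.BytesIO()
images = [
    ("a.jpg", b"\xff\xd8" + b"a" * 10),
    ("b.jpg", b"\xff\xd8" + b"b" * 20),
]
index = pack_images(images, bundle)
```

With a layout like this, the bundle can grow to many blocks while the file system tracks a single file, and the index lets a job split the work by ranges of records rather than by file.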

@yangboz

yangboz commented Nov 14, 2016

@sdikby before those three tools/frameworks, the existing solutions I studied were Ceph and even Cassandra image blob storage. Conclusions will be coming soon.

@yangboz

yangboz commented Nov 28, 2016

@sdikby comparing Mipr: https://github.com/sozykin/mipr (full documentation, and the code example passed)
with this one: https://github.com/okstate-robotics/hipl (documentation is missing!)

@sdikby
Author

sdikby commented Nov 28, 2016

@yangboz oh, good job! And what about performance? Did you compare the two in terms of images written/read per second?
And how do they both store images on HDFS, especially how do they deal with the block-size problem?

@yangboz

yangboz commented Dec 5, 2016

@sdikby there is a paper (please drop me a line if you need it) comparing Hadoop and Spark performance, including indexing and retrieval.
According to its results, integrating Hadoop and Spark to process 160k pictures on a 30-node cluster improves efficiency.

@sdikby
Author

sdikby commented Dec 9, 2016

@yangboz could you please send me this paper?
I plan to run a performance test of the three frameworks mentioned above in the coming months.


2 participants