PR to compare original caffe branch (master) with OpenCL port branch (dev) #9

Status: Open. Wants to merge 139 commits into base: master.

Changes from all commits (139 commits)
d100af9
This is yibing's first patch. Removed all cuda files and added device…
Jul 10, 2015
3965d0c
Synced memory changes
Jul 11, 2015
8a7c2b2
update data layer for AMD_PERSISTENT_MEM
Jul 11, 2015
13cd87f
add Forward_gpu and Backward_gpu for more layers; update math functio…
Jul 12, 2015
8e07135
Minor update to syncedmem.hpp
Jul 12, 2015
622a9bc
This patch debugged data layer, added ocl/util, etc. made run cpu ale…
Jul 13, 2015
1a45bf1
Conv layer FP and BP logic ported. Baseline scheme
Jul 15, 2015
c0ff752
Debugging layer. Not much change for layers
Jul 16, 2015
10f731b
OpenCL porting for relu, softmax layer
Jul 23, 2015
fc4fa9b
OpenCL porting of pooling layer
Jul 16, 2015
e5dc1d7
OpenCL porting of LRN layers and inner-product layer; fixed some bugs…
Jul 18, 2015
5eeeb29
This patch has conv_org, relu, pooling, fc OpenCL porting and correct
Jul 26, 2015
b72ad4d
Port the softmax layer
Jul 31, 2015
7566846
cleanup the kernel interface of conv, relu and dropout
Jul 31, 2015
77c7cb9
minor fix
Aug 2, 2015
102649b
Ported optimized scheme for conv layer
Aug 2, 2015
b09b8a4
added Makefile.config, as needed by a fresh git clone; then make all
Aug 2, 2015
270a7d9
fix some merge bugs; add ./models into git
Aug 2, 2015
3e3fb86
cleaned up sgemm_ex interfaces; re-organized the relu layer kernel an…
Aug 3, 2015
1825769
cleaning up the conv opt interfaces
Aug 3, 2015
0bf2479
conv opt cleaning up cont.
Aug 3, 2015
9179475
Merge branch 'master' of https://github.com/gujunli/caffe-merge
Aug 3, 2015
f3cd448
conv opt forward done
Aug 3, 2015
e95fd84
conv opt backward interfaces
Aug 4, 2015
2fdb29a
finished debugging for conv optimized scheme
Aug 4, 2015
9a41670
minor change
Aug 4, 2015
472b84a
conv layer clean up
Aug 5, 2015
649b3ab
fixed the bug in syncedmem set_cpu_data
Aug 5, 2015
b204a85
gconv opt debug
Aug 6, 2015
4c4b9d3
Split OpenCL kernels into different files
Aug 7, 2015
858b082
Created global kernel map
Aug 8, 2015
6934793
ocl wrappers get kernel from the map @amdDevice.Kernels instead of pa…
Aug 8, 2015
b904fcf
clean the ocl wrappers in conv_layer; check the type of files to be b…
Aug 9, 2015
77c1824
fixed some kernel name errors and re-organized the kernel files
Aug 9, 2015
cdd4d9d
add AMD's license
Aug 10, 2015
ed958d8
This is a test layer
Aug 25, 2015
a45174c
modified the packing number
Aug 25, 2015
5822b93
remove all cuda related flags in Makefile
Aug 27, 2015
4e424b4
modified the packing number
Aug 25, 2015
02762d4
add clFinish in test
Aug 27, 2015
34401f6
fix cmake
Aug 27, 2015
bad971f
Merge branch 'dev' of https://github.com/gujunli/OpenCL-CAFFE-upstrea…
Aug 27, 2015
dfa3955
Remove cuda related code
Aug 28, 2015
415f603
add FindOpenCL and FindclBLAS in cmake/
Aug 30, 2015
17104ed
Fixed conv layers opt2 bug
Sep 1, 2015
33b8282
conv clean up
Sep 1, 2015
40cdc3e
removed all cuDNN files
Sep 1, 2015
b6b96a7
Removed forward_opt and backward_opt functions in conv layer
Sep 1, 2015
20142c4
Enable SetDevice function; clean the code in device.cpp
Sep 3, 2015
b804a1d
Fixed conv layers opt2 bug
Sep 1, 2015
79e246a
conv clean up
Sep 1, 2015
1958793
removed all cuDNN files
Sep 1, 2015
5c66e9b
Removed forward_opt and backward_opt functions in conv layer
Sep 1, 2015
7474975
fixed merge conflicts
Sep 3, 2015
6a6db93
fixed merge conflicts
Sep 3, 2015
bac49c8
Merge branch 'dev' of /home/yugao/caffe-merge-junli/temp/../upstream_…
Sep 4, 2015
1e07fb5
Merge https://github.com/gujunli/OpenCL-caffe-upstream-test into dev
Sep 4, 2015
8469f86
clean up warning info
Sep 4, 2015
9cf71bb
Remove the annotation code
Sep 4, 2015
dce5407
Partially get through unit test
Sep 4, 2015
0eaad4e
Merge branch 'dev' of https://github.com/gujunli/OpenCL-caffe-upstrea…
Sep 4, 2015
c9b345f
fixed the random seed
Sep 4, 2015
fd4441c
Clean up the last two warnings
Sep 6, 2015
f96ca76
Clean up the last two warnings
Sep 6, 2015
5698e3c
Ported hdf5_data hdf5_output log and mvn layer
Sep 6, 2015
0ccf658
Port absval_layer bnll_layer concat_layer contrastive_loss_layer deco…
Sep 6, 2015
dfc6cb1
Fix some bugs in layers' porting
Sep 6, 2015
84d80c2
Ignore log dir
Sep 7, 2015
097f69c
ported new layers
Sep 8, 2015
8266a0a
Made my own last porting layers go through unit test
Sep 8, 2015
c37410b
fix bug in PReLU layer
Sep 9, 2015
454d676
modify conv layers
Sep 9, 2015
c8e5b9f
Pass HDF5 layers unit test
Sep 9, 2015
8f700e8
minor fix
Sep 9, 2015
8166acf
Format the code
Sep 9, 2015
f931d4c
Pass concat_layer & spp_layer; remove kernels in lrn_layer
Sep 9, 2015
432dd92
Fix the bug that CPU mode cannot run
Sep 9, 2015
e45e900
update Readme and License file
Sep 9, 2015
b14dac2
update Readme and License file
Sep 9, 2015
9a78a23
minor fix
Sep 9, 2015
b5792c3
Update README.md
gujunli Sep 9, 2015
947aa9a
Update README.md
gujunli Sep 9, 2015
51872ff
Update README.md
gujunli Sep 9, 2015
af514ad
Update README.md
gujunli Sep 9, 2015
49ecf7c
Update README.md
gujunli Sep 9, 2015
dc1f82a
Update README.md
gujunli Sep 9, 2015
15e5dc5
Update README.md
gujunli Sep 9, 2015
4036485
Update README.md
gujunli Sep 9, 2015
20b4a89
Adjust the code style
Sep 10, 2015
1e1bcd2
Update README.md
gujunli Sep 10, 2015
a8cb6de
Update README.md
gujunli Sep 10, 2015
d5cdc7a
Update README.md
gujunli Sep 10, 2015
915fe5c
Update README.md
gujunli Sep 10, 2015
2ea8289
Update README.md
gujunli Sep 10, 2015
ef00e37
Update README.md
gujunli Sep 10, 2015
ce44b9e
Update README.md
gujunli Sep 10, 2015
44f67c1
Fixed the bug in kernel_channel_sum(), passed through softmaxwithloss,…
Sep 10, 2015
f8fb6d3
Update README.md
gujunli Sep 10, 2015
1fa4473
merge fix
Sep 10, 2015
fe779df
Added rng_uniform rng_gaussian
Sep 10, 2015
e1bdcb7
Merge Maohua's patch: add rng_gaussian rng_uniform
Sep 10, 2015
e42eeae
fix a template error in random.cl
Sep 10, 2015
900beb8
Add uint random generator
Sep 10, 2015
4adb3d2
Update README.md
gujunli Sep 10, 2015
d2a24e6
Update README.md
gujunli Sep 10, 2015
280f813
add notation
Sep 10, 2015
30d5f21
Adjust the indent
Sep 11, 2015
0ba689a
Merge https://github.com/amd/OpenCL-caffe into dev
Sep 11, 2015
ae39d5d
Passed dropout unit test
Sep 11, 2015
ab502d9
fixed merge problem
Sep 12, 2015
a92c674
Merge https://github.com/amd/OpenCL-caffe into dev
Sep 13, 2015
cb7cd7b
removed cmakefiles from git repo
Sep 13, 2015
04d42ec
Enable CPU_ONLY flag
Sep 13, 2015
fd94a96
add find path for AMDAPPSDK3.0 and added src/caffe/CMakeLists.txt
Sep 14, 2015
9d2ba11
Merge branch 'dev' of https://github.com/amd/OpenCL-caffe into dev
Sep 14, 2015
1a2f022
Add the change in tools/
Sep 14, 2015
a3d5b15
Clean test code
Sep 14, 2015
aef701c
Passed PReLU layer's unit test
Sep 15, 2015
c1102c3
Passed through Slice layer
Sep 15, 2015
8b433d1
Passed through Im2col_layer test
Sep 16, 2015
1ec6e88
Update README.md
gujunli Sep 16, 2015
c5eeb40
Tested reduction_layer & deconv_layer
Sep 16, 2015
2711a9c
Merge https://github.com/amd/OpenCL-caffe into dev
Sep 16, 2015
6a46781
Update README.md
gujunli Sep 16, 2015
b0cb051
update gitignore
Sep 16, 2015
e7db7b1
untrack auto-generated src/caffe/test/CMakeFiles
Sep 16, 2015
df57731
removed unnecessary cmake files
Sep 16, 2015
6b9fe7a
integrate Mauricio's code review suggestions
Sep 17, 2015
efd5dba
Update README.md
gujunli Sep 17, 2015
660df23
comment on where the code is modified for OpenCL port
Sep 18, 2015
ab0b360
Go through 1x1 convolution
Sep 18, 2015
65adc8d
Merge https://github.com/amd/OpenCL-caffe into dev
Sep 18, 2015
3acadc0
Go through conv_layer
Sep 18, 2015
606117d
Go through GPUMathFunctionsTest
Sep 18, 2015
ecbd837
fixed im2col_opt parameters
Sep 19, 2015
7ac0a96
fixed im2col
Sep 19, 2015
a894a29
direct is_1_1 conv to original scheme
Sep 19, 2015
1511d4e
Removed unused variable in base_conv_layer
Sep 19, 2015
3318335
pass col2im_opt unit test
Sep 20, 2015
4 changes: 4 additions & 0 deletions .gitignore
@@ -91,3 +91,7 @@ LOCK
LOG*
CURRENT
MANIFEST-*

#cmakefiles
src/caffe/test/CMakeFiles
src/caffe/CMakeFiles
6 changes: 6 additions & 0 deletions LICENSE
@@ -42,3 +42,9 @@ CONTRIBUTION AGREEMENT
By contributing to the BVLC/caffe repository through pull-request, comment,
or otherwise, the contributor releases their content to the
license and copyright terms herein.

AMD license on the OpenCL parts

AMD holds the license for the OpenCL-related code, kernels, and optimizations.
The AMD license is added to each file, or part of a file, written by AMD.
For details, please see the license declaration in each individual file.
96 changes: 36 additions & 60 deletions Makefile
@@ -38,13 +38,10 @@ DYNAMIC_NAME := $(LIB_BUILD_DIR)/lib$(PROJECT).so
##############################
# CXX_SRCS are the source files excluding the test ones.
CXX_SRCS := $(shell find src/$(PROJECT) ! -name "test_*.cpp" -name "*.cpp")
# CU_SRCS are the cuda source files
CU_SRCS := $(shell find src/$(PROJECT) ! -name "test_*.cu" -name "*.cu")
# TEST_SRCS are the test source files
TEST_MAIN_SRC := src/$(PROJECT)/test/test_caffe_main.cpp
TEST_SRCS := $(shell find src/$(PROJECT) -name "test_*.cpp")
TEST_SRCS := $(filter-out $(TEST_MAIN_SRC), $(TEST_SRCS))
TEST_CU_SRCS := $(shell find src/$(PROJECT) -name "test_*.cu")
GTEST_SRC := src/gtest/gtest-all.cpp
# TOOL_SRCS are the source files for the tool binaries
TOOL_SRCS := $(shell find tools -name "*.cpp")
@@ -68,7 +65,7 @@ NONGEN_CXX_SRCS := $(shell find \
matlab/+$(PROJECT)/private \
examples \
tools \
-name "*.cpp" -or -name "*.hpp" -or -name "*.cu" -or -name "*.cuh")
-name "*.cpp" -or -name "*.hpp")
LINT_SCRIPT := scripts/cpp_lint.py
LINT_OUTPUT_DIR := $(BUILD_DIR)/.lint
LINT_EXT := lint.txt
@@ -103,34 +100,29 @@ PROTO_GEN_PY := $(foreach file,${PROTO_SRCS:.proto=_pb2.py}, \
# These objects will be linked into the final shared library, so we
# exclude the tool, example, and test objects.
CXX_OBJS := $(addprefix $(BUILD_DIR)/, ${CXX_SRCS:.cpp=.o})
CU_OBJS := $(addprefix $(BUILD_DIR)/cuda/, ${CU_SRCS:.cu=.o})
PROTO_OBJS := ${PROTO_GEN_CC:.cc=.o}
OBJS := $(PROTO_OBJS) $(CXX_OBJS) $(CU_OBJS)
OBJS := $(PROTO_OBJS) $(CXX_OBJS)
# tool, example, and test objects
TOOL_OBJS := $(addprefix $(BUILD_DIR)/, ${TOOL_SRCS:.cpp=.o})
TOOL_BUILD_DIR := $(BUILD_DIR)/tools
TEST_CXX_BUILD_DIR := $(BUILD_DIR)/src/$(PROJECT)/test
TEST_CU_BUILD_DIR := $(BUILD_DIR)/cuda/src/$(PROJECT)/test
TEST_CXX_OBJS := $(addprefix $(BUILD_DIR)/, ${TEST_SRCS:.cpp=.o})
TEST_CU_OBJS := $(addprefix $(BUILD_DIR)/cuda/, ${TEST_CU_SRCS:.cu=.o})
TEST_OBJS := $(TEST_CXX_OBJS) $(TEST_CU_OBJS)
TEST_OBJS := $(TEST_CXX_OBJS)
GTEST_OBJ := $(addprefix $(BUILD_DIR)/, ${GTEST_SRC:.cpp=.o})
EXAMPLE_OBJS := $(addprefix $(BUILD_DIR)/, ${EXAMPLE_SRCS:.cpp=.o})
# Output files for automatic dependency generation
DEPS := ${CXX_OBJS:.o=.d} ${CU_OBJS:.o=.d} ${TEST_CXX_OBJS:.o=.d} \
${TEST_CU_OBJS:.o=.d} $(BUILD_DIR)/${MAT$(PROJECT)_SO:.$(MAT_SO_EXT)=.d}
DEPS := ${CXX_OBJS:.o=.d} ${TEST_CXX_OBJS:.o=.d} \
$(BUILD_DIR)/${MAT$(PROJECT)_SO:.$(MAT_SO_EXT)=.d}
# tool, example, and test bins
TOOL_BINS := ${TOOL_OBJS:.o=.bin}
EXAMPLE_BINS := ${EXAMPLE_OBJS:.o=.bin}
# symlinks to tool bins without the ".bin" extension
TOOL_BIN_LINKS := ${TOOL_BINS:.bin=}
# Put the test binaries in build/test for convenience.
TEST_BIN_DIR := $(BUILD_DIR)/test
TEST_CU_BINS := $(addsuffix .testbin,$(addprefix $(TEST_BIN_DIR)/, \
$(foreach obj,$(TEST_CU_OBJS),$(basename $(notdir $(obj))))))
TEST_CXX_BINS := $(addsuffix .testbin,$(addprefix $(TEST_BIN_DIR)/, \
$(foreach obj,$(TEST_CXX_OBJS),$(basename $(notdir $(obj))))))
TEST_BINS := $(TEST_CXX_BINS) $(TEST_CU_BINS)
TEST_BINS := $(TEST_CXX_BINS)
# TEST_ALL_BIN is the test binary that links caffe dynamically.
TEST_ALL_BIN := $(TEST_BIN_DIR)/test_all.testbin

@@ -139,35 +131,45 @@
##############################
WARNS_EXT := warnings.txt
CXX_WARNS := $(addprefix $(BUILD_DIR)/, ${CXX_SRCS:.cpp=.o.$(WARNS_EXT)})
CU_WARNS := $(addprefix $(BUILD_DIR)/cuda/, ${CU_SRCS:.cu=.o.$(WARNS_EXT)})
TOOL_WARNS := $(addprefix $(BUILD_DIR)/, ${TOOL_SRCS:.cpp=.o.$(WARNS_EXT)})
EXAMPLE_WARNS := $(addprefix $(BUILD_DIR)/, ${EXAMPLE_SRCS:.cpp=.o.$(WARNS_EXT)})
TEST_WARNS := $(addprefix $(BUILD_DIR)/, ${TEST_SRCS:.cpp=.o.$(WARNS_EXT)})
TEST_CU_WARNS := $(addprefix $(BUILD_DIR)/cuda/, ${TEST_CU_SRCS:.cu=.o.$(WARNS_EXT)})
ALL_CXX_WARNS := $(CXX_WARNS) $(TOOL_WARNS) $(EXAMPLE_WARNS) $(TEST_WARNS)
ALL_CU_WARNS := $(CU_WARNS) $(TEST_CU_WARNS)
ALL_WARNS := $(ALL_CXX_WARNS) $(ALL_CU_WARNS)
ALL_WARNS := $(ALL_CXX_WARNS)

EMPTY_WARN_REPORT := $(BUILD_DIR)/.$(WARNS_EXT)
NONEMPTY_WARN_REPORT := $(BUILD_DIR)/$(WARNS_EXT)

##############################
# Derive include and lib directories
##############################
CUDA_INCLUDE_DIR := $(CUDA_DIR)/include
#################################
# OpenCL include and library
#################################
OCL_INCLUDE_DIR := $(OCL_DIR)/include
CLBLAS_INCLUDE_DIR := ${CLBLAS_DIR}/include

OCL_LIB_DIR :=
CLBLAS_LIB_DIR :=
# add <OCL>/lib/x86_64 only if it exists
ifneq ("$(wildcard $(OCL_LIB_DIR)/lib/x86_64)","")
OCL_LIB_DIR += $(OCL_DIR)/lib/x86_64
endif
OCL_LIB_DIR += $(OCL_DIR)/lib/x86

# add <CLBLAS_DIR>/lib/ only if it exists
ifneq ("$(wildcard $(CLBLAS_DIR)/lib)","")
CLBLAS_LIB_DIR += $(CLBLAS_DIR)/lib
endif

CUDA_LIB_DIR :=
# add <cuda>/lib64 only if it exists
ifneq ("$(wildcard $(CUDA_DIR)/lib64)","")
CUDA_LIB_DIR += $(CUDA_DIR)/lib64
# add <CLBLAS_DIR>/lib64/ only if it exists
ifneq ("$(wildcard $(CLBLAS_DIR)/lib64)","")
CLBLAS_LIB_DIR += $(CLBLAS_DIR)/lib64
endif
CUDA_LIB_DIR += $(CUDA_DIR)/lib

INCLUDE_DIRS += $(BUILD_INCLUDE_DIR) ./src ./include
ifneq ($(CPU_ONLY), 1)
INCLUDE_DIRS += $(CUDA_INCLUDE_DIR)
LIBRARY_DIRS += $(CUDA_LIB_DIR)
LIBRARIES := cudart cublas curand
INCLUDE_DIRS += $(OCL_INCLUDE_DIR) $(CLBLAS_INCLUDE_DIR)
LIBRARY_DIRS += $(OCL_LIB_DIR) $(CLBLAS_LIB_DIR)
LIBRARIES += OpenCL clBLAS

endif
LIBRARIES += glog gflags protobuf leveldb snappy \
lmdb boost_system hdf5_hl hdf5 m \
@@ -187,7 +189,6 @@ ifneq ($(strip $(DISTRIBUTE_DIR)),distribute)
endif

ALL_BUILD_DIRS := $(sort $(BUILD_DIR) $(addprefix $(BUILD_DIR)/, $(SRC_DIRS)) \
$(addprefix $(BUILD_DIR)/cuda/, $(SRC_DIRS)) \
$(LIB_BUILD_DIR) $(TEST_BIN_DIR) $(PY_PROTO_BUILD_DIR) $(LINT_OUTPUT_DIR) \
$(DISTRIBUTE_SUBDIRS) $(PROTO_BUILD_INCLUDE_DIR))

@@ -206,7 +207,7 @@ DOXYGEN_SOURCES := $(shell find \
matlab/ \
examples \
tools \
-name "*.cpp" -or -name "*.hpp" -or -name "*.cu" -or -name "*.cuh" -or \
-name "*.cpp" -or -name "*.hpp" -or \
-name "*.py" -or -name "*.m")
DOXYGEN_SOURCES += $(DOXYGEN_CONFIG_FILE)

@@ -242,13 +243,8 @@ endif
ifeq ($(OSX), 1)
CXX := /usr/bin/clang++
ifneq ($(CPU_ONLY), 1)
CUDA_VERSION := $(shell $(CUDA_DIR)/bin/nvcc -V | grep -o 'release \d' | grep -o '\d')
ifeq ($(shell echo | awk '{exit $(CUDA_VERSION) < 7.0;}'), 1)
CXXFLAGS += -stdlib=libstdc++
LINKFLAGS += -stdlib=libstdc++
endif
# clang throws this warning for cuda headers
WARNINGS += -Wno-unneeded-internal-declaration
# todo
#############
endif
# gtest needs to use its own tuple to not conflict with clang
COMMON_FLAGS += -DGTEST_USE_OWN_TR1_TUPLE=1
@@ -284,12 +280,6 @@ else
COMMON_FLAGS += -DNDEBUG -O2
endif

# cuDNN acceleration configuration.
ifeq ($(USE_CUDNN), 1)
LIBRARIES += cudnn
COMMON_FLAGS += -DUSE_CUDNN
endif

# CPU-only configuration
ifeq ($(CPU_ONLY), 1)
OBJS := $(PROTO_OBJS) $(CXX_OBJS)
@@ -374,7 +364,7 @@ PYTHON_LDFLAGS := $(LDFLAGS) $(foreach library,$(PYTHON_LIBRARIES),-l$(library))
#
# * Recursive with the exception that symbolic links are never followed, per the
# default behavior of 'find'.
SUPERCLEAN_EXTS := .so .a .o .bin .testbin .pb.cc .pb.h _pb2.py .cuo
SUPERCLEAN_EXTS := .so .a .o .bin .testbin .pb.cc .pb.h _pb2.py

# Set the sub-targets of the 'everything' target.
EVERYTHING_TARGETS := all py$(PROJECT) test warn lint
@@ -525,26 +515,12 @@ $(PROTO_BUILD_DIR)/%.pb.o: $(PROTO_BUILD_DIR)/%.pb.cc $(PROTO_GEN_HEADER) \
|| (cat $@.$(WARNS_EXT); exit 1)
@ cat $@.$(WARNS_EXT)

$(BUILD_DIR)/cuda/%.o: %.cu | $(ALL_BUILD_DIRS)
@ echo NVCC $<
$(Q)$(CUDA_DIR)/bin/nvcc $(NVCCFLAGS) $(CUDA_ARCH) -M $< -o ${@:.o=.d} \
-odir $(@D)
$(Q)$(CUDA_DIR)/bin/nvcc $(NVCCFLAGS) $(CUDA_ARCH) -c $< -o $@ 2> $@.$(WARNS_EXT) \
|| (cat $@.$(WARNS_EXT); exit 1)
@ cat $@.$(WARNS_EXT)

$(TEST_ALL_BIN): $(TEST_MAIN_SRC) $(TEST_OBJS) $(GTEST_OBJ) \
| $(DYNAMIC_NAME) $(TEST_BIN_DIR)
@ echo CXX/LD -o $@ $<
$(Q)$(CXX) $(TEST_MAIN_SRC) $(TEST_OBJS) $(GTEST_OBJ) \
-o $@ $(LINKFLAGS) $(LDFLAGS) -l$(PROJECT) -Wl,-rpath,$(ORIGIN)/../lib

$(TEST_CU_BINS): $(TEST_BIN_DIR)/%.testbin: $(TEST_CU_BUILD_DIR)/%.o \
$(GTEST_OBJ) | $(DYNAMIC_NAME) $(TEST_BIN_DIR)
@ echo LD $<
$(Q)$(CXX) $(TEST_MAIN_SRC) $< $(GTEST_OBJ) \
-o $@ $(LINKFLAGS) $(LDFLAGS) -l$(PROJECT) -Wl,-rpath,$(ORIGIN)/../lib

$(TEST_CXX_BINS): $(TEST_BIN_DIR)/%.testbin: $(TEST_CXX_BUILD_DIR)/%.o \
$(GTEST_OBJ) | $(DYNAMIC_NAME) $(TEST_BIN_DIR)
@ echo LD $<
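The Makefile changes above swap the CUDA toolchain (cudart, cublas, curand) for OpenCL and clBLAS. To make the new dependency concrete, here is a minimal, self-contained sketch of the clBLAS primitive, clblasSgemm, that the batched convolution scheme referenced in the commit list builds on. This is not code from the PR: the host calls are the public OpenCL 1.2 and clBLAS APIs, and the matrix shapes are invented for the example.

```cpp
// Hedged sketch, not PR code: a single SGEMM on the GPU via clBLAS.
// Error checks are omitted for brevity.
#include <CL/cl.h>
#include <clBLAS.h>
#include <cstdio>
#include <vector>

int main() {
  const size_t M = 4, N = 3, K = 5;
  std::vector<float> A(M * K, 1.0f), B(K * N, 2.0f), C(M * N, 0.0f);

  // Standard OpenCL boilerplate: first GPU of the first platform.
  cl_platform_id platform;
  cl_device_id device;
  clGetPlatformIDs(1, &platform, NULL);
  clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
  cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
  cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, NULL);

  clblasSetup();  // initialize clBLAS once per process

  cl_mem bufA = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               A.size() * sizeof(float), A.data(), NULL);
  cl_mem bufB = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               B.size() * sizeof(float), B.data(), NULL);
  cl_mem bufC = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                               C.size() * sizeof(float), C.data(), NULL);

  // C = 1.0 * A * B + 0.0 * C, row-major, no transposes.
  cl_event done;
  clblasSgemm(clblasRowMajor, clblasNoTrans, clblasNoTrans,
              M, N, K, 1.0f, bufA, 0, K, bufB, 0, N, 0.0f, bufC, 0, N,
              1, &queue, 0, NULL, &done);
  clWaitForEvents(1, &done);

  clEnqueueReadBuffer(queue, bufC, CL_TRUE, 0, C.size() * sizeof(float),
                      C.data(), 0, NULL, NULL);
  printf("C[0] = %f (expect %f)\n", C[0], (float)(2 * K));

  clblasTeardown();
  clReleaseMemObject(bufA); clReleaseMemObject(bufB); clReleaseMemObject(bufC);
  clReleaseCommandQueue(queue); clReleaseContext(ctx);
  return 0;
}
```

A GEMM of exactly this shape is what im2col-based convolution reduces to, which is why replacing the cuBLAS sgemm calls with clblasSgemm (via the sgemm_ex wrappers mentioned around commit 3e3fb86) carries most of the conv layer across.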
100 changes: 100 additions & 0 deletions Makefile.config
@@ -0,0 +1,100 @@
## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!

# Use OpenCL
USE_OPENCL := 1
# OpenCL directory
OCL_DIR := /opt/AMDAPPSDK-2.9-1
# clBLAS directory
CLBLAS_DIR := /opt/clBLAS-2.1

# cuDNN acceleration switch (uncomment to build with cuDNN).
# USE_CUDNN := 1

# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1

# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
# CUSTOM_CXX := g++

# CUDA directory contains bin/ and lib/ directories that we need.
#CUDA_DIR := /usr/local/cuda
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
# CUDA_DIR := /usr

# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 lines for compatibility.
#CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
-gencode arch=compute_20,code=sm_21 \
-gencode arch=compute_30,code=sm_30 \
-gencode arch=compute_35,code=sm_35 \
-gencode arch=compute_50,code=sm_50 \
-gencode arch=compute_50,code=compute_50

# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
BLAS := atlas
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
# BLAS_INCLUDE := /path/to/your/blas
# BLAS_LIB := /path/to/your/blas

# Homebrew puts openblas in a directory that is not on the standard search path
# BLAS_INCLUDE := $(shell brew --prefix openblas)/include
# BLAS_LIB := $(shell brew --prefix openblas)/lib

# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
# MATLAB_DIR := /usr/local
# MATLAB_DIR := /Applications/MATLAB_R2012b.app

# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
PYTHON_INCLUDE := /usr/include/python2.7 \
/usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
# ANACONDA_HOME := $(HOME)/anaconda
# PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
# $(ANACONDA_HOME)/include/python2.7 \
# $(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include \

# We need to be able to find libpythonX.X.so or .dylib.
PYTHON_LIB := /usr/lib
# PYTHON_LIB := $(ANACONDA_HOME)/lib

# Homebrew installs numpy in a non standard path (keg only)
# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
# PYTHON_LIB += $(shell brew --prefix numpy)/lib

# Uncomment to support layers written in Python (will link against Python libs)
# WITH_PYTHON_LAYER := 1

# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib

# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
# INCLUDE_DIRS += $(shell brew --prefix)/include
# LIBRARY_DIRS += $(shell brew --prefix)/lib

# Uncomment to use `pkg-config` to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
# USE_PKG_CONFIG := 1

BUILD_DIR := build
DISTRIBUTE_DIR := distribute

# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
DEBUG := 1

# The ID of the GPU that 'make runtest' will use to run unit tests.
TEST_GPUID := 0

# enable pretty build (comment to see full commands)
Q ?= @
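The TEST_GPUID setting above picks the device that `make runtest` targets, and commit 20142c4 in the list above re-enables Caffe's SetDevice path for OpenCL. As a hedged sketch (illustrative, not taken from the PR) of how that value maps to runtime calls:

```cpp
#include "caffe/common.hpp"

// Illustrative sketch only: shows how a TEST_GPUID-style value is
// consumed at runtime. Caffe::SetDevice and Caffe::set_mode are the
// stock Caffe entry points; in this port they select an OpenCL device
// instead of calling cudaSetDevice.
int main() {
  const int test_gpuid = 0;  // mirrors TEST_GPUID in Makefile.config
  caffe::Caffe::SetDevice(test_gpuid);
  caffe::Caffe::set_mode(caffe::Caffe::GPU);
  return 0;
}
```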
55 changes: 54 additions & 1 deletion README.md
@@ -1,4 +1,57 @@
# Caffe
# OpenCL Caffe

This is an OpenCL implementation of Caffe, a mainstream DNN framework (https://github.com/BVLC/caffe). It includes a largely complete Caffe feature set as of August 2015. The project is under active development to improve performance and add new features. Contributions from the community are welcome.

OpenCL (https://en.wikipedia.org/wiki/OpenCL) is an open standard parallel programming language for heterogeneous platforms. OpenCL is supported by a variety of commercial chip manufacturers.

# Design features

- All Caffe layers ported to OpenCL
- Performance improvement via a batched implementation of the conv layer, based on clBLAS
- The user can choose the optimal batch number depending on hardware properties, image size, and minibatch size
- Supports OpenCL 2.0 and 1.2
- Implemented in C++ and OpenCL, maintaining the same interfaces as the original Caffe (see the sketch below)
- Users can directly run DNN models: AlexNet, VGG-16, and VGG-19

Note: More features are planned in the near future. Currently this implementation has been verified and tuned on AMD devices (CPUs/GPUs/APUs); compatibility across other chip manufacturers will be considered in the future.
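As referenced in the list above, here is a minimal sketch of the unchanged user-facing interface. This is illustrative, not shipped example code: the model paths are hypothetical placeholders, and the calls are the stock 2015-era Caffe API that the port keeps intact.

```cpp
#include <vector>
#include "caffe/caffe.hpp"

// Hedged sketch: existing Caffe user code compiles unchanged against
// the OpenCL port because the public classes and calls are the same.
int main() {
  caffe::Caffe::set_mode(caffe::Caffe::GPU);  // GPU mode, now backed by OpenCL

  // Hypothetical model files -- substitute your own deploy net.
  caffe::Net<float> net("models/bvlc_alexnet/deploy.prototxt", caffe::TEST);
  net.CopyTrainedLayersFrom("models/bvlc_alexnet/bvlc_alexnet.caffemodel");

  const std::vector<caffe::Blob<float>*>& out = net.ForwardPrefilled();
  LOG(INFO) << "Output blobs: " << out.size();
  return 0;
}
```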

# Performance

We intend to keep updating the latest performance as we make optimizations. Fury results are preliminary and are actively being improved.

* Training speed (Model: AlexNet, minibatch size 128)
  - AMD W9100: 255 images per second
  - AMD R9 Fury: 261 images per second

* Recognition speed (Model: AlexNet, minibatch size 128)
  - AMD W9100: 590 images per second
  - AMD R9 Fury: 699 images per second

# Wiki
For more information on how to install, use or contribute to this code base, please visit our wiki page:
https://github.com/amd/OpenCL-caffe/wiki

# Contributors
Junli Gu, Yibing Liu, Yuan Gao, Maohua Zhu

We thank Mauricio Breternitz, Hanjin Chu and Greg Stoner for their technical suggestions and support.

# Support needed

As an open source project, we hope to maintain an open and sharing culture. We encourage contributions and support from the community to improve this project together.

# License

The original Caffe is provided under the [BSD 2-Clause license](https://github.com/BVLC/caffe/blob/master/LICENSE). The OpenCL ports written by AMD are covered by the AMD license. We encourage contributions and support from external developers; your contribution will be covered either by the BSD 2-Clause license or by whichever license you prefer.

# Original Caffe information
## Caffe

Caffe is a deep learning framework made with expression, speed, and modularity in mind.
It is developed by the Berkeley Vision and Learning Center ([BVLC](http://bvlc.eecs.berkeley.edu)) and community contributors.
Expand Down