My Bachelor Thesis Sat, Mar 29. 2014
This PDF is my Bachelor thesis (in German). It was graded 1.3 (see this entry in Wikipedia for a comparison of grades), and I am pretty proud of it. Diving into evolutionary algorithms, implementing them from scratch in CUDA C/C++ and applying them to a specific problem domain (portfolio optimization) was a lot of fun.
It was a great opportunity to use the latest tech, C/C++ and CUDA. I achieved a speedup of 22x on a GPU (GTX 560 Ti) compared to a single CPU core. I used LaTeX and the great Texmaker IDE for the text, and Dia and Inkscape for the drawings. Some calculations and plots were created with R and associated packages. I used libeigen3 for vector math on a single CPU core (on the GPU I wrote those operations myself).
Now I am just waiting for my actual degree to arrive.

CUDA Single-GPU Debugging (Breakpoints) Fri, Oct 4. 2013
I use a simple GTX 560 Ti and the CUDA 5.5 SDK to write my CUDA code.
I use Ubuntu Linux 12.04 LTS as my OS.
You can develop your CUDA apps and use cuPrintf for debugging (or even plain printf with the latest CUDA SDKs). That is actually good enough for 90% of all use cases.
However, if you want to set a breakpoint inside a kernel and inspect variables using NVIDIA Nsight on Linux, the debugger can't break into your CUDA kernel code, because your X display is using the graphics card (breakpoints outside of the kernel still work).
My Solution
I run my X server on another machine and connect to the target machine via XDMCP. I make sure the NVIDIA graphics card isn't used on the target machine by starting Xvfb (which renders on the CPU) instead of Xorg. As an X server for Windows I recommend VcXsrv. Since I use Window Maker, everything is still pretty lightweight.
The following assumes you are using lightdm as your display manager.
• Enable XDMCP for lightdm via lightdm.conf.
• Install Xvfb.
• Modify your lightdm.conf so it doesn't try to open a local display, but instead just uses Xvfb. Use a custom xserver-command for this.
• Now start your X server on Windows (or Linux) and connect to the machine via XDMCP.
This is my lightdm.conf
[XDMCPServer]
enabled=true
[SeatDefaults]
greeter-session=unity-greeter
user-session=ubuntu
xserver-command=/etc/X11/xinit/xserverrc2
This is the xserverrc2 file I use
#!/bin/sh
exec Xvfb :0 -screen 0 1280x1024x24
Nsight for Visual Studio 2010 is actually pretty cool; unfortunately, single-GPU debugging doesn't work as expected there. The display flickers, and so on. The debugger does break at the first breakpoint, but you can't step, and then Windows 7 resets the display driver or something similar. But I guess most people use a dual-GPU config anyway. I am pretty happy that I can work at that level using a consumer-level GPU.
Have fun
My BSc Thesis Wed, Sep 11. 2013
Hi everybody,
I have almost completed my BSc Informatik studies (BSc Computer Science), and I am now writing my BSc thesis.
It is called
Parallelisierung von Genetischen Algorithmen für Anwendungen der Finanzwirtschaft
which can be roughly translated to
Parallelization of Genetic Algorithms for Applied Finance
More specifically, the financial application domain is Modern Portfolio Theory and its extensions (integer constraints, transaction costs). See Luenberger, Investment Science, for an overview of the topic, and Maringer, Portfolio Management with Heuristic Optimization, for the extensions.
The target platforms will be "traditional" shared memory processors (x86) using C/C++ and GPUs using CUDA.
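To make the problem domain a bit more concrete, here is a minimal sketch of the kind of fitness function a genetic algorithm could evaluate for each candidate portfolio in the classic mean-variance setting of Modern Portfolio Theory. This is an illustration under assumptions, not the objective from the thesis: the function names, the flat row-major covariance layout and the riskAversion parameter are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expected portfolio return: sum_i w[i] * mu[i].
double portfolioReturn(const std::vector<double>& w,
                       const std::vector<double>& mu) {
    assert(w.size() == mu.size());
    double r = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) {
        r += w[i] * mu[i];
    }
    return r;
}

// Portfolio variance: sum_i sum_j w[i] * sigma[i][j] * w[j].
// The covariance matrix is stored row-major in a flat vector (assumption).
double portfolioVariance(const std::vector<double>& w,
                         const std::vector<double>& sigma) {
    const std::size_t n = w.size();
    assert(sigma.size() == n * n);
    double v = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            v += w[i] * sigma[i * n + j] * w[j];
        }
    }
    return v;
}

// Hypothetical fitness: reward expected return, penalize variance (risk).
// riskAversion is an illustrative trade-off parameter.
double fitness(const std::vector<double>& w,
               const std::vector<double>& mu,
               const std::vector<double>& sigma,
               double riskAversion) {
    return portfolioReturn(w, mu) - riskAversion * portfolioVariance(w, sigma);
}
```

Each individual in the population would be one weight vector w. Because the evaluation of one individual does not depend on any other, this step is a natural candidate for parallelization across CPU threads or GPU threads.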
I have three months to complete my thesis and I am starting right now.
I am now a Certified Qt Developer ... Fri, May 4. 2012
Yeah, today I passed the Qt Essentials exam, so I am now a Nokia Certified Qt Developer.
Please, stay seated!

"Beating" the linux standard quicksort (glibc) Thu, Nov 17. 2011
I am currently programming parallel sorting algorithms with MPI / C++ on computer clusters, and therefore had to implement quicksort serially first (before creating a parallelized version of it).
For arrays of integers, the following "pedestrian" C implementation beats the built-in qsort (defined by ISO C) on Linux, running about 58% faster:
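#include <cstdlib>  // rand

// Swap two ints in place.
inline void swap(int* p1, int* p2) {
    int tmp = *p1;
    *p1 = *p2;
    *p2 = tmp;
}

// Partition the inclusive range [start, end] around start[pivotIndex].
// Returns the final index of the pivot, relative to start.
inline int divide(int* start, int* end, int pivotIndex) {
    int len = end - start + 1;
    int pivot = start[pivotIndex];
    int storeIndex = 0;
    // Park the pivot at the end while partitioning.
    swap(&start[pivotIndex], &start[len-1]);
    for (int i=0; i < len-1; i++) {
        // The comparison sits right here, so the compiler can inline it.
        if (start[i] < pivot) {
            swap(&start[i], &start[storeIndex++]);
        }
    }
    // Move the pivot to its final position.
    swap(&start[storeIndex], &start[len-1]);
    return storeIndex;
}

// Recursively sort the inclusive range [start, end] with a random pivot.
inline void mysort(int* start, int* end) {
    int len = end - start + 1;
    if (start >= end || len == 1) {
        return;
    }
    int pivotIndex = rand() % len;
    int newPivotIndex = divide(start, end, pivotIndex);
    mysort(&start[0], &start[newPivotIndex-1]);
    mysort(&start[newPivotIndex+1], end);
}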
These are my results for a randomized array of integers...
Sorting 57 MB
My sort: 3694 msec
Quicksort: 5861 msec
Intel(R) Pentium(R) D CPU 3.00GHz (Presler)
2GB RAM
Linux 2.6.32-5-amd64 x86_64 GNU/Linux
g++ (Debian 4.4.5-8) 4.4.5
glibc 2.11.2
Not bad. The reason? Inlining, or rather the lack of it. For glibc qsort, the comparison function (which I provide) can't be inlined, because the code for qsort has already been compiled; it lives in glibc. In my implementation, by contrast, the comparison sits directly inside the divide function.
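The difference can be illustrated with the two standard library routines. In this minimal sketch, qsort receives the comparison as a function pointer and calls it from precompiled library code, while std::sort is a template that gets instantiated in your own translation unit, so the compiler is free to inline the comparison. The helper names (compareInts, sortWithQsort, sortWithStdSort) are made up for this example, and the sketch makes no claim about timings on any particular machine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Comparison callback in the form qsort requires.
// qsort reaches it through a function pointer.
int compareInts(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);  // avoids overflow of x - y
}

// Sort via the C library: the comparison is an indirect call per element pair.
void sortWithQsort(std::vector<int>& v) {
    if (!v.empty()) {
        std::qsort(v.data(), v.size(), sizeof(int), compareInts);
    }
    assert(std::is_sorted(v.begin(), v.end()));
}

// Sort via the C++ template: operator< on int is visible to the compiler
// at the point of instantiation and can be inlined.
void sortWithStdSort(std::vector<int>& v) {
    std::sort(v.begin(), v.end());
    assert(std::is_sorted(v.begin(), v.end()));
}
```

Both functions produce the same ordering; to compare their speed you would time them on the same randomized input, compiled with optimizations enabled.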
The complete file, for those interested: main_sort.cpp. For large arrays, be sure to set the maximum stack size accordingly.
Then again, on Windows the qsort provided by VS 2010 is twice as fast as my implementation!
PS: I know, this isn't a real "beating", because it doesn't improve the algorithmic complexity.
Latest NVIDIA 285.58 WHQL + Quadcore + Borderlands == Deadlock Sun, Nov 13. 2011
Hi Folks,
Problem
I downloaded the latest NVIDIA drivers for my 8800GT. Everything worked fine until I tried to start Borderlands.
I had been pretty busy lately, as always, and wanted to relax for an hour or so.
It hung at the splash screen. Damn. It really hung. CPU 0%, no I/O being done. Nothing.
Argh...
I attached my VS.NET 2010 debugger to the process and took a look.

Hmm, nothing interesting. WaitForSingleObject basically means waiting for ownership of a mutex or for a Win32 kernel event to be signaled. A deadlock? It seems some optimization in the latest NVIDIA driver exposed a latent deadlock in the game code. It's probably not NVIDIA's fault.
Anyway, the other threads didn't provide any useful info, and I had no time. I just wanted to take an hour or two off and relax. So I decided to work around it.
This is my quick hack
I decided to forcibly serialize ("single-thread") the execution.
I simply started Borderlands from Steam, and as soon as the process appeared in the Task Manager, I changed the affinity for that process to a single CPU core.
This doesn't make the execution single-threaded, but it reduces the probability of the deadlock, because there is no hardware parallelism across CPU cores anymore.

You have to be quick
If the splash screen has already appeared, it's too late.

Then the game started normally and got past the splash screen. I immediately re-enabled all cores.
If you have an SSD this may be difficult :-/ Anyway, I could continue this way.
And off I went, blasting some psychos.
REMEMBER: If it took more than one shot, you weren't using a Jakobs
Posted by Amanjit Singh Gill
in C++, Mostly Harmless, Windows programming
Vym (View your Mind) Mindmapping tool for Windows Sat, Oct 15. 2011
Hi Folks,
I wanted to do some mind mapping on Windows and liked vym, but there is no official Windows port. The one I found couldn't save my file to the Desktop (I hadn't read the known issues: no spaces allowed in the path), so I had to use FreeMind, which is just the typical huge, clunky Java "app". I bit the bullet and created two diagrams (needed for my parallel programming course at FernUni Hagen). After studying I went to work, came back home in the evening, downloaded the vym sources and built a version with the current Qt SDK. There were some minor problems with a QDBus dependency, which I fixed while compiling. 10 minutes.
Then I converted my two diagrams and erased FreeMind from my HD.
Problem. Solved.
A page with further info and a download (including the GPL source code) can be found here:
Vym (View your mind) for Windows
Have fun!
A minimal CMake Finder Macro for the Google C++ Mocking Framework Mon, Sep 26. 2011
Hi,
If you just want to get started quickly with Google's own C++ mocking framework (which also bundles their testing framework), you might find my CMake macro handy (that is, if you are using CMake). You can download my macro from here: FindGoogleMockMinimal.cmake
It's for those cases where you are just banging out a test executable and need the gmock-all.cc and gtest-all.cc files to make your code run. You therefore only need an unpacked Google Mock source package and don't have to build it.
Step 1: CMakeLists.txt changes
Here's what your CMakeLists.txt would look like:
find_package(GoogleMockMinimal REQUIRED)
INCLUDE_DIRECTORIES(${GMOCKMINIMAL_INCLUDE_DIRS})
FILE(GLOB MY_TEST_SRCS *.cpp *.h)
ADD_EXECUTABLE(MyTest
${GMOCKMINIMAL_SRC}
${MY_TEST_SRCS}
)
As you can see, you only need ${GMOCKMINIMAL_INCLUDE_DIRS} and ${GMOCKMINIMAL_SRC}.
In order to use the CMake finder macro you first have to "register" it in your CMake file. For example, if you put it into a subfolder cmake/Modules, this is the snippet:
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake/Modules/")
Step 2: Configuring cmake
At configuration time, set the variable GMOCK_SOURCE_ROOT to point to the gmock source code.
cmake -DGMOCK_SOURCE_ROOT=/wherever/it/is
That's it
Here is the CMake Macro File in all its glory

#
#
# Locate and configure the Google Mock (and bundled Google Test) libraries for minimal setup
# Tested with google mock version: 1.6, Win32 (VC.NET 2003) & Linux (gcc 4.x.x)
#
# Defines the following variables:
#
# GMOCKMINIMAL_FOUND - Found the Google Mock libraries
# GMOCKMINIMAL_INCLUDE_DIRS - The directories needed for include paths
# GMOCKMINIMAL_SRC - The minimal set of .cc files to use with an executable (mock+test)
#
# Copyright 2011 Amanjit Gill (amanjit.gill@gmx.de)
# Based on a CMake macro from Chandler Carruth
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not
# use this file except in compliance with the License. You may obtain a copy
# of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
#
#
if(GMOCKMINIMAL_INCLUDE_DIRS)
  set(GMOCKMINIMAL_FOUND true)
else(GMOCKMINIMAL_INCLUDE_DIRS)
  set(GMOCK_SOURCE_ROOT "" CACHE PATH "Source folder for Google Mock")
  if(GMOCK_SOURCE_ROOT)
    find_path(_GMOCKMINIMAL_INCLUDE_DIR gmock/gmock.h
      PATHS "${GMOCK_SOURCE_ROOT}/include"
      PATH_SUFFIXES ""
      NO_DEFAULT_PATH)
    find_path(_GTESTMINIMAL_INCLUDE_DIR gtest/gtest.h
      PATHS "${GMOCK_SOURCE_ROOT}/gtest/include"
      PATH_SUFFIXES ""
      NO_DEFAULT_PATH)
    find_file(_GMOCKMINIMAL_SRC /src/gmock-all.cc
      PATHS "${GMOCK_SOURCE_ROOT}"
      PATH_SUFFIXES ""
      NO_DEFAULT_PATH)
    find_file(_GTESTMINIMAL_SRC /gtest/src/gtest-all.cc
      PATHS "${GMOCK_SOURCE_ROOT}"
      PATH_SUFFIXES ""
      NO_DEFAULT_PATH)
  else(GMOCK_SOURCE_ROOT)
    find_path(_GMOCKMINIMAL_INCLUDE_DIR gmock/gmock.h
      PATH_SUFFIXES "")
    find_path(_GTESTMINIMAL_INCLUDE_DIR gtest/include/gtest.h
      PATH_SUFFIXES "")
    find_path(_GMOCKMINIMAL_SRC src/gmock-all.cc
      PATH_SUFFIXES "")
    find_path(_GTESTMINIMAL_SRC gtest/src/gtest-all.cc
      PATH_SUFFIXES "")
  endif(GMOCK_SOURCE_ROOT)
  if(_GMOCKMINIMAL_INCLUDE_DIR)
    set(GMOCKMINIMAL_FOUND true)
    set(GMOCKMINIMAL_INCLUDE_DIRS ${_GMOCKMINIMAL_INCLUDE_DIR} ${_GTESTMINIMAL_INCLUDE_DIR}
      ${GMOCK_SOURCE_ROOT} ${GMOCK_SOURCE_ROOT}/gtest CACHE PATH
      "Include directories for Google Mock library")
    set(GMOCKMINIMAL_SRC ${_GMOCKMINIMAL_SRC} ${_GTESTMINIMAL_SRC} CACHE PATH
      "Source paths for Google Mock / Google Test combined cpp file (gmock-all.cc and gtest-all.cc)")
    mark_as_advanced(GMOCKMINIMAL_INCLUDE_DIRS)
    mark_as_advanced(GMOCKMINIMAL_SRC)
    if(NOT GoogleMockMinimal_FIND_QUIETLY)
      message(STATUS "Found minimal setup for the Google Mock library: ${GMOCK_SOURCE_ROOT}")
    endif(NOT GoogleMockMinimal_FIND_QUIETLY)
  else(_GMOCKMINIMAL_INCLUDE_DIR)
    if(GoogleMockMinimal_FIND_REQUIRED)
      message(FATAL_ERROR "Could not find the Google Mock library")
    endif(GoogleMockMinimal_FIND_REQUIRED)
  endif(_GMOCKMINIMAL_INCLUDE_DIR)
endif(GMOCKMINIMAL_INCLUDE_DIRS)
Once again, the download link: FindGoogleMockMinimal.cmake
Have fun
"Prepare ship for ludicrous speed" ... the /Ox compiler flag Thu, Jun 30. 2011
Reading the CodeProject daily newsletter, I came across a cool C++ article on CodeProject about image manipulation. It uses modern C++ features for abstractions that can, in theory, be optimized away by the compiler (aka zero-cost abstractions).
I noticed, however, that the VC10 code was significantly slower than the g++ code.
So in a comment I asked about using the /Ox compiler flag (I don't have access to VC10; the last version I bought was VC7.1).
And now the charts have to be redrawn, I guess.
(Yeah, I know /O2 still is the preferred choice for most projects...)
Anyway,
C++ 4TW
PS: Do you remember Spaceballs?
Comparing Linear Algebra packages for C++ Thu, Sep 23. 2010
An Audio Interview with Bjarne Stroustrup @ Deutschlandfunk Wed, Sep 1. 2010
"C was designed so that the compiler takes up 48 kilobytes of memory. Kilobytes, not megabytes!"
"That is why the father of C, Dennis Ritchie, would surely have done some things differently if he had had 256 kilobytes, as I did when I started on C++."

The whole interview: A Deutschlandfunk Audio interview with Bjarne Stroustrup
Another cool MFC extension Fri, Aug 27. 2010
Nice. Another cool GUI contribution, available at the good old (and still modern) friend of Win32 and .NET programmers, CodeProject.

I can hear you say:
MFC? Are you kidding me?
Yep, MFC is still going strong. Of course, one could argue that programming closer to the metal with something like WTL gives you a more elegant API. Then again, MFC also has this modern Ribbon API.
And the whole binary is 808 KB, damn it!