I am a software developer, currently based in Hamburg, Germany. 
  I completed a PhD in Computer Science at the University of Peloponnese in Greece under the supervision of Konstantinos Masselos and Grigoris Dimitroulakos. My PhD research was focused on program locality optimization using reuse distance analysis. I also hold an MSc diploma in Computer Science from the same university and a BSc degree in Informatics and Telecommunications from the Technological Educational Institute of Peloponnese. 
  I have worked for several years as a full stack developer, both as an employee and as a freelancer. From December 2011 until 2015 I worked as a research assistant at the Computer Systems Laboratory of the University of Peloponnese, and from 2016 until 2018 as a software developer at CERN for the ATTRACT project. 
  Contact email:  
 
  Software/Tools
   MemAssist: Cache memory optimization tool. 
      Paper Manager: Academic publication manager for Joomla. 
      eATTRACT: Proposal submission and evaluation system. 
      MEMSCOPT: Source-to-source compiler. 
       Refereed Publications
 Journal publications
  -     A Locality Optimizer for Loop-Dominated Applications Based on Reuse Distance Analysis  ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 25 Issue 6, September 2020   Source code optimization can heavily improve software code implementation quality while still being complementary to conventional compilers’ optimizations. Source code analysis tools are very useful in supporting source code optimization. This article discusses MemAssist, a source-level optimization environment for semi-automatic locality optimization of loop-dominated code. MemAssist applies reuse distance analysis and a relevant optimization algorithm to explore the design space. It generates a set of suggestions for locality optimizing loop transformations that reduce data cache miss rate and execution time. MemAssist has been used to optimize a number of applications. Experimental results show that MemAssist leads to cache miss rate reduction at all cache layers, memory accesses reduction by up to 42%, and to a speedup of up to three times. Therefore, MemAssist can be used for efficient early-stage software optimization leading to development effort and time reduction. 
 
 
 
-     A Retargetable MATLAB-to-C Compiler Exploiting Custom Instructions and Data Parallelism  ACM Transactions on Embedded Computing Systems (TECS), Volume 19 Issue 6, October 2020   This article presents a MATLAB-to-C compiler that exploits custom instructions present in state-of-the-art processor architectures and supports semi-automatic vectorization. A parameterized processor model is used to describe the target instruction set architecture to achieve user-friendly retargetability. Custom instructions are represented via specialized intrinsic functions in the generated code, which can then be used as input to any C/C++ compiler supporting the target processor. In addition, the compiler supports the generation of data parallel/vectorized code through the introduction of data packing/unpacking statements. The compiler has been used for code generation targeting ARM and x86 architectures for several benchmarks. The vectorized code generated by the compiler achieves an average speedup of 4.1× and 2.7× for packed fixed and floating point data, respectively, compared to scalarized code for ARM architecture and an average speedup of 3.1× and 1.5× for packed fixed and floating point data, respectively, for x86 architecture. Implementing data parallel instructions directly in the assembly code would have required a lot of design effort, and it would not have been sustainable across evolving platform variants. Thus, the compiler can be employed to efficiently speed up critical sections of the target application. The compiler is therefore potentially employable to raise the design abstraction and reduce development time for both embedded and general-purpose applications. 
 
 
 
-     A MATLAB vectorizing compiler targeting Application Specific Instruction Set Processors  ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 22 Issue 2, March 2017   This article discusses a MATLAB-to-C vectorizing compiler that exploits custom instructions, for example, for Single Instruction Multiple Data (SIMD) processing and instructions for complex arithmetic present in Application-Specific Instruction Set Processors (ASIPs). Custom instructions are represented via specialized intrinsic functions in the generated code, and the generated code can be used as input to any C/C++ compiler supporting the target processor. Furthermore, the specialized instruction set of the target processor is described in a parameterized way using a target processor-independent architecture description approach, thus allowing the support of any processor. The compiler has been used for the generation of application code for two different ASIPs for several benchmarks. The code generated by the compiler achieves a speedup between 2×–74× and 2×–97× compared to the code generated by the MathWorks MATLAB-to-C compiler. Experimental results also prove that the compiler efficiently exploits SIMD custom instructions achieving a 3.3 factor speedup compared to cases where no SIMD processing is used. Thus the compiler can be employed to reduce the development time/effort/cost and time to market through raising the abstraction of application design in an embedded systems/system-on-chip development context. 
 
 
 
Conference and workshop publications
  -     Compiler-directed data locality optimization in MATLAB  Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems (SCOPES), Sankt Goar, Germany, May 23-25th, 2016   Array programming languages, such as MATLAB, are often used for algorithm development by scientists and engineers without taking into consideration implementation related issues and with limited emphasis on relevant optimizations. Application code optimization, especially in terms of data storage and transfer behavior, is still an important issue and heavily affects implementations' quality in terms of performance, power consumption etc. Efficient approaches for the optimization of high level application code are required to derive high quality implementations while still reducing development time and cost. This paper presents MemAssist, a software tool supporting application developers in detecting parts of the application code in MATLAB that do not exploit efficiently the targeted processor architecture and especially the memory hierarchy. Furthermore, the proposed tool guides application developers in applying code transformations in MATLAB for the optimization of the algorithm's temporal data locality. An image processing algorithm has been optimized using MemAssist as a practical usage scenario. Experimental results prove that the use of MemAssist can heavily reduce cache misses (up to 40%) and improve execution time (up to 30% speedup) on two different processor architectures. Thus, MemAssist can be used for optimized application code development that can lead to efficient implementations while still reducing development time and cost. 
 
 
 
-     Automatic generation of code analysis tools: The CastQL approach  Proceedings of the 1st International Workshop on Real World Domain Specific Languages (RWDSL), held in conjunction with the 2016 International Symposium on Code Generation and Optimization (CGO), Barcelona, Spain, March 12-18, 2016   Source code analysis and manipulation tools have become an essential part of software development processes. Automating the development of such tools can heavily reduce development time, effort and cost. This paper proposes a framework for the efficient development of code analysis software. A tool for automatically generating the front end of analysis tools for a given language grammar is proposed. The proposed approach can be applied to any language that can be described using the BNF notation. The proposed framework also provides a domain specific language to concisely express queries on the internal representation generated by the front end. This language tackles the problem of writing complex code in a general purpose programming language in order to retrieve information from the internal representation. The approach has been evaluated through two different realistic usage scenarios applied to a number of different benchmark applications. The front end generator has also been tested for twenty input grammars. In all cases the software generated by the proposed framework functions according to the input grammar while the development time has been reduced on average down to 12% compared to equivalent handwritten implementations. The experimental results give evidence that the use of the proposed framework can heavily reduce the relevant design effort and cost. 
 
 
 
-     MAFE: An environment for MATLAB-to-C compilation supporting static and dynamic memory allocation and multi-level user interactive code optimization  Proceedings of the 2016 International Symposium on Code Generation and Optimization (CGO), Barcelona, Spain, March 12-18, 2016   MATLAB compilation to lower / implementation level languages is performed for application development (e.g. embedded C generation, high-level synthesis to VHDL) and for performance optimization. In this work MAFE, an environment for MATLAB-to-C compilation is proposed. The C code generated by MAFE allocates memory for arrays both statically and dynamically. MAFE's approach for dynamic memory allocation preallocates arrays and a maximum and an imaginary size are assigned to them. The imaginary size changes when a new value is assigned to the array. This way the array size can change dynamically without any reallocation cost. Furthermore, MAFE can optionally generate exception functions that implement runtime checks on the arrays' sizes which is an advantage over Mathworks' MATLAB Coder that infers all array sizes at compile time and does not generate code for execution time size checks. MAFE environment also includes a source code optimizer applying loop and data reuse exploitation transformations. The optimizer supports developers in efficiently applying transformations interactively both at low level (C code) and at high level (MATLAB code). Experimental results prove that the optimizer can: 1) improve execution time of a MATLAB algorithm up to 30% for the C code generated by MAFE and up to 22% for the C code generated by MATLAB Coder, and 2) reduce cache misses up to 31% for the C code generated by MAFE and up to 40% for the C code generated by MATLAB Coder. 
 
 
 
-     MATLAB-to-C compilation targeting Application Specific Instruction Set Processors  Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, March 14-18, 2016   This paper discusses a MATLAB to C compiler exploiting custom instructions such as instructions for SIMD processing and instructions for complex arithmetic present in Application Specific Instruction Set Processors (ASIPs). The compiler generates ANSI C code in which the processor's special instructions are represented via specialized intrinsic functions. By doing this the generated code can be used as input to any C/C++ compiler. Thus the proposed compiler allows the description of the specialized instruction set of the target processor in a parameterized way allowing the support of any processor. The proposed compiler has been used for the generation of application code for an ASIP targeting DSP applications. The code generated by the proposed compiler achieves a speed up between 2x-30x on the targeted ASIP for six DSP benchmarks compared to the code generated by Mathworks MATLAB to C compiler. Thus the proposed compiler can be employed to reduce the development time/effort/cost and time to market by raising the abstraction of application design in an embedded systems / system-on-chip development context while still improving implementation efficiency. 
 
 
 
-     Reuse distance analysis for locality optimization in loop-dominated applications  Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, March 9-13, 2015   This paper discusses MemAddIn, a compiler assisted dynamic code analysis tool that analyzes C code and exposes critical parts for memory related optimizations on embedded systems that can heavily affect systems performance, power and cost. The tool includes enhanced features for data reuse distance analysis and source code transformation recommendations for temporal locality optimization. Several data reuse distance measurement algorithms have been implemented leading to different trade-offs between accuracy and profiling execution time. The proposed tool can be easily and seamlessly integrated into different software development environments offering a unified environment for application development and optimization. The novelties of our work over a similar optimization tool are also discussed. MemAddIn has been applied for the dynamic computation of data reuse distance for a number of different applications. Experimental results prove the effectiveness of the tool through the analysis and optimization of a realistic image processing application. 
 
 
 
-     Dynamic source code analysis for memory hierarchy optimization in multimedia applications  Proceedings of the 2013 Conference on Design and Architectures for Signal and Image Processing (DASIP), Cagliari, Italy, October 8-10, 2013   Realizing image and signal processing algorithms in embedded systems is a three step process including algorithmic design, implementation and mapping to a target architecture and memory hierarchy. This paper presents MemAddIn, a dynamic analysis tool for C applications that exposes the critical application's loops which deserve the designer's attention for memory hierarchy optimization. MemAddIn is based on an extension of the MEMSCOPT compiler and integrates into the Visual Studio IDE offering a unified environment for the application's implementation and optimization. To conclude on the criticality of the application loops the tool utilizes two metrics which are relevant to the underlying memory architecture cost and performance. 
 
 
 
-     MEMSCOPT: A source-to-source compiler for dynamic code analysis and loop transformations  Proceedings of the 2012 Conference on Design and Architectures for Signal and Image Processing (DASIP), Karlsruhe, Germany, October 23-25, 2012   In this paper, we present MEMSCOPT, a source-to-source compiler incorporated in a system level design tool chain for dynamic code analysis and loop transformations targeting memory performance optimization. MEMSCOPT is user interactive, supported by both Windows and Linux platforms and integrates with Visual Studio and NetBeans. 
 
 
 
-     XMSIM: A tool for early memory hierarchy evaluation  Proceedings of the 2012 Conference on Design and Architectures for Signal and Image Processing (DASIP), Karlsruhe, Germany, October 23-25, 2012   In this demonstration we present the usage of XMSIM, a tool for memory hierarchy evaluation of multimedia applications. The input is a high level C code application description and a memory hierarchy specification and the output are the statistics characterizing the memory operation. 
 
 
 
   Participation in Research Projects
   ALMA: project (ICT-2011. 287733), "Architecture oriented paraLlelization for high performance embedded Multicore systems using scilAb". 
      ENOSYS: project (ICT-2009.3.4), "intEgrated modelliNg and synthesis tOol flow for embedded SYStems design". 
       Teaching
  -  Compilers I (Lab) (Teaching Assistant). 4th semester course at the department of Informatics and Telecommunications, University of Peloponnese.   Spring 2013, Fall 2013  
-  Programming II (Lab) (Laboratory Associate). 2nd semester course at the department of Informatics Engineering, Technological Educational Institute of Peloponnese.   Spring 2014  
-  Databases I (Lab) (Laboratory Associate). 3rd semester course at the department of Informatics Engineering, Technological Educational Institute of Peloponnese.   Fall 2012  
-  Data Warehousing (Lab) (Laboratory Associate). 7th semester course at the department of Informatics Engineering, Technological Educational Institute of Peloponnese.   Fall 2012