Archive | February 2014

Codebase Analysis Lab

Sysprof is a “statistical, system-wide profiler for Linux”, and one of the packages in the pool for our group.

To begin this lab, I first had to install fedpkg in order to run the following commands:
fedpkg clone -a sysprof //move into the folder created; in this case “sysprof”
fedpkg prep //move into the “sysprof-1.2.0” folder.

Once all the files were in place, I started by searching for any files that ended with .s or .S: find $(pwd) | grep -i ".*\.s$" which returned nothing.

Next, I looked for the pattern “asm” inside all the files in the “sysprof-1.2.0” directory: grep "asm" * -r which resulted in:

TODO: /include/asm-i386/mach-default/do_timer.h. This function
util.h:#define rmb() asm volatile("lock; addl $0,0(%%esp)" ::: "memory")
util.h:#define cpu_relax() asm volatile("rep; nop" ::: "memory");
util.h:#define rmb() asm volatile("lfence" ::: "memory")
util.h:#define cpu_relax() asm volatile("rep; nop" ::: "memory");
util.h:#define rmb() asm volatile ("sync" ::: "memory")
util.h:#define cpu_relax() asm volatile ("" ::: "memory");
util.h:#define rmb() asm volatile("bcr 15,0" ::: "memory")
util.h:#define cpu_relax() asm volatile("" ::: "memory");
util.h:# define rmb() asm volatile("synco" ::: "memory")
util.h:# define rmb() asm volatile("" ::: "memory")
util.h:#define cpu_relax() asm volatile("" ::: "memory")
util.h:#define rmb() asm volatile("" ::: "memory")
util.h:#define cpu_relax() asm volatile("" ::: "memory");

The file TODO seemed to list a version history and was not important to the requirements of this lab. However, the file util.h contained the following:

#ifndef UTIL_H
#define UTIL_H

#define FMT64 "%"G_GUINT64_FORMAT

#if defined(__i386__)
#define rmb() asm volatile("lock; addl $0,0(%%esp)" ::: "memory")
#define cpu_relax() asm volatile("rep; nop" ::: "memory");
#endif

#if defined(__x86_64__)
#define rmb() asm volatile("lfence" ::: "memory")
#define cpu_relax() asm volatile("rep; nop" ::: "memory");
#endif

#ifdef __powerpc__
#define rmb() asm volatile ("sync" ::: "memory")
#define cpu_relax() asm volatile ("" ::: "memory");
#endif

#ifdef __s390__
#define rmb() asm volatile("bcr 15,0" ::: "memory")
#define cpu_relax() asm volatile("" ::: "memory");
#endif

#ifdef __sh__
#if defined(__SH4A__) || defined(__SH5__)
# define rmb() asm volatile("synco" ::: "memory")
#else
# define rmb() asm volatile("" ::: "memory")
#endif
#define cpu_relax() asm volatile("" ::: "memory")
#endif

#ifdef __hppa__
#define rmb() asm volatile("" ::: "memory")
#define cpu_relax() asm volatile("" ::: "memory");
#endif

#endif

This file exemplifies the purpose of this course. Assembly language is so particular to a processor architecture that separate lines of code are needed for each one. In this case, there are preprocessor conditionals for 32-bit x86, 64-bit x86, PowerPC, IBM ESA/390, SuperH (SH-4A and SH-5, and then the rest), and finally HP’s Precision Architecture. The purpose of a memory barrier is to ensure that the memory operations before and after it are performed in a predictable order.

My exposure to memory barriers is extremely limited, but from a cursory look it does seem there are solutions both in aarch64 and in C/C++; the latter could remove the architecture dependencies.
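
As a rough illustration of the C route, here is a minimal sketch using the C11 <stdatomic.h> fences. The names payload, ready, publish and try_consume are made up for this example and do not come from sysprof, and I have not checked whether a C11 fence would be a sufficient replacement for rmb() in sysprof's actual use.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state, named for illustration only. */
static int payload;
static atomic_bool ready;

/* Writer: store the data, then publish the flag behind a release fence. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader: returns true and fills *out only if the flag has been set.
   The acquire fence stands in for the per-architecture rmb() macro:
   the read of payload cannot be moved ahead of the read of ready. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

The compiler picks the right instruction (or none at all) for each target, so there is no #if block per architecture.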

The second package we looked at was oprofile, a system-wide profiler for Linux systems. Following the same search pattern as for sysprof, I was unable to find any files ending with .s. However, checking for asm did produce matches. The inline assembly found in libperf_events/operf_utils.h looks similar to the assembly found in sysprof. Looking closer at the entire file, we can see it has a similar structure to the file found in sysprof, but with additional lines for additional architectures, including arm and aarch64.

The other files, with the exception of op_hw_specific.h, contained the pattern “asm” only as text, in the code descriptor. A closer look at op_hw_specific.h suggests it is a list of functions written in assembly to gather information on the processor properties of the system. With some changes to the syntax of register references, it should not be difficult to adapt this code for aarch64.

No Zero — Assembly Lab

Apparently I did not read the entire lab, and missed the requirement of removing the leading 0’s from 00 to 09. It was not very difficult, as I was able to complete it for both x86 and aarch64 rather quickly. Here are links to both assembly programs.

I had to reorder the individual steps from my original code. I began by moving the calculation of the tens digit to the top of the code, then checking the result of the division. If the quotient is 0, the program skips converting it to ASCII and writing it to the output message. The program continues to process the ones digit as normal, as we will always write that to output. I thought this was the easiest and cleanest solution.
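
The same ordering can be sketched in C. This is only an illustration of the logic, not a translation of my assembly; the function name format_loop_line and the exact "Loop: " prefix are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Writes "Loop: N\n" into out (at least 10 bytes) for 0 <= n <= 99,
   leaving out the tens digit when it is zero.
   Returns the number of characters written, not counting the NUL. */
size_t format_loop_line(unsigned n, char *out)
{
    static const char prefix[] = "Loop: ";
    size_t len = sizeof prefix - 1;
    memcpy(out, prefix, len);

    unsigned tens = n / 10;   /* handled first, as in the reordered code */
    unsigned ones = n % 10;

    if (tens != 0)            /* skip the conversion and the write for 0 */
        out[len++] = (char)('0' + tens);

    out[len++] = (char)('0' + ones);   /* the ones digit is always written */
    out[len++] = '\n';
    out[len] = '\0';
    return len;
}
```

Calling format_loop_line(7, buf) fills buf with "Loop: 7\n", while format_loop_line(30, buf) gives "Loop: 30\n".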

-Richard K.

Assembler Lab Catch up

After getting some feedback from Professor Tyler regarding my issue, I was able to complete the assembly lab. You can find the loop from 0 to 9 here. The loop through 00 to 30 is here.

Completing the aarch64 assembly took me a bit longer than the x86_64 portion of the lab, possibly due to the time between labs and having to relearn some of what I had figured out.

Here are some of the differences:

Description                  x86_64             aarch64
value 10 into register 15    mov $10,%r15       mov x15,10
add 0x30 to register 15      add $0x30,%r15     add x15,x15,0x30
go to label if !=            jne label          bne label

There were some other differences that caused some issues. While first writing the program, I forgot to include “mov x0,1”, which caused my “Hello, world!” to print only once to the screen instead of 10 times. I believe this is equivalent to the x86_64 instruction movq $1,%rdi.
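
One plausible explanation, which I have not verified against my original program: on aarch64 Linux the result of a system call comes back in x0, the same register that carries the first argument, so after the first write x0 holds the byte count instead of file descriptor 1. On x86_64 the result comes back in rax, so rdi is left alone. The C sketch below shows the two values involved; write_repeated is a made-up helper for the example.

```c
#include <stddef.h>
#include <unistd.h>

/* Writes msg to fd `count` times.
   Returns the total number of bytes written, or -1 on the first error. */
long write_repeated(int fd, const char *msg, size_t len, int count)
{
    long total = 0;
    for (int i = 0; i < count; i++) {
        /* fd is passed again on every call, the C equivalent of
           reloading x0 (or rdi) at the top of the loop. */
        ssize_t n = write(fd, msg, len);
        if (n < 0)
            return -1;
        total += n;   /* n is the byte count, not the file descriptor */
    }
    return total;
}
```

For the 14-byte string "Hello, world!\n" written ten times, the total comes to 140 bytes.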

Overall I found the x86_64 assembly easier (maybe because I did those first and was used to its layout), but like most programming languages, the logic is practically the same; it is a matter of learning the syntax in the different environment. I find myself having to take time to write a simple hello world program, but I hope that with more practice, I will be able to write more verbose programs in assembly.

Assembler Lab

Assembly is not an easy language to pick up; usually I have no real issues learning new programming and scripting languages, but assembly has proven to be difficult. Many times I have to look at things backwards from how I normally would when trying to determine the logic of a program. I feel I have to juggle more pieces of the puzzle, which normally I would not have to concern myself with.

I still feel overwhelmed when I look at the simplest assembly, but I am beginning to understand some of it. I think most of my confusion comes from how assembly changes depending on the architecture, the manufacturer or if it’s gas or nasm. I also got to see some inline assembly in C, which also had tags that were foreign to me; it seems the learning curve is going to be steep and unforgiving.

In this lab, we introduced loops and a little logic to our program. Beginning with the x86_64 architecture, I took the hello, world! program and added some looping components to output hello, world ten times. With that amazing feat accomplished, I was able to add an increment to number the loops 0-9.

I had an issue getting the proper output. To append the increment value to the end of the text I used:

movq %r14,msg+6

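I thought this would write the single ASCII character at offset 6, but apparently this is not so: movq stores all eight bytes of the register, so it also overwrote the newline and the bytes after it, where a single-byte store such as movb %r14b,msg+6 would have written only the character. My output kept coming out on a single line, and I was forced to add several empty spaces between the loop: and the \n that gave me the new line. Only after adding 7 additional blank spaces did the new line finally show. This is the assembly code I created to get the following output: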

Loop: 0
Loop: 1
Loop: 2
Loop: 3
Loop: 4
Loop: 5
Loop: 6
Loop: 7
Loop: 8
Loop: 9
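
The msg+6 store can be mimicked in C to see why the newline disappeared. This is a sketch under assumptions: it presumes a little-endian machine (as x86_64 is), and store_quad and store_byte are names invented for the example.

```c
#include <stdint.h>
#include <string.h>

/* Mimics `movq %r14,msg+6`: copies all 8 bytes of the register value,
   so msg needs at least 14 bytes of room. */
void store_quad(char *msg, uint64_t reg)
{
    memcpy(msg + 6, &reg, sizeof reg);
}

/* Mimics a byte store such as `movb %r14b,msg+6`:
   copies only the low byte and leaves the rest of msg untouched. */
void store_byte(char *msg, uint64_t reg)
{
    msg[6] = (char)(reg & 0xff);
}
```

With a message laid out as "Loop:  \n" (the digit slot at offset 6 and the newline at offset 7), store_quad replaces the newline with a zero byte, while store_byte keeps it. Seven extra spaces push the newline to offset 14, just past the eight bytes that get overwritten, which matches what I saw.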

This was after many failed attempts and watching the dreaded endless loop while spamming ctrl-c in hopes of stopping the loop before it gets out of hand.

The lab continues by asking us to do the same in aarch64, but when trying to compile the example assembly in that folder, I received the following error:

hello.s: Assembler messages:
hello.s:5: Error: too many memory references for `mov'
hello.s:6: Error: no such instruction: `adr x1,msg'
hello.s:7: Error: too many memory references for `mov'
hello.s:9: Error: too many memory references for `mov'
hello.s:10: Error: no such instruction: `svc 1'
hello.s:12: Error: too many memory references for `mov'
hello.s:13: Error: too many memory references for `mov'
hello.s:14: Error: no such instruction: `svc 0'
make: *** [hello] Error 1

I attempted to resolve the issue, and it seemed like a common problem from posts found on the internet, but being inexperienced in assembly, I felt lost in how to resolve this issue. If someone has some input, feel free to let me know. I will speak with people in class and get this resolved, and post the aarch64 parts of this lab within a few days.

Continuing on with the lab (x86_64 portions), we are asked to expand the loop to 30, and to display two characters (00-30). This was a bit tricky. At first, I wanted to try something different than what was suggested, and I failed miserably.

Instead of doing division (which I admit was much simpler), I tried to do it with logic in the attempt to learn more about jumps and compares through trial. After spending some time on it, and failing into the late hours of the night, I completed it as it was recommended in the lab. Doing it this way was much simpler, and provided some insight into the div instruction and how some of the registers work. This is my final code and output:

Loop:00
Loop:01
Loop:02
Loop:03
Loop:04
Loop:05
Loop:06
Loop:07
Loop:08
Loop:09
Loop:10
Loop:11
Loop:12
Loop:13
Loop:14
Loop:15
Loop:16
Loop:17
Loop:18
Loop:19
Loop:20
Loop:21
Loop:22
Loop:23
Loop:24
Loop:25
Loop:26
Loop:27
Loop:28
Loop:29
Loop:30
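
For comparison, the division step looks like this in C. On x86_64, div leaves the quotient in rax and the remainder in rdx; the standard library's div() returns the same pair in one call. The function name format_two_digits is an assumption for this sketch, not something taken from my assembly.

```c
#include <stdlib.h>

/* Writes two ASCII digits for 0 <= n <= 99 into out[0] and out[1],
   followed by a terminating NUL in out[2]. */
void format_two_digits(int n, char out[3])
{
    div_t qr = div(n, 10);             /* qr.quot = tens, qr.rem = ones */
    out[0] = (char)(qr.quot + 0x30);   /* 0x30 is ASCII '0' */
    out[1] = (char)(qr.rem + 0x30);
    out[2] = '\0';
}
```

For example, format_two_digits(7, buf) produces "07" and format_two_digits(30, buf) produces "30".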

I am still getting the hang of using the registers; having such fine control over memory adds a few more balls I need to juggle as I work. I will have to review my notes on register use to understand why I should use register x over register y (performance?).

The lab was interesting and not as daunting as I first imagined. Sitting down and looking through some of the documentation helped. I look forward to completing the aarch64 portion once I resolve my errors.

-Richard K.