What system calls have to be executed by a command interpreter in order to start a new process?


Project 1: The Shell and System Calls

Out: Friday, April 9, 2010
Due (code and writeup): Friday, April 23, 2010, 11:30 AM Saturday, April 24


Assignment Goals

  • C, beautiful C, ...
  • To understand the roles of and relationships among applications, OS command interpreters (shells), library calls, system calls, and the kernel.
  • To become familiar with the tools and skills needed to understand, modify, compile, install, and debug the Linux kernel.
  • To design and implement a simple shell, a simple system call, and a simple library routine.

Background: The Shell

As we'll discuss in class, the OS command interpreter is the program that people interact with in order to launch and control programs. On UNIX systems, the command interpreter is usually called the shell[1]: it is a user-level program that gives people a command-line interface to launching, suspending, and killing other programs. sh, ksh, csh, tcsh, and bash are all examples of UNIX shells.

Every shell is structured as a loop that includes the following:

  1. print a prompt
  2. read a line of input from the user
  3. parse the line into the program name, and an array of parameters
  4. use the fork() system call to spawn a new child process
    • the child process then uses the exec() system call to launch the specified program
    • the parent process (the shell) uses the wait() system call to wait for the child to terminate
  5. once the child (i.e. the launched program) finishes, the shell repeats the loop by jumping to 1.
Although most of the commands people type at the prompt are the names of other UNIX programs (such as ls or cat), shells recognize some special commands (called internal commands) which are not program names. For example, the exit command terminates the shell, and the cd command changes the current working directory. Shells directly make system calls to execute these commands, instead of forking child processes to handle them.

Background: Library Routines

Library routines are just code that is made available to you. They differ from routines you write yourself in two ways, though:
  • the code is provided in compiled form - .o files rather than .c.
  • a library is actually a single file that (typically) contains many .o files within it (just as Winzip, jar, and tar files contain other files within them).

Many languages come with some kind of "standard library" that provides commonly used functionality. C does. So does C++. The Java API might be thought of as the standard library for that language.

In C, many of the standard library functions serve as an interface between the code written for user applications and the operating system on which the application runs -- the library routines provide a standard interface to user code for IO (e.g., printf() and scanf()). This allows the applications to be somewhat portable (able to run on many different systems) - all you have to do is compile the application code on some system and link it with the standard library functions (written for and compiled on that system), and voila.

In Unix, libraries[2] are manipulated with the ar command. In a typical installation, standard libraries are located in /usr/lib, and have file names beginning with "lib". The name of the standard C library is usually /usr/lib/libc.a, for instance.

Linkers, like ld (used by gcc), will often automatically look in standard libraries for code required at link time. If you have a non-standard library (as you will in this assignment), you have to tell the compiler/linker about it, e.g.,

$ gcc -c sub.c # creates sub.o
$ ar r mylib.a sub.o # creates mylib.a with sub.o inside it
$ gcc main.o mylib.a # creates a.out
See documentation on gcc for more complicated uses of libraries, and man ar for more information on creating and examining them.

The Assignment

This assignment has four parts:
  1. implement a simple shell
  2. implement a new system call
  3. implement a library routine capable of invoking that system call, and a simple driver program that exercises it
  4. answer some questions about what you've done
The implementation approach we suggest has a slightly different order than that, though, the problem being that to test the new syscall you need a working library routine, and to test the library routine you need a working app. The suggested methodology minimizes the number of pieces of code in play not known to be working, thus making debugging easy(er).

Step 1: Build a new shell (fsh.c)

Write a shell program (in C), fsh.c, which has the following features:
  • It should recognize three internal commands:
    • exit [n] terminates the shell, either by calling the exit() standard library routine or causing a return from the shell's main(). If an argument (n) is given, it should be the exit value of the shell's execution[3]. Otherwise, the exit value should be the value returned by the last executed command (or 0 if no commands were executed.)
    • cd [dir] uses the chdir() standard library routine to change the shell's working directory to the argument directory. If no argument is given, the value of the HOME environment variable is used.
    • . filename causes commands to be read from the file. When end-of-file is reached, the shell returns to reading commands from the keyboard.

  • If the command line doesn't invoke an internal command, the shell assumes it is of the form
    ....
    Your shell uses the fork() standard library call, and some flavor of exec(), to invoke the executable, passing it any command line arguments.
Assume that executable names are specified as they are using "a real shell," i.e., name resolution involves the PATH environment variable. Try to use the same prompt as in the following:

    CSE451Shell% date
    Sat Jan 6 16:03:51 PST 1982
    CSE451Shell% /bin/cat /etc/motd /etc/shells
    Pine, MH, and emacs RMAIL are the supported mailers on the instructional systems.
    [and on and on...]
    /bin/sh
    /bin/bash
    [and on and on...]
    CSE451Shell% ./date
    ./date: No such file or directory

Notes:
  • The words in bold in the sample session above were output by the shell. The underlined words were typed by the user. (Where did the non-bold, non-underlined words come from?)
  • Please take a look at the manual pages for execvp, fork, wait, and getenv.
  • To allow users to pass arguments to executables, you will have to parse the input line into words separated by whitespace (spaces and '\t' (the tab character)), and then create an array of strings pointing at the words. You might try using strtok() for this (man strtok for a very good example of how to solve exactly this problem using it).
  • You'll need to pass the name of the command as well as the entire list of tokenized strings to one of the other variants of exec, such as execvp(). These tokenized strings will then end up as the argv[] argument to the main() function of the new program executed by the child process. Try man execv or man execvp for more details.
  • Users tend to have a lot of bugs, and consequently dealing with them in a robust way can require a lot of code, especially if you're feeling like being helpful about reporting just what the mistake might be. Except as explicitly noted in the "specs" above, or when trivial to implement and of significant value, we're not feeling very helpful in writing this shell.

Step 2: Create a driver routine, and a dummy library routine (getexeccounts.c, getexeccounts.h, getcounts.a, getDriver.c)

Our goal in this step is to debug the process of creating and linking with a library, as well as debugging a driver routine that will be used to test the library routine.

We start by writing getexeccounts.c and getexeccounts.h. The latter defines a single function:

int getExecCounts( int pid, int* pArray );
The former contains the implementation of that function. getExecCounts() returns 0 for success and non-zero for failure. The argument pArray is assumed to be an array of four ints. The implementation at this point is just a dummy routine: it assigns the value k to the kth element of the array, k=0,1,2,3, and always returns success.

Now write getDriver.c, an implementation of main() that invokes getExecCounts and prints the returned values like this:

pid 22114:     0   fork     1   vfork     2   execve     3   clone

Compile and link as usual. To this point, this step should take maybe 15 minutes (depending primarily on how fast you type).

Now create a library, getcounts.a, with a single member, getexeccounts.o, and make sure you can link getDriver.o against it.

Step 3: Add a new system call, and modify the library routine to use it

There are four system calls in Linux related to creating new processes: fork, vfork, execve, and clone.  (The man pages will describe for you the differences among them, although those details aren't important to the implementation portion of this assignment.)  Implement a new system call that returns to the calling program how many invocations of each of those four process creation calls have been performed by a specific process and all of its descendants.

To do this requires four things:

  1. Modify the process control block definition (struct task_struct in include/linux/sched.h) to allow you to record the required information on a per-process basis.
  2. Instrument the kernel to keep track of this information.

  3. Design and implement a new system call that will get this data back to the user application.

  4. Modify your library routine to invoke your new syscall and return the results.

You'll use your dummy app to test the syscall (as well, possibly, as other techniques, such as printk()).

Notes: Items 1-3

  • I suggest you wade, rather than dive, into this.  In particular:
    1. If you've never compiled the kernel, go back through the project information page (consulting Linux Kernel HOWTO as needed). It should not take longer than an hour and it will ensure that you are up to speed with VMware.
    2. Now put a "printk()" somewhere in the code, and figure out how to find its output.  (Hints: /var/log and "man dmesg").
    3. Now implement a parameterless system call, whose body is just a printk() call.  See this excellent walkthrough, though be aware it was written for a slightly different version of linux with fewer preexisting system calls; as one example, where it refers to entry.S, you'll want syscall_table.S instead.

  • Both the design and implementation aspects of steps 2 and 3 present several different ways to approach this problem. It is your job to analyze them from an engineering point-of-view and choose an appropriate set of trade-offs for your implementation.

  • You might find this tool (also linked from the Linux Information link on the course homepage) useful in browsing the Linux source. The version of Linux it displays is undoubtedly different than what we're using; the impact of that should be small though. (Follow the big "Browse the code" link at the top of the page to start source browsing.)

  • Warning 1: Remember that the Linux kernel should be allowed to access any memory location, while the calling application should be prevented from causing the kernel to unwittingly read/write addresses other than those in its own address space. Details about this are here.

  • Warning 2 (Hint 0): Remember that it's inconceivable that this problem (warning 1) has never before been confronted in the existing kernel.

  • Warning 3: User programs tend to have a lot of bugs. Remember that the kernel must never, ever crash. This means that it cannot trust the application to know what it's talking about when it makes a request, particularly with respect to parameters passed in from the application to the kernel. (It also means the kernel cannot have a memory leak, as that would eventually cause a crash.)

  • Warning 4: Similarly, remember that you must be sure not to create security holes in the kernel with your code.

Notes: Item 4

  • You can't write C code that causes the compiler to generate a syscall instruction (meaning you can't directly invoke raw syscalls from C). Instead, you need to use the syscall() library routine. This code fragment shows you how to do that:


    #define __NR_execcounts someNumber

    #include <sys/syscall.h>
    #include <unistd.h>

    ...
       

    int ret = syscall(__NR_execcounts, ...);

    man syscall will tell you more.

  • In a more realistic situation, your new syscall number would be put in a system include file, so that an #include of that file would provide it to user programs. That's cumbersome in this classroom environment. As a workaround, we just #define the new value in the one file that needs it (the library routine).

Step 4: Implement a utility application (execcnts.c)

We want to implement a program, execcnts, that is to process creation syscall statistics what time (for info, man time) is to seconds. execcnts takes a command invocation line as arguments, executes the command, and prints out the number of each of the four process creation syscalls made in executing the command line. For instance,
execcnts find . -name '*.c'
would print the numbers of each of the four syscalls involved in executing that find command.

What to Turn In

You should (electronically) turn in the following:
  1. The C source code of your shell (fsh.c).

  2. The C source code (getexeccounts.c and .h files) of your library routine implementation, and the library itself (getcounts.a).

  3. Your driver (getDriver.c).

  4. Your implementation of the utility program, execcnts.c.

  5. The source code of the files you changed in or added to the kernel.

  6. A compiled kernel image (bzImage) with your changes made.

  7. The names of all of the Linux kernel source files that you modified, and a written description of what you did to them and why you needed to do it (i.e. why was it necessary to modify this particular file).

  8. A transcript showing you using your new shell to invoke the /bin/date program, the /bin/cat program, and the exit and cd commands supported by your shell.  (/usr/bin/script might come in handy to generate this printout.  As always, do man script to find out how to use the command.)

  9. A brief write-up with the answers to the following questions.
    1. What is your syscall design? What arguments does it take? How does it return results? What restrictions are there, if any, on its use?
    2. Explain the calling sequence that makes your system call work. First, a user program calls <.....>. Then, <.....> calls <.....>. ... and so on. You can explain this using either text or a rough diagram (don't spend more than 15 minutes on a diagram).
    3. gotos are generally considered bad programming style, yet they are used frequently in the Linux kernel. Why might this be? This is a thinking question, so justification is more important than your answer.
    4. How could you extend your shell to support multiple simultaneous processes (foreground and background...)?
    5. Why must the three internal commands your shell supports be internal commands? (That is, why couldn't they be implemented as separate programs, like all other commands? In the specific case of '.', why couldn't you implement it by forking fsh
    6. How long does your new system call take? Explain your timing methodology.
    7. How is it that we can write test programs and scripts that will automate testing of your new syscall, even though we don't have any idea how it is you designed it (e.g., what the syscall number is, or what arguments it takes, or how it returns results)?
    8. I claim that the functionality provided by your new system call must be implemented in the kernel; that is, it couldn't be implemented through any combination of library routines, user applications, or the shell (without kernel support). Explain why that is true.
    9. Suppose that for whatever reason (e.g., you don't have access to the source), you couldn't modify the kernel implementation. I claim that, to a large extent, you could still obtain information about the number of process creation system calls made by many applications, even if you don't have access to the source for those applications either. Explain why I'm right about this as well.

    Turn in your write-up electronically in a separate file along with your code.
Additionally, hand in printed copies of the following in class on Friday, April 23:
  1. Your write-up.
  2. The code for your shell implementation (fsh.c).
  3. The code for your library routine (getexeccounts.c and getexeccounts.h).
  4. The code for your utility program (execcnts.c).

Do not underestimate the importance of the write-up. Your project grade depends significantly on how well you understood what you were doing, and the write-up is the best way for you to demonstrate that understanding.

The grade on the project will be calculated as follows:

  • Shell: 10 points
  • Library routine: 5 points
  • Utility program: 5 points
  • System call: 20 points
  • Write-up: 20 points

Submission instructions: Ideally, we'd like turnin to be a single .tar.gz file, which expands to a sensible directory structure containing files with sensible names and not containing the superfluous mess the compiler leaves around (.o files, executables, etc.).


Footnotes

[1]While we say "the shell," there are many different shell programs (e.g., sh, ksh, csh, tcsh, bash, the shell you're writing, etc.). As well, a single system, and a single user, can be running more than one shell process at a time. Because a shell is just a program, you can launch any shell from any other shell. On Unix systems, a user's login (default) shell is kept as the last data item in the line of information about that user in the /etc/passwd file, e.g.,
farnsworth:x:122:119:Ted &,411,35142:/homes/iws/farnsworth:/bin/bash
The command chsh has historically been used to change the login shell entry in your /etc/passwd line; our labs now use kchsh, which is a version of chsh adapted to use Kerberos password authentication.

[2]At this point, the text is actually talking about static libraries, those whose code is linked to the application at link time (i.e., at the time the executable file is created). Dynamic (i.e., linked at run time) versions of libraries are available as well.

[3]atoi() converts between ASCII representations of integers and ints: man atoi. (Note that it is incapable of indicating that the string argument does not represent an integer. However, sscanf() can do that. Sort of.)

What else is a command interpreter called?

A command interpreter is the part of a computer operating system that understands and executes commands that are entered interactively by a human being or from a program. In some operating systems, the command interpreter is called the shell.

What is the purpose of the command interpreter, and why is it usually separate from the kernel?

The command interpreter reads commands from the user or from a file of commands and executes them, usually by turning them into one or more system calls. It is usually not part of the kernel since the command interpreter is subject to changes.

Which program functions as the command interpreter?

A command-line interpreter (command interpreter) is a program responsible for handling and processing text commands. For example, the command-line interpreter for MS-DOS and early versions of Windows is COMMAND.COM. In later versions of Windows, it is cmd.exe (Command Prompt).

What are system calls, and how are they executed?

When a user program invokes a system call, a system call instruction is executed, which causes the processor to begin executing the system call handler in the kernel protection domain. This system call handler performs the following actions: Sets the ut_error field in the uthread structure to 0.