The Apache Portable Runtime

Posted by Jan on 2005-03-10 09:08:45

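APR is a very interesting technology. First, it is open source; second, it is to some extent comparable to ACE. Its main purpose is to let programmers easily write portable code without getting tangled up in endless #ifdef/#endif blocks. I am keeping the full article here for reference, since it may come in handy later.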

The projects that have spun off from Apache are remarkable in their own right. Really nice work.

Feature Story  
Written by Ryan Bloom     
Tuesday, 01 February 2005  
Apache 1.3 was ported to a variety of platforms, including many that weren't POSIX-based, such as Windows, OS/2, and BeOS. On those platforms, Apache 1.3 often relied on #ifdef blocks to achieve portability, effectively forking the source into mainline code and platform-specific code, making the code harder to read, debug, and maintain.
When development started on Apache 2.0, the developers knew that they needed a better solution. Initially, two existing solutions were considered. One was the Adaptive Communication Environment (ACE), and the other was the Netscape Portable Runtime (NSPR). However, both were rejected.
ACE was not an option because Apache requires that all code be written strictly in C, and ACE is a combination of C and C++. And while NSPR looked like a good fit, its license was incompatible with Apache's. (The licensing issues were eventually resolved, but by that time, APR was already in development.)
Nonetheless, writing APR from scratch has worked well for the Apache community — and others. (See the sidebar, "Who's Using the Apache Runtime?" for more details.) It's a portability layer specifically written with servers in mind.
Let's see how to write applications with APR. And to appreciate APR's power, let's start with code that looks portable but isn't. As you'll see, the devil is in the details.
Almost Portable

Some code is inherently portable, because it uses well-documented APIs that are implemented everywhere. For example, the code...
char *var = getenv("SHELL");
... compiles and runs on all platforms, but there are subtleties that may make it behave differently. For instance, is the SHELL variable name case sensitive? On Unix it is, but on Windows, it isn't. Also, on Windows, applications can be compiled in either UNICODE or ANSI modes. If your application is compiled for UNICODE, then the environment table is UNICODE — but this code always tries to read the environment variables as ANSI strings. These details can be absolutely infuriating for any developer.
In APR, this same concept can be written as:
char *var;
apr_status_t rv;
rv = apr_env_get(&var, "SHELL", p);
In this case, the code isn't much more complex than the original, and it resolves the issues of the original code. Because APR always uses native functions under the covers, APR is able to determine if it should be reading as UNICODE or ANSI and react accordingly.
Listing One shows a very simple program that demonstrates getting a single environment variable.
Listing One: Reading an environment variable with the Apache Portable Runtime

 1 #include <stdio.h>        /* printf() */
 2 #include <stdlib.h>       /* atexit() */
 3 #include "apr.h"
 4 #include "apr_general.h"  /* apr_initialize(), apr_terminate() */
 5 #include "apr_env.h"
 6
 7 int main(int argc, char *argv[]) {
 8   apr_pool_t *p;
 9   char *env_var;
10
11   if (argc < 2) return 1;  /* a variable name is required */
12   apr_initialize();
13   atexit(apr_terminate);   /* pass the function, do not call it */
14   apr_pool_create(&p, NULL);
15   if (apr_env_get(&env_var, argv[1], p) == APR_SUCCESS)
16     printf("%s\n", env_var);
17   apr_pool_destroy(p);
18   return 0;
19 }
Once it has checked its arguments, the first thing the program does is initialize APR. Every APR-based application should do this as soon as the program starts. (While APR may work without initialization on some platforms, on others it won't. It's best to always add the call.)
Next, in another mandatory step, the program calls atexit() to arrange for apr_terminate() to run when the program exits. Note that atexit() takes the function itself, not the result of calling it. This ensures that any mutexes and other limited operating system resources are released.
The rest of the code does the work. Line 14 creates the program's first pool (more on pools momentarily), and lines 15-16 get the requested environment variable and print it to the console if it is set. Line 17 destroys the pool, and the program then exits.
If the code is in file abc.c, compile it on Linux using:
$ gcc abc.c `./apr/apr-1-config --includes` \
    `./apr/apr-1-config --libs` \
    apr/.libs/libapr-1.a -o echoenv
apr-1-config is a configuration script that provides the proper arguments for compiling APR programs: in this case, the include directories and the libraries that must be linked to satisfy APR's requirements. You can run the application with ./echoenv SHELL.
Not Portable

Code that looks portable but isn't is frustrating, to be sure. But a far more complex problem is code that's obviously non-portable. It's hard enough for experienced multi-platform developers to remember which function is for which platform — what do you do when porting a complex application to a platform you have no experience with?
For example, loading shared libraries is different on many platforms. What function do you call to load a library on Windows, Linux, HP/UX, and MacOS X? Many people know how to do it properly on one or two of those, but very few people know all four.
Even if you do (on Windows, you use LoadLibraryEx(); Linux uses dlopen(); HP/UX uses shl_load(); and MacOS X uses NSLinkModule()), each of those functions has different arguments and different error codes.
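To make the burden concrete, here is a minimal sketch of the compile-time dispatch a hand-written loader has to start from. The function name native_loader_name() is hypothetical and the platform mapping simply follows the list above; a real wrapper would also have to translate each call's arguments and error codes.

```c
/* Hypothetical helper: names the native call a hand-rolled loader would
 * have to wrap on the platform this file is compiled for. The mapping
 * follows the article's list; it is an illustration, not APR code. */
const char *native_loader_name(void)
{
#if defined(_WIN32)
    return "LoadLibraryEx";
#elif defined(__hpux)
    return "shl_load";
#elif defined(__APPLE__)
    return "NSLinkModule";
#else
    return "dlopen";   /* Linux and most other Unix-like systems */
#endif
}
```

Every new platform adds another branch here, and every function that touches the loader needs the same ladder of #ifdefs.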
In sharp contrast, loading a library in APR is as simple as:
apr_dso_handle_t *h = NULL;
apr_status_t status;
status = apr_dso_load(&h, "testdso.so", p);
The type apr_dso_handle_t is a handle to the shared library. With a handle, you can load a specific symbol from the library. The argument p is the memory pool (an apr_pool_t *) from which APR allocates the handle; pools are described below.
For a complete list of the types of actions that APR makes portable, see the API documentation on the APR web site.
Managing Memory

In all of the examples above, APR allocated the memory for the APR variables. Most APR types are defined in such a way that only APR knows their size, and therefore only APR can allocate them correctly.
So how do you allocate memory for your variables? Use pools.
Typical C memory management requires that the code that requests memory also free that memory. If you forget to free any of your allocated memory, your application leaks, which, for a long-running application like a server, can get bad enough to bring the computer to a crawl. Pools address this problem by having all memory allocation happen from shared pools of pre-allocated memory. This allows all of the allocated memory to be freed at one time, without needing to worry about memory leaks.
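The idea of freeing everything at once can be sketched in a few lines of plain C. This is a toy under stated assumptions, not APR's implementation: the names toy_pool, toy_palloc(), and toy_pool_destroy() are hypothetical, and unlike a real pool it calls malloc() for every request instead of carving memory out of pre-allocated blocks.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation so the pool can find it later. */
typedef union toy_block {
    union toy_block *next;  /* link to the previously allocated block */
    max_align_t align;      /* keeps the payload suitably aligned */
} toy_block;

typedef struct {
    toy_block *blocks;      /* every allocation made from this pool */
    size_t count;           /* number of live allocations */
} toy_pool;

void toy_pool_init(toy_pool *p)
{
    p->blocks = NULL;
    p->count = 0;
}

/* Allocates size bytes and remembers the block in the pool. */
void *toy_palloc(toy_pool *p, size_t size)
{
    toy_block *b = malloc(sizeof(toy_block) + size);
    if (b == NULL)
        return NULL;
    b->next = p->blocks;
    p->blocks = b;
    p->count++;
    return b + 1;           /* usable memory starts right after the header */
}

/* Frees every allocation in one pass and returns how many were freed. */
size_t toy_pool_destroy(toy_pool *p)
{
    size_t freed = 0;
    while (p->blocks != NULL) {
        toy_block *next = p->blocks->next;
        free(p->blocks);
        p->blocks = next;
        freed++;
    }
    p->count = 0;
    return freed;
}
```

The calling code never frees an individual allocation, so there is no individual allocation to forget.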
You can also create a hierarchy of pools, so that each pool has a parent. If you destroy a pool with a parent pool, then the memory from the sub-pool is returned to the parent instead of being freed. If you clear or destroy a parent pool, the child pools are automatically cleared or destroyed, respectively. Finally, you can register functions to run when a pool is cleared. This is useful if you have a resource that must be closed properly, because you can drop the handle to the resource, and when the pool is cleared, the resource will be closed. APR itself uses this feature to ensure that mutexes are released before a program ends.
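The parent/child relationship and the cleanup callbacks can be illustrated with another small sketch in plain C. All of the names here (hpool, hpool_create(), hpool_cleanup_register(), hpool_destroy(), count_cleanup()) are hypothetical, and the sketch tracks only cleanups, not memory; it is meant to show the ordering, not how APR does it internally.

```c
#include <stdlib.h>

typedef void (*cleanup_fn)(void *data);

typedef struct cleanup {
    cleanup_fn fn;
    void *data;
    struct cleanup *next;
} cleanup;

typedef struct hpool {
    struct hpool *parent;
    struct hpool *first_child;
    struct hpool *next_sibling;
    cleanup *cleanups;
} hpool;

/* Creates a pool; a non-NULL parent adopts it as a child. */
hpool *hpool_create(hpool *parent)
{
    hpool *p = calloc(1, sizeof(*p));
    if (p == NULL)
        return NULL;
    p->parent = parent;
    if (parent != NULL) {
        p->next_sibling = parent->first_child;
        parent->first_child = p;
    }
    return p;
}

/* Registers fn(data) to run when the pool is destroyed. Returns 0 on success. */
int hpool_cleanup_register(hpool *p, cleanup_fn fn, void *data)
{
    cleanup *c = malloc(sizeof(*c));
    if (c == NULL)
        return -1;
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;
    p->cleanups = c;
    return 0;
}

/* Destroys the children first, then runs this pool's cleanups. */
void hpool_destroy(hpool *p)
{
    while (p->first_child != NULL)
        hpool_destroy(p->first_child);      /* each child unlinks itself */
    while (p->cleanups != NULL) {
        cleanup *c = p->cleanups;
        p->cleanups = c->next;
        c->fn(c->data);
        free(c);
    }
    if (p->parent != NULL) {                /* unlink from the parent's list */
        hpool **link = &p->parent->first_child;
        while (*link != p)
            link = &(*link)->next_sibling;
        *link = p->next_sibling;
    }
    free(p);
}

/* Example cleanup: counts how many resources have been "closed". */
void count_cleanup(void *data)
{
    ++*(int *)data;
}
```

In a real program the cleanup would close a file or release a mutex; destroying the top-level pool is then enough to release everything registered anywhere beneath it.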
Pools also enhance performance. One of the worst things you can do in C is allocate and free memory repeatedly. Often, the slowest part of your application is malloc() and free(), which you don't control. By moving to pools, you remove a lot of the overhead of malloc() and free(), because calls to those functions are centralized. In fact, in a well-architected, pool-based application, there are never any calls to free() until the program is about to end. A program written to use pools comes to a steady state, where it neither allocates nor frees memory while performing its work.
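The steady state is easiest to see with an allocator that obtains one block up front and simply resets a counter between units of work. The sketch below is a generic bump allocator with hypothetical names (arena, arena_alloc(), arena_clear()); it illustrates the principle and makes no claim about how APR's pools are implemented.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned char *base;  /* one block obtained from malloc() up front */
    size_t size;          /* capacity of that block in bytes */
    size_t used;          /* bytes handed out since the last clear */
} arena;

/* Reserves size bytes once. Returns 0 on success, -1 on failure. */
int arena_init(arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = (a->base != NULL) ? size : 0;
    a->used = 0;
    return (a->base != NULL) ? 0 : -1;
}

/* Hands out n bytes from the block, or NULL if the block is exhausted. */
void *arena_alloc(arena *a, size_t n)
{
    size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align;
    if (start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Makes the whole block available again without calling free(). */
void arena_clear(arena *a)
{
    a->used = 0;
}

/* The only free() in the arena's lifetime. */
void arena_destroy(arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

A server loop would call arena_alloc() while handling a request and arena_clear() when the request is done, so malloc() and free() run once each no matter how many requests are served.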
To be fair, working with pools is complex, and they don't map well to all applications. Any application that is small and doesn't do the same operation multiple times isn't a good match for pools.
Pools are APR's biggest advantage and biggest weakness. For people who like pools and have been using them in their applications, APR is the perfect portability library. However, for people who want to write very object-oriented code, pools can often get in the way. Also, it is very difficult to combine pool-based and non-pool-based code. The APR developers realize that pools aren't for everybody, and are working on finding ways to abstract memory allocation so that the current pools implementation can use a non-pools based allocator.
See the sidebar "Swimming in Pools" to help determine if your application is well-suited for pools.
Implementing cat using APR

cat is one of the simplest commands found on Unix machines: it reads one or more files or standard input, and prints everything read to standard output. However, paired with a variety of other Unix commands, cat can become a very powerful tool. So, it's surprising that there isn't a reasonable facsimile of cat for Windows (except for cygwin, but it removes you from the Windows environment instead of implementing the tools in a native environment).
Let's implement a simple, portable version of cat using APR. Listing Two shows the most important function. (A complete version of cat is left as an exercise.)
Listing Two: The guts of a portable version of cat

 1 void printOutput(apr_file_t *in, apr_file_t *out, int numberNonBlank,
 2                 int numberAll, int showEnd, int showTab, int showNonprint,
 3                 int squeezeBlank)
 4 {
 5   char str[HUGE_STRING_LEN];
 6   int linenum = 0;
 7   int lastBlank = FALSE;
 8
 9   while (apr_file_gets(str, HUGE_STRING_LEN, in) == APR_SUCCESS) {
10     apr_size_t bytes;
11     int emptyLine = FALSE;
12
13     emptyLine = !strcmp(str, APR_EOL_STR);  /* line holds only the EOL */
14     if (apr_file_eof(in)) {
15       break;
16     }
17
18     if (squeezeBlank && emptyLine) {
19       if (lastBlank) {
20         continue;
21       }
22       else {
23         lastBlank = TRUE;
24       }
25     }
26     if (!emptyLine) {
27       lastBlank = FALSE;
28     }
29
30     if (numberAll || (!emptyLine && numberNonBlank)) {
31       linenum++;
32       apr_file_printf(out, "%d: ", linenum);
33     }
34
35     if (showTab) {
36       replaceTab(str);
37     }
38     if (showNonprint) {
39       replaceNonprint(str);
40     }
41
42     bytes = strlen(str);
43     if (bytes >= strlen(APR_EOL_STR) && !strcmp(str + bytes - strlen(APR_EOL_STR), APR_EOL_STR)) {
44       str[bytes - strlen(APR_EOL_STR)] = '\0';  /* strip the trailing EOL */
45     }
46     apr_file_printf(out, "%s%s" APR_EOL_STR, str, (showEnd) ? "$" : "");
47   }
48 }
The function printOutput() loops through a file reading one line at a time, making some changes to the string that was just read, and printing the result. No sub-pools are created, because nothing in the loop actually allocates new memory.
Looking at Listing Two, it should be obvious that using APR doesn't change the code that you'd normally write to implement cat, with one exception on line 43. Notice the APR_EOL_STR macro. It expands to the correct end-of-line character sequence for the current platform. On Windows, this is CR/LF, while on Unix, it's LF. This can be a very important difference when porting applications that deal with text files between platforms, and again APR provides the tools for handling this problem.
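The end-of-line handling can be tried without APR installed. The sketch below uses a hypothetical EOL_STR macro as a stand-in for APR_EOL_STR, chosen with an #ifdef, and a hypothetical helper strip_eol() that removes one trailing end-of-line sequence. With APR you would use APR_EOL_STR directly and drop the #ifdef.

```c
#include <string.h>

/* Stand-in for APR_EOL_STR: an assumption made so the sketch compiles
 * without APR. */
#ifdef _WIN32
#define EOL_STR "\r\n"
#else
#define EOL_STR "\n"
#endif

/* Removes one trailing EOL_STR from line, if present.
 * Returns the length of the line afterwards. */
size_t strip_eol(char *line)
{
    size_t len = strlen(line);
    size_t eol = strlen(EOL_STR);

    /* Check the length first so the pointer never moves before the buffer. */
    if (len >= eol && strcmp(line + len - eol, EOL_STR) == 0) {
        len -= eol;
        line[len] = '\0';
    }
    return len;
}
```

Because the sequence is a string rather than a single character, the code must compare and strip strlen(EOL_STR) bytes, which is two on Windows and one on Unix.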
Listing Three shows the function that determines how to handle printing non-printable characters.
Listing Three: Character checking in APR

 1 void replaceNonprint(char *str)
 2 {
 3   int len = strlen(str);
 4   char old[HUGE_STRING_LEN];
 5   int i;
 6   int offset;
 7
 8   memcpy(old, str, len);
 9   for (i = 0, offset = 0; i < len; i++, offset++) {
10     if (old[i] == '\t') {
11       continue;
12     }
13     if (!apr_isascii(old[i]) && !apr_isprint(old[i])) {
14       str[offset++] = 'M';
15       str[offset++] = '-';
16       str[offset++] = toascii(old[i]);
17     }
18     if (apr_iscntrl(old[i])) {
19       str[offset++] = '^';
20       str[offset++] = (old[i] == '\177') ? '?' : old[i] | 0100;
21     }
22   }
23 }
replaceNonprint() is only called if the user wants to see non-printable characters. In this case, lines 13 and 18 are the most interesting, because they show how to determine if the current character is printable or not, and if it is an ASCII character or a control character. These functions are usually implemented as macros on most platforms, but on some of the more esoteric platforms, they don't exist at all, so APR had to re-implement them.
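For readers who want to experiment with the notation itself, here is a self-contained sketch in plain C that writes the visible form into a separate, bounded output buffer. It is not the article's function: show_nonprint() is a hypothetical name, it uses explicit byte ranges instead of the apr_is*() macros, and it assumes the usual cat -v conventions (^X for control characters, ^? for DEL, and an M- prefix for bytes with the high bit set).

```c
#include <stddef.h>

/* Writes a visible form of in[] to out[], using at most outsz bytes
 * including the terminating NUL. Returns the number of characters written,
 * not counting the NUL. Plain tabs and newlines pass through unchanged. */
size_t show_nonprint(const char *in, char *out, size_t outsz)
{
    size_t o = 0;

    if (outsz == 0)
        return 0;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        int meta = (c >= 0x80);
        char tmp[4];
        size_t n = 0, k;

        if (meta) {                     /* high bit set: "M-" then low 7 bits */
            tmp[n++] = 'M';
            tmp[n++] = '-';
            c &= 0x7f;
        }
        if (c == 0x7f) {                /* DEL */
            tmp[n++] = '^';
            tmp[n++] = '?';
        } else if (c < 0x20 && (meta || (c != '\t' && c != '\n'))) {
            tmp[n++] = '^';
            tmp[n++] = (char)(c | 0x40);    /* 0x01 becomes 'A', and so on */
        } else {
            tmp[n++] = (char)c;
        }
        if (o + n >= outsz)             /* no room for this item plus the NUL */
            break;
        for (k = 0; k < n; k++)
            out[o++] = tmp[k];
    }
    out[o] = '\0';
    return o;
}
```

Writing into a second buffer avoids the bookkeeping needed to expand a string in place, at the cost of requiring the caller to supply an output buffer up to four times the size of the input.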
Who's Using the Apache Runtime?
Since APR 1.0 was released in October 2004, it's received a lot of attention. However, projects have been using APR with great success since its early days.
Obviously, the most visible APR-based project is Apache 2.0. Apache 2.0 relies on APR for all of its portability concerns. This has allowed the Apache developers to combine their source into a single codebase, avoiding the issues associated with lots of platform-specific #ifdefs.
While solving some problems, code unification highlighted another: portability isn't as simple as replacing platform-specific networking and file system functions with "generic", portable replacements. Fundamentally, different platforms run servers in often unique ways.
For example, Unix lends itself to having a lot of server processes with one or a small number of threads per process. Windows, on the other hand, lends itself to at most two processes with a much larger number of threads per process. To solve this problem, Apache 2.0 needed to abstract out the process launching aspect of the application.
Another project that's had great success with APR is Subversion (http://subversion.tigris.org/). Subversion leverages APR to provide a version control system on every major (and some minor) platform. Subversion is interesting because it has both client and server components, both of which use APR for portability. The Subversion server can either be run as a module to Apache 2.0 or as a stand-alone server. The client is a simple command-line application. (APR doesn't contain any GUI components.)
A slightly different project that takes advantage of APR is APR-util. APR-util is developed by the same engineers who write APR itself. For more details about APR-util, see the sidebar "APR-util: What Is It?"
Portability is a complex problem to solve, and it shouldn't be tackled lightly. While APR goes a long way to solving portability problems, no tool can remove all of the complexity, and APR should be considered just one part of your solution.
APR-util: What Is It?
Although the Apache Portable Runtime provides everything you need to write portable C code, it doesn't handle a number of tasks that are nonetheless valuable to have available on many platforms: tasks such as MD4 and MD5 encoding, and string matching with simple glob characters like * and ?.
For tasks like those, the APR team created the APR-util library, a set of useful routines that are guaranteed to work on all platforms. For instance, if you've always wanted a wrapper for accessing any type of database with the same API (much like Perl's DBI layer), APR-util has the DBM system.
APR-util can also be used for manipulating URIs, generic pooling mechanisms through the resource list API, and UUID handling.
Each individual system in APR-util is completely separate, which can be very confusing. The only thing that holds these systems together in a single library is that all of its features are small, useful, and inherently portable. Although most people don't know about APR-util, it is a very powerful library and should be considered when you need anything it offers.
Swimming in Pools
Pools work best for long-running applications that repeat the same operations over and over again. However, pools can be used for any type of application, if it is designed properly.
To write an application for use with memory pools, think of your application as a set of nested operations. Each operation should get one pool with as many sub-pools as necessary.
For example, the first operation is the application itself; the second operation could be configuration; and a third could be running a command or processing a request. Everything nests within the first pool, the one that was created when the program started. When the program ends, simply discard the first pool. This will, in turn, destroy every other pool in the program, ensuring that all memory is freed.
Code, Code Everywhere

Hopefully, this quick trip through the Apache Portable Runtime has shown you how easy it can be to write portable C code without sacrificing readability or maintainability. APR is a project that has taken a long time to come to fruition, but the final 1.0 release is a strong foundation for portable code.
Thanks to the Apache Portable Runtime team, writing portable code is so much easier than it used to be. APR has eased cross-platform development for everybody.
Ryan Bloom is one of the founding members of the Apache Portable Runtime project, and continues as an occasional committer today. He is the author of Apache Server 2.0: The Complete Reference. Ryan can be reached at rbb@apache.org. Thanks to David Barrett for reviewing early drafts of this article.
