Integrating faces and fingerprints for personal identification. An automatic personal identification system based solely on fingerprints or faces is often unable to meet the system performance requirements. Face recognition is fast but not extremely reliable, while fingerprint verification is reliable but inefficient in database retrieval. We have developed a prototype biometric system which integrates faces and fingerprints. The system overcomes the limitations of face recognition systems as well as fingerprint verification systems. The integrated prototype system operates in the identification mode with an admissible response time. The identity established by the system is more reliable than the identity established by a face recognition system. In addition, the proposed decision fusion scheme enables performance improvement by integrating multiple cues with different confidence measures. Experimental results demonstrate that our system performs very well; it meets the response time as well as the accuracy requirements.
Optimising the complete image feature extraction chain. The hypothesis verification stage of the traditional image processing approach, consisting of low-, medium-, and high-level processing, will suffer if the set of low-level features extracted is of poor quality. We investigate the optimisation of the feature extraction chain by using genetic algorithms. The fitness function is a performance measure which reflects the quality of an extracted set of features. We present some results and compare them with a hill-climbing approach.
3D shape recovery of smooth surfaces: dropping the fixed viewpoint assumption. We present a new method for recovering the 3D shape of a featureless smooth surface from three or more calibrated images illuminated by different light sources (three of them independent). This method is unique in its ability to handle images taken from unconstrained perspective viewpoints and unconstrained illumination directions. The correspondence between such images is hard to compute, and no other known method can handle this problem locally from a small number of images. Our method combines geometric and photometric information in order to recover dense correspondence between the images and accurately computes the 3D shape. Only a single pass starting at one point and local computation are used. This is in contrast to methods that use the occluding contours recovered from many images to initialize and constrain an optimization process. The output of our method can be used to initialize such processes. In the special case of a fixed viewpoint, the proposed method becomes a new perspective photometric stereo algorithm. Nevertheless, with the introduction of the multiview setup, self-occlusions and regions close to the occluding boundaries are better handled, and the method is more robust to noise than photometric stereo.  Experimental results are presented for simulated and real images.
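In the fixed-viewpoint special case mentioned above, the problem reduces to photometric stereo. As a point of reference only (not the authors' perspective formulation), a minimal sketch of classical Lambertian least-squares photometric stereo under known distant light directions might look as follows; the array names and the k-image setup are illustrative assumptions.

import numpy as np

def lambertian_photometric_stereo(images, light_dirs):
    """Recover surface normals and albedo from k >= 3 images taken from a
    fixed viewpoint under known distant light directions.

    images:     (k, h, w) float array of intensities
    light_dirs: (k, 3) array, each row a unit light direction
    """
    k, h, w = images.shape
    I = images.reshape(k, -1)                  # (k, h*w) stacked intensities
    L = np.asarray(light_dirs, dtype=float)    # (k, 3)
    # Least-squares solve L @ G = I for G = albedo * normal at each pixel.
    G, *_ = np.linalg.lstsq(L, I, rcond=None)  # (3, h*w)
    albedo = np.linalg.norm(G, axis=0)
    normals = np.where(albedo > 1e-8, G / np.maximum(albedo, 1e-8), 0.0)
    return normals.reshape(3, h, w), albedo.reshape(h, w)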
A real-time large disparity range stereo system using FPGAs. In this paper, we discuss the design and implementation of a field-programmable gate array (FPGA) based stereo depth measurement system that is capable of handling a very large disparity range. The system performs rectification of the input video stream and a left-right consistency check to improve the accuracy of the results, and generates subpixel disparities at 30 frames/second on 480 × 640 images. The system is based on the local weighted phase-correlation algorithm [9], which estimates disparity using a multi-scale and multi-orientation approach. Though FPGAs are ideal devices for exploiting the inherent parallelism in many computer vision algorithms, they have a finite resource capacity, which poses a challenge when adapting a system to deal with large image sizes or disparity ranges. In this work, we take advantage of the temporal information available in a video sequence to design a novel architecture for the correlation unit that achieves correlation over a large range while keeping the resource utilisation very low compared to a naive approach of designing a correlation unit in hardware.
A linear algorithm for motion from three weak perspective images using Euler angles. In this paper, we describe a new simple linear algorithm for motion and structure from three weak perspective projections using Euler angles. We first determine the epipolar equation between each pair of images, which determines the first and third Euler angles for the rotation between that pair of images, leaving only the second Euler angle undetermined. In the next step, combining the three rotations results in a very simple linear algorithm to determine the second Euler angles, up to a Necker reversal. Experimental results on synthetic and real images are presented. The degenerate cases are discussed. The program can be obtained by FTP from http://www.cv.cs.ritsumei.ac.jp/noriko/motion.html.
Viewpoint determination of image by interpolation over sparse samples. We address the problem of determining the viewpoint of an image without referencing or explicitly estimating the 3-D structure pictured in the image. Used for reference are instead a number of sample snapshots of the scene, each supplied with the associated viewpoint. By viewing an image and its associated viewpoint as the input and output of a function, and the reference snapshot-viewpoint pairs as input-output samples of that function, we have a natural formulation of the problem as an interpolation one. The interpolation formulation allows imaging details like camera intrinsic parameters to be unknown, and the specification of the desired viewpoint to be not necessarily in metric terms. We describe an interpolation-based mechanism that determines the viewpoint of any given input image and has the property that it fits all the given input-output reference samples exactly. Experimental results on benchmark image datasets show that the mechanism is effective in reaching a quality viewpoint solution even with only a few reference snapshots.
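The exact-fit property described above can be illustrated with a generic radial-basis-function interpolator over image descriptors. This is a simplified stand-in rather than the paper's mechanism; the Gaussian kernel and the descriptor inputs are assumptions.

import numpy as np

def fit_rbf_interpolator(descriptors, viewpoints, gamma=1.0):
    """Fit a Gaussian RBF interpolator mapping image descriptors to viewpoint
    parameters.  It reproduces every training pair exactly (up to numerical
    precision), mirroring the exact-fit property.

    descriptors: (n, d) array, one descriptor per reference snapshot
    viewpoints:  (n, m) array of associated viewpoint parameters
    """
    X = np.asarray(descriptors, float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    K = np.exp(-gamma * d2)                                # (n, n) kernel matrix
    W = np.linalg.solve(K, np.asarray(viewpoints, float))  # interpolation weights

    def predict(query):
        q = np.atleast_2d(query)
        d2q = ((q[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2q) @ W                    # (len(q), m) viewpoints

    return predict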
An integrated model for evaluating the amount of data required for reliable recognition. Many recognition procedures rely on the consistency of a subset of data features with a hypothesis as sufficient evidence for the presence of the corresponding object. We analyze here the performance of such procedures using a probabilistic model, and provide expressions for the sufficient size of such data subsets that, if consistent, guarantee the validity of the hypotheses with arbitrary confidence. We focus on 2D objects and the affine transformation class and provide, for the first time, an integrated model which takes into account the shape of the objects involved, the accuracy of the data collected, the clutter present in the scene, the class of the transformations involved, the accuracy of the localization, and the confidence we would like to have in our hypotheses. Interestingly, it turns out that most of these factors can be quantified cumulatively by one parameter, denoted "effective similarity," which largely determines the sufficient subset size. The analysis is based on representing the class of instances corresponding to a model object and a group of transformations as members of a metric space, and quantifying the variation of the instances by a metric cover.
Script and language identification from document images. In this paper we present a detailed review of current script and language identification techniques. The main criticism of the existing techniques is that most of them rely on character segmentation. We go on to present a new method based on texture analysis for script identification which does not require character segmentation. A uniform text block on which texture analysis can be performed is produced from a document image via simple processing. Multiple-channel (Gabor) filters and grey-level co-occurrence matrices are used in independent experiments in order to extract texture features. Classification of test documents is made based on the features of training documents using the k-NN classifier. Initial results of over 95% accuracy on the classification of 105 test documents from 7 languages are very promising. The method shows robustness with respect to noise and the presence of foreign characters or numerals, and can be applied to very small amounts of text.
Similarity matching. With complex multimedia data, we see the emergence of database systems in which the fundamental operation is similarity assessment. Before database issues can be addressed, it is necessary to give a definition of similarity as an operation. In this paper, we develop a similarity measure, based on fuzzy logic, that exhibits several features that match experimental findings in humans. The model is dubbed fuzzy feature contrast (FFC) and is an extension to a more general domain of the feature contrast model due to Tversky. We show how the FFC model can be used to model similarity assessment from fuzzy judgments of properties, and we address the use of fuzzy measures to deal with dependencies among the properties.
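Tversky's feature contrast model scores similarity as a weighted difference between common and distinctive features. A minimal fuzzy variant is sketched below; using min as the fuzzy intersection, min(a, 1-b) as the fuzzy difference, and an additive measure f is one common choice and only an assumption here, not necessarily the FFC formulation.

import numpy as np

def fuzzy_feature_contrast(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """Tversky-style contrast similarity for fuzzy feature vectors.

    a, b : arrays of fuzzy membership values in [0, 1], one entry per property
    Returns theta*f(A and B) - alpha*f(A minus B) - beta*f(B minus A).
    """
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    common = np.minimum(a, b).sum()        # features shared by both
    a_only = np.minimum(a, 1.0 - b).sum()  # features of a not matched by b
    b_only = np.minimum(b, 1.0 - a).sum()  # features of b not matched by a
    return theta * common - alpha * a_only - beta * b_only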
Online updating appearance generative mixture model for meanshift tracking. This paper proposes an appearance generative mixture model based on key frames for meanshift tracking. The meanshift tracking algorithm tracks an object by maximizing the similarity between the histogram in the tracking window and a static histogram acquired at the beginning of tracking. The tracking therefore can fail if the appearance of the object varies substantially. In this paper, we assume that the key appearances of the object can be acquired before tracking and that the manifold of the object's appearance can be approximated by a piecewise linear combination of these key appearances in histogram space. The generative process is described by a Bayesian graphical model. An online EM algorithm is proposed to estimate the model parameters from the observed histogram in the tracking window and to update the appearance histogram. We applied this approach to track human head motion and to infer the head pose simultaneously in videos. Experiments verify that our online histogram generative model constrained by key appearance histograms alleviates the drifting problem often encountered in tracking with online updating, that the enhanced meanshift algorithm is capable of tracking objects of varying appearance more robustly and accurately, and that our tracking algorithm can infer additional information such as object pose.
3D shape and motion analysis from image blur and smear: a unified approach. This paper addresses 3D shape recovery and motion estimation using a realistic camera model with an aperture and a shutter. The spatial blur and temporal smear effects induced by the camera's finite aperture and shutter speed are used for inferring both the shape and motion of the imaged objects.
Histogram features-based Fisher linear discriminant for face detection. In this paper, the face pattern is described by pairs of template-based histograms and Fisher projection orientations under the framework of AdaBoost learning. We assume that a set of templates is available first. To avoid making strong assumptions about distributional structure while still retaining good properties for estimation, the classical statistical model, the histogram, is used to summarize the response of each template. By introducing a novel "integral histogram image", we can compute histograms rapidly. Then, we turn to the Fisher linear discriminant for each template to project the histogram from a d-dimensional subspace to a one-dimensional subspace. The best features for describing the face pattern are selected by AdaBoost learning. The results of experiments demonstrate that the selected features are much more powerful for representing the face pattern than the simple rectangle features used by Viola and Jones and some variants.
Eye correction using correlation information. This paper proposes a novel eye detection method using MCT-based pattern correlation. The proposed method detects the face with an MCT-based AdaBoost face detector over the input image and then detects the two eyes with an MCT-based AdaBoost eye detector over the eye regions. Sometimes eyes are detected incorrectly due to the limited detection capability of the eye detector. To reduce falsely detected eyes, we propose a novel eye verification method that employs an MCT-based pattern correlation map. We verify whether the detected eye patch is an eye or a non-eye depending on the existence of a noticeable peak. When one eye is correctly detected and the other eye is falsely detected, we can correct the falsely detected eye using the peak position of the correlation map of the correctly detected eye. Experimental results show that the eye detection rate of the proposed method is 98.7% and 98.8% on the Bern images and AR-564 images, respectively.
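The verification step above hinges on whether a correlation map contains a distinct peak. A generic normalized cross-correlation check is sketched below as a simplified stand-in for the MCT-based correlation; the patch/region inputs and the peak-to-mean score threshold are assumptions.

import numpy as np
from scipy.signal import correlate2d

def has_noticeable_peak(patch, region, score_thresh=3.0):
    """Correlate a candidate eye patch against a search region and test
    whether the correlation map contains a distinct peak.

    patch, region : 2-D grayscale arrays (patch smaller than region)
    """
    p = (patch - patch.mean()) / (patch.std() + 1e-8)
    r = (region - region.mean()) / (region.std() + 1e-8)
    corr = correlate2d(r, p, mode='valid')   # correlation map over the region
    peak = corr.max()
    # A peak is "noticeable" if it stands well above the map's mean level.
    score = (peak - corr.mean()) / (corr.std() + 1e-8)
    peak_pos = np.unravel_index(corr.argmax(), corr.shape)
    return score > score_thresh, peak_pos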
Probability hypothesis density approach for multi-camera multi-object tracking. Object tracking with multiple cameras is more efficient than tracking with one camera. In this paper, we propose a multiple-camera multiple-object tracking system that can track 3D object locations even when objects are occluded in some camera views. Our system tracks objects and fuses data from multiple cameras by using the probability hypothesis density filter. This method avoids data association between observations and object states, and tracks multiple objects in a single-object state space. Hence, it has lower computational cost than methods using a joint state space. Moreover, our system can track a varying number of objects. The results demonstrate that our method has high reliability when tracking the 3D locations of objects.
Adaptive multiple object tracking using colour and segmentation cues. We consider the problem of reliably tracking multiple objects in video, such as people moving through a shopping mall or airport. In order to mitigate difficulties arising as a result of object occlusions, mergers and changes in appearance, we adopt an integrative approach in which multiple cues are exploited. Object tracking is formulated as a Bayesian parameter estimation problem. The object model used in computing the likelihood function is incrementally updated. Key to the approach is the use of a background subtraction process to deliver foreground segmentations. This enables the object colour model to be constructed using weights derived from a distance transform operating over foreground regions. Results from foreground segmentation are also used to gain improved localisation of the object within a particle filter framework. We demonstrate the effectiveness of the approach by tracking multiple objects through videos obtained from the CAVIAR dataset.
Gender classification based on fusion of multi-view gait sequences. In this paper, we present a new method for gender classification based on fusion of multi-view gait sequences. For each silhouette of a gait sequence, we first use a simple method to divide the silhouette into 7 (for the 90-degree, i.e. fronto-parallel, view) or 5 (for the 0- and 180-degree, i.e. front and back, views) parts, and then fit ellipses to each of the regions. Next, features are extracted from each sequence by computing the ellipse parameters. For each view angle, every subject's features are normalized and combined into a feature vector. The combined feature vector contains enough information to perform well on gender recognition. The sum rule and SVM are applied to fuse the similarity measures from 0°, 90°, and 180°. We carried out our experiments on the CASIA gait database, one of the largest gait databases known to us, and achieved a classification accuracy of 89.5%.
Synthesis of exaggerative caricature with inter and intra correlations. We developed a novel system consisting of two modules, statistics-based synthesis and non-photorealistic rendering (NPR), to synthesize caricatures with exaggerated facial features and other particular characteristics, such as beards or nevi. The statistics-based synthesis module can exaggerate the shapes and positions of facial features based on non-linear exaggeration rates determined automatically. Instead of comparing only the inter relationship between features of different subjects, as in existing methods, our synthesis module applies both inter and intra (i.e. comparisons between facial features of the same subject) relationships to make the synthesized exaggerated shape more contrastive. Subsequently, the NPR module generates a line-drawing sketch of the original face, and the sketch is then warped to an exaggerative style with the synthesized shape points. The experimental results demonstrate that this system can automatically and effectively exaggerate facial features, thereby generating the corresponding facial caricatures.
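A common baseline for caricature exaggeration moves each landmark away from the mean face along its deviation, s' = mean + lambda * (s - mean). The paper's non-linear, inter/intra-correlated rates go beyond this, so the linear sketch below is only an illustrative assumption.

import numpy as np

def exaggerate_landmarks(shape, mean_shape, rate=1.5):
    """Linear caricature baseline: push landmarks away from the mean face.

    shape, mean_shape : (n_points, 2) arrays of facial landmark coordinates
    rate              : exaggeration factor (1.0 = no change, >1.0 = caricature)
    """
    s = np.asarray(shape, float)
    m = np.asarray(mean_shape, float)
    return m + rate * (s - m)   # s' = mean + lambda * (s - mean)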
Combined object detection and segmentation by using space-time patches. This paper presents a method for classifying the direction of movement and for segmenting objects simultaneously using features of space-time patches. Our approach uses vector quantization to classify the direction of movement of an object and to estimate its centroid by referring to a codebook of space-time patch features, which is generated from multiple learning samples. We segment the objects' regions based on the probability calculated from the mask images of the learning samples, using the estimated centroid of the object. Even when occlusions occur because multiple objects overlap while moving in different directions, our method detects the objects individually because their directions of movement are classified. Experimental results show that object detection is more accurate with our method than with a conventional method based only on appearance features.
Image assimilation for motion estimation of atmospheric layers with shallow-water model. The complexity of the dynamical laws governing 3D atmospheric flows, combined with incomplete and noisy observations, makes the recovery of atmospheric dynamics from satellite image sequences very difficult. In this paper, we face the challenging problem of joint estimation of time-consistent horizontal motion fields and pressure maps at various atmospheric depths. Based on a vertical decomposition of the atmosphere, we propose a dense motion estimator relying on a multi-layer dynamical model. Noisy and incomplete pressure maps obtained from satellite images are reconstructed according to a shallow-water model on each cloud layer using a framework derived from data assimilation. While reconstructing dense pressure maps, this variational process estimates time-consistent horizontal motion fields related to the multi-layer model. The proposed approach is validated on a synthetic example and applied to a real-world meteorological satellite image sequence.
An FPGA-based smart camera for gesture recognition in HCI applications. A smart camera is a camera that can not only see but also think and act. A smart camera is an embedded vision system which captures and processes images to extract application-specific information in real time. The brain of a smart camera is a special processing module that performs application-specific information processing. The design of a smart camera as an embedded system is challenging because video processing has an insatiable demand for performance and power, while at the same time embedded systems place considerable constraints on the design. We present our work on GestureCam, an FPGA-based smart camera built from scratch that can recognize simple hand gestures. The first completed version of GestureCam has shown promising real-time performance and is being tested in several desktop HCI (human-computer interface) applications.
Real-time and marker-free 3D motion capture for home entertainment oriented applications. We present an automated system for real-time marker-free motion capture from two calibrated webcams. For fast 3D shape and skin reconstruction, we extend shape-from-silhouette algorithms. The motion capture system is based on simple and fast heuristics to increase efficiency. A multi-modal scheme using both shape and skin-part analysis, temporal coherence, and human anthropometric constraints is adopted to increase robustness. Thanks to fast algorithms, low-cost cameras, and the fact that the system runs on a single computer, our system is well suited for home entertainment devices. Results on real video sequences demonstrate the efficiency of our approach.
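The shape-from-silhouette step above rests on the classical visual hull idea: a voxel belongs to the object only if it projects inside every silhouette. A brute-force sketch of that test (assuming 3x4 projection matrices and binary silhouette masks as inputs; the paper's faster heuristics are not reproduced) is:

import numpy as np

def carve_visual_hull(voxel_centers, silhouettes, projections):
    """Mark voxels that project inside every silhouette (the visual hull).

    voxel_centers : (n, 3) array of 3-D voxel center coordinates
    silhouettes   : list of 2-D boolean masks, one per camera
    projections   : list of (3, 4) camera projection matrices
    Returns a boolean array of length n: True where the voxel is kept.
    """
    pts_h = np.hstack([voxel_centers, np.ones((len(voxel_centers), 1))])  # homogeneous
    keep = np.ones(len(voxel_centers), dtype=bool)
    for mask, P in zip(silhouettes, projections):
        proj = pts_h @ P.T                                  # (n, 3) image coordinates
        u = np.round(proj[:, 0] / proj[:, 2]).astype(int)   # pixel column
        v = np.round(proj[:, 1] / proj[:, 2]).astype(int)   # pixel row
        h, w = mask.shape
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(voxel_centers), dtype=bool)
        hit[inside] = mask[v[inside], u[inside]]
        keep &= hit                                         # must lie in every silhouette
    return keep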
An efficient method for text detection in video based on stroke width similarity. Text appearing in video provides semantic knowledge and significant information for video indexing and retrieval systems. This paper proposes an effective method for text detection in video based on the similarity in stroke width of text (the stroke width being defined as the distance between the two edges of a stroke). From the observation that text regions can be characterized by a dominant, fixed stroke width, edge detection with local adaptive thresholds is first devised to keep text regions while reducing background regions. Second, a morphological dilation operator with a structuring element size adapted to the stroke width value is exploited to roughly localize text regions. Finally, to reduce false alarms and refine text locations, a new multi-frame refinement method is applied. Experimental results show that the proposed method is not only robust to different levels of background complexity, but also effective for different fonts (size, color) and languages of text.
User-guided shape from shading to reconstruct fine details from a single photograph. Many real objects, such as faces, sculptures, or low-reliefs, are composed of many detailed parts that cannot be easily modeled by an artist or by 3D scanning. In this paper, we propose a new shape-from-shading (SFS) approach to rapidly model details of these objects, such as wrinkles and surface reliefs, from one photograph. The method first determines the surface's flat areas in the photograph. Then, it constructs a graph of relative altitudes between each of these flat areas. We circumvent the ill-posed nature of shape from shading by having the user specify whether some of these flat areas are local maxima or local minima; additional points can be added by the user (e.g. at discontinuous creases) - this is the only user input. We use an intuitive mass-spring based minimization to determine the final positions of these flat areas and a fast-marching method to generate the surface. This process can be iterated until the user is satisfied with the resulting surface. We illustrate our approach on real faces and low-relief photographs.
Unsupervised identification of multiple objects of interest from multiple images: DISCOVER. Given a collection of images of offices, what would we say we see in the images? The objects of interest are likely to be monitors, keyboards, phones, etc. Such identification of the foreground in a scene is important to avoid distractions caused by background clutter and facilitates better understanding of the scene. It is crucial for such identification to be unsupervised, to avoid extensive human labeling as well as biases induced by human intervention. Most interesting scenes contain multiple objects of interest. Hence, it would be useful to separate the foreground into the multiple objects it contains. We propose DISCOVER, an unsupervised approach to identifying the multiple objects of interest in a scene from a collection of images. In order to achieve this, it exploits the consistency in the foreground objects - in terms of occurrence and geometry - across the multiple images of the scene.
Privacy preserving: hiding a face in a face. This paper proposes a detailed framework of privacy-preserving techniques for real-time video surveillance systems. In the proposed system, the protected video data can be released in such a way that the identity of any individual contained in the video cannot be recognized while the surveillance data remains practically useful, and if the original privacy information is demanded, it can be recovered with a secret key. The proposed system attempts to hide a face (the real face, i.e. the privacy information) in a face (a newly generated face used for anonymity). To deal with the huge payload problem of privacy information hiding, an active appearance model (AAM) based privacy information extraction and recovery scheme is proposed. A quantized index modulation based data hiding scheme is used to hide the privacy information. Experimental results have shown that the proposed system can embed the privacy information into video without affecting its visual quality or practical usefulness, while at the same time allowing the privacy information to be revealed in a secure and reliable way.
Initial pose estimation for 3D model tracking using learned objective functions. Tracking 3D models in image sequences essentially requires determining their initial position and orientation. Our previous work [14] identifies the objective function as a crucial component for fitting 2D models to images. We state preferable properties of these functions and propose to learn such a function from annotated example images. This paper extends this approach to make it appropriate for fitting 3D models to images as well. The correctly fitted model represents the initial pose for model tracking. However, this extension induces nontrivial challenges, such as out-of-plane rotations and self-occlusion, which cause large variations in the portion of the model's surface visible in the image. We solve this issue by connecting the input features of the objective function directly to the model. Furthermore, sequentially executing objective functions specifically learned for different displacements from the correct position yields highly accurate objective values.
Shape reconstruction from cast shadows using coplanarities and metric constraints. To date, various techniques for shape reconstruction using cast shadows have been proposed. These techniques have the advantage that they can be applied to various scenes, including outdoor scenes, without using special devices. Previously proposed techniques usually require calibration of the camera parameters and the light source positions, and such calibration processes limit the range of application. If a shape can be reconstructed even when these values are unknown, the technique can be applied to a wider range of applications. In this paper, we propose a method that realizes such a technique by constructing simultaneous equations from coplanarities and metric constraints, which are observed from the cast shadows of straight edges and visible planes in the scene, and solving them. We conducted experiments using simulated and real images to verify the technique.
On the critical point of gradient vector flow snake. In this paper, the so-called critical point problem of the gradient vector flow (GVF) snake is studied in two respects: the factors that influence critical points and the detection of the critical points. One influencing factor that deserves particular attention is the iteration number in the diffusion process: too much diffusion floods the object boundaries, while too little preserves excessive noise. Here, the optimal iteration number is chosen by minimizing the correlation between the signal and noise in the filtered vector field. On the other hand, we single out all the critical points by quantizing the GVF vector field. After the critical points are singled out, the initial contour can be located properly to avoid the nuisance arising from critical points. Several experiments are presented to demonstrate the effectiveness of the proposed strategies.
Super resolution of images of 3D scenes. We address the problem of super-resolved generation of novel views of a 3D scene with reference images obtained from cameras in general positions; a problem which has not been tackled before in the context of super resolution and which is also of importance to the field of image-based rendering. We formulate the problem as one of estimating the color at each pixel in the high-resolution novel view without explicit and accurate depth recovery. We employ a reconstruction-based approach using an MRF-MAP formalism and solve it using graph cut optimization. We also give an effective method to handle occlusion. We present compelling results on real images.
Feature subset selection for multi-class SVM based image classification. Multi-class image classification can benefit much from feature subset selection. This paper extends an error bound for binary SVMs to a feature subset selection criterion for multi-class SVMs. By minimizing this criterion, the scale factors assigned to each feature in a kernel function are optimized to identify the important features. This minimization problem can be efficiently solved by gradient-based search techniques, even if hundreds of features are involved. Also, considering that image classification is often a small-sample problem, the regularization issue is investigated for this criterion, showing its robustness in this situation. An experimental study on multiple benchmark image data sets demonstrates the effectiveness of the proposed approach.
Generative estimation of 3D human pose using shape contexts matching. We present a method for 3D pose estimation of human motion in a generative framework. For generality of the application scenario, the observation information we utilize comes from monocular silhouettes. We distill prior information about human motion by performing conventional PCA on a single motion capture data sequence. In doing so, the aims of both reducing dimensionality and extracting prior knowledge of human motion are achieved simultaneously. We adopt the shape contexts descriptor to construct the matching function, by which the validity and the robustness of the matching between image features and synthesized model features can be ensured. To explore the solution space efficiently, we design an annealed genetic algorithm (AGA) and a hierarchical annealed genetic algorithm (HAGA) that search for optimal solutions effectively by utilizing the characteristics of the state space. Results of pose estimation on different motion sequences demonstrate that the novel generative method achieves viewpoint-invariant 3D pose estimation.
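The PCA prior above can be reproduced with a few lines of linear algebra. The sketch below uses assumed array shapes (one flattened pose vector per mocap frame) and shows how a low-dimensional pose subspace is extracted and how candidate poses are synthesized from it.

import numpy as np

def learn_pose_prior(mocap_poses, n_components=8):
    """PCA prior over a motion-capture sequence.

    mocap_poses : (n_frames, d) array, one flattened pose vector per frame
    Returns the mean pose and the top principal directions.
    """
    X = np.asarray(mocap_poses, float)
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]            # (d,), (n_components, d)

def synthesize_pose(mean, basis, coeffs):
    """Map low-dimensional coefficients back to a full pose vector."""
    return mean + np.asarray(coeffs, float) @ basis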
View planning for cityscape archiving and visualization. This work explores full registration of scenes in a large area, purely based on images, for city indexing and visualization. Ground-based images, including route panoramas, scene tunnels, panoramic views, and spherical views, are acquired in the area and are associated with geospatial information. In this paper, we plan distributed locations and paths in the urban area for image acquisition based on visibility, image properties, image coverage, and scene importance. The criterion is to use a small number of images to cover as much of the scene as possible. LIDAR data are used in this view evaluation, and real data are acquired accordingly. The extended images realize compact and complete visual data archiving, which will enhance the perception of the spatial relations of scenes.
Markov random field modeled level sets method for object tracking with moving cameras. Object tracking using active contours has attracted increasing interest in recent years due to the effective shape descriptions it provides. In this paper, an object tracking method based on level sets for moving cameras is proposed. We develop an automatic contour initialization method based on optical flow detection. A Markov random field (MRF)-like model measuring the correlations between neighboring pixels is added to improve the general region-based level-set speed model. The experimental results on several real video sequences show that our method successfully tracks objects despite object scale changes, motion blur, and background disturbance, and obtains smoother and more accurate results than the current region-based method.
Constrained optimization for human pose estimation from depth sequences. A new two-step method is presented for human upper-body pose estimation from depth sequences, in which coarse human part labeling takes place first, followed by more precise joint position estimation as the second phase. In the first step, a number of constraints are extracted from notable image features such as the head and torso. The problem of pose estimation is cast as that of label assignment with these constraints. Major parts of the human upper body are labeled by this process. The second step estimates joint positions optimally based on kinematic constraints, using dense correspondences between the depth profile and the human model parts. The proposed framework is shown to overcome some issues of existing approaches for human pose tracking using similar types of data streams. A performance comparison with motion capture data is presented to demonstrate the accuracy of our approach.
Shape from contour for the digitization of curved documents. We aim at extending basic digital camera functionality with the ability to simulate the flattening of a document, by virtually acting like a flatbed scanner. Typically, the document is the warped page of an opened book. The problem is stated as a computer vision problem whose resolution involves, in particular, a 3D reconstruction technique, namely shape from contour. Assuming that a photograph is taken by a camera in an arbitrary position and orientation, and that the model of the document surface is a generalized cylinder, we show how the correction of its geometric distortions, including perspective distortion, can be achieved from a single view of the document. The performance of the proposed technique is assessed and illustrated through experiments on real images.
Evaluating multi-class multiple-instance learning for image categorization. Automatic image categorization is a challenging computer vision problem, to which multiple-instance learning (MIL) has emerged as a promising approach. Typical current MIL schemes rely on binary one-versus-all classification, even for inherently multi-class problems. There are a few drawbacks with binary MIL when applied to a multi-class classification problem. This paper describes multi-class multiple-instance learning (MCMIL) for image categorization, which bypasses the necessity of constructing a series of binary classifiers. We analyze MCMIL in depth to show why it is advantageous over binary MIL when strong target concept overlaps exist among the classes. We systematically evaluate MCMIL using two challenging image databases and compare it with state-of-the-art binary MIL approaches. MCMIL achieves competitive classification accuracy, robustness to labeling noise, and effectiveness in capturing the target concepts using a smaller amount of training data. We show that the learned target concepts from MCMIL conform to human interpretation of the images.
Action recognition for surveillance applications using optic flow and SVM. Low-quality images taken by surveillance cameras pose a great challenge to human action recognition algorithms, because they are usually noisy, of low resolution, and of low frame rate. In this paper we propose an action recognition algorithm to overcome these challenges. We use optic flow to construct motion descriptors and apply an SVM to classify them. Having powerful discriminative features, we significantly reduce the size of the feature set required. The algorithm can be applied to videos with low frame rates without sacrificing efficiency or accuracy, and is robust to scale and viewpoint changes. To evaluate our method, we used a database consisting of walking, running, jogging, hand clapping, hand waving and boxing actions. This grayscale database has images of low resolution and poor quality, and thus resembles images taken by surveillance cameras. The proposed method outperforms competing algorithms evaluated on the same database.
Depth from stationary blur with adaptive filtering. This work achieves efficient acquisition of scenes and their depths along long streets. A camera is mounted on a vehicle moving along a path, and a sampling line properly set in the camera frame scans the 1D scene continuously to form a 2D route panorama. This paper extends a method to estimate depth from the camera path by analyzing the stationary blur in the route panorama. The temporal stationary blur is a perspective effect in parallel projection produced by the sampling slit having a physical width. The degree of blur is related to the scene's depth from the camera path. This paper analyzes the behavior of the stationary blur with respect to the camera parameters and uses adaptive filtering to improve the depth estimation. The approach avoids feature matching or tracking for complex street scenes and facilitates real-time sensing. The method also stores much less data than a structure-from-motion approach, so it can extend the sensing area significantly.
An occupancy-depth generative model of multi-view images. This paper presents an occupancy-based generative model of stereo and multi-view stereo images. In this model, the space is divided into empty and occupied regions. The depth of a pixel is naturally determined from the occupancy as the depth of the first occupied point along its viewing ray. The color of a pixel corresponds to the color of this 3D point. This model has two theoretical advantages. First, unlike other occupancy-based models, it explicitly models the deterministic relationship between occupancy and depth and thus correctly handles occlusions. Second, unlike depth-based approaches, determining depth from the occupancy automatically ensures the coherence of the resulting depth maps. Experimental results computing the MAP of the model using message passing techniques are presented to show the applicability of the model.
Highest accuracy fundamental matrix computation. We compare algorithms for fundamental matrix computation, which we classify into "a posteriori correction", "internal access", and "external access" approaches. Through experimental comparison, we show that the 7-parameter Levenberg-Marquardt (LM) search and the extended FNS (EFNS) exhibit the best performance, and that additional bundle adjustment does not increase the accuracy to any noticeable degree.
Determining relative geometry of cameras from normal flows. Determining the relative geometry of cameras is important in active binocular heads and multi-camera systems. Most of the existing works rely upon the establishment of either motion correspondences or binocular correspondences. This paper presents a first solution method that requires no recovery of full optical flow in either camera, nor overlap in the cameras' visual fields, and in turn no binocular correspondences. The method is based upon observations that are directly available in the respective image streams - the monocular normal flow. Experimental results on synthetic data and real image data are shown to illustrate the potential of the method.
Detecting, tracking and recognizing license plates. This paper introduces a novel real-time framework which enables detection, tracking and recognition of license plates from video sequences. An efficient algorithm based on the analysis of maximally stable extremal region (MSER) detection results allows localization of international license plates in single images without the need for any learning scheme. After a one-time detection of a plate, it is robustly tracked through the sequence by applying a modified version of the MSER tracking framework, which provides accurate localization results and, additionally, segmentations of the individual characters. Therefore, tracking and character segmentation are handled simultaneously. Finally, support vector machines are used to recognize the characters on the plate. An experimental evaluation shows the high accuracy and efficiency of the detection and tracking algorithm. Furthermore, promising results on a challenging data set are presented, and the significant improvement of the recognition rate due to the robust tracking scheme is demonstrated.
Dense 3D reconstruction of specular and transparent objects using stereo cameras and phase-shift method. In this paper, we first describe our approach to measuring the surface shape of specular objects and then extend the method to measuring the surface shape of transparent objects by using stereo cameras and a display. We show that two viewpoints can uniquely determine the surface shape and surface normal by investigating the light path for each surface point. We can determine the light origin for each surface point by showing two-dimensional phase shifts on the display. We obtained dense and accurate results for both planar and curved surfaces.
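Determining the light origin from displayed phase shifts typically relies on the standard N-step phase-shift decoding formula. A four-step version is sketched below as general background; the exact pattern sequence and any unwrapping used in the method above are not reproduced here.

import numpy as np

def decode_four_step_phase(i0, i90, i180, i270):
    """Standard 4-step phase-shift decoding.

    i0, i90, i180, i270 : images captured while the display shows a sinusoidal
    pattern shifted by 0, 90, 180 and 270 degrees.
    Returns the wrapped phase in (-pi, pi], which encodes the display
    coordinate (the "light origin") up to the usual 2*pi ambiguity.
    """
    return np.arctan2(np.asarray(i270, float) - np.asarray(i90, float),
                      np.asarray(i0, float) - np.asarray(i180, float))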
Fast optimal three view triangulation. We consider the problem of L2-optimal triangulation from three separate views. Triangulation is an important part of numerous computer vision systems. Under Gaussian noise, minimizing the L2 norm of the reprojection error gives a statistically optimal estimate. This has been solved for two views; however, for three or more views, it is not clear how this should be done. A previously proposed, but computationally impractical, method draws on Gröbner basis techniques to solve for the complete set of stationary points of the cost function. We show how this method can be modified to become significantly more stable and hence amenable to a fast implementation in standard IEEE double precision. We evaluate the precision and speed of the new method on both synthetic and real data. The algorithm has been implemented in a freely available software package which can be downloaded from the Internet.
Recognition of digital images of the human face at ultra low resolution via illumination spaces. Recent work has established that digital images of a human face, collected under various illumination conditions, contain discriminatory information that can be used in classification. In this paper we demonstrate that sufficient discriminatory information persists at ultra-low resolution to enable a computer to recognize specific human faces in settings beyond human capabilities. For instance, we utilized the Haar wavelet to modify a collection of images to emulate pictures from a 25-pixel camera. From these modified images, a low-resolution illumination space was constructed for each individual in the CMU-PIE database. Each illumination space was then interpreted as a point on a Grassmann manifold. Classification that exploited the geometry of this manifold yielded error-free classification rates for this data set. This suggests the general utility of a low-resolution illumination camera for set-based image recognition problems.
An adaptive nonparametric discriminant analysis method and its application to face recognition. Linear discriminant analysis (LDA) is frequently used for dimension reduction and has been successfully utilized in many applications, especially face recognition. In classical LDA, however, the definition of the between-class scatter matrix can cause large overlaps between neighboring classes, because LDA assumes that all classes obey a Gaussian distribution with the same covariance. We therefore propose an adaptive nonparametric discriminant analysis (ANDA) algorithm that maximizes the distance between neighboring samples belonging to different classes, thus improving the discriminating power of the samples near the classification borders. To evaluate its performance thoroughly, we have compared our ANDA algorithm with traditional PCA+LDA, orthogonal LDA (OLDA) and nonparametric discriminant analysis (NDA) on the FERET and ORL face databases. Experimental results show that the proposed algorithm outperforms the others.
Sports classification using cross-ratio histograms. The paper proposes a novel approach to the classification of sports images based on the geometric information encoded in the image of a sport's field. The proposed approach uses the invariance of the cross-ratio under projective transformations to develop a robust classifier. For a given image, cross-ratios are computed for the points obtained from the intersections of lines detected using the Hough transform. These cross-ratios are represented by a histogram, which forms the feature vector for the image. An SVM classifier trained on a priori model histograms of cross-ratios for sports fields is used to decide the most likely sport's field in the image. Experimental validation shows robust classification using the proposed approach for images of tennis, football, badminton and basketball taken from dissimilar viewpoints.
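The projective invariant used above is the classical cross-ratio of four collinear points. A small helper for computing it, together with the histogram binning step, is sketched below; the bin count and value range are assumptions, not the paper's settings.

import numpy as np

def cross_ratio(a, b, c, d):
    """Cross-ratio (AC * BD) / (BC * AD) of four collinear 2-D points,
    invariant under projective transformations of the line."""
    a, b, c, d = (np.asarray(p, float) for p in (a, b, c, d))
    ac = np.linalg.norm(c - a)
    bd = np.linalg.norm(d - b)
    bc = np.linalg.norm(c - b)
    ad = np.linalg.norm(d - a)
    return (ac * bd) / (bc * ad + 1e-12)

def cross_ratio_histogram(quadruples, bins=32, value_range=(0.0, 4.0)):
    """Histogram feature over collinear point quadruples."""
    values = [cross_ratio(*q) for q in quadruples]
    hist, _ = np.histogram(values, bins=bins, range=value_range, density=True)
    return hist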
Multiplexed illumination for measuring BRDF using an ellipsoidal mirror and a projector. Measuring a bidirectional reflectance distribution function (BRDF) requires a long time because a target object must be illuminated from all incident angles and the reflected light must be measured from all reflected angles. A high-speed method is presented for measuring BRDFs using an ellipsoidal mirror and a projector. The method can change incident angles without a mechanical drive. Moreover, it is shown that the dynamic range of the measured BRDF can be significantly increased by multiplexed illumination based on the Hadamard matrix.
Image segmentation using iterated graph cuts based on multi-scale smoothing. We present a novel approach to image segmentation using iterated graph cuts based on multi-scale smoothing. We compute the prior probability from the likelihood given by a color histogram and a distance transform using the segmentation result of the previous graph-cuts iteration, and set this probability as the t-link of the graph for the next iteration. The proposed method can segment the regions of an object through a stepwise process from global to local segmentation by iterating the graph-cuts process with Gaussian smoothing using different values of the standard deviation. We demonstrate that we can obtain 4.7% better segmentation than with the conventional approach.
Person-similarity weighted feature for expression recognition. In this paper, a new method to extract person-independent expression features based on HOSVD (higher-order singular value decomposition) is proposed for facial expression recognition. Under the assumption that similar persons have similar facial expression appearance and shape, a person-similarity weighted expression feature is used to estimate the expression feature of the test person. As a result, the estimated expression feature reduces the influence of individual differences caused by insufficient training data, becomes less person-dependent, and is more robust to new persons. The proposed method has been tested on the Cohn-Kanade facial expression database and the Japanese Female Facial Expression (JAFFE) database. Person-independent experimental results show the efficiency of the proposed method.
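The weighting idea above amounts to estimating a new person's expression feature as a similarity-weighted combination of the training persons' features. The minimal sketch below is a stand-in only: the similarity function and the exponential weighting are assumptions, not the paper's HOSVD formulation.

import numpy as np

def similarity_weighted_feature(test_appearance, train_appearances, train_expr_features):
    """Estimate an expression feature for a test person as a weighted sum of
    the training persons' expression features, weighted by person similarity.

    test_appearance     : (d,) appearance descriptor of the test person
    train_appearances   : (n, d) appearance descriptors of n training persons
    train_expr_features : (n, k) expression features of the same n persons
    """
    t = np.asarray(test_appearance, float)
    A = np.asarray(train_appearances, float)
    F = np.asarray(train_expr_features, float)
    dists = np.linalg.norm(A - t, axis=1)
    weights = np.exp(-dists / (dists.mean() + 1e-8))   # closer persons weigh more
    weights /= weights.sum()
    return weights @ F                                  # (k,) estimated expression feature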
Co-segmentation of image pairs with quadratic global constraint in MRFs. This paper provides a novel method for co-segmentation, namely simultaneously segmenting multiple images with the same foreground and distinct backgrounds. Our contribution is primarily four-fold. First, image pairs are typically captured under different imaging conditions, which makes the color distribution of the desired object shift greatly and hence brings challenges to color-based co-segmentation; here we propose a robust regression method to minimize color variances between corresponding image regions. Second, although it has been intensively discussed, the exact meaning of the term "co-segmentation" is rather vague and the importance of the image background has previously been neglected, which motivates us to provide a novel, clear and comprehensive definition of co-segmentation. Third, it is an involved issue that specific regions tend to be categorized as foreground, so we introduce a "risk term" to differentiate colors, which, to the best of our knowledge, has not been discussed before in the literature. Lastly and most importantly, unlike conventional linear global terms in MRFs, we propose a sum-of-squared-difference (SSD) based global constraint and deduce its equivalent quadratic form, which takes into account the pairwise relations in feature space. Reasonable assumptions are made, and the global optimum can be efficiently obtained via alternating graph cuts.
Microscopic surface shape estimation of a transparent plate using a complex image. This paper proposes a method to estimate the surface shape of a transparent plate using a reflection image on the plate. The reflection image on a transparent plate is a complex image that consists of reflections from the front surface and the rear surface of the plate. The displacement between the two reflection images encodes the range information to the object, which can be extracted from a single complex image. The displacement in the complex image depends not only on the object range but also on the normal vectors of the plate surfaces, the plate thickness, the relative refractive index, and the plate position. These parameters can be estimated using multiple planar targets with random texture at known distances. Experimental results show that the proposed method can detect microscopic surface shape differences between two different commercially available transparent acrylic plates.
Pose estimation from circle or parallel lines in a single image. The paper focuses on the problem of pose estimation from a single view under the minimal conditions that can be obtained from images. Under the assumption of known intrinsic parameters, we propose and prove that the pose of the camera can be recovered uniquely in three situations: (a) the image of one circle with a discriminable center; (b) the image of one circle with a preassigned world frame; (c) the image of any two pairs of parallel lines. Compared with previous techniques, the proposed method does not need any 3D measurement of the circle or lines, thus the required conditions are easily satisfied in many scenarios. Extensive experiments are carried out to validate the proposed method.
Discriminant clustering embedding for face recognition with image sets. In this paper, a novel local discriminant embedding method, discriminant clustering embedding (DCE), is proposed for face recognition with image sets. DCE combines the effectiveness of submanifolds, which are extracted by clustering each subject's image set and characterize the inherent structure of the face appearance manifold, with the discriminant property of discriminant embedding. The low-dimensional embedding is learned by preserving the neighbor information within each submanifold and separating neighboring submanifolds belonging to different subjects from each other. Compared with previous work, the proposed method not only discovers the discriminative information embedded in the local structure of face appearance manifolds more fully, but also preserves it more efficiently. Extensive experiments on real-world data demonstrate that DCE is efficient and robust for face recognition with image sets.
Learning-based super-resolution system using single facial image and multi-resolution wavelet synthesis. A learning-based super-resolution system consisting of training and synthesis processes is presented. In the proposed system, a multi-resolution wavelet approach is applied to carry out robust synthesis of both the global geometric structure and the local high-frequency detailed features of a facial image. In the training process, the input image is transformed into a series of images of increasingly lower resolution using the Haar discrete wavelet transform (DWT). The images at each resolution level are divided into patches, which are then projected onto an eigenspace to derive the corresponding projection weight vectors. In the synthesis process, a low-resolution input image is divided into patches, which are then projected onto the same eigenspace as that used in the training process. Modeling the resulting projection weight vectors as a Markov network, the maximum a posteriori (MAP) estimation approach is then applied to identify the best-matching patches with which to reconstruct the image at a higher level of resolution. The experimental results demonstrate that the proposed reconstruction system yields better results than the bi-cubic spline interpolation method.
Content-based matching of videos using local spatio-temporal fingerprints. Fingerprinting is the process of mapping content, or fragments of it, into unique, discriminative hashes called fingerprints. In this paper, we propose an automated video identification algorithm that employs fingerprinting for storing videos inside its database. When queried using a degraded short video segment, the objective of the system is to retrieve the original video to which it corresponds, both accurately and in real time. We present an algorithm that first extracts key frames for temporal alignment of the query and its actual database video, and then computes spatio-temporal fingerprints locally within such frames to indicate a content match. All stages of the algorithm have been shown to be highly stable and reproducible even when strong distortions are applied to the query.
A new framework for grayscale and colour non-Lambertian shape-from-shading. In this paper we show how arbitrary surface reflectance properties can be incorporated into a shape-from-shading scheme, by using a Riemannian minimisation scheme to minimise the brightness error. We show that for face images an additional regularising constraint on the surface height function is all that is required to recover accurate face shape from single images, the only assumption being a single light source of known direction. The method extends naturally to colour images, which add additional constraints to the problem. For our experimental evaluation we incorporate the Torrance and Sparrow surface reflectance model into our scheme and show how to solve for its parameters in conjunction with recovering a face shape estimate. We demonstrate that the method provides a realistic route to non-Lambertian shape-from-shading for both grayscale and colour face images.
Comparative studies on multispectral palm image fusion for biometrics. Hand biometrics, including fingerprint, palmprint, hand geometry and hand vein patterns, have received extensive attention in recent years. Physiologically, skin is a complex multi-layered tissue consisting of various types of components. Optical research suggests that different components appear when the skin is illuminated with light sources of different wavelengths. This motivates us to extend the capability of the camera by integrating information from multispectral palm images into a composite representation that conveys a richer and denser pattern for recognition. Besides, the usability and security of the whole system may be boosted at the same time. In this paper, a comparative study of several pixel-level multispectral palm image fusion approaches is conducted, and several well-established criteria are utilized as objective measures of fusion quality. Among the approaches considered, the curvelet transform is found to perform best in preserving discriminative patterns from multispectral palm images.
Improved space carving method for merging and interpolating multiple range images using information of light sources of active stereo. To merge multiple range data sets obtained by range scanners while filling holes caused by unmeasured regions, the space carving method is simple and effective. However, this method often fails if the number of input range images is small, because unseen voxels that are not carved out remain in the volume. In this paper, we propose an improved space carving algorithm that produces stable results. In the proposed method, a discriminant function defined on the volume space is used to estimate whether each voxel is inside or outside the objects. Also, in the particular case that the range images are obtained by an active stereo method, the information about the positions of the light sources can be used to improve the accuracy of the results.
Image and video matting with membership propagation. Two techniques are devised for a natural image matting method using semi-supervised object extraction. One is a guiding scheme for the placement of user strokes specifying object or background regions, and the other is a scheme for adjusting object colors to conform to the composited background colors. We draw strokes at inhomogeneous color regions disclosed by an unsupervised cluster extraction method, from which the semi-supervised algorithm is derived. Objects are composited with a new background after their color adjustment using a color transfer method with eigencolor mapping. This image matting method is then extended to videos. Strokes are drawn only in the first frame, from which memberships are propagated to successive frames to extract objects in every frame. The performance of the proposed method is examined on images and videos used in experiments with existing matting methods.
High capacity watermarking in nonedge texture under statistical distortion constraint. A high-capacity image watermarking scheme aims to maximize the bit rate of the hidden information without eliciting perceptible image distortion or facilitating specialized watermark attacks. Texture, in preattentive vision, reveals itself through concise high-order statistics and holds high capacity for watermarks. However, traditional distortion constraints, e.g. just-noticeable distortion (JND), cannot evaluate texture distortion in visual perception and thus impose too strict a constraint. Inspired by recent work on image representation [9], which suggests texture extraction and mixture probabilistic principal component analysis for learning texture features, we propose a distortion measure in the subspace spanned by the texture principal components, together with an adaptive distortion constraint depending on local image roughness. The proposed spread-spectrum watermarking scheme generates watermarked images with larger SNR than JND-based schemes at the same allowed distortion level, and its watermark has a power spectrum approximately proportional to that of the host image, making it more robust against Wiener filtering attacks.
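As background for the scheme above, basic additive spread-spectrum watermarking embeds a pseudo-random pattern scaled by a strength map and detects it by correlation. The sketch below is this textbook baseline only; the paper's texture-subspace distortion measure is not reproduced, and the strength map is left as an input assumption.

import numpy as np

def embed_watermark(image, key, strength):
    """Additive spread-spectrum embedding.

    image    : 2-D float array (host image)
    key      : integer seed generating the pseudo-random watermark pattern
    strength : scalar or per-pixel array controlling embedding strength
    """
    rng = np.random.default_rng(key)
    pattern = rng.choice([-1.0, 1.0], size=image.shape)   # spread-spectrum chip pattern
    return image + strength * pattern, pattern

def detect_watermark(test_image, pattern, sigma_level=2.0):
    """Correlation detector: a large normalized correlation suggests the
    watermark is present.  For an unwatermarked image the score is roughly
    N(0, 1/n), so sigma_level/sqrt(n) is a simple decision threshold."""
    x = test_image - test_image.mean()
    score = (x * pattern).sum() / (np.linalg.norm(x) * np.linalg.norm(pattern) + 1e-12)
    return score, score > sigma_level / np.sqrt(x.size)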
in this paper, we propose a semi-supervised multiple-instance learning (ssmil) algorithm, and apply it to localized content-based image retrieval (lcbir), where the goal is to rank all the images in the database, according to the object that users want to retrieve. ssmil treats lcbir as a semi-supervised problem and utilize the unlabeled pictures to help improve the retrieval performance. the comparison result of ssmil with several state-of-art algorithms is promising. where's the weet-bix? this paper proposes a new retrieval problem and conducts the initial study. this problem aims at finding the location of an item in a supermarket by means of visual retrieval. it is modelled as object-based retrieval and approached using the local invariant features. two existing retrieval methods are investigated and their similarity measures are modified to better fit this new problem. more importantly, through the study this new retrieval problem proves itself to be a challenging task. an instant application of it is to help the customer find what they want without physically wandering around the shelves but a wide range of potential applications could be expected. evolving measurement regions for depth from defocus. depth from defocus (dfd) is a 3d recovery method based on estimating the amount of defocus induced by finite lens apertures. given two images with different camera settings, the problem is to measure the resulting differences in defocus across the image, and to estimate a depth based on these blur differences. most methods assume that the scene depth map is locally smooth, and this leads to inaccurate depth estimates near discontinuities. in this paper, we propose a novel dfd method that avoids smoothing over discontinuities by iteratively modifying an elliptical image region over which defocus is estimated. our method can be used to complement any depth from defocus method based on spatial domain measurements. in particular, this method improves the dfd accuracy near discontinuities in depth or surface orientation. face recognition by using elongated local binary patterns with average maximum distance gradient magnitude. in this paper, we propose a new face recognition approach based on local binary patterns (lbp). the proposed approach has the following novel contributions. (i) as compared with the conventional lbp, anisotropic structures of the facial images can be captured effectively by the proposed approach using elongated neighborhood distribution, which is called the elongated lbp (elbp). (ii) a new feature, called average maximum distance gradient magnitude (amdgm), is proposed. amdgm embeds the gray level difference information between the reference pixel and neighboring pixels in each elbp pattern. (iii) it is found that the elbp and amdgm features are well complement with each other. the proposed method is evaluated by performing facial expression recognition experiments on two databases: orl and feret. the proposed method is compared with two widely used face recognition approaches. furthermore, to test the robustness of the proposed method under the condition that the resolution level of the input images is low, we also conduct additional face recognition experiments on the two databases by reducing the resolution of the input facial images. the experimental results show that the proposed method gives the highest recognition accuracy in both normal environment and low image resolution conditions. a convex programming approach to the trace quotient problem. 
the trace quotient problem arises in many applications in pattern classification and computer vision, e.g., manifold learning, low-dimensional embedding, etc. the task is to solve an optimization problem involving maximizing the ratio of two traces, i.e., max_w tr(f(w)) / tr(h(w)). this optimization problem is non-convex in general, hence it is hard to solve directly. conventionally, the trace quotient objective function is replaced by a much simpler quotient trace formula, i.e., max_w tr(h(w)^{-1} f(w)), which admits a much simpler solution. however, the result is no longer optimal for the original problem setting, and some desirable properties of the original problem are lost. in this paper we propose a new formulation for solving the trace quotient problem directly. we reformulate the original non-convex problem such that it can be solved by efficiently solving a sequence of semidefinite feasibility problems. the solution is therefore globally optimal. besides global optimality, our algorithm naturally generates an orthonormal projection matrix. moreover, it relaxes the restriction of linear discriminant analysis that the projection matrix's rank can be at most c - 1, where c is the number of classes. our approach is more flexible. experiments show the advantages of the proposed algorithm. kernel discriminant analysis based on canonical differences for face recognition in image sets. a novel kernel discriminant transformation (kdt) algorithm based on the concept of canonical differences is presented for automatic face recognition applications. for each individual, the face recognition system compiles a multi-view facial image set comprising images with different facial expressions, poses and illumination conditions. since the multi-view facial images are non-linearly distributed, each image set is mapped into a high-dimensional feature space using a nonlinear mapping function. the corresponding linear subspace, i.e. the kernel subspace, is then constructed via a process of kernel principal component analysis (kpca). the similarity of two kernel subspaces is assessed by evaluating the canonical difference between them based on the angle between their respective canonical vectors. utilizing the kernel fisher discriminant (kfd), a kdt algorithm is derived to establish the correlation between kernel subspaces based on the ratio of the canonical differences of the between-classes to those of the within-classes. the experimental results demonstrate that the proposed classification system outperforms existing subspace comparison schemes and has a promising potential for use in automatic face recognition applications. visual odometry for non-overlapping views using second-order cone programming. we present a solution for motion estimation for a set of cameras which are firmly mounted on a head unit and do not have overlapping views in each image. this problem relates to ego-motion estimation of multiple cameras, or visual odometry. we reduce motion estimation to solving a triangulation problem, which finds a point in space from multiple views. the optimal solution of the triangulation problem in the l∞ norm is found using socp (second-order cone programming). consequently, with the help of the optimal solution for the triangulation, we can solve visual odometry by using socp as well. stereo vision enabling precise border localization within a scanline optimization framework.
a novel algorithm for obtaining accurate dense disparity measurements and precise border localization from stereo pairs is proposed. the algorithm embodies a very effective variable support approach based on segmentation within a scanline optimization framework. the use of a variable support allows for precisely retrieving depth discontinuities while smooth surfaces are well recovered thanks to the minimization of a global function along multiple scanlines. border localization is further enhanced by symmetrically enforcing the geometry of the scene along depth discontinuities. experimental results show a significant accuracy improvement with respect to comparable stereo matching approaches. a fast and noise-tolerant method for positioning centers of spiraling and circulating vector fields. identification of centers of circulating and spiraling vector fields are important in many applications. tropical cyclone tracking, rotating object identification, analysis of motion video and movement of fluids are but some examples. in this paper, we introduce a fast and noise tolerant method for finding centers of circulating and spiraling vector field pattern. the method can be implemented using integer operations only. it is 1.4 to 4.5 times faster than traditional methods, and the speedup can be further boosted up to 96.6 by the incorporation of search algorithms. we show the soundness of the algorithm using experiments on synthetic vector fields and demonstrate its practicality using application examples in the field of multimedia and weather forecasting. discriminating 3d faces by statistics of depth differences. in this paper, we propose an efficient 3d face recognition method based on statistics of range image differences. each pixel value of range image represents normalized depth value of corresponding point on facial surface, and so depth differences between two range images' pixels of the same position on face can straightforwardly describe the differences between two faces' structures. here, we propose to use histogram proportion of depth differences to discriminate intra and inter personal differences for 3d face recognition. depth differences are computed from a neighbor district instead of direct subtraction to avoid the impact of non-precise registration. furthermore, three schemes are proposed to combine the local rigid region(nose) and holistic face to overcome expression variation for robust recognition. promising experimental results are achieved on the 3d dataset of frgc2.0, which is the most challenging 3d database so far. color-stripe structured light robust to surface color and discontinuity. multiple color stripes have been employed for structured light-based rapid range imaging to increase the number of uniquely identifiable stripes. the use of multiple color stripes poses two problems: (1) object surface color may disturb the stripe color and (2) the number of adjacent stripes required for identifying a stripe may not be maintained near surface discontinuities such as occluding boundaries. in this paper, we present methods to alleviate those problems. log-gradient filters are employed to reduce the influence of object colors, and color stripes in two and three directions are used to increase the chance of identifying correct stripes near surface discontinuities. experimental results demonstrate the effectiveness of our methods. viewpoint insensitive action recognition using envelop shape. action recognition is a popular and important research topic in computer vision. 
however, it is challenging when facing viewpoint variance. so far, most researches in action recognition remain rooted in view-dependent representations. some view invariance approaches have been proposed, but most of them suffer from some weaknesses, such as lack of abundant information for recognition, dependency on robust meaningful feature detection or point correspondence. to perform viewpoint and subject independent action recognition, we propose a representation named "envelop shape" which is viewpoint insensitive. "envelop shape" is easy to acquire from silhouettes using two orthogonal cameras. it makes full use of two cameras' silhouettes to dispel influence caused by human body's vertical rotation, which is often the primary viewpoint variance. with the help of "envelop shape", we obtained inspiring results on action recognition independent of subject and viewpoint. results indicate that "envelop shape" representation contains enough discriminating features for action recognition. shape recovery from turntable image sequence. this paper makes use of both feature points and silhouettes to deliver fast 3d shape recovery from a turntable image sequence. the algorithm exploits object silhouettes in two views to establish a 3d rim curve, which is defined with respect to the two frontier points arising from two views. the images of this 3d rim curve in the two views are matched using cross correlation technique with silhouette constraint incorporated. a 3d planar rim curve is then reconstructed using point-based reconstruction method. a set of rims enclosing the object can be obtained from an image sequence captured under circular motion. the proposed method solves the problem of reconstruction of concave object surface, which is usually left unresolved in general silhouette-based reconstruction methods. in addition, the property of the organized reconstructed rim curves allows fast surface extraction. experimental results with real data are presented. fast 3-d interpretation from monocular image sequences on large motion fields. this paper proposes a fast method for dense 3-d interpretation to directly estimate a dense map of relative depth and motion from a monocular sequence of images on large motion fields. the nagel-enkelmann technique is employed in the variational formulation of the problem. diffusion-reaction equations are derived from the formulation so as to approximate the dense map on large motion fields and realize an anisotropic diffusion to preserve the discontinuities of the map. by combining the ideas of implicit schemes and multigrid methods, we present a new implicit multigrid block gauss-seidel relaxation scheme, which dramatically reduces the computation time for solving the largescale linear system of diffusion-reaction equations. using our method, we perform fast 3-d interpretation of image sequences with large motion fields. the efficiency and effectiveness of our method are experimentally verified with synthetic and real image sequences. a regularized approach to feature selection for face detection. in this paper we present a trainable method for selecting features from an overcomplete dictionary of measurements. the starting point is a thresholded version of the landweber algorithm for providing a sparse solution to a linear system of equations. we consider the problem of face detection and adopt rectangular features as an initial representation for allowing straightforward comparisons with existing techniques. 
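as a minimal sketch of a thresholded landweber iteration of the kind used as the starting point above (step size, threshold and the toy dictionary are assumptions, not the paper's settings):

```python
import numpy as np

def thresholded_landweber(A, b, tau=0.1, n_iter=200):
    """sparse solution of a x ≈ b via landweber iterations with soft thresholding.

    A      : (m, n) matrix whose columns are candidate features / measurements.
    b      : (m,) observations (e.g. training labels or responses).
    tau    : threshold controlling sparsity of the solution.
    """
    L = np.linalg.norm(A, 2) ** 2          # upper bound on the largest eigenvalue of A^T A
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x + A.T @ (b - A @ x) / L                       # landweber (gradient) step
        x = np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)   # soft thresholding
    return x

# toy usage: 50 measurements, 200 candidate features, 5 of them truly active
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200))
x_true = np.zeros(200)
x_true[:5] = 3.0
b = A @ x_true + 0.01 * rng.standard_normal(50)
selected = np.flatnonzero(np.abs(thresholded_landweber(A, b)) > 1e-3)
```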
for reasons of computational efficiency and memory requirements, instead of implementing the full optimization scheme on tens of thousands of features, we propose to first solve a number of smaller-size optimization problems obtained by randomly sub-sampling the feature vector, and then to recombine the selected features. the obtained set is still highly redundant, so we further apply feature selection. the final feature selection system is an efficient two-stage architecture. experimental results of an optimized version of the method on face images and image sequences indicate that this method is a serious competitor to other feature selection schemes recently popularized in computer vision for dealing with problems of real-time object detection. adaptively determining degrees of implicit polynomial curves and surfaces. fitting an implicit polynomial (ip) to a data set usually suffers from the difficulty of determining a moderate polynomial degree. a degree that is too low leads to less accuracy than one expects, whereas a degree that is too high leads to global instability. we propose a method for automatically determining a moderate degree in an incremental fitting process using qr decomposition. this incremental process is computationally efficient, since by reusing the calculation results from the previous step, the burden of calculation is dramatically reduced at the next step. simultaneously, fitting instabilities can easily be detected by examining the eigenvalues of the upper triangular matrix from the qr decomposition, since its diagonal elements are equal to the eigenvalues. based on this beneficial property, and combining it with tasdizen's ridge regression method, a new technique is also proposed for improving fitting stability. iris tracking and regeneration for improving nonverbal interface. in this study, we discuss the quality of teleconferencing, with particular respect to "eye contact". recently, video conference systems have become easy to use, even on camera-equipped mobile phones, and many people use them in daily life. since a user is likely to look at the face of the partner on the monitor rather than at the camera, he or she usually fails to send eye-contacted facial images to the partner, and vice versa. we focus on the loss of eye contact in teleconferencing caused by the separation between the input camera and the output monitor. we then propose an eye-contact camera system that generates eye-contacted motion images for the receiver. in this system, the iris contour is extracted after face region extraction, the vertical and horizontal gaze directions are calculated from the relative positions of the monitor, camera and receiver, and finally the iris center coordinates are shifted in the image so that each partner appears to be looking directly at the other. we implemented the system on a notebook pc with a web camera to evaluate its usability. multiple view geometry for non-rigid motions viewed from translational cameras. this paper introduces multiple view geometry under projective projections from four-dimensional space to two-dimensional space, which can represent multiple view geometry under the projection of space with time. we show that the multifocal tensors defined under space-time projective projections can be derived from non-rigid object motions viewed from multiple cameras with arbitrary translational motions, and that they are practical for generating images of non-rigid object motions viewed from cameras with arbitrary translational motions.
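as a small illustrative example of a projective projection from 4-d space-time to the 2-d image (the 3 × 5 matrix below is an arbitrary assumed example for a translating camera, not the paper's construction of the multifocal tensors):

```python
import numpy as np

# homogeneous space-time points (x, y, z, t, 1)
points = np.array([
    [0.0, 0.0, 5.0, 0.0, 1.0],
    [0.1, 0.0, 5.0, 1.0, 1.0],   # the same physical point one time step later
    [0.5, 0.2, 6.0, 1.0, 1.0],
])

# assumed toy 3x5 space-time projection matrix; the fourth column models a
# constant translational camera velocity folded into the projection
P = np.array([
    [800.0,   0.0, 320.0, -8.0, 0.0],
    [  0.0, 800.0, 240.0,  0.0, 0.0],
    [  0.0,   0.0,   1.0,  0.0, 1.0],
])

proj = (P @ points.T).T                # project each homogeneous 4-d point
pixels = proj[:, :2] / proj[:, 2:]     # divide by the homogeneous coordinate
print(pixels)
```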
the method is tested in real image sequences. shape representation and classification using boundary radius function. in this paper, a new method for the problem of shape representation and classification is proposed. in this method, we define a radius function on the contour of the shape which captures for each point of the boundary, attributes of its related internal part of the shape. we call these attributes as "depth" of the point. depths of boundary points generate a descriptor sequence which represents the shape. matching of sequences is performed using dynamic programming method and a distance measure is acquired. at last, different classes of shapes are classified using a hierarchical clustering method and the distance measure. the proposed method can analyze features of each part of the shape locally which this leads to the ability of part analysis and insensitivity to local deformations such as articulation, occlusion and missing parts. we show high efficiency of the proposed method by evaluating it for shape matching and classification of standard shape datasets. efficient graph cuts for multiclass interactive image segmentation. interactive image segmentation has attracted much attention in the vision and graphics community recently. a typical application for interactive image segmentation is foreground/background segmentation based on user specified brush labellings. the problem can be formulated within the binary markov random field (mrf) framework which can be solved efficiently via graph cut [1]. however, no attempt has yet been made to handle segmentation of multiple regions using graph cuts. in this paper, we propose a multiclass interactive image segmentation algorithm based on the potts mrf model. following [2], this can be converted to a multiway cut problem first proposed in [2] and solved by expansion-move algorithms for approximate inference [2]. a faster algorithm is proposed in this paper for efficient solution of the multiway cut problem based on partial optimal labeling. to achieve this, we combine the one-vs-all classifier fusion framework with the expansion-move algorithm for label inference over large images. we justify our approach with both theoretical analysis and experimental validation. motion observability analysis of the simplified color correlogram for visual tracking. compared with the color histogram, where the position information of each pixel is ignored, a simplified color correlogram (scc) representation encodes the spatial information explicitly and enables an estimation algorithm to recover the object orientation. this paper analyzes the capability of the scc (in a kernel based framework) in detecting and estimating object motion and presents a principled way to obtain motion observable sccs as object representations to achieve more reliable tracking. extensive experimental results demonstrate the reliability of the tracking procedure using the proposed algorithm. simultaneous plane extraction and 2d homography estimation using local feature transformations. in this paper, we use local feature transformations estimated in the matching process as initial seeds for 2d homography estimation. the number of testing hypotheses is equal to the number of matches, naturally enabling a full search over the hypothesis space. using this property, we develop an iterative algorithm that clusters the matches under the common 2d homography into one group, i.e., features on a common plane. 
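a rough sketch of the grouping step, assuming one candidate homography has already been lifted from each match's local feature transformation (the hypotheses, inlier threshold and minimum group size below are assumptions):

```python
import numpy as np

def transfer_error(H, pts1, pts2):
    """one-way transfer error |h(p1) - p2| for 2-d point arrays."""
    p1 = np.hstack([pts1, np.ones((len(pts1), 1))])
    proj = (H @ p1.T).T
    proj = proj[:, :2] / proj[:, 2:]
    return np.linalg.norm(proj - pts2, axis=1)

def cluster_by_homography(hypotheses, pts1, pts2, tol=3.0, min_size=4):
    """greedily assign matches to the homography hypotheses that explain them.

    hypotheses : list of 3x3 candidate homographies (e.g. one per match).
    pts1, pts2 : (n, 2) matched point coordinates in the two images.
    returns one label per match (-1 = not explained by any dominant plane).
    """
    labels = -np.ones(len(pts1), dtype=int)
    remaining = np.arange(len(pts1))
    for k, H in enumerate(hypotheses):
        if remaining.size == 0:
            break
        err = transfer_error(H, pts1[remaining], pts2[remaining])
        inliers = remaining[err < tol]
        if inliers.size >= min_size:
            labels[inliers] = k
            remaining = np.setdiff1d(remaining, inliers)
    return labels
```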
our clustering algorithm is less affected by the proportion of inliers, and as few as two features on the common plane can be clustered together; thus, the algorithm robustly detects multiple dominant scene planes. the knowledge of the dominant planes is used for robust fundamental matrix computation in the presence of quasi-degenerate data. identifying foreground from multiple images. in this paper, we present a novel foreground extraction method that automatically identifies image regions corresponding to a common space region seen from multiple cameras. we assume that background regions present some color coherence in each image, and we exploit the spatial consistency constraint that several image projections of the same space region must satisfy. integrating both color and spatial consistency constraints makes it possible to fully automatically segment foreground and background regions in multiple images. in contrast to standard background subtraction approaches, the proposed approach does not require any a priori knowledge of the background or any user interaction. we demonstrate the effectiveness of the method for multiple camera setups with experimental results on standard real data sets. a novel multi-stage classifier for face recognition. a novel face recognition scheme based on a multi-stage classifier, which includes support vector machine (svm), eigenface, and random sample consensus (ransac) methods, is proposed in this paper. the whole decision process is conducted in cascaded coarse-to-fine stages. the first stage adopts the one-against-one svm (oao-svm) method to choose the two classes most similar to the test image. in the second stage, the "eigenface" method is employed to select, in each of the two chosen classes, the prototype image with the minimum distance to the test image. finally, the true class is determined by comparing the geometric similarity between these prototype images and the test image, as done by the "ransac" method. this multi-stage face recognition system has been tested on the olivetti research laboratory (orl) face database, and the experimental results give evidence that the proposed approach outperforms approaches based on either a single classifier or multiple parallel classifiers; it can even obtain nearly 100 percent recognition accuracy. conic fitting using the geometric distance. we consider the problem of fitting a conic to a set of 2d points. it is commonly agreed that minimizing the geometrical error, i.e. the sum of squared distances between the points and the conic, is better than using an algebraic error measure. however, most existing methods rely on algebraic error measures. this is usually motivated by the fact that point-to-conic distances are difficult to compute and the belief that non-linear optimization of conics is computationally very expensive. in this paper, we describe a parameterization for the conic fitting problem that allows us to circumvent the difficulty of computing point-to-conic distances, and we show how to perform the non-linear optimization process efficiently. statistical framework for shot segmentation and classification in sports video. in this paper, a novel statistical framework is proposed for shot segmentation and classification. the proposed framework segments and classifies shots simultaneously, using the same difference features based on statistical inference.
the task of shot segmentation and classification is taken as finding the most probable shot sequence given the feature sequences, and it can be formulated by a conditional probability which can be divided into a shot sequence probability and a feature sequence probability. the shot sequence probability is derived from the relations between adjacent shots using a bi-gram model, and the feature sequence probability depends on the inherent character of a shot, modeled by an hmm. thus, the proposed framework segments shots while considering the intra-shot characteristics used to classify them, and classifies shots while considering the inter-shot characteristics used to segment them, which yields more accurate results. experimental results on soccer and badminton videos are promising, and demonstrate the effectiveness of the proposed framework. temporal priors for novel video synthesis. in this paper we propose a method to construct a virtual sequence for a camera moving through a static environment given an input sequence from a different camera trajectory. existing image-based rendering techniques can generate photorealistic images given a set of input views, though the output images almost unavoidably contain small regions where the colour has been incorrectly chosen. in a single image these artifacts are often hard to spot, but become more obvious when viewing a real image with its virtual stereo pair, and even more so when a sequence of novel views is generated, since the artifacts are rarely temporally consistent. to address this problem of consistency, we propose a new spatiotemporal approach to novel video synthesis. the pixels in the output video sequence are modelled as nodes of a 3-d graph. we define an mrf on the graph which encodes photoconsistency of pixels as well as texture priors in both space and time. unlike methods based on scene geometry which yield highly connected graphs, our approach results in a graph whose degree is independent of scene structure. the mrf energy is therefore tractable and we solve it for the whole sequence using a state-of-the-art message passing optimisation algorithm. we demonstrate the effectiveness of our approach in reducing temporal artifacts. face mis-alignment analysis by multiple-instance subspace. in this paper, we systematically study the effect of poorly registered faces on the training and inference stages of traditional face recognition algorithms. we then propose a novel multiple-instance based subspace learning scheme for face recognition. in this approach, we iteratively update the subspace training instances according to diverse densities, using class-balanced supervised clustering. we test our multiple instance subspace learning algorithm with fisherface for the application of face recognition. experimental results show that the proposed learning algorithm can improve the robustness of current methods with poorly aligned training and testing data. backward segmentation and region fitting for geometrical visibility range estimation. we present a new application of computer vision: continuous measurement of the geometrical visibility range on inter-urban roads, solely based on a monocular image acquisition system. to tackle this problem, we first propose a road segmentation scheme based on a parzen-windowing of a color feature space with an original update that allows us to cope with heterogeneously paved roads, shadows and reflections, observed under various and changing lighting conditions.
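a minimal sketch of a parzen-window (kernel density) colour model for the road class, under an assumed gaussian kernel, bandwidth and rgb feature space (not necessarily the paper's choices):

```python
import numpy as np

class ParzenColorModel:
    """gaussian parzen-window density over colour samples of the road region."""

    def __init__(self, bandwidth=8.0):
        self.h = bandwidth
        self.samples = None                      # (n, 3) colours of known road pixels

    def update(self, road_pixels, keep=2000):
        """append newly observed road-pixel colours, keeping a bounded sample set."""
        new = road_pixels.reshape(-1, 3).astype(np.float64)
        self.samples = new if self.samples is None else np.vstack([self.samples, new])
        if len(self.samples) > keep:
            idx = np.random.choice(len(self.samples), keep, replace=False)
            self.samples = self.samples[idx]

    def likelihood(self, pixels):
        """parzen estimate of p(colour | road) for an (m, 3) array of pixel colours."""
        d2 = ((pixels[:, None, :] - self.samples[None, :, :]) ** 2).sum(-1)
        k = np.exp(-0.5 * d2 / self.h ** 2)
        return k.mean(axis=1) / ((2 * np.pi) ** 1.5 * self.h ** 3)

# colours sampled near the bottom of the image (assumed road) train the model;
# each pixel is then labelled road when its likelihood exceeds a chosen threshold.
```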
second, we address the under-constrained problem of retrieving the depth information along the road based on the flat world assumption. this is performed by a new region-fitting iterative least-squares algorithm, derived from half-quadratic theory, that is able to cope with vanishing-point estimation and allows us to estimate the geometrical visibility range. image correspondence from motion subspace constraint and epipolar constraint. in this paper, we propose a novel method for inferring image correspondences between a pair of synchronized image sequences. in the proposed method, after tracking the feature points in each image sequence over several frames, we solve the image correspondence problem from two types of geometrical constraints: (1) the motion subspace obtained from the tracked feature points of a target sequence, and (2) the epipolar constraints between the two cameras. unlike conventional correspondence estimation based on image matching using pixel values, the proposed approach enables us to obtain correspondences even for feature points that can be seen from one camera view but cannot be seen (because they are occluded or outside the view) from the other camera. the validity of our method is demonstrated through experiments using synthetic and real images. analyzing facial expression by fusing manifolds. feature representation and classification are two major issues in facial expression analysis. in the past, most methods used either a holistic or a local representation for analysis. in essence, local information mainly focuses on the subtle variations of expressions, while the holistic representation stresses global diversity. to take advantage of both, a hybrid representation is suggested in this paper, and manifold learning is applied to characterize global and local information discriminatively. unlike some methods using unsupervised manifold learning approaches, the embedded manifolds of the hybrid representation are learned by adopting a supervised manifold learning technique. to integrate these manifolds effectively, a fusion classifier is introduced, which helps to employ suitable combination weights of facial components to identify an expression. comprehensive comparisons on facial expression recognition are included to demonstrate the effectiveness of our algorithm. content-based image retrieval by indexing random subwindows with randomized trees. we propose a new method for content-based image retrieval which exploits the similarity measure and indexing structure of totally randomized tree ensembles induced from a set of subwindows randomly extracted from a sample of images. we also present the possibility of updating the model as new images come in, and the capability of comparing new images using a model previously constructed from a different set of images. the approach is quantitatively evaluated on various types of images with state-of-the-art results despite its conceptual simplicity and computational efficiency. palmprint recognition under unconstrained scenes. this paper presents a novel real-time palmprint recognition system for cooperative user applications. this system is the first to achieve noncontact capture and recognition of palmprint images under unconstrained scenes. its novelties can be described in two aspects. the first is a novel design of the image capturing device. the hardware can reduce the influence of background objects and segment out hand regions efficiently.
the second is a process of automatic hand detection and fast palmprint alignment, which aims to obtain normalized palmprint images for subsequent feature extraction. the palmprint recognition algorithm used in the system is based on an accurate ordinal palmprint representation. by integrating the power of the novel imaging device, the palmprint preprocessing approach and the palmprint recognition engine, the proposed system provides a friendly user interface and at the same time achieves good performance under unconstrained scenes. stereo matching using population-based mcmc. in this paper, we propose a new stereo matching method using population-based markov chain monte carlo (pop-mcmc). pop-mcmc belongs to the class of sampling-based methods. since previous mcmc methods produce only one sample at a time, only local moves are available. however, since pop-mcmc uses multiple chains and produces multiple samples at a time, it enables global moves by exchanging information between samples, which in turn leads to a faster mixing rate. from the viewpoint of optimization, this means that we can reach a state with lower energy. the experimental results on real stereo images demonstrate that the performance of the proposed algorithm is superior to that of previous algorithms. accelerating pattern matching or how much can you slide? this paper describes a method that accelerates pattern matching. the distance between a pattern and a window is usually close to the distance of the pattern to the adjacent windows due to image smoothness. we show how to exploit this fact to reduce the running time of pattern matching by adaptively sliding the window, often by more than one pixel. the decision of how much we can slide is based on a novel rank we define for each feature in the pattern. implemented on a pentium 4 3ghz processor, detection of a pattern with 7569 pixels in a 640 × 480 pixel image requires only 3.4ms. learning gabor magnitude features for palmprint recognition. palmprint recognition, as a new branch of biometric technology, has attracted much attention in recent years. various palmprint representations have been proposed for recognition. gabor features have been recognized as among the most effective representations for palmprint recognition, and gabor phase and orientation feature representations have been extensively studied. in this paper, we explore a novel gabor magnitude feature-based method for palmprint recognition. the novelties are as follows: first, we propose an illumination normalization method for palmprint images to decrease the influence of illumination variations caused by different sensors and lighting conditions. second, we propose to use gabor magnitude features for palmprint representation. third, we utilize adaboost learning to extract the most effective features and apply local discriminant analysis (lda) to further reduce the dimension for palmprint recognition. experimental results on three large palmprint databases demonstrate the effectiveness of the proposed method. compared with state-of-the-art gabor-based methods, our method achieves higher accuracy. total absolute gaussian curvature for stereo prior. in spite of the great progress in stereo matching algorithms, the prior models they use, i.e., the assumptions about the probability of seeing each possible surface, have not changed much in three decades. here, we introduce a novel prior model motivated by psychophysical experiments. it is based on minimizing the total sum of the absolute value of the gaussian curvature over the disparity surface.
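a small numpy sketch of the quantity being minimised, the total absolute gaussian curvature of a disparity surface, using finite differences and the monge-patch curvature formula (a grid spacing of one pixel is assumed):

```python
import numpy as np

def total_abs_gaussian_curvature(disparity):
    """sum of |k| over a disparity map d(x, y) viewed as the surface z = d(x, y).

    k = (d_xx * d_yy - d_xy^2) / (1 + d_x^2 + d_y^2)^2
    """
    d = disparity.astype(np.float64)
    dy, dx = np.gradient(d)
    dyy, dyx = np.gradient(dy)
    dxy, dxx = np.gradient(dx)
    k = (dxx * dyy - dxy * dyx) / (1.0 + dx ** 2 + dy ** 2) ** 2
    return np.abs(k).sum()

# a slanted plane (or any developable surface) has near-zero cost, while a
# randomly bumpy surface is penalised:
ramp = np.tile(np.linspace(0.0, 10.0, 64), (64, 1))
bumpy = ramp + np.random.rand(64, 64)
print(total_abs_gaussian_curvature(ramp), total_abs_gaussian_curvature(bumpy))
```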
intuitively, it is similar to rolling and bending a flexible paper to fit to the stereo surface, whereas the conventional prior is more akin to spanning a soap film. through controlled experiments, we show that the new prior outperforms the conventional models, when compared in the equal setting. a bayesian network for foreground segmentation in region level. this paper presents a probabilistic approach for automatically segmenting foreground objects from a video sequence. in order to save computation time and be robust to noise effect, a region detection algorithm incorporating edge information is first proposed to identify the regions of interest. next, we consider the motion of the foreground objects, and hence utilize the temporal coherence property on the regions detected. thus, foreground segmentation problem is formulated as follows. given two consecutive image frames and the segmentation result obtained priorly, we simultaneously estimate the motion vector field and the foreground segmentation mask in a mutually supporting manner. to represent the conditional joint probability density function in a compact form, a bayesian network is adopted, which is derived to model the interdependency of these two elements. experimental results for several video sequences are provided to demonstrate the effectiveness of our proposed approach. information fusion for multi-camera and multi-body structure and motion. information fusion algorithms have been successful in many vision tasks such as stereo, motion estimation, registration and robot localization. stereo and motion image analysis are intimately connected and can provide complementary information to obtain robust estimates of scene structure and motion. we present an information fusion based approach for multi-camera and multi-body structure and motion that combines bottom-up and top-down knowledge on scene structure and motion. the only assumption we make is that all scene motion consists of rigid motion. we present experimental results on synthetic and nonsynthetic data sets, demonstrating excellent performance compared to binocular based state-of-the-art approaches for structure and motion. a basin morphology approach to colour image segmentation by region merging. the problem of colour image segmentation is investigated in the context of mathematical morphology. morphological operators are extended to colour images by means of a lexicographical ordering in a polar colour space, which are then employed in the preprocessing stage. the actual segmentation is based on the use of the watershed transformation, followed by region merging, with the procedure being formalized as a basin morphology, where regions are "eroded" in order to form greater catchment basins. the result is a fully automated processing chain, with multiple levels of parametrisation and flexibility, the application of which is illustrated by means of the berkeley segmentation dataset. embedding a region merging prior in level set vector-valued image segmentation. in the scope of level set image segmentation, the number of regions is fixed beforehand. this number occurs as a constant in the objective functional and its optimization. in this study, we propose a region merging prior which optimizes the objective functional implicitly with respect to the number of regions. a statistical interpretation of the functional and learning over a set of relevant images and segmentation examples allow setting the weight of this prior to obtain the correct number of regions. 
this method is investigated and validated with color images and motion maps. continuously tracking objects across multiple widely separated cameras. in this paper, we present a new solution to the problem of multi-camera tracking with non-overlapping fields of view. the identities of moving objects are maintained when they are traveling from one camera to another. appearance information and spatio-temporal information are explored and combined in a maximum a posteriori (map) framework. in computing appearance probability, a two-layered histogram representation is proposed to incorporate spatial information of objects. diffusion distance is employed to histogram matching to compensate for illumination changes and camera distortions. in deriving spatio-temporal probability, transition time distribution between each pair of entry zone and exit zone is modeled as a mixture of gaussian distributions. experimental results demonstrate the effectiveness of the proposed method. spatiotemporal oriented energy features for visual tracking. this paper presents a novel feature set for visual tracking that is derived from "oriented energies". more specifically, energy measures are used to capture a target's multiscale orientation structure across both space and time, yielding a rich description of its spatiotemporal characteristics. to illustrate utility with respect to a particular tracking mechanism, we show how to instantiate oriented energy features efficiently within the mean shift estimator. empirical evaluations of the resulting algorithm illustrate that it excels in certain important situations, such as tracking in clutter with multiple similarly colored objects and environments with changing illumination. many trackers fail when presented with these types of challenging video sequences. a cascade of feed-forward classifiers for fast pedestrian detection. we develop a method that can detect humans in a single image based on a new cascaded structure. in our approach, both the rectangle features and 1-d edge-orientation features are employed in the feature pool for weak-learner selection, which can be computed via the integral-image and the integral-histogram techniques, respectively. to make the weak learner more discriminative, real adaboost is used for feature selection and learning the stage classifiers from the training images. instead of the standard boosted cascade, a novel cascaded structure that exploits both the stage-wise classification information and the interstage cross-reference information is proposed. experimental results show that our approach can detect people with both efficiency and accuracy. synchronized ego-motion recovery of two face-to-face cameras. a movie captured by a wearable camera affixed to an actor's body gives audiences the sense of "immerse in the movie". the raw movie captured by wearable camera needs stabilization with jitters due to ego-motion. however, conventional approaches often fail in accurate ego-motion estimation when there are moving objects in the image and no sufficient feature pairs provided by background region. to address this problem, we proposed a new approach that utilizes an additional synchronized video captured by the camera attached on the foreground object (another actor). formally we configure above sensor system as two face-to-face moving cameras. then we derived the relations between four views including two consecutive views from each camera. the proposed solution has two steps. 
firstly we calibrate the extrinsic relationship of two cameras with an ax=xb formulation, and secondly estimate the motion using calibration matrix. experiments verify that this approach can recover from failures of conventional approach and provide acceptable stabilization results for real data. multiperspective distortion correction using collineations. we present a new framework for correcting multiperspective distortions using collineations. a collineation describes the transformation between the images of a camera due to changes in sampling and image plane selection. we show that image distortions in many previous models of cameras can be effectively reduced via proper collineations. to correct distortions in a specific multiperspective camera, we develop an interactive system that allows users to select feature rays from the camera and position them at the desirable pixels. our system then computes the optimal collineation to match the projections of these rays with the corresponding pixels. experiments demonstrate that our system robustly corrects complex distortions without acquiring the scene geometry, and the resulting images appear nearly undistorted. multi-camera people tracking by collaborative particle filters and principal axis-based integration. this paper presents a novel approach to tracking people in multiple cameras. a target is tracked not only in each camera but also in the ground plane by individual particle filters. these particle filters collaborate in two different ways. first, the particle filters in each camera pass messages to those in the ground plane where the multicamera information is integrated by intersecting the targets' principal axes. this largely relaxes the dependence on precise foot positions when mapping targets from images to the ground plane using homographies. secondly, the fusion results in the ground plane are then incorporated by each camera as boosted proposal functions. a mixture proposal function is composed for each tracker in a camera by combining an independent transition kernel and the boosted proposal function. experiments show that our approach achieves more reliable results using less computational resources than conventional methods. machine vision in early days: japan's pioneering contributions. the history of machine vision started in the mid-1960s by the efforts of japanese industry researchers. a variety of prominent vision-based systems was made possible by creating and evolving real-time image processing techniques, and was applied to factory automation, office automation, and even to social automation during the 1970-2000 period. in this article, these historical attempts are briefly explained to promote understanding of the pioneering efforts that opened the door and formed the bases of today's computer vision research. how marginal likelihood inference unifies entropy, correlation and snr-based stopping in nonlinear diffusion scale-spaces. iterative smoothing algorithms are frequently applied in image restoration tasks. the result depends crucially on the optimal stopping (scale selection) criteria. an attempt is made towards the unification of the two frequently applied model selection ideas: (i) the earliest time when the 'entropy of the signal' reaches its steady state, suggested by j. sporring and j. weickert (1999), and (ii) the time of the minimal 'correlation' between the diffusion outcome and the noise estimate, investigated by p. mrázek and m. navara (2003). 
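a compact sketch of the second criterion, stopping at the iterate whose correlation with the noise estimate is minimal; plain isotropic smoothing stands in for the nonlinear diffusion, and the step count is an assumption:

```python
import numpy as np

def smooth_once(u):
    """one step of simple neighbourhood averaging (stand-in for a diffusion step)."""
    return 0.2 * (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                  np.roll(u, 1, 1) + np.roll(u, -1, 1) + u)

def decorrelation_stopping(f, max_steps=100):
    """return the diffusion iterate whose correlation with the noise estimate f - u is minimal."""
    u = f.copy()
    best_u, best_corr = u, np.inf
    for _ in range(max_steps):
        u = smooth_once(u)
        corr = abs(np.corrcoef(u.ravel(), (f - u).ravel())[0, 1])
        if corr < best_corr:
            best_corr, best_u = corr, u.copy()
    return best_u, best_corr

# noisy test image: a smooth ramp plus gaussian noise
f = np.tile(np.linspace(0.0, 1.0, 64), (64, 1)) + 0.1 * np.random.randn(64, 64)
restored, corr_at_stop = decorrelation_stopping(f)
```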
it is shown that both ideas are particular cases of the marginal likelihood inference. better entropy measures are discovered and their connection to the generalized signal-to-noise ratio is emphasized. pedestrian detection using global-local motion patterns. we propose a novel learning strategy called global-local motion pattern classification (glmpc) to localize pedestrian-like motion patterns in videos. instead of modeling such patterns as a single class that alone can lead to high intra-class variability, three meaningful partitions are considered - left, right and frontal motion. an adaboost classifier based on the most discriminative eigenflow weak classifiers is learnt for each of these subsets separately. furthermore, a linear three-class svm classifier is trained to estimate the global motion direction. to detect pedestrians in a given image sequence, the candidate optical flow sub-windows are tested by estimating the global motion direction followed by feeding to the matched adaboost classifier. the comparison with two baseline algorithms including the degenerate case of a single motion class shows an improvement of 37% in false positive rate. flea, do you remember me? the ability to detect and recognize individuals is essential for an autonomous robot interacting with humans even if computational resources are usually rather limited. in general a small user group can be assumed for interaction. the robot has to distinguish between multiple users and further on between known and unknown persons. for solving this problem we propose an approach which integrates detection, recognition and tracking by formulating all tasks as binary classification problems. because of its efficiency it is well suited for robots or other systems with limited resources but nevertheless demonstrates robustness and comparable results to state-of-the-art approaches. we use a common over-complete representation which is shared by the different modules. by means of the integral data structure an efficient feature computation is performed enabling the usage of this system for real-time applications such as for our autonomous robot flea. optimal algorithms in multiview geometry. this is a survey paper summarizing recent research aimed at finding guaranteed optimal algorithms for solving problems in multiview geometry. many of the traditional problems in multiview geometry now have optimal solutions in terms of minimizing residual imageplane error. success has been achieved in minimizing l2 (least-squares) or l∞ (smallest maximum error) norm. the main methods involve second order cone programming, or quasi-convex optimization, and branch-andbound. the paper gives an overview of the subject while avoiding as far as possible the mathematical details, which can be found in the original papers. coarse-to-fine statistical shape model by bayesian inference. in this paper, we take a predefined geometry shape as a constraint for accurate shape alignment. a shape model is divided in two parts: fixed shape and active shape. the fixed shape is a user-predefined simple shape with only a few landmarks which can be easily and accurately located by machine or human. the active one is composed of many landmarks with complex shape contour. when searching an active shape, pose parameter is calculated by the fixed shape. bayesian inference is introduced to make the whole shape more robust to local noise generated by the active shape, which leads to a compensation factor and a smooth factor for a coarse-to-fine shape search. 
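as a minimal sketch of how the pose (a 2-d similarity transform) could be computed from the few fixed-shape landmarks by procrustes alignment (a generic least-squares construction, not necessarily the paper's exact estimator):

```python
import numpy as np

def similarity_from_landmarks(fixed_ref, fixed_obs):
    """least-squares scale s, rotation r (2x2) and translation t with fixed_obs ≈ s * r @ fixed_ref + t."""
    mu_r, mu_o = fixed_ref.mean(axis=0), fixed_obs.mean(axis=0)
    x, y = fixed_ref - mu_r, fixed_obs - mu_o
    u, s_vals, vt = np.linalg.svd(x.T @ y)                    # cross-covariance svd
    d = np.diag([1.0, np.sign(np.linalg.det(vt.T @ u.T))])    # guard against reflections
    r = vt.T @ d @ u.T
    s = np.trace(np.diag(s_vals) @ d) / (x ** 2).sum()
    t = mu_o - s * (r @ mu_r)
    return s, r, t

# the estimated pose can then place the (many-landmark) active shape in the image:
# active_init = s * (r @ model_points.T).T + t
```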
this method provides a simple and stable means for online and offline shape analysis. experiments on cheek and face contours demonstrate the effectiveness of our proposed approach. finding camera overlap in large surveillance networks. recent research on video surveillance across multiple cameras has typically focused on camera networks of the order of 10 cameras. in this paper we argue that existing systems do not scale to a network of hundreds, or thousands, of cameras. we describe the design and deployment of an algorithm called exclusion that is specifically aimed at finding correspondence between regions in cameras for large camera networks. the information recovered by exclusion can be used as the basis for other surveillance tasks such as tracking people through the network, or as an aid to human inspection. we have run this algorithm on a campus network of over 100 cameras, and report on its performance and accuracy over this network. multiview pedestrian detection based on vector boosting. in this paper, a multiview pedestrian detection method based on the vector boosting algorithm is presented. the extended histograms of oriented gradients (ehog) features are formed via dominant orientations, in which gradient orientations are quantized into several angle scales that divide the gradient orientation space into a number of dominant orientations. blocks of combined rectangles with their dominant orientations constitute the feature pool. the vector boosting algorithm is used to learn a tree-structured detector for multiview pedestrian detection based on ehog features. furthermore, a detector pyramid framework over several pedestrian scales is proposed for better performance. experimental results are reported to show its high performance. a noise-insensitive object tracking algorithm. in this paper, we present a noise-insensitive pixel-wise object tracking algorithm whose kernel is a new reliable data grouping algorithm that introduces a reliability evaluation into the existing k-means clustering (called rk-means clustering). rk-means clustering addresses two problems of the existing k-means clustering algorithm: 1) unreliable clustering results when noise data exist; 2) incorrect clustering results caused by a wrongly assumed number of clusters. the first problem is solved by evaluating the reliability of classifying an unknown data vector according to the triangular relationship between it and its two nearest cluster centers. noise data are ignored by being assigned low reliability. the second problem is solved by introducing a new group merging method that identifies pairs of "too near" data groups by checking their variance and average reliability, and then combines them. we developed a video-rate object tracking system (called the rk-means tracker) with the proposed algorithm. extensive experiments on tracking various objects in cluttered environments confirmed its effectiveness and advantages. calibrating pan-tilt cameras with telephoto lenses. pan-tilt cameras are widely used in surveillance networks. these cameras are often equipped with telephoto lenses to capture objects at a distance. such a camera makes full-metric calibration more difficult since the projection with a telephoto lens is close to orthographic. this paper discusses the problems caused by pan-tilt cameras with long focal lengths and presents a method to improve the calibration accuracy.
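a tiny numeric illustration of why a long focal length pushes the projection towards orthographic, and hence weakens the calibration constraints: the relative variation of the perspective scale f/z across the scene's depth range shrinks when the scene is far away, as it must be for a telephoto lens (all numbers below are arbitrary assumed values):

```python
# a scene about 2 m deep, framed to a similar image size in both setups
scene_depth = 2.0
setups = {"wide lens,  z = 5 m,  f = 8 mm":  (5.0, 0.008),
          "telephoto,  z = 50 m, f = 80 mm": (50.0, 0.080)}

for name, (z, f) in setups.items():
    scale_near = f / (z - scene_depth / 2)      # image scale of the nearest scene point
    scale_far = f / (z + scene_depth / 2)       # image scale of the farthest scene point
    variation = scale_near / scale_far - 1.0
    print(f"{name}: perspective scale variation across the scene = {variation:.1%}")
# the telephoto case shows only a few percent of scale variation, i.e. the
# projection is nearly orthographic and the focal length is weakly constrained.
```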
experiments show that our method reduces the re-projection errors by an order of magnitude compared to popular homography-based approaches. camera calibration from silhouettes under incomplete circular motion with a constant interval angle. in this paper, we propose an algorithm for camera calibration from silhouettes under circular motion with an unknown constant interval angle. unlike previous silhouette-based methods that rely on surfaces of revolution, the proposed algorithm can be applied to sparse and incomplete image sequences. under the assumption of circular motion with a constant interval angle, the epipoles of successive image pairs remain constant and can be determined from silhouettes. a pair of epipoles formed by a certain interval angle provides a constraint on the angle and the focal length. with more pairs of epipoles recovered, the focal length can be determined from the pair that best satisfies the constraints, and the interval angle can be determined concurrently. the rest of the camera parameters can be recovered from image invariants. finally, the estimated parameters are optimized by minimizing the epipolar tangency constraints. experimental results on both synthetic and real images are shown to demonstrate its performance. hand posture estimation in complex backgrounds by considering mis-match of model. this paper proposes a novel method for estimating 3-d hand posture from images observed in complex backgrounds. conventional methods often make mistakes because of mis-matches of local image features. our method considers the possibility of a mis-match between each posture model appearance and the other model appearances in a bayesian stochastic estimation framework by introducing a novel likelihood concept, the "mistakenly matching likelihood (mml)". the correct posture model is discriminated from mis-matches by mml-based posture candidate evaluation. the method is applied to the hand tracking problem in complex backgrounds and its effectiveness is shown. learning generative models for monocular body pose estimation. we consider the problem of monocular 3d body pose tracking from video sequences. this task is inherently ambiguous. we propose to learn a generative model of the relationship between body pose and image appearance using a sparse kernel regressor. within a particle filtering framework, the potentially multimodal posterior probability distributions can then be inferred. the 2d bounding box location of the person in the image is estimated along with its body pose. body poses are modelled on a low-dimensional manifold, obtained by lle dimensionality reduction. in addition to the appearance model, we learn a prior model of likely body poses and a nonlinear dynamical model, making both pose and bounding box estimation more robust. the approach is evaluated on a number of challenging video sequences, showing the ability of the approach to deal with low-resolution images and noise. a theoretical approach to construct highly discriminative features with application in adaboost. adaboost is a practical method for real-time face detection, but it suffers from a crucial overfitting problem because of the large number of features used in a trained classifier, a consequence of the weak discriminative ability of these features. this paper proposes a theoretical approach to construct highly discriminative features, named composed features, from haar-like features. both the composed and the haar-like features are employed to train a multi-view face detector.
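as a hedged sketch of the haar-like building blocks from which such composed features are assembled (a generic two-rectangle feature evaluated on an integral image; the composition rule itself is not reproduced here, and the patch size and feature placement are assumed):

```python
import numpy as np

def integral_image(img):
    """summed-area table with a zero row and column prepended for easy box sums."""
    ii = np.cumsum(np.cumsum(img.astype(np.float64), axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def box_sum(ii, y, x, h, w):
    """sum of pixels in the rectangle with top-left corner (y, x) and size (h, w)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect_vertical(ii, y, x, h, w):
    """two-rectangle haar-like feature: left half minus right half."""
    half = w // 2
    return box_sum(ii, y, x, h, half) - box_sum(ii, y, x + half, h, half)

# usage on a toy 24x24 patch
patch = np.random.rand(24, 24)
ii = integral_image(patch)
response = haar_two_rect_vertical(ii, y=4, x=4, h=12, w=12)
```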
preliminary experiments show promising results in reducing the number of features used in a classifier, which increases the generalization ability of the classifier. 3d intrusion detection system with uncalibrated multiple cameras. in this paper, we propose a practical intrusion detection system using uncalibrated multiple cameras. our algorithm combines the contour-based multi-planar visual hull method and a projective reconstruction method. to set up the detection system, no advance knowledge or calibration is necessary. a user can specify points in the scene directly with a simple colored marker, and the system automatically generates a restricted area as the convex hull of all specified points. to detect an intrusion, the system computes the intersections of an object and each sensitive plane, which is the boundary of the restricted area, by projecting the object silhouette from each image to the sensitive plane using a 2d homography. when an object crosses a sensitive plane, the projected silhouettes from all cameras must have some common region. therefore, the system can detect intrusion by any object of arbitrary shape without reconstructing the 3d shape of the object. optical flow-driven motion model with automatic variance adjustment for adaptive tracking. we propose a statistical motion model for sequential bayesian tracking, called the optical flow-driven motion model, and present an adaptive particle filter algorithm based on this motion model. it predicts the current state with the help of optical flow, i.e., it explores the state space with information based on the current and previous images of an image sequence. in addition, we introduce an automatic method for adjusting the variance of the motion model, a parameter that is manually determined in most particle filters. in experiments with synthetic and real image sequences, we compare the proposed motion model with a random walk model, which is a widely used model for tracking, and show that the proposed model outperforms the random walk model in terms of accuracy even though their execution times are almost the same. gesture recognition under small sample size. this paper addresses gesture recognition under small sample size, where direct use of traditional classifiers is difficult due to the high dimensionality of the input space. we propose a pairwise feature extraction method of video volumes for classification. the method of canonical correlation analysis is combined with discriminant functions and the scale-invariant feature transform (sift) to obtain discriminative spatiotemporal features for robust gesture recognition. the proposed method is practically favorable as it works well with a small number of training samples, involves few parameters, and is computationally efficient. in experiments using 900 videos of 9 hand gesture classes, the proposed method notably outperformed classifiers such as the support vector machine and relevance vector machine, achieving 85% accuracy. robust foreground extraction technique using gaussian family model and multiple thresholds. we propose a robust method to extract silhouettes of foreground objects from color video sequences. to cope with various changes in the background, the background is modeled as a generalized gaussian family of distributions and updated by a selective running average and static pixel observation.
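a simplified sketch of a selective running-average background model with multi-threshold labelling of the kind described here (grayscale frames are assumed, the generalized-gaussian shape parameter is omitted, and all thresholds are assumed values):

```python
import numpy as np

class SelectiveBackground:
    """running-average background model updated only at pixels judged static."""

    def __init__(self, first_frame, alpha=0.02):
        self.mean = first_frame.astype(np.float64)   # grayscale background estimate
        self.alpha = alpha                           # learning rate of the running average

    def classify(self, frame, t_low=10.0, t_mid=25.0, t_high=50.0):
        """label pixels into four initial regions by thresholding |frame - background|."""
        diff = np.abs(frame.astype(np.float64) - self.mean)
        labels = np.zeros(frame.shape, dtype=np.uint8)   # 0: background
        labels[diff > t_low] = 1                         # 1: uncertain / shadow candidate
        labels[diff > t_mid] = 2                         # 2: foreground candidate
        labels[diff > t_high] = 3                        # 3: confident foreground
        return labels

    def update(self, frame, labels):
        """selective update: only background-labelled pixels refresh the model."""
        static = labels == 0
        self.mean[static] = ((1.0 - self.alpha) * self.mean[static]
                             + self.alpha * frame.astype(np.float64)[static])
```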
all pixels in the input video image are classified into four initial regions using background subtraction with multiple thresholds, after which shadow regions are eliminated using color components. the final foreground silhouette is extracted by refining the initial regions using morphological processes. through experiments, we have verified that the proposed algorithm works very well in various background and foreground situations. detecting and segmenting un-occluded items by actively casting shadows. we present a simple and practical approach for segmenting un-occluded items in a scene by actively casting shadows. by 'items', we refer to objects (or parts of objects) enclosed by depth edges. our approach utilizes the fact that under varying illumination, un-occluded items will cast shadows on occluded items or the background, but will not be shadowed themselves. we employ an active illumination approach by taking multiple images under different illumination directions, with the illumination source close to the camera. our approach ignores the texture edges in the scene and uses only the shadow and silhouette information to determine the occlusions. we show that such a segmentation does not require the estimation of a depth map or 3d information, which can be cumbersome and expensive and often fails due to the lack of texture and the presence of specular objects in the scene. our approach can handle complex scenes with self-shadows and specularities. results on several real scenes, along with an analysis of failure cases, are presented. qualitative and quantitative behaviour of geometrical pdes in image processing. we analyse a series of approaches for evolving images, motivated by combining gaussian blurring, mean curvature motion (used for denoising and edge preservation), and maximal blurring (used for inpainting). we investigate the generalised method using combinations of second order derivatives in terms of gauge coordinates. for the qualitative behaviour, we derive a solution of the pde series and briefly mention its properties. relations with general diffusion equations are discussed. quantitative results are obtained by a novel implementation whose stability and convergence are analysed. the practical results are visualised on a real-life image, showing the expected qualitative behaviour. when a constraint is added that penalises the distance of the results to the input image, one can vary the desired amount of blurring and denoising. pose-invariant facial expression recognition using variable-intensity templates. in this paper, we propose a method for pose-invariant facial expression recognition from monocular video sequences. the advantage of our method is that, unlike existing methods, it uses a simple model, called the variable-intensity template, to describe different facial expressions. this makes it possible to prepare a model for each person with very little time and effort. variable-intensity templates describe how the intensities of multiple points, defined in the vicinity of facial parts, vary with different facial expressions. by using this model in the framework of a particle filter, our method is capable of estimating facial poses and expressions simultaneously. experiments demonstrate the effectiveness of our method. a recognition rate of over 90% is achieved for all facial orientations, horizontal, vertical, and in-plane, in the range of ±40 degrees, ±20 degrees, and ±40 degrees from the frontal view, respectively. efficient search in document image collections.
this paper presents an efficient indexing and retrieval scheme for searching in document image databases. in many non-european languages, optical character recognizers are not very accurate. word spotting - word image matching - may instead be used to retrieve word images in response to a word image query. the approaches used for word spotting so far, dynamic time warping and/or nearest neighbor search, tend to be slow. here, indexing is done using locality sensitive hashing (lsh) - a technique that computes multiple hashes - on word image features computed at the word level. efficiency and scalability are achieved by content-sensitive hashing implemented through approximate nearest neighbor computation. we demonstrate that the technique achieves high precision and recall (in the 90% range), using a large image corpus consisting of seven books by kalidasa (a well-known indian poet of antiquity) in the telugu language. the accuracy is comparable to using dynamic time warping and nearest neighbor search, while the speed is orders of magnitude better - 20000 word images can be searched in milliseconds. texture-independent feature-point matching (tifm) from motion coherence. this paper proposes a novel and efficient feature-point matching algorithm for finding point correspondences between two uncalibrated images. the striking feature of the proposed algorithm is that it is based only on the motion coherence/smoothness constraint, which states that neighboring features in an image tend to move coherently. in the algorithm, the correspondences of feature points in a neighborhood are collectively determined in such a way that the smoothness of the local motion field is maximized. the smoothness constraint does not rely on any image feature, and is self-contained in the motion field. it is robust to camera motion, scene structure, illumination, etc. this makes the proposed algorithm texture-independent and robust. experimental results show that the proposed method outperforms existing methods for feature-point tracking in image sequences. mapaco-training: a novel online learning algorithm of behavior models. the traditional co-training algorithm, which needs a great number of unlabeled examples in advance and then trains classifiers by an iterative learning approach, is not suitable for online learning of classifiers. to overcome this barrier, we propose a novel semi-supervised learning algorithm, called mapaco-training, by combining co-training with the principle of maximum a posteriori adaptation. the mapaco-training algorithm is an online multi-class learning algorithm, and has been successfully applied to online learning of behaviors modeled by hidden markov models. the proposed algorithm is tested on li's database as well as schuldt's dataset. cardiac motion estimation from tagged mri using 3d-harp and nurbs volumetric model. for the analysis of tagged cardiac mr images, harmonic phase (harp) is a promising technique with great potential for clinical use in terms of rapidity and automation, as it requires no tag detection or tracking. however, it is usually applied to 2d images and only provides "apparent motion" information. in this paper, harp is integrated with a nonuniform rational b-spline (nurbs) volumetric model to densely reconstruct the 3d motion of the left ventricle (lv). the nurbs model compactly represents the anatomy of the lv, and the displacement information that harp provides within short-axis and long-axis images drives the model to deform.
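as background for the nurbs machinery used here, the sketch below evaluates plain (non-rational) b-spline basis functions with the cox-de boor recursion; the knot vector and degree are illustrative assumptions:

import numpy as np

def bspline_basis(i, p, u, knots):
    """cox-de boor recursion: value of the i-th degree-p b-spline basis at parameter u."""
    if p == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left, right = 0.0, 0.0
    if knots[i + p] > knots[i]:
        left = (u - knots[i]) / (knots[i + p] - knots[i]) * bspline_basis(i, p - 1, u, knots)
    if knots[i + p + 1] > knots[i + 1]:
        right = ((knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1])
                 * bspline_basis(i + 1, p - 1, u, knots))
    return left + right

# cubic basis functions on an illustrative clamped knot vector
knots = [0, 0, 0, 0, 1, 2, 3, 3, 3, 3]
values = [bspline_basis(i, 3, 1.5, knots) for i in range(len(knots) - 4)]
print(values, sum(values))   # partition of unity: the values sum to 1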
after estimating the motion at each phase, we smooth the nurbs models temporally to achieve a 4d continuous time-varying representation of lv motion. experimental results on in vivo data show that the proposed strategy can estimate the 3d motion of the lv rapidly and effectively, benefiting from both harp and the nurbs model. task scheduling in large camera networks. camera networks are increasingly being deployed for security. in most of these camera networks, video sequences are captured, transmitted and archived continuously from all cameras, creating enormous stress on available transmission bandwidth, storage space and computing facilities. we describe an intelligent control system for scheduling pan-tilt-zoom cameras to capture video only when task-specific requirements can be satisfied. these videos are collected in real time during predicted temporal "windows of opportunity". we present a scalable algorithm that constructs schedules in which multiple tasks can possibly be satisfied simultaneously by a given camera. we describe two scheduling algorithms: a greedy algorithm and another based on dynamic programming (dp). we analyze their approximation factors and present simulations showing that the dp method is advantageous for large camera networks in terms of task coverage. results from a prototype real-time active camera system, however, reveal that the greedy algorithm runs faster than the dp algorithm, making it more suitable for a real-time system. the prototype system, built using existing low-level vision algorithms, also illustrates the applicability of our algorithms. a local probabilistic prior-based active contour model for brain mr image segmentation. this paper proposes a probabilistic prior-based active contour model for segmenting human brain mr images. our model is formulated with the maximum a posteriori (map) principle and implemented under the level set framework. a probabilistic atlas of the structure of interest, e.g., cortical gray matter or the caudate nucleus, can be seamlessly integrated into the level set evolution procedure to provide crucial guidance for accurately capturing the target. unlike other region-based active contour models, our solution uses locally varying gaussians to account for intensity inhomogeneity, so the local variations present in many mr images are better handled. experiments conducted on whole-brain as well as caudate segmentation demonstrate the improvement made by our model. an active multi-camera motion capture for face, fingers and whole body. this paper explores a novel endeavor of deploying only four active-tracking cameras and fundamental vision-based technologies for 3d motion capture of a full human body figure, including facial expression, the motion of the fingers of both hands, and the whole body. the proposed methods suggest alternatives for extracting the motion parameters of these body parts from four single-view image sequences. the proposed ellipsoidal model- and flow-based facial expression motion capture solution tackles both 3d head pose and non-rigid facial motion effectively, and we observe that a set of 22 self-defined feature points suffices for the expression representation. body and finger motion capture is solved with a combination of articulated-model and flow-based methods. exploiting inter-frame correlation for fast video to reference image alignment. the strong temporal correlation between adjacent frames of a video signal has been successfully exploited in standard video compression algorithms.
in this work, we show that the temporal correlation in a video signal can also be used for fast video to reference image alignment. to this end, we first divide the input video sequence into groups of pictures (gops). then, for each gop, only one frame is completely correlated with the reference image, while for the remaining frames, upper and lower bounds on the correlation coefficient (ρ) are calculated. these newly proposed bounds are significantly tighter than the existing cauchy-schwarz inequality based bounds on ρ. these bounds are used to eliminate the majority of the search locations, resulting in a significant speedup without affecting the value or location of the global maximum. in our experiments, up to 80% of the search locations are eliminated, and the speedup is up to five times that of an fft-based implementation and up to seven times that of spatial domain techniques. a family of quadratic snakes for road extraction. the geographic information system industry would benefit from flexible automated systems capable of extracting linear structures from satellite imagery. quadratic snakes allow global interactions between points along a contour, and are well suited to segmentation of linear structures such as roads. however, a single quadratic snake is unable to extract disconnected road networks and enclosed regions. we propose to use a family of cooperating snakes, which are able to split, merge, and disappear as necessary. we also propose a preprocessing method based on oriented filtering, thresholding, canny edge detection, and gradient vector flow (gvf) energy. we evaluate the performance of the method in terms of precision and recall in comparison to ground truth data. the family of cooperating snakes consistently outperforms a single snake in a variety of road extraction tasks, and our method for obtaining the gvf is more suitable for road extraction tasks than standard methods. automated removal of partial occlusion blur. this paper presents a novel, automated method to remove partial occlusion from a single image. in particular, we are concerned with occlusions resulting from objects that fall on or near the lens during exposure. for each such foreground object, we segment the completely occluded region using a geometric flow. we then look outward from the region of complete occlusion at the segmentation boundary to estimate the width of the partially occluded region. once the area of complete occlusion and the width of the partially occluded region are known, the contribution of the foreground object can be removed. we present experimental results which demonstrate the ability of this method to remove partial occlusion with minimal user interaction. the result is an image with improved visibility in partially occluded regions, which may convey important information or simply improve the image's aesthetics. hierarchical learning of dominant constellations for object class recognition. the importance of spatial configuration information for object class recognition is widely recognized. single isolated local appearance codes are often ambiguous. on the other hand, object classes are often characterized by groups of local features appearing in a specific spatial structure. learning these structures can provide additional discriminant cues and boost recognition performance. however, the problem of learning such features automatically from raw images remains largely uninvestigated.
in contrast to previous approaches, which require accurate localization and segmentation of objects to learn spatial information, we propose learning by hierarchical voting to identify frequently occurring spatial relationships among local features directly from raw images. the method is resistant to common geometric perturbations in both the training and test data. we describe a novel representation developed to this end and present experimental results that validate its efficacy by demonstrating the improvement in class recognition results realized by including the additional learned information. multistrategical approach in visual learning. in this paper, we propose a novel visual learning framework to develop flexible and accurate object recognition methods. currently, most visual-learning-based recognition methods adopt a monostrategy learning framework using a single feature. however, real-world objects are so complex that it is quite difficult for a monostrategy method to classify them correctly. thus, utilizing a wide variety of features is required to distinguish them precisely. in order to utilize various features, we propose multistrategical visual learning by integrating multiple visual learners. in our method, multiple visual learners are collaboratively trained. specifically, a visual learner l intensively learns the examples misclassified by the other visual learners. in turn, the other visual learners learn the examples misclassified by l. as a result, a powerful object recognition method can be developed by integrating various visual learners even if each of them has mediocre recognition performance. attention monitoring for music contents based on analysis of signal-behavior structures. in this paper, we propose a method to estimate user attention to displayed content signals through temporal analysis of the user's exhibited behavior. detecting user attention and controlling content are key issues in our "networked interaction therapy system", which effectively attracts the attention of memory-impaired people. in our proposed method, user behavior, including body motions (beat actions), is detected with auditory/vision-based methods. this design is based on our observations of the behavior of memory-impaired people under video watching conditions. user attention to the displayed content is then estimated based on body motions synchronized to auditory signals. estimated attention levels can be used for content control to attract deeper attention of viewers to the display system. experimental results suggest that the proposed method effectively extracts user attention to musical signals. eye-gaze detection from monocular camera image using parametric template matching. in the coming ubiquitous-computing society, an eye-gaze interface will be one of the key input technologies. most conventional eye-gaze tracking algorithms require specific light sources, equipment, devices, etc. in a previous work, the authors developed a simple eye-gaze detection system using a monocular video camera. this paper proposes a fast eye-gaze detection algorithm using parametric template matching. in our algorithm, iris extraction by parametric template matching is applied to eye-gaze detection based on a physiological eyeball model. parametric template matching can carry out accurate sub-pixel matching by interpolating a few template images of a user's eye captured during calibration to account for personal error.
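a rough sketch of template matching with sub-pixel refinement, in the spirit of the matching just described; the ssd score and the parabola fit below are generic stand-ins for the parametric template interpolation of the paper:

import numpy as np

def match_scores(image, template, y):
    """ssd of the template against every horizontal position on row y (illustrative 1-d search)."""
    h, w = template.shape
    scores = []
    for x in range(image.shape[1] - w + 1):
        patch = image[y:y + h, x:x + w]
        scores.append(np.sum((patch - template) ** 2))
    return np.array(scores)

def subpixel_minimum(scores):
    """parabola fit through the best score and its two neighbours gives a sub-pixel offset."""
    i = int(np.argmin(scores))
    if i == 0 or i == len(scores) - 1:
        return float(i)
    s_l, s_c, s_r = scores[i - 1], scores[i], scores[i + 1]
    denom = s_l - 2.0 * s_c + s_r
    offset = 0.5 * (s_l - s_r) / denom if denom != 0 else 0.0
    return i + offset

# toy usage: locate a small template in a synthetic image
image = np.random.rand(40, 80)
template = image[10:18, 30:42].copy()
print(subpixel_minimum(match_scores(image, template, 10)))   # close to 30.0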
thus, fast calculation can be realized while maintaining detection accuracy. we constructed an eye-gaze communication interface using the proposed algorithm, and verified its performance through key-typing experiments using a visual keyboard on a display. improved background mixture models for video surveillance applications. background subtraction is a method commonly used to segment objects of interest in image sequences. by comparing new frames to a background model, regions of interest can be found. to cope with highly dynamic and complex environments, a mixture of several models has been proposed. this paper proposes an update of the popular mixture of gaussians technique. experimental analysis shows that this technique fails to cope with quick illumination changes. a different matching mechanism is proposed to improve the general robustness, and a comparison with related work is given. finally, experimental results are presented to show the gain of the updated technique compared with the standard scheme and related techniques. fragments based parametric tracking. the paper proposes a parametric approach for color-based tracking. the method fragments a multimodal color object into multiple homogeneous, unimodal fragments. the fragmentation process consists of multi-level thresholding of the object color space followed by an assembling step. each homogeneous region is then modelled using a single parametric distribution, and tracking is achieved by fusing the results of the multiple parametric distributions. the advantage of the method lies in tracking complex objects with partial occlusions and various deformations, such as non-rigid deformation and orientation and scale changes. we evaluate the performance of the proposed approach on standard and challenging real-world datasets. less is more: coded computational photography. computational photography combines plentiful computing, digital sensors, modern optics, actuators, and smart lights to escape the limitations of traditional cameras, enabling novel imaging applications and simplifying many computer vision tasks. however, a majority of current computational photography methods involve taking multiple sequential photos by changing scene parameters and fusing the photos to create a richer representation. the goal of coded computational photography is to modify the optics, illumination or sensors at the time of capture so that the scene properties are encoded in a single (or a few) photographs. we describe several applications of coding exposure, aperture, illumination and sensing, and describe emerging techniques to recover scene parameters from coded photographs. tracking and classifying of human motions with gaussian process annealed particle filter. this paper presents a framework for 3d articulated human body tracking and action classification. the method is based on nonlinear dimensionality reduction of the high-dimensional data space to a low-dimensional latent space. the motion of the human body is described by a concatenation of low-dimensional manifolds which characterize different motion types. we introduce a body pose tracker, which uses the learned mapping function from the low-dimensional latent space to the high-dimensional body pose space. the trajectories in the latent space provide low-dimensional representations of the body poses performed during motion. they are used to classify human actions. the approach was evaluated on the humaneva dataset as well as on our own. results and a comparison to other methods are presented.
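a minimal sketch of particle-filter tracking in a learned low-dimensional latent space, the general mechanism such trackers rely on; the latent-to-pose mapping and the likelihood below are placeholders, not the learned models of the paper:

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 30))   # stands in for a learned latent-to-pose regressor

def latent_to_pose(z):
    """placeholder for the learned mapping from latent space to full body pose."""
    return np.tanh(z @ W)   # 2-d latent -> 30-d pose (illustrative dimensions)

def likelihood(pose, observation):
    """placeholder image likelihood; a real tracker would compare silhouettes or edges."""
    return np.exp(-0.5 * np.sum((pose - observation) ** 2))

def particle_filter_step(particles, observation, sigma=0.1):
    # propagate particles in the low-dimensional latent space
    particles = particles + sigma * rng.standard_normal(particles.shape)
    # weight each particle by the likelihood of its mapped pose
    weights = np.array([likelihood(latent_to_pose(z[None, :])[0], observation)
                        for z in particles])
    weights = weights / (weights.sum() + 1e-12)
    # resample according to the weights
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

particles = rng.standard_normal((200, 2))
observation = latent_to_pose(np.zeros((1, 2)))[0]
particles = particle_filter_step(particles, observation)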
mirror localization for catadioptric imaging system by observing parallel light pairs. this paper describes a method of mirror localization to calibrate a catadioptric imaging system. while the calibration of a catadioptric system includes the estimation of various parameters, we focus on the localization of the mirror. the proposed method estimates the position of the mirror by observing pairs of parallel lights projected from various directions. although some earlier methods for calibrating catadioptric systems assume that the system has a single viewpoint, which is a strong restriction on the position and shape of the mirror, our method places no restriction on the position and shape of the mirror. since the constraint used by the proposed method is that the relative angle of two parallel lights is constant with respect to the rigid transformation of the imaging system, we can omit both the translation and rotation between the camera and the calibration objects from the parameters to be estimated. therefore, the estimation of the mirror position by the proposed method is independent of the extrinsic parameters of the camera. we compute the error between the model of the mirror and the measurements, and then estimate the position of the mirror by minimizing this error. we test our method using both simulations and real experiments, and evaluate its accuracy. automated billboard insertion in video. the paper proposes an approach to superimpose virtual content for advertising onto an existing image sequence with no or minimal user interaction. our approach automatically recognizes planar surfaces in the scene over which a billboard can be inserted for seamless display to the viewers. the planar surfaces are segmented in the image frame using a homography-dependent scheme. in each of the segmented planar regions, the rectangle with the largest area is located to superimpose a billboard into the original image sequence. the approach can also provide a viewing index, based on the occupancy of the virtual real estate, for charging the advertiser. gait identification based on multi-view observations using omnidirectional camera. we propose a method of gait identification based on multi-view gait images using an omnidirectional camera. we first transform omnidirectional silhouette images into panoramic ones and obtain a spatio-temporal gait silhouette volume (gsv). next, we extract frequency-domain features by fourier analysis based on gait periods estimated by autocorrelation of the gsvs. because the omnidirectional camera makes it possible to observe a straight-walking person from various views, multi-view features can be extracted from the gsvs composed of multi-view images. in the identification phase, the distance between a probe and a gallery feature of the same view is calculated, and the distances for all views are then integrated for matching. experiments on gait identification with 15 subjects and 5 views demonstrate the effectiveness of the proposed method. high dynamic range scene realization using two complementary images. many existing tone reproduction schemes are based on the use of a single high dynamic range (hdr) image and are therefore unable to accurately recover the local details and colors of the scene due to the limited information available. accordingly, the current study develops a novel tone reproduction system which utilizes two images with different exposures to capture both the local details and color information of the low- and high-luminance regions of a scene.
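a much simpler two-exposure blend than the system described here, included only to illustrate the idea of drawing information from both a low and a high exposure; the well-exposedness weighting is an assumption of this sketch:

import numpy as np

def fuse_exposures(low_exp, high_exp):
    """blend two registered exposures with weights favouring well-exposed pixels.

    low_exp, high_exp: float rgb images in [0, 1].
    """
    def well_exposedness(img):
        lum = img.mean(axis=2)
        return np.exp(-((lum - 0.5) ** 2) / (2 * 0.2 ** 2)) + 1e-6

    w_low = well_exposedness(low_exp)
    w_high = well_exposedness(high_exp)
    total = w_low + w_high
    return (low_exp * (w_low / total)[..., None]
            + high_exp * (w_high / total)[..., None])

# toy usage with synthetic exposures of the same scene
scene = np.random.rand(60, 80, 3)
fused = fuse_exposures(np.clip(scene * 0.4, 0, 1), np.clip(scene * 1.6, 0, 1))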
by computing the local region of each pixel, whose radius is determined via an iterative morphological erosion process, the proposed system implements a pixel-wise local tone mapping module which compresses the luminance range and enhances the local contrast in the low-exposure image. a local color mapping module is then applied to capture precise color information from the high-exposure image. finally, a fusion process combines the local tone mapping and color mapping results to generate highly realistic reproductions of hdr scenes. human pose estimation from volume data and topological graph database. this paper proposes a novel volume-based motion capture method using a bottom-up analysis of volume data and an example topology database of the human body. by using a two-step graph matching algorithm with many example topological graphs corresponding to postures that a human body can take, the proposed method does not require any initial parameters or iterative convergence processes, and it can solve the changing-topology problem of the human body. first, three-dimensional curved lines (the skeleton) are extracted from the captured volume data using a thinning process. the skeleton is then converted into an attributed graph. by using a graph matching algorithm with a large amount of example data, we can identify the body parts from each curved line in the skeleton. the proposed method is evaluated using several video sequences of a single person and multiple people, and the results confirm the validity of our approach. non-parametric background and shadow modeling for object detection. we propose a fast algorithm to estimate background models using parzen density estimation in non-stationary scenes. each pixel has a probability density which approximates the pixel values observed in a video sequence. the probability density function must be estimated quickly and accurately. in our approach, the probability density function is partially updated within the range of the window function based on the observed pixel value. the model adapts quickly to changes in the scene, and foreground objects can be robustly detected. in addition, applying our approach to cast-shadow modeling, we can detect moving cast shadows. several experiments show the effectiveness of our approach. on-line ensemble svm for robust object tracking. in this paper, we present a novel visual object tracking algorithm based on an ensemble of linear svm classifiers. there are two main contributions in this paper. first, we propose a simple yet effective way of updating a linear svm classifier on-line, where useful "key frames" of the target are automatically selected as support vectors. second, we propose an on-line ensemble svm tracker, which can effectively handle target appearance variation. the proposed algorithm makes better use of history information, which leads to better discrimination between the target and the surrounding background. the proposed algorithm is tested on many video clips, including some publicly available ones. experimental results show the robustness of our proposed algorithm, especially under large appearance changes during tracking. road sign detection using eigen color. this paper presents a novel color-based method to detect road signs directly from videos. a road sign usually has specific colors and high contrast to its background. traditional color-based approaches need to train different color detectors for detecting road signs if their colors are different.
this paper presents a novel color model derived from the karhunen-loeve (kl) transform to detect road sign color pixels from the background. the proposed color transform model is invariant to different perspective effects and occlusions. furthermore, only one color model is needed to detect various road signs. after transformation into the proposed color space, an rbf (radial basis function) network is trained to find all possible road sign candidates. then, a verification process is applied to these candidates according to their edge maps. due to the filtering effect and discriminative ability of the proposed color model, different road signs can be detected from videos very efficiently. experimental results show that the proposed method is robust, accurate, and powerful for road sign detection. logical dp matching for detecting similar subsequence. a logical dynamic programming (dp) matching algorithm is proposed for extracting similar subpatterns from two sequential patterns. in the proposed algorithm, the local similarity between two patterns is measured by a logical function, called support. dp matching with the support function can extract all similar subpatterns simultaneously while compensating for nonlinear fluctuation. the performance of the proposed algorithm was evaluated qualitatively and quantitatively via an experiment on extracting motion primitives, i.e., common subpatterns in gesture patterns of different classes. multi-view gymnastic activity recognition with fused hmm. more and more researchers are focusing on multi-view activity recognition, because a fixed view cannot provide enough information for recognition. in this paper, we use multi-view features to recognize six kinds of gymnastic activities. firstly, shape-based features are extracted from two orthogonal cameras in the form of the r transform. then a multi-view approach based on a fused hmm is proposed to combine different features for similar gymnastic activity recognition. compared with other activity models, our method achieves better performance even in the case of frame loss. adaboost learning for human detection based on histograms of oriented gradients. we developed a novel learning-based human detection system, which can detect people of different sizes and orientations against a wide variety of backgrounds, even in crowds. to overcome the effects of geometric and rotational variations, the system automatically assigns the dominant orientation of each block-based feature encoding by using rectangular- and circular-type histograms of oriented gradients (hog), which are insensitive to the varied lighting and noise of outdoor environments. moreover, this work demonstrates that gaussian weighting and tri-linear interpolation for hog feature construction can increase detection performance. in particular, a powerful feature selection algorithm, adaboost, is performed to automatically select a small set of discriminative hog features with orientation information in order to achieve robust detection results. the overall computational time is further reduced significantly, without any performance loss, by using a cascade-of-rejecters structure, in which the hyperplanes and weights of each stage are estimated using the adaboost approach. object detection combining recognition and segmentation. we develop an object detection method combining top-down recognition with bottom-up image segmentation. there are two main steps in this method: a hypothesis generation step and a verification step.
in the top-down hypothesis generation step, we design an improved shape context feature, which is more robust to object deformation and background clutter. the improved shape context is used to generate a set of hypotheses of object locations and figure-ground masks, which have high recall but low precision. in the verification step, we first compute a set of feasible segmentations that are consistent with the top-down object hypotheses, and then apply a false positive pruning (fpp) procedure to prune out false positives. we exploit the fact that false positive regions typically do not align with any feasible image segmentation. experiments show that this simple framework is capable of achieving both high recall and high precision with only a few positive training examples, and that this method can be generalized to many object classes. discriminative mean shift tracking with auxiliary particles. we present a new approach towards efficient and robust tracking by combining the efficiency of the mean shift algorithm with the robustness of particle filtering. the mean shift tracking algorithm is robust and effective when the representation of a target is sufficiently discriminative, the target does not jump beyond the bandwidth, and no serious distractions exist. in the case of sudden motion, particle filtering outperforms the mean shift algorithm, at the expense of using a large particle set. in our approach, the mean shift algorithm is used as long as it provides reasonable performance. auxiliary particles are introduced to overcome distraction and sudden motion when such threats are detected. moreover, discriminative features are selected according to the separation of the foreground and background distributions. we demonstrate the performance of our approach by comparing it with other trackers on challenging image sequences. efficient normalized cross correlation based on adaptive multilevel successive elimination. in this paper, we propose an efficient normalized cross correlation (ncc) algorithm for pattern matching based on adaptive multilevel successive elimination. this successive elimination scheme is applied in conjunction with an upper bound for the cross correlation derived from the cauchy-schwarz inequality. to apply the successive elimination, we partition the summation of the cross correlation into different levels, with the partition order determined by the gradient energies of the partitioned regions in the template. thus, this adaptive multilevel successive elimination scheme can be employed to reject most candidates early and reduce the computational cost. experimental results show that the proposed algorithm is very efficient for pattern matching under different lighting conditions. efficient texture representation using multi-scale regions. this paper introduces an efficient way of representing textures using connected regions which are formed by coherent multi-scale over-segmentations. we show that the recently introduced covariance-based similarity measure, initially applied to rectangular windows, can be used with our newly devised irregular, structure-coherent patches, increasing the discriminative power and consistency of the texture representation. furthermore, by treating texture at multiple scales, we allow for an implicit encoding of the spatial and statistical texture properties that persist across scales.
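a minimal sketch of a region covariance descriptor and a common covariance dissimilarity, the kind of machinery such covariance-based similarity measures build on; the per-pixel feature set below (position, intensity, gradient magnitudes) is an illustrative choice:

import numpy as np

def region_covariance(gray, top, left, height, width):
    """covariance of per-pixel features inside a rectangular region."""
    iy, ix = np.gradient(gray.astype(np.float64))
    ys, xs = np.mgrid[top:top + height, left:left + width]
    feats = np.stack([xs.ravel(),
                      ys.ravel(),
                      gray[top:top + height, left:left + width].ravel(),
                      np.abs(ix[top:top + height, left:left + width]).ravel(),
                      np.abs(iy[top:top + height, left:left + width]).ravel()], axis=0)
    return np.cov(feats)   # 5 x 5 covariance descriptor

def covariance_distance(c1, c2):
    """dissimilarity via generalized eigenvalues (a common choice for covariance descriptors)."""
    eigvals = np.linalg.eigvals(np.linalg.solve(c1 + 1e-9 * np.eye(5), c2))
    return np.sqrt(np.sum(np.log(np.abs(eigvals)) ** 2))

gray = np.random.rand(100, 100)
c_a = region_covariance(gray, 10, 10, 32, 32)
c_b = region_covariance(gray, 50, 50, 32, 32)
print(covariance_distance(c_a, c_b))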
the meaningfulness and efficiency of the covariance-based texture representation are verified using a simple binary segmentation method based on min-cut. our experiments show that the proposed method, despite the low-dimensional representation in use, effectively discriminates textures, and that its performance compares favorably with the state of the art. tracking iris contour with a 3d eye-model for gaze estimation. this paper describes a sophisticated method to track the iris contour and to estimate the eye gaze of blinking eyes with a monocular camera. we design a 3d eye model consisting of eyeballs, iris contours and eyelids that describes the geometrical properties and movements of eyes. both the iris contours and the eyelid contours are tracked by using this eye model and a particle filter. the algorithm is able to detect "pure" iris contours because it can distinguish iris contours from eyelid contours. the eye gaze is described by the movement parameters of the 3d eye model, which are estimated by the particle filter during tracking. other distinctive features of this algorithm are: 1) it does not require any special light sources (e.g. an infrared illuminator) and 2) it can operate at video rate. through extensive experiments on real video sequences, we confirmed the robustness and effectiveness of our method. feature management for efficient camera tracking. in dynamic scenes with occluding objects, many features need to be tracked for robust real-time camera pose estimation. an open problem is that tracking too many features has a negative effect on the real-time capability of a tracking approach. this paper proposes a feature management method that performs a statistical analysis of the ability to track a feature and then uses only those features that are very likely to be tracked from the current camera position. thereby, a large set of features at different scales is created, where every feature holds a probability distribution over camera positions from which the feature can be tracked successfully. as only the feature points with the highest probability are used in the tracking step, the method can handle a large number of features at different scales without losing real-time performance. both the statistical analysis and the reconstruction of the features' 3d coordinates are performed online during tracking, and no preprocessing step is needed. multi-posture human detection in video frames by motion contour matching. in this paper, we propose a method for moving human detection in video frames by motion contour matching. firstly, temporal and spatial differences between frames are calculated and contour pixels are extracted by global thresholding as the basic features. then, skeleton templates with multiple representative postures are built on these features to represent multi-posture human contours. in the detection procedure, a dynamic programming algorithm is adopted to find the best global match between the built templates and the extracted contour features. finally, a thresholding method is used to classify a matching result as a moving human or a negative; scale and interpersonal contour differences are taken into account in the matching process. experiments on real video data prove the effectiveness of the proposed method. measurement of reflection properties in ancient japanese drawing ukiyo-e. ukiyo-e is a famous traditional japanese woodblock print.
some patterns printed with special printing techniques can only be seen from particular directions. this phenomenon relates to the reflection properties of the surface of ukiyo-e. in this paper, we propose a method to measure these reflection properties. firstly, the surface normal and the direction of the fiber in the japanese paper are computed from photos taken by a measuring machine named ogm. then, a reflection model is fitted to the measured data to obtain the reflection properties of the ukiyo-e. based on these parameters, the appearance of the ukiyo-e can be rendered in real time. camera calibration using principal-axes aligned conics. the projective geometric properties of two principal-axes aligned (paa) conics in a model plane are investigated in this paper by utilizing the generalized eigenvalue decomposition (ged). we demonstrate that one constraint on the image of the absolute conic (iac) can be obtained from a single image of two paa conics even if their parameters are unknown; if the eccentricity of one of the two conics is given, two constraints on the iac can be obtained. an important merit of the algorithm using paa conics is that it can be employed to avoid the ambiguities that arise when estimating extrinsic parameters in calibration algorithms using concentric circles. we evaluate the characteristics and robustness of the proposed algorithm in experiments with synthetic and real data. color constancy via convex kernel optimization. this paper introduces a novel convex kernel-based method for color constancy computation with explicit illuminant parameter estimation. a simple linear rendering model is adopted, and the illuminants in a new scene containing some of the color surfaces seen in the training image are sequentially estimated in a global optimization framework. the proposed method is fully data-driven and initialization invariant. nonlinear color constancy can also be approximately solved in this kernel optimization framework with a piecewise linear assumption. extensive experiments on real-scene images validate the practical performance of our method. kernel-bayesian framework for object tracking. this paper proposes a general kernel-bayesian framework for object tracking. in this framework, the kernel-based mean shift algorithm is embedded seamlessly into the bayesian framework to provide heuristic prior information to the state transition model, aiming to alleviate the heavy computational load and avoid the sample degeneracy suffered by conventional bayesian trackers. moreover, the tracked object is characterized by a spatial-constraint mog (mixture of gaussians) based appearance model, which is shown to be more discriminative than the traditional mog-based appearance model. meanwhile, a novel selective updating technique for the appearance model is developed to accommodate changes in both appearance and illumination. experimental results demonstrate that, compared with bayesian and kernel-based tracking frameworks, the proposed algorithm is more efficient and effective. optimal learning high-order markov random fields priors of colour image. in this paper, we present an optimised algorithm for learning parametric prior models for high-order markov random fields (mrfs) of colour images. compared to the priors used by conventional low-order mrfs, the learned priors have richer expressive power and can capture the statistics of natural scenes.
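as a rough illustration of what a high-order mrf prior can look like (not the authors' model), a fields-of-experts style prior scores an image by summing robust penalties over linear filter responses; the filters and weights below are random placeholders for learned ones:

import numpy as np

def mrf_prior_energy(image, filters, alphas):
    """fields-of-experts style prior: sum of robust penalties on filter responses.

    image:   2-d grey image (a colour model would treat the channels jointly).
    filters: list of small 2-d filters (learned in a real model; random here).
    alphas:  positive weights, one per filter.
    """
    energy = 0.0
    for f, a in zip(filters, alphas):
        fh, fw = f.shape
        # valid-mode filter responses via sliding windows
        windows = np.lib.stride_tricks.sliding_window_view(image, (fh, fw))
        responses = np.tensordot(windows, f, axes=([2, 3], [0, 1]))
        energy += a * np.sum(np.log(1.0 + 0.5 * responses ** 2))   # student-t style expert
    return energy

rng = np.random.default_rng(1)
filters = [rng.standard_normal((3, 3)) for _ in range(4)]
alphas = [0.1, 0.1, 0.1, 0.1]
print(mrf_prior_energy(rng.standard_normal((32, 32)), filters, alphas))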
the proposed learning algorithm simplifies the estimation of the partition function without compromising the accuracy of the learned model. the parameters of the mrf colour image priors are learned alternately and iteratively in an em-like fashion by maximising their likelihood. we demonstrate the capability of the learned high-order mrf colour image priors in the application of colour image denoising. experimental results show the superior performance of our algorithm compared to the state-of-the-art colour image priors in [1], although we use a much smaller training image set. near-optimal mosaic selection for rotating and zooming video cameras. applying graph-theoretic concepts to computer vision problems not only makes it straightforward to analyze the complexity of the problem at hand, but also allows existing algorithms from the graph-theory literature to be used to find a solution. we consider the challenging tasks of frame selection for mosaicing and feature selection, from computer vision and machine learning respectively, and demonstrate that these problems can be mapped to the classical graph-theory problem of finding the maximum independent set. for frame selection, we represent the temporal and spatial connectivity of the images in a video sequence by a graph, and demonstrate that the optimal subset of images to be used in mosaicing can be determined by finding the maximum independent set of the graph. determining the maximum independent set not only reduces the overhead of using all the images, many of which may not contribute significantly to the mosaic, but also implicitly solves the "camera loop-back" problem. for feature selection, we conclude that a similar mapping to the maximum independent set problem can be applied to obtain a solution. finally, to demonstrate the efficacy of our frame selection method, we build a mosaicing system that uses it. efficient registration of aerial image sequences without camera priors. we present an efficient approach for finding homographies between sequences of aerial images. we propose a staged approach: a) initially solving for image-plane rotation and scale parameters without using correspondences (under an affine assumption), b) using these parameters to constrain the full homography search, and c) extending the results to full perspective projection. no flight meta-data, camera priors, or any other user-defined information is used for the task. based on the estimated perspective parameters, each aerial image is stitched to its best matching image according to a probabilistic model, composing a high-resolution aerial image mosaic. while retaining the improved asymptotic worst-case complexity of [6], we demonstrate significant performance improvements in practice. three dimensional position measurement for maxillofacial surgery by stereo x-ray images. this paper describes a method whereby a three dimensional position inside a human body can be measured using a simple x-ray stereo image pair. because the geometry of x-ray imaging is similar to that of ordinary photography, a standard stereo vision technique can be used. however, one problem is that the x-ray source position is unknown and must be computed from the x-ray image. in addition, a reference coordinate frame on which the measurement is based needs to be determined. the proposed method solves these two problems using a cubic wire frame called the reference object.
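a minimal sketch of two-view linear (dlt) triangulation, the standard machinery a stereo measurement of this kind ultimately relies on; the projection matrices below are illustrative assumptions:

import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """linear (dlt) triangulation of one point from two views.

    P1, P2: 3x4 projection matrices; x1, x2: (u, v) image coordinates of the same point.
    """
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# illustrative projection matrices: two cameras separated along the x axis
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.3, -0.2, 4.0, 1.0])
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
print(triangulate_dlt(P1, P2, x1, x2))   # recovers approximately (0.3, -0.2, 4.0)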
although three-dimensional positioning inside a human body is possible with computed tomography (ct), it requires expensive equipment. in contrast, the proposed method only requires ordinary x-ray photography equipment, which is inexpensive and widely available even in developing countries. the kernel orthogonal mutual subspace method and its application to 3d object recognition. this paper proposes the kernel orthogonal mutual subspace method (komsm) for 3d object recognition. komsm is a kernel-based method for classifying sets of patterns such as video frames or multi-view images. it classifies objects based on the canonical angles between the nonlinear subspaces, which are generated from the image patterns of each object class by kernel pca. this methodology was introduced in the kernel mutual subspace method (kmsm). however, komsm differs from kmsm in that the nonlinear class subspaces are orthogonalized, based on the framework proposed by fukunaga and koontz, before calculating the canonical angles. this orthogonalization provides a powerful feature extraction method for improving the performance of kmsm. the validity of komsm is demonstrated through experiments using face images and images from a public database. video mosaicing based on structure from motion for distortion-free document digitization. this paper presents a novel video mosaicing method capable of generating a geometric distortion-free mosaic image using a hand-held camera. for a document composed of curved pages, mosaic images of virtually flattened pages are generated. our method is composed of two stages: a real-time stage and an off-line stage. in the real-time stage, image features are automatically tracked on the input images, and the viewpoint of each image as well as the 3-d position of each image feature are estimated by a structure-from-motion technique. in the off-line stage, the estimated viewpoint and 3-d position of each feature are refined and utilized to generate a geometric distortion-free mosaic image. we demonstrate our prototype system on curved documents to show the feasibility of our approach. image segmentation using co-em strategy. inspired by the idea of multi-view learning, we propose an image segmentation algorithm using a co-em strategy. image data are modeled using a gaussian mixture model (gmm), and two sets of features, i.e. two views, are employed in a co-em strategy instead of conventional single-view em to estimate the parameters of the gmm. compared with single-view gmm-em methods, the proposed co-em segmentation method has several advantages. first, the imperfection of a single view can be compensated by the other view in co-em. second, by employing two views, the co-em strategy offers more reliable segmentation results. third, the local-optimality drawback of single-view em can be overcome to some extent. fourth, the convergence rate is improved: the average running time is far less than that of single-view methods. we test the proposed method on a large number of images with unconstrained content. the experimental results verify the above advantages, and the method outperforms single-view gmm-em segmentation. efficiently solving the fractional trust region problem. normalized cuts has been successfully applied to a wide range of tasks in computer vision; it is indisputably one of the most popular segmentation algorithms in use today.
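for reference, the standard two-way spectral relaxation of the normalized cut can be sketched as follows; the toy affinity matrix is an illustrative assumption:

import numpy as np

def normalized_cut_partition(W):
    """standard two-way normalized cut relaxation.

    W: symmetric non-negative affinity matrix. returns a binary labelling.
    """
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    # symmetric normalized laplacian; its second-smallest eigenvector relaxes the cut
    L_sym = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L_sym)
    fiedler = D_inv_sqrt @ eigvecs[:, 1]
    return fiedler > np.median(fiedler)

# toy affinity matrix with two obvious clusters
A = np.ones((6, 6)) * 0.01
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)
print(normalized_cut_partition(A))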
a number of extensions to this approach have also been proposed, ones that can deal with multiple classes or that can incorporate a priori information in the form of grouping constraints. it was recently shown how a general linearly constrained normalized cut problem can be solved. this was done by proving that strong duality holds for the lagrangian relaxation of such problems. this provides a principled way to perform multi-class partitioning while enforcing any linear constraints exactly. the lagrangian relaxation requires the maximization of the algebraically smallest eigenvalue over a one-dimensional matrix sub-space. this is an unconstrained, piece-wise differentiable and concave problem. in this paper we show how to solve this optimization efficiently, even for very large-scale problems. the method has been tested on real data with convincing results. sequential l∞ norm minimization for triangulation. it has been shown that various geometric vision problems such as triangulation and pose estimation can be solved optimally by minimizing the l∞ error norm. this paper proposes a novel algorithm for sequential estimation. when a measurement arrives at each time instance, applying the original batch bisection algorithm is very inefficient because the number of second-order constraints, and hence the computational cost, increases over time. this paper shows that the upper and lower bounds, the two input parameters of the bisection method, can be updated through the time sequence so that the gap between the two bounds is kept as small as possible. furthermore, we may use only a subset of all the given measurements for the l∞ estimation. this reduces the number of constraints drastically. finally, we do not have to re-estimate the parameters when the reprojection error of a new measurement is smaller than the estimation error. these three ingredients provide very fast l∞ estimation through the sequence; our method is suitable for real-time or on-line sequential processing under l∞ optimality. this paper focuses on the triangulation problem in particular, but the algorithm is general enough to be applied to any l∞ problem. transformesh: a topology-adaptive mesh-based approach to surface evolution. most of the algorithms dealing with image-based 3-d reconstruction involve the evolution of a surface based on a minimization criterion. mesh parametrization, while allowing for an accurate surface representation, suffers from the inherent problem of not being able to reliably deal with self-intersections and topology changes. as a consequence, a significant number of methods choose implicit surface representations, e.g. level set methods, that naturally handle topology changes and intersections. nevertheless, these methods rely on space discretizations, which introduce an unwanted precision-complexity trade-off. in this paper we explore a new mesh-based solution that robustly handles topology changes and removes self-intersections, thereby overcoming the traditional limitations of this type of approach. to demonstrate its efficiency, we present results on 3-d surface reconstruction from multiple images and compare them with state-of-the-art results. analyzing the influences of camera warm-up effects on image acquisition. this article presents an investigation of the impact of camera warm-up on the image acquisition process and therefore on the accuracy of segmented image features.
based on an experimental study, we show that the camera image shifts by a few tenths of a pixel after camera start-up. the drift correlates with the temperature of the sensor board and stops when the camera reaches its thermal equilibrium. a further study of the observed image flow shows that it originates from a slight displacement of the image sensor due to thermal expansion of the mechanical components of the camera. this sensor displacement can be modeled using standard methods of projective geometry together with bi-exponential decay terms to model the temporal dependence. the parameters of the proposed model can be calibrated and then used to compensate for warm-up effects. further experimental studies show that our method is applicable to different types of cameras and that the warm-up behaviour is characteristic of a specific camera. converting thermal infrared face images into normal gray-level images. in this paper, we address the problem of producing visible-spectrum facial images, as we normally see them, from thermal infrared images. we apply canonical correlation analysis (cca) to extract features, approximately converting a many-to-many mapping between infrared and visible images into a one-to-one mapping. then we learn the relationship between the two feature spaces: the visible features are inferred from the corresponding infrared features using locally-linear regression (llr), also called sophisticated lle, and a locally linear embedding (lle) method is used to recover a visible image from the inferred features, restoring some information lost in the infrared image. experiments demonstrate that our method maintains the global facial structure and infers many local facial details from the thermal infrared images. face mosaicing for pose robust video-based recognition. this paper proposes a novel face mosaicing approach to modeling human facial appearance and geometry in a unified framework. the human head geometry is approximated with a 3d ellipsoid model. multi-view face images are back-projected onto the surface of the ellipsoid, and the surface texture map is decomposed into an array of local patches, which are allowed to move locally in order to achieve better correspondences among multiple views. finally, the corresponding patches are trained to model facial appearance, and a deviation model obtained from the patch movements is used to model the face geometry. our approach is applied to pose-robust face recognition. using the cmu pie database, we show experimentally that the proposed algorithm provides better performance than the baseline algorithms. we also extend our approach to video-based face recognition and test it on the face in action database. simultaneous appearance modeling and segmentation for matching people under occlusion. we describe an approach to segmenting foreground regions corresponding to a group of people into individual humans. given background subtraction and a ground plane homography, hierarchical part-template matching is employed to determine a reliable set of human detection hypotheses, and progressive greedy optimization is performed to estimate the best configuration of humans under a bayesian map framework. then, appearance models and segmentations are simultaneously estimated in an iterative sampling-expectation paradigm.
each human appearance is represented by a nonparametric kernel density estimator in a joint spatial-color space, and a recursive probability update scheme is employed for soft segmentation at each iteration. additionally, an automatic occlusion reasoning method is used to determine the layered occlusion status between humans. the approach is evaluated on a number of images and videos, and also applied to human appearance matching using a symmetric distance measure derived from the kullback-leibler divergence. crystal vision - applications of point groups in computer vision. methods from the representation theory of finite groups are used to construct efficient processing methods for the special geometries related to the finite subgroups of the rotation group. we motivate the use of these subgroups in computer vision, summarize the necessary facts from representation theory, and develop the basics of fourier theory for these geometries. we illustrate its usage for data compression in applications where the processes are (on average) symmetrical with respect to these groups. we use the icosahedral group as an example, since it is the largest finite subgroup of the 3d rotation group. other subgroups with fewer group elements can be studied in exactly the same way. three-stage motion deblurring from a video. in this paper, a novel approach is proposed to remove motion blur from a video that has been degraded and distorted by fast camera motion. our approach is based on image statistics rather than traditional motion estimation. image statistics have been successfully applied to blind motion deblurring of a single image by fergus et al. [3] and levin [10]. here, a three-stage method is used to deal with the video. first, the "unblurred" frames in the video are found based on image statistics. then the blur functions are obtained by comparing the blurred frames with the unblurred ones. finally, a standard deconvolution algorithm is used to reconstruct the video. our experiments show that our algorithm is efficient. interpolation between eigenspaces using rotation in multiple dimensions. we propose a method for interpolation between eigenspaces. techniques that represent observed patterns as multivariate normal distributions have been actively developed to make recognition robust to observation noise. in the recognition of images that vary based on continuous parameters such as camera angles, one cause of degraded performance is that training images are observed discretely while the parameters vary continuously. the proposed method interpolates between eigenspaces by analogy with the rotation of a hyper-ellipsoid in a high-dimensional space. experiments using face images captured in various illumination conditions demonstrate the validity and effectiveness of the proposed interpolation method. sign recognition using constrained optimization. sign recognition has been one of the challenging problems in computer vision for years. for many sign languages, signs formed by two overlapping hands are part of the vocabulary. in this work, an algorithm for recognizing such signs with overlapping hands is presented. two formulations are proposed for the problem. for both approaches, the input blob is converted to a graph representing the finger and palm structure, which is essential for sign understanding.
the first approach uses a graph subdivision as the basic framework, while the second one casts the problem as a label assignment problem and integer programming is applied for finding an optimal solution. experimental results are shown to illustrate the feasibility of our approaches. automatic range image registration using mixed integer linear programming. a coarse registration method using mixed integer linear programming (milp) is described that finds global optimal registration parameter values that are independent of the values of invariant features. we formulate the range image registration problem using milp. our milp-based algorithm finds the optimal registration that robustly aligns two range images with well-balanced accuracy. it adjusts the error tolerance automatically in accordance with the accuracy of the given range image data. experimental results show that this method of coarse registration is highly effective. an all-subtrees approach to unsupervised parsing. we investigate generalizations of the all-subtrees "dop" approach to unsupervised parsing. unsupervised dop models assign all possible binary trees to a set of sentences and next use (a large random subset of) all subtrees from these binary trees to compute the most probable parse trees. we will test both a relative frequency estimator for unsupervised dop and a maximum likelihood estimator which is known to be statistically consistent. we report state-of-the-art results on english (wsj), german (negra) and chinese (ctb) data. to the best of our knowledge this is the first paper which tests a maximum likelihood estimator for dop on the wall street journal, leading to the surprising result that an unsupervised parsing model beats a widely used supervised model (a treebank pcfg). spoken dialogue interpretation with the dop model. we show how the dop model can be used for fast and robust processing of spoken input in a practical spoken dialogue system called ovis. ovis, openbaar vervoer informatie systeem ("public transport information system"), is a dutch spoken language information system which operates over ordinary telephone lines. the prototype system is the immediate goal of the nwo priority programme "language and speech technology". in this paper, we extend the original dop model to context-sensitive interpretation of spoken input. the system we describe uses the ovis corpus (10,000 trees enriched with compositional semantics) to compute from an input word-graph the best utterance together with its meaning. dialogue context is taken into account by dividing up the ovis corpus into context-dependent subcorpora. each system question triggers a subcorpus by which the user answer is analyzed and interpreted. our experiments indicate that the context-sensitive dop model obtains better accuracy than the original model, allowing for fast and robust processing of spoken input. a bootstrapping approach to unsupervised detection of cue phrase variants. we investigate the unsupervised detection of semi-fixed cue phrases such as "this paper proposes a novel approach..." from unseen text, on the basis of only a handful of seed cue phrases with the desired semantics. the problem, in contrast to bootstrapping approaches for question answering and information extraction, is that it is hard to find a constraining context for occurrences of semi-fixed cue phrases. our method uses components of the cue phrase itself, rather than external context, to bootstrap. 
it successfully excludes phrases which are different from the target semantics, but which look superficially similar. the method achieves 88% accuracy, outperforming standard bootstrapping approaches. a probabilistic corpus-driven model for lexical-functional analysis. we develop a data-oriented parsing (dop) model based on the syntactic representations of lexical-functional grammar (lfg). we start by summarizing the original dop model for tree representations and then show how it can be extended with corresponding functional structures. the resulting lfg-dop model triggers a new, corpus-based notion of grammaticality, and its probability models exhibit interesting behavior with respect to specificity and the interpretation of ill-formed strings. polynomial learnability and locality of formal grammars. we apply a complexity theoretic notion of feasible learnability called "polynomial learnability" to the evaluation of grammatical formalisms for linguistic description. we show that a novel, nontrivial constraint on the degree of "locality" of grammars allows not only context free languages but also a rich class of mildly context sensitive languages to be polynomially learnable. we discuss possible implications of this result for the theory of natural language acquisition. lexical and syntactic rules in a tree adjoining grammar. taking examples from english and french idioms, this paper shows that not only constituent structure rules but also most syntactic rules (such as topicalization, wh-question, pronominalization ...) are subject to lexical constraints (on top of syntactic, and possibly semantic, ones). we show that such puzzling phenomena are naturally handled in a 'lexicalized' formalism such as tree adjoining grammar. the extended domain of locality of tags also allows one to 'lexicalize' syntactic rules while defining them at the level of constituent structures. japanese dependency parsing using co-occurrence information and a combination of case elements. in this paper, we present a method that improves japanese dependency parsing by using large-scale statistical information. it takes into account two kinds of information not considered in previous statistical (machine learning based) parsing methods: information about dependency relations among the case elements of a verb, and information about co-occurrence relations between a verb and its case element. this information can be collected from the results of automatic dependency parsing of large-scale corpora. the results of an experiment in which our method was used to rerank the output of an existing machine learning based parser show that it improves on the accuracy of the existing method. construct algebra: analytical dialog management. in this paper we describe a systematic approach for creating a dialog management system based on a construct algebra, a collection of relations and operations on a task representation. these relations and operations are analytical components for building higher level abstractions called dialog motivators. the dialog manager, consisting of a collection of dialog motivators, is entirely built using the construct algebra. scaling to very very large corpora for natural language disambiguation. the amount of readily available on-line text has reached hundreds of billions of words and continues to grow. 
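the verb / case-element co-occurrence information used in the japanese dependency-parsing abstract above could, for instance, be summarised as pointwise mutual information computed from automatically parsed text; the toy triples and the add-one smoothing in this sketch are illustrative assumptions.

```python
# sketch: verb / case-element co-occurrence scores from parsed text (toy data)
import math
from collections import Counter

# (case_element, case_particle, verb) triples harvested from automatically
# parsed text -- here just a hypothetical handful
triples = [("hon", "wo", "yomu"), ("shinbun", "wo", "yomu"),
           ("gakkou", "ni", "iku"), ("hon", "wo", "kau")]

pair = Counter((e, v) for e, _, v in triples)
elem = Counter(e for e, _, v in triples)
verb = Counter(v for e, _, v in triples)
N = len(triples)

def pmi(e, v):
    """pointwise mutual information between a case element and a verb,
    with crude add-one smoothing so unseen pairs stay finite."""
    p_ev = (pair[(e, v)] + 1) / (N + len(pair))
    p_e = (elem[e] + 1) / (N + len(elem))
    p_v = (verb[v] + 1) / (N + len(verb))
    return math.log(p_ev / (p_e * p_v))

# a reranker could add such scores to the output of a baseline parser;
# the attested pair gets the higher score
print(pmi("hon", "yomu"), pmi("gakkou", "yomu"))
```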
yet for most core natural language tasks, algorithms continue to be optimized, tested and compared after training on corpora consisting of only one million words or less. in this paper, we evaluate the performance of different learning methods on a prototypical natural language disambiguation task, confusion set disambiguation, when trained on orders of magnitude more labeled data than has previously been used. we are fortunate that for this particular application, correctly labeled training data is free. since this will often not be the case, we examine methods for effectively exploiting very large corpora when labeled data comes at a cost. bootstrapping. this paper refines the analysis of co-training, defines and evaluates a new co-training algorithm that has theoretical justification, gives a theoretical justification for the yarowsky algorithm, and shows that co-training and the yarowsky algorithm are based on different independence assumptions. headline generation based on statistical translation. extractive summarization techniques cannot generate document summaries shorter than a single sentence, something that is often required. an ideal summarization system would understand each document and generate an appropriate summary directly from the results of that understanding. a more practical approach to this problem results in the use of an approximation: viewing summarization as a problem analogous to statistical machine translation. the issue then becomes one of generating a target document in a more concise language from a source document in a more verbose language. this paper presents results on experiments using this approach, in which statistical models of the term selection and term ordering are jointly applied to produce summaries in a style learned from a training corpus. relating probabilistic grammars and automata. both probabilistic context-free grammars (pcfgs) and shift-reduce probabilistic pushdown automata (ppdas) have been used for language modeling and maximum likelihood parsing. we investigate the precise relationship between these two formalisms, showing that, while they define the same classes of probabilistic languages, they appear to impose different inductive biases. paraphrasing with bilingual parallel corpora. previous work has used monolingual parallel corpora to extract and generate paraphrases. we show that this task can be done using bilingual parallel corpora, a much more commonly available resource. using alignment techniques from phrase-based statistical machine translation, we show how paraphrases in one language can be identified using a phrase in another language as a pivot. we define a paraphrase probability that allows paraphrases extracted from a bilingual parallel corpus to be ranked using translation probabilities, and show how it can be refined to take contextual information into account. we evaluate our paraphrase extraction and ranking methods using a set of manual word alignments, and contrast the quality with paraphrases extracted from automatic alignments. an unsupervised morpheme-based hmm for hebrew morphological disambiguation. morphological disambiguation is the process of assigning one set of morphological features to each individual word in a text. when the word is ambiguous (there are several possible analyses for the word), a disambiguation procedure based on the word context must be applied. 
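the bilingual pivoting idea in the paraphrasing abstract above amounts to marginalising over foreign phrases, p(e2|e1) = sum_f p(e2|f) p(f|e1); the tiny phrase tables in the following sketch are hypothetical stand-ins for tables learned in phrase-based smt training.

```python
# sketch: paraphrase probabilities via a bilingual pivot (toy phrase tables)
from collections import defaultdict

p_f_given_e = {"under control": {"unter kontrolle": 0.9, "im griff": 0.1}}
p_e_given_f = {"unter kontrolle": {"under control": 0.7, "in check": 0.3},
               "im griff":        {"under control": 0.5, "in hand": 0.5}}

def paraphrase_probs(e1):
    scores = defaultdict(float)
    for f, p_f in p_f_given_e.get(e1, {}).items():     # pivot phrases
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != e1:
                scores[e2] += p_f * p_e2
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(paraphrase_probs("under control"))   # e.g. [('in check', 0.27), ...]
```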
this paper deals with morphological disambiguation of the hebrew language, which combines morphemes into a word in both agglutinative and fusional ways. we present an unsupervised stochastic model - the only resource we use is a morphological analyzer - which deals with the data sparseness problem caused by the affixational morphology of the hebrew language. we present a text encoding method for languages with affixational morphology in which the knowledge of word formation rules (which are quite restricted in hebrew) helps in the disambiguation. we adapt hmm algorithms for learning and searching this text representation, in such a way that segmentation and tagging can be learned in parallel in one step. results on a large scale evaluation indicate that this learning improves disambiguation for complex tag sets. our method is applicable to other languages with affixational morphology. evaluation tool for rule-based anaphora resolution methods. in this paper we argue that comparative evaluation in anaphora resolution has to be performed using the same pre-processing tools and on the same set of data. the paper proposes an evaluation environment for comparing anaphora resolution algorithms which is illustrated by presenting the results of the comparative evaluation of three methods on the basis of several evaluation measures. evaluation of semantic clusters. semantic clusters of a domain form an important feature that can be useful for performing syntactic and semantic disambiguation. several attempts have been made to extract the semantic clusters of a domain by probabilistic or taxonomic techniques. however, not much progress has been made in evaluating the obtained semantic clusters. this paper focuses on an evaluation mechanism that can be used to evaluate semantic clusters produced by a system against those provided by human experts. processing unknown words in hpsg. the lexical acquisition system presented in this paper incrementally updates linguistic properties of unknown words inferred from their surrounding context by parsing sentences with an hpsg grammar for german. we employ a gradual, information-based concept of "unknownness" providing a uniform treatment for the range of completely known to maximally unknown lexical entries. "unknown" information is viewed as revisable information, which is either generalizable or specializable. updating takes place after parsing, which only requires a modified lexical lookup. revisable pieces of information are identified by grammar-specified declarations which provide access paths into the parse feature structure. the updating mechanism revises the corresponding places in the lexical feature structures iff the context actually provides new information. for revising generalizable information, type union is required. a worked-out example demonstrates the inferential capacity of our implemented system. a simple but useful approach to conjunct identification. this paper presents an approach to identifying conjuncts of coordinate conjunctions appearing in text which has been labelled with syntactic and semantic tags. the overall project of which this research is a part is also briefly discussed. the program was tested on a 10,000 word chapter of the merck veterinary manual. the algorithm is deterministic and domain independent and it performs relatively well on a large real-life domain. constructs not handled by the simple algorithm are also described in some detail. semi-automatic recognition of noun modifier relationships. 
semantic relationships among words and phrases are often marked by explicit syntactic or lexical clues that help recognize such relationships in texts. within complex nominals, however, few overt clues are available. systems that analyze such nominals must compensate for the lack of surface clues with other information. one way is to load the system with lexical semantics for nouns or adjectives. this merely shifts the problem elsewhere: how do we define the lexical semantics and build large semantic lexicons? another way is to find constructions similar to a given complex nominal, for which the relationships are already known. this is the way we chose, but it too has drawbacks. similarity is not easily assessed, similar analyzed constructions may not exist, and if they do exist, their analysis may not be appropriate for the current nominal. we present a semi-automatic system that identifies semantic relationships in noun phrases without using precoded noun or adjective semantics. instead, partial matching on previously analyzed noun phrases leads to a tentative interpretation of a new input. processing can start without prior analyses, but the early stage requires user interaction. as more noun phrases are analyzed, the system learns to find better interpretations and reduces its reliance on the user. in experiments on english technical texts the system correctly identified 60--70% of relationships automatically. towards a single proposal in spelling correction. the study presented here relies on the integrated use of different kinds of knowledge in order to improve first-guess accuracy in non-word context-sensitive correction for general unrestricted text. state of the art spelling correction systems, e.g. ispell, apart from detecting spelling errors, also assist the user by offering a set of candidate corrections that are close to the misspelled word. based on the correction proposals of ispell, we built several guessers, which were combined in different ways. firstly, we evaluated all possibilities and selected the best ones in a corpus with artificially generated typing errors. secondly, the best combinations were tested on texts with genuine spelling errors. the results for the latter suggest that we can expect automatic non-word correction for all the errors in a free running text with 80% precision and a single proposal 98% of the time (1.02 proposals on average). redundancy: helping semantic disambiguation. redundancy is a good thing, at least in a learning process. to be a good teacher you must say what you are going to say, say it, then say what you have just said. well, three times is better than one. to acquire and learn knowledge from text for building a lexical knowledge base, we need to find a source of information that states facts, and repeats them a few times using slightly different sentence structures. a technique is needed for gathering information from that source and identifying the redundant information. extracting the commonality amounts to actively learning the knowledge expressed. the proposed research is based on a clustering method developed by barrière and popowich (1996) which performs a gathering of related information about a particular topic. individual pieces of information are represented via the conceptual graph (cg) formalism and the result of the clustering is a large cg embedding all individual graphs. 
in the present paper, we suggest that the identification of the redundant information within the resulting graph is very useful for disambiguation of the original information at the semantic level. a tool kit for lexicon building. this paper describes a set of interactive routines that can be used to create, maintain, and update a computer lexicon. the routines are available to the user as a set of commands resembling a simple operating system. the lexicon produced by this system is based on lexical-semantic relations, but is compatible with a variety of other models of lexicon structure. the lexicon builder is suitable for the generation of moderate-sized vocabularies and has been used to construct a lexicon for a small medical expert system. a future version of the lexicon builder will create a much larger lexicon by parsing definitions from machine-readable dictionaries. guided parsing of range concatenation languages. the theoretical study of the range concatenation grammar [rcg] formalism has revealed many attractive properties which may be used in nlp. in particular, range concatenation languages [rcl] can be parsed in polynomial time and many classical grammatical formalisms can be translated into equivalent rcgs without increasing their worst-case parsing time complexity. for example, after translation into an equivalent rcg, any tree adjoining grammar can be parsed in o(n^6) time. in this paper, we study a parsing technique whose purpose is to improve the practical efficiency of rcl parsers. the non-deterministic parsing choices of the main parser for a language l are directed by a guide which uses the shared derivation forest output by a prior rcl parser for a suitable superset of l. the results of a practical evaluation of this method on a wide coverage english grammar are given. parsing vs. text processing in the analysis of dictionary definitions. we have analyzed definitions from webster's seventh new collegiate dictionary using sager's linguistic string parser and again using basic unix text processing utilities such as grep and awk. this paper evaluates both procedures, compares their results, and discusses possible future lines of research exploiting and combining their respective strengths. a simple hybrid aligner for generating lexical correspondences in parallel texts. we present an algorithm for bilingual word alignment that extends previous work by treating multi-word candidates on a par with single words, and combining some simple assumptions about the translation process to capture alignments for low frequency words. like most other alignment algorithms, it uses cooccurrence statistics as a basis, but differs in the assumptions it makes about the translation process. the algorithm has been implemented in a modular system that allows the user to experiment with different combinations and variants of these assumptions. we give performance results from two evaluations, which compare well with results reported in the literature. modeling local coherence: an entity-based approach. this article proposes a novel framework for representing and measuring local coherence. central to this approach is the entity-grid representation of discourse, which captures patterns of entity distribution in a text. the algorithm introduced in the article automatically abstracts a text into a set of entity transition sequences and records distributional, syntactic, and referential information about discourse entities. 
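the entity-grid representation just described can be illustrated with a short sketch that builds the grid and reads off transition probabilities; the role inventory (s, o, x, -) follows the usual entity-grid convention, and the toy "parsed" sentences stand in for real syntactic analysis.

```python
# sketch: an entity grid and its transition distribution (toy input)
from collections import Counter
from itertools import product

# each sentence: {entity: grammatical role}, roles in {"S", "O", "X"}
sentences = [{"microsoft": "S", "market": "X"},
             {"microsoft": "O", "earnings": "S"},
             {"earnings": "S"}]

entities = sorted({e for s in sentences for e in s})
grid = [[s.get(e, "-") for s in sentences] for e in entities]

def transition_probs(grid, length=2):
    counts = Counter()
    for row in grid:                        # one row per entity
        for i in range(len(row) - length + 1):
            counts[tuple(row[i:i + length])] += 1
    total = sum(counts.values())
    return {t: counts[t] / total for t in product("SOX-", repeat=length)}

probs = transition_probs(grid)
print(probs[("S", "O")], probs[("-", "-")])   # features for a ranker
```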
we re-conceptualize coherence assessment as a learning task and show that our entity-based representation is well-suited for ranking-based generation and text classification tasks. using the proposed representation, we achieve good performance on text ordering, summary coherence evaluation, and readability assessment. extracting paraphrases from a parallel corpus. while paraphrasing is critical both for interpretation and generation of natural language, current systems use manual or semi-automatic methods to collect paraphrases. we present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple english translations of the same source text. our approach yields phrasal and single word lexical paraphrases as well as syntactic paraphrases. analysis of source identified text corpora: exploring the statistics of the reused text and authorship. this paper aims at providing a view of text recycled, within a short time, by the authors themselves. we first present a simple and general method for extracting reused term sequences, and then analyze several author-identified text collections to compare the statistical quantities. the ratio of recycling is also measured for each collection. finally, related research topics are introduced together with some discussion of future research directions. information fusion in the context of multi-document summarization. we present a method to automatically generate a concise summary by identifying and synthesizing similar elements across related text from a set of multiple documents. our approach is unique in its usage of language generation to reformulate the wording of the summary. resolving anaphors in embedded sentences. we propose an algorithm to resolve anaphors, tackling mainly the problem of intrasentential antecedents. we base our methodology on the fact that such antecedents are likely to occur in embedded sentences. sidner's focusing mechanism is used as the basic algorithm in a more complete approach. the proposed algorithm has been tested and implemented as a part of a conceptual analyser, mainly to process pronouns. details of an evaluation are given. generalized chart algorithm: an efficient procedure for cost-based abduction. we present an efficient procedure for cost-based abduction, which is based on the idea of using chart parsers as proof procedures. we discuss in detail three features of our algorithm --- goal-driven bottom-up derivation, tabulation of the partial results, and an agenda control mechanism --- and report the results of the preliminary experiments, which show how these features improve the computational efficiency of cost-based abduction. a flexible example-based parser based on the sstc. in this paper we sketch an approach for natural language parsing. our approach is an example-based approach, which relies mainly on examples that have already been parsed into their representation structures, and on the knowledge that the information required to parse a new input sentence can be obtained from these examples. in our approach, examples are annotated with the structured string tree correspondence (sstc) annotation schema where each sstc describes a sentence, a representation tree as well as the correspondence between substrings in the sentence and subtrees in the representation tree. in the process of parsing, we first try to build subtrees for phrases in the input sentence which have been successfully found in the example-base - a bottom up approach. 
these subtrees will then be combined together to form a single rooted representation tree based on an example with similar representation structure - a top down approach. aspects of clause politeness in japanese: an extended inquiry semantics treatment. the inquiry semantics approach of the nigel computational systemic grammar of english has proved capable of revealing distinctions within propositional content that the text planning process needs to control in order for adequate text to be generated. an extension to the chooser and inquiry framework motivated by a japanese clause generator capable of expressing levels of politeness makes this facility available for revealing the distinctions necessary among interpersonal, social meanings also. this paper shows why the previous inquiry framework was incapable of the kind of semantic control japanese politeness requires and how the implemented extension achieves that control. an example is given of the generation of a sentence that is appropriately polite for its context of use and some implications for future work are suggested. translating named entities using monolingual and bilingual resources. named entity phrases are some of the most difficult phrases to translate because new phrases can appear from nowhere, and because many are domain specific, not to be found in bilingual dictionaries. we present a novel algorithm for translating named entity phrases using easily obtainable monolingual and bilingual resources. we report on the application and evaluation of this algorithm in translating arabic named entities to english. we also compare our results with the results obtained from human translations and a commercial system for the same task. using aggregation for selecting content when generating referring expressions. previous algorithms for the generation of referring expressions have been developed specifically for this purpose. here we introduce an alternative approach based on a fully generic aggregation method also motivated for other generation tasks. we argue that the alternative contributes to a more integrated and uniform approach to content determination in the context of complete noun phrase generation. distortion models for statistical machine translation. in this paper, we argue that n-gram language models are not sufficient to address word reordering required for machine translation. we propose a new distortion model that can be used with existing phrase-based smt decoders to address those n-gram language model limitations. we present empirical results in arabic to english machine translation that show statistically significant improvements when our proposed model is used. we also propose a novel metric to measure word order similarity (or difference) between any pair of languages based on word alignments. features and agreement. this paper compares the consistency-based account of agreement phenomena in 'unification-based' grammars with an implication-based account based on a simple feature extension to lambek categorial grammar (lcg). we show that the lcg treatment accounts for constructions that have been recognized as problematic for 'unification-based' treatments. using machine learning techniques to build a comma checker for basque. in this paper, we describe the research using machine learning techniques to build a comma checker to be integrated in a grammar checker for basque. 
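the distortion-model abstract above mentions a word-order similarity metric defined over word alignments; one simple way to realise such a metric - an illustrative choice, not necessarily the authors' - is kendall's tau over the permutation induced by a one-to-one alignment.

```python
# sketch: word-order similarity from a one-to-one word alignment
from itertools import combinations

def kendall_tau(alignment):
    """alignment: list of target positions, indexed by source position.
    returns a value in [-1, 1]; 1 means identical word order."""
    pairs = list(combinations(range(len(alignment)), 2))
    concordant = sum(1 for i, j in pairs if alignment[i] < alignment[j])
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)

print(kendall_tau([0, 1, 2, 3]))   # monotone alignment -> 1.0
print(kendall_tau([3, 2, 1, 0]))   # fully reversed     -> -1.0
print(kendall_tau([0, 2, 1, 3]))   # one local swap     -> 0.666...
```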
after several experiments, and trained on a small corpus of 100,000 words, the system correctly decides where not to place commas with a precision of 96% and a recall of 98%. it also achieves a precision of 70% and a recall of 49% in the task of placing commas. finally, we have shown that these results can be improved using a bigger and more homogeneous training corpus, that is, a bigger corpus written by a single author. grammatical analysis by computer of the lancaster-oslo/bergen (lob) corpus of british english texts. research has been under way at the unit for computer research on the english language at the university of lancaster, england, to develop a suite of computer programs which provide a detailed grammatical analysis of the lob corpus, a collection of about 1 million words of british english texts available in machine readable form. the first phase of the project, completed in september 1983, produced a grammatically annotated version of the corpus giving a tag showing the word class of each word token. over 93 per cent of the word tags were correctly selected by using a matrix of tag pair probabilities and this figure was upgraded by a further 3 per cent by retagging problematic strings of words prior to disambiguation and by altering the probability weightings for sequences of three tags. the remaining 3 to 4 per cent were corrected by a human post-editor. the system was originally designed to run in batch mode over the corpus but we have recently modified procedures to run interactively for sample sentences typed in by a user at a terminal. we are currently extending the word tag set and improving the word tagging procedures to further reduce manual intervention. a similar probabilistic system is being developed for phrase and clause tagging. an unsupervised system for identifying english inclusions in german text. we present an unsupervised system that exploits linguistic knowledge resources, namely english and german lexical databases and the world wide web, to identify english inclusions in german text. we describe experiments with this system and the corpus which was developed for this task. we report the classification results of our system and compare them to the performance of a trained machine learner in a series of in- and cross-domain experiments. corpus-based lexical choice in natural language generation. choosing the best lexeme to realize a meaning in natural language generation is a hard task. we investigate different tree-based stochastic models for lexical choice. because of the difficulty of obtaining a sense-tagged corpus, we generalize the notion of synonymy. we show that a tree-based model can achieve a word-bag based accuracy of 90%, representing an improvement over the baseline. jointly labeling multiple sequences: a factorial hmm approach. we present new statistical models for jointly labeling multiple sequences and apply them to the combined task of part-of-speech tagging and noun phrase chunking. the model is based on the factorial hidden markov model (fhmm) with distributed hidden states representing part-of-speech and noun phrase sequences. we demonstrate that this joint labeling approach, by enabling information sharing between tagging/chunking subtasks, outperforms the traditional method of tagging and chunking in succession. further, we extend this into a novel model, switching fhmm, to allow for explicit modeling of cross-sequence dependencies based on linguistic knowledge. 
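the matrix of tag-pair probabilities at the heart of the lob tagging work above is the classic bigram disambiguation device; the following sketch runs a small viterbi search over such a matrix, with a hypothetical lexicon and hand-picked probabilities.

```python
# sketch: word-class disambiguation with tag-pair probabilities (toy model)
def viterbi(words, lexicon, trans, start="START"):
    """lexicon: word -> {tag: P(word|tag)}; trans: (tag1, tag2) -> P(tag2|tag1)."""
    best = {start: (1.0, [])}
    for w in words:
        new = {}
        for tag, p_emit in lexicon[w].items():
            # pick the previous tag giving the highest path probability
            prev, (p_prev, path) = max(
                best.items(),
                key=lambda kv: kv[1][0] * trans.get((kv[0], tag), 1e-6))
            p = p_prev * trans.get((prev, tag), 1e-6) * p_emit
            new[tag] = (p, path + [tag])
        best = new
    return max(best.values())[1]

lexicon = {"the": {"DET": 1.0}, "can": {"MD": 0.6, "NN": 0.3, "VB": 0.1},
           "rusts": {"VBZ": 0.8, "NNS": 0.2}}
trans = {("START", "DET"): 0.6, ("DET", "NN"): 0.5, ("DET", "MD"): 0.01,
         ("NN", "VBZ"): 0.4, ("MD", "VBZ"): 0.05}
print(viterbi(["the", "can", "rusts"], lexicon, trans))  # ['DET', 'NN', 'VBZ']
```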
we report tagging/chunking accuracies for varying dataset sizes and show that our approach is relatively robust to data sparsity. a rote extractor with edit distance-based generalisation and multi-corpora precision calculation. in this paper, we describe a rote extractor that learns patterns for finding semantic relationships in unrestricted text, with new procedures for pattern generalization and scoring. these include the use of part-of-speech tags to guide the generalization, named entity categories inside the patterns, an edit-distance-based pattern generalization algorithm, and a pattern accuracy calculation procedure based on evaluating the patterns on several test corpora. in an evaluation with 14 entities, the system attains a precision higher than 50% for half of the relationships considered. lexicon and grammar in probabilistic tagging of written english. the paper describes the development of software for automatic grammatical analysis of unrestricted, unedited english text at the unit for computer research on the english language (ucrel) at the university of lancaster. the work is currently funded by ibm and carried out in collaboration with colleagues at ibm uk (winchester) and ibm yorktown heights. the paper will focus on the lexicon component of the word tagging system, the ucrel grammar, the databanks of parsed sentences, and the tools that have been written to support development of these components. this work has applications to speech technology, spelling correction, and other areas of natural language processing. currently, our goal is to provide a language model using transition statistics to disambiguate alternative parses for a speech recognition device. generalized algorithms for constructing statistical language models. recent text and speech processing applications such as speech mining raise new and more general problems related to the construction of language models. we present and describe in detail several new and efficient algorithms to address these more general problems and report experimental results demonstrating their usefulness. we give an algorithm for computing efficiently the expected counts of any sequence in a word lattice output by a speech recognizer or any arbitrary weighted automaton; describe a new technique for creating exact representations of n-gram language models by weighted automata whose size is practical for offline use even for a vocabulary size of about 500,000 words and an n-gram order n = 6; and present a simple and more general technique for constructing class-based language models that allows each class to represent an arbitrary weighted automaton. an efficient implementation of our algorithms and techniques has been incorporated in a general software library for language modeling, the grm library, that includes many other text and grammar processing functionalities. corpus-based identification of non-anaphoric noun phrases. coreference resolution involves finding antecedents for anaphoric discourse entities, such as definite noun phrases. but many definite noun phrases are not anaphoric because their meaning can be understood from general world knowledge (e.g., "the white house" or "the news media"). we have developed a corpus-based algorithm for automatically identifying definite noun phrases that are non-anaphoric, which has the potential to improve the efficiency and accuracy of coreference resolution systems. 
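the edit-distance-based pattern generalisation in the rote-extractor abstract above can be approximated by aligning two patterns with standard edit-distance dynamic programming and wildcarding the positions that disagree; this is a simplification of the paper's procedure, shown here only to make the idea concrete.

```python
# sketch: generalising two extraction patterns via token-level edit distance
def generalize(p1, p2):
    """align two token sequences with edit-distance dp and replace
    substituted/inserted/deleted tokens by a '*' wildcard."""
    n, m = len(p1), len(p2)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if p1[i - 1] == p2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    # backtrace, emitting the shared token on matches and '*' otherwise
    out, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (p1[i - 1] != p2[j - 1]):
            out.append(p1[i - 1] if p1[i - 1] == p2[j - 1] else "*")
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            out.append("*"); i -= 1
        else:
            out.append("*"); j -= 1
    return list(reversed(out))

print(generalize("<PER> was born in <LOC> in".split(),
                 "<PER> was born near <LOC> on".split()))
# ['<PER>', 'was', 'born', '*', '<LOC>', '*']
```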
our algorithm generates lists of non-anaphoric noun phrases and noun phrase patterns from a training corpus and uses them to recognize non-anaphoric noun phrases in new texts. using 1600 muc-4 terrorism news articles as the training corpus, our approach achieved 78% recall and 87% precision at identifying such noun phrases in 50 text documents. toward a computational theory of speech perception. in recent years, a great deal of evidence has been collected which gives substantially increased insight into the nature of human speech perception. it is the author's belief that such data can be effectively used to infer much of the structure of a practical speech recognition system. this paper details a new view of the role of structural constraints within the several structural domains (e.g. articulation, phonetics, phonology, syntax, semantics) that must be utilized to infer the desired percept. integrating multiple knowledge sources for detection and correction of repairs in human-computer dialog. we have analyzed 607 sentences of spontaneous human-computer speech data containing repairs, drawn from a total corpus of 10,718 sentences. we present here criteria and techniques for automatically detecting the presence of a repair, locating it, and making the appropriate correction. the criteria involve integration of knowledge from several sources: pattern matching, syntactic and semantic analysis, and acoustics. evaluating a focus-based approach to anaphora resolution. we present an approach to anaphora resolution based on a focusing algorithm, and implemented within an existing muc (message understanding conference) information extraction system, allowing quantitative evaluation against a substantial corpus of annotated real-world texts. extensions to the basic focusing mechanism can be easily tested, resulting in refinements to the mechanism and resolution rules. results show that the focusing algorithm is highly sensitive to the quality of syntactic-semantic analyses, when compared to a simpler heuristic-based approach. dependency-based statistical machine translation. we present a czech-english statistical machine translation system which performs tree-to-tree translation of dependency structures. the only bilingual resource required is a sentence-aligned parallel corpus. all other resources are monolingual. we also refer to an evaluation method and plan to compare our system's output with a benchmark system. two diverse systems built using generic components for spoken dialogue (recent progress on trips). this paper describes recent progress on the trips architecture for developing spoken-language dialogue systems. the interactive poster session will include demonstrations of two systems built using trips: a computer purchasing assistant, and an object placement (and manipulation) task. prosody, syntax and parsing. we describe the modification of a grammar to take advantage of prosodic information provided by a speech recognition system. this initial study is limited to the use of relative duration of phonetic segments in the assignment of syntactic structure, specifically in ruling out alternative parses in otherwise ambiguous sentences. taking advantage of prosodic information in parsing can make a spoken language system more accurate and more efficient, if prosodic-syntactic mismatches, or unlikely matches, can be pruned. we know of no other work that has succeeded in automatically extracting speech information and using it in a parser to rule out extraneous parses. 
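of the knowledge sources listed in the repair-detection abstract above, the pattern-matching component is the easiest to sketch: exact word repetitions are a frequent surface signal of a speech repair. real systems combine this with syntactic, semantic and acoustic evidence; the window size below is an illustrative assumption.

```python
# sketch: flagging immediately repeated word spans as repair candidates
def repeated_spans(tokens, max_len=3):
    """return (start, length) of spans that are immediately repeated,
    e.g. 'show me the the flights' -> [(2, 1)]."""
    hits = []
    for length in range(max_len, 0, -1):
        for i in range(len(tokens) - 2 * length + 1):
            if tokens[i:i + length] == tokens[i + length:i + 2 * length]:
                hits.append((i, length))
    return hits

utt = "i want to go to to boston on on tuesday".split()
print(repeated_spans(utt))   # [(4, 1), (7, 1)]
```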
a robust system for natural spoken dialogue. this paper describes a system that leads us to believe in the feasibility of constructing natural spoken dialogue systems in task-oriented domains. it specifically addresses the issue of robust interpretation of speech in the presence of recognition errors. robustness is achieved by a combination of statistical error post-correction, syntactically- and semantically-driven robust parsing, and extensive use of the dialogue context. we present an evaluation of the system using time-to-completion and the quality of the final solution that suggests that most native speakers of english can use the system successfully with virtually no training. k-valued non-associative lambek categorial grammars are not learnable from strings. this paper is concerned with learning categorial grammars in gold's model. in contrast to k-valued classical categorial grammars, k-valued lambek grammars are not learnable from strings. this result was shown for several variants but the question was left open for the weakest one, the non-associative variant nl. we show that the class of rigid and k-valued nl grammars is unlearnable from strings, for each k; this result is obtained by a specific construction of a limit point in the considered class, that does not use the product operator. another interesting aspect of our construction is that it provides limit points for the whole hierarchy of lambek grammars, including the recent pregroup grammars. such a result aims at clarifying the possible directions for future learning algorithms: it expresses the difficulty of learning categorial grammars from strings and the need for an adequate structure on examples. head automata and bilingual tiling: translation with minimal representations. we present a language model consisting of a collection of costed bidirectional finite state automata associated with the head words of phrases. the model is suitable for incremental application of lexical associations in a dynamic programming search for optimal dependency tree derivations. we also present a model and algorithm for machine translation involving optimal "tiling" of a dependency tree with entries of a costed bilingual lexicon. experimental results are reported comparing methods for assigning cost functions to these models. we conclude with a discussion of the adequacy of annotated linguistic strings as representations for machine translation. tagging unknown proper names using decision trees. this paper describes a supervised learning method to automatically select from a set of noun phrases, embedding proper names of different semantic classes, their most distinctive features. the result of the learning process is a decision tree which classifies an unknown proper name on the basis of its context of occurrence. this classifier is used to estimate the probability distribution of an out of vocabulary proper name over a tagset. this probability distribution is itself used to estimate the parameters of a stochastic part of speech tagger. automatic acquisition of hierarchical transduction models for machine translation. we describe a method for the fully automatic learning of hierarchical finite state translation models. the input to the method is transcribed speech utterances and their corresponding human translations, and the output is a set of head transducers, i.e. statistical lexical head-outward transducers. 
a word-alignment function and a head-ranking function are first obtained, and then counts are generated for hypothesized state transitions of head transducers whose lexical translations and word order changes are consistent with the alignment. the method has been applied to create an english-spanish translation model for a speech translation application, with word accuracy of over 75% as measured by a string-distance comparison to three reference translations. an efficient kernel for multilingual generation in speech-to-speech dialogue translation. we present core aspects of a fully implemented generation component in a multilingual speech-to-speech dialogue translation system. its design was particularly influenced by the necessity of real-time processing and usability for multiple languages and domains. we developed a general kernel system comprising a microplanning and a syntactic realizer module. the microplanner performs lexical and syntactic choice, based on constraint-satisfaction techniques. the syntactic realizer processes hpsg grammars reflecting the latest developments of the underlying linguistic theory, utilizing their pre-processing into the tag formalism. the declarative nature of the knowledge bases, i.e., the microplanning constraints and the hpsg grammars, allowed easy adaptation to new domains and languages. the successful integration of our component into the translation system verbmobil proved the fulfillment of the specific real-time constraints. a comparison of head transducers and transfer for a limited domain translation application. we compare the effectiveness of two related machine translation models applied to the same limited-domain task. one is a transfer model with monolingual head automata for analysis and generation; the other is a direct transduction model based on bilingual head transducers. we conclude that the head transducer model is more effective according to measures of accuracy, computational requirements, model size, and development effort. the sammie system: multimodal in-car dialogue. the sammie system is an in-car multi-modal dialogue system for an mp3 application. it is used as a testing environment for our research in natural, intuitive mixed-initiative interaction, with particular emphasis on multimodal output planning and realization aimed at producing output adapted to the context, including the driver's attention state w.r.t. the primary driving task. monotonic semantic interpretation. aspects of semantic interpretation, such as quantifier scoping and reference resolution, are often realised computationally by non-monotonic operations involving loss of information and destructive manipulation of semantic representations. the paper describes how monotonic reference resolution and scoping can be carried out using a revised quasi logical form (qlf) representation. semantics for qlf are presented in which the denotations of formulas are extended monotonically as qlf expressions are resolved. the rhythm of lexical stress in prose. "prose rhythm" is a widely observed but scarcely quantified phenomenon. we describe an information-theoretic model for measuring the regularity of lexical stress in english texts, and use it in combination with trigram language models to demonstrate a relationship between the probability of word sequences in english and the amount of rhythm present in them. we find that the stream of lexical stress in text from the wall street journal has an entropy rate of less than 0.75 bits per syllable for common sentences. 
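the entropy-rate figure in the prose-rhythm abstract above can be illustrated by scoring a held-out stress stream with a smoothed trigram model over stress symbols; the toy strings ('1' = stressed, '0' = unstressed) and the add-one smoothing are assumptions, not the authors' model.

```python
# sketch: per-syllable entropy of a lexical-stress stream (toy data)
import math
from collections import Counter

def trigram_entropy(stream, held_out):
    tri = Counter(zip(stream, stream[1:], stream[2:]))
    bi = Counter(zip(stream, stream[1:]))
    vocab = set(stream) | set(held_out)

    def p(c, a, b):                    # P(c | a, b) with add-one smoothing
        return (tri[(a, b, c)] + 1) / (bi[(a, b)] + len(vocab))

    logprob = sum(math.log2(p(held_out[i], held_out[i - 2], held_out[i - 1]))
                  for i in range(2, len(held_out)))
    return -logprob / (len(held_out) - 2)    # bits per syllable

train = "10100101001010010100101001010010100"
test = "1010010100101001010"
print(round(trigram_entropy(train, test), 3))   # low for regular rhythm
```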
we observe that the average number of syllables per word is greater for rarer word sequences, and to normalize for this effect we run control experiments to show that the choice of word order contributes significantly to stress regularity, and increasingly with lexical probability. a model of lexical attraction and repulsion. this paper introduces new methods based on exponential families for modeling the correlations between words in text and speech. while previous work assumed the effects of word co-occurrence statistics to be constant over a window of several hundred words, we show that their influence is nonstationary on a much smaller time scale. empirical data drawn from english and japanese text, as well as conversational speech, reveals that the "attraction" between words decays exponentially, while stylistic and syntactic constraints create a "repulsion" between words that discourages close co-occurrence. we show that these characteristics are well described by simple mixture models based on two-stage exponential distributions which can be trained using the em algorithm. the resulting distance distributions can then be incorporated as penalizing features in an exponential language model. logical forms in the core language engine. this paper describes a 'logical form' target language for representing the literal meaning of english sentences, and an intermediate level of representation ('quasi logical form') which engenders a natural separation between the compositional semantics and the processes of scoping and reference resolution. the approach has been implemented in the sri core language engine which handles the english constructions discussed in the paper. consonant spreading in arabic stems. this paper examines the phenomenon of consonant spreading in arabic stems. each spreading involves a local surface copying of an underlying consonant, and, in certain phonological contexts, spreading alternates productively with consonant lengthening (or gemination). the morphophonemic triggers of spreading lie in the patterns or even in the roots themselves, and the combination of a spreading root and a spreading pattern causes a consonant to be copied multiple times. the interdigitation of arabic stems and the realization of consonant spreading are formalized using finite-state morphotactics and variation rules, and this approach has been successfully implemented in a large-scale arabic morphological analyzer which is available for testing on the internet. computing locally coherent discourses. we present the first algorithm that computes optimal orderings of sentences into a locally coherent discourse. the algorithm runs very efficiently on a variety of coherence measures from the literature. we also show that the discourse ordering problem is np-complete and cannot be approximated. finite-state non-concatenative morphotactics. we describe a new technique for constructing finite-state transducers that involves reapplying the regular-expression compiler to its own output. implemented in an algorithm called compile-replace, this technique has proved useful for handling non-concatenative phenomena; and we demonstrate it on malay full-stem reduplication and arabic stem interdigitation. the selection of the most probable dependency structure in japanese using mutual information. we use a statistical method to select the most probable structure or parse for a given sentence. 
it takes as input the dependency structures generated for the sentence by a dependency grammar, finds all triples of modifier, particle and modificant relations, calculates mutual information of each relation and chooses the structure for which the product of the mutual information of its relations is the highest. inside-outside estimation of a lexicalized pcfg for german. the paper describes an extensive experiment in inside-outside estimation of a lexicalized probabilistic context free grammar for german verb-final clauses. grammar and formalism features which make the experiment feasible are described. successive models are evaluated on precision and recall of phrase markup. improvement of a whole sentence maximum entropy language model using grammatical features. in this paper, we propose adding long-term grammatical information to a whole sentence maximum entropy language model (wsme) in order to improve the performance of the model. the grammatical information was added to the wsme model as features, which were obtained from a stochastic context-free grammar. finally, experiments using a part of the penn treebank corpus were carried out and significant improvements were achieved. the sentimental factor: improving review classification via human-provided information. sentiment classification is the task of labeling a review document according to the polarity of its prevailing opinion (favorable or unfavorable). in approaching this problem, a model builder often has three sources of information available: a small collection of labeled documents, a large collection of unlabeled documents, and human understanding of language. ideally, a learning method will utilize all three sources. to accomplish this goal, we generalize an existing procedure that uses the latter two. we extend this procedure by re-interpreting it as a naive bayes model for document sentiment. viewed as such, it can also be seen to extract a pair of derived features that are linearly combined to predict sentiment. this perspective allows us to improve upon previous methods, primarily through two strategies: incorporating additional derived features into the model and, where possible, using labeled data to estimate their relative influence. mt evaluation: human-like vs. human acceptable. we present a comparative study on machine translation evaluation according to two different criteria: human likeness and human acceptability. we provide empirical evidence that there is a relationship between these two kinds of evaluation: human likeness implies human acceptability but the reverse is not true. from the point of view of automatic evaluation this implies that metrics based on human likeness are more reliable for system tuning. our results also show that current evaluation metrics are not always able to distinguish between automatic and human translations. in order to improve the descriptive power of current metrics we propose the use of additional syntax-based metrics, and metric combinations inside the qarla framework. an empirical study of information synthesis task. 
this paper describes an empirical study of the "information synthesis" task, defined as the process of (given a complex information need) extracting, organizing and inter-relating the pieces of information contained in a set of relevant documents, in order to obtain a comprehensive, non-redundant report that satisfies the information need. two main results are presented: a) the creation of an information synthesis testbed with 72 reports manually generated by nine subjects for eight complex topics with 100 relevant documents each; and b) an empirical comparison of similarity metrics between reports, under the hypothesis that the best metric is the one that best distinguishes between manual and automatically generated reports. a metric based on key concept overlap gives better results than metrics based on n-gram overlap (such as rouge) or sentence overlap. qarla: a framework for the evaluation of text summarization systems. this paper presents a probabilistic framework, qarla, for the evaluation of text summarisation systems. the input of the framework is a set of manual (reference) summaries, a set of baseline (automatic) summaries and a set of similarity metrics between summaries. it provides i) a measure to evaluate the quality of any set of similarity metrics, ii) a measure to evaluate the quality of a summary using an optimal set of similarity metrics, and iii) a measure to evaluate whether the set of baseline summaries is reliable or may produce biased results. compared to previous approaches, our framework is able to combine different metrics and evaluate the quality of a set of metrics without any a-priori weighting of their relative importance. we provide quantitative evidence about the effectiveness of the approach to improve the automatic evaluation of text summarisation systems by combining several similarity metrics. discovering phonotactic finite-state automata by genetic search. this paper presents a genetic algorithm based approach to the automatic discovery of finite-state automata (fsas) from positive data. fsas are commonly used in computational phonology, but - given the limited learnability of fsas from arbitrary language subsets - are usually constructed manually. the approach presented here offers a practical automatic method that helps reduce the cost of manual fsa construction. tense and connective constraints on the expression of causality. starting from descriptions of french connectives (in particular "donc"---therefore), on the one hand, and aspectual properties of french tenses passé simple and imparfait on the other hand, we study in this paper how the two interact with respect to the expression of causality. it turns out that their interaction is not free. some combinations are not acceptable, and we propose an explanation for them. these results apply straightforwardly to natural language generation: given as input two events related by a cause relation, we can choose among various ways of presentation (the parameters being (i) the order, (ii) the connective, (iii) the tense) so that we are sure to express a cause relation, without generating either an incorrect discourse or an ambiguous one. query-relevant summarization using faqs. this paper introduces a statistical model for query-relevant summarization: succinctly characterizing the relevance of a document to a query. 
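the metric comparison in the information-synthesis study above contrasts key-concept overlap with n-gram overlap; a bare-bones n-gram recall overlap between two reports looks as follows (an illustration of the idea only, not the official rouge implementation).

```python
# sketch: n-gram recall overlap between a candidate report and a reference
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_recall(candidate, reference, n=2):
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum(min(cand[g], ref[g]) for g in ref)
    return overlap / max(sum(ref.values()), 1)

ref = "the report lists the main causes of the crisis"
cand = "the report describes the main causes of the financial crisis"
print(ngram_recall(cand, ref, n=1), ngram_recall(cand, ref, n=2))
```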
learning parameter values for the proposed model requires a large collection of summarized documents, which we do not have, but as a proxy, we use a collection of faq (frequently-asked question) documents. taking a learning approach enables a principled, quantitative evaluation of the proposed system, and the results of some initial experiments---on a collection of usenet faqs and on a faq-like set of customer-submitted questions to several large retail companies---suggest the plausibility of learning for summarization. time mapping with hypergraphs. word graphs are able to represent a large number of different utterance hypotheses in a very compact manner. however, usually they contain a huge amount of redundancy in terms of word hypotheses that cover almost identical intervals in time. we address this problem by introducing hypergraphs for speech processing. hypergraphs can be classified as an extension to word graphs and charts, their edges possibly having several start and end vertices. by converting ordinary word graphs to hypergraphs one can reduce the number of edges considerably. we define hypergraphs formally, present an algorithm to convert word graphs into hypergraphs and state consistency properties for edges and their combination. finally, we present some empirical results concerning graph size and parsing efficiency. bootstrapping path-based pronoun resolution. we present an approach to pronoun resolution based on syntactic paths. through a simple bootstrapping procedure, we learn the likelihood of coreference between a pronoun and a candidate noun based on the path in the parse tree between the two entities. this path information enables us to handle previously challenging resolution instances, and also robustly addresses traditional syntactic coreference constraints. highly coreferent paths also allow mining of precise probabilistic gender/number information. we combine statistical knowledge with well-known features in a support vector machine pronoun resolution classifier. significant gains in performance are observed on several datasets. a high-performance semi-supervised learning method for text chunking. in machine learning, whether one can build a more accurate classifier by using unlabeled data (semi-supervised learning) is an important issue. although a number of semi-supervised methods have been proposed, their effectiveness on nlp tasks is not always clear. this paper presents a novel semi-supervised method that employs a learning paradigm which we call structural learning. the idea is to find "what good classifiers are like" by learning from thousands of automatically generated auxiliary classification problems on unlabeled data. by doing so, the common predictive structure shared by the multiple classification problems can be discovered, which can then be used to improve performance on the target problem. the method produces performance higher than the previous best results on conll'00 syntactic chunking and conll'03 named entity chunking (english and german). finding parts in very large corpora. we present a method for extracting parts of objects from wholes (e.g. "speedometer" from "car"). given a very large corpus our method finds part words with 55% accuracy for the top 50 words as ranked by the system. the part list could be scanned by an end-user and added to an existing ontology (such as wordnet), or used as a part of a rough semantic lexicon. resolution of collective-distributive ambiguity using model-based reasoning. 
i present a semantic analysis of collective-distributive ambiguity, and resolution of such ambiguity by model-based reasoning. this approach goes beyond scha and stallard [17], whose reasoning capability was limited to checking semantic types. my semantic analysis is based on link [14, 13] and roberts [15], where distributivity comes uniformly from a quantificational operator, either explicit (e.g. each) or implicit (e.g. the d operator). i view the semantics module of the natural language system as a hypothesis generator and the reasoner in the pragmatics module as a hypothesis filter (cf. simmons and davis [18]). the reasoner utilizes a model consisting of domain-dependent constraints and domain-independent axioms for disambiguation. there are two kinds of constraints, type constraints and numerical constraints, and they are associated with predicates in the knowledge base. whenever additional information is derived from the model, the contradiction checker is invoked to detect any contradiction in a hypothesis using simple mathematical knowledge. cdcl (collective-distributive constraint language) is used to represent hypotheses, constraints, and axioms in a way isomorphic to diagram representations of collective-distributive ambiguity. evaluating automated and manual acquisition of anaphora resolution strategies. we describe one approach to build an automatically trainable anaphora resolution system. in this approach, we use japanese newspaper articles tagged with discourse information as training examples for a machine learning algorithm which employs the c4.5 decision tree algorithm by quinlan (quinlan, 1993). then, we evaluate and compare the results of several variants of the machine learning-based approach with those of our existing anaphora resolution system which uses manually-designed knowledge sources. finally, we compare our algorithms with existing theories of anaphora, in particular, japanese zero pronouns. a language-independent anaphora resolution system for understanding multilingual texts. this paper describes a new discourse module within our multilingual nlp system. because of its unique data-driven architecture, the discourse module is language-independent. moreover, the use of hierarchically organized multiple knowledge sources makes the module robust and trainable using discourse-tagged corpora. separating discourse phenomena from knowledge sources makes the discourse module easily extensible to additional phenomena. trainable, scalable summarization using robust nlp and machine learning. we describe a trainable and scalable summarization system which utilizes features derived from information retrieval, information extraction, and nlp techniques and on-line resources. the system combines these features using a trainable feature combiner learned from summary examples through a machine learning algorithm. we demonstrate system scalability by reporting results on the best combination of summarization features for different document sources. we also present preliminary results from a task-based evaluation on summarization output usability. parsing free word order languages in the paninian framework. there is a need to develop a suitable computational grammar formalism for free word order languages for two reasons: first, a suitably designed formalism is likely to be more efficient. second, such a formalism is also likely to be linguistically more elegant and satisfying. 
in this paper, we describe such a formalism, called the paninian framework, that has been successfully applied to indian languages. this paper shows that the paninian framework applied to modern indian languages gives an elegant account of the relation between surface form (vibhakti) and semantic (karaka) roles. the mapping is elegant and compact. the same basic account also explains active-passives and complex sentences. this suggests that the solution is not just ad hoc but has a deeper underlying unity. a constraint-based parser is described for the framework. the constraints problem reduces to a bipartite graph matching problem because of the nature of constraints. efficient solutions are known for these problems. it is interesting to observe that such a parser (designed for free word order languages) compares well in asymptotic time complexity with the parser for context-free grammars (cfgs), which are basically designed for positional languages. zero morphemes in unification-based combinatory categorial grammar. in this paper, we report on our use of zero morphemes in unification-based combinatory categorial grammar. after illustrating the benefits of this approach with several examples, we describe the algorithm for compiling zero morphemes into unary rules, which allows us to use zero morphemes more efficiently in natural language processing. then, we discuss the question of equivalence of a grammar with these unary rules to the original grammar. lastly, we compare our approach to zero morphemes with possible alternatives. unsupervised sense disambiguation using bilingual probabilistic models. we describe two probabilistic models for unsupervised word-sense disambiguation using parallel corpora. the first model, which we call the sense model, builds on the work of diab and resnik (2002) that uses both parallel text and a sense inventory for the target language, and recasts their approach in a probabilistic framework. the second model, which we call the concept model, is a hierarchical model that uses a concept latent variable to relate different language-specific sense labels. we show that both models improve performance on the word sense disambiguation task over previous unsupervised approaches, with the concept model showing the largest improvement. furthermore, in learning the concept model, as a by-product, we learn a sense inventory for the parallel language. problem solving applied to language generation. this research was supported at sri international by the defense advanced research projects agency under contract n00039--79--c--0118 with the naval electronic systems command. the views and conclusions contained in this document are those of the author and should not be interpreted as representative of the official policies, either expressed or implied, of the defense advanced research projects agency, or the u.s. government. the author is grateful to barbara grosz, gary hendrix and terry winograd for comments on an earlier draft of this paper. unsupervised part-of-speech tagging employing efficient graph clustering. an unsupervised part-of-speech (pos) tagging system that relies on graph clustering methods is described. unlike in current state-of-the-art approaches, the kind and number of different tags are generated by the method itself. we compute and merge two partitionings of word graphs: one based on context similarity of high-frequency words, another on log-likelihood statistics for words of lower frequencies.
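a minimal sketch, in python, of the first of these two partitioning steps as just described: build left/right context vectors for high-frequency words, link words whose vectors are similar, and take connected components as clusters. the threshold and the choice of cosine similarity are assumptions for illustration, not the paper's exact graph construction.

    from collections import Counter, defaultdict
    from math import sqrt

    def context_vectors(tokens, target_words, feature_words):
        # one sparse vector per target word, keyed by (offset, neighbouring word)
        vecs = {w: Counter() for w in target_words}
        for i, w in enumerate(tokens):
            if w not in vecs:
                continue
            for offset in (-1, 1):
                j = i + offset
                if 0 <= j < len(tokens) and tokens[j] in feature_words:
                    vecs[w][(offset, tokens[j])] += 1
        return vecs

    def cosine(a, b):
        num = sum(a[k] * b[k] for k in a if k in b)
        den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
        return num / den if den else 0.0

    def cluster(vecs, threshold=0.4):
        # similarity graph over the target words; clusters = connected components
        words = list(vecs)
        graph = defaultdict(set)
        for i, u in enumerate(words):
            for v in words[i + 1:]:
                if cosine(vecs[u], vecs[v]) >= threshold:
                    graph[u].add(v)
                    graph[v].add(u)
        seen, clusters = set(), []
        for w in words:
            if w in seen:
                continue
            stack, comp = [w], set()
            while stack:
                x = stack.pop()
                if x in comp:
                    continue
                comp.add(x)
                stack.extend(graph[x] - comp)
            seen |= comp
            clusters.append(comp)
        return clusters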
using the resulting word clusters as a lexicon, a viterbi pos tagger is trained, which is refined by a morphological component. the approach is evaluated on three different languages by measuring agreement with existing taggers. alternative phrases and natural language information retrieval. this paper presents a formal analysis for a large class of words called alternative markers, which includes other(than), such(as), and besides. these words appear frequently enough in dialog to warrant serious attention, yet present natural language search engines perform poorly on queries containing them. i show that the performance of a search engine can be improved dramatically by incorporating an approximation of the formal analysis that is compatible with the search engine's operational semantics. the value of this approach is that as the operational semantics of natural language applications improve, even larger improvements are possible. a practical nonmonotonic theory for reasoning about speech acts. a prerequisite to a theory of the way agents understand speech acts is a theory of how their beliefs and intentions are revised as a consequence of events. this process of attitude revision is an interesting domain for the application of nonmonotonic reasoning because speech acts have a conventional aspect that is readily represented by defaults, but that interacts with an agent's beliefs and intentions in many complex ways that may override the defaults. perrault has developed a theory of speech acts, based on reiter's default logic, that captures the conventional aspect; it does not, however, adequately account for certain easily observed facts about attitude revision resulting from speech acts. a natural theory of attitude revision seems to require a method of stating preferences among competing defaults. we present here a speech act theory, formalized in hierarchic autoepistemic logic (a refinement of moore's autoepistemic logic), in which revision of both the speaker's and hearer's attitudes can be adequately described. as a collateral benefit, efficient automatic reasoning methods for the formalism exist. the theory has been implemented and is now being employed by an utterance-planning system. the structure of shared forests in ambiguous parsing. the context-free backbone of some natural language analyzers produces all possible cf parses as some kind of shared forest, from which a single tree is to be chosen by a disambiguation process that may be based on the finer features of the language. we study the structure of these forests with respect to optimality of sharing, and in relation to the parsing schema used to produce them. in addition to a theoretical and experimental framework for studying these issues, the main results presented are: (i) sophistication in chart parsing schemata (e.g. use of look-ahead) may reduce time and space efficiency instead of improving it; (ii) there is a shared forest structure with at most cubic size for any cf grammar; (iii) when o(n3) complexity is required, the shape of a shared forest is dependent on the parsing schema used. though analyzed on cf grammars for simplicity, these results extend to more complex formalisms such as unification-based grammars. a flexible approach to cooperative response generation in information-seeking dialogues. this paper presents a cooperative consultation system on a restricted domain.
the system builds hypotheses on the user's plan and avoids misunderstandings (with consequent repair dialogues) through clarification dialogues in case of ambiguity. the role played by constraints in the generation of the answer is characterized in order to limit the cases of ambiguities requiring a clarification dialogue. the answers of the system are generated at different levels of detail, according to the user's competence in the domain. nltk: the natural language toolkit. the natural language toolkit is a suite of program modules, data sets and tutorials supporting research and teaching in computational linguistics and natural language processing. nltk is written in python and distributed under the gpl open source license. over the past year the toolkit has been rewritten, simplifying many linguistic data structures and taking advantage of recent enhancements in the python language. this paper reports on the simplified toolkit and explains how it is used in teaching nlp. a memory-based approach to learning shallow natural language patterns. recognizing shallow linguistic patterns, such as basic syntactic relationships between words, is a common task in applied natural language and text processing. the common practice for approaching this task is by tedious manual definition of possible pattern structures, often in the form of regular expressions or finite automata. this paper presents a novel memory-based learning method that recognizes shallow patterns in new text based on a bracketed training corpus. the training data are stored as-is, in efficient suffix-tree data structures. generalization is performed on-line at recognition time by comparing subsequences of the new text to positive and negative evidence in the corpus. this way, no information in the training is lost, as can happen in other learning systems that construct a single generalized model at the time of training. the paper presents experimental results for recognizing noun phrase, subject-verb and verb-object patterns in english. since the learning approach enables easy porting to new domains, we plan to apply it to syntactic patterns in other languages and to sub-language patterns for information extraction. a flexible approach to natural language generation for disabled children. natural language generation (nlg) is a way to automatically realize a correct expression in response to a communicative goal. this technology is mainly explored in the fields of machine translation, report generation, dialog systems, etc. in this paper we have explored the nlg technique for another novel application: assisting disabled children to take part in conversation. the limited physical ability and mental maturity of our intended users made the nlg approach different from others. we have taken a flexible approach where the main emphasis is on the flexibility and usability of the system. the evaluation results show that this technique can increase the communication rate of users during a conversation. lexicalization in crosslinguistic probabilistic parsing: the case of french. this paper presents the first probabilistic parsing results for french, using the recently released french treebank. we start with an unlexicalized pcfg as a baseline model, which is enriched to the level of collins' model 2 by adding lexicalization and subcategorization. the lexicalized sister-head model and a bigram model are also tested, to deal with the flatness of the french treebank.
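for concreteness, a minimal unlexicalized pcfg baseline of the kind this abstract starts from can be sketched with nltk (described in an earlier abstract above); the two toy trees are placeholders for the french treebank, and nothing here adds the lexicalization or subcategorization the paper investigates.

    import nltk

    # toy treebank: two bracketed trees standing in for real treebank data
    toy_treebank = [
        nltk.Tree.fromstring(
            "(S (NP (DT the) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))"),
        nltk.Tree.fromstring("(S (NP (DT a) (NN dog)) (VP (VBD slept)))"),
    ]

    # read off all productions and estimate rule probabilities by relative frequency
    productions = []
    for tree in toy_treebank:
        productions.extend(tree.productions())

    grammar = nltk.induce_pcfg(nltk.Nonterminal("S"), productions)
    parser = nltk.ViterbiParser(grammar)

    for parse in parser.parse("the dog sat on a mat".split()):
        print(parse)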
the bigram model achieves the best performance: 81% constituency f-score and 84% dependency accuracy. all lexicalized models outperform the unlexicalized baseline, consistent with probabilistic parsing results for english, but contrary to results for german, where lexicalization has only a limited effect on parsing performance. parsing ambiguous structures using controlled disjunctions and unary quasi-trees. the problem of parsing ambiguous structures concerns (i) their representation and (ii) the specification of mechanisms allowing one to delay and control their evaluation. we first propose to use a particular kind of disjunction called controlled disjunctions: these formulae allow the representation and the implementation of specific constraints that can occur between ambiguous values. but an efficient control of ambiguous structures also has to take into account lexical as well as syntactic information concerning these objects. we then propose the use of unary quasi-trees specifying constraints at these different levels. the two devices allow an efficient implementation of the control of the ambiguity. moreover, they are independent of any particular formalism and can be used regardless of the linguistic theory. intentions and information in discourse. this paper is about the flow of inference between communicative intentions, discourse structure and the domain during discourse processing. we augment a theory of discourse interpretation with a theory of distinct mental attitudes and reasoning about them, in order to provide an account of how the attitudes interact with reasoning about discourse structure. acceptability prediction by means of grammaticality quantification. we propose in this paper a method for quantifying sentence grammaticality. the approach, based on property grammars, a constraint-based syntactic formalism, makes it possible to evaluate a grammaticality index for any kind of sentence, including ill-formed ones. we compare on a sample of sentences the grammaticality indices obtained from the pg formalism and the acceptability judgements measured by means of a psycholinguistic analysis. the results show that the derived grammaticality index is a fairly good tracer of acceptability scores. knowledge acquisition from texts: using an automatic clustering method based on noun-modifier relationship. we describe the early stage of our methodology of knowledge acquisition from technical texts. first, a partial morpho-syntactic analysis is performed to extract "candidate terms". then, the knowledge engineer, assisted by an automatic clustering tool, builds the "conceptual fields" of the domain. we focus on this conceptual analysis stage, describe the data prepared from the results of the morpho-syntactic analysis and show the results of the clustering module and their interpretation. we found that syntactic links represent good descriptors for candidate terms clustering since the clusters are often easily interpreted as "conceptual fields". trigger-pair predictors in parsing and tagging. in this article, we apply to natural language parsing and tagging the device of trigger-pair predictors, previously employed exclusively within the field of language modelling for speech recognition. given the task of predicting the correct rule to associate with a parse-tree node, or the correct tag to associate with a word of text, and assuming a particular class of parsing or tagging model, we quantify the information gain realized by taking account of rule or tag trigger-pair predictors, i.e.
pairs consisting of a "triggering" rule or tag which has already occurred in the document being processed, together with a specific "triggered" rule or tag whose probability of occurrence within the current sentence we wish to estimate. this information gain is shown to be substantial. further, by utilizing trigger pairs taken from the same general sort of document as is being processed (e.g. same subject matter or same discourse type)---as opposed to predictors derived from a comprehensive general set of english texts---we can significantly increase this information gain. the effect of corpus size in combining supervised and unsupervised training for disambiguation. we investigate the effect of corpus size in combining supervised and unsupervised learning for two types of attachment decisions: relative clause attachment and prepositional phrase attachment. the supervised component is collins' parser, trained on the wall street journal. the unsupervised component gathers lexical statistics from an unannotated corpus of newswire text. we find that the combined system only improves the performance of the parser for small training sets. surprisingly, the size of the unannotated corpus has little effect due to the noisiness of the lexical statistics acquired by unsupervised learning. towards history-based grammars: using richer models for probabilistic parsing. we describe a generative probabilistic model of natural language, which we call hbg, that takes advantage of detailed linguistic information to resolve ambiguity. hbg incorporates lexical, syntactic, semantic, and structural information from the parse tree into the disambiguation process in a novel way. we use a corpus of bracketed sentences, called a treebank, in combination with decision tree building to tease out the relevant aspects of a parse tree that will determine the correct parse of a sentence. this stands in contrast to the usual approach of further grammar tailoring via linguistic introspection in the hope of generating the correct parse. in head-to-head tests against one of the best existing robust probabilistic parsing models, which we call p-cfg, the hbg model significantly outperforms p-cfg, increasing the parsing accuracy rate from 60% to 75%, a 37% reduction in error. a phrase-based statistical model for sms text normalization. short messaging service (sms) texts behave quite differently from normal written texts and have some very special phenomena. to translate sms texts, traditional approaches model such irregularities directly in machine translation (mt). however, such approaches suffer from a customization problem, as tremendous effort is required to adapt the language model of the existing translation system to handle sms text style. we offer an alternative approach to resolve such irregularities by normalizing sms texts before mt. in this paper, we view the task of sms normalization as a translation problem from the sms language to the english language and we propose to adapt a phrase-based statistical mt model for the task. evaluation by 5-fold cross-validation on a parallel sms normalized corpus of 5000 sentences shows that our method can achieve 0.80702 in bleu score against a baseline bleu score of 0.6958. another experiment of translating sms texts from english to chinese on a separate sms text corpus shows that using sms normalization as mt preprocessing can largely boost sms translation performance from 0.1926 to 0.3770 in bleu score.
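as a toy illustration of this normalization-as-translation framing (not the learned phrase-based smt model itself), a hand-written phrase table applied by greedy longest-match already captures the flavour of the task; the table entries and the example text are invented.

    # a toy sms normalizer: look up multi-word sms fragments in a hand-written
    # "phrase table" and replace them, longest source phrase first, left to right.
    PHRASE_TABLE = {
        "how r u": "how are you",
        "r u": "are you",
        "u": "you",
        "gr8": "great",
        "2nite": "tonight",
        "cu": "see you",
    }

    def normalize(sms):
        tokens = sms.lower().split()
        out, i = [], 0
        while i < len(tokens):
            # greedy decoding, no reordering and no scoring
            for span in range(min(3, len(tokens) - i), 0, -1):
                phrase = " ".join(tokens[i:i + span])
                if phrase in PHRASE_TABLE:
                    out.append(PHRASE_TABLE[phrase])
                    i += span
                    break
            else:
                out.append(tokens[i])
                i += 1
        return " ".join(out)

    print(normalize("how r u 2nite"))   # -> "how are you tonight"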
development and evaluation of a broad-coverage probabilistic grammar of english-language computer manuals. we present an approach to grammar development where the task is decomposed into two separate subtasks. the first task is linguistic, with the goal of producing a set of rules that have a large coverage (in the sense that the correct parse is among the proposed parses) on a blind test set of sentences. the second task is statistical, with the goal of developing a model of the grammar which assigns maximum probability to the correct parse. we give parsing results on text from computer manuals. going beyond aer: an extensive analysis of word alignments and their impact on mt. this paper presents an extensive evaluation of five different alignments and investigates their impact on the corresponding mt system output. we introduce new measures for intrinsic evaluations and examine the distribution of phrases and untranslated words during decoding to identify which characteristics of different alignments affect translation. we show that precision-oriented alignments yield better mt output (translating more words and using longer phrases) than recall-oriented alignments. automatic compensation for parser figure-of-merit flaws. best-first chart parsing utilises a figure of merit (fom) to efficiently guide a parse by first attending to those edges judged better. in the past it has usually been static; this paper will show that with some extra information, a parser can compensate for fom flaws which otherwise slow it down. our results are faster than the prior best by a factor of 2.5, and the speedup comes with no significant decrease in parser accuracy. discourse entities in janus. this paper addresses issues that arose in applying the model for discourse entity (de) generation in b. webber's work (1978, 1983) to an interactive multimodal interface. her treatment was extended in 4 areas: (1) the notion of context dependence of des was formalized in an intensional logic, (2) the treatment of des for indefinite nps was modified to use skolem functions, (3) the treatment of dependent quantifiers was generalized, and (4) des originating from non-linguistic sources, such as pointing actions, were taken into account. the discourse entities are used in intra- and extra-sentential pronoun resolution in bbn janus. on the decidability of functional uncertainty. we show that feature logic extended by functional uncertainty is decidable, even if one admits cyclic descriptions. we present an algorithm which solves feature descriptions containing functional uncertainty in two phases, both phases using a set of deterministic and non-deterministic rewrite rules. we then compare our algorithm with that of kaplan and maxwell, which does not cover cyclic feature descriptions. a comparison of document, sentence, and term event spaces. the trend in information retrieval systems is from document to sub-document retrieval, such as sentences in a summarization system and words or phrases in a question-answering system. despite this trend, systems continue to model language at a document level using the inverse document frequency (idf). in this paper, we compare and contrast idf with inverse sentence frequency (isf) and inverse term frequency (itf). a direct comparison reveals that all language models are highly correlated; however, the average isf and itf values are 5.5 and 10.4 higher than idf.
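assuming the standard log(n/df) form in each event space (an assumption, since the paper's exact formulation is not reproduced here), the three statistics differ only in what counts as an event, which a few lines of python make explicit; the toy collections below are placeholders for the full-text corpus.

    from math import log

    def inverse_frequency(units, term):
        # units: a list of events (documents, sentences, or term windows), each a set of terms
        n = len(units)
        df = sum(1 for u in units if term in u)
        return log(n / df) if df else float("inf")

    documents = [{"gene", "protein", "cell"}, {"protein", "binding"}, {"cell", "cycle"}]
    sentences = [{"gene", "protein"}, {"protein"}, {"cell"}, {"binding"}, {"cell", "cycle"}]

    print(inverse_frequency(documents, "protein"))   # idf: document event space
    print(inverse_frequency(sentences, "protein"))   # isf: sentence event space
    # itf applies the same form over the (much larger) term event space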
all language models appeared to follow a power law distribution with a slope coefficient of 1.6 for documents and 1.7 for sentences and terms. we conclude with an analysis of idf stability with respect to random, journal, and section partitions of the 100,830 full-text scientific articles in our experimental corpus. a complete and recursive feature theory. various feature descriptions are being employed in constraint-based grammar formalisms. the common notational primitives of these descriptions are functional attributes called features. the descriptions considered in this paper are the possibly quantified first-order formulae obtained from a signature of features and sorts. we establish a complete first-order theory ft by means of three axiom schemes and construct three elementarily equivalent models. one of the models consists of so-called feature graphs, a data structure common in computational linguistics. the other two models consist of so-called feature trees, a record-like data structure generalizing the trees corresponding to first-order terms. our completeness proof exhibits a terminating simplification system deciding validity and satisfiability of possibly quantified feature descriptions. outilex, a linguistic platform for text processing. we present outilex, a generalist linguistic platform for text processing. the platform includes several modules implementing the main operations for text processing and is designed to use large-coverage language resources. these resources (dictionaries, grammars, annotated texts) are formatted into xml, in accordance with current standards. evaluations of efficiency are given. entity-based cross-document coreferencing using the vector space model. cross-document coreference occurs when the same person, place, event, or concept is discussed in more than one text source. computer recognition of this phenomenon is important because it helps break "the document boundary" by allowing a user to examine information about a particular entity from multiple text sources at the same time. in this paper we describe a cross-document coreference resolution algorithm which uses the vector space model to resolve ambiguities between people having the same name. in addition, we describe a scoring algorithm for evaluating the cross-document coreference chains produced by our system and we compare our algorithm to the scoring algorithm used in the muc-6 (within document) coreference task. towards an optimal lexicalization in a natural-sounding portable natural language generator for dialog systems. in contrast to the latest progress in speech recognition, the state of the art in natural language generation for spoken language dialog systems is lagging behind. the core dialog managers are now more sophisticated, and natural-sounding and flexible output is expected, but not achieved with current simple techniques such as template-based systems. portability of systems across subject domains and languages is another increasingly important requirement in dialog systems. this paper presents an outline of legend, a system that is both portable and generates natural-sounding output. this goal is achieved through the novel use of existing lexical resources such as framenet and wordnet. the berkeley framenet project. framenet is a three-year nsf-supported project in corpus-based computational lexicography, now in its second year (nsf iri-9618838, "tools for lexicon building").
the project's key features are (a) a commitment to corpus evidence for semantic and syntactic generalizations, and (b) the representation of the valences of its target words (mostly nouns, adjectives, and verbs) in which the semantic portion makes use of frame semantics. the resulting database will contain (a) descriptions of the semantic frames underlying the meanings of the words described, and (b) the valence representation (semantic and syntactic) of several thousand words and phrases, each accompanied by (c) a representative collection of annotated corpus attestations, which jointly exemplify the observed linkings between "frame elements" and their syntactic realizations (e.g. grammatical function, phrase type, and other syntactic traits). this report will present the project's goals and workflow, and information about the computational tools that have been adapted or created in-house for this work. on representing governed prepositions and handling "incorrect" and novel prepositions. nlp systems, in order to be robust, must handle novel and ill-formed input. one common type of error involves the use of non-standard prepositions to mark arguments. in this paper, we argue that such errors can be handled in a systematic fashion, and that a system designed to handle them offers other advantages. we offer a classification scheme for preposition usage errors. further, we show how the knowledge representation employed in the sra nlp system facilitates handling these data. coupling ccg and hybrid logic dependency semantics. categorial grammar has traditionally used the λ-calculus to represent meaning. we present an alternative, dependency-based perspective on linguistic meaning and situate it in the computational setting. this perspective is formalized in terms of hybrid logic and has a rich yet perspicuous propositional ontology that enables a wide variety of semantic phenomena to be represented in a single meaning formalism. finally, we show how we can couple this formalization to combinatory categorial grammar to produce interpretations compositionally. low-cost, high-performance translation retrieval: dumber is better. in this paper, we compare the relative effects of segment order, segmentation and segment contiguity on the retrieval performance of a translation memory system. we take a selection of both bag-of-words and segment order-sensitive string comparison methods, and run each over both character and word-segmented data, in combination with a range of local segment contiguity models (in the form of n-grams). over two distinct datasets, we find that indexing according to simple character bigrams produces a retrieval accuracy superior to any of the tested word n-gram models. further, in their optimum configuration, bag-of-words methods are shown to be equivalent to segment order-sensitive methods in terms of retrieval accuracy, but much faster. we also provide evidence that our findings are scalable. discriminative word alignment with conditional random fields. in this paper we present a novel approach for inducing word alignments from sentence aligned data. we use a conditional random field (crf), a discriminative model, which is estimated on a small supervised training set. the crf is conditioned on both the source and target texts, and thus allows for the use of arbitrary and overlapping features over these data. 
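to make "arbitrary and overlapping features" concrete, here is a small, invented feature function over a candidate source-target link of the kind such a discriminative aligner could condition on; the feature names are illustrative, not the paper's feature set.

    # features for a candidate alignment link between source position i and target
    # position j; boolean, real-valued and lexical features freely overlap.
    def alignment_features(src_tokens, tgt_tokens, i, j):
        s, t = src_tokens[i], tgt_tokens[j]
        return {
            "identical": s == t,                                   # useful for numbers and names
            "prefix_match": s[:3] == t[:3],                        # crude cognate cue
            "relative_position": abs(i / len(src_tokens) - j / len(tgt_tokens)),
            "both_punct": (not s.isalnum()) and (not t.isalnum()),
            "src_word=" + s: 1.0,                                  # lexical feature, overlaps with the above
        }

    print(alignment_features("le chat dort".split(), "the cat sleeps".split(), 1, 1))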
moreover, the crf has efficient training and decoding processes which both find globally optimal solutions. we apply this alignment model to both french-english and romanian-english language pairs. we show how a large number of highly predictive features can be easily incorporated into the crf, and demonstrate that even with only a few hundred word-aligned training sentences, our model improves over the current state of the art, with alignment error rates of 5.29 and 25.8 for the two tasks respectively. learning the countability of english nouns from corpus data. this paper describes a method for learning the countability preferences of english nouns from raw text corpora. the method maps the corpus-attested lexico-syntactic properties of each noun onto a feature vector, and uses a suite of memory-based classifiers to predict membership in 4 countability classes. we were able to assign countability to english nouns with a precision of 94.6%. multiple underlying systems: translating user requests into programs to produce answers. a user may typically need to combine the strengths of more than one system in order to perform a task. in this paper, we describe a component of the janus natural language interface that translates intensional logic expressions representing the meaning of a request into executable code for each application program, chooses which combination of application systems to use, and designs the transfer of data among them in order to provide an answer. the complete janus natural language system has been ported to two large command and control decision support aids. an improved parser for data-oriented lexical-functional analysis. we present an lfg-dop parser which uses fragments from lfg-annotated sentences to parse new sentences. experiments with the verbmobil and homecentre corpora show that (1) viterbi n-best search performs about 100 times faster than monte carlo search while both achieve the same accuracy; (2) the dop hypothesis which states that parse accuracy increases with increasing fragment size is confirmed for lfg-dop; (3) lfg-dop's relative frequency estimator performs worse than a discounted frequency estimator; and (4) lfg-dop significantly outperforms tree-dop if evaluated on tree structures only. negative polarity licensing at the syntax-semantics interface. recent work on the syntax-semantics interface (see e.g. (dalrymple et al., 1994)) uses a fragment of linear logic as a 'glue language' for assembling meanings compositionally. this paper presents a glue language account of how negative polarity items (e.g. ever, any) get licensed within the scope of negative or downward-entailing contexts (ladusaw, 1979), e.g. nobody ever left. this treatment of licensing operates precisely at the syntax-semantics interface, since it is carried out entirely within the interface glue language (linear logic). in addition to the account of negative polarity licensing, we show in detail how linear-logic proof nets (girard, 1987; gallier, 1992) can be used for efficient meaning deduction within this 'glue language' framework. a general computational treatment of comparatives for natural language question answering. we discuss the techniques we have developed and implemented for the cross-categorial treatment of comparatives in teli, a natural language question-answering system that's transportable among both application domains and types of backend retrieval systems.
for purposes of illustration, we shall consider the example sentences "list the cars at least 20 inches more than twice as long as the century is wide" and "have any us companies made at least 3 more large cars than buick?" issues to be considered include comparative inflections, left recursion and other forms of nesting, extraposition of comparative complements, ellipsis, the wh element "how", and the translation of normalized parse trees into logical form. what is the minimal set of fragments that achieves maximal parse accuracy? we aim at finding the minimal set of fragments which achieves maximal parse accuracy in data-oriented parsing. experiments with the penn wall street journal treebank show that counts of almost arbitrary fragments within parse trees are important, leading to improved parse accuracy over previous models tested on this treebank (a precision of 90.8% and a recall of 90.6%). we isolate some dependency relations which previous models neglect but which contribute to higher parse accuracy. integrating word boundary identification with sentence understanding. chinese sentences are written with no special delimiters such as spaces to indicate word boundaries. existing chinese nlp systems therefore employ preprocessors to segment sentences into words. contrary to the conventional wisdom of separating this issue from the task of sentence understanding, we propose an integrated model that performs word boundary identification in lockstep with sentence understanding. in this approach, there is no distinction between rules for word boundary identification and rules for sentence understanding. these two functions are combined. word boundary ambiguities are detected, especially the fallacious ones, when they block the primary task of discovering the inter-relationships among the various constituents of a sentence, which is the essence of the understanding process. in this approach, statistical information is also incorporated, providing the system with a quick and fairly reliable starting ground to carry out the primary task of relationship-building. learning the structure of task-driven human-human dialogs. data-driven techniques have been used for many computational linguistics tasks. models derived from data are generally more robust than hand-crafted systems since they better reflect the distribution of the phenomena being modeled. with the availability of large corpora of spoken dialog, dialog management is now reaping the benefits of data-driven techniques. in this paper, we compare two approaches to modeling subtask structure in dialog: a chunk-based model of subdialog sequences, and a parse-based, or hierarchical, model. we evaluate these models using customer-agent dialogs from a catalog service domain. responding to user queries in a collaborative environment. we propose a plan-based approach for responding to user queries in a collaborative environment. we argue that in such an environment, the system should not accept the user's query automatically, but should consider it a proposal open for negotiation. in this paper we concentrate on cases in which the system and user disagree, and discuss how this disagreement can be detected and negotiated, and how final modifications should be made to the existing plan. a bottom-up approach to sentence ordering for multi-document summarization.
ordering information is a difficult but important task for applications generating natural language texts such as multi-document summarization, question answering, and concept-to-text generation. in multi-document summarization, information is selected from a set of source documents. however, improper ordering of information in a summary can confuse the reader and degrade the readability of the summary. therefore, it is vital to properly order the information in multi-document summarization. we present a bottom-up approach to arrange sentences extracted for multi-document summarization. to capture the association and order of two textual segments (e.g. sentences), we define four criteria: chronology, topical-closeness, precedence, and succession. these criteria are integrated into a single criterion using a supervised learning approach. we repeatedly concatenate two textual segments into one segment based on the criterion, until we obtain the overall segment with all sentences arranged. we evaluate the sentence orderings produced by the proposed method and numerous baselines using subjective gradings as well as automatic evaluation measures. we introduce the average continuity, an automatic evaluation measure of sentence ordering in a summary, and investigate its appropriateness for this task. anchoring floating quantifiers in japanese-to-english machine translation. in this paper we present an algorithm to anchor floating quantifiers in japanese, a language in which quantificational nouns and numeral-classifier combinations can appear separated from the noun phrase they quantify. the algorithm differentiates degree and event modifiers from nouns that quantify noun phrases. it then finds a suitable anchor for such floating quantifiers. to do this, the algorithm considers the part of speech of the quantifier and the target, the semantic relation between them, the case marker of the antecedent and the meaning of the verb that governs the two constituents. the algorithm has been implemented and tested in a rule-based japanese-to-english machine translation system, with an accuracy of 76% and a recall of 97%. a dop model for semantic interpretation. in data-oriented language processing, an annotated language corpus is used as a stochastic grammar. the most probable analysis of a new sentence is constructed by combining fragments from the corpus in the most probable way. this approach has been successfully used for syntactic analysis, using corpora with syntactic annotations such as the penn treebank. if a corpus with semantically annotated sentences is used, the same approach can also generate the most probable semantic interpretation of an input sentence. the present paper explains this semantic interpretation method. a data-oriented semantic interpretation algorithm was tested on two semantically annotated corpora: the english atis corpus and the dutch ovis corpus. experiments show an increase in semantic accuracy if larger corpus-fragments are taken into consideration. managing information at linguistic interfaces. a large spoken dialogue translation system imposes both engineering and linguistic constraints on the way in which linguistic information is communicated between modules. we describe the design and use of interface terms, whose formal, functional and communicative roles have been tested in a sequence of integrated systems and which have proven adequate to these constraints. shallow parsing on the basis of words only: a case study.
we describe a case study in which a memory-based learning algorithm is trained to simultaneously chunk sentences and assign grammatical function tags to these chunks. we compare the algorithm's performance on this parsing task with varying training set sizes (yielding learning curves) and different input representations. in particular we compare input consisting of words only, a variant that includes word form information for low-frequency words, gold-standard pos only, and combinations of these. the word-based shallow parser displays an apparently log-linear increase in performance, and surpasses the flatter pos-based curve at about 50,000 sentences of training data. the low-frequency variant performs even better, and the combination is best. comparative experiments with a real pos tagger produce lower results. we argue that we might not need an explicit intermediate pos-tagging step for parsing when a sufficient amount of training material is available and word form information is used for low-frequency words. memory-based morphological analysis. we present a general architecture for efficient and deterministic morphological analysis based on memory-based learning, and apply it to morphological analysis of dutch. the system makes direct mappings from letters in context to rich categories that encode morphological boundaries, syntactic class labels, and spelling changes. both precision and recall of labeled morphemes are over 84% on held-out dictionary test words and estimated to be over 93% in free text. detecting problematic turns in human-machine interactions: rule-induction versus memory-based learning approaches. we address the issue of on-line detection of communication problems in spoken dialogue systems. we investigate the usefulness of the sequence of system question types and of the word graphs corresponding to the respective user utterances. by applying both rule-induction and memory-based learning techniques to data obtained with a dutch train time-table information system, the current paper demonstrates that the aforementioned features indeed lead to a method for problem detection that performs significantly above baseline. the results are interesting from a dialogue perspective since they employ features that are present in the majority of spoken dialogue systems and can be obtained with little or no computational overhead. the results are interesting from a machine learning perspective, since they show that the rule-based method performs significantly better than the memory-based method, because the former is better able to represent interactions between features. simulating children's null subjects: an early language generation model. this paper reports work in progress on a sentence generation model which attempts to emulate certain language output patterns of children between the ages of one and one-half and three years. in particular, the model addresses the issue of why missing or phonetically "null" subjects appear as often as they do in the speech of young english-speaking children. it will also be used to examine why other patterns of output appear in the speech of children learning languages such as italian and chinese.
initial findings are that an output generator successfully approximates the null-subject output patterns found in english-speaking children by using a 'processing overload' metric alone; however, reference to several parameters related to discourse orientation and agreement morphology is necessary in order to account for the differing patterns of null arguments appearing cross-linguistically. based on these findings, it is argued that the 'null-subject phenomenon' is due to the combined effects of limited processing capacity and early, accurate parameter setting. a quantitative analysis of lexical differences between genders in telephone conversations. in this work, we provide an empirical analysis of differences in word use between genders in telephone conversations, which complements the considerable body of work in sociolinguistics concerned with linguistic gender differences. experiments are performed on a large speech corpus of roughly 12,000 conversations. we employ machine learning techniques to automatically categorize the gender of each speaker given only the transcript of his/her speech, achieving 92% accuracy. an analysis of the most characteristic words for each gender is also presented. experiments reveal that the gender of one conversation side influences the lexical use of the other side. a surprising result is that we were able to classify male-only vs. female-only conversations with almost perfect accuracy. another facet of lig parsing. in this paper we present a new parsing algorithm for linear indexed grammars (ligs) in the same spirit as the one described in (vijay-shanker and weir, 1993) for tree adjoining grammars. for a lig l and an input string x of length n, we build an unambiguous context-free grammar whose sentences are all (and exclusively) valid derivation sequences in l which lead to x. we show that this grammar can be built in o(n6) time and that individual parses can be extracted in time linear in the size of the extracted parse tree. though this o(n6) upper bound does not improve over previous results, the average case behaves much better. moreover, practical parsing times can be decreased by some statically performed computations. defaults in unification grammar. incorporation of defaults in grammar formalisms is important for reasons of linguistic adequacy and grammar organization. in this paper we present an algorithm for handling default information in unification grammar. the algorithm specifies a logical operation on feature structures, merging with the non-default structure only those parts of the default feature structure which are not constrained by the non-default structure. we present various linguistic applications of default unification. constraint-based categorial grammar. we propose a generalization of categorial grammar in which lexical categories are defined by means of recursive constraints. in particular, the introduction of relational constraints allows one to capture the effects of (recursive) lexical rules in a computationally attractive manner. we illustrate the linguistic merits of the new approach by showing how it accounts for the syntax of dutch cross-serial dependencies and the position and scope of adjuncts in such constructions. delayed evaluation is used to process grammars containing recursive constraints. a morphographemic model for error correction in nonconcatenative strings. this paper introduces a spelling correction system which integrates seamlessly with morphological analysis using a multi-tape formalism.
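the abstract continues below with the error classes handled, including damerau errors; purely as a reference point (this is the standard restricted damerau-levenshtein distance, not the multi-tape morphographemic model), the four single-character edit operations can be computed as follows.

    # edit distance allowing insertion, deletion, substitution and adjacent transposition
    def damerau_levenshtein(a, b):
        d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(len(a) + 1):
            d[i][0] = i
        for j in range(len(b) + 1):
            d[0][j] = j
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
                if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                    d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
        return d[len(a)][len(b)]

    print(damerau_levenshtein("ktab", "ktaab"))  # 1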
handling of various semitic error problems is illustrated, with reference to arabic and syriac examples. the model handles errors in vocalisation, diacritics, phonetic syncopation and morphographemic idiosyncrasies, in addition to damerau errors. a complementary correction strategy for morphologically sound but morphosyntactically ill-formed words is outlined. deriving the predicate-argument structure for a free word order language. in relatively free word order languages, grammatical functions are intricately related to case marking. assuming an ordered representation of the predicate-argument structure, this work proposes a combinatory categorial grammar formulation of relating surface case cues to categories and types for correctly placing the arguments in the predicate-argument structure. this is achieved by treating case markers as type shifters. unlike other cg formulations, type shifting does not proliferate or cause spurious ambiguity. categories of all argument-encoding grammatical functions follow from the same principle of category assignment. normal order evaluation of the combinatory form reveals the predicate-argument structure. the application of the method to turkish is shown. the logical structure of binding. a logical recasting of binding theory is performed as an enhancing step for the purpose of its full and lean declarative implementation. a new insight into sentential anaphoric processes is presented which may suggestively be captured by the slogan "binding conditions are the effect of phase quantification on the universe of discourse referents". tagset reduction without information loss. a technique for reducing a tagset used for n-gram part-of-speech disambiguation is introduced and evaluated in an experiment. the technique ensures that all information that is provided by the original tagset can be restored from the reduced one. this is crucial, since we are interested in the linguistically motivated tags for part-of-speech disambiguation. the reduced tagset needs fewer parameters for its statistical model and allows more accurate parameter estimation. additionally, there is a slight but not significant improvement in tagging accuracy. a simplified theory of tense representations and constraints on their composition. this paper proposes a set of representations for tenses and a set of constraints on how they can be combined in adjunct clauses. the semantics we propose explains the possible meanings of tenses in a variety of sentential contexts. it also supports an elegant constraint on tense combination in adjunct clauses. these semantic representations provide insights into the interpretations of tenses, and the constraints provide a source of syntactic disambiguation that has not previously been demonstrated. we demonstrate an implemented disambiguator for a certain class of three-clause sentences based on our theory. automatic acquisition of subcategorization frames from untagged text. this paper describes an implemented program that takes a raw, untagged text corpus as its only input (no open-class dictionary) and generates a partial list of verbs occurring in the text and the subcategorization frames (sfs) in which they occur. verbs are detected by a novel technique based on the case filter of rouvret and vergnaud (1980). the completeness of the output list increases monotonically with the total number of occurrences of each verb in the corpus. false positive rates are one to three percent of observations. five sfs are currently detected and more are planned.
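a deliberately simplified sketch of cue-and-count subcategorization frame collection; unlike the program described above, it assumes a known verb list and crude hand-written frame cues rather than the case filter, so every name, cue, and threshold here is an assumption made for illustration only.

    from collections import Counter

    # no lemmatization: verb forms are listed as-is, purely for the toy example
    VERBS = {"expect", "know", "saw"}

    def frame_cue(next_tokens):
        # extremely crude cues for a handful of frames
        if not next_tokens:
            return "intransitive"
        if next_tokens[0] == "that":
            return "clause"
        if next_tokens[0] == "to":
            return "infinitive"
        if next_tokens[0] in {"the", "a", "an", "him", "her", "it"}:
            return "direct-object"
        return None

    def collect_frames(sentences, min_count=2):
        counts = Counter()
        for sent in sentences:
            tokens = sent.lower().split()
            for i, tok in enumerate(tokens):
                if tok in VERBS:
                    cue = frame_cue(tokens[i + 1:])
                    if cue:
                        counts[(tok, cue)] += 1
        # keep only frames seen often enough to trust
        return {pair: n for pair, n in counts.items() if n >= min_count}

    corpus = ["i expect to win", "they expect to lose", "we know that it works",
              "you know that he left", "he saw the dog", "i saw a cat"]
    print(collect_frames(corpus))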
ultimately, i expect to provide a large sf dictionary to the nlp community and to train dictionaries for specific corpora. chinese text segmentation with mbdp-1: making the most of training corpora. this paper describes a system for segmenting chinese text into words using the mbdp-1 algorithm. mbdp-1 is a knowledge-free segmentation algorithm that bootstraps its own lexicon, which starts out empty. experiments on chinese and english corpora show that mbdp-1 reliably outperforms the best previous algorithm when the available hand-segmented training corpus is small. as the size of the hand-segmented training corpus grows, the performance of mbdp-1 converges toward that of the best previous algorithm. the fact that mbdp-1 can be used with a small corpus is expected to be useful not only for the rare event of adapting to a new language, but also for the common event of adapting to a new genre within the same language. automatic grammar induction and parsing free text: a transformation-based approach. in this paper we describe a new technique for parsing free text: a transformational grammar is automatically learned that is capable of accurately parsing text into binary-branching syntactic trees with nonterminals unlabelled. the algorithm works by beginning in a very naive state of knowledge about phrase structure. by repeatedly comparing the results of bracketing in the current state to proper bracketing provided in the training corpus, the system learns a set of simple structural transformations that can be applied to reduce error. after describing the algorithm, we present results and compare these results to other recent results in automatic grammar induction. beyond n-grams: can linguistic sophistication improve language modeling? it seems obvious that a successful model of natural language would incorporate a great deal of both linguistic and world knowledge. interestingly, state-of-the-art language models for speech recognition are based on a very crude linguistic model, namely conditioning the probability of a word on a small fixed number of preceding words. despite many attempts to incorporate more sophisticated information into the models, the n-gram model remains the state of the art, used in virtually all speech recognition systems. in this paper we address the question of whether there is hope in improving language modeling by incorporating more sophisticated linguistic and world knowledge, or whether the n-grams are already capturing the majority of the information that can be employed. an improved error model for noisy channel spelling correction. the noisy channel model has been applied to a wide range of problems, including spelling correction. these models consist of two components: a source model and a channel model. very little research has gone into improving the channel model for spelling correction. this paper describes a new channel model for spelling correction, based on generic string-to-string edits. using this model gives significant performance improvements compared to previously proposed models. man* vs. machine: a case study in base noun phrase learning. a great deal of work has been done demonstrating the ability of machine learning algorithms to automatically extract linguistic knowledge from annotated corpora. very little work has gone into quantifying the difference in ability at this task between a person and a machine. this paper is a first step in that direction. classifier combination for improved lexical disambiguation.
one of the most exciting recent directions in machine learning is the discovery that the combination of multiple classifiers often results in significantly better performance than what can be achieved with a single classifier. in this paper, we first show that the errors made by three different state-of-the-art part-of-speech taggers are strongly complementary. next, we show how this complementary behavior can be used to our advantage. by using contextual cues to guide tagger combination, we are able to derive a new tagger that achieves performance significantly greater than any of the individual taggers. lexical access in connected speech recognition. this paper addresses two issues concerning lexical access in connected speech recognition: 1) the nature of the pre-lexical representation used to initiate lexical look-up, and 2) the points at which lexical look-up is triggered off this representation. we report the results of an experiment designed to evaluate a number of access strategies proposed in the literature, in conjunction with several plausible pre-lexical representations of the speech input. the experiment also extends previous work by utilising a dictionary database containing a realistic rather than illustrative english vocabulary. co-evolution of language and of the language acquisition device. a new account of parameter setting during grammatical acquisition is presented in terms of generalized categorial grammar embedded in a default inheritance hierarchy, providing a natural partial ordering on the setting of parameters. experiments show that several experimentally effective learners can be defined in this framework. evolutionary simulations suggest that a learner with default initial settings for parameters will emerge, provided that learning is memory-limited and the environment of linguistic adaptation contains an appropriate language. evaluating the accuracy of an unlexicalized statistical parser on the parc depbank. we evaluate the accuracy of an unlexicalized statistical parser, trained on 4k treebanked sentences from balanced data and tested on the parc depbank. we demonstrate that a parser which is competitive in accuracy (without sacrificing processing speed) can be quickly tuned without reliance on large in-domain manually-constructed treebanks. this makes it more practical to use statistical parsers in applications that need access to aspects of predicate-argument structure. the comparison of systems using depbank is not straightforward, so we extend and validate depbank and highlight a number of representation and scoring issues for relational evaluation schemes. the second release of the rasp system. we describe the new release of the rasp (robust accurate statistical parsing) system, designed for syntactic annotation of free text. the new version includes a revised and more semantically-motivated output representation, an enhanced grammar and part-of-speech tagger lexicon, and a more flexible and semi-supervised training method for the structural parse ranking model. we evaluate the released version on the wsj using a relational evaluation scheme, and describe how the new release allows users to enhance performance using (in-domain) lexical information. correcting esl errors using phrasal smt techniques. this paper presents a pilot study of the use of phrasal statistical machine translation (smt) techniques to identify and correct writing errors made by learners of english as a second language (esl).
using examples of mass noun errors found in the chinese learner error corpus (clec) to guide creation of an engineered training set, we show that application of the smt paradigm can capture errors not well addressed by widely-used proofing tools designed for native speakers. our system was able to correct 61.81% of mistakes in a set of naturally-occurring examples of mass noun errors found on the world wide web, suggesting that efforts to collect alignable corpora of pre- and post-editing esl writing samples can enable the development of smt-based writing assistance tools capable of repairing many of the complex syntactic and lexical problems found in the writing of esl learners. ensemble methods for unsupervised wsd. combination methods are an effective way of improving system performance. this paper examines the benefits of system combination for unsupervised wsd. we investigate several voting- and arbiter-based combination strategies over a diverse pool of unsupervised wsd systems. our combination methods rely on predominant senses which are derived automatically from raw text. experiments using the semcor and senseval-3 data sets demonstrate that our ensembles yield significantly better results when compared with the state of the art. separating surface order and syntactic relations in a dependency grammar. this paper proposes decoupling the dependency tree from word order, such that surface ordering is not determined by traversing the dependency tree. we develop the notion of a word order domain structure, which is linked to, but structurally dissimilar from, the syntactic dependency tree. the proposal results in a lexicalized, declarative, and formally precise description of word order; features which previous proposals for dependency grammars lack. contrary to other lexicalized approaches to word order, our proposal does not require lexical ambiguities for ordering alternatives. aligning sentences in parallel corpora. in this paper we describe a statistical technique for aligning sentences with their translations in two parallel corpora. in addition to certain anchor points that are available in our data, the only information about the sentences that we use for calculating alignments is the number of tokens that they contain. because we make no use of the lexical details of the sentences, the alignment computation is fast and therefore practical for application to very large collections of text. we have used this technique to align several million sentences in the english-french hansard corpora and have achieved an accuracy in excess of 99% in a randomly selected set of 1000 sentence pairs that we checked by hand. we show that even without the benefit of anchor points the correlation between the lengths of aligned sentences is strong enough that we should expect to achieve an accuracy of between 96% and 97%. thus, the technique may be applicable to a wider variety of texts than we have yet tried. word-sense disambiguation using statistical methods. we describe a statistical technique for assigning senses to words. an instance of a word is assigned a sense by asking a question about the context in which the word appears. the question is constructed to have high mutual information with the translation of that instance in another language. when we incorporated this method of assigning senses into our statistical machine translation system, the error rate of the system decreased by thirteen percent. word-sense disambiguation using decomposable models.
most probabilistic classifiers used for word-sense disambiguation have either been based on only one contextual feature or have used a model that is simply assumed to characterize the interdependencies among multiple contextual features. in this paper, a different approach to formulating a probabilistic model is presented along with a case study of the performance of models produced in this manner for the disambiguation of the noun "interest". we describe a method for formulating probabilistic models that use multiple contextual features for word-sense disambiguation, without requiring untested assumptions regarding the form of the model. using this approach, the joint distribution of all variables is described by only the most systematic variable interactions, thereby limiting the number of parameters to be estimated, supporting computational efficiency, and providing an understanding of the data. the interpretation of relational nouns. this paper describes a computational treatment of the semantics of relational nouns. it covers relational nouns such as "sister" and "commander", and focuses especially on a particular subcategory of them, called function nouns ("speed", "distance", "rating"). relational nouns are usually viewed as either requiring non-compositional semantic interpretation, or causing an undesirable proliferation of syntactic rules. in contrast to this, we present a treatment which is both syntactically uniform and semantically compositional. the core ideas of this treatment are: (1) the recognition of different levels of semantic analysis; in particular, the distinction between an english-oriented and a domain-oriented level of meaning representation. (2) the analysis of relational nouns as denoting relation-extensions. the paper shows how this approach handles a variety of linguistic constructions involving relational nouns. the treatment presented here has been implemented in bbn's spoken language system, an experimental spoken language interface to a database/graphics system. terminology finite-state preprocessing for computational lfg. this paper presents a technique to deal with multiword nominal terminology in a computational lexical functional grammar. this method treats multiword terms as single tokens by modifying the preprocessing stage of the grammar (tokenization and morphological analysis), which consists of a cascade of two-level finite-state automata (transducers). we present here how we build the transducers to take terminology into account. we tested the method by parsing a small corpus with and without this treatment of multiword terms. the number of parses and parsing time decrease without affecting the relevance of the results. moreover, the method improves the perspicuity of the analyses. collective information extraction with relational markov networks. most information extraction (ie) systems treat separate potential extractions as independent. however, in many cases, considering influences between different potential extractions could improve overall accuracy. statistical methods based on undirected graphical models, such as conditional random fields (crfs), have been shown to be an effective approach to learning accurate ie systems. we present a new ie method that employs relational markov networks (a generalization of crfs), which can represent arbitrary dependencies between extractions. this allows for "collective information extraction" that exploits the mutual influence between possible extractions.
experiments on learning to extract protein names from biomedical text demonstrate the advantages of this approach. named entity scoring for speech input. this paper describes a new scoring algorithm that supports comparison of linguistically annotated data from noisy sources. the new algorithm generalizes the message understanding conference (muc) named entity scoring algorithm, using a comparison based on explicit alignment of the underlying texts, followed by a scoring phase. the scoring procedure maps corresponding tagged regions and compares these according to tag type and tag extent, allowing us to reproduce the muc named entity scoring for identical underlying texts. in addition, the new algorithm scores for content (transcription correctness) of the tagged region, a useful distinction when dealing with noisy data that may differ from a reference transcription (e.g., speech recognizer output). to illustrate the algorithm, we have prepared a small test data set consisting of a careful transcription of speech data and manual insertion of sgml named entity annotation. we report results for this small test corpus on a variety of experiments involving automatic speech recognition and named entity tagging. automated scoring using a hybrid feature identification technique. this study exploits statistical redundancy inherent in natural language to automatically predict scores for essays. we use a hybrid feature identification method, including syntactic structure analysis, rhetorical structure analysis, and topical analysis, to score essay responses from test-takers of the graduate management admissions test (gmat) and the test of written english (twe). for each essay question, a stepwise linear regression analysis is run on a training set (sample of human scored essay responses) to extract a weighted set of predictive features for each test question. score prediction for cross-validation sets is calculated from the set of predictive features. exact or adjacent agreement between the electronic essay rater (e-rater) score predictions and human rater scores ranged from 87% to 94% across the 15 test questions. towards automatic classification of discourse elements in essays. educators are interested in essay evaluation systems that include feedback about writing features that can facilitate the essay revision process. for instance, if the thesis statement of a student's essay could be automatically identified, the student could then use this information to reflect on the thesis statement with regard to its quality, and its relationship to other discourse elements in the essay. using a relatively small corpus of manually annotated data, we use bayesian classification to identify thesis statements. this method yields results that are much closer to human performance than the results produced by two baseline systems. using an on-line dictionary to find rhyming words and pronunciations for unknown words. humans know a great deal about relationships among words. this paper discusses relationships among word pronunciations. we describe a computer system which models human judgement of rhyme by assigning specific roles to the location of primary stress, the similarity of phonetic segments, and other factors. by using the model as an experimental tool, we expect to improve our understanding of rhyme. a related computer model will attempt to generate pronunciations for unknown words by analogy with those for known words. 
the analogical processes involve techniques for segmenting and matching word spellings, and for mapping spelling to sound in known words. as in the case of rhyme, the computer model will be an important tool for improving our understanding of these processes. both models serve as the basis for functions in the wordsmith automated dictionary system. adapting an english morphological analyzer for french. a word-based morphological analyzer and a dictionary for recognizing inflected forms of french words have been built by adapting the udict system. we describe the adaptations, emphasizing mechanisms developed to handle french verbs. this work lays the groundwork for doing french derivational morphology and morphology for other languages. resolving pronominal reference to abstract entities. this paper describes phora, a technique for resolving pronominal reference to either individual or abstract entities. it defines processes for evoking abstract referents from discourse and for resolving both demonstrative and personal pronouns. it successfully interprets 72% of test pronouns, compared to 37% for a leading technique without these features. a preliminary model of centering in dialog. the centering framework explains local coherence by relating local focus and the form of referring expressions. it has proven useful in monolog, but its utility for multi-party discourse has not been shown, and a variety of issues must be tackled to adapt the model for dialog. this paper reports our application of three naive models of centering theory to dialog. these results will be used as baselines for evaluating future models. long-distance dependency resolution in automatically acquired wide-coverage pcfg-based lfg approximations. this paper shows how finite approximations of long distance dependency (ldd) resolution can be obtained automatically for wide-coverage, robust, probabilistic lexical-functional grammar (lfg) resources acquired from treebanks. we extract lfg subcategorisation frames and paths linking ldd reentrancies from f-structures generated automatically for the penn-ii treebank trees and use them in an ldd resolution algorithm to parse new text. unlike (collins, 1999; johnson, 2000), in our approach resolution of ldds is done at f-structure (attribute-value structure representations of basic predicate-argument or dependency structure) without empty productions, traces and coindexation in cfg parse trees. currently our best automatically induced grammars achieve 80.97% f-score for f-structures parsing section 23 of the wsj part of the penn-ii treebank and evaluating against the dcu 105, and 80.24% against the parc 700 dependency bank (king et al., 2003), performing at the same or a slightly better level than state-of-the-art hand-crafted grammars (kaplan et al., 2004). from rags to riches: exploiting the potential of a flexible generation architecture. the rags proposals for generic specification of nlg systems include a detailed account of data representation, but only an outline view of processing aspects. in this paper we introduce a modular processing architecture with a concrete implementation which aims to meet the rags goals of transparency and reusability. we illustrate the model with the riches system -- a generation system built from simple linguistically-motivated modules. robust pcfg-based generation using automatically acquired lfg approximations.
we present a novel pcfg-based architecture for robust probabilistic generation based on wide-coverage lfg approximations (cahill et al., 2004) automatically extracted from treebanks, maximising the probability of a tree given an f-structure. we evaluate our approach using string-based evaluation. we currently achieve coverage of 95.26%, a bleu score of 0.7227 and string accuracy of 0.7476 on the penn-ii wsj section 23 sentences of length ≤20. the effect of pitch accenting on pronoun referent resolution. by strictest interpretation, theories of both centering and intonational meaning fail to predict the existence of pitch accented pronominals. yet they occur felicitously in spoken discourse. to explain this, i emphasize the dual functions served by pitch accents, as markers of both propositional (semantic/pragmatic) and attentional salience. this distinction underlies my proposals about the attentional consequences of pitch accents when applied to pronominals, in particular, that while most pitch accents may weaken or reinforce a cospecifier's status as the center of attention, a contrastively stressed pronominal may force a shift, even when contraindicated by textual features. integrating discourse markers into a pipelined natural language generation architecture. pipelined natural language generation (nlg) systems have grown increasingly complex as architectural modules were added to support language functionalities such as referring expressions, lexical choice, and revision. this has given rise to discussions about the relative placement of these new modules in the overall architecture. recent work on another aspect of multi-paragraph text, discourse markers, indicates it is time to consider where a discourse marker insertion algorithm fits in. we present examples which suggest that in a pipelined nlg architecture, the best approach is to strongly tie it to a revision component. finally, we evaluate the approach in a working multi-page system. pronominalization in generated discourse and dialogue. previous approaches to pronominalization have largely been theoretical rather than applied in nature. frequently, such methods are based on centering theory, which deals with the resolution of anaphoric pronouns. but it is not clear that complex theoretical mechanisms, while having satisfying explanatory power, are necessary for the actual generation of pronouns. we first illustrate examples of pronouns from various domains, describe a simple method for generating pronouns in an implemented multi-page generation system, and present an evaluation of its performance. scaling phrase-based statistical machine translation to larger corpora and longer phrases. in this paper we describe a novel data structure for phrase-based statistical machine translation which allows for the retrieval of arbitrarily long phrases while simultaneously using less memory than is required by current decoder implementations. we detail the computational complexity and average retrieval times for looking up phrase translations in our suffix array-based data structure. we show how sampling can be used to reduce the retrieval time by orders of magnitude with no loss in translation quality. statistical machine translation with word- and sentence-aligned parallel corpora. the parameters of statistical translation models are typically estimated from sentence-aligned parallel corpora. 
we show that significant improvements in the alignment and translation quality of such models can be achieved by additionally including word-aligned data during training. incorporating word-level alignments into the parameter estimation of the ibm models reduces alignment error rate and increases the bleu score when compared to training the same models only on sentence-aligned data. on the verbmobil data set, we attain a 38% reduction in the alignment error rate and a higher bleu score with half as many training examples. we discuss how varying the ratio of word-aligned to sentence-aligned data affects the expected performance gain. using linguistic principles to recover empty categories. this paper describes an algorithm for detecting empty nodes in the penn treebank (marcus et al., 1993), finding their antecedents, and assigning them function tags, without access to lexical information such as valency. unlike previous approaches to this task, the current method is not corpus-based, but rather makes use of the principles of early government-binding theory (chomsky, 1981), the syntactic theory that underlies the annotation. using the evaluation metric proposed by johnson (2002), this approach outperforms previously published approaches on both detection of empty categories and antecedent identification, given either annotated input stripped of empty categories or the output of a parser. some problems with this evaluation metric are noted and an alternative is proposed along with the results. the paper considers the reasons a principle-based approach to this problem should outperform corpus-based approaches, and speculates on the possibility of a hybrid approach. generating an ltag out of a principle-based hierarchical representation. lexicalized tree adjoining grammars have proved useful for nlp. however, numerous redundancy problems face ltag developers, as highlighted by vijay-shanker and schabes (92). we present a tool that automatically generates the tree families of an ltag. it starts from a compact hierarchical organization of syntactic descriptions that is linguistically motivated and carries out all the relevant combinations of linguistic phenomena. building parallel ltag for french and italian. in this paper we view lexicalized tree adjoining grammars as the compilation of a more abstract and modular layer of linguistic description: the metagrammar (mg). mg provides a hierarchical representation of lexico-syntactic descriptions and principles that capture the well-formedness of lexicalized structures, expressed using syntactic functions. this makes it possible for a tool to compile an instance of mg into an ltag, automatically performing the relevant combinations of linguistic phenomena. we then describe the instantiation of an mg for italian and french. the work for french was performed starting with an existing ltag, which has been augmented as a result. the work for italian was performed by systematic contrast with the french mg. the automatic compilation gives two parallel ltags, compatible for multilingual nlp applications. uncertainty reduction in collaborative bootstrapping: measure and algorithm. this paper proposes the use of uncertainty reduction in machine learning methods such as co-training and bilingual bootstrapping, which are referred to collectively as 'collaborative bootstrapping'. the paper indicates that uncertainty reduction is an important factor for enhancing the performance of collaborative bootstrapping.
it proposes a new measure for representing the degree of uncertainty correlation of the two classifiers in collaborative bootstrapping and uses the measure in an analysis of collaborative bootstrapping. furthermore, it proposes a new algorithm of collaborative bootstrapping on the basis of uncertainty reduction. experimental results have verified the correctness of the analysis and have demonstrated the significance of the new algorithm. automatic construction of a hypernym-labeled noun hierarchy from text. previous work has shown that automatic methods can be used in building semantic lexicons. this work goes a step further by automatically creating not just clusters of related words, but a hierarchy of nouns and their hypernyms, akin to the hand-built hierarchy in wordnet. a pragmatics-based approach to understanding intersentential ellipsis. intersentential elliptical utterances occur frequently in information-seeking dialogues. this paper presents a pragmatics-based framework for interpreting such utterances, including identification of the speaker's discourse goal in employing the fragment. we claim that the advantage of this approach is its reliance upon pragmatic information, including discourse content and conversational goals, rather than upon precise representations of the preceding utterance alone. metaphor - a key to extensible semantic analysis. interpreting metaphors is an integral and inescapable process in human understanding of natural language. this paper discusses a method of analyzing metaphors based on the existence of a small number of generalized metaphor mappings. each generalized metaphor contains a recognition network, a basic mapping, additional transfer mappings, and an implicit intention component. it is argued that the method reduces metaphor interpretation from a reconstruction to a recognition task. implications for automating certain aspects of language learning are also discussed. corpus-based acquisition of relative pronoun disambiguation heuristics. this paper presents a corpus-based approach for deriving heuristics to locate the antecedents of relative pronouns. the technique duplicates the performance of hand-coded rules and requires human intervention only during the training phase. because the training instances are built on parser output rather than word cooccurrences, the technique requires a small number of training examples and can be used on small to medium-sized corpora. our initial results suggest that the approach may provide a general method for the automated acquisition of a variety of disambiguation heuristics for natural language systems, especially for problems that require the assimilation of syntactic and semantic knowledge. error-driven pruning of treebank grammars for base noun phrase identification. finding simple, non-recursive, base noun phrases is an important subtask for many natural language processing applications. while previous empirical methods for base np identification have been rather complex, this paper instead proposes a very simple algorithm that is tailored to the relative simplicity of the task. in particular, we present a corpus-based approach for finding base nps by matching part-of-speech tag sequences. the training phase of the algorithm is based on two successful techniques: first the base np grammar is read from a "treebank" corpus; then the grammar is improved by selecting rules with high "benefit" scores.
using this simple algorithm with a naive heuristic for matching rules, we achieve surprising accuracy in an evaluation on the penn treebank wall street journal. an empirical study of the influence of argument conciseness on argument effectiveness. we have developed a system that generates evaluative arguments that are tailored to the user, properly arranged and concise. we have also developed an evaluation framework in which the effectiveness of evaluative arguments can be measured with real users. this paper presents the results of a formal experiment we have performed in our framework to verify the influence of argument conciseness on argument effectiveness. paralanguage in computer mediated communication. this paper reports on some of the components of person to person communication mediated by computer conferencing systems. transcripts from two systems were analysed: the electronic information and exchange system (eies), based at the new jersey institute of technology; and planet, based at infomedia inc. in palo alto, california. the research focused upon the ways in which expressive communication is encoded by users of the medium. inclusion, disjointness and choice: the logic of linguistic classification. we investigate the logical structure of concepts generated by conjunction and disjunction over a monotonic multiple inheritance network where concept nodes represent linguistic categories and links indicate basic inclusion (isa) and disjointness (isnota) relations. we model the distinction between primitive and defined concepts as well as between closed- and open-world reasoning. we apply our logical analysis to the sort inheritance and unification system of hpsg and also to classification in systemic choice systems. word sense disambiguation vs. statistical machine translation. we directly investigate a subject of much recent debate: do word sense disambiguation models help statistical machine translation quality? we present empirical results casting doubt on this common, but unproved, assumption. using a state-of-the-art chinese word sense disambiguation model to choose translation candidates for a typical ibm statistical mt system, we find that word sense disambiguation does not yield significantly better translation quality than the statistical machine translation system alone. error analysis suggests several key factors behind this surprising finding, including inherent limitations of current statistical mt architectures. relating complexity to practical performance in parsing with wide-coverage unification grammars. the paper demonstrates that exponential complexities with respect to grammar size and input length have little impact on the performance of three unification-based parsing algorithms, using a wide-coverage grammar. the results imply that the study and optimisation of unification-based parsing must rely on empirical data until complexity theory can more accurately predict the practical behaviour of such parsers. lattice-based word identification in clare. i argue that because of spelling and typing errors and other properties of typed text, the identification of words and word boundaries in general requires syntactic and semantic knowledge. a lattice representation is therefore appropriate for lexical analysis. i show how the use of such a representation in the clare system allows different kinds of hypothesis about word identity to be integrated in a uniform framework.
i then describe a quantitative evaluation of clare's performance on a set of sentences into which typographic errors have been introduced. the results show that syntax and semantics can be applied as powerful sources of constraint on the possible corrections for misspelled words. n semantic classes are harder than two. we show that we can automatically classify semantically related phrases into 10 classes. classification robustness is improved by training with multiple sources of evidence, including within-document cooccurrence, html markup, syntactic relationships in sentences, substitutability in query logs, and string similarity. our work provides a benchmark for automatic n-way classification into wordnet's semantic classes, both on a trec news corpus and on a corpus of substitutable search query phrases. non-verbal cues for discourse structure. this paper addresses the issue of designing embodied conversational agents that exhibit appropriate posture shifts during dialogues with human users. previous research has noted the importance of hand gestures, eye gaze and head nods in conversations between embodied agents and humans. we present an analysis of human monologues and dialogues that suggests that postural shifts can be predicted as a function of discourse state in monologues, and discourse and conversation state in dialogues. on the basis of these findings, we have implemented an embodied conversational agent that uses collagen in such a way as to generate postural shifts. computational lexical semantics, incrementality, and the so-called punctuality of events. the distinction between achievements and accomplishments is known to be an empirically important but subtle one. it is argued here to depend on the atomicity (rather than punctuality) of events, and to be strongly related to incrementality (i.e., to event-object mapping functions). a computational treatment of incrementality and atomicity is discussed in the paper, and a number of related empirical problems are considered, notably lexical polysemy in verb-argument relationships. optimization in multimodal interpretation. in a multimodal conversation, the way users communicate with a system depends on the available interaction channels and the situated context (e.g., conversation focus, visual feedback). these dependencies form a rich set of constraints from various perspectives such as temporal alignments between different modalities, coherence of conversation, and the domain semantics. there is strong evidence that competition and ranking of these constraints are important to achieve an optimal interpretation. thus, we have developed an optimization approach for multimodal interpretation, particularly for interpreting multimodal references. a preliminary evaluation indicates the effectiveness of this approach, especially for complex user inputs that involve multiple referring expressions in a speech utterance and multiple gestures. towards conversational qa: automatic identification of problematic situations and user intent. to enable conversational qa, it is important to examine key issues addressed in conversational systems in the context of question answering. in conversational systems, understanding user intent is critical to the success of interaction. recent studies have also shown that the capability to automatically identify problematic situations during interaction can significantly improve the system performance.
therefore, this paper investigates the new implications of user intent and problematic situations in the context of question answering. our studies indicate that, in basic interactive qa, there are different types of user intent that are tied to different kinds of system performance (e.g., problematic/error free situations). once users are motivated to find specific information related to their information goals, the interaction context can provide useful cues for the system to automatically identify problematic situations and user intent. sense disambiguation using semantic relations and adjacency information. this paper describes a heuristic-based approach to word-sense disambiguation. the heuristics that are applied to disambiguate a word depend on its part of speech, and on its relationship to neighboring salient words in the text. parts of speech are found through a tagger, and related neighboring words are identified by a phrase extractor operating on the tagged text. to suggest possible senses, each heuristic draws on semantic relations extracted from a webster's dictionary and the semantic thesaurus wordnet. for a given word, all applicable heuristics are tried, and those senses that are rejected by all heuristics are discarded. in all, the disambiguator uses 39 heuristics based on 12 relationships. estimating class priors in domain adaptation for word sense disambiguation. instances of a word drawn from different domains may have different sense priors (the proportions of the different senses of a word). this in turn affects the accuracy of word sense disambiguation (wsd) systems trained and applied on different domains. this paper presents a method to estimate the sense priors of words drawn from a new domain, and highlights the importance of using well calibrated probabilities when performing these estimations. by using well calibrated probabilities, we are able to estimate the sense priors effectively to achieve significant improvements in wsd accuracy. an alignment method for noisy parallel corpora based on image processing techniques. this paper presents a new approach to the bitext correspondence problem (bcp) of noisy bilingual corpora based on image processing (ip) techniques. by using one of several ways of estimating the lexical translation probability (ltp) between pairs of source and target words, we can turn a bitext into a discrete gray-level image. we contend that the bcp, when seen in this light, bears a striking resemblance to the line detection problem in ip. therefore, bcps, including sentence and word alignment, can benefit from a wealth of effective, well established ip techniques, including convolution-based filters, texture analysis and hough transform. this paper describes a new program, plotalign, that produces a word-level bitext map for noisy or non-literal bitext, based on these techniques. a pipeline framework for dependency parsing. pipeline computation, in which a task is decomposed into several stages that are solved sequentially, is a common computational strategy in natural language processing. the key problem of this model is that it results in error accumulation and suffers from its inability to correct mistakes in previous stages. we develop a framework for decisions made in pipeline models which addresses these difficulties, and present and evaluate it in the context of bottom-up dependency parsing for english. we show improvements in the accuracy of the inferred trees relative to existing models.
interestingly, the proposed algorithm shines especially when evaluated globally, at a sentence level, where our results are significantly better than those of existing approaches. gpsm: a generalized probabilistic semantic model for ambiguity resolution. in natural language processing, ambiguity resolution is a central issue, and can be regarded as a preference assignment problem. in this paper, a generalized probabilistic semantic model (gpsm) is proposed for preference computation. an effective semantic tagging procedure is proposed for tagging semantic features. a semantic score function is derived which integrates lexical, syntactic and semantic preferences under a uniform formulation. the semantic score measure shows substantial improvement in structural disambiguation over a syntax-based approach. immediate-head parsing for language models. we present two language models based upon an "immediate-head" parser --- our name for a parser that conditions all events below a constituent c upon the head of c. while all of the most accurate statistical parsers are of the immediate-head variety, no previous grammatical language model uses this technology. the perplexity for both of these models significantly improves upon the trigram model baseline as well as the best previous grammar-based language model. for the better of our two models these improvements are 24% and 14% respectively. we also suggest that improvement of the underlying parser should significantly improve the model's perplexity and that even in the near term there is a lot of potential for improvement in immediate-head language models. a logic for semantic interpretation. we propose that logic (enhanced to encode probability information) is a good way of characterizing semantic interpretation. in support of this we give a fragment of an axiomatization for word-sense disambiguation, noun-phrase (and verb) reference, and case disambiguation. we describe an inference engine (frail3) which actually takes this axiomatization and uses it to drive the semantic interpretation process. we claim three benefits from this scheme. first, the interface between semantic interpretation and pragmatics has always been problematic, since all of the above tasks in general require pragmatic inference. now the interface is trivial, since both semantic interpretation and pragmatics use the same vocabulary and inference engine. the second benefit, related to the first, is that semantic guidance of syntax is a side effect of the interpretation. the third benefit is the elegance of the semantic interpretation theory. a few simple rules capture a remarkable diversity of semantic phenomena. coarse-to-fine n-best parsing and maxent discriminative reranking. discriminative reranking is one method for constructing high-performance statistical parsers (collins, 2000). a discriminative reranker requires a source of candidate parses for each sentence. this paper describes a simple yet novel method for constructing sets of 50-best parses based on a coarse-to-fine generative parser (charniak, 2000). this method generates 50-best lists that are of substantially higher quality than previously obtainable. we used these parses as the input to a maxent reranker (johnson et al., 1999; riezler et al., 2002) that selects the best parse from the set of parses for each sentence, obtaining an f-score of 91.0% on sentences of length 100 or less. word alignment in english-hindi parallel corpus using recency-vector approach: some studies.
word alignment using the recency-vector based approach has recently become popular. one major advantage of these techniques is that unlike other approaches they perform well even if the size of the parallel corpora is small. this makes these algorithms worth studying for languages where resources are scarce. in this work we studied the performance of two very popular recency-vector based approaches, proposed in (fung and mckeown, 1994) and (somers, 1998), respectively, for word alignment in an english-hindi parallel corpus. but the performance of the above algorithms was not found to be satisfactory. however, subsequent addition of some new constraints improved the performance of the recency-vector based alignment technique significantly for the said corpus. the present paper discusses the new version of the algorithm and its performance in detail. a hybrid convolution tree kernel for semantic role labeling. a hybrid convolution tree kernel is proposed in this paper to effectively model syntactic structures for semantic role labeling (srl). the hybrid kernel consists of two individual convolution kernels: a path kernel, which captures predicate-argument link features, and a constituent structure kernel, which captures the syntactic structure features of arguments. evaluation on the datasets of the conll-2005 srl shared task shows that the novel hybrid convolution tree kernel outperforms the previous tree kernels. we also combine our new hybrid tree kernel based method with the standard rich flat feature based method. the experimental results show that the combined method achieves better performance than either of them individually. a structured language model. the paper presents a language model that develops syntactic structure and uses it to extract meaningful information from the word history, thus enabling the use of long distance dependencies. the model assigns probability to every joint sequence of words-binary-parse-structure with headword annotation. the model, its probabilistic parametrization, and a set of experiments meant to evaluate its predictive power are presented. position specific posterior lattices for indexing speech. the paper presents the position specific posterior lattice, a novel representation of automatic speech recognition lattices that naturally lends itself to efficient indexing of position information and subsequent relevance ranking of spoken documents using proximity. in experiments performed on a collection of lecture recordings --- mit icampus data --- the spoken document ranking accuracy was improved by 20% relative over the commonly used baseline of indexing the 1-best output from an automatic speech recognizer. the mean average precision (map) increased from 0.53 when using 1-best output to 0.62 when using the new lattice representation. the reference used for evaluation is the output of a standard retrieval engine working on the manual transcription of the speech collection. albeit lossy, the pspl lattice is also much more compact than the asr 3-gram lattice from which it is computed --- which translates into a reduced inverted index size as well --- at virtually no degradation in word-error-rate performance. since new paths are introduced in the lattice, the oracle accuracy increases over the original asr lattice. speech ogle: indexing uncertainty for spoken document search.
the paper presents the position specific posterior lattice (pspl), a novel lossy representation of automatic speech recognition lattices that naturally lends itself to efficient indexing and subsequent relevance ranking of spoken documents. in experiments performed on a collection of lecture recordings --- mit icampus data --- the spoken document ranking accuracy was improved by 20% relative over the commonly used baseline of indexing the 1-best output from an automatic speech recognizer. the inverted index built from pspl lattices is compact --- about 20% of the size of 3-gram asr lattices and 3% of the size of the uncompressed speech --- and it allows for extremely fast retrieval. furthermore, little degradation in performance is observed when pruning pspl lattices, resulting in even smaller indexes --- 5% of the size of 3-gram asr lattices. exploiting syntactic structure for language modeling. the paper presents a language model that develops syntactic structure and uses it to extract meaningful information from the word history, thus enabling the use of long distance dependencies. the model assigns probability to every joint sequence of words-binary-parse-structure with headword annotation and operates in a left-to-right manner --- therefore usable for automatic speech recognition. the model, its probabilistic parameterization, and a set of experiments meant to evaluate its predictive power are presented; an improvement over standard trigram modeling is achieved. aligning sentences in bilingual corpora using lexical information. in this paper, we describe a fast algorithm for aligning sentences with their translations in a bilingual corpus. existing efficient algorithms ignore word identities and only consider sentence length (brown et al., 1991b; gale and church, 1991). our algorithm constructs a simple statistical word-to-word translation model on the fly during alignment. we find the alignment that maximizes the probability of generating the corpus with this translation model. we have achieved an error rate of approximately 0.4% on canadian hansard data, which is a significant improvement over previous results. the algorithm is language independent. bayesian grammar induction for language modeling. we describe a corpus-based induction algorithm for probabilistic context-free grammars. the algorithm employs a greedy heuristic search within a bayesian framework, and a post-pass using the inside-outside algorithm. we compare the performance of our algorithm to n-gram models and the inside-outside algorithm in three language modeling tasks. in two of the tasks, the training data is generated by a probabilistic context-free grammar and in both tasks our algorithm outperforms the other techniques. the third task involves naturally-occurring data, and in this task our algorithm does not perform as well as n-gram models but vastly outperforms the inside-outside algorithm. resolving translation ambiguity and target polysemy in cross-language information retrieval. this paper deals with translation ambiguity and target polysemy problems together. two monolingual balanced corpora are employed to learn word co-occurrence for translation ambiguity resolution, and augmented translation restrictions for target polysemy resolution. experiments show that the model achieves 62.92% of monolingual information retrieval performance, a 40.80% improvement over the select-all model.
combining target polysemy resolution, the retrieval performance increases by about 10.11% over the model resolving translation ambiguity only. a high-accurate chinese-english ne backward translation system combining both lexical information and web statistics. named entity translation is indispensable in cross-language information retrieval nowadays. we propose an approach that combines lexical information, web statistics, and inverse search based on google to backward translate a chinese named entity (ne) into english. our system achieves a high top-1 accuracy of 87.6%, which is a relatively good performance among those reported in this area to date. extracting noun phrases from large-scale texts: a hybrid approach and its automatic evaluation. acquiring noun phrases from running text is useful for many applications, such as word grouping, terminology indexing, etc. the reported literature adopts either a pure probabilistic approach or a purely rule-based noun phrase grammar to tackle this problem. in this paper, we apply a probabilistic chunker to deciding the implicit boundaries of constituents and utilize the linguistic knowledge to extract the noun phrases by a finite state mechanism. the test texts are taken from the susanne corpus and the results are evaluated automatically by comparison with the parse field of the susanne corpus. the results of this preliminary experiment are encouraging. a concept-based adaptive approach to word sense disambiguation. word sense disambiguation for unrestricted text is one of the most difficult tasks in the field of computational linguistics. the crux of the problem is to discover a model that relates the intended sense of a word with its context. this paper describes a general framework for adaptive conceptual word sense disambiguation. central to this wsd framework is the sense division and semantic relations based on topical analysis of dictionary sense definitions. the process begins with an initial disambiguation step using an mrd-derived knowledge base. an adaptation step follows to combine the initial knowledge base with knowledge gleaned from the partially disambiguated text. once the knowledge base is adjusted to suit the text at hand, it is then applied to the text again to finalize the disambiguation result. definitions and example sentences from ldoce are employed as training materials for wsd, while passages from the brown corpus and wall street journal are used for testing. we report on several experiments illustrating the effectiveness of the adaptive approach. an empirical study of smoothing techniques for language modeling. we present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by jelinek and mercer (1980), katz (1987), and church and gale (1991). we investigate for the first time how factors such as training data size, corpus (e.g., brown versus wall street journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. in addition, we introduce two novel smoothing techniques, one a variation of jelinek-mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods. proper name translation in cross-language information retrieval. recently, the language barrier has become a major problem for people searching, retrieving, and understanding www documents in different languages.
this paper deals with the query translation issue in cross-language information retrieval, proper names in particular. models for name identification, name translation and name searching are presented. the recall rates and the precision rates for the identification of chinese organization names, person names and location names under met data are (76.67%, 79.33%), (87.33%, 82.33%) and (77.00%, 82.00%), respectively. in name translation, only 0.79% and 1.11% of candidates for english person names and location names, respectively, have to be proposed. the name searching facility is implemented on an mt server for information retrieval on the www. under this system, users can issue queries and read documents in their familiar language. relation extraction using label propagation based semi-supervised learning. a shortage of manually labeled data is an obstacle to supervised relation extraction methods. in this paper we investigate a graph based semi-supervised learning algorithm, a label propagation (lp) algorithm, for relation extraction. it represents labeled and unlabeled examples and their distances as the nodes and the weights of edges of a graph, and tries to obtain a labeling function to satisfy two constraints: 1) it should be fixed on the labeled nodes, and 2) it should be smooth on the whole graph. experimental results on the ace corpus showed that this lp algorithm achieves better performance than svm when only very few labeled examples are available, and it also performs better than bootstrapping for the relation extraction task. unsupervised relation disambiguation using spectral clustering. this paper presents an unsupervised learning approach to disambiguate various relations between named entities by use of various lexical and syntactic features from the contexts. it works by calculating eigenvectors of an adjacency graph's laplacian to recover a submanifold of data from a high-dimensional space and then performing cluster number estimation on the eigenvectors. experimental results on ace corpora show that this spectral clustering based approach outperforms the other clustering methods. a new statistical approach to chinese pinyin input. chinese input is one of the key challenges for chinese pc users. this paper proposes a statistical approach to pinyin-based chinese input. this approach uses a trigram-based language model and a statistically based segmentation. to deal with real input, it also includes a typing model which enables spelling correction in sentence-based pinyin input, and a spelling model for english which enables modeless pinyin input. fast - an automatic generation system for grammar tests. this paper introduces a method for the semi-automatic generation of grammar test items by applying natural language processing (nlp) techniques. based on manually-designed patterns, sentences gathered from the web are transformed into tests on grammaticality. the method involves representing test writing knowledge as test patterns, acquiring authentic sentences on the web, and applying generation strategies to transform sentences into items. at runtime, sentences are converted into two types of toefl-style question: multiple-choice and error detection. we also describe a prototype system fast (free assessment of structural tests). evaluation on a set of generated questions indicates that the proposed method produces items of satisfactory quality. our methodology provides a promising approach and offers significant potential for computer assisted language learning and assessment.
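to make the label propagation step described in the relation-extraction abstract above concrete, the following is a minimal sketch in python/numpy under stated assumptions: examples are already encoded as feature vectors, graph weights come from a gaussian kernel over euclidean distances, and labels are spread by iterating a row-normalised transition matrix while clamping the labeled nodes. the feature encoding, kernel, and stopping threshold are illustrative choices, not the exact formulation used in that work.

import numpy as np

def label_propagation(X, y, n_labeled, sigma=1.0, tol=1e-6, max_iter=1000):
    """minimal label propagation sketch: the first n_labeled rows of X are labeled."""
    y = np.asarray(y)
    n = X.shape[0]
    n_classes = int(y.max()) + 1

    # graph weights: gaussian kernel over pairwise euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    T = W / W.sum(axis=1, keepdims=True)    # row-normalised transition matrix

    # label distributions: one-hot on labeled nodes, uniform on the rest
    Y = np.full((n, n_classes), 1.0 / n_classes)
    Y[:n_labeled] = 0.0
    Y[np.arange(n_labeled), y] = 1.0
    clamp = Y[:n_labeled].copy()

    for _ in range(max_iter):
        Y_next = T @ Y                      # propagate labels along graph edges
        Y_next[:n_labeled] = clamp          # constraint 1: fixed on the labeled nodes
        if np.abs(Y_next - Y).max() < tol:  # constraint 2 (smoothness) enforced via convergence
            Y = Y_next
            break
        Y = Y_next
    return Y[n_labeled:].argmax(axis=1)     # predicted labels for the unlabeled examples

in the setting of the abstract, the rows of x would be feature representations of entity-mention pairs and their contexts, with the distance function chosen to match those features.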
novel association measures using web search with double checking. a web search with double checking model is proposed to explore the web as a live corpus. five association measures including variants of dice, overlap ratio, jaccard, and cosine, as well as co-occurrence double check (codc), are presented. in the experiments on rubenstein-goodenough's benchmark data set, the codc measure achieves a correlation coefficient of 0.8492, which is competitive with the performance (0.8914) of the model using wordnet. the experiments on link detection of named entities using the strategies of direct association, association matrix and scalar association matrix verify that the double-check frequencies are reliable. further study on named entity clustering shows that the five measures are quite useful. in particular, the codc measure is very stable on word-word and name-name experiments. the application of the codc measure to expanding community chains for personal name disambiguation achieves 9.65% and 14.22% increases compared to the system without community expansion. all the experiments illustrate that the novel model of web search with double checking is feasible for mining associations from the web. chinese verb sense discrimination using an em clustering model with rich linguistic features. this paper discusses the application of the expectation-maximization (em) clustering algorithm to the task of chinese verb sense discrimination. the model utilized rich linguistic features that capture predicate-argument structure information of the target verbs. a semantic taxonomy for chinese nouns, which was built semi-automatically based on two electronic chinese semantic dictionaries, was used to provide semantic features for the model. purity and normalized mutual information were used to evaluate the clustering performance on 12 chinese verbs. the experimental results show that the em clustering model can learn sense or sense group distinctions for most of the verbs successfully. we further enhanced the model with certain fine-grained semantic categories called lexical sets. our results indicate that these lexical sets improve the model's performance for the three most challenging verbs chosen from the first set of experiments. pat-trees with the deletion function as the learning device for linguistic patterns. in this study, a learning device based on the pat-tree data structure was developed. the original pat-trees were enhanced with a deletion function to emulate human learning competence. the learning process works as follows. the linguistic patterns from the text corpus are inserted into the pat-tree one by one. since memory is limited, the intent is that important and new patterns are retained in the pat-tree while old and unimportant patterns are released from the tree automatically. the proposed pat-trees with the deletion function have the following advantages. 1) they are easy to construct and maintain. 2) any prefix substring and its frequency count can be searched very quickly through the pat-tree. 3) the space requirement for a pat-tree is linear with respect to the size of the input text. 4) the insertion of a new element can be carried out at any time without being blocked by the memory constraints because the free space is released through the deletion of unimportant elements. experiments on learning high frequency bigrams were carried out under different memory size constraints. high recall rates were achieved. the results show that the proposed pat-trees can be used as on-line learning devices.
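the learning-device behaviour in the pat-tree abstract above (insert patterns one at a time, answer prefix-frequency queries, and release old, unimportant patterns when memory runs out) can be illustrated with a much simpler structure than a real pat (patricia) tree. the sketch below is a toy stand-in, assuming a plain counted prefix trie with a node cap and an eviction rule that drops the least frequent, oldest leaf; the capacity, the eviction rule, and the node layout are illustrative assumptions rather than the data structure used in that work.

from dataclasses import dataclass, field

@dataclass
class Node:
    count: int = 0                  # number of inserted patterns passing through this node
    last_tick: int = 0              # insertion "age": when this node was last touched
    children: dict = field(default_factory=dict)

class CountedTrie:
    """toy stand-in for a pat-tree learning device: counted prefix trie with eviction."""

    def __init__(self, max_nodes=10000):
        self.root = Node()
        self.max_nodes = max_nodes
        self.n_nodes = 1
        self.tick = 0

    def insert(self, pattern):
        # make room up front so eviction never disturbs the path we are about to build
        while self.n_nodes + len(pattern) > self.max_nodes and self.n_nodes > 1:
            self._evict()
        self.tick += 1
        node = self.root
        for symbol in pattern:
            if symbol not in node.children:
                node.children[symbol] = Node()
                self.n_nodes += 1
            node = node.children[symbol]
            node.count += 1
            node.last_tick = self.tick

    def prefix_count(self, prefix):
        """frequency of any prefix, found by a single root-to-node walk."""
        node = self.root
        for symbol in prefix:
            if symbol not in node.children:
                return 0
            node = node.children[symbol]
        return node.count

    def _evict(self):
        """remove the leaf with the smallest (count, last_tick): infrequent and old first."""
        best_parent, best_sym, best_key = None, None, (float("inf"), float("inf"))

        def scan(node):
            nonlocal best_parent, best_sym, best_key
            for sym, child in node.children.items():
                if child.children:
                    scan(child)
                elif (child.count, child.last_tick) < best_key:
                    best_parent, best_sym, best_key = node, sym, (child.count, child.last_tick)

        scan(self.root)
        if best_parent is not None:
            del best_parent.children[best_sym]
            self.n_nodes -= 1

inserting high-frequency bigrams as tuples of tokens and querying prefix_count then mimics, in a very rough way, the bigram-learning experiment described in the abstract; a real pat-tree would instead store patterns along compressed binary paths and keep the space linear in the input size.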
an empirical study of chinese chunking. in this paper, we describe an empirical study of chinese chunking on a corpus extracted from the upenn chinese treebank-4 (ctb4). first, we compare the performance of the state-of-the-art machine learning models. then we propose two approaches in order to improve the performance of chinese chunking. 1) we propose an approach to resolve the special problems of chinese chunking. this approach extends the chunk tags for every problem by a tag-extension function. 2) we propose two novel voting methods based on the characteristics of the chunking task. compared with traditional voting methods, the proposed voting methods consider long-distance information. the experimental results show that the svms model outperforms the other models and that our proposed approaches can improve performance significantly. reranking answers for definitional qa using language modeling. statistical ranking methods based on a centroid vector (profile) extracted from external knowledge have become widely adopted in the top definitional qa systems in trec 2003 and 2004. in these approaches, terms in the centroid vector are treated as a bag of words based on the independence assumption. to relax this assumption, this paper proposes a novel language model-based answer reranking method to improve the existing bag-of-words model approach by considering the dependence of the words in the centroid vector. experiments have been conducted to evaluate the different dependence models. the results on the trec 2003 test set show that the reranking approach with the biterm language model significantly outperforms the one with the bag-of-words model and the unigram language model by 14.9% and 12.5%, respectively, in f-measure(5). embedding new information into referring expressions. this paper focuses on generating referring expressions capable of serving multiple communicative goals. the components of a referring expression are divided into a referring part and a non-referring part. two rules for the content determination and construction of the non-referring part are given, which are realised in an embedding algorithm. the significant aspect of our approach is that it intends to generate the non-referring part given the restrictions imposed by the referring part, whose realisation is, on the other hand, affected by the non-referring part. a probability model to improve word alignment. word alignment plays a crucial role in statistical machine translation. word-aligned corpora have been found to be an excellent source of translation-related knowledge. we present a statistical model for computing the probability of an alignment given a sentence pair. this model allows easy integration of context-specific features. our experiments show that this model can be an effective tool for improving an existing word alignment. soft syntactic constraints for word alignment through discriminative training. word alignment methods can gain valuable guidance by ensuring that their alignments maintain cohesion with respect to the phrases specified by a monolingual dependency tree. however, this hard constraint can also rule out correct alignments, and its utility decreases as alignment models become more complex. we use a publicly available structured output svm to create a max-margin syntactic aligner with a soft cohesion constraint. the resulting aligner is the first, to our knowledge, to use a discriminative learning method to train an itg bitext parser.
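the cohesion constraint mentioned in the word-alignment abstract directly above can be made concrete with a small check: project every subtree of the source dependency tree onto the target side through the alignment, and flag the alignment whenever the projected spans of two disjoint subtrees overlap. the sketch below is a minimal, illustrative version of that phrasal-cohesion idea, not the soft-constraint structured-svm training procedure of the paper; the parent-array tree encoding and the span-overlap test are assumptions made for the example.

from collections import defaultdict

def subtree_members(heads):
    """heads[i] is the index of token i's head, or -1 for the root.
    returns, for every token, the set of tokens in its subtree (itself included)."""
    children = defaultdict(list)
    for tok, head in enumerate(heads):
        if head >= 0:
            children[head].append(tok)

    members = {}
    def collect(tok):
        out = {tok}
        for child in children[tok]:
            out |= collect(child)
        members[tok] = out
        return out
    for root in (t for t, h in enumerate(heads) if h < 0):
        collect(root)
    return members

def projected_span(tokens, alignment):
    """min/max target positions linked to any source token in `tokens`, or None."""
    tgt = [j for (i, j) in alignment if i in tokens]
    return (min(tgt), max(tgt)) if tgt else None

def is_cohesive(heads, alignment):
    """true iff no two disjoint source subtrees project to overlapping target spans."""
    members = subtree_members(heads)
    spans = {t: projected_span(m, alignment) for t, m in members.items()}
    toks = list(members)
    for a in toks:
        for b in toks:
            if a < b and members[a].isdisjoint(members[b]):
                sa, sb = spans[a], spans[b]
                if sa and sb and not (sa[1] < sb[0] or sb[1] < sa[0]):
                    return False
    return True

# tiny example: tokens 0 and 1 both depend on token 2
heads = [2, 2, -1]
print(is_cohesive(heads, [(0, 0), (1, 1), (2, 2)]))          # True: monotone, cohesive
print(is_cohesive(heads, [(0, 0), (0, 2), (1, 1), (2, 1)]))  # False: token 0's span straddles token 1's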
statistical parsing with an automatically-extracted tree adjoining grammar. we discuss the advantages of lexicalized tree-adjoining grammar as an alternative to lexicalized pcfg for statistical parsing, describing the induction of a probabilistic ltag model from the penn treebank and evaluating its parsing performance. we find that this induction method is an improvement over the em-based method of (hwa, 1998), and that the induced model yields results comparable to lexicalized pcfg. constraints on strong generative power. we consider the question "how much strong generative power can be squeezed out of a formal system without increasing its weak generative power?" and propose some theoretical and practical constraints on this problem. we then introduce a formalism which, under these constraints, maximally squeezes strong generative power out of context-free grammar. finally, we generalize this result to formalisms beyond cfg. a hierarchical phrase-based model for statistical machine translation. we present a statistical phrase-based translation model that uses hierarchical phrases---phrases that contain subphrases. the model is formally a synchronous context-free grammar but is learned from a bitext without any syntactic information. thus it can be seen as a shift to the formal machinery of syntax-based translation systems without any linguistic commitment. in our experiments using bleu as a metric, the hierarchical phrase-based model achieves a relative improvement of 7.5% over pharaoh, a state-of-the-art phrase-based system. a preference-first language processor integrating the unification grammar and markov language model for speech recognition applications. the task of a language processor is to find the most promising sentence hypothesis for a given word lattice obtained from acoustic signal recognition. in this paper a new language processor is proposed, in which a unification grammar and a markov language model are integrated in a word lattice parsing algorithm based on an augmented chart, and the island-driven parsing concept is combined with various preference-first parsing strategies defined by different construction principles and decision rules. test results show that significant improvements in both recognition accuracy and computation speed can be achieved. teaching a weaker classifier: named entity recognition on upper case text. this paper describes how a machine-learning named entity recognizer (ner) on upper case text can be improved by using a mixed case ner and some unlabeled text. the mixed case ner can be used to tag some unlabeled mixed case text, which is then used as additional training material for the upper case ner. we show that this approach reduces the performance gap between the mixed case ner and the upper case ner substantially, by 39% for muc-6 and 22% for muc-7 named entity test data. our method is thus useful in improving the accuracy of ners on upper case text, such as transcribed text from automatic speech recognizers where case information is missing. closing the gap: learning-based information extraction rivaling knowledge-engineering methods. in this paper, we present a learning approach to the scenario template task of information extraction, where information filling one template could come from multiple sentences. when tested on the muc-4 task, our learning approach achieves accuracy competitive to the best of the muc-4 systems, which were all built with manually engineered rules.
our analysis reveals that our use of full parsing and state-of-the-art learning algorithms has contributed to the good performance. to our knowledge, this is the first research to have demonstrated that a learning approach to the full-scale information extraction task could achieve performance rivaling that of the knowledge engineering approach. an account for compound prepositions in farsi. there are certain 'preposition + noun' combinations in farsi in which what is apparently a prepositional phrase behaves almost like a compound preposition. since these combinations do not behave entirely like compounds, it is doubtful that the word formation process involved is a morphological one. the analysis put forward by this paper proposes "incorporation", by which an n° is incorporated into a p°, constructing a compound preposition. in this way, tagging prepositions and parsing texts in natural language processing can be defined in a proper manner. extracting semantic hierarchies from a large on-line dictionary. dictionaries are rich sources of detailed semantic information, but in order to use the information for natural language processing, it must be organized systematically. this paper describes automatic and semi-automatic procedures for extracting and organizing semantic feature information implicit in dictionary definitions. two head-finding heuristics are described for locating the genus terms in noun and verb definitions. the assumption is that the genus term represents inherent features of the word it defines. the two heuristics have been used to process definitions of 40,000 nouns and 8,000 verbs, producing indexes in which each genus term is associated with the words it defines. the sprout program interactively grows a taxonomic "tree" from any specified root feature by consulting the genus index. its output is a tree in which all of the nodes have the root feature for at least one of their senses. the filter program uses an inverted form of the genus index. filtering begins with an initial filter file consisting of words that have a given feature (e.g. [+human]) in all of their senses. the program then locates, in the index, words whose genus terms all appear in the filter file. the output is a list of new words that have the given feature in all of their senses. a flexible distributed architecture for nlp system development and use. we describe a distributed, modular architecture for platform independent natural language systems. it features automatic interface generation and self-organization. adaptive (and non-adaptive) voting mechanisms are used for integrating discrete modules. the architecture is suitable for rapid prototyping and product delivery. analysis system of speech acts and discourse structures using maximum entropy model. we propose a statistical dialogue analysis model to determine discourse structures as well as speech acts using a maximum entropy model. the model can automatically acquire probabilistic discourse knowledge from a discourse tagged corpus to resolve ambiguities. we propose the idea of tagging discourse segment boundaries to represent the structural information of discourse. using this representation we can effectively combine speech act analysis and discourse structure analysis in one framework. hybrid approaches to improvement of translation quality in web-based english-korean machine translation.
the previous english-korean mt system, which was a transfer-based mt system applied only to written text, enumerated the following brief list of problems that did not seem easy to solve in the near future: 1) processing of non-continuous idiomatic expressions 2) reduction of too many ambiguities in english syntactic analysis 3) robust processing for failed or ill-formed sentences 4) selecting correct word correspondence between several alternatives 5) generation of korean sentence style. these problems can be considered as factors that influence the translation quality of a machine translation system. this paper describes symbolic and statistical hybrid approaches to solving the problems of the previous english-to-korean machine translation system in terms of improving translation quality. the solutions are now successfully applied to the web-based english-korean machine translation system "fromto/ek", which has been under development since 1997. techniques to incorporate the benefits of a hierarchy in a modified hidden markov model. this paper explores techniques to take advantage of the fundamental difference in structure between hidden markov models (hmm) and hierarchical hidden markov models (hhmm). the hhmm structure allows repeated parts of the model to be merged together. a merged model takes advantage of the recurring patterns within the hierarchy, and the clusters that exist in some sequences of observations, in order to increase the extraction accuracy. this paper also presents a new technique for reconstructing grammar rules automatically. this work builds on the idea of combining a phrase extraction method with hhmm to expose patterns within english text. the reconstruction is then used to simplify the complex structure of an hhmm. the models discussed here are evaluated by applying them to natural language tasks based on conll-2004 and a sub-corpus of the lancaster treebank. analysis and synthesis of the distribution of consonants over languages: a complex network approach. cross-linguistic similarities are reflected by the speech sound systems of languages all over the world. in this work we try to model such similarities observed in the consonant inventories, through a complex bipartite network. we present a systematic study of some of the appealing features of these inventories with the help of the bipartite network. an important observation is that the occurrence of consonants follows a two-regime power law distribution. we find that the consonant inventory size distribution together with the principle of preferential attachment are the main reasons behind the emergence of such a two-regime behavior. in order to further support our explanation we present a synthesis model for this network based on the general theory of preferential attachment. using machine-learning to assign function labels to parser output for spanish. data-driven grammatical function tag assignment has been studied for english using the penn-ii treebank data. in this paper we address the question of whether such methods can be applied successfully to other languages and treebank resources. in addition to tag assignment accuracy and f-scores we also present results of a task-based evaluation. we use three machine-learning methods to assign cast3lb function tags to sentences parsed with bikel's parser trained on the cast3lb treebank.
the best performing method, svm, achieves an f-score of 86.87% on gold-standard trees and 66.67% on parser output - a statistically significant improvement of 6.74% over the baseline. in a task-based evaluation we generate lfg functional-structures from the function-tag-enriched trees. on this task we achieve an f-score of 75.67%, a statistically significant 3.4% improvement over the baseline. tracking initiative in collaborative dialogue interactions. in this paper, we argue for the need to distinguish between task and dialogue initiatives, and present a model for tracking shifts in both types of initiatives in dialogue interactions. our model predicts the initiative holders in the next dialogue turn based on the current initiative holders and the effect that observed cues have on changing them. our evaluation across various corpora shows that the use of cues consistently improves the accuracy in the system's prediction of task and dialogue initiative holders by 2-4 and 8-13 percentage points, respectively, thus illustrating the generality of our model. response generation in collaborative negotiation. in collaborative planning activities, since the agents are autonomous and heterogeneous, it is inevitable that conflicts arise in their beliefs during the planning process. in cases where such conflicts are relevant to the task at hand, the agents should engage in collaborative negotiation as an attempt to square away the discrepancies in their beliefs. this paper presents a computational strategy for detecting conflicts regarding proposed beliefs and for engaging in collaborative negotiation to resolve the conflicts that warrant resolution. our model is capable of selecting the most effective aspect to address in its pursuit of conflict resolution in cases where multiple conflicts arise, and of selecting appropriate evidence to justify the need for such modification. furthermore, by capturing the negotiation process in a recursive propose-evaluate-modify cycle of actions, our model can successfully handle embedded negotiation subdialogues. dialogue management in vector-based call routing. this paper describes a domain independent, automatically trained call router which directs customer calls based on their response to an open-ended "how may i direct your call?" query. routing behavior is trained from a corpus of transcribed and hand-routed calls and then carried out using vector-based information retrieval techniques. based on the statistical discriminating power of the n-gram terms extracted from the caller's request, the caller is 1) routed to the appropriate destination, 2) transferred to a human operator, or 3) asked a disambiguation question. in the last case, the system dynamically generates queries tailored to the caller's request and the destinations with which it is consistent. our approach is domain independent and the training process is fully automatic. evaluations over a financial services call center handling hundreds of activities with dozens of destinations demonstrate a substantial improvement over existing systems by correctly routing 93.8% of the calls after punting 10.2% of the calls to a human operator. constraint-based sentence compression: an integer programming approach. the ability to compress sentences while preserving their grammaticality and most of their meaning has recently received much attention. our work views sentence compression as an optimisation problem.
we develop an integer programming formulation and infer globally optimal compressions in the face of linguistically motivated constraints. we show that such a formulation allows for relatively simple and knowledge-lean compression models that do not require parallel corpora or large-scale resources. the proposed approach yields results comparable to, and in some cases superior to, the state of the art. developing a flexible spoken dialog system using simulation. in this paper, we describe a new methodology to develop mixed-initiative spoken dialog systems, which is based on the extensive use of simulations to accelerate the development process. with the help of simulations, a system providing information about a database of nearly 1000 restaurants in the boston area has been developed. the simulator can produce thousands of unique dialogs which not only benefit dialog development but also provide data to train the speech recognizer and understanding components, in preparation for real user interactions. also described is a strategy for creating cooperative responses to user queries, incorporating an intelligent language generation capability that produces content-dependent verbal descriptions of listed items. generating parallel multilingual lfg-tag grammars from a metagrammar. we introduce a metagrammar, which allows us to automatically generate, from a single and compact metagrammar hierarchy, parallel lexical functional grammars (lfg) and tree-adjoining grammars (tag) for french and for english: the grammar writer specifies in a compact manner syntactic properties that are potentially framework- and, to some extent, language-independent (such as subcategorization, valency alternations and realization of syntactic functions), from which grammars for several frameworks and languages are automatically generated offline. on parsing strategies and closure. this paper proposes a welcome hypothesis: a computationally simple device is sufficient for processing natural language. traditionally it has been argued that processing natural language syntax requires very powerful machinery. many engineers have come to this rather grim conclusion; almost all working parsers are actually turing machines (tm). for example, woods believed that a parser should have tm complexity and specifically designed his augmented transition networks (atns) to be turing equivalent. (1) "it is well known (cf. [chomsky64]) that the strict context-free grammar model is not an adequate mechanism for characterizing the subtleties of natural languages." [woods70] if the problem is really as hard as it appears, then the only solution is to grin and bear it. our own position is that parsing acceptable sentences is simpler because there are constraints on human performance that drastically reduce the computational complexity. although woods correctly observes that competence models are very complex, this observation may not apply directly to a performance problem such as parsing. the claim is that performance limitations actually reduce parsing complexity. this suggests two interesting questions: (a) how is the performance model constrained so as to reduce its complexity, and (b) how can the constrained performance model naturally approximate competence idealizations? stress assignment in letter to sound rules for speech synthesis. this paper will discuss how to determine word stress from spelling. stress assignment is a well-established weak point for many speech synthesizers because stress dependencies cannot be determined locally.
it is impossible to determine the stress of a word by looking through a five or six character window, as many speech synthesizers do. well-known examples such as degráde / dègradátion and télegraph / telégraphy demonstrate that stress dependencies can span over two and three syllables. this paper will present a principled framework for dealing with these long distance dependencies. stress assignment will be formulated in terms of waltz' style constraint propagation with four sources of constraints: (1) syllable weight, (2) part of speech, (3) morphology and (4) etymology. syllable weight is perhaps the most interesting, and will be the main focus of this paper. most of what follows has been implemented. char_align: a program for aligning parallel texts at the character level. there have been a number of recent papers on aligning parallel texts at the sentence level, e.g., brown et al (1991), gale and church (to appear), isabelle (1992), kay and rösenschein (to appear), simard et al (1992), warwick-armstrong and russell (1990). on clean inputs, such as the canadian hansards, these methods have been very successful (at least 96% correct by sentence). unfortunately, if the input is noisy (due to ocr and/or unknown markup conventions), then these methods tend to break down because the noise can make it difficult to find paragraph boundaries, let alone sentences. this paper describes a new program, char_align, that aligns texts at the character level rather than at the sentence/paragraph level, based on the cognate approach proposed by simard et al. word association norms, mutual information and lexicography. the term word association is used in a very particular sense in the psycholinguistic literature. (generally speaking, subjects respond quicker than normal to the word "nurse" if it follows a highly associated word such as "doctor.") we will extend the term to provide the basis for a statistical description of a variety of interesting linguistic phenomena, ranging from semantic relations of the doctor/nurse type (content word/content word) to lexico-syntactic co-occurrence constraints between verbs and prepositions (content word/function word). this paper will propose a new objective measure based on the information theoretic notion of mutual information, for estimating word association norms from computer readable corpora. (the standard method of obtaining word association norms, testing a few thousand subjects on a few hundred words, is both costly and unreliable.) the proposed measure, the association ratio, estimates word association norms directly from computer readable corpora, making it possible to estimate norms for tens of thousands of words. partially specified signatures: a vehicle for grammar modularity. this work provides the essential foundations for modular construction of (typed) unification grammars for natural languages. much of the information in such grammars is encoded in the signature, and hence the key is facilitating a modularized development of type signatures. we introduce a definition of signature modules and show how two modules combine. our definitions are motivated by the actual needs of grammar developers obtained through a careful examination of large scale grammars. we show that our definitions meet these needs by conforming to a detailed set of desiderata. the wild thing. suppose you are on a mobile device with no keyboard (e.g., a cell or pda). how can you enter text quickly? t9? graffiti? 
this demo will show how language modeling can be used to speed up data entry, both in the mobile context, as well as the desk-top. the wild thing encourages users to use wildcards (*). a language model finds the k-best expansions. users quickly figure out when they can get away with wildcards. general purpose trigram language models are effective for the general case (unrestricted text), but there are important special cases like searching over popular web queries, where more restricted language models are even more effective. natural language input to a computer-based glaucoma consultation system. a "front end" for a computer-based glaucoma consultation system is described. the system views a case as a description of a particular instance of a class of concepts called "structured objects" and builds up a representation of the instance from the sentences in the case. the information required by the consultation system is then extracted and passed on to the consultation system in the appropriately coded form. a core of syntactic, semantic and contextual rules which are applicable to all structured objects is being developed together with a representation of the structured object glaucoma-patient. there is also a facility for adding domain dependent syntax, abbreviations and defaults. speech acts and rationality. this paper derives the basis of a theory of communication from a formal theory of rational interaction. the major result is a demonstration that illocutionary acts need not be primitive, and need not be recognized. as a test case. we derive searle's conditions on requesting from principles of rationality coupled with a gricean theory of imperatives. the theory is shown to distinguish insincere or nonserious imperatives from true requests. extensions to indirect speech acts, and ramifications for natural language systems are also briefly discussed. memory-based learning of morphology with stochastic transducers. this paper discusses the supervised learning of morphology using stochastic transducers, trained using the expectation-maximization (em) algorithm. two approaches are presented: first, using the transducers directly to model the process, and secondly using them to define a similarity measure, related to the fisher kernel method (jaakkola and haussler, 1998), and then using a memory-based learning (mbl) technique. these are evaluated and compared on data sets from english, german, slovene and arabic. performatives in a rationally based speech act theory. a crucially important adequacy test of any theory of speech acts is its ability to handle performatives. this paper provides a theory of performatives as a test case for our rationally based theory of illocutionary acts. we show why "i request you..." is a request, and "i lie to you that p" is self-defeating. the analysis supports and extends earlier work of theorists such as bach and harnish [1] and takes issue with recent claims by searle [10] that such performative-as-declarative analyses are doomed to failure. parsing the wsj using ccg and log-linear models. this paper describes and evaluates log-linear parsing models for combinatory categorial grammar (ccg). a parallel implementation of the l-bfgs optimisation algorithm is described, which runs on a beowulf cluster allowing the complete penn treebank to be used for estimation. we also develop a new efficient parsing algorithm for ccg which maximises expected recall of dependencies. 
we compare models which use all ccg derivations, including non-standard derivations, with normal-form models. the performances of the two models are comparable and the results are competitive with existing wide-coverage ccg parsers. scaling conditional random fields using error-correcting codes. conditional random fields (crfs) have been applied with considerable success to a number of natural language processing tasks. however, these tasks have mostly involved very small label sets. when deployed on tasks with larger label sets, the requirements for computational resources mean that training becomes intractable. this paper describes a method for training crfs on such tasks, using error correcting output codes (ecoc). a number of crfs are independently trained on the separate binary labelling tasks of distinguishing between a subset of the labels and its complement. during decoding, these models are combined to produce a predicted label sequence which is resilient to errors by individual models. error-correcting crf training is much less resource intensive and has a much faster training time than a standardly formulated crf, while decoding performance remains quite comparable. this allows us to scale crfs to previously impossible tasks, as demonstrated by our experiments with large label sets. machine translation versus dictionary term translation - a comparison for english-japanese news article alignment. bilingual news article alignment methods based on multi-lingual information retrieval have been shown to be successful for the automatic production of so-called noisy-parallel corpora. in this paper we compare the use of machine translation (mt) to the commonly used dictionary term lookup (dtl) method for reuter news article alignment in english and japanese. the results show the trade-off between improved lexical disambiguation provided by machine translation and extended synonym choice provided by dictionary term lookup and indicate that mt is superior to dtl only at medium and low recall levels. at high recall levels dtl has superior precision. models for sentence compression: a comparison across domains, training requirements and evaluation measures. sentence compression is the task of producing a summary at the sentence level. this paper focuses on three aspects of this task which have not received detailed treatment in the literature: training requirements, scalability, and automatic evaluation. we provide a novel comparison between a supervised constituent-based and a weakly supervised word-based compression algorithm and examine how these models port to different domains (written vs. spoken text). to achieve this, a human-authored compression corpus has been created and our study highlights potential problems with the automatically gathered compression corpora currently used. finally, we assess whether automatic evaluation measures can be used to determine compression quality. the distributional inclusion hypotheses and lexical entailment. this paper suggests refinements for the distributional similarity hypothesis. our proposed hypotheses relate the distributional behavior of pairs of words to lexical entailment -- a tighter notion of semantic similarity that is required by many nlp applications. to automatically explore the validity of the defined hypotheses we developed an inclusion testing algorithm for characteristic features of two words, which incorporates corpus and web-based feature sampling to overcome data sparseness.
the degree of hypotheses validity was then empirically tested and manually analyzed with respect to the word sense level. in addition, the above testing algorithm was exploited to improve lexical entailment acquisition. an experiment in hybrid dictionary and statistical sentence alignment. the task of aligning sentences in parallel corpora of two languages has been well studied using pure statistical or linguistic models. we developed a linguistic method based on lexical matching with a bilingual dictionary and two statistical methods based on sentence length ratios and sentence offset probabilities. this paper seeks to further our knowledge of the alignment task by comparing the performance of the alignment models when used separately and together, i.e. as a hybrid system. our results show that for our english-japanese corpus of newspaper articles, the hybrid system using lexical matching and sentence length ratios outperforms the pure methods. ranking algorithms for named entity extraction: boosting and the voted perceptron. this paper describes algorithms which rerank the top n hypotheses from a maximum-entropy tagger, the application being the recovery of named-entity boundaries in a corpus of web data. the first approach uses a boosting algorithm for ranking problems. the second approach uses the voted perceptron algorithm. both algorithms give comparable, significant improvements over the maximum-entropy baseline. the voted perceptron algorithm can be considerably more efficient to train, at some cost in computation on test examples. a new statistical parser based on bigram lexical dependencies. this paper describes a new statistical parser which is based on probabilities of dependencies between head-words in the parse tree. standard bigram probability estimation techniques are extended to calculate probabilities of dependencies between pairs of words. tests using wall street journal data show that the method performs at least as well as spatter (magerman 95; jelinek et al. 94), which has the best published results for a statistical parser on this task. the simplicity of the approach means the model trains on 40,000 sentences in under 15 minutes. with a beam search strategy parsing speed can be improved to over 200 sentences a minute with negligible loss in accuracy. three generative, lexicalised models for statistical parsing. in this paper we first propose a new statistical parsing model, which is a generative model of lexicalised context-free grammar. we then extend the model to include a probabilistic treatment of both subcategorisation and wh-movement. results on wall street journal text show that the parser performs at 88.1/87.5% constituent precision/recall, an average improvement of 2.3% over (collins 96). head-driven parsing for word lattices. we present the first application of the head-driven statistical parsing model of collins (1999) as a simultaneous language model and parser for large-vocabulary speech recognition. the model is adapted to an online left to right chart-parser for word lattices, integrating acoustic, n-gram, and parser probabilities. the parser uses structural and lexical dependencies not considered by n-gram models, conditioning recognition on more linguistically-grounded relationships. experiments on the wall street journal treebank and lattice corpora show word error rates competitive with the standard n-gram language model while extracting additional structural information useful for speech understanding. 
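the last abstract above integrates acoustic, n-gram, and parser probabilities while parsing a word lattice. as a rough, assumption-laden illustration of how such scores can be combined, the python sketch below interpolates log-probabilities over complete candidate word sequences with made-up weights; the paper itself folds the parser into an online left-to-right chart parser over the lattice rather than rescoring finished hypotheses.

def combined_log_score(log_p_acoustic, log_p_ngram, log_p_parser,
                       w_acoustic=1.0, w_ngram=0.7, w_parser=0.3):
    # weighted log-linear interpolation of the three knowledge sources;
    # the weights here are illustrative assumptions, not tuned values
    return (w_acoustic * log_p_acoustic
            + w_ngram * log_p_ngram
            + w_parser * log_p_parser)

def best_hypothesis(hypotheses):
    """hypotheses: iterable of (words, log_ac, log_lm, log_parse) tuples."""
    return max(hypotheses,
               key=lambda h: combined_log_score(h[1], h[2], h[3]))

# example with made-up scores for two competing hypotheses
candidates = [
    (["recognize", "speech"], -120.0, -9.2, -15.1),
    (["wreck", "a", "nice", "beach"], -118.5, -13.7, -21.4),
]
print(best_hypothesis(candidates)[0])   # -> ['recognize', 'speech']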
new ranking algorithms for parsing and tagging: kernels over discrete structures, and the voted perceptron. this paper introduces new learning algorithms for natural language processing based on the perceptron algorithm. we show how the algorithms can be efficiently applied to exponential sized representations of parse trees, such as the "all subtrees" (dop) representation described by (bod 1998), or a representation tracking all sub-fragments of a tagged sentence. we give experimental results showing significant improvements on two tasks: parsing wall street journal text, and named-entity extraction from web data. clause restructuring for statistical machine translation. we describe a method for incorporating syntactic information in statistical machine translation systems. the first step of the method is to parse the source language string that is being translated. the second step is to apply a series of transformations to the parse tree, effectively reordering the surface string on the source language side of the translation system. the goal of this step is to recover an underlying word order that is closer to the target language word order than the original string. the reordering approach is applied as a pre-processing step in both the training and decoding phases of a phrase-based statistical mt system. we describe experiments on translation from german to english, showing an improvement from 25.2% bleu score for a baseline system to 26.8% bleu score for the system with reordering, a statistically significant improvement. incremental parsing with the perceptron algorithm. this paper describes an incremental parsing approach where parameters are estimated using a variant of the perceptron algorithm. a beam-search algorithm is used during both training and decoding phases of the method. the perceptron approach was implemented with the same feature set as that of an existing generative model (roark, 2001a), and experimental results show that it gives competitive performance to the generative model on parsing the penn treebank. we demonstrate that training a perceptron model to combine with the generative model during search provides a 2.1 percent f-measure improvement over the generative model alone, to 88.8 percent. discriminative syntactic language modeling for speech recognition. we describe a method for discriminative training of a language model that makes use of syntactic features. we follow a reranking approach, where a baseline recogniser is used to produce 1000-best output for each acoustic input, and a second "reranking" model is then used to choose an utterance from these 1000-best lists. the reranking model makes use of syntactic features together with a parameter estimation method that is based on the perceptron algorithm. we describe experiments on the switchboard speech recognition task. the syntactic features provide an additional 0.3% reduction in test-set error rate beyond the model of (roark et al., 2004a; roark et al., 2004b) (significant at p < 0.001), which makes use of a discriminatively trained n-gram model, giving a total reduction of 1.2% over the baseline switchboard system. measuring conformity to discourse routines in decision-making interactions. in an effort to develop measures of discourse level management strategies, this study examines a measure of the degree to which decision-making interactions consist of sequences of utterance functions that are linked in a decision-making routine.
the measure is applied to 100 dyadic interactions elicited in both face-to-face and computer-mediated environments with systematic variation of task complexity and message-window size. every utterance in the interactions is coded according to a system that identifies decision-making functions and other routine functions of utterances. markov analyses of the coded utterances make it possible to measure the relative frequencies with which sequences of 2 and 3 utterances trace a path in a markov model of the decision routine. these proportions suggest that interactions in all conditions adhere to the model, although we find greater conformity in the computer-mediated environments, which is probably due to increased processing and attentional demands for greater efficiency. the results suggest that measures based on markov analyses of coded interactions can provide useful tools for comparing discourse level properties, for correlating discourse features with other textual features, and for analyses of discourse management strategies. topic-focused multi-document summarization using an approximate oracle score. we consider the problem of producing a multi-document summary given a collection of documents. since most successful methods of multi-document summarization are still largely extractive, in this paper, we explore just how well an extractive method can perform. we introduce an "oracle" score, based on the probability distribution of unigrams in human summaries. we then demonstrate that with the oracle score, we can generate extracts which score, on average, better than the human summaries, when evaluated with rouge. in addition, we introduce an approximation to the oracle score which produces a system with the best known performance for the 2005 document understanding conference (duc) evaluation. integrating symbolic and statistical representations: the lexicon pragmatics interface. we describe a formal framework for interpretation of words and compounds in a discourse context which integrates a symbolic lexicon/grammar, word-sense probabilities, and a pragmatic component. the approach is motivated by the need to handle productive word use. in this paper, we concentrate on compound nominals. we discuss the inadequacies of approaches which consider compound interpretation as either wholly lexico-grammatical or wholly pragmatic, and provide an alternative integrated account. an algebra for semantic construction in constraint-based grammars. we develop a framework for formalizing semantic construction within grammars expressed in typed feature structure logics, including hpsg. the approach provides an alternative to the lambda calculus; it maintains much of the desirable flexibility of unification-based approaches to composition, while constraining the allowable operations in order to capture basic generalizations and improve maintainability. a pylonic decision-tree language model with optimal question selection. this paper discusses a decision-tree approach to the problem of assigning probabilities to words following a given text. in contrast with previous decision-tree language model attempts, an algorithm for selecting nearly optimal questions is considered. the model is to be tested on a standard task, the wall street journal, allowing a fair comparison with the well-known trigram model. using parsed corpora for structural disambiguation in the trains domain. this paper describes a prototype disambiguation module, kankei, which was tested on two corpora of the trains project.
in ambiguous verb phrases of form v ... np pp or v ... np adverb(s), the two corpora have very different pp and adverb attachment patterns; in the first, the correct attachment is to the vp 88.7% of the time, while in the second, the correct attachment is to the np 73.5% of the time. kankei uses various n-gram patterns of the phrase heads around these ambiguities, and assigns parse trees (with these ambiguities) a score based on a linear combination of the frequencies with which these patterns appear with np and vp attachments in the trains corpora. unlike previous statistical disambiguation systems, this technique thus combines evidence from bigrams, trigrams, and the 4-gram around an ambiguous attachment. in the current experiments, equal weights are used for simplicity but results are still good on the trains corpora (92.2% and 92.4% accuracy). despite the large statistical differences in attachment preferences in the two corpora, training on the first corpus and testing on the second gives an accuracy of 90.9%. a syntactic framework for speech repairs and other disruptions. this paper presents a grammatical and processing framework for handling the repairs, hesitations, and other interruptions in natural human dialog. the proposed framework has proved adequate for a collection of human-human task-oriented dialogs, both in a full manual examination of the corpus, and in tests with a parser capable of parsing some of that corpus. this parser can also correct a pre-parser speech repair identifier resulting in a 4.8% increase in recall. on determining the consistency of partial descriptions of trees. we examine the consistency problem for descriptions of trees based on remote dominance, and present a consistency-checking algorithm which is polynomial in the number of nodes in the description, despite disjunctions inherent in the theory of trees. the resulting algorithm allows for descriptions which go beyond sets of atomic formulas to allow certain types of disjunction and negation. less is more: eliminating index terms from subordinate clauses. we perform a linguistic analysis of documents during indexing for information retrieval. by eliminating index terms that occur only in subordinate clauses, index size is reduced by approximately 30% without adversely affecting precision or recall. these results hold for two corpora: a sample of the world wide web and an electronic encyclopedia. a machine learning approach to the automatic evaluation of machine translation. we present a machine learning approach to evaluating the well-formedness of output of a machine translation system, using classifiers that learn to distinguish human reference translations from machine translations. this approach can be used to evaluate an mt system, tracking improvements over time; to aid in the kind of failure analysis that can help guide system development; and to select among alternative output strings. the method presented is fully automated and independent of source language, target language and domain. using wordnet to automatically deduce relations between words in noun-noun compounds. we present an algorithm for automatically disambiguating noun-noun compounds by deducing the correct semantic relation between their constituent words. this algorithm uses a corpus of 2,500 compounds annotated with wordnet senses and covering 139 different semantic relations (we make this corpus available online for researchers interested in the semantics of noun-noun compounds). 
the algorithm takes as input the wordnet senses for the nouns in a compound, finds all parent senses (hypernyms) of those senses, and searches the corpus for other compounds containing any pair of those senses. the relation with the highest proportional co-occurrence with any sense pair is returned as the correct relation for the compound. this algorithm was tested using a 'leave-one-out' procedure on the corpus of compounds. the algorithm identified the correct relations for compounds with high precision: in 92% of cases where a relation was found with a proportional co-occurrence of 1.0, it was the correct relation for the compound being disambiguated. alignment of multiple languages for historical comparison. an essential step in comparative reconstruction is to align corresponding phonological segments in the words being compared. to do this, one must search among huge numbers of potential alignments to find those that give a good phonetic fit. this is a hard computational problem, and it becomes exponentially more difficult when more than two strings are being aligned. in this paper i extend the guided-search alignment algorithm of covington (computational linguistics, 1996) to handle more than two strings. the resulting algorithm has been implemented in prolog and gives reasonable results when tested on data from several languages. reference to locations. we propose a semantics for locative expressions such as near jones or west of denver, an important subsystem for nlp applications. locative expressions denote regions of space, and serve as arguments to predicates, locating objects and events spatially. since simple locatives occupy argument positions, they do not participate in scope ambiguities---pace one common view, which sees locatives as logical operators. our proposal justifies common representational practice in computational linguistics, accounting for how locative expressions function anaphorically, and explaining a wide range of inference involving locatives. we further demonstrate how the argument analysis may accommodate multiple locative arguments in a single predicate. the analysis is implemented for use in a database query application. a computational semantics for natural language. in the new head-driven phrase structure grammar (hpsg) language processing system that is currently under development at hewlett-packard laboratories, the montagovian semantics of the earlier gpsg system (see [gawron et al. 1982]) is replaced by a radically different approach with a number of distinct advantages. in place of the lambda calculus and standard first-order logic, our medium of conceptual representation is a new logical formalism called nflt (neo-fregean language of thought); compositional semantics is effected, not by schematic lambda expressions, but by lisp procedures that operate on nflt expressions to produce new expressions. nflt has a number of features that make it well-suited for natural language translations, including predicates of variable arity in which explicitly marked situational roles supercede order-coded argument positions, sortally restricted quantification, a compositional (but nonextensional) semantics that handles causal contexts, and a principled conceptual raising mechanism that we expect to lead to a computationally tractable account of propositional attitudes. the use of semantically compositional lisp procedures in place of lambda-schemas allows us to produce fully reduced translations on the fly, with no need for post-processing. 
this approach should simplify the task of using semantic information (such as sortal incompatibilities) to eliminate bad parse paths. automatically extracting nominal mentions of events with a bootstrapped probabilistic classifier. most approaches to event extraction focus on mentions anchored in verbs. however, many mentions of events surface as noun phrases. detecting them can increase the recall of event extraction and provide the foundation for detecting relations between events. this paper describes a weakly-supervised method for detecting nominal event mentions that combines techniques from word sense disambiguation (wsd) and lexical acquisition to create a classifier that labels noun phrases as denoting events or non-events. the classifier uses bootstrapped probabilistic generative models of the contexts of events and non-events. the contexts are the lexically-anchored semantic dependency relations that the nps appear in. our method dramatically improves with bootstrapping, and comfortably outperforms lexical lookup methods which are based on very much larger hand-crafted resources. unsupervised segmentation of words using prior distributions of morph length and frequency. we present a language-independent and unsupervised algorithm for the segmentation of words into morphs. the algorithm is based on a new generative probabilistic model, which makes use of relevant prior information on the length and frequency distributions of morphs in a language. our algorithm is shown to outperform two competing algorithms, when evaluated on data from a language with agglutinative morphology (finnish), and to perform well also on english data. veins theory: a model of global discourse cohesion and coherence. in this paper, we propose a generalization of centering theory (ct) (grosz, joshi, weinstein (1995)) called veins theory (vt), which extends the applicability of centering rules from local to global discourse. a key facet of the theory involves the identification of «veins» over discourse structure trees such as those defined in rst, which delimit domains of referential accessibility for each unit in a discourse. once identified, reference chains can be extended across segment boundaries, thus enabling the application of ct over the entire discourse. we describe the processes by which veins are defined over discourse structure trees and how ct can be applied to global discourse by using these chains. we also define a discourse «smoothness» index which can be used to compare different discourse structures and interpretations, and show how vt can be used to abstract a span of text in the context of the whole discourse. finally, we validate our theory by analyzing examples from corpora of english, french, and romanian. expectations in incremental discourse processing. the way in which discourse features express connections back to the previous discourse has been described in the literature in terms of adjoining at the right frontier of discourse structure. but this does not allow for discourse features that express expectations about what is to come in the subsequent discourse. after characterizing these expectations and their distribution in text, we show how an approach that makes use of substitution as well as adjoining on a suitably defined right frontier can be used to both process expectations and constrain discourse processing in general. sub-sentential alignment using substring co-occurrence counts.
in this paper, we will present an efficient method to compute the co-occurrence counts of any pair of substrings in a parallel corpus, and an algorithm that makes use of these counts to create sub-sentential alignments on such a corpus. this algorithm has the advantage of being as general as possible regarding the segmentation of text. constraint-based event recognition for information extraction. we present a program for segmenting texts according to the separate events they describe. a modular architecture is described that allows us to examine the contributions made by particular aspects of natural language to event structuring. this is applied in the context of terrorist news articles, and a technique is suggested for evaluating the resulting segmentations. we also examine the usefulness of various heuristics in forming these segmentations. an integrated architecture for shallow and deep processing. we present an architecture for the integration of shallow and deep nlp components which is aimed at flexible combination of different language technologies for a range of practical current and future applications. in particular, we describe the integration of a high-level hpsg parsing system with different high-performance shallow components, ranging from named entity recognition to chunk parsing and shallow clause recognition. the nlp components enrich a representation of natural language text with layers of new xml meta-information using a single shared data structure, called the text chart. we describe details of the integration methods, and show how information extraction and language checking applications for real-world german text benefit from a deep grammatical analysis. automatic semantic tagging of unknown proper names. implemented methods for proper name recognition rely on large gazetteers of common proper nouns and a set of heuristic rules (e.g. mr. as an indicator of a person entity type). though the performance of current pn recognizers is very high (over 90%), it is important to note that this problem is by no means a "solved problem". existing systems perform extremely well on newswire corpora by virtue of the availability of large gazetteers and rule bases designed for specific tasks (e.g. recognition of organization and person entity types as specified in recent message understanding conferences muc). however, large gazetteers are not available for most languages and applications other than newswire texts and, in any case, proper nouns are an open class. in this paper we describe a context-based method to assign an entity type to unknown proper names (pns). like many others, our system relies on a gazetteer and a set of context-dependent heuristics to classify proper nouns. however, due to the unavailability of large gazetteers in italian, over 20% of detected pns cannot be semantically tagged. the algorithm that we propose assigns an entity type to an unknown pn based on the analysis of syntactically and semantically similar contexts already seen in the application corpus. the performance of the algorithm is evaluated not only in terms of precision, following the tradition of muc conferences, but also in terms of information gain, an information theoretic measure that takes into account the complexity of the classification task. language independent, minimally supervised induction of lexical probabilities.
a central problem in part-of-speech tagging, especially for new languages for which limited annotated resources are available, is estimating the distribution of lexical probabilities for unknown words. this paper introduces a new paradigmatic similarity measure and presents a minimally supervised learning approach combining effective selection and weighting methods based on paradigmatic and contextual similarity measures populated from large quantities of inexpensive raw text data. this approach is highly language independent and requires no modification to the algorithm or implementation to shift between languages such as french and english. dependency tree kernels for relation extraction. we extend previous work on tree kernels to estimate the similarity between the dependency trees of sentences. using this kernel within a support vector machine, we detect and classify relations between entities in the automatic content extraction (ace) corpus of news articles. we examine the utility of different features such as wordnet hypernyms, parts of speech, and entity types, and find that the dependency tree kernel achieves a 20% f1 improvement over a "bag-of-words" kernel. supersense tagging of unknown nouns using semantic similarity. the limited coverage of lexical-semantic resources is a significant problem for nlp systems which can be alleviated by automatically classifying the unknown words. supersense tagging assigns unknown nouns one of 26 broad semantic categories used by lexicographers to organise their manual insertion into wordnet. ciaramita and johnson (2003) present a tagger which uses synonym set glosses as annotated training examples. we describe an unsupervised approach, based on vector-space similarity, which does not require annotated examples but significantly outperforms their tagger. we also demonstrate the use of an extremely large shallow-parsed corpus for calculating vector-space semantic similarity. multi-tagging for lexicalized-grammar parsing. with performance above 97% accuracy for newspaper text, part of speech (pos) tagging might be considered a solved problem. previous studies have shown that allowing the parser to resolve pos tag ambiguity does not improve performance. however, for grammar formalisms which use more fine-grained grammatical categories, for example tag and ccg, tagging accuracy is much lower. in fact, for these formalisms, premature ambiguity resolution makes parsing infeasible. we describe a multi-tagging approach which maintains a suitable level of lexical category ambiguity for accurate and efficient ccg parsing. we extend this multi-tagging approach to the pos level to overcome errors introduced by automatically assigned pos tags. although pos tagging accuracy seems high, maintaining some pos tag ambiguity in the language processing pipeline results in more accurate ccg supertagging. scaling context space. context is used in many nlp systems as an indicator of a term's syntactic and semantic function. the accuracy of the system is dependent on the quality and quantity of contextual information available to describe each term. however, the quantity variable is no longer fixed by limited corpus resources. given fixed training time and computational resources, it makes sense for systems to invest time in extracting high quality contextual information from a fixed corpus. however, with an effectively limitless quantity of text available, extraction rate and representation size need to be considered.
we use thesaurus extraction with a range of context extracting tools to demonstrate the interaction between context quantity, time and size on a corpus of 300 million words. lexical disambiguation: sources of information and their statistical realization. lexical disambiguation can be achieved using different sources of information. aiming at high performance in automatic disambiguation, it is important to know the relative importance and applicability of the various sources. in this paper we classify several sources of information and show how some of them can be realized using statistical data. first evaluations indicate the extreme importance of local information, which mainly represents lexical associations and selectional restrictions for syntactically related words. direct word sense matching for lexical substitution. this paper investigates conceptually and empirically the novel sense matching task, which requires recognizing whether the senses of two synonymous words match in context. we suggest direct approaches to the problem, which avoid the intermediate step of explicit word sense disambiguation, and demonstrate their appealing advantages and stimulating potential for future research. two languages are more informative than one. this paper presents a new approach for resolving lexical ambiguities in one language using statistical data on lexical relations in another language. this approach exploits the differences between mappings of words to senses in different languages. we concentrate on the problem of target word selection in machine translation, for which the approach is directly applicable, and employ a statistical model for the selection mechanism. the model was evaluated using two sets of hebrew and german examples and was found to be very useful for disambiguation. similarity-based methods for word sense disambiguation. we compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency. the similarity-based methods perform up to 40% better on this particular task. we also conclude that events that occur only once in the training set have a major impact on similarity-based estimates. contextual word similarity and estimation from sparse data. in recent years there has been much interest in word cooccurrence relations, such as n-grams, verb-object combinations, or cooccurrence within a limited context. this paper discusses how to estimate the probability of cooccurrences that do not occur in the training data. we present a method that makes local analogies between each specific unobserved cooccurrence and other cooccurrences that contain similar words, as determined by an appropriate word similarity metric. our evaluation suggests that this method performs better than existing smoothing methods, and may provide an alternative to class based models. similarity-based estimation of word cooccurrence probabilities. in many applications of natural language processing it is necessary to determine the likelihood of a given word combination. for example, a speech recognizer may need to determine which of the two word combinations "eat a peach" and "eat a beach" is more likely. statistical nlp methods determine the likelihood of a word combination according to its frequency in a training corpus. however, the nature of language is such that many word combinations are infrequent and do not occur in a given corpus.
in this work we propose a method for estimating the probability of such previously unseen word combinations using available information on "most similar" words. we describe a probabilistic word association model based on distributional word similarity, and apply it to improving probability estimates for unseen word bigrams in a variant of katz's back-off model. the similarity-based method yields a 20% perplexity improvement in the prediction of unseen bigrams and statistically significant reductions in speech-recognition error. cooking up referring expressions. this paper describes the referring expression generation mechanisms used in epicure, a computer program which produces natural language descriptions of cookery recipes. major features of the system include: an underlying ontology which permits the representation of non-singular entities; a notion of discriminatory power, to determine what properties should be used in a description; and a patr-like unification grammar to produce surface linguistic strings. the interpretation of tense and aspect in english. an analysis of english tense and aspect is presented that specifies temporal precedence relations within a sentence. the relevant reference points for interpretation are taken to be the initial and terminal points of events in the world, as well as two "hypothetical" times: the perfect time (when a sentence contains perfect aspect) and the progressive or during time. a method for providing temporal interpretation for nontensed elements in the sentence is also described. investigating regular sense extensions based on intersective levin classes. in this paper we specifically address questions of polysemy with respect to verbs, and how regular extensions of meaning can be achieved through the adjunction of particular syntactic phrases. we see verb classes as the key to making generalizations about regular extensions of meaning. current approaches to english classification, levin classes and wordnet, have limitations in their applicability that impede their utility as general classification schemes. we present a refinement of levin classes, intersective sets, which are a more fine-grained classification and have more coherent sets of syntactic frames and associated semantic components. we have preliminary indications that the membership of our intersective sets will be more compatible with wordnet than the original levin classes. we have also begun to examine related classes in portuguese, and find that these verbs demonstrate similarly coherent syntactic and semantic properties. the role of semantic roles in disambiguating verb senses. we describe an automatic word sense disambiguation (wsd) system that disambiguates verb senses using syntactic and semantic features that encode information about predicate arguments and semantic classes. our system performs at the best published accuracy on the english verbs of senseval-2. we also experiment with using the gold-standard predicate-argument labels from propbank for disambiguating fine-grained wordnet senses and coarse-grained propbank framesets, and show that disambiguation of verb senses can be further improved with better extraction of semantic roles. mapping wordnets using structural information. we present a robust approach for linking already existing lexical/semantic hierarchies. we used a constraint satisfaction algorithm (relaxation labeling) to select, among a set of candidates, the node in a target taxonomy that best matches each node in a source taxonomy.
in particular, we use it to map the nominal part of wordnet 1.5 onto wordnet 1.6, with a very high precision and a very low remaining ambiguity. a noisy-channel model for document compression. we present a document compression system that uses a hierarchical noisy-channel model of text production. our compression system first automatically derives the syntactic structure of each sentence and the overall discourse structure of the text given as input. the system then uses a statistical hierarchical model of text production in order to drop non-important syntactic and discourse constituents so as to generate coherent, grammatical document compressions of arbitrary length. the system outperforms both a baseline and a sentence-based compression system that operates by simplifying sequentially all sentences in a text. our results support the claim that discourse knowledge plays an important role in document summarization. bayesian query-focused summarization. we present bayesum (for "bayesian summarization"), a model for sentence extraction in query-focused summarization. bayesum leverages the common case in which multiple documents are relevant to a single query. using these documents as reinforcement for query terms, bayesum is not afflicted by the paucity of information in short queries. we show that approximate inference in bayesum is possible on large data sets and results in a state-of-the-art summarization system. furthermore, we show how bayesum can be understood as a justified query expansion technique in the language modeling for ir framework. efficient unsupervised discovery of word categories using symmetric patterns and high frequency words. we present a novel approach for discovering word categories, sets of words sharing a significant aspect of their meaning. we utilize meta-patterns of high-frequency words and content words in order to discover pattern candidates. symmetric patterns are then identified using graph-based measures, and word categories are created based on graph clique sets. our method is the first pattern-based method that requires no corpus annotation or manually provided seed patterns or words. we evaluate our algorithm on very large corpora in two languages, using both human judgments and wordnet-based evaluation. our fully unsupervised results are superior to previous work that used a pos tagged corpus, and computation time for huge corpora are orders of magnitude faster than previously reported. assigning intonational features in synthesized spoken directions. speakers convey much of the information hearers use to interpret discourse by varying prosodic features such as phrasing, pitch accent placement, tune, and pitch range. the ability to emulate such variation is crucial to effective (synthetic) speech generation. while text-to-speech synthesis must rely primarily upon structural information to determine appropriate intonational features, speech synthesized from an abstract representation of the message to be conveyed may employ much richer sources. the implementation of an intonation assignment component for direction assistance, a program which generates spoken directions, provides a first approximation of how recent models of discourse structure can be used to control intonational variation in ways that build upon recent research in intonational meaning. the implementation further suggests ways in which these discourse models might be augmented to permit the assignment of appropriate intonational features. 
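purely as a hedged illustration of discourse-driven intonation assignment (a toy rule, not the direction assistance implementation; the given/new bookkeeping and the accent label are assumptions of this sketch): accent discourse-new content words and deaccent given ones.

```python
def assign_accents(words, given=None):
    """toy intonation assignment: accent discourse-new content words,
    deaccent given ones. `given` is a set of lemmas already mentioned."""
    given = set() if given is None else set(given)
    function_words = {"the", "a", "an", "of", "to", "at", "on", "in",
                      "and", "is", "your"}
    marked = []
    for w in words:
        lw = w.lower()
        if lw in function_words or lw in given:
            marked.append((w, None))      # no pitch accent
        else:
            marked.append((w, "H*"))      # pitch accent (toy label)
            given.add(lw)
    return marked

print(assign_accents(["turn", "left", "at", "the", "bank"]))
print(assign_accents(["the", "bank", "is", "on", "your", "right"], given={"bank"}))
```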
a three-valued interpretation of negation in feature structure descriptions. feature structures are informational elements that have been used in several linguistic theories and in computational systems for natural-language processing. a logical calculus has been developed and used as a description language for feature structures. in the present work, a framework in three-valued logic is suggested for defining the semantics of a feature structure description language, allowing for a more complete set of logical operators. in particular, the semantics of the negation and implication operators are examined. various proposed interpretations of negation and implication are compared within the suggested framework. one particular interpretation of the description language with a negation operator is described and its computational aspects studied. interactively exploring a machine translation model. this paper describes a method of interactively visualizing and directing the process of translating a sentence. the method allows a user to explore a model of syntax-based statistical machine translation (mt), to understand the model's strengths and weaknesses, and to compare it to other mt systems. using this visualization method, we can find and address conceptual and practical problems in an mt system. in our demonstration at acl, new users of our tool will drive a syntax-based decoder for themselves. an information-state approach to collaborative reference. we describe a dialogue system that works with its interlocutor to identify objects. our contributions include a concise, modular architecture with reversible processes of understanding and generation, an information-state model of reference, and flexible links between semantics and collaborative problem solving. a nonparametric method for extraction of candidate phrasal terms. this paper introduces a new method for identifying candidate phrasal terms (also known as multiword units) which applies a nonparametric, rank-based heuristic measure. evaluation of this measure, the mutual rank ratio metric, shows that it produces better results than standard statistical measures when applied to this task. the use of syntactic clues in discourse processing. the desirability of a syntactic parsing component in natural language understanding systems has been the subject of debate for the past several years. this paper describes an approach to automatic text processing which is entirely based on syntactic form. a program is described which processes one genre of discourse, that of newspaper reports. the program creates summaries of reports by relying on an expanded concept of text grounding: certain syntactic structures and tense/aspect pairs indicate the most important events in a news story. supportive, background material is also highly coded syntactically. certain types of information are routinely expressed with distinct syntactic forms. where more than one episode occurs in a single report, a change of episode will also be marked syntactically in a reliable way. learning a syntagmatic and paradigmatic structure from language data with a bi-multigram model. in this paper, we present a stochastic language modeling tool which aims at retrieving variable-length phrases (multigrams), assuming bigram dependencies between them. the phrase retrieval can be intermixed with a phrase clustering procedure, so that the language data are iteratively structured at both a paradigmatic and a syntagmatic level in a fully integrated way. 
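a simplified sketch of the syntagmatic step only (a viterbi-style segmentation into known phrases under a unigram-over-phrases score; the real model assumes bigram dependencies between phrases and interleaves clustering, both omitted here, and the phrase inventory below is an assumption):

```python
import math

def best_segmentation(tokens, phrase_logprob, max_len=4):
    """segment `tokens` into phrases maximising the sum of phrase log-probs.

    `phrase_logprob` maps a phrase (tuple of tokens) to a log-probability;
    unknown single tokens get a small floor so the dp always succeeds."""
    n = len(tokens)
    best = [0.0] + [-math.inf] * n          # best[i]: score of tokens[:i]
    back = [0] * (n + 1)
    floor = math.log(1e-6)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            phrase = tuple(tokens[j:i])
            score = phrase_logprob.get(phrase, floor if len(phrase) == 1 else -math.inf)
            if best[j] + score > best[i]:
                best[i], back[i] = best[j] + score, j
    # recover the segmentation by following the backpointers
    phrases, i = [], n
    while i > 0:
        phrases.append(tuple(tokens[back[i]:i]))
        i = back[i]
    return list(reversed(phrases)), best[n]

lm = {("i", "would", "like"): math.log(0.2), ("a", "single", "room"): math.log(0.1),
      ("i",): math.log(0.05), ("would",): math.log(0.02)}
print(best_segmentation(["i", "would", "like", "a", "single", "room"], lm)[0])
# [('i', 'would', 'like'), ('a', 'single', 'room')]
```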
perplexity results on atr travel arrangement data with a bi-multigram model (assuming bigram correlations between the phrases) come very close to the trigram scores with a reduced number of entries in the language model. also the ability of the class version of the model to merge semantically related phrases into a common class is illustrated. experiments with learning parsing heuristics. any large language processing software relies in its operation on heuristic decisions concerning the strategy of processing. these decisions are usually "hard-wired" into the software in the form of hand-crafted heuristic rules, independent of the nature of the processed texts. we propose an alternative, adaptive approach in which machine learning techniques learn the rules from examples of sentences in each class. we have experimented with a variety of learning techniques on a representative instance of this problem within the realm of parsing. our approach lead to the discovery of new heuristics that perform significantly better than the current hand-crafted heuristic. we discuss the entire cycle of application of machine learning and suggest a methodology for the use of machine learning as a technique for the adaptive optimisation of language-processing software. answer extraction, semantic clustering, and extractive summarization for clinical question answering. this paper presents a hybrid approach to question answering in the clinical domain that combines techniques from summarization and information retrieval. we tackle a frequently-occurring class of questions that takes the form "what is the best drug treatment for x?" starting from an initial set of medline citations, our system first identifies the drugs under study. abstracts are then clustered using semantic classes from the umls ontology. finally, a short extractive summary is generated for each abstract to populate the clusters. two evaluations---a manual one focused on short answers and an automatic one focused on the supporting abstract---demonstrate that our system compares favorably to pubmed, the search system most widely used by physicians today. relieving the data acquisition bottleneck in word sense disambiguation. supervised learning methods for wsd yield better performance than unsupervised methods. yet the availability of clean training data for the former is still a severe challenge. in this paper, we present an unsupervised bootstrapping approach for wsd which exploits huge amounts of automatically generated noisy data for training within a supervised learning framework. the method is evaluated using the 29 nouns in the english lexical sample task of senseval 2. our algorithm does as well as supervised algorithms on 31% of this test set, which is an improvement of 11% (absolute) over state-of-the-art bootstrapping wsd algorithms. we identify seven different factors that impact the performance of our system. an unsupervised method for word sense tagging using parallel corpora. we present an unsupervised method for word sense disambiguation that exploits translation correspondences in parallel corpora. the technique takes advantage of the fact that cross-language lexicalizations of the same concept tend to be consistent, preserving some core element of its semantics, and yet also variable, reflecting differing translator preferences and the influence of context. 
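a minimal sketch of that intuition only (not the paper's method or its evaluation setup; the alignment format and the toy sentence pairs are assumptions): collect the target-language words each source word aligns to, and treat the resulting translation clusters as sense labels.

```python
from collections import defaultdict

def translation_senses(aligned_pairs):
    """collect, for each source word, the target words it aligns to.

    `aligned_pairs` is an iterable of (source_tokens, target_tokens, alignment)
    where alignment is a list of (i, j) index pairs. each distinct target word
    (or a cluster of near-synonymous ones) can then serve as a sense tag."""
    senses = defaultdict(lambda: defaultdict(int))
    for src, tgt, alignment in aligned_pairs:
        for i, j in alignment:
            senses[src[i]][tgt[j]] += 1
    return senses

pairs = [
    (["he", "sat", "by", "the", "bank"], ["il", "s'assit", "près", "de", "la", "rive"],
     [(0, 0), (1, 1), (4, 5)]),
    (["the", "bank", "raised", "rates"], ["la", "banque", "a", "augmenté", "les", "taux"],
     [(1, 1), (2, 3), (3, 5)]),
]
print(dict(translation_senses(pairs)["bank"]))   # {'rive': 1, 'banque': 1}
```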
working with parallel corpora introduces an extra complication for evaluation, since it is difficult to find a corpus that is both sense tagged and parallel with another language; therefore we use pseudo-translations, created by machine translation systems, in order to make possible the evaluation of the approach against a standard test set. the results demonstrate that word-level translation correspondences are a valuable source of information for sense disambiguation. detecting errors in discontinuous structural annotation. consistency of corpus annotation is an essential property for the many uses of annotated corpora in computational and theoretical linguistics. while some research addresses the detection of inconsistencies in positional annotation (e.g., part-of-speech) and continuous structural annotation (e.g., syntactic constituency), no approach has yet been developed for automatically detecting annotation errors in discontinuous structural annotation. this is significant since the annotation of potentially discontinuous stretches of material is increasingly relevant, from treebanks for free-word-order languages to semantic and discourse annotation. in this paper we discuss how the variation n-gram error detection approach (dickinson and meurers, 2003a) can be extended to discontinuous structural annotation. we exemplify the approach by showing how it successfully detects errors in the syntactic annotation of the german tiger corpus (brants et al., 2002). a corpus-based approach to topic in danish dialog. we report on an investigation of the pragmatic category of topic in danish dialog and its correlation to surface features of nps. using a corpus of 444 utterances, we trained a decision tree system on 16 features. the system achieved near-human performance with success rates of 84-89% and f1-scores of 0.63-0.72 in 10-fold cross validation tests (human performance: 89% and 0.78). the most important features turned out to be preverbal position, definiteness, pronominalisation, and non-subordination. we discovered that nps in epistemic matrix clauses (e.g. "i think ...") were seldom topics and we suspect that this holds for other interpersonal matrix clauses as well. deep syntactic processing by combining shallow methods. we present a novel approach for finding discontinuities that outperforms previously published results on this task. rather than using a deeper grammar formalism, our system combines a simple unlexicalized pcfg parser with a shallow pre-processor. this pre-processor, which we call a trace tagger, does surprisingly well on detecting where discontinuities can occur without using phrase structure information. grammars for local and long dependencies. polarized dependency (pd-) grammars are proposed as a means of efficient treatment of discontinuous constructions. pd-grammars describe two kinds of dependencies: local, explicitly derived by the rules, and long, implicitly specified by negative and positive valencies of words. if in a pd-grammar the number of non-saturated valencies in derived structures is bounded by a constant, then it is weakly equivalent to a cf-grammar and has an o(n^3)-time parsing algorithm. it happens that such bounded pd-grammars are strong enough to express such phenomena as unbounded raising, extraction and extraposition. multext-east: parallel and comparable corpora and lexicons for six central and eastern european languages.
the eu copernicus project multext-east has created a multi-lingual corpus of text and speech data, covering the six languages of the project: bulgarian, czech, estonian, hungarian, romanian, and slovene. in addition, wordform lexicons for each of the languages were developed. the corpus includes a parallel component consisting of orwell's nineteen eighty-four, with versions in all six languages tagged for part-of-speech and aligned to english (also tagged for pos). we describe the encoding format and data architecture designed especially for this corpus, which is generally usable for encoding linguistic corpora. we also describe the methodology for the development of a harmonized set of morphosyntactic descriptions (msds), which builds upon the scheme for western european languages developed within the eagles project. we discuss the special concerns for handling the six project languages, which cover three distinct language families. machine translation using probabilistic synchronous dependency insertion grammars. syntax-based statistical machine translation (mt) aims at applying statistical models to structured data. in this paper, we present a syntax-based statistical machine translation system based on a probabilistic synchronous dependency insertion grammar. synchronous dependency insertion grammars are a version of synchronous grammars defined on dependency trees. we first introduce our approach to inducing such a grammar from parallel corpora. second, we describe the graphical model for the machine translation task, which can also be viewed as a stochastic tree-to-tree transducer. we introduce a polynomial time decoding algorithm for the model. we evaluate the outputs of our mt system using the nist and bleu automatic mt evaluation software. the result shows that our system outperforms the baseline system based on the ibm models in both translation speed and quality. error driven word sense disambiguation. in this paper we describe a method for performing word sense disambiguation (wsd). the method relies on unsupervised learning and exploits functional relations among words as produced by a shallow parser. by exploiting an error driven rule learning algorithm (brill 1997), the system is able to produce rules for wsd, which can be optionally edited by humans in order to increase the performance of the system. a text input front-end processor as an information access platform. this paper presents a practical foreign language writing support tool which makes it much easier to utilize dictionary and example sentence resources. like a kana-kanji conversion front-end processor used to input japanese language text, this tool is also implemented as a front-end processor and can be combined with a wide variety of applications. a morphological analyzer automatically extracts key words from text as it is being input into the tool, and these words are used to locate information relevant to the input text. this information is then automatically displayed to the user. with this tool, users can concentrate better on their writing because much less interruption of their work is required for the consulting of dictionaries or for the retrieval of reference sentences. retrieval and display may be conducted in any three ways: 1) relevant information is retrieved and displayed automatically; 2) information is retrieved automatically but displayed only on user command; 3) information is both retrieved and displayed only on user command. 
the extent to which the retrieval and display of information proceeds automatically depends on the type of information being referenced; this element of the design adds to system efficiency. further, by combining this tool with a stepped-level interactive machine translation function, we have created a pc support tool to help japanese people write in english. subdeletion in verb phrase ellipsis. this paper stems from an ongoing research project on verb phrase ellipsis. the project's goals are to implement a verb phrase ellipsis resolution algorithm, automatically test the algorithm on corpus data, then automatically evaluate the algorithm against human-generated answers. the paper will establish the current status of the algorithm based on this automatic evaluation, categorizing current problem situations. an algorithm to handle one of these problems, the case of subdeletion, will be described and evaluated. the algorithm attempts to detect and solve subdeletion by locating adjuncts of similar types in a verb phrase ellipsis and corresponding antecedent. syntactic and semantic transfer with f-structures. we present two approaches for syntactic and semantic transfer based on lfg f-structures and compare the results with existing co-description and restriction operator based approaches, focusing on aspects of ambiguity preserving transfer, complex cases of syntactic structural mismatches, as well as on modularity and reusability. the two transfer approaches are interfaced with an existing, implemented transfer component (verbmobil), by translating f-structures into a term language, and by interfacing f-structure representations with an existing semantics-based transfer approach, respectively. solving thematic divergences in machine translation. though most translation systems have some mechanism for translating certain types of divergent predicate-argument structures, they do not provide a general procedure that takes advantage of the relationship between lexical-semantic structure and syntactic structure. a divergent predicate-argument structure is one in which the predicate (e.g., the main verb) or its arguments (e.g., the subject and object) do not have the same syntactic ordering properties for both the source and target language. to account for such ordering differences, a machine translator must consider language-specific syntactic idiosyncrasies that distinguish a target language from a source language, while making use of lexical-semantic uniformities that tie the two languages together. this paper describes the mechanisms used by the unitran machine translation system for mapping an underlying lexical-conceptual structure to a syntactic structure (and vice versa), and it shows how these mechanisms coupled with a set of general linking routines solve the problem of thematic divergence in machine translation. a parameterized approach to integrating aspect with lexical-semantics for machine translation. this paper discusses how a two-level knowledge representation model for machine translation integrates aspectual information with lexical-semantic information by means of parameterization. the integration of aspect with lexical-semantics is especially critical in machine translation because of the lexical selection and aspectual realization processes that operate during the production of the target-language sentence: there are often a large number of lexical and aspectual possibilities to choose from in the production of a sentence from a lexical semantic representation.
aspectual information from the source-language sentence constrains the choice of target-language terms. in turn, the target-language terms limit the possibilities for generation of aspect. thus, there is a two-way communication channel between the two processes. this paper will show that the selection/realization processes may be parameterized so that they operate uniformly across more than one language and it will describe how the parameter-based approach is currently being used as the basis for extraction of aspectual information from corpora. deriving verbal and compositional lexical aspect for nlp applications. verbal and compositional lexical aspect provide the underlying temporal structure of events. knowledge of lexical aspect, e.g., (a)telicity, is therefore required for interpreting event sequences in discourse (dowty, 1986; moens and steedman, 1988; passoneau, 1988), interfacing to temporal databases (androutsopoulos, 1996), processing temporal modifiers (antonisse, 1994), describing allowable alternations and their semantic effects (resnik, 1996; tenny, 1994), and selecting tense and lexical items for natural language generation ((dorr and olsen, 1996; klavans and chodorow, 1992), cf. (slobin and bocaz, 1988)). we show that it is possible to represent lexical aspect---both verbal and compositional---on a large scale, using lexical conceptual structure (lcs) representations of verbs in the classes cataloged by levin (1993). we show how proper consideration of these universal pieces of verb meaning may be used to refine lexical representations and derive a range of meanings from combinations of lcs representations. a single algorithm may therefore be used to determine lexical aspect classes and features at both verbal and sentence levels. finally, we illustrate how knowledge of lexical aspect facilitates the interpretation of events in nlp applications. feature logic with weak subsumption constraints. in the general framework of a constraint-based grammar formalism often some sort of feature logic serves as the constraint language to describe linguistic objects. we investigate the extension of basic feature logic with subsumption (or matching) constraints, based on a weak notion of subsumption. this mechanism of one-way information flow is generally deemed to be necessary to give linguistically satisfactory descriptions of coordination phenomena in such formalisms. we show that the problem whether a set of constraints is satisfiable in this logic is decidable in polynomial time and give a solution algorithm. parsing for semidirectional lambek grammar is np-complete. we study the computational complexity of the parsing problem of a variant of lambek categorial grammar that we call semidirectional. in semidirectional lambek calculus sdl there is an additional nondirectional abstraction rule allowing the formula abstracted over to appear anywhere in the premise sequent's left-hand side, thus permitting non-peripheral extraction. sdl grammars are able to generate each context-free language and more than that. we show that the parsing problem for semidirectional lambek grammar is np-complete by a reduction of the 3-partition problem. efficient construction of underspecified semantics under massive ambiguity. we investigate the problem of determining a compact underspecified semantical representation for sentences that may be highly ambiguous. due to combinatorial explosion, the naive method of building semantics for the different syntactic readings independently is prohibitive. 
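to make the combinatorial explosion concrete (an illustrative aside, not part of the abstract): the number of binary attachment structures over n ambiguous attachment points grows as the catalan numbers, so enumerating readings independently is quickly infeasible.

```python
from math import comb

def catalan(n):
    """n-th catalan number: the count of binary bracketings of n+1 items."""
    return comb(2 * n, n) // (n + 1)

# rough count of readings for a head followed by n attachable modifiers
for n in (2, 5, 10, 15):
    print(n, catalan(n))   # 2, 42, 16796, 9694845
```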
we present a method that takes as input a syntactic parse forest with associated constraint-based semantic construction rules and directly builds a packed semantic structure. the algorithm is fully implemented and runs in o(n^4 log n) in sentence length, if the grammar meets some reasonable 'normality' restrictions. gemini: a natural language system for spoken-language understanding. gemini is a natural language understanding system developed for spoken language applications. this paper describes the details of the system, and includes relevant measurements of size, efficiency, and performance of each of its sub-components in detail. practical issues in compiling typed unification grammars for speech recognition. current alternatives for language modeling are statistical techniques based on large amounts of training data, and hand-crafted context-free or finite-state grammars that are difficult to build and maintain. one way to address the problems of the grammar-based approach is to compile recognition grammars from grammars written in a more expressive formalism. while theoretically straightforward, the compilation process can exceed memory and time bounds, and might not always result in accurate and efficient speech recognition. we will describe and evaluate two approaches to this compilation problem. we will also describe and evaluate additional techniques to reduce the structural ambiguity of the language model. interleaving syntax and semantics in an efficient bottom-up parser. we describe an efficient bottom-up parser that interleaves syntactic and semantic structure building. two techniques are presented for reducing search by reducing local ambiguity: limited left context constraints are used to reduce local syntactic ambiguity, and deferred sortal-constraint application is used to reduce local semantic ambiguity. we experimentally evaluate these techniques, and show dramatic reductions in both number of chart edges and total parsing time. the robust processing capabilities of the parser are demonstrated in its use in improving the accuracy of a speech recognizer. representing paraphrases using synchronous tags. this paper looks at representing paraphrases using the formalism of synchronous tags; it looks particularly at comparisons with machine translation and the modifications it is necessary to make to synchronous tags for paraphrasing. a more detailed version is in dras (1997a). a meta-level grammar: redefining synchronous tag for translation and paraphrase. in applications such as translation and paraphrase, operations are carried out on grammars at the meta level. this paper shows how a meta-grammar, defining structure at the meta level, is useful in the case of such operations; in particular, how it solves problems in the current definition of synchronous tag (shieber, 1994) caused by ignoring such structure in mapping between grammars, for applications such as translation. moreover, essential properties of the formalism remain unchanged. a bio-inspired approach for multi-word expression extraction. this paper proposes a new approach for multi-word expression (mwe) extraction motivated by gene sequence alignment, since textual sequences are similar to gene sequences in pattern analysis. the theory of the longest common subsequence (lcs) originates in computer science and has been established as the affine gap model in bioinformatics. we apply this lcs technique, combined with linguistic criteria, to mwe extraction.
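a minimal sketch of the plain longest-common-subsequence recurrence on token sequences (the affine-gap scoring and the linguistic filtering used above are omitted; the toy sentences are assumptions):

```python
def lcs(a, b):
    """longest common subsequence of two token lists via the standard dp."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] else max(dp[i][j + 1], dp[i + 1][j])
    # backtrace to recover one lcs
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return list(reversed(out))

s1 = "the stock market fell sharply yesterday".split()
s2 = "yesterday the stock market fell again".split()
print(lcs(s1, s2))   # ['the', 'stock', 'market', 'fell'] - a recurring word sequence
```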
in comparison with the traditional n-gram method, which is the major technique for mwe extraction, the lcs approach is applied with great efficiency and a performance guarantee. experimental results show that the lcs-based approach achieves better results than the n-gram method. what to do when lexicalization fails: parsing german with suffix analysis and smoothing. in this paper, we present an unlexicalized parser for german which employs smoothing and suffix analysis to achieve a labelled bracket f-score of 76.2, higher than previously reported results on the negra corpus. in addition to the high accuracy of the model, the use of smoothing in an unlexicalized parser allows us to better examine the interplay between smoothing and parsing results. probabilistic parsing for german using sister-head dependencies. we present a probabilistic parsing model for german trained on the negra treebank. we observe that existing lexicalized parsing models using head-head dependencies, while successful for english, fail to outperform an unlexicalized baseline model for german. learning curves show that this effect is not due to lack of training data. we propose an alternative model that uses sister-head dependencies instead of head-head dependencies. this model outperforms the baseline, achieving a labeled precision and recall of up to 74%. this indicates that sister-head dependencies are more appropriate for treebanks with very flat structures such as negra. integrating syntactic priming into an incremental probabilistic parser, with an application to psycholinguistic modeling. the psycholinguistic literature provides evidence for syntactic priming, i.e., the tendency to repeat structures. this paper describes a method for incorporating priming into an incremental probabilistic parser. three models are compared, which involve priming of rules between sentences, within sentences, and within coordinate structures. these models simulate the reading time advantage for parallel structures found in human data, and also yield a small increase in overall parsing accuracy. empirically estimating order constraints for content planning in generation. in a language generation system, a content planner embodies one or more "plans" that are usually hand-crafted, sometimes through manual analysis of target text. in this paper, we present a system that we developed to automatically learn elements of a plan and the ordering constraints among them. as training data, we use semantically annotated transcripts of domain experts performing the task our system is designed to mimic. given the large degree of variation in the spoken language of the transcripts, we developed a novel algorithm to find parallels between transcripts based on techniques used in computational genomics. our proposed methodology was evaluated two-fold: the learning and generalization capabilities were quantitatively evaluated using cross validation, obtaining a level of accuracy of 89%. a qualitative evaluation is also provided. topological dependency trees: a constraint-based account of linear precedence. we describe a new framework for dependency grammar, with a modular decomposition of immediate dependency and linear precedence. our approach distinguishes two orthogonal yet mutually constraining structures: a syntactic dependency tree and a topological dependency tree. the syntax tree is nonprojective and even non-ordered, while the topological tree is projective and partially ordered. interpreting the human genome sequence, using stochastic grammars.
the 3 billion base pair sequence of the human genome is now available, and attention is focusing on annotating it to extract biological meaning. i will discuss what we have obtained, and the methods that are being used to analyse biological sequences. in particular i will discuss approaches using stochastic grammars analogous to those used in computational linguistics, both for gene finding and protein family classification. a noisy-channel approach to question answering. we introduce a probabilistic noisy-channel model for question answering and we show how it can be exploited in the context of an end-to-end qa system. our noisy-channel system outperforms a state-of-the-art rule-based qa system that uses similar resources. we also show that the model we propose is flexible enough to accommodate within one mathematical framework many qa-specific resources and techniques, which range from the exploitation of wordnet, structured, and semi-structured databases to reasoning, and paraphrasing. towards a modular data model for multi-layer annotated corpora. in this paper we discuss the current methods in the representation of corpora annotated at multiple levels of linguistic organization (so-called multi-level or multi-layer corpora). taking five approaches which are representative of the current practice in this area, we discuss the commonalities and differences between them focusing on the underlying data models. the goal of the paper is to identify the common concerns in multi-layer corpus representation and processing so as to lay a foundation for a unifying, modular data model. choosing the word most typical in context using a lexical co-occurrence network. this paper presents a partial solution to a component of the problem of lexical choice: choosing the synonym most typical, or expected, in context. we apply a new statistical approach to representing the context of a word through lexical co-occurrence networks. the implementation was trained and evaluated on a large corpus, and results show that the inclusion of second-order co-occurrence relations improves the performance of our implemented lexical choice program. constraints over lambda-structures in semantic underspecification. we introduce a first-order language for semantic underspecification that we call constraint language for lambda-structures (clls). a λ-structure can be considered as a λ-term up to consistent renaming of bound variables (λ-equality); a constraint of clls is an underspecified description of a λ-structure. clls solves a capturing problem omnipresent in underspecified scope representations. clls features constraints for dominance, lambda binding, parallelism, and anaphoric links. based on clls we present a simple, integrated, and underspecified treatment of scope, parallelism, and anaphora. unification of disjunctive feature descriptions. the paper describes a new implementation of feature structures containing disjunctive values, which can be characterized by the following main points: local representation of embedded disjunctions, avoidance of expansion to disjunctive normal form and of repeated test-unifications for checking consistence. the method is based on a modification of kasper and rounds' calculus of feature descriptions and its correctness therefore is easy to see. it can handle cyclic structures and has been incorporated successfully into an environment for grammar development. parameter estimation for probabilistic finite-state transducers. 
weighted finite-state transducers suffer from the lack of a training algorithm. training is even harder for transducers that have been assembled via finite-state operations such as composition, minimization, union, concatenation, and closure, as this yields tricky parameter tying. we formulate a "parameterized fst" paradigm and give training algorithms for it, including a general bookkeeping trick ("expectation semirings") that cleanly and efficiently computes expectations and gradients. efficient normal-form parsing for combinatory categorial grammar. under categorial grammars that have powerful rules like composition, a simple n-word sentence can have exponentially many parses. generating all parses is inefficient and obscures whatever true semantic ambiguities are in the input. this paper addresses the problem for a fairly general form of combinatory categorial grammar, by means of an efficient, correct, and easy to implement normal-form parsing technique. the parser is proved to find exactly one parse in each semantic equivalence class of allowable parses; that is, spurious ambiguity (as carefully defined) is shown to be both safely and completely eliminated. efficient generation in primitive optimality theory. this paper introduces primitive optimality theory (otp), a linguistically motivated formalization of ot. otp specifies the class of autosegmental representations, the universal generator gen, and the two simple families of permissible constraints. in contrast to less restricted theories using generalized alignment, otp's optimal surface forms can be generated with finite-state methods adapted from (ellison, 1994). unfortunately these methods take time exponential in the size of the grammar. indeed the generation problem is shown np-complete in this sense. however, techniques are discussed for making ellison's approach fast in the typical case, including a simple trick that alone provides a 100-fold speedup on a grammar fragment of moderate size. one avenue for future improvements is a new finite-state notion, "factored automata," where regular languages are represented compactly via formal intersections a_1 ∩ … ∩ a_k of fsas. efficient parsing for bilexical context-free grammars and head automaton grammars. several recent stochastic parsers use bilexical grammars, where each word type idiosyncratically prefers particular complements with particular head words. we present o(n^4) parsing algorithms for two bilexical formalisms, improving the prior upper bounds of o(n^5). for a common special case that was known to allow o(n^3) parsing (eisner, 1997), we present an o(n^3) algorithm with an improved grammar constant. a modified joint source-channel model for transliteration. most machine transliteration systems transliterate out of vocabulary (oov) words through intermediate phonemic mapping. a framework has been presented that allows direct orthographical mapping between two languages that are of different origins employing different alphabet sets. a modified joint source-channel model, along with a number of alternatives, has been proposed. aligned transliteration units along with their context are automatically derived from a bilingual training corpus to generate the collocational statistics. the transliteration units in bengali words take the pattern c+m where c represents a vowel or a consonant or a conjunct and m represents the vowel modifier or matra. the english transliteration units are of the form c*v* where c represents a consonant and v represents a vowel.
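as a small illustration of the c*v* chunking just described (a sketch only; the regular expression and the example words are assumptions, not the paper's implementation):

```python
import re

# one transliteration unit = zero or more consonants followed by zero or more
# vowels, i.e. c*v*; we keep only non-empty matches
VOWELS = "aeiou"
UNIT = re.compile(rf"[^{VOWELS}]*[{VOWELS}]*", re.IGNORECASE)

def english_units(word):
    """split an english word into c*v* transliteration units."""
    return [m for m in UNIT.findall(word) if m]

print(english_units("rabindra"))   # ['ra', 'bi', 'ndra']
print(english_units("saurav"))     # ['sau', 'ra', 'v']
```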
a bengali-english machine transliteration system has been developed based on the proposed models. the system has been trained to transliterate person names from bengali to english. it uses the linguistic knowledge of possible conjuncts and diphthongs in bengali and their equivalents in english. the system has been evaluated and it has been observed that the modified joint source-channel model performs best with a word agreement ratio of 69.3% and a transliteration unit agreement ratio of 89.8%. types in functional unification grammars. functional unification grammars (fugs) are popular for natural language applications because the formalism uses very few primitives and is uniform and expressive. in our work on text generation, we have found that it also has annoying limitations: it is not suited for the expression of simple, yet very common, taxonomic relations and it does not allow the specification of completeness conditions. we have implemented an extension of traditional functional unification. this extension addresses these limitations while preserving the desirable properties of fugs. it is based on the notions of typed features and typed constituents. we show the advantages of this extension in the context of a grammar used for text generation. measuring language divergence by intra-lexical comparison. this paper presents a method for building genetic language taxonomies based on a new approach to comparing lexical forms. instead of comparing forms cross-linguistically, a matrix of language-internal similarities between forms is calculated. these matrices are then compared to give distances between languages. we argue that this coheres better with current thinking in linguistics and psycholinguistics. an implementation of this approach, called philologicon, is described, along with its application to dyen et al.'s (1992) ninety-five wordlists from indo-european languages. spelling correction using context. this paper describes a spelling correction system that functions as part of an intelligent tutor that carries on a natural language dialogue with its users. the process that searches the lexicon is adaptive as is the system filter, to speed up the process. the basis of our approach is the interaction between the parser and the spelling corrector. alternative correction targets are fed back to the parser, which does a series of syntactic and semantic checks, based on the dialogue context, the sentence context, and the phrase context. exploring and exploiting the limited utility of captions in recognizing intention in information graphics. this paper presents a corpus study that explores the extent to which captions contribute to recognizing the intended message of an information graphic. it then presents an implemented graphic interpretation system that takes into account a variety of communicative signals, and an evaluation study showing that evidence obtained from shallow processing of the graphic's caption has a significant impact on the system's success. this work is part of a larger project whose goal is to provide sight-impaired users with effective access to information graphics. unification with lazy non-redundant copying. this paper presents a unification procedure which eliminates the redundant copying of structures by using a lazy incremental copying approach to achieve structure sharing. copying of structures accounts for a considerable amount of the total processing time. several methods have been proposed to minimize the amount of necessary copying. 
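an aside to make the copying cost concrete (nested dicts stand in for feature structures; this is not the lazy incremental copying mechanism presented next, only a contrast between naive deep copying and path copying with structure sharing):

```python
import copy

def set_path(fs, path, value):
    """return a new feature structure with `path` set to `value`,
    copying only the dicts along that path and sharing the rest."""
    if not path:
        return value
    new = dict(fs)                         # shallow copy of this node only
    key = path[0]
    new[key] = set_path(fs.get(key, {}), path[1:], value)
    return new

base = {"head": {"agr": {"num": "sg", "per": "3"}}, "subj": {"case": "nom"}}
updated = set_path(base, ["head", "agr", "num"], "pl")

print(updated["head"]["agr"]["num"], base["head"]["agr"]["num"])  # pl sg
print(updated["subj"] is base["subj"])                            # True: shared
print(copy.deepcopy(base)["subj"] is base["subj"])                # False: everything copied
```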
lazy incremental copying (lic) is presented as a new solution to the copying problem. it synthesizes ideas of lazy copying with the notion of chronological dereferencing for achieving a high amount of structure sharing. ambiguity preserving machine translation using packed representations. in this paper we present an ambiguity preserving translation approach which transfers ambiguous lfg f-structure representations. it is based on packed f-structure representations which are the result of potentially ambiguous utterances. if the ambiguities between source and target language can be preserved, no unpacking during transfer is necessary and the generator may produce utterances which maximally cover the underlying ambiguities. we convert the packed f-structure descriptions into a flat set of prolog terms which consist of predicates, their predicate argument structure and additional attribute-value information. ambiguity is expressed via local disjunctions. the flat representations facilitate the application of a shake-and-bake-like transfer approach extended to deal with packed ambiguities. handling linear precedence constraints by unification. linear precedence (lp) rules are widely used for stating word order principles. they have been adopted as constraints by hpsg but no encoding in the formalism has been provided. since they only order siblings, they are not quite adequate, at least not for german. we propose a notion of lp constraints that applies to linguistically motivated branching domains such as head domains. we show a type-based encoding in an hpsg-style formalism that supports processing. the encoding can be achieved by a compilation step. minimizing manual annotation cost in supervised training from corpora. corpus-based methods for natural language processing often use supervised training, requiring expensive manual annotation of training corpora. this paper investigates methods for reducing annotation cost by sample selection. in this approach, during training the learning program examines many unlabeled examples and selects for labeling (annotation) only those that are most informative at each stage. this avoids redundantly annotating examples that contribute little new information. this paper extends our previous work on committee-based sample selection for probabilistic classifiers. we describe a family of methods for committee-based sample selection, and report experimental results for the task of stochastic part-of-speech tagging. we find that all variants achieve a significant reduction in annotation cost, though their computational efficiency differs. in particular, the simplest method, which has no parameters to tune, gives excellent results. we also show that sample selection yields a significant reduction in the size of the model used by the tagger. towards a resource for lexical semantics: a large german corpus with extensive semantic annotation. we describe the ongoing construction of a large, semantically annotated corpus resource as a reliable basis for the large-scale acquisition of word-semantic information, e.g. the construction of domain-independent lexica. the backbone of the annotation is semantic roles in the frame semantics paradigm. we report experiences and evaluate the annotated data from the first project stage. on this basis, we discuss the problems of vagueness and ambiguity in semantic annotation. understanding natural language instructions: the case of purpose clauses.
this paper presents an analysis of purpose clauses in the context of instruction understanding. such analysis shows that goals affect the interpretation and/or execution of actions, lends support to the proposal of using generation and enablement to model relations between actions, and sheds light on some inference processes necessary to interpret purpose clauses. aggregation improves learning: experiments in natural language generation for intelligent tutoring systems. to improve the interaction between students and an intelligent tutoring system, we developed two natural language generators, which we systematically evaluated in a three-way comparison that included the original system as well. we found that the generator which intuitively produces the best language does engender the most learning. specifically, it appears that functional aggregation is responsible for the improvement. an empirical investigation of proposals in collaborative dialogues. we describe a corpus-based investigation of proposals in dialogue. first, we describe our dri compliant coding scheme and report our inter-coder reliability results. next, we test several hypotheses about what constitutes a well-formed proposal. learning features that predict cue usage. our goal is to identify the features that predict the occurrence and placement of discourse cues in tutorial explanations in order to aid in the automatic generation of explanations. previous attempts to devise rules for text generation were based on intuition or small numbers of constructed examples. we apply a machine learning program, c4.5, to induce decision trees for cue occurrence and placement from a corpus of data coded for a variety of features previously thought to affect cue usage. our experiments enable us to identify the features with the most predictive power, and show that machine learning can be used to induce decision trees useful for text generation. approximating context-free grammars with a finite-state calculus. although adequate models of human language for syntactic analysis and semantic interpretation are of at least context-free complexity, for applications such as speech processing in which speed is important, finite-state models are often preferred. these requirements may be reconciled by using the more complex grammar to automatically derive a finite-state approximation which can then be used as a filter to guide speech recognition or to reject many hypotheses at an early stage of processing. a method is presented here for calculating such finite-state approximations from context-free grammars. it is essentially different from the algorithm introduced by pereira and wright (1991; 1996), is faster in some cases, and has the advantage of being open-ended and adaptable. encoding lexicalized tree adjoining grammars with a nonmonotonic inheritance hierarchy. this paper shows how datr, a widely used formal language for lexical knowledge representation, can be used to define an ltag lexicon as an inheritance hierarchy with internal lexical rules. a bottom-up featural encoding is used for ltag trees and this allows lexical rules to be implemented as covariation constraints within feature structures. such an approach eliminates the considerable redundancy otherwise associated with an ltag lexicon. a structure-sharing parser for lexicalized grammars. in wide-coverage lexicalized grammars many of the elementary structures have substructures in common.
this means that in conventional parsing algorithms some of the computation associated with different structures is duplicated. in this paper we describe a precompilation technique for such grammars which allows some of this computation to be shared. in our approach the elementary structures of the grammar are transformed into finite state automata which can be merged and minimised using standard algorithms, and then parsed using an automaton-based parser. we present algorithms for constructing automata from elementary structures, merging and minimising them, and string recognition and parse recovery with the resulting grammar. noun-phrase analysis in unrestricted text for information retrieval. information retrieval is an important application area of natural-language processing where one encounters the genuine challenge of processing large quantities of unrestricted natural-language text. this paper reports on the application of a few simple, yet robust and efficient noun-phrase analysis techniques to create better indexing phrases for information retrieval. in particular, we describe a hybrid approach to the extraction of meaningful (continuous or discontinuous) subcompounds from complex noun phrases using both corpus statistics and linguistic heuristics. results of experiments show that indexing based on such extracted subcompounds improves both recall and precision in an information retrieval system. the noun-phrase analysis techniques are also potentially useful for book indexing and automatic thesaurus extraction. methods for the qualitative evaluation of lexical association measures. this paper presents methods for a qualitative, unbiased comparison of lexical association measures and the results we have obtained for adjective-noun pairs and preposition-noun-verb triples extracted from german corpora. in our approach, we compare the entire list of candidates, sorted according to the particular measures, to a reference set of manually identified "true positives". we also show how estimates for the very large number of hapax legomena and double occurrences can be inferred from random samples. combining stochastic and rule-based methods for disambiguation in agglutinative languages. in this paper we present the results of the combination of stochastic and rule-based disambiguation methods applied to the basque language. the methods we have used in disambiguation are the constraint grammar formalism and an hmm-based tagger developed within the multext project. as basque is an agglutinative language, a morphological analyser is needed to attach all possible readings to each word. then, cg rules are applied using all the morphological features and this process decreases the morphological ambiguity of texts. finally, we use the multext project tools to select just one from the possible remaining tags. using only the stochastic method, the error rate is about 14%, but the accuracy may be increased by about 2% by enriching the lexicon with the unknown words. when both methods are combined, the error rate of the whole process is 3.5%. considering that the training corpus is quite small, that the hmm model is a first order one and that the constraint grammar of basque is still in progress, we think that this combined method can achieve good results, and it would be appropriate for other agglutinative languages. chinese-english term translation mining based on semantic prediction.
using abundant web resources to mine chinese term translations can be applied in many fields such as reading/writing assistance, machine translation and cross-language information retrieval. in mining english translations of chinese terms, how to obtain effective web pages and evaluate translation candidates are two challenging issues. in this paper, the approach based on semantic prediction is first proposed to obtain effective web pages. the proposed method predicts possible english meanings according to each constituent unit of the chinese term, and expands these english items using semantically relevant knowledge for searching. the refined related terms are extracted from top retrieved documents through feedback learning to construct a new query expansion for acquiring more effective web pages. for obtaining a correct translation list, a translation evaluation method based on a weighted sum of multiple features is presented to rank these candidates estimated from effective web pages. experimental results demonstrate that the proposed method has good performance in chinese-english term translation acquisition, and achieves 82.9% accuracy. optimizing story link detection is not equivalent to optimizing new event detection. link detection has been regarded as a core technology for the topic detection and tracking tasks of new event detection. in this paper we formulate story link detection and new event detection as an information retrieval task and hypothesize on the impact of precision and recall on both systems. motivated by these arguments, we introduce a number of new performance enhancing techniques including part of speech tagging, new similarity measures and expanded stop lists. experimental results validate our hypothesis. highly constrained unification grammars. unification grammars are widely accepted as an expressive means for describing the structure of natural languages. in general, the recognition problem is undecidable for unification grammars. even with restricted variants of the formalism, off-line parsable grammars, the problem is computationally hard. we present two natural constraints on unification grammars which limit their expressivity and allow for efficient processing. we first show that non-reentrant unification grammars generate exactly the class of context-free languages. we then relax the constraint and show that one-reentrant unification grammars generate exactly the class of mildly context-sensitive languages. we thus relate the commonly used and linguistically motivated formalism of unification grammars to more restricted, computationally tractable classes of languages. anaphor resolution in unrestricted texts with partial parsing. in this paper we deal with several kinds of anaphora in unrestricted texts. these kinds of anaphora are pronominal references, surface count anaphora and one-anaphora. in order to solve these anaphors we work on the output of a part-of-speech tagger, on which we automatically apply partial parsing based on the slot unification grammar formalism, which has been implemented in prolog. we only use the following kinds of information: lexical (the lemma of each word), morphological (person, number, gender) and syntactic. finally we show the experimental results, and the restrictions and preferences that we have used for anaphor resolution with partial parsing. using textual clues to improve metaphor processing. in this paper, we propose a textual clue approach to help metaphor detection, in order to improve the semantic processing of this figure.
previous work in the domain has studied semantic regularities only, overlooking an obvious set of regularities. a corpus-based analysis shows the existence of surface regularities related to metaphors. these clues can be characterized by syntactic structures and lexical markers. we present an object-oriented model for representing the textual clues that were found. this representation is designed to help the choice of a semantic processing, in terms of possible non-literal meanings. a prototype implementing this model is currently under development, within an incremental approach allowing step-by-step evaluations. how to thematically segment texts by using lexical cohesion? this article outlines a quantitative method for segmenting texts into thematically coherent units. this method relies on a network of lexical collocations to compute the thematic coherence of the different parts of a text from the lexical cohesiveness of their words. we also present the results of an experiment on locating boundaries between a series of concatenated texts. thematic segmentation of texts: two methods for two kinds of text. to segment texts into thematic units, we present here how a basic principle relying on word distribution can be applied to different kinds of texts. we start from an existing method well adapted for scientific texts, and we propose its adaptation to other kinds of texts by using semantic links between words. these relations are found in a lexical network, automatically built from a large corpus. we will compare their results and give criteria to choose the most suitable method according to text characteristics. enhancing electronic dictionaries with an index based on associations. a good dictionary contains not only many entries and a lot of information concerning each one of them, but also adequate means to reveal the stored information. information access depends crucially on the quality of the index. we will present here some ideas of how a dictionary could be enhanced to support a speaker/writer in finding the word s/he is looking for. to this end we suggest adding to an existing electronic resource an index based on the notion of association. we will also present preliminary work on how a subset of such associations, for example, topical associations, can be acquired by filtering a network of lexical co-occurrences extracted from a corpus. a dynamic bayesian framework to model context and memory in edit distance learning: an application to pronunciation classification. sitting at the intersection of statistics and machine learning, dynamic bayesian networks have been applied with much success in many domains, such as speech recognition, vision, and computational biology. while natural language processing increasingly relies on statistical methods, we think the field has yet to use graphical models to their full potential. in this paper, we report on experiments in learning edit distance costs using dynamic bayesian networks and present results on a pronunciation classification task. by exploiting the ability within the dbn framework to rapidly explore a large model space, we obtain a 40% reduction in error rate compared to a previous transducer-based method of learning edit distance. automatic creation of domain templates. recently, many natural language processing (nlp) applications have improved the quality of their output by using various machine learning techniques to mine information extraction (ie) patterns for capturing information from the input text.
currently, to mine ie patterns one should know in advance the type of information that should be captured by these patterns. in this work we propose a novel methodology for corpus analysis based on cross-examination of several document collections representing different instances of the same domain. we show that this methodology can be used for automatic domain template creation. as the problem of automatic domain template creation is rather new, there is no well-defined procedure for the evaluation of domain template quality. thus, we propose a methodology for identifying what information should be present in the template. using this information we evaluate the automatically created domain templates through the text snippets retrieved according to the created templates. using lexical dependency and ontological knowledge to improve a detailed syntactic and semantic tagger of english. this paper presents a detailed study of the integration of knowledge from both dependency parses and hierarchical word ontologies into a maximum-entropy-based tagging model that simultaneously labels words with both syntax and semantics. our findings show that information from both these sources can lead to strong improvements in overall system accuracy: dependency knowledge improved performance over all classes of words, and knowledge of the position of a word in an ontological hierarchy increased accuracy for words not seen in the training data. the resulting tagger offers the highest reported tagging accuracy on this tagset to date. incorporating non-local information into information extraction systems by gibbs sampling. most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long-distance structure that is prevalent in language use. we show how to solve this dilemma with gibbs sampling, a simple monte carlo method used to perform approximate inference in factored probabilistic models. by using simulated annealing in place of viterbi decoding in sequence models such as hmms, cmms, and crfs, it is possible to incorporate non-local structure while preserving tractable inference. we use this technique to augment an existing crf-based information extraction system with long-distance dependency models, enforcing label consistency and extraction template consistency constraints. this technique results in an error reduction of up to 9% over state-of-the-art systems on two established information extraction tasks. a layered approach to nlp-based information retrieval. a layered approach to information retrieval permits the inclusion of multiple search engines as well as multiple databases, with a natural language layer to convert english queries for use by the various search engines. the nlp layer incorporates morphological analysis, noun phrase syntax, and semantic expansion based on wordnet. offline strategies for online question answering: answering questions before they are asked. recent work in question answering has focused on web-based systems that extract answers using simple lexico-syntactic patterns. we present an alternative strategy in which patterns are used to extract highly precise relational information offline, creating a data repository that is used to efficiently answer questions. we evaluate our strategy on a challenging subset of questions, i.e. "who is ..." questions, against a state-of-the-art web-based question answering system.
results indicate that the extracted relations answer 25% more questions correctly and do so three orders of magnitude faster than the state-of-the-art system. structure-sharing in lexical representation. the lexicon now plays a central role in our implementation of a head-driven phrase structure grammar (hpsg), given the massive relocation into the lexicon of linguistic information that was carried by the phrase structure rules in the old gpsg system. hpsg's grammar contains fewer than twenty (very general) rules; its predecessor required over 350 to achieve roughly the same coverage. this simplification of the grammar is made possible by an enrichment of the structure and content of lexical entries, using both inheritance mechanisms and lexical rules to represent the linguistic information in a general and efficient form. we will argue that our mechanisms for structure-sharing not only provide the ability to express important linguistic generalizations about the lexicon, but also make possible an efficient, readily modifiable implementation that we find quite adequate for continuing development of a large natural language system. factorizing complex models: a case study in mention detection. as natural language understanding research advances towards deeper knowledge modeling, the tasks become more and more complex: we are interested in more nuanced word characteristics, more linguistic properties, deeper semantic and syntactic features. one such example, explored in this article, is the mention detection and recognition task in the automatic content extraction project, with the goal of identifying named, nominal or pronominal references to real-world entities (mentions) and labeling them with three types of information: entity type, entity subtype and mention type. in this article, we investigate three methods of assigning these related tags and compare them on several data sets. a system based on the methods presented in this article participated and ranked very competitively in the ace'04 evaluation. dynamic nonlocal language modeling via hierarchical topic-based adaptation. this paper presents a novel method of generating and applying hierarchical, dynamic topic-based language models. it proposes and evaluates new cluster generation, hierarchical smoothing and adaptive topic-probability estimation techniques. these combined models help capture long-distance lexical dependencies. experiments on the broadcast news corpus show significant improvement in perplexity (10.5% overall and 33.5% on target vocabulary). free indexation: combinatorial analysis and a compositional algorithm. the principle known as 'free indexation' plays an important role in the determination of the referential properties of noun phrases in the principle-and-parameters language framework. first, by investigating the combinatorics of free indexation, we show that the problem of enumerating all possible indexings requires exponential time. second, we exhibit a provably optimal free indexation algorithm. on reversing the generation process in optimality theory. optimality theory, a constraint-based phonology and morphology paradigm, has allowed linguists to make elegant analyses of many phenomena, including infixation and reduplication. in this work-in-progress, we build on the work of ellison (1994) to investigate the possibility of using ot as a parsing tool that derives underlying forms from surface forms. a maximum entropy/minimum divergence translation model.
i present empirical comparisons between a linear combination of standard statistical language and translation models and an equivalent maximum entropy/minimum divergence (memd) model, using several different methods for automatic feature selection. the memd model significantly outperforms the standard model in test corpus perplexity, even though it has far fewer parameters. multimodal generation in the comic dialogue system. we describe how context-sensitive, user-tailored output is specified and produced in the comic multimodal dialogue system. at the conference, we will demonstrate the user-adapted features of the dialogue manager and text planner. guiding a constraint dependency parser with supertags. we investigate the utility of supertag information for guiding an existing dependency parser of german. using weighted constraints to integrate the additionally available information, the decision process of the parser is influenced by changing its preferences, without excluding alternative structural interpretations from being considered. the paper reports on a series of experiments using varying models of supertags that significantly increase the parsing accuracy. in addition, an upper bound on the accuracy that can be achieved with perfect supertags is estimated. hybrid parsing: using probabilistic models as predictors for a symbolic parser. in this paper we investigate the benefit of stochastic predictor components to the parsing quality that can be obtained with a rule-based dependency grammar. by including a chunker, a supertagger, a pp attacher, and a fast probabilistic parser we were able to improve upon the baseline by 3.2%, bringing the overall labelled accuracy to 91.1% on the german negra corpus. we attribute the successful integration to the ability of the underlying grammar model to combine uncertain evidence in a soft manner, thus avoiding the problem of error propagation. the benefit of stochastic pp attachment to a rule-based parser. to study pp attachment disambiguation as a benchmark for empirical methods in natural language processing, it has often been reduced to a binary decision problem (between verb and noun attachment) in a particular syntactic configuration. a parser, however, must solve the more general task of deciding between more than two alternatives in many different contexts. we combine the attachment predictions made by a simple model of lexical attraction with a full-fledged parser of german to determine the actual benefit of the subtask to parsing. we show that the combination of data-driven and rule-based components can reduce the number of all parsing errors by 14% and raise the attachment accuracy for dependency parsing of german to an unprecedented 92%. from route descriptions to sketches: a model for a text-to-image translator. this paper deals with the automatic translation of route descriptions into graphic sketches. we discuss some general problems raised by such inter-mode transcription. we propose a model for an automatic text-to-image translator with a two-stage intermediate representation in which the linguistic representation of a route description precedes the creation of its conceptual representation. learning more effective dialogue strategies using limited dialogue move features. we explore the use of restricted dialogue contexts in reinforcement learning (rl) of effective dialogue strategies for information-seeking spoken dialogue systems (e.g. communicator (walker et al., 2001)).
the contexts we use are richer than those in previous research in this area, e.g. (levin and pieraccini, 1997; scheffler and young, 2001; singh et al., 2002; pietquin, 2004), which uses only slot-based information, but are much less complex than the full dialogue "information states" explored in (henderson et al., 2005), for which tractable learning is an issue. we explore how incrementally adding richer features allows learning of more effective dialogue strategies. we use two user simulations learned from communicator data (walker et al., 2001; georgila et al., 2005b) to explore the effects of different features on learned dialogue strategies. our results show that adding the dialogue moves of the last system and user turns increases the average reward of the automatically learned strategies by 65.9% over the original (hand-coded) communicator systems, and by 7.8% over a baseline rl policy that uses only slot-status features. we show that the learned strategies exhibit an emergent "focus switching" strategy and effective use of the 'give help' action. licensing and tree adjoining grammar in government binding parsing. this paper presents an implemented, psychologically plausible parsing model for government binding theory grammars. i make use of two main ideas: (1) a generalization of the licensing relations of [abney, 1986] allows for the direct encoding of certain principles of grammar (e.g. theta criterion, case filter) which drive structure building; (2) the working space of the parser is constrained to the domain determined by a tree adjoining grammar elementary tree. all dependencies and constraints are localized within this bounded structure. the resultant parser operates in linear time and allows for incremental semantic interpretation and determination of grammaticality. integrated shallow and deep parsing: topp meets hpsg. we present a novel, data-driven method for integrated shallow and deep parsing. mediated by an xml-based multi-layer annotation architecture, we interleave a robust but accurate stochastic topological field parser of german with a constraint-based hpsg parser. our annotation-based method for dovetailing shallow and deep phrasal constraints is highly flexible, allowing targeted and fine-grained guidance of constraint-based parsing. we conduct systematic experiments that demonstrate substantial performance gains. incorporating context information for the extraction of terms. the information used for the extraction of terms can be considered as rather 'internal', i.e. coming from the candidate string itself. this paper presents the incorporation of 'external' information derived from the context of the candidate string. it is embedded in the c-value approach for automatic term recognition (atr), in the form of weights constructed from statistical characteristics of the context words of the candidate string. independence assumptions considered harmful. many current approaches to statistical language modeling rely on independence assumptions between the different explanatory variables. this results in models which are computationally simple, but which only model the main effects of the explanatory variables on the response variable. this paper presents an argument in favor of a statistical approach that also models the interactions between the explanatory variables. the argument rests on empirical evidence from two series of experiments concerning automatic ambiguity resolution. semi-supervised training for statistical word alignment.
we introduce a semi-supervised approach to training for statistical machine translation that alternates the traditional expectation maximization step that is applied to a large training corpus with a discriminative step aimed at increasing word-alignment quality on a small, manually word-aligned sub-corpus. we show that our algorithm leads not only to improved alignments but also to machine translation outputs of higher quality. toward general-purpose learning for information extraction. two trends are evident in the recent evolution of the field of information extraction: a preference for simple, often corpus-driven techniques over linguistically sophisticated ones; and a broadening of the central problem definition to include many non-traditional text domains. this development calls for information extraction systems which are as retargetable and general as possible. here, we describe srv, a learning architecture for information extraction which is designed for maximum generality and flexibility. srv can exploit domain-specific information, including linguistic syntax and lexical information, in the form of features provided to the system explicitly as input for training. this process is illustrated using a domain created from reuters corporate acquisitions articles. features are derived from two general-purpose nlp systems, sleator and temperley's link grammar parser and wordnet. experiments compare the learner's performance with and without such linguistic information. surprisingly, in many cases, the system performs as well without this information as with it. a general computational treatment of the comparative. we present a general treatment of the comparative that is based on more basic linguistic elements so that the underlying system can be effectively utilized: in the syntactic analysis phase, the comparative is treated the same as similar structures; in the syntactic regularization phase, the comparative is transformed into a standard form so that subsequent processing is basically unaffected by it. the scope of quantifiers under the comparative is also integrated into the system in a general way. semi-supervised learning of partial cognates using bilingual bootstrapping. partial cognates are pairs of words in two languages that have the same meaning in some, but not all contexts. detecting the actual meaning of a partial cognate in context can be useful for machine translation tools and for computer-assisted language learning tools. in this paper we propose a supervised and a semi-supervised method to disambiguate partial cognates between two languages: french and english. the methods use only automatically-labeled data; therefore they can be applied to other pairs of languages as well. we also show that our methods perform well when using corpora from different domains. japanese morphological analyzer using word co-occurrence - jtag. we developed a japanese morphological analyzer that uses the co-occurrence of words to select the correct sequence of words in an unsegmented japanese sentence. the co-occurrence information can be obtained from cases where the system incorrectly analyzes sentences. as the amount of information increases, the accuracy of the system increases with a small risk of degradation. experimental results show that the proposed system assigns the correct phonological representations to unsegmented japanese sentences more precisely than do other popular systems. minimal recursion semantics as dominance constraints: translation, evaluation, and analysis.
we show that a practical translation of mrs descriptions into normal dominance constraints is feasible. we start from a recent theoretical translation and verify its assumptions on the outputs of the english resource grammar (erg) on the redwoods corpus. the main assumption of the translation, that all relevant underspecified descriptions are nets, is validated for a large majority of cases; all non-nets computed by the erg seem to be systematically incomplete. utilizing the world wide web as an encyclopedia: extracting term descriptions from semi-structured texts. in this paper, we propose a method to extract descriptions of technical terms from web pages in order to utilize the world wide web as an encyclopedia. we use linguistic patterns and html text structures to extract text fragments containing term descriptions. we also use a language model to discard extraneous descriptions, and a clustering method to summarize resultant descriptions. we show the effectiveness of our method by way of experiments. organizing encyclopedic knowledge based on the web and its application to question answering. we propose a method to generate large-scale encyclopedic knowledge, which is valuable for much nlp research, based on the web. we first search the web for pages containing a term in question. then we use linguistic patterns and html structures to extract text fragments describing the term. finally, we organize extracted term descriptions based on word senses and domains. in addition, we apply an automatically generated encyclopedia to a question answering system targeting the japanese information-technology engineers examination. an implemented description of japanese: the lexeed dictionary and the hinoki treebank. in this paper we describe the current state of a new japanese lexical resource: the hinoki treebank. the treebank is built from dictionary definition sentences, and uses an hpsg-based japanese grammar to encode both syntactic and semantic information. it is combined with an ontology based on the definition sentences to give a detailed sense-level description of the most familiar 28,000 words of japanese. using bilingual comparable corpora and semi-supervised clustering for topic tracking. we address the problem of dealing with skewed data, and propose a method for estimating effective training stories for the topic tracking task. for a small number of labelled positive stories, we extract story pairs, each consisting of a positive story and its associated stories, from bilingual comparable corpora. to overcome the problem of a large number of labelled negative stories, we classify them into clusters. this is done by using k-means with em. the results on the tdt corpora show the effectiveness of the method. a hardware algorithm for high speed morpheme extraction and its implementation. this paper describes a new hardware algorithm for morpheme extraction and its implementation on a specific machine (mex-i), as the first step toward achieving natural language parsing accelerators. it also shows the machine's performance, 100-1,000 times faster than a personal computer. this machine can extract morphemes from 10,000-character japanese text by searching an 80,000-morpheme dictionary in 1 second. it can treat multiple text streams, which are composed of character candidates, as well as one text stream. the algorithm is implemented on the machine in linear time for the number of candidates, while conventional sequential algorithms run in combinatorial time.
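to make the dictionary-lookup step behind morpheme extraction concrete, here is a minimal software sketch that returns every dictionary match found at every character position of an unsegmented string; the dictionary entries, the example sentence and the function name are toy assumptions for illustration, not material from the paper, which performs this kind of search in dedicated hardware over a far larger dictionary and multiple candidate streams.

    # toy dictionary-based morpheme extraction from an unsegmented string.
    # the dictionary, example text and names are illustrative only.
    DICTIONARY = {"東", "東京", "京都", "都", "に", "住", "住む"}
    MAX_LEN = max(len(w) for w in DICTIONARY)

    def extract_morphemes(text):
        """return all (start, end, morpheme) dictionary matches at every position."""
        matches = []
        for i in range(len(text)):
            for j in range(i + 1, min(i + MAX_LEN, len(text)) + 1):
                candidate = text[i:j]
                if candidate in DICTIONARY:
                    matches.append((i, j, candidate))
        return matches

    print(extract_morphemes("東京都に住む"))
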
a pattern matching method for finding noun and proper noun translations from noisy parallel corpora. we present a pattern matching method for compiling a bilingual lexicon of nouns and proper nouns from unaligned, noisy parallel texts of asian/indo-european language pairs. tagging information from one of the languages is used. word frequency and position information for high and low frequency words are represented in two different vector forms for pattern matching. new anchor point finding and noise elimination techniques are introduced. we obtained a precision of 73.1%. we also show how the results can be used in the compilation of domain-specific noun phrases. robust word sense translation by em learning of frame semantics. we propose a robust method of automatically constructing a bilingual word sense dictionary from readily available monolingual ontologies by using expectation-maximization, without any annotated training data or manual tuning. we demonstrate our method on the english framenet and chinese hownet structures. owing to the robustness of em iterations in improving translation likelihoods, our word sense translation accuracies are very high, at 82% on average, for the 11 most ambiguous words in the english framenet with 5 senses or more. we also carried out a pilot study on using this automatically generated bilingual word sense dictionary to choose the best translation candidates and show the first significant evidence that frame semantics are useful for translation disambiguation. translation disambiguation accuracy using frame semantics is 75%, compared to 15% by using dictionary glossing only. these results demonstrate the great potential for future application of bilingual frame semantics to machine translation tasks. automatic speech recognition and its application to information extraction. this paper describes recent progress and the author's perspectives on speech recognition technology. applications of speech recognition technology can be classified into two main areas, dictation and human-computer dialogue systems. in the dictation domain, automatic broadcast news transcription is now being actively investigated, especially under the darpa project. the broadcast news dictation technology has recently been integrated with information extraction and retrieval technology, and many application systems, such as automatic voice document indexing and retrieval systems, are under development. in the human-computer interaction domain, a variety of experimental systems for information retrieval through spoken dialogue are being investigated. in spite of the remarkable recent progress, we are still far from our ultimate goal of understanding free conversational speech uttered by any speaker in any environment. this paper also describes the most important research issues that we should attack in order to advance toward our ultimate goal of fluent speech recognition. splitting long or ill-formed input for robust spoken-language translation. this paper proposes an input-splitting method for translating spoken language, which includes many long or ill-formed expressions. the proposed method splits input into well-balanced translation units based on a semantic distance calculation. the splitting is performed during left-to-right parsing, and does not degrade translation efficiency. the complete translation result is formed by concatenating the partial translation results of each split unit.
the proposed method can be incorporated into frameworks like tdmt, which utilize left-to-right parsing and a score for a substructure. experimental results show that the proposed method gives tdmt the following advantages: (1) elimination of null outputs, (2) splitting of utterances into sentences, and (3) robust translation of erroneous speech recognition results. combining acoustic and pragmatic features to predict recognition performance in spoken dialogue systems. we use machine learners trained on a combination of acoustic confidence and pragmatic plausibility features computed from dialogue context to predict the accuracy of incoming n-best recognition hypotheses to a spoken dialogue system. our best results show a 25% weighted f-score improvement over a baseline system that implements a "grammar-switching" approach to context-sensitive speech recognition. automatic extraction of subcorpora based on subcategorization frames from a part-of-speech tagged corpus. this paper presents a method for extracting subcorpora documenting different subcategorization frames for verbs, nouns, and adjectives in the 100-million-word british national corpus. the extraction tool consists of a set of batch files for use with the corpus query processor (cqp), which is part of the ims corpus workbench (cf. christ 1994a, b). a macroprocessor has been developed that allows the user to specify in a simple input file which subcorpora are to be created for a given lemma. the resulting subcorpora can be used (1) to provide evidence for the subcategorization properties of a given lemma, and to facilitate the selection of corpus lines for lexicographic research, and (2) to determine the frequencies of different syntactic contexts of each lemma. a program for aligning sentences in bilingual corpora. researchers in both machine translation (e.g., brown et al. 1990) and bilingual lexicography (e.g., klavans and tzoukermann 1990) have recently become interested in studying bilingual corpora, bodies of text such as the canadian hansards (parliamentary proceedings), which are available in multiple languages (such as french and english). one useful step is to align the sentences, that is, to identify correspondences between sentences in one language and sentences in the other language. this paper will describe a method and a program (align) for aligning sentences based on a simple statistical model of character lengths. the program uses the fact that longer sentences in one language tend to be translated into longer sentences in the other language, and that shorter sentences tend to be translated into shorter sentences. a probabilistic score is assigned to each proposed correspondence of sentences, based on the scaled difference of lengths of the two sentences (in characters) and the variance of this difference. this probabilistic score is used in a dynamic programming framework to find the maximum likelihood alignment of sentences. it is remarkable that such a simple approach works as well as it does. an evaluation was performed based on a trilingual corpus of economic reports issued by the union bank of switzerland (ubs) in english, french, and german. the method correctly aligned all but 4% of the sentences. moreover, it is possible to extract a large subcorpus that has a much smaller error rate. by selecting the best-scoring 80% of the alignments, the error rate is reduced from 4% to 0.7%.
there were more errors on the english-french subcorpus than on the english-german subcorpus, showing that error rates will depend on the corpus considered; however, both were small enough to hope that the method will be useful for many language pairs. to further research on bilingual corpora, a much larger sample of canadian hansards (approximately 90 million words, half in english and half in french) has been aligned with the align program and will be available through the data collection initiative of the association for computational linguistics (acl/dci). in addition, in order to facilitate replication of the align program, an appendix is provided with detailed c-code of the more difficult core of the align program. estimating upper and lower bounds on the performance of word-sense disambiguation programs. we have recently reported on two new word-sense disambiguation systems, one trained on bilingual material (the canadian hansards) and the other trained on monolingual material (roget's thesaurus and grolier's encyclopedia). after using both the monolingual and bilingual classifiers for a few months, we have convinced ourselves that the performance is remarkably good. nevertheless, we would really like to be able to make a stronger statement, and therefore, we decided to try to develop some more objective evaluation measures. although there has been a fair amount of literature on sense disambiguation, the literature does not offer much guidance on how we might establish the success or failure of a proposed solution such as the two systems mentioned in the previous paragraph. many papers avoid quantitative evaluations altogether, because it is so difficult to come up with credible estimates of performance. this paper will attempt to establish upper and lower bounds on the level of performance that can be expected in an evaluation. an estimate of the lower bound of 75% (averaged over ambiguous types) is obtained by measuring the performance produced by a baseline system that ignores context and simply assigns the most likely sense in all cases. an estimate of the upper bound is obtained by assuming that our ability to measure performance is largely limited by our ability to obtain reliable judgments from human informants. not surprisingly, the upper bound is very dependent on the instructions given to the judges. jorgensen, for example, suspected that lexicographers tend to depend too much on judgments by a single informant and, as she had suspected, found considerable variation over judgments (only 68% agreement). in our own experiments, we have set out to find word-sense disambiguation tasks where the judges can agree often enough so that we could show that they were outperforming the baseline system. under quite different conditions, we have found 96.8% agreement over judges. scalable inference and training of context-rich syntactic translation models. statistical mt has made great progress in the last few years, but current translation models are weak on re-ordering and target language fluency. syntactic approaches seek to remedy these problems. in this paper, we take the framework for acquiring multi-level syntactic translation rules of (galley et al., 2004) from aligned tree-string pairs, and present two main extensions of their approach: first, instead of merely computing a single derivation that minimally explains a sentence pair, we construct a large number of derivations that include contextually richer rules, and account for multiple interpretations of unaligned words.
second, we propose probability estimates and a training procedure for weighting these rules. we contrast different approaches on real examples, show that our estimates based on multiple derivations favor phrasal re-orderings that are linguistically better motivated, and establish that our larger rules provide a 3.63 bleu point increase over minimal rules. discourse segmentation of multi-party conversation. we present a domain-independent topic segmentation algorithm for multi-party speech. our feature-based algorithm combines knowledge about content, using a text-based algorithm as a feature, and about form, using linguistic and acoustic cues about topic shifts extracted from speech. this segmentation algorithm uses automatically induced decision rules to combine the different features. the embedded text-based algorithm builds on lexical cohesion and has performance comparable to state-of-the-art algorithms based on lexical information. a significant error reduction is obtained by combining the two knowledge sources. identifying agreement and disagreement in conversational speech: use of bayesian networks to model pragmatic dependencies. we describe a statistical approach for modeling agreements and disagreements in conversational interaction. our approach first identifies adjacency pairs using maximum entropy ranking based on a set of lexical, durational, and structural features that look both forward and backward in the discourse. we then classify utterances as agreement or disagreement using these adjacency pairs and features that represent various pragmatic influences of previous agreement or disagreement on the current utterance. our approach achieves 86.9% accuracy, a 4.9% increase over previous work. a synopsis of learning to recognize names across languages. the development of natural language processing (nlp) systems that perform machine translation (mt) and information retrieval (ir) has highlighted the need for the automatic recognition of proper names. while various name recognizers have been developed, they suffer from being too limited; some only recognize one name class, and all are language-specific. this work develops an approach to multilingual name recognition that uses machine learning and a portable framework to simplify the porting task by maximizing reuse and automation. semantic-head-based resolution of scopal ambiguities. we introduce an algorithm for scope resolution in underspecified semantic representations. scope preferences are suggested on the basis of semantic argument structure. the major novelty of this approach is that, while maintaining a (scopally) underspecified semantic representation, we at the same time suggest a resolution possibility. the algorithm has been implemented and tested in a large-scale system and fared quite well: 28% of the utterances were ambiguous, 80% of these were correctly interpreted, leaving errors in only 5.7% of the utterance set. machine-learned contexts for linguistic operations in german sentence realization. we show that it is possible to learn the contexts for linguistic operations which map a semantic representation to a surface syntactic tree in sentence realization with high accuracy. we cast the problem of learning the contexts for the linguistic operations as classification tasks, and apply straightforward machine learning techniques, such as decision tree learning. the training data consist of linguistic features extracted from syntactic and semantic representations produced by a linguistic analysis system.
the target features are extracted from links to surface syntax trees. our evidence consists of four examples from the german sentence realization system code-named amalgam: case assignment, assignment of verb position features, extraposition, and syntactic aggregation. exploring asymmetric clustering for statistical language modeling. the n-gram model is a stochastic model, which predicts the next word (predicted word) given the previous words (conditional words) in a word sequence. the cluster n-gram model is a variant of the n-gram model in which similar words are classified in the same cluster. it has been demonstrated that using different clusters for predicted and conditional words leads to cluster models that are superior to classical cluster models which use the same clusters for both words. this is the basis of the asymmetric cluster model (acm) discussed in our study. in this paper, we first present a formal definition of the acm. we then describe in detail the methodology of constructing the acm. the effectiveness of the acm is evaluated on a realistic application, namely japanese kana-kanji conversion. experimental results show substantial improvements of the acm in comparison with classical cluster models and word n-gram models at the same model size. our analysis shows that the high performance of the acm lies in the asymmetry of the model. distribution-based pruning of backoff language models. we propose distribution-based pruning of n-gram backoff language models. instead of the conventional approach of pruning n-grams that are infrequent in training data, we prune n-grams that are likely to be infrequent in a new document. our method is based on the n-gram distribution, i.e. the probability that an n-gram occurs in a new document. experimental results show that our method performed 7-9% better (in terms of word perplexity reduction) than conventional cutoff methods. improved source-channel models for chinese word segmentation. this paper presents a chinese word segmentation system that uses improved source-channel models of chinese sentence generation. chinese words are defined as one of the following four types: lexicon words, morphologically derived words, factoids, and named entities. our system provides a unified approach to the four fundamental features of word-level chinese language processing: (1) word segmentation, (2) morphological analysis, (3) factoid detection, and (4) named entity recognition. the performance of the system is evaluated on a manually annotated test set, and is also compared with several state-of-the-art systems, taking into account the fact that the definition of chinese words often varies from system to system. unsupervised learning of dependency structure for language modeling. this paper presents a dependency language model (dlm) that captures linguistic constraints via a dependency structure, i.e., a set of probabilistic dependencies that express the relations between headwords of each phrase in a sentence by an acyclic, planar, undirected graph. our contributions are three-fold. first, we incorporate the dependency structure into an n-gram language model to capture long distance word dependency. second, we present an unsupervised learning method that discovers the dependency structure of a sentence using a bootstrapping procedure. finally, we evaluate the proposed models on a realistic application (japanese kana-kanji conversion). experiments show that the best dlm achieves an 11.3% error rate reduction over the word trigram model.
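as an illustration of the decoding step shared by source-channel word segmenters such as the one described above, the sketch below uses dynamic programming to find the most probable segmentation of an unsegmented string under a toy word unigram model; the lexicon, the probabilities and the out-of-vocabulary fallback are invented for the example, and the paper's system is considerably richer (morphological analysis, factoid detection, named entity recognition).

    import math

    # toy lexicon with unigram probabilities (illustrative values only)
    LEXICON = {"北京": 0.02, "大学": 0.03, "北京大学": 0.01, "生": 0.005, "大学生": 0.008}
    MAX_WORD_LEN = max(len(w) for w in LEXICON)
    OOV_LOGPROB = math.log(1e-8)  # fallback score for unknown single characters

    def segment(sentence):
        """viterbi-style search for the most probable segmentation under a unigram model."""
        n = len(sentence)
        best = [(-math.inf, None)] * (n + 1)  # (log probability, backpointer) per position
        best[0] = (0.0, None)
        for end in range(1, n + 1):
            for start in range(max(0, end - MAX_WORD_LEN), end):
                word = sentence[start:end]
                if word in LEXICON:
                    logp = math.log(LEXICON[word])
                elif len(word) == 1:
                    logp = OOV_LOGPROB
                else:
                    continue
                score = best[start][0] + logp
                if score > best[end][0]:
                    best[end] = (score, start)
        # follow backpointers to recover the word sequence
        words, end = [], n
        while end > 0:
            start = best[end][1]
            words.append(sentence[start:end])
            end = start
        return list(reversed(words))

    print(segment("北京大学生"))  # -> ['北京', '大学生'] under these toy probabilities
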
approximation lasso methods for language modeling. lasso is a regularization method for parameter estimation in linear models. it optimizes the model parameters with respect to a loss function subject to a constraint on model complexity. this paper explores the use of lasso for statistical language modeling for text input. owing to the very large number of parameters, directly optimizing the penalized lasso loss function is impossible. therefore, we investigate two approximation methods, the boosted lasso (blasso) and the forward stagewise linear regression (fslr). both methods, when used with the exponential loss function, bear strong resemblance to the boosting algorithm which has been used as a discriminative training method for language modeling. evaluations on the task of japanese text input show that blasso is able to produce the best approximation to the lasso solution, and leads to a significant improvement, in terms of character error rate, over boosting and the traditional maximum likelihood estimation. improving language model size reduction using better pruning criteria. reducing language model (lm) size is a critical issue when applying an lm to realistic applications which have memory constraints. in this paper, three measures are studied for the purpose of lm pruning. they are probability, rank, and entropy. we evaluated the performance of the three pruning criteria in a real application of chinese text input in terms of character error rate (cer). we first present an empirical comparison, showing that rank performs the best in most cases. we also show that the high performance of rank lies in its strong correlation with error rate. we then present a novel method of combining two criteria in model pruning. experimental results show that the combined criterion consistently leads to smaller models than the models pruned using either of the criteria separately, at the same cer. towards a self-extending parser. this paper discusses an approach to incremental learning in natural language processing. the technique of projecting and integrating semantic constraints to learn word definitions is analyzed as implemented in the politics system. extensions and improvements of this technique are developed. the problem of generalizing existing word meanings and understanding metaphorical uses of words is addressed in terms of semantic constraint integration. refined lexicon models for statistical machine translation using a maximum entropy approach. typically, the lexicon models used in statistical machine translation systems do not include any kind of linguistic or contextual information, which often leads to problems in performing correct word sense disambiguation. one way to deal with this problem within the statistical framework is to use maximum entropy methods. in this paper, we present how to use this type of information within a statistical machine translation system. we show that it is possible to significantly decrease training and test corpus perplexity of the translation models. in addition, we perform a rescoring of n-best lists using our maximum entropy model and thereby yield an improvement in translation quality. experimental results are presented on the so-called "verbmobil task". unifying parallels. i show that the equational treatment of ellipsis proposed in (dalrymple et al., 1991) can further be viewed as modeling the effect of parallelism on semantic interpretation.
i illustrate this claim by showing that the account straightforwardly extends to a general treatment of sloppy identity on the one hand, and to deaccented foci on the other. i also briefly discuss the results obtained in a prototype implementation. generating minimal definite descriptions. the incremental algorithm introduced in (dale and reiter, 1995) for producing distinguishing descriptions does not always generate a minimal description. in this paper, i show that when generalised to sets of individuals and disjunctive properties, this approach might generate unnecessarily long and ambiguous and/or epistemically redundant descriptions. i then present an alternative, constraint-based algorithm and show that it builds on existing related algorithms in that (i) it produces minimal descriptions for sets of individuals using positive, negative and disjunctive properties, (ii) it straightforwardly generalises to n-ary relations and (iii) it is integrated with surface realisation. efficient parsing for french. parsing with categorial grammars often leads to problems such as proliferating lexical ambiguity, spurious parses and overgeneration. this paper presents a parser for french developed on a unification-based categorial grammar (fg) which avoids these problems. this parser is a bottom-up chart parser augmented with a heuristic eliminating spurious parses. the unicity and completeness of parsing are proved. higher-order coloured unification and natural language semantics. in this paper, we show that higher-order coloured unification - a form of unification developed for automated theorem proving - provides a general theory for modeling the interface between the interpretation process and other sources of linguistic, non-semantic information. in particular, it provides the general theory for the primary occurrence restriction which (dalrymple et al., 1991)'s analysis called for. coreference handling in xmg. we claim that existing specification languages for tree-based grammars fail to adequately support identifier management. we then show that xmg (extensible meta-grammar) provides a sophisticated treatment of identifiers which is effective in supporting a linguist-friendly grammar design. generating with a grammar based on tree descriptions: a constraint-based approach. while the generative view of language processing builds bigger units out of smaller ones by means of rewriting steps, the axiomatic view eliminates invalid linguistic structures out of a set of possible structures by means of well-formedness principles. we present a generator based on the axiomatic view and argue that when combined with a tag-like grammar and a flat semantics, this axiomatic view permits avoiding drawbacks known to hold of either top-down or bottom-up generators. acquiring receptive morphology: a connectionist model. this paper describes a modular connectionist model of the acquisition of receptive inflectional morphology. the model takes inputs in the form of phones one at a time and outputs the associated roots and inflections. simulations using artificial language stimuli demonstrate the capacity of the model to learn suffixation, prefixation, infixation, circumfixation, mutation, template, and deletion rules. separate network modules responsible for syllables enable the network to learn simple reduplication rules as well. the model also embodies constraints against association-line crossing. conceptual coherence in the generation of referring expressions.
one of the challenges in the automatic generation of referring expressions is to identify a set of domain entities coherently, that is, from the same conceptual perspective. we describe and evaluate an algorithm that generates a conceptually coherent description of a target set. the design of the algorithm is motivated by the results of psycholinguistic experiments. a geometric view on bilingual lexicon extraction from comparable corpora. we present a geometric view on bilingual lexicon extraction from comparable corpora, which allows us to re-interpret the methods proposed so far and to identify unresolved problems. this motivates three new methods that aim at solving these problems. empirical evaluation shows the strengths and weaknesses of these methods, as well as a significant gain in the accuracy of extracted lexicons. processing broadcast audio for information access. this paper addresses recent progress in speaker-independent, large-vocabulary, continuous speech recognition, which has opened up a wide range of near- and mid-term applications. one rapidly expanding application area is the processing of broadcast audio for information access. at limsi, broadcast news transcription systems have been developed for english, french, german, mandarin and portuguese, and systems for other languages are under development. audio indexation must take into account the specificities of audio data, such as needing to deal with the continuous data stream and an imperfect word transcription. some near-term application areas are audio data mining, selective dissemination of information and media monitoring. growing semantic grammars. a critical path in the development of natural language understanding (nlu) modules lies in the difficulty of defining a mapping from words to semantics: usually it takes on the order of years of highly skilled labor to develop a semantic mapping, e.g., in the form of a semantic grammar, that is comprehensive enough for a given domain. yet, due to the very nature of human language, such mappings invariably fail to achieve full coverage on unseen data. acknowledging the impossibility of stating a priori all the surface forms by which a concept can be expressed, we present gsg: an empathic computer system for the rapid deployment of nlu front-ends and their dynamic customization by non-expert end-users. given a new domain for which an nlu front-end is to be developed, two stages are involved. in the authoring stage, gsg aids the developer in the construction of a simple domain model and a kernel analysis grammar. then, in the run-time stage, gsg provides the end-user with an interactive environment in which the kernel grammar is dynamically extended. three learning methods are employed in the acquisition of semantic mappings from unseen data: (i) parser predictions, (ii) a hidden understanding model, and (iii) end-user paraphrases. a baseline version of gsg has been implemented and preliminary experiments show promising results. priority union and generalization in discourse grammars. we describe an implementation in carpenter's typed feature formalism, ale, of a discourse grammar of the kind proposed by scha, polanyi, et al. we examine their method for resolving parallelism-dependent anaphora and show that there is a coherent feature-structural rendition of this type of grammar which uses the operations of priority union and generalization.
we describe an augmentation of the ale system to encompass these operations, and we show that an appropriate choice of definition for priority union gives the desired multiple outputs for examples of vp-ellipsis which exhibit a strict/sloppy ambiguity. dynamic programming for parsing and estimation of stochastic unification-based grammars. stochastic unification-based grammars (subgs) define exponential distributions over the parses generated by a unification-based grammar (ubg). existing algorithms for parsing and estimation require the enumeration of all of the parses of a string in order to determine the most likely one, or in order to calculate the statistics needed to estimate a grammar from a training corpus. this paper describes a graph-based dynamic programming algorithm for calculating these statistics from the packed ubg parse representations of maxwell and kaplan (1995) which does not require enumerating all parses. like many graphical algorithms, the dynamic programming algorithm's complexity is worst-case exponential, but is often polynomial. the key observation is that by using maxwell and kaplan packed representations, the required statistics can be rewritten as either the max or the sum of a product of functions. this is exactly the kind of problem which can be solved by dynamic programming over graphical models. xml-based data preparation for robust deep parsing. we describe the use of xml tokenisation, tagging and mark-up tools to prepare a corpus for parsing. our techniques are generally applicable but here we focus on parsing medline abstracts with the anlt wide-coverage grammar. hand-crafted grammars inevitably lack coverage but many coverage failures are due to inadequacies of their lexicons. we describe a method of gaining a degree of robustness by interfacing pos tag information with the existing lexicon. we also show that xml tools provide a sophisticated approach to pre-processing, helping to ameliorate the 'messiness' in real language data and improve parse performance. on interpreting f-structures as udrss. we describe a method for interpreting abstract flat syntactic representations, lfg f-structures, as underspecified semantic representations, here underspecified discourse representation structures (udrss). the method establishes a one-to-one correspondence between subsets of the lfg and udrs formalisms. it provides a model-theoretic interpretation and an inferential component which operates directly on underspecified representations for f-structures through the translation images of f-structures as udrss. segment-based hidden markov models for information extraction. hidden markov models (hmms) are powerful statistical models that have found successful applications in information extraction (ie). in current approaches to applying hmms to ie, an hmm is used to model text at the document level. this modelling might cause undesired redundancy in extraction in the sense that more than one filler is identified and extracted. we propose to use hmms to model text at the segment level, in which the extraction process consists of two steps: a segment retrieval step followed by an extraction step. in order to retrieve extraction-relevant segments from documents, we introduce a method to use hmms to model and retrieve segments. our experimental results show that the resulting segment hmm ie system not only achieves near-zero extraction redundancy, but also has better overall extraction performance than traditional document hmm ie systems.
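to make the hmm machinery behind such ie systems concrete, here is a minimal viterbi decoder for a two-state hmm that labels each word of an utterance as background text or filler; the states, vocabulary and probabilities are toy values chosen for the example and do not reproduce the segment-level model of the paper.

    import math

    # toy hmm: states are ie labels, observations are words (illustrative values only)
    STATES = ["background", "filler"]
    START = {"background": 0.8, "filler": 0.2}
    TRANS = {"background": {"background": 0.7, "filler": 0.3},
             "filler": {"background": 0.4, "filler": 0.6}}
    EMIT = {"background": {"the": 0.3, "price": 0.1, "is": 0.3, "42": 0.05, "dollars": 0.05},
            "filler": {"the": 0.05, "price": 0.05, "is": 0.05, "42": 0.5, "dollars": 0.3}}
    UNSEEN = 1e-6  # smoothing for words missing from the emission tables

    def viterbi(words):
        """return the most likely state sequence for the observed words."""
        v = [{s: math.log(START[s]) + math.log(EMIT[s].get(words[0], UNSEEN)) for s in STATES}]
        back = [{}]
        for t in range(1, len(words)):
            v.append({})
            back.append({})
            for s in STATES:
                prev = max(STATES, key=lambda p: v[t - 1][p] + math.log(TRANS[p][s]))
                v[t][s] = v[t - 1][prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s].get(words[t], UNSEEN))
                back[t][s] = prev
        # trace back the highest-scoring path
        path = [max(STATES, key=lambda s: v[-1][s])]
        for t in range(len(words) - 1, 0, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path))

    print(viterbi("the price is 42 dollars".split()))
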
an empirical evaluation of probabilistic lexicalized tree insertion grammars. we present an empirical study of the applicability of probabilistic lexicalized tree insertion grammars (pltig), a lexicalized counterpart to probabilistic context-free grammars (pcfg), to problems in stochastic natural-language processing. comparing the performance of pltigs with non-hierarchical n-gram models and pcfgs, we show that pltig combines the best aspects of both, with language modeling capability comparable to n-grams, and improved parsing performance over its nonlexicalized counterpart. furthermore, training of pltigs displays faster convergence than pcfgs. entropy rate constancy in text. we present a constancy rate principle governing language generation. we show that this principle implies that local measures of entropy (ignoring context) should increase with the sentence number. we demonstrate that this is indeed the case by measuring entropy in three different ways. we also show that this effect has both lexical (which words are used) and non-lexical (how the words are used) causes. anaphora resolution: short-term memory and focusing. anaphora resolution is the process of determining the referent of anaphors, such as definite noun phrases and pronouns, in a discourse. computational linguists, in modeling the process of anaphora resolution, have proposed the notion of focusing. focusing is the process, engaged in by a reader, of selecting a subset of the discourse items and making them highly available for further computations. this paper provides a cognitive basis for anaphora resolution and focusing. human memory is divided into a short-term, an operating, and a long-term memory. short-term memory can only contain a small number of meaning units and its retrieval time is fast. short-term memory is divided into a cache and a buffer. the cache contains a subset of meaning units expressed in the previous sentences and the buffer holds a representation of the incoming sentence. focusing is realized in the cache, which contains a subset of the most topical units and a subset of the most recent units in the text. the information stored in the cache is used to integrate the incoming sentence with the preceding discourse. pronouns should be used to refer to units in focus. operating memory contains a very large number of units but its retrieval time is slow. it contains the previous text units that are not in the cache, i.e., the text units not in focus. definite noun phrases should be used to refer to units not in focus. two empirical studies are described that demonstrate the cognitive basis for focusing, the use of definite noun phrases to refer to antecedents not in focus, and the use of pronouns to refer to antecedents in focus. word order in german: a formal dependency grammar using a topological hierarchy. this paper proposes a description of german word order including phenomena considered complex, such as scrambling, (partial) vp fronting and verbal pied piping. our description relates a syntactic dependency structure directly to a topological hierarchy without resorting to movement or similar mechanisms. mechanisms for mixed-initiative human-computer collaborative discourse. in this paper, we examine mechanisms for automatic dialogue initiative setting.
we show how to incorporate initiative changing in a task-oriented human-computer dialogue system, and we evaluate the effects of initiative both analytically and via computer-computer dialogue simulation. a polynomial parsing algorithm for the topological model: synchronizing constituent and dependency grammars, illustrated by german word order phenomena. this paper describes a minimal topology-driven parsing algorithm for topological grammars that synchronizes a rewriting grammar and a dependency grammar, obtaining two linguistically motivated syntactic structures. the use of non-local slash and visitor features can be restricted to obtain a cky-type analysis in polynomial time. german long distance phenomena illustrate the algorithm, bringing to the fore the procedural needs of the analyses of syntax-topology mismatches in constraint-based approaches such as hpsg. one tokenization per source. we report in this paper the observation of one tokenization per source. that is, the same critical fragment in different sentences from the same source almost always realizes one and the same of its many possible tokenizations. this observation proves very helpful in sentence tokenization practice, and is argued to have far-reaching implications for natural language processing. supervised grammar induction using training data with limited constituent information. corpus-based grammar induction generally relies on hand-parsed training data to learn the structure of the language. unfortunately, the cost of building large annotated corpora is prohibitively expensive. this work aims to improve the induction strategy when there are few labels in the training data. we show that the most informative linguistic constituents are the higher nodes in the parse trees, typically denoting complex noun phrases and sentential clauses. they account for only 20% of all constituents. for inducing grammars from sparsely labeled training data (e.g., only higher-level constituent labels), we propose an adaptation strategy, which produces grammars that parse almost as well as grammars induced from fully labeled corpora. our results suggest that for a partial parser to replace human annotators, it must be able to automatically extract higher-level constituents rather than base noun phrases. fast decoding and optimal decoding for machine translation. a good decoding algorithm is critical to the success of any statistical machine translation system. the decoder's job is to find the translation that is most likely according to a set of previously learned parameters (and a formula for combining them). since the space of possible translations is extremely large, typical decoding algorithms are only able to examine a portion of it, thus risking missing good solutions. in this paper, we compare the speed and output quality of a traditional stack-based decoding algorithm with two new decoders: a fast greedy decoder and a slow but optimal decoder that treats decoding as an integer-programming optimization problem. memory capacity and sentence processing. the limited capacity of working memory is intrinsic to human sentence processing, and therefore must be addressed by any theory of human sentence processing. this paper gives a theory of garden-path effects and processing overload that is based on simple assumptions about human short-term memory capacity. accessing germanet data and computing semantic relatedness. 
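the "one tokenization per source" observation above suggests caching the first tokenization chosen for a critical fragment and reusing it for later sentences from the same source; the sketch below is a minimal, hypothetical rendering of that idea, with a stand-in segmenter.

class SourceTokenizer:
    def __init__(self, base_tokenizer):
        self.base = base_tokenizer
        self.cache = {}                         # (source, fragment) -> tokenization

    def tokenize(self, source, fragment):
        key = (source, fragment)
        if key not in self.cache:               # first occurrence decides the tokenization
            self.cache[key] = self.base(fragment)
        return self.cache[key]

toks = SourceTokenizer(lambda s: s.split())     # toy segmenter standing in for a real one
print(toks.tokenize("newswire-a", "database management system"))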
we present an api developed to access germanet, a lexical semantic database for german represented in xml. the api provides a set of software functions for parsing and retrieving information from germanet. then, we present a case study which builds upon the germanet api and implements an application for computing semantic relatedness according to five different metrics. the package can, again, serve as a software library to be deployed in natural language processing applications. a graphical user interface allows users to experiment interactively with the system. grammar viewed as a functional part of a cognitive system. how can grammar be viewed as a functional part of a cognitive system? given a neural basis for the processing control paradigm of language performance, what roles does "grammar" play? is there evidence to suggest that grammatical processing can be independent from other aspects of language processing? this paper will focus on these issues and suggest answers within the context of one computational solution. the example model of sentence comprehension, hope, is intended to demonstrate both representational considerations for a grammar within such a system as well as to illustrate that by interpreting a grammar as a feedback control mechanism of a "neural-like" process, additional insights into language processing can be obtained. subject-dependent co-occurrence and word sense disambiguation. we describe a method for obtaining subject-dependent word sets relative to some (subject) domain. using the subject classifications given in the machine-readable version of longman's dictionary of contemporary english, we established subject-dependent co-occurrence links between words of the defining vocabulary to construct these "neighborhoods". here, we describe the application of these neighborhoods to information retrieval, and present a method of word sense disambiguation based on these co-occurrences, an extension of previous work. would i lie to you? modelling misrepresentation and context in dialogue. in this paper we discuss a mechanism for modifying context in a tutorial dialogue. the context mechanism imposes a pedagogically motivated misrepresentation (pmm) on a dialogue to achieve instructional goals. in the paper, we outline several types of pmms and detail a particular pmm in a sample dialogue situation. while the notion of a pmm is specifically oriented towards tutorial dialogue, misrepresentation has interesting implications for context in dialogue situations generally, and also suggests that grice's maxim of quality needs to be modified. loosely tree-based alignment for machine translation. we augment a model of translation based on re-ordering nodes in syntactic trees in order to allow alignments not conforming to the original tree structure, while keeping computational complexity polynomial in the sentence length. this is done by adding a new subtree cloning operation to either tree-to-string or tree-to-tree alignment algorithms. reduced n-gram models for english and chinese corpora. statistical language models should improve as the size of the n-grams increases from 3 to 5 or higher. however, the number of parameters and calculations, and the storage requirement increase very rapidly if we attempt to store all possible combinations of n-grams. to avoid these problems, the reduced n-grams' approach previously developed by o'boyle (1993) can be applied. a reduced n-gram language model can store an entire corpus's phrase-history length within feasible storage limits. 
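a minimal sketch of the kind of subject-dependent co-occurrence neighbourhoods described in the word sense disambiguation abstract above; the two dictionary entries are invented and are not from ldoce.

from collections import defaultdict
from itertools import combinations

entries = [
    ("economics", ["bank", "money", "interest"]),
    ("geography", ["bank", "river", "slope"]),
]

neighbourhoods = defaultdict(lambda: defaultdict(set))
for subject, defining_words in entries:
    for w1, w2 in combinations(defining_words, 2):
        neighbourhoods[subject][w1].add(w2)
        neighbourhoods[subject][w2].add(w1)

# a sense of "bank" can then be preferred according to which subject
# neighbourhood overlaps most with the words found in its context
print(sorted(neighbourhoods["economics"]["bank"]))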
another theoretical advantage of reduced n-grams is that they are closer to being semantically complete than traditional models, which include all n-grams. in our experiments, the reduced n-gram zipf curves are first presented, and compared with previously obtained conventional n-grams for both english and chinese. the reduced n-gram model is then applied to large english and chinese corpora. for english, we can reduce the model sizes, compared to traditional 7-gram model sizes, by factors of 14.6 for a 40-million-word corpus and 11.0 for a 500-million-word corpus, while obtaining 5.8% and 4.2% improvements in perplexity. for chinese, we gain a 16.9% perplexity reduction and reduce the model size by a factor larger than 11.2. this paper is a step towards the modeling of english and chinese using semantically complete phrases in an n-gram model. automatic labeling of semantic roles. we present a system for identifying the semantic relationships, or semantic roles, filled by constituents of a sentence within a semantic frame. various lexical and syntactic features are derived from parse trees and used to train statistical classifiers from hand-annotated training data. a generalization of the offline parsable grammars. the offline parsable grammars apparently have enough formal power to describe human language, yet the parsing problem for these grammars is solvable. unfortunately they exclude grammars that use x-bar theory - and these grammars have strong linguistic justification. we define a more general class of unification grammars, which admits x-bar grammars while preserving the desirable properties of offline parsable grammars. automatic induction of finite state transducers for simple phonological rules. this paper presents a method for learning phonological rules from sample pairs of underlying and surface forms, without negative evidence. the learned rules are represented as finite state transducers that accept underlying forms as input and generate surface forms as output. the algorithm for learning them is an extension of the ostia algorithm for learning general subsequential finite state transducers. although ostia is capable of learning arbitrary subsequential finite state transducers in the limit, large dictionaries of actual english pronunciations did not give enough samples to correctly induce phonological rules. we then augmented ostia with two kinds of knowledge specific to natural language phonology, biases from "universal grammar". one bias is that underlying phones are often realized as phonetically similar or identical surface phones. the other biases phonological rules to apply across natural phonological classes. the additions helped in learning more compact, accurate, and general transducers than the unmodified ostia algorithm. an implementation of the algorithm successfully learns a number of english postlexical rules. arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop. we present an approach to using a morphological analyzer for tokenizing and morphologically tagging (including part-of-speech tagging) arabic words in one process. we learn classifiers for individual morphological features, as well as ways of using these classifiers to choose among entries from the output of the analyzer. we obtain accuracy rates on all tasks in the high nineties. the necessity of parsing for predicate argument recognition. broad-coverage corpora annotated with semantic role, or argument structure, information are becoming available for the first time. 
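the semantic role labeling abstract above derives lexical and syntactic features from parse trees for statistical classifiers; the toy extractor below shows features of the sort one might use (phrase type, head word, position relative to the predicate, voice), though the exact feature set is an assumption rather than the paper's.

def constituent_features(constituent, predicate, voice):
    # constituent and predicate are dicts with a category 'label', a 'head' word,
    # and a word-index 'span' (start, end); the encoding is hypothetical
    return {
        "phrase_type": constituent["label"],
        "head_word": constituent["head"],
        "position": "before" if constituent["span"][1] <= predicate["span"][0] else "after",
        "voice": voice,
    }

argument = {"label": "np", "head": "window", "span": (2, 4)}
predicate = {"label": "vbd", "head": "broke", "span": (1, 2)}
print(constituent_features(argument, predicate, voice="active"))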
statistical systems have been trained to automatically label semantic roles from the output of statistical parsers on unannotated text. in this paper, we quantify the effect of parser accuracy on these systems' performance, and examine the question of whether a flatter "chunked" representation of the input can be as effective for the purposes of semantic role identification. magead: a morphological analyzer and generator for the arabic dialects. we present magead, a morphological analyzer and generator for the arabic language family. our work is novel in that it explicitly addresses the need for processing the morphology of the dialects. magead performs on-line analysis to, or generation from, a root+pattern+features representation; it has separate phonological and orthographic representations, and it allows for combining morphemes from different dialects. we present a detailed evaluation of magead. factoring synchronous grammars by sorting. synchronous context-free grammars (scfgs) have been successfully exploited as translation models in machine translation applications. when parsing with an scfg, computational complexity grows exponentially with the length of the rules, in the worst case. in this paper we examine the problem of factorizing each rule of an input scfg into a generatively equivalent set of rules, each having the smallest possible length. our algorithm works in time o(n log n), for each rule of length n. this improves upon previous results and solves an open problem about recognizing permutations that can be factored. separable verbs in a reusable morphological dictionary for german. separable verbs are verbs with prefixes which, depending on the syntactic context, can occur as one word written together or discontinuously. they occur in languages such as german and dutch and constitute a problem for nlp because they are lexemes whose forms cannot always be recognized by dictionary lookup on the basis of a text word. conventional solutions take a mixed lexical and syntactic approach. in this paper, we propose the solution offered by word manager, consisting of string-based recognition by means of rules of types also required for periphrastic inflection and clitics. in this way, separable verbs are dealt with as part of the domain of reusable lexical resources. we show how this solution compares favourably with conventional approaches. low-cost enrichment of spanish wordnet with automatically translated glosses: combining general and specialized models. this paper studies the enrichment of spanish wordnet with synset glosses automatically obtained from the english wordnet glosses using a phrase-based statistical machine translation system. we construct the english-spanish translation system from a parallel corpus of proceedings of the european parliament, and study how to adapt statistical models to the domain of dictionary definitions. we build specialized language and translation models from a small set of parallel definitions and experiment with robust ways of combining them. a statistically significant increase in performance is obtained. the best system is finally used to generate a definition for all spanish synsets, which are currently ready for a manual revision. as a complementary issue, we analyze the impact of the amount of in-domain data needed to improve a system trained entirely on out-of-domain data. semantics of temporal queries and temporal data. 
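a toy sketch of generation from a root+pattern representation of the kind magead's analyses are built on; the slot notation and the examples are hypothetical illustrations, not magead's internal representation.

def interdigitate(root, pattern):
    # fill the slots marked '1', '2', '3' in the pattern with the root radicals
    out = []
    for ch in pattern:
        out.append(root[int(ch) - 1] if ch in "123" else ch)
    return "".join(out)

print(interdigitate("ktb", "1a2a3"))    # -> "katab"
print(interdigitate("ktb", "ma12u3"))   # -> "maktub"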
this paper analyzes the requirements for adding a temporal reasoning component to a natural language database query system, and proposes a computational model that satisfies those requirements. a preliminary implementation in prolog is used to generate examples of the model's capabilities. resolving ellipsis in clarification. we offer a computational analysis of the resolution of ellipsis in certain cases of dialogue clarification. we show that this goes beyond standard techniques used in anaphora and ellipsis resolution and requires operations on highly structured, linguistically heterogeneous representations. we characterize these operations and the representations on which they operate. we offer an analysis couched in a version of head-driven phrase structure grammar combined with a theory of information states (is) in dialogue. we sketch an algorithm for the process of utterance integration in iss which leads to grounding or clarification. prototype-driven grammar induction. we investigate prototype-driven learning for primarily unsupervised grammar induction. prior knowledge is specified declaratively, by providing a few canonical examples of each target phrase type. this sparse prototype information is then propagated across a corpus using distributional similarity features, which augment an otherwise standard pcfg model. we show that distributional features are effective at distinguishing bracket labels, but not determining bracket locations. to improve the quality of the induced trees, we combine our pcfg induction with the ccm model of klein and manning (2002), which has complementary strengths: it identifies brackets but does not label them. using only a handful of prototypes, we show substantial improvements over naive pcfg induction for english and chinese grammar induction. scaling up from dialogue to multilogue: some principles and benchmarks. the paper considers how to scale up dialogue protocols to multilogue, i.e. settings with multiple conversationalists. we extract two benchmarks to evaluate scaled up protocols based on the long distance resolution possibilities of non-sentential utterances in dialogue and multilogue in the british national corpus. in light of these benchmarks, we then consider three possible transformations to dialogue protocols, formulated within an issue-based approach to dialogue management. we show that one such transformation yields protocols for querying and assertion that fulfill these benchmarks. selection of effective contextual information for automatic synonym acquisition. various methods have been proposed for automatic synonym acquisition, as synonyms are one of the most fundamental kinds of lexical knowledge. whereas many methods are based on contextual clues of words, little attention has been paid to which categories of contextual information are useful for the purpose. this study experimentally investigates the impact of contextual information selection, by extracting three kinds of word relationships from corpora: dependency, sentence co-occurrence, and proximity. the evaluation result shows that while dependency and proximity perform relatively well by themselves, the combination of two or more kinds of contextual information gives more stable performance. we further investigated useful selection of dependency relations and modification categories, and found that modification makes the greatest contribution, even greater than the widely adopted subject-object combination. 
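a compact sketch of the synonym-acquisition setup just described: contexts of several kinds (dependency, sentence co-occurrence, proximity) are merged into one vector per word and words are compared by cosine similarity; the tiny vectors are invented.

import math

def combine(*weighted_contexts):
    # each argument is (kind, {feature: weight}); features are keyed by kind to keep them distinct
    merged = {}
    for kind, vec in weighted_contexts:
        for feat, val in vec.items():
            merged[(kind, feat)] = merged.get((kind, feat), 0.0) + val
    return merged

def cosine(u, v):
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

car = combine(("dep", {"drive/obj": 3}), ("cooc", {"road": 5}), ("prox", {"fast": 2}))
automobile = combine(("dep", {"drive/obj": 2}), ("cooc", {"road": 4}), ("prox", {"fast": 1}))
print(cosine(car, automobile))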
phrase linguistic classification and generalization for improving statistical machine translation. in this paper a method to incorporate linguistic information regarding single-word and compound verbs is proposed, as a first step towards an smt model based on linguistically-classified phrases. by substituting these verb structures by the base form of the head verb, we achieve a better statistical word alignment performance, and are able to better estimate the translation model and generalize to unseen verb forms during translation. preliminary experiments for the english - spanish language pair are performed, and future research lines are detailed. centering in-the-large: computing referential discourse segments. we specify an algorithm that builds up a hierarchy of referential discourse segments from local centering data. the spatial extension and nesting of these discourse segments constrain the reachability of potential antecedents of an anaphoric expression beyond the local level of adjacent center pairs. thus, the centering model is scaled up to the level of the global referential structure of discourse. an empirical evaluation of the algorithm is supplied. semantic role labeling via framenet, verbnet and propbank. this article describes a robust semantic parser that uses a broad knowledge base created by interconnecting three major resources: framenet, verbnet and propbank. the framenet corpus contains the examples annotated with semantic roles whereas the verbnet lexicon provides the knowledge about the syntactic behavior of the verbs. we connect verbnet and framenet by mapping the framenet frames to the verbnet intersective levin classes. the propbank corpus, which is tightly connected to the verbnet lexicon, is used to increase the verb coverage and also to test the effectiveness of our approach. the results indicate that our model is an interesting step towards the design of more robust semantic parsers. a text understander that learns. we introduce an approach to the automatic acquisition of new concepts from natural language texts which is tightly integrated with the underlying text understanding process. the learning model is centered around the 'quality' of different forms of linguistic and conceptual evidence which underlies the incremental generation and refinement of alternative concept hypotheses, each one capturing a different conceptual reading for an unknown lexical item. speeding up full syntactic parsing by leveraging partial parsing decisions. parsing is a computationally intensive task due to the combinatorial explosion seen in chart parsing algorithms that explore possible parse trees. in this paper, we propose a method to limit the combinatorial explosion by restricting the cyk chart parsing algorithm based on the output of a chunk parser. when tested on the three parsers presented in (collins, 1999), we observed an approximate three-fold speedup with only an average decrease of 0.17% in both precision and recall. project april: a progress report. parsing techniques based on rules defining grammaticality are difficult to use with authentic inputs, which are often grammatically messy. instead, the april system seeks a labelled tree structure which maximizes a numerical measure of conformity to statistical norms derived from a sample of parsed text. no distinction between legal and illegal trees arises: any labelled tree has a value. 
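a minimal sketch of the chunk-restricted chart parsing idea described in the speed-up abstract above: spans that cross a chunk boundary are simply never built. the grammar encoding and the toy grammar are illustrative assumptions, not the setup used with the collins parsers.

def cky_with_chunks(words, chunks, unary, binary):
    # chunks: list of (start, end) word spans that must not be crossed;
    # unary: word -> set of labels; binary: (left_label, right_label) -> set of parent labels
    n = len(words)
    def crosses(i, j):
        return any(i < s < j < e or s < i < e < j for s, e in chunks)
    chart = {(i, i + 1): set(unary.get(words[i], ())) for i in range(n)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            if not crosses(i, j):                       # skip spans the chunker rules out
                for k in range(i + 1, j):
                    for left in chart[(i, k)]:
                        for right in chart[(k, j)]:
                            cell |= binary.get((left, right), set())
            chart[(i, j)] = cell
    return chart[(0, n)]

grammar_unary = {"dogs": {"np"}, "bark": {"vp"}}
grammar_binary = {("np", "vp"): {"s"}}
print(cky_with_chunks(["dogs", "bark"], chunks=[(0, 1), (1, 2)],
                      unary=grammar_unary, binary=grammar_binary))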
because the search space is large and has an irregular geometry, april seeks the best tree using simulated annealing, a stochastic optimization technique. beginning with an arbitrary tree, many randomly-generated local modifications are considered and adopted or rejected according to their effect on tree-value: acceptance decisions are made probabilistically, subject to a bias against adverse moves which is very weak at the outset but is made to increase as the random walk through the search space continues. this enables the system to converge on the global optimum without getting trapped in local optima. performance of an early version of the april system on authentic inputs is yielding analyses with a mean accuracy of 75.3% using a schedule which increases processing linearly with sentence-length; modifications currently being implemented should eliminate a high proportion of the remaining errors. domain kernels for word sense disambiguation. in this paper we present a supervised word sense disambiguation methodology that exploits kernel methods to model sense distinctions. in particular, a combination of kernel functions is adopted to estimate independently both syntagmatic and domain similarity. we defined a kernel function, namely the domain kernel, that allowed us to plug "external knowledge" into the supervised learning process. external knowledge is acquired from unlabeled data in a totally unsupervised way, and it is represented by means of domain models. we evaluated our methodology on several lexical sample tasks in different languages, significantly outperforming the state of the art for each of them, while reducing the amount of labeled training data required for learning. exploiting comparable corpora and bilingual dictionaries for cross-language text categorization. cross-language text categorization is the task of assigning semantic classes to documents written in a target language (e.g. english) while the system is trained using labeled documents in a source language (e.g. italian). in this work we present many solutions according to the availability of bilingual resources, and we show that it is possible to deal with the problem even when no such resources are accessible. the core technique relies on the automatic acquisition of multilingual domain models from comparable corpora. experiments show the effectiveness of our approach, providing a low cost solution for the cross-language text categorization task. in particular, when bilingual dictionaries are available the performance of the categorization gets close to that of monolingual text categorization. serial combination of rules and statistics: a case study in czech tagging. a hybrid system is described which combines the strength of manual rule-writing and statistical learning, obtaining results superior to both methods if applied separately. the combination of a rule-based system and a statistical one is not parallel but serial: the rule-based system, performing partial disambiguation with recall close to 100%, is applied first, and a trigram hmm tagger runs on its results. an experiment in czech tagging has been performed with encouraging results. a computational framework for composition in multiple linguistic domains. we describe a computational framework for a grammar architecture in which different linguistic domains such as morphology, syntax, and semantics are treated not as separate components but as compositional domains. 
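a generic simulated annealing skeleton of the kind april uses to search the space of labelled trees; propose(), value() and the cooling schedule here are placeholders, not april's actual components.

import math, random

def anneal(initial_tree, propose, value, steps=10000, t0=2.0, t_min=0.01):
    current, score, t = initial_tree, value(initial_tree), t0
    for step in range(steps):
        candidate = propose(current)                 # randomly generated local modification
        delta = value(candidate) - score
        # adverse moves are accepted with a probability that shrinks as the temperature falls
        if delta >= 0 or random.random() < math.exp(delta / t):
            current, score = candidate, score + delta
        t = max(t_min, t0 * (1.0 - step / steps))    # toy linear cooling schedule
    return current, score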
the framework is based on combinatory categorial grammars and it uses the morpheme as the basic building block of the categorial lexicon. topic-focus and salience. most of the current work on corpus annotation is concentrated on morphemics, lexical semantics and sentence structure. however, it becomes more and more obvious that attention should and can be also paid to phenomena that reflect the links between a sentence and its context, i.e. the discourse anchoring of utterances. if conceived in this way, an annotated corpus can be used as a resource for linguistic research not only within the limits of the sentence, but also with regard to discourse patterns. thus, the applications of the research to issues of information retrieval and extraction may be made more effective; also applications in new domains become feasible, be it to serve for inner linguistic (and literary) aims, such as text segmentation, specification of topics of parts of a discourse, or for other disciplines. lazy unification. unification-based nl parsers that copy argument graphs to prevent their destruction suffer from inefficiency. copying is the most expensive operation in such parsers, and several methods to reduce copying have been devised with varying degrees of success. lazy unification is presented here as a new, conceptually elegant solution that reduces copying by nearly an order of magnitude. lazy unification requires no new slots in the structure of nodes, and only nominal revisions to the unification algorithm. pcfgs with syntactic and prosodic indicators of speech repairs. a grammatical method of combining two kinds of speech repair cues is presented. one cue, prosodic disjuncture, is detected by a decision tree-based ensemble classifier that uses acoustic cues to identify where normal prosody seems to be interrupted (lickley, 1996). the other cue, syntactic parallelism, codifies the expectation that repairs continue a syntactic category that was left unfinished in the reparandum (levelt, 1983). the two cues are combined in a treebank pcfg whose states are split using a few simple tree transformations. parsing performance on the switchboard and fisher corpora suggests that these two cues help to locate speech repairs in a synergistic way. an unsupervised model for statistically determining coordinate phrase attachment. this paper examines the use of an unsupervised statistical model for determining the attachment of ambiguous coordinate phrases (cp) of the form n1 p n2 cc n3. the model presented here is based on [ar98], an unsupervised model for determining prepositional phrase attachment. after training on unannotated 1988 wall street journal text, the model performs at 72% accuracy on a development set from sections 14 through 19 of the wsj treebank [msm93]. attention shifting for parsing speech. we present a technique that improves the efficiency of word-lattice parsing as used in speech recognition language modeling. our technique applies a probabilistic parser iteratively where on each iteration it focuses on a different subset of the word-lattice. the parser's attention is shifted towards word-lattice subsets for which there are few or no syntactic analyses posited. this attention-shifting technique provides a six-times increase in speed (measured as the number of parser analyses evaluated) while performing equivalently when used as the first-stage of a multi-stage parsing-based language model. noun phrase chunking in hebrew: influence of lexical and morphological features. 
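a bare-bones sketch of the unsupervised attachment decision for coordinate phrases of the form n1 p n2 cc n3 described above: compare how strongly n3 associates with n1 versus n2 in unannotated text. the association measure and the toy counts are simplifications for illustration, not the model of the paper.

from collections import Counter

pair_counts = Counter()          # co-occurrence counts gathered from unannotated text
noun_counts = Counter()

def attach(n1, n2, n3):
    def assoc(a, b):
        denom = noun_counts[a] * noun_counts[b]
        return pair_counts[(a, b)] / denom if denom else 0.0
    return "high (coordinate with n1)" if assoc(n1, n3) >= assoc(n2, n3) else "low (coordinate with n2)"

noun_counts.update({"busloads": 5, "executives": 40, "wives": 30})
pair_counts[("executives", "wives")] = 12
print(attach("busloads", "executives", "wives"))     # -> low (coordinate with n2)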
we present a method for noun phrase chunking in hebrew. we show that the traditional definition of base-nps as non-recursive noun phrases does not apply in hebrew, and propose an alternative definition of simple nps. we review syntactic properties of hebrew related to noun phrases, which indicate that the task of hebrew simplenp chunking is harder than base-np chunking in english. as a confirmation, we apply methods known to work well for english to hebrew data. these methods give low results (f from 76 to 86) in hebrew. we then discuss our method, which applies svm induction over lexical and morphological features. morphological features improve the average precision by ~0.5%, recall by ~1%, and f-measure by ~0.75, resulting in a system with average performance of 93% precision, 93.4% recall and 93.2 f-measure. discriminative classifiers for deterministic dependency parsing. deterministic parsing guided by treebank-induced classifiers has emerged as a simple and efficient alternative to more complex models for data-driven parsing. we present a systematic comparison of memory-based learning (mbl) and support vector machines (svm) for inducing classifiers for deterministic dependency parsing, using data from chinese, english and swedish, together with a variety of different feature models. the comparison shows that svm gives higher accuracy for richly articulated feature models across all languages, albeit with considerably longer training times. the results also confirm that classifier-based deterministic parsing can achieve parsing accuracy very close to the best results reported for more complex parsing models. combining trigram-based and feature-based methods for context-sensitive spelling correction. this paper addresses the problem of correcting spelling errors that result in valid, though unintended words (such as peace and piece, or quiet and quite) and also the problem of correcting particular word usage errors (such as amount and number, or among and between). such corrections require contextual information and are not handled by conventional spelling programs such as unix spell. first, we introduce a method called trigrams that uses part-of-speech trigrams to encode the context. this method uses a small number of parameters compared to previous methods based on word trigrams. however, it is effectively unable to distinguish among words that have the same part of speech. for this case, an alternative feature-based method called bayes performs better; but bayes is less effective than trigrams when the distinction among words depends on syntactic constraints. a hybrid method called tribayes is then introduced that combines the best of the previous two methods. the improvement in performance of tribayes over its components is verified experimentally. tribayes is also compared with the grammar checker in microsoft word, and is found to have substantially higher performance. event extraction in a plot advice agent. in this paper we present how the automatic extraction of events from text can be used to both classify narrative texts according to plot quality and produce advice in an interactive learning environment intended to help students with story writing. we focus on the story rewriting task, in which an exemplar story is read to the students and the students rewrite the story in their own words. the system automatically extracts events from the raw text, formalized as a sequence of temporally ordered predicate-arguments. 
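a schematic sketch of the hybrid strategy behind tribayes described above: fall back on the tag-trigram method when the confused words differ in part of speech, and on the feature-based bayesian method when they do not. the two component scorers and the toy data are stand-ins, not the actual methods.

def correct(words, position, confusion_set, pos, trigram_score, bayes_score):
    tags = {pos(w) for w in confusion_set}
    scorer = trigram_score if len(tags) > 1 else bayes_score   # same pos -> need lexical features
    return max(confusion_set, key=lambda w: scorer(words, position, w))

confusion = ["peace", "piece"]
pos = lambda w: "nn"                                            # toy tagger: same tag for both
trigram = lambda s, i, w: 0.0                                   # stub for the tag-trigram method
bayes = lambda s, i, w: {"peace": 0.2, "piece": 0.8}[w]         # stub for the bayesian method
print(correct("a ___ of cake".split(), 1, confusion, pos, trigram, bayes))   # -> piece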
these events are given to a machine-learner that produces a coarse-grained rating of the story. the results of the machine-learner and the extracted events are then used to generate fine-grained advice for the students. contextual dependencies in unsupervised word segmentation. developing better methods for segmenting continuous text into words is important for improving the processing of asian languages, and may shed light on how humans learn to segment speech. we propose two new bayesian word segmentation methods that assume unigram and bigram models of word dependencies respectively. the bigram model greatly outperforms the unigram model (and previous probabilistic models), demonstrating the importance of such dependencies for word segmentation. we also show that previous probabilistic models rely crucially on sub-optimal search procedures. linguistic profiling for authorship recognition and verification. a new technique is introduced, linguistic profiling, in which large numbers of counts of linguistic features are used as a text profile, which can then be compared to average profiles for groups of texts. the technique proves to be quite effective for authorship verification and recognition. the best parameter settings yield a false accept rate of 8.1% at a false reject rate equal to zero for the verification task on a test corpus of student essays, and a 99.4% 2-way recognition accuracy on the same corpus. building verb predicates: a computational view. a method for the definition of verb predicates is proposed. the definition of the predicates is essentially tied to a semantic interpretation algorithm that determines the predicate for the verb, its semantic roles and adjuncts. as predicate definitions are complete, they can be tested by running the algorithm on some sentences and verifying the resolution of the predicate, semantic roles and adjuncts in those sentences. the predicates are defined semiautomatically with the help of a software environment that uses several sections of a corpus to provide feedback for the definition of the predicates, and then for the subsequent testing and refining of the definitions. the method is very flexible in adding a new predicate to a list of already defined predicates for a given verb. the method builds on an existing approach that defines predicates for wordnet verb classes, and that plans to define predicates for every english verb. the definitions of the predicates and the semantic interpretation algorithm are being used to automatically create a corpus of annotated verb predicates, semantic roles and adjuncts. improving data driven wordclass tagging by system combination. in this paper we examine how the differences in modelling between different data driven systems performing the same nlp task can be exploited to yield a higher accuracy than the best individual system. we do this by means of an experiment involving the task of morpho-syntactic wordclass tagging. four well-known tagger generators (hidden markov model, memory-based, transformation rules and maximum entropy) are trained on the same corpus data. after comparison, their outputs are combined using several voting strategies and second stage classifiers. all combination taggers outperform their best component, with the best combination showing a 19.1% lower error rate than the best individual tagger. detection of quotations and inserted clauses and its application to dependency structure analysis in spontaneous japanese. 
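a minimal sketch of one of the voting strategies mentioned in the system-combination abstract above: each tagger votes for a tag at each position, weighted for instance by its held-out accuracy; the weights and tag sequences are toy values.

from collections import defaultdict

def combine_tags(taggings, weights):
    # taggings: list of tag sequences (one per tagger) for the same sentence
    combined = []
    for position in zip(*taggings):
        votes = defaultdict(float)
        for tagger, tag in enumerate(position):
            votes[tag] += weights[tagger]
        combined.append(max(votes, key=votes.get))
    return combined

print(combine_tags([["dt", "nn", "vbz"], ["dt", "nn", "nns"], ["dt", "jj", "vbz"]],
                   weights=[0.96, 0.95, 0.94]))      # -> ['dt', 'nn', 'vbz']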
japanese dependency structure is usually represented by relationships between phrasal units called bunsetsus. one of the biggest problems with dependency structure analysis in spontaneous speech is that clause boundaries are ambiguous. this paper describes a method for detecting the boundaries of quotations and inserted clauses, and a method for improving dependency accuracy by applying the detected boundaries to dependency structure analysis. the quotations and inserted clauses are determined by using an svm-based text chunking method that considers information on morphemes, pauses, fillers, etc. the information on automatically analyzed dependency structure is also used to detect the beginning of the clauses. our evaluation experiment using the corpus of spontaneous japanese (csj) showed that the automatically estimated boundaries of quotations and inserted clauses helped to improve the accuracy of dependency structure analysis. sequential conditional generalized iterative scaling. we describe a speedup for training conditional maximum entropy models. the algorithm is a simple variation on generalized iterative scaling, but converges roughly an order of magnitude faster, depending on the number of constraints, and the way speed is measured. rather than attempting to train all model parameters simultaneously, the algorithm trains them sequentially. the algorithm is easy to implement, typically uses only slightly more memory, and will lead to improvements for most maximum entropy problems. a step towards the detection of semantic variants of terms in technical documents. this paper reports the results of a preliminary experiment on the detection of semantic variants of terms in a french technical document. the general goal of our work is to help structure terminologies. two kinds of semantic variants can be found in traditional terminologies: strict synonymy links and fuzzier relations like see-also. we have designed three rules which exploit general dictionary information to infer synonymy relations between complex candidate terms. the results have been examined by a human terminologist. the expert judged that half of the overall pairs of terms are relevant instances of semantic variation, and validated an important part of the detected links as synonymy. moreover, it appeared that numerous errors are due to a few mis-interpreted links: they could be eliminated by a few exception rules. repairing reference identification failures by relaxation. the goal of this work is the enrichment of human-machine interactions in a natural language environment. we want to provide a framework less restrictive than earlier ones by allowing a speaker leeway in forming an utterance about a task and in determining the conversational vehicle to deliver it. a speaker and listener cannot be assured to have the same beliefs, contexts, backgrounds or goals at each point in a conversation. as a result, difficulties and mistakes arise when a listener interprets a speaker's utterance. these mistakes can lead to various kinds of misunderstandings between speaker and listener, including reference failures or failure to understand the speaker's intention. we call these misunderstandings miscommunication. such mistakes constitute a kind of "ill-formed" input that can slow down and possibly break down communication. our goal is to recognize and isolate such miscommunications and circumvent them. 
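one possible rendering of the kind of dictionary-based rule described in the semantic-variant abstract above: two complex terms are proposed as synonymy variants when their heads are dictionary synonyms and their modifiers match. this is an illustrative rule with invented data, not one of the paper's three rules.

def variant_rule(term1, term2, dict_synonyms):
    # terms are (modifier, head) pairs; dict_synonyms maps a word to its synonym set
    (mod1, head1), (mod2, head2) = term1, term2
    same_modifier = mod1 == mod2
    heads_synonymous = head1 == head2 or head2 in dict_synonyms.get(head1, set())
    return same_modifier and heads_synonymous

synonyms = {"fault": {"defect", "failure"}}
print(variant_rule(("insulation", "fault"), ("insulation", "defect"), synonyms))   # -> True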
this paper will highlight a particular class of miscommunication - reference problems - by describing a case study, including techniques for avoiding failures of reference. improving english subcategorization acquisition with diathesis alternations as heuristic information. automatically acquired lexicons with subcategorization information have already proved accurate and useful enough for some purposes but their accuracy still shows room for improvement. by means of diathesis alternation, this paper proposes a new filtering method, which improved the performance of korhonen's acquisition system remarkably, with the precision increased to 91.18% and recall unchanged, making the acquired lexicon much more practical for further manual proofreading and other nlp uses. parsing algorithms and metrics. many different metrics exist for evaluating parsing results, including viterbi, crossing brackets rate, zero crossing brackets rate, and several others. however, most parsing algorithms, including the viterbi algorithm, attempt to optimize the same metric, namely the probability of getting the correct labelled tree. by choosing a parsing algorithm appropriate for the evaluation metric, better performance can be achieved. we present two new algorithms: the "labelled recall algorithm," which maximizes the expected labelled recall rate, and the "bracketed recall algorithm," which maximizes the bracketed recall rate. experimental results are given, showing that the two new algorithms have improved performance over the viterbi algorithm on many criteria, especially the ones that they optimize. an application of wordnet to prepositional attachment. this paper presents a method for word sense disambiguation and coherence understanding of prepositional relations. the method relies on information provided by wordnet 1.5. we first classify prepositional attachments according to semantic equivalence of phrase heads and then apply inferential heuristics for understanding the validity of prepositional structures. recognizing expressions of commonsense psychology in english text. many applications of natural language processing technologies involve analyzing texts that concern the psychological states and processes of people, including their beliefs, goals, predictions, explanations, and plans. in this paper, we describe our efforts to create a robust, large-scale lexical-semantic resource for the recognition and classification of expressions of commonsense psychology in english text. we achieve high levels of precision and recall by hand-authoring sets of local grammars for commonsense psychology concepts, and show that this approach can achieve classification performance greater than that obtained by using machine learning techniques. we demonstrate the utility of this resource for large-scale corpus analysis by identifying references to adversarial and competitive goals in political speeches throughout u.s. history. methods for using textual entailment in open-domain question answering. work on the semantics of questions has argued that the relation between a question and its answer(s) can be cast in terms of logical entailment. in this paper, we demonstrate how computational systems designed to recognize textual entailment can be used to enhance the accuracy of current open-domain automatic question answering (q/a) systems. 
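a condensed sketch of the idea behind the recall-oriented parsing algorithms described above: instead of the single most probable tree, choose the binary bracketing that maximises the sum of posterior probabilities of its constituents (closest to the bracketed recall variant, since labels are ignored). the posterior table is toy data; in practice it would come from inside-outside computations.

from functools import lru_cache

def max_recall_tree(n, span_posterior):
    @lru_cache(maxsize=None)
    def best(i, j):
        here = span_posterior.get((i, j), 0.0)
        if j - i == 1:
            return here, (i, j)
        score, split = max((best(i, k)[0] + best(k, j)[0], k) for k in range(i + 1, j))
        return score + here, ((i, j), best(i, split)[1], best(split, j)[1])
    return best(0, n)

posteriors = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 2): 0.4, (1, 3): 0.7, (0, 3): 1.0}
print(max_recall_tree(3, posteriors))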
in our experiments, we show that when textual entailment information is used to either filter or rank answers returned by a q/a system, accuracy can be increased by as much as 20% overall. scaling distributional similarity to large corpora. accurately representing synonymy using distributional similarity requires large volumes of data to reliably represent infrequent words. however, the naïve nearest-neighbour approach to comparing context vectors extracted from large corpora scales poorly (o(n2) in the vocabulary size). in this paper, we compare several existing approaches to approximating the nearest-neighbour search for distributional similarity. we investigate the trade-off between efficiency and accuracy, and find that sash (houle and sakuma, 2005) provides the best balance. experiments with interactive question-answering. this paper describes a novel framework for interactive question-answering (q/a) based on predictive questioning. generated off-line from topic representations of complex scenarios, predictive questions represent requests for information that capture the most salient (and diverse) aspects of a topic. we present experimental results from large user studies (featuring a fully-implemented interactive q/a system named ferret) that demonstrate that surprisingly good performance is achieved by integrating predictive questions into the context of a q/a dialogue. the role of lexico-semantic feedback in open-domain textual question-answering. this paper presents an open-domain textual question-answering system that uses several feedback loops to enhance its performance. these feedback loops combine in a new way statistical results with syntactic, semantic or pragmatic information derived from texts and lexical databases. the paper presents the contribution of each feedback loop to the overall performance of 76% human-assessed precise answers. interleaving universal principles and relational constraints over typed feature logic. we introduce a typed feature logic system providing both universal implicational principles as well as definite clauses over feature terms. we show that such an architecture supports a modular encoding of linguistic theories and allows for a compact representation using underspecification. the system is fully implemented and has been used as a workbench to develop and test large hpsg grammars. the techniques described in this paper are not restricted to a specific implementation, but could be added to many current feature-based grammar development systems. an efficient parsing algorithm for tree adjoining grammars. in the literature, tree adjoining grammars (tags) are advocated as adequate for natural language description --- analysis as well as generation. in this paper we concentrate on the direction of analysis. especially important for an implementation of that task is how efficiently this can be done, i.e., how readily the word problem can be solved for tags. up to now, a parser with o(n6) steps in the worst case was known, where n is the length of the input string. in this paper, the result is improved to o(n4 log n) as a new lowest upper bound. the paper demonstrates how local interpretation of tag trees allows this reduction. aligning words using matrix factorisation. aligning words from sentences which are mutual translations is an important problem in different settings, such as bilingual terminology extraction, machine translation, or projection of linguistic features. here, we view word alignment as matrix factorisation. 
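for concreteness, this is the naive o(n2) nearest-neighbour comparison over sparse context vectors that the approximation methods in the distributional-similarity abstract above are trying to avoid; the vectors are toy data.

import math

def cosine(u, v):
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest_neighbours(vectors, k=5):
    # vectors: word -> sparse context vector; every pair of words is compared
    result = {}
    for word, vec in vectors.items():
        sims = [(cosine(vec, other), o) for o, other in vectors.items() if o != word]
        result[word] = sorted(sims, reverse=True)[:k]
    return result

toy = {"wine": {"drink": 3, "red": 2}, "beer": {"drink": 2, "cold": 1}, "table": {"set": 2}}
print(nearest_neighbours(toy, k=1))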
in order to produce proper alignments, we show that factors must satisfy a number of constraints such as orthogonality. we then propose an algorithm for orthogonal non-negative matrix factorisation, based on a probabilistic model of the alignment data, and apply it to word alignment. this is illustrated on a french-english alignment task from the hansard. an algorithm for vp ellipsis. an algorithm is proposed to determine antecedents for vp ellipsis. the algorithm eliminates impossible antecedents, and then imposes a preference ordering on possible antecedents. the algorithm performs with 94% accuracy on a set of 304 examples of vp ellipsis collected from the brown corpus. the problem of determining antecedents for vp ellipsis has received little attention in the literature, and it is shown that the current proposal is a significant improvement over alternative approaches. parsing aligned parallel corpus by projecting syntactic relations from annotated source corpus. example-based parsing has already been proposed in the literature. in particular, attempts are being made to develop techniques for language pairs where the source and target languages are different, e.g. the direct projection algorithm (hwa et al., 2005). this enables one to develop a parsed corpus for target languages having fewer linguistic tools with the help of a resource-rich source language. the dpa algorithm works on the assumption of direct correspondence, which simply means that the relation between two words of the source language sentence can be projected directly between the corresponding words of the parallel target language sentence. however, we find that this assumption does not always hold. this leads to a wrong parsed structure of the target language sentence. as a solution we propose an algorithm called pseudo dpa (pdpa) that can work even if the direct correspondence assumption is not guaranteed. the proposed algorithm works in a recursive manner by considering the embedded phrase structures from the outermost level to the innermost. the present work discusses the pdpa algorithm, and illustrates it with respect to the english-hindi language pair. link grammar based parsing has been adopted as the underlying parsing scheme for this work. polyphony and argumentative semantics. we extract from sentences a superstructure made of argumentative operators and connectives applying to the remaining set of terminal sub-sentences. we found the argumentative interpretation of utterances on a semantics defined at the linguistic level. we describe the computation of this particular semantics, based on the constraints that the superstructure imposes on the argumentative power of terminal sub-sentences. generation of vp ellipsis: a corpus-based approach. we present conditions under which verb phrases are elided based on a corpus of positive and negative examples. factors that affect verb phrase ellipsis include: the distance between antecedent and ellipsis site, the syntactic relation between antecedent and ellipsis site, and the presence or absence of adjuncts. building on these results, we examine where in the generation architecture a trainable algorithm for vp ellipsis should be located. we show that the best performance is achieved when the trainable module is located after the realizer and has access to surface-oriented features (error rate of 7.5%). normal state implicature. in the right situation, a speaker can use an unqualified indefinite description without being misunderstood. 
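a schematic version of the two-stage antecedent selection in the vp ellipsis abstract above: first eliminate impossible antecedents with hard filters, then order the rest by preference factors. the particular filters, preference functions and candidate encoding are placeholders, not the paper's actual criteria.

def choose_antecedent(candidates, filters, preferences):
    viable = [c for c in candidates if all(f(c) for f in filters)]
    if not viable:
        return None
    return max(viable, key=lambda c: sum(p(c) for p in preferences))

candidates = [{"vp": "be noticed", "in_quotation": True, "distance": 3},
              {"vp": "look up", "in_quotation": False, "distance": 1}]
filters = [lambda c: not c["in_quotation"]]          # impossible antecedents are removed
preferences = [lambda c: -c["distance"]]             # closer antecedents are preferred
print(choose_antecedent(candidates, filters, preferences)["vp"])   # -> look up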
this use of language, normal state implicature, is a kind of conversational implicature, i.e. a non-truth-functional context-dependent inference based upon language users' awareness of principles of cooperative conversation. i present a convention for identifying normal state implicatures which is based upon mutual beliefs of the speaker and hearer about certain properties of the speaker's plan. a key property is the precondition that an entity playing a role in the plan must be in a normal state with respect to the plan. data-driven strategies for an automated dialogue system. we present a prototype natural-language problem-solving application for a financial services call center, developed as part of the amitiés multilingual human-computer dialogue project. our automated dialogue system, based on empirical evidence from real call-center conversations, features a data-driven approach that allows for mixed system/customer initiative and spontaneous conversation. preliminary evaluation results indicate efficient dialogues and high user satisfaction, with performance comparable to or better than that of current conversational travel information systems. local constraints on sentence markers and focus in somali. we present a computationally tractable account of the interactions between sentence markers and focus marking in somali. somali, as a cushitic language, has a basic pattern wherein a small 'core' clause is preceded, and in some cases followed by, a set of 'topics', which provide scene-setting information against which the core is interpreted. some topics appear to carry a 'focus marker', indicating that they are particularly salient. we will outline a computationally tractable grammar for somali in which focus marking emerges naturally from a consideration of the use of a range of sentence markers. conversational implicatures in indirect replies. in this paper we present algorithms for the interpretation and generation of a kind of particularized conversational implicature occurring in certain indirect replies. our algorithms make use of discourse expectations, discourse plans, and discourse relations. the algorithms calculate implicatures of discourse units of one or more sentences. our approach has several advantages. first, by taking discourse relations into account, it can capture a variety of implicatures not handled before. second, by treating implicatures of discourse units which may consist of more than one sentence, it avoids the limitations of a sentence-at-a-time approach. third, by making use of properties of discourse which have been used in models of other discourse phenomena, our approach can be integrated with those models. also, our model permits the same information to be used both in interpretation and generation. designer definites in logical form. in this paper, we represent singular definite noun phrases as functions in logical form. this representation is designed to model the behaviors of both anaphoric and non-anaphoric, distributive definites. it is also designed to obey the computational constraints suggested in harper [har88]. our initial representation of a definite places an upper bound on its behavior given its structure and location in a sentence. later, when ambiguity is resolved, the precise behavior of the definite is pinpointed. a hybrid reasoning model for indirect answers. this paper presents our implemented computational model for interpreting and generating indirect answers to yes-no questions. 
its main features are 1) a discourse-plan-based approach to implicature, 2) a reversible architecture for generation and interpretation, 3) a hybrid reasoning model that employs both plan inference and logical inference, and 4) use of stimulus conditions to model a speaker's motivation for providing appropriate, unrequested information. the model handles a wider range of types of indirect answers than previous computational models and has several significant advantages. inducing frame semantic verb classes from wordnet and ldoce. this paper presents semframe, a system that induces frame semantic verb classes from wordnet and ldoce. semantic frames are thought to have significant potential in resolving the paraphrase problem that challenges many language-based applications. when compared to the handcrafted framenet, semframe achieves its best recall-precision balance with 83.2% recall (based on semframe's coverage of framenet frames) and 73.8% precision (based on semframe verbs' semantic relatedness to frame-evoking verbs). the next best performing semantic verb classes achieve 56.9% recall and 55.0% precision. mapping lexical entries in a verbs database to wordnet senses. this paper describes automatic techniques for mapping 9611 entries in a database of english verbs to wordnet senses. the verbs were initially grouped into 491 classes based on syntactic features. mapping these verbs into wordnet senses provides a resource that supports disambiguation in multilingual applications such as machine translation and cross-language information retrieval. our techniques make use of (1) a training set of 1791 disambiguated entries, representing 1442 verb entries from 167 classes; (2) word sense probabilities, from frequency counts in a tagged corpus; (3) semantic similarity of wordnet senses for verbs within the same class; (4) probabilistic correlations between wordnet data and attributes of the verb classes. the best results achieved 72% precision and 58% recall, versus a lower bound of 62% precision and 38% recall for assigning the most frequently occurring wordnet sense, and an upper bound of 87% precision and 75% recall for human judgment. semi-supervised conditional random fields for improved sequence segmentation and labeling. we present a new semi-supervised training procedure for conditional random fields (crfs) that can be used to train sequence segmentors and labelers from a combination of labeled and unlabeled training data. our approach is based on extending the minimum entropy regularization framework to the structured prediction case, yielding a training objective that combines unlabeled conditional entropy with labeled conditional likelihood. although the training objective is no longer concave, it can still be used to improve an initial model (e.g. obtained from supervised training) by iterative ascent. we apply our new training algorithm to the problem of identifying gene and protein mentions in biological texts, and show that incorporating unlabeled data improves the performance of the supervised crf in this case. analysis and repair of name tagger errors. name tagging is a critical early stage in many natural language processing pipelines. in this paper we analyze the types of errors produced by a tagger, distinguishing name classification and various types of name identification errors. 
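a simplified scoring rule in the spirit of the verb-to-wordnet mapping techniques listed above: each candidate sense is scored by a sense-frequency prior combined with its similarity to the verb's syntactic class; prior(), class_similarity(), the weighting and the toy sense labels are assumptions, not the paper's actual combination.

def best_sense(verb, candidate_senses, prior, class_similarity, alpha=0.5):
    def score(sense):
        return alpha * prior(verb, sense) + (1 - alpha) * class_similarity(verb, sense)
    return max(candidate_senses, key=score)

print(best_sense("abandon", ["abandon#1", "abandon#2"],
                 prior=lambda v, s: {"abandon#1": 0.7, "abandon#2": 0.3}[s],
                 class_similarity=lambda v, s: {"abandon#1": 0.2, "abandon#2": 0.9}[s]))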
we present a joint inference model to improve chinese name tagging by incorporating feedback from subsequent stages in an information extraction pipeline: name structure parsing, cross-document coreference, semantic relation extraction and event extraction. we show through examples and performance measurement how different stages can correct different types of errors. the resulting accuracy approaches that of individual human annotators. sextant: exploring unexplored contexts for semantic extraction from syntactic analysis. for a very long time, it has been considered that the only way of automatically extracting similar groups of words from a text collection for which no semantic information exists is to use document co-occurrence data. but with robust syntactic parsers becoming more widely available, syntactically recognizable phenomena about word usage can be confidently noted in large collections of texts. we present here a new system called sextant which uses these parsers and the finer-grained contexts they produce to judge word similarity. a collaborative framework for collecting thai unknown words from the web. we propose a collaborative framework for collecting thai unknown words found on web pages over the internet. our main goal is to design and construct a web-based system which allows a group of interested users to participate in constructing a thai unknown-word open dictionary. the proposed framework provides supporting algorithms and tools for automatically identifying and extracting unknown words from web pages of given urls. the system yields a set of unknown-word candidates, which are presented to the users for verification. the approved unknown words could be combined with the set of existing words in the lexicon to improve the performance of many nlp tasks such as word segmentation, information retrieval and machine translation. our framework includes word segmentation and morphological analysis modules for handling the non-segmenting characteristic of thai written language. to take advantage of the large text resources available on the web, our unknown-word boundary identification approach is based on a statistical string pattern-matching algorithm. using conditional random fields to predict pitch accents in conversational speech. the detection of prosodic characteristics is an important aspect of both speech synthesis and speech recognition. correct placement of pitch accents aids in more natural sounding speech, while automatic detection of accents can contribute to better word-level recognition and better textual understanding. in this paper we investigate probabilistic, contextual, and phonological factors that influence pitch accent placement in natural, conversational speech in a sequence labeling setting. we introduce conditional random fields (crfs) to the pitch accent prediction task in order to incorporate these factors efficiently in a sequence model. we demonstrate the usefulness and the incremental effect of these factors in a sequence model by performing experiments on hand labeled data from the switchboard corpus. our model outperforms the baseline and previous models of pitch accent prediction on the switchboard corpus. mistake-driven mixture of hierarchical tag context trees. this paper proposes a mistake-driven mixture method for learning a tag model. the method iteratively performs two procedures: 1. constructing a tag model based on the current data distribution and 2. 
updating the distribution by focusing on data that are not well predicted by the constructed model. the final tag model is constructed by mixing all the models according to their performance. to well reflect the data distribution, we represent each tag model as a hierarchical tag (i.e., ntt