training dataset into histogram vectors.
algorithms use only the visual signatures for classifying the images.
constraints of the images should also be taken into account.
noise in the ANN (large clusters of very similar images which may reduce
Human-Computer Interaction (HCI) continues to evolve, creating more natural interfaces
that increase productivity for a wider audience across a range of environments.
In particular, mobile devices, used while moving, are receiving a lot of
attention in the post-desktop era (Daley, 2012). As a result of this shift, and
the increase in processing power and improved statistical language models,
speech recognition has grown in popularity as a way of interacting with mobile
devices. Smartphone applications such as Siri (Apple) and Cortana (Microsoft)
allow the user to book diary events, look up information, or ask for directions,
using only speech input.
While automatic speech recognition has
improved, the interaction is not entirely natural, as the application is unaware
of the user’s surroundings and unable to
refer to things as people typically do in conversation, for example to
comprehend a question such as “What’s that statue over there?”, or to direct
the user to “the café next to the bridge”. To include such environmental
references these devices need to model their surroundings and refer to features
in common ways, so that the interface can become so natural and intuitive it is
not even noticed (Weiser, Gold & Brown, 1999).
It has been recognized for some time
that further progress in mobile HCI should include expanding the machine’s
abilities to refer to objects in the user’s
surroundings, and to consider the context in which the device is being used
(Bartie & Mackaness, 2006; Chen & Kotz, 2000; Long, Aust, Abowd,
& Atkeson, 1996; Noh, Lee, Oh, Hwang, & Cho, 2012; Zipf, 2002). A key
aspect of this link
between virtual and real worlds is the use of common anchor points, or
landmarks, which can be recognized and referred to by both the user and the
machine, for example by including a reference to a salient object when giving a
navigation instruction. There are a number of challenges in doing this, which
include having access
to a complete dataset of objects with corresponding attribute and positional
information, and a method to identify landmark candidates.
Unlike image-retrieval-based
approaches that return images having similar content, classification-based
approaches treat locations as labels and assign these labels to query images.
Hays et al. demonstrated that, given a
large dataset, a k-Nearest Neighbor (kNN) solution can achieve 16 percent
geo-location accuracy. We will use this method as a measure of the information loss
suffered by our compression technique. Discriminative models are a good way to
test whether a feature descriptor is appropriate for a particular task. Support
Vector Machines (SVM) have been used with some success to discriminate between
different regions. Crandall et al., for example, showed that SIFT is a good
feature for discriminating regions that are spatially close (e.g., city) but
performs poorly for global geo-localization. SVMs in particular do not scale to
the data sizes that are required for this task, and obtaining a labeled set of
data to train such supervised learning algorithms is not trivial. Evaluating
the results is also difficult because few works calculate “random”
performance. In Hayes, “chance” accuracy was defined in terms of the number of
classes (10) used in training the SVM. This is a reasonable approach, although
the number of images for different classes differs significantly (sometimes
almost an order of magnitude) and no confusion matrix was provided to help interpret
the results. The small size of our clothing dataset allows us to employ an SVM as
a method for testing whether
clothing can be used for geo-locating images. However, because
SVMs do not scale to the size of the datasets used in the general geo-location
results, we do not compare against this method. It should also be noted that
be seen as learning the probability of a feature belonging to a location L,
where this probability is 1 for one label and 0 for all others. It is not clear
what the set of labels should be and our method can be complementary to
use SVMs as the resulting graph can provide a set of discrete locations to use
The difficulty of obtaining an unbiased
dataset that models the actual distribution of world features has caused many
researchers to augment this data with external sources such as authorship,
time between photos taken by a single person, and textual tags. However, such information is not always
available, and the goal in this work is to geo-locate images based only on
visual features. Therefore, we do not make a comparison against information
outside the image itself.
LASOM is implemented using an artificial neural network with 768
inputs (the number of values produced by an RGB histogram function for each
image in the dataset), and the weights are initially assigned
randomly. These weights are modified in accordance with the error calculated
between the predicted and actual output. During the testing phase, an image
from the test set is given as input to the neural network, matched
against a particular feature from the feature list, and geo-tagged
according to the centroid of the cluster it belongs to.
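The 768-value input can be illustrated with a minimal sketch, assuming the histogram uses 256 bins for each of the three RGB channels (256 × 3 = 768) and that the three per-channel histograms are concatenated. The function name `rgb_histogram`, the pixel representation, and the optional normalization are assumptions made for illustration only and are not taken from the implementation described above.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each value in 0..255

BINS_PER_CHANNEL = 256  # assumption: one bin per 8-bit intensity level


def rgb_histogram(pixels: Iterable[Pixel], normalize: bool = True) -> List[float]:
    """Return a 768-value vector: 256 red bins, then 256 green, then 256 blue."""
    hist = [0.0] * (3 * BINS_PER_CHANNEL)
    count = 0
    for r, g, b in pixels:
        hist[r] += 1                         # red bins occupy indices 0..255
        hist[BINS_PER_CHANNEL + g] += 1      # green bins occupy 256..511
        hist[2 * BINS_PER_CHANNEL + b] += 1  # blue bins occupy 512..767
        count += 1
    if normalize and count:
        # Assumption: divide by the pixel count so images of different
        # sizes yield comparable vectors (each channel block sums to 1).
        hist = [v / count for v in hist]
    return hist


if __name__ == "__main__":
    demo_pixels = [(255, 0, 0), (255, 0, 0), (0, 128, 64), (10, 20, 30)]
    vector = rgb_histogram(demo_pixels)
    print(len(vector))   # 768
    print(vector[255])   # 0.5 (two of the four pixels have R = 255)
```

In practice the pixel values would be read from the image file by an imaging library; the sketch takes plain tuples so that it stays independent of any particular library.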
The path of the folder that contains
the training data set is given as input during the training phase. The feature
reduction phase of the testing part takes its input from the test data set. During
the training phase, geo-tagged images (images with latitude and longitude values in
EXIF format) are taken as the data set. During the testing phase, random images
form the data set.
The LASOM algorithm starts with a small
codebook dictionary, which can be randomly generated or created from training
samples. Given a query feature vector, f, LASOM finds the top two matching
codebook vectors. One of two distance functions will be used, depending on the
feature type (e.g., histogram or vector).
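The matching step can be sketched as follows. The text does not name the two distance functions, so this sketch assumes Euclidean distance for plain vectors and a chi-square distance for histograms; both choices, along with the names `top_two_matches` and `DISTANCES`, are hypothetical and serve only to illustrate how the top two codebook vectors might be selected for a query feature vector f.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance (assumed choice for plain feature vectors)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chi_square(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    """Chi-square distance (assumed choice for histogram features)."""
    return 0.5 * sum((x - y) ** 2 / (x + y + eps) for x, y in zip(a, b))


# Hypothetical mapping from feature type to distance function.
DISTANCES: Dict[str, Callable[[Vector, Vector], float]] = {
    "vector": euclidean,
    "histogram": chi_square,
}


def top_two_matches(
    f: Vector, codebook: Sequence[Vector], feature_type: str = "vector"
) -> List[Tuple[int, float]]:
    """Return (index, distance) for the two codebook vectors closest to f."""
    if len(codebook) < 2:
        raise ValueError("codebook must contain at least two vectors")
    dist = DISTANCES[feature_type]
    # Sort by distance; ties are broken by the lower codebook index.
    scored = sorted((dist(f, c), i) for i, c in enumerate(codebook))
    return [(i, d) for d, i in scored[:2]]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    best, second = top_two_matches([0.9, 1.2], codebook, "vector")
    print(best[0], second[0])  # 1 0
```

A full sort is used here for brevity; for a large codebook, a single pass that tracks the two smallest distances would avoid the sorting cost.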
1: Collecting the dataset manually using a camera.
2: Designing the algorithm.
3: Coding the designed algorithm.
4: Training the model.
5: Implementing the prototype.
6: Testing the final application.
Automatically geo-locating images is a challenging task due to a
variety of factors. First, obtaining an unbiased dataset that represents the
true distribution of visual information found in the world is difficult (if not
impossible). Second, the goal is not always clear. If the goal is to locate landmarks, then biases
in the dataset actually help. However, if the goal is to locate a missing person who might
be held in a remote location, then the biases need to be avoided. The
compression algorithm, LASOM, mitigates the effects of biased datasets by
storing fewer images in over-sampled locations, making the distribution of
stored images more
uniform and reducing the storage requirements by as much as 50 percent. This
lossy compression has only a minor impact on geo-localization accuracy.
billions of images stored on photo sharing websites.
help us in identifying their corresponding clusters.
To replace density
estimation methods and efficiently handle large datasets.
In this work, the automatic
construction of visual landmark recognition engines from Internet image
collections will be evaluated. A large-scale dataset of photos, a set
of typical query images, and a ground truth for evaluating large-scale
photo auto-annotation will be used. For each component of the pipeline, we examine how
different methods and parameters affect overall performance as well as the
performance for individual query categories. Several novel methods for various
subtasks, some of which outperform approaches from the literature, are proposed. In our
analysis, we identify areas where such a system performs well, as well as
areas where improvement is still possible.