
Proximity Image Retrieval: Identifying the Problems and Solutions for Searching in an Architectural and Urban Planning-Specific Database
Daniel Rude, School of Information Studies, University of Wisconsin-Milwaukee, 3210 N. Maryland Ave., Bolton Hall 510, Milwaukee, WI 53211, U.S.A., dprude@uwm.edu
Lucio Campanelli, School of Information Studies, University of Wisconsin-Milwaukee, 3210 N. Maryland Ave., Bolton Hall 510, Milwaukee, WI 53211, U.S.A., lucio@uwm.edu
Kun Lu, School of Information Studies, University of Wisconsin-Milwaukee, 3210 N. Maryland Ave., Bolton Hall 510, Milwaukee, WI 53211, U.S.A., kunlu@uwm.edu
Hohyon Ryu, School of Information Studies, University of Wisconsin-Milwaukee, 3210 N. Maryland Ave., Bolton Hall 510, Milwaukee, WI 53211, U.S.A., hohyon@gmail.com
Xiangming Mu, School of Information Studies, University of Wisconsin-Milwaukee, 3210 N. Maryland Ave., Bolton Hall 510, Milwaukee, WI 53211, U.S.A., mux@uwm.edu
ABSTRACT
This paper studies search behavior in an image retrieval system, using an architectural and urban planning database as its model. In this mixed-method study, interviews and surveys were conducted to identify the unique search behaviors of student-workers at Community Design Solutions (CDS), a non-profit architectural and urban planning organization at the School of Architecture and Urban Planning (SARUP) at the University of Wisconsin-Milwaukee. The study asked users to describe the search strategies they formulated to navigate the current database, the problems they encountered, and any suggestions they had for improving the system. We found that, because of a lack of metadata and associated text, users had great difficulty locating related images. Based on these findings, we propose a proximity image retrieval system.
INTRODUCTION
Architecture and urban planning are fields that depend heavily on visual information. A database for architecture and urban planning is therefore unique in several ways. First, information is accumulated differently from many other collections of data: documents are often associated with a project such as a building or a neighborhood. Second, the amount and kind of data can vary significantly. In some cases, collections are built around maps or neighborhood-specific topics in a specific format. A good example comes from the University of Wisconsin-Milwaukee's Digital Collections (http://www4.uwm.edu/libraries/digilib/). One collection in particular, the Milwaukee Neighborhoods collection, consists mainly of maps and photographs; collections like this support browsing better than their limited search capabilities support searching. Another example is the Yale School of Architecture (YSOA) student building project, a collection of mostly browsable images. In other systems, such as the Community Design Solutions (CDS) database (www.cds.uwm.edu) at the University of Wisconsin-Milwaukee, visual information is organized around projects: images and textual documents are tied to one project. Given this combination of text and image documents, browsing is an important method of information retrieval in such a collection. Still, images are one of the most important document types in architecture and urban planning, and professionals often maintain their own image databases. Visual information is critical for architects in particular. In the physical office, architects often cover their walls, tables, and open space with drawings or the designs that inspire them most. These visual objects exist in a context and usually serve as reference points for the architect. Many such collections are organized more like a repository than a true collection: information is usually organized by folder for quick browsing or for navigating file-by-file. In many personal or smaller organizational databases, file searches are performed with Windows File Search, which makes searching much more challenging when files lack appropriate labels.
As the amount of content grows and the price of storage falls, the demand for improved image retrieval systems is great. Such improvement will come only through enhanced user-system interaction, so how users choose to compose their searches is of great interest to researchers. In a retrieval tool, context is also important for the individual item. On its own, an image in a database does not carry enough clues to allow effective retrieval. To remedy this, many collections provide metadata (title, description, etc.) to enhance the search system; an excellent example is the Flamenco search system (http://flamenco.berkeley.edu/). Unfortunately, developing this type of metadata is an expensive and time-intensive enterprise, and without well-designed, well-created metadata, image retrieval is extremely difficult. This, then, is the challenge: how to design a search mechanism that significantly improves retrieval in a domain-specific architecture and urban planning collection without an extensive metadata set.
In this paper we study a collection of architectural and urban planning documents called Community Design Solutions (CDS). This is a student-led architectural and urban planning database with well over a thousand documents. We analyze the content of this collection, interview users associated with CDS, and administer a questionnaire to understand how the database is used and to help improve its efficiency. Based on our findings, we also propose a new image retrieval approach, which we call proximity image retrieval. In proximity image retrieval, we use both the content of the collection and related textual documents stored within the same folder to represent documents at both the individual and the project level.
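To make the last idea more concrete, the following is a minimal Python sketch of how an image might be represented by the terms in its own file name plus the text documents that share its folder. It assumes, hypothetically, that text has already been extracted from each non-image file and that each folder corresponds to one project. The extension lists, paths, and the helper names `build_proximity_index`, `tokenize`, `extension`, and `folder_of` are illustrative only and are not part of the CDS system or of the prototype described in this paper.

```python
import re
from collections import defaultdict

# Hypothetical extension lists; the real collection also holds AutoCAD, slideshow and other formats.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tif"}
TEXT_EXTS = {".txt", ".doc", ".docx", ".pdf"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def extension(path):
    """Return the lowercased extension (including the dot), or '' if there is none."""
    name = path.rsplit("/", 1)[-1]
    dot = name.rfind(".")
    return name[dot:].lower() if dot != -1 else ""


def folder_of(path):
    """Return the folder part of a '/'-separated path."""
    return path.rsplit("/", 1)[0] if "/" in path else ""


def build_proximity_index(files):
    """Map each image path to terms from its file name plus its folder's text documents.

    `files` maps a path to the text already extracted from it ('' for images).
    """
    # Pool the vocabulary of every text document in each folder (assumed to be one project).
    folder_terms = defaultdict(set)
    for path, text in files.items():
        if extension(path) in TEXT_EXTS:
            folder_terms[folder_of(path)].update(tokenize(text))

    # Each image inherits its folder's vocabulary in addition to its own file-name terms.
    index = {}
    for path in files:
        if extension(path) in IMAGE_EXTS:
            own_terms = set(tokenize(path.rsplit("/", 1)[-1]))
            index[path] = own_terms | folder_terms[folder_of(path)]
    return index
```

For instance, an image named `IMG_0412.jpg` stored next to meeting minutes that mention a beach would be indexed under "beach" even though its file name says nothing about it. A project-level variant could key `folder_terms` on a project's top-level folder rather than on the immediate parent folder.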
This paper is organized as follows. We first review the literature on image retrieval methods and on image retrieval in domain-specific environments. We then explain the research questions and briefly introduce CDS and its current database. Next, we describe the methodology for our interviews and questionnaires. This is followed by our results and a suggestion for the design and implementation of a new system based on proximity image retrieval. Finally, we offer a discussion, suggestions for future study, and our conclusion.
LITERATURE REVIEW
In this literature review we examine user behavior and current image retrieval techniques. In our project, we focus on the search strategies employed by users in the architecture and urban planning domain. The user-behavior studies we examined differ in how the researchers studied their users: some performed transaction log analyses, while others performed real user studies. We also review current trends in image retrieval to determine how retrieval might be improved, especially for architecture and urban planning images. This literature exposes issues that we hope to avoid in the design of our own prototype.
User Search Behavior
Many studies explore patterns of image searching and the behavior of users. These include transaction log analysis, a major tool for identifying trends, especially on the web, where user patterns are more generalized (Spink and Jansen, 2006; Jörgensen and Jörgensen, 2005). Our study has a narrower audience in mind, so findings that benchmark the difficulty of capturing users' focus for any length of time (Rorissa and Hemalata, 2008) may not apply. In our case, working with an architectural and urban planning collection, the lack of a good keyword searching tool means that users have no starting point for their search. Beginning a search by browsing may be unfamiliar to some users and cause confusion, as in the CDS collection. Because these larger web-based transaction log studies raise important points, we discuss a few of them, but we also feel that there is more to gain from studying user preferences in smaller settings.
Spink and Jansen (2006) used transaction log data to study large federated online collections and how users access images within them. They found that image searches are brief, unmediated affairs in which users prefer to enter as few terms as possible and visit few of the results. This is not the only study to find that longer queries tend to lead the searcher off course (Pu, 2008; Tjondronegoro, Spink, and Jansen, 2009). These findings go a long way toward establishing a behavioral pattern that current models should rely upon, but they may not always hold true for smaller, domain-specific collections.
Jörgensen and Jörgensen (2005) focused on a commercial image provider to analyze the queries and strategies that users employed. Their research is significant in that it revealed the ineffectiveness of Boolean searching for image retrieval: users who did use Boolean operators often failed to string terms together correctly. In terms of images, Jörgensen and Jörgensen found that users tend to prefer browsing; "users employ browsing strategies much more often in searches that result in image downloads" (p. 1357), which encourages future retrieval system designers to create useful browsing systems. This also implies that users begin image retrieval with query-based navigation and follow up with browsing.
Choi and Rasmussen (2003) looked at the characteristics of users' queries for a popular web-based image gallery, the Library of Congress American Memory collection. Their work suggests that users of smaller, domain-specific image collections prefer the establishment of relevant access points. Other studies have shown that users actually prefer searching for images in smaller collections (Fukumoto, 2006). Choi and Rasmussen's findings are particularly relevant to our own work, as they point to areas in which browsing can be augmented in these smaller settings. Like Jörgensen and Jörgensen, they highlight the search-first, browse-second nature of image retrieval, specifically in smaller collections. Choi and Rasmussen, however, neither advocate for search functions nor perform a thorough analysis of users' search queries. In our project, a further difference is that users will be more familiar with the projects than with the keywords associated with the images for which they search.
Image Retrieval Methodology
There are currently two main approaches to image retrieval: text-based and content-based. In text-based retrieval systems, images are treated like traditional documents: an image is retrieved based on textual information such as its title, author, and any available abstract. This approach is user-friendly and quite popular, and most image retrieval systems rely on it. Even Google treats images as text, using file information to offer retrieval for users. The drawback is that text-based systems succeed only when an intricate metadata system has been completed, which is no easy task: creating rich metadata is an expensive and time-consuming affair. One such example is the Flamenco image retrieval system from the University of California, Berkeley (http://flamenco.berkeley.edu/; Yee et al., 2003). This system is built upon good document classification and a well-designed metadata schema. Yee et al. demonstrated that a majority of users preferred the Flamenco metadata navigation over traditional keyword searching. In the case of the CDS collection, however, such a complete metadata schema is not available, which may limit this type of application.
Similar to the Flamenco approach is one by Lori McCay-Peet and Elaine Toms (2009), in which text-based retrieval is improved by associating each image with a work task. In analyzing appropriate terminology to assign to images, McCay-Peet and Toms study the manner or context in which an image was created. In this way, historically and functionally relevant textual information can be attributed to images that would otherwise lack descriptive terms. The method also considers how a document is used, by taking into account the stage of a project at which a user might need a particular image. For architecture and urban planning, we feel that this approach might be improved by adding other forms of textual information, such as PDFs and Word documents, and by grouping images by project rather than by work task.
The other major form of image retrieval is content-based image retrieval (CBIR). This method operates on the explicit content of the digitized image (Enser, 2008): visual properties not found in the textual information, such as color, are used to retrieve similar objects (Shah, 2006). The shortcoming of this approach is that retrieval starts from a sample image. The system can only determine how visually similar two images are; it does not consider other factors that may be critical in architecture and urban planning. A detailed discussion of these factors is offered in the following section. Content-based image retrieval has not been shown to be superior to text-based retrieval, particularly in architecture and urban planning, where users may be looking for similar sketches in entirely different projects.
Combining CBIR methods with text-based retrieval might enhance searching. Several researchers, most notably Enser et al. (2007), have proposed the concept of semantic retrieval, but limitations still exist, including object labeling and the discourse that places those objects into categories.
COMMUNITY DESIGN SOLUTIONS
CDS consists of 15 to 25 part-time student employees who work with Milwaukee-area community organizations, local developers, and individual business owners to provide design proposals. At any one time, multiple students may be working on one project, all stored within a local database. The database, kept in-house, consists of a hierarchical series of folders. The most important of these folders, and the one that stores most of the data relevant to student projects and local planning information, is called Quick Response Team (QRT). This corresponds to the name of the team within CDS that facilitates timely planning proposals. Because CDS is a student-run organization, requests come to students, who form partnerships in the community and provide design services.
Currently, browsing is the only realistic method of locating documents. A search function for the database does exist in the form of Windows File Search, but given the prevalence of images without associated text, two problems hamper retrieval: 1) the system is slow, and 2) the system only supports file name search. Without an effective file naming convention, searching by file name is of little use. CDS could ultimately benefit from following the Flamenco approach and manually indexing its content. A manually re-indexed collection is desirable but also somewhat unrealistic: with over 30 images for each project and more than 100 projects dating back to 2000, manual indexing would be a very tedious and expensive undertaking. Alternatively, using a content-based indexing system to assign semantic meanings as associated metadata would be quicker, but, as noted in the literature review, doubts about its quality exist. In theory, text-based search with enhanced metadata is a good approach, but in practice neither a purely content-based approach nor a wholly text-based approach would solve all of CDS's retrieval problems. A new search strategy is needed.
In this paper, we propose a new approach to image retrieval that takes advantage of the rich textual information within an image collection. We call this approach proximity image retrieval. It draws on the idea that images should be accessed in the same way that they were created. Similar to McCay-Peet and Toms, proximity image retrieval weighs the context in which an image was created, but at the project level.
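The contrast between a file-name-only search and the proximity idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the prototype described in this paper: the hypothetical `filename_search` stands in for a search that only matches file names (the limitation attributed to Windows File Search above), and the hypothetical `proximity_search` returns every image stored in a folder whose text documents mention the query term. The paths and text are invented for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def filename_search(paths, term):
    """Approximate a search that only matches the query against file names."""
    term = term.lower()
    return sorted(p for p in paths if term in p.rsplit("/", 1)[-1].lower())


def proximity_search(files, term, image_exts=(".jpg", ".jpeg", ".png")):
    """Return images stored in any folder whose text documents mention `term`.

    `files` maps a '/'-separated path to its extracted text ('' for images).
    """
    term = term.lower()
    # Folders (assumed to be projects) whose text mentions the term.
    matching_folders = {
        path.rsplit("/", 1)[0]
        for path, text in files.items()
        if text and term in tokenize(text)
    }
    # Every image stored in one of those folders is returned.
    return sorted(
        path for path in files
        if path.lower().endswith(image_exts)
        and path.rsplit("/", 1)[0] in matching_folders
    )


if __name__ == "__main__":
    files = {
        "QRT/lakefront/IMG_0412.jpg": "",
        "QRT/lakefront/IMG_0413.jpg": "",
        "QRT/lakefront/minutes.txt": "Guidelines for fencing on the beach",
        "QRT/corridor/DSC_0099.jpg": "",
    }
    print(filename_search(files, "beach"))   # []
    print(proximity_search(files, "beach"))  # ['QRT/lakefront/IMG_0412.jpg', 'QRT/lakefront/IMG_0413.jpg']
```

With camera-style file names, the file-name search finds nothing, while the folder's meeting minutes are enough to surface both images from that project.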
The images below illustrate this. Images A and B appear to relate to each other more than either relates to Image C, but in fact Image B and Image C come from the same project and were created within the same time frame. In a content-based approach, this project relationship is not searchable. A user searching for the term "beach" or "pavilion" would probably retrieve Images A and B, which exist in completely different contexts. In proximity image retrieval, work-task information associated with architectural and urban planning images is used to provide a more fluid and intuitive retrieval interaction. Using a specific example (CDS and its data), we explain and test how this contextual information informs the system.
Images 1A, 1B and 1C (clockwise from top left). Text-based and image-based files from CDS.
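The contrast in the Images A, B, and C example can also be sketched numerically. The Python below assumes hypothetical four-bin color histograms for the three images (the values are invented for illustration and were not measured from the CDS files) and compares a content-based nearest neighbor, computed with cosine similarity, against grouping by project. The project labels are likewise hypothetical; only the membership (B and C share a project, A does not) comes from the text.

```python
import math

# Hypothetical 4-bin color histograms; invented for illustration, not measured from CDS files.
features = {
    "A": [0.50, 0.30, 0.15, 0.05],
    "B": [0.45, 0.35, 0.15, 0.05],
    "C": [0.05, 0.10, 0.25, 0.60],
}
# Project membership as described for Images A, B and C (label names are hypothetical).
project = {"A": "project-1", "B": "project-2", "C": "project-2"}


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_by_content(name):
    """Content-based view: the most visually similar other image."""
    others = [n for n in features if n != name]
    return max(others, key=lambda n: cosine(features[name], features[n]))


def same_project(name):
    """Proximity view: the other items created within the same project."""
    return sorted(n for n in project if n != name and project[n] == project[name])


print(nearest_by_content("B"))  # A
print(same_project("B"))        # ['C']
```

Under these assumed features, a content-based system pairs B with A, whereas grouping by project pairs B with C, which is the relationship the proximity approach is meant to recover.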
RESEARCH QUESTION
To explain the research questions, it is necessary to restate the ultimate goals of our project, which include understanding CDS both as a group of individual information users and as an organization. Once we better understand CDS usage, we aim to use that knowledge to build a modern, efficient search tool that can be incorporated into a new digital repository system. The first question is stated below:
R1. What are the problems and issues of a real architecture and urban design database system in terms of information access and retrieval?
The first and second questions focus mostly on the content and nature of the database. We seek to answer these questions before we focus on user behavior, in order to determine whether user behavior is affected by the state of the current database.
R2. What are the effective non-text graphical information retrieval techniques for a complex architecture and urban planning collection?
Non-text graphical retrieval is the key obstacle to developing a new system and is thus a main focus of our research. The system we focus on currently has no means of searching for image files. This database, from Community Design Solutions, presents an ideal setting to study the applications of a domain-specific image search tool.
METHODOLOGY
To begin learning about the most basic needs for an information retrieval system, we structured a series of interviews and questionnaires on which to build our prototype. We used the interviews to identify the most basic searching needs and strategies of the users. Since we were interested in uncovering the habits of real architecture and urban planning users, we focused specifically on that population for both the interviews and the questionnaire. The interviews were performed first, and we used them to identify key issues that the questionnaire then addressed specifically.
Interviews
For the interviews, we chose five individuals (four female, one male) who had a strong association with CDS and a background in architecture and urban planning. Four of the interviewees worked regularly with the CDS database, while one had a general familiarity with architectural and urban planning databases. All five came from the School of Architecture and Urban Planning (SARUP) at UWM. The interviewers sat down with each interviewee for about thirty minutes. The interviews were semi-structured: five main questions were prepared to prompt the interviewees to discuss the merits and pitfalls of the system (Appendix A). In all of the interviews we sought to bring out the breadth of influences that affect the interviewees' work with the search system, including problems, distractions, and areas for improvement. Interviewees were encouraged to voice their frustrations and possible solutions. They were also encouraged to try a prototype search system. With the permission of each interviewee, each session was recorded. After the interviews were completed, the recordings were transcribed and coded. The answers were analyzed and used to create the questionnaire.
Questionnaires
The interviews were used to craft an online questionnaire distributed to students and staff in the School of Architecture and Urban Planning (SARUP). The questionnaire was given to twenty-five individuals using a UWM survey software program called the Campus Survey Instrument, developed by Qualtrics. There were nine questions (see Appendix B). The first three questions addressed biographical information (age, gender, relation to CDS), the next three addressed CDS usage, and the final three addressed searching behavior. Although the respondents were associated with CDS, the responses were collected anonymously. When writing the questionnaire, we used the answers provided in the interviews to prompt further investigation of concerns and possible improvements. As was noted in the interviews, file naming is a specific organizational concern that almost certainly needs to be addressed in any future prototype system, and several of the questions probed this issue and possible solutions. The last two questions were open-ended in order to give respondents an opportunity to inform the creation of a better system. Once all of the responses were collected, the answers were coded.
RESULTS AND ANALYSIS
The interviews and the questionnaires gave us a substantial picture of the issues associated with the CDS search system. As anticipated, the organizational and naming conventions of the collection were identified as significant problems with the database. Based on the results of the interviews and questionnaires, the most important problems diagnosed can be broken down into three categories: organizational, collaborative, and aesthetic. Each category has two problems, identified separately in the interviews and the questionnaires, and all are important for designing a prototype. Each of these categories and problems represents an area that we intend to improve upon in our model (see Table 1).
Table 1. Diagnosed problems of CDS database.
In determining these categories, we looked first at the interviews, which revealed dissatisfaction with the current CDS database. Almost all of the interviewees discussed file labeling as a significant impediment to finding the desired file. To begin analyzing the results, we address each part of the study (the interviews and the questionnaires) separately. Discussion of the results follows.
Category | Coding | Quote
Organizational | Accessibility | "remembering the location of a particular image was difficult with the CDS database."
Organizational | File Labeling | "The main problem is that, for example, if someone had found an image of a receptacle or a tree from Wisconsin and that person archived that file somewhere, five years ago, in one of those completed projects, how can I access that file? For example, if we have a streetscape, we might have benches, or trees. How can we have a category for that? Is the bench included in that? If something is searched, how can we have a category for that? Some name that identifies what a project is would be nice. It's a big problem."
Organizational | File Labeling | "A perfect system could [be] something that has not only names but when you search that you see a picture. Right now we don't have that visual communication. It would be good if you have the name and an image pops up."
Aesthetic | Image Representation | "It definitely helps to see what the document is before you open it. I think some sort of graphic preview where you actually know what you're looking at. Even if it's not labeled, you can see this is what I need."
Table 2. Coded interview quotations.
Interviews
The interviews drew a solid initial picture of the relationship that workers had with the CDS search system. The interviewees all had strong internet and computer skills, including regular web searching. The four interviewees associated with CDS searched the database on a daily basis. Most of the interviewees reported browsing the CDS database by folder; one reported using the Microsoft File Search tool to search folders, but "only if I really have no idea where to find a file." Another interviewee associated with CDS noted using Google to find images after searching the CDS database unsuccessfully. The first question addressed their main search tasks. Three respondents said that their main search task was looking for images. Three also reported looking for completed projects that might have some aspect (e.g., a map or a drawing) relevant to their current work. Other search tasks performed in the database included searching for text files, examining work records, and looking for historical records. Three of the interviewees mentioned going outside the database (e.g., to a search engine like Google) for the information they needed when they did not locate it quickly in the database. The second question addressed how the interviewees performed their searches. In describing their approach to searching the database, three of the interviewees reiterated the importance of images. One of them stated that the majority of the searches that person performed were related to finding images. All three mentioned the time it would save to preview an image before opening the file. Two of the interviewees complained that
The third question asked the interviewees to identify the main problems associated with the CDS database. Four of the interviewees suggested that the database could use a more consistent labeling system. One interviewee stated that everything was vaguely labeled and organized. Two interviewees cited the amount of time it took to search as a problem. One stated that "it's always two or three steps to get where you might be able to get in one step and sometimes it's four or five or six or seven or eight or nine or ten steps depending on what it is you're looking for." The fourth question asked interviewees to describe aspects of their ideal architecture and urban planning retrieval system. Many of the suggestions related to specific problems raised in the third question. For instance, one interviewee suggested creating a better naming system based on chronology. Two of the interviewees suggested that graphic representation of a document was far superior to text representation. One mentioned the interface of the search engine Bing as a good example of visual retrieval. Another suggested linking documents to a Google Map so that files could be retrieved geographically. Finally, the interviewees were asked to preview the prototype search engine; those who had not viewed the prototype beforehand were asked to do a short search to test the system. All responded positively. Many were impressed that JPEG files were displayed graphically in the search results list. One interviewee said they would specifically prefer a layout of graphic images to a traditional text-based results list. Another typed in a query and did not get the expected results, but was still pleased. One asked about showing AutoCAD and PDF documents graphically, but considered the ability to search text files a big step. Another called it better than going through the folders and said it was a big step in the creation of an image library.
Questionnaires
From the twenty-five submissions, eighteen useful responses were obtained. The questionnaires revealed a lack of organization in the CDS collection. They also revealed the breadth of the divide between image and text-based document searching. The first question asked respondents about their gender: sixteen (89%) responded male, whereas only two responded female. The second question asked respondents for their age. While the majority of respondents (67%, or 12 respondents) were between 19 and 29 years old, three were between 30 and 49 years old, and three were older than 50. The third question asked about internet usage. All eighteen respondents said that they used the internet every day.
The second set of questions was designed to test usage of the CDS database. Five respondents (28%) reported having never used the database, while six (33%) said that they had used it occasionally in the past. Seven (39%) said that they had used the CDS system either weekly or daily in the past. Question 6 asked why respondents used the CDS database. Of the fourteen respondents, twelve (86%) said for work, while five (36%) said for classes. Question 7 asked students to identify the types of documents they were looking for; respondents were allowed to select multiple answers. Of the fourteen respondents, thirteen (93%) said that images were a type of file that they searched for. Other important searched files included documents (10 responses, 71%), maps (11 responses, 79%), and drawings (11 responses, 79%). Overall, 69% (43 out of 62) of the responses to this question named image-based files (images, drawings, sketches, or maps), while the rest would generally be considered textual files (word processing documents, slideshows, or spreadsheets). Question 8 asked respondents how they searched the CDS system. They were given three options: 1) by memory, 2) browsing by folders, or 3) by Windows File Search. Of the thirteen respondents, twelve (92%) said that they searched either by memory or by browsing the folders.
The final two questions were open-ended, to give respondents a chance to describe problems and suggestions they had for the CDS database. The first asked users to describe the major problems of the system; there were eight useful responses. When coded, three of the responses mentioned file labeling as a problem, while two cited accessibility within the database as a significant problem. The last question asked for suggestions or improvements; there were six useful responses. Two said that a more consistent file naming convention would be helpful, while two others discussed being able to represent files with images rather than names.
Category | Coding | Problem
Organizational | File Labeling | "When searching for images there are not names clearly, (and naming them takes lots of time) therefore it is difficult to find what I am looking for. It is not visual."
Organizational | File Labeling | "For maps, it is not organized in a central location and once found the naming convention is not fully utilized yet. Not a uniform way of labeling."
Organizational | Accessibility | "Sometimes finding a specific file was difficult if you didn't remember the actual location."
Collaborative | Duplication | "Often repeated files."
Collaborative | Multiple Authors | "Since we have more than one person working in same projects, often we get confused with various versions of the same document and its locations. Regarding images, we have images used in different projects. When we need the same/similar images, we have to go back to those folders and find them somehow."
Table 3. Problems associated with CDS database.
Category | Coding | Suggestion
Organizational | Accessibility | "Much of the database should be a publicly accessible web based service."
Organizational | File Labeling | "I think the shift in the naming convention is helping improve searches."
Organizational | File Labeling | "Possibly only allow a few people to create and delete files. This will reduce copies and misunderstood folder names."
Organizational, Aesthetic | Accessibility, Software | "1- Possibility of having a [separate] search for files 2- access to components of some [software] like Revit, Auto Cad, [Photoshop], and 3D'S Max which need to be bought."
Aesthetic, Collaborative | Image Representation, Multiple Authors | "If a document could be continuously worked on, with a provision of archiving successive revisions, that might help. In that case, the search system should allow to see the successive revisions and who worked on it. For images, if there would be an easy system of tagging, captioning them, like key words which would then be used for retrieving them as well...that might be helpful."
Table 4. Suggestions for the CDS database.
In terms of system organization, the most revealing and relevant portion of the questionnaire may have been the open-ended responses to the questions asking for problems and suggestions. In most cases, these exposed the problems of a large system used by multiple authors. As described in Figure 3, the labeling of files and folders makes browsing nearly impossible. The file location problem spills over into collaboration: when one user can't find another user's project, he or she creates a new file, often duplicating work that has already been done. The underlying problem is that no two users think alike, and therefore they do not index their work alike. The suggestions for the CDS database (Table 4) make clear that the CDS naming process has failed. Re-indexing is not a realistic solution because of the time it would take, and it is equally unrealistic to expect future generations of CDS users to follow any new naming convention consistently. Instead, what is needed is a way to group related files, and the changes made to them, within their projects.
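As a rough illustration of grouping changes on files within projects, the sketch below clusters files that look like versions or copies of one another. It is a heuristic sketch under stated assumptions, not the system described in this paper: it assumes a file's containing folder stands in for its project, and that versions differ only by trailing markers such as `_v2`, `-rev3`, `final`, `draft`, `copy`, or `(1)`. Those markers and the names `base_name` and `group_versions` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical version markers: "_v2", "-rev3", " final", "_draft", " copy", " (1)".
# A real collection would need its own list, derived from how its users name files.
VERSION_SUFFIX = re.compile(
    r"(?:[ _\-](?:v\d+|rev\d+|final|draft|copy)|\s*\(\d+\))+$",
    re.IGNORECASE,
)


def base_name(filename: str) -> str:
    """Strip trailing version markers and normalize case and separators."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # the file has no extension
        stem, ext = filename, ""
    stem = VERSION_SUFFIX.sub("", stem).strip(" _-").lower()
    stem = re.sub(r"[ _\-]+", "_", stem)  # treat spaces, underscores, hyphens alike
    return stem + ("." + ext.lower() if ext else "")


def group_versions(paths: list[str]) -> dict[tuple[str, str], list[str]]:
    """Group 'project/file' paths by (project folder, normalized file name)."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for path in paths:
        # Assumption: the containing folder is the project.
        project, _, filename = path.rpartition("/")
        groups[(project, base_name(filename))].append(filename)
    return {key: sorted(files) for key, files in groups.items()}


if __name__ == "__main__":
    paths = [
        "2008_Beach_House/site_plan_v1.jpg",
        "2008_Beach_House/Site Plan FINAL.jpg",
        "2008_Beach_House/minutes.docx",
        "2009_Main_Street/site_plan_v1.jpg",
    ]
    for (project, name), files in group_versions(paths).items():
        print(project, name, files)
```

Keying on the project folder keeps identically named files from different projects apart, in line with the project-oriented organization of the CDS collection; only files inside the same project are treated as candidate versions of one another.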
DISCUSSION
As Jørgensen and Jørgensen (2005) note, users of an image-based database tend to prefer browsing to searching. The problems we found suggest that users' dissatisfaction stemmed not from an inability to search but from obstacles to browsing; users looking for images are naturally disposed to begin their search by browsing. Based on our findings, we developed the following model (Figure 2) to represent the behavior of our target population.
[Figure 2 diagram: Users → Information Need → Interface → by Windows file search / by browsing folders / by memory; Proximity links projects (P1 … Pn) to their files (file1 … file n).]
Figure 2. User behavior model for image retrieval in architecture and urban planning.
After forming an information need, users interact with the interface in one of three ways: browsing folder by folder, searching by keyword with the Windows File Search tool, or locating the file from memory. As discussed earlier in this paper, the first two options are undesirable and difficult. Searching by memory differs from browsing folder by folder because the user already knows exactly which folder holds the file and does not need to comb through other files to reach it. Such memory usually rests on a contextual, or proximity, framework. Searching by memory is clearly the most efficient approach, but it is unavailable to users unfamiliar with the system: given the current organization of images, a user would have to know which project the file belongs to, the year the project was completed, or possibly who introduced the image into the system. Proximity retrieval is meant to mimic memory. An architectural and urban design database user thinking of the word "beach" will probably recall a floor plan for a beach house, a map of the shoreline, or the minutes of a meeting that set guidelines for fencing on a beach. In the same way, image seekers can enter a search term (e.g., "beach") and retrieve documents and images belonging to projects in which that word appears. Very specific details from a project can thus be retrieved without navigating folder by folder through the chronological structure in which they are stored. In a project-oriented database such as the one CDS has, our proximity retrieval model addresses several key problems by mining useful information from neighboring files. CDS therefore does not need to spend significant time and money developing rich metadata and re-indexing thousands of files. Because proximity retrieval looks for files grouped cohesively under a designated project folder, it can accommodate the search patterns we identified among architectural and urban designers.
The next step for our project will be to test the effectiveness of our search system against a baseline system such as Windows File Search in a full-fledged user study. We will also seek to enhance the browsing features of our system, which are currently rudimentary, in order to test the preferences of our users. These two steps will eventually lead to the creation of a completely user-preference-driven search system.
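The core idea of proximity retrieval, finding images through the text of neighboring files in the same project, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the prototype evaluated in this study: it assumes text has already been extracted from each document into a dictionary keyed by project folder, treats a handful of file extensions as images, and scores a project by how many of its neighboring files (plus the folder name itself) mention the query term. The names `is_image` and `proximity_search`, the extension list, and the scoring rule are all hypothetical.

```python
# Hypothetical set of extensions treated as images; all other files are treated as text.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".tif", ".tiff")


def is_image(filename: str) -> bool:
    """Return True if the file name ends with one of the assumed image extensions."""
    return filename.lower().endswith(IMAGE_EXTENSIONS)


def proximity_search(
    collection: dict[str, dict[str, str]], term: str
) -> list[tuple[str, str, int]]:
    """Return (project, image, score) for images whose project neighbors mention term.

    collection maps a project folder name to {file name: extracted text}.
    Images usually carry no text of their own, so each image inherits the
    score of its project: the number of files in that project whose name or
    text contains the term, plus one if the folder name itself contains it.
    """
    term = term.lower()
    results: list[tuple[str, str, int]] = []
    for project, files in collection.items():
        score = int(term in project.lower())
        score += sum(
            1
            for name, text in files.items()
            if term in name.lower() or term in text.lower()
        )
        if score == 0:
            continue
        # Every image in a matching project is returned, even one with no text.
        results.extend((project, name, score) for name in files if is_image(name))
    # Projects with more supporting evidence first; ties broken by name for stable output.
    results.sort(key=lambda r: (-r[2], r[0], r[1]))
    return results


if __name__ == "__main__":
    collection = {
        "2008_Lakeshore_Beach_House": {
            "minutes.txt": "Guidelines for fencing on the beach were discussed.",
            "program.txt": "The client wants a beach house with two bedrooms.",
            "floor_plan.jpg": "",
        },
        "2009_Main_Street": {
            "report.txt": "Streetscape improvements for Main Street.",
            "facade.png": "",
        },
    }
    print(proximity_search(collection, "beach"))
    # → [('2008_Lakeshore_Beach_House', 'floor_plan.jpg', 3)]
```

Because each image inherits evidence from its neighbors, the floor plan is found even though its own file name and (empty) text never mention "beach", which is the behavior the proximity model is meant to provide without any added metadata.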
CONCLUSION
This study presents evidence to inform the design of proximity image retrieval. As our study reveals, file labeling is one of the most significant problems in the CDS search system. Unfortunately, re-indexing an image collection such as the CDS system is not a realistic approach. Given the problems and unique features of the CDS architectural database, neither a content-based approach nor a strictly text-based approach is likely to serve a small, domain-specific image collection like CDS; a contextual approach such as proximity image retrieval is therefore better suited. The next step in our study is to examine how our prototype system performs against other retrieval systems, such as a baseline search system.
ACKNOWLEDGEMENTS
We would like to thank all of our participants from Community Design Solutions and the School of Architecture and Urban Planning at the University of Wisconsin-Milwaukee. We would also like to thank Susan Weistrop for her dedication and advocacy.
REFERENCES
Choi, Y. and Rasmussen, E. M. (2003). Searching for images: The analysis of users' queries for image retrieval in American history. Journal of the American Society for Information Science and Technology 54(6): 498-511.
Cover, T. and Thomas, J. (1991). Elements of Information Theory. New York: Wiley.
Enser, P. G. B., et al. (2007). Facing the reality of semantic image retrieval. Journal of Documentation 63(4): 465-481.
Enser, P. (2008). The evolution of visual information retrieval. Journal of Information Science 34(4): 531-546.
Fukumoto, T. (2006). An analysis of image retrieval behavior for metadata type image database. Information Processing and Management 42: 723-728.
Gelernter, J. (2009). Image indexing in article component databases. Journal of the American Society for Information Science and Technology 60(10): 1965-1976.
Jansen, B. (2008). Searching for digital images on the web. Journal of Documentation 64(1): 81-101.
Jørgensen, C. and Jørgensen, P. (2005). Image querying by image professionals. Journal of the American Society for Information Science and Technology 56(12): 1346-1359.
Matusiak, K. (2006). Towards user-centered indexing in digital image collections. OCLC Systems & Services 22(4): 283-298.
McCay-Peet, L. and Toms, E. (2009). Image use within the work task model: Images as information and illustration. Journal of the American Society for Information Science and Technology 60(12): 2416-2429.
Ogilvie, P. and Callan, J. (2001). Experiments using the Lemur toolkit. In Proceedings of the Tenth Text Retrieval Conference (TREC-10).
Pu, H. T. (2008). An analysis of failed queries for web image retrieval. Journal of Information Science 34(3): 275-289.
Robbins, D. (2000). Shifts of focus on various aspects of user information problems during interaction information retrieval. Journal of the American Society for Information Science and Technology 51(10): 913-928.
Rorissa, A. and Hemalata, I. (2008). Theories in cognition and image categorization: What categories labels reveal about basic level theory. Journal of the American Society for Information Science and Technology 59(9): 1383-1392.
Shah, B., et al. (2006). A cluster-based approach for efficient content-based image retrieval using a similarity-preserving space transformation method. Journal of the American Society for Information Science and Technology 57(12): 1694-1707.
Spink, A. and Jansen, B. (2006). Searching multimedia federated content web collections. Online Information Review 30(5): 485-495.
Tjondronegoro, D., Spink, A. and Jansen, B. (2009). A study and comparison of multimedia web searching: 1997-2006. Journal of the American Society for Information Science and Technology 60(9): 1756-1768.
Yee, K. P., et al. (2003). Faceted metadata for image search and browsing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 5-10, 2003, Ft. Lauderdale, Florida, USA.