Indexing and searching huge volumes of images?

Discussion in 'UK Photography' started by Umgall, Aug 1, 2005.

  1. Umgall Guest

    I know that someone out there can help me with a problem.

    I'm being asked to find some software which can index and allow searches on
    huge volumes of images. Most of these images will be TIFFs, and to be
    honest, I'm expecting there to be about 1.5 million at the end of the
    project. Ouch.

    Basically I need to be able to store 'metadata' against each image, and to
    search this metadata very quickly. Ideally, the metadata would be stored in
    an SQL database, and would provide hyperlinks to images on the file system.
    I need to have a description (up to 2k of text), a date and a location.

    So, I could search for "hyde park" and if this phrase occurs within the
    metadata fields of any of the 1.5 million images, the hits would be
    displayed (along with the metadata) and I could click through to the image.

    Does anyone know of any system that can do what I need? Any help would be
    greatly appreciated. The alternative is to develop an application, but
    if there is an off-the-shelf solution then this is obviously going to be
    better!

    Umgall.
     
    Umgall, Aug 1, 2005
    #1
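
A minimal sketch of the kind of store described in post #1, assuming Python with its bundled sqlite3 module stands in for whatever SQL database is eventually chosen. The table, column and helper names (images, file_path, taken_on, open_catalogue, add_image, search) are made up for illustration and do not come from any product mentioned in this thread. A LIKE search with a leading wildcard scans every row, so at 1.5 million images a full-text index would probably be needed to keep phrase searches fast.

```python
import sqlite3


def open_catalogue(path=":memory:"):
    """Create (if needed) a one-table image catalogue and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS images (
               id          INTEGER PRIMARY KEY,
               file_path   TEXT NOT NULL,   -- where the TIFF lives on disk
               description TEXT,            -- free text, up to about 2k
               taken_on    TEXT,            -- ISO date string, e.g. 2005-08-01
               location    TEXT
           )"""
    )
    return conn


def add_image(conn, file_path, description, taken_on, location):
    """Store the metadata for one image; the image itself stays on the file system."""
    conn.execute(
        "INSERT INTO images (file_path, description, taken_on, location) "
        "VALUES (?, ?, ?, ?)",
        (file_path, description, taken_on, location),
    )


def search(conn, phrase):
    """Return rows whose description or location contains the phrase.

    SQLite's LIKE ignores case for ASCII letters. Any % or _ in the phrase
    would act as a wildcard; this sketch does not escape them.
    """
    pattern = f"%{phrase}%"
    return conn.execute(
        "SELECT file_path, description, taken_on, location FROM images "
        "WHERE description LIKE ? OR location LIKE ? ORDER BY id",
        (pattern, pattern),
    ).fetchall()
```

Each hit carries the file path alongside the metadata, so a front end could turn it into the click-through link asked for above.
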

  2. I can't answer the question but you might want to also ask this on some more
    technical newsgroups - perhaps something like alt.comp.databases...?
     
    Michael Cargill, Aug 1, 2005
    #2

  3. You need to work out what you want to get out of the database in the end.
    This will set what information you need to record and the way it is recorded.
    Then you need to look at how you are going to access the data and how many
    people will need access to it and when.
    This will then affect what database system you end up using.
    You also need to look and see if there is a commercially available system as
    buying it may be cheaper than developing your own.

    It all boils down to the use it will be put to.
    If it was for access from multiple locations by multiple users I would be
    looking at Oracle or MSSQL.
    If it's just one person sitting at a computer I would probably not risk MS
    Access as I don't know how it handles such large databases. You would need
    to get some advice on that from people who have run big databases.
     
    Gordon Hudson, Aug 1, 2005
    #3
  4. Have a look at ThumbsPlus:
    http://www.cerious.com/image-database.shtml
    There's a mention of SQL in there somewhere.
    I'm in the middle of setting up a database-driven website along similar
    lines (though I doubt it'll ever get past tens of thousands of
    images!) using MySQL and PHP. Basic searches are pretty straightforward,
    but of course I want it to do more: show categories which can be
    clicked on to refine the search to fewer and fewer images, similar to
    eBay's searching system. I should have it finished in a while...
     
    Willy Eckerslyke, Aug 1, 2005
    #4
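
The click-to-refine idea in post #4 can be sketched roughly as follows. This is an illustration only, assuming Python's sqlite3 in place of the MySQL and PHP setup mentioned there; the images table, the two facet columns (subject, county) and the function names are hypothetical. For a search phrase plus the categories already chosen, it returns the matching image ids and, for each category not yet chosen, how many matches fall under each value.

```python
import sqlite3

FACETS = ("subject", "county")  # hypothetical category columns


def make_demo_db():
    """Build a tiny in-memory catalogue to try the refinement on."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images (id INTEGER PRIMARY KEY, description TEXT, "
        "subject TEXT, county TEXT)"
    )
    conn.executemany(
        "INSERT INTO images (description, subject, county) VALUES (?, ?, ?)",
        [
            ("Park bandstand", "buildings", "Surrey"),
            ("Park gates", "buildings", "Kent"),
            ("Park picnic", "people", "Kent"),
            ("Harbour", "boats", "Kent"),
        ],
    )
    return conn


def refine(conn, phrase, chosen):
    """Return (matching ids, counts per value of each facet not yet chosen)."""
    where = ["description LIKE ?"]
    params = [f"%{phrase}%"]
    for column, value in chosen.items():
        if column not in FACETS:  # only whitelisted names reach the SQL text
            raise ValueError(f"unknown facet: {column}")
        where.append(f"{column} = ?")
        params.append(value)
    clause = " AND ".join(where)

    ids = [
        row[0]
        for row in conn.execute(
            f"SELECT id FROM images WHERE {clause} ORDER BY id", params
        )
    ]
    counts = {}
    for column in FACETS:
        if column in chosen:
            continue  # already narrowed on this facet
        rows = conn.execute(
            f"SELECT {column}, COUNT(*) FROM images WHERE {clause} "
            f"GROUP BY {column} ORDER BY {column}",
            params,
        ).fetchall()
        counts[column] = dict(rows)
    return ids, counts
```

Each click adds one entry to `chosen`, so the result set can only shrink, which matches the "fewer and fewer images" behaviour described.
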
  5. The IPTC standard was created for this very purpose - see
    http://www.peterkrogh.com/Pages/digital/iptc.html. What you need therefore
    is some software that will let you add it to an image and then some
    (perhaps the same) that will let you search it all.

    There are several products that will do one or both and they need not be
    expensive. IrfanView, for example, lets you add IPTC data - and that's
    free. This product (which I’ve not tried) does both jobs -
    http://peccatte.karefil.com/Kalimages/EN/Index.html . Another, possibly
    more robust, is here - http://www.camerabits.com/pages/PM4.html .

    Perhaps other people here know of some. Most pro photographers need
    something like it. There's certainly no need to reinvent the wheel and
    start messing around with SQL.
     
    Roger Whitehead, Aug 1, 2005
    #5
  6. Neil Barker Guest

    Yup, no problemo.

    Have a look at Fotostation Pro and Index Manager.

    http://www.fotoware.com

    Fotostation Pro is the front-end application, which works as a
    standalone image cataloguer / editor, but really comes into its own
    when connected to a server running Index Manager.

    Essentially what happens is this:-

    When an image is sent to the server from Fotostation Pro, Index Manager
    reads the data contained in the IPTC fields and adds it to an index,
    with a pointer to that image file location for later retrieval.

    When using the search facility in Fotostation, rather than having to
    search through thousands of files, all it needs to do is to consult the
    master index - any matches can then be found in seconds.

    You'll find that many newspapers, mine included, run this system and it
    does work extremely well. We currently have just under 100,000 images
    online and searching on a keyword or phrase takes literally a few
    seconds. Index Manager has the capacity to search millions of images,
    potentially spread over several servers using something called "Cluster
    Commander" (which enables many servers to be treated effectively as one
    big one). It can also do Boolean algebra searches using AND/OR/NOT
    together with phonetic searches and more.

    It can also be connected to a WWW front-end, which is a Java
    application enabling online viewing/ordering etc.

    If you need further help with this, feel free to get in touch.
     
    Neil Barker, Aug 1, 2005
    #6
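
The index-then-search approach described in post #6 can be illustrated with a toy inverted index. This is not how Index Manager is actually implemented, which the thread does not say; it is a sketch in Python in which a plain caption string stands in for the IPTC fields, and the class and method names (KeywordIndex, add, lookup, search) are invented for the example. Each word maps to the set of file paths whose caption contains it, so a query touches only the index and never the image files.

```python
import re
from collections import defaultdict


class KeywordIndex:
    """Toy inverted index: word -> set of image file paths."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._all = set()

    def add(self, file_path, caption):
        """Index the words of one image's caption against its file path."""
        self._all.add(file_path)
        for word in re.findall(r"[a-z0-9]+", caption.lower()):
            self._postings[word].add(file_path)

    def lookup(self, word):
        """Return a copy of the set of paths indexed under one word."""
        return set(self._postings.get(word.lower(), ()))

    def search(self, all_of=(), any_of=(), none_of=()):
        """AND the all_of words, OR the any_of words, then NOT the none_of words."""
        hits = set(self._all)
        for word in all_of:
            hits &= self.lookup(word)
        if any_of:
            hits &= set().union(*(self.lookup(word) for word in any_of))
        for word in none_of:
            hits -= self.lookup(word)
        return hits
```

For example, `index.search(all_of=["hyde", "park"], none_of=["night"])` would return every indexed path whose caption mentions both "hyde" and "park" but not "night". Phrase order and phonetic matching are left out of this sketch.
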
  7. infinity Guest

    Have a look at ThumbsPlus 7 by Cerious, which uses the Access database
    format, although it functions as a standalone application. You can have
    keywords, user defined fields that take numeric or string values, and also
    add lengthy comments to images. The thumbnails view can show all your own
    fields & keywords plus EXIF data etc, and info embedded in the file can be
    used to generate keywords if you like, as can its name and folder path.
    There's a 30 day free trial available. Since the database is now Access
    format, you should be able to open it directly if you need more
    functionality and use your own search macros.
    I'm not sure how well it copes with millions of images but certainly tens
    of thousands is no problemo.
     
    infinity, Aug 2, 2005
    #7
  8. I agree in principle, but as the OP referred to millions of images,
    there's going to be a massive investment in time just inputting the
    data. In comparison, a few days spent messing with SQL to get something
    that does this specific job perfectly, and nothing else, could be time
    well spent.
    Fine if an off-the-shelf product will work with no compromises, but if
    that product doesn't quite fit the requirements or is a bit clunky in
    its application - even if it just means an extra mouse click or two -
    any small irritation multiplied by 1.5 million is likely to end up as a
    major headache.
     
    Willy Eckerslyke, Aug 2, 2005
    #8
  9. How is that going to speed up data input? If it doesn't exist in
    machine-readable form (and Umgall hasn't said it does), entering it into a
    database form is going to be no quicker than entering it into a purpose-made
    product, and possibly slower.
     
    Roger Whitehead, Aug 2, 2005
    #9
  10. You don't access the database directly; you write your own form in PHP
    that asks only for what you want and shows only the fields you need.
    So instead of a page full of text fields, you may only have one or two
    and a submit button. If a field only ever needs to contain one of a
    choice of text strings, you can set up your form so that you click on a
    radio button to choose one from a list, rather than having to type it in
    afresh every time.
    If you want, you can tell it to pre-fill the form fields with the last
    image's data for you to edit rather than start afresh for every image.
    Also you could set it up to bulk fill certain fields if you want it to.

    With a little thought, your input form should be _the_ most efficient
    way of inputting data. No purpose-made product could ever be as streamlined.
     
    Willy Eckerslyke, Aug 2, 2005
    #10
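
The form behaviour described in post #10 (pre-filling from the last image, fixed choices instead of free typing, bulk-filling a field) might look something like this behind the form. It is a sketch in Python, not the PHP the poster has in mind, and the field names, the COUNTY_CHOICES list and both function names are assumptions made for the example.

```python
COUNTY_CHOICES = ("Kent", "Surrey", "Sussex")  # hypothetical radio-button options


def prefill(last_record, changes):
    """Build the next record from the previous image's values plus the edited fields."""
    record = dict(last_record)  # copy, so the previous record is left untouched
    record.update(changes)      # only the fields the operator actually edited
    if record.get("county") not in COUNTY_CHOICES:
        raise ValueError(f"county must be one of {COUNTY_CHOICES}")
    return record


def bulk_fill(records, field, value):
    """Return copies of the records with one field set to the same value in each."""
    return [{**record, field: value} for record in records]
```

With this shape, entering a run of images from the same place means supplying only the image ID and whatever else differs; everything else carries over from the previous record.
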
  11. Umgall Guest

    I suppose to be fair, the 'metadata' will exist in machine readable form.
    This will be generated from an existing database, and if the application
    supports it, will be imported in XML. There aren't many fields, but it is
    vital that these can be searched: County, Date, Description, Surname,
    Forename, Placename and image ID.

    Willy is right - due to the huge volumes, it's important to get something
    which is flexible to allow us to search quickly and return matches, then to
    display the image with one keyclick. Browsing the images is important too,
    but fast search capabilities are vital.

    Thanks for the suggestions so far!

    Umgall.
     
    Umgall, Aug 2, 2005
    #11
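
A rough sketch of the XML import mentioned in post #11, using Python's standard library. The layout of the XML export is not given in the thread, so the element names here (an `image` element per record, with child elements named as in FIELDS) are assumptions, as are the table name, column names and the choice of which columns to index.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed child element names, one per field listed in the post.
FIELDS = ("image_id", "county", "date", "description",
          "surname", "forename", "placename")


def import_xml(conn, xml_text):
    """Load <image> records from an XML export into the images table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "image_id TEXT PRIMARY KEY, county TEXT, image_date TEXT, "
        "description TEXT, surname TEXT, forename TEXT, placename TEXT)"
    )
    rows = []
    for record in ET.fromstring(xml_text).iter("image"):
        # A missing or empty element becomes an empty string.
        rows.append(tuple((record.findtext(name) or "").strip() for name in FIELDS))
    placeholders = ",".join("?" * len(FIELDS))
    conn.executemany(f"INSERT INTO images VALUES ({placeholders})", rows)
    # Index the columns most likely to be searched by exact value.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_surname ON images (surname)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_placename ON images (placename)")
    return len(rows)
```

Ordinary indexes like these help exact or prefix lookups on Surname and Placename; searching inside the Description text would still need a full-text index or a dedicated search product.
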
  12. Neil Barker Guest

    I tell you - you want Fotostation Pro - does all that straight out of
    the box :)
     
    Neil Barker, Aug 2, 2005
    #12
  13. You're splitting hairs now.
    "No purpose-made product could ever be as streamlined."

    Unless one has looked at all the significant products, one cannot know. A
    sensible buying process would be to do this first, then look into a
    roll-your-own answer once one has a basis for comparison.
     
    Roger Whitehead, Aug 2, 2005
    #13
  14. Hardly. That's fundamental to the whole thing.
    I have difficulty remembering so far back but I thought that was pretty
    much what I suggested in the first place, hence my link to cerious.com.
     
    Willy Eckerslyke, Aug 2, 2005
    #14
  15. Life's too short to nail your feet to the floor so I'll stop bothering.
    Your memory clearly is failing. You suggested one product, not a survey of
    them.
     
    Roger Whitehead, Aug 2, 2005
    #15
  16. Still have to have the last word though, eh?
    Any idea what I had for tea yesterday? I'm trying to decide whether I
    need to shop on the way home.
     
    Willy Eckerslyke, Aug 2, 2005
    #16
  17. Bandicoot Guest

    [SNIP]
    Last time I had anything to do with this sort of thing we wouldn't put a
    database that size in Access. Even at 250,000 items we'd prefer SQL. For big
    databases we mostly used SQL or a Teradata solution (sometimes with a
    Natural Language front end).

    These were not usually on PC architectures, though they would be accessed
    via PCs: Auspex was a good choice if not putting them on the mainframes.
    One DB that was used by only about four people, but which was big, went on
    an old SGI box attached to the office network, which worked well and was
    cheap.


    Peter
     
    Bandicoot, Aug 4, 2005
    #17
  18. Phil Kyle Guest

    Closest you've ever been to a box.

    --
    Phil Kyle™
    Uno
    Dos
    Tres
    Cuatro
    CINCO!!!!!!

    "Be very aware that my willingness
    to continue to criticise your sig
    is infinite." -- Neil Barker
     
    Phil Kyle, Aug 20, 2005
    #18
  19. Phil Kyle Guest

    He means that literally.

     
    Phil Kyle, Aug 20, 2005
    #19
  20. ah Guest

    Oooohhhh..
     
    ah, Aug 21, 2005
    #20
