ZJSK.COM
Capital letters in Russian UTF-8 problem
Published by: mike 2010-03-17

  • I have just downloaded the Free Edition of Zoom to try it before deciding whether to buy. My site is multilingual (Russian/English/German) and uses UTF-8 encoding. Indexing of 50 pages completed without errors. But when I search, it is not possible to find any word that starts with a capital Russian letter (family names, towns, the first word of a sentence, etc.). The "Search result for" string shows that the query word was converted to lowercase correctly. Also, when I search for adjacent words I can see the problem words in the results, so they must have been indexed. What is wrong? Searching for English words does not cause any problem at all.


  • Here is the search page of my site: http://www.icon-art.info/search/search.php.

    I am not sure I can post Cyrillic words on this forum, so the idea is as follows. Have a look at this page: http://www.icon-art.info/library.php?lng=ru - it has been indexed. You can check this by searching for any lowercase word. But if you search for any word that starts with a capital Cyrillic letter (e.g. family names of authors), the search fails.


  • Can you post the URL of your web site's search function and details of the words you are searching for, so that we can see the problem?

    -----
    David


  • So if you can leave the search page on your web site for a couple of weeks that would be good.

    The search page would be there as long as you need for your investigations. Good luck and Merry Christmas! :wink:


  • We've fixed this problem in the latest build (4.2.1007) released today. This is available for download here:
    http://www.wrensoft.com/zoom/whatsnew.html

    The latest version should now be able to perform case insensitive searches on Cyrillic words (for UTF-8 encoded websites) without any problems. This also applies for other foreign languages encoded with UTF-8.
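
    One way to picture what a case insensitive match on UTF-8 words involves is the minimal sketch below. It is not Zoom's code: the helper name utf8_equals_ignore_case is hypothetical, the sample words are made up, and it assumes PHP's mbstring extension is installed.

```php
<?php
// Hypothetical helper, not part of Zoom: compares two UTF-8 words
// without regard to case by lowercasing both sides the same way.
function utf8_equals_ignore_case(string $a, string $b): bool
{
    return mb_strtolower($a, 'UTF-8') === mb_strtolower($b, 'UTF-8');
}

// Made-up sample words in two of the site's languages.
var_dump(utf8_equals_ignore_case('Иванов', 'иванов')); // bool(true)
var_dump(utf8_equals_ignore_case('Müller', 'MÜLLER')); // bool(true)
```

    The point of the sketch is that both the stored word and the query go through the same Unicode-aware conversion, so a capital letter on either side no longer prevents a match.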


  • We have had a look at the site and agree the behaviour is not correct.

    We don't have a full solution for the problem and will not be able to investigate the problem in detail until after Christmas now.

    So if you can leave the search page on your web site for a couple of weeks that would be good.

    As a temporary solution, you could remove the following lines of code in search.php:

    if ($UseUTF8 == 1 && function_exists('mb_strtolower'))
        $query = mb_strtolower($query, "UTF-8");
    else

    This will avoid the conversion of Russian search words into lower case and you should then be able to do case sensitive searches in Russian.

    ---
    David
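
    To illustrate why dropping the lowercase conversion turns the search into a case sensitive one, here is a minimal sketch. It rests on assumptions the thread does not confirm: that the index stores the word with its capital letter, and that matching is an exact string comparison. The helper lowercase_utf8_query and the sample values are hypothetical, and the mbstring extension is required.

```php
<?php
// Hypothetical helper: lowercases a UTF-8 query the way the quoted
// mb_strtolower() line does.
function lowercase_utf8_query(string $query): string
{
    return mb_strtolower($query, 'UTF-8');
}

// Hypothetical sample values, not taken from the Zoom index.
$typed   = 'Москва';
$indexed = 'Москва'; // assumption: the index kept the capital letter

// With the conversion, the lowercased query no longer equals the stored word.
var_dump(lowercase_utf8_query($typed) === $indexed); // bool(false)

// Without the conversion, the query is compared exactly as typed.
var_dump($typed === $indexed); // bool(true)
```

    Under these assumptions the visitor has to type the word with the same capitalization as on the page, which is the trade-off of the temporary solution.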





