ZJSK.COM
Capital letters in Russian UTF-8 problem
Published by: wktd 2010-03-11

  • I have just downloaded the Free Edition of Zoom to try it before deciding whether to buy. My site is multilingual (Russian/English/German) and uses UTF-8 encoding. Indexing of 50 pages completed without errors. But when I search, it seems impossible to find any word that starts with a capital Russian letter (family names, towns, the first word of a sentence, etc.). The "Search result for" string shows that the query word was converted to lowercase correctly. Also, when I search for adjacent words I can see the problem words in the results, so they must have been indexed. What is wrong? Searching for English words causes no problems at all.


  • We have had a look at the site and agree the behaviour is not correct.

    We don't have a full solution for the problem and will not be able to investigate the problem in detail until after Christmas now.

    So if you can leave the search page on your web site for a couple of weeks that would be good.

    As a temporary workaround, you could remove the following lines of code from search.php:

    if ($UseUTF8 == 1 && function_exists('mb_strtolower'))
    $query = mb_strtolower($query, "UTF-8");
    else

    This stops Russian search words from being converted to lower case, so you should then be able to do case-sensitive searches in Russian.

    ---
    David
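
The effect of the workaround in the reply above can be illustrated with a minimal sketch. It is not Zoom's code: the `$index` array and the `lookup()` helper are hypothetical, and the sketch assumes (the thread does not confirm this) that capitalised Cyrillic words were stored in the index with their original case. It also assumes the file is saved as UTF-8 and the mbstring extension is available.

```php
<?php
// Hypothetical stand-in for the search index. Assumption: Cyrillic words
// were indexed with their original capitalisation.
$index = array(
    'Иванов' => array('library.php'),
    'икона'  => array('library.php'),
);

// Hypothetical helper: exact-match lookup that optionally lowercases the
// query first, in the same way as the lines quoted from search.php.
function lookup(array $index, $query, $lowercaseQuery)
{
    if ($lowercaseQuery && function_exists('mb_strtolower')) {
        $query = mb_strtolower($query, 'UTF-8');
    }
    return isset($index[$query]) ? $index[$query] : array();
}

// With lowercasing, the query no longer equals the capitalised index entry.
var_dump(count(lookup($index, 'Иванов', true)));

// Without lowercasing, the query matches, but only with the exact same case.
var_dump(count(lookup($index, 'Иванов', false)));
```

Under these assumptions the first lookup finds nothing and the second finds the page, which matches the behaviour described: removing the lowercasing step trades case-insensitive search for a working case-sensitive one.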


  • We've fixed this problem in the latest build (4.2.1007) released today. This is available for download here:
    http://www.wrensoft.com/zoom/whatsnew.html

    The latest version should now perform case-insensitive searches on Cyrillic words (for UTF-8 encoded websites) without any problems. The same applies to other non-English languages encoded in UTF-8.
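
For readers wondering what case-insensitive matching on UTF-8 text involves, here is a minimal sketch. It is not how Zoom 4.2.1007 implements the fix, which the thread does not describe. The helpers `fold_case()` and `same_word()` are hypothetical names, and the code assumes the mbstring extension is installed and the file is saved as UTF-8.

```php
<?php
// Hypothetical helper: convert a UTF-8 word to lower case for comparison.
// mb_strtolower() is multibyte-aware, so it handles Cyrillic as well as Latin.
function fold_case($word)
{
    return mb_strtolower($word, 'UTF-8');
}

// Hypothetical helper: case-insensitive equality for UTF-8 strings.
// Both sides are folded, so it does not matter which one is capitalised.
function same_word($a, $b)
{
    return fold_case($a) === fold_case($b);
}

var_dump(same_word('Москва', 'москва')); // same word, different case
var_dump(same_word('Berlin', 'BERLIN')); // Latin text is handled the same way
var_dump(same_word('Москва', 'Минск'));  // different words stay different
```

The design point is that the same folding has to be applied to both the indexed words and the query; folding only one side produces the mismatch reported at the start of this thread.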


  • "So if you can leave the search page on your web site for a couple of weeks that would be good."

    The search page will stay up for as long as you need it for your investigation. Good luck and Merry Christmas! ;)


  • Here is the search page of my site: http://www.icon-art.info/search/search.php

    I am not sure I can post Cyrillic words on this forum, so here is the idea: have a look at this page, which has been indexed: http://www.icon-art.info/library.php?lng=ru. You can confirm that by searching for any lowercase word from it. But if you search for any word that starts with a capital Cyrillic letter (e.g. the authors' family names), the search fails.


  • Can you post the URL to your web site search function & details of what words you are searching for, so that we can see the problem.

    -----
    David




