I have just downloaded the Free Edition of Zoom to give it a try before deciding to buy. My site is multilingual (Russian/English/German) and I use UTF-8 encoding. The indexing of 50 pages went fine. But when I try to search, it seems impossible to find any words starting with capital Russian letters (family names, towns, first words of sentences, etc.). The "Search result for" string shows that the query word was converted to lowercase correctly. Also, when I search for any adjacent words I can see those problem words in the results, so they must have been indexed. What is wrong? Searching for English words does not cause any problem at all.
We have had a look at the site and agree the behaviour is not correct.
We don't have a full solution for the problem yet and will not be able to investigate it in detail until after Christmas.
So if you can leave the search page on your web site for a couple of weeks that would be good.
As a temporary solution you could remove the following lines of code in search.php:
if ($UseUTF8 == 1 && function_exists('mb_strtolower'))
$query = mb_strtolower($query, "UTF-8");
else
This will avoid the conversion of Russian search words into lower case, and you should then be able to do case-sensitive searches in Russian.
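To illustrate what the workaround changes, here is a minimal sketch and not Zoom's actual code. The function `normalize_query` and its `$lowercase` switch are hypothetical names introduced only for this example; only `mb_strtolower` and `function_exists` are real PHP functions. With lowercasing switched off, the query reaches the matcher exactly as typed, so a capitalised Cyrillic word has to be entered with its capital letter.

```php
<?php
// Hypothetical helper for illustration only; not part of search.php.
function normalize_query(string $query, bool $lowercase): string
{
    // Lowercase only when asked to and when the mbstring extension is loaded.
    if ($lowercase && function_exists('mb_strtolower')) {
        return mb_strtolower($query, 'UTF-8');
    }
    // Workaround path: the query is passed through unchanged.
    return $query;
}

$word = "Москва"; // this file must itself be saved as UTF-8

echo normalize_query($word, false), "\n"; // unchanged, capital letter kept
echo normalize_query($word, true), "\n";  // lowercased if mbstring is available
```

The price of the workaround is that searches become case sensitive, so "москва" and "Москва" are treated as different words.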
---
David
We've fixed this problem in the latest build (4.2.1007) released today. This is available for download here:
http://www.wrensoft.com/zoom/whatsnew.html
The latest version should now be able to perform case-insensitive searches on Cyrillic words (for UTF-8 encoded websites) without any problems. This also applies to other non-English languages encoded with UTF-8.
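As a rough illustration of what case-insensitive matching on UTF-8 text involves in general, the sketch below lowercases both sides with `mb_strtolower` before comparing. It is an assumption about the general technique, not a description of how build 4.2.1007 implements it; the function name `utf8_equals_ignore_case` is hypothetical, and the code requires PHP's mbstring extension.

```php
<?php
// Hypothetical helper for illustration; requires the mbstring extension.
function utf8_equals_ignore_case(string $a, string $b): bool
{
    // Normalise both strings the same way so the comparison is symmetric.
    return mb_strtolower($a, 'UTF-8') === mb_strtolower($b, 'UTF-8');
}

// This file must be saved as UTF-8.
var_dump(utf8_equals_ignore_case("Рублёв", "рублёв")); // Russian
var_dump(utf8_equals_ignore_case("Über", "über"));     // German
var_dump(utf8_equals_ignore_case("Icon", "ikon"));     // different words
```

The key point is that the indexed word and the query go through the same conversion; if only one side is lowercased, capitalised words can never match.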
So if you can leave the search page on your web site for a couple of weeks that would be good.
The search page would be there as long as you need for your investigations. Good luck and Merry Christmas! :wink:
Here is the search page of my site: http://www.icon-art.info/search/search.php.
I am not sure that I can post any Cyrillic words on this forum, so the idea is as follows: have a look at this page: http://www.icon-art.info/library.php?lng=ru - it has been indexed. You can check this by searching for any lowercase word. But if you try searching for any word that starts with a capital Cyrillic letter (e.g. family names of authors), the search fails.
Can you post the URL to your web site search function & details of what words you are searching for, so that we can see the problem.
-----
David