<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    On 09.04.2015 at 17:32, Sah War wrote:<br>
    <blockquote
cite="mid:CAEps0eR36vi6bjObfx-huY_SMAtRXZnXLTpajOxj5OAbR687Cw@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div>
                <div>
                  <div>
                    <div>@Radostin Radnev:<br>
                      <br>
                    </div>
                    Thank you very much! :)<br>
                    <br>
                  </div>
                  @Stoyan Dimitrov:<br>
                  <br>
                </div>
                We have already switched to the informal “you”, so no worries. :)<br>
                <br>
                Yes, it is worth a try, though I rather doubt that
                SourceForge would let us keep a repository with more than 5 GB
                of data. But who knows. :D<br>
                <br>
                “The database is UTF-8 and the files are cp1251, which in
                itself cuts the size almost in half.”<br>
                <br>
              </div>
              That is very strange; I expected all the data to be in
              UTF-8. Borislav Manolov probably did not change the
              encoding of the IDI word-form database (more precisely,
              of the old version of it that he used), most likely
              to avoid inflating the file sizes unnecessarily.<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    I re-encoded the files before posting them here, to make them easier
    to compare with the BG Office text database. In the Chitanka database
    everything is UTF-8.<br>
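    For illustration only, a minimal Python sketch of why the cp1251
    files come out at roughly half the size: each Cyrillic letter takes
    two bytes in UTF-8 but a single byte in cp1251. The sample words are
    made up for the example and are not taken from either database.<br>

```python
# Compare the encoded size of a short Bulgarian word list in both encodings.
# The word list is a hypothetical sample, not data from the project.
words = ["дума", "речник", "правопис", "словоформа"]
text = "\n".join(words) + "\n"

utf8_bytes = text.encode("utf-8")      # 2 bytes per Cyrillic letter
cp1251_bytes = text.encode("cp1251")   # 1 byte per Cyrillic letter

ratio = len(cp1251_bytes) / len(utf8_bytes)
print(f"UTF-8: {len(utf8_bytes)} bytes, cp1251: {len(cp1251_bytes)} bytes")
print(f"cp1251 is {ratio:.0%} of the UTF-8 size")
```

    Line breaks, spaces and digits take one byte in both encodings,
    which is why the saving is “almost” half rather than exactly half.<br>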
    <blockquote
cite="mid:CAEps0eR36vi6bjObfx-huY_SMAtRXZnXLTpajOxj5OAbR687Cw@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>
          <div>
            <div><br>
            </div>
            CP1251 does the job, although it would be better for everything
            to be in UTF-8; that, however, always comes at the cost of
            larger files. Besides, most of the BG Office files are in
            CP1251 too, if I am not mistaken, so this is probably not a
            problem.<br>
          </div>
        </div>
      </div>
    </blockquote>
    Everything is cp1251. To be honest, that is not very convenient,
    given that every environment (shell) defaults to UTF-8, but it is
    not fatal.<br>
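    When a cp1251 file has to be read in a UTF-8 environment, it can be
    re-encoded into a separate copy. A sketch in Python, not the
    procedure actually used for the files in this thread; the function
    name <code>recode</code> and its parameters are hypothetical.<br>

```python
from pathlib import Path


def recode(src, dst, src_enc="cp1251", dst_enc="utf-8"):
    """Write a copy of the text file src to dst in another encoding.

    Works on raw bytes so that line endings are left untouched.
    """
    data = Path(src).read_bytes()
    Path(dst).write_bytes(data.decode(src_enc).encode(dst_enc))
```

    Swapping the two encoding arguments converts in the opposite
    direction, which only succeeds if every character in the file
    exists in cp1251.<br>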
    <blockquote
cite="mid:CAEps0eR36vi6bjObfx-huY_SMAtRXZnXLTpajOxj5OAbR687Cw@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>
          <div><br>
          </div>
          <div>Still, I think it would be unrealistic and inefficient
            to use a database of over 50 MB just for spell checking.
            Having two branches of the word database seems best to me:
            the one without all the word forms stays the standard one
            (as it is now), and the other is for those who want the
            fullest possible spell-checking support (for example
            writers, bloggers and other people who write a lot, though
            not program code).<br>
          </div>
        </div>
      </div>
    </blockquote>
    Size is hardly a problem for anyone if the project keeps expanding
    and improving, i.e. if the growth has a tangible effect on the
    quality of the final product.<br>
    <blockquote
cite="mid:CAEps0eR36vi6bjObfx-huY_SMAtRXZnXLTpajOxj5OAbR687Cw@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>
          <div><br>
          </div>
          <div>P.S. I am waiting for your opinions on the proposal to move
            to GitHub/GitLab, or for votes and arguments in favour of
            staying with the SVN on SourceForge (there is also the option
            of using git on SourceForge, as I already noted). Still, it
            is probably not a good idea to fragment the project again by
            maintaining both the SVN on SourceForge and
            GitHub/GitLab/git-SourceForge at the same time
            (synchronising the two repositories would probably not be
            easy in that case, though I am no expert in synchronising
            repositories that use different version control
            systems).<br>
          </div>
        </div>
      </div>
    </blockquote>
    It is not impossible, but someone still has to do it, and time is
    merciless to such undertakings: sooner or later things will fall out
    of sync (&lt;- the word „синхрон“ is missing from the dictionary). The
    SourceForge repository is used by other people, so even if a
    migration to Git-something is planned, it will not happen right away.
    They have to be notified, and there may be some resistance as well.<br>
    I am not against migrating, but in this case I take the more
    conservative view. For me, the choice between Git-something and the
    old repository comes down to one question: what good will it do the
    project? If it is only for the nicer interface, it is not worth it.
    Let's get things moving and let whoever is willing start adding
    strings; once the shortcomings of SVN come to light, we can think
    about how to fix them.<br>
    <blockquote
cite="mid:CAEps0eR36vi6bjObfx-huY_SMAtRXZnXLTpajOxj5OAbR687Cw@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>
          <div><br>
          </div>
          Regards,<br>
        </div>
        Sah War (sahwar)<br>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On 9 April 2015 at 16:52, Radostin
          Radnev <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:radnev@gmail.com" target="_blank">radnev@gmail.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">Hello,
              <div><br>
              </div>
              <div>You have been added to the project on SourceForge.</div>
              <div><br>
              </div>
              <div>Regards,</div>
              <div><br>
              </div>
            </div>
            <div class="HOEnZb">
              <div class="h5">
                <div class="gmail_extra"><br>
                  <div class="gmail_quote">2015-04-09 15:10 GMT+03:00
                    Stoyan Dimitrov <span dir="ltr">&lt;<a
                        moz-do-not-send="true"
                        href="mailto:stoyan@gmx.com" target="_blank">stoyan@gmx.com</a>&gt;</span>:<br>
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <div bgcolor="#FFFFFF" text="#000000"> <font
                          face="Fira Sans">    70 MB is only the archive.
                          The database itself is even larger. What
                          I have not sent is one huge table
                          (~</font><font face="Fira Sans"><span>4
                            million</span> rows) named “derivative_form</font>”,
                        which I assume is the “expanded” list of words.
                        I am not sure whether I can convert it to the
                        same structure, but if you insist I can
                        try (hm, I have switched to the informal “you”).
                        There is another factor as well: encoding. The
                        database is UTF-8 and the files are cp1251, which
                        in itself cuts the size almost in half.
                        <div>
                          <div><br>
                            <br>
                            <div>On 09.04.2015 at 14:23, Sah War
                              wrote:<br>
                            </div>
                            <blockquote type="cite">
                              <div dir="ltr">
                                <div>
                                  <div>
                                    <div>
                                      <div>
                                        <div>
                                          <div>@Radostin Radnev<br>
                                            <br>
                                          </div>
                                          <div>For now, I would be glad
                                            if you gave me commit rights
                                            as well to the SVN
                                            repository of the BG Office
                                            project on SourceForge. My
                                            SourceForge username is
                                            sahwar (<a
                                              moz-do-not-send="true"
                                              href="http://sourceforge.net/u/sahwar/profile/"
                                              target="_blank">http://sourceforge.net/u/sahwar/profile/</a>).<br>
                                            <br>
                                          </div>
                                          <div>I propose that we move
                                            everything to GitHub,
                                            because I find git more
                                            pleasant to use and the
                                            GitHub interface is very
                                            nice. Instructions for doing
                                            this are on the following
                                            pages:<br>
                                            <br>
                                            <a moz-do-not-send="true"
href="http://www.17od.com/2010/11/11/migrating-a-sourceforge-subversion-repository-to-github/"
                                              target="_blank">http://www.17od.com/2010/11/11/migrating-a-sourceforge-subversion-repository-to-github/</a><br>
                                            <a moz-do-not-send="true"
                                              href="https://twitter.com/ve4ernik/status/584102649114529792"
                                              target="_blank">https://twitter.com/ve4ernik/status/584102649114529792</a><br>
                                            <br>
                                          </div>
                                          <div>If you like, we can do
                                            the main work on GitHub and
                                            only periodically
                                            synchronise the GitHub
                                            version with the SourceForge
                                            one (that is, copy the new
                                            material from GitHub to
                                            SourceForge, with the SVN on
                                            SF locked against changes in
                                            principle and only the
                                            administrator adding new
                                            material by copying it from
                                            GitHub). Of course, if you
                                            insist on using SVN, I will
                                            put up with it. But there is
                                            also the option of
                                            converting the SVN to git,
                                            still hosted on SourceForge,
                                            cloning that repository to
                                            GitHub (and working in GH),
                                            and merging changes back
                                            into the git repository on
                                            SourceForge (personally I am
                                            in favour of this option).
                                            There are many
                                            options...<br>
                                          </div>
                                          <div><br>
                                          </div>
                                          @Mihail Balabanov<br>
                                          <br>
                                          <blockquote style="margin:0px
                                            0px 0px
                                            0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex" class="gmail_quote">Otherwise I
                                            join the recommendation that
                                            the project's source data
                                            stay in the format “base
                                            forms + inflection rules”
                                            and not be turned into a
                                            “flat” list of expanded word
                                            forms. That way the volume
                                            of data is much easier for a
                                            person to survey, the
                                            database is easier to extend
                                            and errors are easier to
                                            fix, and the list of word
                                            forms can in any case be
                                            generated at any time in
                                            whatever format we want.<br>
                                          </blockquote>
                                          <br>
                                        </div>
                                        <div>In principle that is so,
                                          but word forms in Bulgarian do
                                          not always follow the most
                                          common inflection pattern,
                                          which is why automatic
                                          generation of word forms
                                          simply cannot be perfectly
                                          accurate and there will always
                                          be errors, even if minor
                                          ones...<br>
                                          <br>
                                        </div>
                                        <div>If you agree to switch to
                                          GitHub, we can keep two copies
                                          of the data there: master (the
                                          main copy, on which work is
                                          done) and full-wordforms (a
                                          “flat” list of expanded word
                                          forms), with the second
                                          following the development of
                                          the first and the changes in
                                          it.<br>
                                        </div>
                                        <div><br>
                                        </div>
                                        @Stoyan Dimitrov<br>
                                        <br>
                                      </div>
                                      Very good work, well done. But I
                                      have one question. The SQL
                                      database file of the dictionary at
                                      <a moz-do-not-send="true"
                                        href="http://chitanka.info"
                                        target="_blank">chitanka.info</a>
                                      is about 70 MB, while the file you
                                      sent us is only 528 KB. Are you
                                      sure it contains all the data from
                                      the SQL file? The reduction in
                                      size seems too large to me, even
                                      for a database converted to text
                                      form.<br>
                                      <br>
                                    </div>
                                    P.S. If for some reason you like
                                    git but dislike GitHub because its
                                    code is not released, we could
                                    consider a GitLab installation on
                                    someone's server (for example on the
                                    <a moz-do-not-send="true"
                                      href="http://ludost.net"
                                      target="_blank">ludost.net</a> one, or
                                    we could ask Borislav Manolov of <a
                                      moz-do-not-send="true"
                                      href="http://chitanka.info"
                                      target="_blank">chitanka.info</a>
                                    to let us use his GitLab
                                    installation?).<br>
                                    <br>
                                  </div>
                                  <div>I want to publish new files for
                                    the dictionaries part of BGOffice
                                    soon, so I need SVN rights until we
                                    decide whether to keep using it or
                                    move to GitHub/GitLab. :)<br>
                                  </div>
                                  <div><br>
                                  </div>
                                  Regards,<br>
                                </div>
                                Sah War (sahwar)<br>
                              </div>
                              <div class="gmail_extra"><br>
                                <div class="gmail_quote">On 3 April 2015
                                  at 23:18, Stoyan Dimitrov <span
                                    dir="ltr">&lt;<a
                                      moz-do-not-send="true"
                                      href="mailto:stoyan@gmx.com"
                                      target="_blank">stoyan@gmx.com</a>&gt;</span>
                                  wrote:<br>
                                  <blockquote class="gmail_quote"
                                    style="margin:0 0 0
                                    .8ex;border-left:1px #ccc
                                    solid;padding-left:1ex">
                                    <div bgcolor="#FFFFFF"
                                      text="#000000"> <font face="Fira
                                        Sans">    Hello,<br>
                                            Here are the first words added</font><font
                                        face="Fira Sans"><font
                                          face="Fira Sans"> by me</font>
                                        [1]. They really are only there
                                        so that I can get a feel for the
                                        process.<br>
                                        ___<br>
                                        [1] <a moz-do-not-send="true"
                                          href="http://sourceforge.net/p/bgoffice/code/479/"
                                          target="_blank">http://sourceforge.net/p/bgoffice/code/479/</a><br>
                                      </font>
                                      <div>
                                        <div><br>
                                          <div>On 29.03.2015 at 12:15,
                                            Stoyan Dimitrov wrote:<br>
                                          </div>
                                        </div>
                                      </div>
                                      <blockquote type="cite">
                                        <div>
                                          <div>    Hello, <br>
                                                I would like to know
                                            whether any of you knows if
                                            work is currently under way
                                            to update the spell-checking
                                            module in BG Office [1], or
                                            rather its word list
                                            (probably called the
                                            “dictionary”). After a
                                            preliminary look at the
                                            database [2] of the
                                            dictionary [3], I think the
                                            list of words included in BG
                                            Office can be brought up to
                                            date and kept in shape
                                            fairly easily. As an added
                                            bonus, the update process
                                            can be automated. I am not
                                            entirely sure, but probably
                                            all modules (e.g.
                                            hyphenation) and all
                                            products (OpenOffice,
                                            Mozilla) will benefit from
                                            this. <br>
                                                I have started work on
                                            the .aff file, which is to
                                            serve as a template for
                                            generating an extended
                                            spell-checking dictionary,
                                            so my aim is to avoid
                                            stepping on anyone's toes. <br>
                                            <br>
                                            P.S. <br>
                                            The analysis of the source
                                            material that was done in
                                            order to build [3] is a
                                            great foundation to build
                                            on, and I do not know how it
                                            has gone unnoticed until
                                            now. Congratulations to the
                                            author! <br>
                                            <br>
                                            __ <br>
                                            [1] - <a
                                              moz-do-not-send="true"
                                              href="http://bgoffice.sf.net"
                                              target="_blank">http://bgoffice.sf.net</a>
                                            <br>
                                            [2] - <a
                                              moz-do-not-send="true"
                                              href="http://rechnik.chitanka.info/db.sql.gz"
                                              target="_blank">http://rechnik.chitanka.info/db.sql.gz</a>
                                            <br>
                                            [3] - <a
                                              moz-do-not-send="true"
                                              href="http://rechnik.chitanka.info"
                                              target="_blank">http://rechnik.chitanka.info</a>
                                            <br>
                                            <br>
                                            <br>
                                            <fieldset></fieldset>
                                            <br>
                                          </div>
                                        </div>
                                        <span>
                                          <pre>_______________________________________________
Dict mailing list
<a moz-do-not-send="true" href="mailto:Dict@ludost.net" target="_blank">Dict@ludost.net</a>
<a moz-do-not-send="true" href="http://lists.ludost.net/cgi-bin/mailman/listinfo/dict" target="_blank">http://lists.ludost.net/cgi-bin/mailman/listinfo/dict</a>
</pre>
                                        </span></blockquote>
                                      <span><font color="#888888"> <br>
                                          <pre cols="72">-- 
С</pre>
                                        </font></span></div>
                                    <br>
                                    <br>
                                  </blockquote>
                                </div>
                                <br>
                              </div>
                              <br>
                              <fieldset></fieldset>
                              <br>
                            </blockquote>
                            <br>
                            <pre cols="72">-- 
С</pre>
                          </div>
                        </div>
                      </div>
                      <br>
                      <br>
                    </blockquote>
                  </div>
                  <br>
                </div>
              </div>
            </div>
            <br>
            <br>
          </blockquote>
        </div>
        <br>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
С</pre>
  </body>
</html>