UTF-8 migration work / Link patterns (UTF-8이전작업/링크패턴)


1. Fixing the link patterns
1.1. Source changes
1.1.1. Changes to configuration variables
1.1.2. Added stylesheet entries
1.1.3. Changed patterns
1.2. References
1.3. Comments

1. Fixing the link patterns

Let's fix the various link patterns set in InitLinkPatterns so that they work with UTF-8.

1.1. Source changes

1.1.1. Changes to configuration variables

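Options that seemed unnecessary were removed. Judging from the diff in 1.1.3, these include $NonEnglish and $SimpleLinks, which InitLinkPatterns no longer checks.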

1.1.2. Added stylesheet entries

/* Wiki page links */
A.wikipagelink:link, A.wikipagelink:active, A.wikipagelink:visited, A.wikipagelink:hover {
    background: transparent;
    color: green;
    text-decoration: none;
}
A.wikipagelink:hover
{
    text-decoration: underline;
}

/* Wiki page edit links */
A.wikipageedit:link, A.wikipageedit:active, A.wikipageedit:visited, A.wikipageedit:hover {
    background: transparent;
    color: red;
    text-decoration: none;
}
A.wikipageedit:hover
{
    text-decoration: underline;
}

1.1.3. Changed patterns

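The remaining pattern changes are explained by the following diff of the InitLinkPatterns function. The meaning of the patterns themselves has hardly changed; the key point is that, following Oddmuse, the range of character code values was widened to \x80-\xff.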

 sub InitLinkPatterns {
    my ($UpperLetter, $LowerLetter, $AnyLetter, $LpA, $LpB, $QDelim);

    # Field separators are used in the URL-style patterns below.
 #  $FS  = "\xb3";      # The FS character is a superscript "3"
-   $FS  = "\x7f";
+   $FS  = "\x1e";      # by gypark. from oddmuse
    $FS1 = $FS . "1";   # The FS values are used to separate fields
    $FS2 = $FS . "2";   # in stored hashtables and other data structures.
    $FS3 = $FS . "3";   # The FS character is not allowed in user data.
 ###############
 ### added by gypark
    $FS_lt = $FS . "lt";
    $FS_gt = $FS . "gt";
 ###
 ###############
-
-   $UpperLetter = "[A-Z";
-   $LowerLetter = "[a-z";
-   $AnyLetter   = "[A-Za-z";
-   if ($NonEnglish) {
-       $UpperLetter .= "\xc0-\xde";
-       $LowerLetter .= "\xdf-\xff";
-       $AnyLetter   .= "\xc0-\xff";
-   }
-   if (!$SimpleLinks) {
-       $AnyLetter .= "_0-9";
-   }
-   $UpperLetter .= "]"; $LowerLetter .= "]"; $AnyLetter .= "]";
+   $UpperLetter = "[A-Z\xc0-\xde]";
+   $LowerLetter = "[a-z\xdf-\xff]";
+   $AnyLetter   = "[A-Za-z\x80-\xff_0-9]";

    # Main link pattern: lowercase between uppercase, then anything
    $LpA = $UpperLetter . "+" . $LowerLetter . "+" . $UpperLetter
                 . $AnyLetter . "*";
    # Optional subpage link pattern: uppercase, lowercase, then anything
    $LpB = $UpperLetter . "+" . $LowerLetter . "+" . $AnyLetter . "*";

    if ($UseSubpage) {
        # Loose pattern: If subpage is used, subpage may be simple name
        $LinkPattern = "((?:(?:$LpA)?\\/$LpB)|$LpA)";
        # Strict pattern: both sides must be the main LinkPattern
        # $LinkPattern = "((?:(?:$LpA)?\\/)?$LpA)";
    } else {
        $LinkPattern = "($LpA)";
    }
    $QDelim = '(?:"")?';     # Optional quote delimiter (not in output)
-###############
-### replaced by gypark
-### allow Hangul in anchors
-#  $AnchoredLinkPattern = $LinkPattern . '#(\\w+)' . $QDelim if $NamedAnchors;
-   $AnchoredLinkPattern = $LinkPattern . '#([0-9A-Za-z\xa0-\xff]+)' . $QDelim if $NamedAnchors;
-###
-###############
    $LinkPattern .= $QDelim;

    # Inter-site convention: sites must start with uppercase letter
    # (Uppercase letter avoids confusion with URLs)
    $InterSitePattern = $UpperLetter . $AnyLetter . "+";
    $InterLinkPattern = "((?:$InterSitePattern:[^\\]\\s\"<>$FS]+)$QDelim)";

+   # free link [[pagename]]
    if ($FreeLinks) {
        # Note: the - character must be first in $AnyLetter definition
-       #if ($NonEnglish) {
-           $AnyLetter = "[-,.()' _0-9A-Za-z\xa0-\xff]";
-       #} else {
-       #  $AnyLetter = "[-,.()' _0-9A-Za-z]";
-       #}
+       $AnyLetter = "[-,.()' _0-9A-Za-z\x80-\xff]";
    }
-   $FreeLinkPattern = "($AnyLetter+)";
    if ($UseSubpage) {
        $FreeLinkPattern = "((?:(?:$AnyLetter+)?\\/)?$AnyLetter+)";
+   } else {
+       $FreeLinkPattern = "($AnyLetter+)";
    }
    $FreeLinkPattern .= $QDelim;

-###############
-### added by gypark
-### allow anchors on pages with Hangul names
-### from Bab2's patch
-   $AnchoredFreeLinkPattern = $FreeLinkPattern . '#([0-9A-Za-z\xa0-\xff]+)' . $QDelim if $NamedAnchors;
-###
-###############
+   # anchored link
+   $AnchorPattern = '#([0-9A-Za-z\x80-\xff]+)';
+   $AnchoredLinkPattern = $LinkPattern . $AnchorPattern . $QDelim if $NamedAnchors;
+   $AnchoredFreeLinkPattern = $FreeLinkPattern . $AnchorPattern . $QDelim if $NamedAnchors;

    # Url-style links are delimited by one of:
    #   1.  Whitespace                           (kept in output)
    #   2.  Left or right angle-bracket (< or >) (kept in output)
    #   3.  Right square-bracket (])             (kept in output)
    #   4.  A single double-quote (")            (kept in output)
    #   5.  A $FS (field separator) character    (kept in output)
    #   6.  A double double-quote ("")           (removed from output)
-
-   $UrlProtocols = "http|https|ftp|afs|news|nntp|mid|cid|mailto|wais|mms|mmst|"
-                   . "prospero|telnet|gopher";
-   $UrlProtocols .= '|file'  if $NetworkFile;
+   $UrlProtocols = 'http|https|ftp|afs|news|nntp|mid|cid|mailto|wais|mms|mmst|prospero|telnet|gopher|irc';
+   $UrlProtocols .= '|file' if $NetworkFile;
    $UrlPattern = "((?:(?:$UrlProtocols):[^\\]\\s\"<>$FS]+)$QDelim)";
    $ImageExtensions = "(gif|jpg|png|bmp|jpeg|GIF|JPG|PNG|BMP|JPEG)";
    $RFCPattern = "RFC\\s?(\\d+)";
-###############
-### replaced by gypark
-### fix the ISBN pattern
-#  $ISBNPattern = "ISBN:?([0-9- xX]{10,})";
    $ISBNPattern = "ISBN:?([0-9-xX]{10,})";
-###
-###############
 }
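
The effect of widening the range from \xa0-\xff to \x80-\xff can be checked with a small standalone script. This is only a sketch, not part of the wiki source: the helper name all_bytes_in_class and the sample string are made up for illustration, and the two character classes are copied from the free link lines of the diff. It assumes page names reach the patterns as raw UTF-8 bytes. UTF-8 continuation bytes fall in \x80-\xbf, so a Hangul syllable can contain bytes below \xa0 that the old class rejects.

```perl
use strict;
use warnings;

# Free-link character classes from the diff: the old one starts the
# high-byte range at \xa0, the new one at \x80.
my $old_any = "[-,.()' _0-9A-Za-z\xa0-\xff]";
my $new_any = "[-,.()' _0-9A-Za-z\x80-\xff]";

# Hypothetical helper: returns 1 if every byte of $bytes is in $class.
sub all_bytes_in_class {
    my ($class, $bytes) = @_;
    return $bytes =~ /\A(?:$class)+\z/ ? 1 : 0;
}

# UTF-8 bytes of the two Hangul syllables U+D55C U+AE00,
# written as escapes so the source file encoding does not matter.
my $hangul = "\xed\x95\x9c\xea\xb8\x80";

print "old class: ", all_bytes_in_class($old_any, $hangul), "\n";  # 0
print "new class: ", all_bytes_in_class($new_any, $hangul), "\n";  # 1
```

The old class fails here because the bytes \x95, \x9c and \x80 are below \xa0; the new class accepts all six bytes.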

1.2. 참고 자료

The corresponding function in Oddmuse:

sub InitLinkPatterns {
  my ($UpperLetter, $LowerLetter, $AnyLetter, $WikiWord, $QDelim);
  $QDelim = '(?:"")?';# Optional quote delimiter (removed from the output)
  $WikiWord = '[A-Z]+[a-z\x80-\xff]+[A-Z][A-Za-z\x80-\xff]*';
  $LinkPattern = "($WikiWord)$QDelim";
  $FreeLinkPattern = "([-,.()' _0-9A-Za-z\x80-\xff]+)";
  # Intersites must start with uppercase letter to avoid confusion with URLs.
  $InterSitePattern = '[A-Z\x80-\xff]+[A-Za-z\x80-\xff]+';
  $InterLinkPattern = "($InterSitePattern:[-a-zA-Z0-9\x80-\xff_=!?#\$\@~`\%&*+\\/:;.,]*[-a-zA-Z0-9\x80-\xff_=#\$\@~`\%&*+\\/])$QDelim";
  $FreeInterLinkPattern = "($InterSitePattern:[-a-zA-Z0-9\x80-\xff_=!?#\$\@~`\%&*+\\/:;.,()' ]+)"; # plus space and other characters, and no restrictions on the end of the pattern
  $UrlProtocols = 'http|https|ftp|afs|news|nntp|mid|cid|mailto|wais|prospero|telnet|gopher|irc';
  $UrlProtocols .= '|file'  if $NetworkFile;
  my $UrlChars = '[-a-zA-Z0-9/@=+$_~*.,;:?!\'"()&#%]'; # see RFC 2396
  my $EndChars = '[-a-zA-Z0-9/@=+$_~*]'; # no punctuation at the end of the url.
  $UrlPattern = "((?:$UrlProtocols):$UrlChars+$EndChars)";
  $FullUrlPattern="((?:$UrlProtocols):$UrlChars+)"; # when used in square brackets
  $ImageExtensions = '(gif|jpg|png|bmp|jpeg)';
}
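
As a rough illustration of the "no punctuation at the end of the url" comment in the Oddmuse function, the sketch below copies its $UrlProtocols, $UrlChars and $EndChars definitions (leaving out the $NetworkFile branch) and applies the resulting pattern to a sentence. The helper first_url and the example.org address are assumptions made up for this example.

```perl
use strict;
use warnings;

# Definitions copied from the Oddmuse function quoted above
# (the $NetworkFile branch is omitted).
my $UrlProtocols = 'http|https|ftp|afs|news|nntp|mid|cid|mailto|wais|prospero|telnet|gopher|irc';
my $UrlChars = '[-a-zA-Z0-9/@=+$_~*.,;:?!\'"()&#%]';  # characters allowed inside a URL
my $EndChars = '[-a-zA-Z0-9/@=+$_~*]';                # characters allowed as the last one
my $UrlPattern = "((?:$UrlProtocols):$UrlChars+$EndChars)";

# Hypothetical helper: returns the first URL found in $text, or undef.
sub first_url {
    my ($text) = @_;
    return $text =~ /$UrlPattern/ ? $1 : undef;
}

# The sentence-final period is in $UrlChars but not in $EndChars,
# so the match backs off and leaves it out of the link.
print first_url('See http://example.org/page.'), "\n";  # http://example.org/page
```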

1.3. Comments

I was reading the source and couldn't figure this out... what does $QDelim do?

-- ㅈㅍ 2011-12-2 11:12 am

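Two double quotes in a row act as a delimiter that is not shown on screen. In "Google:조프위키를 보면" I put one between '키' and '를', so the link ends at 조프위키 and the particle 를 stays outside it.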
-- Raymundo 2011-12-2 11:46 pm

Ah, I see. Thank you. Come to think of it, other wikis all have this too, so why didn't it occur to me that it would be here...
-- 조프 2011-12-4 8:23 pm
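
The $QDelim behavior described in the reply above can be tried out with a simplified sketch. It is not the wiki's actual rendering code: the helper mark_links is made up for illustration, the WikiWord pattern is reduced to ASCII letters, and links are marked with square brackets instead of HTML. Only $QDelim and the way it is appended after the capture group follow the source.

```perl
use strict;
use warnings;

my $QDelim = '(?:"")?';   # optional quote delimiter, as in the source

# Simplified ASCII-only stand-in for $LpA (an assumption for this sketch).
my $LpA = '[A-Z]+[a-z]+[A-Z][A-Za-z_0-9]*';

# $QDelim sits outside the capture group, so it is matched and consumed
# but never becomes part of the page name in $1.
my $LinkPattern = "($LpA)" . $QDelim;

# Hypothetical helper: wraps each recognized page name in [ ].
sub mark_links {
    my ($text) = @_;
    $text =~ s/$LinkPattern/[$1]/g;
    return $text;
}

print mark_links('WikiWords are nice'), "\n";     # [WikiWords] are nice
print mark_links('WikiWord""s are nice'), "\n";   # [WikiWord]s are nice
```

Without the delimiter the trailing "s" is swallowed into the page name; with it, the link stops at WikiWord and the two quotes disappear from the output.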


Last edited: 2011-12-4 8:23 pm