
mlex.h File Reference

Markup Language EXpression.

Data Structures

struct  chunk_t
 the struct used to identify a piece of a string


Functions

list_t * mlmatch (char *data, char *exp, char *ret)
 Finds Markup Language chunks matching exp.

void mlmatch_print_results (list_t *res, char *str)
 debug function that prints the result matrix

void mlmatch_free_results (list_t *res)
 free the list of lists returned by mlmatch

char * mlmatch_get_result (int x, int y, list_t *res, char *s)
 gets a cell from the result matrix


Detailed Description

Markup Language EXpression.

Author:
Enrico Tassi   <sorry guy>

Definition in file mlex.h.


Function Documentation

list_t* mlmatch (char * data, char * exp, char * ret)

Finds Markup Language chunks matching exp.

What is an ml-expression?
Simply a regular expression with some extra information about markup.
Grammar:

  • MLEX := MREGEX | MTAGEX | MLEX MLEX | ''
  • MTAGEX := '{'REGEX'}' | '<'REGEX'>'
  • MREGEX := '['REGEX']' | REGEX
  • REGEX := regular expression
Example:
  • ".*<b>([0-9]*(Kb|Mb))</b>"
    This matches a generic size in bold.
  • ".*<(b|i)>([0-9]*(Kb|Mb))</(b|i)>"
    This matches a generic size in bold or italics. It does not check that the opening and closing tags agree, so it also accepts text that opens with a b and closes with a /i.
  • "a<b>[c]{d}e{f}[g]<h>"
    This matches abdefgh, abeh and the other strings obtained by treating the tags/strings between {} and [] as optional.
Limitation:
  • You can use regular expressions inside tags or outside tags, but a regular expression cannot span tags. For example, it is impossible to specify an arbitrary number of "<b>".
  • A non-optional string (an MREGEX) can't start with [, since that character is reserved for optional strings. Put the expression in round brackets to avoid this.
  • The parser is not really smart. It always alternates a string with a tag, so the expression "<a><b>" is interpreted as this sequence of tokens: "","<a>","","<b>".
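
The alternation described in the last limitation can be illustrated with a small sketch. This is not the code used by mlex: ml_tokenize and copy_span are hypothetical names, only '<' ... '>' tags are handled (the '{' ... '}' and '[' ... ']' forms are ignored), and a tag missing its closing '>' is assumed to run to the end of the string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of mlex.h: copy n bytes of s into a new C string. */
char *copy_span(const char *s, size_t n)
{
    char *out = malloc(n + 1);
    assert(out != NULL);
    memcpy(out, s, n);
    out[n] = '\0';
    return out;
}

/*
 * Split s into alternating text/tag tokens: even slots hold text
 * (possibly ""), odd slots hold tags including their delimiters.
 * Stores at most max tokens in out and returns the number stored.
 * The caller frees each token.
 */
size_t ml_tokenize(const char *s, char **out, size_t max)
{
    size_t n = 0;
    while (*s != '\0' && n + 2 <= max) {
        /* text token: everything up to the next '<' (may be empty) */
        size_t text_len = strcspn(s, "<");
        out[n++] = copy_span(s, text_len);
        s += text_len;
        if (*s == '\0')
            break;  /* trailing text with no tag after it */
        /* tag token: from '<' through the next '>' */
        const char *end = strchr(s, '>');
        size_t tag_len = end ? (size_t)(end - s) + 1 : strlen(s);
        out[n++] = copy_span(s, tag_len);
        s += tag_len;
    }
    return n;
}
```

Called on "<a><b>" with room for at least four tokens, this sketch stores "", "<a>", "", "<b>", which is the sequence given above.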

What is an ml-get-expression?
It is the counterpart of an ml-expression. It selects which parts of the match are important and which are not.
Grammar:
  • MLGEX := REGGEX TAGGEX | MLGEX MLGEX | ''
  • TAGGEX := '<'EX'>' | '{'EX'}'
  • REGGEX := EX | '['EX']'
  • EX := 'X' | 'O'
Example:
  • If the ml-expression is ".*<b>.*<.*img.*src.*>.*</b>"
    and the ml-get-expression is "O<O>O<X>X<O>"
    and data is "<tt><b><img src="nice.jpg">hello</b>"
    mlmatch returns a list of length 2 (that is, the number of "X"): the first item identifies "img src="nice.jpg"" and the second identifies "hello".
Remember that if an optional string/tag is used in the ml-expression, the corresponding optional string/tag signature must be used in the ml-get-expression.

A short explanation of how the engine works (considering the previous example):
  1. Tokenize the strings:
    • "<tt><b><img src="nice.jpg">hello</b>" becomes "","<tt>","","<b>","","<img src="nice.jpg">","hello","</b>"
    • ".*<b>.*<.*img.*src.*>.*</b>" becomes ".*","<b>",".*","<.*img.*src.*>",".*","</b>"
    • "O<O>O<X>X<O>" becomes "O","<O>","O","<X>","X","<O>"
  2. The ml-expression matches the data perfectly starting from the third token, since each regexp matches the corresponding token, so we obtain this sub-list of tokens: "","<b>","","<img src="nice.jpg">","hello","</b>"
  3. The sub-list has the same length as the ret expression; selecting only the tokens with a corresponding X, we obtain {"img src="nice.jpg"","hello"}.
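
Steps 2 and 3 can be sketched on already-tokenized input. The code below is only an illustration under several assumptions, not the mlex engine: token_matches, find_window and select_marked are hypothetical names, patterns are treated as POSIX extended regular expressions via <regex.h>, the window is assumed to advance two tokens at a time to keep strings aligned with strings and tags with tags, only the first matching window is reported, and optional [] and {} tokens are not handled. Unlike the documented result, selected tags keep their angle brackets.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the whole token matches the POSIX extended regex pat. */
int token_matches(const char *pat, const char *token)
{
    char anchored[256];
    regex_t re;
    int ok;
    /* Anchor the pattern so it has to cover the entire token. */
    int len = snprintf(anchored, sizeof anchored, "^(%s)$", pat);
    assert(len > 0 && (size_t)len < sizeof anchored);
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    ok = (regexec(&re, token, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}

/*
 * Slide the nexp expression tokens over the ndata data tokens.
 * Returns the index of the first window in which every expression
 * token matches the corresponding data token, or -1 if there is none.
 */
int find_window(const char **data, int ndata, const char **exp, int nexp)
{
    for (int start = 0; start + nexp <= ndata; start += 2) {
        int i = 0;
        while (i < nexp && token_matches(exp[i], data[start + i]))
            i++;
        if (i == nexp)
            return start;
    }
    return -1;
}

/*
 * Store in out the data tokens whose get-expression slot contains 'X'.
 * Returns how many tokens were selected.
 */
int select_marked(const char **data, int start,
                  const char **ret, int nret, const char **out)
{
    int n = 0;
    for (int i = 0; i < nret; i++)
        if (strchr(ret[i], 'X') != NULL)
            out[n++] = data[start + i];
    return n;
}
```

With the three token lists from step 1, find_window returns index 2 (the third token) and select_marked picks the tag token and "hello".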
Notes:
  • data, exp and ret MUST be modifiable. They end up unaltered, but they may be written to during processing.

Parameters:
data is a Markup Language file like an html page (must be modifiable)
exp is the ml-expression (must be modifiable)
ret is the ml-get-expression (must be modifiable)
Returns:
a list of lists of chunk_t

void mlmatch_free_results (list_t * res)

free the list of lists returned by mlmatch

char* mlmatch_get_result (int x, int y, list_t * res, char * s)

gets a cell from the result matrix

mlmatch returns a list of lists, which can be read as a matrix: each row is the list of X fields of one match. Example:

  • src := "<b>hello</b> bad <i>guys</b>"
    exp := "<.*>.*</b>"
    ret := "<X>X<O>"
    calling
    rc = mlmatch(src,exp,ret);
    will return
    {{"b","hello"},
     {"i","guys"} }
    and the respective coordinates run from 0,0 to 1,1. For example "hello" is at 1,0. The returned pointer must be freed by the caller.

Parameters:
x column
y row
res returned by mlmatch
s the src string
Returns:
a strdup of s chunked in the right position
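
The idea of returning a copy of s "chunked in the right position" can be sketched without the library. Everything here is an assumption made for illustration: the fields of chunk_t are not documented on this page, so struct span (an offset plus a length) is a hypothetical stand-in, the matrix is a plain two-dimensional array rather than a list_t, and get_cell is not the real mlmatch_get_result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for chunk_t: a slice of the source string. */
struct span {
    size_t start;  /* offset of the first byte in the source */
    size_t len;    /* number of bytes in the slice */
};

#define COLS 2  /* one column per 'X' in the ml-get-expression */

/*
 * Return a freshly allocated copy of the cell at column x, row y,
 * or NULL if allocation fails. The caller must free() the result.
 */
char *get_cell(int x, int y, const struct span m[][COLS], const char *s)
{
    assert(x >= 0 && x < COLS && y >= 0);
    const struct span *c = &m[y][x];
    char *out = malloc(c->len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s + c->start, c->len);
    out[c->len] = '\0';
    return out;
}
```

For the src string of the example, a matrix holding the slices (1,1), (3,5) in row 0 and (18,1), (20,4) in row 1 makes get_cell(1, 0, ...) return a copy of "hello".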

void mlmatch_print_results (list_t * res, char * str)

debug function that prints the result matrix


Generated on Wed May 5 15:48:04 2004 for LiberoPOPs by doxygen 1.3.6-20040222