Assignment 4
The goal of this assignment is to create a C program which:
-takes a text file as input
-tokenizes the given input
-parses the tokenized input to determine if it is grammatically valid
This program will consist of four source files: Givens.c, Tokenizer.c, Parser.c, and Analyzer.c. Each source file
has a corresponding header file where all constants, function declarations, struct definitions, include statements,
and global variables will be placed. (Header files are named the same as their corresponding source file but with a
.h file extension rather than a .c file extension.) Tokenizer.h and Parser.h will include Givens.h, and Analyzer.h
will include both Tokenizer.h and Parser.h.
Givens.c
Givens.c is provided with this assignment. Givens.c includes constants for TRUE and FALSE, an enum
containing all token values in the given lexical structure, a constant for the max size of a lexeme, and the
definition for a struct named lexics, which consists of an enum token property named token and a character
array property named lexeme. The lexics struct is used to store both a token and its corresponding lexeme.
Givens.c also provides two functions which return a boolean value indicating whether the given string matches a
specified regular expression.
Tokenizer.c
Tokenizer.c is not provided and needs to be created. Tokenizer.c will read characters from a given FILE variable
and convert them into tokens. It will do so using a function defined as follows:
_Bool tokenizer(struct lexics *aLex, int *numLex, FILE *inf);
This function takes an array of type lexics, an int pointer representing the number of tokens in the input file, and a
pointer to a FILE. The tokenizer function will read characters from the given FILE parameter, creating lexemes
and the associated tokens. Each time a lexeme is generated, a new lexics struct will be created and the lexeme
added. The generated lexeme is then tokenized, the token is added to the generated lexics struct, and the lexics
struct is added to the end of the given lexics array. (Note: another option is to generate all lexemes first, then
tokenize the generated lexemes.)
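As an illustration of the "add to the end of the given lexics array" step, here is a minimal sketch. The struct and
enum below are stand-ins written from the description of Givens.c above, not copies of the provided file, and the
names LEXEME_MAX_SKETCH and appendLexics are hypothetical. A real Tokenizer.c would include Givens.h instead of
redefining these types.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for what Givens.h is described as providing. The real names and
   sizes may differ; LEXEME_MAX_SKETCH in particular is hypothetical. */
#define LEXEME_MAX_SKETCH 256
enum token { IDENTIFIER, NUMBER /* remaining token values omitted */ };
struct lexics {
    enum token token;
    char lexeme[LEXEME_MAX_SKETCH];
};

/* Hypothetical helper: store one finished lexeme and its token in the next
   free slot of the array, then increment the count.
   Returns 0 if the lexeme does not fit, 1 otherwise. */
_Bool appendLexics(struct lexics *aLex, int *numLex,
                   const char *lexeme, enum token tok) {
    assert(aLex != NULL && numLex != NULL && lexeme != NULL);
    if (strlen(lexeme) >= LEXEME_MAX_SKETCH) {
        return 0;
    }
    strcpy(aLex[*numLex].lexeme, lexeme);
    aLex[*numLex].token = tok;
    (*numLex)++;
    return 1;
}
```

Because the count is passed by pointer, the caller (Analyzer.c in this assignment) sees the updated number of lexics
structs after tokenizer returns. The sketch does not check the capacity of the array; that limit depends on how
Analyzer.c declares it.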
The given lexical structure is free format and the location of tokens in the text file does not affect their meaning.
Alphanumeric lexemes will be delimited by both whitespace and by character lexemes. Because character
lexemes are used as delimiters, they cannot be constructed one character at a time. Rather, the next several
characters in the file will need to be examined to determine which (if any) character lexeme is present. (HINT:
Because both whitespace and character lexemes can be delimiters, split functions such as strtok do not provide the
needed functionality and should be avoided.)
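One way to examine the upcoming characters without strtok is to read with fgetc and push a character back with
ungetc when it belongs to the next lexeme. The sketch below is only an illustration under that assumption:
nextLexeme and LEXEME_MAX_SKETCH are hypothetical names, and the set of character lexemes is taken from the
provided lexical structure.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define LEXEME_MAX_SKETCH 256   /* hypothetical; use the constant from Givens.h */

/* Hypothetical helper: read the next lexeme from inf into out.
   Returns 1 on success, 0 at end of file, and -1 on an unrecognized
   character or a lexeme that is too long. */
int nextLexeme(FILE *inf, char out[LEXEME_MAX_SKETCH]) {
    assert(inf != NULL && out != NULL);
    int c = fgetc(inf);
    while (c != EOF && isspace(c)) {    /* whitespace only separates lexemes */
        c = fgetc(inf);
    }
    if (c == EOF) {
        return 0;
    }
    int len = 0;
    if (isalnum(c)) {
        /* Alphanumeric run: ends at whitespace or at any character lexeme. */
        while (c != EOF && isalnum(c)) {
            if (len >= LEXEME_MAX_SKETCH - 1) {
                return -1;
            }
            out[len++] = (char)c;
            c = fgetc(inf);
        }
        if (c != EOF) {
            ungetc(c, inf);             /* the delimiter starts the next lexeme */
        }
    } else if (c == '!' || c == '=') {
        /* Look one character ahead to tell "=" from "==" and to complete "!=". */
        int next = fgetc(inf);
        out[len++] = (char)c;
        if (next == '=') {
            out[len++] = '=';
        } else {
            if (next != EOF) {
                ungetc(next, inf);
            }
            if (c == '!') {
                return -1;              /* a lone '!' is not in the lexical structure */
            }
        }
    } else {
        switch (c) {
        case '(': case ')': case '{': case '}':
        case ',': case ';': case '+': case '*': case '%':
            out[len++] = (char)c;       /* single-character lexemes */
            break;
        default:
            return -1;
        }
    }
    out[len] = '\0';
    return 1;
}
```

Calling nextLexeme repeatedly on the input while(x1!=10) would yield the lexemes while, (, x1, !=, 10, and ) in that
order, even though no whitespace separates them.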
The use of helper functions in Tokenizer.c is highly recommended. Once the tokenization process is
complete, the tokenizer function should return TRUE. If an error occurs during the process, the function should
return FALSE.
Parser.c
Parser.c is not provided and needs to be created. Parser.c will implement a recursive descent parser based upon a
provided EBNF grammar. It will do so using a function defined as follows:
_Bool parser(struct lexics *someLexics, int numberOfLexics);
This function takes an array of type lexics and an int representing the number of tokens in the given lexics array.
The parser function must take the tokens (given in the array of lexics structs) and determine if they are legal in
the language defined by the given grammar. The purpose of our parser is to apply the grammar rules and report any
syntax errors. If no syntax errors are identified, parser returns TRUE; otherwise it returns FALSE.
Parser.c must be a recursive descent predictive parser which utilizes single-symbol lookahead. Parsers which
utilize multi-symbol lookahead will not be accepted. If given a grammatically valid input, every token given
must be parsed. If a syntax error is found, parsing does not need to continue. Parsers which do not consume
every given token for a grammatically valid input will not be accepted.
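To make the single-symbol lookahead requirement concrete, the following sketch parses only the return, expression,
and term rules of the provided grammar. It is not a full solution. The struct and enum are stand-ins for Givens.h,
and the names toks, count, pos, peekIs, match, and parseReturnOnly are hypothetical. Each grammar rule becomes one
function, and every decision is made by inspecting only the current token.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for Givens.h; only the tokens used by this sketch are listed. */
enum token { RETURN_KEYWORD, EOL, IDENTIFIER, NUMBER, BINOP,
             LEFT_PARENTHESIS, RIGHT_PARENTHESIS };
struct lexics { enum token token; char lexeme[256]; };

/* Parser state (hypothetical names): token array, its length, and a cursor. */
static struct lexics *toks;
static int count;
static int pos;

/* Single-symbol lookahead: inspect the current token only. */
static _Bool peekIs(enum token t) {
    return pos < count && toks[pos].token == t;
}

/* Consume the current token if it is the expected one. */
static _Bool match(enum token t) {
    if (peekIs(t)) {
        pos++;
        return 1;
    }
    return 0;
}

static _Bool expression(void);

/* term -> IDENTIFIER | NUMBER */
static _Bool term(void) {
    return match(IDENTIFIER) || match(NUMBER);
}

/* expression -> term {BINOP term} | LEFT_PARENTHESIS expression RIGHT_PARENTHESIS */
static _Bool expression(void) {
    if (match(LEFT_PARENTHESIS)) {
        return expression() && match(RIGHT_PARENTHESIS);
    }
    if (!term()) {
        return 0;
    }
    while (match(BINOP)) {          /* zero or more repetitions of BINOP term */
        if (!term()) {
            return 0;
        }
    }
    return 1;
}

/* return -> RETURN_KEYWORD expression EOL */
static _Bool returnStatement(void) {
    return match(RETURN_KEYWORD) && expression() && match(EOL);
}

/* Entry point of the sketch: valid only if the rule matches AND every
   token was consumed. */
_Bool parseReturnOnly(struct lexics *someLexics, int numberOfLexics) {
    assert(someLexics != NULL);
    toks = someLexics;
    count = numberOfLexics;
    pos = 0;
    return returnStatement() && pos == count;
}
```

The final pos == count check is what enforces the rule that every given token must be consumed for a valid input.
A complete parser would start from the function rule rather than from return.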
Analyzer.c
Analyzer.c is provided with this assignment. Analyzer.c includes a method to prompt the user for a file path, the
initialization of an array of type lexics and an int containing the number of lexics structs in the array (initialized
to 0). Analyzer.c calls both the tokenizer function and the parser function, passing the initialized int and
array to both functions.
All four source files need to be compiled into a single executable file. When run, this executable file is expected
to call the int main function defined in Analyzer.c, with the tokenization and parsing functions being called from
Analyzer.c's int main. All programs will be graded with Gradescope, where you will submit Parser.c,
Tokenizer.c, and their corresponding header files. The -std=c99 flag may be used when testing your code, as I
will be compiling with said flag when I grade your projects.
Final Notes
A set of text files will be provided for testing. There will be nine files which have syntactically valid inputs and
nine which have syntactically invalid inputs. All tests are lexically valid. These 18 files are identical to
the input data on Gradescope. If your program can successfully parse all 18 files, it should pass all tests on
Gradescope. Gradescope error messages will print a ^ character as a substitute for a \n character.
No manual memory management is needed for this project. All memory can be statically allocated and I
recommend using static memory allocation as it reduces complexity.
There is no recommended development environment for this assignment; you are welcome to use whatever
application you are most comfortable with. However, you must develop this program on either a Linux
distribution or the macOS operating system, both because Givens.c does not compile on Windows and to
ensure compatibility with Gradescope. If you are on a Windows machine, you have two options. Option one is
to use the Windows Subsystem for Linux and install Ubuntu. Option two is to use the compile.vcu.edu server.
Instructions for both options will be posted to Canvas.
Provided EBNF grammar:
function        –> header body
header          –> VARTYPE IDENTIFIER LEFT_PARENTHESIS [arg-decl] RIGHT_PARENTHESIS
arg-decl        –> VARTYPE IDENTIFIER {COMMA VARTYPE IDENTIFIER}
body            –> LEFT_BRACKET [statement-list] RIGHT_BRACKET
statement-list  –> statement {statement}
statement       –> while-loop | return | assignment | body
while-loop      –> WHILE_KEYWORD LEFT_PARENTHESIS expression RIGHT_PARENTHESIS statement
return          –> RETURN_KEYWORD expression EOL
assignment      –> IDENTIFIER EQUAL expression EOL
expression      –> term {BINOP term} | LEFT_PARENTHESIS expression RIGHT_PARENTHESIS
term            –> IDENTIFIER | NUMBER
Provided lexical structure:
LEFT_PARENTHESIS   –> (
RIGHT_PARENTHESIS  –> )
LEFT_BRACKET       –> {
RIGHT_BRACKET      –> }
WHILE_KEYWORD      –> while
RETURN_KEYWORD     –> return
EQUAL              –> =
COMMA              –> ,
EOL                –> ;
VARTYPE            –> int | void
IDENTIFIER         –> [a-zA-Z][a-zA-Z0-9]*
BINOP              –> + | * | != | == | %
NUMBER             –> [0-9][0-9]*
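The lexical structure above can be turned into a lexeme-to-token lookup along the following lines. This is a sketch
under stated assumptions: the enum is a stand-in whose names are copied from the lexical structure (the enum in
Givens.c may be declared differently), classifyLexeme is a hypothetical helper, and the IDENTIFIER and NUMBER
patterns are checked by hand here rather than with the regular-expression functions that Givens.c provides.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the token enum in Givens.h; names follow the lexical structure. */
enum token { LEFT_PARENTHESIS, RIGHT_PARENTHESIS, LEFT_BRACKET, RIGHT_BRACKET,
             WHILE_KEYWORD, RETURN_KEYWORD, EQUAL, COMMA, EOL, VARTYPE,
             IDENTIFIER, BINOP, NUMBER };

/* Hypothetical helper: map a lexeme to its token.
   Returns 1 and writes *out on success, 0 if nothing matches. */
_Bool classifyLexeme(const char *lexeme, enum token *out) {
    /* Fixed spellings are checked first so that "while", "return", "int",
       and "void" are not mistaken for identifiers. */
    static const struct { const char *text; enum token tok; } fixed[] = {
        {"(", LEFT_PARENTHESIS}, {")", RIGHT_PARENTHESIS},
        {"{", LEFT_BRACKET},     {"}", RIGHT_BRACKET},
        {"while", WHILE_KEYWORD}, {"return", RETURN_KEYWORD},
        {"=", EQUAL}, {",", COMMA}, {";", EOL},
        {"int", VARTYPE}, {"void", VARTYPE},
        {"+", BINOP}, {"*", BINOP}, {"!=", BINOP}, {"==", BINOP}, {"%", BINOP}
    };
    assert(lexeme != NULL && out != NULL);
    for (size_t i = 0; i < sizeof fixed / sizeof fixed[0]; i++) {
        if (strcmp(lexeme, fixed[i].text) == 0) {
            *out = fixed[i].tok;
            return 1;
        }
    }
    if (lexeme[0] == '\0') {
        return 0;
    }
    /* Remaining candidates must be purely alphanumeric. */
    _Bool allDigits = 1;
    for (const char *p = lexeme; *p != '\0'; p++) {
        if (!isalnum((unsigned char)*p)) {
            return 0;
        }
        if (!isdigit((unsigned char)*p)) {
            allDigits = 0;
        }
    }
    if (allDigits) {                          /* [0-9][0-9]* */
        *out = NUMBER;
        return 1;
    }
    if (isalpha((unsigned char)lexeme[0])) {  /* [a-zA-Z][a-zA-Z0-9]* */
        *out = IDENTIFIER;
        return 1;
    }
    return 0;                                 /* e.g. "1abc" matches neither pattern */
}
```

The order of the checks matters: a keyword such as while also matches the IDENTIFIER pattern, so the fixed spellings
have to be tested before the general patterns.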
Grading Rubric:
Category                                  Points
Functions and Classes properly defined    10
Header files correctly utilized           10
Tokenizer.c produces correct output       32.5
Parser.c produces correct output          32.5
Code is well commented                    7.5
Code is well formatted                    7.5
Total                                     100