SENNA is software distributed under a non-commercial license that outputs a range of Natural Language Processing (NLP) predictions: part-of-speech (POS) tags, chunking (CHK), named entity recognition (NER) and semantic role labeling (SRL).
SENNA is fast because it uses a simple architecture, self-contained because it does not rely on the output of existing NLP systems, and accurate because it offers state-of-the-art or near state-of-the-art performance.
SENNA is written in ANSI C, with about 2500 lines of code. It requires about 150MB of RAM and should run on any IEEE floating point computer.
Proceed to the download page. Read the compilation section if you want to compile SENNA yourself. Try the sanity check, and read about usage.
We also provide binaries for several platforms in the same archive. The executable for each platform is named as follows:

- Linux (64 bits): senna-linux64
- Linux (32 bits): senna-linux32
- Windows (32 bits): senna-win32.exe
- Mac OS X: senna-osx

Everything is included in a single tar-gzipped file (142MB). Proceed to the download page.
To compile SENNA with gcc, run:

gcc -o senna -O3 -ffast-math *.c

You might want to add further optimization flags suitable for your platform. SENNA also compiles fine with the Intel compiler (icc).
If speed is critical, we recommend compiling SENNA with the Intel MKL library, which provides a very efficient BLAS. Add the definition USE_MKL_BLAS, as well as the correct MKL libraries and include paths:
gcc -o senna -O3 -ffast-math *.c -DUSE_MKL_BLAS [...]
SENNA also compiles with ATLAS BLAS. On our platform, the handcrafted code compiled with the gcc command line shown above was faster. If you still want to use ATLAS, compile SENNA with:
gcc -o senna -O3 -ffast-math *.c -DUSE_ATLAS_BLAS [...]
On Mac OS X, you can compile against the Apple BLAS from the Accelerate framework:

gcc -o senna -O3 -ffast-math *.c -DUSE_APPLE_BLAS -framework Accelerate

This compiles against the Apple BLAS libraries included with your system.
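The USE_MKL_BLAS, USE_ATLAS_BLAS and USE_APPLE_BLAS definitions select a BLAS backend at compile time through the preprocessor. The sketch below illustrates that general pattern for a single-precision matrix-vector product; the function name `mv_product`, the row-major layout and the exact header names are assumptions for illustration, not SENNA's actual source. Without any of these definitions it falls back to a plain C loop, in the spirit of the handcrafted code path.

```c
#include <stddef.h>

/* Pick a CBLAS header from the -D flag given to the compiler.
   Header names follow each library's usual convention (assumption). */
#if defined(USE_MKL_BLAS)
#include <mkl_cblas.h>
#define HAVE_CBLAS 1
#elif defined(USE_ATLAS_BLAS)
#include <cblas.h>
#define HAVE_CBLAS 1
#elif defined(USE_APPLE_BLAS)
#include <Accelerate/Accelerate.h>
#define HAVE_CBLAS 1
#endif

/* Hypothetical helper: y = A * x, with A stored row-major (rows x cols). */
void mv_product(const float *A, const float *x, float *y, int rows, int cols)
{
#ifdef HAVE_CBLAS
    /* Delegate to the optimized BLAS routine (lda = cols in row-major). */
    cblas_sgemv(CblasRowMajor, CblasNoTrans, rows, cols,
                1.0f, A, cols, x, 1, 0.0f, y, 1);
#else
    /* Portable fallback: straightforward double loop. */
    int i, j;
    for (i = 0; i < rows; i++) {
        float sum = 0.0f;
        for (j = 0; j < cols; j++)
            sum += A[(size_t)i * cols + j] * x[j];
        y[i] = sum;
    }
#endif
}
```

Because the choice is made by the preprocessor, the same source file builds with or without an external BLAS; only the compiler command line changes.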
On Windows, create a Win32 console project under Microsoft Visual Studio (you can
download the Express Edition). Add all the header and C files
to the project, and build the solution.
For speed, we recommend using Intel MKL. See your MKL manual for adding the proper libraries and include paths, and also add
the preprocessor definition USE_MKL_BLAS to the project.
To run the sanity check:

senna < sanity-test-input.txt > sanity-test-result.txt

SENNA should create a file sanity-test-result.txt, which should be
identical to the provided sanity-test-output.txt file.
The file sanity-test-input.txt comes from the CoNLL 2000 chunking test set. SENNA will output
all tags for this file. It should run in about 40 seconds on a recent desktop computer.
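On Unix-like systems, `cmp` or `diff` performs this comparison directly. As a portable alternative, a byte-for-byte check can be sketched in plain C; `files_identical` and `paths_identical` are hypothetical helpers, not part of SENNA.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if both streams contain exactly the same
   bytes, 0 otherwise. Stops at the first difference or at end of file. */
int files_identical(FILE *a, FILE *b)
{
    int ca, cb;
    do {
        ca = getc(a);
        cb = getc(b);
        if (ca != cb)
            return 0;   /* differing byte, or one file is shorter */
    } while (ca != EOF);
    return 1;
}

/* Opens both files in binary mode and compares them.
   Returns 1 if identical, 0 if different, -1 if a file cannot be opened. */
int paths_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = -1;

    if (a != NULL && b != NULL)
        result = files_identical(a, b);
    if (a != NULL)
        fclose(a);
    if (b != NULL)
        fclose(b);
    return result;
}
```

After a successful run, `paths_identical("sanity-test-result.txt", "sanity-test-output.txt")` should return 1. Since the comparison is done in binary mode, a difference in line endings (for example CRLF versus LF) is reported as a mismatch.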
SENNA is run as follows:

senna [options] < input.txt > output.txt

You can also run SENNA interactively by omitting the redirections
< and >.
Each input line is treated as one sentence. SENNA has its own
tokenizer for separating words, which can be deactivated with
the -usrtokens option.
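Because SENNA treats every input line as one sentence, running text has to be split into one sentence per line before it is fed in. The naive splitter below is only an illustration: the name `split_sentences` and the rule "a whitespace run after '.', '!' or '?' ends a sentence" are assumptions. It will mis-split abbreviations such as "Dr." and does not reflect how SENNA itself processes text.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if c ends a sentence under this naive rule. */
static int ends_sentence(int c)
{
    return c != 0 && strchr(".!?", c) != NULL;
}

/* Appends c to out if there is room; always counts it in *n. */
static void put_char(char *out, size_t cap, size_t *n, int c)
{
    if (*n + 1 < cap)
        out[*n] = (char)c;
    (*n)++;
}

/* Hypothetical, naive splitter: copies text into out (capacity cap) with
   one sentence per line. A whitespace run after '.', '!' or '?' becomes a
   newline; any other whitespace run becomes a single space. Returns the
   length of the full result (without the final NUL), so a return value
   >= cap means the output was truncated. */
size_t split_sentences(const char *text, char *out, size_t cap)
{
    size_t n = 0;
    int last = 0;     /* last non-space character copied, 0 if none yet */
    int pending = 0;  /* whitespace seen since that character */
    const char *p;

    for (p = text; *p != '\0'; p++) {
        if (isspace((unsigned char)*p)) {
            pending = 1;
            continue;
        }
        if (pending && last != 0)
            put_char(out, cap, &n, ends_sentence(last) ? '\n' : ' ');
        pending = 0;
        put_char(out, cap, &n, (unsigned char)*p);
        last = (unsigned char)*p;
    }
    if (last != 0)
        put_char(out, cap, &n, '\n');  /* terminate the last sentence */
    if (cap > 0)
        out[n < cap ? n : cap - 1] = '\0';
    return n;
}
```

The resulting buffer can be written to input.txt and passed to SENNA as shown above.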
SENNA outputs one line per "token", with all the corresponding tags (in IOBES format) on the same line. An empty line separates consecutive output sentences. The first column is the token. By default, tags for all tasks then follow (POS, CHK, NER and SRL). The SRL tags are preceded by a column indicating whether SENNA considered the token to be an SRL verb ("-" if not). Then there is one column per SRL verb.
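Given this layout, post-processing the output mostly means splitting each line into columns and treating an empty line as a sentence boundary. The sketch below does that, assuming the default options (token column shown, all four tasks) and assuming columns are separated by whitespace; `split_columns`, `srl_verb_count`, the `COL_*` names and `MAX_COLS` are hypothetical.

```c
#include <ctype.h>

#define MAX_COLS 64  /* hypothetical upper bound on columns per line */

/* Column positions in the default output (all tasks, token column shown). */
enum { COL_TOKEN = 0, COL_POS, COL_CHK, COL_NER, COL_SRL_VERB, COL_SRL_FIRST };

/* Splits line in place on runs of whitespace (spaces, tabs, newline) and
   stores a pointer to each column in cols, up to max_cols columns.
   Returns the number of columns; 0 means an empty line, i.e. the
   boundary between two sentences. */
int split_columns(char *line, char *cols[], int max_cols)
{
    int n = 0;
    char *p = line;

    while (*p != '\0' && n < max_cols) {
        while (isspace((unsigned char)*p))
            *p++ = '\0';              /* blank out separators */
        if (*p == '\0')
            break;
        cols[n++] = p;                /* start of a column */
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        if (*p != '\0')
            *p++ = '\0';              /* terminate this column */
    }
    return n;
}

/* Number of SRL argument columns, i.e. one per verb in the sentence. */
int srl_verb_count(int ncols)
{
    return ncols > COL_SRL_FIRST ? ncols - COL_SRL_FIRST : 0;
}
```

A caller would typically read the output with `fgets`, call `split_columns` on each line, and start a new sentence whenever it returns 0. Options that change the layout (for example -notokentags or a task selection) would require different column indices.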
SENNA supports the following options:

- -h
- -verbose
- -notokentags
- -iobtags
- -brackettags
- -path <path>
- -usrtokens
- -posvbs
- -usrvbs <file>
- -pos
- -chk
- -ner
- -srl

Performance of SENNA on standard benchmarks:

| Task | Benchmark | Metric | Performance |
|---|---|---|---|
| Part of Speech (POS) | (Toutanova et al., 2003) | Accuracy | 97.29% |
| Chunking (CHK) | CoNLL 2000 | F1 | 94.32% |
| Named Entity Recognition (NER) | CoNLL 2003 | F1 | 89.59% |
| Semantic Role Labeling (SRL) | CoNLL 2005 | F1 | 75.49% |
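
The F1 scores above are span-level measures. With P the fraction of predicted chunks that are correct and R the fraction of reference chunks that are recovered, F1 = 2PR / (P + R), where a chunk counts as correct only if its type and both boundaries match the reference. As a rough illustration of that metric, the sketch below decodes IOBES tags (SENNA's default tag format) into chunks and scores one predicted tag sequence against a reference. The names `Chunk`, `iobes_chunks` and `chunk_f1` and the fixed capacity of 256 chunks are hypothetical; this is not the official CoNLL evaluation script, and it assumes well-formed tag sequences.

```c
#include <string.h>

/* A chunk covers tokens [start, end] (inclusive) and has a type such as "NP". */
typedef struct {
    int start, end;
    const char *type;   /* points just past the "X-" prefix of the tag */
} Chunk;

/* Decodes IOBES tags ("B-NP", "I-NP", "E-NP", "S-NP", "O") into chunks and
   returns how many were written to out (at most max_out). Assumes
   well-formed tags; the type is taken from the closing S- or E- tag. */
int iobes_chunks(const char *tags[], int n, Chunk out[], int max_out)
{
    int i, count = 0, start = -1;
    for (i = 0; i < n; i++) {
        char prefix = tags[i][0];
        if (prefix == 'S' || prefix == 'B')
            start = i;                  /* a chunk opens here */
        if (prefix == 'S' || prefix == 'E') {
            if (start >= 0 && count < max_out) {
                out[count].start = start;
                out[count].end = i;
                out[count].type = tags[i] + 2;
                count++;
            }
            start = -1;                 /* the chunk is closed */
        }
        if (prefix == 'O')
            start = -1;
    }
    return count;
}

/* Two chunks match only if boundaries and type are identical. */
static int same_chunk(const Chunk *a, const Chunk *b)
{
    return a->start == b->start && a->end == b->end
        && strcmp(a->type, b->type) == 0;
}

/* Span-level F1 of pred against gold (both of length n). Returns 0.0 when
   no chunk matches, including when both sequences contain no chunks. */
double chunk_f1(const char *gold[], const char *pred[], int n)
{
    Chunk g[256], p[256];               /* hypothetical fixed capacity */
    int ng = iobes_chunks(gold, n, g, 256);
    int np = iobes_chunks(pred, n, p, 256);
    int i, j, correct = 0;
    double precision, recall;

    for (i = 0; i < np; i++)
        for (j = 0; j < ng; j++)
            if (same_chunk(&p[i], &g[j])) {
                correct++;
                break;
            }
    if (correct == 0)
        return 0.0;
    precision = (double)correct / np;
    recall = (double)correct / ng;
    return 2.0 * precision * recall / (precision + recall);
}
```

For example, with reference tags B-NP E-NP S-VP S-NP and predicted tags B-NP E-NP S-VP O, two of the three reference chunks are found and there are no false positives, so P = 1, R = 2/3 and F1 = 0.8.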