Real-time, cloud-native, open source streaming of any data to Apache Solr

Using Apache Pulsar and Apache NiFi, we can parse any document in real time at scale. We receive many documents via cloud storage, email, social channels and internal document stores. We want to make all of their content and metadata available in Apache Solr for categorization, full-text search, optimization and combination with other datastores. We will stream not only documents but also all REST feeds, logs and IoT data. Once data is produced to Pulsar topics, it can be ingested into Solr immediately through the Pulsar Solr Sink.
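
As a rough illustration of the produce step described above, here is a minimal Python sketch that flattens a parsed document into one record and sends it to a Pulsar topic. The field names, the `meta_` prefix, the topic name `documents` and the broker URL are assumptions made for the example, not details from the talk. The sketch also sends plain JSON bytes; the Solr sink may expect a schema-typed topic, so check the connector documentation before relying on this payload format.

```python
import json
from datetime import datetime, timezone


def build_solr_record(doc_id, content, metadata):
    """Flatten extracted text and metadata into one JSON-serializable dict.

    The field names here are hypothetical; a real Solr collection defines its own.
    """
    record = {
        "id": doc_id,
        "content": content,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    # Prefix metadata keys so they cannot collide with the core fields.
    for key, value in metadata.items():
        record["meta_" + key] = value
    return record


def publish(record, service_url="pulsar://localhost:6650", topic="documents"):
    """Send one record to a Pulsar topic.

    Needs the pulsar-client package and a running broker; the URL and topic
    name are placeholder assumptions.
    """
    import pulsar  # imported lazily so build_solr_record runs without it

    client = pulsar.Client(service_url)
    try:
        producer = client.create_producer(topic)
        producer.send(json.dumps(record).encode("utf-8"))
    finally:
        client.close()
```

Calling `publish(build_solr_record("doc-1", "some text", {"source": "email"}))` would place one message on the topic. Delivery into Solr is then the job of the sink connector, which is configured separately.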

Using a number of open source tools, we have created a real-time, scalable data flow that parses any document. We use Apache Tika for document processing with real-time language detection, Apache OpenNLP for natural language processing, and Stanford CoreNLP, spaCy and TextBlob for sentiment analysis. We will walk through creating an open source flow of documents using Apache NiFi as our integration engine. We can convert PDF, Excel and Word to HTML and/or text. We can also extract the text to apply sentiment analysis and NLP categorization, generating additional metadata about our documents. We will also extract and parse images; if they contain text, we can extract it with TensorFlow and Tesseract.
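
The metadata enrichment step can be sketched in Python as follows. This is an illustrative assumption, not the flow from the talk: the language detector and sentiment scorer are passed in as plain callables so that any of the tools named above could sit behind them, and the output field names and the 0.1 polarity threshold are hypothetical.

```python
def enrich(text, detect_language, score_sentiment, threshold=0.1):
    """Build extra metadata for a document from its extracted text.

    detect_language: callable taking text, returning a language code.
    score_sentiment: callable taking text, returning a polarity in [-1.0, 1.0].
    threshold: hypothetical cut-off separating neutral from positive/negative.
    """
    polarity = score_sentiment(text)
    if polarity > threshold:
        label = "positive"
    elif polarity < -threshold:
        label = "negative"
    else:
        label = "neutral"
    return {
        "language": detect_language(text),
        "sentiment_score": polarity,
        "sentiment": label,
        "char_count": len(text),
    }


if __name__ == "__main__":
    # Stand-in analyzers so the sketch runs without any NLP library installed.
    print(enrich("Great product", lambda t: "en", lambda t: 0.8))
```

With TextBlob installed, one possible scorer is `lambda t: TextBlob(t).sentiment.polarity`. The resulting dictionary can be merged into the document's metadata before the record is sent on.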
