Airbyte, Dagster, Langchain Example
References:
- Implement AI data pipelines with Langchain, Airbyte, and Dagster
- LLM training pipelines with Langchain, Airbyte, and Dagster
- Using Airbyte with Dagster
- Langchain Airbyte JSON
Tested on Ubuntu 20.04:
- Install docker and docker-compose
- Install and start Airbyte
mkdir ~/build cd ~/build git clone https://github.com/airbytehq/airbyte.git cd airbyte # Edit BASIC_AUTH_PASSWORD in .env from the default ./run-ab-platform.sh
- Once you see an Airbyte banner, the UI is ready to go at http://localhost:8000. It actually runs at
0.0.0.0:8000
.- You will be asked for a username and password. By default, that’s username
airbyte
and passwordpassword
. - Once you deploy airbyte to your servers, be sure to change these in your
.env
file. - Configure Airbyte to connect
Sample Data
source toLocal JSON
destination. - Pick
test
as destination path. This will result in three output files/tmp/airbyte_local/_airbyte_raw_{stream_name}.jsonl
.
- You will be asked for a username and password. By default, that’s username
- Create the build folder:
mkdir ~/build/airbyte-dagster-langchain cd ~/build/airbyte-dagster-langchain
- Set up
venv
, and enter the virtual environment:python3 -m venv ~/.venv/langchain . ~/.venv/langchain/bin/activate pip install openai faiss-cpu requests beautifulsoup4 tiktoken dagster_managed_elements langchain dagster dagster-airbyte dagit
- Download https://github.com/airbytehq/dagster-langchain/blob/main/ingest.py
- Edit the
airbyte_loader
to point it to one of the configured/tmp/airbyte_local/_airbyte_raw_{stream_name}.jsonl
files - Add the user and password here:
python airbyte_instance = AirbyteResource( host="localhost", port="8000", username="airbyte", password="password", )
- Set up Dagster
if [ ! -d $DAGSTER_HOME ]; then mkdir -p $DAGSTER_HOME ]; fi touch ${DAGSTER_HOME}/dagster.yaml
- Put your
OPENAI_API_KEY
andDAGSTER_HOME
in your~/.bashrc
:export OPENAI_API_KEY=XXXg export DAGSTER_HOME=~/build/airbyte-dagster-langchain/dagster_home
- Re-source the
~/.bashrc
if necessary. Re-enter the virtual env. - Start Dagster:
dagster dev -f ingest.py -h 0.0.0.0
The
-h
parameter controls thedagit
host, and-p
controls the port. Default port is3000
. Useingest_all.py
for ingesting multiple sources. - Start Streamlit:
. ~/.venv/streamlit/bin/activate streamlit run chatbot.py --server.port 8080
- Open the ui at port 8501