References:
- Implement AI data pipelines with Langchain, Airbyte, and Dagster
- LLM training pipelines with Langchain, Airbyte, and Dagster
- Using Airbyte with Dagster
- Langchain Airbyte JSON
Tested on Ubuntu 20.04:
- Install docker and docker-compose
- Install and start Airbyte

      mkdir ~/build
      cd ~/build
      git clone https://github.com/airbytehq/airbyte.git
      cd airbyte
      # Edit BASIC_AUTH_PASSWORD in .env from the default
      ./run-ab-platform.sh

  - Once you see an Airbyte banner, the UI is ready to go at http://localhost:8000. It actually runs at `0.0.0.0:8000`.
    - You will be asked for a username and password. By default, that’s username `airbyte` and password `password`.
    - Once you deploy Airbyte to your servers, be sure to change these in your `.env` file.
  - Configure Airbyte to connect `Sample Datasource` to `Local JSON` destination.
  - Pick `test` as destination path. This will result in three output files `/tmp/airbyte_local/_airbyte_raw_{stream_name}.jsonl`.
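Before pointing the loader at these files, it can help to look at what Airbyte actually wrote. The following is a minimal sketch, assuming the raw record layout the Local JSON destination normally uses, where each line is a JSON object with the payload under `_airbyte_data`. The helper name `read_airbyte_records` and the example file name are hypothetical.

```python
import json
from pathlib import Path


def read_airbyte_records(path, limit=None):
    """Return the `_airbyte_data` payloads from an Airbyte raw JSONL file.

    Assumption: each non-empty line is a JSON object with an
    `_airbyte_data` key. Objects without that key are returned whole.
    """
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            if limit is not None and len(records) >= limit:
                break  # stop once enough records were collected
            line = line.strip()
            if not line:
                continue  # skip blank lines
            row = json.loads(line)
            records.append(row.get("_airbyte_data", row))
    return records


if __name__ == "__main__":
    # Hypothetical file name; substitute one of the files Airbyte produced.
    sample = Path("/tmp/airbyte_local/_airbyte_raw_users.jsonl")
    if sample.exists():
        for record in read_airbyte_records(sample, limit=3):
            print(record)
```

If the file is missing, the script prints nothing, which usually means the sync has not run yet or the destination path differs from the one assumed here.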
- Create the build folder:

      mkdir ~/build/airbyte-dagster-langchain
      cd ~/build/airbyte-dagster-langchain

- Set up `venv`, and enter the virtual environment:

      python3 -m venv ~/.venv/langchain
      . ~/.venv/langchain/bin/activate
      pip install openai faiss-cpu requests beautifulsoup4 tiktoken dagster_managed_elements langchain dagster dagster-airbyte dagit

- Download https://github.com/airbytehq/dagster-langchain/blob/main/ingest.py
- Edit the `airbyte_loader` to point it to one of the configured `/tmp/airbyte_local/_airbyte_raw_{stream_name}.jsonl` files
- Add the user and password here:

      airbyte_instance = AirbyteResource(
          host="localhost",
          port="8000",
          username="airbyte",
          password="password",
      )

- Set up Dagster (`DAGSTER_HOME` must already be set in your shell; see the next step):

      if [ ! -d "$DAGSTER_HOME" ]; then mkdir -p "$DAGSTER_HOME"; fi
      touch "${DAGSTER_HOME}/dagster.yaml"

- Put your `OPENAI_API_KEY` and `DAGSTER_HOME` in your `~/.bashrc`:

      export OPENAI_API_KEY=XXX
      export DAGSTER_HOME=~/build/airbyte-dagster-langchain/dagster_home

- Re-source the `~/.bashrc` if necessary. Re-enter the virtual env.
- Start Dagster:

      dagster dev -f ingest.py -h 0.0.0.0

  The `-h` parameter controls the `dagit` host, and `-p` controls the port. Default port is `3000`. Use `ingest_all.py` for ingesting multiple sources.
- Start Streamlit:

      . ~/.venv/streamlit/bin/activate
      streamlit run chatbot.py --server.port 8080

- Open the UI at port 8080, the port passed via `--server.port` above (Streamlit's default, without that flag, is 8501).
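
Most failures at the Dagster step come from a missing environment variable or a missing `dagster.yaml`. The steps above can be checked with a small pre-flight script. This is a sketch only: the function name `check_environment` is hypothetical, and it verifies just the two variables and the file that this guide sets up, nothing about Airbyte or Streamlit.

```python
import os
from pathlib import Path


def check_environment(env=None):
    """Return a list of problems found; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []

    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    dagster_home = env.get("DAGSTER_HOME")
    if not dagster_home:
        problems.append("DAGSTER_HOME is not set")
    else:
        home = Path(dagster_home).expanduser()
        if not home.is_dir():
            problems.append(f"DAGSTER_HOME directory does not exist: {home}")
        elif not (home / "dagster.yaml").is_file():
            problems.append(f"dagster.yaml is missing in {home}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Environment looks ready for `dagster dev`.")
```

Run it inside the same shell and virtual environment that will start Dagster, since it reads the variables from the current process environment.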